Java API Coverage Analysis¶

This section provides a practical mapping between the ArcadeDB Java API and the Python bindings surface in this repository. It reflects the current code in arcadedb_embedded rather than a theoretical, full Java surface comparison.

Executive Summary¶

The Python bindings expose the core database, schema, graph, vector, async, import/export, and server workflows needed for typical application usage. Most omissions are low-level JVM internals (WAL details, bucket scanning, binary protocol, server plugins, clustering) that are not typically used from Python.

Coverage by Area (Qualitative)¶

Area	Status	Notes
Core Database	✅ Supported	`DatabaseFactory`, `Database`, transactions, lookups, async helpers
Query Execution	✅ Supported	SQL, OpenCypher, MongoDB, GraphQL passthrough
Schema & Indexes	✅ Supported	Types, properties, LSM_TREE/HASH/FULL_TEXT/LSM_VECTOR/GEOSPATIAL indexes
Graph API	✅ Supported	SQL/OpenCypher graph workflows plus `Document`/`Vertex`/`Edge` wrapper compatibility
Vector Search	✅ Supported	JVector indexes + NumPy conversion helpers
Async Execution	✅ Supported	`AsyncExecutor` plus record-level and SQL/Cypher async flows
Data Import	✅ Supported	SQL import workflows plus a narrow `db.import_documents(...)` wrapper for document files
Data Export	✅ Supported	JSONL/GraphML/GraphSON + CSV for query results
Server Mode	✅ Supported	Embedded server lifecycle + Studio access
Advanced/Low-level	❌ Not exposed	WAL internals, binary protocol, HA/replication, plugins

Detailed Coverage¶

1. Core Database Operations¶

DatabaseFactory:

✅ create(), open(), exists()

Database:

✅ query(language, query, *args) and command(language, command, *args)
✅ Transactions: begin(), commit(), rollback(), transaction()
✅ Records: new_document(), new_vertex(), lookup_by_rid(), lookup_by_key()
✅ Utilities: count_type(), drop(), get_name(), get_database_path(), is_open(), close()
✅ Configuration: set_auto_transaction(), set_read_your_writes()
✅ Async execution: async_executor()
✅ Export helpers: export_database() and export_to_csv()

Not directly exposed: bucket scans, WAL internals, low-level binary protocol

2. Query Execution¶

All query languages supported by the underlying ArcadeDB engine can be used via db.query() and db.command():

✅ SQL
✅ OpenCypher
✅ MongoDB query syntax
✅ GraphQL

ResultSet & Results:

✅ Pythonic iteration (__iter__, __next__)
✅ has_next(), next()
✅ get(), has_property(), get_property_names()
✅ to_json(), to_dict() (Python enhancement)

3. Graph API¶

Recommended approach: SQL/OpenCypher for graph writes and traversals, with wrapper APIs available when you explicitly need record objects

Wrapper/record APIs available:

✅ db.new_vertex(type) / db.new_document(type)
✅ record.set(name, value) / record.save() / record.delete() / record.modify()
✅ vertex.new_edge(label, target, **props) (bidirectionality controlled by EdgeType schema)
✅ vertex.get_out_edges(), get_in_edges(), get_both_edges()
✅ db.lookup_by_rid(rid) for direct record access

Graph Traversals & Queries:

✅ SQL traversal: SELECT * FROM User WHERE out('Follows').name = 'Alice'
✅ OpenCypher patterns: MATCH (a:User)-[:FOLLOWS]->(b) RETURN b
✅ Path finding, shortest paths, pattern matching

Not exposed: event listeners/callback hooks, low-level graph internals

Recommended query-first approach:

# Create vertices via SQL
with db.transaction():
    db.command("sql", "INSERT INTO User SET id = 1, name = 'Alice'")
    db.command("sql", "INSERT INTO User SET id = 2, name = 'Bob'")
    db.command("sql", """
        CREATE EDGE Follows
        FROM (SELECT FROM User WHERE id = 1)
        TO (SELECT FROM User WHERE id = 2)
    """)

# Traverse via OpenCypher
result = db.query("opencypher", """
    MATCH (user:User {name: 'Alice'})-[:FOLLOWS]->(friend)
    RETURN friend.name
""")

Wrapper/object APIs still available:

with db.transaction():
    alice = db.new_vertex("Person").set("name", "Alice").save()
    bob = db.new_vertex("Person").set("name", "Bob").save()
    alice.new_edge("Follows", bob, since=date.today()).save()

4. Schema & Index API¶

Full Pythonic Schema API available via db.schema:

✅ create_document_type(), create_vertex_type(), create_edge_type()
✅ get_or_create_*() helpers
✅ create_property(), drop_property()
✅ drop_type(), exists_type(), get_type(), get_types()
✅ Indexes: create_index(), drop_index(), get_indexes(), exists_index()
✅ Vector indexes: SQL CREATE INDEX ... LSM_VECTOR, secondary/manual helper coverage, list_vector_indexes()

5. Server Mode¶

✅ ArcadeDBServer(root_path, config) - Server initialization
✅ start(), stop(), context manager support
✅ get_database(), create_database() - Database management
✅ get_studio_url(), get_http_port()
✅ Context manager support
✅ get_studio_url(), get_http_port() - Python enhancements
✅ Embedded and HTTP mode support
❌ Plugin management, HA/replication, advanced user/security management

6. Data Import¶

Supported:

✅ SQL IMPORT DATABASE for CSV/TSV documents
✅ SQL IMPORT DATABASE for CSV graph vertices and edges with ID resolution
✅ SQL IMPORT DATABASE for XML
✅ SQL IMPORT DATABASE for ArcadeDB JSONL exports
✅ SQL IMPORT DATABASE for RDF, Neo4j, Word2Vec, and timeseries scenarios covered by tests
✅ db.import_documents(...) wrapper for document-shaped file imports via the Java importer
✅ Batch processing and automatic type inference where supported by the Java importer

The importer surface is intentionally still described conservatively in this repository. Support exists, but the current repository guidance is:

bulk table/document ingest: async SQL with a single async worker
bulk graph ingest: GraphBatch
importer-based paths: available, but not the recommended default because they have shown reliability issues, including OoM failures, in larger tests

7. Data Export¶

✅ JSONL export - Full database backup format
✅ GraphML export - Graph visualization format
✅ GraphSON export - TinkerPop-compatible graph JSON
✅ CSV export of query results via export_to_csv()
✅ Type filtering via include_types / exclude_types
✅ Compression when exporting JSONL/GraphML/GraphSON (Java exporter)

8. Vector Search¶

✅ Vector index creation - SQL CREATE INDEX ... LSM_VECTOR
✅ NumPy array support - to_java_float_array(), to_python_array()
✅ Similarity search - SQL vectorNeighbors
✅ Distance functions - cosine, euclidean, inner_product
✅ Index tuning parameters (connections, beam width, quantization)
✅ Automatic indexing of existing records
✅ List vector indexes - schema.list_vector_indexes()

9. Advanced / Low-Level APIs Not Exposed¶

❌ WAL and storage internals
❌ Binary protocol and custom network stacks
❌ HA/replication, distributed clustering
❌ Server plugins and module management
❌ Custom query engines and DSLs

Design Philosophy: Query-First Approach¶

The Python bindings follow a "query-first, API-second" philosophy, which is ideal for Python developers. Instead of exposing every Java object, operations are enabled through:

SQL DDL for schema management
SQL/OpenCypher for graph and document operations
Thin helper APIs for transactions, vector search, and targeted record access

This approach is actually cleaner and more maintainable than direct API exposure:

# Python way (clean):
db.command("sql", "CREATE INDEX ON User (email) UNIQUE")
db.query("opencypher", "MATCH (a)-[:FOLLOWS]->(b) RETURN b")

# vs. hypothetical direct API (complex):
schema = db.getSchema()
type = schema.getType("User")
index_builder = schema.buildTypeIndex("User", ["email"])
index = index_builder.withUnique(true).create()

Use Case Suitability¶

Use Case	Suitable?	Notes
Embedded database in Python app	✅ Excellent	Core use case
Graph analytics with Cypher	✅ Excellent	SQL and OpenCypher supported
Document store	✅ Excellent	SQL and schema APIs
Vector similarity search	✅ Excellent	JVector + NumPy integration
Development with Studio UI	✅ Excellent	Server mode included
Data migration (CSV/XML/JSONL import)	✅ Good	SQL import workflows exercised by tests
Async bulk ingestion	✅ Good	`AsyncExecutor`
Multi-master replication	❌ Not supported	Java server only
Custom query language	❌ Not supported	Use built-in languages

Conclusion¶

These bindings cover the primary workflows most Python developers need:

Embedded multi-model database
Graph, document, vector, and time-series data
SQL and OpenCypher queries
Server mode for Studio UI and HTTP access

They intentionally do not expose low-level JVM internals, clustering, and plugin management. For those scenarios, use the Java APIs directly.

🚧 Future Work¶

SQL-level vector syntax in ArcadeDB (when available upstream)
Expanded performance benchmarks and scale testing
Continued alignment with upstream Java releases

📝 License¶

Apache License 2.0

🙏 Contributing¶

Contributions welcome! Please:

Fork the repository
Create a feature branch
Make your changes
Run tests: python3 -m pytest tests/ -v
Submit a pull request