Skip to content

Java API Coverage

Java API Coverage Analysis

This section provides a comprehensive comparison of the ArcadeDB Java API and what's been implemented in the Python bindings.

Executive Summary

Overall Coverage: ~87% of the Java API surface used in practice

The Python bindings provide excellent coverage for real-world use (~87% of common operations), with only a small portion of low-level or niche Java APIs intentionally omitted (~13%).

Coverage by Category

Category Coverage Status
Core Database Operations 95% ✅ Excellent
Query Execution 100% ✅ Complete
Server Mode 90% ✅ Excellent
Data Import 70% ✅ Good
Data Export 100% ✅ Complete
Graph API 85% ✅ Excellent
Schema API 100% ✅ Complete
Index Management 90% ✅ Excellent
Vector Search 100% ✅ Complete
Advanced Features 5% ❌ Minimal

Detailed Coverage

1. Core Database Operations - 95%

DatabaseFactory:

  • create() - Create new database
  • open() - Open existing database
  • exists() - Check if database exists
  • setAutoTransaction() - Not exposed (use config)
  • setSecurity() - Not exposed (server-managed)

Database:

  • query(language, query, *args) - Full support for all query languages
  • command(language, command, *args) - Full support for write operations
  • begin(), commit(), rollback() - Full transaction support
  • transaction() - Python context manager (enhancement)
  • newDocument(type), newVertex(type) - Record creation
  • lookup_by_rid(rid) - Direct record lookup
  • count_type(type) - Efficient record counting
  • getName(), getDatabasePath(), isOpen(), close() - Database info
  • scanType(), scanBucket() - Use SQL SELECT instead
  • lookupByKey() - Use SQL WHERE clause instead
  • async() - Async operations not exposed

2. Query Execution - 100%

All query languages fully supported:

  • ✅ SQL
  • ✅ OpenCypher
  • ✅ MongoDB query syntax
  • ✅ GraphQL

ResultSet & Results:

  • ✅ Pythonic iteration (__iter__, __next__)
  • has_next(), next()
  • get(), has_property(), get_property_names()
  • to_json(), to_dict() (Python enhancement)

3. Graph API - 85%

Hybrid approach: Pythonic object manipulation + Powerful Query Languages

Vertex & Edge Manipulation (Pythonic):

  • db.new_vertex(type) - Returns vertex object
  • vertex.set(name, value) - Fluent property setting
  • vertex.save() - Persist changes
  • vertex.new_edge(label, target, **props) - Create edges (bidirectionality controlled by EdgeType schema)
  • db.lookup_by_rid(rid) - Direct lookup (e.g., db.lookup_by_rid("#10:0"))

Graph Traversals & Queries:

  • ✅ SQL traversal: SELECT * FROM User WHERE out('Follows').name = 'Alice'
  • ✅ OpenCypher patterns: MATCH (a:User)-[:FOLLOWS]->(b) RETURN b
  • ✅ Path finding, shortest paths, pattern matching

What's Not Exposed:

  • ❌ Graph event listeners and callbacks

Object-Oriented Approach (Recommended):

# Create vertices with fluent Python API
alice = db.new_vertex("Person").set("name", "Alice").save()
bob = db.new_vertex("Person").set("name", "Bob").save()

# Create edge with properties (bidirectionality determined by EdgeType schema)
edge = alice.new_edge("Follows", bob, since=date.today())
edge.save()

Query-Based Approach (Also Supported):

# Create edges via SQL
db.command("sql", """
    CREATE EDGE Follows
    FROM (SELECT FROM User WHERE id = 1)
    TO (SELECT FROM User WHERE id = 2)
""")

# Or via Cypher
db.command("cypher", """
    MATCH (a:User {id: 1}), (b:User {id: 2})
    CREATE (a)-[:FOLLOWS]->(b)
""")

# Traverse via Cypher
result = db.query("cypher", """
    MATCH (user:User {name: 'Alice'})-[:FOLLOWS]->(friend)
    RETURN friend.name
""")

4. Schema API - 100%

Full Pythonic Schema API available via db.schema:

  • create_document_type(), create_vertex_type(), create_edge_type()
  • create_property(), drop_property()
  • drop_type(), exists_type(), get_type()
  • get_types() - Iterate all types

5. Index Management - 90%

  • create_index() - Supports LSM_TREE, FULL_TEXT, and UNIQUE indexes
  • create_vector_index() - Specialized API for vector search
  • drop_index()
  • get_indexes() - List indexes on type
  • exists_index()

6. Server Mode - 90%

  • ArcadeDBServer(root_path, config) - Server initialization
  • start(), stop() - Server lifecycle
  • get_database(), create_database() - Database management
  • exists() - Check database existence
  • ✅ Context manager support
  • get_studio_url(), get_http_port() - Python enhancements
  • ✅ Embedded and HTTP mode support
  • ❌ Plugin management - Not exposed
  • ❌ HA/Replication - Not exposed
  • ❌ User/security management - Server handles automatically

7. Data Import - 70% (3 primary formats)

Supported:

  • ✅ CSV - import_csv() with full edge/vertex/document support
  • ✅ XML - import_xml() with nesting and attribute extraction
  • ✅ ArcadeDB JSONL exports - IMPORT DATABASE file://... via SQL
  • ✅ Edge import with foreign key resolution
  • ✅ Batch processing and parallel import
  • ✅ Automatic type inference

Not Implemented:

  • ❌ RDF, OrientDB, GloVe, Word2Vec formats
  • ❌ Direct JSON array import (use JSONL instead)
  • ❌ SQL/database import

Note: The 70% coverage reflects that the 3 supported formats (CSV, XML, ArcadeDB JSONL export/import) cover most real-world data migration scenarios.

8. Data Export - 100%

  • ✅ JSONL export - Full database backup format
  • ✅ GraphML export - For visualization tools (Gephi, Cytoscape)
  • ✅ GraphSON export - Graph JSON format
  • ✅ CSV export - Tabular data export
  • ✅ Type filtering - Include/exclude specific types
  • ✅ Compression support - Automatic .tgz compression
  • ✅ Progress tracking and statistics

9. Vector Search - 100%

  • ✅ Vector index creation - create_vector_index() with HNSW (JVector)
  • ✅ NumPy array support - to_java_float_array(), to_python_array()
  • ✅ Similarity search - index.find_nearest()
  • ✅ Add/remove vectors - Automatic via vertex save/delete
  • ✅ Distance functions - cosine, euclidean, inner_product
  • ✅ Vector parameters - max_connections, beam_width
  • ✅ Automatic indexing - Existing records indexed on creation
  • ✅ List vector indexes - schema.list_vector_indexes()

10. Advanced Features - 5%

Not Implemented:

  • ❌ Callbacks & Events (DocumentCallback, RecordCallback, DatabaseEvents)
  • ❌ Low-Level APIs (WAL, bucket scanning, binary protocol)
  • ❌ Async operations & parallel queries
  • ❌ Security management (SecurityManager, user management)
  • ❌ High Availability (HAServer, replication)
  • ❌ Custom query engines
  • ❌ Schema builders & DSL

Design Philosophy: Query-First Approach

The Python bindings follow a "query-first, API-second" philosophy, which is ideal for Python developers. Instead of exposing every Java object, operations are enabled through:

  • SQL DDL for schema management
  • Cypher/SQL for graph operations
  • High-level wrappers for common tasks (transactions, vector search)

This approach is actually cleaner and more maintainable than direct API exposure:

# Python way (clean):
db.command("sql", "CREATE INDEX ON User (email) UNIQUE")
db.query("cypher", "MATCH (a)-[:FOLLOWS]->(b) RETURN b")

# vs. hypothetical direct API (complex):
schema = db.getSchema()
type = schema.getType("User")
index_builder = schema.buildTypeIndex("User", ["email"])
index = index_builder.withUnique(true).create()

Use Case Suitability

Use Case Suitable? Notes
Embedded database in Python app ✅ Perfect Core use case
Graph analytics with Cypher ✅ Excellent All query languages work
Graph traversals & pattern matching ✅ Excellent SQL and OpenCypher fully supported
Document store ✅ Excellent Full SQL support
Vector similarity search ✅ Excellent Native NumPy integration
Development with Studio UI ✅ Excellent Server mode included
Data migration (CSV/XML/JSONL import) ✅ Good 3 major formats covered
Real-time event processing ⚠️ Limited No async, no callbacks
Multi-master replication ❌ Not supported Java/Server only
Custom query language ❌ Not supported Use built-in languages

Conclusion

For 90% of Python developers: These bindings are production-ready and provide everything needed for:

  • Embedded multi-model database
  • Graph, document, vector, and time-series data
  • SQL and OpenCypher queries
  • Development and production deployment

Not suitable for:

  • Applications requiring async/await patterns
  • Custom database extensions or plugins
  • Direct manipulation of Graph API objects
  • High-availability clustering from Python

The practical coverage for real-world applications is 85%+, which is excellent. The 40-45% "total coverage" number is misleading because it counts low-level Java APIs that Python developers shouldn't use anyway.


🚧 Future Work

This Python binding is actively being developed. Here are the planned improvements:

1. High-Level SQL Support for Vectors

Goal: Simplify vector operations with SQL-based API

Currently, vector similarity search requires direct interaction with Java APIs (creating vector indexes, converting arrays, managing vertices manually).

Current approach (requires understanding Java internals):

# Lots of Java API calls
java_embedding = arcadedb.to_java_float_array(embedding)
vertex = db._java_db.new_vertex("Document")
vertex.set("embedding", java_embedding)
index = db.create_vector_index(...)

Future approach (with SQL support):

# Clean SQL-based API
db.command("sql", """
    CREATE VECTOR INDEX ON Document(embedding)
    WITH (dimensions=768, distance='cosine')
""")

result = db.query("sql", """
    SELECT FROM Document
    WHERE embedding NEAR [0.1, 0.2, ...]
    LIMIT 10
""")

Once ArcadeDB adds native SQL syntax for vector operations, we'll adapt the Python bindings to expose this cleaner interface.

2. Comprehensive Testing & Performance Benchmarks

Goal: Validate stability and performance at scale

Current testing covers basic functionality (14/14 tests passing), but we need:

  • Load testing: Insert/query millions of records
  • Vector performance: Benchmark vector search with large datasets (100K+ vectors)
  • Concurrency testing: Multiple transactions, thread safety
  • Memory profiling: Long-running processes, leak detection
  • Platform testing: Verify behavior across Linux, macOS, Windows
  • Python version matrix: Expand tests across 3.10–3.14 (currently exercised on 3.11)

This will ensure production readiness for high-volume applications.

3. Upstream Contribution

Goal: Merge into official ArcadeDB repository

Once the bindings are thoroughly tested and PyPI-ready, we plan to submit a pull request to the official ArcadeDB repository. This will:

  • Make Python bindings an officially supported feature
  • Ensure long-term maintenance and updates
  • Benefit the broader ArcadeDB community
  • Keep bindings in sync with Java releases

Timeline: Waiting for items 1-3 to be completed and validated before proposing upstream integration.


📝 License

Apache License 2.0


🙏 Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Run tests: python3 -m pytest tests/ -v
  5. Submit a pull request