Troubleshooting

Common issues, solutions, and debugging techniques for ArcadeDB Python bindings.

Installation Issues

Package Import Errors

Problem: Can't import arcadedb_embedded module

Solutions:

  1. Verify Installation:

    uv pip show arcadedb-embedded
    uv pip list | grep arcadedb
    

  2. Reinstall Package:

    uv pip uninstall arcadedb-embedded
    uv pip install arcadedb-embedded
    

  3. Force a clean reinstall if the wheel looks corrupted: Wheels bundle the ArcadeDB JRE and JARs, so no external Java install is needed. If imports still fail after step 2, reinstall while bypassing the wheel cache:

    uv pip uninstall -y arcadedb-embedded
    uv pip install --no-cache-dir arcadedb-embedded
    

  4. Check Python Path:

    import sys
    print(sys.path)
    

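If the package shows up in uv pip list but still won't import, confirm which copy of the module the interpreter actually resolves. A minimal check using only the standard library:

import importlib.util

# Locate the module on the current sys.path without importing it
spec = importlib.util.find_spec("arcadedb_embedded")
if spec is None:
    print("arcadedb_embedded is not visible to this interpreter")
else:
    print(f"Resolved to: {spec.origin}")

If nothing is found, the package is installed into a different environment than the one running your script.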

Runtime Errors

Database Connection Issues

Problem: Can't connect to database

Solutions:

  1. Check Database Path:

    import os
    db_path = "databases/mydb"
    print(f"Exists: {os.path.exists(db_path)}")
    

  2. Verify Database Created:

    import os
    import arcadedb_embedded as arcadedb
    
    db_path = "databases/mydb"
    
    # Create the database if it doesn't exist, otherwise open it
    if not os.path.exists(db_path):
        db = arcadedb.create_database(db_path)
    else:
        db = arcadedb.open_database(db_path)
    

  3. Check Permissions:

    ls -la databases/
    chmod -R 755 databases/
    


Database Already Exists

Symptom:

arcadedb.create_database("./mydb")
# ArcadeDBError: Database already exists

Solution:

Use open_database() instead:

import os
import arcadedb_embedded as arcadedb

if os.path.exists("./mydb"):
    db = arcadedb.open_database("./mydb")
else:
    db = arcadedb.create_database("./mydb")

Or delete existing database:

import shutil

# Remove existing database
if os.path.exists("./mydb"):
    shutil.rmtree("./mydb")

# Create fresh database
db = arcadedb.create_database("./mydb")

Database Locked

Symptom: ArcadeDBError: Database is locked by another process

Cause: Another process has the database open.

Solution:

  1. Close other connections:

    # Ensure previous database is closed
    db.close()
    

  2. Check for orphaned processes:

    # Linux/macOS
    ps aux | grep python
    kill <PID>
    
    # Windows
    tasklist | findstr python
    taskkill /PID <PID>
    

  3. Remove lock file (last resort):

    # Only if you're sure no process is using the database
    rm ./mydb/.lock
    

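Stale locks usually come from a process that exited without closing the database. A simple guard is to release the handle deterministically, even when an exception occurs (a sketch using the close() call shown above):

db = arcadedb.open_database("./mydb")
try:
    with db.transaction():
        vertex = db.new_vertex("User")
        vertex.set("name", "Alice")
        vertex.save()
finally:
    # Always release the lock, even if the work above raises
    db.close()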

Memory Configuration

JVM Memory Configuration

Configure JVM memory via the ARCADEDB_JVM_ARGS environment variable before importing arcadedb_embedded:

Basic Configuration:

# Default: 4GB heap
python script.py

# Production: 8GB heap with matching initial size
export ARCADEDB_JVM_ARGS="-Xmx8g -Xms8g"
python script.py

# One-liner
ARCADEDB_JVM_ARGS="-Xmx8g -Xms8g" python script.py

Common JVM Options:

  • -Xmx<size>: maximum heap memory (e.g., -Xmx8g for 8 gigabytes)
  • -Xms<size>: initial heap size, recommended to match -Xmx (e.g., -Xms8g)
  • -XX:MaxDirectMemorySize=<size>: limit off-heap direct buffers (e.g., -XX:MaxDirectMemorySize=8g)
  • -Darcadedb.vectorIndex.locationCacheSize=<count>: maximum vector locations to cache; default -1 = unlimited (e.g., -Darcadedb.vectorIndex.locationCacheSize=100000)
  • -Darcadedb.vectorIndex.graphBuildCacheSize=<count>: maximum vectors cached during HNSW build; default 10000 (e.g., -Darcadedb.vectorIndex.graphBuildCacheSize=3000)
  • -Darcadedb.vectorIndex.mutationsBeforeRebuild=<count>: mutations before graph rebuild; default 100 (e.g., -Darcadedb.vectorIndex.mutationsBeforeRebuild=200)

Vector Index Memory Tuning:

For applications using vector indexes, control memory usage:

# Conservative: bounded caches for large vector datasets
export ARCADEDB_JVM_ARGS="-Xmx8g -Xms8g -XX:MaxDirectMemorySize=8g \
  -Darcadedb.vectorIndex.locationCacheSize=100000 \
  -Darcadedb.vectorIndex.graphBuildCacheSize=3000 \
  -Darcadedb.vectorIndex.mutationsBeforeRebuild=200"
python vector_app.py

Cache Size Guidelines:

  • locationCacheSize: number of vector locations to cache (each entry is ~56 bytes)
      • 100000 entries ≈ 5.6 MB
      • -1 = unlimited (backward compatible, may consume unbounded memory)
      • Recommended: 100000 for datasets with 1M+ vectors

  • graphBuildCacheSize: number of vectors cached during HNSW build
      • Memory ≈ cacheSize × (dimensions × 4 + 64) bytes
      • For 768-dim vectors: 10000 entries ≈ 30 MB
      • Lower values reduce build-time memory spikes
      • Recommended: 3000-5000 for high-dimensional vectors

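The build-cache formula above makes it easy to size the cache before picking a flag value. A small worked example (plain Python, no ArcadeDB required):

def graph_build_cache_mb(cache_size: int, dimensions: int) -> float:
    # Memory ≈ cache_size × (dimensions × 4 + 64) bytes (formula above)
    return cache_size * (dimensions * 4 + 64) / (1024 * 1024)

# Default cache of 10000 entries with 768-dim vectors
print(f"{graph_build_cache_mb(10000, 768):.1f} MB")   # ~29.9 MB

# Bounded cache of 2000 entries with 1536-dim vectors
print(f"{graph_build_cache_mb(2000, 1536):.1f} MB")   # ~11.8 MB
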
Memory Planning:

Total Process Memory = JVM Heap + Off-Heap Components

Off-Heap Components:
- Direct buffers (MaxDirectMemorySize)
- Metaspace (class definitions)
- Page cache
- Thread stacks
- Vector index caches (if bounded)

Rule of thumb: plan for 1.5-2× your heap size in actual RAM (an 8 GB heap means budgeting roughly 12-16 GB for the process).

Example Configurations:

# Small datasets (<1M records, <100K vectors)
ARCADEDB_JVM_ARGS="-Xmx2g -Xms2g"

# Medium datasets (1M-10M records, 100K-1M vectors)
ARCADEDB_JVM_ARGS="-Xmx8g -Xms8g -XX:MaxDirectMemorySize=8g"

# Large datasets (10M+ records, 1M+ vectors) with bounded caches
ARCADEDB_JVM_ARGS="-Xmx16g -Xms16g -XX:MaxDirectMemorySize=16g \
  -Darcadedb.vectorIndex.locationCacheSize=100000 \
  -Darcadedb.vectorIndex.graphBuildCacheSize=5000 \
  -Darcadedb.vectorIndex.mutationsBeforeRebuild=200"

# High-dimensional vectors (e.g., 1536-dim embeddings)
ARCADEDB_JVM_ARGS="-Xmx8g -Xms8g -XX:MaxDirectMemorySize=8g \
  -Darcadedb.vectorIndex.locationCacheSize=50000 \
  -Darcadedb.vectorIndex.graphBuildCacheSize=2000 \
  -Darcadedb.vectorIndex.mutationsBeforeRebuild=150"

Configuration Timing

ARCADEDB_JVM_ARGS must be set before the first import arcadedb_embedded. The JVM can only be configured once per Python process.
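
If you cannot control the shell environment, the variable can also be set from within Python, provided it happens before the first import (a sketch; only the ordering matters here):

import os

# Must run before the first `import arcadedb_embedded` in this process
os.environ["ARCADEDB_JVM_ARGS"] = "-Xmx8g -Xms8g"

import arcadedb_embedded as arcadedb  # JVM starts with the args above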

Alternative: ARCADEDB_JVM_ERROR_FILE

Set crash log location:

export ARCADEDB_JVM_ERROR_FILE="/var/log/arcade/errors.log"

Out of Memory Errors

Problem: OutOfMemoryError or heap space errors

Solutions:

  1. Increase Heap via Environment Variable (Recommended):

    export ARCADEDB_JVM_ARGS="-Xmx8g -Xms8g"
    python script.py
    

  2. Bound Vector Caches (for vector workloads):

    export ARCADEDB_JVM_ARGS="-Xmx8g -Xms8g \
      -Darcadedb.vectorIndex.locationCacheSize=100000 \
      -Darcadedb.vectorIndex.graphBuildCacheSize=3000"
    python script.py
    

  3. Use Batch Processing:

    batch_size = 1000
    for i in range(0, len(data), batch_size):
        batch = data[i:i + batch_size]
        process_batch(batch)
    

  4. Close ResultSets:

    result = db.query("sql", "SELECT FROM LargeTable")
    try:
        for row in result:
            process(row)
    finally:
        result.close()
    


Data Type Issues

Problem: Type conversion errors

Solutions:

  1. Use Correct Types:

    # Integer
    vertex.set("age", 25)
    
    # String
    vertex.set("name", "Alice")
    
    # List
    vertex.set("tags", ["python", "database"])
    
    # DateTime
    from datetime import datetime
    vertex.set("created", datetime.now())
    

  2. Convert NumPy Arrays:

    from arcadedb_embedded import to_java_float_array
    import numpy as np
    
    arr = np.array([1.0, 2.0, 3.0], dtype=np.float32)
    vertex.set("embedding", to_java_float_array(arr))
    


Transaction Already Active

Symptom:

with db.transaction():
    with db.transaction():  # Nested!
        pass
# ArcadeDBError: Transaction already active

Cause: Nested transactions not supported.

Solution:

Don't nest transactions:

# Bad
with db.transaction():
    some_operation()
    with db.transaction():  # ✗ Error
        another_operation()

# Good
with db.transaction():
    some_operation()
    another_operation()

Or use separate transaction blocks:

with db.transaction():
    some_operation()

# First transaction committed

with db.transaction():
    another_operation()

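A reliable way to avoid accidental nesting is to keep transaction control at the call site and write helpers that never open a transaction of their own (a sketch built from the API calls used elsewhere on this page):

def add_user(db, name):
    # Assumes the caller already holds an open transaction
    vertex = db.new_vertex("User")
    vertex.set("name", name)
    vertex.save()

# One transaction at the top level; helpers stay nesting-free
with db.transaction():
    add_user(db, "Alice")
    add_user(db, "Bob")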

Query Syntax Error

Symptom:

db.query("sql", "SELECT * FROM User WHERE name = Alice")
# ArcadeDBError: Syntax error near 'Alice'

Cause: String not properly quoted.

Solution:

Use parameters (RECOMMENDED):

db.query("sql",
    "SELECT FROM User WHERE name = :name",
    {"name": "Alice"}
)

Or quote strings in SQL:

db.query("sql", "SELECT FROM User WHERE name = 'Alice'")
#                                              ↑    ↑ quotes


Function Name Errors

Problem: SQL function not recognized

Solutions:

  1. Check Function Name Case:

    # Wrong
    with db.transaction():
        db.command("sql", "INSERT INTO Product SET created = SYSDATE()")
    
    # Correct
    with db.transaction():
        db.command("sql", "INSERT INTO Product SET created = sysdate()")
    

  2. Use Built-in Functions:

    # Date/time
    with db.transaction():
        db.command("sql", "INSERT INTO Event SET timestamp = sysdate()")
    
    # UUID
    with db.transaction():
        db.command("sql", "INSERT INTO User SET id = uuid()")
    


Multi-line Query Issues

Problem: SQL parser errors with complex queries

Solution: Use single-line queries or proper escaping:

# ✅ Single line (wrap in a transaction when executing)
query = "INSERT INTO Product SET name = 'test', created_at = sysdate()"

# ✅ Multi-line with proper formatting
query = """
INSERT INTO Product SET
    name = 'test',
    created_at = sysdate()
""".strip()


Type Conversion Error

Symptom:

vertex.set("embedding", numpy_array)
# TypeError: Cannot convert numpy.ndarray to Java type

Cause: NumPy arrays need explicit conversion.

Solution:

Use conversion utilities:

from arcadedb_embedded import to_java_float_array
import numpy as np

embedding = np.array([1.0, 2.0, 3.0], dtype=np.float32)
vertex.set("embedding", to_java_float_array(embedding))

Performance Issues

Slow Queries

Symptom: Queries take seconds or minutes.

Diagnosis:

Use EXPLAIN to analyze:

result = db.query("sql", "EXPLAIN SELECT FROM User WHERE email = 'alice@example.com'")
for row in result:
    print(row.to_dict())

Solutions:

  1. Create indexes:

    # Schema API is auto-transactional (preferred for embedded use)
    db.schema.create_index("User", ["email"], unique=True)
    

  2. Use LIMIT:

    # Bad: Load everything
    result = db.query("sql", "SELECT FROM User")
    
    # Good: Limit results
    result = db.query("sql", "SELECT FROM User LIMIT 100")
    

  3. Project only needed fields:

    # Bad: Load all properties
    result = db.query("sql", "SELECT FROM User")
    
    # Good: Only needed fields
    result = db.query("sql", "SELECT name, email FROM User")
    


Slow Imports

Symptom: Importing data is very slow.

Solutions:

  1. Increase batch size (commitEvery):

    from arcadedb_embedded import Importer
    importer = Importer(db)
    stats = importer.import_file(
        file_path="users.csv",
        import_type="vertices",
        type_name="User",
        typeIdProperty="id",
        commitEvery=10000,  # Default is 5000
    )
    

  2. Drop indexes during import:

    # Drop indexes (Schema API preferred for embedded)
    db.schema.drop_index("User[email]", force=True)
    
    # Import data (vertices)
    stats = importer.import_file(
        file_path="users.csv",
        import_type="vertices",
        type_name="User",
        typeIdProperty="id",
    )
    
    # Recreate indexes
    db.schema.create_index("User", ["email"], unique=True)
    

  3. Use transactions efficiently:

    # Bad: Many small transactions
    for record in records:
        with db.transaction():
            vertex = db.new_vertex("Data")
            vertex.set("data", record)
            vertex.save()
    
    # Good: Batch in larger transactions
    batch_size = 10000
    for i in range(0, len(records), batch_size):
        with db.transaction():
            for record in records[i:i+batch_size]:
                vertex = db.new_vertex("Data")
                vertex.set("data", record)
                vertex.save()
    


High Memory Usage

Symptom: Process memory grows continuously.

Diagnosis:

Monitor memory:

import psutil
import os

process = psutil.Process(os.getpid())
print(f"Memory: {process.memory_info().rss / 1024 / 1024:.1f} MB")

Solutions:

  1. Stream large ResultSets:

    # Bad: Load all results
    result = db.query("sql", "SELECT FROM LargeTable")
    all_results = list(result)  # Loads everything!
    
    # Good: Process streaming
    result = db.query("sql", "SELECT FROM LargeTable")
    for row in result:
        process(row)
        # Only one row in memory
    

  2. Close ResultSets:

    result = db.query("sql", "SELECT FROM User")
    try:
        for row in result:
            if some_condition(row):
                break
    finally:
        # Iterating to exhaustion closes the ResultSet automatically;
        # close explicitly when you might break out early
        result.close()
    

  3. Force garbage collection:

    import gc
    
    for batch in large_dataset:
        process_batch(batch)
        gc.collect()  # Trigger GC
    

  4. Smaller transactions:

    # Bad: Huge transaction
    with db.transaction():
        for i in range(1000000):
            vertex = db.new_vertex("Data")
            vertex.save()
    
    # Good: Batch transactions
    batch_size = 10000
    for i in range(0, 1000000, batch_size):
        with db.transaction():
            for j in range(batch_size):
                vertex = db.new_vertex("Data")
                vertex.save()
    

Server Mode Issues

Server Won't Start

Symptom:

server = arcadedb.create_server("./databases")
server.start()
# ArcadeDBError: Unable to start server

Solutions:

  1. Check port availability:

    # Linux/macOS
    lsof -i :2480
    
    # Windows
    netstat -ano | findstr :2480
    

    Use a different port if 2480 is already taken:

    server = arcadedb.create_server(
        root_path="./databases",
        http_port=8080  # Different port
    )
    

  2. Check permissions:

    ls -la ./databases
    # Ensure write permissions
    chmod -R 755 ./databases
    

  3. Check logs:

    # Enable logging
    import logging
    logging.basicConfig(level=logging.DEBUG)
    
    server = arcadedb.create_server("./databases")
    server.start()
    # Check log output
    


Can't Connect to Server

Symptom: Server running but can't connect via HTTP.

Solutions:

  1. Verify server is running:

    if server.is_started():
        print("Server is running")
        print(f"URL: http://localhost:{server.http_port}")
    

  2. Check firewall:

    # Linux
    sudo ufw allow 2480
    
    # macOS
    # System Preferences > Security & Privacy > Firewall
    

  3. Test with curl:

    curl http://localhost:2480/api/v1/server
    

Vector Search Issues

Vector Dimension Mismatch

Symptom:

vertex.save()
# ArcadeDBError: Vector dimension mismatch

Cause: Embedding dimension doesn't match index dimension.

Solution:

Verify dimensions match:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

# Check model dimension
test_embedding = model.encode("test")
print(f"Model dimension: {len(test_embedding)}")  # 384

# Create index with matching dimension
index = db.create_vector_index(
    vertex_type="Document",
    vector_property="embedding",
    dimensions=384  # Must match!
)

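To catch mismatches before they reach the index, validate the embedding length at write time. A sketch, where EXPECTED_DIM stands for whatever dimensions= value your index was created with:

import numpy as np
from arcadedb_embedded import to_java_float_array

EXPECTED_DIM = 384  # Must match the dimensions= used at index creation

def set_embedding(vertex, embedding):
    arr = np.asarray(embedding, dtype=np.float32)
    if arr.shape != (EXPECTED_DIM,):
        raise ValueError(f"Expected {EXPECTED_DIM}-dim vector, got {arr.shape}")
    vertex.set("embedding", to_java_float_array(arr))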

Slow First Query

Symptom: The first vector search query takes significantly longer than subsequent queries.

Cause: The vector index is built lazily. The first query triggers the actual construction of the index ("warm up").

Solution: This is expected behavior. You can perform a "warm up" query during application startup if consistent query latency is required.

# Warm up index on startup
import numpy as np

print("Warming up vector index...")
index.find_nearest(np.zeros(384, dtype=np.float32), k=1)
print("Index ready")

Poor Search Results

Symptom: Vector search returns irrelevant results.

Solutions:

  1. Try different distance function:

    # Cosine (default, usually best for text)
    index = db.create_vector_index(
        vertex_type="Doc",
        vector_property="embedding",
        dimensions=384,
        distance_function="cosine"
    )
    
    # Euclidean (sometimes better for images)
    index = db.create_vector_index(
        vertex_type="Image",
        vector_property="features",
        dimensions=512,
        distance_function="euclidean"
    )
    

  2. Tune vector parameters:

    # Better recall, slower
    index = db.create_vector_index(
        vertex_type="Doc",
        vector_property="embedding",
        dimensions=384,
        max_connections=32,  # Default: 16
        beam_width=200       # Default: 100
    )
    

  3. Improve embeddings:

    # Combine title and content
    text = f"{doc['title']}. {doc['content']}"
    embedding = model.encode(text)
    
    # vs. just content
    embedding = model.encode(doc['content'])  # May be less effective
    

Debugging

Enable Logging

Python logging:

import logging

# Basic logging
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

# File logging
logging.basicConfig(
    level=logging.DEBUG,
    filename='arcadedb.log',
    filemode='w',
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

import arcadedb_embedded as arcadedb
# Now all operations will be logged

Java logging:

import jpype

# Enable Java logging before importing arcadedb
jpype.startJVM(
    "-Djava.util.logging.config.file=logging.properties",
    classpath=[...],
)

logging.properties:

.level=INFO
handlers=java.util.logging.ConsoleHandler
java.util.logging.ConsoleHandler.level=ALL
com.arcadedb.level=DEBUG


Inspect Java Objects

# Get Java class name
java_obj = vertex._java_vertex
print(java_obj.getClass().getName())

# List methods
for method in java_obj.getClass().getMethods():
    print(method.getName())

# Get property value (raw Java)
value = java_obj.get("property_name")
print(f"Type: {type(value)}, Value: {value}")

Transaction Debugging

class DebugTransaction:
    """Debug wrapper for transactions."""

    def __init__(self, db):
        self.db = db
        self.transaction = None

    def __enter__(self):
        print("Starting transaction")
        self.transaction = self.db.transaction()
        return self.transaction.__enter__()

    def __exit__(self, exc_type, exc_val, exc_tb):
        if exc_type:
            print(f"Transaction failed: {exc_type.__name__}: {exc_val}")
        else:
            print("Transaction committed")
        return self.transaction.__exit__(exc_type, exc_val, exc_tb)

# Usage
with DebugTransaction(db):
    vertex = db.new_vertex("User")
    vertex.set("name", "Alice")
    vertex.save()

Query Debugging

def debug_query(db, language, query, *args):
    """Execute query with debugging."""
    print(f"Query: {query}")
    if args:
        print(f"Params: {args}")

    try:
        result = db.query(language, query, *args)
        rows = list(result)
        print(f"Results: {len(rows)} rows")
        return rows
    except Exception as e:
        print(f"Error: {e}")
        raise

# Usage
results = debug_query(db, "sql", "SELECT FROM User WHERE name = :name", {"name": "Alice"})

Common Error Messages

"Property not found"

Meaning: Trying to get property that doesn't exist.

Solution:

# Check if property exists
if vertex.has_property("name"):
    name = vertex.get("name")
else:
    name = "Unknown"

# Or use default
name = vertex.get("name") or "Unknown"


"Type not found"

Meaning: Vertex/Edge type doesn't exist.

Solution:

# Create type first (Schema API is auto-transactional)
db.schema.get_or_create_vertex_type("User")

# Then create vertex
with db.transaction():
    vertex = db.new_vertex("User")


"Index already exists"

Meaning: Trying to create duplicate index.

Solution:

# Drop existing index
try:
    db.schema.drop_index("User[email]", force=True)
except Exception:
    pass  # Index doesn't exist

# Create new index
db.schema.create_index("User", ["email"], unique=True)


"Unique constraint violation"

Meaning: Trying to insert duplicate value for unique property.

Solution:

# Check if exists first
result = db.query("sql", "SELECT FROM User WHERE email = :email", {"email": "alice@example.com"})

if result.has_next():
    vertex = result.next()
    # Update existing (writes need a transaction too)
    with db.transaction():
        vertex.set("name", "Alice")
        vertex.save()
else:
    # Create new
    with db.transaction():
        vertex = db.new_vertex("User")
        vertex.set("email", "alice@example.com")
        vertex.set("name", "Alice")
        vertex.save()

Getting Help

  1. Check the documentation.

  2. Search existing issues.

  3. Report a bug. Include the following (a helper for gathering most of it is sketched below):

    • Python version (python --version)
    • Package version (uv pip show arcadedb-embedded)
    • Minimal reproducible example
    • Full error message with stack trace
    • Operating system
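
A small helper that gathers most of this automatically (importlib.metadata is in the standard library since Python 3.8):

import sys
import platform
from importlib.metadata import version, PackageNotFoundError

print(f"Python:  {sys.version}")
print(f"OS:      {platform.platform()}")
try:
    print(f"Package: arcadedb-embedded {version('arcadedb-embedded')}")
except PackageNotFoundError:
    print("Package: arcadedb-embedded not installed")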
