Vector Search Tests (JVector / LSM)
The test suite exercises the current Java-native JVector + LSM vector index used by ArcadeDB (no Python hnswlib dependency). All tests run through the Python bindings.
Overview
What the tests cover:
- ✅ HNSW (JVector)/LSM index creation via SQL and Python helper coverage
- ✅ Nearest-neighbor search via SQL and embedded helper coverage
- ✅ RID filtering using allowed_rids
- ✅ Exact-search beam tuning (ef_search)
- ✅ Distance functions (cosine default, euclidean variants)
- ✅ Persistence & size checks (index files survive reopen)
- ✅ Chunked inserts via explicit transactions (preferred for embedded)
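A common way to sanity-check nearest-neighbor and filtering results on small data is to compare them against an exact brute-force scan. The sketch below is pure Python and independent of ArcadeDB; `brute_force_neighbors` and its `allowed_ids` parameter are hypothetical names that mirror the allowed_rids idea, and it is not necessarily how the suite itself verifies results.

```python
import math


def brute_force_neighbors(vectors, query, k, allowed_ids=None, distance=math.dist):
    """Exact top-k over a dict of id -> vector (illustrative oracle only).

    allowed_ids is a hypothetical filter playing the role of an RID allow-list.
    distance defaults to Euclidean distance (math.dist, Python 3.8+).
    """
    candidates = [
        (distance(vec, query), vec_id)
        for vec_id, vec in vectors.items()
        if allowed_ids is None or vec_id in allowed_ids
    ]
    candidates.sort()  # smallest distance first
    return [vec_id for _, vec_id in candidates[:k]]


vectors = {"a": [1.0, 0.0, 0.0], "b": [0.0, 1.0, 0.0], "c": [0.9, 0.1, 0.0]}
query = [1.0, 0.0, 0.0]

unfiltered = brute_force_neighbors(vectors, query, k=2)
filtered = brute_force_neighbors(vectors, query, k=2, allowed_ids={"b", "c"})
print(unfiltered)  # ['a', 'c']
print(filtered)    # ['c', 'b']
```

On a handful of vectors, an approximate index is generally expected to agree with such an oracle, which makes mismatches easy to spot.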
Test Coverage (high level)
- test_create_vector_index – covers the Python helper surface for vector index creation
- test_lsm_vector_search – basic nearest-neighbor search through the embedded helper
- test_lsm_vector_search_with_filter – allowed_rids filtering
- test_lsm_vector_delete_and_search_others – deletes vertices, ensures others are still found
- test_lsm_vector_search_ef_search – adjusts ef_search
- test_get_vector_index_lsm – fetches index metadata
- test_lsm_index_size – asserts index file presence/size
- test_lsm_persistence – reopen DB and reuse the index
- Distance suites – cosine/euclidean correctness for orthogonal/parallel/opposite/high-dim vectors
- test_lsm_vector_search_comprehensive – end-to-end search path
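The geometric cases named in the distance suites have simple closed-form expectations. The following is a minimal pure-Python sketch of those expectations, not the suite's actual code; `cosine_similarity` is a local helper defined here, and how ArcadeDB converts similarity into a reported score is not assumed.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


base = [1.0, 0.0]
cases = {
    "parallel": [2.0, 0.0],    # same direction, different length
    "orthogonal": [0.0, 3.0],  # 90 degrees apart
    "opposite": [-1.0, 0.0],   # 180 degrees apart
}

for name, other in cases.items():
    print(name, cosine_similarity(base, other), math.dist(base, other))

# High-dimensional check: a vector compared with itself has similarity 1.
high_dim = [float(i + 1) for i in range(384)]
print(math.isclose(cosine_similarity(high_dim, high_dim), 1.0))
```

Parallel vectors give similarity 1, orthogonal give 0, and opposite give -1, while Euclidean distance also depends on vector length (1.0, about 3.162, and 2.0 here).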
SQL Vector Functions Tests
SQL vector operations are tested separately in test_vector_sql.py, including vector math functions, distance calculations, aggregations, quantization (with known limitations), and SQL-based index creation and search.
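Common Patterns
Create JVector (LSM-backed) index
import arcadedb_embedded as arcadedb  # assumed import alias for the Python bindings

with arcadedb.create_database("./test_db") as db:
    # Schema: a vertex type with a float-array property to hold embeddings
    db.command("sql", "CREATE VERTEX TYPE Doc")
    db.command("sql", "CREATE PROPERTY Doc.embedding ARRAY_OF_FLOATS")
    # Vector index on the embedding property, configured via METADATA
    db.command(
        "sql",
        '''
        CREATE INDEX ON Doc (embedding)
        LSM_VECTOR
        METADATA {
            "dimensions": 384,
            "similarity": "COSINE",
            "maxConnections": 16,
            "beamWidth": 100
        }
        ''',
    )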
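When several tests need indexes with different settings, the statement can be assembled from parameters. This is a rough sketch: `build_lsm_vector_index_sql` is a hypothetical helper, not part of the bindings, and its default values simply echo the example above rather than any documented library defaults.

```python
import json


def build_lsm_vector_index_sql(
    type_name,
    prop,
    dimensions,
    similarity="COSINE",
    max_connections=16,
    beam_width=100,
):
    """Build a CREATE INDEX ... LSM_VECTOR statement (hypothetical helper).

    The metadata keys mirror the ones used in the example above.
    """
    metadata = {
        "dimensions": dimensions,
        "similarity": similarity,
        "maxConnections": max_connections,
        "beamWidth": beam_width,
    }
    return (
        f"CREATE INDEX ON {type_name} ({prop}) LSM_VECTOR "
        f"METADATA {json.dumps(metadata)}"
    )


sql = build_lsm_vector_index_sql("Doc", "embedding", dimensions=384)
print(sql)
```

The resulting string could then be passed to `db.command("sql", sql)`. Using `json.dumps` keeps the metadata well-formed even when values vary between tests.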
Search with filters and ef_search
import arcadedb_embedded as arcadedb  # assumed import alias for the Python bindings

with arcadedb.create_database("./test_db") as db:
    db.command("sql", "CREATE VERTEX TYPE Doc")
    db.command("sql", "CREATE PROPERTY Doc.docId INTEGER")
    db.command("sql", "CREATE PROPERTY Doc.embedding ARRAY_OF_FLOATS")
    db.command(
        "sql",
        'CREATE INDEX ON Doc (embedding) LSM_VECTOR METADATA {"dimensions": 3}',
    )

    # Insert test vertices with embeddings
    with db.transaction():
        doc1 = db.new_vertex("Doc", docId=1, embedding=[1.0, 0.0, 0.0])
        doc1.save()
        doc2 = db.new_vertex("Doc", docId=2, embedding=[0.0, 1.0, 0.0])
        doc2.save()

    # Search with filters: restrict results to an explicit list of RIDs
    query = [1.0, 0.0, 0.0]
    allowed_rids_sql = f"['{doc1.get_rid()}', '{doc2.get_rid()}']"
    query_literal = "[" + ", ".join(str(float(v)) for v in query) + "]"
    # Here k = 2 neighbors; the trailing 100 is the ef_search value
    results = db.query(
        "sql",
        (
            "SELECT expand(vectorNeighbors('Doc[embedding]', "
            f"{query_literal}, 2, 100)) WHERE @rid IN {allowed_rids_sql}"
        ),
    ).to_list()
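The string assembly in this pattern can be factored into a small builder so that the vector literal and RID list are formatted consistently. This is only a sketch: `vector_literal` and `build_neighbors_query` are hypothetical helpers, the RIDs shown are placeholders, and the fourth argument is treated as ef_search as in the snippet above.

```python
def vector_literal(values):
    """Format numbers as a SQL array literal of floats, e.g. [1.0, 0.0]."""
    return "[" + ", ".join(str(float(v)) for v in values) + "]"


def build_neighbors_query(index_ref, query, k, ef_search, allowed_rids=None):
    """Build a vectorNeighbors query string (hypothetical helper).

    allowed_rids, when given, becomes a WHERE @rid IN [...] filter.
    """
    sql = (
        f"SELECT expand(vectorNeighbors('{index_ref}', "
        f"{vector_literal(query)}, {k}, {ef_search}))"
    )
    if allowed_rids:
        rid_list = ", ".join(f"'{rid}'" for rid in allowed_rids)
        sql += f" WHERE @rid IN [{rid_list}]"
    return sql


# Placeholder RIDs for illustration; real ones come from get_rid().
filtered_sql = build_neighbors_query(
    "Doc[embedding]", [1, 0, 0], k=2, ef_search=100, allowed_rids=["#1:0", "#1:1"]
)
print(filtered_sql)
```

Plain string interpolation is acceptable for tests with trusted inputs; it should not be used with untrusted values.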
Chunked insert vectors (preferred)
import arcadedb_embedded as arcadedb  # assumed import alias for the Python bindings

with arcadedb.create_database("./test_db") as db:
    db.command("sql", "CREATE VERTEX TYPE Doc")
    db.command("sql", "CREATE PROPERTY Doc.docId INTEGER")
    db.command("sql", "CREATE PROPERTY Doc.embedding ARRAY_OF_FLOATS")

    # Prefer chunked transactions for embedded (avoids batch_context overhead)
    vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    chunk_size = 100
    # One transaction per chunk; docId is the vector's global position
    for start in range(0, len(vectors), chunk_size):
        with db.transaction():
            for idx, vec in enumerate(vectors[start : start + chunk_size]):
                doc = db.new_vertex("Doc")
                doc.set("docId", start + idx)
                doc.set("embedding", vec)
                doc.save()
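The chunk boundaries and the `start + idx` numbering can be checked without a database. The sketch below isolates the slicing into a hypothetical `chunked` generator and records the IDs that would be assigned; in the real loop each chunk would be wrapped in `with db.transaction():`.

```python
def chunked(items, chunk_size):
    """Yield (start_offset, chunk) pairs that cover items in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(items), chunk_size):
        yield start, items[start : start + chunk_size]


vectors = [[float(i), 0.0, 0.0] for i in range(5)]

doc_ids = []
chunk_lengths = []
for start, chunk in chunked(vectors, chunk_size=2):
    # Stand-in for the transaction body: compute the docId for each vector.
    chunk_lengths.append(len(chunk))
    doc_ids.extend(start + idx for idx, _ in enumerate(chunk))

print(chunk_lengths)  # [2, 2, 1]
print(doc_ids)        # [0, 1, 2, 3, 4]
```

The last chunk is shorter when the input length is not a multiple of `chunk_size`, and the IDs stay contiguous across chunk boundaries.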
Key Takeaways
- JVector is fully Java-native and LSM-backed; no legacy hnswlib path remains.
- max_connections and beam_width map to JVector graph degree and search beam; tune per workload.
- Prefer chunked db.transaction() inserts for embedded workloads rather than a separate batching abstraction.
See Also
- Vector API – Full Python API reference
- NumPy Tests – NumPy integration
- Example 03: Vector Search – End-to-end usage
- Example 06: Movie Recommendations – Vector-powered recommender
- Vector Guide – Concepts and tuning