Vector Search Tests (JVector / LSM)
This test suite exercises ArcadeDB's Java-native, LSM-backed JVector vector index. There is no Python hnswlib dependency, and all tests run through the Python bindings.
Overview
What the tests cover:
- ✅ HNSW (JVector)/LSM index creation via `create_vector_index`
- ✅ Nearest-neighbor search with `find_nearest`
- ✅ RID filtering using `allowed_rids`
- ✅ Overquery factor tuning (`overquery_factor`)
- ✅ Distance functions (cosine default, euclidean variants)
- ✅ Persistence & size checks (index files survive reopen)
- ✅ Chunked inserts via explicit transactions (preferred for embedded)
Test Coverage (high level)
- `test_create_vector_index` – creates an HNSW (JVector)/LSM index and verifies the schema listing
- `test_lsm_vector_search` – basic nearest-neighbor search
- `test_lsm_vector_search_with_filter` – `allowed_rids` filtering
- `test_lsm_vector_delete_and_search_others` – deletes vertices and ensures the others are still found
- `test_lsm_vector_search_overquery` – adjusts `overquery_factor`
- `test_get_vector_index_lsm` – fetches index metadata
- `test_lsm_index_size` – asserts index file presence/size
- `test_lsm_persistence` – reopens the DB and reuses the index
- Distance suites – cosine/euclidean correctness for orthogonal, parallel, opposite, and high-dimensional vectors
- `test_lsm_vector_search_comprehensive` – end-to-end search path
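The distance suites check a handful of geometric identities. The pure-Python sketch below shows those identities under the textbook definitions: cosine distance as 1 - cos(theta), and plain and squared Euclidean distance as the assumed "euclidean variants". The helper names are hypothetical, not part of the bindings, and the exact values ArcadeDB/JVector report may be scaled differently, so read these as expectations about ordering rather than the library's numbers.

```python
import math


def cosine_distance(a, b):
    """Textbook cosine distance, 1 - cos(theta); ranges from 0 (parallel) to 2 (opposite)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def squared_euclidean_distance(a, b):
    # Same ranking as euclidean_distance, without the square root.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean_distance(a, b):
    return math.sqrt(squared_euclidean_distance(a, b))


x = [1.0, 0.0, 0.0]
cases = {
    "parallel": [2.0, 0.0, 0.0],  # same direction, different magnitude
    "orthogonal": [0.0, 1.0, 0.0],
    "opposite": [-1.0, 0.0, 0.0],
}
for name, y in cases.items():
    print(f"{name:10s} cosine={cosine_distance(x, y):.3f} euclidean={euclidean_distance(x, y):.3f}")

# High-dimensional check (384 dims, as in the index example): scaling a vector
# leaves the cosine distance at ~0 but changes the Euclidean distance.
v = [math.sin(i) for i in range(384)]
w = [3.0 * t for t in v]
print(f"384-d scaled  cosine={cosine_distance(v, w):.3f} euclidean={euclidean_distance(v, w):.3f}")
```

Cosine ignores magnitude, so the parallel pair has cosine distance 0 even though its Euclidean distance is 1; this is why the default cosine metric and the Euclidean variants can rank the same vectors differently.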
SQL Vector Functions Tests
SQL vector operations are tested separately in `test_vector_sql.py`, covering vector math functions, distance calculations, aggregations, quantization (with known limitations), and SQL-based index creation and search.
Common Patterns
Create JVector (LSM-backed) index
```python
import arcadedb_embedded as arcadedb

with arcadedb.create_database("./test_db") as db:
    db.command("sql", "CREATE VERTEX TYPE Doc")
    db.command("sql", "CREATE PROPERTY Doc.embedding ARRAY_OF_FLOATS")
    index = db.create_vector_index(
        "Doc",
        "embedding",
        dimensions=384,
        distance_function="cosine",  # default
        max_connections=16,          # graph degree (default)
        beam_width=100,              # search/construction beam (default)
    )
```
Search with filters and overquery factor
```python
import arcadedb_embedded as arcadedb

with arcadedb.create_database("./test_db") as db:
    db.command("sql", "CREATE VERTEX TYPE Doc")
    db.command("sql", "CREATE PROPERTY Doc.embedding ARRAY_OF_FLOATS")
    index = db.create_vector_index(
        "Doc",
        "embedding",
        dimensions=3,
    )

    # Insert test vertices with embeddings
    with db.transaction():
        doc1 = db.new_vertex("Doc", docId=1, embedding=[1.0, 0.0, 0.0])
        doc1.save()
        doc2 = db.new_vertex("Doc", docId=2, embedding=[0.0, 1.0, 0.0])
        doc2.save()

    # Search restricted to the allowed RIDs, over-fetching candidates by 4x
    query = [1.0, 0.0, 0.0]
    results = index.find_nearest(
        query,
        k=2,
        allowed_rids=[doc1.get_rid(), doc2.get_rid()],
        overquery_factor=4,
    )
```
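To make the `overquery_factor` trade-off concrete, here is a pure-Python model that treats filtering as a post-filter over an over-fetched candidate list. This is an assumption for illustration: the helpers `top_candidates` and `find_nearest_filtered` are hypothetical, the vector index is replaced by brute force, and whether ArcadeDB applies `allowed_rids` during or after the JVector graph traversal is not specified here. The RID strings only mimic ArcadeDB's `#bucket:position` format.

```python
import math
from typing import Dict, List, Optional, Sequence, Set, Tuple

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    # Textbook cosine distance: 1 - cos(theta)
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


def top_candidates(store: Dict[str, Vector], query: Vector, n: int) -> List[str]:
    # Stand-in for the vector index: the n closest RIDs by brute force.
    # (Exact here; a real graph index returns an approximate list.)
    return sorted(store, key=lambda rid: cosine_distance(query, store[rid]))[:n]


def find_nearest_filtered(
    store: Dict[str, Vector],
    query: Vector,
    k: int,
    allowed_rids: Optional[Set[str]] = None,
    overquery_factor: int = 1,
) -> List[Tuple[str, float]]:
    # 1. Over-fetch k * overquery_factor candidates.
    candidates = top_candidates(store, query, k * overquery_factor)
    # 2. Drop candidates that are not on the allow-list.
    if allowed_rids is not None:
        candidates = [rid for rid in candidates if rid in allowed_rids]
    # 3. Return the best k survivors with their distances.
    return [(rid, cosine_distance(query, store[rid])) for rid in candidates[:k]]


store = {
    "#1:0": [1.0, 0.0, 0.0],
    "#1:1": [0.9, 0.1, 0.0],
    "#1:2": [0.8, 0.2, 0.0],
    "#1:3": [0.0, 1.0, 0.0],
    "#1:4": [0.0, 0.0, 1.0],
}
query = [1.0, 0.0, 0.0]
allowed = {"#1:3", "#1:4"}

for factor in (1, 4):
    hits = find_nearest_filtered(store, query, k=2, allowed_rids=allowed, overquery_factor=factor)
    print(f"overquery_factor={factor}: {hits}")
```

With a factor of 1, the two nearest candidates are both outside the allow-list, so nothing survives the filter. With a factor of 4, the wider candidate pool contains the allowed RIDs. The price is more candidates scored per query, which is the recall/speed trade-off.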
Chunked vector inserts (preferred)
```python
import arcadedb_embedded as arcadedb

with arcadedb.create_database("./test_db") as db:
    db.command("sql", "CREATE VERTEX TYPE Doc")
    db.command("sql", "CREATE PROPERTY Doc.docId INTEGER")
    db.command("sql", "CREATE PROPERTY Doc.embedding ARRAY_OF_FLOATS")

    # Prefer chunked transactions for embedded use (avoids batch_context overhead)
    vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    chunk_size = 100
    for start in range(0, len(vectors), chunk_size):
        # One transaction per chunk of at most chunk_size vertices
        with db.transaction():
            for idx, vec in enumerate(vectors[start : start + chunk_size]):
                doc = db.new_vertex("Doc")
                doc.set("docId", start + idx)  # global position in `vectors`
                doc.set("embedding", vec)
                doc.save()
```
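The chunk arithmetic in the insert loop can be checked without a database. This pure-Python sketch uses a hypothetical `iter_chunks` helper to show that `start + idx` yields contiguous, non-overlapping `docId` values across chunks and that the final chunk may be partial. In the embedded pattern, each yielded chunk would correspond to one `db.transaction()` block.

```python
from typing import Iterator, Sequence, Tuple, TypeVar

T = TypeVar("T")


def iter_chunks(items: Sequence[T], chunk_size: int) -> Iterator[Tuple[int, Sequence[T]]]:
    """Yield (offset, chunk) pairs, where offset is the index of chunk[0] in items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(items), chunk_size):
        yield start, items[start : start + chunk_size]


vectors = [[float(i), 0.0, 0.0] for i in range(250)]
for offset, chunk in iter_chunks(vectors, 100):
    # Each iteration stands in for one transaction; docIds are offset + position in chunk.
    doc_ids = [offset + i for i in range(len(chunk))]
    print(f"chunk at offset {offset}: {len(chunk)} vectors, docIds {doc_ids[0]}..{doc_ids[-1]}")
```

With 250 vectors and a chunk size of 100, this produces chunks of 100, 100, and 50, so no transaction ever holds more than `chunk_size` new vertices.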
Key Takeaways
- JVector is fully Java-native and LSM-backed; no legacy hnswlib path remains.
- Use `allowed_rids` for pre-filtered searches and `overquery_factor` for recall/speed trade-offs.
- `max_connections` and `beam_width` map to JVector's graph degree and search beam; tune them per workload.
- Prefer chunked `db.transaction()` inserts for embedded workloads; reserve `batch_context` for legacy code or tests that explicitly need it.
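To illustrate what a graph degree and a search beam control, the sketch below builds a flat k-nearest-neighbor graph whose out-degree is capped at `max_connections`, then runs a best-first beam search (the search loop used by HNSW-style indexes) whose result list is capped at `beam_width`. It is pure Python and uses Euclidean distance for simplicity. It is an illustration under stated assumptions: JVector's own graph construction and neighbor pruning are not reproduced, and `build_knn_graph` and `beam_search` are hypothetical names.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]


def build_knn_graph(points: List[Point], max_connections: int) -> Dict[int, List[int]]:
    # Link each node to its max_connections nearest other nodes (bounded out-degree).
    graph = {}
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        graph[i] = others[:max_connections]
    return graph


def beam_search(points: List[Point], graph: Dict[int, List[int]], query: Point,
                entry: int, k: int, beam_width: int) -> Tuple[List[int], int]:
    """Best-first search keeping the beam_width best nodes seen so far.

    Returns the k closest node ids found and the number of distance evaluations.
    """
    visited = {entry}
    d0 = math.dist(query, points[entry])
    candidates = [(d0, entry)]  # min-heap of nodes still to expand
    beam = [(-d0, entry)]       # max-heap (negated distances) of the current best nodes
    while candidates:
        d, node = heapq.heappop(candidates)
        if len(beam) >= beam_width and d > -beam[0][0]:
            break  # nearest unexpanded node is worse than everything in a full beam
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = math.dist(query, points[nb])
            if len(beam) < beam_width or dn < -beam[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(beam, (-dn, nb))
                if len(beam) > beam_width:
                    heapq.heappop(beam)  # evict the current worst node
    ranked = sorted((-neg_d, n) for neg_d, n in beam)
    return [n for _, n in ranked[:k]], len(visited)


points = [(float(x), float(y)) for x in range(10) for y in range(10)]  # 10x10 grid, id = 10*x + y
query = (3.2, 7.9)
graph = build_knn_graph(points, max_connections=8)
found, evals = beam_search(points, graph, query, entry=0, k=3, beam_width=10)
exact = sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))[:3]
print(f"beam search: {found} after {evals} distance evaluations; exact: {exact}")
```

As a general property of graph-based ANN search (not measured ArcadeDB behavior), a wider beam keeps more candidates alive, so the search evaluates more nodes in exchange for higher recall. A larger degree gives each expansion more edges to follow, at the cost of memory and build time.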
See Also
- Vector API – Full Python API reference
- NumPy Tests – NumPy integration
- Example 03: Vector Search – End-to-end usage
- Example 06: Movie Recommendations – Vector-powered recommender
- Vector Guide – Concepts and tuning