Vector Search Benchmark

Purpose:

compare the current HumemVector v0 execution candidates in one concrete scenario
measure exact NumPy search, scalar-int8 NumPy search, LanceDB flat search, and LanceDB indexed search
report setup costs, steady-state query latency, and recall

Representative command:

HUMEMDB_THREADS=8 python scripts/benchmarks/vector_search.py \
  --rows 100000 \
  --dimensions 384 \
  --queries 64 \
  --top-k 10 \
  --warmup 1 \
  --repetitions 3
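
The `--warmup` and `--repetitions` flags suggest a measure-after-warmup loop. The following is only a minimal sketch of that pattern, not the code in `scripts/benchmarks/vector_search.py`; the helper name `time_queries` and the `search_fn` callable are hypothetical.

```python
import time
from statistics import mean


def time_queries(search_fn, queries, warmup=1, repetitions=3):
    """Return the mean per-query latency in milliseconds.

    Hypothetical helper: assumes `search_fn(query)` runs one search.
    """
    # Warmup passes run the full query batch but are not timed, so that
    # caches and lazily initialized state do not distort the measurement.
    for _ in range(warmup):
        for query in queries:
            search_fn(query)

    per_query_ms = []
    for _ in range(repetitions):
        for query in queries:
            start = time.perf_counter()
            search_fn(query)
            per_query_ms.append((time.perf_counter() - start) * 1000.0)
    return mean(per_query_ms)
```

With `warmup=1` and `repetitions=3`, each query runs four times in total and only the last three runs contribute to the reported mean.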

Scenario:

100,000 vectors
384 dimensions
cosine metric
64 queries
top_k=10
filtered bucket candidate count: 733
LanceDB index type: IVF_PQ with library-default partition settings

Stage timings:

Stage	Time
SQLite seed	1243.06 ms
SQLite to NumPy load	662.98 ms
NumPy FP32 build	41.28 ms
NumPy SQ8 build	131.77 ms
LanceDB table create	2292.47 ms
LanceDB index build	12553.09 ms

Per-query latency and recall:

Path	Global mean	Filtered mean	Recall@k global	Recall@k filtered	Takeaway
NumPy FP32 exact	4.05 ms	0.07 ms	1.0000	1.0000	Current exact baseline.
NumPy SQ8	14.44 ms	0.07 ms	0.9859	0.9891	Saves memory, but this run did not turn that into a latency win.
LanceDB flat	38.77 ms	37.64 ms	1.0000	1.0000	Exact but much slower than in-memory NumPy here.
LanceDB indexed default	1.27 ms	2.18 ms	0.1594	0.2734	Lowest global latency, but recall is far below a high-recall target.
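
Recall@k in the table above compares each path's result ids against the exact top-k ids. A minimal sketch of that metric follows; the function names are hypothetical and the benchmark script may compute it differently.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k ids that the approximate result also returned."""
    exact = set(exact_ids)
    if not exact:
        return 0.0
    return len(exact.intersection(approx_ids)) / len(exact)


def mean_recall_at_k(approx_batches, exact_batches):
    """Average recall@k over a batch of queries."""
    pairs = list(zip(approx_batches, exact_batches))
    return sum(recall_at_k(approx, exact) for approx, exact in pairs) / len(pairs)
```

As a consistency check on the table, 64 queries at top_k=10 give 640 result slots, and 102/640 = 0.159375 would correspond to the 0.1594 global recall reported for the default LanceDB index.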

Artifact sizes:

Artifact	Size
NumPy FP32 matrix	153,600,000 bytes
NumPy SQ8 quantized data	38,400,000 bytes
NumPy SQ8 scales	1,536 bytes
Query batch FP32	98,304 bytes
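
The sizes above follow directly from the scenario parameters. The arithmetic below reproduces them, assuming 4 bytes per FP32 value, 1 byte per int8 value, and one FP32 scale per dimension (an assumption that matches the 1,536-byte figure).

```python
rows, dimensions, queries = 100_000, 384, 64
FP32_BYTES, INT8_BYTES = 4, 1

fp32_matrix = rows * dimensions * FP32_BYTES   # 153,600,000 bytes
sq8_data = rows * dimensions * INT8_BYTES      # 38,400,000 bytes
sq8_scales = dimensions * FP32_BYTES           # 1,536 bytes (assumed: one scale per dimension)
query_batch = queries * dimensions * FP32_BYTES  # 98,304 bytes

print(f"FP32 matrix: {fp32_matrix:,} bytes")
print(f"SQ8 data:    {sq8_data:,} bytes")
print(f"SQ8 scales:  {sq8_scales:,} bytes")
print(f"Query batch: {query_batch:,} bytes")
```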

Current interpretation:

NumPy FP32 exact remains the strongest baseline in this representative mid-size case
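NumPy SQ8 cuts the stored matrix from 153,600,000 bytes to about 38,400,000 bytes (roughly 4x), but the current implementation does not turn that into a speed win: global mean latency is 14.44 ms versus 4.05 ms for NumPy FP32 exact
default LanceDB indexed search beats NumPy exact on global latency (1.27 ms versus 4.05 ms) but not on filtered latency (2.18 ms versus 0.07 ms), and its recall (0.1594 global, 0.2734 filtered) is far below an acceptable high-recall default for this query shape
the sweep and tuning benchmarks are the better place to decide routing thresholds, because a single run is not enough to define a crossover policy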
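
To make the SQ8 memory trade-off concrete, here is one common form of scalar int8 quantization. It is a sketch under assumptions, not the HumemVector code: the scheme (symmetric, one FP32 scale per dimension) and the function names are hypothetical, chosen only because a per-dimension scale array matches the 1,536-byte scales artifact.

```python
import numpy as np


def quantize_sq8(matrix):
    """Quantize an FP32 matrix to int8 codes with one scale per dimension.

    Assumed scheme: symmetric quantization, scale = max |value| / 127.
    """
    max_abs = np.abs(matrix).max(axis=0)
    scales = (np.maximum(max_abs, 1e-12) / 127.0).astype(np.float32)
    codes = np.clip(np.rint(matrix / scales), -127, 127).astype(np.int8)
    return codes, scales


def dequantize_sq8(codes, scales):
    """Reconstruct approximate FP32 values from int8 codes."""
    return codes.astype(np.float32) * scales
```

The codes occupy one byte per value instead of four, which gives the roughly 4x reduction. Whether scoring against them is faster depends on how the search kernel handles the int8 data, which this sketch does not address.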