Relational Direct-Read Benchmark¶

Purpose:

compare SQLite and DuckDB across a broader relational workload mix over the SQLite source of truth
cover OLTP-style event reads, analytical aggregates, document-tag joins, and memory-style rollups

Representative command:

HUMEMDB_THREADS=8 python scripts/benchmarks/duckdb_direct_read.py \
  --rows 10000000 \
  --repetitions 5 \
  --warmup 1 \
  --batch-size 50000 \
  --users 200000 \
  --tags 2048

Dataset:

200,000 users
10,000,000 event rows
500,000 documents
2,048 tags
1,500,000 document-tag rows
1,000,000 memory chunks
1 warmup iteration and 5 timed repetitions per query shape
initial load time: 22266.28 ms

Observed means:

Query shape	SQLite mean	DuckDB mean	Takeaway
`event_point_lookup`	0.01 ms	973.49 ms	SQLite is vastly better for indexed point reads against the canonical store.
`event_filtered_range`	8.37 ms	439.16 ms	SQLite stays much better for selective OLTP-style filters.
`event_aggregate_topk`	4646.88 ms	523.44 ms	DuckDB is about 8.9x faster on broad scan-and-group workloads.
`event_region_join`	4668.97 ms	528.00 ms	DuckDB is about 8.8x faster on analytical join aggregation.
`document_tag_rollup`	1.29 ms	107.13 ms	A selective indexed document join still strongly favors SQLite.
`memory_hot_rollup`	311.48 ms	60.78 ms	DuckDB is about 5.1x faster on broader grouped rollups over the memory-style table.

Current interpretation:

SQLite remains the clear default for point lookups, selective filters, and selective indexed joins, even when the overall dataset is large
DuckDB wins once the workload becomes genuinely analytical: broad scans, grouping, and large aggregation over many rows
the richer SQL suite makes an important point that the older benchmark could not: not every join is analytical, and join shape plus selectivity matter more than the mere presence of joins