Benchmarks¶
CypherGlot has benchmark entrypoints for compiler, runtime, and schema experiments, and they answer different questions:
- scripts/benchmarks/schema/sqlite_shapes.py compares alternative SQLite storage schemas on the same synthetic graph workload.
- scripts/benchmarks/compiler/benchmark.py measures compiler-stage and compiler-entrypoint latency.
- scripts/benchmarks/runtime/sqlite.py measures SQLite-backed compile-plus-execute runtime cost over the graph-to-table schema contract.
- scripts/benchmarks/runtime/duckdb.py measures DuckDB-backed OLTP and OLAP runtime over the same synthetic graph contract.
- scripts/benchmarks/runtime/postgresql.py measures PostgreSQL-backed compile-plus-execute runtime cost over the same contract.
- scripts/benchmarks/runtime/ladybug.py measures LadybugDB-backed direct Cypher runtime over the same synthetic graph contract.
This page documents them separately so each benchmark path has its own scope, inputs, commands, and output model.
Schema benchmark¶
Script:
scripts/benchmarks/schema/sqlite_shapes.py
Supporting files:
- scripts/benchmarks/results/schema/sqlite_schema_shape_benchmark.json
- the checked-in repeated-run schema summary Markdown artifact under scripts/benchmarks/results/
Schema scope¶
This harness is for physical-schema experiments inside SQLite. It does not run CypherGlot compilation. Instead, it builds the same synthetic graph into three different SQLite layouts and benchmarks a representative set of direct SQL query shapes against each layout:
- generic compatibility nodes and edges
- generic typed-property tables
- type-aware per-node-type and per-edge-type tables
The goal is to compare setup cost, database size, point reads, ordered top-k reads, one-hop adjacency reads, multi-hop traversals, relationship aggregates, and relationship-heavy projections under the same generated graph.
The default single-run schema benchmark is intentionally broader than the runtime harness. It uses:
- 10 node types
- 10 edge types
- 5000 nodes per node type
- 4 outgoing edges per source node for each edge type
- a 5-hop traversal query
- 10 numeric node properties per type
The scale is configurable with:
- --node-type-count
- --edge-type-count
- --nodes-per-type
- --edges-per-source
- --multi-hop-length
- --node-numeric-property-count
- --node-text-property-count
- --node-boolean-property-count
- --edge-numeric-property-count
- --edge-text-property-count
- --edge-boolean-property-count
Schema commands¶
From the repo root:
By default, the schema benchmark runs all three layouts: json, typed, and
typeaware. Use repeated --schema flags only when you want to restrict the
comparison to a subset.
Schema matrix runner¶
Script:
scripts/benchmarks/schema/matrix.py
The leaf schema benchmark remains a single-run benchmark. Use the schema matrix runner when you want repeated runs, worker-level parallelism, per-run logs, and stable paper-style summaries across fresh process starts.
Each queued job writes:
- its benchmark JSON into scripts/benchmarks/results/schema/
- a per-job log file plus a manifest into scripts/benchmarks/results/schema-matrix/<run-stamp>/
The schema matrix runner uses three named presets:
- small: 4 node types, 4 edge types, 1000 nodes per node type, 3 outgoing edges per source
- medium: 6 node types, 8 edge types, 100000 nodes per node type, 4 outgoing edges per source
- large: 10 node types, 10 edge types, 1000000 nodes per node type, 8 outgoing edges per source
These presets now match the runtime matrix dataset sizes directly.
Suggested repeated-run commands with explicit methodology:
python -m scripts.benchmarks.schema.matrix \
--scale small \
--workers 3 \
--repeats 3 \
--iterations 10000 \
--warmup 200
python -m scripts.benchmarks.schema.matrix \
--scale medium \
--workers 3 \
--repeats 3 \
--iterations 2000 \
--warmup 50
python -m scripts.benchmarks.schema.matrix \
--scale large \
--workers 3 \
--repeats 3 \
--iterations 500 \
--warmup 10
These commands intentionally leave out --schema, so each run compares all
three layouts. They also leave the preset batch size unchanged. Small is still
sampled most heavily; medium and large trade some inner-loop sampling for
lower wall-clock cost while keeping repeats=3. The commands keep --workers 3 at
every scale; lower the worker count on larger scales if machine-level
contention during the heaviest ingest and query phases becomes a problem.
Schema result summarizer¶
Script:
scripts/benchmarks/schema/summarize_results.py
This summarizer scans repeated schema benchmark JSON files, groups runs that share the same benchmark configuration, and emits Markdown tables with repeat-level means and sample standard deviations.
The repeated-run summary now reports mean/std across runs for:
- setup timings
- RSS checkpoints
- database size
- pooled execute mean, p50, p95, and p99
- per-query mean, p50, p95, and p99
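The grouping and aggregation described above can be sketched roughly as follows. This is a minimal illustration, not the summarizer's actual code: the record layout (a config dict plus a metrics dict) and the function name summarize_runs are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_runs(runs):
    """Group runs by configuration and return {config_key: {metric: (mean, std)}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for run in runs:
        # Hypothetical record layout: a flat config dict plus a flat metrics dict.
        key = tuple(sorted(run["config"].items()))
        for name, value in run["metrics"].items():
            grouped[key][name].append(value)

    summary = {}
    for key, metrics in grouped.items():
        summary[key] = {
            # Sample standard deviation needs at least two runs; fall back to 0.0.
            name: (mean(values), stdev(values) if len(values) > 1 else 0.0)
            for name, values in metrics.items()
        }
    return summary


# Three repeats of one (made-up) configuration.
runs = [
    {"config": {"scale": "small", "schema": "typeaware"}, "metrics": {"p50_ms": 1.0}},
    {"config": {"scale": "small", "schema": "typeaware"}, "metrics": {"p50_ms": 2.0}},
    {"config": {"scale": "small", "schema": "typeaware"}, "metrics": {"p50_ms": 3.0}},
]
summary = summarize_runs(runs)
```

Using the sample (n - 1) standard deviation matters with only three repeats, because the population formula would understate the run-to-run spread.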
Each grouped Markdown section also prints the dataset shape for that benchmark configuration, including the node/edge type counts, nodes per type, edges per source, multi-hop length, total node/edge counts, and per-entity property counts.
The query sections are also split into lightweight workload groupings using the existing schema query set:
- OLTP-leaning: point reads, ordered top-k, and one-hop adjacency reads
- OLAP-leaning: multi-hop traversal, relationship aggregate, and relationship projection queries
Schema output and current evidence¶
The single-run schema-shape entrypoint still defaults to
scripts/benchmarks/results/schema/sqlite_schema_shape_benchmark.json, but the
current repository-level evidence is the checked-in repeated-run schema summary
Markdown artifact under scripts/benchmarks/results/.
The single-run JSON records:
- benchmark entrypoint and run status metadata
- benchmark controls such as iterations, warmup, batch size, and selected schemas
- environment metadata
- the generated graph scale and property counts
- the synthetic edge-type routing plan
- per-schema setup timings, RSS snapshots, and database size
- per-schema row counts
- pooled execute summaries
- per-query timing summaries for each schema shape
For current interpretation, prioritize the repeated-run summary over any single JSON file. It captures run-to-run means and sample standard deviations for setup cost, RSS, database size, pooled latency, and per-query latency.
The checked-in repeated-run results currently show the same ordering across the small, medium, and large datasets: the generated type-aware layout is both the best general-purpose storage contract and the smallest on-disk shape, while the typed-property layout remains the expensive middle path that the repo no longer targets.
Representative large-dataset results from the checked-in summary:
- Type-aware size is about 16050.43 MiB, versus about 27521.76 MiB for generic JSON and about 88688.90 MiB for typed-property.
- Type-aware pooled p50 is about 885.31 ms, versus about 1385.39 ms for generic JSON and about 3764.79 ms for typed-property.
- Type-aware relationship_projection p50 is about 9497.18 ms, versus about 24110.96 ms for generic JSON and about 33892.63 ms for typed-property.
- The typed-property layout remains especially poor for ordered top-k queries: large top_active_score lands at about 2216.44 ms p50, versus about 0.02 ms for generic JSON and about 0.01 ms for type-aware.
Schema setup is timed in the standard order
connect -> schema -> ingest -> index -> analyze, so ingest reflects row
loading before query indexes exist and index captures the post-load index
build step explicitly.
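A minimal sketch of that phase ordering, using Python's built-in sqlite3 module against an in-memory database. The table name, index name, and the timed_setup helper are hypothetical and only illustrate why ingest and index are timed separately; the real harness builds much larger schemas.

```python
import sqlite3
import time


def timed_setup(rows):
    """Run the setup phases in order and return (connection, {phase: elapsed_ms})."""
    timings = {}

    def phase(name, fn):
        start = time.perf_counter()
        result = fn()
        timings[name] = (time.perf_counter() - start) * 1000.0
        return result

    conn = phase("connect", lambda: sqlite3.connect(":memory:"))
    phase("schema", lambda: conn.execute(
        "CREATE TABLE node (id INTEGER PRIMARY KEY, score REAL)"))
    # Rows are loaded before any secondary index exists.
    phase("ingest", lambda: conn.executemany(
        "INSERT INTO node (id, score) VALUES (?, ?)", rows))
    # The index build is its own post-load step, so its cost is visible separately.
    phase("index", lambda: conn.execute(
        "CREATE INDEX idx_node_score ON node (score)"))
    phase("analyze", lambda: conn.execute("ANALYZE"))
    conn.commit()
    return conn, timings


conn, timings = timed_setup([(i, i * 0.5) for i in range(1000)])
```

Because dictionaries preserve insertion order, the keys of timings come back in the same connect, schema, ingest, index, analyze sequence.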
The schema benchmark is still primarily a comparative storage-layout experiment
rather than a tail-latency benchmark. The percentile summaries are useful for
compatibility with the other benchmark scripts, but in practice explicit
repeats matter more here than driving single-run iterations to
runtime-benchmark levels.
Compiler benchmark¶
Script:
scripts/benchmarks/compiler/benchmark.py
Supporting files:
- scripts/benchmarks/corpora/compiler_benchmark_corpus.json
- scripts/benchmarks/corpora/compiler_sqlglot_benchmark_corpus.json
- scripts/benchmarks/results/compiler_benchmark.json
- the checked-in compiler summary Markdown artifact under scripts/benchmarks/results/
- scripts/benchmarks/compiler/summarize_results.py
Compiler scope¶
This harness is for compiler latency, not backend execution. It now measures
the general relational IR pipeline plus the current public compiler entrypoints
over the admitted v0.1.0 subset.
Public entrypoints covered:
- parse_cypher_text(...)
- validate_cypher_text(...)
- normalize_cypher_text(...)
- to_sqlglot_ast(...)
- to_sql(...)
- to_sqlglot_program(...)
- render_cypher_program_text(...)
Backend-aware pipeline timings recorded for SQLite, DuckDB, and PostgreSQL:
- IR build
- backend bind
- backend lower
- rendered-program emission
- backend-specific end-to-end raw Cypher to rendered SQL/program text
The same script also runs a separate SQLGlot comparison suite over a PostgreSQL-to-SQLite SQL corpus using:
- tokenize(...)
- parse_one(...)
- parse_one(...).sql(dialect="sqlite")
- transpile(..., read="postgres", write="sqlite")
The compiler corpus intentionally mixes query families rather than timing only a
single read shape. It currently includes ordinary reads, optional reads, WITH
queries, grouped aggregation, bounded variable-length reads including zero-hop
coverage, fixed-length multi-hop reads, graph-introspection projections,
metadata projections, UNWIND, standalone writes, traversal-backed program
shapes, and vector-aware normalization queries. In the runtime matrix, the
general variable-hop cap is scale-dependent (2/5/8 for
small/medium/large), while grouped-rollup variable-hop OLAP queries stay
capped at min(variable_hop_max, 3).
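The hop-cap rule can be written out as a small worked example. The dictionary and function names here are hypothetical; only the 2/5/8 values and the min(variable_hop_max, 3) rule come from the text above.

```python
# Scale-dependent general variable-hop caps, as stated above.
VARIABLE_HOP_MAX = {"small": 2, "medium": 5, "large": 8}


def hop_caps(scale):
    """Return (general_cap, grouped_rollup_cap) for a scale preset."""
    general = VARIABLE_HOP_MAX[scale]
    # Grouped-rollup OLAP queries stay capped at min(variable_hop_max, 3).
    return general, min(general, 3)


caps = {scale: hop_caps(scale) for scale in VARIABLE_HOP_MAX}
# small -> (2, 2), medium -> (5, 3), large -> (8, 3)
```

So only the small preset runs grouped-rollup queries below the 3-hop ceiling; medium and large both use 3.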
Vector-aware queries are benchmarked only through parse, validate, and
normalize. That matches the current product contract: CypherGlot carries vector
intent for host runtimes, but does not compile vector-aware CALL queries to
SQL-backed output directly.
Compiler commands¶
From the repo root:
python -m scripts.benchmarks.compiler.benchmark --iterations 10000 --warmup 200
python -m scripts.benchmarks.compiler.summarize_results
python -m scripts.benchmarks.compiler.summarize_results --output scripts/benchmarks/results/compiler-summary.md
The default compiler run uses:
- 10000 measured iterations per query and entrypoint
- 200 warmup iterations per query and entrypoint
- both the installed and pure-Python SQLGlot package layouts for the PostgreSQL-to-SQLite comparison
Compiler output and current evidence¶
The default compiler entrypoint still writes
scripts/benchmarks/results/compiler_benchmark.json, while the current
checked-in human-readable summary lives as a Markdown artifact under
scripts/benchmarks/results/.
The checked-in compiler artifacts currently reflect a 22-query CypherGlot
compiler corpus and a matching 22-query SQLGlot comparison corpus over the
current type-aware contract:
- node types: User, Company, Person
- edge types: KNOWS, WORKS_AT, INTRODUCED
Those artifacts record:
- a benchmark_sections block that declares how to read the result file
- shared_entrypoint_results for backend-neutral public compiler entrypoints
- backend_entrypoint_results for backend-dependent public compiler entrypoints measured once per SQL backend
- per-query summaries across the mixed admitted-subset corpus
- backend-aware IR-build, bind, lower, render, and end-to-end summaries for SQLite, DuckDB, and PostgreSQL
- vector-only parse / validate / normalize summaries
- SQLGlot comparison results for compiled and pure-Python installs when enabled, including version and module-layout metadata
Shared compiler entrypoint summary from the checked-in summary:
| Entrypoint | p50 | p95 | p99 |
|---|---|---|---|
| parse_cypher_text(...) | 0.54 ms | 0.90 ms | 0.93 ms |
| validate_cypher_text(...) | 0.64 ms | 1.01 ms | 1.04 ms |
| normalize_cypher_text(...) | 0.70 ms | 1.14 ms | 1.20 ms |
Compiler result summarizer¶
Script:
scripts/benchmarks/compiler/summarize_results.py
This summarizer reads one or more compiler benchmark JSON files and renders a
Markdown report. By default it consumes the checked-in single-run baseline at
scripts/benchmarks/results/compiler_benchmark.json and emits:
- an overview block with schema and environment metadata
- a shared-entrypoint summary table
- a backend-entrypoint summary table
- a backend-lowering summary table
- SQLGlot comparison tables when sqlglot_suites are present in the input
Use --output to write the Markdown to a file; otherwise it prints to stdout.
Backend-dependent public entrypoint summary from the same run:
| Entrypoint | SQLite p50 | DuckDB p50 | PostgreSQL p50 | SQLite p95 | DuckDB p95 | PostgreSQL p95 |
|---|---|---|---|---|---|---|
| to_sqlglot_ast(...) | 0.94 ms | 0.96 ms | 0.95 ms | 1.28 ms | 1.29 ms | 1.27 ms |
| to_sql(...) | 1.08 ms | 1.07 ms | 1.08 ms | 1.39 ms | 1.43 ms | 1.41 ms |
| to_sqlglot_program(...) | 0.85 ms | 0.85 ms | 0.85 ms | 1.27 ms | 1.27 ms | 1.27 ms |
| render_cypher_program_text(...) | 0.97 ms | 0.96 ms | 0.96 ms | 1.42 ms | 1.42 ms | 1.40 ms |
Backend pipeline summary from the same run:
| Backend | IR build p50 | Bind p50 | Lower p50 | Render p50 | End-to-end p50 | End-to-end p95 |
|---|---|---|---|---|---|---|
| SQLite | 2.93 us | 0.38 us | 65.82 us | 67.99 us | 0.96 ms | 1.44 ms |
| DuckDB | 2.95 us | 0.38 us | 67.28 us | 67.48 us | 0.96 ms | 1.46 ms |
| PostgreSQL | 2.93 us | 0.37 us | 65.87 us | 66.78 us | 0.96 ms | 1.42 ms |
The current compiler result matches the older single-run tables, now with higher confidence: SQLite, DuckDB, and PostgreSQL are tightly clustered in the compiler-only path, and the earlier DuckDB-specific render gap is not present in the checked-in summary.
What the checked-in compiler summary shows:
- Shared frontend entrypoints remain sub-millisecond at p50.
- Backend-dependent public entrypoints stay tightly grouped around 0.85 ms to 1.08 ms p50 across SQLite, DuckDB, and PostgreSQL.
- The lowerer-plus-renderer path below the public API remains similarly close: backend end_to_end p50 is about 0.96 ms for all three SQL targets.
- Any remaining backend skew is small enough that runtime benchmarks are the more meaningful place to look for backend-specific behavior.
SQLGlot PostgreSQL-to-SQLite comparison summary from the same run:
| Implementation | Method | Queries | p50 | p95 | p99 |
|---|---|---|---|---|---|
| compiled (sqlglotc) | tokenize(...) | 22 | 12.26 us | 26.31 us | 31.97 us |
| compiled (sqlglotc) | parse_one(...) | 22 | 34.63 us | 82.33 us | 95.17 us |
| compiled (sqlglotc) | parse_one(...).sql(...) | 22 | 100.27 us | 252.18 us | 290.36 us |
| compiled (sqlglotc) | transpile(...) | 22 | 61.30 us | 142.93 us | 166.39 us |
| pure Python | tokenize(...) | 22 | 45.58 us | 148.63 us | 167.35 us |
| pure Python | parse_one(...) | 22 | 129.49 us | 345.77 us | 390.37 us |
| pure Python | parse_one(...).sql(...) | 22 | 230.01 us | 615.92 us | 705.77 us |
| pure Python | transpile(...) | 22 | 166.30 us | 441.03 us | 475.67 us |
Compiled SQLGlot is still clearly faster than the pure-Python build. That gap remains materially larger than any compiler-side difference between CypherGlot's SQL backends.
Runtime benchmark¶
Scale presets¶
| Scale | Shape | Extra properties | Traversal | Batch |
|---|---|---|---|---|
| small | 4 node types, 4 edge types, 1000 nodes per type, 3 edges per source, uniform degree | node: 2 text, 6 numeric, 2 boolean; edge: 1 text, 3 numeric, 1 boolean | --variable-hop-max 2 | 1000 |
| medium | 6 node types, 8 edge types, 100000 nodes per type, 4 edges per source, skewed degree | node: 4 text, 10 numeric, 4 boolean; edge: 2 text, 6 numeric, 2 boolean | --variable-hop-max 5 | 5000 |
| large | 10 node types, 10 edge types, 1000000 nodes per type, 8 edges per source, skewed degree | node: 8 text, 18 numeric, 8 boolean; edge: 4 text, 10 numeric, 4 boolean | --variable-hop-max 8 | 10000 |
Runtime matrix runner¶
Script:
scripts/benchmarks/runtime/matrix.py
This runner schedules the current 11 runtime variants through a shuffled job
queue instead of launching a fixed set of terminals by hand. You choose:
- --scale as one of small, medium, or large
- --workers as the number of concurrent worker threads
- --repeats as the number of times to run each selected variant
- optional per-workload overrides via --oltp-iterations, --oltp-warmup, --olap-iterations, and --olap-warmup
Each queued job writes:
- its benchmark JSON into scripts/benchmarks/results/runtime/
- a per-job log file plus a manifest into scripts/benchmarks/results/runtime-matrix/<run-stamp>/
- any persisted database artifacts under my_test_databases/runtime-<scale>-<run-stamp>/
The queue is shuffled by default. Use --shuffle-seed for a deterministic
order or --no-shuffle to preserve the declared variant order.
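The queueing behavior can be pictured with a short sketch. It is an assumption about the general technique, not the runner's actual implementation: build_queue and its parameters are hypothetical stand-ins for --repeats, --shuffle-seed, and --no-shuffle.

```python
import random


def build_queue(variants, repeats, shuffle=True, seed=None):
    """Return a list of (variant, repeat_index) jobs, optionally shuffled."""
    jobs = [(variant, r) for variant in variants for r in range(1, repeats + 1)]
    if shuffle:
        # A fixed seed gives a reproducible order; seed=None draws from OS entropy.
        random.Random(seed).shuffle(jobs)
    return jobs


variants = ["sqlite-indexed", "duckdb-indexed", "postgresql-indexed"]
seeded = build_queue(variants, repeats=3, seed=7)
declared = build_queue(variants, repeats=3, shuffle=False)
```

Shuffling spreads each variant's repeats across the run, so slow drift in machine state is less likely to line up with a single backend.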
Use repeated --variant flags when you want to run only a subset of the
matrix. The available variant names are the same ones returned by
python -m scripts.benchmarks.runtime.matrix --list-variants.
The current runtime matrix variants are:
- sqlite-indexed
- sqlite-unindexed
- duckdb-indexed
- duckdb-unindexed
- postgresql-indexed
- postgresql-unindexed
- neo4j-indexed
- neo4j-unindexed
- arcadedb-indexed
- arcadedb-unindexed
- ladybug-unindexed
ArcadeDB heap defaults now follow the scale preset automatically:
- small: ARCADEDB_JVM_ARGS='-Xmx4g'
- medium: ARCADEDB_JVM_ARGS='-Xmx16g'
- large: ARCADEDB_JVM_ARGS='-Xmx64g'
Override that default for a given run with --arcadedb-jvm-args.
When per-job containers are enabled with --container-cpus, you can also pass
--arcadedb-wheel-path /absolute/path/to/arcadedb_embedded-...whl to install a
local ArcadeDB wheel into those containers instead of resolving the latest
arcadedb-embedded build from PyPI.
Recommended small run:
python -m scripts.benchmarks.runtime.matrix \
--scale small \
--workers 4 \
--repeats 3 \
--oltp-iterations 10000 \
--oltp-warmup 200 \
--oltp-timeout-ms 400 \
--olap-iterations 500 \
--olap-warmup 20 \
--olap-timeout-ms 10000 \
--arcadedb-worker-startup-timeout-s 60 \
--neo4j-password cypherglot1 \
--container-cpus 4
Recommended medium run:
python -m scripts.benchmarks.runtime.matrix \
--scale medium \
--workers 4 \
--repeats 3 \
--oltp-iterations 5000 \
--oltp-warmup 100 \
--oltp-timeout-ms 1000 \
--olap-iterations 100 \
--olap-warmup 10 \
--olap-timeout-ms 100000 \
--arcadedb-worker-startup-timeout-s 180 \
--neo4j-password cypherglot1 \
--container-cpus 4
Recommended large run:
python -m scripts.benchmarks.runtime.matrix \
--scale large \
--workers 4 \
--repeats 3 \
--oltp-iterations 2000 \
--oltp-warmup 20 \
--oltp-timeout-ms 2000 \
--olap-iterations 50 \
--olap-warmup 5 \
--olap-timeout-ms 200000 \
--arcadedb-worker-startup-timeout-s 3600 \
--neo4j-password cypherglot1 \
--container-cpus 4
For runtime runs, keep repeats=3 across all scales and scale down per-run
inner-loop sampling as datasets grow, but not so far that medium and large OLAP
suites become too noisy. The commands above hold --workers 4 at every scale;
reduce worker parallelism if larger datasets cause machine-level contention. The
current recommended methodology is to run the full eleven-variant matrix at each
scale:
sqlite-indexed, sqlite-unindexed, duckdb-indexed, duckdb-unindexed,
postgresql-indexed, postgresql-unindexed, neo4j-indexed,
neo4j-unindexed, arcadedb-indexed, arcadedb-unindexed, and
ladybug-unindexed. The commands above now rely on the matrix runner's default
behavior, which is to queue all eleven variants unless you explicitly narrow the
run with repeated --variant flags. They also pin the current runtime
guardrails explicitly: scale-specific OLTP and OLAP query timeouts plus a
separate ArcadeDB worker startup budget so larger ArcadeDB datasets have time
to open before query timing begins. The query timeout limits are the emergency
brake for queries that stop making progress; the ArcadeDB startup timeout only
covers time from ArcadeDB worker process launch until that worker reports
ready, including Python worker startup, opening the ArcadeDB database, and any
pre-ready initialization work. It does not include the startup probe query,
warmup iterations, measured iterations, or their query timeout windows. In
practice, ArcadeDB first waits for worker readiness, then runs the startup
probe, and only then can the OLTP or OLAP query timeout window begin.
Per-iteration progress output from the underlying benchmark scripts is enabled
by default. Use --no-iteration-progress when you want quieter worker logs.
Runtime result summarizer¶
Script:
scripts/benchmarks/runtime/summarize_results.py
When you run repeated runtime jobs, the per-run JSON files keep each run's own suite percentiles and setup timings. This summarizer scans those JSON files, groups runs that share the same benchmark configuration, skips non-completed checkpoint payloads, and emits Markdown tables with repeat-level means and sample standard deviations.
The suite tables aggregate the already-recorded suite percentiles, so values
such as p50, p95, and p99 are reported as:
- mean across repeated runs
- sample standard deviation across repeated runs
It also aggregates suite setup timings such as connect_ms, schema_ms,
ingest_ms, index_ms, analyze_ms, gav_ms, or checkpoint_ms whenever
those fields exist for the grouped backend. Per-query end-to-end percentile
tables are now included by default; use --no-queries if you want only the
suite-level tables.
The cross-engine suite tables keep only shared setup phases side by side, so
ArcadeDB's gav_ms is not shown there. Instead, the generated report adds an
ArcadeDB-only setup section where GAV is broken out explicitly and described
as part of the ArcadeDB setup hierarchy:
connect/reset -> schema/constraints -> ingest -> index -> GAV -> analyze/checkpoint.
The ArcadeDB-only worker-startup tables also report open timing from the raw
worker_startup payloads. Worker close time is not currently recorded, so the
report cannot show it yet.
Current checked-in repeated-run report¶
The current checked-in repeated-run runtime summary is the runtime summary
Markdown artifact under scripts/benchmarks/results/.
That report aggregates 99 completed JSON result files into 33 grouped
configurations for the large runtime preset. The checked-in large dataset uses:
- 10,000,000 total nodes
- 77,790,000 total edges
- 10 node types and 10 edge types
- 61 total property fields across the schema
- 11 backend/index combinations across SQLite, DuckDB, PostgreSQL, Neo4j, ArcadeDB Embedded, and LadybugDB
Representative large-dataset findings from the checked-in summary:
- Indexed OLTP p50 is best for direct runtimes, with ArcadeDB Embedded at about 0.09 ms and Neo4j at about 0.25 ms; among the compile-plus-execute SQL paths, SQLite lands at about 1.27 ms, PostgreSQL at about 1.54 ms, and DuckDB at about 3.37 ms.
- Indexed OLAP p50 is strongest on DuckDB among the SQL backends at about 566.42 ms; LadybugDB lands at about 746.38 ms on its direct-Cypher path, ArcadeDB Embedded at about 3259.27 ms, SQLite at about 4117.93 ms, PostgreSQL at about 6493.17 ms, and Neo4j at about 6902.01 ms.
- The unindexed OLTP penalty is severe for SQLite, PostgreSQL, Neo4j, and ArcadeDB Embedded, but much smaller for DuckDB: about 5.80 ms unindexed versus about 3.37 ms indexed.
- Large-run wall-clock time is dominated by setup and ingest cost. DuckDB finishes in roughly 35 minutes, SQLite and PostgreSQL in multiple hours, Neo4j and ArcadeDB in roughly 4 to 6.5 hours, and LadybugDB in roughly 38 hours.
- Large-run RSS diverges sharply by engine: SQLite remains in the hundreds of MiB, PostgreSQL is roughly in the 0.5 to 1.2 GiB range, DuckDB is around 5.7 to 7.1 GiB, while ArcadeDB Embedded and LadybugDB both reach into the tens of GiB.
For cross-engine interpretation, keep the runtime split explicit:
- SQLite, DuckDB, and PostgreSQL numbers are compile-plus-execute timings through CypherGlot.
- Neo4j, ArcadeDB Embedded, and LadybugDB numbers are direct Cypher execution timings.
- The repeated-run summary therefore answers both backend-comparison and methodology questions, but it is not a single apples-to-apples leaderboard.
Runtime caveats¶
LadybugDB has two known upstream follow-ups that affect the large-runtime benchmark path. One is the long ingest time on the largest dataset. The other is the grouped variable-length traversal below:
MATCH (a:NodeType01)-[:EdgeType01*0..3]->(b:NodeType01)
RETURN b.active AS active, count(b) AS total, avg(b.score) AS avg_score
ORDER BY total DESC, active
Upstream LadybugDB work is expected to address both the large ingest cost and this query shape.
ArcadeDB also has a known reopen issue when the database is indexed and a persisted GAV is enabled. Reopening that database can fail in the OLAP path, so the benchmark harness works around it by keeping one GAV-enabled OLAP worker open for the full timeout-probe and classification pass instead of reopening the database for each query. A future ArcadeDB release may make that workaround unnecessary.
Notes¶
- The current checked-in experiment summaries were produced on a Linux workstation built around a Ryzen 9 7950X. Treat that hardware note as result provenance, not as part of the public-facing benchmark naming.
- Percentiles are computed from raw per-iteration latency samples using linear interpolation.
- The measured loop disables Python GC to reduce avoidable collection noise.
- Not every query applies to every compiler entrypoint, so the compiler corpus explicitly declares valid entrypoints per query shape.
- The compiler benchmark and runtime benchmark answer different questions and should not be compared directly.
- The checked-in baselines are repository-local regression anchors, not general benchmark claims across machines, operating systems, or Python builds.
- The pure-Python SQLGlot comparison path runs in a subprocess with a temporary package copy that excludes compiled .so modules, so the active virtualenv is not mutated.
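The percentile and GC notes above can be sketched as follows. This is an illustration under stated assumptions rather than the harness's own code: the percentile and measure helpers are hypothetical, and the interpolation shown is the common rank = (n - 1) * q / 100 variant, which may differ in detail from what the benchmark scripts use.

```python
import gc
import time


def percentile(samples, q):
    """Return the q-th percentile (0-100) of samples using linear interpolation."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = (len(ordered) - 1) * q / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def measure(fn, iterations, warmup=0):
    """Run fn repeatedly and return per-iteration latencies in milliseconds."""
    for _ in range(warmup):
        fn()
    samples = []
    gc_was_enabled = gc.isenabled()
    gc.disable()  # keep collector pauses out of the measured loop
    try:
        for _ in range(iterations):
            start = time.perf_counter()
            fn()
            samples.append((time.perf_counter() - start) * 1000.0)
    finally:
        # Restore the collector only if it was on before measurement.
        if gc_was_enabled:
            gc.enable()
    return samples


latencies = measure(lambda: sum(range(100)), iterations=50, warmup=5)
p50 = percentile(latencies, 50)
```

Warmup iterations run before the collector is disabled and are not recorded, so only the measured loop contributes samples.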