Compiler Contract
The intended contract for CypherGlot is:
raw Cypher string
→ parse
→ validate admitted subset
→ normalize
→ graph-relational IR
→ backend-aware lowering
→ SQLGlot-backed output
That means:
- cypherglot owns raw Cypher parsing
- cypherglot owns Cypher normalization and lowering
- cypherglot returns either a SQLGlot Expression or a small compiled program composed of SQLGlot-backed statements
- a host runtime such as humemdb owns planning, vector execution, dialect generation, and backend execution
That contract is live for the admitted subset: parse, validate, normalize, and compile are repo-owned boundaries, while execution remains outside CypherGlot.
The physical graph-to-table layout assumed by current compilation is documented in the Schema Contract guide.
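The staged contract above can be pictured as a chain of small functions, each consuming the previous stage's output. The following is only a minimal sketch under stated assumptions: every function name and data shape here is hypothetical, the stage bodies are toy placeholders, and none of it is CypherGlot's actual API or internals.

```python
# Hypothetical stage functions; toy bodies that only illustrate the ordering
# parse -> validate -> normalize -> IR -> backend-aware lowering.
KEYWORDS = {"match", "where", "return"}

def parse(text: str) -> dict:
    # Toy parser: split on whitespace instead of building a real AST.
    return {"tokens": text.split()}

def validate(ast: dict) -> dict:
    # Toy admitted-subset rule: only queries that start with MATCH pass.
    tokens = ast["tokens"]
    if not tokens or tokens[0].lower() != "match":
        raise ValueError("query is outside the admitted subset")
    return ast

def normalize(ast: dict) -> dict:
    # Canonicalize keyword case so later stages see one spelling.
    return {"tokens": [t.upper() if t.lower() in KEYWORDS else t
                       for t in ast["tokens"]]}

def to_ir(normalized: dict) -> dict:
    # Wrap the normalized form in a backend-neutral envelope.
    return {"op": "query", "tokens": normalized["tokens"]}

def lower(ir: dict, backend: str) -> dict:
    # Backend-aware step: the same IR is tagged for one concrete backend.
    return {"backend": backend, "plan": ir}

def compile_sketch(text: str, backend: str = "sqlite") -> dict:
    # Validation sits before normalization, so rejected input never
    # reaches the IR or lowering stages.
    return lower(to_ir(normalize(validate(parse(text)))), backend)

result = compile_sketch("match (n) where n.age > 30 return n")
```

The point of the sketch is the boundary placement: rejection happens at the validate step, and nothing in the chain executes anything.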
Current backend stance
CypherGlot returns SQLGlot-backed output through an explicit multi-backend compiler pipeline.
That means:
- the compiler now targets Cypher AST -> normalize -> graph-relational IR -> backend-aware lowering -> SQLGlot-backed output
- SQLite has an executable lowering path through the shared IR
- DuckDB now has an explicit lowerer from the same shared IR path; parity work is still in progress and support claims remain strict
- PostgreSQL is another first-class lowerer from the same IR path
- HumemDB is the main reference runtime for execution
- host runtimes should treat the current graph-to-table schema contract as part of the execution boundary, not as an incidental implementation detail
- to_sql(..., dialect=...) and render_cypher_program_text(..., dialect=...) expose SQLGlot rendering controls, but those controls do not by themselves make the compiled output backend-neutral
- a backend counts as supported only when admitted Cypher shapes execute correctly against that backend's schema and runtime contract, not merely when SQLGlot can render SQL text for it
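The difference between "renders" and "is supported" can be made concrete with an execution check. The sketch below uses Python's standard sqlite3 module and a made-up one-table layout; the table, its columns, and the helper name are assumptions for illustration, and the real layout is whatever the Schema Contract guide specifies.

```python
import sqlite3

def executes_correctly(sql: str, setup: list, expected: list) -> bool:
    """Return True only if `sql` runs on SQLite and yields `expected` rows."""
    conn = sqlite3.connect(":memory:")
    try:
        for statement in setup:
            conn.execute(statement)
        try:
            rows = conn.execute(sql).fetchall()
        except sqlite3.Error:
            # Well-formed SQL text can still fail against the actual schema.
            return False
        return rows == expected
    finally:
        conn.close()

# Hypothetical layout, NOT the documented schema contract.
SETUP = [
    "CREATE TABLE nodes (id INTEGER PRIMARY KEY, label TEXT, name TEXT)",
    "INSERT INTO nodes VALUES (1, 'User', 'Ada'), (2, 'User', 'Bob')",
]

good = executes_correctly(
    "SELECT name FROM nodes WHERE label = 'User' ORDER BY id",
    SETUP,
    [("Ada",), ("Bob",)],
)

# Syntactically fine SQL that references a column the schema does not have.
bad = executes_correctly("SELECT title FROM nodes", SETUP, [])
```

A conformance suite in this spirit would run each admitted shape against each backend and compare rows, rather than comparing rendered SQL strings.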
Scope
cypherglot should:
- parse Neo4j-like Cypher input
- validate admitted subset boundaries clearly
- lower admitted Cypher into SQLGlot-backed compiled output
cypherglot should not:
- execute SQL
- own graph storage
- execute vector search
- manage vector index lifecycle
Vector-aware but not vector-executing
For mixed Cypher vector queries, cypherglot should parse the ordinary Cypher
structure and carry vector intent forward as metadata or compiler-recognizable
structure. A host runtime should then turn that into vector search plus a
conditioned relational query path.
That handoff shape is the normalized NormalizedQueryNodesVectorSearch contract, which carries:
- procedure_kind='queryNodes'
- index_name
- query_param_name
- top_k as the admitted normalized top-k value
- candidate_query as one admitted normalized MATCH query built from either post-call MATCH ... or YIELD ... WHERE ...
- return_items over node.id and/or score
- order_by over node.id and/or score
That is intentionally a host-runtime handoff contract, not ordinary SQL lowering.
compile_cypher_text(...), compile_cypher_program_text(...), and the rendering
helpers built on them still reject vector-aware CALL db.index.vector.queryNodes(...)
queries so the compiler does not pretend vector planning or execution is backend-native
SQLGlot work.
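To illustrate the split described in this section, here is a rough sketch of how a host runtime might consume such a handoff: run the vector search first, then condition a relational query on the candidate ids. The field names mirror the list above, but the class itself, the field types, the brute-force cosine search, the use of SQL text as a stand-in for the normalized MATCH query, and the fixed score-descending order are all assumptions, not the real NormalizedQueryNodesVectorSearch type or HumemDB behavior.

```python
import math
import sqlite3
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class VectorHandoffSketch:
    # Field names follow the documented contract; the types are assumptions.
    index_name: str
    query_param_name: str
    top_k: int
    candidate_query: str  # stand-in: SQL text instead of a normalized MATCH
    return_items: Tuple[str, ...]
    order_by: Tuple[str, ...]
    procedure_kind: str = "queryNodes"

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def vector_search(index: Dict[int, List[float]], query: List[float], k: int):
    # Brute-force top-k by cosine similarity; a real host would use an index.
    scored = [(node_id, cosine(vec, query)) for node_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

def run_handoff(conn, handoff, indexes, params):
    # Phase 1: vector search, owned by the host runtime.
    hits = vector_search(indexes[handoff.index_name],
                         params[handoff.query_param_name], handoff.top_k)
    if not hits:
        return []
    scores = dict(hits)
    # Phase 2: relational query restricted to the vector candidates.
    placeholders = ",".join("?" for _ in scores)
    sql = (f"SELECT id FROM ({handoff.candidate_query}) "
           f"WHERE id IN ({placeholders})")
    ids = [row[0] for row in conn.execute(sql, list(scores))]
    # Ordering by score descending is hard-coded here for brevity.
    return sorted(((i, scores[i]) for i in ids),
                  key=lambda pair: pair[1], reverse=True)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, label TEXT)")
conn.execute("INSERT INTO nodes VALUES (1, 'User'), (2, 'Doc'), (3, 'User')")

handoff = VectorHandoffSketch(
    index_name="user_embeddings",
    query_param_name="q",
    top_k=2,
    candidate_query="SELECT id FROM nodes WHERE label = 'User'",
    return_items=("node.id", "score"),
    order_by=("score",),
)
indexes = {"user_embeddings": {1: [1.0, 0.0], 2: [0.9, 0.1], 3: [0.0, 1.0]}}
results = run_handoff(conn, handoff, indexes, {"q": [1.0, 0.0]})
```

In this sketch the compiler-side object carries only intent; the vector index, the similarity metric, and the way candidates are joined back to relational rows all live in the host.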
Output shapes
CypherGlot exposes two related output contracts:
- single-statement helpers such as to_sqlglot_ast(...) and to_sql(...) for admitted shapes that lower to one SQLGlot expression
- program helpers such as to_sqlglot_program(...) and render_cypher_program_text(...) for admitted shapes that require multiple SQL-backed steps
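From a host runtime's point of view, the two contracts differ mainly in whether there is one statement to run or an ordered list of them. The sketch below models that with plain SQL strings standing in for SQLGlot expressions; ProgramSketch, run_compiled, and the table layout are hypothetical names invented for illustration and do not reflect the actual return types of the helpers above.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Tuple, Union

@dataclass
class ProgramSketch:
    # Ordered SQL steps; strings stand in for SQLGlot-backed statements.
    steps: List[str]

# Hypothetical union of the two output shapes.
CompiledSketch = Union[str, ProgramSketch]

def run_compiled(conn: sqlite3.Connection,
                 compiled: CompiledSketch) -> List[Tuple]:
    # Treat a single statement as a one-step program so the host has one path.
    steps = [compiled] if isinstance(compiled, str) else compiled.steps
    rows: List[Tuple] = []
    for sql in steps:
        # Only the rows of the final step are returned in this sketch.
        rows = conn.execute(sql).fetchall()
    return rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, label TEXT)")

single = "SELECT count(*) FROM nodes"
program = ProgramSketch(steps=[
    "INSERT INTO nodes (label) VALUES ('User')",
    "SELECT count(*) FROM nodes",
])

single_rows = run_compiled(conn, single)
program_rows = run_compiled(conn, program)
```

Execution still happens entirely in the host here, which is consistent with the scope stated earlier: the compiler hands over statements and does not run them.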
The admitted language boundary is documented in the Admitted Subset guide.
That also includes a narrow vector-aware normalization path for
CALL db.index.vector.queryNodes(...) YIELD node, score ... queries. Those
queries are validated and normalized so host runtimes can consume their vector
intent, but they are not yet compiled into SQLGlot-backed output directly.
For ordinary non-vector aggregation, the current admitted aggregate contract is intentionally narrow compared with full Cypher, but it includes the practical grouped families that matter for mainstream onboarding:
- count(binding_alias)
- count(*)
- sum(...)
- avg(...)
- min(...)
- max(...)
Those surfaces remain restricted to already admitted field or scalar-binding inputs, depending on the query shape.
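As a rough picture of what a narrow aggregate contract means in practice, the check below admits only the six forms listed above. It is a simplified sketch: the function name and the string-based argument handling are assumptions, and the real validator also restricts arguments by query shape, which is not modeled here.

```python
# The five function names that appear in the admitted aggregate list.
ADMITTED_AGGREGATES = {"count", "sum", "avg", "min", "max"}

def is_admitted_aggregate(name: str, argument: str) -> bool:
    """Toy check: is `name(argument)` one of the listed aggregate forms?"""
    if name not in ADMITTED_AGGREGATES:
        return False
    if argument.strip() == "*":
        # The star form is listed only for count.
        return name == "count"
    # Any non-empty argument passes here; real validation is stricter.
    return bool(argument.strip())

checks = {
    "count(*)": is_admitted_aggregate("count", "*"),
    "count(u)": is_admitted_aggregate("count", "u"),
    "sum(*)": is_admitted_aggregate("sum", "*"),
    "collect(u)": is_admitted_aggregate("collect", "u"),
}
```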