Example 16: Import Database vs Transactional Graph Ingest
This example compares four graph-ingest strategies against the same generated vertex and edge dataset shape.
Overview
Example 16 is the graph-ingest comparison harness for embedded Python.
- Generates synthetic graph CSV data for vertices and edges
- Runs four ingest modes:
  - transactional SQL vertex and edge creation
  - embedded `GraphBatch`
  - async SQL vertex and edge creation
  - SQL `IMPORT DATABASE`
- Checks final vertex and edge count parity before trusting the timing result
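The CSV generation step can be pictured with a small standalone generator. This is a minimal sketch, not the example script's own code: the file names `vertices.csv` and `edges.csv`, the column names (`id`, `from`, `to`, `i0`, `s0`, and so on), the uniform random edge endpoints, and the helper names `write_graph_csv` and `random_text` are all assumptions made for illustration.

```python
import csv
import os
import random
import string
import tempfile


def random_text(rng: random.Random, size: int) -> str:
    """Return a random lowercase ASCII string of exactly `size` characters."""
    return "".join(rng.choices(string.ascii_lowercase, k=size))


def write_graph_csv(out_dir, vertices, edges, vertex_int_props, vertex_str_props,
                    edge_int_props, edge_str_props, string_size, seed=42):
    """Write hypothetical vertices.csv / edges.csv files and return their paths."""
    rng = random.Random(seed)  # fixed seed makes the generated files reproducible
    vertex_path = os.path.join(out_dir, "vertices.csv")
    edge_path = os.path.join(out_dir, "edges.csv")

    with open(vertex_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id"]
                        + [f"i{n}" for n in range(vertex_int_props)]
                        + [f"s{n}" for n in range(vertex_str_props)])
        for vid in range(vertices):
            writer.writerow([vid]
                            + [rng.randint(0, 1_000_000) for _ in range(vertex_int_props)]
                            + [random_text(rng, string_size) for _ in range(vertex_str_props)])

    with open(edge_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["from", "to"]
                        + [f"i{n}" for n in range(edge_int_props)]
                        + [f"s{n}" for n in range(edge_str_props)])
        for _ in range(edges):
            # Endpoints are uniform random vertex ids (self-loops and duplicates allowed).
            writer.writerow([rng.randrange(vertices), rng.randrange(vertices)]
                            + [rng.randint(0, 1_000_000) for _ in range(edge_int_props)]
                            + [random_text(rng, string_size) for _ in range(edge_str_props)])

    return vertex_path, edge_path


# Small demo using the Run example's property counts with far fewer rows.
with tempfile.TemporaryDirectory() as tmp:
    v_path, e_path = write_graph_csv(tmp, vertices=100, edges=300,
                                     vertex_int_props=6, vertex_str_props=4,
                                     edge_int_props=2, edge_str_props=1,
                                     string_size=64)
    with open(v_path, newline="") as f:
        vertex_rows = list(csv.reader(f))
    with open(e_path, newline="") as f:
        edge_rows = list(csv.reader(f))

print(len(vertex_rows) - 1, len(edge_rows) - 1)  # data rows excluding headers: 100 300
```

Generating both files from one seeded `random.Random` instance keeps the input identical across repeated runs, which is what makes per-mode timing comparisons meaningful.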
Current Repository Guidance
- This example exists because the fastest ingest method is workload-dependent
- Current results for the 5M/5M benchmark shape show that:
  - `IMPORT DATABASE` with `--parallel 4` is the fastest option in this repository
  - `GraphBatch` remains competitive and is the strongest non-import path in the current snapshot
  - Async SQL is still useful as a baseline comparison, but it is not the recommended bulk graph ingest path here
- SQL `IMPORT DATABASE` is now a preferred bulk graph load path for this benchmark shape when parallel import is enabled
Recent Benchmark Snapshot
For this shape:
- `vertices=5,000,000`
- `edges=5,000,000`
- `vertex-int-props=10`
- `vertex-str-props=10`
- `edge-int-props=10`
- `edge-str-props=10`
- `string-size=64`
- `batch-size=10,000`
- `heap-size=8g`
Measured times:
- Transactional (1 thread): 575.078 s
- Async SQL (`--async-parallel 1`): 701.080 s
- `GraphBatch` (`--parallel 1`): 507.983 s
- `GraphBatch` (`--parallel 4`): 359.672 s
- `IMPORT DATABASE` (`--parallel 1`): 453.481 s
- `IMPORT DATABASE` (`--parallel 4`): 275.325 s
All four methods produced the same final graph output for this benchmark shape.
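The snapshot times can be turned into relative speedups and throughput for a quicker comparison. In this sketch, speedup is the transactional time divided by a run's time, and throughput treats vertices and edges as equal records (10,000,000 in total), which is a simplification; the helper name `summarize` is hypothetical.

```python
# Wall-clock times in seconds, copied from the benchmark snapshot (5M vertices, 5M edges).
TIMES_S = {
    "Transactional (1 thread)": 575.078,
    "Async SQL (--async-parallel 1)": 701.080,
    "GraphBatch (--parallel 1)": 507.983,
    "GraphBatch (--parallel 4)": 359.672,
    "IMPORT DATABASE (--parallel 1)": 453.481,
    "IMPORT DATABASE (--parallel 4)": 275.325,
}
BASELINE = "Transactional (1 thread)"
# Simplification: vertices and edges are counted as equal "records".
TOTAL_RECORDS = 5_000_000 + 5_000_000


def summarize(times, baseline, total_records):
    """Return (run, seconds, speedup vs baseline, records/s) tuples, fastest first."""
    base = times[baseline]
    rows = [(run, t, base / t, total_records / t) for run, t in times.items()]
    return sorted(rows, key=lambda row: row[1])


summary = summarize(TIMES_S, BASELINE, TOTAL_RECORDS)
for run, seconds, speedup, rate in summary:
    print(f"{run:32s} {seconds:9.3f} s {speedup:6.2f}x {rate:10,.0f} records/s")

# Effect of going from 1 to 4 workers within the same ingest method.
import_scaling = TIMES_S["IMPORT DATABASE (--parallel 1)"] / TIMES_S["IMPORT DATABASE (--parallel 4)"]
graphbatch_scaling = TIMES_S["GraphBatch (--parallel 1)"] / TIMES_S["GraphBatch (--parallel 4)"]
print(f"IMPORT DATABASE 1->4 workers: {import_scaling:.2f}x")
print(f"GraphBatch 1->4 workers: {graphbatch_scaling:.2f}x")
```

With these inputs, `IMPORT DATABASE` with `--parallel 4` comes out at about 2.09x the transactional baseline (roughly 36,300 records/s), and going from 1 to 4 workers speeds up `IMPORT DATABASE` by about 1.65x and `GraphBatch` by about 1.41x.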
Run
From bindings/python/examples:
```bash
python 16_import_database_vs_transactional_graph_ingest.py \
  --vertices 100000 \
  --edges 300000 \
  --vertex-int-props 6 \
  --vertex-str-props 4 \
  --edge-int-props 2 \
  --edge-str-props 1 \
  --string-size 64 \
  --batch-size 10000 \
  --async-parallel 1 \
  --parallel 1 \
  --heap-size 4g
```
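To repeat the run with different worker counts, such as the `--parallel 1` versus `--parallel 4` pairs in the benchmark snapshot, the command line can be assembled programmatically. This is a minimal sketch: `BASE_OPTIONS`, `build_command`, and the `RUN` switch are hypothetical names, and only the flags and values from the command above are used.

```python
import shlex
import subprocess
import sys

SCRIPT = "16_import_database_vs_transactional_graph_ingest.py"
RUN = False  # set to True to execute; run from bindings/python/examples

# Values copied from the Run example; only documented flags are used.
BASE_OPTIONS = {
    "--vertices": 100000,
    "--edges": 300000,
    "--vertex-int-props": 6,
    "--vertex-str-props": 4,
    "--edge-int-props": 2,
    "--edge-str-props": 1,
    "--string-size": 64,
    "--batch-size": 10000,
    "--async-parallel": 1,
    "--parallel": 1,
    "--heap-size": "4g",
}


def build_command(overrides=None):
    """Return an argv list for one run, with optional per-flag overrides."""
    options = {**BASE_OPTIONS, **(overrides or {})}
    argv = [sys.executable, SCRIPT]
    for flag, value in options.items():
        argv += [flag, str(value)]
    return argv


# Sweep --parallel, mirroring the 1-worker vs 4-worker runs in the snapshot.
commands = [build_command({"--parallel": p}) for p in (1, 4)]
for argv in commands:
    print(shlex.join(argv))
    if RUN:
        subprocess.run(argv, check=True)
```

Keeping every other option fixed while sweeping a single flag means timing differences can be attributed to that flag alone.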
Key Options
- `--vertices`: number of generated vertices
- `--edges`: number of generated edges
- `--vertex-int-props` / `--vertex-str-props`: vertex property counts
- `--edge-int-props` / `--edge-str-props`: edge property counts
- `--string-size`: generated string payload size
- `--batch-size`: ingest batch size
- `--async-parallel`: async SQL worker count
- `--parallel`: SQL import worker count; also toggles GraphBatch parallel flush
- `--heap-size`: JVM heap size
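The documented options map naturally onto an `argparse` parser. This is a hypothetical reconstruction rather than the script's actual parser: the types are inferred, the defaults are copied from the Run example (the real defaults may differ), and `graphbatch_parallel_flush` assumes that any `--parallel` value above 1 enables GraphBatch parallel flush.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options (defaults from the Run example)."""
    p = argparse.ArgumentParser(description="Graph ingest comparison (sketch)")
    p.add_argument("--vertices", type=int, default=100_000, help="number of generated vertices")
    p.add_argument("--edges", type=int, default=300_000, help="number of generated edges")
    p.add_argument("--vertex-int-props", type=int, default=6, help="integer properties per vertex")
    p.add_argument("--vertex-str-props", type=int, default=4, help="string properties per vertex")
    p.add_argument("--edge-int-props", type=int, default=2, help="integer properties per edge")
    p.add_argument("--edge-str-props", type=int, default=1, help="string properties per edge")
    p.add_argument("--string-size", type=int, default=64, help="generated string payload size")
    p.add_argument("--batch-size", type=int, default=10_000, help="ingest batch size")
    p.add_argument("--async-parallel", type=int, default=1, help="async SQL worker count")
    p.add_argument("--parallel", type=int, default=1,
                   help="SQL import worker count; also toggles GraphBatch parallel flush")
    p.add_argument("--heap-size", default="4g", help="JVM heap size, e.g. 4g or 8g")
    return p


def graphbatch_parallel_flush(parallel: int) -> bool:
    # Assumption: any worker count above 1 enables GraphBatch parallel flush.
    return parallel > 1


args = build_parser().parse_args(["--parallel", "4", "--heap-size", "8g"])
print(args.parallel, graphbatch_parallel_flush(args.parallel), args.heap_size)  # 4 True 8g
```

`argparse` converts dashed flags to underscored attributes, so `--async-parallel` is read back as `args.async_parallel`.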
Parity Semantics
Timing comparisons only matter if all four modes produce matching final vertex and edge counts.
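A parity gate can be sketched as a check that runs before any ranking. This sketch compares each run's final `(vertices, edges)` pair against the first run and refuses to rank timings on a mismatch. The timings are the snapshot values; the per-run counts assume each run loaded the configured 5M vertices and 5M edges, the failing case is hypothetical, and the function names are illustrative.

```python
from typing import Dict, List, Tuple

Counts = Tuple[int, int]  # (final vertex count, final edge count)


def parity_mismatches(counts: Dict[str, Counts]) -> Dict[str, Counts]:
    """Return every run whose counts differ from the first run listed."""
    runs = list(counts.items())
    _, reference = runs[0]
    return {run: c for run, c in runs[1:] if c != reference}


def rank_if_parity(times: Dict[str, float], counts: Dict[str, Counts]) -> List[str]:
    """Rank runs fastest first, but only after every run reports the same counts."""
    mismatches = parity_mismatches(counts)
    if mismatches:
        raise ValueError(f"count parity failed, timings not comparable: {mismatches}")
    return sorted(times, key=times.get)


times = {
    "transactional": 575.078,
    "async_sql_p1": 701.080,
    "graphbatch_p1": 507.983,
    "graphbatch_p4": 359.672,
    "import_p1": 453.481,
    "import_p4": 275.325,
}
# Assumption: every run loaded the configured 5M vertices and 5M edges.
counts = {run: (5_000_000, 5_000_000) for run in times}

ranking = rank_if_parity(times, counts)
print(ranking[0])  # fastest run once parity holds

# Hypothetical failure: one run silently dropped edges.
broken = dict(counts, async_sql_p1=(5_000_000, 4_999_000))
try:
    rank_if_parity(times, broken)
except ValueError as exc:
    parity_error = str(exc)
    print(parity_error)
```

Raising instead of warning makes a silent data loss in one mode impossible to mistake for a speed win.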