Example 09: Stack Overflow Graph (OLTP)¶
This example builds a Stack Overflow property graph and runs a mixed OLTP workload.
Overview¶
Example 09 is the graph-oriented OLTP benchmark in the Python examples set.
- Builds the Stack Overflow graph with the repository's directed schema conventions
- Runs mixed graph CRUD operations against the selected backend
- Supports ArcadeDB embedded and Neo4j client/server execution paths alongside the in-process backends
- Measures throughput, latency, disk usage, and peak RSS
- Supports deterministic single-thread verification for repeatability checks
Table-To-Graph Projection¶
The graph benchmark projects six XML sources into a directed property graph.
Source Files Used¶
Required inputs for the graph build:
Users.xmlPosts.xmlComments.xmlBadges.xmlTags.xmlPostLinks.xml
Not used by this graph projection:
Votes.xmlPostHistory.xml
Graph Model¶
The example uses six vertex types and nine directed edge types.
Vertices¶
UserQuestionAnswerTagBadgeComment
Edges¶
ASKEDANSWEREDHAS_ANSWERACCEPTED_ANSWERTAGGED_WITHCOMMENTED_ONCOMMENTED_ON_ANSWEREARNEDLINKED_TO
Each vertex type gets a unique Id index.
Vertex Mapping¶
| Vertex type | Source | Required key | Properties carried into the graph |
|---|---|---|---|
User |
Users.xml |
Id |
Id, DisplayName, Reputation, CreationDate, Views, UpVotes, DownVotes |
Question |
Posts.xml where PostTypeId = 1 |
Id |
Id, Title, Body, Score, ViewCount, CreationDate, AnswerCount, CommentCount, FavoriteCount |
Answer |
Posts.xml where PostTypeId = 2 |
Id |
Id, Body, Score, CreationDate, CommentCount |
Tag |
Tags.xml |
Id |
Id, TagName, Count |
Badge |
Badges.xml |
Id |
Id, Name, Date, Class |
Comment |
Comments.xml |
Id |
Id, Text, Score, CreationDate |
Edge Mapping¶
| Edge type | From | To | Source rule | Edge properties |
|---|---|---|---|---|
ASKED |
User |
Question |
question owner from Posts.xml |
CreationDate |
ANSWERED |
User |
Answer |
answer owner from Posts.xml |
CreationDate |
HAS_ANSWER |
Question |
Answer |
ParentId on answer row |
none |
ACCEPTED_ANSWER |
Question |
Answer |
AcceptedAnswerId on question row |
none |
TAGGED_WITH |
Question |
Tag |
parsed question tags from Posts.xml |
none |
COMMENTED_ON |
Comment |
Question |
comment target is a question | CreationDate, Score |
COMMENTED_ON_ANSWER |
Comment |
Answer |
comment target is an answer | CreationDate, Score |
EARNED |
User |
Badge |
badge owner from Badges.xml |
Date, Class |
LINKED_TO |
Question |
Question |
PostLinks.xml question-to-question links |
LinkTypeId, CreationDate |
Projection Rules¶
Source Mapping¶
| Source file | Output |
|---|---|
Users.xml |
User vertices |
Posts.xml |
Question and Answer vertices, plus ASKED, ANSWERED, HAS_ANSWER, ACCEPTED_ANSWER, TAGGED_WITH |
Comments.xml |
Comment vertices, plus COMMENTED_ON and COMMENTED_ON_ANSWER |
Badges.xml |
Badge vertices, plus EARNED |
Tags.xml |
Tag vertices |
PostLinks.xml |
LINKED_TO |
Vertex Projection Rules¶
Posts.xmlwithPostTypeId = 1becomesQuestionPosts.xmlwithPostTypeId = 2becomesAnswer- there is no generic
Postvertex type in the graph model
Edge Derivation Rules¶
| Edge | Rule |
|---|---|
ASKED |
User.OwnerUserId -> Question.Id, carries CreationDate |
ANSWERED |
User.OwnerUserId -> Answer.Id, carries CreationDate |
HAS_ANSWER |
Question.ParentId -> Answer.Id |
ACCEPTED_ANSWER |
Question.Id -> Answer.AcceptedAnswerId |
TAGGED_WITH |
parse question Tags, resolve by TagName -> Id, create Question -> Tag |
COMMENTED_ON |
comment PostId resolves to a question |
COMMENTED_ON_ANSWER |
comment PostId resolves to an answer |
EARNED |
User.UserId -> Badge.Id, carries Date, Class |
LINKED_TO |
Question.PostId -> Question.RelatedPostId, carries LinkTypeId, CreationDate |
Tag parsing rule:
- replace
><with a separator - strip
<and> - split into tag names
Recommended Load Order¶
TagUserQuestionAnswerBadgeComment- build
Idindexes - build endpoint lookups
ASKEDANSWEREDHAS_ANSWERACCEPTED_ANSWERTAGGED_WITHCOMMENTED_ONCOMMENTED_ON_ANSWEREARNEDLINKED_TO
Exact OLTP Workload¶
Operation Mix¶
The source fixes the workload to:
read: 60%update: 20%insert: 10%delete: 10%
Read Targets¶
Each read chooses one of these target kinds:
userquestionanswerbadgetagcommentedge_sample
ArcadeDB SQL Mode Reads¶
- SQL-R1: edge sample
- SQL-R2: user by
Id
- SQL-R3: question by
Id
- SQL-R4: answer by
Id
- SQL-R5: tag by
Id
- SQL-R6: comment by
Id
- SQL-R7: badge by
Id
ArcadeDB Cypher Mode Reads¶
- CYP-R1: user outbound activity
- CYP-R2: question tags
- CYP-R3: answer comments
- CYP-R4: questions for a tag
- CYP-R5: comment target
- CYP-R6: badge owner
- CYP-R7: edge sample
MATCH ()-[r:ASKED|ANSWERED|HAS_ANSWER|ACCEPTED_ANSWER|TAGGED_WITH|COMMENTED_ON|COMMENTED_ON_ANSWER|EARNED|LINKED_TO]->()
RETURN r
LIMIT 1
Updates¶
ArcadeDB SQL Mode Updates¶
- SQL-U1: question score
- SQL-U2: answer score
- SQL-U3: comment score
- SQL-U4: tag count
- SQL-U5: user reputation
The SQL OLTP path does not update edges.
ArcadeDB Cypher Mode Updates¶
- CYP-U1: question score
- CYP-U2: answer score
- CYP-U3: comment score
- CYP-U4: tag count
- CYP-U5: user reputation
- CYP-U6:
ASKEDedge
MATCH (u:User {Id: %d})-[r:ASKED]->(q:Question {Id: %d})
SET r.CreationDate = coalesce(r.CreationDate, 0) + 1
- CYP-U7:
ANSWEREDedge
MATCH (u:User {Id: %d})-[r:ANSWERED]->(a:Answer {Id: %d})
SET r.CreationDate = coalesce(r.CreationDate, 0) + 1
- CYP-U8:
COMMENTED_ONedge
MATCH (c:Comment {Id: %d})-[r:COMMENTED_ON]->(q:Question {Id: %d})
SET r.Score = coalesce(r.Score, 0) + 1
- CYP-U9:
COMMENTED_ON_ANSWERedge
MATCH (c:Comment {Id: %d})-[r:COMMENTED_ON_ANSWER]->(a:Answer {Id: %d})
SET r.Score = coalesce(r.Score, 0) + 1
- CYP-U10:
EARNEDedge
- CYP-U11:
LINKED_TOedge
MATCH (q1:Question {Id: %d})-[r:LINKED_TO]->(q2:Question {Id: %d})
SET r.LinkTypeId = coalesce(r.LinkTypeId, 0) + 1
Inserts¶
ArcadeDB SQL Mode Inserts¶
- SQL-I1: synthetic user
- SQL-I2: synthetic question
INSERT INTO Question SET Id = :id, Title = 'Synthetic', Body = 'Synthetic body', Score = 0, CreationDate = :ts
- SQL-I3:
ASKEDedge
CREATE EDGE ASKED FROM (SELECT FROM User WHERE Id = :uid LIMIT 1) TO (SELECT FROM Question WHERE Id = :qid LIMIT 1) SET CreationDate = :ts
- SQL-I4: synthetic answer
INSERT INTO Answer SET Id = :id, Body = 'Synthetic answer', Score = 0, CreationDate = :ts, CommentCount = 0
- SQL-I5:
ANSWEREDedge
CREATE EDGE ANSWERED FROM (SELECT FROM User WHERE Id = :uid LIMIT 1) TO (SELECT FROM Answer WHERE Id = :aid LIMIT 1) SET CreationDate = :ts
- SQL-I6:
HAS_ANSWERedge
CREATE EDGE HAS_ANSWER FROM (SELECT FROM Question WHERE Id = :qid LIMIT 1) TO (SELECT FROM Answer WHERE Id = :aid LIMIT 1)
- SQL-I7: synthetic comment
- SQL-I8:
COMMENTED_ONedge
CREATE EDGE COMMENTED_ON FROM (SELECT FROM Comment WHERE Id = :cid LIMIT 1) TO (SELECT FROM Question WHERE Id = :qid LIMIT 1) SET CreationDate = :ts, Score = 0
- SQL-I9:
COMMENTED_ON_ANSWERedge
CREATE EDGE COMMENTED_ON_ANSWER FROM (SELECT FROM Comment WHERE Id = :cid LIMIT 1) TO (SELECT FROM Answer WHERE Id = :aid LIMIT 1) SET CreationDate = :ts, Score = 0
- SQL-I10:
TAGGED_WITHedge
CREATE EDGE TAGGED_WITH FROM (SELECT FROM Question WHERE Id = :qid LIMIT 1) TO (SELECT FROM Tag WHERE Id = :tid LIMIT 1)
- SQL-I11:
LINKED_TOedge
CREATE EDGE LINKED_TO FROM (SELECT FROM Question WHERE Id = :qid LIMIT 1) TO (SELECT FROM Question WHERE Id = :rid LIMIT 1) SET LinkTypeId = 1, CreationDate = :ts
- SQL-I12:
ACCEPTED_ANSWERedge
CREATE EDGE ACCEPTED_ANSWER FROM (SELECT FROM Question WHERE Id = :qid LIMIT 1) TO (SELECT FROM Answer WHERE Id = :aid LIMIT 1)
- SQL-I13: synthetic badge
- SQL-I14:
EARNEDedge
CREATE EDGE EARNED FROM (SELECT FROM User WHERE Id = :uid LIMIT 1) TO (SELECT FROM Badge WHERE Id = :bid LIMIT 1) SET Date = :ts, Class = 1
ArcadeDB Cypher Mode Inserts¶
- CYP-I1: synthetic user and question with
ASKED
CREATE (u:User {Id: %d, DisplayName: 'Synthetic', Reputation: 0, CreationDate: %d})
CREATE (q:Question {Id: %d, Title: 'Synthetic', Body: 'Synthetic body', Score: 0, CreationDate: %d})
CREATE (u)-[:ASKED {CreationDate: %d}]->(q)
- CYP-I2: synthetic answer with
ANSWEREDandHAS_ANSWER
MATCH (u:User {Id: %d}), (q:Question {Id: %d})
CREATE (a:Answer {Id: %d, Body: 'Synthetic answer', Score: 0, CreationDate: %d, CommentCount: 0})
CREATE (u)-[:ANSWERED {CreationDate: %d}]->(a)
CREATE (q)-[:HAS_ANSWER]->(a)
- CYP-I3: synthetic comment on question
MATCH (q:Question {Id: %d})
CREATE (c:Comment {Id: %d, Text: 'Synthetic comment', Score: 0, CreationDate: %d})
CREATE (c)-[:COMMENTED_ON {CreationDate: %d, Score: 0}]->(q)
- CYP-I4: synthetic comment on answer
MATCH (a:Answer {Id: %d})
CREATE (c:Comment {Id: %d, Text: 'Synthetic comment', Score: 0, CreationDate: %d})
CREATE (c)-[:COMMENTED_ON_ANSWER {CreationDate: %d, Score: 0}]->(a)
- CYP-I5:
TAGGED_WITHedge
- CYP-I6:
LINKED_TOedge
MATCH (q1:Question {Id: %d}), (q2:Question {Id: %d})
CREATE (q1)-[:LINKED_TO {LinkTypeId: 1, CreationDate: %d}]->(q2)
- CYP-I7:
ACCEPTED_ANSWERedge
- CYP-I8: synthetic badge with
EARNED
MATCH (u:User {Id: %d})
CREATE (b:Badge {Id: %d, Name: 'SyntheticBadge', Date: %d, Class: 1})
CREATE (u)-[:EARNED {Date: %d, Class: 1}]->(b)
Deletes¶
ArcadeDB SQL Mode Deletes¶
- SQL-D1: question
- SQL-D2: answer
- SQL-D3: comment
- SQL-D4: badge
- SQL-D5: user
- SQL-D6: tag
The SQL OLTP path does not delete edges.
ArcadeDB Cypher Mode Deletes¶
- CYP-D1: node delete template used for
Question,Answer,Comment,Badge,User, andTag
- CYP-D2:
ASKED
- CYP-D3:
ANSWERED
- CYP-D4:
HAS_ANSWER
- CYP-D5:
ACCEPTED_ANSWER
- CYP-D6:
TAGGED_WITH
- CYP-D7:
COMMENTED_ON
- CYP-D8:
COMMENTED_ON_ANSWER
- CYP-D9:
EARNED
- CYP-D10:
LINKED_TO
Current Repository Guidance¶
- ArcadeDB graph preload now uses
GraphBatchfor the initial node and edge load, driven by the configured--threadsvalue GraphBatchis the repository's recommended bulk graph ingest path from Python- Neo4j runs through a Dockerized server plus Python driver wrapper, with the benchmark
splitting the configured global memory/CPU budget between client and server via
--server-fraction - Traversal expectations should be read as directed unless the query pattern explicitly traverses both directions
- For cross-database comparability,
--threads 1is the recommended baseline --verify-single-thread-seriesuses DB-scoped baselines for deterministic repeatability, not strict cross-database equality
Supported Backends¶
arcadedb_sqlarcadedb_cypherneo4jladybug/ladybugdbgraphqliteduckdbsqlitepython_memory
Run¶
From bindings/python/examples:
python 09_stackoverflow_graph_oltp.py \
--dataset stackoverflow-tiny \
--db arcadedb_cypher \
--threads 1 \
--transactions 10000 \
--batch-size 10000 \
--mem-limit 4g \
--verify-single-thread-series
Key Options¶
--dataset: dataset size fromstackoverflow-tinythroughstackoverflow-full--db: graph backend to test--threads: worker threads for the OLTP run--transactions: number of OLTP operations--batch-size: preload XML insert batch size--mem-limit: Docker and JVM memory budget--server-fraction: for Neo4j, fraction of the total CPU/memory budget reserved for the server process--sqlite-profile: SQLite tuning profile when using SQLite-backed paths
Result Notes¶
du_mibis real post-run filesystem usagedisk_after_*fields are benchmark-reported logical size counters- Neo4j runs also record
client_rss_peak_*andserver_rss_peak_*, whilerss_peak_*represents the combined observed peak - Per-operation latency is derived from
latency_summary.ops.{50,95,99}with values converted from seconds to milliseconds - Operation totals come from
op_counts
Validation Checklist¶
After loading the graph, validate at least the following:
- every vertex type has the expected count
- every edge type has the expected count
- all vertex
Idindexes exist COMMENTED_ONonly targetsQuestionCOMMENTED_ON_ANSWERonly targetsAnswerLINKED_TOonly connectsQuestion -> Question