Example 07: Stack Overflow Tables (OLTP)¶
This example loads the Stack Overflow XML tables into a relational-style layout and runs a point-oriented CRUD workload.
Overview¶
Example 07 is the table OLTP benchmark for the Stack Overflow dataset.
- It loads eight XML-derived tables.
- It uses a fixed mixed workload: 60% read, 20% update, 10% insert, 10% delete.
- Every operation picks one table and one target row at random.
- It measures preload time, index time, OLTP latency, throughput, disk usage, and RSS.
Source Tables¶
The example defines these tables directly in TABLE_DEFS.
| Table | Source file | Columns |
|---|---|---|
User |
Users.xml |
Id, Reputation, CreationDate, DisplayName, LastAccessDate, WebsiteUrl, Location, AboutMe, Views, UpVotes, DownVotes, AccountId |
Post |
Posts.xml |
Id, PostTypeId, ParentId, AcceptedAnswerId, CreationDate, Score, ViewCount, Body, OwnerUserId, LastEditorUserId, LastEditorDisplayName, LastEditDate, LastActivityDate, Title, Tags, AnswerCount, CommentCount, FavoriteCount, ClosedDate, CommunityOwnedDate |
Comment |
Comments.xml |
Id, PostId, Score, Text, CreationDate, UserId |
Badge |
Badges.xml |
Id, UserId, Name, Date, Class, TagBased |
Vote |
Votes.xml |
Id, PostId, VoteTypeId, CreationDate, UserId, BountyAmount |
PostLink |
PostLinks.xml |
Id, CreationDate, PostId, RelatedPostId, LinkTypeId |
Tag |
Tags.xml |
Id, TagName, Count, ExcerptPostId, WikiPostId |
PostHistory |
PostHistory.xml |
Id, PostHistoryTypeId, PostId, RevisionGUID, CreationDate, UserId, UserDisplayName, Comment, Text, CloseReasonId |
Each backend also creates a unique Id index for every table.
Supported Backends¶
arcadedb_sqlsqliteduckdbpostgresql
Run¶
From bindings/python/examples:
python 07_stackoverflow_tables_oltp.py \
--dataset stackoverflow-tiny \
--db arcadedb_sql \
--threads 1 \
--transactions 10000 \
--batch-size 10000 \
--mem-limit 4g \
--verify-single-thread-series
Workload Model¶
The operation planner is fixed in source as:
The selected table for each operation is random across all eight tables.
Read Projections¶
Reads always filter on Id, but the projected columns vary by table. The source code
uses get_read_projection(), which chooses one of two projections per table.
| Table | Projection A | Projection B |
|---|---|---|
User |
Id, Reputation, CreationDate |
Id, Reputation, DisplayName |
Post |
Id, PostTypeId, ParentId |
Id, PostTypeId, AcceptedAnswerId |
Comment |
Id, PostId, Score |
Id, PostId, Text |
Badge |
Id, UserId, Name |
Id, UserId, Date |
Vote |
Id, PostId, VoteTypeId |
Id, PostId, CreationDate |
PostLink |
Id, CreationDate, PostId |
Id, CreationDate, RelatedPostId |
Tag |
Id, TagName, Count |
Id, TagName, ExcerptPostId |
PostHistory |
Id, PostHistoryTypeId, PostId |
Id, PostHistoryTypeId, RevisionGUID |
Update Targets¶
The update column is the first non-Id field whose declared type is INTEGER or
BOOLEAN.
| Table | Updated column |
|---|---|
User |
Reputation |
Post |
PostTypeId |
Comment |
PostId |
Badge |
UserId |
Vote |
PostId |
PostLink |
PostId |
Tag |
Count |
PostHistory |
PostHistoryTypeId |
Insert Payloads¶
Insert rows are synthetic. The source uses build_synthetic_row():
Idis the next generated integer for that table.INTEGERfields get a random value in[1, 1000].BOOLEANfields get a random boolean.DATETIMEfields get the current UTC timestamp.- string fields get
synthetic_<Table>_<Field>_<Id>.
Query Suite¶
ArcadeDB SQL¶
The ArcadeDB path issues SQL directly.
ArcadeDB Read¶
ArcadeDB Update¶
ArcadeDB Insert¶
The same statement shape is used both for preload batches and OLTP inserts.
ArcadeDB Delete¶
SQLite¶
The SQLite path uses parameterized SQL with quoted identifiers.
SQLite Read¶
SQLite Update¶
SQLite Insert¶
SQLite Delete¶
DuckDB¶
The DuckDB OLTP path uses the same SQL shapes as SQLite, with explicit transactions.
DuckDB Read¶
DuckDB Update¶
DuckDB Insert¶
DuckDB Delete¶
PostgreSQL¶
The PostgreSQL path uses quoted identifiers and %s parameters.
Read¶
The projection_sql string is built by quoting each projected column returned by
get_read_projection().
Update¶
Insert¶
Delete¶
Preload Paths¶
The benchmark does not use the same preload mechanism for every backend.
- ArcadeDB preload uses async
INSERT INTO ... SET ...SQL. - SQLite preload uses batched
INSERT INTO ... VALUES ...statements. - DuckDB preload uses per-table CSV materialization followed by:
- PostgreSQL preload writes CSV and then streams it through:
Those load-path differences matter for ingest timing, but they do not change the OLTP CRUD statements listed above.
Result Notes¶
du_mibis real post-run filesystem usage.disk_after_*fields are benchmark-reported logical size counters.- Per-operation latency is summarized from the recorded
read,update,insert, anddeletelatency buckets. --verify-single-thread-serieschecks deterministic repeatability for one backend configuration, not strict equality across different engines.