Example 08: Stack Overflow Tables (OLAP)¶
This example loads the Stack Overflow tables and runs a fixed analytical SQL suite.
Overview¶
Example 08 is the table OLAP benchmark.
- It loads the same eight tables used by Example 07.
- It builds
Idindexes after load. - It runs one fixed list of ten SQL queries.
- It can repeat the suite with fixed or shuffled order.
Supported Backends¶
arcadedb_sqlsqliteduckdbpostgresql
Run¶
From bindings/python/examples:
python 08_stackoverflow_tables_olap.py \
--dataset stackoverflow-tiny \
--db arcadedb_sql \
--batch-size 10000 \
--query-runs 3 \
--query-order shuffled \
--mem-limit 4g
Query Suite Definition¶
The source defines the workload once in QUERY_DEFS. For ArcadeDB SQL, SQLite, and
DuckDB, the exact SQL below is executed as written. PostgreSQL runs the same logical
queries after applying translate_sql_for_postgresql(), which mechanically quotes
identifiers such as User, Post, Id, and Count.
Exact SQL Queries¶
Q1. Post Type Counts¶
Q2. Top Users By Reputation¶
SELECT Id, DisplayName, Reputation
FROM User
WHERE Reputation IS NOT NULL
ORDER BY Reputation DESC, Id ASC
LIMIT 10
Q3. Top Questions By Score¶
Q4. Top Answers By Score¶
Q5. Most Commented Posts¶
SELECT PostId, count(*) as comment_count
FROM Comment
GROUP BY PostId
ORDER BY comment_count DESC, PostId ASC
LIMIT 10
Q6. Votes By Type¶
Q7. Top Badges¶
Q8. PostLinks By Type¶
Q9. PostHistory By Type¶
SELECT PostHistoryTypeId, count(*) as count
FROM PostHistory
GROUP BY PostHistoryTypeId
ORDER BY PostHistoryTypeId
Q10. Top Tags By Count¶
PostgreSQL Translation Rule¶
The PostgreSQL runner does not define a separate query suite. It rewrites the exact SQL above by quoting a fixed identifier list. In source, the translation rule is:
with the identifiers:
UserPostCommentBadgeVotePostLinkTagPostHistoryIdReputationCreationDateDisplayNamePostTypeIdScoreViewCountPostIdVoteTypeIdNameLinkTypeIdPostHistoryTypeIdTagNameCount
That means the benchmark intent stays identical across engines, while PostgreSQL gets reserved-word-safe SQL.
Load And Index Notes¶
The benchmark measures query execution, but the setup path still matters for context.
- ArcadeDB loads documents through async SQL inserts.
- SQLite uses batched inserts.
- DuckDB bulk-loads CSV via
COPY. - PostgreSQL bulk-loads CSV via
COPY ... FROM STDIN. - After load, all engines build
Idindexes for the benchmark tables.
Result Notes¶
--query-runscontrols how many measured executions happen per query.--query-order fixedpreserves the source order above.--query-order shuffledrandomizes query order between runs.- Query latency should be interpreted together with the repeated-run policy and query order.