
Data Import Examples

This page covers examples for importing data into ArcadeDB from various sources.

Before running any import example, download the datasets using the Dataset Downloader.

CSV Import Examples

Import Tabular Data (Documents)

Example 04 - CSV Import: Documents

Learn how to import CSV files as documents:

  • Using the Importer API
  • Defining schema mappings
  • Handling data transformations
  • Performance optimization

Import Graph Data

Example 05 - CSV Import: Graph Database

Learn how to import CSV files as graph vertices and edges:

  • Creating vertices from CSV
  • Creating edges from relationships
  • Bulk import optimization
  • Real-world graph migration

Importer API

The ArcadeDB Python bindings provide an Importer class, plus the import_csv convenience helper used in the examples below, for bulk-loading CSV data as documents, vertices, or edges.

Basic CSV Import

import arcadedb_embedded as arcadedb
from arcadedb_embedded import import_csv

with arcadedb.create_database("./import_demo") as db:
    # Convenience helper: auto-detect CSV, create schema on-the-fly
    stats = import_csv(
        db,
        file_path="data.csv",
        type_name="MyType",
        commitEvery=5000,
    )
    print(stats)
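The comment above says import_csv detects the CSV layout and creates the schema on the fly. This page does not describe the exact detection rules, so the following stdlib sketch is only a mental model of inferring column types from a sample of rows. infer_type and infer_schema are hypothetical helpers, not part of arcadedb_embedded.

```python
import csv
import io

# Hypothetical helpers illustrating type inference; NOT the importer's actual rules.

def infer_type(values):
    """Return the narrowest type name that fits every non-empty sample value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "STRING"
    # Try integer first, then floating point
    for type_name, parse in (("INTEGER", int), ("FLOAT", float)):
        try:
            for v in non_empty:
                parse(v)
            return type_name
        except ValueError:
            continue
    if all(v.lower() in ("true", "false") for v in non_empty):
        return "BOOLEAN"
    return "STRING"

def infer_schema(csv_text, sample_size=100):
    """Infer a column -> type-name mapping from the first sample_size rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [row for _, row in zip(range(sample_size), reader)]
    # Missing cells (None) are treated as empty strings
    return {col: infer_type([r[col] or "" for r in rows]) for col in reader.fieldnames}

sample = "id,name,price,inStock\n1,Lamp,19.99,true\n2,Desk,120,false\n"
schema = infer_schema(sample)
print(schema)  # {'id': 'INTEGER', 'name': 'STRING', 'price': 'FLOAT', 'inStock': 'BOOLEAN'}
```

Sampling only the first rows keeps inference cheap, but a value further down the file can still contradict the guess, which is one reason to declare the schema explicitly, as in the next example.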

Import with Schema Types

import arcadedb_embedded as arcadedb
from arcadedb_embedded import import_csv

with arcadedb.create_database("./import_demo") as db:
    # Define schema up front so imports get typed correctly
    with db.transaction():
        db.command("sql", "CREATE DOCUMENT TYPE Product")
        db.command("sql", "CREATE PROPERTY Product.id INTEGER")
        db.command("sql", "CREATE PROPERTY Product.name STRING")
        db.command("sql", "CREATE PROPERTY Product.price FLOAT")
        db.command("sql", "CREATE PROPERTY Product.inStock BOOLEAN")

    stats = import_csv(
        db,
        file_path="products.csv",
        type_name="Product",
        commitEvery=5000,
    )
    print(stats)
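Declaring properties up front matters because every CSV field arrives as text. The sketch below shows the kind of per-row conversion a typed import implies, assuming a hand-written mapping from the property types declared above to Python converters. CONVERTERS, parse_bool, and coerce_row are hypothetical and do not describe how arcadedb_embedded converts values internally.

```python
import csv
import io

def parse_bool(text):
    """Parse a textual boolean; the accepted spellings are an assumption."""
    lowered = text.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise ValueError(f"not a boolean: {text!r}")

# Hypothetical mapping from declared property types to Python converters
CONVERTERS = {"INTEGER": int, "STRING": str, "FLOAT": float, "BOOLEAN": parse_bool}

PRODUCT_SCHEMA = {"id": "INTEGER", "name": "STRING", "price": "FLOAT", "inStock": "BOOLEAN"}

def coerce_row(row, schema):
    """Convert one CSV row (all strings) to typed values; empty cells become None."""
    typed = {}
    for column, raw in row.items():
        declared = schema.get(column)
        if raw is None or raw == "":
            typed[column] = None
        elif declared is None:
            typed[column] = raw  # undeclared column: keep the original text
        else:
            typed[column] = CONVERTERS[declared](raw)
    return typed

data = "id,name,price,inStock\n7,Lamp,19.99,true\n8,Desk,,false\n"
rows = [coerce_row(r, PRODUCT_SCHEMA) for r in csv.DictReader(io.StringIO(data))]
print(rows)
```

A value that cannot be converted (for example "abc" in the id column) raises ValueError here, surfacing bad data early instead of storing it with the wrong type.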

Bulk Import for Performance

import arcadedb_embedded as arcadedb
from arcadedb_embedded import import_csv

with arcadedb.create_database("./import_demo") as db:
    # Import in batches: import_csv commits a transaction every commitEvery records
    stats = import_csv(
        db,
        file_path="large_dataset.csv",
        type_name="LargeType",
        commitEvery=10000,  # Commit every 10k records
        parallel=4,  # Optional: parallel importer threads
    )
    print(stats)
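commitEvery bounds how many records sit in a single transaction. The loop below is a conceptual sketch of that batching, not the importer's code: batched_import and its commit callback are hypothetical stand-ins for writing records and committing a transaction.

```python
def batched_import(records, commit_every, commit):
    """Buffer records and call commit(batch) every commit_every records.

    Returns the number of commits performed, including a final partial batch.
    """
    if commit_every < 1:
        raise ValueError("commit_every must be >= 1")
    batch, commits = [], 0
    for record in records:
        batch.append(record)
        if len(batch) == commit_every:
            commit(batch)
            commits += 1
            batch = []
    if batch:  # flush the remainder so no records are left uncommitted
        commit(batch)
        commits += 1
    return commits

sizes = []
n = batched_import(range(25_000), commit_every=10_000, commit=lambda b: sizes.append(len(b)))
print(n, sizes)  # 3 [10000, 10000, 5000]
```

In general, a larger commitEvery means fewer commits and less per-commit overhead, at the cost of holding more uncommitted records in each transaction; a smaller value keeps transactions small but commits more often.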

Import Graph Data

Create Vertices from CSV

import arcadedb_embedded as arcadedb

with arcadedb.create_database("./graph_import_demo") as db:
    # Import vertices (CSV columns become properties)
    stats = arcadedb.import_csv(
        db,
        file_path="users.csv",
        type_name="User",
        import_type="vertices",
        typeIdProperty="userId",
        commitEvery=5000,
    )
    print(stats)
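typeIdProperty names the column that identifies each vertex, and the edge import below looks vertices up by it, so its values should normally be unique. A small pre-flight check you could run on the CSV before importing; find_duplicate_ids is a hypothetical helper, and the userId column matches the example above.

```python
import csv
import io
from collections import Counter

def find_duplicate_ids(csv_text, id_column):
    """Return {id: count} for every id value that appears more than once."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row[id_column] for row in reader)
    return {value: n for value, n in counts.items() if n > 1}

users = "userId,name\nu1,Ada\nu2,Linus\nu1,Grace\n"
dupes = find_duplicate_ids(users, "userId")
print(dupes)  # {'u1': 2}
```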

Create Edges from CSV

import arcadedb_embedded as arcadedb

with arcadedb.open_database("./graph_import_demo") as db:
    # Import edges: follower_id/following_id values are resolved to the vertices
    # imported above by matching their userId (the typeIdProperty)
    stats = arcadedb.import_csv(
        db,
        file_path="follows.csv",
        type_name="Follows",
        import_type="edges",
        edgeFromField="follower_id",
        edgeToField="following_id",
        typeIdProperty="userId",
        commitEvery=5000,
    )
    print(stats)
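Conceptually, the edge import maps each follower_id and following_id value to the vertex whose userId matches, then links the two. A pure-Python sketch of that lookup, with vertices modeled as a dict keyed by id. resolve_edges is hypothetical, and reporting rows with missing endpoints (rather than skipping or failing) is a choice of this sketch that may differ from what the real importer does.

```python
import csv
import io

def resolve_edges(edge_csv, vertices_by_id, from_field, to_field):
    """Split edge rows into resolved (source, target) pairs and unresolved rows."""
    resolved, unresolved = [], []
    for row in csv.DictReader(io.StringIO(edge_csv)):
        source = vertices_by_id.get(row[from_field])
        target = vertices_by_id.get(row[to_field])
        if source is None or target is None:
            unresolved.append(row)  # at least one endpoint id has no matching vertex
        else:
            resolved.append((source, target))
    return resolved, unresolved

# Vertices as they might look after the vertex import, keyed by userId
users = {"u1": {"userId": "u1", "name": "Ada"}, "u2": {"userId": "u2", "name": "Linus"}}
follows = "follower_id,following_id\nu1,u2\nu2,u1\nu1,u9\n"

edges, missing = resolve_edges(follows, users, "follower_id", "following_id")
print(len(edges), missing)  # 2 [{'follower_id': 'u1', 'following_id': 'u9'}]
```

This is also why the vertices must be imported before the edges: until a vertex exists, no edge can be resolved to it.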

Performance Tips

Optimize Import Speed

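  1. Use Transactions: Batch many inserts into one transaction instead of committing each record
  2. Drop Indexes: Drop heavy indexes before a bulk import and recreate them afterwards, as in the example below
  3. Use Parallel Processing: Split large files and import the parts in parallel, or use the parallel option shown earlier
  4. Tune commitEvery: Adjust commitEvery (e.g., 1000-10000) to trade throughput against transaction size; larger values mean fewer commits but bigger transactions
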
import arcadedb_embedded as arcadedb
from arcadedb_embedded import Importer

with arcadedb.open_database("./import_demo") as db:
    # Example: optimized bulk import into an existing database where MyType
    # and its index already exist (replace with your own type and index names)
    db.command("sql", "DROP INDEX `MyType[id]`")

    importer = Importer(db)

    # Bulk import (importer handles transactions internally)
    importer.import_file(
        file_path="huge_file.csv",
        type_name="MyType",
        commitEvery=5000
    )

    # Recreate indexes after import (schema ops are auto-transactional)
    db.command("sql", "CREATE INDEX ON MyType (id) UNIQUE")
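Tip 3 suggests splitting large files so the parts can be imported in parallel. A minimal stdlib sketch that splits CSV text into a given number of parts with near-equal row counts and repeats the header in each part, so every chunk can be imported on its own. split_csv is a hypothetical helper, and it works on in-memory text for brevity; for a multi-gigabyte file you would stream rows into chunk files instead.

```python
import csv
import io

def split_csv(csv_text, parts):
    """Split CSV text into `parts` chunks of near-equal row counts, each with the header."""
    if parts < 1:
        raise ValueError("parts must be >= 1")
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(reader)
    size, extra = divmod(len(rows), parts)
    chunks, start = [], 0
    for i in range(parts):
        # Spread the remainder over the first `extra` chunks
        end = start + size + (1 if i < extra else 0)
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows[start:end])
        chunks.append(out.getvalue())
        start = end
    return chunks

data = "id,name\n" + "".join(f"{i},item{i}\n" for i in range(10))
chunks = split_csv(data, 4)
print([c.count("\n") - 1 for c in chunks])  # data rows per chunk: [3, 3, 2, 2]
```

If the parallel option of import_csv fits your case, it may make manual splitting unnecessary. When splitting graph data, import all vertex chunks before any edge chunks so that edge endpoints can be resolved.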

Additional Resources

Source Code

View the complete import example source code: