Importer API¶
The Importer class and convenience functions provide high-performance data import
capabilities for ArcadeDB. For full-database migrations, use ArcadeDB's native JSONL
export/import via the IMPORT DATABASE file://... SQL command (see JSONL example
below).
Overview¶
The importer uses streaming parsers for memory efficiency and commits records in batches (controlled by the commitEvery option; default 5000 records per commit) for performance. It can import data as documents, vertices, or edges, depending on your schema needs.
Supported Formats:
- CSV/TSV: Comma- or tab-separated values (recommended for bulk imports)
- ArcadeDB JSONL export/import: use IMPORT DATABASE file://... via SQL for full-database moves (see the example below)
- XML: Limited support via the Java importer (not recommended for production use)
Module Functions¶
Convenience functions for common import tasks without creating an Importer instance.
These call the underlying Java importer (CSV/TSV) or native SQL for full-database JSONL
imports.
import_csv(database, file_path, type_name, **options)¶
Import CSV or TSV files as documents, vertices, or edges.
Parameters:
- database (Database): Database instance
- file_path (str): Path to CSV/TSV file
- type_name (str): Target type name
- **options: Format-specific options:
  - delimiter (str): Field delimiter (default: ','; use '\t' for TSV)
  - header (bool): File has a header row (default: True)
  - commitEvery (int): Records per transaction (default: 5000)
  - import_type (str): "documents" (default), "vertices", or "edges"
  - typeIdProperty (str): Unique ID column when importing vertices/edges (e.g., "id")
  - typeIdType (str): Type of the ID column (e.g., "String", "Long")
  - edgeFromField (str): Column name for edge source IDs/RIDs (REQUIRED for edges)
  - edgeToField (str): Column name for edge target IDs/RIDs (REQUIRED for edges)
  - verboseLevel (int): Java importer logging level, 0-3 (default: 2)
Examples:
Import as Documents:
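stats = arcadedb.import_csv(db, "people.csv", "Person")  # import_type defaults to "documents"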
Import as Vertices:
stats = arcadedb.import_csv(
    db, "users.csv", "User",
    import_type="vertices",
    typeIdProperty="id",
    typeIdType="String",
    commitEvery=500
)
Import as Edges:
stats = arcadedb.import_csv(
    db, "follows.csv", "Follows",
    import_type="edges",
    edgeFromField="from_rid",
    edgeToField="to_rid",
    # REQUIRED to enable FK resolution when using IDs (and required by the API)
    typeIdProperty="id",
    typeIdType="String"
)
Full-Database Import (JSONL):
import arcadedb_embedded as arcadedb

with arcadedb.create_database("./restored_db") as db:
    db.command("sql", "IMPORT DATABASE file:///export.jsonl.tgz WITH commitEvery = 50000")
For XML imports, use the Importer class directly (see below) with format_type='xml'.
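A minimal sketch (the file name and type name are illustrative; the Importer class and import_file are documented in the next section):

from arcadedb_embedded import Importer

importer = Importer(db)  # db: an already-open Database instance
stats = importer.import_file(
    "data.xml",               # illustrative file name
    format_type="xml",
    import_type="documents",
    type_name="Record"        # illustrative type name
)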
Importer Class¶
For more control over the import process, use the Importer class directly.
Importer(database)¶
Constructor:
Parameters:
- database (Database): Database instance to import into
Example:
import arcadedb_embedded as arcadedb
from arcadedb_embedded import Importer

db = arcadedb.open_database("./mydb")
importer = Importer(db)
import_file(file_path, format_type=None, import_type="documents", type_name=None, **options)¶
Import data from a file with auto-detection or explicit format specification.
Parameters:
- file_path (str): Path to the file to import
- format_type (Optional[str]): Format type ('csv', 'xml'); auto-detected from the file extension if None
- import_type (str): "documents" (default), "vertices", or "edges"
- type_name (Optional[str]): Target type name
- **options: Format-specific options (e.g., delimiter, header, commitEvery, typeIdProperty, edgeFromField, edgeToField)
Returns:
Dict[str, Any]: Import statistics
Raises:
ArcadeDBError: If file not found, format unsupported, or import fails
Example:
importer = Importer(db)

# Import CSV as vertices
stats = importer.import_file(
    file_path="users.csv",
    import_type="vertices",
    type_name="User",
    typeIdProperty="id",
    commitEvery=1000
)

# Import CSV as edges (RID-based)
stats = importer.import_file(
    file_path="follows.csv",
    import_type="edges",
    type_name="Follows",
    edgeFromField="from_rid",
    edgeToField="to_rid",
    typeIdProperty="id"
)
Format-Specific Details¶
CSV Format¶
File Structure: a header row (when header=True, the default) followed by one record per line, for example:
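name,age,email
Alice,30,alice@example.com
Bob,25,bob@example.com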
Options:
- delimiter (str): Field separator (default: ','); use '\t' for tab-separated (TSV)
- header (bool): First row contains column names (default: True); if False, columns are named col_0, col_1, etc.
- import_type (str): "documents", "vertices", or "edges"
- typeIdProperty (str): Unique ID column for vertices/edges
- typeIdType (str): Type of the ID column (e.g., String, Long)
- edgeFromField (str): Column for edge source IDs/RIDs
- edgeToField (str): Column for edge target IDs/RIDs
Type Inference: String values are automatically converted (illustrated below):
- "true", "false" → boolean
- Valid integers → int
- Valid floats → float
- Empty strings → None
- Everything else → string
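For instance, a hypothetical row true,42,4.5,,Alice under the header active,age,score,nickname,name would import as the equivalent of:

record = {
    "active": True,    # "true" → boolean
    "age": 42,         # valid integer → int
    "score": 4.5,      # valid float → float
    "nickname": None,  # empty string → None
    "name": "Alice",   # everything else → string
}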
Documents Example:
# people.csv:
# name,age,email
# Alice,30,alice@example.com
# Bob,25,bob@example.com
stats = arcadedb.import_csv(db, "people.csv", "Person")
Vertices Example:
stats = arcadedb.import_csv(
    db, "users.csv", "User",
    import_type="vertices",
    typeIdProperty="id",
    delimiter=','
)
Edges Example:
# relationships.csv:
# from_rid,to_rid,type,since
# #1:0,#1:1,FRIEND,2020
# #1:1,#1:2,COLLEAGUE,2021

# First create the schema (the Schema API is preferred for embedded use)
db.schema.create_edge_type("Relationship")

# Then import the edges
stats = arcadedb.import_csv(
    db, "relationships.csv", "Relationship",
    import_type="edges",
    edgeFromField="from_rid",
    edgeToField="to_rid",
    # Required when resolving vertices by ID
    typeIdProperty="id",
    typeIdType="String",
)
Important for Edge Imports:
- CSV must have a header row (header=True)
- Source and target columns must contain valid RIDs (e.g., #1:0)
- The edge type must exist in the schema before import
- Additional columns become edge properties
Performance Tips¶
Batch Size¶
The commitEvery parameter controls transaction size:
# Smaller batches (safer, slower)
stats = arcadedb.import_csv(db, "data.csv", "Data", commitEvery=100)
# Larger batches (faster, more memory)
stats = arcadedb.import_csv(db, "data.csv", "Data", commitEvery=5000)
Guidelines (see the sketch after this list):
- Small files (<10K records): 1000-2000
- Medium files (10K-1M records): 2000-5000
- Large files (>1M records): 5000-10000
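As a rough illustration, you could pick commitEvery from an estimated record count. The helper below is hypothetical, and the ~100 bytes/row figure is an assumption; adjust for your data:

import os

def pick_commit_every(path):
    # Hypothetical heuristic: assume roughly 100 bytes per CSV row
    approx_rows = os.path.getsize(path) // 100
    if approx_rows < 10_000:
        return 2000   # small files
    if approx_rows < 1_000_000:
        return 5000   # medium files
    return 10000      # large files

stats = arcadedb.import_csv(db, "data.csv", "Data", commitEvery=pick_commit_every("data.csv"))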
Memory Efficiency¶
The importer uses streaming parsers:
- CSV: Line-by-line processing (very efficient)
- XML: Streaming parser; keep attributes consistent across rows
Schema Pre-Creation¶
Create types before import for better performance:
# Create schema first (Schema API is auto-transactional)
db.schema.create_document_type("Person")
db.schema.create_property("Person", "email", "STRING")
db.schema.create_index("Person", ["email"], unique=True)
# Then import (type already exists)
stats = arcadedb.import_csv(db, "people.csv", "Person")
Indexing Strategy¶
Create indexes AFTER import for faster loading:
# Import without indexes (vertices)
stats = arcadedb.import_csv(
    db, "users.csv", "User",
    import_type="vertices",
    typeIdProperty="id"
)
# Create indexes after import (Schema API)
db.schema.create_index("User", ["email"], unique=True)
db.schema.create_index("User", ["username"], unique=True)
Error Handling¶
The importer continues on row-level errors and reports them in statistics:
stats = arcadedb.import_csv(
    db, "data.csv", "Data",
    verboseLevel=2  # Log errors as they occur
)

if stats['errors'] > 0:
    print(f"Warning: {stats['errors']} records failed to import")

print(f"Successfully imported: {stats['documents'] + stats['vertices']}")
Common Errors:
- Type mismatch: Value doesn't match schema constraint
- Missing required fields: Schema requires field not in data
- Invalid RIDs (edges): Referenced vertex doesn't exist
- JSON parse errors: Malformed JSON
- Encoding issues: Non-UTF-8 characters
Error Recovery:
try:
    stats = importer.import_file("data.csv", type_name="Data")
except arcadedb.ArcadeDBError as e:
    # File not found, unsupported format, or critical failure
    print(f"Import failed: {e}")
Complete Examples¶
Multi-Format Import Pipeline¶
import arcadedb_embedded as arcadedb

# Open or create database (auto-closes)
with arcadedb.create_database("./import_demo") as db:
    # Create schema with the embedded API
    db.schema.create_vertex_type("Person")
    db.schema.create_edge_type("Knows")

    # Import vertices from CSV
    stats = arcadedb.import_csv(
        db, "people.csv", "Person",
        import_type="vertices",
        typeIdProperty="id",
    )
    print(f"People: {stats['vertices']}")

    # Import edges from CSV
    stats2 = arcadedb.import_csv(
        db, "relationships.csv", "Knows",
        import_type="edges",
        edgeFromField="person1_id",
        edgeToField="person2_id",
        typeIdProperty="id",
        typeIdType="String",
    )
    print(f"Relationships: {stats2['edges']}")
Large-Scale Import with Progress Tracking¶
import arcadedb_embedded as arcadedb
import time

with arcadedb.create_database("./large_import") as db:
    # Create schema with the embedded API
    db.schema.create_vertex_type("Product")

    # Import with progress monitoring
    print("Starting import...")
    start = time.time()

    stats = arcadedb.import_csv(
        db, "products.csv", "Product",
        import_type="vertices",
        typeIdProperty="id",
        commitEvery=10000,  # Large batches for performance
        verboseLevel=2      # Show importer logs
    )

    elapsed = time.time() - start
    print(f"\nImport complete!")
    print(f"Records: {stats['vertices']:,}")
    print(f"Errors: {stats['errors']}")
    print(f"Time: {elapsed:.2f}s")
    print(f"Rate: {stats['vertices'] / elapsed:.0f} records/sec")

    # Create indexes after import (the indexed column here is illustrative)
    db.schema.create_index("Product", ["id"], unique=True)
ArcadeDB JSONL Import (native export)¶
Use ArcadeDB's built-in SQL command to import JSONL exports created by EXPORT DATABASE.
import arcadedb_embedded as arcadedb

import_path = "/exports/mydb.jsonl.tgz"  # JSONL export created by ArcadeDB

# Use a context manager for the database lifecycle
with arcadedb.create_database("./restored_db") as db:
    # SQL IMPORT DATABASE accepts optional tuning parameters
    db.command("sql", f"IMPORT DATABASE file://{import_path} WITH commitEvery = 50000")
    print("Import complete!")
This approach preserves the full database (schema + data) and is the recommended path for large migrations or round-tripping ArcadeDB exports.
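For reference, a matching export can be produced with ArcadeDB's EXPORT DATABASE SQL command. A minimal sketch (the paths are illustrative, and we assume open_database supports the same context-manager lifecycle as create_database):

import arcadedb_embedded as arcadedb

with arcadedb.open_database("./mydb") as db:
    db.command("sql", "EXPORT DATABASE file:///exports/mydb.jsonl.tgz")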
See Also¶
- Data Import Guide - Comprehensive import strategies
- Database API - Database operations
- Graph Operations Guide - Working with graph data
- Import Examples - More practical examples