Multi-Platform JRE Bundling Architecture¶
This document describes the build architecture for creating platform-specific Python wheels with bundled JRE for ArcadeDB Embedded.
Overview¶
Goal: Distribute a single arcadedb-embedded package that works on 4 platforms with zero Java installation required.
Achievement: 4 platform-specific wheels (~68MB each, compressed), each with a bundled platform-specific JRE, built and tested on GitHub Actions using native runners.
Supported Platforms¶
| Platform | Wheel Size | JRE Size | Runner | Build Method | Notes |
|---|---|---|---|---|---|
| linux/amd64 | ~68M | ~60M | ubuntu-24.04 | Docker native | Most common Linux platform |
| linux/arm64 | ~68M | ~60M | ubuntu-24.04-arm | Docker native | ARM64 servers, Raspberry Pi |
| darwin/arm64 | ~68M | ~60M | macos-15 | Native build | Apple Silicon Macs (2020+) |
| windows/amd64 | ~68M | ~60M | windows-2025 | Native build | Windows x86_64 |
All supported platforms:
- ✅ 271 tests passing
- ✅ 31.7M JARs (83 files, identical across platforms)
- ✅ All native runners (no QEMU emulation)
- ✅ Reproducible builds (pinned runner versions)
Architecture¶
Build Strategy¶
We use a hybrid build approach to create platform-specific wheels:
- Linux platforms: Docker native builds
  - linux/amd64: Native Docker on ubuntu-24.04
  - linux/arm64: Native Docker on ubuntu-24.04-arm (GitHub ARM64 runner)
  - Builds platform-specific JRE via jlink
- macOS platform: Native builds
  - Uses platform-specific GitHub Actions runner
  - Native jlink creates correct JRE for the platform
  - Pre-filtered JARs from artifact (eliminates glob issues)
- Windows platform: Native builds
  - Uses platform-specific GitHub Actions runner
  - Native jlink creates correct JRE for the platform
  - Pre-filtered JARs from artifact (eliminates glob issues)
Critical: All wheels are platform-specific (not py3-none-any). This is achieved by:
- setup.py with BinaryDistribution class: Overrides default behavior
- Platform-specific JRE: Each wheel contains native binaries
- Platform tags: Set automatically by setuptools for the build platform once the distribution is marked as binary
Why Platform-Specific Wheels Matter¶
Initially, we were creating py3-none-any wheels because:
- pyproject.toml alone doesn't communicate platform-specificity to setuptools
- Without setup.py, setuptools assumes a "pure Python" package
- Result: All platforms got the same wheel name, so uv pip couldn't select the correct one
The Solution - setup.py:
from setuptools import setup
from setuptools.dist import Distribution


class BinaryDistribution(Distribution):
    """Distribution which always forces a binary package with platform name"""

    def has_ext_modules(self):
        return True  # Tells setuptools: "I have platform-specific content!"


setup(
    distclass=BinaryDistribution,
    # ... rest of setup
)
This simple class tells setuptools "this package has binary content" which:
- Triggers platform-specific wheel naming
- Makes uv pip download the correct wheel for each platform
- Enables platform tags like macosx_11_0_arm64, manylinux_2_17_x86_64, etc.
Without setup.py: All platforms → arcadedb_embedded-X.Y.Z-py3-none-any.whl (wrong!)
With setup.py: Each platform → arcadedb_embedded-X.Y.Z-py3-none-<platform>.whl (correct!)
See bindings/python/setup.py for the complete implementation.
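One way to catch a regression to a pure-Python wheel is to inspect the platform tag in the built file name. The following is a minimal sketch, not code from the repository: `wheel_platform_tag` and `is_platform_specific` are hypothetical helpers, and it assumes only the standard wheel naming scheme `{name}-{version}(-{build})-{python}-{abi}-{platform}.whl`.

```python
def wheel_platform_tag(filename: str) -> str:
    """Return the platform tag (the last dash-separated field) of a wheel file name."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename}")
    # Wheel names have 5 fields, or 6 when an optional build tag is present.
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel file name layout: {filename}")
    return parts[-1]


def is_platform_specific(filename: str) -> bool:
    """A wheel tagged 'any' is pure Python; anything else is platform-specific."""
    return wheel_platform_tag(filename) != "any"
```

A CI step could call `is_platform_specific` on each file in `dist/` and fail the build if it returns `False`.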
Why This Works¶
Key Insight: In this pipeline, jlink only creates a JRE for the platform it is running on, because it links against the modules of the host JDK.
- Running jlink on macOS-amd64 → Creates macOS-amd64 JRE ✅
- Running jlink in Docker on linux-x64 → Creates linux-x64 JRE ✅
- Running jlink in Docker on linux-arm64 → Creates linux-arm64 JRE ✅
- Running jlink with --platform linux/arm64 on x64 → Still creates linux-x64 JRE ❌
Solution: Run builds on native hardware for each platform.
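To confirm which platform a bundled JRE actually targets, a build could read the `release` file at the root of the runtime image. The sketch below assumes that the jlink output contains a `release` file with `OS_NAME` and `OS_ARCH` entries, as JDK runtime images typically do; the helper names are hypothetical and this check is not part of the documented pipeline.

```python
def parse_release(text: str) -> dict[str, str]:
    """Parse KEY="value" lines from a JDK-style release file into a dict."""
    props: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip().strip('"')
    return props


def jre_target(release_text: str) -> tuple[str, str]:
    """Return (OS_NAME, OS_ARCH), or 'unknown' where an entry is missing."""
    props = parse_release(release_text)
    return props.get("OS_NAME", "unknown"), props.get("OS_ARCH", "unknown")
```

In CI, the text would come from something like `Path("src/arcadedb/jre/release").read_text()`, and the result could be compared against the matrix platform before the wheel is built.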
Build Pipeline¶
Two-Job Strategy¶
jobs:
  download-jars:
    runs-on: ubuntu-24.04
    # Downloads 84 ArcadeDB JARs, filters to 83, uploads artifact
  test:
    needs: download-jars
    strategy:
      matrix:
        platform: [linux/amd64, linux/arm64, darwin/arm64, windows/amd64]
    # Builds platform-specific wheel, runs tests
Job 1: download-jars (Ubuntu)¶
Purpose: Single point of JAR filtering to avoid cross-platform issues.
Steps:
- Download 84 JARs from ArcadeDB Docker image
- Read jar_exclusions.txt (single source of truth)
- Filter out excluded JARs (currently: arcadedb-grpcw-*.jar)
- Result: 83 JARs (167.4M)
- Upload as artifact for native builds
Why Ubuntu? Bash filtering works reliably and avoids cross-platform glob differences.
Job 2: test (Matrix)¶
Platform-specific build and test:
Linux Platforms (Docker)¶
- Run Docker multi-stage build on native ARM64/AMD64 runner
- Build platform-specific wheel:
  - jre-builder: Creates platform-specific JRE via jlink
  - python-builder: Builds wheel with bundled JRE
- Skip artifact download (Docker gets JARs directly)
- Tests run on same native platform
macOS Platform (Native)¶
- Download pre-filtered JARs artifact
- Run build-native.sh:
  - Uses system Java (GitHub runner provides Java 25)
  - Runs jlink natively → platform-specific JRE
  - Builds wheel with python -m build
- Run tests on native platform
Windows Platform (Native)¶
- Download pre-filtered JARs artifact
- Run build-native.sh:
  - Uses system Java (GitHub runner provides Java 25)
  - Runs jlink natively → platform-specific JRE
  - Builds wheel with python -m build
- Run tests on native platform
JAR Exclusion System¶
Single Source of Truth: jar_exclusions.txt¶
Location: bindings/python/jar_exclusions.txt
Format: One glob pattern per line
Used by:
- .github/workflows/test-python-bindings.yml (download-jars job)
- bindings/python/Dockerfile.build (Docker builds)
- bindings/python/setup_jars.py (documentation/validation)
Result: ~40MB savings per wheel (gRPC is ~38MB)
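The one-pattern-per-line format can be applied the same way on every platform. The following is a minimal sketch of such a filter, not the logic used by the workflow or by setup_jars.py: the helper names and the sample JAR names are hypothetical, and skipping `#` comment lines is an assumption about the file format.

```python
from fnmatch import fnmatchcase


def load_patterns(text: str) -> list[str]:
    """Read one glob pattern per line, ignoring blank lines and '#' comments."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def filter_jars(jar_names: list[str], patterns: list[str]) -> list[str]:
    """Keep only the JAR names that match none of the exclusion patterns."""
    # fnmatchcase is case-sensitive on every OS, unlike fnmatch, which
    # normalizes case on Windows; this keeps results identical across runners.
    return [
        name
        for name in jar_names
        if not any(fnmatchcase(name, pattern) for pattern in patterns)
    ]
```

Matching in Python rather than in the shell sidesteps differences in glob expansion between bash versions and platforms.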
Implementation¶
Before (Broken): Each build step filtered independently
- build-native.sh: Filtered with bash on macOS
- Dockerfile.build: Filtered with bash on Linux
- setup_jars.py: Filtered with Python glob
- Problem: Glob patterns varied across shells, causing duplication and inconsistency
After (Fixed): Single upstream filter
- download-jars job: Filters once on Ubuntu (reliable bash)
- Native builds: Use pre-filtered JARs from artifact
- Docker builds: Filter independently (different source)
- Result: Consistent 83 JARs across all platforms
Test Parsing¶
JUnit XML for Reliable Results¶
Challenge: Parse test results across Linux (bash) and macOS (BSD tools)
Solution: Structured data via pytest's JUnit XML output
# Run tests with XML output
pytest tests/ --junitxml=test-results.xml
# Parse with grep -oE, which both GNU and BSD grep support (unlike GNU-only grep -P)
tests_run=$(grep -oE 'tests="[0-9]+"' test-results.xml | grep -oE '[0-9]+')
failures=$(grep -oE 'failures="[0-9]+"' test-results.xml | grep -oE '[0-9]+')
errors=$(grep -oE 'errors="[0-9]+"' test-results.xml | grep -oE '[0-9]+')
Benefits:
- ✅ Cross-platform compatible (works with both GNU and BSD grep)
- ✅ Structured data (stable XML attributes instead of free-form console output)
- ✅ Reliable counts (no sed greediness issues)
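Since Python is already needed to build the wheel, the same counts could also be read with an XML parser instead of grep. This is a sketch of that alternative, assuming the usual pytest layout of one or more `<testsuite>` elements carrying `tests`, `failures`, `errors`, and `skipped` attributes; `summarize_junit` is a hypothetical helper and the workflow itself uses the grep commands shown above.

```python
import xml.etree.ElementTree as ET


def summarize_junit(xml_text: str) -> dict[str, int]:
    """Sum the test counters over every <testsuite> element in a JUnit XML report."""
    root = ET.fromstring(xml_text)
    totals = {"tests": 0, "failures": 0, "errors": 0, "skipped": 0}
    # iter() also yields the root itself, so this handles both a bare
    # <testsuite> root and a <testsuites> wrapper.
    for suite in root.iter("testsuite"):
        for key in totals:
            totals[key] += int(suite.get(key, 0))
    return totals
```

A CI step could fail when `failures + errors` is non-zero, without depending on any particular grep implementation.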
Docker Multi-Stage Build¶
Stages¶
# Stage 1: java-builder (downloads JARs from ArcadeDB image)
FROM arcadedb/arcadedb:24.11.1 AS java-builder
# Downloads 84 JARs to /jars
# Stage 2: jre-builder (filters JARs, creates JRE)
FROM amazoncorretto:25 AS jre-builder
COPY --from=java-builder /jars/*.jar /jars/
# Reads jar_exclusions.txt
# Filters to 83 JARs (167.4M)
# Runs jlink → creates /jre (platform-specific!)
# Stage 3: python-builder (builds wheel)
FROM python:3.12-slim
COPY --from=jre-builder /jars/*.jar /jars/
COPY --from=jre-builder /jre /jre
# Builds wheel with bundled JRE
Key Fix: Copy from jre-builder, not java-builder¶
Bug: Originally copied from java-builder → got 84 JARs (unfiltered)
Fix: Copy from jre-builder → gets 83 JARs (filtered)
Native Build Script¶
build-native.sh Workflow¶
# 1. Check for pre-filtered JARs (from artifact)
if [ -d "$JARS_DIR" ]; then
    echo "Using existing JARs from artifact"
else
    # Fallback: download from Docker (not used in CI)
    download_jars_from_docker
fi
# 2. Create platform-specific JRE via jlink
jlink --output jre \
    --add-modules "$MODULES" \
    --strip-debug \
    --no-man-pages \
    --no-header-files \
    --compress zip-6
# 3. Copy JARs and JRE to package
python setup_jars.py
# 4. Build wheel
python -m build --wheel
Simplification: Removed ~30 lines of JAR filtering logic (now uses pre-filtered artifact)
GitHub ARM64 Runners (linux/arm64)¶
Native ARM64 Support¶
As of late 2024, GitHub Actions provides free native ARM64 runners for public repositories.
Benefits¶
- Native performance: No emulation overhead (3-4x faster than QEMU)
- True platform builds: jlink creates actual ARM64 JRE
- Free for public repos: Part of GitHub Actions free tier
- Consistent with other platforms: Same build process as linux/amd64
Build Process¶
docker build \
--platform linux/arm64 \
--build-arg TARGETARCH=arm64 \
-t arcadedb-python-builder:arm64 \
.
Since the runner itself is ARM64, Docker builds run natively without emulation.
File Structure¶
bindings/python/
├── jar_exclusions.txt    # Single source of truth for JAR filtering
├── build-native.sh       # Native builds (macOS, Windows)
├── Dockerfile.build      # Docker builds (Linux)
├── setup_jars.py         # Copies JARs/JRE to package
├── setup.py              # BinaryDistribution class (forces platform-specific wheels)
├── pyproject.toml        # Package metadata, dependencies
└── src/arcadedb/
    └── jre/              # Bundled JRE (created during build)
        ├── bin/java      # Platform-specific Java binary
        ├── lib/          # JRE libraries
        └── ...
Build Workflow File¶
Location: .github/workflows/test-python-bindings.yml
Key sections:
- download-jars job (lines 18-89)
  - Downloads and filters JARs once
  - Uploads artifact for native builds
- test job matrix (lines 91-364)
  - Builds 4 platforms
  - Platform-specific steps (native runners, artifact download, tests)
- Test parsing (lines 200-237)
  - JUnit XML generation and parsing
  - Cross-platform compatible
Common Issues & Solutions¶
Issue 1: All Platforms Created Identical Linux Wheels¶
Problem: Original Docker-only approach built linux-x64 JRE for all platforms.
Solution: Native runners for macOS and Windows, Docker only for Linux.
Issue 2: Test Count Parsing Failed¶
Problem: grep -P (Perl regex) not available on macOS.
Solution: Switch to JUnit XML + grep -oE, which works with both GNU and BSD grep.
Issue 3: Docker Copied Unfiltered JARs¶
Problem: python-builder copied from java-builder (84 JARs) instead of jre-builder (83 JARs).
Solution: Change COPY --from=java-builder to COPY --from=jre-builder.
Issue 4: Linux Builds Downloaded Unnecessary Artifact¶
Problem: Linux Docker builds downloaded pre-filtered artifact but didn't use it.
Solution: Skip artifact download for Linux platforms (Docker gets JARs directly).
Issue 5: Bash Counter Increment Failed¶
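Problem: ((COUNTER++)) aborted the script under set -e in bash: when COUNTER is 0, the expression evaluates to 0 and the arithmetic command returns exit status 1.
Solution: Use COUNTER=$((COUNTER + 1)), a plain assignment whose exit status is always 0.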
Issue 6: Sed Pattern Too Greedy¶
Problem: Sed regex captured too much when parsing test output.
Solution: Switch to JUnit XML (structured data, no regex).
Size Breakdown (current, as of 29-Jan-2026)¶
Sizes are now consistent across platforms (ballpark):
- Wheel: ~68M (compressed)
- JRE: ~60M (uncompressed)
- JARs: ~32M (uncompressed)
- Installed: ~95M (uncompressed)
Development¶
Local Build¶
# Build for current platform
cd bindings/python
./build-native.sh
# Or use Docker (Linux only)
docker build -f Dockerfile.build -t arcadedb-python-builder .
Test Locally¶
References¶
- jlink documentation: Oracle jlink man page
- GitHub Actions runners: GitHub-hosted runners
- GitHub ARM64 runners: Supported runners and hardware resources
- pytest JUnit XML: pytest JUnit XML output