Run Observation & Artifact Registration
roar tracks data artifacts and execution steps in ML pipelines, enabling reproducibility and lineage queries. Tracking happens automatically: roar observes your commands as they run and captures the essential context, so you do not have to define a pipeline explicitly.
Because roar identifies files by their content rather than their names, you can trace a result back to the exact inputs and code that produced it. This gives you reliable reproducibility and a clear history of your artifacts, derived directly from your normal workflow.
While roar captures your work locally, connecting it to a GLaaS (Global Lineage-as-a-Service) server such as glaas.ai lets you publish your lineage graphs to a shared global registry for visualization and collaboration. Your team can then search for any artifact by its hash, see exactly how it was made, and generate the commands needed to reproduce it on another machine.
```bash
pip install roar-cli
# or with uv
uv pip install roar-cli
```

Requires Python 3.10+ and Linux (x86_64 or aarch64) for full functionality.
| Platform | roar run | Other commands |
|---|---|---|
| Linux x86_64 | Full support | Full support |
| Linux aarch64 | Full support | Full support |
| macOS | Not supported | Full support |
| Windows | Not supported | Full support |
The roar run command uses a native tracer binary that requires Linux. Other commands work on all platforms.
```bash
# Clone the repository
git clone https://github.com/treqs/roar.git
cd roar

# Install in development mode (automatically builds tracer if Rust is installed)
uv pip install -e ".[dev]"
# or without uv
pip install -e ".[dev]"
```

```bash
# Initialize roar in your project
cd my-ml-project
roar init

# Run commands with provenance tracking
roar run python preprocess.py --input data.csv --output features.parquet
roar run python train.py --data features.parquet --output model.pt
roar run python evaluate.py --model model.pt --output metrics.json
```

Initialize roar in the current directory. This creates a .roar/ directory to store the local database and a config.toml with default settings.
```bash
roar init     # Initialize, prompt for gitignore
roar init -y  # Initialize and auto-add to gitignore
roar init -n  # Initialize without modifying gitignore
```

Run a command with provenance tracking. Roar captures:
- Files read and written
- Git commit and branch
- Execution time and exit code
- Command arguments
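Each job records the files it read and wrote. Lineage can therefore be inferred by linking the job that wrote some content to a later job that read it. This is a minimal sketch of that idea, assuming a hypothetical `Job` record keyed by content hashes. The fields are illustrative, not roar's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """Hypothetical job record, loosely mirroring what roar run is described as capturing."""
    step: int
    command: list[str]
    reads: set[str] = field(default_factory=set)   # content hashes of files read
    writes: set[str] = field(default_factory=set)  # content hashes of files written
    exit_code: int = 0


def lineage_edges(jobs: list[Job]) -> set[tuple[int, int]]:
    """Return (producer_step, consumer_step) pairs linking a write to a later read."""
    last_writer: dict[str, int] = {}
    edges: set[tuple[int, int]] = set()
    for job in sorted(jobs, key=lambda j: j.step):
        for content in job.reads:
            if content in last_writer:
                edges.add((last_writer[content], job.step))
        for content in job.writes:
            last_writer[content] = job.step  # later writes of the same content take over
    return edges


if __name__ == "__main__":
    jobs = [
        Job(1, ["python", "preprocess.py"], reads={"h_csv"}, writes={"h_features"}),
        Job(2, ["python", "train.py"], reads={"h_features"}, writes={"h_model"}),
        Job(3, ["python", "evaluate.py"], reads={"h_model"}, writes={"h_metrics"}),
    ]
    print(sorted(lineage_edges(jobs)))  # [(1, 2), (2, 3)]
```

In the example, train.py depends on preprocess.py because it read the features that preprocess.py wrote. evaluate.py depends on train.py in the same way.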
```bash
roar run python train.py --epochs 10 --lr 0.001
roar run ./scripts/preprocess.sh
roar run torchrun --nproc_per_node=4 train.py

# Re-run a previous DAG step
roar run @2              # Re-run DAG node 2
roar run @2 --epochs=10  # Re-run with parameter override
```

Reproduce an artifact by tracing its lineage.
```bash
# Show the reproduction plan (preview)
roar reproduce abc123de

# Run full reproduction
roar reproduce abc123de --run

# Run without prompts
roar reproduce abc123de --run -y

# Include system packages during setup
roar reproduce abc123de --run --package-sync

# Show all required packages (no truncation)
roar reproduce abc123de --list-requirements
```

Full reproduction clones the git repository, creates a virtual environment, installs the recorded packages, and runs the pipeline steps.
Run a build step with provenance tracking. Build steps run before pipeline steps during reproduction.
```bash
# Compile native extensions
roar build maturin develop --release
roar build make -j4

# Install local packages
roar build pip install -e .
```

Use roar build for setup that should run before the main pipeline, such as compiling or installing.
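The ordering rule (build steps first, then pipeline steps) can be sketched as a sort over step labels. This assumes that the @B1 / @1 labels accepted by roar show map directly onto execution order. That is an illustration, not roar's documented planner, and `execution_order` is a hypothetical helper.

```python
import re

# Hypothetical label format: "@B<n>" for build steps, "@<n>" for pipeline steps.
STEP_LABEL = re.compile(r"@(B?)(\d+)")


def execution_order(labels: list[str]) -> list[str]:
    """Sort labels so every build step precedes every pipeline step, numerically within each group."""
    def key(label: str) -> tuple[int, int]:
        m = STEP_LABEL.fullmatch(label)
        if m is None:
            raise ValueError(f"not a step label: {label!r}")
        group = 0 if m.group(1) == "B" else 1  # build steps sort first
        return (group, int(m.group(2)))

    return sorted(labels, key=key)


if __name__ == "__main__":
    print(execution_order(["@2", "@B2", "@1", "@B1"]))  # ['@B1', '@B2', '@1', '@2']
```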
Manage GLaaS authentication.
```bash
roar auth register  # Show SSH public key for registration
roar auth test      # Test connection to GLaaS server
roar auth status    # Show current auth status
```

To register with GLaaS:
- Run roar auth register to display your public key.
- Sign up at https://glaas.ai, where you can paste your public key.
- Run roar auth test to verify.
View or set configuration options.
```bash
roar config list
roar config get <key>
roar config set <key> <value>
```

Run roar config list to see all available options with descriptions. Common options:
| Key | Default | Description |
|---|---|---|
| output.track_repo_files | false | Include repo files in provenance |
| output.quiet | false | Suppress written files report |
| filters.ignore_system_reads | true | Ignore /sys, /etc, /sbin reads |
| filters.ignore_package_reads | true | Ignore installed package reads |
| filters.ignore_torch_cache | true | Ignore torch/triton cache |
| filters.ignore_tmp_files | true | Ignore /tmp files |
| glaas.url | https://api.glaas.ai | GLaaS server URL |
| glaas.web_url | https://glaas.ai | GLaaS web UI URL |
| registration.omit.enabled | true | Enable secret filtering |
| hash.primary | blake3 | Primary hash algorithm |
| logging.level | warning | Log level (debug, info, warning, error) |
Display the pipeline DAG for the current session.
```bash
roar dag                   # Compact view with colors
roar dag --expanded        # Show all executions including reruns
roar dag --json            # Machine-readable JSON output
roar dag --show-artifacts  # Show intermediate artifacts
```

Manage persistent environment variables injected into roar run and roar build.
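Injecting persistent variables typically means layering them over the inherited environment before launching the tracked command. This is a minimal sketch with `subprocess`. `run_with_env` is hypothetical, and its precedence (stored values override inherited ones) is an assumption, since this README does not specify it.

```python
import os
import subprocess
import sys


def run_with_env(cmd: list[str], persistent: dict[str, str]) -> subprocess.CompletedProcess[str]:
    """Run cmd with the inherited environment plus persistent variables layered on top.

    Assumption: stored values override inherited ones; roar's actual precedence
    is not documented here.
    """
    env = {**os.environ, **persistent}
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)


if __name__ == "__main__":
    # Analogous to `roar env set FOO bar` followed by a tracked command that reads FOO.
    child = [sys.executable, "-c", "import os; print(os.environ['FOO'])"]
    print(run_with_env(child, {"FOO": "bar"}).stdout.strip())  # prints "bar"
```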
```bash
roar env set FOO bar  # Set FOO=bar
roar env get FOO      # Print value of FOO
roar env list         # List all env vars
roar env unset FOO    # Remove FOO
```

Display recent job execution history.
```bash
roar log  # Show recent job history
```

Register artifact lineage with GLaaS.
```bash
roar register model.pt            # Register model lineage
roar register --dry-run model.pt  # Preview without registering
roar register -y model.pt         # Skip confirmation prompt
```

Start a fresh session. Previous session data is preserved in the database.
```bash
roar reset     # Reset with confirmation prompt
roar reset -y  # Reset without confirmation
```

Show session, job, or artifact details.
```bash
roar show                     # Show active session overview
roar show @1                  # Show details for step 1
roar show @B1                 # Show details for build step 1
roar show a1b2c3d4            # Show job by UID
roar show ./output/model.pkl  # Show artifact by path
```

Show a summary of the active session.
```bash
roar status
```

Remove the most recent job from the active session. This is useful for undoing a mistaken roar run or correcting the pipeline before registration.
```bash
roar pop     # Pop with confirmation prompt
roar pop -y  # Pop without confirmation (skip prompt)
```

What it does:
- Removes the last job from the session history
- Deletes output artifacts created by that job (unless they're packages/system files)
- Does not affect the original input files
Artifacts are data files tracked by their content hash (BLAKE3). The same file content always has the same hash, regardless of filename or location.
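Content addressing means the identifier comes from the bytes alone, so renaming or moving a file does not change it. This is a minimal sketch. roar uses BLAKE3, which is not in Python's standard library, so the sketch swaps in `hashlib.blake2b` to stay dependency-free. The third-party `blake3` package offers a similar `update()`/`hexdigest()` interface.

```python
import hashlib
from pathlib import Path


def content_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """Digest a file's bytes in chunks (1 MiB by default) so large artifacts need not fit in memory.

    Stand-in: roar uses BLAKE3; hashlib.blake2b is used here only because it is in
    the standard library. The file name and location never enter the hash.
    """
    hasher = hashlib.blake2b()
    with path.open("rb") as fh:
        while chunk := fh.read(chunk_size):
            hasher.update(chunk)
    return hasher.hexdigest()
```

Two copies of the same bytes, say data/raw.csv and backup/old_name.csv, produce the same digest. Changing a single byte produces a different one.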
Jobs are recorded executions that consume input artifacts and produce output artifacts. Each roar run creates a job record.
Artifacts can also be organized into named groups, used for downloaded datasets or upload bundles.
```bash
# Record your pipeline
roar run python preprocess.py
roar run python train.py --epochs 10
roar run python evaluate.py

# Later, reproduce an artifact
roar reproduce <model-hash> --run
```

Roar automatically captures git metadata:
- Current commit hash
- Branch name
- Repository path
All data is stored locally in .roar/roar.db (SQLite). The database includes:
- Artifact hashes and metadata
- Job records with inputs/outputs
- Hash cache for performance
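A hash cache typically avoids rehashing unchanged files by storing each path's size and modification time alongside its digest. This is a minimal sketch with `sqlite3`. It uses a hypothetical `hash_cache` table, since roar's actual schema in .roar/roar.db is not documented here. It also uses `hashlib.blake2b` as a stand-in for BLAKE3.

```python
import hashlib
import sqlite3
from pathlib import Path


def open_cache(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical hash_cache table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hash_cache ("
        " path TEXT PRIMARY KEY, size INTEGER, mtime_ns INTEGER, digest TEXT)"
    )
    return conn


def cached_hash(conn: sqlite3.Connection, path: Path) -> str:
    """Reuse the stored digest while the file's size and mtime are unchanged."""
    st = path.stat()
    key = str(path.resolve())
    row = conn.execute(
        "SELECT digest FROM hash_cache WHERE path = ? AND size = ? AND mtime_ns = ?",
        (key, st.st_size, st.st_mtime_ns),
    ).fetchone()
    if row is not None:
        return row[0]  # cache hit: skip reading the file
    digest = hashlib.blake2b(path.read_bytes()).hexdigest()  # stand-in for BLAKE3
    conn.execute(
        "INSERT OR REPLACE INTO hash_cache VALUES (?, ?, ?, ?)",
        (key, st.st_size, st.st_mtime_ns, digest),
    )
    conn.commit()
    return digest
```

The trade-off is that an edit preserving both size and mtime would go undetected. In exchange, large files are not re-read on every run.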
Add .roar/ to your .gitignore (roar offers to do this during roar init).
Roar can register artifacts and jobs with a GLaaS (Global Lineage-as-a-Service) server using the roar register command.
```bash
# Install with server dependencies
uv pip install -e ".[server]"
# or without uv
pip install -e ".[server]"

# Run the server
glaas-server

# Or with custom host/port
GLAAS_HOST=0.0.0.0 GLAAS_PORT=8080 glaas-server
```

The server provides:
- REST API for artifact and job registration
- Web UI at / with artifact and job browsers
- Search and filtering by command, GPU, file type, etc.
```bash
# Set the GLaaS server URL
roar config set glaas.url http://localhost:8000

# Show your SSH key (copy to GLaaS web UI)
roar auth register

# Test authentication
roar auth test
```

- Python 3.10+
- Rust toolchain (for building the tracer), install from https://rustup.rs/
```bash
# Install dev dependencies (automatically builds tracer if Rust is installed)
uv pip install -e ".[dev]"
```

```bash
# Linting
ruff check .

# Format check
ruff format --check

# Type checking
mypy roar

# Run all checks at once
ruff check . && ruff format --check && mypy roar
```

```bash
# Run all tests (excluding those requiring a live GLaaS server)
pytest tests/ -v -m "not glaas and not live_glaas"

# Run with coverage
pytest tests/ -v --cov=roar --cov-report=term-missing -m "not glaas and not live_glaas"

# Run tests in parallel
pytest tests/ -v -n auto -m "not glaas and not live_glaas"

# Run only unit tests (fast)
pytest tests/ -v -m "not integration and not e2e and not glaas and not live_glaas"
```

Apache 2.0