Language: English | 한국어
InferEdgeEnv is a local-first run evidence registry and comparability checker for Edge AI inference benchmark results. The user-facing CLI command is edgeenv.
v0.1.5 is the current v1-complete release baseline. InferEdgeEnv v1 is complete as a local-first run evidence registry and comparability checker; later work should be treated as v1.1+ extensions, not missing MVP scope. The first path is:
- Install and run doctor.
- Record a deterministic fake run.
- Try a local command run.
- Compare only after EdgeEnv checks comparability.
- Use Jetson docs only when you are ready to run EdgeEnv locally on the Jetson shell.
Validated scope: fake/local benchmark recording, artifact storage, registry lookup, export/import, comparability reports, optional resource metrics, read-only bundle summaries, and Jetson tegrastats sampled evidence through local execution on Jetson.
Start with Quickstart. If install fails while pip is fetching build dependencies, check Install And Quickstart Resilience before treating it as an EdgeEnv runtime failure.
If the first path is confusing or blocked, open a README Quickstart feedback issue and use the first-user feedback backlog to classify the first blocked step.
After the first fake run, choose the next path:
- Connect your command: Local Command Contract Guide
- Compare two runs: Compare Workflow Guide
- Repeat Jetson measurements: Jetson Measurement Operations Checklist
Edge inference results are easy to record but hard to compare honestly. A latency number is only meaningful when model identity, input shape, precision, batch size, warmup/repeat protocol, and preprocess/postprocess boundaries are known.
EdgeEnv focuses on recording benchmark evidence locally and judging whether two runs are directly comparable, conditionally comparable, or not comparable.
EdgeEnv is not:
- An OS, bootloader, GRUB, BCD, or Linux compatibility layer
- A VM, Docker, WSL, or cloud target manager
- A cloud database, login/auth system, web dashboard, or public leaderboard
- A model upload server or dataset upload server
- A single-score ranking system for all models
Install and confirm both entrypoints:
python -m pip install -e ".[dev]"
python -m inferedge_env.cli doctor
edgeenv doctor

Run the deterministic fake benchmark first. This checks the CLI, config schema, artifact writer, and registry without executing a real model.
edgeenv profile validate examples/profiles/local_fake.yaml
edgeenv bench validate examples/benches/yolov8n_fire.yaml
edgeenv bench run --target examples/profiles/local_fake.yaml --config examples/benches/yolov8n_fire.yaml
edgeenv runs list
edgeenv runs show <run_id>

When replacing <run_id>, use the Run ID printed by bench run, or copy it from edgeenv runs list.
Then try the local runner examples. These execute small deterministic Python commands on the current machine.
edgeenv bench run --target examples/profiles/local.yaml --config examples/benches/local_echo_metrics.yaml
edgeenv bench run --target examples/profiles/local.yaml --config examples/benches/local_resource_metrics.yaml
edgeenv bench run --target examples/profiles/local.yaml --config examples/benches/local_template.yaml
edgeenv bench run --target examples/profiles/local.yaml --config examples/benches/local_adapter_template.yaml
edgeenv bench run --target examples/profiles/local.yaml --config examples/benches/local_runtime_adapter.yaml

The local target executes the configured command on the current machine and reads an explicit EDGEENV_METRICS_JSON= line from stdout. Local commands may also emit an optional EDGEENV_RESOURCE_METRICS_JSON= line for memory, power, energy, or temperature evidence. bench run reports whether resource metrics were stored or omitted.
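As a minimal sketch of the stdout side of this contract, the script below times a stand-in inference function with warmup and repeat loops (defaults mirror warmup_runs: 3 and repeat_runs: 10 from the example config) and prints one EDGEENV_METRICS_JSON= line. The function names run_inference and read_resource_sample and the metric keys latency_mean_ms, latency_p50_ms, and throughput_fps are hypothetical; the templates in examples/scripts/ and the Local Command Contract Guide define the keys EdgeEnv actually reads.

```python
import json
import statistics
import time
from typing import Optional


def run_inference() -> None:
    """Hypothetical stand-in for one model inference call."""
    sum(i * i for i in range(10_000))


def benchmark(warmup_runs: int = 3, repeat_runs: int = 10) -> dict:
    # Warmup iterations run the workload but are not timed.
    for _ in range(warmup_runs):
        run_inference()
    latencies_ms = []
    for _ in range(repeat_runs):
        start = time.perf_counter()
        run_inference()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    mean_ms = statistics.fmean(latencies_ms)
    # Hypothetical metric keys; check the Local Command Contract Guide for the real ones.
    return {
        "latency_mean_ms": round(mean_ms, 3),
        "latency_p50_ms": round(statistics.median(latencies_ms), 3),
        "throughput_fps": round(1000.0 / max(mean_ms, 1e-9), 3),
    }


def read_resource_sample() -> Optional[dict]:
    """Hypothetical sampler hook; returns None when no sampler is available."""
    return None


def main() -> list:
    lines = ["EDGEENV_METRICS_JSON=" + json.dumps(benchmark(), sort_keys=True)]
    resources = read_resource_sample()
    if resources is not None:
        # The resource line is optional: omit it instead of emitting empty or fake values.
        lines.append("EDGEENV_RESOURCE_METRICS_JSON=" + json.dumps(resources, sort_keys=True))
    for line in lines:
        print(line)
    return lines


if __name__ == "__main__":
    main()
```

Because read_resource_sample returns None here, only the metrics line is printed, which matches the "sampler unavailable" behavior described for wrappers: the primary benchmark result is still reported.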
To connect your own benchmark command, start from examples/scripts/adapter_template.py when wrapping an existing command, or examples/scripts/local_benchmark_template.py when writing the benchmark loop directly. Then review the adapter pattern in Local Real Benchmark Example Guide.
Compare two registered runs after you have at least two successful run IDs. EdgeEnv prints the comparability judgement before any metric delta.
edgeenv bench run --target examples/profiles/local.yaml --config examples/benches/local_compare_a.yaml
edgeenv bench run --target examples/profiles/local.yaml --config examples/benches/local_compare_b.yaml
edgeenv runs list
edgeenv report compare <run_id_a> <run_id_b>

For the full flow, see Compare Workflow Guide.
Sampler wrapper examples show the first integration boundary for optional resource evidence.
edgeenv bench run --target examples/profiles/local.yaml --config examples/benches/local_sampler_wrapper.yaml
edgeenv bench run --target examples/profiles/local.yaml --config examples/benches/local_sampler_unavailable.yaml

On Jetson, use the tegrastats wrapper path from the repo root:
edgeenv bench run --target examples/profiles/jetson_nano_local.yaml --config examples/benches/jetson_tegrastats_local.yaml

For the sampler adapter lifecycle path on Jetson, use the sampled local profile and inspect sampler metadata without opening artifact files manually:
edgeenv bench run --target examples/profiles/jetson_nano_sampled_local.yaml --config examples/benches/jetson_sampled_local.yaml
edgeenv runs sampler show <run_id>

If a sampler is unavailable, the wrapper should omit EDGEENV_RESOURCE_METRICS_JSON= and preserve the successful primary benchmark run. If a wrapper emits malformed resource metrics, EdgeEnv writes a failed-run artifact and does not update the registry:
edgeenv bench run --target examples/profiles/local.yaml --config examples/benches/local_sampler_malformed_resource.yaml
edgeenv failed-runs list
edgeenv failed-runs show <failed_run_id>
edgeenv failed-runs export <failed_run_id> --output edgeenv-failed-run-<failed_run_id>.zip
edgeenv failed-runs import edgeenv-failed-run-<failed_run_id>.zip

edgeenv runs show reads the result artifact and includes resource evidence when the local command emits it:
edgeenv runs show <run_id>
edgeenv runs resources list --metric memory_peak_mb
edgeenv runs resources list --metric memory_peak_mb --json

{
  "resource_metrics": {
    "energy_j": 31.7,
    "memory_mean_mb": 420.5,
    "memory_peak_mb": 512.0,
    "power_mean_w": 8.2,
    "power_peak_w": 11.4,
    "source": "example-script",
    "temperature_peak_c": 72.0
  }
}

The fake target uses FakeRunner, so it does not execute a real model.
Local benchmark configs may set timeout_seconds, working_directory, and uppercase extra_env keys for controlled command execution.
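To make these options concrete, here is a hypothetical runner-side sketch, not EdgeEnv's LocalRunner: it runs a command with a timeout, a working directory, and extra environment variables, then scans stdout for the two marker lines. The names run_local_command and RunFailed are invented for illustration, and rejecting lowercase extra_env keys is an assumption based on the "uppercase extra_env keys" wording above.

```python
import json
import os
import subprocess
import sys
from typing import Optional, Tuple

METRICS_PREFIX = "EDGEENV_METRICS_JSON="
RESOURCE_PREFIX = "EDGEENV_RESOURCE_METRICS_JSON="


class RunFailed(Exception):
    """Hypothetical error: the run belongs under failed-runs/, not in the registry."""


def _parse_object(line: str, prefix: str) -> dict:
    try:
        value = json.loads(line[len(prefix):])
    except json.JSONDecodeError as exc:
        raise RunFailed(f"malformed {prefix} line") from exc
    if not isinstance(value, dict):
        raise RunFailed(f"{prefix} payload must be a JSON object")
    return value


def run_local_command(
    argv: list,
    timeout_seconds: float,
    working_directory: str = ".",
    extra_env: Optional[dict] = None,
) -> Tuple[dict, Optional[dict]]:
    extra_env = extra_env or {}
    # Assumption: extra_env keys must already be uppercase variable names.
    bad_keys = [key for key in extra_env if key != key.upper()]
    if bad_keys:
        raise ValueError(f"extra_env keys must be uppercase: {bad_keys}")
    env = {**os.environ, **extra_env}
    try:
        proc = subprocess.run(
            argv, cwd=working_directory, env=env,
            capture_output=True, text=True, timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired as exc:
        raise RunFailed(f"timed out after {timeout_seconds}s") from exc
    if proc.returncode != 0:
        raise RunFailed(f"command exited with code {proc.returncode}")
    metrics, resources = None, None
    for line in proc.stdout.splitlines():
        if line.startswith(METRICS_PREFIX):
            metrics = _parse_object(line, METRICS_PREFIX)
        elif line.startswith(RESOURCE_PREFIX):
            # A malformed optional resource line still fails the whole run.
            resources = _parse_object(line, RESOURCE_PREFIX)
    if metrics is None:
        raise RunFailed(f"no {METRICS_PREFIX} line on stdout")
    return metrics, resources


if __name__ == "__main__":
    child = "import json, os; print('EDGEENV_METRICS_JSON=' + json.dumps({'flag': os.environ['LOCAL_DEMO_FLAG']}))"
    print(run_local_command([sys.executable, "-c", child], timeout_seconds=30,
                            extra_env={"LOCAL_DEMO_FLAG": "enabled"}))
```

Other stdout lines are ignored, so a command can log freely as long as it prints exactly one explicit metrics line.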
The Python package is inferedge_env; the user-facing CLI command remains edgeenv.
English representative path:
- InferEdgeEnv Portfolio Summary — 30-second role, boundary, and reviewer path for this repository
- Documentation Language Guide — choose the English representative path or Korean entry path
- EdgeEnv v0.1.5 Follow-up Note — current v1-complete release baseline and trusted starting point
- Portfolio Demo Path — reviewer-facing fake/local/compare/export-import/bundle-summary demo path
- Local Command Contract Guide — how to connect your own local benchmark command
- Compare Workflow Guide — how to judge comparability before reading metric deltas
- Export/Import Design — portable evidence bundle contract
- Schema Versioning And Migration Policy — evidence compatibility and future-version rejection policy
- Release Maintenance Checklist — repeatable local, clean-room, optional Jetson, tag, and GitHub Release gate
Operational records:
- EdgeEnv v0.1.5 Release Rehearsal — clean-room source archive release gate and patch-candidate judgement
- EdgeEnv v0.1.4 Follow-up Note — previous release quality baseline
- EdgeEnv v0.1.4 Bilingual Docs Sanity Sweep — README, Korean README, and representative docs reading-path check
- EdgeEnv v0.1.4 Release Rehearsal — release quality gate run before the v0.1.4 candidate
- EdgeEnv v0.1.4 Post-release Sanity Sweep — post-release check of README, follow-up note, and GitHub Release wording
- Release Quality Gate Refresh — local release smoke script and optional Jetson gate after the six-month quality roadmap
- README Quickstart Clean-room Rehearsal — fresh source archive and venv validation of the README path
- Jetson Measurement Operations Checklist — repeated hardware measurement procedure
- Jetson Sampled Evidence Bundle Handoff — sampled bundle export/import and imported compare validation
- EdgeEnv MVP v1 Handoff Status — current capability snapshot and future-work entry points
- First-user Feedback Backlog — v0.1.5 candidate usability observations before new feature work
Design references:
- InferEdgeEnv Six-Month Quality Roadmap
- InferEdgeEnv Portfolio Summary
- Evidence Contract Conformance Suite
- CLI Error Message Polish
- Local Real Benchmark Example Guide
- Local Runner Design
- Resource Metrics Design
- Sampler Metadata Artifact Policy
- Bundle Report Generation Design
name: yolov8n-fire-fake
command: python run_yolov8n.py --input fire.jpg
model_name: yolov8n-fire
model_version: "1.0"
model_format: onnx
model_path: models/yolov8n-fire.onnx
task: object-detection
input_shape: [1, 3, 640, 640]
input_dtype: float32
runtime: fake-runtime
execution_provider: fake-provider
precision: fp32
batch_size: 1
warmup_runs: 3
repeat_runs: 10
include_preprocess: true
include_postprocess: true
timeout_seconds: 30
working_directory: .
extra_env:
  LOCAL_DEMO_FLAG: enabled

target_name: local-fake
target_type: fake
board_name: local-dev-machine
os: macOS
runtime_tags:
- fake
- local

MVP v1 accepts fake and local target types. SSH is reserved for a later version.
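The target-type rule can be sketched with a stdlib dataclass, assuming only the fields shown in the profile above. EdgeEnv's real schema is a Pydantic model, so its field set and error messages may differ; TargetProfile and SUPPORTED_TARGET_TYPES here are illustrative names.

```python
from dataclasses import dataclass, field
from typing import List

# MVP v1 target types; "ssh" is reserved for a later version.
SUPPORTED_TARGET_TYPES = ("fake", "local")


@dataclass
class TargetProfile:
    """Stdlib stand-in for the target profile schema (EdgeEnv itself uses Pydantic)."""

    target_name: str
    target_type: str
    board_name: str
    os: str
    runtime_tags: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject anything outside the MVP v1 target types at load time.
        if self.target_type not in SUPPORTED_TARGET_TYPES:
            raise ValueError(
                f"unsupported target_type {self.target_type!r}; "
                f"MVP v1 accepts {', '.join(SUPPORTED_TARGET_TYPES)}"
            )


profile = TargetProfile("local-fake", "fake", "local-dev-machine", "macOS", ["fake", "local"])
```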
Required same-condition fields:
- model_hash
- input_shape
- input_dtype
- task
- precision
- batch_size
- warmup_runs
- repeat_runs
- include_preprocess
- include_postprocess
If these fields match and runtime, execution provider, and target also match, EdgeEnv reports:
Comparable: Yes
Mode: same-condition
For same-condition comparisons only, report compare also prints supplemental latency and throughput deltas after the comparability judgement. Conditional and non-comparable reports do not print metric deltas, and EdgeEnv does not produce rankings or composite scores.
If required fields differ, EdgeEnv reports:
Comparable: No
Reason:
- Different model hash
- Different input shape
If required fields match but runtime, execution provider, or target differs, EdgeEnv reports:
Comparable: Conditional
Mode: runtime-comparison
Reason:
- Same model hash
- Same input shape
- Different runtime or execution provider
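The three outcomes above can be sketched as a small decision function over two run records, assuming each record is a flat dict holding the required fields plus runtime, execution_provider, and target. The names judge and latency_delta_pct are hypothetical, and the reason strings only approximate EdgeEnv's wording (for example, this sketch lists "Different execution provider" on its own line).

```python
from typing import List, Optional, Tuple

REQUIRED_FIELDS = (
    "model_hash", "input_shape", "input_dtype", "task", "precision",
    "batch_size", "warmup_runs", "repeat_runs",
    "include_preprocess", "include_postprocess",
)
CONDITIONAL_FIELDS = ("runtime", "execution_provider", "target")


def _label(field_name: str) -> str:
    return field_name.replace("_", " ")


def judge(a: dict, b: dict) -> Tuple[str, Optional[str], List[str]]:
    """Return (comparable, mode, reasons) for two run records."""
    differing = [f for f in REQUIRED_FIELDS if a.get(f) != b.get(f)]
    if differing:
        # Any required-field mismatch makes the runs non-comparable.
        return "No", None, [f"Different {_label(f)}" for f in differing]
    conditional = [f for f in CONDITIONAL_FIELDS if a.get(f) != b.get(f)]
    if conditional:
        reasons = ["Same model hash", "Same input shape"]
        reasons += [f"Different {_label(f)}" for f in conditional]
        return "Conditional", "runtime-comparison", reasons
    return "Yes", "same-condition", []


def latency_delta_pct(a_ms: float, b_ms: float, comparable: str) -> Optional[float]:
    """Supplemental delta, produced only for same-condition comparisons."""
    if comparable != "Yes" or a_ms == 0:
        return None
    return (b_ms - a_ms) / a_ms * 100.0
```

The judgement is computed first and gates the delta, which mirrors the rule that conditional and non-comparable reports never print metric deltas.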
.edgeenv/
  runs.db
  runs/
    <run_id>/
      result.json
      config.yaml
      target.yaml
      env.json
      stdout.log
      stderr.log
  failed-runs/
    <run_id>/
      failure.json
      config.yaml
      target.yaml
      env.json
      stdout.log
      stderr.log
runs.db is a local SQLite index. The run directory remains the evidence bundle.
Failed local runs are stored under failed-runs/ for debugging and are not inserted into runs.db. Use edgeenv failed-runs list and edgeenv failed-runs show <run_id> to inspect failed-run artifacts safely.
Resource metrics remain canonical in result.json. runs.db also keeps a rebuildable resource_metric_index so edgeenv runs resources list --metric <name> can find runs by normalized memory, power, energy, or temperature evidence without turning those values into rankings or comparability gates. Add --json when scripts need the same supplemental lookup results with explicit filters, units, and source counts.
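A rebuildable index of this kind can be sketched with the stdlib sqlite3 module, assuming result.json carries the resource_metrics object shown earlier. The table layout and the helper names rebuild_resource_index and find_runs are hypothetical; the point is that the index is derived entirely from the canonical artifacts and can be dropped and recreated at any time.

```python
import sqlite3
from typing import Dict, List, Tuple


def rebuild_resource_index(conn: sqlite3.Connection, results: Dict[str, dict]) -> None:
    """Rebuild the index from parsed result.json contents (run_id -> JSON object)."""
    conn.execute("DROP TABLE IF EXISTS resource_metric_index")
    conn.execute(
        "CREATE TABLE resource_metric_index "
        "(run_id TEXT NOT NULL, metric TEXT NOT NULL, value REAL NOT NULL, source TEXT)"
    )
    rows = []
    for run_id, result in results.items():
        resources = result.get("resource_metrics") or {}
        source = resources.get("source")
        for metric, value in resources.items():
            # Index numeric values only; string fields such as "source" stay as provenance.
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                rows.append((run_id, metric, float(value), source))
    conn.executemany("INSERT INTO resource_metric_index VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def find_runs(conn: sqlite3.Connection, metric: str) -> List[Tuple[str, float]]:
    # Ordered by run_id, not by value: the lookup finds evidence, it does not rank runs.
    return conn.execute(
        "SELECT run_id, value FROM resource_metric_index WHERE metric = ? ORDER BY run_id",
        (metric,),
    ).fetchall()
```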
Use edgeenv runs export <run_id> --output edgeenv-run-<run_id>.zip to create a portable successful-run evidence bundle. Use edgeenv runs import edgeenv-run-<run_id>.zip to validate the bundle, copy it into .edgeenv/runs/, and rebuild the local registry row.
Use edgeenv failed-runs export <run_id> --output edgeenv-failed-run-<run_id>.zip and edgeenv failed-runs import edgeenv-failed-run-<run_id>.zip for portable failed-run diagnostic evidence. Failed-run import copies files into .edgeenv/failed-runs/ and does not update runs.db. The artifact-first zip contract is described in Export/Import Design.
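The validate-then-copy order of a successful-run import can be sketched with the stdlib zipfile module, assuming a bundle holds exactly one <run_id>/ directory with the six files from the run layout above. export_run and import_run are hypothetical names; the real artifact-first contract in Export/Import Design may add a manifest and schema-version checks, and the real import also rebuilds the runs.db row, which this sketch omits.

```python
import tempfile
import zipfile
from pathlib import Path

# File names taken from the successful-run directory layout.
REQUIRED_RUN_FILES = ("result.json", "config.yaml", "target.yaml", "env.json", "stdout.log", "stderr.log")


def export_run(run_dir: Path, output: Path) -> None:
    missing = [name for name in REQUIRED_RUN_FILES if not (run_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"run directory is incomplete: {missing}")
    with zipfile.ZipFile(output, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in REQUIRED_RUN_FILES:
            zf.write(run_dir / name, arcname=f"{run_dir.name}/{name}")


def import_run(bundle: Path, runs_root: Path) -> Path:
    with zipfile.ZipFile(bundle) as zf:
        names = set(zf.namelist())
        run_ids = {name.split("/", 1)[0] for name in names}
        if len(run_ids) != 1:
            raise ValueError("bundle must contain exactly one run directory")
        run_id = run_ids.pop()
        if run_id in ("", ".", "..") or "\\" in run_id:
            raise ValueError(f"unsafe run id in bundle: {run_id!r}")
        # Validate the whole layout before copying anything.
        if names != {f"{run_id}/{name}" for name in REQUIRED_RUN_FILES}:
            raise ValueError("bundle layout does not match the run evidence contract")
        dest = runs_root / run_id
        if dest.exists():
            raise FileExistsError(f"run {run_id} already exists")
        zf.extractall(runs_root)
    return dest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "src" / "run-001"
        src.mkdir(parents=True)
        for name in REQUIRED_RUN_FILES:
            (src / name).write_text("{}")
        export_run(src, Path(tmp) / "edgeenv-run-run-001.zip")
        print(import_run(Path(tmp) / "edgeenv-run-run-001.zip", Path(tmp) / "runs"))
```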
Use edgeenv report bundle-summary --scenario <label>:<run_id_a>:<run_id_b> to generate a read-only Markdown handoff summary from imported successful runs and normal compare judgement. The summary is for human review only; it does not replace result.json, sampler artifacts, manifests, or report compare.
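The --scenario value packs three fields into one argument. A minimal parsing sketch, assuming labels and run IDs never contain a colon (parse_scenario and Scenario are hypothetical names, not EdgeEnv's internals):

```python
from typing import NamedTuple


class Scenario(NamedTuple):
    label: str
    run_id_a: str
    run_id_b: str


def parse_scenario(spec: str) -> Scenario:
    # Expect exactly three non-empty, colon-separated parts.
    parts = spec.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected <label>:<run_id_a>:<run_id_b>, got {spec!r}")
    return Scenario(*parts)
```

Rejecting empty parts early means a typo such as a doubled colon fails before any run lookup or compare judgement is attempted.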
InferEdge validates whether a model is deployable across build provenance, runtime execution, evaluation, comparison, optional diagnosis, and deployment decision reports.
In portfolio terms, InferEdgeLab is the validation / decision layer. InferEdgeEnv is the v0.1.5 v1-complete experiment hygiene / comparability layer.
InferEdgeEnv records whether benchmark evidence can be trusted and compared. Its scope is narrower and separate: local run artifacts, SQLite registry rows, portable evidence bundles, and comparability judgement.
In the top-level InferEdge ecosystem map, InferEdgeEnv is the v0.1.5 v1-complete experiment hygiene / comparability layer. It is not part of the pinned Core 4 validation path, but it has a completed role: preserving benchmark evidence and judging same-condition, conditional, or non-comparable runs before any metric delta is discussed.
InferEdgeOrchestrator is also separate: it is the post-deployment operation-control layer for scheduling, load shedding, telemetry, and runtime coordination after a model is already deployed. InferEdgeEnv does not control live inference operations; it records benchmark evidence and preserves honest comparison boundaries before or around review handoff.
EdgeBench is adjacent in benchmark motivation, but InferEdgeEnv is not a public leaderboard. It is a local-first run evidence registry and comparability checker, not a ranking surface.
Included in MVP v1:
- Python CLI skeleton
- Typer-based CLI
- Rich output
- Pydantic benchmark config and target profile schemas
- FakeRunner deterministic benchmark result
- LocalRunner command execution with explicit metrics JSON capture
- Local runtime adapter example for user-owned command integration
- Result JSON and artifact directory creation
- SQLite local registry
- runs list and runs show
- runs resources list
- runs export
- runs import
- failed-runs list, failed-runs show, failed-runs export, and failed-runs import
- Jetson tegrastats wrapper example for optional resource metrics
- report compare comparability checker
- report bundle-summary read-only Markdown handoff summary
- pytest tests
Non-goals:
- OS, VM, WSL, Docker, SSH target implementation
- Cloud DB, auth, web dashboard, public leaderboard
- Model or dataset upload service
- Single-score model ranking