
gwonxhj/InferEdgeEnv


EdgeEnv


InferEdgeEnv is a local-first run evidence registry and comparability checker for Edge AI inference benchmark results. The user-facing CLI command is edgeenv.

Start Here for v0.1.5

v0.1.5 is the current release baseline and completes InferEdgeEnv v1 as a local-first run evidence registry and comparability checker. Treat later work as v1.1+ extensions, not as missing MVP scope. The recommended first path is:

  1. Install and run doctor.
  2. Record a deterministic fake run.
  3. Try a local command run.
  4. Compare only after EdgeEnv checks comparability.
  5. Use the Jetson docs only when you are ready to run EdgeEnv locally in a shell on the Jetson.

Validated scope: fake and local benchmark recording, artifact storage, registry lookup, export/import, comparability reports, optional resource metrics, read-only bundle summaries, and Jetson tegrastats sampled evidence collected by running EdgeEnv locally on the Jetson.

Start with Quickstart. If install fails while pip is fetching build dependencies, check Install And Quickstart Resilience before treating it as an EdgeEnv runtime failure.

If the first path is confusing or blocked, open a README Quickstart feedback issue and use the first-user feedback backlog to classify the first blocked step.

After the first fake run, choose the next path:

Problem

Edge inference results are easy to record but hard to compare honestly. A latency number is only meaningful when model identity, input shape, precision, batch size, warmup/repeat protocol, and preprocess/postprocess boundaries are known.

EdgeEnv focuses on recording benchmark evidence locally and judging whether two runs are directly comparable, conditionally comparable, or not comparable.

What EdgeEnv Is Not

EdgeEnv is not:

  • An OS, bootloader, GRUB, BCD, or Linux compatibility layer
  • A VM, Docker, WSL, or cloud target manager
  • A cloud database, login/auth system, web dashboard, or public leaderboard
  • A model upload server or dataset upload server
  • A single-score ranking system for all models

Quickstart

Install and confirm both entrypoints:

python -m pip install -e ".[dev]"
python -m inferedge_env.cli doctor
edgeenv doctor

1. Record a Fake Run

Run the deterministic fake benchmark first. This checks the CLI, config schema, artifact writer, and registry without executing a real model.

edgeenv profile validate examples/profiles/local_fake.yaml
edgeenv bench validate examples/benches/yolov8n_fire.yaml
edgeenv bench run --target examples/profiles/local_fake.yaml --config examples/benches/yolov8n_fire.yaml
edgeenv runs list
edgeenv runs show <run_id>

Replace <run_id> with the Run ID printed by bench run, or copy it from edgeenv runs list.

2. Record a Local Command Run

Then try the local runner examples. These execute small deterministic Python commands on the current machine.

edgeenv bench run --target examples/profiles/local.yaml --config examples/benches/local_echo_metrics.yaml
edgeenv bench run --target examples/profiles/local.yaml --config examples/benches/local_resource_metrics.yaml
edgeenv bench run --target examples/profiles/local.yaml --config examples/benches/local_template.yaml
edgeenv bench run --target examples/profiles/local.yaml --config examples/benches/local_adapter_template.yaml
edgeenv bench run --target examples/profiles/local.yaml --config examples/benches/local_runtime_adapter.yaml

The local target executes the configured command on the current machine and reads an explicit EDGEENV_METRICS_JSON= line from stdout. Local commands may also emit an optional EDGEENV_RESOURCE_METRICS_JSON= line with memory, power, energy, or temperature evidence. bench run reports whether resource metrics were stored or omitted.
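To make the stdout contract concrete, here is a minimal sketch of how a runner could pull those prefixed lines out of captured stdout. It is not EdgeEnv's implementation: the function name `extract_json_line` is hypothetical, and taking the last matching line (rather than the first) is an assumption.

```python
import json
from typing import Optional

METRICS_PREFIX = "EDGEENV_METRICS_JSON="
RESOURCE_PREFIX = "EDGEENV_RESOURCE_METRICS_JSON="


def extract_json_line(stdout: str, prefix: str) -> Optional[dict]:
    """Return the JSON object from the last stdout line starting with prefix.

    Returns None when no such line exists (e.g. resource metrics omitted).
    Raises ValueError when the line exists but is not a JSON object, which a
    runner could treat as malformed evidence.
    """
    payload = None
    for line in stdout.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            payload = line[len(prefix):]  # keep scanning: last line wins (assumption)
    if payload is None:
        return None
    value = json.loads(payload)  # json.JSONDecodeError subclasses ValueError
    if not isinstance(value, dict):
        raise ValueError(f"{prefix} must carry a JSON object")
    return value
```

Because the two prefixes differ after `EDGEENV_`, a resource line is never mistaken for a primary metrics line, and ordinary log output around them is ignored.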

To connect your own benchmark command, start from examples/scripts/adapter_template.py when wrapping an existing command, or examples/scripts/local_benchmark_template.py when writing the benchmark loop directly. Then review the adapter pattern in Local Real Benchmark Example Guide.
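For orientation before opening the templates, this is a minimal sketch of a benchmark loop that honors warmup and repeat counts and prints one metrics line. The workload, the function names, and the metric keys (`latency_ms_mean`, `latency_ms_p50`, `throughput_fps`) are assumptions for illustration; examples/scripts/local_benchmark_template.py defines the keys EdgeEnv actually expects.

```python
import json
import statistics
import time


def run_inference() -> None:
    """Placeholder workload (hypothetical); replace with a real model call."""
    sum(i * i for i in range(10_000))


def benchmark(warmup_runs: int = 3, repeat_runs: int = 10) -> dict:
    # Warmup iterations run the workload but are not timed.
    for _ in range(warmup_runs):
        run_inference()
    latencies_ms = []
    for _ in range(repeat_runs):
        start = time.perf_counter()
        run_inference()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    mean_ms = statistics.fmean(latencies_ms)
    return {
        # Illustrative key names, not a confirmed EdgeEnv schema.
        "latency_ms_mean": mean_ms,
        "latency_ms_p50": statistics.median(latencies_ms),
        # Iterations per second; equals frames per second at batch_size 1.
        "throughput_fps": 1000.0 / mean_ms if mean_ms > 0 else 0.0,
        "repeat_runs": repeat_runs,
    }


def emit_metrics(metrics: dict) -> str:
    """Print the single prefixed line the local runner looks for."""
    line = "EDGEENV_METRICS_JSON=" + json.dumps(metrics, sort_keys=True)
    print(line, flush=True)
    return line


if __name__ == "__main__":
    emit_metrics(benchmark())
```

Keeping the metrics on one explicit line lets the benchmark print any other diagnostics it likes without confusing the parser.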

3. Compare Two Runs

Compare two registered runs after you have at least two successful run IDs. EdgeEnv prints the comparability judgement before any metric delta.

edgeenv bench run --target examples/profiles/local.yaml --config examples/benches/local_compare_a.yaml
edgeenv bench run --target examples/profiles/local.yaml --config examples/benches/local_compare_b.yaml
edgeenv runs list
edgeenv report compare <run_id_a> <run_id_b>

For the full flow, see Compare Workflow Guide.

4. Optional Resource And Sampler Evidence

Sampler wrapper examples show the first integration boundary for optional resource evidence.

edgeenv bench run --target examples/profiles/local.yaml --config examples/benches/local_sampler_wrapper.yaml
edgeenv bench run --target examples/profiles/local.yaml --config examples/benches/local_sampler_unavailable.yaml

On Jetson, use the tegrastats wrapper path from the repo root:

edgeenv bench run --target examples/profiles/jetson_nano_local.yaml --config examples/benches/jetson_tegrastats_local.yaml

For the sampler adapter lifecycle path on Jetson, use the sampled local profile and inspect sampler metadata without opening artifact files manually:

edgeenv bench run --target examples/profiles/jetson_nano_sampled_local.yaml --config examples/benches/jetson_sampled_local.yaml
edgeenv runs sampler show <run_id>

If a sampler is unavailable, the wrapper should omit EDGEENV_RESOURCE_METRICS_JSON= and preserve the successful primary benchmark run. If a wrapper emits malformed resource metrics, EdgeEnv writes a failed-run artifact and does not update the registry:

edgeenv bench run --target examples/profiles/local.yaml --config examples/benches/local_sampler_malformed_resource.yaml
edgeenv failed-runs list
edgeenv failed-runs show <failed_run_id>
edgeenv failed-runs export <failed_run_id> --output edgeenv-failed-run-<failed_run_id>.zip
edgeenv failed-runs import edgeenv-failed-run-<failed_run_id>.zip
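The wrapper behavior above (omit the resource line when the sampler produced nothing, emit it otherwise) can be sketched as follows. The key names match the example resource_metrics shown in this README; the sample tuple layout, the trapezoidal energy integration, and the function names are assumptions, and parsing tegrastats output is out of scope here.

```python
import json
from typing import List, Optional, Tuple

# (timestamp_s, memory_mb, power_w, temperature_c) from a hypothetical sampler.
Sample = Tuple[float, float, float, float]


def summarize_samples(samples: List[Sample], source: str) -> Optional[dict]:
    """Summarize sampler output, or return None when nothing was sampled."""
    if not samples:
        return None
    times = [s[0] for s in samples]
    mem = [s[1] for s in samples]
    power = [s[2] for s in samples]
    temp = [s[3] for s in samples]
    # Trapezoidal integration of power (W) over time (s) gives energy (J).
    energy = sum(
        (times[i + 1] - times[i]) * (power[i] + power[i + 1]) / 2.0
        for i in range(len(samples) - 1)
    )
    return {
        "energy_j": round(energy, 3),
        "memory_mean_mb": round(sum(mem) / len(mem), 3),
        "memory_peak_mb": max(mem),
        "power_mean_w": round(sum(power) / len(power), 3),
        "power_peak_w": max(power),
        "source": source,
        "temperature_peak_c": max(temp),
    }


def resource_line(summary: Optional[dict]) -> Optional[str]:
    """Build the resource line, or None so the wrapper prints nothing at all."""
    if summary is None:
        return None
    return "EDGEENV_RESOURCE_METRICS_JSON=" + json.dumps(summary, sort_keys=True)
```

Returning None instead of an empty or partial object is what keeps an unavailable sampler from turning a successful primary run into a malformed-evidence failure.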

5. Inspect Evidence

runs show reads the result artifact and includes resource evidence when the local command emits it:

edgeenv runs show <run_id>
edgeenv runs resources list --metric memory_peak_mb
edgeenv runs resources list --metric memory_peak_mb --json

Example resource_metrics evidence:

{
  "resource_metrics": {
    "energy_j": 31.7,
    "memory_mean_mb": 420.5,
    "memory_peak_mb": 512.0,
    "power_mean_w": 8.2,
    "power_peak_w": 11.4,
    "source": "example-script",
    "temperature_peak_c": 72.0
  }
}

The fake target uses FakeRunner, so it does not execute a real model. Local benchmark configs may set timeout_seconds, working_directory, and uppercase extra_env keys for controlled command execution. The Python package is inferedge_env; the user-facing CLI command remains edgeenv.
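As a rough illustration of "controlled command execution", the sketch below maps timeout_seconds, working_directory, and extra_env onto Python's standard `subprocess.run`. It is a hypothetical helper, not LocalRunner: the uppercase-key pattern is an assumption, and whether EdgeEnv splits the config's `command` string (for example with `shlex.split`) or runs it through a shell is not stated in this README.

```python
import os
import re
import subprocess
import sys
from typing import Dict, List, Optional

# Assumed rule for "uppercase extra_env keys".
ENV_KEY = re.compile(r"[A-Z_][A-Z0-9_]*")


def run_local_command(
    command: List[str],
    timeout_seconds: float = 30,
    working_directory: str = ".",
    extra_env: Optional[Dict[str, str]] = None,
) -> subprocess.CompletedProcess:
    extra_env = extra_env or {}
    bad = [key for key in extra_env if not ENV_KEY.fullmatch(key)]
    if bad:
        raise ValueError(f"extra_env keys must be uppercase: {bad}")
    # Inherit the current environment and overlay only the declared keys.
    env = {**os.environ, **extra_env}
    # Raises subprocess.TimeoutExpired if the command exceeds the timeout.
    return subprocess.run(
        command,
        cwd=working_directory,
        env=env,
        timeout=timeout_seconds,
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit is reported via returncode, not raised
    )


if __name__ == "__main__":
    result = run_local_command(
        [sys.executable, "-c", "import os; print(os.environ['LOCAL_DEMO_FLAG'])"],
        extra_env={"LOCAL_DEMO_FLAG": "enabled"},
    )
    print(result.returncode, result.stdout.strip())
```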

Guide Map

English representative path:

Operational records:

Design references:

Benchmark Config Example

name: yolov8n-fire-fake
command: python run_yolov8n.py --input fire.jpg
model_name: yolov8n-fire
model_version: "1.0"
model_format: onnx
model_path: models/yolov8n-fire.onnx
task: object-detection
input_shape: [1, 3, 640, 640]
input_dtype: float32
runtime: fake-runtime
execution_provider: fake-provider
precision: fp32
batch_size: 1
warmup_runs: 3
repeat_runs: 10
include_preprocess: true
include_postprocess: true
timeout_seconds: 30
working_directory: .
extra_env:
  LOCAL_DEMO_FLAG: enabled
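The config names a model_path, while the comparability rules key on model_hash. This README does not say how EdgeEnv derives that hash. One plausible approach, assumed here, is a SHA-256 digest of the model file's bytes, so an identical model under a different path still hashes the same. The helper name is hypothetical.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a model file, read in chunks so large models fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```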

Target Profile Example

target_name: local-fake
target_type: fake
board_name: local-dev-machine
os: macOS
runtime_tags:
  - fake
  - local

MVP v1 accepts fake and local target types. SSH is reserved for a later version.
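The MVP validates profiles with Pydantic schemas. To stay dependency-free, the sketch below uses a standard-library dataclass instead, so it only approximates that behavior: fields mirror the Target Profile Example, and any target_type other than fake or local is rejected.

```python
from dataclasses import dataclass, field
from typing import List

SUPPORTED_TARGET_TYPES = {"fake", "local"}  # SSH is reserved for a later version


@dataclass
class TargetProfile:
    target_name: str
    target_type: str
    board_name: str
    os: str
    runtime_tags: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.target_type not in SUPPORTED_TARGET_TYPES:
            raise ValueError(
                f"unsupported target_type {self.target_type!r}; "
                f"expected one of {sorted(SUPPORTED_TARGET_TYPES)}"
            )
```

A YAML profile loaded into a dict could then be checked with `TargetProfile(**data)`.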

Comparability Rules

Required same-condition fields:

  • model_hash
  • input_shape
  • input_dtype
  • task
  • precision
  • batch_size
  • warmup_runs
  • repeat_runs
  • include_preprocess
  • include_postprocess

If these fields match and runtime, execution provider, and target also match, EdgeEnv reports:

Comparable: Yes
Mode: same-condition

For same-condition comparisons only, report compare also prints supplemental latency and throughput deltas after the comparability judgement. Conditional and non-comparable reports do not print metric deltas, and EdgeEnv does not produce rankings or composite scores.

If required fields differ, EdgeEnv reports:

Comparable: No
Reason:
- Different model hash
- Different input shape

If required fields match but runtime, execution provider, or target differs, EdgeEnv reports:

Comparable: Conditional
Mode: runtime-comparison
Reason:
- Same model hash
- Same input shape
- Different runtime or execution provider
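The three outcomes above can be expressed as a small decision function. This is a sketch of the rules as written, not EdgeEnv's code: the reason strings use raw field names, and treating a field missing from either run as a difference is a conservative assumption. The delta helper returns nothing unless the pair is same-condition, mirroring the rule that conditional and non-comparable reports print no metric deltas.

```python
from dataclasses import dataclass, field
from typing import List, Optional

REQUIRED_FIELDS = [
    "model_hash", "input_shape", "input_dtype", "task", "precision",
    "batch_size", "warmup_runs", "repeat_runs",
    "include_preprocess", "include_postprocess",
]
CONDITION_FIELDS = ["runtime", "execution_provider", "target"]


@dataclass
class Judgement:
    comparable: str  # "Yes", "No", or "Conditional"
    mode: Optional[str] = None
    reasons: List[str] = field(default_factory=list)


def _differs(a: dict, b: dict, name: str) -> bool:
    # A field missing from either run counts as a difference (assumption).
    return name not in a or name not in b or a[name] != b[name]


def judge(a: dict, b: dict) -> Judgement:
    required = [f"Different {n}" for n in REQUIRED_FIELDS if _differs(a, b, n)]
    if required:
        return Judgement("No", reasons=required)
    condition = [f"Different {n}" for n in CONDITION_FIELDS if _differs(a, b, n)]
    if condition:
        return Judgement("Conditional", "runtime-comparison", condition)
    return Judgement("Yes", "same-condition")


def relative_delta_pct(j: Judgement, value_a: float, value_b: float) -> Optional[float]:
    """Percent change from run A to run B, only for same-condition pairs."""
    if j.comparable != "Yes" or value_a == 0:
        return None
    return (value_b - value_a) / value_a * 100.0
```

Checking the required fields first means a runtime difference is never even reported when, say, the input shapes already disagree.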

Local Registry Layout

.edgeenv/
  runs.db
  runs/
    <run_id>/
      result.json
      config.yaml
      target.yaml
      env.json
      stdout.log
      stderr.log
  failed-runs/
    <run_id>/
      failure.json
      config.yaml
      target.yaml
      env.json
      stdout.log
      stderr.log

runs.db is a local SQLite index. The run directory remains the evidence bundle. Failed local runs are stored under failed-runs/ for debugging and are not inserted into runs.db. Use edgeenv failed-runs list and edgeenv failed-runs show <run_id> to inspect failed-run artifacts safely.
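A minimal sketch of the artifact-first split described above: the evidence directory is always written, but only successful runs get a row in runs.db. The single-table schema and the function names are hypothetical, and only the result or failure file is written here rather than the full set of logs and snapshots.

```python
import json
import sqlite3
from pathlib import Path


def open_registry(root: Path) -> sqlite3.Connection:
    root.mkdir(parents=True, exist_ok=True)
    db = sqlite3.connect(str(root / "runs.db"))
    # Hypothetical minimal schema; the real registry stores more columns.
    db.execute(
        "CREATE TABLE IF NOT EXISTS runs (run_id TEXT PRIMARY KEY, result_path TEXT)"
    )
    return db


def store_run(root: Path, db: sqlite3.Connection, run_id: str,
              succeeded: bool, payload: dict) -> Path:
    """Write the evidence directory first; index it only if the run succeeded."""
    subdir, filename = ("runs", "result.json") if succeeded else ("failed-runs", "failure.json")
    run_dir = root / subdir / run_id
    run_dir.mkdir(parents=True, exist_ok=False)  # never overwrite existing evidence
    artifact = run_dir / filename
    artifact.write_text(json.dumps(payload, indent=2, sort_keys=True))
    if succeeded:
        with db:  # commits on success, rolls back on error
            db.execute(
                "INSERT INTO runs (run_id, result_path) VALUES (?, ?)",
                (run_id, str(artifact)),
            )
    return run_dir
```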

Resource metrics remain canonical in result.json. runs.db also keeps a rebuildable resource_metric_index so edgeenv runs resources list --metric <name> can find runs by normalized memory, power, energy, or temperature evidence without turning those values into rankings or comparability gates. Add --json when scripts need the same supplemental lookup results with explicit filters, units, and source counts.
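Because result.json stays canonical, an index like resource_metric_index can be dropped and rebuilt from the stored results at any time. The sketch below shows that idea with an assumed four-column table. Results are ordered by run_id rather than by value, so the lookup does not double as a ranking.

```python
import sqlite3
from typing import Iterable, List, Tuple


def rebuild_resource_index(db: sqlite3.Connection,
                           results: Iterable[Tuple[str, dict]]) -> None:
    """Recreate the index from (run_id, parsed result.json) pairs."""
    db.execute("DROP TABLE IF EXISTS resource_metric_index")
    db.execute(
        "CREATE TABLE resource_metric_index "
        "(run_id TEXT, metric TEXT, value REAL, source TEXT)"
    )
    for run_id, result in results:
        metrics = result.get("resource_metrics") or {}
        source = metrics.get("source")
        for name, value in metrics.items():
            # Index numeric evidence only; bool is excluded because it subclasses int.
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                db.execute(
                    "INSERT INTO resource_metric_index VALUES (?, ?, ?, ?)",
                    (run_id, name, float(value), source),
                )
    db.commit()


def find_runs(db: sqlite3.Connection, metric: str) -> List[Tuple[str, float]]:
    return db.execute(
        "SELECT run_id, value FROM resource_metric_index "
        "WHERE metric = ? ORDER BY run_id",
        (metric,),
    ).fetchall()
```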

Use edgeenv runs export <run_id> --output edgeenv-run-<run_id>.zip to create a portable successful-run evidence bundle. Use edgeenv runs import edgeenv-run-<run_id>.zip to validate the bundle, copy it into .edgeenv/runs/, and rebuild the local registry row.
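The export/import round trip can be sketched with Python's standard `zipfile` module. The real member layout and validation rules are defined in Export/Import Design; here, requiring a parseable result.json, passing run_id explicitly, and refusing to overwrite an existing run are all assumptions. Rebuilding the registry row is left out.

```python
import json
import zipfile
from pathlib import Path, PurePosixPath

REQUIRED_MEMBERS = {"result.json"}  # assumption; see Export/Import Design


def export_run(run_dir: Path, output: Path) -> Path:
    """Zip every file in the run directory with paths relative to it."""
    with zipfile.ZipFile(output, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(run_dir.rglob("*")):
            if path.is_file():
                zf.write(path, arcname=path.relative_to(run_dir).as_posix())
    return output


def import_run(bundle: Path, runs_root: Path, run_id: str) -> Path:
    """Validate a bundle, then copy it into runs_root/run_id."""
    with zipfile.ZipFile(bundle) as zf:
        names = set(zf.namelist())
        missing = REQUIRED_MEMBERS - names
        if missing:
            raise ValueError(f"bundle is missing {sorted(missing)}")
        for name in names:
            member = PurePosixPath(name)
            # Reject absolute paths and parent-directory escapes.
            if member.is_absolute() or ".." in member.parts:
                raise ValueError(f"unsafe member path: {name}")
        json.loads(zf.read("result.json"))  # must parse before anything is copied
        target = runs_root / run_id
        if target.exists():
            raise FileExistsError(target)
        zf.extractall(target)
    return target
```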

Use edgeenv failed-runs export <run_id> --output edgeenv-failed-run-<run_id>.zip and edgeenv failed-runs import edgeenv-failed-run-<run_id>.zip for portable failed-run diagnostic evidence. Failed-run import copies files into .edgeenv/failed-runs/ and does not update runs.db. The artifact-first zip contract is described in Export/Import Design.

Use edgeenv report bundle-summary --scenario <label>:<run_id_a>:<run_id_b> to generate a read-only Markdown handoff summary from imported successful runs and normal compare judgement. The summary is for human review only; it does not replace result.json, sampler artifacts, manifests, or report compare.
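The `--scenario <label>:<run_id_a>:<run_id_b>` argument can be read as a three-part value. This sketch splits from the right, which assumes run IDs contain no colons while labels may; it also renders one hypothetical Markdown table row per scenario. Neither the parser nor the row format is confirmed as EdgeEnv's.

```python
from typing import NamedTuple


class Scenario(NamedTuple):
    label: str
    run_id_a: str
    run_id_b: str


def parse_scenario(value: str) -> Scenario:
    # rsplit keeps any colons inside the label (assumes run IDs have none).
    parts = value.rsplit(":", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected <label>:<run_id_a>:<run_id_b>, got {value!r}")
    return Scenario(*parts)


def summary_row(scenario: Scenario, comparable: str) -> str:
    """One Markdown table row carrying the compare judgement, not a score."""
    return f"| {scenario.label} | {scenario.run_id_a} | {scenario.run_id_b} | {comparable} |"
```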

Relation To InferEdge And EdgeBench

InferEdge validates whether a model is deployable across build provenance, runtime execution, evaluation, comparison, optional diagnosis, and deployment decision reports.

In portfolio terms, InferEdgeLab is the validation / decision layer. InferEdgeEnv is the v0.1.5 v1-complete experiment hygiene / comparability layer.

InferEdgeEnv records whether benchmark evidence can be trusted and compared. Its scope is narrower and separate: local run artifacts, SQLite registry rows, portable evidence bundles, and comparability judgement.

In the top-level InferEdge ecosystem map, InferEdgeEnv is not part of the pinned Core 4 validation path, but it has a completed role: preserving benchmark evidence and judging whether runs are same-condition, conditional, or non-comparable before any metric delta is discussed.

InferEdgeOrchestrator is also separate: it is the post-deployment operation-control layer for scheduling, load shedding, telemetry, and runtime coordination after a model is already deployed. InferEdgeEnv does not control live inference operations; it records benchmark evidence and preserves honest comparison boundaries before or around review handoff.

EdgeBench is adjacent in benchmark motivation, but InferEdgeEnv is not a public leaderboard. It is a local-first run evidence registry and comparability checker, not a ranking surface.

MVP Scope

Included in MVP v1:

  • Python CLI skeleton
  • Typer-based CLI
  • Rich output
  • Pydantic benchmark config and target profile schemas
  • FakeRunner deterministic benchmark result
  • LocalRunner command execution with explicit metrics JSON capture
  • Local runtime adapter example for user-owned command integration
  • Result JSON and artifact directory creation
  • SQLite local registry
  • runs list and runs show
  • runs resources list
  • runs export
  • runs import
  • failed-runs list, failed-runs show, failed-runs export, and failed-runs import
  • Jetson tegrastats wrapper example for optional resource metrics
  • report compare comparability checker
  • report bundle-summary read-only Markdown handoff summary
  • pytest tests

Non-goals:

  • OS, VM, WSL, Docker, SSH target implementation
  • Cloud DB, auth, web dashboard, public leaderboard
  • Model or dataset upload service
  • Single-score model ranking

Design Notes