InferEdgeOrchestrator

Language: English | Korean

Release: v0.1.2

InferEdgeOrchestrator is a post-deployment runtime operation-control layer and lightweight scheduler for constrained edge devices. It controls multiple inference tasks after deployment, using per-task priority, latency budgets, bounded queues, load shedding, and telemetry so high-priority workloads stay responsive when backlog and latency spikes appear.

It is not a Triton or DeepStream replacement. The project is a runtime operation-control layer that makes overload-control decisions explicit, testable, and explainable.

The goal is not maximum-throughput serving. The goal is controllable inference behavior under constrained edge workloads.

Portfolio positioning: a post-deployment runtime operation-control layer, not a Triton/DeepStream replacement and not a throughput-serving system.

Portfolio brief: PORTFOLIO.md (Korean: PORTFOLIO.ko.md)

30-Second Read

Solves the post-deployment operation problem: what runs first, what gets dropped, and why, when edge inference tasks contend for limited resources.
Protects high-priority workloads with priority/deadline-aware scheduling, bounded queues, and adaptive load shedding.
Does not silently drop work: overload decisions, drop reasons, and protected tasks are recorded as structured telemetry evidence.
Connects Forge agent_manifest.json and Runtime result.agent metadata to an inferedge-orchestration-summary-v1 scheduling evidence contract.
Validated with local pytest, GitHub Actions package/CLI smoke, synthetic overload comparison, Jetson dummy/ONNX smoke, and Jetson TensorRT-backed contention evidence.

What It Does

Runtime concern	Implementation
Multi-task inference	Config-driven task registration for detector/classifier/OCR-style workloads
Priority control	Priority and deadline-aware scheduling based on `priority` and `latency_budget_ms`
Backlog control	Bounded per-task queues with `drop_oldest`, `drop_newest`, and low-priority shedding behavior
Overload stability	Adaptive load shedding limits low-priority work to protect high-priority latency
Worker abstraction	Shared worker interface with `dummy`, `onnxruntime`, and TensorRT-backed workers
Runtime evidence	Telemetry JSON records executed/dropped counts, latency, backlog, result events, resource snapshots, and policy decisions
Agent contract bridge	Optional task references to Forge agent manifests and Runtime agent results, exported as orchestration summary evidence
Remote dispatch starter	File-based worker registry + task request contract selects an edge worker and records decision, fallback, plan-only evidence, and optional explicit HTTP/SSH starter execution evidence
Jetson smoke coverage	Jetson Orin Nano smoke scripts exercise CLI, telemetry, `tegrastats` parsing, ONNX Runtime execution, and TensorRT-backed contention
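
The "Overload stability" and "Runtime evidence" rows can be illustrated with a minimal sketch. It assumes a hypothetical rule: each time the protected high-priority task is observed over its latency budget, the admission rate for low-priority frames is halved, and it is relaxed again once latency recovers. `AdaptiveShedder`, `admit_every`, and the event fields are illustrative names, not the project's actual API or telemetry schema.

```python
from dataclasses import dataclass, field


@dataclass
class AdaptiveShedder:
    """Hypothetical shedder: admits 1 of every `admit_every` low-priority frames."""
    budget_ms: float          # latency budget of the protected high-priority task
    max_skip: int = 8         # upper bound on how aggressively to shed
    admit_every: int = 1      # 1 means "admit every low-priority frame"
    events: list = field(default_factory=list)
    _seen: int = 0

    def observe(self, protected_latency_ms: float) -> None:
        """Tighten shedding when the protected task is over budget, relax otherwise."""
        if protected_latency_ms > self.budget_ms:
            self.admit_every = min(self.admit_every * 2, self.max_skip)
        elif self.admit_every > 1:
            self.admit_every //= 2

    def admit(self, task: str, frame_id: int) -> bool:
        """Return True if the frame may be queued; otherwise record a drop event."""
        self._seen += 1
        if self._seen % self.admit_every == 0:
            return True
        self.events.append({
            "event": "drop", "task": task, "frame_id": frame_id,
            "reason": "load_shedding", "admit_every": self.admit_every,
        })
        return False


shedder = AdaptiveShedder(budget_ms=80.0)
shedder.observe(120.0)   # detector over budget: admit 1 of 2
shedder.observe(130.0)   # still over budget: admit 1 of 4
admitted = [f for f in range(8) if shedder.admit("classifier", f)]
```

Every rejected frame leaves a structured event behind, which is the property the table describes: shedding is visible in the record rather than silent.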

Runtime Model

Input Source
-> Frame Router
-> Bounded Task Queues
-> Priority + Deadline-Aware Scheduler
-> Inference Worker
-> Result Aggregator
-> Telemetry Logger

Each task is defined by operational policy:

{
  "name": "detector",
  "model_path": "models/detector.onnx",
  "priority": 100,
  "target_fps": 15,
  "latency_budget_ms": 80,
  "queue_size": 4,
  "drop_policy": "drop_oldest",
  "worker": "dummy"
}

The scheduler's job is not to run every frame. It decides which task should run next, which frames are stale enough to drop, and when low-priority work should be limited so high-priority latency remains inside budget.
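
As a rough illustration of the decisions described above, the sketch below models a bounded per-task queue and a pick function. The field names mirror the task config; the ordering rule (higher priority first, then least remaining slack against `latency_budget_ms`) and the drop-reason strings are assumptions made for the example, not the project's confirmed behavior.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TaskQueue:
    """Bounded per-task queue; field names mirror the task config above."""
    name: str
    priority: int
    latency_budget_ms: float
    queue_size: int
    drop_policy: str = "drop_oldest"
    frames: deque = field(default_factory=deque)   # items are (frame_id, enqueue_ms)
    dropped: list = field(default_factory=list)    # items are (frame_id, reason)

    def push(self, frame_id: int, now_ms: float) -> None:
        if len(self.frames) >= self.queue_size:
            if self.drop_policy == "drop_newest":
                self.dropped.append((frame_id, "queue_full_drop_newest"))
                return
            old_id, _ = self.frames.popleft()
            self.dropped.append((old_id, "queue_full_drop_oldest"))
        self.frames.append((frame_id, now_ms))


def pick_next(queues, now_ms):
    """Drop frames older than their budget, then pick the best queue's head frame."""
    best = None
    for q in queues:
        while q.frames and now_ms - q.frames[0][1] > q.latency_budget_ms:
            stale_id, _ = q.frames.popleft()
            q.dropped.append((stale_id, "stale_past_budget"))
        if not q.frames:
            continue
        slack = q.latency_budget_ms - (now_ms - q.frames[0][1])
        key = (-q.priority, slack)   # assumed ordering: priority, then slack
        if best is None or key < best[0]:
            best = (key, q)
    if best is None:
        return None
    chosen = best[1]
    frame_id, _ = chosen.frames.popleft()
    return chosen.name, frame_id


detector = TaskQueue("detector", priority=100, latency_budget_ms=80, queue_size=4)
classifier = TaskQueue("classifier", priority=10, latency_budget_ms=200, queue_size=2)
for i in range(3):
    classifier.push(i, now_ms=0.0)   # third push evicts frame 0 (drop_oldest)
detector.push(0, now_ms=0.0)
detector.push(1, now_ms=90.0)
first = pick_next([classifier, detector], now_ms=100.0)
```

In this run the detector's first frame is 100 ms old against an 80 ms budget, so it is dropped as stale and the fresher detector frame runs ahead of the waiting classifier work.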

InferEdge Ecosystem Boundary

InferEdge validates deployability. InferEdgeEnv records whether benchmark evidence can be trusted and compared. InferEdgeOrchestrator controls deployed workloads under load.

flowchart LR
    subgraph Validation["Validation Layer"]
        Forge["InferEdgeForge\nmodel conversion\nbuild provenance"]
        Runtime["InferEdge-Runtime\ndevice execution\nresult.json"]
        Lab["InferEdgeLab\ncomparison\ndeployment decision"]
        AIGuard["InferEdgeAIGuard\noptional anomaly/risk\nrecommendation"]
    end

    subgraph Comparability["Experiment Hygiene / Comparability Layer"]
        Env["InferEdgeEnv\nrun evidence registry\ncomparability judgement"]
    end

    subgraph Operation["Operation Layer"]
        Orchestrator["InferEdgeOrchestrator\npriority scheduling\nload shedding\nruntime telemetry"]
    end

    Forge --> Runtime --> Lab
    Lab -. optional guard analysis .-> AIGuard
    Runtime -. benchmark evidence .-> Env
    Lab -->|"deployable model + result.json"| Orchestrator
    AIGuard -. risk signals .-> Lab

The boundary is intentional:

InferEdge answers whether a model is safe and reasonable to deploy.
InferEdgeEnv answers whether benchmark evidence can be trusted and compared.
InferEdgeOrchestrator controls how deployed inference tasks behave together.
Orchestrator integration is file-based through result.json, not direct imports.

Implementation Map

Phase	Delivered capability	Evidence
Phase 1: Scheduler Core	Config schema, dummy frame source, bounded queues, priority/deadline scheduler, dummy worker, load shedding, telemetry export	Pytest coverage for scheduler, queue, shedding, and telemetry
Phase 2: ONNX Runtime Worker	Config-selectable ONNX Runtime worker, identity ONNX smoke model, image/video input path support	`configs/phase2_onnx_demo.json`, `scripts/create_identity_onnx.py`
Phase 3: Overload Scenario	FIFO baseline vs scheduler/load-shedding comparison	`python3 -m inferedge_orchestrator compare-overload ...`
Phase 4: Jetson Smoke	Jetson CLI smoke, telemetry generation, resource snapshots, optional `tegrastats` parsing	`scripts/smoke_jetson_dummy.sh`, `scripts/smoke_jetson_onnx.sh`
Phase 5: InferEdge Handoff	`result.json` latency signal converted into Orchestrator task config	`python3 -m inferedge_orchestrator from-inferedge ...`
Agent Runtime Contract	Vision / Voice-Command / Safety-Monitor dummy workload with Forge agent manifest and Runtime `result.agent` references	`configs/agent_3_workload_demo.json`, `docs/agent_orchestration_summary_contract.md`
Sustained Agent Scenario Starter	Normal / overload / sustained-high-load 3-agent modes with queue-depth timeline, latency timeline, and policy decision reasons	`configs/agent_3_workload_sustained_high_load.json`
Lightweight Sustained Workload Starter	Profiled local sustained scenario for YOLO-like vision, Whisper-like command burst, FastAPI-style ingress, optional tegrastats timeline, and producer-backed starters	`python3 -m inferedge_orchestrator run-multi-workload-sustained ...`
Device-Local Sustained Starter	Device-local mode using committed image, request, and resource snapshot producers before live device integrations	`configs/agent_multi_workload_sustained_device_local.json`
Remote Dispatch Starter	File-based worker registry and task request contract for selecting a remote edge worker; optional `--execute-plan` records explicit HTTP/SSH starter and bounded fallback evidence without claiming production remote execution	`docs/remote_dispatch_starter.md`
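
The remote dispatch row describes a file-based registry and a task request that together produce a recorded decision. A minimal sketch of that idea follows. The registry layout, the `healthy` and `backends` fields, the `rpi01` worker, and the decision keys are all hypothetical; the real contract is the one in `docs/remote_dispatch_starter.md`.

```python
import json
import tempfile
from pathlib import Path


def select_worker(registry: dict, request: dict) -> dict:
    """Pick the first healthy worker that supports the requested backend.

    Falls back to a local decision when nothing matches. The result is
    plan-only evidence: nothing is executed remotely in this sketch.
    """
    for worker in registry.get("workers", []):
        if worker.get("healthy") and request["backend"] in worker.get("backends", []):
            return {"task": request["task"], "selected": worker["id"],
                    "fallback": False, "mode": "plan_only"}
    return {"task": request["task"], "selected": "local",
            "fallback": True, "mode": "plan_only",
            "reason": "no_healthy_worker_for_backend"}


with tempfile.TemporaryDirectory() as tmp:
    # Write and re-read the registry to mimic the file-based boundary.
    registry_path = Path(tmp) / "workers.json"
    registry_path.write_text(json.dumps({"workers": [
        {"id": "nano01", "healthy": True, "backends": ["onnxruntime", "tensorrt"]},
        {"id": "rpi01", "healthy": False, "backends": ["onnxruntime"]},
    ]}))
    registry = json.loads(registry_path.read_text())

decision = select_worker(registry, {"task": "detector", "backend": "tensorrt"})
fallback = select_worker(registry, {"task": "ocr", "backend": "openvino"})
```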

Validation Evidence

These results are lifecycle evidence, not benchmark claims. Smoke runs prove the runtime paths execute on edge hardware; the synthetic overload run proves the scheduler policy; the InferEdge handoff proves the validation-to-operation file boundary.

Evidence	Key result	Artifact
Jetson dummy smoke	`nano01` generated telemetry, resource snapshots, and low-priority drops: detector `20/0`, classifier `2/18` executed/dropped	`examples/telemetry/jetson_smoke_dummy_sample.json`
Jetson ONNX Runtime smoke	`onnxruntime` worker executed identity ONNX on Jetson with `CPUExecutionProvider`, output shape `[1, 2]`, 13 `tegrastats` samples	`examples/telemetry/jetson_onnx_smoke_sample.json`
Jetson TensorRT inference smoke	Built `models/identity_fp16.plan` from identity ONNX on Jetson, executed one TensorRT identity frame, and confirmed runtime telemetry metadata: `PASS_TENSORRT_INFERENCE`, `PASS_TENSORRT_TELEMETRY`	`docs/validation_evidence.md`
Jetson TensorRT contention smoke	Ran high-priority and low-priority TensorRT tasks through scheduler/load-shedding contention: `PASS_TENSORRT_CONTENTION`	`examples/telemetry/jetson_tensorrt_contention_sample.json`
Jetson TensorRT diverse contention smoke	Ran distinct generated detector/classifier TensorRT engines through scheduler/load-shedding contention: detector `6/0`, classifier `1/5` executed/dropped, `5` overload events, `PASS_TENSORRT_DIVERSE_CONTENTION`	`examples/telemetry/jetson_tensorrt_diverse_contention_sample.json`
Synthetic overload comparison	Detector p95 end-to-end latency improved from `782.0ms` FIFO baseline to `8.0ms` with scheduler + shedding; classifier dropped `16` low-priority frames	`examples/telemetry/phase3_overload_sample.json`
InferEdge result handoff	Sample `expected_latency_ms=42.2` produced recommended `latency_budget_ms=64.0` without importing InferEdge internals	`configs/from_inferedge.json`

Versioned sample telemetry artifacts are available in examples/telemetry/. For the full evidence index, see docs/validation_evidence.md.

Jetson Smoke Commands

CAPTURE_TEGRASTATS=1 scripts/smoke_jetson_dummy.sh

PYTHON_BIN=$HOME/miniconda3/envs/yolo_env/bin/python \
  CAPTURE_TEGRASTATS=1 \
  scripts/smoke_jetson_onnx.sh

Latest device records:

Smoke	Device	OS / L4T	Python	Result	Note
Dummy scheduler smoke	`nano01`	`Ubuntu 22.04.5 LTS`, `L4T R36.4.7`	`3.10.12`	`PASS`	CLI, telemetry, resource snapshots, low-priority drops
ONNX Runtime smoke	`nano01`	`Ubuntu 22.04.5 LTS`, `L4T R36.4.7`	`3.10.12`	`PASS`	ONNX Runtime `1.23.2`, `CPUExecutionProvider`, output metadata recorded

These smoke records validate worker, scheduler, telemetry, and Jetson execution paths. They are not TensorRT/GPU throughput benchmarks.
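
For readers unfamiliar with `tegrastats` parsing, here is a minimal sketch of extracting a few fields from one output line. The sample line is illustrative rather than captured from `nano01`, the field layout differs between L4T releases, and `parse_tegrastats_line` and its output keys are hypothetical names, not the project's parser.

```python
import re

RAM_RE = re.compile(r"RAM (\d+)/(\d+)MB")
CPU_RE = re.compile(r"CPU \[([^\]]*)\]")
GPU_RE = re.compile(r"GR3D_FREQ (\d+)%")
TEMP_RE = re.compile(r"(\w+)@(-?\d+(?:\.\d+)?)C")


def parse_tegrastats_line(line: str) -> dict:
    """Extract RAM, per-core CPU load, GPU load, and temperatures when present."""
    sample = {}
    if m := RAM_RE.search(line):
        sample["ram_used_mb"], sample["ram_total_mb"] = int(m[1]), int(m[2])
    if m := CPU_RE.search(line):
        # Offline cores are reported as "off" and are skipped here.
        sample["cpu_load_pct"] = [
            int(core.split("%")[0]) for core in m[1].split(",") if "%" in core
        ]
    if m := GPU_RE.search(line):
        sample["gpu_load_pct"] = int(m[1])
    temps = {name: float(value) for name, value in TEMP_RE.findall(line)}
    if temps:
        sample["temps_c"] = temps
    return sample


line = ("RAM 2448/7620MB (lfb 1x4MB) SWAP 0/3810MB (cached 0MB) "
        "CPU [2%@1190,0%@1190,off,off] EMC_FREQ 0% GR3D_FREQ 5% cpu@45.5C gpu@44C")
sample = parse_tegrastats_line(line)
```

Missing fields are simply omitted from the result, which keeps the parser tolerant of layout differences between devices.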

Overload Comparison

python3 -m inferedge_orchestrator compare-overload \
  --config configs/phase3_overload.json \
  --output reports/phase3_overload.json \
  --frames 20

Mode	Detector executed	Detector dropped	Detector p95 end-to-end latency	Classifier executed	Classifier dropped	Overload events
FIFO baseline	20	0	782.0ms	20	0	0
Scheduler + load shedding	20	0	8.0ms	4	16	16

This is the core runtime operation-control story: low-priority classifier work is intentionally dropped under overload so the high-priority detector stays within latency budget, and the reason is visible in telemetry.
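
The effect in the table can be reproduced qualitatively with a toy single-worker simulation. This is a sketch under stated assumptions, not the `compare-overload` implementation: the service times, arrival period, one-slot classifier queue, and nearest-rank p95 are all invented for illustration, so the numbers it prints will not match the 782.0ms and 8.0ms figures above.

```python
import math


def p95(values):
    """Nearest-rank 95th percentile."""
    ordered = sorted(values)
    return ordered[math.ceil(0.95 * len(ordered)) - 1]


def simulate(frames, policy, period_ms=10.0, det_ms=4.0, cls_ms=40.0, cls_queue=1):
    """Single non-preemptive worker; policy is "fifo" or "priority_shed"."""
    arrivals = []
    for i in range(frames):
        arrivals.append((i * period_ms, "detector"))
        arrivals.append((i * period_ms, "classifier"))
    clock, idx, dropped = 0.0, 0, 0
    fifo = []
    pending = {"detector": [], "classifier": []}
    latencies = {"detector": [], "classifier": []}
    while idx < len(arrivals) or fifo or pending["detector"] or pending["classifier"]:
        # Admit every frame that has arrived by the current time.
        while idx < len(arrivals) and arrivals[idx][0] <= clock:
            arrival_ms, task = arrivals[idx]
            idx += 1
            if policy == "fifo":
                fifo.append((arrival_ms, task))
                continue
            queue = pending[task]
            if task == "classifier" and len(queue) >= cls_queue:
                queue.pop(0)              # bounded queue with drop_oldest
                dropped += 1
            queue.append((arrival_ms, task))
        # Choose the next job: arrival order for FIFO, detector first otherwise.
        if policy == "fifo":
            job = fifo.pop(0) if fifo else None
        elif pending["detector"]:
            job = pending["detector"].pop(0)
        elif pending["classifier"]:
            job = pending["classifier"].pop(0)
        else:
            job = None
        if job is None:
            clock = arrivals[idx][0]      # idle until the next arrival
            continue
        arrival_ms, task = job
        clock += det_ms if task == "detector" else cls_ms
        latencies[task].append(clock - arrival_ms)
    return latencies, dropped


fifo_lat, fifo_dropped = simulate(20, "fifo")
shed_lat, shed_dropped = simulate(20, "priority_shed")
print("detector p95 (ms):", p95(fifo_lat["detector"]), "->", p95(shed_lat["detector"]))
print("classifier dropped:", fifo_dropped, "->", shed_dropped)
```

Under FIFO every detector frame waits behind the growing classifier backlog, while the priority policy bounds detector latency at roughly one in-flight classifier job and pays for it with classifier drops.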

Multi-Workload Sustained Starter

python3 -m inferedge_orchestrator run-multi-workload-sustained \
  --config configs/agent_multi_workload_sustained_local.json \
  --output reports/agent_multi_workload_sustained.json \
  --frames 16

This starter keeps the same local-first contract boundary while adding workload-profile evidence for a YOLO-like vision loop, Whisper-like command burst, FastAPI-style request ingress, and optional tegrastats timeline:

python3 -m inferedge_orchestrator run-multi-workload-sustained \
  --config configs/agent_multi_workload_sustained_local.json \
  --output reports/agent_multi_workload_sustained.json \
  --frames 16 \
  --tegrastats-log reports/tegrastats.log

The default implementation uses lightweight local CPU profile adapters so CI and local laptops can exercise workload pressure without YOLO, Whisper, FastAPI, or Jetson dependencies. The Vision starter can also read a local image fixture:

python3 -m inferedge_orchestrator run-multi-workload-sustained \
  --config configs/agent_multi_workload_sustained_vision_file.json \
  --output reports/agent_multi_workload_sustained_vision_file.json \
  --frames 16

This records producer_source=image_file, input digest, sampled bytes, and Vision workload pressure while keeping ONNX/YOLO integration as a later step. A Voice ingress starter can additionally read a local FastAPI-style request fixture:

python3 -m inferedge_orchestrator run-multi-workload-sustained \
  --config configs/agent_multi_workload_sustained_voice_ingress.json \
  --output reports/agent_multi_workload_sustained_voice_ingress.json \
  --frames 16

This records producer_source=fastapi_request_fixture, selected routes, request digest, and Voice burst pressure without starting a real FastAPI server or Whisper backend. A Safety monitor starter can also read local resource snapshots:

python3 -m inferedge_orchestrator run-multi-workload-sustained \
  --config configs/agent_multi_workload_sustained_safety_resource.json \
  --output reports/agent_multi_workload_sustained_safety_resource.json \
  --frames 16

This records producer_source=resource_snapshot_fixture, CPU/memory/temperature signals, fallback/deadline signals, and a deterministic degradation score while keeping live device monitor integration as a later step.

Run the device-local sustained starter when you want the three committed local producer fixtures in one explicit device_local mode:

python3 -m inferedge_orchestrator run-multi-workload-sustained \
  --config configs/agent_multi_workload_sustained_device_local.json \
  --output reports/agent_multi_workload_sustained_device_local.json \
  --frames 16

This records producer_sources, device_local_producer_count, and the same Vision image, Voice request, and Safety resource evidence while keeping live YOLO/ONNX, FastAPI, tegrastats, and Jetson/RPi producers as follow-up integrations.

You can replace those committed producer fixtures at run time without editing the config. --vision-input accepts a single image/video file or a directory of image frames; directories are treated as a deterministic image sequence and cycled during the sustained run:

python3 -m inferedge_orchestrator run-multi-workload-sustained \
  --config configs/agent_multi_workload_sustained_device_local.json \
  --output reports/agent_multi_workload_sustained_device_local.json \
  --frames 16 \
  --vision-input /path/to/frame-or-image-sequence \
  --voice-ingress-payload /path/to/requests.json \
  --resource-snapshot /path/to/resources.json
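
The directory handling for --vision-input described above can be pictured with a short sketch: sort the image files for a deterministic order, compute a digest per file, and cycle through them for as long as the run needs frames. The suffix list, the use of SHA-256, and the `frame_sequence` helper are assumptions for illustration, not the CLI's actual implementation.

```python
import hashlib
import itertools
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".ppm", ".png", ".jpg", ".jpeg", ".bmp"}   # assumed set


def frame_sequence(vision_input: Path):
    """Return an endless iterator of (path, sha256 digest) pairs.

    A single file repeats; a directory cycles through its images in sorted order.
    """
    if vision_input.is_dir():
        files = sorted(
            p for p in vision_input.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
        )
    else:
        files = [vision_input]
    if not files:
        raise ValueError(f"no image frames found in {vision_input}")
    digests = [(p, hashlib.sha256(p.read_bytes()).hexdigest()) for p in files]
    return itertools.cycle(digests)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("b.ppm", "a.ppm", "notes.txt"):
        (root / name).write_bytes(name.encode())   # placeholder bytes, not real images
    frames = frame_sequence(root)
    names = [next(frames)[0].name for _ in range(5)]
```

Sorting before cycling is what makes repeated runs over the same directory comparable: frame N always maps to the same file and digest.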

For a minimal process-backed Safety input, use --capture-process-resource-snapshot instead of --resource-snapshot. The CLI writes a small process resource snapshot next to the output JSON and routes it through the Safety producer. If optional ONNX Runtime dependencies and a local model file are available, the Vision producer can also run a small ONNX Runtime probe while preserving the same producer-backed sustained contract:

python3 -m inferedge_orchestrator run-multi-workload-sustained \
  --config configs/agent_multi_workload_sustained_device_local.json \
  --output reports/agent_multi_workload_sustained_device_local.json \
  --frames 16 \
  --vision-input /path/to/frame.ppm \
  --vision-onnx-model /path/to/vision_model.onnx

This records vision_inference_backend=onnxruntime, input/output shapes, provider, and probe latency as runtime operation evidence. It is a lightweight device-local producer step, not a full live YOLO service.
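
For the --capture-process-resource-snapshot option mentioned earlier in this section, a process-level snapshot written next to the output JSON might look like the sketch below. It relies on the Unix-only `resource` standard-library module; the snapshot keys and the `.process_resource.json` file name are hypothetical and may differ from what the CLI writes.

```python
import json
import os
import resource   # Unix-only standard-library module
import tempfile
import time
from pathlib import Path


def capture_process_snapshot(output_json: Path) -> Path:
    """Write a small resource snapshot of the current process beside output_json."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    snapshot = {
        "pid": os.getpid(),
        "captured_at_unix_s": time.time(),
        "cpu_user_s": usage.ru_utime,
        "cpu_system_s": usage.ru_stime,
        # ru_maxrss is reported in kilobytes on Linux and bytes on macOS.
        "max_rss_raw": usage.ru_maxrss,
    }
    snapshot_path = output_json.with_name(output_json.stem + ".process_resource.json")
    snapshot_path.write_text(json.dumps(snapshot, indent=2))
    return snapshot_path


with tempfile.TemporaryDirectory() as tmp:
    path = capture_process_snapshot(Path(tmp) / "report.json")
    loaded = json.loads(path.read_text())
```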

InferEdge Handoff

python3 -m inferedge_orchestrator from-inferedge \
  --result examples/inferedge_result_sample.json \
  --output configs/from_inferedge.json \
  --task-name detector \
  --model-path models/detector.onnx \
  --priority 100 \
  --target-fps 15 \
  --queue-size 4

The helper reads InferEdge result.json latency signals and recommends an initial latency_budget_ms for Orchestrator task policy. This keeps validation and operation control connected by artifacts while keeping the repositories separate.
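
The rule the helper applies is not spelled out here. One hypothetical rule that is consistent with the sample in Validation Evidence (`expected_latency_ms=42.2` giving `latency_budget_ms=64.0`) is to add 50% headroom and round up to a whole millisecond. The sketch below uses that assumed rule, and it also assumes `expected_latency_ms` sits at the top level of result.json.

```python
import json
import math


def recommend_latency_budget_ms(expected_latency_ms: float, headroom: float = 1.5) -> float:
    """Assumed rule: scale measured latency by a headroom factor, round up to 1 ms."""
    return float(math.ceil(expected_latency_ms * headroom))


# Stand-in for the contents of an InferEdge result.json file.
result = json.loads('{"expected_latency_ms": 42.2}')

task = {
    "name": "detector",
    "model_path": "models/detector.onnx",
    "priority": 100,
    "target_fps": 15,
    "latency_budget_ms": recommend_latency_budget_ms(result["expected_latency_ms"]),
    "queue_size": 4,
}
```

Only the JSON artifact is read, which matches the file-based boundary: no InferEdge module is imported to derive the task policy.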

Quickstart

Install the local package with test dependencies:

python3 -m pip install -e '.[dev]'

Run the tests:

python3 -m pytest

Run the scheduler demo:

python3 -m inferedge_orchestrator run \
  --config configs/phase1_demo.json \
  --output reports/phase1_demo.json \
  --frames 12

Run the ONNX Runtime demo:

python3 -m pip install -e '.[onnx,dev]'
python3 scripts/create_identity_onnx.py --output models/identity.onnx

python3 -m inferedge_orchestrator run \
  --config configs/phase2_onnx_demo.json \
  --output reports/phase2_onnx_demo.json \
  --frames 1

Print a telemetry summary:

python3 -m inferedge_orchestrator report --input reports/phase1_demo.json

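For more detail, see docs/validation_evidence.md, docs/remote_dispatch_starter.md, and docs/agent_orchestration_summary_contract.md.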