Merged
12 changes: 12 additions & 0 deletions README.md
@@ -297,6 +297,18 @@ Use `edgeenv runs telemetry export-history --output <path>` to aggregate
registered run telemetry into an `edgeenv.runtime-telemetry-history.v1` JSON
artifact. The export records missing telemetry as an evidence gap and remains
local replay evidence, not production monitoring.
If an InferEdgeOrchestrator sustained run produced an
`edgeenv_runtime_telemetry_feed` artifact, attach it during export:

```bash
edgeenv runs telemetry export-history \
  --orchestrator-feed /tmp/orchestrator-edgeenv-feed.json \
  --output /tmp/edgeenv-runtime-telemetry-history.json
```

The feed is stored as supplemental operation context for the matching run ID.
It does not replace Runtime telemetry, change comparability, or act as a
regression judgement.
Use `edgeenv runs telemetry inspect-history <path>` to validate and summarize
that replay artifact before attaching it to a regression report. The intended
local flow is export history, inspect the replay artifact, then pass it to
34 changes: 34 additions & 0 deletions docs/runtime-telemetry-history.md
@@ -53,6 +53,28 @@ History export command:
edgeenv runs telemetry export-history --output /tmp/edgeenv-runtime-telemetry-history.json
```

Optional Orchestrator operation context can be attached when the feed comes
from InferEdgeOrchestrator's EdgeEnv handoff contract:

```bash
edgeenv runs telemetry export-history \
  --orchestrator-feed /tmp/orchestrator-edgeenv-feed.json \
  --output /tmp/edgeenv-runtime-telemetry-history.json
```

The feed schema is:

```text
inferedge-orchestrator-edgeenv-runtime-telemetry-feed-v1
```

EdgeEnv only accepts this feed when it explicitly declares
`not_a_regression_judgement=true` and `not_a_comparability_gate=true`. The feed
is then preserved under the matching history entry as
`orchestrator_operation_context`. It does not replace `runtime_telemetry`, does
not turn missing telemetry into a successful telemetry run, and does not change
the same-condition comparability gate.

Replay validation command:

```bash
@@ -74,6 +96,11 @@ The history artifact uses this top-level shape:
      "run_id": "run-20260522-000000-12345678",
      "runtime_telemetry": {
        "schema_version": "inferedge-runtime-telemetry-v1"
      },
      "orchestrator_operation_context": {
        "schema_version": "inferedge-orchestrator-edgeenv-runtime-telemetry-feed-v1",
        "not_a_regression_judgement": true,
        "not_a_comparability_gate": true
      }
    }
  ],
@@ -128,6 +155,11 @@ Replay edge cases are preserved as evidence context:
  preserves both result-side and history-side sequence IDs. This does not
  change comparability or regression math; downstream diagnosis can treat it as
  deterministic review context.
- If an Orchestrator feed is attached, the regression report exposes it under
  the matching run's runtime telemetry context as supplemental operation
  evidence. Queue depth, deadline/fallback, and resource hints remain context
  for downstream review; EdgeEnv still owns only comparability-first regression
  analysis.
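
As a rough illustration of the last bullet, the sketch below shows how a feed preserved under a history entry could surface in a run's telemetry context. It mirrors the `_attach_orchestrator_context` helper this change adds to `inferedge_env/compare/regression.py`. The sample history entry, its `queue_depth` and `deadline` section names, and their values are hypothetical.

```python
from typing import Any


def attach_orchestrator_context(
    context: dict[str, Any],
    history_item: dict[str, Any],
) -> None:
    """Copy Orchestrator operation context into a run's telemetry context."""
    orchestrator_context = history_item.get("orchestrator_operation_context")
    if not isinstance(orchestrator_context, dict):
        return  # No feed attached: leave the context untouched.
    candidate_context = orchestrator_context.get("candidate_context")
    if not isinstance(candidate_context, dict):
        candidate_context = {}
    context["orchestrator_context_present"] = True
    context["orchestrator_operation_context"] = orchestrator_context
    # Only the section names are listed; their contents stay opaque here.
    context["orchestrator_available_sections"] = sorted(
        str(key) for key in candidate_context
    )


# Hypothetical history entry; section names and values are illustrative only.
history_entry = {
    "run_id": "run-20260522-000000-12345678",
    "orchestrator_operation_context": {
        "schema_version": "inferedge-orchestrator-edgeenv-runtime-telemetry-feed-v1",
        "not_a_regression_judgement": True,
        "not_a_comparability_gate": True,
        "candidate_context": {"queue_depth": {"max": 4}, "deadline": {"missed": 0}},
    },
}

context: dict[str, Any] = {"run_id": history_entry["run_id"]}
attach_orchestrator_context(context, history_entry)
print(context["orchestrator_available_sections"])  # ['deadline', 'queue_depth']
```

The helper only adds keys. It never touches comparability or regression fields, which matches the "supplemental evidence" wording above.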

Optional AIGuard handoff:

@@ -149,6 +181,7 @@ remains the final deployment decision owner.
- Do not describe this as production observability, cloud monitoring, distributed tracing, or real-time data drift detection.
- Do not use telemetry to bypass the existing comparability-first regression policy.
- Do not treat `inspect-history` as a live health check; it only validates a local replay artifact.
- Do not use Orchestrator operation feed context as a substitute for Runtime telemetry or Lab deployment judgement.
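
The last guardrail can be kept mechanical by rejecting any feed that does not declare itself as context only, which is the acceptance rule this document gives for `--orchestrator-feed`. A minimal sketch of such a check follows. The function name `is_acceptable_feed` is hypothetical, reading the schema identifier from a `schema_version` key is an assumption based on the history artifact example, and EdgeEnv's actual validation code may differ.

```python
from typing import Any

FEED_SCHEMA = "inferedge-orchestrator-edgeenv-runtime-telemetry-feed-v1"
REQUIRED_DISCLAIMERS = ("not_a_regression_judgement", "not_a_comparability_gate")


def is_acceptable_feed(feed: Any) -> bool:
    """Return True only for feeds that explicitly declare themselves context-only."""
    if not isinstance(feed, dict):
        return False
    if feed.get("schema_version") != FEED_SCHEMA:
        return False
    # Assumption: require the literal boolean True, so a missing flag or a
    # truthy string such as "true" is rejected.
    return all(feed.get(flag) is True for flag in REQUIRED_DISCLAIMERS)


good_feed = {
    "schema_version": FEED_SCHEMA,
    "not_a_regression_judgement": True,
    "not_a_comparability_gate": True,
}
bad_feed = {"schema_version": FEED_SCHEMA, "not_a_regression_judgement": True}

print(is_acceptable_feed(good_feed))  # True
print(is_acceptable_feed(bad_feed))   # False
```

Failing closed on a missing flag keeps a feed from being read as a regression verdict by omission.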

## 5. WHERE — Role In The InferEdge Flow

@@ -159,6 +192,7 @@ Current flow:
```text
Runtime result
-> EdgeEnv result.json + runtime_telemetry.json
-> optional Orchestrator edgeenv_runtime_telemetry_feed context
-> EdgeEnv export/import replay seed
-> EdgeEnv runtime telemetry history artifact
-> EdgeEnv inspect-history replay validation
22 changes: 22 additions & 0 deletions inferedge_env/cli.py
@@ -382,13 +382,22 @@ def export_runtime_telemetry_history(
        "--edgeenv-root",
        help="Directory for EdgeEnv artifacts and registry.",
    ),
    orchestrator_feeds: Optional[list[Path]] = typer.Option(
        None,
        "--orchestrator-feed",
        help=(
            "Optional InferEdgeOrchestrator EdgeEnv telemetry feed JSON to attach "
            "as supplemental operation context. Repeat for multiple run IDs."
        ),
    ),
) -> None:
    """Export local runtime telemetry evidence as a replayable history artifact."""
    try:
        payload = write_runtime_telemetry_history(
            edgeenv_root,
            output_path,
            run_ids=run_ids,
            orchestrator_feeds=orchestrator_feeds,
        )
    except (RuntimeTelemetryHistoryError, OSError) as exc:
        _fail(str(exc), hint=_telemetry_history_error_hint(str(exc)))
@@ -398,6 +407,9 @@ def export_runtime_telemetry_history(
    console.print(f"Runs scanned: {summary['registered_runs']}")
    console.print(f"Telemetry entries: {summary['telemetry_runs']}")
    console.print(f"Missing telemetry: {summary['missing_telemetry_runs']}")
    console.print(
        f"Orchestrator context entries: {summary.get('orchestrator_feed_runs', 0)}"
    )
    console.print(
        "Scope: local replay evidence; not production monitoring.",
        soft_wrap=True,
@@ -428,6 +440,10 @@ def inspect_runtime_telemetry_history_command(
    console.print(f"Schema: {summary['schema_version']}")
    console.print(f"Replay runs: {len(replay['run_ids'])}")
    console.print(f"Telemetry fields: {', '.join(replay['telemetry_fields']) or '-'}")
    console.print(
        "Orchestrator context runs: "
        f"{len(replay.get('orchestrator_context_run_ids', []))}"
    )
    console.print(f"Evidence gaps: {replay['evidence_gap_count']}")
    console.print(f"Missing run IDs: {', '.join(replay['missing_run_ids']) or '-'}")
    console.print(
@@ -1205,6 +1221,12 @@ def _import_error_hint(message: str) -> str:


def _telemetry_history_error_hint(message: str) -> str:
    if "Orchestrator telemetry feed" in message:
        return (
            "Attach only EdgeEnv runtime telemetry feed artifacts produced by "
            "InferEdgeOrchestrator for run IDs included in this export. The feed "
            "is supplemental operation context and never replaces runtime telemetry."
        )
    if "Run not found" in message:
        return (
            "Use `edgeenv runs list` to find registered run IDs, or omit "
20 changes: 20 additions & 0 deletions inferedge_env/compare/regression.py
@@ -244,6 +244,7 @@ def _maybe_runtime_telemetry_context(
            "Runtime telemetry context is supplemental evidence, not a comparability gate.",
            "Missing telemetry is an evidence gap, not a failed benchmark run.",
            "Regression deltas are still gated by same-condition comparability.",
            "Orchestrator operation context is supplemental evidence, not a regression judgement.",
        ],
    }
    if telemetry_history is not None:
@@ -321,11 +322,30 @@ def _telemetry_run_context(
        context["history_execution_sequence_id"] = history_entry.get(
            "execution_sequence_id"
        )
        _attach_orchestrator_context(context, history_entry)
    if missing_entry is not None:
        context["history_missing_reason"] = missing_entry.get("reason")
        _attach_orchestrator_context(context, missing_entry)
    return context


def _attach_orchestrator_context(
    context: dict[str, Any],
    history_item: dict[str, Any],
) -> None:
    orchestrator_context = history_item.get("orchestrator_operation_context")
    if not isinstance(orchestrator_context, dict):
        return
    candidate_context = orchestrator_context.get("candidate_context")
    if not isinstance(candidate_context, dict):
        candidate_context = {}
    context["orchestrator_context_present"] = True
    context["orchestrator_operation_context"] = orchestrator_context
    context["orchestrator_available_sections"] = sorted(
        str(key) for key in candidate_context.keys()
    )


def _telemetry_source(telemetry: dict[str, Any]) -> str | None:
    resource = telemetry.get("resource")
    if not isinstance(resource, dict):