InferEdgeAIGuard

Optional deterministic diagnosis evidence layer
(provenance mismatch · suspicious result signals · guard_analysis)

Language: English | 한국어

GitHub description: Optional deterministic diagnosis layer for provenance mismatch and suspicious inference result evidence.

Summary

Optional deterministic diagnosis layer for the InferEdge validation pipeline
Reads Lab compare/result/history JSON and Runtime/Forge provenance evidence
Detects suspicious inference signals, provenance mismatch, and weak validation evidence
Emits guard_analysis as optional evidence for Lab reports/API bundles
Supports review decisions without replacing InferEdgeLab as the decision owner

What Makes InferEdgeAIGuard Different?

InferEdgeAIGuard is not an LLM guessing layer.

It is a rule/evidence based diagnosis layer that:

checks latency, accuracy, provenance, output pattern, and run-history signals
explains suspected causes with deterministic evidence
preserves warnings/errors in a structured guard_analysis contract
stays optional so Lab remains the final deployment decision owner

InferEdge Pipeline Role

InferEdgeAIGuard is the optional rule + evidence based diagnosis layer of the larger InferEdge validation pipeline:

ONNX model
-> InferEdgeForge build
-> metadata / manifest / worker runtime summary
-> InferEdgeRuntime validation / result export
-> InferEdgeLab compare / API / job workflow / deployment_decision
-> optional InferEdgeAIGuard provenance diagnosis
-> deploy / review / blocked decision

Experiment hygiene / comparability layer:
InferEdgeEnv (v0.1.5, v1-complete): a local-first run evidence registry and comparability checker

In that pipeline, AIGuard consumes evidence produced by Forge, Runtime, and Lab. It can compare Forge worker/runtime summary provenance with Runtime worker_response provenance, inspect Lab result/compare context, and emit optional guard_analysis for Lab to preserve in reports and API bundles.

Implemented today:

deterministic detector-based reasoning for Lab compare/result/history JSON
evidence schema, severity/verdict mapping, explanation builder, and JSON/Markdown report persistence
output-level bbox validity, bbox collapse, confidence distribution, detection count drift, NaN/Inf, and score range detectors
baseline-vs-candidate comparison for output quality drift and suspicious speed/quality trade-offs
initial temporal consistency evidence for detection count variance, bbox center movement, class flip rate, and track-free temporal instability signals
runtime reliability evidence from Orchestrator orchestration_summary files: deadline miss, drop/fallback, queue backlog, queue pressure reasons, worker operation risk summaries, device-local producer/event coverage, sustained workload profile pressure, local profile adapter signals, and optional tegrastats thermal/resource signals
portfolio demo diagnosis bundle covering normal/pass, bbox collapse/blocked, score saturation/blocked, temporal instability/review_required, and provenance mismatch cases
artifact and source model provenance mismatch detection
Forge summary vs Runtime worker_response provenance mismatch coverage
guard_analysis schema compatibility with Lab deployment decision handoff

Planned later:

production service or worker packaging
broader detector coverage as new Runtime/Forge evidence fields become stable
deeper integration with future SaaS job execution infrastructure

AIGuard is not an LLM guessing layer and does not make the final deployment decision. InferEdgeLab remains the final deployment_decision owner; AIGuard supplies optional evidence that can support review or block decisions.

Portfolio boundary: InferEdgeLab is the validation / decision layer. InferEdgeEnv is the v0.1.5 v1-complete experiment hygiene / comparability layer; it records whether benchmark evidence can be trusted and compared without replacing AIGuard diagnosis evidence or Lab deployment decisions.

Why This Exists

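In edge AI, validation evidence can be insufficient even when the latency numbers look good.

Latency may appear improved while accuracy was never recorded.
An FP16/INT8 candidate may not deliver the expected speedup over FP32.
In a repeated-run history, accuracy may be recorded for only some runs.
These problems are easy to miss when looking only at raw benchmark numbers.

AIGuard does not take inference results at face value; it uses result-level evidence to explain suspicious signals and their suspected causes.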

Current Capabilities

Output-level failure detection

Analyzes YOLO detection output JSON directly.

bbox collapse
confidence saturation
detection count mismatch
Supports single-output, FP32/candidate pair, and batch directory analysis
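A minimal sketch of a detection count mismatch check for an FP32/candidate pair, assuming both outputs follow the output JSON schema described later in this README. The function names are hypothetical, and reusing the 50% / 80% drop thresholds from the detector verdict matrix is an illustrative assumption, not AIGuard's actual implementation.

```python
# Hypothetical thresholds, borrowed from the "detection disappearance" row of the
# detector verdict matrix; the real detector config may differ.
REVIEW_DROP_PCT = 50.0
BLOCK_DROP_PCT = 80.0


def detection_count_drop_pct(base: dict, candidate: dict) -> float:
    """Percentage drop in detection count from a baseline output to a candidate output."""
    base_count = len(base["detections"])
    candidate_count = len(candidate["detections"])
    if base_count == 0:
        # No baseline detections: a drop is undefined, so report no drop.
        return 0.0
    return max(0.0, (base_count - candidate_count) / base_count * 100.0)


def count_drift_verdict(drop_pct: float) -> str:
    """Map a count drop to a verdict-style label (hypothetical mapping)."""
    if drop_pct >= BLOCK_DROP_PCT:
        return "blocked"
    if drop_pct >= REVIEW_DROP_PCT:
        return "review_required"
    return "pass"


fp32 = {"precision": "fp32", "detections": [{}, {}, {}, {}]}
int8 = {"precision": "int8", "detections": [{}]}
drop = detection_count_drop_pct(fp32, int8)
print(drop, count_drift_verdict(drop))  # 75.0 review_required
```

Comparing counts per image pair keeps the check deterministic; batch modes would aggregate these per-pair results.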

Compare result reasoning

Analyzes Lab compare result JSON with the reason-compare command or the unified reason command.

latency improvement + accuracy missing
latency improvement + accuracy drop 또는 risky tradeoff
shape/run_config mismatch
cross-precision large latency delta
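The first two compare rules can be sketched as follows. This assumes a simplified, hypothetical compare shape (`base` / `candidate` dicts carrying `latency_ms`, `accuracy`, `input_shape`); real Lab compare JSON is normalized by an adapter first. `accuracy_missing_warning` is a finding name that appears in this README; `risky_tradeoff` and `shape_mismatch` are illustrative.

```python
# Hypothetical tolerance: how much accuracy may fall before the trade-off is flagged.
ACCURACY_DROP_TOLERANCE = 0.01


def reason_compare_sketch(compare: dict) -> list[str]:
    """Flag suspicious latency/accuracy combinations in a simplified compare result."""
    base, cand = compare["base"], compare["candidate"]
    findings = []
    base_lat, cand_lat = base.get("latency_ms"), cand.get("latency_ms")
    latency_improved = base_lat is not None and cand_lat is not None and cand_lat < base_lat

    if latency_improved and cand.get("accuracy") is None:
        # Faster, but there is no accuracy evidence to justify the speedup.
        findings.append("accuracy_missing_warning")
    elif (
        latency_improved
        and base.get("accuracy") is not None
        and cand["accuracy"] < base["accuracy"] - ACCURACY_DROP_TOLERANCE
    ):
        # Faster, but accuracy fell by more than the tolerance.
        findings.append("risky_tradeoff")

    if base.get("input_shape") != cand.get("input_shape"):
        findings.append("shape_mismatch")
    return findings


compare = {
    "base": {"latency_ms": 10.0, "accuracy": 0.80, "input_shape": [1, 3, 640, 640]},
    "candidate": {"latency_ms": 4.0, "accuracy": None, "input_shape": [1, 3, 640, 640]},
}
print(reason_compare_sketch(compare))  # ['accuracy_missing_warning']
```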

Structured result reasoning

Analyzes a single Lab structured result JSON with the reason-result command or the unified reason command.

missing latency metric
invalid latency value
p99 latency instability
missing runtime_artifact_path
missing resolved_input_shapes
quantized result without accuracy
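A sketch of these single-result checks. `runtime_artifact_path` and `resolved_input_shapes` are field names from this README; the `latency_ms` / `precision` / `accuracy` layout, the p99-to-mean limit, and the finding codes are assumptions for illustration.

```python
import math

# Hypothetical threshold: p99 more than 2x the mean is treated as tail instability.
P99_TO_MEAN_LIMIT = 2.0
REDUCED_PRECISIONS = {"fp16", "int8"}  # assumed to count as "quantized" in this sketch


def reason_result_sketch(result: dict) -> list[str]:
    findings = []
    latency = result.get("latency_ms") or {}
    mean = latency.get("mean")
    if mean is None:
        findings.append("missing_latency_metric")
    elif not isinstance(mean, (int, float)) or not math.isfinite(mean) or mean <= 0:
        findings.append("invalid_latency_value")
    else:
        p99 = latency.get("p99")
        if isinstance(p99, (int, float)) and p99 / mean > P99_TO_MEAN_LIMIT:
            findings.append("p99_latency_instability")

    # Provenance fields that let a reviewer tie the numbers to a concrete artifact.
    if not result.get("runtime_artifact_path"):
        findings.append("missing_runtime_artifact_path")
    if not result.get("resolved_input_shapes"):
        findings.append("missing_resolved_input_shapes")
    if result.get("precision") in REDUCED_PRECISIONS and result.get("accuracy") is None:
        findings.append("quantized_result_without_accuracy")
    return findings


result = {
    "precision": "int8",
    "latency_ms": {"mean": 5.0, "p99": 15.0},
    "runtime_artifact_path": "build/yolov8n_int8.engine",
    "resolved_input_shapes": {"images": [1, 3, 640, 640]},
}
print(reason_result_sketch(result))
# ['p99_latency_instability', 'quantized_result_without_accuracy']
```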

Forge/Runtime provenance reasoning

Provides a rule-based detector that compares the provenance recorded in Forge metadata/manifest with that in Runtime result JSON.

artifact sha256 mismatch
source model sha256 mismatch
Forge worker/runtime summary vs Runtime worker_response provenance mismatch
runtime artifact path mismatch
backend/target/precision/shape mismatch
insufficient Forge/Runtime provenance

This detector does not execute the actual artifact. Instead, it uses evidence to check whether the build provenance recorded by Forge and the profiling/worker response provenance recorded by Runtime point to the same artifact. A clear hash mismatch can lead to an error guard_analysis, while path/config/shape mismatches or missing provenance are kept as warning evidence.
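A minimal sketch of that escalation rule, assuming flat, hypothetical field names on both sides (real Forge/Runtime JSON may nest them differently): hash mismatches escalate to `error`, while missing hashes and path/config/shape mismatches stay at `warning`.

```python
SEVERITY_ORDER = {"ok": 0, "warning": 1, "error": 2}
# Hypothetical field names; real Forge/Runtime JSON may nest these differently.
HASH_FIELDS = ("artifact_sha256", "source_model_sha256")
CONFIG_FIELDS = ("runtime_artifact_path", "backend", "target", "precision", "input_shape")


def _worse(a: str, b: str) -> str:
    """Return the more severe of two statuses."""
    return a if SEVERITY_ORDER[a] >= SEVERITY_ORDER[b] else b


def provenance_status(forge: dict, runtime: dict) -> tuple[str, list[str]]:
    status, anomalies = "ok", []
    for field in HASH_FIELDS:
        f_val, r_val = forge.get(field), runtime.get(field)
        if f_val is None or r_val is None:
            # Missing provenance cannot prove a mismatch: warning only.
            anomalies.append(f"insufficient_provenance:{field}")
            status = _worse(status, "warning")
        elif f_val != r_val:
            # A clear hash mismatch means the evidence may describe a different artifact.
            anomalies.append(f"{field}_mismatch")
            status = "error"
    for field in CONFIG_FIELDS:
        f_val, r_val = forge.get(field), runtime.get(field)
        if f_val is not None and r_val is not None and f_val != r_val:
            anomalies.append(f"{field}_mismatch")
            status = _worse(status, "warning")
    return status, anomalies


forge = {"artifact_sha256": "aa", "source_model_sha256": "s1", "backend": "tensorrt"}
runtime = {"artifact_sha256": "bb", "source_model_sha256": "s1", "backend": "onnxruntime"}
print(provenance_status(forge, runtime))
# ('error', ['artifact_sha256_mismatch', 'backend_mismatch'])
```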

Run history reasoning

Analyzes a repeated Lab structured result list JSON with the reason-history command or the unified reason command.

repeated-run mean latency instability
p99 tail latency instability
latency outlier run
mixed experiment group
partial or missing accuracy logging

CLI Overview

| Command | Input | Purpose |
|---|---|---|
| `analyze` | YOLO output JSON | Single output failure detection |
| `compare` | FP32/candidate output JSON | Output-level pair comparison |
| `batch-analyze` | Directory of output JSON | Batch output failure rate |
| `batch-compare` | FP32/candidate directories | Batch output comparison |
| `reason-compare` | Lab compare result JSON | Compare result reasoning |
| `reason-result` | Lab structured result JSON | Single result reasoning |
| `reason-history` | Lab structured result list JSON | Multi-run stability reasoning |
| `reason-orchestration` | Orchestrator summary JSON | Runtime reliability reasoning |
| `reason` | Compare/result/history/orchestration JSON | Unified auto-routing reasoning |

Quick Smoke Commands

```bash
python -m inferedge_aiguard.cli reason --input examples/lab_compat/lab_compare_realistic.json
# Expected: accuracy_missing_warning, likely_quantization_effect
python -m inferedge_aiguard.cli reason --input real_device/jetson/compare_fp32_fp16.json
# Expected: insufficient_precision_speedup
python -m inferedge_aiguard.cli reason --input real_device/jetson/history/yolov8n_fp16_history.json
# Expected: partial_accuracy_missing
```

Unified Reason CLI

The reason command inspects the type of the input JSON and automatically routes it to the appropriate reasoning path.

If the JSON is a list, it runs run history reasoning, the same as reason-history.
If the JSON looks like a Lab compare result dict, it normalizes it through the adapter and runs compare reasoning, the same as reason-compare.
If the JSON looks like a Lab structured result dict, it runs single-result reasoning, the same as reason-result.
If the JSON looks like an Orchestrator inferedge-orchestration-summary-v1 dict, it runs runtime reliability reasoning, the same as reason-orchestration.
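The routing rules above can be sketched as a single dispatch function. The key names it inspects (`schema`, `base`/`candidate`, `latency_ms`, `runtime_artifact_path`) are hypothetical stand-ins for whatever shape markers the real CLI uses; only the order of the checks mirrors the description.

```python
import json


def route(payload) -> str:
    """Pick a reasoning path from the JSON shape (hypothetical key names)."""
    if isinstance(payload, list):
        return "history"          # same as reason-history
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON list or object")
    if payload.get("schema") == "inferedge-orchestration-summary-v1":
        return "orchestration"    # same as reason-orchestration
    if "base" in payload and "candidate" in payload:
        return "compare"          # same as reason-compare
    if "latency_ms" in payload or "runtime_artifact_path" in payload:
        return "result"           # same as reason-result
    raise ValueError("could not classify input JSON")


print(route(json.loads('[{"latency_ms": {"mean": 4.1}}]')))  # history
```

Checking the explicit schema marker before the looser shape heuristics keeps an orchestration summary from being misread as a compare or result dict.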

```bash
python -m inferedge_aiguard.cli reason --input examples/lab_compat/lab_compare_realistic.json
python -m inferedge_aiguard.cli reason --input examples/lab_compat/lab_result_realistic.json
python -m inferedge_aiguard.cli reason --input examples/lab_compat/lab_history_realistic.json
```

Reports can also be saved from the same entrypoint.

```bash
python -m inferedge_aiguard.cli reason \
  --input examples/lab_compat/lab_history_realistic.json \
  --save-json reports/reason.json \
  --save-md reports/reason.md
```

This structure maps naturally onto a single endpoint if AIGuard is later extended into an API or SaaS. At this stage no SaaS/API server is implemented; only the CLI entrypoint and JSON/Markdown report saving are provided.

If you need an explicit command, the existing reason-compare, reason-result, and reason-history commands remain available.

Orchestrator runtime reliability summaries can be analyzed with the same flow.

```bash
python -m inferedge_aiguard.cli reason-orchestration \
  --input reports/agent_orchestration_summary.json
python -m inferedge_aiguard.cli reason \
  --input reports/agent_orchestration_summary.json
```

This path converts policy_decision_log, decision_reason, queue_depth_timeline, deadline miss, and drop/fallback signals into guard_analysis evidence. AIGuard explains runtime reliability risk; the final deployment decision remains with InferEdgeLab.

EdgeEnv runtime regression reports can also be interpreted as deterministic runtime anomaly evidence.

```bash
python -m inferedge_aiguard.cli reason-edgeenv-regression \
  --input reports/edgeenv_runtime_regression.json
python -m inferedge_aiguard.cli reason \
  --input reports/edgeenv_runtime_regression.json
python -m inferedge_aiguard.cli reason-edgeenv-regression \
  --input examples/runtime_intelligence/edgeenv_runtime_regression_with_orchestrator_feed.json \
  --save-json examples/runtime_intelligence/aiguard_runtime_operation_guard_analysis.json
```

This path respects EdgeEnv's comparability-first results while generating runtime_latency_regression, runtime_throughput_regression, runtime_memory_regression, runtime_telemetry_context_coverage, and runtime_telemetry_replay_context evidence. If EdgeEnv includes thermal/throttling or queue depth signals in the runtime telemetry context, AIGuard additively generates runtime_thermal_instability and runtime_queue_overload evidence as well. AIGuard does not own the regression calculation or the final deployment decision.

When EdgeEnv provides runtime_telemetry_context.history.telemetry_coverage, AIGuard prefers that producer-side replay summary and uses it to explain the coverage ratio, missing-field runs, and missing_telemetry_is_failure as deterministic warning context. Only when this summary is absent does AIGuard fall back to per-run runtime_telemetry.coverage, and it never promotes a coverage gap directly into a deployment judgement. Candidate telemetry gaps and baseline/candidate execution sequence inversions are preserved as warning evidence originating from the EdgeEnv replay context; AIGuard does not re-judge them as comparability decisions.

AIGuard keeps the Orchestrator edgeenv_mapping_hint preserved by EdgeEnv in the raw context, so the boundary coverage_summary_owner=edgeenv, coverage_summary_path=runtime_telemetry_context.history.telemetry_coverage, operation_context_role=supplemental can be explained all the way to the Lab bundle. These values are ownership markers; they do not mean that AIGuard owns coverage or regression.

tests/fixtures/edgeenv_regression/ contains small CLI smoke inputs that mirror EdgeEnv's committed replay fixtures. examples/runtime_intelligence/aiguard_runtime_operation_guard_analysis.json is an example of a precomputed guard_analysis artifact that can be placed in a Lab Runtime Intelligence bundle. Its filename matches the AIGuard artifact role in the Lab bundle; here too, AIGuard only produces deterministic evidence and does not make a deployment decision.
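The summary-first, per-run-fallback rule can be sketched as below. The two coverage paths come from this README; the top-level `runs` list used for the fallback and the `coverage_ratio` key in the example are hypothetical.

```python
def select_telemetry_coverage(regression: dict) -> tuple[str, object]:
    """Prefer EdgeEnv's producer-side replay summary; fall back to per-run coverage."""
    history = (regression.get("runtime_telemetry_context") or {}).get("history") or {}
    summary = history.get("telemetry_coverage")
    if summary is not None:
        return "runtime_telemetry_context.history.telemetry_coverage", summary
    # Fallback only when no summary exists ("runs" is a hypothetical key).
    per_run = [
        (run.get("runtime_telemetry") or {}).get("coverage")
        for run in regression.get("runs", [])
    ]
    return "runtime_telemetry.coverage", [c for c in per_run if c is not None]


with_summary = {
    "runtime_telemetry_context": {"history": {"telemetry_coverage": {"coverage_ratio": 0.75}}}
}
print(select_telemetry_coverage(with_summary)[0])
# runtime_telemetry_context.history.telemetry_coverage
```

Returning the source path alongside the data lets the resulting warning evidence state which producer owns the coverage numbers, in line with the coverage_summary_owner marker.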

Remote dispatch starter results can also be interpreted as deterministic evidence.

```bash
python -m inferedge_aiguard.cli reason-remote-dispatch \
  --input reports/remote_dispatch_result.json
python -m inferedge_aiguard.cli reason \
  --input reports/remote_dispatch_result.json
```

This path converts the worker selection, remote_execution_result.status, error_category, and HTTP/SSH starter success/failure in inferedge-remote-dispatch-result-v1 into evidence such as remote_execution_plan_only, remote_execution_starter_success, remote_execution_failed, and remote_execution_recovered_by_fallback. Even when a fallback succeeds, the primary worker's instability is kept as review evidence. This is explicit starter execution evidence, not a production remote execution verdict.
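A sketch of that mapping. The evidence codes come from this README; the `status` values, the `fallback_used` flag, and the pairing of each code with a guard status are assumptions made for illustration.

```python
# Hypothetical status values and mapping; the real inferedge-remote-dispatch-result-v1
# payload may use different field names and values.
def remote_dispatch_evidence(result: dict) -> tuple[str, str]:
    """Return (evidence_code, guard_status) for a remote dispatch starter result."""
    execution = result.get("remote_execution_result") or {}
    status = execution.get("status")
    if status == "plan_only":
        # Nothing was executed, so there is no runtime evidence to judge.
        return "remote_execution_plan_only", "skipped"
    if status == "succeeded" and execution.get("fallback_used"):
        # Fallback recovered the run, but primary worker instability stays review evidence.
        return "remote_execution_recovered_by_fallback", "warning"
    if status == "succeeded":
        return "remote_execution_starter_success", "ok"
    return "remote_execution_failed", "error"


print(remote_dispatch_evidence(
    {"remote_execution_result": {"status": "succeeded", "fallback_used": True}}
))  # ('remote_execution_recovered_by_fallback', 'warning')
```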

Quick Examples

Analyze a single YOLO output.

```bash
python -m inferedge_aiguard.cli analyze --input examples/single/fp32_normal.json
```

Compare an FP32 baseline with a candidate output.

```bash
python -m inferedge_aiguard.cli compare \
  --base examples/single/fp32_normal.json \
  --candidate examples/single/int8_count_mismatch.json
```

Batch-analyze multiple YOLO outputs.

```bash
python -m inferedge_aiguard.cli batch-analyze --input-dir examples/single
```

Batch-compare FP32/candidate directories, pairing files by filename.

```bash
python -m inferedge_aiguard.cli batch-compare \
  --base-dir examples/fp32 \
  --candidate-dir examples/int8
```
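Filename-based pairing can be sketched with `pathlib`. The function names are hypothetical, and reporting unpaired files on either side is an assumption about what a reviewer would want to see, not a documented AIGuard behavior.

```python
from pathlib import Path


def pair_by_filename(base_files, candidate_files):
    """Pair baseline/candidate files that share a filename; report the leftovers."""
    base = {Path(p).name: Path(p) for p in base_files}
    cand = {Path(p).name: Path(p) for p in candidate_files}
    shared = sorted(base.keys() & cand.keys())
    pairs = [(base[name], cand[name]) for name in shared]
    only_base = sorted(base.keys() - cand.keys())
    only_candidate = sorted(cand.keys() - base.keys())
    return pairs, only_base, only_candidate


def pair_directories(base_dir, candidate_dir):
    """Apply the pairing to the *.json files in two directories."""
    return pair_by_filename(Path(base_dir).glob("*.json"), Path(candidate_dir).glob("*.json"))


pairs, only_base, only_candidate = pair_by_filename(
    ["examples/fp32/a.json", "examples/fp32/b.json"],
    ["examples/int8/b.json", "examples/int8/c.json"],
)
print([(b.name, c.name) for b, c in pairs], only_base, only_candidate)
# [('b.json', 'b.json')] ['a.json'] ['c.json']
```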

Lab Compatibility Examples

examples/lab_compat contains compatibility fixtures that are closer to real InferEdgeLab output. Without importing the actual Lab repo, they verify that the unified reason CLI routes Lab-style JSON to the correct reasoning path.

lab_compare_realistic.json: a cross-precision FP32 vs INT8 compare result
lab_result_realistic.json: a single TensorRT INT8 structured result
lab_history_realistic.json: a repeated TensorRT INT8 structured result history

```bash
python -m inferedge_aiguard.cli reason --input examples/lab_compat/lab_compare_realistic.json
python -m inferedge_aiguard.cli reason --input examples/lab_compat/lab_result_realistic.json
python -m inferedge_aiguard.cli reason --input examples/lab_compat/lab_history_realistic.json
```

This stage checks JSON compatibility; it does not import the real Lab repo.

Lab Deployment Decision Handoff

The deployment decision layer in InferEdgeLab 4.2 keeps AIGuard as optional evidence. When AIGuard runs, Lab reads guard_analysis.status and reflects it in the final deployment decision.

Stable MVP mapping:

| `guard_analysis.status` | Lab deployment decision impact |
|---|---|
| `ok` | a favorable Lab judgement can become `deployable`; a neutral judgement can become `deployable_with_note` |
| `warning` | `review_required` |
| `error` | `blocked` |
| `skipped` | `unknown` |
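The table can be read as a small pure function. This is a sketch of the documented mapping only, not Lab's code: the `favorable` / `neutral` judgement strings are assumed labels, and returning `None` stands for "AIGuard implies no change", since AIGuard must not overwrite Lab judgement.

```python
from typing import Optional


def lab_decision_impact(guard_status: str, lab_judgement: str) -> Optional[str]:
    """Deployment decision implied by guard_analysis.status, or None when Lab's own
    decision stands unchanged (judgement labels are hypothetical)."""
    if guard_status == "warning":
        return "review_required"
    if guard_status == "error":
        return "blocked"
    if guard_status == "skipped":
        return "unknown"
    if guard_status == "ok":
        if lab_judgement == "favorable":
            return "deployable"
        if lab_judgement == "neutral":
            return "deployable_with_note"
        return None
    raise ValueError(f"unknown guard_analysis.status: {guard_status!r}")


print(lab_decision_impact("warning", "favorable"))  # review_required
```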

AIGuard output remains rule + evidence based. It should include reviewer-facing evidence such as mode, anomalies, suspected_causes, recommendations, and confidence, but it must not overwrite Lab judgement.

The schema helper validate_guard_analysis locks this handoff shape inside AIGuard without requiring a runtime dependency on InferEdgeLab.
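For readers who want a feel for the handoff shape, here is an independent, hypothetical shape check; it is not `validate_guard_analysis`, whose actual signature and rules are not described here. It checks the reviewer-facing fields listed above and assumes `confidence` is a number in [0, 1].

```python
ALLOWED_STATUS = {"ok", "warning", "error", "skipped"}
REQUIRED_KEYS = ("status", "mode", "anomalies", "suspected_causes", "recommendations", "confidence")


def check_guard_analysis_shape(payload: dict) -> list[str]:
    """Return a list of shape problems (empty when the payload looks well-formed)."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in payload]
    status = payload.get("status")
    if status is not None and status not in ALLOWED_STATUS:
        problems.append(f"unknown status: {status!r}")
    for key in ("anomalies", "suspected_causes", "recommendations"):
        if key in payload and not isinstance(payload[key], list):
            problems.append(f"{key} must be a list")
    conf = payload.get("confidence")
    # Assumption: confidence is numeric; the real schema may use another representation.
    if conf is not None and not (isinstance(conf, (int, float)) and 0.0 <= conf <= 1.0):
        problems.append("confidence must be a number in [0, 1]")
    return problems


valid = {
    "status": "warning",
    "mode": "compare",
    "anomalies": ["accuracy_missing_warning"],
    "suspected_causes": ["likely_quantization_effect"],
    "recommendations": ["log accuracy for the candidate run"],
    "confidence": 0.7,
}
print(check_guard_analysis_shape(valid))  # []
```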

Validation Evidence

InferEdgeAIGuard includes a fixture-based validation report that demonstrates how the reasoning layer detects suspicious compare results, structured result issues, and repeated-run instability.

| Evidence | Path | Purpose |
|---|---|---|
| Fixture validation report | `docs/validation_report.md` | Reasoning validation on Lab-like fixtures |
| Jetson validation report | `docs/jetson_validation_report.md` | Real-device evidence |
| Portfolio summary | `docs/portfolio_summary.md` | Overview for interviews and portfolio presentation |
| Runtime reliability signals | `docs/runtime_reliability_signals.md` | Orchestrator scheduling/sustained telemetry -> guard_analysis mapping |
| Jetson compare evidence | `real_device/jetson/compare_fp32_fp16.json` | FP32 vs FP16 speedup validation |
| Jetson history evidence | `real_device/jetson/history/yolov8n_fp16_history.json` | Repeated-run logging consistency validation |

Portfolio summary: docs/portfolio_summary.md
Detector validation matrix: docs/detector_validation_matrix.md
Runtime reliability signals: docs/runtime_reliability_signals.md
Validation report: docs/validation_report.md
Jetson validation plan: docs/jetson_validation_plan.md
Jetson validation report: docs/jetson_validation_report.md
GitHub publication notes: docs/github_publication_notes.md
Saved evidence reports: reports/validation/
Real-device Jetson reports: reports/jetson/
Real-device Jetson inputs: real_device/jetson/
Inputs: examples/lab_compat/

Fixture-based validation, Jetson real-device validation, and run-history reasoning evidence are available now. The execution checklist/history remains in docs/jetson_validation_plan.md, and the current Jetson FP32/FP16 evidence is summarized in docs/jetson_validation_report.md.

Jetson run history reasoning evidence has also been added. It shows that AIGuard can detect inconsistent accuracy logging across repeated FP16 runs as partial_accuracy_missing.

Detector Validation Matrix

AIGuard detectors are deterministic evidence providers. They explain why a result should pass, require review, or be blocked, but InferEdgeLab remains the final deployment decision owner.

| Case | Signal | Expected `guard_verdict` | Meaning |
|---|---|---|---|
| normal | stable bbox, score, and detection count | `pass` | no deployment-risk evidence from AIGuard |
| bbox collapse | near-zero-area boxes increase | `blocked` | possible decoder, postprocess, or quantization issue |
| score saturation | confidence scores concentrate near 0 or 1 | `blocked` | possible score calibration or postprocess issue |
| temporal instability | frame-level detection count or bbox movement is unstable | `review_required` | runtime output stability should be reviewed |
| provenance mismatch | Forge/Runtime source or artifact identity differs | `blocked` / `error` | evidence may not describe the artifact under review |

Detector Verdict Matrix

The table below is the reviewer-facing version of the detector policy. It is not a Lab deployment policy by itself; Lab may combine these signals with latency, accuracy, contract, and runtime evidence before producing the final deployment_decision.

| Detector family | Primary evidence | Pass | Review | Block | Report field |
|---|---|---|---|---|---|
| bbox validity | `invalid_bbox_rate` | `<= 0.05` | `> 0.05` | `> 0.20` | `evidence[].metric_name` |
| bbox collapse | `bbox_collapse_ratio` | `<= 0.05` | `> 0.05` or baseline factor `> 5x` | severe collapse or baseline factor `> 10x` | `evidence[].observed_value` |
| confidence score range | `score_range_violation_count` | `0` | n/a | `> 0` | `evidence[].severity` |
| confidence saturation | `saturation_ratio` | `< 0.70` | `>= 0.70` | `>= 0.85` with quality drift | `evidence[].observed_value` |
| detection disappearance | `detection_count_drop_pct`, `zero_detection_frame_ratio` | stable count | drop `>= 50%` | drop `>= 80%` or zero-frame ratio `> 0.30` | `candidate_summary.comparison` |
| baseline deviation | invalid/collapse/saturation factor | near baseline | factor `> 5x` | factor `> 10x` | `evidence[].increase_factor` |
| temporal consistency | count CV, bbox jump, class flip | stable sequence | count CV `> 1.0`, class flip `> 0.30`, or large center jump | zero-frame ratio `> 0.30` | `candidate_summary.temporal` |
| provenance consistency | source/artifact/backend identity | exact handoff match | warning mismatch | error mismatch | `guard_analysis.anomalies` |
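Three rows of the matrix translate directly into threshold functions. This sketch copies the numeric boundaries from the table; the function names and the "worst verdict wins" combination are assumptions for illustration, since Lab may combine signals differently.

```python
VERDICT_ORDER = {"pass": 0, "review_required": 1, "blocked": 2}


def bbox_validity_verdict(invalid_bbox_rate: float) -> str:
    if invalid_bbox_rate > 0.20:
        return "blocked"
    if invalid_bbox_rate > 0.05:
        return "review_required"
    return "pass"


def saturation_verdict(saturation_ratio: float, quality_drift: bool) -> str:
    # Blocking requires both heavy saturation and observed quality drift.
    if saturation_ratio >= 0.85 and quality_drift:
        return "blocked"
    if saturation_ratio >= 0.70:
        return "review_required"
    return "pass"


def score_range_verdict(violation_count: int) -> str:
    # Any score outside [0, 1] blocks; there is no review band.
    return "blocked" if violation_count > 0 else "pass"


def combine(verdicts: list[str]) -> str:
    """Hypothetical aggregation: the most severe detector verdict wins."""
    return max(verdicts, key=VERDICT_ORDER.__getitem__)


print(combine([bbox_validity_verdict(0.1), saturation_verdict(0.9, False), score_range_verdict(0)]))
# review_required
```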

Planned detector extensions are intentionally still deterministic: per-class detection drift, stronger detection disappearance summaries, calibration drift for score distributions, and baseline profile stability. These are documented as roadmap items, not as implemented automatic root-cause proof.

The full matrix is maintained in docs/detector_validation_matrix.md.

Output JSON Schema

The YOLO output-level detectors expect the following format.

```json
{
  "model": "yolov8n",
  "precision": "fp32",
  "image_id": "sample_001",
  "detections": [
    {
      "class_id": 0,
      "confidence": 0.91,
      "bbox": [12.0, 24.0, 120.0, 80.0]
    }
  ]
}
```

bbox uses the [x, y, w, h] format.
confidence must be a number between 0.0 and 1.0, inclusive.
detections may be an empty array.
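A minimal validator for this schema, written against the three rules above. Treating a negative width or height as invalid and excluding booleans from "number" are assumptions of this sketch, not documented rules.

```python
import json
import math


def _is_number(value) -> bool:
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_output(doc: dict) -> list[str]:
    """Return schema problems for one YOLO output document (empty list if valid)."""
    required = ("model", "precision", "image_id", "detections")
    problems = [f"missing key: {key}" for key in required if key not in doc]
    for i, det in enumerate(doc.get("detections", [])):
        bbox = det.get("bbox")
        if not (
            isinstance(bbox, list)
            and len(bbox) == 4
            and all(_is_number(v) and math.isfinite(v) for v in bbox)
        ):
            problems.append(f"detections[{i}].bbox must be four finite numbers [x, y, w, h]")
        elif bbox[2] < 0 or bbox[3] < 0:
            problems.append(f"detections[{i}].bbox has negative width or height")
        conf = det.get("confidence")
        # NaN fails the range comparison, so it is reported here as well.
        if not (_is_number(conf) and 0.0 <= conf <= 1.0):
            problems.append(f"detections[{i}].confidence must be in [0.0, 1.0]")
    return problems


sample = json.loads("""
{"model": "yolov8n", "precision": "fp32", "image_id": "sample_001",
 "detections": [{"class_id": 0, "confidence": 0.91, "bbox": [12.0, 24.0, 120.0, 80.0]}]}
""")
print(validate_output(sample))  # []
```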

Failure Definition

Core output-level detector families are:

bbox validity/collapse: invalid, NaN/Inf, out-of-bounds, or near-zero-area boxes
confidence distribution: score range violation and saturation
detection count drift: change in the number of detections relative to FP32 or a known-good baseline
baseline deviation: increase in the invalid bbox, collapse, or saturation factor
temporal consistency: frame-level instability detection without tracking

Each detector also returns fields such as affected_count, total_count, ratio, and threshold. Severity is not a fixed string; it is derived from the failure ratio.
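A sketch of a ratio-based detector result, using bbox collapse as the example. The output keys follow the field families named above; the near-zero-area cutoff, the "severe" ratio, and the `ok` / `warning` / `error` severity labels are assumptions.

```python
# Hypothetical thresholds for this sketch.
NEAR_ZERO_AREA = 1.0   # boxes with w * h below this (pixels^2) count as collapsed
REVIEW_RATIO = 0.05    # matches the bbox collapse "pass <= 0.05" row of the verdict matrix
SEVERE_RATIO = 0.20    # assumed; the README does not fix a "severe collapse" value


def bbox_collapse(detections: list[dict]) -> dict:
    total = len(detections)
    affected = sum(1 for d in detections if d["bbox"][2] * d["bbox"][3] < NEAR_ZERO_AREA)
    ratio = affected / total if total else 0.0
    # Severity follows the failure ratio instead of being a fixed label.
    if ratio > SEVERE_RATIO:
        severity = "error"
    elif ratio > REVIEW_RATIO:
        severity = "warning"
    else:
        severity = "ok"
    return {
        "affected_count": affected,
        "total_count": total,
        "ratio": ratio,
        "threshold": REVIEW_RATIO,
        "severity": severity,
    }


dets = [{"bbox": [0, 0, 10, 10]}] * 8 + [{"bbox": [5, 5, 0.1, 0.1]}] * 2
print(bbox_collapse(dets))
# {'affected_count': 2, 'total_count': 10, 'ratio': 0.2, 'threshold': 0.05, 'severity': 'warning'}
```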

Summary Metadata

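Every summary result includes metadata for experiment reproducibility.

guard_version: the InferEdgeAIGuard version used for the experiment
created_at: the summary creation time as a UTC ISO-8601 string
detector_config: a snapshot of the thresholds/config used for failure judgement

--save-json stores the summary dict as-is, which suits follow-up analysis, building tables, and accumulating experiment logs for papers or portfolios. --save-md produces a human-readable experiment report.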
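A sketch of attaching this metadata before saving. Placing the three fields at the top level of the summary, the `failure_rate` key, and the example version string are assumptions; the timestamp uses an aware UTC `datetime`, whose `isoformat()` yields an ISO-8601 string with a `+00:00` offset.

```python
import json
from datetime import datetime, timezone


def with_metadata(summary: dict, guard_version: str, detector_config: dict) -> dict:
    """Attach reproducibility metadata (top-level placement is an assumption)."""
    return {
        **summary,
        "guard_version": guard_version,
        "created_at": datetime.now(timezone.utc).isoformat(),
        # Copy so later config changes do not alter the recorded snapshot.
        "detector_config": dict(detector_config),
    }


summary = with_metadata(
    {"failure_rate": 0.1},            # hypothetical summary content
    "0.0.0-example",                  # hypothetical version string
    {"bbox_collapse_ratio": 0.05},
)
print(json.dumps(summary, indent=2))
```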

Research Framing

RQ1: What kinds of failure/anomaly patterns do quantized/cross-runtime inference results show?
RQ2: Can output/result-level signals identify suspicious inference results without trusting the model output?
RQ3: Can rule-based reasoning reduce manual debugging effort for Edge AI validation?

Rather than judging ground-truth correctness directly, InferEdgeAIGuard is a research tool that uses result-level signals to quickly narrow down which inference results a reviewer should examine more closely.

Limitations

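InferEdgeAIGuard is a result-based validation reasoning layer.

Its reasoning is heuristic/rule-based: it provides suspected causes and does not confirm the actual root cause.
It is not:
a model internal-structure analyzer
a weight/graph-centric diagnostic tool
a ground-truth accuracy evaluator
a TensorRT/Jetson executor
a model converter
an ML training or calibration automation tool
Controlled repeated-run experiments are planned.
SaaS/API is future work.

In other words, AIGuard is neither an executor nor a converter; it is a reasoning layer that interprets the results left by Lab/Runtime.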

Tests

```bash
python -m pytest -q
```
