ARIA Mission Control: Portfolio Case Study

Project Summary

ARIA is a real-time mission-control simulation for parafoil landing operations.
It combines telemetry playback, retrieval-augmented planning, rule-based safety checks, and a human-in-the-loop UI.

This project demonstrates practical AI systems engineering across backend, frontend, data, and reliability concerns.

Problem Statement

Hackathon prototypes usually prove ideas but leave architectural and reliability gaps:

Tight coupling of model provider logic with domain logic.
Inconsistent API/data contracts between backend and frontend.
Schema drift risks in local databases.
Weak fallback behavior when model outputs are empty or invalid.

Goal: stabilize and productionize the prototype without losing iteration speed.

Solution Architecture

Runtime Flow

Telemetry scenarios stream from CSV at 20 Hz.
Planner ticks at 1 Hz and assembles working memory:
- state summary
- recent episodic events
- retrieved lessons and docs
LLM returns a structured JSON plan.
Safety gate validates/adjusts risky actions.
Plan and metrics are emitted via SSE and persisted to episodic memory.
End-of-run distillation writes semantic lessons for future retrieval.

Key Components

backend/services/playback.py: telemetry loop, planner invocation, SSE publishing
backend/services/planner.py: planning orchestration
backend/services/safety_gate.py: domain safety checks/redlines
backend/aria/memory/: retrieval, schema, distillation, storage
backend/llm/: provider abstraction, retry/fallback policy
frontend/components/: mission chat, gauges, timeline, plan panel

Engineering Improvements Implemented

1) LLM Provider Abstraction

Refactored provider logic into backend/llm/:

config.py: environment-driven provider/model config
providers/openai_compatible.py: OpenAI-compatible adapter
client.py: retry policy, model fallback policy, health checks

Impact:

Provider changes no longer require touching planner/chat logic.
Cleaner testing surface and better maintainability.

2) Robust Fallback Behavior

Added failover behavior for empty model responses:

Empty content is treated as a failure condition.
Retry/fallback path is invoked automatically.
Chat route guarantees a non-empty user-visible response string.

Impact:

Eliminates silent (no reply) UX failures.

3) Backward-Compatible Retrieval

Fixed schema drift between code and local SQLite snapshots:

Retriever now handles docs tables with or without an embedding column.

Impact:

Prevents runtime crashes on older/dev database files.

4) API and UI Contract Fixes

Mounted missing backend routers.
Added missing POST /api/plan/now endpoint.
Fixed frontend plan-check rendering for structured check objects.
Corrected toggle defaults and SSE base handling.

Impact:

End-to-end planning flow became stable and observable.

Tradeoffs

Kept backend/aria/agent.py as a compatibility facade during refactor to avoid widespread import churn.
Used OpenAI-compatible adapter for both OpenRouter/OpenAI paths to minimize code duplication.
Prioritized reliability patches over broad feature expansion.

Reliability and Observability Notes

Plan generation errors now surface clearly through API logs.
Model fallback order is configurable in env.
SSE stream includes run, plan, anomaly, and metrics events for operator visibility.

What This Project Demonstrates

Full-stack ownership (Python + TypeScript).
Real-time distributed flow design (SSE + async services).
AI application robustness patterns:
- strict output shaping
- safety-gated post-processing
- retrieval grounding
- fallback-oriented provider integration
Pragmatic migration from hackathon code to maintainable architecture.

Next Improvements (Roadmap)

Add automated integration tests for start -> plan -> decide -> distill.
Add structured logging/metrics dashboards for planner latency and fallback rates.
Move to explicit env loading per service directory to avoid .env ambiguity.
Add optional auth and role-based controls for multi-operator scenarios.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ARIA Mission Control: Portfolio Case Study

Project Summary

Problem Statement

Solution Architecture

Runtime Flow

Key Components

Engineering Improvements Implemented

1) LLM Provider Abstraction

2) Robust Fallback Behavior

3) Backward-Compatible Retrieval

4) API and UI Contract Fixes

Tradeoffs

Reliability and Observability Notes

What This Project Demonstrates

Next Improvements (Roadmap)

FilesExpand file tree

portfolio-case-study.md

Latest commit

History

portfolio-case-study.md

File metadata and controls

ARIA Mission Control: Portfolio Case Study

Project Summary

Problem Statement

Solution Architecture

Runtime Flow

Key Components

Engineering Improvements Implemented

1) LLM Provider Abstraction

2) Robust Fallback Behavior

3) Backward-Compatible Retrieval

4) API and UI Contract Fixes

Tradeoffs

Reliability and Observability Notes

What This Project Demonstrates

Next Improvements (Roadmap)