ARIA is a real-time mission-control simulation for parafoil landing operations.
It combines telemetry playback, retrieval-augmented planning, rule-based safety checks, and a human-in-the-loop UI.
This project demonstrates practical AI systems engineering across backend, frontend, data, and reliability concerns.
Hackathon prototypes usually prove ideas but leave architectural and reliability gaps:
- Tight coupling of model provider logic with domain logic.
- Inconsistent API/data contracts between backend and frontend.
- Schema drift risks in local databases.
- Weak fallback behavior when model outputs are empty or invalid.
Goal: stabilize and productionize the prototype without losing iteration speed.
- Telemetry scenarios stream from CSV at 20 Hz.
- Planner ticks at 1 Hz and assembles working memory:
  - state summary
  - recent episodic events
  - retrieved lessons and docs
- LLM returns a structured JSON plan.
- Safety gate validates/adjusts risky actions.
- Plan and metrics are emitted via SSE and persisted to episodic memory.
- End-of-run distillation writes semantic lessons for future retrieval.
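The 1 Hz planner tick above can be sketched as follows. This is an illustrative outline, not the project's actual API: names like `WorkingMemory`, `assemble_working_memory`, and the state fields are assumptions.

```python
import json
from dataclasses import dataclass, field

# Hypothetical sketch of one planner tick; names do not match the real modules.
@dataclass
class WorkingMemory:
    state_summary: str
    episodic_events: list = field(default_factory=list)
    retrieved_lessons: list = field(default_factory=list)

def assemble_working_memory(state, episodic, retriever):
    """Collect the three context sources the planner prompt is built from."""
    return WorkingMemory(
        state_summary=f"alt={state['alt_m']:.0f}m v={state['v_ms']:.1f}m/s",
        episodic_events=episodic[-5:],       # recent events only
        retrieved_lessons=retriever(state),  # semantic lessons + docs
    )

def planner_tick(state, episodic, retriever, llm, safety_gate):
    """One 1 Hz tick: context -> LLM plan -> safety gate -> emit."""
    wm = assemble_working_memory(state, episodic, retriever)
    raw = llm(wm)           # model returns a structured JSON plan as text
    plan = json.loads(raw)  # parse failure is handled by the retry path upstream
    return safety_gate(plan)
```

The key property is that every plan passes through the safety gate before it is emitted or persisted.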
- `backend/services/playback.py`: telemetry loop, planner invocation, SSE publishing
- `backend/services/planner.py`: planning orchestration
- `backend/services/safety_gate.py`: domain safety checks/redlines
- `backend/aria/memory/`: retrieval, schema, distillation, storage
- `backend/llm/`: provider abstraction, retry/fallback policy
- `frontend/components/`: mission chat, gauges, timeline, plan panel
Refactored provider logic into `backend/llm/`:
- `config.py`: environment-driven provider/model config
- `providers/openai_compatible.py`: OpenAI-compatible adapter
- `client.py`: retry policy, model fallback policy, health checks
Impact:
- Provider changes no longer require touching planner/chat logic.
- Cleaner testing surface and better maintainability.
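A minimal sketch of the retry-then-fallback policy that `client.py` implements. The environment variable name `ARIA_MODEL_FALLBACKS` and the function signatures here are assumptions for illustration, not the project's real interface:

```python
import os

class LLMError(Exception):
    """Raised when every configured model fails."""

def fallback_models():
    # Fallback order is read from the environment, e.g. "model-a,model-b".
    # ARIA_MODEL_FALLBACKS is a hypothetical variable name.
    return os.environ.get("ARIA_MODEL_FALLBACKS", "primary-model").split(",")

def complete(prompt, call_model, retries=2):
    """Try each configured model in order, with per-model retries.

    Empty content counts as a failure and triggers the same retry/fallback
    path as an exception.
    """
    last_err = None
    for model in fallback_models():
        for _ in range(retries):
            try:
                out = call_model(model, prompt)
                if out:
                    return out
                last_err = LLMError(f"{model}: empty response")
            except LLMError as err:
                last_err = err
    raise last_err
```

Because the policy lives behind one function, swapping providers or reordering models never touches planner or chat code.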
Added failover behavior for empty model responses:
- Empty content is treated as a failure condition.
- Retry/fallback path is invoked automatically.
- Chat route guarantees a non-empty user-visible response string.
Impact:
- Eliminates silent `(no reply)` UX failures.
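The non-empty guarantee can be enforced by a final guard in the chat route. This is a sketch: `FALLBACK_REPLY` and `safe_reply` are illustrative names, not the project's actual constants.

```python
# Hypothetical guard; the real route and message text may differ.
FALLBACK_REPLY = "Sorry, I couldn't generate a response. Please try again."

def safe_reply(generate):
    """Never hand the UI an empty string, even if every model path fails."""
    try:
        text = generate()
    except Exception:
        return FALLBACK_REPLY
    # Whitespace-only content is treated the same as a failure.
    return text if text and text.strip() else FALLBACK_REPLY
```

The guard sits after the retry/fallback client, so it only fires when every model attempt has already been exhausted.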
Fixed schema drift between code and local SQLite snapshots:
- Retriever now handles `docs` tables with or without an `embedding` column.
Impact:
- Prevents runtime crashes on older/dev database files.
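Tolerating both table shapes comes down to introspecting the schema before building the query. A sketch using SQLite's `PRAGMA table_info` (the `docs`/`embedding` names come from the fix above; the helper names are illustrative):

```python
import sqlite3

def docs_has_embedding(conn: sqlite3.Connection) -> bool:
    """True if this snapshot's docs table carries an embedding column."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    cols = {row[1] for row in conn.execute("PRAGMA table_info(docs)")}
    return "embedding" in cols

def fetch_docs(conn):
    # Select embeddings only when the column exists; older dev snapshots
    # predate it, and a blind SELECT would crash at runtime.
    if docs_has_embedding(conn):
        return conn.execute("SELECT id, text, embedding FROM docs").fetchall()
    return conn.execute("SELECT id, text, NULL FROM docs").fetchall()
```

Callers always get three columns back, so downstream retrieval code stays shape-agnostic.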
- Mounted missing backend routers.
- Added missing `POST /api/plan/now` endpoint.
- Fixed frontend plan-check rendering for structured check objects.
- Corrected toggle defaults and SSE base handling.
Impact:
- End-to-end planning flow became stable and observable.
- Kept `backend/aria/agent.py` as a compatibility facade during the refactor to avoid widespread import churn.
- Used the OpenAI-compatible adapter for both the OpenRouter and OpenAI paths to minimize code duplication.
- Prioritized reliability patches over broad feature expansion.
- Plan generation errors now surface clearly through API logs.
- Model fallback order is configurable in env.
- SSE stream includes run, plan, anomaly, and metrics events for operator visibility.
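The SSE wire format for these event types is plain text with `event:` and `data:` fields. A minimal formatter (sketch; the event payloads shown are assumptions):

```python
import json

def sse_event(event: str, data: dict) -> str:
    """Serialize one server-sent event: 'run', 'plan', 'anomaly', or 'metrics'.

    Each event is an 'event:' line, a 'data:' line with a JSON payload,
    and a blank line terminating the event.
    """
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"
```

Typing the events lets the frontend register one `EventSource` listener per concern (gauges on `metrics`, plan panel on `plan`) instead of demultiplexing a single message stream by hand.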
- Full-stack ownership (Python + TypeScript).
- Real-time distributed flow design (SSE + async services).
- AI application robustness patterns:
- strict output shaping
- safety-gated post-processing
- retrieval grounding
- fallback-oriented provider integration
- Pragmatic migration from hackathon code to maintainable architecture.
- Add automated integration tests for `start -> plan -> decide -> distill`.
- Add structured logging/metrics dashboards for planner latency and fallback rates.
- Move to explicit env loading per service directory to avoid `.env` ambiguity.
- Add optional auth and role-based controls for multi-operator scenarios.
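The first item above, an automated `start -> plan -> decide -> distill` test, could begin as a pure-Python pipeline check before wiring in the real services. The stage functions here are stand-ins, not the actual service calls:

```python
def run_pipeline(stages, payload):
    """Drive the stages in order, threading one payload through all of them."""
    trace = []
    for name, fn in stages:
        payload = fn(payload)
        trace.append(name)
    return trace, payload

def test_full_run():
    # Stand-in stages; a real test would call the backend services instead.
    stages = [
        ("start",   lambda p: {**p, "run_id": 1}),
        ("plan",    lambda p: {**p, "plan": ["flare"]}),
        ("decide",  lambda p: {**p, "approved": True}),
        ("distill", lambda p: {**p, "lessons": ["land into wind"]}),
    ]
    trace, result = run_pipeline(stages, {})
    assert trace == ["start", "plan", "decide", "distill"]
    assert result["approved"] and result["lessons"]
```

Replacing each lambda with a call into the real service turns this into the end-to-end test the roadmap describes.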