ramiradwan/LSIE-MLF

LSIE-MLF

Live Stream Inference Engine — Machine Learning Framework

LSIE-MLF is a desktop-first monorepo for real-time multimodal inference during live-stream sessions. It combines tethered mobile audio/video capture with external telemetry, synchronizes those inputs into fixed-duration segments, and runs ML analysis for transcription, facial action units, acoustic features, semantic evaluation, and downstream analytics.

The v4 runtime is organized around a host-side desktop process graph:

  • services.desktop_app is the primary operator/runtime entrypoint
  • full GUI mode opens the PySide Operator Console
  • operator API runtime serves the same loopback API/control graph for CLI use without opening the GUI
  • ui_api_shell serves the local desktop UI/API surface
  • capture_supervisor manages physical capture supervision
  • module_c_orchestrator assembles synchronized inference segments
  • gpu_ml_worker executes compute-heavy ML tasks and analytics
  • analytics_state_worker maintains local analytical state
  • cloud_sync_worker drains the desktop cloud-sync outbox
  • shared packages provide schemas and reusable ML utilities

Architecture at a Glance

Runtime Topology

| Desktop process | Responsibility |
| --- | --- |
| ui_api_shell | Local desktop UI/API shell |
| capture_supervisor | Physical capture supervision |
| module_c_orchestrator | Segment assembly, synchronization, dispatch |
| gpu_ml_worker | GPU-backed ML inference and analytics |
| analytics_state_worker | Local analytical state maintenance |
| cloud_sync_worker | Offline cloud-sync outbox draining |
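
The outbox draining done by cloud_sync_worker can be pictured with a minimal sketch like the one below. It is an illustration of the general outbox pattern, not the repository's implementation: the table layout, the function names (create_outbox, enqueue, drain_outbox), and the upload callback are all assumptions made for this example.

```python
import sqlite3
from typing import Callable


def create_outbox(conn: sqlite3.Connection) -> None:
    """Create a hypothetical outbox table; the real schema may differ."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS outbox ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "payload TEXT NOT NULL, "
        "sent INTEGER NOT NULL DEFAULT 0)"
    )


def enqueue(conn: sqlite3.Connection, payload: str) -> None:
    """Record a payload locally so it survives while the host is offline."""
    with conn:
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (payload,))


def drain_outbox(
    conn: sqlite3.Connection,
    upload: Callable[[str], bool],
    batch_size: int = 50,
) -> int:
    """Upload pending rows oldest-first and stop at the first failure.

    Returns the number of rows marked as sent.
    """
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE sent = 0 ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()
    sent = 0
    for row_id, payload in rows:
        if not upload(payload):
            break  # leave the row pending so the next drain retries it
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
        sent += 1
    return sent
```

Stopping at the first failed upload keeps delivery in insertion order, which is one reasonable choice when the cloud side expects ordered, idempotent uploads.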

Functional Modules

| Module | Name | Primary responsibility |
| --- | --- | --- |
| A | Hardware & Transport | Capture raw audio/video from tethered mobile hardware |
| B | Ground Truth Ingestion | Accept external event/telemetry inputs |
| C | Orchestration & Synchronization | Align timestamps, assemble segments, attach context |
| D | Multimodal ML Processing | Run transcription, facial, acoustic, and semantic inference |
| E | Experimentation & Analytics | Persist metrics, run analytics, manage experimentation state |
| F | Context Enrichment | Run asynchronous metadata enrichment workflows |
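
Module C's job of aligning timestamps and assembling fixed-duration segments can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the Sample type, the assemble_segments function, and the 30-second default are hypothetical placeholders, and the actual segment duration and alignment rules are defined by the specification.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    source: str       # illustrative labels such as "audio", "video", "telemetry"
    timestamp: float  # seconds on a shared clock
    payload: object


def assemble_segments(samples, segment_seconds=30.0):
    """Group samples into fixed-duration windows keyed by window index."""
    if segment_seconds <= 0:
        raise ValueError("segment_seconds must be positive")
    buckets = defaultdict(list)
    for sample in samples:
        # Window k covers [k * segment_seconds, (k + 1) * segment_seconds).
        index = int(sample.timestamp // segment_seconds)
        buckets[index].append(sample)
    # Return windows in time order, each with its samples sorted by timestamp.
    return {
        index: sorted(bucket, key=lambda s: s.timestamp)
        for index, bucket in sorted(buckets.items())
    }
```

Each window uses a half-open interval, so a sample that lands exactly on a boundary belongs to the later segment.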

High-Level Data Flow

Android device
  -> capture_supervisor
    -> module_c_orchestrator
      -> gpu_ml_worker
        -> analytics_state_worker

External telemetry / webhook ingress
  -> ui_api_shell / retained API surfaces
    -> analytics_state_worker
      -> module_c_orchestrator
        -> gpu_ml_worker

Oura webhook deliveries are treated as change notifications. Retained server/API surfaces may still use Redis-backed hydration queues, but the primary v4 operator path starts from the desktop app entrypoint rather than a Docker Compose stack.
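
One way to read "change notification" is that the webhook body only signals that something changed, and the authoritative record is fetched afterwards. The sketch below illustrates that idea with an in-process queue.Queue standing in for the Redis-backed hydration queue mentioned above. The field names (data_type, object_id), the function names, and the fetch callback are assumptions for illustration, not the repository's API.

```python
import queue


def handle_webhook(notification: dict, hydration_queue: "queue.Queue[dict]") -> None:
    """Queue a hydration job; do not trust values carried in the delivery itself."""
    hydration_queue.put(
        {
            "resource": notification.get("data_type"),   # assumed field name
            "object_id": notification.get("object_id"),  # assumed field name
        }
    )


def hydrate_pending(hydration_queue, fetch):
    """Drain queued notifications and fetch the authoritative record for each."""
    records = []
    while True:
        try:
            job = hydration_queue.get_nowait()
        except queue.Empty:
            return records
        records.append(fetch(job["resource"], job["object_id"]))
```

Because only identifiers are queued, a duplicated or replayed delivery results in at most a redundant fetch rather than stale data being stored.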


Repository Layout

/
├── services/
│   ├── desktop_app/                # Active v4 desktop runtime, ProcessGraph, IPC, SQLite, cloud outbox
│   ├── desktop_launcher/           # Installer/launcher preflight, install, and repair surface
│   ├── operator_console/           # Reusable PySide6 UI-only host against an external API Server
│   ├── api/                        # Retained API Server routes and desktop loopback route graph
│   ├── cloud_api/                  # Cloud control plane plus PostgreSQL bootstrap/migration surfaces
│   └── worker/                     # Retained server/cloud worker, pipeline, and ingest surfaces
├── packages/
│   ├── ml_core/                    # Shared ML utilities and math
│   ├── schemas/                    # Pydantic contracts and DTOs
│   └── signer/                     # Webcast signing/token helpers for retained ingest surfaces
├── automation/
│   ├── schemas/                    # Validators for local packets and planning artifacts
│   └── work-items/templates/       # Durable packet templates; active packets stay local/gitignored
├── scripts/                        # Operator CLI, spec/audit/schema tools, fixtures, and repo gates
├── tests/                          # Unit, integration, release-pipeline, and v4 Gate 0 coverage
├── docs/                           # Developer setup, post-merge playbook, spec, and operational docs
├── build/                          # Packaging assets and signing helpers
├── package.json                    # Pinned Node tooling for visual-drift capture/reporting
├── workspace.json                  # Shared CI command inventory
├── pyproject.toml                  # Canonical Python dependency declarations and tool config
└── uv.lock                         # Frozen dependency resolution for uv sync --frozen

Runtime Notes

  • python -m services.desktop_app is the primary v4 launch path
  • Shared code in packages/ is available to all services
  • Heavy ML dependencies belong in the ml_backend extra, not the base API/runtime environment
  • No Docker Compose or Dockerfile manifests are tracked for the active v4 desktop runtime; historical/spec references are not launch instructions

Navigation Notes

  • Start in services/desktop_app/ for active operator-runtime changes involving the ProcessGraph, IPC queues/shared-memory blocks, local SQLite state, or cloud outbox behavior.
  • Treat services/api/, services/cloud_api/, and services/worker/ as retained server/cloud or shared route/pipeline surfaces rather than the default desktop launch path.
  • Keep automation/work-items/active/ packets local and uncommitted; durable workflow changes belong in automation/work-items/templates/, automation/schemas/, or project docs.
  • data/raw/, data/interim/, and data/processed/ are local staging directories, not contract-bearing schema sources.
  • package.json and workspace.json support visual-drift and shared tooling; Python runtime dependencies remain governed by pyproject.toml and uv.lock.
  • Prefer README.md, CLAUDE.md, docs/DEVELOPER_SETUP.md, and docs/POST_MERGE_PLAYBOOK.md for current guidance. Treat docs/artifacts/ as historical snapshots unless a task explicitly tells you to use one.

Authoritative Documentation

| Document | Use it for | Boundary |
| --- | --- | --- |
| docs/tech-spec-v*.pdf | Governed product, runtime, math, schema, dependency, and audit contracts | Authoritative when it conflicts with repo prose |
| CLAUDE.md | Agent-facing hard rules, canonical terminology, and required gates | Keep concise; do not duplicate full setup instructions |
| README.md | Repo orientation, launch paths, and change-routing map | Operational overview, not a spec amendment surface |
| docs/DEVELOPER_SETUP.md | Machine setup, local run modes, and test inventory | Developer workflow details |
| docs/POST_MERGE_PLAYBOOK.md | Standing and merge-specific hygiene chores | Updated after feature merges |
| docs/companion_integration.md | Android companion subset-consumer boundary and drift controls | Desktop integration guide; Android repo remains protocol owner |
| docs/artifacts/ | Historical design notes, baselines, and review outputs | Reference only unless a task explicitly names an artifact |

Quick Start

1) Configure environment

cp .env.example .env
# Edit .env with the required credentials and runtime settings

2) Prepare the host

Recommended local prerequisites:

  • Python 3.11
  • uv
  • CUDA-capable NVIDIA GPU and current NVIDIA driver for GPU-backed inference
  • ADB / Android device connectivity if using live USB capture

3) Install dependencies

uv sync --frozen --extra ml_backend

4) Launch the app or operator API runtime

Full GUI app:

uv run python -m services.desktop_app

CLI/API-only workflow:

uv run python -m services.desktop_app --operator-api

Both modes run preflight and start the ProcessGraph with capture, orchestration, ML, analytics, and cloud-sync workers. The default command opens the PySide Operator Console; --operator-api starts the same loopback API/control surface for CLI use without opening the GUI.
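
The difference between the two launch modes can be summarized with a small sketch. Only the --operator-api flag comes from this README; the build_parser and plan_startup helpers and the step names are hypothetical and exist purely to show that both modes share the same startup path and differ only in whether the GUI opens.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="services.desktop_app")
    parser.add_argument(
        "--operator-api",
        action="store_true",
        help="serve the loopback API/control surface without opening the GUI",
    )
    return parser


def plan_startup(argv):
    """Return the ordered startup steps for the selected mode (names are illustrative)."""
    args = build_parser().parse_args(argv)
    # Both modes run preflight, start the ProcessGraph, and serve the loopback API.
    steps = ["preflight", "start_process_graph", "serve_loopback_api"]
    if not args.operator_api:
        # Only the default mode opens the PySide Operator Console.
        steps.append("open_operator_console")
    return steps
```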

5) Use the CLI

With either the full GUI app or the operator API runtime running, the CLI targets LSIE_API_URL if it is set and http://127.0.0.1:8000 otherwise:

uv run python -m scripts status
uv run python -m scripts health
uv run python -m scripts sessions start android://device --experiment greeting_line_v1
uv run python -m scripts sessions list
uv run python -m scripts stimulus submit <session-id> --note "test stimulus"
uv run python -m scripts live-session readback <session-id>
uv run python -m scripts sessions end <session-id>

If the loopback API selects another port, set LSIE_API_URL once for the shell or pass --api-url <url> on individual commands. The CLI talks only to the loopback API; it does not read SQLite directly.
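
The URL selection described above can be sketched as a small resolver. The precedence shown here (a per-command --api-url value first, then LSIE_API_URL, then the default) is an assumption based on common CLI conventions, and resolve_api_url is a hypothetical helper rather than a function from scripts/.

```python
import os

DEFAULT_API_URL = "http://127.0.0.1:8000"


def resolve_api_url(cli_value=None, environ=None):
    """Pick the API base URL: --api-url, then LSIE_API_URL, then the default."""
    env = os.environ if environ is None else environ
    url = cli_value or env.get("LSIE_API_URL") or DEFAULT_API_URL
    # Strip trailing slashes so route paths can be appended uniformly.
    return url.rstrip("/")
```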

6) Reusable Operator Console host

services.operator_console is a reusable/standalone PySide6 UI-only host. It polls an external API Server's /api/v1/operator/* aggregate routes and does not start capture, GPU inference, SQLite state, or cloud sync.

Use it only when developing or testing the UI against an already-running external API:

uv sync --frozen
uv run python -m services.operator_console

Environment variables (all optional; sensible defaults apply):

| Variable | Purpose |
| --- | --- |
| LSIE_OPERATOR_API_BASE_URL | External API Server base URL (default http://localhost:8000) |
| LSIE_OPERATOR_API_TIMEOUT_SECONDS | Per-request timeout in seconds (default 5) |
| LSIE_OPERATOR_ENVIRONMENT_LABEL | Free-text label shown in the statusline (e.g. dev, staging) |
| LSIE_OPERATOR_*_POLL_MS | Per-surface poll cadences (overview, sessions, health, …); see services/operator_console/config.py for the full list |
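
As a rough illustration of how these optional variables and their defaults could be read, consider the sketch below. ConsoleSettings and load_console_settings are hypothetical names, and the empty default for the environment label is an assumption; the authoritative definitions live in services/operator_console/config.py.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsoleSettings:
    api_base_url: str
    api_timeout_seconds: float
    environment_label: str


def load_console_settings(environ=None):
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if environ is None else environ
    return ConsoleSettings(
        api_base_url=env.get("LSIE_OPERATOR_API_BASE_URL", "http://localhost:8000"),
        api_timeout_seconds=float(env.get("LSIE_OPERATOR_API_TIMEOUT_SECONDS", "5")),
        # Assumption: an unset label is treated as an empty string.
        environment_label=env.get("LSIE_OPERATOR_ENVIRONMENT_LABEL", ""),
    )
```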

The console ships six pages: Overview, Live Session, Experiments, Physiology, Health, and Sessions. Page behavior traces to the spec:

  • Live Session's reward explanation uses p90_intensity, semantic_gate, gated_reward, n_frames_in_window, and au12_baseline_pre (§7B).
  • Physiology surfaces fresh / stale / absent / no-rmssd as four distinct states (§4.C.4).
  • Co-modulation null is rendered as a legitimate null-valid outcome with its null_reason, not as an error (§7C).
  • Health distinguishes degraded / recovering / error with operator-action hints on the error summary card (§12).
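
To make the four physiology states concrete, here is a hedged sketch of how a viewmodel might distinguish them. The state names come from the list above, but the classification rules, their precedence, and the stale_after_seconds parameter are assumptions for illustration; §4.C.4 of the specification defines the real behavior.

```python
from enum import Enum
from typing import Optional


class PhysiologyState(Enum):
    FRESH = "fresh"
    STALE = "stale"
    ABSENT = "absent"
    NO_RMSSD = "no-rmssd"


def classify_physiology(
    sample_age_seconds: Optional[float],
    rmssd: Optional[float],
    stale_after_seconds: float,
) -> PhysiologyState:
    """Map the latest physiology sample onto one of four display states."""
    if sample_age_seconds is None:
        return PhysiologyState.ABSENT      # no sample has been received at all
    if rmssd is None:
        return PhysiologyState.NO_RMSSD    # a sample exists but carries no RMSSD
    if sample_age_seconds > stale_after_seconds:
        return PhysiologyState.STALE       # RMSSD present but older than the cutoff
    return PhysiologyState.FRESH
```

Keeping the four outcomes as separate enum members means the UI cannot accidentally collapse "no data" and "data without RMSSD" into a single error state.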

Historical Docker/server references

The current tracked tree has no Docker Compose or Dockerfile manifests for the active v4 desktop runtime. Docker, container, Message Broker, and Persistent Store references that remain in spec extracts or archived artifacts describe retained legacy/server architecture or historical context, not the default operator workflow.


Where to Make Changes

| If you need to change... | Start here | Do not put here |
| --- | --- | --- |
| Active desktop runtime, ProcessGraph, IPC, local SQLite, or cloud outbox behavior | services/desktop_app/ | Retained Celery/Redis/PostgreSQL launch logic |
| Desktop installer, preflight, or repair logic | services/desktop_launcher/ | Runtime process behavior or ML algorithms |
| Reusable Operator Console UI-only host | services/operator_console/ | Capture startup, SQLite writes, cloud sync, or direct local-store reads |
| Loopback or retained API routes and operator aggregates | services/api/ | Desktop process lifecycle or cloud-control-plane persistence |
| Cloud control plane routes, repos, or PostgreSQL bootstrap surfaces | services/cloud_api/ | Desktop SQLite state, local IPC, or operator GUI code |
| Retained worker pipeline, attribution, ingest, or cloud-side background tasks | services/worker/ | Active desktop ProcessGraph wiring or local SQLite ownership |
| Shared schemas and data contracts | packages/schemas/ | Service-specific persistence or transport code |
| Shared inference helpers or math | packages/ml_core/ | Runtime process orchestration or UI presentation |
| Webcast signing or token acquisition helpers | packages/signer/ | Capture supervision, analytics, or persistence policy |
| Local packet validators or durable packet templates | automation/schemas/ and automation/work-items/templates/ | Generated active packet instances |
| Operator CLI, repo gates, spec tooling, or fixture/drift utilities | scripts/ | Long-lived service code or schema contracts |
| Tests for a surface | tests/unit/, tests/integration/, tests/v4_gate0/, tests/contract/, tests/static/ | Production modules or generated artifacts |
| Visual-drift tooling | package.json and workspace.json | Python runtime dependency declarations |
| Python dependency placement | pyproject.toml and uv.lock | Ad-hoc dependency pins in service code |

automation/work-items/active/ packets are local planning artifacts. Keep them out of commits and update templates or validators when you need a durable workflow change.

Review and Ownership Map

| Area | Primary review focus | Supporting docs/tests |
| --- | --- | --- |
| Desktop runtime (services/desktop_app/) | ProcessGraph lifecycle, IPC queues/shared-memory blocks, SQLite local state, cloud outbox, privacy cleanup | tests/unit/desktop_app/, tests/integration/desktop_app/, tests/v4_gate0/ |
| Operator UI (services/operator_console/) | PySide6 presentation, design-system tokens/widgets, viewmodels, API polling | tests/unit/operator_console/, tests/integration/operator_console/, design-system skills |
| Shared contracts (packages/schemas/) | Pydantic payloads, DTO compatibility, schema consistency | scripts/check_schema_consistency.py, tests/unit/schemas/ |
| ML and math (packages/ml_core/, retained reward helpers) | AU12/acoustic/transcription helpers, attribution math, Thompson Sampling invariants | tests/unit/ml_core/, tests/unit/test_v3_math_recipe.py, tests/v4_gate0/ |
| Retained/cloud API (services/api/, services/cloud_api/) | Loopback/retained route shape, cloud PostgreSQL DDL, idempotent upload surfaces | tests/unit/api/, tests/integration/cloud_api/, tests/contract/ |
| Retained worker and ingest (services/worker/, packages/signer/) | Guarded dormant surfaces, Webcast signing, retained pipeline compatibility | tests/unit/worker/, tests/unit/automation/test_deferred_integration_guards.py |
| Automation and gates (automation/, scripts/) | Work-item schemas, spec/audit tooling, repo validation scripts | tests/unit/automation/, tests/unit/scripts/, scripts/check.sh |

Contributor Workflow

  • Start with CLAUDE.md for canonical terminology, desktop-vs-retained-surface boundaries, and required validation gates.
  • Use docs/DEVELOPER_SETUP.md for machine prerequisites and for when to run services.desktop_app versus services.operator_console.
  • Run the narrowest relevant tests first, then finish repo-wide code, schema, or automation work with bash scripts/check.sh or pwsh scripts/check.ps1.
  • Schema-affecting changes should update packages/schemas/ and be checked with uv run python scripts/check_schema_consistency.py.
  • Work-item packet instances under automation/work-items/active/ stay local; committed changes belong in templates, validators, or project docs.

Development Notes

Dependency Placement

Use pyproject.toml as the declaration surface and uv.lock as the frozen resolution surface:

  • put shared runtime packages in [project.dependencies]
  • put ML-heavy worker/orchestrator packages in [project.optional-dependencies].ml_backend
  • refresh uv.lock whenever dependency declarations change

This keeps the base API/runtime environment lightweight while preserving a reproducible lockfile for uv sync --frozen.
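
One consequence of keeping ML-heavy packages in the ml_backend extra is that code in the base environment should not import them at module load time. A minimal sketch of a guarded, on-demand import is shown below; require_ml_module is a hypothetical helper, not a function from this repository.

```python
import importlib
import importlib.util


def require_ml_module(module_name: str):
    """Import an ML-heavy module on demand, with an actionable error if it is missing."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted name is not installed.
        spec = None
    if spec is None:
        raise RuntimeError(
            f"{module_name!r} is not installed; run "
            "'uv sync --frozen --extra ml_backend' to install the ML extra."
        )
    return importlib.import_module(module_name)
```

A worker would call this inside the function that needs the dependency, so that importing the surrounding module stays cheap in the lightweight base environment.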

Boundary Patterns

  • Inter-module payloads cross runtime boundaries as Pydantic models from packages/schemas/; avoid untyped dictionaries at module boundaries.
  • The desktop SQLite write path is owned by analytics_state_worker. UI/API readers use query-only adapters, and cloud sync drains the outbox rather than becoming a general-purpose local-state writer.
  • services.desktop_app.os_adapter owns platform-specific behavior; process, state, and capture modules should call adapter functions instead of branching on the OS directly.
  • Operator Console changes should use existing design-system tokens, widgets, viewmodels, and tests; do not add one-off styling primitives for a single page.
  • Data persistence and payload call sites that carry §5.2 classification intent use packages.schemas.data_tiers markers such as DataTier and mark_data_tier; the helpers preserve runtime values and feed audit/static verification.
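
The single-writer rule for the desktop SQLite store can be illustrated with the standard library's sqlite3 module. The sketch below is an assumption about one way to enforce the boundary, using SQLite's read-only URI mode for readers; open_writer and open_query_only are hypothetical names, and the repository's query-only adapters may work differently.

```python
import sqlite3
from pathlib import Path


def open_writer(db_path: str) -> sqlite3.Connection:
    """Read-write connection, intended only for the single owning writer process."""
    return sqlite3.connect(db_path)


def open_query_only(db_path: str) -> sqlite3.Connection:
    """Read-only connection for UI/API readers.

    Any write attempted through this connection raises sqlite3.OperationalError.
    """
    # as_uri() yields a file:// URI with special characters escaped.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)
```

Enforcing read-only access at the connection level means an accidental INSERT in reader code fails loudly instead of silently competing with the writer.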

Local validation gates

For desktop-runtime changes, run targeted desktop validation first:

uv run pytest tests/v4_gate0/ tests/unit/desktop_app/ tests/integration/desktop_app/ tests/unit/worker/pipeline/test_orchestrator.py
uv run ruff check packages/ services/ tests/ automation/
uv run ruff format --check packages/ services/ tests/ automation/
uv run mypy packages/ services/ tests/ automation/ --python-version 3.11 --ignore-missing-imports --explicit-package-bases

The full local check scripts are still available for repository-wide pre-push validation:

bash scripts/check.sh          # macOS / Linux / Git Bash on Windows
pwsh scripts/check.ps1         # PowerShell on Windows

There is no standing Docker Compose gate for the active v4 desktop runtime because no compose/Dockerfile manifests are tracked. Historical/spec Docker references should not be converted into launch or validation instructions.

At a minimum, changes touching worker or analytics code should be validated against the full worker test path.


Data Handling

Raw media and inbound telemetry should be treated as processing inputs, not long-term application records. Persistent storage is intended for structured analytical outputs, experiment state, and derived metrics.

Keep README-level guidance brief and put detailed governance, retention, and security rules in the technical specification and implementation docs.


Specification

LSIE-MLF is implemented against the single signed specification PDF committed as docs/tech-spec-v*.pdf.

This README is intentionally operational. It explains how the repository is organized, how to run it locally, and where to make changes. Detailed contracts, mathematical formulas, failure handling, and version history belong in the signed specification payload.

If this README and the specification differ, the specification is authoritative.


License

Confidential. All rights reserved.
