Dev by MrPhantom2325 · Pull Request #30 · MrPhantom2325/HungerNet-Smart-Food-Redistribution-Using-AI

MrPhantom2325 · 2026-05-13T14:43:24Z

Phase 1 final: full MLOps stack (Sprints 1-10)

- tune.py runs a study for one agent at a time (q_learning, sarsa, dqn)
- TPESampler-driven Bayesian search over agent-specific parameter spaces:
  - q_learning/sarsa: learning_rate, epsilon schedule, decay episodes
  - dqn: lr, hidden_sizes (categorical), batch_size, target_update_interval
- Per-trial budget reduced (600 episodes for tabular, 300 for DQN) to keep tuning fast; the winner gets the full production budget when retrained in Step 23
- Each trial is logged as a nested MLflow run under a parent study run, giving a clean hierarchy in the UI
- save_best_config writes configs/<agent>_tuned.yaml with the winning hyperparameters at the full production episode budget
- Smoke-tested with 2 trials end-to-end; configs/<agent>_tuned.yaml round-trip successful

Refs #8
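A minimal sketch of the agent-specific search spaces, assuming an Optuna-style `trial` object (Optuna's `Trial` does expose `suggest_float`, `suggest_int` and `suggest_categorical`). The function name `suggest_params`, every numeric range, and the epsilon parameter names are hypothetical placeholders, not the values in tune.py.

```python
from typing import Any, Dict


def suggest_params(trial: Any, agent: str) -> Dict[str, Any]:
    """Sample one hyperparameter set for `agent` from an Optuna-style trial.

    All ranges and the epsilon_* names are illustrative assumptions,
    not the values used in tune.py.
    """
    if agent in ("q_learning", "sarsa"):
        return {
            "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.5, log=True),
            # Epsilon schedule: assumed to decay from start to end over decay episodes
            "epsilon_start": trial.suggest_float("epsilon_start", 0.5, 1.0),
            "epsilon_end": trial.suggest_float("epsilon_end", 0.01, 0.1),
            "epsilon_decay_episodes": trial.suggest_int("epsilon_decay_episodes", 100, 500),
        }
    if agent == "dqn":
        # Optuna categorical choices should be primitives, so layer sizes are encoded as strings
        hidden = trial.suggest_categorical("hidden_sizes", ["64", "128", "64-64", "128-128"])
        return {
            "lr": trial.suggest_float("lr", 1e-4, 1e-2, log=True),
            "hidden_sizes": [int(h) for h in hidden.split("-")],
            "batch_size": trial.suggest_categorical("batch_size", [32, 64, 128]),
            "target_update_interval": trial.suggest_int(
                "target_update_interval", 100, 1000, step=100
            ),
        }
    raise ValueError(f"unknown agent: {agent!r}")
```

With Optuna installed, an objective would call `suggest_params(trial, agent)`, train for the reduced per-trial budget, and return the eval reward to a study created with `optuna.create_study(direction="maximize", sampler=optuna.samplers.TPESampler(seed=0))`.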

- multi_seed_eval.py trains the same config N times with different seeds and aggregates eval metrics (mean / std / min / max)
- A parent MLflow run owns N nested per-seed runs for a clean UI hierarchy
- Per-seed JSONs + summary.json are written to experiments/multi_seed/<run_id>/
- Each seed is evaluated on 5 held-out eval seeds, for a total of 25 eval episodes per config and more robust statistics
- aggregate_results computes the full distribution, including the range
- 3 tests cover the aggregation logic and an end-to-end mini run

Satisfies the CO1 cross-validation requirement: instead of one number per algorithm, every comparison reports mean ± std across 5 independent training runs.
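A minimal sketch of the aggregation step, assuming each seed has already been reduced to one mean eval reward. The signature is hypothetical, and using the population standard deviation (numpy's default `ddof=0`) rather than the sample one is an assumption about multi_seed_eval.py.

```python
import statistics
from typing import Dict, List


def aggregate_results(per_seed_rewards: List[float]) -> Dict[str, float]:
    """Summarize one metric across independent training seeds.

    Uses the population std (ddof=0); whether the real script uses
    ddof=0 or ddof=1 is an assumption in this sketch.
    """
    if not per_seed_rewards:
        raise ValueError("need at least one seed result")
    lo, hi = min(per_seed_rewards), max(per_seed_rewards)
    return {
        "mean": statistics.fmean(per_seed_rewards),
        "std": statistics.pstdev(per_seed_rewards),
        "min": lo,
        "max": hi,
        "range": hi - lo,
        "n": len(per_seed_rewards),
    }
```

The "mean ± std" cell in a comparison table is then just `f"{r['mean']:.2f} ± {r['std']:.2f}"`.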

- Aggregates multi-seed eval summaries with greedy/random baselines from MLflow
- Generates 3 outputs in experiments/figures/:
  - sprint6_comparison.md (drop-in for the report)
  - sprint6_comparison.csv (raw data for further analysis)
  - sprint6_comparison.png (bar chart with error bars, dark theme)
- Multi-seed learners color-coded green with std error bars; baselines gray
- Sorted by descending eval reward so the leader is leftmost
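The sort-and-render step behind sprint6_comparison.md could look roughly like this. The `Row` type, the column headings, and the convention that baselines carry no std (consistent with them having no error bars) are assumptions of this sketch.

```python
from typing import List, NamedTuple, Optional


class Row(NamedTuple):
    name: str
    mean_reward: float
    std_reward: Optional[float]  # None for baselines without multi-seed error bars


def to_markdown(rows: List[Row]) -> str:
    """Render rows as a Markdown table with the best eval reward first."""
    ordered = sorted(rows, key=lambda r: r.mean_reward, reverse=True)
    lines = ["| Policy | Eval reward |", "|---|---|"]
    for r in ordered:
        if r.std_reward is None:
            cell = f"{r.mean_reward:.2f}"
        else:
            cell = f"{r.mean_reward:.2f} ± {r.std_reward:.2f}"
        lines.append(f"| {r.name} | {cell} |")
    return "\n".join(lines)
```

Sorting once and reusing the same order for the CSV and the bar chart keeps all three outputs consistent.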

- configs/q_learning_tuned.yaml: Optuna-tuned (30 trials)
- configs/sarsa_tuned.yaml: Optuna-tuned (30 trials)
- configs/dqn_tuned.yaml: Optuna-tuned (15 trials)
- scripts/register_tuned_models.py: registers multi-seed eval results as new versions in the MLflow Model Registry

Closes #8

Feature/hyperparam tuning

- api/schemas.py: Pydantic v2 request/response models with field validation (rejects NaN/inf observations)
- api/main.py: FastAPI app with /health, /info, /metrics, /predict endpoints
- Module-level state holds the loaded policy, model info, prediction counters, and a rolling latency window
- A DQN-specific shortcut extracts Q-values from policy.q_net for the response
- _interpret_action maps integer action -> (kind, target_index) using num_donors/num_shelters from model info
- Prediction logging is best-effort (failures don't break the response)
- A lifespan context manager loads the policy on startup
- /predict returns 503 if no model is loaded, and 422 if obs has the wrong dimension

Refs #5
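A stdlib sketch of the two request-path checks: rejecting non-finite or wrong-length observations, and decoding a flat action id. The action layout here (0 = idle, then one id per donor, then one per shelter) is an assumption for illustration, not confirmed by the PR, and `interpret_action` / `validate_observation` are hypothetical stand-ins for `_interpret_action` and the Pydantic validators.

```python
import math
from typing import List, Optional, Sequence, Tuple


def validate_observation(obs: Sequence[float], obs_dim: int) -> List[float]:
    """Reject wrong-length or non-finite observations (the API maps these to 422)."""
    if len(obs) != obs_dim:
        raise ValueError(f"expected {obs_dim} features, got {len(obs)}")
    if not all(math.isfinite(x) for x in obs):
        raise ValueError("observation contains NaN or inf")
    return [float(x) for x in obs]


def interpret_action(
    action: int, num_donors: int, num_shelters: int
) -> Tuple[str, Optional[int]]:
    """Map a flat action id to (kind, target_index).

    ASSUMED layout: 0 = idle, 1..num_donors = donor targets,
    then num_shelters shelter targets.
    """
    if action == 0:
        return ("idle", None)
    if 1 <= action <= num_donors:
        return ("donor", action - 1)
    if num_donors < action <= num_donors + num_shelters:
        return ("shelter", action - 1 - num_donors)
    raise ValueError(f"action {action} out of range")
```

Doing both checks before touching the policy keeps /predict from ever running the network on malformed input.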

…ution
- api/policy_loader.py loads a DQN policy from one of three sources:
  1. MLflow Model Registry (FOOD_RESCUE_MODEL_NAME + _VERSION env vars)
  2. Local file or directory (FOOD_RESCUE_MODEL_PATH)
  3. Default convention: experiments/policies/dqn_tuned.pt or dqn_v1.pt
- Only DQN is supported for serving (tabular agents need env-derived state)
- _load_from_mlflow_registry uses mlflow.artifacts.download_artifacts
- meta.json sidecar provides obs_dim, num_actions; num_donors/num_shelters are hardcoded to 5 (matches our scenarios)
- api/prediction_log.py is a stub; full SQLite impl in Step 28

The service starts cleanly via 'uvicorn api.main:app --port 8000'; all four endpoints (/health, /info, /metrics, /predict) were tested end-to-end with curl.
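The three-source priority order can be sketched as a pure function that only decides *where* to load from. The variable name `FOOD_RESCUE_MODEL_VERSION` is an assumption (the PR only says "_VERSION"), as is the choice to fall through when only one of the registry variables is set; `models:/<name>/<version>` is MLflow's standard registry URI form.

```python
from pathlib import Path
from typing import Mapping, Tuple

DEFAULT_CANDIDATES = (
    "experiments/policies/dqn_tuned.pt",
    "experiments/policies/dqn_v1.pt",
)


def resolve_model_source(env: Mapping[str, str], root: Path = Path(".")) -> Tuple[str, str]:
    """Return (source_kind, locator) following the registry > path > default order.

    FOOD_RESCUE_MODEL_VERSION is an ASSUMED variable name for this sketch.
    """
    name = env.get("FOOD_RESCUE_MODEL_NAME")
    version = env.get("FOOD_RESCUE_MODEL_VERSION")  # assumed name
    if name and version:
        # MLflow Model Registry URI: models:/<name>/<version>
        return ("mlflow", f"models:/{name}/{version}")
    path = env.get("FOOD_RESCUE_MODEL_PATH")
    if path:
        return ("path", path)
    # Fall back to the on-disk convention, preferring the tuned policy
    for candidate in DEFAULT_CANDIDATES:
        full = root / candidate
        if full.exists():
            return ("default", str(full))
    raise FileNotFoundError("no model configured and no default policy file found")
```

Keeping resolution separate from loading makes the precedence testable without MLflow or PyTorch installed; the caller then downloads or opens the locator it gets back.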

- api/prediction_log.py replaces the Step 27 stub
- log_prediction() writes request_id, timestamp, observation (JSON), action, action_kind, model name/version, and latency_ms per prediction
- fetch_recent() and fetch_observations() are used by the drift detector and dashboard
- DB path defaults to experiments/prediction_log.db; override with the FOOD_RESCUE_LOG_DB env var (needed for Docker volume mounts)
- CREATE TABLE IF NOT EXISTS is idempotent, so no migration tooling is needed
- A new connection per request avoids SQLite threading issues

Refs #5
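A minimal sketch of the logging pattern described above, using only `sqlite3`. The table name `predictions`, the column types, and the exact signatures are assumptions; the env var, default path, idempotent schema, and one-connection-per-call design come from the PR.

```python
import json
import os
import sqlite3
import time
import uuid
from typing import List, Sequence

# Read once at import time, so FOOD_RESCUE_LOG_DB must be set before importing
DB_PATH = os.environ.get("FOOD_RESCUE_LOG_DB", "experiments/prediction_log.db")

SCHEMA = """
CREATE TABLE IF NOT EXISTS predictions (
    request_id    TEXT PRIMARY KEY,
    timestamp     REAL NOT NULL,
    observation   TEXT NOT NULL,
    action        INTEGER NOT NULL,
    action_kind   TEXT,
    model_name    TEXT,
    model_version TEXT,
    latency_ms    REAL
)
"""


def _connect(db_path: str) -> sqlite3.Connection:
    # SQLite does not create parent directories, so make sure they exist
    os.makedirs(os.path.dirname(db_path) or ".", exist_ok=True)
    # A fresh connection per call avoids sharing one connection across threads
    conn = sqlite3.connect(db_path)
    conn.execute(SCHEMA)  # no-op after the first call thanks to IF NOT EXISTS
    return conn


def log_prediction(
    observation: Sequence[float],
    action: int,
    action_kind: str,
    model_name: str,
    model_version: str,
    latency_ms: float,
    db_path: str = DB_PATH,
) -> str:
    """Insert one prediction row and return its request_id."""
    request_id = str(uuid.uuid4())
    conn = _connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error (does not close)
            conn.execute(
                "INSERT INTO predictions VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
                (request_id, time.time(), json.dumps(list(observation)), int(action),
                 action_kind, model_name, model_version, float(latency_ms)),
            )
    finally:
        conn.close()
    return request_id


def fetch_observations(limit: int = 500, db_path: str = DB_PATH) -> List[List[float]]:
    """Return the most recent logged observations, newest first."""
    conn = _connect(db_path)
    try:
        rows = conn.execute(
            "SELECT observation FROM predictions ORDER BY timestamp DESC LIMIT ?", (limit,)
        ).fetchall()
    finally:
        conn.close()
    return [json.loads(r[0]) for r in rows]
```

For the best-effort behavior, the API would wrap the `log_prediction` call in a `try/except Exception` so a locked or missing database never fails the response.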

- monitoring/drift_detector.py: per-feature KS test comparing live request obs (from prediction_log.db) vs training distribution (rolled out from all 3 scenarios, n_reference_episodes=20)
  - DriftReport dataclass with summary(), drifted_features list, p-values
  - Needs ≥30 live samples before reporting drift (avoids false positives)
  - Reference distribution cached in memory after first build
- monitoring/dashboard.py: Streamlit app reading from the prediction log
  - Service health panel (polls /health, /info, /metrics)
  - On-demand drift check with per-feature p-value table
  - Action distribution bar chart + latency line chart
  - Recent predictions table (last 50)
- scipy added to requirements.txt (ks_2samp)

Refs #5
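The per-feature check can be sketched without scipy. This is a swapped-in pure-Python approximation, not the PR's code: the detector itself uses `scipy.stats.ks_2samp`, whose p-values (exact for small samples) will differ slightly from the asymptotic Kolmogorov series below. `alpha=0.05` and the function names are assumptions; the 30-sample minimum comes from the PR.

```python
import math
from typing import List, Optional, Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    xs, ys = sorted(a), sorted(b)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(xs[i], ys[j])
        # Step past every copy of x in both samples so ties are handled correctly
        while i < n and xs[i] <= x:
            i += 1
        while j < m and ys[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_pvalue(d: float, n: int, m: int) -> float:
    """Asymptotic p-value from the Kolmogorov distribution with a small-sample correction."""
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.3:
        return 1.0  # the series converges slowly here, and its value is ~1 anyway
    total = sum(2 * (-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam) for k in range(1, 101))
    return min(max(total, 0.0), 1.0)


def drifted_features(
    reference: Sequence[Sequence[float]],
    live: Sequence[Sequence[float]],
    alpha: float = 0.05,
    min_samples: int = 30,
) -> Optional[List[int]]:
    """Indices of features whose live distribution differs from the reference.

    Returns None when there are too few live samples to judge.
    """
    if len(live) < min_samples:
        return None
    drifted = []
    for f in range(len(reference[0])):
        ref_col = [row[f] for row in reference]
        live_col = [row[f] for row in live]
        d = ks_statistic(ref_col, live_col)
        if ks_pvalue(d, len(ref_col), len(live_col)) < alpha:
            drifted.append(f)
    return drifted
```

Running one test per feature across the 31-dim observation makes some false alarms likely at a fixed `alpha`; a Bonferroni-style `alpha / num_features` is one common way to tighten it.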

…k state
- Extract load_policy_from_env_if_needed() wrapper in api/main.py
- autouse fixture patches the wrapper, preventing a real policy load
- Fix test_predict_donor/shelter/idle to set state[policy] directly instead of patching q_net return_value after client creation

Feature/serving api

…CORS, fix dup tests
- requirements.txt: removed duplicate scipy and httpx entries
- docker-compose.yml: removed deprecated 'version' top-level key
- api/main.py: added CORSMiddleware for the browser-based demo
- pyproject.toml: add ruff config with sensible ignores
- tests: remove duplicate test_no_update_in_eval_mode definitions that were silently shadowing each other

fix(sprint8): deduplicate deps, remove obsolete compose version, add …

CI (.github/workflows/ci.yml):
- Lint with ruff
- Run pytest with coverage on Python 3.11
- Build both Docker images on push to main/dev
- Smoke-test the serve image by hitting /health

CD (.github/workflows/cd.yml):
- workflow_dispatch with a config dropdown to pick which agent to train
- Auto-trigger on pushes to main with [retrain] in the commit message
- Upload trained policy and MLflow runs as artifacts

Also adds .github/pull_request_template.md to enforce the What/Why/How-to-test/Closes structure for all future PRs.

data_prep.py requires the --scenario argument; 'all' processes all three scenarios (weekday, weekend, holiday_rush), which the tests expect to be present.
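Why the CI step needs the flag, in a minimal sketch: with `required=True`, argparse exits with an error when `--scenario` is missing, and `all` expands to the three scenario names. The function name and parser layout are assumptions about data_prep.py, not its actual code.

```python
import argparse
from typing import List, Optional

SCENARIOS = ("weekday", "weekend", "holiday_rush")


def parse_scenarios(argv: Optional[List[str]] = None) -> List[str]:
    """Parse --scenario and expand 'all' into every scenario (hypothetical CLI shape)."""
    parser = argparse.ArgumentParser(description="Prepare scenario data")
    # required=True makes argparse exit with status 2 if the flag is omitted
    parser.add_argument("--scenario", required=True, choices=[*SCENARIOS, "all"])
    args = parser.parse_args(argv)
    return list(SCENARIOS) if args.scenario == "all" else [args.scenario]
```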

feat(cicd): add GitHub Actions CI and CD workflows

k8s/:
- 00-namespace.yaml, 10-mlflow.yaml, 20-api.yaml, 30-train-job.yaml
- README.md with apply instructions + ArgoCD GitOps example
- Manifests aren't deployed (no cluster); they document the GitOps pattern

README.md (root):
- Quick demo + docker-compose reproduction
- MLOps capability matrix linking to rubric requirements
- Honest results table linking to MODEL_IMPROVEMENT.md

feat(sprint10): K8s manifests

Previously the 'DQN' option silently fell back to the JS-side greedy policy; the trained model at experiments/policies/dqn_v1.pt was never actually used.

Changes:
- Add 'DQN (via API)' dropdown option
- New buildObservationFor(v) constructs the 31-dim vector matching sim/environment.py:_get_observation exactly
- New policyDqnApi async function POSTs to /predict and decodes the returned action via decodeAction()
- API status pill in the topbar (green/amber/red)
- Graceful fallback to greedy if the API is unreachable
- stepSim() is now async and uses a for/of loop instead of forEach, because forEach doesn't await async callbacks
- Step + play button handlers updated to await stepSim()

The API base URL can be overridden via localStorage.setItem('api_base', ...)
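The demo's request-then-fallback flow, sketched in Python rather than the demo's JavaScript to keep one language across these examples. The request key `observation`, the response key `action`, and both function names are assumptions about the /predict contract; only the endpoint path, the 31-dim observation, and the greedy fallback come from the PR.

```python
import json
import urllib.request
from typing import Callable, Dict, Sequence, Tuple


def http_post_json(url: str, payload: Dict, timeout: float = 2.0) -> Dict:
    """POST a JSON body and decode the JSON reply using only the standard library."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def choose_action(
    obs: Sequence[float],
    greedy_action: Callable[[Sequence[float]], int],
    api_base: str = "http://localhost:8000",
    post: Callable[[str, Dict], Dict] = http_post_json,
) -> Tuple[int, str]:
    """Ask /predict for an action; fall back to the greedy policy on any API failure.

    The {"observation": [...]} body and "action" reply field are ASSUMED shapes.
    """
    try:
        reply = post(f"{api_base}/predict", {"observation": list(obs)})
        return int(reply["action"]), "dqn-api"
    # OSError covers refused connections, timeouts and urllib's URLError/HTTPError;
    # KeyError/ValueError cover a malformed or non-JSON reply
    except (OSError, KeyError, ValueError):
        return greedy_action(obs), "greedy-fallback"
```

Returning the source label alongside the action is what would drive a status indicator like the topbar pill.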

feat: wire index.html demo to FastAPI /predict endpoint

test

MrPhantom2325 and others added 30 commits May 11, 2026 03:16

Merge pull request #22 from MrPhantom2325/feature/hyperparam-tuning

9656ee8

Feature/hyperparam tuning

Merge pull request #23 from MrPhantom2325/feature/serving-api

201c14c

Feature/serving api

docker

aba17d0

Update Dockerfile.train

29af7c6

Update Dockerfile.serve

119da2e

Update docker-compose.yml

149c202

Delete .dvc directory

9f03068

test changes

0ffbfa6

Merge pull request #25 from MrPhantom2325/feature/sprint8-fixes

d6e73d5

fix(sprint8): deduplicate deps, remove obsolete compose version, add …

fix(ci): pass --scenario all to data_prep.py

9fb4438

data_prep.py requires the --scenario argument; 'all' processes all three scenarios (weekday, weekend, holiday_rush) which is what the tests expect to be present.

Merge pull request #26 from MrPhantom2325/feature/cicd

fff7467

feat(cicd): add GitHub Actions CI and CD workflows

Merge pull request #27 from MrPhantom2325/feature/k8s-and-polish

549df48

feat(sprint10): K8s manifests

Merge pull request #28 from MrPhantom2325/feature/wire-index-to-api

5f05791

feat: wire index.html demo to FastAPI /predict endpoint

test

bca7469

model version history

b82780c

Merge pull request #29 from MrPhantom2325/feature/model-improvements

9b94a11

test

MrPhantom2325 added 2 commits May 13, 2026 20:11

test

3b83c57

changed docker train file

ba3b051

MrPhantom2325 merged commit 08b55d1 into main May 13, 2026
8 checks passed


MrPhantom2325 merged 32 commits into main from dev

2 participants
