8 changes: 8 additions & 0 deletions .cursor/rules/git-remote-target.mdc
@@ -0,0 +1,8 @@
---
alwaysApply: true
---
# Git remote target

- **Push all branches, tags, and changes to `origin`:** [MycallAI/OpenMAIC-1](https://github.com/MycallAI/OpenMAIC-1).
- **`upstream`** ([THU-MAIC/OpenMAIC](https://github.com/THU-MAIC/OpenMAIC)) is for pulling or comparing with the original project only. Do not treat upstream as the default push target or PR base for this workspace.
- When suggesting `git push`, `gh pr create`, or clone URLs for this repo, use **OpenMAIC-1** unless the user explicitly asks about contributing back to THU-MAIC.
120 changes: 120 additions & 0 deletions .cursor/skills/nexus/SKILL.md
@@ -0,0 +1,120 @@
---
name: nexus
description: >-
Implements Nexus in OpenMAIC: homeroom MVPs 1–4 (StudentContext, pulse, briefing, escalation)
and assessment MVPs 5–8 (Mini-Boss grading API, Counter-Attack interrogator, Variate scenario
engine, Loot Drop diagnostic reports). Use when the user mentions Nexus, homeroom, grading API,
Counter-Attack, variate scenarios, skill-tree report, Stage 2–4 assessment, or MVP 1–8.
---

# Nexus

**Target repo:** [MycallAI/OpenMAIC-1](https://github.com/MycallAI/OpenMAIC-1) (`origin`). Use [THU-MAIC/OpenMAIC](https://github.com/THU-MAIC/OpenMAIC) (`upstream`) only to sync or cherry-pick upstream changes—not as the default push or PR destination for this workspace.

Two tracks share **`lib/nexus/`**, **Vitest**, and **`app/api/nexus/`**; keep their subfolders separate (`homeroom/` vs `assessment/`). Do not couple either track to classroom lesson generation; the only acceptable shared dependency is server-side LLM resolution.

## Track A — Homeroom (MVP 1–4)

Pastoral / daily loop (see sections below).

## Track B — Assessment (MVP 5–8)

Technical evaluation pipeline: **Stage 2** isolated grading (MVP 5), **Stage 3** anti-cheat + synthesis (MVP 6–7), **Stage 4** diagnostics (MVP 8). **API-first** for MVP 5; chat UI only where MVP 6 requires it.

---

# Track A — Homeroom

## Non-negotiables

- **No biometric data** in any schema or mock.
- **PII separation**: processing types use **`studentId` (UUID)** only—no names or emails in `StudentContext` / `SessionState` used for logic.
- **Pulse is ephemeral**: morning-loop state lives in memory / client session; **no persistent chat logs** to DB.
- **Escalation**: **Tier 3** must never rely on LLM alone—**deterministic regex/keyword path** must fire first and **cannot be down-ranked** by the model.

## MVP 1 — Context Aggregator

1. Types + validation: `StudentContext` (LMS + calendar fields, `dataQuality` / gaps for missing data).
2. Adapters: `parseLmsMock`, `parseCalendarMock` — **never throw** on partial JSON; default empty structures.
3. `buildStudentContext({ studentId, lms, calendar })` — single merge function.
4. Fixtures: at least **three** edge cases—late-night submission, packed calendar, sparse/missing fields.
5. **DoD**: Vitest loads all three; merge completes **without throws**; output validates to schema.
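
A minimal sketch of the never-throw adapter contract, assuming a simplified `LmsData` shape (the real fields come from the `StudentContext` schema in step 1):

```typescript
// Illustrative shape only; the real StudentContext schema defines the fields.
type LmsData = { assignments: { title: string; dueAt: string }[] };

const EMPTY_LMS: LmsData = { assignments: [] };

// MVP 1 adapter contract: never throw on partial or invalid JSON.
// Fall back to an empty structure and flag the gap for dataQuality.
function parseLmsMock(raw: string): { data: LmsData; gap: boolean } {
  try {
    const parsed = JSON.parse(raw);
    if (!parsed || !Array.isArray(parsed.assignments)) {
      return { data: EMPTY_LMS, gap: true };
    }
    return { data: { assignments: parsed.assignments }, gap: false };
  } catch {
    return { data: EMPTY_LMS, gap: true };
  }
}
```

`buildStudentContext` would call each adapter and merge the `gap` flags into `dataQuality`, so the merge itself never has to handle throws.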

## MVP 2 — 10-Second Pulse

1. `SessionState` includes **`cognitiveLoad`**: `High` | `Med` | `Low`, plus greeting metadata, timestamps.
2. **POST** `app/api/nexus/pulse/route.ts`: input = `StudentContext` + raw text/emoji; server-only LLM via existing server provider patterns.
3. Structured model output (e.g. `generateObject`) for enum fields—not free-text parsing for load level.
4. Minimal UI e.g. `app/nexus/pulse/page.tsx`: submit → show greeting + load; **wipe state** on loop completion.
5. **DoD**: five distinct `StudentContext` profiles; tests use **mocked LLM** for CI stability; optional live integration behind env flag.
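
A sketch of the ephemeral session state and a server-side guard for the structured model output; field names here are assumptions to be aligned with the real MVP 2 schema:

```typescript
// Ephemeral morning-loop state: lives in memory, never persisted to DB.
type CognitiveLoad = 'High' | 'Med' | 'Low';

interface SessionState {
  studentId: string;      // UUID only -- no names or emails (PII rule)
  cognitiveLoad: CognitiveLoad;
  greeting: string;
  startedAt: number;      // epoch ms
}

// Even with generateObject-style structured output, validate the enum on the
// server rather than trusting the model, and default conservatively.
function parseCognitiveLoad(value: unknown): CognitiveLoad {
  return value === 'High' || value === 'Med' || value === 'Low' ? value : 'Med';
}
```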

## MVP 3 — Orchestrator

1. Pure function `orchestrateDay(session, context)` → **Markdown briefing** (exactly **three** bullets) + **`FocusModeActive: boolean`**.
2. **Rule**: `cognitiveLoad === 'High'` ⇒ **`FocusModeActive === true`**; the briefing is reduced and reprioritized to prevent overwhelm.
3. Bullets must be **atomic** (single concrete action, e.g. “Draft history intro paragraph”).
4. **DoD**: Vitest table-driven—High load always sets Focus mode true; list behavior matches spec.
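
The rule above can be sketched as a pure function; the input type and the recovery-bullet wording are illustrative, not the repo's actual definitions:

```typescript
// MVP 3 sketch: High load always activates Focus mode, and the briefing
// always has exactly three atomic bullets.
interface DayInput {
  cognitiveLoad: 'High' | 'Med' | 'Low';
  tasks: string[]; // atomic candidate actions, highest priority first
}

function orchestrateDay(input: DayInput): { briefing: string; focusModeActive: boolean } {
  const focusModeActive = input.cognitiveLoad === 'High';
  // Under High load, shrink to the single top task plus recovery bullets so
  // the student is prioritized down, not overwhelmed.
  const bullets = focusModeActive
    ? [input.tasks[0] ?? 'Rest', 'Take a 10-minute break', 'Defer everything else']
    : input.tasks.slice(0, 3);
  return { briefing: bullets.map((b) => `- ${b}`).join('\n'), focusModeActive };
}
```

Because the function is pure, the table-driven Vitest cases in the DoD reduce to input/output pairs with no mocking.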

## MVP 4 — Tiered Escalation Router

1. **`ThreatTier`**: `1` | `2` | `3` with documented semantics.
2. **`tier3KeywordRouter(text)`** — regex/keyword list for explicit crisis/self-harm; runs **before** LLM; match ⇒ Tier 3 **always**.
3. An LLM may classify lower tiers; take the **max** of the model tier and the deterministic tier, or skip the LLM entirely once Tier 3 is locked.
4. **`executeEscalation(tier)`**: Tier 1 = local log; Tier 2 = mock Form Tutor webhook; Tier 3 = mock crisis path + log; **no real PII in logs**.
5. Integrate router on **every** student text in the pulse flow.
6. **DoD**: corpus of **50** strings in fixtures; Vitest asserts **100% Tier 3** on all **keyword-defined** Tier-3 cases and webhook behavior (mocked). Do **not** claim 100% accuracy on sarcasm vs distress for the full set unless explicitly labeled and human-reviewed.
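
A sketch of the deterministic pre-filter and tier resolution; the pattern list is a tiny illustrative subset, since the real list is driven by the fixtures corpus:

```typescript
type ThreatTier = 1 | 2 | 3;

// Illustrative subset only; the production list comes from the 50-string corpus.
const TIER3_PATTERNS: RegExp[] = [
  /\bkill myself\b/i,
  /\bhurt myself\b/i,
  /\bend my life\b/i,
];

// Runs BEFORE any LLM call; a match locks Tier 3 unconditionally.
function tier3KeywordRouter(text: string): ThreatTier | null {
  return TIER3_PATTERNS.some((re) => re.test(text)) ? 3 : null;
}

// Keyword match wins unconditionally; otherwise defer to the (mocked) model,
// so the LLM can never down-rank a deterministic Tier 3.
function resolveTier(text: string, llmTier: ThreatTier): ThreatTier {
  return tier3KeywordRouter(text) ?? llmTier;
}
```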

## Implementation order (Track A)

Build **MVP 1** and **MVP 4** (pure + tests) early; ship **MVP 2** only after keyword escalation is wired; then **MVP 3**.

---

# Track B — Assessment

### Shared assessment non-negotiables

- **Anonymized candidate IDs** in APIs and logs (UUID); no names in grading payloads.
- **Anti-hallucination for MVP 5**: model output must be **structured** (schema); include **`rubricVersion`** and **`evidenceSpans`** (quotes or line refs from the submission) where feasible; optional **deterministic checks** (parse JSON, run formatter) as input to the grader, not as a silent replacement for the rubric.
- **Counter-Attack (MVP 6)** must **ground questions in the actual submission** (AST diff, highlighted snippet, or rubric dimension)—no generic trivia.

## MVP 5 — Mini-Boss Evaluation Engine

1. **POST** `app/api/nexus/assessment/grade/route.ts` (or `evaluate`): body = `{ taskPrompt, candidateSolution, rubricId? }`.
2. Measure **`executionTimeMs`** server-side (wall clock for the grade request).
3. Response JSON: **`pass`** (boolean), **`executionTimeMs`**, **`confidenceScore`** (0–1), plus **`rubricVersion`**, **`reasoning`** (short, for audit only; do not expose it to candidates if product policy forbids).
4. **Fixture corpus**: **100** labeled submissions (50 pass, 50 fail)—stored as JSON files with **human label** and optional **expert notes**.
5. **DoD**: Offline eval script computes **agreement with human labels** (accuracy / Cohen’s kappa); target **≥ 95%** on this fixed corpus. Run **regression** in CI against frozen fixtures with **mocked LLM** or recorded outputs; live-model gate in optional job. Document failure modes (hallucinated pass) in test names.
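
The response shape and the offline agreement metric from the DoD can be sketched as follows; the fixture format (`candidateId` + `expectedPass`) is an assumption:

```typescript
interface GradeResult {
  pass: boolean;
  executionTimeMs: number;
  confidenceScore: number; // 0..1
  rubricVersion: string;
  reasoning: string;       // short, audit only
}

interface LabeledFixture {
  candidateId: string;     // anonymized UUID, never a name
  expectedPass: boolean;   // frozen human label
}

// Accuracy against the frozen human labels; target >= 0.95 on the corpus.
function labelAgreement(fixtures: LabeledFixture[], results: Map<string, GradeResult>): number {
  if (fixtures.length === 0) return 0;
  const agree = fixtures.filter((f) => results.get(f.candidateId)?.pass === f.expectedPass).length;
  return agree / fixtures.length;
}
```

Cohen's kappa would be computed alongside this in the eval script when the 50/50 class balance drifts.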

## MVP 6 — Counter-Attack Interrogator

1. **Trigger model**: deterministic rules from telemetry—e.g. `timeToSolveSeconds < 10` OR **`perfectionFlag`** from static analysis / complexity heuristic (define explicitly in code).
2. **Agent**: specialized system prompt + tool-less or minimal-tool chat; **one primary follow-up** per trigger, referencing submission artifact.
3. **Session store**: in-memory or short TTL for MVP; log **decision** only if policy allows (no full transcript retention unless required).
4. **DoD**: **Red-team script**—fixtures for “pasted perfect + vacuous explanation” vs “expert natural explanation”; Vitest asserts **expected disposition** when using **mocked judge LLM**; human red-team checklist in `reference.md` for what automation cannot prove.
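
The trigger rule from step 1, made explicit as a deterministic predicate; `perfectionFlag` arrives precomputed from static analysis, and the threshold is tunable (see the false-positive note in reference.md):

```typescript
interface SubmissionTelemetry {
  timeToSolveSeconds: number;
  perfectionFlag: boolean; // from static analysis / complexity heuristic
}

const SUSPICIOUS_SOLVE_SECONDS = 10;

// Deterministic: either signal alone triggers a Counter-Attack follow-up.
function shouldInterrogate(t: SubmissionTelemetry): boolean {
  return t.timeToSolveSeconds < SUSPICIOUS_SOLVE_SECONDS || t.perfectionFlag;
}
```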

## MVP 7 — Variate Engine

1. **Input**: master scenario template (prompt + bug spec + schema + constraints) as structured data—not prose only.
2. **`variate(scenarioSeed, template)`** → new scenario: renamed symbols, shifted bug location, perturbed dataset rows, **equivalent difficulty** by construction (same algorithmic family / same rubric dimensions).
3. **Output**: serialized scenario for candidate + **internal answer-key metadata** (for graders only, never sent to client).
4. **DoD**: **10** variations from one template; a **`pnpm`** script checks structural distinctness (e.g. hashes of the serialized scenarios must all differ); **expert time study** documented externally (the skill cannot claim “same time” without measured data—track in a spreadsheet and link from a repo doc if needed).
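
One deterministic variate step (symbol renaming) can be sketched with a seeded PRNG so the same (seed, template) pair always yields the same scenario; the symbol pool and suffix scheme are illustrative:

```typescript
// mulberry32-style seeded PRNG: reproducible stream in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const SYMBOL_POOL = ['alpha', 'beta', 'gamma', 'delta', 'epsilon'];

// Bug-location shifts and dataset perturbation would reuse the same rand().
function variateSymbols(seed: number, symbols: string[]): Record<string, string> {
  const rand = mulberry32(seed);
  const mapping: Record<string, string> = {};
  symbols.forEach((s, i) => {
    // Suffix with the index to keep renamed symbols collision-free.
    mapping[s] = `${SYMBOL_POOL[Math.floor(rand() * SYMBOL_POOL.length)]}_${i}`;
  });
  return mapping;
}
```

Hashing the serialized output of ten different seeds gives the structural-distinctness check from the DoD for free.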

## MVP 8 — Loot Drop Synthesizer

1. **Input**: Final Boss telemetry (events, edits, test runs) + Counter-Attack transcript (if any).
2. **Output**: **Markdown** report mapping each failure to **skill-tree node IDs** (predefined enum), e.g. `SKILL.RESOURCE_MEMORY` → “Module 3: Resource Management”.
3. Prompt/schema constraints: **no generic praise**; every paragraph must cite a **specific observed behavior** (event ID, quote, or rubric miss).
4. **DoD**: Vitest **snapshot** or structured checks: report contains **≥ N** skill-tree references for fixture sessions; **forbidden phrase list** test (e.g. “great job”, “keep it up”) optional.
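
The closed-enum and forbidden-phrase checks can be sketched as one validator; the node IDs and phrase list here are illustrative, since the real enum lives in `lib/nexus/assessment/skill-tree.ts`:

```typescript
// Illustrative closed enum; only registered IDs may appear in reports.
const SKILL_NODES = new Set(['SKILL.RESOURCE_MEMORY', 'SKILL.EDGE_CASES', 'SKILL.CONCURRENCY']);
const FORBIDDEN_PHRASES = ['great job', 'keep it up', 'well done'];

function validateReport(markdown: string): { unknownNodes: string[]; genericPraise: string[] } {
  // Every cited node must be in the registered set; generic praise is banned.
  const cited = [...markdown.matchAll(/SKILL\.[A-Z_]+/g)].map((m) => m[0]);
  const lower = markdown.toLowerCase();
  return {
    unknownNodes: cited.filter((id) => !SKILL_NODES.has(id)),
    genericPraise: FORBIDDEN_PHRASES.filter((p) => lower.includes(p)),
  };
}
```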

## Implementation order (Track B)

**MVP 7** (deterministic variate) before or parallel with **MVP 5** so grades target stable rubrics. **MVP 5** before **MVP 6** (need graded artifact to interrogate). **MVP 8** last (consumes telemetry + interrogation).

## Verification

Run `pnpm test` under `tests/nexus/` (split `homeroom/` vs `assessment/`). Do not claim work is complete without passing tests.

## Optional deep reference

For paths, evaluation metrics, and red-team checklist, see [reference.md](reference.md).
64 changes: 64 additions & 0 deletions .cursor/skills/nexus/reference.md
@@ -0,0 +1,64 @@
# Nexus — reference layout

**Canonical fork for Nexus work:** [github.com/MycallAI/OpenMAIC-1](https://github.com/MycallAI/OpenMAIC-1) (`origin`). Upstream: [THU-MAIC/OpenMAIC](https://github.com/THU-MAIC/OpenMAIC) (`upstream`).

Suggested paths (adjust if repo conventions differ):

## Track A — Homeroom

| Area | Path |
|------|------|
| Types | `lib/nexus/homeroom/types.ts` |
| Merge + adapters | `lib/nexus/homeroom/context-aggregator.ts` |
| Fixtures | `lib/nexus/homeroom/fixtures/lms/`, `calendar/`, `escalation-corpus.json` |
| Pulse prompt | `lib/nexus/homeroom/pulse-prompt.ts` |
| Orchestrator | `lib/nexus/homeroom/orchestrator.ts` |
| Escalation | `lib/nexus/homeroom/escalation-router.ts`, `escalation-actions.ts` |
| API | `app/api/nexus/pulse/route.ts`, `app/api/nexus/briefing/route.ts` |
| UI | `app/nexus/pulse/page.tsx` |
| Tests | `tests/nexus/homeroom/*.test.ts` |

## Track B — Assessment

| Area | Path |
|------|------|
| Grading types + schema | `lib/nexus/assessment/grade-types.ts`, `grade-schema.ts` |
| Grader service | `lib/nexus/assessment/mini-boss-grader.ts` |
| Labeled corpus | `lib/nexus/assessment/fixtures/grading-corpus/*.json` (100 items, human `expectedPass`) |
| Eval script | `scripts/nexus-assessment-eval.mts` or `pnpm nexus:assessment:eval` |
| Counter-Attack | `lib/nexus/assessment/counter-attack/trigger.ts`, `agent-prompt.ts` |
| Variate | `lib/nexus/assessment/variate/template.ts`, `variate.ts` |
| Skill tree | `lib/nexus/assessment/skill-tree.ts` (enum + module copy) |
| Loot Drop | `lib/nexus/assessment/loot-drop/synthesize.ts` |
| API | `app/api/nexus/assessment/grade/route.ts`, `interrogate/route.ts`, `variate/route.ts`, `report/route.ts` |
| UI (MVP 6 only) | `app/nexus/assessment/interrogate/page.tsx` (minimal) |
| Tests | `tests/nexus/assessment/*.test.ts` |

**Provider reuse**: resolve LLM via existing server utilities (`lib/server/resolve-model.ts`, `lib/ai/providers.ts`); never expose API keys to the client.

**Webhook URLs**: `process.env` placeholders only for homeroom MVP (e.g. `NEXUS_FORM_TUTOR_WEBHOOK_URL`).

---

## MVP 5 — grading correlation

- **Primary metric**: accuracy vs frozen human labels on the 100-item corpus; report **Cohen’s kappa** if class balance is skewed.
- **CI**: pin **expected metrics** on **mocked** grader responses; optional nightly/live job for real LLM drift.
- **Hallucination guardrails**: require `evidenceSpans` from submission text; fail closed (low `confidenceScore` or `pass: false`) when evidence missing.
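
The fail-closed guardrail above can be sketched as a post-processing step; the shapes are assumptions, not the repo's actual grade schema:

```typescript
interface RawGrade {
  pass: boolean;
  confidenceScore: number;
  evidenceSpans: string[]; // quotes that must appear verbatim in the submission
}

// Keep only evidence spans that actually occur in the submission text.
function enforceEvidence(grade: RawGrade, submission: string): RawGrade {
  const grounded = grade.evidenceSpans.filter((s) => submission.includes(s));
  if (grounded.length === 0) {
    // Fail closed: no verifiable evidence means no pass and capped confidence.
    return { pass: false, confidenceScore: Math.min(grade.confidenceScore, 0.2), evidenceSpans: [] };
  }
  return { ...grade, evidenceSpans: grounded };
}
```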

## MVP 6 — red-team checklist (human)

Automation proves wiring and mocked judge behavior, not real-world bluff resistance. Human sessions should include:

- Paste of external “perfect” solution + vague explanation → expect **fail / further scrutiny**.
- Expert explains trade-offs tied to submitted lines → expect **pass**.
- False positive check: fast but legitimate solve → ensure trigger policy tuned (not only `time < 10s`).

## MVP 7 — expert time equivalence

Distinctness is testable in code; **time-on-task equivalence** requires timed solves by N≥3 experts × 10 variants—track outside the repo or in a linked spreadsheet.

## MVP 8 — report quality

- Skill-tree nodes are a **closed enum**; report generator must only emit registered IDs + human-readable module titles.
- Tests: forbidden generic phrases; minimum count of **concrete citations** (telemetry event types or quoted snippets).
18 changes: 18 additions & 0 deletions lib/ai/providers.ts
@@ -951,6 +951,24 @@ export const PROVIDERS: Record<ProviderId, ProviderConfig> = {
},
],
},

local: {
id: 'local',
name: 'Local (LM Studio)',
type: 'openai',
defaultBaseUrl: 'http://192.168.10.111:1234/v1',
requiresApiKey: false,
icon: '/logos/openai.svg',
models: [
{
id: 'google/gemma-4-e4b',
name: 'Gemma 4 E4B (Local)',
contextWindow: 8192,
outputWindow: 4096,
capabilities: { streaming: true, tools: false, vision: false },
},
],
},
};

/**
1 change: 1 addition & 0 deletions lib/server/provider-config.ts
@@ -49,6 +49,7 @@ const LLM_ENV_MAP: Record<string, string> = {
SILICONFLOW: 'siliconflow',
DOUBAO: 'doubao',
GROK: 'grok',
LOCAL: 'local',
};

const TTS_ENV_MAP: Record<string, string> = {
3 changes: 2 additions & 1 deletion lib/types/provider.ts
@@ -16,7 +16,8 @@ export type BuiltInProviderId =
| 'glm'
| 'siliconflow'
| 'doubao'
| 'grok';
| 'grok'
| 'local';

/**
* Provider ID (built-in or custom)