8 changes: 8 additions & 0 deletions .cursor/rules/git-remote-target.mdc
@@ -0,0 +1,8 @@
---
alwaysApply: true
---
# Git remote target

- **Push all branches, tags, and changes to `origin`:** [MycallAI/OpenMAIC-1](https://github.com/MycallAI/OpenMAIC-1).
- **`upstream`** ([THU-MAIC/OpenMAIC](https://github.com/THU-MAIC/OpenMAIC)) is for pulling or comparing with the original project only. Do not treat upstream as the default push target or PR base for this workspace.
- When suggesting `git push`, `gh pr create`, or clone URLs for this repo, use **OpenMAIC-1** unless the user explicitly asks about contributing back to THU-MAIC.
120 changes: 120 additions & 0 deletions .cursor/skills/nexus/SKILL.md
@@ -0,0 +1,120 @@
---
name: nexus
description: >-
Implements Nexus in OpenMAIC: homeroom MVPs 1–4 (StudentContext, pulse, briefing, escalation)
and assessment MVPs 5–8 (Mini-Boss grading API, Counter-Attack interrogator, Variate scenario
engine, Loot Drop diagnostic reports). Use when the user mentions Nexus, homeroom, grading API,
Counter-Attack, variate scenarios, skill-tree report, Stage 2–4 assessment, or MVP 1–8.
---

# Nexus

**Target repo:** [MycallAI/OpenMAIC-1](https://github.com/MycallAI/OpenMAIC-1) (`origin`). Use [THU-MAIC/OpenMAIC](https://github.com/THU-MAIC/OpenMAIC) (`upstream`) only to sync or cherry-pick upstream changes—not as the default push or PR destination for this workspace.

Two tracks share **`lib/nexus/`**, **Vitest**, and **`app/api/nexus/`**; keep their subfolders separate (`homeroom/` vs `assessment/`). Do not couple either track to classroom lesson generation; the only acceptable shared dependency is server-side LLM resolution.

## Track A — Homeroom (MVP 1–4)

Pastoral / daily loop (see sections below).

## Track B — Assessment (MVP 5–8)

Technical evaluation pipeline: **Stage 2** isolated grading (MVP 5), **Stage 3** anti-cheat + synthesis (MVP 6–7), **Stage 4** diagnostics (MVP 8). **API-first** for MVP 5; chat UI only where MVP 6 requires it.

---

# Track A — Homeroom

## Non-negotiables

- **No biometric data** in any schema or mock.
- **PII separation**: processing types use **`studentId` (UUID)** only—no names or emails in `StudentContext` / `SessionState` used for logic.
- **Pulse is ephemeral**: morning-loop state lives in memory / client session; **no persistent chat logs** to DB.
- **Escalation**: **Tier 3** must never rely on LLM alone—**deterministic regex/keyword path** must fire first and **cannot be down-ranked** by the model.

## MVP 1 — Context Aggregator

1. Types + validation: `StudentContext` (LMS + calendar fields, `dataQuality` / gaps for missing data).
2. Adapters: `parseLmsMock`, `parseCalendarMock` — **never throw** on partial JSON; default empty structures.
3. `buildStudentContext({ studentId, lms, calendar })` — single merge function.
4. Fixtures: at least **three** edge cases—late-night submission, packed calendar, sparse/missing fields.
5. **DoD**: Vitest loads all three; merge completes **without throws**; output validates to schema.
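
A minimal sketch of the never-throw adapter contract, assuming a simplified `LmsData` shape (the real fields come from the `StudentContext` schema in step 1):

```typescript
// Illustrative shape only; the real StudentContext schema defines the fields.
type LmsData = { assignments: { title: string; dueAt: string }[] };

const EMPTY_LMS: LmsData = { assignments: [] };

// MVP 1 adapter contract: never throw on partial or invalid JSON.
// Fall back to an empty structure and flag the gap for dataQuality.
function parseLmsMock(raw: string): { data: LmsData; gap: boolean } {
  try {
    const parsed = JSON.parse(raw);
    if (!parsed || !Array.isArray(parsed.assignments)) {
      return { data: EMPTY_LMS, gap: true };
    }
    return { data: { assignments: parsed.assignments }, gap: false };
  } catch {
    return { data: EMPTY_LMS, gap: true };
  }
}
```

`buildStudentContext` would call each adapter and merge the `gap` flags into `dataQuality`, so the merge itself never has to handle throws.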

## MVP 2 — 10-Second Pulse

1. `SessionState` includes **`cognitiveLoad`**: `High` | `Med` | `Low`, plus greeting metadata, timestamps.
2. **POST** `app/api/nexus/pulse/route.ts`: input = `StudentContext` + raw text/emoji; server-only LLM via existing server provider patterns.
3. Structured model output (e.g. `generateObject`) for enum fields—not free-text parsing for load level.
4. Minimal UI e.g. `app/nexus/pulse/page.tsx`: submit → show greeting + load; **wipe state** on loop completion.
5. **DoD**: five distinct `StudentContext` profiles; tests use **mocked LLM** for CI stability; optional live integration behind env flag.
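
A sketch of the ephemeral session state and a server-side guard for the structured model output; field names here are assumptions to be aligned with the real MVP 2 schema:

```typescript
// Ephemeral morning-loop state: lives in memory, never persisted to DB.
type CognitiveLoad = 'High' | 'Med' | 'Low';

interface SessionState {
  studentId: string;      // UUID only -- no names or emails (PII rule)
  cognitiveLoad: CognitiveLoad;
  greeting: string;
  startedAt: number;      // epoch ms
}

// Even with generateObject-style structured output, validate the enum on the
// server rather than trusting the model, and default conservatively.
function parseCognitiveLoad(value: unknown): CognitiveLoad {
  return value === 'High' || value === 'Med' || value === 'Low' ? value : 'Med';
}
```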

## MVP 3 — Orchestrator

1. Pure function `orchestrateDay(session, context)` → **Markdown briefing** (exactly **three** bullets) + **`FocusModeActive: boolean`**.
2. **Rule**: `cognitiveLoad === 'High'` ⇒ **`FocusModeActive === true`**; the briefing is reduced and reprioritized to prevent overwhelm.
3. Bullets must be **atomic** (single concrete action, e.g. “Draft history intro paragraph”).
4. **DoD**: Vitest table-driven—High load always sets Focus mode true; list behavior matches spec.
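
The rule above can be sketched as a pure function; the input type and the recovery-bullet wording are illustrative, not the repo's actual definitions:

```typescript
// MVP 3 sketch: High load always activates Focus mode, and the briefing
// always has exactly three atomic bullets.
interface DayInput {
  cognitiveLoad: 'High' | 'Med' | 'Low';
  tasks: string[]; // atomic candidate actions, highest priority first
}

function orchestrateDay(input: DayInput): { briefing: string; focusModeActive: boolean } {
  const focusModeActive = input.cognitiveLoad === 'High';
  // Under High load, shrink to the single top task plus recovery bullets so
  // the student is prioritized down, not overwhelmed.
  const bullets = focusModeActive
    ? [input.tasks[0] ?? 'Rest', 'Take a 10-minute break', 'Defer everything else']
    : input.tasks.slice(0, 3);
  return { briefing: bullets.map((b) => `- ${b}`).join('\n'), focusModeActive };
}
```

Because the function is pure, the table-driven Vitest cases in the DoD reduce to input/output pairs with no mocking.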

## MVP 4 — Tiered Escalation Router

1. **`ThreatTier`**: `1` | `2` | `3` with documented semantics.
2. **`tier3KeywordRouter(text)`** — regex/keyword list for explicit crisis/self-harm; runs **before** LLM; match ⇒ Tier 3 **always**.
3. An LLM may classify lower tiers; take the **max** of the model tier and the deterministic tier, or skip the LLM entirely once Tier 3 is locked.
4. **`executeEscalation(tier)`**: Tier 1 = local log; Tier 2 = mock Form Tutor webhook; Tier 3 = mock crisis path + log; **no real PII in logs**.
5. Integrate router on **every** student text in the pulse flow.
6. **DoD**: corpus of **50** strings in fixtures; Vitest asserts **100% Tier 3** on all **keyword-defined** Tier-3 cases and webhook behavior (mocked). Do **not** claim 100% accuracy on sarcasm vs distress for the full set unless explicitly labeled and human-reviewed.
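
A sketch of the deterministic pre-filter and tier resolution; the pattern list is a tiny illustrative subset, since the real list is driven by the fixtures corpus:

```typescript
type ThreatTier = 1 | 2 | 3;

// Illustrative subset only; the production list comes from the 50-string corpus.
const TIER3_PATTERNS: RegExp[] = [
  /\bkill myself\b/i,
  /\bhurt myself\b/i,
  /\bend my life\b/i,
];

// Runs BEFORE any LLM call; a match locks Tier 3 unconditionally.
function tier3KeywordRouter(text: string): ThreatTier | null {
  return TIER3_PATTERNS.some((re) => re.test(text)) ? 3 : null;
}

// Keyword match wins unconditionally; otherwise defer to the (mocked) model,
// so the LLM can never down-rank a deterministic Tier 3.
function resolveTier(text: string, llmTier: ThreatTier): ThreatTier {
  return tier3KeywordRouter(text) ?? llmTier;
}
```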

## Implementation order (Track A)

Build **MVP 1** and **MVP 4** (pure + tests) early; ship **MVP 2** only after keyword escalation is wired; then **MVP 3**.

---

# Track B — Assessment

### Shared assessment non-negotiables

- **Anonymized candidate IDs** in APIs and logs (UUID); no names in grading payloads.
- **Anti-hallucination for MVP 5**: model output must be **structured** (schema); include **`rubricVersion`** and **`evidenceSpans`** (quotes or line refs from the submission) where feasible; optional **deterministic checks** (parse JSON, run formatter) as input to the grader, not as a silent replacement for the rubric.
- **Counter-Attack (MVP 6)** must **ground questions in the actual submission** (AST diff, highlighted snippet, or rubric dimension)—no generic trivia.

## MVP 5 — Mini-Boss Evaluation Engine

1. **POST** `app/api/nexus/assessment/grade/route.ts` (or `evaluate`): body = `{ taskPrompt, candidateSolution, rubricId? }`.
2. Measure **`executionTimeMs`** server-side (wall clock for the grade request).
3. Response JSON: **`pass`** (boolean), **`executionTimeMs`**, **`confidenceScore`** (0–1), plus **`rubricVersion`**, **`reasoning`** (short, for audit only; do not expose it to candidates if product policy forbids).
4. **Fixture corpus**: **100** labeled submissions (50 pass, 50 fail)—stored as JSON files with **human label** and optional **expert notes**.
5. **DoD**: Offline eval script computes **agreement with human labels** (accuracy / Cohen’s kappa); target **≥ 95%** on this fixed corpus. Run **regression** in CI against frozen fixtures with **mocked LLM** or recorded outputs; live-model gate in optional job. Document failure modes (hallucinated pass) in test names.
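
The response shape and the offline agreement metric from the DoD can be sketched as follows; the fixture format (`candidateId` + `expectedPass`) is an assumption:

```typescript
interface GradeResult {
  pass: boolean;
  executionTimeMs: number;
  confidenceScore: number; // 0..1
  rubricVersion: string;
  reasoning: string;       // short, audit only
}

interface LabeledFixture {
  candidateId: string;     // anonymized UUID, never a name
  expectedPass: boolean;   // frozen human label
}

// Accuracy against the frozen human labels; target >= 0.95 on the corpus.
function labelAgreement(fixtures: LabeledFixture[], results: Map<string, GradeResult>): number {
  if (fixtures.length === 0) return 0;
  const agree = fixtures.filter((f) => results.get(f.candidateId)?.pass === f.expectedPass).length;
  return agree / fixtures.length;
}
```

Cohen's kappa would be computed alongside this in the eval script when the 50/50 class balance drifts.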

## MVP 6 — Counter-Attack Interrogator

1. **Trigger model**: deterministic rules from telemetry—e.g. `timeToSolveSeconds < 10` OR **`perfectionFlag`** from static analysis / complexity heuristic (define explicitly in code).
2. **Agent**: specialized system prompt + tool-less or minimal-tool chat; **one primary follow-up** per trigger, referencing submission artifact.
3. **Session store**: in-memory or short TTL for MVP; log **decision** only if policy allows (no full transcript retention unless required).
4. **DoD**: **Red-team script**—fixtures for “pasted perfect + vacuous explanation” vs “expert natural explanation”; Vitest asserts **expected disposition** when using **mocked judge LLM**; human red-team checklist in `reference.md` for what automation cannot prove.
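
The trigger rule from step 1, made explicit as a deterministic predicate; `perfectionFlag` arrives precomputed from static analysis, and the threshold is tunable (see the false-positive note in reference.md):

```typescript
interface SubmissionTelemetry {
  timeToSolveSeconds: number;
  perfectionFlag: boolean; // from static analysis / complexity heuristic
}

const SUSPICIOUS_SOLVE_SECONDS = 10;

// Deterministic: either signal alone triggers a Counter-Attack follow-up.
function shouldInterrogate(t: SubmissionTelemetry): boolean {
  return t.timeToSolveSeconds < SUSPICIOUS_SOLVE_SECONDS || t.perfectionFlag;
}
```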

## MVP 7 — Variate Engine

1. **Input**: master scenario template (prompt + bug spec + schema + constraints) as structured data—not prose only.
2. **`variate(scenarioSeed, template)`** → new scenario: renamed symbols, shifted bug location, perturbed dataset rows, **equivalent difficulty** by construction (same algorithmic family / same rubric dimensions).
3. **Output**: serialized scenario for candidate + **internal answer-key metadata** (for graders only, never sent to client).
4. **DoD**: **10** variations from one template; a **`pnpm`** script checks structural distinctness (e.g. hashes of the serialized scenarios must all differ); **expert time study** documented externally (the skill cannot claim “same time” without measured data—track in a spreadsheet and link from a repo doc if needed).
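
One deterministic variate step (symbol renaming) can be sketched with a seeded PRNG so the same (seed, template) pair always yields the same scenario; the symbol pool and suffix scheme are illustrative:

```typescript
// mulberry32-style seeded PRNG: reproducible stream in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const SYMBOL_POOL = ['alpha', 'beta', 'gamma', 'delta', 'epsilon'];

// Bug-location shifts and dataset perturbation would reuse the same rand().
function variateSymbols(seed: number, symbols: string[]): Record<string, string> {
  const rand = mulberry32(seed);
  const mapping: Record<string, string> = {};
  symbols.forEach((s, i) => {
    // Suffix with the index to keep renamed symbols collision-free.
    mapping[s] = `${SYMBOL_POOL[Math.floor(rand() * SYMBOL_POOL.length)]}_${i}`;
  });
  return mapping;
}
```

Hashing the serialized output of ten different seeds gives the structural-distinctness check from the DoD for free.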

## MVP 8 — Loot Drop Synthesizer

1. **Input**: Final Boss telemetry (events, edits, test runs) + Counter-Attack transcript (if any).
2. **Output**: **Markdown** report mapping each failure to **skill-tree node IDs** (predefined enum), e.g. `SKILL.RESOURCE_MEMORY` → “Module 3: Resource Management”.
3. Prompt/schema constraints: **no generic praise**; every paragraph must cite a **specific observed behavior** (event ID, quote, or rubric miss).
4. **DoD**: Vitest **snapshot** or structured checks: report contains **≥ N** skill-tree references for fixture sessions; **forbidden phrase list** test (e.g. “great job”, “keep it up”) optional.
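
The closed-enum and forbidden-phrase checks can be sketched as one validator; the node IDs and phrase list here are illustrative, since the real enum lives in `lib/nexus/assessment/skill-tree.ts`:

```typescript
// Illustrative closed enum; only registered IDs may appear in reports.
const SKILL_NODES = new Set(['SKILL.RESOURCE_MEMORY', 'SKILL.EDGE_CASES', 'SKILL.CONCURRENCY']);
const FORBIDDEN_PHRASES = ['great job', 'keep it up', 'well done'];

function validateReport(markdown: string): { unknownNodes: string[]; genericPraise: string[] } {
  // Every cited node must be in the registered set; generic praise is banned.
  const cited = [...markdown.matchAll(/SKILL\.[A-Z_]+/g)].map((m) => m[0]);
  const lower = markdown.toLowerCase();
  return {
    unknownNodes: cited.filter((id) => !SKILL_NODES.has(id)),
    genericPraise: FORBIDDEN_PHRASES.filter((p) => lower.includes(p)),
  };
}
```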

## Implementation order (Track B)

**MVP 7** (deterministic variate) before or parallel with **MVP 5** so grades target stable rubrics. **MVP 5** before **MVP 6** (need graded artifact to interrogate). **MVP 8** last (consumes telemetry + interrogation).

## Verification

Run `pnpm test` under `tests/nexus/` (split `homeroom/` vs `assessment/`). Do not claim work is complete without passing tests.

## Optional deep reference

For paths, evaluation metrics, and red-team checklist, see [reference.md](reference.md).
64 changes: 64 additions & 0 deletions .cursor/skills/nexus/reference.md
@@ -0,0 +1,64 @@
# Nexus — reference layout

**Canonical fork for Nexus work:** [github.com/MycallAI/OpenMAIC-1](https://github.com/MycallAI/OpenMAIC-1) (`origin`). Upstream: [THU-MAIC/OpenMAIC](https://github.com/THU-MAIC/OpenMAIC) (`upstream`).

Suggested paths (adjust if repo conventions differ):

## Track A — Homeroom

| Area | Path |
|------|------|
| Types | `lib/nexus/homeroom/types.ts` |
| Merge + adapters | `lib/nexus/homeroom/context-aggregator.ts` |
| Fixtures | `lib/nexus/homeroom/fixtures/lms/`, `calendar/`, `escalation-corpus.json` |
| Pulse prompt | `lib/nexus/homeroom/pulse-prompt.ts` |
| Orchestrator | `lib/nexus/homeroom/orchestrator.ts` |
| Escalation | `lib/nexus/homeroom/escalation-router.ts`, `escalation-actions.ts` |
| API | `app/api/nexus/pulse/route.ts`, `app/api/nexus/briefing/route.ts` |
| UI | `app/nexus/pulse/page.tsx` |
| Tests | `tests/nexus/homeroom/*.test.ts` |

## Track B — Assessment

| Area | Path |
|------|------|
| Grading types + schema | `lib/nexus/assessment/grade-types.ts`, `grade-schema.ts` |
| Grader service | `lib/nexus/assessment/mini-boss-grader.ts` |
| Labeled corpus | `lib/nexus/assessment/fixtures/grading-corpus/*.json` (100 items, human `expectedPass`) |
| Eval script | `scripts/nexus-assessment-eval.mts` or `pnpm nexus:assessment:eval` |
| Counter-Attack | `lib/nexus/assessment/counter-attack/trigger.ts`, `agent-prompt.ts` |
| Variate | `lib/nexus/assessment/variate/template.ts`, `variate.ts` |
| Skill tree | `lib/nexus/assessment/skill-tree.ts` (enum + module copy) |
| Loot Drop | `lib/nexus/assessment/loot-drop/synthesize.ts` |
| API | `app/api/nexus/assessment/grade/route.ts`, `interrogate/route.ts`, `variate/route.ts`, `report/route.ts` |
| UI (MVP 6 only) | `app/nexus/assessment/interrogate/page.tsx` (minimal) |
| Tests | `tests/nexus/assessment/*.test.ts` |

**Provider reuse**: resolve LLM via existing server utilities (`lib/server/resolve-model.ts`, `lib/ai/providers.ts`); never expose API keys to the client.

**Webhook URLs**: `process.env` placeholders only for homeroom MVP (e.g. `NEXUS_FORM_TUTOR_WEBHOOK_URL`).

---

## MVP 5 — grading correlation

- **Primary metric**: accuracy vs frozen human labels on the 100-item corpus; report **Cohen’s kappa** if class balance is skewed.
- **CI**: pin **expected metrics** on **mocked** grader responses; optional nightly/live job for real LLM drift.
- **Hallucination guardrails**: require `evidenceSpans` from submission text; fail closed (low `confidenceScore` or `pass: false`) when evidence missing.
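
The fail-closed guardrail above can be sketched as a post-processing step; the shapes are assumptions, not the repo's actual grade schema:

```typescript
interface RawGrade {
  pass: boolean;
  confidenceScore: number;
  evidenceSpans: string[]; // quotes that must appear verbatim in the submission
}

// Keep only evidence spans that actually occur in the submission text.
function enforceEvidence(grade: RawGrade, submission: string): RawGrade {
  const grounded = grade.evidenceSpans.filter((s) => submission.includes(s));
  if (grounded.length === 0) {
    // Fail closed: no verifiable evidence means no pass and capped confidence.
    return { pass: false, confidenceScore: Math.min(grade.confidenceScore, 0.2), evidenceSpans: [] };
  }
  return { ...grade, evidenceSpans: grounded };
}
```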

## MVP 6 — red-team checklist (human)

Automation proves wiring and mocked judge behavior, not real-world bluff resistance. Human sessions should include:

- Paste of external “perfect” solution + vague explanation → expect **fail / further scrutiny**.
- Expert explains trade-offs tied to submitted lines → expect **pass**.
- False positive check: fast but legitimate solve → ensure trigger policy tuned (not only `time < 10s`).

## MVP 7 — expert time equivalence

Distinctness is testable in code; **time-on-task equivalence** requires timed solves by N≥3 experts × 10 variants—track outside the repo or in a linked spreadsheet.

## MVP 8 — report quality

- Skill-tree nodes are a **closed enum**; report generator must only emit registered IDs + human-readable module titles.
- Tests: forbidden generic phrases; minimum count of **concrete citations** (telemetry event types or quoted snippets).
18 changes: 18 additions & 0 deletions lib/ai/providers.ts
@@ -951,6 +951,24 @@ export const PROVIDERS: Record<ProviderId, ProviderConfig> = {
},
],
},

local: {
id: 'local',
name: 'Local (LM Studio)',
type: 'openai',
defaultBaseUrl: 'http://192.168.10.111:1234/v1',
requiresApiKey: false,
icon: '/logos/openai.svg',
models: [
{
id: 'google/gemma-4-e4b',
name: 'Gemma 4 E4B (Local)',
contextWindow: 8192,
outputWindow: 4096,
capabilities: { streaming: true, tools: false, vision: false },
},
],
},
};

/**
1 change: 1 addition & 0 deletions lib/server/provider-config.ts
@@ -49,6 +49,7 @@ const LLM_ENV_MAP: Record<string, string> = {
SILICONFLOW: 'siliconflow',
DOUBAO: 'doubao',
GROK: 'grok',
LOCAL: 'local',
};

const TTS_ENV_MAP: Record<string, string> = {
3 changes: 2 additions & 1 deletion lib/types/provider.ts
@@ -16,7 +16,8 @@ export type BuiltInProviderId =
| 'glm'
| 'siliconflow'
| 'doubao'
| 'grok';
| 'grok'
| 'local';

/**
* Provider ID (built-in or custom)