AI-assisted student report reviewer. Reviewer uploads templates + good/bad samples, drops in student reports, the agent flags critical and major issues with hover popovers on the formatted document, reviewer edits/approves comments, then emails feedback (with screenshots) to the student.

| Layer | Choice |
|---|---|
| Frontend + API | Next.js 16 (App Router) + React 19 + Tailwind 4 + TypeScript |
| LLM (pluggable) | Anthropic Claude (SDK or `claude` CLI) · OpenAI Codex/GPT-5 (SDK) · Google Gemini (SDK); switchable from the UI |
| Default models | `claude-sonnet-4-6` · `gpt-5-codex` · `gemini-2.5-flash` |
| Multi-modal | PDF + image attachments routed to Anthropic SDK + Gemini; the agent sees scanned declaration sheets, logos, signatures |
| DB | Postgres (Prisma 6) |
| Blob | Local filesystem (`./storage/`), abstracted in `lib/storage.ts` |
| Email | Gmail SMTP via nodemailer (inline CID PNG screenshots) |
| PDF ingest (server) | pdf2json (pure JS, no worker spawn) |
| PDF render (client) | react-pdf + pdfjs-dist text layer |
| DOCX → HTML | mammoth (legacy reports only) |
| .doc | `libreoffice --headless` → DOCX → mammoth (legacy reports only) |
| Screenshots | Playwright headless Chromium |
| Streaming | Server-Sent Events (`/review/stream`) + DB-backed `reviewProgress` |

    nvm use                                  # picks Node 20.20.2 from .nvmrc
    pnpm install
    pnpm exec playwright install chromium    # download browser for screenshots
    cp .env.example .env                     # fill in keys (see below)
    pnpm exec prisma migrate deploy
    pnpm dev

Minimum required env: `DATABASE_URL` + one of (`ANTHROPIC_API_KEY` / `OPENAI_API_KEY` / `GEMINI_API_KEY`) OR the `claude` CLI installed and logged in.

System deps:
- Postgres running locally (or change `DATABASE_URL`)
- `libreoffice` on PATH for `.doc` ingest (`brew install --cask libreoffice` on macOS). PDF + DOCX work without it.

| Var | Notes |
|---|---|
| `DATABASE_URL` | Postgres connection string |
| `LLM_PROVIDER` | `anthropic` (default) / `codex` / `gemini`. Seed value only: the runtime UI switch persists in `storage/llm-settings.json` and takes precedence. |
| `ANTHROPIC_API_KEY` | If set → SDK. If empty/unset → `claude` CLI fallback (run `claude /login` once first). |
| `ANTHROPIC_MODEL` | Override default `claude-sonnet-4-6`. |
| `OPENAI_API_KEY` | Required for `codex` provider. |
| `OPENAI_MODEL` | Override default `gpt-5-codex`. |
| `GEMINI_API_KEY` | Required for `gemini` provider. |
| `GEMINI_MODEL` | Override default `gemini-2.5-flash`. |
| `GMAIL_USER` | Gmail address used to send feedback emails. |
| `GMAIL_APP_PASSWORD` | App password (generate at https://myaccount.google.com/apppasswords). |
| `STORAGE_DIR` | Defaults to `./storage`. |
| `APP_URL` | Defaults to `http://localhost:3000`. |

- `/knowledge`: upload at least one template, optionally excellent and bad samples.
- `/`: drop a student report (`.pdf` only, which preserves formatting), pick Review mode and Marking mode, upload.
- `/report/[id]` opens. Click Review → the agent flags critical/major issues and computes marking.
- Hover any highlight in the document for issue details. Right panel: edit comment text/severity/category, delete, or select text and +Add a manual comment.
- Approve → enables Send feedback to (only visible if the report header had a student email).
- Email goes via Gmail SMTP with one PNG screenshot per issue inline; marking is never included in the email.

Bottom-left of the sidebar carries the provider badge; click it to switch provider or change the per-provider model. Unavailable providers (missing key) are greyed out with a reason in the tooltip.

| Severity | Color | Notes |
|---|---|---|
| CRITICAL | red | template/section/grammar that blocks meaning |
| MAJOR | orange | significant but non-blocking |
| MINOR | - | never highlighted (spec) |

        +--------------------------------------------------------+
        | Next.js 16 App Router (single process, nodejs runtime) |
        +--------------------------------------------------------+
                                     |
    Browser <-- React 19 / Tailwind -+
    (PDF.js,                         |
     SSE client,                     v
     provider badge)         +----------------+
                             | Route handlers |
                             | /api/reports   |
                             | /api/knowledge |
                             | /api/llm       | <-- provider/model switch
                             | /review/stream | --- SSE --> Browser
                             +-------+--------+
                                     |
           +-------------------------+--------------------------+
           |                         |                          |
           v                         v                          v
    +--------------+        +------------------+        +---------------+
    | Postgres     |        | lib/agent        |        | lib/checks    |
    | (Prisma)     |<-------| reviewer.ts      |        | deterministic |
    | Report       |        | marking.ts       |        | preflight     |
    | Issue        |        | markingScheme    |        | (wordCount,   |
    | KbTemplate   |        | skills / prompts |        | requiredSec,  |
    | KbSample     |        | sectionGroups    |<-------| depth)        |
    | LearnedRej.  |        | learning         |        +---------------+
    +--------------+        +--------+---------+
                                     |
                                     v
             +-----------------------------------------------+
             | lib/agent/llm: provider abstraction           |
             | +---------------+ +----------+ +------------+ |
             | | anthropic.ts  | | codex.ts | | gemini.ts  | |
             | | SDK + CLI     | | OpenAI   | | @google/   | |
             | | (PDF / image) | | SDK      | | genai      | |
             | +---------------+ +----------+ | (PDF / img | |
             |                                | inline)    | |
             |                                +------------+ |
             | settings.ts (runtime file + env)              |
             | jsonSalvage.ts (MAX_TOKENS recovery)          |
             +-----------------------------------------------+

    Side effects:
      storage/     (local FS via lib/storage.ts: uploaded PDFs, screenshots,
                    llm-settings.json runtime provider/model selection)
      Playwright   (headless Chromium -> PNG of highlight context for email)
      nodemailer   (Gmail SMTP -> student email with inline CID images)

| Layer | What lives here | Files |
|---|---|---|
| UI | Server + client React components, PDF viewer with custom text-layer highlight, SSE subscription, optimistic state, provider/model switcher in sidebar badge | `components/`, `app/` |
| HTTP | Next.js Route Handlers (REST + SSE). Thin glue: validate → call lib → respond. Long jobs use `after()` to detach. | `app/api/**/route.ts` |
| Agent | Prompt assembly, tool-use schemas, provider abstraction, progress ticker, dismissal de-dupe, fallback marking, cross-report learning, shared section-equivalence | `lib/agent/` |
| LLM | Provider-agnostic interface (`generateJson({systemSegments, userSegments, attachments, schema})`). One adapter per provider; runtime selection from env + `storage/llm-settings.json`. JSON salvage on `MAX_TOKENS`. | `lib/agent/llm/` |
| Checks | Cheap deterministic rules run before the LLM: word count, required sections (synonym-aware), section-depth heuristic. Pure `(CheckContext) => RuleIssue[]`. | `lib/checks/` |
| Ingest | File → `{ html, plainText }`. PDF via pdf2json (no worker), DOCX via mammoth, .doc via libreoffice shell-out. Header regex pulls student name + email. | `lib/ingest/` |
| Output | PDF export, Playwright PNG screenshot per issue for the email body. | `lib/pdf/` |
| Email | Gmail SMTP via nodemailer; inline `cid:` PNGs; CRITICAL-only filter; marking deliberately stripped. | `lib/email/` |
| Storage | Filesystem-backed `save(bucket, name, buf)` returning a relative path; designed to swap to S3/GCS by replacing one module. | `lib/storage.ts` |
| DB | Postgres + Prisma. JSON columns for `marking`, `reviewProgress`, `markingScheme` hold the schemaless tail of evolving agent output. | `prisma/schema.prisma` |
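
The Checks row describes rules as pure functions of the form `(CheckContext) => RuleIssue[]`. A minimal sketch of a word-count rule in that shape follows. The fields on `CheckContext` and `RuleIssue`, the `word-count` category, and the `MAJOR` severity are assumptions for illustration, not the project's actual definitions.

```typescript
// Hypothetical shapes: the real CheckContext / RuleIssue in lib/checks/ may differ.
interface CheckContext {
  plainText: string;
  wordCountMin: number;
  wordCountMax: number;
}

interface RuleIssue {
  severity: "CRITICAL" | "MAJOR";
  category: string;
  shortDescription: string;
}

// Counts whitespace-separated tokens; an empty document counts as zero words.
function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Pure rule: no I/O, so the same context always yields the same issues.
function wordCountRule(ctx: CheckContext): RuleIssue[] {
  const words = countWords(ctx.plainText);
  if (words >= ctx.wordCountMin && words <= ctx.wordCountMax) return [];
  const bound =
    words < ctx.wordCountMin
      ? `minimum ${ctx.wordCountMin}`
      : `maximum ${ctx.wordCountMax}`;
  return [
    {
      severity: "MAJOR", // assumed severity for illustration
      category: "word-count",
      shortDescription: `Report has ${words} words (${bound}).`,
    },
  ];
}
```

Because the rule takes everything it needs from its argument, it can run before any LLM call and be tested without a database or network.
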

    KbTemplate 1---* Report 1---* Issue
                     |
                     +-- templateId      soft FK to the pinned scheme
                     +-- marking         JSON (overall, perSection, baselines, passive penalties)
                     +-- reviewProgress  JSON (stage, steps[], counts, error)
                     +-- enabledSkills   String[] (subset of SKILLS by id)
                     +-- wordCountMin    Int @default(8000)
                     +-- wordCountMax    Int @default(12000)

    KbSample (EXCELLENT|BAD)   calibration corpus, shared across reports
    LearnedRejection           cross-report dismissal hits (normalizedDesc -> hitCount)

Key fields worth noting:
- `Issue.deleted` is a soft delete. Dismissed issues are kept so the next review pass can suppress near-duplicates and feed the cross-report `LearnedRejection` counter.
- `Report.marking.baselineOverall` + `baselinePenalty` snapshot what the agent scored on the last full run so per-issue edits can move the score relative to that baseline (`recomputeMarking` in `lib/agent/marking.ts`) without re-paying for an LLM call.
- `Report.reviewProgress.steps[]` is the preflight checklist rendered in the right sidebar. The reviewer route writes it, and `mergePreflightSteps` re-attaches it to every subsequent stage update so refreshes don't wipe the list.
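
The baseline fields above imply a recompute that shifts the last LLM score by the change in total penalty. The sketch below shows one way that arithmetic could work; it is not the actual `recomputeMarking`. The issue shape, the weight table, and the clamp to 0..100 are assumptions.

```typescript
// Hypothetical shapes; only baselineOverall / baselinePenalty are named in the text above.
interface ScoredIssue {
  category: string;
  deleted: boolean;
}

interface MarkingBaseline {
  baselineOverall: number; // score from the last full LLM run
  baselinePenalty: number; // total penalty of the issues that existed at that run
}

// Sum the per-category deduction weights of issues that are still live.
function currentPenalty(issues: ScoredIssue[], weights: Record<string, number>): number {
  return issues
    .filter((issue) => !issue.deleted)
    .reduce((sum, issue) => sum + (weights[issue.category] ?? 0), 0);
}

// Shift the baseline score by the change in penalty, clamped to 0..100.
function recomputeOverall(
  baseline: MarkingBaseline,
  issues: ScoredIssue[],
  weights: Record<string, number>,
): number {
  const delta = currentPenalty(issues, weights) - baseline.baselinePenalty;
  return Math.min(100, Math.max(0, baseline.baselineOverall - delta));
}
```

With a baseline of 70 and a penalty of 8, dismissing an issue worth 3 points would move the score to 73 under these assumptions, with no model call involved.
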
Single interface (lib/agent/llm/types.ts):

    interface LlmJsonRequest {
      systemSegments: LlmSegment[];      // cache hints honored by Anthropic
      userSegments: LlmSegment[];
      attachments?: LlmAttachment[];     // PDFs / images for multimodal providers
      schemaName?: string;
      schema?: Record<string, unknown>;
      maxTokens?: number;
      timeoutMs?: number;
      signal?: AbortSignal;
    }

    interface LlmProvider {
      name: "anthropic" | "codex" | "gemini";
      generateJson(req: LlmJsonRequest): Promise<{ json: unknown; via: string }>;
    }

`reviewer.ts` calls `getProvider().generateJson(...)`. Provider selection: `storage/llm-settings.json` (UI-set) > `LLM_PROVIDER` env > default `anthropic`. Per-provider model override stored the same way.

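
The precedence rule (runtime file, then env, then default) can be sketched as a small pure function. The `RuntimeSettings` shape and the function name are assumptions; the actual `settings.ts` may read and validate the file differently.

```typescript
type ProviderName = "anthropic" | "codex" | "gemini";

// Assumed shape of storage/llm-settings.json; the real file may hold more fields.
interface RuntimeSettings {
  provider?: string;
}

function isProviderName(value: unknown): value is ProviderName {
  return value === "anthropic" || value === "codex" || value === "gemini";
}

// UI-set runtime file wins, then the LLM_PROVIDER seed, then the default.
function resolveProviderName(
  settings: RuntimeSettings | null,
  env: Record<string, string | undefined>,
): ProviderName {
  const fromFile = settings ? settings.provider : undefined;
  if (isProviderName(fromFile)) return fromFile;
  const fromEnv = env["LLM_PROVIDER"];
  if (isProviderName(fromEnv)) return fromEnv;
  return "anthropic";
}
```

Unknown values fall through to the next source rather than throwing, so a stale or hand-edited settings file cannot leave the app without a provider.
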

| Provider | Auth | Structured output | Multi-modal | Caching |
|---|---|---|---|---|
| Anthropic SDK | `ANTHROPIC_API_KEY` | `tool_use` w/ forced tool name | PDF (`document` blocks) + images | `cache_control: ephemeral` on segments tagged `cache: true` |
| Anthropic CLI | `claude /login` once | `--output-format json` (text JSON, schema validated downstream) | No (text only) | - |
| OpenAI Codex | `OPENAI_API_KEY` | `response_format: json_object`; uses `max_completion_tokens` + `reasoning_effort: "low"` for gpt-5/o-series | No (silently skips; reviewer prompt notes attachment unavailable) | - |
| Google Gemini | `GEMINI_API_KEY` | `responseMimeType: application/json` + `responseSchema` (OpenAPI subset) | PDFs + images via `inlineData` parts | - |

Gemini 2.5 Flash burns output budget on internal "thinking" tokens; OpenAI gpt-5/o-series do the same on reasoning. When `finishReason === "MAX_TOKENS"` (Gemini) or `finish_reason === "length"` (OpenAI), `lib/agent/llm/jsonSalvage.ts` walks the partial JSON, truncates at the last safe element boundary, and closes open brackets, returning whatever issues fully completed instead of failing the whole review. Gemini Flash thinking budget is set to 0; Gemini Pro to 256 (minimum it accepts).
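
The salvage step can be illustrated with a single pass that tracks string state and a stack of pending closers. This is a sketch of the general technique, not the contents of `jsonSalvage.ts`. It assumes array elements are objects or arrays (as issues are), so a "safe boundary" is the end of a container whose parent is an array.

```typescript
// Returns the parsed prefix of a truncated JSON document, or null when no
// complete array element exists yet. Function name is hypothetical.
function salvageTruncatedJson(partial: string): unknown {
  const stack: string[] = []; // closers still owed, innermost last
  let inString = false;
  let escaped = false;
  let safeEnd = -1; // index just past the last complete array element
  let safeClosers = ""; // closers to append if we cut at safeEnd

  for (let i = 0; i < partial.length; i++) {
    const ch = partial[i];
    if (inString) {
      // Brackets inside strings must not affect the stack.
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") stack.push("}");
    else if (ch === "[") stack.push("]");
    else if (ch === "}" || ch === "]") {
      stack.pop();
      if (stack.length === 0) {
        safeEnd = i + 1; // the whole document closed cleanly
        safeClosers = "";
      } else if (stack[stack.length - 1] === "]") {
        safeEnd = i + 1; // a full element of the enclosing array just ended
        safeClosers = stack.slice().reverse().join("");
      }
    }
  }

  if (safeEnd < 0) return null;
  try {
    return JSON.parse(partial.slice(0, safeEnd) + safeClosers);
  } catch {
    return null; // malformed beyond simple truncation
  }
}
```

Cutting just after a completed element avoids dangling commas and half-written strings, which is why the partial second issue in a truncated response is dropped rather than repaired.
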
When the active provider supports it (anthropic SDK or gemini), `reviewer.ts` loads the original report PDF and attaches it to the review call as an `LlmAttachment`. The agent sees the rendered pages, meaning:
- Cover pages with university logos, embedded title blocks, and "Submitted by …" lines count as present even when text extraction returns nothing for that page.
- Declaration sheets inserted as scanned/photographed pages between page 1 and page 4 count as present without needing a literal "Declaration" heading.
- Signed signature blocks, stamps, and rendered text-as-image are visually verified.

Cap: 18 MB. Above that, the attachment is skipped and the prompt notes the agent is text-only, biasing toward "present" to avoid false-positive structural flags. The marking call stays text-only (cost optimization).
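
The attach-or-skip decision described above could look like the following sketch. The transport labels, the wording of the prompt note, and the reading of "18 MB" as 18 MiB are assumptions made for illustration.

```typescript
const MAX_ATTACHMENT_BYTES = 18 * 1024 * 1024; // assumes "18 MB" means 18 MiB

// Transport labels are illustrative; only the Anthropic SDK and Gemini paths accept files.
type Transport = "anthropic-sdk" | "anthropic-cli" | "codex" | "gemini";

interface AttachmentPlan {
  attach: boolean;
  promptNote: string | null; // text added to the prompt when the agent is text-only
}

function planAttachment(transport: Transport, pdfBytes: number): AttachmentPlan {
  const multimodal = transport === "anthropic-sdk" || transport === "gemini";
  if (multimodal && pdfBytes <= MAX_ATTACHMENT_BYTES) {
    return { attach: true, promptNote: null };
  }
  return {
    attach: false,
    // Hypothetical wording: biases the agent toward "present" when it cannot see pages.
    promptNote:
      "No PDF attached; you only see extracted text. When unsure whether a cover page, " +
      "declaration or signature exists, treat it as present.",
  };
}
```
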
lib/agent/sectionGroups.ts defines equivalence groups consumed by three layers:
- `lib/checks/requiredSections.ts`: deterministic pre-flight rule check
- `components/ReportViewer.tsx`: marking-scheme modal tooltips (hover the cross/check icon to see WHY a section was matched or marked missing, with the exact synonym list searched)
- `lib/agent/skills.ts:flexible-section-matching`: mirrored in natural language for the LLM prompt

Examples of what counts as the same section:
- `References` ≡ `Bibliography` ≡ `Works Cited` ≡ `References and Bibliography` (Bibliography alone is optional)
- `Title and Declaration Sheet` ≡ `Declaration` ≡ `Statement of Originality` ≡ any block containing `I hereby declare`
- `Cover Page` ≡ a first-page block with `Submitted by` / `Bachelor of` / university name (text or image)
- `Methodology` ≡ `Methods` ≡ `Approach` ≡ `Research Method`
- … 20 groups total. Add a new equivalence by editing one file.

Numbering schemes (1.2.3 vs II.B.1 vs Chapter 4 …) are never compared between template and report; only the section's role is.
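
A minimal sketch of role-based matching: strip any leading numbering, then look the remaining heading up in the synonym groups. The two groups shown come from the examples above; the data layout, regex, and function names are assumptions rather than the contents of `sectionGroups.ts`.

```typescript
interface SectionGroup {
  id: string;
  synonyms: string[];
}

// Illustrative subset; the real file is said to define 20 groups.
const SECTION_GROUPS: SectionGroup[] = [
  { id: "references", synonyms: ["references", "bibliography", "works cited", "references and bibliography"] },
  { id: "methodology", synonyms: ["methodology", "methods", "approach", "research method"] },
];

// Matches leading numbering such as "Chapter 4", "1.2.3", "II.B.1" or "A.".
// Bare letter tokens need a dot or paren so ordinary words like "Mix" are left alone.
const NUMBERING_PREFIX =
  /^(?:chapter\s+\d+[:.]?\s+|\d+(?:\.\w+)*[.)]?\s+|(?:[ivxlcdm]+|[a-z])(?:\.\w+)+[.)]?\s+|(?:[ivxlcdm]+|[a-z])[.)]\s+)/;

function normalizeHeading(heading: string): string {
  return heading.trim().toLowerCase().replace(NUMBERING_PREFIX, "").replace(/\s+/g, " ");
}

// Returns the id of the equivalence group a heading belongs to, or null.
function findSectionGroup(heading: string): string | null {
  const key = normalizeHeading(heading);
  const hit = SECTION_GROUPS.find((group) => group.synonyms.some((s) => s === key));
  return hit ? hit.id : null;
}
```

Keeping the groups in one data structure is what lets the deterministic check, the tooltip, and the prompt text stay in agreement.
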
Skills are review directives injected into the system prompt. enabledSkills: string[] on the Report turns each on/off per report. Notable:
- `flexible-section-matching` (`alwaysOn`): rides along on every review regardless of the report's stored `enabledSkills`. Tells the agent to match by role, ignore numbering, and inspect attached PDF pages 1–4 for image-only declarations before flagging a section as missing.
- `structure-compliance`: defers to `flexible-section-matching` for naming/numbering; only flags genuinely missing sections.
- `format-style`, `grammar-prose`, `section-depth`, `passive-voice`, `prose-quality`, `originality`, `reference-quality`, `factual-plausibility`, `literature-review`, `technical-diagrams`.

Catalog lives in lib/agent/skills.ts. SkillDef.alwaysOn is the mechanism for structural rules that must never be disabled.

    reviewer deletes an issue
            |
            v
    PATCH/DELETE /issues/[iid] -> soft-delete (deleted=true) + upsert LearnedRejection
            |
            v
    next review:
      * Per-report:   dismissedBlob lists every soft-deleted issue's quotedText
                      + description -> injected into the user prompt
      * Cross-report: LearnedRejection.hitCount >= LEARNED_REJECTION_THRESHOLD
                      -> injected as a global "don't flag X" hint
            |
            v
    isDismissedDuplicate() also filters anything the agent re-surfaces
    post-tool-call (belt + braces)

Result: the same false positive never has to be deleted twice on the same report, and patterns dismissed repeatedly across many reports become global "do not flag" rules, without retraining or fine-tuning.
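
The two filters in the loop above can be sketched as follows. The matching rule (same quoted span after normalization) and all type shapes are assumptions; the project's `isDismissedDuplicate()` may use a looser near-duplicate test, and the value of `LEARNED_REJECTION_THRESHOLD` is not stated here, so it is passed in as a parameter.

```typescript
interface DismissedIssue {
  quotedText: string;
  description: string;
}

interface CandidateIssue {
  quotedText: string;
  shortDescription: string;
}

interface LearnedRejectionRow {
  normalizedDesc: string;
  hitCount: number;
}

// Lowercase, drop punctuation, collapse whitespace so trivial rewording still matches.
function normalizeForMatch(text: string): string {
  return text.toLowerCase().replace(/[^a-z0-9\s]/g, "").replace(/\s+/g, " ").trim();
}

// Assumed rule: a candidate repeats a dismissal when it quotes the same span.
function isDismissedDuplicateSketch(candidate: CandidateIssue, dismissed: DismissedIssue[]): boolean {
  const quote = normalizeForMatch(candidate.quotedText);
  return dismissed.some((d) => normalizeForMatch(d.quotedText) === quote);
}

// Cross-report hints: only patterns dismissed at least `threshold` times.
function globalDoNotFlagHints(rows: LearnedRejectionRow[], threshold: number): string[] {
  return rows.filter((row) => row.hitCount >= threshold).map((row) => row.normalizedDesc);
}
```

Filtering after the model responds, in addition to prompting it, covers the case where the model ignores the "don't flag" hint.
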
Generated once per template (and on demand when missing) by parsing top-level numbered headings (extractTopics in lib/agent/markingScheme.ts). Encodes:
- `totalPoints` budget (100)
- `topics[]`: heading, weight, required-flag
- `wordCountBand`: min/max (default 8000–12000)
- `penalties`: per-category deduction weights used by the deterministic recompute

When the LLM marking call fails, the same scheme drives the fallback score, so the system degrades gracefully rather than going blank. The marking schema's perSection is also tolerant: a model returning just {overall: 78} is coerced to {overall: 78, perSection: []} rather than triggering a re-attempt.
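
The tolerant handling of `perSection` can be sketched as a small coercion step. The `SectionScore` fields and the function name are hypothetical; only the `{overall, perSection}` shape and the coercion to `[]` come from the text above.

```typescript
// Hypothetical per-section shape; the real schema lives in lib/agent/schema.ts.
interface SectionScore {
  heading: string;
  score: number;
}

interface MarkingOutput {
  overall: number;
  perSection: SectionScore[];
}

// Accepts a raw model response and fills in a missing or null perSection.
// Returns null when `overall` is absent or outside 0..100, which a caller
// could treat as the signal to use the scheme-driven fallback score.
function coerceMarking(raw: unknown): MarkingOutput | null {
  if (typeof raw !== "object" || raw === null) return null;
  const rec = raw as { overall?: unknown; perSection?: unknown };
  if (typeof rec.overall !== "number" || !Number.isFinite(rec.overall)) return null;
  if (rec.overall < 0 || rec.overall > 100) return null;
  const perSection = Array.isArray(rec.perSection) ? (rec.perSection as SectionScore[]) : [];
  return { overall: rec.overall, perSection };
}
```
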
lib/agent/schema.ts:
- `ReviewOutputSchema.issues[]`: `quotedText` (max 500, auto-truncated with `…` on overrun), `severity`, `category`, `shortDescription` (max 200, auto-truncated).
- `MarkingOutputSchema`: `overall` 0–100 (required), `perSection[]` (optional; coerced to `[]` when missing/null).

The same JSON Schema is sent to every provider (reviewToolSchema.input_schema, markingToolSchema.input_schema) with maxLength/maxItems so providers that respect schema constraints (Gemini's responseSchema) enforce them server-side.
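
For providers that do not enforce `maxLength` server-side, the auto-truncation mentioned for `quotedText` and `shortDescription` has to happen after the response arrives. A minimal sketch, assuming the ellipsis counts toward the limit (the source does not say whether it does):

```typescript
// Assumption: the ellipsis counts toward the limit, so the result never exceeds `max`.
function truncateWithEllipsis(text: string, max: number): string {
  if (text.length <= max) return text;
  return text.slice(0, max - 1) + "…";
}

// Field names and limits follow the schema summary above; the helper itself is hypothetical.
interface RawReviewIssue {
  quotedText: string;
  shortDescription: string;
  severity: string;
  category: string;
}

function clampIssueFields(issue: RawReviewIssue): RawReviewIssue {
  return {
    ...issue,
    quotedText: truncateWithEllipsis(issue.quotedText, 500),
    shortDescription: truncateWithEllipsis(issue.shortDescription, 200),
  };
}
```
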
- One review at a time per report, enforced by checking `status === "REVIEWING"` at the route handler and by the per-report `AbortController`.
- Heartbeat ticker every 4s while the agent runs so SSE subscribers and the elapsed-seconds counter on the UI stay live during a multi-minute LLM call.
- `maxDuration = 600` on the review route; fits on Vercel hobby and large local jobs.
- `globalThis`-keyed Maps for SSE subscribers + abort controllers survive HMR in dev; in prod they live for the process lifetime.
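
The `globalThis`-keyed registry and the one-review-per-report guard can be sketched together. The property name `__reviewAborts` and the helper names are hypothetical; the point is that module-level state would be re-created on every hot reload, while a value hung off `globalThis` is not.

```typescript
type AbortRegistry = Map<string, AbortController>;

// Hypothetical global key; stored on globalThis so dev-server module reloads reuse it.
const globalStore = globalThis as typeof globalThis & { __reviewAborts?: AbortRegistry };

function getAbortRegistry(): AbortRegistry {
  if (!globalStore.__reviewAborts) {
    globalStore.__reviewAborts = new Map<string, AbortController>();
  }
  return globalStore.__reviewAborts;
}

// Returns a controller for a new review, or null when one is already running.
function beginReview(reportId: string): AbortController | null {
  const registry = getAbortRegistry();
  if (registry.has(reportId)) return null;
  const controller = new AbortController();
  registry.set(reportId, controller);
  return controller;
}

// Aborts (if still running) and forgets the review so the next one can start.
function endReview(reportId: string): void {
  const registry = getAbortRegistry();
  const controller = registry.get(reportId);
  if (controller) controller.abort();
  registry.delete(reportId);
}
```

In a route handler, a `null` from `beginReview` would map naturally to a conflict response, alongside the `status === "REVIEWING"` check on the database row.
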
- Email sending requires `report.status ∈ {APPROVED, SENT}` AND a syntactically valid student email saved on the report.
- `marking` is never included in the email; the `perSection` scores stay reviewer-internal.
- Screenshots are rendered from `plainText` (server-controlled context window around the quoted span), not the raw uploaded file, so no third-party JS executes during the screenshot.
- File uploads are restricted to `.pdf` (preserves formatting, lowest extraction-edge risk). DOCX/`.doc` ingest paths remain in `lib/ingest/toHtml.ts` for legacy reports already in the DB but are not user-facing at the upload endpoint.
- The provider switcher rejects switches to a provider whose key/CLI is missing (HTTP 409); the UI greys the option and shows the reason on hover.
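
The first two rules above (status gate plus valid address, and marking never leaving the server) can be expressed as one guard and one payload builder. The report shape, the email regex, and the function names are assumptions for illustration.

```typescript
// Hypothetical report shape for the sketch.
interface SendableReport {
  status: string;
  studentEmail: string | null;
  marking: unknown;
  issues: { severity: string; shortDescription: string }[];
}

const SENDABLE_STATUSES = ["APPROVED", "SENT"];
const EMAIL_SHAPE = /^[^\s@]+@[^\s@]+\.[^\s@]+$/; // syntactic check only, not deliverability

function canSendFeedback(report: SendableReport): boolean {
  return (
    SENDABLE_STATUSES.indexOf(report.status) !== -1 &&
    report.studentEmail !== null &&
    EMAIL_SHAPE.test(report.studentEmail)
  );
}

// Builds what the mailer may see. `marking` is deliberately not copied across.
function buildEmailPayload(
  report: SendableReport,
): { to: string; issues: SendableReport["issues"] } | null {
  if (!canSendFeedback(report) || report.studentEmail === null) return null;
  return { to: report.studentEmail, issues: report.issues };
}
```

Constructing a separate payload object, rather than passing the whole report to the mailer, makes it structurally impossible for marking data to reach the email template.
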
Regression suite covers every layer:

    npx tsx scripts/test-providers.ts        # 60+ assertions: schema, salvage, skills,
                                             # section equivalence, attachments routing,
                                             # settings + availability, marking shape
    npx tsx scripts/test-gemini-salvage.ts   # 7 cases for the JSON-truncation salvage

Both run without network calls: providers are tested via static source assertions + schema/parser checks.

- Multi-reviewer accounts / auth
- Cloud blob storage (filesystem abstracted for swap)
- Background job queue (review runs inline via `after()`; `maxDuration` set to 600s)
- Showing marking to the student
- OpenAI multi-modal (PDF): Chat Completions doesn't accept PDFs directly; would need migration to Responses API or Files API