LENA is a lightweight AI assistant for online courses. It helps students get quick, sourced answers about assignments, schedules, course expectations, and university policies without waiting for an email reply or digging through several documents.
The point is not to replace instructors. LENA handles common questions, shows the source material behind each answer, and gives students a path to ask for instructor follow-up when the answer is uncertain. For instructors and support staff, the same activity becomes a feedback loop: repeated questions and low-confidence answers can reveal where instructions, deadlines, or course materials need to be clarified.
Students use a simple chat interface that works on desktop and mobile. Instructors and course admins can review an analytics dashboard that tracks trends, top questions, escalations, and emerging pain points across multiple courses. The pilot runs locally with Docker and is built from three main parts: a FastAPI backend, a Next.js frontend, and a Qdrant vector store.
- Student view - Ask a course question, get a sourced answer.
  - Each response links back to the syllabus, policy document, uploaded resource, or calendar event it used.
  - When LENA is not confident, it gives the student a way to ask for instructor follow-up and collects consented contact info.
- Instructor view - Review the course dashboard.
  - KPI cards highlight question volume, helpfulness, and escalations.
  - Trend charts and emerging pain points show where course materials or follow-up announcements may help.
  - Course management tools let instructors/admins add or retire courses, upload documents, save link snapshots, and re-run ingestion so new materials are searchable without touching the server filesystem.
- Admin / support staff - Review aggregate metrics across pilots, tune ingestion settings, and plan integrations with campus systems as needed.
LENA uses AI to help find the right course material, not to train a new model on student data. When course materials are ingested, the backend splits them into smaller chunks and uses an embedding model to represent the meaning of each chunk as a vector, which is a list of numbers the system can compare. That embedding model is configured with LENA_EMBED_MODEL and defaults to sentence-transformers/all-MiniLM-L6-v2. The first run may need to download the embedding model. The vectors are stored in Qdrant with course and source metadata.
When a student asks a question, LENA represents the question the same way and searches for course chunks with similar meaning. This is AI-assisted retrieval: it helps match questions to relevant material even when the wording is not identical. For example, a question about when a paper is due can still match a syllabus section that says "paper deadline."
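The comparison step can be sketched with plain cosine similarity. This is a toy illustration, not LENA's retrieval code: the three-dimensional vectors stand in for real embedding output, which has hundreds of dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Compare two embedding vectors; values near 1.0 mean similar meaning."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" (real models emit hundreds of dimensions).
question_vec   = [0.9, 0.1, 0.3]  # "When is the paper due?"
deadline_chunk = [0.8, 0.2, 0.4]  # syllabus text about the paper deadline
grading_chunk  = [0.1, 0.9, 0.2]  # unrelated grading-policy text

print(cosine_similarity(question_vec, deadline_chunk) >
      cosine_similarity(question_vec, grading_chunk))  # → True
```

The question vector lands closest to the deadline chunk even though the two texts share no exact words, which is the behavior the retrieval step relies on.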
After retrieval, LENA answers in one of two modes:
- Generative mode - If `LENA_LLM_MODE=hf`, LENA uses a Hugging Face language model to draft an answer from the retrieved course context.
- Demo / deterministic mode - If `LENA_LLM_MODE=off`, LENA does not call a generative model. It returns an extractive answer from the retrieved course snippets.

Both modes still use the embedding and retrieval step. `LENA_LLM_MODE=off` only disables generative answer drafting.
You do not need to train or publish a Hugging Face model for each course. Course knowledge lives in the ingested materials and the Qdrant vector store. In generative mode, the Hugging Face model drafts and formats an answer from the retrieved course context. It is not the source of course knowledge.
To use Hugging Face generation, choose a published Hugging Face text-generation model, copy its model ID, and put that ID in LENA_HF_MODEL. You do not need to add a Hugging Face URL. For example, LENA_HF_MODEL=HuggingFaceH4/zephyr-7b-beta points LENA to that published model. The current backend loads the selected Hugging Face model locally through the transformers text-generation pipeline rather than sending course context to a hosted Hugging Face API. The first run may need to download the selected model. The model must work with the standard pipeline and be practical for your environment, because LENA currently loads it on CPU with remote model code disabled. In plain terms, choose a standard model that can run on the machine hosting LENA. LENA_HF_MAX_NEW_TOKENS limits the generated answer length. If the model or pipeline cannot load, LENA falls back to extractive mode.
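Putting those settings together, a generative-mode `.env` fragment might look like this (the token budget shown is an arbitrary example, not a project default):

```
LENA_LLM_MODE=hf
LENA_HF_MODEL=HuggingFaceH4/zephyr-7b-beta
LENA_HF_MAX_NEW_TOKENS=256
```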
In both modes, the answer is meant to stay grounded in course materials and include citations. The confidence score is a retrieval heuristic, not a guarantee that the answer is correct.
- Course selection modal (choose the active course).
- Chat interface (course-scoped Q and A with citations).
- Course FAQ page (curated questions and answers).
- Instructor landing page (demo login prompt).
- Course management page (add or retire courses and manage resources).
- Insights page (course trends, top questions, and escalations).
- Export modal (choose course scope, components, time range, and CSV or JSON).
This repo is a pilot/demo build. It ships with demo courses and sample content, and it does not include production-ready authentication or role-based access control.
- Student experience (Chat + Course FAQ) is intentionally open in the pilot.
- Instructor tools (Insights + Course management + Data export + Ingest) are behind a demo-only login prompt to demonstrate a basic authentication flow.
  - Username: `demo`
  - Password: `demo`
For any production environment, the app must be connected to institutional authentication for proper security and compliance. This applies to both chat access and role-based access to Insights and Course Admin for instructors, staff, and administrators.
The Docker Compose demo enables these credentials automatically. If you run the backend directly, either set LENA_ALLOW_DEFAULT_INSTRUCTOR_CREDS=true for a sandbox demo or set your own LENA_INSTRUCTOR_USERNAME and LENA_INSTRUCTOR_PASSWORD.
- Frontend - Next.js (Pages router) + TypeScript + Tailwind, ships as a standalone Node server.
- Backend - FastAPI service that handles ingestion, retrieval, and the `/ask` workflow.
- Vector store - Qdrant (running inside Docker by default).
- CI - GitHub Actions runs backend tests and a frontend build on every push / PR.
Directory map:
```
backend/   FastAPI app, embeddings, ingestion tasks
frontend/  Next.js pilot UI (chat, FAQ, insights)
docker/    Compose file booting qdrant + api + web
data/      Sample markdown + calendar sources for pilots
docs/      Architecture notes and support docs
storage/   Local persisted feedback, cached runs
```
The fastest way to get LENA running on your machine:
```bash
git clone https://github.com/watrall/lena.git
cd lena
./start.sh
```

The script checks that Docker is installed, verifies that the needed ports are free, builds the containers, seeds the demo content when it can, and opens your browser to the chat interface. If seeding is skipped, open the Instructors page, log in with the demo credentials, and click Re-run ingestion.
If you prefer more control over the startup process, or if you're on Windows, use Docker Compose directly:
```bash
git clone https://github.com/watrall/lena.git
cd lena
docker compose -f docker/docker-compose.yml up --build
```

Once the stack is up:
- Seed content (optional but handy): open http://localhost:3000/instructors, log in (`demo` / `demo`), and click Re-run ingestion.
- Open the chat: http://localhost:3000 and ask "When is Assignment 1 due?"
- Open instructor tools: http://localhost:3000/instructors (requires demo instructor login; graphs fill in after a few `/ask` + `/feedback` events).
- When prompted, pick one of the sample courses - the backend validates the `course_id` on `/ask`, `/feedback`, `/faq`, `/insights`, and `/escalations/request`.
Optional API-only ingest:
```bash
TOKEN="$(curl -sS -X POST http://localhost:8000/instructors/login \
  -H "Content-Type: application/json" \
  -d '{"username":"demo","password":"demo"}' | python -c "import json,sys; print(json.load(sys.stdin)['access_token'])")"
curl -sS -X POST http://localhost:8000/ingest/run -H "Authorization: Bearer $TOKEN"
```

If you change course data or want a clean slate, stop the stack and remove storage/ before restarting.
Create a .env file at the repo root using .env.example as a guide (used by ./start.sh and Docker Compose). If you run the backend directly from backend/, either export the LENA_* variables in your shell or create a backend/.env file as well.
| Variable | Description |
|---|---|
| `NEXT_PUBLIC_API_BASE` | Base URL the frontend calls (defaults to `http://localhost:8000`). Always include `course_id` in client requests. |
| `LENA_QDRANT_HOST` / `LENA_QDRANT_PORT` | Qdrant connection details if you run the vector store elsewhere. |
| `LENA_DATA_DIR` / `LENA_STORAGE_DIR` | Override data or storage paths for ingestion/output. |
| `LENA_EMBED_MODEL` | Embedding model used to index course materials and student questions by meaning (defaults to `sentence-transformers/all-MiniLM-L6-v2`). |
| `LENA_LLM_MODE` | `hf` (default) to use the configured Hugging Face text-generation model, or `off` for deterministic demos. |
| `LENA_HF_MODEL` | Hugging Face model ID used when `LENA_LLM_MODE=hf` (defaults to `HuggingFaceH4/zephyr-7b-beta`). |
| `LENA_HF_MAX_NEW_TOKENS` | Maximum number of new tokens generated per answer when Hugging Face generation is enabled. |
| `LENA_CORS_ORIGINS` | Comma-separated list of allowed CORS origins (defaults to `http://localhost:3000`). |
| `LENA_INSTRUCTOR_USERNAME` / `LENA_INSTRUCTOR_PASSWORD` | Demo instructor login values. Replace these before any real pilot. |
| `LENA_ALLOW_DEFAULT_INSTRUCTOR_CREDS` | Allows `demo` / `demo` only in sandbox/demo runs. Docker Compose sets this to `true`; direct backend runs default to `false`. |
| `LENA_ENABLE_INGEST_ENDPOINT` / `LENA_ENABLE_ADMIN_ENDPOINTS` / `LENA_ENABLE_EXPORT_ENDPOINT` | Feature flags for instructor-only operations. Keep disabled unless you intentionally need them. |
| `LENA_ENABLE_PII_EXPORT` | Allows exports with student contact fields only when explicitly enabled and an encryption key is configured. |
The backend reads any LENA_* variables via Pydantic settings, while the frontend only needs the NEXT_PUBLIC_* keys because Next.js exposes them to the browser build.
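The prefix convention can be illustrated with a small stdlib sketch. The real backend uses Pydantic settings; this only shows which variables it would pick up:

```python
# Stdlib sketch of the LENA_* prefix convention (not the backend's actual
# Pydantic-based loader).
def lena_settings(environ: dict) -> dict:
    """Collect LENA_-prefixed variables into a lowercase settings dict."""
    prefix = "LENA_"
    return {k[len(prefix):].lower(): v
            for k, v in environ.items() if k.startswith(prefix)}

demo_env = {"LENA_LLM_MODE": "off", "LENA_QDRANT_PORT": "6333", "PATH": "/usr/bin"}
print(lena_settings(demo_env))  # → {'llm_mode': 'off', 'qdrant_port': '6333'}
```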
The course picker reads from storage/courses.json. If the file doesn't exist, the backend seeds two sample anthropology courses so the UI always has something to display. To customize the pilot, drop in your own catalog:
```json
[
  { "id": "anth101", "name": "ANTH 101 · Cultural Anthropology", "code": "ANTH 101", "term": "Fall 2024" },
  { "id": "anth204", "name": "ANTH 204 · Archaeology of Everyday Life", "code": "ANTH 204", "term": "Fall 2024" }
]
```

Escalation requests initiated from the chat are stored in `storage/escalations.jsonl` so instructor follow-ups can be audited or replayed. FAQ entries and review queue items now record the originating `course_id`, keeping per-course dashboards consistent with the student experience.
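Because escalations are stored as JSON Lines, replaying them takes one `json.loads` per line. The field names below mirror the `/escalations/request` payload but are otherwise illustrative:

```python
import json

# Sample JSON Lines content standing in for storage/escalations.jsonl.
sample = "\n".join([
    json.dumps({"course_id": "anth101", "student_name": "Ada", "student_email": "ada@example.edu"}),
    json.dumps({"course_id": "anth204", "student_name": "Lin", "student_email": "lin@example.edu"}),
])

# Group escalation contacts by course for a per-course follow-up list.
by_course: dict[str, list[str]] = {}
for line in sample.splitlines():
    if not line.strip():
        continue  # skip blank lines defensively
    rec = json.loads(line)
    by_course.setdefault(rec["course_id"], []).append(rec["student_email"])

print(by_course)  # → {'anth101': ['ada@example.edu'], 'anth204': ['lin@example.edu']}
```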
API note: Dashboard/admin endpoints (e.g. `/insights`, `/admin/*`, `/ingest/run`, `/instructors/*`) require demo instructor login in this pilot build. Course-scoped endpoints also require an explicit `course_id`.
Ingestion tip: organize course content under `data/<course_id>/...` so each vector chunk carries the proper `course_id`. Files placed directly under `data/` inherit the first course from `storage/courses.json`, making it easy to pilot with a single catalog while still supporting multi-course retrieval later.
- `POST /ask` - body must include `question` and `course_id`. Responses contain a `question_id` you'll reuse.
- `POST /feedback` - requires `question_id`, `course_id`, and the user's helpfulness choice (plus optional transcript context).
- `GET /faq` - requires a `course_id` query param; the backend rejects empty IDs.
- `GET /insights` - requires instructor login and a `course_id` query param.
- `POST /escalations/request` - include `course_id`, `student_name`, and `student_email` so instructors can follow up.
- `GET /admin/review` / `POST /admin/promote` / `GET /admin/export` / `POST /ingest/run` - require instructor login and are locked down via feature flags for the demo.
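A minimal sketch of building request bodies for the two most common calls, assuming the field names above (the `helpful` flag is an assumed name for the helpfulness choice, not a confirmed schema field):

```python
import json

def build_ask_payload(question: str, course_id: str) -> dict:
    """Body for POST /ask; the backend rejects an empty course_id."""
    if not course_id:
        raise ValueError("course_id is required on /ask")
    return {"question": question, "course_id": course_id}

def build_feedback_payload(question_id: str, course_id: str, helpful: bool) -> dict:
    """Body for POST /feedback; reuse the question_id from the /ask response."""
    return {"question_id": question_id, "course_id": course_id, "helpful": helpful}

ask = build_ask_payload("When is Assignment 1 due?", "anth101")
print(json.dumps(ask))
```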
Frontend:
```bash
cd frontend
npm ci
npm run dev
```

Backend:
```bash
cd backend
python3 -m pip install -r requirements.txt  # Requires Python 3.10+
export LENA_LLM_MODE=off
export LENA_ALLOW_DEFAULT_INSTRUCTOR_CREDS=true
export LENA_ENABLE_INGEST_ENDPOINT=true
export LENA_ENABLE_ADMIN_ENDPOINTS=true
export LENA_ENABLE_EXPORT_ENDPOINT=true
uvicorn app.main:app --reload --port 8000
```

Ensure Qdrant is reachable (either `docker run qdrant/qdrant` or the Docker Compose stack) before hitting `/ask`.
Run backend tests (which include a deterministic ingest pass) and the frontend checks before opening a PR:
```bash
python3 -m pip install -r backend/requirements.txt
python3 -m pytest
cd frontend
npm ci
npm run lint
npx tsc --noEmit --incremental false
```

Set `LENA_LLM_MODE=off` locally for quick deterministic answers and to avoid downloading large Hugging Face models during test runs.
The LENA pilot publishes both backend and frontend containers for reproducibility and deployment.
To pull the latest images directly from Docker Hub:
```bash
# Backend (FastAPI)
docker pull docker.io/watrall/lena-backend:latest
docker run -d -p 8000:8000 docker.io/watrall/lena-backend:latest

# Frontend (Next.js)
docker pull docker.io/watrall/lena-web:latest
docker run -d -p 3000:3000 docker.io/watrall/lena-web:latest
```
Set NEXT_PUBLIC_API_BASE=http://localhost:8000 (or your backend host) before starting the frontend container so the chat can reach the API.
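For example, the variable can be passed at container start (a sketch; this assumes the frontend image reads `NEXT_PUBLIC_API_BASE` at startup rather than only at build time):

```bash
docker run -d -p 3000:3000 \
  -e NEXT_PUBLIC_API_BASE=http://localhost:8000 \
  docker.io/watrall/lena-web:latest
```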
- When deploying the frontend separately (Netlify, Vercel, etc.), set `LENA_CORS_ORIGINS` in the backend environment to include the web origin (e.g., `https://lena-pilot.example.edu`). The Compose stack already runs both services on the same network, so no extra config is required locally.
- Mattermost, Slack, LMS, or email integrations should live behind opt-in environment flags so student data only routes to approved channels. The README keeps the defaults closed off; check `docs/SECURITY-NOTES.md` before rolling into a large cohort.
- Architecture overview: `docs/OVERVIEW.md`
- Demo script for pilots: `docs/DEMO-SCRIPT.md`
- Security and guardrails: `docs/SECURITY-NOTES.md`
- Changelog: `CHANGELOG.md`