LENA - Learning Engagement & Navigation Assistant

LENA is a lightweight AI assistant for online courses. It helps students get quick, sourced answers about assignments, schedules, course expectations, and university policies without waiting for an email reply or digging through several documents.

The point is not to replace instructors. LENA handles common questions, shows the source material behind each answer, and gives students a path to ask for instructor follow-up when the answer is uncertain. For instructors and support staff, the same activity becomes a feedback loop: repeated questions and low-confidence answers can reveal where instructions, deadlines, or course materials need to be clarified.

Students use a simple chat interface that works on desktop and mobile. Instructors and course admins can review an analytics dashboard that tracks trends, top questions, escalations, and emerging pain points across multiple courses. The pilot runs locally with Docker and is built from three main parts: a FastAPI backend, a Next.js frontend, and a Qdrant vector store.

LENA Features

  • Student view - Ask a course question, get a sourced answer.
    • Each response links back to the syllabus, policy document, uploaded resource, or calendar event it used.
    • When LENA is not confident, it gives the student a way to ask for instructor follow-up and collects consented contact info.
  • Instructor view - Review the course dashboard.
    • KPI cards highlight question volume, helpfulness, and escalations.
    • Trend charts and emerging pain points show where course materials or follow-up announcements may help.
    • Course management tools let instructors/admins add or retire courses, upload documents, save link snapshots, and re-run ingestion so new materials are searchable without touching the server filesystem.
  • Admin / support staff - Review aggregate metrics across pilots, tune ingestion settings, and plan integrations with campus systems as needed.

How LENA Uses AI

LENA uses AI to help find the right course material, not to train a new model on student data. When course materials are ingested, the backend splits them into smaller chunks and uses an embedding model to represent the meaning of each chunk as a vector, which is a list of numbers the system can compare. That embedding model is configured with LENA_EMBED_MODEL and defaults to sentence-transformers/all-MiniLM-L6-v2. The first run may need to download the embedding model. The vectors are stored in Qdrant with course and source metadata.

When a student asks a question, LENA represents the question the same way and searches for course chunks with similar meaning. This is AI-assisted retrieval: it helps match questions to relevant material even when the wording is not identical. For example, a question about when a paper is due can still match a syllabus section that says "paper deadline."
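The vector comparison can be illustrated with a toy example. This is a hedged sketch, not LENA's code: the real embedding model (sentence-transformers/all-MiniLM-L6-v2) produces vectors with hundreds of dimensions, while here three made-up 3-number vectors stand in.

```python
import math

# Illustration only: tiny made-up vectors standing in for real embeddings.
# A real embedding model maps each text chunk to hundreds of numbers.
def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

question = [0.9, 0.1, 0.0]        # "When is the paper due?"
syllabus_chunk = [0.8, 0.2, 0.1]  # "Paper deadline: Week 6"
policy_chunk = [0.0, 0.1, 0.9]    # "Academic integrity policy"

# The syllabus chunk scores higher even though the wording differs.
assert cosine_similarity(question, syllabus_chunk) > cosine_similarity(question, policy_chunk)
```

This is why "when is the paper due" can match "paper deadline" without any shared keywords: the two phrases land near each other in the vector space.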

After retrieval, LENA answers in one of two modes:

  • Generative mode - If LENA_LLM_MODE=hf, LENA uses a Hugging Face language model to draft an answer from the retrieved course context.
  • Demo / deterministic mode - If LENA_LLM_MODE=off, LENA does not call a generative model. It returns an extractive answer from the retrieved course snippets.

Both modes still use the embedding and retrieval step. LENA_LLM_MODE=off only disables generative answer drafting.
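The mode switch can be pictured as a simple branch. A minimal sketch of the idea, assuming nothing about LENA's internals; the function name is hypothetical:

```python
import os

# Hypothetical sketch of mode selection; not LENA's actual code.
# LENA_LLM_MODE=off disables only generative drafting; embedding and
# retrieval run in both modes.
def answer_mode(environ=None):
    env = os.environ if environ is None else environ
    return "generative" if env.get("LENA_LLM_MODE", "hf") == "hf" else "extractive"
```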

You do not need to train or publish a Hugging Face model for each course. Course knowledge lives in the ingested materials and the Qdrant vector store. In generative mode, the Hugging Face model drafts and formats an answer from the retrieved course context. It is not the source of course knowledge.

To use Hugging Face generation, choose a published Hugging Face text-generation model, copy its model ID, and set LENA_HF_MODEL to that ID; no Hugging Face URL is needed. For example, LENA_HF_MODEL=HuggingFaceH4/zephyr-7b-beta points LENA at that published model. The current backend loads the selected model locally through the transformers text-generation pipeline rather than sending course context to a hosted Hugging Face API, so the first run may need to download the model. Because LENA currently runs the model on CPU with remote model code disabled, choose a standard pipeline-compatible model that can realistically run on the machine hosting LENA. LENA_HF_MAX_NEW_TOKENS limits the generated answer length. If the model or pipeline cannot load, LENA falls back to extractive mode.
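The load-or-fall-back behavior can be sketched like this. It is an illustrative pattern, not LENA's API: `load_pipeline` stands in for whatever loads the transformers text-generation pipeline, and both function names are made up.

```python
# Hypothetical sketch of the generative-to-extractive fallback described above.
def make_answerer(load_pipeline):
    """load_pipeline stands in for loading the transformers text-generation
    pipeline; it may raise if the model cannot load on this machine."""
    try:
        generator = load_pipeline()
    except Exception:
        generator = None  # fall back to extractive mode

    def answer(question, retrieved_snippets):
        if generator is None:
            # Extractive: return the best retrieved snippet verbatim.
            return retrieved_snippets[0] if retrieved_snippets else ""
        # Generative: draft an answer from the retrieved course context.
        return generator(question, retrieved_snippets)

    return answer
```

Either way, the answer is built from the retrieved snippets, which is why the model is not the source of course knowledge.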

In both modes, the answer is meant to stay grounded in course materials and include citations. The confidence score is a retrieval heuristic, not a guarantee that the answer is correct.

Screenshots

Course selection modal (choose the active course).

Chat interface (course-scoped Q&A with citations).

Course FAQ page (curated questions and answers).

Instructor landing page (demo login prompt).

Course management page (add or retire courses and manage resources).

Insights page (course trends, top questions, and escalations).

Export modal (choose course scope, components, time range, and CSV or JSON).

Demo Authentication

This repo is a pilot/demo build. It ships with demo courses and sample content, and it does not include production-ready authentication or role-based access control.

  • Student experience (Chat + Course FAQ) is intentionally open in the pilot.

  • Instructor tools (Insights + Course management + Data export + Ingest) are behind a demo-only login prompt to demonstrate a basic authentication flow.

  • Username: demo

  • Password: demo

For any production environment, the app must be connected to institutional authentication for proper security and compliance. This applies to both chat access and role-based access to Insights and Course Admin for instructors, staff, and administrators.

The Docker Compose demo enables these credentials automatically. If you run the backend directly, either set LENA_ALLOW_DEFAULT_INSTRUCTOR_CREDS=true for a sandbox demo or set your own LENA_INSTRUCTOR_USERNAME and LENA_INSTRUCTOR_PASSWORD.


Stack at a Glance

  • Frontend - Next.js (Pages router) + TypeScript + Tailwind, ships as a standalone Node server.
  • Backend - FastAPI service that handles ingestion, retrieval, and the /ask workflow.
  • Vector store - Qdrant (running inside Docker by default).
  • CI - GitHub Actions runs backend tests and a frontend build on every push / PR.

Directory map:

backend/     FastAPI app, embeddings, ingestion tasks
frontend/    Next.js pilot UI (chat, FAQ, insights)
docker/      Compose file booting qdrant + api + web
data/        Sample markdown + calendar sources for pilots
docs/        Architecture notes and support docs
storage/     Local persisted feedback, cached runs

One-Click Start

The fastest way to get LENA running on your machine:

git clone https://github.com/watrall/lena.git
cd lena
./start.sh

The script checks that Docker is installed, verifies that the needed ports are free, builds the containers, seeds the demo content when it can, and opens your browser to the chat interface. If seeding is skipped, open the Instructors page, log in with the demo credentials, and click Re-run ingestion.


Quickstart (Docker Compose)

If you prefer more control over the startup process, or if you're on Windows, use Docker Compose directly:

git clone https://github.com/watrall/lena.git
cd lena
docker compose -f docker/docker-compose.yml up --build

Once the stack is up:

  1. Seed content (optional but handy): open http://localhost:3000/instructors, log in (demo / demo), and click Re-run ingestion
  2. Open the chat: http://localhost:3000 and ask "When is Assignment 1 due?"
  3. Open instructor tools: http://localhost:3000/instructors (requires demo instructor login; graphs fill in after a few /ask + /feedback events)
  4. When prompted, pick one of the sample courses - the backend validates the course_id on /ask, /feedback, /faq, /insights, and /escalations/request.

Optional API-only ingest:

TOKEN="$(curl -sS -X POST http://localhost:8000/instructors/login \
  -H "Content-Type: application/json" \
  -d '{"username":"demo","password":"demo"}' | python -c "import json,sys; print(json.load(sys.stdin)['access_token'])")"

curl -sS -X POST http://localhost:8000/ingest/run -H "Authorization: Bearer $TOKEN"

If you change course data or want a clean slate, stop the stack and remove storage/ before restarting.


Local Development Notes

Environment variables

Create a .env file at the repo root using .env.example as a guide (used by ./start.sh and Docker Compose). If you run the backend directly from backend/, either export the LENA_* variables in your shell or create a backend/.env file as well.

  • NEXT_PUBLIC_API_BASE - Base URL the frontend calls (defaults to http://localhost:8000). Always include course_id in client requests.
  • LENA_QDRANT_HOST / LENA_QDRANT_PORT - Qdrant connection details if you run the vector store elsewhere.
  • LENA_DATA_DIR / LENA_STORAGE_DIR - Override data or storage paths for ingestion/output.
  • LENA_EMBED_MODEL - Embedding model used to index course materials and student questions by meaning (defaults to sentence-transformers/all-MiniLM-L6-v2).
  • LENA_LLM_MODE - hf (default) to use the configured Hugging Face text-generation model, or off for deterministic demos.
  • LENA_HF_MODEL - Hugging Face model ID used when LENA_LLM_MODE=hf (defaults to HuggingFaceH4/zephyr-7b-beta).
  • LENA_HF_MAX_NEW_TOKENS - Maximum number of new tokens generated per answer when Hugging Face generation is enabled.
  • LENA_CORS_ORIGINS - Comma-separated list of allowed CORS origins (defaults to http://localhost:3000).
  • LENA_INSTRUCTOR_USERNAME / LENA_INSTRUCTOR_PASSWORD - Demo instructor login values. Replace these before any real pilot.
  • LENA_ALLOW_DEFAULT_INSTRUCTOR_CREDS - Allows demo / demo only in sandbox/demo runs. Docker Compose sets this to true; direct backend runs default to false.
  • LENA_ENABLE_INGEST_ENDPOINT / LENA_ENABLE_ADMIN_ENDPOINTS / LENA_ENABLE_EXPORT_ENDPOINT - Feature flags for instructor-only operations. Keep disabled unless you intentionally need them.
  • LENA_ENABLE_PII_EXPORT - Allows exports with student contact fields only when explicitly enabled and an encryption key is configured.

The backend reads any LENA_* variables via Pydantic settings, while the frontend only needs the NEXT_PUBLIC_* keys because Next.js exposes them to the browser build.
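The prefix convention can be mimicked without Pydantic. This is an illustrative sketch of the LENA_* prefix idea, not the backend's actual settings class:

```python
import os

# Illustration of prefix-based settings collection (LENA's backend uses
# Pydantic settings; this mimics the LENA_* prefix with os.environ only).
def collect_lena_settings(environ=None):
    env = os.environ if environ is None else environ
    return {
        key[len("LENA_"):].lower(): value
        for key, value in env.items()
        if key.startswith("LENA_")
    }
```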

Courses & multi-course mode

The course picker reads from storage/courses.json. If the file doesn't exist, the backend seeds two sample anthropology courses so the UI always has something to display. To customize the pilot, drop in your own catalog:

[
  { "id": "anth101", "name": "ANTH 101 · Cultural Anthropology", "code": "ANTH 101", "term": "Fall 2024" },
  { "id": "anth204", "name": "ANTH 204 · Archaeology of Everyday Life", "code": "ANTH 204", "term": "Fall 2024" }
]

Escalation requests initiated from the chat are stored in storage/escalations.jsonl so instructor follow-ups can be audited or replayed. FAQ entries and review queue items now record the originating course_id, keeping per-course dashboards consistent with the student experience.

API note: Dashboard/admin endpoints (e.g. /insights, /admin/*, /ingest/run, /instructors/*) require demo instructor login in this pilot build. Course-scoped endpoints also require an explicit course_id.

Ingestion tip: organize course content under data/<course_id>/... so each vector chunk carries the proper course_id. Files placed directly under data/ inherit the first course from storage/courses.json, making it easy to pilot with a single catalog while still supporting multi-course retrieval later.
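Under that layout, the course_id for a file follows from its first path segment under data/. A hedged sketch of that rule; the helper name is made up and the real ingestion code may differ:

```python
from pathlib import Path

# Hypothetical helper mirroring the ingestion tip above: files under
# data/<course_id>/... carry that course_id, while files directly under
# data/ inherit the default (first course in storage/courses.json).
def infer_course_id(file_path, data_root, default_course_id):
    relative = Path(file_path).relative_to(data_root)
    parts = relative.parts
    return parts[0] if len(parts) > 1 else default_course_id
```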

API requirements

  • POST /ask - body must include question and course_id. Responses contain a question_id you'll reuse.
  • POST /feedback - requires question_id, course_id, and the user's helpfulness choice (plus optional transcript context).
  • GET /faq - requires a course_id query parameter; the backend rejects empty IDs.
  • GET /insights - requires instructor login plus a course_id query parameter.
  • POST /escalations/request - include course_id, student_name, and student_email so instructors can follow up.
  • GET /admin/review / POST /admin/promote / GET /admin/export / POST /ingest/run - require instructor login and are locked down via feature flags for the demo.
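A minimal client flow against these endpoints might build payloads like the following. No request is sent here, and any field name beyond those listed above (e.g. the exact helpfulness key, shown as "helpful") is an assumption:

```python
import json

# Hedged sketch of the /ask -> /feedback payloads this pilot expects.
# "helpful" is an assumed name for the user's helpfulness choice.
ask_payload = {
    "question": "When is Assignment 1 due?",
    "course_id": "anth101",
}

# /ask responses include a question_id to reuse in follow-up calls.
question_id = "<question_id from the /ask response>"

feedback_payload = {
    "question_id": question_id,
    "course_id": "anth101",
    "helpful": True,
}

ask_body = json.dumps(ask_payload)
feedback_body = json.dumps(feedback_payload)
```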

Running without Docker

Frontend:

cd frontend
npm ci
npm run dev

Backend:

cd backend
python3 -m pip install -r requirements.txt  # Requires Python 3.10+
export LENA_LLM_MODE=off
export LENA_ALLOW_DEFAULT_INSTRUCTOR_CREDS=true
export LENA_ENABLE_INGEST_ENDPOINT=true
export LENA_ENABLE_ADMIN_ENDPOINTS=true
export LENA_ENABLE_EXPORT_ENDPOINT=true
uvicorn app.main:app --reload --port 8000

Ensure Qdrant is reachable (either docker run qdrant/qdrant or the docker compose stack) before hitting /ask.

Testing & linting

Run backend tests (which include a deterministic ingest pass) and the frontend checks before opening a PR:

python3 -m pip install -r backend/requirements.txt
python3 -m pytest

cd frontend
npm ci
npm run lint
npx tsc --noEmit --incremental false

Set LENA_LLM_MODE=off locally for quick deterministic answers and to avoid downloading large Hugging Face models during test runs.


Docker Images

The LENA pilot publishes both backend and frontend containers for reproducibility and deployment.

  • Backend (FastAPI) - on Docker Hub: course Q&A, feedback, and analytics to support online learning.
  • Frontend (Next.js) - on Docker Hub: chat Q&A and learner analytics for online course support.

Pull and Run

To pull the latest images directly from Docker Hub:

# Backend (FastAPI)
docker pull docker.io/watrall/lena-backend:latest
docker run -d -p 8000:8000 docker.io/watrall/lena-backend:latest

# Frontend (Next.js)
docker pull docker.io/watrall/lena-web:latest
docker run -d -p 3000:3000 docker.io/watrall/lena-web:latest

Set NEXT_PUBLIC_API_BASE=http://localhost:8000 (or your backend host) before starting the frontend container so the chat can reach the API.


CORS & Production Considerations

  • When deploying the frontend separately (Netlify, Vercel, etc.), set LENA_CORS_ORIGINS in the backend environment to include the web origin (e.g., https://lena-pilot.example.edu). The Compose stack already runs both services on the same network so no extra config is required locally.
  • Mattermost, Slack, LMS, or email integrations should live behind opt-in environment flags so student data only routes to approved channels. The README keeps the defaults closed off; check docs/SECURITY-NOTES.md before rolling into a large cohort.

Helpful Docs

  • Architecture overview: docs/OVERVIEW.md
  • Demo script for pilots: docs/DEMO-SCRIPT.md
  • Security and guardrails: docs/SECURITY-NOTES.md
  • Changelog: CHANGELOG.md

About

Prototype AI assistant supporting online learners through instant course Q&A, contextual feedback, and analytics. Built with FastAPI, Next.js, and Hugging Face to explore responsible, transparent AI in education, improve student success, and enhance course management and development through data-driven insights.
