A Mastra agent that ingests textbook PDFs and produces structured pedagogical JSON, plus a standalone CLI that turns those JSONs into per-class Excel workbooks.
The repo has three pieces:
- `src/mastra/`: Mastra agents (`pedagogyAgent`, `pedagogyLiteratureAgent`), tools, prompts, and a shared Mastra Workspace (`src/mastra/workspace.ts`). Exposes a `mastra dev` server on port 4111.
- `standalone-excel-converter/`: pure-Bun CLI (`cli.ts`, `batch-convert.ts`) that consumes the JSON output and produces Excel workbooks.
- `workspace/`: persistent agent workspace. The Mastra `LocalFilesystem` `basePath` is rooted here, and all generated artifacts land in `workspace/outputs/`. This is where the agent and converter exchange files.
You can run everything locally with Bun or in Docker.
Prerequisites for the local path:

- Bun ≥ 1.0 (`curl -fsSL https://bun.sh/install | bash`)
- A Google AI / Vertex AI API key with access to `gemini-2.5-pro`
```bash
git clone <repo-url> pedagogy-agent-mastra
cd pedagogy-agent-mastra
bun install
cp .env.example .env
# edit .env and set GOOGLE_GENERATIVE_AI_API_KEY
```

Bun loads `.env` automatically, so no `dotenv` is needed.
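A missing key only shows up when the agent first calls the model, so you may prefer to fail fast at startup. The sketch below is a hypothetical helper (`requireEnv` is not part of this repo); it takes the environment object as a parameter so you would pass `process.env`, which Bun populates from `.env`.

```typescript
// Hypothetical helper (not part of this repo): fail fast when a required
// environment variable is missing or blank, instead of failing later on
// the first model call.
function requireEnv(
  name: string,
  env: Record<string, string | undefined>,
): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`${name} is not set; add it to .env (Bun loads .env automatically)`);
  }
  return value;
}

// Example: call once at startup, before constructing any agent.
// const apiKey = requireEnv("GOOGLE_GENERATIVE_AI_API_KEY", process.env);
```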
The `workspace/` directory is checked into the repo (skeleton only). At runtime the agent and converter both read and write `workspace/outputs/`:

```
workspace/
└── outputs/
    ├── json_files/    ← agent saveJsonToFile target, converter input
    └── excel_files/   ← converter output, also agent jsonToXLSXTool target
```

`workspace/outputs/*` is gitignored; only the directory skeleton is tracked. There is nothing else to set up: the tools run the equivalent of `mkdir -p` on first write.
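The "create directories on first write" behavior can be sketched as follows. This is an illustration under assumptions, not the repo's `saveJsonToFile` tool: the `saveJson` name, its signature, and the file-name guard are hypothetical, and it uses Node's synchronous `fs` API, which Bun implements.

```typescript
import { mkdirSync, writeFileSync } from "node:fs";
import { basename, join } from "node:path";

// Hypothetical helper illustrating "mkdir -p on first write":
// writes `data` as pretty-printed JSON to <dir>/<name>.json, creating
// `dir` and any missing parents first. Returns the written path.
function saveJson(dir: string, name: string, data: unknown): string {
  // Reject names that could escape `dir` (e.g. "../x" or "a/b").
  if (name === "" || name === "." || name === ".." || basename(name) !== name) {
    throw new Error(`invalid file name: ${name}`);
  }
  mkdirSync(dir, { recursive: true }); // no-op if the directory already exists
  const file = join(dir, name.endsWith(".json") ? name : `${name}.json`);
  writeFileSync(file, JSON.stringify(data, null, 2) + "\n", "utf8");
  return file;
}
```

Accepting only a bare file name keeps callers from writing outside the target directory, which matters when the name comes from model output.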
To start the agent locally:

```bash
bun run dev
```

`mastra dev` boots on http://localhost:4111 (playground UI + REST API). Both agents are wired with a Workspace (see `src/mastra/workspace.ts`), so they get built-in `read_file` / `write_file` / `list_dir` tools scoped to `workspace/`, plus the project's custom PDF and JSON-saving tools.
Open the playground, pick Pedagogy Agent or Pedagogy Literature Agent, paste a public PDF URL, and watch outputs land in workspace/outputs/json_files/ on the host.
Alternatively, run everything in Docker. This path suits you if you don't want Bun on the host or want a reproducible runtime. It requires:
- Docker ≥ 24
- Docker Compose v2
The Docker setup consists of three files:

- `Dockerfile`: Bun 1.x (Debian) base; installs deps, copies source + workspace skeleton, runs `bun run dev` on `:4111`.
- `docker-compose.yml`: two services on the same image:
  - `pedagogy-agent`: long-running Mastra dev server.
  - `converter`: gated behind `profiles: ["tools"]`, intended for `docker compose run --rm converter ...`.
- `.dockerignore`: excludes `node_modules`, `.env`, `.mastra`, `workspace/outputs/`, etc.
```bash
cp .env.example .env
# edit .env
docker compose build
```

This bakes the dependencies into `pedagogy-agent:latest`; both services share that image.
```bash
docker compose up -d pedagogy-agent
docker compose logs -f pedagogy-agent   # tail
docker compose down                     # stop
```

The agent is then reachable at http://localhost:4111.
The converter service is opt-in. Either drop into a shell:
```bash
docker compose run --rm converter
# inside:
bun batch-convert.ts
bun cli.ts ../workspace/outputs/json_files/class_1-english_bb_class1.json english class_1
exit
```

Or run a single command:
```bash
docker compose run --rm converter bun batch-convert.ts
docker compose run --rm converter bun cli.ts example.json mathematics class5
```

Or, if the agent is already up, exec into it:
```bash
docker compose exec pedagogy-agent bash
# inside:
cd standalone-excel-converter && bun batch-convert.ts
```

Only `workspace/outputs/` is bind-mounted; that's the persistent boundary between host and container. Everything else (source, `node_modules`) lives inside the image, so host and container Bun versions can't collide.
| Host | Container | Purpose |
|---|---|---|
| `./workspace/outputs` | `/app/workspace/outputs` | Agent + converter artifacts |
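To pick up source changes after edits, rebuild with `docker compose build`. (For a live-reload dev workflow, add a `./src:/app/src` mount yourself; it is left out of the default to keep the image self-contained.) After dependency changes, force a full rebuild without the layer cache:

```bash
docker compose build --no-cache pedagogy-agent
```

To use the agent, start it locally:

```bash
bun run dev
```

or with Docker:

```bash
docker compose up -d pedagogy-agent
```

Then hit the API: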
```bash
curl -X POST http://localhost:4111/analyze-pdf \
  -H "Content-Type: application/json" \
  -d '{"pdfUrl": "https://example.com/textbook.pdf"}'
```

The agent downloads the PDF, parses it, returns the pedagogical structure, and writes JSON to `workspace/outputs/json_files/` via the `saveJsonToFile` tool.
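The same call from TypeScript, as a minimal sketch: it relies on the standard `fetch` / `Request` globals available in Bun and assumes only what the curl example shows (a JSON body with `pdfUrl`). The response schema isn't documented here, so the sketch returns it as `unknown`; `buildAnalyzeRequest` and `analyzePdf` are hypothetical names.

```typescript
// Hypothetical client for the POST /analyze-pdf endpoint shown above.
// Only the request shape from the curl example is assumed; the response
// body is returned as `unknown` because its schema isn't documented here.
function buildAnalyzeRequest(
  pdfUrl: string,
  baseUrl = "http://localhost:4111",
): Request {
  return new Request(new URL("/analyze-pdf", baseUrl), {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ pdfUrl }),
  });
}

async function analyzePdf(pdfUrl: string, baseUrl?: string): Promise<unknown> {
  const res = await fetch(buildAnalyzeRequest(pdfUrl, baseUrl));
  if (!res.ok) {
    // Surface the server's error text to make failures easier to debug.
    throw new Error(`analyze-pdf failed: ${res.status} ${await res.text()}`);
  }
  return res.json();
}

// Usage (requires the agent to be running):
// const result = await analyzePdf("https://example.com/textbook.pdf");
```

Splitting request construction from sending keeps the request shape easy to inspect or test without a running server.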
To process several books in one run, pass a manifest to the batch processor:

```bash
bun src/batch-process.ts input.json
bun src/batch-process.ts input.json --class=1 --concurrent=2
bun src/batch-process.ts input.json --book=hindi_bb_class1
bun src/batch-process.ts input.json --skip-existing --concurrent=3
```

Manifest shape (`input.json`):
```json
{
  "hindi_bb_class1": {
    "url": "https://.../hindi-class1.pdf",
    "class": 1,
    "medium": ["hindi"],
    "filename": "hindi_bb_class1",
    "prompt": "..."
  }
}
```

`--skip-existing` checks `workspace/outputs/json_files/` for already-processed books.
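The manifest and flags map naturally onto a typed selection step. The sketch below is a hypothetical illustration, not the code in `src/batch-process.ts`: `BookEntry` mirrors the manifest fields above, `parseFlags` and `selectBooks` are invented names, and the output file name used for `--skip-existing` (`class_<class>-<filename>.json`) is an assumption inferred from example names such as `class_1-english_bb_class1.json`.

```typescript
// Shape of one manifest entry, mirroring the input.json example above.
interface BookEntry {
  url: string;
  class: number;
  medium: string[];
  filename: string;
  prompt: string;
}
type Manifest = Record<string, BookEntry>;

// Hypothetical options mirroring the documented CLI flags.
interface BatchOptions {
  classFilter?: number;   // --class=N
  book?: string;          // --book=<manifest key>
  skipExisting?: boolean; // --skip-existing
  concurrent?: number;    // --concurrent=N (parsed only; not used below)
}

function parseFlags(args: string[]): BatchOptions {
  const opts: BatchOptions = {};
  for (const arg of args) {
    const eq = arg.indexOf("=");
    const flag = eq === -1 ? arg : arg.slice(0, eq);
    const value = eq === -1 ? "" : arg.slice(eq + 1);
    if (flag === "--class") opts.classFilter = Number(value);
    else if (flag === "--book") opts.book = value;
    else if (flag === "--skip-existing") opts.skipExisting = true;
    else if (flag === "--concurrent") opts.concurrent = Number(value);
  }
  return opts;
}

// ASSUMPTION: outputs are named class_<class>-<filename>.json.
function outputName(entry: BookEntry): string {
  return `class_${entry.class}-${entry.filename}.json`;
}

// Returns the manifest keys to process, in manifest order. `existing` holds
// file names already present in workspace/outputs/json_files/.
function selectBooks(
  manifest: Manifest,
  opts: BatchOptions,
  existing: ReadonlySet<string> = new Set<string>(),
): string[] {
  return Object.entries(manifest)
    .filter(([key]) => opts.book === undefined || key === opts.book)
    .filter(([, e]) => opts.classFilter === undefined || e.class === opts.classFilter)
    .filter(([, e]) => !opts.skipExisting || !existing.has(outputName(e)))
    .map(([key]) => key);
}
```

Filtering before any PDF is fetched means `--skip-existing` costs nothing but a directory listing.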
Convert every JSON in `workspace/outputs/json_files/` into per-class workbooks:

```bash
cd standalone-excel-converter
bun batch-convert.ts
```

Single file:
```bash
bun cli.ts <json-file> <subject> <class>
# example
bun cli.ts ../workspace/outputs/json_files/class_1-english_bb_class1.json english class_1
```

Output:
```
workspace/outputs/excel_files/
├── class_1-pedagogy.xlsx
│   ├── english   (english_bb + english_book records merged)
│   ├── hindi
│   └── marathi
└── ...
```
For converter internals (subject normalization, multi-medium handling, file-naming conventions) see standalone-excel-converter/README.md.
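To make the naming convention concrete, here is a hypothetical sketch of grouping input files into per-class workbooks and per-subject sheets. It is not the converter's code: the regex assumes the `class_N-subject_type_classN.json` pattern, and it takes the text before the first `_` as the sheet name. That reproduces the `english_bb` + `english_book` → `english` merge in the tree above, but it would skip subjects that themselves contain `_`; the real normalization rules are in `standalone-excel-converter/README.md`.

```typescript
// Hypothetical sketch: group converter input files by workbook (one per class)
// and sheet (one per subject). ASSUMPTION: names follow
// class_N-<subject>_<type>_classN.json, e.g. class_1-english_bb_class1.json,
// and the sheet name is the <subject> part.
const FILE_PATTERN = /^class_(\d+)-([a-z]+)_([a-z0-9]+)_class(\d+)\.json$/;

type Plan = Map<string, Map<string, string[]>>; // workbook -> sheet -> files

function planWorkbooks(fileNames: string[]): { plan: Plan; skipped: string[] } {
  const plan: Plan = new Map();
  const skipped: string[] = [];
  for (const name of fileNames) {
    const m = FILE_PATTERN.exec(name);
    // Skip names that don't match, or whose two class numbers disagree.
    if (!m || m[1] !== m[4]) {
      skipped.push(name);
      continue;
    }
    const workbook = `class_${m[1]}-pedagogy.xlsx`;
    const sheet = m[2];
    const sheets = plan.get(workbook) ?? new Map<string, string[]>();
    sheets.set(sheet, [...(sheets.get(sheet) ?? []), name]);
    plan.set(workbook, sheets);
  }
  return { plan, skipped };
}
```

Names that end up in `skipped` are a quick way to see why `excel_files/` might come out empty.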
| Task | Command |
|---|---|
| Start agent (local) | bun run dev |
| Start agent (Docker) | docker compose up -d pedagogy-agent |
| Tail logs (Docker) | docker compose logs -f pedagogy-agent |
| Shell into running agent | docker compose exec pedagogy-agent bash |
| One-off converter shell | docker compose run --rm converter |
| One-off batch convert | docker compose run --rm converter bun batch-convert.ts |
| Rebuild after dep changes | docker compose build --no-cache |
How the workspace is wired:

- `src/mastra/workspace.ts` instantiates a `Workspace` with `LocalFilesystem({ basePath: <repo>/workspace })` and a `LocalSandbox` rooted at the same path.
- The workspace is attached to both agents via `new Agent({ ..., workspace })`. Mastra automatically exposes filesystem tools (`read_file`, `write_file`, `list_dir`, …) scoped to `basePath`. The destructive `delete` tool is disabled.
- Custom tools (`saveJsonToFile`, `convertToXlsx`) write to `workspace/outputs/json_files/` and `workspace/outputs/excel_files/` via constants exported from `workspace.ts`. That is the same physical location the workspace tools see, so the agent can also `read_file` / `list_dir` over its own outputs.
- The path is resolved via `import.meta.dirname`, so it's stable regardless of `process.cwd()` (which differs between `mastra dev`, `bun run`, and the bundled build).
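A minimal sketch of that path logic, under stated assumptions: it takes the module directory as a parameter (in `workspace.ts` that would be `import.meta.dirname`), assumes the module sits at `src/mastra/`, two levels below the repo root, and adds a hypothetical `resolveInWorkspace` guard showing what "scoped to `basePath`" means in practice. The function and property names are illustrative, not the constants `workspace.ts` actually exports.

```typescript
import { isAbsolute, relative, resolve, sep } from "node:path";

// ASSUMPTION: moduleDir is <repo>/src/mastra, so the repo root is two levels up.
// Resolving from the module's own directory makes the result independent of cwd.
function workspacePaths(moduleDir: string) {
  const root = resolve(moduleDir, "..", "..", "workspace");
  return {
    root,
    jsonDir: resolve(root, "outputs", "json_files"),
    excelDir: resolve(root, "outputs", "excel_files"),
  };
}

// Hypothetical guard: resolve a caller-supplied path against the workspace
// root and refuse anything that escapes it (e.g. "../.env"). This is a purely
// lexical check; it does not follow symlinks.
function resolveInWorkspace(root: string, userPath: string): string {
  const target = resolve(root, userPath);
  const rel = relative(root, target);
  if (rel === ".." || rel.startsWith(`..${sep}`) || isAbsolute(rel)) {
    throw new Error(`path escapes workspace: ${userPath}`);
  }
  return target;
}
```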
Troubleshooting:

- Port 4111 in use: change the host side of the port mapping in `docker-compose.yml` (e.g. `"5111:4111"`).
- `GOOGLE_GENERATIVE_AI_API_KEY` not set: the agent fails on the first model call. `docker compose config` shows the resolved env.
- Empty `workspace/outputs/excel_files/`: make sure `workspace/outputs/json_files/` contains files matching `class_N-subject_type_classN.json` before running `batch-convert.ts`.
- Permission errors on Linux bind mounts: the container runs as `bun`. If files in `workspace/outputs` end up unwritable from the host, reclaim them with `sudo chown -R "$USER" workspace/outputs` (changing ownership of files created by another user requires root).