Skills: Transformers.js / WebLLM · prompt engineering · build scripting · QA review
Time: ~10 hours
Good for: ML engineers · NLP folks · lore curators
Difficulty: Advanced
Context
Of our 310 entities, ~120 don't have Wikipedia entries (entity.long is
empty). Mostly minor characters, ships, vehicles. The semantic search struggles
on these because there's no narrative for the embedding model to chew on.
Goal
At build time, run a small local LLM (Phi-3-mini, Gemma-2B, or similar via
Transformers.js / WebLLM) to generate canonical descriptions from the entity's
relations + name + type. Manually verified before merging into kb.json.
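One way to keep the model anchored to known facts is to put only the entity's own fields into the prompt and give it an explicit way to decline. The sketch below is illustrative, not the actual implementation. It assumes a hypothetical Entity shape: the id field and the relation type/target pair are illustrative, not the real kb.json schema. It also assumes a made-up sentinel string, INSUFFICIENT. It only builds the prompt text and does not call Transformers.js or WebLLM.

```typescript
// Hypothetical entity shape; only name, type, short, long and relations
// come from the task description. Field details are assumptions.
interface Relation {
  type: string;   // relation label, e.g. "ownedBy" (illustrative)
  target: string; // display name of the related entity
}

interface Entity {
  id: string;
  name: string;
  type: string;
  short?: string;
  long?: string;
  relations: Relation[];
}

// Assumed sentinel the model is told to emit when it lacks signal.
const INSUFFICIENT = "INSUFFICIENT";

// Build a prompt that lists only known facts and allows the model to decline.
function buildPrompt(entity: Entity): string {
  const facts = entity.relations.map((r) => `- ${r.type}: ${r.target}`);
  return [
    "Write a 2-3 sentence encyclopedic description using ONLY the facts below.",
    "Do not add names, dates, or events that are not listed.",
    `If the facts are not enough for a meaningful description, reply with exactly ${INSUFFICIENT}.`,
    "",
    `Name: ${entity.name}`,
    `Type: ${entity.type}`,
    `Summary: ${entity.short ?? "(none)"}`,
    "Relations:",
    ...(facts.length > 0 ? facts : ["- (none)"]),
  ].join("\n");
}
```

Listing relations as one fact per line keeps the prompt short for a 2B-3B model. The decline instruction gives it a sanctioned alternative to inventing details.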
Where to start
- New scripts/build-llm-descriptions.ts, a pure Node script that:
  - Loads kb.json
  - For each entity with an empty long field, builds a structured prompt from
    name + type + relations + short
  - Calls a local LLM (no external API)
  - Caches output to data/.cache/llm/<entityId>.json for review
- A small interactive CLI to approve or reject each generated description before merging
- Re-run build:embeddings after merge
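The non-LLM parts of the steps above can be sketched as pure functions. These cover picking entities with an empty long field, naming cache files, recording review decisions, and merging approved text. This is a sketch under assumptions: the CachedDescription record and the entity shape are hypothetical. The model call and file I/O are left out. The approve/reject prompt is passed in as a callback rather than tied to any particular terminal library.

```typescript
// Hypothetical kb.json entity shape (field details are assumptions).
interface Entity {
  id: string;
  name: string;
  type: string;
  short?: string;
  long?: string;
  relations: { type: string; target: string }[];
}

// Assumed shape of one file in data/.cache/llm/<entityId>.json.
interface CachedDescription {
  entityId: string;
  text: string; // raw model output
  status: "pending" | "approved" | "rejected";
}

// True when `long` is missing or whitespace-only.
function hasEmptyLong(e: Entity): boolean {
  return !e.long || e.long.trim() === "";
}

function needsDescription(entities: Entity[]): Entity[] {
  return entities.filter(hasEmptyLong);
}

function cachePath(entityId: string): string {
  return `data/.cache/llm/${entityId}.json`;
}

// The interactive approve/reject step is modelled as a callback, so the real
// script can back it with a terminal prompt and tests can use a stub.
function review(
  cache: CachedDescription[],
  decide: (c: CachedDescription) => boolean,
): CachedDescription[] {
  return cache.map((c): CachedDescription =>
    c.status === "pending" ? { ...c, status: decide(c) ? "approved" : "rejected" } : c,
  );
}

// Copy approved texts into `long`, never overwriting an existing description.
function mergeApproved(entities: Entity[], cache: CachedDescription[]): Entity[] {
  const approved = new Map<string, string>(
    cache
      .filter((c) => c.status === "approved")
      .map((c): [string, string] => [c.entityId, c.text.trim()]),
  );
  return entities.map((e) => {
    const text = approved.get(e.id);
    return text && hasEmptyLong(e) ? { ...e, long: text } : e;
  });
}
```

Keeping the merge separate from generation means a rejected or missing cache entry simply leaves long empty. That is the desired fallback when the model had too little to work with.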
Acceptance criteria
- 80%+ of empty long fields populated with plausible descriptions
- Every generated description manually reviewed (one-shot pass is fine)
- No hallucinated facts; if the LLM doesn't have enough signal, leave the
field empty
- Local-only, no API keys, no network calls beyond model download
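Two of these criteria can be checked mechanically. The sketch below assumes the prompt asks the model to answer with a sentinel such as INSUFFICIENT when it lacks signal. That sentinel is a hypothetical convention, not something the task card specifies. The sketch maps that answer to "leave empty" and measures coverage against the 80% threshold. It does not detect hallucinated facts; that still depends on the manual review.

```typescript
// Assumed sentinel the prompt instructs the model to use when it lacks signal.
const INSUFFICIENT = "INSUFFICIENT";

// Turn raw model output into a description, or null when the model signalled
// (or produced) nothing usable. Null means "leave `long` empty".
function normalizeOutput(raw: string): string | null {
  const text = raw.trim();
  if (text === "" || text.toUpperCase().startsWith(INSUFFICIENT)) return null;
  return text;
}

interface CoverageReport {
  targeted: number; // entities that started with an empty `long`
  filled: number;   // of those, how many now have an approved description
  ratio: number;
  passes: boolean;
}

// Coverage is measured only over the entities that were empty to begin with.
function coverage(
  originallyEmptyIds: string[],
  filledIds: Set<string>,
  threshold = 0.8,
): CoverageReport {
  const targeted = originallyEmptyIds.length;
  const filled = originallyEmptyIds.filter((id) => filledIds.has(id)).length;
  const ratio = targeted === 0 ? 1 : filled / targeted;
  return { targeted, filled, ratio, passes: ratio >= threshold };
}
```

With about 120 empty entities, the 80% bar means roughly 96 approved descriptions.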
Notes
- Model size budget: ≤2 GB on disk, ≤4 GB RAM
- Generation budget: ≤2s per entity on CPU (so ~4 minutes total)
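A small helper can check recorded generation times against these budgets. It assumes timings are collected per entity into a hypothetical Timing record, for example with Date.now() around each model call. The 2000 ms default mirrors the 2 s budget: 120 entities at 2 s each is 240 s, about 4 minutes.

```typescript
// Hypothetical per-entity timing record collected by the build script.
interface Timing {
  entityId: string;
  ms: number; // wall-clock generation time for one entity
}

interface BudgetReport {
  totalMs: number;
  overBudget: string[]; // entity ids that exceeded the per-entity budget
  withinTotal: boolean; // total time <= per-entity budget x entity count
}

function checkBudget(timings: Timing[], perEntityMs = 2000): BudgetReport {
  const totalMs = timings.reduce((sum, t) => sum + t.ms, 0);
  return {
    totalMs,
    overBudget: timings.filter((t) => t.ms > perEntityMs).map((t) => t.entityId),
    withinTotal: totalMs <= perEntityMs * timings.length,
  };
}
```

Reporting individual outliers separately from the total makes it easy to see whether a slow run comes from a few long prompts or from the model being too large for the CPU budget.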