Grade your AI chat-history exports 100% locally. Point moderngirl at an export
from ChatGPT, Claude, or Gemini, and a local model running on
Ollama grades how good the AI was across five categories —
printing scores and specific, actionable suggestions for each.
Your private chat history never leaves your machine. No cloud APIs, no internet calls for grading — only your local Ollama server. The raw chat lives in memory during the run and is never written to disk or logged.
$ moderngirl ~/Downloads/my-claude-export.zip
Exports of your AI conversations are a goldmine of feedback — but reading them
back to judge how well the assistant actually did is tedious. moderngirl does it
for you, locally, and tells you where the AI was strong and where it could
improve. It judges the whole interaction (your prompt and the AI's answer),
so a great answer to a vague prompt is rewarded fairly.
| Category | What it measures |
|---|---|
| Linguistic Fluidity | How natural, clear, and well-structured the writing is |
| Logic & Reasoning | Soundness, depth, and correctness of reasoning |
| Code Quality | Correctness, clarity, and best-practices of any code |
| Grammar & Precision | Grammatical correctness and precise wording |
| Task Handling | How well the AI understood and completed your need |
Each category gets a 0–100 score, a visual bar, and a short list of "did well / improve this" suggestions, plus an overall score and summary.
- Python 3.8+
- Ollama installed and running, with at least one model
rich(optional, for the nice terminal report — the tool falls back to plain text without it)
# install ollama (see https://ollama.com), then start the server:
ollama serve
# pull a small, capable model that runs on most laptops:
ollama pull llama3.2Other good choices: gemma2:2b (lightest), qwen2.5:7b or gemma2:9b (more
capable, more RAM). moderngirl is model-agnostic — it detects whatever you
have installed and lets you pick.
git clone https://github.com/yourname/moderngirl.git
cd moderngirl
# recommended: a virtual environment
python3 -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt # installs rich
# optional: install the `moderngirl` command
pip install -e .You can also run it without installing, straight from the repo:
python3 cli.py <path-to-export>moderngirl <path> # parse, pick a model, grade
moderngirl <path> --model llama3.2 # skip the model picker
moderngirl <path> --max-turns 30 # analyse more turns (slower, more thorough)
moderngirl <path> --host http://localhost:11434
moderngirl <path> --plain # disable colored/boxed output<path> can be a file or a folder:
- A Google/OpenAI/Anthropic export
.zip conversations.json(ChatGPTmappingtree, or Claudechat_messages)- A Claude
.jsonlexport (one{"role","content"}per line) - A Claude
.htmlexport - A Gemini Takeout
MyActivity.json/.html - …or a folder containing any of the above (it searches inside)
On startup moderngirl:
- checks that Ollama is running (and tells you
ollama serveif not), - lists your installed models and lets you choose (or honors
--model), - parses the export and reports what it found,
- grades a representative sample of turns with a live progress bar,
- prints the scores table + per-category suggestion panels.
Local grading is private but limited by your hardware (a big model on a small GPU is slow). If you don't have a strong machine, you can grade via a cloud API instead — much faster and smarter. Local stays the default; cloud is opt-in.
Easiest — save your settings once:
moderngirl --setup # pick mode, paste your API key + model — saved to
# ~/.config/moderngirl/config.json (owner-only)
moderngirl my-export.zip # from now on, just run it — no flags, no key to retypeIf you run moderngirl with no path, it simply asks you for the file location.
# Local (default, 100% private):
moderngirl <path> --model llama3.2
# Cloud API (faster/smarter — sends chat text to the provider to grade):
export GROQ_API_KEY=... # or OPENAI_API_KEY, GEMINI_API_KEY, etc.
moderngirl <path> --api groq --model llama-3.3-70b-versatile
moderngirl <path> --api openai --model gpt-4o-mini
moderngirl <path> --api gemini --model gemini-2.0-flash
moderngirl <path> --api openrouter --model anthropic/claude-3.5-sonnet
# any OpenAI-compatible endpoint:
moderngirl <path> --api-base https://your-host/v1 --api-key KEY --model NAMESupported --api providers (all OpenAI-compatible): openai, groq,
openrouter, gemini, deepseek, mistral, together. The key is read from
the provider's env var (e.g. GROQ_API_KEY) or --api-key.
⚠️ Privacy trade-off: in cloud mode your chat text is sent to the provider to be graded. The export file still stays on your disk, and nothing is logged — but the content does travel during grading. Use local mode if that matters.
Every grade is saved automatically (just the scores — no chat content),
tagged by the chat's provider (e.g. claude). The next time you grade a claude
export, moderngirl shows how the scores changed:
moderngirl my-newer-claude-export.zip # auto-compares to your last 'claude' grade
moderngirl export.zip --label me-claude # use your own tag instead
moderngirl export.zip --no-save # grade without saving
moderngirl --history # list all your saved grades
moderngirl --history claude # see one tag's trend over timeExample after a second grade:
Progress vs your last 'claude' grade (2026-06-01)
Category Then Now Δ
Code Quality 70 84 ▲14
...
OVERALL 79 86 ▲7
📈 Up 7 points since last time — improving!
History is stored in ~/.config/moderngirl/history.json (owner-only, scores only).
moderngirl --setup # add/update a provider's key (keeps the others)
moderngirl --list-keys # show saved keys (masked)
moderngirl --delete-key gemini # remove one (e.g. pasted by mistake)
moderngirl --delete-key all # remove all saved keysOne key is stored per provider, so you can keep e.g. Gemini and OpenAI keys
side by side and switch with --api gemini / --api openai.
╭──────────────────────────────────────────────────────────╮
│ ModernGirl — AI Chat History Grader │
│ grade how good your AI was, from your past chats · local or cloud │
╰──────────────────────────────────────────────────────────╯
Found 32 conversation(s), 1828 message(s) · provider: claude
Grading 20 sampled turn(s) in 5 chunk(s) with llama3.2:latest …
──────────── Results · claude · model: llama3.2:latest ────────────
Category Score
Linguistic Fluidity 88/100 ████████████████████░░░░ solid
Logic & Reasoning 92/100 ██████████████████████░░ excellent
Code Quality 74/100 █████████████████░░░░░░░ solid
Grammar & Precision 94/100 ███████████████████████░ excellent
Task Handling 90/100 ██████████████████████░░ excellent
╭─ OVERALL 88/100 ─────────────────────────────────────╮
│ Strong reasoning and task handling; tighten verbosity. │
╰────────────────────────────────────────────────────────╯
╭─ Logic & Reasoning — 92/100 ──────────────────────────╮
│ • Reasoning is sound and well-sequenced; consider │
│ stating assumptions up front for complex asks. │
╰────────────────────────────────────────────────────────╯
...
When a category has no relevant content in the sampled turns (e.g. no code at all), it is shown as n/a and excluded from the overall — never a misleading low score.
Three small, independently testable modules:
export file/folder
│
parser.py → one common format {provider, conversations:[{title, messages:[{role,text}]}]}
│
grader.py → samples {prompt → answer} turns → local Ollama → scores + suggestions
│
cli.py → the `moderngirl` command: checks, model pick, progress, final report
Each module also runs standalone for testing:
python3 parser.py <path> # prints provider, counts, first 2 messages
python3 grader.py <path> --model <name> # parses then grades, prints raw results
python3 cli.py <path> # the full experienceLocal models have limited context, so grader.py doesn't send everything. It
samples a representative, capped set of turns — spread across conversations
and across each chat's timeline — then grades them in chunks and aggregates.
All the knobs are top-of-file constants in grader.py:
| Constant | Meaning |
|---|---|
CATEGORIES |
the graded categories (edit freely) |
MAX_TURNS |
total turns analysed (also --max-turns) |
CHUNK_SIZE |
turns per model request |
MAX_PROMPT_CHARS / MAX_ANSWER_CHARS |
per-message truncation |
- Local mode (default) is fully private. Grading talks only to your Ollama
server at
localhost:11434. No data leaves your machine. - Cloud mode (opt-in
--api) sends chat text to the provider. That's required for the provider to grade it. The export file stays on your disk and nothing is logged, but the content travels during grading. Choose local if that matters to you. - In memory only. The raw chat history is read, held in memory for the run, and discarded. It is never written to disk by moderngirl.
- No raw logging. Only scores and suggestions are printed; conversation text is never logged.
- "Ollama is not reachable" → run
ollama servein another terminal. - "No Ollama models are installed" →
ollama pull llama3.2. - "model '…' is not installed" → check
ollama list, thenollama pull <name>. - Grading is slow → use a smaller model (e.g.
llama3.2,gemma2:2b) or a lower--max-turns. - Output looks plain/boxless →
pip install rich.
MIT — see LICENSE.