Use this document as a system prompt when implementing the taxonomy/categorization feature for declaude. It contains the full architectural context, category definitions, detection rules, caching strategy, and integration points into the existing codebase.
declaude converts Claude conversation exports (conversations.json) into browsable HTML. The codebase has these files:
| File | Role |
|---|---|
| `chat_message.py` | `ChatMessage` dataclass: uuid, text, sender, created_at, updated_at, content (list of block dicts), attachments, files |
| `conversation.py` | `Conversation` dataclass: uuid, name, summary, created_at, updated_at, account_uuid, chat_messages. Also provides filename/folder generation |
| `html_renderer.py` | `HtmlRenderer` class with `render_conversation()` and `render_index()`. Index rows are built in `render_index()` as `<tr>` elements with Date and Title columns. Uses `PAPERCLIP_SVG` for attachment icons |
| `exporter.py` | `export_conversations()` orchestrates the pipeline: loads JSON, iterates conversations, renders HTML, builds index entries as dicts with keys: date, title, path, created_dt, has_attachments |
| `declaude.py` | CLI entry point using argparse. Current flags: input (positional), `-o/--output`, `--utc`, `-s/--source` |
Each `ChatMessage.content` is a list of dicts with a `type` field:

- `"text"`: has `text` (str) and `citations` (list) fields
- `"thinking"`: has `thinking` (str) field
- `"tool_use"`: has `name` (str) and `input` (dict) fields. For artifacts: `input.type`, `input.title`, `input.content`
- `"tool_result"`: has `name` (str), `content` (list), `is_error` (bool) fields

Each `ChatMessage.attachments` entry has: `file_name`, `file_size`, `file_type`, `extracted_content`.
Each `ChatMessage.files` entry has: `file_name`.
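A sketch of pulling the human-visible text out of a content list, assuming the dict shapes above (the sample `content` list and the helper name `extract_text_blocks` are illustrative):

```python
def extract_text_blocks(content: list[dict]) -> list[str]:
    """Collect the text of all "text" blocks; skip thinking/tool blocks."""
    return [block["text"] for block in content if block.get("type") == "text"]

content = [
    {"type": "thinking", "thinking": "..."},
    {"type": "text", "text": "How do I parse JSON in Python?", "citations": []},
    {"type": "tool_use", "name": "artifacts", "input": {}},
]
```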
Add a taxonomy categorizer that assigns up to 2 categories to each conversation. Display these as a third column ("Categories") in the index.html table. Cache results in a JSON file so successive runs skip already-categorized conversations.
Use exactly these category names. Each has a priority rank -- when a conversation matches more than 2 categories, keep the 2 with the lowest rank numbers (highest priority).
| Rank | Category | Description |
|---|---|---|
| 1 | Theology | Bible study, apologetics, church history, prayer, scriptural analysis |
| 2 | Python | Python programming, scripts, libraries, pip/uv |
| 3 | Go | Go/Golang programming, modules, CLI tools |
| 4 | Bash | Shell scripting, CLI commands, terminal operations |
| 5 | NATS | NATS messaging, JetStream, nats CLI, Synadia |
| 6 | Networking | Mikrotik, DNS, SSH, SFTP, firewalls, VPNs, Starlink, network hardware |
| 7 | Creative Writing | Satirical stories, fiction, humor pieces, narrative writing |
| 8 | macOS | macOS-specific tools, Homebrew, Time Machine, Finder, system preferences |
| 9 | Data & Formats | JSON processing, CSV, data extraction, file format conversion |
| 10 | Web | HTML, CSS, JavaScript, web APIs, web scraping |
| 11 | AI & LLMs | Prompting, model comparison, Claude features, API usage |
| 12 | General | Catch-all for anything that does not match above categories |
- Assign at most 2 categories per conversation.
- If only 1 category matches, use just that one -- do not pad with General.
- Assign General only if zero other categories match.
- When more than 2 match, keep the 2 with the lowest rank numbers.
- Category names must be used exactly as shown (case-sensitive).
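The trim rule can be sketched as follows (ranks copied from the table above; `top_two` is a hypothetical helper name):

```python
# Priority ranks from the category table: lower number = higher priority.
RANKS = {"Theology": 1, "Python": 2, "Go": 3, "Bash": 4, "NATS": 5,
         "Networking": 6, "Creative Writing": 7, "macOS": 8,
         "Data & Formats": 9, "Web": 10, "AI & LLMs": 11, "General": 12}

def top_two(matched: list[str]) -> list[str]:
    """Keep the (at most) two matches with the lowest rank numbers."""
    return sorted(matched, key=RANKS.__getitem__)[:2]
```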
Categorize by scanning the conversation title, the summary field, and the first 5 human messages (content text blocks only, not assistant messages). Do not scan the entire conversation -- the first few human messages establish the topic.
Theology
- Title keywords: bible, scripture, verse, psalm, proverb, genesis, exodus, leviticus, numbers, deuteronomy, joshua, judges, ruth, samuel, kings, chronicles, ezra, nehemiah, esther, job, ecclesiastes, isaiah, jeremiah, lamentations, ezekiel, daniel, hosea, joel, amos, obadiah, jonah, micah, nahum, habakkuk, zephaniah, haggai, zechariah, malachi, matthew, mark, luke, john, acts, romans, corinthians, galatians, ephesians, philippians, colossians, thessalonians, timothy, titus, philemon, hebrews, james, peter, jude, revelation, theology, apologetics, gospel, prayer, church, sermon, faith, god, jesus, christ, hebrew, greek (in biblical context), NET bible, NIV, ESV, KJV, testament, covenant
- Content signals: Bible verse references (e.g. "John 3:16", "Gen 1:1"), theological terms
Python
- Title keywords: python, .py, pytest, pip, uv run, pandas, numpy, flask, django, fastapi, dataclass, pydantic
- Content signals: code fences tagged `python` or `py`, `import` statements for Python modules, `def` function definitions, `class` with Python-style inheritance, `.py` file references in attachments
Go
- Title keywords: golang, go module, go cli, .go
- Content signals: code fences tagged `go` or `golang`, `package main`, `func`, `import "`, `.go` file references
- IMPORTANT: Do not match the bare word "go" in natural English ("go ahead", "let's go"). Require either the code fence tag, `go` followed by a technical term (module, build, run, install, test, fmt, vet), or `golang`
Bash
- Title keywords: bash, shell, zsh, script, terminal, .sh
- Content signals: code fences tagged `bash`, `sh`, `shell`, or `zsh`, shebang lines (`#!/bin/bash`, `#!/bin/sh`), common CLI tool names in code context (grep, awk, sed, find, xargs, curl, wget)
NATS
- Title keywords: nats, jetstream, synadia, nats-server, nats cli
- Content signals: `nats` CLI commands, JetStream references, `nats://` URLs, stream/consumer terminology in NATS context
Networking
- Title keywords: mikrotik, routerboard, dns, ssh, sftp, firewall, vpn, wireguard, starlink, subnet, vlan, router, switch, ip address, dhcp, tcp, udp
- Content signals: IP addresses, CIDR notation, network configuration blocks, RouterOS commands
Creative Writing
- Title keywords: satirical, satire, story, fiction, humor, narrative, short story, writing prompt, creative
- Content signals: Long-form prose without code blocks, narrative structure, character dialogue. Be conservative -- a conversation about writing code is not creative writing
macOS
- Title keywords: macos, mac os, macbook, homebrew, time machine, finder, spotlight, applescript, diskutil
- Content signals: macOS-specific commands (defaults write, diskutil, osascript, brew), .app references, macOS system paths (/Library, ~/Library, /Applications)
Data & Formats
- Title keywords: json, csv, xml, yaml, data extract, parsing, file format, convert
- Content signals: JSON/CSV/XML processing discussion, jq commands, data transformation pipelines. Only when data processing is the primary topic -- a Python conversation that happens to parse JSON should be categorized as Python, not Data & Formats
Web
- Title keywords: html, css, javascript, typescript, react, vue, angular, api endpoint, web scraping, http, rest api
- Content signals: code fences tagged `html`, `css`, `javascript`, `typescript`, `jsx`, `tsx`, HTML tags in content, HTTP methods discussion
AI & LLMs
- Title keywords: prompt, llm, gpt, claude, model, ai, chatgpt, anthropic, openai, gemini, fine-tune, embedding, token
- Content signals: Discussion of AI model capabilities, prompt engineering, API usage for LLMs. Do not match when "claude" appears only as a proper name or "model" appears in non-AI context (data models, 3D models)
General
- Assigned only when no other category matches.
```python
def categorize(conversation: Conversation) -> list[str]:
    scores: dict[str, int] = {}  # category -> score

    # Build the text corpus to scan: title, summary, first 5 human messages
    title = conversation.name.lower()
    summary = conversation.summary.lower()
    human_texts = []
    human_seen = 0
    for msg in conversation.chat_messages:
        if msg.sender != "human":
            continue
        for block in msg.content:  # content blocks are dicts
            if block.get("type") == "text":
                human_texts.append(block["text"].lower())
        for att in msg.attachments:
            human_texts.append(att.file_name.lower())
        for f in msg.files:
            human_texts.append(f.file_name.lower())
        human_seen += 1
        if human_seen >= 5:  # only the first 5 human messages
            break
    corpus = "\n".join([title, summary] + human_texts)

    # Also extract code fence language tags from the first 10 messages
    code_langs = set()
    for msg in conversation.chat_messages[:10]:
        for block in msg.content:
            if block.get("type") == "text":
                # extract the language tag from ```lang fence openers
                for line in block["text"].split("\n"):
                    stripped = line.strip()
                    if stripped.startswith("```") and len(stripped) > 3:
                        lang = stripped[3:].strip().split()[0].lower()
                        if lang:
                            code_langs.add(lang)

    # Score each category: title matches are worth 3 points, corpus matches
    # 1 point, code fence tags 2 points. Use the keyword lists from the
    # detection signals above.
    for category, rules in CATEGORY_RULES.items():
        for keyword in rules.title_keywords:
            if keyword in title:
                scores[category] = scores.get(category, 0) + 3
        for keyword in rules.corpus_keywords:
            if keyword in corpus:
                scores[category] = scores.get(category, 0) + 1
        for lang_tag in rules.code_fence_tags:
            if lang_tag in code_langs:
                scores[category] = scores.get(category, 0) + 2

    # Keep only categories that scored at all
    matched = {k: v for k, v in scores.items() if v > 0}
    if not matched:
        return ["General"]
    # Sort by rank (priority), breaking ties by score (higher first)
    sorted_cats = sorted(matched, key=lambda c: (CATEGORY_RANKS[c], -matched[c]))
    return sorted_cats[:2]
```
- Conversations titled "Untitled": Rely entirely on content scanning.
- Conversations with emoji-prefixed titles (e.g. starting with a speech bubble): Strip leading emoji before keyword matching. The title often contains a truncated first message after the emoji.
- Multi-topic conversations: The 2-category limit and rank-based priority handles this. A conversation about "Python script for Bible verse lookup" would get Theology (rank 1) + Python (rank 2).
- Short conversations (1-2 messages): Title + summary may be the only useful signals. This is fine.
- "Go" ambiguity: Only match
gowhen preceded/followed by technical context or when ago/golangcode fence is present. Never match bare "go" as a verb.
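One way to implement this guard is a word-boundary regex over lowercased text (a sketch; the technical-term list comes from the Go detection rules above):

```python
import re

# Match "golang" anywhere, or "go" only when immediately followed by a
# known Go technical term. Bare "go" as an English verb never matches.
GO_PATTERN = re.compile(r"\bgolang\b|\bgo\s+(module|build|run|install|test|fmt|vet)\b")

def mentions_go(text: str) -> bool:
    return bool(GO_PATTERN.search(text.lower()))
```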
Store at {output_dir}/taxonomy_cache.json with this structure:
{
"version": 1,
"categories": {
"conv-uuid-1": ["Python", "Bash"],
"conv-uuid-2": ["Theology"],
"conv-uuid-3": ["General"]
}
}

- Key: conversation UUID (stable across exports).
- Value: list of 1-2 category strings.
- The `version` field allows future schema changes.
- At startup, load the cache file if it exists.
- For each conversation, check if its UUID is in the cache.
- If cached, use the cached categories. If not, run the categorizer and add to cache.
- After export completes, write the updated cache back to disk.
- If the cache file does not exist, create it.
Add --no-cache flag to force re-categorization of all conversations. This rebuilds the cache from scratch.
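A minimal sketch of the load/save helpers under the schema above (treating a missing, unreadable, or wrong-version file as an empty cache is an assumption, not part of the spec):

```python
import json
from pathlib import Path

CACHE_VERSION = 1

def load_cache(cache_path: Path) -> dict[str, list[str]]:
    """Return the uuid -> categories mapping, or {} if absent/unreadable."""
    try:
        data = json.loads(cache_path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return {}
    if data.get("version") != CACHE_VERSION:
        return {}  # unknown schema: fall back to re-categorizing everything
    return data.get("categories", {})

def save_cache(cache_path: Path, cache: dict[str, list[str]]) -> None:
    """Write the cache back in the documented {version, categories} shape."""
    payload = {"version": CACHE_VERSION, "categories": cache}
    cache_path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
```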
Create a single new file containing:

- `CATEGORY_RULES`: dict mapping category names to their keyword lists and code fence tags
- `CATEGORY_RANKS`: dict mapping category names to their priority rank
- `categorize(conv: Conversation) -> list[str]`: returns 1-2 category names
- `load_cache(cache_path: Path) -> dict[str, list[str]]`: loads or returns empty
- `save_cache(cache_path: Path, cache: dict[str, list[str]]) -> None`: writes cache

The categorizer takes a `Conversation` object (already defined in `conversation.py`) and accesses `conv.name`, `conv.summary`, and `conv.chat_messages[*].content[*]`.
In export_conversations():

- Add a `no_cache: bool = False` parameter.
- After loading conversations, load the taxonomy cache from `output_dir / "taxonomy_cache.json"`.
- Inside the `for conv in conversations:` loop, after building the index entry dict, add a `"categories"` key:

  ```python
  if conv.uuid in cache and not no_cache:
      categories = cache[conv.uuid]
  else:
      categories = categorize(conv)
      cache[conv.uuid] = categories
  index_entries.append({
      "date": ...,
      "title": ...,
      "path": ...,
      "created_dt": ...,
      "has_attachments": ...,
      "categories": categories,
  })
  ```

- After the loop, save the updated cache.
In render_index():

- Add a third `<th>Categories</th>` column to the table header.
- In the row building loop, read `entry.get("categories", [])` and join with `", "`.
- Render as: `<td class="categories">{categories_str}</td>`
- Add CSS for the categories column: `td.categories { font-size: 0.85em; color: #555; white-space: nowrap; }`
- Consider a fixed width for the column (e.g. `10em`) to keep the table aligned.
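The cell rendering can be sketched as below (escaping the joined names, e.g. the `&` in "Data & Formats", is an added precaution not specified above; `category_cell` is a hypothetical helper name):

```python
from html import escape

def category_cell(categories: list[str]) -> str:
    """Render the Categories <td>, joining names with ', ' and HTML-escaping."""
    return f'<td class="categories">{escape(", ".join(categories))}</td>'
```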
In declaude.py:

- Add a `--no-cache` argument to the argument parser.
- Pass `no_cache=args.no_cache` through to `export_conversations()`.
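The argparse wiring might look like this (existing flags reproduced from the file table above; the help text and sample argv are assumptions):

```python
import argparse

parser = argparse.ArgumentParser(prog="declaude")
parser.add_argument("input")                        # positional: conversations.json
parser.add_argument("-o", "--output")
parser.add_argument("--utc", action="store_true")
parser.add_argument("-s", "--source")
parser.add_argument("--no-cache", action="store_true",
                    help="re-categorize all conversations, rebuilding the cache")

# argparse maps --no-cache to args.no_cache automatically
args = parser.parse_args(["conversations.json", "--no-cache"])
```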
After implementing, verify by spot-checking these conversations (known from the current dataset):
| Conversation title | Expected categories |
|---|---|
| NATS Jetstream push vs pull consumers | NATS |
| Refactoring Monolithic Code into Modular Structure | Python |
| Go CLI tool using standard library modules | Go |
| Configuring sftp chroot for single user | Bash, Networking |
| Bible study website navigation design | Theology, Web |
| Mikrotik RB5009UGSIN vs L009UiGS-RM specs | Networking |
| Creating a satirical BBQ story outline | Creative Writing |
| MacOS Time Machine setup for external NVMe | macOS |
| Jobs true message about Gods character | Theology |
| Morse code converter with obfuscated variable names | Python |
| 1966 persona system prompt | AI & LLMs |
| Publishing satirical short story collection | Creative Writing |
Run the export twice to verify caching works -- the second run should not re-categorize any conversations and should complete faster.