ClawLearn

ClawLearn is a Python CLI that turns real-world text (podcast transcripts, articles, etc.) into Anki cloze decks (.apkg) for language learning.

It:

ingests content from local .txt/.md/.epub files
cleans and chunks the text into context blocks
uses an OpenAI-compatible LLM to generate contextual cloze sentences
uses a separate (usually cheaper) LLM for translations
uses edge_tts to generate audio for each card
exports a complete Anki deck via genanki
applies taxonomy-aware candidate ranking in advanced mode
combines model-proposed labels with programmatic re-ranking corrections
adds expression_transfer hints to capture cross-context reuse intent
writes taxonomy/validation/transfer metrics into run_summary.json for tuning

This README describes the current V2-oriented CLI. For an overview in Chinese, see README_zh.md.

1. Installation

1.1 Create a virtualenv

python -m venv .venv
# Windows (PowerShell)
. .venv/Scripts/activate
# macOS / Linux
# source .venv/bin/activate

pip install -r requirements.txt

1.2 Initialize the project

# Copy env example and check prompts/templates
python -m clawlearn.cli init

init will:

create .env from .env.example (if missing)
verify that the default prompts and template exist:
- ./prompts/cloze_contextual.json
- ./prompts/cloze_prose_beginner.json
- ./prompts/cloze_prose_intermediate.json
- ./prompts/cloze_prose_advanced.json
- ./prompts/cloze_transcript_beginner.json
- ./prompts/cloze_transcript_intermediate.json
- ./prompts/cloze_transcript_advanced.json
- ./prompts/cloze_textbook_examples.json
- ./prompts/translate_rewrite.json
- ./prompts/template_extraction.json
- ./prompts/template_explanation.json
- ./templates/anki_cloze_default.json

Edit .env (or copy ENV_EXAMPLE.md content into your own env file) to point at your LLM endpoints and TTS voices.

2. Configuration (ENV_EXAMPLE)

All configuration is done via environment variables. The repository includes ENV_EXAMPLE.md describing defaults. Key groups:

2.1 LLM (cloze)

CLAWLEARN_LLM_PROVIDER=openai_compatible
CLAWLEARN_LLM_BASE_URL=http://127.0.0.1:8000/v1
CLAWLEARN_LLM_API_KEY=YOUR_API_KEY
CLAWLEARN_LLM_MODEL=qwen3-30b
CLAWLEARN_LLM_TIMEOUT_SECONDS=120
CLAWLEARN_LLM_MAX_RETRIES=3
CLAWLEARN_LLM_RETRY_BACKOFF_SECONDS=2.0
# Base sleep between successful LLM calls (seconds);
# actual sleep is random in [N, 3N], 0 means no sleep.
CLAWLEARN_LLM_REQUEST_SLEEP_SECONDS=0
CLAWLEARN_LLM_TEMPERATURE=0.2

The cloze LLM is responsible for generating contextual cloze sentences. It expects an OpenAI-compatible /chat/completions endpoint.
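
The pacing rule documented for CLAWLEARN_LLM_REQUEST_SLEEP_SECONDS (a random sleep in [N, 3N], with 0 meaning no sleep) could look roughly like this. It is a minimal sketch rather than ClawLearn's actual code: the function name request_sleep_seconds is hypothetical, and drawing uniformly from the interval is an assumption.

```python
import random
from typing import Optional


def request_sleep_seconds(base: float, rng: Optional[random.Random] = None) -> float:
    """Return a pause in [base, 3 * base]; a base of 0 (or less) disables the pause."""
    if base <= 0:
        return 0.0
    rng = rng or random.Random()
    # Assumption: the pause is drawn uniformly from the documented interval.
    return rng.uniform(base, 3 * base)
```

A caller would pass the result to time.sleep() after each successful LLM call.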

2.2 Ingest cleaning

CLAWLEARN_INGEST_SHORT_LINE_MAX_WORDS=3

This controls pre-LLM line filtering:

lines with very few words (for example one-word interjections) are dropped;
set CLAWLEARN_INGEST_SHORT_LINE_MAX_WORDS=0 to disable this filter.
.md input is converted to plain text before filtering.
.epub input is unpacked and chapter HTML is converted to plain text before filtering.
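
As an illustration of the short-line filter, here is a minimal sketch. The function name drop_short_lines is hypothetical, and two details are assumptions: a line counts as short when it has at most CLAWLEARN_INGEST_SHORT_LINE_MAX_WORDS words, and blank lines are kept so paragraph breaks survive.

```python
def drop_short_lines(text: str, short_line_max_words: int = 3) -> str:
    """Drop lines with at most `short_line_max_words` words; 0 disables the filter."""
    if short_line_max_words <= 0:
        return text
    kept = []
    for line in text.splitlines():
        words = line.split()
        # Assumption: blank lines are preserved so paragraph boundaries survive.
        if not words or len(words) > short_line_max_words:
            kept.append(line)
    return "\n".join(kept)
```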

2.3 Chunking

Chunking controls how input text is split into context blocks:

# Character-based chunking, with soft sentence boundaries.
CLAWLEARN_CHUNK_MAX_CHARS=1800
CLAWLEARN_CHUNK_MIN_CHARS=120
CLAWLEARN_CHUNK_OVERLAP_SENTENCES=1

Behaviour:

text is split into paragraphs, then short paragraphs are merged until CHUNK_MIN_CHARS is reached
each merged paragraph is split into chunks based on CHUNK_MAX_CHARS:
- extend by whole sentences until just below the char limit
- never split inside a sentence
- adjacent chunks may share CHUNK_OVERLAP_SENTENCES of overlap
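
The sentence-packing step above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the regex sentence splitter is deliberately naive, the names split_sentences and chunk_paragraph are hypothetical, and a single sentence longer than the limit is emitted as its own chunk rather than being split.

```python
import re


def split_sentences(paragraph: str) -> list:
    # Naive splitter: break after ., ! or ? when followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


def chunk_paragraph(paragraph: str, max_chars: int = 1800, overlap_sentences: int = 1) -> list:
    sentences = split_sentences(paragraph)
    chunks = []
    start = 0
    while start < len(sentences):
        end = start + 1  # always take at least one whole sentence
        length = len(sentences[start])
        # Extend by whole sentences while the chunk stays within the limit.
        while end < len(sentences) and length + 1 + len(sentences[end]) <= max_chars:
            length += 1 + len(sentences[end])
            end += 1
        chunks.append(" ".join(sentences[start:end]))
        if end >= len(sentences):
            break
        # Step back to share sentences with the next chunk, but always advance.
        start = max(end - overlap_sentences, start + 1)
    return chunks
```

With a small limit the overlap is easy to see: chunk_paragraph("A one. B two. C three. D four.", max_chars=20) yields three chunks, each sharing one sentence with its neighbour.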

2.4 Cloze-level controls

These control per-card behaviour:

# Max sentences per cloze text (validator + prompt docs)
CLAWLEARN_CLOZE_MAX_SENTENCES=3

# Min characters per cloze text. Too-short candidates are discarded (0 = no limit).
CLAWLEARN_CLOZE_MIN_CHARS=200

# Difficulty: beginner | intermediate | advanced
# Can be overridden by CLI --difficulty
CLAWLEARN_CLOZE_DIFFICULTY=intermediate

# Max number of cards per chunk (after dedupe); empty/0 = no per-chunk cap
CLAWLEARN_CLOZE_MAX_PER_CHUNK=4

# LLM chunk batch size: how many chunks to process per LLM call; 1 = per-chunk
CLAWLEARN_LLM_CHUNK_BATCH_SIZE=1

# Retry format-only validation failures (recover candidates rejected for format issues).
CLAWLEARN_VALIDATE_FORMAT_RETRY_ENABLE=true
# Retry attempts after initial validation failure (0-3).
CLAWLEARN_VALIDATE_FORMAT_RETRY_MAX=3
# Allow attempts >=2 to call LLM repair/regenerate.
CLAWLEARN_VALIDATE_FORMAT_RETRY_LLM_ENABLE=true

The LLM decides how many candidates to return per chunk (0-N).
CLOZE_MAX_PER_CHUNK is a safety cap applied after validation and dedupe; set to 0 or empty to disable.
Difficulty is now a first-class strategy selector (not only a prompt hint): it affects prompt family variant, validation, and ranking.
CLAWLEARN_LLM_CHUNK_BATCH_SIZE is always user-respected (no profile/difficulty hard override).

For material_profile=textbook_examples, if CLOZE_MIN_CHARS is above 120 and you do not override with --cloze-min-chars, the run is rejected.
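
The textbook_examples guard can be read as the following resolution rule. This is a sketch only: resolve_cloze_min_chars is a hypothetical name, and raising ValueError is an assumption about how the rejection surfaces.

```python
from typing import Optional

TEXTBOOK_MIN_CHARS_LIMIT = 120  # threshold stated in this README


def resolve_cloze_min_chars(
    material_profile: str,
    env_min_chars: int,
    cli_min_chars: Optional[int] = None,
) -> int:
    # An explicit --cloze-min-chars always wins.
    if cli_min_chars is not None:
        return cli_min_chars
    # Without an override, a high env value is rejected for textbook material.
    if material_profile == "textbook_examples" and env_min_chars > TEXTBOOK_MIN_CHARS_LIMIT:
        raise ValueError(
            "CLOZE_MIN_CHARS is above 120 for textbook_examples; pass --cloze-min-chars"
        )
    return env_min_chars
```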

2.5 Translation LLM (small LLM)

# Optional small LLM for translation; falls back to main LLM if empty.
CLAWLEARN_TRANSLATE_LLM_BASE_URL=
CLAWLEARN_TRANSLATE_LLM_API_KEY=
CLAWLEARN_TRANSLATE_LLM_MODEL=
CLAWLEARN_TRANSLATE_LLM_TEMPERATURE=
# Number of originals translated in one request (recommended: 4-8).
CLAWLEARN_TRANSLATE_BATCH_SIZE=4

If TRANSLATE_LLM_BASE_URL/MODEL are set, translations are generated by this small LLM (cheaper backend).
Otherwise, translations use the main cloze LLM.
CLAWLEARN_TRANSLATE_BATCH_SIZE controls batch size per request (recommended range: 4-8; start from 4).
Translation batches use request-level retries (max 3 attempts). For partial responses, successful items are consumed first and only remaining items are retried.
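
The fallback between the translation LLM and the main LLM might be resolved along these lines. The helper resolve_translate_llm and the returned dictionary shape are hypothetical, and requiring both BASE_URL and MODEL before switching backends is an assumption.

```python
def resolve_translate_llm(env: dict) -> dict:
    """Pick the translation backend from a mapping of environment variables."""
    base_url = env.get("CLAWLEARN_TRANSLATE_LLM_BASE_URL", "").strip()
    model = env.get("CLAWLEARN_TRANSLATE_LLM_MODEL", "").strip()
    if base_url and model:
        # Assumption: both values must be set to use the small LLM.
        return {
            "source": "translate",
            "base_url": base_url,
            "model": model,
            "api_key": env.get("CLAWLEARN_TRANSLATE_LLM_API_KEY", ""),
        }
    # Otherwise fall back to the main cloze LLM settings.
    return {
        "source": "main",
        "base_url": env.get("CLAWLEARN_LLM_BASE_URL", ""),
        "model": env.get("CLAWLEARN_LLM_MODEL", ""),
        "api_key": env.get("CLAWLEARN_LLM_API_KEY", ""),
    }
```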

2.X Secondary extraction (dual-LLM phrase extraction)

ClawLearn can optionally run a secondary extraction pass using a different LLM configuration. This is useful when your primary model is conservative (high precision) and a secondary model can provide recall, or when you want to compare/merge candidates from two models.

Enable via CLAWLEARN_SECONDARY_EXTRACT_ENABLE=true.
Secondary pass never aborts the whole run; errors are recorded and the build falls back to primary.
Candidates from both passes are deduped and merged. run_summary.json reports the unique gain.

Relevant env vars:

CLAWLEARN_SECONDARY_EXTRACT_ENABLE=false
CLAWLEARN_SECONDARY_EXTRACT_PARALLEL=false
CLAWLEARN_SECONDARY_EXTRACT_LLM_BASE_URL=
CLAWLEARN_SECONDARY_EXTRACT_LLM_API_KEY=
CLAWLEARN_SECONDARY_EXTRACT_LLM_MODEL=
CLAWLEARN_SECONDARY_EXTRACT_LLM_TIMEOUT_SECONDS=
CLAWLEARN_SECONDARY_EXTRACT_LLM_TEMPERATURE=
CLAWLEARN_SECONDARY_EXTRACT_LLM_MAX_RETRIES=
CLAWLEARN_SECONDARY_EXTRACT_LLM_RETRY_BACKOFF_SECONDS=
CLAWLEARN_SECONDARY_EXTRACT_LLM_CHUNK_BATCH_SIZE=
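
To make the dedupe-and-merge step and the reported unique gain concrete, here is a minimal sketch. It assumes candidates are plain phrase strings and that duplicates are detected by a case-insensitive, whitespace-normalized comparison; the real dedupe key and the function name merge_candidates are not taken from the source.

```python
def merge_candidates(primary: list, secondary: list) -> tuple:
    """Merge two candidate lists; return (merged, unique_gain_from_secondary)."""

    def key(phrase: str) -> str:
        # Assumption: dedupe ignores case and repeated whitespace.
        return " ".join(phrase.lower().split())

    seen = set()
    merged = []
    for phrase in primary:
        if key(phrase) not in seen:
            seen.add(key(phrase))
            merged.append(phrase)
    unique_gain = 0
    for phrase in secondary:
        if key(phrase) not in seen:
            seen.add(key(phrase))
            merged.append(phrase)
            unique_gain += 1
    return merged, unique_gain
```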

2.6 Prompts, templates, output

CLAWLEARN_CONTENT_PROFILE=prose_article
CLAWLEARN_MATERIAL_PROFILE=prose_article
CLAWLEARN_LEARNING_MODE=lingua_expression
CLAWLEARN_PROMPT_CLOZE=./prompts/cloze_contextual.json
CLAWLEARN_PROMPT_CLOZE_TEXTBOOK=./prompts/cloze_textbook_examples.json
CLAWLEARN_PROMPT_CLOZE_PROSE_BEGINNER=./prompts/cloze_prose_beginner.json
CLAWLEARN_PROMPT_CLOZE_PROSE_INTERMEDIATE=./prompts/cloze_prose_intermediate.json
CLAWLEARN_PROMPT_CLOZE_PROSE_ADVANCED=./prompts/cloze_prose_advanced.json
CLAWLEARN_PROMPT_CLOZE_TRANSCRIPT_BEGINNER=./prompts/cloze_transcript_beginner.json
CLAWLEARN_PROMPT_CLOZE_TRANSCRIPT_INTERMEDIATE=./prompts/cloze_transcript_intermediate.json
CLAWLEARN_PROMPT_CLOZE_TRANSCRIPT_ADVANCED=./prompts/cloze_transcript_advanced.json
CLAWLEARN_PROMPT_TRANSLATE=./prompts/translate_rewrite.json
# Preferred default prompt files by role (used when no CLI override)
CLAWLEARN_EXTRACT_PROMPT=
CLAWLEARN_EXPLAIN_PROMPT=
CLAWLEARN_PROMPT_LANG=zh
CLAWLEARN_ANKI_TEMPLATE=./templates/anki_cloze_default.json

# Intermediate run data (JSONL, media snapshots)
CLAWLEARN_OUTPUT_DIR=./runs
# Final exported decks (when --output is not provided)
CLAWLEARN_EXPORT_DIR=./outputs
CLAWLEARN_LOG_DIR=./logs
CLAWLEARN_LOG_LEVEL=INFO
CLAWLEARN_SAVE_INTERMEDIATE=true
CLAWLEARN_ALLOW_EMPTY_DECK=true
CLAWLEARN_DEFAULT_DECK_NAME=ClawLearn Default Deck

CLAWLEARN_MATERIAL_PROFILE chooses material strategy: prose_article, transcript_dialogue, textbook_examples.
CLAWLEARN_LEARNING_MODE defaults to lingua_expression (see supported modes in src/clawlearn/constants.py).
Prompt selection is now profile + difficulty driven:
- prose: cloze_prose_{beginner|intermediate|advanced}.json
- transcript: cloze_transcript_{beginner|intermediate|advanced}.json
- textbook_examples: cloze_textbook_examples.json
CLAWLEARN_CONTENT_PROFILE is kept as a backward-compatible alias.
CLAWLEARN_EXTRACT_PROMPT / CLAWLEARN_EXPLAIN_PROMPT can pin default prompt files directly (higher priority than profile-chain defaults, lower priority than CLI --extract-prompt / --explain-prompt).
CLAWLEARN_PROMPT_LANG controls which language variant is used for multi-lingual prompts (en or zh), and can be overridden by --prompt-lang.
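
The prompt resolution order described above (CLI override, then env pin, then profile + difficulty default) can be sketched as below. The function names are hypothetical and the sketch covers only the extraction prompt; file names follow the patterns listed in this section.

```python
from typing import Optional

PROFILE_FAMILY = {"prose_article": "prose", "transcript_dialogue": "transcript"}


def default_cloze_prompt(material_profile: str, difficulty: str) -> str:
    if material_profile == "textbook_examples":
        return "./prompts/cloze_textbook_examples.json"
    family = PROFILE_FAMILY[material_profile]
    return f"./prompts/cloze_{family}_{difficulty}.json"


def resolve_extract_prompt(
    material_profile: str,
    difficulty: str,
    env_pin: str = "",
    cli_override: Optional[str] = None,
) -> str:
    if cli_override:  # --extract-prompt has the highest priority
        return cli_override
    if env_pin:  # CLAWLEARN_EXTRACT_PROMPT comes next
        return env_pin
    return default_cloze_prompt(material_profile, difficulty)
```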

2.7 TTS (edge_tts)

CLAWLEARN_TTS_PROVIDER=edge_tts
CLAWLEARN_TTS_OUTPUT_FORMAT=mp3
CLAWLEARN_TTS_RATE=+0%
CLAWLEARN_TTS_VOLUME=+0%
CLAWLEARN_TTS_RANDOM_SEED=

# Voice lists per language; at least 3 voices per source language is recommended.
CLAWLEARN_TTS_EDGE_EN_VOICES=en-US-AnaNeural,en-US-AndrewNeural,en-GB-SoniaNeural
CLAWLEARN_TTS_EDGE_ZH_VOICES=zh-CN-XiaoxiaoNeural,zh-CN-YunxiNeural,zh-CN-liaoning-XiaobeiNeural
CLAWLEARN_TTS_EDGE_JA_VOICES=ja-JP-NanamiNeural,ja-JP-KeitaNeural,ja-JP-AoiNeural

ClawLearn uses edge_tts to synthesize audio for the Original sentence of each card, selecting a voice from the configured list based on source_lang.
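
Voice selection from the per-language lists could be sketched like this. The helper pick_voice is hypothetical, and both the uniform random choice and the way CLAWLEARN_TTS_RANDOM_SEED feeds the random generator are assumptions.

```python
import random


def pick_voice(env: dict, source_lang: str, rng: random.Random) -> str:
    """Choose one voice from CLAWLEARN_TTS_EDGE_<LANG>_VOICES for the given language."""
    raw = env.get(f"CLAWLEARN_TTS_EDGE_{source_lang.upper()}_VOICES", "")
    voices = [voice.strip() for voice in raw.split(",") if voice.strip()]
    if not voices:
        raise ValueError(f"no edge_tts voices configured for {source_lang!r}")
    # Assumption: each card gets a uniformly random voice from the list.
    return rng.choice(voices)
```

Creating one random.Random(seed) per run and reusing it for every card keeps runs reproducible while still varying the voice between cards.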

3. Cloze & translation prompts

3.1 Cloze prompt families

Cloze generation now uses prompt family + difficulty variant:

prose_article: cloze_prose_beginner.json, cloze_prose_intermediate.json, cloze_prose_advanced.json
transcript_dialogue: cloze_transcript_beginner.json, cloze_transcript_intermediate.json, cloze_transcript_advanced.json
textbook_examples: cloze_textbook_examples.json

Legacy cloze_contextual.json is still supported for backward compatibility.

Prompt fields support both legacy and multi-lingual formats:
- Legacy string format: "system_prompt": "...".
- Multi-lingual map: "system_prompt": { "en": "...", "zh": "..." }.
At runtime, the language variant is selected based on CLAWLEARN_PROMPT_LANG (or the --prompt-lang CLI override).
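
A loader that accepts both field formats might look like the sketch below. resolve_prompt_field is a hypothetical name, and failing with KeyError when the requested language is missing is an assumption (the real loader may fall back to another variant).

```python
def resolve_prompt_field(value, prompt_lang: str = "zh") -> str:
    """Return the prompt text for a legacy string or a multi-lingual map."""
    if isinstance(value, str):
        return value  # legacy format: one string for all languages
    if prompt_lang in value:
        return value[prompt_lang]
    raise KeyError(f"prompt has no {prompt_lang!r} variant")
```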

The cloze prompt uses source_lang, target_lang, learning_mode, difficulty, cloze_max_sentences, and a merged chunk_text (possibly containing multiple chunk blocks) as placeholders.

The expected output depends on the selected extraction prompt schema:

Phrase extraction pipeline (current default prompts, schema phrase_candidates_*):

{
  "chunk_id": "chunk_0001",
  "context_sentences": [
    "Sentence 1 copied verbatim from chunk_text.",
    "Sentence 2 copied verbatim from chunk_text."
  ],
  "phrases": [
    { "text": "short phrase copied verbatim" }
  ]
}

In this pipeline, the LLM does not output any cloze markers. Cloze markup is generated later by code by injecting selected phrase spans into the original sentence text.
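
The span-injection idea can be illustrated with a short sketch. It is not the project's implementation: inject_clozes is a hypothetical name, only the first occurrence of each phrase is wrapped, overlapping phrases are skipped, the <b> emphasis simply mirrors the legacy card format, and the parenthesized hint is left out.

```python
def inject_clozes(sentence: str, phrases: list) -> str:
    """Wrap each phrase found in `sentence` as {{cN::<b>phrase</b>}}, numbered left to right."""
    spans = []
    for phrase in phrases:
        start = sentence.find(phrase) if phrase else -1
        if start != -1:
            spans.append((start, start + len(phrase)))
    spans.sort()

    parts = []
    cursor = 0
    number = 0
    for start, end in spans:
        if start < cursor:
            continue  # overlaps a span that was already wrapped
        number += 1
        parts.append(sentence[cursor:start])
        parts.append("{{c%d::<b>%s</b>}}" % (number, sentence[start:end]))
        cursor = end
    parts.append(sentence[cursor:])
    return "".join(parts)
```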

Legacy cloze-cards pipeline (schema cloze_cards_*, kept for backward compatibility):

{
  "chunk_id": "chunk_0001_abcd12",
  "text": "The more {{c1::<b>whimsical explanation</b>}}(target-lang hint) is that maybe RL training makes the models a little too {{c2::<b>single-minded</b>}}(target-lang hint) and narrowly focused.",
  "original": "The more whimsical explanation is that maybe RL training makes the models a little too single-minded and narrowly focused.",
  "target_phrases": ["whimsical explanation", "single-minded"],
  "note_hint": "optional short hint"
}

Key rules (enforced via prompt + validator):

text must contain at least one cloze:
- cloze syntax: {{cN::...}} where N = 1, 2, 3...
- cloze inside uses <b>...</b> to emphasize the phrase.
- cloze is immediately followed by a short explanation in parentheses:
 - {{c1::<b>whimsical explanation</b>}}(target-lang hint).
original must not contain any cloze markers or HTML.
Each chunk may produce 0-4 high-quality cloze candidates.
Each candidate may contain multiple clozes (e.g. c1 and c2 in the same sentence).

The validator further:

normalizes single-brace clozes ({c1::...} -> {{c1::...}});
auto-injects a {{c1::...}} cloze from target_phrases when text has no clozes but phrases are present (fallback only);
rejects candidates that are too short (len(text) < CLOZE_MIN_CHARS) or exceed CLOZE_MAX_SENTENCES.
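
Two of the validator steps, single-brace normalization and the minimum-length check, can be sketched with regular expressions. The function names and patterns below are illustrative assumptions, not the validator's real code.

```python
import re

# A single-brace cloze such as {c1::phrase} that is not already double-braced.
_SINGLE_BRACE_CLOZE = re.compile(r"(?<!\{)\{(c\d+::[^{}]*)\}(?!\})")
_CLOZE = re.compile(r"\{\{c\d+::.+?\}\}")


def normalize_clozes(text: str) -> str:
    """Rewrite {cN::...} as {{cN::...}}; double-brace clozes are left alone."""
    return _SINGLE_BRACE_CLOZE.sub(r"{{\1}}", text)


def has_cloze(text: str) -> bool:
    return bool(_CLOZE.search(text))


def is_long_enough(text: str, cloze_min_chars: int) -> bool:
    # 0 means no limit, matching CLAWLEARN_CLOZE_MIN_CHARS.
    return cloze_min_chars <= 0 or len(text) >= cloze_min_chars
```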

NOTE: future work may include renumbering multiple c1 occurrences to c1, c2, c3 in order of appearance.

3.2 Textbook prompt: `prompts/cloze_textbook_examples.json`

Use this with --material-profile textbook_examples (or the deprecated alias --content-profile) for textbook-style entries that mix headwords, definitions, and example sentences. The prompt is tuned to:

ignore standalone headword/title lines;
ignore dictionary-style definition lines;
extract cloze candidates only from natural example sentences.

3.3 Translation prompt: `prompts/translate_rewrite.json`

The translation prompt follows the same multi-lingual structure support as the cloze prompts: system_prompt and user_prompt_template may be plain strings or { "en": "...", "zh": "..." } maps, selected via CLAWLEARN_PROMPT_LANG / --prompt-lang.

The translation prompt runs in batch mode. Input is an array of originals, and the LLM should return a JSON array in the same order:

[
  { "translation": "..." },
  { "translation": "..." }
]

Validator ensures:

non-empty translation;
no translation-prefix artifacts like "Translation:";
no Markdown ** (HTML <b> is allowed).
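
The three checks can be expressed as a small function. This is a sketch with a hypothetical name (translation_problems) and hypothetical problem labels; the real validator may detect more prefix variants than the literal "Translation:" used here.

```python
def translation_problems(translation: str) -> list:
    """Return a list of problem labels; an empty list means the translation passes."""
    problems = []
    stripped = translation.strip()
    if not stripped:
        problems.append("empty")
    if stripped.lower().startswith("translation:"):
        problems.append("prefix_artifact")
    if "**" in stripped:
        problems.append("markdown_bold")
    return problems
```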

Error/retry semantics:

Network/request-level failures (timeout, HTTP, full JSON parse failure) are retried up to 3 times for the current remaining batch.
When a batch response is incomplete, successful entries are accepted and only the remaining entries are retried (up to 3 attempts total).
Content/validation failures are not retried; they follow --continue-on-error semantics.
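
The retry rules above can be sketched as a loop over the remaining items. Everything here is illustrative: translate_with_retries and call_llm are hypothetical names, call_llm is assumed to return translations in request order (possibly fewer than requested), and request-level failures are assumed to surface as TimeoutError, ConnectionError, or ValueError (which covers JSON parse errors).

```python
def translate_with_retries(originals: list, call_llm, max_attempts: int = 3) -> tuple:
    """Return (translations, remaining_indexes) after at most `max_attempts` requests."""
    translations = [None] * len(originals)
    remaining = list(range(len(originals)))
    for _attempt in range(max_attempts):
        if not remaining:
            break
        try:
            batch = call_llm([originals[i] for i in remaining])
        except (TimeoutError, ConnectionError, ValueError):
            continue  # request-level failure: retry the same remaining items
        # Accept what came back; only the unanswered tail is retried.
        for index, translation in zip(remaining, batch):
            translations[index] = translation
        remaining = remaining[len(batch):]
    return translations, remaining
```

Items still listed in the second return value after the last attempt would then follow --continue-on-error semantics.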

4. CLI commands

The entrypoint module is src/clawlearn/cli.py, exposing a Typer-based CLI. You can run it either as a module or via an installed entrypoint (if configured in your environment).

4.1 `init`

python -m clawlearn.cli init

Creates .env from .env.example if needed.
Verifies required prompt/template files.
Optionally prepares an output directory.

4.2 `doctor`

python -m clawlearn.cli doctor --env-file .env

Performs a series of checks:

Python dependencies (edge_tts, genanki, httpx, typer)
base config (paths, prompt/template files)
runtime config (LLM, TTS voices)
cloze/translate prompt schema
primary LLM connectivity (CLAWLEARN_LLM_*)
translation LLM config & connectivity (CLAWLEARN_TRANSLATE_LLM_*)
output directory writability
cloze control summary (max_sentences / min_chars / difficulty / max_per_chunk / material_profile / learning_mode)
TTS voices for default_source_lang

4.3 `build deck`

Core command:

# Synopsis: values separated by | are alternatives; pass exactly one.
python -m clawlearn.cli lingua build deck INPUT \
  --source-lang en \
  --target-lang zh \
  --material-profile prose_article|transcript_dialogue|textbook_examples \
  --learning-mode lingua_expression|lingua_reading \
  --lingua-annotate \
  --lingua-annotate-batch-size 50 \
  --lingua-annotate-max-items 200 \
  --extract-prompt ./prompts/cloze_transcript_advanced.json \
  --explain-prompt ./prompts/translate_rewrite.json \
  --verbose \
  --input-char-limit 4000 \
  --env-file .env \
  --output deck.apkg \
  --deck-name "My Cloze Deck" \
  --max-chars 1500 \
  --cloze-min-chars 60 \
  --max-notes 200 \
  --temperature 0.2 \
  --difficulty beginner|intermediate|advanced \
  --prompt-lang en|zh \
  --save-intermediate \
  --continue-on-error \
  --debug

Where:

INPUT: path to .txt/.md/.epub file.
--source-lang / --target-lang override defaults from env.
--material-profile selects material strategy and cloze prompt family.
--learning-mode selects the pipeline behavior. For lingua pipelines: lingua_expression|lingua_reading. For textbook pipelines: textbook_focus|textbook_review.
--content-profile is kept as a deprecated alias of --material-profile.
--input-char-limit lets you process only the first N characters for quick tests.
--difficulty overrides CLAWLEARN_CLOZE_DIFFICULTY.
--prompt-lang overrides CLAWLEARN_PROMPT_LANG for multi-lingual prompts.
--extract-prompt overrides extraction prompt file for this run.
--explain-prompt overrides explanation prompt file for this run.
--max-chars overrides CLAWLEARN_CHUNK_MAX_CHARS for this run.
--cloze-min-chars overrides CLAWLEARN_CLOZE_MIN_CHARS for this run.
In textbook_examples profile, runs are rejected when env CLOZE_MIN_CHARS > 120 unless you explicitly provide --cloze-min-chars.
--max-notes imposes a global cap on number of notes.
--save-intermediate dumps intermediates under CLAWLEARN_OUTPUT_DIR/<run_id>.
When --output is not provided, the final deck is written to CLAWLEARN_EXPORT_DIR/<run_id>/output.apkg.
--continue-on-error logs and skips individual failures instead of aborting.
--debug makes _run_guard re-raise exceptions with tracebacks.
By default, deck name uses the input file name (without extension); --deck-name overrides it.
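
The output-path and deck-name defaults can be summarized in a few lines. This is a sketch with hypothetical helper names; the run_id format is not specified in this README, so "run_001" in the usage note is only a placeholder.

```python
from pathlib import Path
from typing import Optional


def resolve_output_path(run_id: str, export_dir: str = "./outputs", output: Optional[str] = None) -> Path:
    # --output wins; otherwise use CLAWLEARN_EXPORT_DIR/<run_id>/output.apkg.
    if output:
        return Path(output)
    return Path(export_dir) / run_id / "output.apkg"


def resolve_deck_name(input_path: str, deck_name: Optional[str] = None) -> str:
    # --deck-name wins; otherwise use the input file name without its extension.
    return deck_name or Path(input_path).stem
```

For example, resolve_deck_name("./podcast_transcript.md") gives "podcast_transcript", and resolve_output_path("run_001") points at outputs/run_001/output.apkg.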

4.4 `prompt validate`

python -m clawlearn.cli prompt validate ./prompts/cloze_prose_intermediate.json
python -m clawlearn.cli prompt validate ./prompts/cloze_textbook_examples.json

Validates a prompt file against the expected JSON schema (see src/clawlearn/llm/prompt_loader.py).

4.5 `config show` / `config validate`

python -m clawlearn.cli config show --env-file .env
python -m clawlearn.cli config validate --env-file .env

config show prints resolved AppConfig as JSON.
config validate runs config validation without building a deck.

5. Output format

Using the default Anki template (templates/anki_cloze_default.json), each card has at least the following fields:

Text: the cloze sentence with {{cN::...}} markers, HTML <b> for emphasis, and optional translations in parentheses.
Original: the original sentence(s) without cloze markers or HTML.
Translation: target-language translation of Original.
Note: metadata (source title, chunk id, target phrases).
Audio: edge_tts-generated audio for Original (via [sound:xxx.mp3]).

The exact Anki field mapping is defined in the JSON template; you can customize it if you want different field names or card faces.
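
As an illustration of how one card maps onto these five fields, here is a minimal sketch. The card dictionary keys (including audio_file) and the helper name build_note_fields are hypothetical; the actual mapping comes from the JSON template.

```python
FIELD_ORDER = ["Text", "Original", "Translation", "Note", "Audio"]


def build_note_fields(card: dict) -> list:
    """Return field values in template order, rendering the audio as an Anki sound tag."""
    values = dict(card)
    audio_file = card.get("audio_file", "")  # hypothetical key holding the mp3 file name
    values["Audio"] = f"[sound:{audio_file}]" if audio_file else ""
    return [values.get(name, "") for name in FIELD_ORDER]
```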

6. Migration notes (V1 -> V2)

If you have older scripts/configs, here are the key changes:

Subcommands:
- New preferred command: python -m clawlearn.cli lingua build deck ...
- python -m clawlearn.cli build deck ... is kept as a deprecated alias.
Profiles:
- Prefer --material-profile / CLAWLEARN_MATERIAL_PROFILE.
- --content-profile / CLAWLEARN_CONTENT_PROFILE is a deprecated alias.
- Legacy content_profile=general maps to material_profile=prose_article.
Learning mode:
- CLAWLEARN_LEARNING_MODE defaults to lingua_expression.
- Supported modes are listed in src/clawlearn/constants.py.
Extraction schema:
- Current default prompts output phrase_candidates_* JSON (no cloze markup).
- Cloze markup is generated by code by injecting phrase spans.
- Legacy prompts may output cloze_cards_* JSON (cloze markup produced by the LLM).

7. Typical workflow

Set up .env
- Configure LLM endpoints (primary + optional translation LLM).
- Configure chunking & cloze controls.
- Configure TTS voices for your source language.

Run doctor

python -m clawlearn.cli doctor --env-file .env

Build a deck from a podcast transcript

python -m clawlearn.cli lingua build deck ./podcast_transcript.md \
  --source-lang en --target-lang zh --env-file .env \
  --material-profile transcript_dialogue --learning-mode lingua_expression \
  --difficulty intermediate --max-chars 1500 \
  --save-intermediate --continue-on-error --verbose

Import the generated .apkg into Anki and review cards.
Inspect intermediates (optional) under ./runs/<run_id>:
- chunks.jsonl: chunked text
- text_candidates.raw.jsonl: raw cloze candidates from LLM
- text_candidates.validated.jsonl: candidates that passed validation
- translations.jsonl: translations per card
- cards.final.jsonl: final card data before export
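
Since the intermediates are JSONL files (one JSON object per line), they can be inspected with a few lines of standard-library Python. The helper below is a generic sketch; it makes no assumption about which keys each record contains.

```python
import json
from pathlib import Path


def read_jsonl(path) -> list:
    """Load a JSONL file into a list of dictionaries, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                records.append(json.loads(line))
    return records
```

For example, len(read_jsonl("./runs/<run_id>/cards.final.jsonl")) gives the number of final cards once <run_id> is replaced with a real run directory.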

8. Optional web UI

For users who prefer a browser-based interface, ClawLearn ships an optional local-only web UI built with Gradio. This does not change the core CLI behaviour and is only started when explicitly invoked.

8.1 Installation

Install the web extra (in addition to the core dependencies):

pip install .[web]

8.2 Launching the web UI

From the project root:

clawlearn-web
# or
python -m clawlearn_web.app

This starts a Gradio app bound to 127.0.0.1:7860. Open http://127.0.0.1:7860 in your browser.

The web UI has three tabs:

Run - upload a .txt/.md/.epub file, select source/target language, content profile, difficulty, and per-run overrides (max notes, input char limit, cloze min chars, chunk max chars, temperature). The backend calls the same run_build_deck pipeline and writes intermediate data to CLAWLEARN_OUTPUT_DIR/<run_id> and the final deck to CLAWLEARN_EXPORT_DIR/<run_id>/output.apkg.
Config - a .env editor for common CLAWLEARN_* settings (LLM endpoints, chunk/cloze defaults, prompt language, output/log directories, default deck name, TTS, etc.). Saving changes writes a new .env, validates it via clawlearn.config.validate_base_config + validate_runtime_config, and rolls back on failure. The "Load defaults" button loads values from ENV_EXAMPLE.md into the form without writing to disk. The Config tab also provides "List models" / "Test connectivity" helpers for both the Extraction LLM and the Explanation LLM using their /models endpoints, and includes CLAWLEARN_EXTRACT_PROMPT / CLAWLEARN_EXPLAIN_PROMPT dropdowns.
Prompt - manage prompt files in ./prompts with New / Save / Rename / Delete. New prompt creation loads role templates (template_extraction.json or template_explanation.json) based on selected Prompt type (Extraction/Explanation). Save/Delete require explicit confirmation in the UI. Delete is guarded: the app refuses to remove the last Extraction prompt or the last Explanation prompt.

The web UI is optional; OpenClaw skills and automated usage should continue calling the CLI directly.

9. Known limitations / future work

Cloze numbering: when multiple clozes appear in a single text, we may need to renumber them deterministically (c1, c2, c3) in order of appearance.
Cloze formatting: current prompt encourages the {{cN::phrase}}(translation) style, but behaviour still depends on the chosen LLM and may require further prompt tuning.
Tests: the original tests/ directory has been removed from main; if you extend the project, consider reintroducing a focused test suite.

For a Chinese overview and usage guide, see README_zh.md.
