Add --format md|html|xml to socrates pack #1

Merged
CryptoJones merged 2 commits into main from feat/pack-format-options on May 20, 2026

@CryptoJones
Owner

Summary

Adds socrates pack --format {md,html,xml} so the Architect pack can be emitted in formats other than markdown.

Motivation: the planning files (STATE, DECISIONS, RISKS, etc.) must stay markdown, because operators edit them in $EDITOR and they're git-tracked. But the pack is paste-bait for a browser LLM, and there's some evidence that XML-style structural delimiters (or full HTML for table-heavy content) can improve the Architect's reading of the bundle. Putting the option on pack lets users A/B on their own content without disturbing the source-of-truth files.

Formats

| Format | Token overhead | Notes |
| --- | --- | --- |
| md (default) | baseline | Historical pack behavior; nothing changes for existing users. |
| xml | ~5% | Markdown bodies stay markdown, wrapped in `<section>` / `<header>` / `<footer>` tags with path + label attributes. Matches Anthropic's published recommendation to use XML-style tags for structural delimitation when packing context for Claude. |
| html | ~30-50% | Full HTML, converted from markdown via the optional markdown library. Install via `pip install socrates120x[html]`. Use when explicitly A/B-testing whether HTML helps your Architect on your content. |

Output extension follows the format: .socrates-architect-pack.{md,xml,html}.
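
As a rough illustration of the extension rule above, the sketch below maps a format name to an output path. `PACK_STEM`, `FORMATS`, and `pack_path` are hypothetical names chosen for this example; the actual helper in pack.py may look different.

```python
from pathlib import Path

# Hypothetical names for illustration only; not the project's real API.
PACK_STEM = ".socrates-architect-pack"
FORMATS = ("md", "xml", "html")


def pack_path(project_root: Path, fmt: str = "md") -> Path:
    """Return the pack's output path, with the extension matching the format."""
    if fmt not in FORMATS:
        raise ValueError(f"unknown format {fmt!r}; expected one of {FORMATS}")
    return project_root / f"{PACK_STEM}.{fmt}"
```

Calling `pack_path(Path("."), "xml")` would give a path whose file name is `.socrates-architect-pack.xml`, and omitting the format falls back to `.md`.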

Implementation notes

  • Refactored pack.py to collect a list of _Section dataclasses in one pass, then dispatch to a format-specific renderer. Same section ordering across all three formats.
  • The markdown package is a lazy import (only loaded when the user passes --format html), gated behind a clear RuntimeError that tells the operator how to install it. Added as the [html] optional extra in pyproject.toml.
  • XML special chars (<, >, &) in user content are escaped in the XML output so a stray <X> in DECISIONS doesn't break the bundle.
  • Fixed a pre-existing minor bug: section labels were leaking the developer's absolute path (e.g. /tmp/pytest-of-akclark/.../AGENTS.md) into the output. Now uses project-relative paths in all formats.
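
The collect-then-dispatch shape described in these notes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in pack.py: `Section`, `render_md`, `render_xml`, and `render_pack` are hypothetical names, the exact tag layout is assumed, and only two of the three formats are shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List
from xml.sax.saxutils import escape, quoteattr


@dataclass(frozen=True)
class Section:
    # Illustrative fields; the real _Section dataclass may differ.
    label: str
    body: str
    path: str  # project-relative path, never an absolute one
    kind: str


def render_md(sections: List[Section]) -> str:
    """Plain markdown: one heading per section, body left untouched."""
    return "\n\n".join(f"## {s.label}\n\n{s.body}" for s in sections)


def render_xml(sections: List[Section]) -> str:
    """Wrap each markdown body in a <section> tag, escaping <, > and &."""
    parts = []
    for s in sections:
        attrs = f"path={quoteattr(s.path)} label={quoteattr(s.label)}"
        parts.append(f"<section {attrs}>\n{escape(s.body)}\n</section>")
    return "\n".join(parts)


# One renderer per format; every renderer sees the same ordered list.
RENDERERS: Dict[str, Callable[[List[Section]], str]] = {
    "md": render_md,
    "xml": render_xml,
}


def render_pack(sections: List[Section], fmt: str = "md") -> str:
    """Dispatch the collected sections to the renderer for `fmt`."""
    if fmt not in RENDERERS:
        raise ValueError(f"unknown format {fmt!r}")
    return RENDERERS[fmt](sections)
```

Because every renderer receives the same list, section ordering stays identical across formats, and a stray `<X>` in a body is emitted as `&lt;X&gt;` in the XML output rather than as a tag.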

Tests

  • 12 new unit tests on build_pack / write_pack: default-is-md, unknown-format-rejection, md headers vs. xml/html tags, XML section attributes, XML escape of user content, HTML doctype/structure, file-extension picking, content survival across all formats, missing-markdown error path.
  • 8 new CLI-level tests in a new tests/test_cli_pack.py that drive cli.main directly. The project had no CLI tests before, so this also establishes the harness pattern. Covers: dest/choices/default, stdout-vs-write branching, argparse invalid-choice rejection, the RuntimeError-to-exit-code-2 path when markdown is missing, and --help text listing all three choices.

147 tests pass (was 127 before this PR). Ruff clean. Mypy clean.

Out of scope (deliberately)

  • No version bump in pyproject.toml. The project's convention is release-summary commits (`v0.7.0: cache segments, preamble template, socrates decide`), not per-feature bumps. Bump to v0.8.0 at release time.
  • No CHANGELOG entry — the project doesn't keep one.

Test plan

  • pytest -v clean
  • ruff check . clean
  • mypy src clean
  • socrates pack --format xml --stdout from a real project, paste into a Claude/ChatGPT session, confirm Architect can parse the section tags
  • pip install socrates120x[html] then socrates pack --format html works
  • pip install socrates120x (without [html]), then socrates pack --format html fails with the install hint on stderr and exit code 2

Proudly Made in Nebraska. Go Big Red! 🌽 https://xkcd.com/2347/

🤖 Generated with Claude Code

CryptoJones and others added 2 commits May 19, 2026 23:09
The Architect pack has only ever shipped one output format (markdown).
That's the right default — operators read and edit these files in
$EDITOR, and the source-of-truth planning files stay clean for git
diffs. But for the pack itself, where the only consumer is a browser-
chat LLM, the right shape might differ:

- `md` (default): historical behavior, lowest token overhead
- `xml`: markdown bodies wrapped in `<section>` / `<header>` /
  `<footer>` tags with `path` and `label` attributes. Matches
  Anthropic's published recommendation to use XML-style tags for
  structural delimitation when packing context for Claude. ~5%
  token overhead vs. plain markdown.
- `html`: full HTML, converted from markdown via the optional
  `markdown` library. ~30-50% token overhead. Use when explicitly
  A/B-testing whether full HTML helps the Architect on specific
  content.

Implementation details:

- Refactored pack.py to collect a list of `_Section` dataclasses
  (label, body, path, kind) in one pass, then dispatch to a
  format-specific renderer. Same section ordering across all three
  formats.
- The `markdown` package is a lazy import (only loaded when the user
  passes `--format html`), gated behind a clear RuntimeError that
  tells the operator how to install it. Added as the `[html]`
  optional extra in pyproject.toml.
- File extension follows the format: `.socrates-architect-pack.{md,xml,html}`.
- XML special chars in user content (`<`, `>`, `&`) are escaped in
  the XML output so a stray `<X>` in a DECISIONS file doesn't break
  the bundle.
- Fixed a pre-existing minor bug: section labels were leaking the
  developer's absolute path (e.g. `/tmp/pytest-of-akclark/.../AGENTS.md`)
  into the output. Now uses project-relative paths in all formats.
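
A lazy, gated import of this kind might look like the sketch below. `load_markdown_converter` is a hypothetical name and the error wording is assumed; only the `markdown` package name and the `socrates120x[html]` extra come from the description above.

```python
import importlib


def load_markdown_converter():
    """Import the optional `markdown` package only when HTML is requested.

    Hypothetical helper for illustration; not the project's actual function.
    """
    try:
        markdown = importlib.import_module("markdown")
    except ImportError as exc:
        # Turn the bare import failure into an actionable message.
        raise RuntimeError(
            "HTML output needs the optional 'markdown' package. "
            "Install it with: pip install socrates120x[html]"
        ) from exc
    # markdown.markdown(text) converts a markdown string to an HTML string.
    return markdown.markdown
```

Keeping the import inside the function means users of the `md` and `xml` formats never touch the dependency, and the failure surfaces only on the code path that needs it.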

Tests:

- 12 new tests covering: default-is-md, unknown-format-rejection,
  md headers vs. xml/html tags, XML section attributes, XML escape
  of user content, HTML doctype/structure, file-extension picking,
  content survival across all formats, missing-`markdown` error path.
- All 127 pre-existing tests still pass (139 total with the new ones).
- Ruff + mypy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Self-review of the prior commit ("Did you update test harnesses?") caught
that the new --format argparse wiring had no CLI-level coverage. The
build_pack/write_pack unit tests exercise the rendering, but the CLI
plumbing (dest name, choices, default, stdout-vs-write branching, the
markdown-missing error path through sys.exit 2, and the --help text)
was completely untested. A typo in `args.pack_format` or a wrong dest
would have shipped.

The project has no other CLI-level tests today, so this also establishes
the harness pattern (drive `socrates120x.cli.main` directly, use
capsys + monkeypatch) that future CLI tests can follow.
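
To make the harness idea concrete, here is a self-contained sketch. The `main` below is a toy stand-in, not `socrates120x.cli.main`, and `run_cli` is a hypothetical helper that captures output with `contextlib` so the example runs without pytest; in a real pytest suite, `capsys` would take over that job.

```python
import argparse
import contextlib
import io


def main(argv=None) -> int:
    """Toy CLI entry point; the real cli.main is assumed to differ."""
    parser = argparse.ArgumentParser(prog="socrates")
    sub = parser.add_subparsers(dest="command", required=True)
    pack = sub.add_parser("pack")
    pack.add_argument(
        "--format",
        dest="pack_format",
        choices=["md", "html", "xml"],
        default="md",
    )
    pack.add_argument("--stdout", action="store_true")
    args = parser.parse_args(argv)
    print(f"format={args.pack_format} stdout={args.stdout}")
    return 0


def run_cli(argv):
    """Call main(argv) in-process and return (exit_code, stdout, stderr)."""
    out, err = io.StringIO(), io.StringIO()
    with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
        try:
            code = main(argv)
        except SystemExit as exc:
            # argparse exits via SystemExit on invalid input.
            code = exc.code
    return code, out.getvalue(), err.getvalue()
```

With this shape, `run_cli(["pack", "--format", "pdf"])` yields exit code 2 and an "invalid choice" message on stderr, which is the argparse behavior the CLI tests rely on.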

Coverage:

- default (no --format) writes .md
- explicit --format md / xml / html each write the matching extension
- --stdout + --format xml prints to stdout and does NOT write a file
- --format with an unknown value triggers argparse's invalid-choice
  exit (SystemExit code 2 + "invalid choice" on stderr)
- --format html with the markdown package missing returns exit code 2
  with an install hint on stderr (verifies the CLI's try/except around
  the RuntimeError, not just the underlying build_pack error)
- --help output contains the --format flag and lists all three choices

8 new tests; 147/147 total pass; ruff + mypy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@CryptoJones CryptoJones merged commit 86ef768 into main May 20, 2026
3 checks passed
@CryptoJones CryptoJones deleted the feat/pack-format-options branch May 20, 2026 04:33
CryptoJones added a commit that referenced this pull request May 20, 2026
One feature since v0.7.0:

- `socrates pack --format {md,html,xml}` (#1): the Architect pack can now
  be emitted in formats other than markdown. `md` (default) is unchanged.
  `xml` wraps markdown bodies in `<section>` / `<header>` / `<footer>` tags
  with path + label attributes — matches Anthropic's recommendation for
  structural delimitation when packing context for Claude (~5% token
  overhead). `html` produces full HTML via the optional `markdown` lib
  (install with `pip install socrates120x[html]`, ~30-50% token overhead).
  Output extension follows the format. 20 new tests (12 unit + 8 CLI).

Also in this release:

- First CLI-level test harness for the project (tests/test_cli_pack.py).
  Drives `cli.main` directly via argparse; pattern can be reused for any
  future CLI test.
- `[html]` optional extra in pyproject.toml so users who never need HTML
  output don't pull the `markdown` dependency.
- Fixed a pre-existing bug where pack output leaked the developer's
  absolute path (e.g. `/tmp/pytest-of-akclark/.../AGENTS.md`) into
  section labels. Now uses project-relative paths in all formats.
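
One way to derive project-relative labels is sketched below. `relative_label` is a hypothetical helper, and the fallback to the bare file name for paths outside the project is an assumption made for this example, not something the release notes specify.

```python
from pathlib import Path


def relative_label(path: Path, project_root: Path) -> str:
    """Return `path` relative to `project_root`, using forward slashes.

    Hypothetical helper; falls back to the file name when `path` lies
    outside the project so no absolute path reaches the output.
    """
    try:
        return path.resolve().relative_to(project_root.resolve()).as_posix()
    except ValueError:
        return path.name
```

Under these assumptions, a file at `<root>/docs/STATE.md` is labeled `docs/STATE.md` regardless of where the project is checked out.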

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>