feat(cli): add auto command with intelligent template matching #80
chenmofei wants to merge 16 commits into
PerishCode left a comment:
@chenmofei — thanks for the detailed write-up; the three-layer matching design and the README decision flowchart make the intent easy to follow.
This review covers the auto command added in c3dbcee — cli/src/skills-matcher.ts, the handleAuto additions in cli/src/index.ts, and the cli/README.md docs. Three inline comments:
- Blocking: strongSignalMatch keyword matching. Short ASCII keywords (most clearly the bare "X") substring-match common English words and mis-route most English input to social-x-post-card.
- Non-blocking: --force-ai does not actually force the AI path; it only skips Layer 1, so the documented behavior is not met.
- Non-blocking: the auto stdin-to-stdout path lacks the EPIPE guard that convert has, so piping the output onward can crash.
For context, cli/ has no test runner, so the matcher ships without automated coverage; a concrete suggestion is in the blocking comment.
const matched = keywords.filter((kw) => lower.includes(kw.toLowerCase()));
if (matched.length > 0) {
  const confidence = matched.length * 2 + 3;
  if (!best || confidence > best.confidence) {
Blocking: strong-signal keyword matching produces false positives for short ASCII keywords.
strongSignalMatch matches each keyword with keywords.filter((kw) => lower.includes(kw.toLowerCase())) — unbounded substring matching on lowercased content. Several STRONG_SIGNALS keywords are 1-3 ASCII characters that occur inside common English words, so they match content that has nothing to do with the template.
The clearest case is the bare "X" keyword on line 26 (["推特", "twitter", "X", "tweet", ...], "social-x-post-card"). After toLowerCase() it becomes "x", so lower.includes("x") is true for almost any English-ish input — text, next, example, experience, index, box, max all contain x. A single hit yields confidence = 1 * 2 + 3 = 5, which clears CONFIDENCE_THRESHOLD (3). Other examples: "doc" matches document/doctor, "app" matches apply/happy, "soft" matches software/Microsoft, "live" matches delivery/alive, "RED" to "red" matches required/covered, plus the everyday words "done"/"doing"/"todo".
Why it matters: social-x-post-card sits early in the STRONG_SIGNALS array, and strongSignalMatch keeps the first rule at the maximum confidence (confidence > best.confidence). Any ordinary English document containing the letter x, with no stronger signal, is routed to social-x-post-card instead of falling through to Layer-2 scoring or the AI fallback. That directly defeats the PR's stated goal of matching the best template, for a large class of inputs.
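To make the failure mode concrete, here is a minimal reproduction of the naive matcher. It is a sketch, not the PR's code: the two-rule RULES table, the Match shape, and naiveStrongSignalMatch are hypothetical. Only the includes-based filter, the matched.length * 2 + 3 confidence formula, the strict > tie-break, and the threshold of 3 come from the diff and review above.

```typescript
// Hypothetical rule table with the same shape as STRONG_SIGNALS: [keywords, templateId].
type Rule = [keywords: string[], templateId: string];

interface Match {
  templateId: string;
  confidence: number;
}

const CONFIDENCE_THRESHOLD = 3;

const RULES: Rule[] = [
  [["twitter", "X", "tweet"], "social-x-post-card"],
  [["resume", "CV"], "resume-modern"],
];

// Naive Layer 1: unbounded substring matching on lowercased content.
function naiveStrongSignalMatch(content: string): Match | null {
  const lower = content.toLowerCase();
  let best: Match | null = null;
  for (const [keywords, templateId] of RULES) {
    const matched = keywords.filter((kw) => lower.includes(kw.toLowerCase()));
    if (matched.length > 0) {
      const confidence = matched.length * 2 + 3;
      // Strict ">" means an earlier rule wins ties.
      if (!best || confidence > best.confidence) {
        best = { templateId, confidence };
      }
    }
  }
  return best && best.confidence >= CONFIDENCE_THRESHOLD ? best : null;
}

// "next", "example" and "experience" all contain "x", so this ordinary
// sentence is routed to social-x-post-card with confidence 5.
console.log(naiveStrongSignalMatch("Here is the next example from my experience."));
```

A single stray "x" already clears the threshold, and because ties keep the earlier rule, nothing later in the table can displace it at equal confidence.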
Suggested fix: match ASCII keywords on word boundaries while keeping substring matching for CJK (which has no word boundaries). Classify each keyword — if it is pure ASCII (e.g. matches /^[\x00-\x7F]+$/), test it with a word-boundary regex such as new RegExp("\\b" + escapeRegExp(kw) + "\\b", "i") against the content; otherwise keep the current includes. At minimum, remove or repair the ambiguous keywords ("X", "doc", "app", "soft", "live", "red", "done", "doing"). cli/ currently has no test runner — adding one with a fixture matrix (input snippet to expected templateId, including ambiguous-keyword cases) would catch regressions like this.
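A minimal sketch of the suggested classification, assuming a (content, keyword) argument order. escapeRegExp is a hypothetical local helper (the diff does not show one), and the ASCII test is the /^[\x00-\x7F]+$/ check proposed above.

```typescript
// Escape regex metacharacters so each keyword is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Pure-ASCII keywords get word-boundary matching; anything else (e.g. CJK) does not.
const ASCII_ONLY = /^[\x00-\x7F]+$/;

function kwMatches(content: string, kw: string): boolean {
  if (ASCII_ONLY.test(kw)) {
    // "\b" only sits between a word and a non-word character, so "x" no longer
    // matches inside "next", and "app" no longer matches inside "happy".
    return new RegExp("\\b" + escapeRegExp(kw) + "\\b", "i").test(content);
  }
  // CJK text has no spaces between words, so substring matching is kept.
  return content.toLowerCase().includes(kw.toLowerCase());
}
```

One caveat: \b only works next to word characters, so an ASCII keyword that starts or ends with punctuation (say "c++") would never match as written and would need a lookaround-based boundary instead. Since JavaScript's \w is ASCII-only, CJK characters count as non-word characters, so an ASCII keyword written directly against Chinese text (as in "发推到twitter") still matches, which is usually what you want.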
Thanks for the thorough analysis. Fixed in the latest push (also synced with #75):
- Added kwMatches(): \b word boundary for ASCII keywords, includes for CJK. Applied to strongSignalMatch and ruleMatch.
- Removed "X", "RED", "TODO", "done", "doing", "todo" from STRONG_SIGNALS.
- 39 unit tests covering all identified false-positive cases ("x" in "next" → false, "app" in "happy" → false, etc.).
const rule = fallbackMatch(content, templates);
if (rule && rule.confidence >= CONFIDENCE_THRESHOLD) {
  rule.confidence = Math.min(rule.confidence, 10);
  return rule;
}
--force-ai does not actually force the AI summary path.
matchTemplate consults forceAi only to skip Layer 1 (strongSignalMatch, gated by !forceAi on line 226). Layer 2 still runs unconditionally: fallbackMatch is called on line 234, and lines 235-238 return the rule match whenever its confidence reaches CONFIDENCE_THRESHOLD. So for any input where rule scoring is even moderately confident, --force-ai never reaches aiSummaryMatch on line 254.
Why it matters: the auto help text describes --force-ai as "Force AI summary for matching", and cli/README.md documents it as "跳过关键词匹配,强制使用 AI summary" (skip keyword matching, force the AI summary). The flag silently does something narrower than documented — a user who passes --force-ai to override a bad keyword/rule guess still gets that rule guess. This is one of only two flags the command adds, so the contract mismatch is user-visible.
Suggested fix: gate the Layer-2 short-circuit on !forceAi as well — if (!forceAi && rule && rule.confidence >= CONFIDENCE_THRESHOLD) { ... } — so --force-ai falls through to aiSummaryMatch. Note also that with forceAi set and content shorter than MIN_CONTENT_LENGTH_FOR_AI (the content.length < MIN_CONTENT_LENGTH_FOR_AI branch on line 240), AI is still never invoked; either document that --force-ai is a no-op for short content, or update the README so it only claims to skip Layer-1 keyword matching.
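The control flow with that fix applied can be sketched as follows. The layer functions are injected only so the branching can be exercised in isolation. The Match/Layers shapes, the MIN_CONTENT_LENGTH_FOR_AI value, and the final default-template fallback are assumptions, not what skills-matcher.ts actually does.

```typescript
interface Match {
  templateId: string;
  confidence: number;
}

// Hypothetical layer interface; the real module calls its own functions directly.
interface Layers {
  strongSignal: (content: string) => Match | null; // Layer 1
  rule: (content: string) => Match | null; // Layer 2
  ai: (content: string) => Promise<Match | null>; // Layer 3
  fallback: Match; // used when no layer produces a confident match
}

const CONFIDENCE_THRESHOLD = 3;
const MIN_CONTENT_LENGTH_FOR_AI = 200; // hypothetical value

async function matchTemplate(content: string, forceAi: boolean, layers: Layers): Promise<Match> {
  // Layer 1 is skipped entirely under --force-ai.
  if (!forceAi) {
    const strong = layers.strongSignal(content);
    if (strong && strong.confidence >= CONFIDENCE_THRESHOLD) return strong;
  }
  // Layer 2's early return is gated on !forceAi as well, so --force-ai falls through.
  const rule = layers.rule(content);
  if (!forceAi && rule && rule.confidence >= CONFIDENCE_THRESHOLD) {
    return { ...rule, confidence: Math.min(rule.confidence, 10) };
  }
  // Layer 3 still needs enough content to summarize, even with --force-ai.
  if (content.length >= MIN_CONTENT_LENGTH_FOR_AI) {
    const ai = await layers.ai(content);
    if (ai) return ai;
  }
  return layers.fallback;
}
```

The last branch is the short-content case flagged in the review: below MIN_CONTENT_LENGTH_FOR_AI, --force-ai never reaches the AI layer, so either the help text or that branch has to give.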
Fixed: Layer-2 short-circuit now gated on !forceAi:
if (!forceAi && rule && rule.confidence >= CONFIDENCE_THRESHOLD) {
  return rule;
}
With --force-ai, the rule match is bypassed and execution falls through to aiSummaryMatch. Also fixed the Layer-1 gate: strong-signal matching now works for any content length (it was incorrectly gated behind MIN_CONTENT_LENGTH_FOR_AI).
} else {
  content = await readStdin();
  if (!content.trim()) {
    console.error("Error: No input provided. Pipe content via stdin or specify an input file.");
    process.exit(1);
  }
}
auto stdin to stdout path is missing the EPIPE guard that convert has.
When auto reads from stdin and no --output/--output-dir is given, convertOne writes the generated HTML to process.stdout (via the call on line 660). handleConvert's stdin branch installs a process.stdout.on("error", ...) handler that swallows EPIPE (index.ts lines 243-245); this stdin branch in handleAuto does not.
Why it matters: auto is documented to support piped input — cat article.md | html-anything auto appears in both the help text and the README — and piping the output onward (| head, | less) is normal CLI usage. If the downstream reader closes the pipe early, the unguarded write raises an unhandled error event on process.stdout and Node crashes with a stack trace. The sibling convert command handles this; auto regresses on it.
Suggested fix: in this stdin branch, before the conversion runs, add the same guard handleConvert uses:
process.stdout.on("error", (err) => {
  if ((err as NodeJS.ErrnoException).code === "EPIPE") process.exit(0);
});
Fixed: added the same EPIPE guard that handleConvert uses:
process.stdout.on("error", (err) => {
  if ((err as NodeJS.ErrnoException).code === "EPIPE") process.exit(0);
});
Placed right after readStdin() in handleAuto's stdin branch, so piping output onward (| head, | less) no longer crashes.
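If more command handlers end up needing the same guard, it could be factored into a small helper. The sketch below is hypothetical (installEpipeGuard and its injectable exit parameter are not in the PR) and makes one deliberate change. Attaching any "error" listener means the inline handler silently swallows every non-EPIPE stdout error, whereas this version exits 0 only on EPIPE and rethrows anything else.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical helper: attach an EPIPE guard to stdout (or any stream-like emitter).
// `exit` is injectable so the behavior can be checked without ending the process.
function installEpipeGuard(
  stream: EventEmitter,
  exit: (code: number) => void = (code) => process.exit(code),
): void {
  stream.on("error", (err: NodeJS.ErrnoException) => {
    if (err.code === "EPIPE") {
      // The downstream reader (e.g. `head`) closed the pipe early: stop quietly.
      exit(0);
      return;
    }
    // Any other write error is a real failure; rethrow instead of swallowing it.
    throw err;
  });
}

// In handleAuto's stdin branch this would be: installEpipeGuard(process.stdout);
```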
- Add CLI package that converts Markdown to styled HTML via local AI agents
- Support 8 coding-agent CLIs (Claude Code, Codex, Cursor Agent, Gemini, etc.)
- 75 skill templates from next/src/lib/templates/skills/
- Spinner progress indicator with chunk count and elapsed time (zero deps, pure ANSI)
- Auto-save output to <input>.html when input is a file
- --output-dir / -d flag to specify auto-save directory
- Config management (default template, agent, model)
- Stdin support for piping content
Part of: nexu-io/html-anything
- extractHtml: return empty string instead of wrapping non-HTML in a <pre> tag, so the CLI correctly surfaces agent errors (rate limits, auth failures) instead of silently saving a valid-looking HTML file around error text
- createSpinner: in the non-TTY branch, still flush the final status message to stderr so CI/piped scripts can diagnose failures
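The contract described here (return the document if there is one, otherwise an empty string the caller treats as a failure) can be sketched as follows. This is a simplified, hypothetical extractor, not the PR's extractHtml. It only looks for a doctype or <html> opening tag and the last closing </html>, and ignores cases such as fenced html code blocks that real agent output may contain.

```typescript
// Hypothetical simplified extractor: returns the HTML document embedded in agent
// output, or "" when there is none so the caller can surface the agent's error text.
function extractHtml(output: string): string {
  const start = output.search(/<!doctype html|<html[\s>]/i);
  if (start === -1) return "";
  const closing = "</html>";
  const end = output.toLowerCase().lastIndexOf(closing);
  if (end < start) return ""; // also covers end === -1
  return output.slice(start, end + closing.length);
}

// Caller side: an empty result is treated as an error instead of being saved.
function htmlOrThrow(agentOutput: string): string {
  const html = extractHtml(agentOutput);
  if (html === "") {
    throw new Error(`Agent did not return HTML:\n${agentOutput.slice(0, 500)}`);
  }
  return html;
}
```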
Agent exit-code & stderr (A): track done.code and stderr; if the agent exits non-zero, report the failure instead of silently saving a (possibly truncated) HTML file with exit 0.
Format validation (B): reject unknown --format values with a list of supported formats (markdown, text, csv, json).
Config write guard (C): catch filesystem errors in saveConfig() so disk-full/permission failures show a readable message instead of an uncaught exception.
Overwrite prompt (D): ask before overwriting an existing output file in TTY mode; skip the prompt (auto-overwrite) when piped/CI.
EPIPE handler (E): catch broken-pipe errors on stdout so piping to head(1) or early-closing consumers does not print a noisy stacktrace.
-o/-d conflict (F): error when both --output and --output-dir are set.
Multi-file support (G): accept multiple positional input files, process each sequentially, then summarise failures.
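Checks (B) and (F) are plain argument validation. A minimal sketch, assuming parsed flags arrive as a plain object; the ParsedOptions shape and validateOptions are hypothetical, while the four format names and the flag names come from the commit message.

```typescript
const SUPPORTED_FORMATS = ["markdown", "text", "csv", "json"] as const;
type Format = (typeof SUPPORTED_FORMATS)[number];

// Hypothetical shape of the parsed CLI flags.
interface ParsedOptions {
  format?: string;
  output?: string; // -o / --output
  outputDir?: string; // -d / --output-dir
}

function validateOptions(opts: ParsedOptions): Format | undefined {
  // (F) Writing to one file and auto-saving into a directory are mutually exclusive.
  if (opts.output && opts.outputDir) {
    throw new Error("Cannot use --output and --output-dir together.");
  }
  // (B) Reject unknown formats and list the supported ones.
  if (opts.format === undefined) return undefined;
  if (!(SUPPORTED_FORMATS as readonly string[]).includes(opts.format)) {
    throw new Error(
      `Unknown --format "${opts.format}". Supported formats: ${SUPPORTED_FORMATS.join(", ")}.`,
    );
  }
  return opts.format as Format;
}
```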
When multiple input files would produce the same output basename (e.g. dir1/readme.md and dir2/readme.md both -> readme.html), the CLI now pre-scans before any work begins:
1. Collision detection: lists conflicting basenames and asks whether to resolve by preserving relative directory paths (dir1/readme.html).
2. Overwrite check: after resolving all output paths, checks whether any target files already exist and asks for confirmation before overwriting.
3. On N at any step, the CLI aborts with a clear error before any agent work starts.
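Step 1 amounts to grouping inputs by the basename of the output they would produce. A sketch assuming outputs are always named <basename>.html; findBasenameCollisions is a hypothetical name.

```typescript
import * as path from "node:path";

// Group input files by the output name they would produce (readme.md -> readme.html)
// and keep only the groups that contain more than one input.
function findBasenameCollisions(inputs: string[]): Map<string, string[]> {
  const byOutput = new Map<string, string[]>();
  for (const input of inputs) {
    const out = path.basename(input, path.extname(input)) + ".html";
    const group = byOutput.get(out) ?? [];
    group.push(input);
    byOutput.set(out, group);
  }
  // Deleting entries while iterating a Map is well-defined in JavaScript.
  for (const [out, group] of byOutput) {
    if (group.length < 2) byOutput.delete(out);
  }
  return byOutput;
}
```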
- Batch overwrite now skips the interactive prompt outside TTY (matching the single-file promptOverwrite auto-overwrite behaviour), so scripted CI runs don't abort when existing outputs are present.
- resolveCollisionOutput now derives relative paths from the common ancestor of all colliding inputs (findCommonPath) instead of cwd, and strips '..' segments so outputs stay inside --output-dir, even when inputs live outside the current working directory.
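A sketch of the common-ancestor approach. findCommonPath and resolveCollisionOutput are the names used in this PR's commits, but the signatures and bodies here are assumptions, written for POSIX-style absolute paths.

```typescript
import * as path from "node:path";

// Longest common directory of the given files, compared segment by segment.
function findCommonPath(files: string[]): string {
  if (files.length === 0) throw new Error("findCommonPath needs at least one file");
  const dirs = files.map((f) => path.resolve(path.dirname(f)).split(path.sep));
  const common: string[] = [];
  for (let i = 0; i < dirs[0].length; i++) {
    const seg = dirs[0][i];
    if (!dirs.every((d) => d[i] === seg)) break;
    common.push(seg);
  }
  // On POSIX the first segment is "" (the root), so an all-root result joins to "".
  return common.join(path.sep) || path.sep;
}

// Output path that keeps the input's position relative to the common root,
// with any ".." segments dropped so the result stays inside outputDir.
function resolveCollisionOutput(input: string, commonRoot: string, outputDir: string): string {
  const rel = path.relative(commonRoot, path.resolve(input));
  const safe = rel
    .split(path.sep)
    .filter((seg) => seg !== ".." && seg !== "")
    .join(path.sep);
  const { dir, name } = path.parse(safe);
  return path.join(outputDir, dir, name + ".html");
}
```

When every input sits under the computed root, path.relative never yields "..", so the filter is only a defensive guard. Note that collapsing "../x" to "x" could in principle alias two different inputs, which is one reason to always pass the common root rather than cwd.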
…d default agents
- agents-invoke: aider/deepseek close path now enqueues stdoutBuf directly instead of running it through both parse() AND a raw enqueue, which was producing duplicate HTML (two <!DOCTYPE html> blocks).
- handleConfig set-default-agent: now rejects agents that are not installed (!available) or use an unsupported protocol (unsupported), with a clear error listing available supported alternatives.
- findAgent: when resolving config.defaultAgent, now also filters out unsupported agents so a stale default (e.g. from manual config.json edit) automatically falls through to the next available agent.
…supported default agents" This reverts commit 19636bc.
…ult agents
- agents-invoke: aider/deepseek close path now enqueues stdoutBuf directly instead of running it through both parse() AND a raw enqueue, which was producing duplicate <!DOCTYPE html> blocks.
- findAgent: when resolving config.defaultAgent, now also filters out unsupported agents so a stale default (e.g. from manual config.json edit) automatically falls through to the next available agent.
- handleConfig set-default-agent: now rejects agents that are not installed or use an unsupported protocol, with a clear error listing available supported alternatives.
detectAgents() previously only accepted *_BIN overrides as absolute paths (existsSync). Relative command names like GEMINI_BIN=fake-claude were dropped even though invocation (resolveBinForAgent) can find them on PATH. Now falls back to resolveOnPath() when existsSync fails, so detection and config flows match the actual invoke behaviour.
Based on all reviewer feedback across 10 rounds, added a complete regression test suite covering every reported failure path:
- extract-html.test.ts (9): non-HTML content returns empty, no scaffold wrapping
- prompt.test.ts (11): TTY/non-TTY behavior for promptYesNo & promptOverwrite
- collision-resolve.test.ts (8): findCommonPath & resolveCollisionOutput edge cases
- agents-detect.test.ts (20): *_BIN env overrides, PATH resolution, unsupported protocols
- agents-invoke.test.ts (19): DeepSeek/Aider close path no double-enqueue, exit code propagation
- index.test.ts (22): param validation, config set-default-agent guards, convert integration
Refactored for testability:
- Extracted collision-resolve.ts (findCommonPath + resolveCollisionOutput)
- Extracted prompt.ts (promptYesNo + promptOverwrite)
All 89 tests pass. Typecheck and build clean.
tryPath() in resolveBinForAgent previously only handled absolute paths (starting with / or C:\) and command names on PATH. Relative paths like ./mock-deepseek or ../wrappers/claude fell through to resolveOnPath() which only searches PATH directories, causing a mismatch where detectAgents() reported the agent as available but invokeAgent() could not find it. Now paths containing / or \ or starting with . are resolved via path.resolve() + existsSync(), matching what detectAgents() does.
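The resolution rule described here can be sketched as follows. The names mirror the commit message, but the bodies are assumptions. resolveOnPath is a simplified PATH search that ignores Windows PATHEXT, and existsSync also returns true for directories, so a stricter version might check fs.accessSync(p, fs.constants.X_OK) instead.

```typescript
import { existsSync } from "node:fs";
import * as path from "node:path";

// Simplified PATH lookup (no PATHEXT handling on Windows).
function resolveOnPath(cmd: string, envPath: string = process.env.PATH ?? ""): string | null {
  for (const dir of envPath.split(path.delimiter)) {
    if (!dir) continue;
    const candidate = path.join(dir, cmd);
    if (existsSync(candidate)) return candidate;
  }
  return null;
}

// Absolute paths and anything that looks like a path ("./mock-deepseek",
// "../wrappers/claude", "bin\\agent") resolve against the cwd; bare names use PATH.
function tryPath(bin: string): string | null {
  const looksLikePath =
    path.isAbsolute(bin) || bin.includes("/") || bin.includes("\\") || bin.startsWith(".");
  if (looksLikePath) {
    const abs = path.resolve(bin);
    return existsSync(abs) ? abs : null;
  }
  return resolveOnPath(bin);
}
```

Using the same predicate in both detectAgents() and invokeAgent() is what removes the mismatch the commit describes.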
Two new test cases verify that invokeAgent correctly resolves relative binOverride paths (e.g. ./mock-agent, ../bin/claude) via path.resolve() + existsSync(), matching what detectAgents() already does.
Implements automatic template detection for the CLI, partially resolves nexu-io#60 and supplements the CLI entrypoint introduced in nexu-io#75.
- Add skills-matcher.ts with three-layer matching strategy:
  1. ~80 strong-signal keyword rules (resume→resume-modern, etc.)
  2. Full-template scoring (tags + name + description + scenario)
  3. AI summary fallback only when confidence is low (~0 tokens)
- Add `auto` command: html-anything auto article.md
- Support --force-ai (skip rules) and --show-match-only flags
- Update README with consolidated parameter docs and decision flowchart
Examples:
html-anything auto resume.md                     # auto-match + convert
html-anything auto article.md --show-match-only  # preview match only
- Add kwMatches() with \b word-boundary for ASCII keywords, substring for CJK
- Remove ambiguous short keywords: "X", "RED", "TODO", "done", "doing", "todo"
- Gate Layer-2 fallback on !forceAi so --force-ai reaches AI summary
- Add EPIPE guard to handleAuto stdin-to-stdout path (matching handleConvert)
…tests
- Add kwMatches() with \b word-boundary for ASCII keywords, substring for CJK
- Remove ambiguous short keywords: "X", "RED", "TODO", "done", "doing", "todo"
- Gate Layer-2 fallback on !forceAi so --force-ai reaches AI summary
- Add EPIPE guard to handleAuto stdin-to-stdout path
- Fix Layer-1 gate: strong-signal matching now works for any content length
- Export kwMatches for unit testing
- Add skills-matcher.test.ts with 39 tests covering kwMatches, strong-signal matching, false-positive prevention, --force-ai path, fallback, and reason output
c3dbcee to af5ee90
chenmofei left a comment:
Thanks for the detailed review, all three issues have been addressed in the latest push. Also synced with latest code from #75.
1. 🔴 Blocking — short ASCII keyword false positives (cli/src/skills-matcher.ts:L145)
   - Added kwMatches() helper: ASCII keywords use a \b word-boundary regex, CJK uses includes.
   - Applied to both strongSignalMatch and ruleMatch (tags + scenario keywords).
   - Removed ambiguous short keywords: "X", "RED", "TODO", "done", "doing", "todo".
   - Added 39 unit tests in skills-matcher.test.ts verifying all the false-positive cases you identified (e.g. "x" inside "next" → false, "app" inside "happy" → false, "live" inside "deliver" → false).
2. 🟡 --force-ai does not actually force AI (cli/src/skills-matcher.ts:L238)
   - Layer-2 fallback now gated on !forceAi: with --force-ai, the rule match is bypassed so execution falls through to aiSummaryMatch.
3. 🟡 Missing EPIPE guard (cli/src/index.ts:L624)
   - Added process.stdout.on("error", ...) EPIPE handler in the handleAuto stdin path, matching the existing guard in handleConvert.
PerishCode left a comment:
@chenmofei — thanks for the careful follow-up work on the auto command. I reviewed head af5ee90 against the three unresolved threads from the previous round, and all three are addressed:
- Short-ASCII keyword false positives: kwMatches() now applies a \b word-boundary regex for ASCII keywords and includes for CJK, and the bare "X"/"TODO"/"done"/"doing" style entries are gone from STRONG_SIGNALS. Applied consistently to strongSignalMatch and ruleMatch (tags + scenario keywords).
- --force-ai: the !forceAi guard on the Layer 2 early return means rule scoring no longer short-circuits before the AI summary path for normal-length content; --force-ai now actually reaches aiSummaryMatch.
- auto stdin → stdout EPIPE: handleAuto's stdin branch now installs the same process.stdout EPIPE handler that handleConvert has, so cat … | html-anything auto | head no longer crashes.
Verification: ran pnpm -F @html-anything/cli test (130 tests across 7 files, all passing — including the new skills-matcher suite covering the word-boundary false-positive cases and the --force-ai path) and pnpm -F @html-anything/cli typecheck (clean). CI does not report checks on this branch, so this is local verification.
The three-layer matching design is clear and the test coverage for the regression cases is solid. Nice work iterating on this — approving.
🔁 Powered by Looper · runner=reviewer · agent=claude-code · An autonomous AI dev team for your GitHub repos.
Partially resolves #60 — supplements the CLI entrypoint introduced in #75.
Summary
This PR adds an auto command to the CLI that intelligently matches the best template for any input content, eliminating the need to manually browse 75 templates.
Three-layer matching strategy
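Diagram: auto command flow
Matching strategy notes:
deck-swiss-international general-purpose template
The entire process runs locally, needs no external API key, and uses your existing agent subscription. During conversion, an animated progress indicator shows the number of text chunks received and the elapsed time.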
Usage
Files changed
- cli/src/skills-matcher.ts (new): core matching engine
- cli/src/index.ts: auto command + handleAuto() handler
- cli/README.md: consolidated docs with decision flowchart