feat(cli): add auto command with intelligent template matching by chenmofei · Pull Request #80 · nexu-io/html-anything

chenmofei · 2026-05-21T14:31:59Z

Partially resolves #60 — supplements the CLI entrypoint introduced in #75.

Summary

This PR adds an auto command to the CLI that intelligently matches the best template for any input content, eliminating the need to manually browse 75 templates.

Three-layer matching strategy

user content
  ├─ Layer 1: ~80 strong-signal keyword rules (resume→resume-modern, etc.) — 0 tokens, ms
  ├─ Layer 2: full-template scoring (tags + name + description + scenario) — 0 tokens, ms
  └─ Layer 3: AI summary fallback — only when confidence is low, ~minimal tokens

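`auto` command flow

User content
    │
    ▼
┌─────────────────────────────────────────┐
│ Layer 1: strong-signal keyword matching │  ← zero tokens, milliseconds
│ Hit → use the matched template directly │
│ (resume → resume-modern, etc.)          │
└──────────┬──────────────────────────────┘
           │ no hit
           ▼
┌─────────────────────────────────────────┐
│ Layer 2: rule-based scoring             │  ← zero tokens, milliseconds
│ Content × metadata of all templates     │
│ (tags + name + desc +                   │
│  scenario keywords)                     │
│ Confidence ≥ threshold → use it         │
└──────────┬──────────────────────────────┘
           │ confidence too low
           ▼
┌─────────────────────────────────────────┐
│ Layer 3: AI summary fallback            │  ← only when the rules fail to match
│ Take the first 800 characters           │
│ → AI identifies the topic type          │
│ → run rule matching again               │
└──────────┬──────────────────────────────┘
           │
           ▼
  Run the conversion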

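Matching strategy notes:

Strong signals (~80 rules): cover high-frequency scenarios such as resumes, pricing, OKRs, PRDs, and weekly reports; a hit decides the template immediately.
Rule scoring: iterates over every template's tags, name, description, and scenario keywords and accumulates a score.
AI fallback: when the content is at least 60 characters long and both earlier layers have low confidence, the AI is called for a one-sentence topic summary, which consumes only a small number of tokens.
Final fallback: if every layer fails, fall back to the general-purpose deck-swiss-international template.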
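
The layer order described above can be sketched as a small cascade. This is a minimal illustration, not the code in skills-matcher.ts: the `Match` shape, the `matchCascade` name, and the injected `strongSignal`, `ruleScore`, and `summarize` callbacks are all hypothetical, and re-running both matchers on the AI summary is an assumption about what "run rule matching again" means. Only the threshold of 3, the 60-character minimum, the 800-character excerpt, and the default template id come from this thread.

```typescript
// Hypothetical types and names; only the constants below are taken from the PR thread.
interface Match { templateId: string; confidence: number; layer: string }
type Matcher = (content: string) => Match | null;
type Summarizer = (excerpt: string) => Promise<string>;

const THRESHOLD = 3;            // CONFIDENCE_THRESHOLD mentioned in the review
const MIN_LENGTH_FOR_AI = 60;   // "content is at least 60 characters long"
const EXCERPT_LENGTH = 800;     // "first 800 characters"
const DEFAULT_TEMPLATE = "deck-swiss-international";

async function matchCascade(
  content: string,
  strongSignal: Matcher,
  ruleScore: Matcher,
  summarize: Summarizer,
): Promise<Match> {
  // Layer 1: a strong-signal hit decides immediately.
  const strong = strongSignal(content);
  if (strong) return strong;

  // Layer 2: accept the rule score only when it is confident enough.
  const rule = ruleScore(content);
  if (rule && rule.confidence >= THRESHOLD) return rule;

  // Layer 3: summarize an excerpt, then match the summary (assumed behaviour).
  if (content.length >= MIN_LENGTH_FOR_AI) {
    const summary = await summarize(content.slice(0, EXCERPT_LENGTH));
    const viaSummary = strongSignal(summary) ?? ruleScore(summary);
    if (viaSummary) return { ...viaSummary, layer: "ai-summary" };
  }

  // Final fallback: the general-purpose template.
  return { templateId: DEFAULT_TEMPLATE, confidence: 0, layer: "default" };
}
```

Injecting the matchers and the summarizer keeps the two zero-token layers testable without an agent; the real command presumably wires these to its own functions.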

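The whole process runs entirely locally, requires no external API key, and uses your existing agent subscription. During conversion, an animated progress indicator shows the number of text chunks received and the elapsed time.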

Usage

html-anything auto article.md                 # auto-match + convert
html-anything auto article.md --show-match-only  # preview match only
html-anything auto article.md --force-ai         # skip rules, force AI
cat article.md | html-anything auto              # pipe input

Files changed

cli/src/skills-matcher.ts (new) — core matching engine
cli/src/index.ts — auto command + handleAuto() handler
cli/README.md — consolidated docs with decision flowchart

PerishCode

@chenmofei — thanks for the detailed write-up; the three-layer matching design and the README decision flowchart make the intent easy to follow.

This review covers the auto command added in c3dbcee — cli/src/skills-matcher.ts, the handleAuto additions in cli/src/index.ts, and the cli/README.md docs. Three inline comments:

Blocking — strongSignalMatch keyword matching: short ASCII keywords (most clearly the bare "X") substring-match common English words and mis-route most English input to social-x-post-card.
Non-blocking — --force-ai does not actually force the AI path; it only skips Layer 1, so the documented behavior is not met.
Non-blocking — the auto stdin to stdout path lacks the EPIPE guard that convert has, so piping the output onward can crash.

For context, cli/ has no test runner, so the matcher ships without automated coverage; a concrete suggestion is in the blocking comment.

_{🔁 Powered by Looper · runner=reviewer · agent=claude-code · An autonomous AI dev team for your GitHub repos.}

PerishCode · 2026-05-21T14:58:24Z

+    const matched = keywords.filter((kw) => lower.includes(kw.toLowerCase()));
+    if (matched.length > 0) {
+      const confidence = matched.length * 2 + 3;
+      if (!best || confidence > best.confidence) {


Blocking: strong-signal keyword matching produces false positives for short ASCII keywords.

strongSignalMatch matches each keyword with keywords.filter((kw) => lower.includes(kw.toLowerCase())) — unbounded substring matching on lowercased content. Several STRONG_SIGNALS keywords are 1-3 ASCII characters that occur inside common English words, so they match content that has nothing to do with the template.

The clearest case is the bare "X" keyword on line 26 (["推特", "twitter", "X", "tweet", ...], "social-x-post-card"). After toLowerCase() it becomes "x", so lower.includes("x") is true for almost any English-ish input — text, next, example, experience, index, box, max all contain x. A single hit yields confidence = 1 * 2 + 3 = 5, which clears CONFIDENCE_THRESHOLD (3). Other examples: "doc" matches document/doctor, "app" matches apply/happy, "soft" matches software/Microsoft, "live" matches delivery/alive, "RED" to "red" matches required/covered, plus the everyday words "done"/"doing"/"todo".

Why it matters: social-x-post-card sits early in the STRONG_SIGNALS array, and strongSignalMatch keeps the first rule at the maximum confidence (confidence > best.confidence). Any ordinary English document containing the letter x, with no stronger signal, is routed to social-x-post-card instead of falling through to Layer-2 scoring or the AI fallback. That directly defeats the PR's stated goal of matching the best template, for a large class of inputs.
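
To make the failure concrete, the snippet below reruns the matching logic quoted in the diff against an ordinary English sentence. The keyword list is an abbreviated, hypothetical stand-in for the real STRONG_SIGNALS entry, and `substringConfidence` is an illustrative name; the filter expression and the `matched.length * 2 + 3` formula are copied from the diff.

```typescript
// Abbreviated stand-in for the social-x-post-card rule; the real list is longer.
const CONFIDENCE_THRESHOLD = 3;

function substringConfidence(content: string, keywords: string[]): number {
  const lower = content.toLowerCase();
  // Unbounded substring matching, as in the reviewed diff.
  const matched = keywords.filter((kw) => lower.includes(kw.toLowerCase()));
  // Same formula as the diff: a single hit already scores 5.
  return matched.length > 0 ? matched.length * 2 + 3 : 0;
}

const xRule = ["twitter", "X", "tweet"];
const sample = "Please review the next release notes for our indexing service.";
const score = substringConfidence(sample, xRule);

console.log(score, score >= CONFIDENCE_THRESHOLD); // prints: 5 true
```

The sentence never mentions X or Twitter, yet the "x" inside "next" and "indexing" is enough to clear the threshold.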

Suggested fix: match ASCII keywords on word boundaries while keeping substring matching for CJK (which has no word boundaries). Classify each keyword — if it is pure ASCII (e.g. matches /^[\x00-\x7F]+$/), test it with a word-boundary regex such as new RegExp("\\b" + escapeRegExp(kw) + "\\b", "i") against the content; otherwise keep the current includes. At minimum, remove or repair the ambiguous keywords ("X", "doc", "app", "soft", "live", "red", "done", "doing"). cli/ currently has no test runner — adding one with a fixture matrix (input snippet to expected templateId, including ambiguous-keyword cases) would catch regressions like this.
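
A minimal sketch of the suggested fix follows, assuming a helper that takes the content and one keyword. The names `escapeRegExp` and `keywordMatches` are illustrative; the helper that actually landed in the PR may differ in signature and detail.

```typescript
// Hypothetical helper names; not necessarily what the PR implements.
function escapeRegExp(text: string): string {
  // Escape characters that have a special meaning inside a regular expression.
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function keywordMatches(content: string, keyword: string): boolean {
  const isAscii = /^[\x00-\x7F]+$/.test(keyword);
  if (!isAscii) {
    // CJK text has no word boundaries, so substring matching stays.
    return content.toLowerCase().includes(keyword.toLowerCase());
  }
  // ASCII keywords must appear as whole words, case-insensitively.
  return new RegExp("\\b" + escapeRegExp(keyword) + "\\b", "i").test(content);
}

console.log(keywordMatches("the next example", "x"));     // false
console.log(keywordMatches("post this on X today", "x")); // true
console.log(keywordMatches("我的简历", "简历"));             // true
```

One caveat with this approach: `\b` only marks a transition between a word character and a non-word character, so an ASCII keyword that starts or ends with punctuation (for example "c++") would need separate handling.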

chenmofei · 2026-05-22

Thanks for the thorough analysis. Fixed in the latest push (also synced with #75):

Added kwMatches() — \b word-boundary for ASCII keywords, includes for CJK. Applied to strongSignalMatch and ruleMatch.

Removed "X", "RED", "TODO", "done", "doing", "todo" from STRONG_SIGNALS.

39 unit tests covering all identified false-positive cases ("x" in "next"→false, "app" in "happy"→false, etc).

PerishCode · 2026-05-21T14:58:24Z

+  const rule = fallbackMatch(content, templates);
+  if (rule && rule.confidence >= CONFIDENCE_THRESHOLD) {
+    rule.confidence = Math.min(rule.confidence, 10);
+    return rule;
+  }


--force-ai does not actually force the AI summary path.

matchTemplate consults forceAi only to skip Layer 1 (strongSignalMatch, gated by !forceAi on line 226). Layer 2 still runs unconditionally: fallbackMatch is called on line 234, and lines 235-238 return the rule match whenever its confidence reaches CONFIDENCE_THRESHOLD. So for any input where rule scoring is even moderately confident, --force-ai never reaches aiSummaryMatch on line 254.

Why it matters: the auto help text describes --force-ai as "Force AI summary for matching", and cli/README.md documents it as "跳过关键词匹配，强制使用 AI summary" (skip keyword matching, force the AI summary). The flag silently does something narrower than documented — a user who passes --force-ai to override a bad keyword/rule guess still gets that rule guess. This is one of only two flags the command adds, so the contract mismatch is user-visible.

Suggested fix: gate the Layer-2 short-circuit on !forceAi as well — if (!forceAi && rule && rule.confidence >= CONFIDENCE_THRESHOLD) { ... } — so --force-ai falls through to aiSummaryMatch. Note also that with forceAi set and content shorter than MIN_CONTENT_LENGTH_FOR_AI (the content.length < MIN_CONTENT_LENGTH_FOR_AI branch on line 240), AI is still never invoked; either document that --force-ai is a no-op for short content, or update the README so it only claims to skip Layer-1 keyword matching.

chenmofei · 2026-05-22

Fixed: Layer-2 short-circuit now gated on !forceAi:

if (!forceAi && rule && rule.confidence >= CONFIDENCE_THRESHOLD) { return rule; }

With --force-ai, the rule match is bypassed and execution falls through to aiSummaryMatch. Also fixed the Layer-1 gate — strong-signal matching now works for any content length (was incorrectly gated behind MIN_CONTENT_LENGTH_FOR_AI).
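
The gate can be isolated into a small pure function to show the before and after behaviour. This is a sketch under assumptions: `layer2ShortCircuit` and `RuleMatch` are hypothetical names, and the cap at 10 is taken from the originally reviewed diff rather than from the one-line fix quoted above.

```typescript
interface RuleMatch { templateId: string; confidence: number }

const CONFIDENCE_THRESHOLD = 3;

// Returns the rule match when Layer 2 may end the search, or null when
// matching should continue to the AI summary step.
function layer2ShortCircuit(rule: RuleMatch | null, forceAi: boolean): RuleMatch | null {
  if (!forceAi && rule && rule.confidence >= CONFIDENCE_THRESHOLD) {
    // Cap the reported confidence, as the reviewed diff did.
    return { ...rule, confidence: Math.min(rule.confidence, 10) };
  }
  return null;
}

const confident: RuleMatch = { templateId: "resume-modern", confidence: 7 };
console.log(layer2ShortCircuit(confident, false)?.templateId); // resume-modern
console.log(layer2ShortCircuit(confident, true));              // null
```

With `forceAi` set, even a confident rule match yields `null`, so the caller proceeds to the AI summary instead of returning early.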

PerishCode · 2026-05-21T14:58:24Z

+  } else {
+    content = await readStdin();
+    if (!content.trim()) {
+      console.error("Error: No input provided. Pipe content via stdin or specify an input file.");
+      process.exit(1);
+    }
+  }


auto stdin to stdout path is missing the EPIPE guard that convert has.

When auto reads from stdin and no --output/--output-dir is given, convertOne writes the generated HTML to process.stdout (via the call on line 660). handleConvert's stdin branch installs a process.stdout.on("error", ...) handler that swallows EPIPE (index.ts lines 243-245); this stdin branch in handleAuto does not.

Why it matters: auto is documented to support piped input — cat article.md | html-anything auto appears in both the help text and the README — and piping the output onward (| head, | less) is normal CLI usage. If the downstream reader closes the pipe early, the unguarded write raises an unhandled error event on process.stdout and Node crashes with a stack trace. The sibling convert command handles this; auto regresses on it.

Suggested fix: in this stdin branch, before the conversion runs, add the same guard handleConvert uses:

process.stdout.on("error", (err) => { if ((err as NodeJS.ErrnoException).code === "EPIPE") process.exit(0); });

chenmofei · 2026-05-22

Fixed: added the same EPIPE guard that handleConvert uses:

process.stdout.on("error", (err) => { if ((err as NodeJS.ErrnoException).code === "EPIPE") process.exit(0); });

Placed right after readStdin() in handleAuto's stdin branch, so piping output onward (| head, | less) no longer crashes.
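
Because both handlers need the same guard, it could also live in one shared helper. The sketch below is hypothetical (`installEpipeGuard` and `ErrorEmitter` are not names from the PR) and uses a structural stand-in for process.stdout so that it runs without Node typings and can be exercised with a fake stream.

```typescript
// Structural stand-in for process.stdout; only the "error" event is needed.
interface ErrorEmitter {
  on(event: "error", listener: (err: { code?: string }) => void): unknown;
}

function installEpipeGuard(stream: ErrorEmitter, exit: (code: number) => void): void {
  stream.on("error", (err) => {
    // A reader such as `head` closing the pipe early is a normal way to finish.
    if (err.code === "EPIPE") exit(0);
  });
}

// In the CLI this would be called roughly as:
//   installEpipeGuard(process.stdout, (code) => process.exit(code));
```

Passing the exit function in keeps the helper testable; as with the inline version, errors other than EPIPE are received by the listener but do not trigger an exit.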

- Add CLI package that converts Markdown to styled HTML via local AI agents
- Support 8 coding-agent CLIs (Claude Code, Codex, Cursor Agent, Gemini, etc.)
- 75 skill templates from next/src/lib/templates/skills/
- Spinner progress indicator with chunk count and elapsed time (zero deps, pure ANSI)
- Auto-save output to <input>.html when input is a file
- --output-dir / -d flag to specify auto-save directory
- Config management (default template, agent, model)
- Stdin support for piping content

Part of: nexu-io/html-anything

- extractHtml: return empty string instead of wrapping non-HTML in pre tag, so the CLI correctly surfaces agent errors (rate limits, auth failures) instead of silently saving a valid-looking HTML file around error text
- createSpinner: in the non-TTY branch, still flush the final status message to stderr so CI/piped scripts can diagnose failures

Agent exit-code & stderr (A): track done.code and stderr; if the agent exits non-zero, report the failure instead of silently saving a (possibly truncated) HTML file with exit 0.
Format validation (B): reject unknown --format values with a list of supported formats (markdown, text, csv, json).
Config write guard (C): catch filesystem errors in saveConfig() so disk-full/permission failures show a readable message instead of an uncaught exception.
Overwrite prompt (D): ask before overwriting an existing output file in TTY mode; skip the prompt (auto-overwrite) when piped/CI.
EPIPE handler (E): catch broken-pipe errors on stdout so piping to head(1) or early-closing consumers does not print a noisy stacktrace.
-o/-d conflict (F): error when both --output and --output-dir are set.
Multi-file support (G): accept multiple positional input files, process each sequentially, then summarise failures.

When multiple input files would produce the same output basename (e.g. dir1/readme.md and dir2/readme.md both -> readme.html), the CLI now pre-scans before any work begins:
1. Collision detection: lists conflicting basenames and asks whether to resolve by preserving relative directory paths (dir1/readme.html).
2. Overwrite check: after resolving all output paths, checks whether any target files already exist and asks for confirmation before overwriting.
3. On N at any step, the CLI aborts with a clear error before any agent work starts.
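
The pre-scan in step 1 amounts to grouping inputs by the output basename they would produce. A minimal sketch, assuming a hypothetical `findBasenameCollisions` helper and simple string handling in place of the CLI's real path logic:

```typescript
// Hypothetical helper: groups inputs by the output basename they would produce.
function findBasenameCollisions(inputs: string[]): Map<string, string[]> {
  const byOutput = new Map<string, string[]>();
  for (const input of inputs) {
    // Take the last path segment and swap its extension for .html.
    const base = input.split(/[\\/]/).pop() ?? input;
    const output = base.replace(/\.[^.]+$/, "") + ".html";
    byOutput.set(output, [...(byOutput.get(output) ?? []), input]);
  }
  // Keep only the basenames claimed by more than one input.
  return new Map(Array.from(byOutput.entries()).filter(([, sources]) => sources.length > 1));
}

const collisions = findBasenameCollisions(["dir1/readme.md", "dir2/readme.md", "notes.md"]);
console.log(Array.from(collisions.keys()).join(", ")); // readme.html
```

Running this scan before any agent is invoked is what lets the CLI abort early, as the commit message describes.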

- Batch overwrite now skips the interactive prompt outside TTY (matching the single-file promptOverwrite auto-overwrite behaviour), so scripted CI runs don't abort when existing outputs are present.
- resolveCollisionOutput now derives relative paths from the common ancestor of all colliding inputs (findCommonPath) instead of cwd, and strips '..' segments so outputs stay inside --output-dir, even when inputs live outside the current working directory.
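
The common-ancestor idea can be illustrated on POSIX-style paths. This is a hypothetical re-implementation for illustration only: `commonAncestor` and `relativeOutput` are invented names, and the real findCommonPath and resolveCollisionOutput may treat edge cases (Windows separators, a single input) differently.

```typescript
// Hypothetical sketch; works on "/"-separated paths only.
function commonAncestor(paths: string[]): string[] {
  // Compare directory segments, ignoring the file name itself.
  const dirs = paths.map((p) => p.split("/").slice(0, -1));
  if (dirs.length === 0) return [];
  let shared = dirs[0];
  for (const dir of dirs.slice(1)) {
    let i = 0;
    while (i < shared.length && i < dir.length && shared[i] === dir[i]) i++;
    shared = shared.slice(0, i);
  }
  return shared;
}

function relativeOutput(input: string, ancestor: string[]): string {
  const rest = input.split("/").slice(ancestor.length);
  // Drop "..", "." and empty segments so the result cannot climb out of the output directory.
  const safe = rest.filter((seg) => seg !== ".." && seg !== "." && seg !== "");
  return safe.join("/").replace(/\.[^./]+$/, "") + ".html";
}

const inputs = ["/work/docs/dir1/readme.md", "/work/docs/dir2/readme.md"];
const ancestor = commonAncestor(inputs);
console.log(inputs.map((p) => relativeOutput(p, ancestor)).join(", "));
// dir1/readme.html, dir2/readme.html
```

Deriving the relative part from the inputs themselves, rather than from cwd, is what keeps the result independent of where the command is run.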

…d default agents
- agents-invoke: aider/deepseek close path now enqueues stdoutBuf directly instead of running it through both parse() AND a raw enqueue, which was producing duplicate HTML (two <!DOCTYPE html> blocks).
- handleConfig set-default-agent: now rejects agents that are not installed (!available) or use an unsupported protocol (unsupported), with a clear error listing available supported alternatives.
- findAgent: when resolving config.defaultAgent, now also filters out unsupported agents so a stale default (e.g. from manual config.json edit) automatically falls through to the next available agent.

…supported default agents" This reverts commit 19636bc.

…ult agents
- agents-invoke: aider/deepseek close path now enqueues stdoutBuf directly instead of running it through both parse() AND a raw enqueue, which was producing duplicate <!DOCTYPE html> blocks.
- findAgent: when resolving config.defaultAgent, now also filters out unsupported agents so a stale default (e.g. from manual config.json edit) automatically falls through to the next available agent.
- handleConfig set-default-agent: now rejects agents that are not installed or use an unsupported protocol, with a clear error listing available supported alternatives.

detectAgents() previously only accepted *_BIN overrides as absolute paths (existsSync). Relative command names like GEMINI_BIN=fake-claude were dropped even though invocation (resolveBinForAgent) can find them on PATH. Now falls back to resolveOnPath() when existsSync fails, so detection and config flows match the actual invoke behaviour.

Based on all reviewer feedback across 10 rounds, added a complete regression test suite covering every reported failure path:
- extract-html.test.ts (9): non-HTML content returns empty, no scaffold wrapping
- prompt.test.ts (11): TTY/non-TTY behavior for promptYesNo & promptOverwrite
- collision-resolve.test.ts (8): findCommonPath & resolveCollisionOutput edge cases
- agents-detect.test.ts (20): *_BIN env overrides, PATH resolution, unsupported protocols
- agents-invoke.test.ts (19): DeepSeek/Aider close path no double-enqueue, exit code propagation
- index.test.ts (22): param validation, config set-default-agent guards, convert integration

Refactored for testability:
- Extracted collision-resolve.ts (findCommonPath + resolveCollisionOutput)
- Extracted prompt.ts (promptYesNo + promptOverwrite)

All 89 tests pass. Typecheck and build clean.

tryPath() in resolveBinForAgent previously only handled absolute paths (starting with / or C:\) and command names on PATH. Relative paths like ./mock-deepseek or ../wrappers/claude fell through to resolveOnPath() which only searches PATH directories, causing a mismatch where detectAgents() reported the agent as available but invokeAgent() could not find it. Now paths containing / or \ or starting with . are resolved via path.resolve() + existsSync(), matching what detectAgents() does.
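
The rule described here reduces to one classification step before any lookup. In this sketch, `classifyBinOverride` is a hypothetical name and the filesystem and PATH lookups are deliberately left out; it only shows which branch an override value would take.

```typescript
type BinKind = "path" | "command";

// Decide whether an override should be resolved against the filesystem
// (path.resolve() + existsSync()) or searched for on PATH.
function classifyBinOverride(bin: string): BinKind {
  const looksLikePath = bin.includes("/") || bin.includes("\\") || bin.startsWith(".");
  return looksLikePath ? "path" : "command";
}

console.log(classifyBinOverride("./mock-deepseek"));    // path
console.log(classifyBinOverride("../wrappers/claude")); // path
console.log(classifyBinOverride("fake-claude"));        // command
```

Applying the same classification in both detection and invocation is what removes the mismatch between detectAgents() and invokeAgent().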

Two new test cases verify that invokeAgent correctly resolves relative binOverride paths (e.g. ./mock-agent, ../bin/claude) via path.resolve() + existsSync(), matching what detectAgents() already does.

Implements automatic template detection for the CLI, partially resolves nexu-io#60 and supplements the CLI entrypoint introduced in nexu-io#75.
- Add skills-matcher.ts with three-layer matching strategy:
  1. ~80 strong-signal keyword rules (resume→resume-modern, etc.)
  2. Full-template scoring (tags + name + description + scenario)
  3. AI summary fallback only when confidence is low (~0 tokens)
- Add `auto` command: html-anything auto article.md
- Support --force-ai (skip rules) and --show-match-only flags
- Update README with consolidated parameter docs and decision flowchart

Examples:
  html-anything auto resume.md                     # auto-match + convert
  html-anything auto article.md --show-match-only  # preview match only

- Add kwMatches() with \b word-boundary for ASCII keywords, substring for CJK
- Remove ambiguous short keywords: "X", "RED", "TODO", "done", "doing", "todo"
- Gate Layer-2 fallback on !forceAi so --force-ai reaches AI summary
- Add EPIPE guard to handleAuto stdin-to-stdout path (matching handleConvert)

…tests
- Add kwMatches() with \b word-boundary for ASCII keywords, substring for CJK
- Remove ambiguous short keywords: "X", "RED", "TODO", "done", "doing", "todo"
- Gate Layer-2 fallback on !forceAi so --force-ai reaches AI summary
- Add EPIPE guard to handleAuto stdin-to-stdout path
- Fix Layer-1 gate: strong-signal matching now works for any content length
- Export kwMatches for unit testing
- Add skills-matcher.test.ts with 39 tests covering kwMatches, strong-signal matching, false-positive prevention, --force-ai path, fallback, and reason output

chenmofei

Thanks for the detailed review, all three issues have been addressed in the latest push. Also synced with latest code from #75.

1. 🔴 Blocking — short ASCII keyword false positives (cli/src/skills-matcher.ts:L145)

Added kwMatches() helper: ASCII keywords use \b word-boundary regex, CJK uses includes.
Applied to both strongSignalMatch and ruleMatch (tags + scenario keywords).
Removed ambiguous short keywords: "X", "RED", "TODO", "done", "doing", "todo".
Added 39 unit tests in skills-matcher.test.ts verifying all the false-positive cases you identified (e.g. "x" inside "next" → false, "app" inside "happy" → false, "live" inside "deliver" → false).

2. 🟡 --force-ai does not actually force AI (cli/src/skills-matcher.ts:L238)

Layer-2 fallback now gated on !forceAi — with --force-ai, the rule match is bypassed so execution falls through to aiSummaryMatch.

3. 🟡 Missing EPIPE guard (cli/src/index.ts:L624)

Added process.stdout.on("error", ...) EPIPE handler in handleAuto stdin path, matching the existing guard in handleConvert.

PerishCode

@chenmofei — thanks for the careful follow-up work on the auto command. I reviewed head af5ee90 against the three unresolved threads from the previous round, and all three are addressed:

Short-ASCII keyword false positives — kwMatches() now applies a \b word-boundary regex for ASCII keywords and includes for CJK, and the bare "X" / "TODO" / "done" / "doing" style entries are gone from STRONG_SIGNALS. Applied consistently to strongSignalMatch and ruleMatch (tags + scenario keywords).
--force-ai — the !forceAi guard on the Layer 2 early return means rule scoring no longer short-circuits before the AI summary path for normal-length content; --force-ai now actually reaches aiSummaryMatch.
auto stdin → stdout EPIPE — handleAuto's stdin branch now installs the same process.stdout EPIPE handler that handleConvert has, so cat … | html-anything auto | head no longer crashes.

Verification: ran pnpm -F @html-anything/cli test (130 tests across 7 files, all passing — including the new skills-matcher suite covering the word-boundary false-positive cases and the --force-ai path) and pnpm -F @html-anything/cli typecheck (clean). CI does not report checks on this branch, so this is local verification.

The three-layer matching design is clear and the test coverage for the regression cases is solid. Nice work iterating on this — approving.

_{🔁 Powered by Looper · runner=reviewer · agent=claude-code · An autonomous AI dev team for your GitHub repos.}

lefarcen requested a review from PerishCode May 21, 2026 14:37

lefarcen added the size/XXL (PR size: 1500+ changed lines), risk/medium (Medium risk change), and type/feature (Feature or new user-facing capability) labels May 21, 2026

lefarcen mentioned this pull request May 21, 2026

Provide an agent.md so the local CLI can decide on the theme and usage by itself #60

Open

PerishCode reviewed May 21, 2026


chenmofei added 16 commits May 22, 2026 10:55

Revert "fix(cli): deduplicate aider/deepseek close output + reject un…

d8b9c8f

…supported default agents" This reverts commit 19636bc.

fix: auto-enable relative paths for basename collisions in non-TTY mode

978774b

test(cli): add relative *_BIN override resolution tests

a421081


chenmofei force-pushed the feat/cli-auto-template-match branch from c3dbcee to af5ee90 Compare May 22, 2026 03:21

chenmofei commented May 22, 2026


PerishCode approved these changes May 22, 2026


chenmofei wants to merge 16 commits into nexu-io:main from chenmofei:feat/cli-auto-template-match


3 participants
