feat(cli): add auto command with intelligent template matching #80

Open
chenmofei wants to merge 16 commits into nexu-io:main from chenmofei:feat/cli-auto-template-match

@chenmofei chenmofei commented May 21, 2026

Partially resolves #60 — supplements the CLI entrypoint introduced in #75.

Summary

This PR adds an auto command to the CLI that intelligently matches the best template for any input content, eliminating the need to manually browse 75 templates.

Three-layer matching strategy

user content
  ├─ Layer 1: ~80 strong-signal keyword rules (resume→resume-modern, etc.) — 0 tokens, ms
  ├─ Layer 2: full-template scoring (tags + name + description + scenario) — 0 tokens, ms
  └─ Layer 3: AI summary fallback — only when confidence is low, ~minimal tokens

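auto command flow

User content
    │
    ▼
┌────────────────────────────────────┐
│ Layer 1: strong-signal keywords    │  ← zero tokens, milliseconds
│ Hit → use the matched template     │
│ (resume → resume-modern, etc.)     │
└──────────┬─────────────────────────┘
           │ no hit
           ▼
┌────────────────────────────────────┐
│ Layer 2: rule-based scoring        │  ← zero tokens, milliseconds
│ content × all template metadata    │
│ (tags + name + desc +              │
│  scenario keywords)                │
│ confidence ≥ threshold → use it    │
└──────────┬─────────────────────────┘
           │ low confidence
           ▼
┌────────────────────────────────────┐
│ Layer 3: AI summary fallback       │  ← only when the rules miss
│ take the first 800 characters      │
│ → AI classifies the topic          │
│ → rule matching runs again         │
└──────────┬─────────────────────────┘
           │
           ▼
     run conversion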

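Matching strategy notes

  • Strong signals (~80 rules): cover high-frequency scenarios such as resumes, pricing, OKRs, PRDs, and weekly reports; a hit decides the template immediately
  • Rule scoring: iterate over every template's tags, name, description, and scenario keywords, accumulating a score
  • AI fallback: when the content is ≥ 60 characters and both earlier layers report low confidence, call the AI for a one-sentence topic summary, which consumes only a small number of tokens
  • Final fallback: if every layer fails, fall back to the general-purpose deck-swiss-international template

The whole process runs locally, needs no external API key, and uses your existing agent subscription. During conversion, an animated progress indicator shows the number of text chunks received and the elapsed time.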
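
The rule-scoring layer can be pictured roughly as follows. This is a minimal sketch, not the code in cli/src/skills-matcher.ts: the TemplateMeta shape, the per-field weights, and the countHits, scoreTemplate, and bestTemplate names are all assumptions made for illustration. Only the scored fields (tags, name, description, scenario keywords), the threshold value of 3 quoted in the review, and the deck-swiss-international fallback come from this PR. The AI summary layer is left out.

```typescript
// Hypothetical metadata shape; the real template fields may differ.
interface TemplateMeta {
  id: string;
  name: string;
  description: string;
  tags: string[];
  scenarioKeywords: string[];
}

// Assumed weights, chosen only for illustration.
const WEIGHTS = { tag: 3, scenario: 2, name: 2, description: 1 };
const CONFIDENCE_THRESHOLD = 3; // value quoted in the review
const DEFAULT_TEMPLATE_ID = "deck-swiss-international"; // final fallback from the notes above

// Count how many of the given words occur in the lowercased content.
// Plain substring matching keeps the sketch short.
function countHits(lower: string, words: string[], minLength = 1): number {
  return words.filter((w) => w.length >= minLength && lower.includes(w.toLowerCase())).length;
}

// Accumulate a weighted score for one template.
function scoreTemplate(content: string, t: TemplateMeta): number {
  const lower = content.toLowerCase();
  return (
    countHits(lower, t.tags) * WEIGHTS.tag +
    countHits(lower, t.scenarioKeywords) * WEIGHTS.scenario +
    // Skip short words in free-text fields so "a" or "the" do not score.
    countHits(lower, t.name.split(/\s+/), 4) * WEIGHTS.name +
    countHits(lower, t.description.split(/\s+/), 4) * WEIGHTS.description
  );
}

// Pick the highest-scoring template, or the default when nothing clears the threshold.
function bestTemplate(content: string, templates: TemplateMeta[]): { id: string; score: number } {
  let best = { id: DEFAULT_TEMPLATE_ID, score: 0 };
  for (const t of templates) {
    const score = scoreTemplate(content, t);
    if (score > best.score) best = { id: t.id, score };
  }
  return best.score >= CONFIDENCE_THRESHOLD ? best : { id: DEFAULT_TEMPLATE_ID, score: best.score };
}
```

Keeping the scorer a pure function of content and metadata makes it cheap to run against every template and easy to cover with fixture-based tests.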

Usage

html-anything auto article.md                 # auto-match + convert
html-anything auto article.md --show-match-only  # preview match only
html-anything auto article.md --force-ai         # skip rules, force AI
cat article.md | html-anything auto              # pipe input

Files changed

  • cli/src/skills-matcher.ts (new): core matching engine
  • cli/src/index.ts: auto command + handleAuto() handler
  • cli/README.md: consolidated docs with decision flowchart

@lefarcen lefarcen requested a review from PerishCode May 21, 2026 14:37
@lefarcen lefarcen added size/XXL PR size: 1500+ changed lines risk/medium Medium risk change type/feature Feature or new user-facing capability labels May 21, 2026

@PerishCode PerishCode left a comment

Contributor

@chenmofei — thanks for the detailed write-up; the three-layer matching design and the README decision flowchart make the intent easy to follow.

This review covers the auto command added in c3dbcee: cli/src/skills-matcher.ts, the handleAuto additions in cli/src/index.ts, and the cli/README.md docs. Three inline comments:

  • Blocking: strongSignalMatch keyword matching. Short ASCII keywords (most clearly the bare "X") substring-match common English words and mis-route most English input to social-x-post-card.
  • Non-blocking: --force-ai does not actually force the AI path; it only skips Layer 1, so the documented behavior is not met.
  • Non-blocking: the auto stdin to stdout path lacks the EPIPE guard that convert has, so piping the output onward can crash.

For context, cli/ has no test runner, so the matcher ships without automated coverage; a concrete suggestion is in the blocking comment.

🔁 Powered by Looper · runner=reviewer · agent=claude-code · An autonomous AI dev team for your GitHub repos.

Comment thread cli/src/skills-matcher.ts Outdated
Comment on lines +142 to +145
const matched = keywords.filter((kw) => lower.includes(kw.toLowerCase()));
if (matched.length > 0) {
  const confidence = matched.length * 2 + 3;
  if (!best || confidence > best.confidence) {

Contributor

Blocking: strong-signal keyword matching produces false positives for short ASCII keywords.

strongSignalMatch matches each keyword with keywords.filter((kw) => lower.includes(kw.toLowerCase())) — unbounded substring matching on lowercased content. Several STRONG_SIGNALS keywords are 1-3 ASCII characters that occur inside common English words, so they match content that has nothing to do with the template.

The clearest case is the bare "X" keyword on line 26 (["推特", "twitter", "X", "tweet", ...], "social-x-post-card"). After toLowerCase() it becomes "x", so lower.includes("x") is true for almost any English-ish input — text, next, example, experience, index, box, max all contain x. A single hit yields confidence = 1 * 2 + 3 = 5, which clears CONFIDENCE_THRESHOLD (3). Other examples: "doc" matches document/doctor, "app" matches apply/happy, "soft" matches software/Microsoft, "live" matches delivery/alive, "RED" to "red" matches required/covered, plus the everyday words "done"/"doing"/"todo".

Why it matters: social-x-post-card sits early in the STRONG_SIGNALS array, and strongSignalMatch keeps the first rule at the maximum confidence (confidence > best.confidence). Any ordinary English document containing the letter x, with no stronger signal, is routed to social-x-post-card instead of falling through to Layer-2 scoring or the AI fallback. That directly defeats the PR's stated goal of matching the best template, for a large class of inputs.
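
The failure mode can be reproduced in a few lines. The sketch below reuses the confidence formula from the commented lines and the threshold value quoted above; the naiveMatch name and the single-rule setup are assumptions for illustration, and the keyword list is only the portion of the social-x-post-card rule quoted in this comment.

```typescript
const CONFIDENCE_THRESHOLD = 3; // value quoted in this comment

// Unbounded substring matching, mirroring the commented lines.
function naiveMatch(content: string, keywords: string[]): { matched: string[]; confidence: number } {
  const lower = content.toLowerCase();
  const matched = keywords.filter((kw) => lower.includes(kw.toLowerCase()));
  const confidence = matched.length > 0 ? matched.length * 2 + 3 : 0;
  return { matched, confidence };
}

// Only the keywords quoted above; the rest of the rule is elided in the source.
const xCardKeywords = ["推特", "twitter", "X", "tweet"];

// An ordinary sentence with no relation to X/Twitter.
const result = naiveMatch("Please review the next example in the index.", xCardKeywords);
console.log(result.matched, result.confidence, result.confidence >= CONFIDENCE_THRESHOLD);
```

Here the only matched keyword is "X", found inside "next", "example", and "index", yet the resulting confidence of 5 is enough to clear the threshold.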

Suggested fix: match ASCII keywords on word boundaries while keeping substring matching for CJK (which has no word boundaries). Classify each keyword — if it is pure ASCII (e.g. matches /^[\x00-\x7F]+$/), test it with a word-boundary regex such as new RegExp("\\b" + escapeRegExp(kw) + "\\b", "i") against the content; otherwise keep the current includes. At minimum, remove or repair the ambiguous keywords ("X", "doc", "app", "soft", "live", "red", "done", "doing"). cli/ currently has no test runner — adding one with a fixture matrix (input snippet to expected templateId, including ambiguous-keyword cases) would catch regressions like this.
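
A minimal sketch of that suggestion, assuming nothing about the final implementation: keywordMatches and escapeRegExp are hypothetical helper names, and the fixture rows are illustrative inputs rather than entries from any real test file.

```typescript
// Escape regex metacharacters so a keyword is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Pure-ASCII keywords are matched on word boundaries; anything else
// (for example CJK, which has no word boundaries) keeps substring matching.
function keywordMatches(content: string, kw: string): boolean {
  if (kw.length === 0) return false;
  const isAscii = /^[\x00-\x7F]+$/.test(kw);
  if (!isAscii) return content.toLowerCase().includes(kw.toLowerCase());
  return new RegExp("\\b" + escapeRegExp(kw) + "\\b", "i").test(content);
}

// A tiny fixture matrix: [content, keyword, expected result].
const fixtures: Array<[string, string, boolean]> = [
  ["the next example", "X", false],
  ["posted on X today", "X", true],
  ["a happy path", "app", false],
  ["open the app", "app", true],
  ["这是一份简历", "简历", true],
];

for (const [content, kw, expected] of fixtures) {
  const status = keywordMatches(content, kw) === expected ? "ok" : "MISMATCH";
  console.log(JSON.stringify(kw), "in", JSON.stringify(content), "->", status);
}
```

One caveat with this approach: \b only fires between a word character and a non-word character, so an ASCII keyword that starts or ends with punctuation (such as "c++") would need separate handling.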

Author

Thanks for the thorough analysis. Fixed in the latest push (also synced with #75):

  1. Added kwMatches(): \b word-boundary matching for ASCII keywords, includes for CJK. Applied to strongSignalMatch and ruleMatch.
  2. Removed "X", "RED", "TODO", "done", "doing", "todo" from STRONG_SIGNALS.
  3. 39 unit tests covering all identified false-positive cases ("x" in "next"→false, "app" in "happy"→false, etc).

Comment thread cli/src/skills-matcher.ts
Comment on lines +234 to +238
const rule = fallbackMatch(content, templates);
if (rule && rule.confidence >= CONFIDENCE_THRESHOLD) {
  rule.confidence = Math.min(rule.confidence, 10);
  return rule;
}

Contributor

--force-ai does not actually force the AI summary path.

matchTemplate consults forceAi only to skip Layer 1 (strongSignalMatch, gated by !forceAi on line 226). Layer 2 still runs unconditionally: fallbackMatch is called on line 234, and lines 235-238 return the rule match whenever its confidence reaches CONFIDENCE_THRESHOLD. So for any input where rule scoring is even moderately confident, --force-ai never reaches aiSummaryMatch on line 254.

Why it matters: the auto help text describes --force-ai as "Force AI summary for matching", and cli/README.md documents it as "跳过关键词匹配,强制使用 AI summary" (skip keyword matching, force the AI summary). The flag silently does something narrower than documented — a user who passes --force-ai to override a bad keyword/rule guess still gets that rule guess. This is one of only two flags the command adds, so the contract mismatch is user-visible.

Suggested fix: gate the Layer-2 short-circuit on !forceAi as well — if (!forceAi && rule && rule.confidence >= CONFIDENCE_THRESHOLD) { ... } — so --force-ai falls through to aiSummaryMatch. Note also that with forceAi set and content shorter than MIN_CONTENT_LENGTH_FOR_AI (the content.length < MIN_CONTENT_LENGTH_FOR_AI branch on line 240), AI is still never invoked; either document that --force-ai is a no-op for short content, or update the README so it only claims to skip Layer-1 keyword matching.

Author

Fixed: Layer-2 short-circuit now gated on !forceAi:

if (!forceAi && rule && rule.confidence >= CONFIDENCE_THRESHOLD) {
  return rule;
}

With --force-ai, the rule match is bypassed and execution falls through to aiSummaryMatch. Also fixed the Layer-1 gate — strong-signal matching now works for any content length (was incorrectly gated behind MIN_CONTENT_LENGTH_FOR_AI).
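
The resulting gating can be modelled as a pure decision function. This is a sketch under stated assumptions, not the body of matchTemplate: pickLayer, the Layer type, and the MatchInputs shape are hypothetical; 3 is the threshold quoted in the review and 60 is the minimum length given in the PR description. That short content still skips the AI when --force-ai is set, and that it then lands on the default template, is assumed from the reviewer's earlier note and is not confirmed by the final code.

```typescript
const CONFIDENCE_THRESHOLD = 3; // quoted in the review
const MIN_CONTENT_LENGTH_FOR_AI = 60; // from the PR description

type Layer = "strong-signal" | "rule" | "ai-summary" | "default";

// Hypothetical summary of what each layer found for one input.
interface MatchInputs {
  forceAi: boolean;
  strongSignalHit: boolean;
  ruleConfidence: number;
  contentLength: number;
}

// Decide which layer supplies the final template.
function pickLayer(i: MatchInputs): Layer {
  // Layers 1 and 2 may only answer when --force-ai is not set.
  if (!i.forceAi && i.strongSignalHit) return "strong-signal";
  if (!i.forceAi && i.ruleConfidence >= CONFIDENCE_THRESHOLD) return "rule";
  // Assumption: the AI summary is only attempted for long-enough content.
  if (i.contentLength >= MIN_CONTENT_LENGTH_FOR_AI) return "ai-summary";
  return "default";
}

console.log(pickLayer({ forceAi: true, strongSignalHit: true, ruleConfidence: 9, contentLength: 500 }));
```

Modelling the gates this way makes the flag's contract testable in isolation: with forceAi set, neither a strong-signal hit nor a confident rule score can answer before the AI path.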

Comment thread cli/src/index.ts
Comment on lines +618 to +624
} else {
  content = await readStdin();
  if (!content.trim()) {
    console.error("Error: No input provided. Pipe content via stdin or specify an input file.");
    process.exit(1);
  }
}

Contributor

auto stdin to stdout path is missing the EPIPE guard that convert has.

When auto reads from stdin and no --output/--output-dir is given, convertOne writes the generated HTML to process.stdout (via the call on line 660). handleConvert's stdin branch installs a process.stdout.on("error", ...) handler that swallows EPIPE (index.ts lines 243-245); this stdin branch in handleAuto does not.

Why it matters: auto is documented to support piped input — cat article.md | html-anything auto appears in both the help text and the README — and piping the output onward (| head, | less) is normal CLI usage. If the downstream reader closes the pipe early, the unguarded write raises an unhandled error event on process.stdout and Node crashes with a stack trace. The sibling convert command handles this; auto regresses on it.

Suggested fix: in this stdin branch, before the conversion runs, add the same guard handleConvert uses:

process.stdout.on("error", (err) => {
  if ((err as NodeJS.ErrnoException).code === "EPIPE") process.exit(0);
});

Author

Fixed: added the same EPIPE guard that handleConvert uses:

process.stdout.on("error", (err) => {
  if ((err as NodeJS.ErrnoException).code === "EPIPE") process.exit(0);
});

Placed right after readStdin() in handleAuto's stdin branch, so piping output onward (| head, | less) no longer crashes.
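
Since handleConvert and handleAuto now carry the same handler, one option would be to share it through a small helper. The sketch below is an assumption, not code from this PR: installEpipeGuard, the ErrorSource interface, and the injected exit callback are hypothetical names, chosen so the guard can be exercised without a real pipe.

```typescript
// Error shape with the optional code field that Node attaches to system errors.
type CodedError = Error & { code?: string };

// Minimal structural type covering the part of process.stdout that is used.
interface ErrorSource {
  on(event: "error", listener: (err: CodedError) => void): unknown;
}

// Exit quietly when the downstream reader closes the pipe early.
// Like the handler quoted above, other error codes are ignored.
function installEpipeGuard(stream: ErrorSource, exit: (code: number) => void): void {
  stream.on("error", (err) => {
    if (err.code === "EPIPE") exit(0);
  });
}

// In the CLI this would be wired up as (assumed):
//   installEpipeGuard(process.stdout, (code) => process.exit(code));
```

Injecting the exit callback keeps the helper free of side effects in tests, where a fake stream can emit an EPIPE error and the recorded exit code can be checked.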

chenmofei added 16 commits May 22, 2026 10:55
- Add CLI package that converts Markdown to styled HTML via local AI agents
- Support 8 coding-agent CLIs (Claude Code, Codex, Cursor Agent, Gemini, etc.)
- 75 skill templates from next/src/lib/templates/skills/
- Spinner progress indicator with chunk count and elapsed time (zero deps, pure ANSI)
- Auto-save output to <input>.html when input is a file
- --output-dir / -d flag to specify auto-save directory
- Config management (default template, agent, model)
- Stdin support for piping content

Part of: nexu-io/html-anything
- extractHtml: return empty string instead of wrapping non-HTML in pre tag,
  so the CLI correctly surfaces agent errors (rate limits, auth failures)
  instead of silently saving a valid-looking HTML file around error text
- createSpinner: in the non-TTY branch, still flush the final status
  message to stderr so CI/piped scripts can diagnose failures
Agent exit-code & stderr (A): track done.code and stderr; if the agent
exits non-zero, report the failure instead of silently saving a (possibly
truncated) HTML file with exit 0.

Format validation (B): reject unknown --format values with a list of
supported formats (markdown, text, csv, json).

Config write guard (C): catch filesystem errors in saveConfig() so disk-
full/permission failures show a readable message instead of an uncaught
exception.

Overwrite prompt (D): ask before overwriting an existing output file
in TTY mode; skip the prompt (auto-overwrite) when piped/CI.

EPIPE handler (E): catch broken-pipe errors on stdout so piping to
head(1) or early-closing consumers does not print a noisy stacktrace.

-o/-d conflict (F): error when both --output and --output-dir are set.

Multi-file support (G): accept multiple positional input files, process
each sequentially, then summarise failures.
When multiple input files would produce the same output basename (e.g.
dir1/readme.md and dir2/readme.md both -> readme.html), the CLI now
pre-scans before any work begins:

1. Collision detection — lists conflicting basenames and asks whether to
   resolve by preserving relative directory paths (dir1/readme.html).
2. Overwrite check — after resolving all output paths, checks whether any
   target files already exist and asks for confirmation before overwriting.
3. On N at any step, the CLI aborts with a clear error before any agent
   work starts.
- Batch overwrite now skips the interactive prompt outside TTY (matching
  the single-file promptOverwrite auto-overwrite behaviour), so scripted
  CI runs don't abort when existing outputs are present.
- resolveCollisionOutput now derives relative paths from the common
  ancestor of all colliding inputs (findCommonPath) instead of cwd, and
  strips '..' segments so outputs stay inside --output-dir, even when
  inputs live outside the current working directory.
…d default agents

- agents-invoke: aider/deepseek close path now enqueues stdoutBuf directly
  instead of running it through both parse() AND a raw enqueue, which
  was producing duplicate HTML (two <!DOCTYPE html> blocks).

- handleConfig set-default-agent: now rejects agents that are not
  installed (!available) or use an unsupported protocol (unsupported),
  with a clear error listing available supported alternatives.

- findAgent: when resolving config.defaultAgent, now also filters out
  unsupported agents so a stale default (e.g. from manual config.json
  edit) automatically falls through to the next available agent.
…ult agents

- agents-invoke: aider/deepseek close path now enqueues stdoutBuf directly
  instead of running it through both parse() AND a raw enqueue, which
  was producing duplicate <!DOCTYPE html> blocks.

- findAgent: when resolving config.defaultAgent, now also filters out
  unsupported agents so a stale default (e.g. from manual config.json
  edit) automatically falls through to the next available agent.

- handleConfig set-default-agent: now rejects agents that are not
  installed or use an unsupported protocol, with a clear error
  listing available supported alternatives.
detectAgents() previously only accepted *_BIN overrides as absolute
paths (existsSync). Relative command names like GEMINI_BIN=fake-claude
were dropped even though invocation (resolveBinForAgent) can find them
on PATH. Now falls back to resolveOnPath() when existsSync fails, so
detection and config flows match the actual invoke behaviour.
Based on all reviewer feedback across 10 rounds, added a complete regression
test suite covering every reported failure path:

- extract-html.test.ts (9): non-HTML content returns empty, no scaffold wrapping
- prompt.test.ts (11): TTY/non-TTY behavior for promptYesNo & promptOverwrite
- collision-resolve.test.ts (8): findCommonPath & resolveCollisionOutput edge cases
- agents-detect.test.ts (20): *_BIN env overrides, PATH resolution, unsupported protocols
- agents-invoke.test.ts (19): DeepSeek/Aider close path no double-enqueue, exit code propagation
- index.test.ts (22): param validation, config set-default-agent guards, convert integration

Refactored for testability:
- Extracted collision-resolve.ts (findCommonPath + resolveCollisionOutput)
- Extracted prompt.ts (promptYesNo + promptOverwrite)

All 89 tests pass. Typecheck and build clean.
tryPath() in resolveBinForAgent previously only handled absolute
paths (starting with / or C:\) and command names on PATH. Relative
paths like ./mock-deepseek or ../wrappers/claude fell through to
resolveOnPath() which only searches PATH directories, causing a
mismatch where detectAgents() reported the agent as available but
invokeAgent() could not find it.

Now paths containing / or \ or starting with . are resolved via
path.resolve() + existsSync(), matching what detectAgents() does.
Two new test cases verify that invokeAgent correctly resolves
relative binOverride paths (e.g. ./mock-agent, ../bin/claude)
via path.resolve() + existsSync(), matching what detectAgents()
already does.
Implements automatic template detection for the CLI, partially resolves nexu-io#60
and supplements the CLI entrypoint introduced in nexu-io#75.

- Add skills-matcher.ts with three-layer matching strategy:
  1. ~80 strong-signal keyword rules (resume→resume-modern, etc.)
  2. Full-template scoring (tags + name + description + scenario)
  3. AI summary fallback only when confidence is low (~0 tokens)

- Add `auto` command: html-anything auto article.md
- Support --force-ai (skip rules) and --show-match-only flags
- Update README with consolidated parameter docs and decision flowchart

Examples:
  html-anything auto resume.md        # auto-match + convert
  html-anything auto article.md --show-match-only  # preview match only
- Add kwMatches() with \b word-boundary for ASCII keywords, substring for CJK
- Remove ambiguous short keywords: "X", "RED", "TODO", "done", "doing", "todo"
- Gate Layer-2 fallback on !forceAi so --force-ai reaches AI summary
- Add EPIPE guard to handleAuto stdin-to-stdout path (matching handleConvert)
…tests

- Add kwMatches() with \b word-boundary for ASCII keywords, substring for CJK
- Remove ambiguous short keywords: "X", "RED", "TODO", "done", "doing", "todo"
- Gate Layer-2 fallback on !forceAi so --force-ai reaches AI summary
- Add EPIPE guard to handleAuto stdin-to-stdout path
- Fix Layer-1 gate: strong-signal matching now works for any content length
- Export kwMatches for unit testing
- Add skills-matcher.test.ts with 39 tests covering kwMatches, strong-signal matching, false-positive prevention, --force-ai path, fallback, and reason output
@chenmofei chenmofei force-pushed the feat/cli-auto-template-match branch from c3dbcee to af5ee90 Compare May 22, 2026 03:21

@chenmofei chenmofei left a comment

Author

Thanks for the detailed review, all three issues have been addressed in the latest push. Also synced with latest code from #75.

1. 🔴 Blocking — short ASCII keyword false positives (cli/src/skills-matcher.ts:L145)

  • Added kwMatches() helper: ASCII keywords use \b word-boundary regex, CJK uses includes.
  • Applied to both strongSignalMatch and ruleMatch (tags + scenario keywords).
  • Removed ambiguous short keywords: "X", "RED", "TODO", "done", "doing", "todo".
  • Added 39 unit tests in skills-matcher.test.ts verifying all the false-positive cases you identified (e.g. "x" inside "next" → false, "app" inside "happy" → false, "live" inside "deliver" → false).

2. 🟡 --force-ai does not actually force AI (cli/src/skills-matcher.ts:L238)

  • Layer-2 fallback now gated on !forceAi — with --force-ai, the rule match is bypassed so execution falls through to aiSummaryMatch.

3. 🟡 Missing EPIPE guard (cli/src/index.ts:L624)

  • Added process.stdout.on("error", ...) EPIPE handler in handleAuto stdin path, matching the existing guard in handleConvert.

@PerishCode PerishCode left a comment

Contributor

@chenmofei — thanks for the careful follow-up work on the auto command. I reviewed head af5ee90 against the three unresolved threads from the previous round, and all three are addressed:

  • Short-ASCII keyword false positives: kwMatches() now applies a \b word-boundary regex for ASCII keywords and includes for CJK, and the bare "X" / "TODO" / "done" / "doing" style entries are gone from STRONG_SIGNALS. Applied consistently to strongSignalMatch and ruleMatch (tags + scenario keywords).
  • --force-ai: the !forceAi guard on the Layer 2 early return means rule scoring no longer short-circuits before the AI summary path for normal-length content; --force-ai now actually reaches aiSummaryMatch.
  • auto stdin → stdout EPIPE: handleAuto's stdin branch now installs the same process.stdout EPIPE handler that handleConvert has, so cat … | html-anything auto | head no longer crashes.

Verification: ran pnpm -F @html-anything/cli test (130 tests across 7 files, all passing — including the new skills-matcher suite covering the word-boundary false-positive cases and the --force-ai path) and pnpm -F @html-anything/cli typecheck (clean). CI does not report checks on this branch, so this is local verification.

The three-layer matching design is clear and the test coverage for the regression cases is solid. Nice work iterating on this — approving.

Development

Successfully merging this pull request may close these issues.

Provide an agent.md so the local CLI can decide on the theme and its usage by itself

3 participants