Audit-driven fixes: correctness, gate integrity, config-dir, CI/doc coverage by ferran-valvia · Pull Request #136 · DietrichGebert/ponytail

ferran-valvia · 2026-06-17T11:08:50Z

What this is

A read-only senior audit of the repo turned into a set of small, verified fixes.
Each finding was adversarially verified before landing, and every change ships
with the runnable check that fails if it regresses. No behavior is changed beyond
the bullets below; the multi-platform rule copies (the product) are untouched.

All commits keep node scripts/check-rule-copies.js and npm test green
(77 checks), and the series applies cleanly on top of main.

Fixes

Correctness / state machine

- mode-tracker double-output: a prompt that both switches mode and contains
  "stop ponytail" / "normal mode" emitted two JSON objects on one stdout,
  breaking the host's JSON.parse (Codex/Copilot) and leaving the emitted
  systemMessage contradicting the final state. The deactivation branch is now
  mutually exclusive with the command branch.
- silent level reset on a typo: /ponytail <unknown-arg> fell through to
  getDefaultMode() and silently reset the active level in Claude Code and
  OpenCode. It is now a no-op (the active level is left alone), matching what the
  pi extension already does. Bare /ponytail still re-applies the default.
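
For reviewers, a minimal sketch of the intended control flow, not the actual mode-tracker code. The function names (`handlePrompt`, `emit`), the level names, and the message shape are all hypothetical, and letting the command win over the deactivation phrase is an assumption; the only points it illustrates are that at most one JSON object is produced per prompt and that an unknown argument is a no-op.

```javascript
// Hypothetical sketch; names, levels, and message shapes are illustrative only.
const LEVELS = ["lite", "full"]; // assumed level names
const DEFAULT_LEVEL = "full";    // stand-in for whatever getDefaultMode() returns

// Returns { level, output }: the active level after the prompt (null = off)
// and at most one JSON string destined for stdout (null = print nothing).
function handlePrompt(prompt, activeLevel) {
  const text = prompt.trim().toLowerCase();
  const command = text.match(/^\/ponytail(?:\s+(\S+))?(?=\s|$)/);

  if (command) {
    const arg = command[1];
    if (arg === undefined) {
      // Bare /ponytail re-applies the default level.
      return emit(DEFAULT_LEVEL);
    }
    if (LEVELS.includes(arg)) {
      return emit(arg);
    }
    // Unknown argument (likely a typo): keep the active level, print nothing.
    return { level: activeLevel, output: null };
  }

  // Only reached when no command matched, so both branches can never fire.
  if (/stop ponytail|normal mode/.test(text)) {
    return { level: null, output: JSON.stringify({ systemMessage: "ponytail off" }) };
  }
  return { level: activeLevel, output: null };
}

function emit(level) {
  return { level, output: JSON.stringify({ systemMessage: `ponytail ${level}` }) };
}
```

Because every path returns a single `output`, the host only ever has one object to parse, and the message always describes the level that was actually applied.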

Benchmark correctness gate (benchmarks/correctness.js)

- prose scored as a pass: the unfenced-fallback block is now tagged, and the
  run-free structural checks (countdown/ratelimit) require a real code construct,
  so prose that name-drops useState/FastAPI/429 no longer scores 1.
- exit-0 scored as a pass: a model demo that calls sys.exit(0) /
  process.exit(0) before the appended asserts run used to pass on exit code
  alone. Each harness already prints a PASS sentinel on success; the gate now
  requires it.
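
A rough sketch of the exit-0 fix, under stated assumptions: `passed` is a hypothetical name, the result shape mirrors what `child_process.spawnSync` returns, and treating the sentinel as a line containing exactly `PASS` is a guess at the harness output, not a description of benchmarks/correctness.js.

```javascript
// Hypothetical sketch; `passed` and the sentinel format are assumptions.
// `result` has the shape returned by child_process.spawnSync: { status, stdout }.
function passed(result) {
  const exitedCleanly = result.status === 0;
  // The sentinel is printed only after the appended asserts have run, so a demo
  // that exits 0 early never reaches it.
  const sawSentinel = /^PASS$/m.test(String(result.stdout));
  return exitedCleanly && sawSentinel;
}
```

Requiring both conditions means an early `process.exit(0)` fails the gate, and so does a run that prints the sentinel but then exits non-zero.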

Cross-platform config dir

The statusline scripts (.sh + .ps1) and the SessionStart setup nudge
hardcoded ~/.claude, but the hooks write the flag under $CLAUDE_CONFIG_DIR
when set. They now mirror getClaudeDir(), so the badge and nudge work when the
config dir is relocated.
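
The resolution rule the scripts now mirror can be sketched as follows. This is an assumed shape for getClaudeDir() based on the description above, not a copy of the hook code; `settingsPath` is a hypothetical helper added only to show the nudge using the resolved directory.

```javascript
const os = require("node:os");
const path = require("node:path");

// Assumed behavior: prefer $CLAUDE_CONFIG_DIR when set, else fall back to ~/.claude.
function getClaudeDir(env = process.env) {
  return env.CLAUDE_CONFIG_DIR || path.join(os.homedir(), ".claude");
}

// Hypothetical helper: the settings file the setup nudge should point at.
function settingsPath(env = process.env) {
  return path.join(getClaudeDir(env), "settings.json");
}
```

Passing `env` as a parameter keeps the fallback testable without mutating `process.env`.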

CI coverage

- Fold the orphaned benchmarks/correctness.test.js (the guard for issue #65,
  "Impact on model performance?", never run by npm test/CI) into
  tests/correctness.test.js, plus regression cases for the two gate fixes above.
- Add a bash smoke test for the statusline badge.
- Extend the Windows hooks test to cover copilot-hooks.json (script existence)
  and assert hooks.json ↔ copilot-hooks.json wire the same scripts.
- Add a parity gate over the three marketplace.json manifests, and a
  name+description gate over the skill source frontmatter.
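
To illustrate the hooks.json ↔ copilot-hooks.json assertion, here is a sketch of one way such a parity check could work. The manifest shapes are not shown in this PR description, so the code assumes nothing about them beyond "script paths appear somewhere as strings ending in .js"; `collectScripts` and `parityDiff` are hypothetical names, not the test's actual helpers.

```javascript
// Hypothetical sketch; the manifest shape and the .js path pattern are assumptions.
// Collect the file name of every .js script referenced anywhere in a parsed manifest.
function collectScripts(node, found = new Set()) {
  if (typeof node === "string") {
    for (const match of node.matchAll(/[\w./-]+\.js\b/g)) {
      found.add(match[0].split("/").pop());
    }
  } else if (node && typeof node === "object") {
    // Object.values walks both plain objects and arrays.
    for (const value of Object.values(node)) collectScripts(value, found);
  }
  return found;
}

// Report scripts wired in one manifest but missing from the other.
function parityDiff(first, second) {
  const a = collectScripts(first);
  const b = collectScripts(second);
  return {
    onlyInFirst: [...a].filter((s) => !b.has(s)).sort(),
    onlyInSecond: [...b].filter((s) => !a.has(s)).sort(),
  };
}
```

A test would then assert that both lists are empty, so a hook added to one host manifest cannot be silently forgotten in the other.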

Docs / metadata

- Reconcile the host/command matrix: Antigravity reuses gemini-extension.json
  (same as Gemini CLI) and surfaces commands as chat-skills, so it is documented
  as command-capable; Copilot CLI and OpenClaw are marked command-capable and
  "Copilot (editor)" as the instruction-only one; the bare /ponytail row is
  corrected (it re-applies the default, it does not report the level).
- Document .agents/plugins/marketplace.json and add the missing Aider row.
- Drop the unused license: MIT frontmatter key from skills/ponytail.

Deliberately not changed

mode-tracker still accepts a $ponytail prefix. It is undocumented and no
known host emits it, but narrowing an input parser on an unverified assumption
risks breaking a niche host for a cosmetic win, so it is left as-is.

Limits

- Antigravity's exact command surface and the precise CLI verb that resolves
  .agents/plugins/marketplace.json were reasoned from the shared manifest and
  docs, not confirmed against a live host.
- CI is ubuntu-only, so the PowerShell statusline path is static-checked, not
  executed; the bash statusline smoke test does run.

- mode-tracker: a prompt that both switches mode and says "stop ponytail" no longer emits two JSON objects on one stdout (broke the host JSON.parse in Codex/Copilot); the deactivation branch is now mutually exclusive.
- mode-tracker / opencode: an unknown /ponytail arg (a typo) no longer silently resets the active level; it is left untouched, matching what the pi extension already does. Bare /ponytail still re-applies the default.
- benchmarks/correctness gate: prose that name-drops the structural keywords no longer scores as a pass on the run-free checks (countdown/ratelimit), and a model demo that exits 0 before the appended asserts run no longer counts as a pass. The harness already prints a PASS sentinel; we require it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The hooks write the mode flag under $CLAUDE_CONFIG_DIR when set (getClaudeDir), but the statusline scripts hardcoded ~/.claude, so the badge vanished whenever a user relocated Claude's config dir. Both the bash and PowerShell statusline now resolve the same dir. The SessionStart setup nudge likewise pointed at a literal ~/.claude/settings.json; it now interpolates the resolved settings path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

- Fold the orphaned benchmarks/correctness.test.js (issue DietrichGebert#65 guard, never run by `npm test` or CI) into tests/correctness.test.js, and add regression cases for the prose-as-pass and exit-0-as-pass fixes.
- Add a bash smoke test for the statusline badge (both CLAUDE_CONFIG_DIR and the ~/.claude fallback).
- Extend the Windows hooks test to also validate copilot-hooks.json (script existence) and assert hooks.json and copilot-hooks.json wire the same scripts, so a hook added to one host manifest can't be silently forgotten in the other.
- Add a parity gate over the three marketplace.json manifests (parse, plugin name, shared description) and a name+description gate over the skill source frontmatter.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

- README + agent-portability: Antigravity reuses gemini-extension.json (same as Gemini CLI) and surfaces /ponytail commands as chat-skills, so it is listed as command-capable, resolving the README's self-contradiction.
- Mark Copilot CLI and OpenClaw as command-capable and qualify "Copilot (editor)" as the instruction-only one.
- Correct the bare /ponytail description: it re-applies the default level, it does not report the current one.
- Document .agents/plugins/marketplace.json (the .agents-standard marketplace manifest) and add the missing Aider row.
- Drop the unused `license: MIT` frontmatter key from skills/ponytail (no host reads it; the openclaw generator injects license uniformly).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Ferran Torres and others added 4 commits June 17, 2026 13:05

ferran-valvia wants to merge 4 commits into DietrichGebert:main from ferran-valvia:ponytail-audit-fixes
