diff --git a/.clinerules/ponytail.md b/.clinerules/ponytail.md index 38c2e8f..faf0ad4 100644 --- a/.clinerules/ponytail.md +++ b/.clinerules/ponytail.md @@ -21,4 +21,4 @@ Rules: - Pick the edge-case-correct option when two stdlib approaches are the same size, lazy means less code, not the flimsier algorithm. - Mark intentional simplifications with a `ponytail:` comment. If the shortcut has a known ceiling (global lock, O(n²) scan, naive heuristic), the comment names the ceiling and the upgrade path. -Not lazy about: input validation at trust boundaries, error handling that prevents data loss, security, accessibility, the calibration real hardware needs (the platform is never the spec ideal, a clock drifts, a sensor reads off), anything explicitly requested. Lazy code without its check is unfinished: non-trivial logic leaves ONE runnable check behind, the smallest thing that fails if the logic breaks (an assert-based demo/self-check or one small test file; no frameworks, no fixtures). Trivial one-liners need no test. +Not lazy about: input validation at trust boundaries, error handling that prevents data loss, security, accessibility, the calibration real hardware needs (the platform is never the spec ideal, a clock drifts, a sensor reads off), anything explicitly requested. Guarded surfaces force a lite posture even in full/ultra: auth, persistence, money, privacy, concurrency, security, and public APIs. Minimize code, but never simplify away authorization, validation, idempotence, migrations, auditability, rate limits, privacy boundaries, concurrency safety, tests, or public API compatibility. When a guarded surface is touched, include a short safety checklist, run normal verification before `/ponytail-review`, and treat `/ponytail-review` as a complexity pass only, not approval to remove safety controls. Lazy code without its check is unfinished: non-trivial logic leaves ONE runnable check behind, the smallest thing that fails if the logic breaks (an assert-based demo/self-check or one small test file; no frameworks, no fixtures). Trivial one-liners need no test. diff --git a/.cursor/rules/ponytail.mdc b/.cursor/rules/ponytail.mdc index 09c6699..ba75433 100644 --- a/.cursor/rules/ponytail.mdc +++ b/.cursor/rules/ponytail.mdc @@ -27,4 +27,4 @@ Rules: - Pick the edge-case-correct option when two stdlib approaches are the same size, lazy means less code, not the flimsier algorithm. - Mark intentional simplifications with a `ponytail:` comment. If the shortcut has a known ceiling (global lock, O(n²) scan, naive heuristic), the comment names the ceiling and the upgrade path. -Not lazy about: input validation at trust boundaries, error handling that prevents data loss, security, accessibility, the calibration real hardware needs (the platform is never the spec ideal, a clock drifts, a sensor reads off), anything explicitly requested. Lazy code without its check is unfinished: non-trivial logic leaves ONE runnable check behind, the smallest thing that fails if the logic breaks (an assert-based demo/self-check or one small test file; no frameworks, no fixtures). Trivial one-liners need no test. +Not lazy about: input validation at trust boundaries, error handling that prevents data loss, security, accessibility, the calibration real hardware needs (the platform is never the spec ideal, a clock drifts, a sensor reads off), anything explicitly requested. Guarded surfaces force a lite posture even in full/ultra: auth, persistence, money, privacy, concurrency, security, and public APIs. Minimize code, but never simplify away authorization, validation, idempotence, migrations, auditability, rate limits, privacy boundaries, concurrency safety, tests, or public API compatibility. When a guarded surface is touched, include a short safety checklist, run normal verification before `/ponytail-review`, and treat `/ponytail-review` as a complexity pass only, not approval to remove safety controls. Lazy code without its check is unfinished: non-trivial logic leaves ONE runnable check behind, the smallest thing that fails if the logic breaks (an assert-based demo/self-check or one small test file; no frameworks, no fixtures). Trivial one-liners need no test. diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md index 38c2e8f..faf0ad4 100644 --- a/.github/copilot-instructions.md +++ b/.github/copilot-instructions.md @@ -21,4 +21,4 @@ Rules: - Pick the edge-case-correct option when two stdlib approaches are the same size, lazy means less code, not the flimsier algorithm. - Mark intentional simplifications with a `ponytail:` comment. If the shortcut has a known ceiling (global lock, O(n²) scan, naive heuristic), the comment names the ceiling and the upgrade path. -Not lazy about: input validation at trust boundaries, error handling that prevents data loss, security, accessibility, the calibration real hardware needs (the platform is never the spec ideal, a clock drifts, a sensor reads off), anything explicitly requested. Lazy code without its check is unfinished: non-trivial logic leaves ONE runnable check behind, the smallest thing that fails if the logic breaks (an assert-based demo/self-check or one small test file; no frameworks, no fixtures). Trivial one-liners need no test. +Not lazy about: input validation at trust boundaries, error handling that prevents data loss, security, accessibility, the calibration real hardware needs (the platform is never the spec ideal, a clock drifts, a sensor reads off), anything explicitly requested. Guarded surfaces force a lite posture even in full/ultra: auth, persistence, money, privacy, concurrency, security, and public APIs. Minimize code, but never simplify away authorization, validation, idempotence, migrations, auditability, rate limits, privacy boundaries, concurrency safety, tests, or public API compatibility. When a guarded surface is touched, include a short safety checklist, run normal verification before `/ponytail-review`, and treat `/ponytail-review` as a complexity pass only, not approval to remove safety controls. Lazy code without its check is unfinished: non-trivial logic leaves ONE runnable check behind, the smallest thing that fails if the logic breaks (an assert-based demo/self-check or one small test file; no frameworks, no fixtures). Trivial one-liners need no test. diff --git a/.kiro/steering/ponytail.md b/.kiro/steering/ponytail.md index 6f0b1b4..571aa51 100644 --- a/.kiro/steering/ponytail.md +++ b/.kiro/steering/ponytail.md @@ -26,4 +26,4 @@ Rules: - Pick the edge-case-correct option when two stdlib approaches are the same size, lazy means less code, not the flimsier algorithm. - Mark intentional simplifications with a `ponytail:` comment. If the shortcut has a known ceiling (global lock, O(n²) scan, naive heuristic), the comment names the ceiling and the upgrade path. -Not lazy about: input validation at trust boundaries, error handling that prevents data loss, security, accessibility, the calibration real hardware needs (the platform is never the spec ideal, a clock drifts, a sensor reads off), anything explicitly requested. Lazy code without its check is unfinished: non-trivial logic leaves ONE runnable check behind, the smallest thing that fails if the logic breaks (an assert-based demo/self-check or one small test file; no frameworks, no fixtures). Trivial one-liners need no test. +Not lazy about: input validation at trust boundaries, error handling that prevents data loss, security, accessibility, the calibration real hardware needs (the platform is never the spec ideal, a clock drifts, a sensor reads off), anything explicitly requested. Guarded surfaces force a lite posture even in full/ultra: auth, persistence, money, privacy, concurrency, security, and public APIs. Minimize code, but never simplify away authorization, validation, idempotence, migrations, auditability, rate limits, privacy boundaries, concurrency safety, tests, or public API compatibility. When a guarded surface is touched, include a short safety checklist, run normal verification before `/ponytail-review`, and treat `/ponytail-review` as a complexity pass only, not approval to remove safety controls. Lazy code without its check is unfinished: non-trivial logic leaves ONE runnable check behind, the smallest thing that fails if the logic breaks (an assert-based demo/self-check or one small test file; no frameworks, no fixtures). Trivial one-liners need no test. diff --git a/.windsurf/rules/ponytail.md b/.windsurf/rules/ponytail.md index 38c2e8f..faf0ad4 100644 --- a/.windsurf/rules/ponytail.md +++ b/.windsurf/rules/ponytail.md @@ -21,4 +21,4 @@ Rules: - Pick the edge-case-correct option when two stdlib approaches are the same size, lazy means less code, not the flimsier algorithm. - Mark intentional simplifications with a `ponytail:` comment. If the shortcut has a known ceiling (global lock, O(n²) scan, naive heuristic), the comment names the ceiling and the upgrade path. -Not lazy about: input validation at trust boundaries, error handling that prevents data loss, security, accessibility, the calibration real hardware needs (the platform is never the spec ideal, a clock drifts, a sensor reads off), anything explicitly requested. Lazy code without its check is unfinished: non-trivial logic leaves ONE runnable check behind, the smallest thing that fails if the logic breaks (an assert-based demo/self-check or one small test file; no frameworks, no fixtures). Trivial one-liners need no test. +Not lazy about: input validation at trust boundaries, error handling that prevents data loss, security, accessibility, the calibration real hardware needs (the platform is never the spec ideal, a clock drifts, a sensor reads off), anything explicitly requested. Guarded surfaces force a lite posture even in full/ultra: auth, persistence, money, privacy, concurrency, security, and public APIs. Minimize code, but never simplify away authorization, validation, idempotence, migrations, auditability, rate limits, privacy boundaries, concurrency safety, tests, or public API compatibility. When a guarded surface is touched, include a short safety checklist, run normal verification before `/ponytail-review`, and treat `/ponytail-review` as a complexity pass only, not approval to remove safety controls. Lazy code without its check is unfinished: non-trivial logic leaves ONE runnable check behind, the smallest thing that fails if the logic breaks (an assert-based demo/self-check or one small test file; no frameworks, no fixtures). Trivial one-liners need no test. diff --git a/AGENTS.md b/AGENTS.md index 13910b9..8b00ec8 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -21,6 +21,6 @@ Rules: - Pick the edge-case-correct option when two stdlib approaches are the same size, lazy means less code, not the flimsier algorithm. - Mark intentional simplifications with a `ponytail:` comment. If the shortcut has a known ceiling (global lock, O(n²) scan, naive heuristic), the comment names the ceiling and the upgrade path. -Not lazy about: input validation at trust boundaries, error handling that prevents data loss, security, accessibility, the calibration real hardware needs (the platform is never the spec ideal, a clock drifts, a sensor reads off), anything explicitly requested. Lazy code without its check is unfinished: non-trivial logic leaves ONE runnable check behind, the smallest thing that fails if the logic breaks (an assert-based demo/self-check or one small test file; no frameworks, no fixtures). Trivial one-liners need no test. +Not lazy about: input validation at trust boundaries, error handling that prevents data loss, security, accessibility, the calibration real hardware needs (the platform is never the spec ideal, a clock drifts, a sensor reads off), anything explicitly requested. Guarded surfaces force a lite posture even in full/ultra: auth, persistence, money, privacy, concurrency, security, and public APIs. Minimize code, but never simplify away authorization, validation, idempotence, migrations, auditability, rate limits, privacy boundaries, concurrency safety, tests, or public API compatibility. When a guarded surface is touched, include a short safety checklist, run normal verification before `/ponytail-review`, and treat `/ponytail-review` as a complexity pass only, not approval to remove safety controls. Lazy code without its check is unfinished: non-trivial logic leaves ONE runnable check behind, the smallest thing that fails if the logic breaks (an assert-based demo/self-check or one small test file; no frameworks, no fixtures). Trivial one-liners need no test. (Yes, this file also applies to agents working on the ponytail repo itself. Especially to them.) diff --git a/benchmarks/correctness.js b/benchmarks/correctness.js index 5ca1753..a3b3386 100644 --- a/benchmarks/correctness.js +++ b/benchmarks/correctness.js @@ -6,14 +6,18 @@ // Unlike loc.js (measurement-only), this one is a gate — a wrong answer is a // wrong answer regardless of how few lines produced it. -const { execSync } = require('child_process'); +const { execFileSync } = require('child_process'); const fs = require('fs'); const os = require('os'); const path = require('path'); // Extract fenced code blocks, tagged by language. function extractBlocks(text) { - const matches = [...text.matchAll(/```(\w*)\n([\s\S]*?)```/g)]; + text = String(text || ''); + const matches = [...text.matchAll(/```(\w*)\r?\n([\s\S]*?)```/g)]; + // ponytail: terse models often answer with bare, unfenced code. Treat the whole + // response as one block so the gate scores the code instead of reporting "no block". + if (matches.length === 0 && text.trim()) return [{ lang: '', code: text }]; return matches.map((m) => ({ lang: (m[1] || '').toLowerCase(), code: m[2] })); } @@ -28,28 +32,67 @@ function identifyTask(task) { return null; } -// Run a command, return { ok, stderr }. -function exec(cmd, opts = {}) { +// Run an executable directly, return { ok, stderr }. +function execFile(command, args = [], opts = {}) { try { - execSync(cmd, { timeout: 10_000, encoding: 'utf8', stdio: 'pipe', ...opts }); + execFileSync(command, args, { timeout: 10_000, encoding: 'utf8', stdio: 'pipe', ...opts }); return { ok: true, stderr: '' }; } catch (e) { - return { ok: false, stderr: (e.stderr || e.message || '').slice(0, 500) }; + return { ok: false, stderr: (e.stderr || e.stdout || e.message || '').slice(0, 500) }; } } -// ponytail: probe once at load; macOS and many Linux images ship python3 only. -let pythonCmd; -function python() { - if (pythonCmd) return pythonCmd; - for (const cmd of ['python3', 'python']) { - if (exec(`${cmd} -c "import sys"`).ok) { - pythonCmd = cmd; - return pythonCmd; +function addPythonCandidate(candidates, seen, command, args = [], opts = {}) { + if (!command) return; + const key = [command, ...args].join('\0'); + if (seen.has(key)) return; + seen.add(key); + candidates.push({ command, args, opts }); +} + +// ponytail: probe once per dependency profile. Prefer explicit envs before +// bare PATH lookup; macOS/Homebrew can otherwise shadow an active conda env. +const pythonCmds = new Map(); +function python(extraPackages = []) { + const key = extraPackages.join(','); + if (pythonCmds.has(key)) return pythonCmds.get(key); + + const candidates = []; + const seen = new Set(); + addPythonCandidate(candidates, seen, process.env.PONYTAIL_PYTHON || process.env.PYTHON); + + if (process.env.CONDA_PREFIX) { + if (process.platform === 'win32') { + addPythonCandidate(candidates, seen, path.join(process.env.CONDA_PREFIX, 'python.exe')); + } else { + addPythonCandidate(candidates, seen, path.join(process.env.CONDA_PREFIX, 'bin', 'python3')); + addPythonCandidate(candidates, seen, path.join(process.env.CONDA_PREFIX, 'bin', 'python')); + } + } + + addPythonCandidate(candidates, seen, 'python3'); + addPythonCandidate(candidates, seen, 'python'); + + if (extraPackages.includes('pandas')) { + addPythonCandidate(candidates, seen, 'uv', ['run', '--with', 'pandas', 'python'], { timeout: 120_000 }); + } + + const imports = ['sys', ...extraPackages].map((name) => `import ${name}`).join('; '); + for (const candidate of candidates) { + if (execFile(candidate.command, [...candidate.args, '-c', imports], candidate.opts).ok) { + pythonCmds.set(key, candidate); + return candidate; } } - pythonCmd = 'python3'; - return pythonCmd; + + const fallback = { command: 'python3', args: [] }; + pythonCmds.set(key, fallback); + return fallback; +} + +function runPython(file, extraPackages = []) { + const candidate = python(extraPackages); + return execFile(candidate.command, [...candidate.args, file], candidate.opts); } // Write content to a temp file, return the path. @@ -114,14 +157,14 @@ if failures: print("PASS") `; const f = tmpFile('.py', harness); - const result = exec(`${python()} "${f}"`); + const result = runPython(f); fs.unlinkSync(f); if (result.ok) return { pass: true, reason: 'Email validator passes all checks' }; return { pass: false, reason: result.stderr || 'Email validator failed' }; }, debounce(blocks) { - const code = blocks.find((b) => b.lang === 'javascript' || b.lang === 'js' || (!b.lang && b.code.includes('function'))); + const code = blocks.find((b) => b.lang === 'javascript' || b.lang === 'js' || (!b.lang && (b.code.includes('function') || b.code.includes('=>')))); if (!code) return { pass: false, reason: 'No JavaScript code block found' }; const harness = ` @@ -159,7 +202,7 @@ setTimeout(() => { }, 120); `; const f = tmpFile('.mjs', harness); - const result = exec(`node "${f}"`); + const result = execFile(process.execPath, [f]); fs.unlinkSync(f); if (result.ok) return { pass: true, reason: 'Debounce passes all checks' }; return { pass: false, reason: result.stderr || 'Debounce failed' }; @@ -191,8 +234,8 @@ try: ${patched.split('\n').map((l) => ' ' + l).join('\n')} except Exception as e: sys.stdout = _stdout - # If it needs sales.csv in cwd, write it there and retry - pass + print("FAIL: exception while running generated code: " + repr(e), file=sys.stderr) + sys.exit(1) output = sys.stdout.getvalue() sys.stdout = _stdout @@ -208,7 +251,7 @@ else: sys.exit(1) `; const f = tmpFile('.py', harness); - const result = exec(`${python()} "${f}"`); + const result = runPython(f, ['pandas']); try { fs.unlinkSync(f); } catch (e) {} try { fs.unlinkSync(csvPath); } catch (e) {} if (result.ok) return { pass: true, reason: 'CSV sum produces correct result (351)' }; diff --git a/hooks/ponytail-instructions.js b/hooks/ponytail-instructions.js index 3cd3dd8..453b380 100644 --- a/hooks/ponytail-instructions.js +++ b/hooks/ponytail-instructions.js @@ -63,6 +63,9 @@ function getFallbackInstructions(mode) { '## When NOT to be lazy\n\n' + 'Never simplify away: input validation at trust boundaries, error handling that prevents data loss, ' + 'security measures, accessibility basics, the calibration real hardware needs (the platform is never the spec ideal), anything the user explicitly asked to keep. ' + + 'Guarded surfaces force a lite posture even in full/ultra: auth, persistence, money, privacy, concurrency, security, and public APIs. ' + + 'Minimize code, but never simplify away authorization, validation, idempotence, migrations, auditability, rate limits, privacy boundaries, concurrency safety, tests, or public API compatibility. ' + + 'When a guarded surface is touched, include a short safety checklist, run normal verification before `/ponytail-review`, and treat `/ponytail-review` as a complexity pass only, not approval to remove safety controls. ' + 'Lazy code without its check is unfinished: non-trivial logic leaves ONE runnable check behind (assert-based demo/self-check or one small test file; no frameworks). Trivial one-liners need no test.\n\n' + '## Boundaries\n\n' + 'Ponytail governs what you build, not how you talk. "stop ponytail" or "normal mode": revert. Level persists until changed or session end.'; diff --git a/scripts/check-rule-copies.js b/scripts/check-rule-copies.js index d80fbc2..3bfafa1 100644 --- a/scripts/check-rule-copies.js +++ b/scripts/check-rule-copies.js @@ -44,6 +44,13 @@ const INVARIANTS = [ 'ONE runnable check', // test reflex 'flimsier algorithm', // robust-variant rule 'input validation at trust boundaries', // the "not lazy about" clause + 'prevents data loss', // data-loss error handling carve-out + 'security', // original safety carve-out + 'accessibility', // original accessibility carve-out + 'Guarded surfaces', // safety override for sensitive work + 'authorization, validation, idempotence', // guarded-surface controls + 'public API compatibility', // public API guardrail + 'not approval to remove', // ponytail-review safety boundary 'Lazy code without its check is unfinished', // one-check promoted to headline ]; @@ -58,9 +65,33 @@ for (const phrase of INVARIANTS) { } } +// The hook is a live instruction source for OpenCode and the fallback path, but +// it is intentionally shorter than SKILL.md/AGENTS.md. Check the invariants it +// must carry without forcing byte equality with the compact host copies. +const HOOK_INVARIANTS = [ + 'ONE runnable check', + 'input validation at trust boundaries', + 'prevents data loss', + 'security', + 'accessibility', + 'Guarded surfaces', + 'authorization, validation, idempotence', + 'public API compatibility', + 'not approval to remove', + 'Lazy code without its check is unfinished', +]; + +const hook = read('hooks/ponytail-instructions.js'); +for (const phrase of HOOK_INVARIANTS) { + if (!hook.includes(phrase)) { + console.error(`hooks/ponytail-instructions.js is missing rule invariant: "${phrase}"`); + failed = true; + } +} + if (failed) { console.error('Update the copied rule text, AGENTS.md, or SKILL.md so the shared rules match.'); process.exit(1); } -console.log(`Rule copies match AGENTS.md; ${INVARIANTS.length} rule invariants present in SKILL.md and AGENTS.md.`); +console.log(`Rule copies match AGENTS.md; ${INVARIANTS.length} rule invariants present in SKILL.md and AGENTS.md; ${HOOK_INVARIANTS.length} hook invariants present.`); diff --git a/skills/ponytail/SKILL.md b/skills/ponytail/SKILL.md index 0e0d3be..fded70a 100644 --- a/skills/ponytail/SKILL.md +++ b/skills/ponytail/SKILL.md @@ -84,6 +84,21 @@ Hardware is never the ideal on paper: a real clock drifts, a real sensor reads off, a PCA9685 runs a few percent fast. Leave the calibration knob, not just less code, the physical world needs tuning a minimal model can't see. +## Guarded Surfaces + +Guarded surfaces force a lite posture even when the current mode is full or +ultra: auth, persistence, money, privacy, concurrency, security, and public APIs. + +Minimize code, but never simplify away authorization, validation, idempotence, +migrations, auditability, rate limits, privacy boundaries, concurrency safety, +tests, or public API compatibility. + +When a guarded surface is touched: +- include a short safety checklist; +- run normal verification before `/ponytail-review`; +- treat `/ponytail-review` as a complexity pass only, not approval to remove + safety controls. + Lazy code without its check is unfinished. Non-trivial logic (a branch, a loop, a parser, a money/security path) leaves ONE runnable check behind, the smallest thing that fails if the logic breaks: an `assert`-based