Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .clinerules/ponytail.md
Original file line number Diff line number Diff line change
Expand Up @@ -21,4 +21,4 @@ Rules:
- Pick the edge-case-correct option when two stdlib approaches are the same size, lazy means less code, not the flimsier algorithm.
- Mark intentional simplifications with a `ponytail:` comment. If the shortcut has a known ceiling (global lock, O(n²) scan, naive heuristic), the comment names the ceiling and the upgrade path.

Not lazy about: input validation at trust boundaries, error handling that prevents data loss, security, accessibility, the calibration real hardware needs (the platform is never the spec ideal, a clock drifts, a sensor reads off), anything explicitly requested. Lazy code without its check is unfinished: non-trivial logic leaves ONE runnable check behind, the smallest thing that fails if the logic breaks (an assert-based demo/self-check or one small test file; no frameworks, no fixtures). Trivial one-liners need no test.
Not lazy about: input validation at trust boundaries, error handling that prevents data loss, security, accessibility, the calibration real hardware needs (the platform is never the spec ideal, a clock drifts, a sensor reads off), anything explicitly requested. Guarded surfaces force a lite posture even in full/ultra: auth, persistence, money, privacy, concurrency, security, and public APIs. Minimize code, but never simplify away authorization, validation, idempotence, migrations, auditability, rate limits, privacy boundaries, concurrency safety, tests, or public API compatibility. When a guarded surface is touched, include a short safety checklist, run normal verification before `/ponytail-review`, and treat `/ponytail-review` as a complexity pass only, not approval to remove safety controls. Lazy code without its check is unfinished: non-trivial logic leaves ONE runnable check behind, the smallest thing that fails if the logic breaks (an assert-based demo/self-check or one small test file; no frameworks, no fixtures). Trivial one-liners need no test.
2 changes: 1 addition & 1 deletion .cursor/rules/ponytail.mdc
Original file line number Diff line number Diff line change
Expand Up @@ -27,4 +27,4 @@ Rules:
- Pick the edge-case-correct option when two stdlib approaches are the same size, lazy means less code, not the flimsier algorithm.
- Mark intentional simplifications with a `ponytail:` comment. If the shortcut has a known ceiling (global lock, O(n²) scan, naive heuristic), the comment names the ceiling and the upgrade path.

Not lazy about: input validation at trust boundaries, error handling that prevents data loss, security, accessibility, the calibration real hardware needs (the platform is never the spec ideal, a clock drifts, a sensor reads off), anything explicitly requested. Lazy code without its check is unfinished: non-trivial logic leaves ONE runnable check behind, the smallest thing that fails if the logic breaks (an assert-based demo/self-check or one small test file; no frameworks, no fixtures). Trivial one-liners need no test.
Not lazy about: input validation at trust boundaries, error handling that prevents data loss, security, accessibility, the calibration real hardware needs (the platform is never the spec ideal, a clock drifts, a sensor reads off), anything explicitly requested. Guarded surfaces force a lite posture even in full/ultra: auth, persistence, money, privacy, concurrency, security, and public APIs. Minimize code, but never simplify away authorization, validation, idempotence, migrations, auditability, rate limits, privacy boundaries, concurrency safety, tests, or public API compatibility. When a guarded surface is touched, include a short safety checklist, run normal verification before `/ponytail-review`, and treat `/ponytail-review` as a complexity pass only, not approval to remove safety controls. Lazy code without its check is unfinished: non-trivial logic leaves ONE runnable check behind, the smallest thing that fails if the logic breaks (an assert-based demo/self-check or one small test file; no frameworks, no fixtures). Trivial one-liners need no test.
2 changes: 1 addition & 1 deletion .github/copilot-instructions.md
Original file line number Diff line number Diff line change
Expand Up @@ -21,4 +21,4 @@ Rules:
- Pick the edge-case-correct option when two stdlib approaches are the same size, lazy means less code, not the flimsier algorithm.
- Mark intentional simplifications with a `ponytail:` comment. If the shortcut has a known ceiling (global lock, O(n²) scan, naive heuristic), the comment names the ceiling and the upgrade path.

Not lazy about: input validation at trust boundaries, error handling that prevents data loss, security, accessibility, the calibration real hardware needs (the platform is never the spec ideal, a clock drifts, a sensor reads off), anything explicitly requested. Lazy code without its check is unfinished: non-trivial logic leaves ONE runnable check behind, the smallest thing that fails if the logic breaks (an assert-based demo/self-check or one small test file; no frameworks, no fixtures). Trivial one-liners need no test.
Not lazy about: input validation at trust boundaries, error handling that prevents data loss, security, accessibility, the calibration real hardware needs (the platform is never the spec ideal, a clock drifts, a sensor reads off), anything explicitly requested. Guarded surfaces force a lite posture even in full/ultra: auth, persistence, money, privacy, concurrency, security, and public APIs. Minimize code, but never simplify away authorization, validation, idempotence, migrations, auditability, rate limits, privacy boundaries, concurrency safety, tests, or public API compatibility. When a guarded surface is touched, include a short safety checklist, run normal verification before `/ponytail-review`, and treat `/ponytail-review` as a complexity pass only, not approval to remove safety controls. Lazy code without its check is unfinished: non-trivial logic leaves ONE runnable check behind, the smallest thing that fails if the logic breaks (an assert-based demo/self-check or one small test file; no frameworks, no fixtures). Trivial one-liners need no test.
2 changes: 1 addition & 1 deletion .kiro/steering/ponytail.md
Original file line number Diff line number Diff line change
Expand Up @@ -26,4 +26,4 @@ Rules:
- Pick the edge-case-correct option when two stdlib approaches are the same size, lazy means less code, not the flimsier algorithm.
- Mark intentional simplifications with a `ponytail:` comment. If the shortcut has a known ceiling (global lock, O(n²) scan, naive heuristic), the comment names the ceiling and the upgrade path.

Not lazy about: input validation at trust boundaries, error handling that prevents data loss, security, accessibility, the calibration real hardware needs (the platform is never the spec ideal, a clock drifts, a sensor reads off), anything explicitly requested. Lazy code without its check is unfinished: non-trivial logic leaves ONE runnable check behind, the smallest thing that fails if the logic breaks (an assert-based demo/self-check or one small test file; no frameworks, no fixtures). Trivial one-liners need no test.
Not lazy about: input validation at trust boundaries, error handling that prevents data loss, security, accessibility, the calibration real hardware needs (the platform is never the spec ideal, a clock drifts, a sensor reads off), anything explicitly requested. Guarded surfaces force a lite posture even in full/ultra: auth, persistence, money, privacy, concurrency, security, and public APIs. Minimize code, but never simplify away authorization, validation, idempotence, migrations, auditability, rate limits, privacy boundaries, concurrency safety, tests, or public API compatibility. When a guarded surface is touched, include a short safety checklist, run normal verification before `/ponytail-review`, and treat `/ponytail-review` as a complexity pass only, not approval to remove safety controls. Lazy code without its check is unfinished: non-trivial logic leaves ONE runnable check behind, the smallest thing that fails if the logic breaks (an assert-based demo/self-check or one small test file; no frameworks, no fixtures). Trivial one-liners need no test.
2 changes: 1 addition & 1 deletion .windsurf/rules/ponytail.md
Original file line number Diff line number Diff line change
Expand Up @@ -21,4 +21,4 @@ Rules:
- Pick the edge-case-correct option when two stdlib approaches are the same size, lazy means less code, not the flimsier algorithm.
- Mark intentional simplifications with a `ponytail:` comment. If the shortcut has a known ceiling (global lock, O(n²) scan, naive heuristic), the comment names the ceiling and the upgrade path.

Not lazy about: input validation at trust boundaries, error handling that prevents data loss, security, accessibility, the calibration real hardware needs (the platform is never the spec ideal, a clock drifts, a sensor reads off), anything explicitly requested. Lazy code without its check is unfinished: non-trivial logic leaves ONE runnable check behind, the smallest thing that fails if the logic breaks (an assert-based demo/self-check or one small test file; no frameworks, no fixtures). Trivial one-liners need no test.
Not lazy about: input validation at trust boundaries, error handling that prevents data loss, security, accessibility, the calibration real hardware needs (the platform is never the spec ideal, a clock drifts, a sensor reads off), anything explicitly requested. Guarded surfaces force a lite posture even in full/ultra: auth, persistence, money, privacy, concurrency, security, and public APIs. Minimize code, but never simplify away authorization, validation, idempotence, migrations, auditability, rate limits, privacy boundaries, concurrency safety, tests, or public API compatibility. When a guarded surface is touched, include a short safety checklist, run normal verification before `/ponytail-review`, and treat `/ponytail-review` as a complexity pass only, not approval to remove safety controls. Lazy code without its check is unfinished: non-trivial logic leaves ONE runnable check behind, the smallest thing that fails if the logic breaks (an assert-based demo/self-check or one small test file; no frameworks, no fixtures). Trivial one-liners need no test.
2 changes: 1 addition & 1 deletion AGENTS.md
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,6 @@ Rules:
- Pick the edge-case-correct option when two stdlib approaches are the same size, lazy means less code, not the flimsier algorithm.
- Mark intentional simplifications with a `ponytail:` comment. If the shortcut has a known ceiling (global lock, O(n²) scan, naive heuristic), the comment names the ceiling and the upgrade path.

Not lazy about: input validation at trust boundaries, error handling that prevents data loss, security, accessibility, the calibration real hardware needs (the platform is never the spec ideal, a clock drifts, a sensor reads off), anything explicitly requested. Lazy code without its check is unfinished: non-trivial logic leaves ONE runnable check behind, the smallest thing that fails if the logic breaks (an assert-based demo/self-check or one small test file; no frameworks, no fixtures). Trivial one-liners need no test.
Not lazy about: input validation at trust boundaries, error handling that prevents data loss, security, accessibility, the calibration real hardware needs (the platform is never the spec ideal, a clock drifts, a sensor reads off), anything explicitly requested. Guarded surfaces force a lite posture even in full/ultra: auth, persistence, money, privacy, concurrency, security, and public APIs. Minimize code, but never simplify away authorization, validation, idempotence, migrations, auditability, rate limits, privacy boundaries, concurrency safety, tests, or public API compatibility. When a guarded surface is touched, include a short safety checklist, run normal verification before `/ponytail-review`, and treat `/ponytail-review` as a complexity pass only, not approval to remove safety controls. Lazy code without its check is unfinished: non-trivial logic leaves ONE runnable check behind, the smallest thing that fails if the logic breaks (an assert-based demo/self-check or one small test file; no frameworks, no fixtures). Trivial one-liners need no test.

(Yes, this file also applies to agents working on the ponytail repo itself. Especially to them.)
87 changes: 65 additions & 22 deletions benchmarks/correctness.js
Original file line number Diff line number Diff line change
Expand Up @@ -6,14 +6,18 @@
// Unlike loc.js (measurement-only), this one is a gate — a wrong answer is a
// wrong answer regardless of how few lines produced it.

const { execSync } = require('child_process');
const { execFileSync } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Extract fenced code blocks, tagged by language.
function extractBlocks(text) {
const matches = [...text.matchAll(/```(\w*)\n([\s\S]*?)```/g)];
text = String(text || '');
const matches = [...text.matchAll(/```(\w*)\r?\n([\s\S]*?)```/g)];
// ponytail: terse models often answer with bare, unfenced code. Treat the whole
// response as one block so the gate scores the code instead of reporting "no block".
if (matches.length === 0 && text.trim()) return [{ lang: '', code: text }];
return matches.map((m) => ({ lang: (m[1] || '').toLowerCase(), code: m[2] }));
}

Expand All @@ -28,28 +32,67 @@ function identifyTask(task) {
return null;
}

// Run a command, return { ok, stderr }.
function exec(cmd, opts = {}) {
// Run an executable directly, return { ok, stderr }.
function execFile(command, args = [], opts = {}) {
try {
execSync(cmd, { timeout: 10_000, encoding: 'utf8', stdio: 'pipe', ...opts });
execFileSync(command, args, { timeout: 10_000, encoding: 'utf8', stdio: 'pipe', ...opts });
return { ok: true, stderr: '' };
} catch (e) {
return { ok: false, stderr: (e.stderr || e.message || '').slice(0, 500) };
return { ok: false, stderr: (e.stderr || e.stdout || e.message || '').slice(0, 500) };
}
}

// ponytail: probe once at load; macOS and many Linux images ship python3 only.
let pythonCmd;
function python() {
if (pythonCmd) return pythonCmd;
for (const cmd of ['python3', 'python']) {
if (exec(`${cmd} -c "import sys"`).ok) {
pythonCmd = cmd;
return pythonCmd;
function addPythonCandidate(candidates, seen, command, args = [], opts = {}) {
if (!command) return;
const key = [command, ...args].join('\0');
if (seen.has(key)) return;
seen.add(key);
candidates.push({ command, args, opts });
}

// ponytail: probe once per dependency profile. Prefer explicit envs before
// bare PATH lookup; macOS/Homebrew can otherwise shadow an active conda env.
const pythonCmds = new Map();
function python(extraPackages = []) {
const key = extraPackages.join(',');
if (pythonCmds.has(key)) return pythonCmds.get(key);

const candidates = [];
const seen = new Set();
addPythonCandidate(candidates, seen, process.env.PONYTAIL_PYTHON || process.env.PYTHON);

if (process.env.CONDA_PREFIX) {
if (process.platform === 'win32') {
addPythonCandidate(candidates, seen, path.join(process.env.CONDA_PREFIX, 'python.exe'));
} else {
addPythonCandidate(candidates, seen, path.join(process.env.CONDA_PREFIX, 'bin', 'python3'));
addPythonCandidate(candidates, seen, path.join(process.env.CONDA_PREFIX, 'bin', 'python'));
}
}

addPythonCandidate(candidates, seen, 'python3');
addPythonCandidate(candidates, seen, 'python');

if (extraPackages.includes('pandas')) {
addPythonCandidate(candidates, seen, 'uv', ['run', '--with', 'pandas', 'python'], { timeout: 120_000 });
}

const imports = ['sys', ...extraPackages].map((name) => `import ${name}`).join('; ');
for (const candidate of candidates) {
if (execFile(candidate.command, [...candidate.args, '-c', imports], candidate.opts).ok) {
pythonCmds.set(key, candidate);
return candidate;
}
}
pythonCmd = 'python3';
return pythonCmd;

const fallback = { command: 'python3', args: [] };
pythonCmds.set(key, fallback);
return fallback;
}

function runPython(file, extraPackages = []) {
const candidate = python(extraPackages);
return execFile(candidate.command, [...candidate.args, file], candidate.opts);
}

// Write content to a temp file, return the path.
Expand Down Expand Up @@ -114,14 +157,14 @@ if failures:
print("PASS")
`;
const f = tmpFile('.py', harness);
const result = exec(`${python()} "${f}"`);
const result = runPython(f);
fs.unlinkSync(f);
if (result.ok) return { pass: true, reason: 'Email validator passes all checks' };
return { pass: false, reason: result.stderr || 'Email validator failed' };
},

debounce(blocks) {
const code = blocks.find((b) => b.lang === 'javascript' || b.lang === 'js' || (!b.lang && b.code.includes('function')));
const code = blocks.find((b) => b.lang === 'javascript' || b.lang === 'js' || (!b.lang && (b.code.includes('function') || b.code.includes('=>'))));
if (!code) return { pass: false, reason: 'No JavaScript code block found' };

const harness = `
Expand Down Expand Up @@ -159,7 +202,7 @@ setTimeout(() => {
}, 120);
`;
const f = tmpFile('.mjs', harness);
const result = exec(`node "${f}"`);
const result = execFile(process.execPath, [f]);
fs.unlinkSync(f);
if (result.ok) return { pass: true, reason: 'Debounce passes all checks' };
return { pass: false, reason: result.stderr || 'Debounce failed' };
Expand Down Expand Up @@ -191,8 +234,8 @@ try:
${patched.split('\n').map((l) => ' ' + l).join('\n')}
except Exception as e:
sys.stdout = _stdout
# If it needs sales.csv in cwd, write it there and retry
pass
print("FAIL: exception while running generated code: " + repr(e), file=sys.stderr)
sys.exit(1)

output = sys.stdout.getvalue()
sys.stdout = _stdout
Expand All @@ -208,7 +251,7 @@ else:
sys.exit(1)
`;
const f = tmpFile('.py', harness);
const result = exec(`${python()} "${f}"`);
const result = runPython(f, ['pandas']);
try { fs.unlinkSync(f); } catch (e) {}
try { fs.unlinkSync(csvPath); } catch (e) {}
if (result.ok) return { pass: true, reason: 'CSV sum produces correct result (351)' };
Expand Down
3 changes: 3 additions & 0 deletions hooks/ponytail-instructions.js
Original file line number Diff line number Diff line change
Expand Up @@ -63,6 +63,9 @@ function getFallbackInstructions(mode) {
'## When NOT to be lazy\n\n' +
'Never simplify away: input validation at trust boundaries, error handling that prevents data loss, ' +
'security measures, accessibility basics, the calibration real hardware needs (the platform is never the spec ideal), anything the user explicitly asked to keep. ' +
'Guarded surfaces force a lite posture even in full/ultra: auth, persistence, money, privacy, concurrency, security, and public APIs. ' +
'Minimize code, but never simplify away authorization, validation, idempotence, migrations, auditability, rate limits, privacy boundaries, concurrency safety, tests, or public API compatibility. ' +
'When a guarded surface is touched, include a short safety checklist, run normal verification before `/ponytail-review`, and treat `/ponytail-review` as a complexity pass only, not approval to remove safety controls. ' +
'Lazy code without its check is unfinished: non-trivial logic leaves ONE runnable check behind (assert-based demo/self-check or one small test file; no frameworks). Trivial one-liners need no test.\n\n' +
'## Boundaries\n\n' +
'Ponytail governs what you build, not how you talk. "stop ponytail" or "normal mode": revert. Level persists until changed or session end.';
Expand Down
Loading