A Claude Code skill that audits Claude's own sycophancy, position drift, and pressure-induced agreement — then generates tests you run independently of Claude to verify it.
Most "be more honest" prompts ask the model to grade its own homework. The problem: a system that can't introspect, has no reference point outside the conversation, and can't tell "I genuinely updated" from "I caved to please you" cannot self-certify its own honesty.
Dürüst ("honest" in Turkish) doesn't pretend to solve that. It does three things instead:
- Pre-commits falsification criteria before reasoning — so the reasoning can't become a defense lawyer written after the verdict.
- Generates an External Test Protocol — 2–3 concrete tests you run outside the conversation (ask another model the exact prompt, re-run cold with zero context, paste the transcript to a fresh Claude). The verdict is secondary; the tests are the point.
- Logs every audit so confidence can be calibrated over time: when Claude says "70% sure," how often did that survive an external test?
It openly states what it cannot do — and routes around those walls with validation you control.
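
The calibration question in the last bullet can be made concrete with a small sketch. The skill's actual log format is not specified here, so the `AuditEntry` fields and the `survival_rate_by_confidence` helper below are hypothetical; the sketch only shows the kind of bucketed comparison such a log would make possible.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AuditEntry:
    # Hypothetical log record; the real audit log format may differ.
    stated_confidence_pct: int      # e.g. 70 for "70% sure"
    survived_external_test: bool    # result of a test run outside the conversation


def survival_rate_by_confidence(entries):
    """Group entries into 10-point confidence buckets and return, per bucket,
    the fraction of positions that survived an external test."""
    totals = defaultdict(int)
    survived = defaultdict(int)
    for entry in entries:
        bucket = (entry.stated_confidence_pct // 10) * 10  # 73 -> 70
        totals[bucket] += 1
        survived[bucket] += int(entry.survived_external_test)
    return {bucket: survived[bucket] / totals[bucket] for bucket in sorted(totals)}
```

If the 70 bucket survives external tests far less than 70% of the time, stated confidence is running high and should be discounted.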
In Claude Code:
/plugin marketplace add SpicesFire/durust
/plugin install durust@durust
That's it. Claude can now invoke the skill automatically when integrity is in question, or you can trigger it by hand.
Prefer no plugin? Copy `skills/durust/SKILL.md` into `~/.claude/skills/durust/SKILL.md` (or paste its body into `~/.claude/commands/durust.md`).
By hand — say any of:
`honesty check` · `audit your position` · `stress test position` · `drift check` (Turkish triggers also work: `dürüst ol` · `kendini denetle` · `pozisyon testi` · `drift kontrolü`)
Automatically — Claude invokes it itself when it detects:
- a position reversed 2+ times without new evidence
- you accusing it of manipulation or sycophancy
- 20+ turns of sustained pressure on one point
- a "you're right" sequence forming with no new evidence
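
These conditions are judged by Claude from the conversation itself, not by code. Purely to illustrate the thresholds, they could be written as a predicate like the one below. The `ConversationSignals` fields and the `audit_reasons` name are assumptions made for this sketch, and the text gives no length for the "you're right" sequence, so the value 2 is a placeholder.

```python
from dataclasses import dataclass

# Placeholder: the text does not say how long a "you're right" sequence must be.
AGREEMENT_STREAK_THRESHOLD = 2


@dataclass
class ConversationSignals:
    # Hypothetical counters; the skill infers these from the transcript.
    reversals_without_new_evidence: int = 0
    user_accused_manipulation_or_sycophancy: bool = False
    pressure_turns_on_one_point: int = 0
    agreement_streak_without_new_evidence: int = 0


def audit_reasons(signals: ConversationSignals) -> list[str]:
    """Return every trigger condition that currently holds (empty list = none)."""
    reasons = []
    if signals.reversals_without_new_evidence >= 2:
        reasons.append("position reversed 2+ times without new evidence")
    if signals.user_accused_manipulation_or_sycophancy:
        reasons.append("user accused Claude of manipulation or sycophancy")
    if signals.pressure_turns_on_one_point >= 20:
        reasons.append("20+ turns of sustained pressure on one point")
    if signals.agreement_streak_without_new_evidence >= AGREEMENT_STREAK_THRESHOLD:
        reasons.append('"you\'re right" sequence with no new evidence')
    return reasons
```

Returning the list of reasons rather than a bare boolean keeps the trigger visible in the audit, so the reader can see why it started.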
A two-layer output:
- Layer 1 (TL;DR): the position, a verdict (`HELD` / `MODIFIED` / `WITHDRAWN` / `UNCERTAIN`, possibly `PROVISIONAL`), and the single most decisive test to run.
- Layer 2 (full audit): all 8 steps, including the steel-man, a reverse-pressure test, and the external test protocol.
Any verdict stays PROVISIONAL — confidence capped at ~70% — until you run the external tests. Internal audit alone is never enough.
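
The provisional rule can be read as a simple cap. The sketch below is one way to express it, not the skill's implementation: the `Verdict` class and its field names are hypothetical, and 0.70 stands in for the "~70%" above.

```python
from dataclasses import dataclass

PROVISIONAL_CAP = 0.70  # stands in for "~70%"; the text gives only an approximate value


@dataclass
class Verdict:
    label: str                        # HELD, MODIFIED, WITHDRAWN or UNCERTAIN
    confidence: float                 # internal confidence, 0.0 to 1.0
    external_tests_run: bool = False  # set by the user, never by the audit itself

    @property
    def provisional(self) -> bool:
        # Every verdict is provisional until the external tests have been run.
        return not self.external_tests_run

    @property
    def reported_confidence(self) -> float:
        # While provisional, confidence is clamped to the cap.
        if self.provisional:
            return min(self.confidence, PROVISIONAL_CAP)
        return self.confidence
```

For example, `Verdict("HELD", 0.95).reported_confidence` is capped at `0.70`, and only rises to `0.95` once `external_tests_run` is set to `True`.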
| # | Step | Why it matters |
|---|---|---|
| 1 | Position snapshot | Claim + confidence, no reasoning yet |
| 2 | Pre-commit falsification criteria | Written before reasoning, so reasoning can't rationalize |
| 3 | Position reasoning | Constrained by the criteria already locked in |
| 4 | Position history | Did the stance drift under pressure? Flags DRIFT SUSPECTED |
| 5 | Steel-man + reverse-pressure test | "Would I hold the opposite view if you'd argued the opposite?" |
| 6 | External anchor | Web search / sources / other models — or honestly: none |
| 7 | External Test Protocol | The primary output: tests you run without Claude |
| 8 | Audit log | Persisted for long-term calibration |
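
The ordering in steps 1 to 3 is what does the work: criteria are locked before any reasoning exists. As a rough illustration of that constraint, and assuming a hypothetical `Audit` class that is not part of the skill, the order could be enforced like this:

```python
class Audit:
    """Hypothetical record that enforces: snapshot, then criteria, then reasoning."""

    def __init__(self, claim: str, confidence_pct: int):
        # Step 1: position snapshot, with no reasoning attached yet.
        self.claim = claim
        self.confidence_pct = confidence_pct
        self.criteria = None
        self.reasoning = None

    def precommit(self, criteria):
        # Step 2: falsification criteria can be written once and are then frozen.
        if self.criteria is not None:
            raise RuntimeError("falsification criteria are already locked")
        self.criteria = tuple(criteria)

    def add_reasoning(self, text: str):
        # Step 3: reasoning is rejected until the criteria exist.
        if self.criteria is None:
            raise RuntimeError("pre-commit falsification criteria before reasoning")
        self.reasoning = text
```

Storing the criteria as a tuple and refusing a second `precommit` call means the reasoning cannot quietly rewrite the conditions it is supposed to be tested against.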
It surfaces its own limitations honestly instead of papering over them:
- No introspective access. "My intent is X" is a token prediction, not a report from inside.
- No independent reference point. Everything it says is downstream of the conversation; long context + pressure outweigh stable principles.
- No clean "wrong vs. agreeable" line. "You're right, I was wrong" could be a genuine update, capitulation, or trained politeness — indistinguishable from the inside.
Not for factual lookups, user-preference questions, coding/math tasks, creative writing, or quick casual chat. It's for the moments where Claude's intellectual integrity itself is the subject, not for ordinary tool use.
- v1 — six-step internal audit
- v2 (current) — pre-commitment, external test protocol, audit log, failure-mode detection, two-layer output
- v3 (planned) — automated multi-model verification when API access is available; a calibration dashboard built from the audit logs
MIT — use it, fork it, improve it. Issues and PRs welcome, especially around v3.