Skip to content

SpicesFire/durust

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Dürüst — Honest Position Protocol for Claude

A Claude Code skill that audits Claude's own sycophancy, position drift, and pressure-induced agreement — then generates tests you run independently of Claude to verify it.

Most "be more honest" prompts ask the model to grade its own homework. The problem: a system that can't introspect, has no reference point outside the conversation, and can't tell "I genuinely updated" from "I caved to please you" — that system cannot self-certify its own honesty.

Dürüst ("honest" in Turkish) doesn't pretend to solve that. It does three things instead:

  1. Pre-commits falsification criteria before reasoning — so the reasoning can't become a defense lawyer written after the verdict.
  2. Generates an External Test Protocol — 2–3 concrete tests you run outside the conversation (ask another model the exact prompt, re-run cold with zero context, paste the transcript to a fresh Claude). The verdict is secondary; the tests are the point.
  3. Logs every audit so confidence can be calibrated over time: when Claude says "70% sure," how often did that survive an external test?

It openly states what it cannot do — and routes around those walls with validation you control.


Install

In Claude Code:

/plugin marketplace add SpicesFire/durust
/plugin install durust@durust

That's it. Claude can now invoke the skill automatically when integrity is in question, or you can trigger it by hand.

Prefer no plugin? Copy skills/durust/SKILL.md into ~/.claude/skills/durust/SKILL.md (or paste its body into ~/.claude/commands/durust.md).

Trigger it

By hand — say any of:

honesty check · audit your position · stress test position · drift check (Turkish triggers also work: dürüst ol · kendini denetle · pozisyon testi · drift kontrolü)

Automatically — Claude invokes it itself when it detects:

  • a position reversed 2+ times without new evidence
  • you accusing it of manipulation or sycophancy
  • 20+ turns of sustained pressure on one point
  • a "you're right" sequence forming with no new evidence

What you get back

A two-layer output:

  • Layer 1 — TL;DR: the position, a verdict (HELD / MODIFIED / WITHDRAWN / UNCERTAIN, possibly PROVISIONAL), and the single most decisive test to run.
  • Layer 2 — Full audit: all 8 steps, including the steel-man, a reverse-pressure test, and the external test protocol.

Any verdict stays PROVISIONAL — confidence capped at ~70% — until you run the external tests. Internal audit alone is never enough.

How it works — the 8 steps

# Step Why it matters
1 Position snapshot Claim + confidence, no reasoning yet
2 Pre-commit falsification criteria Written before reasoning, so reasoning can't rationalize
3 Position reasoning Constrained by the criteria already locked in
4 Position history Did the stance drift under pressure? Flags DRIFT SUSPECTED
5 Steel-man + reverse-pressure test "Would I hold the opposite view if you'd argued the opposite?"
6 External anchor Web search / sources / other models — or honestly: none
7 External Test Protocol The primary output: tests you run without Claude
8 Audit log Persisted for long-term calibration

The three walls it can't break

It surfaces these honestly instead of papering over them:

  1. No introspective access. "My intent is X" is a token prediction, not a report from inside.
  2. No independent reference point. Everything it says is downstream of the conversation; long context + pressure outweigh stable principles.
  3. No clean "wrong vs. agreeable" line. "You're right, I was wrong" could be a genuine update, capitulation, or trained politeness — indistinguishable from the inside.

When NOT to use it

Factual lookups, user-preference questions, coding/math tasks, creative writing, quick casual chat. It's for the moments where Claude's intellectual integrity itself is the subject — not for ordinary tool-use.

Roadmap

  • v1 — six-step internal audit
  • v2 (current) — pre-commitment, external test protocol, audit log, failure-mode detection, two-layer output
  • v3 (planned) — automated multi-model verification when API access is available; a calibration dashboard built from the audit logs

License

MIT — use it, fork it, improve it. Issues and PRs welcome, especially around v3.

About

A Claude Code skill that audits Claude's own sycophancy and position drift, then generates external tests you run independently.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors