Reliable Chain-of-Thought via Prefix Consistency

Naoto Iwase, Yuki Ichihara, Mohammad Atif Quamar, Junpei Komiyama

When we truncate a Chain-of-Thought (CoT) partway through and regenerate the remainder, traces with correct answers reproduce their original answer more often than traces with wrong answers. We call this gap prefix consistency and use it as a reliability signal to weight majority voting. It needs no access to logits and no self-rating prompts.
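The scoring idea above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the repository's implementation. The `Trace` container, truncation by whitespace tokens, exact-string answer matching, and using the raw reproduction rate as the vote weight are all hypothetical choices. In the real pipeline (generation/) the truncated prefix is sent back to the model to regenerate the remainder. Here the regenerated answers are supplied directly.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Trace:
    """One sampled CoT trace (hypothetical container, not the repo's data schema)."""
    text: str                 # full chain-of-thought text
    answer: str               # final answer extracted from the original trace
    regen_answers: list[str]  # answers obtained by regenerating after truncation


def truncate_cot(text: str, fraction: float) -> str:
    """Keep the first `fraction` of whitespace-separated tokens as the prefix.

    Assumption: real tokenization and cut points may differ; this only shows
    the idea of cutting a trace partway through before regeneration.
    """
    tokens = text.split()
    cut = int(len(tokens) * fraction)
    return " ".join(tokens[:cut])


def prefix_consistency(trace: Trace) -> float:
    """Fraction of regenerations that reproduce the trace's original answer."""
    if not trace.regen_answers:
        return 0.0
    hits = sum(a == trace.answer for a in trace.regen_answers)
    return hits / len(trace.regen_answers)


def weighted_majority_vote(traces: list[Trace]) -> str:
    """Sum prefix-consistency weights per answer and return the heaviest answer.

    Assumption: weights are the raw reproduction rates; the paper's exact
    weighting may differ. Ties go to the answer seen first.
    """
    scores: dict[str, float] = defaultdict(float)
    for t in traces:
        scores[t.answer] += prefix_consistency(t)
    return max(scores, key=scores.get)


# Toy example: plain majority voting would pick "15" (two votes to one).
traces = [
    Trace("... so the answer is 12", "12", ["12", "12", "12"]),
    Trace("... so the answer is 15", "15", ["15", "9", "12"]),
    Trace("... hence 15", "15", ["9", "12", "15"]),
]
print(weighted_majority_vote(traces))  # prints 12
```

Both "15" traces reproduce their answer in only one of three regenerations, for a combined weight of about 0.67. The single "12" trace reproduces its answer every time, for a weight of 1.0, so the weighted vote overturns the plain majority.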

Repository structure

generation/             # Inference pipeline (initial generation, regeneration, enrichment)
analysis/               # WMV evaluation and paper figures/tables (scripts only)
utils/                  # Dataset and model download scripts

See each subdirectory's README for details:

analysis/README.md: WMV evaluation, paper figures and tables, per-cell data layout.
generation/README.md: inference pipeline, vLLM server, and run_regen.sh for regenerating the JSONLs from scratch (GPU required).

Reproducing the paper from the released data (no GPU)

Per-cell JSONLs and wmv_result.json files for all 5 models × 4 benchmarks, under both the self-judge and external-judge settings, are published on Zenodo at https://doi.org/10.5281/zenodo.20082164. From the repository root:

curl -L 'https://zenodo.org/records/20082164/files/prefix-consistency-data.zip?download=1' -o prefix-consistency-data.zip
unzip prefix-consistency-data.zip   # creates data-self-judge/ and data-external-judge/
uv sync                             # requires uv (https://docs.astral.sh/uv/)
bash analysis/run_paper_all.sh

The last command writes every table and figure into paper-out/{tables,figures}/.
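After unzipping, you can confirm that the released data is in place by searching for the result files. The sketch below is a hypothetical helper and is not part of the repository. It relies only on the two top-level directories named in the unzip comment and on the wmv_result.json filename. It assumes nothing about the JSON schema or about how the per-cell directories are nested.

```python
import json
from pathlib import Path

# Top-level directories created by unzipping prefix-consistency-data.zip.
JUDGE_DIRS = ("data-self-judge", "data-external-judge")


def find_wmv_results(root: Path) -> dict[str, list[Path]]:
    """Map each judge directory to the wmv_result.json files found anywhere beneath it."""
    found: dict[str, list[Path]] = {}
    for name in JUDGE_DIRS:
        base = root / name
        found[name] = sorted(base.rglob("wmv_result.json")) if base.is_dir() else []
    return found


def load_result(path: Path) -> object:
    """Load one result file as raw JSON (its schema is not assumed here)."""
    with path.open(encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    # Run from the repository root, where the zip was unpacked.
    for judge, paths in find_wmv_results(Path(".")).items():
        print(f"{judge}: {len(paths)} wmv_result.json files")
```

If each of the 5 × 4 model/benchmark cells has its own result file, each judge directory should report 20. Treat that count as an expectation to check, not a guarantee of the archive layout.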

Workspace layout

<parent>/                  # auto-detected as workspace root
├── prefix-consistency/    # this repo
├── models/                # model weights (only needed to re-run generation)
└── datasets/              # evaluation datasets (only needed to re-run generation)

models/ and datasets/ are only needed if you want to re-run generation/. To regenerate paper figures and tables from the released data, neither is required.

Using a different storage location

# Option 1: symlink
ln -s /data/shared/models  <parent>/models
ln -s /data/shared/datasets <parent>/datasets

# Option 2: env vars (per command or in .bashrc)
export MODELS_DIR=/data/shared/models
export DATASETS_DIR=/data/shared/datasets
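A small resolver can illustrate how the two options interact. This is a hypothetical sketch, not the repository's code. The function name is invented. The sketch assumes that a set MODELS_DIR or DATASETS_DIR takes precedence over <parent>/models or <parent>/datasets. It also assumes that <parent> is the directory containing the repo checkout. A symlink (Option 1) needs no special handling because it simply makes the default path resolve elsewhere.

```python
import os
from pathlib import Path


def resolve_storage_dir(env_var: str, default_name: str, workspace_root: Path) -> Path:
    """Return the directory named by env_var if it is set, else <workspace_root>/<default_name>.

    Assumption: the environment variable overrides the workspace default.
    """
    override = os.environ.get(env_var)
    if override:
        return Path(override).expanduser()
    return workspace_root / default_name


if __name__ == "__main__":
    # Assumption: the workspace root is the parent of the repo checkout (run from the repo root).
    root = Path.cwd().parent
    print(resolve_storage_dir("MODELS_DIR", "models", root))
    print(resolve_storage_dir("DATASETS_DIR", "datasets", root))
```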

Citation

@article{iwase2026prefixconsistency,
  title   = {Reliable Chain-of-Thought via Prefix Consistency},
  author  = {Naoto Iwase and Yuki Ichihara and Mohammad Atif Quamar and Junpei Komiyama},
  journal = {arXiv preprint arXiv:2605.07654},
  year    = {2026},
  url     = {https://arxiv.org/abs/2605.07654},
}
