A Claude Code skill (also usable as a standalone CLI) that catches AI-hallucinated citations in BibTeX files. Inspired by TrueCite, built on Semantic Scholar.
LLMs confidently fabricate plausible-looking citations. Running a paper draft through this tool surfaces:
- Pure hallucinations — papers that don't exist anywhere.
- Author corruption — real paper, wrong author list.
- Venue sloppiness — real paper, venue field garbled (e.g. arXiv preprint cited as a journal article).
git clone https://github.com/mufanq/verify-bib-skill.git
cd verify-bib-skill
pip install -r requirements.txt

To use as a Claude Code skill, symlink into your skills directory:

ln -s "$(pwd)" ~/.claude/skills/verify-bib

Runs out of the box without a key (shared 5000 req / 5 min pool). For higher throughput, request a free key at https://www.semanticscholar.org/product/api and:

export SEMANTIC_SCHOLAR_API_KEY=your_key_here # ~/.zshrc or ~/.bashrc

The script auto-detects whether a key is set:
- No key → unauthenticated requests, 0.2 s sleep between entries.
- Key present → x-api-key header, 1.05 s sleep between entries.
You can also pass --api-key sk_... on the command line to override. The key is never logged or committed — .env and common secret files are in .gitignore.
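
The key handling described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function name `request_settings` and its parameters are hypothetical, and only the environment variable name, the header name, and the two sleep values come from this README.

```python
import os


def request_settings(cli_key=None, env=None):
    """Return (headers, sleep_seconds) for Semantic Scholar requests.

    Hypothetical helper: a --api-key value (cli_key) takes precedence
    over the SEMANTIC_SCHOLAR_API_KEY environment variable.
    """
    env = os.environ if env is None else env
    key = cli_key or env.get("SEMANTIC_SCHOLAR_API_KEY")
    if key:
        # Authenticated: send the key in the x-api-key header, 1.05 s between entries.
        return {"x-api-key": key}, 1.05
    # Unauthenticated: no extra header, 0.2 s between entries.
    return {}, 0.2
```

Passing `env` explicitly keeps the helper easy to test without touching the real environment.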
# Human-readable report
python3 verify_bib.py references.bib
# Machine-readable (pipe into jq, CI, etc.)
python3 verify_bib.py references.bib --json

Exit codes: 0 all clean, 1 issues found, 2 file not found. Suitable as a pre-submission gate.
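
As one way to use the exit codes as a gate from Python (for example in a build script), the sketch below wraps the CLI call. The helper names `describe_exit` and `run_gate` are hypothetical; only the three exit codes and the `verify_bib.py` invocation are taken from this README.

```python
import subprocess
import sys

# Exit codes as documented in this README.
EXIT_MEANINGS = {0: "all clean", 1: "issues found", 2: "file not found"}


def describe_exit(code):
    """Map an exit code to its documented meaning."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_gate(bib_path, script="verify_bib.py"):
    """Run the verifier on bib_path and return its exit code.

    Assumes the script sits in the current working directory.
    """
    proc = subprocess.run([sys.executable, script, bib_path])
    return proc.returncode
```

A caller could then abort a submission build whenever `run_gate("references.bib")` returns anything other than 0, and print `describe_exit(code)` as the reason.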
- Parse the .bib file with pybtex.
- For each entry, query Semantic Scholar's /paper/search/match with the title (fall back to /paper/search if the match endpoint returns nothing).
- Compute three fuzzy scores (rapidfuzz token-set ratio on title & venue, last-name set overlap on authors).
- verified = title_score ≥ 0.85. Author / venue scores are surfaced as additional flags.
- Cache successful lookups in ~/.cache/verify-bib/s2_cache.sqlite for 30 days.
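
The caching step could look something like the sketch below, which uses the standard-library `sqlite3` module. The table layout, the function names, and the choice to key on a normalized title are all assumptions made for illustration; only the 30-day lifetime and the use of a SQLite file come from this README.

```python
import json
import sqlite3
import time

TTL_SECONDS = 30 * 24 * 3600  # 30 days, as stated above


def open_cache(path=":memory:"):
    """Open (or create) the cache; the schema here is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS lookups ("
        "title TEXT PRIMARY KEY, payload TEXT NOT NULL, fetched_at REAL NOT NULL)"
    )
    return conn


def cache_put(conn, title, payload, now=None):
    """Store a successful lookup under the normalized title."""
    now = time.time() if now is None else now
    conn.execute(
        "INSERT OR REPLACE INTO lookups VALUES (?, ?, ?)",
        (title.strip().lower(), json.dumps(payload), now),
    )
    conn.commit()


def cache_get(conn, title, now=None):
    """Return the cached payload, or None if missing or older than the TTL."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT payload, fetched_at FROM lookups WHERE title = ?",
        (title.strip().lower(),),
    ).fetchone()
    if row is None or now - row[1] > TTL_SECONDS:
        return None
    return json.loads(row[0])
```

Only successful lookups are written, so a paper that was missing at one point gets re-queried on the next run.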
The scoring + judgment model follows the reverse-engineered behavior of wispaper.ai/agents/true-cite — title match is the primary verdict, author / venue mismatches are surfaced as secondary flags rather than hard failures. This matches how real BibTeX files drift: the paper is usually real, but author lists and venue strings are often truncated or auto-generated from lossy sources.
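
A rough sketch of this verdict model follows. It makes several assumptions not stated in this README: the author overlap is computed as a Jaccard ratio over lowercased last names, the flag threshold of 0.5 is invented for illustration, and all function and flag names are hypothetical. The title score is taken as an input already normalized to the 0 to 1 range; only the 0.85 title threshold comes from the text above.

```python
def last_names(authors):
    """Extract lowercased last names from 'Last, First' or 'First Last' strings."""
    names = set()
    for author in authors:
        author = author.strip()
        if not author:
            continue
        last = author.split(",")[0] if "," in author else author.split()[-1]
        last = last.strip().lower()
        if last:
            names.add(last)
    return names


def author_overlap(bib_authors, found_authors):
    """Jaccard overlap of last-name sets (an assumed definition of 'set overlap')."""
    a, b = last_names(bib_authors), last_names(found_authors)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def judge(title_score, author_score, venue_score,
          title_threshold=0.85, flag_threshold=0.5):
    """Title decides the verdict; author and venue only add secondary flags."""
    flags = []
    if author_score < flag_threshold:  # flag_threshold is an assumed value
        flags.append("author_mismatch")
    if venue_score < flag_threshold:
        flags.append("venue_mismatch")
    return {"verified": title_score >= title_threshold, "flags": flags}
```

With this shape, an entry with a strong title match and a truncated author list still comes back as verified, with `author_mismatch` attached for a human to review.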
MIT