```
 _______  _______  _
(  ___  )(  ____ )| \    /\
| (   ) || (    )||  \  / /
| (___) || (____)||  (_/ /
|  ___  ||     __)|   _ (
| (   ) || (\ (   |  ( \ \
| )   ( || ) \ \__|  /  \ \
|/     \||/   \__/|_/    \/
```
scientific integrity tool. takes a research paper, tells you what holds up. catches hallucinated citations, flags inflated claims, finds counter-evidence. currently using Gemma 4 by Google (open source, local-first).
LLMs are reshaping how research is written. two failure modes are surfacing at scale:
| failure | how bad |
|---|---|
| citations that do not exist or misrepresent their source | 17% phantom rate in AI survey papers (Ilter 2026), 39% error rate in biomedicine (Sarol et al. 2024) |
| LLM-inflated rhetoric rewarded over substance | rhetorical intensity predicts citations, not quality (Qiu et al. 2025) |
AI detectors are not the answer. their false positive rates are high, and the problem is not who wrote the paper. the problem is whether it is true.
ark does not detect AI. it checks integrity: the claims, the citations, the evidence.
it produces a report on two things:
A. citation hallucination. does this reference exist? does it match its claimed metadata? does it say what the paper claims it does? binary, verifiable.
B. rhetorical inflation. is the claim stronger than the evidence supports? continuous, measurable.
three commands. each does one thing.
`ark ref` checks whether cited references actually exist and match their claimed metadata.
- queries arxiv for canonical metadata
- compares cited title to resolved title (token similarity)
- compares cited authors to resolved authors (surname overlap)
- outputs verdict: `confirmed`, `not_found`, `metadata_mismatch`, or `unverifiable`
- saves report to `reports/<paper>/ref_report.md`
catches arxiv ID hijacks: when a citation claims arxiv:X is paper P, but arxiv:X actually points to something unrelated.
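a minimal sketch of how the two comparisons above could turn into a verdict. this is not ark's actual code: the function names, the Jaccard formula for token similarity, the naive surname rule, and both thresholds are assumptions chosen for illustration.

```python
import re


def tokens(title: str) -> set[str]:
    """Lowercase alphanumeric tokens of a title."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def title_similarity(cited: str, resolved: str) -> float:
    """Jaccard overlap of title tokens, in [0, 1] (assumed metric)."""
    a, b = tokens(cited), tokens(resolved)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def surname(name: str) -> str:
    """Last whitespace-separated word, lowercased (naive: assumes 'First Last')."""
    parts = name.strip().split()
    return parts[-1].lower() if parts else ""


def surname_overlap(cited: list[str], resolved: list[str]) -> float:
    """Fraction of cited surnames that also appear among the resolved authors."""
    cited_set = {surname(n) for n in cited} - {""}
    resolved_set = {surname(n) for n in resolved} - {""}
    if not cited_set:
        return 0.0
    return len(cited_set & resolved_set) / len(cited_set)


def verdict(cited_title, cited_authors, arxiv_id, resolved,
            title_min=0.6, author_min=0.5):
    """resolved is None when the lookup found nothing, else (title, authors).

    The thresholds are hypothetical defaults, not values taken from ark.
    """
    if not arxiv_id:
        return "unverifiable"
    if resolved is None:
        return "not_found"
    title, authors = resolved
    if (title_similarity(cited_title, title) >= title_min
            and surname_overlap(cited_authors, authors) >= author_min):
        return "confirmed"
    return "metadata_mismatch"


# made-up ID hijack: the ID resolves, but to an unrelated paper
print(verdict("Prompted Forecasting of Time Series", ["Ada Example"], "2401.12345",
              ("Graph Kernels for Molecules", ["Bo Sample"])))
# metadata_mismatch
```

under these assumptions a hijack shows up as `metadata_mismatch` rather than `not_found`, because the ID does resolve and only the metadata disagrees.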
`ark claim` extracts verifiable claims from paper text using Gemma (local LLM).
- sends abstract (and available sections) to Gemma
- extracts each claim with its type (attribution, result, scope), section, and linked references
- saves to `reports/<paper>/claims.md` (user-editable)
the user reviews claims.md, corrects types, edits text, or removes bad extractions by setting keep: no. edits are the source of truth for inflation scoring.
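one way to picture the review step, as an in-memory sketch. the `Claim` dataclass, its field names, and the `reviewed` helper are hypothetical. the on-disk layout of claims.md is not shown here, so this sketch does not try to parse it.

```python
from dataclasses import dataclass, field

# the three claim types named above
CLAIM_TYPES = {"attribution", "result", "scope"}


@dataclass
class Claim:
    text: str
    type: str
    section: str
    refs: list[int] = field(default_factory=list)  # indices of linked references
    keep: bool = True  # False corresponds to the user setting keep: no


def reviewed(claims: list[Claim]) -> list[Claim]:
    """Return only the claims the user kept, rejecting unknown types."""
    kept = [c for c in claims if c.keep]
    for c in kept:
        if c.type not in CLAIM_TYPES:
            raise ValueError(f"unknown claim type: {c.type!r}")
    return kept


claims = [
    Claim("method X improves accuracy on benchmark Y", "result", "abstract", [3]),
    Claim("this sentence is not a claim", "scope", "abstract", keep=False),
]
print(len(reviewed(claims)))  # 1
```

only the kept claims would go on to inflation scoring, which matches the rule that user edits are the source of truth.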
`ark inflate` scores each claim for rhetorical inflation using Gemma.
- reads claims from `reports/<paper>/claims.md` (user-reviewed) or extracts fresh
- scores each claim from 0.0 (conservative) to 1.0 (highly inflated)
- provides reasoning and a conservative rewrite for each claim
- saves to `reports/<paper>/inflation_report.md`
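a sketch of what handling one scoring response might look like. the prompt wording and the JSON shape (`score`, `reasoning`, `rewrite`) are assumptions, not ark's actual prompt or schema. the only part taken from the description above is the 0.0 to 1.0 range, which the parser enforces by clamping.

```python
import json

# hypothetical prompt; ark's real prompt is not shown in this README
PROMPT_TEMPLATE = (
    "rate how far this claim goes beyond its evidence, from 0.0 (conservative) "
    "to 1.0 (highly inflated). reply as JSON with keys score, reasoning, rewrite.\n"
    "claim: {claim}"
)


def build_prompt(claim_text: str) -> str:
    return PROMPT_TEMPLATE.format(claim=claim_text)


def parse_score(raw: str) -> dict:
    """Parse a model reply and clamp the score into [0.0, 1.0]."""
    data = json.loads(raw)
    score = min(1.0, max(0.0, float(data["score"])))
    return {
        "score": score,
        "reasoning": str(data.get("reasoning", "")).strip(),
        "rewrite": str(data.get("rewrite", "")).strip(),
    }


# canned reply standing in for model output
reply = '{"score": 1.4, "reasoning": "no baseline reported", "rewrite": "X may improve Y"}'
print(parse_score(reply)["score"])  # 1.0
```

clamping instead of rejecting out-of-range scores is a design choice for this sketch. a stricter tool could treat them as a failed generation and retry.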
two options. pick whichever fits your setup.
requires only Docker.
```bash
git clone <repo-url> ark && cd ark
./run setup          # builds image, starts ollama, pulls gemma4
./run ref flairr_ts  # run
./run status         # check if running
./run down           # stop when done
```

everything runs inside containers. no local dependencies.
requires python 3.12+, uv, and Ollama.
```bash
git clone <repo-url> ark && cd ark
uv sync --python 3.12
ollama pull gemma4:e4b
```

install globally so `ark` works from anywhere:

```bash
uv tool install -e .
```

after code changes, reinstall:

```bash
uv tool install -e . --reinstall
```

if your machine lacks RAM for Gemma, run Ollama on a remote server:
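```bash
# on the server: listen on all interfaces so other machines can connect.
# `ollama serve` blocks, so run it in the background before pulling.
OLLAMA_HOST=0.0.0.0 ollama serve &
ollama pull gemma4:e4b

# on your machine
export OLLAMA_HOST=http://server-ip:11434
ark claim flairr_ts  # runs locally, Gemma runs remotely
```

all examples below use `ark` (local install). for Docker, replace `ark` with `./run`.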
```bash
ark ref flairr_ts
```

uses the included FLAIRR-TS fixture, a real EMNLP 2025 paper with a documented hallucinated citation (source: HalluCitation Matters).
expected: 18 references scanned, 9 confirmed, 4 metadata_mismatch (including the documented TEMPO fake), 5 unverifiable (no arxiv ID).
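the expected line is a tally of per-reference verdicts. a small sketch of that arithmetic, plus a check against a fixture's `EXPECTED_VERDICTS`. the helper names are hypothetical, and the verdict list is rebuilt from the counts quoted above, so its order is made up.

```python
from collections import Counter

ORDER = ["confirmed", "not_found", "metadata_mismatch", "unverifiable"]


def summarize(verdicts: list[str]) -> str:
    """One-line tally in a fixed verdict order, skipping empty buckets."""
    counts = Counter(verdicts)
    parts = [f"{counts[v]} {v}" for v in ORDER if counts[v]]
    return f"{len(verdicts)} references scanned, " + ", ".join(parts)


def unexpected(verdicts: list[str], expected: dict[int, str]) -> dict:
    """Map reference index -> (wanted, got) wherever a known verdict differs."""
    return {i: (want, verdicts[i])
            for i, want in expected.items() if verdicts[i] != want}


# counts from the expected output above; ordering is illustrative only
verdicts = ["confirmed"] * 9 + ["metadata_mismatch"] * 4 + ["unverifiable"] * 5
print(summarize(verdicts))
# 18 references scanned, 9 confirmed, 4 metadata_mismatch, 5 unverifiable
```

the buckets sum to the scanned total (9 + 4 + 5 = 18), which is a quick sanity check on any report.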
```bash
ark claim flairr_ts
```

extracts claims from the paper's abstract. saves to `reports/flairr_ts/claims.md`. open the file, review, edit, then run inflation scoring.

```bash
ark inflate flairr_ts
```

reads your reviewed claims from `claims.md` and scores each one. saves to `reports/flairr_ts/inflation_report.md`.
```
ark ref flairr_ts      → reports/flairr_ts/ref_report.md
ark claim flairr_ts    → reports/flairr_ts/claims.md (editable)
                         user reviews claims.md
ark inflate flairr_ts  → reports/flairr_ts/inflation_report.md
```
ark ref works standalone (no LLM needed). ark claim and ark inflate need Ollama running with Gemma.
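before running the two LLM-backed commands, it can help to confirm that Ollama is reachable. a minimal sketch using only the standard library. the helper names are hypothetical, and ark itself may check this differently. it relies on Ollama's default address (`http://localhost:11434`) and its `GET /api/tags` endpoint, which lists local models.

```python
import http.client
import os
import urllib.request

DEFAULT_OLLAMA = "http://localhost:11434"


def ollama_base_url(env=os.environ) -> str:
    """Resolve the server URL from OLLAMA_HOST, falling back to the default."""
    host = env.get("OLLAMA_HOST", "").strip() or DEFAULT_OLLAMA
    if "://" not in host:
        host = "http://" + host  # allow bare host:port values
    return host.rstrip("/")


def ollama_is_up(base_url: str, timeout: float = 2.0) -> bool:
    """True if the server answers GET /api/tags with HTTP 200."""
    try:
        with urllib.request.urlopen(base_url + "/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        return False


if __name__ == "__main__":
    url = ollama_base_url()
    print(url, "reachable" if ollama_is_up(url) else "not reachable")
```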
re-running a command prompts before overwriting existing reports.
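the overwrite guard can be sketched as follows. `write_report` and its prompt text are hypothetical, not ark's implementation. the `confirm` parameter exists so the prompt can be replaced in tests.

```python
from pathlib import Path


def write_report(path: Path, text: str, confirm=input) -> bool:
    """Write text to path, asking first if the file already exists.

    Returns True if the file was written, False if the user declined.
    """
    if path.exists():
        answer = confirm(f"{path} exists. overwrite? [y/N] ")
        if answer.strip().lower() not in {"y", "yes"}:
            return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return True
```

defaulting to "no" on anything other than `y` or `yes` keeps a stray Enter from discarding a reviewed `claims.md`.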
papers are defined as python fixtures. each fixture declares a Paper with its text and references.
```python
from ark.models import Paper, Reference

PAPER = Paper(
    title="your paper",
    authors=["author one", "author two"],
    year=2025,
    abstract="the paper's abstract text...",
    references=[
        Reference(
            raw="full citation text",
            title="cited title",
            authors=["cited author"],
            year=2024,
            arxiv_id="2401.12345",
        ),
    ],
)

EXPECTED_VERDICTS = {
    0: "metadata_mismatch",  # if you know this citation is fake
}
```

save as `tests/fixtures/<name>.py` and run:
```bash
ark ref <name>
ark claim <name>
ark inflate <name>
```

- not an AI detector. ark checks truth, not authorship.
- not a replacement for peer review. ark surfaces signals, you decide.
- not a tool for paywalled content (yet). ark verifies existence for 100%, verifies content for ~50%, and reports the gap honestly.
- layer 0 (citation verification): working. confirmed catch on documented hallucinations.
- layer 1 (claim extraction + inflation scoring): working. Gemma e4b extracts claims and scores inflation with reasoning and conservative rewrites.