GitHub - itsautomata/ark: scientific integrity tool against LLM hallucination and inflation in research papers.

 _______  _______  _
(  ___  )(  ____ )| \    /\
| (   ) || (    )||  \  / /
| (___) || (____)||  (_/ /
|  ___  ||     __)|   _ (
| (   ) || (\ (   |  ( \ \
| )   ( || ) \ \__|  /  \ \
|/     \||/   \__/|_/    \/

scientific integrity tool. takes a research paper, tells you what holds up. catches hallucinated citations, flags inflated claims, finds counter-evidence. currently using Gemma 4 by Google (open source, local-first).

the problem

LLMs are reshaping how research is written. two failure modes are surfacing at scale:

| failure | how bad |
| --- | --- |
| citations that do not exist or misrepresent their source | 17% phantom rate in AI survey papers (Ilter 2026), 39% error rate in biomedicine (Sarol et al. 2024) |
| LLM-inflated rhetoric rewarded over substance | rhetorical intensity predicts citations, not quality (Qiu et al. 2025) |

AI detectors are not the answer. their false positive rates are high, and the problem is not who wrote it. the problem is whether it is true.

what ark does

ark does not detect AI. it checks integrity: the claims, the citations, the evidence.

it produces a report on two things:

A. citation hallucination. does this reference exist? does it match its claimed metadata? does it say what the paper claims it does? binary, verifiable.

B. rhetorical inflation. is the claim stronger than the evidence supports? continuous, measurable.

how it works

three commands. each does one thing.

`ark ref`: citation verification

checks if cited references actually exist and match their claimed metadata.

queries arxiv for canonical metadata
compares cited title to resolved title (token similarity)
compares cited authors to resolved authors (surname overlap)
outputs verdict: confirmed, not_found, metadata_mismatch, or unverifiable
saves report to reports/<paper>/ref_report.md
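
the comparison steps above can be sketched roughly as follows. this is a minimal illustration, not ark's actual code: the jaccard measure, the "last word is the surname" rule, the function names, and both thresholds are assumptions chosen for the example.

```python
import re
from typing import Optional

# hypothetical thresholds, not taken from ark
TITLE_MIN = 0.6
AUTHOR_MIN = 0.5


def tokens(text: str) -> set:
    """lowercase alphanumeric tokens of a title."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def title_similarity(cited: str, resolved: str) -> float:
    """jaccard overlap of title tokens, in [0, 1]."""
    a, b = tokens(cited), tokens(resolved)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def surname(author: str) -> str:
    """last whitespace-separated word, lowercased (a simplification)."""
    parts = author.strip().split()
    return parts[-1].lower() if parts else ""


def surname_overlap(cited: list, resolved: list) -> float:
    """fraction of cited surnames found among the resolved authors."""
    c = {surname(a) for a in cited} - {""}
    r = {surname(a) for a in resolved} - {""}
    if not c:
        return 0.0
    return len(c & r) / len(c)


def verdict(cited_title: str, cited_authors: list,
            arxiv_id: Optional[str], resolved: Optional[dict]) -> str:
    """map one reference to one of the four verdicts."""
    if arxiv_id is None:
        return "unverifiable"   # nothing to look up
    if resolved is None:
        return "not_found"      # the lookup returned no record
    title_ok = title_similarity(cited_title, resolved["title"]) >= TITLE_MIN
    authors_ok = surname_overlap(cited_authors, resolved["authors"]) >= AUTHOR_MIN
    return "confirmed" if title_ok and authors_ok else "metadata_mismatch"
```

requiring both title and authors to match is one possible design choice; a stricter or looser rule would change which borderline citations get flagged.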

catches arxiv ID hijacks: when a citation claims arxiv:X is paper P, but arxiv:X actually points to something unrelated.
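
to see what an arxiv ID actually points to, one option is the public arxiv API, which returns an Atom feed for an `id_list` query. the sketch below is an assumption about how such a lookup could be done with the standard library; `parse_entry` and `resolve` are hypothetical names, and ark may resolve metadata differently.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Optional, Union

ATOM = {"atom": "http://www.w3.org/2005/Atom"}
API = "http://export.arxiv.org/api/query"


def parse_entry(atom_xml: Union[str, bytes]) -> Optional[dict]:
    """pull title and author names out of the first entry of an Atom feed."""
    root = ET.fromstring(atom_xml)
    entry = root.find("atom:entry", ATOM)
    if entry is None:
        return None  # empty feed: nothing resolved for this ID
    title = entry.findtext("atom:title", default="", namespaces=ATOM)
    authors = [
        name.text.strip()
        for name in entry.findall("atom:author/atom:name", ATOM)
        if name.text
    ]
    # titles in the feed can wrap across lines, so collapse whitespace
    return {"title": " ".join(title.split()), "authors": authors}


def resolve(arxiv_id: str, timeout: float = 10.0) -> Optional[dict]:
    """fetch the record for one arxiv ID (needs network access)."""
    url = f"{API}?{urllib.parse.urlencode({'id_list': arxiv_id})}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_entry(resp.read())
```

a hijack then shows up as a resolved title and author list that have little in common with what the citation claims.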

`ark claim`: claim extraction

extracts verifiable claims from paper text using Gemma (local LLM).

sends abstract (and available sections) to Gemma
extracts each claim with its type (attribution, result, scope), section, and linked references
saves to reports/<paper>/claims.md (user-editable)

the user reviews claims.md, corrects types, edits text, or removes bad extractions by setting keep: no. edits are the source of truth for inflation scoring.
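
as an illustration of the review step, a claim can be thought of as a small record with a keep flag. the `Claim` class and `reviewed` helper below are hypothetical and do not describe the on-disk format of claims.md; they only show how dropped claims could be excluded before scoring.

```python
from dataclasses import dataclass, field

# the three claim types named above
CLAIM_TYPES = {"attribution", "result", "scope"}


@dataclass
class Claim:
    text: str
    type: str
    section: str
    refs: list = field(default_factory=list)  # linked reference indices
    keep: bool = True                          # False mirrors "keep: no"

    def __post_init__(self) -> None:
        # reject typos introduced while editing types by hand
        if self.type not in CLAIM_TYPES:
            raise ValueError(f"unknown claim type: {self.type!r}")


def reviewed(claims: list) -> list:
    """claims the user chose to keep, in their original order."""
    return [c for c in claims if c.keep]
```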

`ark inflate`: inflation scoring

scores each claim for rhetorical inflation using Gemma.

reads claims from reports/<paper>/claims.md (user-reviewed) or extracts fresh
scores each claim from 0.0 (conservative) to 1.0 (highly inflated)
provides reasoning and a conservative rewrite for each claim
saves to reports/<paper>/inflation_report.md
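
a sketch of how a model reply could be turned into a bounded score, assuming the model is asked to answer in JSON. the key names (`score`, `reasoning`, `rewrite`) and the `InflationScore` type are assumptions made for this example, not ark's actual schema.

```python
import json
from dataclasses import dataclass


@dataclass
class InflationScore:
    score: float      # 0.0 (conservative) to 1.0 (highly inflated)
    reasoning: str
    rewrite: str      # conservative restatement of the claim


def parse_score(reply: str) -> InflationScore:
    """parse a JSON reply and clamp the score into [0.0, 1.0]."""
    data = json.loads(reply)
    score = float(data["score"])
    # models sometimes drift outside the requested range, so clamp
    score = min(1.0, max(0.0, score))
    return InflationScore(
        score=score,
        reasoning=str(data.get("reasoning", "")).strip(),
        rewrite=str(data.get("rewrite", "")).strip(),
    )
```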

install

two options. pick whichever fits your setup.

option A: Docker (nothing else needed)

requires only Docker.

git clone <repo-url> ark && cd ark
./run setup              # builds image, starts ollama, pulls gemma4
./run ref flairr_ts      # run
./run status             # check if running
./run down               # stop when done

everything runs inside containers. no local dependencies.

option B: local install

requires python 3.12+, uv, and Ollama.

git clone <repo-url> ark && cd ark
uv sync --python 3.12
ollama pull gemma4:e4b

install globally so ark works from anywhere:

uv tool install -e .

after code changes, reinstall:

uv tool install -e . --reinstall

remote GPU (optional)

if your machine lacks RAM for Gemma, run Ollama on a remote server:

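# on the server
# ollama serve blocks, so start it in the background before pulling.
# ollama listens on 127.0.0.1 by default; OLLAMA_HOST=0.0.0.0 lets other machines connect.
OLLAMA_HOST=0.0.0.0 ollama serve &
ollama pull gemma4:e4b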

# on your machine
export OLLAMA_HOST=http://server-ip:11434
ark claim flairr_ts    # runs locally, Gemma runs remotely
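
for reference, this is roughly how a client can honor OLLAMA_HOST when talking to Ollama's `/api/generate` endpoint. it is a standard-library sketch under the assumption that a plain HTTP call is acceptable; the helper names are hypothetical and ark's own client code may differ.

```python
import json
import os
import urllib.request


def ollama_base_url() -> str:
    """base URL from OLLAMA_HOST, defaulting to the local daemon."""
    host = os.environ.get("OLLAMA_HOST", "http://localhost:11434").rstrip("/")
    if "://" not in host:
        host = "http://" + host  # allow bare host:port values
    return host


def build_request(prompt: str, model: str = "gemma4:e4b") -> urllib.request.Request:
    """prepare a non-streaming generate request."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        ollama_base_url() + "/api/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "gemma4:e4b", timeout: float = 120.0) -> str:
    """send the prompt and return the model's text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```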

run

all examples below use ark (local install). for Docker, replace ark with ./run.

check citations

ark ref flairr_ts

uses the included FLAIRR-TS fixture, a real EMNLP 2025 paper with a documented hallucinated citation (source: HalluCitation Matters).

expected: 18 references scanned, 9 confirmed, 4 metadata_mismatch (including the documented TEMPO fake), 5 unverifiable (no arxiv ID).

extract claims

ark claim flairr_ts

extracts claims from the paper's abstract. saves to reports/flairr_ts/claims.md. open the file, review, edit, then run inflation scoring.

score inflation

ark inflate flairr_ts

reads your reviewed claims from claims.md and scores each one. saves to reports/flairr_ts/inflation_report.md.

the workflow

ark ref flairr_ts          → reports/flairr_ts/ref_report.md
ark claim flairr_ts        → reports/flairr_ts/claims.md (editable)
  user reviews claims.md
ark inflate flairr_ts      → reports/flairr_ts/inflation_report.md

ark ref works standalone (no LLM needed). ark claim and ark inflate need Ollama running with Gemma.

re-running a command prompts before overwriting existing reports.
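
the overwrite guard can be pictured like this. it is a hypothetical sketch of the behavior described above, not ark's implementation; the prompt wording and the function names are invented for the example.

```python
from pathlib import Path
from typing import Callable


def confirm_overwrite(path: Path, ask: Callable[[str], str] = input) -> bool:
    """return True if it is fine to write to path."""
    if not path.exists():
        return True  # nothing to lose
    answer = ask(f"{path} exists. overwrite? [y/N] ")
    return answer.strip().lower() in {"y", "yes"}


def save_report(path: Path, text: str, ask: Callable[[str], str] = input) -> bool:
    """write the report unless the user declines; report whether it was written."""
    if not confirm_overwrite(path, ask):
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return True
```

passing `ask` as a parameter keeps the guard testable without a terminal.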

adding a paper

papers are defined as python fixtures. each fixture declares a Paper with its text and references.

from ark.models import Paper, Reference

PAPER = Paper(
    title="your paper",
    authors=["author one", "author two"],
    year=2025,
    abstract="the paper's abstract text...",
    references=[
        Reference(
            raw="full citation text",
            title="cited title",
            authors=["cited author"],
            year=2024,
            arxiv_id="2401.12345",
        ),
    ],
)

EXPECTED_VERDICTS = {
    0: "metadata_mismatch",  # if you know this citation is fake
}

save as tests/fixtures/<name>.py and run:

ark ref <name>
ark claim <name>
ark inflate <name>
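
if a fixture declares EXPECTED_VERDICTS, the results can be checked against it along these lines. the sketch assumes the keys are 0-based positions in the references list and that the verdicts are available as a list of strings in the same order; `check_verdicts` is a hypothetical helper, not part of ark.

```python
def check_verdicts(actual: list, expected: dict) -> list:
    """compare verdicts by reference index and describe every disagreement."""
    problems = []
    for index, want in sorted(expected.items()):
        if index >= len(actual):
            problems.append(f"reference {index}: missing (expected {want})")
        elif actual[index] != want:
            problems.append(
                f"reference {index}: got {actual[index]}, expected {want}"
            )
    return problems


# usage: an empty list means every documented verdict was reproduced
EXPECTED_VERDICTS = {0: "metadata_mismatch"}
assert check_verdicts(["metadata_mismatch", "confirmed"], EXPECTED_VERDICTS) == []
```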

what ark is not

not an AI detector. ark checks truth, not authorship.
not a replacement for peer review. ark surfaces signals, you decide.
not a tool for paywalled content (yet). ark verifies existence for 100% of references, verifies content for ~50%, and reports the gap honestly.

status

layer 0 (citation verification): working. confirmed catch on documented hallucinations.
layer 1 (claim extraction + inflation scoring): working. Gemma e4b extracts claims and scores inflation with reasoning and conservative rewrites.
