An MCP server for source-grounded web research. It searches the web, fetches and extracts pages, pulls structured data out of tables/files/APIs, and — the part that sets it apart — verifies that a claim is actually supported by its source instead of trusting a snippet. 42 tools over stdio MCP, driven by any MCP client (Claude Desktop, Cursor) or by the companion Scholiast research agent.
The design priority is trustworthiness over convenience: search snippets are treated as discovery only, every fetched page is cached with provenance, and claims are checked against the source text before they count. It also degrades gracefully — with no API keys and no config it still works (scraped search + an automatic headless-browser fallback + an offline verification heuristic); keys and env vars only make it better.
From source (Python ≥ 3.10):

    python3 -m venv .venv && source .venv/bin/activate
    pip install -e .                        # installs the `footnote-mcp` console script + deps
    python -m playwright install chromium   # the headless browser used by the fetch fallback
    footnote-mcp                            # start the server (speaks MCP over stdio)

footnote-mcp now waits for an MCP client on stdio. Point a client at it by dropping this
into its MCP settings (Claude Desktop: claude_desktop_config.json; Cursor: ~/.cursor/mcp.json):

    {
      "mcpServers": {
        "footnote": { "command": "footnote-mcp" }
      }
    }

No API keys are required to start: search falls back to scraping Bing + DuckDuckGo. Add
keys later under "env" (see Search backends). Pass --headed to watch
the browser tier work.
Optional runtime variables are documented in .env.example. Copy it to
.env for local shells, or paste selected variables into your MCP client config:

    {
      "mcpServers": {
        "footnote": {
          "command": "footnote-mcp",
          "env": {
            "TAVILY_API_KEY": "..."
          }
        }
      }
    }

To run without installing, straight from the source tree:

    PYTHONPATH=src python -m footnote_mcp

The reason to use this over a plain search tool is evidence_entailment and friends:
they distinguish a claim that a source supports from one it does not. benchmarks/run_benchmark.py
measures that on a labeled set of claim/source pairs (and demos corroborate_claim and
locate_claim_span):

    python benchmarks/run_benchmark.py                    # offline heuristic (deterministic)
    python benchmarks/run_benchmark.py --backend ollama   # LLM judge (needs ollama)

Offline-heuristic result on the labeled set (benchmarks/REPORT.md):
| Set | n | Accuracy | Unsupported-claim catch rate | Precision on "supported" |
|---|---|---|---|---|
| Data domain (numeric + factual) | 15 | 100% | 100% | 100% |
| Overall (incl. semantic) | 18 | 83% | 78% | 80% |
On its design domain (numeric and factual data claims) the offline heuristic scores 100% on
this labeled set: it accepts no unsupported claim and rejects no supported one. Its blind
spot is purely semantic negation and paraphrase; for those, evidence_entailment with
backend="ollama" (a local LLM judge) closes the gap. Run the --backend ollama line above to
score that path on your own machine.
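The three metric columns in the table can be computed from labeled pairs roughly as follows. This is a minimal sketch, not the code in benchmarks/run_benchmark.py: the `score` helper, the label strings, and the metric definitions (catch rate taken as recall on the unsupported class) are assumptions made for illustration, and the toy data is invented.

```python
from typing import Iterable, Optional, Tuple


def score(pairs: Iterable[Tuple[str, str]]) -> dict:
    """Score (gold, predicted) label pairs; labels are 'supported' / 'unsupported'."""
    pairs = list(pairs)
    correct = sum(1 for gold, pred in pairs if gold == pred)

    # Catch rate: of the truly unsupported claims, how many were flagged.
    gold_unsupported = [p for p in pairs if p[0] == "unsupported"]
    caught = sum(1 for _, pred in gold_unsupported if pred == "unsupported")

    # Precision on "supported": of the claims accepted, how many deserved it.
    pred_supported = [p for p in pairs if p[1] == "supported"]
    truly_supported = sum(1 for gold, _ in pred_supported if gold == "supported")

    def ratio(num: int, den: int) -> Optional[float]:
        return num / den if den else None

    return {
        "accuracy": ratio(correct, len(pairs)),
        "unsupported_catch_rate": ratio(caught, len(gold_unsupported)),
        "precision_supported": ratio(truly_supported, len(pred_supported)),
    }


# Toy data, made up for this example (not the benchmark's labeled set).
toy = [
    ("supported", "supported"),
    ("supported", "supported"),
    ("unsupported", "unsupported"),
    ("unsupported", "supported"),  # an unsupported claim that slipped through
]
result = score(toy)
print(result)
```

The catch rate and the precision on "supported" both fall when an unsupported claim is accepted, which is the failure the verification tools are meant to prevent.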
Discovery and reading (9 tools)
| Tool | Description |
|---|---|
| web_search | Keyed provider (Tavily/Brave/Google) when available, else scraped Bing + DuckDuckGo. Snippets are discovery only. |
| web_search_recent | Search restricted to a recency window (day/week/month/year). |
| web_deep_search | Search, fetch, extract, rerank, and return source context. |
| web_read | Fetch one URL, extract text, classify source quality, persist cache metadata. |
| scholarly_search | Search arXiv (papers) or Wikipedia (encyclopedic) corpora. |
| web_archive_fetch | Find the closest Wayback Machine snapshot for a dead/changed URL. |
| web_fetch_authenticated | Fetch a page that needs cookies or custom headers. |
| web_crawl | Breadth-first crawl from a start URL, on-host by default (≤ 50 pages). |
| generate_search_queries | Generate operator queries (site:, filetype:csv, API/data-table variants). |
Structured data (9 tools)
| Tool | Description |
|---|---|
| web_extract_tables | Parse HTML tables into columns/rows with source-URL provenance. |
| web_detect_downloads | Detect linked CSV/TSV/XLS/XLSX/PDF/JSON/XML files. |
| web_parse_file | Download and parse CSV/TSV/XLS/XLSX/PDF/JSON. |
| web_fetch_json | Fetch direct API/JSON endpoints into parsed JSON. |
| check_date_completeness | Validate required date coverage (day/week/month). |
| resolve_units | Detect currencies, currency pairs, measurement units. |
| validate_unit_rows | Reject rows with incompatible units or currency pairs. |
| reconcile_time_series | Align series on a key, compute deltas, flag missing keys/outliers. |
| export_dataset | Write consolidated rows to a csv/xlsx/json file. |
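To make "align series on a key, compute deltas, flag missing keys" concrete, here is a minimal sketch of that kind of reconciliation. It is not the implementation of reconcile_time_series: the `reconcile` function, its `tolerance` parameter, and the output keys are hypothetical, and outlier detection is left out.

```python
def reconcile(a: dict, b: dict, tolerance: float = 0.0) -> dict:
    """Align two {key: value} series and report deltas, gaps, and mismatches."""
    shared = sorted(a.keys() & b.keys())
    deltas = {key: b[key] - a[key] for key in shared}
    return {
        "deltas": deltas,
        "missing_in_a": sorted(b.keys() - a.keys()),
        "missing_in_b": sorted(a.keys() - b.keys()),
        # Keys present in both series whose values differ by more than the tolerance.
        "mismatches": [key for key in shared if abs(deltas[key]) > tolerance],
    }


# Invented sample data: the same metric reported by two sources.
source_a = {"2024-01-01": 100.0, "2024-01-02": 101.5, "2024-01-03": 99.0}
source_b = {"2024-01-01": 100.0, "2024-01-02": 103.0, "2024-01-04": 98.0}
report = reconcile(source_a, source_b, tolerance=0.5)
print(report)
```

Reporting gaps separately from value mismatches keeps "the source has no row for this date" distinct from "the sources disagree".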
Source quality and verification (8 tools)
| Tool | Description |
|---|---|
| classify_source | Classify official / aggregator / blog / forum / interactive / blocked / error. |
| evidence_entailment | Strict claim-vs-source checker: heuristic, auto, ollama, optional local_nli. |
| corroborate_claim | Triangulate a claim across excerpts (corroborated / conflicting / single_source / …). |
| locate_claim_span | Locate supporting sentence(s) with char offsets and a containment score. |
| source_cache_get / source_cache_put | Inspect and write persistent source-cache entries. |
| build_research_debug_report | Compact report of queries, URLs, source quality, verification gaps. |
| startup_health_check | Check parser, OCR, browser, and cache dependencies. |
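As an illustration of "char offsets and a containment score", the sketch below finds the sentence that contains the largest share of a claim's tokens. It is an assumption about what such a locator could look like, not the algorithm behind locate_claim_span: `locate_span`, the naive sentence split, and the token-overlap score are all hypothetical.

```python
import re


def locate_span(claim: str, source: str):
    """Return (start, end, score) for the best-matching sentence in source.

    score is the fraction of claim tokens that also occur in that sentence.
    """
    claim_tokens = set(re.findall(r"\w+", claim.lower()))
    best = (0, 0, 0.0)
    if not claim_tokens:
        return best
    # Naive sentence split: a run of non-terminators plus one optional terminator.
    for match in re.finditer(r"[^.!?]+[.!?]?", source):
        sentence_tokens = set(re.findall(r"\w+", match.group().lower()))
        score = len(claim_tokens & sentence_tokens) / len(claim_tokens)
        if score > best[2]:
            best = (match.start(), match.end(), score)
    return best


source = "The bridge opened in 1932. It carries eight lanes of traffic."
start, end, score = locate_span("The bridge opened in 1932", source)
print(source[start:end], score)
```

Token overlap only locates candidate text; it cannot tell "opened in 1932" from "did not open in 1932", so a located span still needs a separate entailment check.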
Controlled extraction recipes (6 tools)
When generic parsers fail, synthesize a sandboxed parser:
| Tool | Description |
|---|---|
| tool_spec_propose | Propose a task-specific extraction recipe spec. |
| tool_code_generate | Generate a starter extract(source_text, input_payload) recipe. |
| tool_code_validate | Validate recipe code against a static safety allowlist. |
| tool_code_run_sandboxed | Run validated code in a limited subprocess (JSON output only). |
| tool_promote | Save a validated recipe as reusable memory (no server edit). |
| recipe_registry | Manage promoted recipes: list / get / run / delete. |
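A recipe is a function with the extract(source_text, input_payload) signature named above. The example below is a hypothetical recipe written for illustration, not output of tool_code_generate: the "labels" payload key and the returned shape are assumptions, and whether `re` is on the safety allowlist is not stated here.

```python
import json
import re


def extract(source_text: str, input_payload: dict) -> dict:
    """Hypothetical recipe: pull 'label: number' lines for the requested labels."""
    wanted = {label.lower() for label in input_payload.get("labels", [])}
    rows = []
    for line in source_text.splitlines():
        match = re.match(r"\s*([^:]+):\s*(-?\d+(?:\.\d+)?)\s*$", line)
        if not match:
            continue  # skip lines that are not 'label: number'
        label = match.group(1).strip()
        if not wanted or label.lower() in wanted:
            rows.append({"label": label, "value": float(match.group(2))})
    return {"rows": rows}


sample = "Revenue: 120.5\nNotes: see appendix\nCosts: 80\n"
output = extract(sample, {"labels": ["revenue", "costs"]})
print(json.dumps(output))  # a recipe's result must be JSON-serializable
```

Keeping the recipe a pure function of its two arguments fits the sandbox model described in the table: no network, no file access, JSON out.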
Browser fallback (10 tools)
A controlled Chromium session for JS-heavy or interactive pages:
| Tool | Description |
|---|---|
| web_navigate · web_snapshot · web_click · web_type · web_extract · web_scroll | Drive a page via stable element refs. |
| browser_set_date_range · browser_extract_tables · browser_extract_tables_for_date_range | Set a date range, submit, extract visible tables. |
| web_screenshot | Save a PNG and optionally OCR text locked inside the image. |
web_search (and everything built on it) routes through a provider layer. With an API key
it uses that provider; otherwise it scrapes Bing + DuckDuckGo. Results are normalized to one
shape regardless of backend.
| Provider | Env vars | Notes |
|---|---|---|
| Tavily | TAVILY_API_KEY | LLM-oriented search API. |
| Brave | BRAVE_API_KEY | Independent web index. |
| Google | GOOGLE_API_KEY + GOOGLE_CSE_ID | Programmable Search (Custom Search JSON API). |
| Bing + DuckDuckGo | none | Default fallback; scraped, no key. |
auto (default) tries each keyed provider in order Tavily → Brave → Google, then scrapes.
Force one with the provider argument (tavily/brave/google/scrape).
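The selection order described above can be sketched as a small function over the environment. This is an illustration under assumptions, not the server's provider layer: `provider_order` and `KEYED_PROVIDERS` are hypothetical names, and the sketch assumes a provider counts as available only when all of its variables are set.

```python
import os

# Order and variable names follow the provider table; the structure is assumed.
KEYED_PROVIDERS = [
    ("tavily", ("TAVILY_API_KEY",)),
    ("brave", ("BRAVE_API_KEY",)),
    ("google", ("GOOGLE_API_KEY", "GOOGLE_CSE_ID")),
]


def provider_order(provider: str = "auto", env=None) -> list:
    """Return the providers to try, in order, for a given provider argument."""
    env = os.environ if env is None else env
    if provider != "auto":
        return [provider]  # forced: tavily / brave / google / scrape
    keyed = [
        name
        for name, required in KEYED_PROVIDERS
        if all(env.get(var) for var in required)
    ]
    return keyed + ["scrape"]  # scraping is always the last resort


print(provider_order(env={}))
print(provider_order(env={"BRAVE_API_KEY": "x", "GOOGLE_API_KEY": "y"}))
```

In the second call Google is skipped because GOOGLE_CSE_ID is missing, so the order is Brave and then the scraped fallback.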
Semantic reranking. Pass semantic: true to web_search to reorder by meaning rather
than keyword overlap: it over-fetches, embeds query and results with a local ollama model,
and sorts by cosine similarity (each result gains semantic_score). Best-effort — if ollama
is unavailable the original order is returned. Model: FOOTNOTE_EMBED_MODEL (default bge-m3).
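The reranking step amounts to sorting results by cosine similarity to the query embedding. The sketch below shows that step alone with hand-made two-dimensional vectors; the real path gets its embeddings from an ollama model, which is not called here, and the `rerank` helper and the "embedding" field are hypothetical.

```python
import math


def cosine(u, v) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rerank(query_vec, results):
    """Return copies of results with semantic_score added, best match first."""
    scored = [
        {**item, "semantic_score": cosine(query_vec, item["embedding"])}
        for item in results
    ]
    return sorted(scored, key=lambda item: item["semantic_score"], reverse=True)


# Toy embeddings, invented for the example.
query_vec = [1.0, 0.0]
results = [
    {"url": "https://example.com/a", "embedding": [0.0, 1.0]},
    {"url": "https://example.com/b", "embedding": [1.0, 0.0]},
    {"url": "https://example.com/c", "embedding": [1.0, 1.0]},
]
ranked = rerank(query_vec, results)
print([item["url"] for item in ranked])
```

Because the function returns new dictionaries and leaves the input list untouched, a caller can fall back to the original order if embedding fails, which matches the best-effort behavior described above.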
web_read fetches through an escalation ladder (scraper.py):
the cheapest method runs first and escalates only when a result looks blocked or empty. A
block/quality detector decides when to escalate; a per-domain rate limiter, circuit breaker,
and negative cache keep it polite. The tier used and the full attempt trace come back in
fetch_tier / scrape_tiers.
| Tier | Method | Enabled by |
|---|---|---|
| 1 | HTTP (curl_cffi TLS impersonation) | always |
| 2 | HTTP through a rotating proxy | FOOTNOTE_PROXIES set |
| 3 | Headless Chromium (runs JavaScript) | FOOTNOTE_BROWSER_FALLBACK=1 (default on) |
| 4 | Chromium through a proxy | proxies + browser |
| 5 | Hosted scrape API (Firecrawl / ScrapingBee) | FOOTNOTE_SCRAPE_API set |
With nothing configured it is the plain HTTP path plus an automatic browser fallback for JavaScript-rendered pages.
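The ladder can be pictured as two steps: decide which tiers are enabled, then try them cheapest first until one returns usable content. The sketch below is a simplified model, not scraper.py: `enabled_tiers`, `fetch_with_escalation`, the tier names, the stub fetchers, and the value shapes given to fetch_tier / scrape_tiers are all assumptions.

```python
def enabled_tiers(env: dict) -> list:
    """Map environment settings to an ordered list of tier names (assumed names)."""
    proxies = bool(env.get("FOOTNOTE_PROXIES"))
    browser = env.get("FOOTNOTE_BROWSER_FALLBACK", "1") == "1"
    tiers = ["http"]
    if proxies:
        tiers.append("http_proxy")
    if browser:
        tiers.append("browser")
    if proxies and browser:
        tiers.append("browser_proxy")
    if env.get("FOOTNOTE_SCRAPE_API"):
        tiers.append("scrape_api")
    return tiers


def fetch_with_escalation(url, fetchers, looks_blocked):
    """Try (name, fetch) pairs in order; stop at the first unblocked result."""
    trace = []
    for name, fetch in fetchers:
        text = fetch(url)
        blocked = looks_blocked(text)
        trace.append({"tier": name, "blocked": blocked})
        if not blocked:
            return {"text": text, "fetch_tier": name, "scrape_tiers": trace}
    return {"text": None, "fetch_tier": None, "scrape_tiers": trace}


def looks_blocked(text: str) -> bool:
    # Stand-in detector: treat very short output as a JS shell.
    return len(text) < 200


# Stub fetchers so the example runs offline.
fetchers = [
    ("http", lambda url: "<html><script></script></html>"),
    ("browser", lambda url: "rendered article text " * 20),
]
result = fetch_with_escalation("https://example.com", fetchers, looks_blocked)
print(enabled_tiers({}), result["fetch_tier"])
```

With an empty environment the enabled tiers are plain HTTP and the browser, and the stubbed run escalates once before succeeding.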
| Env var | Default | Purpose |
|---|---|---|
| FOOTNOTE_BROWSER_FALLBACK | 1 | Escalate blocked/JS pages to headless Chromium. |
| FOOTNOTE_PROXIES | (none) | Comma-separated proxy URLs; sticky per domain with health tracking. |
| FOOTNOTE_SCRAPE_API | (none) | firecrawl or scrapingbee (needs the matching API key). |
| FOOTNOTE_DOMAIN_RPS / _BURST | 3 / 5 | Per-domain rate limit (token bucket). |
| FOOTNOTE_BREAKER_THRESHOLD / _COOLDOWN | 5 / 120 | Per-domain circuit breaker. |
| FOOTNOTE_NEGCACHE_TTL | 300 | Seconds to remember a blocked URL. |
| FOOTNOTE_THIN_CONTENT_CHARS | 200 | Below this extracted length, a script-heavy page counts as a JS shell. |
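For readers unfamiliar with token buckets, the sketch below shows how a rate of 3 requests per second with a burst of 5 behaves. It is a generic token bucket, not the server's limiter: the `TokenBucket` class is hypothetical, and time is passed in explicitly so the example is deterministic.

```python
class TokenBucket:
    """Generic token bucket: refills at `rate` tokens/second, capped at `burst`."""

    def __init__(self, rate: float = 3.0, burst: int = 5):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # start full, so an initial burst is allowed
        self.updated = 0.0

    def allow(self, now: float) -> bool:
        """Refill for the elapsed time, then spend one token if available."""
        elapsed = now - self.updated
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(rate=3.0, burst=5)
burst_results = [bucket.allow(now=0.0) for _ in range(6)]   # six requests at t=0
later_results = [bucket.allow(now=1.0) for _ in range(4)]   # four more at t=1s
print(burst_results, later_results)
```

The first five requests pass on the stored burst and the sixth is refused; one second later three tokens have refilled, so three of the next four pass.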
~/.footnote-mcp/source_cache/ # persistent page cache (with provenance)
~/.footnote-mcp/research_memory.json # persistent research memory
Override the cache location with FOOTNOTE_SOURCE_CACHE=/path/to/cache footnote-mcp.
check_date_completeness supports the calendars calendar, business_day, crypto_24_7,
forex_weekday, us_business_day, and ru_business_day (pass explicit holidays for
source-specific ones; the us_/ru_ variants use the optional holidays package).
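A date-completeness check boils down to generating the expected dates for a calendar and subtracting the dates actually observed. The sketch below does this for a plain Monday-to-Friday calendar with explicit holidays; `missing_business_days` is a hypothetical helper, not the code behind check_date_completeness.

```python
from datetime import date, timedelta


def missing_business_days(observed, start: date, end: date, holidays=()):
    """Return Mon-Fri dates in [start, end] that are neither holidays nor observed."""
    have = set(observed)
    skip = set(holidays)
    missing = []
    day = start
    while day <= end:
        if day.weekday() < 5 and day not in skip and day not in have:
            missing.append(day)
        day += timedelta(days=1)
    return missing


# Week of Monday 2024-01-01, with the Monday passed in as an explicit holiday.
observed = [date(2024, 1, 2), date(2024, 1, 3), date(2024, 1, 5)]
missing = missing_business_days(
    observed,
    start=date(2024, 1, 1),
    end=date(2024, 1, 7),
    holidays=[date(2024, 1, 1)],
)
print(missing)
```

A 24/7 calendar would drop the weekday test, which is why the calendar choice matters before a gap is reported as missing data.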
Docker bundles Chromium and tesseract, so there is nothing else to install:

    docker build -t footnote-mcp .
    docker run -i --rm footnote-mcp   # the client launches this; see MCP config below

Published images are available from GitHub Container Registry:

    docker run -i --rm ghcr.io/kazkozdev/footnote-mcp:1.1.0
    docker run -i --rm ghcr.io/kazkozdev/footnote-mcp:latest

MCP client config for the published image:

    {
      "mcpServers": {
        "footnote": {
          "command": "docker",
          "args": ["run", "-i", "--rm", "ghcr.io/kazkozdev/footnote-mcp:latest"]
        }
      }
    }

pipx / uvx (isolated install of the entry point):

    pipx install /path/to/footnote-mcp              # or: pipx install git+<repo-url>
    uvx --from /path/to/footnote-mcp footnote-mcp   # ad-hoc, no install

OCR. PDF/image OCR uses pytesseract + the system tesseract binary (brew install tesseract on macOS).

Local NLI backend for evidence_entailment backend="local_nli":
pip install -r requirements-nli.txt (model via FOOTNOTE_NLI_MODEL). Either way,
startup_health_check reports what is actually available. Runtime dependency ranges
are declared in pyproject.toml and mirrored in requirements.txt.

    pip install -r requirements-dev.txt
    python -m pytest -q   # offline unit + smoke tests; no network or keys needed

tests/test_mcp_smoke.py launches the server over real MCP stdio and exercises the tools
end to end against a local HTTP fixture; the rest are offline unit tests of the parsers,
fetch ladder, search providers, and dispatch. The live search test is opt-in:

    RUN_LIVE_WEB_TESTS=1 python -m pytest -m live

CI runs the same suite (.github/workflows/tests.yml).
MIT — see LICENSE.
