Skip to content

bradtaylorsf/muckwire

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

141 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

muckwire

What is this and why does it exist?

Existing AI research agents are cloud-only, opaque about where their answers came from, and lock you into one vendor. muckwire is a different shape: a CLI daemon that runs on your laptop, defaults to local models via LM Studio (so the wallet cost is $0), and treats every claim as something that has to land in a footnote with a Wayback archive URL behind it.

You hand it a goal — "Compare Pydantic AI, LangGraph, and CrewAI", "Profile this contractor for due diligence", "Track Project 2025 implementation across federal agencies" — and walk away. A planner spawns worker tasks (fetch, extract, dedup, classify), findings accumulate in a per-job folder with a SQLite index, a synthesizer rewrites a report.md with inline citations, a critic on a different model pushes back, and the whole thing runs under a deterministic job ID with start / status / stop / resume lifecycle commands a sysadmin would recognize.

Who it's for: investigative journalists, security researchers, due-diligence work, anyone who needs to kick off a multi-hour run overnight and wake up to a report with citations they can defend.

What it's NOT: a chatbot, a real-time tool, or the right answer to a one-off question you could resolve with a single web search.

🚧 Status: actively developed. v1 baseline is shipped and validated end-to-end ($0 local-mode runs produce real, sourced reports). The open issues are the roadmap — see #107 for the connector buildout epic and #158 for the active connector smoke-fix sweep. Expect rough edges; PRs and issue reports welcome.

MIT licensed, contributions welcome — see CONTRIBUTING.md for the issue-driven workflow and LICENSE for terms.

This README is the entry point. It walks an operator from "fresh laptop" to "24-hour soak running unattended". Deeper detail lives alongside it:

The name

muckwire carries the spirit of the muckrakers — the Progressive Era journalists (roughly 1890–1920) who exposed corruption in government, the predations of big business, and the social injustices nobody in power wanted documented. Ida Tarbell dismantled Standard Oil claim by claim. Lincoln Steffens mapped the machinery of municipal graft. Upton Sinclair walked readers onto the meatpacking floor. Ida B. Wells indicted lynching in statistics nobody could wave away. Jacob Riis and Lewis Hine photographed tenements and child labor until the country had to look. Theodore Roosevelt coined the term in 1906 as an insult — borrowed from Bunyan's Pilgrim's Progress, a man so busy raking muck he never looked up — and the journalists wore it as a badge of honor. Their work drove the Pure Food and Drug Act, antitrust enforcement, child-labor laws, and most of the accountability infrastructure that survives today.

The "wire" half points at the plumbing of that same tradition — the wire services (AP, Reuters, UPI) that moved an investigation from one city to readers in another by morning. This tool is built for the same kind of work: patient, sourced, evidence-first investigation. Every claim it produces lands in a footnote with a Wayback archive URL behind it, because the only thing that survives the pushback is the receipts.

Architecture

flowchart TD
    Goal["Goal (interactive intake)"] --> Planner["Planner<br/>(frontier / frontier_speed)"]
    Planner --> Workers["Worker pool<br/>fetch · extract · dedup · classify<br/>(general · fast)"]
    Workers --> Findings["findings/NNNNNN.md"]
    Findings <--> SQLite["SQLite (WAL)<br/>FTS5 + embeddings<br/>(embeddings tier)"]
    Findings --> Synth["Synthesis<br/>(frontier)"]
    Synth <--> Critique["Critique<br/>(frontier_alt)"]
    Synth --> Report["report.md<br/>+ history/"]

    classDef local fill:#e6f4ff,stroke:#2563eb,color:#0b3d91
    classDef cloud fill:#fff4e6,stroke:#d97706,color:#7c2d12
    class Workers,Findings,SQLite local
    class Planner,Synth,Critique,Report cloud
Loading

Blue = local LM Studio tiers (free at the wallet). Orange = OpenRouter cloud tiers (priced — see Costs). The full tier roster lives in config/models.yaml.

Direct connector kinds

The planner can dispatch directly to any of the connectors below instead of falling back to web_search with a site: operator. Each row here mirrors what the planner sees in its system prompt — same description, same optional payload knobs, same example query. The table is generated from src/research_agent/tools/_registry.py via scripts/regen_readme_kinds.py; do not hand-edit between the sentinels.

Kind What it covers Optional payload knobs Example query
bbb_search Better Business Bureau profiles + ratings (Playwright, no auth) SBI Builders
bne_search BNE Hemeroteca Digital Spanish historical press (Playwright scrape, no auth) max_results, fechaDesde, fechaHasta, localizacion guerra civil 1936
calaccess_search California Cal-Access campaign finance (Playwright) kind: contributions|independent_expenditures Newsom
commons_search Wikimedia Commons free media files with imageinfo license, author, MIME type, original URL, and thumbnail metadata max_results Algerian war photographs
congress_search Bills, members, committees, hearings, congressional record (Congress.gov v3 API) kind: bill|member|committee|hearing|congressional-record Inflation Reduction Act
courtlistener_search Federal & state court opinions, dockets (RECAP), oral arguments — requires COURTLISTENER_API_TOKEN kind: opinions|dockets|oral_arguments Schedule F appellate
cspan_search C-SPAN Video Library US political broadcast video with transcripts (Playwright scrape, no auth) max_results, type=House|Senate Project 2025
dpla_search Digital Public Library of America item metadata across US cultural institutions; requires DPLA_API_KEY max_results, provider Maya land claims
edgar_search SEC filings (10-K, 10-Q, 8-K, Form 4) — requires RESEARCH_USER_AGENT w/ contact email form_type: 10-K|8-K|... Cisco cybersecurity
europeana_search Europeana multilingual European cultural-heritage item metadata across museums, libraries, and archives; requires EUROPEANA_API_KEY max_results, lang Algerian war 1954
fec_search Candidates, committees, schedule A/E filings (OpenFEC) kind: candidates|candidates_enumerate|committees|schedules/schedule_a|schedules/schedule_e, cycle, office, state, district, party, candidate_status, max_rows Trump 2024 committee
fedregister_search Federal Register rules, proposed rules, agency notices since 1994 (no auth) since: YYYY-MM-DD, agencies: [...] Schedule F
gallica_search Gallica/BnF SRU XML search for French national-library newspapers, books, manuscripts, maps, and other digitized primary sources max_results (SRU maximumRecords capped at 50) guerre d'Algerie
gdelt_search GDELT — Global news event aggregator, no site: operator (no auth) since: YYYY-MM-DD, language: english Project 2025 mainstream coverage
iarchive_search Internet Archive texts, audio, movies, and web-archive collection metadata through advancedsearch.php mediatype: texts|audio|movies|web, page: <int> Pullman Strike
iwm_search Imperial War Museums public collections: photographs, sound/oral histories, documents, film, objects (Playwright scrape, no auth) max_results, object_category, related_period, records_with_media, style, page_size Battle of Britain
lda_search Senate Lobbying Disclosure Act filings (registrants, contributions) kind: filings|registrants|contributions Heritage Foundation
licensing_search State contractor / licensing-board lookups (Playwright; CA wired, others stubs) state: CA|TX|FL|NY SBI Builders
linkedin_search LinkedIn person/company lookup via Proxycurl or Lix — requires broker key kind: person|company Sundar Pichai
littlesis_search Power-mapping database — entities, donations, board seats, family ties (lead, not evidence) kind: entities|relationships Peter Thiel
loc_search Library of Congress digital collections, including Chronicling America through the unified loc.gov API collection: chronicling-america|prints|manuscripts|recordings|maps, page: <int> battle of algiers
nara_search US National Archives Catalog OPA v2 records, declassified federal records, military records, photos; requires NARA_API_KEY available_online, type_of_materials, result_types, record_group, page Vietnam War declassified
nonprofits_search ProPublica Nonprofit Explorer (Form 990 filings, no auth) Heritage Foundation
openalex_search OpenAlex Works scholarly articles, abstracts, DOIs, citations, authors, venues, and open-access URLs max_results, filter, sort Project 2025 unitary executive theory
opencorporates_search Global company registry — requires OPENCORPORATES_API_KEY jurisdiction: us_ca|gb|... Acme Holdings
openlibrary_search Open Library book metadata, ISBN/OCLC/LCCN identifiers, and Internet Archive scan IDs through search.json max_results Pullman Strike 1894
persee_search Persee French academic journals in humanities and social sciences (Playwright scrape, no auth) max_results guerre d'Algerie
sanctions_search OFAC SDN + UK sanctions lists (local index, no auth) Wagner Group
scholar_search Google Scholar via SerpAPI — requires SERPAPI_KEY kind: case_law|articles Section 230 appellate
si_search Smithsonian Open Access digitized collection objects, museum artifacts, images, 3D assets, and object metadata via api.data.gov max_results Apollo 11
sos_search State Secretary-of-State business entity filings (Playwright; CA wired, others stubs) state: CA|DE|NV|... Acme Corp
state_election_search Official state election candidate roster sources and portals state, office, cycle, max_results 2026 House candidates
trove_search Trove / National Library of Australia metadata for newspapers, books, photos, magazines, oral histories; metadata-only default category, zone, sortby White Australia Policy 1901
ukna_search UK National Archives Discovery catalogue metadata for Foreign Office, War Office, Colonial Office, and other UK archival records (no auth) max_results, page Mau Mau Kenya
usaspending_search Federal contracts, grants, loans (award-level detail, no auth) award_type: contracts|grants|loans Heritage Foundation contract
wikidata_search Wikidata Query Service raw SPARQL for biographical, relational, occupational, place, and entity-ID data max_results (client-side truncation; SPARQL should include LIMIT) SELECT ?item ?itemLabel WHERE { ?item wdt:P31 wd:Q5; wdt:P19 wd:Q90 . SERVICE wikibase:label { bd:serviceParam wikibase:language "en". } } LIMIT 3
wikisource_search Wikisource transcribed primary documents across per-language hosts; fetch returns the full source text in cleaned_text `lang: en fr

What does the output actually look like?

Every job ends with a jobs/<job-id>/report.md — markdown with inline numeric citations and a source list at the bottom. Trimmed real example:

# Investigation Report: Project 2025 Implementation Tracker

## Executive Summary

- **Active Regulatory Shifts:** The EPA and Army Corps of Engineers are
  currently in a public comment period (open through January 5, 2026)
  regarding the revision of "Waters of the United States" (WOTUS)
  definitions following the *Sackett v. EPA* ruling [40, 99].
- **Health Policy Revisions:** Significant proposals within Project 2025
  target HHS and the CDC, including reversing FDA approvals for abortion
  medication (mifepristine), restructuring the CDC into two separate
  agencies, and implementing Medicaid work requirements [38, 43, 44, 116].
- **Expansion of Executive Authority:** The blueprint advocates for
  "unitary executive theory," aiming to place the federal bureaucracy
  under direct presidential control [38, 87].

## Hypotheses

### H1: Identify core policy pillars and specific proposals
**Status:** Confirmed
- **Supporting:** The investigation identified key pillars including
  abortion access restrictions (Comstock Act), immigration overhaul,
  voting rights limitations, and the expansion of executive power [37, 38].

## Open Questions

- **Specific Implementation Dates:** While IRS workforce downsizing is
  noted, the exact timeline for the "quiet cuts" remains unverified [127].

## Recommended Human Follow-Ups

### FOIA candidates
- Correspondence between the EPA and Army Corps of Engineers regarding
  the "wet season" definition in the proposed WOTUS rule [40, 98].

## Sources

1. https://www.aclu.org/project-2025-explained — "Project 2025, Explained"
   (retrieved 2025-05-20)
2. https://www.bbc.com/news/articles/c977njnvq2do — "What is Project 2025?"
   (retrieved 2025-05-20)
40. https://www.alston.com/en/insights/publications/2025/12/epa-army-corps...
    — "EPA, Army Corps of Engineers Proposal on Revised WOTUS Definition"
    (retrieved 2025-05-20)
...

Bracketed numbers in the body resolve to the numbered list at the bottom. Each source row carries the canonical URL, the resolved title, and the retrieval date. Wayback archive URLs are mirrored alongside in jobs/<job-id>/sources/ so a deleted page is still defensible.

Install

Prerequisites

  • Python 3.12+
  • Playwright browsers (installed via playwright install chromium below)
  • tesseract (optional but recommended) — enables the PDF OCR escalation layer for scanned FOIA responses and image-PDF court filings. Without it, those documents silently return degraded text.
    • macOS: brew install tesseract
    • Debian/Ubuntu: apt install tesseract-ocr
# editable install with dev extras
pip install -e ".[dev]"

# one-time browser bootstrap (binaries are not pip-installable)
playwright install chromium

# create your local .env from the template
cp .env.example .env

.env is the only place runtime secrets and operator overrides live — no export required. Lookup order, highest precedence first:

  1. Existing process env vars (CI, one-shot OPENROUTER_API_KEY=... research ...).
  2. ./.env.local (gitignored, dev-only overrides).
  3. ./.env (or the nearest ancestor walking up to repo root).

The full list of recognized keys lives in src/research_agent/config.py (EXPECTED_ENV_KEYS). .env.example and research doctor both read from that list, so there is no drift.

Required / commonly used

Key Required Purpose
OPENROUTER_API_KEY yes Cloud synthesis tier (Claude Opus / Haiku via OpenRouter).
BRAVE_SEARCH_API_KEY no Brave Search API key (free tier ~2000 queries/month). When set, web_search engine auto picks Brave over the DDG-Playwright scraper.
LMSTUDIO_BASE_URL no Override the default http://localhost:1234/v1.
RESEARCH_HEADFUL no Set to 1 to launch Playwright in headed mode for debugging.
YOUTUBE_API_KEY no YouTube Data API v3 key (free quota: 10,000 units/day). When set, tools/youtube.py:search uses the official API; absent, it falls back to scraping the public results page via Playwright.

Connector-specific and advanced

Click to expand — escalation toggles, connector API keys, broker switches
Key Required Purpose
RESEARCH_USER_AGENT no Override default UA sent by httpx + Playwright.
RESEARCH_IGNORE_ROBOTS no Set to 1 to bypass robots.txt checks in web_fetch.
RESEARCH_PDF_VLM_ESCALATION no Set to 1 to enable Opus 4.7 vision escalation for PDFs that fail every cheaper layer. Off by default — costs real money; emits a pdf_vlm_escalation WARN event when fired.
RESEARCH_OCR_VLM_ESCALATION no Set to 1 to enable Opus 4.7 vision escalation for image OCR when Tesseract and the local VLM both fail. Off by default — costs real money; emits an ocr_vlm_escalation WARN event when fired.
RESEARCH_DAEMON_PROGRESS no Set to 0 to suppress the foreground Rich progress bar the daemon writes to stdout when run interactively.
RESEARCH_FRAGMENT_SYNTH no Set to 1 to enable experimental section-fragment synthesis. Unset keeps the legacy whole-report synthesizer.
COURTLISTENER_API_TOKEN no CourtListener API token (free w/ signup) — required by tools/courtlistener.py. Authenticated tier is 5,000 req/hr; anonymous traffic is throttled to the point of unusability.
DATA_GOV_API_KEY no api.data.gov key (free w/ signup at https://api.data.gov/signup/) — used by tools/fec.py, tools/congress.py, and tools/smithsonian.py. FEC/Congress/Smithsonian fall back to DEMO_KEY for low-volume smoke when unset.
LDA_API_KEY no Senate Lobbying Disclosure Act API key (free, optional, register at https://lda.senate.gov/api/register/) — used by tools/lda.py. Anonymous works for low-volume; authenticated raises rate limits. Sent via Authorization: Token <key>.
OPENCORPORATES_API_KEY no OpenCorporates API token — used by tools/opencorporates.py. Required for any live request: anonymous v0.4 access is now gated (returns HTTP 401), so without a key the connector returns no results and smoke skips cleanly. Token rides as ?api_token=<key>. Public-benefit access by emailing service desk; commercial pricing £2,250–£12,000/yr.
OPENALEX_API_KEY no Free OpenAlex API key — used by tools/openalex.py. Optional for low-volume smoke/demos; recommended for regular use since the February 2026 free-key policy. Sent as ?api_key=<key>.
TROVE_API_KEY no Trove/National Library of Australia API key — used by tools/trove.py. Keys expire after 12 months and require renewal by email. Sent as X-API-KEY, not a URL parameter. Connector defaults to metadata-only; no automatic full-text downloads.
NARA_API_KEY no National Archives Catalog OPA v2 API key — used by tools/nara.py. Request by emailing Catalog_API@nara.gov; registration takes about 24h. Sent as x-api-key. Default limit is 10,000 queries/month; connector and smoke skip cleanly when unset.
DPLA_API_KEY no Digital Public Library of America API key — used by tools/dpla.py. Request with curl -X POST https://api.dp.la/v2/api_key/<your-email>; the emailed 32-character key is sent as ?api_key=<key>. Connector and smoke skip cleanly when unset.
EUROPEANA_API_KEY no Europeana API key — used by tools/europeana.py. Create a free key in your Europeana account under Manage API keys (migrated there on 2025-05-28). Sent as ?wskey=<key> to https://api.europeana.eu/api/v2/search.json; connector enforces 1 RPS and smoke skips cleanly when unset.
SERPAPI_KEY no SERPAPI key — required by tools/scholar.py (Google Scholar engine, case law + academic). Plans start at $75/mo for 5k searches across all engines; per-query ≈ $0.015. Sign up at https://serpapi.com/.
LINKEDIN_DATA_API_KEY no LinkedIn data-broker key (default broker: Proxycurl) — required by tools/linkedin.py. Per-lookup ≈ $0.01–$0.05; gate fetches behind explicit planner tasks. Sign up at https://nubela.co/proxycurl/.
LINKEDIN_BROKER no Broker recipe used by tools/linkedin.py. proxycurl (default) or lix; switching to lix consults LIX_API_KEY instead of LINKEDIN_DATA_API_KEY.
LIX_API_KEY no Lix data-broker key (https://lix-it.com/) — only consulted when LINKEDIN_BROKER=lix. Similar per-lookup pricing to Proxycurl.
RESEARCH_REDDIT_USER_AGENT no Override the User-Agent tools/reddit.py sends. Reddit's anonymous JSON endpoint 403s the project's descriptive UA; the connector defaults to a Chrome UA. Set this when you have a registered OAuth app or want a different override than RESEARCH_USER_AGENT (consulted next in the fallback chain).
RESEARCH_MODELS_CONFIG no Path to the models routing YAML the daemon loads. Defaults to config/models.yaml relative to cwd. Set when running out-of-tree or pointing at a packaged config.
RESEARCH_DB_PATH no Override the SQLite index path the daemon uses. Unset uses data/index.sqlite. Useful for isolating runs under test or pointing at a writable disk.
RESEARCH_JOBS_ROOT no Override the directory that holds per-job folders. Unset uses jobs/. Useful for redirecting big runs onto a larger disk.
SANCTIONS_DB_PATH no Override where tools/sanctions.py writes its SDN/EU index sqlite. Unset uses the module default under data/sanctions/. Useful when refreshing into a staging path before atomic swap.

LM Studio

Local tiers run through LM Studio at http://localhost:1234/v1. The exact model identifiers the router maps to each tier live in config/models.yaml; never hardcode model names elsewhere — pick a tier.

Models to download (LM Studio UI → Discover → search by exact ID):

Tier Model ID Purpose
fast josiefied-qwen3-4b-instruct-2507-abliterated-v2 Classification, dedup, language detection.
general qwen3.6-35b-a3b-holo3-qwopus-instruct-qx64-hi-mlx Worker default — query rewriting, extraction, summarization.
reasoner deepseek-r1-distill-qwen-32b Hypothesis ranking, contradiction detection.
vision deepseek-ocr-2 PDF page screenshots, chart reading.
embeddings qwen3-embedding-4b-dwq Semantic search across findings + sources (768-d output).

After downloading, start the LM Studio local server (Developer tab → Server → Start). The default port is 1234 and the OpenAI-compatible endpoint mounts at /v1. Override with LMSTUDIO_BASE_URL if you've moved it (e.g. http://192.168.1.10:1234/v1 for a workstation across the LAN).

embeddings intentionally has no cloud fallback — a stall surfaces as a hard error rather than silently rerouting to a chat model. Keep qwen3-embedding-4b-dwq loaded any time you plan to use research search or the daemon's hybrid retrieval. After upgrading from a release that shipped the older 1024-d qwen3-embedding-4b model, drop data/index.sqlite or re-index any job with persisted embeddings — the new model emits 768-d vectors and existing rows will mis-shape on read-back.

OpenRouter

Cloud tiers go through OpenRouter. Create a key (Dashboard → Keys → Create Key) and paste it into .env:

OPENROUTER_API_KEY=sk-or-v1-...

The key drives three tiers:

Tier Model When it fires
frontier anthropic/claude-opus-4-7 Major synthesis, critique, final report, planner rewrites.
frontier_alt moonshotai/kimi-k2-1t Critique pass — diverse second opinion.
frontier_speed anthropic/claude-haiku-4-5 Fast cloud calls when local isn't enough but Opus is overkill; intake follow-ups; tier fallback.

research doctor sanity-checks the key shape (sk-or- prefix) without hitting the network. List prices live in config/models.yaml under pricing: and feed the budget tracker (src/research_agent/llm/budgets.py).

research doctor

research --help
research --version          # print package version
research doctor             # environment readiness checks (Rich table)
research doctor --json      # same report as machine-readable JSON

research doctor is the canonical wiring check. It verifies:

  • Python ≥ 3.12 and the .env files that were loaded.
  • Every key in EXPECTED_ENV_KEYS (presence + masked tail).
  • OPENROUTER_API_KEY shape (sk-or- prefix).
  • LM Studio reachability at LMSTUDIO_BASE_URL (optional check, never required).
  • data/ and jobs/ exist and are writable.
  • SQLite WAL mode is selectable.
  • config/models.yaml parses.

Required failures exit non-zero (safe to wire into CI as a pre-flight gate). Optional skips (LM Studio unreachable, optional env keys missing) never affect the exit code.

Walk-through

End-to-end: from clean repo to a finished report.

# 1. Verify the stack.
research doctor
# All required checks should be green. LM Studio "skip" is fine if you're
# only running cloud tiers; "fail" on OPENROUTER_API_KEY is not.

# 2. Start a job. The daemon runs detached; control returns immediately.
research start --skip-intake \
    --goal "Compare Pydantic AI, LangGraph, and CrewAI" \
    --budget-usd 5.00 \
    --time-cap 24 \
    --disk-cap-gb 10 \
    --inbox
# → Started job 2026-05-02-compare-pydantic-ai- (daemon pid 12345).
#   Tail logs with: research logs 2026-05-02-compare-pydantic-ai- -f

# 3. See what's running.
research list                          # newest first

# 4. Watch progress live.
JOB=$(research list --json | jq -r '.[0].id')
research status "$JOB" --watch         # Rich panel, refreshes every 2s

# 5. Tail events as they fire.
research logs "$JOB" -f

# 6. Read the report when synthesis lands (auto-rewrites as it iterates).
research view "$JOB" --report          # opens $EDITOR on a TTY

# 7. Stop early if you want — graceful by default.
research stop "$JOB"                   # daemon finishes current task, then synthesizes
research stop "$JOB" --kill            # hard SIGTERM/SIGKILL escalation

# 8. Resume from the last checkpoint after a crash or a clean stop.
research resume "$JOB"

For long unattended runs, see macOS hygiene below.

Directory layout

Per-job folder (jobs/<job-id>/)

Every job is a self-contained folder. The cross-job DB only mirrors metadata for fast queries — the folder is the source of truth. The field-level contract is documented in docs/JOB_FOLDER_CONTRACT.md.

jobs/<job-id>/
├── job.json              # canonical metadata (id, goal, status, timestamps)
├── intake.json           # frozen intake answers
├── goal.md               # human-readable goal + scope
├── plan/                 # planner state (versioned)
├── findings/             # findings/NNNNNN.md (zero-padded, monotonic)
├── sources/              # symlinks/copies of canonical source markdown
├── synthesis/            # synthesis/NNNN.md (versioned)
├── fragments/            # fragments/<section>/NNNN.md (versioned)
├── critique/             # critique/NNNN.md (versioned)
├── artifacts/            # structured outputs such as CSV tables
├── coverage.json         # enumeration coverage ledger, when required
├── inbox/                # optional human-supplied documents for live ingest
│   └── processed/        # ingested inbox files, renamed with content hash
├── report.md             # current report (rotated to report.history/ on rewrite)
├── report.history/       # archived prior reports
├── events.jsonl          # append-only event log
├── INBOX_REPLAN.json     # transient replan trigger after inbox ingest
├── daemon.pid            # written on spawn, removed on clean exit
├── daemon.out.log        # daemon stdout
├── daemon.err.log        # daemon stderr
└── STOP                  # presence signals graceful stop request

Job IDs are deterministic: YYYY-MM-DD-<slug> derived from the intake goal. All on-disk writes go through atomic *.tmp + os.replace so a crashed process never leaves half-written sidecars.

Cross-job state (data/)

data/
├── index.sqlite          # WAL-mode; jobs, findings, sources, llm_calls, FTS5, embeddings
├── index.sqlite-wal
├── index.sqlite-shm
└── llm_cache.sqlite      # LLM response cache (separate file for safe wipe)

research config cache-clear wipes llm_cache.sqlite (and its -wal/ -shm sidecars) without touching index.sqlite.

Gitignored regenerable dirs

jobs/, runs/, data/, logs/, sessions/, .alpha-loop/, .venv/. Lockfiles (uv.lock) are committed.

CLI surface

Job verbs

research start --skip-intake --goal "<goal>" \
    [--budget-usd 5.0] [--time-cap 24] [--corpus path/to/notes] \
    [--corpus-dossier] \
    [--disk-cap-gb 10] [--translate-non-english] [--fragments] [--inbox]

research list                      # newest first; Rich on a TTY, JSON otherwise
research list --json
research list --status running

research status <job-id>           # detailed Rich panel
research status <job-id> --watch   # refresh every 2s

research view <job-id>             # report.md in $EDITOR (or stdout off-TTY)
research view <job-id> --report
research view <job-id> --findings  # latest findings/NNNNNN.md
research view <job-id> --sources
research view <job-id> --hypotheses

research logs <job-id>             # print existing events.jsonl entries
research logs <job-id> -f          # follow appended events
research logs <job-id> --level ERROR

research stop <job-id>             # graceful: drop STOP flag
research stop <job-id> --kill      # SIGTERM, then SIGKILL after 10s

research resume <job-id>           # respawn daemon, restore from checkpoint
research resume <job-id> --force   # resume even when completed/failed

research inbox <job-id> add <file>  # copy a human-supplied doc into the live inbox
research inbox <job-id> list        # show pending and processed inbox files
# Registered verb paths: research inbox add, research inbox list.

research search "<query>"          # hybrid FTS5 + semantic (cross-job)
research search "<query>" --fts-only
research search "<query>" --job <job-id>
research search "<query>" --kind findings
research search "<query>" --kind sources
research search "<query>" --json

research export <job-id> --zip
research export <job-id> --md-bundle
research export <job-id> --csv <artifact-name>
research export <job-id> --zip --out PATH
research export <job-id> --md-bundle --include-history

research compare <ref-a> <ref-b>           # delta table (counts, departments, hosts)
research compare <ref-a> <ref-b> --json    # machine-readable deltas
research compare <ref-a> <ref-b> --side-by-side   # unified diff via $PAGER

research start runs interactive intake (or accepts --skip-intake --goal "..." as a non-interactive testing back door), creates the job folder + DB row, and spawns a detached daemon via subprocess.Popen(start_new_session=True). The PID is written atomically to jobs/<id>/daemon.pid; the daemon's stdout/stderr land in jobs/<id>/daemon.{out,err}.log.

--inbox enables a live jobs/<id>/inbox/ watcher for the daemon. Files added with research inbox <job-id> add <file> are indexed into the local corpus, moved to inbox/processed/, logged as corpus_doc_added, and used to trigger a tactical replan while the daemon is still running.

--corpus-dossier (epic #359) opts the job into per-page dossier-mode ingestion. With the flag set, local-corpus indexing runs in per-page mode when the daemon indexes operator-supplied files; the current live --inbox watcher already forwards the flag, and the corpus-walk startup path will use the same intake field when it lands. In dossier mode PDFs go through pdf.extract_pages_sync() and write one Source row per page, with metadata.parent_file / metadata.page_no / metadata.page_chunk stamped on the sidecar. The dossier rollup (filed in M2 of the epic) uses this grouping to produce one dossier per file plus a structured dossiers artifact; without dossier mode the existing thematic-sampling synthesis stays the default. The flag requires --corpus to be set (otherwise research start exits with code 2).

--input-csv PATH --artifact NAME --key COL imports an existing CSV into jobs/<id>/artifacts/ before the daemon starts so a run can enrich missing cells instead of creating a list from scratch. Repeat --key or pass comma-separated key columns. Use --target-column COL to restrict enrichment to specific columns, repeating it or passing comma-separated names as needed. Existing non-empty cells are preserved by default; --update-existing records overwrite intent in intake metadata.

--translate-non-english is off by default. When enabled, extracted findings whose source metadata is non-English get a findings/NNNNNN.translation.md English mirror via frontier_speed; the original finding remains unchanged. Plan YAML can also opt in per task with payload.translate_non_english: true. See docs/CONFIG.md for the per-job knob and budget behavior.

--fragments enables experimental section-fragment synthesis for a job by setting RESEARCH_FRAGMENT_SYNTH=1 for the spawned daemon and recording "fragments": true in intake.json. Leaving it unset keeps the legacy whole-report synthesis path.

research search defaults to a hybrid pass: FTS5 on findings_fts / sources_fts plus semantic cosine over embeddings blobs, deduped and fused via reciprocal-rank fusion (k=60). Pass --fts-only for a keyword-only escape hatch (useful when LM Studio is offline or for debugging FTS5 syntax).

research export bundles a job for sharing. --zip walks jobs/<job-id>/ into a ZIP_DEFLATED archive; --md-bundle concatenates intake, report.md, every finding, and the source list into one navigable markdown file; --csv <artifact-name> exports a structured table artifact from jobs/<job-id>/artifacts/ with stable column order and empty cells for missing optional values. Exactly one mode flag is required. When structured artifacts exist, synthesis links them from the report instead of relying on prose-only list output.

research compare diffs two runs by counts (tasks, findings, sources, plan versions, drain-replans, cornerstone hits), department coverage, and source-host frequency. Each <ref> is either a live job id or a path to a report.md (works on archived copies under jobs/<id>/archive/ even after the job's DB rows are gone). Re-running research start against the same goal auto-archives the prior report.md into that folder so this command always has something to compare against; pass --fresh-reset to opt back into the legacy "fail on collision" behavior.

Config verbs

research config cache-clear        # wipe data/llm_cache.sqlite

The LLM response cache lives in its own SQLite file, keyed on (provider, model, prompt, sampling-params, tool-defs) with a 30-day default TTL. The router opts in per call (cache=True) — deterministic extractions opt in, exploratory synthesis opts out.

Hidden smoke verbs

_smoke-llm and _smoke-tool are operator/CI helpers, hidden from --help but stable enough to script against. See Troubleshooting for usage.

Costs

Local LM Studio inference is free at the wallet (the cost is the GPU/CPU time on your laptop). All dollar spend goes through OpenRouter via the frontier, frontier_alt, and frontier_speed tiers.

Realistic per-run dollar ranges

The default cap on research start is --budget-usd 5.00. Typical spend for a single run, depending on goal scope and aggressiveness:

Run shape Typical spend Notes
Quick recon (≤ 1 hr, ~50 tasks) $0.10 – $1.00 Mostly local; one or two cloud syntheses.
Half-day investigation (~4 hr) $1 – $5 Several synth + critique passes; cap defaults handle this.
24-hour soak (Phase 6 fixture) $5 – $25 Set --budget-usd 25.00 for the full soak per tests/integration/test_phase6_soak_24h.md.

The exact ratio depends on how often the planner triggers cloud calls, which models actually serve them (Opus is ~25× the price of Haiku per output token), and whether the LLM cache returns hits.

What triggers cloud calls

In rough order of frequency:

  • Synthesis passesfrontier for major checkpoints, frontier_speed as a budget-aware fallback if frontier would tip over the cap.
  • Critiquefrontier_alt is preferred so the synthesizer and critic disagree productively.
  • Adaptive intake follow-upsfrontier_speed for short clarifying turns during interactive intake.
  • Local-tier fallbacks — when an LM Studio tier times out or returns a RateLimitError, the router routes to the tier's fallback_tier. This is the only path where a "local-looking" task quietly costs money; watch for tier_fallback events in events.jsonl if your spend looks high.

How the budget cap behaves

The cap is enforced in src/research_agent/llm/budgets.py at the OpenRouter wrapper — every cloud call passes through it; no direct OpenRouter clients exist elsewhere.

  • Soft warning at 90 %. A single WARNING log fires the first time spend crosses 90 % of the cap. Use it as your "wrap up" signal if watching live.
  • Hard stop at 100 %. BudgetTracker.precheck() raises BudgetExceeded before the next cloud call ships. The loop catches it, emits cap_hit, and triggers a final-pass synthesis on the cheaper frontier_speed tier so the user gets a report. If even frontier_speed would blow the cap, a template stub is rendered from on-disk findings without any LLM call (per issue #39).
  • State survives restarts. BudgetTracker re-hydrates spent from jobs.cost_so_far_usd at construction, so a daemon restart picks up the same running total.
  • Local tiers are free. cost_usd = 0.0 for local rows in llm_calls; only OpenRouter tiers are priced. Pricing is read from the pricing: block in config/models.yaml (manually maintained).

Compared to other research tools

The questions you're going to ask (and that issue reporters will ask): "why not LangGraph / CrewAI / Gemini Deep Research / Perplexity?" Honest answer is "because they're solving slightly different problems" — laid out so you can pick the right tool for what you're doing:

muckwire LangGraph Gemini Deep Research Perplexity Pro
Runs locally ⚠️ framework — you provide infra
$0 default cost depends ❌ subscription
Source attribution + Wayback archival manual ⚠️ partial
Multi-hour unattended runs manual
CLI lifecycle (start / stop / resume)
Per-job folder + cross-job index
Cost cap enforcement manual n/a
Pre-built agentic primitives ❌ build-your-own
Conversational UX

If you want a chat box that answers a single question, use Perplexity. If you want to wire a graph by hand, use LangGraph. If you want to hand a CLI a goal and walk away with a defensible report eight hours later, that's the niche this project is in.

Roadmap / known limitations

The 🚧 callout above is honest, but vague. Specifics, in priority order:

  • Single-machine only. v1 ships single-host; no multi-machine / distributed coordination is on the roadmap. A "control plane" would be a separate project.
  • No web UI. CLI-first is intentional — operators run this over SSH on a workstation overnight. A read-only status dashboard might land later but is not committed.
  • macOS / Linux only. LM Studio runs on macOS (MLX) or Linux/Windows (CUDA), but the daemon's lifecycle code (launchd plist, caffeinate guidance, softwareupdate) is only documented for macOS. Linux works but needs equivalent systemd-inhibit wiring; Windows is untested.
  • Cost model drifts. OpenRouter pricing in config/models.yaml is manually maintained — provider price changes don't auto-sync. Run research doctor --json and compare against the OpenRouter dashboard if your spend looks off.
  • Connector buildout is in flight. Issue #107 is the connector epic; #158 is the active smoke-fix sweep against the post-#107 bug list (anonymous-tier breakages, request-shape regressions, env-var skip behavior). Expect intermittent smoke skips on niche connectors until that closes.
  • Synthesizer can close subgoals too aggressively. Tracked at #159 — broad-scope subgoals sometimes terminate overnight runs early. Workaround: phrase the goal narrowly, or run with a longer --time-cap.
  • completion_reason mislabel. Tracked at #160 — clean subgoal closes are labeled user_stopped rather than goal_complete. Cosmetic but misleading in research status output.

Bug reports are how this list shrinks — open an issue if you hit something not on it.

macOS hygiene

A 24-hour soak on a laptop has three failure modes that aren't bugs in the daemon: idle sleep, OS auto-reboot, and you closing the lid in the wrong way. Address them once, up front.

Prevent idle sleep — caffeinate -i -w <pid>

After research start returns, capture the daemon PID and tie a caffeinate to it from a second terminal:

DAEMON_PID=$(cat jobs/<job-id>/daemon.pid)
caffeinate -i -w "$DAEMON_PID" &
  • -i blocks idle sleep specifically (the display can still dim, which is fine — the soak doesn't need pixels).
  • -w <pid> ties caffeinate's lifetime to the daemon. When the daemon exits — graceful stop, kill, or crash — caffeinate auto-exits with it, so there's no orphan process holding the system awake.

Long activity gaps in events.jsonl are usually idle-sleep, not a bug in the daemon. On non-macOS hosts, use the equivalent for your OS (e.g. Linux systemd-inhibit --what=idle).

Disable auto-reboot for system updates

macOS will silently install and reboot for security updates by default. A reboot mid-soak loses the daemon and leaves a stale PID file behind. Either:

  • GUI: System Settings → General → Software Update → Automatic Updates → toggle off "Install macOS updates" and "Install Security Responses and system files".
  • CLI:
    sudo softwareupdate --schedule off

Re-enable after the soak finishes if you want the OS to keep itself patched.

Optional launchd plist for auto-resume on boot

If a reboot does happen (power loss, manual restart), you can have the most recent running job auto-resume. Drop a launch agent at ~/Library/LaunchAgents/com.alpha.research.resume.plist:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.muckwire.resume</string>
  <key>ProgramArguments</key>
  <array>
    <string>/bin/bash</string>
    <string>-lc</string>
    <string>cd /path/to/muckwire &amp;&amp; \
            JOB=$(./.venv/bin/research list --json --status running 2>/dev/null \
                  | /usr/bin/python3 -c 'import json,sys; jobs=json.load(sys.stdin); print(jobs[0]["id"]) if jobs else None') &amp;&amp; \
            [ -n "$JOB" ] &amp;&amp; ./.venv/bin/research resume "$JOB"</string>
  </array>
  <key>RunAtLoad</key>
  <true/>
  <key>StandardOutPath</key>
  <string>/tmp/research-resume.out.log</string>
  <key>StandardErrorPath</key>
  <string>/tmp/research-resume.err.log</string>
</dict>
</plist>

Load it once: launchctl load ~/Library/LaunchAgents/com.muckwire.resume.plist. Edit /path/to/muckwire to your checkout. Inspect /tmp/research-resume.{out,err}.log if a boot doesn't pick up the job you expected.

Troubleshooting

research doctor failures

Failure What to do
python: fail Install Python 3.12+ (brew install python@3.12) and rebuild the venv.
env:OPENROUTER_API_KEY: missing (required) Add the key to .env. Restart any open shell so the new value is picked up.
openrouter_key_shape: fail Key doesn't start with sk-or- — copy it again from the OpenRouter dashboard.
lm_studio: skip ... not reachable Optional, but local tiers won't work. Start LM Studio, click Server → Start, confirm port 1234.
writable_dirs: fail data/ or jobs/ permissions issue. mkdir -p data jobs && chmod u+rwx data jobs.
sqlite_wal: fail Stdlib SQLite is too old or the temp dir is read-only. Re-run on a writable partition.
models_yaml: fail config/models.yaml was edited and no longer parses. git diff config/models.yaml to inspect.

When in doubt: research doctor --json | jq . for a structured view that omits the Rich formatting.

Smoke commands

Two hidden verbs verify the LLM stack and tool registry without spinning up a job:

# Single structured-output call against one tier in config/models.yaml.
research _smoke-llm fast "Say hello"
research _smoke-llm general "Say hello"
research _smoke-llm reasoner "Say hello"
research _smoke-llm frontier "Say hello"
research _smoke-llm frontier_alt "Say hello"
research _smoke-llm frontier_speed "Say hello"

# Skipped (exit 0) without --image; reports `output: skipped: vision: no image provided`.
research _smoke-llm vision "Describe this" --image path/to/page.png

# Bypasses Pydantic AI; hits /embeddings directly. Reports `output: dim=<N>`.
research _smoke-llm embeddings "vector me"

# Tool registry probes (Phase 3 connectors).
research _smoke-tool web_search "muckraker investigative journalism"
research _smoke-tool web_fetch "https://example.com/article"
research _smoke-tool arxiv "transformer interpretability"
research _smoke-tool news "federal reserve"
research _smoke-tool trove_search "White Australia Policy"
research _smoke-tool commons_search "Algerian war photographs"

web_fetch prints the resolved title, the path that served the fetch (httpx vs playwright), HTTP status, word count, the Wayback archive URL (when Save Page Now completed in time), and the first 200 characters of cleaned text. A missing archive_url is not a fetch failure — Wayback archival is fire-and-forget.

Where to read events

  • research logs <job-id> -f — formatted tail of events.jsonl (level filter via --level ERROR).
  • jobs/<job-id>/events.jsonl — raw append-only JSON, one event per line. jq is your friend (jq 'select(.level=="ERROR")' events.jsonl).
  • jobs/<job-id>/daemon.err.log — daemon stderr (uncaught exceptions, process-level errors that didn't make it to events.jsonl).
  • data/index.sqlite — cross-job mirror of events / findings / sources / llm_calls. Open with sqlite3 data/index.sqlite for ad-hoc queries.

Disk cap

Each job has a per-job disk cap (default 10 GB, override with --disk-cap-gb). The daemon polls jobs/<id>/ every 5 minutes; when total usage exceeds the cap, it scores every linked source by 5 * findings_usage + 1 * fts_title_hits − 0.1 * age_days and prunes the lowest-scored 10 % until usage drops below 90 % of the cap. A single WARN/warning event marks the cap crossing; one INFO/source_pruned event fires per file removed. Pruned ≠ banned: the sources row stays in the cross-job index with md_path = NULL, and a future fetch with the same sha256 transparently re-creates the file under the current job.

End-to-end testing

The Phase 4 "done when" gate is exercised manually — too heavy and too cost-bearing for CI. See tests/integration/test_phase4_e2e.md for the playbook (canonical fixture goal, driver script, AC verification commands, and a triage table for common failures).

The Phase 5 (4-hour daemon-lifecycle soak) and Phase 6 (24-hour real-goal soak) gates have their own playbooks alongside it: tests/integration/test_phase5_lifecycle.md and tests/integration/test_phase6_soak_24h.md. Phase 6 also captures its results in tests/integration/soak_24h_postmortem.template.md (copy the .template.md to a dated file for your specific run rather than overwriting the template).

About

Autonomous research agent that runs locally on a Mac via LM Studio. Plans, fetches, synthesizes, and cites — overnight, $0, with real source attribution. Plug in OpenRouter for cloud-grade synthesis when needed.

Topics

Resources

License

Contributing

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors