89 changes: 45 additions & 44 deletions README.md

Large diffs are not rendered by default.

14 changes: 8 additions & 6 deletions docs/API_KEYS.md
@@ -83,14 +83,16 @@ explicitly per #113).
 | Connector | Issue | Env var | Where to get it | Approx cost |
 |---|---|---|---|---|
 | Google Scholar via SERPAPI | #114 | `SERPAPI_KEY` | <https://serpapi.com/users/sign_up> | $75/mo for 5K queries (Scholar is one engine of many they offer) |
-| LinkedIn via Proxycurl (default broker) | #115 | `LINKEDIN_DATA_API_KEY` | <https://nubela.co/proxycurl/> → Sign up → API Key | $0.01–$0.05 per profile lookup |
-| LinkedIn via Lix (alternate broker) | #115 | `LIX_API_KEY` (set `LINKEDIN_BROKER=lix` to switch) | <https://lix-it.com/> → Sign up | Similar per-lookup pricing to Proxycurl |
+| LinkedIn via Proxycurl (legacy default broker) | #115/#320 | `LINKEDIN_DATA_API_KEY` | Proxycurl official pages now say the service is shut down; use only if an operator confirms legacy access | Historical broker pricing only |
+| LinkedIn via Lix (alternate broker) | #115/#320 | `LIX_API_KEY` (set `LINKEDIN_BROKER=lix` to switch) | <https://lix-it.com/> → Sign up | Paid/gated Lix credits; review current pricing and terms |

 For LinkedIn the connector is **broker-pluggable**: `LINKEDIN_BROKER`
-selects the recipe (`proxycurl` by default, or `lix`). Each broker
-reads its own key — see the rows above. Adding another broker is a
-recipe-layer change in `tools/linkedin.py`. Use whichever broker your
-wallet and TOS comfort allow.
+selects the recipe (`proxycurl` by default, or `lix`). Proxycurl is retained
+only as a legacy code path because Nubela now says it is shut down; NinjaPear
+is the successor platform but is not implemented by this connector. Each
+broker reads its own key — see the rows above. Adding another broker is a
+recipe-layer change in `tools/linkedin.py`. Use broker data only when your
+wallet, terms review, and source-provenance needs allow it.
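
The broker selection described in the paragraph above can be sketched roughly as follows. This is a minimal illustration, not the code in `tools/linkedin.py`: the `BROKER_KEY_ENV` mapping and the `resolve_broker` helper are hypothetical names introduced here. Only the `LINKEDIN_BROKER`, `LINKEDIN_DATA_API_KEY`, and `LIX_API_KEY` variables and the `proxycurl` default come from this document.

```python
import os

# Hypothetical mapping from broker name to the env var that holds its key.
# The variable names come from the table above; the mapping itself is an
# illustration, not the structure used in tools/linkedin.py.
BROKER_KEY_ENV = {
    "proxycurl": "LINKEDIN_DATA_API_KEY",  # legacy code path
    "lix": "LIX_API_KEY",
}


def resolve_broker(env=None):
    """Return (broker, api_key) for the broker selected by LINKEDIN_BROKER."""
    env = os.environ if env is None else env
    # Fall back to the documented default when the variable is unset.
    broker = env.get("LINKEDIN_BROKER", "proxycurl").strip().lower()
    if broker not in BROKER_KEY_ENV:
        raise ValueError(f"unknown LINKEDIN_BROKER: {broker!r}")
    key_var = BROKER_KEY_ENV[broker]
    api_key = env.get(key_var)
    if not api_key:
        raise RuntimeError(f"{key_var} is not set for broker {broker!r}")
    return broker, api_key
```

Under this sketch, adding another broker would mean adding one entry to the mapping plus its recipe, which is one way to read "recipe-layer change".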

---

38 changes: 38 additions & 0 deletions docs/CONFIG.md
@@ -111,3 +111,41 @@ the epic) reads `metadata.parent_file` to build one
artifact. Stage 1 in this PR series is just the ingestion plumbing —
M1.2 wires the per-page coverage ledger, and M2 fills in the
extraction / rollup tasks.

## Candidate Roster Handoff Backtest

The 2026 federal candidate-roster regression runs without live network access:

```bash
UV_CACHE_DIR=.uv-cache uv run pytest tests/test_candidate_roster_backtest.py -q
```

It covers planner handoff failures before enqueue: grouped
`state_election_search` tasks without `state`, full state names repaired to
postal abbreviations, empty FEC candidate searches repaired to
`kind=candidates_enumerate` when structured filters exist, and rejected when
they do not.
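
The repair-or-reject behavior listed above might look roughly like the sketch below. It is not the project's implementation: the function names, the `ContractRejected` exception, the partial state table, and the `FEC_FILTER_FIELDS` tuple are all hypothetical. Treating a missing `state` as a rejection is also an assumption, since the text only says the backtest covers that case.

```python
# Deliberately partial name-to-postal table; a real one would cover all states.
STATE_ABBREVIATIONS = {"california": "CA", "texas": "TX", "new york": "NY"}

# Hypothetical structured filter fields for FEC candidate searches.
FEC_FILTER_FIELDS = ("state", "office", "cycle")


class ContractRejected(ValueError):
    """Raised when a task cannot be repaired and must not be enqueued."""


def repair_state_election_task(payload):
    """Return a repaired copy of a state_election_search payload."""
    state = (payload.get("state") or "").strip()
    if not state:
        # Assumption: a missing state is rejected rather than guessed.
        raise ContractRejected("state_election_search requires `state`")
    if len(state) == 2 and state.isalpha():
        return {**payload, "state": state.upper()}
    abbreviation = STATE_ABBREVIATIONS.get(state.lower())
    if abbreviation is None:
        raise ContractRejected(f"unrecognized state: {state!r}")
    return {**payload, "state": abbreviation}


def repair_fec_candidate_task(payload):
    """Repair an empty-query FEC search to enumeration, or reject it."""
    if (payload.get("query") or "").strip():
        return dict(payload)  # non-empty query: nothing to repair
    if any(payload.get(field) for field in FEC_FILTER_FIELDS):
        return {**payload, "kind": "candidates_enumerate"}
    raise ContractRejected("empty FEC candidate search with no structured filters")
```

Both helpers return a new dict instead of mutating the input, so a caller could log the original and repaired payloads side by side.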

When LM Studio and live source access are available, run a short local smoke
from the repo root:

```bash
UV_CACHE_DIR=.uv-cache uv run research start \
  --skip-intake \
  --local \
  --max-tasks 10 \
  --goal "As of May 15, 2026, create a complete sourced state-by-state list of every U.S. House and Senate candidate in all 50 states."
```

Then inspect the job events for connector-contract repairs/rejections and
cadence diagnostics:

```bash
jq 'select(.kind=="connector_contract_rejected" or .kind=="connector_contract_repaired" or .kind=="warning" or (.kind=="checkpoint" and (.payload.checkpoint_kind=="synthesis_done" or .payload.checkpoint_kind=="critique_done")))' \
  jobs/<job-id>/events.jsonl
```

The smoke is healthy when malformed connector tasks are absent or logged as
contract repairs/rejections before dispatch, local model routing is visible in
the daemon logs, and any synthesis/critique failure appears as a `warning`
event instead of a quiet pending-task stall.
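
For environments without `jq`, the same event filter can be approximated in Python. This is a sketch under the assumption that each line of `events.jsonl` is a JSON object with a top-level `kind` and an optional `payload.checkpoint_kind`, which is the shape the `jq` expression above implies. The helper names are hypothetical.

```python
import json

# Event kinds selected unconditionally by the jq filter above.
DIRECT_KINDS = {
    "connector_contract_rejected",
    "connector_contract_repaired",
    "warning",
}
# Checkpoint sub-kinds selected by the jq filter above.
CHECKPOINT_KINDS = {"synthesis_done", "critique_done"}


def is_interesting(event):
    """Mirror the jq select(): contract events, warnings, two checkpoints."""
    kind = event.get("kind")
    if kind in DIRECT_KINDS:
        return True
    if kind == "checkpoint":
        payload = event.get("payload") or {}
        return payload.get("checkpoint_kind") in CHECKPOINT_KINDS
    return False


def select_events(lines):
    """Parse JSONL lines and keep only the events worth inspecting."""
    selected = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the log
        event = json.loads(line)
        if is_interesting(event):
            selected.append(event)
    return selected
```

Passing an open file handle for the job's `events.jsonl` to `select_events` yields the matching events in log order.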
70 changes: 70 additions & 0 deletions docs/CONNECTOR_SKILL_TEMPLATE.md
@@ -0,0 +1,70 @@
---
name: connector-short-name
description: "One-line planner routing signal for this connector."
when_to_use: "Research situations where this connector is the right first source."
when_not_to_use: "Nearby questions that should use another connector or generic web search."
---

# Connector display name

Use `<short_name>_search` for the connector's authoritative scope. Keep usage
guidance here; registry schemas enforce only the minimum payload contract.

## Official documentation

- API or site docs:
- Terms, usage policy, or robots guidance:
- Migration or maintenance notices to verify before changing code:

## Auth and cost

- Required env vars:
- Free/paid constraints:
- Anonymous fallback behavior:

## Required payload fields

- `query` - required common field.
- `sub_question` - required common field.
- Connector-specific required fields:

## Knobs available

- `kind` - valid modes and defaults.
- `max_results` - default and cap.
- Other connector-specific fields:

## Valid payload examples

```yaml
kind: <short_name>_search
payload:
  query: "example query"
  sub_question: "What should this connector prove?"
```

## Request and pagination pattern

Describe endpoint/page entry, filters, pagination, detail-page fan-out, and
rate limits. Include stable selectors only when they are reliable enough to
survive ordinary site changes.

## Failure modes

- Missing credentials:
- Rate limits/captcha/blocked access:
- Maintenance windows:
- True no-result behavior:
- Retry/backoff guidance:

## Evidence shape

Describe expected `SearchResult` fields and which connector-specific values
belong in `extras`. For fetch support, describe expected `Source.cleaned_text`
sections and `metadata` fields.

## Anti-patterns

- Do not use this connector outside its source authority.
- Do not treat third-party profile/context pages as official records unless
the connector itself is an official registry.
96 changes: 79 additions & 17 deletions src/research_agent/doctor.py
@@ -524,57 +524,118 @@ def check_task_kind_registry_coherence() -> CheckResult:
         )


+def check_registry_contract_coherence() -> CheckResult:
+    """Assert registered connector contracts are importable and planner-visible."""
+    name = "registry_contract_coherence"
+    try:
+        import importlib
+
+        import research_agent.tools  # noqa: F401 - populate the connector registry
+        from research_agent.tools._registry import iter_kinds
+
+        problems: list[str] = []
+        required_summaries: list[str] = []
+        for entry in iter_kinds():
+            fields = set(entry.payload_schema.model_fields)
+            missing_base = {"query", "sub_question"} - fields
+            if missing_base:
+                problems.append(
+                    f"{entry.name} payload schema missing base fields: {sorted(missing_base)}"
+                )
+            module = importlib.import_module(f"research_agent.tools.{entry.module_name}")
+            if not hasattr(module, "search"):
+                problems.append(f"{entry.name} module {entry.module_name} has no search()")
+            required = ", ".join(entry.required_payload_fields) or "none"
+            required_summaries.append(f"{entry.name} required={required}")
+
+        if problems:
+            return CheckResult(
+                name,
+                "fail",
+                required=True,
+                detail="; ".join(problems),
+            )
+        detail = (
+            f"{len(required_summaries)} connector contract(s) expose required fields; "
+            + "; ".join(required_summaries[:8])
+        )
+        if len(required_summaries) > 8:
+            detail += f"; +{len(required_summaries) - 8} more"
+        return CheckResult(name, "ok", required=True, detail=detail)
+    except Exception as exc:  # noqa: BLE001
+        return CheckResult(
+            name,
+            "fail",
+            required=True,
+            detail=f"coherence check raised {type(exc).__name__}: {exc}",
+        )
+
+
 def check_registry_skill_coherence() -> list[CheckResult]:
     """Assert each registered kind's skill file exists and parses.

     Issue #223: every connector PR ships a ``skills/connectors/<name>.md``
-    file (per #211/#212). Kinds with ``skill_name=None`` are grandfathered
-    from the existing-connector skills backfill — they ``skip`` rather than
-    fail. Kinds whose ``skill_name`` is set but the file is missing are a
-    hard ``fail`` (the planner would fall back to a description-only path).
+    file (per #211/#212). Issue #317 makes missing coverage a hard failure
+    unless the registry entry carries an explicit issue-linked exemption.
     """
     from research_agent.skills.loader import SkillParseError, _parse, _skills_dir
     from research_agent.tools._registry import iter_kinds

     results: list[CheckResult] = []
     for entry in iter_kinds():
         row = f"registry_skill:{entry.name}"
+        expected_name = entry.expected_skill_name
+        expected_path = _skills_dir("connectors") / f"{expected_name}.md"
         if entry.skill_name is None:
+            base_detail = (
+                f"kind={entry.name}; short_name={entry.short_name}; "
+                f"module_name={entry.module_name}; expected="
+                f"skills/connectors/{expected_name}.md"
+            )
+            if entry.skill_exemption:
+                results.append(
+                    CheckResult(
+                        row,
+                        "skip",
+                        required=False,
+                        detail=f"{base_detail}; exemption={entry.skill_exemption}",
+                    )
+                )
+                continue
             results.append(
                 CheckResult(
                     row,
-                    "skip",
-                    required=False,
+                    "fail",
+                    required=True,
                     detail=(
-                        f"{entry.name} grandfathered (skill_name=None);"
-                        " backfill pending"
+                        f"{base_detail}; missing skill_name and no documented exemption"
                     ),
                 )
             )
             continue
-        path = _skills_dir("connectors") / f"{entry.skill_name}.md"
-        if not path.exists():
+        if not expected_path.exists():
             results.append(
                 CheckResult(
                     row,
                     "fail",
                     required=True,
                     detail=(
-                        f"missing skills/connectors/{entry.skill_name}.md"
-                        f" for kind {entry.name}"
+                        f"missing skills/connectors/{expected_name}.md"
+                        f" for kind {entry.name}; short_name={entry.short_name};"
+                        f" module_name={entry.module_name}; expected={expected_path}"
                     ),
                 )
             )
             continue
         try:
-            _parse("connectors", entry.skill_name, path)
+            _parse("connectors", expected_name, expected_path)
         except SkillParseError as exc:
             results.append(
                 CheckResult(
                     row,
                     "fail",
                     required=True,
-                    detail=f"{path}: {exc}",
+                    detail=f"{expected_path}: {exc}",
                 )
             )
             continue
@@ -583,7 +644,7 @@ def check_registry_skill_coherence() -> list[CheckResult]:
                 row,
                 "ok",
                 required=True,
-                detail=f"skills/connectors/{entry.skill_name}.md parses",
+                detail=f"skills/connectors/{expected_name}.md parses",
             )
         )
     return results
@@ -602,7 +663,7 @@ def check_registry_skill_summary_coherence(
     try:
         detail_rows = rows if rows is not None else check_registry_skill_coherence()
         failures = [row for row in detail_rows if row.required and row.status == "fail"]
-        skipped = [row for row in detail_rows if row.status == "skip"]
+        exempted = [row for row in detail_rows if row.status == "skip"]
         ok_count = sum(1 for row in detail_rows if row.status == "ok")
         if failures:
             failed_names = ", ".join(row.name for row in failures)
@@ -618,7 +679,7 @@ def check_registry_skill_summary_coherence(
             required=True,
             detail=(
                 f"{ok_count} connector skill file(s) parse;"
-                f" {len(skipped)} grandfathered skip(s)"
+                f" {len(exempted)} documented exemption(s)"
             ),
         )
     except Exception as exc:  # noqa: BLE001
@@ -680,6 +741,7 @@ def run_all_checks(
     results.extend(check_sanctions_refresh())
     results.append(check_planner_allowlist_coherence())
     results.append(check_task_kind_registry_coherence())
+    results.append(check_registry_contract_coherence())
     registry_skill_rows = check_registry_skill_coherence()
     results.append(check_registry_skill_summary_coherence(registry_skill_rows))
     results.extend(registry_skill_rows)
2 changes: 2 additions & 0 deletions src/research_agent/observability/events.py
@@ -89,6 +89,8 @@
     "cornerstone_followups_emitted",
     "translation_skipped_budget",
     "second_order_fanout",
+    "connector_contract_rejected",
+    "connector_contract_repaired",
     "source_list_reconciled",
     "synth_status_from_prose",
     "synth_status_missing",