v0.7: Add Research Radar templates and watchlist#20

Merged
t3chn merged 1 commit into main from feat/research-radar on May 25, 2026
@t3chn t3chn commented May 25, 2026

Summary:

  • Adds a public-safe Research Radar layer for benchmark/eval methodology monitoring.
  • Adds prioritized watchlists, source map, repo/team lists, and query sets.
  • Adds daily brief, weekly synthesis, and reading queue templates.
  • Adds docs explaining cadence, public/private boundaries, action categories, and roadmap decision flow.
  • Updates README and docs index links.

Non-goals:

  • no new benchmark task family;
  • no automation scripts;
  • no scheduled jobs;
  • no web scraping;
  • no private eval material;
  • no hidden holdouts, answer keys, customer data, private rubrics, or protected scorer configs;
  • no personal/private notes;
  • no consumer product coupling.

Test plan:

  • make validate
  • make test
  • make leak-check
  • python3 -m ruff check .
  • git diff --check

@t3chn force-pushed the feat/research-radar branch from c42cc42 to e75ea50 on May 25, 2026 13:56

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c42cc42813

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread research/source-map.csv Outdated
P0,search cluster,MCP benchmark cluster,"MCP and tool-registry benchmark methods","GitHub/arXiv search: MCP benchmark, MCP-Bench, MCP-Universe",daily,medium,high,"mcp;tool-use"
P0,search cluster,Deep Research benchmark cluster,"Fixed-corpus research, citation scoring, claim checks","GitHub/arXiv search: DeepResearch Bench, LiveDRBench, DRBench, PaperBench",daily,medium,high,"research;citations"
P0,repo,AgentDojo,"Prompt-injection and tool-output trust-boundary benchmark design",https://github.com/ethz-spylab/agentdojo,daily,high,high,"security;injection"
P0,standards,NIST AI Risk Management Framework,"Evaluation governance and risk framing",https://www.nist.gov/itl/ai-risk-management-framework,weekly,high,medium,"standards;governance"

P1: Avoid secret-pattern trip in source map URL

The new NIST URL contains the substring sk-management-framework, which matches the leak checker’s secret regex sk-[A-Za-z0-9_-]{20,} in scripts/public_leak_check.py and causes make leak-check / python3 scripts/public_leak_check.py . to fail on this commit. This turns the newly added documentation row into a deterministic CI/public-release gate failure, so the URL or leak-check handling needs to be adjusted.
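
The false positive described above can be reproduced in isolation. The following is a minimal sketch, assuming only the regex quoted in the comment; the actual contents of scripts/public_leak_check.py are not shown in this thread, and `STRICT_SECRET_RE` and `FAKE_KEY` are hypothetical names introduced here for illustration. One possible adjustment (not necessarily the one the author chose) is a negative lookbehind so that `sk-` only counts when it does not continue a longer word such as "risk-".

```python
import re

# Pattern as quoted in the review comment.
SECRET_RE = re.compile(r"sk-[A-Za-z0-9_-]{20,}")

# Hypothetical stricter variant: "sk-" must not be preceded by a letter or digit,
# so the tail of "risk-management-framework" no longer qualifies.
STRICT_SECRET_RE = re.compile(r"(?<![A-Za-z0-9])sk-[A-Za-z0-9_-]{20,}")

# URL from the new source-map.csv row.
NIST_URL = "https://www.nist.gov/itl/ai-risk-management-framework"

# Placeholder token shaped like a key (not a real secret), built by
# concatenation so this snippet itself contains no literal key-like string.
FAKE_KEY = "sk-" + "x" * 24

# The quoted pattern matches inside the URL: "management-framework" is
# exactly 20 characters from the allowed class.
hit = SECRET_RE.search(NIST_URL)
print(hit.group(0) if hit else None)

# The stricter variant skips the URL but still flags a key-shaped token.
print(STRICT_SECRET_RE.search(NIST_URL))
print(bool(STRICT_SECRET_RE.search("token=" + FAKE_KEY)))
```

The lookbehind keeps the detection rule intact for standalone tokens while excluding matches that begin mid-word. An allowlist for known-safe URLs would be an alternative design, at the cost of maintaining the list.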

@t3chn t3chn merged commit b88bcfe into main May 25, 2026
1 check passed
@t3chn t3chn deleted the feat/research-radar branch May 25, 2026 14:35