This document captures the critical technical design details that the implementation should follow.
It complements PLAN.md, which defines the workflow and agent boundaries. This file focuses on concrete design contracts, data shapes, initialization behavior, persistence rules, and the controlled code-execution model.
The system starts each run from three durable external inputs:
- The live JWKS Catalog YAML: https://raw.githubusercontent.com/UnitVectorY-Labs/jwks-catalog/refs/heads/main/data/services.yaml
- The persisted local `candidates.yaml` from previous runs.
- The Cloudflare top domains API data for the run.
These inputs serve different roles:
- The live catalog is the official exclusion set.
- `candidates.yaml` is the provisional exclusion set.
- Cloudflare data is the search-space seed.
The initialization stage must run before planning or investigation.
Initialization responsibilities:
- Create the run record and artifact directories.
- Download the current `services.yaml`.
- Parse and normalize catalog entries into SQLite.
- Load local `candidates.yaml`.
- Parse and normalize candidate entries into SQLite.
- Build the current known-issuer exclusion view.
- Record an initialization summary for the rest of the run.
The rest of the workflow must assume that this has already happened.
The live JWKS Catalog is not schema-perfect, so the importer must normalize field variations instead of assuming one exact YAML key spelling.
At minimum, the importer must tolerate these keys:
- `openid-configuration`
- `openid_configuration`
- `open_id_configuration`
- `jwks_uri`
- `aliases`
The importer should produce a normalized row model such as:
- `source`: `official_catalog`
- `service_id`
- `name`
- `issuer_hint`
- `openid_configuration_url`
- `jwks_uri`
- `aliases`
- `catalog_snapshot_id`
Important rule:
- The catalog import is an exclusion source, not a target for direct automated modification in the first version.
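The key tolerance described above can be sketched as a small normalization function. This is a minimal sketch, not the importer itself: the helper names, the assumption that the catalog identifies a service with an `id` key, and the rule that derives `issuer_hint` by stripping the well-known suffix are all illustrative assumptions.

```python
from typing import Any, Optional

# Spellings of the discovery-URL key that the importer must tolerate.
OPENID_CONFIG_KEYS = ("openid-configuration", "openid_configuration", "open_id_configuration")
WELL_KNOWN_SUFFIX = "/.well-known/openid-configuration"


def first_present(entry: dict[str, Any], keys: tuple[str, ...]) -> Optional[str]:
    """Return the first non-empty string stored under any of the given keys."""
    for key in keys:
        value = entry.get(key)
        if isinstance(value, str) and value.strip():
            return value.strip()
    return None


def normalize_catalog_entry(entry: dict[str, Any], snapshot_id: str) -> dict[str, Any]:
    """Map one raw catalog entry onto the normalized row model."""
    openid_url = first_present(entry, OPENID_CONFIG_KEYS)
    # Assumption: the issuer hint is the discovery URL minus the well-known suffix.
    issuer_hint = None
    if openid_url and openid_url.endswith(WELL_KNOWN_SUFFIX):
        issuer_hint = openid_url[: -len(WELL_KNOWN_SUFFIX)]
    aliases = entry.get("aliases") or []
    if isinstance(aliases, str):  # tolerate a scalar where a list is expected
        aliases = [aliases]
    return {
        "source": "official_catalog",
        "service_id": entry.get("id"),  # assumption: the catalog key is `id`
        "name": entry.get("name"),
        "issuer_hint": issuer_hint,
        "openid_configuration_url": openid_url,
        "jwks_uri": first_present(entry, ("jwks_uri",)),
        "aliases": list(aliases),
        "catalog_snapshot_id": snapshot_id,
    }
```

Resolving the key variants in one place keeps every downstream table and query free of spelling-specific logic.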
The local candidates.yaml represents previously proposed but not yet officially promoted findings.
Those entries must be treated as if they are already known for the purpose of future runs.
That means:
- do not re-propose them as new
- do not re-investigate them unless explicitly marked stale or invalid
- do allow them to be revalidated by targeted maintenance logic in a later version
The importer should store them separately from official catalog entries, but expose a combined known set to planning and review.
Recommended normalized fields:
- `source`: `candidate_file`
- `candidate_id`
- `name`
- `issuer`
- `openid_configuration_url`
- `jwks_uri`
- `aliases`
- `status`
- `candidate_snapshot_id`
The system should maintain a logical known set composed of:
- all imported official catalog entries
- all imported candidate entries that are still active
This known set should be queryable by:
- issuer
- openid configuration URL
- jwks URI
- primary domain
- alternate domains
This is the core exclusion view used by planning, investigation, and candidate review.
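One way to expose the combined known set is a SQLite view over the two import tables. The sketch below is illustrative only: the column subsets, the view name `known_set`, and the status value `active` are assumptions, and the domain-based lookups are left out for brevity.

```python
import sqlite3

SCHEMA = """
CREATE TABLE catalog_entries (
    service_id TEXT, name TEXT, issuer_hint TEXT,
    openid_configuration_url TEXT, jwks_uri TEXT
);
CREATE TABLE candidate_entries (
    candidate_id TEXT, name TEXT, issuer TEXT,
    openid_configuration_url TEXT, jwks_uri TEXT, status TEXT
);
-- Official entries and still-active candidates are stored separately
-- but queried through one combined view.
CREATE VIEW known_set AS
    SELECT 'official_catalog' AS source, service_id AS entry_id,
           issuer_hint AS issuer, openid_configuration_url, jwks_uri
    FROM catalog_entries
    UNION ALL
    SELECT 'candidate_file', candidate_id, issuer, openid_configuration_url, jwks_uri
    FROM candidate_entries
    WHERE status = 'active';
"""


def is_known(conn, issuer=None, openid_configuration_url=None, jwks_uri=None) -> bool:
    """Return True if any supplied identifier matches the known set."""
    # A parameter left as None compares as NULL in SQL and therefore never matches.
    row = conn.execute(
        "SELECT 1 FROM known_set "
        "WHERE issuer = ? OR openid_configuration_url = ? OR jwks_uri = ? LIMIT 1",
        (issuer, openid_configuration_url, jwks_uri),
    ).fetchone()
    return row is not None
```

Planning and review can then call a single lookup without caring which source an entry came from.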
SQLite should store:
- run metadata
- normalized catalog and candidate imports
- investigation batches
- summarized domain observations
- probe classifications
- issuer clusters
- candidate decisions
- report indexes
SQLite should not store:
- every raw HTTP response body
- every raw crawl artifact
- arbitrary generated Python code output as unstructured blobs unless specifically retained as an artifact reference
Files should store:
- imported YAML snapshots
- generated `candidates.yaml`
- run reports
- retained positive evidence
- retained ambiguous evidence
- bounded investigation artifacts
Recommended layout:
state/
oidc-hunter.db
candidates.yaml
reports/
artifacts/
runs/<run_id>/
imports/
investigation/
probes/
retained/
The database should answer these operational questions efficiently:
- Has this domain already been investigated?
- Has this issuer already been seen?
- Is this candidate already in the official catalog?
- Is this candidate already in `candidates.yaml`?
- Which tactics have historically produced good candidates?
- Which domains or patterns were already exhausted?
Recommended table groups:
- `runs`
- `catalog_snapshots`
- `catalog_entries`
- `candidate_snapshots`
- `candidate_entries`
- `strategy_tactics`
- `run_plans`
- `investigation_batches`
- `domain_state`
- `probe_summary`
- `issuer_cluster`
- `candidate_decision`
- `lessons_learned`
The system must not dump every crawl result into SQLite.
The right compromise is:
- keep a normalized `domain_state` or equivalent table
- record only the latest meaningful classification and selected metadata
- retain richer artifacts only for positives, ambiguous cases, or cases that affected planning
Recommended domain_state fields:
- `domain`
- `discovered_by_tactic`
- `first_seen_run_id`
- `last_seen_run_id`
- `last_probe_status`
- `last_probe_classification`
- `last_openid_configuration_url`
- `last_issuer`
- `last_jwks_uri`
- `needs_followup`
- `artifact_ref`
This keeps the database useful without making it a raw crawl log dump.
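The "latest meaningful classification only" rule maps naturally onto one row per domain that is updated in place. The following is a sketch under assumptions: the column types, the primary key choice, and the `record_probe` helper are illustrative, and the upsert syntax requires SQLite 3.24 or newer.

```python
import sqlite3

DOMAIN_STATE_DDL = """
CREATE TABLE IF NOT EXISTS domain_state (
    domain TEXT PRIMARY KEY,
    discovered_by_tactic TEXT,
    first_seen_run_id TEXT NOT NULL,
    last_seen_run_id TEXT NOT NULL,
    last_probe_status TEXT,
    last_probe_classification TEXT,
    last_openid_configuration_url TEXT,
    last_issuer TEXT,
    last_jwks_uri TEXT,
    needs_followup INTEGER NOT NULL DEFAULT 0,
    artifact_ref TEXT
)
"""

# Insert on first sight; afterwards overwrite only the "last_*" columns so the
# table holds one summarized row per domain instead of a crawl log.
UPSERT = """
INSERT INTO domain_state (
    domain, discovered_by_tactic, first_seen_run_id, last_seen_run_id,
    last_probe_status, last_probe_classification
) VALUES (:domain, :tactic, :run_id, :run_id, :status, :classification)
ON CONFLICT(domain) DO UPDATE SET
    last_seen_run_id = excluded.last_seen_run_id,
    last_probe_status = excluded.last_probe_status,
    last_probe_classification = excluded.last_probe_classification
"""


def record_probe(conn, domain, tactic, run_id, status, classification):
    """Store the latest summarized probe result for a domain."""
    conn.execute(
        UPSERT,
        {
            "domain": domain,
            "tactic": tactic,
            "run_id": run_id,
            "status": status,
            "classification": classification,
        },
    )
```

Because `first_seen_run_id` and `discovered_by_tactic` are never overwritten, the row still records how the domain was originally found.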
The most important change from the previous Go implementation is that discovery should not be reduced to a fixed deterministic prefix crawler.
The system should instead use a hybrid model:
- The planner chooses tactics and budgets.
- The investigator writes bounded Python code to explore those tactics.
- Deterministic tools probe and classify the resulting targets.
- The review loop decides whether findings should be retained.
This is the core reimagining of the crawler as an agentic workflow.
Only one agent should be allowed to write and execute Python code: the InvestigatorAgent.
That code-execution tool should be narrow and opinionated.
It should:
- accept structured task input
- run in a restricted workspace
- expose helper utilities for common operations
- emit structured output
- refuse direct arbitrary persistence into the durable database
Suggested helper capabilities available inside the sandbox:
- read compact plan input
- read compact historical summaries
- emit target domain lists
- emit notes and hypotheses
- call a constrained HTTP helper if needed
- write run-scoped artifacts
Suggested restrictions:
- time limit per execution
- memory limit per execution
- no unrestricted shell access
- no arbitrary write access outside the run artifact directory
- no direct network access beyond configured allowlists if feasible
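A rough sketch of the narrowest part of such a tool, the time-limited execution with structured output, is shown below. It is only an assumption about shape: `run_bounded` and its result keys are hypothetical, and a child interpreter with a timeout is not a sandbox by itself. Memory limits, filesystem confinement, and network allowlists need OS-level or container-level controls that this sketch does not attempt.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path


def run_bounded(code: str, workdir: Path, timeout_s: float = 10.0) -> dict:
    """Run investigator code in a child interpreter and return structured output."""
    script = workdir / "task.py"
    script.write_text(code, encoding="utf-8")
    try:
        proc = subprocess.run(
            # -I runs Python in isolated mode (no user site-packages, no PYTHON* env vars).
            [sys.executable, "-I", str(script)],
            cwd=workdir,
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed when the time limit is exceeded
        )
    except subprocess.TimeoutExpired:
        return {"status": "timeout", "output": None}
    if proc.returncode != 0:
        return {"status": "error", "output": None, "stderr": proc.stderr[-2000:]}
    try:
        # The contract assumed here: the script prints one JSON document on stdout.
        return {"status": "ok", "output": json.loads(proc.stdout)}
    except json.JSONDecodeError:
        return {"status": "invalid_output", "output": None}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        snippet = 'import json; print(json.dumps({"targets": ["login.example.com"]}))'
        print(run_bounded(snippet, Path(tmp)))
```

Returning a status plus parsed output, rather than raw text, keeps the investigator's results structured and leaves persistence to the deterministic tools.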
The actual OIDC verification should remain deterministic.
Recommended flow:
- Investigator emits candidate targets.
- Verification tool probes:
  - `https://<domain>/.well-known/openid-configuration`
- Verification tool classifies result:
  - not found
  - timeout
  - invalid response
  - valid OIDC discovery document
- For valid responses, normalize:
  - issuer
  - jwks URI
  - host relationship
- Store only summarized results in SQLite.
This separation matters:
- agentic code performs adaptive search
- deterministic tools perform repeatable verification and persistence
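The classification step in the flow above can be written as a pure function that takes an already-fetched response, which makes it easy to test without network access. This is a sketch under assumptions: the label strings, the convention that a `None` status code means a timeout, and the three host-relationship values are illustrative. It relies only on the fact that an OIDC discovery document must contain `issuer` and `jwks_uri`.

```python
import json
from typing import Optional
from urllib.parse import urlsplit


def host_relationship(domain: str, issuer: str) -> str:
    """Describe how the issuer host relates to the probed domain."""
    probed = domain.lower()
    issuer_host = urlsplit(issuer).hostname or ""
    if issuer_host == probed:
        return "same_host"
    if issuer_host.endswith("." + probed) or probed.endswith("." + issuer_host):
        return "related_host"
    return "different_host"


def classify_probe(domain: str, status_code: Optional[int], body: Optional[str]) -> dict:
    """Classify one probe of https://<domain>/.well-known/openid-configuration."""
    if status_code is None:  # assumption: the caller passes None when the request timed out
        return {"classification": "timeout"}
    if status_code == 404:
        return {"classification": "not_found"}
    if status_code != 200:
        return {"classification": "invalid_response"}
    try:
        doc = json.loads(body or "")
    except json.JSONDecodeError:
        return {"classification": "invalid_response"}
    if not isinstance(doc, dict):
        return {"classification": "invalid_response"}
    issuer, jwks_uri = doc.get("issuer"), doc.get("jwks_uri")
    if not isinstance(issuer, str) or not isinstance(jwks_uri, str):
        return {"classification": "invalid_response"}
    return {
        "classification": "valid_oidc_discovery",
        "issuer": issuer,
        "jwks_uri": jwks_uri,
        "host_relationship": host_relationship(domain, issuer),
    }
```

Only the small dictionary returned here would need to reach SQLite; the raw response body can be discarded or kept as a run artifact.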
The candidate review stage should decide among four classes:
- `reject`
- `needs_more_evidence`
- `new_candidate`
- `alternative_domain`
Critical rules:
- If the issuer already exists in the official catalog, reject as known.
- If the issuer already exists in `candidates.yaml`, reject as known.
- If multiple domains point to the same issuer, preserve one canonical candidate and store the others as alternates.
- If the issuer-domain relationship looks suspicious, require more evidence before retaining it.
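The critical rules can be read as an ordered decision function. The sketch below is one possible ordering, not a settled design: the function name, the `retained` mapping of issuer to canonical domain, and the choice to treat the first domain seen as canonical are assumptions, and how a relationship is judged suspicious is left to the caller.

```python
def review_finding(issuer, domain, known_issuers, retained, suspicious=False):
    """Return one of the four review classes for a verified finding.

    known_issuers: issuers already in the official catalog or candidates.yaml.
    retained: dict mapping issuer -> canonical domain for findings kept in this run.
    """
    if issuer in known_issuers:
        return "reject"  # already known, officially or provisionally
    if suspicious:
        return "needs_more_evidence"
    if issuer in retained and retained[issuer] != domain:
        return "alternative_domain"  # same issuer reached through another domain
    # Assumption: the first domain seen for an issuer becomes the canonical one.
    retained.setdefault(issuer, domain)
    return "new_candidate"
```

For example, two domains that resolve to the same unknown issuer yield `new_candidate` for the first and `alternative_domain` for the second.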
The first version's primary durable output is candidates.yaml.
That file should be generated deterministically from the database after review is complete.
The update logic should:
- Load retained provisional candidates from SQLite.
- Merge them with prior still-active candidate entries.
- Remove candidates that were explicitly rejected or superseded.
- Write a canonical ordered YAML file.
Important rule:
- `candidates.yaml` should not be patched incrementally by the LLM.
- It should be rendered from normalized database state by deterministic code.
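Deterministic rendering mainly means a fixed field order and a fixed sort order, so the same database state always produces the same bytes. The sketch below illustrates that idea with the standard library only. The top-level `candidates` key, the field order, and the sort key are assumptions about a file format the design has not yet fixed. Scalars are written with `json.dumps`, since a JSON string is also a valid double-quoted YAML scalar.

```python
import json

# Assumed canonical field order for each candidate entry.
FIELD_ORDER = ("name", "issuer", "openid_configuration_url", "jwks_uri", "aliases")


def render_candidates_yaml(rows: list[dict]) -> str:
    """Render candidate rows as YAML with stable ordering."""
    if not rows:
        return "candidates: []\n"
    lines = ["candidates:"]
    ordered = sorted(rows, key=lambda r: (r.get("issuer") or "", r.get("name") or ""))
    for row in ordered:
        prefix = "  - "  # the first emitted field of an entry carries the list marker
        for field in FIELD_ORDER:
            value = row.get(field)
            if value is None or value == "" or value == []:
                continue
            if isinstance(value, list):
                lines.append(f"{prefix}{field}:")
                lines.extend(f"      - {json.dumps(item)}" for item in sorted(value))
            else:
                lines.append(f"{prefix}{field}: {json.dumps(value)}")
            prefix = "    "
    return "\n".join(lines) + "\n"
```

Because the output depends only on the row contents, re-rendering after a run with no changes produces an identical file and a clean diff.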
Cloudflare top domains are the main search seed, but they are not the direct answer.
The search problem is:
- start from high-value domains
- infer likely subdomain patterns
- test likely OIDC deployment shapes
- learn from prior outcomes
That means the planner and investigator need access to:
- domain popularity input
- prior tactic performance
- prior invalid patterns
- prior successful domain families
- current known-set exclusions
A tactic should be a first-class concept in the database and reporting.
Examples of tactic categories:
- common auth prefixes
- organization-specific naming patterns
- sector-specific conventions
- issuer reuse from related domains
- candidate expansion from previously successful domains
Recommended tactic fields:
- `tactic_id`
- `name`
- `description`
- `input_scope`
- `historical_success_rate`
- `historical_false_positive_rate`
- `last_used_run_id`
This lets the planner improve over time instead of repeating the same exploration blindly.
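As an illustration of how the tactic fields might be kept up to date, the sketch below stores raw counters and derives the two rates from them. The counter fields, the `record_run` method, and the rate definitions (accepted candidates per probed target, false positives per probed target) are assumptions; the design does not define how the rates are computed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tactic:
    tactic_id: str
    name: str
    description: str
    input_scope: str
    # Hypothetical raw counters from which the historical rates are derived.
    targets_probed: int = 0
    candidates_accepted: int = 0
    false_positives: int = 0
    last_used_run_id: Optional[str] = None

    @property
    def historical_success_rate(self) -> float:
        """Accepted candidates per probed target (assumed definition)."""
        return self.candidates_accepted / self.targets_probed if self.targets_probed else 0.0

    @property
    def historical_false_positive_rate(self) -> float:
        """False positives per probed target (assumed definition)."""
        return self.false_positives / self.targets_probed if self.targets_probed else 0.0

    def record_run(self, run_id: str, probed: int, accepted: int, false_positives: int) -> None:
        """Fold one run's outcome into the historical counters."""
        self.targets_probed += probed
        self.candidates_accepted += accepted
        self.false_positives += false_positives
        self.last_used_run_id = run_id
```

Keeping counters rather than precomputed rates lets the planner weigh a tactic with a few lucky hits differently from one with a long track record.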
The prior Go implementation established a few useful principles that should be retained:
- SQLite-backed skip logic
- concurrent deterministic probing
- prefix-driven discovery as one tactic
- persistent storage of known results
What should change:
- the system should no longer be a single-purpose batch probe utility
- the search strategy should be dynamic and tactic-driven
- candidate handling should be explicit
- initialization from the live catalog and prior candidates should be mandatory
- the agent should be able to investigate adaptively through bounded Python code
When implementation begins, prioritize in this order:
- database schema
- catalog and candidate importers
- deterministic verification tools
- candidate export logic
- bounded investigation code-execution tool
- ADK workflow wiring
That order matters because the durable operating model is more important than prompt behavior.