Add Go tool to extract full Numerai GraphQL schema via introspection #1
lingster wants to merge 4 commits into
Sends a standard GraphQL introspection query to https://api-tournament.numer.ai/ and renders the result as SDL (Schema Definition Language), saved to schema.graphql. https://claude.ai/code/session_017vQNLGRmNurR5LwFhooNHJ
Greptile Summary
This PR adds a self-contained Go CLI (
Confidence Score: 4/5
Safe to merge after addressing the missing HTTP status code check, which produces misleading errors on non-200 responses.
One P1 finding (no HTTP status check) means the tool can emit confusing errors when the API is unavailable or returns an error. All other findings are P2 style/robustness suggestions. The score reflects that the P1 is easy to fix and the tool otherwise works correctly for the happy path.
graphql-schema-extractor/main.go, specifically the fetchSchema function's response handling
Important Files Changed
Sequence Diagram
```mermaid
sequenceDiagram
    participant main
    participant fetchSchema
    participant Numerai API
    participant renderSchema
    participant stdout
    main->>fetchSchema: fetchSchema()
    fetchSchema->>fetchSchema: json.Marshal(introspectionQuery)
    fetchSchema->>Numerai API: POST https://api-tournament.numer.ai/
    Numerai API-->>fetchSchema: JSON introspection response
    fetchSchema->>fetchSchema: io.ReadAll(resp.Body)
    Note over fetchSchema: ⚠️ HTTP status not checked here
    fetchSchema->>fetchSchema: json.Unmarshal → introspectionSchema
    fetchSchema-->>main: *introspectionSchema
    main->>renderSchema: renderSchema(schema)
    renderSchema->>renderSchema: filter builtins, group by kind, sort
    renderSchema->>renderSchema: renderType() per SCALAR/ENUM/INTERFACE/UNION/INPUT_OBJECT/OBJECT
    renderSchema-->>main: SDL string
    main->>stdout: fmt.Print(sdl)
    main->>stdout: "Done. Types: N, Directives: N" (stderr)
```
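The renderSchema steps in the diagram (filter built-ins, group by kind, sort) could look roughly like the sketch below. The typeInfo struct, the helper names, and the sample type names are hypothetical and not taken from main.go. The built-in filter assumes that introspection types are the ones prefixed with `__` and that the five spec-defined scalars should be skipped.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// typeInfo is a hypothetical, trimmed-down stand-in for one introspected type.
type typeInfo struct {
	Kind string
	Name string
}

// kindOrder mirrors the rendering order named in the sequence diagram.
var kindOrder = []string{"SCALAR", "ENUM", "INTERFACE", "UNION", "INPUT_OBJECT", "OBJECT"}

// builtinScalars are the five scalars defined by the GraphQL specification.
var builtinScalars = map[string]bool{
	"String": true, "Int": true, "Float": true, "Boolean": true, "ID": true,
}

// isBuiltin reports whether t is an introspection type or a spec-defined scalar.
func isBuiltin(t typeInfo) bool {
	return strings.HasPrefix(t.Name, "__") || (t.Kind == "SCALAR" && builtinScalars[t.Name])
}

// groupTypes drops built-ins, buckets the remaining names by kind,
// and sorts each bucket alphabetically.
func groupTypes(types []typeInfo) map[string][]string {
	groups := make(map[string][]string)
	for _, t := range types {
		if isBuiltin(t) {
			continue
		}
		groups[t.Kind] = append(groups[t.Kind], t.Name)
	}
	for _, names := range groups {
		sort.Strings(names)
	}
	return groups
}

func main() {
	// Sample input only; these names are placeholders.
	groups := groupTypes([]typeInfo{
		{Kind: "OBJECT", Name: "Round"},
		{Kind: "SCALAR", Name: "String"},
		{Kind: "OBJECT", Name: "__Schema"},
		{Kind: "OBJECT", Name: "Account"},
		{Kind: "SCALAR", Name: "Datetime"},
	})
	for _, kind := range kindOrder {
		for _, name := range groups[kind] {
			fmt.Printf("%s %s\n", kind, name)
		}
	}
}
```

Iterating over kindOrder instead of ranging over the map keeps the output order stable between runs, which matters if schema.graphql is checked in and diffed.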
Reviews (1): Last reviewed commit: "Add Go tool to extract full Numerai Grap..."
```go
	raw, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, fmt.Errorf("reading response: %w", err)
	}

	var result introspectionResponse
	if err := json.Unmarshal(raw, &result); err != nil {
		return nil, fmt.Errorf("parsing response: %w", err)
	}

	if len(result.Errors) > 0 {
		msgs := make([]string, len(result.Errors))
		for i, e := range result.Errors {
			msgs[i] = e.Message
		}
		return nil, fmt.Errorf("graphql errors: %s", strings.Join(msgs, "; "))
	}
```
HTTP status code never checked
fetchSchema reads and parses the body regardless of the HTTP status. If the API returns a 4xx or 5xx, json.Unmarshal will receive an HTML error page or a non-introspection JSON body and emit a cryptic parse error (e.g. "invalid character '<' looking for beginning of value"), hiding the real cause. Check resp.StatusCode before parsing the body.
```suggestion
	raw, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, fmt.Errorf("reading response: %w", err)
	}
	if resp.StatusCode != http.StatusOK {
		preview := string(raw)
		if len(preview) > 200 {
			preview = preview[:200] + "..."
		}
		return nil, fmt.Errorf("unexpected HTTP %d: %s", resp.StatusCode, preview)
	}
```
```go
func fetchSchema() (*introspectionSchema, error) {
	body, err := json.Marshal(map[string]string{"query": introspectionQuery})
	if err != nil {
		return nil, err
	}

	resp, err := http.Post(endpoint, "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, fmt.Errorf("request failed: %w", err)
	}
	defer resp.Body.Close()
```
No HTTP timeout configured
http.Post uses http.DefaultClient, which has no timeout. If the Numerai endpoint is slow or hangs, the process will block forever. Prefer an explicit client with a timeout.
Also add "time" to the imports.
```suggestion
	client := &http.Client{Timeout: 30 * time.Second}
	resp, err := client.Post(endpoint, "application/json", bytes.NewReader(body))
```
```go
func descriptionBlock(desc, indent string) string {
	if desc == "" {
		return ""
	}
	escaped := strings.ReplaceAll(desc, `"""`, `\"""`)
	if !strings.Contains(escaped, "\n") {
		return fmt.Sprintf("%s\"\"\"%s\"\"\"\n", indent, escaped)
	}
	return fmt.Sprintf("%s\"\"\"\n%s%s\n%s\"\"\"\n", indent, indent, escaped, indent)
}
```
Multiline description block missing per-line indentation
For descriptions that contain embedded newlines, only the first content line receives the indent prefix — inner newlines leave subsequent lines flush with the left margin, which can produce visually inconsistent SDL (though GraphQL's block-string whitespace rules still parse it correctly).
Consider applying indent to every line of the escaped content, e.g. via strings.ReplaceAll(escaped, "\n", "\n"+indent).
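A sketch of descriptionBlock with the per-line indentation applied, based on the function body shown in the diff above. Only the `indented` step is new, and the sample description is made up.

```go
package main

import (
	"fmt"
	"strings"
)

// descriptionBlock renders desc as an SDL description, prefixing every
// content line (not just the first) with indent.
func descriptionBlock(desc, indent string) string {
	if desc == "" {
		return ""
	}
	escaped := strings.ReplaceAll(desc, `"""`, `\"""`)
	if !strings.Contains(escaped, "\n") {
		return fmt.Sprintf("%s\"\"\"%s\"\"\"\n", indent, escaped)
	}
	// Indent continuation lines so they line up with the first one.
	indented := strings.ReplaceAll(escaped, "\n", "\n"+indent)
	return fmt.Sprintf("%s\"\"\"\n%s%s\n%s\"\"\"\n", indent, indent, indented, indent)
}

func main() {
	fmt.Print(descriptionBlock("First line.\nSecond line.", "  "))
}
```

One side effect worth knowing: blank lines inside a description pick up trailing whitespace from the indent. That is cosmetic, but a stricter version could skip empty lines.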
```go
	if len(parts) == 1 {
		return "(" + parts[0] + ")"
	}
	return "(\n    " + strings.Join(parts, "\n    ") + "\n  )"
}
```
Hardcoded indentation in renderArgs
The multi-arg branch uses literal "    " (4 spaces) and "  )" (2 spaces), ignoring the caller's indentation context. When field args appear inside nested types, the closing ) may not line up with the field's own indentation. Passing an indent parameter (like descriptionBlock does) would make this consistent.
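One possible shape for an indent-aware renderArgs, as a sketch only. The signature with a []string of pre-rendered arguments, the zero-argument case, and the sample field are assumptions, since the diff shows just the tail of the real function.

```go
package main

import (
	"fmt"
	"strings"
)

// renderArgs joins pre-rendered argument strings. With more than one
// argument, each goes on its own line one level deeper than indent,
// and the closing parenthesis is aligned with indent.
func renderArgs(parts []string, indent string) string {
	switch len(parts) {
	case 0:
		return "" // assumption: a field without arguments renders no parentheses
	case 1:
		return "(" + parts[0] + ")"
	}
	inner := indent + "  "
	return "(\n" + inner + strings.Join(parts, "\n"+inner) + "\n" + indent + ")"
}

func main() {
	// Hypothetical field and argument names, for illustration only.
	field := "  rounds" + renderArgs([]string{"tournament: Int", "number: Int"}, "  ") + ": [Round]"
	fmt.Println(field)
}
```

Called with an indent of two spaces, this produces the same output as the hard-coded literals, so existing top-level fields would render unchanged.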
- graphql-schema-extractor/cmd/query/main.go: CLI tool that POSTs arbitrary GraphQL queries to api-tournament.numer.ai, supports variables JSON, optional bearer auth (flag or NUMERAI_TOKEN env), and pretty/raw output
- .claude/skills/numerai-graphql-query/SKILL.md: Claude skill with query templates, score metric glossary, and field usage notes backed by the extracted schema.graphql
https://claude.ai/code/session_017vQNLGRmNurR5LwFhooNHJ
```go
		if reason == "" {
			reason = "No longer supported"
		}
		deprecated = fmt.Sprintf(` @deprecated(reason: "%s")`, reason)
```
🟡 Deprecation reason strings are not escaped, producing invalid SDL when reasons contain double quotes or newlines
At lines 278 and 317, DeprecationReason is interpolated directly into a double-quoted GraphQL string via fmt.Sprintf(` @deprecated(reason: "%s")`, reason) without escaping ", \, or newline characters. If the upstream GraphQL server returns a deprecation reason containing a double quote (e.g., Use "v2" endpoint) or a newline, the generated SDL will be syntactically invalid because the string literal is terminated prematurely. Regular GraphQL strings require " to be escaped as \" and cannot contain raw newlines.
Prompt for agents
In graphql-schema-extractor/main.go, the deprecation reason string is inserted into a double-quoted GraphQL string without escaping at two locations: line 278 and line 317. Both use fmt.Sprintf(` @deprecated(reason: "%s")`, reason).
The fix is to escape the reason string for use inside a GraphQL double-quoted string literal before interpolation. At minimum, backslashes should be replaced with \\, double quotes with \", and newlines with \n. Consider adding a helper function like:
```go
func escapeGraphQLString(s string) string {
	s = strings.ReplaceAll(s, `\`, `\\`)
	s = strings.ReplaceAll(s, `"`, `\"`)
	s = strings.ReplaceAll(s, "\n", `\n`)
	return s
}
```
Then use it in both locations:
```go
deprecated = fmt.Sprintf(` @deprecated(reason: "%s")`, escapeGraphQLString(reason))
```
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 271eb91d85
```go
		return nil, err
	}

	resp, err := http.Post(endpoint, "application/json", bytes.NewReader(body))
```
Set an HTTP timeout for introspection request
fetchSchema uses http.Post, which relies on the default client with no timeout. If the endpoint or network path stalls (e.g., hanging TLS/proxy connection), this tool can block indefinitely and never produce output or an error, which is especially problematic in automation where schema refresh jobs need bounded runtime.
```go
		if reason == "" {
			reason = "No longer supported"
		}
		deprecated = fmt.Sprintf(` @deprecated(reason: "%s")`, reason)
```
Escape deprecation reasons before emitting SDL
The generated SDL inserts deprecation reasons directly into a quoted string. If a GraphQL deprecation reason contains ", backslashes, or newlines, the emitted @deprecated(reason: "...") becomes invalid SDL and downstream parsers will fail to load the schema. Escaping should be applied before formatting the reason string.
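A sketch of one way to apply that escaping in a single pass with strings.NewReplacer. The names graphQLStringEscaper and deprecatedDirective are hypothetical, and escaping carriage returns is an addition here, on the basis that GraphQL treats them as line terminators too.

```go
package main

import (
	"fmt"
	"strings"
)

// graphQLStringEscaper escapes the characters that cannot appear raw inside
// a double-quoted GraphQL string. A Replacer scans the input once, so the
// backslashes it inserts are never escaped a second time.
var graphQLStringEscaper = strings.NewReplacer(
	`\`, `\\`,
	`"`, `\"`,
	"\n", `\n`,
	"\r", `\r`,
)

// deprecatedDirective renders an @deprecated directive, falling back to the
// same default reason used in the code under review.
func deprecatedDirective(reason string) string {
	if reason == "" {
		reason = "No longer supported"
	}
	return fmt.Sprintf(` @deprecated(reason: "%s")`, graphQLStringEscaper.Replace(reason))
}

func main() {
	fmt.Println(deprecatedDirective(`Use "v2" endpoint`))
}
```

Routing both call sites through one helper would also keep the default reason in a single place.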
```graphql
{
  v2RoundModelPerformances(modelId: "MODEL_UUID", tournament: 8, lastNRounds: 20) {
    roundNumber roundResolved roundOpenTime
    corr corrPercentile mmc mmcPercentile tc tcPercentile
    payout atRisk
  }
}
```
🔴 SKILL.md v2RoundModelPerformances template queries fields that don't exist on V2RoundModelPerformance
The "Model performance (last N rounds)" template queries corr, corrPercentile, mmc, mmcPercentile, tc, tcPercentile, and payout on the result of v2RoundModelPerformances. According to the schema extracted by this same PR (graphql-schema-extractor/schema.graphql:2202-2230), V2RoundModelPerformance does not have any of these fields. Those fields exist on the different type RoundModelPerformance (graphql-schema-extractor/schema.graphql:1680-1734), which is returned by v3UserProfile.roundModelPerformances, not by v2RoundModelPerformances. Any agent using this template will get a GraphQL error. The same issue affects the "Model performance for a specific round" template at lines 96-97 which also queries corr mmc tc on the same non-existent type.
Prompt for agents
The SKILL.md template for 'Model performance (last N rounds)' uses v2RoundModelPerformances but requests fields (corr, corrPercentile, mmc, mmcPercentile, tc, tcPercentile, payout) that belong to the RoundModelPerformance type, not V2RoundModelPerformance. V2RoundModelPerformance exposes scores through nested submissionScores and intraRoundSubmissionScores (of type SubmissionScore with fields: displayName, value, percentile, day, etc.). The template should be rewritten to use submissionScores for score data, or alternatively use v3UserProfile with roundModelPerformances which does return the score fields directly. The same fix should be applied to the 'Model performance for a specific round' template at lines 94-101 which also incorrectly queries corr, mmc, tc on V2RoundModelPerformance.
```graphql
{
  signalsLeaderboard(limit: 10, orderBy: "corrRep", direction: "desc") {
    username rank corrRep mmcRep tcRep nmrStaked return52Weeks
```
🟡 SKILL.md Signals leaderboard template queries non-existent corrRep field
The Signals leaderboard template at line 129 selects corrRep as a field on the result of signalsLeaderboard, but SignalsLeaderboardEntry (graphql-schema-extractor/schema.graphql:1795-1874) has no corrRep field. The type has corrRank, various qualified rep fields (corr20Rep, corr60Rep, corrV4Rep, alphaRep, etc.), and a generic reputation field, but no plain corrRep. This query will produce a GraphQL error when used.
```suggestion
    username rank corr60Rep mmcRep tcRep nmrStaked return52Weeks
```
```go
	if *raw {
		os.Stdout.Write(respBody)
		return
	}
```
🟡 Raw mode (-raw) always exits 0, even when the response contains GraphQL errors
The non-raw code path (lines 105-109) explicitly checks for GraphQL errors in the response and calls os.Exit(2), but the raw mode path at lines 87-90 unconditionally returns with exit code 0. Scripts or agents that use -raw for machine-readable output and rely on the exit code to detect errors will miss GraphQL failures.
```suggestion
	if *raw {
		os.Stdout.Write(respBody)
		var rawResult map[string]any
		if err := json.Unmarshal(respBody, &rawResult); err == nil {
			if errs, ok := rawResult["errors"]; ok && errs != nil {
				os.Exit(2)
			}
		}
		return
	}
```
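If both output paths should agree on what counts as a GraphQL failure, the check could be pulled into a shared helper. This is a sketch under that assumption. The names hasGraphQLErrors and exitCode are hypothetical, and the sample response body is made up.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// hasGraphQLErrors reports whether body is a JSON object carrying a
// non-null "errors" member. Bodies that are not valid JSON return false,
// matching the suggestion's behaviour of ignoring decode failures.
func hasGraphQLErrors(body []byte) bool {
	var envelope struct {
		Errors json.RawMessage `json:"errors"`
	}
	if err := json.Unmarshal(body, &envelope); err != nil {
		return false
	}
	return len(envelope.Errors) > 0 && string(envelope.Errors) != "null"
}

// exitCode maps a response body to the process exit code the review
// describes: 2 when the response contains GraphQL errors, 0 otherwise.
func exitCode(body []byte) int {
	if hasGraphQLErrors(body) {
		return 2
	}
	return 0
}

func main() {
	respBody := []byte(`{"errors":[{"message":"example error"}]}`)
	fmt.Println(string(respBody))
	fmt.Println("exit code:", exitCode(respBody))
}
```

Decoding into json.RawMessage avoids building the whole response as a map just to test one key, which may matter for large raw responses.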
numerai-mcp-cloudflare/: a remote MCP server deployable to Cloudflare Workers using McpAgent (Durable Objects) from the `agents` package.
Tools exposed (14 total):
  graphql_query: arbitrary escape-hatch query
  list_tournaments: all active tournaments
  get_rounds: round list with status/number filters
  get_pipeline_status: scoring/data pipeline ETAs
  get_account: authenticated account info + balances
  get_account_profile: public profile + model UUIDs by username
  get_leaderboard: paginated/sorted account leaderboard
  get_model: model metadata by UUID
  get_model_performances: per-round corr/mmc/tc/payout history
  get_model_profile: public model profile with ranks and reps
  get_round_details: aggregate round stats and (optionally) all model scores
  list_datasets: dataset filenames for a round
  get_dataset_url: presigned download URL for a dataset file
  get_currency_price: NMR/USD or any pair exchange rate
Auth: NUMERAI_PUBLIC_ID and NUMERAI_SECRET_KEY set as wrangler secrets.
Deploy: npx wrangler deploy
Dev: npx wrangler dev (MCP endpoint at http://localhost:8787/mcp)
https://claude.ai/code/session_017vQNLGRmNurR5LwFhooNHJ
```graphql
  ) {
    roundNumber roundOpenTime roundResolveTime roundResolved roundPayoutFactor roundTarget
    corrMultiplier mmcMultiplier tcMultiplier
    atRisk payout
```
🔴 GraphQL query requests non-existent payout field on V2RoundModelPerformance
The get_model_performances tool queries payout on V2RoundModelPerformance, but this field does not exist on that type (see graphql-schema-extractor/schema.graphql:2202-2230). The payout: Nmr field only exists on the older RoundModelPerformance type (schema.graphql:1720), which is a different type. This will cause a GraphQL error every time the tool is invoked. The payout information is available through submissionScores { payoutPending payoutSettled } which the query already requests, so payout should simply be removed.
```suggestion
    atRisk
```
```graphql
      latestRanks { corr corrV4 corr20 corr60 mmc mmc60 tc fnc fncV4 ic icV2 alpha ric mpc }
      latestReps { corr corrV4 corr20 corr60 mmc mmc60 tc fnc fncV4 ic icV2 alpha ric mpc }
```
🔴 GraphQL query requests non-existent corr20 field on Ranks and Reps types
The get_model_profile tool queries corr20 in both latestRanks and latestReps, but neither the Ranks type (schema.graphql:916-937) nor the Reps type (schema.graphql:947-968) has a corr20 field. They have corr20V2 and corr20d instead. This causes a GraphQL error when the tool is invoked. The field name should be changed to corr20V2 or corr20d depending on which metric was intended.
```suggestion
      latestRanks { corr corrV4 corr20V2 corr60 mmc mmc60 tc fnc fncV4 ic icV2 alpha ric mpc }
      latestReps { corr corrV4 corr20V2 corr60 mmc mmc60 tc fnc fncV4 ic icV2 alpha ric mpc }
```
```typescript
`query($id: ID) {
  model(modelId: $id) {
    id name tournament archived computeEnabled
    v2Stake { stakeValue latestValue status pendingV2ChangeStakeRequest { amount type dueDate } }
```
🔴 GraphQL query requests non-existent amount field on V2ChangeStakeRequest
The get_model tool queries amount from pendingV2ChangeStakeRequest (type V2ChangeStakeRequest). This type has no amount field — the correct field name is requestedAmount (graphql-schema-extractor/schema.graphql:2025-2032). GraphQL validates queries statically against the schema, so this error occurs regardless of whether a pending stake change exists at runtime, making the get_model tool non-functional.
… tool
Each McpAgent (Durable Object) instance gets its own isolated storage.
Credentials are written to that storage at runtime rather than baked in
as Worker secrets, so different users can authenticate independently.
New tools:
authenticate(public_id, secret_key) — validates creds against the live
API then stores them in this session's DO storage; throws on bad creds
sign_out() — removes stored credentials from the session
auth_status() — shows current auth state and which account is active
Auth resolution order per request:
1. Session DO storage (set by authenticate)
2. NUMERAI_PUBLIC_ID + NUMERAI_SECRET_KEY env vars (single-user fallback)
3. Unauthenticated (public queries still work)
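The resolution order above can be sketched as a small precedence function. The PR's server is a TypeScript Worker, so this Go version only illustrates the logic. The credentials type, the resolveAuth name, and the source labels are all hypothetical.

```go
package main

import "fmt"

// credentials is a hypothetical pair mirroring public_id and secret_key.
type credentials struct {
	PublicID  string
	SecretKey string
}

// complete reports whether both halves of the credential pair are set.
func (c credentials) complete() bool {
	return c.PublicID != "" && c.SecretKey != ""
}

// resolveAuth applies the documented precedence: session storage first,
// then environment variables, otherwise unauthenticated.
func resolveAuth(session, env credentials) (credentials, string) {
	switch {
	case session.complete():
		return session, "session"
	case env.complete():
		return env, "env"
	default:
		return credentials{}, "unauthenticated"
	}
}

func main() {
	session := credentials{} // nothing stored by authenticate yet
	env := credentials{PublicID: "ENV_PUBLIC_ID", SecretKey: "ENV_SECRET_KEY"}

	_, source := resolveAuth(session, env)
	fmt.Println("auth source:", source)
}
```

Treating a half-filled pair as absent means a session with only a public ID falls through to the environment variables instead of sending a broken credential.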
registerTools() now takes AuthCallbacks instead of Env, keeping storage
access out of the tools module.
https://claude.ai/code/session_017vQNLGRmNurR5LwFhooNHJ