
Major Platform and Architecture Update #67

Open
Allan-Feng wants to merge 91 commits into Open-Finance-Lab:main from Allan-Feng:main

Conversation

@Allan-Feng

Collaborator

This PR introduces a broader update to the Agentic Trading Lab platform.

  • Reorganized the backend into clearer API, domain, and infrastructure layers
  • Added a Protocol v1 SDK for connecting and running external trading agents
  • Improved agent management, backtesting, paper trading, and leaderboard workflows
  • Updated the Playground and My Agents interface and navigation
  • Expanded leaderboard support for different LLM models and strategies
  • Removed legacy modules, standardized imports, and improved tests and documentation

Allan-Feng and others added 30 commits June 23, 2026 00:13
Switch Docker and Render deployment to the canonical ASGI package target
`uvicorn dashboard.backend.app:app` (PORT-aware, no PYTHONPATH/path hacks),
replacing the deprecated `python dashboard/backend/app.py` startup.

Co-authored-by: Cursor <cursoragent@cursor.com>
FlyM1ss and others added 30 commits July 4, 2026 16:20
If an agent's decide() plus submission took longer than the environment's
per-step decision window (default 30s), submit_decision raised ATLConflictError
and the SDK's AgentRunner let it abort the WHOLE run — even though the backend
had auto-held that one step and the run was still live.

- Catch ATLConflictError with code in {decision_deadline_exceeded,
  step_already_finalized} around submit_decision and advance to the next step
  instead of aborting. on_execution_result does not fire for the auto-held step.
- max_steps early exit now returns the metrics gathered so far (status
  "running") instead of calling /result, which 409s on an unfinalized run.
- Attach run.id to every backend error raised inside run_backtest
  (ATLAPIError.with_run_id, preserves the traceback) so callers can locate the
  failing run.
- Validate poll_interval > 0 (0 was a busy-loop footgun: _wait's
  `sleep_for or poll_interval` fallback meant a 0 interval never advanced the
  idle timer). Document the 30s window in the README.
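
The runner behaviour described in the list above can be sketched roughly as follows. This is a minimal illustration, not the SDK's actual `AgentRunner`: the `ATLConflictError` class here is a stand-in with only a `code` attribute, and `run_steps`, `STEP_AUTOHELD_CODES`, and the callback signatures are hypothetical names chosen for the example.

```python
class ATLConflictError(Exception):
    """Stand-in for the SDK's conflict error; only `code` matters in this sketch."""

    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.code = code


# Codes meaning "the backend already auto-held this step; the run is still live".
STEP_AUTOHELD_CODES = frozenset(
    {"decision_deadline_exceeded", "step_already_finalized"}
)


def run_steps(steps, decide, submit_decision, poll_interval=1.0):
    """Submit one decision per step, advancing past steps the backend auto-held."""
    if poll_interval <= 0:
        # A zero interval would busy-loop, so reject it up front.
        raise ValueError("poll_interval must be > 0")
    submitted, skipped = [], []
    for step in steps:
        try:
            submit_decision(step, decide(step))
        except ATLConflictError as exc:
            if exc.code not in STEP_AUTOHELD_CODES:
                raise  # any other conflict still aborts the run
            skipped.append(step)  # auto-held: move on to the next step
            continue
        submitted.append(step)
    return submitted, skipped
```

The key design point is that only the two auto-hold codes are swallowed; every other conflict propagates unchanged.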

Adversarial verify (sonnet) found that a genuinely late decision surfaces as
`step_already_finalized`, NOT `decision_deadline_exceeded` — the backend applies
the elapsed-deadline auto-hold during status reconciliation before re-checking
the submitted step. The catch-set already covers it, but nothing locked which
code the backend emits (the SDK tests fabricate the string), so a future cleanup
could drop step_already_finalized and silently regress H5. Added a cross-package
backend contract test (test_late_decision_returns_autoheld_code) that asserts the
real code a late /decision produces, and cross-referenced it from the SDK.

Tests: +6 SDK runner tests (deadline advance, step_already_finalized advance,
non-autoheld conflict still raises w/ run_id, max_steps→metrics, poll_interval
guard, run_id attach) — all red-green verified; +1 backend contract test.
packaging: 36 passed. backend: 5 failed (pre-existing) / 631 passed / 2 skipped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
… (H6)

An LLM leaderboard entry that silently fell back to rule-based trading — no
client (missing key/SDK) or a model id the active gateway rejected so every call
failed — was still persisted under its LLM model name, showing a rule-based
curve as if the model produced it. With only ANTHROPIC_API_KEY the 5 gateway-slug
entries all fell back and published identical rule-based curves.

- Add `_reject_if_llm_fallback`: refuse to publish when an LLM strategy reports
  used_llm=False or llm_calls==0, unless allow_fallback=True. Rule-based
  baselines expose no `used_llm` and pass through untouched. Applied on BOTH
  insert paths — deploy_model_run AND ensure_leaderboard_runs (belt-and-
  suspenders, so a misconfigured LLM entry left on the auto-compute path can't
  slip a fallback onto the board).
- Thread `--allow-fallback` through scripts/deploy_leaderboard_model.py; the CLI
  prints a clear message and exits non-zero when a fallback is refused.
- Make llm_agent.py's default model id gateway-aware (default_model_name() vs a
  hardcoded native id) so an entry without an explicit model id matches the
  gateway make_llm_client actually built.
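
A guard along these lines might look like the sketch below. It assumes the strategy result is a plain dict carrying `used_llm` and `llm_calls`; the real `_reject_if_llm_fallback` signature is not shown in this PR text, so the function name, the `LLMFallbackRefused` error type, and the dict shape are illustrative assumptions.

```python
class LLMFallbackRefused(RuntimeError):
    """Hypothetical error type for a refused leaderboard write."""


def reject_if_llm_fallback(metrics: dict, allow_fallback: bool = False) -> None:
    """Raise if an LLM strategy appears to have fallen back to rule-based trading."""
    if "used_llm" not in metrics:
        # Rule-based baselines expose no `used_llm` and pass through untouched.
        return
    fell_back = (not metrics["used_llm"]) or metrics.get("llm_calls", 0) == 0
    if fell_back and not allow_fallback:
        raise LLMFallbackRefused(
            "LLM strategy made no successful LLM calls; refusing to publish"
        )
```

Because the check keys on completed calls rather than trades, an all-HOLD run that did reach the model still passes.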

Adversarial verify (sonnet) confirmed: only these two paths write leaderboard
rows (grep of every insert_run call site); no false positives (an all-HOLD LLM
run still has llm_calls>0 — the counter increments per completed call, not per
trade); no false negatives (a per-request model rejection surfaces as
used_llm=True/llm_calls=0 and is caught).

Documented limitations (not defects): (a) the guard fires on new writes only —
any fallback published before this patch keeps being served from cache until the
entry is re-deployed with --force; (b) full model_id↔gateway reconciliation for
the configured entries (non-Anthropic models need CommonStack) is a deployment
decision — the guard now makes those entries refuse rather than publish fakes.

Tests: +7 leaderboard tests (refuse on used_llm=False / llm_calls==0, allow_
fallback override, real LLM publishes, baseline unaffected, auto-compute path
guarded, gateway-aware default) — red-green verified; updated the llm_agent
canonical-import characterization test. Full suite: 5 failed (pre-existing) /
638 passed / 2 skipped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
…l id (H7)

A built-in agent's model_name defaults to the sentinel 'local-model'. The Discord
bot forwarded it verbatim: /ask passed model='local-model' to chat_with_agent
(asking the API to call a model literally named 'local-model' → broken), and
/backtest set payload['model']='local-model' (mislabeled / failed run).

- Add token_cost.is_free_model(model): True for sentinel / rule-based / local
  markers (the existing _FREE_MODEL_MARKERS) or empty — a reusable predicate for
  "names no real paid LLM".
- In discord_bot, map a sentinel model_name to None via _model_override at both
  forwarding sites (/ask and /backtest), so the server picks its default instead
  of receiving a bogus model id. Real model ids pass through unchanged.
  chat_with_agent already treats model=None as "use default".
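
As a rough sketch of the two helpers described above: the marker set below is an assumption (the real `_FREE_MODEL_MARKERS` contents are not listed in this commit message), and exact matching is a simplification chosen for the example.

```python
from typing import Optional

# Assumed marker values; the project's actual _FREE_MODEL_MARKERS may differ.
FREE_MODEL_MARKERS = frozenset({"local-model", "rule-based", "local"})


def is_free_model(model: Optional[str]) -> bool:
    """True when the name is empty or a sentinel, i.e. it names no real paid LLM."""
    if model is None or not model.strip():
        return True
    return model.strip().lower() in FREE_MODEL_MARKERS


def model_override(model_name: Optional[str]) -> Optional[str]:
    """Map sentinel names to None so the server picks its default model."""
    return None if is_free_model(model_name) else model_name
```

With this shape, `model_override("local-model")` yields `None`, while a real model id is forwarded unchanged.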

Tests: is_free_model unit tests (sentinels/empty → True, real ids → False);
a behavioral _model_override test in the discord suite (guarded by
importorskip('discord'), runs where the optional dep is present); and a
source-level wiring guard (tests/integrations) that runs everywhere — since
discord is an undeclared optional dep the behavioral test is skipped in the base
env, so the source guard locks the two call sites there. Red-green verified
(is_free_model + wiring both go red against the original). Full suite: 5 failed
(pre-existing) / 642 passed / 2 skipped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
The dashboard invented data when the backend had none or was unreachable:
- an empty agents list, and any /api/v1/agents failure, fell back to 9 hardcoded
  MOCK_AGENTS — masking a real outage as if agents existed;
- "My Portfolio" rendered a hardcoded $128,742.34 account with fake holdings.

- Gate MOCK_AGENTS behind demo mode only (isDemoMode(): ?demo query flag, or a
  localhost/127.0.0.1/file host). On production (vercel/onrender) real users now
  see the genuine empty-state, and an API failure shows a distinct error-state
  (renderAgentsError) instead of fake agents.
- Add a prominent "SAMPLE DATA" badge next to the "My Portfolio" heading so the
  illustrative mock is clearly not a real brokerage account. (Full /paper/*
  wiring is a larger feature; the badge makes the current mock honest.)

The frontend is vanilla JS with no test harness, so verification is: node --check
(both files parse); a node behavioral check of the extracted isDemoMode across 7
host/query cases (production hosts -> false, localhost/?demo -> true, ?demo=0 ->
false); and source-guard tests (tests/integrations/test_frontend_no_mock_data.py)
that run in CI and lock the wiring (MOCK_AGENTS demo-gated, error-state + sample
badge present, old "using mock data" fallback removed). Red-green verified. Full
backend suite: 5 failed (pre-existing) / 645 passed / 2 skipped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
…UDE.md (H9)

The docs told users to start the server with `python3 dashboard/backend/app.py`,
which is broken after the package refactor: the top-level `dashboard.backend.*`
imports fail unless the repo root is on sys.path. And this branch had no root
CLAUDE.md, so a merge would inherit main's flat-imports guidance — which
literally instructs undoing this PR's core change.

- getting_started.rst: run the server via `uvicorn dashboard.backend.app:app
  --reload` (canonical, matches render.yaml + Dockerfile) or `python -m
  dashboard.backend.app`; drop the broken direct-file command.
- dashboard-target-structure.md: the `__main__` block is a real `python -m`
  entrypoint (canonical import string), not just a deprecated shim kept for stale
  docs; note that running the file directly does not work and why.
- Add a root CLAUDE.md documenting the packaged contract (`dashboard.backend`
  package imports, uvicorn run command, api/routers + domain/* layout, the
  DATABASE_PATH-backed stores). It carries a header note that it SUPERSEDES the
  flat-imports CLAUDE.md on main and must win at merge — this is the "coordinate
  with main" reconciliation the review flagged (same class as the /api/v1 vs
  /api/v2 merge decision). app.py's `__main__` block was already correct
  (uvicorn.run with the canonical import string) — left as-is.

Tests: doc-guard tests (tests/integrations/test_docs_run_command.py) that run in
CI and lock the fix — getting_started documents a working command, CLAUDE.md
describes the package contract (not flat imports). Red-green verified. Full
suite: 5 failed (pre-existing) / 647 passed / 2 skipped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
…ew branch

House-cleaning: version the 91-agent adversarial-review fix checklist
alongside the code it tracks. B0/B1 + H1-H9 are done; MEDIUM/LOW pending.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
…pen-Finance-Lab#5)

GET /app/ served app.html directly. app.html references its assets with
relative paths (styles.css, app.js, images/...), which a browser on /app/
resolves against the /app/ base (/app/styles.css -> 404) — the dashboard
renders unstyled. Serve /app directly and 308-redirect (method-preserving)
the trailing-slash variant to /app so relative assets resolve against root.

Route contract unchanged (both /app and /app/ were already registered).
+2 tests (test_static_routes.py), red-green verified.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
…Open-Finance-Lab#11)

discord_bot.py imports `discord` (discord.py 2.x: app_commands, ui.View,
Interaction, Intents), but the dep was undeclared in every requirements
file — the bot was unrunnable from declared deps. Declare it in an optional
requirements-discord.txt (mirroring requirements-sphinx.txt for docs) rather
than core requirements.txt, so web/API/backtest installs stay lean, and point
contributors at it in CLAUDE.md.

+1 source-guard test (runs without discord installed).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
… (MEDIUM Open-Finance-Lab#6)

strategy.html is a standalone shared-link page and used
`const API = location.origin` unconditionally. That works locally (frontend +
backend share localhost:8000) but breaks on Vercel, where the static frontend
and the API are on different origins — every API call (strategy fetch,
/backtest/run, status polling) would hit the frontend host and 404. Replicate
app.js's localhost-vs-hosted resolution (falls back to
https://agentictrading.onrender.com). Also drop the hardcoded default dates for
a runtime past-7-days initializer.

+2 source-guard tests (run in CI without a browser); embedded JS node --check'd.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
…anager docstring (MEDIUM Open-Finance-Lab#7)

The module docstring claimed the class was "Moved verbatim" and "functionally
identical to the post-Phase-2C2 implementation". That is false for the
safe_trading candidate selection in make_trading_decision_with_llm: the
pre-refactor code ranked the top-10 candidates by RSI extremity (|RSI-50|, a
mean-reversion heuristic); this version ranks the top-12 by a multi-factor
trend/momentum score AND always appends current holdings. That is a deliberate
strategy change bundled into the refactor, so backtests before/after this commit
are not directly comparable. Correct the docstring to disclose the divergence
(the inline comment on the branch was already honest; the module docstring was not).

+3 characterization tests (test_portfolio_manager_move.py): trend-based ranking
excludes a deeply-oversold no-trend name the old RSI ranking would surface first;
current holdings are force-included; docstring no longer claims verbatim identity.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
…IUM Open-Finance-Lab#4)

POST /api/strategies is public by design (shared links must work without a
session), but had no prompt size cap and no write throttle — an anonymous client
could persist megabyte-sized prompts without limit, bloating the DB.

- CreateStrategyBody.prompt: max_length=5000 (422 on oversized, before any DB write).
- Per-client write rate limit (30/hour, keyed by session/browser-id or peer host)
  via a new reusable api/rate_limit.FixedWindowRateLimiter (best-effort in-process
  abuse control, documented as such; reused by the /backtest/run fix). 429 on excess.
- `owner` documented as a display-only attribution label (e.g. "discord:<id>"),
  never an auth control — unchanged, so the Discord bot / frontend still work.
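
For readers unfamiliar with the pattern, a fixed-window limiter with an injectable clock can be sketched as below. This is not the code in `api/rate_limit`; the class name, the `allow` method, and the per-key `(window_start, count)` layout are assumptions made for the illustration.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Best-effort, in-process limiter: at most `max_events` per key per window."""

    def __init__(
        self,
        max_events: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_events = max_events
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so tests can control time
        self._buckets: Dict[str, Tuple[float, int]] = {}

    def allow(self, key: str) -> bool:
        now = self._clock()
        start, count = self._buckets.get(key, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0  # the previous window expired; start a new one
        if count >= self.max_events:
            self._buckets[key] = (start, count)
            return False  # the caller would answer with HTTP 429
        self._buckets[key] = (start, count + 1)
        return True
```

Injecting the clock is what makes the limiter unit-testable without sleeping: a test can advance a fake clock past the window and check that the key is admitted again.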

+6 tests (limiter units with injected clock + endpoint 422/429/ok). Full suite:
5 failed (pre-existing) / 661 passed / 2 skipped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
Open-Finance-Lab#3)

get_run_plot was `async def` but did fully synchronous, CPU-bound matplotlib
rendering (Figure build + savefig) plus blocking SQLite reads inline — every
plot request stalled the event loop for the whole render, and a burst (e.g. from
the Discord bot) serialized all server traffic behind it. It also re-imported
matplotlib and re-called matplotlib.use("Agg") on every request.

- Make the handler a plain `def` so FastAPI offloads it to the threadpool.
- Hoist matplotlib import + Agg backend to module scope (configured once).
- Extract the render into an @lru_cache(maxsize=128) `_render_run_plot_png`;
  a run's equity data is immutable and run_ids are unique, so bytes are reused
  without re-querying/re-rendering. 404s raise (not cached) so late data is
  still picked up.
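
The caching behaviour relied on here (results cached, raised exceptions not cached) is a property of `functools.lru_cache` and can be shown without matplotlib. In this sketch, `RUN_DATA`, `RunNotFound`, and the fake "PNG" bytes are stand-ins for the real SQLite reads and figure rendering.

```python
from functools import lru_cache

# Stand-in for the equity data that would be read from SQLite.
RUN_DATA = {"run_a": (100.0, 101.5, 99.8)}
RENDER_CALLS = []  # records how many times the expensive path actually ran


class RunNotFound(LookupError):
    """Raised when a run has no data yet; the handler would map this to a 404."""


@lru_cache(maxsize=128)
def render_run_plot_png(run_id: str) -> bytes:
    equity = RUN_DATA.get(run_id)
    if equity is None:
        # lru_cache does not cache raised exceptions, so a later call retries.
        raise RunNotFound(run_id)
    RENDER_CALLS.append(run_id)
    # Placeholder for the real matplotlib render + savefig.
    return ("PNG:" + ",".join(f"{v:.2f}" for v in equity)).encode()
```

Calling `render_run_plot_png("run_a")` twice renders once; a run that 404s now and gains data later is rendered on the next request.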

Route contract unchanged (same path + handler name). +4 tests. (The public
plot.png stays session-exempt by design — it's an <img> embed that can't send
X-Session-Id; noted for the ownership follow-up.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
…DIUM Open-Finance-Lab#2)

POST /backtest/run is reachable with only a (self-minted) X-Session-Id and spends
real operator LLM credits per trading hour of the run, with zero validation: an
anonymous caller could force the most expensive model, an oversized prompt, and a
multi-year date range — hundreds of paid LLM calls per request.

Validate the merged effective params (they arrive as query OR body) in the handler:
- model must be a known/priced id or free/local marker (new token_cost.is_known_model,
  single source of truth with _PRICING_TABLE) — 422 otherwise;
- strategy_prompt capped at 4000 chars (it's injected into every call) — 422;
- date range must be YYYY-MM-DD, ordered, and <= 31 days — 422.
Plus a per-client run budget (10/hour, reusing api.rate_limit) to cap serial abuse
beyond the existing concurrent-run guard — 429. Rejections happen before any thread
is scheduled.
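
The date-range rule can be sketched with the standard library alone. The function name and error messages are hypothetical, and two details are assumptions not stated above: a same-day range is accepted, and "<= 31 days" is read as the difference between the two dates.

```python
import re
from datetime import date

MAX_RANGE_DAYS = 31
_DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def validate_date_range(start: str, end: str) -> tuple:
    """Return (start_date, end_date) or raise ValueError (mapped to a 422)."""
    for value in (start, end):
        if not _DATE_RE.fullmatch(value):
            raise ValueError("dates must be YYYY-MM-DD")
    # fromisoformat also rejects impossible dates such as 2026-02-30.
    start_d, end_d = date.fromisoformat(start), date.fromisoformat(end)
    if end_d < start_d:
        raise ValueError("end_date precedes start_date")
    if (end_d - start_d).days > MAX_RANGE_DAYS:
        raise ValueError(f"date range exceeds {MAX_RANGE_DAYS} days")
    return start_d, end_d
```

Checking the strict pattern before parsing keeps loosely formatted inputs such as `2026-7-1` from slipping through.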

+7 tests. Full suite: 5 failed (pre-existing) / 672 passed / 2 skipped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
…EDIUM Open-Finance-Lab#8)

The protocol documented decision_deadline_exceeded for a late decision, but it
was effectively dead: submit_decision calls session.get_status() first, which
auto-holds the expired step and advances step_index, so control hits the earlier
`seq < current_index` branch — which only knew step_already_finalized. The
dedicated deadline raise at submit_decisions() was reachable only in the razor-
thin window where the deadline elapses mid-call.

Distinguish the two by consulting the engine decision log: a real finalized step
populates step_results_by_seq[seq]; a deadline auto-hold does not, but is logged
with decision_source == "timeout_hold". When prior is None and the source is
timeout_hold, raise the documented decision_deadline_exceeded; a genuine
double-submit (prior set) keeps step_already_finalized. Factor the log lookup
into _step_decision_source (shared with _historical_step_status).
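
The branch logic described above reduces to a small decision function. The sketch below takes the two lookups as plain mappings; the function name and parameter shapes are assumptions, while the code strings and the `timeout_hold` source value come from the description.

```python
from typing import Any, Mapping, Optional


def late_decision_code(
    seq: int,
    step_results_by_seq: Mapping[int, Any],
    decision_source_by_seq: Mapping[int, str],
) -> str:
    """Pick the error code for a decision submitted for an already-passed step."""
    prior: Optional[Any] = step_results_by_seq.get(seq)
    if prior is None and decision_source_by_seq.get(seq) == "timeout_hold":
        # The step was auto-held because its deadline elapsed.
        return "decision_deadline_exceeded"
    # A recorded result means the step was genuinely finalized (double-submit).
    return "step_already_finalized"
```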

Update the H5 backend contract test to assert the specific code, and correct the
now-stale NOTE in the SDK runner.py (both codes stay in _STEP_AUTOHELD_CODES for
robustness). Protocol suite 29 passed; SDK 36 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
…pe all errors (MEDIUM Open-Finance-Lab#9)

Three doc/impl mismatches:
- The run state machine documented `running -> cancelled`, but no route or
  service transition ever produces `cancelled` (grepped backend + SDK: only the
  SDK's defensive _FAILED_STATES mentions it). Drop it (YAGNI) with a note.
- §11 claims "all protocol errors use a consistent envelope", but runs.py raised
  bare-string HTTPException details for ownership/not-found/mismatch — those
  responses had `detail` as a string, breaking envelope parsing. Route all six
  through error_body(): run_not_found, forbidden (x2), agent_version_not_found,
  run_id_mismatch, step_id_mismatch.
- §11's code inventory was stale: 403 now `forbidden`; added agent_version_not_found,
  too_many_orders, run_id/step_id_mismatch, step_not_active/run_not_active/run_completed.
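
To illustrate why a bare-string `detail` breaks envelope parsing, here is a sketch under a loudly assumed envelope shape. The `{"error": {"code", "message"}}` layout below is hypothetical; the protocol's actual `error_body()` output is not shown in this commit message.

```python
from typing import Any, Optional


def error_body(code: str, message: str) -> dict:
    """Build a structured error envelope (assumed shape, for illustration only)."""
    return {"error": {"code": code, "message": message}}


def parse_error_code(detail: Any) -> Optional[str]:
    """Extract the machine-readable code, or None if `detail` is not an envelope."""
    if isinstance(detail, dict) and isinstance(detail.get("error"), dict):
        return detail["error"].get("code")
    return None  # e.g. a bare string such as "Run not found"
```

A client that branches on the code gets `"run_not_found"` from the envelope but nothing usable from the bare string, which is the mismatch this commit removes.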

+1 not-found envelope test, extended access-control test, +2 doc-guard tests.
Full suite: 5 failed (pre-existing) / 675 passed / 2 skipped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
Open-Finance-Lab#12)

The SDK exposed initial_cash (default 100_000) and always sent it in the run
config, and the README quickstart passed it — but H2 made the backend REJECT any
config.initial_cash != INITIAL_CAPITAL with invalid_config (400). So the knob
could never change anything: the default was a silent no-op and any other value
made a doomed round trip.

Make the SDK honest and consistent with the backend:
- create_run / AgentRunner.run_backtest: initial_cash defaults to None and a
  non-default value is rejected client-side (ATLValidationError,
  code="initial_cash_fixed") — fail fast, no doomed request. The fixed default is
  tolerated for backward compat but omitted from the wire payload.
- Drop initial_cash from the README quickstart.

+2 SDK tests (client-side rejection + fixed-default omitted from payload);
updated the existing create_run test. SDK suite: 38 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
Open-Finance-Lab#13)

Most changed-behavior coverage was folded into the per-item TDD (plot.png,
strategies, backtest-abuse, protocol codes, SDK). Close the remaining concrete
gaps in the code this pass touched:
- trend-score ranking survives NaN-indicator bars (early sma50) without crashing
  and ranks such names out of the top-12 rather than surfacing them;
- a custom strategy_prompt is threaded through to create_prompt(custom_prompt=).

(default_model_name()/make_llm_client() env-matrix coverage stays with H6's
gateway work; not re-litigated here.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
…IUM Open-Finance-Lab#2 follow-up)

Adversarial review caught that the is_known_model allowlist (matching the pricing
table) 422'd the dashboard UI's OWN dropdown models — gpt-5.2, gpt-5-mini,
deepseek-v4-*, gemini-*-* are not pricing-table families — so selecting them in
the backtest UI would have failed. And since the UI intentionally offers expensive
models (claude-opus-4.7), gating by tier was never the goal.

Replace the pricing allowlist with a model-id FORMAT validator (charset + length):
rejects a garbage/injection string reaching the backtest subprocess, accepts every
legit provider/model slug. Drop the now-unused token_cost.is_known_model and the
docstring's overclaim about blocking "the most expensive model"; note the rate
limit is a best-effort throttle, not a hard cap (the per-request caps are).

Tests parametrize over every UI dropdown option (accept) + malformed ids (reject).
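
A format-only validator of this kind might look like the following sketch. The exact character set and the 128-character limit are assumptions; the commit states only that the check is "charset + length".

```python
import re

# Assumed policy: starts alphanumeric, then letters, digits, and . _ : / -
MODEL_ID_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._:/-]{0,127}")


def is_valid_model_id(model: object) -> bool:
    """Accept provider/model slugs; reject empty, overlong, or odd-charset strings."""
    return isinstance(model, str) and MODEL_ID_RE.fullmatch(model) is not None
```

Ids such as `gpt-5.2` or `claude-opus-4.7` pass, while a string containing spaces or shell metacharacters does not.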

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
…inance-Lab#2/Open-Finance-Lab#4 follow-up)

The claimed memory-bounding was dead code: `if len(q) >= max_events: if not q:
del ...` can never fire (it needs a bucket both full and empty, impossible for
max_events >= 1), so empty/expired buckets were never reclaimed and per-key state
grew for the process lifetime. Add a max_keys cap: when a new key would exceed it,
sweep buckets whose entire window has expired. Preserves allow/reject/window
semantics.
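
The reclamation step can be sketched on its own, assuming each bucket is a deque of event timestamps (consistent with the `len(q)` check quoted above). The function names are hypothetical, and what happens when nothing is reclaimable is an assumption: this sketch simply admits the key anyway.

```python
from collections import deque
from typing import Deque, Dict


def sweep_expired(
    buckets: Dict[str, Deque[float]], now: float, window_seconds: float
) -> int:
    """Delete buckets whose newest event is older than the window."""
    stale = [
        key
        for key, events in buckets.items()
        if not events or now - events[-1] >= window_seconds
    ]
    for key in stale:
        del buckets[key]
    return len(stale)


def admit_key(
    buckets: Dict[str, Deque[float]],
    key: str,
    now: float,
    window_seconds: float,
    max_keys: int,
) -> None:
    """Create a bucket for `key`, sweeping first if the key cap would be exceeded."""
    if key not in buckets and len(buckets) >= max_keys:
        sweep_expired(buckets, now, window_seconds)
    buckets.setdefault(key, deque())
```

Sweeping only when a new key would exceed the cap keeps the common path cheap while still bounding per-key state.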

+1 test (reclamation under max_keys with a fake clock).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
…EDIUM Open-Finance-Lab#4 follow-up)

The bot posted /api/strategies with no id header, so the server's write rate
limiter (30/hr) fell back to the peer host — the one bot process's IP — making all
Discord users share a single bucket. Send X-Browser-Id: discord:<user_id> so each
user gets their own budget. +1 wiring guard test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
…n-Finance-Lab#3/Open-Finance-Lab#8 follow-up)

_finalize() minted run_id = ext_<second-resolution timestamp> with no uniqueness
guard, persisted as a PRIMARY KEY via INSERT OR REPLACE. Two runs finalizing in
the same second collided: the second overwrote the first's rows. Adversarial
review showed this turns two later fixes into latent bugs — Open-Finance-Lab#3's plot.png cache
would serve the overwritten run's chart forever, and Open-Finance-Lab#8's _step_decision_source
could read a merged decision log. Append a uuid8 suffix (extract _new_ext_run_id;
prefix preserved for baseline_resolver's startswith check). +1 uniqueness test.
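
One plausible reading of "uuid8 suffix" is eight hex characters taken from a random UUID; that reading, the timestamp format, and the function name below are all assumptions, since the actual `_new_ext_run_id` body is not shown.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional


def new_ext_run_id(now: Optional[datetime] = None) -> str:
    """Mint an `ext_`-prefixed run id that stays unique within the same second."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%d_%H%M%S")  # second resolution, as before
    suffix = uuid.uuid4().hex[:8]  # random suffix breaks same-second ties
    return f"ext_{stamp}_{suffix}"
```

Keeping the `ext_` prefix first means existing `startswith` checks continue to work.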

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
…IUM Open-Finance-Lab#12 follow-up)

Adversarial review found the client-side initial_cash guard only checked the
kwarg — a caller could still smuggle a non-default via config={"initial_cash": ...}
because config is merged into run_config afterward (and it would then 400 at the
backend, the exact doomed round-trip the fix set out to avoid). Validate the
EFFECTIVE value from kwarg-or-config after the merge, then pop it so it's never
sent. Also drop the now-de-advertised initial_cash= from all four dashboard/examples
SDK scripts, the python-sdk-quickstart doc, and the protocol config example.
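
The "validate the effective value after the merge" idea can be sketched as below. `INITIAL_CAPITAL = 100_000` is assumed from the SDK default mentioned earlier, and `build_run_config` and this `ATLValidationError` class are stand-ins for the SDK's real code.

```python
from typing import Optional

INITIAL_CAPITAL = 100_000  # assumed to match the backend's fixed value


class ATLValidationError(ValueError):
    """Stand-in for the SDK's validation error; carries a machine-readable code."""

    def __init__(self, message: str, code: str) -> None:
        super().__init__(message)
        self.code = code


def build_run_config(
    initial_cash: Optional[float] = None, config: Optional[dict] = None
) -> dict:
    run_config: dict = {}
    if initial_cash is not None:
        run_config["initial_cash"] = initial_cash
    run_config.update(config or {})  # config is merged after the kwarg
    # Validate the EFFECTIVE value, then drop it so it is never sent.
    effective = run_config.pop("initial_cash", None)
    if effective is not None and effective != INITIAL_CAPITAL:
        raise ATLValidationError(
            "initial_cash is fixed by the backend", code="initial_cash_fixed"
        )
    return run_config
```

Popping after validation covers both entry points with one check, so neither the kwarg nor the config dict can reach the wire payload.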

+1 test (config-dict bypass rejected). SDK suite green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
…e-Lab#5 follow-up)

The 308 redirect hardcoded url="/app", dropping the query string — but the
frontend deep-links via query params on this route (?auth=login opens the auth
modal, ?view=/?mode= drive navigation, generateShareURL builds shareable links).
A bookmarked /app/?auth=login lost its params. Carry request.url.query through.
+1 test.
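
The target-building step is small enough to sketch as a pure function. It assumes the query arrives as a raw string without the leading `?`, which is how Starlette's `request.url.query` exposes it; the helper name is hypothetical.

```python
def redirect_target(query: str, base: str = "/app") -> str:
    """Build the 308 redirect location, preserving any query string."""
    return f"{base}?{query}" if query else base
```

So `/app/?auth=login` redirects to `/app?auth=login`, and a bare `/app/` still redirects to `/app`.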

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
…MEDIUM Open-Finance-Lab#6 follow-up)

initDateDefaults() formatted the past-7-days defaults with toISOString().slice(0,10),
which is UTC — near local midnight in a non-UTC timezone the shown "today"/"7 days
ago" could be off by a calendar day. Format from local getFullYear/getMonth/getDate.
Guard test updated. Embedded JS node --check'd.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
…n holdings test (MEDIUM Open-Finance-Lab#7 follow-up)

Adversarial review: (1) the corrected docstring pinned the RSI->trend ranking
change to this file's Phase-2C3 move, but the change actually landed in an earlier,
unrelated commit — reword to state it was made separately, without claiming when.
(2) test_safe_trading_always_includes_current_holdings used a HELD fixture with
rsi=20 (|20-50|=30), which the OLD RSI-extremity ranking would ALSO surface, so it
didn't isolate the holdings-append behavior. Change to rsi=50 so HELD ranks last
under both schemes and only appears via the holdings-append step.
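
The fixture reasoning can be checked with a toy version of the old RSI-extremity ranking plus a holdings-append step. Both helpers are simplified stand-ins, not the project's `make_trading_decision_with_llm` logic, and the symbols are made up.

```python
from typing import Dict, List


def rank_by_rsi_extremity(rsi_by_symbol: Dict[str, float], top_n: int) -> List[str]:
    """Old scheme: rank by |RSI - 50|, most extreme first."""
    return sorted(
        rsi_by_symbol, key=lambda s: abs(rsi_by_symbol[s] - 50), reverse=True
    )[:top_n]


def with_holdings(ranked: List[str], holdings: List[str]) -> List[str]:
    """Always append current holdings that the ranking left out."""
    return ranked + [s for s in holdings if s not in ranked]


# rsi=20 gives |20-50| = 30, so HELD is surfaced by the ranking itself.
leaky = rank_by_rsi_extremity({"AAA": 30, "BBB": 60, "HELD": 20}, top_n=2)
# rsi=50 gives |50-50| = 0, so HELD can only arrive via the append step.
isolated = rank_by_rsi_extremity({"AAA": 30, "BBB": 60, "HELD": 50}, top_n=2)
selected = with_holdings(isolated, ["HELD"])
```

With `rsi=20` the held name appears in `leaky` regardless of the append step, which is why that fixture could not isolate the behavior under test.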

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
…e-Lab#9 follow-up)

Adversarial review: §11 still omitted codes the impl actually raises. Verified
against the source and added: unsupported_environment (400), result_not_found
(404), too_many_active_runs (429), run_failed (500). Guard test extended to
require them.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
…pen-Finance-Lab#11 follow-up)

Adversarial review: the bot needs core backend deps too, so
`pip install -r requirements-discord.txt` alone didn't make it runnable (only
discord.py was declared). Add `-r requirements.txt` so the one command installs
everything. Strengthen the guard test to require a real discord.py requirement
line (regex, not a comment match) and the `-r requirements.txt` include.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
…list

All actionable MEDIUM items (Open-Finance-Lab#2-Open-Finance-Lab#9, Open-Finance-Lab#11, Open-Finance-Lab#12, Open-Finance-Lab#13) done on pr-67-review with a
10-agent adversarial pass and a second round fixing every confirmed defect.
Per-item SHAs + notes recorded. #1 (v1/v2) and Open-Finance-Lab#10 (landing rebuild) deferred.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
…nce-Lab#3 residual)

Re-verification of the ext_ run_id fix found the same collision class in every
other run_id-minting site: engine.py (agent_/buyhold_/djia_index_), the paper
baselines (djia_paper_baseline_/bah_paper_baseline_), and paper_session
({agent_name}_{timestamp}) all built ids from a bare second-resolution timestamp,
written as the agent_runs PRIMARY KEY via INSERT OR REPLACE, and are all servable
through the same unrestricted /runs/{id}/plot.png cache. Append a uuid8 suffix at
each so run_id uniqueness is a real invariant (the plot-cache premise, and Open-Finance-Lab#8's
decision-log lookup, depend on it).

Verified safe: nothing reconstructs a timestamp from run_ids (only startswith
prefix checks + a comma-split); paper-baseline idempotency keys on mode; defaults.json
references only pre-existing seed ids (generation change affects new runs only).
+1 guard test across all four sites. Full suite: 5 pre-existing / 706 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
…x on the checklist

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu

2 participants