Major Platform and Architecture Update #67
Open
Allan-Feng wants to merge 91 commits into
Switch Docker and Render deployment to the canonical ASGI package target `uvicorn dashboard.backend.app:app` (PORT-aware, no PYTHONPATH/path hacks), replacing the deprecated `python dashboard/backend/app.py` startup.
Co-authored-by: Cursor <cursoragent@cursor.com>
If an agent's decide() plus submission took longer than the environment's
per-step decision window (default 30s), submit_decision raised ATLConflictError
and the SDK's AgentRunner let it abort the WHOLE run — even though the backend
had auto-held that one step and the run was still live.
- Catch ATLConflictError with code in {decision_deadline_exceeded,
step_already_finalized} around submit_decision and advance to the next step
instead of aborting. on_execution_result does not fire for the auto-held step.
- max_steps early exit now returns the metrics gathered so far (status
"running") instead of calling /result, which 409s on an unfinalized run.
- Attach run.id to every backend error raised inside run_backtest
(ATLAPIError.with_run_id, preserves the traceback) so callers can locate the
failing run.
- Validate poll_interval > 0 (0 was a busy-loop footgun: _wait's
`sleep_for or poll_interval` fallback meant a 0 interval never advanced the
idle timer). Document the 30s window in the README.
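
The runner behavior described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the SDK's code: `ATLConflictError` here is a stand-in class, and `run_steps`, `validate_poll_interval`, and the callback signatures are hypothetical names chosen for the example.

```python
class ATLConflictError(Exception):
    """Stand-in for the SDK's conflict error; `code` holds the backend error code."""

    def __init__(self, code, message=""):
        super().__init__(message or code)
        self.code = code


# Codes that mean "the backend already auto-held this step; the run is still live".
STEP_AUTOHELD_CODES = frozenset(
    {"decision_deadline_exceeded", "step_already_finalized"}
)


def validate_poll_interval(poll_interval):
    """Reject 0 or negative intervals, which would turn polling into a busy loop."""
    if poll_interval <= 0:
        raise ValueError("poll_interval must be > 0")
    return poll_interval


def run_steps(steps, decide, submit_decision, on_execution_result):
    """Drive a run step by step; skip auto-held steps instead of aborting."""
    skipped = []
    for step in steps:
        decision = decide(step)
        try:
            result = submit_decision(step, decision)
        except ATLConflictError as exc:
            if exc.code in STEP_AUTOHELD_CODES:
                # The backend held this step for us: no result callback, move on.
                skipped.append(step)
                continue
            raise  # any other conflict is still fatal
        on_execution_result(step, result)
    return skipped
```

Keeping both codes in one set means the runner does not depend on which of the two the backend happens to emit for a late decision.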
Adversarial verify (sonnet) surfaced that a genuinely-late decision surfaces as
`step_already_finalized`, NOT `decision_deadline_exceeded` — the backend applies
the elapsed-deadline auto-hold during status reconciliation before re-checking
the submitted step. The catch-set already covers it, but nothing locked which
code the backend emits (the SDK tests fabricate the string), so a future cleanup
could drop step_already_finalized and silently regress H5. Added a cross-package
backend contract test (test_late_decision_returns_autoheld_code) that asserts the
real code a late /decision produces, and cross-referenced it from the SDK.
Tests: +6 SDK runner tests (deadline advance, step_already_finalized advance,
non-autoheld conflict still raises w/ run_id, max_steps→metrics, poll_interval
guard, run_id attach) — all red-green verified; +1 backend contract test.
packaging: 36 passed. backend: 5 failed (pre-existing) / 631 passed / 2 skipped.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
… (H6)
An LLM leaderboard entry that silently fell back to rule-based trading — no client (missing key/SDK) or a model id the active gateway rejected so every call failed — was still persisted under its LLM model name, showing a rule-based curve as if the model produced it. With only ANTHROPIC_API_KEY the 5 gateway-slug entries all fell back and published identical rule-based curves.
- Add `_reject_if_llm_fallback`: refuse to publish when an LLM strategy reports used_llm=False or llm_calls==0, unless allow_fallback=True. Rule-based baselines expose no `used_llm` and pass through untouched. Applied on BOTH insert paths — deploy_model_run AND ensure_leaderboard_runs (belt-and-suspenders, so a misconfigured LLM entry left on the auto-compute path can't slip a fallback onto the board).
- Thread `--allow-fallback` through scripts/deploy_leaderboard_model.py; the CLI prints a clear message and exits non-zero when a fallback is refused.
- Make llm_agent.py's default model id gateway-aware (default_model_name() vs a hardcoded native id) so an entry without an explicit model id matches the gateway make_llm_client actually built.
Adversarial verify (sonnet) confirmed: only these two paths write leaderboard rows (grep of every insert_run call site); no false positives (an all-HOLD LLM run still has llm_calls>0 — the counter increments per completed call, not per trade); no false negatives (a per-request model rejection surfaces as used_llm=True/llm_calls=0 and is caught).
Documented limitations (not defects):
(a) the guard fires on new writes only — any fallback published before this patch keeps being served from cache until the entry is re-deployed with --force;
(b) full model_id↔gateway reconciliation for the configured entries (non-Anthropic models need CommonStack) is a deployment decision — the guard now makes those entries refuse rather than publish fakes.
Tests: +7 leaderboard tests (refuse on used_llm=False / llm_calls==0, allow_fallback override, real LLM publishes, baseline unaffected, auto-compute path guarded, gateway-aware default) — red-green verified; updated the llm_agent canonical-import characterization test. Full suite: 5 failed (pre-existing) / 638 passed / 2 skipped.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
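
A minimal sketch of the fallback guard described in this commit, assuming the strategy object exposes `used_llm` and `llm_calls` as plain attributes. The exception type `LLMFallbackError` and the exact signature are assumptions for illustration; the real `_reject_if_llm_fallback` may differ.

```python
from types import SimpleNamespace


class LLMFallbackError(RuntimeError):
    """Hypothetical error raised when an LLM entry actually traded rule-based."""


def reject_if_llm_fallback(strategy, allow_fallback=False):
    """Refuse to publish a run whose LLM strategy never completed an LLM call."""
    # Rule-based baselines expose no `used_llm` attribute and pass through.
    if not hasattr(strategy, "used_llm"):
        return
    if allow_fallback:
        return
    llm_calls = getattr(strategy, "llm_calls", 0)
    if not strategy.used_llm or llm_calls == 0:
        raise LLMFallbackError(
            "LLM strategy fell back to rule-based trading; "
            "pass allow_fallback=True to publish anyway"
        )


# Example strategies (plain namespaces standing in for real strategy objects).
baseline = SimpleNamespace(name="buy_and_hold")           # no used_llm attribute
real_llm = SimpleNamespace(used_llm=True, llm_calls=42)   # genuine LLM run
rejected = SimpleNamespace(used_llm=True, llm_calls=0)    # every call failed

reject_if_llm_fallback(baseline)   # passes through untouched
reject_if_llm_fallback(real_llm)   # publishes normally
```

Checking `llm_calls == 0` in addition to `used_llm` covers the per-request rejection case noted above, where a client exists but no call ever completes.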
…l id (H7)
A built-in agent's model_name defaults to the sentinel 'local-model'. The Discord
bot forwarded it verbatim: /ask passed model='local-model' to chat_with_agent
(asking the API to call a model literally named 'local-model' → broken), and
/backtest set payload['model']='local-model' (mislabeled / failed run).
- Add token_cost.is_free_model(model): True for sentinel / rule-based / local
markers (the existing _FREE_MODEL_MARKERS) or empty — a reusable predicate for
"names no real paid LLM".
- In discord_bot, map a sentinel model_name to None via _model_override at both
forwarding sites (/ask and /backtest), so the server picks its default instead
of receiving a bogus model id. Real model ids pass through unchanged.
chat_with_agent already treats model=None as "use default".
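
The sentinel mapping above could look roughly like this. The marker set and the exact-match rule are assumptions (the real `_FREE_MODEL_MARKERS` and its matching logic are not shown in this message), and `model_override` is a simplified stand-in for `_model_override`.

```python
# Assumed marker set; the real _FREE_MODEL_MARKERS may contain other values.
FREE_MODEL_MARKERS = frozenset({"local-model", "rule-based", "local"})


def is_free_model(model):
    """True when `model` names no real paid LLM (sentinel, marker, or empty)."""
    if model is None or not model.strip():
        return True
    return model.strip().lower() in FREE_MODEL_MARKERS


def model_override(model_name):
    """Map a sentinel model name to None so the server picks its default."""
    return None if is_free_model(model_name) else model_name
```

With this shape, both forwarding sites can pass `model_override(agent.model_name)` and real model ids go through unchanged.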
Tests: is_free_model unit tests (sentinels/empty → True, real ids → False);
a behavioral _model_override test in the discord suite (guarded by
importorskip('discord'), runs where the optional dep is present); and a
source-level wiring guard (tests/integrations) that runs everywhere — since
discord is an undeclared optional dep the behavioral test is skipped in the base
env, so the source guard locks the two call sites there. Red-green verified
(is_free_model + wiring both go red against the original). Full suite: 5 failed
(pre-existing) / 642 passed / 2 skipped.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
The dashboard invented data when the backend had none or was unreachable:
- an empty agents list, and any /api/v1/agents failure, fell back to 9 hardcoded MOCK_AGENTS — masking a real outage as if agents existed;
- "My Portfolio" rendered a hardcoded $128,742.34 account with fake holdings.
- Gate MOCK_AGENTS behind demo mode only (isDemoMode(): ?demo query flag, or a localhost/127.0.0.1/file host). On production (vercel/onrender) real users now see the genuine empty-state, and an API failure shows a distinct error-state (renderAgentsError) instead of fake agents.
- Add a prominent "SAMPLE DATA" badge next to the "My Portfolio" heading so the illustrative mock is clearly not a real brokerage account. (Full /paper/* wiring is a larger feature; the badge makes the current mock honest.)
The frontend is vanilla JS with no test harness, so verification is: node --check (both files parse); a node behavioral check of the extracted isDemoMode across 7 host/query cases (production hosts -> false, localhost/?demo -> true, ?demo=0 -> false); and source-guard tests (tests/integrations/test_frontend_no_mock_data.py) that run in CI and lock the wiring (MOCK_AGENTS demo-gated, error-state + sample badge present, old "using mock data" fallback removed). Red-green verified. Full backend suite: 5 failed (pre-existing) / 645 passed / 2 skipped.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
…UDE.md (H9)
The docs told users to start the server with `python3 dashboard/backend/app.py`, which is broken after the package refactor: the top-level `dashboard.backend.*` imports fail unless the repo root is on sys.path. And this branch had no root CLAUDE.md, so a merge would inherit main's flat-imports guidance — which literally instructs undoing this PR's core change.
- getting_started.rst: run the server via `uvicorn dashboard.backend.app:app --reload` (canonical, matches render.yaml + Dockerfile) or `python -m dashboard.backend.app`; drop the broken direct-file command.
- dashboard-target-structure.md: the `__main__` block is a real `python -m` entrypoint (canonical import string), not just a deprecated shim kept for stale docs; note that running the file directly does not work and why.
- Add a root CLAUDE.md documenting the packaged contract (`dashboard.backend` package imports, uvicorn run command, api/routers + domain/* layout, the DATABASE_PATH-backed stores). It carries a header note that it SUPERSEDES the flat-imports CLAUDE.md on main and must win at merge — this is the "coordinate with main" reconciliation the review flagged (same class as the /api/v1 vs /api/v2 merge decision).
app.py's `__main__` block was already correct (uvicorn.run with the canonical import string) — left as-is.
Tests: doc-guard tests (tests/integrations/test_docs_run_command.py) that run in CI and lock the fix — getting_started documents a working command, CLAUDE.md describes the package contract (not flat imports). Red-green verified. Full suite: 5 failed (pre-existing) / 647 passed / 2 skipped.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
…ew branch House-cleaning: version the 91-agent adversarial-review fix checklist alongside the code it tracks. B0/B1 + H1-H9 are done; MEDIUM/LOW pending. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
…pen-Finance-Lab#5) GET /app/ served app.html directly. app.html references its assets with relative paths (styles.css, app.js, images/...), which a browser on /app/ resolves against the /app/ base (/app/styles.css -> 404) — the dashboard renders unstyled. Serve /app directly and 308-redirect (method-preserving) the trailing-slash variant to /app so relative assets resolve against root. Route contract unchanged (both /app and /app/ were already registered). +2 tests (test_static_routes.py), red-green verified. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
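
The relative-path problem described here follows from standard URL resolution, which can be reproduced with the standard library. The host `example.test` is a placeholder, not the project's deployment.

```python
from urllib.parse import urljoin

# A page served at /app/ resolves relative assets under /app/ ...
under_slash = urljoin("https://example.test/app/", "styles.css")

# ... while the same page served at /app resolves them against the root.
without_slash = urljoin("https://example.test/app", "styles.css")

print(under_slash)    # https://example.test/app/styles.css
print(without_slash)  # https://example.test/styles.css
```

This is why redirecting `/app/` to `/app` lets `styles.css`, `app.js`, and `images/...` load from the root where they are actually served.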
…Open-Finance-Lab#11) discord_bot.py imports `discord` (discord.py 2.x: app_commands, ui.View, Interaction, Intents), but the dep was undeclared in every requirements file — the bot was unrunnable from declared deps. Declare it in an optional requirements-discord.txt (mirroring requirements-sphinx.txt for docs) rather than core requirements.txt, so web/API/backtest installs stay lean, and point contributors at it in CLAUDE.md. +1 source-guard test (runs without discord installed). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
… (MEDIUM Open-Finance-Lab#6) strategy.html is a standalone shared-link page and used `const API = location.origin` unconditionally. That works locally (frontend + backend share localhost:8000) but breaks on Vercel, where the static frontend and the API are on different origins — every API call (strategy fetch, /backtest/run, status polling) would hit the frontend host and 404. Replicate app.js's localhost-vs-hosted resolution (falls back to https://agentictrading.onrender.com). Also drop the hardcoded default dates for a runtime past-7-days initializer. +2 source-guard tests (run in CI without a browser); embedded JS node --check'd. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
…anager docstring (MEDIUM Open-Finance-Lab#7) The module docstring claimed the class was "Moved verbatim" and "functionally identical to the post-Phase-2C2 implementation". That is false for the safe_trading candidate selection in make_trading_decision_with_llm: the pre-refactor code ranked the top-10 candidates by RSI extremity (|RSI-50|, a mean-reversion heuristic); this version ranks the top-12 by a multi-factor trend/momentum score AND always appends current holdings. That is a deliberate strategy change bundled into the refactor, so backtests before/after this commit are not directly comparable. Correct the docstring to disclose the divergence (the inline comment on the branch was already honest; the module docstring was not). +3 characterization tests (test_portfolio_manager_move.py): trend-based ranking excludes a deeply-oversold no-trend name the old RSI ranking would surface first; current holdings are force-included; docstring no longer claims verbatim identity. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
…IUM Open-Finance-Lab#4)
POST /api/strategies is public by design (shared links must work without a session), but had no prompt size cap and no write throttle — an anonymous client could persist unbounded, megabyte-sized prompts without limit, bloating the DB.
- CreateStrategyBody.prompt: max_length=5000 (422 on oversized, before any DB write).
- Per-client write rate limit (30/hour, keyed by session/browser-id or peer host) via a new reusable api/rate_limit.FixedWindowRateLimiter (best-effort in-process abuse control, documented as such; reused by the /backtest/run fix). 429 on excess.
- `owner` documented as a display-only attribution label (e.g. "discord:<id>"), never an auth control — unchanged, so the Discord bot / frontend still work.
+6 tests (limiter units with injected clock + endpoint 422/429/ok). Full suite: 5 failed (pre-existing) / 661 passed / 2 skipped.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
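
A minimal sketch of a fixed-window limiter with an injectable clock, in the spirit of the `FixedWindowRateLimiter` named above. The internals shown here (a per-key `(window_start, count)` tuple, an `allow` method) are assumptions; only the class name and the 30/hour budget come from the commit message.

```python
import time


class FixedWindowRateLimiter:
    """Best-effort, in-process limiter: at most `max_events` per window per key."""

    def __init__(self, max_events, window_seconds, clock=time.monotonic):
        if max_events < 1 or window_seconds <= 0:
            raise ValueError("max_events must be >= 1 and window_seconds > 0")
        self.max_events = max_events
        self.window_seconds = window_seconds
        self._clock = clock          # injectable so tests can control time
        self._buckets = {}           # key -> (window_start, count)

    def allow(self, key):
        """Return True and count the event, or False if the key is over budget."""
        now = self._clock()
        start, count = self._buckets.get(key, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0    # the old window expired; open a new one
        if count >= self.max_events:
            self._buckets[key] = (start, count)
            return False
        self._buckets[key] = (start, count + 1)
        return True


# Usage with a fake clock: 30 writes per hour for one client key.
fake_now = [0.0]
limiter = FixedWindowRateLimiter(30, 3600, clock=lambda: fake_now[0])
results = [limiter.allow("browser-id:abc") for _ in range(31)]
```

An endpoint would translate a `False` return into a 429 response before touching the database.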
Open-Finance-Lab#3)
get_run_plot was `async def` but did fully synchronous, CPU-bound matplotlib rendering (Figure build + savefig) plus blocking SQLite reads inline — every plot request stalled the event loop for the whole render, and a burst (e.g. from the Discord bot) serialized all server traffic behind it. It also re-imported matplotlib and re-called matplotlib.use("Agg") on every request.
- Make the handler a plain `def` so FastAPI offloads it to the threadpool.
- Hoist matplotlib import + Agg backend to module scope (configured once).
- Extract the render into an @lru_cache(maxsize=128) `_render_run_plot_png`; a run's equity data is immutable and run_ids are unique, so bytes are reused without re-querying/re-rendering. 404s raise (not cached) so late data is still picked up.
Route contract unchanged (same path + handler name). +4 tests. (The public plot.png stays session-exempt by design — it's an <img> embed that can't send X-Session-Id; noted for the ownership follow-up.)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
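
The caching behavior relied on here (results are cached, raised exceptions are not) is a property of `functools.lru_cache` and can be shown without matplotlib. In this sketch the dict `EQUITY_STORE`, the `RunNotFound` exception, and the fake byte payload are stand-ins for the SQLite read, the 404, and the rendered PNG.

```python
from functools import lru_cache

EQUITY_STORE = {}            # stand-in for the SQLite equity table
RENDER_COUNT = {"calls": 0}  # counts how often the expensive render runs


class RunNotFound(LookupError):
    """Stand-in for the 404 raised when a run has no equity data yet."""


@lru_cache(maxsize=128)
def render_run_plot_png(run_id):
    """Render once per run_id; later calls reuse the cached bytes."""
    points = EQUITY_STORE.get(run_id)
    if points is None:
        # lru_cache stores return values only, so this miss is retried next time.
        raise RunNotFound(run_id)
    RENDER_COUNT["calls"] += 1
    return ("PNG:" + ",".join(str(p) for p in points)).encode()
```

Because a miss raises instead of returning a placeholder, a run whose data arrives later is rendered on the next request rather than being stuck on a cached "not found".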
…DIUM Open-Finance-Lab#2)
POST /backtest/run is reachable with only a (self-minted) X-Session-Id and spends real operator LLM credits per trading hour of the run, with zero validation: an anonymous caller could force the most expensive model, an oversized prompt, and a multi-year date range — hundreds of paid LLM calls per request.
Validate the merged effective params (they arrive as query OR body) in the handler:
- model must be a known/priced id or free/local marker (new token_cost.is_known_model, single source of truth with _PRICING_TABLE) — 422 otherwise;
- strategy_prompt capped at 4000 chars (it's injected into every call) — 422;
- date range must be YYYY-MM-DD, ordered, and <= 31 days — 422.
Plus a per-client run budget (10/hour, reusing api.rate_limit) to cap serial abuse beyond the existing concurrent-run guard — 429. Rejections happen before any thread is scheduled. +7 tests. Full suite: 5 failed (pre-existing) / 672 passed / 2 skipped.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
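
One way to express the date-range rule from the list above, as a sketch. The function name, the `ValueError` signalling (which a handler would map to 422), and the choice to allow `start == end` are assumptions; the 31-day cap and the YYYY-MM-DD format come from the commit message.

```python
import re
from datetime import datetime

MAX_RANGE_DAYS = 31
_DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def _parse_day(value):
    """Parse a strict, zero-padded YYYY-MM-DD string into a date."""
    if not isinstance(value, str) or not _DATE_RE.fullmatch(value):
        raise ValueError(f"date must be YYYY-MM-DD: {value!r}")
    # strptime also rejects impossible dates such as 2024-02-30.
    return datetime.strptime(value, "%Y-%m-%d").date()


def validate_date_range(start, end):
    """Return (start_date, end_date) or raise ValueError for an invalid range."""
    start_day, end_day = _parse_day(start), _parse_day(end)
    if start_day > end_day:
        raise ValueError("start date must not be after end date")
    if (end_day - start_day).days > MAX_RANGE_DAYS:
        raise ValueError(f"date range must be <= {MAX_RANGE_DAYS} days")
    return start_day, end_day
```

Running this check on the merged query-or-body parameters, before any worker thread is scheduled, keeps a rejected request from costing any LLM calls.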
…EDIUM Open-Finance-Lab#8)
The protocol documented decision_deadline_exceeded for a late decision, but it was effectively dead: submit_decision calls session.get_status() first, which auto-holds the expired step and advances step_index, so control hits the earlier `seq < current_index` branch — which only knew step_already_finalized. The dedicated deadline raise at submit_decisions() was reachable only in the razor-thin window where the deadline elapses mid-call.
Distinguish the two by consulting the engine decision log: a real finalized step populates step_results_by_seq[seq]; a deadline auto-hold does not, but is logged with decision_source == "timeout_hold". When prior is None and the source is timeout_hold, raise the documented decision_deadline_exceeded; a genuine double-submit (prior set) keeps step_already_finalized. Factor the log lookup into _step_decision_source (shared with _historical_step_status).
Update the H5 backend contract test to assert the specific code, and correct the now-stale NOTE in the SDK runner.py (both codes stay in _STEP_AUTOHELD_CODES for robustness). Protocol suite 29 passed; SDK 36 passed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
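
The branch logic described above can be sketched as a pure function. The decision-log entry shape (`{"seq": ..., "decision_source": ...}`) and the function names are assumptions made for the example; the two error codes and the `timeout_hold` marker come from the commit message.

```python
def step_decision_source(decision_log, seq):
    """Return the logged decision_source for step `seq`, or None if not logged."""
    for entry in decision_log:
        if entry.get("seq") == seq:
            return entry.get("decision_source")
    return None


def late_decision_code(seq, current_index, step_results_by_seq, decision_log):
    """Pick the error code for a decision on an already-passed step.

    Returns None when `seq` is still the current (or a future) step.
    """
    if seq >= current_index:
        return None
    prior = step_results_by_seq.get(seq)
    source = step_decision_source(decision_log, seq)
    if prior is None and source == "timeout_hold":
        # The deadline elapsed and the backend auto-held the step.
        return "decision_deadline_exceeded"
    # A stored result means the client already submitted this step.
    return "step_already_finalized"
```

A late submission for an auto-held step then maps to the documented deadline code, while a double submit keeps the finalized code.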
…pe all errors (MEDIUM Open-Finance-Lab#9)
Three doc/impl mismatches:
- The run state machine documented `running -> cancelled`, but no route or service transition ever produces `cancelled` (grepped backend + SDK: only the SDK's defensive _FAILED_STATES mentions it). Drop it (YAGNI) with a note.
- §11 claims "all protocol errors use a consistent envelope", but runs.py raised bare-string HTTPException details for ownership/not-found/mismatch — those responses had `detail` as a string, breaking envelope parsing. Route all six through error_body(): run_not_found, forbidden (x2), agent_version_not_found, run_id_mismatch, step_id_mismatch.
- §11's code inventory was stale: 403 now `forbidden`; added agent_version_not_found, too_many_orders, run_id/step_id_mismatch, step_not_active/run_not_active/run_completed.
+1 not-found envelope test, extended access-control test, +2 doc-guard tests. Full suite: 5 failed (pre-existing) / 675 passed / 2 skipped.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
Open-Finance-Lab#12)
The SDK exposed initial_cash (default 100_000) and always sent it in the run config, and the README quickstart passed it — but H2 made the backend REJECT any config.initial_cash != INITIAL_CAPITAL with invalid_config (400). So the knob could never change anything: the default was a silent no-op and any other value made a doomed round trip.
Make the SDK honest and consistent with the backend:
- create_run / AgentRunner.run_backtest: initial_cash defaults to None and a non-default value is rejected client-side (ATLValidationError, code="initial_cash_fixed") — fail fast, no doomed request. The fixed default is tolerated for backward compat but omitted from the wire payload.
- Drop initial_cash from the README quickstart.
+2 SDK tests (client-side rejection + fixed-default omitted from payload); updated the existing create_run test. SDK suite: 38 passed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
Open-Finance-Lab#13)
Most changed-behavior coverage was folded into the per-item TDD (plot.png, strategies, backtest-abuse, protocol codes, SDK). Close the remaining concrete gaps in the code this pass touched:
- trend-score ranking survives NaN-indicator bars (early sma50) without crashing and ranks such names out of the top-12 rather than surfacing them;
- a custom strategy_prompt is threaded through to create_prompt(custom_prompt=).
(default_model_name()/make_llm_client() env-matrix coverage stays with H6's gateway work; not re-litigated here.)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
…IUM Open-Finance-Lab#2 follow-up) Adversarial review caught that the is_known_model allowlist (matching the pricing table) 422'd the dashboard UI's OWN dropdown models — gpt-5.2, gpt-5-mini, deepseek-v4-*, gemini-*-* are not pricing-table families — so selecting them in the backtest UI would have failed. And since the UI intentionally offers expensive models (claude-opus-4.7), gating by tier was never the goal. Replace the pricing allowlist with a model-id FORMAT validator (charset + length): rejects a garbage/injection string reaching the backtest subprocess, accepts every legit provider/model slug. Drop the now-unused token_cost.is_known_model and the docstring's overclaim about blocking "the most expensive model"; note the rate limit is a best-effort throttle, not a hard cap (the per-request caps are). Tests parametrize over every UI dropdown option (accept) + malformed ids (reject). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
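
A format-only validator of the kind described could be as small as the sketch below. The allowed character set and the 128-character limit are assumptions for illustration; the commit message only says "charset + length".

```python
import re

# Assumed policy: starts alphanumeric, then letters, digits, and . _ : / -
# up to 128 characters in total. The real charset and limit may differ.
_MODEL_ID_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._:/-]{0,127}")


def is_valid_model_id(model):
    """Accept provider/model slugs by shape; reject garbage or injection strings."""
    return isinstance(model, str) and _MODEL_ID_RE.fullmatch(model) is not None
```

Validating shape rather than membership means new dropdown models need no backend change, while strings with spaces, shell metacharacters, or newlines never reach the backtest subprocess.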
…inance-Lab#2/Open-Finance-Lab#4 follow-up) The claimed memory-bounding was dead code: `if len(q) >= max_events: if not q: del ...` can never fire (it needs a bucket both full and empty, impossible for max_events >= 1), so empty/expired buckets were never reclaimed and per-key state grew for the process lifetime. Add a max_keys cap: when a new key would exceed it, sweep buckets whose entire window has expired. Preserves allow/reject/window semantics. +1 test (reclamation under max_keys with a fake clock). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
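
The reclamation fix can be sketched on a plain dict of buckets. The bucket layout (`key -> (window_start, count)`) and the helper names are assumptions; the idea of sweeping fully expired buckets when a new key would exceed `max_keys` is from the commit message. This sketch still admits the new key if nothing could be swept, which is one possible best-effort policy and not necessarily the project's.

```python
def sweep_expired(buckets, now, window_seconds):
    """Delete buckets whose whole window has elapsed; return how many were removed."""
    expired = [
        key for key, (start, _count) in buckets.items()
        if now - start >= window_seconds
    ]
    for key in expired:
        del buckets[key]
    return len(expired)


def admit_key(buckets, key, now, window_seconds, max_keys):
    """Register `key`, sweeping expired buckets first if the table is full."""
    if key not in buckets and len(buckets) >= max_keys:
        sweep_expired(buckets, now, window_seconds)
    buckets.setdefault(key, (now, 0))


# Example: "a" has expired by t=100 (window 60s), "b" has not.
table = {"a": (0.0, 1), "b": (50.0, 1)}
admit_key(table, "c", now=100.0, window_seconds=60.0, max_keys=2)
```

Unlike the original `len(q) >= max_events` followed by `not q` check, the sweep condition here depends only on time, so it can actually fire.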
…EDIUM Open-Finance-Lab#4 follow-up) The bot posted /api/strategies with no id header, so the server's write rate limiter (30/hr) fell back to the peer host — the one bot process's IP — making all Discord users share a single bucket. Send X-Browser-Id: discord:<user_id> so each user gets their own budget. +1 wiring guard test. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
…n-Finance-Lab#3/Open-Finance-Lab#8 follow-up) _finalize() minted run_id = ext_<second-resolution timestamp> with no uniqueness guard, persisted as a PRIMARY KEY via INSERT OR REPLACE. Two runs finalizing in the same second collided: the second overwrote the first's rows. Adversarial review showed this turns two later fixes into latent bugs — Open-Finance-Lab#3's plot.png cache would serve the overwritten run's chart forever, and Open-Finance-Lab#8's _step_decision_source could read a merged decision log. Append a uuid8 suffix (extract _new_ext_run_id; prefix preserved for baseline_resolver's startswith check). +1 uniqueness test. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
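
A sketch of an id generator with a uniqueness suffix. The timestamp format and the reading of "uuid8" as the first 8 hex characters of a random UUID are assumptions; the `ext_` prefix is kept because the message notes that other code checks it with `startswith`.

```python
import uuid
from datetime import datetime, timezone


def new_ext_run_id(now=None):
    """Build ext_<timestamp>_<8 hex chars>; the suffix separates same-second runs."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%d_%H%M%S")   # second resolution, as before
    suffix = uuid.uuid4().hex[:8]           # random part for uniqueness
    return f"ext_{stamp}_{suffix}"


# Two runs finalizing in the same second now get different primary keys.
same_second = datetime(2025, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
first_id = new_ext_run_id(same_second)
second_id = new_ext_run_id(same_second)
```

With distinct ids, `INSERT OR REPLACE` no longer overwrites a sibling run's rows, and a cache keyed by run_id can safely assume the data behind a key never changes.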
…IUM Open-Finance-Lab#12 follow-up) Adversarial review found the client-side initial_cash guard only checked the kwarg — a caller could still smuggle a non-default via config={"initial_cash": ...} because config is merged into run_config afterward (and it would then 400 at the backend, the exact doomed round-trip the fix set out to avoid). Validate the EFFECTIVE value from kwarg-or-config after the merge, then pop it so it's never sent. Also drop the now-de-advertised initial_cash= from all four dashboard/examples SDK scripts, the python-sdk-quickstart doc, and the protocol config example. +1 test (config-dict bypass rejected). SDK suite green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
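
Validating the effective value after the merge could look like the following sketch. `ATLValidationError` is a stand-in class, `build_run_config` is a hypothetical helper, and letting the keyword argument take precedence over `config` is an assumption.

```python
INITIAL_CAPITAL = 100_000  # the backend's fixed starting cash


class ATLValidationError(ValueError):
    """Stand-in for the SDK's client-side validation error."""

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code


def build_run_config(initial_cash=None, config=None):
    """Merge kwarg and config, check the effective initial_cash, never send it."""
    run_config = dict(config or {})
    if initial_cash is not None:
        run_config["initial_cash"] = initial_cash  # assumed: kwarg wins
    # Pop so the value is validated once and omitted from the wire payload.
    effective = run_config.pop("initial_cash", None)
    if effective is not None and effective != INITIAL_CAPITAL:
        raise ATLValidationError(
            "initial_cash_fixed",
            f"initial_cash is fixed at {INITIAL_CAPITAL}; got {effective}",
        )
    return run_config
```

Checking after the merge closes the `config={"initial_cash": ...}` route, since both entry points end up in the same dict before validation.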
…e-Lab#5 follow-up) The 308 redirect hardcoded url="/app", dropping the query string — but the frontend deep-links via query params on this route (?auth=login opens the auth modal, ?view=/?mode= drive navigation, generateShareURL builds shareable links). A bookmarked /app/?auth=login lost its params. Carry request.url.query through. +1 test. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
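
Carrying the query string through the redirect amounts to a small piece of string handling, sketched here as a pure function. The helper name is hypothetical; in the handler, the `query` argument would be the raw `request.url.query` value mentioned above.

```python
def app_redirect_target(query):
    """Return the /app redirect location, preserving any raw query string."""
    return f"/app?{query}" if query else "/app"


print(app_redirect_target("auth=login"))  # /app?auth=login
print(app_redirect_target(""))            # /app
```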
…MEDIUM Open-Finance-Lab#6 follow-up) initDateDefaults() formatted the past-7-days defaults with toISOString().slice(0,10), which is UTC — near local midnight in a non-UTC timezone the shown "today"/"7 days ago" could be off by a calendar day. Format from local getFullYear/getMonth/getDate. Guard test updated. Embedded JS node --check'd. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
…n holdings test (MEDIUM Open-Finance-Lab#7 follow-up) Adversarial review: (1) the corrected docstring pinned the RSI->trend ranking change to this file's Phase-2C3 move, but the change actually landed in an earlier, unrelated commit — reword to state it was made separately, without claiming when. (2) test_safe_trading_always_includes_current_holdings used a HELD fixture with rsi=20 (|20-50|=30), which the OLD RSI-extremity ranking would ALSO surface, so it didn't isolate the holdings-append behavior. Change to rsi=50 so HELD ranks last under both schemes and only appears via the holdings-append step. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
…e-Lab#9 follow-up) Adversarial review: §11 still omitted codes the impl actually raises. Verified against the source and added: unsupported_environment (400), result_not_found (404), too_many_active_runs (429), run_failed (500). Guard test extended to require them. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
…pen-Finance-Lab#11 follow-up) Adversarial review: the bot needs core backend deps too, so `pip install -r requirements-discord.txt` alone didn't make it runnable (only discord.py was declared). Add `-r requirements.txt` so the one command installs everything. Strengthen the guard test to require a real discord.py requirement line (regex, not a comment match) and the `-r requirements.txt` include. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
…list All actionable MEDIUM items (Open-Finance-Lab#2-Open-Finance-Lab#9,Open-Finance-Lab#11,Open-Finance-Lab#12,Open-Finance-Lab#13) done on pr-67-review with a 10-agent adversarial pass and a second round fixing every confirmed defect. Per-item SHAs + notes recorded. #1 (v1/v2) and Open-Finance-Lab#10 (landing rebuild) deferred. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
…nce-Lab#3 residual) Re-verification of the ext_ run_id fix found the same collision class in every other run_id-minting site: engine.py (agent_/buyhold_/djia_index_), the paper baselines (djia_paper_baseline_/bah_paper_baseline_), and paper_session ({agent_name}_{timestamp}) all built ids from a bare second-resolution timestamp, written as the agent_runs PRIMARY KEY via INSERT OR REPLACE, and are all servable through the same unrestricted /runs/{id}/plot.png cache. Append a uuid8 suffix at each so run_id uniqueness is a real invariant (the plot-cache premise, and Open-Finance-Lab#8's decision-log lookup, depend on it). Verified safe: nothing reconstructs a timestamp from run_ids (only startswith prefix checks + a comma-split); paper-baseline idempotency keys on mode; defaults.json references only pre-existing seed ids (generation change affects new runs only). +1 guard test across all four sites. Full suite: 5 pre-existing / 706 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
…x on the checklist Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu
This PR introduces a broader update to the Agentic Trading Lab platform.