docs: RLM FinanceBench 50/50 benchmark + ingestion time #63

Merged
miguelgfierro merged 1 commit into feat/rlm-integration from docs/rlm-benchmark
Jun 18, 2026

@miguelgfierro

Contributor

Adds docs/rlm-benchmark.md with the final FinanceBench 50/50 results and the ingestion time. Results for RLM in the default sandbox: Answer-Correctness 0.510 vs. hybrid RAG 0.434 and champion 0.497; 81/81, 0 sandbox failures; ~34 s/query. Ingestion took ~84 min for 184 filings and is embedding-dominated. RLM doesn't use the embeddings, so an RLM-only ingest could skip that step. Lands in PR #43. Targets feat/rlm-integration.

Final sandbox-mode run (rlm-final): Answer-Correctness 0.510 (> champion
0.497, > hybrid RAG 0.434), 0 sandbox failures, ~34s/query. Documents the
~84min/184-filing ingestion cost, which is embedding-dominated and unused
by RLM (an RLM-only ingest could skip embedding).
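
To put the quoted figures in relative terms, here is a minimal sketch that derives the gain over each baseline and the average ingestion time per filing. It uses only the numbers stated above; the variable names and the `relative_gain` helper are illustrative and do not come from the repository, and the ingestion time is approximate (~84 min), so the per-filing figure is approximate too.

```python
# Figures quoted in the PR description (ingestion time is approximate).
rlm_score = 0.510          # RLM, default sandbox
hybrid_rag_score = 0.434   # hybrid RAG baseline
champion_score = 0.497     # previous champion
ingest_minutes = 84        # "~84 min"
num_filings = 184


def relative_gain(new: float, baseline: float) -> float:
    """Fractional improvement of `new` over `baseline` (hypothetical helper)."""
    return (new - baseline) / baseline


gain_vs_rag = relative_gain(rlm_score, hybrid_rag_score)
gain_vs_champion = relative_gain(rlm_score, champion_score)
seconds_per_filing = ingest_minutes * 60 / num_filings

print(f"RLM vs hybrid RAG: {gain_vs_rag:+.1%}")
print(f"RLM vs champion:   {gain_vs_champion:+.1%}")
print(f"Ingestion:         ~{seconds_per_filing:.0f} s per filing")
```

Under these numbers the gain over hybrid RAG is roughly 17.5%, the gain over the champion is roughly 2.6%, and ingestion averages about 27 s per filing.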
miguelgfierro merged commit 3f0e6e3 into feat/rlm-integration Jun 18, 2026
miguelgfierro deleted the docs/rlm-benchmark branch June 18, 2026 14:10