docs: RLM FinanceBench 50/50 benchmark + ingestion time by miguelgfierro · Pull Request #63 · firefly-operationOS/flycanon

miguelgfierro · 2026-06-18T14:09:55Z

Adds docs/rlm-benchmark.md with the final FinanceBench 50/50 results and the ingestion time. Results (RLM in the default sandbox): Answer-Correctness 0.510 vs. hybrid-RAG 0.434 and champion 0.497; 81/81, 0 sandbox failures; ~34 s/query. Ingestion took ~84 min for 184 filings and is embedding-dominated; RLM does not use the embeddings, so an RLM-only ingest could skip that step. Lands in PR #43. Targets feat/rlm-integration.

Final sandbox-mode run (rlm-final): Answer-Correctness 0.510 (> champion 0.497, > hybrid-RAG 0.434), 0 sandbox failures, ~34 s/query. Documents the ~84 min ingestion cost for 184 filings, which is embedding-dominated and unused by RLM (an RLM-only ingest could skip embedding).
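For a quick sanity check of the numbers above, here is a minimal sketch that derives the score margins and the average ingestion time per filing. The figures are copied from this description; the names `SCORES`, `margin`, and `seconds_per_filing` are illustrative and do not come from the repository. The per-filing figure is an average over the whole ~84 min run, not an embedding-only cost, since the description gives no breakdown.

```python
# Figures copied from the PR description; all names here are hypothetical.
SCORES = {"rlm": 0.510, "champion": 0.497, "hybrid_rag": 0.434}
INGEST_MINUTES = 84  # approximate total ingestion time reported
FILINGS = 184


def margin(a: str, b: str) -> float:
    """Answer-Correctness difference between two systems, rounded to 3 places."""
    return round(SCORES[a] - SCORES[b], 3)


def seconds_per_filing(minutes: float, filings: int) -> float:
    """Average wall-clock ingestion time per filing, in seconds."""
    return minutes * 60 / filings


if __name__ == "__main__":
    print("RLM vs champion:  ", margin("rlm", "champion"))      # 0.013
    print("RLM vs hybrid RAG:", margin("rlm", "hybrid_rag"))    # 0.076
    print("s per filing:     ", round(seconds_per_filing(INGEST_MINUTES, FILINGS), 1))  # 27.4
```

So RLM leads the champion by about 0.013 and hybrid RAG by about 0.076, and ingestion averages roughly 27 s per filing.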

miguelgfierro merged commit 3f0e6e3 into feat/rlm-integration Jun 18, 2026

miguelgfierro deleted the docs/rlm-benchmark branch June 18, 2026 14:10

miguelgfierro merged 1 commit into feat/rlm-integration from docs/rlm-benchmark
