docs: RLM vs RAG benchmark (FinanceBench 50/50 + full) by miguelgfierro · Pull Request #64 · firefly-operationOS/flycanon

miguelgfierro · 2026-06-18T14:16:47Z

Reframes the benchmark doc as a comparison of RLM against hybrid vector RAG on FinanceBench 50/50 and full, matching the experiments-repo README format (RAGAS, custom, and retrieval metrics, plus time and cost; ordered by Answer-Correctness). RLM wins on answer quality on both datasets (AnsCorr 0.497/0.501 vs. best-RAG 0.434/0.422). The doc also reports the vector embedding-ingest cost (~1h16m / ~2h36m) against RLM's lazy no-ingest approach, and the 50/50 production-sandbox validation (0.510, 0 failures). Renames docs/rlm-benchmark.md → docs/rlm-vs-rag-benchmark.md. Lands in PR #43.
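To put the AnsCorr figures above in perspective, here is a minimal sketch that computes the absolute and relative gap between RLM and the best RAG configuration. It assumes the slash-separated values follow the order "50/50, full"; the dictionary keys and the `relative_gain` helper are illustrative names, not part of the benchmark code.

```python
# AnsCorr pairs quoted in the PR description: (RLM, best RAG).
# Assumption: the slash-separated values are ordered "50/50, full".
scores = {
    "FinanceBench 50/50": (0.497, 0.434),
    "FinanceBench full": (0.501, 0.422),
}


def relative_gain(rlm: float, rag: float) -> float:
    """Relative improvement of RLM over the best RAG score, as a fraction."""
    return (rlm - rag) / rag


# Hypothetical helper output: one relative gain per dataset.
gains = {name: relative_gain(rlm, rag) for name, (rlm, rag) in scores.items()}

for name, gain in gains.items():
    rlm, rag = scores[name]
    print(f"{name}: absolute +{rlm - rag:.3f}, relative +{gain:.1%}")
```

Under that ordering assumption, the gap works out to roughly 0.06 to 0.08 AnsCorr points, or about 15% to 19% relative to the best RAG score.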

Rename rlm-benchmark.md -> rlm-vs-rag-benchmark.md; match the experiments README format; RLM vs hybrid vector RAG across both datasets with retrieval + generation metrics, time (incl. the embedding-heavy vector ingest vs RLM's lazy no-ingest), and cost; PageIndex omitted (RLM-vs-RAG view).

miguelgfierro merged commit 8a7a76c into feat/rlm-integration Jun 18, 2026

miguelgfierro deleted the docs/rlm-vs-rag-benchmark branch June 18, 2026 14:17

miguelgfierro merged 1 commit into feat/rlm-integration from docs/rlm-vs-rag-benchmark
