Remove the serialization size bottleneck by FTRobbin · Pull Request #25 · ajpal/poach

FTRobbin · 2026-03-03T01:27:22Z

Replace serde_json with serde/flexbuffers.
Implement a debug feature that prints the size of serialized egraphs.

Output example:

cargo run --bin poach tests/taylor51.egg tempo/ size-report

egglog::EGraph : 4.97GB
. rulesets : 4.06GB (81.82%)
. functions : 0.44GB (8.95%)
. backend : 13.59MB (0.27%)
. type_info : 0.24MB (0.00%)
. overall_run_report : 148.99KB (0.00%)
. proof_state : 78B (0.00%)
. schedulers : 27B (0.00%)
. commands : 7B (0.00%)
. command_macros : 7B (0.00%)
. pushed_egraph : 3B (0.00%)
[1/1] taylor51 : SUCCESS
0 failures out of 1 files

*These numbers differ slightly from the Mar 2nd meeting version: for the meeting I eyeballed the sizes from the raw byte counts and used base 1000 instead of 1024.
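To make the base-1000 vs. base-1024 difference concrete, here is a minimal sketch of how a size line like the ones above could be formatted. The names `human_size` and `report_line` are hypothetical and are not taken from the PR; the actual SizeReport code may differ.

```rust
/// Format a byte count with two decimals, scaling by `base` (1024.0 or 1000.0).
/// Hypothetical helper, not the PR's implementation.
fn human_size(bytes: u64, base: f64) -> String {
    const UNITS: [&str; 4] = ["B", "KB", "MB", "GB"];
    let mut value = bytes as f64;
    let mut unit = 0;
    // Step up one unit at a time until the value fits or we run out of units.
    while value >= base && unit < UNITS.len() - 1 {
        value /= base;
        unit += 1;
    }
    if unit == 0 {
        // Plain byte counts are printed without decimals, e.g. "78B".
        format!("{}B", bytes)
    } else {
        format!("{:.2}{}", value, UNITS[unit])
    }
}

/// One report line: name, base-1024 size, and share of the total.
fn report_line(name: &str, bytes: u64, total: u64) -> String {
    let percent = 100.0 * bytes as f64 / total as f64;
    format!("{} : {} ({:.2}%)", name, human_size(bytes, 1024.0), percent)
}
```

With this sketch, 5,000,000,000 bytes comes out as 5.00GB in base 1000 but 4.66GB in base 1024, which is the kind of gap the footnote describes.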

Investigate the problem in rulesets.

The problem is in span. The egglog frontend uses spans throughout its ASTs for debugging.

pub enum Span {
    Panic,
    Egglog(Arc<EgglogSpan>),
    Rust(Arc<RustSpan>),
}

...

pub struct EgglogSpan {
    pub file: Arc<SrcFile>,
    pub i: usize,
    pub j: usize,
}

However, the default implementation generated by #[derive(serde::Serialize)] serializes the contents of Arc<SrcFile> by value, so every node in the AST carries its own copy of the whole input egglog program. The resulting blowup scales quadratically with the input file size (# AST nodes * file size).
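The quadratic growth can be illustrated with a std-only toy model. This is a sketch, not serde: `SrcFile` and `Span` below are simplified stand-ins for the egglog types, and `encode_by_value` and `total_size` are hypothetical helpers that mimic writing out the pointee of an `Arc` every time it is encountered.

```rust
use std::sync::Arc;

/// Simplified stand-ins for the egglog types (assumed shapes, for illustration only).
struct SrcFile {
    contents: String,
}

struct Span {
    file: Arc<SrcFile>,
    i: usize,
    j: usize,
}

/// By-value encoding: the shared file contents are written again for every span,
/// even though all spans point at the same `Arc<SrcFile>` in memory.
fn encode_by_value(span: &Span, out: &mut Vec<u8>) {
    out.extend_from_slice(span.file.contents.as_bytes());
    out.extend_from_slice(&(span.i as u64).to_le_bytes());
    out.extend_from_slice(&(span.j as u64).to_le_bytes());
}

/// Total encoded size for `nodes` spans over one shared file of `file_len` bytes.
fn total_size(file_len: usize, nodes: usize) -> usize {
    let file = Arc::new(SrcFile {
        contents: "x".repeat(file_len),
    });
    let mut out = Vec::new();
    for n in 0..nodes {
        let span = Span {
            file: Arc::clone(&file),
            i: n,
            j: n + 1,
        };
        encode_by_value(&span, &mut out);
    }
    out.len()
}
```

In this model each span costs `file_len + 16` bytes, so the total is `nodes * (file_len + 16)`. Since the number of AST nodes grows with the file, doubling the input roughly quadruples the output.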

Remove the span serialization bottleneck.

I implemented a quick hack that serializes Span as unit. For the example tested above, this gives roughly a 300x overall reduction, with about a 1600x reduction for rulesets and about a 600x reduction for functions:

&egglog::EGraph : 17.12MB
  backend : 13.56MB (79.24%)
  rulesets : 2.53MB (14.80%)
  functions : 0.72MB (4.25%)
  type_info : 0.24MB (1.42%)
  overall_run_report : 148.99KB (0.85%)
  proof_state : 78B (0.00%)
  schedulers : 27B (0.00%)
  pushed_egraph : 3B (0.00%)
[1/1] taylor51 : SUCCESS
0 failures out of 1 files
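The reduction factors can be rechecked from the two reports. The sketch below only does the arithmetic on the rounded figures printed above, so the results are approximate; `to_bytes` and `reduction` are hypothetical helper names.

```rust
/// Convert a report figure such as (4.97, "GB") to bytes, using base 1024.
fn to_bytes(value: f64, unit: &str) -> f64 {
    let exp = match unit {
        "B" => 0,
        "KB" => 1,
        "MB" => 2,
        "GB" => 3,
        other => panic!("unknown unit: {}", other),
    };
    value * 1024.0_f64.powi(exp)
}

/// How many times smaller `after` is than `before`.
fn reduction(before: f64, after: f64) -> f64 {
    before / after
}
```

Plugging in the report values gives about 297x overall (4.97GB to 17.12MB), about 1643x for rulesets (4.06GB to 2.53MB), and about 626x for functions (0.44GB to 0.72MB).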

Add experiments previously blocked by the serialization size.

Run extract runmode on easteregg, herbie-hamming, herbie-math-rewrite, herbie-math-taylor benchmarks.

Fixed minor bugs.

Update nightly frontend to present the new data.

Added radio buttons to pick a benchmark.

Added a new chart to show the speedup of using Poach vs. Vanilla.

Added a checkbox for including deserialization time in Poach time.
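For reference, the quantity plotted in the speedup chart can be sketched as below. This assumes the usual definition of speedup (Vanilla time divided by Poach time) and models the checkbox as a boolean; the function name and parameters are hypothetical, and the nightly frontend itself is not written in Rust.

```rust
/// Speedup of Poach over Vanilla for one benchmark.
/// When `include_deser` is true, deserialization time counts toward Poach time.
fn speedup(vanilla_secs: f64, poach_secs: f64, deser_secs: f64, include_deser: bool) -> f64 {
    let poach_total = if include_deser {
        poach_secs + deser_secs
    } else {
        poach_secs
    };
    vanilla_secs / poach_total
}
```

For example, with a 10 s Vanilla run, a 2 s Poach run, and 3 s of deserialization, the speedup is 5x with the box unchecked and 2x with it checked.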

Get a nightly run. Link
List what research questions to ask and how to look at the data.
Add file size statistics to the nightly report. (Link in the #poach channel)
Is the serialization size linear in the egraph size? - Yes (~500 bytes per tuple)
Is the serialization/deserialization time linear in the serialization size? - Yes
Clean up hacks.
Get another nightly run. Link Results reproduced.
Investigate whether egglog's hashcons works the way I thought it does
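One way to check the linearity questions in the list above against the size data is a least-squares fit through the origin. This is a sketch under assumptions: the input is a list of (tuple count, serialized bytes) pairs, both function names are hypothetical, and this is not necessarily how the nightly report computes its numbers.

```rust
/// Least-squares slope of a line through the origin: bytes per tuple.
/// Assumes at least one sample with a nonzero tuple count.
fn bytes_per_tuple(samples: &[(f64, f64)]) -> f64 {
    let sum_xy: f64 = samples.iter().map(|(x, y)| x * y).sum();
    let sum_xx: f64 = samples.iter().map(|(x, _)| x * x).sum();
    sum_xy / sum_xx
}

/// Largest relative deviation of any sample from the fitted line.
/// A small value suggests the relationship is close to linear.
/// Assumes every sample has a nonzero byte count.
fn max_relative_error(samples: &[(f64, f64)], slope: f64) -> f64 {
    samples
        .iter()
        .map(|(x, y)| ((slope * x - y) / y).abs())
        .fold(0.0, f64::max)
}
```

On made-up, perfectly linear data such as (1000 tuples, 500000 bytes) and (2000 tuples, 1000000 bytes), the fit returns 500 bytes per tuple with zero relative error.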

FTRobbin added 21 commits February 23, 2026 10:07

Use Flexbuffer

2374094

Implement SizeReport

dae76d2

Dig deeper into the size blowup

156f463

Serialize span into unit

92cc333

Add control for how much size information to output

c37fd3a

Merge remote-tracking branch 'origin' into haobin-mining

a099964

Extract experiment runs

4234f79

Tweak nightly frontent to display extract experiment results

9c85469

Show egraph size in size report

54533db

Add include ser time option, add a speedup graph

dcf81e5

Merge remote-tracking branch 'origin' into haobin-mining

41a6fe8

fmt

63d2be2

Skip tests because containers are not yet supported

c54b1a2

Merge remote-tracking branch 'origin' into haobin-mining

c15978f

Comment local dev setup

85dcdcf

Output a csv file with serialization size data

1d46162

fmt

a575829

Hacks

96ea226

fmt

53cb8f8

More more evil hacks

78f79fb

Remove Easteregg from the list of experiments

41742d6

ajpal mentioned this pull request Mar 13, 2026

Run rules after deserialization #30

Open

FTRobbin added 4 commits March 26, 2026 15:06

Clean up evil hacks

35fa1d9

fmt

a939da6

Merge remote-tracking branch 'origin' into haobin-mining

9b6eaf1

fmt

0e3ffb4

FTRobbin wants to merge 25 commits into main from haobin-mining
