buffa's generated view code currently decodes packed repeated scalar fields by pushing into RepeatedView one value at a time, without reserving capacity first:
    while !pcur.is_empty() {
        view.values.push(::buffa::types::decode_uint32(&mut pcur)?);
    }
This can become allocation-heavy when decoding many messages that each contain small packed repeated fields. In an MVT decoding workload, this showed up prominently as Vec growth / realloc / malloc time in flamegraphs.
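To illustrate where the growth cost comes from, here is a minimal sketch that counts how often a vector's capacity changes while 16 values are pushed into it. It uses a plain std `Vec<u32>` as a stand-in for RepeatedView (an assumption; RepeatedView's internals may differ), and `count_capacity_changes` is a hypothetical helper, not part of buffa.

```rust
/// Hypothetical helper: pushes `n` values into `v` and returns how many
/// times the capacity changed while pushing. Each change corresponds to
/// an allocation or reallocation performed during a `push`.
fn count_capacity_changes(mut v: Vec<u32>, n: u32) -> usize {
    let mut changes = 0;
    let mut cap = v.capacity();
    for i in 0..n {
        v.push(i);
        if v.capacity() != cap {
            changes += 1;
            cap = v.capacity();
        }
    }
    changes
}
```

Calling `count_capacity_changes(Vec::new(), 16)` reports at least one growth step (the exact number depends on the standard library's growth policy), while `count_capacity_changes(Vec::with_capacity(16), 16)` reports none, because the single allocation happens up front. Multiplied by tens of thousands of small fields, the extra growth steps are what shows up as realloc / malloc time.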
I created a minimal repro here: https://github.com/nyurik/buffa-perf-optimization
The repro encodes a Tile with 65,536 Feature messages, each containing 16 packed uint32 values. This matches the problematic shape: many small repeated vectors rather than one large vector.
On my machine, changing the generated view code to decode into a preallocated temporary vector improves runtime from about 9.3s to 7.6s, i.e. about 18% less wall time (roughly a 1.22x speedup):
    let mut values = ::buffa::alloc::vec::Vec::with_capacity(payload.len());
    while !pcur.is_empty() {
        values.push(::buffa::types::decode_uint32(&mut pcur)?);
    }
    if view.values.is_empty() {
        view.values = values.into();
    } else {
        for value in values {
            view.values.push(value);
        }
    }
A cleaner upstream fix may be to expose RepeatedView::reserve() and have generated code reserve before decoding packed values:
    view.values.reserve(payload.len());
    while !pcur.is_empty() {
        view.values.push(::buffa::types::decode_uint32(&mut pcur)?);
    }
payload.len() is a safe upper bound for the number of decoded varints, since each varint is at least one byte. This preserves protobuf repeated-field merge semantics while avoiding repeated vector growth for packed fields.
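The reserve-then-decode idea can be sketched without buffa at all. The code below is a self-contained approximation, not buffa's implementation: `decode_varint` and `merge_packed_uint32` are hypothetical names, a plain `Vec<u32>` stands in for RepeatedView, and errors are reduced to `Option` for brevity.

```rust
/// Hypothetical stand-in for a varint decoder: reads one base-128 varint
/// from the front of `buf` and advances `buf` past it.
/// Returns None if the input is truncated or longer than 10 bytes.
fn decode_varint(buf: &mut &[u8]) -> Option<u64> {
    let mut result: u64 = 0;
    for i in 0..10 {
        let (&byte, rest) = buf.split_first()?;
        *buf = rest;
        result |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some(result);
        }
    }
    None // continuation bit still set after 10 bytes: malformed
}

/// Appends all packed uint32 values in `payload` to `values`.
/// Appending (rather than replacing) keeps repeated-field merge semantics.
fn merge_packed_uint32(values: &mut Vec<u32>, payload: &[u8]) -> Option<()> {
    // Each varint occupies at least one byte, so payload.len() is an
    // upper bound on the number of elements this payload can add.
    values.reserve(payload.len());
    let mut cur = payload;
    while !cur.is_empty() {
        // Simplification: truncate to 32 bits after decoding as u64.
        values.push(decode_varint(&mut cur)? as u32);
    }
    Some(())
}
```

For example, merging the payload `[0x01, 0xac, 0x02, 0x00]` into a vector that already holds `7` yields `[7, 1, 300, 0]` with a single reservation. One trade-off to keep in mind: because the bound counts bytes rather than values, payloads dominated by multi-byte varints will over-reserve, by up to 5x for uint32.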