Because write throughput is bound by disk I/O, compressing events during serialization could improve throughput at the cost of additional CPU (see: proof of concept).
If possible, per-event compression should be delivered within the existing v2 PQ page format, in which each entry contains only a seqnum, a length, and N bytes of data. To do this, the reader must be able to handle both compressed and uncompressed bytes without additional context (e.g., by distinguishing a zlib header from the first bytes of existing CBOR-encoded events).
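A minimal Java sketch of that detection idea follows. It assumes (1) compressed entries are plain zlib streams (RFC 1950), as `java.util.zip.Deflater` emits by default, and (2) every existing uncompressed entry begins with a CBOR map (major type 5) or tag (major type 6) initial byte, i.e. a first byte in 0xA0 to 0xDB. Under those assumptions the two cannot be confused: a valid zlib CMF byte has compression method 8 in its low nibble and a window size of at most 7 in its high nibble, so it is always one of 0x08, 0x18, 0x28, 0x38, 0x48, 0x58, 0x68, or 0x78, none of which starts a CBOR map or tag. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical codec for a single PQ entry payload (the "N bytes" after seqnum+length). */
final class PqEntryCodec {
    private PqEntryCodec() {}

    /** True if the first two bytes form a valid zlib header: CM=8 (deflate), CINFO<=7, FCHECK ok. */
    static boolean looksLikeZlib(byte[] data) {
        if (data.length < 2) return false;
        int cmf = data[0] & 0xFF;
        int flg = data[1] & 0xFF;
        boolean deflate = (cmf & 0x0F) == 8;          // compression method must be deflate
        boolean windowOk = (cmf >>> 4) <= 7;          // window size of at most 32 KiB
        boolean checkOk = ((cmf << 8) | flg) % 31 == 0; // header checksum
        return deflate && windowOk && checkOk;
    }

    /** Compresses serialized CBOR bytes into a zlib stream (assumed writer-side behavior). */
    static byte[] compress(byte[] cbor) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED); // nowrap=false, so a zlib header is written
        try {
            deflater.setInput(cbor);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native resources
        }
    }

    /** Returns CBOR bytes: inflates zlib payloads, passes legacy uncompressed payloads through unchanged. */
    static byte[] decodeEntry(byte[] data) throws DataFormatException {
        if (!looksLikeZlib(data)) return data; // legacy entry written without compression
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(data);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated or dictionary-dependent zlib stream");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            inflater.end();
        }
    }
}
```

Because `decodeEntry` returns unrecognized bytes unchanged, a page that mixes entries written before and after compression was enabled could be read without any per-entry flag, which is what keeps the v2 entry layout untouched.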
Because not all users will want to spend CPU for increased throughput, and because of the rollback barrier described below, this feature should first be delivered as opt-in, preferably at a per-pipeline level.
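One way to keep compression opt-in per pipeline is sketched below in Java. The setting key `queue.compression` and the `none`/`speed`/`size` values are hypothetical, not an existing Logstash setting; the point is that a pipeline which never mentions the setting resolves to `NONE` and writes uncompressed entries exactly as it does today.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical per-pipeline write-side compression mode; NONE keeps the feature opt-in. */
enum QueueCompressionMode {
    NONE,  // default: write uncompressed CBOR, as today
    SPEED, // e.g. a fast deflate level
    SIZE;  // e.g. a slower deflate level for a better ratio

    /** Hypothetical setting key, used here for illustration only. */
    static final String SETTING_KEY = "queue.compression";

    boolean compressesOnWrite() {
        return this != NONE;
    }

    /** Resolves one pipeline's mode: an absent key means NONE, an unknown value fails fast. */
    static QueueCompressionMode forPipeline(Map<String, String> pipelineSettings) {
        String raw = pipelineSettings.getOrDefault(SETTING_KEY, "none");
        return valueOf(raw.trim().toUpperCase(Locale.ROOT)); // throws IllegalArgumentException on typos
    }
}
```

Only the write path would consult this mode; decoding would stay enabled regardless, so a pipeline that turns compression off again can still drain entries written while it was on. Rejecting unknown values is a deliberate choice in this sketch so that a typo cannot silently leave compression disabled.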
Compatibility Considerations
Once a queue contains compressed events, it cannot be read by a Logstash instance that does not support event decompression. This creates an undesirable rollback barrier: a user who needs to roll back to a last known-working configuration because of an unrelated issue would be unable to do so.
Queue compression should be implemented as opt-in until at least three minor versions have shipped with decompression support.
Design Requirements
- compression is opt-in for at least 2 minor releases
- reads compressed events from the queue unless explicitly configured otherwise
- include metrics in the `pipeline.${pipeline_id}.queue.compression` namespace:

| name | definition | expected value range |
|---|---|---|
| `encode.spend.lifetime` | `encode_time / uptime` | [0, N_CPUS] |
| `encode.ratio.lifetime` | `compressed_bytes / decompressed_bytes` | [0, 1] |
| `decode.spend.lifetime` | `decode_time / uptime` | [0, N_CPUS] |
| `decode.ratio.lifetime` | `decompressed_bytes / compressed_bytes` | [1, ∞) |
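The four metrics could be derived from a handful of monotonically increasing counters. The Java sketch below is hypothetical (class and method names are illustrative). It assumes time is measured with `System.nanoTime()` and that `encode_time` and `decode_time` are summed across all threads, which is why a spend value can exceed 1.0 and approach N_CPUS when every core is busy compressing.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical lifetime counters backing the queue.compression metrics. */
final class CompressionMetrics {
    private final long startNanos;                            // pipeline start, from System.nanoTime()
    private final LongAdder encodeNanos = new LongAdder();    // total time compressing, all threads
    private final LongAdder decodeNanos = new LongAdder();    // total time decompressing, all threads
    private final LongAdder encodeInBytes = new LongAdder();  // decompressed bytes given to the encoder
    private final LongAdder encodeOutBytes = new LongAdder(); // compressed bytes produced by the encoder
    private final LongAdder decodeInBytes = new LongAdder();  // compressed bytes given to the decoder
    private final LongAdder decodeOutBytes = new LongAdder(); // decompressed bytes produced by the decoder

    CompressionMetrics(long startNanos) {
        this.startNanos = startNanos;
    }

    void recordEncode(long elapsedNanos, long decompressedBytes, long compressedBytes) {
        encodeNanos.add(elapsedNanos);
        encodeInBytes.add(decompressedBytes);
        encodeOutBytes.add(compressedBytes);
    }

    void recordDecode(long elapsedNanos, long compressedBytes, long decompressedBytes) {
        decodeNanos.add(elapsedNanos);
        decodeInBytes.add(compressedBytes);
        decodeOutBytes.add(decompressedBytes);
    }

    /** encode.spend.lifetime = encode_time / uptime */
    double encodeSpendLifetime(long nowNanos) {
        return safeDivide(encodeNanos.sum(), nowNanos - startNanos);
    }

    /** encode.ratio.lifetime = compressed_bytes / decompressed_bytes */
    double encodeRatioLifetime() {
        return safeDivide(encodeOutBytes.sum(), encodeInBytes.sum());
    }

    /** decode.spend.lifetime = decode_time / uptime */
    double decodeSpendLifetime(long nowNanos) {
        return safeDivide(decodeNanos.sum(), nowNanos - startNanos);
    }

    /** decode.ratio.lifetime = decompressed_bytes / compressed_bytes */
    double decodeRatioLifetime() {
        return safeDivide(decodeOutBytes.sum(), decodeInBytes.sum());
    }

    /** Returns 0.0 instead of NaN or Infinity when nothing has been recorded yet. */
    private static double safeDivide(long numerator, long denominator) {
        return denominator <= 0 ? 0.0 : (double) numerator / denominator;
    }
}
```

A caller would wrap each compress or decompress call with two `System.nanoTime()` readings and pass the elapsed time plus byte counts to `recordEncode` or `recordDecode`. `LongAdder` keeps those hot-path updates cheap under contention; reporting 0.0 before any data exists is a choice of this sketch, since a ratio has no meaningful value at that point.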