netlink, nltest: add ReceiveIter to stream responses #258

Merged
nickgarlis merged 1 commit into mdlayher:main from nickgarlis:add-iterator
Mar 22, 2026
@nickgarlis
Collaborator

@nickgarlis nickgarlis commented Feb 25, 2026

Add ReceiveIter() method that returns an iter.Seq2[Message, error] for iterating over netlink messages without collecting them into a slice. Refactor lockedReceive() to use the new lockedReceiveIter() iterator internally to eliminate code duplication.

I've tested this with google/nftables and got a notable 33% memory pressure reduction on large responses.

This is in an exploratory phase but is looking promising so far. I think that in order to evaluate this approach, the following should be addressed:

  • Figure out what the behavior should be when the user stops iterating. Should the socket buffer be drained, or should that be the user's responsibility? What happens in an async scenario?
  • Make sure it doesn't disrupt packages using other netlink families (I've tested this change with wgctrl, rtnetlink, go-tc and all tests are passing.)
  • Add tests/benchmarks

cc @aojea

nickgarlis added a commit to nickgarlis/nftables that referenced this pull request Feb 25, 2026
Currently, listing many rules or set elements generates a lot of
intermediate slices, which increases memory usage unnecessarily.

This change unmarshals elements during iteration, avoiding these
intermediate allocations. Benchmarks show improved performance,
particularly when reading rules.

Depends on: mdlayher/netlink#258
@mdlayher
Owner

Hi @nickgarlis, thanks for the contributions. I've added you as a collaborator since you're actively making improvements to the ecosystem. Thank you!

@nickgarlis
Collaborator Author

> Hi @nickgarlis, thanks for the contributions. I've added you as a collaborator since you're actively making improvements to the ecosystem. Thank you!

Hi @mdlayher, thanks for the invite! I appreciate the trust. I’ll do my best to be helpful and keep things in good shape.

Add ReceiveIter() method that returns an iter.Seq2[Message, error] for
iterating over netlink messages without collecting them into a slice.

Refactor lockedReceive() to use the new lockedReceiveIter() iterator
internally to eliminate code duplication.

Reduces memory usage for large responses, particularly when using
ReceiveIter. If iteration is stopped early on multi-part responses, the
remaining buffer is drained to keep the socket in a consistent state.
@nickgarlis nickgarlis changed the title from "Add ReceiveSeq iterator for streaming message responses" to "netlink, nltest: add ReceiveIter to stream responses" Mar 19, 2026
@nickgarlis
Collaborator Author

I benchmarked this change using BenchmarkNftablesDump with the current nftables implementation (using Receive), and observed the following results:

goos: linux
goarch: amd64
pkg: github.com/mdlayher/netlink/internal/integration
cpu: 13th Gen Intel(R) Core(TM) i9-13900H
                      │    v1.txt    │               v2.txt               │
                      │    sec/op    │   sec/op     vs base               │
NftablesDump/1-20        65.18µ ± 7%   65.49µ ± 8%       ~ (p=0.796 n=10)
NftablesDump/8-20       103.67µ ± 3%   95.67µ ± 8%  -7.72% (p=0.002 n=10)
NftablesDump/64-20       297.5µ ± 2%   296.0µ ± 3%       ~ (p=0.353 n=10)
NftablesDump/512-20      1.835m ± 3%   1.819m ± 3%       ~ (p=0.436 n=10)
NftablesDump/4096-20     18.45m ± 3%   18.21m ± 3%  -1.29% (p=0.023 n=10)
NftablesDump/32768-20    225.7m ± 4%   225.3m ± 8%       ~ (p=0.971 n=10)
geomean                  1.577m        1.550m       -1.71%

                      │    v1.txt    │               v2.txt                │
                      │     B/op     │     B/op      vs base               │
NftablesDump/1-20       11.18Ki ± 0%   11.34Ki ± 0%  +1.43% (p=0.000 n=10)
NftablesDump/8-20       23.90Ki ± 0%   23.18Ki ± 0%  -3.00% (p=0.000 n=10)
NftablesDump/64-20      125.5Ki ± 0%   122.1Ki ± 0%  -2.69% (p=0.000 n=10)
NftablesDump/512-20     953.5Ki ± 0%   916.4Ki ± 0%  -3.89% (p=0.000 n=10)
NftablesDump/4096-20    7.974Mi ± 0%   7.435Mi ± 0%  -6.77% (p=0.000 n=10)
NftablesDump/32768-20   66.97Mi ± 0%   64.22Mi ± 0%  -4.10% (p=0.000 n=10)
geomean                 511.5Ki        495.1Ki       -3.20%

                      │   v1.txt    │               v2.txt               │
                      │  allocs/op  │  allocs/op   vs base               │
NftablesDump/1-20        146.0 ± 0%    156.0 ± 0%  +6.85% (p=0.000 n=10)
NftablesDump/8-20        407.0 ± 0%    417.0 ± 0%  +2.46% (p=0.000 n=10)
NftablesDump/64-20      2.486k ± 0%   2.494k ± 0%  +0.32% (p=0.000 n=10)
NftablesDump/512-20     19.06k ± 0%   19.02k ± 0%  -0.19% (p=0.000 n=10)
NftablesDump/4096-20    151.6k ± 0%   151.2k ± 0%  -0.26% (p=0.000 n=10)
NftablesDump/32768-20   1.212M ± 0%   1.208M ± 0%  -0.27% (p=0.000 n=10)
geomean                 8.959k        9.089k       +1.45%

I then updated the consumer to use ReceiveIter as shown in this PR google/nftables#357, and got the following results:

goos: linux
goarch: amd64
pkg: github.com/mdlayher/netlink/internal/integration
cpu: 13th Gen Intel(R) Core(TM) i9-13900H
                      │    v1.txt    │                v3.txt                │
                      │    sec/op    │    sec/op     vs base                │
NftablesDump/1-20        65.18µ ± 7%   63.21µ ±  7%        ~ (p=0.739 n=10)
NftablesDump/8-20       103.67µ ± 3%   90.83µ ± 12%  -12.38% (p=0.002 n=10)
NftablesDump/64-20       297.5µ ± 2%   302.2µ ±  5%        ~ (p=0.579 n=10)
NftablesDump/512-20      1.835m ± 3%   1.840m ±  3%        ~ (p=1.000 n=10)
NftablesDump/4096-20     18.45m ± 3%   16.40m ±  4%  -11.11% (p=0.000 n=10)
NftablesDump/32768-20    225.7m ± 4%   201.7m ±  2%  -10.62% (p=0.000 n=10)
geomean                  1.577m        1.481m         -6.04%

                      │    v1.txt    │                v3.txt                │
                      │     B/op     │     B/op      vs base                │
NftablesDump/1-20       11.18Ki ± 0%   11.32Ki ± 0%   +1.25% (p=0.000 n=10)
NftablesDump/8-20       23.90Ki ± 0%   22.06Ki ± 0%   -7.67% (p=0.000 n=10)
NftablesDump/64-20      125.5Ki ± 0%   110.9Ki ± 0%  -11.67% (p=0.000 n=10)
NftablesDump/512-20     953.5Ki ± 0%   820.9Ki ± 0%  -13.91% (p=0.000 n=10)
NftablesDump/4096-20    7.974Mi ± 0%   6.419Mi ± 0%  -19.50% (p=0.000 n=10)
NftablesDump/32768-20   66.97Mi ± 0%   51.34Mi ± 0%  -23.34% (p=0.000 n=10)
geomean                 511.5Ki        445.8Ki       -12.83%

                      │   v1.txt    │               v3.txt               │
                      │  allocs/op  │  allocs/op   vs base               │
NftablesDump/1-20        146.0 ± 0%    158.0 ± 0%  +8.22% (p=0.000 n=10)
NftablesDump/8-20        407.0 ± 0%    413.0 ± 0%  +1.47% (p=0.000 n=10)
NftablesDump/64-20      2.486k ± 0%   2.484k ± 0%  -0.08% (p=0.000 n=10)
NftablesDump/512-20     19.06k ± 0%   19.01k ± 0%  -0.27% (p=0.000 n=10)
NftablesDump/4096-20    151.6k ± 0%   151.2k ± 0%  -0.27% (p=0.000 n=10)
NftablesDump/32768-20   1.212M ± 0%   1.208M ± 0%  -0.27% (p=0.000 n=10)
geomean                 8.959k        9.086k       +1.42%

@nickgarlis nickgarlis marked this pull request as ready for review March 19, 2026 19:59
nickgarlis added a commit to nickgarlis/nftables that referenced this pull request Mar 19, 2026
Currently, listing many rules or set elements generates a lot of
intermediate slices, which increases memory usage unnecessarily.

This change unmarshals elements during iteration, avoiding these
intermediate allocations. Benchmarks show improved performance,
particularly when reading rules.

Depends on: mdlayher/netlink#258
@nickgarlis nickgarlis merged commit 5af0e4f into mdlayher:main Mar 22, 2026
8 checks passed