Redfs ubuntu HWE improve writeback performance by hbirth · Pull Request #184 · DDNStorage/linux

hbirth · 2026-06-24T09:34:20Z

No description provided.

Signed-off-by: Horst Birthelmer <horst@birthelmer.de>

Writes that already match the alignment advertised via FUSE_ALIGN_PG_ORDER gain nothing from the writeback cache and can degrade into page-sized WRITE requests under dirty throttling. Send them through fuse_perform_write() instead, which packs requests up to max_write and keeps them stripe-aligned for the backend. They create no dirty pages, so no DLM write lock needs to be cached for them. Unaligned writes keep using the writeback cache.

Also clarify in the uapi header that align_page_order is the log2 of the alignment in bytes, not in pages.

Ported from the redfs-ubuntu-noble-writethrough-split branch and adapted to the iomap-based writeback path: the decision gates the writeback bool in fuse_cache_write_iter() (and the DLM write-lock acquisition) instead of branching to a writethrough label.

Signed-off-by: Horst Birthelmer <horst@birthelmer.de>
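The alignment test described in this commit message can be illustrated with a small userspace sketch. This is not the patch's code: `write_is_aligned()` is a hypothetical helper, and requiring both the offset and the length to be aligned is an assumption. The only detail taken from the commit message is that the order is the log2 of the alignment in bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not taken from the patch): returns true when both
 * the file offset and the length are multiples of 2^align_order bytes.
 * align_order is the log2 of the alignment in bytes, not in pages.
 */
static bool write_is_aligned(uint64_t pos, size_t len, unsigned int align_order)
{
	uint64_t mask;

	assert(align_order < 64);	/* keep the shift well defined */
	if (len == 0)
		return false;		/* nothing to send */

	mask = ((uint64_t)1 << align_order) - 1;
	return ((pos | (uint64_t)len) & mask) == 0;
}
```

With `align_order = 20` (1 MiB), a 1 MiB write at offset 0 would qualify for the direct path in this sketch, while the same write at offset 4096 would stay on the writeback cache.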

Add a per-connection size threshold, settable via fusectl as writethrough_threshold, that sends buffered writes >= threshold through fuse_perform_write() regardless of alignment. The knob is off by default (0 == disabled) and leaves the existing alignment-based decision in place for writes below the threshold.

Ported from the redfs-ubuntu-noble-writethrough-split branch; the fusectl dentry uses this branch's fuse_ctl_add_dentry() signature and the ops struct omits the now-removed no_llseek.

Signed-off-by: Horst Birthelmer <horst@birthelmer.de>
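How the threshold combines with the alignment-based decision might look roughly like the sketch below. `struct conn_cfg` and `use_writethrough()` are hypothetical names chosen for illustration; only the semantics (0 disables the knob, writes at or above the threshold bypass the writeback cache, smaller writes fall back to the alignment decision) come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-connection setting; 0 means disabled. */
struct conn_cfg {
	size_t writethrough_threshold;
};

/*
 * Returns true when a buffered write should bypass the writeback cache.
 * 'aligned' is the result of the separate alignment check.
 */
static bool use_writethrough(const struct conn_cfg *cfg, size_t len, bool aligned)
{
	assert(cfg != NULL);

	/* Size threshold wins regardless of alignment, if enabled. */
	if (cfg->writethrough_threshold != 0 && len >= cfg->writethrough_threshold)
		return true;

	/* Below the threshold the alignment-based decision is unchanged. */
	return aligned;
}
```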

fuse_readahead() batches whole folios into a single request, capped at min(fc->max_pages, fc->max_read/PAGE_SIZE) pages, but fuse_init_file_inode() let the page cache build folios up to MAX_PAGECACHE_ORDER.

A large sequential read could thus produce a folio bigger than one request can carry: the first loop iteration took the folio_pages > cur_pages path, fired WARN_ON(!pages), and broke with ap->num_folios == 0. fuse_send_readpages() was still called and dereferenced a NULL ap->folios[0] via folio_pos(), oopsing at CR2=0x20 (folio->index).

Cap the folio order to the per-request page limit so the page cache can never build an unserviceable folio.

Signed-off-by: Horst Birthelmer <hbirthelmer@ddn.com>
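The cap can be worked through with a userspace sketch. The function names are hypothetical and the page size and maximum page-cache order are passed in as parameters because they vary by configuration; rounding the per-request page limit down to a power of two is an assumption about how an order would be derived from it.

```c
#include <assert.h>

/* floor(log2(n)) for n >= 1. */
static unsigned int floor_log2(unsigned long n)
{
	unsigned int order = 0;

	while (n >>= 1)
		order++;
	return order;
}

/*
 * Hypothetical helper: the largest folio order that still fits in one
 * read request of min(max_pages, max_read / page_size) pages, never
 * exceeding the page cache's own maximum order.
 */
static unsigned int capped_folio_order(unsigned long max_pages,
				       unsigned long max_read,
				       unsigned long page_size,
				       unsigned int max_pagecache_order)
{
	unsigned long pages;
	unsigned int order;

	assert(page_size != 0);
	pages = max_read / page_size;
	if (max_pages < pages)
		pages = max_pages;
	if (pages < 1)
		pages = 1;	/* a request always carries at least one page */

	order = floor_log2(pages);
	return order < max_pagecache_order ? order : max_pagecache_order;
}
```

For example, with 4 KiB pages, max_pages = 256 and max_read = 1 MiB give a limit of 256 pages, so the sketch caps folios at order 8.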

Buffered writes hold the inode lock exclusively, serialising all writers even on disjoint ranges. When fc->dlm is set and the write goes through iomap writeback, the DLM already serialises cluster-wide, so the inode rwsem only needs to keep i_size stable.

Add fuse_cache_wr_exclusive_lock() to detect this and take the lock shared, letting disjoint writers run in parallel (MPI-IO / IOR). Direct, appending, or i_size-extending writes still take it exclusive; re-check past-EOF under the shared lock and escalate if needed.

Signed-off-by: Horst Birthelmer <hbirthelmer@ddn.com>
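The conditions listed in this commit message can be read as a single predicate. The sketch below is a userspace illustration only: `struct write_ctx` and `needs_exclusive_lock()` are hypothetical and do not reproduce fuse_cache_wr_exclusive_lock(); the fields simply mirror the cases the message names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of the state the decision depends on. */
struct write_ctx {
	bool dlm;		/* cluster-wide DLM serialisation is active */
	bool writeback;		/* write goes through the writeback cache */
	bool direct;		/* direct I/O */
	bool append;		/* appending write */
	uint64_t pos;		/* start offset */
	size_t len;		/* byte count */
	uint64_t i_size;	/* file size as currently observed */
};

/* True when the inode lock must be taken exclusively. */
static bool needs_exclusive_lock(const struct write_ctx *w)
{
	assert(w != NULL);

	if (!w->dlm || !w->writeback)
		return true;	/* nothing else serialises the writers */
	if (w->direct || w->append)
		return true;
	/* Writes that extend the file change i_size. */
	return w->pos + (uint64_t)w->len > w->i_size;
}
```

A caller following the commit message would evaluate the past-EOF condition a second time once the shared lock is held, since i_size may have changed in between, and fall back to the exclusive lock if the write now extends the file.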

A FUSE server that advertises a large max_pages and max_write (e.g. max_pages=256, max_write=1MB) cannot currently obtain matching FUSE_READ request sizes from the kernel. Buffered sequential writes arrive at the server at the negotiated max_write size, but a large buffered read() is split into several smaller FUSE_READ requests.

For a buffered read, filemap_get_pages() -> page_cache_sync_ra() sizes the read against ractl_max_pages():

    max_pages = ractl->ra->ra_pages;
    if (req_size > max_pages && bdi->io_pages > max_pages)
        max_pages = min(req_size, bdi->io_pages);

fuse leaves bdi->io_pages at the default VM_READAHEAD_PAGES (128KB), so a 1MB read() (req_size = 256 pages) is clamped to the readahead window (128KB, or 256KB for POSIX_FADV_SEQUENTIAL), producing four 256KB FUSE_READ round-trips instead of one.

Set bdi->io_pages to fc->max_pages after feature negotiation. As the code above shows, io_pages only raises the limit when the request size already exceeds the readahead window, so it enlarges explicitly requested reads without enlarging the speculative readahead window. This avoids increasing speculative page-cache readahead on behalf of an unprivileged server. NFS does the same, setting io_pages from rpages while leaving ra_pages at the default.

fc->max_pages is already bounded by fc->max_pages_limit (and, for virtio-fs, by the virtqueue descriptor count), so io_pages inherits the same bound.

Suggested-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Jim Harris <jim.harris@nvidia.com>
Assisted-by: Cursor:claude-opus-4.8
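The effect of raising io_pages can be checked with a userspace model of the clamp quoted in this commit message. `ra_max_pages()` and `bytes_to_pages()` are hypothetical names; the body follows the quoted logic, with all sizes expressed in pages and a 4 KiB page size assumed in the example values.

```c
#include <assert.h>

/* Round a byte count up to whole pages. */
static unsigned long bytes_to_pages(unsigned long bytes, unsigned long page_size)
{
	assert(page_size != 0);
	return (bytes + page_size - 1) / page_size;
}

/*
 * Userspace model of the clamp quoted in the commit message: the
 * readahead window is the default limit, and io_pages can only raise it
 * for requests that already exceed that window.
 */
static unsigned long ra_max_pages(unsigned long ra_pages,
				  unsigned long io_pages,
				  unsigned long req_size)
{
	unsigned long max_pages = ra_pages;

	if (req_size > max_pages && io_pages > max_pages)
		max_pages = req_size < io_pages ? req_size : io_pages;
	return max_pages;
}
```

With a 128 KiB window (32 pages), a 1 MiB read is limited to 32 pages while io_pages is also 32, and to the full 256 pages once io_pages is 256. A 16-page request still gets the unchanged 32-page window, which matches the claim that the speculative window is not enlarged.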

commit 0c58a97 ("fuse: remove tmp folio for writebacks and internal rb tree") removed temp folios for dirty page writeback. Consequently, fuse can now use the default writeback accounting.

With switching fuse to use default writeback accounting, there are some added benefits. This updates wb->writeback_inodes tracking as well now and updates writeback throughput estimates after writeback completion.

This commit also removes inc_wb_stat() and dec_wb_stat(). These have no callers anymore now that fuse does not call them.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
(cherry picked from commit 494d2f5)

hbirth added 4 commits June 24, 2026 11:02

fuse: drop BDI_CAP_STRICTLIMIT from fuse bdi setup

628829f

Signed-off-by: Horst Birthelmer <horst@birthelmer.de>

hbirth requested a review from bsbernd June 24, 2026 09:34

hbirth changed the title ~~Redfs ubuntu improve writeback performance~~ Redfs ubuntu HWE improve writeback performance Jun 24, 2026

hbirth requested review from cding-ddn and yongzech June 24, 2026 09:35

hbirth and others added 3 commits June 26, 2026 08:37

hbirth force-pushed the redfs-ubuntu-hwe-writeback branch from 9757e36 to 98adf72 Compare June 26, 2026 06:45

hbirth wants to merge 7 commits into DDNStorage:redfs-ubuntu-hwe-6.17.0-16.16-24.04.1 from hbirth:redfs-ubuntu-hwe-writeback


3 participants
