block: enable RWF_DONTCACHE for block devices #835
Upstream branch: ba3e43a
Some bio completion handlers need to run from preemptible task context,
but bio_endio() may be called from IRQ context (e.g., buffer_head
writeback). Callers need a way to ensure their callback eventually runs
from a sleepable context. Add infrastructure for that, in two forms:

1. BIO_COMPLETE_IN_TASK, a bio flag the submitter sets when it knows
   in advance that its callback needs task context (e.g., dropbehind
   writeback). bio_endio() sees the flag and offloads completion to a
   worker automatically.
2. bio_complete_in_task(), a helper that completion callbacks can
   invoke from within bi_end_io() when the deferral decision is
   dynamic (e.g., fserror reporting).

Both forms share a per-CPU batch list drained by a delayed work item on
a WQ_PERCPU workqueue. Producers push the bio onto the local CPU's batch
and schedule the work item, which then dispatches each bio's bi_end_io()
from task context. The delayed work item uses a one-jiffy delay to allow
batches of completions to accumulate before processing.
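
The deferral path described above can be sketched as a small userspace model. This is not the kernel implementation: the struct layouts, the flag's bit value, the model_* and batch_push() names, and the single-threaded "work item" are all assumptions made for illustration. Only BIO_COMPLETE_IN_TASK, bio_endio(), bi_end_io() and bio_in_atomic() come from the commit message.

```c
/* Userspace model only; the real code lives in the kernel's block layer. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 2                 /* hypothetical CPU count */
#define BIO_COMPLETE_IN_TASK (1u << 0)  /* bit value is a placeholder */

struct bio {
    unsigned int bi_flags;
    void (*bi_end_io)(struct bio *bio);
    struct bio *next;       /* batch linkage, model only */
    bool done;              /* set by the callback */
    bool ran_in_task;       /* context the callback observed */
};

struct bio_batch {
    struct bio *head, *tail;
    bool work_scheduled;    /* stands in for the delayed work item */
};

static struct bio_batch batches[MODEL_NR_CPUS];
static bool model_atomic;   /* stands in for bio_in_atomic() */

/* Example callback: records whether it ran in a sleepable context. */
static void record_end_io(struct bio *bio)
{
    bio->done = true;
    bio->ran_in_task = !model_atomic;
}

/* Push onto the CPU-local batch and schedule the (modelled) work item. */
static void batch_push(int cpu, struct bio *bio)
{
    struct bio_batch *b = &batches[cpu];

    bio->next = NULL;
    if (b->tail)
        b->tail->next = bio;
    else
        b->head = bio;
    b->tail = bio;
    b->work_scheduled = true;   /* the series delays this by one jiffy */
}

/* Model of bio_endio(): defer only when flagged and the context is atomic. */
static void model_bio_endio(int cpu, struct bio *bio)
{
    if ((bio->bi_flags & BIO_COMPLETE_IN_TASK) && model_atomic) {
        batch_push(cpu, bio);
        return;
    }
    bio->bi_end_io(bio);
}

/*
 * Model of the work item: detach the batch, then complete each bio from
 * task context. Returns the number of bios completed.
 */
static int model_batch_work(int cpu)
{
    struct bio_batch *b = &batches[cpu];
    struct bio *bio = b->head;
    int completed = 0;

    assert(!model_atomic);      /* workers run in a sleepable context */
    b->head = b->tail = NULL;
    b->work_scheduled = false;
    while (bio) {
        struct bio *next = bio->next;

        bio->bi_end_io(bio);
        completed++;
        bio = next;
    }
    return completed;
}
```

In this model a flagged bio ended while model_atomic is true stays on the batch until model_batch_work() runs, while an unflagged bio completes inline, which mirrors the gating on bio_in_atomic() described in the commit message.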
Both methods are gated on bio_in_atomic(), which returns true in any
context where a sleeping bi_end_io() is unsafe, including
non-preemptible task context. This logic is copied from commit
c99fab6 ("erofs: fix atomic context detection when
!CONFIG_DEBUG_LOCK_ALLOC").
Two CPU hotplug callbacks are used to drain remaining bios from the
departing CPU's batch, while maintaining the per-CPU behavior. The
CPUHP_AP_ONLINE_DYN callback disables the per-CPU delayed work while the
CPU is still online, preventing it from running on an unbound worker
later. CPUHP_BP_PREPARE_DYN then drains any bios added between disabling
the work item and CPU offline.
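
The ordering of the two hotplug stages can be illustrated with a toy state machine. This is a sketch under stated assumptions: struct cpu_batch and the model_* functions are hypothetical, and the model leaves already-queued bios for the second stage because the commit message does not say whether the first stage flushes them.

```c
/* Userspace model only; names and fields are invented for illustration. */
#include <assert.h>
#include <stdbool.h>

struct cpu_batch {
    int pending;        /* bios waiting on this CPU's batch */
    bool work_enabled;  /* delayed work item may still be scheduled */
    bool online;
};

/*
 * Producer: always queues the bio, but the work item is only scheduled
 * while it is enabled. Returns true if work was scheduled.
 */
static bool model_queue_bio(struct cpu_batch *b)
{
    b->pending++;
    return b->work_enabled;
}

/* Stage 1 (CPUHP_AP_ONLINE_DYN teardown): the CPU is still online. */
static void model_cpu_going_down(struct cpu_batch *b)
{
    assert(b->online);
    b->work_enabled = false;    /* no unbound worker can run it later */
}

/*
 * Stage 2 (CPUHP_BP_PREPARE_DYN teardown): the CPU is offline, so a
 * surviving CPU drains whatever is left. Returns the number drained.
 */
static int model_cpu_dead(struct cpu_batch *b)
{
    int drained = b->pending;

    assert(!b->online && !b->work_enabled);
    b->pending = 0;
    return drained;
}
```

The point of the split is that a bio queued after stage 1 cannot schedule the work item, yet it is still completed once stage 2 runs.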
Link: https://lore.kernel.org/all/20260409160243.1008358-1-hch@lst.de/
Suggested-by: Matthew Wilcox <willy@infradead.org>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tal Zussman <tz2294@columbia.edu>
Set BIO_COMPLETE_IN_TASK on iomap writeback bios when a dropbehind folio
is added. This ensures that bi_end_io runs in task context, where
folio_end_dropbehind() can safely invalidate folios.

With the bio layer now handling task-context deferral generically,
IOMAP_IOEND_DONTCACHE is no longer needed, as XFS no longer needs to
route DONTCACHE ioends through its completion workqueue. Remove the flag
and its NOMERGE entry.

Without the NOMERGE, regular I/Os that get merged with a dropbehind
folio will also have their completion deferred to task context.

Signed-off-by: Tal Zussman <tz2294@columbia.edu>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Add block_write_begin_iocb() which threads the kiocb through to
__filemap_get_folio() so that buffer_head-based I/O can use DONTCACHE
behavior. When the iocb has IOCB_DONTCACHE set, FGP_DONTCACHE is passed
to mark the folio for dropbehind. The existing block_write_begin() is
preserved as a wrapper that passes a NULL iocb.

Set BIO_COMPLETE_IN_TASK in submit_bh_wbc() when the folio has
dropbehind set, so that buffer_head writeback completions get deferred
to task context.

Signed-off-by: Tal Zussman <tz2294@columbia.edu>
Reviewed-by: Christoph Hellwig <hch@lst.de>
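
The wrapper arrangement in this patch can be modelled in a few lines. This is only a sketch: the bit values are placeholders, struct kiocb is reduced to one field, the model_* function names are hypothetical, and FGP_WRITEBEGIN as the base flag set is an assumption not stated in the commit message.

```c
/* Userspace model only; the kernel's definitions and signatures differ. */
#include <assert.h>
#include <stddef.h>

#define IOCB_DONTCACHE (1u << 0)    /* placeholder bit values */
#define FGP_WRITEBEGIN (1u << 1)
#define FGP_DONTCACHE  (1u << 2)

struct kiocb {
    unsigned int ki_flags;
};

/*
 * Model of the flag selection in block_write_begin_iocb(): returns the
 * FGP flags that would be handed to __filemap_get_folio().
 */
static unsigned int model_write_begin_iocb(const struct kiocb *iocb)
{
    unsigned int fgp = FGP_WRITEBEGIN;

    if (iocb && (iocb->ki_flags & IOCB_DONTCACHE))
        fgp |= FGP_DONTCACHE;   /* mark the folio for dropbehind */
    return fgp;
}

/* Model of the preserved block_write_begin(): a NULL iocb, so the
 * DONTCACHE flag is never added. */
static unsigned int model_write_begin(void)
{
    return model_write_begin_iocb(NULL);
}
```

Keeping the old entry point as a thin wrapper means existing callers are untouched, and only callers that have a kiocb in hand opt in to dropbehind.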
Block device buffered reads and writes already pass through
filemap_read() and iomap_file_buffered_write() respectively, both of
which handle IOCB_DONTCACHE. Enable RWF_DONTCACHE for block device files
by setting FOP_DONTCACHE in def_blk_fops.

For CONFIG_BUFFER_HEAD=y paths, use block_write_begin_iocb() in
blkdev_write_begin() to thread the kiocb through so that buffer_head
writeback gets dropbehind support. CONFIG_BUFFER_HEAD=n paths are
handled by the previously added iomap BIO_COMPLETE_IN_TASK support.

This support is useful for databases that operate on raw block devices,
among other userspace applications.

Signed-off-by: Tal Zussman <tz2294@columbia.edu>
Reviewed-by: Christoph Hellwig <hch@lst.de>
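
From userspace, the feature would be reached through pwritev2() (or preadv2()) with RWF_DONTCACHE. The helper below is a minimal sketch, not part of the series: write_dontcache() is a hypothetical name, the fallback define assumes the UAPI value 0x00000080 for libc headers that predate the flag, and the retry without the flag is one possible policy for kernels or files that reject it.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* glibc exposes pwritev2() under this */
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef RWF_DONTCACHE
#define RWF_DONTCACHE 0x00000080    /* assumed UAPI value, for old headers */
#endif

/*
 * Buffered write of len bytes at offset off, asking the kernel to drop
 * the pages from the page cache once writeback completes. If the flag
 * is rejected, retry as a regular buffered write; a genuine error then
 * shows up again on the retry. *used_dontcache reports which path ran.
 */
static ssize_t write_dontcache(int fd, const void *buf, size_t len,
                               off_t off, int *used_dontcache)
{
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    ssize_t ret;

    ret = pwritev2(fd, &iov, 1, off, RWF_DONTCACHE);
    if (ret >= 0) {
        *used_dontcache = 1;
        return ret;
    }
    if (errno != EOPNOTSUPP && errno != EINVAL && errno != ENOSYS)
        return -1;

    *used_dontcache = 0;
    return pwritev2(fd, &iov, 1, off, 0);
}
```

With this series applied, fd could refer to a block device node opened without O_DIRECT; on other kernels the same call is expected to take the fallback path for such a file descriptor.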
Pull request for series with
subject: block: enable RWF_DONTCACHE for block devices
version: 6
url: https://patchwork.kernel.org/project/linux-block/list/?series=1095033