Refactor prefetcher code by googlyrahman · Pull Request #818 · fsspec/gcsfs

googlyrahman · 2026-04-20T15:13:22Z

Summary of the changes:

Refactor Prefetcher file:

Replaced the passing of individual lambda functions to the PrefetchProducer and PrefetchConsumer with direct object references. This streamlines the initialization process and makes the internal dependency graph much cleaner.

Start prefetching on 3rd read, instead of second

Previously, prefetching started on the second read. However, when used in conjunction with caches like readahead_chunked (which generates two underlying requests for a single user read), this triggered false positive signals, causing the engine to prefetch unnecessary extra data. Delaying the start until the third read corrects this behavior.
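A minimal sketch of the streak idea described above, not the actual gcsfs implementation. `StreakTracker`, `min_streak`, and `record` are hypothetical names; only the "start on the third read" behaviour is taken from this description.

```python
class StreakTracker:
    """Counts consecutive sequential reads (hypothetical helper, not gcsfs code)."""

    def __init__(self, min_streak: int = 2):
        # min_streak=2 means two sequential continuations are required,
        # so prefetching can first trigger on the third read.
        self.min_streak = min_streak
        self.streak = 0
        self.expected_offset = None

    def record(self, start: int, end: int) -> bool:
        """Record a read of [start, end) and report whether to prefetch."""
        if self.expected_offset is not None and start == self.expected_offset:
            self.streak += 1
        else:
            self.streak = 0  # the first read or any seek resets the streak
        self.expected_offset = end
        return self.streak >= self.min_streak


tracker = StreakTracker(min_streak=2)
# Four back-to-back reads of 100 bytes each.
decisions = [tracker.record(i * 100, (i + 1) * 100) for i in range(4)]
# A seek to an unrelated offset resets the streak.
after_seek = tracker.record(1000, 1100)
```

With a threshold of one continuation, a single user read that a cache splits into two back-to-back requests would already look like a streak; requiring two continuations avoids that case.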

Disable Prefetching for variable reads with average greater than 64MB

When read chunk sizes are highly variable, the producer fetches data based on a rolling average, forcing the consumer to slice and stitch multiple partial chunks to match the exact user request.

For average chunk sizes <= 64MB, this memory assembly is fast, the bottleneck remains network I/O, and prefetching is highly beneficial.
For average chunk sizes > 64MB, memory allocation and byte concatenation shift the bottleneck to the CPU, which slows down the entire operation.
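The rule above can be sketched as a small pure decision function. This is an illustration under assumptions: `ReadSizeTracker`, `should_prefetch`, `decide`, the window length, and `max_prefetch_size` are hypothetical names; only the 64MB threshold and the "variable AND average above threshold, or above user max" condition come from this PR.

```python
from collections import deque

MB = 1024 * 1024
VARIABLE_IO_THRESHOLD = 64 * MB  # value quoted in the PR diff


class ReadSizeTracker:
    """Rolling window of recent read sizes (hypothetical stand-in)."""

    def __init__(self, window: int = 10):  # window length is an assumption
        self._history = deque(maxlen=window)

    def record(self, size: int) -> None:
        self._history.append(size)

    @property
    def average(self) -> float:
        return sum(self._history) / len(self._history) if self._history else 0.0

    @property
    def is_variable(self) -> bool:
        return len(set(self._history)) > 1


def should_prefetch(tracker: ReadSizeTracker, max_prefetch_size: int) -> bool:
    """Pure decision: no I/O happens here, only the choice."""
    avg = tracker.average
    if avg > max_prefetch_size:
        return False  # exceeds the user-specified maximum
    if tracker.is_variable and avg > VARIABLE_IO_THRESHOLD:
        return False  # stitching large variable chunks becomes CPU-bound
    return True


def decide(sizes_mb, max_prefetch_size=2048 * MB):
    tracker = ReadSizeTracker()
    for size in sizes_mb:
        tracker.record(size * MB)
    return should_prefetch(tracker, max_prefetch_size)


fixed_150 = decide([150, 150, 150])   # fixed size: stays enabled
jitter_above = decide([66, 70, 74])   # variable, average 70MB: disabled
jitter_below = decide([54, 58, 62])   # variable, average 58MB: stays enabled
```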

Numbers before Change

Before

SCENARIO SUMMARY	CONCURRENCY (GB/s)	PREFETCHER+CONCURRENCY (GB/s)	DIFF
Sequential fixed 64KB	0.03	0.37	+1065.2%
Sequential fixed 1MB	0.17	1.13	+558.7%
Sequential fixed 16MB	1.08	2.04	+89.7%
Sequential fixed 150MB	2.09	2.32	+11.2%
Sequential fixed 256MB	2.32	2.30	-1.0%
Sequential fixed 1GB	2.43	2.41	-0.7%
Sequential fixed 4MB (Below MIN_CHUNK)	0.36	1.44	+293.5%
Sequential fixed 6MB (Above MIN_CHUNK)	0.66	1.59	+140.0%
Sparse Read: Read 1MB, Skip 10MB forward	0.20	0.19	-3.6%
Sparse Read: Read 16MB, Skip 100MB forward	1.22	1.20	-1.5%
Sequential variable (1KB - 1MB) [Micro Jitter]	0.14	1.18	+717.4%
Sequential variable (1MB - 32MB) [Medium Jitter]	1.07	1.94	+80.7%
Sequential variable (64KB - 100MB) [Good Jitter]	1.74	1.63	-6.6%
Sequential variable (64KB - 1GB) [Large Jitter]	2.42	1.16	-51.9%
Sequential variable (54MB - 62MB) [Jitter Below 64MB]	1.67	1.41	-15.9%
Sequential variable (66MB - 74MB) [Jitter Above 64MB]	1.73	1.34	-22.7%
Sequential variable (90MB - 98MB) [Deep into CPU limits]	1.98	1.56	-21.2%
Step Up/Down (10MB to 100MB to 10MB)	1.25	1.74	+39.5%
Step Up/Down (10MB to 150MB to 120MB)	1.57	1.56	-0.4%
Step Up/Down Extreme Drop (1MB to 500MB to 1MB)	0.77	1.33	+73.1%
Pure Seek fixed (1MB)	0.17	0.19	+11.6%
Pure Seek fixed (16MB)	1.12	1.07	-5.0%
Pure Seek variable (64KB - 100MB)	1.71	1.79	+4.7%
30% Seq / 70% Seek (fixed 16MB)	1.09	0.76	-30.6%
50% Seq / 50% Seek (fixed 16MB)	1.12	0.70	-37.2%
70% Seq / 30% Seek (fixed 16MB)	1.14	0.77	-33.0%
90% Seq / 10% Seek (fixed 16MB)	1.06	1.09	+2.8%
50% Seq/Seek variable (64KB - 100MB)	1.58	0.96	-39.2%

Numbers After Change

After

SCENARIO SUMMARY	CONCURRENCY (GB/s)	PREFETCHER+CONCURRENCY (GB/s)	DIFF
Sequential fixed 64KB	0.03	0.37	+1002.9%
Sequential fixed 1MB	0.20	1.28	+524.2%
Sequential fixed 16MB	1.06	2.15	+102.9%
Sequential fixed 150MB	2.09	2.11	+1.3%
Sequential fixed 256MB	2.22	2.31	+3.9%
Sequential fixed 1GB	2.41	2.41	+0.4%
Sequential fixed 4MB (Below MIN_CHUNK)	0.40	1.20	+197.9%
Sequential fixed 6MB (Above MIN_CHUNK)	0.56	1.46	+160.4%
Sparse Read: Read 1MB, Skip 10MB forward	0.21	0.15	-26.2%
Sparse Read: Read 16MB, Skip 100MB forward	1.14	1.17	+3.3%
Sequential variable (1KB - 1MB) [Micro Jitter]	0.14	1.17	+727.9%
Sequential variable (1MB - 32MB) [Medium Jitter]	1.13	1.91	+69.8%
Sequential variable (64KB - 100MB) [Good Jitter]	1.71	1.57	-8.1%
Sequential variable (64KB - 1GB) [Large Jitter]	2.32	2.38	+2.6%
Sequential variable (54MB - 62MB) [Jitter Below 64MB]	1.70	1.49	-11.8%
Sequential variable (66MB - 74MB) [Jitter Above 64MB]	1.69	1.85	+9.1%
Sequential variable (90MB - 98MB) [Deep into CPU limits]	1.91	1.83	-3.9%
Step Up/Down (10MB to 100MB to 10MB)	1.25	1.61	+29.2%
Step Up/Down (10MB to 150MB to 120MB)	1.70	1.37	-19.4%
Step Up/Down Extreme Drop (1MB to 500MB to 1MB)	0.90	1.69	+87.0%
Pure Seek fixed (1MB)	0.19	0.21	+8.5%
Pure Seek fixed (16MB)	1.05	1.04	-0.4%
Pure Seek variable (64KB - 100MB)	1.55	1.71	+10.4%
30% Seq / 70% Seek (fixed 16MB)	1.09	1.02	-6.4%
50% Seq / 50% Seek (fixed 16MB)	1.08	0.94	-13.4%
70% Seq / 30% Seek (fixed 16MB)	1.09	0.87	-19.7%
90% Seq / 10% Seek (fixed 16MB)	1.12	1.09	-2.4%
50% Seq/Seek variable (64KB - 100MB)	1.68	1.44	-14.2%

martindurant · 2026-04-20T15:30:38Z

when used in conjunction with caches like readahead_chunked

This seems like unnecessary coupling between the behaviours of the "cache" and "fetcher", just the kind of thing you were wishing to avoid. This way around, if the cache is None (which is recommended), you may end up waiting, no?

codecov · 2026-04-20T15:49:53Z

Codecov Report

❌ Patch coverage is 93.02326% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.29%. Comparing base (eaf6e33) to head (1b26348).
⚠️ Report is 8 commits behind head on main.

Files with missing lines	Patch %	Lines
gcsfs/prefetcher.py	93.02%	6 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #818      +/-   ##
==========================================
- Coverage   88.35%   88.29%   -0.06%     
==========================================
  Files          15       15              
  Lines        2989     3051      +62     
==========================================
+ Hits         2641     2694      +53     
- Misses        348      357       +9


googlyrahman · 2026-04-20T15:52:58Z

This seems like unnecessary coupling between the behaviours of the "cache" and "fetcher", just the kind of thing you were wishing to avoid.

Yes. Ideally, one shouldn't layer the existing caches on top of the prefetcher. However, as we discussed, customers might still enable this experimental flag to test it with the default cache (currently readahead_chunked for zonal) during a trial. This change should support that use case.

if the cache is None (which is recommended), you may end up waiting, no?

This is correct: with cache_type = none, we do not start prefetching on the first sequential read; we start on the second sequential read.

While I agree that starting prefetching on the second read is generally more logical, we have to consider existing users who might enable this experimental flag while still using the current caches. We haven't transitioned the default cache setting to none yet because removing buffering entirely could negatively impact performance for a large portion of our user base who do not use this experimental flag.

zhixiangli · 2026-04-21T01:58:05Z

What happens if read-ahead runs out of cache? I think it will trigger the exact same false-positive signals again?

googlyrahman · 2026-04-21T05:15:34Z

What happens if read-ahead runs out of cache? I think it will trigger the exact same false-positive signals again?

Yes. So basically, with cache_type="none" prefetching would start from the actual third read, and with cache_type="readahead_chunked" it would start from the actual second read.

I'm not concerned about handling the false positive here. My only concern is that prefetching shouldn't happen at the first read, because that would destroy the throughput of users doing completely random reads.

If the goal is to fix this false positive, then the variable MIN_STREAKS_FOR_PREFETCHING needs to be adjusted per cache type, which would introduce some unnecessary code in core.py that I think is undesirable at the moment. core.py is already very big, and we wish to simplify the code in the future. Also, readahead_chunked is going to be deprecated in the future (there are other caches with behaviour similar to readahead_chunked, such as background block, but they are rarely used as per my past conversation with Martin).

googlyrahman · 2026-04-22T17:24:37Z

@martindurant can you review this?

zhixiangli · 2026-04-24T09:21:49Z

            if not available:
-                if self.is_producer_stopped() and self.queue.empty():
+                is_producer_stopped = (
+                    not hasattr(self.orchestrator, "producer")


nit: how about setting the producer to None then we do self.orchestrator.producer is None check here? It's more readable.

(or set it to None as a class attribute)
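A small sketch of the class-attribute variant suggested here. `Orchestrator`, `Producer`, `start`, and the shape of `is_producer_stopped` are hypothetical stand-ins; the truncated condition in the diff above is not reproduced, and the `is_stopped` check is an assumption.

```python
class Producer:
    """Stand-in for the background producer (hypothetical)."""

    def __init__(self):
        self.is_stopped = False


class Orchestrator:
    # Class-level default: the attribute always exists, even before start()
    # runs, so callers never need hasattr().
    producer = None

    def start(self, producer: Producer) -> None:
        self.producer = producer


def is_producer_stopped(orchestrator: Orchestrator) -> bool:
    producer = orchestrator.producer
    # "No producer yet" is treated the same as "producer stopped" (assumption).
    return producer is None or producer.is_stopped


orch = Orchestrator()
before_start = is_producer_stopped(orch)  # no producer assigned yet
orch.start(Producer())
while_running = is_producer_stopped(orch)
orch.producer.is_stopped = True
after_stop = is_producer_stopped(orch)
```

Compared with `hasattr`, the `is None` check keeps the attribute discoverable on the class and avoids masking a misspelled attribute name.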

googlyrahman · 2026-05-13T13:35:56Z

@martindurant, can you take a look if you're interested? The benchmarks before and after this change are attached in the description.

martindurant · 2026-05-13T14:02:44Z

The change should affect the small reads primarily?

martindurant · 2026-05-13T14:05:35Z

(sorry no, that comment is for #840)

googlyrahman · 2026-05-13T15:07:00Z

The change should affect the small reads primarily?
(sorry no, that comment is for #840)

Yes, it should impact small reads primarily.

martindurant · 2026-05-13T15:25:17Z

-    bp._fetch(100, 150)
+    # Do 6 reads to push the streak well past the MIN_STREAKS threshold
+    for i in range(6):
+        bp._fetch(i * 50, (i + 1) * 50)


It feels like the numbers here are somewhat arbitrary - or at least, they depend on the current set of defaults assumed by the prefetcher. Maybe they should be explicitly derived from those values?
I don't mind if not, but do add a comment, because they will need to be updated should the defaults change.

Yeah, they're hardcoded, and I think the best practice is to hardcode values in the test rather than derive them from the main code. Added a comment to update them when the parent value changes.

martindurant · 2026-05-13T15:26:03Z


-    assert bp._fetch(0, 100) == b"A" * 100
+    for i in range(2):
+        bp._fetch(i * 100, (i + 1) * 100)


Before, the read_tracker was directly mutated. Why was that insufficient?

I didn't get this. Can you elaborate on this comment?

martindurant · 2026-05-13T15:27:52Z

+    for i in range(4):
+        bp._fetch(i * 60, (i + 1) * 60)
+
+    fsspec.asyn.sync(bp.loop, asyncio.sleep, 0.1)


The sleep may need to be bigger in CI machines - can there be some form of wait here?

I think this should be fine, given it is just a test. I've tested it on a smaller machine (4 cores) as well as a bigger machine (192 cores). This test primarily checks whether prefetching remains disabled if the average is greater than the user-specified maximum size. A 0.1s sleep gives the event loop enough room to schedule work if required.
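For tests that wait on a positive event, a bounded polling wait is one alternative to a fixed sleep. The sketch below is only an illustration: `wait_until` and `_demo` are hypothetical helpers, not part of gcsfs or fsspec. A negative check such as "prefetching stays disabled" still needs some bounded delay, so this does not remove the sleep in that case.

```python
import asyncio
import time


async def wait_until(predicate, timeout: float = 5.0, interval: float = 0.01) -> bool:
    """Poll predicate() until it is true or the timeout expires."""
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            return False  # gave up: condition never became true
        await asyncio.sleep(interval)  # yield to the event loop between polls
    return True


async def _demo() -> bool:
    state = {"done": False}

    async def worker():
        await asyncio.sleep(0.05)  # simulated background work
        state["done"] = True

    task = asyncio.create_task(worker())
    ok = await wait_until(lambda: state["done"], timeout=2.0)
    await task
    return ok


result = asyncio.run(_demo())
timed_out = asyncio.run(wait_until(lambda: False, timeout=0.05))
```

On a fast machine the wait returns almost immediately, while a slow CI machine gets up to the full timeout.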

martindurant · 2026-05-13T15:30:02Z

+        first_val = self._history[0]
+        return any(val != first_val for val in self._history)


Suggested change

-        first_val = self._history[0]
-        return any(val != first_val for val in self._history)
+        return len(set(self._history)) > 1

?

Whether this is faster probably depends on where the first non-equal value is and how long the list can be.

I think any(...) is better: it exits early as soon as it finds the first non-equal value, and it doesn't need to allocate extra space for a set.

My testing suggests set is always faster if the history isn't big (which it never is!) or the non-equal value is not at the start. The iterator and set of hashes probably take up similar, negligible space.
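For reference, the two variants under discussion can be compared with a quick standalone script. This is a sketch: the function names are hypothetical, the empty-history guard in the `any` version is an assumption, and no timing result is claimed here since it depends on the machine and the history contents.

```python
import timeit


def is_variable_any(history) -> bool:
    """Early-exit variant: stops at the first value that differs."""
    if not history:  # guard added for the sketch; history[0] needs an element
        return False
    first_val = history[0]
    return any(val != first_val for val in history)


def is_variable_set(history) -> bool:
    """Set variant: more than one distinct value means variable."""
    return len(set(history)) > 1


# Both variants agree on empty, single, uniform, and mixed histories.
cases = [[], [5], [5, 5, 5], [5, 5, 6], [6, 5, 5]]
agree = all(is_variable_any(c) == is_variable_set(c) for c in cases)

# Time both on a short uniform history (the worst case for early exit).
history = [4096] * 10
t_any = timeit.timeit(lambda: is_variable_any(history), number=10_000)
t_set = timeit.timeit(lambda: is_variable_set(history), number=10_000)
print(f"any: {t_any:.4f}s  set: {t_set:.4f}s")
```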

martindurant · 2026-05-13T15:34:38Z

+    # remains the network I/O. However, for massive reads (>= 64MB), the extra
+    # step of copying and assembling huge byte strings in memory severely slows
+    # down the operation.
+    VARIABLE_IO_THRESHOLD = 64 * 1024 * 1024


Does the best value of this depend on the network bandwidth? I bet on slow connections, we always prefer any amount of prefetching and copy time is irrelevant.

Yes, this is true: for slow connections it's always best to prefetch any amount. I'm just playing safe here by choosing a lower value, based on numbers measured on the fastest network. Do you think we should adjust this variable based on bucket type?

In general, we should be able to establish how close to the data we are running and make sensible decisions based on it. That would make nice follow-up work.

martindurant · 2026-05-13T15:39:49Z

            if not available:
-                if self.is_producer_stopped() and self.queue.empty():
+                is_producer_stopped = (
+                    not hasattr(self.orchestrator, "producer")


(or set it to None as a class attribute)

googlyrahman · 2026-05-25T20:24:50Z

@martindurant, Happy Monday!

Sorry for the delay, I got tied up with a few other things and couldn't follow up on this PR sooner. I've addressed your comments and included one additional small fix. (I kept them in separate commits to make reviewing easier.)

Regarding the small fix: Previously, if the background prefetcher entered an error state, it would permanently reject incoming read requests. This is problematic for users who want to catch the exception and retry on their end, because any subsequent retries would just return the cached exception. The latest commit solves this by triggering a hard seek to the requested offset. The hard seek takes care of clearing the queue, resetting the internal prefetch state, and starting fresh with a new network request.
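The recovery flow described above, sketched in a simplified synchronous form. `RecoveringReader`, `_hard_seek`, and `flaky_fetch` are hypothetical names for illustration; the real prefetcher is asynchronous and its internals are not shown in this thread.

```python
class RecoveringReader:
    """Toy reader that clears a stored error via a hard seek (hypothetical)."""

    def __init__(self, fetch_range):
        self._fetch_range = fetch_range
        self._error = None
        self._queue = []
        self._position = 0

    def _hard_seek(self, offset: int) -> None:
        # Drop queued data, forget the stored error, restart at `offset`.
        self._queue.clear()
        self._error = None
        self._position = offset

    def fetch(self, start: int, end: int) -> bytes:
        if self._error is not None:
            # Instead of re-raising the cached exception forever,
            # reset state and issue a fresh request.
            self._hard_seek(start)
        try:
            data = self._fetch_range(start, end)
        except Exception as exc:
            self._error = exc
            raise
        self._position = end
        return data


calls = {"n": 0}


def flaky_fetch(start: int, end: int) -> bytes:
    calls["n"] += 1
    if calls["n"] == 1:
        raise OSError("transient network failure")  # first call fails
    return b"X" * (end - start)


reader = RecoveringReader(flaky_fetch)
try:
    reader.fetch(0, 10)
except OSError:
    pass  # the caller catches the error and retries
recovered = reader.fetch(0, 10)
```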

Let me know what you think!

martindurant · 2026-05-27T20:29:16Z

+    assert bp._fetch(0, 10) == b"X" * 10
+    assert bp._error is None
+
+    bp.close()


Any problem if close() is not reached due to an error in the test? Should this be a fixture?
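One way to guarantee cleanup is a generator with `try`/`finally` around the `yield`. The sketch below uses `contextlib.contextmanager` so it runs standalone; the same generator body could be decorated with `@pytest.fixture` instead. `FakePrefetcher` and `open_prefetcher` are hypothetical stand-ins for the object under test.

```python
from contextlib import contextmanager


class FakePrefetcher:
    """Stand-in for the prefetcher under test (hypothetical)."""

    def __init__(self):
        self.closed = False

    def close(self) -> None:
        self.closed = True


@contextmanager
def open_prefetcher():
    bp = FakePrefetcher()
    try:
        yield bp
    finally:
        bp.close()  # runs even when the body raises


seen = []
try:
    with open_prefetcher() as bp:
        seen.append(bp)
        raise AssertionError("simulated test failure")
except AssertionError:
    pass  # the failure still propagates; cleanup has already run
```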

martindurant · 2026-05-27T20:55:20Z

+        first_val = self._history[0]
+        return any(val != first_val for val in self._history)


My testing suggests set is always faster if the history isn't big (which it never is!) or the non-equal value is not at the start. The iterator and set of hashes probably take up similar, negligible space.

martindurant · 2026-05-27T20:56:25Z

+    # remains the network I/O. However, for massive reads (>= 64MB), the extra
+    # step of copying and assembling huge byte strings in memory severely slows
+    # down the operation.
+    VARIABLE_IO_THRESHOLD = 64 * 1024 * 1024


In general, we should be able to establish how close to the data we are running and make sensible decisions based on it. That would make nice follow-up work.

martindurant · 2026-05-28T20:19:00Z

+                )
+
+                # Disable prefetching ahead if variable AND average > 64MB, or if it exceeds user max
+                if (


This method got very long with all these cases, maybe should split the "choices" and the "action" parts.

martindurant · 2026-05-28T20:20:22Z

+                    logger.debug("Producer reached EOF. Exiting background loop.")
+                    self.is_stopped = True
+                    break
        except asyncio.CancelledError:


This big indent feels like a context?

martindurant · 2026-05-28T20:20:42Z

+                    break
        except asyncio.CancelledError:
            logger.debug("PrefetchProducer loop was cancelled.")
            pass


Suggested change

-            pass

martindurant · 2026-05-28T20:23:08Z

+                        is_variable = self.tracker.is_variable
+                        avg_io_size = self.tracker.average


May as well inline these in the calculations below, since they are already multiline. We don't use the variables again.

googlyrahman force-pushed the prefetchx branch from f0ddd03 to f1590c6 Compare April 20, 2026 15:37

googlyrahman marked this pull request as ready for review April 21, 2026 05:17

zhixiangli approved these changes Apr 24, 2026

View reviewed changes

update prefetcher logic

555b6d2

googlyrahman force-pushed the prefetchx branch from f1590c6 to 555b6d2 Compare May 6, 2026 18:19

googlyrahman added 2 commits May 12, 2026 11:13

Merge remote-tracking branch 'upstream/main' into prefetchx

9ed15a3

change 100MB to 64MB

bf79c7b

martindurant reviewed May 13, 2026

View reviewed changes

googlyrahman added 2 commits May 23, 2026 20:04

address comments

62c6aa0

recover from error by hard seek

1b26348

martindurant reviewed May 28, 2026
