From 09f401a184fd5064c85c85d0a3757932a17e8691 Mon Sep 17 00:00:00 2001
From: Li Yonghui
Date: Tue, 26 May 2026 02:45:59 +0000
Subject: [PATCH] Document intentional non-caching and fallback behavior for UNKNOWN bucket layouts

---
 docs/source/hns_buckets.rst           | 11 +++++++++++
 docs/source/rapid_storage_support.rst |  3 +++
 gcsfs/extended_gcsfs.py               |  4 +++-
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/docs/source/hns_buckets.rst b/docs/source/hns_buckets.rst
index fe55be918..89b12ad07 100644
--- a/docs/source/hns_buckets.rst
+++ b/docs/source/hns_buckets.rst
@@ -76,6 +76,17 @@ The following benchmarks show the time taken (in seconds) to rename a directory

 For more details on managing these buckets, refer to the official documentation for `Hierarchical Namespace `_.

+.. _bucket-type-detection-and-caching:
+
+Bucket Type Detection and Caching
+---------------------------------
+
+Before routing directory-level or file-level operations, the ``ExtendedGcsFileSystem`` performs a bucket type detection query via the Cloud Storage Control API's ``get_storage_layout`` method to determine if the bucket is HNS-enabled or Zonal.
+
+* **Success Caching:** Once the bucket type is successfully determined, the filesystem caches this layout type to avoid repeated API lookup overhead for subsequent operations.
+* **Non-Caching of UNKNOWN:** If the bucket type detection returns ``UNKNOWN`` (for instance, due to transient network failures), it is **intentionally not cached**. This ensures that transient lookup failures do not permanently degrade HNS or Zonal performance.
+* **Fallback Behavior:** If the bucket type is flagged as ``UNKNOWN``, the filesystem gracefully falls back to standard flat-namespace GCS operations without raising an error. The filesystem will retry the lookup on subsequent requests, allowing it to automatically adopt HNS/Zonal optimizations if the issue resolves.
+
 Disabling HNS Support
 ------------------------------

diff --git a/docs/source/rapid_storage_support.rst b/docs/source/rapid_storage_support.rst
index 8bf7d6068..59c8644e7 100644
--- a/docs/source/rapid_storage_support.rst
+++ b/docs/source/rapid_storage_support.rst
@@ -53,6 +53,9 @@ making Rapid Storage support fully backward compatible for all operations.

 At initialization, ``ExtendedGcsFileSystem`` evaluates the underlying bucket's storage layout. If it detects Rapid storage, file-level operations are dynamically routed to the ``ZonalFile`` class instead of the standard ``GCSFile``.

+.. note::
+    For detailed information on how bucket type detection works and the layout caching strategy, please refer to the HNS documentation on :ref:`bucket-type-detection-and-caching`.
+
 Unlike standard operations which use HTTP endpoints, ``ZonalFile`` utilizes the Google Cloud Storage gRPC API—specifically the ``AsyncMultiRangeDownloader`` (MRD) for reads and ``AsyncAppendableObjectWriter`` (AAOW) for writes.

 Operation Semantics: Standard vs. Rapid Storage
diff --git a/gcsfs/extended_gcsfs.py b/gcsfs/extended_gcsfs.py
index 175890af8..9dcaeb0dc 100644
--- a/gcsfs/extended_gcsfs.py
+++ b/gcsfs/extended_gcsfs.py
@@ -194,7 +194,9 @@ async def _lookup_bucket_type(self, bucket):
         if bucket in self._storage_layout_cache:
             return self._storage_layout_cache[bucket]
         bucket_type = await self._get_bucket_type(bucket)
-        # Dont cache UNKNOWN type
+        # Don't cache UNKNOWN type.
+        # This ensures that subsequent operations will retry the lookup,
+        # allowing it to recover when the transient error resolves.
         if bucket_type == BucketType.UNKNOWN:
             return bucket_type
         self._storage_layout_cache[bucket] = bucket_type
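
Illustrative note (not part of the patch): the caching rule documented above can be exercised in isolation with a small stand-in class. This is only a sketch under stated assumptions. ``LayoutCacheSketch``, its scripted ``responses`` list, the ``api_calls`` counter, the ``HIERARCHICAL`` enum member, and the trailing ``return bucket_type`` are all hypothetical; only the names ``_lookup_bucket_type``, ``_get_bucket_type``, ``_storage_layout_cache``, and ``BucketType.UNKNOWN`` come from the hunk itself.

```python
import asyncio
from enum import Enum


class BucketType(Enum):
    # Hypothetical stand-in for the enum referenced in the patch;
    # only UNKNOWN is confirmed by the diff.
    HIERARCHICAL = "HIERARCHICAL"
    UNKNOWN = "UNKNOWN"


class LayoutCacheSketch:
    """Toy model of the lookup logic; not the real ExtendedGcsFileSystem."""

    def __init__(self, responses):
        self._storage_layout_cache = {}
        self._responses = iter(responses)  # scripted detection results
        self.api_calls = 0                 # counts simulated API lookups

    async def _get_bucket_type(self, bucket):
        # Stand-in for the real detection call: returns the next scripted result.
        self.api_calls += 1
        return next(self._responses)

    async def _lookup_bucket_type(self, bucket):
        if bucket in self._storage_layout_cache:
            return self._storage_layout_cache[bucket]
        bucket_type = await self._get_bucket_type(bucket)
        # UNKNOWN is returned without being stored, so the next call retries.
        if bucket_type == BucketType.UNKNOWN:
            return bucket_type
        self._storage_layout_cache[bucket] = bucket_type
        return bucket_type  # assumed: the diff context ends before this line


async def demo():
    # First detection fails (UNKNOWN); the second one succeeds.
    fs = LayoutCacheSketch([BucketType.UNKNOWN, BucketType.HIERARCHICAL])
    first = await fs._lookup_bucket_type("my-bucket")
    second = await fs._lookup_bucket_type("my-bucket")
    third = await fs._lookup_bucket_type("my-bucket")  # served from the cache
    return first, second, third, fs.api_calls


first, second, third, api_calls = asyncio.run(demo())
print(first.name, second.name, third.name, api_calls)
```

In this sketch the first lookup yields ``UNKNOWN`` and leaves the cache empty, the second lookup retries and caches the detected layout, and the third is answered from the cache, so only two simulated API calls are made.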