google-crc32c: release GIL for large buffers in crc32c operations #16923

@zhixiangli

Determine this is the right repository

  • I determined this is the correct repository in which to report this feature request.

Summary of the feature request

Release the Global Interpreter Lock (GIL) in _crc32c_extend and _crc32c_value when processing large, immutable byte buffers (>= 1 MB). This allows other Python threads to run concurrently during expensive CRC32C calculations on large chunks of data.

Desired code experience

The API usage remains unchanged, but performance improves in multi-threaded environments.

Expected results

google_crc32c.value() and google_crc32c.extend() should release the GIL when the input buffer is large (e.g., >= 1MB) and immutable (bytes), allowing concurrent execution of other Python threads.
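The expected behavior can be illustrated with a minimal sketch, assuming a thread pool that checksums several 1 MB `bytes` chunks. The helper `checksum_chunks` and the chunk contents are hypothetical, and `zlib.crc32` is used only as a stand-in when `google_crc32c` is not installed (it computes CRC32, not CRC32C). The returned values are identical with or without the proposed change; only how much the worker threads can overlap would differ.

```python
from concurrent.futures import ThreadPoolExecutor

try:
    import google_crc32c
    checksum = google_crc32c.value  # the function this issue is about
except ImportError:
    import zlib
    checksum = zlib.crc32  # stand-in only: CRC32, not CRC32C

CHUNK_SIZE = 1024 * 1024  # 1 MB, the threshold suggested in this issue


def checksum_chunks(chunks, max_workers=4):
    """Checksum each chunk in a worker thread and return results in order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(checksum, chunks))


# Hypothetical input: four immutable 1 MB buffers with different contents.
chunks = [bytes([i]) * CHUNK_SIZE for i in range(4)]

threaded = checksum_chunks(chunks)
sequential = [checksum(chunk) for chunk in chunks]
```

The calling code is the same before and after the change, which matches the "API usage remains unchanged" expectation.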

API client name and version

google-crc32c

Use case

This feature is highly beneficial for applications performing heavy I/O with checksum verification, such as downloading or uploading large objects using google-cloud-storage. In multi-threaded applications (e.g., web servers or data processing pipelines), preventing the CRC32C calculation from hogging the GIL ensures that other threads remain responsive.

Additional context

  • In the google-cloud-storage multi-range downloader (over a bidi stream) with a 1 MB chunk size, crc32c operations account for 5.4% of the total GIL hold time.
  • Releasing the GIL frees that time for other threads' work, for example in gcsfs (which serves as a critical AI/ML connector between PyTorch and GCS).
  • Experiments show that at a buffer size of ~1 MB, the overhead of releasing and re-acquiring the GIL is negligible.
  • I am happy to contribute to this change if you are okay with this issue.
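One rough way to observe the effect described above is to count how many iterations a background thread completes while the main thread checksums 1 MB buffers. This is only a sketch under stated assumptions, not the experiment referenced in this issue: `count_while_checksumming` is a hypothetical helper, `zlib.crc32` is a stand-in for whatever checksum function is under test, and the numbers depend heavily on the machine and interpreter.

```python
import threading
import time
import zlib


def count_while_checksumming(checksum, data, repeats=20):
    """Return (ticks, elapsed): background-thread iterations and wall time
    while `checksum(data)` is called `repeats` times on the calling thread."""
    stop = threading.Event()
    ticks = 0

    def ticker():
        # Pure-Python loop: it only advances while this thread holds the GIL.
        nonlocal ticks
        while not stop.is_set():
            ticks += 1

    worker = threading.Thread(target=ticker)
    worker.start()
    start = time.perf_counter()
    for _ in range(repeats):
        checksum(data)
    elapsed = time.perf_counter() - start
    stop.set()
    worker.join()
    return ticks, elapsed


data = b"\x00" * (1024 * 1024)  # one immutable 1 MB buffer
ticks, elapsed = count_while_checksumming(zlib.crc32, data)
print(f"background ticks: {ticks}, elapsed: {elapsed:.4f}s")
```

Passing `google_crc32c.value` as `checksum` before and after the change, and comparing ticks per second, would give a coarse indication of how much the background thread benefits.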

Labels

priority: p2 (Moderately-important priority. Fix may not be included in next release.)
type: feature request ('Nice-to-have' improvement, new feature or different behavior or design.)
