From 1eeb9376ee98bf1cdcd2bf2b4c79da3e0e8d0308 Mon Sep 17 00:00:00 2001
From: Leo Fang
Date: Wed, 6 May 2026 02:38:34 +0000
Subject: [PATCH 1/6] Document cuda.core support policy

Add support.rst covering versioning (SemVer), CUDA version support (dual major versions), Python version support (CPython EOL schedule), free-threading (experimental), and release cadence (bimonthly).

Closes #2030
---
 cuda_core/docs/source/index.rst   |  1 +
 cuda_core/docs/source/support.rst | 75 +++++++++++++++++++++++++++++++
 2 files changed, 76 insertions(+)
 create mode 100644 cuda_core/docs/source/support.rst

diff --git a/cuda_core/docs/source/index.rst b/cuda_core/docs/source/index.rst
index 3bf962d7251..5c6c9d83ffe 100644
--- a/cuda_core/docs/source/index.rst
+++ b/cuda_core/docs/source/index.rst
@@ -21,6 +21,7 @@ Welcome to the documentation for ``cuda.core``.
 .. toctree::
    :maxdepth: 1

+   support
    conduct
    license

diff --git a/cuda_core/docs/source/support.rst b/cuda_core/docs/source/support.rst
new file mode 100644
index 00000000000..f95ab2a72fe
--- /dev/null
+++ b/cuda_core/docs/source/support.rst
@@ -0,0 +1,75 @@
+.. SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+.. SPDX-License-Identifier: Apache-2.0

.. _cuda-core-support:

``cuda.core`` Support Policy
============================

Versioning Scheme
-----------------

``cuda.core`` follows `Semantic Versioning (SemVer) <https://semver.org/>`_ with the version
format ``major.minor.patch``:

- **Major**: Bumped when a new CUDA major release is out and support for the oldest CUDA major
  version is dropped. Breaking API changes only happen at major-version boundaries.
- **Minor**: Bumped when new, backward-compatible features are added, or when a new Python minor
  release is out and the oldest supported Python version reaches EOL.
- **Patch**: Bumped for bug fixes and backward-compatible maintenance updates.

Unlike ``cuda.bindings``, the ``cuda.core`` version is *not* aligned with the CUDA Toolkit version.
Consult the table below or the :doc:`release notes <release>` to determine which CUDA versions are
supported by a given ``cuda.core`` release.

CUDA Version Support
--------------------

``cuda.core`` is actively maintained to support the two (2) most recent CUDA major versions. For
example, ``cuda.core`` 1.x supports CUDA 12 and 13. Fixes in the latest release are backported to
the older supported major version as needed.

When a new CUDA major version is released and support for the oldest major version is dropped,
``cuda.core`` will release a new major version (e.g., 1.x → 2.0.0).

.. list-table:: CUDA Version Support Matrix
   :header-rows: 1

   * - ``cuda.core`` version
     - Supported CUDA versions
   * - 1.x
     - 12, 13

Python Version Support
----------------------

``cuda.core`` supports all Python versions following the `CPython EOL schedule
<https://devguide.python.org/versions/>`_. As of this writing, Python 3.10 – 3.14 are supported.

When a new Python minor version is released and the oldest supported version reaches EOL,
``cuda.core`` will bump its minor version accordingly.

Free-threading Build Support
----------------------------

As of ``cuda.core`` 1.0.0, wheels for the `free-threaded interpreter
<https://docs.python.org/3/howto/free-threading-python.html>`_ are shipped to PyPI. This support
is currently *experimental*.

1. For now, you are responsible for making sure that calls into the underlying CUDA libraries
   are thread-safe. This is subject to change.
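For illustration, a minimal sketch of the caller-side serialization this item asks for: one lock
shared by every thread that calls into CUDA. The ``Device`` import path and methods used here are
assumptions based on the API reference, not part of this policy:

.. code-block:: python

   import threading

   from cuda.core import Device  # import path assumed

   # One process-wide lock: under the free-threaded interpreter, calls into
   # the underlying CUDA libraries are not yet guaranteed to be thread-safe,
   # so the caller serializes them explicitly.
   _cuda_lock = threading.Lock()

   def worker(device_id: int) -> None:
       with _cuda_lock:
           dev = Device(device_id)
           dev.set_current()
           stream = dev.create_stream()
       # ... enqueue work on `stream`, guarding further CUDA calls the same way ...

   threads = [threading.Thread(target=worker, args=(0,)) for _ in range(4)]
   for t in threads:
       t.start()
   for t in threads:
       t.join()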

Release Cadence
---------------

- ``cuda.core`` follows its own release cadence, independent of CUDA Toolkit releases, as long as
  SemVer guarantees are maintained.
- We currently aim for bimonthly releases, though this is subject to change.
- Major version releases are aligned to CUDA major version releases.
- New features may be delivered in minor releases at any time; they are not gated by the CUDA
  Toolkit release schedule.

----

The NVIDIA CUDA Python team reserves the right to amend the above support policy. Any major changes,
however, will be announced to users in advance.

From 74dff3dd66f217a0c4095e8cb373d6f98e7ddfdb Mon Sep 17 00:00:00 2001
From: Leo Fang
Date: Wed, 6 May 2026 02:46:42 +0000
Subject: [PATCH 2/6] Fix broken CCCL URLs and add missing cuda.bindings
 interfaces

- Update cuda.coop and cuda.compute URLs from the old nvidia.github.io/cccl/python/{coop,compute} paths (now 404) to the current unstable doc paths.
- Add nvFatbin and NVML to the cuda.bindings interface list.
- Update all three synced files: README.md, cuda_python/DESCRIPTION.rst, and cuda_python/docs/source/index.rst.
---
 README.md                         | 6 ++++--
 cuda_python/DESCRIPTION.rst       | 6 ++++--
 cuda_python/docs/source/index.rst | 8 ++++----
 3 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/README.md b/README.md
index 6da895bbb9b..0a986bc10b0 100644
--- a/README.md
+++ b/README.md
@@ -5,8 +5,8 @@ CUDA Python is the home for accessing NVIDIA’s CUDA platform from Python. It c
 * [cuda.core](https://nvidia.github.io/cuda-python/cuda-core/latest): Pythonic access to CUDA Runtime and other core functionality
 * [cuda.bindings](https://nvidia.github.io/cuda-python/cuda-bindings/latest): Low-level Python bindings to CUDA C APIs
 * [cuda.pathfinder](https://nvidia.github.io/cuda-python/cuda-pathfinder/latest): Utilities for locating CUDA components installed in the user's Python environment
-* [cuda.coop](https://nvidia.github.io/cccl/python/coop): A Python module providing CCCL's reusable block-wide and warp-wide *device* primitives for use within Numba CUDA kernels
-* [cuda.compute](https://nvidia.github.io/cccl/python/compute): A Python module for easy access to CCCL's highly efficient and customizable parallel algorithms, like `sort`, `scan`, `reduce`, `transform`, etc. that are callable on the *host*
+* [cuda.coop](https://nvidia.github.io/cccl/unstable/python/coop.html): A Python module providing CCCL's reusable block-wide and warp-wide *device* primitives for use within Numba CUDA kernels
+* [cuda.compute](https://nvidia.github.io/cccl/unstable/python/compute/index.html): A Python module for easy access to CCCL's highly efficient and customizable parallel algorithms, like `sort`, `scan`, `reduce`, `transform`, etc. that are callable on the *host*
 * [numba.cuda](https://nvidia.github.io/numba-cuda/): A Python DSL that exposes CUDA **SIMT** programming model and compiles a restricted subset of Python code into CUDA kernels and device functions
 * [cuda.tile](https://docs.nvidia.com/cuda/cutile-python/): A new Python DSL that exposes CUDA **Tile** programming model and allows users to write NumPy-like code in CUDA kernels
 * [nvmath-python](https://docs.nvidia.com/cuda/nvmath-python/latest): Pythonic access to NVIDIA CPU & GPU Math Libraries, with [*host*](https://docs.nvidia.com/cuda/nvmath-python/latest/overview.html#host-apis), [*device*](https://docs.nvidia.com/cuda/nvmath-python/latest/overview.html#device-apis), and [*distributed*](https://docs.nvidia.com/cuda/nvmath-python/latest/distributed-apis/index.html) APIs. It also provides low-level Python bindings to host C APIs ([nvmath.bindings](https://docs.nvidia.com/cuda/nvmath-python/latest/bindings/index.html)).
@@ -44,4 +44,6 @@ The list of available interfaces is:
 * NVRTC
 * nvJitLink
 * NVVM
+* nvFatbin
 * cuFile
+* NVML

diff --git a/cuda_python/DESCRIPTION.rst b/cuda_python/DESCRIPTION.rst
index 6120a568023..90bf5c127a4 100644
--- a/cuda_python/DESCRIPTION.rst
+++ b/cuda_python/DESCRIPTION.rst
@@ -10,8 +10,8 @@ CUDA Python is the home for accessing NVIDIA's CUDA platform from Python. It con
 * `cuda.core <https://nvidia.github.io/cuda-python/cuda-core/latest>`_: Pythonic access to CUDA Runtime and other core functionality
 * `cuda.bindings <https://nvidia.github.io/cuda-python/cuda-bindings/latest>`_: Low-level Python bindings to CUDA C APIs
 * `cuda.pathfinder <https://nvidia.github.io/cuda-python/cuda-pathfinder/latest>`_: Utilities for locating CUDA components installed in the user's Python environment
-* `cuda.coop <https://nvidia.github.io/cccl/python/coop>`_: A Python module providing CCCL's reusable block-wide and warp-wide *device* primitives for use within Numba CUDA kernels
-* `cuda.compute <https://nvidia.github.io/cccl/python/compute>`_: A Python module for easy access to CCCL's highly efficient and customizable parallel algorithms, like ``sort``, ``scan``, ``reduce``, ``transform``, etc. that are callable on the *host*
+* `cuda.coop <https://nvidia.github.io/cccl/unstable/python/coop.html>`_: A Python module providing CCCL's reusable block-wide and warp-wide *device* primitives for use within Numba CUDA kernels
+* `cuda.compute <https://nvidia.github.io/cccl/unstable/python/compute/index.html>`_: A Python module for easy access to CCCL's highly efficient and customizable parallel algorithms, like ``sort``, ``scan``, ``reduce``, ``transform``, etc. that are callable on the *host*
 * `numba.cuda <https://nvidia.github.io/numba-cuda/>`_: A Python DSL that exposes CUDA **SIMT** programming model and compiles a restricted subset of Python code into CUDA kernels and device functions
 * `cuda.tile <https://docs.nvidia.com/cuda/cutile-python/>`_: A new Python DSL that exposes CUDA **Tile** programming model and allows users to write NumPy-like code in CUDA kernels
 * `nvmath-python <https://docs.nvidia.com/cuda/nvmath-python/latest>`_: Pythonic access to NVIDIA CPU & GPU Math Libraries, with `host <https://docs.nvidia.com/cuda/nvmath-python/latest/overview.html#host-apis>`_, `device <https://docs.nvidia.com/cuda/nvmath-python/latest/overview.html#device-apis>`_, and `distributed <https://docs.nvidia.com/cuda/nvmath-python/latest/distributed-apis/index.html>`_ APIs. It also provides low-level Python bindings to host C APIs (`nvmath.bindings <https://docs.nvidia.com/cuda/nvmath-python/latest/bindings/index.html>`_).
@@ -52,4 +52,6 @@ The list of available interfaces is:
 * NVRTC
 * nvJitLink
 * NVVM
+* nvFatbin
 * cuFile
+* NVML

diff --git a/cuda_python/docs/source/index.rst b/cuda_python/docs/source/index.rst
index 7aad94ef9c4..458a7a03229 100644
--- a/cuda_python/docs/source/index.rst
+++ b/cuda_python/docs/source/index.rst
@@ -20,8 +20,8 @@ multiple components:
 - `CUPTI Python`_: Python APIs for creation of profiling tools that target CUDA Python applications via the CUDA Profiling Tools Interface (CUPTI)
 - `Accelerated Computing Hub`_: Open-source learning materials related to GPU computing. You will find user guides, tutorials, and other works freely available for all learners interested in GPU computing.

-.. _cuda.coop: https://nvidia.github.io/cccl/python/coop
-.. _cuda.compute: https://nvidia.github.io/cccl/python/compute
+.. _cuda.coop: https://nvidia.github.io/cccl/unstable/python/coop.html
+.. _cuda.compute: https://nvidia.github.io/cccl/unstable/python/compute/index.html
 .. _numba.cuda: https://nvidia.github.io/numba-cuda/
 .. _cuda.tile: https://docs.nvidia.com/cuda/cutile-python/
 .. _nvmath-python: https://docs.nvidia.com/cuda/nvmath-python/latest
@@ -50,8 +50,8 @@ be available, please refer to the `cuda.bindings`_ documentation for installatio
    cuda.core
    cuda.bindings
    cuda.pathfinder
-   cuda.coop <https://nvidia.github.io/cccl/python/coop>
-   cuda.compute <https://nvidia.github.io/cccl/python/compute>
+   cuda.coop <https://nvidia.github.io/cccl/unstable/python/coop.html>
+   cuda.compute <https://nvidia.github.io/cccl/unstable/python/compute/index.html>
    numba.cuda
    cuda.tile
    nvmath-python

From 71b1e6ee31612535031732d414013e935a038a35 Mon Sep 17 00:00:00 2001
From: Leo Fang
Date: Wed, 6 May 2026 03:31:30 +0000
Subject: [PATCH 3/6] Add missing entries to cuda.core 1.0.0 release notes

Add new features (green contexts, system.Device NVML APIs, system.typing module, NVML enum re-wrapping), breaking changes (tensor bridge behavior, system.Device renames, privatized helper classes, UUID format change, removed enums), and bug fixes (is_managed for pool alloc, nvJitLink log error handling, NVML event set init, Device.arch unknown, empty field values, runtime error messages, wheel size reduction).
---
 cuda_core/docs/source/release/1.0.0-notes.rst | 186 +++++++++++++++---
 1 file changed, 156 insertions(+), 30 deletions(-)

diff --git a/cuda_core/docs/source/release/1.0.0-notes.rst b/cuda_core/docs/source/release/1.0.0-notes.rst
index 7f0ced8c10b..cdac2c09d49 100644
--- a/cuda_core/docs/source/release/1.0.0-notes.rst
+++ b/cuda_core/docs/source/release/1.0.0-notes.rst
@@ -20,11 +20,79 @@ New features
   including string process state queries, lock/checkpoint/restore/unlock
   operations, and GPU UUID remapping support for restore.
   (`#1343 <https://github.com/NVIDIA/cuda-python/pull/1343>`__)
+- Added green context support (CUDA 12.4+). New types :class:`Context`,
+  :class:`ContextOptions`, :class:`SMResource`, :class:`SMResourceOptions`,
+  :class:`WorkqueueResource`, and :class:`WorkqueueResourceOptions` enable GPU
+  SM and workqueue resource partitioning. Create green contexts via
+  :meth:`Device.create_context`, then use :meth:`Context.create_stream` and
+  :attr:`Context.resources` to work within the partitioned resources.
+  (`#1976 <https://github.com/NVIDIA/cuda-python/pull/1976>`__)
+- Added the :mod:`cuda.core.system` module for NVIDIA Management Library (NVML)
+  access:
+
+  - :attr:`system.Device.mig` for querying and setting MIG mode, enumerating
+    MIG device instances, and navigating parent/child relationships.
+    (`#1916 <https://github.com/NVIDIA/cuda-python/pull/1916>`__)
+  - :attr:`system.Device.compute_running_processes` for querying running compute
+    processes on a device, returning :class:`~system.ProcessInfo` objects with
+    PID, GPU memory usage, and MIG instance IDs.
+    (`#1917 <https://github.com/NVIDIA/cuda-python/pull/1917>`__)
+  - :meth:`system.Device.get_nvlink` for querying NVLink version and state per
+    link, and :attr:`system.Device.utilization` returning current GPU and memory
+    utilization rates.
+    (`#1918 <https://github.com/NVIDIA/cuda-python/pull/1918>`__)
+
+- Re-wrapped NVML enums as human-readable ``StrEnum`` subclasses instead of raw
+  integer re-exports from ``cuda.bindings.nvml``. Added
+  :class:`~system.typing.GpuP2PCapsIndex`, :class:`~system.typing.GpuP2PStatus`,
+  and :class:`~system.typing.GpuTopologyLevel` enums.
+  (`#2014 <https://github.com/NVIDIA/cuda-python/pull/2014>`__)
+- Moved all :mod:`cuda.core.system` enums into a new :mod:`cuda.core.system.typing`
+  module. Imports from ``cuda.core.system`` continue to work, but the canonical
+  location is now ``cuda.core.system.typing``.
+  (`#2022 <https://github.com/NVIDIA/cuda-python/pull/2022>`__)
+- Enums are now available in places where a small number of string values are
+  accepted or returned. You may continue to use the string values, or use
+  enumerations for better linting and type-checking.
+  (`#2016 <https://github.com/NVIDIA/cuda-python/pull/2016>`__)
+  The new enums are:
+
+  - :class:`cuda.core.typing.CompilerBackendType`
+  - :class:`cuda.core.typing.GraphConditionalType`
+  - :class:`cuda.core.typing.GraphMemoryType`
+  - :class:`cuda.core.typing.ManagedMemoryLocationType`
+  - :class:`cuda.core.typing.ObjectCodeFormatType`
+  - :class:`cuda.core.typing.PCHStatusType`
+  - :class:`cuda.core.typing.SourceCodeType`
+  - :class:`cuda.core.typing.VirtualMemoryAccessType`
+  - :class:`cuda.core.typing.VirtualMemoryAllocationType`
+  - :class:`cuda.core.typing.VirtualMemoryGranularityType`
+  - :class:`cuda.core.typing.VirtualMemoryHandleType`
+  - :class:`cuda.core.typing.VirtualMemoryLocationType`

 Breaking changes
 ----------------

+- :class:`~utils.StridedMemoryView` now provides a fast path for ``torch.Tensor``
+  objects via PyTorch's AOT Inductor (AOTI) stable C ABI. When a ``torch.Tensor``
+  is passed to any ``from_*`` classmethod (``from_dlpack``,
+  ``from_cuda_array_interface``, ``from_array_interface``, or
+  ``from_any_interface``), tensor metadata is read directly from the underlying
+  C struct, bypassing the DLPack and CUDA Array Interface protocol overhead.
+  This yields ~7–20x faster ``StridedMemoryView`` construction for PyTorch
+  tensors (depending on whether stream ordering is required). Proper CUDA stream
+  ordering is established between PyTorch's current stream and the consumer
+  stream, matching the DLPack synchronization contract.
+  Requires PyTorch >= 2.3.
+
+  This is a *behavioral* breaking change: because the AOTI tensor bridge reads
+  raw metadata without re-enacting PyTorch's export guardrails, tensors that
+  PyTorch would reject at the DLPack boundary (notably ``requires_grad``,
+  conjugated, non-strided/sparse, and wrong-current-device CUDA tensors) are
+  now accepted. This is intentional — ``StridedMemoryView`` is designed for
+  low-level interop where those checks are not needed.
+  (`#749 <https://github.com/NVIDIA/cuda-python/pull/749>`__)
 - Renamed :class:`~graph.GraphDef` to :class:`~graph.GraphDefinition` for
   consistency with the rest of the API, which spells words out
   (e.g. ``TensorMapDescriptor``, not ``TensorMapDesc``).
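To make the two enum-related entries in the hunk above concrete, here is a minimal sketch; the
enum names come from the lists in these notes, and nothing beyond the imports is assumed:

.. code-block:: python

   # Old import location: still works after this release ...
   from cuda.core.system import GpuTopologyLevel

   # ... but the canonical home is now the dedicated typing submodule.
   from cuda.core.system.typing import GpuTopologyLevel

   # The general-purpose enums live in cuda.core.typing; APIs that accepted
   # plain strings keep accepting them, the enum is an optional, lint-friendly
   # spelling of the same values.
   from cuda.core.typing import SourceCodeType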
@@ -125,36 +193,94 @@ Breaking changes
 - :obj:`cuda.core.typing.DevicePointerT` -> :obj:`cuda.core.typing.DevicePointerType`
 - :obj:`cuda.core.typing.IsStreamT` -> :obj:`cuda.core.typing.IsStreamType`

-Fixes and enhancements
------------------------
+- Renamed and converted multiple :class:`~system.Device` properties and methods
+  for naming consistency
+  (`#1946 <https://github.com/NVIDIA/cuda-python/pull/1946>`__):

-- :class:`~utils.StridedMemoryView` now provides a fast path for ``torch.Tensor``
-  objects via PyTorch's AOT Inductor (AOTI) stable C ABI. When a ``torch.Tensor``
-  is passed to any ``from_*`` classmethod (``from_dlpack``,
-  ``from_cuda_array_interface``, ``from_array_interface``, or
-  ``from_any_interface``), tensor metadata is read directly from the underlying
-  C struct, bypassing the DLPack and CUDA Array Interface protocol overhead.
-  This yields ~7-20x faster ``StridedMemoryView`` construction for PyTorch
-  tensors (depending on whether stream ordering is required). Proper CUDA stream
-  ordering is established between PyTorch's current stream and the consumer
-  stream, matching the DLPack synchronization contract.
-  Requires PyTorch >= 2.3.
-  (`#749 <https://github.com/NVIDIA/cuda-python/pull/749>`__)

+  On :class:`~system.Device`:

-- Enums are not available in places where a small number of string values are
-  accepted or returned. You may continue to use the string values, or use
-  enumerations for better linting and type-checking.
-  (`#2016 <https://github.com/NVIDIA/cuda-python/pull/2016>`__)
-  The new enums are:
+  - ``is_c2c_mode_enabled`` -> ``is_c2c_enabled``
+  - ``persistence_mode_enabled`` -> ``is_persistence_mode_enabled``
+  - ``clock(clock_type)`` -> ``get_clock(clock_type)``
+  - ``get_auto_boosted_clocks_enabled()`` -> ``is_auto_boosted_clocks_enabled``
+    (method -> property)
+  - ``get_current_clock_event_reasons()`` -> ``current_clock_event_reasons``
+    (method -> property)
+  - ``get_supported_clock_event_reasons()`` -> ``supported_clock_event_reasons``
+    (method -> property)
+  - ``display_mode`` -> ``is_display_connected``
+  - ``display_active`` -> ``is_display_active``
+  - ``fan(fan=0)`` -> ``get_fan(fan=0)``
+  - ``get_supported_pstates()`` -> ``supported_pstates``
+    (method -> property)

-  - :class:`cuda.core.typing.CompilerBackendType`
-  - :class:`cuda.core.typing.GraphConditionalType`
-  - :class:`cuda.core.typing.GraphMemoryType`
-  - :class:`cuda.core.typing.ManagedMemoryLocationType`
-  - :class:`cuda.core.typing.ObjectCodeFormatType`
-  - :class:`cuda.core.typing.PCHStatusType`
-  - :class:`cuda.core.typing.SourceCodeType`
-  - :class:`cuda.core.typing.VirtualMemoryAccessType`
-  - :class:`cuda.core.typing.VirtualMemoryAllocationType`
-  - :class:`cuda.core.typing.VirtualMemoryGranularityType`
-  - :class:`cuda.core.typing.VirtualMemoryHandleType`
-  - :class:`cuda.core.typing.VirtualMemoryLocationType`
+  On ``PciInfo``:
+
+  - ``get_max_pcie_link_generation()`` -> ``link_generation`` (method -> property)
+  - ``get_gpu_max_pcie_link_generation()`` -> ``max_link_generation``
+    (method -> property)
+  - ``get_max_pcie_link_width()`` -> ``max_link_width`` (method -> property)
+  - ``get_current_pcie_link_generation()`` -> ``current_link_generation``
+    (method -> property)
+  - ``get_current_pcie_link_width()`` -> ``current_link_width``
+    (method -> property)
+  - ``get_pcie_throughput(counter)`` -> ``get_throughput(counter)``
+  - ``get_pcie_replay_counter()`` -> ``replay_counter`` (method -> property)
+
+  On ``Temperature``:
+
+  - ``sensor(sensor=...)`` -> ``get_sensor(sensor=...)``
+  - ``threshold(threshold_type)`` -> ``get_threshold(threshold_type)``
+  - ``thermal_settings(sensor_index)`` -> ``get_thermal_settings(sensor_index)``
+
+  On ``FanInfo``:
+
+  - ``set_default_fan_speed()`` -> ``set_default_speed()``
+
+- Removed 18 helper/data-container classes from ``cuda.core.system.__all__``:
+  ``BAR1MemoryInfo``, ``ClockInfo``, ``ClockOffsets``, ``CoolerInfo``,
+  ``DeviceAttributes``, ``DeviceEvents``, ``EventData``, ``FanInfo``,
+  ``FieldValue``, ``FieldValues``, ``GpuDynamicPstatesInfo``,
+  ``GpuDynamicPstatesUtilization``, ``InforomInfo``, ``PciInfo``,
+  ``RepairStatus``, ``Temperature``, ``ThermalSensor``, ``ThermalSettings``.
+  These classes are still returned by :class:`~system.Device` properties and
+  methods but should not be directly instantiated by users.
+  (`#1942 <https://github.com/NVIDIA/cuda-python/pull/1942>`__)
+- Removed ``BrandType``, ``NvlinkVersion``, ``PcieUtilCounter``, ``Pstates``,
+  and ``TemperatureSensors`` enums from ``cuda.core.system``; the underlying
+  values are now returned as plain strings or accessed through other APIs.
+  (`#2014 <https://github.com/NVIDIA/cuda-python/pull/2014>`__)
+- :attr:`system.Device.uuid` now returns the full NVML UUID with prefix
+  (e.g. ``GPU-...``). Use :attr:`system.Device.uuid_without_prefix` for
+  the previous behavior.
+  (`#1916 <https://github.com/NVIDIA/cuda-python/pull/1916>`__)
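A hedged sketch of the UUID change described in the entry above; construction of
``system.Device`` by index and the printed values are assumptions for illustration only:

.. code-block:: python

   import cuda.core.system as system

   dev = system.Device(0)  # construction by device index assumed

   # New in 1.0.0: the NVML prefix is kept, e.g. "GPU-xxxxxxxx-...".
   print(dev.uuid)

   # Pre-1.0.0 behavior, without the "GPU-" prefix:
   print(dev.uuid_without_prefix)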

 Fixes and enhancements
 -----------------------

+- Fixed :attr:`Buffer.is_managed` returning ``False`` for pool-allocated managed
+  memory (:class:`ManagedMemoryResource`), which caused DLPack interop to
+  misclassify managed buffers as ``kDLCUDAHost``. The fix queries both the
+  driver pointer attribute and the memory resource.
+  (`#1924 <https://github.com/NVIDIA/cuda-python/pull/1924>`__)
+- :attr:`system.Device.arch` now returns ``UNKNOWN`` instead of raising
+  ``ValueError`` when NVML reports an architecture not yet in the enum.
+  (`#1937 <https://github.com/NVIDIA/cuda-python/pull/1937>`__)
+- :meth:`system.Device.get_field_values` and
+  :meth:`system.Device.clear_field_values` with an empty list no longer raise
+  ``InvalidArgumentError``.
+  (`#1982 <https://github.com/NVIDIA/cuda-python/pull/1982>`__)
+- :class:`Linker` error and info log retrieval now properly checks return codes
+  from nvJitLink, raising exceptions on failure instead of silently ignoring
+  errors.
+  (`#1993 <https://github.com/NVIDIA/cuda-python/pull/1993>`__)
+- Fixed a potential crash when NVML event set creation failed, due to
+  ``__dealloc__`` freeing an uninitialized handle.
+  (`#1992 <https://github.com/NVIDIA/cuda-python/pull/1992>`__)
+- CUDA Runtime error messages are now more reliable, especially on Windows
+  where the runtime DLL name table could disagree with the installed bindings.
+  (`#2003 <https://github.com/NVIDIA/cuda-python/pull/2003>`__)
+- Linux release wheels are now stripped of debug symbols, significantly reducing
+  package size. Debug builds are now supported via
+  ``--config-settings=debug=true``.
+  (`#1890 <https://github.com/NVIDIA/cuda-python/pull/1890>`__)

From 187806097831875368304ba349c552e2798451a6 Mon Sep 17 00:00:00 2001
From: Leo Fang
Date: Wed, 6 May 2026 03:31:35 +0000
Subject: [PATCH 4/6] Update cuda.core docs for 1.0.0 GA

- api.rst: replace pre-1.0 warning with stable-API statement and link to support policy.
- install.rst: update free-threading version reference from 0.4.0 to 1.0.0.
- nv-versions.json: add 1.0.0 entry for the version switcher dropdown.
---
 cuda_core/docs/nv-versions.json   | 4 ++++
 cuda_core/docs/source/api.rst     | 9 ++++-----
 cuda_core/docs/source/install.rst | 2 +-
 3 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/cuda_core/docs/nv-versions.json b/cuda_core/docs/nv-versions.json
index d55ec26f53f..0d0aa6276d9 100644
--- a/cuda_core/docs/nv-versions.json
+++ b/cuda_core/docs/nv-versions.json
@@ -3,6 +3,10 @@
     {
         "version": "latest",
         "url": "https://nvidia.github.io/cuda-python/cuda-core/latest/"
     },
+    {
+        "version": "1.0.0",
+        "url": "https://nvidia.github.io/cuda-python/cuda-core/1.0.0/"
+    },
     {
         "version": "0.7.0",
         "url": "https://nvidia.github.io/cuda-python/cuda-core/0.7.0/"
     }

diff --git a/cuda_core/docs/source/api.rst b/cuda_core/docs/source/api.rst
index 582f2140903..c98a3bc8256 100644
--- a/cuda_core/docs/source/api.rst
+++ b/cuda_core/docs/source/api.rst
@@ -6,11 +6,10 @@
 ``cuda.core`` API Reference
 ===========================

-This is the main API reference for ``cuda.core``. The package has not yet
-reached version 1.0.0, and APIs may change between minor versions, possibly
-without deprecation warnings. Once version 1.0.0 is released, APIs will
-be considered stable and will follow semantic versioning with appropriate
-deprecation periods for breaking changes.
+This is the main API reference for ``cuda.core``. As of version 1.0.0, all
+APIs are considered stable and follow `Semantic Versioning <https://semver.org/>`_
+with appropriate deprecation periods for breaking changes. See the
+:doc:`support policy <support>` for details.

 Devices and execution

diff --git a/cuda_core/docs/source/install.rst b/cuda_core/docs/source/install.rst
index 90e2a1b5b17..05f813f9d3f 100644
--- a/cuda_core/docs/source/install.rst
+++ b/cuda_core/docs/source/install.rst
@@ -32,7 +32,7 @@ dependencies are as follows:
 Free-threading Build Support
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-As of cuda-core 0.4.0, **experimental** packages for the `free-threaded interpreter`_ are shipped.
+As of cuda-core 1.0.0, **experimental** packages for the `free-threaded interpreter`_ are shipped.

 1. Support for these builds is best effort, due to heavy use of `built-in modules that are known
    to be thread-unsafe`_, such as ``ctypes``.

From 7bccc2b7bf4f37793fe9ce5e28f0d4feb4dba6c1 Mon Sep 17 00:00:00 2001
From: Leo Fang
Date: Wed, 6 May 2026 17:45:44 +0000
Subject: [PATCH 5/6] Split cuda.core.system API reference into separate page

Move the CUDA system information / NVML section from api.rst into a dedicated api_nvml.rst. The new page uses its own `.. module:: cuda.core.system` directive so autosummary entries no longer need the `system.` prefix. Added to index.rst toctree after api.
---
 cuda_core/docs/source/api.rst      | 40 ---------------------------
 cuda_core/docs/source/api_nvml.rst | 44 ++++++++++++++++++++++++++++++
 cuda_core/docs/source/index.rst    |  1 +
 3 files changed, 45 insertions(+), 40 deletions(-)
 create mode 100644 cuda_core/docs/source/api_nvml.rst

diff --git a/cuda_core/docs/source/api.rst b/cuda_core/docs/source/api.rst
index 64f3c49a547..74e0ad392e7 100644
--- a/cuda_core/docs/source/api.rst
+++ b/cuda_core/docs/source/api.rst
@@ -241,46 +241,6 @@ execution.

     checkpoint.Process

-CUDA system information and NVIDIA Management Library (NVML)
--------------------------------------------------------------
-
-.. note::
-   ``cuda.core.system`` support requires ``cuda_bindings`` 12.9.6 or later, or 13.2.0 or later.
-
-Basic functions
-```````````````
-
-.. autosummary::
-   :toctree: generated/
-
-   system.get_driver_version
-   system.get_driver_version_full
-   system.get_driver_branch
-   system.get_num_devices
-   system.get_nvml_version
-   system.get_process_name
-   system.get_topology_common_ancestor
-   system.get_p2p_status
-
-Events
-``````
-
-.. autosummary::
-   :toctree: generated/
-
-   system.register_events
-
-Types
-`````
-
-.. autosummary::
-   :toctree: generated/
-
-   :template: autosummary/cyclass.rst
-
-   system.Device
-   system.NvlinkInfo

 Utility functions
 -----------------

diff --git a/cuda_core/docs/source/api_nvml.rst b/cuda_core/docs/source/api_nvml.rst
new file mode 100644
index 00000000000..9e9ad3d5640
--- /dev/null
+++ b/cuda_core/docs/source/api_nvml.rst
@@ -0,0 +1,44 @@
+.. SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+.. SPDX-License-Identifier: Apache-2.0

.. module:: cuda.core.system

CUDA system information and NVIDIA Management Library (NVML)
============================================================

.. note::
   ``cuda.core.system`` support requires ``cuda_bindings`` 12.9.6 or later, or 13.2.0 or later.

Basic functions
---------------

.. autosummary::
   :toctree: generated/

   get_driver_version
   get_driver_version_full
   get_driver_branch
   get_num_devices
   get_nvml_version
   get_process_name
   get_topology_common_ancestor
   get_p2p_status

Events
------

.. autosummary::
   :toctree: generated/

   register_events
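A quick usage sketch tying the sections above together; the function names come from the
autosummary lists on this page, while the zero-argument call signatures and printed formats
are assumptions:

.. code-block:: python

   import cuda.core.system as system

   # Driver / NVML metadata queries, as enumerated under "Basic functions".
   print(system.get_driver_version())
   print(system.get_nvml_version())

   # Device count, e.g. as a bound for iterating system.Device handles.
   print(system.get_num_devices())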
Types
-----

.. autosummary::
   :toctree: generated/

   :template: autosummary/cyclass.rst

   Device
   NvlinkInfo

diff --git a/cuda_core/docs/source/index.rst b/cuda_core/docs/source/index.rst
index 5c6c9d83ffe..9a266e20949 100644
--- a/cuda_core/docs/source/index.rst
+++ b/cuda_core/docs/source/index.rst
@@ -15,6 +15,7 @@ Welcome to the documentation for ``cuda.core``.
    install
    interoperability
    api
+   api_nvml
    environment_variables
    contribute

From b08abc45e719aae5fca73e659aa1ff5216a65865 Mon Sep 17 00:00:00 2001
From: Leo Fang
Date: Wed, 6 May 2026 22:57:33 +0000
Subject: [PATCH 6/6] Remove algorithm and size details from
 make_program_cache_key docstring

The Returns section exposed the hash algorithm and digest size, which are implementation details. Replace with "opaque bytes digest" so the public API contract does not pin these. See #2043
---
 cuda_core/cuda/core/utils/_program_cache/_keys.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/cuda_core/cuda/core/utils/_program_cache/_keys.py b/cuda_core/cuda/core/utils/_program_cache/_keys.py
index dda07039e32..fbb5ef3f890 100644
--- a/cuda_core/cuda/core/utils/_program_cache/_keys.py
+++ b/cuda_core/cuda/core/utils/_program_cache/_keys.py
@@ -670,7 +670,7 @@ def make_program_cache_key(
     Returns
     -------
     bytes
-        A 32-byte blake2b digest suitable for use as a cache key.
+        An opaque bytes digest suitable for use as a cache key.

     Raises
     ------