diff --git a/README.md b/README.md
index 6da895bbb9b..0a986bc10b0 100644
--- a/README.md
+++ b/README.md
@@ -5,8 +5,8 @@ CUDA Python is the home for accessing NVIDIA’s CUDA platform from Python. It c
 * [cuda.core](https://nvidia.github.io/cuda-python/cuda-core/latest): Pythonic access to CUDA Runtime and other core functionality
 * [cuda.bindings](https://nvidia.github.io/cuda-python/cuda-bindings/latest): Low-level Python bindings to CUDA C APIs
 * [cuda.pathfinder](https://nvidia.github.io/cuda-python/cuda-pathfinder/latest): Utilities for locating CUDA components installed in the user's Python environment
-* [cuda.coop](https://nvidia.github.io/cccl/python/coop): A Python module providing CCCL's reusable block-wide and warp-wide *device* primitives for use within Numba CUDA kernels
-* [cuda.compute](https://nvidia.github.io/cccl/python/compute): A Python module for easy access to CCCL's highly efficient and customizable parallel algorithms, like `sort`, `scan`, `reduce`, `transform`, etc. that are callable on the *host*
+* [cuda.coop](https://nvidia.github.io/cccl/unstable/python/coop.html): A Python module providing CCCL's reusable block-wide and warp-wide *device* primitives for use within Numba CUDA kernels
+* [cuda.compute](https://nvidia.github.io/cccl/unstable/python/compute/index.html): A Python module for easy access to CCCL's highly efficient and customizable parallel algorithms, like `sort`, `scan`, `reduce`, `transform`, etc. that are callable on the *host*
 * [numba.cuda](https://nvidia.github.io/numba-cuda/): A Python DSL that exposes CUDA **SIMT** programming model and compiles a restricted subset of Python code into CUDA kernels and device functions
 * [cuda.tile](https://docs.nvidia.com/cuda/cutile-python/): A new Python DSL that exposes CUDA **Tile** programming model and allows users to write NumPy-like code in CUDA kernels
 * [nvmath-python](https://docs.nvidia.com/cuda/nvmath-python/latest): Pythonic access to NVIDIA CPU & GPU Math Libraries, with [*host*](https://docs.nvidia.com/cuda/nvmath-python/latest/overview.html#host-apis), [*device*](https://docs.nvidia.com/cuda/nvmath-python/latest/overview.html#device-apis), and [*distributed*](https://docs.nvidia.com/cuda/nvmath-python/latest/distributed-apis/index.html) APIs. It also provides low-level Python bindings to host C APIs ([nvmath.bindings](https://docs.nvidia.com/cuda/nvmath-python/latest/bindings/index.html)).
@@ -44,4 +44,6 @@ The list of available interfaces is:
 * NVRTC
 * nvJitLink
 * NVVM
+* nvFatbin
 * cuFile
+* NVML
diff --git a/cuda_core/docs/nv-versions.json b/cuda_core/docs/nv-versions.json
index d55ec26f53f..0d0aa6276d9 100644
--- a/cuda_core/docs/nv-versions.json
+++ b/cuda_core/docs/nv-versions.json
@@ -3,6 +3,10 @@
         "version": "latest",
         "url": "https://nvidia.github.io/cuda-python/cuda-core/latest/"
     },
+    {
+        "version": "1.0.0",
+        "url": "https://nvidia.github.io/cuda-python/cuda-core/1.0.0/"
+    },
     {
         "version": "0.7.0",
         "url": "https://nvidia.github.io/cuda-python/cuda-core/0.7.0/"
diff --git a/cuda_core/docs/source/api.rst b/cuda_core/docs/source/api.rst
index 41ff5f179ed..74e0ad392e7 100644
--- a/cuda_core/docs/source/api.rst
+++ b/cuda_core/docs/source/api.rst
@@ -6,11 +6,10 @@
 ``cuda.core`` API Reference
 ===========================
 
-This is the main API reference for ``cuda.core``. The package has not yet
-reached version 1.0.0, and APIs may change between minor versions, possibly
-without deprecation warnings. Once version 1.0.0 is released, APIs will
-be considered stable and will follow semantic versioning with appropriate
-deprecation periods for breaking changes.
+This is the main API reference for ``cuda.core``. As of version 1.0.0, all
+APIs are considered stable and follow `Semantic Versioning <https://semver.org/>`_
+with appropriate deprecation periods for breaking changes. See the
+:doc:`support policy <support>` for details.
 
 
 Devices and execution
@@ -242,46 +241,6 @@ execution.
 
    checkpoint.Process
 
-CUDA system information and NVIDIA Management Library (NVML)
-------------------------------------------------------------
-
-.. note::
-   ``cuda.core.system`` support requires ``cuda_bindings`` 12.9.6 or later, or 13.2.0 or later.
-
-Basic functions
-```````````````
-
-.. autosummary::
-   :toctree: generated/
-
-   system.get_driver_version
-   system.get_driver_version_full
-   system.get_driver_branch
-   system.get_num_devices
-   system.get_nvml_version
-   system.get_process_name
-   system.get_topology_common_ancestor
-   system.get_p2p_status
-
-Events
-``````
-
-.. autosummary::
-   :toctree: generated/
-
-   system.register_events
-
-Types
-`````
-
-.. autosummary::
-   :toctree: generated/
-
-   :template: autosummary/cyclass.rst
-
-   system.Device
-   system.NvlinkInfo
-
 Utility functions
 -----------------
diff --git a/cuda_core/docs/source/api_nvml.rst b/cuda_core/docs/source/api_nvml.rst
new file mode 100644
index 00000000000..9e9ad3d5640
--- /dev/null
+++ b/cuda_core/docs/source/api_nvml.rst
@@ -0,0 +1,44 @@
+.. SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+.. SPDX-License-Identifier: Apache-2.0
+
+.. module:: cuda.core.system
+
+CUDA system information and NVIDIA Management Library (NVML)
+============================================================
+
+.. note::
+   ``cuda.core.system`` support requires ``cuda_bindings`` 12.9.6 or later, or 13.2.0 or later.
+
+Basic functions
+---------------
+
+.. autosummary::
+   :toctree: generated/
+
+   get_driver_version
+   get_driver_version_full
+   get_driver_branch
+   get_num_devices
+   get_nvml_version
+   get_process_name
+   get_topology_common_ancestor
+   get_p2p_status
+
+Events
+------
+
+.. autosummary::
+   :toctree: generated/
+
+   register_events
+
+Types
+-----
+
+.. autosummary::
+   :toctree: generated/
+
+   :template: autosummary/cyclass.rst
+
+   Device
+   NvlinkInfo
diff --git a/cuda_core/docs/source/index.rst b/cuda_core/docs/source/index.rst
index 3bf962d7251..9a266e20949 100644
--- a/cuda_core/docs/source/index.rst
+++ b/cuda_core/docs/source/index.rst
@@ -15,12 +15,14 @@ Welcome to the documentation for ``cuda.core``.
    install
    interoperability
    api
+   api_nvml
    environment_variables
    contribute
 
 .. toctree::
    :maxdepth: 1
 
+   support
    conduct
    license
diff --git a/cuda_core/docs/source/install.rst b/cuda_core/docs/source/install.rst
index 90e2a1b5b17..05f813f9d3f 100644
--- a/cuda_core/docs/source/install.rst
+++ b/cuda_core/docs/source/install.rst
@@ -32,7 +32,7 @@ dependencies are as follows:
 Free-threading Build Support
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-As of cuda-core 0.4.0, **experimental** packages for the `free-threaded interpreter`_ are shipped.
+As of cuda-core 1.0.0, **experimental** packages for the `free-threaded interpreter`_ are shipped.
 
 1. Support for these builds is best effort, due to heavy use of
    `built-in modules that are known to be thread-unsafe`_, such as ``ctypes``.
diff --git a/cuda_core/docs/source/release/1.0.0-notes.rst b/cuda_core/docs/source/release/1.0.0-notes.rst
index a0f1de1a121..99d3dca518e 100644
--- a/cuda_core/docs/source/release/1.0.0-notes.rst
+++ b/cuda_core/docs/source/release/1.0.0-notes.rst
@@ -20,11 +20,74 @@ New features
   including string process state queries, lock/checkpoint/restore/unlock
   operations, and GPU UUID remapping support for restore.
   (`#1343 `__)
+- Added green context support (CUDA 12.4+). New types :class:`Context`,
+  :class:`ContextOptions`, :class:`SMResource`, :class:`SMResourceOptions`,
+  :class:`WorkqueueResource`, and :class:`WorkqueueResourceOptions` enable GPU
+  SM and workqueue resource partitioning. Create green contexts via
+  :meth:`Device.create_context`, then use :meth:`Context.create_stream` and
+  :attr:`Context.resources` to work within the partitioned resources.
+  (`#1976 `__)
+- Changes to the :mod:`cuda.core.system` module for NVIDIA Management Library (NVML)
+  access:
+
+  - :attr:`system.Device.mig` for querying and setting MIG mode, enumerating
+    MIG device instances, and navigating parent/child relationships.
+    (`#1916 `__)
+  - :attr:`system.Device.compute_running_processes` for querying running compute
+    processes on a device, returning :class:`~system.ProcessInfo` objects with
+    PID, GPU memory usage, and MIG instance IDs.
+    (`#1917 `__)
+  - :meth:`system.Device.get_nvlink` for querying NVLink version and state per
+    link, and :attr:`system.Device.utilization` returning current GPU and memory
+    utilization rates.
+    (`#1918 `__)
+
+- Re-wrapped NVML enums as human-readable ``StrEnum`` subclasses instead of raw
+  integer re-exports from ``cuda.bindings.nvml``. These are available in
+  ``cuda.core.system.typing``.
+  (`#2014 `__)
+- Enums are now available in places where a small number of string values are
+  accepted or returned. You may continue to use the string values, or use
+  enumerations for better linting and type-checking.
+  (`#2016 `__)
+  The new enums are:
+
+  - :class:`cuda.core.typing.CompilerBackendType`
+  - :class:`cuda.core.typing.GraphConditionalType`
+  - :class:`cuda.core.typing.GraphMemoryType`
+  - :class:`cuda.core.typing.ManagedMemoryLocationType`
+  - :class:`cuda.core.typing.ObjectCodeFormatType`
+  - :class:`cuda.core.typing.PCHStatusType`
+  - :class:`cuda.core.typing.SourceCodeType`
+  - :class:`cuda.core.typing.VirtualMemoryAccessType`
+  - :class:`cuda.core.typing.VirtualMemoryAllocationType`
+  - :class:`cuda.core.typing.VirtualMemoryGranularityType`
+  - :class:`cuda.core.typing.VirtualMemoryHandleType`
+  - :class:`cuda.core.typing.VirtualMemoryLocationType`
 
 Breaking changes
 ----------------
 
+- :class:`~utils.StridedMemoryView` now provides a fast path for ``torch.Tensor``
+  objects via PyTorch's AOT Inductor (AOTI) stable C ABI. When a ``torch.Tensor``
+  is passed to any ``from_*`` classmethod (``from_dlpack``,
+  ``from_cuda_array_interface``, ``from_array_interface``, or
+  ``from_any_interface``), tensor metadata is read directly from the underlying
+  C struct, bypassing the DLPack and CUDA Array Interface protocol overhead.
+  This yields ~7–20x faster ``StridedMemoryView`` construction for PyTorch
+  tensors (depending on whether stream ordering is required). Proper CUDA stream
+  ordering is established between PyTorch's current stream and the consumer
+  stream, matching the DLPack synchronization contract.
+  Requires PyTorch >= 2.3.
+
+  This is a *behavioral* breaking change: because the AOTI tensor bridge reads
+  raw metadata without re-enacting PyTorch's export guardrails, tensors that
+  PyTorch would reject at the DLPack boundary (notably ``requires_grad``,
+  conjugated, non-strided/sparse, and wrong-current-device CUDA tensors) are
+  now accepted. This is intentional — ``StridedMemoryView`` is designed for
+  low-level interop where those checks are not needed.
+  (`#749 `__)
 
 - Renamed :class:`~graph.GraphDef` to :class:`~graph.GraphDefinition` for
   consistency with the rest of the API, which spells words out (e.g.
   ``TensorMapDescriptor``, not ``TensorMapDesc``).
@@ -125,6 +188,63 @@ Breaking changes
   - :obj:`cuda.core.typing.DevicePointerT` -> :obj:`cuda.core.typing.DevicePointerType`
   - :obj:`cuda.core.typing.IsStreamT` -> :obj:`cuda.core.typing.IsStreamType`
 
+- Renamed and converted multiple :class:`~system.Device` properties and methods
+  for naming consistency
+  (`#1946 `__):
+
+  On :class:`~system.Device`:
+
+  - ``is_c2c_mode_enabled`` -> ``is_c2c_enabled``
+  - ``persistence_mode_enabled`` -> ``is_persistence_mode_enabled``
+  - ``clock(clock_type)`` -> ``get_clock(clock_type)``
+  - ``get_auto_boosted_clocks_enabled()`` -> ``is_auto_boosted_clocks_enabled``
+    (method -> property)
+  - ``get_current_clock_event_reasons()`` -> ``current_clock_event_reasons``
+    (method -> property)
+  - ``get_supported_clock_event_reasons()`` -> ``supported_clock_event_reasons``
+    (method -> property)
+  - ``display_mode`` -> ``is_display_connected``
+  - ``display_active`` -> ``is_display_active``
+  - ``fan(fan=0)`` -> ``get_fan(fan=0)``
+  - ``get_supported_pstates()`` -> ``supported_pstates``
+    (method -> property)
+
+  On ``PciInfo``:
+
+  - ``get_max_pcie_link_generation()`` -> ``link_generation`` (method -> property)
+  - ``get_gpu_max_pcie_link_generation()`` -> ``max_link_generation``
+    (method -> property)
+  - ``get_max_pcie_link_width()`` -> ``max_link_width`` (method -> property)
+  - ``get_current_pcie_link_generation()`` -> ``current_link_generation``
+    (method -> property)
+  - ``get_current_pcie_link_width()`` -> ``current_link_width``
+    (method -> property)
+  - ``get_pcie_throughput(counter)`` -> ``get_throughput(counter)``
+  - ``get_pcie_replay_counter()`` -> ``replay_counter`` (method -> property)
+
+  On ``Temperature``:
+
+  - ``sensor(sensor=...)`` -> ``get_sensor(sensor=...)``
+  - ``threshold(threshold_type)`` -> ``get_threshold(threshold_type)``
+  - ``thermal_settings(sensor_index)`` -> ``get_thermal_settings(sensor_index)``
+
+  On ``FanInfo``:
+
+  - ``set_default_fan_speed()`` -> ``set_default_speed()``
+
+- Removed 18 helper/data-container classes from ``cuda.core.system.__all__``:
+  ``BAR1MemoryInfo``, ``ClockInfo``, ``ClockOffsets``, ``CoolerInfo``,
+  ``DeviceAttributes``, ``DeviceEvents``, ``EventData``, ``FanInfo``,
+  ``FieldValue``, ``FieldValues``, ``GpuDynamicPstatesInfo``,
+  ``GpuDynamicPstatesUtilization``, ``InforomInfo``, ``PciInfo``,
+  ``RepairStatus``, ``Temperature``, ``ThermalSensor``, ``ThermalSettings``.
+  These classes are still returned by :class:`~system.Device` properties and
+  methods but should not be directly instantiated by users.
+  (`#1942 `__)
+- :attr:`system.Device.uuid` now returns the full NVML UUID with prefix
+  (e.g. ``GPU-...``). Use :attr:`system.Device.uuid_without_prefix` for
+  the previous behavior.
+  (`#1916 `__)
 - :func:`args_viewable_as_strided_memory` and :class:`StridedMemoryView` are
   no longer at the top-level in :mod:`cuda.core`. They are available publicly
   from the :mod:`cuda.core.utils` module.
@@ -133,33 +253,29 @@ Breaking changes
 
 Fixes and enhancements
 ----------------------
 
-- :class:`~utils.StridedMemoryView` now provides a fast path for ``torch.Tensor``
-  objects via PyTorch's AOT Inductor (AOTI) stable C ABI. When a ``torch.Tensor``
-  is passed to any ``from_*`` classmethod (``from_dlpack``,
-  ``from_cuda_array_interface``, ``from_array_interface``, or
-  ``from_any_interface``), tensor metadata is read directly from the underlying
-  C struct, bypassing the DLPack and CUDA Array Interface protocol overhead.
-  This yields ~7-20x faster ``StridedMemoryView`` construction for PyTorch
-  tensors (depending on whether stream ordering is required). Proper CUDA stream ordering is established between PyTorch's current
-  stream and the consumer stream, matching the DLPack synchronization contract.
-  Requires PyTorch >= 2.3.
-  (`#749 `__)
-
-- Enums are not available in places where a small number of string values are
-  accepted or returned. You may continue to use the string values, or use
-  enumerations for better linting and type-checking.
-  (`#2016 `__)
-  The new enums are:
-
-  - :class:`cuda.core.typing.CompilerBackendType`
-  - :class:`cuda.core.typing.GraphConditionalType`
-  - :class:`cuda.core.typing.GraphMemoryType`
-  - :class:`cuda.core.typing.ManagedMemoryLocationType`
-  - :class:`cuda.core.typing.ObjectCodeFormatType`
-  - :class:`cuda.core.typing.PCHStatusType`
-  - :class:`cuda.core.typing.SourceCodeType`
-  - :class:`cuda.core.typing.VirtualMemoryAccessType`
-  - :class:`cuda.core.typing.VirtualMemoryAllocationType`
-  - :class:`cuda.core.typing.VirtualMemoryGranularityType`
-  - :class:`cuda.core.typing.VirtualMemoryHandleType`
-  - :class:`cuda.core.typing.VirtualMemoryLocationType`
+- Fixed :attr:`Buffer.is_managed` returning ``False`` for pool-allocated managed
+  memory (:class:`ManagedMemoryResource`), which caused DLPack interop to
+  misclassify managed buffers as ``kDLCUDAHost``. The fix queries both the
+  driver pointer attribute and the memory resource.
+  (`#1924 `__)
+- :attr:`system.Device.arch` now returns ``UNKNOWN`` instead of raising
+  ``ValueError`` when NVML reports an architecture not yet in the enum.
+  (`#1937 `__)
+- :meth:`system.Device.get_field_values` and
+  :meth:`system.Device.clear_field_values` with an empty list no longer raise
+  ``InvalidArgumentError``.
+  (`#1982 `__)
+- :class:`Linker` error and info log retrieval now properly checks return codes
+  from nvJitLink, raising exceptions on failure instead of silently ignoring
+  errors.
+  (`#1993 `__)
+- Fixed a potential crash when NVML event set creation failed on Windows, due to
+  ``__dealloc__`` freeing an uninitialized handle.
+  (`#1992 `__)
+- CUDA Runtime error messages are now more reliable, especially on Windows
+  where the runtime DLL name table could disagree with the installed bindings.
+  (`#2003 `__)
+- Linux release wheels are now stripped of debug symbols, significantly reducing
+  package size. Debug builds are now supported via
+  ``--config-settings=debug=true``.
+  (`#1890 `__)
diff --git a/cuda_core/docs/source/support.rst b/cuda_core/docs/source/support.rst
new file mode 100644
index 00000000000..38d91368586
--- /dev/null
+++ b/cuda_core/docs/source/support.rst
@@ -0,0 +1,79 @@
+.. SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+.. SPDX-License-Identifier: Apache-2.0
+
+.. _cuda-core-support:
+
+``cuda.core`` Support Policy
+============================
+
+Versioning Scheme
+-----------------
+
+``cuda.core`` follows `Semantic Versioning (SemVer) <https://semver.org/>`_ with the version
+format ``major.minor.patch``:
+
+- **Major**: Bumped when a new CUDA major release is out and support for the oldest CUDA major
+  version is dropped. Breaking API changes only happen at major-version boundaries.
+- **Minor**: Bumped when new, backward-compatible features are added, or when a new Python feature
+  release is out and the oldest supported Python version reaches EOL.
+- **Patch**: Bumped for bug fixes and backward-compatible maintenance updates.
+
+Unlike ``cuda.bindings``, the ``cuda.core`` version is *not* aligned with the CUDA Toolkit version.
+Consult the table below or the :doc:`release notes ` to determine which CUDA versions are
+supported by a given ``cuda.core`` release.
+
+CUDA Version Support
+--------------------
+
+``cuda.core`` is actively maintained to support the two (2) most recent CUDA major versions. For
+example, ``cuda.core`` 1.x supports CUDA 12 and 13. Any fix in the latest release would be
+backported as needed.
+
+When a new CUDA major version is released and support for the oldest major version is dropped,
+``cuda.core`` will release a new major version (e.g., 1.x → 2.0.0).
+
+.. list-table:: CUDA Version Support Matrix
+   :header-rows: 1
+
+   * - ``cuda.core`` version
+     - Supported CUDA versions
+   * - 1.x
+     - 12, 13
+
+As with any CUDA library, certain features may impose additional requirements on
+the minimum ``cuda-bindings`` or CUDA driver version. Refer to the individual
+module documentation for details.
+
+Python Version Support
+----------------------
+
+``cuda.core`` supports all Python versions following the `CPython EOL schedule
+<https://devguide.python.org/versions/>`_. As of writing, Python 3.10 – 3.14 are supported.
+
+When a new Python feature version is released and the oldest supported version reaches EOL,
+``cuda.core`` will bump its minor version accordingly.
+
+Free-threading Build Support
+----------------------------
+
+As of ``cuda.core`` 1.0.0, wheels for the `free-threaded interpreter
+`_ are shipped to PyPI. This support
+is currently *experimental*.
+
+1. For now, you are responsible for making sure that calls into the underlying CUDA libraries
+   are thread-safe. This is subject to change.
+
+Release Cadence
+---------------
+
+- ``cuda.core`` follows its own release cadence, independent of CUDA Toolkit releases, as long as
+  SemVer guarantees are maintained.
+- We currently aim for bimonthly releases, though this is subject to change.
+- Major version releases are aligned to CUDA major version releases.
+- New features may be delivered in minor releases at any time — not gated by the CUDA Toolkit
+  release schedule.
+
+----
+
+The NVIDIA CUDA Python team reserves the right to amend the above support policy. Any major changes,
+however, will be announced to users in advance.
diff --git a/cuda_python/DESCRIPTION.rst b/cuda_python/DESCRIPTION.rst
index 6120a568023..90bf5c127a4 100644
--- a/cuda_python/DESCRIPTION.rst
+++ b/cuda_python/DESCRIPTION.rst
@@ -10,8 +10,8 @@ CUDA Python is the home for accessing NVIDIA's CUDA platform from Python. It con
 * `cuda.core <https://nvidia.github.io/cuda-python/cuda-core/latest>`_: Pythonic access to CUDA Runtime and other core functionality
 * `cuda.bindings <https://nvidia.github.io/cuda-python/cuda-bindings/latest>`_: Low-level Python bindings to CUDA C APIs
 * `cuda.pathfinder <https://nvidia.github.io/cuda-python/cuda-pathfinder/latest>`_: Utilities for locating CUDA components installed in the user's Python environment
-* `cuda.coop <https://nvidia.github.io/cccl/python/coop>`_: A Python module providing CCCL's reusable block-wide and warp-wide *device* primitives for use within Numba CUDA kernels
-* `cuda.compute <https://nvidia.github.io/cccl/python/compute>`_: A Python module for easy access to CCCL's highly efficient and customizable parallel algorithms, like ``sort``, ``scan``, ``reduce``, ``transform``, etc. that are callable on the *host*
+* `cuda.coop <https://nvidia.github.io/cccl/unstable/python/coop.html>`_: A Python module providing CCCL's reusable block-wide and warp-wide *device* primitives for use within Numba CUDA kernels
+* `cuda.compute <https://nvidia.github.io/cccl/unstable/python/compute/index.html>`_: A Python module for easy access to CCCL's highly efficient and customizable parallel algorithms, like ``sort``, ``scan``, ``reduce``, ``transform``, etc. that are callable on the *host*
 * `numba.cuda <https://nvidia.github.io/numba-cuda/>`_: A Python DSL that exposes CUDA **SIMT** programming model and compiles a restricted subset of Python code into CUDA kernels and device functions
 * `cuda.tile <https://docs.nvidia.com/cuda/cutile-python/>`_: A new Python DSL that exposes CUDA **Tile** programming model and allows users to write NumPy-like code in CUDA kernels
 * `nvmath-python <https://docs.nvidia.com/cuda/nvmath-python/latest>`_: Pythonic access to NVIDIA CPU & GPU Math Libraries, with `host <https://docs.nvidia.com/cuda/nvmath-python/latest/overview.html#host-apis>`_, `device <https://docs.nvidia.com/cuda/nvmath-python/latest/overview.html#device-apis>`_, and `distributed <https://docs.nvidia.com/cuda/nvmath-python/latest/distributed-apis/index.html>`_ APIs. It also provides low-level Python bindings to host C APIs (`nvmath.bindings <https://docs.nvidia.com/cuda/nvmath-python/latest/bindings/index.html>`_).
@@ -52,4 +52,6 @@ The list of available interfaces is:
 * NVRTC
 * nvJitLink
 * NVVM
+* nvFatbin
 * cuFile
+* NVML
diff --git a/cuda_python/docs/source/index.rst b/cuda_python/docs/source/index.rst
index 7aad94ef9c4..458a7a03229 100644
--- a/cuda_python/docs/source/index.rst
+++ b/cuda_python/docs/source/index.rst
@@ -20,8 +20,8 @@ multiple components:
 - `CUPTI Python`_: Python APIs for creation of profiling tools that target CUDA Python applications via the CUDA Profiling Tools Interface (CUPTI)
 - `Accelerated Computing Hub`_: Open-source learning materials related to GPU computing. You will find user guides, tutorials, and other works freely available for all learners interested in GPU computing.
 
-.. _cuda.coop: https://nvidia.github.io/cccl/python/coop
-.. _cuda.compute: https://nvidia.github.io/cccl/python/compute
+.. _cuda.coop: https://nvidia.github.io/cccl/unstable/python/coop.html
+.. _cuda.compute: https://nvidia.github.io/cccl/unstable/python/compute/index.html
 .. _numba.cuda: https://nvidia.github.io/numba-cuda/
 .. _cuda.tile: https://docs.nvidia.com/cuda/cutile-python/
 .. _nvmath-python: https://docs.nvidia.com/cuda/nvmath-python/latest
@@ -50,8 +50,8 @@ be available, please refer to the `cuda.bindings`_ documentation for installatio
 
    cuda.core
    cuda.bindings
   cuda.pathfinder
-   cuda.coop <https://nvidia.github.io/cccl/python/coop>
-   cuda.compute <https://nvidia.github.io/cccl/python/compute>
+   cuda.coop <https://nvidia.github.io/cccl/unstable/python/coop.html>
+   cuda.compute <https://nvidia.github.io/cccl/unstable/python/compute/index.html>
   numba.cuda
   cuda.tile
   nvmath-python
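Reviewer note: the release notes above describe re-wrapping NVML enums as human-readable ``StrEnum`` subclasses that remain interchangeable with plain strings, and ``system.Device.arch`` falling back to ``UNKNOWN`` instead of raising ``ValueError`` for unrecognized values. A minimal, self-contained sketch of that pattern (the class and member names here are hypothetical illustrations, not the actual ``cuda.core.system.typing`` definitions):

```python
from enum import Enum


class DeviceArchitecture(str, Enum):
    """Illustrative stand-in for a cuda.core-style string enum.

    Mixing in str makes members compare equal to their string values, so
    existing code that passes or checks plain strings keeps working, while
    linters and type checkers can flag typos.
    """

    AMPERE = "AMPERE"
    HOPPER = "HOPPER"
    UNKNOWN = "UNKNOWN"

    @classmethod
    def from_nvml(cls, value: str) -> "DeviceArchitecture":
        # Mirror the documented fallback: a value not yet in the enum maps
        # to UNKNOWN instead of raising ValueError.
        try:
            return cls(value)
        except ValueError:
            return cls.UNKNOWN


# Strings and enum members stay interchangeable on the consumer side.
assert DeviceArchitecture.AMPERE == "AMPERE"
assert DeviceArchitecture.from_nvml("HOPPER") is DeviceArchitecture.HOPPER
assert DeviceArchitecture.from_nvml("FUTURE_ARCH") is DeviceArchitecture.UNKNOWN
```

This is why the enum additions are listed as non-breaking: string-valued call sites continue to work unchanged.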