cuda.core: trim graph API surface for v1.0 (drop top-level re-exports, remove GraphAllocOptions) #2048
Merged
Andy-Jost merged 4 commits into NVIDIA:main on May 7, 2026
The graph types (Graph, GraphAllocOptions, GraphBuilder, GraphCompleteOptions, GraphCondition, GraphDebugPrintOptions, GraphDefinition) are now reachable only from the cuda.core.graph submodule. The submodule itself is still loaded by `import cuda.core` (via `from cuda.core import ... graph ...`), so `cuda.core.graph.X` remains accessible without an explicit submodule import. The same symbols are also no longer forwarded through the deprecated cuda.core.experimental shim.
Documents the change as a breaking change in the 1.0.0 release notes and updates internal tests and the getting-started guide to import through cuda.core.graph.
Co-authored-by: Cursor <cursoragent@cursor.com>
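For downstream code, this commit means that any import of a graph type from `cuda.core` or `cuda.core.experimental` has to be rewritten against `cuda.core.graph`. A small scanner along the following lines could help locate such imports. It is only a sketch: the helper name `find_stale_graph_imports` and the `OLD_MODULES` set are illustrative and not part of cuda.core, and it uses nothing beyond the standard-library `ast` module.

```python
import ast

# Names listed in this commit as no longer re-exported at the top level.
MOVED_TO_GRAPH = frozenset({
    "Graph",
    "GraphAllocOptions",
    "GraphBuilder",
    "GraphCompleteOptions",
    "GraphCondition",
    "GraphDebugPrintOptions",
    "GraphDefinition",
})

# Namespaces that used to forward some or all of those names (illustrative constant).
OLD_MODULES = frozenset({"cuda.core", "cuda.core.experimental"})


def find_stale_graph_imports(source: str) -> list[tuple[int, str, str]]:
    """Return (line, module, name) for every import that should move to cuda.core.graph."""
    stale = []
    for node in ast.walk(ast.parse(source)):
        # Only absolute `from <module> import <names>` statements are relevant here.
        if isinstance(node, ast.ImportFrom) and node.level == 0 and node.module in OLD_MODULES:
            for alias in node.names:
                if alias.name in MOVED_TO_GRAPH:
                    stale.append((node.lineno, node.module, alias.name))
    return sorted(stale)
```

Each reported name can then be imported from `cuda.core.graph` instead, or reached as `cuda.core.graph.X` after a plain `import cuda.core`.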
GraphDefinition.allocate and GraphNode.allocate now accept device, memory_type, and peer_access as keyword-only arguments instead of a positional GraphAllocOptions dataclass. The dataclass and its companion AllocNode.options round-trip property are removed; the existing AllocNode.device_id, .memory_type, and .peer_access properties cover that data directly.
Documents the change as a breaking change in the 1.0.0 release notes and removes the type from the API reference autosummary.
Co-authored-by: Cursor <cursoragent@cursor.com>
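The effect of the keyword-only signature can be illustrated without a GPU. The sketch below uses stand-ins only: `GraphMemoryType` is a one-member placeholder enum and `allocate` is a plain function that mirrors the parameter list described in this PR and merely records its arguments. Neither is the real cuda.core implementation.

```python
from enum import Enum


class GraphMemoryType(Enum):
    """Placeholder for the real enum; only DEVICE is named in this PR."""
    DEVICE = "device"


def allocate(size, *, device=None, memory_type=GraphMemoryType.DEVICE, peer_access=None):
    """Stand-in with the new parameter list; it just echoes what it received."""
    return {
        "size": size,
        "device": device,
        "memory_type": memory_type,
        "peer_access": peer_access,
    }


def positional_options_rejected() -> bool:
    """Show that a second positional argument is no longer accepted."""
    try:
        # Before this change, a GraphAllocOptions instance was passed positionally here.
        allocate(1024, 0)
    except TypeError:
        return True
    return False


# After this change, each former option is spelled out by keyword.
node_args = allocate(1024, device=0, peer_access=[1])
```

Because the options sit behind `*`, a call site that still passes a second positional argument fails with a `TypeError` instead of being silently misread.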
leofang approved these changes on May 7, 2026
Address review comments from @leofang on PR NVIDIA#2048: annotate the device and peer_access parameters of GraphDefinition.allocate and GraphNode.allocate as `"Device" | int | None` and `list["Device" | int] | None` respectively, instead of leaving them untyped.
Co-authored-by: Cursor <cursoragent@cursor.com>
@Andy-Jost CI failed
…rcular load
Importing cuda.core.graph from the top of cuda/core/__init__.py triggers a load of cuda.core.graph._graph_builder, which cimports cuda.core._stream and other extensions. While cuda.core itself is still initializing, those circular loads leave the graph submodules partially initialized when `from ._graph_builder import *` runs in cuda/core/graph/__init__.py, and Graph, GraphBuilder, GraphCompleteOptions, and GraphDebugPrintOptions silently fail to surface on cuda.core.graph.
Defer `import cuda.core.graph` until after every cuda.core._* extension has been loaded so the inner `from ._graph_builder import *` finds a fully initialized module. The standalone `import` form (rather than `from cuda.core import graph`) keeps it from being collapsed back into the checkpoint/system/utils block by ruff's import sorter; an `# isort: split` marker pins the placement.
Co-authored-by: Cursor <cursoragent@cursor.com>
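The silent failure described in this commit can be reproduced in pure Python. The sketch below is an analogy under stated assumptions, not the actual cuda.core layout: it writes two throwaway modules (the `demo_provider_*` and `demo_consumer_*` names are hypothetical) to a temporary directory, where the consumer runs `from provider import *` and the provider imports the consumer either before or after defining `Graph`. The real case involves Cython extension modules, but the mechanism is the same: a star import from a partially initialized module copies only the names that exist at that moment and raises no error.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def star_import_sees_graph(early_import: bool) -> bool:
    """Build a two-module import cycle and report whether `Graph` reached the consumer."""
    suffix = "early" if early_import else "late"
    provider = f"demo_provider_{suffix}"  # hypothetical module names
    consumer = f"demo_consumer_{suffix}"

    import_line = f"import {consumer}\n"
    define_line = "class Graph:\n    pass\n"
    # The provider imports the consumer either before or after defining Graph.
    provider_src = import_line + define_line if early_import else define_line + import_line
    # The consumer re-exports whatever the provider has defined so far.
    consumer_src = f"from {provider} import *\n"

    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, f"{provider}.py").write_text(provider_src)
        Path(tmp, f"{consumer}.py").write_text(consumer_src)
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            importlib.import_module(provider)
            return hasattr(sys.modules[consumer], "Graph")
        finally:
            # Leave the interpreter state as it was found.
            sys.path.remove(tmp)
            sys.modules.pop(provider, None)
            sys.modules.pop(consumer, None)
```

With `early_import=True` the consumer ends up without `Graph` and no exception is raised. With `early_import=False` the name comes through, which is the effect this commit achieves by deferring `import cuda.core.graph`.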
0fab123 to a93ea03
Summary
Two pre-1.0 breaking-change cleanups to the graph API, applied as separate commits.
- Drop the top-level re-exports of the graph types from the `cuda.core` namespace; they live under `cuda.core.graph` only. The same symbols are also dropped from the deprecated `cuda.core.experimental` shim. The `cuda.core.graph` submodule itself remains accessible after `import cuda.core` (added to the existing `from cuda.core import checkpoint, graph, system, utils` line), so `cuda.core.graph.X` continues to work without an explicit submodule import.
- Remove the `GraphAllocOptions` dataclass and the `AllocNode.options` round-trip property. Its three fields are now keyword-only parameters on `GraphDefinition.allocate` and `GraphNode.allocate`: `device`, `memory_type`, `peer_access`. The same data is still readable on the resulting node via the existing `device_id`, `memory_type`, and `peer_access` properties.
Changes
Commit 1 - drop top-level graph re-exports:
- `cuda_core/cuda/core/__init__.py`: removed `from cuda.core.graph import (Graph, GraphAllocOptions, GraphBuilder, GraphCompleteOptions, GraphCondition, GraphDebugPrintOptions, GraphDefinition)`; added `graph` to the existing submodule import line.
- `cuda_core/cuda/core/experimental/__init__.py`: removed the matching `from cuda.core.graph import (Graph, GraphBuilder, GraphCompleteOptions, GraphDebugPrintOptions)` block.
- `cuda_core/tests/test_experimental_backward_compat.py`: dropped four assertions that exercised the removed forwarding.
- `cuda_core/tests/graph/`: updated to import the affected names from `cuda.core.graph`.
- `cuda_core/docs/source/getting-started.rst`: ``:class:`GraphBuilder` `` -> ``:class:`graph.GraphBuilder` ``.
Commit 2 - remove `GraphAllocOptions`:
- `cuda_core/cuda/core/graph/_graph_definition.pyx`: removed the `GraphAllocOptions` dataclass; new `GraphDefinition.allocate(size, *, device=None, memory_type=GraphMemoryType.DEVICE, peer_access=None)` signature.
- `cuda_core/cuda/core/graph/_graph_node.pyx`: same kwargs on `GraphNode.allocate` (with full per-parameter docstring); inlined the params into `GN_alloc`. Also removed an unsubstantiated note claiming the allocation uses the device's default mempool.
- `cuda_core/cuda/core/graph/_subclasses.pyx`: removed `AllocNode.options` and its docstring entry.
- `cuda_core/tests/graph/test_graph_definition.py`: dropped the import; updated four call sites and two helpers (also dropped the `"options"` key from expected-attrs dicts).
- `cuda_core/docs/source/api.rst`: removed `graph.GraphAllocOptions` from the dataclass autosummary.
Both commits add corresponding entries under "Breaking changes" in `cuda_core/docs/source/release/1.0.0-notes.rst`.
Test Coverage
Existing graph tests (`cuda_core/tests/graph/test_graph_definition.py`, `test_graph_builder*.py`, `test_graph_memory_resource.py`, `test_options.py`, `test_device_launch.py`) and `test_experimental_backward_compat.py` cover all the touched code paths and were updated in lockstep with the API changes. A local run on the user's GPU machine passed before the push.
Related Work
Part of the v1.0 API cleanup tracked in the breaking-changes section of `cuda_core/docs/source/release/1.0.0-notes.rst`.
Made with Cursor