Commit 39c085c

leofang and claude authored
Add green context support (#1976)
* Implement green context v1 API

* Refine green context split compatibility

* Encode green context handle dependencies

* Simplify green context view handles

* Simplify green context descriptor handling

* Expand green context test coverage with proper pytest patterns

Restructure tests into fixtures + classes with full resource cleanup:
- Fixtures: sm_resource, wq_resource, green_ctx (with CUDAError skip),
  green_ctx_active (with try/finally restore), fill_kernel
- _use_green_ctx context manager for safe push/pop in all tests
- TestSMResourceQuery: properties, arch constraints per CC
- TestSMResourceSplit: single/two-group splits, discovery, alignment,
  dry-run vs real parity
- TestGreenContextKernelLaunch: compile + launch + verify in green ctx,
  two independent green contexts, SM + workqueue combined

All set_current calls are paired with restore in finally blocks to
prevent context stack leaks on test failure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Lower green context handling to Cython and simplify Context

- Convert ContextOptions and SMResourceOptions/WorkqueueResourceOptions
  to cdef dataclasses for check_or_create_options compatibility.
- Cache SM metadata in typed cdef fields; fall back to arch-based
  granularity on CUDA 12.x where CUdevSmResource lacks
  minSmPartitionSize/smCoscheduledAlignment.
- Simplify Context to hold only ContextHandle (remove _h_green_ctx and
  _is_green fields). Green ctx association lives in ContextBox; is_green
  queries get_context_green_ctx() on demand.
- ContextOptions.resources accepts Sequence only (no bare resource).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add explicit green context model: ctx.create_stream and ctx.resources

Switch from the push model (dev.set_current + dev.create_stream) to the
explicit model (ctx.create_stream + ctx.resources) as the primary way to
use green contexts.

Context.create_stream(options):
- Only supported on green contexts (raises on primary contexts).
- Delegates to Stream._init, which calls create_stream_handle in C++.
- C++ create_stream_handle auto-dispatches: checks get_context_green_ctx
  and calls cuGreenCtxStreamCreate for green contexts, or
  cuStreamCreateWithPriority for primary. Single function, no duplication.

Context.resources:
- Returns a DeviceResources namespace querying this context's resources
  (cuCtxGetDevResource / cuGreenCtxGetDevResource), not the full device.

dev.set_current(green_ctx) still works but is not the recommended path.

Tests rewritten to use the explicit model throughout. Push-model
set_current kept as regression tests with _use_green_ctx helper.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Harden green context stream creation and resource queries

- Let the driver validate the nonblocking flag for green context streams:
  cuGreenCtxStreamCreate rejects CU_STREAM_DEFAULT. On failure, check if
  the context is green + nonblocking is False and raise a clear ValueError.
- cuCtxGetStreamPriorityRange failure (CUDA_ERROR_INVALID_CONTEXT) now
  raises: "No current CUDA context. Call dev.set_current() before
  creating streams."
- C++ create_stream_handle returns CUDA_ERROR_NOT_SUPPORTED if the context
  is green but cuGreenCtxStreamCreate is unavailable (CUDA < 12.5),
  instead of falling through to cuStreamCreateWithPriority.
- ctx.resources.workqueue now dispatches to cuGreenCtxGetDevResource for
  green contexts, matching the SM query path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add Stream.resources property

Stream.resources delegates to DeviceResources._init_from_ctx via the
stream's tracked context handle, returning the same resource view as
ctx.resources for the stream's parent context.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Polish green context API: docs, error handling, simplification

- dev.create_context raises ValueError (not NotImplementedError) when
  options or resources are missing.
- Cache version checks (_check_green_ctx_support,
  _check_workqueue_support) at module level; raise ValueError instead of
  NotImplementedError.
- Simplify _device_resources.pyx: merge _as_uint and _count_to_sm_count
  into _to_sm_count; inline unsigned int casts for coscheduled params.
- Add green context classes to api.rst (Context, ContextOptions,
  DeviceResources, SMResource, SMResourceOptions, WorkqueueResource,
  WorkqueueResourceOptions).
- Update all docstrings to NumPy style with Attributes/Parameters/Returns
  sections matching the existing codebase convention.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Address review comments: consolidate context handles, GIL ordering, std::vector

Review comment 1: Consolidate create_context_handle_from_green_ctx with
create_context_handle_ref by adding a private overload that takes an
optional GreenCtxHandle. The green ctx path now delegates to it after
calling cuCtxFromGreenCtx, ensuring registry lookup and deduplication.

Review comments 2-4: Move GILReleaseGuard to the first line in
create_green_ctx_handle and create_context_handle_from_green_ctx for
consistency with the rest of the file.

Review comment 6: Keep is_green check inline in _context.pyx using
get_context_green_ctx (cannot add a C++ is_green function across
separate .so boundaries without linker issues).

Review comment 8: Replace malloc/free with std::vector<CUdevResource> in
Device.create_context for automatic cleanup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix stream registry corruption for owner-backed handles

Owner-backed stream handles (from create_stream_handle_with_owner) are
no longer registered in the stream_registry. Multiple Python owners can
wrap the same CUstream independently, each stacking its own
Py_INCREF/Py_DECREF without competing for a single registry slot. The
registry lookup at the top is preserved to reuse existing
cuda-core-owned handles that carry context metadata.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
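
The stream-creation dispatch described in the commit message can be modeled in plain Python. This is a minimal sketch and not the C++ implementation: `Result`, the two boolean parameters, and the returned call names are hypothetical stand-ins chosen for illustration, and no driver call is made.

```python
from enum import Enum


class Result(Enum):
    """Stand-ins for the CUresult codes named in the commit message."""
    SUCCESS = "CUDA_SUCCESS"
    NOT_SUPPORTED = "CUDA_ERROR_NOT_SUPPORTED"


def create_stream_handle(ctx_is_green, green_stream_api_available):
    """Return (result, driver_call) for the path stream creation would take.

    Hypothetical model: the real function is C++ and calls the driver.
    """
    if ctx_is_green:
        if not green_stream_api_available:
            # Report the missing API instead of falling through to the
            # primary-context path.
            return Result.NOT_SUPPORTED, None
        return Result.SUCCESS, "cuGreenCtxStreamCreate"
    return Result.SUCCESS, "cuStreamCreateWithPriority"
```

Keeping both paths in one function means a green context on an older driver surfaces as an explicit error code, so the caller never silently receives a stream from the wrong creation call.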
1 parent b280b9d commit 39c085c

16 files changed

Lines changed: 1670 additions & 26 deletions

cuda_core/cuda/core/__init__.py

Lines changed: 8 additions & 0 deletions
@@ -29,7 +29,15 @@ def _import_versioned_module():
 
 
 from cuda.core import checkpoint, system, utils
+from cuda.core._context import Context, ContextOptions
 from cuda.core._device import Device
+from cuda.core._device_resources import (
+    DeviceResources,
+    SMResource,
+    SMResourceOptions,
+    WorkqueueResource,
+    WorkqueueResourceOptions,
+)
 from cuda.core._event import Event, EventOptions
 from cuda.core._graphics import GraphicsResource
 from cuda.core._launch_config import LaunchConfig

cuda_core/cuda/core/_context.pxd

Lines changed: 6 additions & 1 deletion
@@ -2,7 +2,7 @@
 #
 # SPDX-License-Identifier: Apache-2.0
 
-from cuda.core._resource_handles cimport ContextHandle
+from cuda.core._resource_handles cimport ContextHandle, GreenCtxHandle
 
 cdef class Context:
     """Cython declaration for Context class.
@@ -18,3 +18,8 @@ cdef class Context:
 
     @staticmethod
     cdef Context _from_handle(type cls, ContextHandle h_context, int device_id)
+
+    @staticmethod
+    cdef Context _from_green_ctx(type cls, GreenCtxHandle h_green_ctx, int device_id)
+
+    cpdef close(self)

cuda_core/cuda/core/_context.pyx

Lines changed: 94 additions & 4 deletions
@@ -2,18 +2,34 @@
 #
 # SPDX-License-Identifier: Apache-2.0
 
+from __future__ import annotations
+
+from collections.abc import Sequence
 from dataclasses import dataclass
 
+from cuda.bindings cimport cydriver
+from cuda.core._device_resources cimport DeviceResources, SMResource, WorkqueueResource
+from cuda.core._device_resources import SMResource, WorkqueueResource
 from cuda.core._resource_handles cimport (
     ContextHandle,
+    GreenCtxHandle,
+    as_cu,
+    create_context_handle_from_green_ctx,
+    get_context_green_ctx,
+    get_last_error,
     as_intptr,
     as_py,
 )
+from cuda.core._stream import Stream, StreamOptions
+from cuda.core._utils.cuda_utils cimport HANDLE_RETURN
 
 
 __all__ = ['Context', 'ContextOptions']
 
 
+DeviceResourcesT = Sequence[SMResource | WorkqueueResource]
+
+
 cdef class Context:
     """CUDA context wrapper.
 
@@ -32,17 +48,88 @@ cdef class Context:
         ctx._device_id = device_id
         return ctx
 
+    @staticmethod
+    cdef Context _from_green_ctx(type cls, GreenCtxHandle h_green_ctx, int device_id):
+        """Create Context from an owning green context handle."""
+        cdef ContextHandle h_context = create_context_handle_from_green_ctx(h_green_ctx)
+        if not h_context:
+            HANDLE_RETURN(get_last_error())
+            raise RuntimeError("Failed to create CUDA context view from green context")
+        return Context._from_handle(cls, h_context, device_id)
+
     @property
     def handle(self):
         """Return the underlying CUcontext handle."""
-        if self._h_context.get() == NULL:
+        if not self._h_context:
+            return None
+        if as_cu(self._h_context) == NULL:
             return None
         return as_py(self._h_context)
 
     @property
     def _handle(self):
         return self.handle
 
+    @property
+    def is_green(self) -> bool:
+        """True if this context was created from device resources."""
+        if not self._h_context:
+            return False
+        return get_context_green_ctx(self._h_context).get() != NULL
+
+    @property
+    def resources(self) -> DeviceResources:
+        """Query the hardware resources provisioned for this context.
+
+        For green contexts, returns the resources this context was created
+        with (SM partition, workqueue config). For primary contexts, returns
+        the full device resources.
+
+        Raises :class:`RuntimeError` if the context has been closed.
+        """
+        if not self._h_context:
+            raise RuntimeError("Cannot query resources on a closed context")
+        return DeviceResources._init_from_ctx(self._h_context, self._device_id)
+
+    def create_stream(self, options: StreamOptions | None = None):
+        """Create a new stream bound to this green context.
+
+        This method is only available on green contexts. For primary
+        contexts, use :meth:`Device.create_stream` instead.
+
+        Parameters
+        ----------
+        options : :obj:`~_stream.StreamOptions`, optional
+            Customizable dataclass for stream creation options.
+
+        Returns
+        -------
+        :obj:`~_stream.Stream`
+            Newly created stream object.
+        """
+        if not self._h_context:
+            raise RuntimeError("Cannot create a stream on a closed context")
+        if not self.is_green:
+            raise RuntimeError(
+                "Context.create_stream() is only supported on green contexts. "
+                "Use Device.create_stream() for primary contexts."
+            )
+
+        return Stream._init(options=options, device_id=self._device_id, ctx=self)
+
+    cpdef close(self):
+        """Release this context wrapper's underlying CUDA handles."""
+        cdef cydriver.CUcontext current_ctx
+        if self._h_context and as_cu(self._h_context) != NULL:
+            with nogil:
+                HANDLE_RETURN(cydriver.cuCtxGetCurrent(&current_ctx))
+            if current_ctx == as_cu(self._h_context):
+                raise RuntimeError(
+                    "Cannot close a CUDA context while it is current. "
+                    "Restore a previous context before closing this context."
+                )
+        self._h_context.reset()
+
     def __eq__(self, other):
         if not isinstance(other, Context):
             return NotImplemented
@@ -57,9 +144,12 @@ cdef class Context:
 
 
 @dataclass
-class ContextOptions:
+cdef class ContextOptions:
     """Options for context creation.
 
-    Currently unused, reserved for future use.
+    Attributes
+    ----------
+    resources : :obj:`~cuda.core.typing.DeviceResourcesT`
+        Device resources used to create a green context.
     """
-    pass  # TODO
+    resources: DeviceResourcesT
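
The `close()` guard added in this file refuses to release a context that is still current, which is why the commit message pairs every `set_current` call with a restore in a `finally` block. The pure-Python sketch below illustrates that ordering under stated assumptions: `FakeContext`, `use_then_close`, and the list used as a context stack are hypothetical stand-ins, and no CUDA call is made.

```python
class FakeContext:
    """Hypothetical stand-in for a context wrapper; no CUDA calls are made."""

    def __init__(self, name):
        self.name = name
        self.closed = False

    def close(self, current):
        # Mirror the guard in the diff: refuse to close the current context.
        if not self.closed and current is self:
            raise RuntimeError("Cannot close a CUDA context while it is current.")
        self.closed = True


def use_then_close(stack, ctx):
    """Make ctx current, restore the previous context, then close ctx."""
    stack.append(ctx)
    try:
        pass  # work that needs ctx to be current would go here
    finally:
        stack.pop()  # restore the previous context even if the work raised
    ctx.close(current=stack[-1] if stack else None)
```

Restoring before closing keeps the stack balanced on failure and guarantees the guard never fires during normal cleanup.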

cuda_core/cuda/core/_cpp/REGISTRY_DESIGN.md

Lines changed: 2 additions & 1 deletion
@@ -29,7 +29,8 @@ carries timing/IPC flags, `KernelBox` carries the library dependency).
 Without this level, a round-tripped handle would produce a new Box
 with default metadata, losing information that was set at creation.
 
-Instances: `event_registry`, `kernel_registry`, `graph_node_registry`.
+Instances: `context_registry`, `stream_registry`, `event_registry`,
+`kernel_registry`, `graph_node_registry`.
 
 ## Level 2: Resource Handle -> Python Object (Cython)
