
Harden _py_host_destructor against invocation after Py_Finalize #2042

@Andy-Jost

Description


Summary

_py_host_destructor (cuda_core/cuda/core/graph/_utils.pyx) is the destroy callback we attach to CUDA user objects that hold Python references for graph-resource lifetime. It is declared noexcept with gil and unconditionally calls Py_DECREF:

cdef void _py_host_destructor(void* data) noexcept with gil:
    _py_decref(data)

It is attached via _attach_user_object with the CU_USER_OBJECT_NO_DESTRUCTOR_SYNC flag, which explicitly allows CUDA to invoke the destructor asynchronously on a worker thread, decoupled from cuGraphDestroy.
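The decoupling that CU_USER_OBJECT_NO_DESTRUCTOR_SYNC permits can be modeled in plain Python (the names below are illustrative, not the CUDA API): "destroy" only enqueues the callback, and a worker thread runs it at some later, unsynchronized point.

```python
import queue
import threading

# Illustrative model only; these names are not the CUDA API. Under
# NO_DESTRUCTOR_SYNC semantics, destroying the owning object merely
# enqueues the destructor, and a worker thread invokes it later.
_pending = queue.Queue()

def _worker():
    while True:
        item = _pending.get()
        if item is None:          # sentinel used to drain the demo
            break
        callback, data = item
        callback(data)            # may run arbitrarily late

worker = threading.Thread(target=_worker)
worker.start()

released = []

def destroy_user_object(callback, data):
    """Returns immediately; the destructor runs asynchronously."""
    _pending.put((callback, data))

destroy_user_object(released.append, "payload")
_pending.put(None)                # drain the queue so the demo exits
worker.join()
print(released)                   # ['payload']
```

Nothing in this contract orders the worker's callback before interpreter shutdown, which is the window described next.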

This creates a window during interpreter shutdown:

  1. Python starts shutdown; GraphDefinition.__dealloc__ runs and calls cuGraphDestroy.
  2. CUDA queues the destructor for the user object on a worker thread.
  3. Py_Finalize completes.
  4. The CUDA worker later runs _py_host_destructor; the with gil prologue calls PyGILState_Ensure after the runtime is gone, which is undefined behavior and typically crashes.

In practice this is usually masked because CUDA tends to run the destructor synchronously inside cuGraphDestroy, but the contract permits the bad ordering and the codebase should not depend on the lucky timing.

Affected callers

All current users of _py_host_destructor attach it through _attach_user_object, so every attachment site inherits the same shutdown hazard.

Proposed fix

Guard the decref with Py_IsInitialized():

cdef extern from "Python.h":
    int Py_IsInitialized()

cdef void _py_host_destructor(void* data) noexcept with gil:
    if Py_IsInitialized():
        _py_decref(data)
    # else: process is exiting; the OS will reclaim everything.

An alternative is to drop CU_USER_OBJECT_NO_DESTRUCTOR_SYNC for Python-typed user objects so destructors always run synchronously inside cuGraphDestroy (where we are guaranteed to hold the GIL). That is safer but may have performance implications and changes existing semantics for the host-callback path; the Py_IsInitialized() guard is the smaller, safer change.
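Py_IsInitialized() is part of the stable C API, so the guard adds no new dependency. A quick sanity check of the guard condition from Python itself, via ctypes (CPython-specific):

```python
import ctypes

# CPython-specific: ctypes.pythonapi exposes the running interpreter's
# own C API symbols.
ctypes.pythonapi.Py_IsInitialized.restype = ctypes.c_int

# While the interpreter is alive the guard is true, so the decref path
# in _py_host_destructor behaves exactly as it does today.
print(ctypes.pythonapi.Py_IsInitialized() != 0)  # True
```

Inside the destructor, the flag can only report "running" or "finalized"; in the finalized state the process is exiting anyway, so skipping the decref leaks nothing the OS will not reclaim.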

Context

This should be fixed in the context of #1330. The broader graph-lifetime/update work tracked there is the natural place to review the user-object lifetime model end to end.

Metadata

Labels

P2 (Low priority - Nice to have) · bug (Something isn't working) · cuda.core (Everything related to the cuda.core module)
