[cuda.core] Add support for host_launch (host callback nodes / host function launches) #2058

@rparolin

Description

Feature Request

Add a host_launch (or equivalent) API to cuda.core that allows scheduling
Python callables (or C function pointers) to execute on the host as part of a
stream's work order. This is the cuLaunchHostFunc / cudaLaunchHostFunc
path (and its graph-node counterpart, host nodes via cuGraphAddHostNode).

Motivation

cuda.core currently exposes launch(...) for device kernels but has no
symmetric primitive for host work. As a result, mixed host/device work
ordering cannot be expressed in pure cuda.core terms: users must drop
to cuda.bindings for cuLaunchHostFunc, which breaks the cuda.core
abstraction boundary (streams, events, graphs).

Common use cases:

  • Logging / progress callbacks ordered against GPU work without host-side
    stream synchronization.
  • Triggering Python-side state transitions (e.g. buffer release, metric
    updates) at a specific point in a stream.
  • Host nodes in CUDA graphs for workflows that need host-side compute or
    notification steps between kernels.

Proposed Scope

  • A top-level host_launch(stream, fn, *args, **kwargs) (or
    stream.launch_host(fn, ...)) that wraps cuLaunchHostFunc.
  • A corresponding graph node type (HostNode) added to
    cuda.core.graph._subclasses, alongside the existing EmptyNode,
    MemcpyNode, etc.
  • Clear documentation of the callback threading / reentrancy restrictions
    imposed by the CUDA driver (host functions run on an internal driver
    thread and must not call any CUDA API).
  • An example under cuda_core/examples/ demonstrating a host callback
    ordered between two kernels.
  • API reference entries in cuda_core/docs/source/api.rst.
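
To make the intended ordering concrete, here is a toy model of the proposed host_launch(stream, fn, *args, **kwargs) call. It is a sketch under loud assumptions: FakeStream is a hypothetical pure-Python stand-in that runs queued work in FIFO order on one background thread, the "kernels" are plain Python callables, and nothing here touches cuda.core or the driver. It only illustrates the semantics the proposal asks for, namely a host callback that runs after the work enqueued before it and before the work enqueued after it.

```python
import queue
import threading


class FakeStream:
    """Toy stand-in for a CUDA stream: runs queued work in FIFO order
    on one background thread. Not a cuda.core class."""

    def __init__(self):
        self._work = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            item = self._work.get()
            try:
                if item is None:  # sentinel: stop the worker
                    return
                item()
            finally:
                self._work.task_done()

    def enqueue(self, work):
        self._work.put(work)

    def sync(self):
        # Block until everything enqueued so far has run.
        self._work.join()

    def close(self):
        self._work.put(None)
        self._thread.join()


def host_launch(stream, fn, *args, **kwargs):
    """Hypothetical signature from the proposal, modeled on FakeStream."""
    stream.enqueue(lambda: fn(*args, **kwargs))


log = []
stream = FakeStream()
stream.enqueue(lambda: log.append("kernel A"))  # stands in for launch(...)
host_launch(stream, log.append, "host callback")
stream.enqueue(lambda: log.append("kernel B"))  # stands in for launch(...)
stream.sync()
stream.close()
```

Note that the callback runs on the worker thread rather than the submitting thread, which mirrors the driver-thread restriction listed above: the callback body should stay small and must not re-enter the API that scheduled it.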

Related

  • Driver API: cuLaunchHostFunc, cuGraphAddHostNode
  • Runtime API: cudaLaunchHostFunc
  • Part of cuda.core feature audit gap list (Nov 2025).

Metadata

Labels

P0 (High priority - Must do!), cuda.core (Everything related to the cuda.core module)
