[FEAT] Module::LoadFromBytes: dispatching entry point for in-memory module loaders#591

Open
lucifer1004 wants to merge 2 commits into
apache:mainfrom
lucifer1004:module-load-from-bytes
Contributor

@lucifer1004 lucifer1004 commented May 15, 2026

Summary

Promotes the existing internal LoadModuleFromBytes(kind, bytes)
helper (src/ffi/extra/library_module.cc) to a public Module API,
registers it as the ffi.ModuleLoadFromBytes global, adds matching
Python (tvm_ffi.load_module_from_bytes) and Rust
(tvm_ffi::Module::load_from_bytes) bindings, and ships end-to-end
Python tests that exercise the dispatch contract.

API contract

Module::LoadFromBytes(kind, bytes) dispatches to the registered
global ffi.Module.load_from_bytes.<kind> (signature
(Bytes) -> Module). If no loader for the given kind is registered,
RuntimeError is raised naming the missing key — so the user knows
exactly what they need to register.
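
The contract above can be modelled in a few lines of plain Python. This is only a sketch of the dispatch rule, not the tvm-ffi implementation: `_GLOBALS`, `register_loader`, and `load_from_bytes` are hypothetical stand-ins for the real global function table and entry point, and the returned tuple stands in for a `Module`.

```python
from typing import Callable, Dict

# Hypothetical stand-in for the process-wide global function table;
# the real registry lives inside tvm-ffi and is not modelled here.
_GLOBALS: Dict[str, Callable[[bytes], object]] = {}

_PREFIX = "ffi.Module.load_from_bytes."


def register_loader(kind: str, loader: Callable[[bytes], object]) -> None:
    """Register `loader` under the key ffi.Module.load_from_bytes.<kind>."""
    _GLOBALS[_PREFIX + kind] = loader


def load_from_bytes(kind: str, payload: bytes) -> object:
    """Look up the loader registered for `kind` and hand it the payload."""
    key = _PREFIX + kind
    loader = _GLOBALS.get(key)
    if loader is None:
        # Name the missing key so the caller knows what to register.
        raise RuntimeError(f"no loader registered under {key!r}")
    return loader(payload)


# A consumer registers a loader once; any caller in the process can use it.
register_loader("echo", lambda payload: ("echo-module", payload))
```

The entry point itself knows nothing about any format; it only builds the key from `kind` and forwards the payload, which is why a CPU-only build can ship it without a CUDA dependency.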

This is the dispatching entry point, not a specific format loader.
Loaders are registered by consumers. The split keeps libtvm_ffi.so
independent of libcuda / ROCm / etc.: a CPU-only build has the API
but no built-in CUDA loader, whereas a consumer-side .so (built
against the existing header-only tvm/ffi/extra/cuda/cubin_launcher.h)
can register ffi.Module.load_from_bytes.cubin for the whole process.
The examples/cubin_launcher/dynamic_cubin/ example already
implements this pattern.

Changes

  • include/tvm/ffi/extra/module.h — declares
    Module::LoadFromBytes(const String& kind, const Bytes& bytes) with
    a doc note pointing at cubin_launcher as the canonical loader
    template.
  • src/ffi/extra/library_module.cc — defines it as a thin wrapper
    around the existing LoadModuleFromBytes.
  • src/ffi/extra/module.cc — registers ffi.ModuleLoadFromBytes in
    the static-init block alongside ffi.ModuleLoadFromFile.
  • rust/tvm-ffi/src/extra/module.rs — adds
    tvm_ffi::Module::load_from_bytes(kind, bytes).
  • python/tvm_ffi/module.py — adds
    load_module_from_bytes(kind, data) mirroring the existing
    load_module(path). Exposed from tvm_ffi/__init__.py.
  • python/tvm_ffi/_ffi_api.py — regenerated stub.
  • tests/python/test_module_load_from_bytes.py — three end-to-end
    tests:
    1. round-trip via a Python-registered loader,
    2. RuntimeError path when no loader is registered (message
      names the missing key),
    3. loader exceptions propagating to the caller.
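
For orientation, the three cases could be sketched against a toy dispatcher as follows. This is not the content of `tests/python/test_module_load_from_bytes.py`; `_LOADERS` and `load` are hypothetical stand-ins, and the real tests go through `tvm_ffi` rather than a dictionary.

```python
# Hypothetical stand-in for the global function registry.
_LOADERS = {}


def load(kind, payload):
    """Toy dispatcher: look up the loader by key and call it."""
    key = f"ffi.Module.load_from_bytes.{kind}"
    if key not in _LOADERS:
        raise RuntimeError(f"missing loader: {key}")
    return _LOADERS[key](payload)


def test_round_trip():
    # Case 1: a registered loader receives the payload and its result is returned.
    _LOADERS["ffi.Module.load_from_bytes.echo"] = lambda b: b[::-1]
    assert load("echo", b"abc") == b"cba"


def test_missing_loader_names_key():
    # Case 2: an unregistered kind raises RuntimeError naming the missing key.
    try:
        load("nope", b"")
    except RuntimeError as err:
        assert "ffi.Module.load_from_bytes.nope" in str(err)
    else:
        raise AssertionError("expected RuntimeError")


def test_loader_exception_propagates():
    # Case 3: an exception raised inside the loader reaches the caller.
    def boom(_payload):
        raise ValueError("bad payload")

    _LOADERS["ffi.Module.load_from_bytes.boom"] = boom
    try:
        load("boom", b"")
    except ValueError as err:
        assert str(err) == "bad payload"
    else:
        raise AssertionError("expected ValueError")
```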

Motivation

A project that fetches CUDA PTX / CUBIN payloads from a registry
already has the bytes in memory. The current Module::LoadFromFile
path forces a tempfile detour. With this API:

import tvm_ffi

@tvm_ffi.register_global_func("ffi.Module.load_from_bytes.echo")
def _echo_loader(payload: bytes) -> tvm_ffi.Module:
    # Real loader would parse `payload` and return a runnable module.
    ...

mod = tvm_ffi.load_module_from_bytes("echo", b"<payload>")

Or from Rust:

let module = tvm_ffi::Module::load_from_bytes("cubin", &bytes)?;

The dispatch through ffi.Module.load_from_bytes.<kind> is unchanged;
existing loaders register exactly as before.
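
For contrast, the tempfile detour mentioned above might look roughly like the following. It is a sketch only: `load_from_file` is a hypothetical path-based loader that simply reads the file back, standing in for whatever file-based entry point a caller uses today, and the `.cubin` suffix is an assumed default.

```python
import os
import tempfile


def load_from_file(path: str) -> bytes:
    """Hypothetical path-based loader; here it just reads the file back."""
    with open(path, "rb") as f:
        return f.read()


def load_via_tempfile(payload: bytes, suffix: str = ".cubin") -> bytes:
    """Spill in-memory bytes to disk only so a path-based loader can read them."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        # Write and close the file before handing the path to the loader.
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
        return load_from_file(path)
    finally:
        # Clean up the temporary file whether or not loading succeeded.
        os.remove(path)
```

A bytes-based entry point removes the write, the re-read, and the cleanup step.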

Test plan

  • New Python tests: 3 passing locally
    (tests/python/test_module_load_from_bytes.py).
  • Existing ffi.ModuleLoadFromFile global continues to work.
  • Rust crate compiles + Module::load_from_bytes callable.
  • Pre-commit lint passes locally and in CI on the previous
    revision; updated revision should match.

Stacked on #590

This branch is logically stacked on #590 (Rust macro fixes). After
#590 merges, this PR's diff collapses to just the
Module::LoadFromBytes commit. Reviewers who have already approved
#590 can skip the first commit here.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces the ability to load TVM modules from in-memory bytes by adding LoadFromBytes to the C++ API and load_from_bytes to the Rust bindings. It also includes hygiene improvements to Rust macros, such as using $crate for internal references. Feedback suggests replacing the #[unsafe(no_mangle)] attribute with the standard #[no_mangle] to maintain compatibility with Rust versions older than 1.82.0.

Comment thread rust/tvm-ffi/src/macros.rs Outdated
// than a bare `tvm_ffi_sys::…`) lets downstream crates use the
// macro without having to add `tvm-ffi-sys` to their own
// `[dependencies]`.
#[unsafe(no_mangle)]
Contributor

high

The #[unsafe(no_mangle)] attribute is a feature stabilized in Rust 1.82.0. Using this syntax will cause compilation errors on older versions of the Rust compiler (e.g., 1.80.0 or 1.81.0). Unless the project has explicitly bumped its Minimum Supported Rust Version (MSRV) to 1.82.0, it is recommended to use the standard #[no_mangle] attribute, which is backward-compatible and still valid in current Rust versions.

Suggested change
- #[unsafe(no_mangle)]
+ #[no_mangle]

Three small bugs in the Rust ergonomics that prevented the macros from
being usable from downstream cdylibs:

1. `ensure!` expanded to `crate::bail!`, which resolves to the *caller*
   crate at expansion site rather than `tvm_ffi`. Switched to
   `$crate::bail!` so the path resolves correctly in any crate.

2. `tvm_ffi_dll_export_typed_func!` referenced `tvm_ffi_sys::TVMFFIAny`
   without a `$crate::` prefix, forcing every downstream crate to add
   `tvm-ffi-sys` to its own `[dependencies]`. Switched to
   `$crate::tvm_ffi_sys::TVMFFIAny`; downstream now only needs `tvm-ffi`.

3. The generated `pub unsafe extern "C" fn __tvm_ffi_<name>` had no
   `#[no_mangle]`, so the linker stripped the symbol from cdylibs and
   `Module::GetFunction` could not find it. Added `#[unsafe(no_mangle)]`
   (supported in 2021 + 2024 editions on rustc >= 1.82).

Verified by building a downstream cdylib that only depends on `tvm-ffi`
(no `tvm-ffi-sys` direct dep), loading it via `tvm_ffi.load_module(...)`,
and calling exported scalar + Tensor functions from Python.
lucifer1004 force-pushed the module-load-from-bytes branch 3 times, most recently from 64bc197 to 4995972 (May 15, 2026 04:01)
lucifer1004 changed the title from "[FEAT] Module::LoadFromBytes public API + global registration" to "[FEAT] Module::LoadFromBytes: dispatching entry point for in-memory module loaders" (May 15, 2026)
lucifer1004 force-pushed the module-load-from-bytes branch from 4995972 to 70ef19b (May 15, 2026 04:09)
…odule loaders

The internal helper `LoadModuleFromBytes(kind, bytes)`
(`src/ffi/extra/library_module.cc`) has been around for a while as a
C++ free function used during binary deserialization. It was not
exposed as a public Module API, so callers who already hold module
payload in memory (e.g. a PTX or CUBIN blob fetched from a registry)
had to materialize it to disk first and go through
`ModuleLoadFromFile`. This commit promotes the helper to a public
`Module::LoadFromBytes(kind, bytes)` and registers it as the
`ffi.ModuleLoadFromBytes` global so Python and Rust bindings can call
it without re-implementing kind → loader dispatch.

## API contract

`Module::LoadFromBytes(kind, bytes)` dispatches to the registered
global `ffi.Module.load_from_bytes.<kind>` (signature
`(Bytes) -> Module`). If no loader for the given kind is registered,
`RuntimeError` is raised naming the missing key — so the user knows
exactly what to register.

This is the *dispatching entry point*, not a specific format loader.
Loaders are registered by consumers, mirroring how loaders for module
formats already work today (the cubin_launcher example header-only
library is the canonical CUDA loader template). This split keeps
`libtvm_ffi.so` independent of libcuda / ROCm / etc.: a CPU-only
build of tvm-ffi has the API but no built-in CUDA loader, whereas a
consumer-side `.so` (built against `cubin_launcher.h`) can register
`ffi.Module.load_from_bytes.cubin` for everyone in the same process.

## Changes

* `include/tvm/ffi/extra/module.h`: declares
  `Module::LoadFromBytes(const String& kind, const Bytes& bytes)` with
  a doc note pointing at `cubin_launcher` as the canonical loader
  template.
* `src/ffi/extra/library_module.cc`: defines it as a thin wrapper
  around the existing `LoadModuleFromBytes`.
* `src/ffi/extra/module.cc`: registers `ffi.ModuleLoadFromBytes` in
  the static-init block alongside `ffi.ModuleLoadFromFile`.
* `rust/tvm-ffi/src/extra/module.rs`: adds
  `tvm_ffi::Module::load_from_bytes(kind, bytes)`.
* `python/tvm_ffi/module.py`: adds `load_module_from_bytes(kind,
  data)` mirroring the existing `load_module(path)`. Exposed from
  `tvm_ffi/__init__.py`.
* `python/tvm_ffi/_ffi_api.py`: regenerated stub.
* `tests/python/test_module_load_from_bytes.py`: three end-to-end
  tests covering (1) round-trip via a Python-registered loader,
  (2) error path when no loader is registered, (3) loader exceptions
  propagating to the caller.

## Motivation

Use case: a project that fetches CUDA PTX / CUBIN payloads from a
registry already has the bytes in memory. The current
`Module::LoadFromFile` path forces a tempfile detour. With this API:

```python
import tvm_ffi

@tvm_ffi.register_global_func("ffi.Module.load_from_bytes.echo")
def _echo_loader(payload: bytes) -> tvm_ffi.Module:
    # Real loader would parse `payload` and return a runnable module.
    ...

mod = tvm_ffi.load_module_from_bytes("echo", b"<payload>")
```

The dispatch through `ffi.Module.load_from_bytes.<kind>` is unchanged;
existing loaders register exactly as before.
lucifer1004 force-pushed the module-load-from-bytes branch from 70ef19b to 89eb576 (May 15, 2026 04:10)
Member

tqchen commented May 15, 2026

Thanks for the contribution. This is a case where we do not want to expose the API publicly, mainly because module loading needs to be triggered together with submodules, and this is supposed to be triggered by the formal whole-module loader, such as the DSO loader. For loading within the same library, one mechanism we recommend is the system library approach. A global function should perhaps be sufficient for unit tests.

Member

tqchen commented May 15, 2026

For specific modules such as CUDA or PTX, we generally expose separate global FFI function constructors rather than going through the module serializer API.

2 participants