[FEAT] Module::LoadFromBytes: dispatching entry point for in-memory module loaders by lucifer1004 · Pull Request #591 · apache/tvm-ffi

lucifer1004 · 2026-05-15T03:06:53Z

Summary

Promotes the existing internal LoadModuleFromBytes(kind, bytes)
helper (src/ffi/extra/library_module.cc) to a public Module API,
registers it as the ffi.ModuleLoadFromBytes global, adds matching
Python (tvm_ffi.load_module_from_bytes) and Rust
(tvm_ffi::Module::load_from_bytes) bindings, and ships end-to-end
Python tests that exercise the dispatch contract.

API contract

Module::LoadFromBytes(kind, bytes) dispatches to the registered
global ffi.Module.load_from_bytes.<kind> (signature
(Bytes) -> Module). If no loader for the given kind is registered,
RuntimeError is raised naming the missing key — so the user knows
exactly what they need to register.
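
The dispatch contract above can be modelled in a few lines of plain Python. This is only a sketch of the described behaviour, not the C++ implementation: the dictionary `_GLOBAL_FUNCS`, the helpers `register_loader` and `load_from_bytes`, and the error message wording are all hypothetical stand-ins for the real global function registry.

```python
from typing import Callable, Dict

# Hypothetical stand-in for the global function registry; the real
# registry lives inside libtvm_ffi and is not a Python dict.
_GLOBAL_FUNCS: Dict[str, Callable[[bytes], object]] = {}

# Key prefix taken from the API contract: ffi.Module.load_from_bytes.<kind>
_PREFIX = "ffi.Module.load_from_bytes."


def register_loader(kind: str, loader: Callable[[bytes], object]) -> None:
    """Register `loader` under the per-kind key (illustrative helper)."""
    _GLOBAL_FUNCS[_PREFIX + kind] = loader


def load_from_bytes(kind: str, payload: bytes) -> object:
    """Look up the loader for `kind` and forward the payload to it."""
    key = _PREFIX + kind
    loader = _GLOBAL_FUNCS.get(key)
    if loader is None:
        # Name the missing key so the caller knows what to register.
        # The exact message text here is an assumption.
        raise RuntimeError(f"no loader registered under '{key}'")
    # Any exception raised by the loader propagates to the caller unchanged.
    return loader(payload)


# A toy loader that returns a tuple instead of a real Module object.
register_loader("echo", lambda payload: ("echo-module", payload))
```

Calling `load_from_bytes("echo", b"abc")` reaches the toy loader, while `load_from_bytes("cubin", b"")` raises a `RuntimeError` whose message contains `ffi.Module.load_from_bytes.cubin`, because nothing was registered under that key in this sketch.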

This is the dispatching entry point, not a specific format loader.
Loaders are registered by consumers. The split keeps libtvm_ffi.so
independent of libcuda / ROCm / etc.: a CPU-only build has the API
but no built-in CUDA loader, whereas a consumer-side .so (built
against the existing header-only tvm/ffi/extra/cuda/cubin_launcher.h)
can register ffi.Module.load_from_bytes.cubin for the whole process.
The examples/cubin_launcher/dynamic_cubin/ example already
implements this pattern.

Changes

* include/tvm/ffi/extra/module.h: declares
  Module::LoadFromBytes(const String& kind, const Bytes& bytes) with
  a doc note pointing at cubin_launcher as the canonical loader
  template.
* src/ffi/extra/library_module.cc: defines it as a thin wrapper
  around the existing LoadModuleFromBytes.
* src/ffi/extra/module.cc: registers ffi.ModuleLoadFromBytes in
  the static-init block alongside ffi.ModuleLoadFromFile.
* rust/tvm-ffi/src/extra/module.rs: adds
  tvm_ffi::Module::load_from_bytes(kind, bytes).
* python/tvm_ffi/module.py: adds
  load_module_from_bytes(kind, data) mirroring the existing
  load_module(path). Exposed from tvm_ffi/__init__.py.
* python/tvm_ffi/_ffi_api.py: regenerated stub.
* tests/python/test_module_load_from_bytes.py: three end-to-end
  tests:
  1. round-trip via a Python-registered loader,
  2. RuntimeError path when no loader is registered (message
     names the missing key),
  3. loader exceptions propagating to the caller.

Motivation

A project that fetches CUDA PTX / CUBIN payloads from a registry
already has the bytes in memory. The current Module::LoadFromFile
path forces a tempfile detour. With this API:

import tvm_ffi

@tvm_ffi.register_global_func("ffi.Module.load_from_bytes.echo")
def _echo_loader(payload: bytes) -> tvm_ffi.Module:
    # Real loader would parse `payload` and return a runnable module.
    ...

mod = tvm_ffi.load_module_from_bytes("echo", b"<payload>")

Or from Rust:

let module = tvm_ffi::Module::load_from_bytes("cubin", &bytes)?;

The dispatch through ffi.Module.load_from_bytes.<kind> is unchanged;
existing loaders register exactly as before.
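
For contrast, the tempfile detour that this section refers to might look like the sketch below. The helper name `load_via_tempfile` and its `load_from_path` parameter are hypothetical; in practice the callable would be something like `tvm_ffi.load_module`. The sketch also assumes the file can be removed right after loading, which depends on the loader and the platform.

```python
import os
import tempfile
from typing import Callable, TypeVar

T = TypeVar("T")


def load_via_tempfile(
    payload: bytes,
    load_from_path: Callable[[str], T],
    suffix: str = "",
) -> T:
    """Write `payload` to a temporary file and hand its path to a loader."""
    # delete=False so the loader can reopen the file by path on every
    # platform; the file is removed explicitly below.
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    try:
        return load_from_path(path)
    finally:
        # Assumption: the loader no longer needs the file once it returns.
        os.remove(path)
```

A bytes-based entry point avoids the extra write, the path bookkeeping, and the cleanup step shown here.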

Test plan

* New Python tests: 3 passing locally
  (tests/python/test_module_load_from_bytes.py).
* Existing ffi.ModuleLoadFromFile global continues to work.
* Rust crate compiles and Module::load_from_bytes is callable.
* Pre-commit lint passes locally and in CI on the previous
  revision; the updated revision should match.

Stacked on #590

This branch is logically stacked on #590 (Rust macro fixes). After
#590 merges, this PR's diff collapses to just the
Module::LoadFromBytes commit. Reviewers who have already approved
#590 can skip the first commit here.

gemini-code-assist

Code Review

This pull request introduces the ability to load TVM modules from in-memory bytes by adding LoadFromBytes to the C++ API and load_from_bytes to the Rust bindings. It also includes hygiene improvements to Rust macros, such as using $crate for internal references. Feedback suggests replacing the #[unsafe(no_mangle)] attribute with the standard #[no_mangle] to maintain compatibility with Rust versions older than 1.82.0.

gemini-code-assist · 2026-05-15T03:08:07Z

+            // than a bare `tvm_ffi_sys::…`) lets downstream crates use the
+            // macro without having to add `tvm-ffi-sys` to their own
+            // `[dependencies]`.
+            #[unsafe(no_mangle)]


The #[unsafe(no_mangle)] attribute is a feature stabilized in Rust 1.82.0. Using this syntax will cause compilation errors on older versions of the Rust compiler (e.g., 1.80.0 or 1.81.0). Unless the project has explicitly bumped its Minimum Supported Rust Version (MSRV) to 1.82.0, it is recommended to use the standard #[no_mangle] attribute, which is backward-compatible and still valid in current Rust versions.

Suggested change

-            #[unsafe(no_mangle)]
+            #[no_mangle]

Three small bugs in the Rust ergonomics that prevented the macros from being usable from downstream cdylibs:

1. `ensure!` expanded to `crate::bail!`, which resolves to the *caller* crate at expansion site rather than `tvm_ffi`. Switched to `$crate::bail!` so the path resolves correctly in any crate.
2. `tvm_ffi_dll_export_typed_func!` referenced `tvm_ffi_sys::TVMFFIAny` without a `$crate::` prefix, forcing every downstream crate to add `tvm-ffi-sys` to its own `[dependencies]`. Switched to `$crate::tvm_ffi_sys::TVMFFIAny`; downstream now only needs `tvm-ffi`.
3. The generated `pub unsafe extern "C" fn __tvm_ffi_<name>` had no `#[no_mangle]`, so the linker stripped the symbol from cdylibs and `Module::GetFunction` could not find it. Added `#[unsafe(no_mangle)]` (supported in 2021 + 2024 editions on rustc >= 1.82).

Verified by building a downstream cdylib that only depends on `tvm-ffi` (no `tvm-ffi-sys` direct dep), loading it via `tvm_ffi.load_module(...)`, and calling exported scalar + Tensor functions from Python.

…odule loaders

The internal helper `LoadModuleFromBytes(kind, bytes)` (`src/ffi/extra/library_module.cc`) has been around for a while as a C++ free function used during binary deserialization. It was not exposed as a public Module API, so callers who already hold module payload in memory (e.g. a PTX or CUBIN blob fetched from a registry) had to materialize it to disk first and go through `ModuleLoadFromFile`.

This commit promotes the helper to a public `Module::LoadFromBytes(kind, bytes)` and registers it as the `ffi.ModuleLoadFromBytes` global so Python and Rust bindings can call it without re-implementing kind → loader dispatch.

## API contract

`Module::LoadFromBytes(kind, bytes)` dispatches to the registered global `ffi.Module.load_from_bytes.<kind>` (signature `(Bytes) -> Module`). If no loader for the given kind is registered, `RuntimeError` is raised naming the missing key, so the user knows exactly what to register.

This is the *dispatching entry point*, not a specific format loader. Loaders are registered by consumers, mirroring how loaders for module formats already work today (the cubin_launcher example header-only library is the canonical CUDA loader template). This split keeps `libtvm_ffi.so` independent of libcuda / ROCm / etc.: a CPU-only build of tvm-ffi has the API but no built-in CUDA loader, whereas a consumer-side `.so` (built against `cubin_launcher.h`) can register `ffi.Module.load_from_bytes.cubin` for everyone in the same process.

## Changes

* `include/tvm/ffi/extra/module.h`: declares `Module::LoadFromBytes(const String& kind, const Bytes& bytes)` with a doc note pointing at `cubin_launcher` as the canonical loader template.
* `src/ffi/extra/library_module.cc`: defines it as a thin wrapper around the existing `LoadModuleFromBytes`.
* `src/ffi/extra/module.cc`: registers `ffi.ModuleLoadFromBytes` in the static-init block alongside `ffi.ModuleLoadFromFile`.
* `rust/tvm-ffi/src/extra/module.rs`: adds `tvm_ffi::Module::load_from_bytes(kind, bytes)`.
* `python/tvm_ffi/module.py`: adds `load_module_from_bytes(kind, data)` mirroring the existing `load_module(path)`. Exposed from `tvm_ffi/__init__.py`.
* `python/tvm_ffi/_ffi_api.py`: regenerated stub.
* `tests/python/test_module_load_from_bytes.py`: three end-to-end tests covering (1) round-trip via a Python-registered loader, (2) error path when no loader is registered, (3) loader exceptions propagating to the caller.

## Motivation

Use case: a project that fetches CUDA PTX / CUBIN payloads from a registry already has the bytes in memory. The current `Module::LoadFromFile` path forces a tempfile detour. With this API:

```python
import tvm_ffi

@tvm_ffi.register_global_func("ffi.Module.load_from_bytes.echo")
def _echo_loader(payload: bytes) -> tvm_ffi.Module:
    # Real loader would parse `payload` and return a runnable module.
    ...

mod = tvm_ffi.load_module_from_bytes("echo", b"<payload>")
```

The dispatch through `ffi.Module.load_from_bytes.<kind>` is unchanged; existing loaders register exactly as before.

tqchen · 2026-05-15T14:59:10Z

Thanks for the contribution. This is a case where we do not want to expose the API publicly, mainly because module loading needs to be triggered together with submodules, and it is supposed to be triggered by the formal whole-module loader, such as the DSO loader. For loading from the same library, one mechanism we recommend is the system library approach. A global function should perhaps be sufficient for unit tests.

tqchen · 2026-05-15T15:00:51Z

For specific modules like CUDA or PTX, we generally expose separate global FFI function constructors rather than going through the module serializer API.

lucifer1004 mentioned this pull request May 15, 2026

[FIX][RUST] use $crate:: in tvm_ffi_dll_export_typed_func! and ensure! #590

Merged

4 tasks

gemini-code-assist Bot reviewed May 15, 2026


lucifer1004 force-pushed the module-load-from-bytes branch 3 times, most recently from 64bc197 to 4995972 Compare May 15, 2026 04:01

lucifer1004 changed the title ~~[FEAT] Module::LoadFromBytes public API + global registration~~ [FEAT] Module::LoadFromBytes: dispatching entry point for in-memory module loaders May 15, 2026

lucifer1004 force-pushed the module-load-from-bytes branch from 4995972 to 70ef19b Compare May 15, 2026 04:09

lucifer1004 force-pushed the module-load-from-bytes branch from 70ef19b to 89eb576 Compare May 15, 2026 04:10

lucifer1004 wants to merge 2 commits into apache:main from lucifer1004:module-load-from-bytes
