Problem
A local large-scale LLM pre-training team finds TE's host-side overhead significant for small-shape / latency-sensitive paths, and has already bypassed TE to call the MXFP8/NVFP4 kernels directly from their own C++. The cost is structural rather than incidental: (1) autocast and FP8GlobalStateManager state bookkeeping, (2) tensor-subclass __torch_dispatch__ handling, (3) Quantizer objects passed on every op, plus their attribute reads, (4) Python↔C++ round-trips to construct subclass tensors. tex.quantize doesn't avoid this: it still needs a quantizer and returns a subclass tensor.
Request
A documented, supported stateless functional API: plain torch.Tensor in → plain torch.Tensor(s) out (data + scale_inv + amax), bypassing subclass/Quantizer/autocast. Essentially a thin blessed wrapper over the existing nvte_* C API.
Is a lightweight functional surface like this something the team would consider in principle, or is it intentionally out of scope for TE? Mainly trying to gauge whether it's worth exploring further before we discuss possible ways to help move it along.