Question about FP4 scaling factor layout when K < tile size (e.g., K=32) #311

@IsaiahHY

Hi,

I’m currently working on FP4-based quantized GEMM kernels and referring to the “1D Block Scaling Factors Layout (128×4 tile)” described in the documentation.

However, I couldn’t find detailed guidance on how to handle cases where the K dimension is smaller than the tile requirement, and I’d like to clarify the expected layout for scaling factors in such scenarios.

My understanding

For matrix A (M × K):

When M = 512 and K = 64, the scaling-factor dimensions are:
M_scale = 512
K_scale = 4 (since 64 / 16 = 4)

This matches the documented 128 × 4 tile layout, so the scaling factors can be naturally arranged as shown.

My question

If K = 32, then:

K_scale = 2 (since 32 / 16 = 2)

In this case, the scaling factor tile becomes effectively 128 × 2, which does not match the documented 128 × 4 layout.
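To make the two cases concrete, here is a minimal Python sketch of the shape arithmetic. It assumes one scale per 16 consecutive K elements and the 128 × 4 tile from the documentation. The helper names `scale_shape` and `padded_scale_shape` are hypothetical, and the padded variant only illustrates the "pad to whole tiles" option; it is not a statement of what cuBLASLt actually requires.

```python
import math

BLOCK = 16       # vec16: one scale per 16 consecutive K elements (from the post)
TILE_ROWS = 128  # documented scale tile height
TILE_COLS = 4    # documented scale tile width


def scale_shape(m, k, block=BLOCK):
    """Logical (M_scale, K_scale) of the scale matrix for an M x K operand."""
    return m, math.ceil(k / block)


def padded_scale_shape(m, k, block=BLOCK):
    """Shape if both dimensions were rounded up to whole 128 x 4 tiles.

    This is the hypothetical 'padding' option, not confirmed behavior.
    """
    rows, cols = scale_shape(m, k, block)
    round_up = lambda n, t: math.ceil(n / t) * t
    return round_up(rows, TILE_ROWS), round_up(cols, TILE_COLS)


print(scale_shape(512, 64))         # logical shape for K = 64
print(scale_shape(512, 32))         # logical shape for K = 32
print(padded_scale_shape(512, 32))  # shape under the padding option
```

Under the padding option the K = 32 case would occupy the same storage as K = 64, with half of each tile unused; under the compact option the buffer would hold only 512 × 2 scales.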

What is the correct way to handle this situation?

Specifically:

Should the scaling factors be padded (e.g., zero-filled) along the K dimension to match the required 128 × 4 tile layout?
Or should the layout be compacted (i.e., use 128 × 2 tiles without padding)?
Or is there another expected handling (e.g., different scaling mode or alignment requirement)?

Additional context

I am using FP4 quantization with block scaling (e.g., vec16-style scaling), and trying to ensure my scale layout is fully compatible with Tensor Core / cuBLASLt expectations.

Any clarification or reference would be greatly appreciated. Thanks!
