Problem description
Related to #4844: MIGraphX successfully compiles a simple model when one dynamic dimension is specified, but running the model then fails with a segfault.
Setting the MIGRAPHX_ENABLE_FULL_DYNAMIC environment variable to 1 does not help: the segfault occurs whether or not it is set.
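For anyone re-running the reproducer, one way to toggle the variable from inside the script is sketched below. It assumes MIGraphX reads the variable from the process environment; setting it before the import is a conservative choice on our part, not documented behaviour.

```python
import os

# Assumption: MIGraphX picks this up from the process environment.
# Setting it before importing migraphx avoids depending on when it is read.
os.environ["MIGRAPHX_ENABLE_FULL_DYNAMIC"] = "1"

# import migraphx  # import only after the variable is set

# To test the "unset" case instead, remove the variable:
# os.environ.pop("MIGRAPHX_ENABLE_FULL_DYNAMIC", None)
```

Exporting the variable in the shell before launching the script should be equivalent.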
Steps to reproduce
import math
import migraphx
import torch

DEVICE = "cuda:0"
EMBEDDING_COUNT = 32
EMBEDDING_DIM = 16
BATCH_SIZE = 4

torch.inference_mode(True)
torch.cuda.set_device(DEVICE)

_TORCH_TYPE_MAPPING = {
    torch.int64: "int64_type",
    torch.float32: "float_type",
}

def _convert_tensor_to_argument(tensor):
    assert str(tensor.device) == DEVICE
    assert tensor.is_contiguous()
    return migraphx.argument_from_pointer(
        migraphx.shape(
            type=_TORCH_TYPE_MAPPING[tensor.dtype],
            lens=list(tensor.size()),
            strides=list(tensor.stride()),
        ), tensor.data_ptr())

model = torch.nn.Embedding(EMBEDDING_COUNT, EMBEDDING_DIM)
model.eval()
input_batch = torch.arange(math.ceil(EMBEDDING_COUNT / 2)).repeat(BATCH_SIZE, 1).contiguous()

torch.onnx.export(
    model,
    (input_batch,),
    "model.onnx",
    external_data=False,
    dynamo=True,
    dynamic_shapes=[
        {0: torch.export.Dim.DYNAMIC, 1: torch.export.Dim.DYNAMIC},
    ],
)

migraphx_model = migraphx.parse_onnx("model.onnx", map_dyn_input_dims={
    "input": [
        migraphx.shape.dynamic_dimension(BATCH_SIZE, BATCH_SIZE, {BATCH_SIZE}),
        migraphx.shape.dynamic_dimension(1, 64, {1}),
    ],
})
migraphx_model.compile(migraphx.get_target("gpu"), offload_copy=False)

input_batch = input_batch.to(DEVICE)
output = torch.empty(
    (*input_batch.shape, EMBEDDING_DIM), dtype=torch.float32, device=DEVICE)
torch.cuda.synchronize(DEVICE)

migraphx_model.run({
    "input": _convert_tensor_to_argument(input_batch),
    "main:#output_0": _convert_tensor_to_argument(output),
})
We have observed the same issue with larger models; this reproducer uses a single node to simplify analysis.
Compared with the script in #4844, the first dimension (BATCH_SIZE) is now static while the second dimension is still dynamic. The issue also occurs in the opposite configuration, where the first dimension (BATCH_SIZE) is left dynamic and the second dimension is made static.
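To make the two configurations concrete, here is a minimal sketch that describes each dimension as a plain (min, max, optimals) tuple, mirroring the argument order passed to migraphx.shape.dynamic_dimension in the script. The helper names and the 1 to 64 range used for the dynamic batch dimension in the swapped variant are illustrative assumptions, not the exact values we ran.

```python
BATCH_SIZE = 4
SEQ_LEN = 16  # math.ceil(EMBEDDING_COUNT / 2) in the reproducer


def static_dim(size):
    # A "static" dimension is expressed as min == max == the single optimal.
    return (size, size, {size})


def dynamic_dim(minimum, maximum, optimal):
    # A dynamic dimension allows any size in [minimum, maximum].
    return (minimum, maximum, {optimal})


# Configuration used in the script: batch static, second dimension dynamic.
variant_reported = [static_dim(BATCH_SIZE), dynamic_dim(1, 64, 1)]

# Opposite configuration: batch dynamic (hypothetical range), second static.
variant_swapped = [dynamic_dim(1, 64, 1), static_dim(SEQ_LEN)]


def count_dynamic(specs):
    # Count dimensions whose min and max differ.
    return sum(1 for lo, hi, _ in specs if lo != hi)
```

Each tuple can be unpacked into migraphx.shape.dynamic_dimension(*spec) and the resulting list passed under the "input" key of map_dyn_input_dims, as in the script. Both variants contain exactly one truly dynamic dimension.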
Environment
OS: Debian GNU/Linux 12 (bookworm)
CPU: AMD Ryzen 9 9950X
GPU: AMD Radeon AI PRO R9700
ROCm version: 7.2.1
MIGraphX version: 2.16.0.dev+20250912-17-406-gb91f1c0c0