Summary
When compiling a large SPIR-V module (rocThrust lookback_set_op_kernel with HAS_VALUES=true, mangled Lb1E) for Intel DG2 / Arc on the default SIMD32 path, IGC silently omits several kernel instantiations from the final native object even though:
- All variants are still present as OpEntryPoint in the input SPIR-V,
- ocloc compile reports “Build succeeded.”,
- The SIMD32 build log can show [RetryManager] Start recompilation (e.g. on the float template).
Workaround: IGC_ForceOCLSIMDWidth=16 — all eight element-type specializations then appear in the object’s symbol table.
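A minimal sketch of how the default and workaround builds could be set up side by side, assuming ocloc is on PATH and the input file is named processed_0.spv. The helper name build_invocation is hypothetical; -file is the usual ocloc input option, and the environment variable is the one named above. The sketch only builds the command line and environment and leaves running them to the caller.

```python
import os


def build_invocation(spv_path, simd_width=None, base_env=None):
    """Return (argv, env) for one offline ocloc build (hypothetical helper).

    simd_width=None keeps the default path (SIMD32 in this report);
    simd_width=16 applies the IGC_ForceOCLSIMDWidth workaround.
    """
    argv = ["ocloc", "compile", "-file", spv_path, "-spirv_input", "-device", "dg2"]
    # Copy the environment so the caller's (or the process's) env is not mutated.
    env = dict(os.environ if base_env is None else base_env)
    if simd_width is None:
        # Make sure a stale override does not leak into the default build.
        env.pop("IGC_ForceOCLSIMDWidth", None)
    else:
        env["IGC_ForceOCLSIMDWidth"] = str(simd_width)
    return argv, env
```

Each (argv, env) pair can be passed to subprocess.run(argv, env=env). If both builds write the same output file name, run them in separate working directories so the second does not overwrite the first.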
Affected pattern
thrust::hip_rocprim::__set_operations::lookback_set_op_kernel< default_set_operations_config<10000u, T, T>, /*HAS_VALUES=*/true, ...> — eight element-type pairs in SPIR-V: dd ff ii jj ss tt xx yy (Itanium-style ILj10000E**EELb1E in mangled names).
Observed with offline ocloc: the SIMD32 processed_0_dg2.bin exports 5 distinct lookback_set_op_kernel Lb1E type encodings (dd ss tt xx yy), while SIMD16 exports 8. Missing from SIMD32 vs SPIR-V / SIMD16: ff (float), ii (int), jj (uint).
Downstream (Level Zero / HIP): zeModuleGetKernelNames then lacks those kernel names → Failed to find kernel via kernel name at launch.
Reproducer (offline)
- Input: processed_0.spv, a chipStar-processed SPIR-V module (e.g. from CHIP_DUMP_PROCESSED_SPIRV while building rocThrust set_difference_by_key); same IL can be attached to this issue.
- Tools: spirv-as/spirv-dis (validation inventory), ocloc compile -spirv_input -device dg2, llvm-readobj --symbols on processed_0_dg2.bin.
- Check: Extract symbol Name: lines for lookback_set_op_kernel and the distinct ILj10000E[dfijstxy]{2}EELb1E tags, then compare the default SIMD32 build against the IGC_ForceOCLSIMDWidth=16 (SIMD16) build.
Expected result: the SIMD32 object is missing ILj10000EffEELb1E, ILj10000EiiEELb1E, and ILj10000EjjEELb1E relative to both the SPIR-V and the SIMD16 object; the SPIR-V still lists all eight OpEntryPoint kernels.
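The check above can be sketched in a few lines of Python, assuming the output of llvm-readobj --symbols (or of spirv-dis, for the OpEntryPoint inventory) has been saved as text. The function names lookback_tags and missing_tags are hypothetical; the regular expression is the tag pattern from the Check step.

```python
import re

# Type-pair tag embedded in the mangled name, e.g. ILj10000EffEELb1E -> "ff".
TAG_RE = re.compile(r"ILj10000E([dfijstxy]{2})EELb1E")


def lookback_tags(text):
    """Collect the distinct type-pair tags from lines that mention the kernel."""
    tags = set()
    for line in text.splitlines():
        if "lookback_set_op_kernel" in line:
            tags.update(TAG_RE.findall(line))
    return tags


def missing_tags(reference_text, candidate_text):
    """Tags present in the reference dump but absent from the candidate dump."""
    return sorted(lookback_tags(reference_text) - lookback_tags(candidate_text))
```

Calling missing_tags with the SIMD16 symbol dump (or the spirv-dis listing) as the reference and the SIMD32 symbol dump as the candidate should list the dropped specializations. Because the filter keys only on the kernel name and the tag pattern, the same function works on Name: lines and on OpEntryPoint lines.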
Environment
- GPU / target: DG2 (reproduced on Intel Arc A770)
- IGC / compute stack: via distro ocloc (version as shipped with Intel compute runtime)
Related
Distinct from subgroup-shuffle / OpUConvert miscompile issues — this is missing kernels in the final ELF after SIMD32 finalization / retry, not wrong arithmetic in a compiled kernel.