Description
On an Intel Arc 140T iGPU (Arrow Lake-P, Xe2-LPG), sustained OpenCL compute from llama.cpp (SYCL, via the oneAPI Unified Runtime) intermittently hangs the GPU compute engine. The xe kernel driver detects the hang and resets the engine:
xe 0000:01:00.0: [drm] exec queue reset detected
xe 0000:01:00.0: [drm] GT0: Engine reset: engine_class=ccs, logical_mask: 0x1, guc_id=N
The devcoredump reports the reset reason as "LR job cleanup, guc_id=N". The application then gets UR_RESULT_ERROR_OUT_OF_RESOURCES (surfaced at a later clFinish / stream->wait()) and aborts.
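To track how often the reset fires across runs, the kernel log can be scanned for the two messages quoted above. This is a minimal Python sketch that assumes the messages keep exactly the wording shown here, which may differ between kernel versions. `summarize_resets` is a hypothetical helper, and its input would be the lines of `dmesg` or `journalctl -k` output:

```python
import re
from collections import Counter

# Match the two xe driver messages quoted above. The prefix before "[drm]"
# (timestamp, PCI address) varies per system, so we only anchor on "[drm]".
QUEUE_RESET_RE = re.compile(r"\[drm\] exec queue reset detected")
ENGINE_RESET_RE = re.compile(
    r"\[drm\] GT(?P<gt>\d+): Engine reset: engine_class=(?P<cls>\w+), "
    r"logical_mask: (?P<mask>0x[0-9a-fA-F]+), guc_id=(?P<guc>\w+)"
)


def summarize_resets(lines):
    """Count exec-queue resets, engine resets per (GT, engine class),
    and collect the guc_id of every engine reset, in log order."""
    queue_resets = 0
    engine_resets = Counter()
    guc_ids = []
    for line in lines:
        if QUEUE_RESET_RE.search(line):
            queue_resets += 1
        m = ENGINE_RESET_RE.search(line)
        if m:
            engine_resets[(int(m["gt"]), m["cls"])] += 1
            guc_ids.append(m["guc"])  # kept as text; logs may show a number
    return queue_resets, engine_resets, guc_ids
```

Grouping by GT and engine class makes it easy to confirm that every reset lands on the CCS engine and to compare reset counts between bare and traced runs.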
It is timing-sensitive (looks like a race)
Without tracing, the workload aborts within ~2 requests. The exact same workload under SYCL_UR_TRACE=2 (which heavily slows and serializes the UR calls) survives every request tried (18/18 requests, no reset). Anything that slows submission (tracing, debug logging, fewer concurrent kernels) avoids the reset, which points to an async-ordering / race condition rather than a deterministic resource limit.
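The traced result can be roughly quantified by assuming that each bare request fails independently with the same probability p. This is a simplifying assumption, since real failures are probably correlated. Under it, the number of requests up to the first abort is geometric with mean 1/p, so "aborts within ~2 requests" suggests p ≈ 0.5. The chance of 18 consecutive survivals at that rate is then:

```latex
\mathbb{E}[\text{requests up to first abort}] = \frac{1}{p} \approx 2
\;\Rightarrow\; p \approx 0.5,
\qquad
P(\text{18/18 survive}) = (1 - p)^{18} = 2^{-18} \approx 3.8 \times 10^{-6}
```

Under this model, 18/18 with tracing is very unlikely to be luck. That supports the view that slowing submission changes the behavior, rather than the traced run simply missing the bug.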
Workload
llama.cpp Mixture-of-Experts inference (e.g. gemma-4-26b-a4b) with the experts on the GPU: each layer dispatches many small per-expert matmul kernels back to back. Running the experts on the CPU (-ot exps=CPU) avoids the reset entirely, so the problem appears specific to the high-rate small-kernel GPU compute pattern.
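llama.cpp's `-ot` (`--override-tensor`) flag takes `<tensor-name regex>=<buffer type>`. As far as I understand, the regex is searched against each tensor name rather than fully matched. With `exps=CPU`, only the expert weight tensors go to host memory, and their matmuls then run on the CPU. Attention weights and the MoE router stay on the GPU. The Python sketch below mimics that selection. The tensor names follow llama.cpp's usual MoE naming (`ffn_*_exps`, `ffn_gate_inp`) and are assumptions, not a dump of the actual model. "GPU" is a placeholder for the real SYCL buffer type name:

```python
import re

# Hypothetical tensor names for one MoE layer, following llama.cpp's usual
# GGUF naming; a real model's names can be read from llama.cpp's load log.
TENSOR_NAMES = [
    "blk.0.attn_q.weight",
    "blk.0.ffn_gate_inp.weight",   # router: no "exps" in the name
    "blk.0.ffn_gate_exps.weight",  # expert weights
    "blk.0.ffn_up_exps.weight",
    "blk.0.ffn_down_exps.weight",
]


def placement(names, pattern="exps", override="CPU", default="GPU"):
    """Mimic -ot <pattern>=<override>: a tensor whose name contains a regex
    match goes to `override`; every other tensor keeps `default`."""
    rx = re.compile(pattern)
    return {name: (override if rx.search(name) else default) for name in names}


for name, buft in placement(TENSOR_NAMES).items():
    print(f"{name:30s} -> {buft}")
```

This makes the A/B comparison explicit: only the many small per-expert matmuls leave the GPU, and that alone is enough to avoid the reset.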
Environment
- GPU: Intel Arc 140T (Arrow Lake-P, Xe2-LPG), Core Ultra 9 285H
- OS: Ubuntu 24.04, kernel 6.17.0-35-generic, xe driver
- GuC firmware: 70.53.0 (updated from upstream linux-firmware; the issue persists at this version)
- intel-compute-runtime (NEO): 24.39.31294
- Backend: OpenCL 3.0 NEO via oneAPI Unified Runtime
Tried, did NOT help
- GuC firmware update 70.36.0 -> 70.53.0 (rebooted, confirmed loaded)
- GGML_SYCL_DISABLE_OPT=1
Question
Is this a known CCS engine-reset issue on Arrow Lake / Xe2 under high-rate OpenCL compute, and is it addressed in a newer compute-runtime (we are on 24.39; latest is ~26.18) or a newer kernel? I am about to retest on a much newer stack (NEO 26.18, kernel 7.0) and can report back either way. Happy to provide the full devcoredump, dmesg, or a minimal repro on request.