fix: add pre-compile normalize_ops to prevent stride mismatch in compile_ops #31
danieyan-amd wants to merge 1 commit into
Root cause: ~30 transformation passes between the first normalize_ops and
compile_ops create new ops with un-normalized attributes. During parallel
compilation in compile_ops, strides are captured at compile time. When upstream
code_objects are sequentially replaced, their output strides change, causing
downstream code_objects to fail with 'Input shapes have changed'.
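The failure mode described above can be illustrated with a small self-contained model. This is only a sketch: `toy_shape`, `toy_code_object`, and `downstream_rejects` are hypothetical names invented for illustration and are not MIGraphX types. It assumes a compiled kernel records the input strides it was compiled against and rejects any input whose strides differ later.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a tensor shape: lengths plus strides.
struct toy_shape
{
    std::vector<std::size_t> lens;
    std::vector<std::size_t> strides;

    bool operator==(const toy_shape& other) const
    {
        return lens == other.lens && strides == other.strides;
    }
};

// Hypothetical stand-in for a compiled kernel that remembers the input
// shape (including strides) captured at compile time.
struct toy_code_object
{
    toy_shape expected_input;

    // Rejects inputs whose lengths or strides differ from the captured ones.
    void check(const toy_shape& actual_input) const
    {
        if(!(actual_input == expected_input))
            throw std::runtime_error("Input shapes have changed");
    }
};

// Returns true when a downstream kernel compiled against `captured`
// rejects the shape its upstream producer emits after replacement.
inline bool downstream_rejects(const toy_shape& captured, const toy_shape& after_replace)
{
    toy_code_object downstream{captured};
    try
    {
        downstream.check(after_replace);
    }
    catch(const std::runtime_error&)
    {
        return true;
    }
    return false;
}
```

In this model, same lengths with different strides are enough to trigger the error, which is why normalizing every op before any kernel is compiled removes the mismatch at its source.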
Fix:
- target.cpp: Add normalize_ops{} + dead_code_elimination{} immediately before
compile_ops in the GPU pass pipeline. All ops are fully normalized before any
GPU kernel compilation begins.
- compile_ops.cpp: Add try/catch safety net around sequential replacement. If a
shape mismatch still occurs, failed plans are retained and re-compiled with
updated shapes. Also adds a second compile pass for re-compilation.
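The retry behaviour described for compile_ops.cpp might look roughly like the following toy model. It is a sketch under stated assumptions, not the code in this PR: `toy_plan` and `replace_with_retry` are hypothetical names, a single integer stands in for the captured strides, and "re-compiling" simply re-captures the current value.

```cpp
#include <cstddef>
#include <exception>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for one pending kernel replacement.
struct toy_plan
{
    int captured_stride; // stride observed when the plan was compiled

    // Re-compile against the stride currently present in the program.
    void recompile(int current_stride) { captured_stride = current_stride; }

    // Replacement fails if the program changed since compilation.
    void replace(int current_stride) const
    {
        if(captured_stride != current_stride)
            throw std::runtime_error("Input shapes have changed");
    }
};

// First pass: try every plan and retain the ones that throw.
// Second pass: re-compile the retained plans and try once more.
// Returns the number of plans still unresolved after the second pass.
inline std::size_t replace_with_retry(std::vector<toy_plan> plans, int current_stride)
{
    std::vector<toy_plan> failed;
    for(const auto& p : plans)
    {
        try
        {
            p.replace(current_stride);
        }
        catch(const std::exception&) // broad catch, as in the PR's safety net
        {
            failed.push_back(p);
        }
    }

    std::size_t unresolved = 0;
    for(auto& p : failed)
    {
        p.recompile(current_stride);
        try
        {
            p.replace(current_stride);
        }
        catch(const std::exception&)
        {
            ++unresolved;
        }
    }
    return unresolved;
}
```

With the pre-compile normalization in place, the first pass is expected to succeed for every plan, so the retry path acts only as a fallback.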
Tested: Full Topaz SLMU model (8514 nodes) compiles successfully with no stride
mismatch error. Previously failed after ~6h44m of compilation.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f333d76368
    catch(const std::exception&)
    {
Preserve hard compile failures instead of swallowing them
Catching every std::exception from cp.replace(m) turns real compile errors (for example compile_plan::benchmark() throwing "No valid tuned compilation") into a silent retry path, so the pass no longer fails fast when no kernel can be produced. After the second cm.compile(m) these plans can remain unresolved and, in release builds where assert(cm.cps.empty()) is compiled out, compile_ops::apply can return with uncompiled gpu::precompile_op instructions instead of reporting the original error. This changes a deterministic failure into potentially invalid compiled output.
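One way to address this concern is to retry only on a dedicated shape-mismatch error, let every other exception propagate, and throw explicitly if a plan is still unresolved after the retry instead of relying on an assert. The following is a minimal sketch of that idea; `shape_mismatch_error`, `toy_plan`, `outcome`, `replace_all`, and `run` are hypothetical names for illustration and do not exist in MIGraphX.

```cpp
#include <exception>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical exception type reserved for the recoverable case.
struct shape_mismatch_error : std::runtime_error
{
    using std::runtime_error::runtime_error;
};

// What replace() does in this toy model.
enum class outcome { ok, shape_mismatch, hard_failure };

// Hypothetical plan whose behaviour before and after re-compilation is scripted.
struct toy_plan
{
    outcome first_try;
    outcome after_recompile;
    bool recompiled;

    void replace() const
    {
        const outcome o = recompiled ? after_recompile : first_try;
        if(o == outcome::shape_mismatch)
            throw shape_mismatch_error("Input shapes have changed");
        if(o == outcome::hard_failure)
            throw std::runtime_error("No valid tuned compilation");
    }
};

inline void replace_all(std::vector<toy_plan> plans)
{
    std::vector<toy_plan> retry;
    for(const auto& p : plans)
    {
        try
        {
            p.replace();
        }
        catch(const shape_mismatch_error&)
        {
            retry.push_back(p); // recoverable: retry after re-compilation
        }
        // Any other exception propagates, so hard failures stay fatal.
    }

    for(auto& p : retry)
    {
        p.recompiled = true;
        try
        {
            p.replace();
        }
        catch(const shape_mismatch_error&)
        {
            // Fail loudly in every build type rather than via assert.
            throw std::runtime_error("compile_ops: plan still unresolved after retry");
        }
    }
}

// Runs replace_all and returns the error message, or "" on success.
inline std::string run(std::vector<toy_plan> plans)
{
    try
    {
        replace_all(std::move(plans));
    }
    catch(const std::exception& e)
    {
        return e.what();
    }
    return "";
}
```

In this sketch a "No valid tuned compilation" error surfaces unchanged on the first pass, and a plan that cannot be resolved on retry produces an error in release builds as well as debug builds.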