
fix: add pre-compile normalize_ops to prevent stride mismatch in comp…#31

Open
danieyan-amd wants to merge 1 commit into develop from fix/stride-mismatch-v2



@danieyan-amd
Owner

Root cause: ~30 transformation passes between the first normalize_ops and compile_ops create new ops with un-normalized attributes. During parallel compilation in compile_ops, strides are captured at compile time. When upstream code_objects are sequentially replaced, their output strides change, causing downstream code_objects to fail with 'Input shapes have changed'.
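The failure mode described above can be reproduced in miniature. The sketch below is a toy model only: `toy_shape`, `toy_code_object`, and `downstream_rejects` are hypothetical names that do not exist in the code base. It assumes a compiled kernel stores the input shape it was built against and rejects any input whose lengths or strides differ later.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a tensor shape: dimensions plus strides.
struct toy_shape
{
    std::vector<std::size_t> lens;
    std::vector<std::size_t> strides;

    bool operator==(const toy_shape& other) const
    {
        return lens == other.lens && strides == other.strides;
    }
};

// Hypothetical compiled kernel: remembers the input shape seen at compile time.
struct toy_code_object
{
    toy_shape expected_input;

    // Throws if the input seen at replacement time differs from the snapshot.
    void check(const toy_shape& actual_input) const
    {
        if(!(actual_input == expected_input))
            throw std::runtime_error("Input shapes have changed");
    }
};

// Returns true when a kernel compiled against `at_compile_time` rejects the
// layout its producer emits after that producer has been replaced.
inline bool downstream_rejects(const toy_shape& at_compile_time,
                               const toy_shape& after_replacement)
{
    toy_code_object downstream{at_compile_time};
    try
    {
        downstream.check(after_replacement);
    }
    catch(const std::runtime_error&)
    {
        return true;
    }
    return false;
}
```

With a 2x3 tensor, a kernel compiled against row-major strides `{3, 1}` rejects a producer that ends up emitting column-major strides `{1, 2}`, even though the lengths are identical. If the producer's layout is already final before compilation, the snapshot stays valid.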

Fix:

  • target.cpp: Add normalize_ops{} + dead_code_elimination{} immediately before compile_ops in the GPU pass pipeline. All ops are fully normalized before any GPU kernel compilation begins.
  • compile_ops.cpp: Add a try/catch safety net around sequential replacement. If a shape mismatch still occurs, the failed plans are retained and re-compiled against the updated shapes in a second compile pass.
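How the two parts of the fix interact can be sketched with a toy model. Everything here is hypothetical (`toy_plan`, `compile_all`, `replace_all`, `compile_with_retry`, and the single-number "stride" per node); it is not the code in compile_ops.cpp. The model assumes each plan snapshots its producer's stride at compile time and that replacement happens sequentially.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical error type for a stale compile-time snapshot.
struct shape_mismatch : std::runtime_error
{
    shape_mismatch() : std::runtime_error("Input shapes have changed") {}
};

// Hypothetical compile plan for one instruction.
struct toy_plan
{
    std::size_t node;            // instruction this plan replaces
    int producer;                // upstream instruction index, or -1 for none
    std::size_t output_stride;   // stride the compiled kernel will emit
    std::size_t captured_stride; // producer stride snapshotted at compile time
};

// Compile step: snapshot each producer's current output stride.
inline void compile_all(std::vector<toy_plan>& plans, const std::vector<std::size_t>& strides)
{
    for(auto& p : plans)
        if(p.producer >= 0)
            p.captured_stride = strides[static_cast<std::size_t>(p.producer)];
}

// Replace one plan; throws if the producer's stride no longer matches the snapshot.
inline void replace_one(const toy_plan& p, std::vector<std::size_t>& strides)
{
    if(p.producer >= 0 && strides[static_cast<std::size_t>(p.producer)] != p.captured_stride)
        throw shape_mismatch{};
    strides[p.node] = p.output_stride;
}

// Sequential replacement with the safety net: mismatched plans are retained.
inline std::vector<toy_plan> replace_all(const std::vector<toy_plan>& plans,
                                         std::vector<std::size_t>& strides)
{
    std::vector<toy_plan> failed;
    for(const auto& p : plans)
    {
        try
        {
            replace_one(p, strides);
        }
        catch(const shape_mismatch&)
        {
            failed.push_back(p);
        }
    }
    return failed;
}

// Returns the number of compile passes used; throws if plans stay unresolved.
inline int compile_with_retry(std::vector<toy_plan> plans, std::vector<std::size_t> strides)
{
    compile_all(plans, strides);
    auto failed = replace_all(plans, strides);
    if(failed.empty())
        return 1;
    compile_all(failed, strides); // second pass sees the updated strides
    if(!replace_all(failed, strides).empty())
        throw std::runtime_error("plans remain unresolved after retry");
    return 2;
}
```

In this model, a two-node chain whose strides start un-normalized (for example `{4, 4}` while both kernels emit stride 1) needs the second pass, because replacing node 0 invalidates node 1's snapshot. When the strides are already in their final form before compilation (`{1, 1}`), one pass suffices, which is the case the target.cpp change aims for.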

Tested: Full Topaz SLMU model (8514 nodes) compiles successfully with no stride mismatch error. Previously failed after ~6h44m of compilation.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f333d76368


Comment on lines +346 to +347
catch(const std::exception&)
{

P1: Preserve hard compile failures instead of swallowing them

Catching every std::exception from cp.replace(m) turns real compile errors (for example compile_plan::benchmark() throwing "No valid tuned compilation") into a silent retry path, so the pass no longer fails fast when no kernel can be produced. After the second cm.compile(m) these plans can remain unresolved and, in release builds where assert(cm.cps.empty()) is compiled out, compile_ops::apply can return with uncompiled gpu::precompile_op instructions instead of reporting the original error. This changes a deterministic failure into potentially invalid compiled output.
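One possible way to address this, sketched under assumptions: retry only on a dedicated stale-shape error, let every other exception propagate, and replace the `assert` with a check that survives `NDEBUG`. The names `stale_shape_error`, `replace_or_collect`, and `require_resolved` are hypothetical, and the sketch assumes the shape mismatch can be signalled with its own exception type, which the PR does not confirm.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical exception type reserved for "Input shapes have changed".
struct stale_shape_error : std::runtime_error
{
    using std::runtime_error::runtime_error;
};

// Runs each replacement in order. Only stale-shape failures are collected for
// a retry; any other exception (e.g. a genuine compile error) propagates.
inline std::vector<std::size_t>
replace_or_collect(const std::vector<std::function<void()>>& replacements)
{
    std::vector<std::size_t> retry;
    for(std::size_t i = 0; i < replacements.size(); ++i)
    {
        try
        {
            replacements[i]();
        }
        catch(const stale_shape_error&)
        {
            retry.push_back(i);
        }
    }
    return retry;
}

// Runtime check that is not compiled out in release builds, unlike assert().
inline void require_resolved(std::size_t unresolved)
{
    if(unresolved != 0)
        throw std::runtime_error(std::to_string(unresolved) +
                                 " plan(s) left uncompiled after retry");
}
```

With this shape, a "No valid tuned compilation" style failure still aborts the pass immediately, while a stale snapshot goes through the retry path. Calling `require_resolved` after the second pass reports leftover plans in release builds too.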

