Fix ZA tile slice indices in ssyrk SME direct kernel #5874

Draft
jschueller wants to merge 1 commit into OpenMathLib:develop from jschueller:dev-fix

@jschueller

The kernel_2x2 function uses four ZA tiles (0-3), each with svl slices. Tiles 0/1 handle rows 0..svl-1 with slice indices 0..svl-1. Tiles 2/3 handle rows svl..2*svl-1, but slice indices within a tile still run 0..svl-1, so accesses to tiles 2/3 must use (i - svl) instead of i.

Fix all three tile 2/3 access sites:

  • C load into ZA (svwrite_hor_za32_f32_m)
  • C writeback for UPPER (svst1_hor_za32)
  • C writeback for LOWER (svst1_hor_za32)
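
The index mapping described above can be sketched in plain C without any SME intrinsics. This is only an illustration, not the kernel code: `za_tile_pair` and `za_slice_index` are hypothetical helper names that do not exist in OpenBLAS, and `svl` here is just an ordinary integer standing in for the number of 32-bit slices per tile.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of OpenBLAS): which pair of ZA tiles
 * holds row `row` of a block of 2*svl rows.
 * Returns 0 for tiles 0/1 and 1 for tiles 2/3. */
size_t za_tile_pair(size_t row, size_t svl)
{
    assert(svl > 0 && row < 2 * svl);
    return row / svl;
}

/* Hypothetical helper (not part of OpenBLAS): slice index inside the
 * tile for row `row`. Slices are numbered 0..svl-1 in every tile, so
 * rows svl..2*svl-1 map to (row - svl), not to row itself. */
size_t za_slice_index(size_t row, size_t svl)
{
    assert(svl > 0 && row < 2 * svl);
    return row < svl ? row : row - svl;
}
```

With `svl` = 4, for example, row 5 belongs to tiles 2/3 and maps to slice 1; using the raw row index 5 as the slice index would fall outside the valid range 0..3.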

Fixes #5873

Development

Successfully merging this pull request may close these issues.

float32 SSYRK wrong on ARM64 VORTEXM4 (RowMajor, Upper, NoTrans)
