Commit 9d58b8d

provide a NEON version of arm/sgemm
benchmark/sgemm.goto

before: M= 200, N= 200, K= 200 : 9262.97 MFlops 0.001727 sec
after:  M= 200, N= 200, K= 200 : 30223.64 MFlops 0.000529 sec

Conveniently the registers are already allocated suitably for vector operation, so the conversion from vfpv3 was rather straightforward. Prefetching was left out because it doesn't help Cortex-A76, only hurts it slightly.
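
To illustrate why a scalar-to-vector conversion can be nearly mechanical when accumulators already sit in consecutive registers, here is a minimal C sketch. It is not the code from this commit, which is hand-written assembly. The names `axpy4_scalar`, `axpy4_vector` and `kernel4x4`, the 4x4 tile size, and the packed layouts of `a` and `b` are all assumptions made for the example. On targets without NEON the vector path falls back to the scalar loop, so the sketch compiles anywhere.

```c
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical scalar step: c[0..3] += a * b[0..3], one multiply-add
 * per element, as a VFP-style kernel would issue them. */
static void axpy4_scalar(float *c, float a, const float *b)
{
    for (int i = 0; i < 4; i++)
        c[i] += a * b[i];
}

/* Hypothetical vector step: the same four multiply-adds expressed as a
 * single 4-lane operation when NEON is available. */
static void axpy4_vector(float *c, float a, const float *b)
{
#if HAVE_NEON
    float32x4_t acc = vld1q_f32(c);   /* four accumulators in one q register */
    float32x4_t vb  = vld1q_f32(b);   /* four B values */
    acc = vmlaq_n_f32(acc, vb, a);    /* acc += vb * a, lane-wise */
    vst1q_f32(c, acc);
#else
    axpy4_scalar(c, a, b);            /* portable fallback, same result */
#endif
}

/* Assumed layout for this sketch only:
 *   a: 4 x k panel, stored so that a[4*p + i] is A(i, p)
 *   b: k x 4 panel, stored so that b[4*p + j] is B(p, j)
 *   c: 4 x 4 tile, row-major, updated in place (C += A * B) */
static void kernel4x4(size_t k, const float *a, const float *b, float *c)
{
    for (size_t p = 0; p < k; p++)
        for (int i = 0; i < 4; i++)
            axpy4_vector(c + 4 * i, a[4 * p + i], b + 4 * p);
}
```

Calling `kernel4x4(k, a, b, c)` with a zeroed `c` accumulates the 4x4 product of the two packed panels. The only difference between the scalar and vector paths is how many lanes each multiply-add covers, which is the property the commit message relies on.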
1 parent cd276c2 commit 9d58b8d

1 file changed

Lines changed: 225 additions & 2 deletions
