Conversation
Needs changes from #65 related to using multiple schedules in a workload. The added transform module provides small, reusable transform "bundles" that simplify writing schedules. All of these helpers are opinionated by design, modeled largely on what the example needs, and the APIs could probably be refined. The schedules also ended up being mostly thin wrappers around the transform bundles, so it may not be worth keeping both modules.
Force-pushed from 5f60821 to 8029fc4
Force-pushed from dad30a1 to 2c43efd
Reworked the transform module to provide simple APIs over transform ops.
rengolin
left a comment
The reason I wanted to add a Python file as a schedule was to be able to reuse all of the new schedules you created and added to the lighthouse scope. We can discuss that later.
Some comments inline.
```python
) -> bool:
    A, B, C = self._input_arrays
    out_ref = np.matmul(A, B, dtype=np.float32)
    return np.allclose(C, out_ref)
```
How is this comparing with the kernel execution output?
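The check presumably relies on the kernel writing its result into `C` in place before `np.allclose` runs. A minimal sketch of that pattern (all names here, including `check_matmul` and `run_kernel`, are hypothetical and not from the PR):

```python
import numpy as np

def check_matmul(run_kernel, m=64, n=64, k=64):
    """Hypothetical harness: run_kernel is expected to write A @ B into C in place."""
    A = np.random.rand(m, k).astype(np.float32)
    B = np.random.rand(k, n).astype(np.float32)
    C = np.zeros((m, n), dtype=np.float32)
    run_kernel(A, B, C)                            # kernel under test fills C
    out_ref = np.matmul(A, B, dtype=np.float32)    # NumPy reference result
    return np.allclose(C, out_ref)

# Pure-NumPy stand-in for a compiled kernel: writes the product into C via out=.
assert check_matmul(lambda A, B, C: np.matmul(A, B, out=C))
```

If the kernel instead returns a fresh output buffer rather than mutating `C`, the comparison as written would silently validate stale data, which may be what the question is probing.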
```python
if dtype == ml_dtypes.bfloat16:
    # For BF16, enforce fixed tile size due to current rewriter pattern matching limitation.
    # TODO: Relax when x86 BF16 pass supports dynamic indexing.
    tile_size = 32
```
Perhaps emit a warning message (to stderr?) saying you did this, to avoid surprises.
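One way to implement the suggestion, sketched as a small helper (the name `select_tile_size` and its signature are illustrative, not part of the PR):

```python
import sys

def select_tile_size(dtype_name: str, requested: int) -> int:
    """Return the tile size to use, warning on stderr when BF16 forces an override."""
    if dtype_name == "bfloat16" and requested != 32:
        # BF16 currently requires a fixed 32-wide tile (rewriter pattern-matching limitation).
        print(
            f"warning: overriding tile_size {requested} -> 32 for bf16 "
            "(rewriter pattern-matching limitation)",
            file=sys.stderr,
        )
        return 32
    return requested
```

Routing the message to stderr keeps it out of any dumped-kernel output on stdout.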
```python
    dump_payload=args.dump_kernel,
    dump_schedule=args.dump_schedule,
)
else:
```
Adds an x86-specific vectorization example for matrix multiplication. It comes with a collection of opinionated but reusable transforms and schedules. The lowering schedule currently supports F32 (general) and BF16 (AVX-512, flat layout) matmuls.
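The support matrix above suggests a simple dtype-based dispatch. A hypothetical sketch of how such a selection could look (the function and schedule names here are invented for illustration and do not appear in the PR):

```python
def pick_schedule(dtype_name: str, has_avx512: bool) -> str:
    """Illustrative dispatcher mirroring the described support matrix."""
    if dtype_name == "float32":
        # F32 path is general: no ISA-specific requirement.
        return "matmul_f32_general"
    if dtype_name == "bfloat16" and has_avx512:
        # BF16 path requires AVX-512 and uses the flat layout.
        return "matmul_bf16_avx512_flat"
    raise NotImplementedError(f"no schedule for {dtype_name}")
```

Anything outside those two combinations (e.g. BF16 without AVX-512) has no supported lowering yet.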