Add NumPy optimization guide by vchamarthi · Pull Request #36 · intel/optimization-zone

vchamarthi · 2026-06-23T00:42:09Z

Adds a new tuning guide documenting how to run NumPy with Intel® oneMKL-backed performance (BLAS/LAPACK plus optional FFT/random/umath patching), and links it from the repository’s main README.

Changes:

Add software/numpy/README.md with installation, activation patterns, verification steps, and benchmark summaries for oneMKL-backed NumPy.
Update the root README.md table of contents to include the new NumPy guide.

CC @xaleryb @jharlow-intel @napetrov for additional review

Add intel-numpy mkl extension optimizations readme

david-cortes-intel · 2026-06-24T11:49:14Z

Overall comment: this guide recommends setting the IOMP threading layer for MKL, but pretty much every other PyPI package outside of Intel-distributed NumPy will bundle libgomp, which could potentially cause incompatibilities.

Perhaps it could recommend setting MKL_THREADING_LAYER=GNU instead.

david-cortes-intel · 2026-06-25T05:58:38Z

+conda install -y \
+  -c https://software.repos.intel.com/python/conda \
+  -c conda-forge --override-channels \
+  "blas=*=*_intelmkl" \


What about lapack? If this is done on an existing environment, there's no guarantee that the user won't have different backends for blas and lapack.
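
One way to check for the mismatch raised here is to compare the build strings of the installed BLAS and LAPACK runtime packages. The sketch below is only an illustration: it assumes `conda list --json` is available, that each record carries `name` and `build_string` fields, and that the backend can be guessed from a `mkl` or `openblas` substring in the build string. The package names follow conda-forge naming, the helper names are made up for this example, and the sample records are hypothetical.

```python
import json
import subprocess

# Runtime packages that select the backend (conda-forge naming; an assumption here).
BLAS_PACKAGES = ("libblas", "libcblas", "liblapack")


def backend_of(build_string):
    """Guess the backend from a build string (heuristic, not an official rule)."""
    s = build_string.lower()
    if "mkl" in s:
        return "mkl"
    if "openblas" in s:
        return "openblas"
    return "other"


def blas_lapack_backends(records):
    """Map each installed BLAS/LAPACK runtime package to its guessed backend."""
    return {
        r["name"]: backend_of(r.get("build_string", ""))
        for r in records
        if r["name"] in BLAS_PACKAGES
    }


def is_consistent(records):
    """True when all BLAS/LAPACK runtime packages share one backend."""
    return len(set(blas_lapack_backends(records).values())) <= 1


def conda_records():
    """Read the active environment's package list (requires conda on PATH)."""
    out = subprocess.run(
        ["conda", "list", "--json"], check=True, capture_output=True, text=True
    ).stdout
    return json.loads(out)


# Hypothetical records illustrating a mixed environment.
sample = [
    {"name": "libblas", "build_string": "1_intelmkl"},
    {"name": "liblapack", "build_string": "20_linux64_openblas"},
]
print(blas_lapack_backends(sample))
print("consistent:", is_consistent(sample))
```

In a real environment, `is_consistent(conda_records())` would run the same check against the installed packages.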

david-cortes-intel · 2026-06-25T06:00:09Z

+  mkl mkl_fft mkl_random mkl_umath mkl-service
+```
+
+`--override-channels` resolves only from the two named channels, so conda does not mix in an OpenBLAS build from elsewhere. The `blas=*=*_intelmkl` selector requests the Intel channel's MKL-backed BLAS; conda-forge offers an equivalent under the build string `blas=*=*mkl`. Either gives an MKL BLAS backend. The Intel channel is required for the three extensions and Intel's latest oneMKL builds.


I assume this advice might have been copied from other documentation pages.

The reason it was there was to avoid pulling packages from the Anaconda channel, which has higher priority. That's worth mentioning here.

david-cortes-intel · 2026-06-25T06:00:52Z

+  conda activate idp_env
+```
+
+Pin `python=<version>` to match your project if you need a specific interpreter. NumPy comes from conda-forge; the Intel channel supplies the `mkl_fft`/`mkl_random`/`mkl_umath` extensions and Intel's latest oneMKL builds. To add oneMKL to an *existing* environment that already has conda-forge NumPy installed, swap its BLAS to the MKL variant and add the extensions in place (this re-links the NumPy you already have, it does not reinstall NumPy):


I think this part is redundant:

Pin python=<version> to match your project if you need a specific interpreter

Since this is not a general conda guide.

david-cortes-intel · 2026-06-25T06:01:19Z

+  mkl mkl_fft mkl_random mkl_umath mkl-service
+```
+
+`--override-channels` resolves only from the two named channels, so conda does not mix in an OpenBLAS build from elsewhere. The `blas=*=*_intelmkl` selector requests the Intel channel's MKL-backed BLAS; conda-forge offers an equivalent under the build string `blas=*=*mkl`. Either gives an MKL BLAS backend. The Intel channel is required for the three extensions and Intel's latest oneMKL builds.


Suggested change

-`--override-channels` resolves only from the two named channels, so conda does not mix in an OpenBLAS build from elsewhere. The `blas=*=*_intelmkl` selector requests the Intel channel's MKL-backed BLAS; conda-forge offers an equivalent under the build string `blas=*=*mkl`. Either gives an MKL BLAS backend. The Intel channel is required for the three extensions and Intel's latest oneMKL builds.
+`--override-channels` resolves only from the two named channels, so conda does not mix in an OpenBLAS build from elsewhere. The `blas=*=*_intelmkl` selector requests the Intel channel's MKL-backed BLAS; conda-forge offers an equivalent under the build string `blas=*=*mkl*`. Either gives an MKL BLAS backend. The Intel channel is required for the three extensions and Intel's latest oneMKL builds.

david-cortes-intel · 2026-06-25T06:02:07Z

+
+Use `--index-url`, not `--extra-index-url`: Intel's index is a partial mirror, and with `--extra-index-url` pip would see PyPI's higher-numbered OpenBLAS wheel and install that instead. Packages Intel does not mirror (for example `threadpoolctl`, used for [verification](#verifying-onemkl-is-active)) install normally from PyPI in a separate step. The Intel wheels target Linux and Windows; if `pip` reports no matching distribution, check that your platform and Python version are covered on the index.
+
+Whichever path you take, choose the OpenMP threading layer and set it **before anything imports NumPy or MKL**. The variable is read once at MKL load time, so exporting it after the import has no effect. Which value to pick is explained under [Threads and NUMA](#threads-and-numa); the safe default for a typical pip or mixed environment is:


Also applicable to SciPy.
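
The ordering constraint in the quoted paragraph (set the variable before anything imports NumPy, SciPy, or MKL) can also be enforced from Python. This is a minimal sketch, not the guide's own code: the helper name is hypothetical, and treating the presence of `numpy`, `scipy`, or `mkl` in `sys.modules` as "too late" is a conservative assumption.

```python
import os
import sys


def set_mkl_threading_layer(layer="GNU", environ=None, loaded=None):
    """Set MKL_THREADING_LAYER only if no MKL-using module is imported yet.

    Returns True when the variable is in place in time, False when NumPy,
    SciPy, or mkl is already imported and the setting would be ignored.
    """
    environ = os.environ if environ is None else environ
    loaded = sys.modules if loaded is None else loaded
    if any(name in loaded for name in ("numpy", "scipy", "mkl")):
        return False
    # setdefault keeps a value the user already exported in the shell.
    environ.setdefault("MKL_THREADING_LAYER", layer)
    return True


if set_mkl_threading_layer("GNU"):
    print("MKL_THREADING_LAYER =", os.environ["MKL_THREADING_LAYER"])
else:
    print("NumPy, SciPy, or mkl is already imported; set the variable earlier")
```

Call the helper at the very top of the entry-point script, before `import numpy` or `import scipy`.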

david-cortes-intel · 2026-06-25T06:29:10Z

+
+The `threading_layer` value matches `MKL_THREADING_LAYER` (`gnu`, `intel`, or `sequential`); the field that confirms the backend is `internal_api: mkl`.
+
+`np.show_config()` will show `name: blas, version: 3.9.0` even with oneMKL active. That is expected: it reflects the generic interface NumPy compiled against, not the runtime library. `threadpoolctl` is the reliable check.


These hard-coded version numbers are prone to get outdated over time.
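
An alternative to hard-coding version numbers is to have the reader print whatever is installed. The sketch below uses the standard-library `importlib.metadata`. The package list and helper name are illustrative assumptions, and it reports distribution metadata only, so it says nothing about which BLAS library is loaded at runtime.

```python
from importlib import metadata

# Distributions named in the guide; adjust as needed (illustrative list).
PACKAGES = ("numpy", "mkl_fft", "mkl_random", "mkl_umath")


def installed_versions(names=PACKAGES):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```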

david-cortes-intel · 2026-06-25T06:30:11Z

+MKL_VERBOSE DGEMM(N,N,4096,4096,4096,...) 2.1s CNT=1
+```
+
+If only the banner appears and no `DGEMM`/`DFFT`/`VML` lines follow, oneMKL loaded but is not being called.


It should mention here that which of those lines show up depends on what the code is doing. Maybe it could provide a sample script with a matrix multiplication that would trigger DGEMM.
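
A sample script along these lines might look like the following. It is a sketch under assumptions: NumPy is installed and linked against oneMKL, and the matrix size and function name are arbitrary. Multiplying two 2-D float64 arrays goes through the BLAS `gemm` routine, so with `MKL_VERBOSE=1` a `DGEMM` line would be expected in the output. With another BLAS backend the script still runs but prints no `MKL_VERBOSE` lines.

```python
import os

# Set the variable before importing NumPy so it is in place when MKL reads it.
os.environ.setdefault("MKL_VERBOSE", "1")

import numpy as np


def run_matmul(n=1024, seed=0):
    """Multiply two n-by-n float64 matrices, which dispatches to BLAS gemm."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n))
    b = rng.standard_normal((n, n))
    return a @ b


if __name__ == "__main__":
    c = run_matmul()
    print(c.shape, c.dtype)
```

Calls such as `np.fft.fft` or `np.exp` would be the analogous triggers for `DFFT` and `VML` lines, provided the corresponding extensions have been activated.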

david-cortes-intel · 2026-06-25T06:32:03Z

+
+**The extension packages do not activate themselves.** `mkl_fft`, `mkl_random`, and `mkl_umath` do not replace NumPy functions on import. Use the patch function or context manager. Since the 2026.0 release installs the standard conda-forge NumPy rather than a bundled Intel build, there is no longer anything that activates them at build time, so explicit activation is required even in the full Intel® Distribution for Python.
+
+**The activation model is release-specific; this guide targets 2026.0 and later.** The explicit `patch_*` workflow described here matches the package generation in [Benchmark results](#benchmark-results) (NumPy 2.4.3, mkl_fft 2.2.0, mkl_random 1.4.0, mkl_umath 0.4.0). Earlier releases behave differently, verified on `intelpython3_full=2025.3.0`:


This makes it sound as if this were expected to change in the future. Maybe it could mention that it applies to versions starting with 2026.0.

david-cortes-intel · 2026-06-25T06:33:38Z

+conda install -y \
+  -c https://software.repos.intel.com/python/conda \
+  -c conda-forge --override-channels \
+  "blas=*=*_intelmkl" \


'blas' is a development package providing headers, .pc files, and similar, depending in turn on 'libblas'. 'libblas' is the runtime that sets the backend.

david-cortes-intel · 2026-06-25T06:37:38Z

+conda install -c conda-forge _openmp_mutex=*=*_llvm
+```
+
+On Windows, `_openmp_mutex` offers Intel and LLVM variants but no GNU one, consistent with there being no GNU threading on the platform.


This is not correct:

david-cortes-intel · 2026-06-25T06:39:31Z

+Pin `python=<version>` to match your project if you need a specific interpreter. NumPy comes from conda-forge; the Intel channel supplies the `mkl_fft`/`mkl_random`/`mkl_umath` extensions and Intel's latest oneMKL builds. To add oneMKL to an *existing* environment that already has conda-forge NumPy installed, swap its BLAS to the MKL variant and add the extensions in place (this re-links the NumPy you already have, it does not reinstall NumPy):
+
+```bash
+conda install -y \


Very important to mention here that packages from the Intel channel are meant to be compatible with packages from conda-forge but not with packages from Anaconda, which is the default channel.

david-cortes-intel · 2026-06-25T06:40:36Z

Commenting again: the guide specifically mentions AVX-512 as the highest level of SIMD instructions, but that will become outdated soon as hardware with AVX10.2 gets released.

david-cortes-intel · 2026-06-25T06:42:42Z

+```python
+from threadpoolctl import threadpool_info
+import pprint
+pprint.pprint(threadpool_info())


This should be executed after importing numpy.
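
A version of the check with that ordering might look like the sketch below. `threadpool_info()` only reports libraries already loaded into the process, which is why NumPy is imported first. The helper names are illustrative, and the script assumes `threadpoolctl` is installed and prints a short message if it is not.

```python
def mkl_entries(infos):
    """Keep only the threadpoolctl records whose backend is MKL."""
    return [info for info in infos if info.get("internal_api") == "mkl"]


def loaded_mkl():
    """Import NumPy first, then ask threadpoolctl what is loaded."""
    import numpy  # noqa: F401  (loads NumPy's BLAS library into the process)
    from threadpoolctl import threadpool_info

    return mkl_entries(threadpool_info())


if __name__ == "__main__":
    try:
        entries = loaded_mkl()
    except ImportError as exc:
        print(f"missing dependency: {exc.name}")
    else:
        if not entries:
            print("no MKL library is loaded in this process")
        for entry in entries:
            print(entry.get("threading_layer"), entry.get("filepath"))
```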

david-cortes-intel · 2026-06-25T07:20:48Z

+| `MKL_DYNAMIC` | `FALSE` | Disable automatic thread scaling |
+| `KMP_AFFINITY` | `granularity=fine,compact,1,0` | Pin threads to physical cores (Intel OpenMP only) |
+
+`KMP_AFFINITY` is an Intel OpenMP setting, so it applies only when oneMKL is on the Intel runtime (`MKL_THREADING_LAYER=INTEL`); under the GNU layer use `GOMP_CPU_AFFINITY` or `numactl` instead. `KMP_AFFINITY=granularity=fine,compact,1,0` is appropriate for single-socket systems or when running one process per socket. On multi-socket systems without `numactl` it may bind threads across sockets; verify the actual binding with `KMP_AFFINITY=verbose`.


What about OMP_PROC_BIND?

david-cortes-intel · 2026-06-25T13:39:12Z

+| Variable | Recommended value | Effect |
+|---|---|---|
+| `MKL_THREADING_LAYER` | `GNU` (mixed env) or `INTEL` (all-Intel) | Select MKL's OpenMP runtime; see note below |
+| `MKL_NUM_THREADS` | physical core count | Cap MKL thread count |


Is this guaranteed to work as intended if you set MKL_NUM_THREADS to number of physical cores, then bind the threads to numbers from the system, but don't specify something like OMP_PLACES=threads? Wouldn't it potentially end up using hyperthreads if the system enumerates them in an interleaved order?
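
Whether the first N logical CPUs land on distinct physical cores depends on how the system enumerates hyperthreads. On Linux, the per-CPU `thread_siblings_list` files under sysfs expose that mapping. The sketch below assumes Linux sysfs and uses made-up function names. It picks the lowest-numbered logical CPU of each core, giving a list that could be passed to an explicit binding such as `GOMP_CPU_AFFINITY`.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Parse a sysfs CPU list such as '0,64' or '0-1' into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        elif part:
            cpus.append(int(part))
    return cpus


def one_cpu_per_core(sibling_lists):
    """Return the lowest logical CPU id of every distinct physical core."""
    return sorted({min(parse_cpu_list(s)) for s in sibling_lists if s.strip()})


def read_sibling_lists(root="/sys/devices/system/cpu"):
    """Read thread_siblings_list for every CPU (empty list off Linux)."""
    pattern = "cpu[0-9]*/topology/thread_siblings_list"
    return [p.read_text() for p in sorted(Path(root).glob(pattern))]


if __name__ == "__main__":
    cpus = one_cpu_per_core(read_sibling_lists())
    print("physical cores found:", len(cpus))
    print("GOMP_CPU_AFFINITY=" + " ".join(str(c) for c in cpus))
```

If the resulting list is not simply `0..N-1`, the system interleaves hyperthreads, and a thread count equal to the physical core count is not enough on its own to keep threads off sibling hyperthreads.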

david-cortes-intel · 2026-06-25T14:43:50Z

+The speedup arrives in two parts that activate differently, and the distinction matters for the rest of this guide:
+
+- **Linear algebra (BLAS and LAPACK)** turns on automatically once oneMKL is the backend. `np.dot`, `np.matmul`, and `np.linalg.*` route to it with no code change.
+- **FFT, random, and vectorized math** come from three separate packages (`mkl_fft`, `mkl_random`, `mkl_umath`). These do not activate on import; you switch them on explicitly in code.


It could link to the github repositories of those packages.

vchamarthi and others added 4 commits May 15, 2026 13:01

Add intel-numpy mkl extension optimizations readme

e3c5a1b

Merge remote-tracking branch 'upstream/main' into intel-numpy

b6fa2f4

update the readme with latest release notes.

0f77a3b

Merge pull request #1 from vchamarthi/intel-numpy

3a9bf76

Add intel-numpy mkl extension optimizations readme

jharlow-intel reviewed Jun 23, 2026

Comment thread software/numpy/README.md Outdated

jharlow-intel reviewed Jun 23, 2026

Comment thread software/numpy/README.md Outdated

david-cortes-intel reviewed Jun 24, 2026

update guide with pr comments and recommendations

0bb6756

david-cortes-intel reviewed Jun 25, 2026
