Skip to content

Replace vector<int> with SmallList (a stack array) (#743)#1

Open
JPRichings wants to merge 13 commits into
JPRichings:ctrls_performancefrom
QuEST-Kit:devel
Open

Replace vector<int> with SmallList (a stack array) (#743)#1
JPRichings wants to merge 13 commits into
JPRichings:ctrls_performancefrom
QuEST-Kit:devel

Conversation

@JPRichings
Copy link
Copy Markdown
Owner

This is to circumvent the std::vector performance overheads visible in few-qubit simulation (responsible for a performance regression from v3; see QuEST-Kit#720), and also so that qubit lists can be passed directly to CUDA kernels without conversion (as explored in QuEST-Kit#739).

TysonRayJones and others added 13 commits May 17, 2026 17:49
This is to circumvent the std::vector performance overheads visible in few-qubit simulation (responsible for a performance regression from v3; see #720), and also so that qubit lists can be passed directly to CUDA kernels without conversion (as explored in #739).
Optimisations include:
- Adopted SmallView (const SmallList&) to avoid superfluous SmallList copies
- Made internally created matrices static
- Change accelerator dynamic function vectors to static arrays
- Exit all validators early when validation is disabled

Additional cleanup includes:
- Tidied accelerator macros (replaced param-specific macros like "numCtrls" and "numTargs" with "param")
- Fill ctrlStates vectors with default before localiser
- Renamed getBitsFromInteger to setToBitsOfInteger
- Adopted const in bitwise.hpp to better express intent

Note that the naming of SmallList and SmallView will be subsequently changed to List64 and ConstList64
such that they all begin with QUEST, but some have additional changes
so that we can compile MPI tests on systems which cannot actually run with MPI, because they are missing an MPI or UCX library file, as is witnessed in the CI (when compiling with MPICH). It's generally irksome too to trigger an execution of the test binary (which itself initialises QuEST) during build when on a HPC platform with distinct submit and compute nodes
* Added ENABLE_SUBCOMM build option

* Moved from MPI_COMM_WORLD to mpiQuestComm

* Decided passing *MPI_Comm was probably overly cautious, and updated function name to comm_getMpiComm

* environment.cpp: added methods to reset rank and numNodes, and reporting for subcomm compiled

* comm_config.hpp/cpp: added comm_setMpiComm

* CMakeLists.txt: PUBLIC MPI::MPI_CXX turned out to be unhelpful, even for SubComm, because of course it enforces CXX

* Added new custom QuESTEnv initialiser which allow user to positively declare that they take ownership of MPI

* validation.cpp: updated comm_end call

* comm_config.hpp: added config.h include so COMPILE_MPI is actually defined

* subcommunicator.h/cpp: implemented QuESTEnv initialiser with custom MPI_Comm

* CMake: added subcommunicator.cpp

* comm_config.hpp: added missing config.h include...

* comm_config.cpp: explicitly initialise mpiCommQuest to MPI_COMM_NULL, updated setComm for init only workflow

* quest.h: added subcommunicator header

* CMake: added MPI to application binaries when SUBCOMM is enabled

* comm_routines.cpp: post Irecv before Isend which probably won't do anything but it makes MPI library implementers less nervous

* tests: added new env test for initCustomMpiQuESTEnv

* Added error throws to comm_config to cover new scenarios of badness with user owned MPI

* subcommunicator.cpp: updated var names to match QuEST style

* tests/unit/initialisations.cpp: slightly modified setQuregAmps test to avoid unexpected test failure due to range checking when compild in Debug configuration

* Updated validation in comm_setMpiComm

Co-authored-by: iarejula-bsc <inigo.arejula@bsc.es>

* userOwnsMpi int->bool

* comm_config.cpp: corrected call to MPI_Comm_free

* subcommunicator.cpp: userOwnsMpi int->bool

* subcommunicator.cpp: added comm_isInit guard around comm_setMpiComm

* environment.cpp: USER_OWNS_MPI -> userOwnsMpi

* comm_init: fixed case where useDistrib = 0 and userOwnsMpi = true

* comm_init: moved (recently) misplaced MPI_Init

* AUTHORS.txt: added iarejula-bsc

* Added placeholder docstrings to new initialisers

* docs/cmake.md: added ENABLE_SUBCOMM to list of QuEST CMake vars

* Newly added COMPILE_MPI -> QUEST_COMPILE_MPI

* ENABLE_SUBCOMM -> QUEST_ENABLE_SUBCOMM

* CMake: corrected OpenMP and subcommunicator pre-processor definitions

---------

Co-authored-by: Oliver Thomson Brown <8394906+otbrown@users.noreply.github.com>
Co-authored-by: iarejula-bsc <inigo.arejula@bsc.es>
to reduce the likelihood of users printing from non-root nodes interrupting QuEST root output. This is not bullet-proof; we sync the active communicator rather than MPI_COMM_WORLD so the user-controlled non-participating processes may still be printing. Furthermore, even if all processes participate, some may have outstanding non-root prints that are not aggregated to the user screen by the time MPI_Barrier finishes. But these syncs greatly reduce the change of corruption, and are effectively free!
This enables CRAY MPICH platforms to leverage GPU-awareness, greatly accelerating distributed GPU simulation

Co-authored-by: JPRichings <james.richings@ed.ac.uk>
Important changes:
- permit user initialisation of MPI when QuEST is not distributed
- changed QuESTEnv fields bool from int (e.g. isMultithreaded)
- add user-input validation for custom MPI calls
- disambiguated comm_config.cpp concepts of "MPI is initialised" (comm_isMpiInit) from "QuEST communication is active" (comm_isActive)
- refactored comm_config.cpp flow, especially related to pre-quest-init flow (during validation)
- added Oliver's custom-MPI examples (from #712)
- moved new API functions to experimental.h
- tweaked reportQuESTEnv output grouping
Added:
- QUEST_DEFAULT_NUM_GPU_THREADS_PER_BLOCK CMake option
- QUEST_DEFAULT_NUM_GPU_THREADS_PER_BLOCK environment variable
- setQuESTNumGpuThreadsPerBlock() API function
- getQuESTNumGpuThreadsPerBlock() API function
- set_num_gpu_threads examples in examples/extended

---------

Co-authored-by: Oliver Thomson Brown <8394906+otbrown@users.noreply.github.com>
Co-authored-by: Tyson Jones <tyson.jones.input@gmail.com>
Beware this included removing the superfluous `numControls` argument from the C++only `std::vector` overload of `applyMultiStateControlledCompMatr2`, which is technically a teeny tiny API break ¯\_(ツ)_/¯
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants