First attempt at embarrassingly parallel execution to compute head grids. #140
dbrakenhoff wants to merge 14 commits into
- add parallel submodule
- define numba tuples containing relevant data for computations for Model, Aquifer and Elements (LineSink and Well) for now. Each class gets to_numba_tuple() method to collect data.
- add integer mappings for elements and boundary conditions for identifying computation path
- gather element data in structured arrays
- write fast versions of potinf and potential
- parallelize on x,y pts
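The data flow described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the integer codes, the dtype fields, the `WellData` tuple and the simplified steady-state well potential `Q/(2π)·ln(r)` are all assumptions made for the example, and the Numba decorators are left out so it runs with NumPy alone.

```python
from typing import NamedTuple

import numpy as np

# Hypothetical integer codes that select the computation path per element.
ELEMENT_WELL = 0
ELEMENT_LINESINK = 1  # reserved; no kernel is implemented in this sketch

# Hypothetical structured dtype: one row per element.
ELEMENT_DTYPE = np.dtype(
    [("kind", np.int64), ("x", np.float64), ("y", np.float64), ("q", np.float64)]
)


class WellData(NamedTuple):
    """Plain data needed by the kernel (stand-in for a 'numba tuple')."""
    kind: int
    x: float
    y: float
    q: float


class Well:
    def __init__(self, x, y, q):
        self.x, self.y, self.q = x, y, q

    def to_numba_tuple(self):
        # Collect only the numbers the kernel needs, no Python objects.
        return WellData(ELEMENT_WELL, float(self.x), float(self.y), float(self.q))


def gather_elements(elements):
    """Pack the per-element tuples into one structured array."""
    data = np.zeros(len(elements), dtype=ELEMENT_DTYPE)
    for i, element in enumerate(elements):
        data[i] = tuple(element.to_numba_tuple())
    return data


def potential_at(x, y, elements):
    """Sum element contributions at one point, dispatching on the integer code."""
    total = 0.0
    for j in range(elements.shape[0]):
        if elements["kind"][j] == ELEMENT_WELL:
            r2 = (x - elements["x"][j]) ** 2 + (y - elements["y"][j]) ** 2
            total += elements["q"][j] / (4.0 * np.pi) * np.log(r2)  # Q/(2*pi)*ln(r)
        else:
            raise NotImplementedError("no kernel for this element code in the sketch")
    return total


def potential_grid(xg, yg, elements):
    """Evaluate every (x, y) point; the points are independent of each other."""
    pot = np.empty((yg.shape[0], xg.shape[0]))
    for i in range(yg.shape[0]):  # rows are candidates for a parallel loop
        for k in range(xg.shape[0]):
            pot[i, k] = potential_at(xg[k], yg[i], elements)
    return pot
```

Because each grid point depends only on the read-only element array, the outer loop is the natural place to parallelize.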
Parallel example (based on the example from Discussion #115). Note that the first run of the numba code triggers the compilation step, so it takes ~15s on my machine. The next run takes ~3s. Normal timflow (using parallel=True, which refers to multithreading) takes ~30s. EDIT: the example won't run until #139 is merged into dev and subsequently into this branch.
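The difference between the first and later runs can be reproduced with a small timing sketch. The kernel below is a hypothetical stand-in (a steady-state well potential, not the timflow kernels), and the `try`/`except` fallback is an assumption added so the snippet also runs where Numba is not installed, in which case no compilation step occurs.

```python
import time

import numpy as np

try:  # Numba is optional here so the sketch also runs without it.
    from numba import njit, prange
except ImportError:
    prange = range

    def njit(*args, **kwargs):
        # No-op replacement supporting both @njit and @njit(...).
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda func: func


@njit(parallel=True)
def well_potential(x, y, xw, yw, q):
    """Hypothetical kernel: sum of Q/(2*pi)*ln(r) over wells at each point."""
    pot = np.empty(x.shape[0])
    for i in prange(x.shape[0]):  # points are independent of each other
        total = 0.0
        for j in range(xw.shape[0]):
            r2 = (x[i] - xw[j]) ** 2 + (y[i] - yw[j]) ** 2
            total += q[j] / (4.0 * np.pi) * np.log(r2)
        pot[i] = total
    return pot


def time_calls(n_calls=3):
    """Call the kernel repeatedly and record the wall time of each call."""
    x = np.linspace(1.0, 100.0, 2000)
    y = np.linspace(1.0, 50.0, 2000)
    xw = np.array([0.0, -10.0])
    yw = np.array([0.0, -5.0])
    q = np.array([100.0, 50.0])
    timings, results = [], []
    for _ in range(n_calls):
        start = time.perf_counter()
        results.append(well_potential(x, y, xw, yw, q))
        timings.append(time.perf_counter() - start)
    return timings, results
```

With Numba installed, `timings[0]` includes the compilation and should be clearly larger than the later entries, while the results of all calls are identical.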
Cool work and awesome to see such speed-up already! Some thoughts:
I will follow the progress with great interest, great work Davíd 🎉
@eriktoller Thanks for the early thoughts!
Good point, I did test it for my little example and it gave the same results, but better to develop it without for now and compare at the end. It also didn't really generate any speedup in my current example.
Good suggestions, I left the existing numba code alone for now, but an (optional) work array output seems like a good idea.
The pre-processing is only 0.05% of the total computation time in my current example. It may be a bit more if I include all the code up to the first call to the numba-optimized potential computations, but for now it seems negligible. It will probably grow as we start adding support for more elements, so it's good to keep an eye on that. So >99% of the time is taken by the actual potential computations, and this is how my example script scales with the number of threads on my laptop (averages of 3 runs). The base case (1 thread) runs in ~35s.
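As a rough upper bound on what thread scaling could deliver, Amdahl's law can be applied to the numbers quoted above. The sketch assumes that the 0.05% pre-processing is the only serial part and ignores threading overhead and memory bandwidth, so it is an idealized estimate and not the measured result; the function names are made up for this example.

```python
def amdahl_speedup(serial_fraction, n_threads):
    """Ideal speedup when only (1 - serial_fraction) of the work is parallel."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must lie in [0, 1]")
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_threads)


def ideal_runtime(base_seconds, serial_fraction, n_threads):
    """Runtime predicted by Amdahl's law from the single-thread runtime."""
    return base_seconds / amdahl_speedup(serial_fraction, n_threads)


if __name__ == "__main__":
    # 0.0005 = 0.05% pre-processing, 35 s = single-thread base case.
    for n in (1, 2, 4, 8):
        print(n, round(amdahl_speedup(0.0005, n), 2), round(ideal_runtime(35.0, 0.0005, n), 1))
```

Measured timings that fall well below this bound would point at overheads other than the serial pre-processing.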
That is really impressive and shows great potential! For larger timflow models this will be a substantial upgrade when it comes to plotting. Did @mbakker7 have a look at this too?
- avoid memory assignment in loops
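Avoiding allocation inside loops, combined with the optional work array output mentioned earlier in the thread, could look roughly like this. It is a minimal sketch under assumptions: `well_influence` and `potential_points` are hypothetical names, and the influence function is a simplified steady-state `ln(r)/(2π)` rather than the timflow kernels.

```python
import numpy as np


def well_influence(x, y, xw, yw, out=None):
    """Influence ln(r)/(2*pi) of every well at (x, y).

    If `out` is given, results are written into it and no array is allocated.
    """
    if out is None:
        out = np.empty(xw.shape[0])
    for j in range(xw.shape[0]):
        r2 = (x - xw[j]) ** 2 + (y - yw[j]) ** 2
        out[j] = np.log(r2) / (4.0 * np.pi)
    return out


def potential_points(xg, yg, xw, yw, q):
    """Potential at each point, reusing one work array for all points."""
    work = np.empty(xw.shape[0])  # allocated once, outside the loop
    pot = np.empty(xg.shape[0])
    for i in range(xg.shape[0]):
        well_influence(xg[i], yg[i], xw, yw, out=work)  # overwrites `work`
        pot[i] = np.dot(q, work)
    return pot
```

Callers that do not care about allocations can still omit `out` and get a fresh array back.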
My early attempt at parallel computations within regular timflow using multithreading was a bit misguided. I have now modified it to use multiprocessing instead of multithreading (#139), and this is the result compared to the fully optimized numba code in this PR.
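For pure-Python computations the global interpreter lock keeps threads from running CPU-bound code in parallel, which is why splitting the grid over processes tends to help where threads do not. The sketch below shows the general chunking pattern with the standard library only; it is not the code from #139, and the function names and the simplified well potential are assumptions for illustration.

```python
import math
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from itertools import repeat


def potential_chunk(points, wells):
    """Potential at each (x, y) in `points` from wells given as (xw, yw, q)."""
    out = []
    for x, y in points:
        total = 0.0
        for xw, yw, q in wells:
            r2 = (x - xw) ** 2 + (y - yw) ** 2
            total += q / (4.0 * math.pi) * math.log(r2)  # Q/(2*pi)*ln(r)
        out.append(total)
    return out


def split(points, n_chunks):
    """Split the point list into at most `n_chunks` contiguous pieces."""
    size = max(1, math.ceil(len(points) / n_chunks))
    return [points[i:i + size] for i in range(0, len(points), size)]


def potential_parallel(points, wells, workers=4, executor_cls=ProcessPoolExecutor):
    """Evaluate chunks in separate workers and stitch the results together.

    Passing ThreadPoolExecutor as `executor_cls` is handy for quick checks,
    but threads will not speed up this pure-Python loop.
    """
    chunks = split(points, workers)
    with executor_cls(max_workers=workers) as executor:
        parts = executor.map(potential_chunk, chunks, repeat(wells))
        return [value for part in parts for value in part]
```

When the default process pool is used from a script, the call should sit under an `if __name__ == "__main__":` guard so that worker processes can import the module safely.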
Yes, @eriktoller, I am in the loop. Mostly simply talking to @dbrakenhoff rather than giving comments here. Thanks for all your suggestions. Keep them coming!


Early result suggests ~10x speedup on my machine.
Todo
- … to_numba_tuple() for all element families intended for Numba path.
- Add fast potinf and potential kernels for remaining element classes. Missing: …
- … selects aquifer by location; backend now uses a single aqtuple and ignores aq_id.
- … selection behavior, steady contribution, and output shape conventions.
- … through Python loops. Write fast parallel implementations for disvec_inf, disvec functions.
- … points at well radius, near singular line endpoints, zero/very small times, many intervals, mixed g/v/z elements.
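Assuming the awkward inputs in the last Todo item are meant as test cases, a check that compares a fast kernel against a reference implementation at such points could be sketched as follows. Both functions are hypothetical stand-ins, and clamping the distance to the well radius inside the well is an assumption of this sketch.

```python
import math


def well_potential_reference(x, y, xw, yw, rw, q):
    """Reference: Q/(2*pi)*ln(r), with r clamped to the well radius rw."""
    r = max(math.hypot(x - xw, y - yw), rw)
    return q / (2.0 * math.pi) * math.log(r)


def well_potential_fast(x, y, xw, yw, rw, q):
    """'Fast' variant working with r**2 to avoid the square root."""
    r2 = max((x - xw) ** 2 + (y - yw) ** 2, rw * rw)
    return q / (4.0 * math.pi) * math.log(r2)


def max_abs_difference(points, xw, yw, rw, q):
    """Largest absolute difference between the two variants over `points`."""
    return max(
        abs(well_potential_reference(x, y, xw, yw, rw, q)
            - well_potential_fast(x, y, xw, yw, rw, q))
        for x, y in points
    )
```

Useful points for such a comparison include the well centre, a point exactly at the well radius, a point just outside it, and a point far away.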