Conversation
…en threaded
mpas_atm_time_integration.F :
Add OMP barriers in atm_rk_integration_setup to keep array use in sync (fixes random bad values at thread bounds)
Turn off OMP parallel for atm_advance_scalars_mono because it slows down (insufficient work in loops)
mpas_atmphys_driver.F : Remove OMP parallel for noahmp because it ignores thread bounds (does all cells)
|
@MicroTed Thanks for providing these OpenMP fixes! Based on testing, I'm still seeing value (speedup) when using OpenMP threading in I think it would be ideal to have a PR branch with two commits: one that fixes the race conditions in |
|
@mgduda Thanks for testing the mono routine on another system. I ran it on a fairly small (900 columns) domain, and perhaps a larger domain is needed to see benefits? I can look at making separate PRs. For the noahmp, I could also just remove the its,ite from the interface since they are not used within. |
|
In my test, I had used a 24-km global mesh (1,024,002 columns) with 256 MPI tasks, each with 4 OpenMP threads. So, it may indeed be a question of how much work each thread is given. I was also using the Intel oneAPI compilers on Derecho, and perhaps there's some compiler or system difference that contributed to the different result. I think it would be fine to keep a single PR, but with two commits in the PR branch. With the
|
This PR resolves the problems raised in issue #1403. Namely, that one subroutine needed barriers to fix random errors and another subroutine needed to called without threading to fix reproducibility with multiple OMP threads.
Changes:
mpas_atm_time_integration.F :
Bug: Add OMP barriers in atm_rk_integration_setup to keep array use in sync (fixes random bad values at thread bounds). It's not clear to me why barriers are needed at both the beginning and end, but neither by itself resolved the bad value errors.
Efficiency: Turn off OMP parallel for atm_advance_scalars_mono because it slows down with multi-threading (insufficient work in loops) (no change to results). The subroutine could be restructured to push the scalar loop inside the column loop, but would use more memory for local arrays.
mpas_atmphys_driver.F :
Bug: Remove OMP parallel for noahmp because it ignores thread bounds (does all cells). The OMP threading was causing the land surface rates to be accumulated multiple times and thus changed the solution.
Tested on MacOS with gfortran/gcc 12.4 and clang version 17.0.0