PSMS Model Precision & Performance #43
Replies: 2 comments
-
|
Thanks for this! Improving the echoSMs PSMS model performance has been on the list for a while, so it's nice to have some concrete input on how to improve things :) Using quad floats from Fortran in Python is fiddly at best so I defaulted to the easier double precision to get things working. Does your model use quad precision in the PSMS model or just in the calculation of the spheroidal values? If it's enough just to do the spheroidal calculations in quad but return the results in double precision, that would make it much easier to support. Your notes have also helped me discover that the argument to Another complication is that most (all?) Intel and AMD CPU's don't natively support quad precision so they get emulated with software, leading to calculations with quad precision being 10-60 times slower than for double precision (according to some searches). But accuracy of calculations wins over speed, so it's still likely to be worthwhile to do... |
Beta Was this translation helpful? Give feedback.
-
I'm always happy to help help spare other folks avoid the same frustrations and time-sinks I ran into trying to improve precision while not taking eons to run!
The distinction I would make is that
Quad precision is not only used inside the spheroidal-wave-function calls. For the full penetrable PSMS branch, I keep the overlap integrals, kernel assembly, and dense per-m solves in the requested arithmetic as well, and only cast back to double at the Returning doubles at the API boundary is fine; demoting before the dense solve is different. My implementation uses a somewhat different linear-solver path from echoSMs, but the general issue is the same: once the matrix is demoted before the SVD/pseudoinverse or related solve step, the conditioning, singular-value thresholding, and final coefficients are all governed by double precision. At that point it is no longer an end-to-end quad-precision PSMS solve. When developing this implementation, I found that the main source of precision drift was not just in evaluating My sense is that a lot of the numerical trouble starts in the spheroidal-function layer, but for the full penetrable PSMS branch the overlap integrals and dense solve can amplify that error rather than wash it out. There is an observable difference in TS at high
Yeah, the source Fortran code erroneously implies that the For example, my compiler ( So the
That is also definitely a good point and matches my experience too. In my benchmarks the quad runs are slower by roughly tens of times rather than a small consistent factor, so I agree that the cost is very real. One alternative would be to use double-double precision to get close to quad precision (something like 31 decimal digits?). And while double-double precision keeps things like an exponent's range, the Fortran code already does a lot of mantissa and exponent scaling that I don't think would be substantially impacted by using double-double. But that is still not true |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
-
I recently pushed a PSMS implementation I've been working on for acousticTS that I think may be helpful for its state in echoSMs in two ways: 1) precision drift at higher$ka/kb$ , and 2) computation time for the fluid-filled boundary condition. This is more food-for-thought more than anything since micro-optimizing code is often completely unnecessary. The purpose for this thread is:
1. Is there a plan to enable quad-precision in echoSMs for
PSMSModel?2. There may be heuristics and proxy methods that can help speed up model runs without losing significant accuracy relative to benchmarks. And if quad-precision is enabled, then perhaps it would be worthwhile migrating code from
Pythonto aC/C++layer (orCython) down the road.The motivation behind this thread is my recent attempts to implement a T-matrix method into the acousticTS
R-package and an interest to migrate this into Python for some future multiple scattering modeling.Precision
The
prolate_swf.f90file used in thespheroidalwavefunctionsimport defaults to the lower end of double precision (integer, parameter :: knd = selected_real_kind(8)). acousticTS uses 15 decimal digits for its double precision, which doesn't seem to make all that much of a difference when it comes to the outputs (at least when comparing the results from echoSMs and acousticTS) for the fixed rigid and pressure release boundaries (largest deviations being associated with nulls).echoSMs(dB)echoSMs(dB)fixed_rigid12, 18, 38, 70, 1000.496920.10091pressure_release12, 18, 38, 70, 1000.086190.01757The double-precision results track pretty well with the benchmarked values for these boundaries. Conversely, this is not the case for the liquid-filled benchmark (the tables below use the acousticTS outputs). Note that here, the
simplify_Amnrefers to the same simplification defined by both Furusawa (1988) and matches the outputs fromPSMSModelin echoSMs.simplify_AmndoubleFALSEdoubleFALSEdoubleTRUEdoubleTRUEThese differences are really driven by a spectral decorrelation and loss of coherence in the higher$ka$ regimes (roughly starting at around 90 kHz for the benchmark values). Comparatively, running things at quad precision (33 decimal digits) does a remarkably better job with the benchmark.
simplify_AmnquadFALSEquadFALSEquadTRUEquadTRUEWhen visualized:
When all said and done, this quad-precision implementation for liquid-filled prolate spheroids demonstrates some differences with echoSMs but aligns well with the
Prol_Spheroidrepository.12, 18, 38, 70, 1001.016760.2053712, 18, 38, 70, 1000.001280.00055I am not very familiar with
F2PYwith respect to how the generator works, but I think it would be helpful to also include either separate quad-precision binaries or some precision-specific interface. Since I am not super familiar withF2PY, below is the implementation chain I used for linking R to Fortran.acousticTS precision handling
The user makes a high-level, human-readable choice in the R function.
R passes this configuration to the C++ interface. Rcpp converts the string into a conditional logic flow, determining which Type Alias of
psms_fbsto use:This then subsequently maps to a specific compiled binary files (via instructions from
Makevarsfiles) that interface with the same shared source Fortran code:These binaries then send the appropriate instructions to the shared Fortran source code:
Runtime
Obviously the backend for the PSMS model is pretty costly. For runtimes on my local machine:
fixed_rigid12, 18, 38, 70, 1000.860.34N/AN/Apressure_release12, 18, 38, 70, 1000.930.33N/AN/Aliquid_filled12, 18, 38, 70, 1002.7248.0248.6511.06The acousticTS runtimes are when the argument
adaptive = TRUE. The adaptive algorithm I implemented makes the following changes:precision="double": relative tolerance = 1e-8, absolute tolerance = 1e-12precision="quad": relative tolerance = 1e-12, absolute tolerance = 1e-18These changes as well as moving practically all of the logic-handling and processing from
RintoC++conferred pretty sizable improvements to model runtimes for the fluid-filled boundary. The fixed rigid and pressure-release boundaries do benefit from this, but really only when you compute over wide frequency bands with fine-scale intervals. Not included in the below benchmarks/comparisons, but it also helped reduce memory-related errors and maxing out computer hardware (no one likes a BSOD...).This adaptive implementation does a fairly good job retaining model agreement with the benchmark values, e.g. the liquid-filled boundary:
n_integrationNANANAThis consistency accompanies an improvement in reduction in computation times (12, 18, 38, 70, 120, 200, 250, 300, 333, 400 kHz):
Rigid and pressure release
adaptiven_integrationfixed_rigiddoubleFALSE96fixed_rigiddoubleTRUEfixed_rigidquadFALSE96fixed_rigidquadTRUEpressure_releasedoubleFALSE96pressure_releasedoubleTRUEpressure_releasequadFALSE96pressure_releasequadTRUELiquid filled
simplify_Amnadaptiven_integrationdoubleFALSEFALSE96doubleFALSETRUEdoubleTRUEFALSE96doubleTRUETRUEquadFALSEFALSE96quadFALSETRUEquadTRUEFALSE96quadTRUETRUEBeta Was this translation helpful? Give feedback.
All reactions