perf: Chan's parallel mean-var algorithm for dask-backed arrays (sparse/dense) #4143
Codecov Report
❌ Patch coverage is
Additional details and impacted files

@@             Coverage Diff             @@
##            main    #4143       +/-   ##
=========================================
+ Coverage     0   79.71%   +79.71%
=========================================
  Files        0      120      +120
  Lines        0    12830    +12830
=========================================
+ Hits         0    10227    +10227
- Misses       0     2603     +2603
@ilan-gold I added njit support, see #4153. This enables rank_genes_groups to use njit. I integrated this into the rank_genes_groups PR and benchmarked both there and here; it gives a speedup on both at normal group x gene sizes.
Nice, I commented on something there, but once you've got pre-commit fixed as well, I'll merge it into this.
Co-authored-by: Ilan Gold <ilanbassgold@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
6e18c02 to 196e443
Benchmark changes
Comparison: https://github.com/scverse/scanpy/compare/d96d91de3162f29d901194ac56fd732459389784..7bf2db4aa11c28d5e6ed01644453bb742bab6375
More details: https://github.com/scverse/scanpy/pull/4143/checks?check_run_id=83408969507
I have no idea why this CI job is failing, but the one in #4013 passes and contains this branch.
…gorithm for dask-backed arrays (sparse/dense)) (#4180) Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>
See https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm
Based on a discussion with @zboldyga in #4118 (comment).
This has two benefits: it lets us calculate mean/var in one pass instead of effectively two (the sum of squares and the squared sum), and it gets rid of a numerical instability issue that @zboldyga found the solution to (see the removed comment).
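
For readers unfamiliar with the linked Wikipedia section, here is a minimal NumPy sketch of the pairwise combination step. It is not the code in this PR: the function names (`chunk_stats`, `combine`, `mean_var_chunked`) are hypothetical, plain arrays stand in for dask chunks, and every chunk is assumed to be non-empty. Each chunk is reduced to a `(count, mean, M2)` triple, where `M2` is the sum of squared deviations from the chunk mean, and the triples are then merged two at a time.

```python
from functools import reduce

import numpy as np


def chunk_stats(x):
    """Return (count, mean, M2) for one chunk, reducing over axis 0."""
    n = x.shape[0]
    mean = x.mean(axis=0)
    m2 = ((x - mean) ** 2).sum(axis=0)  # sum of squared deviations
    return n, mean, m2


def combine(a, b):
    """Merge two (count, mean, M2) triples with the parallel update."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * (n_b / n)
    m2 = m2_a + m2_b + delta**2 * (n_a * n_b / n)
    return n, mean, m2


def mean_var_chunked(chunks, ddof=0):
    """Mean and variance over the rows of all chunks (hypothetical helper)."""
    n, mean, m2 = reduce(combine, (chunk_stats(c) for c in chunks))
    return mean, m2 / (n - ddof)


# Data with a large offset, where E[x^2] - E[x]^2 tends to lose precision.
rng = np.random.default_rng(0)
x = 1e8 + rng.normal(size=(1000, 3))
mean, var = mean_var_chunked(np.array_split(x, 7))
```

Because the merge only ever works with deviations from a mean, it avoids subtracting two large, nearly equal numbers, which is the usual source of instability in the sum-of-squares formulation.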