perf: "two-pass" seurat hvg via scanpy.get.aggregate #4013
Codecov Report❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #4013 +/- ##
=======================================
Coverage ? 79.73%
=======================================
Files ? 120
Lines ? 12852
Branches ? 0
=======================================
Hits ? 10248
Misses ? 2604
Partials ? 0
Benchmark changes
Warning: Some benchmarks failed
Comparison: https://github.com/scverse/scanpy/compare/d96d91de3162f29d901194ac56fd732459389784..added47416e86a6412a651f0ddad9e675491d977
More details: https://github.com/scverse/scanpy/pull/4013/checks?check_run_id=83417211191
The old seurat_v3 (on |
flying-sheep left a comment
Looks very straightforward, nice idea!
Co-authored-by: Philipp A. <flying-sheep@web.de>
Thanks for the catch, @flying-sheep!
…scanpy.get.aggregate`) (#4186)
Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>
Co-authored-by: Zach Boldyga <zboldyga@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Philipp A. <flying-sheep@web.de>
An idea that popped into my head for disk-bound datasets, though it likely helps in-memory ones too. In theory, this should greatly improve on-disk access and speed up disk-bound data by reducing the amount of I/O in the worst-case, unordered scenario. I would guess it leaves in-memory datasets untouched, or maybe slightly improved thanks to better memory access and a more efficient mean/var computation.
Dependent on #4143
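
To illustrate the I/O argument above, here is a minimal NumPy sketch of computing per-group mean and variance by reading rows sequentially instead of pulling out each group's scattered rows. It is not the code in this PR and does not call `scanpy.get.aggregate`; the function name `grouped_mean_var`, its parameters, and the chunking scheme are hypothetical, and it assumes the cost being avoided comes from unordered row access when group labels are interleaved.

```python
import numpy as np


def grouped_mean_var(X, labels, *, chunk_size=1000, dof=1):
    """Per-group mean/variance of columns, reading rows in order.

    Hypothetical helper for illustration only. Rows are visited in
    contiguous chunks, so an on-disk array would be read sequentially
    even when the group labels are interleaved.
    """
    labels = np.asarray(labels)
    groups, codes = np.unique(labels, return_inverse=True)
    n_groups, n_vars = len(groups), X.shape[1]

    counts = np.bincount(codes, minlength=n_groups).astype(np.float64)
    sums = np.zeros((n_groups, n_vars))
    sq_sums = np.zeros((n_groups, n_vars))

    for start in range(0, X.shape[0], chunk_size):
        chunk = np.asarray(X[start : start + chunk_size], dtype=np.float64)
        chunk_codes = codes[start : start + chunk_size]
        # Unbuffered scatter-add: rows sharing a group accumulate correctly.
        np.add.at(sums, chunk_codes, chunk)
        np.add.at(sq_sums, chunk_codes, chunk**2)

    means = sums / counts[:, None]
    # Var = E[x^2] - E[x]^2, rescaled by n / (n - dof).
    # Assumes every group has more than `dof` rows.
    scale = (counts / (counts - dof))[:, None]
    variances = (sq_sums / counts[:, None] - means**2) * scale
    return groups, means, variances


rng = np.random.default_rng(0)
X = rng.poisson(2.0, size=(12, 5)).astype(float)
labels = np.array(list("abcabcabcabc"))  # interleaved ("unordered") groups
groups, means, variances = grouped_mean_var(X, labels, chunk_size=5)
```

The sum-of-squares formula keeps the sketch short but is less numerically stable than a centered computation, so treat it as a picture of the access pattern rather than a recommended estimator.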