Question: Best practices to limit CPU usage? #437

I have started using CytoTable to process some data from the Cell Painting Gallery, and I am wrapping it as part of a Nextflow workflow. I would like to limit the CPU usage of `cytotable.convert()` while keeping it efficient, and I am not clear on the best way to do that.

I can limit the number of `parsl` processes/threads with its config, but `duckdb` and `pyarrow` each seem to have their own CPU discovery that defaults to the full set of available cores, so I still end up with quite a large number of threads when using that setting alone. It seems like I should also be able to set `CYTOTABLE_MAX_THREADS`, which would limit `duckdb` but perhaps not `pyarrow`? I think `pyarrow` can also be limited on its own with `set_cpu_count()` and `set_io_thread_count()`.
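For concreteness, here is a minimal sketch of the two non-`parsl` limits described above. The helper name `apply_thread_limits` is hypothetical, and it is only an assumption here that CytoTable reads `CYTOTABLE_MAX_THREADS` from the environment when `cytotable.convert()` runs. `pyarrow.set_cpu_count()` and `pyarrow.set_io_thread_count()` are existing `pyarrow` functions; the import is guarded so the sketch also runs where `pyarrow` is not installed.

```python
import os


def apply_thread_limits(max_threads: int) -> dict:
    """Apply one thread cap to the environment and, if available, to pyarrow.

    Hypothetical helper for illustration; not part of CytoTable.
    """
    if max_threads < 1:
        raise ValueError("max_threads must be at least 1")

    applied = {}

    # Assumption: CytoTable picks this variable up and uses it to cap duckdb.
    os.environ["CYTOTABLE_MAX_THREADS"] = str(max_threads)
    applied["CYTOTABLE_MAX_THREADS"] = max_threads

    try:
        import pyarrow as pa
    except ImportError:
        pa = None

    if pa is not None:
        # Cap pyarrow's CPU-bound thread pool and its I/O thread pool.
        pa.set_cpu_count(max_threads)
        pa.set_io_thread_count(max_threads)
        applied["pyarrow_cpu_count"] = pa.cpu_count()
        applied["pyarrow_io_thread_count"] = pa.io_thread_count()

    return applied


if __name__ == "__main__":
    print(apply_thread_limits(4))
```

Calling this before `cytotable.convert()` is the intended order, since both limits need to be in place before the work starts.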

So there seem to be a number of possible places to limit CPU use, and what I am trying to work out is how best to balance them. Do you have any recommendations for balancing the CPU allocation among `parsl`, `duckdb`, and `pyarrow`?
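To make the balancing question concrete, the sketch below shows one naive way to split a fixed CPU budget so that the number of `parsl` workers multiplied by the threads each worker may use never exceeds the total. The function name `split_cpu_budget` and the even-split heuristic are assumptions for illustration only, not a CytoTable recommendation.

```python
def split_cpu_budget(total_cpus: int, parsl_workers: int) -> dict:
    """Split a CPU budget evenly between parsl workers and per-worker threads.

    Hypothetical heuristic: workers * threads_per_worker <= total_cpus.
    """
    if total_cpus < 1 or parsl_workers < 1:
        raise ValueError("total_cpus and parsl_workers must be at least 1")

    # Never run more workers than there are CPUs in the budget.
    workers = min(parsl_workers, total_cpus)

    # Give each worker an equal share of the remaining parallelism.
    threads_per_worker = max(1, total_cpus // workers)

    return {"parsl_workers": workers, "threads_per_worker": threads_per_worker}


if __name__ == "__main__":
    print(split_cpu_budget(total_cpus=8, parsl_workers=4))
```

Whether an even split like this is actually the most efficient allocation for `cytotable.convert()` is exactly what this question is asking.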

Secondarily, for a system running within a single Docker container for import, do you have a recommendation between `HighThroughputExecutor` and `ThreadPoolExecutor`? It seems from the paper that multithreading is generally a bit more efficient, but I wanted to confirm that was what you would recommend.
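For reference, a thread-based setup of the kind mentioned above might look like the following sketch. `Config` and `ThreadPoolExecutor` (with its `label` and `max_threads` arguments) are existing `parsl` classes; the helper name `build_thread_config` and the label string are hypothetical, and passing the result to `cytotable.convert()` through a `parsl_config` argument is an assumption that should be checked against the CytoTable documentation.

```python
def build_thread_config(max_threads: int):
    """Build a parsl Config that runs tasks in a bounded thread pool.

    Hypothetical helper for illustration; parsl is imported lazily so the
    function can be defined even where parsl is not installed.
    """
    from parsl.config import Config
    from parsl.executors import ThreadPoolExecutor

    return Config(
        executors=[
            ThreadPoolExecutor(
                label="cytotable_threads",  # hypothetical label
                max_threads=max_threads,    # upper bound on concurrent parsl tasks
            )
        ]
    )


# Assumed usage (not run here):
#   config = build_thread_config(4)
#   cytotable.convert(..., parsl_config=config)
```

This only bounds the number of concurrent `parsl` tasks; any threads that `duckdb` or `pyarrow` start inside each task would still need their own limits.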

Thanks for your help!




