dev_cluster: Add --cpuset-stride flag #29843
Always pass unrolled cpusets, i.e. 0,1,2,3 instead of 0-3. This is preparation for "smarter" cpuset assignment logic.
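To illustrate what "unrolled" means here, the following is a minimal sketch that builds an explicit CPU list instead of a range. The helper name `unrolled_cpuset` is hypothetical and is not taken from tools/dev_cluster.py.

```python
def unrolled_cpuset(first_cpu: int, count: int) -> str:
    """Return an explicit comma-separated CPU list, e.g. "0,1,2,3" rather than "0-3"."""
    return ",".join(str(cpu) for cpu in range(first_cpu, first_cpu + count))


if __name__ == "__main__":
    print(unrolled_cpuset(0, 4))  # explicit list for CPUs 0 through 3
    print(unrolled_cpuset(4, 4))  # explicit list for CPUs 4 through 7
```

An explicit list can express non-contiguous CPU IDs, which a single start-end range cannot.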
Pull request overview
Adds an optional CPU interleaving strategy to tools/dev_cluster.py so generated --cpuset values can be spaced by a user-provided stride, enabling SMT-aware placements for dev clusters.
Changes:
- Introduces --cpuset-stride CLI flag (default 1) to control spacing between allocated CPU IDs.
- Adds cpuset_cpu(...) and updates cpuset generation to emit an explicit CPU list rather than a contiguous range.
- Threads the new stride setting through Redpanda node startup.
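As a rough sketch of how such a flag could be declared with argparse: only the flag name and its default of 1 come from the overview above. The parser description, the help text, and the `build_parser` helper are assumptions for illustration and may differ from the actual script.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical subset of the dev_cluster options; only --cpuset-stride is shown.
    parser = argparse.ArgumentParser(description="dev cluster options (sketch)")
    parser.add_argument(
        "--cpuset-stride",
        type=int,
        default=1,  # a stride of 1 keeps the contiguous assignment
        help="spacing between CPU IDs assigned to consecutive cores",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--cpuset-stride", "2"])
    print(args.cpuset_stride)
```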
def cpuset_cpu(
    cpu_count: int, stride: int, cores: int, node_index: int, core_index: int
We use both "cpu" and "core" here; is there a difference I should be aware of?
One is the actual hardware core count, the other is --smp. Let me clarify.
Argh, actually this is more complicated. psutil cpu thing also excludes offline cpus.
Make dev_cluster cpuset generation optionally use a stride.
This is useful for SMT systems or SMT specific testing:
- --cpuset-stride=2 on a 32 core system will generate 0,2, 4,6 and 8,10
- --cpuset-stride=16 on a 32 core system will generate 0,16, 1,17 and 2,18
Both are valid scenarios depending on the SMT sibling core assignment id of the system.
Note flag validation is not very detailed as it's generally hard to validate the different valid scenarios (e.g.: as per above).
b841ee1 to c7e5e1e
Using nproc --all now to get the count.
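For context, `nproc --all` reports installed processors rather than only the ones currently available. A minimal sketch of reading it from Python could look like the following. The `total_cpu_count` helper and its fallback to `os.cpu_count()` are assumptions for illustration, not the code in this PR.

```python
import os
import subprocess


def total_cpu_count() -> int:
    """Return the installed CPU count via `nproc --all`, with a fallback."""
    try:
        result = subprocess.run(
            ["nproc", "--all"], capture_output=True, text=True, check=True
        )
        return int(result.stdout.strip())
    except (OSError, subprocess.CalledProcessError, ValueError):
        # Fallback when nproc is unavailable; this may not count offline CPUs.
        return os.cpu_count() or 1


if __name__ == "__main__":
    print(total_cpu_count())
```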
How should stride work with offline CPUs? Should it include them (treat them as if online, effectively)?
I think it should just ignore them (this is what's implemented) and leave all responsibility with the user, as otherwise the logic would get even more complex.
Make dev_cluster cpuset generation optionally use a stride.
This is useful for SMT systems or SMT specific testing:
- --cpuset-stride=2 on a 32 core system will generate 0,2, 4,6 and 8,10
- --cpuset-stride=16 on a 32 core system will generate 0,16, 1,17 and 2,18
Both are valid scenarios depending on the SMT sibling core assignment
id of the system.
Note flag validation is not very detailed as it's generally hard to
validate the different valid scenarios (e.g.: as per above).
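The two examples above are consistent with a mapping that steps through CPU IDs by the stride and shifts by one on each wrap-around. The sketch below reproduces those examples under that assumption. It reuses the parameter names from the `cpuset_cpu` signature in the diff, but the body is a guess, and `cpuset_cpu_sketch` and `node_cpuset` are hypothetical names rather than the PR's implementation.

```python
def cpuset_cpu_sketch(
    cpu_count: int, stride: int, cores: int, node_index: int, core_index: int
) -> int:
    """Map (node, core) to a CPU ID; assumed logic, not the actual PR code."""
    slot = node_index * cores + core_index  # position across all nodes
    offset = slot * stride
    # Step by `stride`; on wrapping past cpu_count, shift by the wrap count.
    return offset % cpu_count + offset // cpu_count


def node_cpuset(cpu_count: int, stride: int, cores: int, node_index: int) -> str:
    """Build the unrolled cpuset string for one node."""
    return ",".join(
        str(cpuset_cpu_sketch(cpu_count, stride, cores, node_index, core_index))
        for core_index in range(cores)
    )


if __name__ == "__main__":
    for stride in (2, 16):
        print(stride, [node_cpuset(32, stride, 2, node) for node in range(3)])
```

In this sketch, every slot below cpu_count maps to a distinct CPU when the stride divides cpu_count evenly. Other combinations are not validated, which matches the note above that responsibility is left with the user.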
Backports Required
Release Notes