
[AWS/EKS] Tune VPC CNI warm pool for kubernetes_node_scale in EksKarpenterCluster #6557

Open

kiryl-filatau wants to merge 4 commits into GoogleCloudPlatform:master from kiryl-filatau:aws-5k-fix

Conversation


@kiryl-filatau (Collaborator) commented Mar 25, 2026

NOTE: should be merged only after PR#6512 is merged.

What

In EksKarpenterCluster._PostCreate, when the benchmark is
kubernetes_node_scale, tune the VPC CNI warm-pool settings on the
aws-node DaemonSet in kube-system and wait for the rollout to
complete before the benchmark run starts.
Settings applied:

  • WARM_ENI_TARGET=0
  • WARM_IP_TARGET=1
  • MINIMUM_IP_TARGET=1
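Applied by hand, the step is roughly equivalent to the following sketch. This is not the PR's actual code (PKB drives kubectl through its own wrappers); the helper names here are hypothetical, but the kubectl invocations match what the description says happens: set the three env vars on the aws-node DaemonSet in kube-system, then block on the rollout.

```python
# Sketch of the tuning step in plain Python (hypothetical helper names;
# the real change goes through PKB's kubectl wrappers, not subprocess).
import subprocess

CNI_ENV = {
    'WARM_ENI_TARGET': '0',
    'WARM_IP_TARGET': '1',
    'MINIMUM_IP_TARGET': '1',
}


def build_tune_commands():
    """Return the kubectl invocations that apply the warm-pool settings."""
    set_env = (
        ['kubectl', '-n', 'kube-system', 'set', 'env', 'daemonset/aws-node']
        + ['%s=%s' % (k, v) for k, v in sorted(CNI_ENV.items())]
    )
    # Block until every aws-node pod has restarted with the new env vars,
    # so the benchmark does not start mid-rollout.
    rollout = [
        'kubectl', '-n', 'kube-system', 'rollout', 'status',
        'daemonset/aws-node', '--timeout=10m',
    ]
    return [set_env, rollout]


def apply_tuning():
    for cmd in build_tune_commands():
        subprocess.run(cmd, check=True)
```

The `--timeout=10m` value is an assumption for the sketch; the PR's actual rollout wait may differ.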

Why

By default, the AWS VPC CNI pre-allocates a warm pool of ENIs and
secondary IPs on each node as soon as it joins the cluster. The number
of IPs reserved scales with the instance type — larger instances have
more ENI slots and more IPs per ENI, so each node can reserve 10–30+
IPs before a single pod is scheduled.
At 5k-node scale this becomes a hard blocker: the cumulative IP
pre-allocation across all nodes exhausts the subnet address space
before all nodes can be scheduled, causing the benchmark to fail with
InsufficientCapacityError and FailedScheduling events.
Setting WARM_ENI_TARGET=0, WARM_IP_TARGET=1, MINIMUM_IP_TARGET=1
instructs the CNI to keep only 1 IP warm per node instead of a full
ENI's worth, which is sufficient for the kubernetes_node_scale
workload (one pod per node). This is not a performance optimization —
it is a prerequisite for the benchmark to complete successfully at
this scale.
See the VPC CNI documentation on WARM_IP_TARGET and MINIMUM_IP_TARGET for reference, and the overview linked in the review thread for a practical explanation.
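A back-of-the-envelope calculation makes the exhaustion argument concrete. The per-instance figures below are illustrative only (an m5.large-class node with ~10 IPv4 addresses per ENI); real ENI/IP limits vary by instance type, which is exactly why larger instances reserve 10–30+ IPs each.

```python
# Illustrative arithmetic: why the default warm pool exhausts a subnet at
# 5k nodes, and why ~1 warm IP per node does not.
NODES = 5000
IPS_PER_ENI = 10                 # primary + secondary IPs on one ENI (illustrative)
SUBNET_CAPACITY = 2 ** 16 - 5    # a /16 subnet, minus AWS's 5 reserved addresses

# Default: WARM_ENI_TARGET=1 keeps a full spare ENI warm, so an idle node
# holds roughly two ENIs' worth of IPs (primary ENI + warm ENI).
default_demand = NODES * (2 * IPS_PER_ENI)

# Tuned: WARM_IP_TARGET=1 / MINIMUM_IP_TARGET=1 keeps ~1 warm IP per node
# on top of the node's primary IP — enough for one pod per node.
tuned_demand = NODES * 2
```

Under these assumptions the default demand (100,000 IPs) overruns the /16 (65,531 usable) well before all 5k nodes join, while the tuned demand (10,000 IPs) fits comfortably.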

Scope

The tuning block is guarded by 'kubernetes_node_scale' in FLAGS.benchmarks, so it is a no-op for all other benchmarks. No
existing behaviour is changed outside that gate.

Testing

Validated with two back-to-back 5k-node runs on EKS + Karpenter in
us-east-1. Both runs completed with status
SUCCEEDED.

Usage

python pkb.py \
--benchmarks=kubernetes_node_scale \
--eks_tune_vpc_cni_for_scale=True \
....

@hubatish (Collaborator)

I'm wondering what the heck this even does. I found:
https://medium.com/@GiteshWadhwa/optimizing-kubernetes-networking-understanding-warm-eni-target-warm-ip-target-and-14e74096b067

Which seems like a reasonable explanation & discusses IP addresses. Could you provide a link in your description?

@hubatish (Collaborator)

Second question:
We often like to run a somewhat naive set of benchmarks without optimizations. Is this such a premature optimization? Or is it justified/necessary, either because the benchmark fails without it or because this is indeed purely a networking thing? I know for AKS/GKE we set networking values with cidr during cluster creation / prior to scaling, so maybe even with this optimization EKS is still doing more work while scaling?

def _PostCreate(self):
  """Performs post-creation steps for the cluster."""
  super()._PostCreate()
  if 'kubernetes_node_scale' in FLAGS.benchmarks:
Collaborator


It is a bit more complex, but please add a flag in providers/aws/flags & reference it from here. For a few reasons:

  1. We want resources to be unaware of benchmarks. Rather than the resource knowing about a benchmark, the benchmark (or just the user calling the benchmark & setting flag values) should tell the resource (via flag) what it wants it to do.
  2. We might want to enable this in other benchmarks - like scale -> 1k or 5k pods would probably also benefit from this right?

Collaborator Author


Done, added --eks_tune_vpc_cni_for_scale to providers/aws/flags.py and replaced the benchmark check with it. Thanks for the suggestion!
