Support multi-EFA instances with public IPs#3865

Merged
r4victor merged 4 commits into master from pr_efa_public_ips
May 8, 2026

@r4victor (Collaborator) commented May 8, 2026

Support launching AWS instances with multiple EFA interfaces and public IPs. Previously, multi-EFA instances required public_ips: false because AWS cannot automatically assign a public IP to an instance that has multiple network interfaces. This PR drops that limitation by explicitly allocating, assigning, and releasing public IPs instead of relying on IP auto-assign.

Tested launching p4d.24xlarge in eu-north-1: all EFA interfaces were configured and a public IP was assigned. Also tested the same setup with public_ips: false to check for regressions.
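
The explicit allocate/assign/release flow described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function names and the returned dictionary are hypothetical, and it assumes the public IP is an Elastic IP attached to the primary network interface (network card 0, device index 0). The `ec2` argument is expected to behave like a boto3 EC2 client; only `allocate_address`, `associate_address`, `disassociate_address`, and `release_address` are called on it.

```python
from typing import Any, Dict, Optional


def find_primary_interface_id(instance: Dict[str, Any]) -> Optional[str]:
    """Return the ID of the primary interface (network card 0, device 0).

    `instance` is one entry of Reservations[].Instances[] as returned by
    the EC2 DescribeInstances API.
    """
    for nic in instance.get("NetworkInterfaces", []):
        attachment = nic.get("Attachment", {})
        if (
            attachment.get("DeviceIndex") == 0
            and attachment.get("NetworkCardIndex", 0) == 0
        ):
            return nic["NetworkInterfaceId"]
    return None


def assign_public_ip(ec2: Any, instance: Dict[str, Any]) -> Dict[str, str]:
    """Allocate an Elastic IP and associate it with the primary interface."""
    eni_id = find_primary_interface_id(instance)
    if eni_id is None:
        raise ValueError("instance has no primary network interface")
    allocation = ec2.allocate_address(Domain="vpc")
    try:
        association = ec2.associate_address(
            AllocationId=allocation["AllocationId"],
            NetworkInterfaceId=eni_id,
        )
    except Exception:
        # Do not leak the address if the association fails.
        ec2.release_address(AllocationId=allocation["AllocationId"])
        raise
    return {
        "public_ip": allocation["PublicIp"],
        "allocation_id": allocation["AllocationId"],
        "association_id": association["AssociationId"],
    }


def release_public_ip(
    ec2: Any, allocation_id: str, association_id: Optional[str] = None
) -> None:
    """Detach the Elastic IP (if still associated) and release it."""
    if association_id is not None:
        ec2.disassociate_address(AssociationId=association_id)
    ec2.release_address(AllocationId=allocation_id)
```

With a real client, `ec2` would typically come from `boto3.client("ec2", region_name="eu-north-1")` and `instance` from a `describe_instances` response. Releasing on termination matters because an allocated Elastic IP stays in the account until it is released explicitly, unlike an auto-assigned address.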

@r4victor (Collaborator, Author) commented May 8, 2026

Ran NCCL tests on 2x p4d.24xlarge with public IPs. The results are the same as with public_ips: false.

           8             2     float     sum      -1   182.04    0.00    0.00       0   181.35    0.00    0.00       0
          16             4     float     sum      -1   179.80    0.00    0.00       0   176.53    0.00    0.00       0
          32             8     float     sum      -1   176.78    0.00    0.00       0   176.18    0.00    0.00       0
          64            16     float     sum      -1   176.98    0.00    0.00       0   175.23    0.00    0.00       0
         128            32     float     sum      -1   176.00    0.00    0.00       0   180.10    0.00    0.00       0
         256            64     float     sum      -1   176.22    0.00    0.00       0   178.06    0.00    0.00       0
         512           128     float     sum      -1   180.12    0.00    0.01       0   179.21    0.00    0.01       0
        1024           256     float     sum      -1   177.83    0.01    0.01       0   178.19    0.01    0.01       0
        2048           512     float     sum      -1   183.32    0.01    0.02       0   183.40    0.01    0.02       0
        4096          1024     float     sum      -1   187.05    0.02    0.04       0   182.93    0.02    0.04       0
        8192          2048     float     sum      -1   188.79    0.04    0.08       0   189.22    0.04    0.08       0
       16384          4096     float     sum      -1   202.46    0.08    0.15       0   200.69    0.08    0.15       0
       32768          8192     float     sum      -1   231.63    0.14    0.27       0   230.66    0.14    0.27       0
       65536         16384     float     sum      -1   239.24    0.27    0.51       0   234.09    0.28    0.52       0
      131072         32768     float     sum      -1   237.73    0.55    1.03       0   238.19    0.55    1.03       0
      262144         65536     float     sum      -1   253.59    1.03    1.94       0   254.91    1.03    1.93       0
      524288        131072     float     sum      -1   308.22    1.70    3.19       0   314.00    1.67    3.13       0
     1048576        262144     float     sum      -1   402.68    2.60    4.88       0   404.50    2.59    4.86       0
     2097152        524288     float     sum      -1   583.80    3.59    6.74       0   583.21    3.60    6.74       0
     4194304       1048576     float     sum      -1   859.03    4.88    9.15       0   863.63    4.86    9.11       0
     8388608       2097152     float     sum      -1   987.56    8.49   15.93       0   982.72    8.54   16.01       0
    16777216       4194304     float     sum      -1  1180.87   14.21   26.64       0  1181.65   14.20   26.62       0
    33554432       8388608     float     sum      -1  1514.40   22.16   41.54       0  1523.52   22.02   41.30       0
    67108864      16777216     float     sum      -1  2362.20   28.41   53.27       0  2347.93   28.58   53.59       0
   134217728      33554432     float     sum      -1  3995.74   33.59   62.98       0  4014.00   33.44   62.70       0
   268435456      67108864     float     sum      -1  7172.96   37.42   70.17       0  7125.56   37.67   70.64       0
   536870912     134217728     float     sum      -1  13368.9   40.16   75.30       0  13333.1   40.27   75.50       0
  1073741824     268435456     float     sum      -1  25979.1   41.33   77.50       0  25928.6   41.41   77.65       0
  2147483648     536870912     float     sum      -1  50919.0   42.17   79.08       0  50898.8   42.19   79.11       0
  4294967296    1073741824     float     sum      -1   101120   42.47   79.64       0   101064   42.50   79.68       0
  8589934592    2147483648     float     sum      -1   201525   42.62   79.92       0   201274   42.68   80.02       0
ip-172-31-5-190:163:274 [0] NCCL INFO comm 0x5980b4c3e8b0 rank 0 nranks 16 cudaDev 0 busId 101c0 - Destroy COMPLETE
# Out of bounds values : 0 OK
# Avg bus bandwidth    : 22.2693
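
For reading the output above: the comment does not name the collective or include the column header, but the layout matches the nccl-tests format (size in bytes, element count, type, reduction op, root, then time in microseconds, algorithm bandwidth and bus bandwidth in GB/s, and error count, first out-of-place and then in-place). The ratio between the two bandwidth columns is consistent with an all-reduce over the 16 ranks shown in the log line. Under that assumption, the relationship between the columns can be sketched as below; the function names are illustrative only.

```python
def algorithm_bandwidth(size_bytes: int, time_us: float) -> float:
    """Algorithm bandwidth in GB/s: bytes moved divided by elapsed time."""
    if time_us <= 0:
        raise ValueError("time_us must be positive")
    return size_bytes / (time_us * 1e-6) / 1e9


def allreduce_bus_bandwidth(algbw_gbps: float, nranks: int) -> float:
    """Bus bandwidth for all-reduce as defined by nccl-tests.

    busbw = algbw * 2 * (n - 1) / n, where n is the number of ranks.
    Assumes the benchmark was an all-reduce; other collectives use
    different correction factors.
    """
    if nranks < 1:
        raise ValueError("nranks must be at least 1")
    return algbw_gbps * 2 * (nranks - 1) / nranks


# Last out-of-place row of the table: 8589934592 bytes in 201525 us.
algbw = algorithm_bandwidth(8589934592, 201525)
busbw = allreduce_bus_bandwidth(algbw, nranks=16)
print(f"algbw={algbw:.2f} GB/s busbw={busbw:.2f} GB/s")
```

For the last row this reproduces the reported 42.62 and 79.92 GB/s to within rounding. The small-message rows are latency-bound, which is why the average bus bandwidth over all sizes (22.2693) is far below the large-message plateau of about 80 GB/s.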

@r4victor r4victor merged commit e86c432 into master May 8, 2026
25 checks passed
@r4victor r4victor deleted the pr_efa_public_ips branch May 8, 2026 11:11