Cherry pick from release-3.15 to develop #7415

Open
hanwen-cluster wants to merge 9 commits into aws:develop from hanwen-cluster:develop

@hanwen-cluster hanwen-cluster commented May 28, 2026

Checklist

  • Make sure you are pointing to the right branch.
  • If you're creating a patch for a branch other than develop add the branch name as prefix in the PR title (e.g. [release-3.6]).
  • Check all commits' messages are clear, describing what and why vs how.
  • Make sure to have added unit tests or integration tests to cover the new/modified code.
  • Check if documentation is impacted by this change.

Please review the guidelines for contributing and Pull Request Instructions.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

hanwen-cluster and others added 9 commits May 28, 2026 11:50
The tags are not propagated to root volumes in us-iso regions
…utils

The SSM automation test pipeline deletes tests/performance_tests before running tests, breaking nccl_common.py's top-level import from that package.

Moreover, the function is a generic DynamoDB reporting utility with no performance-test-specific logic, so tests/common/utils is a more appropriate home regardless.
In us-iso* and us-isob* regions, CloudFormation has a behavior where
an UpdateStack call that includes both a Tags change and a resource
whose change is only in Metadata does not update that resource. This
breaks the head node update flow: cfn-hup on the head node polls
DescribeStackResource for Metadata changes on HeadNodeLaunchTemplate,
never sees them when tags are updated in the same call, and the
HeadNodeWaitCondition times out after 30 minutes.

Until the CloudFormation behavior is addressed, block tag updates at
the validation layer in ADC regions so the failure mode surfaces
immediately.

Commercial, GovCloud, and China regions are unaffected.

Policy changes:
- Add UpdatePolicy.SUPPORTED_UNLESS_ADC, equivalent to SUPPORTED
  outside ADC and UNSUPPORTED inside ADC.
- Cover the new policy with parametrized unit tests across commercial,
  GovCloud, China, us-iso, us-isob, and empty region values.
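
A minimal sketch of the behavior described by SUPPORTED_UNLESS_ADC, not the code from this PR. The names `ADC_REGION_PREFIX`, `is_adc_region`, and `tag_update_allowed` are hypothetical, and treating an empty or missing region as non-ADC is an assumption.

```python
from typing import Optional

# Assumption: ADC regions are identified by the "us-iso" prefix,
# which also matches "us-isob-*" region names.
ADC_REGION_PREFIX = "us-iso"


def is_adc_region(region: Optional[str]) -> bool:
    """Return True for us-iso* / us-isob* regions.

    Assumption: an empty or missing region is treated as non-ADC.
    """
    return bool(region) and region.startswith(ADC_REGION_PREFIX)


def tag_update_allowed(region: Optional[str]) -> bool:
    """Mirror SUPPORTED_UNLESS_ADC: supported outside ADC, unsupported inside."""
    return not is_adc_region(region)
```

With this sketch, `tag_update_allowed("us-east-1")` is true while `tag_update_allowed("us-isob-east-1")` is false, so a tag change would be rejected at validation time instead of surfacing later as a wait condition timeout.
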
Previously, OS rotation was seeded by day-of-year, meaning multiple builds
on the same day would always test the same OS. This limited coverage when
running tests more than once per day.

Now, when --global-build-number is provided and non-zero, it is used as the
rotation seed instead. Each build picks a different OS, enabling full
coverage across consecutive pipeline runs. Falls back to day-based rotation
when global-build-number is None or 0.
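
The seed selection described above could look roughly like the sketch below. The function name `select_os`, its parameters, and the modulo-based indexing are assumptions for illustration; the PR's actual rotation code is not shown here.

```python
from datetime import date
from typing import Optional, Sequence


def select_os(
    os_list: Sequence[str],
    global_build_number: Optional[int] = None,
    today: Optional[date] = None,
) -> str:
    """Pick one OS from os_list using a rotation seed.

    The build number is the seed when it is provided and non-zero;
    otherwise the day of the year is used (day-based fallback).
    """
    if global_build_number:
        seed = global_build_number
    else:
        seed = (today or date.today()).timetuple().tm_yday
    # Assumption: rotation is a simple modulo over the candidate list.
    return os_list[seed % len(os_list)]
```

Under this assumption, consecutive build numbers map to consecutive list entries, so each OS is visited once every `len(os_list)` builds, while a build number of `None` or `0` keeps the earlier once-per-day rotation.
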
@hanwen-cluster hanwen-cluster requested review from a team as code owners May 28, 2026 20:47
{"Name": "HeadNode", "parallelcluster:node-type": "HeadNode"},
),
"tag_getter_kwargs": {"cluster": cluster, "os": os},
"skip": "us-iso" in cluster.region,
Can you add a comment / TODO for this commit so that we remember to revert it once tags are supported in us-iso regions?

return any(path.startswith("SlurmQueues[") for path in change.path)


def _is_adc_region():
Can you add a comment / TODO for this commit so that we remember to revert it once tags are supported in us-iso regions?

@himani2411 himani2411 May 28, 2026

Also, let's add a TBR or some kind of tag in the commit message so that we can track which commit needs to be reverted in the future.

2 participants