Replace fixed sleep with IAM trust policy validation in example_emr_eks#66736
Merged
o-nikolas merged 1 commit intoMay 20, 2026
Merged
Conversation
9586255 to
a14ab73
Compare
Contributor
|
Bad rebase |
a14ab73 to
f31385e
Compare
Contributor
Author
|
Rebased |
ferruzzi
reviewed
May 13, 2026
f31385e to
20f6fe5
Compare
vincbeck
reviewed
May 13, 2026
o-nikolas
reviewed
May 15, 2026
Member
|
@seanghaeli — There are 6 unresolved review thread(s) on this PR from @ferruzzi, @o-nikolas, @vincbeck. Could you either push a fix or reply in each thread explaining why the feedback doesn't apply? Once you believe the feedback is addressed, mark the thread as resolved so the reviewer isn't re-pinged needlessly. Thanks! Note: This comment was drafted by an AI-assisted triage tool and may contain mistakes. Once you have addressed the points above, an Apache Airflow maintainer — a real person — will take the next look at your PR. We use this two-stage triage process so that our maintainers' limited time is spent where it matters most: the conversation with you. |
20f6fe5 to
6f33051
Compare
…e_emr_eks The system test was failing intermittently because the fixed 60-second sleep after updating the IAM trust policy was insufficient — AWS IAM OIDC-based trust policy propagation can take 2-5+ minutes. Replace the sleep with a new `wait_for_trust_policy_propagation` task that uses exponential backoff (5s-30s intervals, up to 5 min) to: 1. Verify the trust policy document contains the expected OIDC provider 2. Confirm IAM's SimulatePrincipalPolicy returns "allowed" for sts:AssumeRoleWithWebIdentity This adapts to actual propagation time (fast when IAM is quick, patient when it's slow) and provides observability via logging at each retry. Also adds --retry 3 --retry-delay 5 to the eksctl curl download to handle transient GitHub network failures.
6f33051 to
98e868a
Compare
o-nikolas
approved these changes
May 19, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
time.sleep(60)after IAM trust policy update with an active validation task that confirms propagation using exponential backoffwait_for_trust_policy_propagationtask usesiam:GetRole+iam:SimulatePrincipalPolicyto verify the trust policy is consistent before proceeding--retry 3 --retry-delay 5to the eksctl curl download for transient network resilienceMotivation
The
example_emr_ekssystem test was failing intermittently because IAM OIDC-based trust policy propagation can take 2-5+ minutes. The fixed sleep was either too short (causing auth failures) or unnecessarily long (wasting CI time).Test plan
Was generative AI tooling used to co-author this PR?
Generated-by: Claude Code (Opus 4.6) following the guidelines