Replace fixed sleep with IAM trust policy validation in example_emr_eks #66736

Merged: o-nikolas merged 1 commit into apache:main from aws-mwaa:ghaeli/emr-eks-trust-policy-validation on May 20, 2026

@seanghaeli
Contributor

Summary

  • Replace the fixed `time.sleep(60)` after the IAM trust policy update with an active validation task that polls with exponential backoff until propagation is confirmed
  • The new `wait_for_trust_policy_propagation` task uses `iam:GetRole` + `iam:SimulatePrincipalPolicy` to verify the trust policy is consistent before proceeding
  • Add `--retry 3 --retry-delay 5` to the eksctl curl download for resilience against transient network failures
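
For readers unfamiliar with the pattern, here is a minimal sketch of polling with exponential backoff in place of a fixed sleep. It is not the code merged in this PR: the helper name `wait_until` and its parameters are hypothetical, and the defaults only mirror the 5s-30s intervals and 5 minute cap described in the commit message. The condition is passed in as a callable, so the sketch makes no AWS calls itself.

```python
import time
from typing import Callable


def wait_until(
    check: Callable[[], bool],
    initial_delay: float = 5.0,
    max_delay: float = 30.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll check() with exponential backoff and return the number of attempts.

    Hypothetical helper for illustration only. Elapsed time is tracked as the
    sum of the pauses, which keeps the function deterministic and easy to test.
    """
    delay = initial_delay
    waited = 0.0
    attempts = 0
    while True:
        attempts += 1
        if check():
            return attempts
        if waited >= timeout:
            raise TimeoutError(
                f"condition not met after {waited:.0f}s and {attempts} attempts"
            )
        # Never sleep past the overall timeout.
        pause = min(delay, timeout - waited)
        sleep(pause)
        waited += pause
        # Double the delay after each failed attempt, capped at max_delay.
        delay = min(delay * 2, max_delay)
```

In a real task, `check` would wrap whatever IAM calls decide that propagation is complete. Injecting `sleep` lets a unit test record the pauses instead of actually waiting.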

Motivation

The example_emr_eks system test was failing intermittently because IAM OIDC-based trust policy propagation can take 2-5+ minutes. The fixed sleep was either too short (causing auth failures) or unnecessarily long (wasting CI time).

Test plan

  • Verify DAG parses without errors
  • Run system test end-to-end — confirm the validation task correctly detects propagation and proceeds
  • Confirm transient failure rate decreases over multiple runs

Was generative AI tooling used to co-author this PR?
  • Yes — Claude Code (Opus 4.6)

Generated-by: Claude Code (Opus 4.6) following the guidelines

Comment thread airflow-core/src/airflow/utils/log/callback_log_reader.py Fixed
@seanghaeli seanghaeli force-pushed the ghaeli/emr-eks-trust-policy-validation branch from 9586255 to a14ab73 Compare May 12, 2026 05:37
@vincbeck
Contributor

Bad rebase

@seanghaeli seanghaeli force-pushed the ghaeli/emr-eks-trust-policy-validation branch from a14ab73 to f31385e Compare May 12, 2026 21:50
@seanghaeli
Contributor Author

seanghaeli commented May 12, 2026

Rebased

Comment thread providers/amazon/tests/system/amazon/aws/example_emr_eks.py Outdated
@seanghaeli seanghaeli force-pushed the ghaeli/emr-eks-trust-policy-validation branch from f31385e to 20f6fe5 Compare May 13, 2026 19:27
Comment thread providers/amazon/tests/system/amazon/aws/example_emr_eks.py
Comment thread providers/amazon/tests/system/amazon/aws/example_emr_eks.py
Comment thread providers/amazon/tests/system/amazon/aws/example_emr_eks.py Outdated
Comment thread providers/amazon/tests/system/amazon/aws/example_emr_eks.py Outdated
Comment thread providers/amazon/tests/system/amazon/aws/example_emr_eks.py Outdated
@potiuk
Member

potiuk commented May 18, 2026

@seanghaeli — There are 6 unresolved review thread(s) on this PR from @ferruzzi, @o-nikolas, @vincbeck. Could you either push a fix or reply in each thread explaining why the feedback doesn't apply? Once you believe the feedback is addressed, mark the thread as resolved so the reviewer isn't re-pinged needlessly. Thanks!


Note: This comment was drafted by an AI-assisted triage tool and may contain mistakes. Once you have addressed the points above, an Apache Airflow maintainer — a real person — will take the next look at your PR. We use this two-stage triage process so that our maintainers' limited time is spent where it matters most: the conversation with you.

@seanghaeli seanghaeli force-pushed the ghaeli/emr-eks-trust-policy-validation branch from 20f6fe5 to 6f33051 Compare May 19, 2026 22:13
…e_emr_eks

The system test was failing intermittently because the fixed 60-second
sleep after updating the IAM trust policy was insufficient — AWS IAM
OIDC-based trust policy propagation can take 2-5+ minutes.

Replace the sleep with a new `wait_for_trust_policy_propagation` task
that uses exponential backoff (5s-30s intervals, up to 5 min) to:
1. Verify the trust policy document contains the expected OIDC provider
2. Confirm IAM's SimulatePrincipalPolicy returns "allowed" for
   sts:AssumeRoleWithWebIdentity

This adapts to actual propagation time (fast when IAM is quick, patient
when it's slow) and provides observability via logging at each retry.
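
The two checks listed above could be sketched as pure functions over the IAM response payloads. This is an illustration under assumptions, not the merged implementation: the function names are hypothetical, the first takes the already-decoded `AssumeRolePolicyDocument` from a `GetRole` response, and the second takes a `SimulatePrincipalPolicy` response, whose `EvaluationResults` entries carry an `EvalDecision` field.

```python
def trust_policy_allows_oidc(policy_document: dict, oidc_provider_arn: str) -> bool:
    """Return True if an Allow statement trusts the given OIDC provider ARN
    for sts:AssumeRoleWithWebIdentity. Hypothetical helper for illustration."""
    statements = policy_document.get("Statement", [])
    if isinstance(statements, dict):  # IAM permits a single statement object
        statements = [statements]
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        principal = stmt.get("Principal", {})
        # Principal may be the string "*" rather than a mapping.
        federated = principal.get("Federated", []) if isinstance(principal, dict) else []
        if isinstance(federated, str):
            federated = [federated]
        if "sts:AssumeRoleWithWebIdentity" in actions and oidc_provider_arn in federated:
            return True
    return False


def simulation_allowed(response: dict) -> bool:
    """Return True only if every evaluation result in the response is "allowed"."""
    results = response.get("EvaluationResults", [])
    return bool(results) and all(r.get("EvalDecision") == "allowed" for r in results)
```

Keeping the checks free of network calls means they can be unit tested with hand-built dictionaries, while the task itself fetches the payloads on each retry.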

Also adds --retry 3 --retry-delay 5 to the eksctl curl download to
handle transient GitHub network failures.
@seanghaeli seanghaeli force-pushed the ghaeli/emr-eks-trust-policy-validation branch from 6f33051 to 98e868a Compare May 19, 2026 23:01
@o-nikolas o-nikolas merged commit a1784e8 into apache:main May 20, 2026
94 checks passed
@o-nikolas o-nikolas deleted the ghaeli/emr-eks-trust-policy-validation branch May 20, 2026 00:04

Labels

  • area:API (Airflow's REST/HTTP API)
  • area:logging
  • area:providers
  • area:UI (Related to UI/UX. For Frontend Developers.)
  • provider:amazon (AWS/Amazon - related issues)

6 participants