e2e/gpu: mark TestGPUK8sSuiteUbuntu2204 as flaky #51171
An external dependency is down, causing the job to fail. Mark the test as flaky so CI retries it rather than hard-failing the pipeline.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4e415aee14
// TestGPUK8sSuiteUbuntu2204 runs tests for the VM interface to ensure its implementation is correct.
// Not to be run in parallel, as some tests wait until the checks are available.
func TestGPUK8sSuiteUbuntu2204(t *testing.T) {
	flake.Mark(t)
Limit flaky marking to outage-specific failures
Calling flake.Mark(t) at suite entry marks TestGPUK8sSuiteUbuntu2204 as flaky unconditionally. Any future failure in this suite, including real regressions unrelated to the current external outage, is then treated as a flaky failure by the test washer: is_flaky_failure returns true on the marker string, and process_result then allows the job to succeed. Because this masks all failures rather than only the known dependency error, it creates a CI blind spot for GPU Kubernetes coverage until someone removes the marker.
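One way to picture the narrower behavior suggested here is a check that treats a failure as flaky only when its output matches a known outage signature. The sketch below is illustrative only: isOutageFailure and outageSignatures are hypothetical names, the signature strings are placeholders rather than the actual error from this outage, and nothing here is part of the repository's flake package or test washer.

```go
package main

import (
	"fmt"
	"strings"
)

// outageSignatures lists substrings assumed to identify the external outage.
// These values are hypothetical placeholders, not taken from the failing job.
var outageSignatures = []string{
	"503 Service Unavailable",
	"connection refused",
}

// isOutageFailure reports whether the failure output contains any of the
// given signatures, ignoring case. It is a hypothetical helper.
func isOutageFailure(output string, signatures []string) bool {
	lower := strings.ToLower(output)
	for _, sig := range signatures {
		if strings.Contains(lower, strings.ToLower(sig)) {
			return true
		}
	}
	return false
}

func main() {
	// A failure caused by the dependency outage matches a signature.
	fmt.Println(isOutageFailure("pull failed: 503 Service Unavailable", outageSignatures))
	// A genuine regression does not match and would still fail the job.
	fmt.Println(isOutageFailure("assertion failed: expected 1 GPU, got 0", outageSignatures))
}
```

With a check like this, only failures matching the outage signature would be eligible for the flaky path, and unrelated regressions would keep failing the pipeline. How such a check would be wired into the suite or the washer is left open.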
What does this PR do?
Marks TestGPUK8sSuiteUbuntu2204 as a known-flaky test using flake.Mark(t).
The CI job new-e2e-gpu: [--run TestGPUK8sSuiteUbuntu2204] is currently failing due to an external dependency being down. This change prevents the failure from blocking the pipeline while the dependency is restored.
Motivation
On-call mitigation. The test itself is sound; the failure is caused by a transient external outage.