[SREP-3734] Port AlertmanagerInhibitions e2e tests from osde2e#515
[SREP-3734] Port AlertmanagerInhibitions e2e tests from osde2e#515smarthall wants to merge 1 commit into
Conversation
|
Warning Rate limit exceeded
To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Repository: openshift/coderabbit/.coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (2)
WalkthroughAdds an Alertmanager v2 test client and e2e tests that post synthetic alerts to Alertmanager, poll the API until a target alert becomes suppressed by inhibition rules, and resolve the synthetic alerts during cleanup. Changes
Sequence Diagram(s)sequenceDiagram
participant Test
participant KubeAPI as Kubernetes API
participant Route as Route (alertmanager-main)
participant AlertmgrPod as Alertmanager Pod
participant AlertmgrAPI as Alertmanager v2 API
Test->>KubeAPI: discover `alertmanager-main` route host, get SA token & router CA
KubeAPI-->>Test: route host, bearer token, CA bundle
Test->>Route: use host + CA + token to construct AlertmanagerClient (HTTPS)
Note right of Test: static config checks may read operator-managed alertmanager.yaml
Test->>AlertmgrPod: exec `curl` POST /api/v2/alerts (post synthetic source & target)
AlertmgrPod->>AlertmgrAPI: ingest posted alerts
AlertmgrAPI-->>AlertmgrAPI: evaluate inhibition rules (source -> suppress target)
loop poll until suppressed
Test->>AlertmgrAPI: GET /api/v2/alerts?filter=...
AlertmgrAPI-->>Test: alert list (target status becomes "suppressed")
end
Test->>AlertmgrPod: exec `curl` POST /api/v2/alerts to resolve alerts (endsAt in past)
AlertmgrAPI-->>Test: alerts resolved
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 8 | ❌ 4❌ Failed checks (2 warnings, 2 inconclusive)
✅ Passed checks (8 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 0/1 reviews remaining, refill in 44 minutes and 20 seconds.Comment |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: smarthall The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #515 +/- ##
=======================================
Coverage 68.26% 68.26%
=======================================
Files 8 8
Lines 1087 1087
=======================================
Hits 742 742
Misses 315 315
Partials 30 30 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/e2e/configure_alertmanager_operator_tests.go`:
- Around line 1444-1498: The test currently removes any IDP named by
identityProviderName in cleanup() and later only deletes by that name, which can
permanently mutate cluster auth; instead capture the original identityProviders
slice from oauth (e.g. store in a new originalIDPs variable after the first Get
in the test before appending degradingIDP), then modify oauth.Object to add
degradingIDP as you do, and change cleanup() to restore the entire originalIDPs
slice (using unstructured.SetNestedSlice and
dynamicClient.Resource(oauthGVR).Update) rather than deleting by name; also
invoke the deferred cleanup with a non-cancelable context (e.g.
context.Background()) so it runs even if the test context times out.
- Around line 1520-1544: The test currently treats absence or any non-"active"
state as successful inhibition; update the assertion to ensure
ClusterOperatorDown was actually suppressed by Alertmanager: first verify the
alert is firing in Prometheus (query Prometheus for
ClusterOperatorDown{name="authentication"} via the existing Prometheus client)
or, if you choose the minimal change, require that amClient.GetAlertsByName(...)
returns at least one alert whose alert.Status.State == "suppressed" and
len(alert.Status.InhibitedBy) > 0; replace the len(alerts)==0 branch with a
failure (or an explicit Prometheus firing check) and add the explicit checks for
"suppressed" and non-empty InhibitedBy in the loop that currently logs state to
guarantee suppression rather than silence/absence.
In `@test/e2e/utils/utils.go`:
- Around line 477-487: The code currently uses the first entry in status.ingress
(ingress[0]) which can be non-admitted; update the logic in the function that
builds the Alertmanager client (the code around NewAlertmanagerClient / the
route handling where variables ingress, ingressMap, host are used) to iterate
over the ingress slice and pick the first ingress entry whose conditions include
an object with type "Admitted" and status "True", then extract the host from
that admitted ingress; if none are admitted, return the existing error. Ensure
you check types when casting each ingress element to map[string]interface{} and
use unstructured.NestedSlice/NestedString on the admitted ingress map to get the
host.
- Around line 131-145: The HasInhibitRuleMatching function miscompares unordered
equal label lists and treats a missing target_match_re["alertname"] as matching;
fix by performing an order-insensitive comparison for equal (either sort both
slices before using slices.Equal or check lengths and that every element in the
provided equal slice exists in rule.Equal) and skip the rule if
rule.TargetMatchRE does not contain "alertname" (use val, ok :=
rule.TargetMatchRE["alertname"]; if !ok { continue }). Keep the existing
sourceMatch check and only call regexp.MatchString with the validated val.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 5aa31286-f99c-492b-bc2c-b7983a1663f0
📒 Files selected for processing (2)
test/e2e/configure_alertmanager_operator_tests.gotest/e2e/utils/utils.go
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/e2e/utils/utils.go`:
- Line 566: The linter flags the unchecked error from defer resp.Body.Close();
update the defer to explicitly discard the Close() error so it satisfies
errcheck (e.g., use a defer that assigns the return to the blank identifier)
where resp is used in this helper in test/e2e/utils/utils.go; ensure you only
change the defer expression (referencing resp.Body.Close()) and not other logic.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 85889864-3b3a-422c-82ce-ca0ac5f08f31
📒 Files selected for processing (2)
test/e2e/configure_alertmanager_operator_tests.gotest/e2e/utils/utils.go
🚧 Files skipped from review as they are similar to previous changes (1)
- test/e2e/configure_alertmanager_operator_tests.go
0a9d537 to
38f913e
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (3)
test/e2e/utils/utils.go (2)
464-475:⚠️ Potential issue | 🟠 Major | ⚡ Quick winSelect an admitted ingress entry instead of assuming
status.ingress[0].Line 468 can pick a non-admitted router entry, which makes
NewAlertmanagerClientflaky on clusters with multiple ingress controllers.Proposed fix
- ingressMap, ok := ingress[0].(map[string]interface{}) - if !ok { - return nil, fmt.Errorf("unexpected ingress format in alertmanager-main route") - } - host, found, err := unstructured.NestedString(ingressMap, "host") - if err != nil || !found || host == "" { - return nil, fmt.Errorf("alertmanager-main route has no host in ingress status") - } + var host string + for _, item := range ingress { + ingressMap, ok := item.(map[string]interface{}) + if !ok { + continue + } + conditions, _, _ := unstructured.NestedSlice(ingressMap, "conditions") + admitted := false + for _, c := range conditions { + conditionMap, ok := c.(map[string]interface{}) + if !ok { + continue + } + condType, _, _ := unstructured.NestedString(conditionMap, "type") + condStatus, _, _ := unstructured.NestedString(conditionMap, "status") + if condType == "Admitted" && condStatus == "True" { + admitted = true + break + } + } + if admitted { + host, _, _ = unstructured.NestedString(ingressMap, "host") + if host != "" { + break + } + } + } + if host == "" { + return nil, fmt.Errorf("alertmanager-main route has no admitted ingress host") + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/utils/utils.go` around lines 464 - 475, The code currently assumes status.ingress[0] is the admitted route which can be non-admitted; update the selection logic in NewAlertmanagerClient (or the function containing route/ingress parsing) to iterate the ingress slice, cast each entry to map[string]interface{} (ingressMap), and pick the first entry whose "conditions" array contains an object with type=="Admitted" and status=="True"; then read the "host" from that selected ingressMap instead of using ingress[0]; keep existing error returns if no admitted ingress or no host is found.
545-545:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winHandle deferred
Close()return to satisfyerrcheck.Line 545 currently ignores the returned error and is flagged by lint.
Proposed fix
- defer resp.Body.Close() + defer func() { _ = resp.Body.Close() }()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/utils/utils.go` at line 545, Replace the bare defer resp.Body.Close() so the returned error is handled; change it to a deferred closure that calls resp.Body.Close() and checks the error (e.g. defer func() { if err := resp.Body.Close(); err != nil { log.Printf("resp.Body.Close error: %v", err) } }()), locating the call to resp.Body.Close() in the same helper in test/e2e/utils/utils.go and use the package-appropriate logger (t.Logf, log.Printf, or similar) available in that context.test/e2e/configure_alertmanager_operator_tests.go (1)
1410-1415:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAvoid first-item short-circuit; assert inhibition explicitly.
Line 1414 returns from the loop on the first non-suppressed alert, and Line 1411 accepts suppressed without checking
inhibitedBy. This can produce false results when multiple matching alerts are returned.Proposed fix
- for _, alert := range alerts { - if alert.Status.State == "suppressed" { - return nil - } - return fmt.Errorf("ClusterOperatorDown state=%q, expected suppressed", alert.Status.State) - } - return nil + for _, alert := range alerts { + if alert.Status.State != "suppressed" || len(alert.Status.InhibitedBy) == 0 { + return fmt.Errorf( + "ClusterOperatorDown is not inhibited (state=%q, inhibitedBy=%v)", + alert.Status.State, alert.Status.InhibitedBy, + ) + } + } + return nil🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/configure_alertmanager_operator_tests.go` around lines 1410 - 1415, The loop over alerts currently short-circuits on the first item and treats any "suppressed" state as sufficient without checking inhibition; change the logic in the alerts iteration to (1) examine every alert instead of returning from the first iteration, (2) for each alert assert that alert.Status.State == "suppressed" and that the alert's inhibition marker (e.g. alert.Status.InhibitedBy or alert.InhibitedBy—use the actual field present) is non-empty, and (3) only succeed (return nil) if all matching alerts are suppressed and have a non-empty inhibitedBy; otherwise return an aggregated/first meaningful error after the loop mentioning which alerts failed the checks. Ensure you update the loop that references alerts and alert.Status.State to perform these checks.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@test/e2e/configure_alertmanager_operator_tests.go`:
- Around line 1410-1415: The loop over alerts currently short-circuits on the
first item and treats any "suppressed" state as sufficient without checking
inhibition; change the logic in the alerts iteration to (1) examine every alert
instead of returning from the first iteration, (2) for each alert assert that
alert.Status.State == "suppressed" and that the alert's inhibition marker (e.g.
alert.Status.InhibitedBy or alert.InhibitedBy—use the actual field present) is
non-empty, and (3) only succeed (return nil) if all matching alerts are
suppressed and have a non-empty inhibitedBy; otherwise return an
aggregated/first meaningful error after the loop mentioning which alerts failed
the checks. Ensure you update the loop that references alerts and
alert.Status.State to perform these checks.
In `@test/e2e/utils/utils.go`:
- Around line 464-475: The code currently assumes status.ingress[0] is the
admitted route which can be non-admitted; update the selection logic in
NewAlertmanagerClient (or the function containing route/ingress parsing) to
iterate the ingress slice, cast each entry to map[string]interface{}
(ingressMap), and pick the first entry whose "conditions" array contains an
object with type=="Admitted" and status=="True"; then read the "host" from that
selected ingressMap instead of using ingress[0]; keep existing error returns if
no admitted ingress or no host is found.
- Line 545: Replace the bare defer resp.Body.Close() so the returned error is
handled; change it to a deferred closure that calls resp.Body.Close() and checks
the error (e.g. defer func() { if err := resp.Body.Close(); err != nil {
log.Printf("resp.Body.Close error: %v", err) } }()), locating the call to
resp.Body.Close() in the same helper in test/e2e/utils/utils.go and use the
package-appropriate logger (t.Logf, log.Printf, or similar) available in that
context.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: a60d3211-5b39-4147-877e-44e109ce6431
📒 Files selected for processing (2)
test/e2e/configure_alertmanager_operator_tests.gotest/e2e/utils/utils.go
38f913e to
b4bd314
Compare
Move the AlertmanagerInhibitions tests into this repo's e2e suite. Instead of degrading a real cluster operator, inject synthetic alerts directly into Alertmanager's v2 API and verify the inhibition rule suppresses ClusterOperatorDown when ClusterOperatorDegraded is active. POST uses pod exec to reach each alertmanager instance directly, since alerts posted via the API are not gossiped between peers. Test completes in seconds with no cluster state changes. Signed-off-by: Daniel Hall <danhall@redhat.com> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
b4bd314 to
37690a4
Compare
|
@smarthall: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Summary
Ports the
AlertmanagerInhibitionse2e tests from osde2e into this repo, adapted to the CAMO test framework.What's tested
severity=criticalsource match (existingDescribe("Alertmanager configuration structure")block).ClusterOperatorDegradedandClusterOperatorDownalerts directly into the Alertmanager v2 API, then verifiesClusterOperatorDownis present withstate=suppressed. This proves the inhibition rule is working end-to-end.Approach: synthetic alert injection
Instead of degrading a real cluster operator (risky, slow, doesn't verify actual suppression), we POST synthetic alerts to Alertmanager's
/api/v2/alertsendpoint. Alertmanager treats them identically to real alerts — inhibition rules apply immediately.POST uses pod exec to reach each alertmanager instance directly, since alerts posted via the API are not gossiped between peers and the external route load-balances across replicas. Cleanup resolves the synthetic alerts by posting them with
endsAtin the past.Test completes in ~11 seconds with no cluster state changes.
New utilities in
test/e2e/utils/AlertmanagerClient— authenticated HTTP client for the Alertmanager v2 API (GET via external route, POST via pod exec)PostAlerts()/ResolveAlerts()— inject and clean up synthetic alertsGetAlerts()/GetAlertsByName()— query alert state including inhibition statusTest plan
go build -tags osde2e ./test/e2e/...compiles cleanlyRelated
Corresponding osde2e test removal: openshift/osde2e#3222
🤖 Generated with Claude Code