DNM: Test PR for ClusterBot, controller: cap BareMetalHost reconcile backoff at 30 seconds #478
MahnoorAsghar wants to merge 1 commit into
/test e2e-metal-ipi-bm
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
Pre-merge checks: ✅ 10 passed
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: MahnoorAsghar
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Actionable comments posted: 1
🧹 Nitpick comments (1)
internal/controller/metal3.io/baremetalhost_controller.go (1)
64-72: LGTM! The constants are well-documented and the values are appropriate.
Consider whether these rate limiter parameters should be exposed as command-line flags or configuration options for operational flexibility, similar to how `maxConcurrentReconcile` is passed in. This would allow operators to tune the behavior without code changes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/metal3.io/baremetalhost_controller.go` around lines 64 - 72, Replace the hard-coded rate limiter constants (rateLimiterBaseDelay, rateLimiterMaxDelay, rateLimiterBursts, rateLimiterRequestsPerSecond) with configurable options exposed as command-line flags or config fields similar to how maxConcurrentReconcile is passed in: add flags for base delay, max delay, bursts, and requests-per-second in the controller setup, parse and validate them, surface them on the reconciler/controller struct (or options struct) and use those fields when constructing the rate limiter instead of the constants (look for where the rate limiter is created to wire the new fields in). Ensure sensible default values equal to the current constants and validate ranges (e.g., non-negative durations, positive burst and RPS).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@go.mod`:
- Line 18: The go.mod contains a duplicate entry for the module
golang.org/x/time (present as both a direct dependency and an indirect entry);
remove the redundant indirect listing by running `go mod tidy` to reconcile
dependencies and ensure only the direct golang.org/x/time v0.9.0 entry remains,
or manually delete the duplicate golang.org/x/time line and then run `go mod
tidy` to update go.sum and module metadata.
---
Nitpick comments:
In `@internal/controller/metal3.io/baremetalhost_controller.go`:
- Around line 64-72: Replace the hard-coded rate limiter constants
(rateLimiterBaseDelay, rateLimiterMaxDelay, rateLimiterBursts,
rateLimiterRequestsPerSecond) with configurable options exposed as command-line
flags or config fields similar to how maxConcurrentReconcile is passed in: add
flags for base delay, max delay, bursts, and requests-per-second in the
controller setup, parse and validate them, surface them on the
reconciler/controller struct (or options struct) and use those fields when
constructing the rate limiter instead of the constants (look for where the rate
limiter is created to wire the new fields in). Ensure sensible default values
equal to the current constants and validate ranges (e.g., non-negative
durations, positive burst and RPS).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: fe838af9-b1d6-429e-ba1c-8100bc5d4be4
📒 Files selected for processing (2)
go.mod
internal/controller/metal3.io/baremetalhost_controller.go
141503a to b524621
The default controller-runtime exponential rate limiter has a maximum delay of 1000 seconds (~16 minutes). During BMO startup there is a brief race between the reconcile loop becoming active and BMO's own validating-webhook Service endpoint being propagated: any BareMetalHost reconciliation that triggers a webhook call in that window gets a "no endpoints available" error. A burst of such errors is enough to drive the per-item exponential backoff to its ceiling, after which BMO silently waits up to 16 minutes before attempting the next reconcile even though the webhook has long since become reachable.
Replace the default rate limiter with one that is otherwise identical but caps the per-item exponential delay at 30 seconds. This bounds the recovery window to at most one retry interval after the endpoint is propagated, matching the behaviour of other Metal3 controllers.
Generated-by: Cursor, claude-4.6-sonnet-medium model
Signed-off-by: MahnoorAsghar <masghar@redhat.com>
b524621 to 1ea1fb3
/test e2e-metal-ipi-bm
@MahnoorAsghar: all tests passed!
Full PR test history. Your PR dashboard.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.