fix: handle case where PD cluster has no leader #538
Welcome @MananShukla7! |
3158fcf to 546d228
|
/cc @ekexium @andylokandy |
Signed-off-by: MananShukla7 <shuklamanan8@gmail.com>
546d228 to e21c876
|
/cc @iosmanthus |
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: iosmanthus, pingyu
|
|
@MananShukla7 Thanks for your contribution! |
Problem
In `try_connect_leader`, the previous leader is accessed with `.unwrap()`, as in `previous.leader.as_ref().unwrap()`. This panics when the PD cluster temporarily has no leader, a situation that occurs during a rolling restart of PD pods in a Kubernetes environment. During leader election, `GetMembersResponse.leader` can be `None`, causing the client to crash rather than returning a recoverable error.

Fix
Replace `.unwrap()` with `.ok_or_else()` to return a descriptive `Err` instead of panicking. This is consistent with the style already used in this file.
Before: `previous.leader.as_ref().unwrap()`
After: `previous.leader.as_ref().ok_or_else(...)`
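
As a rough illustration of the before/after change described above, here is a minimal, self-contained sketch of turning a missing leader into an `Err` rather than a panic. The `Member`, `GetMembersResponse`, and `NoLeaderError` types and the `leader_of` function are hypothetical stand-ins written for this example. The real client uses generated protobuf types and its own error type, and the error message here is an assumption.

```rust
// Hypothetical stand-ins for the real protobuf types; only `leader` matters here.
#[derive(Debug, Clone, PartialEq)]
struct Member {
    name: String,
}

#[derive(Debug, Default)]
struct GetMembersResponse {
    // `None` while the PD cluster is electing a new leader.
    leader: Option<Member>,
}

// Hypothetical error type, used only for this sketch.
#[derive(Debug, PartialEq)]
struct NoLeaderError(String);

/// Returns the current leader, or an `Err` when the response carries none.
/// With `.unwrap()` in place of `.ok_or_else(..)`, the `None` case would panic.
fn leader_of(resp: &GetMembersResponse) -> Result<&Member, NoLeaderError> {
    resp.leader
        .as_ref()
        .ok_or_else(|| NoLeaderError("PD cluster has no leader".to_owned()))
}
```

A caller can then propagate the failure with `?` or match on it, instead of the whole process aborting.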
The same pattern is applied to the `resp.leader` lookup later in the function, replacing the existing `ok_or_else` error message with a clearer one.

Reproduction
Trigger a rolling restart of PD pods while the TiKV client is connected. The client panics at `previous.leader.as_ref().unwrap()` when the new `GetMembersResponse` arrives with `leader: None` during leader election.

Checklist
- `cargo fmt --all`
- `cargo clippy --all-targets --all-features -- -D warnings`
- `cargo test`
- (`git commit -s`)
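
The reason a recoverable error helps in the rolling-restart scenario is that a caller can wait out the election and try again, which is impossible after a panic. The sketch below shows one way that could look. The `retry` helper is hypothetical and not part of the TiKV client, and the fixed delay is an arbitrary choice for illustration.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical helper: calls `op` at least once and at most `attempts` times,
/// sleeping `delay` after each failure. Returns the first `Ok`, or the last `Err`.
fn retry<T, E>(
    attempts: usize,
    delay: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut last = op();
    for _ in 1..attempts {
        if last.is_ok() {
            break;
        }
        // Give the cluster time to finish electing a leader before retrying.
        sleep(delay);
        last = op();
    }
    last
}
```

In practice a bounded backoff would probably be preferable to a fixed delay, but the point is only that an `Err` leaves this decision to the caller.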