Add tiered send/receive recovery to azure-core-amqp and azure-messaging-servicebus #48460
Open
EldertGrootenboer wants to merge 30 commits into Azure:main from EldertGrootenboer:fix/servicebus-tiered-send-recovery
30 commits
a3faf26 Add tiered send/receive recovery matching Go SDK parity (EldertGrootenboerMS)
5e28c33 fix(ci): build azure-core-amqp from source for tiered recovery PR (EldertGrootenboerMS)
f0c6e53 Merge branch 'main' into fix/servicebus-tiered-send-recovery (EldertGrootenboer)
0bd7282 Merge branch 'main' into fix/servicebus-tiered-send-recovery (EldertGrootenboer)
6116b7b fix(review): address Copilot PR review feedback on tiered recovery (EldertGrootenboerMS)
f3a29a7 fix(review): address second Copilot review on tiered recovery (EldertGrootenboerMS)
79c52f9 fix(review): address third Copilot review on tiered recovery (EldertGrootenboerMS)
1fec9e5 fix(servicebus): make link/connection force-close non-blocking in rec… (EldertGrootenboerMS)
d4c64fd fix(amqp): prevent NONE failure from consuming quick-retry flag (EldertGrootenboerMS)
52d4435 fix(amqp): classify disposed-link IllegalStateException as LINK recovery (EldertGrootenboerMS)
ce10aa2 fix(amqp): narrow disposed-ISE match to prevent tier misclassification (EldertGrootenboerMS)
5448c97 fix(amqp): clarify virtual-time reason in retry tests (EldertGrootenboerMS)
4bd93bb fix(amqp): tie forceCloseConnection invalidation to specific connecti… (EldertGrootenboerMS)
2e80396 fix(amqp): rename test methods to camelCase for checkstyle compliance (EldertGrootenboerMS)
1860dd3 fix(amqp): use distinct log message for force-invalidation path (EldertGrootenboerMS)
c2d2086 Merge branch 'main' into fix/servicebus-tiered-send-recovery (EldertGrootenboer)
bf2b560 fix(amqp): guard backoff overflow and normalize RecoveryKind message … (EldertGrootenboerMS)
e76c2d8 fix(servicebus): scope send-link recovery to per-operation reference,… (EldertGrootenboerMS)
3eea3b2 fix(servicebus): include exception in connection-recovery warning log (EldertGrootenboerMS)
1f9bf92 fix(ci): move azure-core-amqp source comment to correct entry (EldertGrootenboerMS)
c0e2d8b Merge branch 'main' into fix/servicebus-tiered-send-recovery (EldertGrootenboer)
0c6cb7a refactor: address reviewer feedback on tiered recovery (EldertGrootenboerMS)
cb1e86a fix(servicebus): restore comment explaining Mono.delay(Duration.ZERO)… (EldertGrootenboerMS)
a3b639c fix(servicebus): include exception in recovery warning logs (EldertGrootenboerMS)
ea2961b docs(servicebus): clarify invalidateConnection v1/v2 behavior in javadoc (EldertGrootenboerMS)
591a8c7 Merge branch 'main' into fix/servicebus-tiered-send-recovery (EldertGrootenboerMS)
43a27f4 test(amqp): add unit tests for ReactorConnectionCache.invalidateConne… (EldertGrootenboerMS)
43d5ff1 fix(servicebus): wrap getLinkSize() in retry+recovery boundary (EldertGrootenboerMS)
e1b382a fix(servicebus): remove unused static import for RetryUtil.withRetry (EldertGrootenboerMS)
95e9070 docs(servicebus): clarify session-removal recovery comment for LINK a… (EldertGrootenboerMS)
168 changes: 168 additions & 0 deletions
sdk/core/azure-core-amqp/src/main/java/com/azure/core/amqp/implementation/RecoveryKind.java
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,168 @@ | ||
| // Copyright (c) Microsoft Corporation. All rights reserved. | ||
| // Licensed under the MIT License. | ||
|
|
||
| package com.azure.core.amqp.implementation; | ||
|
|
||
| import com.azure.core.amqp.exception.AmqpErrorCondition; | ||
| import com.azure.core.amqp.exception.AmqpException; | ||
|
|
||
| import java.util.Locale; | ||
| import java.util.concurrent.TimeoutException; | ||
|
|
||
| /** | ||
| * Classifies errors into recovery tiers, determining what resources should be closed | ||
| * between retry attempts. This follows the tiered recovery pattern used by the Go, .NET, | ||
| * Python, and JS Azure SDKs. | ||
| * | ||
| * <ul> | ||
| * <li>{@link #NONE} — Retry on the same link (server-busy, timeouts).</li> | ||
| * <li>{@link #LINK} — Close the send/receive link; next retry creates a fresh link on the same connection.</li> | ||
| * <li>{@link #CONNECTION} — Close the entire connection; next retry creates a fresh connection and link.</li> | ||
| * <li>{@link #FATAL} — Do not retry (unauthorized, not-found, message too large).</li> | ||
| * </ul> | ||
| */ | ||
| public enum RecoveryKind { | ||
| /** | ||
| * No recovery needed — retry on the same link and connection. | ||
| * Applies to: server-busy, timeouts, resource-limit-exceeded. | ||
| */ | ||
| NONE, | ||
|
|
||
| /** | ||
| * Close the link (and its session) before retrying. The next retry creates a fresh link | ||
| * on the same connection. | ||
| * Applies to: link:detach-forced, link:stolen, transient AMQP errors on the link. | ||
| */ | ||
| LINK, | ||
|
|
||
| /** | ||
| * Close the entire connection before retrying. The next retry creates a fresh connection, | ||
| * session, and link. | ||
| * Applies to: connection:forced, connection:framing-error, proton:io, internal-error. | ||
| */ | ||
| CONNECTION, | ||
|
|
||
| /** | ||
| * Do not retry — the error is permanent. | ||
| * Applies to: unauthorized-access, not-found, message-size-exceeded. | ||
| */ | ||
| FATAL; | ||
|
|
||
| /** | ||
| * Classifies the given error into a {@link RecoveryKind} that determines what resources | ||
| * should be invalidated between retry attempts. | ||
| * | ||
| * @param error The error to classify. | ||
| * @return The recovery kind for the given error. | ||
| */ | ||
| public static RecoveryKind classify(Throwable error) { | ||
| if (error == null) { | ||
| return NONE; | ||
| } | ||
|
|
||
| // Timeouts — retry on same link, the link may still be healthy. | ||
| if (error instanceof TimeoutException) { | ||
| return NONE; | ||
| } | ||
|
|
||
| if (error instanceof AmqpException) { | ||
| final AmqpException amqpError = (AmqpException) error; | ||
| final AmqpErrorCondition condition = amqpError.getErrorCondition(); | ||
|
|
||
| if (condition != null) { | ||
| switch (condition) { | ||
| // Connection-level errors — close the entire connection. | ||
| case CONNECTION_FORCED: | ||
| case CONNECTION_FRAMING_ERROR: | ||
| case CONNECTION_REDIRECT: | ||
| case PROTON_IO: | ||
| case INTERNAL_ERROR: | ||
| return CONNECTION; | ||
|
|
||
| // Link-level errors — close the link, keep the connection. | ||
| case LINK_DETACH_FORCED: | ||
| case LINK_STOLEN: | ||
| case LINK_REDIRECT: | ||
| case PARTITION_NOT_OWNED_ERROR: | ||
| case TRANSFER_LIMIT_EXCEEDED: | ||
| // operation-cancelled can signal "AMQP layer unexpectedly aborted or disconnected" | ||
| // (e.g. ReceiverUnsettledDeliveries remote Released outcome), requiring link recovery. | ||
| case OPERATION_CANCELLED: | ||
| return LINK; | ||
|
|
||
| // Fatal errors — do not retry. | ||
| case NOT_FOUND: | ||
| case UNAUTHORIZED_ACCESS: | ||
| case LINK_PAYLOAD_SIZE_EXCEEDED: | ||
| case NOT_ALLOWED: | ||
|
|
||
| case NOT_IMPLEMENTED: | ||
| case ENTITY_DISABLED_ERROR: | ||
| case ENTITY_ALREADY_EXISTS: | ||
| case PUBLISHER_REVOKED_ERROR: | ||
| case ARGUMENT_ERROR: | ||
| case ARGUMENT_OUT_OF_RANGE_ERROR: | ||
| case ILLEGAL_STATE: | ||
| case MESSAGE_LOCK_LOST: | ||
| case STORE_LOCK_LOST_ERROR: | ||
| return FATAL; | ||
|
|
||
| // Server-busy, timeouts, and resource-limit errors — retry on same link. | ||
| // RESOURCE_LIMIT_EXCEEDED is treated as transient here because ReactorSender | ||
| // groups it alongside SERVER_BUSY and TIMEOUT in its send-error retry logic. | ||
| case SERVER_BUSY_ERROR: | ||
| case TIMEOUT_ERROR: | ||
| case RESOURCE_LIMIT_EXCEEDED: | ||
| return NONE; | ||
|
|
||
|
|
||
| // Session/lock errors — link-level recovery. | ||
| // Session lock loss means the session link is invalid and | ||
| // a fresh link must be acquired for a new session. | ||
| case SESSION_LOCK_LOST: | ||
| case SESSION_CANNOT_BE_LOCKED: | ||
| case SESSION_NOT_FOUND: | ||
| case MESSAGE_NOT_FOUND: | ||
| return LINK; | ||
|
|
||
|
|
||
| default: | ||
| break; | ||
| } | ||
| } | ||
|
|
||
| // Transient AMQP errors without a specific condition — link recovery. | ||
| if (amqpError.isTransient()) { | ||
| return LINK; | ||
| } | ||
|
|
||
| // Non-transient AMQP errors without a recognized condition — fatal. | ||
| return FATAL; | ||
| } | ||
|
|
||
| // RequestResponseChannelClosedException — link-level (parent connection disposing). | ||
| if (error instanceof RequestResponseChannelClosedException) { | ||
| return LINK; | ||
| } | ||
|
|
||
| // IllegalStateException thrown by a disposed ReactorSender (e.g., "Cannot publish | ||
| // message when disposed." or "Cannot publish data batch when disposed."). This is | ||
| // a link-staleness signal: the link was closed (possibly by a concurrent recovery | ||
| // path) before the in-flight send could complete. LINK recovery creates a fresh | ||
| // link on the next retry. | ||
| // Match both "Cannot publish" and "disposed" to avoid misclassifying unrelated | ||
| // disposal signals (e.g., "Connection is disposed. Cannot get management instance."). | ||
| if (error instanceof IllegalStateException) { | ||
| final String msg = error.getMessage(); | ||
| if (msg != null) { | ||
| final String normalizedMsg = msg.toLowerCase(Locale.ROOT); | ||
| if (normalizedMsg.contains("cannot publish") && normalizedMsg.contains("disposed")) { | ||
| return LINK; | ||
| } | ||
| } | ||
|
|
||
| } | ||
|
|
||
|
|
||
| // Unknown non-AMQP errors — treat as fatal (don't retry application or SDK bugs). | ||
| // The Go SDK defaults to CONNECTION for unknown errors, but those are AMQP-layer | ||
| // errors (io.EOF, net.Error). Java's non-AMQP exceptions (e.g., AzureException, | ||
| // RuntimeException) should fail fast rather than trigger connection recovery. | ||
| return FATAL; | ||
|
|
||
| } | ||
| } | ||
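
The enum above only classifies errors; the retry code that consumes it is not part of this file. As a rough illustration of how a caller might act on the four tiers, here is a minimal, dependency-free sketch. Everything in it is hypothetical and not taken from the PR: `TieredRetrySketch`, the local `Tier` enum (a stand-in that mirrors the `RecoveryKind` value names), the `RecoveryHooks` interface, and `runWithRecovery`. Backoff between attempts is omitted, and the real implementation is reactive rather than a blocking loop.

```java
import java.util.concurrent.Callable;
import java.util.function.Function;

public final class TieredRetrySketch {
    /** Local stand-in mirroring the four RecoveryKind values shown in the diff. */
    public enum Tier { NONE, LINK, CONNECTION, FATAL }

    /** Hypothetical hooks the caller supplies to discard stale resources. */
    public interface RecoveryHooks {
        void closeLink();
        void closeConnection();
    }

    /**
     * Runs the operation up to maxAttempts times. After each failure the classifier
     * decides what to tear down before the next attempt:
     * NONE keeps everything, LINK closes the link, CONNECTION closes the connection,
     * FATAL rethrows immediately.
     */
    public static <T> T runWithRecovery(Callable<T> operation,
                                        Function<Throwable, Tier> classifier,
                                        RecoveryHooks hooks,
                                        int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return operation.call();
            } catch (Exception e) {
                Tier tier = classifier.apply(e);
                // Permanent errors and exhausted budgets surface the original exception.
                if (tier == Tier.FATAL || attempt >= maxAttempts) {
                    throw e;
                }
                if (tier == Tier.CONNECTION) {
                    hooks.closeConnection(); // next attempt builds connection and link
                } else if (tier == Tier.LINK) {
                    hooks.closeLink();       // next attempt builds a link on the same connection
                }
                // Tier.NONE: retry on the same link and connection.
            }
        }
    }

    private TieredRetrySketch() {
    }
}
```

In this sketch the classifier is passed in as a `Function<Throwable, Tier>`, so a caller with azure-core-amqp on the classpath could presumably adapt `RecoveryKind.classify` to it; that wiring is an assumption and is not shown in the diff.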