Open observation scope and stop on non-FeignException errors#3407
Open
seonwooj0810 wants to merge 1 commit into
Open
Open observation scope and stop on non-FeignException errors#3407seonwooj0810 wants to merge 1 commit into
seonwooj0810 wants to merge 1 commit into
Conversation
`MicrometerObservationCapability` started an `Observation` but never opened its `Scope`, so the new observation was not the current one while the call ran. Downstream `ObservationHandler`s — most importantly, tracing handlers that copy trace/span ids into the MDC — saw the parent observation instead of the Feign one, so log lines for the outgoing call were tagged with the wrong span id. The capability also caught only `FeignException`. Underlying clients (JDK `HttpURLConnection`, Apache HC5, OkHttp, …) throw client-specific exceptions — `SocketTimeoutException`, `HttpHostConnectException`, `IOException` — directly, never wrapped as `FeignException`. Those slipped past the `catch`, so the observation was never stopped: the error was not recorded, the timer was never committed, and the observation leaked. Both calls now open the observation as a `Scope` while the underlying client runs, catch any `Throwable` to record the error and stop the observation, and unwrap `CompletionException` on the async path so the recorded error is the cause rather than the wrapper. Fixes OpenFeign#2406 Fixes OpenFeign#2566
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
MicrometerObservationCapabilityhad two related defects that left users without correct tracing or metrics when their feign-instrumented call did anything other than complete successfully or fail with aFeignException:observation.start()but neverobservation.openScope(). AnyObservationHandlerthat looks at the current observation — most notably the tracing handler that copiestraceId/spanIdinto the MDC — saw the parent observation while the Feign call ran. Log lines emitted from inside the call were tagged with the wrong span id, breaking distributed-tracing log correlation.FeignExceptionwas caught ( [BUG] MicrometerObservationCapability not reporting timeouts #2566). The actual clients Feign delegates to throw client-specific types directly: JDKHttpURLConnectionthrowsSocketTimeoutException/ConnectException, Apache HC5 throwsHttpHostConnectException, OkHttp throwsIOException. None of those areFeignException. They flew past thecatch, the observation was never stopped,observation.error(...)was never called — so the timer was never committed, the trace stayed open, and one failed request quietly leaked an observation context.Both fixes are small and live in the same handful of lines, so this PR addresses them together.
What changed
In
MicrometerObservationCapability:enrichpaths now open the observation as aScope(try (Observation.Scope scope = observation.openScope()) { … }). The observation is the current observation while the underlying client runs, so MDC-propagating handlers see the Feign span — not the parent.catchclause widened fromFeignExceptiontoThrowable. Anything thrown by the underlying client is now captured byobservation.error(...)and the observation is always stopped infinally. On the async path the same widening is applied to the synchronous portion ofexecuteand to thewhenCompletecallback.CompletionExceptionbefore recording, so the error stamped on the observation is the underlying cause rather than the executor wrapper.finalizeObservationcollapsed into the inline logic now that the lifecycle is symmetrical.Why this shape
Observation.Scopeis the documented mechanism in Micrometer for making an observation visible to downstream code — handlers, propagators, baggage.start()only marks the observation as in-flight; withoutopenScope()the registry'sgetCurrentObservation()keeps returning the parent. Opening the scope inside atry-with-resourceskeeps the scope tied to the call boundary, so a thrown exception still closes it cleanly and the parent observation is correctly restored afterwards.Catching
Throwablerather thanExceptionis deliberate: the contract here is "the observation must be closed no matter what happens inside this block."finally { observation.stop(); }is what makes that true, andcatch (Throwable)is what makes sureobservation.error(...)gets the right cause attached before stop.Tests
Added
MicrometerObservationCapabilityTest(usesTestObservationRegistryfrommicrometer-observation-test, already on the classpath viamicrometer-test) covering:scopeIsOpenDuringClientExecution—observationRegistry.getCurrentObservation()returns the Feign observation while the underlyingClientruns, and isnullagain afterwards. Locks in the fix for Feign observation span is not put in "scope" - loggers do not see current spanId #2406.recordsNonFeignExceptionThrownByClient/recordsRuntimeExceptionThrownByClient— when theClientthrowsSocketTimeoutException/RuntimeException, the observation is stopped and the underlying error is recorded. Locks in [BUG] MicrometerObservationCapability not reporting timeouts #2566. (Retryer.NEVER_RETRYis used so the test asserts on the single observation produced, not on retry copies.)asyncScopeIsOpenDuringClientExecution— same for the async path.asyncRecordsNonFeignExceptionFromFailedFuture— a future completing exceptionally withIOExceptionrecords the originalIOExceptionon the observation (not aCompletionExceptionwrapper).asyncRecordsSynchronousExceptionFromClient— synchronous throws from insideAsyncClient.executealso stop the observation.parentObservationIsRestoredAfterCall— confirms the scope correctly returns to the caller's observation, so we aren't accidentally clobbering ambient tracing context.scopeReceivesObservationCarryingFeignContext— verifies theFeignContextis the context delivered toObservationHandler#onScopeOpened, which is what tracing handlers actually read.Existing tests still pass:
mvn -pl micrometer test→ 26 tests, 0 failures acrossMicrometerCapabilityTest,MicrometerObservationRegistryCapabilityTest,FeignHeaderInstrumentationTest, and the new class.Fixes #2406
Fixes #2566