fix(logging): Fix deadlock in log batcher #5684

Merged
sentrivana merged 8 commits into master from ivana/recursive-guard-in-batcher on Mar 17, 2026

Conversation

@sentrivana
Contributor

Description

In certain scenarios, the SDK's log batcher might cause a deadlock. This happens if it's currently flushing, and during the flush, something emits a log that we try to capture and add to the (locked) batcher.

With this PR, we're adding a re-entry guard to the batcher, preventing it from recursively handling log items during locked code paths like flush().
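The guard described above can be sketched roughly as follows. This is a minimal illustration, not the SDK's actual code: the class `GuardedBatcher`, its `add` method, and the `dropped` list are hypothetical names, and silently dropping re-entrant items is an assumption about what "preventing recursive handling" means. Only the thread-local `_active.flag` pattern and the save/restore in `flush()` mirror what the diff in this PR shows.

```python
import threading
from typing import List


class GuardedBatcher:
    """Hypothetical, simplified batcher; not the SDK's real class."""

    def __init__(self) -> None:
        self._lock = threading.Lock()     # non-reentrant lock around the buffer
        self._active = threading.local()  # per-thread re-entry flag
        self._buffer: List[str] = []
        self.flushed: List[str] = []
        self.dropped: List[str] = []

    def add(self, item: str) -> None:
        # If this thread is already inside a guarded section, taking the
        # lock again would block forever, so the item is dropped instead
        # (assumed behavior for this sketch).
        if getattr(self._active, "flag", False):
            self.dropped.append(item)
            return
        with self._lock:
            self._buffer.append(item)

    def flush(self) -> None:
        # Save and restore the previous flag so nested guarded calls
        # do not clear a guard set further up the stack.
        old_flag = getattr(self._active, "flag", False)
        self._active.flag = True
        try:
            with self._lock:
                for item in self._buffer:
                    self.flushed.append(item)
                    # Simulate something emitting a log mid-flush.
                    self.add("log emitted during flush")
                self._buffer.clear()
        finally:
            self._active.flag = old_flag
```

Without the check at the top of `add`, the simulated log inside `flush()` would try to acquire `_lock` while the same thread already holds it, which is the deadlock this PR addresses. Because the flag is thread-local, other threads can still add items normally and simply wait for the lock.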

Issues

Closes #5681

@github-actions
Contributor

github-actions bot commented Mar 17, 2026

Semver Impact of This PR

🟢 Patch (bug fixes)

📋 Changelog Preview

This is how your changes will appear in the changelog.
Entries from this PR are highlighted with a left border (blockquote style).


New Features ✨

Anthropic

  • Record finish reasons in AI monitoring spans by ericapisani in #5678
  • Emit gen_ai.chat spans for asynchronous messages.stream() by alexander-alderman-webb in #5572
  • Emit AI Client Spans for synchronous messages.stream() by alexander-alderman-webb in #5565
  • Set gen_ai.response.id span attribute by ericapisani in #5662
  • Add gen_ai.system attribute to spans by ericapisani in #5661

Pydantic Ai

  • Support ImageUrl content type in span instrumentation by ericapisani in #5629
  • Add tool description to execute_tool spans by ericapisani in #5596

Other

  • (crons) Add owner field to MonitorConfig by julwhitney13 in #5610
  • (otlp) Add collector_url option to OTLPIntegration by sl0thentr0py in #5603

Bug Fixes 🐛

  • (ai) Truncate list-based message content in AI monitoring by ericapisani in #5631
  • (anthropic) Close span on GeneratorExit by alexander-alderman-webb in #5643
  • (celery) Propagate user-set headers by sentrivana in #5581
  • (langchain) Wrap finish_reason in array for gen_ai span attribute by ericapisani in #5666
  • (logging) Fix deadlock in log batcher by sentrivana in #5684
  • (profiler) Prevent buffer race condition during rapid start/stop cycles by ericapisani in #5622
  • (utils) Avoid double serialization of strings in safe_serialize by ericapisani in #5587
  • Enable unused import ruff check and fix unused imports by sentrivana in #5652

Documentation 📚

  • (openai-agents) Remove inapplicable comment by alexander-alderman-webb in #5495
  • Add AGENTS.md by sentrivana in #5579
  • Add set_attribute example to changelog by sentrivana in #5578

Internal Changes 🔧

Anthropic

  • Check system and response ID attributes on spans created by stream() by alexander-alderman-webb in #5665
  • Skip accumulation logic for unexpected types in streamed response by alexander-alderman-webb in #5564
  • Factor out streamed result handling by alexander-alderman-webb in #5563
  • Stream valid JSON by alexander-alderman-webb in #5641
  • Stop mocking response iterator by alexander-alderman-webb in #5573

Docs

  • Remove agentic codebase documentation workflows by dingsdax in #5655
  • Switch agentic workflows from Copilot to Claude engine by dingsdax in #5654
  • Add agentic workflows for codebase documentation by dingsdax in #5649

Openai Agents

  • Do not fail on new tool fields by alexander-alderman-webb in #5625
  • Stop expecting a specific function name by alexander-alderman-webb in #5623
  • Set streaming header when library uses with_streaming_response() by alexander-alderman-webb in #5583
  • Replace mocks with httpx for streamed responses by alexander-alderman-webb in #5580
  • Replace mocks with httpx in non-MCP tool tests by alexander-alderman-webb in #5602
  • Replace mocks with httpx in MCP tool tests by alexander-alderman-webb in #5605
  • Replace mocks with httpx in handoff tests by alexander-alderman-webb in #5604
  • Replace mocks with httpx in API error test by alexander-alderman-webb in #5601
  • Replace mocks with httpx in non-error single-response tests by alexander-alderman-webb in #5600
  • Remove test for unreachable state by alexander-alderman-webb in #5584
  • Expect namespace tool field for new openai versions by alexander-alderman-webb in #5599

Other

  • (graphene) Simplify span creation by sentrivana in #5648
  • (httpx) Resolve type checking failures by alexander-alderman-webb in #5626
  • (pyramid) Support alpha suffixes in version parsing by alexander-alderman-webb in #5618
  • (rust) Don't implement separate scope management by sentrivana in #5639
  • (strawberry) Simplify span creation by sentrivana in #5647
  • 🤖 Update test matrix with new releases (03/16) by github-actions in #5671
  • Remove custom warden action by sentrivana in #5653
  • Add httpx to linting requirements by alexander-alderman-webb in #5644
  • Remove CodeQL action by sentrivana in #5616
  • Normalize dots in package names in populate_tox.py by alexander-alderman-webb in #5574
  • Do not run actions on potel-base by sentrivana in #5614

🤖 This preview updates automatically when you update the PR.

@github-actions
Contributor

github-actions bot commented Mar 17, 2026

Codecov Results 📊

134 passed | Total: 134 | Pass Rate: 100% | Execution Time: 20.60s

All tests are passing successfully.

❌ Patch coverage is 2.56%. Project has 13572 uncovered lines.

Files with missing lines (2)
File | Patch % | Lines
_batcher.py | 37.36% | ⚠️ 57 Missing
_span_batcher.py | 28.57% | ⚠️ 55 Missing

Generated by Codecov Action

@github-actions
Contributor

Codecov Results 📊

52 passed | Total: 52 | Pass Rate: 100% | Execution Time: 6.78s

All tests are passing successfully.

❌ Patch coverage is 5.56%. Project has 15300 uncovered lines.

Files with missing lines (1)
File | Patch % | Lines
_batcher.py | 37.78% | ⚠️ 56 Missing

Generated by Codecov Action

@sentrivana sentrivana marked this pull request as ready for review March 17, 2026 12:40
@sentrivana sentrivana requested a review from a team as a code owner March 17, 2026 12:40
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: flush() unconditionally clears re-entry guard flag
    • Batcher.flush() now preserves and restores the thread-local re-entry flag instead of unconditionally clearing it, and a regression test covers this case.

Or push these changes by commenting:

@cursor push 4fe16a2160
Preview (4fe16a2160)
diff --git a/sentry_sdk/_batcher.py b/sentry_sdk/_batcher.py
--- a/sentry_sdk/_batcher.py
+++ b/sentry_sdk/_batcher.py
@@ -115,11 +115,12 @@
         self._flusher = None
 
     def flush(self) -> None:
+        old_flag = getattr(self._active, "flag", False)
         self._active.flag = True
         try:
             self._flush()
         finally:
-            self._active.flag = False
+            self._active.flag = old_flag
 
     def _add_to_envelope(self, envelope: "Envelope") -> None:
         envelope.add_item(

diff --git a/tests/test_batcher.py b/tests/test_batcher.py
new file mode 100644
--- /dev/null
+++ b/tests/test_batcher.py
@@ -1,0 +1,28 @@
+from sentry_sdk._batcher import Batcher
+
+
+class DummyBatcher(Batcher[int]):
+    def __init__(self) -> None:
+        super().__init__(
+            capture_func=lambda envelope: None,
+            record_lost_func=lambda *args, **kwargs: None,
+        )
+        self.flush_calls = 0
+
+    def _flush(self):
+        self.flush_calls += 1
+        return None
+
+    @staticmethod
+    def _to_transport_format(item):
+        return item
+
+
+def test_flush_restores_existing_reentry_guard():
+    batcher = DummyBatcher()
+    batcher._active.flag = True
+
+    batcher.flush()
+
+    assert batcher._active.flag is True
+    assert batcher.flush_calls == 1




@minimum_python_37
@pytest.mark.timeout(5)
Contributor Author

This will actually make the test fail after 5s if it gets deadlocked. Tested on master.
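The idea of turning a hang into a test failure can also be illustrated with the standard library alone. The sketch below is not what the PR's test does (it relies on the `@pytest.mark.timeout(5)` decorator quoted above); instead it swaps in a worker thread with a bounded `join`. The helper names `completes_within` and `deadlocks` are hypothetical.

```python
import threading
from typing import Callable


def completes_within(func: Callable[[], None], timeout: float) -> bool:
    """Run func in a daemon thread; report whether it finished in time."""
    worker = threading.Thread(target=func, daemon=True)
    worker.start()
    worker.join(timeout)
    # A still-running worker after the timeout is treated as a hang.
    return not worker.is_alive()


def deadlocks() -> None:
    """Acquire a non-reentrant lock twice on one thread; never returns."""
    lock = threading.Lock()
    with lock:
        with lock:
            pass
```

A test could then assert `completes_within(some_flush_scenario, 5.0)`, where `some_flush_scenario` stands in for whatever code path is being checked. The worker is a daemon thread so that a stuck call does not keep the interpreter alive after the test run ends.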

@sentrivana sentrivana merged commit b29c4bb into master Mar 17, 2026
158 checks passed
@sentrivana sentrivana deleted the ivana/recursive-guard-in-batcher branch March 17, 2026 14:02

Development

Successfully merging this pull request may close these issues.

Deadlock in LogBatcher when ResourceWarning emitted during flush (enable_logs=True)

2 participants