out_cloudwatch_logs: Plug SEGV on not found error#11983

Open
cosmo0920 wants to merge 4 commits into master from cosmo0920-plug-segv-on-cloudwatch_logs
cosmo0920 (Contributor) commented Jun 23, 2026

This PR makes out_cloudwatch_logs treat a not-found exception as an unrecoverable error instead of crashing with a SEGV.

Closes #11959.


Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • Run local packaging test showing all targets (including any new ones) build.
  • Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • Documentation required for this feature

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • Bug Fixes

    • CloudWatch log delivery now recognizes “resource not found” failures as non-retriable, stopping repeated send attempts and improving recovery behavior.
    • Flush behavior now returns the correct error outcome when a non-retriable delivery issue is detected.
  • Tests

    • Updated CloudWatch runtime tests to better model existing/not-found scenarios.
    • Added and enhanced CloudWatch mock call-count tracking to validate when streams are created and when log events are submitted.
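The call-count tracking described in the summary above could look roughly like the following. This is a minimal sketch, not the code in the PR: the names mock_count_reset, mock_count_record, mock_count_get and mock_create_after_put_get are hypothetical stand-ins, and the rule that a CreateLogStream call counts as "create after put" once any PutLogEvents call has been seen is an assumption about what the real helpers track.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the PR's static counters in cloudwatch_api.c. */
static int create_stream_calls;
static int put_events_calls;
static int create_after_put_calls;

/* Reset every counter; a test would call this before its scenario runs. */
void mock_count_reset(void)
{
    create_stream_calls = 0;
    put_events_calls = 0;
    create_after_put_calls = 0;
}

/* Record one mocked API call, keyed by its API name. */
void mock_count_record(const char *api)
{
    if (strcmp(api, "PutLogEvents") == 0) {
        put_events_calls++;
    }
    else if (strcmp(api, "CreateLogStream") == 0) {
        create_stream_calls++;
        /* Assumption: a create seen after any put means a recovery attempt. */
        if (put_events_calls > 0) {
            create_after_put_calls++;
        }
    }
}

/* Return the number of recorded calls for an API name (0 if unknown). */
int mock_count_get(const char *api)
{
    if (strcmp(api, "PutLogEvents") == 0) {
        return put_events_calls;
    }
    if (strcmp(api, "CreateLogStream") == 0) {
        return create_stream_calls;
    }
    return 0;
}

/* Return how many CreateLogStream calls followed a PutLogEvents call. */
int mock_create_after_put_get(void)
{
    return create_after_put_calls;
}

/* One-shot scenario: a single put, and no stream re-creation afterwards. */
void demo_one_shot(void)
{
    mock_count_reset();
    mock_count_record("CreateLogStream");
    mock_count_record("PutLogEvents");
    assert(mock_count_get("PutLogEvents") == 1);
    assert(mock_create_after_put_get() == 0);
}
```

Counting by API name keeps the mock independent of request contents, so a test only has to assert on how many calls of each kind were made and in which order.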

Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
coderabbitai Bot commented Jun 23, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 11cc3dd1-da15-46c3-8190-a4d3796b0c86

📥 Commits

Reviewing files that changed from the base of the PR and between 9b1fcbd and 11c3ea0.

📒 Files selected for processing (2)
  • plugins/out_cloudwatch_logs/cloudwatch_api.c
  • tests/runtime/out_cloudwatch.c
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/runtime/out_cloudwatch.c
  • plugins/out_cloudwatch_logs/cloudwatch_api.c

📝 Walkthrough

The PR fixes a SIGSEGV crash on ResourceNotFoundException by adding a non_retriable_error flag to the cw_flush struct. When put_log_events() receives a ResourceNotFoundException, it parses the error, sets the flag, and returns -1. process_and_send() exits early on that flag, and cb_cloudwatch_flush returns FLB_ERROR instead of FLB_RETRY. Mock call-count helpers are added for test infrastructure, and runtime tests are updated to validate the one-shot delivery failure behavior.
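The flag propagation in the walkthrough can be sketched as below. This is an illustration under stated assumptions, not the plugin's code: struct sketch_flush, the sketch_* functions and the SKETCH_* return codes are hypothetical stand-ins for cw_flush, put_log_events(), process_and_send(), cb_cloudwatch_flush and the FLB_* codes, and the error code is passed in as a plain string instead of being parsed from an HTTP response.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative return codes; these are NOT Fluent Bit's real FLB_* values. */
enum { SKETCH_OK, SKETCH_RETRY, SKETCH_ERROR };

/* Stand-in for cw_flush, reduced to the one field this sketch needs. */
struct sketch_flush {
    int non_retriable_error;   /* set once a not-found failure is seen */
};

/* Send one batch; error_code is NULL when the mocked call succeeds. */
int sketch_put_log_events(struct sketch_flush *buf, const char *error_code)
{
    if (error_code == NULL) {
        return 0;
    }
    if (strcmp(error_code, "ResourceNotFoundException") == 0) {
        buf->non_retriable_error = 1;   /* mark the failure as final */
    }
    return -1;
}

/* Send batches in order, stopping at the first failure or once flagged. */
int sketch_process_and_send(struct sketch_flush *buf,
                            const char **errors, int n)
{
    int i;

    for (i = 0; i < n; i++) {
        if (buf->non_retriable_error) {
            return -1;                  /* exit early, nothing more to send */
        }
        if (sketch_put_log_events(buf, errors[i]) < 0) {
            return -1;
        }
    }
    return 0;
}

/* Flush callback: map the outcome to a retry or a final error. */
int sketch_cb_flush(struct sketch_flush *buf, const char **errors, int n)
{
    if (sketch_process_and_send(buf, errors, n) < 0) {
        return buf->non_retriable_error ? SKETCH_ERROR : SKETCH_RETRY;
    }
    return SKETCH_OK;
}

/* A not-found failure on the second batch ends the flush without a retry. */
void demo_not_found(void)
{
    struct sketch_flush buf = { 0 };
    const char *errors[] = { NULL, "ResourceNotFoundException", NULL };

    assert(sketch_cb_flush(&buf, errors, 3) == SKETCH_ERROR);
    assert(buf.non_retriable_error == 1);
}
```

Keeping the decision in a flag on the flush buffer lets the lowest layer classify the failure while the flush callback alone chooses the return code.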

Changes

CloudWatch ResourceNotFoundException Non-Retriable Error Handling

| Layer | File(s) | Summary |
| --- | --- | --- |
| cw_flush struct field and mock API declarations | plugins/out_cloudwatch_logs/cloudwatch_logs.h, plugins/out_cloudwatch_logs/cloudwatch_api.h | Adds non_retriable_error int field to cw_flush struct to mark a non-retriable delivery failure, and declares three mock call-count helpers (cloudwatch_mock_call_count_reset, cloudwatch_mock_call_count_get, cloudwatch_mock_create_after_put_count_get) in the header. |
| Mock call counter implementation | plugins/out_cloudwatch_logs/cloudwatch_api.c | Adds static counters for CreateLogStream and PutLogEvents calls, implements the three helper functions to reset and query call counts, and updates mock_http_call() to increment counters based on API name and track create-after-put sequencing. |
| ResourceNotFoundException detection in put_log_events() | plugins/out_cloudwatch_logs/cloudwatch_api.c | Introduces a local error variable to parse AWS error payloads; when the error code is ResourceNotFoundException, sets buf->non_retriable_error = FLB_TRUE, clears stream expiration, destroys the HTTP client, and returns -1. |
| Non-retriable error propagation | plugins/out_cloudwatch_logs/cloudwatch_api.c, plugins/out_cloudwatch_logs/cloudwatch_logs.c | process_and_send() returns -1 immediately when buf->non_retriable_error is set. cb_cloudwatch_flush branches on the flag to return FLB_ERROR instead of FLB_RETRY when non-retriable. |
| Stream expiration cleanup | plugins/out_cloudwatch_logs/cloudwatch_api.c | Reorders get_or_create_log_stream() iteration logic to first expire and destroy stale streams before evaluating name/group matches, using an else-if branch to prevent false matches. |
| Test infrastructure and validation | tests/runtime/out_cloudwatch.c | Introduces CLOUDWATCH_ERROR_ALREADY_EXISTS constant across already-exists tests, adds Windows setenv/unsetenv macros, resets mock counters before the not-found test, and asserts PutLogEvents call counts and create-after-put sequencing to validate one-shot behavior. |
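
The "Stream expiration cleanup" entry above describes checking expiry before matching. A minimal sketch of that ordering follows, with several assumptions: struct sketch_stream and sketch_find_stream are hypothetical, a plain array replaces the plugin's own stream list, a destroyed flag stands in for actually freeing the stream, and "expired" is taken to mean expiration < now.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical stand-in for a cached log stream entry. */
struct sketch_stream {
    const char *name;
    const char *group;
    time_t expiration;   /* assumption: stale when expiration < now */
    int destroyed;       /* stands in for removing and freeing the entry */
};

/*
 * Look up a live stream by name and group. Stale entries are destroyed
 * first; the else-if ensures a stale entry is never returned as a match.
 * Returns NULL when the caller would have to create the stream.
 */
struct sketch_stream *sketch_find_stream(struct sketch_stream *streams,
                                         size_t n, const char *name,
                                         const char *group, time_t now)
{
    size_t i;
    struct sketch_stream *s;

    for (i = 0; i < n; i++) {
        s = &streams[i];
        if (s->destroyed) {
            continue;
        }
        if (s->expiration < now) {
            s->destroyed = 1;
        }
        else if (strcmp(s->name, name) == 0 &&
                 strcmp(s->group, group) == 0) {
            return s;
        }
    }
    return NULL;
}

/* A stale entry with a matching name is skipped in favour of the live one. */
void demo_expired_first(void)
{
    struct sketch_stream streams[] = {
        { "app", "grp", 50, 0 },    /* stale at now = 100 */
        { "app", "grp", 200, 0 },   /* live */
    };

    assert(sketch_find_stream(streams, 2, "app", "grp", 100) == &streams[1]);
    assert(streams[0].destroyed == 1);
}
```

With the match check placed before the expiry check, the stale first entry would have been returned, which is the false match the reordering is meant to prevent.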

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • fluent/fluent-bit#11966: Complementary PR touching CloudWatch Logs error handling for ResourceNotFoundException in the generic error reporter layer alongside this plugin-level fix.

Suggested reviewers

  • PettitWesley
  • edsiper

Poem

🐰 A stream vanished briefly — but wait, don't you fret!
No more SIGSEGV crashing; the fix is all set.
One flag marks the error, no loops to retry,
FLB_ERROR returned, the mock counts count high.
Mock assertions pass now: one call, then no more,
Little rabbit's relieved — logs flow as before! 🌟

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main fix: addressing a SIGSEGV crash when ResourceNotFoundException is encountered on an existing log stream. |
| Linked Issues check | ✅ Passed | Changes address the core issue: detecting ResourceNotFoundException, marking it non-retriable, cleaning up resources, and preventing the crash. New test assertions verify proper handling of the error condition. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to fixing the ResourceNotFoundException handling: mock infrastructure for testing, error flag addition, error detection logic, and corresponding test updates. |

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9b1fcbd710

Comment thread plugins/out_cloudwatch_logs/cloudwatch_api.c
@cosmo0920 cosmo0920 added this to the Fluent Bit v5.0.8 milestone Jun 23, 2026
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
Development

Successfully merging this pull request may close these issues.

[cloudwatch_logs] SIGSEGV when PutLogEvents returns ResourceNotFoundException on an existing stream (plugin crashes instead of recreating/retrying)

2 participants