out_s3: preserve $INDEX sequence state across restarts#11988

Open
dacort wants to merge 1 commit into
fluent:masterfrom
dacort:dacort--out_s3-preserve-seq-index

dacort commented Jun 23, 2026

Summary

Fixes #11987.

The S3 output persists the $INDEX sequence counter as a plain file under the metadata stream directory (<store_dir>/<bucket>/sequence/index_metadata/seq_index_<id>). Because this file is not a Chunk I/O chunk, the sequence fstore stream has zero registered files, and on shutdown flb_fstore_destroy() deletes every stream with zero files, recursively removing the directory and the persisted $INDEX along with it. After a graceful restart the index resets to 0 and uploads overwrite previously written objects, contradicting the documented behavior that $INDEX continues from its last value when store_dir is on a persistent disk.

This detaches the metadata stream reference in s3_store_exit() (without deleting the directory) before the store is destroyed, so the index file survives on disk and init_seq_index() recovers it on the next start.
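The overwrite described in the summary follows from how $INDEX feeds object keys: a counter that restarts at 0 regenerates keys that already exist in the bucket. The sketch below illustrates this under stated assumptions: the key template events_$INDEX is chosen only to match the events_0, events_1 names in the commit message, expand_index() and restart_reuses_key() are hypothetical helpers (not the plugin's key formatter), and the "next index is 3 after three uploads" bookkeeping is simplified.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not the out_s3 key formatter): expand the first
 * "$INDEX" token in a key template with the current sequence counter.
 * Returns 0 on success, -1 on encoding error or if `out` is too small.
 */
int expand_index(const char *tmpl, unsigned long long index,
                 char *out, size_t out_len)
{
    const char *tok;
    int n;

    assert(tmpl != NULL && out != NULL && out_len > 0);
    tok = strstr(tmpl, "$INDEX");
    if (tok == NULL) {
        /* No placeholder: the key is the template itself. */
        n = snprintf(out, out_len, "%s", tmpl);
    }
    else {
        /* prefix + decimal counter + whatever follows the token */
        n = snprintf(out, out_len, "%.*s%llu%s",
                     (int) (tok - tmpl), tmpl, index,
                     tok + strlen("$INDEX"));
    }
    return (n < 0 || (size_t) n >= out_len) ? -1 : 0;
}

/*
 * Returns 1 if the first key written after a restart collides with a key
 * written before it, 0 if not, -1 on formatting error.
 * `persisted` says whether the counter file survived the shutdown.
 */
int restart_reuses_key(int persisted)
{
    char first[64], after_restart[64];
    unsigned long long index = 0;

    if (expand_index("events_$INDEX", index, first, sizeof first) != 0) {
        return -1;
    }
    index = 3;          /* events_0..events_2 uploaded before the restart */
    if (!persisted) {
        index = 0;      /* index file deleted on shutdown: counter resets */
    }
    if (expand_index("events_$INDEX", index,
                     after_restart, sizeof after_restart) != 0) {
        return -1;
    }
    return strcmp(first, after_restart) == 0;
}
```

With the counter lost, restart_reuses_key(0) returns 1 (the next upload targets events_0 again); with the counter preserved, restart_reuses_key(1) returns 0.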

Root cause

  • $INDEX is written as a raw file via init_seq_index() / write_seq_index() in plugins/out_s3/s3.c, inside the sequence fstore stream, under a nested index_metadata/ subdirectory.
  • cio_scan_stream_files() (lib/chunkio/src/cio_scan.c) only registers DT_REG entries directly under a stream dir, so the nested index_metadata/ is ignored and the sequence stream has 0 files.
  • flb_fstore_destroy() (src/flb_fstore.c) deletes any stream with files == 0 (flb_fstore_stream_destroy(stream, FLB_TRUE) → cio_stream_delete() → recursive directory delete).
  • s3_store_exit() (plugins/out_s3/s3_store.c) calls flb_fstore_destroy() on shutdown, so the metadata stream and the persisted $INDEX are removed on every graceful exit.
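The first two root-cause bullets can be reproduced outside Fluent Bit. The sketch below is a simplified model, assuming the scan rule is exactly "count DT_REG entries directly under the stream directory" as described above; count_top_level_regular_files() and make_demo_layout() are hypothetical helpers, not chunkio functions, and the /tmp location is arbitrary.

```c
#define _DEFAULT_SOURCE  /* d_type/DT_* and mkdtemp() on glibc */
#include <assert.h>
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Count regular files directly inside `dir`. Subdirectories, and anything
 * nested inside them, are ignored: a simplified model of the scan rule
 * described for cio_scan_stream_files(), not the chunkio code itself.
 * Returns -1 if the directory cannot be opened.
 */
int count_top_level_regular_files(const char *dir)
{
    DIR *d;
    struct dirent *ent;
    int count = 0;

    assert(dir != NULL);
    d = opendir(dir);
    if (d == NULL) {
        return -1;
    }
    while ((ent = readdir(d)) != NULL) {
        int is_reg = (ent->d_type == DT_REG);

        if (ent->d_type == DT_UNKNOWN) {
            /* Some filesystems do not fill d_type; ask lstat() instead. */
            char path[PATH_MAX];
            struct stat st;

            snprintf(path, sizeof path, "%s/%s", dir, ent->d_name);
            is_reg = (lstat(path, &st) == 0 && S_ISREG(st.st_mode));
        }
        if (is_reg) {
            count++;
        }
    }
    closedir(d);
    return count;
}

/*
 * Create a scratch copy of the layout from the PR:
 *   <root>/sequence/index_metadata/seq_index_0
 * The temporary root path is written to `root`. Returns 0 on success.
 */
int make_demo_layout(char *root, size_t root_len)
{
    static const char tmpl[] = "/tmp/s3_seq_demo_XXXXXX";
    char path[PATH_MAX];
    FILE *f;

    if (root_len < sizeof tmpl) {
        return -1;
    }
    memcpy(root, tmpl, sizeof tmpl);
    if (mkdtemp(root) == NULL) {
        return -1;
    }
    snprintf(path, sizeof path, "%s/sequence", root);
    if (mkdir(path, 0755) != 0) {
        return -1;
    }
    snprintf(path, sizeof path, "%s/sequence/index_metadata", root);
    if (mkdir(path, 0755) != 0) {
        return -1;
    }
    snprintf(path, sizeof path, "%s/sequence/index_metadata/seq_index_0", root);
    f = fopen(path, "w");
    if (f == NULL) {
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

Calling count_top_level_regular_files() on <root>/sequence returns 0, while the same call on <root>/sequence/index_metadata returns 1: the counter file exists on disk but is invisible to a top-level scan, which is the "zero files" condition that flb_fstore_destroy() acts on.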

Fix

/* in s3_store_exit(), right after the `if (!ctx->fs) return 0;` guard */
if (ctx->stream_metadata != NULL) {
    flb_fstore_stream_destroy(ctx->stream_metadata, FLB_FALSE);
    ctx->stream_metadata = NULL;
}

flb_fstore_stream_destroy(stream, FLB_FALSE) removes the fstore reference from fs->streams without deleting the on-disk directory; the underlying cio_stream is still freed by the subsequent cio_destroy(). Only the sequence (metadata) stream needs this treatment: multipart_upload_metadata stores real Chunk I/O chunks, so it is not affected by the empty-stream cleanup.
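The effect of detaching first can be modeled in a few lines. This is a toy sketch under stated assumptions: toy_stream, toy_stream_destroy(), toy_fstore_destroy() and sequence_dir_survives() are hypothetical stand-ins for the fstore types and s3_store_exit(), and they encode only the two rules described in this PR (a stream still in the list with zero files is destroyed with delete set; a stream destroyed with delete unset keeps its directory). The chunk count for multipart_upload_metadata is an arbitrary placeholder.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an fstore stream; only the fields this model needs. */
struct toy_stream {
    const char *name;
    int files;        /* registered Chunk I/O files */
    int attached;     /* still linked into the store's stream list */
    int dir_on_disk;  /* stream directory still exists */
};

/* Models flb_fstore_stream_destroy(stream, delete): unlink, optionally rm dir. */
void toy_stream_destroy(struct toy_stream *s, int delete_dir)
{
    assert(s != NULL);
    s->attached = 0;
    if (delete_dir) {
        s->dir_on_disk = 0;
    }
}

/* Models flb_fstore_destroy(): every stream still attached is destroyed,
 * with delete set only when it has zero files. */
void toy_fstore_destroy(struct toy_stream *streams, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (streams[i].attached) {
            toy_stream_destroy(&streams[i], streams[i].files == 0);
        }
    }
}

/* Models s3_store_exit(); returns 1 if sequence/ survives shutdown. */
int sequence_dir_survives(int detach_metadata_first)
{
    struct toy_stream streams[] = {
        { "sequence",                  0, 1, 1 },  /* only the raw index file */
        { "multipart_upload_metadata", 2, 1, 1 },  /* placeholder chunk count */
    };
    struct toy_stream *stream_metadata = &streams[0];

    if (detach_metadata_first && stream_metadata != NULL) {
        /* the fix: detach without deleting the directory */
        toy_stream_destroy(stream_metadata, 0);
        stream_metadata = NULL;
    }
    toy_fstore_destroy(streams, sizeof streams / sizeof streams[0]);
    return streams[0].dir_on_disk;
}
```

sequence_dir_survives(0) models the unpatched shutdown path and returns 0; sequence_dir_survives(1) models the patched s3_store_exit() and returns 1. Detaching works because a stream that is no longer in the list is never visited by the empty-stream check.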

Testing

  • Reproduced the deletion on graceful shutdown: with store_dir on a persistent volume, a marker file placed inside sequence/ and the seq_index_<id> file are both gone after a SIGTERM restart (the sequence/ directory is recreated with a fresh inode), while a marker placed one level above on the same volume survives. This confirms the volume is healthy and that Fluent Bit is deleting its own metadata stream. No "Successfully recovered index" log line appears, and $INDEX resets to 0.
  • With this change the metadata directory is retained across restart, so init_seq_index() takes the read_seq_index() recovery path.
  • Verified the referenced symbols/signatures against the source: ctx->stream_metadata (plugins/out_s3/s3.h), flb_fstore_stream_destroy(struct flb_fstore_stream *, int delete) (include/fluent-bit/flb_fstore.h).
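The recovery-versus-reset behavior checked above reduces to whether the index file exists at startup. A minimal sketch, assuming the counter is stored as decimal text: demo_write_index() and demo_load_index() are hypothetical stand-ins for write_seq_index() and read_seq_index(), whose actual on-disk format is not described in this PR.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for write_seq_index(). The counter is stored as
 * decimal text here; the real out_s3 file format may differ.
 */
int demo_write_index(const char *path, unsigned long long index)
{
    FILE *f;
    int ok;

    assert(path != NULL);
    f = fopen(path, "w");
    if (f == NULL) {
        return -1;
    }
    ok = fprintf(f, "%llu\n", index) > 0;
    if (fclose(f) != 0) {
        ok = 0;
    }
    return ok ? 0 : -1;
}

/*
 * Startup path: recover the persisted index if the file exists, otherwise
 * start from 0. The fallback branch is what the reproduction observes once
 * the sequence/ directory has been deleted on shutdown.
 */
unsigned long long demo_load_index(const char *path)
{
    unsigned long long index = 0;
    FILE *f;

    assert(path != NULL);
    f = fopen(path, "r");
    if (f == NULL) {
        return 0;  /* no file: counter resets */
    }
    if (fscanf(f, "%llu", &index) != 1) {
        index = 0;  /* unreadable contents: treat as a fresh start */
    }
    fclose(f);
    return index;
}
```

Writing 42 and loading it back returns 42 (the recovery path); removing the file first makes the loader return 0, the same reset observed after a SIGTERM restart.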

Notes

  • Present on 4.2.x, 5.0.x, and master (the $INDEX + metadata-stream design is unchanged across these).
  • Not multipart-specific — both PutObject and multipart upload paths use the same init_seq_index() state file.

Summary by CodeRabbit

  • Bug Fixes
    • Fixed an issue where metadata could be lost during graceful shutdown in the S3 storage plugin by improving the shutdown sequence to properly protect persisted index data.

The S3 output stores the $INDEX sequence counter in a plain file under the
metadata stream directory (sequence/index_metadata/seq_index_<id>). This file
is not a Chunk I/O chunk, so the metadata stream has zero registered fstore
files. On (graceful) shutdown, flb_fstore_destroy() recursively deletes any
stream that has no files, which removes the metadata stream and the persisted
$INDEX value. On the next start the index is missing and resets to 0, so
subsequent uploads reuse object keys (events_0, events_1, ...) and overwrite
previously uploaded objects.

This contradicts the documented behavior that $INDEX is saved in store_dir
and continues from its last value after a restart on the same disk.

Detach the metadata stream reference in s3_store_exit() before the store is
destroyed (without deleting the directory), so the persisted index survives a
restart and can be recovered by init_seq_index().

Signed-off-by: Damon P. Cortesi <d.lifehacker@gmail.com>
dacort requested a review from a team as a code owner June 23, 2026 21:36

coderabbitai Bot commented Jun 23, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4a2eeba6-1e23-41c7-94d7-bdeb48c19a1c

📥 Commits

Reviewing files that changed from the base of the PR and between 08162b2 and de44ae3.

📒 Files selected for processing (1)
  • plugins/out_s3/s3_store.c

📝 Walkthrough

In s3_store_exit, a new block is inserted before the existing teardown logic to explicitly detach ctx->stream_metadata by calling flb_fstore_stream_destroy with FLB_FALSE (no directory deletion) and nulling the pointer. This prevents flb_fstore_destroy from treating the empty metadata stream as a candidate for deletion.

Changes

$INDEX metadata stream preservation on shutdown

| Layer / File(s) | Summary |
|---|---|
| Detach metadata stream before fstore destroy (plugins/out_s3/s3_store.c) | Adds a pre-teardown check in s3_store_exit: if ctx->stream_metadata != NULL, calls flb_fstore_stream_destroy(ctx->stream_metadata, FLB_FALSE) and clears the pointer, preventing flb_fstore_destroy from recursively deleting the sequence/ directory that holds the persisted $INDEX file. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~5 minutes

Poem

🐰 Hoppy little index file, hiding in your stream,
No more getting swept away — you'll live through shutdown's dream!
A FLB_FALSE guards your home, your sequence safe and sound,
Next restart picks right up where you last were found.
The rabbit cheers: no overwrites, the $INDEX holds its ground! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'out_s3: preserve $INDEX sequence state across restarts' directly and clearly summarizes the main change: preventing $INDEX from resetting during graceful restarts. |
| Linked Issues check | ✅ Passed | The PR fully addresses the requirements in issue #11987: it detaches the metadata stream before fstore destruction, preserving the persisted $INDEX file across restarts as documented. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to the objective of preventing $INDEX reset; no extraneous modifications to unrelated code are present. |

dacort commented Jun 24, 2026

PR follow-up: the exit-only version of this fix is not sufficient. End-to-end testing showed that the sequence/ stream can still be removed before the next init_seq_index() recovery check, because the stream has no valid Chunk I/O files and fstore treats it as empty.

I am updating the proposed approach to make the sequence stream non-empty by creating a valid marker chunk directly under sequence/ whenever $INDEX is enabled, while keeping the existing raw sequence/index_metadata/seq_index_<id> file format for the actual counter. The S3 store code also needs to skip the sequence stream when mapping old buffered files and when detecting an old upload backlog, so the marker is not treated as data to upload.

This should cover both classes of deletion: graceful-exit fstore destruction and any startup/load cleanup path that prunes empty streams.
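The backlog-detection change in the follow-up can be sketched as a name filter over the streams found at startup. This is an illustration under assumptions: demo_stream, is_bookkeeping_stream() and count_backlog_files() are hypothetical, the only excluded name is "sequence" (taken from the directory layout above), and any data-stream name used with it is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view of a stream discovered under store_dir at startup. */
struct demo_stream {
    const char *name;
    int files;  /* valid Chunk I/O files in the stream directory */
};

/*
 * Bookkeeping streams hold plugin state, not buffered records. Once the
 * marker chunk makes "sequence" non-empty, it has to be excluded by name.
 */
int is_bookkeeping_stream(const char *name)
{
    return strcmp(name, "sequence") == 0;
}

/* Count files that should be treated as an old upload backlog. */
int count_backlog_files(const struct demo_stream *streams, size_t n)
{
    int total = 0;

    assert(streams != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (is_bookkeeping_stream(streams[i].name)) {
            continue;  /* the marker chunk must never be uploaded */
        }
        total += streams[i].files;
    }
    return total;
}
```

For example, a store holding a sequence stream with only its marker chunk plus a data stream with 3 buffered files yields a backlog of 3, and a store holding only the marker yields 0.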

Development

Successfully merging this pull request may close these issues.

out_s3: $INDEX sequence index resets on restart because the metadata stream is deleted on shutdown (causes object overwrites)

1 participant