Skip to content

Add controller support for rejecting out-of-retention segment uploads#18486

Open
shuturmurgh wants to merge 4 commits into
apache:masterfrom
shuturmurgh:master
Open

Add controller support for rejecting out-of-retention segment uploads#18486
shuturmurgh wants to merge 4 commits into
apache:masterfrom
shuturmurgh:master

Conversation

@shuturmurgh
Copy link
Copy Markdown

Fixes #18485

Summary

Add controller capability to reject segment uploads when the segment is already outside the table’s configured data retention window (time-based retention), using RetentionUtils so the boundary matches retention manager behavior.

  • New controller property: controller.segment.upload.rejectOutOfRetention.enabled (default false). No behavior change until it is set to true.
  • Wired on single-segment upload only (PinotSegmentUploadDownloadRestletResourceSegmentValidationUtils.rejectUploadIfOutOfRetention). METADATA batch upload and reingest upload paths are unchanged.
  • Retention check is skipped for OFFLINE tables whose batch segment ingestion type is not APPEND (completed-segment semantics). REALTIME and OFFLINE with APPEND ingestion are evaluated when the flag is on and retention parses successfully.
  • Shared parsing / purgeability helpers live in RetentionUtils (parseTableDataRetentionMillis, SegmentMetadata-based isPurgeable, segmentMetadataEndTimeMillis). Configuration is surfaced via ControllerConf.

Release notes

  • New controller configuration: controller.segment.upload.rejectOutOfRetention.enabled (default false). When true, single-segment upload may return 403 if the segment is outside the table data retention window as defined by SegmentsValidationAndRetentionConfig retention time value/unit.

Introduce controller.segment.upload.rejectOutOfRetention.enabled (default false). When enabled, single-segment upload rejects segments past the table data retention window using RetentionUtils (aligned with time retention). METADATA batch upload and reingest upload paths are unchanged.

- RetentionUtils: parseTableDataRetentionMillis, SegmentMetadata isPurgeable, segmentMetadataEndTimeMillis, shared isPurgeableInternal
- SegmentValidationUtils.rejectUploadIfOutOfRetention (OFFLINE non-APPEND skip)
- Tests for RetentionUtils, SegmentValidationUtils, and ControllerConf

Co-authored-by: Cursor <cursoragent@cursor.com>
if (TimeUtils.timeValueInValidRange(endTimeMs)) {
return currentTimeMs - endTimeMs > retentionMs;
}
if (useCreationTimeFallback && TimeUtils.timeValueInValidRange(creationTimeMs)) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The creationTime fallback may be of no value for offline tables. It means that if a valid endTime was not computed for data then we fall back to the time when the segment was created. In many cases the createTimeMs will be closer to now(). so currentTimeMs - creationTimeMs > retentionMs; will always be false in fallback.

In case of upsertCompaction like minion tasks, the creationTime is preserved from the original segments so thay may be very old. The segments would get rejected here. It looks like a similar handling is done previously on minion side to avoid uploading segments near retention boundary: #18285

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tarun11Mavani, can you also take a look from upsertCompaction perspective.

Copy link
Copy Markdown
Author

@shuturmurgh shuturmurgh May 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For single-segment upload we now call RetentionUtils.isPurgeable(..., useCreationTimeFallback=false), so we only use the segment’s data end time from file metadata.
Invalid/missing end time remains fail-open (no reject).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense.

Comment thread pinot-common/src/main/java/org/apache/pinot/common/utils/RetentionUtils.java Outdated
Centralize offline APPEND retention gating in RetentionUtils.shouldManageTimeBasedDataRetention
and reuse it from RetentionManager and SegmentValidationUtils. Log parse failures for table
retention config, rename getSegmentMetadataEndTimeMillis with clearer javadoc, and reject uploads
using segment data end time only (no index-creation fallback). Emit
OUT_OF_RETENTION_SEGMENT_UPLOAD_REJECTED when rejecting; document 403 behavior in Javadoc.

Co-authored-by: Cursor <cursoragent@cursor.com>
Comment thread pinot-common/src/main/java/org/apache/pinot/common/utils/RetentionUtils.java Outdated
shuturmurgh and others added 2 commits May 15, 2026 12:22
Use getTimeUnit and getEndTime only instead of Joda Interval; expand RetentionUtilsTest coverage for the new contract.

Co-authored-by: Cursor <cursoragent@cursor.com>
Convert RetentionUtils, ControllerConf, ControllerMeter, and SegmentValidationUtils docs to /// style for consistency with PR review.

Co-authored-by: Cursor <cursoragent@cursor.com>
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 15, 2026

Codecov Report

❌ Patch coverage is 83.33333% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 34.99%. Comparing base (5ebd445) to head (31f1ba0).
⚠️ Report is 32 commits behind head on master.

Files with missing lines Patch % Lines
.../org/apache/pinot/common/utils/RetentionUtils.java 78.57% 3 Missing and 3 partials ⚠️
.../controller/api/upload/SegmentValidationUtils.java 83.33% 1 Missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18486      +/-   ##
============================================
- Coverage     34.99%   34.99%   -0.01%     
  Complexity      869      869              
============================================
  Files          3256     3266      +10     
  Lines        199548   199941     +393     
  Branches      30986    31055      +69     
============================================
+ Hits          69840    69961     +121     
- Misses       123570   123822     +252     
- Partials       6138     6158      +20     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-21 34.99% <83.33%> (-0.01%) ⬇️
temurin 34.99% <83.33%> (-0.01%) ⬇️
unittests 34.98% <83.33%> (-0.01%) ⬇️
unittests2 34.98% <83.33%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Controller should reject out-of-retention segments

4 participants