Add controller support for rejecting out-of-retention segment uploads#18486
shuturmurgh wants to merge 4 commits into
Introduce controller.segment.upload.rejectOutOfRetention.enabled (default false). When enabled, single-segment upload rejects segments past the table data retention window using RetentionUtils (aligned with time retention). METADATA batch upload and reingest upload paths are unchanged.

- RetentionUtils: parseTableDataRetentionMillis, SegmentMetadata isPurgeable, segmentMetadataEndTimeMillis, shared isPurgeableInternal
- SegmentValidationUtils.rejectUploadIfOutOfRetention (OFFLINE non-APPEND skip)
- Tests for RetentionUtils, SegmentValidationUtils, and ControllerConf

Co-authored-by: Cursor <cursoragent@cursor.com>
if (TimeUtils.timeValueInValidRange(endTimeMs)) {
  return currentTimeMs - endTimeMs > retentionMs;
}
if (useCreationTimeFallback && TimeUtils.timeValueInValidRange(creationTimeMs)) {
The creationTime fallback may be of no value for offline tables. It means that if a valid endTime could not be computed for the data, we fall back to the time when the segment was created. In many cases creationTimeMs will be close to now(), so in those cases currentTimeMs - creationTimeMs > retentionMs will always be false in the fallback.
For minion tasks such as upsertCompaction, the creationTime is preserved from the original segments, so it may be very old, and those segments would get rejected here. It looks like similar handling was added previously on the minion side to avoid uploading segments near the retention boundary: #18285
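The two cases raised in this comment can be checked with a small arithmetic sketch. It is an illustration only and not code from the PR: the class name, the fixed "now", the 30-day retention, and the segment ages are all assumptions chosen for the example.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not part of the PR.
final class CreationTimeFallbackDemo {
  // Assumed values for the example: a fixed "now" and a 30-day retention.
  static final long NOW_MS = 1_704_067_200_000L; // 2024-01-01T00:00:00Z
  static final long RETENTION_MS = TimeUnit.DAYS.toMillis(30);

  private CreationTimeFallbackDemo() {
  }

  // Same comparison as the fallback branch quoted in the diff above.
  static boolean fallbackSaysPurgeable(long currentTimeMs, long creationTimeMs, long retentionMs) {
    return currentTimeMs - creationTimeMs > retentionMs;
  }

  // Segment built minutes before upload: its age is far below the retention,
  // so the fallback never flags it, whatever time range the data covers.
  static boolean freshlyBuiltSegment() {
    long creationTimeMs = NOW_MS - TimeUnit.MINUTES.toMillis(5);
    return fallbackSaysPurgeable(NOW_MS, creationTimeMs, RETENTION_MS);
  }

  // Compacted segment that kept a 45-day-old creation time from its source
  // segments: the fallback flags it even though it was just produced.
  static boolean compactedSegmentWithPreservedCreationTime() {
    long creationTimeMs = NOW_MS - TimeUnit.DAYS.toMillis(45);
    return fallbackSaysPurgeable(NOW_MS, creationTimeMs, RETENTION_MS);
  }
}
```

Under these assumed numbers the freshly built segment is never flagged and the compacted one always is, which is the asymmetry the comment describes.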
@tarun11Mavani, can you also take a look from the upsertCompaction perspective?
For single-segment upload we now call RetentionUtils.isPurgeable(..., useCreationTimeFallback=false), so we only use the segment’s data end time from file metadata.
Invalid/missing end time remains fail-open (no reject).
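A rough sketch of the resulting contract, assuming the shape of the diff hunk quoted earlier in this thread, is shown below. The class name and the validity bounds are assumptions standing in for TimeUtils.timeValueInValidRange; the actual method in RetentionUtils may differ in signature and detail.

```java
import java.time.Instant;

// Hypothetical stand-in for the check discussed in this thread; not the PR's code.
final class EndTimeOnlyRetentionCheck {
  // Assumed validity window, used here in place of TimeUtils.timeValueInValidRange.
  static final long ASSUMED_MIN_VALID_MS = Instant.parse("1971-01-01T00:00:00Z").toEpochMilli();
  static final long ASSUMED_MAX_VALID_MS = Instant.parse("2071-01-01T00:00:00Z").toEpochMilli();

  private EndTimeOnlyRetentionCheck() {
  }

  static boolean inValidRange(long timeMs) {
    return timeMs >= ASSUMED_MIN_VALID_MS && timeMs <= ASSUMED_MAX_VALID_MS;
  }

  static boolean isPurgeable(long currentTimeMs, long endTimeMs, long creationTimeMs, long retentionMs,
      boolean useCreationTimeFallback) {
    // Preferred signal: the data end time recorded in the segment metadata.
    if (inValidRange(endTimeMs)) {
      return currentTimeMs - endTimeMs > retentionMs;
    }
    // Optional fallback; the upload path passes false, so this branch is skipped there.
    if (useCreationTimeFallback && inValidRange(creationTimeMs)) {
      return currentTimeMs - creationTimeMs > retentionMs;
    }
    // Fail-open: with no usable timestamp the segment is not treated as purgeable.
    return false;
  }
}
```

With useCreationTimeFallback=false, a segment whose end time is invalid is accepted regardless of its creation time, which addresses the upsertCompaction concern above.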
Centralize offline APPEND retention gating in RetentionUtils.shouldManageTimeBasedDataRetention and reuse it from RetentionManager and SegmentValidationUtils. Log parse failures for table retention config, rename getSegmentMetadataEndTimeMillis with clearer javadoc, and reject uploads using segment data end time only (no index-creation fallback). Emit OUT_OF_RETENTION_SEGMENT_UPLOAD_REJECTED when rejecting; document 403 behavior in Javadoc. Co-authored-by: Cursor <cursoragent@cursor.com>
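The gating rule that this commit centralizes could look roughly like the sketch below. The method name is borrowed from the commit message, but the signature, the enum, and the plain string comparison are assumptions for illustration; the real RetentionUtils.shouldManageTimeBasedDataRetention presumably works on the table config.

```java
// Hypothetical sketch of the table-level gate; signature and types are assumed.
final class RetentionGatingSketch {
  enum TableType { OFFLINE, REALTIME }

  private RetentionGatingSketch() {
  }

  // Takes the already-resolved batch ingestion type (for example "APPEND" or "REFRESH").
  static boolean shouldManageTimeBasedDataRetention(TableType tableType, String batchIngestionType) {
    // REALTIME tables are always subject to time-based retention.
    if (tableType == TableType.REALTIME) {
      return true;
    }
    // OFFLINE tables are only managed when they ingest in APPEND mode.
    return "APPEND".equalsIgnoreCase(batchIngestionType);
  }
}
```

Sharing one predicate between RetentionManager and SegmentValidationUtils keeps the upload check from rejecting segments for tables whose retention is never enforced in the first place.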
Use getTimeUnit and getEndTime only instead of Joda Interval; expand RetentionUtilsTest coverage for the new contract. Co-authored-by: Cursor <cursoragent@cursor.com>
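As a minimal sketch of what reading the end time from a unit and a raw value amounts to, assuming the metadata exposes a java.util.concurrent.TimeUnit and a long end time (the helper name and the -1 sentinel are hypothetical):

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper; the sentinel and the method name are assumptions.
final class SegmentEndTimeSketch {
  static final long INVALID_END_TIME_MS = -1L;

  private SegmentEndTimeSketch() {
  }

  // Converts the raw end time to epoch millis without building a Joda Interval.
  static long endTimeMillis(TimeUnit timeUnit, long endTime) {
    if (timeUnit == null) {
      // No time unit in the metadata: report an invalid value so callers can fail open.
      return INVALID_END_TIME_MS;
    }
    return timeUnit.toMillis(endTime);
  }
}
```

For example, an end time of 19723 with unit DAYS converts to 1704067200000 ms, i.e. 2024-01-01T00:00:00Z.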
Convert RetentionUtils, ControllerConf, ControllerMeter, and SegmentValidationUtils docs to /// style for consistency with PR review. Co-authored-by: Cursor <cursoragent@cursor.com>
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #18486 +/- ##
============================================
- Coverage 34.99% 34.99% -0.01%
Complexity 869 869
============================================
Files 3256 3266 +10
Lines 199548 199941 +393
Branches 30986 31055 +69
============================================
+ Hits 69840 69961 +121
- Misses 123570 123822 +252
- Partials 6138 6158 +20
Fixes #18485
Summary
Add controller capability to reject segment uploads when the segment is already outside the table’s configured data retention window (time-based retention), using RetentionUtils so the boundary matches retention manager behavior.
- controller.segment.upload.rejectOutOfRetention.enabled (default false). No behavior change until it is set to true.
- Single-segment upload (PinotSegmentUploadDownloadRestletResource → SegmentValidationUtils.rejectUploadIfOutOfRetention). METADATA batch upload and reingest upload paths are unchanged.
- Skipped for OFFLINE tables that are not APPEND (completed-segment semantics). REALTIME and OFFLINE with APPEND ingestion are evaluated when the flag is on and retention parses successfully.
- RetentionUtils (parseTableDataRetentionMillis, SegmentMetadata-based isPurgeable, segmentMetadataEndTimeMillis). Configuration is surfaced via ControllerConf.
Release notes
- controller.segment.upload.rejectOutOfRetention.enabled (default false). When true, single-segment upload may return 403 if the segment is outside the table data retention window as defined by SegmentsValidationAndRetentionConfig retention time value/unit.
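The overall decision described in the summary and release notes can be sketched end to end as follows. This is a simplified model under stated assumptions, not the controller code: the class, the Decision enum, the parameter list, and the "end time must be positive" validity check are all hypothetical, and the retention value/unit are passed as plain strings.

```java
import java.util.Locale;
import java.util.OptionalLong;
import java.util.concurrent.TimeUnit;

// Hypothetical model of the upload decision; names and signatures are assumed.
final class UploadRetentionFlowSketch {
  enum Decision { ACCEPT, REJECT_403 }

  private UploadRetentionFlowSketch() {
  }

  // Parses retention value/unit (for example "30" and "DAYS"); empty when unparsable.
  static OptionalLong parseRetentionMillis(String value, String unit) {
    if (value == null || unit == null) {
      return OptionalLong.empty();
    }
    try {
      long parsedValue = Long.parseLong(value.trim());
      TimeUnit parsedUnit = TimeUnit.valueOf(unit.trim().toUpperCase(Locale.ROOT));
      return OptionalLong.of(parsedUnit.toMillis(parsedValue));
    } catch (IllegalArgumentException e) {
      // Covers NumberFormatException and unknown unit names.
      return OptionalLong.empty();
    }
  }

  static Decision decide(boolean flagEnabled, boolean tableIsManaged, String retentionValue,
      String retentionUnit, long segmentEndTimeMs, long nowMs) {
    // Flag off (the default) or table not under time-based retention: nothing changes.
    if (!flagEnabled || !tableIsManaged) {
      return Decision.ACCEPT;
    }
    OptionalLong retentionMs = parseRetentionMillis(retentionValue, retentionUnit);
    // Unparsable retention or unusable end time: fail open and accept the upload.
    if (!retentionMs.isPresent() || segmentEndTimeMs <= 0) {
      return Decision.ACCEPT;
    }
    return nowMs - segmentEndTimeMs > retentionMs.getAsLong() ? Decision.REJECT_403 : Decision.ACCEPT;
  }
}
```

Every uncertain input resolves to ACCEPT in this sketch, so only a segment with a valid end time that is clearly older than a parsable retention window produces the 403.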