Skip to content

fix: Fix invalid time comparison#19603

Open
vivek807 wants to merge 11 commits into
apache:masterfrom
deep-bi:deep/feature/DRUID-125-Fix-invalid-time-comparison
Open

fix: Fix invalid time comparison#19603
vivek807 wants to merge 11 commits into
apache:masterfrom
deep-bi:deep/feature/DRUID-125-Fix-invalid-time-comparison

Conversation

@vivek807

Copy link
Copy Markdown
Contributor

Fixes #19602.

Description

forceOrWaitOngoingDatabasePoll in SqlSegmentsMetadataManager checks whether a concurrent OnDemandDatabasePoll that completed while the calling thread waited for the write lock makes a new poll unnecessary. This check was always false due to a clock mismatch, causing an extra database poll on every contended call.

Fixed invalid clock comparison in forceOrWaitOngoingDatabasePoll

OnDemandDatabasePoll.initiationTimeNanos is assigned via System.nanoTime(), which is a monotonic clock with an arbitrary JVM-internal reference point.
The comparison value was constructed as:

long checkStartTimeNanos = TimeUnit.MILLISECONDS.toNanos(checkStartTime);
// checkStartTime = System.currentTimeMillis()

System.currentTimeMillis() returns milliseconds since the Unix epoch (~1.7 × 10¹² ms). Converting to nanoseconds yields ~1.7 × 10²¹ ns, which is orders of magnitude larger than any System.nanoTime() value. The condition initiationTimeNanos > checkStartTimeNanos was therefore always false, so an in-progress on-demand poll was never recognised as sufficient.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@github-actions github-actions Bot added the GHA label Jun 19, 2026
@vivek807 vivek807 force-pushed the deep/feature/DRUID-125-Fix-invalid-time-comparison branch from 534f417 to 81c9773 Compare June 19, 2026 11:25

@FrankChen021 FrankChen021 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Severity Findings
P0 0
P1 0
P2 1
P3 0
Total 1
Severity Findings
P0 0
P1 0
P2 1
P3 0
Total 1

Reviewed 1 of 1 changed files.


This is an automated review by Codex GPT-5.5

Comment thread server/src/main/java/org/apache/druid/metadata/SqlSegmentsMetadataManager.java Outdated
@vivek807 vivek807 requested a review from FrankChen021 June 22, 2026 09:07

@FrankChen021 FrankChen021 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Severity Findings
P0 0
P1 0
P2 0
P3 1
Total 1

Reviewed 2 of 2 changed files.


This is an automated review by Codex GPT-5.5

@maytasm

maytasm commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

@vivek807
Thanks for flagging this issue. You are right, the values are not comparable.
I'm thinking that maybe instead we should standardize the timestamps in OnDemandDatabasePoll and PeriodicDatabasePoll to use the same format (i.e. both use System.nanoTime() or both use System.currentTimeMillis()). This can avoid similar confusion / bug like this in the future. Wdyt? Thanks!

@vivek807

Copy link
Copy Markdown
Contributor Author

@vivek807 Thanks for flagging this issue. You are right, the values are not comparable. I'm thinking that maybe instead we should standardize the timestamps in OnDemandDatabasePoll and PeriodicDatabasePoll to use the same format (i.e. both use System.nanoTime() or both use System.currentTimeMillis()). This can avoid similar confusion / bug like this in the future. Wdyt? Thanks!

Yes, this will avoid confusion in future. Let me refactor it to use same type of clock here

Signed-off-by: Vivek Dhiman <approach2vivek@gmail.com>
@vivek807 vivek807 force-pushed the deep/feature/DRUID-125-Fix-invalid-time-comparison branch from b80a10d to 33138ae Compare June 26, 2026 08:04
@vivek807 vivek807 requested a review from FrankChen021 June 26, 2026 08:05

@FrankChen021 FrankChen021 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Severity Findings
P0 0
P1 0
P2 1
P3 0
Total 1

Reviewed 2 of 2 changed files.


This is an automated review by Codex GPT-5.5

long millisElapsedFromInitiation()
{
return System.nanoTime() - initiationTimeNanos;
return System.currentTimeMillis() - initiationTimeMillis;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] Use monotonic time for freshness elapsed checks

millisElapsedFromInitiation() now computes elapsed time from System.currentTimeMillis(). If the wall clock moves backward after an on-demand poll starts, this value can become negative or too small, so useLatestSnapshotIfWithinDelay() can keep accepting a stale snapshot and the periodic poll task can sleep longer than periodicPollDelay. Keep a monotonic nanoTime() value for elapsed/freshness calculations and use a separate wall-clock timestamp only where this code compares against wall-clock poll start times.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

4 participants