Skip to content

feat: add query/segment/count metric to data nodes#19624

Open
jtuglu1 wants to merge 1 commit into
apache:masterfrom
jtuglu1:add-query-segment-count
Open

feat: add query/segment/count metric to data nodes#19624
jtuglu1 wants to merge 1 commit into
apache:masterfrom
jtuglu1:add-query-segment-count

Conversation

@jtuglu1

@jtuglu1 jtuglu1 commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Description

Allows for quick queries to understand how many segments a given query issued scans for against a given data node. Better than counting the unique query/segment/time or similar, especially if you are sampling higher volume metrics like query/segment/time, but still want to have quick way to know stats about # of segments queried per node.

Release note

Add query/segment/count metric to data nodes.


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@FrankChen021 FrankChen021 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Severity Findings
P0 0
P1 0
P2 2
P3 0
Total 2
Severity Findings
P0 0
P1 0
P2 2
P3 0
Total 2

Reviewed 14 of 14 changed files.


This is an automated review by Codex GPT-5.5

Comment thread server/src/main/java/org/apache/druid/server/ServerManager.java Outdated
Comment thread docs/operations/metrics.md
@jtuglu1 jtuglu1 force-pushed the add-query-segment-count branch from 78d50df to c273754 Compare June 24, 2026 17:37
@jtuglu1 jtuglu1 added this to the 38.0.0 milestone Jun 24, 2026

@kfaraz kfaraz left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, left some minor comments.

|Metric|Description|Dimensions|Normal value|
|------|-----------|----------|------------|
|`query/time`|Milliseconds taken to complete a query.|<p>Common: `dataSource`, `type`, `interval`, `hasFilters`, `duration`, `context`, `remoteAddress`, `id`, `statusCode`.</p><p> Aggregation Queries: `numMetrics`, `numComplexMetrics`.</p><p> GroupBy: `numDimensions`.</p><p> TopN: `threshold`, `dimension`.</p>|< 1s|
|`query/segment/count`|Number of segments this Historical scans for a query. This is local to this data node; the Broker metric `query/segments/count` reports the distributed query plan.|<p>Common: `dataSource`, `type`, `interval`, `hasFilters`, `duration`, `context`, `remoteAddress`, `id`.</p><p> Aggregation Queries: `numMetrics`, `numComplexMetrics`.</p><p> GroupBy: `numDimensions`.</p><p> TopN: `threshold`, `dimension`.</p>|Varies|

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, the similar metric names query/segment/count and query/segments/count (with an 'S') can be confusing. I wish the Broker metric name reflected that it was a plan-only metric.

Nit: For the use case in this PR though, I feel that query/segments/count probably makes more sense since query/segment/time reflects time taken to scan a single segment and not the total time.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, do we even need to mention the Broker metric here? It seems unrelated despite the similar names. The disambiguation, if needed, may be done in Broker metrics section itself.

@Nullable
private final String segmentId;
@Nullable
private final Integer localSegmentCount;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does the prefix local signify?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

local to this data node

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like that is implicit, since the data node is not aware of segments from other nodes anyway.

/**
* This class is responsible for emitting query/segment/time, query/wait/time and query/segmentAndCache/Time metrics for a Sink.
* This class is responsible for emitting query/segment/time, query/wait/time, query/segmentAndCache/Time and
* query/segment/count metrics for a Sink.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also call out that unlike other metrics emitted here, query/segment/count is emitted only when segmentId is non-null, since it is a total count.

Comment on lines +505 to +506
Assert.assertEquals(1, stubServiceEmitter.getMetricEventCount(DefaultQueryMetrics.QUERY_SEGMENT_COUNT));
Assert.assertEquals(2L, stubServiceEmitter.getLatestMetricEventValue(DefaultQueryMetrics.QUERY_SEGMENT_COUNT));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: You may also use stubServiceEmitter.verifyEmitted and verifyValue.

@FrankChen021 FrankChen021 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Severity Findings
P0 0
P1 0
P2 1
P3 0
Total 1

Reviewed 16 of 16 changed files.


This is an automated review by Codex GPT-5.5

);
}

final int localSegmentCount = allSegmentReferences.size();

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] Count realtime sinks instead of hydrants

This counts allSegmentReferences, but those references are hydrant-level: Sink.acquireSegmentReferences adds one SinkSegmentReference per FireHydrant. After a realtime segment has multiple hydrants, for example after persist/swap or with both persisted and current hydrants, a query over one Druid segment emits query/segment/count as the hydrant count instead of the segment/Sink count. That disagrees with the metric's documented segment-level meaning and with the existing per-segment metrics that aggregate by sinkSegmentId; please count distinct queried sinks/descriptors and add a multi-hydrant test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants