Skip to content

otel: add tcp metrics#12652

Open
AgraVator wants to merge 11 commits intogrpc:masterfrom
AgraVator:tcp-metrics
Open

otel: add tcp metrics#12652
AgraVator wants to merge 11 commits intogrpc:masterfrom
AgraVator:tcp-metrics

Conversation

@AgraVator
Copy link
Contributor

@AgraVator AgraVator commented Feb 11, 2026

Implements A80

@AgraVator AgraVator closed this Feb 11, 2026
@AgraVator AgraVator reopened this Feb 11, 2026
@AgraVator AgraVator marked this pull request as ready for review February 16, 2026 16:14
java.util.List<String> labelValues = getLabelValues(channel);
try {
if (channel.getClass().getName().equals(epollSocketChannelClassName)) {
Class<?> tcpInfoClass = Class.forName(epollTcpInfoClassName);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite a lot of reflection in hot exporting path. So, a couple of questions.

  1. Why do we choose reflectin over typecasting and calling actual methods? Reducing dependency surface?
  2. Is there a way to optimise this like caching value on per channel level? Something like a creating a runnable per channel. Not sure how it'd be possible to share this data for the final collection on channel inactive.

Method rttMethod = tcpInfoClass.getMethod("rtt");

long totalRetrans = (Long) totalRetransMethod.invoke(info);
int retransmits = (Integer) retransmitsMethod.invoke(info);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these cumulative metrics or are these total retransmits since the connection start? If latter, the current logic is incorrect.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AAh I see, then I think we should not be using a counter for this... Will clarify in the gRFC

@AgraVator AgraVator requested a review from ejona86 February 26, 2026 09:54
Copy link
Member

@ejona86 ejona86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"otel" is a very surprising prefix to use for the PR/commit, as opentelemetry isn't actually changed at all.

AgraVator and others added 6 commits March 9, 2026 14:21
The backoff timer is only used when serializeRetries=true, and that
exists to match the old/current pick_first's behavior as closely as
possible. InternalSubchannel.updateAddresses() would take no action when
in TRANSIENT_FAILURE; it would update the addresses and just wait for
the backoff timer to expire.

Note that this only impacts serializeRetries=true; in the other cases we
do want to start trying to the new addresses immediately, because the
backoff timers are in the subchannels.

Note that this change was also important because requestConnection() can
be directly triggered by the user with channel.getState(true), and that
shouldn't defeat the backoff timer.
Since we're only supporting API levels 23+, all the supported Android
versions handle multidex natively, and without any bugs to workaround.

Also bump some minSdkVersion that didn't get updated in fa7b52b so
that multiDex is actually enabled by default.

See also b/476359563
… and 'core: Wait for backoff timer on address update in pick_first'
@AgraVator AgraVator requested a review from ejona86 March 11, 2026 09:52
Copy link
Member

@ejona86 ejona86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description's "Implements A80" doesn't link to the gRFC; it just links to this PR.

.setTransportTracerFactory(fakeClockTransportTracer)
.buildTransportServers(streamTracerFactories);
.setTransportTracerFactory(fakeClockTransportTracer);
return InternalOkHttpServerBuilder
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to use InternalOkHttpServerBuilder; buildTransportServers() is package-private... the code was already calling it directly.

totalRetransMethod = epollTcpInfoClass.getMethod("totalRetrans");
retransMethod = epollTcpInfoClass.getMethod("retrans");
rttMethod = epollTcpInfoClass.getMethod("rtt");
} catch (Exception | Error t) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use precise error handling. What specific exceptions are expected here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we catching Error and RuntimeException? Are they expected for some reason?

@AgraVator AgraVator requested a review from ejona86 March 18, 2026 08:52
totalRetransMethod = epollTcpInfoClass.getMethod("totalRetrans");
retransMethod = epollTcpInfoClass.getMethod("retrans");
rttMethod = epollTcpInfoClass.getMethod("rtt");
} catch (Exception | Error t) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we catching Error and RuntimeException? Are they expected for some reason?


FakeEpollTcpInfo infoSource = new FakeEpollTcpInfo();
infoSource.setValues(123, 4, 5000);
ConfigurableFakeWithTcpInfo channel = org.mockito.Mockito.spy(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use spy. Ever. It is implemented in a horribly evil way that invalidates any potential use-case for it. (It blindly copies fields from a constructed instance to a new instance that it creates.) Generally you can use a normal mock with delegatesTo(realObj) instead. If that isn't applicable, you get to rethink things.


FakeEpollTcpInfo infoSource = new FakeEpollTcpInfo();
infoSource.setValues(123, 4, 5000);
ConfigurableFakeWithTcpInfo channel = org.mockito.Mockito.spy(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you try actually using EmbeddedChannel instead of mocking? It looks like you're mocking this so you can fake time. I think you can do that directly with EmbeddedChannel.

@Mock
private MetricRecorder metricRecorder;
@Mock
private Channel channel;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need a mock/spy here? It is really bad practice to mock out classes that aren't your own, as mocks lie and don't follow normal API guarantees when changes are made.

It looks like you're using this to override local addresses and such. Can you try using EmbeddedChannel for that?

(Note: For "real" tests, EmbeddedChannel doesn't act like a real channel well enough to be trusted too much. So we tend to prefer LocalChannel. But in this case, you want specific address types, and don't actually care about a lot of the lifecycle things that EmbeddedChannel gets wrong, so it is a fair choice.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants