
RATIS-2397. Add trace support for Log Appender #1443

Open
taklwu wants to merge 6 commits into apache:master from taklwu:RATIS-2397


@taklwu (Contributor) commented Apr 30, 2026

What changes were proposed in this pull request?

Add tracing support for AppendEntries / AppendEntriesAsync.
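For the asynchronous path, a span generally has to stay open until the returned future completes rather than ending when the method returns. The following is a minimal, JDK-only sketch of that pattern under stated assumptions: `TimedSpan`, `traceAsync`, and the `FINISHED` list are hypothetical stand-ins for an OpenTelemetry span and exporter. It is not the PR's `TraceUtils.traceAsyncMethod`, whose signature is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

final class AsyncTracingSketch {
  /** Hypothetical stand-in for a tracing span: records a name, an outcome and whether it has ended. */
  static final class TimedSpan {
    final String name;
    final long startNanos = System.nanoTime();
    volatile long durationNanos = -1;
    volatile Throwable error;

    TimedSpan(String name) {
      this.name = name;
    }

    void end(Throwable t) {
      error = t;
      durationNanos = System.nanoTime() - startNanos;
    }

    boolean isEnded() {
      return durationNanos >= 0;
    }
  }

  /** Hypothetical "exporter": a span is added here once it has ended. */
  static final List<TimedSpan> FINISHED = Collections.synchronizedList(new ArrayList<>());

  /** Runs an async action inside a span that ends only when the returned future completes. */
  static <T> CompletableFuture<T> traceAsync(String spanName, Supplier<CompletableFuture<T>> action) {
    final TimedSpan span = new TimedSpan(spanName);
    final CompletableFuture<T> future;
    try {
      future = action.get();
    } catch (RuntimeException e) {
      // The action failed synchronously, so end the span right away.
      span.end(e);
      FINISHED.add(span);
      throw e;
    }
    // Ending the span on completion (not after action.get()) covers the whole asynchronous call.
    return future.whenComplete((result, error) -> {
      span.end(error);
      FINISHED.add(span);
    });
  }

  public static void main(String[] args) {
    final CompletableFuture<String> reply = new CompletableFuture<>();
    final CompletableFuture<String> traced = traceAsync("raft.server.appendEntriesAsync", () -> reply);
    System.out.println("spans finished before reply: " + FINISHED.size());
    reply.complete("SUCCESS");
    System.out.println("traced result: " + traced.join());
    System.out.println("spans finished after reply: " + FINISHED.size());
  }
}
```

Running `main` shows no finished span until the pending reply completes, after which exactly one span has ended.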

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/RATIS-2397

How was this patch tested?

  1. Added unit tests.
  2. Built with RATIS-2398 ("Add opentelemetry-javaagent to ratis-examples and assembly", #1428) and tested locally with the Jaeger UI to confirm that the spans were captured.
[Two screenshots of the local Jaeger UI test, taken 2026-04-29 at 4:54 PM and 5:04 PM]

@taklwu changed the title from "RATIS-2397 Add trace support for Log Appender" to "RATIS-2397. Add trace support for Log Appender" on Apr 30, 2026
A comment by @taklwu was marked as resolved.

Contributor

@szetszwo left a comment


@taklwu, thanks for working on this!

Please see the inline comments and also https://issues.apache.org/jira/secure/attachment/13082105/1443_review.patch

: TraceUtils.extractContextFromProto(spanContext);
return TraceUtils.traceAsyncMethod(action, () -> {
final Span span = TraceUtils.getGlobalTracer()
.spanBuilder("raft.server.appendEntriesAsync")
Contributor


Let's add a static constant for "raft.server.appendEntriesAsync".
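A minimal sketch of such a constants holder follows. The author's reply mentions a class named `SpanNames`, but its real contents are not shown in this thread, so the field name `APPEND_ENTRIES_ASYNC` is hypothetical; only the string value comes from the diff.

```java
/** Hypothetical holder for span-name constants; only the string value is taken from the diff. */
final class SpanNames {
  /** Name of the span created for the server-side appendEntriesAsync call. */
  static final String APPEND_ENTRIES_ASYNC = "raft.server.appendEntriesAsync";

  private SpanNames() {
    // Constants only; no instances.
  }

  public static void main(String[] args) {
    System.out.println(APPEND_ENTRIES_ASYNC);
  }
}
```

The call site would then read `spanBuilder(SpanNames.APPEND_ENTRIES_ASYNC)` instead of repeating the string literal, so tests and dashboards can refer to the same constant.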

Contributor Author


I created a new class, SpanNames, and added the constant there; this change also includes some other refactoring.

Comment thread (outdated): ratis-server/src/main/java/org/apache/ratis/server/impl/LeaderStateImpl.java
Comment on lines +586 to +591
for (LogEntryProto e : entries) {
final SpanContextProto sc = traceByIndex.get(e.getIndex());
if (sc != null && !sc.getContextMap().isEmpty()) {
return sc;
}
}
Contributor

@szetszwo, May 5, 2026


Since entries is a list, there could be multiple PendingRequest(s). Should we return multiple SpanContextProto(s)?

Contributor Author


AFAIK, an AppendEntriesRequestProto is a single RPC carrying a batch of log entries, so there should be only a single parent span context for the whole batch.

Comment on lines +586 to +587
for (LogEntryProto e : entries) {
final SpanContextProto sc = traceByIndex.get(e.getIndex());
Contributor


Since it needs to loop over entries with consecutive indices, a NavigableMap is more efficient. I suggest using a TreeMap guarded by a read-write lock.
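One way to read this suggestion, sketched with only the JDK: keep the contexts in a `TreeMap` keyed by log index, guard it with a `ReentrantReadWriteLock` so that several readers (for example, one log appender per follower) can look up concurrently while updates are exclusive, and use `subMap` to walk only the indices of one AppendEntries batch. `TraceIndex`, its method names, and the `removeUpTo` cleanup hook are hypothetical; the type parameter `C` stands in for `SpanContextProto`, and the `usable` predicate plays the role of the `!sc.getContextMap().isEmpty()` check in the diff.

```java
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Predicate;

/** Hypothetical log-index to trace-context map; C stands in for SpanContextProto. */
final class TraceIndex<C> {
  private final NavigableMap<Long, C> byLogIndex = new TreeMap<>();
  private final ReadWriteLock lock = new ReentrantReadWriteLock();

  /** Associates a trace context with a log index (exclusive write). */
  void put(long logIndex, C context) {
    lock.writeLock().lock();
    try {
      byLogIndex.put(logIndex, context);
    } finally {
      lock.writeLock().unlock();
    }
  }

  /** Returns the first context accepted by usable for indices in [firstIndex, lastIndex], or null. */
  C firstInRange(long firstIndex, long lastIndex, Predicate<? super C> usable) {
    if (firstIndex > lastIndex) {
      return null; // subMap would throw IllegalArgumentException for an inverted range
    }
    lock.readLock().lock();
    try {
      // subMap iterates only the keys actually stored in the range, in ascending order,
      // instead of doing one hash lookup per index.
      for (C context : byLogIndex.subMap(firstIndex, true, lastIndex, true).values()) {
        if (context != null && usable.test(context)) {
          return context;
        }
      }
      return null;
    } finally {
      lock.readLock().unlock();
    }
  }

  /** Hypothetical cleanup hook: drops every context with index <= logIndex. */
  void removeUpTo(long logIndex) {
    lock.writeLock().lock();
    try {
      byLogIndex.headMap(logIndex, true).clear();
    } finally {
      lock.writeLock().unlock();
    }
  }

  int size() {
    lock.readLock().lock();
    try {
      return byLogIndex.size();
    } finally {
      lock.readLock().unlock();
    }
  }

  public static void main(String[] args) {
    final TraceIndex<String> index = new TraceIndex<>();
    index.put(5, "");      // an entry whose context is empty
    index.put(7, "ctx-7");
    index.put(9, "ctx-9");
    System.out.println(index.firstInRange(5, 9, s -> !s.isEmpty()));
    index.removeUpTo(7);
    System.out.println(index.firstInRange(5, 9, s -> !s.isEmpty()));
    System.out.println(index.size());
  }
}
```

Running `main` prints `ctx-7`, then `ctx-9`, then `1`. With a single-threaded caller a plain `TreeMap` would be enough; the lock only matters because entries may be recorded and looked up on different threads.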

Contributor Author


Ack, I applied your suggested change.

Comment on lines +363 to +364
private final ConcurrentHashMap<Long, SpanContextProto> replicationTraceByLogIndex =
new ConcurrentHashMap<>();
Contributor


Let's create a new class, say LeaderTracer, and move all the tracing code there.

1. use LeaderTracer
2. refactor constants
Contributor Author

@taklwu left a comment


Thanks for your time! Please review again.

@taklwu requested a review from szetszwo on May 5, 2026 23:00
@taklwu (Contributor Author) commented May 6, 2026

The TestRaftAsyncWithNetty failure should be unrelated to this change; I may need to re-trigger the run.
