feat(telemetry): latency histograms for LLM request duration and TTFB (#463) #782

Open
ajbozarth wants to merge 1 commit into generative-computing:main from ajbozarth:feat/latency-histograms-463
@ajbozarth ajbozarth commented Apr 2, 2026

Misc PR

Type of PR

  • New Feature

Description

Adds latency histograms for LLM request duration and time-to-first-token (TTFB)
as part of the metrics telemetry epic (#443).

  • LatencyMetricsPlugin hooks generation_post_call (FIRE_AND_FORGET) and records
    mellea.llm.request.duration (every request) and mellea.llm.ttfb (streaming only)
  • Custom OTel View + ExplicitBucketHistogramAggregation bucket boundaries sized for
    LLM latencies; both plugins auto-register alongside TokenMetricsPlugin
  • ModelOutputThunk gains streaming: bool and ttfb_ms: float | None; TTFB is
    captured on first chunk in astream(), gated on self.streaming
  • Updated AGENTS.md, metrics.md, telemetry.md, and metrics_example.py
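
The TTFB capture and the two recording rules above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in this PR: `StreamResult`, `fake_backend`, `consume`, `run`, and the `BOUNDS_MS` values are all hypothetical, and `ExplicitBucketHistogram` is a small stand-in used here in place of the OTel SDK aggregation so the sketch needs only the standard library. Recording happens inline in `consume` rather than in a `generation_post_call` hook.

```python
import asyncio
import time
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical bucket upper bounds in milliseconds; the PR does not list its values.
BOUNDS_MS = [100.0, 500.0, 1000.0, 5000.0, 30000.0]


@dataclass
class ExplicitBucketHistogram:
    """Stand-in histogram: one counter per bound plus one overflow bucket."""

    bounds: List[float]
    counts: List[int] = field(init=False)

    def __post_init__(self) -> None:
        self.counts = [0] * (len(self.bounds) + 1)

    def record(self, value: float) -> None:
        # bisect_left makes each bound inclusive: value <= bound lands in that bucket.
        self.counts[bisect_left(self.bounds, value)] += 1


@dataclass
class StreamResult:
    """Hypothetical, simplified analogue of the two new thunk fields."""

    streaming: bool
    ttfb_ms: Optional[float] = None
    text: str = ""


async def fake_backend():
    """Simulated model stream: three chunks with a short delay before each."""
    for chunk in ("Hel", "lo", "!"):
        await asyncio.sleep(0.01)
        yield chunk


async def consume(result, chunks, duration_hist, ttfb_hist) -> None:
    start = time.perf_counter()
    async for chunk in chunks:
        # Capture TTFB once, on the first chunk, and only for streaming requests.
        if result.streaming and result.ttfb_ms is None:
            result.ttfb_ms = (time.perf_counter() - start) * 1000.0
        result.text += chunk
    # Duration is recorded for every request.
    duration_hist.record((time.perf_counter() - start) * 1000.0)
    # TTFB is recorded only when it was captured, i.e. for streaming requests.
    if result.ttfb_ms is not None:
        ttfb_hist.record(result.ttfb_ms)


def run(streaming: bool):
    """Run one simulated request and return the result and both histograms."""
    result = StreamResult(streaming=streaming)
    duration_hist = ExplicitBucketHistogram(BOUNDS_MS)
    ttfb_hist = ExplicitBucketHistogram(BOUNDS_MS)
    asyncio.run(consume(result, fake_backend(), duration_hist, ttfb_hist))
    return result, duration_hist, ttfb_hist
```

Calling `run(streaming=False)` leaves `ttfb_ms` as `None` and the TTFB histogram empty, which mirrors the "streaming only" rule for `mellea.llm.ttfb`.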

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code was added
  • Ensure existing tests and GitHub automation pass (a maintainer will kick off the GitHub automation when the rest of the PR is populated)

…generative-computing#463)

Adds request duration and time-to-first-token (TTFB) latency histograms
via the plugin pattern established in generative-computing#653. Includes custom OTel bucket
views sized for LLM latencies, backend telemetry field assertions across
all backends, and updated dev/published docs.

Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>
@ajbozarth ajbozarth self-assigned this Apr 2, 2026
@github-actions github-actions bot added the enhancement New feature or request label Apr 2, 2026
github-actions bot commented Apr 2, 2026

The PR description has been updated. Please fill out the template for your PR to be reviewed.

@ajbozarth ajbozarth changed the title feat(telemetry): latency histograms for LLM request duration and TTFB… feat(telemetry): latency histograms for LLM request duration and TTFB (#463) Apr 2, 2026
@ajbozarth ajbozarth marked this pull request as ready for review April 2, 2026 21:35
@ajbozarth ajbozarth requested review from a team, jakelorocco and nrfulton as code owners April 2, 2026 21:35
@ajbozarth
Contributor Author

I'll be out of the office tomorrow (Fri, April 3) through Monday (April 6), so I will address any review feedback on Tuesday. If this gets two committer approvals with no actionable feedback while I'm out, the second approver can feel free to add it to the merge queue.

There's no rush on this, so it should be fine to wait until I'm back to address feedback.

Labels

enhancement New feature or request observability

Development

Successfully merging this pull request may close these issues.

Implement latency histograms to track request duration distribution and time-to-first-token (TTFB) for streaming requests

1 participant