
feat(metrics): migrate sei-cosmos to OpenTelemetry (PLT-353) #3467

Open
amir-deris wants to merge 1 commit into main from amir/plt-352-migrate-sei-cosmos-to-otel

Conversation

@amir-deris
Contributor

@amir-deris amir-deris commented May 19, 2026

Adds OTel instrumentation to sei-cosmos following the same pattern as PLT-329, PLT-330, PLT-336, PLT-339, and PLT-343.

New instruments

baseapp (meter seicosmos_baseapp)

  • mid_block_duration — histogram, seconds
  • end_block_duration — histogram, seconds
  • deliver_tx_duration — histogram, seconds
  • tx — counter, total delivered transactions
  • tx_result — counter, result label (successful/failed)
  • tx_gas_used — gauge
  • tx_gas_wanted — gauge
  • commit_duration — histogram, seconds
  • abci_query_duration — histogram, seconds, path label
  • process_proposal_duration — histogram, seconds
  • finalize_block_duration — histogram, seconds
  • get_tx_priority_hint_duration — histogram, seconds
  • run_tx_duration — histogram, seconds, mode label (replaces MeasureThroughputSinceWithLabels for TxCount)
  • run_msgs_duration — histogram, seconds (replaces MeasureThroughputSinceWithLabels for MessageCount)
  • run_msg_latency — histogram, seconds, type label (replaces both sei.cosmos.run.msg.latency and cosmos.run.msg.latency)

storev2/rootmulti (meter seicosmos_storev2_rootmulti)

  • sc_commit_latency — histogram, seconds
  • ss_version — gauge
  • historical_abci_query — counter, success + proof labels
  • iavl_total_key_bytes — gauge, store_name label
  • iavl_total_value_bytes — gauge, store_name label
  • iavl_total_num_keys — gauge, store_name label
  • state_sync_keys_exported — counter

tasks (meter seicosmos_tasks)

  • scheduler_retries — counter
  • scheduler_incarnations — counter

store/types (meter seicosmos_store_types)

  • gas_exceeded — counter, error + descriptor labels
  • bounded_cache — gauge, type label

x/upgrade (meter seicosmos_x_upgrade)

  • begin_blocker_duration — histogram, seconds
  • plan_height — gauge, name + info labels

x/upgrade/keeper (meter seicosmos_x_upgrade_keeper)

  • plan_height — gauge, name + info labels

x/auth/vesting (meter seicosmos_x_auth_vesting)

  • new_account — counter
  • account_amount — gauge, denom label

x/bank/keeper (meter seicosmos_x_bank_keeper)

  • send_amount — gauge, denom label

x/distribution/keeper (meter seicosmos_x_distribution_keeper)

  • withdraw_reward_amount — gauge, denom label
  • withdraw_commission_amount — gauge, denom label

x/staking/keeper (meter seicosmos_x_staking_keeper)

  • delegate — counter
  • delegate_amount — gauge, denom label
  • redelegate — counter
  • redelegate_amount — gauge, denom label
  • undelegate — counter
  • undelegate_amount — gauge, denom label

x/gov/keeper (meter seicosmos_x_gov_keeper)

  • proposal — counter
  • vote — counter, proposal_id label
  • deposit — counter, proposal_id label

Notes

  • All packages use dual-emit with TODO(PLT-353) comments pending dashboard verification.
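
The dual-emit approach in the note above can be sketched roughly as follows. This is a stdlib-only illustration, not the PR's code: `durationSink`, `memorySink`, `dualEmit`, and `commit` are hypothetical stand-ins for the legacy telemetry call and the OTel histogram, which receive the same sample side by side until the legacy path is removed.

```go
package main

import (
	"fmt"
	"time"
)

// durationSink is a hypothetical stand-in for any backend that accepts a
// duration sample in seconds (legacy telemetry or an OTel histogram).
type durationSink interface {
	Record(name string, seconds float64)
}

// memorySink keeps samples in memory so the sketch is observable without
// a real metrics backend.
type memorySink struct {
	samples map[string][]float64
}

func newMemorySink() *memorySink {
	return &memorySink{samples: make(map[string][]float64)}
}

func (s *memorySink) Record(name string, seconds float64) {
	s.samples[name] = append(s.samples[name], seconds)
}

// dualEmit returns a function intended for defer. When that function runs,
// it records the time elapsed since dualEmit was called to both sinks.
func dualEmit(legacy, otel durationSink, name string) func() {
	start := time.Now()
	return func() {
		elapsed := time.Since(start).Seconds()
		legacy.Record(name, elapsed) // to be removed once dashboards are verified
		otel.Record(name, elapsed)
	}
}

// commit is a placeholder handler that is timed by both sinks.
func commit(legacy, otel durationSink) {
	defer dualEmit(legacy, otel, "commit_duration")()
	time.Sleep(time.Millisecond) // placeholder for real work
}

func main() {
	legacy, otel := newMemorySink(), newMemorySink()
	commit(legacy, otel)
	fmt.Println(len(legacy.samples["commit_duration"]), len(otel.samples["commit_duration"]))
}
```

Taking a single `elapsed` value and sending it to both sinks keeps the two series directly comparable while dashboards are being verified.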

@cursor

cursor Bot commented May 19, 2026

PR Summary

Medium Risk
Touches performance-critical ABCI and store commit/query paths to emit new OpenTelemetry metrics; while logic is largely additive, histogram/counter recording and new must()-initialized instruments could introduce runtime overhead or startup failures if OTEL is misconfigured.
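
The startup-failure risk comes from the panic-on-error idiom that a `must()` helper usually implies. The PR's helper is not shown in this conversation, so the following is only a sketch of the common Go idiom; `newInstrument` and `tryInit` are hypothetical names used for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// must unwraps a (value, error) pair and panics on error. It is a common Go
// idiom for initialization that cannot sensibly continue after a failure.
func must[T any](v T, err error) T {
	if err != nil {
		panic(err)
	}
	return v
}

// newInstrument is a hypothetical constructor standing in for an instrument
// factory that can fail, for example on an invalid instrument name.
func newInstrument(name string) (string, error) {
	if name == "" {
		return "", errors.New("instrument name must not be empty")
	}
	return "instrument:" + name, nil
}

// tryInit shows how a caller could convert the panic back into an error
// instead of letting it abort the process.
func tryInit(name string) (inst string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("init failed: %v", r)
		}
	}()
	return must(newInstrument(name)), nil
}

func main() {
	fmt.Println(tryInit("commit_duration"))
	fmt.Println(tryInit(""))
}
```

When `must` is used in package-level variable initializers, there is no caller to recover, so a failing constructor stops the binary at startup.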

Overview
Adds OpenTelemetry metrics across sei-cosmos for core execution paths, introducing new meters/instruments and recording them alongside existing telemetry metrics (guarded by TODOs for later removal).

In baseapp, ABCI handlers (MidBlock, EndBlock, DeliverTx, Commit, Query, ProcessProposal, FinalizeBlock, GetTxPriorityHint) and execution helpers (runTx, RunMsgs, per-message latency) now record OTEL histograms/counters/gauges (with attributes like tx result, query path, mode, and msg type). GetTxPriorityHint now uses the incoming context (signature change) to support OTEL recording.

In storage and modules, OTEL metrics are added for bounded cache evictions and gas exceeded errors (store/types), storev2 commit latency/state sync and historical query outcomes (storev2/rootmulti), OCC scheduler retries/incarnations (tasks), and key message-level amount/counter metrics in bank, distribution, staking, gov, auth/vesting, plus upgrade begin-blocker duration and plan height (x/upgrade).

Reviewed by Cursor Bugbot for commit bf58610. Bugbot is set up for automated code reviews on this repo. Configure here.

@github-actions

github-actions Bot commented May 19, 2026

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

| Build | Format | Lint | Breaking | Updated (UTC) |
|---|---|---|---|---|
| ✅ passed | ✅ passed | ✅ passed | ✅ passed | May 19, 2026, 11:48 PM |

@amir-deris amir-deris changed the title from "Added otel to sei-cosmos package" to "feat(metrics): migrate sei-cosmos to OpenTelemetry (PLT-353)" May 19, 2026
@amir-deris amir-deris requested review from bdchatham and masih May 19, 2026 23:48

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 3 potential issues.


Reviewed by Cursor Bugbot for commit bf58610. Configure here.

defer telemetry.MeasureSinceWithLabels([]string{"abci", "query"}, time.Now(), []metrics.Label{{Name: "path", Value: req.Path}})
queryStart := time.Now()
defer func() {
baseappMetrics.abciQueryDuration.Record(ctx, time.Since(queryStart).Seconds(), otelmetric.WithAttributes(attribute.String("path", req.Path)))

ABCI query path cardinality

High Severity

baseapp_abci_query_duration labels each series with the raw path from RequestQuery, which clients choose freely. That creates an unbounded set of metric attribute combinations and can grow memory use in the OTel metrics backend.
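
One way to bound this label, sketched under assumptions: map the client-supplied path onto a small fixed set of values before it is used as an attribute. The allowlist in `knownQueryRoots` and the helper name `boundedPathLabel` are hypothetical; the real set would have to come from the routes the application actually registers.

```go
package main

import (
	"fmt"
	"strings"
)

// knownQueryRoots is an assumed allowlist of top-level query routes.
// The real set should come from the routes the application registers.
var knownQueryRoots = map[string]struct{}{
	"app":    {},
	"store":  {},
	"p2p":    {},
	"custom": {},
}

// boundedPathLabel maps an arbitrary client-supplied path onto a small,
// fixed set of label values: the first path segment if it is allowlisted,
// otherwise "other".
func boundedPathLabel(path string) string {
	trimmed := strings.TrimPrefix(path, "/")
	root, _, _ := strings.Cut(trimmed, "/")
	if _, ok := knownQueryRoots[root]; ok {
		return root
	}
	return "other"
}

func main() {
	paths := []string{"/store/bank/key", "custom/gov/proposal/42", "/made-up/by/client", ""}
	for _, p := range paths {
		fmt.Printf("%q -> %s\n", p, boundedPathLabel(p))
	}
}
```

With this shape the number of series is capped at the allowlist size plus one. In this sketch any route outside the allowlist, including fully qualified gRPC-style paths, collapses into `other`, so a real fix would need to decide which of those deserve their own label value.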


Triggered by learned rule: OTel metrics: guard attribute cardinality and use native types

Reviewed by Cursor Bugbot for commit bf58610. Configure here.

Collaborator


This one is a blocker, I'm afraid.

Cc @amir-deris

},
)
defer func() {
govMetrics.voteTotal.Add(goCtx, 1, otelmetric.WithAttributes(attribute.String("proposal_id", strconv.FormatUint(msg.ProposalId, 10))))

Gov proposal_id metric labels

Medium Severity

Vote and deposit counters attach proposal_id from the message as an OTel string attribute. Proposal IDs increase monotonically, so label cardinality grows without bound over the life of a chain.
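
The growth described here can be illustrated with a toy model. `seriesCounter` and `simulateVotes` are hypothetical stand-ins, not OTel types: the map keeps one entry per distinct label value, in the same way a metrics backend keeps one time series per distinct attribute set.

```go
package main

import (
	"fmt"
	"strconv"
)

// seriesCounter is a toy stand-in for a metrics backend: it keeps one
// running total per distinct label value.
type seriesCounter struct {
	series map[string]int64
}

func newSeriesCounter() *seriesCounter {
	return &seriesCounter{series: make(map[string]int64)}
}

func (c *seriesCounter) Add(label string, n int64) {
	c.series[label] += n
}

// simulateVotes records votesPer votes on each of the given number of
// proposals, once labelled by proposal ID and once with no label at all.
func simulateVotes(proposals uint64, votesPer int) (labelled, unlabelled *seriesCounter) {
	labelled, unlabelled = newSeriesCounter(), newSeriesCounter()
	for id := uint64(1); id <= proposals; id++ {
		for v := 0; v < votesPer; v++ {
			labelled.Add(strconv.FormatUint(id, 10), 1)
			unlabelled.Add("", 1)
		}
	}
	return labelled, unlabelled
}

func main() {
	labelled, unlabelled := simulateVotes(1000, 3)
	fmt.Println("series with proposal_id:", len(labelled.series))
	fmt.Println("series without proposal_id:", len(unlabelled.series))
	fmt.Println("total votes either way:", unlabelled.series[""])
}
```

Both counters see the same number of votes, but the labelled one holds a series per proposal and never shrinks, while the unlabelled one stays at a single series.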

Additional Locations (2)

Triggered by learned rule: OTel metrics: guard attribute cardinality and use native types

Reviewed by Cursor Bugbot for commit bf58610. Configure here.

upgradeMetrics.planHeight.Record(ctx.Context(), plan.Height, otelmetric.WithAttributes(
attribute.String("name", plan.Name),
attribute.String("info", plan.Info),
))

Upgrade plan info attribute

Medium Severity

plan_height gauges include info from the upgrade plan as an OTel attribute. plan.Info is free-form text set via governance, so each distinct plan can add a unique label combination.
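
A possible mitigation, sketched with plain maps rather than OTel attribute types: filter attributes through an allowlist of keys so that free-form fields never reach the metric. `allowedKeys` and `filterAttrs` are hypothetical names, and whether `name` alone is bounded enough is a separate judgment call.

```go
package main

import "fmt"

// allowedKeys is an assumed allowlist; only these attribute keys are kept.
var allowedKeys = map[string]bool{"name": true}

// filterAttrs returns a copy of attrs containing only allowlisted keys, so
// free-form fields such as "info" never become metric labels.
func filterAttrs(attrs map[string]string) map[string]string {
	out := make(map[string]string, len(attrs))
	for k, v := range attrs {
		if allowedKeys[k] {
			out[k] = v
		}
	}
	return out
}

func main() {
	attrs := map[string]string{
		"name": "example-upgrade",
		"info": "free-form text set via governance",
	}
	fmt.Println(filterAttrs(attrs))
}
```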

Additional Locations (1)

Triggered by learned rule: OTel metrics: guard attribute cardinality and use native types

Reviewed by Cursor Bugbot for commit bf58610. Configure here.

@codecov

codecov Bot commented May 19, 2026

Codecov Report

❌ Patch coverage is 79.79275% with 39 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.06%. Comparing base (80a4364) to head (bf58610).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| sei-cosmos/x/gov/keeper/msg_server.go | 70.58% | 10 Missing ⚠️ |
| sei-cosmos/baseapp/abci.go | 85.71% | 6 Missing ⚠️ |
| sei-cosmos/baseapp/metrics.go | 50.00% | 1 Missing and 1 partial ⚠️ |
| sei-cosmos/store/types/metrics.go | 50.00% | 1 Missing and 1 partial ⚠️ |
| sei-cosmos/storev2/rootmulti/metrics.go | 50.00% | 1 Missing and 1 partial ⚠️ |
| sei-cosmos/tasks/metrics.go | 50.00% | 1 Missing and 1 partial ⚠️ |
| sei-cosmos/x/auth/vesting/metrics.go | 50.00% | 1 Missing and 1 partial ⚠️ |
| sei-cosmos/x/bank/keeper/metrics.go | 50.00% | 1 Missing and 1 partial ⚠️ |
| sei-cosmos/x/distribution/keeper/metrics.go | 50.00% | 1 Missing and 1 partial ⚠️ |
| sei-cosmos/x/gov/keeper/metrics.go | 50.00% | 1 Missing and 1 partial ⚠️ |

... and 4 more
Additional details and impacted files


@@            Coverage Diff             @@
##             main    #3467      +/-   ##
==========================================
+ Coverage   59.05%   59.06%   +0.01%     
==========================================
  Files        2188     2199      +11     
  Lines      182088   182234     +146     
==========================================
+ Hits       107530   107639     +109     
- Misses      64925    64951      +26     
- Partials     9633     9644      +11     
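
The percentages in the diff above follow from the hit and line counts, with lines being the sum of hits, misses, and partials. A small sketch of that arithmetic using the figures from the table (`coveragePct` is a hypothetical helper, and Codecov's exact rounding of the last digit is not confirmed here):

```go
package main

import "fmt"

// coveragePct returns hits as a percentage of lines.
func coveragePct(hits, lines int) float64 {
	if lines == 0 {
		return 0
	}
	return 100 * float64(hits) / float64(lines)
}

func main() {
	// Lines are the sum of hits, misses, and partials from the diff table.
	baseLines := 107530 + 64925 + 9633 // main
	headLines := 107639 + 64951 + 9644 // this PR

	base := coveragePct(107530, baseLines)
	head := coveragePct(107639, headLines)
	fmt.Printf("base=%.3f%% head=%.3f%% delta=%+.3f%%\n", base, head, head-base)
}
```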
| Flag | Coverage Δ |
|---|---|
| sei-chain-pr | 72.65% <79.79%> (?) |
| sei-db | 70.41% <ø> (ø) |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Files with missing lines | Coverage Δ |
|---|---|
| sei-cosmos/baseapp/baseapp.go | 75.78% <100.00%> (+0.48%) ⬆️ |
| sei-cosmos/store/types/cache.go | 72.22% <100.00%> (+0.52%) ⬆️ |
| sei-cosmos/store/types/gas.go | 91.66% <100.00%> (+0.21%) ⬆️ |
| sei-cosmos/storev2/rootmulti/store.go | 65.78% <100.00%> (+0.98%) ⬆️ |
| sei-cosmos/tasks/scheduler.go | 95.93% <100.00%> (+0.02%) ⬆️ |
| sei-cosmos/x/auth/vesting/msg_server.go | 78.57% <100.00%> (+0.79%) ⬆️ |
| sei-cosmos/x/bank/keeper/msg_server.go | 83.05% <100.00%> (+0.29%) ⬆️ |
| sei-cosmos/x/staking/keeper/msg_server.go | 78.71% <100.00%> (+0.52%) ⬆️ |
| sei-cosmos/x/upgrade/abci.go | 80.88% <100.00%> (+2.54%) ⬆️ |
| sei-cosmos/x/upgrade/keeper/keeper.go | 92.82% <100.00%> (+0.12%) ⬆️ |

... and 14 more
