fix(svm): silence FillRelay txn failures (debug, not error) #3416

Merged
pxrl merged 4 commits into master from droplet/T90K0AL22-C07063X25NG-1779772982-896759 on May 26, 2026

@droplet-rl
Contributor

Summary

Recent SDK changes — across-protocol/sdk#1440 and #1443 — preserve SolanaError type information through QuorumFallbackSolanaRpcFactory. As a side effect, SvmFillerClient's Failed to send fill transaction (Solana error code: -32002) log line now consistently fires at error level with mrkdwn set on every Solana preflight failure (most commonly: a competing relayer landed the fill first, so our preflight rejects). Previously the wrapped SolanaError failed the isSolanaError(e) typeguard and fell through into a quieter branch, so the alert is effectively a regression.

Operational context

Last 24h on zion-across-relayer-primary saw 8 distinct -32002 alerts to Slack. Cross-referenced each against the indexer's final fillTxHash:

  • 6 / 8 were filled by a competing relayer while our preflight kept failing — i.e. we lost the race and the Slack alert announced it.
  • 2 / 8 were filled by zion itself (one on the next outer-loop iteration — a genuine retry).

In all 8 cases the deposit was filled within ~6 minutes. There is no user-visible incident — these alerts are spamming #alerts for what is, by design, a recoverable outcome.

Change

Downgrade this.logger.error → this.logger.debug at the two FillRelay-failure catch sites in src/clients/SvmFillerClient.ts:

  • executeFillImmediately (~L174)
  • executeTxnQueue (~L256)

simulateQueue's logger.error with notificationPath: "across-error" is unchanged — it's a distinct batch-level signal about simulation health.

mrkdwn is retained on the (now debug-level) line so deposit context is preserved for log-trawling diagnosis. Per ops directive, FillRelay-based errors should be silent (a failed fill is recoverable: another relayer fills, or our outer loop re-picks).

Symmetry follow-up (not in this PR)

The EVM filler (MultiCallerClient) already encodes "this revert is fine, don't alert" via knownRevertReasons ("RelayFilled", "relay filled", ...) + canIgnoreRevertReason(): matching reverts log at debug; unexpected ones log at error with notificationPath. SVM has no equivalent — SvmFillerClient treats every fill failure as the same class today.

To bring EVM and SVM to parity, a follow-up should:

  1. Extract a knownSolanaFillErrorCodes constant (start: [SVM_TRANSACTION_PREFLIGHT_FAILURE]) and a canIgnoreSvmFillError(e) helper that mirrors canIgnoreRevertReason.
  2. Use it to gate log level: debug for known-benign failures, warn/error for unexpected ones (wallet / RPC outage / unrecognised codes).
  3. Plumb the inner Solana error (error.data.err, error.data.logs) from the @solana/rpc-transformers wrapper into the catch block so fine-grained gating becomes possible — e.g. distinguishing FillStatus already filled (silent) from compute budget exceeded (worth alerting on).

(2) and (3) are deliberately deferred to keep this PR minimal and immediately deployable.
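
As a rough illustration of step 1, the helper could look something like the sketch below. The constant and helper names come from the list above; the `hasNumericCode` guard and the `{ code: number }` error shape are hypothetical stand-ins for the real `isSolanaError` typeguard and `SolanaError` type, which are not reproduced here.

```typescript
// JSON-RPC code quoted in the "Failed to send fill transaction" log line.
const SVM_TRANSACTION_PREFLIGHT_FAILURE = -32002;

// Codes treated as benign for fill transactions (starts with a single entry).
const knownSolanaFillErrorCodes: ReadonlySet<number> = new Set([
  SVM_TRANSACTION_PREFLIGHT_FAILURE,
]);

// Hypothetical stand-in for isSolanaError(e): accepts any object with a numeric `code`.
function hasNumericCode(e: unknown): e is { code: number } {
  return (
    typeof e === "object" &&
    e !== null &&
    typeof (e as { code?: unknown }).code === "number"
  );
}

// True only for errors whose code is in the known-benign set.
// Anything that fails the guard (plain Error, undefined, strings) is never ignorable.
function canIgnoreSvmFillError(e: unknown): boolean {
  return hasNumericCode(e) && knownSolanaFillErrorCodes.has(e.code);
}
```

Keeping the codes in a set means extending the benign list later is a one-line change, and unrecognised errors stay loud by default.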

Test plan

  • Solana fill that hits -32002 (preflight failure): no Slack alert; appears at debug level with mrkdwn context preserved.
  • Successful Solana fill: unchanged — "Filled v3 deposit on SVM 🚀" still logs at info.
  • simulateQueue failure path: unchanged — still alerts via notificationPath: "across-error".

Originated from a Slack thread investigating a -32002 alert for deposit 4363585. Full triage transcript including the regression diagnosis is in-thread.

Recent SDK changes (across-protocol/sdk#1440, #1443) preserve
SolanaError through the quorum/fallback layer, which made
SvmFillerClient's "Failed to send fill transaction (Solana error code:
-32002)" log line fire at error level (with mrkdwn) on every Solana
preflight failure. Previously these were swallowed by the
isSolanaError(e) === false fallback path, so the alert is effectively
a regression.

Per ops directive, FillRelay-based errors should be silent: a failed
fill is recoverable (another relayer fills it, or this relayer retries
on the next loop). Downgrade the two catch sites in executeFillImmediately
and executeTxnQueue to debug. simulateQueue's error log is a separate
batch-level signal and unchanged.

Follow-up: mirror MultiCallerClient's knownRevertReasons /
canIgnoreRevertReason pattern with a knownSolanaFillErrorCodes helper,
so EVM and SVM filler implementations gate alerting symmetrically.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 14b8d5f9bb

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/clients/SvmFillerClient.ts Outdated
Comment thread src/clients/SvmFillerClient.ts Outdated
Codex P1 feedback: the blanket error→debug downgrade also silenced
unknown failures (RPC outage, signer/wallet, malformed tx) that should
still alert. Extract `knownSolanaFillErrorCodes` + `canIgnoreSvmFillError`
helper so only `SVM_TRANSACTION_PREFLIGHT_FAILURE` (the recoverable race)
logs at debug; unknown failures keep error level + `across-error`
notificationPath. Mirrors EVM `canIgnoreRevertReason` in MultiCallerClient.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@droplet-rl
Contributor Author

@codex addressed both P1 comments in b07ed26 (will land on push).

Added knownSolanaFillErrorCodes + canIgnoreSvmFillError(e) helper. Both catch sites (executeFillImmediately, executeTxnQueue) now gate on it:

  • Known-benign (currently just SVM_TRANSACTION_PREFLIGHT_FAILURE = -32002) → debug, no notification.
  • Anything else (RPC outage, signer/wallet, malformed tx build, non-SolanaError thrown) → error + notificationPath: "across-error", same alerting behavior as before.

Non-SolanaError errors (the !isSolanaError(e) branch) also fall through to error since they fail the typeguard. This closes the visibility gap you flagged on RPC/signer/build failures.

Also collapsed the old retryableErrorCodes array into the same knownSolanaFillErrorCodes set — they are the same concept (retry-once on -32002 is just the eager half of "this code is recoverable").
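
A minimal sketch of how a catch site might gate on the helper, assuming a logger with `debug` and `error` methods that accept a structured payload. The `Logger` interface, `errorCode`, and `logFillFailure` are hypothetical names for illustration; only `knownSolanaFillErrorCodes`, `canIgnoreSvmFillError`, the -32002 code, and the `across-error` notification path come from this thread.

```typescript
type LogPayload = {
  at: string;
  message: string;
  mrkdwn: string;
  notificationPath?: string;
};

// Hypothetical logger shape; the real client uses its own logger instance.
interface Logger {
  debug(payload: LogPayload): void;
  error(payload: LogPayload): void;
}

const SVM_TRANSACTION_PREFLIGHT_FAILURE = -32002;
const knownSolanaFillErrorCodes: ReadonlySet<number> = new Set([
  SVM_TRANSACTION_PREFLIGHT_FAILURE,
]);

// Extracts a numeric code if the thrown value has one (stand-in for the SolanaError typeguard).
function errorCode(e: unknown): number | undefined {
  if (typeof e !== "object" || e === null) return undefined;
  const code = (e as { code?: unknown }).code;
  return typeof code === "number" ? code : undefined;
}

function canIgnoreSvmFillError(e: unknown): boolean {
  const code = errorCode(e);
  return code !== undefined && knownSolanaFillErrorCodes.has(code);
}

// Known-benign failures log quietly; everything else keeps the alerting path.
function logFillFailure(logger: Logger, e: unknown, mrkdwn: string): "debug" | "error" {
  const base: LogPayload = {
    at: "SvmFillerClient",
    message: "Failed to send fill transaction",
    mrkdwn,
  };
  if (canIgnoreSvmFillError(e)) {
    logger.debug(base);
    return "debug";
  }
  logger.error({ ...base, notificationPath: "across-error" });
  return "error";
}
```

Because the same set also drives the retry decision described above, a code added to it becomes both retryable and quiet at once, which is worth keeping in mind when extending it.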

EVM parity follow-up (extending the known-set, plumbing error.data.err/error.data.logs from the wrapper for finer-grained gating) remains deferred per the PR description.

@droplet-rl
Contributor Author

Addressed Codex's two P1 comments — both flagged the same issue: the blanket error → debug downgrade also silences unknown failures (RPC outage, signer/wallet, malformed tx), not just the benign -32002 race.

Fix (b07ed26 on the PR branch):

  • Added knownSolanaFillErrorCodes = Set([SVM_TRANSACTION_PREFLIGHT_FAILURE]) + canIgnoreSvmFillError(e) helper.
  • Both catch sites now gate on it:
    • Known-benign → debug, no notification (the intended quiet path).
    • Unknown (incl. non-SolanaError) → error + notificationPath: "across-error", same alerting as before.
  • Collapsed the old retryableErrorCodes array into the same set — same concept.

yarn typecheck + yarn lint clean. Posted a @codex mention on the PR to re-trigger review: #3416 (comment)

@chatgpt-codex-connector

No follow-up code changes were needed in this workspace: the PR branch already contains commit b07ed26, and the requested P1 gating behavior is present in src/clients/SvmFillerClient.ts. Specifically, both catch sites now use canIgnoreSvmFillError(e) to keep -32002 at debug while preserving error + notificationPath: "across-error" for unknown/non-SolanaError failures, and retry logic reuses the same known-code set. See src/clients/SvmFillerClient.ts L48-L55, L168-L189, L210-L225, and L252-L272.

Because there were no new edits to commit, I did not create an additional commit and did not call make_pr (per your constraints).

Checks run

  • cd /workspace/relayer && find .. -name AGENTS.md -print
  • cd /workspace/relayer && git status --short && git log --oneline -n 5 && rg -n "knownSolanaFillErrorCodes|canIgnoreSvmFillError|SVM_TRANSACTION_PREFLIGHT_FAILURE|retryableErrorCodes" src/clients/SvmFillerClient.ts
  • cd /workspace/relayer && nl -ba src/clients/SvmFillerClient.ts | sed -n '40,290p'

@droplet-rl
Contributor Author

Codex confirmed b07ed26 addresses both P1 gating concerns — no further code changes needed. Leaving the PR as-is for human review.

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Keep them coming!

@droplet-rl
Contributor Author

Codex re-reviewed the gating fix and found no issues. PR is clean from automated review — waiting on human review.

Add an `SvmTxKind` discriminator ("fillRelay" | "slowFillRequest") to
each queued tx and require `kind === "fillRelay"` in
`canIgnoreSvmFillError` alongside the existing code check. Slow-fill
request preflight failures now stay at error level — they're rarer and
worth alerting on, and "method name = fillRelay" was the gating
dimension Paul explicitly asked for.

Public signature change: `enqueueFillRelayTxPromises` now takes
`SvmTxKind` between `txPromises` and `message`. `enqueueFill` /
`enqueueSlowFill` set it internally; Relayer.ts call sites unchanged.
`executeFillImmediately` is fillRelay-only by construction so passes
the literal at the call site.

Builds on the prior commit (which introduced `canIgnoreSvmFillError`
and notificationPath symmetry). Now both gating dimensions are
explicit: tx kind + benign error code, mirroring the precision of
MultiCallerClient's revert-reason gating.
@droplet-rl
Contributor Author

@paul addressed in 6aaf087: stacked a kind discriminator (SvmTxKind = "fillRelay" | "slowFillRequest") onto each queued tx and require kind === "fillRelay" inside canIgnoreSvmFillError alongside the existing code check.

Suppression now requires both gates:

  • kind === "fillRelay" (method-name gate — what you asked for)
  • code === SVM_TRANSACTION_PREFLIGHT_FAILURE (known-benign error gate — from b07ed26)

Behavior matrix:

| kind | code | log level | notificationPath |
| --- | --- | --- | --- |
| fillRelay | preflight (-32002) | debug | (none) |
| fillRelay | non-preflight | error | across-error |
| slowFillRequest | preflight | error | across-error |
| slowFillRequest | non-preflight | error | across-error |
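
The matrix above can be expressed as a small pure function. This is a sketch of the two-gate behavior as described for 6aaf087 only; `logTarget` is a hypothetical name, and the two-argument form of `canIgnoreSvmFillError` is an assumption about its shape rather than the actual signature.

```typescript
type SvmTxKind = "fillRelay" | "slowFillRequest";

const SVM_TRANSACTION_PREFLIGHT_FAILURE = -32002;

// Two-gate check: suppression needs both the fillRelay kind and the benign code.
function canIgnoreSvmFillError(kind: SvmTxKind, code: number | undefined): boolean {
  return kind === "fillRelay" && code === SVM_TRANSACTION_PREFLIGHT_FAILURE;
}

interface LogTarget {
  level: "debug" | "error";
  notificationPath?: string;
}

// Maps (kind, code) to the log level and notification path from the matrix.
function logTarget(kind: SvmTxKind, code: number | undefined): LogTarget {
  return canIgnoreSvmFillError(kind, code)
    ? { level: "debug" }
    : { level: "error", notificationPath: "across-error" };
}
```

Only the first row of the matrix takes the quiet branch; a thrown value with no code at all (passed here as `undefined`) alerts like any other unknown failure.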

Public API change: enqueueFillRelayTxPromises now takes SvmTxKind between txPromises and message. The two specific enqueuers (enqueueFill, enqueueSlowFill) set the kind internally, so Relayer.ts call sites are unchanged. executeFillImmediately is fillRelay-only by construction and passes the literal "fillRelay".

Typecheck + lint both clean locally. The kind is also surfaced on the catch-site log payload so it shows up in log search.

Comment thread src/clients/SvmFillerClient.ts Outdated
pxrl flagged: the prior commit's `kind === "fillRelay"` gate excluded
slowFillRequest from suppression, so a slow-fill request that hit a
known-benign Solana preflight failure still alerted. Both kinds can
race the same way (another actor wins, or our outer loop retries), so
gate on the error code only.

Keep the SvmTxKind discriminator + structured log field — still useful
diagnostically when triaging a failure post-hoc.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@droplet-rl
Contributor Author

pxrl flagged that slowFillRequest was being excluded from the suppression set (the prior kind === "fillRelay" gate kept it loud). Fixed in bc2d66e: dropped the kind gate from canIgnoreSvmFillError so both fillRelay and slowFillRequest qualify on the benign error code. Kept the SvmTxKind discriminator + structured log field for diagnostic value.

yarn typecheck + yarn lint clean. Replied to pxrl's inline comment: #3416 (comment)
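
For reference, a sketch of the behavior after bc2d66e as described here: the error code alone decides suppression, and the kind travels only as a structured log field. `describeFillFailure` and the `FillFailureLog` shape are hypothetical and exist only to make the outcome concrete.

```typescript
type SvmTxKind = "fillRelay" | "slowFillRequest";

const SVM_TRANSACTION_PREFLIGHT_FAILURE = -32002;
const knownSolanaFillErrorCodes: ReadonlySet<number> = new Set([
  SVM_TRANSACTION_PREFLIGHT_FAILURE,
]);

// Code-only gate: the tx kind no longer affects suppression.
function canIgnoreSvmFillError(code: number | undefined): boolean {
  return code !== undefined && knownSolanaFillErrorCodes.has(code);
}

interface FillFailureLog {
  level: "debug" | "error";
  kind: SvmTxKind; // kept for post-hoc triage and log search
  code: number | undefined;
  notificationPath?: string;
}

// Builds the log entry a catch site would emit for a failed transaction.
function describeFillFailure(kind: SvmTxKind, code: number | undefined): FillFailureLog {
  if (canIgnoreSvmFillError(code)) {
    return { level: "debug", kind, code };
  }
  return { level: "error", kind, code, notificationPath: "across-error" };
}
```

With this shape, a slow-fill request that hits -32002 is quiet just like a fill, while the `kind` field still distinguishes the two when searching logs.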

@droplet-rl
Contributor Author

pxrl approved with no further comments. PR is good to merge.

pxrl merged commit e344e8a into master on May 26, 2026
5 checks passed
pxrl deleted the droplet/T90K0AL22-C07063X25NG-1779772982-896759 branch on May 26, 2026 08:58