fix(svm): silence FillRelay txn failures (debug, not error) #3416
Recent SDK changes (across-protocol/sdk#1440, #1443) preserve SolanaError through the quorum/fallback layer, which made SvmFillerClient's "Failed to send fill transaction (Solana error code: -32002)" log line fire at error level (with mrkdwn) on every Solana preflight failure. Previously these were swallowed by the isSolanaError(e) === false fallback path, so the alert is effectively a regression.
Per ops directive, FillRelay-based errors should be silent: a failed fill is recoverable (another relayer fills it, or this relayer retries on the next loop). Downgrade the two catch sites in executeFillImmediately and executeTxnQueue to debug. simulateQueue's error log is a separate batch-level signal and unchanged.
Follow-up: mirror MultiCallerClient's knownRevertReasons / canIgnoreRevertReason pattern with a knownSolanaFillErrorCodes helper, so EVM and SVM filler implementations gate alerting symmetrically.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 14b8d5f9bb
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Codex P1 feedback: the blanket error→debug downgrade also silenced unknown failures (RPC outage, signer/wallet, malformed tx) that should still alert. Extract `knownSolanaFillErrorCodes` + `canIgnoreSvmFillError` helper so only `SVM_TRANSACTION_PREFLIGHT_FAILURE` (the recoverable race) logs at debug; unknown failures keep error level + `across-error` notificationPath. Mirrors EVM `canIgnoreRevertReason` in MultiCallerClient.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
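A minimal sketch of the gating this commit message describes, for readers who want to see the shape of it. The names `knownSolanaFillErrorCodes`, `canIgnoreSvmFillError`, and `SVM_TRANSACTION_PREFLIGHT_FAILURE` come from the commit message; the error shape (a numeric `code` property), the locally defined constant, and the `logPlanFor` helper are assumptions made for illustration and may not match the real `SvmFillerClient` code.

```typescript
// Local stand-in for the SDK constant (assumption); -32002 is the code quoted in the log line.
const SVM_TRANSACTION_PREFLIGHT_FAILURE = -32002;

// Error codes treated as a recoverable race rather than an incident.
const knownSolanaFillErrorCodes: ReadonlySet<number> = new Set([SVM_TRANSACTION_PREFLIGHT_FAILURE]);

// Assumed error shape: any object exposing a numeric `code`.
function hasNumericCode(e: unknown): e is { code: number } {
  return typeof e === "object" && e !== null && typeof (e as { code?: unknown }).code === "number";
}

// True only for failures whose code is in the known-benign set.
function canIgnoreSvmFillError(e: unknown): boolean {
  return hasNumericCode(e) && knownSolanaFillErrorCodes.has(e.code);
}

// Hypothetical helper: decide how a failure should be logged.
type LogPlan = { level: "debug" | "error"; notificationPath?: string };

function logPlanFor(e: unknown): LogPlan {
  return canIgnoreSvmFillError(e)
    ? { level: "debug" }
    : { level: "error", notificationPath: "across-error" };
}
```

Under these assumptions, an error without a recognised code (for example a plain `Error` from an RPC outage) falls through to the error-level branch, which is the behaviour the P1 feedback asked to keep.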
@codex addressed both P1 comments in Added
Non- Also collapsed the old EVM parity follow-up (extending the known-set, plumbing
Addressed Codex's two P1 comments — both flagged the same issue: the blanket Fix (
No follow-up code changes were needed in this workspace: the PR branch already contains commit Because there were no new edits to commit, I did not create an additional commit and did not call Checks run
Codex confirmed
Codex Review: Didn't find any major issues. Keep them coming!
Codex re-reviewed the gating fix and found no issues. PR is clean from automated review; waiting on human review.
Add an `SvmTxKind` discriminator ("fillRelay" | "slowFillRequest") to
each queued tx and require `kind === "fillRelay"` in
`canIgnoreSvmFillError` alongside the existing code check. Slow-fill
request preflight failures now stay at error level — they're rarer and
worth alerting on, and "method name = fillRelay" was the gating
dimension Paul explicitly asked for.
Public signature change: `enqueueFillRelayTxPromises` now takes
`SvmTxKind` between `txPromises` and `message`. `enqueueFill` /
`enqueueSlowFill` set it internally; Relayer.ts call sites unchanged.
`executeFillImmediately` is fillRelay-only by construction so passes
the literal at the call site.
Builds on the prior commit (which introduced `canIgnoreSvmFillError`
and notificationPath symmetry). Now both gating dimensions are
explicit: tx kind + benign error code, mirroring the precision of
MultiCallerClient's revert-reason gating.
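The two-gate rule in this commit can be sketched as follows. `SvmTxKind` and its two members are taken from the commit message; the two-argument form of `canIgnoreSvmFillError`, the local constant, and the `behaviorMatrix` helper are assumptions for illustration only and are not the real signatures.

```typescript
type SvmTxKind = "fillRelay" | "slowFillRequest";

// Local stand-in for the SDK constant (assumption); -32002 is the preflight-failure code from the log line.
const SVM_TRANSACTION_PREFLIGHT_FAILURE = -32002;
const knownSolanaFillErrorCodes: ReadonlySet<number> = new Set([SVM_TRANSACTION_PREFLIGHT_FAILURE]);

// Suppress only when BOTH gates pass: the tx is a fillRelay AND the code is known-benign.
function canIgnoreSvmFillError(kind: SvmTxKind, code: number | undefined): boolean {
  return kind === "fillRelay" && code !== undefined && knownSolanaFillErrorCodes.has(code);
}

type MatrixRow = { kind: SvmTxKind; code: number | undefined; level: "debug" | "error" };

// Hypothetical helper: enumerate every kind/code combination and the resulting log level.
function behaviorMatrix(): MatrixRow[] {
  const kinds: SvmTxKind[] = ["fillRelay", "slowFillRequest"];
  const codes: Array<number | undefined> = [SVM_TRANSACTION_PREFLIGHT_FAILURE, undefined];
  const rows: MatrixRow[] = [];
  for (const kind of kinds) {
    for (const code of codes) {
      rows.push({ kind, code, level: canIgnoreSvmFillError(kind, code) ? "debug" : "error" });
    }
  }
  return rows;
}
```

In this sketch only one of the four combinations (fillRelay with the preflight-failure code) resolves to debug; a slow-fill request with the same code stays at error level, as this commit intends.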
@paul addressed in Suppression now requires both gates:
Behavior matrix:
Public API change: Typecheck + lint both clean locally. The
pxrl flagged: the prior commit's `kind === "fillRelay"` gate excluded slowFillRequest from suppression, so a slow-fill request that hit a known-benign Solana preflight failure still alerted. Both kinds can race the same way (another actor wins, or our outer loop retries), so gate on the error code only. Keep the SvmTxKind discriminator + structured log field, which is still useful diagnostically when triaging a failure post-hoc.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
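A rough sketch of the behaviour after this commit: suppression depends on the error code alone, while the tx kind is carried along purely as a log field. `SvmTxKind` and the `across-error` path come from the thread; the `FillFailureLog` shape, the `benignCodes` set, and `buildFillFailureLog` are hypothetical names introduced here for illustration.

```typescript
type SvmTxKind = "fillRelay" | "slowFillRequest";

// Assumed known-benign set; -32002 is the preflight-failure code quoted in the log line.
const benignCodes: ReadonlySet<number> = new Set([-32002]);

// Hypothetical structured log payload.
interface FillFailureLog {
  level: "debug" | "error";
  message: string;
  kind: SvmTxKind; // diagnostic only, no longer part of the gate
  errorCode: number | undefined;
  notificationPath?: string;
}

function buildFillFailureLog(kind: SvmTxKind, errorCode: number | undefined): FillFailureLog {
  // Gate on the error code only; both tx kinds can lose the same race.
  const benign = errorCode !== undefined && benignCodes.has(errorCode);
  return {
    level: benign ? "debug" : "error",
    message: "Failed to send fill transaction",
    kind,
    errorCode,
    ...(benign ? {} : { notificationPath: "across-error" }),
  };
}
```

Keeping `kind` in the payload means a post-hoc search can still separate fill failures from slow-fill-request failures even though both are now suppressed identically.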
pxrl flagged that
pxrl approved with no further comments. PR is good to merge.
Summary
Recent SDK changes (across-protocol/sdk#1440 and #1443) preserve `SolanaError` type information through `QuorumFallbackSolanaRpcFactory`. As a side effect, `SvmFillerClient`'s `Failed to send fill transaction (Solana error code: -32002)` log line now consistently fires at error level with `mrkdwn` set on every Solana preflight failure (most commonly: a competing relayer landed the fill first, so our preflight rejects). Previously the wrapped `SolanaError` failed the `isSolanaError(e)` typeguard and fell through into a quieter branch, so the alert is effectively a regression.
Operational context
Last 24h on `zion-across-relayer-primary` saw 8 distinct `-32002` alerts to Slack. Cross-referenced each against the indexer's final `fillTxHash`: in all 8 cases the deposit was filled within ~6 minutes. There is no user-visible incident; these alerts are spamming `#alerts` for what is, by design, a recoverable outcome.
Change
Downgrade `this.logger.error` → `this.logger.debug` at the two FillRelay-failure catch sites in `src/clients/SvmFillerClient.ts`:
- `executeFillImmediately` (~L174)
- `executeTxnQueue` (~L256)
`simulateQueue`'s `logger.error` with `notificationPath: "across-error"` is unchanged; it's a distinct batch-level signal about simulation health. `mrkdwn` is retained on the (now debug-level) line so deposit context is preserved for log-trawling diagnosis. Per ops directive, FillRelay-based errors should be silent (a failed fill is recoverable: another relayer fills, or our outer loop re-picks).
Symmetry follow-up (not in this PR)
The EVM filler (`MultiCallerClient`) already encodes "this revert is fine, don't alert" via `knownRevertReasons` (`"RelayFilled"`, `"relay filled"`, ...) + `canIgnoreRevertReason()`: matching reverts log at `debug`; unexpected ones log at `error` with `notificationPath`. SVM has no equivalent: `SvmFillerClient` treats every fill failure as the same class today.
To bring EVM and SVM to parity, a follow-up should:
1. Add a `knownSolanaFillErrorCodes` constant (start: `[SVM_TRANSACTION_PREFLIGHT_FAILURE]`) and a `canIgnoreSvmFillError(e)` helper that mirrors `canIgnoreRevertReason`.
2. Log at `debug` for known-benign failures, `warn`/`error` for unexpected ones (wallet / RPC outage / unrecognised codes).
3. Plumb the error details (`error.data.err`, `error.data.logs`) from the `@solana/rpc-transformers` wrapper into the catch block so fine-grained gating becomes possible, e.g. distinguishing `FillStatus already filled` (silent) from `compute budget exceeded` (worth alerting on).
(2) and (3) are deliberately deferred to keep this PR minimal and immediately deployable.
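To make the fine-grained gating in item 3 more concrete, here is one possible sketch, assuming the preflight logs reach the catch block as an array of strings. The `PreflightErrorData` shape, the `benignLogFragments` list, and `isBenignPreflightFailure` are hypothetical; only the two example messages are taken from the description, and the real program log text may differ.

```typescript
// Assumed shape of the data attached to a preflight failure (hypothetical).
interface PreflightErrorData {
  err?: unknown;
  logs?: readonly string[];
}

// Log fragments treated as "someone else already filled" (illustrative, from the PR text).
const benignLogFragments: readonly string[] = ["FillStatus already filled"];

// True when any simulation log line contains a known-benign fragment.
function isBenignPreflightFailure(data: PreflightErrorData | undefined): boolean {
  const logs = data?.logs ?? [];
  return logs.some((line) => benignLogFragments.some((fragment) => line.includes(fragment)));
}
```

With a check like this, a preflight failure whose logs mention an exhausted compute budget would not match and would keep alerting, while an already-filled race could stay silent. Matching on substrings is a deliberate simplification; a real implementation might prefer decoded program error codes if they are available.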
Test plan
- `-32002` (preflight failure): no Slack alert; appears at debug level with `mrkdwn` context preserved.
- `"Filled v3 deposit on SVM 🚀"` still logs at info.
- `simulateQueue` failure path: unchanged; still alerts via `notificationPath: "across-error"`.
Originated from a Slack thread investigating a `-32002` alert for deposit `4363585`. Full triage transcript including the regression diagnosis is in-thread.