Skip to content

[BugFix][Target][LLVM] Route sinh/cosh/atan/asinh/erf through libm extern#19568

Open
swjng wants to merge 3 commits into
apache:mainfrom
swjng:fix/llvm-math-overflow
Open

[BugFix][Target][LLVM] Route sinh/cosh/atan/asinh/erf through libm extern#19568
swjng wants to merge 3 commits into
apache:mainfrom
swjng:fix/llvm-math-overflow

Conversation

@swjng
Copy link
Copy Markdown
Contributor

@swjng swjng commented May 15, 2026

Summary

Six LLVM legalize rules in src/target/llvm/intrin_rule_llvm.cc use inline mathematical identities that fail on representable inputs because the intermediate computation overflows or cancels, even though the true result is in float32 range:

Op Inline form Failure True result
sinh/cosh (#19559) (exp(x) ± exp(-x)) / 2 exp(89) > FLT_MAX, intermediate is inf sinh(89) ≈ 2.24e38
atan (#19560) asin(x / sqrt(x²+1)) overflows for ` x
asinh (#19561) log(x + sqrt(x²+1)) same overflow → log(inf)=inf asinh(3e22) ≈ 52.45
erf (#19562) A&S 1 − poly(t)·exp(−x²) poly·exp(−x²) ≈ 1 for tiny ` x
acosh (no issue) log(x + sqrt(x²−1)) same overflow → inf acosh(3e22) ≈ 52.45

acosh was not in the original issue cluster but shows the identical bug pattern to asinh; folding it in keeps this PR's scope consistent ("naive math identity → libm extern"). Happy to split it out if reviewers prefer.

Fix

Route all six through the existing DispatchPureExtern<FloatSuffix> helper — i.e. sinhf, coshf, atanf, asinhf, acoshf, erff — the same pattern asin/acos use after #19567. ULP-grade accuracy across the reported ranges.

sinh(89.0):    ORT=2.244806e+38  TVM=2.244806e+38  (was inf)
atan(3e22):    ORT=1.5707964     TVM=1.5707963     (was 0.0)
asinh(3e22):   ORT=52.44863      TVM=52.44863      (was inf)
acosh(3e22):   ORT=52.44863      TVM=52.44863      (was inf)
erf(3e-12):    ORT=3.385e-12     TVM=3.385e-12     (was 0.0)

Atan is re-enabled in test_unary; the overflow that previously broke it is fixed.

Notes for reviewers

Inline-vs-extern decision. If the inline identities were a deliberate fast-path (e.g. for autovectorization or to avoid extern-call overhead in tight loops), please flag it and I'll switch to stable inline forms instead — exp(x − ln 2) ± exp(−x − ln 2) for sinh/cosh, range-reduced asinh/acosh sign(x)·log(2|x|) for large |x|, small-|x| Taylor branch for erf, etc. I could not find evidence of such intent in the git history (sinh/cosh: original commit; atan/asinh/acosh: #17945 / #17969 follow-ups; erf: #18104 was framed as "more precise than tanh-approx", not "fast inline").

Fixes #19559.
Fixes #19560.
Fixes #19561.
Fixes #19562.

…tern

Five LLVM legalize rules used inline mathematical identities that fail
on representable inputs because the intermediate computation overflows
or cancels, even though the true result is in range:

- `sinh`/`cosh`: `(exp(x) ± exp(-x)) / 2` — `exp(89) > FLT_MAX` so
  `exp(x)` itself overflows. True `sinh(89) ≈ 2.24e38` fits in float32.
  (apache#19559)
- `atan`: `asin(x / sqrt(x*x + 1))` — `x*x` overflows for
  `|x| > sqrt(FLT_MAX) ≈ 1.84e19`, then `sqrt(inf) = inf`,
  `x / inf = 0`, `asin(0) = 0`. (apache#19560)
- `asinh`: `log(x + sqrt(x*x + 1))` — same `x*x` overflow. True
  `asinh(3e22) ≈ 52.45`. (apache#19561)
- `erf`: Abramowitz–Stegun `1 - poly(t) * exp(-x*x)` — for small `|x|`,
  `poly * exp(-x*x) ≈ 1` and the subtraction cancels to 0 in float32,
  flushing `erf(3e-12)` to 0 instead of the true `~3.4e-12`. (apache#19562)

All four route through the existing `DispatchPureExtern<FloatSuffix>`
helper (i.e. `sinhf`, `coshf`, `atanf`, `asinhf`, `erff`), the same
pattern already used by `asin`/`acos`. ULP-grade accuracy across
representative ranges; `Atan` is re-enabled in `test_unary` since the
overflow that previously broke it is fixed.

Note for reviewers: if the inline identities were a deliberate
fast-path (e.g. for autovectorization or to avoid extern call overhead
in tight loops), please flag it and I'll switch to stable inline forms
(`exp(x − ln 2) ± exp(−x − ln 2)` for sinh/cosh, range-reduced asinh,
small-x Taylor for erf, etc.). I could not find evidence of such intent
in the git history.

Acosh shows the same `sqrt(x*x − 1)` overflow pattern but is not
covered by any of the listed issues; happy to include it as a follow-up
if maintainers want.

Fixes apache#19559.
Fixes apache#19560.
Fixes apache#19561.
Fixes apache#19562.
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the LLVM legalization rules for several intrinsic operations, including cosh, sinh, atan, asinh, and erf, by replacing manual mathematical expansions with a unified DispatchPureExtern call. Additionally, the Atan operator has been enabled in the ONNX frontend tests. The reviewer pointed out that the using namespace intrin; statements, local call variable assignments, and manual null checks are redundant in the new implementation because DispatchPureExtern is fully qualified and performs its own internal validation.

Comment thread src/target/llvm/intrin_rule_llvm.cc Outdated
Comment thread src/target/llvm/intrin_rule_llvm.cc Outdated
Comment thread src/target/llvm/intrin_rule_llvm.cc Outdated
Comment thread src/target/llvm/intrin_rule_llvm.cc Outdated
Comment thread src/target/llvm/intrin_rule_llvm.cc Outdated
swjng added 2 commits May 15, 2026 16:41
`acosh` has the same `sqrt(x*x - 1)` overflow pattern as `asinh`:
intermediate `x*x` overflows float32 for `|x| > sqrt(FLT_MAX) ≈ 1.84e19`,
so `sqrt(inf) = inf`, `log(x + inf) = inf`, while the true result
`acosh(3e22) ≈ 52.45` is well within range. No issue was filed for this
op but the bug is identical to apache#19561 and the fix is the same.
…rappers

Per review feedback: with the inline math identities gone, the
`DispatchPureExtern<FloatSuffix>` call is fully qualified
(`::tvm::codegen::intrin::...`) and the `using namespace intrin;` line
inside each lambda no longer brings anything into scope. Drop it from
the six ops touched in this PR (sinh, cosh, atan, asinh, acosh, erf).
The `CallNode* call` ICHECK is kept for parity with the rest of the
file (every legalize lambda in this translation unit performs that
check).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

1 participant