New complex acos function. by s-oboyle · Pull Request #9096 · NVIDIA/cccl

s-oboyle · 2026-05-21T13:57:08Z

Unlike asin and atan, acos needs more than a single call to the equivalent inverse hyperbolic function.
Implementing it as a thin wrapper around acosh fixes all the under/overflow issues.
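For reference, these are the standard principal-value identities behind that remark. The quadrant reduction in the last two lines is one common way to reach acosh from any input; it is an assumption about the approach, not a quote from the patch.

```latex
\begin{aligned}
\operatorname{asin}(z) &= -i\,\operatorname{asinh}(iz), \qquad
\operatorname{atan}(z) = -i\,\operatorname{atanh}(iz),\\
\operatorname{acos}(z) &= -i\,\operatorname{acosh}(z)
  \quad \text{for } \operatorname{Re} z \ge 0,\ \operatorname{Im} z \ge 0,\\
\operatorname{acos}(\bar z) &= \overline{\operatorname{acos}(z)}, \qquad
\operatorname{acos}(-z) = \pi - \operatorname{acos}(z).
\end{aligned}
```

So asin and atan map to one hyperbolic call on a rotated argument, while acos additionally needs the reflection by pi and the conjugation to cover the other three quadrants.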

Perf

We have a slightly suspicious result here. Although this is essentially a wrapper around the (fairly large) acosh with an extra fma and some sign flips, it performs much worse than acosh, as seen here; I would expect the two to be quite similar.
Either we have hit a register-usage boundary, or acosh has not been inlined.
It is also possible that the values math_bench tests (which try to approximate real-life usage) now hit a slow path in acosh more often.
To be investigated, as this is nearly 1.5x slower than anticipated.

Operations/SM/cycle:
cacos():

| H100 | old | new | new/old |
|------|--------|--------|---------|
| fp64 | 0.1518 | 0.1430 | 0.94 |
| fp32 | 0.4461 | 0.4072 | 0.91 |

Correctness

GPU fp64:
Max ulp real error (4.772,0.3392) @ (4174.277773,1.009244847)	(0x40b04e471c22e769,0x3ff025ddec967ac6)
	Ours = (0.0002417771195,-9.029843829)    Ref = (0.0002417771195,-9.029843829)
	Ours = (0x3f2fb0b1a45a9274,0xc0220f47b0bd75df)               Ref = (0x3f2fb0b1a45a926f,0xc0220f47b0bd75df)

Max ulp imag error (0.7242,4.233) @ (-4.243991582e-314,0.0009757882705)	(0x8000000200000000,0x3f4ff9815ad08602)
	Ours = (1.570796327,-0.0009757881156)    Ref = (1.570796327,-0.0009757881156)
	Ours = (0x3ff921fb54442d19,0xbf4ff98105af1dab)               Ref = (0x3ff921fb54442d18,0xbf4ff98105af1daf)

GPU fp32:
Max ulp real error (4.768,0.2087) @ (1.67785461e+35,4.062130506e+31)	(0x7a0141d9,0x74002da1)
	Ours = (0.0002421027166,-81.80113983)    Ref = (0.0002421026438,-81.80113983)
	Ours = (0x397ddcf4,0xc2a39a2f)               Ref = (0x397ddcef,0xc2a39a2f)

Max ulp imag error (0.6333,4.976) @ (-2.802596929e-45,0.007756583858)	(0x80000002,0x3bfe2af1)
	Ours = (1.570796251,-0.007756503765)    Ref = (1.570796371,-0.007756506093)
	Ours = (0x3fc90fda,0xbbfe2a45)               Ref = (0x3fc90fdb,0xbbfe2a4a)

CPU fp64:
Max ulp real error (4.855,0.004554) @ (8476.368757,1.467965354e-247)    (0x40c08e2f336b8800,0xcb06c1784f32580)
        Ours = (1.731832824e-251,-9.738184602)    Ref = (1.731832824e-251,-9.738184602)
        Ours = (0xbdfbe1a4050c7e0,0xc02379f35503c149)               Ref = (0xbdfbe1a4050c7db,0xc02379f35503c149)

Max ulp imag error (0.7242,4.184) @ (-0,0.003891582592) (0x8000000000000000,0x3f6fe13d7ebcb800)
        Ours = (1.570796327,-0.003891572769)    Ref = (1.570796327,-0.003891572769)
        Ours = (0x3ff921fb54442d19,0xbf6fe13838bc3a57)               Ref = (0x3ff921fb54442d18,0xbf6fe13838bc3a53)

CPU fp32:
Max ulp real error (5.348,0.09948) @ (66712.41406,-8356.245117) (0x47824c35,0xc60290fb)
        Ours = (0.1246087849,11.80907726)    Ref = (0.1246087477,11.80907726)
        Ours = (0x3dff32e4,0x413cf1fb)               Ref = (0x3dff32df,0x413cf1fb)

Max ulp imag error (0.1838,3.693) @ (-0.4162324369,-0.1135988012)       (0xbed51c6b,0xbde8a67d)
        Ours = (1.996576786,0.1244143695)    Ref = (1.996576786,0.1244143993)
        Ours = (0x3fff8fd4,0x3dfeccf6)               Ref = (0x3fff8fd4,0x3dfeccfa)
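
The hex pairs in the reports above can be compared directly. Below is a minimal sketch that counts how many representable fp64 values lie between two bit patterns. The helper names are hypothetical and not part of the PR or of math_bench. The fractional figures in the reports (for example 4.772) are presumably measured against a higher-precision reference, so this integer distance to the rounded `Ref` value only approximates them.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: map an IEEE-754 binary64 bit pattern to a key that
// increases monotonically with the represented value. +0 and -0 share a key.
// NaN patterns are not meaningful inputs here.
constexpr std::uint64_t ordered_key(std::uint64_t bits)
{
  constexpr std::uint64_t sign = 0x8000000000000000ULL;
  const std::uint64_t mag      = bits & ~sign;
  return (bits & sign) ? sign - mag : sign + mag;
}

// Number of representable doubles separating the two bit patterns.
constexpr std::uint64_t ulp_distance_bits(std::uint64_t a, std::uint64_t b)
{
  const std::uint64_t ka = ordered_key(a);
  const std::uint64_t kb = ordered_key(b);
  return ka > kb ? ka - kb : kb - ka;
}

// Usage: bit patterns copied from the GPU fp64 report above.
inline void check_reported_patterns()
{
  // Real part at the max-real-error point: Ours vs Ref.
  assert(ulp_distance_bits(0x3f2fb0b1a45a9274ULL, 0x3f2fb0b1a45a926fULL) == 5);
  // Imaginary part at the max-imag-error point (negative values).
  assert(ulp_distance_bits(0xbf4ff98105af1dabULL, 0xbf4ff98105af1dafULL) == 4);
}
```

The sign-magnitude remapping is what lets negative patterns, such as the imaginary parts above, be subtracted like ordinary integers.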

copy-pr-bot · 2026-05-21T13:57:12Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-05-21T14:02:12Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 868d7bcb-fb59-43e1-ba9b-e9325a35558b

📥 Commits

Reviewing files that changed from the base of the PR and between 3f2087c and 01ee399.

📒 Files selected for processing (1)

libcudacxx/include/cuda/std/__complex/inverse_trigonometric_functions.h

📝 Walkthrough

Summary by CodeRabbit

Refactor
- Streamlined the internal implementation of the inverse cosine function for complex numbers in the CUDA standard library, improving computational efficiency and reducing code dependencies while maintaining equivalent behavior.

Walkthrough

The cuda::std::acos implementation for complex numbers is refactored to replace explicit special-case branches with sign-normalized arithmetic: extract signbits, compute acosh on absolute components, apply conditional pi correction via fma for negative real inputs, and restore original signs.

Changes

Complex acos arithmetic optimization

| Layer / File(s) | Summary |
|---|---|
| Header includes updated `libcudacxx/include/cuda/std/__complex/inverse_trigonometric_functions.h` | Added `__cmath/fma.h`, retained predicate headers (`isinf`, `isnan`, `signbit`), removed headers from the previous `log/sqrt` code path. |
| acos(complex<_Tp>) sign-normalized acosh computation `libcudacxx/include/cuda/std/__complex/inverse_trigonometric_functions.h` | Rewrote `acos(const complex<_Tp>&)` to use signbit/fabs normalization, compute `acosh` on magnitudes, reconstruct real/imag per quadrant, conditionally apply a pi high/low `fma` correction when original real<0, and flip imag sign for original imag<0. Removed prior explicit isinf/isnan/zero checks and `log + sqrt` path. |
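
The steps summarized above can be sketched in host-side C++ as follows. This is an illustration under stated assumptions, not the code merged in the PR: the function name `acos_via_acosh` is hypothetical, it uses `std::acosh` rather than the `cuda::std` one, it handles finite `double` inputs only, and it applies the high/low pi correction with two plain additions because the exact `fma` form used in the patch is not shown here.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

// Hypothetical helper, not the cuda::std implementation: acos(z) for finite z,
// built from a single acosh call on (|Re z|, |Im z|).
inline std::complex<double> acos_via_acosh(const std::complex<double>& z)
{
  // This sketch skips the inf/NaN handling a library implementation needs.
  assert(std::isfinite(z.real()) && std::isfinite(z.imag()));

  // Remember the original signs, then work in the first quadrant.
  const bool neg_re = std::signbit(z.real());
  const bool neg_im = std::signbit(z.imag());
  const std::complex<double> w =
    std::acosh(std::complex<double>(std::fabs(z.real()), std::fabs(z.imag())));

  // In the first quadrant acos(z) = -i * acosh(z), i.e. (a + ib) -> (b - ia).
  double re = w.imag();
  double im = -w.real();

  if (neg_re)
  {
    // acos(-z) = pi - acos(z). pi is split into hi + lo so that the rounding
    // error of the double nearest to pi is added back (assumed form).
    constexpr double pi_hi = 3.141592653589793;      // double nearest to pi
    constexpr double pi_lo = 1.2246467991473532e-16; // pi - pi_hi
    re = (pi_hi - re) + pi_lo;
  }
  if (neg_im)
  {
    im = -im; // acos(conj(z)) = conj(acos(z))
  }
  return {re, im};
}
```

For example, `acos_via_acosh({-0.5, 0.25})` should agree with `std::acos` at the same point to within a few ulp. The appeal of this layout is that all range and overflow handling lives in acosh, and the wrapper only adds branches on the two sign bits.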

Suggested reviewers

davebayer
fbusato

Comment `@coderabbitai help` to get the list of available commands and usage tips.

coderabbitai

Actionable comments posted: 1

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: fa3d28f3-03af-4c48-9c2d-954c73bc2ca3

📥 Commits

Reviewing files that changed from the base of the PR and between 52f2794 and 3f2087c.

📒 Files selected for processing (1)

libcudacxx/include/cuda/std/__complex/inverse_trigonometric_functions.h

s-oboyle · 2026-05-21T14:29:15Z

/ok to test 01ee399

github-actions · 2026-05-21T16:24:21Z

🥳 CI Workflow Results

🟩 Finished in 1h 53m: Pass: 100%/116 | Total: 2d 18h | Max: 1h 15m | Hits: 70%/516144

See results here.

New comlex acos function.

3f2087c

s-oboyle requested a review from a team as a code owner May 21, 2026 13:57

s-oboyle requested a review from griwes May 21, 2026 13:57

github-project-automation Bot added this to CCCL May 21, 2026

github-project-automation Bot moved this to Todo in CCCL May 21, 2026

cccl-authenticator-app Bot moved this from Todo to In Review in CCCL May 21, 2026

Cleanup

0cc59a5

s-oboyle requested review from davebayer, fbusato and miscco and removed request for griwes May 21, 2026 14:01

Merge branch 'main' into complex_acos_accuracy_refinement

e7e02f5

coderabbitai Bot reviewed May 21, 2026

Comment thread libcudacxx/include/cuda/std/__complex/inverse_trigonometric_functions.h Outdated

Remove old unused variable

01ee399

davebayer approved these changes May 21, 2026

s-oboyle merged commit c12def0 into NVIDIA:main May 21, 2026
134 of 137 checks passed

s-oboyle merged 4 commits into NVIDIA:main from s-oboyle:complex_acos_accuracy_refinement
