New complex acos function. #9096
No actionable comments were generated in the recent review. 🎉
📝 Walkthrough (summary by CodeRabbit): Changes: Complex acos arithmetic optimization
Actionable comments posted: 1
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: fa3d28f3-03af-4c48-9c2d-954c73bc2ca3
📒 Files selected for processing (1)
libcudacxx/include/cuda/std/__complex/inverse_trigonometric_functions.h
/ok to test 01ee399
🥳 CI Workflow Results
🟩 Finished in 1h 53m: Pass: 100%/116 | Total: 2d 18h | Max: 1h 15m | Hits: 70%/516144
See results here.
Unlike `asin` and `atan`, `acos` needs more than just a call to the equivalent inverse-hyperbolic function.
Doing it this way fixes all the under/overflow issues.
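To illustrate the difference, here is a minimal host-side sketch using `std::complex<double>`. It is not the code in this PR: the helper names `asin_via_asinh` and `acos_via_acosh` are hypothetical, and the sketch only shows the textbook identities, without the extra handling the PR adds for under/overflow. For `asin`, rotating the argument by `i`, calling `asinh`, and rotating back is enough. For `acos`, the result of `acosh` has to be rotated in a direction that depends on the sign of `Im(z)`.

```cpp
#include <cmath>
#include <complex>

// Hypothetical helper (not the PR's implementation):
// asin(z) = -i * asinh(i * z), a plain rotate / call / rotate-back.
std::complex<double> asin_via_asinh(std::complex<double> z)
{
  // i * z = (-Im z, Re z)
  const std::complex<double> w = std::asinh(std::complex<double>(-z.imag(), z.real()));
  // -i * w = (Im w, -Re w)
  return {w.imag(), -w.real()};
}

// Hypothetical helper (not the PR's implementation):
// acos(z) = -i * acosh(z) when Im(z) >= 0, and +i * acosh(z) otherwise.
// Both cases collapse into an absolute value and a sign copy.
std::complex<double> acos_via_acosh(std::complex<double> z)
{
  // Principal value: Re(w) >= 0, Im(w) in [-pi, pi]
  const std::complex<double> w = std::acosh(z);
  // Re(acos z) lies in [0, pi]
  const double re = std::fabs(w.imag());
  // Im(acos z) takes the sign opposite to Im(z)
  const double im = std::copysign(w.real(), -z.imag());
  return {re, im};
}
```

Away from the branch cuts, both helpers should agree with `std::asin` and `std::acos` to within rounding error.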
Perf
This result is slightly suspicious. Although the new function is essentially a wrapper around the (fairly large) `acosh` with an extra `fma` and some sign flips, it is considerably slower than `acosh`, as seen here, whereas I would expect the two to perform similarly.
Either we have hit a register-usage boundary, or `acosh` has not been inlined.
It is also possible that the values math_bench tests (which try to approximate real-life usage) now hit a slow path in `acosh` more often.
To be investigated, as this is roughly 1.5x slower than anticipated.
[Figure: Operations/SM/cycle for cacos()]
Correctness