
Various improvements to the docs #3030

Merged
maleadt merged 13 commits into JuliaGPU:master from giordano:mg/docs
Apr 9, 2026

Conversation

@giordano
Contributor

I had some...uhm...fun in the last couple of days trying to port some C++ CUDA code to CUDA.jl and profile it. I dumped my experience into this PR, hoping to make the lives of people after me a little bit easier 🙂
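The kind of port-and-profile workflow described above can be sketched as follows. This is a minimal sketch, not code from the PR: it assumes CUDA.jl is installed, a CUDA-capable GPU is available, and it uses CUDA.jl's integrated profiler via `CUDA.@profile`. The kernel name `scale!` is made up for illustration.

```julia
using CUDA

# Hypothetical port of a small C++ CUDA kernel: scale a vector in place.
# Note that threadIdx()/blockIdx() are 1-based in CUDA.jl, unlike C/C++.
function scale!(a, s)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(a)
        @inbounds a[i] *= s
    end
    return
end

a = CUDA.rand(Float32, 2^20)
threads = 256
blocks = cld(length(a), threads)
@cuda threads=threads blocks=blocks scale!(a, 2f0)

# Collect a trace of host and device activity for the same launch.
CUDA.@profile @cuda threads=threads blocks=blocks scale!(a, 2f0)
```

Running under `CUDA.@profile` prints a summary of host-side API calls and device-side kernel executions, which is usually enough to spot obvious launch-overhead or memory-transfer problems before reaching for NSight.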

@github-actions
Contributor

github-actions Bot commented Feb 13, 2026

Your PR requires formatting changes to meet the project's style guidelines.
Please consider running Runic (git runic master) to apply these changes.

Suggested changes:
diff --git a/src/device/intrinsics/indexing.jl b/src/device/intrinsics/indexing.jl
index 5e6209fe3..dd9655911 100644
--- a/src/device/intrinsics/indexing.jl
+++ b/src/device/intrinsics/indexing.jl
@@ -92,62 +92,62 @@ end
 @doc """
     threadIdx()::NamedTuple
 
-Returns the thread index within the block as a `NamedTuple` with keys `x`, `y`, and `z`.
-These indices are 1-based, unlike the `threadIdx` built-in variable in the C/C++ extension which is 0-based.
+    Returns the thread index within the block as a `NamedTuple` with keys `x`, `y`, and `z`.
+    These indices are 1-based, unlike the `threadIdx` built-in variable in the C/C++ extension which is 0-based.
 """ threadIdx
 @inline threadIdx() = (x=threadIdx_x(), y=threadIdx_y(), z=threadIdx_z())
 
 @doc """
     blockDim()::NamedTuple
 
-Returns the dimensions (in threads) of the block as a `NamedTuple` with keys `x`, `y`, and `z`.
-Unlike the `*Idx` intrinsics, `blockDim` returns the same value as its C/C++ extension counterpart.
+    Returns the dimensions (in threads) of the block as a `NamedTuple` with keys `x`, `y`, and `z`.
+    Unlike the `*Idx` intrinsics, `blockDim` returns the same value as its C/C++ extension counterpart.
 """ blockDim
 @inline blockDim() = (x=blockDim_x(), y=blockDim_y(), z=blockDim_z())
 
 @doc """
     blockIdx()::NamedTuple
 
-Returns the block index within the grid as a `NamedTuple` with keys `x`, `y`, and `z`.
-These indices are 1-based, unlike the `blockIdx` built-in variable in the C/C++ extension which is 0-based.
+    Returns the block index within the grid as a `NamedTuple` with keys `x`, `y`, and `z`.
+    These indices are 1-based, unlike the `blockIdx` built-in variable in the C/C++ extension which is 0-based.
 """ blockIdx
 @inline blockIdx() = (x=blockIdx_x(), y=blockIdx_y(), z=blockIdx_z())
 
 @doc """
     gridDim()::NamedTuple
 
-Returns the dimensions (in blocks) of the grid as a `NamedTuple` with keys `x`, `y`, and `z`.
-Unlike the `*Idx` intrinsics, `gridDim` returns the same value as its C/C++ extension counterpart.
+    Returns the dimensions (in blocks) of the grid as a `NamedTuple` with keys `x`, `y`, and `z`.
+    Unlike the `*Idx` intrinsics, `gridDim` returns the same value as its C/C++ extension counterpart.
 """ gridDim
 @inline gridDim() = (x=gridDim_x(), y=gridDim_y(), z=gridDim_z())
 
 @doc """
     blockIdxInCluster()::NamedTuple
 
-Returns the block index within the cluster as a `NamedTuple` with keys `x`, `y`, and `z`.
-These indices are 1-based.
+    Returns the block index within the cluster as a `NamedTuple` with keys `x`, `y`, and `z`.
+    These indices are 1-based.
 """ blockIdxInCluster
 @inline blockIdxInCluster() = (x=blockIdxInCluster_x(), y=blockIdxInCluster_y(), z=blockIdxInCluster_z())
 
 @doc """
     clusterDim()::NamedTuple
 
-Returns the dimensions (in blocks) of the cluster as a `NamedTuple` with keys `x`, `y`, and `z`.
+    Returns the dimensions (in blocks) of the cluster as a `NamedTuple` with keys `x`, `y`, and `z`.
 """ clusterDim
 @inline clusterDim() = (x=clusterDim_x(), y=clusterDim_y(), z=clusterDim_z())
 
 @doc """
     clusterIdx()::NamedTuple
 
-Returns the cluster index within the grid as a `NamedTuple` with keys `x`, `y`, and `z`.
-These indices are 1-based.
+    Returns the cluster index within the grid as a `NamedTuple` with keys `x`, `y`, and `z`.
+    These indices are 1-based.
 """ clusterIdx
 @inline clusterIdx() = (x=clusterIdx_x(), y=clusterIdx_y(), z=clusterIdx_z())
 
 @doc """
     gridClusterDim()::NamedTuple
 
-Returns the dimensions (in clusters) of the grid as a `NamedTuple` with keys `x`, `y`, and `z`.
+    Returns the dimensions (in clusters) of the grid as a `NamedTuple` with keys `x`, `y`, and `z`.
 """ gridClusterDim
 @inline gridClusterDim() = (x=gridClusterDim_x(), y=gridClusterDim_y(), z=gridClusterDim_z())
 
@@ -155,7 +155,7 @@ Returns the dimensions (in clusters) of the grid as a `NamedTuple` with keys `x`
     linearBlockIdxInCluster()::Int32
 
 Returns the linear block index within the cluster.
-These indices are 1-based.
+    These indices are 1-based.
 """ linearBlockIdxInCluster
 @eval @inline $(:linearBlockIdxInCluster)() = _index($(Val(Symbol("cluster.ctarank"))), $(Val(0:max_cluster_length-1))) + 1i32
 
@@ -170,7 +170,7 @@ Returns the linear cluster size (in blocks).
     warpsize()::Int32
 
 Returns the warp size (in threads).
-This corresponds to the `warpSize` built-in variable in the C/C++ extension.
+    This corresponds to the `warpSize` built-in variable in the C/C++ extension.
 """ warpsize
 @inline warpsize() = ccall("llvm.nvvm.read.ptx.sreg.warpsize", llvmcall, Int32, ())
 
@@ -178,7 +178,7 @@ This corresponds to the `warpSize` built-in variable in the C/C++ extension.
     laneid()::Int32
 
 Returns the thread's lane within the warp.
-This ID is 1-based.
+    This ID is 1-based.
 """ laneid
 @inline laneid() = ccall("llvm.nvvm.read.ptx.sreg.laneid", llvmcall, Int32, ()) + 1i32
 

@github-actions Bot left a comment

CUDA.jl Benchmarks

Details
Benchmark suite Current: 69236fb Previous: e260b92 Ratio
latency/precompile 4073991756 ns 4060336008 ns 1.00
latency/ttfp 14097056796 ns 14188556469 ns 0.99
latency/import 3572377951 ns 3555441335 ns 1.00
integration/volumerhs 9444918.5 ns 9440665.5 ns 1.00
integration/byval/slices=1 145945 ns 145820 ns 1.00
integration/byval/slices=3 422989.5 ns 422996 ns 1.00
integration/byval/reference 143993 ns 143940 ns 1.00
integration/byval/slices=2 284659 ns 284595 ns 1.00
integration/cudadevrt 102612 ns 102603 ns 1.00
kernel/indexing 13380 ns 13331 ns 1.00
kernel/indexing_checked 13998.5 ns 14078 ns 0.99
kernel/occupancy 664.1677018633541 ns 692.6139240506329 ns 0.96
kernel/launch 2244.4444444444443 ns 2098.5555555555557 ns 1.07
kernel/rand 14734 ns 15622 ns 0.94
array/reverse/1d 18638 ns 18269 ns 1.02
array/reverse/2dL_inplace 66108 ns 66029 ns 1.00
array/reverse/1dL 69150 ns 68802 ns 1.01
array/reverse/2d 21119 ns 20617 ns 1.02
array/reverse/1d_inplace 8636 ns 10283.666666666666 ns 0.84
array/reverse/2d_inplace 10351 ns 10353 ns 1.00
array/reverse/2dL 73052.5 ns 72617 ns 1.01
array/reverse/1dL_inplace 66004 ns 65907 ns 1.00
array/copy 18680 ns 18749 ns 1.00
array/iteration/findall/int 150479 ns 149387.5 ns 1.01
array/iteration/findall/bool 132426 ns 132253.5 ns 1.00
array/iteration/findfirst/int 84111 ns 83271.5 ns 1.01
array/iteration/findfirst/bool 81929 ns 81441 ns 1.01
array/iteration/scalar 67243 ns 69131 ns 0.97
array/iteration/logical 201055.5 ns 199952 ns 1.01
array/iteration/findmin/1d 90426 ns 86816.5 ns 1.04
array/iteration/findmin/2d 118065.5 ns 117208 ns 1.01
array/reductions/reduce/Int64/1d 43892 ns 43408 ns 1.01
array/reductions/reduce/Int64/dims=1 46449.5 ns 43024 ns 1.08
array/reductions/reduce/Int64/dims=2 60070 ns 59829 ns 1.00
array/reductions/reduce/Int64/dims=1L 87812 ns 87729 ns 1.00
array/reductions/reduce/Int64/dims=2L 84937 ns 84578 ns 1.00
array/reductions/reduce/Float32/1d 35205 ns 35224 ns 1.00
array/reductions/reduce/Float32/dims=1 48235.5 ns 40532 ns 1.19
array/reductions/reduce/Float32/dims=2 57313 ns 56836 ns 1.01
array/reductions/reduce/Float32/dims=1L 52043 ns 51874 ns 1.00
array/reductions/reduce/Float32/dims=2L 69923 ns 69617.5 ns 1.00
array/reductions/mapreduce/Int64/1d 43350 ns 43343 ns 1.00
array/reductions/mapreduce/Int64/dims=1 43301.5 ns 42594 ns 1.02
array/reductions/mapreduce/Int64/dims=2 60015 ns 59634 ns 1.01
array/reductions/mapreduce/Int64/dims=1L 88049.5 ns 87814 ns 1.00
array/reductions/mapreduce/Int64/dims=2L 84985 ns 84815 ns 1.00
array/reductions/mapreduce/Float32/1d 34814 ns 34828 ns 1.00
array/reductions/mapreduce/Float32/dims=1 40509.5 ns 39897 ns 1.02
array/reductions/mapreduce/Float32/dims=2 57186 ns 56752 ns 1.01
array/reductions/mapreduce/Float32/dims=1L 51853 ns 51768.5 ns 1.00
array/reductions/mapreduce/Float32/dims=2L 69785 ns 69310 ns 1.01
array/broadcast 20787 ns 20615 ns 1.01
array/copyto!/gpu_to_gpu 11358 ns 11301 ns 1.01
array/copyto!/cpu_to_gpu 218837 ns 216699 ns 1.01
array/copyto!/gpu_to_cpu 284192 ns 284359.5 ns 1.00
array/accumulate/Int64/1d 118889 ns 118782 ns 1.00
array/accumulate/Int64/dims=1 80273 ns 80255 ns 1.00
array/accumulate/Int64/dims=2 156141 ns 156856 ns 1.00
array/accumulate/Int64/dims=1L 1705695 ns 1704288.5 ns 1.00
array/accumulate/Int64/dims=2L 961718.5 ns 961419 ns 1.00
array/accumulate/Float32/1d 101456.5 ns 101642 ns 1.00
array/accumulate/Float32/dims=1 77050 ns 76595 ns 1.01
array/accumulate/Float32/dims=2 144136 ns 144764 ns 1.00
array/accumulate/Float32/dims=1L 1587086 ns 1593525 ns 1.00
array/accumulate/Float32/dims=2L 660874 ns 660030 ns 1.00
array/construct 1277.5 ns 1287.9 ns 0.99
array/random/randn/Float32 43764 ns 43834 ns 1.00
array/random/randn!/Float32 31585 ns 27591 ns 1.14
array/random/rand!/Int64 33711 ns 27841 ns 1.21
array/random/rand!/Float32 8674.333333333334 ns 8461 ns 1.03
array/random/rand/Int64 37472 ns 30522.5 ns 1.23
array/random/rand/Float32 13421 ns 13025 ns 1.03
array/permutedims/4d 53048.5 ns 52112.5 ns 1.02
array/permutedims/2d 53071 ns 52576 ns 1.01
array/permutedims/3d 53518 ns 52685 ns 1.02
array/sorting/1d 2735142 ns 2744009 ns 1.00
array/sorting/by 3328331.5 ns 3314220 ns 1.00
array/sorting/2d 1072830 ns 1071845 ns 1.00
cuda/synchronization/stream/auto 1066.1 ns 1071.4 ns 1.00
cuda/synchronization/stream/nonblocking 8135 ns 8252.9 ns 0.99
cuda/synchronization/stream/blocking 850.7313432835821 ns 852.530303030303 ns 1.00
cuda/synchronization/context/auto 1197.7 ns 1205.3 ns 0.99
cuda/synchronization/context/nonblocking 7182.5 ns 8066.9 ns 0.89
cuda/synchronization/context/blocking 930.7941176470588 ns 931.074074074074 ns 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@codecov

codecov Bot commented Feb 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.58%. Comparing base (e260b92) to head (69236fb).
⚠️ Report is 13 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #3030   +/-   ##
=======================================
  Coverage   90.58%   90.58%           
=======================================
  Files         134      134           
  Lines       11637    11637           
=======================================
  Hits        10541    10541           
  Misses       1096     1096           

☔ View full report in Codecov by Sentry.

Comment thread on src/device/intrinsics/indexing.jl (Outdated)
""" threadIdx
@inline threadIdx() = (x=threadIdx_x(), y=threadIdx_y(), z=threadIdx_z())
Returns the dimensions of the grid as a `NamedTuple` with keys `x`, `y`, and `z`.
These dimensions have the same starting index as the `gridDim` built-in variable in the C/C++ extension.
Member

gridDim returns a dimension/size, not an index.

Contributor Author

Replaced "index" with "dimension" here.

Member

starting dimension doesn't make much sense to me. What else could a size() query return? 0 vs 1-based indexing doesn't apply here.

That said, I'm okay with this if you think this clarifies things.

Member

@christiangnrd commented Mar 7, 2026

Maybe it could be phrased along the lines of:

Unlike the `*Idx` intrinsics, `gridDim` returns the same value as its C/C++ extension counterpart.

I do think this should be mentioned in some form though. The indexing intrinsics being offset while the dim intrinsics are not makes sense when you think about it, but I've also gotten confused by this, and not everyone will think/know to check the source code to confirm.

Either way, the same edits `gridDim` receives should also be mirrored to `blockDim`.

Contributor Author

Done.
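The convention settled on in this thread can be summarized in code. This is a sketch for illustration only (the kernel name `global_index_demo!` is made up); the C/C++ expressions in the comments are shown purely for comparison.

```julia
using CUDA

# Sketch: global 1-based linear index inside a kernel, next to the
# equivalent 0-based CUDA C/C++ expression.
function global_index_demo!(out)
    # C/C++ (0-based):  int i = blockIdx.x * blockDim.x + threadIdx.x;
    # CUDA.jl (1-based):
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x

    # blockDim()/gridDim() are plain sizes, so they return the same
    # values as their C/C++ counterparts; use them for a grid-stride loop.
    stride = gridDim().x * blockDim().x
    while i <= length(out)
        @inbounds out[i] = i
        i += stride
    end
    return
end
```

So only the `*Idx` intrinsics carry the 1-based offset; the `*Dim` intrinsics are sizes and need no adjustment, which is exactly the distinction the docstring edits above spell out.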

@giordano
Contributor Author

Bump? I also expanded the docstrings introduced in #3017.

@giordano
Contributor Author

Bump. This PR keeps running into conflicts with other PRs which are merged in the meantime... 🫠

@maleadt
Member

maleadt commented Apr 9, 2026

Sorry, forgot about this. Thanks!

@maleadt maleadt merged commit 5f45772 into JuliaGPU:master Apr 9, 2026
2 checks passed
@giordano deleted the mg/docs branch April 9, 2026 11:38