Various improvements to the docs #3030
Conversation
Your PR requires formatting changes to meet the project's style guidelines. The suggested changes:

diff --git a/src/device/intrinsics/indexing.jl b/src/device/intrinsics/indexing.jl
index 5e6209fe3..dd9655911 100644
--- a/src/device/intrinsics/indexing.jl
+++ b/src/device/intrinsics/indexing.jl
@@ -92,62 +92,62 @@ end
@doc """
threadIdx()::NamedTuple
-Returns the thread index within the block as a `NamedTuple` with keys `x`, `y`, and `z`.
-These indices are 1-based, unlike the `threadIdx` built-in variable in the C/C++ extension which is 0-based.
+ Returns the thread index within the block as a `NamedTuple` with keys `x`, `y`, and `z`.
+ These indices are 1-based, unlike the `threadIdx` built-in variable in the C/C++ extension which is 0-based.
""" threadIdx
@inline threadIdx() = (x=threadIdx_x(), y=threadIdx_y(), z=threadIdx_z())
@doc """
blockDim()::NamedTuple
-Returns the dimensions (in threads) of the block as a `NamedTuple` with keys `x`, `y`, and `z`.
-Unlike the `*Idx` intrinsics, `blockDim` returns the same value as its C/C++ extension counterpart.
+ Returns the dimensions (in threads) of the block as a `NamedTuple` with keys `x`, `y`, and `z`.
+ Unlike the `*Idx` intrinsics, `blockDim` returns the same value as its C/C++ extension counterpart.
""" blockDim
@inline blockDim() = (x=blockDim_x(), y=blockDim_y(), z=blockDim_z())
@doc """
blockIdx()::NamedTuple
-Returns the block index within the grid as a `NamedTuple` with keys `x`, `y`, and `z`.
-These indices are 1-based, unlike the `blockIdx` built-in variable in the C/C++ extension which is 0-based.
+ Returns the block index within the grid as a `NamedTuple` with keys `x`, `y`, and `z`.
+ These indices are 1-based, unlike the `blockIdx` built-in variable in the C/C++ extension which is 0-based.
""" blockIdx
@inline blockIdx() = (x=blockIdx_x(), y=blockIdx_y(), z=blockIdx_z())
@doc """
gridDim()::NamedTuple
-Returns the dimensions (in blocks) of the grid as a `NamedTuple` with keys `x`, `y`, and `z`.
-Unlike the `*Idx` intrinsics, `gridDim` returns the same value as its C/C++ extension counterpart.
+ Returns the dimensions (in blocks) of the grid as a `NamedTuple` with keys `x`, `y`, and `z`.
+ Unlike the `*Idx` intrinsics, `gridDim` returns the same value as its C/C++ extension counterpart.
""" gridDim
@inline gridDim() = (x=gridDim_x(), y=gridDim_y(), z=gridDim_z())
@doc """
blockIdxInCluster()::NamedTuple
-Returns the block index within the cluster as a `NamedTuple` with keys `x`, `y`, and `z`.
-These indices are 1-based.
+ Returns the block index within the cluster as a `NamedTuple` with keys `x`, `y`, and `z`.
+ These indices are 1-based.
""" blockIdxInCluster
@inline blockIdxInCluster() = (x=blockIdxInCluster_x(), y=blockIdxInCluster_y(), z=blockIdxInCluster_z())
@doc """
clusterDim()::NamedTuple
-Returns the dimensions (in blocks) of the cluster as a `NamedTuple` with keys `x`, `y`, and `z`.
+ Returns the dimensions (in blocks) of the cluster as a `NamedTuple` with keys `x`, `y`, and `z`.
""" clusterDim
@inline clusterDim() = (x=clusterDim_x(), y=clusterDim_y(), z=clusterDim_z())
@doc """
clusterIdx()::NamedTuple
-Returns the cluster index within the grid as a `NamedTuple` with keys `x`, `y`, and `z`.
-These indices are 1-based.
+ Returns the cluster index within the grid as a `NamedTuple` with keys `x`, `y`, and `z`.
+ These indices are 1-based.
""" clusterIdx
@inline clusterIdx() = (x=clusterIdx_x(), y=clusterIdx_y(), z=clusterIdx_z())
@doc """
gridClusterDim()::NamedTuple
-Returns the dimensions (in clusters) of the grid as a `NamedTuple` with keys `x`, `y`, and `z`.
+ Returns the dimensions (in clusters) of the grid as a `NamedTuple` with keys `x`, `y`, and `z`.
""" gridClusterDim
@inline gridClusterDim() = (x=gridClusterDim_x(), y=gridClusterDim_y(), z=gridClusterDim_z())
@@ -155,7 +155,7 @@ Returns the dimensions (in clusters) of the grid as a `NamedTuple` with keys `x`
linearBlockIdxInCluster()::Int32
Returns the linear block index within the cluster.
-These indices are 1-based.
+ These indices are 1-based.
""" linearBlockIdxInCluster
@eval @inline $(:linearBlockIdxInCluster)() = _index($(Val(Symbol("cluster.ctarank"))), $(Val(0:max_cluster_length-1))) + 1i32
@@ -170,7 +170,7 @@ Returns the linear cluster size (in blocks).
warpsize()::Int32
Returns the warp size (in threads).
-This corresponds to the `warpSize` built-in variable in the C/C++ extension.
+ This corresponds to the `warpSize` built-in variable in the C/C++ extension.
""" warpsize
@inline warpsize() = ccall("llvm.nvvm.read.ptx.sreg.warpsize", llvmcall, Int32, ())
@@ -178,7 +178,7 @@ This corresponds to the `warpSize` built-in variable in the C/C++ extension.
laneid()::Int32
Returns the thread's lane within the warp.
-This ID is 1-based.
+ This ID is 1-based.
""" laneid
@inline laneid() = ccall("llvm.nvvm.read.ptx.sreg.laneid", llvmcall, Int32, ()) + 1i32
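For readers skimming the thread, here is a minimal sketch of how the 1-based intrinsics documented above are typically combined in a kernel. The kernel name and launch sizes are illustrative, not taken from the PR:

```julia
using CUDA

# Illustrative kernel: compute a global linear thread index and store it.
function index_kernel(out)
    # threadIdx() and blockIdx() are 1-based in CUDA.jl (unlike C/C++),
    # so the usual global-index formula subtracts 1 from blockIdx().x.
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(out)
        @inbounds out[i] = i
    end
    return
end

out = CUDA.zeros(Int32, 1000)
@cuda threads=256 blocks=cld(length(out), 256) index_kernel(out)
```

This requires a CUDA-capable GPU to run; it is only meant to show the intrinsics from the docstrings in context.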
CUDA.jl Benchmarks
| Benchmark suite | Current: 69236fb | Previous: e260b92 | Ratio |
|---|---|---|---|
| latency/precompile | 4073991756 ns | 4060336008 ns | 1.00 |
| latency/ttfp | 14097056796 ns | 14188556469 ns | 0.99 |
| latency/import | 3572377951 ns | 3555441335 ns | 1.00 |
| integration/volumerhs | 9444918.5 ns | 9440665.5 ns | 1.00 |
| integration/byval/slices=1 | 145945 ns | 145820 ns | 1.00 |
| integration/byval/slices=3 | 422989.5 ns | 422996 ns | 1.00 |
| integration/byval/reference | 143993 ns | 143940 ns | 1.00 |
| integration/byval/slices=2 | 284659 ns | 284595 ns | 1.00 |
| integration/cudadevrt | 102612 ns | 102603 ns | 1.00 |
| kernel/indexing | 13380 ns | 13331 ns | 1.00 |
| kernel/indexing_checked | 13998.5 ns | 14078 ns | 0.99 |
| kernel/occupancy | 664.1677018633541 ns | 692.6139240506329 ns | 0.96 |
| kernel/launch | 2244.4444444444443 ns | 2098.5555555555557 ns | 1.07 |
| kernel/rand | 14734 ns | 15622 ns | 0.94 |
| array/reverse/1d | 18638 ns | 18269 ns | 1.02 |
| array/reverse/2dL_inplace | 66108 ns | 66029 ns | 1.00 |
| array/reverse/1dL | 69150 ns | 68802 ns | 1.01 |
| array/reverse/2d | 21119 ns | 20617 ns | 1.02 |
| array/reverse/1d_inplace | 8636 ns | 10283.666666666666 ns | 0.84 |
| array/reverse/2d_inplace | 10351 ns | 10353 ns | 1.00 |
| array/reverse/2dL | 73052.5 ns | 72617 ns | 1.01 |
| array/reverse/1dL_inplace | 66004 ns | 65907 ns | 1.00 |
| array/copy | 18680 ns | 18749 ns | 1.00 |
| array/iteration/findall/int | 150479 ns | 149387.5 ns | 1.01 |
| array/iteration/findall/bool | 132426 ns | 132253.5 ns | 1.00 |
| array/iteration/findfirst/int | 84111 ns | 83271.5 ns | 1.01 |
| array/iteration/findfirst/bool | 81929 ns | 81441 ns | 1.01 |
| array/iteration/scalar | 67243 ns | 69131 ns | 0.97 |
| array/iteration/logical | 201055.5 ns | 199952 ns | 1.01 |
| array/iteration/findmin/1d | 90426 ns | 86816.5 ns | 1.04 |
| array/iteration/findmin/2d | 118065.5 ns | 117208 ns | 1.01 |
| array/reductions/reduce/Int64/1d | 43892 ns | 43408 ns | 1.01 |
| array/reductions/reduce/Int64/dims=1 | 46449.5 ns | 43024 ns | 1.08 |
| array/reductions/reduce/Int64/dims=2 | 60070 ns | 59829 ns | 1.00 |
| array/reductions/reduce/Int64/dims=1L | 87812 ns | 87729 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 84937 ns | 84578 ns | 1.00 |
| array/reductions/reduce/Float32/1d | 35205 ns | 35224 ns | 1.00 |
| array/reductions/reduce/Float32/dims=1 | 48235.5 ns | 40532 ns | 1.19 |
| array/reductions/reduce/Float32/dims=2 | 57313 ns | 56836 ns | 1.01 |
| array/reductions/reduce/Float32/dims=1L | 52043 ns | 51874 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 69923 ns | 69617.5 ns | 1.00 |
| array/reductions/mapreduce/Int64/1d | 43350 ns | 43343 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=1 | 43301.5 ns | 42594 ns | 1.02 |
| array/reductions/mapreduce/Int64/dims=2 | 60015 ns | 59634 ns | 1.01 |
| array/reductions/mapreduce/Int64/dims=1L | 88049.5 ns | 87814 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2L | 84985 ns | 84815 ns | 1.00 |
| array/reductions/mapreduce/Float32/1d | 34814 ns | 34828 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1 | 40509.5 ns | 39897 ns | 1.02 |
| array/reductions/mapreduce/Float32/dims=2 | 57186 ns | 56752 ns | 1.01 |
| array/reductions/mapreduce/Float32/dims=1L | 51853 ns | 51768.5 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 69785 ns | 69310 ns | 1.01 |
| array/broadcast | 20787 ns | 20615 ns | 1.01 |
| array/copyto!/gpu_to_gpu | 11358 ns | 11301 ns | 1.01 |
| array/copyto!/cpu_to_gpu | 218837 ns | 216699 ns | 1.01 |
| array/copyto!/gpu_to_cpu | 284192 ns | 284359.5 ns | 1.00 |
| array/accumulate/Int64/1d | 118889 ns | 118782 ns | 1.00 |
| array/accumulate/Int64/dims=1 | 80273 ns | 80255 ns | 1.00 |
| array/accumulate/Int64/dims=2 | 156141 ns | 156856 ns | 1.00 |
| array/accumulate/Int64/dims=1L | 1705695 ns | 1704288.5 ns | 1.00 |
| array/accumulate/Int64/dims=2L | 961718.5 ns | 961419 ns | 1.00 |
| array/accumulate/Float32/1d | 101456.5 ns | 101642 ns | 1.00 |
| array/accumulate/Float32/dims=1 | 77050 ns | 76595 ns | 1.01 |
| array/accumulate/Float32/dims=2 | 144136 ns | 144764 ns | 1.00 |
| array/accumulate/Float32/dims=1L | 1587086 ns | 1593525 ns | 1.00 |
| array/accumulate/Float32/dims=2L | 660874 ns | 660030 ns | 1.00 |
| array/construct | 1277.5 ns | 1287.9 ns | 0.99 |
| array/random/randn/Float32 | 43764 ns | 43834 ns | 1.00 |
| array/random/randn!/Float32 | 31585 ns | 27591 ns | 1.14 |
| array/random/rand!/Int64 | 33711 ns | 27841 ns | 1.21 |
| array/random/rand!/Float32 | 8674.333333333334 ns | 8461 ns | 1.03 |
| array/random/rand/Int64 | 37472 ns | 30522.5 ns | 1.23 |
| array/random/rand/Float32 | 13421 ns | 13025 ns | 1.03 |
| array/permutedims/4d | 53048.5 ns | 52112.5 ns | 1.02 |
| array/permutedims/2d | 53071 ns | 52576 ns | 1.01 |
| array/permutedims/3d | 53518 ns | 52685 ns | 1.02 |
| array/sorting/1d | 2735142 ns | 2744009 ns | 1.00 |
| array/sorting/by | 3328331.5 ns | 3314220 ns | 1.00 |
| array/sorting/2d | 1072830 ns | 1071845 ns | 1.00 |
| cuda/synchronization/stream/auto | 1066.1 ns | 1071.4 ns | 1.00 |
| cuda/synchronization/stream/nonblocking | 8135 ns | 8252.9 ns | 0.99 |
| cuda/synchronization/stream/blocking | 850.7313432835821 ns | 852.530303030303 ns | 1.00 |
| cuda/synchronization/context/auto | 1197.7 ns | 1205.3 ns | 0.99 |
| cuda/synchronization/context/nonblocking | 7182.5 ns | 8066.9 ns | 0.89 |
| cuda/synchronization/context/blocking | 930.7941176470588 ns | 931.074074074074 ns | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff           @@
##           master    #3030   +/-   ##
=======================================
  Coverage   90.58%   90.58%
=======================================
  Files         134      134
  Lines       11637    11637
=======================================
  Hits        10541    10541
  Misses       1096     1096
```

☔ View full report in Codecov by Sentry.
> """ threadIdx
> @inline threadIdx() = (x=threadIdx_x(), y=threadIdx_y(), z=threadIdx_z())
>
> Returns the dimensions of the grid as a `NamedTuple` with keys `x`, `y`, and `z`.
> These dimensions have the same starting index as the `gridDim` built-in variable in the C/C++ extension.
gridDim returns a dimension/size, not an index.
Replaced "index" with "dimension" here.
starting dimension doesn't make much sense to me. What else could a size() query return? 0 vs 1-based indexing doesn't apply here.
That said, I'm okay with this if you think this clarifies things.
Maybe it could be phrased along the lines of:
Unlike the `*Idx` intrinsics `gridDim` returns the same value as its C/C++ extension counterpart.
I do think this should be mentioned in some form, though. The indexing intrinsics being offset while the dim intrinsics are not makes sense when you think about it, but I've also gotten confused by this, and not everyone will think/know to check the source code to confirm.
Either way, the same edits that gridDim receives should also be mirrored in blockDim.
Co-authored-by: Christian Guinard <28689358+christiangnrd@users.noreply.github.com>
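The off-by-one discussed in this thread can be summarized in a short sketch. This is an illustration only, not text from the PR; the left-hand comments show the conventional C/C++ CUDA form:

```julia
# C/C++ CUDA, 0-based indices:
#     int i = blockIdx.x * blockDim.x + threadIdx.x;   // 0 ... N-1
#
# CUDA.jl, 1-based indices: subtract 1 from blockIdx().x before scaling.
i = (blockIdx().x - 1) * blockDim().x + threadIdx().x  # 1 ... N

# The *Dim intrinsics are plain sizes, not indices, so they return the same
# values as their C/C++ counterparts: blockDim().x, gridDim().x, etc.
```

This is why the docstrings call out 1-based behavior only for the `*Idx` intrinsics and state that `blockDim`/`gridDim` match their C/C++ counterparts.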
Bump? I also expanded the docstrings introduced in #3017.
Bump. This PR keeps running into conflicts with other PRs which are merged in the meantime... 🫠
Sorry, forgot about this. Thanks!
I had some... uhm... fun in the last couple of days trying to port some C++ CUDA code to CUDA.jl and profile it. I dumped my experience into this PR, hoping to make the lives of people after me a little bit easier 🙂