From eac7298b0facf1b68c8f2615b9504e254ae58aa3 Mon Sep 17 00:00:00 2001
From: Karl Gyllstrom <gylls@meta.com>
Date: Mon, 9 Mar 2026 11:14:18 -0700
Subject: [PATCH] Fix [[nodiscard]] build errors and BUCK deps across comms,
 gloo, caffe2 (#494)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Summary:
Pull Request resolved: https://github.com/pytorch/gloo/pull/494

X-link: https://github.com/meta-pytorch/torchcomms/pull/960

X-link: https://github.com/pytorch/pytorch/pull/176671

ROCm 7.0+ HIP headers annotate API functions (hipStreamDestroy,
hipMemcpyAsync, hipStreamSynchronize, hipSetDevice, hipGetDevice, hipFree,
hipHostUnregister, hipDeviceEnablePeerAccess, cuGetErrorString) with
[[nodiscard]]. Combined with -Werror, this causes build failures wherever
return values are discarded.

The failures were first seen when building with ROCm 7.2 headers and
were later confirmed on ROCm 7.0 builds (reported independently by yvliu
and hqguo). The HIP headers in both versions carry the [[nodiscard]]
attribute, so the same fix applies to both.

Changes:
- Add (void) casts to suppress [[nodiscard]] warnings in 12 C++ files
  across comms/ (tcp_devmem, ctran, rcclx), gloo/, and caffe2/ (nativert)
- Fix BUCK dependency issues in comms/tcp_devmem/nccl (replace devmgr-client
  with common:common) and comms/tcp_devmem/unpack (explicit glog dep path)
  that surface when building these targets under ROCm constraints

The (void) casts have no effect on CUDA or on older ROCm releases, so
the change is safe to land regardless of ROCm version.
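
For illustration, a minimal sketch of the pattern, not taken from the
HIP or CUDA headers. FakeError, fakeEnablePeerAccess, fakeGetLastError,
and demoPeerAccess are hypothetical stand-ins; the assumption is only
that the real API is declared [[nodiscard]] and that its error is read
back afterwards, as the gloo call sites do with cudaGetLastError.

```cpp
// Hypothetical stand-ins for illustration; not the real HIP/CUDA API.
enum class FakeError { kSuccess = 0, kPeerAccessAlreadyEnabled = 1 };

static FakeError g_last_error = FakeError::kSuccess;

// Mimics an API function annotated with [[nodiscard]].
[[nodiscard]] FakeError fakeEnablePeerAccess(int /*peer*/) {
  g_last_error = FakeError::kPeerAccessAlreadyEnabled;
  return g_last_error;
}

// Returns the recorded error and resets it, loosely analogous to how the
// call sites use cudaGetLastError to clear the error state.
FakeError fakeGetLastError() {
  FakeError e = g_last_error;
  g_last_error = FakeError::kSuccess;
  return e;
}

int demoPeerAccess() {
  // fakeEnablePeerAccess(1);     // diagnosed: return value discarded;
  //                              // a hard error when built with -Werror
  (void)fakeEnablePeerAccess(1);  // explicit discard, same runtime behavior
  FakeError err = fakeGetLastError();
  return static_cast<int>(err);
}
```

The cast changes only what the compiler diagnoses. The call still runs,
and the error is still retrieved and cleared by the follow-up query.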

Reviewed By: bbeckca

Differential Revision: D93759269
---
 gloo/cuda_collectives_native.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/gloo/cuda_collectives_native.h b/gloo/cuda_collectives_native.h
index e6c45c401..dbc62cb3f 100644
--- a/gloo/cuda_collectives_native.h
+++ b/gloo/cuda_collectives_native.h
@@ -83,7 +83,7 @@ class CudaLocalNativeReduce : public LocalOp<T> {
 
         // Enable peer access for devA to memory on devB
         CUDA_CHECK(cudaSetDevice(devA));
-        cudaDeviceEnablePeerAccess(devB, 0);
+        (void)cudaDeviceEnablePeerAccess(devB, 0);
 
         // Use cudaGetLastError so that any error is cleared.
         auto err = cudaGetLastError();
@@ -196,7 +196,7 @@ class CudaLocalNativeBroadcast : public LocalOp<T> {
 
         // Enable peer access for devA to memory on devB
         CUDA_CHECK(cudaSetDevice(devA));
-        cudaDeviceEnablePeerAccess(devB, 0);
+        (void)cudaDeviceEnablePeerAccess(devB, 0);
 
         // Use cudaGetLastError so that any error is cleared.
         auto err = cudaGetLastError();