From 62e367cc17db5b24cafd64cb04c40e72df866d41 Mon Sep 17 00:00:00 2001
From: Thijs Vogels <thijs.vogels@microsoft.com>
Date: Tue, 26 May 2026 12:57:24 +0000
Subject: [PATCH] docs: note CUDA_VISIBLE_DEVICES workaround for multi-GPU
 systems

Importing gpu4pyscf allocates memory on every visible CUDA device, which
conflicts with PyTorch and with other processes sharing those GPUs (e.g.
in MPI-parallel workloads). Document the CUDA_VISIBLE_DEVICES workaround
and link to the upstream tracking issue.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 README.md | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/README.md b/README.md
index f0ec1d0..ff5e117 100644
--- a/README.md
+++ b/README.md
@@ -123,6 +123,26 @@ ks = SkalaKS(mol, xc="skala-1.1")
 ks.kernel()
 ```
 
+### Known issue: multiple visible GPUs
+
+Skala uses a single GPU, but importing `gpu4pyscf` allocates memory on **every**
+visible CUDA device. This can conflict with PyTorch and with other processes
+sharing those GPUs (e.g. in MPI-parallel workloads).
+
+Restrict CUDA to one device **before** launching Python:
+
+```bash
+CUDA_VISIBLE_DEVICES=0 python my_script.py
+```
+
+For MPI-parallel runs under Open MPI, assign one GPU per local rank:
+
+```bash
+mpirun -np 4 bash -c 'CUDA_VISIBLE_DEVICES=$OMPI_COMM_WORLD_LOCAL_RANK python my_script.py'
+```
+
+Tracked upstream at [pyscf/gpu4pyscf#435](https://github.com/pyscf/gpu4pyscf/issues/435).
+
 ## Getting started: ASE calculator
 
 Skala also provides an [ASE](https://wiki.fysik.dtu.dk/ase/) calculator for energy, force, and geometry optimization workflows: