
Develop #400

Merged
cristibleotiu merged 4 commits into main from develop
May 6, 2026


@cristibleotiu
Contributor

No description provided.

toderian and others added 4 commits May 3, 2026 00:23
…t build cost

Collapse the three previous workflows (build_testnet, build_mainnet,
build_gpu) into one workflow per branch:

- build_develop.yml: devnet CPU -> testnet CPU -> devnet GPU -> testnet GPU
- build_main.yml: mainnet CPU -> mainnet GPU (with version git tag)

Steps run sequentially in a single job (one runner, one buildx-cloud
session, one Docker login), and a shared concurrency group prevents a
develop build and a main build from overlapping. Build steps use
`if: !cancelled()` so a failure in one image does not block subsequent
images, since the images are independent.
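
A minimal sketch of what the single-job develop workflow described above could look like. The workflow name, concurrency group name, variable and secret names, builder endpoint, Dockerfile paths, and image tags are all placeholders, not values taken from this PR. The step condition is written as `${{ !cancelled() }}` because an expression that starts with `!` must be wrapped in `${{ }}` (or quoted) in workflow YAML.

```yaml
# Hypothetical sketch of build_develop.yml; every name below is an assumption.
name: build-develop

on:
  push:
    branches: [develop]

# Using the same group name in build_main.yml keeps a develop run and a
# main run from executing at the same time.
concurrency:
  group: image-builds            # assumed group name
  cancel-in-progress: false      # let the newer run wait instead of cancelling the active one

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # One Docker login for all images built in this job.
      - uses: docker/login-action@v3
        with:
          username: ${{ vars.DOCKER_USER }}      # assumed variable name
          password: ${{ secrets.DOCKER_PAT }}    # assumed secret name

      # One buildx-cloud session for all images built in this job.
      - uses: docker/setup-buildx-action@v3
        with:
          driver: cloud
          endpoint: example-org/example-builder  # placeholder endpoint

      - name: devnet CPU
        if: ${{ !cancelled() }}
        uses: docker/build-push-action@v6
        with:
          context: .
          file: Dockerfile_devnet                # hypothetical path
          push: true
          tags: example-org/example-image:devnet # placeholder tag

      # Runs even if the devnet CPU step failed, because the images are independent.
      - name: testnet CPU
        if: ${{ !cancelled() }}
        uses: docker/build-push-action@v6
        with:
          context: .
          file: Dockerfile_testnet               # hypothetical path
          push: true
          tags: example-org/example-image:testnet

      # devnet GPU and testnet GPU would follow the same pattern.
```

With `!cancelled()` on each build step, a failed step still marks the job as failed, but the later build steps are attempted anyway.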

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The cloud builder's gcPolicy caps cache at 25GiB. After 2 CPU builds the
cache fills with non-GPU layers, then the GPU base (libtorch + CUDA)
cannot be extracted and the GPU build fails with "no space left on
device" inside /buildkit/data/runc-overlayfs/snapshots/.

Prune the builder cache right before the GPU pair so the CPU layers (which
the GPU builds cannot reuse anyway) are evicted. The two GPU builds still
share the freshly cached GPU base, so we keep within-pair cache reuse.
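
The prune described above might be expressed as a workflow step along these lines. This is a sketch only: the PR text does not show the actual command, so the use of `docker buildx prune`, the step name, and the `buildx` step id are assumptions. The fragment assumes an earlier `docker/setup-buildx-action` step declared with `id: buildx`, whose `name` output identifies the cloud builder.

```yaml
# Hypothetical step, placed after the two CPU builds and before the first GPU build.
- name: Prune builder cache before GPU builds
  if: ${{ !cancelled() }}
  env:
    # Builder name exported by the assumed setup step with id "buildx".
    BUILDER: ${{ steps.buildx.outputs.name }}
  run: |
    # Evict cached CPU layers so the GPU base image has room to extract.
    # --force skips the interactive confirmation prompt.
    docker buildx prune --builder "$BUILDER" --force
```

Because the prune runs once before the GPU pair rather than before each GPU build, the second GPU build can still reuse the GPU base layers cached by the first.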

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Each image (devnet/testnet CPU+GPU on develop, mainnet CPU+GPU on main)
is now its own job, chained via `needs:` to keep them sequential. The
run-summary view shows one box per image instead of a single job with
nested steps, and individual builds can be re-run on their own.

Buildx-cloud cache stays shared across jobs because it lives on the cloud
builder (server-side), so layer reuse between adjacent builds is
preserved. The CPU-to-GPU prune still runs at the start of the first GPU
job to free the CPU layers from the cache cap.

Failure semantics unchanged: `if: !cancelled()` on downstream jobs lets
each build run independently of the previous one's outcome.
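
The job-per-image layout described above can be sketched roughly as follows. Job names are hypothetical and the `run: echo` lines stand in for the real checkout, login, buildx setup, and build steps, which this PR page does not show. The condition is written as `${{ !cancelled() }}` because an expression starting with `!` has to be wrapped in `${{ }}` (or quoted) in workflow YAML.

```yaml
# Hypothetical sketch of the jobs section of build_develop.yml.
jobs:
  devnet-cpu:
    runs-on: ubuntu-latest
    steps:
      - run: echo "build devnet CPU image"    # placeholder for the real build steps

  testnet-cpu:
    needs: devnet-cpu                         # start only after devnet-cpu has finished
    if: ${{ !cancelled() }}                   # run even if devnet-cpu failed
    runs-on: ubuntu-latest
    steps:
      - run: echo "build testnet CPU image"

  devnet-gpu:
    needs: testnet-cpu
    if: ${{ !cancelled() }}
    runs-on: ubuntu-latest
    steps:
      - run: echo "prune builder cache"       # the CPU-to-GPU prune lives in the first GPU job
      - run: echo "build devnet GPU image"

  testnet-gpu:
    needs: devnet-gpu
    if: ${{ !cancelled() }}
    runs-on: ubuntu-latest
    steps:
      - run: echo "build testnet GPU image"
```

Here `needs:` only enforces ordering, while the `if:` condition overrides the default behavior of skipping a job whose dependency failed, so each image is still built and can be re-run on its own.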

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cristibleotiu merged commit 40e2a07 into main on May 6, 2026
4 checks passed

2 participants