feat: S3-backed cache-s3 siblings (golang/rust/node-pnpm) for Hetzner RGW #187

Open
mike-ainsel wants to merge 4 commits into v4-beta from feat/hz-s3-cache-actions

mike-ainsel (Member) commented Jun 22, 2026
Adds -s3 sibling cache actions for the hz self-hosted runner fleet, following the existing turborepo/cache-s3 precedent.

What

  • actions/golang/cache-s3
  • actions/rust/cache-s3
  • actions/node/cache-pnpm-s3

Each mirrors its GitHub-cache sibling (same key scheme) but uses tespkg/actions-cache (pinned to e07e2d49 = v1.10.2) against an S3 endpoint, defaulting to the Hetzner RadosGW s3.hz.platforma.bio (split-DNS → in-cluster RGW on hz, public elsewhere), bucket ci-actions-cache.

Why

On hz runners we are dropping the shared-hostPath toolchain caches (a cross-job contamination surface) and letting each job restore/save its own copy from RGW — node-local NVMe is left purely for the ephemeral per-job workdir. Caches now live in object storage exactly like the turbo cache.

Safety

  • Additive only — existing golang/cache / rust/cache / node/cache-pnpm are untouched, so AWS-runner jobs are unaffected.
  • use-fallback: true → falls back to the GitHub-hosted cache if RGW is briefly unreachable.
  • Creds are inputs (passed from the org secrets HZ_CI_CACHE_S3_ACCESS_KEY/_SECRET_KEY), never baked.
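The defaults and credential handling listed above might translate into an `inputs:` block along the following lines. This is a partial reconstruction from the description, not the file in the PR: the input names come from the `with:` mapping quoted in the review, while the `description` strings, the `use-fallback` default living on the input itself, and the absence of a `region` default are assumptions.

```yaml
# Hypothetical, partial sketch of the inputs in actions/golang/cache-s3/action.yaml.
inputs:
  endpoint:
    description: "S3 endpoint host"          # wording assumed
    default: "s3.hz.platforma.bio"
  bucket:
    description: "Bucket holding the cache archives"
    default: "ci-actions-cache"
  region:
    description: "S3 region"                 # default, if any, is not stated in the PR
    required: false
  insecure:
    description: "Set to 'true' to disable TLS"
    default: "false"
  use-fallback:
    description: "Fall back to the GitHub-hosted cache if S3 is unreachable"
    default: "true"
  access-key:
    description: "S3 access key, passed from an org secret"
    required: true
  secret-key:
    description: "S3 secret key, passed from an org secret"
    required: true
```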

Consumption (next, in the per-repo migration PRs)

hz jobs call e.g. milaboratory/github-ci/actions/golang/cache-s3@v4 with the org secrets; on rl8, rust/cache-s3 takes cargo-home: /opt/rust/cargo (the image bakes CARGO_HOME there).
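A caller workflow for the Go action could look roughly like this. It is a sketch under assumptions: the `runs-on` labels, the workflow name, and the build command are placeholders, and the remaining inputs (endpoint, bucket, cache version, hash-file path) are assumed to have usable defaults.

```yaml
# Hypothetical caller workflow; runner labels and build step are placeholders.
name: build
on: [push]
jobs:
  build:
    runs-on: [self-hosted, hz]   # assumed label for the hz fleet
    steps:
      - uses: actions/checkout@v4
      - uses: milaboratory/github-ci/actions/golang/cache-s3@v4
        with:
          access-key: ${{ secrets.HZ_CI_CACHE_S3_ACCESS_KEY }}
          secret-key: ${{ secrets.HZ_CI_CACHE_S3_SECRET_KEY }}
      - run: go build ./...
```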

Greptile Summary

Adds three additive cache-s3 composite actions (golang/cache-s3, rust/cache-s3, node/cache-pnpm-s3) that mirror their GitHub-cache siblings but route cache I/O through the Hetzner RadosGW via tespkg/actions-cache (pinned by full commit SHA). Existing actions and AWS-runner jobs are completely untouched.

  • Each action preserves the exact same cache key scheme as its sibling so keys are portable and fallback to the GitHub-hosted cache (use-fallback: true) works seamlessly.
  • rust/cache-s3 adds a cargo-home input (default ~/.cargo) to handle the hz-rl8 runner image where CARGO_HOME=/opt/rust/cargo is baked in.
  • S3 credentials are injected as required inputs, never hardcoded; insecure defaults to false (TLS on).
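For the hz-rl8 image, the `cargo-home` override described above might be passed as follows. The action path is assumed by analogy with the `golang/cache-s3` path given in the PR description, and the credential input names are assumed to match the Go action's.

```yaml
# Sketch only: rust/cache-s3 on an rl8 runner where CARGO_HOME is /opt/rust/cargo.
steps:
  - uses: milaboratory/github-ci/actions/rust/cache-s3@v4   # path assumed by analogy
    with:
      cargo-home: /opt/rust/cargo    # overrides the ~/.cargo default
      access-key: ${{ secrets.HZ_CI_CACHE_S3_ACCESS_KEY }}
      secret-key: ${{ secrets.HZ_CI_CACHE_S3_SECRET_KEY }}
```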

Confidence Score: 4/5

All three actions are additive-only; no existing workflows are touched and the changes are straightforward YAML wrappers around a pinned third-party action.

The actions are well-structured mirrors of their siblings. Two observations temper a clean bill of health: retry-count is not a documented input of tespkg/actions-cache and is likely silently ignored, and golang/cache-s3 drops the save-always guarantee present in golang/cache so Go caches will not be written when a job fails mid-run.

All three files share the same retry-count concern; actions/golang/cache-s3/action.yaml additionally warrants a second look for the missing save-on-failure behaviour.

Important Files Changed

| Filename | Overview |
| --- | --- |
| actions/golang/cache-s3/action.yaml | New S3-backed Go cache action for hz runners; mirrors sibling key scheme but omits save-always behavior and passes an unverified retry-count input. |
| actions/rust/cache-s3/action.yaml | New S3-backed Rust cache action with configurable cargo-home (needed for hz-rl8 image); same retry-count concern as golang sibling; otherwise clean. |
| actions/node/cache-pnpm-s3/action.yaml | New S3-backed pnpm cache action; dynamic store path resolved via pnpm store path, dual-path safety net carried from sibling; same retry-count concern. |

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Job as hz Runner Job
    participant Action as cache-s3 Action
    participant RGW as Hetzner RadosGW
    participant GHCache as GitHub-hosted Cache

    Job->>Action: invoke with access-key / secret-key
    Action->>RGW: restore cache (key lookup)
    alt RGW reachable
        RGW-->>Action: cache hit / miss
    else RGW unreachable and use-fallback true
        Action->>GHCache: restore cache (same key)
        GHCache-->>Action: cache hit / miss
    end
    Action-->>Job: cache restored (or cold build)
    Note over Job: build / test steps run
    alt Job succeeds
        Job->>Action: post-step: save cache
        Action->>RGW: upload cache artifact
        RGW-->>Action: saved
    else Job fails
        Note over Action: cache NOT saved to RGW
    end

Reviews (1): Last reviewed commit: "feat: S3-backed cache-s3 siblings for go..."

Greptile also left 2 inline comments on this PR.

mike-ainsel changed the base branch from v4 to v4-beta on June 22, 2026 15:06
…RGW)

Add -s3 sibling cache actions mirroring the existing GitHub-cache ones,
following the turborepo/cache-s3 precedent. They restore/save toolchain
caches to the Hetzner RadosGW (s3.hz.platforma.bio) via tespkg/actions-cache
(pinned to v1.10.2 / e07e2d49) instead of the GitHub-hosted cache.

Purpose: on the hz self-hosted runners, drop the shared-hostPath node caches
(cross-job contamination surface) and let each job restore/save its own copy
from RGW; node-local NVMe is left for the ephemeral per-job workdir only.

- Additive only; the existing golang/cache, rust/cache, node/cache-pnpm are
  untouched, so AWS-runner jobs keep using the GitHub-hosted cache.
- Identical key schemes to their siblings.
- use-fallback=true → resilient to a transient RGW outage.
- Linux only (the hz fleet is Linux/x86).
mike-ainsel force-pushed the feat/hz-s3-cache-actions branch from 2d1d99d to 4a5c0ed on June 22, 2026 15:08
Comment on lines +56 to +57
runs:
  using: "composite"

P2 retry-count may be silently ignored

retry is a documented input for tespkg/actions-cache, but retry-count does not appear in the published action interface. GitHub Actions does not fail on unrecognised with: keys (the runner only logs an "Unexpected input(s)" warning), so all three actions would retry with the action's default count (likely 1–3) rather than the intended value of 3. If you want a deterministic retry count, verify this input is supported at the pinned commit; otherwise remove it to avoid misleading configuration. The same pattern appears in rust/cache-s3 and node/cache-pnpm-s3.
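If the check against the pinned commit shows that `retry-count` is indeed unsupported, the step in `actions/golang/cache-s3/action.yaml` could simply drop it. The sketch below assumes that outcome; every other key is copied from the reviewed step.

```yaml
# Sketch only: the reviewed step minus the unverified retry-count key.
- uses: tespkg/actions-cache@e07e2d4953dc8c020d447363e5064e36d04f3cf9 # v1.10.2
  with:
    endpoint: ${{ inputs.endpoint }}
    region: ${{ inputs.region }}
    bucket: ${{ inputs.bucket }}
    insecure: ${{ inputs.insecure }}
    accessKey: ${{ inputs.access-key }}
    secretKey: ${{ inputs.secret-key }}
    use-fallback: ${{ inputs.use-fallback }}
    retry: 'true'   # documented input, per the review comment
    path: |
      ~/.cache/go-build
      ~/go/pkg/mod
    key: ${{ runner.os }}-${{ runner.arch }}-cache-go-${{ inputs.cache-version }}-${{ hashFiles(inputs.cache-dependency-hashfiles-path) }}
    restore-keys: |
      ${{ runner.os }}-${{ runner.arch }}-cache-go-${{ inputs.cache-version }}-
```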

Comment on lines +60 to +76
uses: tespkg/actions-cache@e07e2d4953dc8c020d447363e5064e36d04f3cf9 # v1.10.2
with:
  endpoint: ${{ inputs.endpoint }}
  region: ${{ inputs.region }}
  bucket: ${{ inputs.bucket }}
  insecure: ${{ inputs.insecure }}
  accessKey: ${{ inputs.access-key }}
  secretKey: ${{ inputs.secret-key }}
  use-fallback: ${{ inputs.use-fallback }}
  retry: 'true'
  retry-count: '3'
  path: |
    ~/.cache/go-build
    ~/go/pkg/mod
  key: ${{ runner.os }}-${{ runner.arch }}-cache-go-${{ inputs.cache-version }}-${{ hashFiles(inputs.cache-dependency-hashfiles-path) }}
  restore-keys: |
    ${{ runner.os }}-${{ runner.arch }}-cache-go-${{ inputs.cache-version }}-

P2 No save-always equivalent; cache not saved on job failure

The sibling golang/cache defaults cache-save-always: true, which passes save-always: true to actions/cache so the cache is written even when a later step fails. tespkg/actions-cache does not document a save-always input, so the S3 action only saves on clean-run success. On the hz fleet, where jobs are more likely to see flaky infra failures, this means a partially-built Go module cache is silently discarded and the next run has to start cold. Consider documenting this divergence in the action description, or adding a separate post-step / save-always-like mechanism if the action supports it.
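One low-cost way to act on the "document this divergence" suggestion is to state it in the action metadata. The header below is illustrative only: the `name` and the exact wording are assumptions, not text from the PR.

```yaml
# Hypothetical header for actions/golang/cache-s3/action.yaml; name and wording are illustrative.
name: "Go cache (S3)"
description: >
  Restore and save the Go cache to an S3 bucket via tespkg/actions-cache.
  Unlike golang/cache (cache-save-always: true), this action saves the
  cache only when the job succeeds; a failed job leaves the cache unchanged.
```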

…ache

tespkg/actions-cache extracts with 'tar --keep-old-files' (unlike stock
actions/cache which overwrites), so the read-only (0444) Go module cache trips
'Cannot open: File exists'. Cache only ~/.cache/go-build (the compile-time win);
modules repopulate from GOPROXY.
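
The narrowed cache described in this commit might look like the fragment below. It is a sketch: the connection and credential inputs are omitted for brevity, and the key scheme is assumed to be unchanged from the originally reviewed step.

```yaml
# Sketch only: cache the Go build cache, not the read-only module cache.
- uses: tespkg/actions-cache@e07e2d4953dc8c020d447363e5064e36d04f3cf9 # v1.10.2
  with:
    # endpoint, bucket, credentials, and fallback inputs omitted for brevity
    path: |
      ~/.cache/go-build
    # key scheme assumed unchanged from the reviewed step
    key: ${{ runner.os }}-${{ runner.arch }}-cache-go-${{ inputs.cache-version }}-${{ hashFiles(inputs.cache-dependency-hashfiles-path) }}
    restore-keys: |
      ${{ runner.os }}-${{ runner.arch }}-cache-go-${{ inputs.cache-version }}-
```
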
cache-backend=s3 routes the Go build cache to an S3/RGW bucket (tespkg) instead
of the Azure-backed GitHub Actions cache, which self-hosted hz runners can't
reach. Linux-only, go-build only (modules repopulate from GOPROXY). Default
stays 'github' so other repos/runners are unchanged. prepare threads the new
inputs through and pins golang/cache to this branch (folds to @v4 on merge).
…le to prepare

Consolidate on the golang/cache-s3 sibling action for the s3/RGW Go cache
(consistent with node/cache-pnpm-s3 + rust/cache-s3). golang/cache returns to
its original GitHub actions/cache form; golang/prepare gains cache-enabled
(default true) so a caller can skip the built-in GitHub cache and use the
golang/cache-s3 sibling instead (no double-cache).
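
With that consolidation, an hz job would presumably disable the built-in cache in `golang/prepare` and call the sibling explicitly. The sketch below assumes the `golang/prepare` path follows the same layout as `golang/cache-s3` and that its other inputs have defaults.

```yaml
# Sketch only: skip the GitHub cache in prepare, then use the S3 sibling (no double-cache).
steps:
  - uses: milaboratory/github-ci/actions/golang/prepare@v4   # path assumed by analogy
    with:
      cache-enabled: 'false'
  - uses: milaboratory/github-ci/actions/golang/cache-s3@v4
    with:
      access-key: ${{ secrets.HZ_CI_CACHE_S3_ACCESS_KEY }}
      secret-key: ${{ secrets.HZ_CI_CACHE_S3_SECRET_KEY }}
```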