refresh-dwarf bot: runtime-verify each generated JSON via cephadm #129

Open
taodd wants to merge 2 commits into main from feature/embedded-dwarf-runtime-verify

@taodd taodd commented Jun 5, 2026

Summary

The embedded-DWARF refresh bot previously only proved a generated JSON parses and links into the embedded header — never that osdtrace/radostrace actually trace meaningfully through it. This adds per-version runtime trace verification that exercises the embedded path against a real cluster of the matching version.

What changed

Workflow split into three jobs (.github/workflows/refresh-embedded-dwarf.yaml):

  • generate: detect and generate JSONs (one podman container per version), re-aggregate the embedded header as an early link gate, publish the new JSONs + manifest as artifacts, and emit a JSON array of generated versions.
  • verify: dynamic parallel matrix, one runner per generated version. Each cell rebuilds osdtrace/radostrace with the new JSON embedded, provisions a single-host cephadm cluster on quay.io/ceph/ceph:v<version>, drives an S3 workload, and traces a live OSD + radosgw through the embedded path. REQUIRE_EMBEDDED=1 makes the 'Using embedded DWARF data' marker mandatory, so a silent fall-back to live DWARF parsing fails the cell.
  • open-pr: assembles only the versions that passed verification; failures are dropped from the PR and listed for retry on the next run.
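
As a rough illustration of the two pieces of glue logic described above (the versions-array emit and the keep-verified/drop-failed assembly), here is a minimal shell sketch. It is not the workflow's actual code: the function names and the "version result" input format are assumptions made for the example, and the version numbers are sample values.

```shell
# Sketch only: function names and the "version result" input format are
# hypothetical, not taken from the workflow.

# Turn newline-separated versions on stdin into a compact JSON array.
# Assumes version strings need no JSON escaping (digits and dots only).
versions_to_json() {
    out=""
    while IFS= read -r v || [ -n "$v" ]; do
        [ -n "$v" ] || continue            # skip blank lines
        out="${out:+$out,}\"$v\""          # comma-separate after the first item
    done
    printf '[%s]\n' "$out"
}

# Read "version result" lines on stdin; print versions whose result is "pass".
verified_versions() {
    awk 'NF && $2 == "pass" { print $1 }'
}

# Same input; print the versions to drop from the PR and retry next run.
failed_versions() {
    awk 'NF && $2 != "pass" { print $1 }'
}

# Example with sample values:
printf '19.2.2\n18.2.7\n' | versions_to_json
printf '19.2.2 pass\n18.2.7 fail\n' | verified_versions
```

In a workflow step the array could be written to `$GITHUB_OUTPUT` (for example under a hypothetical output name `versions`) and consumed by the matrix through `fromJSON(...)`.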

tests/functional-test-cephadm-rgw.sh gains two opt-in knobs (existing PR matrix behavior unchanged when unset):

  • CEPH_IMAGE — pin an exact point-release image instead of the per-major latest.
  • REQUIRE_EMBEDDED=1 — make the embedded-DWARF boot marker mandatory.
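
For readers unfamiliar with the marker check, the REQUIRE_EMBEDDED behaviour can be sketched roughly as follows. This is an assumption about the shape of the logic, not a copy of the test script: the function name `check_embedded_marker` and the idea of grepping a captured tool log are hypothetical; only the marker string and the variable name come from this PR.

```shell
# Sketch only: check_embedded_marker and the log-file argument are
# hypothetical; the marker text and REQUIRE_EMBEDDED come from the PR.
REQUIRE_EMBEDDED="${REQUIRE_EMBEDDED:-0}"   # opt-in: unset behaves like 0

# check_embedded_marker LOGFILE
# Succeeds if the marker is present, or if it is absent but not required.
check_embedded_marker() {
    if grep -qF 'Using embedded DWARF data' "$1"; then
        echo "embedded DWARF path engaged"
        return 0
    fi
    if [ "${REQUIRE_EMBEDDED:-0}" = "1" ]; then
        echo "FAIL: marker missing; tool fell back to live DWARF parsing" >&2
        return 1
    fi
    echo "embedded marker not found (not required)"
    return 0
}
```

A pinned verification run would then look something like `CEPH_IMAGE=quay.io/ceph/ceph:v19.2.2 REQUIRE_EMBEDDED=1 bash tests/functional-test-cephadm-rgw.sh`.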

Why a cephadm cluster of the exact version works

Verified empirically: for tagged point releases the quay.io image ships the same binary as the el9 RPM the JSON is generated from — ceph-osd build_id for v19.2.2 matched the el9 RPM's build_id exactly (702d13c4…). So the embedded JSON matches by build_id and the embedded path genuinely engages.
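
A check along these lines can be reproduced with `readelf -n`, which prints the GNU build ID note of an ELF binary. The sketch below is illustrative only: the helper names are hypothetical, and the binary paths in the usage comment are assumptions about where ceph-osd lives in the image and in the unpacked RPM.

```shell
# Sketch only: helper names are hypothetical.

# Extract the GNU build ID from `readelf -n` output supplied on stdin.
extract_build_id() {
    sed -n 's/^[[:space:]]*Build ID:[[:space:]]*\([0-9a-f]*\).*$/\1/p' | head -n 1
}

# Succeed only if both IDs are non-empty and identical.
same_build() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Usage (paths are assumptions):
#   image_id=$(readelf -n /usr/bin/ceph-osd | extract_build_id)
#   rpm_id=$(readelf -n ./rpm-root/usr/bin/ceph-osd | extract_build_id)
#   same_build "$image_id" "$rpm_id" && echo "build_id match"
```

Treating an empty ID as a mismatch avoids a false "match" when one of the binaries has no build ID note at all.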

Test plan

  • bash -n + actionlint clean.
  • Unit-tested the versions-array emit and the open-pr assemble logic (keeps verified, drops failed) locally.
  • Full workflow_dispatch run to validate the three-job flow end-to-end (spins up one cephadm cluster per generated version).

🤖 Generated with Claude Code

taodd added 2 commits June 5, 2026 23:37
The refresh bot previously only proved a generated JSON parses and links
into the embedded header -- never that the tools actually trace meaningfully
through it.  Add a parallel per-version runtime verification:

- Split the workflow into three jobs: generate -> verify (matrix) -> open-pr.
- verify fans out one runner per generated version, rebuilds osdtrace +
  radostrace with the new JSON embedded, provisions a single-host cephadm
  cluster on quay.io/ceph/ceph:v<version> (whose ceph-osd build_id matches
  the el9 RPM the JSON was extracted from), drives an S3 workload, and traces
  a live OSD + radosgw through the EMBEDDED path.
- open-pr includes only versions that passed; failures are dropped from the
  PR and listed for retry next run.

functional-test-cephadm-rgw.sh gains two opt-in knobs (existing matrix
behaviour unchanged when unset):
- CEPH_IMAGE: pin an exact point-release image instead of the per-major latest.
- REQUIRE_EMBEDDED=1: make the 'Using embedded DWARF data' marker mandatory,
  so a silent fall-back to live DWARF parsing fails the test.

actions/upload-artifact@v4 rejects ':' in file paths, and the JSON
filenames embed the package epoch (osd-2:19.2.2-0.el9_dwarf.json), so the
raw-file upload failed.  Bundle the new JSONs into a colon-free tarball
(alongside the manifest TSVs) in generate, and untar in the verify and
open-pr jobs.  No other logic change.
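
The tarball workaround can be sketched as below. This is a minimal illustration, not the workflow's actual steps: the function names and directory layout are hypothetical. The key point is that only the archive's own filename has to be colon-free; member names inside the tarball may still contain ':'.

```shell
# Sketch only: function names and directory layout are hypothetical.

# bundle_jsons SRC_DIR OUT_TARBALL
# Pack everything under SRC_DIR. OUT_TARBALL must live outside SRC_DIR and
# must not contain ':' (the artifact upload rejects it, and GNU tar would
# otherwise interpret "host:file" as a remote archive).
bundle_jsons() {
    tar -czf "$2" -C "$1" .
}

# unbundle_jsons TARBALL DEST_DIR
# Restore the original filenames, colons included, under DEST_DIR.
unbundle_jsons() {
    mkdir -p "$2" && tar -xzf "$1" -C "$2"
}
```

The generate job would upload the single tarball as the artifact, and the verify and open-pr jobs would call the unbundle step after downloading it.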