Run integration Vertex AI tests against a freshly-built source image #668

Draft
kmontemayor2-sc wants to merge 10 commits into main from kmonte/smoke-test-suite

kmontemayor2-sc commented Jun 4, 2026
Collaborator

Follow-up to #666; this change would have caught that error much earlier.

Fixes a source/image skew: the integration tests that launch Vertex AI jobs used to run their
workers on the pinned release image (DEFAULT_GIGL_RELEASE_SRC_IMAGE_CPU = src-cpu:0.2.0),
so worker-side source changes (e.g. get_graph_store_info) weren't validated until a release.

Now `make integration_test` always builds a fresh src-cpu image from the current source and
runs the suite against it (the Vertex-AI-launching tests pick it up via GIGL_CPU_DOCKER_URI):

  • Makefile: integration_test builds+pushes ${INTEGRATION_TEST_CPU_IMAGE} then runs
    tests.integration.main with GIGL_CPU_DOCKER_URI exported. New
    INTEGRATION_TEST_CPU_IMAGE_TAG/INTEGRATION_TEST_CPU_IMAGE vars (tag defaults to ${DATE} locally).
  • Tests (tests/integration/{distributed/utils/networking_test.py, common/services/vertex_ai_test.py}):
    read GIGL_CPU_DOCKER_URI in setUp and fail fast if it is unset; workers run real functions on the
    fresh image, namely _assert_graph_store_info(...) and _assert_machine_cpu_count(...) (the latter
    verifies the provisioned vCPU count per pool); explicit timeout_s on all CustomJob configs.
  • CI: on-pr-merge ci-integration-test and the /integration_test comment job pass an
    immutable per-run tag ${{ github.run_id }}.${{ github.run_attempt }}.
  • Docs: CLAUDE.md + README.md updated.

Net diff is 7 files (the branch history first added a tests/smoke/ suite, then folded it back into
tests/integration — no tests/smoke/ remains).
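
As a rough illustration of the fail-fast read described in the Tests bullet, a test's setUp might look like the sketch below. The helper name `require_env` and the class `ExampleVertexAITest` are hypothetical and not taken from the PR; only the `GIGL_CPU_DOCKER_URI` variable name comes from the description.

```python
import os
import unittest


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty.

    Hypothetical helper, not part of the PR.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} must be set; run the suite via `make integration_test` "
            "so the freshly built image URI is exported."
        )
    return value


class ExampleVertexAITest(unittest.TestCase):
    """Hypothetical test case showing where the check would live."""

    def setUp(self) -> None:
        # Fail before any Vertex AI job is launched if the image URI is missing.
        self._container_uri = require_env("GIGL_CPU_DOCKER_URI")
```

Raising in setUp means a missing variable surfaces as an immediate local error rather than as a remote job that silently falls back to another image.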

Test Plan

  • ty, ruff, mdformat --check pass on changed files
  • unittest discovery collects the integration suite (72 cases); worker functions import successfully
  • make -n integration_test dry-run: build image tag == exported GIGL_CPU_DOCKER_URI
  • Merge-queue CI runs make integration_test (builds image + launches Vertex AI jobs) green

…e image

Relocate the two non-e2e tests that launch real Vertex AI jobs
(networking_test, vertex_ai_test) into a new tests/smoke/ package with its own
main.py, and add a `make smoke_test` target that builds a fresh src-cpu image
from the current source and runs them against it (via GIGL_CPU_DOCKER_URI).

This closes a source/image skew gap: `make integration_test` runs workers on
the pinned release image, so worker-side source changes (e.g. get_graph_store_info)
were only validated after a release. smoke_test rebuilds from current source so
they're validated on the PR.

- Makefile: SMOKE_TEST_CPU_IMAGE_TAG / SMOKE_TEST_CPU_IMAGE vars + smoke_test target.
- CI: run `make smoke_test` in on-pr-merge's ci-integration-test, and add a
  `/smoke_test` (+ /all_test) on-demand job to on-pr-comment. Both pass an
  immutable per-run tag (run_id.run_attempt) so concurrent runs can't clobber it.
- networking_test: worker runs a real _assert_graph_store_info() function (thin
  python -c import+call) instead of an inlined script, now that the image is
  rebuilt from source.
- vertex_ai_test: the CustomJob tests run a real worker function asserting the
  provisioned machine's vCPU count, on the fresh image.
- All smoke job configs set an explicit short timeout_s.
- Document make smoke_test / tests/smoke in CLAUDE.md and README.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
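
To illustrate the per-run tag mentioned in the commit message, here is a minimal Python sketch of how such a tag could be composed. The function name and the date format used for the local fallback are assumptions; the PR itself does this in the Makefile and the workflow YAML, whose exact contents are not shown here.

```python
from datetime import datetime, timezone
from typing import Optional


def per_run_image_tag(
    run_id: Optional[str],
    run_attempt: Optional[str],
    now: Optional[datetime] = None,
) -> str:
    """Build an image tag that is unique per CI run attempt.

    Hypothetical helper: in CI the tag is "<run_id>.<run_attempt>", so a re-run
    of the same workflow gets a new tag instead of overwriting the old one.
    Outside CI it falls back to a date-based tag (format assumed here).
    """
    if run_id and run_attempt:
        return f"{run_id}.{run_attempt}"
    current = now or datetime.now(timezone.utc)
    return current.strftime("%Y%m%d")
```

Because the run attempt is part of the tag, two concurrent or repeated runs never push to the same tag, which is the property the commit message relies on.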

kmontemayor2-sc
Collaborator Author

/all_test

github-actions Bot commented Jun 4, 2026
Contributor

GiGL Automation

@ 16:52:25UTC : 🔄 C++ Unit Test started.

@ 16:54:20UTC : ✅ Workflow completed successfully.

github-actions Bot commented Jun 4, 2026
Contributor

GiGL Automation

@ 16:52:26UTC : 🔄 Integration Test started.

@ 17:43:25UTC : ✅ Workflow completed successfully.

github-actions Bot commented Jun 4, 2026
Contributor

GiGL Automation

@ 16:52:27UTC : 🔄 Lint Test started.

@ 17:00:32UTC : ❌ Workflow failed.
Please check the logs for more details.

github-actions Bot commented Jun 4, 2026
Contributor

GiGL Automation

@ 16:52:28UTC : 🔄 Python Unit Test started.

@ 17:55:27UTC : ✅ Workflow completed successfully.

github-actions Bot commented Jun 4, 2026
Contributor

GiGL Automation

@ 16:52:28UTC : 🔄 E2E Test started.

@ 18:12:45UTC : ✅ Workflow completed successfully.

github-actions Bot commented Jun 4, 2026
Contributor

GiGL Automation

@ 16:52:33UTC : 🔄 Scala Unit Test started.

@ 17:01:20UTC : ✅ Workflow completed successfully.

kmonte and others added 9 commits June 4, 2026 16:54
…ion identifiers

Reverses the tests/smoke/ packaging: relocates vertex_ai_test.py and
networking_test.py back under tests/integration/ (their natural home), deletes
the entire tests/smoke/ package, and renames all smoke-specific identifiers
(class names, job-name prefixes, KFP pipeline names, display names, experiments,
labels, the GCS path segment, and the timeout constant) to their integration
equivalents so the files are accurate and the worker python -c import strings
resolve on the rebuilt image.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…smoke_test target

Renames SMOKE_TEST_CPU_IMAGE_TAG/SMOKE_TEST_CPU_IMAGE vars to
INTEGRATION_TEST_CPU_IMAGE_TAG/INTEGRATION_TEST_CPU_IMAGE, rewrites the
integration_test target to build and push a fresh src-cpu Docker image before
running the suite (so Vertex-AI-launching tests use current source), and
removes the now-superseded smoke_test target entirely.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…_test job

- on-pr-merge.yml: remove the separate "Run Smoke Tests" step; pass
  INTEGRATION_TEST_CPU_IMAGE_TAG to the integration step so make
  integration_test builds and tests against a fresh, immutable per-run image.
- on-pr-comment.yml: delete the smoke-test job (/smoke_test trigger);
  pass INTEGRATION_TEST_CPU_IMAGE_TAG to the integration-test job's command.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…fresh src-cpu image

Remove all references to the removed make smoke_test target, tests/smoke/ package,
and /smoke_test CI command. Update the integration test docs in both CLAUDE.md and
README.md to state that make integration_test now builds a fresh src-cpu image from
current source (so Vertex-AI-launching tests run against current code, not a pinned
release image).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

kmontemayor2-sc
Collaborator Author

/all_test

github-actions Bot commented Jun 11, 2026
Contributor

GiGL Automation

@ 19:10:05UTC : 🔄 Integration Test started.

@ 20:41:43UTC : ✅ Workflow completed successfully.

github-actions Bot commented Jun 11, 2026
Contributor

GiGL Automation

@ 19:10:06UTC : 🔄 C++ Unit Test started.

@ 19:11:56UTC : ✅ Workflow completed successfully.

github-actions Bot commented Jun 11, 2026
Contributor

GiGL Automation

@ 19:10:06UTC : 🔄 Scala Unit Test started.

@ 19:20:53UTC : ✅ Workflow completed successfully.

github-actions Bot commented Jun 11, 2026
Contributor

GiGL Automation

@ 19:10:06UTC : 🔄 E2E Test started.

@ 20:52:23UTC : ✅ Workflow completed successfully.

github-actions Bot commented Jun 11, 2026
Contributor

GiGL Automation

@ 19:10:08UTC : 🔄 Lint Test started.

@ 19:19:01UTC : ✅ Workflow completed successfully.

github-actions Bot commented Jun 11, 2026
Contributor

GiGL Automation

@ 19:11:29UTC : 🔄 Python Unit Test started.

@ 20:27:16UTC : ✅ Workflow completed successfully.

kmontemayor2-sc changed the title from "Add tests/smoke suite that runs Vertex AI tests against a fresh source image" to "Run integration Vertex AI tests against a freshly-built source image" Jun 11, 2026

kmontemayor2-sc
Collaborator Author

/all_test

github-actions Bot commented Jun 11, 2026
Contributor

GiGL Automation

@ 22:32:00UTC : 🔄 Integration Test started.

@ 23:53:52UTC : ✅ Workflow completed successfully.

github-actions Bot commented Jun 11, 2026
Contributor

GiGL Automation

@ 22:32:01UTC : 🔄 C++ Unit Test started.

@ 22:33:58UTC : ✅ Workflow completed successfully.

github-actions Bot commented Jun 11, 2026
Contributor

GiGL Automation

@ 22:32:01UTC : 🔄 Scala Unit Test started.

@ 22:42:36UTC : ✅ Workflow completed successfully.

github-actions Bot commented Jun 11, 2026
Contributor

GiGL Automation

@ 22:32:04UTC : 🔄 Lint Test started.

@ 22:40:54UTC : ✅ Workflow completed successfully.

github-actions Bot commented Jun 11, 2026
Contributor

GiGL Automation

@ 22:32:04UTC : 🔄 Python Unit Test started.

@ 23:47:44UTC : ✅ Workflow completed successfully.

github-actions Bot commented Jun 11, 2026
Contributor

GiGL Automation

@ 22:32:04UTC : 🔄 E2E Test started.

@ 23:54:18UTC : ✅ Workflow completed successfully.

mkolodner-sc left a comment

Collaborator

Thanks Kyle! Mostly LGTM, a few comments

gcp_service_account_email: ${{ secrets.GCP_SERVICE_ACCOUNT_EMAIL }}
command: |
  make integration_test
  # Immutable per-run image tag so concurrent runs can't overwrite each other's tag.

mkolodner-sc Jun 18, 2026

Collaborator

Does this work? This puts a shell comment inside the command: block. That command is later passed as _CMD and executed via $_CMD in .github/cloud_builder/run_command_on_active_checkout.yaml:50, where a # that arrives through variable expansion is not parsed as a comment. This may fail before make integration_test runs, with "#: command not found".
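
The shell behavior behind this concern can be reproduced locally. The sketch below is only an illustration and assumes a POSIX `sh` is on the PATH; the helper name `run_shell` is hypothetical, and the exact way the Cloud Build step invokes `$_CMD` is not shown in this thread. It runs the same string twice: once as script text (where `#` starts a comment) and once through an unquoted `$_CMD` expansion (where `#` is just another word).

```python
import os
import subprocess


def run_shell(script: str, cmd: str) -> str:
    """Run `script` with sh, exposing `cmd` as the _CMD environment variable."""
    env = {**os.environ, "_CMD": cmd}
    result = subprocess.run(
        ["sh", "-c", script],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


cmd = "echo hello # trailing comment"

# Parsed as script text: the shell tokenizer drops everything after '#'.
parsed_directly = run_shell(cmd, cmd)

# Executed via unquoted expansion: comments are not recognized after
# expansion, so '#' and the following words become arguments to echo.
via_expansion = run_shell("$_CMD", cmd)

print(repr(parsed_directly))  # 'hello\n'
print(repr(via_expansion))    # 'hello # trailing comment\n'
```

If the comment text reaches the command position or becomes extra arguments to make, the step would fail in a way similar to what the review describes, so keeping the comment outside the command: value (as a YAML comment) is likely the safer option.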

container_uri = "condaforge/miniforge3:25.3.0-1"
command = ["python", "-c", "import logging; logging.info('Hello, World!')"]

command = [

Collaborator

Robot review:

Vertex AI worker commands import test-only dependencies missing from src-cpu

vertex_ai_test.py:104/140/153 and networking_test.py:135 import helper functions from test modules inside the freshly built runtime image. Those modules import parameterized at top level (vertex_ai_test.py:8, networking_test.py:6), but parameterized is only in the test dependency group (pyproject.toml:148), while the CPU image uses a non-dev install and Dockerfile.src only runs `uv pip install .`. The remote workers will likely fail on import before reaching the assertions.

Fix: put worker entrypoints in a minimal module with only runtime deps, or inline the tiny commands so they do not import test modules.
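
One possible shape for the first suggested fix is sketched below, assuming a small module that depends only on the standard library. The module name `worker_checks` and both function names are hypothetical; the PR's real helpers (`_assert_machine_cpu_count`, `_assert_graph_store_info`) live in the test modules and may differ.

```python
import os
from typing import List, Optional

# Hypothetical module name; it would need to be importable inside the runtime image.
WORKER_MODULE = "worker_checks"


def assert_machine_cpu_count(expected: int, actual: Optional[int] = None) -> None:
    """Raise AssertionError if the machine's vCPU count differs from `expected`.

    `actual` exists only so the check can be exercised without a real worker.
    """
    observed = os.cpu_count() if actual is None else actual
    if observed != expected:
        raise AssertionError(f"Expected {expected} vCPUs, found {observed}")


def build_worker_command(expected_cpu_count: int) -> List[str]:
    """Build a `python -c` command that imports only the minimal module."""
    snippet = (
        f"from {WORKER_MODULE} import assert_machine_cpu_count; "
        f"assert_machine_cpu_count({expected_cpu_count})"
    )
    return ["python", "-c", snippet]
```

The test module would then call `build_worker_command(...)` when configuring the job, so the worker never imports `parameterized` or anything else from the test dependency group.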
