From f4ad17dc330c4b11f32dafb0be02a36b87765e6d Mon Sep 17 00:00:00 2001 From: ritmun Date: Wed, 4 Mar 2026 09:24:23 -0600 Subject: [PATCH] SDCICD-1715 update readme file --- README.md | 601 ++++++++++++++++++++++++------------------------------ 1 file changed, 265 insertions(+), 336 deletions(-) diff --git a/README.md b/README.md index f6ce954921..5781b3bbef 100644 --- a/README.md +++ b/README.md @@ -1,354 +1,224 @@ -# OSDe2e - -[![GoDoc](https://godoc.org/github.com/openshift/osde2e?status.svg)](https://godoc.org/github.com/openshift/osde2e) - -## Introduction - -A comprehensive test framework used for Service Delivery to test all aspects of -Managed OpenShift Clusters ([OpenShift Dedicated]). The data generated by -the different test coverage is used to inform product releases and decisions. - -OSDe2e key features are: - -* Portable test framework that can run anywhere to validate end to end test workflows - * Run locally from a developers workstation or from a CI application -* Supports create/delete different cluster deployment types - * ROSA, ROSA Hosted Control Plane (e.g. 
HyperShift), OSD on AWS - * OSD on GCP - * Azure (Azure Red Hat OpenShift) -* Performs cluster health checks to ensure cluster is operational prior to - running tests -* Perform cluster upgrades -* Captures artifacts for later use, such as - * Cluster install/uninstall logs - * Test logs - * Metrics - * Metadata - * Must gather artifacts -* Tests OSD operators along with other OpenShift features from a - customer/SRE point of view -* Provides a test harness to validate [Add Ons][OSDE2E Test Harness] - -When osde2e is invoked, the standard workflow is followed: - -* Load configuration -* Cluster deployment (when not leveraging an existing cluster) -* Verify the health of the cluster -* Run tests (pre upgrade) -* Collect logs, metrics and metadata -* Upgrade cluster (when defined) -* Verify the health of the cluster post upgrade -* Run tests (post upgrade - when upgrade is defined) -* Collect logs, metrics and metadata -* Cluster deprovision (when this is toggled on) - -## Prerequisites - -Prior to running osde2e, make sure you meet the minimal prerequisites defined below: - -* Navigate to [OpenShift Cluster Manager (OCM)] to create a service account with client ID and secret. - * Save your client ID into the environment variable `OCM_CLIENT_ID` and client secret into `OCM_CLIENT_SECRET` for later usage -* Verify (submit a request if required to [ocm resources]) your Red Hat account - has adequate quota for deploying clusters based on your preferred deployment type -* A go workspace running the minimal version defined in the [go.mod](go.mod) - -## Run - -OSDe2e can be invoked by one of two ways. Refer to each section below to learn how -to run it. - -### From Source - -Running from source requires you to build the osde2e binary. 
Follow the steps below -to do this: - -```shell -git clone https://github.com/openshift/osde2e.git -cd osde2e -go mod tidy -make build -``` - -On completion of the `make build` target, the generated binary will reside in the -directory `./out/`. Where you can then invoke osde2e `./out/osde2e --help`. - -### From Container Image - -Running from a container image using a container engine (e.g. docker, podman). -You can either build the image locally or consume the public image available on -[quay.io][OSDE2E Quay Image]. - -```shell -export CONTAINER_ENGINE= - -# Build Image -make build-image -$CONTAINER_ENGINE run quay.io/redhat-services-prod/osde2e-cicada-tenant/osde2e:latest - -# Pull Image -$CONTAINER_ENGINE pull quay.io/redhat-services-prod/osde2e-cicada-tenant/osde2e:latest -$CONTAINER_ENGINE run quay.io/redhat-services-prod/osde2e-cicada-tenant/osde2e:latest -``` - -## Config Input - -OSDe2e provides multiple ways for you to provide input to tailor what test workflows -you wish to validate. It provides four ways for you to provide input -(order is lowest to highest precedence): - -* Use pre-canned composable default [configs] -* Provide a custom config -* Environment variables -* Command line options - -*It is highly recommended to leave sensitive settings as environment variables (e.g. `OCM_CLIENT_ID`, `OCM_CLIENT_SECRET`). -This way the chance of these settings defined in a custom config file are not checked into -source control.* - -### Pre-Canned Default Configs - -The [configs] package provides pre-canned default configs available for you to use. -These config files are named based on what action they are performing. Within the config -file can contain multiple settings to tailor osde2e. - -Example config [stage](configs/stage.yaml): - -This default config is telling osde2e to use the stage OCM environment. - -```yaml -ocm: - env: stage -``` - -You can provide N+1 pre-canned configs to osde2e. 
Example below will deploy -a OSD cluster within the OCM stage environment. - -```shell -./out/osde2e test --configs aws,stage -``` - -### Custom Config - -The composable configs consist of a number of small YAML files that can all be loaded together. -Rather than using the built in configs, you can also elect to build your own custom YAML file -and provide that using the `--custom-config` CLI option. - -```shell -osde2e test --custom-config ./osde2e.yaml -``` - -The custom config below is a basic example for deploying a ROSA STS cluster and running -all of the OSD operators tests that do not have the informing label associated to them. - -```yaml -dryRun: false -provider: rosa -cloudProvider: - providerId: aws - region: us-east-1 -rosa: - env: stage - STS: true -cluster: - name: osde2e -tests: - ginkgoLabelFilter: Operators && !Informing -``` - -You can use both pre-canned default configs and your own custom configs: - -```shell -./out/osde2e test --configs aws --custom-config ./osde2e.yaml -``` - -### Environment Variables - -Any config option can be passed in using environment variables. -Please refer to the [config package] for exact environment variable names. +# OSDE2E Documentation Summary -Below is an example to spin up a OSD cluster and test it: +## Table of Contents -```shell -OCM_CLIENT_ID= \ -OCM_CLIENT_SECRET= \ -OSD_ENV=prod \ -CLUSTER_NAME=my-cluster \ -CLUSTER_VERSION=4.12.0 \ -osde2e test -``` +- [Overview](#overview) +- [Core Components](#core-components) + - [1. Testing Framework](#1-testing-framework-osde2e-testingmd) + - [2. Test Execution Methods](#2-test-execution-methods) + - [A. Ginkgo Test Images](#a-ginkgo-test-images-test-harnessesmd) + - [B. Writing Tests](#b-writing-tests-writing-testsmd) + - [3. CI Jobs](#3-ci-jobs-ci-jobsmd) + - [5. 
Self-Service Operations](#5-self-service-operations) + - [Gap Analysis Testing](#gap-analysis-testing-adhoc-osde2e-testingmd-ad-hoc-e2e-jobmd) + - [Instance Type Enablement](#instance-type-enablement-instance-type-enablementmd) + - [Region Enablement](#region-enablement-region-enablementmd) + - [Testgrid Pipeline Integration](#testgrid-pipeline-integration-adding-testgrid-pipelines-through-ci-operatormd) + - [6. Running Tests](#6-running-tests) + - [Local Testing](#local-testing-run-osde2e-testsmd-testing-with-osde2emd) + - [7. Slack Notifications](#7-slack-notifications) + - [8. Configuration](#8-configuration-configmd) +- [Workflow Summary](#workflow-summary) + - [Typical Test Flow](#typical-test-flow) + - [Integration Points](#integration-points) +- [Key Resources](#key-resources) + +## Overview -These also can be combined with pre-canned default configs and custom configs: +OSDE2E (OpenShift Dedicated End-to-End) is a comprehensive test framework for qualifying new versions of OpenShift in managed environments. It facilitates testing for Managed OpenShift platforms (OSD, ROSA, ROSA HCP, ARO), OSD Operators, and Addons. + +OSDE2E integrates with OpenShift's CI/CD infrastructure to provide continuous validation of new OpenShift releases and plays a critical role in the release gating process. The framework supports testing scenarios ranging from ad-hoc runs to automated periodic validation, with robust cluster provisioning, artifact collection, and reporting capabilities. -```shell -OCM_CLIENT_ID= \ -OCM_CLIENT_SECRET= \ -CLUSTER_VERSION=4.12.0 \ -osde2e test --configs prod,e2e-suite -``` +## Core Components + +### 1. 
Testing Framework (OSDE2E-Testing.md) -```shell -OCM_CLIENT_ID= \ -OCM_CLIENT_SECRET= \ -CLUSTER_VERSION=4.12.0 \ -osde2e test --configs prod,e2e-suite -``` +**Primary Use Cases:** +- Managed OpenShift platforms (OSD, ROSA, ROSA HCP) +- OSD Operators running on Managed OpenShift +- Addons integration testing with OpenShift versions + +Test results serve as gating signals for promotion between environments. + +### 2. Test Execution Methods + +#### A. Ginkgo Test Images (Test-Harnesses.md) +Standalone Ginkgo e2e test images run on test pods. Two types are described here: + +**Operator Ad-hoc Test Image:** +- Uses `openshift/golang-osd-operator-osde2e` boilerplate +- Test structure in operator repo under `/test/e2e/` +- Automated publishing via CI/CD pipelines +- Integrated with Prow for automated testing + +**Addon Test Harness:** +- For OpenShift addon components +- Requires the `ADDON_IDS` environment variable +- Addon installation before test execution + +#### B. Writing Tests (Writing-Tests.md) + +**Best Practices:** +- Follow the Kubernetes best-practices guide for e2e tests +- Use Ginkgo and Gomega frameworks +- Leverage the osde2e-common module to reduce code duplication +- Use e2e-framework for cluster interfacing +- Apply labels/tags for test classification +- Focus each test case on a specific scope +- Cover both positive and negative cases +- Include proper error messages for debugging + +**Example Repositories:** +- Managed Upgrade Operator Tests +- OCM Agent Operator Tests +- RBAC Permissions Operator Tests + +### 3. 
CI Jobs (CI-Jobs.md) + +**SD CICD Periodic Jobs:** +- ROSA BYOVPC Proxy Install/Post Install +- OSD AWS Upgrade suites (Y-1 to Y, Z-1 to Z, Y to Y+1) +- OSD AWS SREP Operator Informing Suite + +**TRT Nightly Periodic Jobs:** +- Validates Managed OpenShift for new nightly OCP builds +- Provides informing signal to releases +- Covers OSD (AWS/GCP) and ROSA (Classic STS/HCP) +- Supports OCP versions 4.10-4.14 + + +**Adding Jobs:** +- PR to release repo for periodic job +- PR to continuous-release-jobs repo for signal notification +- Auto-included after 24 hours + **Removing Jobs:** +- PR to release repo to remove job +- PR to continuous-release-jobs repo to remove signal + + +These jobs send alerts to #hcm-cicd-alerts Slack channel. + + +### 5. Self-Service Operations + +#### Gap Analysis Testing (adhoc-osde2e-testing.md, Ad-Hoc-E2E-Job.md) +- Jenkins parameterized job for on-demand testing +- Available at ci.int.devshift.net +- Supports AWS and GCP testing +- Custom configuration via environment variables +- Useful for region/instance type enablement -A list of commonly used environment variables are included in [Config variables]. +#### Instance Type Enablement (Instance-Type-Enablement.md) -### Command Line Options +**Prerequisites:** +- Instance type enabled in Stage via OCM/ROSA CLIs +- Quota verification +- Pricing enablement -Some configuration settings are also exposed as command-line options. -A full list can be displayed by providing `--help` after the command. +**Process:** +1. Create PR in release repo for osde2e jobs +2. Configure job with new instance type +3. Run `make jobs` to generate prowgen jobs +4. Merge and monitor results in Prow +5. 
Validate with 3 consecutive successful runs -Below is an example of using options for the `test` command: +#### Region Enablement (Region-Enablement.md) -```shell -./out/osde2e test --cluster-id \ - --provider stage \ - --skip-health-check \ - --focus-tests "RBAC Operator" -``` +**Prerequisites:** +- Region enabled in AWS account +- SDA team enables region for ocm account -Another example below is you can skip cluster health check, must gather -as follows. +**Common Issues:** +- AMI availability errors (report to BU) +- Quota errors (request quota increase) -```shell -POLLING_TIMEOUT=1 \ -./out/osde2e test --cluster-id= \ ---configs stage \ ---skip-must-gather \ ---skip-health-check \ ---focus-tests="rh-api-lb-test" -``` +**Process:** +- Create periodic Prow job +- Or run ad-hoc Jenkins job +- Follow region enablement SOP -A list of commonly used CLI flags are included in [Config variables]. +#### Testgrid Pipeline Integration (Adding-Testgrid-Pipelines-Through-Ci-Operator.md) +- Integration with ci-operator for testgrid pipelines +- Jobs added to redhat-openshift-osd dashboard +- Custom 'osde2e' tag for identification +- Must be prowgen job +- Add to _allow-list.yaml in release repo +- Auto-updates every 24 hours -### Examples +### 6. Running Tests + +#### Local Testing (run-osde2e-tests.md, testing-with-osde2e.md) -To see more examples of configuring input for osde2e, refer to the -[prowgen jobs][OSDE2E ProwGen Job Config] in the OpenShift release repository -owned by the team. These will be always up to date with the latest changes -osde2e has to offer. 
- -## Cluster Deployments - -OSDe2e provides native support for deploying the following cluster types: +**On Existing Cluster:** +```bash +# OCM credentials +export OCM_CLIENT_ID= +export OCM_CLIENT_SECRET= -* ROSA -* ROSA Hosted Control Plane (HyperShift) -* OSD (OpenShift Dedicated) +# AWS credentials (for ROSA/OSD on AWS) +export AWS_ACCESS_KEY_ID= +export AWS_SECRET_ACCESS_KEY= +export AWS_REGION= # e.g., us-east-1 -You can have osde2e deploy the cluster if a cluster ID is not provided or -you can leverage an existing cluster by giving the cluster ID as input at -runtime. +# Optional: For BYO-VPC clusters +export AWS_VPC_SUBNET_IDS= -You can also provide it a kubeconfig file and osde2e can attempt to target -that cluster. +# Optional: For ROSA using AWS profile +export AWS_PROFILE= -```shell -export TEST_KUBECONFIG= -./out/osde2e test +./osde2e test --cluster-id ${CLUSTERID} --configs rosa,e2e-suite,stage ``` -*It may be possible to test against a non Managed OpenShift cluster -(a traditional OpenShift Container Platform cluster). Though this will -require you to alter the input settings as non managed clusters will not -have certain items applied to them like a Managed cluster would (e.g. OSD -operators, health checks, etc).* - -## Tests - -OSDe2e currently holds all core and operator specific tests and are maintained by the CICD team. -Test types range from core OSD verification, OSD operators to scale/conformance. - -*Currently in flight: OSD operator tests will no longer reside in osde2e repository and -live directly alongside the operator source code in its repository* - -### Selecting Tests To Run - -OSDe2e supports a couple different ways you can select which tests you would like to run. Below presents -the commonly used methods for this: - -* Using the label filter. Labels are ginkgos way to tag test cases. The examples below - will tell osde2e to run all tests that have the `E2E` label applied. 
- -```shell -# Command line option -osde2e test --label-filter E2E - -# Passed in using a custom config file -tests: - ginkgoLabelFilter: E2E +**Cluster Upgrades:** +```bash +# OCM and AWS credentials (same as above) +export OCM_CLIENT_ID= +export OCM_CLIENT_SECRET= +export AWS_ACCESS_KEY_ID= +export AWS_SECRET_ACCESS_KEY= + +# Upgrade configuration +export CLUSTER_ID= +export UPGRADE_MANAGED=true +export UPGRADE_TO_LATEST_Z=true # Or UPGRADE_TO_LATEST, UPGRADE_TO_LATEST_Y, UPGRADE_RELEASE_NAME + +ocm login --url stg +./osde2e test --cluster-id ${CLUSTER_ID} --configs rosa,e2e-suite,stage ``` -* Using focus strings. Focus strings are ginkgos way to select test cases based on string regex. +**Key Points:** +- Supports OCM-driven upgrades via managed-upgrade-operator +- Verify cluster health pre and post-upgrade +- Specify target version with `UPGRADE_RELEASE_NAME` or use latest flags -```shell -# Command line option -osde2e test --focus-tests "OCM Agent Operator" - -# Custom config file -tests: - focus: "OCM Agent Operator" -``` +**On New Cluster:** +```bash +# OCM credentials +export OCM_CLIENT_ID= +export OCM_CLIENT_SECRET= -* Using a combination of labels and focus strings to fine tune your test selection. - The examples below tell osde2e to run all ocm agent operator tests and avoid running - the upgrade test case. +# AWS credentials +export AWS_ACCESS_KEY_ID= +export AWS_SECRET_ACCESS_KEY= +export AWS_REGION= -```shell -# Command line options -osde2e test --label-filter "Operators && !Upgrade" --focus-tests "OCM Agent Operator" +# Optional cluster configuration +export CLUSTER_VERSION=openshift-v4.14.0 -# Custom config file -tests: - ginkgoLabelFilter: "Operators && !Upgrade" - focus: "OCM Agent Operator" +./osde2e test --configs rosa,e2e-suite,stage ``` -### Writing Tests - -Refer to the [Writing Tests] document for guidelines and standards. - -Third-party (Addon) tests are built as containers that spin up and report back results to OSDe2e. 
-These containers are built and maintained by external groups looking to get CI signal for -their product within OSD. The definition of a third-party test is maintained within -the `managed-tenants` repo and is returned via the Add-Ons API. - -For more information please see the [OSDE2E Test Harness] repository to learn more -for writing add on tests. - -## Reporting - -Each time osde2e runs it captures as much data that it possible can. Data can include -cluster/pod logs, prometheus metrics, test data generated, hive version and osde2e version -to identify any possible flakiness in the environment. - -Each time tests are executed a JUnit XML file will be generated to capture all the tests -that ran and statistics about them (e.g. pass/fail, duration). These XML files will be later -used by external applications to present metrics and data for others to see into. An example of -this is they are used to present data in [TestGrid Dashboards][TestGrid Dashboard]. - -## Slack Notifications +### 7. Slack Notifications OSDe2e can send AI-powered failure analysis to Slack when tests fail. Each test suite can notify a different Slack channel with failure details, analysis, and logs. -### Setup - -**1. Add Workflow to Your Slack Channel** - -Each team adds the shared E2E Test Notifications workflow to their channel: +#### Setup -1. Open the workflow: https://slack.com/shortcuts/Ft09RL7M2AMV/60f07b46919da20d103806a8f5bba094 -2. Click **Add to Slack** -3. Select your destination channel -4. Copy the webhook URL (starts with `https://hooks.slack.com/workflows/...`) - -**2. Get Your Channel ID** +**1. Get Your Channel ID** Right-click your channel → **View channel details** → copy the channel ID (starts with `C`, e.g., `C06HQR8HN0L`) -**3. Configure Test Suites** +**2. Configure Test Suites** Set `TEST_SUITES_YAML` with your test images, webhook URLs, and Slack channel IDs: @@ -363,7 +233,7 @@ export TEST_SUITES_YAML=' ' ``` -**4. Enable Notifications** +**3. 
Enable Notifications** Enable Slack notifications in your config: @@ -374,29 +244,88 @@ logAnalysis: enableAnalysis: true ``` -### What You'll Receive +#### What You'll Receive When tests fail, you'll get a threaded Slack message with: 1. **Main message**: Test suite info (what failed) 2. **Reply 1**: AI analysis (why it failed) -3. **Reply 2**: Test failure logs (evidence) +3. **Reply 2**: Links to persisted logs and JUnit results (evidence) 4. **Reply 3**: Cluster details (for debugging) -For implementation details, see [internal/reporter/README.md](internal/reporter/README.md). - -## CI Jobs - -Periodic jobs are run daily validating Managed OpenShift clusters, using -`osde2e`. Check out the [CI Jobs] page to learn more. - -[Config variables]:/docs/Config.md -[configs]:/configs/ -[config package]:/pkg/common/config/config.go -[ocm resources]: https://gitlab.cee.redhat.com/service/ocm-resources/ -[OSDE2E Quay Image]: quay.io/redhat-services-prod/osde2e-cicada-tenant/osde2e -[OpenShift Dedicated]: https://docs.openshift.com/dedicated/welcome/index.html -[OSDE2E Test Harness]: https://github.com/openshift/osde2e-example-test-harness -[OSDE2E ProwGen Job Config]: https://github.com/openshift/release/blob/master/ci-operator/config/openshift/osde2e/openshift-osde2e-main.yaml -[TestGrid Dashboard]: https://testgrid.k8s.io/redhat-openshift-osd -[Writing Tests]:/docs/Writing-Tests.md -[CI Jobs]: /docs/CI-Jobs.md +For implementation details, see [internal/reporter/README.md](internal/reporter/README.md). + +### 8. 
Configuration (Config.md) + +**Environment Variables:** + +**Cluster Related:** +- CLUSTER_ID, OSD_ENV, CLOUD_PROVIDER_ID +- CLOUD_PROVIDER_REGION, CLUSTER_VERSION +- SKIP_DESTROY_CLUSTER, MULTI_AZ + +**ROSA Specific:** +- ROSA_ENV, ROSA_STS, ROSA_REPLICAS + +**Hypershift:** +- Hypershift (boolean for HostedCluster) + +**OCM:** +- OCM_COMPUTE_MACHINE_TYPE, OCM_CCS +- OCM_FLAVOUR, OCM_ADDITIONAL_LABELS + +**Upgrade:** +- UPGRADE_TO_LATEST, UPGRADE_TO_LATEST_Z, UPGRADE_TO_LATEST_Y +- UPGRADE_RELEASE_NAME, UPGRADE_IMAGE + +**Test Execution:** +- GINKGO_SKIP, GINKGO_FOCUS (only for monorepo tests) +- ADDON_IDS_AT_CREATION, ADDON_IDS + +**Command Line Flags:** +- `--cluster-id`: Test an existing cluster +- `--configs`: Comma-separated built-in configs +- `--skip-destroy-cluster`: Retain cluster after test +- `--skip-health-check`: Skip health checks +- `--skip-tests`: Skip matching tests + +**Config Values:** +- Environments: int, stage, prod +- Providers: aws, gcp, ocm, rosa +- Test Suites: e2e-suite, informing-suite, openshift-suite +- Special: dry-run, skip-health-checks, upgrade-to-latest + +## Workflow Summary + +### Typical Test Flow +1. **Setup**: Configure environment variables or use built-in configs +2. **Cluster Provisioning**: Create new or use existing cluster +3. **Health Checks**: Validate cluster health (optional) +4. **Test Execution**: Run Ginkgo test suites via ad-hoc test images +5. **Metrics Collection**: Send results to Prometheus +6. **Upgrade Testing**: Optional cluster upgrade validation +7. 
**Cleanup**: Delete cluster (optional) and collect must-gather + +### Integration Points +- **Prow**: Automated CI/CD testing +- **Testgrid**: Result visualization +- **Prometheus**: Metrics storage and querying +- **OCM**: Cluster lifecycle management +- **TRT**: OpenShift release gating + +## Key Resources + +**Documentation:** +- Ad-hoc Test Image Example: github.com/openshift/osde2e-example-test-harness +- OSDE2E Common: github.com/openshift/osde2e-common +- Release Repo: github.com/openshift/release +- E2E Framework: github.com/kubernetes-sigs/e2e-framework + +**Dashboards:** +- Jenkins jobs: https://ci.int.devshift.net/view/osde2e/ +- Progressive delivery rollouts: https://inscope.corp.redhat.com/catalog +- SD CICD TestGrid: testgrid.k8s.io/redhat-openshift-osd +- Prow: prow.ci.openshift.org +- Prometheus: prometheus.app-sre-prod-01.devshift.net + +**Communication:** +- Slack: #hcm-delivery \ No newline at end of file
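
The local-run examples in the new README depend on several exported variables (`OCM_CLIENT_ID`, `OCM_CLIENT_SECRET`, and the AWS credentials). Below is a minimal sketch of a pre-flight check for them, assuming bash. `missing_vars` is a hypothetical helper for illustration, not part of osde2e.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not provided by osde2e): prints the names of any
# listed environment variables that are unset or empty, and returns
# non-zero if at least one is missing.
missing_vars() {
  local name
  local missing=()
  for name in "$@"; do
    # ${!name:-} is bash indirect expansion: the value of the variable
    # whose name is stored in $name, or empty if it is unset.
    if [ -z "${!name:-}" ]; then
      missing+=("$name")
    fi
  done
  if [ "${#missing[@]}" -gt 0 ]; then
    # Names are joined with a single space (first character of IFS).
    echo "${missing[*]}"
    return 1
  fi
  return 0
}
```

One possible use is to guard the run with `missing_vars OCM_CLIENT_ID OCM_CLIENT_SECRET AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_REGION` and invoke `./osde2e test` only when it returns success. Which variables are actually required depends on the provider and configs in use.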