Commit 2f479cc

merge: bring feat-microsoft-provider-v2 up to main
2 parents abfb900 + f061b1d commit 2f479cc

202 files changed

Lines changed: 26279 additions & 7106 deletions

.agents/skills/build-from-issue/SKILL.md

Lines changed: 9 additions & 1 deletion
@@ -148,7 +148,8 @@ In the prompt, instruct the reviewer to:
 - **Medium**: Multiple files/components, some design decisions, but well-scoped
 - **High**: Cross-cutting changes, architectural decisions needed, significant unknowns
 8. Call out risks, unknowns, and decisions that need stakeholder input.
-9. Assess **LSM compatibility** — if the change touches process identity, `/proc` filesystem access, binary execution, or inter-process visibility, flag whether it will behave differently on hosts running SELinux (enforcing) or AppArmor. In particular, tests that fork+exec into system binaries will fail on SELinux-enforcing hosts due to cross-label `/proc/<pid>/exe` access restrictions.
+9. Assess **gateway config documentation impact** — if the change adds, removes, renames, or changes defaults for gateway TOML keys or driver-specific config options, the plan must include an update to `docs/reference/gateway-config.mdx`. If the change is surfaced through Helm or a compute-driver overview, also include `docs/reference/sandbox-compute-drivers.mdx` or the relevant deployment docs.
+10. Assess **LSM compatibility** — if the change touches process identity, `/proc` filesystem access, binary execution, or inter-process visibility, flag whether it will behave differently on hosts running SELinux (enforcing) or AppArmor. In particular, tests that fork+exec into system binaries will fail on SELinux-enforcing hosts due to cross-label `/proc/<pid>/exe` access restrictions.

 ### A2: Post the Plan Comment

@@ -436,6 +437,13 @@ Review the documentation requirements in `AGENTS.md` and update any affected
 docs as part of the implementation. Keep documentation changes scoped to the
 behavior or subsystem that changed.

+If the implementation changes gateway TOML parsing, `[openshell.gateway]`
+fields, `[openshell.drivers.<name>]` fields, driver config defaults, or Helm
+rendering of `gateway.toml`, update `docs/reference/gateway-config.mdx` in the
+same branch. If the change affects user-facing compute-driver setup, also
+update `docs/reference/sandbox-compute-drivers.mdx` or the relevant deployment
+page.
+
 ### Step 12: Commit and Push

 Commit all changes using conventional commit format. The `<type>` comes from the issue type in the plan:

.agents/skills/create-github-pr/SKILL.md

Lines changed: 9 additions & 0 deletions
@@ -15,6 +15,15 @@ Create pull requests on GitHub using the `gh` CLI.

 ## Before Creating a PR

+### Check Config Documentation
+
+If the branch changes gateway TOML parsing, `[openshell.gateway]` fields,
+`[openshell.drivers.<name>]` fields, driver config defaults, or Helm rendering
+of `gateway.toml`, verify that `docs/reference/gateway-config.mdx` is updated
+in the same branch. If the change affects user-facing compute-driver setup,
+also update `docs/reference/sandbox-compute-drivers.mdx` or the relevant
+deployment docs.
+
 ### Run Pre-commit Checks

 Run the local pre-commit task before opening a PR:

.agents/skills/create-spike/SKILL.md

Lines changed: 4 additions & 2 deletions
@@ -91,9 +91,11 @@ The prompt to the reviewer **must** instruct it to:

 9. **Check architecture docs** in the `architecture/` directory for relevant documentation about the affected subsystems.

-10. **Assess Linux Security Module (LSM) impact.** If the change involves process identity, `/proc` filesystem access, file labeling, binary execution, or inter-process visibility, call out whether it will behave differently on hosts running SELinux (enforcing) or AppArmor. For example: reading `/proc/<pid>/exe` across an SELinux domain boundary returns ENOENT, not EACCES. Tests that fork+exec into system binaries (different SELinux label) will fail on enforcing hosts. Flag any LSM-sensitive code paths and recommend mitigations.
+10. **Assess gateway config documentation impact.** If the change would add, remove, rename, or change defaults for gateway TOML keys or driver-specific config options, call out that `docs/reference/gateway-config.mdx` must be updated. If the change is surfaced through Helm or compute-driver setup docs, call out the relevant deployment or compute-driver docs too.

-11. **Determine the issue type:** `feat`, `fix`, `refactor`, `chore`, `perf`, or `docs`.
+11. **Assess Linux Security Module (LSM) impact.** If the change involves process identity, `/proc` filesystem access, file labeling, binary execution, or inter-process visibility, call out whether it will behave differently on hosts running SELinux (enforcing) or AppArmor. For example: reading `/proc/<pid>/exe` across an SELinux domain boundary returns ENOENT, not EACCES. Tests that fork+exec into system binaries (different SELinux label) will fail on enforcing hosts. Flag any LSM-sensitive code paths and recommend mitigations.
+
+12. **Determine the issue type:** `feat`, `fix`, `refactor`, `chore`, `perf`, or `docs`.

 ### What makes a good investigation prompt

.agents/skills/debug-openshell-cluster/SKILL.md

Lines changed: 9 additions & 0 deletions
@@ -138,6 +138,15 @@ kubectl -n openshell rollout status statefulset/openshell

 Look for failed installs, unexpected values, missing namespace, wrong image tag, TLS settings that do not match the registered endpoint, and scheduling failures.

+For HA or PostgreSQL-backed installs, also check the service-binding Secret and
+bundled PostgreSQL workload:
+
+```bash
+kubectl -n openshell get secret -l app.kubernetes.io/instance=openshell
+kubectl -n openshell get statefulset,pod,pvc -l app.kubernetes.io/instance=openshell
+kubectl -n openshell logs statefulset/openshell-postgres --tail=200
+```
+
 Check required Helm deployment secrets:

 ```bash

.agents/skills/helm-dev-environment/SKILL.md

Lines changed: 6 additions & 1 deletion
@@ -1,6 +1,6 @@
 ---
 name: helm-dev-environment
-description: Start up, tear down, and configure the local Kubernetes development environment for OpenShell. Uses k3d (Docker-backed k3s) + Skaffold + Helm. Covers cluster lifecycle, optional add-ons (Keycloak OIDC, Envoy Gateway), and port mappings. Trigger keywords - local k8s, local cluster, k3d, skaffold, helm dev, start cluster, stop cluster, tear down cluster, delete cluster, create cluster, helm:k3s, helm:skaffold, local dev environment, dev cluster, k8s dev, envoy gateway local, keycloak local.
+description: Start up, tear down, and configure the local Kubernetes development environment for OpenShell. Uses k3d (Docker-backed k3s) + Skaffold + Helm. Covers cluster lifecycle, optional add-ons (Keycloak OIDC, Envoy Gateway), HA testing, and port mappings. Trigger keywords - local k8s, local cluster, k3d, skaffold, helm dev, start cluster, stop cluster, tear down cluster, delete cluster, create cluster, helm:k3s, helm:skaffold, local dev environment, dev cluster, k8s dev, envoy gateway local, keycloak local, high availability, HA.
 ---

 # Helm Dev Environment
@@ -65,6 +65,10 @@ generates mTLS secrets on first install. Envoy Gateway opt-in; see the Optional

 The gateway Service uses ClusterIP. Access is via Envoy Gateway (port `8080`) or `kubectl port-forward`.

+**HA test deploy** (two gateway replicas + bundled PostgreSQL): uncomment
+`#- ci/values-high-availability.yaml` in `deploy/helm/openshell/skaffold.yaml`,
+then run `mise run helm:skaffold:run` or `mise run helm:skaffold:dev`.
+
 ### TLS behaviour

 `ci/values-skaffold.yaml` sets `server.disableTls: true`, so Skaffold-based deploys run
@@ -198,6 +202,7 @@ mise run helm:k3s:status
 | `deploy/helm/openshell/ci/values-skaffold.yaml` | Dev overrides (image pull policy, TLS disabled for local Skaffold) |
 | `deploy/helm/openshell/ci/values-cert-manager.yaml` | cert-manager PKI overlay (opt-in; disables pkiInitJob) |
 | `deploy/helm/openshell/ci/values-gateway.yaml` | Envoy Gateway GRPCRoute + Gateway overlay |
+| `deploy/helm/openshell/ci/values-high-availability.yaml` | HA test overlay (`replicaCount: 2` with bundled PostgreSQL) |
 | `deploy/helm/openshell/ci/values-keycloak.yaml` | Keycloak OIDC overlay |
 | `deploy/helm/openshell/ci/values-tls-disabled.yaml` | Lint-only: TLS + auth disabled (reverse-proxy edge termination) |
 | `deploy/kube/manifests/envoy-gateway-openshell.yaml` | GatewayClass for Envoy Gateway (`mise run helm:gateway:apply`) |

.agents/skills/update-docs/SKILL.md

Lines changed: 5 additions & 1 deletion
@@ -34,7 +34,7 @@ git log -50 --oneline --no-merges
 Filter to commits that are likely to affect docs. Look for these signals:

 1. **Commit type**: `feat`, `fix`, `refactor`, `perf` commits often change behavior. `docs` commits are already doc changes. `chore`, `ci`, `test` commits rarely need doc updates.
-2. **Files changed**: Changes to `crates/openshell-cli/`, `python/`, `proto/`, `deploy/`, or policy-related code are high-signal.
+2. **Files changed**: Changes to `crates/openshell-cli/`, `python/`, `proto/`, `deploy/`, gateway config parsing, driver config structs, or policy-related code are high-signal.
 3. **Ignore**: Changes limited to `tests/`, `e2e/`, `.github/`, `tasks/`, or internal-only modules.

 ```bash
@@ -52,6 +52,10 @@ For each relevant commit, determine which doc page(s) it affects. Use this mappi
 | `crates/openshell-cli/` (sandbox commands) | `docs/sandboxes/manage-sandboxes.mdx` |
 | `crates/openshell-cli/` (provider commands) | `docs/sandboxes/manage-providers.mdx` |
 | `crates/openshell-cli/` (new top-level command) | May need a new page or `docs/reference/` entry |
+| `crates/openshell-server/src/config_file.rs` or gateway TOML parsing | `docs/reference/gateway-config.mdx` |
+| `crates/openshell-server/src/cli.rs` gateway config merge/default behavior | `docs/reference/gateway-config.mdx` |
+| `crates/openshell-driver-*/` config structs or driver defaults | `docs/reference/gateway-config.mdx`, `docs/reference/sandbox-compute-drivers.mdx` |
+| `deploy/helm/openshell/templates/gateway-config.yaml` | `docs/reference/gateway-config.mdx`, `docs/reference/sandbox-compute-drivers.mdx`, Helm docs if values change |
 | Proxy or policy code | `docs/sandboxes/policies.mdx`, `docs/reference/policy-schema.mdx` |
 | Inference code | `docs/inference/configure.mdx` |
 | `python/` (SDK changes) | `docs/reference/` or `docs/get-started/quickstart.mdx` |
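
The four rows added to this mapping could be checked mechanically. The sketch below is only an illustration and is not part of the commit: `docs_for_path` is a hypothetical helper, and it over-approximates the table by flagging any change to `cli.rs` or a driver crate, whereas the table only concerns config merge/default behavior and config structs. It also leaves out the "Helm docs if values change" case.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the repository): print the doc pages that the
# new mapping rows associate with a changed path, one page per line.
docs_for_path() {
  case "$1" in
    crates/openshell-server/src/config_file.rs|crates/openshell-server/src/cli.rs)
      echo "docs/reference/gateway-config.mdx"
      ;;
    crates/openshell-driver-*/*|deploy/helm/openshell/templates/gateway-config.yaml)
      echo "docs/reference/gateway-config.mdx"
      echo "docs/reference/sandbox-compute-drivers.mdx"
      ;;
    *)
      return 1   # path has no config-doc mapping in this sketch
      ;;
  esac
}

# Possible use (assumes the branch is compared against main):
# git diff --name-only main...HEAD | while read -r path; do
#   docs_for_path "$path" || true
# done | sort -u
```

A path that matches none of the patterns returns a non-zero status, so callers can tell "no docs affected" apart from an empty result.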

.github/actions/release-helm-oci/action.yml

Lines changed: 8 additions & 0 deletions
@@ -71,6 +71,14 @@ runs:
 exit 1
 fi

+- name: Build chart dependencies
+env:
+CHART_DIR: ${{ steps.prep.outputs.chart_dir }}
+shell: bash
+run: |
+set -euo pipefail
+helm dependency build "${CHART_DIR}"
+
 - name: Package Helm chart
 env:
 CHART_DIR: ${{ steps.prep.outputs.chart_dir }}

.github/workflows/branch-e2e.yml

Lines changed: 45 additions & 2 deletions
@@ -23,6 +23,7 @@ jobs:
 should_run: ${{ steps.gate.outputs.should_run }}
 run_core_e2e: ${{ steps.labels.outputs.run_core_e2e }}
 run_gpu_e2e: ${{ steps.labels.outputs.run_gpu_e2e }}
+run_kubernetes_ha_e2e: ${{ steps.labels.outputs.run_kubernetes_ha_e2e }}
 run_any_e2e: ${{ steps.labels.outputs.run_any_e2e }}
 steps:
 - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
@@ -39,24 +40,27 @@ jobs:
 if [ "$EVENT_NAME" != "push" ]; then
 run_core_e2e=true
 run_gpu_e2e=true
+run_kubernetes_ha_e2e=true
 else
 run_core_e2e="$(jq -r 'index("test:e2e") != null' <<< "$LABELS_JSON")"
 run_gpu_e2e="$(jq -r 'index("test:e2e-gpu") != null' <<< "$LABELS_JSON")"
+run_kubernetes_ha_e2e="$(jq -r 'index("test:e2e-kubernetes") != null' <<< "$LABELS_JSON")"
 fi
-if [ "$run_core_e2e" = "true" ] || [ "$run_gpu_e2e" = "true" ]; then
+if [ "$run_core_e2e" = "true" ] || [ "$run_gpu_e2e" = "true" ] || [ "$run_kubernetes_ha_e2e" = "true" ]; then
 run_any_e2e=true
 else
 run_any_e2e=false
 fi
 {
 echo "run_core_e2e=$run_core_e2e"
 echo "run_gpu_e2e=$run_gpu_e2e"
+echo "run_kubernetes_ha_e2e=$run_kubernetes_ha_e2e"
 echo "run_any_e2e=$run_any_e2e"
 } >> "$GITHUB_OUTPUT"

 build-gateway:
 needs: [pr_metadata]
-if: needs.pr_metadata.outputs.should_run == 'true' && needs.pr_metadata.outputs.run_core_e2e == 'true'
+if: needs.pr_metadata.outputs.should_run == 'true' && (needs.pr_metadata.outputs.run_core_e2e == 'true' || needs.pr_metadata.outputs.run_kubernetes_ha_e2e == 'true')
 permissions:
 contents: read
 packages: write
@@ -107,6 +111,18 @@ jobs:
 with:
 image-tag: ${{ github.sha }}

+kubernetes-ha-e2e:
+needs: [pr_metadata, build-gateway, build-supervisor]
+if: needs.pr_metadata.outputs.should_run == 'true' && needs.pr_metadata.outputs.run_kubernetes_ha_e2e == 'true'
+permissions:
+contents: read
+packages: read
+uses: ./.github/workflows/e2e-kubernetes-test.yml
+with:
+image-tag: ${{ github.sha }}
+job-name: Kubernetes HA E2E (Rust smoke)
+extra-helm-values: deploy/helm/openshell/ci/values-high-availability.yaml
+
 core-e2e-result:
 name: Core E2E result
 needs: [pr_metadata, build-gateway, build-supervisor, e2e, kubernetes-e2e]
@@ -160,3 +176,30 @@ jobs:
 fi
 done
 exit "$failed"
+
+kubernetes-ha-e2e-result:
+name: Kubernetes HA E2E result
+needs: [pr_metadata, build-gateway, build-supervisor, kubernetes-ha-e2e]
+if: always() && needs.pr_metadata.outputs.should_run == 'true' && needs.pr_metadata.outputs.run_kubernetes_ha_e2e == 'true'
+runs-on: ubuntu-latest
+steps:
+- name: Verify Kubernetes HA E2E jobs
+env:
+BUILD_GATEWAY_RESULT: ${{ needs.build-gateway.result }}
+BUILD_SUPERVISOR_RESULT: ${{ needs.build-supervisor.result }}
+KUBERNETES_HA_E2E_RESULT: ${{ needs.kubernetes-ha-e2e.result }}
+run: |
+set -euo pipefail
+failed=0
+for item in \
+"build-gateway:$BUILD_GATEWAY_RESULT" \
+"build-supervisor:$BUILD_SUPERVISOR_RESULT" \
+"kubernetes-ha-e2e:$KUBERNETES_HA_E2E_RESULT"; do
+name="${item%%:*}"
+result="${item#*:}"
+if [ "$result" != "success" ]; then
+echo "::error::$name concluded $result"
+failed=1
+fi
+done
+exit "$failed"
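
The `kubernetes-ha-e2e-result` job runs under `always()`, so it still executes when an upstream job fails, is cancelled, or is skipped, and it treats any result other than `success` as a failure. The following is a standalone sketch of the same loop for trying it locally. The function name `verify_results` and the sample values are illustrative assumptions; in the workflow the values come from `needs.<job>.result`.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Return 0 only if every "name:result" argument reports "success".
verify_results() {
  local failed=0 item name result
  for item in "$@"; do
    name="${item%%:*}"    # text before the first colon
    result="${item#*:}"   # text after the first colon
    if [ "$result" != "success" ]; then
      echo "::error::$name concluded $result"
      failed=1
    fi
  done
  return "$failed"
}

# Sample values are made up for illustration.
if verify_results "build-gateway:success" "kubernetes-ha-e2e:skipped"; then
  echo "all jobs succeeded"
else
  echo "at least one job did not succeed"
fi
```

Splitting on the first colon with `%%:*` and `#*:` keeps the pairing in a single string per job, which is why the loop needs no associative array.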

.github/workflows/docker-build.yml

Lines changed: 1 addition & 1 deletion
@@ -162,7 +162,7 @@ jobs:
 cargo-version: ${{ inputs['cargo-version'] }}
 image-tag: ${{ needs.resolve.outputs.image_tag_base }}
 checkout-ref: ${{ inputs['checkout-ref'] }}
-features: openshell-core/dev-settings
+features: ${{ inputs.component == 'gateway' && 'openshell-core/dev-settings bundled-z3' || 'openshell-core/dev-settings' }}
 artifact-name: ${{ needs.resolve.outputs.artifact_prefix }}-linux-${{ matrix.arch }}
 secrets: inherit

.github/workflows/e2e-kubernetes-test.yml

Lines changed: 20 additions & 2 deletions
@@ -17,14 +17,29 @@ on:
 required: false
 type: string
 default: ""
+job-name:
+description: "Display name for the Kubernetes e2e job"
+required: false
+type: string
+default: "Kubernetes E2E (Rust smoke)"
+extra-helm-values:
+description: "Colon-separated Helm values files to layer on the Kubernetes e2e chart install"
+required: false
+type: string
+default: ""
+mise-version:
+description: "mise version to install on the bare Kubernetes e2e runner"
+required: false
+type: string
+default: "v2026.4.25"

 permissions:
 contents: read
 packages: read

 jobs:
 e2e-kubernetes:
-name: Kubernetes E2E (Rust smoke)
+name: ${{ inputs.job-name }}
 # Bare runner: running kind-in-container hits nested-Docker / kubeconfig
 # complications. The runner has Docker; mise installs helm, kubectl, and
 # the Rust toolchain.
@@ -35,6 +50,8 @@ jobs:
 packages: read
 env:
 MISE_GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+# Keep bare-runner installs aligned with the project CI image.
+MISE_VERSION: ${{ inputs.mise-version }}
 KIND_CLUSTER_NAME: kube-e2e-${{ github.run_id }}
 steps:
 - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
@@ -43,7 +60,7 @@ jobs:

 - name: Install mise
 run: |
-curl https://mise.run | sh
+curl https://mise.run | MISE_VERSION=v2026.4.25 sh
 echo "$HOME/.local/bin" >> "$GITHUB_PATH"
 echo "$HOME/.local/share/mise/shims" >> "$GITHUB_PATH"

@@ -93,6 +110,7 @@ jobs:
 - name: Run Kubernetes E2E (Rust smoke)
 env:
 OPENSHELL_E2E_KUBE_CONTEXT: kind-${{ env.KIND_CLUSTER_NAME }}
+OPENSHELL_E2E_KUBE_EXTRA_VALUES: ${{ inputs.extra-helm-values }}
 IMAGE_TAG: ${{ inputs.image-tag }}
 OPENSHELL_REGISTRY: ghcr.io/nvidia/openshell
 run: mise run --no-deps --skip-deps e2e:kubernetes
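
The diff does not show how the `e2e:kubernetes` task consumes `OPENSHELL_E2E_KUBE_EXTRA_VALUES`. As a rough sketch of one way a colon-separated list could be turned into Helm `--values` flags, the hypothetical helper below (`helm_values_args` is not a name from the repository) splits the list and skips empty entries, so the default empty string adds no flags.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "--values" and a file path on alternating lines
# for each non-empty entry in a colon-separated list.
helm_values_args() {
  local list="${1:-}" file
  local -a files
  if [ -z "$list" ]; then
    return 0              # empty input (the workflow default) adds no flags
  fi
  IFS=':' read -r -a files <<< "$list"
  for file in "${files[@]}"; do
    if [ -n "$file" ]; then
      printf '%s\n' "--values" "$file"
    fi
  done
}

# Possible use (release name and install command are assumptions):
# mapfile -t extra_args < <(helm_values_args "${OPENSHELL_E2E_KUBE_EXTRA_VALUES:-}")
# helm upgrade --install openshell deploy/helm/openshell "${extra_args[@]}"
```

Emitting one argument per line keeps paths with spaces intact when they are read back with `mapfile -t`.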
