Commit 9febe89
committed
feat(python-sdk): support OIDC Bearer auth on SandboxClient
PR #1596 hardened the gateway side of the OIDC story; the Python SDK
was the remaining gap — it only supported plaintext or mTLS, with no
Bearer metadata anywhere. Deployments with OIDC enabled (the
recommended posture since PR #935 / PR #1404) were unreachable from
the SDK.
Adds:
- `bearer_token: str | Callable[[], str] | None` kwarg on
`SandboxClient`. Static strings or zero-arg callables (the latter
is invoked per RPC, so callers can drop in a refresh loop or
token-file watcher without reconstructing the client). Composes
with `tls` for OIDC-over-mTLS deployments.
- `_BearerAuthInterceptor` implementing all four
`grpc.{Unary,Stream}{Unary,Stream}ClientInterceptor` types.
Appends `authorization: Bearer <token>` to outgoing metadata.
Implemented as an interceptor (not call credentials) so it works
on both plaintext (`disableTls=true` dev) and TLS channels without
`grpc.composite_channel_credentials`.
- `TlsConfig` ergonomics: all three fields (`ca_path`, `cert_path`,
`key_path`) are now optional with `cert_path` / `key_path`
required-together-or-not-at-all (enforced in `__post_init__`). This
unlocks three transport profiles from one dataclass:
* full mTLS (all three)
* CA-only trust (`ca_path` only)
* system roots (`TlsConfig()` — for OIDC gateways behind a
public CA)
- `from_active_cluster` mirrors `crates/openshell-tui/src/lib.rs`
`build_oidc_channel`:
* For any `https://` gateway, always build a secure channel.
Pick the strongest TLS profile available in `mtls/` (full
mTLS → CA-only → system roots). No more `insecure_channel`
fallback for HTTPS.
* Gate OIDC bearer attachment on
`metadata.json["auth_mode"] == "oidc"`. Matches
`crates/openshell-cli/src/main.rs:132` and the TUI; a stale
`oidc_token.json` next to a non-OIDC gateway no longer causes
the SDK to attach a bearer.
- `_OidcRefresher` — thread-safe, in-process native OAuth2 refresh
modeled on `google.oauth2.credentials.Credentials` and
`botocore.tokens.SSOTokenProvider`. Lazily checks expiry on every
RPC; when stale, re-reads disk first (the CLI may have rotated
the bundle), and only then exchanges the refresh_token against
the IdP's token endpoint discovered via OIDC discovery
(`/.well-known/openid-configuration`, cached after first call).
Concurrent RPCs share a single refresh via `threading.Lock` (no
IdP stampede). Honors refresh-token rotation. Surfaces IdP
failures as `SandboxError` with the RFC 6749 error body included
for diagnostics.
Mirrors the Rust CLI's HTTP-policy posture from
`crates/openshell-cli/src/oidc_auth.rs`:
* `follow_redirects=False` so a 3xx during discovery can't
steer us to an attacker-controlled token endpoint.
* Discovery `issuer` is validated against the configured
issuer; a discovery document claiming a different issuer is
rejected, preventing the SDK from POSTing the refresh_token
to a malicious endpoint.
* `insecure: bool` flag plumbed through to httpx's
`verify=` so self-signed-cert deployments work the same way
they do in the Rust CLI.
Built on `httpx` (chosen over `urllib` specifically for
follow_redirects + verify control as kwargs). The OAuth2
refresh-token grant itself (RFC 6749 §6) is one form-encoded
POST — handled inline rather than via a dedicated OAuth library;
tried `authlib`'s `OAuth2Client` first but it auto-injects an
Authorization header on every request, which breaks the
unauthenticated discovery GET.
- `_make_cluster_bearer_provider(..., auto_refresh=True,
write_back=True, insecure=False)` factory. Defaults to the
refresher path with write-back enabled; `auto_refresh=False`
falls back to the read-only fail-closed behavior for callers that
don't want the SDK to make outbound HTTP calls to the IdP.
`write_back=True` is the default (changed from the first round of
review): IdPs with refresh-token rotation (Keycloak with
rotation, Entra in strict mode) invalidate the old refresh_token
on each refresh, so an in-memory-only refresh would leave the
on-disk bundle pointing at an invalidated value — any second
process starting from disk would `invalid_grant`. With write-back
enabled by default, the SDK keeps the shared cache consistent
with the IdP.
- `from_active_cluster` exposes `auto_refresh`, `write_back`, and
`insecure` kwargs (defaults: True / True / False). The
high-level `Sandbox` context manager surfaces the same three
kwargs and forwards them through, so callers using the wrapper
have parity with `SandboxClient` for OIDC-protected gateways.
- `SandboxClient.close()` chains to a `_bearer_close` hook so the
`_OidcRefresher`'s underlying `httpx.Client` is released
deterministically instead of leaking sockets/FDs until GC runs
`__del__`. Idempotent.
- `_OidcRefresher._write_to_disk` uses `tempfile.mkstemp` (PID +
random suffix) instead of a fixed `.oidc_token.json.tmp` path,
so two writers racing on the same gateway directory don't
trample each other's tmp content. Success path atomically
replaces; failure path unlinks the orphan.
OAuth2 refresh policy and write-back semantics deliberately mirror
what the major Python SDKs do — see
github.com/googleapis/google-auth-library-python (`Credentials`)
and github.com/boto/botocore (`SSOTokenProvider`):
| Library | Native refresh | Writes back |
|-------------------------------|----------------|-------------|
| google-auth Credentials | yes | no |
| botocore SSOTokenProvider | yes | yes |
| openshell SandboxClient (here)| yes (opt-out) | yes (opt-out)|
OpenShell sits between the two; chose write-back-by-default because
the rotation invariant matters more for our deployments than the
"CLI is the only writer" assumption that fits google-auth.
Adds `httpx>=0.27` as a runtime dependency. No new OAuth2 library —
the refresh grant is a single POST.
Tested:
- 44 sandbox_test.py tests pass (5 pre-existing + 39 new across
the bearer interceptor, fail-closed provider, refresher
behavior, TlsConfig validation, from_active_cluster auth ladder,
security-review regressions, Sandbox-wrapper kwarg forwarding,
and lifecycle / concurrency probes).
`mise run test:python` → 49 passed total across the python
suite.
- `mise run python:lint` (ruff) clean.
- End-to-end against a Keycloak-protected gateway on OpenShift:
* unauthenticated `Health` bypass works
* admin + `openshell:all` reaches user-callable methods
* reader (`sandbox:read`) denied on `CreateSandbox` by scope
* admin + `openshell:all` denied on PR #1596 sandbox-only
methods at the router (the new gate is honored from the SDK)
* full provider CRUD lifecycle via the SDK
* callable token provider rotates per RPC as expected
- Regression-probed against three pre-review security findings:
* **Discovery issuer validation** — a discovery document
claiming a different `issuer` than the configured one is
rejected with a clear `SandboxError` before any refresh POST
can reach the attacker-controlled endpoint.
* **Redirect during discovery** — `follow_redirects=False` on
the underlying httpx client means a 3xx during discovery
surfaces as a SandboxError rather than silently chasing the
redirect.
* **Cross-process rotation** — a two-process simulation shows
process B starting from disk and successfully refreshing
with the rotated refresh_token, because process A's
write-back updated the shared cache.
- Refresher unit tests cover: cached-fresh fast path, disk-rotated
re-read before refresh, OAuth2 exchange against the discovered
token endpoint, refresh-token rotation, atomic write-back at
0600 mode (default), default-on write_back proven by test,
concurrent N-thread coordination (one refresh shared across 8
threads), IdP failure surfaced with the error body, the
client_credentials / no-refresh_token error path, issuer-
mismatch rejection, redirect-during-discovery rejection,
insecure flag plumbing.
- Lifecycle / concurrency regression tests added: `close()`
invokes the `_bearer_close` hook (idempotent), the refresher's
`httpx.Client` is marked closed after `SandboxClient.close()`,
and 16 concurrent writers don't leave orphan tmp files behind
while producing a valid final bundle. The `Sandbox` wrapper has
direct forwarding tests proving `auto_refresh`, `write_back`,
and `insecure` reach `from_active_cluster` (both explicit
values and defaults).
- End-to-end against a real OpenShift + Keycloak cluster from
inside a pod: real OIDC discovery against
`keycloak.keycloak.svc.cluster.local:8080`, refresh-token grant
POST, atomic write-back of the rotated bundle at 0600, and a
follow-up RPC reusing the freshly-rotated in-memory token —
full round-trip in ~170ms.
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>1 parent 6c7950d commit 9febe89
4 files changed
Lines changed: 1826 additions & 30 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
41 | 41 | | |
42 | 42 | | |
43 | 43 | | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
44 | 63 | | |
45 | 64 | | |
46 | 65 | | |
| |||
53 | 72 | | |
54 | 73 | | |
55 | 74 | | |
| 75 | + | |
| 76 | + | |
| 77 | + | |
| 78 | + | |
56 | 79 | | |
57 | 80 | | |
58 | 81 | | |
| |||
0 commit comments