feat(exposure): CloudFront mode for domain-less deployments#680

Draft
allamand wants to merge 3 commits into feature/agent-platform from feature/cloudfront-exposure

@allamand
Contributor

Summary

Implements conditional ingress rendering based on exposure.mode config:

  • domain (default): HTTPS:443, host-based routing, ACM cert required
  • cloudfront: HTTP:80, no host, CloudFront terminates TLS — no custom domain needed
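
As a rough illustration of the two modes (this is not the Helm template logic in this PR), the mapping from exposure.mode to ingress settings could look like the sketch below. The function name `ingress_settings` and the returned dictionary shape are hypothetical; `alb.ingress.kubernetes.io/listen-ports` is the AWS Load Balancer Controller annotation for listener ports.

```python
import json
from typing import Optional


def ingress_settings(mode: str = "domain", host: Optional[str] = None) -> dict:
    """Hypothetical helper: pick ALB listener and rule host for an exposure mode."""
    if mode == "domain":
        # Host-based routing over HTTPS; an ACM cert for the host must exist.
        if not host:
            raise ValueError("domain mode requires a host")
        listen_ports = [{"HTTPS": 443}]
        rule_host = host
    elif mode == "cloudfront":
        # Plain HTTP to the ALB, no host on the rule; CloudFront terminates TLS.
        listen_ports = [{"HTTP": 80}]
        rule_host = None
    else:
        raise ValueError(f"unknown exposure mode: {mode!r}")

    return {
        "annotations": {
            # The annotation value is a JSON-encoded list of listener ports.
            "alb.ingress.kubernetes.io/listen-ports": json.dumps(listen_ports),
        },
        "host": rule_host,
    }


if __name__ == "__main__":
    print(ingress_settings("cloudfront"))
    print(ingress_settings("domain", host="hub.example.com"))
```

The host `hub.example.com` is a placeholder. Rejecting unknown modes keeps a typo in config.yaml from silently falling back to one of the two behaviours.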

Problem

In Workshop Studio environments there is no custom domain or Route53 hosted zone. The ALB controller fails with "no certificate found for host" because the ingresses specify HTTPS with a host that has no ACM certificate.

Changes

  • Add exposure.mode to config.yaml schema and template
  • Update ingress templates (keycloak, argo-workflows, grafana, jupyterhub, kubeflow) with conditional HTTP/HTTPS rendering
  • Pass exposure_mode annotation through registry valuesObject to addon charts
  • Add hub:cloudfront Taskfile task (creates ALB + CloudFront distribution, updates Secrets Manager)
  • Update fleet-secret chart to propagate exposure_mode annotation

Testing

Verified on live hub cluster:

  • ALB provisioned successfully with HTTP:80 listeners
  • Keycloak responding at /keycloak via ALB
  • All ingresses sharing the platform group without cert errors

Closes #677

@allamand
Contributor Author

Companion issue for the external agent-platform repo charts (agent-gateway, langfuse ingresses): aws-samples/sample-agent-platform-on-eks#11

@allamand force-pushed the feature/cloudfront-exposure branch from 446ae43 to d3d7ed9 (May 20, 2026 21:04)
@allamand marked this pull request as draft (May 20, 2026 21:09)
@allamand force-pushed the feature/cloudfront-exposure branch from d3d7ed9 to 7d0d5ce (May 20, 2026 21:11)
Implements conditional ingress rendering:
- exposure.mode: 'domain' (default) — HTTPS:443, host-based routing, TLS
- exposure.mode: 'cloudfront' — HTTP:80, no host, CloudFront terminates TLS

Changes:
- Add exposure.mode to config schema and template
- Update ingress templates (keycloak, argo-workflows, grafana, jupyterhub, kubeflow)
- Pass exposure_mode annotation through registry valuesObject
- Add hub:cloudfront Taskfile task (creates ALB + CloudFront distribution)
- Update fleet-secret chart to propagate exposure_mode annotation

Closes #677
@allamand force-pushed the feature/cloudfront-exposure branch from 7d0d5ce to e9bd3b3 (May 20, 2026 21:34)
@shapirov103 requested a review from hmuthusamy (May 20, 2026 21:42)
@hmuthusamy
Collaborator

Review: feature/cloudfront-exposure

The overall approach is solid: pre-creating the ALB, putting CloudFront in front of it, and then using the CloudFront domain is the correct pattern. A few gaps need addressing before this will work reliably with Keycloak and SSE (Agent Gateway MCP):

Issues

| # | Issue | Detail | Fix |
|---|-------|--------|-----|
| 1 | No OriginReadTimeout set | CloudFront config JSON doesn't specify OriginReadTimeout, so it defaults to 30s. The Terraform reference (platform/infra/terraform/common/cloudfront.tf) uses 60s. Agent Gateway SSE requires the origin to send data within this window or CloudFront drops the connection. | Add "OriginReadTimeout": 60 to CustomOriginConfig in the distribution JSON |
| 2 | No OriginKeepaliveTimeout set | Defaults to 5s. Terraform reference uses 30s. Short keepalive means CloudFront opens new TCP connections frequently, adding latency. | Add "OriginKeepaliveTimeout": 30 to CustomOriginConfig |
| 3 | Missing X-Forwarded-Proto / X-Forwarded-Port custom headers | Terraform adds X-Forwarded-Proto: https and X-Forwarded-Port: 443 as custom origin headers. Without these, Keycloak generates redirect URIs with http:// instead of https:// (it sees the ALB connection as HTTP). | Add CustomHeaders to the origin config with these two headers |
| 4 | No separate cache behavior for /keycloak/* | Terraform has an ordered_cache_behavior for Keycloak with TTL=0 and all headers/cookies forwarded. The branch uses a single default behavior. Keycloak requires all cookies and headers for session management. The AllViewer origin request policy should cover this, but explicit TTL=0 prevents stale auth responses. | Add an ordered_cache_behavior for /keycloak/* with MinTTL=0, DefaultTTL=0, MaxTTL=0 |
| 5 | No destroy cleanup for CloudFront/ALB | The destroy task doesn't delete the CloudFront distribution, the pre-created ALB, or the dedicated security group. These will be orphaned on teardown. | Add CloudFront disable+delete, ALB delete, and SG delete to the destroy task (CloudFront requires disabling first, then waiting, then deleting) |
| 6 | CloudFront deployment propagation delay | CloudFront distributions take 5-15 minutes to deploy. wait_for_deployment = false in Terraform skips this, but the Taskfile should either wait or warn that the domain won't be reachable immediately. | Add a wait loop or print a warning after hub:cloudfront |
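
For items 1 to 3, a minimal sketch of what the origin entry in the distribution JSON could look like, assuming the standard CloudFront DistributionConfig field names. The helper name `custom_origin`, the origin ID `hub-alb`, and the ALB DNS name are hypothetical; the timeout and header values are the ones cited above from the Terraform reference. This is only the origin fragment, not a complete distribution config, and it assumes the ALB listens on HTTP:80 as in cloudfront mode.

```python
import json


def custom_origin(alb_dns_name: str, origin_id: str = "hub-alb") -> dict:
    """Hypothetical helper: build one Origins.Items entry for the pre-created ALB."""
    # Custom headers CloudFront adds to every request it sends to the origin.
    headers = [
        {"HeaderName": "X-Forwarded-Proto", "HeaderValue": "https"},
        {"HeaderName": "X-Forwarded-Port", "HeaderValue": "443"},
    ]
    return {
        "Id": origin_id,
        "DomainName": alb_dns_name,
        "CustomHeaders": {"Quantity": len(headers), "Items": headers},
        "CustomOriginConfig": {
            "HTTPPort": 80,
            "HTTPSPort": 443,
            # CloudFront terminates TLS; the ALB is reached over plain HTTP.
            "OriginProtocolPolicy": "http-only",
            # Seconds CloudFront waits for origin data (item 1).
            "OriginReadTimeout": 60,
            # Seconds an idle origin connection is kept open (item 2).
            "OriginKeepaliveTimeout": 30,
        },
    }


if __name__ == "__main__":
    # Placeholder DNS name; the real value would come from the created ALB.
    origin = custom_origin("example-alb.us-east-1.elb.amazonaws.com")
    print(json.dumps({"Origins": {"Quantity": 1, "Items": [origin]}}, indent=2))
```

Deriving Quantity from the header list keeps the two in sync, since CloudFront rejects a config where Quantity does not match the number of Items.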
