Context
The Constellation has an implicit tiering (Tier 1 production, Tier 2 early-stage, Tier 3 skeleton) but no formal maturity model. There's no way to enforce that a Tier 1 service has tracing, alerts, a runbook, and load tests — or to prevent a Tier 3 service from being promoted to Tier 1 without meeting requirements.
Adapted from Google's Production Readiness Review, Uber's production standards, and Mercari's open-source checklist.
Requirements
service.yaml specification
name: identity
tier: 1  # 1=critical-path, 2=important, 3=internal/experimental
owner: platform-team
dependencies:
  - postgres
  - redis
  - nats-jetstream
slo:
  availability: 99.9
  latency_p99_ms: 200
readiness:
  has_tracing: true
  has_alerts: true
  has_runbook: true
  has_load_test: false
  has_integration_tests: true
  has_openapi_spec: false
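As a rough illustration, the fields in the example could be checked for shape before any tier rules are applied. The sketch below is an assumption about how such a check might look, not part of the specification: the name validate_shape, the accepted value ranges, the rule that all six readiness flags must be present, and the choice to pass in an already-parsed mapping (for example, the result of a YAML loader) are all hypothetical.

```python
from numbers import Real

# The six readiness flags that appear in the example service.yaml.
READINESS_FLAGS = (
    "has_tracing", "has_alerts", "has_runbook",
    "has_load_test", "has_integration_tests", "has_openapi_spec",
)


def _is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, Real) and not isinstance(value, bool)


def validate_shape(spec: dict) -> list:
    """Return a list of problems; an empty list means the shape looks valid."""
    problems = []

    for key in ("name", "owner"):
        value = spec.get(key)
        if not isinstance(value, str) or not value:
            problems.append(f"{key}: must be a non-empty string")

    tier = spec.get("tier")
    if type(tier) is not int or tier not in (1, 2, 3):
        problems.append("tier: must be 1, 2, or 3")

    deps = spec.get("dependencies", [])
    if not isinstance(deps, list) or not all(isinstance(d, str) for d in deps):
        problems.append("dependencies: must be a list of strings")

    slo = spec.get("slo")
    if slo is not None:  # whether an SLO is required is a tier rule, not a shape rule
        if not isinstance(slo, dict):
            problems.append("slo: must be a mapping")
        else:
            avail = slo.get("availability")
            if not _is_number(avail) or not 0 < avail <= 100:
                problems.append("slo.availability: must be a percentage in (0, 100]")
            latency = slo.get("latency_p99_ms")
            if not _is_number(latency) or latency <= 0:
                problems.append("slo.latency_p99_ms: must be a positive number")

    readiness = spec.get("readiness")
    if not isinstance(readiness, dict):
        problems.append("readiness: must be a mapping")
    else:
        for flag in READINESS_FLAGS:
            # Assumption: every flag must be present and boolean.
            if not isinstance(readiness.get(flag), bool):
                problems.append(f"readiness.{flag}: must be true or false")

    return problems


# The example above, as it would look after YAML parsing.
EXAMPLE = {
    "name": "identity",
    "tier": 1,
    "owner": "platform-team",
    "dependencies": ["postgres", "redis", "nats-jetstream"],
    "slo": {"availability": 99.9, "latency_p99_ms": 200},
    "readiness": {
        "has_tracing": True, "has_alerts": True, "has_runbook": True,
        "has_load_test": False, "has_integration_tests": True,
        "has_openapi_spec": False,
    },
}

print(validate_shape(EXAMPLE))                 # []
print(validate_shape({**EXAMPLE, "tier": 4}))  # ['tier: must be 1, 2, or 3']
```

Keeping the shape check separate from the tier rules means a typo such as a misspelled flag or a quoted "true" is reported as a schema problem rather than as a missing readiness requirement.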
The service.yaml schema above lives in every repo root.
Tier requirements
- Tier 1: [...] true. SLO must be defined. Breaking change detection in CI.
- Tier 2: has_tracing, has_alerts, has_integration_tests must be true. SLO defined.
CI enforcement
- [...] service.yaml and validates tier-appropriate requirements
Rollout
- [...] service.yaml to template-go-service
- [...] service.yaml for all existing repos (start with accurate current state, not aspirational)
Why this matters
Without a formal model, Tier 3 skeleton services quietly become production dependencies without acquiring the operational maturity (tracing, alerts, runbooks) that production demands. This is how incidents happen.
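The tier requirements and CI enforcement described above could be expressed as a small check that a CI job runs against each repo's service.yaml. This is only a sketch under stated assumptions: it reads Tier 1 as requiring every readiness flag to be true, treats Tier 3 as having no readiness requirements because none are stated here, and leaves out breaking change detection because that is a CI step rather than a field. The names REQUIRED_FLAGS, check_tier, and exit_code are hypothetical, and the input is assumed to be an already-parsed mapping.

```python
ALL_FLAGS = (
    "has_tracing", "has_alerts", "has_runbook",
    "has_load_test", "has_integration_tests", "has_openapi_spec",
)

# Readiness flags that must be true for each tier.
REQUIRED_FLAGS = {
    1: ALL_FLAGS,  # assumption: Tier 1 requires every flag
    2: ("has_tracing", "has_alerts", "has_integration_tests"),
    3: (),         # assumption: nothing is stated for Tier 3
}
SLO_REQUIRED_TIERS = {1, 2}


def check_tier(spec: dict) -> list:
    """Return the tier requirements that the declared tier does not meet."""
    tier = spec["tier"]
    readiness = spec.get("readiness", {})
    violations = [
        f"tier {tier} requires readiness.{flag} to be true"
        for flag in REQUIRED_FLAGS[tier]
        if readiness.get(flag) is not True
    ]
    if tier in SLO_REQUIRED_TIERS and not spec.get("slo"):
        violations.append(f"tier {tier} requires an slo block")
    return violations


def exit_code(spec: dict) -> int:
    """Print violations and return a process exit code for a CI step."""
    violations = check_tier(spec)
    for violation in violations:
        print(f"service.yaml: {violation}")
    return 1 if violations else 0


# The identity example from the specification, after YAML parsing.
EXAMPLE = {
    "name": "identity",
    "tier": 1,
    "slo": {"availability": 99.9, "latency_p99_ms": 200},
    "readiness": {
        "has_tracing": True, "has_alerts": True, "has_runbook": True,
        "has_load_test": False, "has_integration_tests": True,
        "has_openapi_spec": False,
    },
}

print(exit_code(EXAMPLE))                 # 1: two flags are still false
print(exit_code({**EXAMPLE, "tier": 2}))  # 0
```

Under these assumptions the identity example would fail as Tier 1 on has_load_test and has_openapi_spec but pass as Tier 2. A promotion is just a change to the tier field, so the same check would block a Tier 3 service from declaring Tier 1 until its readiness flags catch up.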