diff --git a/docs/guide/scaling.md b/docs/guide/scaling.md
index 7db88f89..7073641a 100644
--- a/docs/guide/scaling.md
+++ b/docs/guide/scaling.md
@@ -1,6 +1,16 @@
 # Scaling
 
-YOLO runs the web service as a single Fargate task by default. You can scale it two ways:
+By default an app runs as **one Fargate task** doing everything — Octane plus, if enabled, the bundled queue worker and scheduler (`tasks.web.queue` / `tasks.web.scheduler`). That's the cheap floor and fine at low scale. The three workloads have different scaling shapes, though, so each can be **extracted into its own ECS service** that scales independently:
+
+| Service | How it scales | Opt in with |
+|---|---|---|
+| **web** | target tracking (CPU + request count), `min`→`max` | `tasks.web.autoscaling` |
+| **queue** | backlog-per-task, **scales to zero** | top-level `tasks.queue` |
+| **scheduler** | never — pinned singleton (exactly one task) | top-level `tasks.scheduler` |
+
+Extraction is additive and per-workload: keep the queue bundled but give the scheduler its own service, or any mix. A workload can't be both bundled (`tasks.web.queue`) and extracted (`tasks.queue`) at once — `sync` hard-fails if you configure both.
+
+You can scale the web (and queue) service two ways:
 
 - **Autoscaling** — let AWS adjust the task count automatically from live metrics.
 - **`yolo scale`** — set the capacity yourself, out of band, without a deploy.
@@ -62,13 +72,14 @@ Application Auto Scaling targets and policies can't carry tags, so they don't sh
 [`yolo scale`](/reference/commands#yolo-scale) changes capacity without a build or deploy. Like `env:push`, it shows a current → new comparison and asks before applying.
 
 ```bash
-yolo scale production --web --min=3 --max=10 # autoscaled: set the bounds
-yolo scale production --web 3 # fixed: set the desired count
+yolo scale production --web --min=3 --max=10 # web autoscaled: set the bounds
+yolo scale production --web 3 # web fixed: set the desired count
+yolo scale production --queue --min=0 --max=20 # queue bounds (min 0 = scale to zero)
 ```
 
 Under autoscaling you set the **bounds** (`--min`/`--max`), never a desired count — the policies own desired count and would override it. Crucially, `scale` **writes the bounds back to the manifest** (surgically — your comments and formatting survive), so the manifest stays the single source of truth and the next `yolo sync` reconciles to the same values rather than clobbering your change.
 
-For a fixed service (no `autoscaling` block) a positional `count` sets the ECS desired count directly.
+For a fixed web service (no `autoscaling` block) a positional `count` sets the ECS desired count directly. A standalone queue is always autoscaling-managed, so it only takes `--min`/`--max`. The scheduler is a singleton and can't be scaled (`--scheduler` errors out).
 
 ### Reducing capacity is guarded
 
@@ -80,17 +91,41 @@ Because the manifest is authoritative, a `yolo sync` run with a **stale** manife
 
 So an emergency `yolo scale production --web --min=10` is durable: it's written to the manifest *and* live, and no unattended sync can quietly walk it back.
 
-## The scheduler caveat
+## The queue (scale to zero)
+
+Add a top-level `tasks.queue` block to give the queue worker its own ECS service, separate from web:
+
+```yaml
+tasks:
+  web: {}
+  queue:
+    min: 0 # scale to zero when idle (the default)
+    max: 20
+    backlog-per-task: 100
+    spot: true # optional: ~70% cheaper interruptible capacity
+```
+
+It scales on **backlog per task** — `ApproximateNumberOfMessagesVisible / RunningTaskCount`, computed with CloudWatch metric math (no Lambda) and held at `backlog-per-task` messages per running task. As the backlog grows it scales out toward `max`; as it drains it scales back in toward `min`.
 
-This is the one thing to get right before scaling a service that runs the scheduler.
+With `min: 0` the queue **scales to zero**: no tasks and no compute cost when idle. Target tracking can't lift it off zero (dividing by zero running tasks is undefined), so YOLO also attaches a step-scaling alarm that sets the service to exactly one task the instant a message becomes visible; target tracking owns it from one upward. The cost is a **~30–60s cold start** (image pull + boot) on the first message after idle.
 
-In the default topology the web container also runs the queue worker and the **scheduler** (`crond` firing `schedule:run` every minute). The queue is safe to multiply — SQS only hands each message to one worker. The scheduler is **not**: scale to N tasks and `schedule:run` fires on every replica, so every scheduled task runs N times (N× emails, N× billing, N× reports).
+That makes the choice of *where* the queue lives a latency decision:
 
-There's no stable per-task identity on Fargate to elect a single scheduler from, so pick one of two strategies:
+| Topology | Idle cost | Pickup latency | Use for |
+|---|---|---|---|
+| Bundled (`tasks.web.queue: true`) | included in web | **instant** (worker always warm) | light, latency-sensitive jobs |
+| Standalone, `min: 0` | **~$0** | ~30–60s cold start from idle | bursty, latency-tolerant async |
+| Standalone, `min: 1+` | one always-on task | instant, then autoscales | high-volume, always-busy |
 
-### 1. `->onOneServer()` (recommended)
+For multi-tenant apps, a single queue service works the app's default queue; per-tenant queue fan-out composes with [LPX-601](https://linear.app/codinglabsau/issue/LPX-601) and isn't covered here.
 
-Add Laravel's [`onOneServer()`](https://laravel.com/docs/scheduling#running-tasks-on-one-server) to **every** scheduled task in your console kernel. It takes an atomic lock in the shared cache so only one replica runs each task per minute:
+## The scheduler
+
+The scheduler (`crond` firing `schedule:run` every minute) must run as a **singleton** — if it runs on N tasks, every scheduled job fires N times (N× emails, N× billing, N× reports). The queue is safe to multiply (SQS hands each message to one worker); the scheduler is not. There's no stable per-task identity on Fargate to elect one from, so pick one of two strategies.
+
+### 1. `->onOneServer()`
+
+Keep the scheduler bundled in the web container (`tasks.web.scheduler: true`) and add Laravel's [`onOneServer()`](https://laravel.com/docs/scheduling#running-tasks-on-one-server) to **every** scheduled task. It takes an atomic lock in the shared cache so only one replica runs each task per minute:
 
 ```php
 $schedule->command('reports:send')->daily()->onOneServer();
@@ -100,10 +135,19 @@ This requires a shared lock store (the Valkey/Redis cache YOLO provisions, or a
 
 The catch: it's per-task. A scheduled task registered by a package (Telescope pruning, backups, etc.) that you can't annotate will still multi-fire — which is your signal to reach for strategy 2.
 
-### 2. Separate the scheduler
+### 2. Extract the scheduler (recommended once web scales)
+
+Give the scheduler its own service with a top-level `tasks.scheduler` block:
+
+```yaml
+tasks:
+  web:
+    autoscaling: { min: 1, max: 6 }
+  scheduler: {} # its own pinned-singleton service
+```
 
-Move the scheduler into its own service pinned at exactly one task. This removes the requirement entirely (it's genuinely a singleton) and lets the web tier scale without any scheduler concern. (Dedicated queue/scheduler services are on the roadmap.)
+YOLO pins it at exactly one task (never a scalable target) and deploys it **stop-then-start** (`minimumHealthyPercent: 0` / `maximumPercent: 100`) so a rollout stops the old cron before starting the new one — a deploy never briefly runs two schedulers (a missed cron minute is harmless; a double-run isn't). This removes the `onOneServer()` *requirement* entirely — it's genuinely a singleton now — though leaving `onOneServer()` on is harmless. The web tier then scales without any scheduler concern.
 
 ::: tip
-When you enable autoscaling on a task that still runs the scheduler, `yolo sync` prints a one-line reminder of exactly this. It's a nudge, not a gate — YOLO can't see inside your kernel to know which strategy you chose.
+When you enable autoscaling on a web task that still **bundles** the scheduler, `yolo sync` prints a one-line advisory pointing at these two strategies. It's a nudge, not a gate — YOLO can't see inside your kernel to know whether you've used `onOneServer()`.
 :::
diff --git a/docs/reference/commands.md b/docs/reference/commands.md
index 4ba722b9..6134089a 100644
--- a/docs/reference/commands.md
+++ b/docs/reference/commands.md
@@ -113,7 +113,7 @@ The image-building steps only run when the manifest declares `tasks`. See [Build
 Build, push, and deploy the application — runs [`build`](#yolo-build) first, then the zero-downtime rollout.
 
 ```bash
-yolo deploy [--app-version=] [--no-progress]
+yolo deploy [--app-version=] [--group=] [--no-progress]
 ```
 
 | Argument | Required | Description |
@@ -123,9 +123,10 @@ yolo deploy [--app-version=] [--no-progress]
 | Option | Value | Default | Description |
 |---|---|---|---|
 | `--app-version` | string | timestamp `y.W.N.Hi` | Tag to stamp on the build (same rules as `build`). |
+| `--group` | comma-separated | all the app runs | Service groups to roll (`web,queue,scheduler`). Defaults to every service the app runs. |
 | `--no-progress` | flag | off | Hide the live progress output. |
 
-After building, `deploy` pushes assets to S3, registers a new task-definition revision, runs `deploy` hooks as a one-off task, updates the ECS service, waits for it to go healthy (the deployment circuit breaker auto-rolls-back on failure), then UPSERTs Route 53 records. It always waits for the rollout to stabilise — there is no opt-out flag.
+After building, `deploy` pushes assets to S3, registers a new task-definition revision **for each service group** (web plus any standalone queue/scheduler), runs `deploy` hooks as a one-off task, rolls each ECS service onto its new revision, waits for the web service to go healthy (the deployment circuit breaker auto-rolls-back on failure), then UPSERTs Route 53 records. It always waits for the rollout to stabilise — there is no opt-out flag. `--group` narrows the rollout to a subset of services (the shared image is built either way); a deploy that omits `web` skips the ALB health wait, relying on the circuit breaker.
 
 ---
 
@@ -151,7 +152,9 @@ yolo run [--command=""] [--group=]
 - **No `--command`** → opens an interactive `/bin/sh` in the first running task (searched in the order `scheduler → queue → web`).
 - **With `--command`** → runs the command. With `--group`, it **fans out** across every running task in each listed group. Without `--group`, it runs on the first group that has a running task.
 
-**Requirements:** the AWS [Session Manager plugin](https://docs.aws.amazon.com/systems-manager/latest/userguide/session-manager-working-with-install-plugin.html) installed locally, and `tasks.web.enable-execute-command: true` in the manifest.
+Each group is its own ECS service when extracted, and `run` execs into the container named after the group. A bundled queue/scheduler runs inside the web container, so a `--group=queue` lookup that finds no standalone queue service simply falls through to the next group.
+
+**Requirements:** the AWS [Session Manager plugin](https://docs.aws.amazon.com/systems-manager/latest/userguide/session-manager-working-with-install-plugin.html) installed locally, and `enable-execute-command: true` on the target group in the manifest.
 
 ```bash
 yolo run production
@@ -159,10 +162,6 @@ yolo run production --command="php artisan migrate:status"
 yolo run production --command="php artisan queue:restart" --group=web,queue
 ```
 
-::: tip
-Today web, queue, and scheduler all run in the single `web` container, so the groups collapse onto it — the distinction matters once independent task groups land.
-:::
-
 ---
 
 ## `yolo scale`
@@ -181,19 +180,22 @@ yolo scale [count] [--web] [--min=] [--max=] [--queue] [--sc
 | Option | Value | Description |
 |---|---|---|
 | `--web` | flag | Target the web service (the default). |
-| `--min` / `--max` | int | Autoscaling bounds — the autoscaled form. |
-| `--queue` | flag | Target the queue service (errors until it's a separate service — on the roadmap). |
+| `--queue` | flag | Target the standalone queue service. Always autoscaling-managed — takes `--min`/`--max` (min may be `0`), never a count. |
 | `--scheduler` | flag | Always errors — the scheduler is a singleton and can't be scaled. |
+| `--min` / `--max` | int | Autoscaling bounds — the autoscaled form. |
+
+There are two forms, picked by what you pass:
 
-The web service has two forms, picked by what you pass:
+- **Autoscaled** — `--min`/`--max` set the bounds. The values are written back to the manifest (surgically — comments and formatting are preserved): web → [`tasks.web.autoscaling.min/max`](/reference/manifest#tasks-web-autoscaling), queue → [`tasks.queue.min/max`](/reference/manifest#tasks-queue). The scalable target is then registered, so the **manifest stays the source of truth** and the next sync reconciles to the same values. A desired count is never set under autoscaling (the policies would override it).
+- **Fixed** — a positional `count` sets the ECS desired count directly (`UpdateService`), for a **web** service with no `autoscaling` block. A standalone queue is always autoscaling-managed, so passing it a count errors and points you to `--min/--max`.
 
-- **Autoscaled** — `--min`/`--max` set the bounds. The values are written back to [`tasks.web.autoscaling.min/max`](/reference/manifest#tasks-web-autoscaling) (surgically — comments and formatting are preserved) and the scalable target is registered, so the **manifest stays the source of truth** and the next sync reconciles to the same values. A desired count is never set under autoscaling (the policies would override it).
-- **Fixed** — a positional `count` sets the ECS desired count directly (`UpdateService`), for a service with no `autoscaling` block. Trying to pass a count to an autoscaling-managed service errors and points you to `--min/--max`.
+Lowering a live bound is guarded the same as [reducing capacity](/guide/scaling#reducing-capacity-is-guarded) — an explicit confirm defaulting to no.
 
 ```bash
-yolo scale production --web --min=3 --max=10 # autoscaled bounds (writes the manifest)
-yolo scale production --web 3 # fixed desired count
-yolo scale production # prompt for a fixed count
+yolo scale production --web --min=3 --max=10 # web autoscaled bounds (writes the manifest)
+yolo scale production --web 3 # web fixed desired count
+yolo scale production --queue --min=0 --max=20 # queue bounds — min 0 = scale to zero
+yolo scale production # prompt for a fixed count
 ```
 
 **Reducing capacity** (a bound below the live value) is confirm-gated and defaults to *no*. See [Scaling](/guide/scaling).
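The "reducing capacity is guarded" rule described for `yolo scale` can be illustrated with a small standalone check. This is only a sketch under stated assumptions: `reducesCapacity` is a hypothetical helper, not a function in this PR, and treating a service with no registered scalable target (`$live === null`) as "nothing to reduce" is an assumption. The `min`/`max` array shape mirrors how `$live` is read in `ScaleCommand::scaleBounds()`.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of YOLO): would applying the new bounds lower
 * either live bound? A true result is the case the docs describe as
 * confirm-gated, with the confirm defaulting to "no".
 *
 * @param  array{min: int, max: int}|null  $live  live bounds, or null when no
 *                                                scalable target is registered
 */
function reducesCapacity(?array $live, int $newMin, int $newMax): bool
{
    if ($live === null) {
        // Assumption: with nothing registered there is no live bound to lower.
        return false;
    }

    // Lowering either the floor or the ceiling counts as a reduction.
    return $newMin < $live['min'] || $newMax < $live['max'];
}
```

Under these assumptions, moving live bounds of 3 and 10 to `--min=2 --max=10` would trip the guard, while raising only the maximum would not.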
diff --git a/docs/reference/manifest.md b/docs/reference/manifest.md
index 5ce1ea4b..5984156f 100644
--- a/docs/reference/manifest.md
+++ b/docs/reference/manifest.md
@@ -85,6 +85,28 @@ environments:
         #   scale-out-cooldown: 60 # default: 60
         #   scale-in-cooldown: 300 # default: 300
 
+      # Extract the queue into its own ECS service (scale independently of web).
+      # Mutually exclusive with `web.queue` above — configure a queue in one place,
+      # not both. A standalone queue scales to zero by default.
+      # queue:
+      #   min: 0 # default: 0 — 0 = scale to zero when idle
+      #   max: 10 # default: 10
+      #   backlog-per-task: 100 # default: 100 — target messages per running task
+      #   cpu: '256' # default: '256'
+      #   memory: '512' # default: '512'
+      #   spot: false # default: false — true = Fargate Spot (~70% cheaper)
+      #   shutdown-grace-period: 70 # default: 70 — let an in-flight job finish on SIGTERM
+      #   enable-execute-command: false # default: false
+
+      # Extract the scheduler into its own pinned-singleton service (always one
+      # task; deploys stop-then-start so a rollout never runs two crons). Mutually
+      # exclusive with `web.scheduler` above.
+      # scheduler:
+      #   cpu: '256' # default: '256'
+      #   memory: '512' # default: '512'
+      #   shutdown-grace-period: 10 # default: 10 — wait out an in-flight schedule:run
+      #   enable-execute-command: false # default: false
+
     build:
       - composer install --no-cache --no-interaction --optimize-autoloader --no-progress --classmap-authoritative --no-dev
       - npm ci
@@ -286,8 +308,8 @@ Declaring `tasks.web` makes the app a Fargate web service. Omit `tasks` entirely
 | `tasks.web.memory` | `'1024'` | Fargate memory (MB). |
 | `tasks.web.platform` | `linux/amd64` | Docker build platform. |
 | `tasks.web.enable-execute-command` | `false` | Enable ECS Exec so [`yolo run`](/reference/commands#yolo-run) can attach. Gate access with MFA on your IAM. |
-| `tasks.web.queue` | `false` | Run `queue:work` in the container. `true`, or an object to override its `shutdown-grace-period`. |
-| `tasks.web.scheduler` | `false` | Run the Laravel scheduler (cron + `schedule:run`). `true`, or an object form like `queue`. |
+| `tasks.web.queue` | `false` | Run `queue:work` **bundled** in the web container (warm, instant pickup). `true`, or an object to override its `shutdown-grace-period`. For independent scaling / scale-to-zero, extract it to a top-level [`tasks.queue`](#tasks-queue) instead — configuring both is an error. |
+| `tasks.web.scheduler` | `false` | Run the Laravel scheduler (cron + `schedule:run`) **bundled** in the web container. `true`, or an object form like `queue`. To make it a true singleton, extract it to a top-level [`tasks.scheduler`](#tasks-scheduler) instead — not both. |
 | `tasks.web.shutdown-grace-period` | `10` (web), `70` (queue) | Seconds a process gets on `SIGTERM` before `SIGKILL`. For web it's also the ALB drain window and the container `stopTimeout`. See [graceful shutdown](/guide/images#graceful-shutdown). |
 | `tasks.web.log-retention` | `30` | CloudWatch Logs retention (days). Must be a valid CloudWatch retention value. |
 | `tasks.web.execution-role` | shared `yolo-{env}` role | Override the ECS execution role ARN. |
@@ -332,11 +354,49 @@ tasks:
 ```
 
 ::: warning Bundled scheduler
-When the scheduler runs in the same task (`tasks.web.scheduler: true`), scaling to N tasks runs cron N times — every scheduled task would fire on each replica. Every scheduled task **must** use Laravel's `->onOneServer()`, or you should separate the scheduler into its own service. `sync` prints a one-line advisory in this case. See [Scaling → the scheduler caveat](/guide/scaling#the-scheduler-caveat).
+When the scheduler runs in the same task (`tasks.web.scheduler: true`), scaling to N tasks runs cron N times — every scheduled task would fire on each replica. Every scheduled task **must** use Laravel's `->onOneServer()`, or extract the scheduler into its own service ([`tasks.scheduler`](#tasks-scheduler)). `sync` prints a one-line advisory in this case. See [Scaling → the scheduler](/guide/scaling#the-scheduler).
 :::
 
 ---
 
+## `tasks.queue.*`
+
+A top-level `tasks.queue` block extracts the queue worker into its **own** ECS service, so it scales independently of web. Presence is the opt-in — an empty block (`queue:`) gives a scale-to-zero worker on default sizing. Mutually exclusive with the bundled [`tasks.web.queue`](#tasks-web): configure the queue in one place, not both, or `sync` hard-fails.
+
+A standalone queue **scales to zero by default** (`min: 0`): zero tasks — and zero compute cost — when the queue is empty, scaling up on backlog. The trade-off is a ~30–60s Fargate cold start on the first message after idle, so it suits bursty, latency-tolerant work. For latency-sensitive jobs that must start instantly, keep them bundled in the web container (`tasks.web.queue: true`, a warm worker) or set a standing floor (`min: 1`).
+
+Scaling is **backlog-per-task** target tracking (`ApproximateNumberOfMessagesVisible / RunningTaskCount`, CloudWatch metric math — no Lambda). A scale-to-zero queue (`min: 0`) also gets a step-scaling alarm that lifts it 0→1 the instant a message arrives (target tracking can't divide by zero running tasks).
+
+| Key | Default | Description |
+|---|---|---|
+| `tasks.queue.min` | `0` | Minimum tasks. `0` = scale to zero when idle. |
+| `tasks.queue.max` | `10` | Maximum tasks. |
+| `tasks.queue.backlog-per-task` | `100` | Target visible messages per running task — the scale-out trigger. |
+| `tasks.queue.cpu` | `'256'` | Fargate CPU units. |
+| `tasks.queue.memory` | `'512'` | Fargate memory (MB). |
+| `tasks.queue.spot` | `false` | `true` runs the queue on Fargate Spot (~70% cheaper, interruptible — fine for a worker whose jobs retry). |
+| `tasks.queue.shutdown-grace-period` | `70` | Seconds the worker gets on `SIGTERM` to finish its in-flight job before `SIGKILL`. |
+| `tasks.queue.enable-execute-command` | `false` | Enable ECS Exec on the queue service. |
+
+See [Scaling → the queue](/guide/scaling#the-queue-scale-to-zero).
+
+---
+
+## `tasks.scheduler.*`
+
+A top-level `tasks.scheduler` block extracts the scheduler (busybox `crond` firing `schedule:run`) into its **own** ECS service, pinned at exactly one task — a genuine singleton, so `->onOneServer()` is no longer required. It deploys **stop-then-start** (`minimumHealthyPercent: 0` / `maximumPercent: 100`) so a rollout never briefly runs two crons; a missed cron minute is harmless, a double-run isn't. Mutually exclusive with the bundled [`tasks.web.scheduler`](#tasks-web).
+
+The scheduler never scales (a per-minute cron can't tolerate a cold start), so it has no `min`/`max`.
+
+| Key | Default | Description |
+|---|---|---|
+| `tasks.scheduler.cpu` | `'256'` | Fargate CPU units (the scheduler is light — the smallest tier is usually plenty). |
+| `tasks.scheduler.memory` | `'512'` | Fargate memory (MB). |
+| `tasks.scheduler.shutdown-grace-period` | `10` | Seconds to wait out an in-flight `schedule:run` on `SIGTERM`. Long-running work belongs on the queue, not the cron tick. |
+| `tasks.scheduler.enable-execute-command` | `false` | Enable ECS Exec on the scheduler service. |
+
+---
+
 ## Deploy hooks
 
 Three arrays run shell commands at different points in the pipeline — see [Building & Deploying](/guide/building-and-deploying#hooks-build-vs-deploy-vs-deploy-all).
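The backlog-per-task behaviour documented under `tasks.queue.*` can be made concrete with a little arithmetic. The sketch below is a rough steady-state estimate only: real target tracking reacts through CloudWatch alarms and cooldowns rather than jumping straight to this number, and `estimateQueueTasks` is a hypothetical name, not something YOLO or AWS provides. The defaults mirror the manifest table (`min: 0`, `max: 10`, `backlog-per-task: 100`).

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical estimate (not YOLO code): roughly how many tasks keep the
 * visible backlog at or below the per-task target, clamped to [min, max].
 */
function estimateQueueTasks(
    int $visibleMessages,
    int $backlogPerTask = 100,
    int $min = 0,
    int $max = 10,
): int {
    if ($backlogPerTask < 1) {
        throw new InvalidArgumentException('backlog-per-task must be at least 1.');
    }

    // Enough tasks that visible / tasks does not exceed the target. A single
    // visible message already needs one task, which is the 0 -> 1 case the
    // docs assign to the step-scaling alarm.
    $needed = (int) ceil($visibleMessages / $backlogPerTask);

    return max($min, min($max, $needed));
}
```

With the defaults, an empty queue estimates to zero tasks, 250 visible messages to three, and a backlog of 5,000 is capped at the `max` of ten.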
diff --git a/src/Aws.php b/src/Aws.php
index 62a78af3..5d171999 100644
--- a/src/Aws.php
+++ b/src/Aws.php
@@ -33,21 +33,6 @@ public static function runningInAws(): bool
         return Helpers::app('runningInAws');
     }
 
-    public static function runningInAwsWebEnvironment(): bool
-    {
-        return Helpers::app('runningInAwsWebEnvironment');
-    }
-
-    public static function runningInAwsQueueEnvironment(): bool
-    {
-        return Helpers::app('runningInAwsQueueEnvironment');
-    }
-
-    public static function runningInAwsSchedulerEnvironment(): bool
-    {
-        return Helpers::app('runningInAwsSchedulerEnvironment');
-    }
-
     public static function tags(array $tags = [], string $wrap = 'Tags', bool $associative = false): array
     {
         $tags = static::expectedTags($tags);
diff --git a/src/Commands/Command.php b/src/Commands/Command.php
index 83c92acd..8ad6dcf5 100644
--- a/src/Commands/Command.php
+++ b/src/Commands/Command.php
@@ -90,7 +90,32 @@ protected function ensureManifestIntegrity(): bool
             && $this->ensureManifestKeyDeclared('region')
             && $this->ensureManifestKeyDeclared('account-id')
             && $this->ensureCacheStoreValid()
-            && $this->ensureSessionDriverValid();
+            && $this->ensureSessionDriverValid()
+            && $this->ensureTaskGroupsNotDoublyDefined();
+    }
+
+    /**
+     * A workload runs either bundled in the web container (`tasks.web.queue` /
+     * `tasks.web.scheduler`) or as its own service (a top-level `tasks.queue` /
+     * `tasks.scheduler` block) — never both. Configuring both is ambiguous (is the
+     * queue in the web task or a service of its own?), so hard-fail and tell the
+     * operator which line to drop.
+     */
+    protected function ensureTaskGroupsNotDoublyDefined(): bool
+    {
+        foreach (['queue', 'scheduler'] as $group) {
+            if (Manifest::bundles($group) && Manifest::has("tasks.$group")) {
+                error(sprintf(
+                    "yolo.yml runs `%s` both bundled under `tasks.web.%s` and as its own service `tasks.%s` — pick one.\n"
+                    . 'Drop `tasks.web.%s` to extract it into a standalone service, or drop `tasks.%s` to keep it in the web container.',
+                    $group, $group, $group, $group, $group,
+                ));
+
+                return false;
+            }
+        }
+
+        return true;
     }
 
     /**
diff --git a/src/Commands/DeployCommand.php b/src/Commands/DeployCommand.php
index 5675c13e..b7de7486 100644
--- a/src/Commands/DeployCommand.php
+++ b/src/Commands/DeployCommand.php
@@ -3,6 +3,7 @@
 namespace Codinglabs\Yolo\Commands;
 
 use Codinglabs\Yolo\Steps;
+use Symfony\Component\Console\Input\InputOption;
 use Symfony\Component\Console\Input\InputArgument;
 
 use function Laravel\Prompts\intro;
@@ -25,6 +26,7 @@ protected function configure(): void
             ->setName('deploy')
             ->addArgument('environment', InputArgument::REQUIRED, 'The environment name')
             ->addOption('app-version', null, InputArgument::OPTIONAL, 'Tag to stamp on the build (defaults to a timestamp)')
+            ->addOption('group', null, InputOption::VALUE_REQUIRED, 'Comma-separated service groups to roll (web,queue,scheduler) — defaults to all the app runs')
             ->addOption('no-progress', null, null, 'Hide the progress output')
             ->setDescription('Build, push, and deploy the application');
     }
diff --git a/src/Commands/RunCommand.php b/src/Commands/RunCommand.php
index 46d773e6..9952ccfe 100644
--- a/src/Commands/RunCommand.php
+++ b/src/Commands/RunCommand.php
@@ -17,10 +17,6 @@
 
 class RunCommand extends Command
 {
-    // Today every process runs in the single web container; when queue/scheduler
-    // become their own services this becomes per-group.
-    protected const CONTAINER = 'web';
-
     protected function configure(): void
     {
         $this
@@ -43,8 +39,9 @@ public function handle(): int
         $command = $this->option('command');
 
         // An explicit --group fans out across every listed group; the default is
-        // an ordered fallback — scheduler → queue → web. All three collapse into
-        // the web container today, so the first two lookups just fall through.
+        // an ordered fallback — scheduler → queue → web — so a one-off lands on
+        // the first group that has a running task. Each group is its own ECS
+        // service now, so a lookup that misses just falls through to the next.
         $groups = ($group = $this->option('group'))
             ? array_map('trim', explode(',', $group))
             : ['scheduler', 'queue', 'web'];
@@ -53,17 +50,17 @@ public function handle(): int
 
         // Interactive shell can only attach to one task — first running, in order.
         if (! $command) {
-            $task = collect($groups)
-                ->flatMap(fn (string $group) => Ecs::runningTasks($cluster, Helpers::keyedResourceName($group, exclusive: true)))
-                ->first();
-
-            if (! $task) {
-                error('No running task found to attach to.');
-
-                return self::FAILURE;
+            foreach ($groups as $group) {
+                if ($task = Ecs::runningTasks($cluster, Helpers::keyedResourceName($group, exclusive: true))[0] ?? null) {
+                    // The container name is the group (the task-def names its
+                    // container after the role), so we exec into the right one.
+                    return $this->exec($cluster, $task, '/bin/sh', $group, interactive: true);
+                }
             }
 
-            return $this->exec($cluster, $task, '/bin/sh', interactive: true);
+            error('No running task found to attach to.');
+
+            return self::FAILURE;
         }
 
         // One-off command: fan out across all tasks when --group was given,
@@ -75,7 +72,7 @@ public function handle(): int
             foreach ($tasks as $task) {
                 note(sprintf('%s · %s', $group, $task));
 
-                $this->exec($cluster, $task, $command, interactive: false);
+                $this->exec($cluster, $task, $command, $group, interactive: false);
                 $ran++;
             }
 
@@ -91,10 +88,10 @@ public function handle(): int
         return self::SUCCESS;
     }
 
-    protected function exec(string $cluster, string $task, string $command, bool $interactive): int
+    protected function exec(string $cluster, string $task, string $command, string $container, bool $interactive): int
     {
         $process = new Process(
-            static::executeCommandArgs($cluster, $task, $command, Manifest::get('region'), Helpers::keyedEnv('AWS_PROFILE')),
+            static::executeCommandArgs($cluster, $task, $command, $container, Manifest::get('region'), Helpers::keyedEnv('AWS_PROFILE')),
             timeout: null,
         );
 
@@ -108,16 +105,18 @@ protected function exec(string $cluster, string $task, string $command, bool $in
     /**
      * The `aws ecs execute-command` invocation. Always `--interactive` (the API
      * requires it); the command is `/bin/sh` for a shell or the one-off command.
+     * The container is the service group (web/queue/scheduler) — the task-def
+     * names its container after the role.
      *
      * @return array
      */
-    public static function executeCommandArgs(string $cluster, string $task, string $command, string $region, ?string $profile): array
+    public static function executeCommandArgs(string $cluster, string $task, string $command, string $container, string $region, ?string $profile): array
     {
         $args = [
             'aws', 'ecs', 'execute-command',
             '--cluster', $cluster,
             '--task', $task,
-            '--container', static::CONTAINER,
+            '--container', $container,
             '--interactive',
             '--command', $command,
             '--region', $region,
diff --git a/src/Commands/ScaleCommand.php b/src/Commands/ScaleCommand.php
index fcfe06b5..a3276cae 100644
--- a/src/Commands/ScaleCommand.php
+++ b/src/Commands/ScaleCommand.php
@@ -5,6 +5,7 @@
 use Codinglabs\Yolo\Aws;
 use Codinglabs\Yolo\Aws\Ecs;
 use Codinglabs\Yolo\Manifest;
+use Codinglabs\Yolo\Enums\ServerGroup;
 use Codinglabs\Yolo\Resources\Ecs\EcsCluster;
 use Codinglabs\Yolo\Resources\Ecs\EcsService;
 use Symfony\Component\Console\Input\InputOption;
@@ -20,20 +21,20 @@
 use function Laravel\Prompts\confirm;
 
 /**
- * Adjust the web service's capacity out of band — no build, no task-definition
+ * Adjust a service's capacity out of band — no build, no task-definition
  * revision. Mirrors env:push's compare-then-confirm UX: read live state, show a
  * current → new table, gate on a confirm, bail with the chick.
  *
  * The manifest is the source of truth, so the autoscaled path writes the new
  * bounds back to yolo.yml (surgically, preserving formatting) and registers them
  * — sync then reconciles to the same values rather than clobbering them. The
- * fixed path (no scalable target) sets the ECS desired count directly, since
- * there are no bounds to manage.
+ * fixed path (a web service with no scalable target) sets the ECS desired count
+ * directly, since there are no bounds to manage.
  *
- *     yolo scale production --web --min=3 --max=10 # autoscaled bounds
- *     yolo scale production --web 3 # fixed desired count
- *     yolo scale production --queue … # not yet — the queue runs inside the web task
- *     yolo scale production --scheduler … # error — the scheduler is a singleton
+ *     yolo scale production --web --min=3 --max=10 # web autoscaling bounds
+ *     yolo scale production --web 3 # fixed web desired count
+ *     yolo scale production --queue --min=0 --max=20 # queue bounds (min 0 = scale to zero)
+ *     yolo scale production --scheduler … # error — the scheduler is a singleton
  */
 class ScaleCommand extends Command
 {
@@ -53,34 +54,35 @@ protected function configure(): void
 
     public function handle(): void
     {
-        if (! $this->resolveWebGroup()) {
+        if (($group = $this->resolveGroup()) === null) {
             return;
         }
 
         $cluster = (new EcsCluster())->name();
-        $serviceName = (new EcsService())->name();
+        $serviceName = (new EcsService($group))->name();
 
         try {
             $service = Ecs::service($cluster, $serviceName);
         } catch (ResourceDoesNotExistException) {
-            error(sprintf('Could not find the web service for %s — has it been deployed?', $this->argument('environment')));
+            error(sprintf('Could not find the %s service for %s — has it been deployed?', $group->value, $this->argument('environment')));
 
             return;
         }
 
-        $target = new ScalableTarget();
+        $target = new ScalableTarget($group);
         $live = $target->current();
 
         if ($this->option('min') !== null || $this->option('max') !== null) {
-            $this->scaleBounds($target, $live);
+            $this->scaleBounds($group, $target, $live);
 
             return;
         }
 
-        // No bounds given → desired-count path. Setting desired count on an
-        // autoscaling-managed service is futile (the policies override it), so
-        // redirect to the bounds form rather than quietly no-op.
-        if ($live !== null) {
+        // A standalone queue is always autoscaling-managed, as is any web service
+        // with a registered target. Setting a fixed desired count there is futile
+        // (the policies override it), so redirect to the bounds form rather than
+        // quietly no-op. Only a fixed web service falls through to desired count.
+        if ($group === ServerGroup::QUEUE || $live !== null) {
             error('This service is autoscaling-managed — use --min/--max to change its bounds, not a desired count.');
 
             return;
@@ -90,33 +92,36 @@ public function handle(): void
     }
 
     /**
-     * Resolve the target group. Only --web is live; --queue is a forward-compat
-     * stub until queue/scheduler become their own services, and --scheduler is
-     * never permitted (a singleton can't be scaled). Returns false when the
-     * command should stop (error already surfaced).
+     * Resolve the target group from the flags. The scheduler is a singleton and
+     * can never be scaled; web is the default. Returns null when the command
+     * should stop (error already surfaced).
      */
-    protected function resolveWebGroup(): bool
+    protected function resolveGroup(): ?ServerGroup
     {
         if ($this->option('scheduler')) {
             error('The scheduler is a singleton and cannot be scaled — it always runs exactly one task.');
 
-            return false;
+            return null;
         }
 
-        if ($this->option('queue')) {
-            error('Queue scaling will land when the queue becomes its own service. For now it runs inside the web task and scales with it.');
-
-            return false;
-        }
-
-        return true;
+        return $this->option('queue') ? ServerGroup::QUEUE : ServerGroup::WEB;
     }
 
-    protected function scaleBounds(ScalableTarget $target, ?array $live): void
+    protected function scaleBounds(ServerGroup $group, ScalableTarget $target, ?array $live): void
     {
         $newMin = $this->option('min') !== null ? (int) $this->option('min') : ($live['min'] ?? $target->min());
         $newMax = $this->option('max') !== null ? (int) $this->option('max') : ($live['max'] ?? $target->max());
 
+        // The queue may floor at zero (scale to zero); the web tier must keep at
+        // least one task serving.
+        $floor = $group === ServerGroup::QUEUE ? 0 : 1;
+
+        if ($newMin < $floor) {
+            error(sprintf('Minimum capacity for the %s service cannot be below %d.', $group->value, $floor));
+
+            return;
+        }
+
         if ($newMin > $newMax) {
             error(sprintf('Minimum capacity (%d) cannot exceed maximum capacity (%d).', $newMin, $newMax));
 
@@ -142,8 +147,10 @@ protected function scaleBounds(ScalableTarget $target, ?array $live): void
         // Manifest is the source of truth — write the bounds back (surgically, so
         // comments/formatting survive) so the next sync reconciles to these values
         // rather than clobbering them.
-        Manifest::put('tasks.web.autoscaling.min', $newMin);
-        Manifest::put('tasks.web.autoscaling.max', $newMax);
+        [$minKey, $maxKey] = static::boundsKeys($group);
+
+        Manifest::put($minKey, $newMin);
+        Manifest::put($maxKey, $newMax);
 
         $target->register($newMin, $newMax);
 
@@ -179,6 +186,20 @@ protected function scaleDesiredCount(string $cluster, string $serviceName, int $
         info('Scaled successfully.');
     }
 
+    /**
+     * The manifest min/max key paths for a group — web autoscaling bounds live
+     * under tasks.web.autoscaling, the queue's directly under tasks.queue (a
+     * standalone queue is always autoscaled).
+     *
+     * @return array{0: string, 1: string}
+     */
+    public static function boundsKeys(ServerGroup $group): array
+    {
+        return $group === ServerGroup::QUEUE
+            ? ['tasks.queue.min', 'tasks.queue.max']
+            : ['tasks.web.autoscaling.min', 'tasks.web.autoscaling.max'];
+    }
+
     /**
      * Bounds comparison rows for the autoscaled path (current → new min/max).
      *
diff --git a/src/Commands/SyncAppCommand.php b/src/Commands/SyncAppCommand.php
index d050f032..2f655766 100644
--- a/src/Commands/SyncAppCommand.php
+++ b/src/Commands/SyncAppCommand.php
@@ -38,9 +38,9 @@ public function handle(): int
 
     /**
      * A soft, non-blocking nudge (not a guard) when autoscaling is enabled on a
-     * task that also runs the scheduler. Scaling the bundled task to N replicas
-     * runs cron N times, so every scheduled task must use ->onOneServer(); apps
-     * that outgrow that should separate the scheduler into its own service.
+     * web task that also bundles the scheduler. Scaling the bundled task to N
+     * replicas runs cron N times, so every scheduled task must use ->onOneServer();
+     * apps that outgrow that should extract the scheduler into its own service.
*/ public static function schedulerAdvisory(): ?string { @@ -48,7 +48,8 @@ public static function schedulerAdvisory(): ?string return null; } - return 'Autoscaling a bundled web+scheduler task: every scheduled task must use ->onOneServer() so it does not run on each replica.'; + return 'Autoscaling a bundled web+scheduler task: every scheduled task must use ->onOneServer() so it does not run on each replica. ' + . 'To drop that requirement, extract the scheduler into its own pinned-singleton service with a top-level `tasks.scheduler` block.'; } public function scopes(): array @@ -108,6 +109,26 @@ public function scopes(): array // back down. Both steps no-op when it was never enabled. Steps\Sync\App\SyncScalableTargetStep::class, Steps\Sync\App\SyncScalingPoliciesStep::class, + // Standalone queue service (own task-def + service + + // scale-to-zero autoscaling) — only when tasks.queue extracts + // it from the web container. + ...Manifest::hasStandaloneQueue() + ? [ + Steps\Sync\App\SyncQueueTaskDefinitionStep::class, + Steps\Sync\App\SyncQueueServiceStep::class, + Steps\Sync\App\SyncQueueScalableTargetStep::class, + Steps\Sync\App\SyncQueueScalingPolicyStep::class, + Steps\Sync\App\SyncQueueScaleToZeroAlarmStep::class, + ] + : [], + // Standalone scheduler service (pinned singleton) — only when + // tasks.scheduler extracts it from the web container. + ...Manifest::hasStandaloneScheduler() + ? 
[ + Steps\Sync\App\SyncSchedulerTaskDefinitionStep::class, + Steps\Sync\App\SyncSchedulerServiceStep::class, + ] + : [], Steps\Sync\App\SyncAssetDistributionStep::class, ] : [], diff --git a/src/Concerns/ChecksIfCommandsShouldBeRunning.php b/src/Concerns/ChecksIfCommandsShouldBeRunning.php index 88a051a9..b44fb343 100644 --- a/src/Concerns/ChecksIfCommandsShouldBeRunning.php +++ b/src/Concerns/ChecksIfCommandsShouldBeRunning.php @@ -7,12 +7,9 @@ use Codinglabs\Yolo\Contracts\Step; use Codinglabs\Yolo\Commands\Command; use Codinglabs\Yolo\Contracts\RunsOnAws; -use Codinglabs\Yolo\Contracts\RunsOnAwsWeb; -use Codinglabs\Yolo\Contracts\RunsOnAwsQueue; use Codinglabs\Yolo\Contracts\ExecutesIvsStep; use Codinglabs\Yolo\Contracts\ExecutesWebStep; use Codinglabs\Yolo\Contracts\ExecutesSoloStep; -use Codinglabs\Yolo\Contracts\RunsOnAwsScheduler; use Codinglabs\Yolo\Contracts\ExecutesMultitenancyStep; trait ChecksIfCommandsShouldBeRunning @@ -44,18 +41,6 @@ public function skipReason(Command|Step $instance): ?string } if (Aws::runningInAws()) { - if ($instance instanceof RunsOnAwsWeb) { - return Aws::runningInAwsWebEnvironment() ? null : 'not the web environment'; - } - - if ($instance instanceof RunsOnAwsQueue) { - return Aws::runningInAwsQueueEnvironment() ? null : 'not the queue environment'; - } - - if ($instance instanceof RunsOnAwsScheduler) { - return Aws::runningInAwsSchedulerEnvironment() ? null : 'not the scheduler environment'; - } - return $instance instanceof RunsOnAws ? null : 'does not run on AWS instances'; } diff --git a/src/Concerns/ParsesOnlyOption.php b/src/Concerns/ParsesOnlyOption.php deleted file mode 100644 index 7f5ec6a6..00000000 --- a/src/Concerns/ParsesOnlyOption.php +++ /dev/null @@ -1,38 +0,0 @@ -parseOnlyOption($options['only'] ?? null)); - } - - public function parseOnlyOption(?string $only): array - { - if (! 
$only) { - return [ - ServerGroup::WEB, - ServerGroup::QUEUE, - ServerGroup::SCHEDULER, - ]; - } - - $servers = []; - - $values = array_map('trim', explode(',', $only)); - - foreach ($values as $server) { - $servers[] = match ($server) { - ServerGroup::WEB->value => ServerGroup::WEB, - ServerGroup::QUEUE->value => ServerGroup::QUEUE, - ServerGroup::SCHEDULER->value => ServerGroup::SCHEDULER, - }; - } - - return $servers; - } -} diff --git a/src/Concerns/RegistersAws.php b/src/Concerns/RegistersAws.php index d3a0d5c9..db6c6a19 100644 --- a/src/Concerns/RegistersAws.php +++ b/src/Concerns/RegistersAws.php @@ -23,7 +23,6 @@ use Aws\CodeDeploy\CodeDeployClient; use Aws\ElastiCache\ElastiCacheClient; use Aws\EventBridge\EventBridgeClient; -use Codinglabs\Yolo\Enums\ServerGroup; use Aws\Credentials\CredentialProvider; use GuzzleHttp\Exception\ConnectException; use Aws\CloudWatchLogs\CloudWatchLogsClient; @@ -68,11 +67,6 @@ protected function registerAwsServices(): void Helpers::app()->singleton('sqs', fn () => new SqsClient($arguments)); Helpers::app()->singleton('ssm', fn () => new SsmClient($arguments)); Helpers::app()->singleton('sts', fn () => new StsClient($arguments)); - - // with all clients registered, we can now determine specific environments - Helpers::app()->singleton('runningInAwsWebEnvironment', fn () => static::detectAwsWebEnvironment()); - Helpers::app()->singleton('runningInAwsQueueEnvironment', fn () => static::detectAwsQueueEnvironment()); - Helpers::app()->singleton('runningInAwsSchedulerEnvironment', fn () => static::detectAwsSchedulerEnvironment()); } protected static function awsCredentials(): callable|array|null @@ -155,7 +149,7 @@ protected static function detectCiEnvironment(): bool return env('CI', false) === true; } - protected static function detectAwsEnvironment(?ServerGroup $serverGroup = null): bool + protected static function detectAwsEnvironment(): bool { if (static::detectLocalEnvironment() || static::detectCiEnvironment()) { // skip if 
we are local or in continuous integration @@ -163,34 +157,8 @@ protected static function detectAwsEnvironment(?ServerGroup $serverGroup = null) } try { - $instanceId = (new Client(['timeout' => 2])) - ->get('http://169.254.169.254/latest/meta-data/instance-id') - ->getBody(); - - if ($serverGroup) { - $awsResult = Aws::ec2()->describeTags([ - 'Filters' => [ - [ - 'Name' => 'resource-id', - 'Values' => [$instanceId], - ], - [ - 'Name' => 'key', - 'Values' => ['Name'], - ], - ], - ]); - - $allowedMatch = Helpers::keyedResourceName($serverGroup, exclusive: false); - - foreach ($awsResult['Tags'] as $tag) { - if ($tag['Key'] === 'Name' && $tag['Value'] === $allowedMatch) { - return true; - } - } - - return false; - } + (new Client(['timeout' => 2])) + ->get('http://169.254.169.254/latest/meta-data/instance-id'); return true; } catch (ConnectException $e) { @@ -198,19 +166,4 @@ protected static function detectAwsEnvironment(?ServerGroup $serverGroup = null) return false; } - - protected static function detectAwsWebEnvironment(): bool - { - return static::detectAwsEnvironment(ServerGroup::WEB); - } - - protected static function detectAwsQueueEnvironment(): bool - { - return static::detectAwsEnvironment(ServerGroup::QUEUE); - } - - protected static function detectAwsSchedulerEnvironment(): bool - { - return static::detectAwsEnvironment(ServerGroup::SCHEDULER); - } } diff --git a/src/Concerns/ResolvesServerGroups.php b/src/Concerns/ResolvesServerGroups.php new file mode 100644 index 00000000..d0ec4b96 --- /dev/null +++ b/src/Concerns/ResolvesServerGroups.php @@ -0,0 +1,43 @@ + + */ + protected function resolveServerGroups(?string $only): array + { + $available = Manifest::serverGroups(); + + if (! $only) { + return $available; + } + + return array_map(function (string $value) use ($available) { + $group = ServerGroup::tryFrom(trim($value)); + + if ($group === null || ! 
in_array($group, $available, true)) { + throw new IntegrityCheckException(sprintf( + 'Unknown --group "%s". This app runs: %s.', + trim($value), + implode(', ', array_map(fn (ServerGroup $group) => $group->value, $available)), + )); + } + + return $group; + }, explode(',', $only)); + } +} diff --git a/src/Contracts/RunsOnAwsQueue.php b/src/Contracts/RunsOnAwsQueue.php deleted file mode 100644 index a3793966..00000000 --- a/src/Contracts/RunsOnAwsQueue.php +++ /dev/null @@ -1,5 +0,0 @@ -value}"; + } + + /** + * Only the web service sits behind the ALB (target group, health-check grace, + * port mapping). The queue and scheduler are headless workers. + */ + public function attachesToLoadBalancer(): bool + { + return $this === self::WEB; + } + + /** + * The scheduler is a pinned singleton — exactly one task, never a scalable + * target, deployed stop-then-start so a rollout never briefly runs two crons. + */ + public function isSingleton(): bool + { + return $this === self::SCHEDULER; + } + + /** + * Default Fargate task CPU units. The web tier serves requests so it gets the + * larger default; the queue and scheduler are lighter and start at 0.25 vCPU. + */ + public function defaultCpu(): string + { + return $this === self::WEB ? '512' : '256'; + } + + /** + * Default Fargate task memory (MiB) — paired with defaultCpu() to a valid + * Fargate CPU/memory combination (256 → 512, 512 → 1024). + */ + public function defaultMemory(): string + { + return $this === self::WEB ? '1024' : '512'; + } } diff --git a/src/Helpers.php b/src/Helpers.php index d866c26d..de3894d8 100644 --- a/src/Helpers.php +++ b/src/Helpers.php @@ -162,6 +162,25 @@ public static function validatePositiveInt(mixed $value, string $key): int return $validated; } + /** + * A whole number ≥ 0 — for capacity floors that may legitimately be zero (a + * queue that scales to zero), where validatePositiveInt's 1-minimum is wrong. 
+ */ + public static function validateNonNegativeInt(mixed $value, string $key): int + { + $validated = filter_var($value, FILTER_VALIDATE_INT, ['options' => ['min_range' => 0]]); + + if ($validated === false) { + throw new IntegrityCheckException(sprintf( + '%s must be a non-negative integer (got %s)', + $key, + json_encode($value), + )); + } + + return $validated; + } + public static function validateCloudWatchLogRetention(mixed $value, string $key): int { $allowed = [1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1827, 2192, 2557, 2922, 3288, 3653]; diff --git a/src/Manifest.php b/src/Manifest.php index d0441d64..98e39f8a 100644 --- a/src/Manifest.php +++ b/src/Manifest.php @@ -4,6 +4,7 @@ use Illuminate\Support\Arr; use Symfony\Component\Yaml\Yaml; +use Codinglabs\Yolo\Enums\ServerGroup; use Codinglabs\Yolo\Exceptions\IntegrityCheckException; class Manifest @@ -33,7 +34,7 @@ class Manifest 'sqs.depth-alarm-threshold', 'sqs.depth-alarm-period', 'sqs.depth-alarm-evaluation-periods', 'cache.store', 'session.driver', - 'tasks.web.*', + 'tasks.web.*', 'tasks.queue.*', 'tasks.scheduler.*', 'build', 'deploy', 'deploy-all', ]; @@ -224,7 +225,12 @@ protected static function setScalarPreservingFormat(string $raw, array $path, mi return implode("\n", $lines); } - if ($currentPath === $parentPath) { + // Only a block-style parent (nothing after the colon but maybe a + // comment) can take a new child line. A parent with an inline value — + // `queue: {}` / `queue: []` — would be corrupted by splicing a block + // child beneath it, so leave parentLine null and let put() fall back to + // a full re-dump (which renders it as a proper block). + if ($currentPath === $parentPath && trim(preg_replace('/#.*$/', '', $matches[3])) === '') { $parentLine = $index; $parentIndent = $indent; } @@ -289,6 +295,58 @@ public static function sessionDriver(): ?string return static::get('session.driver', static::has('tasks.web') ? 
'redis' : null); } + /** + * Whether the web container also runs this workload in-process (bundled mode) + * — `tasks.web.queue` / `tasks.web.scheduler` set truthy (a bare `true` or an + * object of overrides). The bare-flag form goes through strict bool validation + * so a typo can't silently disable a bundled process. The alternative is to + * extract the workload into its own service (see hasStandaloneQueue / + * hasStandaloneScheduler); a workload can't be both at once. + */ + public static function bundles(string $program): bool + { + $value = static::get("tasks.web.$program", false); + + return is_array($value) || Helpers::validateStrictBool($value, "tasks.web.$program"); + } + + /** + * Whether the queue runs as its own ECS service (a top-level `tasks.queue` + * block) rather than bundled in the web container. Presence is the opt-in — + * an empty block extracts the queue with default sizing and scale-to-zero. + */ + public static function hasStandaloneQueue(): bool + { + return static::has('tasks.queue'); + } + + /** + * Whether the scheduler runs as its own pinned-singleton ECS service (a + * top-level `tasks.scheduler` block) rather than bundled in the web container. + */ + public static function hasStandaloneScheduler(): bool + { + return static::has('tasks.scheduler'); + } + + /** + * The workloads that run as their own ECS service for this app: web (when + * there's a `tasks.web` block) plus any extracted queue/scheduler. This is the + * single list that deploy registers task-def revisions for, sync provisions + * services for, and `yolo run --group` fans across. Bundled queue/scheduler + * are NOT here — they ride inside the web container, not their own service. + * + * @return array + */ + public static function serverGroups(): array + { + return array_values(array_filter([ + static::has('tasks.web') ? ServerGroup::WEB : null, + static::hasStandaloneQueue() ? ServerGroup::QUEUE : null, + static::hasStandaloneScheduler() ? 
ServerGroup::SCHEDULER : null, + ])); + } + public static function apex(): string { if (static::isMultitenanted()) { diff --git a/src/ProcessCommands.php b/src/ProcessCommands.php new file mode 100644 index 00000000..35a3ce3a --- /dev/null +++ b/src/ProcessCommands.php @@ -0,0 +1,39 @@ +current() !== null; + } + + /** + * Diff the live policy against the desired config and (only on drift, when + * applying) upsert it. + * + * @return array + */ + public function synchronise(bool $apply): array + { + $changes = $this->drift($this->current()); + + if ($changes === [] || ! $apply) { + return $changes; + } + + Aws::applicationAutoScaling()->putScalingPolicy([ + 'PolicyName' => $this->policyName(), + 'ServiceNamespace' => ApplicationAutoScaling::SERVICE_NAMESPACE, + 'ResourceId' => ScalableTarget::resourceId(ServerGroup::QUEUE), + 'ScalableDimension' => ApplicationAutoScaling::SCALABLE_DIMENSION, + 'PolicyType' => 'TargetTrackingScaling', + 'TargetTrackingScalingPolicyConfiguration' => $this->configuration(), + ]); + + return $changes; + } + + /** + * The desired TargetTrackingScalingPolicyConfiguration: a customised metric + * that divides the queue's visible-message count by the running task count. + * + * @return array + */ + public function configuration(): array + { + $queueName = Helpers::keyedResourceName(); + $cluster = (new EcsCluster())->name(); + $service = (new EcsService(ServerGroup::QUEUE))->name(); + + return [ + 'TargetValue' => $this->targetValue(), + 'CustomizedMetricSpecification' => [ + 'Metrics' => [ + [ + 'Id' => 'visible', + 'MetricStat' => [ + 'Metric' => [ + 'Namespace' => 'AWS/SQS', + 'MetricName' => 'ApproximateNumberOfMessagesVisible', + 'Dimensions' => [['Name' => 'QueueName', 'Value' => $queueName]], + ], + 'Stat' => 'Sum', + ], + 'ReturnData' => false, + ], + [ + 'Id' => 'running', + 'MetricStat' => [ + 'Metric' => [ + // RunningTaskCount is published under Container Insights, + // which the cluster enables at create. 
+ 'Namespace' => 'ECS/ContainerInsights', + 'MetricName' => 'RunningTaskCount', + 'Dimensions' => [ + ['Name' => 'ClusterName', 'Value' => $cluster], + ['Name' => 'ServiceName', 'Value' => $service], + ], + ], + 'Stat' => 'Average', + ], + 'ReturnData' => false, + ], + [ + 'Id' => 'backlog_per_task', + // Division by a zero running-task count yields no data, so + // this stays silent at zero — the step-scaling bootstrap owns 0→1. + 'Expression' => 'visible / running', + 'Label' => 'Backlog per task', + 'ReturnData' => true, + ], + ], + ], + 'ScaleOutCooldown' => self::SCALE_OUT_COOLDOWN, + 'ScaleInCooldown' => self::SCALE_IN_COOLDOWN, + ]; + } + + /** + * Diff the comparable fields of the live policy against the desired config. A + * null $live reports every field as a change, so a fresh policy shows as a + * full create. + * + * @param array|null $live + * @return array + */ + public function drift(?array $live): array + { + $current = $live['TargetTrackingScalingPolicyConfiguration'] ?? []; + $changes = []; + + $currentTarget = isset($current['TargetValue']) ? (float) $current['TargetValue'] : null; + + if ($currentTarget !== $this->targetValue()) { + $changes[] = Change::make('queue backlog TargetValue', $currentTarget, $this->targetValue()); + } + + $currentOut = isset($current['ScaleOutCooldown']) ? (int) $current['ScaleOutCooldown'] : null; + + if ($currentOut !== self::SCALE_OUT_COOLDOWN) { + $changes[] = Change::make('queue backlog ScaleOutCooldown', $currentOut, self::SCALE_OUT_COOLDOWN); + } + + $currentIn = isset($current['ScaleInCooldown']) ? (int) $current['ScaleInCooldown'] : null; + + if ($currentIn !== self::SCALE_IN_COOLDOWN) { + $changes[] = Change::make('queue backlog ScaleInCooldown', $currentIn, self::SCALE_IN_COOLDOWN); + } + + return $changes; + } + + /** + * The live policy, or null when it isn't registered yet. 
+ * + * @return array|null + */ + public function current(): ?array + { + try { + return ApplicationAutoScaling::scalingPolicy(ScalableTarget::resourceId(ServerGroup::QUEUE), $this->policyName()); + } catch (ResourceDoesNotExistException) { + return null; + } + } +} diff --git a/src/Resources/ApplicationAutoScaling/QueueScaleToZeroBootstrap.php b/src/Resources/ApplicationAutoScaling/QueueScaleToZeroBootstrap.php new file mode 100644 index 00000000..01960715 --- /dev/null +++ b/src/Resources/ApplicationAutoScaling/QueueScaleToZeroBootstrap.php @@ -0,0 +1,155 @@ +policyExists() && $this->alarmExists(); + } + + /** + * Provision (or confirm) the step policy + its alarm. The config is static, so + * drift is simply "either piece is missing"; reported as a Change so the sync + * step renders WOULD_CREATE / CREATED and survives the only-pending filter. + * + * @return array + */ + public function synchronise(bool $apply): array + { + $changes = []; + + if (! $this->policyExists()) { + $changes[] = Change::make('queue scale-to-zero policy', null, $this->policyName()); + } + + if (! $this->alarmExists()) { + $changes[] = Change::make('queue scale-to-zero alarm', null, $this->alarmName()); + } + + if ($changes === [] || ! 
$apply) { + return $changes; + } + + $policyArn = Aws::applicationAutoScaling()->putScalingPolicy([ + 'PolicyName' => $this->policyName(), + 'ServiceNamespace' => ApplicationAutoScaling::SERVICE_NAMESPACE, + 'ResourceId' => ScalableTarget::resourceId(ServerGroup::QUEUE), + 'ScalableDimension' => ApplicationAutoScaling::SCALABLE_DIMENSION, + 'PolicyType' => 'StepScaling', + 'StepScalingPolicyConfiguration' => [ + 'AdjustmentType' => 'ExactCapacity', + 'Cooldown' => self::COOLDOWN, + 'MetricAggregationType' => 'Maximum', + 'StepAdjustments' => [ + ['MetricIntervalLowerBound' => 0, 'ScalingAdjustment' => 1], + ], + ], + ])['PolicyARN']; + + Aws::cloudWatch()->putMetricAlarm([ + 'ActionsEnabled' => true, + 'AlarmName' => $this->alarmName(), + 'AlarmDescription' => 'Lifts the queue off zero when a message arrives. Created by yolo CLI', + 'ComparisonOperator' => 'GreaterThanThreshold', + 'Dimensions' => [['Name' => 'QueueName', 'Value' => Helpers::keyedResourceName()]], + 'EvaluationPeriods' => 1, + 'MetricName' => 'ApproximateNumberOfMessagesVisible', + 'Namespace' => 'AWS/SQS', + 'Period' => 60, + 'Statistic' => 'Maximum', + 'Threshold' => 0, + 'TreatMissingData' => 'notBreaching', + 'AlarmActions' => [$policyArn], + ...Aws::tags($this->tags()), + ]); + + // PutMetricAlarm ignores Tags when updating an existing alarm, so reconcile + // the ownership markers explicitly (as QueueAlarm does) — so the alarm reads + // as `ok` in yolo audit rather than `rogue`. 
+ Aws::synchroniseCloudWatchTags( + CloudWatch::alarm($this->alarmName())['AlarmArn'], + $this->tags(), + apply: true, + ); + + return $changes; + } + + public function policyExists(): bool + { + try { + ApplicationAutoScaling::scalingPolicy(ScalableTarget::resourceId(ServerGroup::QUEUE), $this->policyName()); + + return true; + } catch (ResourceDoesNotExistException) { + return false; + } + } + + public function alarmExists(): bool + { + try { + CloudWatch::alarm($this->alarmName()); + + return true; + } catch (ResourceDoesNotExistException) { + return false; + } + } + + /** + * App-scoped ownership tags, matching what a Resource's ResolvesTags would + * stamp. The yolo:environment baseline is added at write time by Aws::tags(). + * + * @return array + */ + public function tags(): array + { + return [ + 'Name' => $this->alarmName(), + 'yolo:scope' => Scope::App->value, + 'yolo:app' => Manifest::name(), + ]; + } +} diff --git a/src/Resources/ApplicationAutoScaling/ScalableTarget.php b/src/Resources/ApplicationAutoScaling/ScalableTarget.php index 05d4bf14..0cfc98c2 100644 --- a/src/Resources/ApplicationAutoScaling/ScalableTarget.php +++ b/src/Resources/ApplicationAutoScaling/ScalableTarget.php @@ -6,18 +6,24 @@ use Codinglabs\Yolo\Change; use Codinglabs\Yolo\Helpers; use Codinglabs\Yolo\Manifest; +use Codinglabs\Yolo\Enums\ServerGroup; use Codinglabs\Yolo\Resources\Ecs\EcsCluster; use Codinglabs\Yolo\Resources\Ecs\EcsService; use Codinglabs\Yolo\Aws\ApplicationAutoScaling; use Codinglabs\Yolo\Exceptions\ResourceDoesNotExistException; /** - * The Application Auto Scaling scalable target that hands the web ECS service's - * desired count to target-tracking policies. 
Like QueueAlarm / Dashboard this is - * a standalone reconciler, NOT a Resource: App Auto Scaling targets aren't - * RGT-taggable (so they carry none of the ownership tags the Resource contract - * reconciles, and stay invisible to `yolo audit`) and RegisterScalableTarget is a - * pure upsert with no create/update split. + * The Application Auto Scaling scalable target that hands an ECS service's + * desired count to scaling policies. Group-aware: the web target's bounds come + * from `tasks.web.autoscaling.min/max` (autoscaling is opt-in for web), the queue + * target's from `tasks.queue.min/max` (a standalone queue is always autoscaled, + * and its floor may be 0 — scale to zero). + * + * Like QueueAlarm / Dashboard this is a standalone reconciler, NOT a Resource: + * App Auto Scaling targets aren't RGT-taggable (so they carry none of the + * ownership tags the Resource contract reconciles, and stay invisible to + * `yolo audit`) and RegisterScalableTarget is a pure upsert with no create/update + * split. * * Dry-run honest — it reads the live min/max, diffs them, and only re-registers * on drift, so `sync --dry-run` reports exactly when the capacity bounds change. @@ -28,13 +34,15 @@ */ class ScalableTarget { + public function __construct(protected ServerGroup $group = ServerGroup::WEB) {} + /** - * service/{cluster}/{web-service} — the App Auto Scaling resource id for the - * app's ECS web service. + * service/{cluster}/{service} — the App Auto Scaling resource id for a group's + * ECS service. 
*/ - public static function resourceId(): string + public static function resourceId(ServerGroup $group = ServerGroup::WEB): string { - return sprintf('service/%s/%s', (new EcsCluster())->name(), (new EcsService())->name()); + return sprintf('service/%s/%s', (new EcsCluster())->name(), (new EcsService($group))->name()); } public function exists(): bool @@ -44,18 +52,21 @@ public function exists(): bool public function min(): int { - return Helpers::validatePositiveInt( - Manifest::get('tasks.web.autoscaling.min', 1), - 'tasks.web.autoscaling.min', - ); + if ($this->group === ServerGroup::QUEUE) { + // A standalone queue's floor may be 0 (scale to zero) — that's the opt-in. + return Helpers::validateNonNegativeInt(Manifest::get('tasks.queue.min', 0), 'tasks.queue.min'); + } + + return Helpers::validatePositiveInt(Manifest::get('tasks.web.autoscaling.min', 1), 'tasks.web.autoscaling.min'); } public function max(): int { - return Helpers::validatePositiveInt( - Manifest::get('tasks.web.autoscaling.max', 4), - 'tasks.web.autoscaling.max', - ); + if ($this->group === ServerGroup::QUEUE) { + return Helpers::validatePositiveInt(Manifest::get('tasks.queue.max', 10), 'tasks.queue.max'); + } + + return Helpers::validatePositiveInt(Manifest::get('tasks.web.autoscaling.max', 4), 'tasks.web.autoscaling.max'); } /** @@ -95,7 +106,7 @@ public function register(int $min, int $max): void { Aws::applicationAutoScaling()->registerScalableTarget([ 'ServiceNamespace' => ApplicationAutoScaling::SERVICE_NAMESPACE, - 'ResourceId' => static::resourceId(), + 'ResourceId' => static::resourceId($this->group), 'ScalableDimension' => ApplicationAutoScaling::SCALABLE_DIMENSION, 'MinCapacity' => $min, 'MaxCapacity' => $max, @@ -106,7 +117,7 @@ public function deregister(): void { Aws::applicationAutoScaling()->deregisterScalableTarget([ 'ServiceNamespace' => ApplicationAutoScaling::SERVICE_NAMESPACE, - 'ResourceId' => static::resourceId(), + 'ResourceId' => static::resourceId($this->group), 
'ScalableDimension' => ApplicationAutoScaling::SCALABLE_DIMENSION, ]); } @@ -119,7 +130,7 @@ public function deregister(): void public function current(): ?array { try { - $target = ApplicationAutoScaling::scalableTarget(static::resourceId()); + $target = ApplicationAutoScaling::scalableTarget(static::resourceId($this->group)); return ['min' => (int) $target['MinCapacity'], 'max' => (int) $target['MaxCapacity']]; } catch (ResourceDoesNotExistException) { diff --git a/src/Resources/Ecs/EcsService.php b/src/Resources/Ecs/EcsService.php index 6dd42330..28d2e886 100644 --- a/src/Resources/Ecs/EcsService.php +++ b/src/Resources/Ecs/EcsService.php @@ -8,6 +8,7 @@ use Codinglabs\Yolo\Helpers; use Codinglabs\Yolo\Manifest; use Codinglabs\Yolo\Enums\Scope; +use Codinglabs\Yolo\Enums\ServerGroup; use Codinglabs\Yolo\Resources\Resource; use Codinglabs\Yolo\Resources\ResolvesTags; use Codinglabs\Yolo\Resources\Ec2\PublicSubnet; @@ -15,15 +16,35 @@ use Codinglabs\Yolo\Resources\Ec2\EcsTaskSecurityGroup; use Codinglabs\Yolo\Exceptions\ResourceDoesNotExistException; +/** + * One app's ECS service for a given workload group. Each group (web / queue / + * scheduler) gets its own service + task-definition family so they scale + * independently. The group defaults to web, so every bare `new EcsService()` + * keeps meaning the web service it always did. + * + * Topology follows the group: + * - web attaches to the ALB (target group, health-check grace, container port); + * queue and scheduler are headless workers. + * - the scheduler is a pinned singleton, deployed stop-then-start so a rollout + * never briefly runs two crons; web and queue roll the normal way. + * - desired count is create-only and owned by ops/autoscaling afterwards — sync + * never clobbers it. The queue starts at its autoscaling floor (0 when it + * scales to zero); web and scheduler start at one task. 
+ */ class EcsService implements Resource { use ResolvesTags; - protected const INITIAL_DESIRED_COUNT = 1; + public function __construct(protected ServerGroup $group = ServerGroup::WEB) {} + + public function group(): ServerGroup + { + return $this->group; + } public function name(): string { - return $this->keyedName('web'); + return $this->keyedName($this->group); } public function scope(): Scope @@ -58,12 +79,11 @@ public function synchroniseTags(bool $apply): array } /** - * Exec-command and grace-period drift are reconciled by updateService, so - * toggling tasks.web.enable-execute-command takes effect on the next sync. - * Desired count is NOT reconciled — capacity is set once at create then owned - * by ops (the console, a future `yolo scale`, or autoscaling), so a deploy/sync - * never resets it out from under a manual scale. Task definition revision - * adoption is owned by `yolo deploy`, not sync. + * Exec-command and (web only) grace-period drift are reconciled by + * updateService. Desired count is NOT reconciled — capacity is set once at + * create then owned by ops (the console, `yolo scale`, or autoscaling), so a + * deploy/sync never resets it out from under a manual scale. Task definition + * revision adoption is owned by `yolo deploy`, not sync. 
      */
     public function needsUpdate(): bool
     {
@@ -82,23 +102,24 @@ public function pendingChanges(): array
             Ecs::service((new EcsCluster())->name(), $this->name()),
             $this->gracePeriod(),
             $this->enableExecuteCommand(),
+            $this->reconcilesGracePeriod(),
         );
     }

-    public static function serviceNeedsUpdate(array $service, int $gracePeriod, bool $enableExecuteCommand): bool
+    public static function serviceNeedsUpdate(array $service, int $gracePeriod, bool $enableExecuteCommand, bool $reconcilesGracePeriod = true): bool
     {
-        return static::serviceChanges($service, $gracePeriod, $enableExecuteCommand) !== [];
+        return static::serviceChanges($service, $gracePeriod, $enableExecuteCommand, $reconcilesGracePeriod) !== [];
     }

     /**
      * Pure comparison — extracted so tests can pin headless / missing-grace-period
      * behaviour without mocking the ECS client. Exec-command drift is always
-     * reconciled; the grace period only when the service is ALB-attached (headless
-     * services have no grace period to reconcile).
+     * reconciled; the grace period only for an ALB-attached (web, non-headless)
+     * service — a headless web app or a queue/scheduler worker has none.
      *
      * @return array
      */
-    public static function serviceChanges(array $service, int $gracePeriod, bool $enableExecuteCommand): array
+    public static function serviceChanges(array $service, int $gracePeriod, bool $enableExecuteCommand, bool $reconcilesGracePeriod = true): array
     {
         $changes = [];

@@ -108,7 +129,7 @@ public static function serviceChanges(array $service, int $gracePeriod, bool $en
             $changes[] = Change::make('enableExecuteCommand', $currentExecuteCommand, $enableExecuteCommand);
         }

-        if (! Manifest::isHeadless()) {
+        if ($reconcilesGracePeriod) {
             $currentGracePeriod = $service['healthCheckGracePeriodSeconds'] ?? $gracePeriod;

             if ($currentGracePeriod !== $gracePeriod) {
@@ -129,17 +150,17 @@ public function createPayload(): array
         return [
             'cluster' => (new EcsCluster())->name(),
             'serviceName' => $this->name(),
-            // The task definition family is the web service name — SyncTaskDefinitionStep
+            // The task definition family is the service name — SyncTaskDefinitionStep
             // registers the family from this same value. TaskDef doesn't fit the Resource
             // shape (re-registered every sync, no exists/create distinction), so the family
             // is the service name rather than its own Resource.
             'taskDefinition' => $this->name(),
-            // Capacity isn't a manifest concern — start at one task and let ops
-            // scale it (console / `yolo scale` / autoscaling); never reconciled.
-            'desiredCount' => self::INITIAL_DESIRED_COUNT,
-            'launchType' => 'FARGATE',
-            ...Manifest::isHeadless() ? [] : ['healthCheckGracePeriodSeconds' => $this->gracePeriod()],
-            'deploymentConfiguration' => static::deploymentConfiguration(),
+            // Capacity isn't a manifest concern — start at the group's floor and let
+            // ops scale it (console / `yolo scale` / autoscaling); never reconciled.
+            'desiredCount' => $this->initialDesiredCount(),
+            ...$this->launchConfiguration(),
+            ...$this->attachesToLoadBalancer() ? ['healthCheckGracePeriodSeconds' => $this->gracePeriod()] : [],
+            'deploymentConfiguration' => $this->deploymentConfiguration(),
             'networkConfiguration' => [
                 'awsvpcConfiguration' => [
                     'subnets' => PublicSubnet::ids(),
@@ -147,41 +168,62 @@ public function createPayload(): array
                     'assignPublicIp' => 'ENABLED',
                 ],
             ],
-            ...Manifest::isHeadless() ? [] : [
+            ...$this->attachesToLoadBalancer() ? [
                 'loadBalancers' => [
                     [
                         'targetGroupArn' => (new TargetGroup())->arn(),
-                        'containerName' => 'web',
+                        'containerName' => $this->group->value,
                         'containerPort' => (int) Manifest::get('tasks.web.port', 8000),
                     ],
                 ],
-            ],
+            ] : [],
             'tags' => Aws::ecsTags($this->tags()),
             'propagateTags' => 'SERVICE',
             'enableExecuteCommand' => $this->enableExecuteCommand(),
         ];
     }

+    /**
+     * FARGATE by default. A standalone queue can opt into Spot (`tasks.queue.spot:
+     * true`) for ~70% cheaper interruptible capacity — fine for a worker whose
+     * jobs retry on interruption. Spot uses a capacity-provider strategy, which is
+     * mutually exclusive with launchType, so it's one or the other.
+     *
+     * @return array
+     */
+    protected function launchConfiguration(): array
+    {
+        if ($this->group === ServerGroup::QUEUE && $this->spot()) {
+            return ['capacityProviderStrategy' => [['capacityProvider' => 'FARGATE_SPOT', 'weight' => 1]]];
+        }
+
+        return ['launchType' => 'FARGATE'];
+    }
+
     /**
      * Roll one task in at a time (minimumHealthyPercent 100 keeps the old version
      * serving until the new one is healthy; maximumPercent 200 allows the extra
      * task), with the deployment circuit breaker aborting and rolling back to the
-     * last healthy revision on a failed rollout. The breaker is also what makes
-     * ECS set the deployment's rolloutState to FAILED — the signal
-     * WaitForDeploymentHealthyStep fast-fails on — so without it a crash-looping
-     * deploy is never marked failed and the health-wait eats its full timeout.
+     * last healthy revision on a failed rollout.
+     *
+     * The scheduler is the exception: it's a singleton, so it deploys stop-then-start
+     * (minimumHealthyPercent 0 / maximumPercent 100) — the old cron task stops
+     * before the new one starts, so a rollout never briefly runs two schedulers
+     * (a missed cron minute is harmless; a double-run isn't). The circuit breaker
+     * stays on either way — it's what makes ECS mark a broken deploy FAILED, the
+     * signal WaitForDeploymentHealthyStep fast-fails on.
      *
      * @return array
      */
-    public static function deploymentConfiguration(): array
+    public function deploymentConfiguration(): array
     {
         return [
             'deploymentCircuitBreaker' => [
                 'enable' => true,
                 'rollback' => true,
             ],
-            'minimumHealthyPercent' => 100,
-            'maximumPercent' => 200,
+            'minimumHealthyPercent' => $this->group->isSingleton() ? 0 : 100,
+            'maximumPercent' => $this->group->isSingleton() ? 100 : 200,
         ];
     }

@@ -192,15 +234,15 @@ public function updatePayload(): array
             'service' => $this->name(),
             'enableExecuteCommand' => $this->enableExecuteCommand(),
             // No desiredCount — capacity is create-only (see needsUpdate()).
-            ...Manifest::isHeadless() ? [] : ['healthCheckGracePeriodSeconds' => $this->gracePeriod()],
+            ...$this->attachesToLoadBalancer() ? ['healthCheckGracePeriodSeconds' => $this->gracePeriod()] : [],
         ];
     }

     public function enableExecuteCommand(): bool
     {
         return Helpers::validateStrictBool(
-            Manifest::get('tasks.web.enable-execute-command', false),
-            'tasks.web.enable-execute-command',
+            Manifest::get("{$this->group->manifestPrefix()}.enable-execute-command", false),
+            "{$this->group->manifestPrefix()}.enable-execute-command",
         );
     }

@@ -208,4 +250,40 @@ public function gracePeriod(): int
     {
         return (int) Manifest::get('tasks.web.health-check.grace-period', 60);
     }
+
+    /**
+     * The desired count to create the service at. The queue starts at its
+     * autoscaling floor — 0 when it scales to zero — so a fresh idle queue costs
+     * nothing; web and the scheduler start at one task.
+     */
+    protected function initialDesiredCount(): int
+    {
+        if ($this->group === ServerGroup::QUEUE) {
+            return (int) Manifest::get('tasks.queue.min', 0);
+        }
+
+        return 1;
+    }
+
+    /**
+     * Whether this service sits behind the ALB — web only, and only when the app
+     * isn't headless (no domain → no ALB to attach to).
+     */
+    protected function attachesToLoadBalancer(): bool
+    {
+        return $this->group->attachesToLoadBalancer() && ! Manifest::isHeadless();
+    }
+
+    protected function reconcilesGracePeriod(): bool
+    {
+        return $this->attachesToLoadBalancer();
+    }
+
+    protected function spot(): bool
+    {
+        return Helpers::validateStrictBool(
+            Manifest::get('tasks.queue.spot', false),
+            'tasks.queue.spot',
+        );
+    }
 }
diff --git a/src/Resources/Iam/DeployerPolicy.php b/src/Resources/Iam/DeployerPolicy.php
index c6e22d67..68a9b653 100644
--- a/src/Resources/Iam/DeployerPolicy.php
+++ b/src/Resources/Iam/DeployerPolicy.php
@@ -96,7 +96,18 @@ public function document(): array
         $ecrRepositoryArn = sprintf('arn:aws:ecr:%s:%s:repository/%s', $region, $accountId, (new EcrRepository())->name());

         $cluster = (new EcsCluster())->name();
-        $service = (new EcsService())->name(); // also the task-definition family
+
+        // Each service group (web + any standalone queue/scheduler) gets its own
+        // service + task-definition family (the family is the service name), so the
+        // deployer needs UpdateService/RegisterTaskDefinition scoped to all of them.
+        $serviceArns = [];
+        $taskDefinitionArns = [];
+
+        foreach (Manifest::serverGroups() as $group) {
+            $name = (new EcsService($group))->name();
+            $serviceArns[] = sprintf('arn:aws:ecs:%s:%s:service/%s/%s', $region, $accountId, $cluster, $name);
+            $taskDefinitionArns[] = sprintf('arn:aws:ecs:%s:%s:task-definition/%s:*', $region, $accountId, $name);
+        }

         $assetBucketArn = sprintf('arn:aws:s3:::%s', (new AssetBucket())->name());
         $artefactsBucketArn = sprintf('arn:aws:s3:::%s', Paths::s3ArtefactsBucket());
@@ -147,13 +158,13 @@ public function document(): array
                     ],
                 ],
                 [
-                    // Roll the new revision onto this app's service and run the one-off
+                    // Roll the new revision onto this app's services and run the one-off
'Effect' => 'Allow', 'Resource' => [ sprintf('arn:aws:ecs:%s:%s:cluster/%s', $region, $accountId, $cluster), - sprintf('arn:aws:ecs:%s:%s:service/%s/%s', $region, $accountId, $cluster, $service), - sprintf('arn:aws:ecs:%s:%s:task-definition/%s:*', $region, $accountId, $service), + ...$serviceArns, + ...$taskDefinitionArns, sprintf('arn:aws:ecs:%s:%s:task/%s/*', $region, $accountId, $cluster), ], 'Action' => [ diff --git a/src/ShutdownTimings.php b/src/ShutdownTimings.php index 94cbeabd..bbc768c2 100644 --- a/src/ShutdownTimings.php +++ b/src/ShutdownTimings.php @@ -2,8 +2,10 @@ namespace Codinglabs\Yolo; +use Codinglabs\Yolo\Enums\ServerGroup; + /** - * One source of truth for how the web container shuts down, so supervisord's + * One source of truth for how a container shuts down, so supervisord's * per-program stop waits and ECS's stopTimeout can't drift apart. * * Each process shares one key — `shutdown-grace-period`: how long it gets to finish work on @@ -29,6 +31,11 @@ class ShutdownTimings // The web process's graceful-stop window when not set in the manifest. private const WEB_DEFAULT_GRACE = 10; + // A standalone scheduler's graceful-stop window: long enough to let an + // in-flight schedule:run tick finish. A scheduled command that routinely + // outlasts this belongs on the queue, not the cron tick. + private const SCHEDULER_DEFAULT_GRACE = 10; + // Headroom between the longest graceful stop and ECS's SIGKILL so a process // draining right up to its window isn't cut off at the wire. private const STOP_TIMEOUT_BUFFER = 5; @@ -90,14 +97,42 @@ public static function stopTimeout(): int ); } - protected static function enabled(string $program): bool + /** + * The graceful-stop window for a standalone service's sole process, read from + * its own task block (`tasks.{group}.shutdown-grace-period`). The queue worker + * defaults longer (an in-flight job can run a while); the scheduler just needs + * its current cron tick to finish. 
+     */
+    public static function standaloneGrace(ServerGroup $group): int
     {
-        $value = Manifest::get("tasks.web.$program", false);
+        return match ($group) {
+            ServerGroup::WEB => static::webGrace(),
+            ServerGroup::QUEUE => (int) Manifest::get('tasks.queue.shutdown-grace-period', static::QUEUE_DEFAULT_GRACE),
+            ServerGroup::SCHEDULER => (int) Manifest::get('tasks.scheduler.shutdown-grace-period', static::SCHEDULER_DEFAULT_GRACE),
+        };
+    }

-        // The object form (with overrides like shutdown-grace-period) means enabled; a bare
-        // flag still goes through strict validation so a typo can't silently
-        // disable a process.
-        return is_array($value) || Helpers::validateStrictBool($value, "tasks.web.$program");
+    /**
+     * ECS's SIGTERM-to-SIGKILL ceiling for a service's task. The web container
+     * bundles several processes behind the ALB drain, so it uses the full
+     * drain-plus-slowest-program calc. A standalone queue/scheduler runs one
+     * process with no ALB to drain, so it just needs its grace plus buffer.
+     */
+    public static function stopTimeoutFor(ServerGroup $group): int
+    {
+        if ($group === ServerGroup::WEB) {
+            return static::stopTimeout();
+        }
+
+        return min(static::standaloneGrace($group) + static::STOP_TIMEOUT_BUFFER, static::MAX_STOP_TIMEOUT);
+    }
+
+    protected static function enabled(string $program): bool
+    {
+        // Whether the web container bundles this program — the object form (with
+        // overrides like shutdown-grace-period) means enabled; a bare flag still
+        // goes through strict validation so a typo can't silently disable it.
+        return Manifest::bundles($program);
     }

     protected static function grace(string $program, int $default): int
diff --git a/src/Steps/Build/Fargate/GenerateEntrypointScriptStep.php b/src/Steps/Build/Fargate/GenerateEntrypointScriptStep.php
index 76526e8f..8583797f 100644
--- a/src/Steps/Build/Fargate/GenerateEntrypointScriptStep.php
+++ b/src/Steps/Build/Fargate/GenerateEntrypointScriptStep.php
@@ -5,10 +5,29 @@
 use Codinglabs\Yolo\Paths;
 use Codinglabs\Yolo\Manifest;
 use Codinglabs\Yolo\Contracts\Step;
+use Codinglabs\Yolo\ProcessCommands;
 use Codinglabs\Yolo\ShutdownTimings;
 use Codinglabs\Yolo\Enums\StepResult;
 use Illuminate\Filesystem\Filesystem;
-
+use Codinglabs\Yolo\Enums\ServerGroup;
+
+/**
+ * Generates the container entrypoint into the build context. One image serves
+ * every workload — the ECS task definition passes the role (web | queue |
+ * scheduler) as the container command, and the entrypoint dispatches on it:
+ *
+ * - web → supervisord (octane + any bundled queue/scheduler), drained
+ * behind the ALB so a deploy doesn't 502. The default role.
+ * - queue → the queue worker; queue:work finishes its in-flight job on
+ * SIGTERM, so the generic supervise-and-forward is the whole drain.
+ * - scheduler → busybox crond; the drain halts cron and waits out any in-flight
+ * schedule:run so a deploy never cuts a scheduled command short.
+ *
+ * The queue and scheduler branches are emitted only when the app extracts them
+ * into their own service (a top-level tasks.queue / tasks.scheduler block), so a
+ * plain web app's entrypoint mentions neither. deploy-all: hooks run on every
+ * container start regardless of role.
+ */
 class GenerateEntrypointScriptStep implements Step
 {
     public function __construct(
@@ -19,38 +38,39 @@ public function __construct(
     public function __invoke(): StepResult
     {
         $deployAll = Manifest::get('deploy-all', []);
-        $drain = ShutdownTimings::drain();
-        $graces = ShutdownTimings::programGraces();

         $body = "#!/bin/sh\n"
-            . "# Auto-generated by YOLO. Runs deploy-all: hooks, then supervises the CMD\n"
-            . "# with a lame-duck drain so the ALB stops routing before the server stops.\n"
+            . "# Auto-generated by YOLO. Runs deploy-all: hooks, then supervises the\n"
+            . "# role's process (web | queue | scheduler — passed as the container command)\n"
+            . "# with a per-role graceful drain.\n"
             . "set -e\n\n";

         foreach ($deployAll as $command) {
             $body .= $command . "\n";
         }

-        // Past startup. ECS sends SIGTERM at the same moment it deregisters the task,
-        // but the ALB takes a few seconds to actually stop routing to a draining
-        // target. Backgrounding the CMD lets us trap that SIGTERM and keep serving for
-        // the web shutdown-grace-period window before forwarding the stop, so requests the ALB
-        // sends mid-drain still land on a live server instead of 502ing. A headless
-        // app has no target group to drain, so the sleep is dropped and we stop at once.
-        $drainBody = $this->drainBody($drain, $graces);
-
         $body .= sprintf(<<<'SH'
+role="${1:-web}"
+
 set +e
 draining=0
 drain() {
     draining=1
-%s    kill -TERM "$child" 2>/dev/null
+    case "$role" in
+        web)
+%s            ;;
+%s    esac
+    kill -TERM "$child" 2>/dev/null
 }
 trap drain TERM

-"$@" &
+case "$role" in
+%s    *) cmd='supervisord -c /etc/supervisord.conf -n' ;;
+esac
+
+$cmd &
 child=$!
 wait "$child"
 status=$?
@@ -64,7 +84,11 @@ public function __invoke(): StepResult
 exit "$status"

-SH, $drainBody);
+SH,
+            $this->indent($this->webDrainBody(), 12),
+            $this->schedulerDrainCase(),
+            $this->commandCases(),
+        );

         $path = Paths::build('.yolo-entrypoint.sh');

@@ -76,34 +100,110 @@ public function __invoke(): StepResult
     }

     /**
-     * The body of the trap that runs on SIGTERM, before the stop is forwarded to
-     * supervisord. With a scheduler it halts cron (no new schedule:run) and waits
-     * out any in-flight run; otherwise it's just the lame-duck sleep for the ALB.
-     *
-     * @param array $graces
+     * The cmd dispatch branches for the roles this app extracts into their own
+     * service. Web is the `*)` default (supervisord), so it needs no branch.
+     */
+    protected function commandCases(): string
+    {
+        $cases = '';
+
+        if (Manifest::hasStandaloneQueue()) {
+            $cases .= sprintf(" queue) cmd='%s' ;;\n", ProcessCommands::queue());
+        }
+
+        if (Manifest::hasStandaloneScheduler()) {
+            $cases .= sprintf(" scheduler) cmd='%s' ;;\n", ProcessCommands::scheduler());
+        }
+
+        return $cases;
+    }
+
+    /**
+     * The standalone scheduler's drain branch, emitted only when the scheduler is
+     * its own service. The queue role needs no drain branch — queue:work finishes
+     * its in-flight job on the generic SIGTERM forward.
+     */
+    protected function schedulerDrainCase(): string
+    {
+        if (! Manifest::hasStandaloneScheduler()) {
+            return '';
+        }
+
+        return sprintf(" scheduler)\n%s ;;\n", $this->indent($this->schedulerDrainBody(), 12));
+    }
+
+    /**
+     * The web role's drain, run before the stop is forwarded to supervisord. ECS
+     * sends SIGTERM the moment it deregisters the task, but the ALB takes a few
+     * seconds to stop routing to a draining target, so we keep serving for the
+     * drain window first. With a bundled scheduler it also halts cron and waits
+     * out any in-flight schedule:run. A headless app has no target group, so the
+     * sleep is dropped and we stop at once.
      */
-    protected function drainBody(int $drain, array $graces): string
+    protected function webDrainBody(): string
     {
+        $drain = ShutdownTimings::drain();
+        $graces = ShutdownTimings::programGraces();
+
         if (! isset($graces['scheduler'])) {
-            return $drain > 0 ? " sleep $drain\n" : '';
+            return $drain > 0 ? "sleep $drain\n" : '';
         }

         // Stop cron first so no new schedule:run fires during the drain, then hold
         // the container open until the drain window has elapsed *and* any in-flight
         // schedule:run has finished — bounded by the scheduler's grace so a long job
-        // can't stall the deploy past the stopTimeout. A scheduled command is never
-        // cut mid-run; one that outlasts the window belongs on the queue instead.
+        // can't stall the deploy past the stopTimeout.
         return sprintf(<<<'SH'
-    supervisorctl -c /etc/supervisord.conf stop scheduler >/dev/null 2>&1
-    waited=0
-    while [ "$waited" -lt %d ]; do
-        if [ "$waited" -ge %d ] && ! pgrep -f 'artisan schedule:run' >/dev/null 2>&1; then
-            break
-        fi
-        sleep 1
-        waited=$((waited + 1))
-    done
+supervisorctl -c /etc/supervisord.conf stop scheduler >/dev/null 2>&1
+waited=0
+while [ "$waited" -lt %d ]; do
+    if [ "$waited" -ge %d ] && ! pgrep -f 'artisan schedule:run' >/dev/null 2>&1; then
+        break
+    fi
+    sleep 1
+    waited=$((waited + 1))
+done

 SH, max($drain, $graces['scheduler']), $drain);
     }
+
+    /**
+     * The standalone scheduler role's drain: stop crond (the child) so no new
+     * schedule:run fires, then wait out any in-flight run bounded by the
+     * scheduler's grace. Killing crond here is harmless when the generic
+     * forwarder kills the (already-dead) child again afterwards.
+     */
+    protected function schedulerDrainBody(): string
+    {
+        return sprintf(<<<'SH'
+kill -TERM "$child" 2>/dev/null
+waited=0
+while [ "$waited" -lt %d ]; do
+    if ! pgrep -f 'artisan schedule:run' >/dev/null 2>&1; then
+        break
+    fi
+    sleep 1
+    waited=$((waited + 1))
+done
+
+SH, ShutdownTimings::standaloneGrace(ServerGroup::SCHEDULER));
+    }
+
+    /**
+     * Indent every non-empty line of a generated block by $spaces, so the drain
+     * bodies sit at the right depth inside the entrypoint's case statement.
+     */
+    protected function indent(string $block, int $spaces): string
+    {
+        if ($block === '') {
+            return '';
+        }
+
+        $pad = str_repeat(' ', $spaces);
+
+        return implode("\n", array_map(
+            fn (string $line) => $line === '' ? '' : $pad . $line,
+            explode("\n", rtrim($block, "\n")),
+        )) . "\n";
+    }
 }
diff --git a/src/Steps/Build/Fargate/GenerateSupervisorConfigStep.php b/src/Steps/Build/Fargate/GenerateSupervisorConfigStep.php
index 44d3417b..4532be47 100644
--- a/src/Steps/Build/Fargate/GenerateSupervisorConfigStep.php
+++ b/src/Steps/Build/Fargate/GenerateSupervisorConfigStep.php
@@ -3,8 +3,8 @@
 namespace Codinglabs\Yolo\Steps\Build\Fargate;

 use Codinglabs\Yolo\Paths;
-use Codinglabs\Yolo\Manifest;
 use Codinglabs\Yolo\Contracts\Step;
+use Codinglabs\Yolo\ProcessCommands;
 use Codinglabs\Yolo\ShutdownTimings;
 use Codinglabs\Yolo\Enums\StepResult;
 use Illuminate\Filesystem\Filesystem;
@@ -55,23 +55,20 @@ protected function config(array $graces): string
         // matches the container stopTimeout derived from the same source.
         $blocks = [
             $this->header(),
-            $this->program('octane', sprintf(
-                'php artisan octane:frankenphp --host=0.0.0.0 --port=%d',
-                (int) Manifest::get('tasks.web.port', 8000),
-            ), stopwaitsecs: $graces['octane']),
+            $this->program('octane', ProcessCommands::octane(), stopwaitsecs: $graces['octane']),
         ];

         if (isset($graces['scheduler'])) {
             // Cron (crond) fires an ephemeral schedule:run each minute rather than a
             // long-lived schedule:work daemon — so the trigger halts cleanly on
             // shutdown and only the in-flight run is waited out (see the entrypoint).
-            $blocks[] = $this->program('scheduler', 'crond -f -d 8 -c /app/docker/crontabs', stopwaitsecs: $graces['scheduler']);
+            $blocks[] = $this->program('scheduler', ProcessCommands::scheduler(), stopwaitsecs: $graces['scheduler']);
         }

         if (isset($graces['queue'])) {
             // Longer stop wait so an in-flight job can finish on SIGTERM before
             // supervisor force-kills the worker.
-            $blocks[] = $this->program('queue', 'php artisan queue:work --tries=3 --max-time=3600', stopwaitsecs: $graces['queue']);
+            $blocks[] = $this->program('queue', ProcessCommands::queue(), stopwaitsecs: $graces['queue']);
         }

         return implode("\n\n", $blocks) . "\n";
diff --git a/src/Steps/Deploy/RegisterTaskDefinitionRevisionStep.php b/src/Steps/Deploy/RegisterTaskDefinitionRevisionStep.php
index 412f5056..7418d66f 100644
--- a/src/Steps/Deploy/RegisterTaskDefinitionRevisionStep.php
+++ b/src/Steps/Deploy/RegisterTaskDefinitionRevisionStep.php
@@ -6,17 +6,28 @@
 use Illuminate\Support\Arr;
 use Codinglabs\Yolo\Contracts\Step;
 use Codinglabs\Yolo\Enums\StepResult;
+use Codinglabs\Yolo\Concerns\ResolvesServerGroups;
 use Codinglabs\Yolo\Steps\Sync\App\SyncTaskDefinitionStep;

 class RegisterTaskDefinitionRevisionStep implements Step
 {
+    use ResolvesServerGroups;
+
     public function __construct(protected string $environment) {}

+    /**
+     * Mint a fresh, immutable task-definition revision (stamped with this deploy's
+     * image tag) for each targeted service group — every group the app runs by
+     * default, or the subset named by --group — so UpdateEcsServiceStep can roll
+     * each service onto it.
+     */
     public function __invoke(array $options): StepResult
     {
-        Aws::ecs()->registerTaskDefinition(
-            SyncTaskDefinitionStep::payload(Arr::get($options, 'app-version'))
-        );
+        foreach ($this->resolveServerGroups(Arr::get($options, 'group')) as $group) {
+            Aws::ecs()->registerTaskDefinition(
+                SyncTaskDefinitionStep::payload($group, Arr::get($options, 'app-version'))
+            );
+        }

         return StepResult::CREATED;
     }
diff --git a/src/Steps/Deploy/UpdateEcsServiceStep.php b/src/Steps/Deploy/UpdateEcsServiceStep.php
index b2dee39b..43c2afb1 100644
--- a/src/Steps/Deploy/UpdateEcsServiceStep.php
+++ b/src/Steps/Deploy/UpdateEcsServiceStep.php
@@ -3,26 +3,39 @@
 namespace Codinglabs\Yolo\Steps\Deploy;

 use Codinglabs\Yolo\Aws;
+use Illuminate\Support\Arr;
 use Codinglabs\Yolo\Contracts\Step;
 use Codinglabs\Yolo\Enums\StepResult;
 use Codinglabs\Yolo\Resources\Ecs\EcsCluster;
 use Codinglabs\Yolo\Resources\Ecs\EcsService;
+use Codinglabs\Yolo\Concerns\ResolvesServerGroups;

 class UpdateEcsServiceStep implements Step
 {
+    use ResolvesServerGroups;
+
     public function __construct(protected string $environment) {}

-    public function __invoke(): StepResult
+    /**
+     * Roll each targeted service group (every group the app runs, or the --group
+     * subset) onto the revision RegisterTaskDefinitionRevisionStep just minted.
+     * Each service's task-definition family is its own name (see EcsService), so
+     * pointing the service at its family adopts that group's newest revision.
+     */
+    public function __invoke(array $options): StepResult
     {
-        $service = new EcsService();
-
-        Aws::ecs()->updateService([
-            'cluster' => (new EcsCluster())->name(),
-            'service' => $service->name(),
-            // The task definition family is the web service name (see EcsService).
-            'taskDefinition' => $service->name(),
-            'forceNewDeployment' => true,
-        ]);
+        $cluster = (new EcsCluster())->name();
+
+        foreach ($this->resolveServerGroups(Arr::get($options, 'group')) as $group) {
+            $service = new EcsService($group);
+
+            Aws::ecs()->updateService([
+                'cluster' => $cluster,
+                'service' => $service->name(),
+                'taskDefinition' => $service->name(),
+                'forceNewDeployment' => true,
+            ]);
+        }

         return StepResult::SYNCED;
     }
diff --git a/src/Steps/Deploy/WaitForDeploymentHealthyStep.php b/src/Steps/Deploy/WaitForDeploymentHealthyStep.php
index f327d05b..712cbb64 100644
--- a/src/Steps/Deploy/WaitForDeploymentHealthyStep.php
+++ b/src/Steps/Deploy/WaitForDeploymentHealthyStep.php
@@ -4,15 +4,20 @@

 use RuntimeException;
 use Codinglabs\Yolo\Aws;
+use Illuminate\Support\Arr;
 use Codinglabs\Yolo\Aws\Ecs;
 use Codinglabs\Yolo\Contracts\Step;
 use Codinglabs\Yolo\Enums\StepResult;
+use Codinglabs\Yolo\Enums\ServerGroup;
 use Codinglabs\Yolo\Resources\Ecs\EcsCluster;
 use Codinglabs\Yolo\Resources\Ecs\EcsService;
 use Codinglabs\Yolo\Resources\ElbV2\TargetGroup;
+use Codinglabs\Yolo\Concerns\ResolvesServerGroups;

 class WaitForDeploymentHealthyStep implements Step
 {
+    use ResolvesServerGroups;
+
     public function __construct(protected string $environment) {}

     /**
@@ -25,6 +30,14 @@ public function __construct(protected string $environment) {}
      */
     public function __invoke(array $options): StepResult
     {
+        // The health gate is ALB-based, so it only applies to the web service. A
+        // deploy that targets only the queue/scheduler (--group) has no ALB rollout
+        // to wait on — the ECS circuit breaker still auto-rolls-back a broken
+        // headless deploy — so skip the wait entirely.
+        if (! in_array(ServerGroup::WEB, $this->resolveServerGroups(Arr::get($options, 'group')), true)) {
+            return StepResult::SKIPPED;
+        }
+
         $cluster = (new EcsCluster())->name();
         $service = (new EcsService())->name();
         $targetGroupArn = (new TargetGroup())->arn();
diff --git a/src/Steps/Sync/App/SyncEcsServiceStep.php b/src/Steps/Sync/App/SyncEcsServiceStep.php
index ebc0a97e..dad5fb99 100644
--- a/src/Steps/Sync/App/SyncEcsServiceStep.php
+++ b/src/Steps/Sync/App/SyncEcsServiceStep.php
@@ -5,6 +5,7 @@
 use Illuminate\Support\Arr;
 use Codinglabs\Yolo\Contracts\Step;
 use Codinglabs\Yolo\Enums\StepResult;
+use Codinglabs\Yolo\Enums\ServerGroup;
 use Codinglabs\Yolo\Resources\Ecs\EcsService;
 use Codinglabs\Yolo\Concerns\SynchronisesResource;

@@ -12,9 +13,19 @@ class SyncEcsServiceStep implements Step
 {
     use SynchronisesResource;

+    /**
+     * The workload group this step syncs a service for — web here; the
+     * queue/scheduler subclasses override it. Standalone queue/scheduler steps are
+     * only wired into sync:app when their block is present.
+     */
+    protected function group(): ServerGroup
+    {
+        return ServerGroup::WEB;
+    }
+
     public function __invoke(array $options): StepResult
     {
-        $service = new EcsService();
+        $service = new EcsService($this->group());

         // Task definition revision adoption is owned by `yolo deploy`, not sync —
         // sync reconciles only the slow-moving service-level knobs.
diff --git a/src/Steps/Sync/App/SyncQueueScalableTargetStep.php b/src/Steps/Sync/App/SyncQueueScalableTargetStep.php
new file mode 100644
index 00000000..2834f14c
--- /dev/null
+++ b/src/Steps/Sync/App/SyncQueueScalableTargetStep.php
@@ -0,0 +1,19 @@
+min() !== 0 || ! (new EcsService(ServerGroup::QUEUE))->exists()) {
+            return StepResult::SKIPPED;
+        }
+
+        $dryRun = (bool) Arr::get($options, 'dry-run');
+        $bootstrap = new QueueScaleToZeroBootstrap();
+        $existed = $bootstrap->exists();
+
+        $changes = $bootstrap->synchronise(apply: ! $dryRun);
+
+        $this->recordChanges($changes);
+
+        if (! $existed) {
+            return $dryRun ? StepResult::WOULD_CREATE : StepResult::CREATED;
+        }
+
+        return StepResult::SYNCED;
+    }
+}
diff --git a/src/Steps/Sync/App/SyncQueueScalingPolicyStep.php b/src/Steps/Sync/App/SyncQueueScalingPolicyStep.php
new file mode 100644
index 00000000..79a6bdca
--- /dev/null
+++ b/src/Steps/Sync/App/SyncQueueScalingPolicyStep.php
@@ -0,0 +1,49 @@
+exists()) {
+            return StepResult::SKIPPED;
+        }
+
+        $dryRun = (bool) Arr::get($options, 'dry-run');
+        $policy = new QueueBacklogPolicy();
+        $existed = $policy->exists();
+
+        $changes = $policy->synchronise(apply: ! $dryRun);
+
+        $this->recordChanges($changes);
+
+        if (! $existed) {
+            return $dryRun ? StepResult::WOULD_CREATE : StepResult::CREATED;
+        }
+
+        if ($changes !== []) {
+            return $dryRun ? StepResult::WOULD_SYNC : StepResult::SYNCED;
+        }
+
+        return StepResult::SYNCED;
+    }
+}
diff --git a/src/Steps/Sync/App/SyncQueueServiceStep.php b/src/Steps/Sync/App/SyncQueueServiceStep.php
new file mode 100644
index 00000000..00183d61
--- /dev/null
+++ b/src/Steps/Sync/App/SyncQueueServiceStep.php
@@ -0,0 +1,17 @@
+exists()) {
+        if (! (new EcsService($this->group()))->exists()) {
             return StepResult::SKIPPED;
         }

         $dryRun = (bool) Arr::get($options, 'dry-run');
-        $target = new ScalableTarget();
+        $target = new ScalableTarget($this->group());
         $live = $target->current();

         if (! Manifest::has('tasks.web.autoscaling')) {
@@ -68,7 +78,8 @@ public function __invoke(array $options): StepResult

         if (! $dryRun && static::wouldReduce($target, $live) && static::unattended($options)) {
             warning(sprintf(
-                'Skipped the web autoscaling reduction: manifest bounds (%d–%d) are below live (%d–%d). Lower capacity with an interactive `yolo sync` or `yolo scale` — never unattended.',
+                'Skipped the %s autoscaling reduction: manifest bounds (%d–%d) are below live (%d–%d). Lower capacity with an interactive `yolo sync` or `yolo scale` — never unattended.',
+                $this->group()->value,
                 $target->min(),
                 $target->max(),
                 $live['min'],
diff --git a/src/Steps/Sync/App/SyncSchedulerServiceStep.php b/src/Steps/Sync/App/SyncSchedulerServiceStep.php
new file mode 100644
index 00000000..129dd44f
--- /dev/null
+++ b/src/Steps/Sync/App/SyncSchedulerServiceStep.php
@@ -0,0 +1,18 @@
+registerTaskDefinition(static::payload());
+        Aws::ecs()->registerTaskDefinition(static::payload($this->group()));

         return StepResult::SYNCED;
     }

-    public static function payload(?string $imageTag = null): array
+    /**
+     * The workload group this step registers a task definition for — web here;
+     * the queue/scheduler subclasses override it. Standalone queue/scheduler steps
+     * are only wired into sync:app when their block is present.
+     */
+    protected function group(): ServerGroup
     {
-        $port = (int) Manifest::get('tasks.web.port', 8000);
-        $cpu = (string) Manifest::get('tasks.web.cpu', '512');
-        $memory = (string) Manifest::get('tasks.web.memory', '1024');
-
-        $image = (new EcrRepository())->uri() . ':' . ($imageTag ?? 'latest');
+        return ServerGroup::WEB;
+    }

-        $taskRoleArn = Manifest::has('tasks.web.task-role')
-            ? Manifest::get('tasks.web.task-role')
-            : (new EcsTaskRole())->arn();
+    public static function payload(ServerGroup $group = ServerGroup::WEB, ?string $imageTag = null): array
+    {
+        $prefix = $group->manifestPrefix();
+        $cpu = (string) Manifest::get("$prefix.cpu", $group->defaultCpu());
+        $memory = (string) Manifest::get("$prefix.memory", $group->defaultMemory());

-        $executionRoleArn = Manifest::has('tasks.web.execution-role')
-            ? Manifest::get('tasks.web.execution-role')
-            : (new EcsExecutionRole())->arn();
+        $image = (new EcrRepository())->uri() . ':' . ($imageTag ?? 'latest');

-        // The family is the web service name — EcsService points its `taskDefinition`
+        // The family is the service name — EcsService points its `taskDefinition`
         // at the same value, so they stay in lockstep. The task definition isn't its
         // own Resource (re-registered every sync — no exists/create distinction).
-        $family = (new EcsService())->name();
+        $family = (new EcsService($group))->name();

         // ECS's SIGTERM-to-SIGKILL ceiling. Derived from the same source as the
         // entrypoint drain and supervisord's stop waits so a long drain or queue
         // job isn't cut short by SIGKILL mid-shutdown.
-        $stopTimeout = ShutdownTimings::stopTimeout();
+        $stopTimeout = ShutdownTimings::stopTimeoutFor($group);

         return [
             'family' => $family,
@@ -59,30 +70,37 @@ public static function payload(?string $imageTag = null): array
             'requiresCompatibilities' => ['FARGATE'],
             'cpu' => $cpu,
             'memory' => $memory,
-            'executionRoleArn' => $executionRoleArn,
-            'taskRoleArn' => $taskRoleArn,
+            'executionRoleArn' => static::executionRoleArn(),
+            'taskRoleArn' => static::taskRoleArn(),
             'containerDefinitions' => [
                 [
-                    'name' => 'web',
+                    'name' => $group->value,
                     'image' => $image,
                     'essential' => true,
+                    // The container command is the role — the entrypoint dispatches
+                    // on it (web → supervisord, queue → worker, scheduler → cron).
+                    'command' => [$group->value],
                     'stopTimeout' => $stopTimeout,
                     'linuxParameters' => [
                         'initProcessEnabled' => true,
                     ],
-                    'portMappings' => [
-                        [
-                            'containerPort' => $port,
-                            'hostPort' => $port,
-                            'protocol' => 'tcp',
+                    // Only the web container is reached over the network (the ALB);
+                    // queue and scheduler are headless and map no port.
+                    ...$group->attachesToLoadBalancer() ? [
+                        'portMappings' => [
+                            [
+                                'containerPort' => (int) Manifest::get('tasks.web.port', 8000),
+                                'hostPort' => (int) Manifest::get('tasks.web.port', 8000),
+                                'protocol' => 'tcp',
+                            ],
                         ],
-                    ],
+                    ] : [],
                     'logConfiguration' => [
                         'logDriver' => 'awslogs',
                         'options' => [
                             'awslogs-group' => (new TaskLogGroup())->name(),
                             'awslogs-region' => Manifest::get('region'),
-                            'awslogs-stream-prefix' => 'web',
+                            'awslogs-stream-prefix' => $group->value,
                         ],
                     ],
                 ],
@@ -90,4 +108,18 @@ public static function payload(?string $imageTag = null): array
             'tags' => Aws::ecsTags(['Name' => $family]),
         ];
     }
+
+    protected static function taskRoleArn(): string
+    {
+        return Manifest::has('tasks.web.task-role')
+            ? Manifest::get('tasks.web.task-role')
+            : (new EcsTaskRole())->arn();
+    }
+
+    protected static function executionRoleArn(): string
+    {
+        return Manifest::has('tasks.web.execution-role')
+            ? Manifest::get('tasks.web.execution-role')
+            : (new EcsExecutionRole())->arn();
+    }
 }
diff --git a/stubs/Dockerfile.stub b/stubs/Dockerfile.stub
index c7dd0c4f..d613f8e2 100644
--- a/stubs/Dockerfile.stub
+++ b/stubs/Dockerfile.stub
@@ -1,7 +1,9 @@
 # Scaffolded by `yolo init`. Owns the base image + PHP extensions — customise
-# freely. YOLO generates the .yolo-entrypoint.sh (deploy-all hooks) and
-# docker/supervisord.conf (octane + queue + scheduler from tasks.web.*) into the
-# build context at build time, so this Dockerfile just runs them.
+# freely. YOLO generates the .yolo-entrypoint.sh (deploy-all hooks + per-role
+# dispatch) and docker/supervisord.conf (octane + any bundled queue/scheduler
+# from tasks.web.*) into the build context at build time, so this Dockerfile just
+# runs them. The entrypoint takes the role (web | queue | scheduler) as its
+# argument; each ECS task definition passes its own, so one image serves all.
FROM dunglas/frankenphp:1-php8.4-alpine RUN apk add --no-cache git supervisor \ @@ -20,7 +22,8 @@ USER www-data ENV SERVER_NAME=:8000 EXPOSE 8000 -# supervisord (PID 1 after the entrypoint exec) runs the supervised processes; -# ECS sends SIGTERM to it, which stops the children gracefully. +# The entrypoint dispatches on the role argument (default web → supervisord). +# Each ECS task definition overrides this with its own role; ECS sends SIGTERM +# on stop, which the entrypoint traps to drain before forwarding it. ENTRYPOINT ["/app/.yolo-entrypoint.sh"] -CMD ["supervisord", "-n"] +CMD ["web"] diff --git a/stubs/yolo.yml.stub b/stubs/yolo.yml.stub index 3af903a9..2742c352 100644 --- a/stubs/yolo.yml.stub +++ b/stubs/yolo.yml.stub @@ -43,13 +43,16 @@ environments: # queue: # shutdown-grace-period: 90 - # Independent task groups — not yet implemented. Today the web task runs - # octane + queue:work + the scheduler together (toggle the flags above). - # When you need to scale the web tier without duplicating the scheduler, - # these become their own ECS services: + # Extract a workload into its own ECS service so it scales independently of + # web. Opt in by uncommenting — but configure each workload in ONE place: + # either bundled above (web.queue / web.scheduler) or extracted here, never + # both. A standalone queue scales to zero by default; the scheduler is a + # pinned singleton (exactly one task), which drops the onOneServer() need. 
# queue: - # cpu: '256' - # memory: '512' + # min: 0 # 0 = scale to zero when idle (default) + # max: 10 + # backlog-per-task: 100 # target messages per running task + # spot: true # ~70% cheaper interruptible capacity # scheduler: # cpu: '256' # memory: '512' diff --git a/tests/Unit/Commands/CommandManifestIntegrityTest.php b/tests/Unit/Commands/CommandManifestIntegrityTest.php index 73afd29c..48889f47 100644 --- a/tests/Unit/Commands/CommandManifestIntegrityTest.php +++ b/tests/Unit/Commands/CommandManifestIntegrityTest.php @@ -137,3 +137,35 @@ function writeRawManifest(array $manifest): void expect(invokeManifestIntegrity())->toBeTrue(); }); + +it('bails when the queue is both bundled and a standalone service', function () { + writeManifest([ + 'account-id' => '848509375702', 'region' => 'ap-southeast-2', + 'tasks' => ['web' => ['queue' => true], 'queue' => ['min' => 0]], + ]); + + expect(invokeManifestIntegrity())->toBeFalse(); + + $output = test()->promptOutput->fetch(); + expect($output)->toContain('tasks.web.queue'); + expect($output)->toContain('tasks.queue'); +}); + +it('bails when the scheduler is both bundled and a standalone service', function () { + writeManifest([ + 'account-id' => '848509375702', 'region' => 'ap-southeast-2', + 'tasks' => ['web' => ['scheduler' => true], 'scheduler' => []], + ]); + + expect(invokeManifestIntegrity())->toBeFalse(); + expect(test()->promptOutput->fetch())->toContain('tasks.scheduler'); +}); + +it('accepts a bundled queue with a standalone scheduler (mix and match per workload)', function () { + writeManifest([ + 'account-id' => '848509375702', 'region' => 'ap-southeast-2', + 'tasks' => ['web' => ['queue' => true], 'scheduler' => []], + ]); + + expect(invokeManifestIntegrity())->toBeTrue(); +}); diff --git a/tests/Unit/Commands/RunCommandTest.php b/tests/Unit/Commands/RunCommandTest.php index 8ec174f9..c1f939bf 100644 --- a/tests/Unit/Commands/RunCommandTest.php +++ b/tests/Unit/Commands/RunCommandTest.php @@ -7,6 +7,7 
@@ cluster: 'yolo-production-codinglabs', task: 'arn:aws:ecs:ap-southeast-2:111:task/abc', command: '/bin/sh', + container: 'web', region: 'ap-southeast-2', profile: 'codinglabs', ); @@ -23,11 +24,25 @@ ]); }); +it('targets the container named after the service group', function () { + $args = RunCommand::executeCommandArgs( + cluster: 'yolo-production-codinglabs', + task: 'task-arn', + command: '/bin/sh', + container: 'queue', + region: 'ap-southeast-2', + profile: null, + ); + + expect($args)->toContain('--container', 'queue'); +}); + it('omits --profile when none is configured (e.g. running on AWS)', function () { $args = RunCommand::executeCommandArgs( cluster: 'yolo-production-codinglabs', task: 'task-arn', command: 'php artisan migrate --force', + container: 'web', region: 'ap-southeast-2', profile: null, ); diff --git a/tests/Unit/Commands/ScaleCommandTest.php b/tests/Unit/Commands/ScaleCommandTest.php index 21eb74d3..d764a806 100644 --- a/tests/Unit/Commands/ScaleCommandTest.php +++ b/tests/Unit/Commands/ScaleCommandTest.php @@ -64,9 +64,46 @@ function invokeScale(array $arguments = [], array $options = [], string $environ invokeScale(options: ['scheduler' => true]); })->throwsNoExceptions(); -it('errors on --queue (not yet a separate service)', function () { - invokeScale(options: ['queue' => true]); -})->throwsNoExceptions(); +it('queue: writes tasks.queue bounds and registers, allowing a zero floor', function () { + writeManifest([ + 'account-id' => '111111111111', + 'region' => 'ap-southeast-2', + 'tasks' => ['web' => [], 'queue' => []], + ]); + + $ecs = []; + $aa = []; + bindRoutedEcsClient(['DescribeServices' => new Result(['services' => [['status' => 'ACTIVE', 'desiredCount' => 0, 'runningCount' => 0]]])], $ecs); + bindMockApplicationAutoScalingClient([ + // Green-field queue — no target registered yet, so setting a zero floor + // isn't a reduction and applies straight through. 
+ 'DescribeScalableTargets' => new Result(['ScalableTargets' => []]), + 'RegisterScalableTarget' => new Result([]), + ], $aa); + + invokeScale(options: ['queue' => true, 'min' => '0', 'max' => '20']); + + expect(collect($aa)->firstWhere('name', 'RegisterScalableTarget')['args'])->toMatchArray(['MinCapacity' => 0, 'MaxCapacity' => 20]); + expect(Manifest::get('tasks.queue.min'))->toBe(0); + expect(Manifest::get('tasks.queue.max'))->toBe(20); +}); + +it('queue: rejects a fixed desired count (always autoscaling-managed)', function () { + writeManifest([ + 'account-id' => '111111111111', + 'region' => 'ap-southeast-2', + 'tasks' => ['web' => [], 'queue' => []], + ]); + + $ecs = []; + $aa = []; + bindRoutedEcsClient(['DescribeServices' => new Result(['services' => [['status' => 'ACTIVE', 'desiredCount' => 0, 'runningCount' => 0]]])], $ecs); + bindMockApplicationAutoScalingClient(['DescribeScalableTargets' => new Result(['ScalableTargets' => []])], $aa); + + invokeScale(arguments: ['count' => '3'], options: ['queue' => true]); + + expect(collect($ecs)->pluck('name'))->not->toContain('UpdateService'); +}); it('fixed: sets the ECS desired count directly when no scalable target exists', function () { writeManifest([ diff --git a/tests/Unit/Commands/SyncAppSchedulerAdvisoryTest.php b/tests/Unit/Commands/SyncAppSchedulerAdvisoryTest.php index b1f1e1c0..75815970 100644 --- a/tests/Unit/Commands/SyncAppSchedulerAdvisoryTest.php +++ b/tests/Unit/Commands/SyncAppSchedulerAdvisoryTest.php @@ -30,5 +30,6 @@ ]); expect(SyncAppCommand::schedulerAdvisory()) - ->toContain('onOneServer()'); + ->toContain('onOneServer()') + ->toContain('tasks.scheduler'); }); diff --git a/tests/Unit/ManifestTest.php b/tests/Unit/ManifestTest.php index 0dfe5327..3ba66a39 100644 --- a/tests/Unit/ManifestTest.php +++ b/tests/Unit/ManifestTest.php @@ -1,6 +1,7 @@ toThrow(IntegrityCheckException::class); }); }); + +describe('server groups', function () { + it('lists only web for a plain web app', function () 
{ + writeManifest(['tasks' => ['web' => []]]); + + expect(Manifest::serverGroups())->toBe([ServerGroup::WEB]); + }); + + it('lists web, queue and scheduler when both are extracted', function () { + writeManifest(['tasks' => ['web' => [], 'queue' => [], 'scheduler' => []]]); + + expect(Manifest::serverGroups())->toBe([ServerGroup::WEB, ServerGroup::QUEUE, ServerGroup::SCHEDULER]); + }); + + it('does not list a bundled queue as its own group', function () { + writeManifest(['tasks' => ['web' => ['queue' => true]]]); + + expect(Manifest::serverGroups())->toBe([ServerGroup::WEB]); + expect(Manifest::hasStandaloneQueue())->toBeFalse(); + expect(Manifest::bundles('queue'))->toBeTrue(); + }); + + it('detects a standalone queue and reads it as not bundled', function () { + writeManifest(['tasks' => ['web' => [], 'queue' => ['min' => 0]]]); + + expect(Manifest::hasStandaloneQueue())->toBeTrue(); + expect(Manifest::bundles('queue'))->toBeFalse(); + }); +}); diff --git a/tests/Unit/Resources/ApplicationAutoScaling/QueueBacklogPolicyTest.php b/tests/Unit/Resources/ApplicationAutoScaling/QueueBacklogPolicyTest.php new file mode 100644 index 00000000..1c8a7693 --- /dev/null +++ b/tests/Unit/Resources/ApplicationAutoScaling/QueueBacklogPolicyTest.php @@ -0,0 +1,70 @@ + '111111111111', 'region' => 'ap-southeast-2', + 'tasks' => ['web' => [], 'queue' => []], + ]); +}); + +it('tracks backlog-per-task with metric math dividing visible messages by running tasks', function () { + $config = (new QueueBacklogPolicy())->configuration(); + + expect($config['TargetValue'])->toBe(100.0); + + $metrics = collect($config['CustomizedMetricSpecification']['Metrics']); + + // The visible-messages metric on this app's queue, the running-task count, and + // the math expression that divides them — only the expression returns data. 
+ expect($metrics->firstWhere('Id', 'visible')['MetricStat']['Metric'])->toMatchArray([ + 'Namespace' => 'AWS/SQS', + 'MetricName' => 'ApproximateNumberOfMessagesVisible', + ]); + expect($metrics->firstWhere('Id', 'running')['MetricStat']['Metric']['MetricName'])->toBe('RunningTaskCount'); + expect($metrics->firstWhere('Id', 'backlog_per_task'))->toMatchArray([ + 'Expression' => 'visible / running', + 'ReturnData' => true, + ]); +}); + +it('reads the backlog-per-task target from the manifest', function () { + writeManifest([ + 'account-id' => '111111111111', 'region' => 'ap-southeast-2', + 'tasks' => ['web' => [], 'queue' => ['backlog-per-task' => 40]], + ]); + + expect((new QueueBacklogPolicy())->configuration()['TargetValue'])->toBe(40.0); +}); + +it('upserts the target-tracking policy onto the queue scalable target when absent', function () { + $captured = []; + bindMockApplicationAutoScalingClient([ + 'DescribeScalingPolicies' => new Result(['ScalingPolicies' => []]), + 'PutScalingPolicy' => new Result(['PolicyARN' => 'arn:aws:autoscaling:...:policy/queue']), + ], $captured); + + $changes = (new QueueBacklogPolicy())->synchronise(apply: true); + + expect($changes)->not->toBe([]); + + $put = collect($captured)->firstWhere('name', 'PutScalingPolicy'); + expect($put['args'])->toMatchArray([ + 'PolicyType' => 'TargetTrackingScaling', + 'ResourceId' => 'service/yolo-testing-my-app/yolo-testing-my-app-queue', + ]); +}); + +it('reports drift without writing on a dry-run', function () { + $captured = []; + bindMockApplicationAutoScalingClient([ + 'DescribeScalingPolicies' => new Result(['ScalingPolicies' => []]), + ], $captured); + + $changes = (new QueueBacklogPolicy())->synchronise(apply: false); + + expect($changes)->not->toBe([]); + expect(collect($captured)->pluck('name'))->not->toContain('PutScalingPolicy'); +}); diff --git a/tests/Unit/Resources/ApplicationAutoScaling/QueueScaleToZeroBootstrapTest.php 
b/tests/Unit/Resources/ApplicationAutoScaling/QueueScaleToZeroBootstrapTest.php new file mode 100644 index 00000000..7d4f4213 --- /dev/null +++ b/tests/Unit/Resources/ApplicationAutoScaling/QueueScaleToZeroBootstrapTest.php @@ -0,0 +1,61 @@ + '111111111111', 'region' => 'ap-southeast-2', + 'tasks' => ['web' => [], 'queue' => ['min' => 0]], + ]); +}); + +it('reports both pieces as pending on a dry-run without writing', function () { + $aa = []; + $cw = []; + bindMockApplicationAutoScalingClient(['DescribeScalingPolicies' => new Result(['ScalingPolicies' => []])], $aa); + bindMockCloudWatchClient(['DescribeAlarms' => new Result(['MetricAlarms' => []])], $cw); + + $changes = (new QueueScaleToZeroBootstrap())->synchronise(apply: false); + + expect($changes)->toHaveCount(2); + expect(collect($aa)->pluck('name'))->not->toContain('PutScalingPolicy'); + expect(collect($cw)->pluck('name'))->not->toContain('PutMetricAlarm'); +}); + +it('sets the queue to exactly one task when a message arrives at zero', function () { + $alarmArn = 'arn:aws:cloudwatch:ap-southeast-2:111111111111:alarm:yolo-testing-my-app-queue-has-messages'; + $policyArn = 'arn:aws:autoscaling:ap-southeast-2:111111111111:scalingPolicy:x:resource/ecs/service/yolo-testing-my-app/yolo-testing-my-app-queue:policyName/bootstrap'; + + $aa = []; + $cw = []; + bindMockApplicationAutoScalingClient([ + 'DescribeScalingPolicies' => new Result(['ScalingPolicies' => []]), + 'PutScalingPolicy' => new Result(['PolicyARN' => $policyArn]), + ], $aa); + bindMockCloudWatchClient([ + 'DescribeAlarms' => new Result(['MetricAlarms' => [ + ['AlarmName' => 'yolo-testing-my-app-queue-has-messages', 'AlarmArn' => $alarmArn], + ]]), + 'ListTagsForResource' => new Result(['Tags' => []]), + ], $cw); + + (new QueueScaleToZeroBootstrap())->synchronise(apply: true); + + // A StepScaling policy that asserts ExactCapacity 1 — never fights the backlog + // policy's higher number (App Auto Scaling takes the max), just breaks zero. 
+ $put = collect($aa)->firstWhere('name', 'PutScalingPolicy'); + expect($put['args']['PolicyType'])->toBe('StepScaling'); + expect($put['args']['StepScalingPolicyConfiguration']['AdjustmentType'])->toBe('ExactCapacity'); + expect($put['args']['StepScalingPolicyConfiguration']['StepAdjustments'][0]['ScalingAdjustment'])->toBe(1); + + // The alarm fires the moment a message is visible (> 0) and points at the policy. + $alarm = collect($cw)->firstWhere('name', 'PutMetricAlarm'); + expect($alarm['args'])->toMatchArray([ + 'MetricName' => 'ApproximateNumberOfMessagesVisible', + 'Namespace' => 'AWS/SQS', + 'Threshold' => 0, + 'ComparisonOperator' => 'GreaterThanThreshold', + ]); + expect($alarm['args']['AlarmActions'])->toBe([$policyArn]); +}); diff --git a/tests/Unit/Resources/ApplicationAutoScaling/ScalableTargetTest.php b/tests/Unit/Resources/ApplicationAutoScaling/ScalableTargetTest.php index e6ab5d1c..2899cf49 100644 --- a/tests/Unit/Resources/ApplicationAutoScaling/ScalableTargetTest.php +++ b/tests/Unit/Resources/ApplicationAutoScaling/ScalableTargetTest.php @@ -1,6 +1,7 @@ pluck('name'))->not->toContain('RegisterScalableTarget'); }); +it('builds the queue service resource id and defaults its floor to zero', function () { + writeManifest([ + 'account-id' => '111111111111', 'region' => 'ap-southeast-2', + 'tasks' => ['web' => [], 'queue' => []], + ]); + + expect(ScalableTarget::resourceId(ServerGroup::QUEUE))->toBe('service/yolo-testing-my-app/yolo-testing-my-app-queue'); + expect((new ScalableTarget(ServerGroup::QUEUE))->min())->toBe(0); + expect((new ScalableTarget(ServerGroup::QUEUE))->max())->toBe(10); +}); + +it('registers the queue target with a zero floor (scale to zero)', function () { + writeManifest([ + 'account-id' => '111111111111', 'region' => 'ap-southeast-2', + 'tasks' => ['web' => [], 'queue' => ['min' => 0, 'max' => 20]], + ]); + + $captured = []; + bindMockApplicationAutoScalingClient([ + 'DescribeScalableTargets' => new Result(['ScalableTargets' 
=> []]), + 'RegisterScalableTarget' => new Result([]), + ], $captured); + + (new ScalableTarget(ServerGroup::QUEUE))->synchronise(apply: true); + + expect(collect($captured)->firstWhere('name', 'RegisterScalableTarget')['args'])->toMatchArray([ + 'ResourceId' => 'service/yolo-testing-my-app/yolo-testing-my-app-queue', + 'MinCapacity' => 0, + 'MaxCapacity' => 20, + ]); +}); + it('deregisters the target with the fixed namespace and dimension', function () { $captured = []; bindMockApplicationAutoScalingClient(['DeregisterScalableTarget' => new Result([])], $captured); diff --git a/tests/Unit/Resources/Iam/DeployerPolicyTest.php b/tests/Unit/Resources/Iam/DeployerPolicyTest.php index c9538983..da53744f 100644 --- a/tests/Unit/Resources/Iam/DeployerPolicyTest.php +++ b/tests/Unit/Resources/Iam/DeployerPolicyTest.php @@ -14,6 +14,7 @@ function statementFor(array $document, string $action): array beforeEach(function () { writeManifest([ 'account-id' => '111111111111', 'region' => 'ap-southeast-2', + 'tasks' => ['web' => []], ]); }); @@ -78,6 +79,22 @@ function statementFor(array $document, string $action): array expect($statement['Action'])->toContain('ecs:RunTask', 'ecs:DescribeServices'); }); +it('widens UpdateService scope to the standalone queue and scheduler services when extracted', function () { + writeManifest([ + 'account-id' => '111111111111', 'region' => 'ap-southeast-2', + 'tasks' => ['web' => [], 'queue' => [], 'scheduler' => []], + ]); + + $statement = statementFor((new DeployerPolicy())->document(), 'ecs:UpdateService'); + + expect($statement['Resource'])->toContain( + 'arn:aws:ecs:ap-southeast-2:111111111111:service/yolo-testing-my-app/yolo-testing-my-app-queue', + 'arn:aws:ecs:ap-southeast-2:111111111111:task-definition/yolo-testing-my-app-queue:*', + 'arn:aws:ecs:ap-southeast-2:111111111111:service/yolo-testing-my-app/yolo-testing-my-app-scheduler', + 'arn:aws:ecs:ap-southeast-2:111111111111:task-definition/yolo-testing-my-app-scheduler:*', + ); +}); + 
it('scopes PassRole to the task and execution roles, passed only to ECS tasks', function () { $statement = statementFor((new DeployerPolicy())->document(), 'iam:PassRole'); diff --git a/tests/Unit/ShutdownTimingsTest.php b/tests/Unit/ShutdownTimingsTest.php index 91f6f7ef..cfcbdf37 100644 --- a/tests/Unit/ShutdownTimingsTest.php +++ b/tests/Unit/ShutdownTimingsTest.php @@ -1,6 +1,7 @@ ShutdownTimings::programGraces()) ->toThrow(IntegrityCheckException::class); }); + +describe('standalone services', function () { + it('defaults the standalone queue grace longer than the scheduler', function () { + writeManifest([ + 'account-id' => '111111111111', 'region' => 'ap-southeast-2', + 'tasks' => ['web' => [], 'queue' => [], 'scheduler' => []], + ]); + + expect(ShutdownTimings::standaloneGrace(ServerGroup::QUEUE))->toBe(70); + expect(ShutdownTimings::standaloneGrace(ServerGroup::SCHEDULER))->toBe(10); + }); + + it('honours a per-service shutdown-grace-period override', function () { + writeManifest([ + 'account-id' => '111111111111', 'region' => 'ap-southeast-2', + 'tasks' => ['web' => [], 'queue' => ['shutdown-grace-period' => 110]], + ]); + + expect(ShutdownTimings::standaloneGrace(ServerGroup::QUEUE))->toBe(110); + }); + + it('sizes a standalone stop timeout as the grace plus buffer (no ALB drain)', function () { + writeManifest([ + 'account-id' => '111111111111', 'region' => 'ap-southeast-2', + 'tasks' => ['web' => [], 'queue' => []], + ]); + + // 70 (default queue grace) + 5 buffer; no ALB drain folded in. 
+ expect(ShutdownTimings::stopTimeoutFor(ServerGroup::QUEUE))->toBe(75); + }); + + it('caps a standalone stop timeout at the Fargate maximum', function () { + writeManifest([ + 'account-id' => '111111111111', 'region' => 'ap-southeast-2', + 'tasks' => ['web' => [], 'queue' => ['shutdown-grace-period' => 200]], + ]); + + expect(ShutdownTimings::stopTimeoutFor(ServerGroup::QUEUE))->toBe(120); + }); +}); diff --git a/tests/Unit/Steps/Build/Fargate/GenerateEntrypointScriptStepTest.php b/tests/Unit/Steps/Build/Fargate/GenerateEntrypointScriptStepTest.php index fdeee06e..bda1892b 100644 --- a/tests/Unit/Steps/Build/Fargate/GenerateEntrypointScriptStepTest.php +++ b/tests/Unit/Steps/Build/Fargate/GenerateEntrypointScriptStepTest.php @@ -35,16 +35,47 @@ function generatedEntrypointScript(): string expect($script)->toContain("php artisan migrate --force\nphp artisan config:cache\n"); }); -it('supervises the CMD instead of exec-ing it so SIGTERM can be trapped', function () { +it('supervises the role command instead of exec-ing it so SIGTERM can be trapped', function () { $script = generatedEntrypointScript(); expect($script)->not->toContain('exec "$@"'); - expect($script)->toContain('"$@" &'); + expect($script)->toContain('$cmd &'); expect($script)->toContain('child=$!'); expect($script)->toContain('trap drain TERM'); expect($script)->toContain('wait "$child"'); }); +it('dispatches a web-only app to supervisord with no queue or scheduler branch', function () { + $script = generatedEntrypointScript(); + + expect($script)->toContain("cmd='supervisord -c /etc/supervisord.conf -n'"); + expect($script)->not->toContain('queue)'); + expect($script)->not->toContain('scheduler)'); +}); + +it('adds a queue branch running the worker when the queue is its own service', function () { + writeManifest([ + 'apex' => 'example.com', + 'account-id' => '111111111111', 'region' => 'ap-southeast-2', + 'tasks' => ['web' => [], 'queue' => []], + ]); + + 
expect(generatedEntrypointScript())->toContain("queue) cmd='php artisan queue:work"); +}); + +it('adds a scheduler branch running cron when the scheduler is its own service', function () { + writeManifest([ + 'apex' => 'example.com', + 'account-id' => '111111111111', 'region' => 'ap-southeast-2', + 'tasks' => ['web' => [], 'scheduler' => []], + ]); + + $script = generatedEntrypointScript(); + + expect($script)->toContain("scheduler) cmd='crond"); + expect($script)->toContain("pgrep -f 'artisan schedule:run'"); +}); + it('drains for the web shutdown-grace-period before forwarding the stop', function () { expect(generatedEntrypointScript())->toContain("sleep 10\n"); }); diff --git a/tests/Unit/Steps/Fargate/SyncEcsServiceStepTest.php b/tests/Unit/Steps/Fargate/SyncEcsServiceStepTest.php index 864db6b8..05420e63 100644 --- a/tests/Unit/Steps/Fargate/SyncEcsServiceStepTest.php +++ b/tests/Unit/Steps/Fargate/SyncEcsServiceStepTest.php @@ -1,5 +1,6 @@ true, 'healthCheckGracePeriodSeconds' => 9999], gracePeriod: 60, enableExecuteCommand: true, + reconcilesGracePeriod: false, ))->toBeFalse(); }); @@ -122,15 +124,29 @@ }); describe('deploymentConfiguration', function () { + beforeEach(function () { + writeManifest([ + 'account-id' => '111111111111', 'region' => 'ap-southeast-2', + 'tasks' => ['web' => [], 'queue' => [], 'scheduler' => []], + ]); + }); + it('enables the circuit breaker with rollback so a failed rollout self-reverts', function () { - expect(EcsService::deploymentConfiguration()['deploymentCircuitBreaker']) + expect((new EcsService())->deploymentConfiguration()['deploymentCircuitBreaker']) ->toBe(['enable' => true, 'rollback' => true]); }); - it('keeps one-at-a-time rolling capacity (100% min healthy, 200% max)', function () { - $config = EcsService::deploymentConfiguration(); + it('keeps one-at-a-time rolling capacity for web (100% min healthy, 200% max)', function () { + $config = (new EcsService(ServerGroup::WEB))->deploymentConfiguration(); 
expect($config['minimumHealthyPercent'])->toBe(100); expect($config['maximumPercent'])->toBe(200); }); + + it('deploys the scheduler stop-then-start (0% min healthy, 100% max) so a rollout never runs two crons', function () { + $config = (new EcsService(ServerGroup::SCHEDULER))->deploymentConfiguration(); + + expect($config['minimumHealthyPercent'])->toBe(0); + expect($config['maximumPercent'])->toBe(100); + }); }); diff --git a/tests/Unit/Steps/Fargate/SyncTaskDefinitionStepTest.php b/tests/Unit/Steps/Fargate/SyncTaskDefinitionStepTest.php index 399f1e99..a9670122 100644 --- a/tests/Unit/Steps/Fargate/SyncTaskDefinitionStepTest.php +++ b/tests/Unit/Steps/Fargate/SyncTaskDefinitionStepTest.php @@ -1,6 +1,7 @@ toBe('111111111111.dkr.ecr.ap-southeast-2.amazonaws.com/my-app:26.21.2.1500'); }); +it('names the container after the role and passes it as the command', function () { + $payload = SyncTaskDefinitionStep::payload(ServerGroup::QUEUE); + + expect($payload['family'])->toBe('yolo-testing-my-app-queue'); + expect($payload['containerDefinitions'][0]['name'])->toBe('queue'); + expect($payload['containerDefinitions'][0]['command'])->toBe(['queue']); +}); + +it('maps no port for a headless worker group (queue/scheduler)', function () { + expect(SyncTaskDefinitionStep::payload(ServerGroup::SCHEDULER)['containerDefinitions'][0]) + ->not->toHaveKey('portMappings'); +}); + +it('sizes queue and scheduler smaller by default than web', function () { + writeManifest([ + 'account-id' => '111111111111', 'region' => 'ap-southeast-2', + 'tasks' => ['web' => [], 'queue' => [], 'scheduler' => []], + ]); + + bindMockIamClient([ + 'yolo-testing-ecs-task-role' => 'arn:aws:iam::111111111111:role/yolo-testing-ecs-task-role', + 'yolo-testing-ecs-execution-role' => 'arn:aws:iam::111111111111:role/yolo-testing-ecs-execution-role', + ]); + + $queue = SyncTaskDefinitionStep::payload(ServerGroup::QUEUE); + + expect($queue['cpu'])->toBe('256'); + expect($queue['memory'])->toBe('512'); +}); + 
it('falls back to defaults when manifest omits task config', function () { writeManifest([ 'account-id' => '111111111111', 'region' => 'ap-southeast-2',