Merged
70 changes: 57 additions & 13 deletions docs/guide/scaling.md
@@ -1,6 +1,16 @@
# Scaling

YOLO runs the web service as a single Fargate task by default. You can scale it two ways:
By default an app runs as **one Fargate task** doing everything: Octane plus, if enabled, the bundled queue worker and scheduler (`tasks.web.queue` / `tasks.web.scheduler`). That's the cheap floor and fine at low scale. The three workloads have different scaling shapes, though, so each can be **extracted into its own ECS service** that scales independently:

| Service | How it scales | Opt in with |
|---|---|---|
| **web** | target tracking (CPU + request count), `min`→`max` | `tasks.web.autoscaling` |
| **queue** | backlog-per-task, **scales to zero** | top-level `tasks.queue` |
| **scheduler** | never; pinned singleton (exactly one task) | top-level `tasks.scheduler` |

Extraction is additive and per-workload: keep the queue bundled but give the scheduler its own service, or any mix. A workload can't be both bundled (`tasks.web.queue`) and extracted (`tasks.queue`) at once: `sync` hard-fails if you configure both.
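
The bundled-versus-extracted rule can be illustrated with a minimal sketch. It assumes the manifest has already been parsed into a plain dictionary; the function name `placement` and the returned labels are hypothetical and are not YOLO's actual implementation.

```python
WORKLOADS = ("queue", "scheduler")


def placement(manifest: dict) -> dict:
    """Classify each workload as 'bundled', 'extracted' or 'off'.

    Raises ValueError when a workload is configured both ways, mirroring
    the hard failure the text describes for `sync`.
    """
    tasks = manifest.get("tasks") or {}
    web = tasks.get("web") or {}
    result = {}
    for name in WORKLOADS:
        bundled = bool(web.get(name))   # tasks.web.queue / tasks.web.scheduler
        extracted = name in tasks       # top-level tasks.queue / tasks.scheduler
        if bundled and extracted:
            raise ValueError(f"{name} is both bundled and extracted")
        if bundled:
            result[name] = "bundled"
        elif extracted:
            result[name] = "extracted"
        else:
            result[name] = "off"
    return result


mixed = {"tasks": {"web": {"queue": True}, "scheduler": {}}}
print(placement(mixed))
```

An empty block such as `scheduler: {}` still counts as extracted here, because the check is on the presence of the key rather than on its contents.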

You can scale the web (and queue) service two ways:

- **Autoscaling**: let AWS adjust the task count automatically from live metrics.
- **`yolo scale`**: set the capacity yourself, out of band, without a deploy.
@@ -62,13 +72,14 @@ Application Auto Scaling targets and policies can't carry tags, so they don't sh
[`yolo scale`](/reference/commands#yolo-scale) changes capacity without a build or deploy. Like `env:push`, it shows a current → new comparison and asks before applying.

```bash
yolo scale production --web --min=3 --max=10 # autoscaled: set the bounds
yolo scale production --web 3 # fixed: set the desired count
yolo scale production --web --min=3 --max=10 # web autoscaled: set the bounds
yolo scale production --web 3 # web fixed: set the desired count
yolo scale production --queue --min=0 --max=20 # queue bounds (min 0 = scale to zero)
```

Under autoscaling you set the **bounds** (`--min`/`--max`), never a desired count, because the policies own desired count and would override it. Crucially, `scale` **writes the bounds back to the manifest** (surgically: your comments and formatting survive), so the manifest stays the single source of truth and the next `yolo sync` reconciles to the same values rather than clobbering your change.

For a fixed service (no `autoscaling` block) a positional `count` sets the ECS desired count directly.
For a fixed web service (no `autoscaling` block) a positional `count` sets the ECS desired count directly. A standalone queue is always autoscaling-managed, so it only takes `--min`/`--max`. The scheduler is a singleton and can't be scaled (`--scheduler` errors out).

### Reducing capacity is guarded

@@ -80,17 +91,41 @@ Because the manifest is authoritative, a `yolo sync` run with a **stale** manife

So an emergency `yolo scale production --web --min=10` is durable: it's written to the manifest *and* live, and no unattended sync can quietly walk it back.

## The scheduler caveat
## The queue (scale to zero)

Add a top-level `tasks.queue` block to give the queue worker its own ECS service, separate from web:

```yaml
tasks:
  web: {}
  queue:
    min: 0 # scale to zero when idle (the default)
    max: 20
    backlog-per-task: 100
    spot: true # optional: ~70% cheaper interruptible capacity
```

It scales on **backlog per task**: `ApproximateNumberOfMessagesVisible / RunningTaskCount`, computed with CloudWatch metric math (no Lambda) and held at `backlog-per-task` messages per running task. As the backlog grows it scales out toward `max`; as it drains it scales back in toward `min`.

This is the one thing to get right before scaling a service that runs the scheduler.
With `min: 0` the queue **scales to zero**: no tasks and no compute cost when idle. Target tracking can't lift it off zero (dividing by zero running tasks is undefined), so YOLO also attaches a step-scaling alarm that sets the service to exactly one task the instant a message becomes visible; target tracking owns it from one upward. The cost is a **~30–60s cold start** (image pull + boot) on the first message after idle.
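
As a rough illustration of how the two mechanisms divide the work, the sketch below computes an idealised task count from the backlog. It is an approximation under stated assumptions: real target tracking applies cooldowns and scales in conservatively, and the function and parameter names are hypothetical.

```python
import math


def desired_tasks(visible: int, running: int, backlog_per_task: int,
                  min_tasks: int, max_tasks: int) -> int:
    """Idealised task count for a backlog-per-task policy."""
    if running == 0:
        # visible / running is undefined at zero tasks, so this branch stands
        # in for the step-scaling alarm: go to exactly one task when any
        # message is visible, otherwise stay at zero.
        wanted = 1 if visible > 0 else 0
    else:
        # Target tracking holds visible / running near backlog_per_task,
        # which works out to roughly ceil(visible / backlog_per_task) tasks.
        wanted = math.ceil(visible / backlog_per_task)
    # The manifest bounds always win.
    return max(min_tasks, min(max_tasks, wanted))


for visible, running in [(0, 0), (1, 0), (250, 1), (5000, 3), (0, 3)]:
    print(visible, running, desired_tasks(visible, running, 100, 0, 20))
```

With `min_tasks` set to 1 or more the zero branch is never the final answer, which matches the always-warm standalone topology.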

In the default topology the web container also runs the queue worker and the **scheduler** (`crond` firing `schedule:run` every minute). The queue is safe to multiply: SQS only hands each message to one worker. The scheduler is **not**: scale to N tasks and `schedule:run` fires on every replica, so every scheduled task runs N times (N× emails, N× billing, N× reports).
That makes the choice of *where* the queue lives a latency decision:

There's no stable per-task identity on Fargate to elect a single scheduler from, so pick one of two strategies:
| Topology | Idle cost | Pickup latency | Use for |
|---|---|---|---|
| Bundled (`tasks.web.queue: true`) | included in web | **instant** (worker always warm) | light, latency-sensitive jobs |
| Standalone, `min: 0` | **~$0** | ~30–60s cold start from idle | bursty, latency-tolerant async |
| Standalone, `min: 1+` | one always-on task | instant, then autoscales | high-volume, always-busy |

### 1. `->onOneServer()` (recommended)
For multi-tenant apps, a single queue service works the app's default queue; per-tenant queue fan-out composes with [LPX-601](https://linear.app/codinglabsau/issue/LPX-601) and isn't covered here.

Add Laravel's [`onOneServer()`](https://laravel.com/docs/scheduling#running-tasks-on-one-server) to **every** scheduled task in your console kernel. It takes an atomic lock in the shared cache so only one replica runs each task per minute:
## The scheduler

The scheduler (`crond` firing `schedule:run` every minute) must run as a **singleton**: if it runs on N tasks, every scheduled job fires N times (N× emails, N× billing, N× reports). The queue is safe to multiply (SQS hands each message to one worker); the scheduler is not. There's no stable per-task identity on Fargate to elect one from, so pick one of two strategies.

### 1. `->onOneServer()`

Keep the scheduler bundled in the web container (`tasks.web.scheduler: true`) and add Laravel's [`onOneServer()`](https://laravel.com/docs/scheduling#running-tasks-on-one-server) to **every** scheduled task. It takes an atomic lock in the shared cache so only one replica runs each task per minute:

```php
$schedule->command('reports:send')->daily()->onOneServer();
```
@@ -100,10 +135,19 @@ This requires a shared lock store (the Valkey/Redis cache YOLO provisions, or a

The catch: it's per-task. A scheduled task registered by a package (Telescope pruning, backups, etc.) that you can't annotate will still multi-fire, which is your signal to reach for strategy 2.
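
The difference between an annotated and an unannotated task can be simulated in a few lines. This is a toy model, not Laravel's implementation: `SharedCache` stands in for the shared lock store, and the lock key format is hypothetical.

```python
class SharedCache:
    """In-memory stand-in for the shared lock store."""

    def __init__(self):
        self._keys = set()

    def add(self, key: str) -> bool:
        """Set-if-absent: returns True only for the first caller."""
        if key in self._keys:
            return False
        self._keys.add(key)
        return True


def run_minute(cache: SharedCache, replicas: int, tasks: dict, minute: str) -> dict:
    """Count how often each task runs when every replica fires schedule:run.

    `tasks` maps a task name to True when it is marked onOneServer.
    """
    runs = {name: 0 for name in tasks}
    for _ in range(replicas):
        for name, on_one_server in tasks.items():
            if on_one_server and not cache.add(f"{name}@{minute}"):
                continue  # another replica already holds this minute's lock
            runs[name] += 1
    return runs


tasks = {"reports:send": True, "package:prune": False}
print(run_minute(SharedCache(), 3, tasks, "12:00"))
```

With three replicas the annotated task runs once while the unannotated package task runs three times, which is the multi-fire case described above.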

### 2. Separate the scheduler
### 2. Extract the scheduler (recommended once web scales)

Give the scheduler its own service with a top-level `tasks.scheduler` block:

```yaml
tasks:
  web:
    autoscaling: { min: 1, max: 6 }
  scheduler: {} # its own pinned-singleton service
```

Move the scheduler into its own service pinned at exactly one task. This removes the requirement entirely (it's genuinely a singleton) and lets the web tier scale without any scheduler concern. (Dedicated queue/scheduler services are on the roadmap.)
YOLO pins it at exactly one task (never a scalable target) and deploys it **stop-then-start** (`minimumHealthyPercent: 0` / `maximumPercent: 100`), so a rollout stops the old cron before starting the new one and a deploy never briefly runs two schedulers (a missed cron minute is harmless; a double-run isn't). This removes the `onOneServer()` *requirement* entirely (it's genuinely a singleton now), though leaving `onOneServer()` on is harmless. The web tier then scales without any scheduler concern.
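
To see why those two percentages force stop-then-start, the sketch below works out how many tasks ECS may keep running during a rollout. It relies on ECS rounding the minimum-healthy bound up and the maximum bound down; the function name is hypothetical, and the 100/200 comparison uses the common rolling-update defaults as an assumption.

```python
def rollout_bounds(desired: int, minimum_healthy_percent: int,
                   maximum_percent: int) -> tuple:
    """Return (lower, upper) running-task limits during a deployment."""
    # Lower bound rounds up, upper bound rounds down (integer arithmetic).
    lower = -(-desired * minimum_healthy_percent // 100)
    upper = desired * maximum_percent // 100
    return lower, upper


# Stop-then-start: at most one task, and zero is allowed, so the old
# scheduler must stop before the new one can start.
print(rollout_bounds(1, 0, 100))

# Typical rolling defaults: one task must stay up and two may run at once,
# so old and new schedulers would briefly overlap.
print(rollout_bounds(1, 100, 200))
```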

::: tip
When you enable autoscaling on a task that still runs the scheduler, `yolo sync` prints a one-line reminder of exactly this. It's a nudge, not a gate: YOLO can't see inside your kernel to know which strategy you chose.
When you enable autoscaling on a web task that still **bundles** the scheduler, `yolo sync` prints a one-line advisory pointing at these two strategies. It's a nudge, not a gate: YOLO can't see inside your kernel to know whether you've used `onOneServer()`.
:::
32 changes: 17 additions & 15 deletions docs/reference/commands.md
@@ -113,7 +113,7 @@ The image-building steps only run when the manifest declares `tasks`. See [Build
Build, push, and deploy the application: runs [`build`](#yolo-build) first, then the zero-downtime rollout.

```bash
yolo deploy <environment> [--app-version=<tag>] [--no-progress]
yolo deploy <environment> [--app-version=<tag>] [--group=<groups>] [--no-progress]
```

| Argument | Required | Description |
@@ -123,9 +123,10 @@ yolo deploy <environment> [--app-version=<tag>] [--no-progress]
| Option | Value | Default | Description |
|---|---|---|---|
| `--app-version` | string | timestamp `y.W.N.Hi` | Tag to stamp on the build (same rules as `build`). |
| `--group` | comma-separated | every service the app runs | Service groups to roll (`web,queue,scheduler`). |
| `--no-progress` | flag | off | Hide the live progress output. |

After building, `deploy` pushes assets to S3, registers a new task-definition revision, runs `deploy` hooks as a one-off task, updates the ECS service, waits for it to go healthy (the deployment circuit breaker auto-rolls-back on failure), then UPSERTs Route 53 records. It always waits for the rollout to stabilise; there is no opt-out flag.
After building, `deploy` pushes assets to S3, registers a new task-definition revision **for each service group** (web plus any standalone queue/scheduler), runs `deploy` hooks as a one-off task, rolls each ECS service onto its new revision, waits for the web service to go healthy (the deployment circuit breaker auto-rolls-back on failure), then UPSERTs Route 53 records. It always waits for the rollout to stabilise; there is no opt-out flag. `--group` narrows the rollout to a subset of services (the shared image is built either way); a deploy that omits `web` skips the ALB health wait, relying on the circuit breaker.
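
The effect of `--group` on a rollout can be sketched as a small planning function. This is illustrative only: the function name and the returned dictionary are hypothetical, and silently skipping a requested group that the app does not run is an assumption, not documented behaviour.

```python
from typing import List, Optional

KNOWN_GROUPS = ("web", "queue", "scheduler")


def plan_rollout(app_groups: List[str], group_option: Optional[str] = None) -> dict:
    """Decide which services to roll and whether to wait on ALB health."""
    if group_option is None:
        selected = list(app_groups)  # default: every service the app runs
    else:
        requested = [g.strip() for g in group_option.split(",") if g.strip()]
        unknown = [g for g in requested if g not in KNOWN_GROUPS]
        if unknown:
            raise ValueError(f"unknown group(s): {', '.join(unknown)}")
        # Assumption: groups the app does not run are skipped.
        selected = [g for g in requested if g in app_groups]
    # The ALB health wait only applies when web is part of the rollout.
    return {"roll": selected, "wait_for_alb_health": "web" in selected}


print(plan_rollout(["web", "queue", "scheduler"], "queue,scheduler"))
```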

---

@@ -151,18 +152,16 @@ yolo run <environment> [--command="<cmd>"] [--group=<groups>]
- **No `--command`** → opens an interactive `/bin/sh` in the first running task (searched in the order `scheduler → queue → web`).
- **With `--command`** → runs the command. With `--group`, it **fans out** across every running task in each listed group. Without `--group`, it runs on the first group that has a running task.

**Requirements:** the AWS [Session Manager plugin](https://docs.aws.amazon.com/systems-manager/latest/userguide/session-manager-working-with-install-plugin.html) installed locally, and `tasks.web.enable-execute-command: true` in the manifest.
Each group is its own ECS service when extracted, and `run` execs into the container named after the group. A bundled queue/scheduler runs inside the web container, so a `--group=queue` lookup that finds no standalone queue service simply falls through to the next group.
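
A minimal sketch of the two lookups described above, assuming the running tasks are available as a mapping from group name to task IDs. The function names and data shape are hypothetical; how `run` picks a task within the first group is not modelled.

```python
from typing import Dict, List, Optional, Tuple

SEARCH_ORDER = ("scheduler", "queue", "web")


def first_running_group(running: Dict[str, List[str]]) -> Optional[str]:
    """Return the first group, in search order, that has a running task."""
    for group in SEARCH_ORDER:
        if running.get(group):
            return group
    return None


def fan_out_targets(running: Dict[str, List[str]],
                    groups: List[str]) -> List[Tuple[str, str]]:
    """Every running task in each listed group, as (group, task) pairs."""
    return [(group, task) for group in groups for task in running.get(group, [])]


# A bundled scheduler has no service of its own, so it does not appear here.
running = {"web": ["w1", "w2"], "queue": ["q1"]}
print(first_running_group(running))
print(fan_out_targets(running, ["web", "queue"]))
```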

**Requirements:** the AWS [Session Manager plugin](https://docs.aws.amazon.com/systems-manager/latest/userguide/session-manager-working-with-install-plugin.html) installed locally, and `enable-execute-command: true` on the target group in the manifest.

```bash
yolo run production
yolo run production --command="php artisan migrate:status"
yolo run production --command="php artisan queue:restart" --group=web,queue
```

::: tip
Today web, queue, and scheduler all run in the single `web` container, so the groups collapse onto it; the distinction matters once independent task groups land.
:::

---

## `yolo scale`
@@ -181,19 +180,22 @@ yolo scale <environment> [count] [--web] [--min=<n>] [--max=<n>] [--queue] [--sc
| Option | Value | Description |
|---|---|---|
| `--web` | flag | Target the web service (the default). |
| `--min` / `--max` | int | Autoscaling bounds (the autoscaled form). |
| `--queue` | flag | Target the queue service (errors until it's a separate service, which is on the roadmap). |
| `--queue` | flag | Target the standalone queue service. Always autoscaling-managed: takes `--min`/`--max` (min may be `0`), never a count. |
| `--scheduler` | flag | Always errors: the scheduler is a singleton and can't be scaled. |
| `--min` / `--max` | int | Autoscaling bounds (the autoscaled form). |

There are two forms, picked by what you pass:

The web service has two forms, picked by what you pass:
- **Autoscaled**: `--min`/`--max` set the bounds. The values are written back to the manifest surgically, preserving comments and formatting (web → [`tasks.web.autoscaling.min/max`](/reference/manifest#tasks-web-autoscaling), queue → [`tasks.queue.min/max`](/reference/manifest#tasks-queue)). The scalable target is then registered, so the **manifest stays the source of truth** and the next sync reconciles to the same values. A desired count is never set under autoscaling (the policies would override it).
- **Fixed**: a positional `count` sets the ECS desired count directly (`UpdateService`), for a **web** service with no `autoscaling` block. A standalone queue is always autoscaling-managed, so passing it a count errors and points you to `--min/--max`.

- **Autoscaled**: `--min`/`--max` set the bounds. The values are written back to [`tasks.web.autoscaling.min/max`](/reference/manifest#tasks-web-autoscaling) (surgically: comments and formatting are preserved) and the scalable target is registered, so the **manifest stays the source of truth** and the next sync reconciles to the same values. A desired count is never set under autoscaling (the policies would override it).
- **Fixed**: a positional `count` sets the ECS desired count directly (`UpdateService`), for a service with no `autoscaling` block. Trying to pass a count to an autoscaling-managed service errors and points you to `--min/--max`.
Lowering a live bound is guarded the same as [reducing capacity](/guide/scaling#reducing-capacity-is-guarded): an explicit confirm defaulting to no.
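
The way the arguments select a form can be summarised in a short sketch. It is a reading of the rules above rather than the command's real code: the function names are hypothetical, and rejecting a count combined with bounds, or a queue call without bounds, is an assumption.

```python
from typing import Optional


def choose_form(target: str, has_autoscaling_block: bool,
                count: Optional[int] = None,
                min_bound: Optional[int] = None,
                max_bound: Optional[int] = None) -> str:
    """Return 'autoscaled', 'fixed' or 'prompt', or raise ValueError."""
    has_bounds = min_bound is not None or max_bound is not None
    if target == "scheduler":
        raise ValueError("the scheduler is a singleton and can't be scaled")
    if count is not None and has_bounds:
        raise ValueError("pass either a count or --min/--max, not both")  # assumption
    if target == "queue":
        # A standalone queue is always autoscaling-managed.
        if not has_bounds:
            raise ValueError("the queue takes --min/--max, never a count")
        return "autoscaled"
    if has_bounds:
        return "autoscaled"
    if count is not None:
        if has_autoscaling_block:
            raise ValueError("service is autoscaling-managed; use --min/--max")
        return "fixed"
    return "prompt"  # no count and no bounds: ask for a fixed count


def needs_confirmation(live_value: int, new_value: int) -> bool:
    """Reducing capacity below the live value is confirm-gated."""
    return new_value < live_value


print(choose_form("web", True, min_bound=3, max_bound=10))
print(choose_form("web", False, count=3))
print(needs_confirmation(live_value=10, new_value=3))
```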

```bash
yolo scale production --web --min=3 --max=10 # autoscaled bounds (writes the manifest)
yolo scale production --web 3 # fixed desired count
yolo scale production # prompt for a fixed count
yolo scale production --web --min=3 --max=10 # web autoscaled bounds (writes the manifest)
yolo scale production --web 3 # web fixed desired count
yolo scale production --queue --min=0 --max=20 # queue bounds (min 0 = scale to zero)
yolo scale production # prompt for a fixed count
```

**Reducing capacity** (a bound below the live value) is confirm-gated and defaults to *no*. See [Scaling](/guide/scaling).