LPX-649: extract queue & scheduler into their own ECS services by stevethomas · Pull Request #73 · codinglabsau/yolo

stevethomas · 2026-06-02T08:20:17Z

Hey, I made a thing! 🥳

LPX-649 — extract queue & scheduler into their own ECS services.

What problems are you solving?

YOLO bundles web + queue + scheduler into one Fargate task, coupling three workloads with different scaling shapes onto a single desiredCount. This PR makes each one its own service so it can scale independently:

- web: target tracking (unchanged).
- queue: its own service, scale-to-zero by default. Backlog-per-task target tracking (ApproximateNumberOfMessagesVisible / RunningTaskCount via CloudWatch metric math, no Lambda), plus a step-scaling alarm that lifts it 0→1 as soon as a message lands (target tracking alone can't do this, because backlog-per-task is undefined with zero running tasks). Opt-in Fargate Spot (~70% cheaper). Costs ~$0 when idle.
- scheduler: its own pinned-singleton service (min=max=1, never a scalable target), deployed stop-then-start so a rollout never briefly runs two crons. Drops the ->onOneServer() requirement.
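The queue scaling rule above can be sketched as plain arithmetic. This is a minimal illustration under stated assumptions, not YOLO's implementation: the function names, the target of 10 messages per task, and the cap of 5 tasks are all hypothetical, and in the PR the ratio itself is computed by CloudWatch metric math rather than in application code.

```python
import math
from typing import Optional


def backlog_per_task(visible_messages: int, running_tasks: int) -> Optional[float]:
    """Return messages per running task, or None when no task is running.

    With zero running tasks the ratio is undefined, which is why target
    tracking alone cannot wake a scaled-to-zero service.
    """
    if running_tasks == 0:
        return None
    return visible_messages / running_tasks


def desired_tasks(visible_messages: int, running_tasks: int,
                  target_per_task: int = 10, max_tasks: int = 5) -> int:
    """Hypothetical combination of the two policies described above."""
    ratio = backlog_per_task(visible_messages, running_tasks)
    if ratio is None:
        # Step-scaling path: any visible message lifts the service 0 -> 1.
        return 1 if visible_messages > 0 else 0
    # Target-tracking path: size the service so each task holds roughly
    # target_per_task messages, capped at max_tasks (assumed values).
    return min(max_tasks, math.ceil(visible_messages / target_per_task))
```

With these assumed values, a backlog of 25 messages across 2 tasks asks for 3 tasks, and an empty queue returns the service to 0. Real target tracking also applies cooldowns and alarm periods, which this sketch ignores.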

Topology is encoded by location, not a flag (your call on the design):

| Manifest | Means |
| --- | --- |
| `tasks.web.queue` / `tasks.web.scheduler` | bundled in the web container: warm, instant pickup. Unchanged. |
| top-level `tasks.queue` / `tasks.scheduler` | extracted into its own service with the grown-up config. |
| both, for one workload | hard error: pick one. |

So tasks.web.queue = "a chore the web box also does"; tasks.queue = "a workload that stands on its own." Nothing breaks — existing manifests are untouched, extraction is additive opt-in.
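The location rule can be sketched as a small resolver over a parsed manifest. This is an illustration only: `resolve_topology` and `ManifestError` are hypothetical names, and the sketch assumes the manifest has already been loaded into nested dictionaries. It checks key presence rather than truthiness so that an empty block such as `queue: {}` still counts as configured.

```python
from typing import Any, Dict


class ManifestError(ValueError):
    """Raised when a workload is configured both bundled and extracted."""


def resolve_topology(manifest: Dict[str, Any], workload: str) -> str:
    """Return 'bundled', 'extracted', or 'absent' for 'queue' or 'scheduler'."""
    tasks = manifest.get("tasks") or {}
    web = tasks.get("web") or {}
    bundled = workload in web        # tasks.web.<workload>
    extracted = workload in tasks    # top-level tasks.<workload>
    if bundled and extracted:
        raise ManifestError(
            f"'{workload}' is set under both tasks.web.{workload} and "
            f"tasks.{workload}; pick one."
        )
    if extracted:
        return "extracted"
    return "bundled" if bundled else "absent"
```

An existing manifest that only sets `tasks.web.queue` resolves to `bundled`, which matches the claim that extraction is additive and opt-in.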

Also in this PR:

- `--group` on deploy / run; `scale --queue` (min 0 = scale to zero); group-aware DeployerPolicy.
- One image serves every role: the task definition passes the role as the container command, and the entrypoint dispatches with a per-role graceful drain.
- Retires the dead EC2-era RunsOnAws*Environment detectors + the unused ParsesOnlyOption concern (no Fargate implementors).
- Hardened Manifest::put's surgical YAML writer to fall back to a full re-dump for an inline-empty-map parent (`queue: {}`) rather than corrupting it.
- Docs: scaling guide, manifest + commands reference, yolo.yml / Dockerfile stubs.
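The "one image, role as container command" idea above can be sketched as a lookup from role to process. Everything here is an assumption for illustration: the `ROLE_COMMANDS` table and `command_for` helper are hypothetical, the web command is a placeholder, and although `queue:work` and `schedule:work` are standard Laravel artisan commands, the PR does not state which commands YOLO's entrypoint actually runs or how it drains each role.

```python
from typing import Dict, List

# Hypothetical role -> process table; not taken from the PR.
ROLE_COMMANDS: Dict[str, List[str]] = {
    "web": ["php-fpm"],                               # placeholder web process
    "queue": ["php", "artisan", "queue:work"],        # standard Laravel worker
    "scheduler": ["php", "artisan", "schedule:work"], # standard Laravel scheduler loop
}


def command_for(argv: List[str]) -> List[str]:
    """Map the container command (the role name) to the process to start."""
    if not argv:
        raise ValueError("no role given; expected one of: " + ", ".join(ROLE_COMMANDS))
    role = argv[0]
    if role not in ROLE_COMMANDS:
        raise ValueError(f"unknown role '{role}'")
    # Return a copy so callers cannot mutate the shared table.
    return list(ROLE_COMMANDS[role])
```

Failing loudly on an unknown role is a design choice of this sketch; a real entrypoint might instead default to the web role.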

Is there anything the reviewer needs to know to deploy this?

- No infra was touched and nothing was merged; this is code + docs only.
- No breaking manifest change. Bundled tasks.web.queue / tasks.web.scheduler keep working exactly as before; CL's yolo.yml needs no changes. Extraction is opt-in.
- Multi-tenancy: a standalone queue is one service per app on the default/landlord queue; per-tenant queue fan-out is out of scope and composes with LPX-601.
- The web ALB health-gate is unchanged; a --group-scoped deploy that omits web skips that wait and relies on the ECS circuit breaker for the headless services.
- 553 Pest tests pass · PHPStan clean · Pint clean · VitePress docs build clean. Rebased onto latest main (including #72, "feat(sync): heartbeat + realistic timeout for slow AWS waiters", and the IAM-policy-drift / elasticache deployer changes).
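The health-gate note above reduces to one decision per deploy. The sketch below models only what the description states; `rollout_gate` and the two gate labels are hypothetical names, and what the headless services rely on when deployed together with web is not stated in this PR, so the sketch does not model it.

```python
from typing import Iterable


def rollout_gate(groups: Iterable[str]) -> str:
    """Return which signal a deploy of the given groups waits on."""
    selected = set(groups)
    if "web" in selected:
        # Web is included: the existing ALB health-gate applies, unchanged.
        return "alb-health-gate"
    # Web omitted (for example a --group-scoped deploy of queue only):
    # skip the ALB wait and rely on the ECS deployment circuit breaker.
    return "ecs-circuit-breaker"
```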

🤖 Generated with Claude Code

Promote the bundled web+queue+scheduler task into three independent, group-aware ECS services so each workload scales on its own shape:

- web: target tracking (unchanged)
- queue: standalone service, scale-to-zero by default; backlog-per-task target tracking (MessagesVisible / RunningTaskCount metric math) plus a step-scaling alarm to lift it 0->1; opt-in Fargate Spot
- scheduler: pinned-singleton service (min=max=1), deployed stop-then-start so a rollout never runs two crons (drops the onOneServer() requirement)

Topology is encoded by location: bundled via tasks.web.queue/scheduler (warm, instant pickup, unchanged), extracted via top-level tasks.queue / tasks.scheduler. Configuring a workload both ways hard-fails.

One image serves every role; the task definition passes the role as the container command and the entrypoint dispatches with a per-role graceful drain.

Also: --group on deploy/run, scale --queue, a group-aware DeployerPolicy, and retires the dead EC2-era RunsOnAws*Environment detectors + the unused ParsesOnlyOption concern.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Resolve conflicts from main's audit refactor + DynamoDB removal landing ahead of this PR:

- ScaleCommand: keep the queue-as-its-own-service path (resolveGroup returns ServerGroup; docblock examples) over main's 'queue not yet' placeholders. This PR is what makes queue scaling real.
- SyncAppCommand + advisory test: keep the fuller scheduler advisory that points at the new top-level 'tasks.scheduler' block (main only trimmed it because that feature did not exist there yet).
- docs/guide/scaling.md: take main's lock-store wording (drops the removed DynamoDB, names Valkey/Redis); keep this PR's tasks.scheduler extraction section.
- docs/reference/commands.md: keep the standalone --queue row; drop main's duplicate --min/--max row.

Absorbs main's DynamoDB removal (sessions on Valkey). pint, phpstan, 546 pest, and the VitePress build all pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

stevethomas and others added 2 commits June 2, 2026 18:18

stevethomas wants to merge 2 commits into main from steve/distracted-boyd-d8122f
