LPX-649: extract queue & scheduler into their own ECS services #73
Open
stevethomas wants to merge 2 commits into
Promote the bundled web+queue+scheduler task into three independent, group-aware ECS services so each workload scales on its own shape:

- web: target tracking (unchanged)
- queue: standalone service, scale-to-zero by default; backlog-per-task target tracking (MessagesVisible / RunningTaskCount metric math) plus a step-scaling alarm to lift it 0->1; opt-in Fargate Spot
- scheduler: pinned-singleton service (min=max=1), deployed stop-then-start so a rollout never runs two crons (drops the onOneServer() requirement)

Topology is encoded by location: bundled via tasks.web.queue/scheduler (warm, instant pickup; unchanged), extracted via top-level tasks.queue / tasks.scheduler. Configuring a workload both ways hard-fails.

One image serves every role; the task definition passes the role as the container command and the entrypoint dispatches with a per-role graceful drain.

Also: --group on deploy/run, scale --queue, a group-aware DeployerPolicy, and retires the dead EC2-era RunsOnAws*Environment detectors + the unused ParsesOnlyOption concern.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
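The location rule in this commit message (bundled under tasks.web, extracted at the top level, both at once is an error) can be sketched as follows. This is a minimal illustration, not the PR's implementation: the function name resolve_topology and the plain-dict manifest shape are assumptions made for the example.

```python
from typing import Optional

# Workloads that can be either bundled into the web task or extracted.
WORKLOADS = ("queue", "scheduler")


def resolve_topology(tasks: dict) -> dict[str, Optional[str]]:
    """Classify each workload as 'bundled', 'extracted', or None (not configured).

    Hypothetical helper: `tasks` mirrors the manifest's tasks block, with
    bundled workloads nested under tasks.web and extracted ones at the top
    level. Raises ValueError when a workload is configured both ways.
    """
    web = tasks.get("web") or {}
    topology: dict[str, Optional[str]] = {}
    for name in WORKLOADS:
        # Key presence decides, not truthiness, so an empty map such as
        # `queue: {}` still counts as configured.
        bundled = name in web
        extracted = name in tasks
        if bundled and extracted:
            raise ValueError(
                f"'{name}' is configured under both tasks.web.{name} and tasks.{name}"
            )
        topology[name] = "bundled" if bundled else "extracted" if extracted else None
    return topology


# Queue stays bundled with web; scheduler is extracted into its own service.
print(resolve_topology({"web": {"queue": {}}, "scheduler": {}}))
# {'queue': 'bundled', 'scheduler': 'extracted'}
```

Checking key presence rather than truthiness matters here because an extracted workload with all defaults would be written as an empty map.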
Resolve conflicts from main's audit refactor + DynamoDB removal landing ahead of this PR:

- ScaleCommand: keep the queue-as-its-own-service path (resolveGroup returns ServerGroup; docblock examples) over main's 'queue not yet' placeholders; this PR is what makes queue scaling real.
- SyncAppCommand + advisory test: keep the fuller scheduler advisory that points at the new top-level 'tasks.scheduler' block (main only trimmed it because that feature did not exist there yet).
- docs/guide/scaling.md: take main's lock-store wording (drops the removed DynamoDB, names Valkey/Redis); keep this PR's tasks.scheduler extraction section.
- docs/reference/commands.md: keep the standalone --queue row; drop main's duplicate --min/--max row.

Absorbs main's DynamoDB removal (sessions on Valkey). pint, phpstan, 546 pest, and the VitePress build all pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Hey, I made a thing! 🥳
LPX-649 — extract queue & scheduler into their own ECS services.
What problems are you solving?
YOLO bundles web + queue + scheduler into one Fargate task, coupling three workloads with different scaling shapes onto a single desiredCount. This PR makes each one its own service so it can scale independently:

- web: target tracking (unchanged).
- queue: standalone service, scale-to-zero by default, with backlog-per-task target tracking (ApproximateNumberOfMessagesVisible / RunningTaskCount via CloudWatch metric math, no Lambda), plus a step-scaling alarm that lifts it 0→1 the instant a message lands (target tracking can't divide by zero running tasks). Opt-in Fargate Spot (~70% cheaper). Costs ~$0 idle.
- scheduler: pinned-singleton service (min=max=1, never a scalable target), deployed stop-then-start so a rollout never briefly runs two crons. Drops the ->onOneServer() requirement.

Topology is encoded by location, not a flag (your call on the design):
- tasks.web.queue / tasks.web.scheduler: bundled into the web task (warm, instant pickup; unchanged)
- tasks.queue / tasks.scheduler: extracted into their own services

So tasks.web.queue = "a chore the web box also does"; tasks.queue = "a workload that stands on its own." Nothing breaks: existing manifests are untouched, and extraction is an additive opt-in.

Also in this PR:
- --group on deploy / run; scale --queue (min 0 = scale to zero); group-aware DeployerPolicy.
- Retires the dead EC2-era RunsOnAws*Environment detectors + the unused ParsesOnlyOption concern (no Fargate implementors).
- Manifest::put's surgical YAML writer now falls back to a full re-dump for an inline-empty-map parent (queue: {}) rather than corrupting it.
- yolo.yml / Dockerfile stubs.

Is there anything the reviewer needs to know to deploy this?
- tasks.web.queue/scheduler keep working exactly as before; CL's yolo.yml needs no changes. Extraction is opt-in.
- A --group-scoped deploy that omits web skips that wait and relies on the ECS circuit breaker for the headless services.
- main (incl. feat(sync): heartbeat + realistic timeout for slow AWS waiters #72 and the IAM-policy-drift / elasticache deployer changes).

🤖 Generated with Claude Code
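The queue scaling rule in the description (backlog per task, plus a separate lift from zero) can be modelled roughly as below. This is a simplified sketch, not the PR's code and not how Application Auto Scaling computes capacity internally: the function names, the target_backlog parameter, and the "visible / target" approximation are all assumptions, and cooldowns and alarm evaluation periods are ignored.

```python
import math
from typing import Optional


def backlog_per_task(messages_visible: int, running_tasks: int) -> Optional[float]:
    """MessagesVisible / RunningTaskCount; undefined (None) with zero running tasks."""
    if running_tasks == 0:
        return None
    return messages_visible / running_tasks


def desired_tasks(
    messages_visible: int,
    running_tasks: int,
    target_backlog: int,
    max_tasks: int,
    min_tasks: int = 0,
) -> int:
    """Approximate the task count the two policies would settle on together."""
    if backlog_per_task(messages_visible, running_tasks) is None:
        # The ratio is undefined at zero tasks, so target tracking has nothing
        # to act on; a step rule lifts 0 -> 1 once any message is visible.
        wanted = 1 if messages_visible > 0 else 0
    else:
        # Holding backlog-per-task at the target needs about visible / target tasks.
        wanted = math.ceil(messages_visible / target_backlog)
    # Clamp to the configured bounds (min_tasks=0 allows scale to zero).
    return max(min_tasks, min(max_tasks, wanted))


print(desired_tasks(5, 0, target_backlog=10, max_tasks=8))   # idle service, first messages land
print(desired_tasks(45, 1, target_backlog=10, max_tasks=8))  # backlog grows past one task's share
print(desired_tasks(0, 3, target_backlog=10, max_tasks=8))   # queue drained
```

In this model an idle queue with a few visible messages goes to one task, a backlog of 45 at a target of 10 per task asks for five, and a drained queue returns to zero.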