Introduce dedicated stage EFS; fix MQ broker drift and Memcached SG by e9e4e5f0faef · Pull Request #378 · thunderbird/addons-server

e9e4e5f0faef · 2026-04-13T17:53:40Z

Summary

Replace the previous shared-EFS approach with a dedicated stage-owned EFS filesystem
Add an EFS access point (/addons, POSIX 9500:9500) so the eventual NETAPP_STORAGE_ROOT flip does not fail on filesystem permissions for the non-root olympia runtime user
Fix perpetual Amazon MQ broker replacement caused by engineType casing mismatch
Move the Memcached ingress rule to the security group the cluster actually uses
Fix pre-flight validator to recognise Amazon MQ .on.aws domain endpoints
Remove obsolete SG rules from the superseded storage and broker designs

Changes

File	Change
`infra/pulumi/__main__.py`	Add dedicated stage EFS filesystem, mount targets, and access point; inject access-point authorisation into web/worker/cron volume configs (scoped to the `addons-efs` volume); correct Memcached ingress rule placement; fix `engine_type` casing to prevent force-new broker replacement; remove obsolete SG rules
`infra/pulumi/config.stage.yaml`	Replace shared EFS mount-target configuration with dedicated stage EFS configuration
`infra/scripts/preflight_check.py`	Recognise `.mq.<region>.on.aws` endpoints in broker isolation and SG reachability checks

Why

Dedicated stage EFS: the previous approach tried to mount a shared filesystem across VPC boundaries and failed with MountTargetConflict, because an EFS filesystem can have mount targets in only one VPC at a time. A dedicated stage filesystem keeps storage aligned with the stage isolation model and avoids the cross-VPC limitation.

EFS access point for olympia UID/GID 9500: the application runs as olympia (UID 9500) per Dockerfile.ecs. Without an access point, the eventual NETAPP_STORAGE_ROOT flip from /tmp/storage to /var/addons would fail with EACCES: the root directory of a new EFS filesystem is owned by root:root with 0755 permissions, so a non-root user cannot create files in it. The access point roots the mount at /addons and applies POSIX identity 9500:9500, giving the runtime user a writable subtree without a one-off root-privileged bootstrap task at activation time. Containers still mount the volume at /var/addons via mountPoints. The access-point authorisation is injected only into the volume named addons-efs, in the web/worker task definitions (defined in YAML) and the cron task definition (constructed in Python).
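The scoped injection can be sketched as a pure transformation over ECS task-definition volume dicts, using the camelCase efsVolumeConfiguration shape from task-definition JSON. This is a minimal sketch, not the code in infra/pulumi/__main__.py: the helper name inject_access_point is hypothetical, and it assumes volumes are plain dicts before they are serialised into the task definition. It reflects two ECS rules for access points: transit encryption must be enabled, and rootDirectory must be omitted or "/" so that the access point's own root (/addons) is enforced.

```python
from __future__ import annotations

import copy
from typing import Any

# Volume name taken from the PR; everything else here is illustrative.
EFS_VOLUME_NAME = "addons-efs"


def inject_access_point(
    volumes: list[dict[str, Any]],
    file_system_id: str,
    access_point_id: str,
) -> list[dict[str, Any]]:
    """Return a copy of `volumes` with access-point authorisation added
    to the addons-efs volume only (hypothetical helper)."""
    result = copy.deepcopy(volumes)  # never mutate the caller's config
    for volume in result:
        if volume.get("name") != EFS_VOLUME_NAME:
            continue  # other volumes are left exactly as they were
        efs = volume.setdefault("efsVolumeConfiguration", {})
        efs["fileSystemId"] = file_system_id
        # ECS requires transit encryption when an access point is used.
        efs["transitEncryption"] = "ENABLED"
        # rootDirectory must be omitted (or "/") with an access point;
        # the access point's own root directory (/addons) is enforced.
        efs.pop("rootDirectory", None)
        efs["authorizationConfig"] = {"accessPointId": access_point_id}
    return result
```

Because only the addons-efs entry is touched, the same helper could be applied both to YAML-loaded web/worker volumes and to Python-built cron volumes.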

MQ broker drift fix: AWS returns engineType as RabbitMQ, while the code previously used RABBITMQ. Because this is a force-new field, the mismatch caused a perpetual broker replacement diff. The fix aligns the configured value with the value returned by AWS.
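The drift can be modelled as an exact string comparison on a force-new field. This is a simplified, hypothetical model of how a provider diff behaves, not Pulumi's actual diff engine; the helper and field names are illustrative.

```python
from __future__ import annotations

from typing import Any


def replacement_fields(
    desired: dict[str, Any],
    actual: dict[str, Any],
    force_new: set[str],
) -> set[str]:
    """Force-new fields whose configured value differs from the value the
    cloud reports; any non-empty result means the resource is replaced."""
    return {
        field
        for field in force_new
        if field in desired and desired[field] != actual.get(field)
    }


# What AWS reports back for the broker, per the PR description.
reported = {"engine_type": "RabbitMQ"}

before = replacement_fields({"engine_type": "RABBITMQ"}, reported, {"engine_type"})
after = replacement_fields({"engine_type": "RabbitMQ"}, reported, {"engine_type"})
print(before, after)  # {'engine_type'} set()
```

Normalising case in a custom comparison would also hide the diff, but matching the value AWS reports keeps configured and actual state identical, which is the simpler fix.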

Memcached SG correction: the port 11211 ingress rule was attached to a security group that the Memcached cluster does not use, so it had no effect. This moves it to the SG the cluster actually uses and restores the intended cache connectivity.
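Why the misplaced rule had no effect: an inbound rule only governs traffic to resources that carry the security group it is attached to. The sketch below models that with hypothetical types; it deliberately ignores CIDR sources, port ranges and protocols, and the SG IDs in the usage are made up.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class IngressRule:
    attached_sg: str  # security group the rule belongs to
    port: int         # single port, for simplicity
    source_sg: str    # security group allowed to connect


def allows(
    rules: Iterable[IngressRule],
    target_sgs: set[str],
    client_sgs: set[str],
    port: int,
) -> bool:
    """True if some rule on one of the target's own SGs admits a client SG on `port`."""
    return any(
        rule.port == port
        and rule.attached_sg in target_sgs
        and rule.source_sg in client_sgs
        for rule in rules
    )


cache_sgs = {"sg-cache"}  # hypothetical SG attached to the Memcached cluster
app_sgs = {"sg-app"}      # hypothetical SG attached to web/worker tasks
misplaced = [IngressRule("sg-unused", 11211, "sg-app")]
fixed = [IngressRule("sg-cache", 11211, "sg-app")]
print(allows(misplaced, cache_sgs, app_sgs, 11211))  # False
print(allows(fixed, cache_sgs, app_sgs, 11211))      # True
```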

Validator fix: Amazon MQ for RabbitMQ endpoints use the .mq.<region>.on.aws domain, which the validator did not previously recognise in its broker isolation and SG reachability checks.
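A host check that accepts both domains might look like the following. It is a sketch, not the code in infra/scripts/preflight_check.py: the function name is hypothetical, and treating <id>.mq.<region>.amazonaws.com as the previously recognised form is an assumption. It accepts URLs with a scheme or bare hostnames without a port.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Assumption: older brokers use <id>.mq.<region>.amazonaws.com; the PR adds
# recognition of <id>.mq.<region>.on.aws. The pattern is illustrative only.
_MQ_HOST = re.compile(
    r"^[a-z0-9-]+\.mq\.(?P<region>[a-z]{2}(?:-[a-z]+)+-\d+)\.(?:amazonaws\.com|on\.aws)$"
)


def mq_broker_region(endpoint: str) -> Optional[str]:
    """Return the AWS region of an Amazon MQ endpoint, or None if the host
    does not look like an Amazon MQ broker (hypothetical helper)."""
    # urlparse only yields a hostname when a scheme is present; fall back
    # to treating the input as a bare hostname otherwise.
    host = urlparse(endpoint).hostname or endpoint
    match = _MQ_HOST.match(host.lower())
    return match.group("region") if match else None
```

Returning the region rather than a boolean lets a caller also confirm that the broker lives in the expected stage region.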

Storage activation model (staged)

This PR creates and mounts the dedicated stage EFS filesystem at /var/addons on web/worker/cron, with the access point in place so the runtime user can write into it. It does not switch application writes to EFS.

After this PR deploys:

EFS is mounted at /var/addons via the /addons access point
NETAPP_STORAGE_ROOT remains /tmp/storage
Application file writes therefore remain ephemeral until a deliberate later flip
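A quick way to confirm which side of the flip a task is on is a lexical check of NETAPP_STORAGE_ROOT against the mount path. The helper below is hypothetical; it does not resolve symlinks and assumes the application writes directly under NETAPP_STORAGE_ROOT.

```python
from __future__ import annotations

import os
from pathlib import PurePosixPath
from typing import Mapping, Optional

# Mount path from the PR; the helper itself is hypothetical.
EFS_MOUNT = PurePosixPath("/var/addons")


def storage_is_efs_backed(env: Optional[Mapping[str, str]] = None) -> bool:
    """True if NETAPP_STORAGE_ROOT is the EFS mount or a path beneath it."""
    env = os.environ if env is None else env
    root = env.get("NETAPP_STORAGE_ROOT")
    if not root:
        return False
    path = PurePosixPath(root)
    return path == EFS_MOUNT or EFS_MOUNT in path.parents
```

Immediately after this PR deploys, the check should return False, since the root is still /tmp/storage.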

The NETAPP_STORAGE_ROOT flip is intentionally a separate, future operational step. It is gated on post-deploy validation (mount verified, write/read/delete as olympia UID 9500 succeeds, persistence across task restart confirmed).
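The write/read/delete gate could be scripted roughly as follows and run inside a task. This is a hypothetical sketch: the function name and probe file naming are illustrative, and it checks the process UID only (with the access point in place, EFS also applies 9500:9500 to file operations). It does not cover persistence across a task restart, which needs a marker written before the restart and read after it.

```python
import os
import tempfile
import uuid
from pathlib import Path


def probe_storage(root: str, expected_uid: int = 9500) -> None:
    """Write, read back, and delete a probe file under `root` as the runtime
    user; raise on any failure (hypothetical post-deploy check)."""
    uid = os.getuid()
    if uid != expected_uid:
        raise RuntimeError(f"running as UID {uid}, expected {expected_uid}")
    probe = Path(root) / f".probe-{uuid.uuid4().hex}"
    payload = b"efs-write-probe"
    probe.write_bytes(payload)  # PermissionError (EACCES) if root is not writable
    try:
        if probe.read_bytes() != payload:
            raise RuntimeError("read-back did not match what was written")
    finally:
        probe.unlink()  # delete step; also cleans up after a failed read
    if probe.exists():
        raise RuntimeError("probe file still present after delete")


if __name__ == "__main__":
    # Local demo against a temporary directory as the current user;
    # inside a task you would call probe_storage("/var/addons").
    with tempfile.TemporaryDirectory() as tmp:
        probe_storage(tmp, expected_uid=os.getuid())
        print("write/read/delete ok")
```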

Validation

pulumi preview shows + 6 to create / ~ 20 to update / - 3 to delete / +- 3 to replace / = 140 unchanged. The creates are the expected dedicated EFS resources, the access point, and the corrected Memcached SG rule. The replaces are the task definitions, which are replaced to pick up the new filesystem ID and access-point authorisation. No broker replacement is planned.
ruff check and ruff format --check pass
Pre-flight validator passes with the .on.aws fix applied
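The "no broker replacement" check can be made mechanical by scanning the output of pulumi preview --json. This sketch assumes the JSON has a top-level steps array whose entries carry op and urn fields, and that replacements appear as replace, create-replacement or delete-replaced ops; confirm that shape against the CLI version in use. The function names and the stack/project names in the test are hypothetical.

```python
from __future__ import annotations

import json
from typing import Any

# Step ops treated as a replacement (assumed from preview JSON output).
REPLACE_OPS = {"replace", "create-replacement", "delete-replaced"}


def resource_type(urn: str) -> str:
    """Extract the type token from a Pulumi URN such as
    urn:pulumi:<stack>::<project>::<parent$type>::<name>."""
    qualified_type = urn.split("::", 3)[2]
    return qualified_type.split("$")[-1]  # drop any parent-type prefix


def replaced(preview: dict[str, Any], type_token: str) -> list[str]:
    """URNs of resources of `type_token` that the preview plans to replace."""
    return sorted(
        {
            step["urn"]
            for step in preview.get("steps", [])
            if step.get("op") in REPLACE_OPS
            and resource_type(step["urn"]) == type_token
        }
    )


def broker_replaced(preview_path: str) -> bool:
    """Load saved `pulumi preview --json` output and check the MQ broker."""
    with open(preview_path) as fh:
        return bool(replaced(json.load(fh), "aws:mq/broker:Broker"))
```

Deduplicating by URN keeps one entry per resource even when a replacement is reported as several steps.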

Safety

Scheduled tasks remain disabled
The MQ change is drift prevention only; no functional broker migration is included here
The new EFS filesystem is created empty; no existing data is modified
Application writes do not reach EFS because NETAPP_STORAGE_ROOT still points at /tmp/storage
The removed SG rules belong to superseded designs and are not used by running services

Follow-up

Run pulumi up
Validate task startup and EFS mount behaviour after deploy (mount visible at /var/addons)
Validate write/read/delete as olympia UID 9500 against the access point
Flip NETAPP_STORAGE_ROOT to the EFS-backed path after mount and write verification
The monitoring baseline (#379, Add env-gating monitoring baseline for ATN stage) lands once this PR merges and #379 is retargeted to stage

Addresses part of #375, with issue closure to follow post-deploy validation.

fix(preflight): recognise Amazon MQ .on.aws domain in broker and SG checks

980fb96

e9e4e5f0faef requested a review from Sancus April 13, 2026 17:58

e9e4e5f0faef self-assigned this Apr 13, 2026

fix(pulumi): MQ broker engineType casing, Memcached SG placement

7d13018

e9e4e5f0faef changed the title from "Introduce dedicated stage EFS; fix .on.aws validator support" to "Introduce dedicated stage EFS; fix MQ broker drift and Memcached SG" Apr 15, 2026

e9e4e5f0faef mentioned this pull request Apr 16, 2026 in Add env-gating monitoring baseline for ATN stage #379 (Open, 8 tasks)

fix(pulumi): add EFS access point for olympia UID before storage activation

335bab6

e9e4e5f0faef wants to merge 3 commits into stage from feat/stage-efs-isolation
