The Problem
Both compose.yaml and compose-prod.yaml run Redis without persistence configured.
By default, Redis only snapshots to disk periodically (see https://redis.io/tutorials/operate/redis-at-scale/persistence-and-durability/), so there is a potentially long window in which enqueued jobs exist only in memory. The volume mount preserves the data directory across restarts, but it does not help if Redis crashes before the next snapshot.
If Redis crashes or is killed during that window, those jobs are silently dropped. This could be a problem during deadline periods, when high volumes of submission collection, PDF splitting, and autotest jobs are actively being enqueued.
The Fix
Add --appendonly yes to the Redis command in both Compose files:
command: redis-server --appendonly yes
Append-only file (AOF) logging appends every write operation to disk sequentially, so Redis can reconstruct its full state on restart, narrowing the data-loss window to about a second. appendfsync everysec is the default when AOF is enabled, so writes are flushed to disk once per second: roughly one second of data can be lost in the worst case, with negligible performance impact (fsync is performed asynchronously in a background thread, per the documentation linked above).
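For context, here is a minimal sketch of how the Redis service might look after the change. The service name, image tag, and volume name are assumptions for illustration and may not match what compose.yaml and compose-prod.yaml actually contain; only the command line reflects this change. The appendfsync flag is spelled out for clarity even though everysec is already the default.

```yaml
services:
  redis:                     # hypothetical service name
    image: redis:7           # assumed image tag; use whatever the Compose files already pin
    # --appendonly yes enables AOF logging.
    # --appendfsync everysec is the default with AOF and is shown only to make it explicit.
    command: redis-server --appendonly yes --appendfsync everysec
    volumes:
      - redis_data:/data     # the official image uses /data as its working directory

volumes:
  redis_data:                # hypothetical named volume
```

With AOF enabled, the append-only files land in the same data directory as the snapshot, so the existing volume mount is what carries them across container restarts.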