
Redis queue not durable #7961

@akarki2005


The Problem

Both `compose.yaml` and `compose-prod.yaml` run Redis with its default persistence settings, i.e. without append-only file (AOF) persistence enabled.

By default, Redis persists data only through periodic RDB snapshots (see https://redis.io/tutorials/operate/redis-at-scale/persistence-and-durability/). Jobs enqueued since the last snapshot therefore exist only in memory, and depending on write volume that window can last several minutes. The volume mount preserves the data directory across restarts, but it does not help if Redis crashes before the next snapshot is written.

If Redis crashes or is killed during that window, the jobs enqueued since the last snapshot are silently lost. This is most likely to matter during deadline periods, when high volumes of submission collection, PDF splitting, and autotest jobs are being enqueued.
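To put a rough number on that window: a `save <seconds> <changes>` rule triggers a snapshot once both the given number of seconds has elapsed and at least that many keys have changed since the last save. The sketch below uses a simplified model with a steady rate of λ key changes per second, and assumes the built-in Redis 7 default save points `3600 1 300 100 60 10000` apply (i.e. no custom redis.conf; worth confirming with `CONFIG GET save`). The two rates shown are illustrative, not measured, and a single job may change several keys.

$$
T_{\text{snap}}(\lambda) \approx \min_{i} \max\!\left(s_i,\ \frac{c_i}{\lambda}\right),
\qquad (s_i, c_i) \in \{(3600, 1),\ (300, 100),\ (60, 10000)\}
$$

$$
\begin{aligned}
\lambda = 1\ \text{s}^{-1}: &\quad T_{\text{snap}} \approx \min(3600,\ 300,\ 10000) = 300\ \text{s} \\
\lambda = 200\ \text{s}^{-1}: &\quad T_{\text{snap}} \approx \min(3600,\ 300,\ 60) = 60\ \text{s}
\end{aligned}
$$

Because every default save point has $s_i \ge 60$ s, under this model the window never shrinks below about a minute, even at deadline-level write rates, whereas AOF with `appendfsync everysec` bounds it to roughly one second.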

The Fix

Add `--appendonly yes` to the Redis command in both Compose files:

```yaml
command: redis-server --appendonly yes
```

Append-only file (AOF) logging appends every write operation to a file on disk, which Redis replays on restart to reconstruct its full state. This shrinks the data-loss window from the snapshot interval to the fsync interval. `appendfsync everysec` is the default when AOF is enabled, so Redis flushes the file to disk once per second: at most about one second of writes can be lost, with little performance impact, because the fsync runs in a background thread (per the documentation linked above).
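A sketch of how the Redis service in either Compose file might look with the change applied. The service name `redis`, the image tag, and the volume name `redis-data` are hypothetical placeholders, not taken from `compose.yaml` or `compose-prod.yaml`; keep whatever those files already use. Passing `--appendfsync everysec` is optional since it is already the default, but it records the durability choice in the file.

```yaml
services:
  redis:                     # hypothetical service name
    image: redis:7           # assumed image/tag; keep the one already in use
    # Enable AOF. everysec is the default once AOF is on, but stating it
    # makes the "about one second of writes at most" bound explicit.
    command: redis-server --appendonly yes --appendfsync everysec
    volumes:
      # The official image's working directory is /data, so the AOF files
      # (appendonlydir/ on Redis 7+) land on the existing volume.
      - redis-data:/data     # hypothetical volume name

volumes:
  redis-data:
```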
