Drop unused unique index on task_processor_task.uuid (~7 GB in prod) #222

@gagantrivedi

Summary

The task_processor_task and task_processor_recurringtask tables each carry an auto-generated unique B-tree index on their uuid column (task_processor_task_uuid_key, task_processor_recurringtask_uuid_key). Neither index is read by any query in our codebases.

In production, task_processor_task_uuid_key alone is ~7 GB.

Origin

The uuid field was introduced in the very first migration of the task processor — commit c5110873a ("Async processor (#1334)", 2022-08-03), defined as:

uuid = models.UUIDField(unique=True, default=uuid.uuid4)

unique=True makes Django declare a UNIQUE constraint on the column, which Postgres enforces through an implicitly created unique B-tree index. The field has been carried through every relocation since (extraction to flagsmith-task-processor, then port to flagsmith-common) without ever being queried.

Why this matters

task_processor_task is a high-churn table (insert per enqueued task, update on lock/run, delete on cleanup). A unique index on a randomly generated UUID is one of the more expensive index shapes to maintain: every insert pays for a uniqueness check and a B-tree write at a random position, every delete leaves a dead index entry for vacuum to clean up, and the index never returns the favour with a read. At ~7 GB it's also a non-trivial chunk of buffer cache, backup volume, and replication traffic.
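To make the "random position" point concrete, here is a rough sketch that uses a sorted Python list as a stand-in for index key order (an analogy only, not a model of Postgres's B-tree internals). It compares where sequential ids and random uuid4 values land when inserted; `insert_positions` is a hypothetical helper written for this illustration.

```python
import bisect
import uuid


def insert_positions(keys):
    """For each key, return its relative landing position in the sorted
    sequence at insert time (0.0 = front, 1.0 = appended at the end)."""
    ordered = []
    positions = []
    for key in keys:
        idx = bisect.bisect_left(ordered, key)
        # The very first insert has nothing before it; count it as an append.
        positions.append(idx / len(ordered) if ordered else 1.0)
        ordered.insert(idx, key)
    return positions


n = 2000
sequential = insert_positions(range(n))
random_uuid = insert_positions(uuid.uuid4().bytes for _ in range(n))

# Fraction of inserts that landed at the right-hand edge of the key space.
appended_seq = sum(p == 1.0 for p in sequential) / n
appended_uuid = sum(p == 1.0 for p in random_uuid) / n

print(f"sequential ids appended at the end: {appended_seq:.1%}")
print(f"uuid4 keys appended at the end:     {appended_uuid:.1%}")
```

Sequential keys always extend the right-hand edge, while uuid4 keys scatter across the whole key space, which is why the id-based primary key index stays comparatively cheap to maintain and the uuid one does not.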

Primary key is unaffected

Both tables keep Django's default auto-increment id PK. Every existing query already uses it:

  • Task.objects.filter(pk__in=…) (tasks.py:51)
  • TaskRun / RecurringTaskRun FKs target task_id
  • The get_tasks_to_process() SQL function selects by id ordering

Dropping uuid (or just unique=True) leaves all of that intact.

Proposed change

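Drop unique=True from AbstractBaseTask.uuid (or remove the field outright, pending a check on external consumers, e.g. log/metric pipelines that may emit task.uuid). Either change is a one-migration cleanup. One caveat for prod: the _uuid_key names suggest these indexes back UNIQUE constraints, and Postgres does not allow DROP INDEX CONCURRENTLY on a constraint-backing index. The index is instead removed via ALTER TABLE ... DROP CONSTRAINT (which is what the Django migration should issue), reclaiming the ~7 GB at the cost of a brief ACCESS EXCLUSIVE lock on the table.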

Verification done

  • No .filter(uuid=…) / .get(uuid=…) / task__uuid / raw-SQL reference anywhere.
  • The only uuid lookup in task_processor is on the unrelated HealthCheckModel, which has its own index.
  • RecurringTaskAdmin.list_display renders uuid but does not filter by it.
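For anyone wanting to repeat the check, a source scan along these lines could be used. This is a sketch, not the exact procedure followed above: the regex and the `find_uuid_lookups` helper are illustrative assumptions, it only looks at `.py` files, and it will also flag unrelated models (such as the HealthCheckModel lookup), so hits need manual review.

```python
import re
from pathlib import Path

# Illustrative patterns for ORM lookups on a uuid field.
UUID_LOOKUP = re.compile(
    r"\.(?:filter|get|exclude)\(\s*[^)]*\buuid(?:__\w+)?\s*="  # .filter(uuid=...)
    r"|\btask__uuid\b"  # lookups that traverse a task relation
)


def find_uuid_lookups(root):
    """Return (path, line_number, stripped_line) for each suspicious line
    found in the Python files under root."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if UUID_LOOKUP.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Calling `find_uuid_lookups("path/to/repo")` returns a list that can be reviewed by hand. Raw SQL strings and non-Python consumers would need a separate search.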
