
Content app OOM on directory listing for large repositories #7745

@dkliban

Problem

When a client requests a directory listing for a repository with a very large number of content units, the content app builds the entire listing in memory, causing an instant OOM kill of the pulp-content pod.

In production, requesting the directory listing for @rubygems/rubygems/fedora-43-x86_64 (266,934 RPM packages) causes a memory spike of roughly 800MB-1.2GB within a single request, which immediately OOM-kills the content pod (2 GiB memory limit). The pod has been OOM-killed repeatedly by this request pattern.

Root Cause

In pulpcore/content/handler.py, the list_directory_blocking() method (line 606) and the render_html() method (line 534) build the entire directory listing in memory:

  1. list_directory_blocking() iterates over all ContentArtifact objects matching the path and builds four in-memory collections (the directory_list set and the dates, content_to_find, and sizes dicts). For 267K packages, this loads ~267K Django ORM objects, each carrying its related Artifact via select_related("artifact").

  2. It then iterates over content_repo_ver._content_relationships() to update the dates, loading another ~267K RepositoryContent objects.

  3. render_html() sorts the 267K-entry set, then renders a Jinja template that writes ~267K <a href> lines into a single HTML string (~27MB of HTML).

  4. The complete HTML string is returned via HTTPOk(text=...), holding the entire response in memory.
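The pattern in steps 1, 3 and 4 can be reproduced in miniature with the standard library. This is a sketch, not pulpcore code: `build_listing_in_memory` and `peak_bytes` are hypothetical names, plain `(name, date, size)` tuples stand in for ContentArtifact rows, and the `content_to_find` bookkeeping and the step-2 pass are omitted. Because every entry is kept until the final string is joined, the peak reported by `tracemalloc` should grow roughly linearly with the number of entries; absolute numbers are far smaller than in production because tuples are much lighter than ORM objects.

```python
import tracemalloc
from datetime import datetime, timezone


def build_listing_in_memory(entries):
    """Hypothetical stand-in for the current pattern (not pulpcore code).

    Every entry is collected into a set and dicts, the full set is sorted,
    and the whole page is joined into one string that is returned at once.
    """
    directory_list = set()
    dates = {}
    sizes = {}
    for name, date, size in entries:  # step 1: keep every row around
        directory_list.add(name)
        dates[name] = date
        sizes[name] = size
    lines = [
        f'<a href="{name}">{name}</a> {dates[name]:%d-%b-%Y %H:%M} {sizes[name]}'
        for name in sorted(directory_list)  # step 3: sort all entries, render all lines
    ]
    # Step 4: the complete HTML exists as a single string before anything is sent.
    return "<html><body><pre>\n" + "\n".join(lines) + "\n</pre></body></html>"


def peak_bytes(n):
    """Return (peak traced bytes, HTML length) for a listing of n fake entries."""
    stamp = datetime(2026, 5, 22, 9, 16, tzinfo=timezone.utc)
    entries = ((f"pkg-{i}.rpm", stamp, 1024 + i) for i in range(n))
    tracemalloc.start()
    page = build_listing_in_memory(entries)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak, len(page)


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        peak, size = peak_bytes(n)
        print(f"{n:>7} entries: peak {peak / 1e6:.1f} MB, html {size / 1e6:.1f} MB")
```

For example, `peak_bytes(20_000)` should report a peak several times larger than `peak_bytes(2_000)`; exact figures depend on the Python version.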

Estimated total memory impact: ~267K ORM objects (~267MB) + four dicts/sets of 267K entries each + the sorted list + the rendered HTML ≈ 800MB-1.2GB.

Evidence from Production

  • Pod: pulp-content-7469c446f6-89jz5 (2 GiB memory limit)
  • OOM kill 1: 2026-05-22 09:16 UTC — memory jumped from 521MB to 1365MB in one minute
  • OOM kill 2: 2026-05-22 10:28 UTC — memory jumped from 477MB to 1640MB in one minute
  • Repository: @rubygems/rubygems/fedora-43-x86_64 (266,934 RPM packages in the latest version)
  • All other requests in the access logs were 302 redirects (not memory-intensive)
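The estimate and the production evidence can be cross-checked with a little arithmetic. This sketch only reuses figures quoted in this issue, treats MB as 10^6 bytes, and derives the per-entry costs those figures imply (roughly 101 bytes per rendered line and 1,000 bytes per ORM object); because the inputs are rounded, the results are approximate.

```python
# Figures quoted in this issue (MB treated as 10**6 bytes).
ENTRIES = 266_934
HTML_MB = 27                  # "~27MB of HTML"
ORM_MB = 267                  # "~267K ORM objects (~267MB)"
ESTIMATE_MB = (800, 1200)     # "≈ 800MB-1.2GB"

# Per-entry costs implied by those (rounded) figures.
html_bytes_per_entry = HTML_MB * 10**6 / ENTRIES   # about 101 bytes per <a href> line
orm_bytes_per_entry = ORM_MB * 10**6 / ENTRIES     # about 1,000 bytes per ORM object

# Observed one-minute jumps on the content pod, as (before, after) in MB.
observed = [(521, 1365), (477, 1640)]
spikes_mb = [after - before for before, after in observed]

# Both spikes should land inside the estimated range.
within_estimate = all(ESTIMATE_MB[0] <= s <= ESTIMATE_MB[1] for s in spikes_mb)

print(f"{html_bytes_per_entry:.0f} B/line, {orm_bytes_per_entry:.0f} B/object, spikes {spikes_mb} MB")
```

Both observed jumps (844MB and 1163MB) fall inside the 800MB-1.2GB estimate.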

Suggested Approaches

  • Stream the HTML response: Use StreamResponse to write the directory listing in chunks instead of building the entire HTML string in memory
  • Paginate: Limit directory listings to a configurable maximum number of entries (e.g., 10,000) with pagination links
  • Cap and warn: If the directory listing exceeds a threshold, return a truncated listing with a message indicating the listing is too large
  • Lazy iteration: Use Django's .iterator() on the queryset and stream entries as they're fetched from the database, avoiding materializing all ORM objects at once
  • Pre-generate at publish/version time: Generate the HTML directory listing pages when a publication or repository version is created, storing them as static artifacts. The content app would then serve pre-built pages instead of generating them on each request
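Several of these approaches (streaming, lazy iteration, cap, and pagination) can be combined in one renderer. The following is a minimal sketch under stated assumptions, not a proposed patch: `iter_listing_html` is a hypothetical helper, and it assumes the caller supplies entries that are already sorted (for example by the database) and produced lazily (for example from a queryset consumed with Django's `.iterator()`). It yields HTML in batches and renders at most `limit` entries, appending a "next page" link when more exist, so memory stays proportional to `batch_size` rather than to the repository size.

```python
import html
from itertools import islice
from typing import Iterable, Iterator, Tuple
from urllib.parse import quote

Entry = Tuple[str, str, int]  # (name, preformatted date, size in bytes)


def iter_listing_html(
    entries: Iterable[Entry],
    offset: int = 0,
    limit: int = 10_000,
    batch_size: int = 500,
) -> Iterator[str]:
    """Yield one page of an HTML directory listing as a series of chunks.

    Assumes `entries` is already sorted and produced lazily, so this function
    never holds more than `batch_size` rendered lines at once.
    """
    yield "<html><body><pre>\n"
    # Read one entry past the page so we know whether a next page exists.
    page = islice(entries, offset, offset + limit + 1)
    batch = []
    has_more = False
    for count, (name, date, size) in enumerate(page):
        if count == limit:
            has_more = True
            break
        # URL-quote the href and HTML-escape the visible text.
        batch.append(f'<a href="{quote(name)}">{html.escape(name)}</a> {date} {size}\n')
        if len(batch) >= batch_size:
            yield "".join(batch)  # flush a chunk instead of growing one big string
            batch = []
    if batch:
        yield "".join(batch)
    if has_more:
        # Hypothetical "offset" query parameter, for illustration only.
        yield f'<a href="?offset={offset + limit}">next page</a>\n'
    yield "</pre></body></html>\n"
```

In an aiohttp handler, each chunk could be sent with `web.StreamResponse`: call `await response.prepare(request)`, then `await response.write(chunk.encode("utf-8"))` for every chunk, and finish with `await response.write_eof()`. Two caveats: `islice` still reads and discards the first `offset` rows, so deep pages would be better served by pushing the offset into the query (a `LIMIT`/`OFFSET` or keyset condition); and Django ORM access is synchronous, so how batches are fetched off the event loop is left open here.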

Related

The content app also has a gradual memory leak (~7.3 MB per 1000 requests) that compounds this issue. Worker recycling via --max-requests is being enabled separately to address the leak.
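Worker recycling bounds how much memory a single worker can accumulate from the leak. This config sketch assumes the content app runs under gunicorn (as the --max-requests flag suggests) and uses gunicorn's `max_requests` and `max_requests_jitter` settings; the values are hypothetical and not the ones being deployed. Gunicorn restarts a worker after `max_requests` plus a random jitter between 0 and `max_requests_jitter` requests, so the worst case follows directly from the leak rate reported above.

```python
# gunicorn.conf.py (hypothetical values, for illustration only)
# Assumes the content app runs under gunicorn, which reads these settings
# and ignores names it does not recognize.

LEAK_MB_PER_1000_REQUESTS = 7.3  # leak rate reported in this issue

max_requests = 1000          # restart a worker after this many requests...
max_requests_jitter = 100    # ...plus randint(0, jitter), so workers don't restart in lockstep

# Worst-case memory a single worker can leak before it is recycled.
max_leak_mb = LEAK_MB_PER_1000_REQUESTS * (max_requests + max_requests_jitter) / 1000
```

With these values a worker would leak at most about 8 MB before being recycled. Recycling only bounds the gradual leak; it cannot help with the directory listing itself, since a single request already exceeds the memory limit.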

Update: 504 timeout even when OOM is resolved

After increasing the content pod memory limit from 2Gi to 3Gi, the directory listing for @rubygems/rubygems/fedora-43-x86_64 (266,934 packages) no longer OOM-kills the pod: memory peaked at 1532MB and the pod survived. However, the request still fails with a 504 Gateway Timeout because building the directory listing takes more than 30 seconds.

This reinforces the need for an approach that avoids building the entire response at request time.
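The pre-generation approach can be sketched with the standard library. This is a hypothetical outline, not pulpcore behavior: `write_static_listing`, the `index-N.html` naming, and the default page size are assumptions. It would run once when a publication or repository version is created and write paginated static pages, holding at most two pages of entries in memory (the current page plus a one-page look-ahead used to decide whether to emit a "next page" link). A request would then only need to serve a pre-built file.

```python
import html
import os
import tempfile
from itertools import islice
from typing import Iterable, List, Tuple
from urllib.parse import quote

Entry = Tuple[str, str, int]  # (name, preformatted date, size in bytes)


def write_static_listing(entries: Iterable[Entry], out_dir: str, page_size: int = 10_000) -> List[str]:
    """Write a sorted listing as index.html, index-2.html, ... in out_dir.

    Meant to run once at publish/version-creation time. At most two pages of
    entries are held in memory: the page being written and a one-page
    look-ahead used to decide whether to emit a "next page" link.
    """
    os.makedirs(out_dir, exist_ok=True)
    it = iter(entries)
    paths = []
    page_no = 1
    page = list(islice(it, page_size))
    while True:
        nxt = list(islice(it, page_size))
        filename = "index.html" if page_no == 1 else f"index-{page_no}.html"
        path = os.path.join(out_dir, filename)
        with open(path, "w", encoding="utf-8") as fh:
            fh.write("<html><body><pre>\n")
            for name, date, size in page:
                fh.write(f'<a href="{quote(name)}">{html.escape(name)}</a> {date} {size}\n')
            if nxt:
                fh.write(f'<a href="index-{page_no + 1}.html">next page</a>\n')
            fh.write("</pre></body></html>\n")
        paths.append(path)
        if not nxt:  # always writes at least index.html, even for an empty listing
            return paths
        page, page_no = nxt, page_no + 1


if __name__ == "__main__":
    demo = [(f"pkg-{i}.rpm", "22-May-2026 09:16", 1024) for i in range(25)]
    for p in write_static_listing(demo, tempfile.mkdtemp(), page_size=10):
        print(p)
```

The trade-off is extra storage and work at publish time in exchange for a request path whose cost no longer depends on the number of packages.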
