# Content app OOM on directory listing for large repositories #7745

## Problem

When a client requests a directory listing for a repository with a very large number of content units, the content app builds the entire listing in memory, causing an instant OOM kill of the pulp-content pod.

In production, requesting the directory listing for `@rubygems/rubygems/fedora-43-x86_64` (266,934 RPM packages) causes an ~800MB-1.2GB memory spike in a single request, immediately OOM-killing the content pod (2 GiB memory limit). The pod has been OOM-killed repeatedly by this request pattern.

## Root Cause

`pulpcore/content/handler.py` — the `list_directory_blocking()` method (line 606) and `render_html()` method (line 534) build the entire directory listing in memory:

1. **`list_directory_blocking()`** iterates all `ContentArtifact` objects matching the path, building four in-memory collections (`directory_list` set, `dates` dict, `content_to_find` dict, `sizes` dict). For 267K packages, this loads ~267K Django ORM objects with `select_related("artifact")`.

2. It then iterates `content_repo_ver._content_relationships()` to update dates — loading another ~267K `RepositoryContent` objects.

3. **`render_html()`** sorts the 267K-entry set, then renders a Jinja template that produces ~267K `<a href>` lines as a single HTML string (~27MB).

4. The complete HTML string is returned via `HTTPOk(text=...)`, holding the entire response in memory.

**Total memory impact**: ~267K ORM objects (~267MB, i.e. roughly 1KB per object) + the four collections of ~267K entries each + sorted list + rendered HTML (~27MB) ≈ 800MB-1.2GB.
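The figures above can be sanity-checked with a few lines of arithmetic. This is only a rough sketch: the ~1KB-per-object size is an assumption implied by the "~267MB" figure, not a measurement, and the observed jumps are the two OOM events listed under "Evidence from Production".

```python
# Back-of-the-envelope check of the estimate. Sizes are approximations.
PACKAGES = 266_934

ORM_BYTES_PER_OBJECT = 1_000   # assumed: ~1 KB per ORM object
HTML_BYTES_TOTAL = 27_000_000  # ~27 MB of rendered HTML, as stated in the issue


def to_mb(n_bytes: float) -> float:
    """Convert bytes to (decimal) megabytes."""
    return n_bytes / 1_000_000


orm_mb = to_mb(PACKAGES * ORM_BYTES_PER_OBJECT)
html_bytes_per_entry = HTML_BYTES_TOTAL / PACKAGES

# Memory before and after each OOM-triggering request, in MB.
observed = [(521, 1365), (477, 1640)]
observed_jumps_mb = [after - before for before, after in observed]

print(f"ORM objects:      ~{orm_mb:.0f} MB")
print(f"HTML per entry:   ~{html_bytes_per_entry:.0f} bytes")
print(f"Observed jumps:   {observed_jumps_mb} MB")
```

The two observed jumps (844MB and 1163MB) fall inside the estimated 800MB-1.2GB range, which is consistent with a single listing request accounting for the spike.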

## Evidence from Production

- **Pod**: `pulp-content-7469c446f6-89jz5` (2 GiB memory limit)
- **OOM kill 1**: 2026-05-22 09:16 UTC — memory jumped from 521MB to 1365MB in one minute
- **OOM kill 2**: 2026-05-22 10:28 UTC — memory jumped from 477MB to 1640MB in one minute
- **Repository**: `@rubygems/rubygems/fedora-43-x86_64` — 266,934 RPM packages in latest version
- All other requests in the access logs were 302 redirects (not memory-intensive)

## Suggested Approaches

- **Stream the HTML response**: Use `StreamResponse` to write the directory listing in chunks instead of building the entire HTML string in memory
- **Paginate**: Limit directory listings to a configurable maximum number of entries (e.g., 10,000) with pagination links
- **Cap and warn**: If the directory listing exceeds a threshold, return a truncated listing with a message indicating the listing is too large
- **Lazy iteration**: Use Django's `.iterator()` on the queryset and stream entries as they're fetched from the database, avoiding materializing all ORM objects at once
- **Pre-generate at publish/version time**: Generate the HTML directory listing pages when a publication or repository version is created, storing them as static artifacts. The content app would then serve pre-built pages instead of generating them on each request
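As a rough illustration of the streaming and lazy-iteration ideas, the sketch below renders the listing in fixed-size chunks from any iterable of names, so only one chunk is held in memory at a time. It is not the current `handler.py` code: `iter_listing_chunks`, `stream_listing`, and the `write` callable are hypothetical names, and the markup is simplified compared with the real Jinja template.

```python
import asyncio
from html import escape
from typing import Awaitable, Callable, Iterable, Iterator
from urllib.parse import quote


def iter_listing_chunks(names: Iterable[str], chunk_size: int = 1000) -> Iterator[bytes]:
    """Yield the listing as encoded HTML chunks of at most chunk_size entries."""
    yield b"<html><body><ul>\n"
    buf = []
    for name in names:
        # Percent-encode the link target, HTML-escape the visible text.
        href = escape(quote(name), quote=True)
        buf.append(f'<li><a href="{href}">{escape(name)}</a></li>\n')
        if len(buf) >= chunk_size:
            yield "".join(buf).encode("utf-8")
            buf.clear()
    if buf:
        yield "".join(buf).encode("utf-8")
    yield b"</ul></body></html>\n"


async def stream_listing(
    write: Callable[[bytes], Awaitable[None]],
    names: Iterable[str],
    chunk_size: int = 1000,
) -> int:
    """Send each chunk through `write` as it is produced; return bytes sent."""
    total = 0
    for chunk in iter_listing_chunks(names, chunk_size):
        await write(chunk)
        total += len(chunk)
    return total
```

With aiohttp, `write` could be the `write` method of a `web.StreamResponse` after `await response.prepare(request)`. The names would need to arrive already sorted (for example via `order_by()` on the queryset combined with `.iterator()`), because sorting in Python would materialize the whole list again. The database iteration would also have to run off the event loop, which this sketch does not address.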

## Related

The content app also has a gradual memory leak (~7.3 MB per 1000 requests) that compounds this issue. Worker recycling via `--max-requests` is being enabled separately to address the leak.
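To give a feel for how recycling bounds the leak, here is a small sketch. It assumes the content app's workers are run under gunicorn, where the `--max-requests` flag corresponds to the `max_requests` setting. The concrete values are illustrative, not the ones being deployed.

```python
# Hypothetical gunicorn.conf.py fragment; the numbers are illustrative only.
max_requests = 10_000        # recycle a worker after this many requests
max_requests_jitter = 1_000  # random extra 0..1000 so workers do not restart together

LEAK_MB_PER_1000_REQUESTS = 7.3  # leak rate quoted in this issue


def leaked_mb(requests_served: int) -> float:
    """Memory leaked by one worker after serving the given number of requests."""
    return requests_served * LEAK_MB_PER_1000_REQUESTS / 1000


worst_case = leaked_mb(max_requests + max_requests_jitter)
print(f"worst-case leak per worker before recycling: ~{worst_case:.1f} MB")
```

With these example values a worker would leak roughly 80MB before being recycled. That bounds the slow growth but does nothing for the spike caused by a single large listing request.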

## Update: 504 timeout even when OOM is resolved

After increasing the content pod memory limit from 2Gi to 3Gi, the directory listing for `@rubygems/rubygems/fedora-43-x86_64` (266,934 packages) no longer OOM-kills the pod — it peaked at 1532MB and survived. However, the request still fails with a **504 Gateway Timeout** because building the directory listing takes longer than 30 seconds to complete.

This reinforces the need for an approach that does not build the entire response at request time: raising the memory limit only trades the OOM kill for a timeout.
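One way to move the work out of the request path is to pre-render paginated listing pages when a publication or repository version is created. The sketch below is a minimal illustration of that idea under stated assumptions: `paginate`, `page_filename`, `render_pages`, the `index-N.html` naming scheme, and the 10,000-entry page size are all hypothetical, and how the pages would be stored and served is left open.

```python
from html import escape
from itertools import islice
from typing import Iterable, Iterator, List, Tuple
from urllib.parse import quote


def paginate(names: Iterable[str], page_size: int) -> Iterator[List[str]]:
    """Split an iterable of names into lists of at most page_size items."""
    it = iter(names)
    while True:
        page = list(islice(it, page_size))
        if not page:
            return
        yield page


def page_filename(number: int) -> str:
    """Hypothetical naming scheme: index.html, index-2.html, index-3.html, ..."""
    return "index.html" if number == 1 else f"index-{number}.html"


def render_pages(sorted_names: Iterable[str], page_size: int = 10_000) -> Iterator[Tuple[str, str]]:
    """Yield (filename, html) one page at a time, with previous/next links."""
    pages = paginate(sorted_names, page_size)
    current = next(pages, None)
    number = 1
    while current is not None:
        upcoming = next(pages, None)  # look ahead to know if a "next" link is needed
        links = []
        if number > 1:
            links.append(f'<a href="{page_filename(number - 1)}">previous</a>')
        if upcoming is not None:
            links.append(f'<a href="{page_filename(number + 1)}">next</a>')
        nav = " | ".join(links)
        items = "".join(
            f'<li><a href="{escape(quote(n), quote=True)}">{escape(n)}</a></li>\n'
            for n in current
        )
        html = f"<html><body><ul>\n{items}</ul>\n<p>{nav}</p></body></html>\n"
        yield page_filename(number), html
        current, number = upcoming, number + 1
```

At the default page size, the 266,934-package repository would produce 27 pages, each held in memory only while it is being written out. The content app would then serve a stored page per request instead of querying and rendering the full listing.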
