Add a stored normalized name column with an index by git-hyagi · Pull Request #1160 · pulp/pulp_python

git-hyagi · 2026-03-24T17:58:51Z

Add a name_normalized field to PythonPackageContent that stores the pre-computed LOWER(REGEXP_REPLACE(name, ...)) value, populated via a BEFORE_SAVE hook.
Add db_index=True.
Change all name__normalize= lookups to use name_normalized__exact=. This eliminates the regex computation at query time.
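
To illustrate the idea outside Django, here is a minimal sketch of computing the normalized name once at write time and then doing exact-match lookups against it. It assumes the elided REGEXP_REPLACE expression corresponds to PEP 503 name normalization; the `Package` class, the `index` dict, and `lookup` are hypothetical stand-ins for the model, the database index, and a `name_normalized__exact=` query, not code from this PR.

```python
import re
from dataclasses import dataclass, field


def normalize_name(name: str) -> str:
    """PEP 503 normalization (assumed): collapse runs of -, _ and . into '-' and lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


@dataclass
class Package:
    name: str
    name_normalized: str = field(init=False)

    def __post_init__(self) -> None:
        # Stand-in for the BEFORE_SAVE hook: the regex runs once, when the row is written.
        self.name_normalized = normalize_name(self.name)


packages = [Package("Django"), Package("zope.interface"), Package("Zope_Interface")]

# Stand-in for the database index on name_normalized.
index: dict[str, list[Package]] = {}
for pkg in packages:
    index.setdefault(pkg.name_normalized, []).append(pkg)


def lookup(name: str) -> list[Package]:
    # Only the query argument is normalized; stored rows are compared by plain equality.
    return index.get(normalize_name(name), [])


print([p.name for p in lookup("ZOPE.interface")])
```

The point of the design is that the per-row regex cost moves from every query to a single write, and the equality comparison can be served by an ordinary index.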

closes: #1159
Assisted By: claude-opus-4.6

📜 Checklist

Commits are cleanly separated with meaningful messages (simple features and bug fixes should be squashed to one commit)
A changelog entry or entries has been added for any significant changes
Follows the Pulp policy on AI Usage
(For new features) - User documentation and test coverage has been added

See: Pull Request Walkthrough

pulp_python/app/models.py

gerrod3

Thanks!

pulp_python/app/models.py

jobselko

Partial review - I will look at viewsets.py tomorrow.

jobselko · 2026-03-25T16:30:38Z

pulp_python/app/pypi/views.py

-        names = content.order_by("name").values_list("name", flat=True).distinct().iterator()
+        names = (
+            content.order_by("name_normalized")
+            .values_list("name", flat=True)
+            .distinct("name_normalized")
+            .iterator()
+        )


Why was name changed to name_normalized and why was name_normalized added to distinct?

Why was name changed to name_normalized

Now that we have a name_normalized field, we can use it for better ordering: instead of treating Django, django, and DJANGO as different entries that sort into different positions, we can handle them as the same entry during ordering.

why was name_normalized added to distinct?

https://docs.djangoproject.com/en/5.2/ref/models/querysets/#django.db.models.query.QuerySet.distinct

"On PostgreSQL only, you can pass positional arguments (*fields) in order to specify the names of fields to which the DISTINCT should apply. [...] For a normal distinct() call, the database compares each field in each row when determining which rows are distinct. For a distinct() call with specified field names, the database will only compare the specified field names."

"When you specify field names, you must provide an order_by() in the QuerySet, and the fields in order_by() must start with the fields in distinct(), in the same order." (That is, order_by() must begin with the same fields passed to distinct(), which is why both use name_normalized.)
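
As a rough illustration of the difference, the following pure-Python sketch emulates what ordering by and taking DISTINCT ON the normalized name does to a list of names. It again assumes PEP 503 normalization, and `distinct_on_normalized` is a hypothetical helper, not part of the PR. Which spelling survives within a group is arbitrary in PostgreSQL unless the ordering is extended; here the stable sort simply keeps the first one seen.

```python
import re
from itertools import groupby


def normalize_name(name: str) -> str:
    """PEP 503 normalization (assumed): collapse runs of -, _ and . into '-' and lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


rows = ["django", "Django", "requests", "DJANGO", "Flask"]


def distinct_on_normalized(names: list[str]) -> list[str]:
    # ORDER BY name_normalized: names that normalize identically become adjacent.
    ordered = sorted(names, key=normalize_name)
    # DISTINCT ON (name_normalized): keep one row per normalized name.
    return [next(group) for _, group in groupby(ordered, key=normalize_name)]


# A plain distinct over "name" keeps every spelling as a separate entry.
print(sorted(set(rows)))
# Distinct on the normalized name yields one entry per project.
print(distinct_on_normalized(rows))
```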


git-hyagi requested review from dkliban, gerrod3 and jobselko March 24, 2026 17:58

gerrod3 reviewed Mar 24, 2026


git-hyagi force-pushed the pulp-python-high-db-load-issue branch from 2edeffd to 02d0537 Compare March 24, 2026 19:28

git-hyagi requested a review from gerrod3 March 24, 2026 19:42

git-hyagi force-pushed the pulp-python-high-db-load-issue branch from 02d0537 to 53b6f61 Compare March 24, 2026 20:32

gerrod3 approved these changes Mar 24, 2026


jobselko reviewed Mar 25, 2026


git-hyagi force-pushed the pulp-python-high-db-load-issue branch from 53b6f61 to a6d943c Compare March 25, 2026 15:08

git-hyagi requested review from gerrod3 and jobselko March 25, 2026 15:23

jobselko reviewed Mar 25, 2026


git-hyagi force-pushed the pulp-python-high-db-load-issue branch from a6d943c to e8dcc19 Compare March 25, 2026 17:34

git-hyagi wants to merge 1 commit into pulp:main from git-hyagi:pulp-python-high-db-load-issue
