Add a stored normalized name column with an index #1160

Open
git-hyagi wants to merge 1 commit into pulp:main from git-hyagi:pulp-python-high-db-load-issue

@git-hyagi
Contributor

Add a name_normalized field to PythonPackageContent that stores the pre-computed LOWER(REGEXP_REPLACE(name, ...)) value, populated via a BEFORE_SAVE hook.
Add db_index=True to the new field so that lookups on it can use an index.
Change all name__normalize= lookups to use name_normalized__exact=. This eliminates the regex computation at query time.

closes: #1159
Assisted By: claude-opus-4.6
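
For illustration, the value stored in name_normalized might be computed along these lines. This is only a sketch: the regex is elided in the description above, so the pattern here assumes the PEP 503 normalization rule (runs of -, _ and . collapsed to a single -, then lowercased), and normalize_name is a hypothetical helper, not necessarily the function used in this PR.

```python
import re

# Assumption: PEP 503 normalization; the PR's actual regex is not shown above.
_SEPARATOR_RUN = re.compile(r"[-_.]+")


def normalize_name(name: str) -> str:
    """Collapse runs of '-', '_' and '.' into one '-' and lowercase the result."""
    return _SEPARATOR_RUN.sub("-", name).lower()


# A save hook would compute this once per row, so later lookups only compare
# plain strings (hypothetical wiring):
#   instance.name_normalized = normalize_name(instance.name)
print(normalize_name("Django_REST.Framework"))
```

Because the stored value is an ordinary string column, an exact-match lookup on it can be served by a regular index instead of evaluating a regex for every row.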

📜 Checklist

  • Commits are cleanly separated with meaningful messages (simple features and bug fixes should be squashed to one commit)
  • A changelog entry or entries has been added for any significant changes
  • Follows the Pulp policy on AI Usage
  • (For new features) - User documentation and test coverage has been added

See: Pull Request Walkthrough

@git-hyagi force-pushed the pulp-python-high-db-load-issue branch from 2edeffd to 02d0537 on March 24, 2026 19:28
@git-hyagi requested a review from gerrod3 on March 24, 2026 19:42
@git-hyagi force-pushed the pulp-python-high-db-load-issue branch from 02d0537 to 53b6f61 on March 24, 2026 20:32
Contributor

@gerrod3 left a comment

Thanks!

@git-hyagi force-pushed the pulp-python-high-db-load-issue branch from 53b6f61 to a6d943c on March 25, 2026 15:08
@git-hyagi requested review from gerrod3 and jobselko on March 25, 2026 15:23
Contributor

@jobselko left a comment

Partial review - I will look at viewsets.py tomorrow.

Comment on lines -306 to +311
- names = content.order_by("name").values_list("name", flat=True).distinct().iterator()
+ names = (
+     content.order_by("name_normalized")
+     .values_list("name", flat=True)
+     .distinct("name_normalized")
+     .iterator()
+ )
Contributor

Why was name changed to name_normalized and why was name_normalized added to distinct?

Contributor Author

"Why was name changed to name_normalized"

Now that we have a name_normalized field, we can use it for a more consistent ordering: instead of sorting Django, django, and DJANGO as different entries in different positions, we can treat them as the same entry during ordering.

"why was name_normalized added to distinct?"

https://docs.djangoproject.com/en/5.2/ref/models/querysets/#django.db.models.query.QuerySet.distinct

"On PostgreSQL only, you can pass positional arguments (*fields) in order to specify the names of fields to which the DISTINCT should apply. [...] For a normal distinct() call, the database compares each field in each row when determining which rows are distinct. For a distinct() call with specified field names, the database will only compare the specified field names."

"When you specify field names, you must provide an order_by() in the QuerySet, and the fields in order_by() must start with the fields in distinct(), in the same order."

In other words, the order_by and distinct fields must match, which is why both now use name_normalized.

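A rough pure-Python analogue of the new query may help here. It is a sketch, not what PostgreSQL actually executes: rows are sorted by the normalized name and only the first row of each normalized group is kept, which mirrors distinct("name_normalized"). The sample rows and the distinct_on_normalized helper are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows: (name, name_normalized)
rows = [
    ("django", "django"),
    ("Flask", "flask"),
    ("Django", "django"),
    ("DJANGO", "django"),
]


def distinct_on_normalized(rows):
    """Emulate order_by("name_normalized").distinct("name_normalized")."""
    ordered = sorted(rows, key=itemgetter(1))  # ORDER BY name_normalized
    # Keep one row per normalized value. PostgreSQL does not promise which
    # spelling survives unless further order_by fields break the tie; here
    # Python's stable sort keeps the first one seen in the input.
    return [next(group)[0] for _, group in groupby(ordered, key=itemgetter(1))]


print(distinct_on_normalized(rows))
```

With a plain distinct() on name, the three spellings of django would each come back as a separate row.
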
@git-hyagi force-pushed the pulp-python-high-db-load-issue branch from a6d943c to e8dcc19 on March 25, 2026 17:34

Development

Successfully merging this pull request may close these issues.

NormalizeName transform uses unindexable REGEXP_REPLACE

3 participants