Fixes #39054 - Improve empty deb metadata handling #11631

Draft
quba42 wants to merge 1 commit into Katello:master from ATIX-AG:improve_empty_deb_metadata

@quba42
Contributor

@quba42 quba42 commented Feb 3, 2026

Remaining work: Everything relating to tests.

Summary by Sourcery

Improve handling of empty metadata for Debian repositories in Pulp and add validation for missing deb entities.

Bug Fixes:

  • Ensure initial empty deb metadata is removed from Pulp repositories before syncing or uploading real content, preventing stale empty components from persisting.
  • Raise a dedicated error when a deb repository version lacks required components or distributions, surfacing inconsistent or invalid repository states earlier.

Enhancements:

  • Add support to destroy the initially created empty deb component metadata on demand via the repository backend service.
  • Always initialize deb repositories on creation, regardless of whether they are library instances.

Tests:

  • Update deb upload orchestration tests to account for additional repository versions created by initializing and later removing empty deb metadata.

@quba42 quba42 force-pushed the improve_empty_deb_metadata branch from caa608c to 221e90d Compare February 3, 2026 14:24
@quba42
Contributor Author

quba42 commented Feb 3, 2026

The following tests require new VCR recordings:

test/services/katello/pulp3/deb_test.rb
test/actions/pulp3/orchestration/deb_upload_test.rb

This is very much expected, since the fix consists of sending new API calls to pulp_deb.

@quba42 quba42 force-pushed the improve_empty_deb_metadata branch from 221e90d to 1462578 Compare February 3, 2026 15:13
@quba42 quba42 marked this pull request as ready for review February 3, 2026 15:27
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Hey - I've found 1 issue, and left some high level feedback:

  • In destroy_empty_metadata, consider handling cases where content_release_components_api.list(opts) returns no results (or more than one) to avoid potential nil access and to make the cleanup behavior explicit when the expected empty component is missing.
  • Both pulp_components and pulp_distributions now raise MissingDebEntityError when their respective lists are empty; if there are valid scenarios where a version can have components but no distributions (or vice versa), you may want to differentiate these cases or only enforce the invariant where it is truly required.
  • The guards in sync/upload (repository.version_href.ends_with?("/1/")) assume version_href is present and stable; consider defensive checks for nil and/or documenting why version /1/ is always the initial empty metadata version to make this coupling clearer for future maintainers.

## Individual Comments

### Comment 1
<location> `app/services/katello/pulp3/repository/apt.rb:27-31` </location>
<code_context>
           api.content_release_components_api.create(opts)
         end

+        def destroy_empty_metadata
+          # What we initialize in initialize_empty, we must also remove once it is no longer needed!
+          opts = {:component => "empty", :distribution => "katello"}
+          repo_href = repository_reference.repository_href
+          component_href = api.content_release_components_api.list(opts).results[0].pulp_href
+          api.repositories_api.modify(repo_href, remove_content_units: [component_href])
+        end
</code_context>

<issue_to_address>
**issue:** Handle the case where the 'empty' component is not found before indexing into results[0].

This assumes `list(opts).results` always has at least one element. If the `empty/katello` component was removed or never created, `results[0]` will be `nil` and calling `.pulp_href` will raise. Please guard against `results.empty?` (e.g., early return or explicit error), especially since this runs on sync/upload paths.
</issue_to_address>
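
The guard asked for in this comment could look roughly like the following. This is a minimal sketch, not the PR's code: `FakeComponentsApi`, `FakeRepositoriesApi`, `Component` and `ListResponse` are hypothetical stand-ins so the example runs without Pulp, and the method takes its collaborators as arguments instead of using Katello's `api` helper.

```ruby
# Hypothetical stand-ins for the Pulp client objects; NOT the real pulp_deb_client API.
Component = Struct.new(:pulp_href)
ListResponse = Struct.new(:results)

class FakeComponentsApi
  def initialize(components)
    @components = components
  end

  # Mimics a list call whose response exposes #results.
  def list(_opts)
    ListResponse.new(@components)
  end
end

class FakeRepositoriesApi
  attr_reader :removed

  def initialize
    @removed = []
  end

  # Records which content units were asked to be removed.
  def modify(_repo_href, remove_content_units:)
    @removed.concat(remove_content_units)
  end
end

# Returns true if something was removed, false if there was nothing to clean up.
def remove_empty_metadata(components_api, repositories_api, repo_href)
  opts = { component: "empty", distribution: "katello" }
  hrefs = components_api.list(opts).results.map(&:pulp_href)
  return false if hrefs.empty? # nothing to remove; never call a method on nil

  repositories_api.modify(repo_href, remove_content_units: hrefs)
  true
end
```

Mapping over `results` instead of indexing `results[0]` also covers the "more than one match" case mentioned in the overall comments, since every matching component href is passed to the removal call.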


Comment thread app/services/katello/pulp3/repository/apt.rb Outdated
@quba42 quba42 force-pushed the improve_empty_deb_metadata branch 4 times, most recently from da07980 to 48901c1 Compare February 10, 2026 10:47
@quba42 quba42 changed the title Fixes #39054 - Imporve empty deb metadata handling Fixes #39054 - Improve empty deb metadata handling Feb 10, 2026
@quba42
Contributor Author

quba42 commented Feb 13, 2026

I need to re-record the following:

record=true mode=all ktest test/services/katello/pulp3/deb_test.rb
record=true mode=all ktest test/actions/pulp3/orchestration/deb_upload_test.rb

@quba42 quba42 force-pushed the improve_empty_deb_metadata branch from 48901c1 to e709b60 Compare February 13, 2026 12:50
@quba42
Contributor Author

quba42 commented Feb 13, 2026

Note: VCR recordings were performed with pulp_deb_client 3.7.0. This is out of date with the Katello gemspec: https://github.com/Katello/katello/blob/master/katello.gemspec#L59

However, the new client is not yet available in packaging: theforeman/foreman-packaging#13013

If it becomes available I can re-do the recordings.

@quba42
Contributor Author

quba42 commented Feb 19, 2026

We have discovered that this change currently breaks export/import for deb content.

@m-bucher
Contributor

> We have discovered that this change currently breaks export/import for deb content.

That one is fixed now, but there is another one left:

Publishing a new ContentView version with no content (e.g. all packages filtered out) fails in Actions::Candlepin::Product::ContentCreate with Katello::Errors::MissingDebEntityError.

Co-authored-by: Markus Bucher <bucher@atix.de>
@quba42
Contributor Author

quba42 commented Mar 2, 2026

I re-based on top of current master and re-recorded the VCR tests using rubygem-pulp_deb_client 3.8.1.

Export/Import is fixed. As @m-bucher has mentioned the current state will now throw a hard error for the following workflow:

  • Create a deb repo and add it to a CV.
  • Create a deb exclude filter that removes all packages from the deb repo in question.
  • Publish a new CV version.

In the past this would create a broken CV version that cannot be synced to proxies or consumed by hosts without error. Now it throws the newly added Katello::Errors::MissingDebEntityError when trying to publish the new CV version. Making this kind of issue visible by failing early is very much the intention behind this change.

Since this is a rarely used workflow (it is at least somewhat odd to add a repo to a CV only to filter out all of its packages) that was already broken before, just in a different way, I would like to fix it in a follow-up PR and not have it block this PR.


def remove_empty_metadata
  # What we initialize in initialize_empty, we must also remove once it is no longer needed!
  opts = {:component => "empty", :distribution => "katello", :repository_version => repo.version_href}
Member

Since I'm not fully aware of how Pulp Deb works, how did having this empty component around before lead to broken content view versions?

Contributor Author

@quba42 quba42 Mar 4, 2026

It's more complicated. The status quo before was: "We add it in at repo initialization, to have a publishable and consumable empty repo".

If you then sync actual content using mirroring, it automatically gets removed again because of the mirroring. If you sync additive or use upload to add content, it does not get removed.

Both variants have certain issues:

  • If it got removed, and then you filter out all the packages in a CV, you get a green Pulp publish that is completely empty. This cannot be synced to smart proxy, or consumed on hosts because there is nothing there to sync/consume. This is what I mean by: "lets you create a broken CV version".
  • If it did not get removed (the additive and upload cases), then you get some meaningless APT warnings on consuming hosts essentially complaining that there is an extra empty component with nothing useful in it. This is technically harmless, but confusing and annoying for users.

The solution I implemented is:

  • We always explicitly remove the empty component when we first add real packages to the repo (including for additive and upload).
  • In addition, we make "you are trying to publish a completely empty Pulp repo version" a hard error to make remaining edge cases visible right then and there (with the plan to fix any such edge cases in follow-up tasks over time). This prevents us from creating empty Pulp publications by accident, which only start causing problems when they are synced to smart proxies or consumed on hosts, essentially placing landmines in what looks like a successfully created CV version.
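
A rough sketch of the two parts of this solution, with plain Ruby stand-ins. The error class below is hypothetical and only mirrors the three constructor arguments used in the PR; the real `Katello::Errors::MissingDebEntityError` may differ. The `/1/` check follows the guard quoted in the bot review, uses core Ruby's `end_with?` rather than ActiveSupport's `ends_with?`, and adds a nil check.

```ruby
# Hypothetical error class; the real Katello::Errors::MissingDebEntityError may differ.
class MissingDebEntityError < StandardError
  def initialize(entity, repo_name, version_href)
    super("No #{entity} found for repository '#{repo_name}' in version #{version_href}")
  end
end

# Assumption taken from the review thread: version /1/ holds only the initial empty metadata,
# so this is the point where the empty component should be removed before real content lands.
def initial_empty_version?(version_href)
  !version_href.nil? && version_href.end_with?("/1/")
end

# Fail early instead of silently returning an empty component list.
def components_for(repo_name, version_href, components)
  names = components.uniq.sort
  raise MissingDebEntityError.new("component", repo_name, version_href) if names.empty?

  names
end

puts initial_empty_version?("/versions/1/")
puts components_for("my-repo", "/versions/2/", ["main", "contrib", "main"]).inspect
```

Checking the full `/1/` suffix, slashes included, keeps hrefs such as `/versions/11/` from matching by accident.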

Member

@ianballou ianballou left a comment

I'm generally fine with the code changes after scanning through. Just want to make sure it's tested first.

  return [] if repo.version_href.blank?
  return ["all"] if version_missing_structure_content?
- pulp_primary_api.content_release_components_api.list({:repository_version => repo.version_href}).results.map { |x| x.plain_component }.uniq.sort
+ components = pulp_primary_api.content_release_components_api.list({:repository_version => repo.version_href}).results.map { |x| x.plain_component }.uniq.sort
Member

Rubocop isn't complaining about these long lines?

Contributor Author

@quba42 quba42 Mar 5, 2026

Not for me locally nor here on the PR. 🤷

Member

@ianballou ianballou left a comment

Okay, I tested a case with the empty component in the CV, and one without. I see the new error. The CV publish getting paused is problematic, but it should be rare that people filter out all packages.

Just so I understand, the idea is to later tackle the empty repo edge case here to avoid paused CV publishes? I agree it's better to improve the 99% case anyway, so I think I'm cool with the latest state.

- pulp_primary_api.content_release_components_api.list({:repository_version => repo.version_href}).results.map { |x| x.plain_component }.uniq.sort
+ components = pulp_primary_api.content_release_components_api.list({:repository_version => repo.version_href}).results.map { |x| x.plain_component }.uniq.sort
+ if components.empty?
+   fail ::Katello::Errors::MissingDebEntityError.new('component', repo.name, repo.version_href)
Member

I just tried the following:

  1. Create empty deb repo with 'additive' mirroring policy
  2. Upload a package
  3. Add the repo to a CV
  4. Filter out all debs
  5. Publish and see success

In this case I did have the empty component and the publish succeeded.

Member

I did see that the new empty metadata removal task ran on the Library Default Org View repo when I uploaded the deb file. Should the above CV publish have failed?

I don't see the empty component on the Library Default Org repo; it seems only the CV repo received one.

Contributor Author

I would have expected that to fail as well, and I would need to dig through the action plan to figure out why it does not in this case. But the plan for the follow-up task remains mostly unchanged:

Make sure that, for both workflows, if the repo is emptied via filters, we put the "empty repo metadata" back in.

All of these subtly different edge cases are starting to make me think I made a bad design choice early on: with hindsight, it might have been better to "fix" this entire issue on the pulp_deb side. If I created some kind of fallback mechanism on the pulp_deb side along the lines of "if pulp_deb is asked to publish an empty repo version, just fall back to publishing some sane empty repo metadata", then Katello would not need to mess around with all of these edge cases around when to "initialize an empty repo with empty metadata" and when to "clean up that empty repo metadata". For now I am still stuck on the "sunk cost fallacy", since I have gone quite far with the current design. I will have a think about whether it still makes sense to change course completely. Even if I decide to go that route, it would take some time for the new pulp_deb feature to make it to Katello, so we should still proceed with the current changes for now.
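
The fallback idea can be sketched in a few lines. This is purely illustrative: pulp_deb is not written in Ruby, and `publish`, `EMPTY_METADATA` and the hash layout are hypothetical names chosen here, not an existing pulp_deb interface.

```ruby
# Hypothetical placeholder metadata, reusing the "katello"/"empty" names from initialize_empty.
EMPTY_METADATA = { distribution: "katello", components: ["empty"], packages: [] }.freeze

# If the version has no release components, publish sane empty metadata instead of nothing.
def publish(version_content)
  return EMPTY_METADATA if version_content[:components].empty?

  version_content
end

puts publish({ components: [], packages: [] }).inspect
```

With a fallback like this on the publishing side, the caller would no longer need to decide when to add or remove the placeholder component.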

Member

I think there is still time to consider having Pulp Deb publish empty metadata, but I agree that in the short term it's worth improving it from the Katello side.

ianballou
ianballou previously approved these changes Mar 6, 2026
Member

@ianballou ianballou left a comment

I'm cool with this, thanks!

@quba42
Contributor Author

quba42 commented Mar 10, 2026

We found a major issue affecting this change that somehow slipped through all our testing!

I am converting this back to draft!

@quba42 quba42 marked this pull request as draft March 10, 2026 09:33
@ianballou ianballou dismissed their stale review March 10, 2026 19:02

A recent comment mentioned an issue.

@ianballou
Member

> We found a major issue affecting this change that somehow slipped through all our testing!
>
> I am converting this back to draft!

Once you figure this out I'm curious to hear what the issue was!

@quba42
Contributor Author

quba42 commented Mar 31, 2026

> Once you figure this out I'm curious to hear what the issue was!

In short, we are now getting the error we introduced for workflows that should be working.

We are now thinking of changing our entire approach by moving responsibility for empty repo metadata to pulp_deb: pulp/pulp_deb#1424

This should allow us to greatly simplify what we need on the Katello side, so we can replace this PR with (initial draft) something like: master...ATIX-AG:katello:simplify_empty_deb_metadata


3 participants