Fixes #39054 - Improve empty deb metadata handling#11631
Fixes #39054 - Improve empty deb metadata handling#11631quba42 wants to merge 1 commit intoKatello:masterfrom
Conversation
caa608c to
221e90d
Compare
|
The following tests require new VCR recordings: This is very much expected, since the fix consists in sending new API calls to pulp_deb. |
221e90d to
1462578
Compare
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- In
destroy_empty_metadata, consider handling cases wherecontent_release_components_api.list(opts)returns no results (or more than one) to avoid potentialnilaccess and to make the cleanup behavior explicit when the expected empty component is missing. - Both
pulp_componentsandpulp_distributionsnow raiseMissingDebEntityErrorwhen their respective lists are empty; if there are valid scenarios where a version can have components but no distributions (or vice versa), you may want to differentiate these cases or only enforce the invariant where it is truly required. - The guards in sync/upload (
repository.version_href.ends_with?("/1/")) assumeversion_hrefis present and stable; consider defensive checks forniland/or documenting why version/1/is always the initial empty metadata version to make this coupling clearer for future maintainers.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `destroy_empty_metadata`, consider handling cases where `content_release_components_api.list(opts)` returns no results (or more than one) to avoid potential `nil` access and to make the cleanup behavior explicit when the expected empty component is missing.
- Both `pulp_components` and `pulp_distributions` now raise `MissingDebEntityError` when their respective lists are empty; if there are valid scenarios where a version can have components but no distributions (or vice versa), you may want to differentiate these cases or only enforce the invariant where it is truly required.
- The guards in sync/upload (`repository.version_href.ends_with?("/1/")`) assume `version_href` is present and stable; consider defensive checks for `nil` and/or documenting why version `/1/` is always the initial empty metadata version to make this coupling clearer for future maintainers.
## Individual Comments
### Comment 1
<location> `app/services/katello/pulp3/repository/apt.rb:27-31` </location>
<code_context>
api.content_release_components_api.create(opts)
end
+ def destroy_empty_metadata
+ # What we initialize in initialize_empty, we must also remove once it is no longer needed!
+ opts = {:component => "empty", :distribution => "katello"}
+ repo_href = repository_reference.repository_href
+ component_href = api.content_release_components_api.list(opts).results[0].pulp_href
+ api.repositories_api.modify(repo_href, remove_content_units: [component_href])
+ end
</code_context>
<issue_to_address>
**issue:** Handle the case where the 'empty' component is not found before indexing into results[0].
This assumes `list(opts).results` always has at least one element. If the `empty/katello` component was removed or never created, `results[0]` will be `nil` and calling `.pulp_href` will raise. Please guard against `results.empty?` (e.g., early return or explicit error), especially since this runs on sync/upload paths.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
da07980 to
48901c1
Compare
|
I need to re-record the following: |
48901c1 to
e709b60
Compare
|
Note: VCR recordings were performed with pulp_deb_client 3.7.0. This is out of date with the Katello gemspec: https://github.com/Katello/katello/blob/master/katello.gemspec#L59 However, the new client is not yet available in packaging: theforeman/foreman-packaging#13013 If it becomes available I can re-do the recordings. |
|
We have discovered that this change currently breaks export/import for deb content. |
That one is fixed now, but there is another one left: Publishing a new ContentView version with no content (e.g. all packages filtered out) fails in |
Co-authored-by: Markus Bucher <bucher@atix.de>
f05269c to
de04178
Compare
|
I re-based on top of current master and re-recorded the VCR tests using rubygem-pulp_deb_client 3.8.1. Export/Import is fixed. As @m-bucher has mentioned the current state will now throw a hard error for the following workflow:
In the past this would create a broken CV version that cannot be synced to proxies or consumed by hosts without error. Now it throws the newly added Since this is a rarely used workflow (it is at least somewhat odd to add a repo to a CV only to filter out all of its packages), that was differently broken before, I would like to fix it in a follow up PR, and not have it block this PR. |
|
|
||
| def remove_empty_metadata | ||
| # What we initialize in initialize_empty, we must also remove once it is no longer needed! | ||
| opts = {:component => "empty", :distribution => "katello", :repository_version => repo.version_href} |
There was a problem hiding this comment.
Since I'm not fully aware of how Pulp Deb works, how did having this empty component around before lead to broken content view versions?
There was a problem hiding this comment.
Its more complicated: The status quo before was: "We add it in at repo initialization, to have a publishable and consumable empty repo".
If you then sync actual content using mirroring, it automatically gets removed again because of the mirroring. If you sync additive or use upload to add content, it does not get removed.
Both variants have certain issues:
- If it got removed, and then you filter out all the packages in a CV, you get a green Pulp publish that is completely empty. This cannot be synced to smart proxy, or consumed on hosts because there is nothing there to sync/consume. This is what I mean by: "lets you create a broken CV version".
- If it did not get removed (the additive and upload cases), then you get some meaningless APT warnings on consuming hosts essentially complaining that there is an extra empty component with nothing useful in it. This is technically harmless, but confusing and annoying for users.
The solution I implemented is:
- We always explicitly remove the empty component when we first add real packages to the repo (including for additive and upload).
- In addition we make "you are trying to publish a completely empty Pulp repo version" a hard error to make remaining edge cases visible right then and there. (With the plan to fix any such edge cases in follow up tasks over time). This prevents us from just creating empty Pulp publications by accident, which only start causing problems when they are synced to smart proxies or consumed on hosts, essentially placing landmines in what looks like a successfully created CV version.
ianballou
left a comment
There was a problem hiding this comment.
I'm generally fine with the code changes after scanning through. Just want to make sure it's tested first.
| return [] if repo.version_href.blank? | ||
| return ["all"] if version_missing_structure_content? | ||
| pulp_primary_api.content_release_components_api.list({:repository_version => repo.version_href}).results.map { |x| x.plain_component }.uniq.sort | ||
| components = pulp_primary_api.content_release_components_api.list({:repository_version => repo.version_href}).results.map { |x| x.plain_component }.uniq.sort |
There was a problem hiding this comment.
Rubocop isn't complaining about these long lines?
There was a problem hiding this comment.
Not for me locally nor here on the PR. 🤷
ianballou
left a comment
There was a problem hiding this comment.
Okay, I tested a case with the empty component in the CV, and one without. I see the new error. The CV publish getting paused is problematic, but it should be rare that people filter out all packages.
Just so I understand, the idea is to later tackle the empty repo edge case here to avoid paused CV publishes? I agree it's better to improve the 99% case anyway, so I think I'm cool with the latest state.
| pulp_primary_api.content_release_components_api.list({:repository_version => repo.version_href}).results.map { |x| x.plain_component }.uniq.sort | ||
| components = pulp_primary_api.content_release_components_api.list({:repository_version => repo.version_href}).results.map { |x| x.plain_component }.uniq.sort | ||
| if components.empty? | ||
| fail ::Katello::Errors::MissingDebEntityError.new('component', repo.name, repo.version_href) |
There was a problem hiding this comment.
I just tried the following:
- Create empty deb repo with 'additive' mirroring policy
- Upload a package
- Add the repo to a CV
- Filter out all debs
- Publish and see success
In this case I did have the empty component and the publish succeeded.
There was a problem hiding this comment.
I did see the Library Default Org View repo did have the new empty metadata removal task run on it when I uploaded the deb file. Should the above CV publish have failed?
I don't see the empty component on the Library Default Org repo, seems just the CV repo received one.
There was a problem hiding this comment.
I would have expected that to fail as well. And I would need to dig through the actions plan to figure out why, in this case it does not. But the plan for the follow up task remains mostly unchanged:
Make sure, that for both workflows, if the repo is emptied via filters, we put the "empty repo metadata" back in.
All of these subtly different edge cases are starting to make me think I made a bad design choice early on: With hind sight, it might have been better, to "fix" this entire issue on the pulp_deb side. If I created some kind of fallback mechanism on the pulp_deb side along the lines of: "If pulp_deb is asked to publish an empty repo version, just fallback to publishing some sane empty repo metadata", then Katello would not need to mess around with all of these edge cases around when to "initialize an empty repo with empty metadata" and when to "cleanup that empty repo metadata". For now I am still stuck on "sunk cost fallacy" since I have gone quite far with the current design. I will have a think of whether it still makes sense to change course completely. Even if I decide to go that route, that would take some time for the new pulp_deb feature to make it to Katello, so we should still proceed with the current changes for now.
There was a problem hiding this comment.
I think there is still time to consider having Pulp Deb publish empty metadata, but I agree that in the short term it's worth improving it from the Katello side.
ianballou
left a comment
There was a problem hiding this comment.
I'm cool with this, thanks!
|
We found a major issue affecting this change that somehow slipped through all our testing! I am converting this back to draft! |
Once you figure this out I'm curious to hear what the issue was! |
In short we are now getting the error we introduced for workflows that should be working. We are now thinking of changing our entire approach by moving responsibility for empty repo metadata to pulp_deb: pulp/pulp_deb#1424 This should allow us to greatly simplify what we need on the Katello side, so we can replace this PR with (initial draft) something like: master...ATIX-AG:katello:simplify_empty_deb_metadata |
Remaining work: Everything relating to tests.
Summary by Sourcery
Improve handling of empty metadata for Debian repositories in Pulp and add validation for missing deb entities.
Bug Fixes:
Enhancements:
Tests: