
OCPBUGS-85410: Fix performance related issues when selinux metrics are emitted #2668

Open
gnufied wants to merge 3 commits into openshift:master from gnufied:manual-backport-selinux-perf-fix-ocp-master

Conversation

@gnufied gnufied commented May 18, 2026

Fixes https://redhat.atlassian.net/browse/OCPBUGS-85410

Summary by CodeRabbit

  • Performance
    • Faster SELinux conflict detection via parsed labels and optimized caching for quicker, more reliable identification across volumes and pods.
  • New Behavior
    • Conflict reporting is now aggregated and returned as a snapshot, simplifying downstream consumption.
  • Bug Fixes
    • Improved cache consistency and targeted updates reduce stale/conflicting entries.
  • Tests
    • Expanded test coverage for multi-volume conflicts, deletion scenarios, and label parsing.

@openshift-ci-robot openshift-ci-robot added backports/validated-commits Indicates that all commits come to merged upstream PRs. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels May 18, 2026
@openshift-ci-robot

@gnufied: This pull request references Jira Issue OCPBUGS-85410, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

Fixes https://redhat.atlassian.net/browse/OCPBUGS-85410

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot

@gnufied: the contents of this pull request could be automatically validated.

The following commits are valid:

Comment /validate-backports to re-evaluate validity of the upstream PRs, for example when they are merged upstream.

@gnufied gnufied changed the title from "OCPBUGS-85410: Manual backport selinux perf fix ocp master" to "OCPBUGS-85410: Fix performance related issues when selinux metrics are emitted" May 18, 2026
@coderabbitai

coderabbitai Bot commented May 18, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f37276f6-faa9-4945-aeaa-e57a1400cdea

📥 Commits

Reviewing files that changed from the base of the PR and between f3ff95e and 1f27b44.

📒 Files selected for processing (7)
  • pkg/controller/volume/selinuxwarning/cache/volumecache.go
  • pkg/controller/volume/selinuxwarning/cache/volumecache_test.go
  • pkg/controller/volume/selinuxwarning/internal/parse/selinux_label.go
  • pkg/controller/volume/selinuxwarning/internal/parse/selinux_label_test.go
  • pkg/controller/volume/selinuxwarning/metrics.go
  • pkg/controller/volume/selinuxwarning/selinux_warning_controller_test.go
  • pkg/controller/volume/selinuxwarning/translator/selinux_translator.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • pkg/controller/volume/selinuxwarning/metrics.go
  • pkg/controller/volume/selinuxwarning/internal/parse/selinux_label.go
  • pkg/controller/volume/selinuxwarning/cache/volumecache_test.go
  • pkg/controller/volume/selinuxwarning/cache/volumecache.go

Walkthrough

The PR refactors the SELinux warning controller's conflict cache from a streaming SendConflicts pattern to a direct list-returning GetConflicts pattern. It adds reverse indexing and per-volume conflict caching for efficiency, pre-parses SELinux labels into structured arrays, and updates all consumers to the new interface.
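As a rough illustration of the list-returning pattern described in the walkthrough, the sketch below aggregates per-volume conflicts into a fresh slice that the caller receives as a snapshot. The `Conflict` fields, the `conflictCache` type, and its `perVolume` map are simplified assumptions made for this example and are not the PR's actual definitions.

```go
package main

import (
	"fmt"
	"sync"
)

// Conflict is a simplified, hypothetical stand-in for the cache's conflict record.
type Conflict struct {
	Pod      string
	OtherPod string
}

// conflictCache is a hypothetical cache that stores conflicts per volume name.
type conflictCache struct {
	mu        sync.RWMutex
	perVolume map[string][]Conflict
}

// GetConflicts aggregates the per-volume conflicts into a new slice.
// The caller gets a snapshot and may modify it without touching the cache.
func (c *conflictCache) GetConflicts() []Conflict {
	c.mu.RLock()
	defer c.mu.RUnlock()

	out := make([]Conflict, 0)
	for _, conflicts := range c.perVolume {
		out = append(out, conflicts...)
	}
	return out
}

func main() {
	c := &conflictCache{perVolume: map[string][]Conflict{
		"vol-1": {{Pod: "ns/a", OtherPod: "ns/b"}, {Pod: "ns/b", OtherPod: "ns/a"}},
		"vol-2": {{Pod: "ns/c", OtherPod: "ns/d"}},
	}}
	fmt.Println(len(c.GetConflicts())) // prints 3
}
```

Compared with streaming over a channel, a consumer of this shape needs no goroutine or channel handling; it iterates the returned slice directly. Map iteration order in Go is unspecified, so the order of the snapshot is not deterministic in this sketch.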

Changes

SELinux Conflict Cache Refactoring

| Layer | File(s) | Summary |
| --- | --- | --- |
| SELinux label parsing utility | pkg/controller/volume/selinuxwarning/internal/parse/selinux_label.go, pkg/controller/volume/selinuxwarning/internal/parse/selinux_label_test.go | Introduces ParseSELinuxLabel(label string) [4]string to split SELinux labels into fixed arrays of user, role, type, and level components, with comprehensive test coverage for edge cases. |
| Translator refactoring for parsed labels | pkg/controller/volume/selinuxwarning/translator/selinux_translator.go | Conflicts method now parses labels via ParseSELinuxLabel and delegates to new ConflictsParsed helper that compares pre-split label components, eliminating repeated string parsing. |
| Cache interface and state data structures | pkg/controller/volume/selinuxwarning/cache/volumecache.go | VolumeCache interface replaces SendConflicts with GetConflicts() []Conflict. Internal struct adds reverse pod-to-volumes index and per-volume conflict cache; podInfo now caches parsed SELinux label components in seLinuxParts. |
| AddVolume with reverse indexing and conflict caching | pkg/controller/volume/selinuxwarning/cache/volumecache.go | AddVolume registers pod-volume relationships in reverse index, stores parsed labels, computes conflicts using ConflictsParsed, caches results per volume, and removes stale conflicts when pods are updated. |
| DeletePod with cached conflict pruning | pkg/controller/volume/selinuxwarning/cache/volumecache.go | DeletePod prunes conflicts from cache, removes pods only from volumes they actually use via reverse index, cleans up empty entries, and maintains reverse-index consistency via new registerPodVolume helper. |
| GetConflicts public API and cache debugging | pkg/controller/volume/selinuxwarning/cache/volumecache.go | Implements new GetConflicts method that aggregates all cached conflicts into a single list. Updates dump output to report reverse pod-to-volumes index instead of detailed label logging. |
| Comprehensive cache test suite | pkg/controller/volume/selinuxwarning/cache/volumecache_test.go | Adds verifyReverseIndexConsistency helper to validate index symmetry. Renames and updates AddVolume test to use GetConflicts() with parsed label expectations. Introduces TestVolumeCache_MultiVolumeConflicts for aggregation and deduplication scenarios. Adds table-driven TestVolumeCache_DeletePodConflicts for removal and idempotency validation. Extends TestVolumeCache_DeleteAll to verify final state and index consistency. |
| Consumer updates for GetConflicts interface | pkg/controller/volume/selinuxwarning/metrics.go, pkg/controller/volume/selinuxwarning/selinux_warning_controller_test.go | Metrics collector and controller test double updated to call GetConflicts() directly instead of streaming via channels and goroutines. |
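To make the pre-parsing idea in the summary above concrete, here is a minimal sketch of splitting a label once into a `[4]string` and comparing the pre-split components afterwards. The helper names `parseLabel` and `conflictsParsed` are hypothetical, and the rule that an empty component matches anything is an assumption made for this example; the PR's `ParseSELinuxLabel` and `ConflictsParsed` may behave differently.

```go
package main

import (
	"fmt"
	"strings"
)

// parseLabel is a hypothetical helper that splits "user:role:type:level"
// into four components. Missing trailing components stay empty, and the
// level keeps any colons it contains (for example "s0:c1,c2").
func parseLabel(label string) [4]string {
	var parts [4]string
	copy(parts[:], strings.SplitN(label, ":", 4))
	return parts
}

// conflictsParsed compares two pre-split labels component by component.
// Assumption for this sketch: an empty component matches anything.
func conflictsParsed(a, b [4]string) bool {
	for i := range a {
		if a[i] == "" || b[i] == "" {
			continue
		}
		if a[i] != b[i] {
			return true
		}
	}
	return false
}

func main() {
	a := parseLabel("system_u:object_r:container_file_t:s0:c1,c2")
	b := parseLabel("system_u:object_r:container_file_t:s0:c3,c4")
	fmt.Println(a[3], conflictsParsed(a, b)) // prints: s0:c1,c2 true
}
```

Because the array has a fixed size, it can be stored by value next to the pod's other cached data and compared without allocating, which is the saving the summary attributes to pre-parsing.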

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 10 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 46.15% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Test Structure And Quality | ❓ Inconclusive | The custom check requires reviewing "Ginkgo test code", but all modified test files use standard Go testing (*testing.T), not Ginkgo. The check is not applicable to these tests. | Modified tests use standard Go testing, not Ginkgo. Check should clarify if it applies only to Ginkgo or all tests. Test quality is good: meaningful error messages, single responsibility, proper helpers. |
✅ Passed checks (10 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately reflects the main performance-related changes: replacing streaming conflicts via SendConflicts with efficient direct retrieval via GetConflicts, adding reverse indexing for efficient lookups, and pre-parsing SELinux labels to avoid repeated parsing overhead. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | All test names are stable and deterministic, using descriptive static strings with no dynamic content or generated IDs. |
| Microshift Test Compatibility | ✅ Passed | No Ginkgo e2e tests are added in this PR. All test modifications are standard Go unit tests using testing.T, which are not subject to the MicroShift compatibility check. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | No Ginkgo e2e tests added. Only standard Go unit tests in pkg/controller/volume/selinuxwarning/. Check not applicable. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | PR is not applicable. Changes are entirely within controller source code (internal cache/metrics/translator logic). No deployment manifests, operator code, or scheduling constraints are introduced. |
| Ote Binary Stdout Contract | ✅ Passed | Code is part of standard kube-controller-manager, not an OTE test binary. OTE Stdout Contract applies only to OTE test binaries communicating via JSON on stdout. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | Check is not applicable. PR adds only standard Go unit tests (testing.T), not Ginkgo e2e tests. No IPv6/disconnected network compatibility issues exist. |


@openshift-ci openshift-ci Bot requested review from jerpeter1 and p0lyn0mial May 18, 2026 13:59
@openshift-ci

openshift-ci Bot commented May 18, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: gnufied
Once this PR has been reviewed and has the lgtm label, please assign jacobsee for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot

@gnufied: This pull request references Jira Issue OCPBUGS-85410, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

Fixes https://redhat.atlassian.net/browse/OCPBUGS-85410

Summary by CodeRabbit

  • Performance
    • SELinux conflict detection now leverages optimized caching mechanisms with efficient label parsing to deliver faster identification of conflicts affecting volumes and containers.
    • Improved cache consistency and management provides more reliable conflict detection across multiple volumes and pod configurations for enhanced system stability.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
pkg/controller/volume/selinuxwarning/cache/volumecache_test.go (1)

680-700: ⚡ Quick win

Tighten surviving-conflict assertions to reject unexpected extras.

This currently verifies required pairs exist, but when expectedSurvivingPairs is non-empty it does not fail on additional unexpected conflicts.

Suggested patch
 			// Verify each expected surviving pair exists in both directions
 			for _, pair := range tt.expectedSurvivingPairs {
 				hasForward := false
 				hasReverse := false
 				for _, conflict := range remaining {
 					if conflict.Pod == pair[0] && conflict.OtherPod == pair[1] {
 						hasForward = true
 					}
 					if conflict.Pod == pair[1] && conflict.OtherPod == pair[0] {
 						hasReverse = true
 					}
 				}
 				if !hasForward || !hasReverse {
 					t.Errorf("expected symmetric conflict between %s and %s, got %+v", pair[0], pair[1], remaining)
 				}
 			}
 
-			// If no pairs are expected, there should be no conflicts at all
-			if len(tt.expectedSurvivingPairs) == 0 && len(remaining) != 0 {
-				t.Errorf("expected no conflicts, got %+v", remaining)
-			}
+			expectedConflictCount := len(tt.expectedSurvivingPairs) * 2 // forward + reverse
+			if len(remaining) != expectedConflictCount {
+				t.Errorf("expected %d remaining conflicts, got %d: %+v", expectedConflictCount, len(remaining), remaining)
+			}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/controller/volume/selinuxwarning/cache/volumecache_test.go` around lines
680 - 700, The test currently only checks that each expected symmetric pair
appears in remaining but does not fail on extra unexpected conflicts; update the
assertion to require exact equivalence when tt.expectedSurvivingPairs is
non-empty by building the full expected set of directional conflicts (for each
pair in tt.expectedSurvivingPairs include both [a,b] and [b,a]), then assert
that remaining contains exactly that set (compare lengths and that every
conflict in remaining matches an entry in the expected set using conflict.Pod
and conflict.OtherPod) and fail if any extras or missing entries are found; keep
the existing symmetric existence check only for the empty-case branch.
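One way to express the exact-equivalence check described in this nitpick is a set-based helper, sketched below. The `conflict` struct and the `diffConflicts` helper are hypothetical and only mirror the `Pod` and `OtherPod` fields used in the suggested patch; they are not part of the PR.

```go
package main

import "fmt"

// conflict is a hypothetical, reduced version of the cache's conflict record.
type conflict struct {
	Pod      string
	OtherPod string
}

// diffConflicts expands each expected pair into both directions and reports
// which directional conflicts are missing from, or unexpected in, remaining.
func diffConflicts(pairs [][2]string, remaining []conflict) (missing, extra [][2]string) {
	expected := make(map[[2]string]bool, len(pairs)*2)
	for _, p := range pairs {
		expected[[2]string{p[0], p[1]}] = true
		expected[[2]string{p[1], p[0]}] = true
	}

	seen := make(map[[2]string]bool, len(remaining))
	for _, c := range remaining {
		key := [2]string{c.Pod, c.OtherPod}
		seen[key] = true
		if !expected[key] {
			extra = append(extra, key)
		}
	}

	for key := range expected {
		if !seen[key] {
			missing = append(missing, key)
		}
	}
	return missing, extra
}

func main() {
	pairs := [][2]string{{"ns/a", "ns/b"}}
	remaining := []conflict{{"ns/a", "ns/b"}, {"ns/b", "ns/a"}, {"ns/a", "ns/c"}}
	missing, extra := diffConflicts(pairs, remaining)
	fmt.Println(len(missing), len(extra)) // prints 0 1
}
```

A test could fail when either returned slice is non-empty. Unlike a pure length comparison, this does not assume exactly one conflict per direction, which may matter if the same pod pair conflicts on more than one volume.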
pkg/controller/volume/selinuxwarning/cache/volumecache.go (1)

151-189: 💤 Low value

Consider skipping self-comparison in conflict detection loop.

The loop iterates all pods including the one being added (podKey). While self-comparison produces no conflicts (same policy, same labels), explicitly skipping it saves one iteration and one ConflictsParsed call per AddVolume.

♻️ Proposed optimization
 	// Emit conflicts for the pod
 	for otherPodKey, otherPodInfo := range volume.pods {
+		if otherPodKey == podKey {
+			continue
+		}
 		if otherPodInfo.changePolicy != changePolicy {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/controller/volume/selinuxwarning/cache/volumecache.go` around lines 151 -
189, The loop over volume.pods currently compares the pod being added against
itself, causing an unnecessary iteration and a redundant call to
c.seLinuxTranslator.ConflictsParsed; modify the loop in the function that adds
pods (the AddVolume / pod-add block containing volume.pods, podKey, podInfo) to
explicitly skip self-comparison by checking if otherPodKey == podKey and
continue, so you avoid creating duplicate/confusing conflict entries and avoid
the extra ConflictsParsed invocation on the same pod.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/controller/volume/selinuxwarning/cache/volumecache_test.go`:
- Line 495: The assertion message is stale: replace the "SendConflicts returned
unexpected conflicts: %+v" text with a message that reflects the current API
under test (e.g. "GetConflicts returned unexpected conflicts: %+v") in the test
that checks receivedConflicts in volumecache_test.go so failures correctly
reference GetConflicts; update the error string used in the t.Errorf call that
reports receivedConflicts.

In `@pkg/controller/volume/selinuxwarning/cache/volumecache.go`:
- Line 301: The code calls slices.Sort(podVolumes) but the slices package is not
imported; add the missing import for the slices package so the call
compiles—prefer importing the standard library "slices" (Go 1.21+), or if the
project targets older Go, import "golang.org/x/exp/slices"; ensure the import is
added alongside other imports so slices.Sort(podVolumes) resolves.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 52eb5551-368b-4461-9666-fd519900a29b

📥 Commits

Reviewing files that changed from the base of the PR and between f9b62a6 and f3ff95e.

📒 Files selected for processing (7)
  • pkg/controller/volume/selinuxwarning/cache/volumecache.go
  • pkg/controller/volume/selinuxwarning/cache/volumecache_test.go
  • pkg/controller/volume/selinuxwarning/internal/parse/selinux_label.go
  • pkg/controller/volume/selinuxwarning/internal/parse/selinux_label_test.go
  • pkg/controller/volume/selinuxwarning/metrics.go
  • pkg/controller/volume/selinuxwarning/selinux_warning_controller_test.go
  • pkg/controller/volume/selinuxwarning/translator/selinux_translator.go

Comment thread pkg/controller/volume/selinuxwarning/cache/volumecache_test.go Outdated
Comment thread pkg/controller/volume/selinuxwarning/cache/volumecache.go
@gnufied gnufied force-pushed the manual-backport-selinux-perf-fix-ocp-master branch from f3ff95e to cfab855 May 18, 2026 14:50
@openshift-ci-robot

@gnufied: the contents of this pull request could be automatically validated.

The following commits are valid:

Comment /validate-backports to re-evaluate validity of the upstream PRs, for example when they are merged upstream.

tchap and others added 3 commits May 18, 2026 10:51

When calling ControllerSELinuxTranslator.Conflicts(), the SELinux label
is repeatedly split into []string to detect conflicts. This causes a huge
number of allocations when there are many comparisons.

This is now made more efficient by pre-parsing the SELinux label and
storing it in podInfo as [4]string for fast comparison when needed.

Added podToVolumes reverse index to optimize DeletePod.
Currently we simply iterate through all the volumes and remove the pod
being deleted from there. This is inefficient and takes longer the
longer the volume list becomes.

Keeping a map pod -> volumes makes removing a pod fast. We can just jump
to the relevant volumes directly and remove the pod from there.

Also prevent duplicate metric emissions
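
The reverse-index idea from the second commit message can be sketched as follows. This is an illustrative two-way index under simplified assumptions: the `podVolumeIndex` type, its field names, and its methods are hypothetical and do not reproduce the PR's cache, which also tracks labels and conflicts.

```go
package main

import "fmt"

type set map[string]struct{}

// podVolumeIndex is a hypothetical two-way index between pods and volumes.
type podVolumeIndex struct {
	volumeToPods map[string]set
	podToVolumes map[string]set
}

func newPodVolumeIndex() *podVolumeIndex {
	return &podVolumeIndex{
		volumeToPods: map[string]set{},
		podToVolumes: map[string]set{},
	}
}

// add records that pod uses volume in both directions.
func (idx *podVolumeIndex) add(pod, volume string) {
	if idx.volumeToPods[volume] == nil {
		idx.volumeToPods[volume] = set{}
	}
	idx.volumeToPods[volume][pod] = struct{}{}

	if idx.podToVolumes[pod] == nil {
		idx.podToVolumes[pod] = set{}
	}
	idx.podToVolumes[pod][volume] = struct{}{}
}

// deletePod visits only the volumes the pod uses instead of scanning
// every volume, and drops volumes that end up with no pods.
func (idx *podVolumeIndex) deletePod(pod string) {
	for volume := range idx.podToVolumes[pod] {
		pods := idx.volumeToPods[volume]
		delete(pods, pod)
		if len(pods) == 0 {
			delete(idx.volumeToPods, volume)
		}
	}
	delete(idx.podToVolumes, pod)
}

func main() {
	idx := newPodVolumeIndex()
	idx.add("ns/pod-a", "vol-1")
	idx.add("ns/pod-b", "vol-1")
	idx.add("ns/pod-a", "vol-2")
	idx.deletePod("ns/pod-a")
	fmt.Println(len(idx.volumeToPods), len(idx.podToVolumes)) // prints 1 1
}
```

With this layout the cost of deleting a pod depends on how many volumes that pod uses rather than on the total number of volumes, at the price of keeping both maps consistent on every add and delete.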
@gnufied gnufied force-pushed the manual-backport-selinux-perf-fix-ocp-master branch from cfab855 to 1f27b44 May 18, 2026 14:51
@openshift-ci-robot

@gnufied: the contents of this pull request could be automatically validated.

The following commits are valid:

Comment /validate-backports to re-evaluate validity of the upstream PRs, for example when they are merged upstream.

@openshift-ci-robot

@gnufied: This pull request references Jira Issue OCPBUGS-85410, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)
Details

In response to this:

Fixes https://redhat.atlassian.net/browse/OCPBUGS-85410

Summary by CodeRabbit

  • Performance
    • Faster SELinux conflict detection via parsed labels and optimized caching for quicker, more reliable identification across volumes and pods.
  • New Behavior
    • Conflict reporting is now aggregated and returned as a snapshot, simplifying downstream consumption.
  • Bug Fixes
    • Improved cache consistency and targeted updates reduce stale/conflicting entries.
  • Tests
    • Expanded test coverage for multi-volume conflicts, deletion scenarios, and label parsing.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci

openshift-ci Bot commented May 18, 2026

@gnufied: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aws-ovn-hypershift | 1f27b44 | link | true | /test e2e-aws-ovn-hypershift |
| ci/prow/k8s-e2e-conformance-aws | 1f27b44 | link | true | /test k8s-e2e-conformance-aws |
| ci/prow/e2e-aws-ovn-techpreview | 1f27b44 | link | false | /test e2e-aws-ovn-techpreview |
| ci/prow/e2e-aws-ovn-serial-1of2 | 1f27b44 | link | true | /test e2e-aws-ovn-serial-1of2 |
| ci/prow/e2e-aws-ovn-runc | 1f27b44 | link | false | /test e2e-aws-ovn-runc |
| ci/prow/e2e-aws-ovn-techpreview-serial-1of2 | 1f27b44 | link | false | /test e2e-aws-ovn-techpreview-serial-1of2 |
| ci/prow/e2e-aws-ovn-fips | 1f27b44 | link | true | /test e2e-aws-ovn-fips |
| ci/prow/verify | 1f27b44 | link | true | /test verify |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
