APPENG-4467- rpm analyzier mile stone one by RedTanny · Pull Request #222 · RHEcosystemAppEng/vulnerability-analysis

RedTanny · 2026-04-15T14:20:05Z

Summary

This PR implements the RPM Vulnerability Checker (Milestone 1) - a standalone pipeline branch that provides focused, two-level vulnerability investigation for RPM packages. Unlike the full E2E executor path, this checker answers three specific questions for a target package: Does the CVE apply? Where is the vulnerable code? Is a fix/mitigation in place?

JIRA: APPENG-4467

Architecture

The checker integrates as a conditional branch after add_start_time, selected via pipeline_mode: PACKAGE_CHECKER on the request JSON. It runs independently from the full pipeline while sharing fetch_intel, add_completed_time, and output_results.

START -> add_start_time
              |
    [conditional: pipeline_mode]
              |
    ┌─────────┴────────────────────┐
    |                              |
    v                              v
  [full_pipeline]           [package_checker]
  generate_vdbs            checker_init_state
  (UNCHANGED)                     |
    |                              v
    ...                    checker_fetch_intel
                                  |
                                  v
                           source_acquisition
                                  |
                                  v
                           checker_segmentation
                                  |
                                  v
                           l1_investigation -> route_after_l1
                             [vulnerable/uncertain -> l2_build_agent]
                             [protected -> generate_report]
                                  |
                                  v
                           add_completed_time -> output_results -> END
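
The two branch points in the diagram can be sketched as plain routing functions. This is a minimal illustration, not the PR's code: the function name `route_by_pipeline_mode` is hypothetical, the node names are copied from the diagram, and reading `pipeline_mode` from under `image` is an assumption based on the example request in this description.

```python
def route_by_pipeline_mode(request: dict) -> str:
    """Pick the first node after add_start_time (hypothetical helper)."""
    # Assumption: pipeline_mode lives under "image", as in the example request.
    mode = request.get("image", {}).get("pipeline_mode", "FULL_PIPELINE")
    if mode == "PACKAGE_CHECKER":
        return "checker_init_state"
    return "generate_vdbs"


def route_after_l1(verdict: str) -> str:
    """Send vulnerable/uncertain verdicts to L2, everything else to the report."""
    if verdict in {"vulnerable", "uncertain"}:
        return "l2_build_agent"
    return "generate_report"


print(route_by_pipeline_mode({"image": {"pipeline_mode": "PACKAGE_CHECKER"}}))
print(route_after_l1("protected_by_mitigating_control"))
```

Keeping the routers as pure functions of the state makes each branch easy to unit test without building the whole graph.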

Two-Level Investigation

Level 1: Package Code Agent (Always runs)

Operates on extracted SRPM source (.spec, .patch, changelogs, source code).

| Stage | Purpose |
| --- | --- |
| Target Package Analysis | Deterministic: find CVE-named `.patch`, parse spec `PatchN:`, extract `%changelog`, check build log for patch application |
| Reference Intel Gathering | Fetch fixed SRPM via BrewDownloader, detect rebase fixes, retrieve OSV/GitHub patches when Brew unavailable |
| ReAct Agent Loop | LLM-guided code search using Source Grep, Code Keyword Search to verify vulnerable/fixed patterns |

Verdicts: code_not_present, protected_by_mitigating_control, vulnerable, uncertain
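
As a rough illustration of the deterministic Target Package Analysis stage, the sketch below scans spec text for `PatchN:` entries that name the CVE and pulls out the `%changelog` section. The helper names and the sample spec are hypothetical and only show the idea; the PR's actual parsing may differ.

```python
import re

# Matches "Patch0: foo.patch", "Patch12: bar.patch" and a bare "Patch: baz.patch".
PATCH_LINE_RE = re.compile(r"^Patch(\d*):\s*(\S+)", re.MULTILINE)


def find_cve_patches(spec_text: str, cve_id: str) -> list[str]:
    """Return patch file names from the spec whose name mentions the CVE id."""
    patches = [m.group(2) for m in PATCH_LINE_RE.finditer(spec_text)]
    return [p for p in patches if cve_id.lower() in p.lower()]


def extract_changelog(spec_text: str) -> str:
    """Return everything after the %changelog marker, or '' if it is absent."""
    _, marker, tail = spec_text.partition("%changelog")
    return tail.strip() if marker else ""


SAMPLE_SPEC = """\
Name: openssl
Patch0: fix-build.patch
Patch12: openssl-CVE-2023-0464.patch

%changelog
* Mon Jan 01 2024 Dev <dev@example.com> - 1.1.1k-8
- Fix CVE-2023-0464
"""

print(find_cve_patches(SAMPLE_SPEC, "CVE-2023-0464"))
print("CVE-2023-0464" in extract_changelog(SAMPLE_SPEC))
```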

Level 2: Build Agent (Optional, runs when L1 = vulnerable)

Two sequential phases:

BuildCompilationCheck (Phase 1): Is the vulnerable code compiled into the binary? Uses .spec %build/%configure, Makefile, CMakeLists.txt, #ifdef guards, build log. May override L1 verdict to NOT_VULNERABLE if code is provably not compiled.
HardeningCheck (Phase 2): Do compiler/linker flags mitigate the CVE? Parses CFLAGS/LDFLAGS from build log, evaluates relevance to CVE mechanism (CWE). May refine to VULNERABLE_MITIGATED.
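
A minimal sketch of the HardeningCheck idea, assuming the build log contains the compiler command lines. The flag list and the CWE-to-flag mapping below are illustrative placeholders, not the contents of the PR's hardening knowledge base, and `relevant_hardening` is a hypothetical name.

```python
import re

# Illustrative subset of GCC hardening flags (assumption, not the PR's list).
HARDENING_FLAGS = (
    "-fstack-protector-strong",
    "-D_FORTIFY_SOURCE=2",
    "-fPIE",
    "-fcf-protection",
)

# Illustrative relevance mapping: which flags matter for which CWE (assumption).
CWE_RELEVANT_FLAGS = {
    "CWE-121": {"-fstack-protector-strong", "-D_FORTIFY_SOURCE=2"},
}


def find_hardening_flags(build_log: str) -> set[str]:
    """Return the known hardening flags that appear in the build log."""
    # Split on whitespace and commas so "-Wp,-D_FORTIFY_SOURCE=2" is also seen.
    tokens = set(re.split(r"[\s,]+", build_log))
    return {flag for flag in HARDENING_FLAGS if flag in tokens}


def relevant_hardening(build_log: str, cwe: str) -> set[str]:
    """Keep only the flags that are relevant to the CVE's weakness class."""
    return find_hardening_flags(build_log) & CWE_RELEVANT_FLAGS.get(cwe, set())


LOG = "gcc -O2 -fstack-protector-strong -Wp,-D_FORTIFY_SOURCE=2 -c foo.c"
print(sorted(relevant_hardening(LOG, "CWE-121")))
```

An empty result for a given CWE means none of the detected flags is known to address that weakness, so the verdict would not be refined.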

Key Features

Mandatory target_package input: User specifies the package to investigate (name, version, release, arch)
Brew profile support: Internal (Red Hat VPN) and External (Fedora public Koji) profiles via rpm_user_type
Multi-architecture build logs: Stored per-arch at logs/{arch}/build.log
VulnerabilityIntel extraction: Structured, grep-ready patterns from CVE descriptions and patches
OSV/GitHub patch retrieval: Fallback when Brew patches unavailable
Kernel package support: Kconfig-based prompts, hardening phase skipped (RHEL kernels assumed hardened)
Spec-only fallback: L2 runs without build log using spec and build-system file analysis

New Files

| File | Purpose |
| --- | --- |
| `src/vuln_analysis/functions/cve_package_code_agent.py` | L1 agent graph and investigation |
| `src/vuln_analysis/functions/code_agent_graph_defs.py` | L1 state schemas, search pipelines, report generation |
| `src/vuln_analysis/functions/cve_build_agent.py` | L2 Build Agent graph |
| `src/vuln_analysis/functions/build_agent_graph_defs.py` | L2 state, `BuildHarvestReport`, `harvest_build_data()` |
| `src/vuln_analysis/functions/cve_checker_report.py` | Final markdown report (L1 + L2 synthesis) |
| `src/vuln_analysis/utils/rpm_checker_prompts.py` | All L1/L2 prompt templates |
| `src/vuln_analysis/utils/osv_patch_retriever.py` | OSV/GitHub patch retrieval |
| `src/vuln_analysis/utils/vulnerability_intel_sanitizer.py` | Post-extraction intel cleanup |
| `src/vuln_analysis/tools/source_inspector.py` | `SourceInspector` (multi-pattern grep) |
| `src/vuln_analysis/tools/source_grep.py` | Source Grep LangGraph tool |
| `src/vuln_analysis/tools/brew_downloader.py` | `BrewDownloader` (Koji/Brew SRPM download) |
| `src/vuln_analysis/configs/brew/internal-user-profile.yml` | Red Hat Brew profile |
| `src/vuln_analysis/configs/brew/external-user-profile.yml` | Fedora Koji profile |

API Changes

New pipeline_mode field: FULL_PIPELINE (default) or PACKAGE_CHECKER
New target_package field: Required when pipeline_mode == PACKAGE_CHECKER
OpenAPI spec updated: See src/vuln_analysis/configs/openapi/openapi.json

Example request:

{
  "scan": {
    "vulns": [{ "vuln_id": "CVE-2024-12345" }]
  },
  "image": {
    "pipeline_mode": "PACKAGE_CHECKER",
    "target_package": {
      "name": "openssl",
      "version": "1.1.1k",
      "release": "8.el9_9",
      "arch": "x86_64"
    }
  }
}
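
The rule that `target_package` is required when `pipeline_mode == PACKAGE_CHECKER` could be enforced along these lines. This is a standard-library sketch with hypothetical class names (`TargetPackage`, `ImageInput`); the project's real request models are not shown in this description and may be defined differently.

```python
from dataclasses import dataclass
from typing import Optional

VALID_MODES = ("FULL_PIPELINE", "PACKAGE_CHECKER")


@dataclass
class TargetPackage:
    name: str
    version: str
    release: str
    arch: str


@dataclass
class ImageInput:
    pipeline_mode: str = "FULL_PIPELINE"  # default, per the API Changes list
    target_package: Optional[TargetPackage] = None

    def __post_init__(self) -> None:
        if self.pipeline_mode not in VALID_MODES:
            raise ValueError(f"unknown pipeline_mode: {self.pipeline_mode!r}")
        if self.pipeline_mode == "PACKAGE_CHECKER" and self.target_package is None:
            raise ValueError(
                "target_package is required when pipeline_mode is PACKAGE_CHECKER"
            )


request = ImageInput(
    pipeline_mode="PACKAGE_CHECKER",
    target_package=TargetPackage("openssl", "1.1.1k", "8.el9_9", "x86_64"),
)
print(request.target_package.name)
```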

Testing

Unit tests for BrewDownloader, package identification, intel sanitizer
Integration test via /test vulnerability-analysis-on-pr
Smoke test for external profile: scripts/test_fedora_brew_download.py

Limitations (v1)

RPM-only (Python/Go/Java/npm ecosystems deferred)
Single-package focus (no transitive dependency analysis)
Binary checksec/readelf path not yet implemented (planned Phase C)
External profile limited to Fedora Koji (RHEL packages require VPN)

batzionb · 2026-05-06T12:05:41Z

Can the API payload be simplified somehow?
Would this simpler structure work:

{
  "scan": {
    "vulns": [
      { "vuln_id": "CVE-2023-0464" }
    ]
  },
  "rpm": {
    "name": "openssl",
    "version": "1.1.1k",
    "release": "8.el9_9",
    "arch": "x86_64"
  }
}

Without the additional ecosystem option.
With a more intuitive name than "target_package".
Without requiring all the other fields, and without pipeline_mode.
If rpm is present, use the RPM pipeline mode.
Not nested under the image field, which is confusing because it is not an image.
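
To make the proposal concrete, here is a hedged sketch of how a server could accept the flat payload and map it onto the shape the PR currently expects. Both function names are hypothetical and nothing here reflects code in the PR.

```python
import copy


def infer_pipeline_mode(payload: dict) -> str:
    """Select the RPM checker whenever an 'rpm' object is present."""
    return "PACKAGE_CHECKER" if payload.get("rpm") else "FULL_PIPELINE"


def to_current_shape(payload: dict) -> dict:
    """Translate the proposed flat payload into the image/target_package shape."""
    converted = copy.deepcopy(payload)  # leave the caller's payload untouched
    rpm = converted.pop("rpm", None)
    if rpm:
        image = converted.setdefault("image", {})
        image["pipeline_mode"] = "PACKAGE_CHECKER"
        image["target_package"] = rpm
    return converted


proposed = {
    "scan": {"vulns": [{"vuln_id": "CVE-2023-0464"}]},
    "rpm": {"name": "openssl", "version": "1.1.1k",
            "release": "8.el9_9", "arch": "x86_64"},
}
print(infer_pipeline_mode(proposed))
print(to_current_shape(proposed)["image"]["target_package"]["name"])
```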

batzionb · 2026-05-06T12:06:35Z

In addition, can the API changes appear in the OpenAPI spec?

batzionb · 2026-05-06T14:55:39Z

Following up on #222 (comment):
Using the API as is requires the client to define a dummy repo URL so that the request is not rejected.
See here

zvigrinberg

Hi @RedTanny
Very good job.
Please see my comments; several things still need to be done.

In addition, once the code-understanding sub-agent PR, which refactors the sub-agent skeleton, is merged, please rebase and adapt the 2 new sub-agents to the new template accordingly.

zvigrinberg · 2026-05-14T12:32:13Z

+
+
+_PROFILE_PATHS: dict[BrewProfileType, Path] = {
+    BrewProfileType.INTERNAL: _CONFIGS_DIR / "internal-user-profile.yml",


@RedTanny What about the External profile configuration? Do you have a template file showing how to configure it? I just saw that it isn't implemented yet (it raises a not-implemented exception), so it would be better to add a comment about that. It may also be worthwhile to add documentation about the process that states this explicitly.

zvigrinberg · 2026-05-14T12:35:58Z

+    base_code_index_dir: str = Field(
+        default=".cache/am_cache/code_index",
+        description="Base directory for Tantivy code index storage.",
+    )


@RedTanny Heads up about this one: Theo also touched this tool, so I anticipate conflicts here.

zvigrinberg · 2026-05-14T14:12:08Z

            for root, _, files in os.walk(code_path):
                for file in files:
-                    if any(file.endswith(ext) for ext in include_extensions):
+                    if any(file.endswith(ext) for ext in include_extensions) or file in no_extension:


@RedTanny Which files with no extension are you adding here?
This could add a lot of noise, or many irrelevant files, to the documents.
Can you characterize the pattern of extensionless files that you want to add to the search?

@zvigrinberg

The original intent was to include build system files (Makefile, GNUmakefile, configure) in the full-text search index, since these files contain compilation flags and conditional logic that can reveal whether vulnerable code paths are actually built.

However, looking at this again - you raise a valid point. The agent typically uses the grep tool directly when searching for build-related patterns (like checking for -DFEATURE flags or Makefile targets), not the lexical search index. So this addition may be unnecessary noise.

I can remove this change, since the grep tool covers these cases.

zvigrinberg · 2026-05-14T14:19:28Z

+def _is_binary_file_path(path: str) -> bool:
+    """Check if file path has a binary file extension."""
+    path_lower = path.lower()
+    return any(path_lower.endswith(ext) for ext in _BINARY_FILE_EXTENSIONS)


@RedTanny Have you considered using the Linux file utility to determine the file type dynamically, based on content? (A binary does not always have the expected extension, especially when extracted from payloads as base64 or from databases.) Of course there is a performance cost, but I think you could run that check only when the file has no extension or does not have a code-file extension.

@zvigrinberg
Good point about the limitations of extension-based detection. In this specific context, we're parsing patches fetched from GitHub APIs, so we only have file paths (not actual file content) to work with. The unidiff library's is_binary_file check (line 106) handles the content-based detection from the patch format itself. The extension check is a secondary filter for paths that might slip through.
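
For comparison, the two approaches discussed in this thread can be sketched side by side. The extension tuple is an illustrative placeholder for `_BINARY_FILE_EXTENSIONS`, whose real contents are not shown here, and `looks_binary` uses a NUL-byte heuristic as a stand-in for the `file` utility; it only applies when file content is actually available.

```python
# Illustrative values only; the PR's real _BINARY_FILE_EXTENSIONS is not shown.
_BINARY_FILE_EXTENSIONS = (".png", ".jpg", ".gz", ".so", ".bin")


def is_binary_file_path(path: str) -> bool:
    """Extension-based check, usable when only the path is known."""
    path_lower = path.lower()
    return any(path_lower.endswith(ext) for ext in _BINARY_FILE_EXTENSIONS)


def looks_binary(content: bytes, sample_size: int = 8192) -> bool:
    """Content-based heuristic: a NUL byte in the first chunk suggests binary data."""
    return b"\x00" in content[:sample_size]


print(is_binary_file_path("assets/LOGO.PNG"))
print(looks_binary(b"\x7fELF\x02\x01\x01\x00"))
print(looks_binary(b"int main(void) { return 0; }\n"))
```

The NUL-byte check is cheap but can misclassify some text encodings such as UTF-16, so it is best treated as a fallback rather than a replacement for the patch-level detection.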

zvigrinberg · 2026-05-14T14:25:33Z

+
+logger = LoggingFactory.get_agent_logger(__name__)
+
+_RPM_NEVRA_RE = re.compile(r"^(.+?)-(\d+):(.+?)-(.+)$")


@RedTanny Please add a comment explaining what NEVRA means.

@zvigrinberg Done
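
For readers new to the term: NEVRA stands for Name, Epoch, Version, Release, Architecture. The sketch below applies the regex from the diff to a sample string. The `Nevra` tuple, the `parse_nevra` helper, and the step that splits the architecture off the last group are assumptions for illustration, not the PR's code.

```python
import re
from typing import NamedTuple, Optional

# NEVRA = Name-Epoch:Version-Release.Architecture (regex copied from the diff).
_RPM_NEVRA_RE = re.compile(r"^(.+?)-(\d+):(.+?)-(.+)$")


class Nevra(NamedTuple):
    name: str
    epoch: int
    version: str
    release: str
    arch: str


def parse_nevra(text: str) -> Optional[Nevra]:
    """Parse 'name-epoch:version-release.arch'; return None when it does not match."""
    match = _RPM_NEVRA_RE.match(text)
    if match is None:
        return None
    name, epoch, version, rest = match.groups()
    # Assumption: the architecture is the part after the last dot of group 4.
    release, dot, arch = rest.rpartition(".")
    if not dot:
        release, arch = rest, ""
    return Nevra(name, int(epoch), version, release, arch)


print(parse_nevra("openssl-1:1.1.1k-8.el9_9.x86_64"))
```

Note that the pattern requires an explicit epoch, so a string without the `epoch:` part does not match and the helper returns `None`.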

zvigrinberg · 2026-05-14T14:51:12Z

+_JUSTIFICATION_LABEL_TO_STATUS: dict[str, _StatusLiteral] = {
+    "code_not_present": "FALSE",
+    "code_not_reachable": "FALSE",
+    "protected_by_mitigating_control": "FALSE",
+    "protected_by_compiler": "FALSE",
+    "requires_environment": "FALSE",
+    "vulnerable": "TRUE",
+    "uncertain": "UNKNOWN",
+}


@RedTanny Aren't the justification label categories protected_at_runtime, protected_at_perimeter, requires_dependency and requires_configuration relevant here?

@zvigrinberg The only one that may be relevant is requires_configuration,
but it would be a duplicate, since that area is already covered by code_not_present.

code_not_present --> the code is not compiled
code_not_reachable --> the code is compiled, but configuration defaults make it unreachable
protected_by_mitigating_control --> the code is patched
protected_by_compiler --> compiler hardening flags protect against the exploit
requires_environment --> e.g. the vulnerability affects only 32-bit systems, but the code is compiled for 64-bit
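
A small sketch of how the mapping from the diff might be consulted. The dictionary is copied from the diff; the `status_for` helper and its choice to fall back to `"UNKNOWN"` for labels outside the mapping (such as requires_configuration) are assumptions, not confirmed behaviour.

```python
_JUSTIFICATION_LABEL_TO_STATUS = {
    "code_not_present": "FALSE",
    "code_not_reachable": "FALSE",
    "protected_by_mitigating_control": "FALSE",
    "protected_by_compiler": "FALSE",
    "requires_environment": "FALSE",
    "vulnerable": "TRUE",
    "uncertain": "UNKNOWN",
}


def status_for(label: str) -> str:
    """Map a justification label to a status; unmapped labels become UNKNOWN."""
    return _JUSTIFICATION_LABEL_TO_STATUS.get(label.strip().lower(), "UNKNOWN")


print(status_for("protected_by_compiler"))
print(status_for("requires_configuration"))
```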

zvigrinberg · 2026-05-14T15:06:39Z

In addition, can the API changes appear in the OpenAPI spec?

Yes, good point @batzionb.
@RedTanny, when you're running the agent locally, can you please download the updated schema openapi.json from this endpoint:

http://localhost:26466/openapi.json

Then beautify it and add the updated file to the PR.
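
One possible way to script that step, assuming the agent is running locally on the port given above. The function names are hypothetical; the output path is the one listed under API Changes in the PR description.

```python
import json
from pathlib import Path
from urllib.request import urlopen


def beautify_json(raw: str) -> str:
    """Re-serialize JSON with two-space indentation and a trailing newline."""
    return json.dumps(json.loads(raw), indent=2, ensure_ascii=False) + "\n"


def download_schema(url: str = "http://localhost:26466/openapi.json") -> str:
    """Fetch the schema from the locally running agent and beautify it."""
    with urlopen(url, timeout=10) as response:
        return beautify_json(response.read().decode("utf-8"))


def save_schema(target: Path, url: str = "http://localhost:26466/openapi.json") -> None:
    """Write the beautified schema to the given file."""
    target.write_text(download_schema(url), encoding="utf-8")


# Example (requires the agent to be running locally):
# save_schema(Path("src/vuln_analysis/configs/openapi/openapi.json"))
print(beautify_json('{"openapi":"3.1.0"}'))
```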

zvigrinberg

@RedTanny Thank you for the work you've done.
In general, very good job.

Still, two general comments:

The PR description should list all the details: features, architecture and implementation details. This matters especially for such a huge PR, and it is currently missing.
Some tests for the new RPM agent logic are still missing, especially for code_agent_graph_defs.py and build_agent_graph_defs.py. Given the size of this PR, I suggest adding them separately in a new PR once this one is done.

Moreover, please see the more specific comments below.

zvigrinberg · 2026-05-31T07:19:52Z

+    def __new__(cls):
+        if cls._instance is None:
+            with cls._lock:
+                if cls._instance is None:
+                    cls._instance = super().__new__(cls)
+        return cls._instance
+
+    def __init__(self, json_path: str | Path | None = None) -> None:
+        if not hasattr(self, '_initialized'):
+            base_path = Path(__file__).resolve().parents[1]
+            default_json = base_path / "data" / "hardening_kb" / "hardening_kb.json"
+            self.json_path = Path(json_path) if json_path else default_json
+
+            self._entries: list[HardeningEntry] = []
+            self._cwe_index: dict[str, list[HardeningEntry]] = {}
+            self._initialized = True
+            self._load()
+
+    @classmethod
+    def get_instance(cls) -> "HardeningKB":
+        """Get the singleton instance of HardeningKB."""
+        return cls()


@RedTanny With this code structure, the default JSON will always be used and the json_path argument is ignored, since it is currently never passed (always None).
Consider propagating json_path through that flow, or leave it as is and add another factory method, from_path, to get an instance.

@RedTanny Should the hardening knowledge base be configurable, or should it be changed only through a PR (if at all), after enough testing?
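
The `from_path` suggestion could look roughly like the sketch below, which caches one instance per resolved path so that a custom `json_path` is honoured. The class name `HardeningKBSketch` is hypothetical, the default path is a placeholder relative path, and loading of the JSON file is deliberately omitted.

```python
import threading
from pathlib import Path
from typing import Union


class HardeningKBSketch:
    """Illustrative stand-in for HardeningKB; one cached instance per JSON path."""

    _instances: dict = {}
    _lock = threading.Lock()
    # Placeholder default; the real class derives it from __file__.
    _DEFAULT_JSON = Path("data/hardening_kb/hardening_kb.json")

    def __init__(self, json_path: Path) -> None:
        self.json_path = json_path
        # The real class would load and index the entries here.

    @classmethod
    def get_instance(cls) -> "HardeningKBSketch":
        """Return the shared instance backed by the default JSON file."""
        return cls.from_path(cls._DEFAULT_JSON)

    @classmethod
    def from_path(cls, json_path: Union[str, Path]) -> "HardeningKBSketch":
        """Return the shared instance for a specific JSON file."""
        key = Path(json_path).resolve()
        with cls._lock:
            if key not in cls._instances:
                cls._instances[key] = cls(key)
            return cls._instances[key]


default_kb = HardeningKBSketch.get_instance()
custom_kb = HardeningKBSketch.from_path("other/hardening_kb.json")
print(default_kb is HardeningKBSketch.get_instance(), custom_kb is default_kb)
```

Keying the cache by path avoids the situation described above, where a later call with a different `json_path` silently receives the instance built from the default file.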

zvigrinberg · 2026-05-31T08:23:48Z

+            result = subprocess.run(
+                ['bsdtar', '-xf', str(file), '-C', str(output_path)],
+                capture_output=True,
+                text=True
+            )
+            if result.returncode != 0:
+                logger.error(f"Failed to extract {file.name}: {result.stderr.strip()}")


@RedTanny Why switch from a Python tar library to a tar CLI utility in this function? (In addition, this needs to be documented for local development, so that developers install it when running locally.)

@zvigrinberg bsdtar is more reliable and supports more compression formats.

@RedTanny OK, then just add it to the local development / running locally section of the documentation.
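
Since bsdtar becomes a runtime dependency, a defensive wrapper could fail early with a clear message when the binary is missing. This is a sketch with a hypothetical function name; the `bsdtar -xf ... -C ...` invocation mirrors the diff, and raising `RuntimeError` instead of logging is a choice made here for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def extract_with_bsdtar(archive: Path, output_dir: Path, tool: str = "bsdtar") -> None:
    """Extract an archive with bsdtar, raising if the tool is absent or fails."""
    if shutil.which(tool) is None:
        raise RuntimeError(
            f"{tool} was not found on PATH; install it before running locally"
        )
    output_dir.mkdir(parents=True, exist_ok=True)
    result = subprocess.run(
        [tool, "-xf", str(archive), "-C", str(output_dir)],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        raise RuntimeError(f"Failed to extract {archive.name}: {result.stderr.strip()}")


# Example (requires bsdtar and a real archive):
# extract_with_bsdtar(Path("openssl-1.1.1k.tar.gz"), Path("extracted"))
```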

RedTanny force-pushed the APPENG-4467-Rpm-Checker branch from cde60ba to 50638ee Compare April 27, 2026 13:30

batzionb mentioned this pull request May 7, 2026

RPM analysis UI and API RHEcosystemAppEng/agent-morpheus-client#220

Merged

6 tasks

zvigrinberg self-requested a review May 13, 2026 11:56

zvigrinberg requested changes May 14, 2026

RedTanny force-pushed the APPENG-4467-Rpm-Checker branch from dff2e5c to 4d0b194 Compare May 17, 2026 13:11

RedTanny added 21 commits May 19, 2026 13:56

rpm analyzier mile stone one

cb75925

start prompt the identify keywords

50f99db

Identiy sub graph flow

88eebf0

chunk and parse code file and index it to the lexical search

4b360c3

milestone 1 locate vulnerability place

0ee5b56

locate mile stone 2

33df787

verify step 1

33e38e3

clear labels

43698cf

last changes

dda4c2a

generating report for L1 agent

a274144

fix L1 report

a2ac3ce

update prompt

7b48e67

Save changes before change in design

639f15b

redesign: preprocess node

17c806f

save work

4498ae1

first React agent loop work

684489b

improve report for downstream L1

beeb530

before report change

4225e7d

fix report

95a8ba0

cleanup and fix bug

a8f71fa

add observation logic

b563f68

RedTanny added 3 commits May 28, 2026 10:19

fix

0d3f31a

support verify download brew

03dc117

disable js avoid errors segmentation , fix identification logic tree …

95ba375

zvigrinberg requested changes May 31, 2026

zvigrinberg reviewed May 31, 2026

Comment thread src/vuln_analysis/configs/brew/external-user-profile.yml Outdated

zvigrinberg reviewed May 31, 2026

Comment thread src/vuln_analysis/tools/source_inspector.py

RedTanny added 9 commits May 31, 2026 16:27

fix identification and support chromium patch

b5d105d

code review

824800f

update identify tests

1941f95

improve performance and LLM conclusion vulnerability

467324a

fix review clean api

a323e5a

remove deadcode

4c1a8d2

codereview: fix location of import

2094232

codeReview: remove wrapper

1f53555

CodeReview fix message

5c4b0ee

RedTanny requested a review from zvigrinberg June 3, 2026 05:37

clean python cache

1e8e2f8

zvigrinberg reviewed Jun 3, 2026

Comment thread .tekton/on-pull-request.yaml Outdated

RedTanny added 3 commits June 3, 2026 09:22

clean python cache2

6d0b0d3

disable tests temp

300de5a

enable tests

2d7f07c

zvigrinberg reviewed Jun 3, 2026

Comment thread src/vuln_analysis/tools/brew_downloader.py Outdated

UI now takes the pack url from artifacts.source_url the source not ne…

13710e1

zvigrinberg reviewed Jun 3, 2026

Comment thread src/vuln_analysis/tools/source_inspector.py

zvigrinberg reviewed Jun 3, 2026

Comment thread src/vuln_analysis/tools/source_inspector.py

CodeReview default for ssl verify true

000a841

tmihalac mentioned this pull request Jun 3, 2026

Add RPM package analyzer support from client RHEcosystemAppEng/exploitiq-mcp-server#6

Merged

RedTanny added 3 commits June 3, 2026 10:22

CodeReview: safe path for readfile

d995374

CodeReveiw: safe path in _init_

1d2ead9

CodeReview: add documentation for build-essential

1bb6ab6


