
Fix Python ecosystem PyPI-to-import name mismatch, Go stdlib handling, and integration test accuracy #243

Open
tmihalac wants to merge 21 commits into RHEcosystemAppEng:main from tmihalac:fix-python

Conversation

tmihalac commented May 31, 2026

Changes

Python CCA name resolution (chain_of_calls_retriever.py, dep_tree.py, python_functions_parser.py)

  • Add _resolve_tree_key with is_same_package fallback for normalized tree lookups
  • Two-pass __determine_doc_package_name: exact dict lookup first, then is_same_package fallback
  • Fix _get_parents to use is not None check instead of or (empty lists were treated as falsy)
  • Re-key dep tree from PyPI names to import names via _find_module_dirs/top_level.txt
  • Add PEP 503 is_same_package override in PythonLanguageFunctionsParser
  • Override get_package_name and filter_docs_by_func_pkg_name for Python name normalization
  • Wire dependency builder into parser via lang_functions_parsers_factory
  • Use parser is_same_package in FL instead of inline lambda
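The two-pass lookup described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function names mirror the bullets, but the signatures and the tree shape (a plain dict keyed by package name) are assumptions. The normalization follows PEP 503, which collapses runs of `-`, `_` and `.` and lowercases the result.

```python
import re


def normalize_name(name: str) -> str:
    """PEP 503 normalization: collapse runs of '-', '_', '.' and lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def is_same_package(a: str, b: str) -> bool:
    """Treat two names as equal when their normalized forms match."""
    return normalize_name(a) == normalize_name(b)


def resolve_tree_key(tree: dict, name: str):
    """Return the key in `tree` that matches `name`, or None.

    Pass 1 is an exact dict lookup; pass 2 is a linear scan that
    falls back to is_same_package.
    """
    if name in tree:
        return name
    for key in tree:
        if is_same_package(key, name):
            return key
    return None


# Assumed tree shape: {package_name: [dependencies]}
tree = {"ruamel_yaml": ["six"], "requests": []}
print(resolve_tree_key(tree, "requests"))     # requests
print(resolve_tree_key(tree, "Ruamel.YAML"))  # ruamel_yaml
print(resolve_tree_key(tree, "flask"))        # None
```

Normalization only reconciles spelling variants of the same name. It does not map a PyPI name to a different import name (for example PyYAML and yaml), which is why the re-keying via top_level.txt is a separate step.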

Python deptree venv resilience (dep_tree.py, .tekton/)

  • Root cause: when the pickle cache is warm, install_dependencies() is skipped. The venv's bin/python is a symlink to the previous pod's interpreter path, which breaks across pod restarts. Additionally, uv fails with Permission denied on ~/.cache/uv in OpenShift containers when UV_CACHE_DIR is not set.
  • Add _ensure_uv_cache_dir: sets UV_CACHE_DIR fallback to writable path inside the cloned repo when not already configured. Called from both build_tree and install_dependencies.
  • Add _ensure_venv_python: detects broken/missing bin/python and re-creates the venv. Uses _detect_existing_site_packages_version to match the Python version of existing packages (e.g. python3.12/site-packages/) instead of project metadata, preventing version mismatch.
  • Add _build_flat_tree_from_manifest: last-resort fallback that builds a flat dependency tree from requirements.txt if deptree still returns empty.
  • Set UV_CACHE_DIR=/tmp/uv-cache in .tekton/on-pull-request.yaml and on-cm-runner.yaml sidecar env.
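For readers unfamiliar with the failure mode, here is a rough sketch of the two checks. The helper names, the `.uv-cache` directory name and the signatures are hypothetical; only `UV_CACHE_DIR` and the `bin/python` path come from the description above.

```python
import os
from pathlib import Path


def ensure_uv_cache_dir(repo_dir: Path) -> str:
    """Point UV_CACHE_DIR at a writable path under repo_dir unless already set."""
    # setdefault leaves an existing value (e.g. from the Tekton env) untouched.
    cache_dir = os.environ.setdefault("UV_CACHE_DIR", str(repo_dir / ".uv-cache"))
    os.makedirs(cache_dir, exist_ok=True)
    return cache_dir


def venv_python_is_usable(venv_dir: Path) -> bool:
    """Return False when bin/python is missing or is a dangling symlink."""
    python = venv_dir / "bin" / "python"
    # Path.exists() follows symlinks, so a link to a vanished interpreter
    # reports False here even though the link itself is still on disk.
    return python.exists() and os.access(python, os.X_OK)
```

A caller would re-create the venv whenever `venv_python_is_usable` returns False, regardless of whether the pickle cache is warm.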

Deptree invocation fix (dep_tree.py)

  • Invoke deptree as python -m deptree instead of bin/deptree entry point (broken by uv pip install)
  • Replace pip install with uv pip install for deptree setup

Go stdlib package selection (base_graph_agent.py)

  • Add _find_go_stdlib_candidate to bypass LLM package filter for Go stdlib packages (e.g. crypto/x509)
  • Scan both candidate_packages and critical_context hints for stdlib module paths
  • Use _GO_STDLIB_ROOTS frozenset for reliable detection
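A possible shape for the stdlib check, as a sketch only. The root set below is a small illustrative subset, not the actual contents of `_GO_STDLIB_ROOTS`, and the function signatures are assumptions.

```python
# Illustrative subset; the real _GO_STDLIB_ROOTS is not reproduced here.
GO_STDLIB_ROOTS = frozenset(
    {"archive", "compress", "crypto", "encoding", "fmt", "net", "os"}
)


def is_go_stdlib(import_path: str) -> bool:
    """Heuristic: the first path segment is a known stdlib root."""
    root = import_path.split("/", 1)[0]
    return root in GO_STDLIB_ROOTS


def find_go_stdlib_candidate(candidates, hints=()):
    """Return the first stdlib path among candidates, then hints, else None."""
    for path in list(candidates) + list(hints):
        if is_go_stdlib(path):
            return path
    return None


print(find_go_stdlib_candidate(["github.com/golang-jwt/jwt"], ["crypto/x509"]))
# crypto/x509
```

Third-party module paths start with a host name such as `github.com`, so they never match a stdlib root, which is what makes a fixed set reliable enough to skip the LLM filter.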

CCA and parser fixes

  • Fix dotted function name splitting with rpartition instead of split (chain_of_calls_retriever.py)
  • Guard is_function check before get_function_name in JS parser search_for_called_function
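To illustrate why `rpartition` is the safer choice for dotted names: `split(".")` returns one element per component, so code that expects exactly two parts breaks on deeper paths, while `rpartition(".")` always splits once at the last dot. The helper name below is hypothetical.

```python
def split_qualified_name(qualified: str) -> tuple:
    """Split 'pkg.module.Name' into ('pkg.module', 'Name') at the last dot."""
    prefix, _, name = qualified.rpartition(".")
    return prefix, name


print("a.b.c".split("."))             # ['a', 'b', 'c']
print(split_qualified_name("a.b.c"))  # ('a.b', 'c')
print(split_qualified_name("parse"))  # ('', 'parse')
```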

Prompt and rule improvements (react_internals.py)

  • Merge reachability rules from 9/10 down to 6/7 — combine package/function priority into Rule 6
  • Add patch hint preference and FL close-match retry guidance to merged Rule 6
  • Update all runtime rule violation messages to match new numbering

Integration test accuracy (integration-tests-input.json)

  • Fix Go CVE-2024-51744 expected result: code_not_reachable (was incorrectly vulnerable)

Patch enrichment (intel_utils.py)

  • Add JS/Python test function names (describe, it, beforeEach, expect, etc.) to the noise filter

tmihalac added 2 commits May 31, 2026 22:01
…low-ups

  Python name normalization:
  - Add _resolve_tree_key in CCA to fall back to is_same_package for tree
  lookups
  - Re-key dep tree from PyPI names to import names via
  _find_module_dirs/top_level.txt
  - Add PEP 503 is_same_package override in PythonLanguageFunctionsParser
  - Wire dependency builder into parser via lang_functions_parsers_factory
  - Use parser is_same_package in FL _is_package_available instead of inline
  lambda
  - Use PEP 503 re.sub(r'[-_.]', '-') for root dep matching instead of
  replace('-', '_')
  - Replace pip install with uv pip install for deptree setup
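The re-keying step could look roughly like this. It is a sketch under several assumptions: the tree is a dict keyed by distribution name, installed distributions expose a `top_level.txt` inside their `*.dist-info` directory (not every build backend writes one), and both helper names are made up for illustration.

```python
import re
from pathlib import Path


def normalize_name(name: str) -> str:
    """PEP 503 normalization, used so 'PyYAML' and 'pyyaml' compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def import_names_by_dist(site_packages: Path) -> dict:
    """Map normalized distribution names to the modules in top_level.txt."""
    mapping = {}
    for top_level in site_packages.glob("*.dist-info/top_level.txt"):
        # Directory name is "<name>-<version>.dist-info".
        stem = top_level.parent.name[: -len(".dist-info")]
        dist_name = stem.rsplit("-", 1)[0]
        modules = [m.strip() for m in top_level.read_text().splitlines() if m.strip()]
        mapping[normalize_name(dist_name)] = modules
    return mapping


def rekey_tree(tree: dict, mapping: dict) -> dict:
    """Re-key {dist_name: deps} by import name; unknown keys stay as-is."""
    rekeyed = {}
    for dist_name, deps in tree.items():
        for module in mapping.get(normalize_name(dist_name), [dist_name]):
            rekeyed[module] = deps
    return rekeyed
```

With a `PyYAML-6.0.1.dist-info/top_level.txt` listing `_yaml` and `yaml`, a tree entry keyed `pyyaml` would become two entries keyed `_yaml` and `yaml`, which is the name the calling code actually imports.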

  CCA query parsing:
  - Fix dotted function name splitting with rpartition instead of split
  - Fix __determine_doc_package_name to use _resolve_tree_key with fallback

  JavaScript parser:
  - Guard is_function check before get_function_name in
  search_for_called_function

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>
Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>
@tmihalac
Author

/test vulnerability-analysis-on-pr

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>
@tmihalac
Author

tmihalac commented Jun 1, 2026

/test vulnerability-analysis-on-pr

@tmihalac
Author

tmihalac commented Jun 1, 2026

/test vulnerability-analysis-on-pr

@tmihalac
Author

tmihalac commented Jun 1, 2026

/test vulnerability-analysis-on-pr

…, and integration test accuracy

Python CCA name resolution:
- Add _resolve_tree_key with is_same_package fallback for normalized tree lookups
- Re-key dep tree from PyPI names to import names via _find_module_dirs/top_level.txt
- Add PEP 503 is_same_package override in PythonLanguageFunctionsParser
- Wire dependency builder into parser via lang_functions_parsers_factory
- Use parser is_same_package in FL instead of inline lambda
- Invoke deptree as python -m deptree instead of bin/deptree entry point
- Replace pip install with uv pip install for deptree setup

Go stdlib package selection:
- Add _find_go_stdlib_candidate to bypass LLM package filter for Go stdlib packages
- Scan both candidate_packages and critical_context hints for stdlib module paths
- Use _GO_STDLIB_ROOTS frozenset for reliable detection

CCA and parser fixes:
- Fix dotted function name splitting with rpartition and move Go '/' check into correct block
- Fix Python parser N[…]t in code_documents
- Guard is_function check before get_function_name in search_for_called_function

Prompt and rule improvements:
- Merge reachability rules into […] Rule 6 (package/function priority)
- Add patch hint preference and FL close-match retry guidance
- Reduce rule count: […]
- Update all runtime r[…]tions to match new numbering

Patch enrichment:
- Add JS/Python test FUNC_NAMES (describe, it, beforeEach, etc.)

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>
tmihalac changed the title from "Fix Python ecosystem PyPI-to-import name mismatch and code review follow-ups" to "Fix Python ecosystem PyPI-to-import name mismatch, Go stdlib handling, and integration test accuracy" on Jun 1, 2026
@tmihalac
Author

tmihalac commented Jun 1, 2026

/test vulnerability-analysis-on-pr

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>
@tmihalac
Author

tmihalac commented Jun 2, 2026

/test vulnerability-analysis-on-pr

@tmihalac
Author

tmihalac commented Jun 2, 2026

/test vulnerability-analysis-on-pr

  - build_tree() fails when pickle cache is warm and
  install_dependencies() is skipped
  - transitive_env/bin/python is a broken symlink from a previous pod
  - uv fails with Permission denied on ~/.cache/uv when UV_CACHE_DIR is
  not set
  - Empty dep tree causes FL to report "Package not found" for valid
  packages
  - Add _ensure_uv_cache_dir: sets UV_CACHE_DIR fallback to writable path
  in cloned repo
  - Add _ensure_venv_python: re-creates venv when bin/python is missing
  or broken
  - Add _build_flat_tree_from_manifest: fallback flat tree from
  requirements.txt
  - Call helpers from both build_tree and install_dependencies
  - Set UV_CACHE_DIR=/tmp/uv-cache in .tekton sidecar env for
  on-pull-request and on-cm-runner
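The last-resort flat tree could be built along these lines. This is a simplified sketch: the helper name and the output shape (`{name: []}`) are assumptions, and the parsing only handles plain requirement lines, not the full requirements-file grammar.

```python
import re

# A requirement name starts with an alphanumeric and may contain '.', '_', '-'.
_NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def flat_tree_from_requirements(text: str) -> dict:
    """Return {package_name: []} per requirement line; no transitive deps."""
    tree = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        # Skip blanks, pip options (-r, -e, --hash) and direct URL references.
        if not line or line.startswith("-") or "://" in line:
            continue
        match = _NAME_RE.match(line)
        if match:
            tree[match.group(1)] = []
    return tree


sample = "requests==2.31.0\n# pinned\n-r other.txt\nFlask[async]>=2.0\n"
print(flat_tree_from_requirements(sample))  # {'requests': [], 'Flask': []}
```

A flat tree loses parent/child edges, so it only prevents the "Package not found" failure for direct dependencies; it cannot support real call-chain traversal through transitive packages.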

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>
@tmihalac
Author

tmihalac commented Jun 2, 2026

/test vulnerability-analysis-on-pr

  - uv downloaded Python 3.13 but packages live under python3.12
  site-packages
  - Add _detect_existing_site_packages_version: finds installed
  Python version from transitive_env/lib/python*/
  - Prefer detected version over project metadata when re-creating
  venv

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>
@tmihalac
Author

tmihalac commented Jun 2, 2026

/test vulnerability-analysis-on-pr

@tmihalac
Author

tmihalac commented Jun 2, 2026

/test vulnerability-analysis-on-pr

@tmihalac
Author

tmihalac commented Jun 2, 2026

/test vulnerability-analysis-on-pr

@tmihalac
Author

tmihalac commented Jun 2, 2026

/test vulnerability-analysis-on-pr

@tmihalac
Author

tmihalac commented Jun 2, 2026

/test vulnerability-analysis-on-pr

@tmihalac
Author

tmihalac commented Jun 2, 2026

/test vulnerability-analysis-on-pr

  version hint exists

  - When no existing site-packages and no project metadata, uv
  downloads latest Python (3.13)
  - Packages installed by lint-and-test under python3.12 become
  invisible to python3.13 deptree
  - Add sys.version_info fallback as last resort so venv matches
  the running container's Python
  - Always pass explicit --python to uv venv, never let uv pick
  the latest

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>
@tmihalac
Author

tmihalac commented Jun 2, 2026

/test vulnerability-analysis-on-pr

@tmihalac
Author

tmihalac commented Jun 2, 2026

/test vulnerability-analysis-on-pr

@tmihalac
Author

tmihalac commented Jun 2, 2026

/test vulnerability-analysis-on-pr

  install_dependencies

  - Extract venv creation from install_dependencies into
  _ensure_venv with existence check
  - Both build_tree and install_dependencies now call _ensure_venv
  for consistent Python version
  - Remove _detect_existing_site_packages_version — got confused
  by polluted python3.13 directories
  - Remove sys.version_info fallback — unnecessary when using
  determine_python_version consistently
  - Both lint-and-test (UBI) and integration-test (sidecar)
  produce same Python version via determine_python_version

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>
@tmihalac
Author

tmihalac commented Jun 2, 2026

/test vulnerability-analysis-on-pr

  version hint exists

  - When determine_python_version returns None, uv venv without
  --python downloads latest Python (3.13)
  - Sidecar image has no system Python so uv can't find 3.12 on
  PATH
  - Add sys.version_info fallback so --python is always passed
  explicitly
  - Prevents version mismatch between lint-and-test (python3.12)
  and sidecar (python3.13)
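A minimal sketch of the "always pass --python" rule, assuming a hypothetical helper that builds the argv for `uv venv`. The `detected_version` parameter stands in for whatever `determine_python_version` returns; the fallback uses the interpreter running the current process.

```python
import sys


def uv_venv_command(venv_dir: str, detected_version=None) -> list:
    """Build a `uv venv` argv that always pins the interpreter version."""
    # Fall back to the running interpreter so uv never picks "latest".
    version = detected_version or f"{sys.version_info.major}.{sys.version_info.minor}"
    return ["uv", "venv", "--python", version, venv_dir]


print(uv_venv_command("transitive_env", "3.12"))
# ['uv', 'venv', '--python', '3.12', 'transitive_env']
```

The resulting list can be passed to `subprocess.run(..., check=True)`. Pinning the version keeps the venv's `lib/pythonX.Y/site-packages` path aligned with the packages installed by the earlier step.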

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>
@tmihalac
Author

tmihalac commented Jun 2, 2026

/test vulnerability-analysis-on-pr

tmihalac added 2 commits June 2, 2026 19:54
Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>
  symlink breakage

  - Set UV_PYTHON_INSTALL_DIR on PVC for both lint-and-test and
  integration-test sidecar
  - Both containers find Python at the same PVC path, bin/python
  symlink no longer breaks
  - lint-and-test: UV_PYTHON_INSTALL_DIR points to
  .cache/am_cache/python (PVC via symlink)
  - integration-test sidecar: UV_PYTHON_INSTALL_DIR points to
  /exploit-iq-data/python (same PVC)

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>
@tmihalac
Author

tmihalac commented Jun 2, 2026

/test vulnerability-analysis-on-pr

tmihalac added 7 commits June 2, 2026 21:40
  shared PVC

  - Replace ubi9/python-312 with ubi9/ubi-minimal (no system
  Python)
  - uv downloads standalone Python to UV_PYTHON_INSTALL_DIR on PVC
  - bin/python symlink points to PVC path, survives across
  containers

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>
  - ubi-minimal lacks git, curl, tar, make, gcc needed by the
  build script
  - Add microdnf install step before git config

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>
  curl-minimal which conflicts

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>
  PATH by default

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>
…ests

  - python-312 image had Node v20 pre-installed, ubi-minimal
  doesn't
  - Enable nodejs:20 module stream to match the major version
  - npm ls --json --all needs Node.js to build JavaScript
  dependency trees

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>
…ted.

  Fix shared PVC Python path for cross-container venv symlinks

  - Mount PVC at /exploit-iq-data in lint-test step (same path as
  sidecar)
  - Set UV_PYTHON_INSTALL_DIR=/exploit-iq-data/python in both
  containers
  - Run uv python install 3.12 explicitly so Python is stored on
  PVC
  - uv venv --python 3.12 now symlinks to PVC path that exists in
  both containers

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>
  CCA

  - Agent was calling CCA with formparser.MultiPartParser.parse
  (method) instead of formparser.MultiPartParser (class)
  - CCA searches for the last component — parse() isn't directly
  called from app code, MultiPartParser is
  - Add REACHABILITY_AGENT_THOUGHT_INSTRUCTIONS_PYTHON with Rule
  7: use class name, not method name for CCA
  - Wire up ecosystem-specific prompt selection: Go, Python, or
  default
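The ecosystem-specific selection might be wired up as in the sketch below. The prompt bodies are placeholders, and the dict and function names are hypothetical; only the three-way choice (Go, Python, default) comes from the commit message.

```python
# Placeholder prompt bodies; the real instruction texts are not shown here.
PROMPTS_BY_ECOSYSTEM = {
    "go": "<Go-specific reachability instructions>",
    "python": "<Python-specific reachability instructions>",
}
DEFAULT_PROMPT = "<default reachability instructions>"


def select_thought_instructions(ecosystem) -> str:
    """Pick the prompt for an ecosystem, falling back to the default."""
    key = (ecosystem or "").strip().lower()
    return PROMPTS_BY_ECOSYSTEM.get(key, DEFAULT_PROMPT)


print(select_thought_instructions("Python"))
# <Python-specific reachability instructions>
```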

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>
@tmihalac
Author

tmihalac commented Jun 2, 2026

/test-heavy

@tmihalac
Author

tmihalac commented Jun 3, 2026

/test-heavy

tmihalac added 2 commits June 3, 2026 14:20
  dummy-branch warning

  - FL tool description: replace static libxml2 example with
  {fl_input_format} template
  - Add FL_INPUT_FORMATS and FL_EXAMPLES dicts in
  prompt_factory.py for per-ecosystem examples
  - Substitute ecosystem-specific input format in
  _build_tool_guidance_for_ecosystem
  - FL error message: use ecosystem-specific example from
  FL_EXAMPLES instead of libxml2
  - CCA: warn agent when package not in dependency tree
  (dummy-branch fallback used)

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>
  tree

  - When CCA falls back to import scanning (package not in
  tree_dict), prepend NOTE to first result
  - Helps agent distinguish real call chain analysis from
  import-based fallback
  - Applies to all ecosystems: Go stdlib, Java missing source
  JARs, C/C++ system libs
  - Prepend to existing entry instead of appending to preserve
  list length for assertions
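The prepend-instead-of-append choice can be sketched as follows. The note wording and the helper name are hypothetical, and the results are assumed to be a list of strings.

```python
# Hypothetical wording; the actual NOTE text is not shown in this commit message.
FALLBACK_NOTE = (
    "NOTE: package not found in dependency tree; "
    "result is based on import scanning."
)


def mark_import_fallback(results: list, used_fallback: bool) -> list:
    """Prefix the first entry with a note without changing the list length."""
    if not used_fallback or not results:
        return results
    return [f"{FALLBACK_NOTE}\n{results[0]}"] + results[1:]


marked = mark_import_fallback(["import crypto/x509 in main.go", "second hit"], True)
print(len(marked))  # 2
```

Folding the note into the first entry keeps `len(results)` stable, so existing tests that assert on the number of results keep passing.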

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>
@tmihalac
Author

tmihalac commented Jun 3, 2026

/test vulnerability-analysis-on-pr

@tmihalac
Author

tmihalac commented Jun 3, 2026

/test vulnerability-analysis-on-pr
