Improve context extraction and test discovery performance#1846

Closed
aseembits93 wants to merge 2 commits into main from codex/perf-hotspots-context-and-discovery

Conversation

@aseembits93
Contributor

Summary

  • reuse Jedi project instances and avoid repeated full-reference filtering in Python code context extraction
  • cache per-call file source and pruning work so context generation does not reread and reprocess the same files repeatedly
  • batch discovered-test cache writes and trim redundant Jedi scans in Python test discovery while failing open if the cache DB is read-only

Testing

  • UV_CACHE_DIR=/tmp/uv-cache uv run pytest tests/test_code_context_extractor.py -q
  • UV_CACHE_DIR=/tmp/uv-cache uv run pytest tests/test_unit_test_discovery.py -q
  • UV_CACHE_DIR=/tmp/uv-cache uv run pytest tests/benchmarks/test_benchmark_code_extract_code_context.py::test_benchmark_extract -q
  • UV_CACHE_DIR=/tmp/uv-cache uv run pytest tests/benchmarks/test_benchmark_discover_unit_tests.py::test_benchmark_codeflash_test_discovery -q

@claude
Contributor

claude bot commented Mar 16, 2026

Claude finished @aseembits93's task in 7m 55s


PR Review Summary


Prek Checks

uv run prek run --from-ref origin/main — passed with no issues, no fixes needed.

Code Review

1. Naming convention violation — _get_jedi_project uses a leading underscore

File: codeflash/languages/python/context/code_context_extractor.py:60

@cache
def _get_jedi_project(project_root_path: str):  # type: ignore[no-untyped-def]

Per the repo's code style rules: "NEVER use leading underscores — Python has no true private functions, use public names." This should be get_jedi_project. The test at line 2573 also references this by name (code_context_extractor._get_jedi_project.cache_clear()).


2. Missing return type on cached function — type: ignore instead of annotation

File: code_context_extractor.py:60

def _get_jedi_project(project_root_path: str):  # type: ignore[no-untyped-def]

The # type: ignore[no-untyped-def] suppresses the mypy error rather than fixing it. The return type is jedi.Project but requires a conditional import. A cleaner approach:

def get_jedi_project(project_root_path: str) -> "jedi.Project":
    import jedi
    return jedi.Project(path=project_root_path)
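
The identity-caching behavior that functools.cache provides can be checked without Jedi installed; Project below is a stand-in for jedi.Project, not the real class:

```python
from functools import cache

class Project:
    """Stand-in for jedi.Project; avoids requiring Jedi here."""
    def __init__(self, path: str) -> None:
        self.path = path

@cache
def get_jedi_project(project_root_path: str) -> Project:
    # One Project instance per root path for the process lifetime.
    return Project(path=project_root_path)

a = get_jedi_project("/repo")
b = get_jedi_project("/repo")
print(a is b)  # True: repeated calls return the same cached instance
```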

3. Cache key missing helper_functions — potential correctness issue

File: code_context_extractor.py:268–274

cache_key = (
    file_path,
    frozenset(primary_qualified_names),
    frozenset(secondary_qualified_names),
    code_context_type,
    remove_docstrings,
)

helper_functions is used inside the function at line 321 (add_needed_imports_from_module(..., helper_functions=helper_functions)) but is not part of the cache key. If the same (file_path, names, type, docstrings) combo is ever called with different helper_functions, the cached result will be wrong.

In the current call graph this is safe because helper_functions is always derived from the same dict within one ContextProcessingCache lifetime. But it's a hidden correctness assumption that makes the code fragile. The project_root_path is also used (for relative_path computation) but not in the cache key — though since ContextProcessingCache is created per get_code_optimization_context call, this is fine in practice.
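
One way to close the gap, sketched with hypothetical names (Helper and make_cache_key are illustrative, not the PR's code), is to fold a stable digest of helper_functions into the key:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Helper:
    """Stand-in for the PR's helper-function records."""
    qualified_name: str

def make_cache_key(file_path, primary, secondary, ctx_type, remove_docstrings, helpers):
    # Adding a helpers component makes the key total over every
    # input that influences the cached result.
    return (
        file_path,
        frozenset(primary),
        frozenset(secondary),
        ctx_type,
        remove_docstrings,
        tuple(sorted(h.qualified_name for h in helpers)),
    )

k1 = make_cache_key("a.py", {"f"}, set(), "READ_ONLY", False, [Helper("m.g")])
k2 = make_cache_key("a.py", {"f"}, set(), "READ_ONLY", False, [])
print(k1 != k2)  # True: different helpers now map to different cache entries
```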

4. Uncovered error path — read-only DB fallback not tested

File: discover_unit_tests.py:186–188

except sqlite3.OperationalError as e:
    logger.debug(f"Failed to persist discovered test cache, disabling cache writes: {e}")
    self.writes_enabled = False

Coverage shows lines 186–188 are not covered. The PR description mentions "failing open if the cache DB is read-only" as a feature, but there's no test that triggers an OperationalError to verify the behavior (e.g., setting the DB file to read-only and asserting writes_enabled is False after flush()).
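
Such a test can be approximated without touching the real TestsCache by opening the database read-only via a SQLite URI; Cache below is a minimal stand-in for the fail-open pattern, not the project's class:

```python
import os
import sqlite3
import tempfile

class Cache:
    """Minimal stand-in for the PR's TestsCache fail-open behavior."""
    def __init__(self, db_uri: str) -> None:
        self.db_uri = db_uri
        self.pending = []
        self.writes_enabled = True

    def flush(self) -> None:
        if not self.writes_enabled or not self.pending:
            return
        try:
            con = sqlite3.connect(self.db_uri, uri=True)
            with con:
                con.executemany("INSERT INTO t VALUES (?)", self.pending)
            con.close()
            self.pending.clear()
        except sqlite3.OperationalError:
            # Fail open: stop writing, keep discovery running.
            self.writes_enabled = False

path = os.path.join(tempfile.mkdtemp(), "cache.db")
con = sqlite3.connect(path)
con.execute("CREATE TABLE t (x)")
con.commit()
con.close()

# Opening the DB with mode=ro simulates a read-only cache location,
# which works even when the test runs as root.
cache = Cache(f"file:{path}?mode=ro")
cache.pending.append(("row",))
cache.flush()
print(cache.writes_enabled)  # False: the failed flush disabled writes
```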

5. Missed optimization opportunity — two jedi.Project(...) calls not using the cache

File: code_context_extractor.py:847, 902

script = jedi.Script(script_code, project=jedi.Project(path=project_root_path))

These two calls (in extract_parameter_type_constructors and a related helper) still create new jedi.Project instances directly without going through _get_jedi_project. This is minor but inconsistent with the PR's optimization goal.

Positive notes

  • Batch commit logic in TestsCache.flush() is well-structured. Calling flush() both per-file (line 1113) and in close() provides correct durability semantics without over-committing.
  • all_defs optimization ([name for name in all_names if name.is_definition()] instead of a second script.get_names() call) is semantically correct: get_names(references=True) includes definitions by default in Jedi.
  • functions_to_optimize_by_name dict replacing the O(n) inner loop is a clean improvement.
  • ContextProcessingCache scoped per get_code_optimization_context call is the right granularity — avoids cross-call contamination.

Duplicate Detection

No duplicates detected. The _get_jedi_project cached factory is new and localized to code_context_extractor.py. The ContextProcessingCache dataclass is unique. The flush() / batched-write pattern in TestsCache has no equivalent elsewhere.

Test Coverage

File coverage:
codeflash/discovery/discover_unit_tests.py: 75%
codeflash/languages/python/context/code_context_extractor.py: 71%

New code paths with missing coverage:

  • discover_unit_tests.py:186–188 — the OperationalError / read-only DB fallback in flush() is not tested
  • The pruned_source_by_key and source_by_file cache hit paths in process_file_context (lines ~292–297, ~279–284) appear to be exercised implicitly via the Jedi project reuse test, but there's no dedicated test for the source/pruned-source caching layers

Last updated: 2026-03-16

@aseembits93
Contributor Author

@claude review it

@claude
Contributor

claude bot commented Mar 16, 2026

Claude finished @aseembits93's task in 3m 19s


PR Review: Improve context extraction and test discovery performance

  • Triage: assess PR scope
  • Read changed files
  • Code review
  • Post summary

Overall Assessment

The PR achieves its stated goals with good test coverage. A few correctness concerns and design questions are worth addressing before merge.


codeflash/languages/python/context/code_context_extractor.py

🔴 Correctness: get_function_sources_from_jedi — recursive calls now included as helpers

Old code used belongs_to_function_qualified(ref, qualified_function_name) which has an explicit guard:

# belongs_to_function_qualified line 1586-1590
if get_qualified_name(name.module_name, name.full_name) == qualified_function_name:
    # Handles function definition and recursive function calls
    return False

This excluded recursive self-calls from the "refs to follow" list.

The new grouping approach at code_context_extractor.py:471-478 has no such exclusion. For a recursive function f, the ref f inside f has parent().full_name == "f", so it lands in refs_by_qualified_function["f"]. When followed via goto(), it resolves back to f's definition. The not belongs_to_function_qualified(definition, ...) check at line 505 evaluates f_definition — which is NOT inside f — so belongs_to_function_qualified returns False, meaning the not passes and f gets added as its own "helper."

For the common case this is harmless (deduplicated), but it could cause subtle diffs in context extraction for recursive functions. Worth verifying with a test.
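
The missing exclusion can be illustrated with a tiny predicate; these names are hypothetical, not the PR's code:

```python
# Hypothetical guard mirroring the old belongs_to_function_qualified
# exclusion: skip references that resolve back to the target itself.
def should_follow(definition_qualified_name: str, target_qualified_name: str) -> bool:
    # Recursive self-references resolve to the target's own definition;
    # following them would add the function as its own "helper".
    return definition_qualified_name != target_qualified_name

print(should_follow("m.f", "m.f"))  # False: recursive self-call, skip
print(should_follow("m.g", "m.f"))  # True: genuine helper, follow
```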


🟡 Design: _get_jedi_project is a process-lifetime cache

@cache
def _get_jedi_project(project_root_path: str):
    import jedi
    return jedi.Project(path=project_root_path)

functools.cache is lru_cache(maxsize=None): entries live for the whole process. This is fine within a single optimization run, but if codeflash ever processes multiple projects in the same interpreter (daemon, LSP mode), or if source files change mid-session, stale Jedi state can persist. The test correctly calls cache_clear() before each test run, but production code never does. Consider documenting this limitation or using a scoped cache (e.g., store it on ContextProcessingCache or clear it explicitly at optimizer start).
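
The process-lifetime behavior and the scoped-clear alternative can be demonstrated without Jedi; get_project is a stand-in for the cached factory:

```python
from functools import cache

@cache
def get_project(root: str) -> dict:
    # Stand-in for the cached jedi.Project factory.
    return {"root": root}

get_project("/repo")
print(get_project.cache_info().currsize)  # 1

# A scoped alternative: clear at optimizer start (or when source files
# change) instead of letting entries live for the whole process.
get_project.cache_clear()
print(get_project.cache_info().currsize)  # 0
```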


🟡 Type annotation: jedi_project: object | None

At code_context_extractor.py:459 and get_function_to_optimize_as_function_source:417, the jedi_project parameter is typed as object | None. Since jedi is lazily imported, use a string forward ref or TYPE_CHECKING guard:

if TYPE_CHECKING:
    import jedi
...
jedi_project: "jedi.Project | None" = None

At minimum, a comment explaining why object is used would help.


🟢 ContextProcessingCache.pruned_source_by_key — key correctness

The key (file_path, frozenset(all_names)) where all_names = primary | secondary is correct: pruning only depends on which names to preserve. Different (primary, secondary) splits with the same union will correctly share the cached pruned source while having distinct code_string_by_key entries. Good.


🟢 process_file_context cache — None result caching on ValueError

Caching None on line 306 and 332 correctly prevents re-running expensive parsing on repeated failures. The pruned_source_by_key may or may not be populated at that point depending on where the exception is thrown, but that's safe.


codeflash/discovery/discover_unit_tests.py

🟡 writes_enabled = False silently disables all subsequent files

When a flush fails (e.g., DB locked), writes_enabled is set to False at discover_unit_tests.py:161, which silently skips all remaining test files for the rest of the session. This is intentional "fail open" behavior, but it means a transient lock can result in an entirely empty cache for a run. Consider resetting writes_enabled = True on the next flush() call (i.e., retry once per file rather than failing all-or-nothing for the whole session).
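
A hypothetical retry-per-flush variant might look like this; the injectable connect callable and all names here are illustrative, not the PR's code:

```python
import sqlite3

class Cache:
    """Retries writes on every flush instead of disabling them
    for the whole session after one failure."""
    def __init__(self, connect):
        self.connect = connect  # injected for testability
        self.pending = []

    def flush(self) -> bool:
        if not self.pending:
            return True
        try:
            con = self.connect()
            with con:
                con.executemany("INSERT INTO t VALUES (?)", self.pending)
            con.close()
            self.pending.clear()
            return True
        except sqlite3.OperationalError:
            # Keep rows; a later flush may succeed once the lock clears.
            return False

calls = {"n": 0}
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t (x)")

def connect():
    calls["n"] += 1
    if calls["n"] == 1:
        raise sqlite3.OperationalError("database is locked")
    return db

cache = Cache(connect)
cache.pending.append(("row",))
print(cache.flush())  # False: transient failure, rows retained
print(cache.flush())  # True: retry succeeds on the next flush
```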


🟢 all_defs derived from all_names

all_names = script.get_names(all_scopes=True, references=True)
all_defs = [name for name in all_names if name.is_definition()]

Since script.get_names defaults definitions=True, all_names already contains both definitions and references. Filtering by is_definition() is equivalent to the old separate call with definitions=True, references=False. One saved Jedi scan. ✓


🟢 Batch flush placement

tests_cache.flush() at discover_unit_tests.py:1110 is called after each test file's loop body, which provides a good balance: rows are batched within a file, and the DB is committed between files. The close() calling flush() handles the last file. ✓


🟢 functions_to_optimize_by_name dict

Pre-building the dict at line 872-875 changes the fallback lookup from O(|functions_to_optimize|) per unresolved name to O(1). The redundant if func_to_opt.function_name == name.name check was correctly removed since dict.get(name.name, []) already does this match. ✓
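
The pre-grouping pattern in isolation; Func and the sample data are illustrative:

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Func:
    """Stand-in for the PR's function-to-optimize records."""
    function_name: str

functions_to_optimize = [Func("parse"), Func("render"), Func("parse")]

# Build once in O(n); every fallback lookup afterwards is O(1).
functions_to_optimize_by_name: dict[str, list] = defaultdict(list)
for func in functions_to_optimize:
    functions_to_optimize_by_name[func.function_name].append(func)

print(len(functions_to_optimize_by_name.get("parse", [])))  # 2
print(functions_to_optimize_by_name.get("missing", []))     # []
```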


Tests

  • test_code_context_reuses_target_file_reads asserts target_read_count <= 2 — the <= 2 bound is unexplained. A comment clarifying what cases allow a second read would help.
  • test_tests_cache_batches_commits correctly verifies zero commits before flush() and exactly one after. ✓
  • test_code_context_reuses_cached_jedi_project correctly clears the module-level cache before the test (see concern above about production code never doing this).

Summary

Severity summary:
  • 🔴 Recursive functions may now appear as their own helpers in get_function_sources_from_jedi
  • 🟡 @cache on _get_jedi_project is a process-lifetime singleton with no eviction
  • 🟡 writes_enabled = False on first flush failure kills all subsequent cache writes
  • 🟡 jedi_project: object | None annotation hides the actual Jedi type
  • 🟢 All other changes are correct and well-structured

@KRRT7
Collaborator

KRRT7 commented Mar 16, 2026

that UV_CACHE_DIR=/tmp/uv-cache looks suspicious to me

@aseembits93
Contributor Author

@KRRT7 it's an experimental PR which I opened via the codex desktop app, i'm suspicious too. surprised though that it passes our CI (except the windows test)

@aseembits93
Contributor Author

@claude fix the failing windows unit tests

@claude
Contributor

claude bot commented Mar 17, 2026

Claude finished @aseembits93's task in 5m 23s


Fixing failing Windows unit tests

  • Investigate CI status and find failing Windows tests
  • Read the relevant changed files
  • Identify root cause of Windows failures
  • Fix the issues
  • Run prek (clean — no issues)
  • Commit and push

Root Cause

The new project_root_path = project_root_path.resolve() at line 112 of code_context_extractor.py expands Windows 8.3 short names (e.g. RUNNER~1 becomes runneradmin). However, file_path.resolve() at line 155 does not consistently expand 8.3 short names for files on the GitHub Windows runner (a known Windows quirk). This mismatch caused:

ValueError: 'C:\Users\RUNNER~1\AppData\Local\Temp\tmpgspi0fi0.py' is not in the subpath of 'C:\Users\runneradmin\AppData\Local\Temp'

The ValueError propagated to support.py:330 which catches all exceptions and returns CodeContext(target_code="", ...), making the tests fail.

Fix (commit 7944e43e)

Wrapped the cosmetic "ensure target file is first" reordering block (lines 154–159) in a try/except ValueError, matching the identical existing pattern at process_file_context lines 323–326. The reordering is purely cosmetic; skipping it when paths can't be compared has no correctness impact.

# Before
target_relative = function_to_optimize.file_path.resolve().relative_to(project_root_path.resolve())
target_blocks = [...]
...

# After
try:
    target_relative = function_to_optimize.file_path.resolve().relative_to(project_root_path)
    target_blocks = [...]
    ...
except ValueError:
    pass

Also removed the redundant .resolve() call on project_root_path in this expression since it was already resolved at line 112.

Note: the Java timing test (CV=5.17% vs <5% threshold) is a pre-existing flaky test unrelated to this PR.

…dering

On Windows, Path.resolve() expands 8.3 short names for directories but not
always for files. After project_root_path.resolve() converts RUNNER~1 to
runneradmin, file_path.resolve() may still return the 8.3 form, causing
relative_to() to raise ValueError and silently returning empty target_code.

Wrap the cosmetic reordering block in try/except to match the existing pattern
in process_file_context, preventing the crash without any correctness loss.

Co-authored-by: Aseem Saxena <aseembits93@users.noreply.github.com>
@KRRT7
Collaborator

KRRT7 commented Mar 17, 2026

@claude what's the difference between the implementations here vs the implementations in #1660 ultra think

@claude
Copy link
Copy Markdown
Contributor

claude bot commented Mar 17, 2026

Claude finished @KRRT7's task in 2m 26s


PR #1846 vs PR #1660: Deep Comparison

  • Gather context on both PRs
  • Read relevant changed files
  • Deep comparison of implementations
  • Post analysis

TL;DR

These two PRs attack the same performance bottleneck (redundant work in code_context_extractor.py and discover_unit_tests.py) but from completely different angles — and they are currently on parallel tracks (neither is based on the other). Merging #1846 first and then landing #1660 would mean #1660 partially undoes #1846's work.


Scope

  • Focus: PR #1846 is narrow (caching/memoization in 2 files); PR #1660 is broad (ranking overhaul plus context refactor across 15+ files)
  • Lines changed: PR #1846 is +319 / -77; PR #1660 is +2666 / -1987
  • Files touched: PR #1846 touches 4; PR #1660 touches 21
  • Strategy: PR #1846 adds a caching layer on top of the existing architecture; PR #1660 restructures the architecture to eliminate the redundant work structurally

code_context_extractor.py — Same Problem, Different Solutions

Root cause: get_code_optimization_context calls extract_code_markdown_context_from_files (and process_file_context inside it) up to 4–6 times with the same file paths, causing the same files to be read from disk and re-parsed (CST) repeatedly.

PR #1846 approach: ContextProcessingCache

A @dataclass cache object is threaded through the call chain. On each process_file_context call it checks:

  1. Is file.read_text() result cached? → source_by_file: dict[Path, str]
  2. Is the remove_unused_definitions_by_function_names() result cached for this (file, names) combo? → pruned_source_by_key
  3. Is the full CodeString result cached for this (file, primary_names, secondary_names, type, docstrings) combo? → code_string_by_key

The architecture (4 separate passes over files) stays unchanged. The caching masks the redundancy.

# PR #1846 — cache miss pays full cost, cache hit skips file I/O + CST parse
if processing_cache is not None and cache_key in processing_cache.code_string_by_key:
    return processing_cache.code_string_by_key[cache_key]
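
A minimal sketch of such a cache object, assuming the three dict fields named in the review; the get_source helper and the demo are illustrative, not the PR's code:

```python
from dataclasses import dataclass, field
from pathlib import Path
import tempfile

@dataclass
class ContextProcessingCache:
    """Per-call cache: one instance per get_code_optimization_context call."""
    source_by_file: dict[Path, str] = field(default_factory=dict)
    pruned_source_by_key: dict[tuple, str] = field(default_factory=dict)
    code_string_by_key: dict[tuple, object] = field(default_factory=dict)

    def get_source(self, path: Path) -> str:
        # First access reads from disk; later accesses hit the dict.
        if path not in self.source_by_file:
            self.source_by_file[path] = path.read_text()
        return self.source_by_file[path]

with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write("def f(): pass\n")
    target = Path(f.name)

cache = ContextProcessingCache()
first = cache.get_source(target)
second = cache.get_source(target)
print(first == second, len(cache.source_by_file))  # True 1
```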

PR #1660 approach: extract_all_contexts_from_files + threaded cst.Module

Per commit message: "Parse each file once instead of up to 16 times by: Making remove_unused_definitions_by_function_names accept/return cst.Module, threading the parsed Module through process_file_context, adding extract_all_contexts_from_files that processes all 4 context types (READ_WRITABLE, READ_ONLY, HASHING, TESTGEN) in a single per-file pass."

The redundancy is eliminated structurally — the 4 separate context type calls are replaced with one function that produces all 4 results in one pass. cst.Module objects are threaded through so CST deserialization runs once per file, not once per context type per file.

PR #1846: 4 passes × N files = 4N file reads (deduped by cache to N)
PR #1660: 1 pass × N files = N file reads (structurally)

Key differences

  • Mechanism: PR #1846 uses a memoization cache (runtime deduplication); PR #1660 is a structural refactor (architectural deduplication)
  • Correctness risk: PR #1846's cache key might miss a relevant dimension (e.g., helper_functions not in the key); PR #1660's function signature changes propagate widely
  • Code complexity: PR #1846 adds 3 dicts plus cache-check boilerplate at every call site; PR #1660 removes code overall (-1232 lines net in code_context_extractor.py)
  • Cross-call safety: PR #1846 scopes ContextProcessingCache per get_code_optimization_context call (safe); PR #1660 passes cst.Module directly (no cross-call leakage)
  • Jedi Project caching: PR #1846 adds @cache on _get_jedi_project (process-lifetime singleton); PR #1660 has no equivalent (not part of its context extraction refactor)

discover_unit_tests.py — Different Purposes, No Overlap

  • What changed: PR #1846 batches SQLite writes (flush()), adds an O(1) name-lookup dict, and eliminates one script.get_names() call; PR #1660 adds an existing_unit_test_count() function (22 lines, 0 deleted)
  • Where: PR #1846 changes process_test_files / TestsCache; PR #1660 adds a new standalone function
  • Purpose: PR #1846 reduces I/O overhead during test discovery; PR #1660 counts tests per function for a ranking boost
  • Conflict: none; they touch completely different parts of the file

These changes do not conflict. PR #1660's existing_unit_test_count() sits alongside the TestsCache class without touching it. PR #1846's batching of TestsCache.insert_test() is independent.


Merge Order Implications

If #1846 merges first, then #1660:

If #1660 merges first, then #1846:


Summary Verdict

PR #1846 is a lightweight, low-risk, purely additive performance improvement that works on the current architecture. It provides real wins (fewer disk reads, fewer Jedi Project allocations, fewer SQLite round-trips) with minimal churn.

PR #1660 is a deep architectural investment — it refactors the context extraction system to be structurally correct (single-pass, no redundant parsing), alongside a large ranking overhaul. Once merged, it supersedes most of PR #1846's caching in code_context_extractor.py.

The two PRs are not redundant today because they target different bottlenecks, but they will become architecturally incompatible in code_context_extractor.py if both are in-flight and merge independently. The discover_unit_tests.py changes in both PRs are independent and can both land without conflict.



@KRRT7
Collaborator

KRRT7 commented Mar 17, 2026

@claude #1660 i've merged it into main, update your findings and delete your old reviews

@claude
Copy link
Copy Markdown
Contributor

claude bot commented Mar 17, 2026

Claude finished @KRRT7's task in 2m 57s




Updated Review: Post-#1660 Merge


Status: Needs Rebase

With #1660 now in main, this PR's code_context_extractor.py changes are based on a superseded architecture and will not merge cleanly. The two branches diverged from 4e55159d and made conflicting structural changes to the same file.

  • code_context_extractor.py lines: 1,598 on this PR's HEAD vs 1,736 on main (post-#1660)
  • Core architecture: this PR keeps process_file_context + extract_code_markdown_context_from_files (4 separate passes); main uses extract_all_contexts_from_files (1 pass, all 4 context types)
  • File parse caching: this PR uses ContextProcessingCache (runtime memoization); main is structural, each file parsed once by design

What Became Obsolete After #1660

ContextProcessingCache + its threading through process_file_context / extract_code_markdown_context_from_files

These functions no longer exist in main in the same form. Main's extract_all_contexts_from_files already achieves the deduplication structurally — each file is parsed once and produces all 4 context type results in one pass. The caching layer this PR added is now solving a problem that #1660 eliminated architecturally.

jedi_project parameter plumbing into get_function_sources_from_jedi / get_function_to_optimize_as_function_source

The plumbing needs to be redone against the new architecture, but the concept still applies (see below).


What Remains Valuable and Should Be Ported

1. _get_jedi_project process-level cache — still needed

Main still calls jedi.Project(path=project_root_path) 4 times without caching:

# code_context_extractor.py in main:
line 470: jedi.Script(path=..., project=jedi.Project(path=project_root_path))  # get_function_to_optimize_as_function_source
line 508: jedi.Script(path=file_path, project=jedi.Project(path=project_root_path))  # get_function_sources_from_jedi loop
line 1106: jedi.Script(script_code, project=jedi.Project(path=project_root_path))  # _resolve_imported_class_reference
line 1266: jedi.Script(script_code, project=jedi.Project(path=project_root_path))  # extract_parameter_type_constructors

The _get_jedi_project cache from this PR is still a clean win here.

2. Windows 8.3 path fix — still missing in main

The fix from commit 7944e43e (wrapping the relative_to() call in get_code_optimization_context in try/except ValueError) was applied on this branch only. Main's get_code_optimization_context at line ~140 still has the unguarded call:

# main — still unfixed:
target_relative = function_to_optimize.file_path.resolve().relative_to(project_root_path.resolve())

This Windows bug is live in main right now.

3. discover_unit_tests.py changes — fully clean, no conflicts

#1660 only added the new existing_unit_test_count function (line 39 in main). All of this PR's discover_unit_tests.py changes touch different parts of the file and can be rebased without conflict:

All four changes are clean:
  • flush() / batched executemany in TestsCache ✅
  • writes_enabled + OperationalError fallback ✅
  • functions_to_optimize_by_name O(1) lookup ✅
  • all_defs derived from all_names (saves 1 Jedi call) ✅

4. New tests — partially salvageable

  • test_unit_test_discovery.py additions: all applicable, no conflicts
  • test_code_context_extractor.py additions: need to be rewritten against the new extract_all_contexts_from_files / re_extract_from_cache API

Recommended Path Forward

  1. Immediately: Cherry-pick the Windows try/except ValueError fix to main as a standalone patch (it's a real bug currently live in production)
  2. For this PR: Rebase onto current main, then:
    • Drop ContextProcessingCache (superseded by extract_all_contexts_from_files)
    • Port get_jedi_project cache to main's 4 call sites
    • Keep all discover_unit_tests.py changes as-is
    • Rewrite the test_code_context_extractor.py additions against the new API

Previous review comments above are now superseded by this analysis.

@KRRT7
Collaborator

KRRT7 commented Mar 17, 2026

@aseembits93 sorry about this, I had been working on a similar thing for the call + reference graph so I will close this

@KRRT7 KRRT7 closed this Mar 17, 2026
@KRRT7 KRRT7 deleted the codex/perf-hotspots-context-and-discovery branch March 17, 2026 05:30
KRRT7 added a commit that referenced this pull request Mar 17, 2026
…ve_to bug

Port valuable improvements from #1846 that remain applicable after #1660:
- Cache jedi.Project instances via @cache to avoid recreating across 5 call sites
- Fix unguarded relative_to() in get_code_optimization_context (Windows 8.3 paths)
- Pre-group references by parent function in get_function_sources_from_jedi for O(1) lookup
- Batch TestsCache writes with flush() + executemany instead of per-row commit
- Gracefully disable cache writes on sqlite3.OperationalError
- Build functions_to_optimize_by_name dict for O(1) fallback lookup in process_test_files
- Derive all_defs from all_names via is_definition() to save a redundant Jedi call