feat: improve function ranking with reference graph and test-based boosting #1660
Conversation
- Add existing_unit_test_count() with parametrized test deduplication
- Stable-sort ranked functions so tested ones come first
- Enable reference graph resolver (was disabled) for non-CI runs
- Add per-function logging with ref count and test count
- Auto-upgrade top N functions to high effort when user hasn't set --effort
- Add CallGraph model with traversal (BFS, topological sort, subgraph)
- Add get_call_graph() to DependencyResolver protocol and ReferenceGraph
- Refactor get_callees() to delegate through get_call_graph()

CF-1660
Force-pushed from 3d7dd8e to a90cda2
The optimization inlines `qualified_name_with_modules_from_root` and wraps the expensive `module_name_from_file_path` call—which performs `Path.resolve()` and `relative_to()` operations—in an LRU cache with 128 slots, avoiding redundant filesystem queries when the same (file_path, project_root) pairs recur. Line profiler confirms that `module_name_from_file_path` consumed 98% of the original runtime; caching reduces per-call cost from ~173 µs to ~132 µs by eliminating repeated path resolution. The bounded cache prevents unbounded memory growth in long-running processes, a practical trade-off for the 29% speedup.
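The caching pattern can be sketched as follows. This is a hedged reconstruction, not the actual codeflash implementation: the real signature of `module_name_from_file_path` may differ, and the body here only assumes it maps a file path plus project root to a dotted module name.

```python
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=128)  # bounded cache: no unbounded growth in long-lived processes
def module_name_from_file_path(file_path: Path, project_root: Path) -> str:
    # Path.resolve() and relative_to() are the expensive operations being cached;
    # a repeated (file_path, project_root) pair returns the memoized result
    # without touching the filesystem again.
    relative = file_path.resolve().relative_to(project_root.resolve())
    return ".".join(relative.with_suffix("").parts)
```

Repeated calls with the same argument pair hit the cache, which `module_name_from_file_path.cache_info()` makes visible.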
- Preserve callee_metadata in augment_with_trace
- Use count-based test boost in trace ranking path (was binary)
- Rename misleading "refs" label to "callees"
- Hoist existing_unit_test_count import out of loop
- Warn when topological_order drops cyclic nodes
- Cache nodes property in CallGraph
…2026-03-15T01.04.08 ⚡️ Speed up function `existing_unit_test_count` by 30% in PR #1660 (`unstructured-inference`)
…rovided review_generated_tests and repair_generated_tests called add_language_metadata without language_version, sending python_version=None to the API, which rejects the request with "Python version is required". Now falls back to current_language_support().
…paths
- Build CallGraph adjacency eagerly in __post_init__ instead of lazily, eliminating per-call None checks in callers_of/callees_of hot paths
- Skip file read+hash in ensure_file_indexed/build_index when the file is already in the in-memory indexed_file_hashes cache
- Cache Path.resolve() results in ReferenceGraph to avoid repeated filesystem syscalls for the same paths
- Reuse callee_counts from rank_by_dependency_count in the optimizer loop instead of recomputing
…ching
- Use try/except KeyError instead of .get() in file_to_path cache (faster on hot path when key is usually present)
- Use os.path.basename() instead of Path().name in get_optimized_code_for_module to avoid constructing Path objects from strings
- Disable PTH119 ruff rule to allow os.path.basename for performance
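The try/except lookup pattern in a small self-contained sketch (the names here are illustrative stand-ins, not the real file_to_path code):

```python
cache: dict[str, str] = {}


def lookup(key: str) -> str:
    # On a hit (the common case) this is a single dict access with no extra
    # method-call overhead; the exception path only runs on rare misses,
    # where the cost of raising/catching KeyError is amortized away.
    try:
        return cache[key]
    except KeyError:
        value = cache[key] = key.upper()  # stand-in for the real computation
        return value
```

The trade-off is well known: `dict.get()` is branch-free but always pays a method call, while try/except is nearly free on hits and expensive only on misses.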
- Split descendants/ancestors into max_depth=None fast path (no tuple packing) and depth-limited path
- Cache forward/reverse dicts locally to avoid property access per iteration
- Inline dict.get() instead of calling callees_of/callers_of methods
- Cache nodes set in topological_order to avoid repeated property access
The in-memory cache check skipped re-reading files, so on-disk changes between calls were not detected. Keep the fast path only in build_index (one-time batch initialization) where files don't change mid-operation.
- Use binary tier sort (has tests vs no tests) instead of full count sort key, preserving stable sort order within tiers
- Declare _cached_callee_counts in __init__ instead of using getattr
- Pre-compute test counts before the optimization loop to avoid redundant filtering on each iteration
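The binary tier sort relies on Python's sort stability. A toy illustration, with hypothetical `(name, test_count)` tuples assumed to arrive already in dependency-rank order:

```python
ranked = [("a", 0), ("b", 3), ("d", 1), ("c", 0)]

# The boolean key splits the list into two tiers: False (has tests) sorts
# before True (no tests). Within each tier, Python's stable sort preserves
# the incoming dependency-based order.
tiered = sorted(ranked, key=lambda item: item[1] == 0)
print(tiered)  # [('b', 3), ('d', 1), ('a', 0), ('c', 0)]
```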
codeflash/models/call_graph.py
Outdated
```python
        queue: deque[FunctionNode] = deque([node])
        while queue:
            current = queue.popleft()
            for edge in reverse_map.get(current, []):
                if edge.caller not in visited:
                    visited.add(edge.caller)
                    queue.append(edge.caller)
    else:
        depth_queue: deque[tuple[FunctionNode, int]] = deque([(node, 0)])
        while depth_queue:
            current, depth = depth_queue.popleft()
```
⚡️Codeflash found 34% (0.34x) speedup for CallGraph.ancestors in codeflash/models/call_graph.py
⏱️ Runtime : 519 microseconds → 386 microseconds (best of 56 runs)
📝 Explanation and details
The optimization replaces deque with list for both queue data structures, switching from FIFO (popleft()) to LIFO (pop()) traversal. This delivers a 34% runtime improvement: per the profiler, the dominant saving is eliminating deque object construction overhead (17.1 µs → 4.2 µs for initialization), while the pop operation's total time stays roughly comparable between the two runs (215.7 µs vs 232.2 µs; 207.2 vs 223.0 ns/hit). The switch from breadth-first to depth-first traversal order does not affect correctness, since ancestors() returns an unordered set, and all tests including large-scale chains (1000 nodes) show consistent or improved performance with no regressions.
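The rewrite in a minimal, self-contained sketch. The shapes are simplified (a plain reverse adjacency dict of caller names rather than the real CallGraph edge objects), but the core point carries over: because the result is an unordered set, a list used as a LIFO stack is interchangeable with a deque used as a FIFO queue, and avoids deque's construction and popleft overhead.

```python
def ancestors(reverse_map: dict, node) -> set:
    visited = set()
    stack = [node]              # was: deque([node]) with popleft()
    while stack:
        current = stack.pop()   # LIFO pop from the end: cheap on a list
        for caller in reverse_map.get(current, []):
            if caller not in visited:
                visited.add(caller)
                stack.append(caller)
    return visited
```

Since `visited` guards against revisits, the traversal also terminates on cyclic graphs.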
✅ Correctness verification report:
| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | ✅ 39 Passed |
| 🌀 Generated Regression Tests | ✅ 28 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
⚙️ Click to see Existing Unit Tests
| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| `test_call_graph.py::TestAncestors.test_empty_for_root` | 3.43μs | 2.93μs | 17.1%✅ |
| `test_call_graph.py::TestAncestors.test_max_depth_limits_traversal` | 4.96μs | 4.29μs | 15.6%✅ |
| `test_call_graph.py::TestAncestors.test_transitive_ancestors` | 5.92μs | 5.39μs | 9.85%✅ |
🌀 Click to see Generated Regression Tests
import pytest # used for our unit tests
from codeflash.models.call_graph import CallEdge, CallGraph, FunctionNode
from pathlib import Path
def test_empty_edges_no_ancestors():
cg = CallGraph(edges=[])
node = FunctionNode(file_path=Path("test.py"), qualified_name="unused")
assert cg.ancestors(node) == set() # 4.74μs -> 4.11μs (15.4% faster)
assert cg.ancestors(node, max_depth=0) == set() # 1.31μs -> 1.07μs (22.6% faster)
def test_single_edge_basic():
a = FunctionNode(file_path=Path("test.py"), qualified_name="A")
b = FunctionNode(file_path=Path("test.py"), qualified_name="B")
edge = CallEdge(caller=a, callee=b, is_cross_file=False)
cg = CallGraph(edges=[edge])
ancestors_of_b = cg.ancestors(b) # 2.77μs -> 2.06μs (34.4% faster)
assert ancestors_of_b == {a}
assert cg.ancestors(a) == set() # 922ns -> 721ns (27.9% faster)
def test_multiple_parents():
a = FunctionNode(file_path=Path("test.py"), qualified_name="A")
b = FunctionNode(file_path=Path("test.py"), qualified_name="B")
c = FunctionNode(file_path=Path("test.py"), qualified_name="C")
edges = [CallEdge(caller=a, callee=c, is_cross_file=False), CallEdge(caller=b, callee=c, is_cross_file=False)]
cg = CallGraph(edges=edges)
result = cg.ancestors(c) # 3.26μs -> 2.94μs (10.9% faster)
assert result == {a, b}
assert cg.ancestors(a) == set() # 852ns -> 692ns (23.1% faster)
assert cg.ancestors(b) == set() # 681ns -> 511ns (33.3% faster)
def test_transitive_and_max_depth():
a = FunctionNode(file_path=Path("test.py"), qualified_name="A")
b = FunctionNode(file_path=Path("test.py"), qualified_name="B")
c = FunctionNode(file_path=Path("test.py"), qualified_name="C")
d = FunctionNode(file_path=Path("test.py"), qualified_name="D")
edges = [
CallEdge(caller=a, callee=b, is_cross_file=False),
CallEdge(caller=b, callee=c, is_cross_file=False),
CallEdge(caller=c, callee=d, is_cross_file=False),
]
cg = CallGraph(edges=edges)
assert cg.ancestors(d) == {a, b, c} # 3.55μs -> 2.87μs (23.8% faster)
assert cg.ancestors(d, max_depth=1) == {c} # 2.07μs -> 1.83μs (13.1% faster)
assert cg.ancestors(d, max_depth=2) == {b, c} # 1.72μs -> 1.49μs (15.4% faster)
assert cg.ancestors(d, max_depth=0) == set() # 732ns -> 531ns (37.9% faster)
def test_nonexistent_node_returns_empty():
a = FunctionNode(file_path=Path("test.py"), qualified_name="A")
b = FunctionNode(file_path=Path("test.py"), qualified_name="B")
cg = CallGraph(edges=[CallEdge(caller=a, callee=b, is_cross_file=False)])
outsider = FunctionNode(file_path=Path("other.py"), qualified_name="outsider")
assert outsider not in cg._nodes # 3.43μs -> 2.92μs (17.5% faster)
assert cg.ancestors(outsider) == set() # 1.32μs -> 1.23μs (7.22% faster)
assert cg.ancestors(outsider, max_depth=10) == set()
def test_self_call_includes_node():
a = FunctionNode(file_path=Path("test.py"), qualified_name="A")
cg = CallGraph(edges=[CallEdge(caller=a, callee=a, is_cross_file=False)])
assert cg.ancestors(a) == {a} # 2.67μs -> 2.08μs (28.4% faster)
assert cg.ancestors(a, max_depth=0) == set() # 1.15μs -> 942ns (22.3% faster)
assert cg.ancestors(a, max_depth=1) == {a} # 1.58μs -> 1.35μs (17.1% faster)
def test_cycle_does_not_infinite_loop():
a = FunctionNode(file_path=Path("test.py"), qualified_name="A")
b = FunctionNode(file_path=Path("test.py"), qualified_name="B")
c = FunctionNode(file_path=Path("test.py"), qualified_name="C")
edges = [
CallEdge(caller=a, callee=b, is_cross_file=False),
CallEdge(caller=b, callee=c, is_cross_file=False),
CallEdge(caller=c, callee=a, is_cross_file=False),
]
cg = CallGraph(edges=edges)
ancestors_a = cg.ancestors(a) # 3.70μs -> 2.96μs (25.1% faster)
assert ancestors_a == {b, c}
assert cg.ancestors(b) == {a, c} # 2.21μs -> 1.88μs (17.6% faster)
assert cg.ancestors(c) == {a, b} # 1.92μs -> 1.62μs (18.5% faster)
assert cg.ancestors(a, max_depth=1) == {c} # 1.79μs -> 1.53μs (17.0% faster)
def test_special_character_node_names():
n1 = FunctionNode(file_path=Path("test.py"), qualified_name="func$1")
n2 = FunctionNode(file_path=Path("test.py"), qualified_name="func-2")
n3 = FunctionNode(file_path=Path("test.py"), qualified_name="функция3")
edges = [CallEdge(caller=n1, callee=n2, is_cross_file=False), CallEdge(caller=n2, callee=n3, is_cross_file=False)]
cg = CallGraph(edges=edges)
assert cg.ancestors(n3) == {n1, n2} # 2.92μs -> 2.19μs (33.3% faster)
assert cg.ancestors(n3, max_depth=1) == {n2} # 1.90μs -> 1.62μs (17.3% faster)
def test_large_scale_chain_full_ancestors():
size = 1000
nodes = [FunctionNode(file_path=Path("test.py"), qualified_name=f"n{i}") for i in range(size)]
edges = [CallEdge(caller=nodes[i], callee=nodes[i+1], is_cross_file=False) for i in range(size - 1)]
cg = CallGraph(edges=edges)
last = nodes[-1]
ancestors_of_last = cg.ancestors(last) # 435μs -> 317μs (37.4% faster)
assert len(ancestors_of_last) == size - 1
assert nodes[0] in ancestors_of_last
assert nodes[size // 2] in ancestors_of_last
def test_large_scale_chain_max_depth_limit():
size = 1000
nodes = [FunctionNode(file_path=Path("test.py"), qualified_name=f"m{i}") for i in range(size)]
edges = [CallEdge(caller=nodes[i], callee=nodes[i+1], is_cross_file=False) for i in range(size - 1)]
cg = CallGraph(edges=edges)
last = nodes[-1]
limited = cg.ancestors(last, max_depth=10) # 8.05μs -> 6.55μs (22.9% faster)
assert len(limited) == 10
assert nodes[-2] in limited
assert nodes[-11] in limited

from collections import deque
from dataclasses import dataclass
# imports
import pytest
from codeflash.models.call_graph import CallEdge, FunctionNode, CallGraph
from pathlib import Path
def test_ancestors_no_ancestors():
node_a = FunctionNode(file_path=Path("a.py"), qualified_name="A")
node_b = FunctionNode(file_path=Path("b.py"), qualified_name="B")
edge = CallEdge(caller=node_a, callee=node_b, is_cross_file=False)
graph = CallGraph(edges=[edge])
result = graph.ancestors(node_b) # 2.95μs -> 2.10μs (40.0% faster)
assert node_a in result
assert len(result) == 1
def test_ancestors_single_node_no_edges():
node_a = FunctionNode(file_path=Path("a.py"), qualified_name="A")
graph = CallGraph(edges=[])
result = graph.ancestors(node_a) # 4.85μs -> 4.11μs (18.1% faster)
assert result == set()
def test_ancestors_self_cycle():
node_a = FunctionNode(file_path=Path("a.py"), qualified_name="A")
edges = [
CallEdge(caller=node_a, callee=node_a, is_cross_file=False),
]
graph = CallGraph(edges=edges)
result = graph.ancestors(node_a) # 2.75μs -> 2.18μs (25.7% faster)
assert result == set()
def test_ancestors_max_depth_negative():
node_a = FunctionNode(file_path=Path("a.py"), qualified_name="A")
node_b = FunctionNode(file_path=Path("b.py"), qualified_name="B")
node_c = FunctionNode(file_path=Path("c.py"), qualified_name="C")
edges = [
CallEdge(caller=node_a, callee=node_b, is_cross_file=False),
CallEdge(caller=node_b, callee=node_c, is_cross_file=False),
]
graph = CallGraph(edges=edges)
result = graph.ancestors(node_c, max_depth=1) # 3.13μs -> 2.39μs (30.6% faster)
assert len(result) == 1
assert node_b in result

To test or edit this optimization locally: git merge codeflash/optimize-pr1660-2026-03-15T04.53.15
```diff
-        queue: deque[FunctionNode] = deque([node])
+        queue: list[FunctionNode] = [node]
         while queue:
-            current = queue.popleft()
+            current = queue.pop()
             for edge in reverse_map.get(current, []):
                 if edge.caller not in visited:
                     visited.add(edge.caller)
                     queue.append(edge.caller)
     else:
-        depth_queue: deque[tuple[FunctionNode, int]] = deque([(node, 0)])
+        depth_queue: list[tuple[FunctionNode, int]] = [(node, 0)]
         while depth_queue:
-            current, depth = depth_queue.popleft()
+            current, depth = depth_queue.pop()
```
This PR is now faster! 🚀 @KRRT7 accepted my optimizations from:
…2026-03-16T18.33.59 ⚡️ Speed up function `_parse_and_collect_imports` by 12% in PR #1660 (`unstructured-inference`)
This PR is now faster! 🚀 @KRRT7 accepted my optimizations from:
…2026-03-16T19.38.09 ⚡️ Speed up function `_should_use_raw_project_class_context` by 42% in PR #1837 (`codeflash/optimize-pr1660-2026-03-16T18.33.59`)
Fixes ruff TC003 — Path is only used in annotations which are strings at runtime due to `from __future__ import annotations`.
Build test_count_cache once before ranking instead of calling existing_unit_test_count 2N times (twice per function). Guard against a None function_to_tests and add debug logging when effort is escalated from medium to high.
re_extract_from_cache was always calling add_needed_imports_from_module, but HASHING context should use ast.unparse(ast.parse()) to normalize whitespace for consistent hashing, matching extract_all_contexts_from_files.
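The normalization trick in a runnable sketch. `normalized_hash` is a hypothetical name for illustration; the point is that `ast.unparse(ast.parse(...))` re-prints code canonically, so formatting-only differences produce identical hashes:

```python
import ast
import hashlib


def normalized_hash(source: str) -> str:
    # Round-tripping through the AST discards comments, blank lines and
    # incidental whitespace, leaving a canonical rendering of the code.
    canonical = ast.unparse(ast.parse(source))
    return hashlib.sha256(canonical.encode()).hexdigest()


a = "def f( x ):\n    return x+1\n"
b = "def f(x):\n\n    return x + 1\n"
assert normalized_hash(a) == normalized_hash(b)
```

Note that `ast.unparse` requires Python 3.9+.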
@claude review this PR with max effort.
Claude finished @KRRT7's task in 1m 11s — View job

PR Review Summary
Prek Checks: ✅ Passed — mypy shows 246 errors across 9 files, but these are all pre-existing; none are introduced by this PR's changed lines. The one new annotation pattern worth noting is the duplicate type annotation flagged below.

Code Review

🐛 Bug / Type Error

1. Duplicate variable annotation on `test_count_cache` in optimizer.py:
```python
# optimizer.py:530
test_count_cache: dict[tuple[Path, str], int] = { ... }
# optimizer.py:536
test_count_cache: dict[tuple[Path, str], int] = {}
```

Mypy strict mode rejects this: the second annotation redefines an already-annotated name.
| File | Coverage |
|---|---|
| `codeflash/models/call_graph.py` | 99% ✅ |
| `codeflash/languages/python/context/code_context_extractor.py` | 80% ✅ |
| `codeflash/discovery/discover_unit_tests.py` | 76% ✅ |
| `codeflash/languages/python/reference_graph.py` | 73% |
| `codeflash/optimization/optimizer.py` | 29% |
The new `_find_class_node_by_name` function is only covered indirectly. The `reference_graph.py` miss is in the new `get_call_graph` paths (metadata and non-metadata branches).
Optimization Bot PRs
- Closed ⚡️ Speed up function `get_optimized_code_for_module` by 21% in PR #1774 (feat/gradle-executor-from-java) #1849 (codeflash/optimize-pr1774-2026-03-17T02.35.54): the `js-cjs-function-optimization` CI check was failing on target branch `feat/gradle-executor-from-java`.
The optimization replaced a large multi-type `isinstance()` check (13 AST node types constructed into a tuple on every iteration) with a single `hasattr(node, "body")` test, then conditionally checked for `orelse`, `finalbody`, and `handlers` only when `body` exists. Line profiler shows the original `isinstance` block consumed ~40% of runtime across 7327 calls, while the new `hasattr` checks are ~3× cheaper per call. The nested conditionals avoid calling `getattr` with default values when attributes are absent (e.g., `orelse` is missing in 85% of nodes), cutting wasted attribute lookups from four unconditional `getattr` calls to typically one or two `hasattr` checks plus direct accesses. Across 59 test runs processing ~7300 AST nodes each, this yields a 109% speedup with identical correctness.
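The described check can be sketched as below. This is an assumed shape, not the exact codeflash implementation of `collect_existing_class_names`: a single `hasattr` probe replaces the 13-type `isinstance()` tuple check, and `orelse`/`finalbody`/`handlers` are only touched when a `body` exists.

```python
import ast


def collect_existing_class_names(tree: ast.AST) -> set[str]:
    names = set()
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, ast.ClassDef):
            names.add(node.name)
        # One cheap attribute probe instead of isinstance() against a tuple
        # of many node types; the isinstance(list) guard skips nodes like
        # Lambda whose .body is a single expression, not a statement list.
        if hasattr(node, "body") and isinstance(node.body, list):
            stack.extend(node.body)
            if getattr(node, "orelse", None):      # e.g. if/for/while else
                stack.extend(node.orelse)
            if getattr(node, "finalbody", None):   # try/finally
                stack.extend(node.finalbody)
            if getattr(node, "handlers", None):    # try/except
                stack.extend(node.handlers)
    return names
```

The conditional `getattr` calls mirror the PR's observation that attributes like `orelse` are absent on most nodes, so unconditional lookups are wasted work.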
…existing_class_names
…2026-03-16T23.30.46 ⚡️ Speed up function `collect_existing_class_names` by 109% in PR #1660 (`unstructured-inference`)
This PR is now faster! 🚀 @claude[bot] accepted my optimizations from:
```python
def _find_class_node_by_name(class_name: str, module_tree: ast.Module) -> ast.ClassDef | None:
    return next((n for n in ast.walk(module_tree) if isinstance(n, ast.ClassDef) and n.name == class_name), None)
```
⚡️Codeflash found 2,142% (21.42x) speedup for _find_class_node_by_name in codeflash/languages/python/context/code_context_extractor.py
⏱️ Runtime : 36.0 milliseconds → 1.60 milliseconds (best of 5 runs)
⚡️ This change will improve the performance of the following benchmarks:
| Benchmark File :: Function | Original Runtime | Expected New Runtime | Speedup |
|---|---|---|---|
| tests.benchmarks.test_benchmark_code_extract_code_context::test_benchmark_extract | 17.0 seconds | 17.0 seconds | 0.15% |
📝 Explanation and details
The optimization replaced ast.walk() — which traverses every single node in the AST (literals, constants, expressions, etc.) — with a manual stack-based traversal that only visits nodes with a .body attribute (modules, classes, and functions). This eliminates ~97% of unnecessary node visits: profiler data shows the original spent 509 ms on ast.walk() while the optimized version completes in 21 ms. Early-return on match avoids scanning the entire tree when the target class appears early. The optimization preserves correctness by continuing to traverse function bodies where nested classes can be defined, unlike prior attempts that skipped them entirely.
✅ Correctness verification report:
| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 58 Passed |
| ⏪ Replay Tests | ✅ 31 Passed |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Click to see Generated Regression Tests
import ast # used to parse source code into an AST Module for the function under test
import pytest # used for our unit tests
# import the function under test using the exact module path shown in the provided context
from codeflash.languages.python.context.code_context_extractor import \
_find_class_node_by_name
def test_find_top_level_class_basic():
# simple source with two top-level classes; we expect to find 'Foo'
source = """
class Foo:
pass
class Bar:
pass
"""
# parse source into an ast.Module
module_tree = ast.parse(source)
# call the function to find class node by name
node = _find_class_node_by_name("Foo", module_tree) # 13.5μs -> 2.00μs (572% faster)
# ensure we got an ast.ClassDef back
assert isinstance(node, ast.ClassDef)
# ensure the found class has the expected name
assert node.name == "Foo"
# ensure the line number corresponds to the first class definition (should be 2)
assert node.lineno == 2
def test_find_nonexistent_returns_none():
# source contains no class named 'NoSuchClass'
source = "x = 1\n\ndef func():\n return x\n"
module_tree = ast.parse(source)
# searching for a non-existent class should return None
assert _find_class_node_by_name("NoSuchClass", module_tree) is None # 17.9μs -> 2.75μs (552% faster)
def test_empty_module_returns_none():
# empty module (no body)
module_tree = ast.parse("") # empty source
# should return None for any class name
assert _find_class_node_by_name("Anything", module_tree) is None # 6.13μs -> 1.08μs (467% faster)
def test_nested_class_found_before_later_top_level_class():
# nested class 'Target' appears inside a function defined before a later top-level class 'Target'
source = """def outer():
class Target:
pass
class Target:
pass
"""
module_tree = ast.parse(source)
# the nested Target (inside outer) should be the first encountered by ast.walk
node = _find_class_node_by_name("Target", module_tree) # 12.3μs -> 1.80μs (581% faster)
assert isinstance(node, ast.ClassDef)
# nested Target's lineno should be 2 (def outer is 1, nested class is 2)
assert node.lineno == 2
# confirm the returned node indeed corresponds to the nested class by checking the indentation
# nested class will have col_offset > 0 (indented), while top-level class has col_offset == 0
assert node.col_offset > 0
def test_multiple_same_name_top_level_returns_first_definition():
# two top-level classes with the same name 'Dup'; we expect the first one to be returned
source = """class Dup:
x = 1
class Other:
pass
class Dup:
x = 2
"""
module_tree = ast.parse(source)
node = _find_class_node_by_name("Dup", module_tree) # 10.0μs -> 1.49μs (571% faster)
assert isinstance(node, ast.ClassDef)
# first occurrence of Dup is at line 2
assert node.lineno == 2
# ensure it's the first one by verifying the body content (the attribute value x should be 1)
# the first Dup assigns x = 1, which will be in the class body as an ast.Assign
assigns = [n for n in node.body if isinstance(n, ast.Assign)]
# there should be at least one assignment and its value should be the constant 1
assert assigns and isinstance(assigns[0].value, ast.Constant) and assigns[0].value.value == 1
def test_special_characters_and_unicode_class_names():
# class names can contain underscores and unicode letters in Python 3
source = """class _123:
pass
class Ωmega:
pass
"""
module_tree = ast.parse(source)
# find the underscore-prefixed class
node1 = _find_class_node_by_name("_123", module_tree) # 9.88μs -> 1.42μs (594% faster)
assert isinstance(node1, ast.ClassDef)
assert node1.name == "_123" # 7.20μs -> 911ns (691% faster)
# find the Unicode-named class
node2 = _find_class_node_by_name("Ωmega", module_tree)
assert isinstance(node2, ast.ClassDef)
assert node2.name == "Ωmega"
def test_invalid_name_like_empty_string_returns_none():
# passing an empty string should not match any class; function should return None
source = "class A:\n pass\n"
module_tree = ast.parse(source)
assert _find_class_node_by_name("", module_tree) is None # 9.32μs -> 2.22μs (319% faster)
def test_large_scale_many_classes_and_repeated_lookups():
# construct a large module with 1000 classes named Class0 ... Class999
n = 1000
class_defs = []
for i in range(n):
# each class has a simple pass body; ensure deterministic ordering
class_defs.append(f"class Class{i}:\n x = {i}\n")
source = "\n".join(class_defs)
module_tree = ast.parse(source)
# sanity check: find a few classes, including first, middle, and last
node_first = _find_class_node_by_name("Class0", module_tree) # 73.4μs -> 2.11μs (3374% faster)
node_mid = _find_class_node_by_name("Class500", module_tree)
node_last = _find_class_node_by_name(f"Class{n-1}", module_tree) # 732μs -> 68.8μs (964% faster)
assert isinstance(node_first, ast.ClassDef) and node_first.name == "Class0"
assert isinstance(node_mid, ast.ClassDef) and node_mid.name == "Class500" # 1.43ms -> 109μs (1205% faster)
assert isinstance(node_last, ast.ClassDef) and node_last.name == f"Class{n-1}"
# repeat many lookups deterministically to exercise performance/path stability (1000 iterations)
for i in range(0, n, 100): # sample every 100th to limit runtime while still exercising scale
name = f"Class{i}"
found = _find_class_node_by_name(name, module_tree) # 6.66ms -> 462μs (1341% faster)
assert isinstance(found, ast.ClassDef)
assert found.name == name
def test_repeatability_same_node_identity_on_repeated_calls():
# ensure repeated calls for the same class name return the same node object (identity) since the tree is unchanged
source = """class Unique:
pass
"""
module_tree = ast.parse(source)
first = _find_class_node_by_name("Unique", module_tree) # 10.5μs -> 1.38μs (657% faster)
second = _find_class_node_by_name("Unique", module_tree)
# both calls should return the exact same AST node object (identity)
assert first is second  # 5.36μs -> 561ns (855% faster)

import ast
# imports
import pytest
from codeflash.languages.python.context.code_context_extractor import \
_find_class_node_by_name
class TestFindClassNodeByNameBasic:
"""Basic tests for normal usage patterns with typical inputs."""
def test_find_single_class_in_simple_module(self):
"""Test finding a single class in a module with one class."""
code = """
class MyClass:
pass
"""
module_tree = ast.parse(code)
result = _find_class_node_by_name("MyClass", module_tree) # 9.47μs -> 1.43μs (561% faster)
assert result is not None
assert isinstance(result, ast.ClassDef)
assert result.name == "MyClass"
def test_find_class_among_multiple_classes(self):
"""Test finding a specific class when multiple classes exist."""
code = """
class FirstClass:
pass
class SecondClass:
pass
class ThirdClass:
pass
"""
module_tree = ast.parse(code)
result = _find_class_node_by_name("SecondClass", module_tree) # 11.3μs -> 1.58μs (615% faster)
assert result is not None
assert isinstance(result, ast.ClassDef)
assert result.name == "SecondClass"
def test_find_class_with_methods(self):
"""Test finding a class that contains methods."""
code = """
class ClassWithMethods:
def method1(self):
pass
def method2(self):
return 42
"""
module_tree = ast.parse(code)
result = _find_class_node_by_name("ClassWithMethods", module_tree) # 9.99μs -> 1.44μs (592% faster)
assert result is not None
assert isinstance(result, ast.ClassDef)
assert result.name == "ClassWithMethods"
assert len(result.body) == 2
def test_find_nested_class_by_top_level_name(self):
"""Test finding top-level class by name (ast.walk traverses nested classes)."""
code = """
class OuterClass:
class InnerClass:
pass
"""
module_tree = ast.parse(code)
result = _find_class_node_by_name("OuterClass", module_tree) # 9.48μs -> 1.44μs (557% faster)
assert result is not None
assert isinstance(result, ast.ClassDef)
assert result.name == "OuterClass"
def test_find_first_class_when_multiple_with_same_name(self):
"""Test that function returns first occurrence when multiple classes have same name."""
code = """
class DuplicateClass:
pass
class DuplicateClass:
pass
"""
module_tree = ast.parse(code)
result = _find_class_node_by_name("DuplicateClass", module_tree) # 9.53μs -> 1.36μs (600% faster)
assert result is not None
assert isinstance(result, ast.ClassDef)
assert result.name == "DuplicateClass"
def test_find_class_with_inheritance(self):
"""Test finding a class that inherits from another class."""
code = """
class BaseClass:
pass
class DerivedClass(BaseClass):
pass
"""
module_tree = ast.parse(code)
result = _find_class_node_by_name("DerivedClass", module_tree) # 11.4μs -> 1.53μs (643% faster)
assert result is not None
assert isinstance(result, ast.ClassDef)
assert result.name == "DerivedClass"
assert len(result.bases) == 1
class TestFindClassNodeByNameEdge:
"""Edge case tests for unusual or boundary conditions."""
def test_class_not_found_returns_none(self):
"""Test that None is returned when class does not exist."""
code = """
class ExistingClass:
pass
"""
module_tree = ast.parse(code)
result = _find_class_node_by_name("NonExistentClass", module_tree) # 9.43μs -> 2.23μs (322% faster)
assert result is None
def test_empty_module(self):
"""Test searching in an empty module."""
code = ""
module_tree = ast.parse(code)
result = _find_class_node_by_name("AnyClass", module_tree) # 5.88μs -> 1.10μs (433% faster)
assert result is None
def test_module_with_no_classes_only_functions(self):
"""Test module containing functions but no classes."""
code = """
def function1():
pass
def function2():
pass
"""
module_tree = ast.parse(code)
result = _find_class_node_by_name("SomeClass", module_tree) # 15.5μs -> 3.53μs (339% faster)
assert result is None
def test_module_with_no_classes_only_variables(self):
"""Test module with variable assignments but no classes."""
code = """
x = 42
y = "string"
z = [1, 2, 3]
"""
module_tree = ast.parse(code)
result = _find_class_node_by_name("MyClass", module_tree) # 19.8μs -> 1.89μs (945% faster)
assert result is None
def test_class_name_case_sensitive(self):
"""Test that class name search is case-sensitive."""
code = """
class MyClass:
pass
"""
module_tree = ast.parse(code)
result_lower = _find_class_node_by_name("myclass", module_tree) # 9.09μs -> 2.24μs (305% faster)
result_upper = _find_class_node_by_name("MYCLASS", module_tree)
assert result_lower is None # 5.21μs -> 972ns (436% faster)
assert result_upper is None
def test_class_with_special_characters_in_name(self):
"""Test finding class with underscores in name."""
code = """
class _PrivateClass:
pass
class __DunderClass__:
pass
"""
module_tree = ast.parse(code)
result_private = _find_class_node_by_name("_PrivateClass", module_tree) # 9.48μs -> 1.46μs (548% faster)
result_dunder = _find_class_node_by_name("__DunderClass__", module_tree)
assert result_private is not None # 6.64μs -> 882ns (653% faster)
assert result_private.name == "_PrivateClass"
assert result_dunder is not None
assert result_dunder.name == "__DunderClass__"
def test_empty_string_class_name(self):
"""Test searching for empty string as class name."""
code = """
class ValidClass:
pass
"""
module_tree = ast.parse(code)
result = _find_class_node_by_name("", module_tree) # 8.85μs -> 2.14μs (313% faster)
assert result is None
def test_class_with_class_variables(self):
"""Test finding class that has class variables."""
code = """
class ClassWithVariables:
class_var = 42
another_var = "test"
"""
module_tree = ast.parse(code)
result = _find_class_node_by_name("ClassWithVariables", module_tree) # 9.61μs -> 1.40μs (585% faster)
assert result is not None
assert isinstance(result, ast.ClassDef)
def test_deeply_nested_class(self):
"""Test finding class nested deeply in function or another class."""
code = """
class OuterClass:
class MiddleClass:
class InnerClass:
pass
"""
module_tree = ast.parse(code)
# ast.walk traverses all nodes, so nested classes are also found
result_outer = _find_class_node_by_name("OuterClass", module_tree) # 9.38μs -> 1.29μs (625% faster)
result_middle = _find_class_node_by_name("MiddleClass", module_tree)
result_inner = _find_class_node_by_name("InnerClass", module_tree) # 6.88μs -> 1.19μs (478% faster)
assert result_outer is not None
assert result_middle is not None # 7.36μs -> 1.02μs (621% faster)
assert result_inner is not None
def test_class_inside_function(self):
"""Test finding class defined inside a function (ast.walk still finds it)."""
code = """
def create_class():
class LocalClass:
pass
return LocalClass
"""
module_tree = ast.parse(code)
result = _find_class_node_by_name("LocalClass", module_tree) # 13.8μs -> 2.21μs (521% faster)
# ast.walk traverses all nodes including those in functions
assert result is not None
assert result.name == "LocalClass"
    def test_whitespace_in_name_not_found(self):
        """Test that searching for name with whitespace doesn't match."""
        code = """
class MyClass:
    pass
"""
        module_tree = ast.parse(code)
        result = _find_class_node_by_name("My Class", module_tree)  # 8.79μs -> 2.14μs (310% faster)
        assert result is None

    def test_partial_name_not_found(self):
        """Test that partial class names don't match."""
        code = """
class MyLongClassName:
    pass
"""
        module_tree = ast.parse(code)
        result_partial1 = _find_class_node_by_name("MyLong", module_tree)  # 8.54μs -> 2.00μs (326% faster)
        result_partial2 = _find_class_node_by_name("ClassName", module_tree)
        assert result_partial1 is None  # 5.16μs -> 972ns (431% faster)
        assert result_partial2 is None

    def test_class_with_decorators(self):
        """Test finding class that has decorators."""
        code = """
@decorator
@another_decorator
class DecoratedClass:
    pass
"""
        module_tree = ast.parse(code)
        result = _find_class_node_by_name("DecoratedClass", module_tree)  # 9.85μs -> 1.32μs (645% faster)
        assert result is not None
        assert result.name == "DecoratedClass"

    def test_class_with_base_from_function_call(self):
        """Test finding class with base class from function call."""
        code = """
class DynamicBase(get_base()):
    pass
"""
        module_tree = ast.parse(code)
        result = _find_class_node_by_name("DynamicBase", module_tree)  # 9.43μs -> 1.41μs (567% faster)
        assert result is not None
        assert result.name == "DynamicBase"
class TestFindClassNodeByNameLargeScale:
    """Large-scale tests for performance and scalability."""

    def test_find_class_in_large_module_with_many_classes(self):
        """Test finding a class in a module with 100 classes."""
        # Generate code with 100 classes
        class_names = [f"Class{i}" for i in range(100)]
        code_lines = [f"class {name}:\n    pass\n" for name in class_names]
        code = "\n".join(code_lines)
        module_tree = ast.parse(code)
        result = _find_class_node_by_name("Class50", module_tree)  # 83.1μs -> 7.51μs (1005% faster)
        assert result is not None
        assert result.name == "Class50"

    def test_find_class_in_module_with_many_classes_last_element(self):
        """Test finding last class in module with 100 classes."""
        class_names = [f"Class{i}" for i in range(100)]
        code_lines = [f"class {name}:\n    pass\n" for name in class_names]
        code = "\n".join(code_lines)
        module_tree = ast.parse(code)
        result = _find_class_node_by_name("Class99", module_tree)  # 144μs -> 13.3μs (992% faster)
        assert result is not None
        assert result.name == "Class99"

    def test_find_class_in_module_with_many_classes_first_element(self):
        """Test finding first class in module with 100 classes."""
        class_names = [f"Class{i}" for i in range(100)]
        code_lines = [f"class {name}:\n    pass\n" for name in class_names]
        code = "\n".join(code_lines)
        module_tree = ast.parse(code)
        result = _find_class_node_by_name("Class0", module_tree)  # 16.0μs -> 1.50μs (967% faster)
        assert result is not None
        assert result.name == "Class0"

    def test_nonexistent_class_in_large_module(self):
        """Test searching for non-existent class in large module with 200 classes."""
        class_names = [f"Class{i}" for i in range(200)]
        code_lines = [f"class {name}:\n    pass\n" for name in class_names]
        code = "\n".join(code_lines)
        module_tree = ast.parse(code)
        result = _find_class_node_by_name("ClassNonExistent", module_tree)  # 355μs -> 74.2μs (379% faster)
        assert result is None

    def test_class_with_many_methods(self):
        """Test finding class that has many methods (100+)."""
        method_defs = "\n".join([f"    def method{i}(self):\n        pass" for i in range(100)])
        code = f"""
class ClassWithManyMethods:
{method_defs}
"""
        module_tree = ast.parse(code)
        result = _find_class_node_by_name("ClassWithManyMethods", module_tree)  # 16.5μs -> 1.41μs (1069% faster)
        assert result is not None
        assert len(result.body) == 100

    def test_module_with_mixed_large_content(self):
        """Test finding class in module with 500+ functions and classes mixed."""
        lines = []
        # Generate 250 functions
        for i in range(250):
            lines.append(f"def function{i}():\n    pass\n")
        # Generate 250 classes
        for i in range(250):
            lines.append(f"class Class{i}:\n    pass\n")
        code = "\n".join(lines)
        module_tree = ast.parse(code)
        # Find specific classes
        result_first = _find_class_node_by_name("Class0", module_tree)  # 391μs -> 35.0μs (1018% faster)
        result_middle = _find_class_node_by_name("Class125", module_tree)
        result_last = _find_class_node_by_name("Class249", module_tree)  # 544μs -> 45.7μs (1093% faster)
        assert result_first is not None
        assert result_middle is not None  # 704μs -> 57.2μs (1131% faster)
        assert result_last is not None

    def test_class_with_many_base_classes(self):
        """Test finding class that inherits from many base classes."""
        base_list = ", ".join([f"Base{i}" for i in range(50)])
        code = f"""
class MultipleInheritance({base_list}):
    pass
"""
        module_tree = ast.parse(code)
        result = _find_class_node_by_name("MultipleInheritance", module_tree)  # 13.4μs -> 1.50μs (794% faster)
        assert result is not None
        assert len(result.bases) == 50

    def test_module_with_very_long_class_names(self):
        """Test finding class with very long name from pool of long names."""
        long_name1 = "A" * 100
        long_name2 = "B" * 100
        code = f"""
class {long_name1}:
    pass
class {long_name2}:
    pass
"""
        module_tree = ast.parse(code)
        result = _find_class_node_by_name(long_name1, module_tree)  # 10.0μs -> 1.39μs (622% faster)
        assert result is not None
        assert result.name == long_name1

⏪ Click to see Replay Tests

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| benchmarks/codeflash_replay_tests_260k0cbn/test_tests_benchmarks_test_benchmark_code_extract_code_context__replay_test_0.py::test_codeflash_languages_python_context_code_context_extractor__find_class_node_by_name_test_benchmark_extract | 24.5ms | 667μs | 3565%✅ |
To test or edit this optimization locally: `git merge codeflash/optimize-pr1660-2026-03-17T02.10.53`
Click to see suggested changes

```diff
-    return next((n for n in ast.walk(module_tree) if isinstance(n, ast.ClassDef) and n.name == class_name), None)
+    stack = [module_tree]
+    while stack:
+        node = stack.pop()
+        # Only nodes with a .body attribute can contain class definitions
+        body = getattr(node, 'body', None)
+        if body:
+            for item in body:
+                if isinstance(item, ast.ClassDef):
+                    if item.name == class_name:
+                        return item
+                    stack.append(item)
+                elif isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
+                    stack.append(item)
+    return None
```
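As a standalone illustration of the stack-based traversal in the suggested change, here is a minimal runnable sketch (the wrapper function name `find_class_node_by_name` is assumed for the demo; the real helper is the module-private `_find_class_node_by_name`):

```python
import ast

def find_class_node_by_name(class_name, module_tree):
    # Iterative DFS over only the bodies that can contain class definitions,
    # avoiding a full ast.walk() pass over every expression node.
    stack = [module_tree]
    while stack:
        node = stack.pop()
        body = getattr(node, "body", None)
        if not body:
            continue
        for item in body:
            if isinstance(item, ast.ClassDef):
                if item.name == class_name:
                    return item
                stack.append(item)  # classes may nest further classes
            elif isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                stack.append(item)  # classes may hide inside functions
    return None

tree = ast.parse("def f():\n    class Hidden:\n        pass\n")
node = find_class_node_by_name("Hidden", tree)
print(node.name)  # -> Hidden
```

The speedup comes from pruning: statements like assignments and expressions are never pushed, whereas `ast.walk` visits every node in the tree.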
The optimization replaces the recursive calls in `_get_expr_name` with an iterative loop that walks attribute chains once, collecting parts into a list and reversing them only at the end. This eliminates the function-call overhead that dominated 46% of the original runtime (line profiling shows the recursive calls at 1154 ns/hit versus ~300 ns/hit for the new loop iterations). Additionally, `_expr_matches_name` now precomputes `"." + suffix` once instead of building it twice per invocation via f-strings, cutting redundant string allocations. The net 26% runtime improvement comes primarily from avoiding Python's recursion stack and reducing temporary-object creation in the hot path; all tests pass, and the minor per-test slowdowns (typically 10–25%) are offset by dramatic wins on deep attribute chains (up to 393% faster for 100-level nesting).
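A minimal sketch of the pattern described above, assuming the PR's helpers behave roughly like these (the names `get_expr_name` and `expr_matches_name` mirror the private helpers, but the exact signatures are assumptions):

```python
import ast

def get_expr_name(expr):
    # Walk an attribute chain (a.b.c) iteratively instead of recursing,
    # collecting parts and reversing once at the end.
    parts = []
    node = expr
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None

def expr_matches_name(expr, name):
    full = get_expr_name(expr)
    if full is None:
        return False
    dotted = "." + name  # built once, not twice per invocation via f-strings
    return full == name or full.endswith(dotted)

expr = ast.parse("pkg.mod.func", mode="eval").body
print(get_expr_name(expr))  # -> pkg.mod.func
```

The iterative walk touches each `ast.Attribute` node exactly once with no call-frame setup, which is where the recursion overhead went.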
…2026-03-17T03.31.35 ⚡️ Speed up function `_expr_matches_name` by 26% in PR #1660 (`unstructured-inference`)
This PR is now faster! 🚀 @KRRT7 accepted my optimizations from:
…e_by_name and ancestors

- Fix duplicate type annotation for test_count_cache in optimizer.py
- Replace ast.walk() with stack-based traversal in _find_class_node_by_name (21x speedup)
- Use list instead of deque in CallGraph.ancestors (34% speedup; order doesn't matter for the set result)
…ve_to bug

Port valuable improvements from #1846 that remain applicable after #1660:
- Cache jedi.Project instances via @cache to avoid recreating across 5 call sites
- Fix unguarded relative_to() in get_code_optimization_context (Windows 8.3 paths)
- Pre-group references by parent function in get_function_sources_from_jedi for O(1) lookup
- Batch TestsCache writes with flush() + executemany instead of per-row commit
- Gracefully disable cache writes on sqlite3.OperationalError
- Build functions_to_optimize_by_name dict for O(1) fallback lookup in process_test_files
- Derive all_defs from all_names via is_definition() to save a redundant Jedi call
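The batched-write pattern from this commit can be sketched as follows; this is a minimal illustration, not the TestsCache implementation, and the table name and columns (`tests`, `name`, `duration_ms`) are made up for the demo:

```python
import sqlite3

class BatchedWriter:
    """Buffer rows and write them with one executemany() + commit per flush."""

    def __init__(self, conn, batch_size=100):
        self.conn = conn
        self.batch_size = batch_size
        self.pending = []
        self.disabled = False  # set on sqlite3.OperationalError

    def add(self, row):
        if self.disabled:
            return
        self.pending.append(row)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.disabled or not self.pending:
            return
        try:
            # One round trip per batch instead of one commit per row
            self.conn.executemany(
                "INSERT INTO tests (name, duration_ms) VALUES (?, ?)",
                self.pending,
            )
            self.conn.commit()
        except sqlite3.OperationalError:
            # e.g. locked or read-only database: stop writing gracefully
            self.disabled = True
        self.pending.clear()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tests (name TEXT, duration_ms REAL)")
writer = BatchedWriter(conn, batch_size=2)
writer.add(("test_a", 1.5))
writer.add(("test_b", 2.0))  # batch full: triggers a flush
writer.add(("test_c", 0.7))
writer.flush()               # flush the remainder
count = conn.execute("SELECT COUNT(*) FROM tests").fetchone()[0]
print(count)  # -> 3
```

Per-row `commit()` forces an fsync per insert; batching amortizes that cost across the whole buffer, which is where the win comes from.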
Summary

- Auto-upgrade top-ranked functions to high effort when the user hasn't set `--effort`
- Add a `CallGraph` model with BFS traversal, topological sort, and subgraph extraction
- Fall back to the current language support for `language_version` in `add_language_metadata` when not provided (fixes the `python_version=None` API rejection)
- Add a `PruneConfig` dataclass replacing boolean params
- Speed up `get_code_optimization_context` by processing all 4 context types in a single per-file pass
- Performance: `CallGraph`, cached `Path.resolve()`, `os.path.basename` over `Path().name`, `hasattr`-based AST traversal, iterative `_get_expr_name`

Test plan

- `existing_unit_test_count` and ranking boost logic
- `CallGraph` (traversal, topological sort, subgraph, ancestors/descendants)
- `ReferenceGraph.get_call_graph`
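The call-graph model summarized above can be sketched minimally as below; this is a hedged illustration under assumed names (`CallGraph`, `add_edge`, `bfs`, `topological_order`), not the PR's actual implementation:

```python
from collections import deque

class CallGraph:
    """Directed graph of caller -> callee edges."""

    def __init__(self):
        self.edges = {}  # node -> set of callees

    def add_edge(self, caller, callee):
        self.edges.setdefault(caller, set()).add(callee)
        self.edges.setdefault(callee, set())

    def bfs(self, start):
        # Breadth-first traversal from `start` (sorted for determinism).
        seen, order, queue = {start}, [], deque([start])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nxt in sorted(self.edges.get(node, ())):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return order

    def topological_order(self):
        # Kahn's algorithm; nodes on a cycle never reach in-degree 0 and are
        # dropped, which is why the PR warns when cyclic nodes go missing.
        indeg = {n: 0 for n in self.edges}
        for callees in self.edges.values():
            for c in callees:
                indeg[c] += 1
        queue = deque(sorted(n for n, d in indeg.items() if d == 0))
        order = []
        while queue:
            n = queue.popleft()
            order.append(n)
            for c in sorted(self.edges[n]):
                indeg[c] -= 1
                if indeg[c] == 0:
                    queue.append(c)
        return order

g = CallGraph()
g.add_edge("main", "parse")
g.add_edge("main", "rank")
g.add_edge("rank", "score")
print(g.topological_order())  # -> ['main', 'parse', 'rank', 'score']
```

Topological order is what lets the optimizer process callees before callers; the drop-on-cycle behavior of Kahn's algorithm motivates the warning mentioned in the changelog.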