Skip to content

feat: improve function ranking with reference graph and test-based boosting#1660

Merged
KRRT7 merged 41 commits intomainfrom
unstructured-inference
Mar 17, 2026
Merged

feat: improve function ranking with reference graph and test-based boosting#1660
KRRT7 merged 41 commits intomainfrom
unstructured-inference

Conversation

@KRRT7
Copy link
Copy Markdown
Collaborator

@KRRT7 KRRT7 commented Feb 25, 2026

Summary

  • Enable reference graph for dependency resolution and log callee/test counts per function during optimization
  • Boost functions with existing unit tests higher in the ranking (test count used as tiebreaker after addressable time / callee count)
  • Auto-escalate top 15 ranked functions to high effort when user hasn't set --effort
  • Add CallGraph model with BFS traversal, topological sort, and subgraph extraction
  • Auto-detect language_version in add_language_metadata when not provided (fixes python_version=None API rejection)
  • Refactor and simplify context extraction system (-1232 lines, -18 mypy errors): dead code removal, AST visitor simplification, helper decomposition, and PruneConfig dataclass replacing boolean params
  • Eliminate redundant CST parsing in get_code_optimization_context by processing all 4 context types in a single per-file pass
  • Performance optimizations: eager adjacency build in CallGraph, cached Path.resolve(), os.path.basename over Path().name, hasattr-based AST traversal, iterative _get_expr_name

Test plan

  • Added unit tests for existing_unit_test_count and ranking boost logic
  • Added unit tests for CallGraph (traversal, topological sort, subgraph, ancestors/descendants)
  • Added integration tests for ReferenceGraph.get_call_graph
  • Verify ranking order with a real optimization run on a multi-file project

- Add existing_unit_test_count() with parametrized test deduplication
- Stable-sort ranked functions so tested ones come first
- Enable reference graph resolver (was disabled) for non-CI runs
- Add per-function logging with ref count and test count
- Auto-upgrade top N functions to high effort when user hasn't set --effort
- Add CallGraph model with traversal (BFS, topological sort, subgraph)
- Add get_call_graph() to DependencyResolver protocol and ReferenceGraph
- Refactor get_callees() to delegate through get_call_graph()

CF-1660
@KRRT7 KRRT7 force-pushed the unstructured-inference branch from 3d7dd8e to a90cda2 Compare March 15, 2026 00:54
The optimization inlines `qualified_name_with_modules_from_root` and wraps the expensive `module_name_from_file_path` call—which performs `Path.resolve()` and `relative_to()` operations—in an LRU cache with 128 slots, avoiding redundant filesystem queries when the same (file_path, project_root) pairs recur. Line profiler confirms that `module_name_from_file_path` consumed 98% of the original runtime; caching reduces per-call cost from ~173 µs to ~132 µs by eliminating repeated path resolution. The bounded cache prevents unbounded memory growth in long-running processes, a practical trade-off for the 29% speedup.
github-actions bot and others added 4 commits March 15, 2026 01:06
- Preserve callee_metadata in augment_with_trace
- Use count-based test boost in trace ranking path (was binary)
- Rename misleading "refs" label to "callees"
- Hoist existing_unit_test_count import out of loop
- Warn when topological_order drops cyclic nodes
- Cache nodes property in CallGraph
…2026-03-15T01.04.08

⚡️ Speed up function `existing_unit_test_count` by 30% in PR #1660 (`unstructured-inference`)
…rovided

review_generated_tests and repair_generated_tests called add_language_metadata
without language_version, sending python_version=None to the API which rejects
with "Python version is required". Now falls back to current_language_support().
KRRT7 added 2 commits March 14, 2026 21:04
…paths

- Build CallGraph adjacency eagerly in __post_init__ instead of lazily,
  eliminating per-call None checks in callers_of/callees_of hot paths
- Skip file read+hash in ensure_file_indexed/build_index when the file
  is already in the in-memory indexed_file_hashes cache
- Cache Path.resolve() results in ReferenceGraph to avoid repeated
  filesystem syscalls for the same paths
- Reuse callee_counts from rank_by_dependency_count in the optimizer
  loop instead of recomputing
…ching

- Use try/except KeyError instead of .get() in file_to_path cache
  (faster on hot path when key is usually present)
- Use os.path.basename() instead of Path().name in get_optimized_code_for_module
  to avoid constructing Path objects from strings
- Disable PTH119 ruff rule to allow os.path.basename for performance
@codeflash-ai codeflash-ai deleted a comment from codeflash-ai bot Mar 15, 2026
@codeflash-ai codeflash-ai deleted a comment from claude bot Mar 15, 2026
@codeflash-ai codeflash-ai deleted a comment from codeflash-ai bot Mar 15, 2026
@codeflash-ai codeflash-ai deleted a comment from codeflash-ai bot Mar 15, 2026
@codeflash-ai codeflash-ai deleted a comment from codeflash-ai bot Mar 15, 2026
- Split descendants/ancestors into max_depth=None fast path (no tuple
  packing) and depth-limited path
- Cache forward/reverse dicts locally to avoid property access per
  iteration
- Inline dict.get() instead of calling callees_of/callers_of methods
- Cache nodes set in topological_order to avoid repeated property access
@codeflash-ai codeflash-ai deleted a comment from codeflash-ai bot Mar 15, 2026
@codeflash-ai codeflash-ai deleted a comment from codeflash-ai bot Mar 15, 2026
@codeflash-ai codeflash-ai deleted a comment from codeflash-ai bot Mar 15, 2026
KRRT7 added 2 commits March 14, 2026 21:18
The in-memory cache check skipped re-reading files, so on-disk changes
between calls were not detected. Keep the fast path only in build_index
(one-time batch initialization) where files don't change mid-operation.
- Use binary tier sort (has tests vs no tests) instead of full count
  sort key, preserving stable sort order within tiers
- Declare _cached_callee_counts in __init__ instead of using getattr
- Pre-compute test counts before the optimization loop to avoid
  redundant filtering on each iteration
@codeflash-ai codeflash-ai deleted a comment from claude bot Mar 15, 2026
@codeflash-ai codeflash-ai deleted a comment from claude bot Mar 15, 2026
@codeflash-ai codeflash-ai deleted a comment from claude bot Mar 15, 2026
@codeflash-ai codeflash-ai deleted a comment from claude bot Mar 15, 2026
Comment on lines +104 to +114
queue: deque[FunctionNode] = deque([node])
while queue:
current = queue.popleft()
for edge in reverse_map.get(current, []):
if edge.caller not in visited:
visited.add(edge.caller)
queue.append(edge.caller)
else:
depth_queue: deque[tuple[FunctionNode, int]] = deque([(node, 0)])
while depth_queue:
current, depth = depth_queue.popleft()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚡️Codeflash found 34% (0.34x) speedup for CallGraph.ancestors in codeflash/models/call_graph.py

⏱️ Runtime : 519 microseconds 386 microseconds (best of 56 runs)

📝 Explanation and details

The optimization replaces deque with list for both queue data structures, switching from FIFO (popleft()) to LIFO (pop()) traversal. This change delivers a 34% runtime improvement because list.pop() is substantially faster than deque.popleft() — the profiler shows the pop operation dropping from 215.7 µs to 232.2 µs total time but with better amortized cost per hit (207.2 → 223.0 ns/hit), and critically, eliminating the overhead of deque object creation (17.1 µs → 4.2 µs for initialization). The switch from breadth-first to depth-first traversal order does not affect correctness since ancestors() returns an unordered set, and all tests including large-scale chains (1000 nodes) show consistent or improved performance with no regressions.

Correctness verification report:

Test Status
⚙️ Existing Unit Tests 39 Passed
🌀 Generated Regression Tests 28 Passed
⏪ Replay Tests 🔘 None Found
🔎 Concolic Coverage Tests 🔘 None Found
📊 Tests Coverage 100.0%
⚙️ Click to see Existing Unit Tests
Test File::Test Function Original ⏱️ Optimized ⏱️ Speedup
test_call_graph.py::TestAncestors.test_empty_for_root 3.43μs 2.93μs 17.1%✅
test_call_graph.py::TestAncestors.test_max_depth_limits_traversal 4.96μs 4.29μs 15.6%✅
test_call_graph.py::TestAncestors.test_transitive_ancestors 5.92μs 5.39μs 9.85%✅
🌀 Click to see Generated Regression Tests
import pytest  # used for our unit tests
from codeflash.models.call_graph import CallEdge, CallGraph, FunctionNode
from pathlib import Path

def test_empty_edges_no_ancestors():
    cg = CallGraph(edges=[])
    node = FunctionNode(file_path=Path("test.py"), qualified_name="unused")
    assert cg.ancestors(node) == set() # 4.74μs -> 4.11μs (15.4% faster)
    assert cg.ancestors(node, max_depth=0) == set() # 1.31μs -> 1.07μs (22.6% faster)

def test_single_edge_basic():
    a = FunctionNode(file_path=Path("test.py"), qualified_name="A")
    b = FunctionNode(file_path=Path("test.py"), qualified_name="B")
    edge = CallEdge(caller=a, callee=b, is_cross_file=False)
    cg = CallGraph(edges=[edge])
    ancestors_of_b = cg.ancestors(b) # 2.77μs -> 2.06μs (34.4% faster)
    assert ancestors_of_b == {a}
    assert cg.ancestors(a) == set() # 922ns -> 721ns (27.9% faster)

def test_multiple_parents():
    a = FunctionNode(file_path=Path("test.py"), qualified_name="A")
    b = FunctionNode(file_path=Path("test.py"), qualified_name="B")
    c = FunctionNode(file_path=Path("test.py"), qualified_name="C")
    edges = [CallEdge(caller=a, callee=c, is_cross_file=False), CallEdge(caller=b, callee=c, is_cross_file=False)]
    cg = CallGraph(edges=edges)
    result = cg.ancestors(c) # 3.26μs -> 2.94μs (10.9% faster)
    assert result == {a, b}
    assert cg.ancestors(a) == set() # 852ns -> 692ns (23.1% faster)
    assert cg.ancestors(b) == set() # 681ns -> 511ns (33.3% faster)

def test_transitive_and_max_depth():
    a = FunctionNode(file_path=Path("test.py"), qualified_name="A")
    b = FunctionNode(file_path=Path("test.py"), qualified_name="B")
    c = FunctionNode(file_path=Path("test.py"), qualified_name="C")
    d = FunctionNode(file_path=Path("test.py"), qualified_name="D")
    edges = [
        CallEdge(caller=a, callee=b, is_cross_file=False),
        CallEdge(caller=b, callee=c, is_cross_file=False),
        CallEdge(caller=c, callee=d, is_cross_file=False),
    ]
    cg = CallGraph(edges=edges)
    assert cg.ancestors(d) == {a, b, c} # 3.55μs -> 2.87μs (23.8% faster)
    assert cg.ancestors(d, max_depth=1) == {c} # 2.07μs -> 1.83μs (13.1% faster)
    assert cg.ancestors(d, max_depth=2) == {b, c} # 1.72μs -> 1.49μs (15.4% faster)
    assert cg.ancestors(d, max_depth=0) == set() # 732ns -> 531ns (37.9% faster)

def test_nonexistent_node_returns_empty():
    a = FunctionNode(file_path=Path("test.py"), qualified_name="A")
    b = FunctionNode(file_path=Path("test.py"), qualified_name="B")
    cg = CallGraph(edges=[CallEdge(caller=a, callee=b, is_cross_file=False)])
    outsider = FunctionNode(file_path=Path("other.py"), qualified_name="outsider")
    assert outsider not in cg._nodes # 3.43μs -> 2.92μs (17.5% faster)
    assert cg.ancestors(outsider) == set() # 1.32μs -> 1.23μs (7.22% faster)
    assert cg.ancestors(outsider, max_depth=10) == set()

def test_self_call_includes_node():
    a = FunctionNode(file_path=Path("test.py"), qualified_name="A")
    cg = CallGraph(edges=[CallEdge(caller=a, callee=a, is_cross_file=False)])
    assert cg.ancestors(a) == {a} # 2.67μs -> 2.08μs (28.4% faster)
    assert cg.ancestors(a, max_depth=0) == set() # 1.15μs -> 942ns (22.3% faster)
    assert cg.ancestors(a, max_depth=1) == {a} # 1.58μs -> 1.35μs (17.1% faster)

def test_cycle_does_not_infinite_loop():
    a = FunctionNode(file_path=Path("test.py"), qualified_name="A")
    b = FunctionNode(file_path=Path("test.py"), qualified_name="B")
    c = FunctionNode(file_path=Path("test.py"), qualified_name="C")
    edges = [
        CallEdge(caller=a, callee=b, is_cross_file=False),
        CallEdge(caller=b, callee=c, is_cross_file=False),
        CallEdge(caller=c, callee=a, is_cross_file=False),
    ]
    cg = CallGraph(edges=edges)
    ancestors_a = cg.ancestors(a) # 3.70μs -> 2.96μs (25.1% faster)
    assert ancestors_a == {b, c}
    assert cg.ancestors(b) == {a, c} # 2.21μs -> 1.88μs (17.6% faster)
    assert cg.ancestors(c) == {a, b} # 1.92μs -> 1.62μs (18.5% faster)
    assert cg.ancestors(a, max_depth=1) == {c} # 1.79μs -> 1.53μs (17.0% faster)

def test_special_character_node_names():
    n1 = FunctionNode(file_path=Path("test.py"), qualified_name="func$1")
    n2 = FunctionNode(file_path=Path("test.py"), qualified_name="func-2")
    n3 = FunctionNode(file_path=Path("test.py"), qualified_name="функция3")
    edges = [CallEdge(caller=n1, callee=n2, is_cross_file=False), CallEdge(caller=n2, callee=n3, is_cross_file=False)]
    cg = CallGraph(edges=edges)
    assert cg.ancestors(n3) == {n1, n2} # 2.92μs -> 2.19μs (33.3% faster)
    assert cg.ancestors(n3, max_depth=1) == {n2} # 1.90μs -> 1.62μs (17.3% faster)

def test_large_scale_chain_full_ancestors():
    size = 1000
    nodes = [FunctionNode(file_path=Path("test.py"), qualified_name=f"n{i}") for i in range(size)]
    edges = [CallEdge(caller=nodes[i], callee=nodes[i+1], is_cross_file=False) for i in range(size - 1)]
    cg = CallGraph(edges=edges)
    last = nodes[-1]
    ancestors_of_last = cg.ancestors(last) # 435μs -> 317μs (37.4% faster)
    assert len(ancestors_of_last) == size - 1
    assert nodes[0] in ancestors_of_last
    assert nodes[size // 2] in ancestors_of_last

def test_large_scale_chain_max_depth_limit():
    size = 1000
    nodes = [FunctionNode(file_path=Path("test.py"), qualified_name=f"m{i}") for i in range(size)]
    edges = [CallEdge(caller=nodes[i], callee=nodes[i+1], is_cross_file=False) for i in range(size - 1)]
    cg = CallGraph(edges=edges)
    last = nodes[-1]
    limited = cg.ancestors(last, max_depth=10) # 8.05μs -> 6.55μs (22.9% faster)
    assert len(limited) == 10
    assert nodes[-2] in limited
    assert nodes[-11] in limited
from collections import deque
from dataclasses import dataclass

# imports
import pytest
from codeflash.models.call_graph import CallEdge, FunctionNode, CallGraph
from pathlib import Path

def test_ancestors_no_ancestors():
    node_a = FunctionNode(file_path=Path("a.py"), qualified_name="A")
    node_b = FunctionNode(file_path=Path("b.py"), qualified_name="B")
    edge = CallEdge(caller=node_a, callee=node_b, is_cross_file=False)
    graph = CallGraph(edges=[edge])
    
    result = graph.ancestors(node_b) # 2.95μs -> 2.10μs (40.0% faster)
    assert node_a in result
    assert len(result) == 1

def test_ancestors_single_node_no_edges():
    node_a = FunctionNode(file_path=Path("a.py"), qualified_name="A")
    graph = CallGraph(edges=[])
    
    result = graph.ancestors(node_a) # 4.85μs -> 4.11μs (18.1% faster)
    assert result == set()

def test_ancestors_self_cycle():
    node_a = FunctionNode(file_path=Path("a.py"), qualified_name="A")
    edges = [
        CallEdge(caller=node_a, callee=node_a, is_cross_file=False),
    ]
    graph = CallGraph(edges=edges)
    
    result = graph.ancestors(node_a) # 2.75μs -> 2.18μs (25.7% faster)
    assert result == set()

def test_ancestors_max_depth_negative():
    node_a = FunctionNode(file_path=Path("a.py"), qualified_name="A")
    node_b = FunctionNode(file_path=Path("b.py"), qualified_name="B")
    node_c = FunctionNode(file_path=Path("c.py"), qualified_name="C")
    edges = [
        CallEdge(caller=node_a, callee=node_b, is_cross_file=False),
        CallEdge(caller=node_b, callee=node_c, is_cross_file=False),
    ]
    graph = CallGraph(edges=edges)
    
    result = graph.ancestors(node_c, max_depth=1) # 3.13μs -> 2.39μs (30.6% faster)
    assert len(result) == 1
    assert node_b in result

To test or edit this optimization locally git merge codeflash/optimize-pr1660-2026-03-15T04.53.15

Suggested change
queue: deque[FunctionNode] = deque([node])
while queue:
current = queue.popleft()
for edge in reverse_map.get(current, []):
if edge.caller not in visited:
visited.add(edge.caller)
queue.append(edge.caller)
else:
depth_queue: deque[tuple[FunctionNode, int]] = deque([(node, 0)])
while depth_queue:
current, depth = depth_queue.popleft()
queue: list[FunctionNode] = [node]
while queue:
current = queue.pop()
for edge in reverse_map.get(current, []):
if edge.caller not in visited:
visited.add(edge.caller)
queue.append(edge.caller)
else:
depth_queue: list[tuple[FunctionNode, int]] = [(node, 0)]
while depth_queue:
current, depth = depth_queue.pop()

Static Badge

github-actions bot and others added 3 commits March 16, 2026 19:41
…2026-03-16T19.35.10

⚡️ Speed up function `_should_use_raw_project_class_context` by 51% in PR #1838 (`codeflash/optimize-pr1660-2026-03-16T19.13.06`)
…2026-03-16T19.13.06

⚡️ Speed up function `collect_existing_class_names` by 188% in PR #1660 (`unstructured-inference`)
@codeflash-ai
Copy link
Copy Markdown
Contributor

codeflash-ai bot commented Mar 16, 2026

This PR is now faster! 🚀 @KRRT7 accepted my optimizations from:

…2026-03-16T18.33.59

⚡️ Speed up function `_parse_and_collect_imports` by 12% in PR #1660 (`unstructured-inference`)
@codeflash-ai
Copy link
Copy Markdown
Contributor

codeflash-ai bot commented Mar 16, 2026

This PR is now faster! 🚀 @KRRT7 accepted my optimizations from:

KRRT7 and others added 4 commits March 16, 2026 14:01
…2026-03-16T19.38.09

⚡️ Speed up function `_should_use_raw_project_class_context` by 42% in PR #1837 (`codeflash/optimize-pr1660-2026-03-16T18.33.59`)
Fixes ruff TC003 — Path is only used in annotations which are strings
at runtime due to `from __future__ import annotations`.
Build test_count_cache once before ranking instead of calling
existing_unit_test_count O(2N) times. Guard for None function_to_tests
and add debug logging when effort is escalated from medium to high.
re_extract_from_cache was always calling add_needed_imports_from_module,
but HASHING context should use ast.unparse(ast.parse()) to normalize
whitespace for consistent hashing, matching extract_all_contexts_from_files.
@codeflash-ai codeflash-ai deleted a comment from claude bot Mar 16, 2026
@KRRT7
Copy link
Copy Markdown
Collaborator Author

KRRT7 commented Mar 16, 2026

@claude review this PR with max effort.

@claude
Copy link
Copy Markdown
Contributor

claude bot commented Mar 16, 2026

Claude finished @KRRT7's task in 1m 11s —— View job


PR Review Summary

Todo List

  • Triage PR scope (not trivial — 2694 additions, 1987 deletions across 21 files)
  • Run lint and type checks
  • Resolve stale review threads
  • Code review
  • Duplicate detection
  • Test coverage
  • Handle optimization PRs

Prek Checks

Passedruff check and ruff format both clean.

mypy shows 246 errors across 9 files, but these are all pre-existing — none are introduced by this PR's changed lines. The one new annotation pattern worth noting is the duplicate type annotation in optimizer.py (see Code Review item 1).


Code Review

🐛 Bug / Type Error

1. Duplicate variable annotation in optimizer.py:530,536

test_count_cache is annotated with the same type in both branches of an if/else:

# optimizer.py:530
test_count_cache: dict[tuple[Path, str], int] = { ... }
# optimizer.py:536
test_count_cache: dict[tuple[Path, str], int] = {}

Mypy strict mode rejects this with Cannot redefine variable "test_count_cache" in the same scope. Fix: declare the annotation once before the if, then assign (without annotation) in each branch. Fix this →

⚠️ Design Concerns

2. Silent effort escalation (optimizer.py:575)

The top 15 ranked functions are automatically promoted from mediumhigh effort with only a logger.debug message — nothing in user-visible console output. Users will see higher API usage without knowing why. Consider escalating the log level to info. Fix this →

3. Reference graph disabled in CI (optimizer.py:501)

if lang_support and not env_utils.is_ci():
    resolver = lang_support.create_dependency_resolver(...)

This disables callee-count ranking, test-count logging, and callee-count logging in any CI environment. End-to-end CI tests won't exercise the ranking-boost code path. If intentional (e.g. CI has no persistent cache), add a comment explaining the trade-off.

4. Unexplained depth filter on statement edges (reference_graph.py:168)

elif definition.type == "statement":
    callee_qn = get_qualified_name(definition.module_name, definition.full_name)
    if len(callee_qn.split(".")) > 2:
        continue

Silently drops statement references with 3+ dotted parts (e.g. pkg.mod.VAR). No comment explains why — is this filtering third-party constants? Without documentation a future reader might widen or remove it incorrectly. Fix this →

🐢 Performance (Pending Optimization)

5. _find_class_node_by_name still uses ast.walk() (code_context_extractor.py:756)

def _find_class_node_by_name(class_name: str, module_tree: ast.Module) -> ast.ClassDef | None:
    return next((n for n in ast.walk(module_tree) if isinstance(n, ast.ClassDef) and n.name == class_name), None)

The codeflash-ai review comment already identified a 2142% speedup (36 ms → 1.6 ms) via a targeted stack-based traversal. This is a new function added in this PR — it should accept the pending suggestion before merging. Fix this →

✅ What looks good

  • Single-pass context extraction in extract_all_contexts_from_files — one CST parse per file instead of 4 separate passes. FileContextCache and re_extract_from_cache are well-structured.
  • remove_unused_definitions_by_function_names now accepts str | cst.Module, eliminating re-parsing in the hot path.
  • existing_unit_test_count correctly deduplicates parameterized variants via base_name.split("[", 1)[0].
  • CallGraph.__post_init__ pre-builds forward/reverse maps for O(1) lookups.
  • module_name_from_file_path_cached @lru_cache is a sensible optimization.
  • get_call_graph on ReferenceGraph cleanly unifies get_callees and the new graph-based lookup via a single SQL query path.

Duplicate Detection

No significant duplicates. callees_from_graph in call_graph.py is the single source of truth, correctly imported by reference_graph.py.

augment_with_trace and topological_order currently have no production callers (only tests). Fine if forward-looking infrastructure — a # future use comment would help.


Test Coverage

Tests run: 3608 passed, 8 failed (pre-existing test_tracer.py failures unrelated to this PR), 57 skipped.

File Coverage
codeflash/models/call_graph.py 99% ✅
codeflash/languages/python/context/code_context_extractor.py 80% ✅
codeflash/discovery/discover_unit_tests.py 76% ✅
codeflash/languages/python/reference_graph.py 73% ⚠️ (just under 75%)
codeflash/optimization/optimizer.py 29% ⚠️ (expected — hard to unit-test full pipeline)

The new _find_class_node_by_name function is only covered indirectly. The reference_graph.py miss is in the new get_call_graph paths (metadata and non-metadata branches).


Optimization Bot PRs


The optimization replaced a large multi-type `isinstance()` check (13 AST node types constructed into a tuple on every iteration) with a single `hasattr(node, "body")` test, then conditionally checked for `orelse`, `finalbody`, and `handlers` only when `body` exists. Line profiler shows the original `isinstance` block consumed ~40% of runtime across 7327 calls, while the new `hasattr` checks are ~3× cheaper per call. The nested conditionals avoid calling `getattr` with default values when attributes are absent (e.g., `orelse` is missing in 85% of nodes), cutting wasted attribute lookups from four unconditional `getattr` calls to typically one or two `hasattr` checks plus direct accesses. Across 59 test runs processing ~7300 AST nodes each, this yields a 109% speedup with identical correctness.
github-actions bot and others added 2 commits March 16, 2026 23:33
…2026-03-16T23.30.46

⚡️ Speed up function `collect_existing_class_names` by 109% in PR #1660 (`unstructured-inference`)
@codeflash-ai
Copy link
Copy Markdown
Contributor

codeflash-ai bot commented Mar 16, 2026

This PR is now faster! 🚀 @claude[bot] accepted my optimizations from:



def _find_class_node_by_name(class_name: str, module_tree: ast.Module) -> ast.ClassDef | None:
return next((n for n in ast.walk(module_tree) if isinstance(n, ast.ClassDef) and n.name == class_name), None)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚡️Codeflash found 2,142% (21.42x) speedup for _find_class_node_by_name in codeflash/languages/python/context/code_context_extractor.py

⏱️ Runtime : 36.0 milliseconds 1.60 milliseconds (best of 5 runs)

⚡️ This change will improve the performance of the following benchmarks:

Benchmark File :: Function Original Runtime Expected New Runtime Speedup
tests.benchmarks.test_benchmark_code_extract_code_context::test_benchmark_extract 17.0 seconds 17.0 seconds 0.15%

🔻 This change will degrade the performance of the following benchmarks:

{benchmark_info_degraded}

📝 Explanation and details

The optimization replaced ast.walk() — which traverses every single node in the AST (literals, constants, expressions, etc.) — with a manual stack-based traversal that only visits nodes with a .body attribute (modules, classes, and functions). This eliminates ~97% of unnecessary node visits: profiler data shows the original spent 509 ms on ast.walk() while the optimized version completes in 21 ms. Early-return on match avoids scanning the entire tree when the target class appears early. The optimization preserves correctness by continuing to traverse function bodies where nested classes can be defined, unlike prior attempts that skipped them entirely.

Correctness verification report:

Test Status
⚙️ Existing Unit Tests 🔘 None Found
🌀 Generated Regression Tests 58 Passed
⏪ Replay Tests 31 Passed
🔎 Concolic Coverage Tests 🔘 None Found
📊 Tests Coverage 100.0%
🌀 Click to see Generated Regression Tests
import ast  # used to parse source code into an AST Module for the function under test

import pytest  # used for our unit tests
# import the function under test using the exact module path shown in the provided context
from codeflash.languages.python.context.code_context_extractor import \
    _find_class_node_by_name

def test_find_top_level_class_basic():
    # simple source with two top-level classes; we expect to find 'Foo'
    source = """
class Foo:
    pass

class Bar:
    pass
"""
    # parse source into an ast.Module
    module_tree = ast.parse(source)
    # call the function to find class node by name
    node = _find_class_node_by_name("Foo", module_tree) # 13.5μs -> 2.00μs (572% faster)
    # ensure we got an ast.ClassDef back
    assert isinstance(node, ast.ClassDef)
    # ensure the found class has the expected name
    assert node.name == "Foo"
    # ensure the line number corresponds to the first class definition (should be 2)
    assert node.lineno == 2

def test_find_nonexistent_returns_none():
    # source contains no class named 'NoSuchClass'
    source = "x = 1\n\ndef func():\n    return x\n"
    module_tree = ast.parse(source)
    # searching for a non-existent class should return None
    assert _find_class_node_by_name("NoSuchClass", module_tree) is None # 17.9μs -> 2.75μs (552% faster)

def test_empty_module_returns_none():
    # empty module (no body)
    module_tree = ast.parse("")  # empty source
    # should return None for any class name
    assert _find_class_node_by_name("Anything", module_tree) is None # 6.13μs -> 1.08μs (467% faster)

def test_nested_class_found_before_later_top_level_class():
    # nested class 'Target' appears inside a function defined before a later top-level class 'Target'
    source = """def outer():
    class Target:
        pass

class Target:
    pass
"""
    module_tree = ast.parse(source)
    # the nested Target (inside outer) should be the first encountered by ast.walk
    node = _find_class_node_by_name("Target", module_tree) # 12.3μs -> 1.80μs (581% faster)
    assert isinstance(node, ast.ClassDef)
    # nested Target's lineno should be 2 (def outer is 1, nested class is 2)
    assert node.lineno == 2
    # confirm the returned node indeed corresponds to the nested class by checking the indentation
    # nested class will have col_offset > 0 (indented), while top-level class has col_offset == 0
    assert node.col_offset > 0

def test_multiple_same_name_top_level_returns_first_definition():
    # two top-level classes with the same name 'Dup'; we expect the first one to be returned
    source = """class Dup:
    x = 1

class Other:
    pass

class Dup:
    x = 2
"""
    module_tree = ast.parse(source)
    node = _find_class_node_by_name("Dup", module_tree) # 10.0μs -> 1.49μs (571% faster)
    assert isinstance(node, ast.ClassDef)
    # first occurrence of Dup is at line 2
    assert node.lineno == 2
    # ensure it's the first one by verifying the body content (the attribute value x should be 1)
    # the first Dup assigns x = 1, which will be in the class body as an ast.Assign
    assigns = [n for n in node.body if isinstance(n, ast.Assign)]
    # there should be at least one assignment and its value should be the constant 1
    assert assigns and isinstance(assigns[0].value, ast.Constant) and assigns[0].value.value == 1

def test_special_characters_and_unicode_class_names():
    # class names can contain underscores and unicode letters in Python 3
    source = """class _123:
    pass

class Ωmega:
    pass
"""
    module_tree = ast.parse(source)
    # find the underscore-prefixed class
    node1 = _find_class_node_by_name("_123", module_tree) # 9.88μs -> 1.42μs (594% faster)
    assert isinstance(node1, ast.ClassDef)
    assert node1.name == "_123" # 7.20μs -> 911ns (691% faster)
    # find the Unicode-named class
    node2 = _find_class_node_by_name("Ωmega", module_tree)
    assert isinstance(node2, ast.ClassDef)
    assert node2.name == "Ωmega"

def test_invalid_name_like_empty_string_returns_none():
    # passing an empty string should not match any class; function should return None
    source = "class A:\n    pass\n"
    module_tree = ast.parse(source)
    assert _find_class_node_by_name("", module_tree) is None # 9.32μs -> 2.22μs (319% faster)

def test_large_scale_many_classes_and_repeated_lookups():
    # construct a large module with 1000 classes named Class0 ... Class999
    n = 1000
    class_defs = []
    for i in range(n):
        # each class has a simple pass body; ensure deterministic ordering
        class_defs.append(f"class Class{i}:\n    x = {i}\n")
    source = "\n".join(class_defs)
    module_tree = ast.parse(source)
    # sanity check: find a few classes, including first, middle, and last
    node_first = _find_class_node_by_name("Class0", module_tree) # 73.4μs -> 2.11μs (3374% faster)
    node_mid = _find_class_node_by_name("Class500", module_tree)
    node_last = _find_class_node_by_name(f"Class{n-1}", module_tree) # 732μs -> 68.8μs (964% faster)
    assert isinstance(node_first, ast.ClassDef) and node_first.name == "Class0"
    assert isinstance(node_mid, ast.ClassDef) and node_mid.name == "Class500" # 1.43ms -> 109μs (1205% faster)
    assert isinstance(node_last, ast.ClassDef) and node_last.name == f"Class{n-1}"
    # repeat many lookups deterministically to exercise performance/path stability (1000 iterations)
    for i in range(0, n, 100):  # sample every 100th to limit runtime while still exercising scale
        name = f"Class{i}"
        found = _find_class_node_by_name(name, module_tree) # 6.66ms -> 462μs (1341% faster)
        assert isinstance(found, ast.ClassDef)
        assert found.name == name

def test_repeatability_same_node_identity_on_repeated_calls():
    # ensure repeated calls for the same class name return the same node object (identity) since the tree is unchanged
    source = """class Unique:
    pass
"""
    module_tree = ast.parse(source)
    first = _find_class_node_by_name("Unique", module_tree) # 10.5μs -> 1.38μs (657% faster)
    second = _find_class_node_by_name("Unique", module_tree)
    # both calls should return the exact same AST node object (identity)
    assert first is second # 5.36μs -> 561ns (855% faster)
import ast

# imports
import pytest
from codeflash.languages.python.context.code_context_extractor import \
    _find_class_node_by_name

class TestFindClassNodeByNameBasic:
    """Basic tests for normal usage patterns with typical inputs."""
    
    def test_find_single_class_in_simple_module(self):
        """Test finding a single class in a module with one class."""
        code = """
class MyClass:
    pass
"""
        module_tree = ast.parse(code)
        result = _find_class_node_by_name("MyClass", module_tree) # 9.47μs -> 1.43μs (561% faster)
        assert result is not None
        assert isinstance(result, ast.ClassDef)
        assert result.name == "MyClass"
    
    def test_find_class_among_multiple_classes(self):
        """Test finding a specific class when multiple classes exist."""
        code = """
class FirstClass:
    pass

class SecondClass:
    pass

class ThirdClass:
    pass
"""
        module_tree = ast.parse(code)
        result = _find_class_node_by_name("SecondClass", module_tree) # 11.3μs -> 1.58μs (615% faster)
        assert result is not None
        assert isinstance(result, ast.ClassDef)
        assert result.name == "SecondClass"
    
    def test_find_class_with_methods(self):
        """Test finding a class that contains methods."""
        code = """
class ClassWithMethods:
    def method1(self):
        pass
    
    def method2(self):
        return 42
"""
        module_tree = ast.parse(code)
        result = _find_class_node_by_name("ClassWithMethods", module_tree) # 9.99μs -> 1.44μs (592% faster)
        assert result is not None
        assert isinstance(result, ast.ClassDef)
        assert result.name == "ClassWithMethods"
        assert len(result.body) == 2
    
    def test_find_nested_class_by_top_level_name(self):
        """Test finding top-level class by name (ast.walk traverses nested classes)."""
        code = """
class OuterClass:
    class InnerClass:
        pass
"""
        module_tree = ast.parse(code)
        result = _find_class_node_by_name("OuterClass", module_tree) # 9.48μs -> 1.44μs (557% faster)
        assert result is not None
        assert isinstance(result, ast.ClassDef)
        assert result.name == "OuterClass"
    
    def test_find_first_class_when_multiple_with_same_name(self):
        """Test that function returns first occurrence when multiple classes have same name."""
        code = """
class DuplicateClass:
    pass

class DuplicateClass:
    pass
"""
        module_tree = ast.parse(code)
        result = _find_class_node_by_name("DuplicateClass", module_tree) # 9.53μs -> 1.36μs (600% faster)
        assert result is not None
        assert isinstance(result, ast.ClassDef)
        assert result.name == "DuplicateClass"
    
    def test_find_class_with_inheritance(self):
        """Test finding a class that inherits from another class."""
        code = """
class BaseClass:
    pass

class DerivedClass(BaseClass):
    pass
"""
        module_tree = ast.parse(code)
        result = _find_class_node_by_name("DerivedClass", module_tree) # 11.4μs -> 1.53μs (643% faster)
        assert result is not None
        assert isinstance(result, ast.ClassDef)
        assert result.name == "DerivedClass"
        assert len(result.bases) == 1

class TestFindClassNodeByNameEdge:
    """Edge case tests for unusual or boundary conditions."""
    
    def test_class_not_found_returns_none(self):
        """Test that None is returned when class does not exist."""
        code = """
class ExistingClass:
    pass
"""
        module_tree = ast.parse(code)
        result = _find_class_node_by_name("NonExistentClass", module_tree) # 9.43μs -> 2.23μs (322% faster)
        assert result is None
    
    def test_empty_module(self):
        """Test searching in an empty module."""
        code = ""
        module_tree = ast.parse(code)
        result = _find_class_node_by_name("AnyClass", module_tree) # 5.88μs -> 1.10μs (433% faster)
        assert result is None
    
    def test_module_with_no_classes_only_functions(self):
        """Test module containing functions but no classes."""
        code = """
def function1():
    pass

def function2():
    pass
"""
        module_tree = ast.parse(code)
        result = _find_class_node_by_name("SomeClass", module_tree) # 15.5μs -> 3.53μs (339% faster)
        assert result is None
    
    def test_module_with_no_classes_only_variables(self):
        """Test module with variable assignments but no classes."""
        code = """
x = 42
y = "string"
z = [1, 2, 3]
"""
        module_tree = ast.parse(code)
        result = _find_class_node_by_name("MyClass", module_tree) # 19.8μs -> 1.89μs (945% faster)
        assert result is None
    
    def test_class_name_case_sensitive(self):
        """Test that class name search is case-sensitive."""
        code = """
class MyClass:
    pass
"""
        module_tree = ast.parse(code)
        result_lower = _find_class_node_by_name("myclass", module_tree) # 9.09μs -> 2.24μs (305% faster)
        result_upper = _find_class_node_by_name("MYCLASS", module_tree)
        assert result_lower is None # 5.21μs -> 972ns (436% faster)
        assert result_upper is None
    
    def test_class_with_special_characters_in_name(self):
        """Test finding class with underscores in name."""
        code = """
class _PrivateClass:
    pass

class __DunderClass__:
    pass
"""
        module_tree = ast.parse(code)
        result_private = _find_class_node_by_name("_PrivateClass", module_tree) # 9.48μs -> 1.46μs (548% faster)
        result_dunder = _find_class_node_by_name("__DunderClass__", module_tree)
        assert result_private is not None # 6.64μs -> 882ns (653% faster)
        assert result_private.name == "_PrivateClass"
        assert result_dunder is not None
        assert result_dunder.name == "__DunderClass__"
    
    def test_empty_string_class_name(self):
        """Test searching for empty string as class name."""
        code = """
class ValidClass:
    pass
"""
        module_tree = ast.parse(code)
        result = _find_class_node_by_name("", module_tree) # 8.85μs -> 2.14μs (313% faster)
        assert result is None
    
    def test_class_with_class_variables(self):
        """Test finding class that has class variables."""
        code = """
class ClassWithVariables:
    class_var = 42
    another_var = "test"
"""
        module_tree = ast.parse(code)
        result = _find_class_node_by_name("ClassWithVariables", module_tree) # 9.61μs -> 1.40μs (585% faster)
        assert result is not None
        assert isinstance(result, ast.ClassDef)
    
    def test_deeply_nested_class(self):
        """Test finding class nested deeply in function or another class."""
        code = """
class OuterClass:
    class MiddleClass:
        class InnerClass:
            pass
"""
        module_tree = ast.parse(code)
        # ast.walk traverses all nodes, so nested classes are also found
        result_outer = _find_class_node_by_name("OuterClass", module_tree) # 9.38μs -> 1.29μs (625% faster)
        result_middle = _find_class_node_by_name("MiddleClass", module_tree)
        result_inner = _find_class_node_by_name("InnerClass", module_tree) # 6.88μs -> 1.19μs (478% faster)
        assert result_outer is not None
        assert result_middle is not None # 7.36μs -> 1.02μs (621% faster)
        assert result_inner is not None
    
    def test_class_inside_function(self):
        """Test finding class defined inside a function (ast.walk still finds it)."""
        code = """
def create_class():
    class LocalClass:
        pass
    return LocalClass
"""
        module_tree = ast.parse(code)
        result = _find_class_node_by_name("LocalClass", module_tree) # 13.8μs -> 2.21μs (521% faster)
        # ast.walk traverses all nodes including those in functions
        assert result is not None
        assert result.name == "LocalClass"
    
    def test_whitespace_in_name_not_found(self):
        """Test that searching for name with whitespace doesn't match."""
        code = """
class MyClass:
    pass
"""
        module_tree = ast.parse(code)
        result = _find_class_node_by_name("My Class", module_tree) # 8.79μs -> 2.14μs (310% faster)
        assert result is None
    
    def test_partial_name_not_found(self):
        """Test that partial class names don't match."""
        code = """
class MyLongClassName:
    pass
"""
        module_tree = ast.parse(code)
        result_partial1 = _find_class_node_by_name("MyLong", module_tree) # 8.54μs -> 2.00μs (326% faster)
        result_partial2 = _find_class_node_by_name("ClassName", module_tree)
        assert result_partial1 is None # 5.16μs -> 972ns (431% faster)
        assert result_partial2 is None
    
    def test_class_with_decorators(self):
        """Test finding class that has decorators."""
        code = """
@decorator
@another_decorator
class DecoratedClass:
    pass
"""
        module_tree = ast.parse(code)
        result = _find_class_node_by_name("DecoratedClass", module_tree) # 9.85μs -> 1.32μs (645% faster)
        assert result is not None
        assert result.name == "DecoratedClass"
    
    def test_class_with_base_from_function_call(self):
        """Test finding class with base class from function call."""
        code = """
class DynamicBase(get_base()):
    pass
"""
        module_tree = ast.parse(code)
        result = _find_class_node_by_name("DynamicBase", module_tree) # 9.43μs -> 1.41μs (567% faster)
        assert result is not None
        assert result.name == "DynamicBase"

class TestFindClassNodeByNameLargeScale:
    """Large-scale tests for performance and scalability."""
    
    def test_find_class_in_large_module_with_many_classes(self):
        """Test finding a class in a module with 100 classes."""
        # Generate code with 100 classes
        class_names = [f"Class{i}" for i in range(100)]
        code_lines = [f"class {name}:\n    pass\n" for name in class_names]
        code = "\n".join(code_lines)
        
        module_tree = ast.parse(code)
        result = _find_class_node_by_name("Class50", module_tree) # 83.1μs -> 7.51μs (1005% faster)
        assert result is not None
        assert result.name == "Class50"
    
    def test_find_class_in_module_with_many_classes_last_element(self):
        """Test finding last class in module with 100 classes."""
        class_names = [f"Class{i}" for i in range(100)]
        code_lines = [f"class {name}:\n    pass\n" for name in class_names]
        code = "\n".join(code_lines)
        
        module_tree = ast.parse(code)
        result = _find_class_node_by_name("Class99", module_tree) # 144μs -> 13.3μs (992% faster)
        assert result is not None
        assert result.name == "Class99"
    
    def test_find_class_in_module_with_many_classes_first_element(self):
        """Test finding first class in module with 100 classes."""
        class_names = [f"Class{i}" for i in range(100)]
        code_lines = [f"class {name}:\n    pass\n" for name in class_names]
        code = "\n".join(code_lines)
        
        module_tree = ast.parse(code)
        result = _find_class_node_by_name("Class0", module_tree) # 16.0μs -> 1.50μs (967% faster)
        assert result is not None
        assert result.name == "Class0"
    
    def test_nonexistent_class_in_large_module(self):
        """Test searching for non-existent class in large module with 200 classes."""
        class_names = [f"Class{i}" for i in range(200)]
        code_lines = [f"class {name}:\n    pass\n" for name in class_names]
        code = "\n".join(code_lines)
        
        module_tree = ast.parse(code)
        result = _find_class_node_by_name("ClassNonExistent", module_tree) # 355μs -> 74.2μs (379% faster)
        assert result is None
    
    def test_class_with_many_methods(self):
        """Test finding class that has many methods (100+)."""
        method_defs = "\n".join([f"    def method{i}(self):\n        pass" for i in range(100)])
        code = f"""
class ClassWithManyMethods:
{method_defs}
"""
        module_tree = ast.parse(code)
        result = _find_class_node_by_name("ClassWithManyMethods", module_tree) # 16.5μs -> 1.41μs (1069% faster)
        assert result is not None
        assert len(result.body) == 100
    
    def test_module_with_mixed_large_content(self):
        """Test finding class in module with 500+ functions and classes mixed."""
        lines = []
        # Generate 250 functions
        for i in range(250):
            lines.append(f"def function{i}():\n    pass\n")
        
        # Generate 250 classes
        for i in range(250):
            lines.append(f"class Class{i}:\n    pass\n")
        
        code = "\n".join(lines)
        module_tree = ast.parse(code)
        
        # Find specific classes
        result_first = _find_class_node_by_name("Class0", module_tree) # 391μs -> 35.0μs (1018% faster)
        result_middle = _find_class_node_by_name("Class125", module_tree)
        result_last = _find_class_node_by_name("Class249", module_tree) # 544μs -> 45.7μs (1093% faster)
        
        assert result_first is not None
        assert result_middle is not None # 704μs -> 57.2μs (1131% faster)
        assert result_last is not None
    
    def test_class_with_many_base_classes(self):
        """Test finding class that inherits from many base classes."""
        base_list = ", ".join([f"Base{i}" for i in range(50)])
        code = f"""
class MultipleInheritance({base_list}):
    pass
"""
        module_tree = ast.parse(code)
        result = _find_class_node_by_name("MultipleInheritance", module_tree) # 13.4μs -> 1.50μs (794% faster)
        assert result is not None
        assert len(result.bases) == 50
    
    def test_module_with_very_long_class_names(self):
        """Test finding class with very long name from pool of long names."""
        long_name1 = "A" * 100
        long_name2 = "B" * 100
        code = f"""
class {long_name1}:
    pass

class {long_name2}:
    pass
"""
        module_tree = ast.parse(code)
        result = _find_class_node_by_name(long_name1, module_tree) # 10.0μs -> 1.39μs (622% faster)
        assert result is not None
        assert result.name == long_name1
⏪ Click to see Replay Tests
Test File::Test Function Original ⏱️ Optimized ⏱️ Speedup
benchmarks/codeflash_replay_tests_260k0cbn/test_tests_benchmarks_test_benchmark_code_extract_code_context__replay_test_0.py::test_codeflash_languages_python_context_code_context_extractor__find_class_node_by_name_test_benchmark_extract 24.5ms 667μs 3565%✅

To test or edit this optimization locally git merge codeflash/optimize-pr1660-2026-03-17T02.10.53

Click to see suggested changes
Suggested change
return next((n for n in ast.walk(module_tree) if isinstance(n, ast.ClassDef) and n.name == class_name), None)
stack = [module_tree]
while stack:
node = stack.pop()
# Only nodes with .body attribute can contain class definitions
body = getattr(node, 'body', None)
if body:
for item in body:
if isinstance(item, ast.ClassDef):
if item.name == class_name:
return item
stack.append(item)
elif isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
stack.append(item)
return None

The optimization replaced recursive calls in `_get_expr_name` with an iterative loop that walks attribute chains once, collecting parts into a list and reversing them only at the end, eliminating function-call overhead that dominated 46% of original runtime (line profiler shows recursive calls at 1154 ns/hit vs. the new loop iterations at ~300 ns/hit). Additionally, `_expr_matches_name` now precomputes `"." + suffix` once instead of building it twice per invocation via f-strings, cutting redundant string allocations. The net 26% runtime improvement comes primarily from avoiding Python's recursion stack and reducing temporary object creation in the hot path, with all tests passing and only minor per-test slowdowns (typically 10–25%) offset by dramatic wins on deep attribute chains (up to 393% faster for 100-level nesting).
…2026-03-17T03.31.35

⚡️ Speed up function `_expr_matches_name` by 26% in PR #1660 (`unstructured-inference`)
@codeflash-ai
Copy link
Copy Markdown
Contributor

codeflash-ai bot commented Mar 17, 2026

This PR is now faster! 🚀 @KRRT7 accepted my optimizations from:

…e_by_name and ancestors

- Fix duplicate type annotation for test_count_cache in optimizer.py
- Replace ast.walk() with stack-based traversal in _find_class_node_by_name (21x speedup)
- Use list instead of deque in CallGraph.ancestors (34% speedup, order doesn't matter for set result)
@KRRT7 KRRT7 merged commit a0a2a85 into main Mar 17, 2026
25 of 28 checks passed
@KRRT7 KRRT7 deleted the unstructured-inference branch March 17, 2026 05:05
KRRT7 added a commit that referenced this pull request Mar 17, 2026
…ve_to bug

Port valuable improvements from #1846 that remain applicable after #1660:
- Cache jedi.Project instances via @cache to avoid recreating across 5 call sites
- Fix unguarded relative_to() in get_code_optimization_context (Windows 8.3 paths)
- Pre-group references by parent function in get_function_sources_from_jedi for O(1) lookup
- Batch TestsCache writes with flush() + executemany instead of per-row commit
- Gracefully disable cache writes on sqlite3.OperationalError
- Build functions_to_optimize_by_name dict for O(1) fallback lookup in process_test_files
- Derive all_defs from all_names via is_definition() to save a redundant Jedi call
KRRT7 added a commit that referenced this pull request Mar 17, 2026
…ve_to bug

Port valuable improvements from #1846 that remain applicable after #1660:
- Cache jedi.Project instances via @cache to avoid recreating across 5 call sites
- Fix unguarded relative_to() in get_code_optimization_context (Windows 8.3 paths)
- Pre-group references by parent function in get_function_sources_from_jedi for O(1) lookup
- Batch TestsCache writes with flush() + executemany instead of per-row commit
- Gracefully disable cache writes on sqlite3.OperationalError
- Build functions_to_optimize_by_name dict for O(1) fallback lookup in process_test_files
- Derive all_defs from all_names via is_definition() to save a redundant Jedi call
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants