This guide explains how to integrate the Enhanced AST Repository (including CFG, DFG, and CPG capabilities) with other ElixirScope components and leverage its advanced features for debugging and analysis.
- Overview of Integration Architecture
- Compile-Time Integration
- Runtime Integration
- Query-Time and Analysis Integration
- AI Components Integration
- Advanced Debugging Features
- Best Practices for Integration
- Troubleshooting Common Issues
The Enhanced AST Repository is central to ElixirScope's advanced analysis and debugging capabilities. It provides a rich static representation of code, namely the AST together with its control-flow graph (CFG), data-flow graph (DFG), and code property graph (CPG), and correlates that representation with dynamic runtime information.
Key Integration Principles:
- AST Node ID: A unique, stable identifier (`module:function:path_hash`) assigned to AST nodes. This ID is the primary key for linking static analysis data with runtime events.
- Compile-Time Analysis: During compilation (or a pre-processing step), code is parsed, AST Node IDs are assigned, CFGs/DFGs/CPGs are generated, and this information is stored in the `EnhancedRepository`.
- Instrumentation: The `EnhancedTransformer` injects calls to `InstrumentationRuntime`, embedding `ast_node_id`s in these calls.
- Runtime Correlation: `InstrumentationRuntime` captures events tagged with `ast_node_id`s. The `RuntimeCorrelator` (and `TemporalBridgeEnhancement`) uses these IDs to link runtime behavior back to the static code structures in the `EnhancedRepository`.
- Querying: The `QueryEngine` (via `ASTExtensions`) can perform correlated queries, joining static properties from the `EnhancedRepository` with runtime event data from `EventStore` or `TemporalStorage`.
- AI Leverage: AI components use the CPG and correlated data from the repository to provide deeper insights, plan instrumentation, and make predictions.
- Source Parsing: Elixir source files (`.ex`, `.exs`) are read.
- AST Generation: `Code.string_to_quoted/2` generates the initial Elixir AST.
- Node ID Assignment:
  - The `ElixirScope.ASTRepository.Parser` (specifically its `assign_node_ids/1` function, or equivalent logic integrated within `ASTAnalyzer`/`ProjectPopulator`) traverses the AST.
  - It uses `ElixirScope.ASTRepository.NodeIdentifier.generate_id_for_current_node/2` to create and assign unique `ast_node_id`s to relevant AST nodes. These IDs are stored in the node's metadata (e.g., `Keyword.put(meta, :ast_node_id, new_id)`).
  - The `NodeIdentifier` aims to keep these IDs stable across non-structural code changes.
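To make the ID scheme concrete, here is a minimal sketch of path-based node ID assignment. It is hypothetical and does not reproduce `NodeIdentifier`: it assumes the `path_hash` part is an `:erlang.phash2/1` hash of the node's child-index path from the root of the function body, and it tags every `{form, meta, args}` node rather than only "relevant" ones. Because the ID depends on a node's position rather than its content, editing a literal leaves the surrounding IDs unchanged.

```elixir
defmodule NodeIdSketch do
  # Hypothetical sketch, not ElixirScope's NodeIdentifier. IDs take the
  # "module:function:path_hash" shape, where path_hash hashes the node's
  # child-index path from the root of the function body.

  @doc "Returns `ast` with an :ast_node_id stored in each call node's metadata."
  def assign_ids(ast, module, function), do: walk(ast, [], module, function)

  # Call-shaped node {form, meta, args}: tag it, then recurse into form and args.
  defp walk({form, meta, args}, path, mod, fun) when is_list(meta) do
    meta = Keyword.put(meta, :ast_node_id, node_id(mod, fun, path))
    form = walk(form, [:form | path], mod, fun)

    args =
      if is_list(args),
        do: walk_list(args, path, mod, fun),
        else: args

    {form, meta, args}
  end

  # Two-element tuples, e.g. keyword pairs such as {:do, body}.
  defp walk({left, right}, path, mod, fun),
    do: {walk(left, [:left | path], mod, fun), walk(right, [:right | path], mod, fun)}

  defp walk(list, path, mod, fun) when is_list(list), do: walk_list(list, path, mod, fun)

  # Literals (atoms, numbers, binaries) carry no metadata.
  defp walk(other, _path, _mod, _fun), do: other

  defp walk_list(list, path, mod, fun) do
    list
    |> Enum.with_index()
    |> Enum.map(fn {child, i} -> walk(child, [i | path], mod, fun) end)
  end

  defp node_id(mod, fun, path),
    do: "#{inspect(mod)}:#{fun}:#{:erlang.phash2(Enum.reverse(path))}"
end
```

For example, `x + 1` and `x + 2` parsed with `Code.string_to_quoted/1` receive identical IDs for the `+` node and for `x`, because only the literal changed.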
Once an AST (potentially with Node IDs) is available for a function:
- CFG Generation:
  - `ElixirScope.ASTRepository.Enhanced.CFGGenerator.generate_cfg(function_ast, opts)` is called.
  - It produces `CFGData.t()` containing nodes, edges, complexity metrics, and path analysis. CFG nodes are linked to the original `ast_node_id`s.
- DFG Generation:
  - `ElixirScope.ASTRepository.Enhanced.DFGGenerator.generate_dfg(function_ast, opts)` is called.
  - It produces `DFGData.t()` using static single assignment (SSA) form, detailing variable definitions, uses, data flows, and phi nodes. DFG elements are also linked to `ast_node_id`s.
- CPG Generation:
  - `ElixirScope.ASTRepository.Enhanced.CPGBuilder.build_cpg(function_ast_or_enhanced_function_data, opts)` is called.
  - It takes the AST (or `EnhancedFunctionData` containing the AST, CFG, and DFG) and unifies them into a `CPGData.t()`.
  - CPG nodes derive primarily from AST nodes, augmented with CFG/DFG information. Edges represent AST structure, control flow, and data flow.
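The complexity metrics carried in `CFGData.t()` are typically derived from the graph's shape. As a minimal sketch (a hypothetical struct, not the real `CFGData` fields), McCabe's cyclomatic complexity for a single connected CFG is E - N + 2, where E is the number of edges and N the number of nodes:

```elixir
defmodule CFGSketch do
  # Hypothetical minimal CFG; the real CFGData.t() carries much more.
  defstruct nodes: [], edges: []

  # McCabe cyclomatic complexity for one connected graph: E - N + 2.
  def cyclomatic_complexity(%__MODULE__{nodes: nodes, edges: edges}),
    do: length(edges) - length(nodes) + 2

  # CFG of `if cond, do: a, else: b`: one decision point, two paths.
  def if_else_example do
    %__MODULE__{
      nodes: [:entry, :cond, :then, :else, :exit],
      edges: [{:entry, :cond}, {:cond, :then}, {:cond, :else}, {:then, :exit}, {:else, :exit}]
    }
  end
end
```

The `if`/`else` graph has 5 edges and 5 nodes, giving a complexity of 2; straight-line code (entry to exit) gives 1.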
- Instrumentation Plan: The `ElixirScope.CompileTime.Orchestrator` (using `AI.CodeAnalyzer` and `AI.PatternRecognizer`) generates an instrumentation plan. This plan specifies which code constructs (functions, expressions, etc.) should be instrumented.
- Mapper: `ElixirScope.ASTRepository.InstrumentationMapper.map_instrumentation_points/2` takes an AST and determines the specific AST nodes that correspond to the plan's targets. It uses `ast_node_id`s for precision.
- Transformer: `ElixirScope.AST.Transformer` (for basic instrumentation) or `ElixirScope.AST.EnhancedTransformer` (for granular "Cinema Data" instrumentation) modifies the AST.
  - It uses `ElixirScope.AST.InjectorHelpers` to generate AST snippets for calls to `ElixirScope.Capture.InstrumentationRuntime`.
  - Crucially, the `ast_node_id` of the instrumented source construct is embedded as an argument in the injected runtime call (e.g., `InstrumentationRuntime.report_ast_function_entry_with_node_id(..., ast_node_id)`).
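The injection step can be pictured with `quote`/`unquote`. The sketch below is hypothetical: `InjectSketch.Runtime.report_entry/1` stands in for an `InstrumentationRuntime` reporting call and simply records events in the process dictionary. It illustrates the key property described above: the `ast_node_id` is a literal baked into the generated code at compile time, so no lookup is needed when the event fires.

```elixir
defmodule InjectSketch do
  # Hypothetical stand-in for InstrumentationRuntime: records each reported
  # event in the calling process's dictionary.
  defmodule Runtime do
    def report_entry(ast_node_id) do
      Process.put(:events, [{:entry, ast_node_id} | Process.get(:events, [])])
      :ok
    end
  end

  # Returns a new body AST that reports `ast_node_id` before running `body`.
  # The ID is inserted as a literal, so it is fixed at compile time.
  def instrument_body(body, ast_node_id) when is_binary(ast_node_id) do
    quote do
      InjectSketch.Runtime.report_entry(unquote(ast_node_id))
      unquote(body)
    end
  end
end
```

Evaluating the instrumented version of `1 + 2` with `Code.eval_quoted/1` still returns 3, and the process dictionary then holds one `{:entry, id}` event.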
- Initial Population:
  - `ElixirScope.ASTRepository.Enhanced.ProjectPopulator.populate_project(repo_pid, project_path, opts)` discovers all relevant Elixir files.
  - For each file, it parses the AST, invokes `ASTAnalyzer` (which implicitly handles Node ID assignment via `Parser` or `NodeIdentifier`), and then triggers CFG, DFG, and (optionally) CPG generation.
  - The resulting `EnhancedModuleData` (containing `EnhancedFunctionData` entries with their respective graphs) is stored in the `EnhancedRepository`.
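A rough sketch of the discovery-and-parse loop follows. It is hypothetical and not `ProjectPopulator`'s implementation: it assumes sources live under `lib/` and `test/`, and instead of building `EnhancedFunctionData` it only collects `{name, arity}` keys for each `def`/`defp`.

```elixir
defmodule PopulateSketch do
  # Hypothetical discovery + parsing loop; not ProjectPopulator's API.

  # Finds .ex/.exs files under lib/ and test/ (an assumed project layout).
  def discover(project_path) do
    project_path
    |> Path.join("{lib,test}/**/*.{ex,exs}")
    |> Path.wildcard()
  end

  # Reads and parses one file, keeping line and column metadata.
  def parse_file(path) do
    with {:ok, source} <- File.read(path) do
      Code.string_to_quoted(source, file: path, columns: true)
    end
  end

  # Collects {name, arity} for every def/defp in the AST, in source order.
  def function_keys(ast) do
    {_ast, keys} =
      Macro.prewalk(ast, [], fn
        {kind, _meta, [head | _]} = node, acc when kind in [:def, :defp] ->
          {node, [head_key(head) | acc]}

        node, acc ->
          {node, acc}
      end)

    Enum.reverse(keys)
  end

  # Strips a `when` guard, then reads the name and argument count.
  defp head_key({:when, _, [head | _]}), do: head_key(head)
  defp head_key({name, _, args}), do: {name, length(List.wrap(args))}
end
```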
- Continuous Synchronization:
  - `ElixirScope.ASTRepository.Enhanced.FileWatcher` monitors project files for changes.
  - Upon detecting a change (create, modify, delete), it notifies `ElixirScope.ASTRepository.Enhanced.Synchronizer`.
  - The `Synchronizer` re-parses and re-analyzes the changed file(s) and updates the `EnhancedRepository` incrementally.
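Change classification for incremental updates can be sketched as a diff between two snapshots of file content hashes. This is a hypothetical, polling-style illustration of the idea; `FileWatcher` may well react to file-system events instead. Each resulting `{kind, path}` pair is what would be handed to the re-analysis step.

```elixir
defmodule SyncSketch do
  # Hypothetical: a snapshot maps each path to a hash of its contents.
  def snapshot(paths), do: Map.new(paths, fn p -> {p, :erlang.md5(File.read!(p))} end)

  # Classifies changes between two snapshots, sorted for deterministic output.
  def diff(old, new) do
    created = for {p, _} <- new, not Map.has_key?(old, p), do: {:created, p}
    deleted = for {p, _} <- old, not Map.has_key?(new, p), do: {:deleted, p}
    modified = for {p, h} <- new, Map.has_key?(old, p), old[p] != h, do: {:modified, p}
    Enum.sort(created ++ modified ++ deleted)
  end
end
```

Only `:created` and `:modified` paths need re-parsing; `:deleted` paths only need their entries removed from the repository.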
- Instrumented code, when executed, calls functions in `ElixirScope.Capture.InstrumentationRuntime` (e.g., `report_ast_function_entry_with_node_id`, `report_ast_variable_snapshot`).
- These calls include the `ast_node_id` (embedded at compile time) and the current `correlation_id` (managed by `InstrumentationRuntime`'s call stack).
- `InstrumentationRuntime` forwards these events, now tagged with `ast_node_id` and `correlation_id`, to the event ingestion pipeline (e.g., `Ingestor` -> `RingBuffer`).
`ElixirScope.ASTRepository.RuntimeCorrelator` is responsible for the primary link between runtime events and static AST data:
- `correlate_event_to_ast(repo, event)`: Given a runtime event containing `module`, `function`, `arity`, and potentially a `line_number` or an explicit `ast_node_id`, this function queries the `EnhancedRepository` to find the corresponding static `ast_context` (including the canonical `ast_node_id`, CPG info, etc.).
- `get_runtime_context(repo, event)`: Provides a more comprehensive context, including variable scope and call hierarchy, by leveraging the CFG/DFG data associated with the correlated AST node.
- `enhance_event_with_ast(repo, event)`: Augments a raw runtime event with rich `ast_context`, structural info, and data flow info.
- `build_execution_trace(repo, events)`: Constructs an AST-aware trace showing the sequence of AST nodes executed and the related variable states.
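The lookup behind `correlate_event_to_ast/2` can be sketched as a two-level index: an exact match on `ast_node_id` when the event carries one, otherwise a fallback on `{module, function, arity, line}`. The index shape and `CorrelateSketch` module are hypothetical; the real function queries the `EnhancedRepository` rather than a plain map.

```elixir
defmodule CorrelateSketch do
  # Hypothetical index:
  #   by_id:       %{ast_node_id => ast_context}
  #   by_location: %{{module, function, arity, line} => ast_node_id}

  # Prefer an explicit ast_node_id embedded by the instrumentation.
  def correlate(%{ast_node_id: id}, index) when is_binary(id), do: Map.fetch(index.by_id, id)

  # Otherwise fall back to the source location reported with the event.
  def correlate(%{module: m, function: f, arity: a, line: l}, index) do
    with {:ok, id} <- Map.fetch(index.by_location, {m, f, a, l}) do
      Map.fetch(index.by_id, id)
    end
  end

  def correlate(_event, _index), do: :error

  # Mirrors the idea of enhance_event_with_ast: attach context when found.
  def enhance(event, index) do
    case correlate(event, index) do
      {:ok, ctx} -> Map.put(event, :ast_context, ctx)
      :error -> event
    end
  end
end
```

An event whose `ast_node_id` is `nil` skips the first clause (the guard fails) and falls through to the location-based lookup.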
- Events captured by `InstrumentationRuntime` (now potentially including `ast_node_id`) are passed to `ElixirScope.Capture.Ingestor`.
- The `Ingestor` writes these events to the `RingBuffer`.
- `AsyncWriterPool` processes events from the `RingBuffer` and sends them to `ElixirScope.Storage.EventStore` (via `ElixirScope.Storage.DataAccess`). The `EventStore` should be able to index events by `ast_node_id` and `correlation_id`.
- `ElixirScope.Capture.TemporalBridge` consumes events (potentially directly from `InstrumentationRuntime` or from `EventStore`) and stores them in `ElixirScope.Capture.TemporalStorage`, which also indexes them by `timestamp`, `ast_node_id`, and `correlation_id`.
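The buffering and indexing steps can be illustrated with a bounded queue drained into secondary indexes. Everything here is hypothetical: `IngestSketch` is neither `RingBuffer` nor `EventStore`, and its overflow policy (drop the oldest event) is only one possible choice.

```elixir
defmodule IngestSketch do
  # Hypothetical bounded buffer standing in for RingBuffer.
  defstruct capacity: 1024, size: 0, queue: :queue.new()

  def new(capacity) when capacity > 0, do: %__MODULE__{capacity: capacity}

  # Room left: enqueue at the back.
  def push(%__MODULE__{size: size, capacity: cap} = buf, event) when size < cap,
    do: %{buf | size: size + 1, queue: :queue.in(event, buf.queue)}

  # Full: drop the oldest event to make room (one possible overflow policy).
  def push(%__MODULE__{} = buf, event) do
    {_dropped, rest} = :queue.out(buf.queue)
    %{buf | queue: :queue.in(event, rest)}
  end

  # Removes all buffered events, oldest first, as a writer pool would.
  def drain(%__MODULE__{} = buf),
    do: {:queue.to_list(buf.queue), %{buf | size: 0, queue: :queue.new()}}

  # Secondary indexes like those the EventStore is expected to maintain.
  def index(events) do
    %{
      by_ast_node_id: Enum.group_by(events, & &1.ast_node_id),
      by_correlation_id: Enum.group_by(events, & &1.correlation_id)
    }
  end
end
```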
- The `ElixirScope.ASTRepository.Enhanced.Repository` provides direct APIs to fetch `EnhancedModuleData`, `EnhancedFunctionData`, and specific graphs (CFG, DFG, CPG).
- For more complex static queries (e.g., "find all functions with cyclomatic complexity > 10 that call `Ecto.Repo.all/2`"), use `ElixirScope.ASTRepository.QueryBuilder` to construct a query specification.
- This specification is then passed to `ElixirScope.ASTRepository.QueryExecutor.execute_query/2` (or directly to `EnhancedRepository.query_analysis/2`), which processes it against the repository's data.
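For intuition, the example query above can be expressed as a small filter specification evaluated over function records. The spec format and `QuerySketch` are hypothetical and far simpler than `QueryBuilder`/`QueryExecutor`; the `calls` field is assumed to hold `{module, function, arity}` tuples.

```elixir
defmodule QuerySketch do
  # Hypothetical spec: %{where: [{field, op, value}]}; every filter must hold.
  def execute(functions, %{where: filters}) do
    Enum.filter(functions, fn f -> Enum.all?(filters, &matches?(f, &1)) end)
  end

  defp matches?(f, {:complexity, :gt, n}), do: f.complexity > n
  defp matches?(f, {:calls, :includes, mfa}), do: mfa in f.calls
end
```

With the spec `%{where: [{:complexity, :gt, 10}, {:calls, :includes, {Ecto.Repo, :all, 2}}]}`, a function with complexity 15 that never calls `Ecto.Repo.all/2` is excluded, as is a simple function that does.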
- `ElixirScope.QueryEngine.ASTExtensions.execute_ast_query(query)` queries static data from the `EnhancedRepository`.
- `ElixirScope.QueryEngine.ASTExtensions.execute_correlated_query(static_query, runtime_query_template, join_key)` is the core function for combining static and dynamic data:
  - It first executes the `static_query` against the `EnhancedRepository` to get a set of static elements (e.g., functions matching certain criteria).
  - It extracts `join_key` values (e.g., `ast_node_id`s or `function_key`s) from the static results.
  - It uses these values to parameterize and execute `runtime_query_template` against the `EventStore` (via `QueryEngine.Engine`).
  - Finally, it joins the static results with the runtime events.
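The flow of `execute_correlated_query/3` (static query, keyed runtime fetch, join) can be sketched with plain maps. `CorrelatedQuerySketch.run/3` and its `fetch_events` callback are hypothetical; the callback stands in for executing `runtime_query_template` against the `EventStore`, and the static query is assumed to have already run.

```elixir
defmodule CorrelatedQuerySketch do
  # static_results: maps returned by the (already executed) static query.
  # fetch_events:   fn keys -> runtime events whose join_key is in keys.
  def run(static_results, fetch_events, join_key) do
    # Extract the distinct join-key values from the static side.
    keys = static_results |> Enum.map(&Map.fetch!(&1, join_key)) |> Enum.uniq()

    # One parameterized runtime fetch, grouped by key for the join.
    events_by_key = keys |> fetch_events.() |> Enum.group_by(&Map.fetch!(&1, join_key))

    # Join: attach the matching events (possibly none) to each static result.
    Enum.map(static_results, fn s ->
      Map.put(s, :runtime_events, Map.get(events_by_key, Map.fetch!(s, join_key), []))
    end)
  end
end
```

Fetching once with the full key set, rather than once per static result, keeps the runtime side to a single query.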
`ElixirScope.Capture.TemporalBridgeEnhancement` uses `RuntimeCorrelator` and `EnhancedRepository` to provide AST-aware time-travel features:
- `reconstruct_state_with_ast(...)`: Reconstructs process state at a timestamp and enriches it with the AST/CPG context of the code executing at that time.
- `get_ast_execution_trace(...)`: Shows the sequence of AST nodes traversed during an execution segment, correlating them with runtime events and state changes.
- `get_states_for_ast_node(...)`: Enables "semantic stepping" by finding all runtime states associated with a particular `ast_node_id`.
- `get_execution_flow_between_nodes(...)`: Visualizes the runtime path taken between two points in the static code structure.
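Two of these operations reduce to simple queries over timestamped events. The sketch below is hypothetical: the event fields `pid`, `timestamp`, `ast_node_id`, and `state` are assumptions, not the `TemporalStorage` schema. Semantic stepping collects the states recorded at one `ast_node_id`, and state reconstruction takes the latest snapshot at or before a timestamp.

```elixir
defmodule TimeTravelSketch do
  # Events are assumed to be maps with :pid, :timestamp, :ast_node_id and,
  # for state snapshots, :state.

  # "Semantic stepping": every recorded state at one AST node, oldest first.
  def states_for_node(events, ast_node_id) do
    events
    |> Enum.filter(&(&1.ast_node_id == ast_node_id and Map.has_key?(&1, :state)))
    |> Enum.sort_by(& &1.timestamp)
    |> Enum.map(&{&1.timestamp, &1.state})
  end

  # State reconstruction: latest snapshot for `pid` at or before time `t`,
  # together with the AST node that was executing when it was taken.
  def state_at(events, pid, t) do
    events
    |> Enum.filter(&(&1.pid == pid and &1.timestamp <= t and Map.has_key?(&1, :state)))
    |> Enum.max_by(& &1.timestamp, fn -> nil end)
    |> case do
      nil -> :error
      event -> {:ok, event.state, event.ast_node_id}
    end
  end
end
```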
The `ElixirScope.AI.Bridge` module serves as the primary interface for AI components:
- `ElixirScope.AI.Bridge.get_function_cpg_for_ai(function_key, ...)`: Fetches the CPG for a function, which is a rich input for many AI models.
- `ElixirScope.AI.Bridge.find_cpg_nodes_for_ai_pattern(pattern_dsl, ...)`: Lets AI components query for specific code structures using a CPG pattern.
- `ElixirScope.AI.Bridge.get_correlated_features_for_ai(...)`: Extracts a combined feature set (static CPG properties plus dynamic runtime summaries) for AI models, especially for `PredictiveAnalyzer`.
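A combined static-plus-dynamic feature set might look like the following. The feature names and event fields (`duration_us`, `error`) are illustrative assumptions, not what `get_correlated_features_for_ai` actually returns.

```elixir
defmodule FeatureSketch do
  # static: summary of a function's CPG; events: its correlated runtime events.
  def features(static, events) do
    durations = for %{duration_us: d} <- events, do: d
    calls = length(events)

    %{
      # Static side, e.g. taken from the function's CPG/CFG.
      complexity: static.complexity,
      cpg_node_count: static.cpg_node_count,
      # Dynamic side, summarized from runtime events.
      call_count: calls,
      mean_duration_us: if(durations == [], do: 0.0, else: Enum.sum(durations) / length(durations)),
      error_rate: if(calls == 0, do: 0.0, else: Enum.count(events, &Map.has_key?(&1, :error)) / calls)
    }
  end
end
```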
- `ElixirScope.AI.Analysis.IntelligentCodeAnalyzer` uses ASTs (and potentially CPGs via `AI.Bridge`) to perform semantic analysis and quality assessment and to suggest refactorings.
- `ElixirScope.AI.ComplexityAnalyzer` analyzes ASTs/CPGs for various complexity metrics.
- `ElixirScope.AI.PatternRecognizer` uses ASTs/CPGs to identify OTP patterns, Phoenix structures, and other architectural elements.
- `ElixirScope.ASTRepository.PatternMatcher` provides a dedicated service for matching AST patterns, behavioral patterns, and anti-patterns against the `EnhancedRepository`.
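Pattern recognition over ASTs often starts with a traversal that collects nodes of a given shape. As a hypothetical example (not the `PatternMatcher` API), this finds remote calls such as `GenServer.call/2,3`, a common signal when recognizing OTP client code:

```elixir
defmodule PatternSketch do
  # Returns {line, arity} for each remote call Mod.fun(...) whose alias parts
  # (e.g. [:GenServer]) and function name match the given values.
  def find_remote_calls(ast, alias_parts, fun) do
    {_ast, hits} =
      Macro.prewalk(ast, [], fn
        {{:., _, [{:__aliases__, _, ^alias_parts}, ^fun]}, meta, args} = node, acc ->
          {node, [{meta[:line], length(args)} | acc]}

        node, acc ->
          {node, acc}
      end)

    Enum.reverse(hits)
  end
end
```

Matching on `__aliases__` only sees the alias as written; a call made through `alias Foo.Bar, as: Baz` would need alias resolution first.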
`ElixirScope.AI.Predictive.ExecutionPredictor` uses historical data (runtime events correlated with static features via `AI.Bridge`) to train models that predict execution paths, resource usage, and concurrency impacts.
- `ElixirScope.AI.LLM.Client` uses the configured LLM provider.
- `ElixirScope.AI.Bridge.query_llm_with_cpg_context(...)` shows a pattern in which CPG data (e.g., code snippets, complexity) enriches prompts sent to an LLM for code understanding or suggestions.
These features are primarily managed by `ElixirScope.Capture.EnhancedInstrumentation` and leverage the `RuntimeCorrelator` and `EnhancedRepository`.
Structural Breakpoints:
- Setup: `EnhancedInstrumentation.set_structural_breakpoint(spec)` defines a breakpoint based on an AST pattern (e.g., a specific function call signature or a type of loop). `spec` includes the AST `pattern`, a `condition` (e.g., `:pattern_match_failure`), and an `ast_path`.
- Runtime:
  - When `InstrumentationRuntime` reports an event (e.g., `report_enhanced_function_entry`), it includes the `ast_node_id`.
  - `EnhancedInstrumentation` (or `RuntimeCorrelator` on its behalf) checks whether the AST node associated with the `ast_node_id` (fetched from `EnhancedRepository`) matches any active structural breakpoint pattern.
  - If the pattern matches and the condition is met, the breakpoint "triggers" (e.g., logs, or pauses execution via a debugger interface).
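The runtime check can be sketched as: resolve the event's `ast_node_id` to its AST node, then test each active breakpoint's pattern and condition. In this hypothetical sketch a breakpoint's pattern is an ordinary predicate function and only two made-up conditions (`:always`, `:pattern_match_failure`) are modeled; the actual `spec` format of `set_structural_breakpoint/1` may differ.

```elixir
defmodule StructuralBreakpointSketch do
  # breakpoints: [%{id: term, matcher: (ast -> boolean), condition: atom}]
  # ast_by_id:   %{ast_node_id => ast}, standing in for EnhancedRepository.
  def triggered(event, ast_by_id, breakpoints) do
    case Map.fetch(ast_by_id, event.ast_node_id) do
      {:ok, node} ->
        for bp <- breakpoints, bp.matcher.(node), condition_met?(bp.condition, event), do: bp.id

      :error ->
        []
    end
  end

  defp condition_met?(:always, _event), do: true

  # Assumes the event carries the outcome under :result.
  defp condition_met?(:pattern_match_failure, event),
    do: match?({:error, %MatchError{}}, event[:result])
end
```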
Data Flow Breakpoints:
- Setup: `EnhancedInstrumentation.set_data_flow_breakpoint(spec)` defines a breakpoint on a `variable` name, an `ast_path` (scope), and `flow_conditions` (e.g., `:assignment`, `:function_call`).
- Runtime:
  - Requires DFG information from the `EnhancedRepository` for the relevant function.
  - When `InstrumentationRuntime.report_enhanced_variable_snapshot` is called, `EnhancedInstrumentation` checks whether the snapshot involves the watched `variable`.
  - It then uses the DFG to determine whether the current `ast_node_id` and the state of the variable satisfy the `flow_conditions` within the specified `ast_path`.
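A hypothetical version of the data-flow check is shown below. It models the DFG as a map from `ast_node_id` to the `{variable, flow_kind}` facts at that node and approximates the `ast_path` scope with an ID prefix such as `"MyApp.Cart:sum:"`; both are simplifying assumptions, not the `DFGData.t()` structure.

```elixir
defmodule DataFlowBreakpointSketch do
  # spec:     %{variable: atom, scope_prefix: String.t(), flow_conditions: [atom]}
  # snapshot: %{ast_node_id: String.t(), variables: %{atom => term}}
  # dfg:      %{ast_node_id => [{variable, flow_kind}]}
  def hit?(spec, snapshot, dfg) do
    # Scope check: the snapshot was taken inside the watched function.
    in_scope? = String.starts_with?(snapshot.ast_node_id, spec.scope_prefix)
    # The snapshot actually involves the watched variable.
    watched? = Map.has_key?(snapshot.variables, spec.variable)

    # The DFG says the variable flows through this node in a watched way.
    flow_ok? =
      dfg
      |> Map.get(snapshot.ast_node_id, [])
      |> Enum.any?(fn {var, kind} -> var == spec.variable and kind in spec.flow_conditions end)

    in_scope? and watched? and flow_ok?
  end
end
```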
Semantic Watchpoints:
- Setup: `EnhancedInstrumentation.set_semantic_watchpoint(spec)` defines a watchpoint on a `variable` within an `ast_scope`, tracking its value changes as it flows through the AST constructs listed in `track_through` (e.g., `:pattern_match`, `:function_call`).
- Runtime:
  - Leverages CPG data from the `EnhancedRepository`.
  - When `InstrumentationRuntime.report_enhanced_variable_snapshot` occurs, `EnhancedInstrumentation` checks whether the snapshot is within the `ast_scope` and involves the watched `variable`.
  - It uses the CPG's data flow edges and AST structure to determine whether the variable's current state change is part of a tracked semantic flow.
  - A value history is maintained for the watchpoint.
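The value-history part of a semantic watchpoint can be sketched as follows. `WatchpointSketch` is hypothetical: its scope is approximated by a set of node IDs (in the design described above it would be derived from CPG data-flow edges), and a snapshot is recorded only when it is in scope and the watched value actually changed.

```elixir
defmodule WatchpointSketch do
  # scope_ids: MapSet of ast_node_ids considered part of the watched flow.
  # history:   newest first, entries {timestamp, ast_node_id, value}.
  defstruct [:variable, scope_ids: MapSet.new(), history: []]

  def record(%__MODULE__{} = wp, %{ast_node_id: id, variables: vars, timestamp: t}) do
    with true <- MapSet.member?(wp.scope_ids, id),
         {:ok, value} <- Map.fetch(vars, wp.variable),
         false <- unchanged?(wp.history, value) do
      %{wp | history: [{t, id, value} | wp.history]}
    else
      # Out of scope, variable absent, or value unchanged: keep as is.
      _ -> wp
    end
  end

  defp unchanged?([{_t, _id, last} | _], value), do: last == value
  defp unchanged?([], _value), do: false
end
```

Feeding snapshots through `Enum.reduce/3` yields a compact history of distinct values, skipping repeats and out-of-scope observations.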
- AST Node ID Consistency: Ensure the `NodeIdentifier` logic is robust and consistently applied by `Parser`/`ASTAnalyzer` and used by `EnhancedTransformer`. This is the bedrock of correlation.
- Repository Availability: Ensure the `EnhancedRepository` (and its GenServer process) is started and available before compile-time tasks (`Mix.Tasks.Compile.ElixirScope`) or runtime components (`RuntimeCorrelator`, `TemporalBridgeEnhancement`) that depend on it.
- Configuration: Use `ElixirScope.Config` and `ElixirScope.ASTRepository.Config` for centralized configuration.
- Asynchronous Operations: For performance, potentially slow interactions (e.g., full CPG generation, complex AI analysis) should run asynchronously or in background tasks, especially when triggered by runtime events.
- Caching: Leverage the caching mechanisms provided by `QueryBuilder` and `MemoryManager` for frequently accessed static data or query results.
- Error Handling: Implement robust error handling for API calls between components (e.g., when `RuntimeCorrelator` queries `EnhancedRepository`).
- Incremental Updates: Use `FileWatcher` and `Synchronizer` for efficient incremental updates to the `EnhancedRepository`, keeping static analysis fresh without full project re-scans.
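For the error-handling point, a common Elixir pattern is to convert the exit raised by a failed or timed-out `GenServer.call/3` into an error tuple. The wrapper below is a generic sketch, not an ElixirScope API; `server` would be whatever name or pid the `EnhancedRepository` process is registered under, and the `:repository_unavailable` tag is a made-up convention.

```elixir
defmodule SafeCallSketch do
  # Calls a GenServer, turning exits (no such process, timeout, crash during
  # the call) into {:error, {:repository_unavailable, reason}} so the caller,
  # e.g. a correlator, degrades gracefully instead of crashing.
  def safe_call(server, request, timeout \\ 5_000) do
    GenServer.call(server, request, timeout)
  catch
    :exit, reason -> {:error, {:repository_unavailable, reason}}
  end
end
```

Calling it with an unregistered name returns `{:error, {:repository_unavailable, {:noproc, _}}}` rather than exiting, which also covers the "repository not started yet" case from the availability point above.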
- No Correlation Data:
  - Verify that `ast_node_id`s are correctly assigned during parsing and injected during transformation.
  - Ensure `RuntimeCorrelator` is running and correctly configured with the `EnhancedRepository`.
  - Check whether `InstrumentationRuntime` is reporting events with `ast_node_id`s.
- Slow Performance:
  - Analysis Time: Profile `ASTAnalyzer` and the graph generators (CFG, DFG, CPG). Consider optimizing their algorithms or enabling lazy generation for parts of the CPG.
  - Query Time: Use `QueryBuilder.get_optimization_hints()` and `QueryEngine.Engine.get_optimization_suggestions()`. Ensure `EnhancedRepository` indexes are effective. Check `MemoryManager` cache hit rates.
  - Runtime Overhead: Reduce instrumentation granularity or the sampling rate via `ElixirScope.Config`.
- AST Node ID Mismatches:
  - Ensure the same `NodeIdentifier` logic is used consistently.
  - If code is refactored, `ast_node_id`s may change. The repository might need mechanisms to map old IDs to new ones or to version AST data.
- `EnhancedRepository` Not Populated:
  - Ensure `ProjectPopulator.populate_project/3` has been run successfully.
  - Check logs from `FileWatcher` and `Synchronizer` for any errors during file processing.
- AI Components Not Working:
  - Verify that `AI.Bridge` can access `EnhancedRepository` and `QueryEngine`.
  - Check logs from the specific AI component for errors (e.g., LLM API errors, model loading issues).
  - Ensure CPGs (if required by the AI component) are being generated and are accessible.
- Out-of-Memory Errors:
  - Monitor `MemoryManager` statistics. Adjust its thresholds or the `EnhancedRepository`'s memory limits.
  - Profile CPG generation and storage, as CPGs can be large. Consider lazy loading or partial CPGs for very large functions/modules.