Skip to content

Add explain to AnalyticsExecutionEngine#5442

Open
finnegancarroll wants to merge 1 commit into
opensearch-project:mainfrom
finnegancarroll:analytics-engine-explain
Open

Add explain to AnalyticsExecutionEngine#5442
finnegancarroll wants to merge 1 commit into
opensearch-project:mainfrom
finnegancarroll:analytics-engine-explain

Conversation

@finnegancarroll
Copy link
Copy Markdown
Contributor

@finnegancarroll finnegancarroll commented May 14, 2026

Description

Connects SQL plugin AnalyticsExecutionEngine explain API with analytics engine executeWithProfile path. Providing per execution stage profiling for queries executed on this endpoint.

Note that _explain API on a an analytics engine index will provide the logical plan, execute the query, and provide profiling information for each execution stage. In contrast non analytics engine indices will only return the logical plan.

Testing

1. Publish SQL Plugin to Maven Local

./gradlew :opensearch-sql-plugin:publishToMavenLocal

2. Start Single-Node Cluster

./gradlew run -Dsandbox.enabled=true \
  -PinstalledPlugins="['opensearch-job-scheduler:3.7.0.0-SNAPSHOT', 'arrow-flight-rpc', 'analytics-engine', 'parquet-data-format', 'analytics-backend-datafusion', 'analytics-backend-lucene', 'composite-engine', 'opensearch-sql-plugin:3.7.0.0-SNAPSHOT']" \
  -Dtests.jvm.argline="-Dopensearch.experimental.feature.pluggable.dataformat.enabled=true -Dopensearch.experimental.feature.transport.stream.enabled=true"

3. Create Parquet-Backed Index

curl -X PUT "localhost:9200/test_parquet" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index.number_of_shards": 2,
    "index.number_of_replicas": 0,
    "index.pluggable.dataformat.enabled": true,
    "index.pluggable.dataformat": "composite"
  },
  "mappings": {
    "properties": {
      "name": {"type": "keyword"},
      "age": {"type": "integer"},
      "score": {"type": "double"}
    }
  }
}'

4. Ingest Sample Data

curl -X POST "localhost:9200/test_parquet/_bulk" -H 'Content-Type: application/x-ndjson' -d'
{"index":{}}
{"name":"alice","age":30,"score":95.5}
{"index":{}}
{"name":"bob","age":25,"score":87.3}
{"index":{}}
{"name":"carol","age":35,"score":91.0}
'

curl -X POST "localhost:9200/test_parquet/_refresh"

5. Run Explain Query (SQL Plugin Path)

curl -s -X POST "localhost:9200/_plugins/_ppl/_explain" \
  -H 'Content-Type: application/json' \
  -d '{"query": "source = test_parquet | stats avg(score) by name"}'

Result

{
  "calcite": {
    "logical": "LogicalSystemLimit(fetch=[10000], type=[QUERY_SIZE_LIMIT])\n  LogicalProject(avg(score)=[$1], name=[$0])\n    LogicalAggregate(group=[{0}], avg(score)=[AVG($1)])\n      LogicalProject(name=[$1], score=[$2])\n        LogicalTableScan(table=[[opensearch, test_parquet]])\n",
    "profile": {
      "queryId": "8ffee8b8-944b-4fcd-9f71-6f4601f8034b",
      "fullPlan": [
        "OpenSearchSort(fetch=[10000], viableBackends=[[datafusion]])",
        "  OpenSearchProject(avg(score)=[$1], name=[$0], viableBackends=[[datafusion]])",
        "    OpenSearchProject(name=[$0], avg(score)=[ANNOTATED_PROJECT_EXPR(id=2, backends=[datafusion], /($1, $2))], viableBackends=[[datafusion]])",
        "      OpenSearchAggregate(group=[{0}], agg#0=[SUM(...)], agg#1=[COUNT(...)], mode=[FINAL], viableBackends=[[datafusion]])",
        "        OpenSearchExchangeReducer(viableBackends=[[datafusion]])",
        "          OpenSearchAggregate(group=[{0}], agg#0=[SUM(...)], agg#1=[COUNT(...)], mode=[PARTIAL], viableBackends=[[datafusion]])",
        "            OpenSearchProject(name=[$1], score=[$2], viableBackends=[[datafusion]])",
        "              OpenSearchTableScan(table=[[test_parquet]], viableBackends=[[datafusion]])"
      ],
      "totalElapsedMs": 220,
      "stages": [
        {
          "stageId": 0,
          "executionType": "SHARD_FRAGMENT",
          "distribution": "SINGLETON",
          "state": "SUCCEEDED",
          "elapsedMs": 217,
          "rowsProcessed": 3,
          "tasksCompleted": 2,
          "tasksFailed": 0,
          "fragment": [
            "OpenSearchAggregate(..., mode=[PARTIAL], ...)",
            "  OpenSearchProject(name=[$1], score=[$2], ...)",
            "    OpenSearchTableScan(table=[[test_parquet]], ...)"
          ],
          "tasks": [
            {
              "partitionId": 0,
              "node": "953slSrWQiaMqNimQGKTfg/shard[0]",
              "state": "FINISHED",
              "elapsedMs": 216
            },
            {
              "partitionId": 1,
              "node": "953slSrWQiaMqNimQGKTfg/shard[1]",
              "state": "FINISHED",
              "elapsedMs": 206
            }
          ]
        },
        {
          "stageId": 1,
          "executionType": "COORDINATOR_REDUCE",
          "state": "SUCCEEDED",
          "elapsedMs": 3,
          "rowsProcessed": 0,
          "tasksCompleted": 0,
          "tasksFailed": 0,
          "fragment": [
            "OpenSearchSort(fetch=[10000], ...)",
            "  OpenSearchProject(avg(score)=[$1], name=[$0], ...)",
            "    OpenSearchAggregate(..., mode=[FINAL], ...)",
            "      OpenSearchExchangeReducer(...)",
            "        OpenSearchStageInputScan(childStageId=[0], ...)"
          ],
          "tasks": []
        }
      ]
    }
  }
}

Related Issues

N/A

Check List

- [ ] New functionality includes testing. <- Tested locally, no IT suite for analytics plugin at this time

  • New functionality has been documented. <- TODO as follow up if design does not change
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 14, 2026

PR Reviewer Guide 🔍

(Review updated until commit e613538)

Here are some key observations to aid the review process:

🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Query Execution Side Effect

The explain method now executes the query via executeWithProfile to gather profiling data. This means every explain request will actually run the query against the data, potentially causing performance impact, resource consumption, and side effects if the query modifies state. Traditional explain operations only analyze the query plan without execution. This behavior should be intentional and documented, as users typically expect explain to be a read-only, lightweight operation.

planExecutor.executeWithProfile(
    plan,
    null,
    new ActionListener<>() {
      @Override
      public void onResponse(ProfiledResult result) {
        try {
          QueryProfile profile = result.profile();
          ExplainResponse response =
              new ExplainResponse(new ExplainResponseNodeV2(logical, null, null, profile));
          listener.onResponse(ExplainResponse.normalizeLf(response));
        } catch (Exception e) {
          listener.onFailure(e);
        }
      }

      @Override
      public void onFailure(Exception e) {
        // Fall back to plan-only explain if profiling fails
        try {
          ExplainResponse response =
              new ExplainResponse(new ExplainResponseNodeV2(logical, null, null, null));
          listener.onResponse(ExplainResponse.normalizeLf(response));
        } catch (Exception ex) {
          listener.onFailure(ex);
        }
      }
    });
Silent Failure Masking

When executeWithProfile fails in the onFailure callback (lines 138-147), the code falls back to returning a plan-only explain response without the profile. This silently masks execution failures that could indicate real problems (e.g., invalid query, resource exhaustion, permission issues). Users receive a successful response with no indication that profiling failed or why, making debugging difficult.

public void onFailure(Exception e) {
  // Fall back to plan-only explain if profiling fails
  try {
    ExplainResponse response =
        new ExplainResponse(new ExplainResponseNodeV2(logical, null, null, null));
    listener.onResponse(ExplainResponse.normalizeLf(response));
  } catch (Exception ex) {
    listener.onFailure(ex);
  }
}

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 14, 2026

PR Code Suggestions ✨

Latest suggestions up to e613538

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
General
Log profiling failure before fallback

The original exception e from profiling failure is silently ignored. Consider
logging the profiling failure before falling back to plan-only explain to aid
debugging and monitoring of profiling issues.

core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [137-148]

 @Override
 public void onFailure(Exception e) {
   // Fall back to plan-only explain if profiling fails
+  // TODO: Add logging for profiling failure: e
   try {
     ExplainResponse response =
         new ExplainResponse(new ExplainResponseNodeV2(logical, null, null, null));
     listener.onResponse(ExplainResponse.normalizeLf(response));
   } catch (Exception ex) {
     listener.onFailure(ex);
   }
 }
Suggestion importance[1-10]: 7

__

Why: Valid suggestion to log the profiling failure exception e before falling back to plan-only explain. This would improve debugging and monitoring capabilities. However, it's a TODO comment rather than actual implementation, and logging suggestions typically have moderate impact.

Medium
Verify null parameter handling

The second parameter null passed to executeWithProfile may cause issues if the
method expects a non-null value. Verify that passing null is intentional and
properly handled by the executeWithProfile method, or provide an appropriate default
value.

core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [121-124]

+// Verify that null is acceptable for the second parameter
 planExecutor.executeWithProfile(
     plan,
-    null,
+    /* context or parameters */ null,
     new ActionListener<>() {
Suggestion importance[1-10]: 6

__

Why: The suggestion correctly identifies that null is passed as the second parameter to executeWithProfile. While this is a valid concern for verification, the suggestion only asks to verify rather than identifying a concrete issue, and the PR context doesn't show this causing problems.

Low

Previous suggestions

Suggestions up to commit c73e747
CategorySuggestion                                                                                                                                    Impact
Possible issue
Verify null context parameter

The second parameter to executeWithProfile is null, which may cause issues if the
method expects a valid context or configuration object. Verify that passing null is
intentional and supported by the API, or provide the appropriate context parameter.

core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [115-117]

 planExecutor.executeWithProfile(
     plan,
-    null,
+    context,
     new ActionListener<>() {
       @Override
       public void onResponse(ProfiledResult result) {
         try {
           QueryProfile profile = result.profile();
           ExplainResponse response =
               new ExplainResponse(new ExplainResponseNodeV2(logical, null, null, profile));
           listener.onResponse(ExplainResponse.normalizeLf(response));
         } catch (Exception e) {
           listener.onFailure(e);
         }
       }
       ...
     });
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that null is passed as the second parameter to executeWithProfile, and recommends using the available context parameter instead. This is a valid concern as passing null could cause issues if the method expects a valid context object. However, without knowing the API contract, this may be intentional behavior.

Medium
General
Log profiling failure before fallback

The fallback error handling silently swallows the original profiling failure
exception e. Consider logging the original exception before falling back to ensure
debugging information is not lost when profiling fails.

core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [131-141]

 @Override
 public void onFailure(Exception e) {
   // Fall back to plan-only explain if profiling fails
+  logger.warn("Profiling failed, falling back to plan-only explain", e);
   try {
     ExplainResponse response =
         new ExplainResponse(new ExplainResponseNodeV2(logical, null, null, null));
     listener.onResponse(ExplainResponse.normalizeLf(response));
   } catch (Exception ex) {
     listener.onFailure(ex);
   }
 }
Suggestion importance[1-10]: 6

__

Why: The suggestion correctly identifies that the original exception e is silently swallowed in the fallback handler. Adding logging would improve debugging capabilities. However, the improved code assumes a logger exists without verifying its declaration, and logging suggestions typically have moderate impact on code quality.

Low
Suggestions up to commit c73e747
CategorySuggestion                                                                                                                                    Impact
General
Log original profiling failure exception

The fallback error handling silently swallows the original profiling failure
exception e. Consider logging the original exception before falling back to ensure
debugging information is not lost, especially for production troubleshooting.

core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [131-141]

 @Override
 public void onFailure(Exception e) {
   // Fall back to plan-only explain if profiling fails
+  // TODO: Add logging for the profiling failure: e
   try {
     ExplainResponse response =
         new ExplainResponse(new ExplainResponseNodeV2(logical, null, null, null));
     listener.onResponse(ExplainResponse.normalizeLf(response));
   } catch (Exception ex) {
     listener.onFailure(ex);
   }
 }
Suggestion importance[1-10]: 7

__

Why: Valid observation that the original exception e is not logged before falling back to plan-only explain. Logging would aid debugging, though the fallback mechanism itself is functional. The suggestion appropriately uses a TODO comment rather than implementing logging directly.

Medium
Verify null parameter handling

The second parameter null passed to executeWithProfile may cause issues if the
method expects a non-null value. Verify that passing null is intentional and
properly handled by the executeWithProfile method, or provide an appropriate default
value.

core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [115-118]

+// Verify null is acceptable or provide appropriate context
 planExecutor.executeWithProfile(
     plan,
-    null,
+    /* context or parameters */ null,
     new ActionListener<>() {
Suggestion importance[1-10]: 5

__

Why: The suggestion correctly identifies that null is passed as the second parameter to executeWithProfile. However, it only asks for verification rather than identifying a concrete issue. Without knowing the method signature, this is a reasonable concern but may be intentional design.

Low
Suggestions up to commit f6de314
CategorySuggestion                                                                                                                                    Impact
Possible issue
Validate null parameter in executeWithProfile

The executeWithProfile method is called with null as the second parameter without
validation. If this parameter is required or if executeWithProfile doesn't handle
null gracefully, this could cause a NullPointerException or unexpected behavior.
Verify the method signature and pass an appropriate value or add null-safety checks.

core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [115-142]

+// Verify that executeWithProfile accepts null for the second parameter
+// or provide appropriate default value based on the method contract
 planExecutor.executeWithProfile(
     plan,
-    null,
+    /* provide appropriate parameter or verify null is acceptable */,
     new ActionListener<>() {
       @Override
       public void onResponse(ProfiledResult result) {
         try {
           QueryProfile profile = result.profile();
           ExplainResponse response =
               new ExplainResponse(new ExplainResponseNodeV2(logical, null, null, profile));
           listener.onResponse(ExplainResponse.normalizeLf(response));
         } catch (Exception e) {
           listener.onFailure(e);
         }
       }
       ...
     });
Suggestion importance[1-10]: 5

__

Why: The suggestion asks to verify if null is acceptable for the second parameter of executeWithProfile. While this is a valid concern, the suggestion only requests verification without providing concrete evidence of an issue. The PR code may intentionally pass null if the method signature allows it.

Low
General
Add null safety for profile result

The result.profile() call could potentially return null, which would be passed to
ExplainResponseNodeV2. Add a null check to ensure the profile is valid before
constructing the response, or handle the null case explicitly to prevent potential
issues downstream.

core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [120-129]

 @Override
 public void onResponse(ProfiledResult result) {
   try {
-    QueryProfile profile = result.profile();
+    QueryProfile profile = result != null ? result.profile() : null;
     ExplainResponse response =
         new ExplainResponse(new ExplainResponseNodeV2(logical, null, null, profile));
     listener.onResponse(ExplainResponse.normalizeLf(response));
   } catch (Exception e) {
     listener.onFailure(e);
   }
 }
Suggestion importance[1-10]: 4

__

Why: The suggestion adds a null check for result before calling result.profile(), but the improved_code doesn't meaningfully change the behavior since profile would still be null if result is null. The ExplainResponseNodeV2 constructor already accepts null for the profile parameter, so this check provides minimal value.

Low

@finnegancarroll finnegancarroll changed the title Wire analytics engine explain to return stage profiling via executeWi… Add explain to AnalyticsExecutionEngine May 14, 2026
@finnegancarroll finnegancarroll force-pushed the analytics-engine-explain branch from f6de314 to c73e747 Compare May 22, 2026 04:56
@finnegancarroll finnegancarroll marked this pull request as ready for review May 22, 2026 04:56
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit c73e747

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit c73e747

…thProfile

AnalyticsExecutionEngine.explain() now calls executeWithProfile on the
analytics engine's QueryPlanExecutor, executing the query and capturing
per-stage timing from the coordinator's perspective. The resulting
QueryProfile is attached to ExplainResponseNodeV2 and serialized in the
/_plugins/_ppl/_explain response.

Changes:
- ExplainResponseNodeV2: add QueryProfile field with backward-compatible
  3-arg constructor
- ExplainResponse.normalizeLf(): preserve profile field when normalizing
  line endings
- AnalyticsExecutionEngine.explain(): call executeWithProfile, attach
  profile to response. Falls back to plan-only on failure.

Signed-off-by: Finn Carroll <carrofin@amazon.com>
@finnegancarroll finnegancarroll force-pushed the analytics-engine-explain branch from c73e747 to e613538 Compare May 26, 2026 22:29
@github-actions
Copy link
Copy Markdown
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit e613538.

PathLineSeverityDescription
core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java130lowThe EXPLAIN endpoint now calls `planExecutor.executeWithProfile(plan, null, ...)`, meaning an EXPLAIN request triggers actual query execution to collect profiling data. If the authorization model distinguishes between explain-only and execute permissions, this change silently collapses that boundary. Appears to be an intentional feature decision for profiling rather than malicious, but warrants a review of whether EXPLAIN callers are subject to the same execution-level authorization checks.

The table above displays the top 10 most important findings.

Total: 1 | Critical: 0 | High: 0 | Medium: 0 | Low: 1


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit e613538

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants