[Hackathon] feat: Workflow performance profiler + agent-driven optimization #5098

Open

PG1204 wants to merge 14 commits into apache:main from PG1204:hackathon/workflow_performance_profiler

[Hackathon] feat: Workflow performance profiler + agent-driven optimization#5098
PG1204 wants to merge 14 commits into
apache:mainfrom
PG1204:hackathon/workflow_performance_profiler

Conversation

@PG1204 (Contributor) commented May 16, 2026

Demo Video

https://drive.google.com/file/d/1rRaCWynJkJE6WtomWQiceh9KCH0Qc48M/view?usp=drive_link

What changes were proposed in this PR?

A user runs a workflow today and sees a few numeric stat badges on the canvas. They have no visual signal for which operator is slow, why it's slow, or what to do about it. This PR closes that loop end-to-end and then lets the AI agent participate in it.

Before / after

| User task | Before | After |
| --- | --- | --- |
| Spot a slow operator | Read raw stat badges | Canvas heatmap colors operators green → red by relative cost |
| Understand why it's slow | No guidance | 6 rule-based hints in the side panel + canvas ghost suggestions |
| Compare two runs | Not possible | Upload a downloaded JSON report or pick a past execution from the popover |
| Apply a fix | Manual property editing | Click Apply on a canvas ghost or an agent proposal card |
| Ask the agent about performance | Agent only sees workflow shape | Agent has 5 read-only profiler tools and a structured-proposal channel |
| Get a smart Filter / worker default | Static defaults (first column / 4 workers) | Agent reasons over schema + runtime to suggest informed values; rule-based fallback when offline |

The story

Turn on the profiler (gauge icon in the run bar). The canvas paints itself — the Python UDF that takes most of the wall-clock time turns red, everything else stays green. Hover over the red operator and a tooltip shows its runtime, throughput, and idle ratio. The property panel adds a "Profiler" section listing the fired hints (RUNTIME_OUTLIER, LOW_PARALLELISM_HOT_OP, …) with plain-English messages.
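The green-to-red coloring can be sketched as a relative-cost normalization mapped onto an HSL hue. This is a minimal illustration with hypothetical names (`relativeCosts`, `heatColor`), not the PR's actual scoring code, which uses three formulas mirrored in `ProfilerScoring.scala`:

```typescript
// Hypothetical sketch of relative-cost heatmap coloring.
interface OperatorStats {
  operatorId: string;
  runtimeMs: number; // wall-clock time attributed to this operator
}

/** Normalize each operator's runtime against the max, yielding a 0..1 cost. */
function relativeCosts(stats: OperatorStats[]): Map<string, number> {
  const max = Math.max(...stats.map(s => s.runtimeMs), 1);
  return new Map(stats.map(s => [s.operatorId, s.runtimeMs / max]));
}

/** Map a 0..1 cost onto a green-to-red HSL hue (120 = green, 0 = red). */
function heatColor(cost: number): string {
  const hue = Math.round(120 * (1 - cost));
  return `hsl(${hue}, 80%, 45%)`;
}
```

With this shape, the hottest operator always normalizes to 1 (full red) and everything else is colored relative to it.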

Hints that map to mechanical fixes also appear as ghost suggestions on the canvas: a "Bump workers" tag floats next to hot single-worker operators, and an "Insert Filter" ghost sits on edges where the rule engine sees an over-producing upstream. Click Apply and the change lands, with a "Run now" prompt so you can verify.
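One of those rules can be sketched as a pure function over per-operator runtime stats. The hint code LOW_PARALLELISM_HOT_OP comes from the PR; the field names and the 0.5 hot threshold below are assumptions for illustration:

```typescript
interface OpRuntime {
  operatorId: string;
  workers: number;
  relativeCost: number; // 0..1, relative to the hottest operator
}

interface Hint {
  code: string;
  operatorId: string;
  message: string;
}

// Sketch of one rule: a hot operator running on a single worker fires
// LOW_PARALLELISM_HOT_OP, which the canvas renders as a "Bump workers" ghost.
function lowParallelismHints(ops: OpRuntime[], hotThreshold = 0.5): Hint[] {
  return ops
    .filter(op => op.workers === 1 && op.relativeCost >= hotThreshold)
    .map(op => ({
      code: "LOW_PARALLELISM_HOT_OP",
      operatorId: op.operatorId,
      message: `${op.operatorId} runs at ${(op.relativeCost * 100).toFixed(0)}% of peak cost on a single worker; consider raising its worker count.`,
    }));
}
```

Because each rule is data-in, hints-out, the same engine can drive both the side-panel list and the canvas ghosts.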

Want to compare runs? Download a profiler report and re-upload it later, or open the popover dropdown and pick directly from past executions — the existing delta heatmap and side-panel UI render from either source.
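The delta heatmap boils down to a per-operator difference between two reports. A minimal sketch, assuming a report keyed by operator id (the PR's actual JSON schema may differ):

```typescript
// Assumed report shape: operator id -> runtime stats.
type Report = Record<string, { runtimeMs: number }>;

/** Per-operator runtime change; positive values indicate a regression. */
function runtimeDelta(baseline: Report, current: Report): Record<string, number> {
  const delta: Record<string, number> = {};
  for (const id of Object.keys(current)) {
    const before = baseline[id]?.runtimeMs ?? 0; // operator may be new
    delta[id] = current[id].runtimeMs - before;
  }
  return delta;
}
```

Since the function only needs two `Report` objects, it renders identically whether the baseline came from an uploaded JSON file or a past execution picked from the popover.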

Open the agent chat and ask "is anything slow?" The agent calls getProfilerSummary and getOptimizationHints, then surfaces a structured proposal that renders inline as an Apply / Reject card. The agent never mutates the workflow itself — the frontend's Apply button is the only mutation path. Multi-step optimizations come back as a numbered plan card with per-step Apply plus an "Apply All" button.
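The key property of the structured-proposal channel is that the agent emits plain data and the frontend alone mutates the workflow. A sketch of what such a proposal union might look like — the `{attribute, condition, value}` row shape is from the PR, everything else is illustrative:

```typescript
// Illustrative proposal types; the PR's exact schema may differ.
type AgentProposal =
  | { kind: "setWorkerCount"; operatorId: string; workers: number; rationale: string }
  | {
      kind: "insertFilter";
      edgeId: string;
      predicates: { attribute: string; condition: string; value: string }[];
      rationale: string;
    };

/** Render a proposal as card text; applying it is a separate frontend action. */
function describeProposal(p: AgentProposal): string {
  switch (p.kind) {
    case "setWorkerCount":
      return `Set ${p.operatorId} to ${p.workers} workers: ${p.rationale}`;
    case "insertFilter":
      return `Insert Filter on ${p.edgeId} (${p.predicates.length} condition(s)): ${p.rationale}`;
  }
}
```

A multi-step plan is then just an ordered array of these proposals, which maps naturally onto the numbered plan card with per-step Apply and Apply All.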

The canvas ghosts themselves get smarter when the agent is available: clicking "Insert Filter" calls a proposeFilterPredicate endpoint that reads the upstream schema and downstream context to fill in real {attribute, condition, value} rows instead of the rule-based `is not null` placeholder. Similarly, "Bump workers" calls proposeWorkerCount to pick a worker count based on runtime and idle ratio. Both fall back to the static defaults on any failure, so the feature works with or without the agent running.
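That agent-with-fallback pattern can be sketched in a few lines. `proposeWorkerCount` is the endpoint named in the PR; the wrapper function, its validation, and the default of 4 workers (stated in the Before/After table) are assumptions:

```typescript
// Hypothetical wrapper: try the agent's suggestion, fall back to the static
// default on any error or nonsensical value, so ghosts still work offline.
async function suggestWorkerCount(
  askAgent: () => Promise<number>, // e.g. a call to the proposeWorkerCount endpoint
  staticDefault = 4,
): Promise<number> {
  try {
    const n = await askAgent();
    return Number.isInteger(n) && n > 0 ? n : staticDefault;
  } catch {
    return staticDefault; // agent offline or errored: keep the old behavior
  }
}
```

The same shape works for the filter ghost: swap the number for a predicate-row array and the static default for the `is not null` placeholder.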

On the backend, a new ProfilerScoring.scala helper mirrors the frontend's three scoring formulas so any future server-side use (persisted stats, scheduler decisions) stays consistent with the UI. No call sites yet — purely future-use infrastructure.

Any related issues, documentation, discussions?

Related to the Apache Texera Agent Hackathon (#5059).

How was this PR tested?

```bash
# Frontend
cd frontend
./node_modules/.bin/tsc --noEmit -p tsconfig.json
./node_modules/.bin/ng test --watch=false \
  --include='**/profiler*.spec.ts' --include='**/agent-proposal*.spec.ts'
./node_modules/.bin/ng build

# Agent-service
cd agent-service
bun test
bunx tsc --noEmit
```

258/258 frontend Vitest tests pass across 12 spec files; 147/147 agent-service Bun tests pass across 10 spec files; both `tsc --noEmit` runs are clean; `ng build` succeeds. The Scala spec for ProfilerScoring was not run locally: the amber sbt project hits a pre-existing `AddMetaInfLicenseFiles not found` plugin error unrelated to this PR, so CI is the canonical validator for it.

Manual end-to-end, using a CSVScan → Filter → heavy-Python-UDF → Visualize workflow:

- confirmed the heatmap reds the UDF and toggled all three views;
- uploaded a JSON report and confirmed the delta heatmap; picked a past execution from the new dropdown and got the same result;
- asked the agent "is anything slow?" and confirmed the orange Apply/Reject card lands the change on the canvas;
- asked "what can we do to make this faster?" and confirmed the blue multi-step plan card renders with per-step Apply + Apply All;
- clicked the Insert-Filter and Bump-Workers ghosts both with and without the agent running, confirming the fallback path.

Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.7)

@github-actions bot added the engine, frontend (Changes related to the frontend GUI), and agent-service labels on May 16, 2026.