[Hackathon] feat: add smart source and visual trace#5094

Open
tanishqgandhi1908 wants to merge 1 commit into apache:main from tanishqgandhi1908:codex/feat/smart-source-visual-trace

tanishqgandhi1908 commented May 16, 2026

Video Submission

https://youtu.be/KlyswVrWLZU

What changes were proposed in this PR?

This PR improves the end-to-end data experience for hackathon workflows by making ingestion smarter, image workflows first-class, and visual outputs easier to understand.

Motivation

Before:

| User task | Current friction |
| --- | --- |
| Load a dataset | Users must choose the right source operator before they know the file format |
| Read a folder | There is no source input that can read a folder and process all files in it together |
| Work with images | There is no first-class way to load image datasets as structured input |
| Understand a visual result | Users can see the final output, but not how it was produced |

After:

| User task | New experience |
| --- | --- |
| Load a dataset | Smart Source auto-detects file type, dialect, and schema |
| Read a folder | The same source can read a folder of similar files together and preserve source-file lineage |
| Work with images | Image folders become structured input rows with real image previews |
| Understand a visual result | Clicking a visual result can open a Visual Journey side panel |

[Screenshot 2026-05-16 at 12 06 02 PM]
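
To make "auto-detects dialect" concrete, here is a minimal illustrative sketch of delimiter and header detection. It uses Python's standard `csv.Sniffer` as a stand-in. It is not the PR's `CSVDialectSniffer` (which lives in the Scala backend and whose heuristics are not described here), and the function name `sniff_csv` and the candidate delimiter set are assumptions made for illustration.

```python
import csv
import io


def sniff_csv(sample: str) -> dict:
    """Guess the delimiter, header presence, and column count of a text sample.

    Hypothetical helper for illustration only; the candidate delimiters
    (comma, tab, semicolon, pipe) are an assumption, not taken from the PR.
    """
    sniffer = csv.Sniffer()
    # Restrict the search to a few common delimiters to avoid odd guesses.
    dialect = sniffer.sniff(sample, delimiters=",\t;|")
    # has_header() is a heuristic: it compares the first row against the
    # value types and lengths of the rows that follow it.
    has_header = sniffer.has_header(sample)
    rows = list(csv.reader(io.StringIO(sample), dialect))
    return {
        "delimiter": dialect.delimiter,
        "has_header": has_header,
        "columns": len(rows[0]) if rows else 0,
    }
```

A summary like the returned dictionary is roughly the kind of information the property panel is described as showing (detected delimiter, header status, schema size). Note that sniffing is heuristic, so very short or irregular samples may be misdetected.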

Main changes

  1. Add Smart Source (SmartFileScan) with support for CSV, TSV, JSON, JSONL, Arrow, Parquet, Excel, images, and plain text.
  2. Add backend file inference plus frontend inference summaries so the property panel can show detected format, delimiter, header status, sheet, schema size, and folder counts.
  3. Extend folder support across dataset selection and file scanning:
    • folders can be selected from the dataset picker
    • FileScan can read folders while preserving relative file names
    • new File Split operator routes rows from the same source file to the same output port
  4. Make image workflows more natural:
    • image folders produce rows containing image bytes plus format and dimensions
    • recognized image binaries are serialized as image data URLs
    • result tables render image thumbnails instead of raw binary text
  5. Teach the agent service about SmartFileScan and include operator display names in the prompt so the agent can reason about user-facing operator names such as Smart Source.
  6. Add a reusable Visual Journey side panel:
    • visualizers can emit rich trace payloads
    • ordinary image clicks fall back to a structural upstream workflow trace
    • iframe-origin clicks are handled correctly so visualizer interactions open the side panel reliably
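
The File Split behavior in item 3 (rows from the same source file go to the same output port) can be sketched as follows. This is an illustrative Python sketch, not the PR's operator: the PR does not state how a file is mapped to a port, so the stable-hash rule below, the `file_name` column name, and both function names are assumptions.

```python
import zlib


def port_for_file(source_file: str, num_ports: int) -> int:
    """Map a source file name to an output port index.

    Assumption: a stable hash modulo the port count. crc32 is used instead
    of the built-in hash(), which is randomized per process for strings.
    """
    if num_ports <= 0:
        raise ValueError("num_ports must be positive")
    return zlib.crc32(source_file.encode("utf-8")) % num_ports


def split_rows(rows, num_ports, file_key="file_name"):
    """Route each row to a port chosen only by its source-file value."""
    ports = [[] for _ in range(num_ports)]
    for row in rows:
        ports[port_for_file(row[file_key], num_ports)].append(row)
    return ports
```

Because the port depends only on the file name, every row carrying the same relative file name lands on the same port, which is the property the operator is described as guaranteeing. Different files may still share a port under this rule.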

Any related issues, documentation, discussions?

How was this PR tested?

PATH="/Users/tanishqgandhi/.bun/bin:$PATH" bun test agent-service/src/agent/prompts.test.ts agent-service/src/types/agent.test.ts

JAVA_HOME=$(/usr/libexec/java_home -v 17) sbt "testOnly org.apache.texera.amber.operator.source.scan.smart.CSVDialectSnifferSpec org.apache.texera.amber.operator.source.scan.smart.FormatDetectorSpec org.apache.texera.amber.operator.source.scan.smart.SmartFileSourceOpDescSpec org.apache.texera.amber.operator.source.scan.smart.SmartFileSourceOpExecSpec org.apache.texera.amber.operator.fileSplit.FileSplitOpDescSpec org.apache.texera.amber.operator.fileSplit.FileSplitOpExecSpec org.apache.texera.amber.operator.source.scan.file.FileScanSourceOpDescSpec org.apache.texera.web.service.ExecutionResultServiceSpec"

PATH="/Users/tanishqgandhi/.nvm/versions/node/v24.15.0/bin:$PATH" yarn ng test --watch=false --include='src/app/workspace/service/visual-trace/visual-trace.utils.spec.ts' --include='src/app/workspace/component/visual-trace-panel/visual-trace-panel.component.spec.ts' --include='src/app/workspace/component/result-panel/result-table-frame/result-table-cell.utils.spec.ts'

Manual verification:

  1. Loaded folder-backed CSV datasets through Smart Source.
  2. Loaded an image folder and confirmed result cells render image thumbnails.
  3. Opened an HTML visualizer, clicked a winner card, and confirmed the Visual Journey panel opens from iframe-origin clicks.
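
The thumbnail check in step 2 depends on recognized image binaries being serialized as image data URLs. A minimal sketch of that idea is shown below, in Python for illustration only. The magic-byte table, the set of recognized formats, and the name `to_image_data_url` are assumptions; the PR's actual serialization code is not shown here.

```python
import base64

# Leading byte signatures for a few common image formats (illustrative subset).
_MAGIC = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def to_image_data_url(blob: bytes):
    """Return a data URL if blob starts with a known image signature, else None."""
    for magic, mime in _MAGIC:
        if blob.startswith(magic):
            encoded = base64.b64encode(blob).decode("ascii")
            return f"data:{mime};base64,{encoded}"
    # Unrecognized binaries are left for the caller to handle as before.
    return None
```

A result cell whose value starts with `data:image/` can then be rendered as an `<img>` thumbnail rather than as raw binary text.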

github-actions bot added the engine, dependencies, frontend, common, and agent-service labels on May 16, 2026