[Hackathon] feat: add streaming HTTP/WebSocket sources and an LLM agent operator by nathant27 · Pull Request #5103 · apache/texera

nathant27 · 2026-05-16T17:10:47Z

What changes were proposed in this PR?

My Problem:
When I was using Texera for the first time, the thing I wanted to do the most was interact with data in real time. While there are some previously implemented operators that interact with external services, like the twitch and reddit ones, they are very limited in what endpoints they can access, and they especially can't access data in realtime. In addition, I wanted to be able to get this data and analyze or summarize this data with LLM's instead of having to do it myself.

My Solution:
PulseFlow, my extension to Texera that allows for real time LLM analysis from external API datasources.
For this extension I implemented 4 main operators that help build these live analytical workflows with real time data feeds:

PollingHttpSource polls an endpoint based on interval for a max number of iterations. For when web socket isn't available
WebSocketSource establishes web socket connection and uses the streamed response data for processing real time data.
HttpRequest for general Http Requests using REST methods. Mostly used POST for showing how it can interact with other external services with the extracted data from the Texera workflow
LLMAgent that can read and output data based on input tuples. Used for analyzing the responses and summarizing.

Implementation Details

PollingHttpSource (source)
- Polls an HTTP/REST endpoint at a fixed interval and emits each response as a tuple.
- Configurable method (GET/POST/PUT/PATCH/DELETE), headers, request body, interval, and an optional maxIterations cap (0 = forever).
- Output schema: response_body, status_code, polled_at.
- Implemented via a forever-running Iterator that sleeps between polls; works around Texera's bounded-source model without engine changes.
WebSocketSource (source)
- Connects to a ws:// or wss:// endpoint and emits each received frame as a tuple, forever.
- Uses JDK 11+ java.net.http WebSocket; supports an initial subscribe message and arbitrary handshake headers.
- Permissive URI handling: trims whitespace and percent-encodes the '@' character (common in Binance-style stream names) so users can paste provider URLs verbatim.
- Requests Long.MaxValue messages up front to avoid per-frame back-pressure bookkeeping; reassembles partial text frames.
- Output schema: message, received_at.
HttpRequest (transformer)
- For each input tuple, performs a configurable HTTP call with ${fieldName} interpolation in URL and body templates.
- Appends http_request_status, http_request_body, and http_request_error to the input schema (namespaced to avoid collisions with upstream columns like response_body).
- failOnError toggle controls whether non-2xx responses crash the workflow or are surfaced inline.
LLMAgent (transformer)
- Calls an Anthropic Messages or OpenAI Chat Completions endpoint per tuple with a templated system + user prompt.
- Provider enum (LLMProvider: ANTHROPIC, OPENAI) switches request body shape and reply-text extraction path (content[0].text vs choices[0].message.content).
- Request body built via Jackson ObjectNode so user-supplied prompt content is automatically JSON-escaped — no broken templates from embedded quotes or newlines.
- API key sourced from the operator field, falling back to the ANTHROPIC_API_KEY / OPENAI_API_KEY environment variable.
- Appends a configurable output column (default "llm_response") and "llm_error" to the input schema.

Shared utilities (operator/http/util/):

HttpClientFactory: lazy singleton java.net.http.HttpClient reused by all operators.
HttpMethod: enum with @jsonvalue so the UI renders a dropdown.
KeyValuePair: Jackson-friendly header entry class.
TemplateInterpolator: ${fieldName} substitution from a Tuple.

Frontend fix (result-panel cell click):

When clicking a cell in the result table, the modal now receives the table's row data as a fallback and displays it immediately, overwriting only if the paginated server lookup returns a non-empty tuple. Previously, clicking a cell when the paginated result service was not yet initialized produced a permanently blank modal because the request to fetch the full row never fired (?. short-circuit on undefined service).

@jsonvalue

Adds four first-class Texera operators ("PulseFlow") for building live analytical workflows over real-time data feeds: 1. PollingHttpSource (source) - Polls an HTTP/REST endpoint at a fixed interval and emits each response as a tuple. - Configurable method (GET/POST/PUT/PATCH/DELETE), headers, request body, interval, and an optional maxIterations cap (0 = forever). - Output schema: response_body, status_code, polled_at. - Implemented via a forever-running Iterator that sleeps between polls; works around Texera's bounded-source model without engine changes. 2. WebSocketSource (source) - Connects to a ws:// or wss:// endpoint and emits each received frame as a tuple, forever. - Uses JDK 11+ java.net.http WebSocket; supports an initial subscribe message and arbitrary handshake headers. - Permissive URI handling: trims whitespace and percent-encodes the '@' character (common in Binance-style stream names) so users can paste provider URLs verbatim. - Requests Long.MaxValue messages up front to avoid per-frame back-pressure bookkeeping; reassembles partial text frames. - Output schema: message, received_at. 3. HttpRequest (transformer) - For each input tuple, performs a configurable HTTP call with ${fieldName} interpolation in URL and body templates. - Appends http_request_status, http_request_body, and http_request_error to the input schema (namespaced to avoid collisions with upstream columns like response_body). - failOnError toggle controls whether non-2xx responses crash the workflow or are surfaced inline. 4. LLMAgent (transformer) - Calls an Anthropic Messages or OpenAI Chat Completions endpoint per tuple with a templated system + user prompt. - Provider enum (LLMProvider: ANTHROPIC, OPENAI) switches request body shape and reply-text extraction path (content[0].text vs choices[0].message.content). - Request body built via Jackson ObjectNode so user-supplied prompt content is automatically JSON-escaped — no broken templates from embedded quotes or newlines. - API key sourced from the operator field, falling back to the ANTHROPIC_API_KEY / OPENAI_API_KEY environment variable. - Appends a configurable output column (default "llm_response") and "llm_error" to the input schema. Shared utilities (operator/http/util/): - HttpClientFactory: lazy singleton java.net.http.HttpClient reused by all operators. - HttpMethod: enum with @jsonvalue so the UI renders a dropdown. - KeyValuePair: Jackson-friendly header entry class. - TemplateInterpolator: ${fieldName} substitution from a Tuple. Registration: each operator gets a single @JsonSubTypes entry in LogicalOp.scala; the metadata/palette refreshes automatically from that registry. Operator icons: 128x128 PNGs added under frontend assets — clock for PollingHttpSource, dashed stream for WebSocketSource, curly-brace API glyph for HttpRequest, and a stylized brain for LLMAgent. Frontend fix (result-panel cell click): - When clicking a cell in the result table, the modal now receives the table's row data as a fallback and displays it immediately, overwriting only if the paginated server lookup returns a non-empty tuple. Previously, clicking a cell when the paginated result service was not yet initialized produced a permanently blank modal because the request to fetch the full row never fired (?. short-circuit on undefined service). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

nathant27 · 2026-05-16T17:13:07Z

https://drive.google.com/file/d/14WV4hxlNoN3keYo8xOTaiYfMO4pfEiC9/view?usp=sharing
Is the demo link. I'll also upload on YouTube and put the link here
https://youtu.be/Ic2XTnUJmHs

codecov-commenter · 2026-05-16T17:13:29Z

Codecov Report

❌ Patch coverage is 0% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 42.92%. Comparing base (7bb102e) to head (ce644d7).
⚠️ Report is 9 commits behind head on main.

Files with missing lines	Patch %	Lines
...onent/result-panel/result-panel-modal.component.ts	0.00%	5 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##               main    #5103      +/-   ##
============================================
+ Coverage     42.85%   42.92%   +0.07%     
- Complexity     2207     2209       +2     
============================================
  Files          1045     1045              
  Lines         40146    40178      +32     
  Branches       4240     4250      +10     
============================================
+ Hits          17203    17245      +42     
+ Misses        21878    21866      -12     
- Partials       1065     1067       +2

Flag	Coverage Δ		*Carryforward flag
access-control-service	`39.53% <ø> (ø)`
agent-service	`33.72% <ø> (ø)`		Carriedforward from 7bb102e
amber	`43.74% <ø> (ø)`		Carriedforward from 7bb102e
computing-unit-managing-service	`0.00% <ø> (ø)`
config-service	`0.00% <ø> (ø)`
file-service	`32.18% <ø> (ø)`
frontend	`34.05% <0.00%> (+0.11%)`	⬆️
python	`88.99% <ø> (ø)`		Carriedforward from 7bb102e
workflow-compiling-service	`56.81% <ø> (+9.09%)`	⬆️

*This pull request uses carry forward flags. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

github-actions Bot assigned nathant27 May 16, 2026

github-actions Bot added feature frontend Changes related to the frontend GUI common labels May 16, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Hackathon] feat: add streaming HTTP/WebSocket sources and an LLM agent operator#5103

[Hackathon] feat: add streaming HTTP/WebSocket sources and an LLM agent operator#5103
nathant27 wants to merge 1 commit into
apache:mainfrom
nathant27:feat/pulseflow-streaming-and-llm-operators

nathant27 commented May 16, 2026 •

edited

Loading

Uh oh!

nathant27 commented May 16, 2026 •

edited

Loading

Uh oh!

codecov-commenter commented May 16, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

nathant27 commented May 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What changes were proposed in this PR?

Uh oh!

nathant27 commented May 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov-commenter commented May 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

nathant27 commented May 16, 2026 •

edited

Loading

nathant27 commented May 16, 2026 •

edited

Loading

codecov-commenter commented May 16, 2026 •

edited

Loading