Skip to content

[Hackathon] feat: add streaming HTTP/WebSocket sources and an LLM agent operator#5103

Draft
nathant27 wants to merge 1 commit into
apache:mainfrom
nathant27:feat/pulseflow-streaming-and-llm-operators
Draft

[Hackathon] feat: add streaming HTTP/WebSocket sources and an LLM agent operator#5103
nathant27 wants to merge 1 commit into
apache:mainfrom
nathant27:feat/pulseflow-streaming-and-llm-operators

Conversation

@nathant27
Copy link
Copy Markdown
Contributor

@nathant27 nathant27 commented May 16, 2026

What changes were proposed in this PR?

Video/Demo Link

My Problem:
When I was using Texera for the first time, the thing I wanted to do the most was interact with data in real time. While there are some previously implemented operators that interact with external services, like the twitch and reddit ones, they are very limited in what endpoints they can access, and they especially can't access data in realtime. In addition, I wanted to be able to get this data and analyze or summarize this data with LLM's instead of having to do it myself.

My Solution:
PulseFlow, my extension to Texera that allows for real time LLM analysis from external API datasources.
For this extension I implemented 4 main operators that help build these live analytical workflows with real time data feeds:

  • PollingHttpSource polls an endpoint based on interval for a max number of iterations. For when web socket isn't available
  • WebSocketSource establishes web socket connection and uses the streamed response data for processing real time data.
  • HttpRequest for general Http Requests using REST methods. Mostly used POST for showing how it can interact with other external services with the extracted data from the Texera workflow
  • LLMAgent that can read and output data based on input tuples. Used for analyzing the responses and summarizing.

Implementation Details

  1. PollingHttpSource (source)

    • Polls an HTTP/REST endpoint at a fixed interval and emits each response as a tuple.
    • Configurable method (GET/POST/PUT/PATCH/DELETE), headers, request body, interval, and an optional maxIterations cap (0 = forever).
    • Output schema: response_body, status_code, polled_at.
    • Implemented via a forever-running Iterator that sleeps between polls; works around Texera's bounded-source model without engine changes.
  2. WebSocketSource (source)

    • Connects to a ws:// or wss:// endpoint and emits each received frame as a tuple, forever.
    • Uses JDK 11+ java.net.http WebSocket; supports an initial subscribe message and arbitrary handshake headers.
    • Permissive URI handling: trims whitespace and percent-encodes the '@' character (common in Binance-style stream names) so users can paste provider URLs verbatim.
    • Requests Long.MaxValue messages up front to avoid per-frame back-pressure bookkeeping; reassembles partial text frames.
    • Output schema: message, received_at.
  3. HttpRequest (transformer)

    • For each input tuple, performs a configurable HTTP call with ${fieldName} interpolation in URL and body templates.
    • Appends http_request_status, http_request_body, and http_request_error to the input schema (namespaced to avoid collisions with upstream columns like response_body).
    • failOnError toggle controls whether non-2xx responses crash the workflow or are surfaced inline.
  4. LLMAgent (transformer)

    • Calls an Anthropic Messages or OpenAI Chat Completions endpoint per tuple with a templated system + user prompt.
    • Provider enum (LLMProvider: ANTHROPIC, OPENAI) switches request body shape and reply-text extraction path (content[0].text vs choices[0].message.content).
    • Request body built via Jackson ObjectNode so user-supplied prompt content is automatically JSON-escaped — no broken templates from embedded quotes or newlines.
    • API key sourced from the operator field, falling back to the ANTHROPIC_API_KEY / OPENAI_API_KEY environment variable.
    • Appends a configurable output column (default "llm_response") and "llm_error" to the input schema.

Shared utilities (operator/http/util/):

  • HttpClientFactory: lazy singleton java.net.http.HttpClient reused by all operators.
  • HttpMethod: enum with @jsonvalue so the UI renders a dropdown.
  • KeyValuePair: Jackson-friendly header entry class.
  • TemplateInterpolator: ${fieldName} substitution from a Tuple.

Frontend fix (result-panel cell click):

  • When clicking a cell in the result table, the modal now receives the table's row data as a fallback and displays it immediately, overwriting only if the paginated server lookup returns a non-empty tuple. Previously, clicking a cell when the paginated result service was not yet initialized produced a permanently blank modal because the request to fetch the full row never fired (?. short-circuit on undefined service).

Adds four first-class Texera operators ("PulseFlow") for building live
analytical workflows over real-time data feeds:

1. PollingHttpSource (source)
   - Polls an HTTP/REST endpoint at a fixed interval and emits each
     response as a tuple.
   - Configurable method (GET/POST/PUT/PATCH/DELETE), headers, request
     body, interval, and an optional maxIterations cap (0 = forever).
   - Output schema: response_body, status_code, polled_at.
   - Implemented via a forever-running Iterator that sleeps between
     polls; works around Texera's bounded-source model without engine
     changes.

2. WebSocketSource (source)
   - Connects to a ws:// or wss:// endpoint and emits each received
     frame as a tuple, forever.
   - Uses JDK 11+ java.net.http WebSocket; supports an initial
     subscribe message and arbitrary handshake headers.
   - Permissive URI handling: trims whitespace and percent-encodes the
     '@' character (common in Binance-style stream names) so users can
     paste provider URLs verbatim.
   - Requests Long.MaxValue messages up front to avoid per-frame
     back-pressure bookkeeping; reassembles partial text frames.
   - Output schema: message, received_at.

3. HttpRequest (transformer)
   - For each input tuple, performs a configurable HTTP call with
     ${fieldName} interpolation in URL and body templates.
   - Appends http_request_status, http_request_body, and
     http_request_error to the input schema (namespaced to avoid
     collisions with upstream columns like response_body).
   - failOnError toggle controls whether non-2xx responses crash the
     workflow or are surfaced inline.

4. LLMAgent (transformer)
   - Calls an Anthropic Messages or OpenAI Chat Completions endpoint
     per tuple with a templated system + user prompt.
   - Provider enum (LLMProvider: ANTHROPIC, OPENAI) switches request
     body shape and reply-text extraction path
     (content[0].text vs choices[0].message.content).
   - Request body built via Jackson ObjectNode so user-supplied prompt
     content is automatically JSON-escaped — no broken templates from
     embedded quotes or newlines.
   - API key sourced from the operator field, falling back to the
     ANTHROPIC_API_KEY / OPENAI_API_KEY environment variable.
   - Appends a configurable output column (default "llm_response") and
     "llm_error" to the input schema.

Shared utilities (operator/http/util/):
   - HttpClientFactory: lazy singleton java.net.http.HttpClient reused
     by all operators.
   - HttpMethod: enum with @jsonvalue so the UI renders a dropdown.
   - KeyValuePair: Jackson-friendly header entry class.
   - TemplateInterpolator: ${fieldName} substitution from a Tuple.

Registration: each operator gets a single @JsonSubTypes entry in
LogicalOp.scala; the metadata/palette refreshes automatically from
that registry.

Operator icons: 128x128 PNGs added under frontend assets — clock for
PollingHttpSource, dashed stream for WebSocketSource, curly-brace
API glyph for HttpRequest, and a stylized brain for LLMAgent.

Frontend fix (result-panel cell click):
   - When clicking a cell in the result table, the modal now receives
     the table's row data as a fallback and displays it immediately,
     overwriting only if the paginated server lookup returns a
     non-empty tuple. Previously, clicking a cell when the paginated
     result service was not yet initialized produced a permanently
     blank modal because the request to fetch the full row never
     fired (?. short-circuit on undefined service).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added feature frontend Changes related to the frontend GUI common labels May 16, 2026
@nathant27
Copy link
Copy Markdown
Contributor Author

nathant27 commented May 16, 2026

https://drive.google.com/file/d/14WV4hxlNoN3keYo8xOTaiYfMO4pfEiC9/view?usp=sharing
Is the demo link. I'll also upload on YouTube and put the link here
https://youtu.be/Ic2XTnUJmHs

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 16, 2026

Codecov Report

❌ Patch coverage is 0% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 42.92%. Comparing base (7bb102e) to head (ce644d7).
⚠️ Report is 9 commits behind head on main.

Files with missing lines Patch % Lines
...onent/result-panel/result-panel-modal.component.ts 0.00% 5 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #5103      +/-   ##
============================================
+ Coverage     42.85%   42.92%   +0.07%     
- Complexity     2207     2209       +2     
============================================
  Files          1045     1045              
  Lines         40146    40178      +32     
  Branches       4240     4250      +10     
============================================
+ Hits          17203    17245      +42     
+ Misses        21878    21866      -12     
- Partials       1065     1067       +2     
Flag Coverage Δ *Carryforward flag
access-control-service 39.53% <ø> (ø)
agent-service 33.72% <ø> (ø) Carriedforward from 7bb102e
amber 43.74% <ø> (ø) Carriedforward from 7bb102e
computing-unit-managing-service 0.00% <ø> (ø)
config-service 0.00% <ø> (ø)
file-service 32.18% <ø> (ø)
frontend 34.05% <0.00%> (+0.11%) ⬆️
python 88.99% <ø> (ø) Carriedforward from 7bb102e
workflow-compiling-service 56.81% <ø> (+9.09%) ⬆️

*This pull request uses carry forward flags. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

common feature frontend Changes related to the frontend GUI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants