Skip to content

EventStats: compiled queries endpoint for server-side aggregations #116

@oto-macenauer-absa

Description

@oto-macenauer-absa

Feature Description

Extend the EventStats Lambda with a compiled queries endpoint — a curated list of predefined, server-executed queries (e.g. "failed jobs in the last 7 days", "job count aggregated by catalog") that consumers can invoke by name. Rather than pushing aggregation logic into the client or BI tool, the heavy lifting runs inside PostgreSQL and only the final result set is returned over the wire.

Problem / Opportunity

The current POST /stats/{topic_name} endpoint returns raw paginated rows. Consumers (dashboards, reports, downstream tools) re-derive the same aggregations repeatedly on the client side — wasting bandwidth, duplicating logic, and making results inconsistent across consumers. Predefined server-side queries centralise that logic, reduce payload sizes significantly, and allow PostgreSQL query planning to optimise repeated patterns.

Acceptance Criteria

  • A new route (e.g. POST /stats/{topic_name}/query/{query_name}) accepts a query identifier and an optional parameter bag (e.g. time window, tenant filter).
  • A SUPPORTED_QUERIES registry maps each query_name to its SQL template and accepted parameters — unknown names return 400.
  • Queries are executed server-side; only the aggregated result is returned (no raw row streaming).
  • The endpoint is protected by the same JWT auth and per-topic ACL as the rest of EventStats.
  • Unknown or unsupported query_name values produce a clear 400 error, not a 500.
  • Unit tests cover routing, registry lookup, and each query's result shaping.
  • Integration tests validate at least one aggregation query end-to-end against a seeded testcontainer database.
  • All quality gates pass (./ci_local.sh).

Note for implementer: the concrete set of queries and their SQL definitions must be identified and agreed upon as part of this issue's implementation. The examples below are starting points only — validate with stakeholders which aggregations are actually needed before writing SQL.

Proposed Solution

Introduce a CompiledQueryRegistry (or extend ReaderPostgres) that maps query names to parameterised SQL templates (using psycopg2 %s / sql.SQL composition — never string interpolation). Each entry declares its accepted input parameters and output schema.

Example candidate queries to evaluate with stakeholders:

  • failed_jobs_last_7d — count and list of jobs with a failure status in the last N days, grouped by pipeline/tenant.
  • aggregation_by_catalog — job count, success rate, and average elapsed time grouped by catalog/source.
  • run_status_summary — distribution of run statuses (running, completed, failed) over a configurable time window.

Route dispatch follows the existing ROUTE_HANDLERS pattern in event_stats_lambda.py. A new HandlerCompiledQuery (or an extension of HandlerStats) handles validation and delegates to the registry.

Dependencies / Related

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions