Skip to content

/search-plan may over-fetch split metadata from PostgreSQL metastore on wide queries #6430

@earlbread

Description

@earlbread

Summary

We are routing through a proxy for guardrails, and before executing a /search query, we call /search-plan to determine whether the query is too heavy.

We observed that a small number of wide /<index>/search-plan requests can put noticeable CPU and read I/O pressure on the PostgreSQL metastore.

This may be related to how /search-plan currently uses the general-purpose list_splits metastore path. For PostgreSQL, that path appears to fetch full split rows, including split_metadata_json, even though the /search-plan response only needs split identifiers for searched_splits.

Environment

  • Quickwit: e86f4fb
  • Metastore backend: PostgreSQL (AWS Aurora PostgreSQL)
  • Workload: low request rate, but some wide /<index>/search-plan calls covering broad time ranges

Observed behavior

During the execution of /search-plan, the PostgreSQL Read Replica showed read-heavy load that was disproportionate to the request rate:

  • DB CPU reached roughly 85–90%.
  • Statement calls stayed in the single-digit calls/sec range.
  • Active sessions were not saturated.
  • Connection pool usage was low.
  • Read throughput peaked around a few hundred MiB/s.
  • Read IOPS peaked around tens of thousands per second.
  • Returned/fetched tuples spiked sharply.
  • Locks were mostly AccessShareLock; no conflicts or deadlocks were observed.

This suggests the load is not caused by connection saturation, lock contention, or writes, but by a small number of read queries scanning and/or returning many rows.

As a mitigation we tried setting statement_timeout on the metastore connection, but it did not help: the heavy queries do not exceed the timeout, they simply do a lot of per-call work within it.

Image Image Image

Suspected cause

/<index>/search-plan goes through the normal split planning path:

  • quickwit-search/src/root.rssearch_plan
  • quickwit-search/src/lib.rslist_relevant_splits
  • PostgreSQL metastore — list_splits

For PostgreSQL, list_splits builds a general-purpose split listing query and returns full split rows, including the large split_metadata_json TEXT column for every matched split. Each row is then JSON-deserialized into SplitMetadata on the Quickwit side.

The /search-plan response, however, only appears to use the split identifier from each split:

let split_ids = split_metadatas
    .into_iter()
    .map(|split| format!("{}/{}", split.index_uid.index_id, split.split_id))
    .collect();

So for N matched splits, each /search-plan call appears to read and JSON-deserialize N full SplitMetadata values only to keep index_id and split_id from each. With wide queries (broad time range or * patterns) matching tens of thousands of splits, this is consistent with the read-heavy load we observed.

/<index>/search shares the same list_splits path. The downstream search code only uses a small number of scalar fields per split (index_uid, split_id, footer_offsets, time_range, num_docs), so a similar over-fetching shape likely applies there as well.

Possible improvement

Would it make sense to add a lightweight split listing path for planning-only use cases?

For /search-plan, the PostgreSQL metastore could avoid the general-purpose list_splits projection and fetch only the required identifiers, for example:

SELECT index_uid, split_id
FROM splits
WHERE ...

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions