perf(postgres): optimize KV storage upsert using executemany #2742
danielaskdd merged 2 commits into HKUDS:main
Conversation
- Implement PostgreSQLDB.executemany to support batch SQL execution
- Refactor PGKVStorage.upsert to use batch processing for all namespaces
- Chunk large data sets into smaller sub-batches using _max_batch_size
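The commit summary above can be sketched roughly as follows. This is an illustrative stand-in for the PR's actual `PGKVStorage.upsert` internals: `batched_upsert`, `MAX_BATCH_SIZE`, and the loosely typed `db` parameter are hypothetical names, not code from the repository.

```python
# Hypothetical sketch of the batched upsert path: rows are collected as
# tuples, then written in sub-batches so no single executemany call
# exceeds MAX_BATCH_SIZE rows (the PR uses self._max_batch_size).
from typing import Any, Sequence

MAX_BATCH_SIZE = 500  # stand-in for self._max_batch_size


async def batched_upsert(db: Any, upsert_sql: str,
                         batch_values: Sequence[tuple]) -> None:
    """Execute upsert_sql against batch_values in fixed-size chunks."""
    for i in range(0, len(batch_values), MAX_BATCH_SIZE):
        sub_batch = batch_values[i : i + MAX_BATCH_SIZE]
        await db.executemany(upsert_sql, sub_batch)
```

Compared with one `INSERT` per row, each `executemany` call sends a whole sub-batch in a single prepared-statement execution, which is where the ingestion speedup comes from.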
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4d3ae4af13
```python
for i in range(0, len(batch_values), self._max_batch_size):
    sub_batch = batch_values[i : i + self._max_batch_size]
    await self.db.executemany(upsert_sql, sub_batch)
```
Stream sub-batches instead of buffering full KV payload
PGKVStorage.upsert now accumulates every row into batch_values and only starts writing inside the sub-batch loop, which means large ingestions hold a second full copy of the payload in memory before the first DB write. In the document path (text_chunks.upsert(inserting_chunks)), this can significantly increase peak RSS and cause OOMs on large inputs, whereas the previous per-row implementation never buffered all rows at once. The sub-batching logic should flush incrementally as each chunk fills, rather than after full accumulation.
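One way to act on this suggestion is to flush each sub-batch as soon as it fills, so peak memory is bounded by one sub-batch rather than the whole payload. The sketch below is hypothetical: `streaming_upsert` and its parameters are invented for illustration and are not part of the PR.

```python
# Illustrative incremental variant: consume rows from any iterable and
# flush a bounded buffer whenever it reaches max_batch_size, instead of
# materializing the full list of rows before the first DB write.
from typing import Any, Iterable


async def streaming_upsert(db: Any, upsert_sql: str,
                           rows: Iterable[tuple],
                           max_batch_size: int = 500) -> None:
    buffer: list[tuple] = []
    for row in rows:
        buffer.append(row)
        if len(buffer) >= max_batch_size:
            await db.executemany(upsert_sql, buffer)
            buffer.clear()
    if buffer:  # flush the final partial batch
        await db.executemany(upsert_sql, buffer)
```

Because `rows` can be a generator, the caller never needs to hold a second full copy of the KV payload in memory, which addresses the peak-RSS concern raised in the review.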
Description
This PR optimizes document ingestion performance for Postgres-based storage by introducing batch processing
for Key-Value (KV) storage operations. By replacing individual INSERT calls in a loop with executemany and adding a robust sub-batching mechanism,
it addresses performance bottlenecks where even small documents (e.g., <300 kB) took several minutes to process.
Changes
- PostgreSQLDB.executemany implementation: Added a dedicated method to the PostgreSQLDB class to support asyncpg's batch execution, ensuring efficient data transmission to Postgres.
- PGKVStorage.upsert: Updated the upsert logic for all namespaces (TEXT_CHUNKS, FULL_DOCS, LLM_CACHE, etc.) to collect data into tuples and execute them in a single batch.
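A minimal sketch of what such an `executemany` wrapper might look like on top of an asyncpg connection pool. The class shape and `pool` attribute are assumptions for illustration, not the PR's actual code; the type is kept loose so the snippet stays self-contained.

```python
# Hypothetical wrapper around asyncpg's Connection.executemany. The pool
# is assumed to behave like asyncpg.Pool (acquire() usable as an async
# context manager, connections exposing transaction() and executemany()).
from typing import Any


class PostgreSQLDB:
    def __init__(self, pool: Any):
        self.pool = pool

    async def executemany(self, sql: str, data: list[tuple]) -> None:
        # asyncpg prepares the statement once and streams every row,
        # avoiding one network round-trip per INSERT.
        async with self.pool.acquire() as conn:
            async with conn.transaction():
                await conn.executemany(sql, data)
```

Running the batch inside a single transaction keeps the sub-batch atomic; a failure rolls back the whole sub-batch rather than leaving it partially written.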
Additional Notes
The sub-batching unit is currently synchronized with the embedding_batch_num (default 10-50), which provides a balanced load for both the LLM embedding phase and the subsequent database write phase.
This PR and PR summary were created with the help of Gemini, with some additional modifications.