chore: Updating VectorStore batch size to improve performance by jamie-ons · Pull Request #182 · datasciencecampus/classifai

jamie-ons · 2026-06-08T17:56:14Z

✨ Summary

VectorStore previously exposed batch_size as a repeated parameter on individual methods, creating multiple independent sources of truth. This PR consolidates that to a single value set at construction time.

To inform the choice of default, a profiling analysis was run across the target GCP instance range at batch sizes from 8 to 512. The default has been updated to the value that minimises search time without risking OOM on the smallest supported instances.

Constraints: must not break or perform significantly worse on 2 vCPU instances; optimised for typical cloud deployments at 4–8 vCPUs.
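The profiling described above could be approximated with a small harness like the one below. This is a minimal sketch, not the analysis script used for this PR: `profile_batch_sizes`, `run_search`, and `fake_search` are hypothetical names, the powers-of-two grid between 8 and 512 is an assumption, and `tracemalloc` only sees Python-level allocations, so native memory used by an embedding model would need a separate tool.

```python
import time
import tracemalloc


def profile_batch_sizes(run_search, batch_sizes=(8, 16, 32, 64, 128, 256, 512)):
    """Time run_search(batch_size) and record peak Python-level memory per size.

    The grid of batch sizes is an assumption; the PR only states the 8 to 512 range.
    """
    results = []
    for batch_size in batch_sizes:
        tracemalloc.start()
        start = time.perf_counter()
        run_search(batch_size)
        elapsed = time.perf_counter() - start
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        results.append(
            {"batch_size": batch_size, "seconds": elapsed, "peak_bytes": peak_bytes}
        )
    return results


def fake_search(batch_size):
    """Hypothetical stand-in workload; replace with a real VectorStore search call."""
    data = list(range(1024))
    for start in range(0, len(data), batch_size):
        sum(data[start:start + batch_size])


if __name__ == "__main__":
    for row in profile_batch_sizes(fake_search):
        print(row)
```

Running the same harness on each target instance type would give comparable latency and memory figures per batch size.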

📜 Changes Introduced

VectorStore methods updated to read batch_size from self.batch_size, which is set once in the constructor.
Profiling analysis across e2-standard-2, e2-medium, and e2-standard-8 measuring latency and memory
Default batch_size updated from 8 to based on analysis findings
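The first change in the list above can be illustrated with a minimal sketch. `BatchedStore`, `_batches`, and `embed_all` are hypothetical names rather than the actual classifai API, and the constructor argument is left without a default here because the new default is not stated in this description.

```python
from typing import Iterator, List, Sequence


class BatchedStore:
    """Hypothetical stand-in for VectorStore: batch_size lives only on the instance."""

    def __init__(self, batch_size: int):
        if batch_size < 1:
            raise ValueError("batch_size must be a positive integer")
        self.batch_size = batch_size

    def _batches(self, items: Sequence) -> Iterator[Sequence]:
        # Every method that needs batching goes through self.batch_size,
        # so there is one place to change it.
        for start in range(0, len(items), self.batch_size):
            yield items[start:start + self.batch_size]

    def embed_all(self, texts: Sequence[str]) -> List[int]:
        out: List[int] = []
        for batch in self._batches(texts):
            # Stand-in for a real vectoriser call on one batch.
            out.extend(len(text) for text in batch)
        return out
```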

✅ Checklist

Code passes linting with Ruff
DocStrings follow Google-style and are added as per Pylint recommendations
Documentation has been updated if needed

🔍 How to Test

To test this code, run the DEMO/general_workflow_demo.ipynb notebook.

To test that it does not break on small instances, run on the following GCP instances and locally:

e2-standard-2
e2-medium
e2-standard-8


lukeroantreeONS · 2026-06-09T10:50:25Z


        return result_df

-    def search(self, query: VectorStoreSearchInput, n_results=10, batch_size=8) -> VectorStoreSearchOutput:  # noqa: C901, PLR0912, PLR0915


I think we'd like to retain the option for users to specify a different batch size at this point, but we'd want the default behaviour to follow the single source of truth.
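One way to reconcile the reviewer's request with the single source of truth is an optional per-call override that falls back to the instance value. This is a sketch under assumptions, not the change made in this PR: `OverridableStore` is a hypothetical class, and `search` here only returns the query batches so the fallback behaviour is easy to see.

```python
from typing import List, Optional, Sequence


class OverridableStore:
    """Hypothetical stand-in for VectorStore with an optional per-call override."""

    def __init__(self, batch_size: int):
        self.batch_size = batch_size

    def search(
        self,
        queries: Sequence,
        n_results: int = 10,
        batch_size: Optional[int] = None,
    ) -> List[Sequence]:
        # None means "use the value set at construction time".
        effective = self.batch_size if batch_size is None else batch_size
        if effective < 1:
            raise ValueError("batch_size must be a positive integer")
        # A real implementation would embed and rank each batch, keeping
        # n_results matches per query; here the batches are simply returned.
        return [
            queries[start:start + effective]
            for start in range(0, len(queries), effective)
        ]
```

With this shape, `store.search(queries)` follows the constructor value, while `store.search(queries, batch_size=64)` overrides it for that call only.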

updated VectorStore methods to use self.batch_size so there is one single source of truth

5399d59

jamie-ons linked an issue Jun 8, 2026 that may be closed by this pull request

Review 'batch_size' behaviour #181

Open

lukeroantreeONS reviewed Jun 9, 2026

jamie-ons wants to merge 1 commit into main from 181-review-batch_size-behaviour
