This repository was archived by the owner on Apr 30, 2026. It is now read-only.
The current implementation of LLMBlock sends the entire dataset as a single large batch of requests to the OpenAI server. This may lead to some requests waiting too long for the response, resulting in timeout errors and potentially overloading the backend server with extremely large batches.
Proposed Changes
Add concurrent processing to LLMBlock using Python’s concurrent.futures package. The key changes are:
Uses concurrent.futures to manage parallel tasks, with a pool of threads launching them.
Allows users to specify the number of requests to send in each batch.
Allows users to specify the number of concurrent worker threads to handle batches.
Example Usage
If the user sets the concurrency to 8 and the batch size to 32, the system will run 8 concurrent threads, each sending a batch of 32 prompts, so up to 8 × 32 = 256 requests are in flight at the backend server at any one time.
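The batching scheme described above can be sketched roughly as follows. This is a minimal illustration, not the actual LLMBlock code: the names `chunk`, `send_batch`, `generate`, `batch_size` and `num_workers` are hypothetical, and `send_batch` is a stand-in for whatever call sends one batch of prompts to the server.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk(items, batch_size):
    """Split items into consecutive batches of at most batch_size."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def send_batch(prompts):
    # Hypothetical stand-in for one batched request to the backend server.
    return [f"response to {p}" for p in prompts]


def generate(prompts, batch_size=32, num_workers=8, send=send_batch):
    """Send prompts in batches, with at most num_workers batches in flight."""
    batches = chunk(prompts, batch_size)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Executor.map returns results in the order the batches were submitted,
        # so the flattened output lines up with the input prompts.
        results = pool.map(send, batches)
        return [response for batch in results for response in batch]


if __name__ == "__main__":
    outputs = generate([f"prompt {i}" for i in range(100)])
    print(len(outputs))
```

Threads are a reasonable fit here under the assumption that each worker spends most of its time waiting on a network response; the thread pool caps the number of batches outstanding at `num_workers`, regardless of how large the dataset is.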
From aakankshaduggal#8
ilab data generate batching should be disabled automatically with a remove llama-cpp endpoint instructlab#1892