fix: call warm_up_process to prevent loguru blocking#20
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a warm_up_process call in main_verl.py. Feedback recommends moving the import inside the run method for better Ray actor performance and resolving the configuration before the call to handle Hydra interpolations correctly.
| from loguru import logger | ||
| from omegaconf import OmegaConf | ||
| from verl.utils.fs import copy_to_local | ||
| warm_up_process(config) |
There was a problem hiding this comment.
The warm_up_process function uses configuration values such as experiment_dir and experiment_name to initialize logging. These values often contain Hydra interpolations (e.g., ${hydra:runtime.output_dir}) that must be resolved before they can be used as valid file system paths. Calling warm_up_process before OmegaConf.resolve(config) may lead to incorrect log paths or initialization failures. It is recommended to resolve the configuration first. Additionally, importing warm_up_process locally here maintains consistency with other imports in this method. Note that the existing call to OmegaConf.resolve(config) at line 124 will become redundant and should be removed.
| from loguru import logger | |
| from omegaconf import OmegaConf | |
| from verl.utils.fs import copy_to_local | |
| warm_up_process(config) | |
| from loguru import logger | |
| from omegaconf import OmegaConf | |
| from verl.utils.fs import copy_to_local | |
| from ajet.backbone.warm_up import warm_up_process | |
| OmegaConf.resolve(config) | |
| warm_up_process(config) |
| # Create training and validation datasets. | ||
| from ajet.backbone.warm_up import warm_up_process | ||
| from ajet.task_reader import RouterTaskReader, task_to_standard_dataset |
There was a problem hiding this comment.
The import of warm_up_process is currently placed between the comment # Create training and validation datasets. and the actual dataset-related imports, which is confusing. Furthermore, for Ray remote classes like TaskRunner, it is a common practice to keep imports local to the methods where they are used to keep the actor definition lightweight and avoid unnecessary side effects during module import on the head node. This import should be moved inside the run method.
| # Create training and validation datasets. | |
| from ajet.backbone.warm_up import warm_up_process | |
| from ajet.task_reader import RouterTaskReader, task_to_standard_dataset | |
| # Create training and validation datasets. | |
| from ajet.task_reader import RouterTaskReader, task_to_standard_dataset |
No description provided.