[megatron] make `empty_cuda_cache` default true for megatron by erictang000 · Pull Request #1731 · NovaSky-AI/SkyRL

erictang000 · 2026-05-29T20:44:16Z

This was originally added in #716, but was set to None by default during the config refactoring: d1e1b29

Setting it back to True, since emptying the cache can help avoid unnecessary OOMs during the optimizer step.

gemini-code-assist

Code Review

This pull request changes the default value of empty_cuda_cache from None to True in MegatronConfig. The reviewer notes that while this change helps prevent Out-Of-Memory (OOM) issues, enabling it by default can introduce significant CUDA synchronization overhead and degrade training throughput. Additionally, the reviewer suggests simplifying the type annotation from Optional[bool] to bool since None is no longer the default.

gemini-code-assist · 2026-05-29T20:44:54Z

        default_factory=lambda: copy.deepcopy(DEFAULT_TRANSFORMER_CONFIG_KWARGS)
    )
-    empty_cuda_cache: Optional[bool] = None
+    empty_cuda_cache: Optional[bool] = True


While setting empty_cuda_cache to True by default helps prevent Out-Of-Memory (OOM) issues during the optimization step, calling torch.cuda.empty_cache() after every forward and forward-backward pass (per mini-batch) introduces significant CUDA synchronization overhead, which can drastically degrade training throughput.
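Whether this overhead matters for a given workload can be checked by timing steps with and without the cache-clearing call. The helper below is a minimal sketch, not code from SkyRL: `mean_step_seconds`, `step`, `after_step`, and `sync` are hypothetical names, and the callbacks are passed in so the block runs without a GPU.

```python
import time
from typing import Callable, Optional


def mean_step_seconds(
    step: Callable[[], None],
    n_steps: int,
    after_step: Optional[Callable[[], None]] = None,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Return the mean wall-clock seconds per step over n_steps.

    after_step: optional hook run after each step (for example, a cache clear).
    sync: optional barrier run before starting and after finishing the loop,
          so that asynchronous device work is included in the measurement.
    """
    if n_steps <= 0:
        raise ValueError("n_steps must be positive")
    if sync is not None:
        sync()  # drain pending work before starting the clock
    start = time.perf_counter()
    for _ in range(n_steps):
        step()
        if after_step is not None:
            after_step()
    if sync is not None:
        sync()  # wait for queued work before stopping the clock
    return (time.perf_counter() - start) / n_steps
```

On a CUDA machine, one would presumably pass `torch.cuda.empty_cache` as `after_step` and `torch.cuda.synchronize` as `sync`, then compare the result against a run with `after_step=None`.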

Additionally, since None is no longer the default and is functionally treated as False in the worker implementation (if self.empty_cuda_cache:), we can simplify the type annotation from Optional[bool] to bool.

Suggested change

-    empty_cuda_cache: Optional[bool] = True
+    empty_cuda_cache: bool = True
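The point about `None` being treated as `False` can be illustrated with a small standalone sketch. `MegatronConfigSketch`, `run_step`, and `count_cache_clears` are hypothetical stand-ins, not the real SkyRL classes; only the truthiness check `if ...empty_cuda_cache:` is taken from the review comment, and the cache clear is replaced by a counter so the block runs without a GPU.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class MegatronConfigSketch:
    # Hypothetical stand-in for MegatronConfig; only this one field is modeled.
    # Optional[bool] is kept here so that None can be passed for comparison.
    empty_cuda_cache: Optional[bool] = True


def run_step(cfg: MegatronConfigSketch, empty_cache: Callable[[], None]) -> None:
    # Same style of truthiness check as quoted in the review:
    # None and False both skip the cache clear.
    if cfg.empty_cuda_cache:
        empty_cache()


def count_cache_clears(value: Optional[bool]) -> int:
    """Run one step with the given flag value and count cache-clear calls."""
    calls: List[int] = []
    cfg = MegatronConfigSketch(empty_cuda_cache=value)
    run_step(cfg, lambda: calls.append(1))
    return len(calls)
```

Because `None` and `False` take the same branch under this check, narrowing the annotation to `bool` would not change behavior for callers that pass `True` or `False` explicitly.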

make empty cuda cache default true for megatron

5f810d9

erictang000 merged commit 793a543 into NovaSky-AI:main May 29, 2026
2 of 3 checks passed

gemini-code-assist Bot reviewed May 29, 2026


erictang000 deleted the empty_cuda_cache branch May 29, 2026 20:44

erictang000 merged 1 commit into NovaSky-AI:main from erictang000:empty_cuda_cache
