Respect optional IsaacLab action bounds by HiccupRL · Pull Request #36 · typoverflow/flow-rl

HiccupRL · 2026-04-22T14:42:11Z

Summary

- Pass raw actions through the IsaacLab on-policy trainer when action_bound is unset.
- Keep the existing normalised-action path when action_bound is set by clipping to [-1, 1] before the environment wrapper scales the action.
- Mark action_bound as Optional[float] so null is a valid configuration value.
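
As a rough illustration of the last point, an optional bound might be declared as below. This is only a sketch: `EnvConfig` and its `task` field are hypothetical names and are not taken from flowrl/config/online/onpolicy_isaaclab_config.py; only the `action_bound: Optional[float]` annotation comes from the summary above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EnvConfig:
    """Hypothetical config class, used here only for illustration."""

    task: str
    # None (null in a YAML config) means "do not clip or rescale actions".
    action_bound: Optional[float] = None


# An unbounded task leaves the field unset; a bounded one supplies a float.
unbounded = EnvConfig(task="some-unbounded-task")
bounded = EnvConfig(task="some-bounded-task", action_bound=1.0)
```

With a plain `float` annotation, a null value would not match the declared type, which is presumably what the annotation change is meant to avoid.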

Rationale

IsaacLab does not impose a global [-1, 1] action limit for every task. Some tasks use unbounded policy actions and then apply task-specific scaling, action-term clipping, or actuator limits. The previous trainer behaviour clipped actions unconditionally before calling env.step, even when the wrapper was configured with action_bound=None.

That made action_bound=None ineffective and could change the interaction semantics for policies that intentionally emit actions outside [-1, 1], such as flow-based on-policy agents. With this change, the environment adapter remains responsible for bounded-action semantics: if action_bound is provided, actions are clipped and scaled; if it is not provided, actions are passed through unchanged.
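
The intended behaviour can be sketched roughly as follows. This is not the code in the PR: `prepare_action` is a hypothetical helper, plain Python lists stand in for the batched tensors the trainer actually uses, and the sketch assumes the wrapper scales a normalised action into the symmetric range [-action_bound, action_bound] by multiplication.

```python
from typing import List, Optional, Sequence


def prepare_action(action: Sequence[float], action_bound: Optional[float]) -> List[float]:
    """Hypothetical helper showing the two paths described above."""
    if action_bound is None:
        # No bound configured: pass the policy output through unchanged.
        return list(action)
    # Bound configured: clip to [-1, 1] first ...
    clipped = [max(-1.0, min(1.0, a)) for a in action]
    # ... then scale to [-action_bound, action_bound] (assumed symmetric range).
    return [a * action_bound for a in clipped]


raw = [2.5, -0.5]
passthrough = prepare_action(raw, None)  # unchanged
bounded = prepare_action(raw, 2.0)       # clipped, then scaled
```

In the actual change, the clipping lives in the trainer and the scaling in the environment wrapper; they are merged into one function here only to keep the example short.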

Validation

Ran a syntax check with Python compile() for:
- examples/online/main_isaaclab_onpolicy.py
- flowrl/config/online/onpolicy_isaaclab_config.py

typoverflow · 2026-04-22T17:28:34Z

Hi,

Thanks for capturing this. Yes, I was aware that the action spaces of IsaacLab environments are not necessarily bounded. The reason I imposed a bound here is that, in standard algorithms like PPO, the output range of our policy is always bounded to [-1, 1] because of tanh-squashing. In diffusion policies, it is also very common to bound the generated actions within a certain range, and we enabled this option (clip_samples=true) for every diffusion-based algorithm. Given the range of PPO policies, I decided to set this range to [-1, 1] as well. Accordingly, we have to impose some range on the action space of the envs so that our algorithms can behave normally.

We set an individual action_range for each of them (see the config file list) and rescale the [-1, 1] action to the given range in the environment wrapper.

flow-rl/flowrl/env/online/isaaclab_env.py

Line 78 in b1385e0

if self.action_bound is not None:

That said, I did not rigorously ablate the effect of this action clipping for tasks with unbounded ranges. Do you have specific observations where no action clipping performs better?

HiccupRL · 2026-04-23T07:04:21Z

Thanks for the explanation. My concern is that this may not be a good default for all IsaacLab tasks. Unlike MuJoCo or OGBench, IsaacLab actions are often interpreted through task-specific scales, offsets, or controllers, and some environments such as isaac-Humanoid can work better with the native/unbounded action interface. This may also matter for diffusion-policy methods like GenPO, where modelling the natural action scale can be beneficial.

So I would suggest making action clipping/rescaling optional and environment-specific, rather than enforcing [-1, 1] globally for IsaacLab.

typoverflow · 2026-04-23T19:31:35Z

Hey @HiccupRL, thanks for the further explanation! We will launch a battery of experiments without action range and action clipping. Just one more question: for PPO, do you suggest using an unbounded action distribution (like Gaussian instead of TanhGaussian) in that case?

HiccupRL · 2026-04-25T16:41:30Z

My view is that we should only clip actions when the environment itself enforces action bounds. Environments such as MuJoCo or OGBench may raise an error if an action falls outside [-1, 1], whereas Isaac Lab does not. In practice, leaving actions unclipped can yield better performance on some tasks, especially Humanoid. You can verify it by experiments.

typoverflow · 2026-04-25T18:36:47Z

I launched some experiments yesterday without action range and they seem to outperform the ones with action clipping. I will finalize the experiments and update the results in the coming week. We will merge this PR then.

Thanks again!

HiccupRL · 2026-04-26T07:28:43Z

Thanks a lot for the update, and also for carefully organizing this benchmark.

One small reminder: if we remove the action range / tanh constraint, the corresponding config files should also be updated accordingly. Also, for PPO, the log likelihood computation needs to be changed after removing tanh squashing, so that the likelihood ratio is computed under the actual action distribution.
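
To make the log-likelihood point concrete, here is a minimal sketch of the two densities for a diagonal Gaussian policy. The function names are hypothetical and this is not the flow-rl implementation; it only illustrates that the tanh-squashed log-probability carries a change-of-variables correction that has to be dropped once the squashing is removed.

```python
import math
from typing import Sequence


def _softplus(z: float) -> float:
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def gaussian_log_prob(u: Sequence[float], mean: Sequence[float], std: Sequence[float]) -> float:
    """Log-density of a diagonal Gaussian at u (the unsquashed case, a = u)."""
    return sum(
        -0.5 * ((x - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2.0 * math.pi)
        for x, m, s in zip(u, mean, std)
    )


def tanh_gaussian_log_prob(u: Sequence[float], mean: Sequence[float], std: Sequence[float]) -> float:
    """Log-density of a = tanh(u), evaluated from the pre-squash sample u."""
    # log(1 - tanh(x)^2) written in the stable form 2 * (log 2 - x - softplus(-2x)).
    correction = sum(2.0 * (math.log(2.0) - x - _softplus(-2.0 * x)) for x in u)
    return gaussian_log_prob(u, mean, std) - correction


u, mean, std = [1.0], [0.0], [1.0]
plain = gaussian_log_prob(u, mean, std)
squashed = tanh_gaussian_log_prob(u, mean, std)
```

Under this sketch, whichever density matches the action actually sent to the environment would feed the PPO ratio `exp(new_log_prob - old_log_prob)`.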

Respect optional IsaacLab action bounds

c7b790c

HiccupRL closed this Apr 22, 2026

HiccupRL reopened this Apr 22, 2026

HiccupRL wants to merge 1 commit into typoverflow:master from HiccupRL:codex-conditional-action-bound-clipping
