[WIP, just being used to read code] #366

Closed
eugenevinitsky wants to merge 22 commits into 3.0 from ricky/actions_trajectory_3

Conversation

@eugenevinitsky

No description provided.

Copilot AI review requested due to automatic review settings March 24, 2026 11:10

Copilot AI left a comment


Pull request overview

This WIP PR introduces “action trajectory” support (predicting a sequence of future actions instead of a single action), threads that through the Drive policy/training loop, and adds C/Python plumbing to visualize predicted trajectories in the Ocean Drive renderer.

Changes:

  • Extend action sampling to emit an action trajectory of configurable length.
  • Augment observations with the previous predicted action trajectory and add a trajectory-consistency reward term during training.
  • Add Drive-side predicted-trajectory storage + rendering, including a Python API and a C extension setter.

Reviewed changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated 16 comments.

Show a summary per file

  • pufferlib/pytorch.py: Updates sample_logits to support sampling full action trajectories.
  • pufferlib/pufferl.py: Augments obs/actions buffers for trajectories; adds trajectory rollout + commitment reward; updates eval/train sampling.
  • pufferlib/ocean/torch.py: Updates Drive policy to encode past action trajectories and output trajectory logits.
  • pufferlib/ocean/env_config.h: Adds actions_trajectory_length parsing from [train] to env init config.
  • pufferlib/ocean/env_binding.h: Adds vec_set_predicted_trajectories to copy predicted traj coords into C envs.
  • pufferlib/ocean/drive/visualize.c: Threads traj length into env/net init; renders predicted trajectories.
  • pufferlib/ocean/drive/python_dynamics.py: New: PyTorch rollout + commitment loss utilities for trajectory shaping.
  • pufferlib/ocean/drive/drivenet.h: Extends DriveNet to output per-step logits and maintain past-actions state.
  • pufferlib/ocean/drive/drive.py: Adds Python wrapper method set_predicted_trajectories.
  • pufferlib/ocean/drive/drive.h: Adds predicted trajectory buffers + rollout helper + rendering in draw_scene.
  • pufferlib/ocean/drive/drive.c: Updates drivenet init calls to pass trajectory length.
  • pufferlib/ocean/benchmark/evaluator.py: Passes actions_trajectory_length into sampling during evaluation rollouts.
  • pufferlib/models.py: Updates Default and LSTMWrapper observation sizing for trajectory augmentation.
  • pufferlib/config/ocean/drive.ini: Adjusts training hyperparams; adds trajectory-length and loss config knobs.
  • .gitignore: Ignores local experiment launcher artifacts.


Comment thread pufferlib/pufferl.py
self.values = torch.zeros(segments, horizon, device=device)
self.logprobs = torch.zeros(segments, horizon, device=device)
self.rewards = torch.zeros(segments, horizon, device=device)
self.r_commitments = torch.zeros(segments, horizon, device=device)

Copilot AI Mar 24, 2026


self.r_commitments is allocated but never read or written anywhere else in this file. If it’s no longer needed, remove it to avoid confusion; if it’s intended for logging/debugging, wire it into the training loop (e.g., store r_commitment per-step).

Suggested change
- self.r_commitments = torch.zeros(segments, horizon, device=device)
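If the buffer is instead kept for logging, a minimal sketch of the per-step write the comment suggests; the shapes and names here are assumptions based on the snippet above, not the PR's actual rollout loop:

```python
import torch

segments, horizon = 4, 3
rewards = torch.zeros(segments, horizon)
r_commitments = torch.zeros(segments, horizon)

# Mirror how self.rewards is filled during rollout: one write per step.
step = 0
r_commitment = torch.full((segments,), 0.5)  # placeholder per-agent commitment reward
r_commitments[:, step] = r_commitment
```

The mean over the horizon could then be logged alongside the other rollout stats.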

Copilot uses AI. Check for mistakes.
Comment thread pufferlib/pufferl.py
Comment on lines +121 to +131
self.actions_trajectory_length = config.get("actions_trajectory_length", 80)
atn_traj_size = (  # TODO: please let's be precise with the semantics before checking this in
    self.actions_trajectory_length,
    2,  # 2 real numbers for accel and steering
)  # TODO: generalize to continuous case
assert len(obs_space.shape) == 1
self.obs_len = obs_space.shape[0]
obs_aug_shape = (
    obs_space.shape[0] + np.prod(atn_traj_size),
)  # assumes obs_space is 1d and we flatten the action trajectory


Copilot AI Mar 24, 2026


The new observation augmentation and assertions make PuffeRL effectively Drive-specific (assert len(obs_space.shape) == 1, hard-coded atn_traj_size=(traj_len,2)). This will fail for non-1D observations (e.g., images/dicts) and non-Drive action spaces. Consider gating this entire augmentation/trajectory-loss path behind an explicit config flag or env check, and keep the default behavior compatible with existing environments.
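A minimal sketch of gating the augmentation behind a flag, as the comment suggests; build_obs_aug_shape and the (traj_len, 2) layout are hypothetical names and assumptions, not PuffeRL's actual API:

```python
import numpy as np

def build_obs_aug_shape(obs_space, config):
    """Return (augmented_shape, aug_len); fall back to the unmodified
    space when trajectory augmentation is disabled or unsupported."""
    traj_len = config.get("actions_trajectory_length", 0)
    if traj_len <= 0 or len(obs_space.shape) != 1:
        # Non-1D obs (images, dicts) or feature disabled: no augmentation.
        return obs_space.shape, 0
    atn_traj_size = (traj_len, 2)  # accel, steering (Drive-specific assumption)
    aug_len = int(np.prod(atn_traj_size))
    return (obs_space.shape[0] + aug_len,), aug_len
```

Existing environments would keep their original observation shape unless the config opts in.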

Comment thread pufferlib/ocean/env_binding.h
Comment on lines +929 to +937
float *x_base = (float *)PyArray_DATA((PyArrayObject *)x_arr);
float *y_base = (float *)PyArray_DATA((PyArrayObject *)y_arr);

int offset = 0;
for (int i = 0; i < vec->num_envs; i++) {
    Drive *drive = (Drive *)vec->envs[i];
    int n = drive->active_agent_count * drive->predicted_traj_len;
    memcpy(drive->predicted_traj_x, &x_base[offset], n * sizeof(float));
    memcpy(drive->predicted_traj_y, &y_base[offset], n * sizeof(float));

Copilot AI Mar 24, 2026


vec_set_predicted_trajectories memcpy’s from the provided NumPy arrays without validating dtype (e.g., float32), contiguity, or that the arrays are large enough for the computed offset + n. If callers pass the wrong dtype/shape, this can read past the buffer and crash the process. Consider checking PyArray_TYPE, PyArray_NDIM/SHAPE, PyArray_ISCARRAY, and PyArray_SIZE before memcpy.

Suggested change
- float *x_base = (float *)PyArray_DATA((PyArrayObject *)x_arr);
- float *y_base = (float *)PyArray_DATA((PyArrayObject *)y_arr);
- int offset = 0;
- for (int i = 0; i < vec->num_envs; i++) {
-     Drive *drive = (Drive *)vec->envs[i];
-     int n = drive->active_agent_count * drive->predicted_traj_len;
-     memcpy(drive->predicted_traj_x, &x_base[offset], n * sizeof(float));
-     memcpy(drive->predicted_traj_y, &y_base[offset], n * sizeof(float));
+ PyArrayObject *x_arr_obj = (PyArrayObject *)x_arr;
+ PyArrayObject *y_arr_obj = (PyArrayObject *)y_arr;
+ if (PyArray_TYPE(x_arr_obj) != NPY_FLOAT32 || PyArray_TYPE(y_arr_obj) != NPY_FLOAT32) {
+     PyErr_SetString(PyExc_TypeError, "x and y must be NumPy arrays of dtype float32");
+     return NULL;
+ }
+ if (!PyArray_ISCARRAY(x_arr_obj) || !PyArray_ISCARRAY(y_arr_obj)) {
+     PyErr_SetString(PyExc_TypeError, "x and y must be contiguous float32 NumPy arrays");
+     return NULL;
+ }
+ npy_intp size_x = PyArray_SIZE(x_arr_obj);
+ npy_intp size_y = PyArray_SIZE(y_arr_obj);
+ float *x_base = (float *)PyArray_DATA(x_arr_obj);
+ float *y_base = (float *)PyArray_DATA(y_arr_obj);
+ npy_intp offset = 0;
+ for (int i = 0; i < vec->num_envs; i++) {
+     Drive *drive = (Drive *)vec->envs[i];
+     npy_intp n = (npy_intp)drive->active_agent_count * drive->predicted_traj_len;
+     if (n < 0 || offset > size_x - n || offset > size_y - n) {
+         PyErr_SetString(PyExc_ValueError, "Input trajectory arrays are too small for predicted trajectories");
+         return NULL;
+     }
+     memcpy(drive->predicted_traj_x, &x_base[offset], (size_t)n * sizeof(float));
+     memcpy(drive->predicted_traj_y, &y_base[offset], (size_t)n * sizeof(float));

Comment thread pufferlib/pufferl.py
import os
import sys
import glob
import json

Copilot AI Mar 24, 2026


Top-level import json appears unused (JSON is imported inside functions, and the only other references here are in commented-out code). Removing it avoids an unnecessary import and keeps the module’s dependencies minimal.

Suggested change
- import json

Comment thread pufferlib/pufferl.py
Comment on lines +69 to +77
def convert_single_action_integers_to_r2(actions: torch.Tensor) -> torch.Tensor:
    accel = actions // 13
    steering = actions % 13
    return torch.cat([accel.unsqueeze(-1), steering.unsqueeze(-1)], dim=-1)


def convert_bptt_action_integers_to_r2(actions: torch.Tensor) -> torch.Tensor:
    accel = actions // 13
    steering = actions % 13

Copilot AI Mar 24, 2026


These action decoding helpers hard-code 13 (steering bins) and assume the CLASSIC discrete joint action encoding. This will be incorrect for dynamics_model=jerk (3 lateral bins) and for continuous actions. Consider deriving the split based on the actual env action space / dynamics model (e.g., from vecenv.driver_env config) instead of a fixed constant.

Suggested change
- def convert_single_action_integers_to_r2(actions: torch.Tensor) -> torch.Tensor:
-     accel = actions // 13
-     steering = actions % 13
-     return torch.cat([accel.unsqueeze(-1), steering.unsqueeze(-1)], dim=-1)
- def convert_bptt_action_integers_to_r2(actions: torch.Tensor) -> torch.Tensor:
-     accel = actions // 13
-     steering = actions % 13
+ def convert_single_action_integers_to_r2(
+     actions: torch.Tensor,
+     steering_bins: int = 13,
+ ) -> torch.Tensor:
+     """Convert scalar discrete action indices into (accel, steering) pairs.
+     The steering bin count defaults to 13 (CLASSIC encoding) but can be
+     overridden for other dynamics models (e.g., jerk with 3 lateral bins).
+     """
+     accel = actions // steering_bins
+     steering = actions % steering_bins
+     return torch.cat([accel.unsqueeze(-1), steering.unsqueeze(-1)], dim=-1)
+ def convert_bptt_action_integers_to_r2(
+     actions: torch.Tensor,
+     steering_bins: int = 13,
+ ) -> torch.Tensor:
+     """Convert BPTT-discounted discrete action indices into (accel, steering).
+     The steering bin count defaults to 13 (CLASSIC encoding) but can be
+     overridden for other dynamics models.
+     """
+     accel = actions // steering_bins
+     steering = actions % steering_bins

Comment thread pufferlib/pufferl.py
Comment on lines +436 to +448
action_N2 = convert_bptt_action_integers_to_r2(self.actions)
traj, heading = rollout_state_trajectory_ego(action_N2, observations=self.observations)
r_commitment = compute_l2_loss_ego_action_traj(
    traj,
    self.prev_state_traj,
    heading,
    torch.cat([self.prev_terminals, self.terminals[:, :-1]], dim=1),
    trajectory_loss_norm=config["trajectory_loss_norm"],
    trajectory_loss_clamp_min=config["trajectory_loss_clamp_min"],
)
r_commitment = r_commitment.detach()
self.prev_state_traj = traj[:, -1, ...]
self.prev_terminals = self.terminals[:, [-1]]

Copilot AI Mar 24, 2026


self.prev_state_traj is initialized with shape (total_agents, traj_len, 2) but here it’s used alongside traj whose batch dimension is segments (from self.actions/self.observations). If segments != total_agents (which is allowed as long as total_agents <= segments), compute_l2_loss_ego_action_traj will hit a shape mismatch on the first epoch. Consider making prev_state_traj consistently indexed by segments (or by the same row-indexing scheme as self.actions/self.observations) from initialization onward.
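One way to keep the batch dimension consistent is to allocate prev_state_traj by segments from the start. A sketch under that assumption (the helper name is hypothetical; the -1.0 sentinel mirrors the prev_traj[done_mask] = -1.0 reset logic mentioned elsewhere in this review):

```python
import torch

def init_prev_state_traj(segments: int, traj_len: int, device: str = "cpu") -> torch.Tensor:
    """Allocate the previous-trajectory buffer indexed by segments, matching
    the row indexing of self.actions / self.observations. The -1.0 sentinel
    marks rows that have no previous trajectory yet."""
    return torch.full((segments, traj_len, 2), -1.0, device=device)
```

With this, traj (batch dim = segments) and prev_state_traj always share a batch dimension, even when total_agents < segments.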

Comment thread pufferlib/pufferl.py
Comment on lines +154 to +161
).softmax(dim=-1)

self.prev_state_traj = torch.ones(
    total_agents,
    *atn_traj_size,
    device=device,
    dtype=pufferlib.pytorch.numpy_to_torch_dtype_dict[obs_space.dtype],
).softmax(dim=-1)

Copilot AI Mar 24, 2026


prev_actions_traj/prev_state_traj are initialized with -torch.ones(...).softmax(dim=-1), which converts the intended sentinel values into a valid probability distribution (e.g., [0.5, 0.5]) and makes later logic like prev_traj[done_mask] = -1.0 inconsistent. If these tensors are meant to store raw previous actions / coordinates or a sentinel, initialize them directly (e.g., -1 or 0) without softmax.

Suggested change
- ).softmax(dim=-1)
- self.prev_state_traj = torch.ones(
-     total_agents,
-     *atn_traj_size,
-     device=device,
-     dtype=pufferlib.pytorch.numpy_to_torch_dtype_dict[obs_space.dtype],
- ).softmax(dim=-1)
+ )
+ self.prev_state_traj = -torch.ones(
+     total_agents,
+     *atn_traj_size,
+     device=device,
+     dtype=pufferlib.pytorch.numpy_to_torch_dtype_dict[obs_space.dtype],
+ )

Comment thread pufferlib/pufferl.py
loss = (
    pg_loss
    + config["vf_coef"] * v_loss
    - config["ent_coef"] * entropy_loss / config["actions_trajectory_length"]

Copilot AI Mar 24, 2026


Entropy scaling divides by config["actions_trajectory_length"], but elsewhere you treat this setting as optional via config.get(...). This can raise a KeyError for configs that don’t define it. Prefer self.actions_trajectory_length (or config.get(..., default)) for consistency.

Suggested change
-     - config["ent_coef"] * entropy_loss / config["actions_trajectory_length"]
+     - config["ent_coef"] * entropy_loss / self.actions_trajectory_length

Comment thread pufferlib/pufferl.py
Comment on lines +1406 to +1408
action, logprob, _, _ = pufferlib.pytorch.sample_logits(
    logits, actions_trajectory_length=args["train"].get("actions_trajectory_length", 80)
)

Copilot AI Mar 24, 2026


This call unpacks 4 values from sample_logits, but pufferlib.pytorch.sample_logits returns 3 (action, logprob, entropy). Also, when actions_trajectory_length > 1, action will be a trajectory (e.g., shape [B, K]) and reshaping it directly to vecenv.action_space.shape will fail; you likely need to select the first action before stepping the env (similar to what PuffeRL.evaluate does).

Suggested change
- action, logprob, _, _ = pufferlib.pytorch.sample_logits(
-     logits, actions_trajectory_length=args["train"].get("actions_trajectory_length", 80)
- )
+ action, logprob, entropy = pufferlib.pytorch.sample_logits(
+     logits, actions_trajectory_length=args["train"].get("actions_trajectory_length", 80)
+ )
+ actions_traj_len = args["train"].get("actions_trajectory_length", 80)
+ if actions_traj_len > 1 and isinstance(action, torch.Tensor) and action.ndim >= 2:
+     # When a trajectory of actions is returned, select the first action
+     action = action[:, 0, ...]

Comment thread pufferlib/ocean/drive/drive.py
Comment on lines +636 to +638
        traj_x: numpy float32 array of shape (total_agents * traj_len,) with world x coords
        traj_y: numpy float32 array of shape (total_agents * traj_len,) with world y coords
        """

Copilot AI Mar 24, 2026


set_predicted_trajectories forwards traj_x/traj_y directly into the C extension, which currently memcpy’s without dtype/size validation. To prevent hard crashes, coerce inputs to contiguous np.float32 (and ideally validate length against sum(active_agent_count) * predicted_traj_len) before calling into binding.vec_set_predicted_trajectories.

Suggested change
- traj_x: numpy float32 array of shape (total_agents * traj_len,) with world x coords
- traj_y: numpy float32 array of shape (total_agents * traj_len,) with world y coords
- """
+ traj_x: numpy array-like of shape (total_agents * traj_len,) with world x coords.
+ traj_y: numpy array-like of shape (total_agents * traj_len,) with world y coords.
+ """
+ # Coerce inputs to contiguous float32 arrays to satisfy C extension expectations.
+ traj_x = np.ascontiguousarray(traj_x, dtype=np.float32)
+ traj_y = np.ascontiguousarray(traj_y, dtype=np.float32)
+ # Validate length against env metadata when available.
+ if hasattr(self, "active_agent_count") and hasattr(self, "predicted_traj_len"):
+     expected_len = int(np.sum(self.active_agent_count)) * int(self.predicted_traj_len)
+     if traj_x.size != expected_len or traj_y.size != expected_len:
+         raise ValueError(
+             f"Predicted trajectory arrays have incorrect length: "
+             f"expected {expected_len}, got traj_x.size={traj_x.size}, "
+             f"traj_y.size={traj_y.size}"
+         )
