Pull request overview
This WIP PR introduces “action trajectory” support (predicting a sequence of future actions instead of a single action), threads that through the Drive policy/training loop, and adds C/Python plumbing to visualize predicted trajectories in the Ocean Drive renderer.
Changes:
- Extend action sampling to emit an action trajectory of configurable length.
- Augment observations with the previous predicted action trajectory and add a trajectory-consistency reward term during training.
- Add Drive-side predicted-trajectory storage + rendering, including a Python API and a C extension setter.
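As a rough illustration of the trajectory-consistency idea above (the function and the list-of-waypoints format are hypothetical sketches, not the PR's actual implementation), a commitment penalty can compare the new plan against the tail of the previous one:

```python
# Hypothetical sketch of a trajectory-consistency ("commitment") penalty:
# the previous plan's step k+1 should line up with the new plan's step k,
# so deviating from the earlier prediction is penalized.
def commitment_penalty(prev_traj, new_traj):
    """prev_traj/new_traj: equal-length lists of (x, y) waypoints."""
    overlap = zip(prev_traj[1:], new_traj[:-1])
    return sum(
        ((px - nx) ** 2 + (py - ny) ** 2) ** 0.5
        for (px, py), (nx, ny) in overlap
    )
```

A policy that simply replays its previous plan shifted forward by one step incurs zero penalty under this scheme.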
Reviewed changes
Copilot reviewed 14 out of 15 changed files in this pull request and generated 16 comments.
| File | Description |
|---|---|
| pufferlib/pytorch.py | Updates sample_logits to support sampling full action trajectories. |
| pufferlib/pufferl.py | Augments obs/actions buffers for trajectories; adds trajectory rollout + commitment reward; updates eval/train sampling. |
| pufferlib/ocean/torch.py | Updates Drive policy to encode past action trajectories and output trajectory logits. |
| pufferlib/ocean/env_config.h | Adds actions_trajectory_length parsing from [train] to env init config. |
| pufferlib/ocean/env_binding.h | Adds vec_set_predicted_trajectories to copy predicted traj coords into C envs. |
| pufferlib/ocean/drive/visualize.c | Threads traj length into env/net init; renders predicted trajectories. |
| pufferlib/ocean/drive/python_dynamics.py | New: PyTorch rollout + commitment loss utilities for trajectory shaping. |
| pufferlib/ocean/drive/drivenet.h | Extends DriveNet to output per-step logits and maintain past-actions state. |
| pufferlib/ocean/drive/drive.py | Adds Python wrapper method set_predicted_trajectories. |
| pufferlib/ocean/drive/drive.h | Adds predicted trajectory buffers + rollout helper + rendering in draw_scene. |
| pufferlib/ocean/drive/drive.c | Updates drivenet init calls to pass trajectory length. |
| pufferlib/ocean/benchmark/evaluator.py | Passes actions_trajectory_length into sampling during evaluation rollouts. |
| pufferlib/models.py | Updates Default and LSTMWrapper observation sizing for trajectory augmentation. |
| pufferlib/config/ocean/drive.ini | Adjusts training hyperparams; adds trajectory-length and loss config knobs. |
| .gitignore | Ignores local experiment launcher artifacts. |
```python
self.values = torch.zeros(segments, horizon, device=device)
self.logprobs = torch.zeros(segments, horizon, device=device)
self.rewards = torch.zeros(segments, horizon, device=device)
self.r_commitments = torch.zeros(segments, horizon, device=device)
```
self.r_commitments is allocated but never read or written anywhere else in this file. If it’s no longer needed, remove it to avoid confusion; if it’s intended for logging/debugging, wire it into the training loop (e.g., store r_commitment per-step).
Suggested change:
```diff
-self.r_commitments = torch.zeros(segments, horizon, device=device)
```
```python
self.actions_trajectory_length = config.get("actions_trajectory_length", 80)
atn_traj_size = (  # TODO: please let's be precise with the semantics before checking this in
    self.actions_trajectory_length,
    2,  # 2 real numbers for accel and steering
)  # TODO: generalize to continuous case
assert len(obs_space.shape) == 1
self.obs_len = obs_space.shape[0]
obs_aug_shape = (
    obs_space.shape[0] + np.prod(atn_traj_size),
)  # assumes obs_space is 1d and we flatten the action trajectory
```
The new observation augmentation and assertions make PuffeRL effectively Drive-specific (assert len(obs_space.shape) == 1, hard-coded atn_traj_size=(traj_len,2)). This will fail for non-1D observations (e.g., images/dicts) and non-Drive action spaces. Consider gating this entire augmentation/trajectory-loss path behind an explicit config flag or env check, and keep the default behavior compatible with existing environments.
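A minimal sketch of such a gate (helper name and config keys are hypothetical), keeping the default path untouched for non-1D observations and for configs that never enable trajectories:

```python
# Hypothetical sketch: only augment 1D observation spaces, and only when the
# trajectory feature is explicitly enabled; all other envs keep their shape.
def augmented_obs_shape(obs_shape, config):
    traj_len = config.get("actions_trajectory_length", 0)
    if traj_len <= 0 or len(obs_shape) != 1:
        return obs_shape  # default behavior for existing environments
    return (obs_shape[0] + traj_len * 2,)  # flatten (traj_len, 2) accel/steering
```

With a gate like this, image or dict observation spaces simply fall through to the existing code path instead of tripping the assertion.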
```c
float *x_base = (float *)PyArray_DATA((PyArrayObject *)x_arr);
float *y_base = (float *)PyArray_DATA((PyArrayObject *)y_arr);

int offset = 0;
for (int i = 0; i < vec->num_envs; i++) {
    Drive *drive = (Drive *)vec->envs[i];
    int n = drive->active_agent_count * drive->predicted_traj_len;
    memcpy(drive->predicted_traj_x, &x_base[offset], n * sizeof(float));
    memcpy(drive->predicted_traj_y, &y_base[offset], n * sizeof(float));
```
vec_set_predicted_trajectories memcpy’s from the provided NumPy arrays without validating dtype (e.g., float32), contiguity, or that the arrays are large enough for the computed offset + n. If callers pass the wrong dtype/shape, this can read past the buffer and crash the process. Consider checking PyArray_TYPE, PyArray_NDIM/SHAPE, PyArray_ISCARRAY, and PyArray_SIZE before memcpy.
Suggested change:
```diff
-float *x_base = (float *)PyArray_DATA((PyArrayObject *)x_arr);
-float *y_base = (float *)PyArray_DATA((PyArrayObject *)y_arr);
-int offset = 0;
-for (int i = 0; i < vec->num_envs; i++) {
-    Drive *drive = (Drive *)vec->envs[i];
-    int n = drive->active_agent_count * drive->predicted_traj_len;
-    memcpy(drive->predicted_traj_x, &x_base[offset], n * sizeof(float));
-    memcpy(drive->predicted_traj_y, &y_base[offset], n * sizeof(float));
+PyArrayObject *x_arr_obj = (PyArrayObject *)x_arr;
+PyArrayObject *y_arr_obj = (PyArrayObject *)y_arr;
+if (PyArray_TYPE(x_arr_obj) != NPY_FLOAT32 || PyArray_TYPE(y_arr_obj) != NPY_FLOAT32) {
+    PyErr_SetString(PyExc_TypeError, "x and y must be NumPy arrays of dtype float32");
+    return NULL;
+}
+if (!PyArray_ISCARRAY(x_arr_obj) || !PyArray_ISCARRAY(y_arr_obj)) {
+    PyErr_SetString(PyExc_TypeError, "x and y must be contiguous float32 NumPy arrays");
+    return NULL;
+}
+npy_intp size_x = PyArray_SIZE(x_arr_obj);
+npy_intp size_y = PyArray_SIZE(y_arr_obj);
+float *x_base = (float *)PyArray_DATA(x_arr_obj);
+float *y_base = (float *)PyArray_DATA(y_arr_obj);
+npy_intp offset = 0;
+for (int i = 0; i < vec->num_envs; i++) {
+    Drive *drive = (Drive *)vec->envs[i];
+    npy_intp n = (npy_intp)(drive->active_agent_count * drive->predicted_traj_len);
+    if (n < 0 || offset > size_x - n || offset > size_y - n) {
+        PyErr_SetString(PyExc_ValueError, "Input trajectory arrays are too small for predicted trajectories");
+        return NULL;
+    }
+    memcpy(drive->predicted_traj_x, &x_base[offset], (size_t)n * sizeof(float));
+    memcpy(drive->predicted_traj_y, &y_base[offset], (size_t)n * sizeof(float));
```
```python
import os
import sys
import glob
import json
```
Top-level import json appears unused (JSON is imported inside functions, and the only other references here are in commented-out code). Removing it avoids an unnecessary import and keeps the module’s dependencies minimal.
Suggested change:
```diff
-import json
```
```python
def convert_single_action_integers_to_r2(actions: torch.Tensor) -> torch.Tensor:
    accel = actions // 13
    steering = actions % 13
    return torch.cat([accel.unsqueeze(-1), steering.unsqueeze(-1)], dim=-1)


def convert_bptt_action_integers_to_r2(actions: torch.Tensor) -> torch.Tensor:
    accel = actions // 13
    steering = actions % 13
```
These action decoding helpers hard-code 13 (steering bins) and assume the CLASSIC discrete joint action encoding. This will be incorrect for dynamics_model=jerk (3 lateral bins) and for continuous actions. Consider deriving the split based on the actual env action space / dynamics model (e.g., from vecenv.driver_env config) instead of a fixed constant.
Suggested change:
```diff
-def convert_single_action_integers_to_r2(actions: torch.Tensor) -> torch.Tensor:
-    accel = actions // 13
-    steering = actions % 13
-    return torch.cat([accel.unsqueeze(-1), steering.unsqueeze(-1)], dim=-1)
-def convert_bptt_action_integers_to_r2(actions: torch.Tensor) -> torch.Tensor:
-    accel = actions // 13
-    steering = actions % 13
+def convert_single_action_integers_to_r2(
+    actions: torch.Tensor,
+    steering_bins: int = 13,
+) -> torch.Tensor:
+    """Convert scalar discrete action indices into (accel, steering) pairs.
+
+    The steering bin count defaults to 13 (CLASSIC encoding) but can be
+    overridden for other dynamics models (e.g., jerk with 3 lateral bins).
+    """
+    accel = actions // steering_bins
+    steering = actions % steering_bins
+    return torch.cat([accel.unsqueeze(-1), steering.unsqueeze(-1)], dim=-1)
+
+
+def convert_bptt_action_integers_to_r2(
+    actions: torch.Tensor,
+    steering_bins: int = 13,
+) -> torch.Tensor:
+    """Convert BPTT-discounted discrete action indices into (accel, steering).
+
+    The steering bin count defaults to 13 (CLASSIC encoding) but can be
+    overridden for other dynamics models.
+    """
+    accel = actions // steering_bins
+    steering = actions % steering_bins
```
```python
action_N2 = convert_bptt_action_integers_to_r2(self.actions)
traj, heading = rollout_state_trajectory_ego(action_N2, observations=self.observations)
r_commitment = compute_l2_loss_ego_action_traj(
    traj,
    self.prev_state_traj,
    heading,
    torch.cat([self.prev_terminals, self.terminals[:, :-1]], dim=1),
    trajectory_loss_norm=config["trajectory_loss_norm"],
    trajectory_loss_clamp_min=config["trajectory_loss_clamp_min"],
)
r_commitment = r_commitment.detach()
self.prev_state_traj = traj[:, -1, ...]
self.prev_terminals = self.terminals[:, [-1]]
```
self.prev_state_traj is initialized with shape (total_agents, traj_len, 2) but here it’s used alongside traj whose batch dimension is segments (from self.actions/self.observations). If segments != total_agents (which is allowed as long as total_agents <= segments), compute_l2_loss_ego_action_traj will hit a shape mismatch on the first epoch. Consider making prev_state_traj consistently indexed by segments (or by the same row-indexing scheme as self.actions/self.observations) from initialization onward.
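One way to make the indexing consistent is to derive the buffer's leading dimension from the rollout buffers themselves; a minimal sketch (helper name hypothetical, shown here as pure shape arithmetic):

```python
# Hypothetical sketch: key prev_state_traj by the rollout's segments dimension
# (the same leading dimension as self.actions) rather than total_agents, so any
# trajectory computed from the rollout buffers always matches it in shape.
def prev_state_traj_shape(actions_shape, traj_len, coords=2):
    segments = actions_shape[0]  # same row indexing as self.actions
    return (segments, traj_len, coords)
```

For example, with `self.actions` of shape `(256, 128)` and a trajectory length of 80, the buffer would be allocated as `(256, 80, 2)` regardless of `total_agents`.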
```python
).softmax(dim=-1)

self.prev_state_traj = torch.ones(
    total_agents,
    *atn_traj_size,
    device=device,
    dtype=pufferlib.pytorch.numpy_to_torch_dtype_dict[obs_space.dtype],
).softmax(dim=-1)
```
prev_actions_traj/prev_state_traj are initialized with -torch.ones(...).softmax(dim=-1), which converts the intended sentinel values into a valid probability distribution (e.g., [0.5, 0.5]) and makes later logic like prev_traj[done_mask] = -1.0 inconsistent. If these tensors are meant to store raw previous actions / coordinates or a sentinel, initialize them directly (e.g., -1 or 0) without softmax.
Suggested change:
```diff
-).softmax(dim=-1)
-self.prev_state_traj = torch.ones(
-    total_agents,
-    *atn_traj_size,
-    device=device,
-    dtype=pufferlib.pytorch.numpy_to_torch_dtype_dict[obs_space.dtype],
-).softmax(dim=-1)
+)
+self.prev_state_traj = -torch.ones(
+    total_agents,
+    *atn_traj_size,
+    device=device,
+    dtype=pufferlib.pytorch.numpy_to_torch_dtype_dict[obs_space.dtype],
+)
```
```python
loss = (
    pg_loss
    + config["vf_coef"] * v_loss
    - config["ent_coef"] * entropy_loss / config["actions_trajectory_length"]
```
Entropy scaling divides by config["actions_trajectory_length"], but elsewhere you treat this setting as optional via config.get(...). This can raise a KeyError for configs that don’t define it. Prefer self.actions_trajectory_length (or config.get(..., default)) for consistency.
Suggested change:
```diff
-    - config["ent_coef"] * entropy_loss / config["actions_trajectory_length"]
+    - config["ent_coef"] * entropy_loss / self.actions_trajectory_length
```
```python
action, logprob, _, _ = pufferlib.pytorch.sample_logits(
    logits, actions_trajectory_length=args["train"].get("actions_trajectory_length", 80)
)
```
This call unpacks 4 values from sample_logits, but pufferlib.pytorch.sample_logits returns 3 (action, logprob, entropy). Also, when actions_trajectory_length > 1, action will be a trajectory (e.g., shape [B, K]) and reshaping it directly to vecenv.action_space.shape will fail; you likely need to select the first action before stepping the env (similar to what PuffeRL.evaluate does).
Suggested change:
```diff
-action, logprob, _, _ = pufferlib.pytorch.sample_logits(
-    logits, actions_trajectory_length=args["train"].get("actions_trajectory_length", 80)
-)
+action, logprob, entropy = pufferlib.pytorch.sample_logits(
+    logits, actions_trajectory_length=args["train"].get("actions_trajectory_length", 80)
+)
+actions_traj_len = args["train"].get("actions_trajectory_length", 80)
+if actions_traj_len > 1 and isinstance(action, torch.Tensor) and action.ndim >= 2:
+    # When a trajectory of actions is returned, select the first action
+    action = action[:, 0, ...]
```
```python
    traj_x: numpy float32 array of shape (total_agents * traj_len,) with world x coords
    traj_y: numpy float32 array of shape (total_agents * traj_len,) with world y coords
    """
```
set_predicted_trajectories forwards traj_x/traj_y directly into the C extension, which currently memcpy’s without dtype/size validation. To prevent hard crashes, coerce inputs to contiguous np.float32 (and ideally validate length against sum(active_agent_count) * predicted_traj_len) before calling into binding.vec_set_predicted_trajectories.
Suggested change (the length check is hoisted out of the `try` block so the `ValueError` it raises isn't swallowed by the fallback for malformed metadata):
```diff
-    traj_x: numpy float32 array of shape (total_agents * traj_len,) with world x coords
-    traj_y: numpy float32 array of shape (total_agents * traj_len,) with world y coords
-    """
+    traj_x: numpy array-like of shape (total_agents * traj_len,) with world x coords.
+    traj_y: numpy array-like of shape (total_agents * traj_len,) with world y coords.
+    """
+    # Ensure inputs are contiguous float32 arrays to satisfy C extension expectations.
+    traj_x = np.ascontiguousarray(traj_x, dtype=np.float32)
+    traj_y = np.ascontiguousarray(traj_y, dtype=np.float32)
+    # Validate length when the metadata needed to compute it is available.
+    if hasattr(self, "active_agent_count") and hasattr(self, "predicted_traj_len"):
+        try:
+            total_active_agents = int(np.sum(self.active_agent_count))
+            expected_len = total_active_agents * int(self.predicted_traj_len)
+        except (TypeError, ValueError):
+            # Malformed metadata: skip length enforcement rather than break callers.
+            expected_len = None
+        if expected_len is not None and (
+            traj_x.size != expected_len or traj_y.size != expected_len
+        ):
+            raise ValueError(
+                f"Predicted trajectory arrays have incorrect length: "
+                f"expected {expected_len}, got traj_x.size={traj_x.size}, "
+                f"traj_y.size={traj_y.size}"
+            )
```