[WIP, just being used to read code] #366

Closed
eugenevinitsky wants to merge 22 commits into 3.0 from ricky/actions_trajectory_3

Conversation

@eugenevinitsky

No description provided.

Copilot AI review requested due to automatic review settings March 24, 2026 11:10

Copilot AI left a comment


Pull request overview

This WIP PR introduces “action trajectory” support (predicting a sequence of future actions instead of a single action), threads that through the Drive policy/training loop, and adds C/Python plumbing to visualize predicted trajectories in the Ocean Drive renderer.

Changes:

  • Extend action sampling to emit an action trajectory of configurable length.
  • Augment observations with the previous predicted action trajectory and add a trajectory-consistency reward term during training.
  • Add Drive-side predicted-trajectory storage + rendering, including a Python API and a C extension setter.

Reviewed changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated 16 comments.

Show a summary per file

  • pufferlib/pytorch.py: Updates sample_logits to support sampling full action trajectories.
  • pufferlib/pufferl.py: Augments obs/actions buffers for trajectories; adds trajectory rollout + commitment reward; updates eval/train sampling.
  • pufferlib/ocean/torch.py: Updates Drive policy to encode past action trajectories and output trajectory logits.
  • pufferlib/ocean/env_config.h: Adds actions_trajectory_length parsing from [train] to env init config.
  • pufferlib/ocean/env_binding.h: Adds vec_set_predicted_trajectories to copy predicted traj coords into C envs.
  • pufferlib/ocean/drive/visualize.c: Threads traj length into env/net init; renders predicted trajectories.
  • pufferlib/ocean/drive/python_dynamics.py: New: PyTorch rollout + commitment loss utilities for trajectory shaping.
  • pufferlib/ocean/drive/drivenet.h: Extends DriveNet to output per-step logits and maintain past-actions state.
  • pufferlib/ocean/drive/drive.py: Adds Python wrapper method set_predicted_trajectories.
  • pufferlib/ocean/drive/drive.h: Adds predicted trajectory buffers + rollout helper + rendering in draw_scene.
  • pufferlib/ocean/drive/drive.c: Updates drivenet init calls to pass trajectory length.
  • pufferlib/ocean/benchmark/evaluator.py: Passes actions_trajectory_length into sampling during evaluation rollouts.
  • pufferlib/models.py: Updates Default and LSTMWrapper observation sizing for trajectory augmentation.
  • pufferlib/config/ocean/drive.ini: Adjusts training hyperparams; adds trajectory-length and loss config knobs.
  • .gitignore: Ignores local experiment launcher artifacts.


Comment thread pufferlib/pufferl.py
self.values = torch.zeros(segments, horizon, device=device)
self.logprobs = torch.zeros(segments, horizon, device=device)
self.rewards = torch.zeros(segments, horizon, device=device)
self.r_commitments = torch.zeros(segments, horizon, device=device)

Copilot AI Mar 24, 2026


self.r_commitments is allocated but never read or written anywhere else in this file. If it’s no longer needed, remove it to avoid confusion; if it’s intended for logging/debugging, wire it into the training loop (e.g., store r_commitment per-step).

Suggested change
- self.r_commitments = torch.zeros(segments, horizon, device=device)
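If the buffer is instead kept for logging, a minimal sketch of the per-step write the comment suggests; the shapes and names here are assumptions based on the snippet above, not the PR's actual rollout loop:

```python
import torch

segments, horizon = 4, 3
rewards = torch.zeros(segments, horizon)
r_commitments = torch.zeros(segments, horizon)

# Mirror how self.rewards is filled during rollout: one write per step.
step = 0
r_commitment = torch.full((segments,), 0.5)  # placeholder per-agent commitment reward
r_commitments[:, step] = r_commitment
```

The mean over the horizon could then be logged alongside the other rollout stats.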

Copilot uses AI. Check for mistakes.
Comment thread pufferlib/pufferl.py
Comment on lines +121 to +131
self.actions_trajectory_length = config.get("actions_trajectory_length", 80)
atn_traj_size = (  # TODO: please let's be precise with the semantics before checking this in
    self.actions_trajectory_length,
    2,  # 2 real numbers for accel and steering
)  # TODO: generalize to continuous case
assert len(obs_space.shape) == 1
self.obs_len = obs_space.shape[0]
obs_aug_shape = (
    obs_space.shape[0] + np.prod(atn_traj_size),
)  # assumes obs_space is 1d and we flatten the action trajectory


Copilot AI Mar 24, 2026


The new observation augmentation and assertions make PuffeRL effectively Drive-specific (assert len(obs_space.shape) == 1, hard-coded atn_traj_size=(traj_len,2)). This will fail for non-1D observations (e.g., images/dicts) and non-Drive action spaces. Consider gating this entire augmentation/trajectory-loss path behind an explicit config flag or env check, and keep the default behavior compatible with existing environments.
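A minimal sketch of gating the augmentation behind a flag, as the comment suggests; build_obs_aug_shape and the (traj_len, 2) layout are hypothetical names and assumptions, not PuffeRL's actual API:

```python
import numpy as np

def build_obs_aug_shape(obs_space, config):
    """Return (augmented_shape, aug_len); fall back to the unmodified
    space when trajectory augmentation is disabled or unsupported."""
    traj_len = config.get("actions_trajectory_length", 0)
    if traj_len <= 0 or len(obs_space.shape) != 1:
        # Non-1D obs (images, dicts) or feature disabled: no augmentation.
        return obs_space.shape, 0
    atn_traj_size = (traj_len, 2)  # accel, steering (Drive-specific assumption)
    aug_len = int(np.prod(atn_traj_size))
    return (obs_space.shape[0] + aug_len,), aug_len
```

Existing environments would keep their original observation shape unless the config opts in.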

Comment thread pufferlib/ocean/env_binding.h
Comment on lines +929 to +937
float *x_base = (float *)PyArray_DATA((PyArrayObject *)x_arr);
float *y_base = (float *)PyArray_DATA((PyArrayObject *)y_arr);

int offset = 0;
for (int i = 0; i < vec->num_envs; i++) {
    Drive *drive = (Drive *)vec->envs[i];
    int n = drive->active_agent_count * drive->predicted_traj_len;
    memcpy(drive->predicted_traj_x, &x_base[offset], n * sizeof(float));
    memcpy(drive->predicted_traj_y, &y_base[offset], n * sizeof(float));

Copilot AI Mar 24, 2026


vec_set_predicted_trajectories memcpy’s from the provided NumPy arrays without validating dtype (e.g., float32), contiguity, or that the arrays are large enough for the computed offset + n. If callers pass the wrong dtype/shape, this can read past the buffer and crash the process. Consider checking PyArray_TYPE, PyArray_NDIM/SHAPE, PyArray_ISCARRAY, and PyArray_SIZE before memcpy.

Suggested change
- float *x_base = (float *)PyArray_DATA((PyArrayObject *)x_arr);
- float *y_base = (float *)PyArray_DATA((PyArrayObject *)y_arr);
- int offset = 0;
- for (int i = 0; i < vec->num_envs; i++) {
-     Drive *drive = (Drive *)vec->envs[i];
-     int n = drive->active_agent_count * drive->predicted_traj_len;
-     memcpy(drive->predicted_traj_x, &x_base[offset], n * sizeof(float));
-     memcpy(drive->predicted_traj_y, &y_base[offset], n * sizeof(float));
+ PyArrayObject *x_arr_obj = (PyArrayObject *)x_arr;
+ PyArrayObject *y_arr_obj = (PyArrayObject *)y_arr;
+ if (PyArray_TYPE(x_arr_obj) != NPY_FLOAT32 || PyArray_TYPE(y_arr_obj) != NPY_FLOAT32) {
+     PyErr_SetString(PyExc_TypeError, "x and y must be NumPy arrays of dtype float32");
+     return NULL;
+ }
+ if (!PyArray_ISCARRAY(x_arr_obj) || !PyArray_ISCARRAY(y_arr_obj)) {
+     PyErr_SetString(PyExc_TypeError, "x and y must be contiguous float32 NumPy arrays");
+     return NULL;
+ }
+ npy_intp size_x = PyArray_SIZE(x_arr_obj);
+ npy_intp size_y = PyArray_SIZE(y_arr_obj);
+ float *x_base = (float *)PyArray_DATA(x_arr_obj);
+ float *y_base = (float *)PyArray_DATA(y_arr_obj);
+ npy_intp offset = 0;
+ for (int i = 0; i < vec->num_envs; i++) {
+     Drive *drive = (Drive *)vec->envs[i];
+     npy_intp n = (npy_intp)drive->active_agent_count * drive->predicted_traj_len;
+     if (n < 0 || offset > size_x - n || offset > size_y - n) {
+         PyErr_SetString(PyExc_ValueError, "Input trajectory arrays are too small for predicted trajectories");
+         return NULL;
+     }
+     memcpy(drive->predicted_traj_x, &x_base[offset], (size_t)n * sizeof(float));
+     memcpy(drive->predicted_traj_y, &y_base[offset], (size_t)n * sizeof(float));

Comment thread pufferlib/pufferl.py
import os
import sys
import glob
import json

Copilot AI Mar 24, 2026


Top-level import json appears unused (JSON is imported inside functions, and the only other references here are in commented-out code). Removing it avoids an unnecessary import and keeps the module’s dependencies minimal.

Suggested change
- import json

Comment thread pufferlib/pufferl.py
Comment on lines +69 to +77
def convert_single_action_integers_to_r2(actions: torch.Tensor) -> torch.Tensor:
    accel = actions // 13
    steering = actions % 13
    return torch.cat([accel.unsqueeze(-1), steering.unsqueeze(-1)], dim=-1)


def convert_bptt_action_integers_to_r2(actions: torch.Tensor) -> torch.Tensor:
    accel = actions // 13
    steering = actions % 13

Copilot AI Mar 24, 2026


These action decoding helpers hard-code 13 (steering bins) and assume the CLASSIC discrete joint action encoding. This will be incorrect for dynamics_model=jerk (3 lateral bins) and for continuous actions. Consider deriving the split based on the actual env action space / dynamics model (e.g., from vecenv.driver_env config) instead of a fixed constant.

Suggested change
- def convert_single_action_integers_to_r2(actions: torch.Tensor) -> torch.Tensor:
-     accel = actions // 13
-     steering = actions % 13
-     return torch.cat([accel.unsqueeze(-1), steering.unsqueeze(-1)], dim=-1)
- def convert_bptt_action_integers_to_r2(actions: torch.Tensor) -> torch.Tensor:
-     accel = actions // 13
-     steering = actions % 13
+ def convert_single_action_integers_to_r2(
+     actions: torch.Tensor,
+     steering_bins: int = 13,
+ ) -> torch.Tensor:
+     """Convert scalar discrete action indices into (accel, steering) pairs.
+     The steering bin count defaults to 13 (CLASSIC encoding) but can be
+     overridden for other dynamics models (e.g., jerk with 3 lateral bins).
+     """
+     accel = actions // steering_bins
+     steering = actions % steering_bins
+     return torch.cat([accel.unsqueeze(-1), steering.unsqueeze(-1)], dim=-1)
+ def convert_bptt_action_integers_to_r2(
+     actions: torch.Tensor,
+     steering_bins: int = 13,
+ ) -> torch.Tensor:
+     """Convert BPTT-discounted discrete action indices into (accel, steering).
+     The steering bin count defaults to 13 (CLASSIC encoding) but can be
+     overridden for other dynamics models.
+     """
+     accel = actions // steering_bins
+     steering = actions % steering_bins

Comment thread pufferlib/pufferl.py
Comment on lines +436 to +448
action_N2 = convert_bptt_action_integers_to_r2(self.actions)
traj, heading = rollout_state_trajectory_ego(action_N2, observations=self.observations)
r_commitment = compute_l2_loss_ego_action_traj(
    traj,
    self.prev_state_traj,
    heading,
    torch.cat([self.prev_terminals, self.terminals[:, :-1]], dim=1),
    trajectory_loss_norm=config["trajectory_loss_norm"],
    trajectory_loss_clamp_min=config["trajectory_loss_clamp_min"],
)
r_commitment = r_commitment.detach()
self.prev_state_traj = traj[:, -1, ...]
self.prev_terminals = self.terminals[:, [-1]]

Copilot AI Mar 24, 2026


self.prev_state_traj is initialized with shape (total_agents, traj_len, 2) but here it’s used alongside traj whose batch dimension is segments (from self.actions/self.observations). If segments != total_agents (which is allowed as long as total_agents <= segments), compute_l2_loss_ego_action_traj will hit a shape mismatch on the first epoch. Consider making prev_state_traj consistently indexed by segments (or by the same row-indexing scheme as self.actions/self.observations) from initialization onward.
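One way to keep the batch dimension consistent is to allocate prev_state_traj by segments from the start. A sketch under that assumption (the helper name is hypothetical; the -1.0 sentinel mirrors the prev_traj[done_mask] = -1.0 reset logic mentioned elsewhere in this review):

```python
import torch

def init_prev_state_traj(segments: int, traj_len: int, device: str = "cpu") -> torch.Tensor:
    """Allocate the previous-trajectory buffer indexed by segments, matching
    the row indexing of self.actions / self.observations. The -1.0 sentinel
    marks rows that have no previous trajectory yet."""
    return torch.full((segments, traj_len, 2), -1.0, device=device)
```

With this, traj (batch dim = segments) and prev_state_traj always share a batch dimension, even when total_agents < segments.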

Comment thread pufferlib/pufferl.py
Comment on lines +154 to +161
).softmax(dim=-1)

self.prev_state_traj = torch.ones(
    total_agents,
    *atn_traj_size,
    device=device,
    dtype=pufferlib.pytorch.numpy_to_torch_dtype_dict[obs_space.dtype],
).softmax(dim=-1)

Copilot AI Mar 24, 2026


prev_actions_traj/prev_state_traj are initialized with -torch.ones(...).softmax(dim=-1), which converts the intended sentinel values into a valid probability distribution (e.g., [0.5, 0.5]) and makes later logic like prev_traj[done_mask] = -1.0 inconsistent. If these tensors are meant to store raw previous actions / coordinates or a sentinel, initialize them directly (e.g., -1 or 0) without softmax.

Suggested change
- ).softmax(dim=-1)
- self.prev_state_traj = torch.ones(
-     total_agents,
-     *atn_traj_size,
-     device=device,
-     dtype=pufferlib.pytorch.numpy_to_torch_dtype_dict[obs_space.dtype],
- ).softmax(dim=-1)
+ )
+ self.prev_state_traj = -torch.ones(
+     total_agents,
+     *atn_traj_size,
+     device=device,
+     dtype=pufferlib.pytorch.numpy_to_torch_dtype_dict[obs_space.dtype],
+ )

Comment thread pufferlib/pufferl.py
loss = (
    pg_loss
    + config["vf_coef"] * v_loss
    - config["ent_coef"] * entropy_loss / config["actions_trajectory_length"]

Copilot AI Mar 24, 2026


Entropy scaling divides by config["actions_trajectory_length"], but elsewhere you treat this setting as optional via config.get(...). This can raise a KeyError for configs that don’t define it. Prefer self.actions_trajectory_length (or config.get(..., default)) for consistency.

Suggested change
-     - config["ent_coef"] * entropy_loss / config["actions_trajectory_length"]
+     - config["ent_coef"] * entropy_loss / self.actions_trajectory_length

Comment thread pufferlib/pufferl.py
Comment on lines +1406 to +1408
action, logprob, _, _ = pufferlib.pytorch.sample_logits(
    logits, actions_trajectory_length=args["train"].get("actions_trajectory_length", 80)
)

Copilot AI Mar 24, 2026


This call unpacks 4 values from sample_logits, but pufferlib.pytorch.sample_logits returns 3 (action, logprob, entropy). Also, when actions_trajectory_length > 1, action will be a trajectory (e.g., shape [B, K]) and reshaping it directly to vecenv.action_space.shape will fail; you likely need to select the first action before stepping the env (similar to what PuffeRL.evaluate does).

Suggested change
- action, logprob, _, _ = pufferlib.pytorch.sample_logits(
-     logits, actions_trajectory_length=args["train"].get("actions_trajectory_length", 80)
- )
+ action, logprob, entropy = pufferlib.pytorch.sample_logits(
+     logits, actions_trajectory_length=args["train"].get("actions_trajectory_length", 80)
+ )
+ actions_traj_len = args["train"].get("actions_trajectory_length", 80)
+ if actions_traj_len > 1 and isinstance(action, torch.Tensor) and action.ndim >= 2:
+     # When a trajectory of actions is returned, select the first action
+     action = action[:, 0, ...]

Comment thread pufferlib/ocean/drive/drive.py
Comment on lines +636 to +638
        traj_x: numpy float32 array of shape (total_agents * traj_len,) with world x coords
        traj_y: numpy float32 array of shape (total_agents * traj_len,) with world y coords
        """

Copilot AI Mar 24, 2026


set_predicted_trajectories forwards traj_x/traj_y directly into the C extension, which currently memcpy’s without dtype/size validation. To prevent hard crashes, coerce inputs to contiguous np.float32 (and ideally validate length against sum(active_agent_count) * predicted_traj_len) before calling into binding.vec_set_predicted_trajectories.

Suggested change
- traj_x: numpy float32 array of shape (total_agents * traj_len,) with world x coords
- traj_y: numpy float32 array of shape (total_agents * traj_len,) with world y coords
- """
+ traj_x: numpy array-like of shape (total_agents * traj_len,) with world x coords.
+ traj_y: numpy array-like of shape (total_agents * traj_len,) with world y coords.
+ """
+ # Coerce inputs to contiguous float32 arrays to satisfy C extension expectations.
+ traj_x = np.ascontiguousarray(traj_x, dtype=np.float32)
+ traj_y = np.ascontiguousarray(traj_y, dtype=np.float32)
+ # Validate length against env metadata when available.
+ if hasattr(self, "active_agent_count") and hasattr(self, "predicted_traj_len"):
+     expected_len = int(np.sum(self.active_agent_count)) * int(self.predicted_traj_len)
+     if traj_x.size != expected_len or traj_y.size != expected_len:
+         raise ValueError(
+             f"Predicted trajectory arrays have incorrect length: "
+             f"expected {expected_len}, got traj_x.size={traj_x.size}, "
+             f"traj_y.size={traj_y.size}"
+         )
