4-atari-hard: Go-Explore (exploration phase) on Montezuma's Revenge + benchmark by dnddnjs · Pull Request #132 · rlcode/reinforcement-learning

dnddnjs · 2026-06-08T00:46:36Z

Go-Explore phase 1 (exploration only) on Montezuma's Revenge — the archive + emulator-restore paradigm, side by side with the PPO+RND row.
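A minimal sketch of the archive + restore loop, assuming a toy deterministic environment in place of ALE. `ToyChain`, `cell_of`, the inverse-square-root selection weight, and the 0.95 action-repeat probability are all illustrative assumptions, not the code in this PR.

```python
import random

ACTIONS = (0, 1)  # toy action set: 0 = left, 1 = right


class ToyChain:
    """Hypothetical deterministic stand-in for the emulator (not ALE)."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.pos = 0
        self.farthest = 0

    def step(self, action):
        # Move on a line; reward 1 whenever a new farthest position is reached.
        self.pos = max(0, self.pos + (1 if action == 1 else -1))
        if self.pos > self.farthest:
            self.farthest = self.pos
            return 1
        return 0

    def clone_state(self):
        return (self.pos, self.farthest)

    def restore_state(self, state):
        self.pos, self.farthest = state


def cell_of(env):
    # Coarse cell: two adjacent positions share one archive slot.
    return env.pos // 2


def explore(iterations=200, explore_steps=10, repeat_prob=0.95, seed=0):
    rng = random.Random(seed)
    env = ToyChain()
    archive = {cell_of(env): {"state": env.clone_state(), "score": 0,
                              "actions": [], "visits": 0}}
    for _ in range(iterations):
        # Select a cell, favouring rarely chosen ones.
        cells = list(archive)
        weights = [(archive[c]["visits"] + 1) ** -0.5 for c in cells]
        start = archive[rng.choices(cells, weights=weights)[0]]
        start["visits"] += 1
        # Return to it by restoring the saved state, not by replaying actions.
        env.restore_state(start["state"])
        score, actions = start["score"], list(start["actions"])
        # Explore with repeated random actions.
        action = rng.choice(ACTIONS)
        for _ in range(explore_steps):
            if rng.random() > repeat_prob:
                action = rng.choice(ACTIONS)
            score += env.step(action)
            actions.append(action)
            cell = cell_of(env)
            known = archive.get(cell)
            # Keep the higher-scoring trajectory; break ties by shorter length.
            better = known is None or score > known["score"] or (
                score == known["score"] and len(actions) < len(known["actions"]))
            if better:
                archive[cell] = {"state": env.clone_state(), "score": score,
                                 "actions": list(actions),
                                 "visits": known["visits"] if known else 0}
    return archive


archive = explore()
best = max(archive.values(), key=lambda e: e["score"])
print(len(archive), "cells; best score", best["score"],
      "in", len(best["actions"]), "actions")
```

Each archive entry stores both the emulator state (for cheap returns) and the action list from reset (so the trajectory can later be replayed and verified).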

Best end-of-episode score: 31,000 at 500M agent steps (~5.5h, Mac Studio M4 Max, 12 explorer processes, no neural network). Single seed. Replay-verified: re-executing the stored 5,336-action trajectory from reset reproduces exactly 31,000.
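The replay check can be sketched as follows, assuming a hypothetical deterministic toy environment (`CounterEnv`) rather than the real emulator. The stored actions and score below are made up for illustration and are not the 5,336-action trajectory.

```python
class CounterEnv:
    """Hypothetical deterministic toy environment (stand-in for the emulator)."""

    def reset(self):
        self.t = 0

    def step(self, action):
        self.t += 1
        # Reward only when action 1 is taken on every third step.
        reward = 100 if (action == 1 and self.t % 3 == 0) else 0
        done = self.t >= 9
        return reward, done


def replay_score(env, actions):
    """Re-execute a stored action list from reset and return the episode score."""
    env.reset()
    total = 0
    for action in actions:
        reward, done = env.step(action)
        total += reward
        if done:
            break
    return total


# Illustrative stored trajectory and the score recorded during search.
stored_actions = [0, 0, 1, 1, 0, 1, 0, 1, 1]
stored_score = 300

# Verification: the replayed score must match the recorded score exactly.
assert replay_score(CounterEnv(), stored_actions) == stored_score
```

The check is only meaningful when the environment is deterministic; with sticky actions the same action list could produce a different episode.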

Protocol notes (also in the README block):

- Deterministic ALE (no sticky actions, frameskip 4, fixed seed): required by restore-based exploration, so not comparable to the sticky-action RL rows.
- Score = best end-of-episode trajectory found by search, not an RL policy score; the paper's robustification phase is not run here.
- Reference: the Nature exploration-phase mean without domain knowledge is 24,758 at the same 2B-frame budget (500M agent steps × frameskip 4; 50+ seeds vs. our single seed). Rooms found: 24.
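The budget figures above can be cross-checked with a little arithmetic. This sketch assumes one agent step equals one frameskip-4 environment step and treats "~5.5h" as exactly 5.5 hours, so the throughput numbers are rough estimates.

```python
AGENT_STEPS = 500_000_000   # from the run summary
FRAMESKIP = 4               # from the protocol notes
WALL_HOURS = 5.5            # approximate ("~5.5h")
PROCESSES = 12              # explorer processes

# Agent steps times frameskip gives emulator frames.
emulator_frames = AGENT_STEPS * FRAMESKIP  # 2,000,000,000

# Rough throughput implied by the wall-clock time.
steps_per_second = AGENT_STEPS / (WALL_HOURS * 3600)
steps_per_second_per_process = steps_per_second / PROCESSES

print(f"{emulator_frames:,} emulator frames")
print(f"{steps_per_second:,.0f} agent steps/s across {PROCESSES} processes")
print(f"{steps_per_second_per_process:,.0f} agent steps/s per process")
```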

W&B (full metrics history + gameplay video): https://wandb.ai/rlcode/rl-atari-hard-go-explore/runs/m6ox4l3m

(Single-seed diagnostic run; merge is a human decision.)

… benchmark Go-Explore phase 1 (Ecoffet et al. 2019 / Nature 2021), no neural net: an archive of downscaled-frame cells (11x8, 9 gray levels), emulator state save/restore to return to frontier cells, repeated random actions to explore from them. 12 explorer processes over raw gymnasium ALE (envpool exposes no clone API, hence the separate env_go_explore.py). Result: best end-of-episode score 31,000 at 500M agent steps (~5.5h on a Mac Studio M4 Max), single seed, replay-verified (re-executing the stored 5,336-action demo from reset reproduces the score exactly). Deterministic protocol (no sticky actions) -- a trajectory-search result, not an RL policy score; see the README caveat.
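As an illustration of the downscaled-frame cell, the sketch below maps a grayscale frame to an 11x8 grid with 9 gray levels. The block-averaging and quantization rule are assumptions chosen for clarity, and `cell_key` is a hypothetical name; the PR's actual implementation may differ.

```python
def cell_key(frame, out_w=11, out_h=8, levels=9):
    """Map a grayscale frame (rows of 0..255 ints) to a hashable cell key.

    Assumed scheme: average each block of pixels, then quantize the mean
    into `levels` gray levels.
    """
    h, w = len(frame), len(frame[0])
    key = []
    for r in range(out_h):
        y0, y1 = r * h // out_h, (r + 1) * h // out_h
        for c in range(out_w):
            x0, x1 = c * w // out_w, (c + 1) * w // out_w
            total = sum(sum(row[x0:x1]) for row in frame[y0:y1])
            mean = total // ((y1 - y0) * (x1 - x0))
            key.append(mean * levels // 256)  # quantized level in 0..levels-1
    return bytes(key)


# Usage: two frames that differ by one slightly brighter pixel share a cell.
blank = [[0] * 160 for _ in range(210)]   # 210x160, the Atari frame size
noisy = [row[:] for row in blank]
noisy[0][0] = 1
assert cell_key(blank) == cell_key(noisy)
assert len(cell_key(blank)) == 11 * 8
```

Coarse keys like this are what let many near-identical frames collapse into one archive entry, which keeps the archive small enough to sample from.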

dnddnjs wants to merge 1 commit into master from ai/montezuma-go-explore