fix: correct scripts, tests and agent code after PR #12 (eis) refactor by aleksandarbabicdnv · Pull Request #15 · dnv-opensource/crane-controller

aleksandarbabicdnv · 2026-06-19T04:06:12Z

Summary

PR #12 (eis branch) refactored AntiPendulumEnv and QLearningAgent to use frozen dataclasses (AntiPendulumConfig, QLearningConfig), but the scripts, tests, and agent code were not updated to match the new API. This PR fixes all breakage introduced by that merge.

Q-learning fixes

play_q.py, train_q.py, analyse_q.py, use_q_ide.py: wrap env kwargs in AntiPendulumConfig; update QLearningAgent constructor from old trained=(path, bool) to new filename=, use_file= API
q_agent.py: duck-type env.conf.dt so QLearningAgent works with SimpleTestEnv (which has no .conf); falls back to dt=1.0

PPO fixes

play_ppo.py, train_ppo.py: wrap flat env kwargs in AntiPendulumConfig; use dataclasses.replace to update start_speed between speed-sweep steps
ppo_agent.py: access continuous_actions and acc via env.conf (moved to frozen dataclass in PR Eis #12); restore self.render_mode as a direct attribute on AntiPendulumEnv (gymnasium convention, removed by PR Eis #12)
controlled_crane_pendulum.py: add self.render_mode = self.conf.render_mode in __init__

Config / dependency fixes

experiments/hybrid_cv01.yaml, hybrid_t_min.yaml: flip crane_velocity from 0.1 to -0.1 — reward sign convention changed in PR Eis #12 (+x_dot² is now positive); negative weight penalises velocity as intended
pyproject.toml: remove pyment>=0.3.3 from runtime dependencies (dev tool)
experiment_config.py: update RewardConfig.crane_velocity docstring to reflect new sign convention

Test fixes

test_ppo.py: remove 6 @pytest.mark.skip markers; update env_kwargs to use AntiPendulumConfig; add isinstance(rms, RunningMeanStd) narrowing for mypy
test_simple_q_env.py: fix SimpleTestEnv action indexing (actions are 0/1/2, not -1/0/1); cast action_space.sample() to int to avoid np.uint16 underflow; remove flaky RNG uniformity assertion
test_environment.py: delete test_t_min_crane_reward_term (term removed in PR Eis #12)
test_algorithm.py: update skip reasons to explain obs-index mismatch (known follow-up for AlgorithmAgent)

Test plan

uv run pytest tests/ — 47 passed, 3 skipped, 0 failures
uv run ruff check . — clean
uv run ruff format --check . — clean
uv run mypy src/ tests/ — clean
play_ppo.py with continuous and discrete models — runs and renders correctly

…API after PR #12 - play_q, train_q, analyse_q, use_q_ide: wrap env kwargs in AntiPendulumConfig; replace DEFAULT_DISCRETE reference with discrete="energy" string preset; replace QLearningAgent(trained=...) with filename/use_file API - experiments/*.yaml: negate crane_velocity weights (0.1→-0.1) to preserve penalty semantics under new rc.crane_velocity * x_dot^2 sign convention - pyproject.toml: remove pyment from runtime deps (docstring tool, not runtime) - experiment_config.py: update RewardConfig.crane_velocity docstring for new sign - test_ppo: remove skip markers, update env_kwargs to conf=AntiPendulumConfig - test_simple_q_env: remove skip markers (API already updated in PR #12) - test_environment: delete test_t_min_crane_reward_term (term removed from reward) - test_algorithm: update skip reason to name the obs-index mismatch blocking them

action_space.sample() and observation_space.sample() with seed=1 returned version-specific values that changed with the gymnasium upgrade in PR #12. The physics-loop assertions below are still valid and kept.

- q_agent.py: duck-type env.conf.dt so QLearningAgent works with SimpleTestEnv (which has no .conf); falls back to dt=1.0 - test_simple_q_env: cast action_space.sample() to int before arithmetic to avoid np.uint16 underflow (0-1 -> 65535); drop flaky stats uniformity assertion (physics assertions in the loop are sufficient) - test_ppo: add isinstance(rms, RunningMeanStd) narrowing so mypy accepts .mean access; add missing trailing newline - test_algorithm, test_simple_q_env: shorten skip-reason strings to fit ruff E501 line-length limit - test_environment, test_algorithm, test_ppo, test_simple_q_env: ruff format pass

…12 - play_ppo.py, train_ppo.py: wrap flat env kwargs (start_speed, render_mode, reward_fac, etc.) in AntiPendulumConfig; use dataclasses.replace to update start_speed between speed-sweep steps - ppo_agent.py: access continuous_actions and acc via env.conf instead of as direct env attributes (moved to frozen dataclass in PR #12) - controlled_crane_pendulum.py: restore self.render_mode as a direct instance attribute (gymnasium convention); was removed by PR #12, breaking any caller that reads env.render_mode

eisDNV

We take that as a new basis. I will commit a few more changes addressing Q-learning changes.

aleksandarbabicdnv added 4 commits June 18, 2026 22:24

fix: remove brittle seeded-RNG assertions from test_env

3a1b59a

action_space.sample() and observation_space.sample() with seed=1 returned version-specific values that changed with the gymnasium upgrade in PR #12. The physics-loop assertions below are still valid and kept.

aleksandarbabicdnv requested a review from eisDNV June 19, 2026 04:06

eisDNV approved these changes Jun 19, 2026

View reviewed changes

eisDNV merged commit 636454f into main Jun 19, 2026
20 checks passed

eisDNV deleted the fix/eis-pr12 branch June 19, 2026 05:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: correct scripts, tests and agent code after PR #12 (eis) refactor#15

fix: correct scripts, tests and agent code after PR #12 (eis) refactor#15
eisDNV merged 4 commits into
mainfrom
fix/eis-pr12

aleksandarbabicdnv commented Jun 19, 2026

Uh oh!

eisDNV left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

aleksandarbabicdnv commented Jun 19, 2026

Summary

Q-learning fixes

PPO fixes

Config / dependency fixes

Test fixes

Test plan

Uh oh!

eisDNV left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants