fix: correct scripts, tests and agent code after PR #12 (eis) refactor #15
Merged
…API after PR #12
- play_q, train_q, analyse_q, use_q_ide: wrap env kwargs in AntiPendulumConfig; replace DEFAULT_DISCRETE reference with discrete="energy" string preset; replace QLearningAgent(trained=...) with filename/use_file API
- experiments/*.yaml: negate crane_velocity weights (0.1→-0.1) to preserve penalty semantics under new rc.crane_velocity * x_dot^2 sign convention
- pyproject.toml: remove pyment from runtime deps (docstring tool, not runtime)
- experiment_config.py: update RewardConfig.crane_velocity docstring for new sign
- test_ppo: remove skip markers, update env_kwargs to conf=AntiPendulumConfig
- test_simple_q_env: remove skip markers (API already updated in PR #12)
- test_environment: delete test_t_min_crane_reward_term (term removed from reward)
- test_algorithm: update skip reason to name the obs-index mismatch blocking them
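The crane_velocity sign flip can be checked with a few lines of arithmetic. The sketch below assumes only what the commit message states, namely that the reward term is rc.crane_velocity * x_dot^2. The RewardConfig stand-in and the velocity_term helper are hypothetical and are not the project's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardConfig:
    """Hypothetical stand-in holding only the weight discussed here."""

    crane_velocity: float = 0.0


def velocity_term(rc: RewardConfig, x_dot: float) -> float:
    """Reward contribution under the new convention: weight * x_dot^2."""
    return rc.crane_velocity * x_dot**2


# Old YAML value: a positive weight now rewards crane speed.
old_weight_term = velocity_term(RewardConfig(crane_velocity=0.1), x_dot=2.0)

# Negated YAML value: the term is negative again, so speed is penalised.
new_weight_term = velocity_term(RewardConfig(crane_velocity=-0.1), x_dot=2.0)
```

Because x_dot^2 is never negative, the sign of the term follows the sign of the weight, which is why negating the YAML values keeps the penalty semantics.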
action_space.sample() and observation_space.sample() with seed=1 returned version-specific values that changed with the gymnasium upgrade in PR #12. The physics-loop assertions below are still valid and kept.
- q_agent.py: duck-type env.conf.dt so QLearningAgent works with SimpleTestEnv (which has no .conf); falls back to dt=1.0
- test_simple_q_env: cast action_space.sample() to int before arithmetic to avoid np.uint16 underflow (0-1 -> 65535); drop flaky stats uniformity assertion (physics assertions in the loop are sufficient)
- test_ppo: add isinstance(rms, RunningMeanStd) narrowing so mypy accepts .mean access; add missing trailing newline
- test_algorithm, test_simple_q_env: shorten skip-reason strings to fit ruff E501 line-length limit
- test_environment, test_algorithm, test_ppo, test_simple_q_env: ruff format pass
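The first two fixes in this commit can be illustrated in isolation. This is a minimal sketch under stated assumptions: resolve_dt, EnvWithConf and EnvWithoutConf are hypothetical names, and the actual lookup in q_agent.py may be written differently. The second half reproduces the unsigned wrap-around with a plain uint16 array rather than a real action space.

```python
import numpy as np


class _Conf:
    """Hypothetical config object exposing only dt."""

    dt = 0.01


class EnvWithConf:
    """Hypothetical env that carries a .conf, like the refactored env."""

    conf = _Conf()


class EnvWithoutConf:
    """Hypothetical env with no .conf, like a simple test env."""


def resolve_dt(env: object, default: float = 1.0) -> float:
    """Duck-typed lookup of env.conf.dt with a fallback value."""
    conf = getattr(env, "conf", None)
    # getattr on None simply returns the default, so no explicit check is needed.
    return float(getattr(conf, "dt", default))


# Unsigned arithmetic wraps around instead of going negative.
sample = np.array([0], dtype=np.uint16)
wrapped = int((sample - np.uint16(1))[0])

# Casting to a Python int first gives the intended signed result.
safe = int(sample[0]) - 1
```

With these definitions, resolve_dt(EnvWithoutConf()) yields the fallback of 1.0, wrapped is 65535, and safe is -1.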
…12
- play_ppo.py, train_ppo.py: wrap flat env kwargs (start_speed, render_mode, reward_fac, etc.) in AntiPendulumConfig; use dataclasses.replace to update start_speed between speed-sweep steps
- ppo_agent.py: access continuous_actions and acc via env.conf instead of as direct env attributes (moved to frozen dataclass in PR #12)
- controlled_crane_pendulum.py: restore self.render_mode as a direct instance attribute (gymnasium convention); was removed by PR #12, breaking any caller that reads env.render_mode
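The speed-sweep pattern relies on the fact that a frozen dataclass cannot be mutated in place, so each step needs a fresh copy. The sketch below uses a hypothetical SweepConfig stand-in with two of the field names mentioned above; it is not the real AntiPendulumConfig, whose full field list is not shown here.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SweepConfig:
    """Hypothetical stand-in for a frozen environment config."""

    start_speed: float = 0.0
    render_mode: Optional[str] = None


base = SweepConfig(start_speed=0.0, render_mode="human")

# dataclasses.replace returns a new instance with only the named field changed.
sweep = [replace(base, start_speed=speed) for speed in (0.5, 1.0, 1.5)]

# Direct assignment on a frozen dataclass raises instead of mutating.
try:
    base.start_speed = 2.0  # type: ignore[misc]
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Each entry in sweep keeps render_mode from base, and base itself is left unchanged, which is the property the sweep scripts depend on.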
eisDNV
approved these changes
Jun 19, 2026
eisDNV
left a comment
Collaborator
We take that as a new basis. I will commit a few more changes addressing Q-learning changes.
Summary
PR #12 (eis branch) refactored AntiPendulumEnv and QLearningAgent to use frozen dataclasses (AntiPendulumConfig, QLearningConfig), but the scripts, tests, and agent code were not updated to match the new API. This PR fixes all breakage introduced by that merge.

Q-learning fixes
- play_q.py, train_q.py, analyse_q.py, use_q_ide.py: wrap env kwargs in AntiPendulumConfig; update QLearningAgent constructor from old trained=(path, bool) to new filename=, use_file= API
- q_agent.py: duck-type env.conf.dt so QLearningAgent works with SimpleTestEnv (which has no .conf); falls back to dt=1.0

PPO fixes
- play_ppo.py, train_ppo.py: wrap flat env kwargs in AntiPendulumConfig; use dataclasses.replace to update start_speed between speed-sweep steps
- ppo_agent.py: access continuous_actions and acc via env.conf (moved to frozen dataclass in PR #12); restore self.render_mode as a direct attribute on AntiPendulumEnv (gymnasium convention, removed by PR #12)
- controlled_crane_pendulum.py: add self.render_mode = self.conf.render_mode in __init__

Config / dependency fixes
- experiments/hybrid_cv01.yaml, hybrid_t_min.yaml: flip crane_velocity from 0.1 to -0.1; the reward sign convention changed in PR #12 (+x_dot² is now positive), so a negative weight penalises velocity as intended
- pyproject.toml: remove pyment>=0.3.3 from runtime dependencies (dev tool)
- experiment_config.py: update RewardConfig.crane_velocity docstring to reflect new sign convention

Test fixes
- test_ppo.py: remove 6 @pytest.mark.skip markers; update env_kwargs to use AntiPendulumConfig; add isinstance(rms, RunningMeanStd) narrowing for mypy
- test_simple_q_env.py: fix SimpleTestEnv action indexing (actions are 0/1/2, not -1/0/1); cast action_space.sample() to int to avoid np.uint16 underflow; remove flaky RNG uniformity assertion
- test_environment.py: delete test_t_min_crane_reward_term (term removed in PR #12)
- test_algorithm.py: update skip reasons to explain obs-index mismatch (known follow-up for AlgorithmAgent)

Test plan
- uv run pytest tests/: 47 passed, 3 skipped, 0 failures
- uv run ruff check .: clean
- uv run ruff format --check .: clean
- uv run mypy src/ tests/: clean
- play_ppo.py with continuous and discrete models: runs and renders correctly