Skip to content

fix: correct scripts, tests and agent code after PR #12 (eis) refactor#15

Merged
eisDNV merged 4 commits into
mainfrom
fix/eis-pr12
Jun 19, 2026
Merged

fix: correct scripts, tests and agent code after PR #12 (eis) refactor#15
eisDNV merged 4 commits into
mainfrom
fix/eis-pr12

Conversation

@aleksandarbabicdnv

Copy link
Copy Markdown
Collaborator

Summary

PR #12 (eis branch) refactored AntiPendulumEnv and QLearningAgent to use frozen dataclasses (AntiPendulumConfig, QLearningConfig), but the scripts, tests, and agent code were not updated to match the new API. This PR fixes all breakage introduced by that merge.

Q-learning fixes

  • play_q.py, train_q.py, analyse_q.py, use_q_ide.py: wrap env kwargs in AntiPendulumConfig; update QLearningAgent constructor from old trained=(path, bool) to new filename=, use_file= API
  • q_agent.py: duck-type env.conf.dt so QLearningAgent works with SimpleTestEnv (which has no .conf); falls back to dt=1.0

PPO fixes

  • play_ppo.py, train_ppo.py: wrap flat env kwargs in AntiPendulumConfig; use dataclasses.replace to update start_speed between speed-sweep steps
  • ppo_agent.py: access continuous_actions and acc via env.conf (moved to frozen dataclass in PR Eis #12); restore self.render_mode as a direct attribute on AntiPendulumEnv (gymnasium convention, removed by PR Eis #12)
  • controlled_crane_pendulum.py: add self.render_mode = self.conf.render_mode in __init__

Config / dependency fixes

  • experiments/hybrid_cv01.yaml, hybrid_t_min.yaml: flip crane_velocity from 0.1 to -0.1 — reward sign convention changed in PR Eis #12 (+x_dot² is now positive); negative weight penalises velocity as intended
  • pyproject.toml: remove pyment>=0.3.3 from runtime dependencies (dev tool)
  • experiment_config.py: update RewardConfig.crane_velocity docstring to reflect new sign convention

Test fixes

  • test_ppo.py: remove 6 @pytest.mark.skip markers; update env_kwargs to use AntiPendulumConfig; add isinstance(rms, RunningMeanStd) narrowing for mypy
  • test_simple_q_env.py: fix SimpleTestEnv action indexing (actions are 0/1/2, not -1/0/1); cast action_space.sample() to int to avoid np.uint16 underflow; remove flaky RNG uniformity assertion
  • test_environment.py: delete test_t_min_crane_reward_term (term removed in PR Eis #12)
  • test_algorithm.py: update skip reasons to explain obs-index mismatch (known follow-up for AlgorithmAgent)

Test plan

  • uv run pytest tests/ — 47 passed, 3 skipped, 0 failures
  • uv run ruff check . — clean
  • uv run ruff format --check . — clean
  • uv run mypy src/ tests/ — clean
  • play_ppo.py with continuous and discrete models — runs and renders correctly

…API after PR #12

- play_q, train_q, analyse_q, use_q_ide: wrap env kwargs in AntiPendulumConfig;
  replace DEFAULT_DISCRETE reference with discrete="energy" string preset;
  replace QLearningAgent(trained=...) with filename/use_file API
- experiments/*.yaml: negate crane_velocity weights (0.1→-0.1) to preserve
  penalty semantics under new rc.crane_velocity * x_dot^2 sign convention
- pyproject.toml: remove pyment from runtime deps (docstring tool, not runtime)
- experiment_config.py: update RewardConfig.crane_velocity docstring for new sign
- test_ppo: remove skip markers, update env_kwargs to conf=AntiPendulumConfig
- test_simple_q_env: remove skip markers (API already updated in PR #12)
- test_environment: delete test_t_min_crane_reward_term (term removed from reward)
- test_algorithm: update skip reason to name the obs-index mismatch blocking them
action_space.sample() and observation_space.sample() with seed=1 returned
version-specific values that changed with the gymnasium upgrade in PR #12.
The physics-loop assertions below are still valid and kept.
- q_agent.py: duck-type env.conf.dt so QLearningAgent works with
  SimpleTestEnv (which has no .conf); falls back to dt=1.0
- test_simple_q_env: cast action_space.sample() to int before arithmetic
  to avoid np.uint16 underflow (0-1 -> 65535); drop flaky stats
  uniformity assertion (physics assertions in the loop are sufficient)
- test_ppo: add isinstance(rms, RunningMeanStd) narrowing so mypy
  accepts .mean access; add missing trailing newline
- test_algorithm, test_simple_q_env: shorten skip-reason strings to
  fit ruff E501 line-length limit
- test_environment, test_algorithm, test_ppo, test_simple_q_env:
  ruff format pass
…12

- play_ppo.py, train_ppo.py: wrap flat env kwargs (start_speed,
  render_mode, reward_fac, etc.) in AntiPendulumConfig; use
  dataclasses.replace to update start_speed between speed-sweep steps
- ppo_agent.py: access continuous_actions and acc via env.conf instead
  of as direct env attributes (moved to frozen dataclass in PR #12)
- controlled_crane_pendulum.py: restore self.render_mode as a direct
  instance attribute (gymnasium convention); was removed by PR #12,
  breaking any caller that reads env.render_mode

@eisDNV eisDNV left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We take that as a new basis. I will commit a few more changes addressing Q-learning changes.

@eisDNV eisDNV merged commit 636454f into main Jun 19, 2026
20 checks passed
@eisDNV eisDNV deleted the fix/eis-pr12 branch June 19, 2026 05:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants