'done_info' not generated and incorrect reward counting on maximum steps completion #7


It seems that when an actor completes the maximum number of steps (2048) without failing, no 'done_info' is generated at [Line 214](https://github.com/MadryLab/implementation-matters/blob/5ee6ecb12545365d9178135e65576adfc0d82f52/src/policy_gradients/agent.py#L214),

and no 'done_info' is appended to 'completed_episode_info' at [Line 296](https://github.com/MadryLab/implementation-matters/blob/5ee6ecb12545365d9178135e65576adfc0d82f52/src/policy_gradients/agent.py#L296).

Consequently, the reward is recorded as -1, as shown in the code at [Line 324](https://github.com/MadryLab/implementation-matters/blob/5ee6ecb12545365d9178135e65576adfc0d82f52/src/policy_gradients/agent.py#L324).
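To make the reported behavior concrete, here is a minimal Python sketch, assuming that an episode's (length, reward) pair is appended to `completed_episode_info` only when the environment signals termination, and that the summary falls back to -1 when that list is empty. It is not the repository's code: `run_rollout`, `summarize`, the `fail_at` parameter, and the constant reward of 1.0 per step are hypothetical stand-ins chosen for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical sketch (not the repository's code): an episode's
# (length, total_reward) is recorded only when the environment reports
# termination, mirroring the behavior described in this issue.

MAX_STEPS = 2048  # rollout length mentioned in the issue


def run_rollout(fail_at: Optional[int], max_steps: int = MAX_STEPS) -> List[Tuple[int, float]]:
    """Simulate one actor. `fail_at` is the step at which the environment
    terminates the episode, or None if the actor never fails.
    Each step yields a hypothetical reward of 1.0."""
    completed_episode_info = []
    ep_len, ep_reward = 0, 0.0
    for t in range(max_steps):
        ep_len += 1
        ep_reward += 1.0
        done = fail_at is not None and t == fail_at
        if done:
            # 'done_info' exists only when the environment terminates the episode
            completed_episode_info.append((ep_len, ep_reward))
            ep_len, ep_reward = 0, 0.0
    # An episode still running when max_steps is reached is silently dropped.
    return completed_episode_info


def summarize(completed_episode_info: List[Tuple[int, float]]) -> Tuple[float, float]:
    """Average length and reward over completed episodes; -1 if none finished."""
    if completed_episode_info:
        lengths, rewards = zip(*completed_episode_info)
        return sum(lengths) / len(lengths), sum(rewards) / len(rewards)
    return -1, -1


print(summarize(run_rollout(fail_at=99)))    # (100.0, 100.0): the failed episode is counted
print(summarize(run_rollout(fail_at=None)))  # (-1, -1): the actor never fails, nothing is recorded
```

The second call reproduces the symptom: the actor that performs best (never failing) is the one reported with a reward of -1.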

If an actor reaches the maximum number of steps without failing, the episode should be considered 'done', and its reward should be added to the total reward as is, rather than being set to -1.
Could you please examine how the reward is tallied when an actor successfully completes the maximum number of steps without any failures?
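One possible direction for the change proposed above, sketched under a hypothetical setup (`run_rollout`, `summarize`, `fail_at`, and a reward of 1.0 per step are illustrative names, not the repository's API): when the rollout hits the step limit with an episode still running, record that episode as truncated instead of discarding it, so an actor that never fails contributes its accumulated reward rather than -1.

```python
from typing import List, Optional, Tuple

# Hypothetical sketch of the proposed behavior (not the repository's code).

MAX_STEPS = 2048  # rollout length mentioned in the issue


def run_rollout(fail_at: Optional[int], max_steps: int = MAX_STEPS) -> List[Tuple[int, float, bool]]:
    """Simulate one actor and return (length, total_reward, truncated) per episode.
    `fail_at` is the step at which the environment terminates the episode,
    or None if it never does. Each step yields a hypothetical reward of 1.0."""
    episodes = []
    ep_len, ep_reward = 0, 0.0
    for t in range(max_steps):
        ep_len += 1
        ep_reward += 1.0
        terminated = fail_at is not None and t == fail_at
        if terminated:
            episodes.append((ep_len, ep_reward, False))
            ep_len, ep_reward = 0, 0.0
    if ep_len > 0:
        # Proposed change: an episode still running at max_steps is recorded
        # as truncated instead of being dropped.
        episodes.append((ep_len, ep_reward, True))
    return episodes


def summarize(episodes: List[Tuple[int, float, bool]]) -> Tuple[float, float]:
    """Average length and reward over recorded episodes; -1 only if none exist."""
    if not episodes:
        return -1, -1
    lengths = [length for length, _, _ in episodes]
    rewards = [reward for _, reward, _ in episodes]
    return sum(lengths) / len(lengths), sum(rewards) / len(rewards)


print(run_rollout(fail_at=None))             # [(2048, 2048.0, True)]
print(summarize(run_rollout(fail_at=None)))  # (2048.0, 2048.0)
```

This sketch also records the partial episode left over after a failure (for `fail_at=99`, a 1948-step fragment), which pulls the average toward fragment lengths; counting truncated episodes only for actors with no completed episode is one alternative. The change affects only the logged episode statistics, not how returns or advantages are computed.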
