Armen Kasparian edited this page Mar 28, 2024 · 5 revisions

Agent Configuration File

All agents will need the following configuration parameters:

  • --warmup_size: Number of steps before training begins (default=2500)
  • --batch_size: Batch size for training (default=2500)
  • --logdir: Logging directory (default=./logs)
  • --buffer_type: Buffer type (default='ER-v0')
  • --actor_model: Actor model (default='actor_fcnn-v0')
  • --critic_model: Critic model (default='critic_fcnn-v0')
  • --load_model: Path to the directory containing model files to load for retraining or inference (default='None', i.e. no models are loaded)
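The parameters above could be mirrored with Python's standard argparse module. This is a minimal sketch, not the project's actual parser: the function name build_agent_parser is hypothetical, and only the flag names and defaults are taken from the list above.

```python
import argparse


def build_agent_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented agent parameters and defaults."""
    parser = argparse.ArgumentParser(description="Agent configuration (sketch)")
    parser.add_argument("--warmup_size", type=int, default=2500,
                        help="Number of steps before training begins")
    parser.add_argument("--batch_size", type=int, default=2500,
                        help="Batch size for training")
    parser.add_argument("--logdir", type=str, default="./logs",
                        help="Logging directory")
    parser.add_argument("--buffer_type", type=str, default="ER-v0",
                        help="Buffer type")
    parser.add_argument("--actor_model", type=str, default="actor_fcnn-v0",
                        help="Actor model")
    parser.add_argument("--critic_model", type=str, default="critic_fcnn-v0",
                        help="Critic model")
    # The documented default is the string 'None', kept as-is here.
    parser.add_argument("--load_model", type=str, default="None",
                        help="Path to model files for retraining or inference")
    return parser


# Usage: override one value and keep the other defaults.
# args = build_agent_parser().parse_args(["--batch_size", "256"])
```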

Loading Model Example

To load models, provide the path to the model directory in the cfg file of the desired agent, using an entry like the following:

"load_model": "/PATH/TO/MODELS/",

The agent then searches that directory for the file corresponding to each of its models and loads it.
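Assuming the cfg file is JSON (as the "load_model" line above suggests), reading the model directory might look like the sketch below. The function name read_load_model is hypothetical, and treating the string "None" as "nothing to load" is an assumption based on the documented default.

```python
import json
from pathlib import Path
from typing import Optional


def read_load_model(cfg_path: str) -> Optional[Path]:
    """Return the model directory from a JSON cfg file, or None if unset.

    Assumption: the string "None" (the documented default), JSON null, or an
    empty string all mean "do not load any models".
    """
    with open(cfg_path) as f:
        cfg = json.load(f)
    value = cfg.get("load_model", "None")
    if value in (None, "None", ""):
        return None
    return Path(value)
```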

For example, TD3 has six models:

  1. actor_model
  2. target_actor
  3. critic_model1
  4. target_critic1
  5. critic_model2
  6. target_critic2

The loader looks in the directory for files whose names begin with the corresponding model name, so extra information may be appended to the end (both actor_model.h5 and actor_model_epoch50_40.h5 match actor_model), and it checks that each file is in the .h5 format.
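The prefix-and-extension matching described above can be sketched as follows. This is an illustration, not the project's loader: find_model_files and TD3_MODEL_NAMES are hypothetical names, and raising an error when several files match one name is an assumption, since the page does not say how ties are resolved.

```python
from pathlib import Path
from typing import Dict, List

# Model names for TD3, as listed above.
TD3_MODEL_NAMES = ["actor_model", "target_actor", "critic_model1",
                   "target_critic1", "critic_model2", "target_critic2"]


def find_model_files(model_dir: str, model_names: List[str]) -> Dict[str, Path]:
    """Map each model name to the single .h5 file whose name starts with it."""
    files = [p for p in Path(model_dir).iterdir() if p.is_file()]
    found: Dict[str, Path] = {}
    for name in model_names:
        # Prefix match: "actor_model.h5" and "actor_model_epoch50_40.h5" both match.
        matches = sorted(p for p in files
                         if p.name.startswith(name) and p.suffix == ".h5")
        if not matches:
            raise FileNotFoundError(f"No .h5 file for '{name}' in {model_dir}")
        if len(matches) > 1:
            # Assumption: refuse to guess when several files match one name.
            raise ValueError(f"Ambiguous files for '{name}': {[m.name for m in matches]}")
        found[name] = matches[0]
    return found
```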

DDPG has four models:

  1. actor_model
  2. target_actor
  3. critic_model1
  4. target_critic1
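Since DDPG uses the first four of TD3's six model names, the required files per agent can be kept in one table and checked before loading. The sketch below is hypothetical (REQUIRED_MODELS and missing_models are not from the project) and works on a plain list of filenames, using the same prefix and .h5 rule described above.

```python
from typing import Dict, List

# Hypothetical table of the model names listed on this page.
REQUIRED_MODELS: Dict[str, List[str]] = {
    "TD3": ["actor_model", "target_actor", "critic_model1", "target_critic1",
            "critic_model2", "target_critic2"],
    "DDPG": ["actor_model", "target_actor", "critic_model1", "target_critic1"],
}


def missing_models(agent: str, filenames: List[str]) -> List[str]:
    """Return the model names for `agent` with no matching .h5 filename."""
    h5_files = [f for f in filenames if f.endswith(".h5")]
    return [name for name in REQUIRED_MODELS[agent]
            if not any(f.startswith(name) for f in h5_files)]


# Usage: one DDPG target critic is missing.
# missing_models("DDPG", ["actor_model.h5", "target_actor.h5", "critic_model1_epoch10.h5"])
# returns ["target_critic1"]
```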

Agent Base Class Documentation

The Agent class serves as an abstract base class (ABC) for defining agents in reinforcement learning (RL) applications. An agent is an entity that interacts with an environment, making decisions based on its observations (states) to achieve certain goals, typically maximizing some notion of cumulative reward. This base class specifies the foundational structure and methods that any RL agent should implement.

Constructor

__init__(self, **kwargs)

Initializes a new instance of an Agent. The constructor is where the key variables required by every agent are meant to be defined; in this base class it is deliberately left empty (pass), so each subclass performs its own initialization from keyword arguments (**kwargs).

Parameters:

  • **kwargs: Arbitrary keyword arguments. This allows for flexible initialization tailored to the specific needs of the subclass.

Methods

soft_update(self)

An abstract method that must be implemented by subclasses. It is intended for the soft update of the target models: rather than copying weights outright, each target network's weights are blended gradually toward those of the corresponding trained network. This technique is used in actor-critic methods such as DDPG and TD3 (and in some deep Q-learning variants) to improve learning stability.
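A soft update is usually written as Polyak averaging, theta_target <- tau * theta + (1 - tau) * theta_target, where tau is a small mixing coefficient. The sketch below is one way to express that; the function name and the tau parameter are assumptions, not this project's API. It works element-wise on floats and also on NumPy arrays, such as the weight lists returned by Keras model.get_weights().

```python
from typing import List


def soft_update(online_weights: List, target_weights: List, tau: float) -> List:
    """Polyak averaging: target <- tau * online + (1 - tau) * target.

    Each element may be a float or an array supporting * and + element-wise.
    """
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must be in [0, 1]")
    if len(online_weights) != len(target_weights):
        raise ValueError("online and target weight lists differ in length")
    return [tau * w + (1.0 - tau) * tw
            for w, tw in zip(online_weights, target_weights)]
```

With Keras models this could be applied as target.set_weights(soft_update(model.get_weights(), target.get_weights(), tau)), with a small tau such as 0.005. Setting tau to 1 reduces to a hard copy of the trained weights.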

train(self)

An abstract method that must be implemented by subclasses. This method encompasses the training logic for the agent, utilizing experiences gathered from the environment to improve its decision-making policy.

action(self, state)

An abstract method that must be implemented by subclasses. It defines how the agent decides on the next action to take, given the current state of the environment.

Parameters:

  • state: The current state of the environment.

Returns:

  • The action chosen by the agent.
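For continuous-control agents such as DDPG and TD3, a common way to implement action during training is to add Gaussian exploration noise to the actor's output and clip it to the action bounds. The sketch below illustrates that technique only; noisy_action, the actor callable, the bounds, and noise_std are all hypothetical, and the project's agents may explore differently.

```python
import random
from typing import Callable, List, Optional


def noisy_action(actor: Callable[[List[float]], List[float]],
                 state: List[float],
                 low: float,
                 high: float,
                 noise_std: float = 0.1,
                 rng: Optional[random.Random] = None) -> List[float]:
    """Actor output plus Gaussian exploration noise, clipped to [low, high]."""
    rng = rng or random.Random()
    raw = actor(state)  # hypothetical deterministic policy: state -> action vector
    return [min(high, max(low, a + rng.gauss(0.0, noise_std))) for a in raw]
```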

load(self)

An abstract method that must be implemented by subclasses. It should provide functionality for loading previously trained machine learning models, enabling the agent to utilize learned policies without retraining from scratch.

save(self)

An abstract method that must be implemented by subclasses. This method should save the current state of the machine learning models used by the agent, so that training can be paused and resumed or the trained agent can be deployed.

save_cfg(self)

An abstract method that must be implemented by subclasses. It should save the configuration of the agent. This could include hyperparameters, architecture details, and any other settings critical to reproducing the agent's behavior.

Implementing an Agent

To create a specific type of agent (e.g., a Deep Q-Network (DQN) agent or an actor-critic agent), subclass Agent and provide concrete implementations of all the abstract methods. Together these define the agent's learning algorithm (train), how it updates its target models (soft_update), how it makes decisions (action), and how it persists its models and configuration (save, load, save_cfg).
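The sketch below reconstructs the documented interface with Python's abc module and adds a deliberately trivial subclass to show what a concrete agent must provide. It is not the project's source: RandomAgent, its n_actions, seed, and logdir keyword arguments, and the cfg.json filename are hypothetical.

```python
import json
import os
import random
from abc import ABC, abstractmethod


class Agent(ABC):
    """Sketch of the documented base class; the real source may differ."""

    def __init__(self, **kwargs):
        pass  # subclasses define their own variables from kwargs

    @abstractmethod
    def soft_update(self): ...

    @abstractmethod
    def train(self): ...

    @abstractmethod
    def action(self, state): ...

    @abstractmethod
    def load(self): ...

    @abstractmethod
    def save(self): ...

    @abstractmethod
    def save_cfg(self): ...


class RandomAgent(Agent):
    """Hypothetical toy agent that picks uniformly random discrete actions.

    It has no networks, so soft_update, train, load, and save are no-ops.
    """

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.n_actions = kwargs.get("n_actions", 2)
        self.logdir = kwargs.get("logdir", "./logs")
        self.rng = random.Random(kwargs.get("seed"))

    def soft_update(self):
        pass  # no target network to update

    def train(self):
        pass  # nothing to learn

    def action(self, state):
        return self.rng.randrange(self.n_actions)  # ignores the state

    def load(self):
        pass  # no model files to read

    def save(self):
        pass  # no model files to write

    def save_cfg(self):
        # Write the settings needed to recreate this agent as JSON.
        os.makedirs(self.logdir, exist_ok=True)
        path = os.path.join(self.logdir, "cfg.json")
        with open(path, "w") as f:
            json.dump({"n_actions": self.n_actions, "logdir": self.logdir}, f)
        return path
```

Because every method is marked abstract, Python refuses to instantiate Agent itself, or any subclass that leaves one of the methods unimplemented.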
