
Full Policy Gradient Agent for Reinforcement Learning

This time we will build a full policy gradient implementation and train it on the OpenAI CartPole environment. Unlike the previous simple policy gradient implementation, we now need to keep track of the previous states to decide which actions to take, and the training network becomes slightly more complex.

References

Definitions

Initial implementation

Analysis

\[\text{Loss} = - \sum_i \log\big(y_i (y_i - p_i) + (1 - y_i)(y_i + p_i)\big) \cdot A_i\]
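
To make the loss concrete, here is a minimal pure-Python sketch that evaluates it for a handful of steps. It assumes `y` is a 0/1 label for each step, `p` is the probability output by the network (the `pi` term in the formula, read as p indexed by i), and `a` is the advantage of that step. The function name `policy_gradient_loss` and the sample values are illustrative only and are not taken from the original implementation.

```python
import math

def policy_gradient_loss(labels, probs, advantages):
    """Evaluate -sum_i log(y_i*(y_i - p_i) + (1 - y_i)*(y_i + p_i)) * A_i."""
    total = 0.0
    for y, p, a in zip(labels, probs, advantages):
        # For y == 1 the inner term reduces to 1 - p; for y == 0 it reduces to p.
        likelihood = y * (y - p) + (1 - y) * (y + p)
        total += math.log(likelihood) * a
    return -total

# Hypothetical sample data: three steps of one episode.
labels = [1, 0, 1]
probs = [0.2, 0.7, 0.5]
advantages = [1.0, 0.5, -1.0]

loss = policy_gradient_loss(labels, probs, advantages)
print(loss)
```

Note that the inner term only ever selects one of two values, `1 - p` or `p`, depending on the label, so the expression is a compact way of writing the log-likelihood of the chosen action weighted by its advantage.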

\[A_t = \sum_{k=0}^\infty \gamma^k r_{t+k}\]
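
The discounted sum above can be computed for a finite episode by walking the rewards backwards. The sketch below assumes the episode ends after the last recorded reward, so the infinite sum is truncated there. The function name `discounted_returns` and the default `gamma` value are assumptions chosen for illustration.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return A_t = sum_k gamma**k * r_{t+k} for every step t of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Iterate from the last step to the first so each return reuses the next one:
    # A_t = r_t + gamma * A_{t+1}
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# CartPole gives a reward of 1 per step, so a three-step episode looks like this.
returns = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)
print(returns)  # [1.75, 1.5, 1.0]
```

Earlier steps receive larger values because they are credited with all the rewards that followed them, each discounted by one more factor of gamma.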

Conclusion