Table of Contents

Multi-action vanilla policy gradient

As a small extension to the policy gradient implementation discussed previously, we now look at how to support more than two actions (i.e., num_actions > 2) in the policy network.
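To make the idea concrete, here is a minimal sketch of one common way to handle num_actions > 2: the network emits one logit per action, a softmax turns the logits into probabilities, and an action is sampled from the resulting categorical distribution. This is an illustration under assumptions, not the reference implementation: the helper names (softmax, sample_action, grad_log_prob), the example logits, and num_actions = 4 are all hypothetical.

```python
import math
import random

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_action(logits, rng):
    # Sample one action index from the categorical distribution over actions.
    probs = softmax(logits)
    action = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return action, probs

def grad_log_prob(probs, action):
    # Gradient of log pi(action) with respect to the logits:
    # one_hot(action) - probs. Policy gradient scales this by the return.
    return [(1.0 if i == action else 0.0) - p for i, p in enumerate(probs)]

num_actions = 4                   # hypothetical action count (> 2)
rng = random.Random(0)            # seeded for reproducibility
logits = [0.1, -0.3, 0.5, 0.0]    # stand-in for the policy network's output
action, probs = sample_action(logits, rng)
grad = grad_log_prob(probs, action)
```

With only two actions, a single output is enough to describe the distribution; with more, the policy needs one output per action, which is why the sketch uses a softmax over num_actions logits.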

References

Reference implementation

Analysis

Our base implementation