Table of Contents

QNetwork learning

Continuing on my current “Reinforcement Learning” path, we are now going to try a Q-network implementation, which we will again train on the FrozenLake environment.
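To fix ideas, here is a minimal sketch of the kind of Q network typically used on FrozenLake: the discrete state is one-hot encoded, so the "network" reduces to a single weight matrix and a squared-TD-error gradient step. This is an illustrative NumPy sketch, not the article's actual code; all names (`q_values`, `td_step`, the table sizes) are assumptions.

```python
import numpy as np

# FrozenLake-v1 default sizes (assumption): 16 states, 4 actions.
n_states, n_actions = 16, 4
rng = np.random.default_rng(0)
# Small random initial weights; with a one-hot input, one_hot(s) @ W
# is simply row s of W, so Q(s, a) = W[s, a].
W = rng.uniform(0.0, 0.01, size=(n_states, n_actions))

def q_values(s):
    """Q(s, .) for a one-hot encoded state s."""
    return W[s]

def td_step(s, a, r, s_next, done, lr=0.1, gamma=0.99):
    """One gradient step on the squared TD error, as a linear Q network would do."""
    target = r if done else r + gamma * q_values(s_next).max()
    td_error = target - W[s, a]
    # Gradient of 0.5 * (target - Q(s, a))^2 w.r.t. W[s, a] is -td_error.
    W[s, a] += lr * td_error
    return td_error
```

Repeatedly applying `td_step` on the same transition makes `W[s, a]` converge geometrically toward the TD target.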

References

Reference implementation

Initial investigations

Cleaning session run calls

Multi-inputs training

Recursive Q target update

with: \[p_{i,j,k} = \frac{Q[S_i, a_j, S_k, 0]}{\sum_l Q[S_i, a_j, S_l, 0]} \]

\[Q^t_{i,j} = \sum_k p_{i,j,k} \left( R_{i,j,k} + \gamma Q^s_k \right)\]

Isolating the self-transition term (\(k = i\)) from the rest of the sum:

\[Q^t_{i,j} = \sum_{k \neq i} p_{i,j,k} \left( R_{i,j,k} + \gamma Q^s_k \right) + p_{i,j,i} \left( R_{i,j,i} + \gamma Q^s_i \right)\]
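The target table \(Q^t_{i,j} = \sum_k p_{i,j,k} (R_{i,j,k} + \gamma Q^s_k)\) can be computed vectorially. Below is a hedged NumPy sketch: it assumes a `counts[i, j, k]` table of observed transitions from state \(S_i\) under action \(a_j\) to state \(S_k\) (standing in for the \(Q[\cdot,\cdot,\cdot,0]\) slice above) and a matching `rewards[i, j, k]` table for \(R_{i,j,k}\); these names and shapes are hypothetical, not the article's.

```python
import numpy as np

def q_target(counts, rewards, q_state, gamma=0.99):
    """Q^t_{i,j} = sum_k p_{i,j,k} * (R_{i,j,k} + gamma * Q^s_k).

    counts:  (nS, nA, nS) observed transition counts (assumed layout)
    rewards: (nS, nA, nS) observed rewards R_{i,j,k}
    q_state: (nS,) per-state values Q^s_k
    """
    totals = counts.sum(axis=2, keepdims=True)
    # Empirical transition probabilities p_{i,j,k}; leave p = 0 for
    # (state, action) pairs that were never visited.
    p = np.divide(counts, totals,
                  out=np.zeros(counts.shape, dtype=float),
                  where=totals > 0)
    # Weighted sum over the arrival state k.
    return np.einsum('ijk,ijk->ij', p, rewards + gamma * q_state[None, None, :])
```

Unvisited \((S_i, a_j)\) pairs get a target of zero here; whether to mask them out of the loss instead is a separate design choice.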