Table of Contents

Restricted Boltzmann Machine

Hinton - A Practical Guide to Training RBMs

Overview of Contrastive Divergence

Unbiased sample of \(\langle v_ih_j \rangle_{data}\)

\[\begin{equation}p(h_j=1 | \mathbf{v}) = \sigma(b_j + \sum\limits_i v_iw_{ij})\end{equation}\label{hid_prob}\]

\[\begin{equation}p(v_i=1 | \mathbf{h}) = \sigma(a_i + \sum\limits_j h_jw_{ij})\end{equation}\label{vis_prob}\]
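A minimal NumPy sketch of these two conditional probabilities, assuming a weight matrix `W` of shape `(n_visible, n_hidden)` and bias vectors `a` (visible) and `b` (hidden); the function names are chosen here for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_hidden(v, W, b, rng):
    """p(h_j = 1 | v) = sigma(b_j + sum_i v_i w_ij); then sample binary states."""
    p_h = sigmoid(b + v @ W)
    return p_h, (rng.random(p_h.shape) < p_h).astype(float)

def sample_visible(h, W, a, rng):
    """p(v_i = 1 | h) = sigma(a_i + sum_j h_j w_ij); then sample binary states."""
    p_v = sigmoid(a + h @ W.T)
    return p_v, (rng.random(p_v.shape) < p_v).astype(float)
```

Alternating these two samplers is one step of Gibbs sampling, which CD1 truncates after a single up-down pass.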

Unbiased sample of \(\langle v_ih_j \rangle_{model}\)

Collect stats with Contrastive Divergence

Updating the hidden states

Updating the visible states

Collecting the statistics

Recipe for getting learning signal for CD1

Size of a mini batch

Monitor learning progress

Monitoring overfitting

(Even when the gap between the free energies of training and validation data is growing, the probability of the training data may be growing even faster than the gap, so the probability of the validation data may still be improving.)

The learning rate

The initial values of the weights and biases

Momentum

\[\Delta \theta_i(t) = \nu_i(t) = \alpha \nu_i(t-1) - \epsilon \frac{dE}{d\theta_i}(t)\]
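The momentum update above can be sketched directly, with the velocity \(\nu\) carried between steps; `alpha` is the momentum coefficient and `epsilon` the learning rate (default values here are illustrative):

```python
import numpy as np

def momentum_step(theta, velocity, grad, alpha=0.5, epsilon=0.01):
    """delta_theta(t) = nu(t) = alpha * nu(t-1) - epsilon * dE/dtheta(t)."""
    velocity = alpha * velocity - epsilon * grad
    return theta + velocity, velocity
```

The caller keeps `velocity` across iterations, so past gradients keep contributing with exponentially decaying weight.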

Start with a momentum of 0.5; once the error has settled down to gentle progress, increase the momentum to 0.9.

Weight-decay

Encouraging sparse hidden activities

\[\text{penalty} \propto -p \log(q) - (1-p) \log(1-q)\]
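This is the cross-entropy between a target sparsity \(p\) and the actual mean hidden activity \(q\); a one-line sketch (the default target of 0.05 is an assumption, not from the text):

```python
import numpy as np

def sparsity_penalty(q, p=0.05):
    """Cross-entropy between target sparsity p and mean hidden activation q."""
    return -p * np.log(q) - (1 - p) * np.log(1 - q)
```

The penalty is minimized when `q == p`, so its gradient pushes the hidden units toward the desired average activity.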

Number of hidden units

Different types of unit

Softmax units

Gaussian visible units

\[E(\mathbf{v},\mathbf{h}) = \sum\limits_{i \in \text{vis}} \frac{(v_i - a_i)^2}{2\sigma_i^2} - \sum\limits_{j \in \text{hid}} b_jh_j - \sum\limits_{i,j} \frac{v_i}{\sigma_i} h_jw_{ij}\]
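A direct transcription of this energy function, term by term, assuming vectors `v`, `h`, `a`, `b`, `sigma` and a weight matrix `W` of shape `(n_visible, n_hidden)`:

```python
import numpy as np

def gaussian_rbm_energy(v, h, W, a, b, sigma):
    """Energy of a Gaussian-visible, binary-hidden RBM configuration."""
    quadratic = np.sum((v - a) ** 2 / (2 * sigma ** 2))  # parabolic containment
    return quadratic - b @ h - (v / sigma) @ W @ h
```
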

Gaussian visible and hidden units

Binomial units

Rectified linear units

Varieties of contrastive divergence

Displaying what is happening during learning

Using RBM’s for discrimination

Dealing with missing values

Taylor and Hinton - 2006 - Modeling Human Motion using Binary Latent Variables

An energy-based model for vectors of real-values

\[-\log p(\mathbf{v},\mathbf{h}) = \sum\limits_i \frac{(v_i - c_i)^2}{2\sigma_i^2} - \sum\limits_j b_jh_j - \sum\limits_{i,j} \frac{v_i}{\sigma_i} h_jw_{ij} + \text{const}\]

\[p(h_j = 1 |\mathbf{v}) = \sigma(b_j + \sum\limits_i v_iw_{ij})\] \[p(v_i|\mathbf{h}) = \mathcal{N}(c_i + \sum\limits_j h_jw_{ij},\,1)\]
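With \(\sigma_i = 1\), reconstructing the visible units reduces to adding unit-variance Gaussian noise to a linear mean; a sketch, with names chosen here for illustration:

```python
import numpy as np

def sample_visible_gaussian(h, W, c, rng):
    """v_i ~ N(c_i + sum_j h_j w_ij, 1): unit-variance Gaussian reconstruction."""
    mean = c + h @ W.T
    return mean, mean + rng.standard_normal(mean.shape)
```

In practice the noise-free `mean` is often used as the reconstruction when collecting CD statistics.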

Conditional RBM model

\[\begin{equation}\Delta d_{ij}^{(t-q)} \propto v_i^{t-q} (\langle h_j^t \rangle_{data} - \langle h_j^t \rangle_{recon})\end{equation}\label{dir_v_h}\]

\[\begin{equation}\Delta a_{ki}^{(t-q)} \propto v_k^{t-q} (v_i^t - \langle v_i^t \rangle_{recon})\end{equation}\label{dir_v_v}\]
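The two directed-connection updates above can be sketched as outer products between past visible states and the data-minus-reconstruction differences at time \(t\); variable names here are illustrative:

```python
import numpy as np

def directed_updates(v_past, h_data, h_recon, v_data, v_recon, lr=0.001):
    """CD updates for past-to-hidden (d) and past-to-visible (a) directed weights."""
    delta_d = lr * np.outer(v_past, h_data - h_recon)  # v^{t-q} (<h^t>_data - <h^t>_recon)
    delta_a = lr * np.outer(v_past, v_data - v_recon)  # v^{t-q} (v^t - <v^t>_recon)
    return delta_d, delta_a
```

One such pair of updates is applied for each temporal offset \(q\) in the conditioning window.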

Approximations

Data gathering and preprocessing

Experiments

Higher level model

Discussion

Recurrent Neural Networks

The Unreasonable Effectiveness of Recurrent Neural Networks

What are Recurrent Neural Networks

* RNNs allow us to operate over **sequences** of vectors.
* Check http://deepmind.com/
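Operating over a sequence means applying the same transition to each input vector while carrying a hidden state; a minimal vanilla-RNN step (weight names are assumptions for illustration):

```python
import numpy as np

def rnn_step(x, h_prev, Wxh, Whh, bh):
    """One recurrent step: new hidden state from current input and previous state."""
    return np.tanh(x @ Wxh + h_prev @ Whh + bh)
```

Iterating `rnn_step` over a list of input vectors, feeding each output state back in as `h_prev`, processes an arbitrary-length sequence with a fixed set of weights.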

Tutorial on LSTM Recurrent Networks

LSTM implementation explained

Understanding LSTM Networks