Lecture 1
1.1 - Why do we need machine learning?
- Because we don't know how to write the corresponding program!
- Good at:
- Recognizing patterns
- Recognizing anomalies
- Making predictions
- Here we use the MNIST database of hand-written digits.
1.2 - What are neural networks?
- Each neuron has:
- An axon
- A dendritic tree
- Synapses can adapt. They are very slow but use very little power.
1.3 - Some simple models of neurons
- For a linear neuron, the output is \(y = b + \sum\limits_i x_i w_i\)
- where \(b\) is the bias term, \(x_i\) the activity on input line \(i\), and \(w_i\) the weight on input \(i\).
Binary threshold neurons
- We use a threshold to decide whether to output 0 or 1; there are two equivalent formulations (with \(\theta = -b\)):
- \(z = \sum\limits_i x_i w_i\) then y=1 if z > \(\theta\) or y=0 otherwise.
- \(z = b + \sum\limits_i x_i w_i\) then y=1 if z > 0 or y=0 otherwise.
Rectified linear neurons
- \(z = b + \sum\limits_i x_i w_i\) then y=z if z > 0 or y=0 otherwise.
Sigmoid neurons
- \(z = b + \sum\limits_i x_i w_i\) then \(y = \frac{1}{1+e^{-z}}\)
Stochastic binary neurons
- \(z = b + \sum\limits_i x_i w_i\) then \(p(y=1) = \frac{1}{1+e^{-z}}\)
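The neuron models above can be sketched directly in code (a minimal illustration; the function names are my own, not from the lecture):

```python
import math
import random

def linear(x, w, b):
    # Linear neuron: y = b + sum_i x_i * w_i
    return b + sum(xi * wi for xi, wi in zip(x, w))

def binary_threshold(x, w, b):
    # Binary threshold neuron: y = 1 if z > 0, else 0
    return 1 if linear(x, w, b) > 0 else 0

def relu(x, w, b):
    # Rectified linear neuron: y = z if z > 0, else 0
    z = linear(x, w, b)
    return z if z > 0 else 0.0

def sigmoid(x, w, b):
    # Sigmoid neuron: y = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + math.exp(-linear(x, w, b)))

def stochastic_binary(x, w, b, rng=random):
    # Stochastic binary neuron: emit 1 with probability sigmoid(z)
    return 1 if rng.random() < sigmoid(x, w, b) else 0
```

Note that the sigmoid neuron's output equals the stochastic binary neuron's probability of firing; the only difference is whether that value is emitted directly or used to draw a 0/1 sample.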
1.4 - A simple example of learning
- Simple neural net with one input layer and one output layer.
- To train the network, we increment the weights from active pixels to the correct class.
- We also decrement the weights from active pixels to the class the network guesses.
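The two update rules above can be sketched on a toy 4-pixel "image" with 3 classes (a minimal illustration; `predict` and `train_step` are names I chose, not from the lecture):

```python
def predict(weights, pixels):
    # Score each class as the sum of its weights from active pixels; guess the max.
    scores = [sum(w * p for w, p in zip(row, pixels)) for row in weights]
    return scores.index(max(scores))

def train_step(weights, pixels, correct_class):
    guess = predict(weights, pixels)
    for i, p in enumerate(pixels):
        if p:  # only weights from active pixels are updated
            weights[correct_class][i] += 1  # increment toward the correct class
            weights[guess][i] -= 1          # decrement toward the guessed class
    return guess

# Usage: 3 classes, 4 pixels, weights start at zero.
weights = [[0] * 4 for _ in range(3)]
train_step(weights, [1, 0, 1, 0], correct_class=1)
```

When the guess is already correct, the increment and decrement cancel and the weights stop changing.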
1.5 - Three types of learning
- Supervised learning
- Regression
- Classification
- Reinforcement learning
- Unsupervised learning
Supervised learning
- We start by choosing a model class: \(y = f(x; W)\)
- Learning means adjusting the parameters \(W\) to reduce the discrepancy between the target output \(t\) and the model output \(y\).
- For regression, to measure the error we usually use a term such as \(\frac 12 (t-y)^2\)
- For classification there are other, more suitable error measures.
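For the regression case, adjusting the parameters to reduce \(\frac 12 (t-y)^2\) can be sketched as one gradient-descent step on a linear model \(y = \sum_i w_i x_i\), using \(\partial E / \partial w_i = -(t - y)\,x_i\) (a minimal illustration; the learning rate and function names are my own):

```python
def model(w, x):
    # Linear model: y = sum_i w_i * x_i
    return sum(wi * xi for wi, xi in zip(w, x))

def sgd_step(w, x, t, lr=0.1):
    # One gradient-descent step on E = 1/2 * (t - y)^2.
    y = model(w, x)
    error = 0.5 * (t - y) ** 2
    # dE/dw_i = -(t - y) * x_i, so descend: w_i += lr * (t - y) * x_i
    w = [wi + lr * (t - y) * xi for wi, xi in zip(w, x)]
    return w, error
```

Each step moves the weights so that \(y\) gets closer to \(t\), which is exactly "reducing the discrepancy" from the bullet above.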
Reinforcement learning
- Learning is about maximizing rewards received in the future; this is difficult because rewards are typically delayed.
Unsupervised learning
- Useful to get “an understanding” (e.g. an internal representation) of the input without labeling it.
- Provides a compact, low-dimensional representation of the input (PCA also does this, but it is limited to linear mappings).
- Provides an economical high-dimensional representation of the input.
- Finds sensible clusters in the input.
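The last point, finding clusters in unlabeled input, can be sketched with a simple 1-D k-means loop (a minimal illustration of clustering in general, not the course's specific method; the data and starting centers are made up):

```python
def kmeans_1d(points, centers, iters=10):
    # Alternate assignment and update steps; no labels are used anywhere.
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            groups[idx].append(p)
        # Update step: each center moves to the mean of its group
        # (an empty group keeps its old center).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers

# Usage: two clear clusters around 1 and 10 are recovered without labels.
centers = kmeans_1d([1.0, 1.2, 0.9, 10.0, 10.5, 9.8], [0.0, 5.0])
```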