====== Lecture 2 ======

===== 2.1 - Types of neural network architectures =====

==== Feed-forward neural networks ====

* The most common architecture in practical applications.
* One input layer, one output layer and some number of hidden layers.
* With more than one layer of hidden units we get a **deep neural network**.

==== Recurrent neural networks ====

* Very difficult to train.
* More biologically realistic.
* More appropriate for sequential data, e.g. predicting a stock price.

==== Symmetrically connected networks ====

* Connections between units are symmetrical (same weight in both directions).
* John Hopfield realized they are much easier to analyze than recurrent networks.
* More restricted in what they can do.
* Symmetrically connected nets without hidden units are called **Hopfield nets**.

===== 2.2 - Perceptrons: The first generation of neural networks =====

* We select the features manually.
* Then we **learn** the weights of those features.
* We compare the weighted sum with a threshold to decide the final output.
* Perceptrons use binary threshold neurons.

==== Perceptron training procedure ====

* For each training case:
  * If the output is correct, leave the weights as they are.
  * If 0 is predicted instead of 1, add the input vector to the weight vector.
  * If 1 is predicted instead of 0, subtract the input vector from the weight vector.

===== 2.3 - A geometrical view of perceptrons =====

* **Weight-space**: a space with one dimension per weight.
* A point in weight space represents a particular setting of all the weights.
* If we eliminate the threshold (by adding a bias input fixed at 1), each training case can be represented as a hyperplane through the origin. => The weight vector must lie on one side of this hyperplane to get the answer correct.
* The training case's hyperplane is perpendicular to the input vector of that training case. Which side of the plane gives the right answer is determined by the direction of the input vector and the target answer (0 or 1).
* The average of two solutions is also a solution in this weight space.
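The training procedure above can be sketched in NumPy. This is a minimal illustration, not the lecture's own code: the AND dataset, the epoch count, and the convention of appending a constant 1 to fold the threshold into the weights are all assumptions made here.

```python
import numpy as np

def perceptron_train(X, y, epochs=20):
    """Perceptron learning rule: inputs X already carry a bias
    component of 1, targets y are binary (0 or 1)."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, t in zip(X, y):
            pred = 1 if w @ x >= 0 else 0  # binary threshold neuron
            if pred == t:
                continue        # correct: leave the weights as they are
            elif t == 1:
                w = w + x       # predicted 0 instead of 1: add the input
            else:
                w = w - x       # predicted 1 instead of 0: subtract the input
    return w

# Illustrative task: the AND function, with a constant 1 appended
# to each input so the threshold is eliminated.
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
y = np.array([0, 0, 0, 1])
w = perceptron_train(X, y)
```

Because AND is linearly separable, the weight vector settles into the feasible region after a finite number of mistakes, as section 2.4 below states.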
=> The problem is convex.

===== 2.4 - Why the learning works =====

* We assume there exists a **feasible** weight vector in the weight space (i.e. a vector that gets the right answer for all training cases).
* After a finite number of mistakes, the weight vector must lie in the feasible region, **if this region exists**.

===== 2.5 - What perceptrons can't do =====

* If we have the correct features, then we can do almost anything.
* We now define a **data-space** in which each point is a data point.
* The weight vector defines a plane in this data space: the perceptron's decision boundary.
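The convexity claim from section 2.3 (the average of two solutions is also a solution) can be checked numerically. The two weight vectors and the AND task below are illustrative assumptions, not values from the lecture.

```python
import numpy as np

# AND inputs with a constant 1 appended to eliminate the threshold.
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
y = np.array([0, 0, 0, 1])

def classify(w):
    """Binary threshold neuron applied to every training case."""
    return (X @ w >= 0).astype(int)

# Two hand-picked feasible weight vectors (both solve AND).
w1 = np.array([2.0, 2.0, -3.0])
w2 = np.array([1.0, 1.0, -1.5])
w_avg = (w1 + w2) / 2

assert (classify(w1) == y).all() and (classify(w2) == y).all()
assert (classify(w_avg) == y).all()  # the average is also a solution
```

This is exactly what convexity of the feasible region means: any point on the line between two solutions in weight space is itself a solution.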