====== Lecture 4 ======

===== 3.1 - Learning to predict the next word =====

  * Nothing relevant here.

===== 3.2 - A brief diversion into cognitive science =====

  * **feature theory**: a concept is a set of semantic features.
  * **structuralist theory**: the meaning of a concept lies in its relationships to other concepts.

===== 3.3 - Another diversion: The softmax output function =====

  * Each neuron in the output layer receives a total input \(z_i\) and outputs a value \(y_i\) that also depends on the total inputs received by the other neurons in that group: \(y_i = \frac{e^{z_i}}{\sum\limits_{j \in group} e^{z_j}}\)
  * The derivative of the softmax with respect to a neuron's own input is simple: \(\frac{\partial y_i}{\partial z_i} = y_i (1 - y_i)\) (for \(j \neq i\) there is also a cross term, \(\frac{\partial y_i}{\partial z_j} = -y_i y_j\)).

==== Cross-entropy: the right cost function to use with softmax ====

  * \(C = - \sum\limits_j t_j \log(y_j)\)
  * C has a very big gradient when the target value is 1 and the output is almost zero (i.e. the derivative is very steep when the answer is very wrong).
  * \(\frac{\partial C}{\partial z_i} = y_i - t_i\): the steepness of \(\frac{\partial C}{\partial y_i}\) exactly balances the flatness of \(\frac{\partial y_i}{\partial z_i}\).
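
Below is a minimal NumPy sketch (not from the lecture; the ''softmax'' helper and the example values of ''z'' and ''t'' are illustrative) that computes the softmax output, the cross-entropy cost, and checks the analytic gradient \(y_i - t_i\) against a central-difference estimate:

<code python>
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; this does not change the result.
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])   # total inputs (logits) to the output group
t = np.array([1.0, 0.0, 0.0])   # one-hot target

y = softmax(z)
C = -np.sum(t * np.log(y))      # cross-entropy cost

# Analytic gradient from the notes: dC/dz_i = y_i - t_i
grad_analytic = y - t

# Central-difference check of the same gradient
eps = 1e-6
grad_numeric = np.zeros_like(z)
for i in range(len(z)):
    zp, zm = z.copy(), z.copy()
    zp[i] += eps
    zm[i] -= eps
    Cp = -np.sum(t * np.log(softmax(zp)))
    Cm = -np.sum(t * np.log(softmax(zm)))
    grad_numeric[i] = (Cp - Cm) / (2 * eps)

print(np.allclose(grad_analytic, grad_numeric))  # True
</code>

Note that the combined gradient \(y_i - t_i\) is computed directly, without ever forming the steep \(\frac{\partial C}{\partial y_i}\) and flat \(\frac{\partial y_i}{\partial z_i}\) factors separately; this cancellation is what makes the softmax/cross-entropy pairing well behaved.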