====== Lecture 4 ======

===== 3.1 - Learning to predict the next word =====

  * Nothing relevant here.

===== 3.2 - A brief diversion into cognitive science =====

  * **feature theory**: a concept is a set of semantic features.
  * **structuralist theory**: the meaning of a concept lies in its relationships to other concepts.

===== 3.3 - Another diversion: The softmax output function =====

  * Each neuron in the output layer receives a total input of \(z_i\) and outputs a value \(y_i\) that also depends on the inputs to the other neurons in its group: \(y_i = \frac{e^{z_i}}{\sum\limits_{j \in group} e^{z_j}}\)
  * The derivative of the softmax is simple: \(\frac{\partial y_i}{\partial z_i} = y_i (1 - y_i)\)

==== Cross-entropy: the right cost function to use with softmax ====

  * \(C = - \sum\limits_j t_j \log(y_j)\)
  * C has a very big gradient when the target value is 1 and the output is almost zero (i.e. a very steep derivative when the answer is very wrong).
  * \(\frac{\partial C}{\partial z_i} = y_i - t_i\)
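The softmax formula and its diagonal derivative \(\frac{\partial y_i}{\partial z_i} = y_i (1 - y_i)\) can be checked numerically. A minimal sketch in pure Python (the function name ''softmax'' and the test values are my own, not from the lecture):

```python
import math

def softmax(z):
    """Softmax over a group of total inputs z_i: y_i = e^{z_i} / sum_j e^{z_j}."""
    m = max(z)                          # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

z = [1.0, 2.0, 3.0]
y = softmax(z)

# Check dy_i/dz_i = y_i * (1 - y_i) against a finite-difference estimate.
i, eps = 0, 1e-6
z_plus = list(z)
z_plus[i] += eps
numeric = (softmax(z_plus)[i] - y[i]) / eps
analytic = y[i] * (1 - y[i])
```

Because the outputs are normalised by the sum over the group, they are all positive and sum to 1, so the group behaves like a probability distribution.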
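The gradient \(\frac{\partial C}{\partial z_i} = y_i - t_i\) can be verified the same way (a sketch under the same assumptions; ''cross_entropy'' is my own helper name):

```python
import math

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(t, z):
    """C = -sum_j t_j * log(y_j), with y = softmax(z)."""
    y = softmax(z)
    return -sum(tj * math.log(yj) for tj, yj in zip(t, y))

z = [0.5, -1.0, 2.0]
t = [0.0, 0.0, 1.0]                     # one-hot target
y = softmax(z)

# Analytic gradient: dC/dz_i = y_i - t_i
grad = [yi - ti for yi, ti in zip(y, t)]

# Finite-difference gradient for comparison
eps = 1e-6
numeric_grad = []
for i in range(len(z)):
    z_plus = list(z)
    z_plus[i] += eps
    numeric_grad.append((cross_entropy(t, z_plus) - cross_entropy(t, z)) / eps)
```

Note how simple the combined gradient is: the log in the cross-entropy cancels the exponential in the softmax, leaving just \(y_i - t_i\).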