====== MNIST convolution network ======

{{tag>}}

In this post we are going to build a convolutional network with the architecture suggested in the reference book: **two convolutional and two pooling layers interleaved**, followed by a fully connected layer (with a dropout of p=0.5) and a terminal softmax layer.
====== ======

===== Building the network =====
To build this network we use [[blog:]] as a starting point. We first add helper functions to build the convolution and max pooling layers: <sxh python>def conv2d(input, weight_shape, bias_shape):
    # Number of input connections per output unit: filter_h * filter_w * in_channels
    count = weight_shape[0] * weight_shape[1] * weight_shape[2]

    weight_init = tf.random_normal_initializer(stddev=(2.0/count)**0.5)
    W = tf.get_variable("W", weight_shape, initializer=weight_init)

    bias_init = tf.constant_initializer(value=0)
    b = tf.get_variable("b", bias_shape, initializer=bias_init)

    conv_out = tf.nn.conv2d(input, W, strides=[1, 1, 1, 1], padding='SAME')
    return tf.nn.relu(tf.nn.bias_add(conv_out, b))

def max_pool(input, k=2):
    return tf.nn.max_pool(input, ksize=[1, k, k, 1], strides=[1, k, k, 1], padding='SAME')
</sxh>
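As a quick sanity check on the weight initializer, the stddev value used above can be computed by hand. This is a minimal sketch in plain Python (no TensorFlow needed); the helper name is mine, not part of the network code:

```python
import math

def conv_init_stddev(weight_shape):
    # He-style initialization: stddev = sqrt(2 / fan_in), where fan_in is
    # the number of input connections feeding each output unit.
    # weight_shape = [filter_h, filter_w, in_channels, out_channels]
    fan_in = weight_shape[0] * weight_shape[1] * weight_shape[2]
    return math.sqrt(2.0 / fan_in)

# First conv layer: 5x5 filters over 1 input channel -> fan_in = 25
print(conv_init_stddev([5, 5, 1, 32]))   # ~0.2828
# Second conv layer: 5x5 filters over 32 input channels -> fan_in = 800
print(conv_init_stddev([5, 5, 32, 64]))  # 0.05
```

Scaling the stddev down as fan-in grows keeps the variance of each layer's pre-activations roughly constant, which is what makes deep ReLU stacks trainable.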
Then we update the inference function: <sxh python>def inference(x, keep_prob):
    x = tf.reshape(x, shape=[-1, 28, 28, 1])

    with tf.variable_scope("conv_1"):
        conv_1 = conv2d(x, [5, 5, 1, 32], [32])
        pool_1 = max_pool(conv_1)

    with tf.variable_scope("conv_2"):
        conv_2 = conv2d(pool_1, [5, 5, 32, 64], [64])
        pool_2 = max_pool(conv_2)

    with tf.variable_scope("fc"):
        pool_2_flat = tf.reshape(pool_2, [-1, 7*7*64])
        fc_1 = layer(pool_2_flat, [7*7*64, 1024], [1024])

        # apply dropout
        fc_1_drop = tf.nn.dropout(fc_1, keep_prob)

    with tf.variable_scope("output"):
        output = layer(fc_1_drop, [1024, 10], [10])

    return output</sxh>
==== Figuring out layer widths/heights ====
/*
One thing I don't quite understand in the code above is where the final width and height of the pool_2 layer come from: 7 x 7 x 64.

Where do those values "7 x 7" come from?
We start with a width and height of 28 pixels.
From that we build a conv layer with a filter extend of 5 and a stride of 1, with padding='SAME'.

cf. https://
Padding = 'SAME' means the input image is zero-padded so that the convolution output has the same spatial size as its input: we add a border of zeros around the image so the dimensions are not reduced. The trade-off is that those zero borders dilute the contrast at the edges of the feature maps.
*/
To compute the output width/height of a convolution or pooling layer we use the formula:

\[width_{out} = \left\lfloor\frac{width_{in} - extend + 2 \cdot padding}{stride} \right\rfloor + 1 \]
\[height_{out} = \left\lfloor \frac{height_{in} - extend + 2 \cdot padding}{stride}\right\rfloor + 1 \]

With padding='SAME' this simplifies to \(\lceil width_{in} / stride \rceil\). So the conv layers (stride 1) preserve the 28x28 size; after the first max pooling (extend 2, stride 2) we have a dimension of 14x14, and after the second one a size of 7x7, as used above.
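The formula is easy to check numerically. A small plain-Python helper (the function name is mine, for illustration only):

```python
def conv_out_size(size_in, extend, padding, stride):
    # floor((W - F + 2P) / S) + 1, from the formula above
    return (size_in - extend + 2 * padding) // stride + 1

# Conv layers: 5x5 filter, stride 1, 'SAME' padding (P = 2) -> size unchanged
print(conv_out_size(28, 5, 2, 1))  # 28

# Max pooling: k = 2, stride 2, no padding needed when the size is even
w1 = conv_out_size(28, 2, 0, 2)    # 14 after the first pooling
w2 = conv_out_size(w1, 2, 0, 2)    # 7 after the second pooling
print(w1, w2)  # 14 7
```

This confirms the flattened fc input of 7*7*64 = 3136 values per image.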
==== Setting up Adam Optimizer ====
We should train the network using the Adam optimizer this time, so I updated the training function: <sxh python>def training(cost, global_step):
    tf.summary.scalar("cost", cost)
    optimizer = tf.train.AdamOptimizer(learning_rate)
    train_op = optimizer.minimize(cost, global_step=global_step)
    return train_op
</sxh>
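To make the optimizer less of a black box, here is one Adam update step written out in plain Python. This is a sketch of the update rule with the usual default hyperparameters, not TensorFlow's actual implementation:

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.001,
              beta1=0.9, beta2=0.999, eps=1e-8):
    # Exponential moving averages of the gradient and its square
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction for the zero-initialized averages
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Per-parameter step, normalized by the gradient magnitude estimate
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# On the first step the update magnitude is ~lr regardless of gradient scale:
theta, m, v = adam_step(0.0, grad=100.0, m=0.0, v=0.0, t=1)
print(theta)  # ~ -0.001
```

That built-in normalization is why Adam usually needs less learning-rate tuning than plain SGD, though as we will see below it does not make a bad learning rate harmless.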
==== Proper setup with dropout ====
One additional detail to keep in mind here is that we should use a **keep probability** of 0.5 for the dropout during training, but a value of 1.0 during evaluation. So I first tried to change the main loop just a little bit to reflect this: <sxh python>output = inference(x, 0.5)
cost = loss(output, y)

global_step = tf.Variable(0, name='global_step', trainable=False)

train_op = training(cost, global_step)

# For the evaluation we use a dropout value of 1.0:
eval_out = inference(x, 1.0)
eval_op = evaluate(eval_out, y)
summary_op = tf.summary.merge_all()</sxh>
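As an aside, the reason for this train/eval asymmetry is that dropout rescales the activations it keeps. A plain-Python sketch of the behavior (not TensorFlow's actual implementation):

```python
import random

def dropout(activations, keep_prob, rng=random.Random(0)):
    # Each unit is kept with probability keep_prob; kept units are scaled
    # by 1/keep_prob so the expected activation is unchanged.
    if keep_prob == 1.0:
        return list(activations)  # evaluation: identity
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]

acts = [1.0] * 10000
dropped = dropout(acts, keep_prob=0.5)
mean = sum(dropped) / len(dropped)
print(mean)  # close to 1.0: the expectation is preserved
```

With keep_prob=1.0 at evaluation time, dropout is a no-op and the full network is used deterministically.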
... But of course, this didn't work: calling "inference" a second time tries to create all the network variables again in the same scopes, and TensorFlow raises an error. To share the variables between the training and evaluation graphs we have to reuse them explicitly: <sxh python>with tf.variable_scope("mnist_conv_model") as scope:
    x = tf.placeholder("float", [None, 784])
    y = tf.placeholder("float", [None, 10])

    output = inference(x, 0.5)
    cost = loss(output, y)

    global_step = tf.Variable(0, name='global_step', trainable=False)

    train_op = training(cost, global_step)

    # For the evaluation we use a dropout value of 1.0:
    scope.reuse_variables()

    eval_out = inference(x, 1.0)
    eval_op = evaluate(eval_out, y)</sxh>
=> with this change I can launch the training, but it seems the network is not learning anything: <sxh bash>
2019-01-01T21:...
2019-01-01T21:...
2019-01-01T21:...
</sxh>
So I eventually found [[https://]] explaining that the proper way to handle this is to use a **placeholder** for the keep probability, and to feed the appropriate value on each session run: <sxh python>keep_prob = tf.placeholder(tf.float32)

# and later:
sess.run(train_op, feed_dict={x: batch_x, y: batch_y, keep_prob: 0.5})</sxh>
And again **this doesn't work**: the network is still not learning anything.
=> **OK found it**: it seems I was using a learning rate of **0.01**, which is too high; with a rate of **0.001** the training results look good. Also note that I slightly increased the minibatch size as shown [[https://]]: <sxh python>learning_rate = 0.001

training_epochs = 300
# training_epochs = 2000
batch_size = 128
display_step = 1</sxh>
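This failure mode is easy to reproduce on a toy problem. Gradient descent on f(x) = x² diverges as soon as the step size crosses the stability threshold, and just stalls or oscillates near it; this generic illustration is unrelated to the MNIST code itself:

```python
def minimize(lr, steps=50, x=1.0):
    # Gradient descent on f(x) = x^2, whose gradient is f'(x) = 2x.
    # Each step multiplies x by (1 - 2*lr): stable only for lr < 1.
    for _ in range(steps):
        x = x - lr * 2 * x
    return x

print(abs(minimize(lr=0.1)))   # converges toward 0
print(abs(minimize(lr=1.1)))   # diverges: |x| grows every step
```

On a real loss surface the threshold is unknown and varies across the surface, which is why dropping the rate by 10x is such a common first fix when the loss refuses to move.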
So with this last change I could achieve a test accuracy of **99.3%** after 300 training epochs, which is exactly what we expected! So we are all good on this experiment :-).