blog:2019:0101_mnist_convolution [2019/01/02 09:51] (current)
 +====== MNIST convolution network ======
 +
 +{{tag>​deep_learning}}
 +
 +In this post we are going to build a convolutional network with the architecture suggested in the reference book, with **two pooling and two
 +convolutional layers interleaved** followed by a fully connected layer (with a dropout of p=0.5), and a terminal softmax layer.
 +
 +===== Building the network =====
 +
 +To build this network we use [[blog:2018:1230_mnist_multilayer|the previous implementation script]] as a reference, and we add the two helper functions needed to build the convolutional and pooling layers: <sxh python>def conv2d(input, weight_shape, bias_shape):
 +    # weight_shape is [filter_height, filter_width, in_channels, out_channels]
 +    # He initialization: stddev = sqrt(2 / fan_in), with fan_in = height * width * in_channels
 +    count = weight_shape[0] * weight_shape[1] * weight_shape[2]
 +
 +    weight_init = tf.random_normal_initializer(stddev=(2.0/count)**0.5)
 +    W = tf.get_variable("W", weight_shape, initializer=weight_init)
 +
 +    bias_init = tf.constant_initializer(value=0)
 +    b = tf.get_variable("b", bias_shape, initializer=bias_init)
 +
 +    # a stride of 1 with 'SAME' padding keeps the spatial size unchanged
 +    conv_out = tf.nn.conv2d(input, W, strides=[1, 1, 1, 1], padding='SAME')
 +    return tf.nn.relu(tf.nn.bias_add(conv_out, b))
 +
 +def max_pool(input, k=2):
 +    # k x k window with a stride of k: divides the width and height by k
 +    return tf.nn.max_pool(input, ksize=[1, k, k, 1], strides=[1, k, k, 1], padding='SAME')
 +</sxh>
 +
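As a quick sanity check on the weight initialization used in conv2d, here is a minimal sketch in plain Python of the standard deviation it produces for the two filter shapes used in this post. The helper name he_stddev is hypothetical and only serves this illustration; it is not part of the training script.

```python
import math

def he_stddev(weight_shape):
    """Standard deviation computed in conv2d: sqrt(2 / fan_in).

    weight_shape is [filter_height, filter_width, in_channels, out_channels];
    fan_in is the number of input values feeding each output unit.
    """
    fan_in = weight_shape[0] * weight_shape[1] * weight_shape[2]
    return math.sqrt(2.0 / fan_in)

# Filter shapes of the two convolutional layers used in this post.
std_conv_1 = he_stddev([5, 5, 1, 32])    # fan_in = 5 * 5 * 1 = 25
std_conv_2 = he_stddev([5, 5, 32, 64])   # fan_in = 5 * 5 * 32 = 800
print(std_conv_1, std_conv_2)
```

The second layer has a much larger fan-in, so its weights start with a smaller spread, which keeps the scale of the activations roughly comparable from one layer to the next.
 +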
 +Then we update the inference function: <sxh python>def inference(x, keep_prob):
 +    # flat rows of 784 pixels -> 28x28 images with a single channel
 +    x = tf.reshape(x, shape=[-1, 28, 28, 1])
 +
 +    with tf.variable_scope("conv_1"):
 +        conv_1 = conv2d(x, [5, 5, 1, 32], [32])        # 28x28x32
 +        pool_1 = max_pool(conv_1)                      # 14x14x32
 +
 +    with tf.variable_scope("conv_2"):
 +        conv_2 = conv2d(pool_1, [5, 5, 32, 64], [64])  # 14x14x64
 +        pool_2 = max_pool(conv_2)                      # 7x7x64
 +
 +    with tf.variable_scope("fc"):
 +        # flatten the feature maps before the fully connected layer
 +        pool_2_flat = tf.reshape(pool_2, [-1, 7 * 7 * 64])
 +        fc_1 = layer(pool_2_flat, [7*7*64, 1024], [1024])
 +
 +        # apply dropout (keep_prob is the probability of keeping a unit)
 +        fc_1_drop = tf.nn.dropout(fc_1, keep_prob)
 +
 +    with tf.variable_scope("output"):
 +        output = layer(fc_1_drop, [1024, 10], [10])
 +
 +    return output</sxh>
 +
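To get a feeling for where the capacity of this network sits, the number of trainable parameters per layer can be worked out directly from the shapes passed in inference. This is only a sketch: conv_params and dense_params are hypothetical helper names, and it assumes that the layer function from the previous post creates one weight matrix and one bias vector with exactly the shapes it is given.

```python
def conv_params(weight_shape, bias_shape):
    """Weights plus biases of a conv layer with shape [h, w, c_in, c_out]."""
    h, w, c_in, c_out = weight_shape
    return h * w * c_in * c_out + bias_shape[0]

def dense_params(weight_shape, bias_shape):
    """Weights plus biases of a fully connected layer with shape [n_in, n_out]."""
    n_in, n_out = weight_shape
    return n_in * n_out + bias_shape[0]

# Same shapes as in the inference function.
param_counts = {
    "conv_1": conv_params([5, 5, 1, 32], [32]),
    "conv_2": conv_params([5, 5, 32, 64], [64]),
    "fc": dense_params([7 * 7 * 64, 1024], [1024]),
    "output": dense_params([1024, 10], [10]),
}
total_params = sum(param_counts.values())

for name, count in param_counts.items():
    print(name, count)
print("total", total_params)
```

Under these assumptions almost all of the parameters live in the fully connected layer, which is also the layer that dropout is applied to.
 +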
 +==== Figuring out layer widths/heights ====
 +
 +/*
 +One thing I don't quite understand with the code above is the computation of the final width and height of the pool_2 layer: 7 x 7 x 64.
 +
 +Where do those "7" values come from?
 +We start with a width and height of 28 pixels.
 +From that we build a conv layer with a filter extent of 5, a stride of 1 and padding = 'SAME'.
 +
 +cf. https://www.quora.com/What-does-the-same-padding-parameter-in-convolution-mean-in-TensorFlow
 +Padding = 'SAME' means the input image is zero-padded so that the output of the convolution has the same size as the input. In short, we add an n-pixel border of zeros so that the dimensions are not reduced. The important point is that adding this border of zero pixels to the input avoids shrinking the feature maps.
 +*/
 +
 +<note>Since we used padding='SAME' with a stride of 1 for our convolution layers, the output of each of these layers has the same width and height as its input. The input size is 28x28 and we apply a filter of size 5x5, so this corresponds to a zero padding of 2 pixels around our input images.</note>
 +
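 +To compute the width/height of the output of a conv layer, we can use the formulas:
 +
 +\[width_{out} = \left\lfloor\frac{width_{in} - extent + 2 \cdot padding}{stride} \right\rfloor + 1\]
 +\[height_{out} = \left\lfloor \frac{height_{in} - extent + 2 \cdot padding}{stride}\right\rfloor + 1\]
 +
 +With an extent of 5, a padding of 2 and a stride of 1, both convolutions keep the size of their input: (28 - 5 + 4)/1 + 1 = 28. Each max pooling uses a 2x2 window with a stride of 2, which halves the width and height. So after the first max pooling we have a dimension of 14x14, and after the second one, we have a size of 7x7 as stated above.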
 +
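The chain of sizes can also be checked with a few lines of Python. This is a small sketch using the usual output size rule floor((in - extent + 2 * padding) / stride) + 1; the function name output_size is hypothetical, and the pooling layers are modelled with a padding of 0, which is what 'SAME' amounts to here because 28 and 14 are both divisible by 2.

```python
def output_size(size_in, extent, padding, stride):
    """Width (or height) after a conv or pooling layer."""
    return (size_in - extent + 2 * padding) // stride + 1

# 5x5 convolutions with a padding of 2 and a stride of 1,
# each followed by a 2x2 max pooling with a stride of 2.
conv_1 = output_size(28, extent=5, padding=2, stride=1)
pool_1 = output_size(conv_1, extent=2, padding=0, stride=2)
conv_2 = output_size(pool_1, extent=5, padding=2, stride=1)
pool_2 = output_size(conv_2, extent=2, padding=0, stride=2)

print(conv_1, pool_1, conv_2, pool_2)  # 28 14 14 7

# Size of the flattened input of the fully connected layer (64 feature maps).
flat_size = pool_2 * pool_2 * 64
print(flat_size)  # 3136
```

This matches the 7 * 7 * 64 value used when reshaping pool_2 in the inference function.
 +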
 +==== Setting up Adam Optimizer ====
 +
 +We should train the network using the Adam optimizer this time, so I updated the training function: <sxh python>def training(cost, global_step):
 +    tf.summary.scalar("cost", cost)
 +    # learning_rate is a module-level hyperparameter defined with the other settings
 +    optimizer = tf.train.AdamOptimizer(learning_rate)
 +    train_op = optimizer.minimize(cost, global_step=global_step)
 +    return train_op
 +</sxh>
 +
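For reference, here is a minimal sketch of what a single Adam update does to one scalar parameter, written in plain Python. It follows the textbook update rule with the usual default values (beta1 = 0.9, beta2 = 0.999, epsilon = 1e-8); the adam_step function is hypothetical, and the TensorFlow implementation may differ in small numerical details.

```python
import math

def adam_step(param, grad, m, v, t, learning_rate=0.001,
              beta1=0.9, beta2=0.999, epsilon=1e-8):
    """One Adam update for a scalar parameter; t is the step number (from 1)."""
    # Exponential moving averages of the gradient and of its square.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    # Bias correction: both averages start at 0 and are too small at first.
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    param = param - learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return param, m, v

# Toy problem: minimize f(x) = x^2, whose gradient is 2x.
x, m, v = 5.0, 0.0, 0.0
x, m, v = adam_step(x, 2.0 * x, m, v, t=1)
print(x)
```

On the very first step the update has a magnitude very close to the learning rate, whatever the scale of the gradient, which is one reason why the learning rate matters so much with this optimizer.
 +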
 +==== Proper setup with dropout ====
 +
 +One additional detail to keep in mind here is that we should use a **keep probability** of 0.5 during training (i.e. a dropout of p=0.5), but a value of 1.0 during evaluation so that no unit is dropped. So I tried to change the main loop just a little bit to reflect this: <sxh python>            output = inference(x, 0.5)
 +            cost = loss(output, y)
 +
 +            global_step = tf.Variable(0, name='global_step', trainable=False)
 +
 +            train_op = training(cost, global_step)
 +
 +            # For the evaluation we use a keep probability of 1.0 (no dropout):
 +            eval_out = inference(x, 1.0)
 +            eval_op = evaluate(eval_out, y)
 +            summary_op = tf.summary.merge_all()</sxh>
 +
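To make the role of the keep probability concrete, here is a small sketch of dropout in plain Python, outside of TensorFlow. The dropout function below is a hypothetical stand-in: like tf.nn.dropout, it keeps each value with probability keep_prob and scales the kept values by 1 / keep_prob, but it works on a simple list of floats.

```python
import random

def dropout(values, keep_prob, rng):
    """Keep each value with probability keep_prob, scaled by 1 / keep_prob."""
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]

rng = random.Random(0)  # fixed seed so that the run is repeatable
activations = [1.0, 2.0, 3.0, 4.0]

# Training: about half of the units are zeroed, the others are doubled.
train_out = dropout(activations, 0.5, rng)
# Evaluation: with keep_prob = 1.0 nothing is dropped or rescaled.
eval_out = dropout(activations, 1.0, rng)

print(train_out)
print(eval_out)
```

With a keep probability of 1.0 the function returns its input unchanged, which is why that value is the one to use for evaluation.
 +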
 +... But of course, the modified main loop didn't work: calling "inference" twice means defining the layers twice, and tf.get_variable refuses to create a variable that already exists. So instead I tried to set the reuse flag on the global variable scope: <sxh python>        with tf.variable_scope("mlp_model") as scope:
 +
 +            x = tf.placeholder("float", [None, 784]) # mnist data image of shape 28*28=784
 +            y = tf.placeholder("float", [None, 10]) # 0-9 digits recognition => 10 classes
 +
 +            output = inference(x, 0.5)
 +            cost = loss(output, y)
 +
 +            global_step = tf.Variable(0, name='global_step', trainable=False)
 +
 +            train_op = training(cost, global_step)
 +
 +            # From here on, tf.get_variable returns the existing variables
 +            # instead of trying to create new ones:
 +            scope.reuse_variables()
 +
 +            # For the evaluation we use a keep probability of 1.0 (no dropout):
 +            eval_out = inference(x, 1.0)</sxh>
 +
 +=> With this change I can launch the training, but it seems the network is not learning anything, since the cost and the validation error stay constant: <sxh bash>2019-01-01T21:29:01.783901 [DEBUG] Epoch: 0001, cost=2.378644308
 +2019-01-01T21:​29:​02.068586 [DEBUG] Validation Error: 0.904200
 +2019-01-01T21:​29:​07.736113 [DEBUG] Epoch: 0002, cost=2.302584587
 +2019-01-01T21:​29:​07.779086 [DEBUG] Validation Error: 0.904200
 +2019-01-01T21:​29:​13.306788 [DEBUG] Epoch: 0003, cost=2.302585052
 +2019-01-01T21:​29:​13.349761 [DEBUG] Validation Error: 0.904200
 +2019-01-01T21:​29:​18.897362 [DEBUG] Epoch: 0004, cost=2.302586809
 +2019-01-01T21:​29:​18.940488 [DEBUG] Validation Error: 0.904200
 +2019-01-01T21:​29:​24.368556 [DEBUG] Epoch: 0005, cost=2.302585125
 +2019-01-01T21:​29:​24.412528 [DEBUG] Validation Error: 0.904200
 +2019-01-01T21:​29:​29.923250 [DEBUG] Epoch: 0006, cost=2.302585246
 +2019-01-01T21:​29:​29.967224 [DEBUG] Validation Error: 0.904200
 +2019-01-01T21:​29:​35.406020 [DEBUG] Epoch: 0007, cost=2.302585125
 +2019-01-01T21:​29:​35.448995 [DEBUG] Validation Error: 0.904200
 +2019-01-01T21:​29:​41.119687 [DEBUG] Epoch: 0008, cost=2.302585125
 +2019-01-01T21:​29:​41.166659 [DEBUG] Validation Error: 0.904200
 +</​sxh>​
 +
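The value the cost is stuck on is not arbitrary. As a quick check in plain Python (a sketch, independent of the training script), the cross-entropy of a prediction that assigns the same probability to all 10 classes is ln(10), which is the 2.302585 plateau seen in the log:

```python
import math

num_classes = 10

# A network that has learned nothing outputs the same probability everywhere.
uniform_prediction = [1.0 / num_classes] * num_classes

# One-hot target for an arbitrary digit (here the digit 3).
target = [0.0] * num_classes
target[3] = 1.0

# Cross-entropy between the one-hot target and the uniform prediction.
uniform_loss = -sum(t * math.log(p) for t, p in zip(target, uniform_prediction))
print(uniform_loss)
```

So the network has collapsed to an output that ignores its input, which is also consistent with a validation error of about 0.9 on 10 classes.
 +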
 +So I eventually found [[https://stackoverflow.com/questions/44971349/how-to-turn-off-dropout-for-testing-in-tensorflow|this page]], which suggests turning the keep_prob value into a regular TensorFlow "placeholder", which makes a lot of sense. So I updated the code accordingly: <sxh python># defaults to 1.0 (no dropout) unless a value is fed explicitly
 +prob = tf.placeholder_with_default(1.0, shape=())
 +
 +# and later, during training only:
 +sess.run(train_op, feed_dict={x: minibatch_x, y: minibatch_y, prob: 0.5})</sxh>
 +
 +And again **this doesn't seem to work**: my network is not learning anything, just as reported above (stuck on the same values after more than 100 epochs). There must be something incorrect here, so what is it?
 +
 +=> **OK found it**: it seems the learning rate of **0.01** I was using was too high; with a rate of **0.001** the training results look good. Also note that I slightly increased the minibatch size as shown [[https://github.com/aymericdamien/TensorFlow-Examples/blob/master/examples/3_NeuralNetworks/convolutional_network.py|here]]: <sxh python>learning_rate = 0.001
 +training_epochs = 300
 +# training_epochs = 2000
 +batch_size = 128
 +display_step = 1</sxh>
 +
 +So with this last change I could achieve a test accuracy of **99.3%** after 300 training epochs, which is exactly what we expected! So we are all good on this experiment :-).
 +