====== MNIST convolution network ======

{{tag>deep_learning}}

In this post we build a convolutional network with the architecture suggested in the reference book: **two convolutional layers interleaved with two max-pooling layers**, followed by a fully connected layer (with dropout, p=0.5) and a final softmax layer.

====== ======

===== Building the network =====

To build this network we use [[blog:2018:1230_mnist_multilayer|the previous implementation script]] as a reference, and we add the helper functions needed to build the convolutional and pooling layers:

  import tensorflow as tf  # TensorFlow 1.x API
  def conv2d(input, weight_shape, bias_shape):
      # weight_shape is [height, width, in_channels, out_channels],
      # so count is the fan-in of each output unit:
      count = weight_shape[0] * weight_shape[1] * weight_shape[2]
      # He-style initialization: stddev = sqrt(2 / fan_in)
      weight_init = tf.random_normal_initializer(stddev=(2.0/count)**0.5)
      W = tf.get_variable("W", weight_shape, initializer=weight_init)
      bias_init = tf.constant_initializer(value=0)
      b = tf.get_variable("b", bias_shape, initializer=bias_init)
      # Stride of 1 with 'SAME' padding keeps the width and height unchanged:
      conv_out = tf.nn.conv2d(input, W, strides=[1, 1, 1, 1], padding='SAME')
      return tf.nn.relu(tf.nn.bias_add(conv_out, b))

  def max_pool(input, k=2):
      # k x k pooling window with stride k: divides the width and height by k
      return tf.nn.max_pool(input, ksize=[1, k, k, 1], strides=[1, k, k, 1], padding='SAME')

Then we update the inference function:

  def inference(x, keep_prob):
      # Reshape the flat 784-pixel input into 28x28 single-channel images:
      x = tf.reshape(x, shape=[-1, 28, 28, 1])
      with tf.variable_scope("conv_1"):
          conv_1 = conv2d(x, [5, 5, 1, 32], [32])
          pool_1 = max_pool(conv_1)
      with tf.variable_scope("conv_2"):
          conv_2 = conv2d(pool_1, [5, 5, 32, 64], [64])
          pool_2 = max_pool(conv_2)
      with tf.variable_scope("fc"):
          pool_2_flat = tf.reshape(pool_2, [-1, 7 * 7 * 64])
          # layer() is the fully connected helper from the previous script
          fc_1 = layer(pool_2_flat, [7*7*64, 1024], [1024])
          # apply dropout
          fc_1_drop = tf.nn.dropout(fc_1, keep_prob)
      with tf.variable_scope("output"):
          output = layer(fc_1_drop, [1024, 10], [10])
      return output

==== Figuring out layers widths/heights ====

/* One thing I don't quite understand with the code above is the computation of the final width and height value from the pool_2 layer: 7 x 7 x 64. Where do those "7" values come from?
We start with a width and height of 28 pixels. From that we build a conv layer with a filter extent of 5, a stride of 1 and padding = 'SAME', cf. https://www.quora.com/What-does-the-same-padding-parameter-in-convolution-mean-in-TensorFlow

Padding = Same: means the input image ought to have zero padding so that the output in convolution doesnt differ in size as input. in simple, we add a n-pixel borders of zero to tell that ‘dont’ reduce the dimension and have same as input. important thing is that when we add borders of zero pixels to input, then we reduce contract in feature maps.
*/

Since we use padding='SAME' with a stride of 1 for our convolution layers, the output of each of these layers has the same width and height as its input. The input size is 28x28 and the filter size is 5x5, so this corresponds to a zero padding of 2 pixels around our input images.

To compute the width/height of the output of a conv layer, we can use the formulas:

\[width_{out} = \left\lfloor\frac{width_{in} - extent + 2 \cdot padding}{stride}\right\rfloor + 1\]

\[height_{out} = \left\lfloor\frac{height_{in} - extent + 2 \cdot padding}{stride}\right\rfloor + 1\]

With our values (input of 28, extent of 5, padding of 2, stride of 1) this gives:

\[\left\lfloor\frac{28 - 5 + 2 \cdot 2}{1}\right\rfloor + 1 = 28\]

So the convolutions preserve the size, and only the pooling layers reduce it. Each 2x2 max pooling uses a stride of 2 and therefore halves the width and height: after the first max pooling we have a dimension of 14x14, and after the second one we have a size of 7x7, as stated above.

==== Setting up Adam Optimizer ====

We should train the network using the Adam optimizer this time, so I updated the train function:

  def training(cost, global_step):
      tf.summary.scalar("cost", cost)
      # learning_rate is defined with the other hyperparameters of the script
      optimizer = tf.train.AdamOptimizer(learning_rate)
      train_op = optimizer.minimize(cost, global_step=global_step)
      return train_op

==== Proper setup with dropout ====

One additional detail to keep in mind here is that the value passed to the dropout layer is a **keep probability**: we should use 0.5 during training, but a value of 1.0 (no dropout at all) during evaluation.
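To make the role of this value concrete, here is a minimal pure-Python sketch of what ''tf.nn.dropout(x, keep_prob)'' does in TensorFlow 1.x: each element is kept with probability ''keep_prob'' and scaled by ''1 / keep_prob'', otherwise it is set to zero. The ''dropout'' helper and the sample activations below are hypothetical and only written for illustration; this is not the TensorFlow implementation.

```python
import random

def dropout(values, keep_prob, rng=random):
    """Illustrative inverted dropout on a plain list of floats.

    Each value is kept with probability keep_prob and rescaled by
    1 / keep_prob, so the expected sum of the output matches the input.
    """
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]

# Hypothetical activations, just to show the two regimes:
activations = [1.0, 2.0, 3.0, 4.0]

# Evaluation: keep_prob = 1.0 keeps everything and rescales by 1, i.e. identity.
eval_out = dropout(activations, keep_prob=1.0)

# Training: keep_prob = 0.5 zeroes about half the values and doubles the rest.
train_out = dropout(activations, keep_prob=0.5, rng=random.Random(0))

print(eval_out)
print(train_out)
```

With ''keep_prob=1.0'' the output is always identical to the input, which is why 1.0 is the right value for evaluation.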
So I tried to change the main loop slightly to reflect this:

  output = inference(x, 0.5)
  cost = loss(output, y)
  global_step = tf.Variable(0, name='global_step', trainable=False)
  train_op = training(cost, global_step)
  # For the evaluation we use a dropout value of 1.0:
  eval_out = inference(x, 1.0)
  eval_op = evaluate(eval_out, y)
  summary_op = tf.summary.merge_all()
  ...

But of course, this didn't work: calling "inference" twice means defining the layers twice. So instead I tried to set the reuse flag on the global variable scope:

  with tf.variable_scope("mlp_model") as scope:
      x = tf.placeholder("float", [None, 784]) # mnist data image of shape 28*28=784
      y = tf.placeholder("float", [None, 10]) # 0-9 digits recognition => 10 classes
      output = inference(x, 0.5)
      cost = loss(output, y)
      global_step = tf.Variable(0, name='global_step', trainable=False)
      train_op = training(cost, global_step)
      # For the evaluation we use a dropout value of 1.0:
      scope.reuse_variables()
      eval_out = inference(x, 1.0)

=> With this change I can launch the training, but it seems the network is not learning anything:

  2019-01-01T21:29:01.783901 [DEBUG] Epoch: 0001, cost=2.378644308
  2019-01-01T21:29:02.068586 [DEBUG] Validation Error: 0.904200
  2019-01-01T21:29:07.736113 [DEBUG] Epoch: 0002, cost=2.302584587
  2019-01-01T21:29:07.779086 [DEBUG] Validation Error: 0.904200
  2019-01-01T21:29:13.306788 [DEBUG] Epoch: 0003, cost=2.302585052
  2019-01-01T21:29:13.349761 [DEBUG] Validation Error: 0.904200
  2019-01-01T21:29:18.897362 [DEBUG] Epoch: 0004, cost=2.302586809
  2019-01-01T21:29:18.940488 [DEBUG] Validation Error: 0.904200
  2019-01-01T21:29:24.368556 [DEBUG] Epoch: 0005, cost=2.302585125
  2019-01-01T21:29:24.412528 [DEBUG] Validation Error: 0.904200
  2019-01-01T21:29:29.923250 [DEBUG] Epoch: 0006, cost=2.302585246
  2019-01-01T21:29:29.967224 [DEBUG] Validation Error: 0.904200
  2019-01-01T21:29:35.406020 [DEBUG] Epoch: 0007, cost=2.302585125
  2019-01-01T21:29:35.448995 [DEBUG] Validation Error: 0.904200
  2019-01-01T21:29:41.119687 [DEBUG] Epoch: 0008, cost=2.302585125
  2019-01-01T21:29:41.166659 [DEBUG] Validation Error: 0.904200

So I eventually found [[https://stackoverflow.com/questions/44971349/how-to-turn-off-dropout-for-testing-in-tensorflow|this page]], which suggests turning the keep_prob value into a regular TensorFlow "placeholder", which makes a lot of sense. So I updated the code accordingly:

  # Defaults to 1.0 (no dropout) unless a value is fed explicitly:
  prob = tf.placeholder_with_default(1.0, shape=())
  # and later:
  sess.run(train_op, feed_dict={x: minibatch_x, y: minibatch_y, prob: 0.5})

And again **this doesn't seem to work**: my network is not learning anything, just as reported above (stuck on the same values after more than 100 epochs). There must be something incorrect here, so what is it?

=> **OK found it**: it seems I was using a learning rate that was too high (**0.01**); with a rate of **0.001** the training results look good. Also note that I slightly increased the minibatch size as shown [[https://github.com/aymericdamien/TensorFlow-Examples/blob/master/examples/3_NeuralNetworks/convolutional_network.py|here]]:

  learning_rate = 0.001
  training_epochs = 300
  # training_epochs = 2000
  batch_size = 128
  display_step = 1

So with this last change I could achieve a test accuracy of **99.3%** after 300 training epochs, which is exactly what we expected! So we are all good on this experiment :-).
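A side note on the stalled runs: the cost value they plateau at, 2.302585, matches ln(10), which is the cross-entropy of a classifier that assigns a probability of 1/10 to each of the 10 digit classes. That is consistent with (though it does not prove) the network having collapsed to a constant, uninformative output under the higher learning rate. The short Python check below is only an illustration of that arithmetic; the ''cross_entropy'' helper is hypothetical and not part of the training script.

```python
import math

def cross_entropy(probs, true_class):
    """Cross-entropy (natural log) of one example given predicted class probabilities."""
    return -math.log(probs[true_class])

# A classifier that spreads its probability evenly over the 10 digits:
uniform = [1.0 / 10] * 10

# The result is the same whatever the true class is, so any index works here.
stalled_cost = cross_entropy(uniform, true_class=3)

print(round(stalled_cost, 6))  # 2.302585
```

A cost stuck at this value can therefore be a useful hint that the optimizer settings, rather than the graph wiring, deserve a look first.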