MNIST convolution network
In this post we are going to build a convolutional network with the architecture suggested in the reference book: two convolutional layers, each followed by a max pooling layer, then a fully connected layer (with dropout, p=0.5), and a final softmax layer.
Building the network
To build this network we use the previous implementation script as a reference, and add the helper functions needed to build the convolutional and pooling layers:
def conv2d(input, weight_shape, bias_shape):
    # number of inputs feeding one output unit: kernel height * kernel width * input channels
    count = weight_shape[0] * weight_shape[1] * weight_shape[2]
    weight_init = tf.random_normal_initializer(stddev=(2.0/count)**0.5)
    W = tf.get_variable("W", weight_shape, initializer=weight_init)
    bias_init = tf.constant_initializer(value=0)
    b = tf.get_variable("b", bias_shape, initializer=bias_init)
    # stride 1 with 'SAME' padding keeps the width/height unchanged
    conv_out = tf.nn.conv2d(input, W, strides=[1, 1, 1, 1], padding='SAME')
    return tf.nn.relu(tf.nn.bias_add(conv_out, b))

def max_pool(input, k=2):
    # k x k window moved with stride k: halves the width/height for k=2
    return tf.nn.max_pool(input, ksize=[1, k, k, 1], strides=[1, k, k, 1], padding='SAME')
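As a side note, the `stddev=(2.0/count)**0.5` expression in `conv2d` looks like a He-style initialization, where the standard deviation depends on the number of inputs feeding each output unit. Here is a minimal sketch in plain Python (no TensorFlow needed) to see which values this gives for the two convolutional layers used below. The `he_stddev` name is mine and is not part of the original script.

```python
import math

def he_stddev(weight_shape):
    """Standard deviation used by conv2d above for a given weight shape.

    weight_shape is [kernel_height, kernel_width, in_channels, out_channels],
    the same layout that is passed to tf.get_variable in the post.
    (he_stddev is a hypothetical helper, for illustration only.)
    """
    kernel_h, kernel_w, in_channels, _out_channels = weight_shape
    fan_in = kernel_h * kernel_w * in_channels  # same value as `count` in conv2d
    return math.sqrt(2.0 / fan_in)

# The two convolutional layers of this network:
stddev_conv_1 = he_stddev([5, 5, 1, 32])    # fan-in of 25
stddev_conv_2 = he_stddev([5, 5, 32, 64])   # fan-in of 800
print(stddev_conv_1, stddev_conv_2)
```

The second layer has many more inputs per unit, so its weights start out noticeably smaller than those of the first layer.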
Then we update the inference function:
def inference(x, keep_prob):
    # flat 784-pixel rows -> 28x28 images with a single channel
    x = tf.reshape(x, shape=[-1, 28, 28, 1])
    with tf.variable_scope("conv_1"):
        conv_1 = conv2d(x, [5, 5, 1, 32], [32])
        pool_1 = max_pool(conv_1)
    with tf.variable_scope("conv_2"):
        conv_2 = conv2d(pool_1, [5, 5, 32, 64], [64])
        pool_2 = max_pool(conv_2)
    with tf.variable_scope("fc"):
        pool_2_flat = tf.reshape(pool_2, [-1, 7 * 7 * 64])
        # `layer` is the fully connected helper from the previous script
        fc_1 = layer(pool_2_flat, [7*7*64, 1024], [1024])
        # apply dropout
        fc_1_drop = tf.nn.dropout(fc_1, keep_prob)
    with tf.variable_scope("output"):
        output = layer(fc_1_drop, [1024, 10], [10])
    return output
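To make the role of `keep_prob` in `tf.nn.dropout` a bit more concrete, here is a rough plain-Python sketch of the "inverted dropout" idea: each activation is kept with probability `keep_prob` and scaled by `1 / keep_prob`, otherwise it is set to zero. This is only an illustration on a flat list of numbers. The `dropout` function below is my own stand-in, not TensorFlow's implementation.

```python
import random

def dropout(values, keep_prob, rng=random):
    """Inverted dropout on a flat list of activations (illustrative only).

    Each value is kept with probability keep_prob and scaled by 1/keep_prob,
    so the expected value of every activation stays the same.
    """
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    out = []
    for v in values:
        if rng.random() < keep_prob:   # rng.random() is in [0, 1)
            out.append(v / keep_prob)  # kept and rescaled
        else:
            out.append(0.0)            # dropped
    return out

activations = [1.0, 2.0, 3.0, 4.0]
print(dropout(activations, 0.5))  # training: about half the values zeroed, the rest doubled
print(dropout(activations, 1.0))  # evaluation: nothing is dropped, values unchanged
```

With `keep_prob = 1.0` the function is the identity, which is why that value is the one to use when evaluating the network.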
Figuring out layer widths/heights
To compute the width/height of the output of a conv (or pooling) layer, we can use the formulas below, where extent is the size of the filter or pooling window:
\[width_{out} = \left\lceil\frac{width_{in} - extent + 2*padding}{stride} \right\rceil + 1 \]
\[height_{out} = \left\lceil \frac{height_{in} - extent + 2*padding}{stride}\right\rceil + 1 \]
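With a stride of 1 and 'SAME' padding (a padding of 2 for a 5×5 filter), the convolutions leave the 28×28 size unchanged, while each 2×2 max pooling with a stride of 2 halves it. So after the first max pooling we have a dimension of 14×14, and after the second one we have a size of 7×7, as stated above (hence the 7 * 7 * 64 inputs of the fully connected layer).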
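The arithmetic can be double-checked with a few lines of plain Python. This is just a sketch: it uses the size formula with the trailing "+ 1" term (the form that yields the 14×14 and 7×7 figures), and it assumes a padding of 2 for the 5×5 convolutions and no padding for the 2×2 pooling. The function names are mine.

```python
import math

def out_size(size_in, extent, padding, stride):
    """Output width (or height) of a conv/pooling layer."""
    return math.ceil((size_in - extent + 2 * padding) / stride) + 1

def trace_mnist_sizes(size=28):
    """Sizes after conv_1, pool_1, conv_2 and pool_2 of the network above."""
    sizes = []
    for _ in range(2):
        size = out_size(size, extent=5, padding=2, stride=1)  # 5x5 conv, size preserved
        sizes.append(size)
        size = out_size(size, extent=2, padding=0, stride=2)  # 2x2 max pooling, size halved
        sizes.append(size)
    return sizes

sizes = trace_mnist_sizes()
flat_size = sizes[-1] * sizes[-1] * 64  # 64 feature maps after conv_2
print(sizes, flat_size)
```

The last value of the trace is the 7 used in the reshape to `[-1, 7 * 7 * 64]` in the inference function.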
Setting up Adam Optimizer
We should train the network using the Adam optimizer this time, so I updated the training function:
def training(cost, global_step):
    tf.summary.scalar("cost", cost)
    optimizer = tf.train.AdamOptimizer(learning_rate)
    train_op = optimizer.minimize(cost, global_step=global_step)
    return train_op
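For context on what the `learning_rate` passed to `AdamOptimizer` actually controls, here is a minimal plain-Python sketch of a single Adam update for one scalar parameter, following the textbook update rule. The default values for `beta1`, `beta2` and `eps` are the commonly used ones. TensorFlow's real implementation may differ in details, so take this as an illustration, not as what `tf.train.AdamOptimizer` does line by line.

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter (illustrative sketch).

    m and v are the running averages of the gradient and squared gradient,
    t is the 1-based step counter used for the bias correction.
    """
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)   # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)   # bias-corrected second moment
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# On the very first step the parameter moves by roughly `lr`,
# whether the gradient is large or small:
for grad in (1000.0, 0.001):
    theta, _, _ = adam_step(1.0, grad, m=0.0, v=0.0, t=1, lr=0.001)
    print(grad, 1.0 - theta)
```

Because the update is normalized by the gradient magnitude, the learning rate more or less directly sets the step size, which makes its value quite sensitive.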
Proper setup with dropout
One additional detail to keep in mind here is that tf.nn.dropout takes a keep probability: we should use a value of 0.5 during training, but a value of 1.0 (no dropout) during evaluation. So I tried a small change to the main loop to reflect this:
output = inference(x, 0.5)
cost = loss(output, y)
global_step = tf.Variable(0, name='global_step', trainable=False)
train_op = training(cost, global_step)
# For the evaluation we use a dropout value of 1.0:
eval_out = inference(x, 1.0)
eval_op = evaluate(eval_out, y)
summary_op = tf.summary.merge_all()
… But of course, this didn't work, because calling “inference” twice means defining the layers twice. So instead I tried to set the reuse flag on the global variable scope:
with tf.variable_scope("mlp_model") as scope:
    x = tf.placeholder("float", [None, 784])  # mnist data image of shape 28*28=784
    y = tf.placeholder("float", [None, 10])  # 0-9 digits recognition => 10 classes
    output = inference(x, 0.5)
    cost = loss(output, y)
    global_step = tf.Variable(0, name='global_step', trainable=False)
    train_op = training(cost, global_step)
    # For the evaluation we use a dropout value of 1.0:
    scope.reuse_variables()
    eval_out = inference(x, 1.0)
⇒ With this change I can launch the training, but it seems the network is not learning anything:
2019-01-01T21:29:01.783901 [DEBUG] Epoch: 0001, cost=2.378644308
2019-01-01T21:29:02.068586 [DEBUG] Validation Error: 0.904200
2019-01-01T21:29:07.736113 [DEBUG] Epoch: 0002, cost=2.302584587
2019-01-01T21:29:07.779086 [DEBUG] Validation Error: 0.904200
2019-01-01T21:29:13.306788 [DEBUG] Epoch: 0003, cost=2.302585052
2019-01-01T21:29:13.349761 [DEBUG] Validation Error: 0.904200
2019-01-01T21:29:18.897362 [DEBUG] Epoch: 0004, cost=2.302586809
2019-01-01T21:29:18.940488 [DEBUG] Validation Error: 0.904200
2019-01-01T21:29:24.368556 [DEBUG] Epoch: 0005, cost=2.302585125
2019-01-01T21:29:24.412528 [DEBUG] Validation Error: 0.904200
2019-01-01T21:29:29.923250 [DEBUG] Epoch: 0006, cost=2.302585246
2019-01-01T21:29:29.967224 [DEBUG] Validation Error: 0.904200
2019-01-01T21:29:35.406020 [DEBUG] Epoch: 0007, cost=2.302585125
2019-01-01T21:29:35.448995 [DEBUG] Validation Error: 0.904200
2019-01-01T21:29:41.119687 [DEBUG] Epoch: 0008, cost=2.302585125
2019-01-01T21:29:41.166659 [DEBUG] Validation Error: 0.904200
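A quick sanity check on these numbers: the cost is stuck at about 2.302585, which is ln(10). Assuming the `loss` function from the previous script is a softmax cross-entropy (it is not shown in this post, so this is an assumption), that is exactly the cost of a network that outputs the same probability of 1/10 for every digit. The sketch below reproduces the value in plain Python. The `cross_entropy` helper is only there for illustration.

```python
import math

def cross_entropy(probs, true_class):
    """Cross-entropy cost of one sample, given the predicted class probabilities."""
    return -math.log(probs[true_class])

# A network that has learned nothing and predicts 1/10 for each of the 10 digits:
uniform = [1.0 / 10] * 10
chance_cost = cross_entropy(uniform, true_class=3)  # the true class does not matter here
print(chance_cost, math.log(10))
```

So a plateau at this value suggests the output layer has collapsed to a constant, uniform prediction. The constant validation error of about 0.90 would be consistent with that, since it is roughly what one gets by always answering with the same digit.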
So I eventually found this page, which suggests turning the keep_prob value into a regular TensorFlow “placeholder”. This makes a lot of sense, since the same graph can then be fed 0.5 during training and 1.0 during evaluation. So I updated the code accordingly:
prob = tf.placeholder_with_default(1.0, shape=())
# and later:
sess.run(train_op, feed_dict={x: minibatch_x, y: minibatch_y, prob: 0.5})
And again this doesn't seem to work: my network is still not learning anything, just as reported above (stuck on the same values after more than 100 epochs). There must be something incorrect here, so what is it?
⇒ OK, found it: the learning rate of 0.01 I was using was too high. With a rate of 0.001 the training results look good. Also note that I slightly increased the minibatch size, as shown here:
learning_rate = 0.001
training_epochs = 300
# training_epochs = 2000
batch_size = 128
display_step = 1
So with this last change I could achieve a test accuracy of 99.3% after 300 training epochs, which is exactly what we expected! So we are all good on this experiment.