MNIST Multilayer Perceptron network
As mentioned in my previous post on this topic, we should now try to build and train a multilayer perceptron network on the MNIST dataset. This should give us a small improvement over the accuracy previously observed with a simple logistic regression network.
Building the network
So here is the Python script I used for this experiment (again, mostly a copy of the original script):
from nv.deep_learning import MNIST
from nv.core.utils import *
from nv.core.admin import *

root_path = nvGetRootPath()

logDEBUG("Retrieving MNIST dataset...")
mnist = MNIST.read_data_sets(root_path+"/data/MNIST/", one_hot=True)
logDEBUG("Done retrieving MNIST dataset.")

import tensorflow as tf
import shutil, os

# Architecture
n_hidden_1 = 256
n_hidden_2 = 256

# Parameters
learning_rate = 0.01
training_epochs = 300
# training_epochs = 1000
batch_size = 100
display_step = 1

def layer(input, weight_shape, bias_shape):
    weight_init = tf.random_normal_initializer(stddev=(2.0/weight_shape[0])**0.5)
    bias_init = tf.constant_initializer(value=0)
    W = tf.get_variable("W", weight_shape, initializer=weight_init)
    b = tf.get_variable("b", bias_shape, initializer=bias_init)
    return tf.nn.relu(tf.matmul(input, W) + b)

def inference(x):
    with tf.variable_scope("hidden_1"):
        hidden_1 = layer(x, [784, n_hidden_1], [n_hidden_1])

    with tf.variable_scope("hidden_2"):
        hidden_2 = layer(hidden_1, [n_hidden_1, n_hidden_2], [n_hidden_2])

    with tf.variable_scope("output"):
        output = layer(hidden_2, [n_hidden_2, 10], [10])

    return output

def loss(output, y):
    xentropy = tf.nn.softmax_cross_entropy_with_logits(logits=output, labels=y)
    loss = tf.reduce_mean(xentropy)
    return loss

def training(cost, global_step):
    tf.summary.scalar("cost", cost)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    train_op = optimizer.minimize(cost, global_step=global_step)
    return train_op

def evaluate(output, y):
    correct_prediction = tf.equal(tf.argmax(output, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    tf.summary.scalar("validation", accuracy)
    return accuracy

if __name__ == '__main__':
    if os.path.exists("tf_logs/"):
        shutil.rmtree("tf_logs/")

    with tf.Graph().as_default():
        with tf.variable_scope("mlp_model"):
            x = tf.placeholder("float", [None, 784]) # mnist data image of shape 28*28=784
            y = tf.placeholder("float", [None, 10]) # 0-9 digits recognition => 10 classes

            output = inference(x)
            cost = loss(output, y)
            global_step = tf.Variable(0, name='global_step', trainable=False)
            train_op = training(cost, global_step)
            eval_op = evaluate(output, y)
            summary_op = tf.summary.merge_all()
            saver = tf.train.Saver()
            sess = tf.Session()
            summary_writer = tf.summary.FileWriter("tf_logs/", graph_def=sess.graph_def)

            init_op = tf.global_variables_initializer()
            sess.run(init_op)
            # saver.restore(sess, "tf_logs/model-checkpoint-66000")

            # Training cycle
            for epoch in range(training_epochs):
                avg_cost = 0.
                total_batch = int(mnist.train.num_examples/batch_size)

                # Loop over all batches
                for i in range(total_batch):
                    minibatch_x, minibatch_y = mnist.train.next_batch(batch_size)
                    # Fit training using batch data
                    sess.run(train_op, feed_dict={x: minibatch_x, y: minibatch_y})
                    # Compute average loss
                    avg_cost += sess.run(cost, feed_dict={x: minibatch_x, y: minibatch_y})/total_batch

                # Display logs per epoch step
                if epoch % display_step == 0:
                    logDEBUG("Epoch: {:04d}, cost={:.9f}".format(epoch+1,avg_cost))
                    accuracy = sess.run(eval_op, feed_dict={x: mnist.validation.images, y: mnist.validation.labels})
                    logDEBUG("Validation Error: %f" % (1 - accuracy))
                    summary_str = sess.run(summary_op, feed_dict={x: minibatch_x, y: minibatch_y})
                    summary_writer.add_summary(summary_str, sess.run(global_step))
                    saver.save(sess, "tf_logs/model-checkpoint", global_step=global_step)

            logDEBUG("Optimization Finished!")
            accuracy = sess.run(eval_op, feed_dict={x: mnist.test.images, y: mnist.test.labels})
            logDEBUG("Test Accuracy: %f" % accuracy)
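To make the architecture in the script above a bit more concrete, here is a small sketch in plain Python (no TensorFlow needed). It recomputes the standard deviation passed to `tf.random_normal_initializer` for each layer, that is sqrt(2 / fan_in), a scheme often called He initialization, and counts the trainable parameters. The names `he_stddev` and `layer_shapes` are introduced only for this illustration; the shapes are the ones passed to `layer` in the script.

```python
import math

def he_stddev(fan_in):
    """Same formula as in layer(): (2.0 / weight_shape[0]) ** 0.5."""
    return math.sqrt(2.0 / fan_in)

# (fan_in, fan_out) for each call to layer() in inference()
layer_shapes = {
    "hidden_1": (784, 256),
    "hidden_2": (256, 256),
    "output": (256, 10),
}

# Standard deviation of the initial weights, per layer
stddevs = {name: he_stddev(fan_in) for name, (fan_in, _) in layer_shapes.items()}

# Each layer holds a fan_in x fan_out weight matrix plus fan_out biases
n_params = sum(fan_in * fan_out + fan_out for fan_in, fan_out in layer_shapes.values())

for name, std in stddevs.items():
    print("{}: stddev = {:.4f}".format(name, std))
print("trainable parameters:", n_params)  # 269322
```

The wider the input of a layer, the smaller its initial weights, which keeps the scale of the activations roughly stable from one layer to the next.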
Observed results
Now strangely enough, after 300 epochs, as suggested in the book Fundamentals of Deep Learning, I only observed a test accuracy of about 86.9%, which is actually lower than the accuracy we got previously. I then tested with 1000 training epochs and got an accuracy of only 86.7%, so it seems something is going wrong here…
Also, there is a deprecation warning from TensorFlow when using the code above:
WARNING:tensorflow:From mnist_multilayer_regression.py:46: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version. Instructions for updating: Future major versions of TensorFlow will allow gradients to flow into the labels input on backprop by default. See `tf.nn.softmax_cross_entropy_with_logits_v2`.
Checking the documentation page, we see that the main difference is the one already reported in the warning above: future major versions of TensorFlow will allow gradients to flow into the labels input on backprop by default. To be honest, I'm not absolutely sure what this implies, but it's probably a good idea to give it a try here anyway.
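As far as that sentence goes, the idea seems to be the following: the loss depends on both the logits and the labels, so it has a gradient with respect to each of them. The original function treats the labels as constants, while the v2 version also lets the gradient with respect to the labels propagate backwards. The sketch below computes both gradients for a single example in plain Python. It is only an illustration of the math, not TensorFlow's implementation, and all helper names are made up for this example.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax of a list of floats."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def softmax_xent(logits, labels):
    """Cross-entropy between the label distribution and softmax(logits)."""
    return -sum(y * lp for y, lp in zip(labels, log_softmax(logits)))

def grad_wrt_logits(logits, labels):
    """d(loss)/d(logits) = softmax(logits) * sum(labels) - labels."""
    total = sum(labels)
    return [math.exp(lp) * total - y for y, lp in zip(labels, log_softmax(logits))]

def grad_wrt_labels(logits):
    """d(loss)/d(labels) = -log_softmax(logits)."""
    return [-lp for lp in log_softmax(logits)]

logits = [2.0, 1.0, 0.1]   # arbitrary example values
labels = [1.0, 0.0, 0.0]   # one-hot label for class 0

loss_value = softmax_xent(logits, labels)
g_logits = grad_wrt_logits(logits, labels)  # used by both versions of the function
g_labels = grad_wrt_labels(logits)          # only propagated by the v2 version

print("loss:", loss_value)
print("gradient w.r.t. logits:", g_logits)
print("gradient w.r.t. labels:", g_labels)
```

The gradient with respect to the logits is the same in both versions; the extra gradient with respect to the labels only matters when the labels themselves are computed from trainable variables.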
⇒ So I replaced the function call tf.nn.softmax_cross_entropy_with_logits(…) with the v2 version:
def loss(output, y):
    xentropy = tf.nn.softmax_cross_entropy_with_logits_v2(logits=output, labels=y)
    loss = tf.reduce_mean(xentropy)
    return loss
And this actually helped to improve the test accuracy significantly: I could then reach an accuracy of 97.9% with 300 training epochs, which is close enough to the 98.2% accuracy announced in the book, so this sounds acceptable now.
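A word of caution on this result: in this script the labels are fed through a placeholder, so nothing trainable sits upstream of them and both versions of the loss function should produce the same weight updates. Part of the gap may therefore come from run-to-run variation, since the weights are randomly initialized. One candidate explanation, which was not verified on the runs above, is that `layer` applies ReLU to the output layer as well: if the pre-activation of the true class is negative, the ReLU blocks the gradient for that output unit. The sketch below shows this effect on a hypothetical 3-class example in plain Python; all names and values are illustrative assumptions.

```python
import math

def relu(values):
    return [max(0.0, v) for v in values]

def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def grad_with_relu(pre, labels):
    """Gradient of softmax cross-entropy applied to relu(pre), w.r.t. pre."""
    probs = softmax(relu(pre))
    # Chain rule: (softmax - labels) times the ReLU derivative (1 if pre > 0, else 0)
    return [(p - y) * (1.0 if v > 0 else 0.0) for p, y, v in zip(probs, labels, pre)]

def grad_without_relu(pre, labels):
    """Gradient of softmax cross-entropy applied directly to pre, w.r.t. pre."""
    return [p - y for p, y in zip(softmax(pre), labels)]

pre = [-1.0, 2.0, -3.0]    # hypothetical pre-activations of the output layer
labels = [1.0, 0.0, 0.0]   # the true class is 0, whose pre-activation is negative

grad = grad_with_relu(pre, labels)
grad_raw = grad_without_relu(pre, labels)

print("with ReLU on the output:   ", grad)
print("without ReLU on the output:", grad_raw)
```

With the ReLU, the entry for the true class is zero, so gradient descent gets no signal to raise that output for this example; without it, the entry is negative and the update pushes the logit up. A common way to avoid this is to return the raw `tf.matmul(input, W) + b` for the output layer and let the softmax cross-entropy function handle the rest.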
Another deprecation warning I noticed here is the following:
WARNING:tensorflow:Passing a `GraphDef` to the SummaryWriter is deprecated. Pass a `Graph` object instead, such as `sess.graph`.
⇒ For that one we can simply replace the following line of code:
# summary_writer = tf.summary.FileWriter("tf_logs/", graph_def=sess.graph_def)
summary_writer = tf.summary.FileWriter("tf_logs/", graph=sess.graph)
And this should conclude our journey on this multilayer network implementation. Next time, we should try the convolutional network implementation on MNIST, expecting a test accuracy of about 99.4%. This next task should prove slightly more complex since we will not have a full code template to start with, but that should be fun anyway!