====== MNIST Multilayer Perceptron network ======

{{tag>deep_learning}}

As mentioned in [[blog:2018:1230_mnist_logistic|my previous post]] on this topic, the next step is to build and train a multilayer perceptron network on the MNIST dataset. This should give us a small improvement over the accuracy we previously observed with the simple logistic regression network.

====== ======

/* Using this github repo as reference: https://github.com/darksigma/Fundamentals-of-Deep-Learning-Book */

===== Building the network =====

So here is the python script I used for this experiment (again mostly a copy of the original script):

<code python>
from nv.deep_learning import MNIST
from nv.core.utils import *
from nv.core.admin import *

root_path = nvGetRootPath()

logDEBUG("Retrieving MNIST dataset...")
mnist = MNIST.read_data_sets(root_path+"/data/MNIST/", one_hot=True)
logDEBUG("Done retrieving MNIST dataset.")

import tensorflow as tf
import shutil, os

# Architecture
n_hidden_1 = 256
n_hidden_2 = 256

# Parameters
learning_rate = 0.01
training_epochs = 300
# training_epochs = 1000
batch_size = 100
display_step = 1

def layer(input, weight_shape, bias_shape):
    weight_init = tf.random_normal_initializer(stddev=(2.0/weight_shape[0])**0.5)
    bias_init = tf.constant_initializer(value=0)
    W = tf.get_variable("W", weight_shape, initializer=weight_init)
    b = tf.get_variable("b", bias_shape, initializer=bias_init)
    return tf.nn.relu(tf.matmul(input, W) + b)

def inference(x):
    with tf.variable_scope("hidden_1"):
        hidden_1 = layer(x, [784, n_hidden_1], [n_hidden_1])
    with tf.variable_scope("hidden_2"):
        hidden_2 = layer(hidden_1, [n_hidden_1, n_hidden_2], [n_hidden_2])
    with tf.variable_scope("output"):
        output = layer(hidden_2, [n_hidden_2, 10], [10])
    return output

def loss(output, y):
    xentropy = tf.nn.softmax_cross_entropy_with_logits(logits=output, labels=y)
    loss = tf.reduce_mean(xentropy)
    return loss

def training(cost, global_step):
    tf.summary.scalar("cost", cost)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    train_op = optimizer.minimize(cost, global_step=global_step)
    return train_op

def evaluate(output, y):
    correct_prediction = tf.equal(tf.argmax(output, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    tf.summary.scalar("validation", accuracy)
    return accuracy

if __name__ == '__main__':

    if os.path.exists("tf_logs/"):
        shutil.rmtree("tf_logs/")

    with tf.Graph().as_default():

        with tf.variable_scope("mlp_model"):

            x = tf.placeholder("float", [None, 784])  # mnist data image of shape 28*28=784
            y = tf.placeholder("float", [None, 10])   # 0-9 digits recognition => 10 classes

            output = inference(x)

            cost = loss(output, y)

            global_step = tf.Variable(0, name='global_step', trainable=False)

            train_op = training(cost, global_step)

            eval_op = evaluate(output, y)

            summary_op = tf.summary.merge_all()

            saver = tf.train.Saver()

            sess = tf.Session()

            summary_writer = tf.summary.FileWriter("tf_logs/", graph_def=sess.graph_def)

            init_op = tf.global_variables_initializer()

            sess.run(init_op)

            # saver.restore(sess, "tf_logs/model-checkpoint-66000")

            # Training cycle
            for epoch in range(training_epochs):

                avg_cost = 0.
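                # Note: each epoch below is one full pass over the training set,
                # processed in mini-batches of `batch_size` images; `avg_cost`
                # accumulates the mean cross-entropy over those mini-batches.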
                total_batch = int(mnist.train.num_examples/batch_size)
                # Loop over all batches
                for i in range(total_batch):
                    minibatch_x, minibatch_y = mnist.train.next_batch(batch_size)
                    # Fit training using batch data
                    sess.run(train_op, feed_dict={x: minibatch_x, y: minibatch_y})
                    # Compute average loss
                    avg_cost += sess.run(cost, feed_dict={x: minibatch_x, y: minibatch_y})/total_batch

                # Display logs per epoch step
                if epoch % display_step == 0:
                    logDEBUG("Epoch: {:04d}, cost={:.9f}".format(epoch+1, avg_cost))

                    accuracy = sess.run(eval_op, feed_dict={x: mnist.validation.images, y: mnist.validation.labels})
                    logDEBUG("Validation Error: %f" % (1 - accuracy))

                    summary_str = sess.run(summary_op, feed_dict={x: minibatch_x, y: minibatch_y})
                    summary_writer.add_summary(summary_str, sess.run(global_step))

                    saver.save(sess, "tf_logs/model-checkpoint", global_step=global_step)

            logDEBUG("Optimization Finished!")

            accuracy = sess.run(eval_op, feed_dict={x: mnist.test.images, y: mnist.test.labels})
            logDEBUG("Test Accuracy: %f" % accuracy)
</code>

In my previous network training experiment I mentioned a "display issue" where all the training output was only shown in my shell after the training had completed. Here, using my own "logDEBUG()" function instead of a plain "print()" **fixed that issue**. In the end this was most probably just a matter of calling **sys.stdout.flush()** after outputting a message (and this is also probably only needed in my custom python setup).

===== Observed results =====

Now strangely enough, after 300 epochs as suggested in the book **Fundamentals of Deep Learning**, I only observed a test accuracy of about **86.9%**, which is actually worse than the accuracy we got previously with the simple logistic regression network. I then tested with 1000 training epochs and still only got an accuracy of **86.7%**, so it seems there is something going wrong here...

Also, there is a deprecation warning from tensorflow with the code above:

<code>
WARNING:tensorflow:From mnist_multilayer_regression.py:46: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See `tf.nn.softmax_cross_entropy_with_logits_v2`.
</code>

Checking the documentation page, we see that the main difference is (as reported above) that //Future major versions of TensorFlow will allow gradients to flow into the labels input on backprop by default//. To be honest I'm not absolutely sure what this implies, but it's probably a good idea to give it a try here anyway.

=> So I replaced the function call **tf.nn.softmax_cross_entropy_with_logits(...)** with the v2 version:

<code python>
def loss(output, y):
    xentropy = tf.nn.softmax_cross_entropy_with_logits_v2(logits=output, labels=y)
    loss = tf.reduce_mean(xentropy)
    return loss
</code>

And this actually improved the test accuracy significantly: I could then reach **97.9%** with 300 training epochs, which is close enough to the **98.2%** reported in the book, so this now sounds acceptable.
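For completeness, here is what this v2 change means as far as I understand the documentation: backpropagation can now also flow into the **labels** tensor, not only into the logits. The documented way to explicitly disable that again (only relevant if the labels were themselves produced by trainable variables, e.g. soft or learned labels) is to pass them through **tf.stop_gradient** first. A quick sketch, not something the script above actually needs:

<code python>
def loss(output, y):
    # Same cross-entropy as above, but explicitly block gradients from
    # flowing back into the labels tensor.
    xentropy = tf.nn.softmax_cross_entropy_with_logits_v2(logits=output, labels=tf.stop_gradient(y))
    loss = tf.reduce_mean(xentropy)
    return loss
</code>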
Another deprecation warning I noticed here is the following:

<code>
WARNING:tensorflow:Passing a `GraphDef` to the SummaryWriter is deprecated. Pass a `Graph` object instead, such as `sess.graph`.
</code>

=> For that one we could simply replace the code line:

<code python>
# summary_writer = tf.summary.FileWriter("tf_logs/", graph_def=sess.graph_def)
summary_writer = tf.summary.FileWriter("tf_logs/", graph=sess.graph)
</code>

And this should conclude our journey on this multilayer network implementation. Next time we should try the convolutional network implementation on MNIST, expecting a test accuracy of about **99.4%**. That next task should prove slightly more complex since we will not have a full code template to start with, but it should be fun anyway :-)!
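As a small side note for later: since the script saves a checkpoint under **tf_logs/** at every display step, the latest one can be reloaded afterwards to re-run the test evaluation. Here is a rough sketch of that, assuming it is appended at the bottom of the script above (so that **sess**, **saver**, **eval_op**, **x**, **y** and **mnist** are still defined):

<code python>
# Sketch: reload the most recent checkpoint written by saver.save() above
# and re-compute the test accuracy from it.
ckpt = tf.train.latest_checkpoint("tf_logs/")
if ckpt is not None:
    saver.restore(sess, ckpt)
    accuracy = sess.run(eval_op, feed_dict={x: mnist.test.images, y: mnist.test.labels})
    logDEBUG("Restored %s, Test Accuracy: %f" % (ckpt, accuracy))
</code>

And the "cost" and "validation" summaries written in the same folder can be displayed with TensorBoard by pointing it at that directory (**tensorboard --logdir tf_logs/**).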