====== MNIST Multilayer Perceptron network ======
{{tag>deep_learning}}
As mentioned in [[blog:2018:1230_mnist_logistic|my previous post]] on this topic, the next step is to build and train a multilayer perceptron network on the MNIST dataset. This should give us a small improvement over the accuracy previously observed with the simple logistic regression network.
====== ======
/*
Using this github repo as reference:
https://github.com/darksigma/Fundamentals-of-Deep-Learning-Book
*/
===== Building the network =====
So here is the python script I used for this experiment (again mostly a copy of the original script):

<code python>
from nv.deep_learning import MNIST
from nv.core.utils import *
from nv.core.admin import *

root_path = nvGetRootPath()

logDEBUG("Retrieving MNIST dataset...")
mnist = MNIST.read_data_sets(root_path+"/data/MNIST/", one_hot=True)
logDEBUG("Done retrieving MNIST dataset.")

import tensorflow as tf
import shutil, os

# Architecture
n_hidden_1 = 256
n_hidden_2 = 256

# Parameters
learning_rate = 0.01
training_epochs = 300
# training_epochs = 1000
batch_size = 100
display_step = 1

def layer(input, weight_shape, bias_shape):
    weight_init = tf.random_normal_initializer(stddev=(2.0/weight_shape[0])**0.5)
    bias_init = tf.constant_initializer(value=0)
    W = tf.get_variable("W", weight_shape,
                        initializer=weight_init)
    b = tf.get_variable("b", bias_shape,
                        initializer=bias_init)
    return tf.nn.relu(tf.matmul(input, W) + b)

def inference(x):
    with tf.variable_scope("hidden_1"):
        hidden_1 = layer(x, [784, n_hidden_1], [n_hidden_1])
    with tf.variable_scope("hidden_2"):
        hidden_2 = layer(hidden_1, [n_hidden_1, n_hidden_2], [n_hidden_2])
    with tf.variable_scope("output"):
        output = layer(hidden_2, [n_hidden_2, 10], [10])
    return output

def loss(output, y):
    xentropy = tf.nn.softmax_cross_entropy_with_logits(logits=output, labels=y)
    loss = tf.reduce_mean(xentropy)
    return loss

def training(cost, global_step):
    tf.summary.scalar("cost", cost)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    train_op = optimizer.minimize(cost, global_step=global_step)
    return train_op

def evaluate(output, y):
    correct_prediction = tf.equal(tf.argmax(output, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    tf.summary.scalar("validation", accuracy)
    return accuracy

if __name__ == '__main__':
    if os.path.exists("tf_logs/"):
        shutil.rmtree("tf_logs/")

    with tf.Graph().as_default():
        with tf.variable_scope("mlp_model"):
            x = tf.placeholder("float", [None, 784]) # mnist data image of shape 28*28=784
            y = tf.placeholder("float", [None, 10])  # 0-9 digits recognition => 10 classes

            output = inference(x)
            cost = loss(output, y)
            global_step = tf.Variable(0, name='global_step', trainable=False)
            train_op = training(cost, global_step)
            eval_op = evaluate(output, y)
            summary_op = tf.summary.merge_all()
            saver = tf.train.Saver()
            sess = tf.Session()

            summary_writer = tf.summary.FileWriter("tf_logs/",
                                                   graph_def=sess.graph_def)

            init_op = tf.global_variables_initializer()
            sess.run(init_op)

            # saver.restore(sess, "tf_logs/model-checkpoint-66000")

            # Training cycle
            for epoch in range(training_epochs):
                avg_cost = 0.
                total_batch = int(mnist.train.num_examples/batch_size)

                # Loop over all batches
                for i in range(total_batch):
                    minibatch_x, minibatch_y = mnist.train.next_batch(batch_size)
                    # Fit training using batch data
                    sess.run(train_op, feed_dict={x: minibatch_x, y: minibatch_y})
                    # Compute average loss
                    avg_cost += sess.run(cost, feed_dict={x: minibatch_x, y: minibatch_y})/total_batch

                # Display logs per epoch step
                if epoch % display_step == 0:
                    logDEBUG("Epoch: {:04d}, cost={:.9f}".format(epoch+1, avg_cost))

                    accuracy = sess.run(eval_op, feed_dict={x: mnist.validation.images, y: mnist.validation.labels})
                    logDEBUG("Validation Error: %f" % (1 - accuracy))

                    summary_str = sess.run(summary_op, feed_dict={x: minibatch_x, y: minibatch_y})
                    summary_writer.add_summary(summary_str, sess.run(global_step))

                    saver.save(sess, "tf_logs/model-checkpoint", global_step=global_step)

            logDEBUG("Optimization Finished!")

            accuracy = sess.run(eval_op, feed_dict={x: mnist.test.images, y: mnist.test.labels})
            logDEBUG("Test Accuracy: %f" % accuracy)
</code>
In my previous network training experiment I mentioned a "display issue" where all the training output was only shown in my shell after the training had completed. Here, using my own **logDEBUG** function instead of a plain **print()** call **fixed that issue**. In the end this was most probably just a matter of calling **sys.stdout.flush()** after outputting a message (and this is also probably only needed in my custom python setup).
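Just to illustrate the point, a minimal flushing log helper (not the actual implementation from my nv framework, just a hypothetical sketch of the idea) could look like this:

<code python>
import sys

def logDEBUG(msg):
    # Print the message and flush stdout immediately so the output shows up
    # in the shell right away instead of only after the script terminates.
    print(msg)
    sys.stdout.flush()
</code>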
===== Observed results =====
Now strangely enough, after 300 epochs as suggested in the book **Fundamentals of Deep Learning** I only observed a test accuracy of about **86.9%**, which is actually worse than the accuracy we got previously. Then I tested with 1000 training epochs, and again only got an accuracy of about **86.7%**, so it seems there is something going wrong here...
Also, there is a deprecation warning from tensorflow with the code above:

<code>
WARNING:tensorflow:From mnist_multilayer_regression.py:46: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.
See `tf.nn.softmax_cross_entropy_with_logits_v2`.
</code>
Checking the documentation page, we see that the main difference is (as reported above) that //Future major versions of TensorFlow will allow gradients to flow into the labels input on backprop by default//. To be honest I'm not absolutely sure what this implies, but it's probably a good idea to give the v2 function a try here anyway.
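For reference, if I read the documentation correctly, the old behaviour (no gradients flowing back into the labels) can apparently still be reproduced with the v2 function by passing the labels through **tf.stop_gradient**, something along these lines (just a hypothetical variant, not what I used below):

<code python>
def loss_no_label_gradient(output, y):
    # Same as the v2 loss below, but explicitly block gradients from flowing
    # into the labels, mimicking the old v1 behaviour.
    xentropy = tf.nn.softmax_cross_entropy_with_logits_v2(
        logits=output, labels=tf.stop_gradient(y))
    return tf.reduce_mean(xentropy)
</code>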
=> So I replaced the function call **tf.nn.softmax_cross_entropy_with_logits(...)** with the v2 version:

<code python>
def loss(output, y):
    xentropy = tf.nn.softmax_cross_entropy_with_logits_v2(logits=output, labels=y)
    loss = tf.reduce_mean(xentropy)
    return loss
</code>
And this actually improved the test accuracy significantly, as I could then achieve an accuracy of **97.9%** with 300 training epochs (which is close enough to the 98.2% reported in the book I would say, so this sounds acceptable now).
Another deprecation warning I noticed here is the following:

<code>
WARNING:tensorflow:Passing a `GraphDef` to the SummaryWriter is deprecated. Pass a `Graph` object instead, such as `sess.graph`.
</code>
=> For that one we could simply replace the **FileWriter** construction line:

<code python>
# summary_writer = tf.summary.FileWriter("tf_logs/", graph_def=sess.graph_def)
summary_writer = tf.summary.FileWriter("tf_logs/", graph=sess.graph)
</code>
And this concludes our journey on this multilayer network implementation. Next time, we should try a convolutional network implementation on MNIST, expecting a test accuracy of about 99.4%. That task should prove slightly more complex since we will not have a full code template to start with, but it should be fun anyway :-)!