MNIST Multilayer Perceptron network

As mentioned in my previous post on this topic, we should now try to build and train a multilayer perceptron network on the MNIST dataset. This should give us a small improvement in accuracy over what we previously observed with a simple logistic regression network.

So here is the Python script I used for this experiment (again mostly a copy of the original script):

from nv.deep_learning import MNIST
from nv.core.utils import *
from nv.core.admin import *

root_path = nvGetRootPath()
logDEBUG("Retrieving MNIST dataset...")
mnist = MNIST.read_data_sets(root_path+"/data/MNIST/", one_hot=True)
logDEBUG("Done retrieving MNIST dataset.")

import tensorflow as tf
import shutil, os

# Architecture
n_hidden_1 = 256
n_hidden_2 = 256

# Parameters
learning_rate = 0.01
training_epochs = 300
# training_epochs = 1000
batch_size = 100
display_step = 1

def layer(input, weight_shape, bias_shape):
    weight_init = tf.random_normal_initializer(stddev=(2.0/weight_shape[0])**0.5)
    bias_init = tf.constant_initializer(value=0)
    W = tf.get_variable("W", weight_shape,
                        initializer=weight_init)
    b = tf.get_variable("b", bias_shape,
                        initializer=bias_init)
    return tf.nn.relu(tf.matmul(input, W) + b)

def inference(x):
    with tf.variable_scope("hidden_1"):
        hidden_1 = layer(x, [784, n_hidden_1], [n_hidden_1])
     
    with tf.variable_scope("hidden_2"):
        hidden_2 = layer(hidden_1, [n_hidden_1, n_hidden_2], [n_hidden_2])
     
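    with tf.variable_scope("output"):
        # Note: layer() applies ReLU here too, so the logits passed to the
        # softmax cross-entropy loss are always non-negative.
        output = layer(hidden_2, [n_hidden_2, 10], [10])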

    return output

def loss(output, y):
    xentropy = tf.nn.softmax_cross_entropy_with_logits(logits=output, labels=y)    
    loss = tf.reduce_mean(xentropy)
    return loss

def training(cost, global_step):
    tf.summary.scalar("cost", cost)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    train_op = optimizer.minimize(cost, global_step=global_step)
    return train_op


def evaluate(output, y):
    correct_prediction = tf.equal(tf.argmax(output, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    tf.summary.scalar("validation", accuracy)
    return accuracy

if __name__ == '__main__':
    
    if os.path.exists("tf_logs/"):
        shutil.rmtree("tf_logs/")

    with tf.Graph().as_default():

        with tf.variable_scope("mlp_model"):

            x = tf.placeholder("float", [None, 784]) # mnist data image of shape 28*28=784
            y = tf.placeholder("float", [None, 10]) # 0-9 digits recognition => 10 classes


            output = inference(x)
            cost = loss(output, y)

            global_step = tf.Variable(0, name='global_step', trainable=False)

            train_op = training(cost, global_step)
            eval_op = evaluate(output, y)
            summary_op = tf.summary.merge_all()

            saver = tf.train.Saver()
            sess = tf.Session()

            summary_writer = tf.summary.FileWriter("tf_logs/",
                                                graph_def=sess.graph_def)

            
            init_op = tf.global_variables_initializer()
            sess.run(init_op)

            # saver.restore(sess, "tf_logs/model-checkpoint-66000")

            # Training cycle
            for epoch in range(training_epochs):

                avg_cost = 0.
                total_batch = int(mnist.train.num_examples/batch_size)
                
                # Loop over all batches
                for i in range(total_batch):
                    minibatch_x, minibatch_y = mnist.train.next_batch(batch_size)
                    # Fit training using batch data
                    sess.run(train_op, feed_dict={x: minibatch_x, y: minibatch_y})
                    # Compute average loss
                    avg_cost += sess.run(cost, feed_dict={x: minibatch_x, y: minibatch_y})/total_batch
                
                # Display logs per epoch step
                if epoch % display_step == 0:
                    logDEBUG("Epoch: {:04d}, cost={:.9f}".format(epoch+1,avg_cost))

                    accuracy = sess.run(eval_op, feed_dict={x: mnist.validation.images, y: mnist.validation.labels})

                    logDEBUG("Validation Error: %f" % (1 - accuracy))

                    summary_str = sess.run(summary_op, feed_dict={x: minibatch_x, y: minibatch_y})
                    summary_writer.add_summary(summary_str, sess.run(global_step))

                    saver.save(sess, "tf_logs/model-checkpoint", global_step=global_step)


            logDEBUG("Optimization Finished!")

            accuracy = sess.run(eval_op, feed_dict={x: mnist.test.images, y: mnist.test.labels})

            logDEBUG("Test Accuracy: %f" % accuracy)
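
To get a feel for the size of this network, here is a small sketch that recomputes a few numbers implied by the script above: the number of trainable parameters, the initialization standard deviation used in layer() for each fan-in, and the number of optimizer steps for 300 epochs. The helper names are mine, and the figure of 55000 training examples is an assumption (it is the default train split of the TensorFlow MNIST loader; the nv.deep_learning loader used above may differ).

```python
import math

# Layer shapes taken from the script: (fan_in, fan_out) for each dense layer.
layer_shapes = [(784, 256), (256, 256), (256, 10)]


def count_parameters(shapes):
    """Weights (fan_in * fan_out) plus one bias per output unit."""
    return sum(fan_in * fan_out + fan_out for fan_in, fan_out in shapes)


def init_stddev(fan_in):
    """Standard deviation used by layer(): sqrt(2 / fan_in)."""
    return math.sqrt(2.0 / fan_in)


def total_steps(num_examples, batch_size, epochs):
    """Number of train_op runs: full batches per epoch times epochs."""
    return (num_examples // batch_size) * epochs


# ASSUMPTION: 55000 training examples (not stated in the post).
ASSUMED_TRAIN_EXAMPLES = 55000

print("trainable parameters:", count_parameters(layer_shapes))
for fan_in, _ in layer_shapes:
    print("fan_in=%d -> stddev=%.4f" % (fan_in, init_stddev(fan_in)))
print("optimizer steps:", total_steps(ASSUMED_TRAIN_EXAMPLES, 100, 300))
```

Under that assumption each epoch runs 550 batches, so the step counter attached to each checkpoint file name grows by 550 per epoch.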

In my previous network training experiment I mentioned a “display issue” where all the training output was only shown in my shell after the training had completed. Here, using my own “logDEBUG” function instead of a simple “print()” fixed that issue. In the end this was most probably just a matter of calling sys.stdout.flush() after outputting a message (and this is probably only needed in my custom Python setup).

Now strangely enough, after 300 epochs (as suggested in the book Fundamentals of Deep Learning), I only observed a test accuracy of about 86.9%, which is actually worse than the accuracy we got previously. I then tested with 1000 training epochs and still only got 86.7%, so it seems something is going wrong here…

Also, TensorFlow emits a deprecation warning with the code above:

WARNING:tensorflow:From mnist_multilayer_regression.py:46: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See `tf.nn.softmax_cross_entropy_with_logits_v2`.

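Checking the documentation page, we see that the main difference (as reported above) is that the v2 version lets gradients flow into the labels input during backpropagation, whereas the original version treats the labels as constants. As far as I understand, this only matters when the labels are themselves computed by a trainable part of the graph; with labels fed from a placeholder, as is the case here, it should not change the training. Still, it's probably a good idea to give it a try here anyway.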
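
To make the "gradients into the labels" idea a bit more concrete, here is a small pure-Python sketch of the softmax cross-entropy for a single example, together with its gradient with respect to the logits and with respect to the labels. This is only an illustration of the math, not TensorFlow's implementation, and the function names and sample values are mine.

```python
import math


def log_softmax(logits):
    """Numerically stable log(softmax(logits))."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_sum for v in logits]


def softmax_cross_entropy(logits, labels):
    """Loss for one example: -sum_i labels[i] * log_softmax(logits)[i]."""
    return -sum(y * s for y, s in zip(labels, log_softmax(logits)))


def grad_wrt_logits(logits, labels):
    """d(loss)/d(logits); reduces to softmax - labels when labels sum to 1."""
    probs = [math.exp(s) for s in log_softmax(logits)]
    total = sum(labels)
    return [total * p - y for p, y in zip(probs, labels)]


def grad_wrt_labels(logits):
    """d(loss)/d(labels): the loss is linear in the labels."""
    return [-s for s in log_softmax(logits)]


logits = [2.0, 1.0, 0.1]   # illustrative values only
labels = [1.0, 0.0, 0.0]   # one-hot label for class 0
print("loss:", softmax_cross_entropy(logits, labels))
print("gradient w.r.t. logits:", grad_wrt_logits(logits, labels))
print("gradient w.r.t. labels:", grad_wrt_labels(logits))
```

The last gradient is the one that only matters if the labels depend on trainable variables; when they are plain input data there is nothing for it to update.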

⇒ So I replaced the function call tf.nn.softmax_cross_entropy_with_logits(…) with the v2 version:

def loss(output, y):
    xentropy = tf.nn.softmax_cross_entropy_with_logits_v2(logits=output, labels=y)    
    loss = tf.reduce_mean(xentropy)
    return loss

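With this change the test accuracy improved significantly: I reached 97.9% with 300 training epochs, which is close enough to the 98.2% reported in the book to be acceptable, I would say. That said, since the labels here are fed through a placeholder, there is nothing on the labels side for gradients to flow into, so I would not expect the v2 function alone to explain such a gap; run-to-run variation (for instance from the random weight initialization) may well play a part too.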

Another deprecation warning I noticed here is the following:

WARNING:tensorflow:Passing a `GraphDef` to the SummaryWriter is deprecated. Pass a `Graph` object instead, such as `sess.graph`.

⇒ For that one we can simply replace the graph_def argument with graph:

# summary_writer = tf.summary.FileWriter("tf_logs/", graph_def=sess.graph_def)
summary_writer = tf.summary.FileWriter("tf_logs/", graph=sess.graph)

And this should conclude our journey on this multilayer network implementation. Next time, we should try the convolutional network implementation on MNIST, expecting a test accuracy of about 99.4%. This next task should prove slightly more complex since we will not have a full code template to start with, but that should be fun anyway!
