====== MNIST Multilayer Perceptron network ======
{{tag>deep_learning}}
As mentioned in [[blog:2018:1230_mnist_logistic|my previous post]] on this topic, the next step is to build and train a multilayer perceptron network on the MNIST dataset. This should give us a small improvement over the accuracy previously observed with the simple logistic regression network.
====== ======
/*
Using this github repo as reference:
https://github.com/darksigma/Fundamentals-of-Deep-Learning-Book
*/
===== Building the network =====
So here is the python script I used for this experiment (again mostly a copy of the original script):

<code python>
from nv.deep_learning import MNIST
from nv.core.utils import *
from nv.core.admin import *

root_path = nvGetRootPath()

logDEBUG("Retrieving MNIST dataset...")
mnist = MNIST.read_data_sets(root_path+"/data/MNIST/", one_hot=True)
logDEBUG("Done retrieving MNIST dataset.")

import tensorflow as tf
import shutil, os

# Architecture
n_hidden_1 = 256
n_hidden_2 = 256

# Parameters
learning_rate = 0.01
training_epochs = 300
# training_epochs = 1000
batch_size = 100
display_step = 1

def layer(input, weight_shape, bias_shape):
    weight_init = tf.random_normal_initializer(stddev=(2.0/weight_shape[0])**0.5)
    bias_init = tf.constant_initializer(value=0)
    W = tf.get_variable("W", weight_shape,
                        initializer=weight_init)
    b = tf.get_variable("b", bias_shape,
                        initializer=bias_init)
    return tf.nn.relu(tf.matmul(input, W) + b)

def inference(x):
    with tf.variable_scope("hidden_1"):
        hidden_1 = layer(x, [784, n_hidden_1], [n_hidden_1])
    with tf.variable_scope("hidden_2"):
        hidden_2 = layer(hidden_1, [n_hidden_1, n_hidden_2], [n_hidden_2])
    with tf.variable_scope("output"):
        output = layer(hidden_2, [n_hidden_2, 10], [10])
    return output

def loss(output, y):
    xentropy = tf.nn.softmax_cross_entropy_with_logits(logits=output, labels=y)
    loss = tf.reduce_mean(xentropy)
    return loss

def training(cost, global_step):
    tf.summary.scalar("cost", cost)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    train_op = optimizer.minimize(cost, global_step=global_step)
    return train_op

def evaluate(output, y):
    correct_prediction = tf.equal(tf.argmax(output, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    tf.summary.scalar("validation", accuracy)
    return accuracy

if __name__ == '__main__':
    if os.path.exists("tf_logs/"):
        shutil.rmtree("tf_logs/")

    with tf.Graph().as_default():
        with tf.variable_scope("mlp_model"):
            x = tf.placeholder("float", [None, 784]) # mnist data image of shape 28*28=784
            y = tf.placeholder("float", [None, 10])  # 0-9 digits recognition => 10 classes

            output = inference(x)
            cost = loss(output, y)
            global_step = tf.Variable(0, name='global_step', trainable=False)
            train_op = training(cost, global_step)
            eval_op = evaluate(output, y)
            summary_op = tf.summary.merge_all()
            saver = tf.train.Saver()
            sess = tf.Session()

            summary_writer = tf.summary.FileWriter("tf_logs/",
                                                   graph_def=sess.graph_def)

            init_op = tf.global_variables_initializer()
            sess.run(init_op)

            # saver.restore(sess, "tf_logs/model-checkpoint-66000")

            # Training cycle
            for epoch in range(training_epochs):
                avg_cost = 0.
                total_batch = int(mnist.train.num_examples/batch_size)

                # Loop over all batches
                for i in range(total_batch):
                    minibatch_x, minibatch_y = mnist.train.next_batch(batch_size)
                    # Fit training using batch data
                    sess.run(train_op, feed_dict={x: minibatch_x, y: minibatch_y})
                    # Compute average loss
                    avg_cost += sess.run(cost, feed_dict={x: minibatch_x, y: minibatch_y})/total_batch

                # Display logs per epoch step
                if epoch % display_step == 0:
                    logDEBUG("Epoch: {:04d}, cost={:.9f}".format(epoch+1, avg_cost))

                    accuracy = sess.run(eval_op, feed_dict={x: mnist.validation.images, y: mnist.validation.labels})
                    logDEBUG("Validation Error: %f" % (1 - accuracy))

                    summary_str = sess.run(summary_op, feed_dict={x: minibatch_x, y: minibatch_y})
                    summary_writer.add_summary(summary_str, sess.run(global_step))

                    saver.save(sess, "tf_logs/model-checkpoint", global_step=global_step)

            logDEBUG("Optimization Finished!")

            accuracy = sess.run(eval_op, feed_dict={x: mnist.test.images, y: mnist.test.labels})
            logDEBUG("Test Accuracy: %f" % accuracy)
</code>
In my previous network training experiment I mentioned a "display issue" where all the training output was only shown in my shell after the training had completed. Here, using my own **logDEBUG** function instead of a plain **print()** call **fixed that issue**. In the end this was most probably just a matter of calling **sys.stdout.flush()** after outputting a message (and this is also probably only needed in my custom python setup).
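Just to illustrate the point, a minimal flushing log helper (not the actual implementation from my nv framework, just a hypothetical sketch of the idea) could look like this:

<code python>
import sys

def logDEBUG(msg):
    # Print the message and flush stdout immediately so the output shows up
    # in the shell right away instead of only after the script terminates.
    print(msg)
    sys.stdout.flush()
</code>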
===== Observed results =====
Now strangely enough, after 300 epochs as suggested in the book **Fundamentals of Deep Learning** I only observed a test accuracy of about **86.9%**, which is actually worse than the accuracy we got previously. Then I tested with 1000 training epochs, and again only got an accuracy of about **86.7%**, so it seems there is something going wrong here...
Also, there is a deprecation warning from tensorflow with the code above:

<code>
WARNING:tensorflow:From mnist_multilayer_regression.py:46: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.
See `tf.nn.softmax_cross_entropy_with_logits_v2`.
</code>
Checking the documentation page, we see that the main difference is (as reported above) that //Future major versions of TensorFlow will allow gradients to flow into the labels input on backprop by default//. To be honest I'm not absolutely sure what this implies, but it's probably a good idea to give the v2 function a try here anyway.
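For reference, if I read the documentation correctly, the old behaviour (no gradients flowing back into the labels) can apparently still be reproduced with the v2 function by passing the labels through **tf.stop_gradient**, something along these lines (just a hypothetical variant, not what I used below):

<code python>
def loss_no_label_gradient(output, y):
    # Same as the v2 loss below, but explicitly block gradients from flowing
    # into the labels, mimicking the old v1 behaviour.
    xentropy = tf.nn.softmax_cross_entropy_with_logits_v2(
        logits=output, labels=tf.stop_gradient(y))
    return tf.reduce_mean(xentropy)
</code>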
=> So I replaced the function call **tf.nn.softmax_cross_entropy_with_logits(...)** with the v2 version:

<code python>
def loss(output, y):
    xentropy = tf.nn.softmax_cross_entropy_with_logits_v2(logits=output, labels=y)
    loss = tf.reduce_mean(xentropy)
    return loss
</code>
And this actually improved the test accuracy significantly, as I could then achieve an accuracy of **97.9%** with 300 training epochs (which is close enough to the 98.2% reported in the book I would say, so this sounds acceptable now).
Another deprecation warning I noticed here is the following:

<code>
WARNING:tensorflow:Passing a `GraphDef` to the SummaryWriter is deprecated. Pass a `Graph` object instead, such as `sess.graph`.
</code>
=> For that one we could simply replace the **FileWriter** construction line:

<code python>
# summary_writer = tf.summary.FileWriter("tf_logs/", graph_def=sess.graph_def)
summary_writer = tf.summary.FileWriter("tf_logs/", graph=sess.graph)
</code>
And this concludes our journey on this multilayer network implementation. Next time, we should try a convolutional network implementation on MNIST, expecting a test accuracy of about 99.4%. That task should prove slightly more complex since we will not have a full code template to start with, but it should be fun anyway :-)!