Simple MNIST Logistic regression

In this post we should (finally) be able to perform logistic regression on the MNIST dataset, now that I have a properly set up TensorFlow installation.

Retrieving the MNIST dataset

We can retrieve the MNIST dataset from the Yann LeCun website using this script.

On my side, I tweaked the download process slightly, simply to use a more consistent download location, so I use something like this:

from nv.deep_learning import MNIST
from nv.core.utils import *
from nv.core.admin import *

root_path = nvGetRootPath()
logDEBUG("Retrieving MNIST dataset...")
mnist = MNIST.read_data_sets(root_path+"/data/MNIST/", one_hot=True)
logDEBUG("Done retrieving MNIST dataset.")
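
For reference, the `one_hot=True` flag asks the loader to return each label as a 10-element vector instead of a single digit. Assuming the `nv.deep_learning.MNIST` wrapper behaves like the TensorFlow tutorial loader it is derived from, the encoding can be sketched in plain Python as follows (`one_hot` is a hypothetical helper written for illustration, not part of the loader):

```python
def one_hot(label, num_classes=10):
    """Return a list with 1.0 at index `label` and 0.0 everywhere else."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range: %r" % (label,))
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

# The digit 3 becomes a vector whose fourth entry is set.
encoded = one_hot(3)
print(encoded)
```

This is the label format that the `y` placeholder of shape `[None, 10]` expects in the script further down.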

Building logistic network

Again, we use this script as a template to build our initial logistic network.

In the end I didn't change much in that implementation, basically only removing a few unused imports, so the script I used for my test was:

from nv.deep_learning import MNIST
from nv.core.utils import *
from nv.core.admin import *

root_path = nvGetRootPath()
logDEBUG("Retrieving MNIST dataset...")
mnist = MNIST.read_data_sets(root_path+"/data/MNIST/", one_hot=True)
logDEBUG("Done retrieving MNIST dataset.")

import tensorflow as tf
import shutil, os

# Parameters
learning_rate = 0.01
training_epochs = 60
batch_size = 100
display_step = 1

def inference(x):
    init = tf.constant_initializer(value=0)
    W = tf.get_variable("W", [784, 10], initializer=init)
    b = tf.get_variable("b", [10], initializer=init)
    output = tf.nn.softmax(tf.matmul(x, W) + b)

    w_hist = tf.summary.histogram("weights", W)
    b_hist = tf.summary.histogram("biases", b)
    y_hist = tf.summary.histogram("output", output)

    return output

def loss(output, y):
    dot_product = y * tf.log(output)

    # Reduction along axis 0 collapses each column into a single
    # value, whereas reduction along axis 1 collapses each row 
    # into a single value. In general, reduction along axis i 
    # collapses the ith dimension of a tensor to size 1.
    xentropy = -tf.reduce_sum(dot_product, axis=1)
     
    loss = tf.reduce_mean(xentropy)

    return loss

def training(cost, global_step):

    tf.summary.scalar("cost", cost)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    train_op = optimizer.minimize(cost, global_step=global_step)

    return train_op


def evaluate(output, y):
    correct_prediction = tf.equal(tf.argmax(output, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    tf.summary.scalar("validation error", (1.0 - accuracy))

    return accuracy


if __name__ == '__main__':
    if os.path.exists("logistic_logs/"):
        shutil.rmtree("logistic_logs/")

    with tf.Graph().as_default():

        x = tf.placeholder("float", [None, 784]) # mnist data image of shape 28*28=784
        y = tf.placeholder("float", [None, 10]) # 0-9 digits recognition => 10 classes

        output = inference(x)
        cost = loss(output, y)
        
        global_step = tf.Variable(0, name='global_step', trainable=False)
        
        train_op = training(cost, global_step)
        eval_op = evaluate(output, y)
        summary_op = tf.summary.merge_all()
        
        saver = tf.train.Saver()
        sess = tf.Session()

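        # Note: passing graph_def is deprecated (see the warning in the output below);
        # the recommended form is tf.summary.FileWriter("logistic_logs/", graph=sess.graph).
        summary_writer = tf.summary.FileWriter("logistic_logs/", graph_def=sess.graph_def)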

        init_op = tf.global_variables_initializer()

        sess.run(init_op)

        # Training cycle
        for epoch in range(training_epochs):

            avg_cost = 0.
            total_batch = int(mnist.train.num_examples/batch_size)
            # Loop over all batches
            for i in range(total_batch):
                minibatch_x, minibatch_y = mnist.train.next_batch(batch_size)
                # Fit training using batch data
                sess.run(train_op, feed_dict={x: minibatch_x, y: minibatch_y})
                # Compute average loss
                avg_cost += sess.run(cost, feed_dict={x: minibatch_x, y: minibatch_y})/total_batch
            
            # Display logs per epoch step
            if epoch % display_step == 0:
                print("Epoch:", '%04d' % (epoch+1), "cost =", "{:.9f}".format(avg_cost))

                accuracy = sess.run(eval_op, feed_dict={x: mnist.validation.images, y: mnist.validation.labels})

                print("Validation Error:", (1 - accuracy))

                summary_str = sess.run(summary_op, feed_dict={x: minibatch_x, y: minibatch_y})
                summary_writer.add_summary(summary_str, sess.run(global_step))

                saver.save(sess, "logistic_logs/model-checkpoint", global_step=global_step)


        print("Optimization Finished!")


        accuracy = sess.run(eval_op, feed_dict={x: mnist.test.images, y: mnist.test.labels})

        print("Test Accuracy:", accuracy)
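
To make the `inference` and `loss` functions above more concrete, here is a minimal plain-Python sketch of the same computation: a softmax over the class axis, then a cross-entropy that sums over the classes (axis 1) and averages over the batch (axis 0). The helper names and the tiny 4-sample batch are assumptions made for illustration only. Unlike the script, the sketch skips the zero entries of each label instead of multiplying them by the logarithm, which gives the same value whenever all outputs are strictly positive.

```python
import math

def softmax(logits):
    """Softmax of one row of logits, shifted by the max for stability."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_loss(outputs, labels):
    """Mean over the batch of -sum_k y_k * log(p_k)."""
    per_sample = []
    for probs, y in zip(outputs, labels):
        # Sum over the class axis (axis=1 in the script).
        per_sample.append(-sum(t * math.log(p) for p, t in zip(probs, y) if t > 0))
    # Mean over the batch axis (axis=0 in the script).
    return sum(per_sample) / len(per_sample)

# With W and b initialised to zero, every logit is 0 ...
zero_logits = [[0.0] * 10 for _ in range(4)]
outputs = [softmax(row) for row in zero_logits]
# ... so each class gets probability 0.1, whatever the label is.
labels = [[1.0 if k == d else 0.0 for k in range(10)] for d in (3, 1, 4, 1)]
initial_loss = cross_entropy_loss(outputs, labels)
print(initial_loss, math.log(10))
```

Because the weights start at zero, the loss before any update is ln(10), roughly 2.30; the first-epoch cost of about 1.17 in the log below is lower since it is averaged over a full epoch of updates. Note also that the script takes `tf.log` of the softmax output directly, which would yield infinities if an output ever underflowed to exactly 0; TensorFlow 1.x offers `tf.nn.softmax_cross_entropy_with_logits_v2` as a fused alternative, should that become a problem.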

Observed results

As expected, we get a test accuracy of about 92% with this network. The output I got was:

$ nv_call_python mnist_logistic_regression.py
2018-12-30T13:23:14.077830 [DEBUG] Retrieving MNIST dataset...
Extracting D:/Projects/NervSeed/data/MNIST/train-images-idx3-ubyte.gz
Extracting D:/Projects/NervSeed/data/MNIST/train-labels-idx1-ubyte.gz
Extracting D:/Projects/NervSeed/data/MNIST/t10k-images-idx3-ubyte.gz
Extracting D:/Projects/NervSeed/data/MNIST/t10k-labels-idx1-ubyte.gz
2018-12-30T13:23:14.583517 [DEBUG] Done retrieving MNIST dataset.
2018-12-30 13:23:15.007235: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:01:00.0
totalMemory: 8.00GiB freeMemory: 6.60GiB
2018-12-30 13:23:15.007934: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0

2018-12-30 13:27:33.976381: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-30 13:27:33.976805: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0
2018-12-30 13:27:33.977068: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N
2018-12-30 13:27:33.977682: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6363 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
WARNING:tensorflow:Passing a `GraphDef` to the SummaryWriter is deprecated. Pass a `Graph` object instead, such as `sess.graph`.
Epoch: 0001 cost = 1.174406669
Validation Error: 0.15079998970031738

(... more epochs here...)

Epoch: 0058 cost = 0.298283045
Validation Error: 0.07740002870559692
Epoch: 0059 cost = 0.297683232
Validation Error: 0.07740002870559692
Epoch: 0060 cost = 0.297116048
Validation Error: 0.07700002193450928
Optimization Finished!
Test Accuracy: 0.9201
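
As a reminder of what these numbers mean: the `evaluate` function counts a prediction as correct when the index of the largest output matches the index of the 1 in the one-hot label, and the validation error is simply one minus that fraction. A minimal sketch in plain Python, using a made-up 3-class batch (the helper names and the data are hypothetical):

```python
def argmax(values):
    """Index of the largest entry (the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(outputs, labels):
    """Fraction of rows whose predicted class equals the labelled class."""
    hits = [argmax(p) == argmax(y) for p, y in zip(outputs, labels)]
    return sum(hits) / len(hits)

# Three hypothetical 3-class predictions, two of which are correct.
outputs = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.5, 0.4, 0.1]]
labels = [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
acc = accuracy(outputs, labels)
print("accuracy:", acc, "error:", 1 - acc)
```
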

Concerning the TensorFlow execution: I don't know whether this was because it was the “first run” of my network, but the training process seemed very slow to start, and then I got all the validation error values almost instantly ⇒ I should be careful about this performance question in the future. Note that the TensorFlow log timestamps above jump from 13:23:15 to 13:27:33 while the GPU device is being set up, so at least part of the delay seems to happen before training even starts.

The “slow display” I mentioned above might also be due to inappropriate flushing of the stdout/stderr pipes, as I tweaked those on my current system as far as I remember. To be investigated.
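
If output buffering does turn out to be involved, one way to rule it out would be to flush explicitly after each progress line. The sketch below is only an assumption about how the per-epoch `print` could be wrapped; `log_epoch` is a hypothetical helper, not something the script currently contains:

```python
import sys

def log_epoch(epoch, avg_cost, stream=sys.stdout):
    """Write one progress line and flush it immediately."""
    line = "Epoch: {:04d} cost = {:.9f}".format(epoch + 1, avg_cost)
    # flush=True forces the buffered text out right after this line.
    print(line, file=stream, flush=True)
    return line

line = log_epoch(0, 1.174406669)
```

Running the interpreter with `python -u`, or with the `PYTHONUNBUFFERED` environment variable set, should have a similar effect without touching the code.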

Now that we have some run statistics, we should be able to start TensorBoard to display them. On Windows, the tensorboard executable is available in the bin/Scripts/ folder of the Python 3 installation. Given my very specific Python setup, I had to create an additional helper script to start this app:

nv_call_tensorboard()
{
  local pname="$(uname -s)"
  case "${pname}" in
  CYGWIN*)
    local PREVPATH="$PATH"
    local pdir="$(nv_get_project_dir)/tools/windows/$__nv_tool_python3/bin"
    export PATH="$pdir:$pdir/Scripts:$PATH"
    "$pdir/Scripts/tensorboard.exe" "$@"
    export PATH="$PREVPATH"
    ;;
  *)
    local PREVPATH="$PATH"
    local pdir="$(nv_get_project_dir)/tools/linux/$__nv_tool_python3/bin"
    export PATH="$pdir:$pdir/Scripts:$PATH"
    "$pdir/tensorboard" "$@"
    export PATH="$PREVPATH"
    ;;
  esac
}

So I start tensorboard using our log folder:

ultim@saturn /cygdrive/d/Projects/NervSeed/python/apps/deep_learning/mnist_logistic
$ nv_call_tensorboard --logdir=logistic_logs
TensorBoard 1.12.1 at http://saturn:6006 (Press CTRL+C to quit)

Then we can navigate to the page http://localhost:6006, and there we have it! My first TensorBoard graph display :-):

first_tensorboard.jpg

⇒ So let's call that a successful experiment and stop here for this post! Next time, we will try to move to the multilayer perceptron implementation to improve our current accuracy.