Using the TensorFlow queue system with the CIFAR dataset

So my first experiment on the CIFAR-10 dataset wasn't that successful. I currently think this is because I wasn't using all the data augmentation techniques suggested in the reference book: I didn't want to introduce the TensorFlow queueing system into my pipeline, and preprocessing all the data with numpy only proved too complex for me :-).

But now I'm back, and this time we are going to handle this queueing mechanism as expected.

We are going to use this GitHub repo as a template.

The first thing to note is that the input routines now expect the binary dataset format, not the python version. So I start by downloading that version of the dataset from the page:

We have an example of how to build a TensorFlow unit test in this file

Then we can use this file to see how we should adapt our script to handle the TensorFlow queue system.

Important note: what I finally understood in this process is that it is not easy to perform both training and evaluation of the network with this new approach. And actually the recommended option seems to be to execute both separately: the train process would write checkpoints regularly, and the evaluation process would load those checkpoints before trying to evaluate the network predictions.
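To make the "train writes checkpoints, eval loads them" split concrete, here is a minimal sketch. It does not use the CIFAR model at all: the single `weight` variable, the `build_graph`, `train` and `evaluate_latest` helpers and the temporary checkpoint directory are all hypothetical stand-ins, and the `tf.compat.v1` alias is only there on the assumption that the snippet may be run on a TensorFlow 2 install.

```python
import os
import tempfile

import tensorflow as tf

tf1 = tf.compat.v1  # TF1-style graph/session API (assumption: may run on TF2)


def build_graph():
    """Build a tiny stand-in model: one scalar variable plus an op that bumps it."""
    graph = tf.Graph()
    with graph.as_default():
        weight = tf1.get_variable("weight", shape=[],
                                  initializer=tf1.zeros_initializer())
        train_op = weight.assign_add(1.0)  # stand-in for a real training step
        init_op = tf1.global_variables_initializer()
        saver = tf1.train.Saver()
    return graph, weight, train_op, init_op, saver


def train(ckpt_dir, num_steps):
    """'Training process': run some steps, then write a checkpoint."""
    graph, _, train_op, init_op, saver = build_graph()
    with tf1.Session(graph=graph) as sess:
        sess.run(init_op)
        for _ in range(num_steps):
            sess.run(train_op)
        saver.save(sess, os.path.join(ckpt_dir, "model-checkpoint"),
                   global_step=num_steps)


def evaluate_latest(ckpt_dir):
    """'Evaluation process': rebuild the graph and restore the latest checkpoint."""
    graph, weight, _, _, saver = build_graph()
    with tf1.Session(graph=graph) as sess:
        saver.restore(sess, tf1.train.latest_checkpoint(ckpt_dir))
        return float(sess.run(weight))


ckpt_dir = tempfile.mkdtemp()
train(ckpt_dir, num_steps=3)
restored_value = evaluate_latest(ckpt_dir)
print("value seen by the evaluation side:", restored_value)
```

In a real setup the two functions would live in two separate scripts, with the evaluation script polling the checkpoint folder from time to time.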

⇒ Or maybe I wasn't trying hard enough? As suggested on this page, we might be able to use a placeholder with a default value (tf.placeholder_with_default) to perform the evaluation/test run appropriately. Let's try that… Bingo! :-)
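The placeholder-with-default trick can be seen in isolation in the small sketch below. The constant `queue_batch` is only a stand-in for the tensor that a real input queue would produce, the shapes are made up, and `tf.compat.v1` is used on the assumption that the snippet may run on TensorFlow 2. When nothing is fed, the graph reads the default tensor (the "training" path); when a `feed_dict` is given, the fed array replaces it (the "evaluation" path).

```python
import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1  # TF1-style graph/session API (assumption: may run on TF2)

graph = tf.Graph()
with graph.as_default():
    # Stand-in for a batch dequeued from the input pipeline.
    queue_batch = tf.constant(np.ones((4, 3), dtype=np.float32))

    # Reads queue_batch unless a value is explicitly fed for x.
    x = tf1.placeholder_with_default(queue_batch, shape=[None, 3])
    total = tf.reduce_sum(x)

with tf1.Session(graph=graph) as sess:
    # No feed: the default (queue) tensor is used.
    train_total = float(sess.run(total))

    # With a feed: the static array overrides the default.
    eval_batch = np.full((2, 3), 5.0, dtype=np.float32)
    eval_total = float(sess.run(total, feed_dict={x: eval_batch}))

print(train_total, eval_total)
```

Note that the fed batch does not need the same batch size as the default one, since the first dimension is declared as `None`.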

So I was actually able to use the queue inputs for the training part and the “regular” static inputs for the evaluation/test part using this code:

if __name__ == '__main__':
    if os.path.exists("tf_logs/"):
        logDEBUG("Removing previous tf_logs folder.")
        shutil.rmtree("tf_logs/")  # assumes `import shutil` at the top of the script

    with tf.Graph().as_default():

        with tf.variable_scope("mlp_model"):
            # We prepare the input queues:
            images, labels = CIFAR.distorted_inputs(data_dir, batch_size)
            # x = tf.placeholder("float", [None, 24,24,3]) # cropped CIFAR image of shape 24*24*3
            # y = tf.placeholder("float", [None, 10]) # one-hot labels => 10 classes

            x = tf.placeholder_with_default( images, [None, 24,24,3])
            y = tf.placeholder_with_default( labels, [None])

            # Also retrieve the eval/test datasets:
            dataset = CIFAR.read_data_sets(root_path+"/data/CIFAR/", one_hot=False)

            phase_train = tf.placeholder_with_default(False, shape=())

            output = inference(x, phase_train)
            cost = loss(output, y)

            global_step = tf.Variable(0, name='global_step', trainable=False)

            train_op = training(cost, global_step)

            # eval_output = inference(x, phase_train, True)
            eval_op = evaluate(output, y)

            summary_op = tf.summary.merge_all()

            saver = tf.train.Saver()
            sess = tf.Session()

            # summary_writer = tf.summary.FileWriter("tf_logs/", graph_def=sess.graph_def)
            summary_writer = tf.summary.FileWriter("tf_logs/", graph=sess.graph)
            init_op = tf.global_variables_initializer()
            sess.run(init_op)

            # saver.restore(sess, "tf_logs/model-checkpoint-66000")

            # Start the queue runners.
            logDEBUG("Starting queue runners...")
            tf.train.start_queue_runners(sess=sess)
            logDEBUG("Done starting queue runners.")
            # Training cycle
            for step in range(max_steps):

                start_time = time.time()
                # sess.run(train_op, feed_dict={x: minibatch_x, y: minibatch_y, phase_train: True})
                # Compute average loss
                # loss_value = sess.run(cost, feed_dict={x: minibatch_x, y: minibatch_y})/total_batch

                # x and y are not fed here, so the batch comes from the input queue.
                _, loss_value = sess.run([train_op, cost], feed_dict={phase_train: True})
                duration = time.time() - start_time

                assert not np.isnan(loss_value), 'Model diverged with loss = NaN'
                if step % 10 == 0:
                    num_examples_per_step = batch_size
                    examples_per_sec = num_examples_per_step / duration
                    sec_per_batch = float(duration)

                    format_str = ('step %d, loss = %.2f (%.1f examples/sec; %.3f '
                                  'sec/batch)')
                    logDEBUG(format_str % (step, loss_value, examples_per_sec, sec_per_batch))
                if step % 100 == 0:
                    summary_str = sess.run(summary_op)
                    summary_writer.add_summary(summary_str, step)

                # Save the model checkpoint periodically.
                if step % 1000 == 0 or (step + 1) == max_steps:
                    saver.save(sess, "tf_logs/model-checkpoint", global_step=step)

                if step % 200 == 0:
                    accuracy = sess.run(eval_op, feed_dict={x: dataset.validation.images, y: dataset.validation.labels})
                    logDEBUG("Validation accuracy: %.2f" % accuracy)

            logDEBUG("Optimization Finished!")

            accuracy = sess.run(eval_op, feed_dict={x: dataset.test.images, y: dataset.test.labels})

            logDEBUG("Test Accuracy: %f" % accuracy)

In the code above, I'm actually using both binary and python versions of the CIFAR dataset, which is quite dirty I admit. But since I just wanted to get something working as fast as possible this was the easiest option. This code would definitely require some cleaning, but let's first see what kind of results we get now…

⇒ So with this setup, using 116990 training steps (i.e. about 300 epochs), we reach a test accuracy of 81.5%, which is still not that impressive. But since the values reported by Google itself are rather in the 83% - 86% range, I'm starting to think there may be a typo in the book, where we can read the values “92%” or “96%”?
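As a quick sanity check on the steps/epochs correspondence, the arithmetic can be sketched as follows. The 50000 figure is the size of the CIFAR-10 training set; the batch size is not stated in this post, so the value derived here is only what the numbers imply (it comes out close to 128, which I am assuming, not confirming, was the setting used).

```python
# Relation between training steps, epochs and batch size.
num_train_examples = 50000  # CIFAR-10 training set size
total_steps = 116990        # number of steps quoted above
epochs = 300                # number of epochs quoted above

# One step consumes one batch, so steps per epoch = examples / batch_size.
steps_per_epoch = total_steps / epochs
implied_batch_size = num_train_examples / steps_per_epoch


def steps_for_epochs(num_epochs, num_examples, batch_size):
    """Number of whole training steps needed to cover num_epochs passes."""
    return (num_epochs * num_examples) // batch_size


print("steps per epoch: %.2f" % steps_per_epoch)
print("implied batch size: %.1f" % implied_batch_size)
print("steps for 1000 epochs at that batch size:",
      steps_for_epochs(1000, num_train_examples, round(implied_batch_size)))
```

The same helper gives the step count to use for a longer run once the batch size is known.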

⇒ Anyway, I then tried training longer, with 1000 epochs and a reduced learning rate, to confirm those results: this reaches 81.76%, not really a large improvement.