ANN with MNIST
Table of Contents
We will be using MNIST to create a multinomial classifier that predicts which of the ten classes (0, 1, 2, 3, 4, 5, 6, 7, 8, or 9) an MNIST image belongs to. Succinctly, we're teaching a computer to recognize handwritten digits.
import os
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"]="0"
# Import Library
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
%matplotlib inline
Let's download and load the dataset.
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
print("The training data set is:\n")
print(mnist.train.images.shape)
print(mnist.train.labels.shape)
print("The test data set is:")
print(mnist.test.images.shape)
print(mnist.test.labels.shape)
Display a few random samples from it:
mnist.train.images[5]
# well, that's not a picture (or image), it's an array.
mnist.train.images[5].shape
You might expect the training set to contain 28 $\times$ 28 grayscale images of handwritten digits, but that is not quite what is stored. Each 28 $\times$ 28 image has been flattened into a 1D array of 784 values. Let's reshape one back.
img = np.reshape(mnist.train.images[5], [28,28])  # equivalent to the line below
img = mnist.train.images[5].reshape([28,28])
# So now we have a 28x28 matrix, where each element is an intensity level from 0 to 1.
img.shape
Let's visualize what some of these images and their corresponding training labels look like.
plt.figure(figsize = (6,6))
plt.imshow(img, 'gray')
plt.xticks([])
plt.yticks([])
plt.show()
mnist.train.labels[5]
np.argmax(mnist.train.labels[5])
The dataset object has a batch generator built in:
x, y = mnist.train.next_batch(3)
print(x.shape)
print(y.shape)
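The `next_batch` helper is specific to the old TensorFlow tutorial module. As a minimal sketch, the same idea can be written in plain NumPy (the `images` and `labels` arrays below are zero-filled stand-ins shaped like the flattened MNIST data, introduced here only for illustration):

```python
import numpy as np

def next_batch(images, labels, batch_size, rng=np.random.default_rng(0)):
    """Sample a random mini-batch of (image, label) pairs."""
    idx = rng.choice(len(images), size=batch_size, replace=False)
    return images[idx], labels[idx]

# Stand-ins shaped like the flattened MNIST training set
images = np.zeros((55000, 784), dtype=np.float32)
labels = np.zeros((55000, 10), dtype=np.float32)

bx, by = next_batch(images, labels, 3)
print(bx.shape)  # (3, 784)
print(by.shape)  # (3, 10)
```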
train_x, train_y = mnist.train.next_batch(1)
img = train_x[0,:].reshape(28,28)
plt.figure(figsize=(6,6))
plt.imshow(img,'gray')
plt.title("Label : {}".format(np.argmax(train_y[0,:])))
plt.xticks([])
plt.yticks([])
plt.show()
One hot encoding
print ('Train labels : {}'.format(train_y[0, :]))
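With one-hot encoding, the integer label $k$ becomes a length-10 vector with a 1 in position $k$ and 0 elsewhere; `np.argmax` inverts the encoding. A minimal sketch:

```python
import numpy as np

def one_hot(label, n_classes=10):
    """Encode an integer label as a one-hot vector."""
    v = np.zeros(n_classes, dtype=np.float32)
    v[label] = 1.0
    return v

v = one_hot(7)
print(v)             # [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
print(np.argmax(v))  # 7 -- argmax recovers the integer label
```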
n_input = 28*28
n_hidden = 100
n_output = 10
weights = {
'hidden' : tf.Variable(tf.random_normal([n_input, n_hidden], stddev = 0.1)),
'output' : tf.Variable(tf.random_normal([n_hidden, n_output], stddev = 0.1))
}
biases = {
'hidden' : tf.Variable(tf.random_normal([n_hidden], stddev = 0.1)),
'output' : tf.Variable(tf.random_normal([n_output], stddev = 0.1))
}
x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_output])
First, the hidden layer performs a matrix multiplication to produce a set of linear activations.
Second, each linear activation is passed through a nonlinear activation function (here, ReLU).
Third, the output layer predicts class scores with an affine transformation.
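These three steps can be sketched in plain NumPy; the random weights below are stand-ins for the trained `tf.Variable`s, with the same shapes as in the model:

```python
import numpy as np

rng = np.random.default_rng(0)
n_input, n_hidden, n_output = 784, 100, 10

# Stand-in parameters, same shapes as the TF variables above
W_h = rng.normal(0, 0.1, (n_input, n_hidden))
b_h = rng.normal(0, 0.1, n_hidden)
W_o = rng.normal(0, 0.1, (n_hidden, n_output))
b_o = rng.normal(0, 0.1, n_output)

def forward(x):
    hidden = x @ W_h + b_h          # 1. linear activations
    hidden = np.maximum(hidden, 0)  # 2. ReLU nonlinearity
    return hidden @ W_o + b_o       # 3. affine output transformation

x = rng.random((5, n_input))        # a batch of 5 fake flattened images
print(forward(x).shape)  # (5, 10) -- one score per class per image
```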
# Define Network
def build_model(x, weights, biases):
# first hidden layer
hidden = tf.add(tf.matmul(x, weights['hidden']), biases['hidden'])
# non-linear activate function
hidden = tf.nn.relu(hidden)
# Output layer
output = tf.add(tf.matmul(hidden, weights['output']), biases['output'])
return output
Loss
Optimizer
# Define Loss
pred = build_model(x, weights, biases)
loss = tf.nn.softmax_cross_entropy_with_logits(logits = pred, labels = y)
loss = tf.reduce_mean(loss)
LR = 0.0001
optm = tf.train.AdamOptimizer(LR).minimize(loss)
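What `tf.nn.softmax_cross_entropy_with_logits` computes can be sketched in NumPy; this is a simplified, numerically stabilized version for one-hot labels, not the library implementation:

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy between softmax(logits) and one-hot labels."""
    z = logits - logits.max(axis=1, keepdims=True)  # shift for numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -np.mean(np.sum(labels * log_probs, axis=1))

logits = np.array([[2.0, 1.0, 0.1]])
labels = np.array([[1.0, 0.0, 0.0]])
print(softmax_cross_entropy(logits, labels))  # ~0.417
```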
n_batch : batch size for mini-batch gradient descent
n_iter : the number of iteration steps
n_prt : check loss every n_prt iterations
Initializer
n_batch = 50 # Batch Size
n_iter = 5000 # Learning Iteration
n_prt = 250 # Print Cycle
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
loss_record_train = []
loss_record_test = []
for epoch in range(n_iter):
train_x, train_y = mnist.train.next_batch(n_batch)
sess.run(optm, feed_dict = {x: train_x, y: train_y})
if epoch % n_prt == 0:
test_x, test_y = mnist.test.next_batch(n_batch)
c1 = sess.run(loss, feed_dict = {x: train_x, y: train_y})
c2 = sess.run(loss, feed_dict = {x: test_x, y: test_y})
loss_record_train.append(c1)
loss_record_test.append(c2)
print ("Iter : {}".format(epoch))
print ("Cost : {}".format(c1))
plt.figure(figsize=(10,8))
plt.plot(np.arange(len(loss_record_train))*n_prt,
loss_record_train, label = 'training')
plt.plot(np.arange(len(loss_record_test))*n_prt,
loss_record_test, label = 'testing')
plt.xlabel('iteration', fontsize = 15)
plt.ylabel('loss', fontsize = 15)
plt.legend(fontsize = 12)
plt.ylim([0, np.max(loss_record_train)])
plt.show()
test_x, test_y = mnist.test.next_batch(100)
my_pred = sess.run(pred, feed_dict = {x : test_x})
my_pred = np.argmax(my_pred, axis = 1)
labels = np.argmax(test_y, axis = 1)
accr = np.mean(np.equal(my_pred, labels))
print("Accuracy : {}%".format(accr*100))
test_x, test_y = mnist.test.next_batch(1)
probs = sess.run(tf.nn.softmax(pred), feed_dict = {x : test_x})
predict = np.argmax(probs)
plt.figure(figsize = (6,6))
plt.imshow(test_x.reshape(28,28), 'gray')
plt.xticks([])
plt.yticks([])
plt.show()
print('Prediction : {}'.format(predict))
np.set_printoptions(precision = 2, suppress = True)
print('Probability : {}'.format(probs.ravel()))
You may observe that the accuracy on the test dataset is a little lower than the accuracy on the training dataset. This gap between training accuracy and test accuracy is an example of overfitting, when a machine learning model performs worse on new data than on its training data.
What is the highest accuracy you can achieve with this first fully connected model? Since the handwritten digit classification task is pretty straightforward, you may be wondering how we can do better...
$\Rightarrow$ As we saw in lecture, convolutional neural networks (CNNs) are particularly well-suited for a variety of tasks in computer vision, and have achieved near-perfect accuracies on the MNIST dataset. We will build a CNN and ultimately output a probability distribution over the 10 digit classes (0-9) in the next lectures.
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')