Implementation of ANN for MNIST Handwritten Digits Classification

In this article, I'm going to discuss the implementation of an Artificial Neural Network for the classification of MNIST handwritten digits.

Before moving into the implementation, I recommend going through the following articles to recall the concepts and mathematics behind Artificial Neural Networks.

  1. Artificial Neural Networks : Forward Propagation
  2. Artificial Neural Networks : Backpropagation

Let's start :).

Loading MNIST Handwritten Digits Dataset

This dataset contains 28*28 grayscale images of handwritten digits from 0 to 9. There are 60,000 training images and 10,000 test images, each with a corresponding label.

To load the dataset,

import tensorflow as tf

(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()

Let's explore the data.
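
As a quick sketch (using only standard NumPy array attributes), we can print the shapes and data types of the loaded arrays.

# Inspect the shapes and data types of the loaded arrays
print(train_images.shape)   # (60000, 28, 28)
print(train_labels.shape)   # (60000,)
print(test_images.shape)    # (10000, 28, 28)
print(test_labels.shape)    # (10000,)
print(train_images.dtype)   # uint8 -> pixel values from 0 to 255
print(train_labels[:10])    # the first ten labels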

Let's assign a name to each digit label so that we can visualize some images from the dataset along with their labels.

class_names = ['Zero', 'One', 'Two', 'Three', 'Four', 'Five', 'Six', 'Seven', 'Eight', 'Nine']

Let's display some images from the dataset.

# Importing this library for visualizing images
import matplotlib.pyplot as plt

plt.figure( figsize = (7,7) )

for i in range(16):
  plt.subplot(4,4,i+1)
  plt.xticks([])
  plt.yticks([])
  plt.grid(False)
  plt.imshow(train_images[i], cmap=plt.cm.binary)
  plt.xlabel(class_names[train_labels[i]])
plt.show()

The output will be a 4x4 grid of sample training images, each shown with its class label.

Preprocessing the Input images

When you display an image from the dataset, you can see that the pixel values range from 0 to 255.

plt.figure()
plt.imshow(train_images[0])
plt.colorbar()
plt.xticks([])
plt.yticks([])
plt.grid(False)
plt.show()

Since normalizing the data makes the model converge faster, let's divide the train and test images by 255 so that the pixel values fall in the range [0, 1] instead of [0, 255].

train_images, test_images = train_images/255.0, test_images/255.0
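
As a small sanity check (optional), we can confirm the new value range:

# After dividing by 255.0, the pixel values should lie in [0, 1]
print(train_images.min(), train_images.max())   # expected output: 0.0 1.0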

Building an ANN model

Now we are going to create an ANN model.

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape = (28, 28)),
  tf.keras.layers.Dense(300, activation = 'sigmoid'),
  tf.keras.layers.Dense(10, activation = 'softmax')
])

Since our input images are 2-dimensional (28*28), we have to convert them into a 1-dimensional array before feeding them to the ANN. Therefore, as the first layer of our ANN, we have added a Flatten layer, which converts each 28*28 image into a 1-dimensional array of 784 pixel values.

After flattening, we have added a fully connected layer of 300 neurons with the 'sigmoid' activation function. Finally, we have added a fully connected layer of 10 neurons (since we have 10 different classes to classify) with the 'softmax' activation function.
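
To see how many trainable parameters these layers add, we can call 'model.summary()'; the counts below follow directly from the layer sizes.

model.summary()
# Flatten    : 0 parameters (it only reshapes the input)
# Dense(300) : 784 * 300 weights + 300 biases = 235,500 parameters
# Dense(10)  : 300 * 10 weights  + 10 biases  =   3,010 parameters
# Total      : 238,510 trainable parameters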

Compiling the ANN model

Before starting the training process, we have to compile the model. The 'compile()' method configures the model for training; at a minimum, the 'optimizer' and 'loss' arguments should be passed.

We are going to pass these three arguments:

  1. Optimizer : The optimizer controls how the model's weights are updated in order to reduce the loss; adaptive optimizers also adjust the learning rate. There are many optimizers available, such as SGD, Adam, RMSprop, etc. Since the 'adam' optimizer generally achieves good results quickly, we are going to use it.
  2. Loss : The loss function measures the performance of the model during training. Since this is a classification problem and our labels are integers, we are going to use the 'sparse_categorical_crossentropy' loss function.
  3. Metrics : Metrics are used to monitor how well the model is doing. We are going to use the 'accuracy' metric to track the accuracy on the training data.

model.compile ( optimizer = 'adam', loss = 'sparse_categorical_crossentropy', metrics = ['accuracy'] )

Training of ANN

To train the ANN, we are going to use the 'fit()' method.

results = model.fit ( train_images, train_labels, epochs = 10, batch_size = 1000 )

Let's discuss the input arguments used in 'fit()'.

  1. Epochs : One forward and backward pass over the entire set of training samples is considered one epoch.
  2. Batch size : The number of training samples processed in one batch.

In our case, the training data is divided into 60 batches (1,000 training samples per batch), and once all 60 batches have finished one forward and backward pass, one epoch is complete.
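
As a quick sketch of that arithmetic:

# 60,000 training samples split into batches of 1,000 samples each
batches_per_epoch = len(train_images) // 1000    # 60 batches (weight updates) per epoch
total_updates = batches_per_epoch * 10           # 600 weight updates over 10 epochs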

The output will show the loss and the training accuracy printed for each of the 10 epochs.

We can also plot the accuracy of training samples against epochs.

plt.figure(figsize = (8, 6))

plt.plot(results.history['accuracy'])
plt.title("Accuracy of Training Images")
plt.xlabel('Epochs')
plt.ylabel('Accuracy')

plt.show()

We have achieved an accuracy of around 94% on the training images. Let's see how well the trained model predicts the classes of the test images.

Testing of ANN

For predicting the class of test images,

predictions = model.predict ( test_images )

It returns, for each test image, an array with the probability of belonging to each class. From that array, we take the class with the highest probability as the predicted label.

import numpy as np

predicted_labels = np.argmax ( predictions, axis = -1 )
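
For example (just an illustrative check on the first test image), 'predictions[0]' holds 10 probabilities, one per class, and 'np.argmax' picks the most likely one:

# Probabilities for the first test image (10 values, one per class)
print(predictions[0])
print(np.argmax(predictions[0]))   # predicted digit
print(test_labels[0])              # true digit, for comparison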

We can check the accuracy of the model on test images using 'accuracy_score()'.

from sklearn.metrics import accuracy_score

test_accuracy = accuracy_score ( test_labels, predicted_labels )

It gives an accuracy of around 93.5% on the test images.
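
As a side note, since we compiled the model with the 'accuracy' metric, Keras can compute the same test accuracy directly with 'evaluate()'. A minimal sketch:

# evaluate() returns the loss followed by the metrics passed to compile(), here ['accuracy']
test_loss, keras_test_accuracy = model.evaluate(test_images, test_labels)
print(keras_test_accuracy)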

Let's see our code all together.

import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import accuracy_score

(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()

class_names = ['Zero', 'One', 'Two', 'Three', 'Four', 'Five', 'Six', 'Seven', 'Eight', 'Nine']

plt.figure( figsize = (7,7) )

for i in range(16):
  plt.subplot(4,4,i+1)
  plt.xticks([])
  plt.yticks([])
  plt.grid(False)
  plt.imshow(train_images[i], cmap=plt.cm.binary)
  plt.xlabel(class_names[train_labels[i]])
plt.show()

train_images, test_images = train_images/255.0, test_images/255.0

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape = (28, 28)),
  tf.keras.layers.Dense(300, activation = 'sigmoid'),
  tf.keras.layers.Dense(10, activation = 'softmax')
])

model.compile ( optimizer = 'adam', loss = 'sparse_categorical_crossentropy', metrics = ['accuracy'] )

results = model.fit ( train_images, train_labels, epochs = 10, batch_size = 1000 )

plt.figure(figsize = (8, 6))
plt.plot(results.history['accuracy'])
plt.title("Accuracy of Training Images")
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.show()

predictions = model.predict ( test_images )

predicted_labels = np.argmax ( predictions, axis = -1 ) 

test_accuracy = accuracy_score ( test_labels, predicted_labels )

I hope you have learnt the basics of how to implement an ANN for a classification task :).

If you want to know how to implement an ANN from scratch, please go through the following article.

Artificial Neural Network : From scratch in Python (For Beginners)
