Building a CNN model for the Classification of Fashion MNIST (Step by Step)

TLDR; This article explains the basics of how to build a CNN model for the classification of Fashion MNIST dataset (Step by Step).

Posted by Fazla Najumudeen on 2021-Jul-11

Before going into the implementation, if you want to refresh your knowledge of CNNs and their layers, you can go through the following articles (if not, just skip them...!).

  1. Convolutional Neural Networks : Introduction, Brief History, Structure, Operation, Pros and Cons
  2. Layers of a Convolutional Neural Network (Part 1)
  3. Layers of a Convolutional Neural Network (Part 2)

Let's start :)

Loading the Fashion MNIST Dataset

This dataset contains 28*28 grayscale images with labels: 60,000 for training and 10,000 for testing. The images are categorized into 10 classes of fashion and clothing products. Pixel values range from 0 to 255, and the labels are an array of integers ranging from 0 to 9.

To load the dataset,

import tensorflow as tf 

(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.fashion_mnist.load_data()

Let's explore the dataset.

Let's see in which format the labels are saved (i.e. integers or strings).

You can see that the labels are an array of integers.
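One quick way to check this is to look at the dtype, shape and distinct values of the label array. The snippet below is only a sketch: 'sample_labels' is a small hypothetical stand-in, and on the real data the same calls would be applied to 'train_labels'.

```python
import numpy as np

# Hypothetical stand-in for train_labels (the real array has 60,000 entries)
sample_labels = np.array([9, 0, 0, 3, 0, 2, 7, 2, 5, 5], dtype=np.uint8)

print(sample_labels.dtype)         # an integer dtype, not strings
print(sample_labels.shape)         # one label per image
print(np.unique(sample_labels))    # the distinct class ids present

# True when the labels are stored as integers
is_integer = np.issubdtype(sample_labels.dtype, np.integer)
```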

Let's assign the corresponding class for each integer of the labels so that we can visualize some images from the dataset with labels.

class_names = ['T-shirt/Top', 'Trouser', 'Pullover', 'Dress', 'Coat','Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

Let's display some images from the dataset.

# Importing this library for visualizing images
import matplotlib.pyplot as plt

plt.figure( figsize = (8,8) )

for i in range(16):
  plt.subplot(4,4,i+1)
  plt.xticks([])
  plt.yticks([])
  plt.grid(False)
  plt.imshow(train_images[i], cmap=plt.cm.binary)
  plt.xlabel(class_names[train_labels[i]])
plt.show()

The output is a 4*4 grid of the first 16 training images, each labelled with its class name.

Preprocessing the input images

When you display an image, you can see the pixel values ranging from 0 to 255.

Because normalized inputs usually help the model converge faster, let's divide the train and test images by 255 so that the pixel values range over [0, 1] instead of [0, 255].

train_images, test_images = train_images/255.0, test_images/255.0 
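To see what this division does, here is a minimal sketch on a tiny array. The 2*2 'raw' array is a hypothetical stand-in for a real 28*28 image.

```python
import numpy as np

# Hypothetical 2*2 "image" with raw pixel values, standing in for a 28*28 one
raw = np.array([[0, 51], [204, 255]], dtype=np.uint8)

# True division by a float promotes the uint8 values to float64
scaled = raw / 255.0

print(scaled.dtype, scaled.min(), scaled.max())
```

Note that the result is a float array, so the scaled images take more memory than the original uint8 ones.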

Preparing the input images

As the shape of an input image to a CNN should be in the format of (height, width, depth), and the depth of a grayscale image is 1, let's reshape the train and test images from (28, 28) to (28, 28, 1).

train_images = train_images.reshape(-1, 28, 28, 1) 
test_images = test_images.reshape(-1, 28, 28, 1)
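The effect of this reshape can be checked on a small array. This is just a sketch: 'batch' is a hypothetical stand-in for 'train_images'.

```python
import numpy as np

# Hypothetical batch of 5 blank 28*28 images standing in for train_images
batch = np.zeros((5, 28, 28))

# -1 lets NumPy infer the number of images; the trailing 1 is the depth axis
reshaped = batch.reshape(-1, 28, 28, 1)

print(batch.shape, "->", reshaped.shape)
```

The pixel values themselves are unchanged; only an extra axis of length 1 is added at the end.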

Building a CNN model

We are going to build a CNN model with 2 convolutional layers, where the first one is followed by a max pooling layer and the second one is followed by a fully connected layer.

model = tf.keras.models.Sequential([
  tf.keras.layers.Conv2D(32, (2,2), input_shape = (28, 28, 1), activation = 'relu'),
  tf.keras.layers.MaxPooling2D((2,2)),
  tf.keras.layers.Conv2D(64, (2,2), activation = 'relu'),
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(10, activation = 'softmax')
])

The first layer of our CNN is a convolutional layer. In that layer, we have used 32 filters of size 2*2 with the 'ReLU' activation function. After the convolutional layer, we have added a max pooling layer with a window size of 2*2.

Then again we have added a convolutional layer with 64 filters of size 2*2 and used the 'ReLU' activation function in that layer.

After that we have flattened (converted into a 1 dimensional array) the feature maps obtained from the second convolutional layer so that they can be fed to the next layer. At last we have added a fully connected layer with 10 neurons, one per class, and used the 'softmax' activation function.
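To get a feel for how the image size changes from layer to layer, the arithmetic can be sketched in plain Python. This assumes the Keras defaults that the model relies on (no padding, stride 1 for the convolutions, and a pooling stride equal to the window size). The helper names 'conv_out', 'pool_out' and 'conv_params' are made up for this illustration.

```python
def conv_out(size, kernel):
    # no padding, stride 1: the kernel must fit entirely inside the input
    return size - kernel + 1

def pool_out(size, window):
    # non-overlapping windows; leftover rows/columns are dropped
    return size // window

def conv_params(kernel, in_channels, filters):
    # each filter has kernel*kernel*in_channels weights plus one bias
    return (kernel * kernel * in_channels + 1) * filters

side = conv_out(28, 2)          # after the first convolutional layer
side = pool_out(side, 2)        # after the max pooling layer
side = conv_out(side, 2)        # after the second convolutional layer
flat = side * side * 64         # length of the flattened vector

total_params = (
    conv_params(2, 1, 32)       # first convolutional layer
    + conv_params(2, 32, 64)    # second convolutional layer
    + (flat + 1) * 10           # fully connected layer: weights plus one bias per neuron
)
print(side, flat, total_params)
```

On the real model, 'model.summary()' should report the same shapes and parameter counts.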

Compiling the CNN model

Before starting the training process, we have to compile the model. The 'compile()' method configures the model for training. The arguments 'optimizer' and 'loss' should be passed for compiling the model.

We are going to pass these 3 input arguments,

  1. Optimizer : The optimizer updates the weights of the model in order to reduce the error. There are many optimizers available such as SGD, Adam, RMSprop etc. As the 'adam' optimizer usually achieves good results fast, we are going to use it.
  2. Loss : The loss function measures how far the predictions of the model are from the true labels. Since it is a classification problem and our labels are integers, we are going to use the 'sparse_categorical_crossentropy' loss function.
  3. Metrics : Metrics are used to monitor how good the model is. We are going to use the 'accuracy' metric to see the accuracy on our training data.

model.compile ( optimizer = 'adam', loss = 'sparse_categorical_crossentropy', metrics = ['accuracy'] )
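To make the loss a bit more concrete, here is a simplified NumPy sketch of what sparse categorical crossentropy computes for integer labels: the negative log of the probability that the model assigns to the true class. It is not the Keras implementation (which, for example, also guards against log(0)), and the 'probs' and 'labels' values are made up.

```python
import numpy as np

def sparse_categorical_crossentropy(labels, probs):
    # pick the predicted probability of the true class for each sample
    true_class_probs = probs[np.arange(len(labels)), labels]
    # the loss is the negative log of that probability
    return -np.log(true_class_probs)

# Hypothetical predictions for 2 samples over 3 classes
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = np.array([0, 1])

losses = sparse_categorical_crossentropy(labels, probs)
mean_loss = losses.mean()
print(losses, mean_loss)
```

The more probability the model puts on the correct class, the closer the loss gets to 0.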

Training of CNN

For the training of CNN, we are going to use the 'fit()' method.

results = model.fit ( train_images, train_labels, epochs = 10, batch_size = 1000 )

Let's discuss input arguments used in 'fit()'.

  1. Epoch : One forward and backward pass over the entire training set is considered as an epoch
  2. Batch size : Number of training samples in a batch (the weights are updated once per batch)

In our case, we divide our training data into 60 batches (1000 training samples in a batch) and when all 60 batches of data finish one forward and backward pass, we complete an epoch.
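The same arithmetic in a few lines of Python, using the numbers from our 'fit()' call:

```python
import math

num_samples = 60000
batch_size = 1000
epochs = 10

# the last batch may be smaller than the others, hence the ceiling
batches_per_epoch = math.ceil(num_samples / batch_size)

# the weights are updated once per batch
total_updates = batches_per_epoch * epochs

print(batches_per_epoch, total_updates)
```

A smaller batch size would give more weight updates per epoch, at the cost of more steps to run.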

While training, Keras prints the loss and accuracy for each epoch.

We can also plot the accuracy of training samples against epochs.

plt.figure(figsize = (8, 6))

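# With metrics = ['accuracy'], TensorFlow 2.x stores the values under the key 'accuracy' (older versions used 'acc')
plt.plot(results.history['accuracy'])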
plt.title("Accuracy of Training Images")
plt.xlabel('Epochs')
plt.ylabel('Accuracy')

plt.show()

We have achieved an accuracy of 89.54% on train images. Let's see how good the trained model is in predicting the class of test images.

Testing of CNN

For predicting the class of test images,

predictions = model.predict ( test_images )

It returns, for each test image, an array of 10 probabilities (one per class). From that array, we pick the class which has the highest probability as the predicted class.

import numpy as np

predicted_labels = np.argmax ( predictions, axis = -1 )
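Here is a small sketch of that step on made-up numbers. The 'probs' array is a hypothetical stand-in for 'predictions', with 3 images and 4 classes instead of 10,000 images and 10 classes.

```python
import numpy as np

# Hypothetical softmax outputs for 3 images over 4 classes
probs = np.array([[0.1, 0.6, 0.2, 0.1],
                  [0.7, 0.1, 0.1, 0.1],
                  [0.2, 0.2, 0.1, 0.5]])

# axis = -1 takes the argmax along the class axis, giving one label per image
predicted = np.argmax(probs, axis=-1)
print(predicted)
```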

We can check the accuracy of the model on test images using 'accuracy_score()'.

from sklearn.metrics import accuracy_score

test_accuracy = accuracy_score ( test_labels, predicted_labels )
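For a single-label problem like ours, this accuracy is simply the fraction of predictions that match the true labels. A minimal NumPy sketch with made-up labels:

```python
import numpy as np

# Hypothetical true and predicted labels for 4 images
true_labels = np.array([1, 0, 3, 2])
predicted = np.array([1, 0, 3, 0])

# comparing elementwise gives booleans; their mean is the fraction of matches
accuracy = np.mean(true_labels == predicted)
print(accuracy)
```

Three of the four predictions match here, so the accuracy is 0.75.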

It gives an accuracy of 89.5% on test images.

Let's see our code all together.

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score

(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.fashion_mnist.load_data()

class_names = ['T-shirt/Top', 'Trouser', 'Pullover', 'Dress', 'Coat','Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

plt.figure( figsize = (8,8) )
for i in range(16):
  plt.subplot(4,4,i+1)
  plt.xticks([])
  plt.yticks([])
  plt.grid(False)
  plt.imshow(train_images[i], cmap=plt.cm.binary)
  plt.xlabel(class_names[train_labels[i]])
plt.show()

train_images, test_images = train_images/255.0, test_images/255.0 

train_images = train_images.reshape(-1, 28, 28, 1) 
test_images = test_images.reshape(-1, 28, 28, 1)

model = tf.keras.models.Sequential([
  tf.keras.layers.Conv2D(32, (2,2), input_shape = (28, 28, 1), activation = 'relu'),
  tf.keras.layers.MaxPooling2D((2,2)),
  tf.keras.layers.Conv2D(64, (2,2), activation = 'relu'),
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(10, activation = 'softmax')
])

model.compile ( optimizer = 'adam', loss = 'sparse_categorical_crossentropy', metrics = ['accuracy'] )

results = model.fit ( train_images, train_labels, epochs = 10, batch_size = 1000 )

plt.figure(figsize = (8, 6))
plt.plot(results.history['accuracy'])
plt.title("Accuracy of Training Images")
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.show()

predictions = model.predict ( test_images )

predicted_labels = np.argmax ( predictions, axis = -1 ) 

test_accuracy = accuracy_score ( test_labels, predicted_labels )

Hope you learnt the basics of how to implement a CNN for a classification dataset :).