Layers of a Convolutional Neural Network (Part 2)

You can find Part 1 of this article here.

Nonlinear Layer

The convolution operation is purely linear, as it only performs element-wise multiplication and addition. But since most real-world data is nonlinear, we need to introduce nonlinearity into the CNN so that it can capture the nonlinear and complex patterns and features of the data.

The most widely used nonlinear function in a CNN is the Rectified Linear Unit (ReLU), as CNNs generally perform better with ReLU. Other nonlinear functions, such as sigmoid and tanh, are also used.
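
As a quick illustration, here is a minimal NumPy sketch of these three activation functions (the function names are our own, not from any particular library):

    import numpy as np

    def relu(x):
        # Replace negative values with zero, keep positive values unchanged
        return np.maximum(0, x)

    def sigmoid(x):
        # Squash values into the range (0, 1)
        return 1 / (1 + np.exp(-x))

    def tanh(x):
        # Squash values into the range (-1, 1)
        return np.tanh(x)

    x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
    print(relu(x))   # [0.  0.  0.  1.5 3. ]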

ReLU Layer

The ReLU function performs an element-wise nonlinear operation by replacing negative values with zero and leaving positive values as they are, i.e. f(x) = max(0, x). It can be represented using the following graph.

Let's pass the feature map obtained from the convolutional layer.

The output of the ReLU layer is as follows.

Notice that the ReLU layer does not downsample the feature maps. If we pass feature maps of size 4*4*64, the ReLU layer will produce an output of the same size.
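
A minimal sketch of this in NumPy, assuming the feature maps are stored as an array of shape (height, width, depth):

    import numpy as np

    # Hypothetical stack of feature maps from the convolutional layer:
    # 4 x 4 spatial size, 64 filters
    feature_maps = np.random.randn(4, 4, 64)

    # ReLU is applied element-wise, so the shape stays the same
    activated = np.maximum(0, feature_maps)

    print(activated.shape)          # (4, 4, 64)
    print((activated >= 0).all())   # True -- no negative values remain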

Pooling / Subsampling Layer

Pooling is done to reduce the spatial size of the feature maps. The pooling layer reduces the height and width of the feature maps while keeping the depth (filter count) constant. By applying a pooling operation, we can:

  1. Reduce the number of parameters
  2. Shorten the training time of a CNN
  3. Control overfitting

There are two common pooling methods used in pooling layers:

  1. Max pooling
  2. Average pooling

Max pooling takes the maximum value of each sub-region of the feature map, while average pooling takes the average value.

Here we have to define the window size of the sub-region to be considered for pooling, as well as the stride value. Let's define the window size as 2*2 and the stride as 2, and then pass the feature map we obtained from the ReLU layer to the max pooling layer.

The output of the pooling layer is as follows.

As already mentioned, pooling layers only downsample the height and width of the feature maps, not the depth (filter count). If we pass feature maps of size 4*4*64 through the pooling layer, we will get an output of size 2*2*64.
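
To make the operation concrete, here is a rough NumPy sketch of max pooling with a 2*2 window and stride 2, written as a naive loop rather than an optimized library routine:

    import numpy as np

    def max_pool(feature_maps, window=2, stride=2):
        # feature_maps has shape (height, width, depth)
        h, w, d = feature_maps.shape
        out_h = (h - window) // stride + 1
        out_w = (w - window) // stride + 1
        output = np.zeros((out_h, out_w, d))
        for i in range(out_h):
            for j in range(out_w):
                # Maximum over each 2*2 window, computed per depth slice
                patch = feature_maps[i * stride:i * stride + window,
                                     j * stride:j * stride + window, :]
                output[i, j, :] = patch.max(axis=(0, 1))
        return output

    feature_maps = np.random.randn(4, 4, 64)   # output of the ReLU layer
    pooled = max_pool(feature_maps)
    print(pooled.shape)   # (2, 2, 64) -- height and width halved, depth unchanged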

Fully Connected Layer

Neurons in this layer are fully connected to the feature maps obtained from the previous layer. These fully connected layers act as a classifier on top of the extracted features. The number of neurons needed in the fully connected layer is determined by the size of the feature maps obtained from the previous layer.

In our case, since we get feature maps of size 2*2*64 from the pooling layer, we need 256 (2*2*64 = 256) neurons in the fully connected layer. Our fully connected layer looks as follows.
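
As a rough sketch, flattening the pooled feature maps and feeding them into one fully connected layer could look like this in plain NumPy (the choice of 10 output classes is a hypothetical example, not part of the original architecture):

    import numpy as np

    pooled = np.random.randn(2, 2, 64)       # output of the pooling layer

    # Flatten the 2*2*64 feature maps into a vector of 256 values,
    # one per neuron in the fully connected layer
    flattened = pooled.reshape(-1)
    print(flattened.shape)   # (256,)

    # A fully connected layer is a weight matrix plus a bias vector;
    # assuming 10 output classes, the weights have shape 10 x 256
    num_classes = 10
    weights = np.random.randn(num_classes, flattened.size) * 0.01
    bias = np.zeros(num_classes)

    scores = weights @ flattened + bias      # one score per class
    print(scores.shape)      # (10,)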

In the next article, we will discuss the implementation of a CNN using machine learning libraries in Python.
