Convolutional Neural Network : Introduction, Brief History, Structure, Operation, Pros and Cons

TL;DR: This article discusses what a Convolutional Neural Network is, along with its brief history, structure, operation, pros and cons.

Posted by Fazla Najumudeen on 2021-Jul-03

In this article, I'm going to discuss the Convolutional Neural Network: what it is, its brief history, structure, operation, pros and cons.


In deep learning, a Convolutional Neural Network (CNN) is a type of artificial neural network that is widely used to identify and extract features from image data. It uses the 'convolution' operation in at least one of its layers, and since convolution is the main technique used in the network, it is called a Convolutional Neural Network.
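
To make the 'convolution' operation concrete, here is a minimal sketch in plain Python. It assumes a single-channel image, a stride of 1 and no padding, and it does not flip the kernel, which is how deep learning libraries usually implement the operation (strictly speaking, a cross-correlation). The function name `conv2d_valid`, the image and the filter are all made up for illustration.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            # Weighted sum of the image patch currently under the kernel.
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        out.append(row)
    return out


# Hypothetical 4x4 grayscale image: dark left half, bright right half.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
# Hypothetical 2x2 filter that responds to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = conv2d_valid(image, kernel)
print(feature_map)  # [[0, 18, 0], [0, 18, 0], [0, 18, 0]]
```

The feature map is large only in the middle column, which is where the edge sits in the image. In a real CNN the filter values are learned during training rather than written by hand.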

Brief History

In the late 1950s and 1960s, David Hubel and Torsten Wiesel identified two basic types of visual cells, 'simple cells' and 'complex cells', in the visual cortex of cats and, later, monkeys. Inspired by their work, Dr. Kunihiko Fukushima proposed an artificial neural network called the 'neocognitron' in 1980. It is widely regarded as the forerunner of modern CNNs, and it mimics the functions of simple cells and complex cells.

Building on Fukushima's neocognitron, Yann LeCun and his colleagues introduced a CNN called LeNet in the 1990s. It classifies handwritten digits and was trained on the MNIST dataset of handwritten digits. Because CNNs need a lot of training data, and processing higher-resolution images requires more computing resources, CNNs at that time were practical only for low-resolution images.

The turning point came in 2012, when a CNN called AlexNet won the ImageNet classification competition by a wide margin, and CNNs became hugely popular afterwards. The availability of millions of labeled images and far greater computing resources, notably GPUs, made it possible to train complex CNNs that had been impractical earlier.


The most common layers of a CNN are:

  1. Convolutional Layer
  2. Non-Linear Layer
  3. Pooling/Subsampling Layer
  4. Fully Connected Layer
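
As an illustration of layers 2 and 3, the sketch below applies a non-linearity and then pooling to a small feature map. It assumes ReLU as the non-linear function and 2x2 max pooling, which are common choices but not the only ones. The helper names and the numbers are hypothetical.

```python
def relu(feature_map):
    """Non-linear layer: replace every negative value with 0."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; assumes height and width are divisible by `size`."""
    h, w = len(feature_map), len(feature_map[0])
    return [
        [
            # Keep only the largest value in each size x size window.
            max(feature_map[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, w, size)
        ]
        for r in range(0, h, size)
    ]


# Hypothetical 4x4 feature map produced by a convolutional layer.
fm = [
    [1, -3, 2, 0],
    [-2, 5, -1, -4],
    [0, -6, 7, 3],
    [-8, -1, 2, -9],
]
activated = relu(fm)
pooled = max_pool2d(activated)
print(activated)  # [[1, 0, 2, 0], [0, 5, 0, 0], [0, 0, 7, 3], [0, 0, 2, 0]]
print(pooled)     # [[5, 2], [0, 7]]
```

Pooling shrinks the 4x4 map to 2x2, which is why this layer is also called a subsampling layer.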

The following image shows how an input image passes through the layers of a CNN.


We can divide the operation of a CNN into two parts:

  1. Feature Extraction : The image data is passed through a stack of 'Convolutional' and 'Pooling' layers, which extract the image features that matter for further processing
  2. Classification : Once the image features are extracted, the fully connected layers act as a classifier on those features and assign the image to a class
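
The hand-off between the two parts can be followed by tracking how the shape of the data changes from layer to layer. The sketch below uses the standard output-size formula floor((n + 2p - k) / s) + 1 for an input of size n, kernel size k, padding p and stride s. The layer stack, the 28x28 input and the function names are assumptions chosen for illustration, not a specific published network.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Output length along one axis: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


def trace_shapes(height, width, channels, layers):
    """Return the (height, width, channels) shape after each layer."""
    shapes = [(height, width, channels)]
    for kind, cfg in layers:
        if kind == "conv":
            stride = cfg.get("stride", 1)
            padding = cfg.get("padding", 0)
            height = out_size(height, cfg["kernel"], stride, padding)
            width = out_size(width, cfg["kernel"], stride, padding)
            channels = cfg["filters"]  # one output channel per filter
        elif kind == "pool":
            # Non-overlapping pooling: the stride equals the window size.
            height = out_size(height, cfg["size"], stride=cfg["size"])
            width = out_size(width, cfg["size"], stride=cfg["size"])
        else:
            raise ValueError(f"unknown layer kind: {kind}")
        shapes.append((height, width, channels))
    return shapes


# Hypothetical feature-extraction stack for a 28x28 grayscale image.
layers = [
    ("conv", {"kernel": 5, "filters": 6}),
    ("pool", {"size": 2}),
    ("conv", {"kernel": 5, "filters": 16}),
    ("pool", {"size": 2}),
]
shapes = trace_shapes(28, 28, 1, layers)
for shape in shapes:
    print(shape)

# The classification part receives the last feature maps flattened into a vector.
h, w, c = shapes[-1]
flat_len = h * w * c
print(flat_len)  # 256
```

Here the feature-extraction part turns a 28x28x1 image into 4x4x16 feature maps, and the fully connected layers then work on the resulting 256 values.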


As CNNs are specifically designed for images, they offer the following advantages for image analysis:

  • Parameter Sharing : In a CNN, a single filter slides over the sub-regions of an image to produce a feature map, so the same weights are reused at every position. As a result, a convolutional layer needs far fewer parameters than a fully connected layer over the same input.
  • CNNs learn filters automatically : CNNs learn their filter values from the training data, so the filters do not have to be designed by hand.
  • Spatial features are preserved : Spatial features refer to the arrangement of pixel values in an image. A CNN works on the image as a 2D grid and so captures these features, whereas a fully connected neural network takes a flattened image as input and loses them.
  • Comparatively less preprocessing needed : Compared to other image classification algorithms, CNNs need less preprocessing.
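
The effect of parameter sharing is easy to check with some arithmetic. The sketch below counts weights and biases for a convolutional layer and for a fully connected layer. The layer sizes (16 filters of 3x3 on RGB input, 16 fully connected units) are assumptions picked only to make the comparison concrete.

```python
def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter; independent of the image size."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    """One weight per (input, unit) pair plus one bias per unit."""
    return inputs * units + units


# Hypothetical layers: 16 filters of 3x3 versus 16 fully connected units.
conv = conv_params(kernel=3, in_channels=3, filters=16)
dense_small = dense_params(inputs=32 * 32 * 3, units=16)    # 32x32 RGB image
dense_large = dense_params(inputs=224 * 224 * 3, units=16)  # 224x224 RGB image

print(conv)         # 448
print(dense_small)  # 49168
print(dense_large)  # 2408464
```

The convolutional layer keeps its 448 parameters whatever the image size, while the fully connected layer grows with every extra pixel.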


Even though CNNs have achieved excellent performance in image recognition, there are some drawbacks.

  • Limited rotation invariance : Without explicit data augmentation, CNNs are generally bad at classifying images with some degree of tilt or rotation
  • A lot of meaningful information is lost in Pooling layers : Each pooling layer discards the exact positions of features, so after multiple pooling layers CNNs lose precise and meaningful information about the image
  • It takes a very long time to train a CNN
  • CNNs need a large dataset to train
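
Data augmentation, mentioned in the first drawback above, means adding transformed copies of the training images so the network also sees tilted or rotated versions. The sketch below is a deliberately simple version that only produces 90-degree rotations of an image stored as a list of rows. Real pipelines usually also apply small-angle rotations, flips and crops, and the function names here are hypothetical.

```python
def rotate90(image):
    """Rotate a 2D list of pixel rows 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def augment_with_rotations(image):
    """Return the image followed by its 90, 180 and 270 degree rotations."""
    variants = [image]
    for _ in range(3):
        variants.append(rotate90(variants[-1]))
    return variants


# Hypothetical 2x2 image.
image = [
    [1, 2],
    [3, 4],
]
variants = augment_with_rotations(image)
for v in variants:
    print(v)
```

Each rotated copy keeps the label of the original image, so one labeled example becomes four. This also helps a little with the need for a large dataset.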