In this article, I'm going to discuss the implementation of 'Forward propagation' and 'Backpropagation' of an Artificial Neural Network from scratch.
Before diving into the implementation, I recommend going through these two articles to refresh your understanding of the concepts and mathematics of forward propagation and backpropagation in an ANN.
Now let's start implementing.
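We are going to create a simple neural network (NN), i.e. one without hidden layers, for a binary classification problem. The following is the dataset that we are going to use (two binary features and one binary output per sample):
Feature 1   Feature 2   Output
0           0           0
1           0           0
0           1           0
1           1           1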
First let's create the input data matrix such that each row represents a feature and each column represents a sample.
import numpy as np
x = np.array([[0, 1, 0, 1], [0, 0, 1, 1]])
The actual outputs can then be created as a row vector:
y = np.array ([ [ 0, 0, 0, 1 ] ])
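As a quick sanity check on this layout, the shapes of the two arrays can be inspected. This is only a minimal sketch; the names `n_features` and `n_samples` are my own, chosen for illustration, and are not used elsewhere in the article.

```python
import numpy as np

x = np.array([[0, 1, 0, 1],
              [0, 0, 1, 1]])
y = np.array([[0, 0, 0, 1]])

# Rows are features, columns are samples (illustrative names)
n_features, n_samples = x.shape

print(x.shape)   # (2, 4)
print(y.shape)   # (1, 4)

# Column j of x is one sample; y[0, j] is its actual output
for j in range(n_samples):
    print(x[:, j], "->", y[0, j])
```

Reading the columns this way makes it easy to confirm that only the sample with both features equal to 1 has an output of 1.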
In forward propagation, we calculate the weighted sum of the inputs, add the bias to that sum, and then pass the result through an activation function. For that, we first have to define the weight matrix and the bias. Let's start from there.
By looking at the data, we can say that we need two neurons in the input layer (since each sample has two features) and a single neuron in the output layer (in our case there are no hidden layers). From that, we can find the size of the weight matrix between the input layer (Layer 1 of our ANN) and the output layer (Layer 2 of our ANN) using the following formula:
$$\text{size of } W = (\text{number of neurons in Layer 1}) \times (\text{number of neurons in Layer 2}) = 2 \times 1$$
After finding the size of weight matrix, let's create it.
# x.shape[0] gives the number of neurons (number of features) in layer 1 (input layer)
# y.shape[0] gives the number of neurons in layer 2 (output layer)
W = np.zeros((x.shape[0], y.shape[0]))
After creating weight matrix, let's define the weights as 0.4 and 0.2 and bias as 1.
W[0], W[1] = 0.4, 0.2
b = 1
The weighted sum of the inputs plus the bias, which we obtain during forward propagation of our model, is given by
$$z = W^{T} x + b$$
Let's create matrix z.
z = np.dot ( W.T , x ) + b
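To see what this product looks like for our data, here is a small self-contained sketch. It assumes the weights 0.4 and 0.2 and the bias 1 chosen above, and writes W directly as a 2 x 1 array instead of filling a zero matrix.

```python
import numpy as np

x = np.array([[0, 1, 0, 1],
              [0, 0, 1, 1]])
W = np.array([[0.4],
              [0.2]])      # shape (2, 1): one weight per input feature
b = 1

# (1, 2) @ (2, 4) -> (1, 4); the scalar bias is broadcast to every sample
z = np.dot(W.T, x) + b

print(z.shape)   # (1, 4)
print(z)         # approximately [[1.  1.4 1.2 1.6]]
```

Each entry of z belongs to one sample (one column of x), which is why z has the same shape as y.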
Then we have to pass it through the activation function. Here we are going to use the sigmoid activation function, which is given by
$$s = \sigma(z) = \frac{1}{1 + e^{-z}}$$
Let's pass the matrix z through the Sigmoid function.
s = 1 / ( 1 + np.exp(-z) )
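The behaviour of the sigmoid is easier to see when it is wrapped in a small function. The helper name `sigmoid` below is my own and is only a sketch; the article's code keeps the expression inline. The values in `z` are the weighted sums obtained from the weights 0.4 and 0.2 and the bias 1.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function 1 / (1 + exp(-z))."""
    return 1 / (1 + np.exp(-z))

z = np.array([[1.0, 1.4, 1.2, 1.6]])   # weighted sums for W = [0.4, 0.2], b = 1
s = sigmoid(z)

print(sigmoid(0.0))                 # 0.5
print(np.all((s > 0) & (s < 1)))    # True: outputs always lie between 0 and 1
print(np.all(s >= 0.5))             # True: every entry of z is positive
```

Because the sigmoid of 0 is exactly 0.5, a positive weighted sum always lands above 0.5 and a negative one below it.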
After passing through the sigmoid activation function, we have to define a threshold value to classify the output as 0 or 1. We are going to use a threshold value of 0.5. Then,
$$y_{pred} = \begin{cases} 1 & \text{if } s \ge 0.5 \\ 0 & \text{if } s < 0.5 \end{cases}$$
Let's implement that in code,
s = s >= 0.5
y_pred = np.array(s, dtype='int64')
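With the initial weights and bias, every entry of z is positive, so every sigmoid output is above 0.5 and our predicted outputs will be [[1 1 1 1]].
You can see that there are errors between our actual and predicted outputs. Therefore, using backpropagation, let's update our weights and bias to reduce the error of our network.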
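To make the comparison with the actual outputs concrete, the sketch below runs the whole forward pass once and counts the mismatches. It assumes the initial weights 0.4 and 0.2 and the bias 1, and uses `astype` as an equivalent way of converting the boolean mask to integers.

```python
import numpy as np

x = np.array([[0, 1, 0, 1],
              [0, 0, 1, 1]])
y = np.array([[0, 0, 0, 1]])
W = np.array([[0.4], [0.2]])
b = 1

z = np.dot(W.T, x) + b                 # weighted sums
s = 1 / (1 + np.exp(-z))               # sigmoid activations
y_pred = (s >= 0.5).astype('int64')    # threshold at 0.5

print(y_pred)                  # [[1 1 1 1]]
print(y)                       # [[0 0 0 1]]
print(np.sum(y_pred != y))     # 3
```

Three of the four samples are misclassified at this point, which is what the parameter updates are meant to correct.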
During backpropagation, we propagate the error gradients with respect to the weights and bias backwards to update our parameters. The error gradient with respect to the weights can be calculated using the following equation (to see how this equation was derived, refer to this article):
$$dW = \frac{1}{m}\, x\, (y_{pred} - y)^{T}$$
The error gradient with respect to the bias can be calculated using the following equation (to see how this equation was derived, refer to this article):
$$db = \frac{1}{m} \sum_{i=1}^{m} \left( y_{pred}^{(i)} - y^{(i)} \right)$$
Here m is the total number of samples.
Let's calculate the error gradients from above equations.
# x.shape[1] gives the total number of samples
m = x.shape[1]
dW = np.dot(x, (y_pred - y).T) / m
db = np.sum(y_pred - y) / m
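The shapes involved in the gradient calculation can be checked with a short sketch. It starts from the first-pass predictions (all ones, which is what the initial weights 0.4 and 0.2 and the bias 1 produce), and the intermediate name `error` is my own, used only for illustration.

```python
import numpy as np

x = np.array([[0, 1, 0, 1],
              [0, 0, 1, 1]])
y = np.array([[0, 0, 0, 1]])
y_pred = np.array([[1, 1, 1, 1]])   # first-pass predictions for W = [0.4, 0.2], b = 1

m = x.shape[1]                      # number of samples (columns of x)
error = y_pred - y                  # shape (1, 4): [[1 1 1 0]]

dW = np.dot(x, error.T) / m         # (2, 4) @ (4, 1) -> (2, 1), same shape as W
db = np.sum(error) / m              # scalar, same as b

print(dW.shape)   # (2, 1)
print(dW)         # [[0.25]
                  #  [0.25]]
print(db)         # 0.75
```

Note that dW has the same shape as the weight matrix, so it can be subtracted from it directly.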
After finding the error gradients, we can update our weights and bias using the following equations, where a is the learning rate:
$$W = W - a \cdot dW$$
$$b = b - a \cdot db$$
Let's define the learning rate as 0.1 and update our weights and bias.
a = 0.1
W = W - a * dW
b = b - a * db
Our updated weights and bias will be W = [[0.375], [0.175]] and b = 0.925.
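A single update only nudges the parameters. The sketch below applies one step and then predicts again; the helper `predict` and the names `W_new` and `b_new` are hypothetical, introduced here only to keep the old and new values apart.

```python
import numpy as np

x = np.array([[0, 1, 0, 1],
              [0, 0, 1, 1]])
y = np.array([[0, 0, 0, 1]])

W = np.array([[0.4], [0.2]])
b = 1
a = 0.1
m = x.shape[1]

def predict(W, b):
    """Forward pass followed by the 0.5 threshold (hypothetical helper)."""
    s = 1 / (1 + np.exp(-(np.dot(W.T, x) + b)))
    return (s >= 0.5).astype('int64')

# One backpropagation step from the initial parameters
y_pred = predict(W, b)
W_new = W - a * np.dot(x, (y_pred - y).T) / m
b_new = b - a * np.sum(y_pred - y) / m

print(W_new.ravel())            # approximately [0.375 0.175]
print(b_new)                    # approximately 0.925
print(predict(W_new, b_new))    # [[1 1 1 1]]: still three errors after one step
```

Since the predictions have not changed yet, the forward and backward passes have to be repeated many times.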
Then the network will again predict the output (forward propagation) using the updated weights and bias and then perform backpropagation. The process of forward propagation and backpropagation will continue until the error of the network reaches its minimum.
Therefore, let's repeat forward propagation and backpropagation 1000 times (iterations) so that our network learns well and the error is reduced.
for i in range(1000):
    z = np.dot(W.T, x) + b
    s = 1 / (1 + np.exp(-z))
    s = s >= 0.5
    y_pred = np.array(s, dtype='int64')
    dW = np.dot(x, (y_pred - y).T) / m
    db = np.sum(y_pred - y) / m
    W = W - a * dW
    b = b - a * db
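To watch the error shrink, the same loop can record the number of misclassified samples in each iteration and stop once there are none. This is only a sketch of one possible variation: the list `errors_per_iter`, the counter `n_wrong`, and the early `break` are my additions and are not part of the article's loop.

```python
import numpy as np

x = np.array([[0, 1, 0, 1],
              [0, 0, 1, 1]])
y = np.array([[0, 0, 0, 1]])

W = np.array([[0.4], [0.2]])
b = 1
m = x.shape[1]
a = 0.1

errors_per_iter = []                      # hypothetical bookkeeping list
for i in range(1000):
    # Forward propagation
    z = np.dot(W.T, x) + b
    s = 1 / (1 + np.exp(-z))
    y_pred = (s >= 0.5).astype('int64')

    # Count misclassified samples and stop when nothing is left to correct
    n_wrong = int(np.sum(y_pred != y))
    errors_per_iter.append(n_wrong)
    if n_wrong == 0:
        break

    # Backpropagation and parameter update
    dW = np.dot(x, (y_pred - y).T) / m
    db = np.sum(y_pred - y) / m
    W = W - a * dW
    b = b - a * db

print(errors_per_iter[0])     # 3: the first pass misclassifies three samples
print(errors_per_iter[-1])    # error count of the last pass that was run
print(len(errors_per_iter))   # number of forward passes actually run
```

Stopping early does not change the result, because once every prediction is correct both gradients are zero and the parameters no longer move.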
Let's see our code all together.
import numpy as np

# Creating the input matrix and the actual outputs
x = np.array([[0, 1, 0, 1], [0, 0, 1, 1]])
y = np.array([[0, 0, 0, 1]])

# Defining the size of the weight matrix
W = np.zeros((x.shape[0], y.shape[0]))

# Initializing the weights and bias
W[0], W[1] = 0.4, 0.2
b = 1

# Assigning the number of samples
m = x.shape[1]

# Defining the learning rate
a = 0.1

# Doing forward propagation and backpropagation for 1000 iterations
for i in range(1000):
    z = np.dot(W.T, x) + b
    s = 1 / (1 + np.exp(-z))
    s = s >= 0.5
    y_pred = np.array(s, dtype='int64')
    dW = np.dot(x, (y_pred - y).T) / m
    db = np.sum(y_pred - y) / m
    W = W - a * dW
    b = b - a * db
After 1000 iterations, the learned weights and bias are stored in W and b.
Using those parameters, our predicted outputs are [[0 0 0 1]].
You can see there is 'no error' between our actual and predicted outputs. This is how we train a simple neural network using forward propagation and backpropagation.