After forward propagation, we backpropagate the error gradient (i.e., the derivative of the cost function, or error function, with respect to the weights and biases) to update the weight and bias parameters and reduce the error.
To understand the mathematics behind backpropagation, let's work through an example using the predicted outputs obtained from forward propagation.
Since this is a binary classification problem, the cost function (error function) of the network is given by,
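For a binary classification problem, the usual cost function is binary cross-entropy. Here is a minimal NumPy sketch of that cost; the sample values for `y_true` and `y_pred` are illustrative, not the article's actual data.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy cost, averaged over the samples."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Illustrative labels and predicted probabilities (not the article's data)
y_true = np.array([1, 0, 1, 1])
y_pred = np.array([0.9, 0.2, 0.8, 0.6])
cost = binary_cross_entropy(y_true, y_pred)
```

The clipping inside the function is a common numerical safeguard so that a prediction of exactly 0 or 1 does not produce an infinite cost.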
As we need the error gradient to update the parameters, we have to find the derivative of the error function with respect to the weights and bias.
First, let's find the derivative of E with respect to a single weight Wj,
From that, we can write the derivative of E with respect to the weights W as,
Likewise, we can find the derivative of E with respect to the bias b.
Using the derived equations, we can compute our error gradients. The error gradients with respect to the weights are,
The error gradient with respect to the bias is,
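The derived equations can be sketched in NumPy as follows. This assumes a single sigmoid output unit with binary cross-entropy, for which the error term simplifies to (y_pred − y_true); the example inputs, weights, and labels are illustrative placeholders, not the article's actual values.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def gradients(X, y_true, y_pred):
    """Gradients of the cost w.r.t. weights and bias.

    With a sigmoid output and binary cross-entropy, the error term is
    (y_pred - y_true), so dE/dWj = mean((y_pred - y_true) * xj) and
    dE/db = mean(y_pred - y_true).
    """
    m = X.shape[0]
    error = y_pred - y_true          # shape (m,)
    dW = X.T @ error / m             # shape (n_features,)
    db = np.mean(error)
    return dW, db

# Illustrative data (not the article's values)
X = np.array([[0.5, 1.0], [1.5, 2.0], [1.0, 0.5]])
W = np.array([0.1, 0.2])
b = 0.0
y_true = np.array([1, 0, 1])
y_pred = sigmoid(X @ W + b)          # forward propagation
dW, db = gradients(X, y_true, y_pred)
```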
After finding the error gradients, we can update our weights and bias using the following equations,
If we set our learning rate to 0.1, our updated weights and bias are,
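The update step itself is a one-line gradient-descent rule per parameter. A minimal sketch with a learning rate of 0.1, using placeholder values for the weights, bias, and gradients rather than the article's actual numbers:

```python
import numpy as np

learning_rate = 0.1

# Placeholder current parameters and gradients (not the article's numbers)
W = np.array([0.1, 0.2])
b = 0.0
dW = np.array([0.094, 0.202])
db = -0.085

W = W - learning_rate * dW   # W := W - (learning rate) * dE/dW
b = b - learning_rate * db   # b := b - (learning rate) * dE/db
```

Each parameter moves a small step in the direction opposite to its gradient, which is what decreases the cost.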
This is how the weights and bias are updated using backpropagation. The network then performs forward propagation with the updated weights and bias and predicts the outputs. From those predictions, backpropagation updates the weights and bias again, and this cycle of forward propagation and backpropagation continues until the cost function of the network reaches its minimum. This is how the network learns from the training samples.
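The full forward-backward cycle described above can be sketched as a training loop. This is a minimal sketch assuming a single sigmoid unit trained with gradient descent on binary cross-entropy; the toy dataset and the `train` function name are illustrative, not from the article.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def train(X, y, learning_rate=0.1, epochs=1000):
    """Alternate forward propagation and backpropagation for a fixed
    number of epochs, updating the weights and bias each time."""
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.01, size=X.shape[1])  # small random init
    b = 0.0
    for _ in range(epochs):
        y_pred = sigmoid(X @ W + b)              # forward propagation
        error = y_pred - y                       # backpropagated error term
        W -= learning_rate * X.T @ error / len(y)
        b -= learning_rate * np.mean(error)
    return W, b

# Toy data: label is 1 when the second feature exceeds the first
X = np.array([[0.0, 1.0], [1.0, 0.0], [0.2, 0.9], [0.8, 0.1]])
y = np.array([1, 0, 1, 0])
W, b = train(X, y, epochs=5000)
preds = (sigmoid(X @ W + b) > 0.5).astype(int)
```

In practice the loop would stop when the cost stops improving rather than after a fixed epoch count, but a fixed count keeps the sketch simple.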
In the next article we will discuss the implementation of forward propagation and backpropagation from scratch and using machine learning libraries.