Principal Component Analysis (PCA) is a dimensionality reduction technique that transforms a higher-dimensional dataset into a lower-dimensional one while retaining most of the information.
Before going into the mathematics behind the PCA algorithm, let’s understand the concept in simple words. For instance, let’s take some data points with 2 dimensions and see how PCA reduces their dimensions.
First, for each feature of each sample, the algorithm subtracts the mean of that feature. You can think of this as moving the data points towards the origin so that the mean point of the features lies on the origin of the axes (refer to Fig 1). Moving the data points this way doesn’t affect how they are placed relative to each other.
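The centering step can be sketched in NumPy as follows. The data values and the names `points`, `centered` and `pairwise_distances` are hypothetical, chosen only for illustration, and this sketch stores one sample per row. It checks that the feature means move to the origin and that the distances between points stay the same.

```python
import numpy as np

# Hypothetical 2-D data: one row per sample, one column per feature.
points = np.array([[2.0, 1.0],
                   [3.0, 4.0],
                   [5.0, 4.0],
                   [6.0, 7.0]])

# Subtract each feature's mean from that feature (column-wise).
feature_means = points.mean(axis=0)
centered = points - feature_means


def pairwise_distances(data):
    """Euclidean distance between every pair of rows."""
    diffs = data[:, None, :] - data[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1))


# The feature means of the centered data are (numerically) zero.
print(centered.mean(axis=0))

# Shifting the points does not change how far apart they are.
print(np.allclose(pairwise_distances(points), pairwise_distances(centered)))
```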
After moving, PCA finds the direction through the origin along which the data points have the largest variance; this is the first principal component (PC1). It then finds the direction of the second largest variance through the origin, with the condition that it must be perpendicular to the first principal component; this is the second principal component (PC2). Since our data points are in 2 dimensions, PCA finds only 2 principal components: for data with n dimensions, PCA finds n principal components that are orthogonal (i.e. perpendicular) to each other.
After finding the principal components, we have to tell the algorithm how many principal components we want to keep (i.e. the number of dimensions or features) or how much variance should be retained. As our data points are in 2 dimensions, we can reduce them to 1 dimension, so we select PC1 only. Once the number of principal components is defined, PCA projects the data onto the selected principal components. In our case, the data points are projected onto PC1 and end up with 1 dimension only.
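As a rough illustration of the 2-D case, the sketch below finds PC1 and PC2 for some hypothetical points and projects the points onto PC1. It assumes the principal components are obtained as eigenvectors of the covariance matrix, which is the approach described in the mathematical part of this article. The data and variable names are illustrative only.

```python
import numpy as np

# Hypothetical 2-D data: one row per sample, one column per feature.
points = np.array([[2.0, 1.0],
                   [3.0, 4.0],
                   [5.0, 4.0],
                   [6.0, 7.0]])
centered = points - points.mean(axis=0)

# Covariance of the two features (rowvar=False: columns are features).
cov = np.cov(centered, rowvar=False)

# eigh is meant for symmetric matrices and returns eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(cov)
pc1 = eigenvectors[:, -1]  # direction of largest variance
pc2 = eigenvectors[:, -2]  # direction of second largest variance

# PC1 and PC2 are perpendicular: their dot product is (numerically) zero.
print(np.isclose(pc1 @ pc2, 0.0))

# Keep PC1 only: each 2-D point becomes a single coordinate along PC1.
projected = centered @ pc1
print(projected.shape)  # (4,)
```

The sign of an eigenvector is arbitrary, so the projected coordinates may come out mirrored without changing the result in any meaningful way.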
This is how PCA reduces the dimensions. For a better understanding, please refer to the images below.
Now let’s move to how PCA functions mathematically with an example. Following is our sample dataset.
Let’s represent it as a matrix X (3*4), where the number of features = 3 and the number of samples = 4.
We have to find the mean of each feature (i.e. each row) and subtract it from the corresponding feature of each sample. The means of the first, second and third features are 3, 3 and 2 respectively.
Now we have to find the covariance matrix C of the mean-centered matrix A. The covariance matrix is given by,
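A common form, assumed here, is C = A Aᵀ / (n − 1), where n is the number of samples. Some texts divide by n instead, which rescales the eigenvalues but leaves the eigenvectors unchanged. A minimal NumPy sketch of this step follows; the values in `X` are hypothetical and are not the dataset used in this article.

```python
import numpy as np

# Hypothetical data: 3 features (rows) and 4 samples (columns).
X = np.array([[1.0, 2.0, 3.0, 6.0],
              [2.0, 4.0, 4.0, 6.0],
              [5.0, 1.0, 3.0, 3.0]])

# Mean-center each feature (each row).
A = X - X.mean(axis=1, keepdims=True)

# Sample covariance matrix, assuming the (n - 1) convention.
n_samples = X.shape[1]
C = A @ A.T / (n_samples - 1)

print(C.shape)                    # (3, 3): one row and column per feature
print(np.allclose(C, np.cov(X)))  # np.cov treats rows as variables by default
```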
After finding the covariance matrix, we have to find its eigenvalues and normalized eigenvectors. Then we sort the eigenvalues, together with their corresponding eigenvectors, in order of decreasing eigenvalue.
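One possible way to carry out this step is with `numpy.linalg.eigh`, which is intended for symmetric matrices such as a covariance matrix and returns unit-length eigenvectors as columns. The matrix `X` below is hypothetical, and the (n − 1) covariance convention is an assumption of this sketch.

```python
import numpy as np

# Hypothetical data: 3 features (rows) and 4 samples (columns).
X = np.array([[1.0, 2.0, 3.0, 6.0],
              [2.0, 4.0, 4.0, 6.0],
              [5.0, 1.0, 3.0, 3.0]])
A = X - X.mean(axis=1, keepdims=True)
C = A @ A.T / (X.shape[1] - 1)

# eigh returns eigenvalues in ascending order; column i of `eigenvectors`
# is the normalized eigenvector that belongs to eigenvalues[i].
eigenvalues, eigenvectors = np.linalg.eigh(C)

# Reorder so that the largest eigenvalue comes first, keeping each
# eigenvector paired with its eigenvalue.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

print(eigenvalues)   # decreasing order
print(eigenvectors)  # columns are the principal directions
```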
Now we are going to select the number of principal components (i.e. eigenvectors) we need. Selecting it manually isn’t a good approach. Instead, you can find the number k of principal components that retains more than 0.9 (90%) of the variance (you can choose whatever amount of variance you need to retain). For that, we can use the following equation, which divides the sum of the k largest eigenvalues by the sum of all eigenvalues,
For k = 2 (selecting only 2 eigenvalues out of 3),
Since k = 2 retains more than 0.9 of the variance, we can select only 2 principal components (i.e. eigenvectors).
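The selection rule can be sketched as a small helper. The function name `choose_k`, the example eigenvalues, and the use of "at least" rather than "more than" the threshold are assumptions made for illustration.

```python
import numpy as np


def choose_k(eigenvalues, threshold=0.9):
    """Return the smallest k whose k largest eigenvalues retain at least
    `threshold` of the total variance, plus the cumulative ratios."""
    sorted_vals = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    # retained[i] is the fraction of variance kept by the first i + 1 components.
    retained = np.cumsum(sorted_vals) / sorted_vals.sum()
    # First position where the retained fraction reaches the threshold.
    k = int(np.searchsorted(retained, threshold)) + 1
    return min(k, len(sorted_vals)), retained


# Hypothetical eigenvalues, already in decreasing order.
k, retained = choose_k([6.0, 3.5, 0.5], threshold=0.9)
print(k)         # 2
print(retained)  # cumulative fraction of variance for k = 1, 2, 3
```

With these hypothetical eigenvalues, one component keeps 60% of the variance and two keep 95%, so k = 2 is the smallest choice that reaches the 90% target.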
Finally, we can project our data points onto the selected principal components.
Here Z (2*4) is our new data matrix with 2 features and 4 samples. You can now clearly see how PCA reduced the data matrix X (3*4), which has 3 features, to the new data matrix Z (2*4), which has only 2 features.
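Putting the steps together, the projection can be sketched as below. This assumes the projection takes the form Z = Wᵀ A, where the columns of W are the k selected eigenvectors and A is the mean-centered data. The function name `pca_project` and the values in `X` are hypothetical and are not the dataset used in this article.

```python
import numpy as np


def pca_project(X, k):
    """Project a (features x samples) matrix onto its top-k principal components."""
    # Mean-center each feature (row).
    A = X - X.mean(axis=1, keepdims=True)
    # Covariance matrix, assuming the (n - 1) convention.
    C = A @ A.T / (X.shape[1] - 1)
    # Eigen-decomposition; eigh returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(C)
    # Indices of the k largest eigenvalues, largest first.
    order = np.argsort(eigenvalues)[::-1][:k]
    W = eigenvectors[:, order]  # shape: (features x k)
    return W.T @ A              # shape: (k x samples)


# Hypothetical data: 3 features (rows) and 4 samples (columns).
X = np.array([[1.0, 2.0, 3.0, 6.0],
              [2.0, 4.0, 4.0, 6.0],
              [5.0, 1.0, 3.0, 3.0]])

Z = pca_project(X, k=2)
print(Z.shape)  # (2, 4)
```

The rows of Z are uncorrelated with each other, which is one way to sanity-check a projection like this.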
In our next article, we will see how to implement PCA both from scratch and with machine learning libraries.