Naive Bayes is a supervised classification algorithm based on Bayes' theorem. It is called 'naive' because it assumes that the features are independent of one another, given the class, when predicting the class. As a classifier, it calculates the probability of each class given the features (the posterior probabilities) and selects the class with the highest posterior probability.

To understand how the Naive Bayes classifier works, let's first look at Bayes' theorem. It is given by,
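In its standard form, for two events A and B with P(B) > 0 (the symbols A and B are chosen here purely for illustration), the theorem can be written as:

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
```

Here P(A | B) is the posterior probability, P(B | A) is the likelihood, P(A) is the prior probability and P(B) is the marginal probability of B.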

Let's see how the Naive Bayes classifier works on data samples. Suppose we have a data sample with features f1 and f2 (features can be numerical or categorical) and we need to classify that sample into class 1 (C1) or class 2 (C2). The Naive Bayes classifier calculates the posterior probability of each class and then predicts the class with the higher posterior probability.

From Bayes' theorem, the posterior probability of each class is given by,
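Using C1, C2, f1 and f2 as defined above, the two posteriors can be written in standard notation as:

```latex
P(C_1 \mid f_1, f_2) = \frac{P(f_1, f_2 \mid C_1)\, P(C_1)}{P(f_1, f_2)}
\qquad
P(C_2 \mid f_1, f_2) = \frac{P(f_1, f_2 \mid C_2)\, P(C_2)}{P(f_1, f_2)}
```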

Because P(C1 | f1, f2) and P(C2 | f1, f2) share the same denominator, P(f1, f2), that term does not change which posterior is larger.

Therefore, we can compare the posterior probabilities using only the likelihoods and the prior probabilities.
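As a sketch of what this comparison looks like once the naive independence assumption is applied, the likelihood factors into one term per feature (the class index k is introduced here only for brevity):

```latex
P(C_k \mid f_1, f_2) \;\propto\; P(f_1, f_2 \mid C_k)\, P(C_k)
  \;=\; P(f_1 \mid C_k)\, P(f_2 \mid C_k)\, P(C_k),
  \qquad k \in \{1, 2\}
```

The equality on the right holds only under the assumption that f1 and f2 are independent given the class.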

Let's work through the following data with categorical features to get a better understanding.

Using this data, we are going to use the Naive Bayes classifier to predict whether players will play on a day with Outlook = Sunny, Temperature = Cool, Humidity = High and Windy = True.

First, using this data, we create a frequency table for every feature (Outlook, Temperature, Humidity and Windy) against the target class (Play).
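Building the frequency tables can be sketched in a few lines of Python. The rows below are an assumption: they are the widely used 14-row "weather" toy dataset, which has the same columns as the data described here but may not match it row for row. The names `ROWS`, `COLUMNS` and `frequency_table` are hypothetical.

```python
from collections import Counter

# ASSUMPTION: the classic 14-row "weather" toy dataset, used as a stand-in
# for the article's data. Columns: Outlook, Temperature, Humidity, Windy, Play.
COLUMNS = ("Outlook", "Temperature", "Humidity", "Windy", "Play")
ROWS = [
    ("Sunny", "Hot", "High", "False", "No"),
    ("Sunny", "Hot", "High", "True", "No"),
    ("Overcast", "Hot", "High", "False", "Yes"),
    ("Rainy", "Mild", "High", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "True", "No"),
    ("Overcast", "Cool", "Normal", "True", "Yes"),
    ("Sunny", "Mild", "High", "False", "No"),
    ("Sunny", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "Normal", "False", "Yes"),
    ("Sunny", "Mild", "Normal", "True", "Yes"),
    ("Overcast", "Mild", "High", "True", "Yes"),
    ("Overcast", "Hot", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "High", "True", "No"),
]


def frequency_table(rows, feature_index, target_index=-1):
    """Count how often each (feature value, target class) pair occurs."""
    return Counter((row[feature_index], row[target_index]) for row in rows)


# One frequency table per feature, keyed by the feature's column name.
tables = {
    name: frequency_table(ROWS, index)
    for index, name in enumerate(COLUMNS[:-1])
}

for name, table in tables.items():
    print(name)
    for (value, play), count in sorted(table.items()):
        print(f"  {value:<9} Play={play:<3} count={count}")
```

Each table maps a (feature value, Play) pair to its count, for example `tables["Outlook"][("Sunny", "No")]`.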

Since we are calculating the probability of each target (Yes, No) for the same set of feature values (Sunny, Cool, High, True), we don't have to calculate the marginal probabilities of the feature values (such as P(Sunny), P(Cool), P(High) and P(True)): they contribute only to the denominator, which is the same for both targets. As discussed already, we can compare the posterior probabilities using only the likelihoods and the prior probabilities, so let's calculate those from the frequency tables.

The prior probabilities are,

The likelihood probabilities are,
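Both kinds of probability can be read off the frequency tables as relative frequencies. As a sketch, assuming plain counting with no smoothing, and with N, N_c and N_{v,c} as symbols introduced here for illustration:

```latex
P(c) \approx \frac{N_c}{N},
\qquad
P(v \mid c) \approx \frac{N_{v,c}}{N_c}
```

where N is the total number of samples, N_c is the number of samples with target class c, and N_{v,c} is the number of samples that have feature value v and target class c.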

Now we calculate the posterior probability of each target, up to the shared denominator.

**P(Yes | Sunny, Cool, High, True)** is,

**P(No | Sunny, Cool, High, True)** is,
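Written out symbolically under the naive independence assumption, with the shared denominator dropped, the two quantities being compared take this form:

```latex
P(\text{Yes} \mid \text{Sunny}, \text{Cool}, \text{High}, \text{True})
  \propto P(\text{Sunny} \mid \text{Yes})\, P(\text{Cool} \mid \text{Yes})\,
          P(\text{High} \mid \text{Yes})\, P(\text{True} \mid \text{Yes})\, P(\text{Yes})

P(\text{No} \mid \text{Sunny}, \text{Cool}, \text{High}, \text{True})
  \propto P(\text{Sunny} \mid \text{No})\, P(\text{Cool} \mid \text{No})\,
          P(\text{High} \mid \text{No})\, P(\text{True} \mid \text{No})\, P(\text{No})
```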

Comparing these two values, we can see that **P(No | Sunny, Cool, High, True)** is higher than **P(Yes | Sunny, Cool, High, True)**.

Therefore, using the Naive Bayes classifier, we can predict that players will **not play** on a day with Outlook = Sunny, Temperature = Cool, Humidity = High and Windy = True. Likewise, we can use the Naive Bayes classifier to predict whether players will play for any given combination of feature values.
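The whole procedure can be sketched as a small Python function that scores any combination of feature values. This is a minimal illustration, not a reference implementation. It assumes the classic 14-row "weather" toy dataset as a stand-in for the article's data, and the names `ROWS`, `fit`, `scores` and `predict` are hypothetical.

```python
from collections import Counter
from fractions import Fraction

# ASSUMPTION: the classic 14-row "weather" toy dataset, used as a stand-in
# for the article's data. Columns: Outlook, Temperature, Humidity, Windy, Play.
ROWS = [
    ("Sunny", "Hot", "High", "False", "No"),
    ("Sunny", "Hot", "High", "True", "No"),
    ("Overcast", "Hot", "High", "False", "Yes"),
    ("Rainy", "Mild", "High", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "True", "No"),
    ("Overcast", "Cool", "Normal", "True", "Yes"),
    ("Sunny", "Mild", "High", "False", "No"),
    ("Sunny", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "Normal", "False", "Yes"),
    ("Sunny", "Mild", "Normal", "True", "Yes"),
    ("Overcast", "Mild", "High", "True", "Yes"),
    ("Overcast", "Hot", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "High", "True", "No"),
]


def fit(rows):
    """Build the class counts and the per-feature frequency counts."""
    class_counts = Counter(row[-1] for row in rows)
    pair_counts = Counter(
        (index, row[index], row[-1])
        for row in rows
        for index in range(len(row) - 1)
    )
    return class_counts, pair_counts


def scores(sample, class_counts, pair_counts):
    """Return prior * product of likelihoods for each class.

    The shared denominator is omitted, so these are proportional to the
    posteriors rather than equal to them. No smoothing is applied, so a
    feature value never seen with a class gives that class a score of 0.
    """
    total = sum(class_counts.values())
    result = {}
    for label, label_count in class_counts.items():
        score = Fraction(label_count, total)  # prior P(class)
        for index, value in enumerate(sample):
            # likelihood P(value | class) as a relative frequency
            score *= Fraction(pair_counts[(index, value, label)], label_count)
        result[label] = score
    return result


def predict(sample, class_counts, pair_counts):
    """Pick the class with the highest score."""
    sample_scores = scores(sample, class_counts, pair_counts)
    return max(sample_scores, key=sample_scores.get)


class_counts, pair_counts = fit(ROWS)
sample = ("Sunny", "Cool", "High", "True")
sample_scores = scores(sample, class_counts, pair_counts)
for label, score in sample_scores.items():
    print(f"{label}: {float(score):.4f}")
print("Prediction:", predict(sample, class_counts, pair_counts))
```

With the assumed rows, the score for No (about 0.0206) is higher than the score for Yes (about 0.0053), which agrees with the prediction above. Passing a different tuple to `predict` scores any other combination of feature values.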

In our next article, we will discuss how the Naive Bayes classifier works on data samples with numerical features.