Let's understand the Naive Bayes Algorithm!!

Naive Bayes is a commonly used algorithm for classification problems. It uses the concept of probability to make predictions. Today we will:

  1. Understand the prerequisite mathematical knowledge required.

  2. Solve a problem statement using the Naive Bayes machine learning technique.

Prerequisites

Naive Bayes is built on Bayes' theorem, which relies on the concepts of "independent events" and "dependent events".

Independent and Dependent Events

Independent events:

  • Independent events are those where the outcome of one event does not affect the outcome of another event.

  • In simpler terms, what happens in the first event does not influence what happens in the second event.

  • For example, if you flip a coin twice, the outcome of the first coin flip (whether it's heads or tails) doesn't affect the outcome of the second coin flip. Each coin flip is independent of the other.

Dependent events:

  • Dependent events are those where the outcome of one event does affect the outcome of another event.

  • In other words, what happens in the first event has an impact on what happens in the second event.

  • For example, if you draw a queen from a deck and don't put it back, the probability of drawing a queen on the second draw changes because one queen has been removed. The events are dependent because the first draw influences the possibilities for the second (a small numeric sketch of both cases follows this list).
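To make this concrete, here is a small Python sketch of both cases, using the coin and deck numbers from the examples above:

```python
from fractions import Fraction as F

# Independent events: two coin flips.
# The first flip does not change the second, so the probabilities just multiply.
p_two_heads = F(1, 2) * F(1, 2)      # 1/4

# Dependent events: drawing two queens from a 52-card deck without replacement.
# After the first queen is removed, only 3 queens remain among 51 cards,
# so the second draw's probability depends on the first.
p_two_queens = F(4, 52) * F(3, 51)   # 1/221

print(p_two_heads, p_two_queens)     # 1/4 1/221
```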

Now that we are clear on the idea of independent and dependent events, let's discuss Bayes' theorem.

Bayes' Theorem

Suppose there are two dependent events A and B, and we wish to find the probability of event A given that event B has occurred. This can be accomplished using Bayes' theorem.

$$P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}$$

Let's break down the formula to understand it clearly (a quick numeric check in code follows the list).

  • P(A|B) is the probability of event A given event B has happened (posterior probability).

  • P(B|A) is the probability of event B given that event A has happened (likelihood).

  • P(A) is the probability of event A occurring (prior probability).

  • P(B) is the probability of event B occurring.
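Here is that numeric check as a small Python sketch. The numbers are invented purely for illustration: a condition with 1% prevalence, and a test with a 90% true-positive rate and a 5% false-positive rate.

```python
# Bayes' theorem on made-up numbers: event A = "has the condition",
# event B = "test comes back positive".
p_A = 0.01              # P(A): prior
p_B_given_A = 0.90      # P(B|A): likelihood
p_B_given_not_A = 0.05  # false-positive rate

# P(B) via the law of total probability.
p_B = p_B_given_A * p_A + p_B_given_not_A * (1 - p_A)

# Posterior: P(A|B) = P(B|A) * P(A) / P(B)
p_A_given_B = p_B_given_A * p_A / p_B
print(round(p_A_given_B, 3))  # 0.154 -- a positive test raises 1% to ~15%
```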

Now the question is: how does this formula apply to classification problems?

Suppose we have three independent features (x1, x2, x3) and a dependent feature y. Then we can write:

$$P(y \mid x_1, x_2, x_3) = \frac{P(x_1, x_2, x_3 \mid y) \cdot P(y)}{P(x_1, x_2, x_3)}$$

The terms of this formula are estimated from the training data, and our model uses them to make predictions. In the next section, we will see exactly how this formula is used, with a problem statement and an example dataset.
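In practice, libraries do this counting and multiplication for you. Below is a minimal sketch using scikit-learn's CategoricalNB; the toy rows and column values are made up purely for illustration.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder
from sklearn.naive_bayes import CategoricalNB

# Toy categorical training data: [outlook, temperature] -> played tennis?
X_raw = np.array([["sunny", "hot"], ["overcast", "mild"], ["rain", "cool"],
                  ["sunny", "mild"], ["overcast", "hot"], ["rain", "mild"]])
y = np.array(["no", "yes", "yes", "no", "yes", "yes"])

# CategoricalNB expects integer-encoded categories, not raw strings.
enc = OrdinalEncoder()
X = enc.fit_transform(X_raw)

model = CategoricalNB()  # applies Laplace smoothing (alpha=1.0) by default
model.fit(X, y)

x_test = enc.transform([["sunny", "hot"]])
print(model.predict(x_test), model.predict_proba(x_test))
```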

Example Problem Statement

We will work with a very commonly used dataset in machine learning, the "Play Tennis" dataset. Our task is to determine whether a person will go out to play tennis based on different weather conditions.

First of all, we will start by creating tables for each categorical feature.
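Here is the table for the Outlook feature. The Sunny row matches the example explained below; the Overcast and Rain counts assume the usual 14-row version of this dataset.

| Outlook  | Yes | No | P(Y) | P(N) |
|----------|-----|----|------|------|
| Sunny    | 2   | 3  | 2/9  | 3/5  |
| Overcast | 4   | 0  | 4/9  | 0/5  |
| Rain     | 3   | 2  | 3/9  | 2/5  |
| Total    | 9   | 5  |      |      |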

Explanation of each column:

  • Yes -> No. of times the output was yes for a particular outlook

  • No -> No. of times the output was no for a particular outlook

    Eg: when Outlook was Sunny, the output was yes 2 times and no 3 times

  • P(Y) -> P(Outlook | Yes)

    Eg: P(Sunny | Yes) -> the probability that the outlook was Sunny when the person went to play tennis

  • P(N) -> P(Outlook | No)

    Eg: P(Overcast | No) -> the probability that the outlook was Overcast when the person didn't play tennis

Similarly, we create the tables for the other categorical features as well.

We can observe from the table that there are 9 instances of yes and 5 instances of no.

Now, based on our Bayes theorem, the probability that a person will go to play or not based on certain features is given by:

$$P(\text{yes} \mid T, H, W, O) = \frac{P(T, H, W, O \mid \text{yes}) \cdot P(\text{yes})}{P(T, H, W, O)} \qquad \ldots \text{Eq. (1)}$$

$$P(\text{no} \mid T, H, W, O) = \frac{P(T, H, W, O \mid \text{no}) \cdot P(\text{no})}{P(T, H, W, O)} \qquad \ldots \text{Eq. (2)}$$

Where T, H, W, and O stand for Temperature, Humidity, Wind, and Outlook respectively. We can observe that the denominators of both terms are the same, which means we can ignore this common denominator when comparing the two predictions.

For Equation 1, the first term of the numerator can be broken down further. Because Naive Bayes assumes the features are conditionally independent given the class (this is the "naive" assumption), the joint likelihood factors into a product:

$$P(T, H, W, O \mid \text{yes}) = P(T \mid \text{yes}) \cdot P(H \mid \text{yes}) \cdot P(W \mid \text{yes}) \cdot P(O \mid \text{yes})$$

The same can also be done for Equation 2.

Now, suppose we are given a test data point: Hot, High, Strong, and Sunny. What will the output be?

Using Equation 1, and neglecting the denominator term as discussed, the probability that the person will go out to play tennis is proportional to:

$$P(\text{yes} \mid \text{hot}, \text{high}, \text{strong}, \text{sunny}) \propto P(\text{hot} \mid \text{yes}) \cdot P(\text{high} \mid \text{yes}) \cdot P(\text{strong} \mid \text{yes}) \cdot P(\text{sunny} \mid \text{yes}) \cdot P(\text{yes})$$

$$= \frac{2}{9} \cdot \frac{3}{9} \cdot \frac{3}{9} \cdot \frac{2}{9} \cdot \frac{9}{14} \approx 0.0035$$

Similarly, we calculate the value for the person not playing tennis using Equation 2, again neglecting the denominator.

Suppose the two quantities come out to be Y and N respectively.

If Y > N, then the output will be yes, i.e. the person will go to play tennis; otherwise, the person will not. The sketch below works this example out end to end.
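Here is a minimal Python sketch of this worked example. The yes-side fractions are the ones computed above; the no-side fractions (2/5, 4/5, 3/5, 3/5) and the prior 5/14 assume the usual counts of the 14-row Play Tennis dataset.

```python
from fractions import Fraction as F

# Likelihood tables P(feature value | class) for the test point
# (hot, high, strong, sunny). The "yes" row matches the fractions above;
# the "no" row assumes the usual 14-row Play Tennis counts.
likelihoods = {
    "yes": {"hot": F(2, 9), "high": F(3, 9), "strong": F(3, 9), "sunny": F(2, 9)},
    "no":  {"hot": F(2, 5), "high": F(4, 5), "strong": F(3, 5), "sunny": F(3, 5)},
}
priors = {"yes": F(9, 14), "no": F(5, 14)}

def score(cls, values):
    """Numerator of Bayes' theorem: P(x1|c) * ... * P(xk|c) * P(c).
    The shared denominator P(x1, ..., xk) is ignored, as discussed above."""
    result = priors[cls]
    for v in values:
        result *= likelihoods[cls][v]
    return result

test_point = ["hot", "high", "strong", "sunny"]
Y = score("yes", test_point)
N = score("no", test_point)
print(f"Y = {float(Y):.4f}, N = {float(N):.4f}")  # Y = 0.0035, N = 0.0411
print("Prediction:", "yes" if Y > N else "no")    # -> no
```

With these assumed counts, N comes out larger than Y, so the prediction for this particular test point would be no.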

I hope this post was helpful in giving a general overview of the Naive Bayes algorithm in machine learning and its main components.