Naive Bayes is a probabilistic machine learning method that solves classification problems by applying Bayes' theorem.

This article explains the Naive Bayes algorithm and all the ideas behind it in detail, leaving no gaps in understanding.

**Introduction**

Naive Bayes is a probabilistic machine learning method that can be applied to a wide variety of classification problems.

Typical applications include document classification, sentiment analysis, and spam filtering. The name comes from the Rev. Thomas Bayes (1702–1761), in whose writings the underlying theorem originates.

**But why is it called ‘Naive’?**

"Naive" refers to the assumption that the features making up the model are independent of one another. In other words, changing the value of one feature does not directly change the value of any other feature used by the algorithm.

Naive Bayes appears to be a straightforward yet effective algorithm. But why is it so popular?

Because of Naive Bayes' significant edge: being a probabilistic model, it is simple to implement, and it makes predictions very quickly.

**What does “conditional probability” mean?**

Let's start with the foundations of conditional probability.

Consider a fair coin and a fair die. When you flip a fair coin, heads and tails are equally likely, so the probability of heads is 50%. Similarly, what is the probability of rolling a 1 on a six-sided die? If the die is fair, the probability is 1/6, or about 0.166. Conditional probability takes this one step further: P(A|B) is the probability of event A given that event B has already occurred.
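The die example can be extended to conditional probability with a short counting sketch. This is purely illustrative; the event "roll ≥ 4 given the roll is even" is not from the article:

```python
from fractions import Fraction

# Sample space of a fair six-sided die: every face equally likely.
outcomes = [1, 2, 3, 4, 5, 6]

def conditional_probability(event_a, event_b, space):
    # P(A | B) = P(A and B) / P(B), estimated by counting outcomes.
    b = [o for o in space if event_b(o)]
    a_and_b = [o for o in b if event_a(o)]
    return Fraction(len(a_and_b), len(b))

# P(roll >= 4 | roll is even): the even outcomes are {2, 4, 6},
# and of those {4, 6} are >= 4, so the conditional probability is 2/3.
p = conditional_probability(lambda o: o >= 4, lambda o: o % 2 == 0, outcomes)
print(p)  # 2/3
```

Note how conditioning on "even" shrinks the sample space from six outcomes to three, which is exactly what the division by P(B) captures.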

**Using the Bayes Rule**

The Bayes Rule is a technique for computing P(Y|X) from the P(X|Y) observed in the training dataset. In its general form it reads P(A|B) = P(B|A) · P(A) / P(B).

To apply it here, we replace A and B with the feature X and the response Y: P(Y|X) = P(X|Y) · P(Y) / P(X).

For test (scoring) observations, X is known but Y is not. For each row of the test dataset, you want to compute the probability of Y given that X has already been observed.

What if Y has more than two categories? We compute the likelihood of each class of Y and assign the class with the highest probability.
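That argmax step can be sketched in a few lines. The class names, priors, and likelihoods below are made up for illustration; with more classes, the same dictionary simply gains more entries:

```python
# Made-up priors P(Y) and likelihoods P(x | Y) for one observed x.
priors = {"spam": 0.3, "ham": 0.7}
likelihood = {"spam": 0.8, "ham": 0.1}

# Numerator of the Bayes Rule, P(x | Y) * P(Y). The denominator P(x)
# is identical for every class, so it does not affect the argmax.
scores = {c: likelihood[c] * priors[c] for c in priors}
evidence = sum(scores.values())
posteriors = {c: s / evidence for c, s in scores.items()}

best = max(posteriors, key=posteriors.get)
print(best)  # spam
```

Dividing by the summed evidence is optional when you only need the winning class, but it turns the scores into proper probabilities that sum to 1.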


**Inference using the Bayes Rule**

The Bayes Rule provides the formula for estimating the probability of Y given a single feature X.

However, real-world problems usually involve many X variables.

When the features are assumed to be independent, the Bayes Rule extends to Naive Bayes: the posterior is proportional to the prior times the product of the individual likelihoods, P(Y | X1, …, Xn) ∝ P(Y) · P(X1 | Y) · … · P(Xn | Y).

**A Naive Bayes Example Worked by Hand**

Assume you have 1000 fruits, each of which is an orange, a banana, or some other fruit, so the Y variable has three classes. For every fruit we know the following X variables, all binary (1 or 0):

- Sweet
- Long
- Yellow
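The fruit example can be worked through in a few lines of Python. The article does not give the actual tallies, so the class counts and per-feature counts below are hypothetical stand-ins (chosen so that no orange is 'Long', matching the situation discussed next):

```python
# Hypothetical tallies for the 1000 fruits (invented for illustration).
class_totals = {"banana": 500, "orange": 300, "other": 200}
# Number of fruits in each class with the given feature equal to 1.
feature_counts = {
    "banana": {"long": 400, "sweet": 350, "yellow": 450},
    "orange": {"long": 0, "sweet": 150, "yellow": 300},
    "other": {"long": 100, "sweet": 150, "yellow": 50},
}
n = sum(class_totals.values())

def score(fruit, features):
    # P(Y) times the product of P(feature | Y) -- the naive
    # independence assumption lets us simply multiply.
    s = class_totals[fruit] / n
    for f in features:
        s *= feature_counts[fruit][f] / class_totals[fruit]
    return s

scores = {c: score(c, ["long", "sweet", "yellow"]) for c in class_totals}
print(max(scores, key=scores.get))  # banana
print(scores["orange"])  # 0.0, because P(long | orange) is zero
```

The zero score for orange is not a quirk of these particular numbers: any single zero-count feature wipes out the whole product, which motivates the Laplace correction below.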

**What exactly is the Laplace Correction?**

In the example above, P(Orange | Long, Sweet, Yellow) was zero because P(Long | Orange) was zero.

In other words, no ‘Long’ oranges were discovered in the training data.

Although this seems sensible, when the model has multiple features, a single attribute with a probability of 0 drives the overall probability to zero. To prevent this, we raise the zero count in the numerator by a small number (usually 1), so the product of probabilities can never collapse to zero. This technique is called the "Laplace Correction."

Most Naive Bayes implementations expose this or a similar correction as a parameter.
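A minimal sketch of the correction for a binary feature, with hypothetical counts matching the "zero 'Long' oranges out of 300" case from the text:

```python
# Laplace-smoothed estimate of P(feature = 1 | class): add alpha to
# the count and alpha * n_values to the denominator (n_values = 2 for
# a binary feature), so no probability can be exactly zero.
def smoothed_prob(count, class_total, alpha=1, n_values=2):
    return (count + alpha) / (class_total + alpha * n_values)

# Hypothetical case: zero 'Long' oranges out of 300 oranges.
print(0 / 300)                # 0.0 without the correction
print(smoothed_prob(0, 300))  # ~0.0033 with the correction
```

In scikit-learn, for instance, `MultinomialNB` exposes this as the `alpha` parameter (1.0 by default).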

**What does Gaussian Naive Bayes actually mean?**

We have already seen how to compute the probabilities when the Xs are categorical.

But how can the probability be computed when X is a continuous variable?

If we assume that X follows a particular distribution, we can use that distribution's probability density function to compute the likelihood.

If we assume that the Xs follow a Normal (also known as Gaussian) distribution, which is a very common assumption, we can substitute the Normal probability density into the formula; the result is called Gaussian Naive Bayes.
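A sketch of that substitution: the class-conditional likelihood of a continuous x is read off the Normal density, where mu and sigma would be estimated per class from the training data (the numbers below are invented):

```python
import math

# Normal probability density used as the likelihood P(x | Y) for a
# continuous feature.
def gaussian_pdf(x, mu, sigma):
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# Likelihood of observing x = 6.2 for a class with mu = 5.0, sigma = 1.5.
print(round(gaussian_pdf(6.2, 5.0, 1.5), 4))  # 0.1931
```

scikit-learn packages this idea as `sklearn.naive_bayes.GaussianNB`, which fits the per-class means and variances for you.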

**Improvement Suggestions for the Model**

- Transform the variables with techniques such as Box-Cox or Yeo-Johnson to bring the features as close to Normal as possible.
- Use the Laplace correction to handle records where an X variable has a zero count.
- Look for correlated features and try to remove the ones that are strongly correlated, since feature independence is the fundamental assumption of Naive Bayes.
- Create features. Combining existing attributes (of a product, for example) into new, meaningful ones can help.
- Try supplying the algorithm with more realistic prior probabilities based on business knowledge, rather than letting it estimate the priors from the training sample.
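As an illustration of the correlation check in the third suggestion, here is a small pure-Python Pearson correlation; the feature names and values are hypothetical:

```python
import math
import statistics

# Pearson correlation, used to spot near-duplicate features that
# violate the independence assumption.
def pearson(xs, ys):
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical features: the same height in two units is (almost)
# perfectly correlated, so one of the pair should be dropped.
height_cm = [150, 160, 170, 180, 190]
height_in = [59.1, 63.0, 66.9, 70.9, 74.8]
weight_kg = [55, 80, 62, 90, 75]

print(round(pearson(height_cm, height_in), 2))  # 1.0
print(round(pearson(height_cm, weight_kg), 2))
```

For the first suggestion, scikit-learn's `sklearn.preprocessing.PowerTransformer` supports both Box-Cox and Yeo-Johnson via its `method` parameter.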