XGBoost - Classification

Boosting Ensembling Technique

In this blog, we will be discussing another commonly used ensembling technique known as XGBoost. As we know, there are two types of ensembling techniques: boosting and bagging. XGBoost belongs to the former.

XGBoost is shorthand for "Extreme Gradient Boosting". It can be used to solve both regression and classification problems. As the name suggests, we need to know the basics of gradient boosting to understand this algorithm (but that's a topic for another blog post).

Today, we will discuss how we can solve classification problems using the XGBoost ensembling technique.

This is the dataset we will be working on. We want to find out whether a person will get a loan based on their credit score and salary. Here both salary and credit are categorical features. Credit has three values: Bad, Good, and Normal.

1 means the loan is approved.
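Since the dataset itself isn't reproduced here, below is a minimal stand-in for it in code. The column names follow the post, but the rows are made up purely for illustration:

```python
import pandas as pd

# Hypothetical stand-in for the post's dataset; the real values may differ.
df = pd.DataFrame({
    "Salary":   ["<=50K", ">50K", "<=50K", ">50K", ">50K"],
    "Credit":   ["Bad", "Good", "Normal", "Bad", "Good"],
    "Approval": [0, 1, 0, 0, 1],  # 1 = loan approved, 0 = not approved
})
```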

Step 1: Construct a Base model

Since this is a classification problem, there are only two possible outputs: 0 and 1. The base model simply predicts a probability of 0.5 for every record; we will use this probability to calculate the residuals.

Step 2: Calculate the Residual

For every data point, subtract the probability we calculated previously from the output variable. (In this case, Approval is the output feature.)
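Continuing the hypothetical dataset above, steps 1 and 2 amount to a couple of lines:

```python
# Step 1: the base model predicts probability 0.5 for every record.
base_prob = 0.5

# Step 2: residual = observed label - predicted probability.
df["Residual"] = df["Approval"] - base_prob
# Approval = 1 gives a residual of +0.5; Approval = 0 gives -0.5.
```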

Step 3: Start constructing the tree as follows

We will use the "Salary "feature to create the tree. In XGBoost, we have to do Binary classification even if there are more than two categories in the salaries feature (i.e. only two nodes at every level)

Now we will calculate the similarity weights using the formula below:

$$\text{Similarity Weight} = \frac{\left(\sum \text{Residual}_i\right)^2}{\sum p_i(1-p_i) + \lambda}$$

where $p_i$ is the previously predicted probability for each record in the node, and λ is a regularization hyperparameter, which we assume to be zero for now for simplicity.

  • Calculate the similarity weight for the left split

  • Calculate the similarity weight for the right split

  • Calculate the similarity weight of the root node

(I am leaving out the arithmetic for the root node for simplicity right now; a helper function is sketched below.)
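A small helper, with λ as an optional argument, makes the similarity weight concrete. The example numbers are illustrative only:

```python
import numpy as np

def similarity_weight(residuals, probs, lam=0.0):
    """Similarity weight = (sum of residuals)^2 / (sum of p*(1-p) + lambda)."""
    residuals = np.asarray(residuals, dtype=float)
    probs = np.asarray(probs, dtype=float)
    return residuals.sum() ** 2 / ((probs * (1 - probs)).sum() + lam)

# A node holding residuals +0.5, +0.5, -0.5, all with previous probability 0.5:
print(similarity_weight([0.5, 0.5, -0.5], [0.5, 0.5, 0.5]))  # 0.25 / 0.75 ≈ 0.33
```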

Compute the gain

Gain = Similarity weight of left split + Similarity weight of right split − Similarity weight of root

We need to compute the gain because we need to determine the feature from which we should start splitting: the split with the maximum gain is preferred.
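Reusing the helper above, the gain of a candidate split can be sketched like this (numbers again illustrative):

```python
def gain(left_res, left_p, right_res, right_p, lam=0.0):
    """Gain = SW(left leaf) + SW(right leaf) - SW(root node)."""
    root_res = np.concatenate([left_res, right_res])
    root_p = np.concatenate([left_p, right_p])
    return (similarity_weight(left_res, left_p, lam)
            + similarity_weight(right_res, right_p, lam)
            - similarity_weight(root_res, root_p, lam))

# Split sending residuals [+0.5, +0.5] left and [-0.5] right:
print(gain(np.array([0.5, 0.5]), np.array([0.5, 0.5]),
           np.array([-0.5]), np.array([0.5])))  # 2.0 + 1.0 - 0.33 ≈ 2.67
```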

For simplicity, we will assume that Salary has the better gain, so we can continue splitting from the tree we have constructed so far. Next, we will split each of its nodes based on Credit.

For Credit, we will have to evaluate multiple candidate splits at each node (there are more than two categories, and every split must be binary).

We will do the split on the left node for now

Keep in mind that both credit splits are for the left branch of the salary node.
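Since Credit has three categories and every split is binary, each node has three candidate splits to score; a quick way to enumerate them:

```python
categories = ["Bad", "Good", "Normal"]

# Every way of sending a single category left and the rest right.
for cat in categories:
    left = {cat}
    right = set(categories) - left
    print(f"left: {left}  vs  right: {right}")
# left: {'Bad'}    vs  right: {'Good', 'Normal'}
# left: {'Good'}   vs  right: {'Bad', 'Normal'}
# left: {'Normal'} vs  right: {'Bad', 'Good'}
```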

To determine whether or not we should split further, or how many branches should be cut out, we do something called post-pruning using the cover value.

After creating the tree, we cut the unnecessary branches based on this cover value. The cover value is equal to the denominator of the similarity weight formula (without the λ term):

$$\text{Cover} = \sum p_i(1-p_i)$$

If the cover of a leaf is less than the minimum allowed cover (XGBoost's min_child_weight hyperparameter, which defaults to 1), we will cut that branch.
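As a sketch of that check (min_child_weight = 1 matches XGBoost's default):

```python
def cover(probs):
    """Cover of a node = sum of p*(1-p) over its records."""
    probs = np.asarray(probs, dtype=float)
    return (probs * (1 - probs)).sum()

def keep_leaf(probs, min_child_weight=1.0):
    return cover(probs) >= min_child_weight

# A leaf holding three records, each with previous probability 0.5:
print(cover([0.5, 0.5, 0.5]))      # 0.75
print(keep_leaf([0.5, 0.5, 0.5]))  # False -> this leaf would be pruned
```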

Now, whenever we get some new data, we need the base model's output on the log-odds scale (let it be M). We convert the base probability using the following formula (also known as the log-odds or logit function):

$$M = \log\left(\frac{p}{1-p}\right)$$

For the base probability p = 0.5, this gives M = 0.

We will pass the new record through the constructed decision tree. Let us assume it arrives at a leaf node with output value O. (The output value is computed like the similarity weight, except that the sum of residuals in the numerator is not squared: $O = \sum \text{Residual}_i / \left(\sum p_i(1-p_i) + \lambda\right)$.)

We will use the sigmoid function to give us a new probability value for each data record:

$$\sigma(M + \text{Learning Rate} \times O)$$

where σ(x) is

$$\frac{1}{1 + e^{-x}}$$
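Putting the update together in code (the leaf output is a made-up number; 0.3 is XGBoost's default learning rate, eta):

```python
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

M = np.log(0.5 / (1 - 0.5))  # log odds of the base probability = 0.0
leaf_output = 2.0            # hypothetical O: residuals [+0.5, +0.5] give 1.0 / 0.5
eta = 0.3                    # learning rate

new_prob = sigmoid(M + eta * leaf_output)
print(new_prob)  # ≈ 0.646: the prediction moved from 0.5 toward 1
```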

We will calculate new residuals using these new probability values and construct the next decision tree. We keep repeating this until the residuals stop shrinking (or we reach a set number of trees). The final output is given by:

$$\sigma\left(0 + \alpha\,T_1 + \alpha\,T_2 + \ldots\right)$$

where α is the learning rate (a hyperparameter), $T_i$ is the output value the i-th tree assigns to the record, and the leading 0 is the log odds of the base probability 0.5.
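In practice, the xgboost library does all of this for us. A minimal usage sketch on the hypothetical dataset from above (the parameters shown map directly onto the quantities discussed in this post):

```python
import xgboost as xgb

# One-hot encode the categorical features from the hypothetical DataFrame.
X = pd.get_dummies(df[["Salary", "Credit"]], dtype=int)
y = df["Approval"]

model = xgb.XGBClassifier(
    n_estimators=100,      # number of trees T1, T2, ...
    learning_rate=0.3,     # the alpha/eta in the sigmoid update
    reg_lambda=1.0,        # the lambda in the similarity weight denominator
    min_child_weight=1.0,  # the minimum cover used for pruning
)
model.fit(X, y)
print(model.predict_proba(X)[:, 1])  # predicted approval probabilities
```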

That's it. I hope I was able to articulate my thoughts well in this post, and that it was helpful to you guys.