Implementing Logistic Regression Practically


Recently I learned how to implement logistic regression from https://youtu.be/n40hS9tQmcY, and I intend to share my understanding of the implementation by documenting it in this blog post.

So, let's begin!

We will begin by importing all the important libraries for our code.

import seaborn as sns  # for data visualization and for loading the dataset
import pandas as pd    # for manipulating our data frame
import numpy as np     # for numerical operations

Now, we will create a data frame.

We will be using the iris dataset, where each instance has five attributes: four are measurements of the sepals and petals of each observation, and the fifth is the class, i.e. the species of iris that the observation belongs to.

It is a popular dataset and is available in the seaborn library, so we can load it directly into a dataframe:

df=sns.load_dataset("iris")
df.head()

Now, we know that logistic regression is used for binary classification, so it can't be applied directly to a dataset with more than two categories. Let's check how many categories this dataset contains:

df['species'].unique()  # print the distinct categories in the dataset
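For iris this prints three classes (setosa, versicolor, and virginica), so we will have to reduce the data to two classes before fitting the model.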

We also need to ensure that our dataset doesn't have any null values in it.

df.isnull().sum()

This returns the number of null values in each column.
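The iris dataset ships with no missing values, so nothing more is needed here. If a dataset did contain nulls, one simple option (among several imputation strategies) would be to drop the affected rows:

df=df.dropna()  # drop any row containing a null value (not needed for iris)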

Now, we will remove all rows belonging to the setosa species from the data frame, leaving the two classes versicolor and virginica.

df=df[df['species']!='setosa']
df.head()

Now, while processing the data, it is easier for our machine to deal with integer/numeric data types. So we will assign numeric labels to the species using map. Since we have only two categories, we will use 0 and 1.

df['species']=df['species'].map({'versicolor':0,'virginica':1})
df.head()
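To confirm the mapping worked, you can inspect the unique labels again; the column should now contain only the two numeric classes:

df['species'].unique()  # expected: array([0, 1])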

The next step is to separate our dataset into independent and dependent variables (i.e. input features and the output label).

x=df.iloc[:,:-1] # every row, all columns except the last (the input features)
y=df.iloc[:,-1]  # every row, only the last column (the output label)

Next, split the dataset into training data and testing data.

from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(
    x,y,test_size=0.25,random_state=42
)
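With test_size=0.25, a quarter of the rows go to the test set. A quick, optional sanity check on the resulting shapes:

print(x_train.shape, x_test.shape)  # with the 100 rows left after filtering: (75, 4) (25, 4)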

Import the logistic regression model from the sklearn library.

from sklearn.linear_model import LogisticRegression
regression=LogisticRegression()

Now, we will use grid search to select the best combination of hyperparameters.

from sklearn.model_selection import GridSearchCV

parameter = {'penalty':['l1','l2','elasticnet'],'C':[1,2,3,4,5,6,10,20,30,50],'max_iter':[100,200,300]}

regressor=GridSearchCV(regression,param_grid=parameter,scoring='accuracy',cv=5)

regressor.fit(x_train,y_train)

Ok, so lots of things are happening in this part, so here is a brief explanation.

C: the inverse of regularization strength; smaller values of C mean stronger regularization.

penalty: the type of regularization we want to perform ('l1' for lasso, 'l2' for ridge, or 'elasticnet' for a mix of both).

max_iter: the maximum number of iterations the solver will take to converge while fitting the parameters.

Based on these parameters, grid search tunes our model (the variable named regression in this code) and stores the fitted search object in the regressor variable.

cv=5 means that the process follows a 5-fold cross-validation strategy.
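One caveat worth knowing: the default LogisticRegression solver ('lbfgs') only supports the 'l2' penalty, so the 'l1' and 'elasticnet' entries in the grid above will fail during fitting (GridSearchCV records those combinations as failed fits and warns about them). If you want to search those penalties as well, one option is the 'saga' solver, which supports all of them; a minimal sketch, restricted to 'l1' and 'l2' since 'elasticnet' additionally requires an l1_ratio value:

parameter = {
    'penalty':['l1','l2'],            # both supported by the saga solver
    'C':[1,2,3,4,5,6,10,20,30,50],
    'max_iter':[100,200,300],
    'solver':['saga']                 # saga handles l1, l2, and elasticnet
}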

You can also view the best combination of parameters and the best cross-validation score obtained during the search using the following lines of code:

print(regressor.best_params_)
print(regressor.best_score_)
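By default, GridSearchCV also refits the best combination on the whole training set and exposes it as best_estimator_; calling predict on regressor uses this refitted model under the hood:

best_model=regressor.best_estimator_  # the LogisticRegression refit with the best parameters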

And that's it! We have created our model, and now it's time for prediction.

y_pred=regressor.predict(x_test)

To calculate the accuracy, compare the predicted values with the expected values.

from sklearn.metrics import accuracy_score,classification_report
score=accuracy_score(y_test,y_pred)  # the convention is (y_true, y_pred)
print(score)

You can also print the classification report. Note that the argument order matters here: the true labels come first, then the predictions (otherwise precision and recall are swapped in the report).

print(classification_report(y_test,y_pred))