[Python In Depth] Logistic Regression without scikit-learn (a.k.a. sklearn)
Sklearn(scikit-learn) is one of the most widely used machine learning libraries in Python.
Its official name is scikit-learn, but the shortened name sklearn is what you will usually see. Once the library is imported, running logistic regression takes only about three lines of code. We assume you have already tried that before. So here, we will show how to build logistic regression with nothing but NumPy, the most basic and fundamental library for data analysis in Python.
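For reference, the sklearn version really is about three lines. Here is a rough sketch; the names X and y are placeholders for your own features and labels:
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X, y)  # X: 2-D feature array, y: 0/1 labels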
We will go through the code block by block to see how the algorithm works.
I worked in Colab and highly recommend it, since you get free GPU resources and don't have to bother installing many different libraries. (Link below)
https://colab.research.google.com/
1. Import dependent libraries, and generate a simple dataset
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import random

df = pd.DataFrame({"age": [22, 25, 47, 52, 46, 56, 55, 60, 62, 61, 18, 28, 27, 29, 49, 55, 25, 58, 19, 18, 21, 26, 40, 45, 50, 54, 23],
                   "bought_insurance": [0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0]})
To briefly explain: pandas handles DataFrames, one of the characteristic data structures of data analysis in Python. NumPy is for matrix calculations, and Matplotlib is for data visualization. Lastly, random provides various functions for generating randomness.
The generated dataset is very simple, with only two columns: age, and whether the person bought insurance or not. If the person bought it, the value is 1; if not, 0.
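If you want to sanity-check the data, a quick peek works; the values below come straight from the lists above:
df.shape   # (27, 2): 27 people, 2 columns
df.head()  # ages 22, 25, 47, 52, 46 with labels 0, 0, 1, 0, 1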
2. Split the dataset into two parts: training and testing
test = df.sample(7)               # randomly pick 7 rows for testing
train = df[~df.isin(test)]        # mask out the test rows (their values become NaN)
train.dropna(inplace = True)      # drop the NaN rows, leaving 20 training rows
For simplicity, our dataset has only 27 rows. Out of the 27, we leave 7 for testing, to see how our NumPy-backed logistic regression performs on unfamiliar data. The remaining 20 are used to learn the parameters of the logistic regression model.
Since we will check the performance of the model after training it, note that the target values we are aiming for are [1 1 0 0 0 1 1] (from our particular random split), which means the first two and the last two people in the testing dataset have insurance coverage.
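You can inspect the targets of your own split with a one-liner; the exact values depend on the random sample drawn above:
print(test['bought_insurance'].values)  # e.g. [1 1 0 0 0 1 1] in our run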
3. Activation function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
The activation function adds non-linearity to the linearly computed value; the sigmoid in particular squashes it into the range (0, 1), so it can be read as a probability. This is one of the most important ingredients of machine learning, and you will see it again in the neural networks part.
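To see the squashing behavior, try a few values: inputs far below zero map near 0, inputs far above zero map near 1, and 0 maps exactly to 0.5.
print(sigmoid(-5), sigmoid(0), sigmoid(5))  # ~0.0067, 0.5, ~0.9933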
4. Loss function
def square_loss(y_pred, target):
    return np.mean((y_pred - target) ** 2)
There are literally thousands of loss functions, and you can even write your own if you want, but here we deploy one of the most basic and widely used ones: the square loss, which computes the mean squared difference between the target values and the predicted ones.
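As a quick sanity check of square_loss, with toy numbers of my own (not from the dataset):
demo_pred = np.array([0.9, 0.2])
demo_target = np.array([1, 0])
print(square_loss(demo_pred, demo_target))  # mean of [0.01, 0.04] = 0.025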
5. Split X (feature) part and y (target) part
X_tr, y_tr = train.age, train['bought_insurance']
X_te, y_te = test.age, test['bought_insurance']
As you can see, there are two ways to select a column (which pandas represents as a Series): dot notation naming the column, or square brackets with the column name as a string.
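Both forms return the same pandas Series, as a quick check confirms:
print(train.age.equals(train['age']))  # True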
6. Model setup and running
lr = 0.01  # learning rate
W = np.random.uniform(0, 1)  # initial weight for the single feature (age)
b = 0.1  # initial bias

for i in range(10000):
    z = np.dot(X_tr, W) + b        # linear part
    y_pred = sigmoid(z)            # squash to (0, 1)
    l = square_loss(y_pred, y_tr)  # track the loss
    gradient_W = np.dot((y_pred - y_tr).T, X_tr) / X_tr.shape[0]
    gradient_b = np.mean(y_pred - y_tr)
    W = W - lr * gradient_W
    b = b - lr * gradient_b
‘lr’ in the code stands for ‘learning rate’. It is multiplied by the gradient, so it determines how far we move from the current parameter values at each step. The gradient, in turn, measures how much (and in which direction) the predictions miss the target values. By iteratively computing gradients and taking small steps scaled by the learning rate, we can hopefully reach the desired performance of the model.
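If you want to watch training converge, one option is to print the loss every so often inside the loop above (a small sketch of my own; the interval of 1,000 iterations is arbitrary):
    if i % 1000 == 0:
        print(f"iteration {i}: loss = {l:.4f}")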
7. Test the performance of the model
After step 6, we have W and b, each updated 10,000 times. So what we have is a linear discriminant function whose slope is W and whose intercept is b, passed through the sigmoid.
r = sigmoid(np.dot(X_te, W) + b)  # vectorized: scores all 7 test rows at once, no loop needed
And this returns a sequence of probabilities between 0 and 1. Applying a 0.5 threshold (the natural cut-off for the sigmoid), any value above 0.5 maps to a final value of 1, indicating the person would have bought insurance.
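In code, that thresholding is one line (predictions is my own variable name, not from the original snippet):
predictions = (r > 0.5).astype(int)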
What I got was
[0.7919596733344769, 0.9410453079178125, 0.9152338193767108, 0.08000347093257619, 0.17797780617706674, 0.9045616415408676, 0.8491202536233904]
for the 7 people in the test dataset. The thresholded result is [1, 1, 1, 0, 0, 1, 1], so the model misclassified the 3rd person, compared to the target [1 1 0 0 0 1 1].
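A one-line accuracy check, using the predictions variable from the thresholding sketch above:
print(np.mean(predictions == y_te.values))  # 6 of 7 correct in our run, ~0.857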
There are tons of ways to upgrade this simple model. You can change the learning rate or the number of iterations. You can also deploy other loss functions, such as cross-entropy (negative log-likelihood) and so forth.
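As one example, here is a sketch of binary cross-entropy, with a small eps I added to guard against log(0); it can be dropped in as a replacement for square_loss:
def cross_entropy_loss(y_pred, target, eps=1e-12):
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(target * np.log(y_pred) + (1 - target) * np.log(1 - y_pred))
Conveniently, the gradients used in step 6 (the mean of (y_pred − y) times x) are exactly the gradients of this cross-entropy loss for a sigmoid output, so the update rule itself would not need to change.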
So, happy learning!
And we welcome your feedback and comments!
Thank you for your time, and we would be so happy for the clap!