Multiple Linear Regression From Scratch using Python
In machine learning, linear regression is a linear approach for modelling the relationship between a scalar target variable and one or more explanatory feature variables. The case of one explanatory variable is called simple linear regression; for more than one, the process is called multiple linear regression.
Today we are going to build multiple linear regression from scratch using Python and compare our model with the sklearn model.
Plus, let's plot our predictions in 3D. It's a lot of fun, so let's get started…
Linear Regression
Dependent variable = Y-intercept + slope * Independent variable
Multiple Linear Regression
Dependent variable = Y-intercept + Σ slope(i) * Independent variable(i)
Or
y_hat = bias + W(1).x(1) + W(2).x(2) + …. + W(n).x(n)
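To make the equation concrete, here is a minimal sketch with made-up weights and inputs (the numbers are purely illustrative, not from the dataset we build below):
import numpy as np

bias = 1.0                   # illustrative intercept
W = np.array([2.0, -3.0])    # illustrative weights W(1), W(2)
x = np.array([0.5, 1.5])     # illustrative features x(1), x(2)

y_hat = bias + np.dot(W, x)  # bias + W(1)*x(1) + W(2)*x(2)
print(y_hat)                 # 1.0 + 1.0 - 4.5 = -2.5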
Step 1 : Create some dummy data
from sklearn.datasets import make_regression

X, y, coef = make_regression(n_samples=1000,
                             n_features=2,
                             n_informative=2,
                             noise=10.0,
                             bias=1.0,
                             coef=True,
                             random_state=42)
Let's create a pandas DataFrame using the data we just created:
import pandas as pd

# y_hat = (w1*x1) + (w2*x2) + bias
df = pd.DataFrame(
    data={'feature1': X[:, 0],
          'feature2': X[:, 1],
          'target (y)': y,
          'weight1': coef[0],
          'weight2': coef[1],
          'bias': 1,
          'y_hat': ((coef[0]*X[:, 0]) + (coef[1]*X[:, 1])) + 1})

df.head()  # let's view the dataframe
Step 2 : Visualize the data
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(12,8))
plt.subplot(2,2,1)
sns.regplot(x='feature1',y='target (y)', data=df , marker='x', color='lightblue')
plt.subplot(2,2,2)
sns.regplot(x='feature2',y='target (y)', data=df , marker='x', color='lightblue')
plt.subplot(2,2,3)
sns.regplot(x='feature1',y='y_hat', data=df , marker='+', color='salmon')
plt.subplot(2,2,4)
sns.regplot(x='feature2',y='y_hat', data=df , marker='+', color='salmon')
plt.show()
We can use sns.jointplot() to draw a plot of two variables with bivariate and univariate graphs:
sns.jointplot(x='feature1', y='target (y)', data=df, marker='x')
To plot pairwise relationships in a dataset, use sns.pairplot():
sns.pairplot(df);
Step 3 : Training and Testing data
This step is very important for preventing data leakage.
from sklearn.model_selection import train_test_split

X = df[['feature1', 'feature2']]
y = df['target (y)']

print('Shape of Independent variable', X.shape)
print('Shape of Dependent variable', y.shape)

X_train, X_test, y_train, y_test = train_test_split(X, y)

print('Shape of Training data X', X_train.shape)
print('Shape of Training data y', y_train.shape)
print('Shape of Testing data X', X_test.shape)
print('Shape of Testing data y', y_test.shape)
Step 4 : Weight matrix and Bias
Given the number of weights and biases required, this utility method creates and returns random floats:
import numpy as np

def get_weight_and_bias(num_w, num_b):
    # random starting values for the weights and a small random bias
    Weights = np.random.rand(num_w)
    Bias = 0.01*np.random.rand(num_b)
    return Weights, Bias
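As a quick usage sketch (assuming two features, matching our dummy dataset), initializing the parameters might look like this:
W, b = get_weight_and_bias(num_w=2, num_b=1)
print('Initial weights:', W, 'Initial bias:', b)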
Step 5 : Multiple Linear Regression From Scratch
It sounds scary, I know, but we can accomplish this in just a few lines of code:
def multiple_regression(features, weights, bias):
    # y_hat = bias + W(1)*x(1) + ... + W(n)*x(n), written as a matrix product
    y_hat = (features@weights) + bias
    return y_hat
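With the weights and bias initialized above (an illustrative first pass, before any training), a prediction on the training data is just one call:
y_hat = multiple_regression(X_train, W, b)
print(y_hat[:5])  # predictions will be poor until we learn good weights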
Step 6 : Loss function
Let's use mean squared error as our loss function:
Mean squared error = mean of (y_true - y_preds)**2
def loss_fn(ground_truth, predictions):
return np.mean(np.square((ground_truth-predictions)))
# for geeks
# (1/len(ground_truth))*np.sum((ground_truth - predictions)**2)
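A quick sanity check with made-up numbers (purely illustrative):
print(loss_fn(np.array([3.0, 5.0]), np.array([2.0, 4.0])))  # mean of [1, 1] -> 1.0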
Step 7 : Gradient Descent
Another scary term, another few lines of code; that's how we are going to approach this.
But let's get into some theory first: what is gradient descent?
It's an optimization algorithm; it helps us move in the right direction along the loss surface.
Oh boy, another term. Loss surface, what's that?
We are predicting the dependent variable (y) using the independent variables x(i):
y_hat = bias + W(1).x(1) + W(2).x(2) + …. + W(n).x(n)
W(i) = weights, which are learned by the model
x(i) = features
This prediction of ours, y_hat, may have some error when we compare it with the true values of the dependent variable (y). We want to minimize this error.
That’s why we defined a loss function, a function that computes the distance between the current output of the algorithm and the expected output.
How can we do this error minimization? By learning good weights.
How do we learn good weights?
We use a loss function (mean squared error) to check how far off we are, then differentiate the loss function with respect to the model weights and bias to obtain the derivatives. Once we have the derivatives, we update the weights and bias in such a way that after each step we are moving in the right direction, that is to say we are minimizing the error with each update.
"Loss surface" refers to the graph of this loss function. If the loss function depends on many weights, we obtain a surface in some high-dimensional space. Our task is to find the minimum of this loss surface.
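To make the idea concrete, here is a minimal sketch (not part of the walkthrough itself) that plots the MSE loss surface of our dummy dataset over a grid of candidate weights w1 and w2, with the bias held at its true value of 1:
import numpy as np
import matplotlib.pyplot as plt

# grid of candidate weight pairs (w1, w2)
w1_grid, w2_grid = np.meshgrid(np.linspace(-50, 150, 100),
                               np.linspace(-50, 150, 100))
losses = np.zeros_like(w1_grid)

# MSE for every candidate weight pair, bias fixed at 1.0
for i in range(w1_grid.shape[0]):
    for j in range(w1_grid.shape[1]):
        preds = w1_grid[i, j]*df['feature1'] + w2_grid[i, j]*df['feature2'] + 1.0
        losses[i, j] = np.mean((df['target (y)'] - preds)**2)

ax = plt.subplot(projection='3d')
ax.plot_surface(w1_grid, w2_grid, losses, cmap='viridis')
ax.set_xlabel('w1'); ax.set_ylabel('w2'); ax.set_zlabel('MSE')
plt.show()
The bowl-shaped surface has its minimum near the true weights; gradient descent is how we walk down to it.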
Gradient Descent
Gradient descent is a first-order iterative optimization algorithm for finding a local minimum of a differentiable function.
We use mean squared error as the loss function:
loss = 1/N * Σ (y_true - y_pred)**2
where y_pred = bias + W1*X1 + W2*X2 + … + Wn*Xn (multiple regression)
error = y_true - (bias + W1*X1 + W2*X2 + … + Wn*Xn)
Derivative of the loss with respect to weight Wi:
dL/dWi = -2/N * Σ (error * Xi)
Derivative of the loss with respect to the bias:
dL/db = -2/N * Σ error
Updating weights and bias
We want to move in the negative direction of the gradient:
W(k+1) = W(k) + learning_rate * (-dL/dW)
B(k+1) = B(k) + learning_rate * (-dL/db)
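As a tiny worked example (numbers made up for illustration), one update step with a single feature looks like this:
import numpy as np

x = np.array([1.0, 2.0, 3.0])       # one feature, three samples
y_true = np.array([2.0, 4.0, 6.0])  # true relationship is y = 2x
w, b, lr = 0.0, 0.0, 0.1            # start from zero, learning rate 0.1

y_pred = w*x + b
error = y_true - y_pred             # [2, 4, 6]
dw = -2*np.mean(error*x)            # -2 * mean([2, 8, 18]) ≈ -18.67
db = -2*np.mean(error)              # -2 * mean([2, 4, 6])  = -8.0

w = w + lr*(-dw)                    # 0 + 0.1*18.67 ≈ 1.87, moving toward 2
b = b + lr*(-db)                    # 0 + 0.1*8.0   = 0.8
print(w, b)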
Enough theory, let's start coding. Let's build this gradient descent optimizer.
# I told you, this only sounds scary ... a few lines of code, that's it
def gradient_descent(features, ground_truth, predictions):
    # difference between true values and predicted values is the error
    error = ground_truth - predictions
    # derivative of the loss w.r.t. each weight
    dW = []
    for col in features.columns:
        dW.append(-2*np.mean((error*features[col])))
    # derivative of the loss w.r.t. the bias
    db = -2*np.mean(error)
    return dW, db
Training a Multiple Regression Model
def model_training(epochs, W, b, learning_rate=1e-3):
    for i in range(epochs):
        # get predictions
        y_hat = multiple_regression(X_train, weights=W, bias=b)
        # compute loss
        loss = loss_fn(y_train, y_hat)
        # optimize the model parameters
        [dW1, dW2], db = gradient_descent(X_train, y_train, y_hat)
        # update the weights and bias
        W[0] = W[0] + (learning_rate*-dW1)
        W[1] = W[1] + (learning_rate*-dW2)
        b = b + (learning_rate*-db)
        # print the loss
        print('epoch:', i, ' loss:', loss)
We can see that the loss decreases with every epoch.
That's it: we are doing multiple linear regression.
# Training (using the W and b initialized in Step 4)
model_training(10, W, b)
Complete code of Multiple Linear Regression
class Multiple_reg:
def __init__(self, learning_rate=1e-3, num_weights=2 ,
bias=0.001*np.random.rand()):
self.learning_rate = learning_rate
self.weight = np.random.rand(num_weights)
self.bias = bias
def multiple_regression(self, features):
return (features@self.weight)+self.bias
def loss_fn(self, ground_truth, predictions):
return np.mean(np.square((ground_truth-predictions)))
def gradient_descent(self, features, ground_truth, predictions):
error = ground_truth - predictions
dW =[]
for col in features.columns:
dW.append(-2*np.mean((error*features[col])))
db = -2*np.mean(error)
return dW,db
def optimize_model_parametes(self, features, ground_truth,
predictions):
dW, db = self.gradient_descent(features, ground_truth,
predictions)
self.weight[0] += self.learning_rate * -dW[0]
self.weight[1] += self.learning_rate * -dW[1]
self.bias += self.learning_rate * -db
def fit(self, X, y_true, epochs=10, to_print=False):
history={'epoch':[],'loss':[]}
for epoch in range(epochs):
y_hat = self.multiple_regression(X)
loss = self.loss_fn(y_true, y_hat)
self.optimize_model_parametes(X, y_true, y_hat)
if to_print:
print('epoch:',epoch,'loss:',loss)
history['epoch'].append(epoch)
history['loss'].append(loss)
return history
def predict(self ,test_features):
y_hat = self.multiple_regression(test_features)
return y_hat
def get_model_coef(self):
return self.weight, self.bias
def evaluate(self,test_features,y_test):
y_hat = self.predict(test_features)
loss = self.loss_fn(y_test,y_hat)
return loss
Let's instantiate the multiple regression model that we created, then:
- make some predictions
- compare performance with the sklearn model
- plot the predictions
model1 = Multiple_reg()
history1 = model1.fit(X_train, y_train, epochs=1000)
Let's build some utility functions to help us plot the model's learning curve and print the learned coefficients.
# Helper functions

# LEARNING CURVE
def plot_learning_curve(model, history):
    model_coef, bias = model.get_model_coef()
    plt.figure(figsize=(8,5))
    plt.plot(history['loss'])
    plt.title(f'Learning curve # Learned Weights:{model_coef} and bias:{bias :.2f}')
    plt.xlabel('Epochs')
    plt.ylabel('Mean squared error')
    plt.show()
# PRINT MODEL COEFFICIENTS
def print_model_coef(model):
    model_coef, bias = model.get_model_coef()
    print(f'Actual Weight:{coef} and bias:{df["bias"][0]}')
    print(f'Learned Weight:{model_coef} and bias:{bias :.2f}')
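With these helpers in place, a quick usage sketch (using the model1 and history1 trained above) might look like this:
plot_learning_curve(model1, history1)
print_model_coef(model1)
Now let's use sklearn's LinearRegression and compare performance.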
from sklearn.linear_model import LinearRegression
sklearn_model = LinearRegression()
sklearn_model.fit(X_train, y_train)
print('Sklearn LinearModel coef Weight:',
sklearn_model.coef_,
'bias:',
sklearn_model.intercept_)
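As a rough comparison (a sketch, assuming the variables defined above), we can also evaluate both models on the held-out test set with mean squared error:
from sklearn.metrics import mean_squared_error

print('Our model MSE:    ', model1.evaluate(X_test, y_test))
print('Sklearn model MSE:', mean_squared_error(y_test, sklearn_model.predict(X_test)))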
Wow! Our simple model did a good job learning the parameters from the data.
Let's make some predictions:
# make predictions
from sklearn.metrics import mean_absolute_error, mean_squared_error

predictions = model1.predict(X_test)
print('MAE', mean_absolute_error(y_test, predictions))
print('MSE', mean_squared_error(y_test, predictions))
Plotting the predictions
Our final task before the wrap-up: visualizing the model predictions.
Humans are visual creatures, so let's put that to the test; half of the human brain is directly or indirectly devoted to processing visual information.
For plotting in 3D we need an X-axis, a Y-axis and a Z-axis.
We can use np.meshgrid together with our model to generate the axes required.
It is simpler to code than to explain in lengthy words, so let's jump straight to the code.
# utility functions for plotting predictions
def get_axis(x_axis,y_axis,model):
# datapoints
X = np.linspace(x_axis.min(), x_axis.max())
Y = np.linspace(y_axis.min(), y_axis.max())
# create a mesh grid
XX, YY = np.meshgrid(X, Y)
# z-axis
z = model.predict(pd.DataFrame(
{'f1':np.ravel(XX),
'f2':np.ravel(YY)}))
ZZ = np.reshape(z.values, XX.shape)
return XX,YY,ZZ
Lets create X-axis, Y-axis & Z-axis using the above utility
X, Y, Z = get_axis(X_test['feature1'],X_test['feature2'], model1)
Our 3D plots
- Surface plot
- Contour3D
- wireframe
fig = plt.figure(figsize=(20,8))

# SURFACE PLOT
ax = plt.subplot(1,3,1,projection='3d')
ax.scatter3D(X_test['feature1'], X_test['feature2'], predictions, marker='+', color='red' )
ax.plot_surface(X, Y, Z, cmap='viridis')
ax.set_title('surface plot');
# CONTOUR3D
ax = plt.subplot(1,3,2,projection='3d')
ax.scatter3D(X_test['feature1'], X_test['feature2'], predictions, marker='+', color='red' )
ax.contour3D(X, Y, Z, 50, cmap='binary');
ax.set_title('contour 3D plot');
# WIREFRAME
ax = plt.subplot(1,3,3,projection='3d')
ax.scatter3D(X_test['feature1'], X_test['feature2'], predictions, marker='+', color='red' )
ax.plot_wireframe(X, Y, Z, rcount=10, ccount=10, color='black')
ax.set_title('wireframe');

plt.show()