Multiple Linear Regression From Scratch using Python
In machine learning, linear regression is a linear approach for modelling the relationship between a scalar target variable and one or more explanatory feature variables. The case of one explanatory variable is called simple linear regression; for more than one, the process is called multiple linear regression.
Today we are going to build multiple linear regression from scratch using Python and compare our model with the sklearn model.
Plus, let's plot our predictions in 3D. It's a lot of fun, so let's get started…
Linear Regression
Dependent variable = Y-intercept + slope * Independent variable
Multiple Linear Regression
Dependent variable = Y-intercept + Σ slope(i) * Independent variable(i)
Or
y_hat = bias + W(1).x(1) + W(2).x(2) + …. + W(n).x(n)
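To make the equation concrete, here is a minimal sketch with made-up weights and inputs (the numbers are purely illustrative, not from the dataset we build below):
import numpy as np

bias = 1.0                   # illustrative intercept
W = np.array([2.0, -3.0])    # illustrative weights W(1), W(2)
x = np.array([0.5, 1.5])     # illustrative features x(1), x(2)

y_hat = bias + np.dot(W, x)  # bias + W(1)*x(1) + W(2)*x(2)
print(y_hat)                 # 1.0 + 1.0 - 4.5 = -2.5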
Step 1 : Create some dummy data
from sklearn.datasets import make_regression

X, y, coef = make_regression(n_samples=1000,
                             n_features=2,
                             n_informative=2,
                             noise=10.0,
                             bias=1.0,
                             coef=True,
                             random_state=42)
Let's create a pandas DataFrame using the data we just created:
import pandas as pd

# y_hat = (w1*x1) + (w2*x2) + bias
df = pd.DataFrame(
    data={'feature1': X[:, 0],
          'feature2': X[:, 1],
          'target (y)': y,
          'weight1': coef[0],
          'weight2': coef[1],
          'bias': 1,
          'y_hat': ((coef[0]*X[:, 0]) + (coef[1]*X[:, 1])) + 1})

df.head()  # let's view the dataframe
Step 2 : Visualize the data
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(12,8))
plt.subplot(2,2,1)
sns.regplot(x='feature1',y='target (y)', data=df , marker='x', color='lightblue')
plt.subplot(2,2,2)
sns.regplot(x='feature2',y='target (y)', data=df , marker='x', color='lightblue')
plt.subplot(2,2,3)
sns.regplot(x='feature1',y='y_hat', data=df , marker='+', color='salmon')
plt.subplot(2,2,4)
sns.regplot(x='feature2',y='y_hat', data=df , marker='+', color='salmon')
plt.show()
We can use sns.jointplot() to draw a plot of two variables with bivariate and univariate graphs:
sns.jointplot(x='feature1', y='target (y)', data=df, marker='x')
To plot pairwise relationships in a dataset, use sns.pairplot():
sns.pairplot(df);
Step 3 : Training and Testing data
This step is very important for preventing data leakage.
from sklearn.model_selection import train_test_split

X = df[['feature1', 'feature2']]
y = df['target (y)']

print('Shape of Independent variable', X.shape)
print('Shape of Dependent variable', y.shape)

X_train, X_test, y_train, y_test = train_test_split(X, y)

print('Shape of Training data X', X_train.shape)
print('Shape of Training data y', y_train.shape)
print('Shape of Testing data X', X_test.shape)
print('Shape of Testing data y', y_test.shape)
Step 4 : Weight matrix and Bias
Given the number of weights and biases required, this utility method creates and returns random floats:
import numpy as np

def get_weight_and_bias(num_w, num_b):
    # random starting values for the weights and a small random bias
    Weights = np.random.rand(num_w)
    Bias = 0.01*np.random.rand(num_b)
    return Weights, Bias
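As a quick usage sketch (assuming two features, matching our dummy dataset), initializing the parameters might look like this:
W, b = get_weight_and_bias(num_w=2, num_b=1)
print('Initial weights:', W, 'Initial bias:', b)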
Step 5 : Multiple Linear Regression From Scratch
It sounds scary, I know, but we can accomplish this in just a few lines of code:
def multiple_regression(features, weights, bias):
    # y_hat = bias + W(1)*x(1) + ... + W(n)*x(n), written as a matrix product
    y_hat = (features@weights) + bias
    return y_hat
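With the weights and bias initialized above (an illustrative first pass, before any training), a prediction on the training data is just one call:
y_hat = multiple_regression(X_train, W, b)
print(y_hat[:5])  # predictions will be poor until we learn good weights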
Step 6 : Loss function
Let's use mean squared error as our loss function:
Mean squared error = mean of (y_true - y_preds)**2
def loss_fn(ground_truth, predictions):
return np.mean(np.square((ground_truth-predictions)))
# for geeks
# (1/len(ground_truth))*np.sum((ground_truth - predictions)**2)
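A quick sanity check with made-up numbers (purely illustrative):
print(loss_fn(np.array([3.0, 5.0]), np.array([2.0, 4.0])))  # mean of [1, 1] -> 1.0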
Step 7 : Gradient Descent
Another scary term, another few lines of code; that's how we are going to approach this.
But let's get into some theory first: what is gradient descent?
It's an optimization algorithm; it helps us move in the right direction along the loss surface.
Oh boy, another term. Loss surface, what's that?
We are predicting the dependent variable (y) using the independent variables x(i):
y_hat = bias + W(1).x(1) + W(2).x(2) + …. + W(n).x(n)
W(i) = weights, which are learned by the model
x(i) = features
This prediction of ours, y_hat, may have some error when we compare it with the true values of the dependent variable (y). We want to minimize this error.
That’s why we defined a loss function, a function that computes the distance between the current output of the algorithm and the expected output.
How can we do this error minimization? By learning good weights.
How do we learn good weights?
We use a loss function (mean squared error) to check how far off we are, then differentiate the loss function with respect to the model weights and bias to obtain the derivatives. Once we have the derivatives, we update the weights and bias in such a way that after each step we are moving in the right direction, that is to say we are minimizing the error with each update.
"Loss surface" refers to the graph of this loss function. If the loss function depends on many weights, we obtain a surface in some high-dimensional space. Our task is to find the minimum of this loss surface.
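To make the idea concrete, here is a minimal sketch (not part of the walkthrough itself) that plots the MSE loss surface of our dummy dataset over a grid of candidate weights w1 and w2, with the bias held at its true value of 1:
import numpy as np
import matplotlib.pyplot as plt

# grid of candidate weight pairs (w1, w2)
w1_grid, w2_grid = np.meshgrid(np.linspace(-50, 150, 100),
                               np.linspace(-50, 150, 100))
losses = np.zeros_like(w1_grid)

# MSE for every candidate weight pair, bias fixed at 1.0
for i in range(w1_grid.shape[0]):
    for j in range(w1_grid.shape[1]):
        preds = w1_grid[i, j]*df['feature1'] + w2_grid[i, j]*df['feature2'] + 1.0
        losses[i, j] = np.mean((df['target (y)'] - preds)**2)

ax = plt.subplot(projection='3d')
ax.plot_surface(w1_grid, w2_grid, losses, cmap='viridis')
ax.set_xlabel('w1'); ax.set_ylabel('w2'); ax.set_zlabel('MSE')
plt.show()
The bowl-shaped surface has its minimum near the true weights; gradient descent is how we walk down to it.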
Gradient Descent
Gradient descent is a first-order iterative optimization algorithm for finding a local minimum of a differentiable function.
We use mean squared error as the loss function:
loss = 1/N * Σ (y_true - y_pred)**2
where y_pred = bias + W1*X1 + W2*X2 + … + Wn*Xn (multiple regression)
error = y_true - (bias + W1*X1 + W2*X2 + … + Wn*Xn)
Derivative of the loss with respect to weight Wi:
dL/dWi = -2/N * Σ (error * Xi)
Derivative of the loss with respect to the bias:
dL/db = -2/N * Σ error
Updating weights and bias
We want to move in the negative direction of the gradient:
W(k+1) = W(k) + learning_rate * (-dL/dW)
B(k+1) = B(k) + learning_rate * (-dL/db)
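As a tiny worked example (numbers made up for illustration), one update step with a single feature looks like this:
import numpy as np

x = np.array([1.0, 2.0, 3.0])       # one feature, three samples
y_true = np.array([2.0, 4.0, 6.0])  # true relationship is y = 2x
w, b, lr = 0.0, 0.0, 0.1            # start from zero, learning rate 0.1

y_pred = w*x + b
error = y_true - y_pred             # [2, 4, 6]
dw = -2*np.mean(error*x)            # -2 * mean([2, 8, 18]) ≈ -18.67
db = -2*np.mean(error)              # -2 * mean([2, 4, 6])  = -8.0

w = w + lr*(-dw)                    # 0 + 0.1*18.67 ≈ 1.87, moving toward 2
b = b + lr*(-db)                    # 0 + 0.1*8.0   = 0.8
print(w, b)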
Enough theory, let's start coding. Let's build this gradient descent optimizer.
# I told you, this only sounds scary ... a few lines of code, that's it
def gradient_descent(features, ground_truth, predictions):
    # difference between true values and predicted values is the error
    error = ground_truth - predictions
    # derivative of the loss w.r.t. each weight
    dW = []
    for col in features.columns:
        dW.append(-2*np.mean((error*features[col])))
    # derivative of the loss w.r.t. the bias
    db = -2*np.mean(error)
    return dW, db
Training a Multiple Regression Model
def model_training(epochs, W, b, learning_rate=1e-3):
    for i in range(epochs):
        # get predictions
        y_hat = multiple_regression(X_train, weights=W, bias=b)
        # compute loss
        loss = loss_fn(y_train, y_hat)
        # optimize the model parameters
        [dW1, dW2], db = gradient_descent(X_train, y_train, y_hat)
        # update the weights and bias
        W[0] = W[0] + (learning_rate*-dW1)
        W[1] = W[1] + (learning_rate*-dW2)
        b = b + (learning_rate*-db)
        # print the loss
        print('epoch:', i, ' loss:', loss)
We can see that the loss decreases with every epoch.
That's it: we are doing multiple linear regression.
# Training (using the W and b initialized in Step 4)
model_training(10, W, b)
Complete code of Multiple Linear Regression
class Multiple_reg:
def __init__(self, learning_rate=1e-3, num_weights=2 ,
bias=0.001*np.random.rand()):
self.learning_rate = learning_rate
self.weight = np.random.rand(num_weights)
self.bias = bias
def multiple_regression(self, features):
return (features@self.weight)+self.bias
def loss_fn(self, ground_truth, predictions):
return np.mean(np.square((ground_truth-predictions)))
def gradient_descent(self, features, ground_truth, predictions):
error = ground_truth - predictions
dW =[]
for col in features.columns:
dW.append(-2*np.mean((error*features[col])))
db = -2*np.mean(error)
return dW,db
def optimize_model_parametes(self, features, ground_truth,
predictions):
dW, db = self.gradient_descent(features, ground_truth,
predictions)
self.weight[0] += self.learning_rate * -dW[0]
self.weight[1] += self.learning_rate * -dW[1]
self.bias += self.learning_rate * -db
def fit(self, X, y_true, epochs=10, to_print=False):
history={'epoch':[],'loss':[]}
for epoch in range(epochs):
y_hat = self.multiple_regression(X)
loss = self.loss_fn(y_true, y_hat)
self.optimize_model_parametes(X, y_true, y_hat)
if to_print:
print('epoch:',epoch,'loss:',loss)
history['epoch'].append(epoch)
history['loss'].append(loss)
return history
def predict(self ,test_features):
y_hat = self.multiple_regression(test_features)
return y_hat
def get_model_coef(self):
return self.weight, self.bias
def evaluate(self,test_features,y_test):
y_hat = self.predict(test_features)
loss = self.loss_fn(y_test,y_hat)
return loss
Let's instantiate the multiple regression model that we created, then:
- make some predictions
- compare performance with the sklearn model
- plot the predictions
model1 = Multiple_reg()
history1 = model1.fit(X_train, y_train, epochs=1000)
Let's build some utility functions to help us plot the model's learning curve and print the learned coefficients.
# Helper functions

# LEARNING CURVE
def plot_learning_curve(model, history):
    model_coef, bias = model.get_model_coef()
    plt.figure(figsize=(8,5))
    plt.plot(history['loss'])
    plt.title(f'Learning curve # Learned Weights:{model_coef} and bias:{bias :.2f}')
    plt.xlabel('Epochs')
    plt.ylabel('Mean squared error')
    plt.show()
# PRINT MODEL COEFFICIENTS
def print_model_coef(model):
    model_coef, bias = model.get_model_coef()
    print(f'Actual Weight:{coef} and bias:{df["bias"][0]}')
    print(f'Learned Weight:{model_coef} and bias:{bias :.2f}')
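With these helpers in place, a quick usage sketch (using the model1 and history1 trained above) might look like this:
plot_learning_curve(model1, history1)
print_model_coef(model1)
Now let's use sklearn's LinearRegression and compare performance.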
from sklearn.linear_model import LinearRegression
sklearn_model = LinearRegression()
sklearn_model.fit(X_train, y_train)
print('Sklearn LinearModel coef Weight:',
sklearn_model.coef_,
'bias:',
sklearn_model.intercept_)
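As a rough comparison (a sketch, assuming the variables defined above), we can also evaluate both models on the held-out test set with mean squared error:
from sklearn.metrics import mean_squared_error

print('Our model MSE:    ', model1.evaluate(X_test, y_test))
print('Sklearn model MSE:', mean_squared_error(y_test, sklearn_model.predict(X_test)))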
Wow! Our simple model did a good job learning the parameters from the data.
Let's make some predictions:
# make predictions
from sklearn.metrics import mean_absolute_error, mean_squared_error

predictions = model1.predict(X_test)
print('MAE', mean_absolute_error(y_test, predictions))
print('MSE', mean_squared_error(y_test, predictions))
Plotting the predictions
Our final task before the wrap-up: visualizing the model predictions.
Humans are visual creatures, so let's put that to the test; half of the human brain is directly or indirectly devoted to processing visual information.
For plotting in 3D we need an X-axis, a Y-axis and a Z-axis.
We can use np.meshgrid together with our model to generate the axes required.
It is simpler to code than to explain in lengthy words, so let's jump straight to the code.
# utility functions for plotting predictions
def get_axis(x_axis,y_axis,model):
# datapoints
X = np.linspace(x_axis.min(), x_axis.max())
Y = np.linspace(y_axis.min(), y_axis.max())
# create a mesh grid
XX, YY = np.meshgrid(X, Y)
# z-axis
z = model.predict(pd.DataFrame(
{'f1':np.ravel(XX),
'f2':np.ravel(YY)}))
ZZ = np.reshape(z.values, XX.shape)
return XX,YY,ZZ
Lets create X-axis, Y-axis & Z-axis using the above utility
X, Y, Z = get_axis(X_test['feature1'],X_test['feature2'], model1)
Our 3D plots
- Surface plot
- Contour3D
- wireframe
fig = plt.figure(figsize=(20,8))

# SURFACE PLOT
ax = plt.subplot(1,3,1,projection='3d')
ax.scatter3D(X_test['feature1'], X_test['feature2'], predictions, marker='+', color='red' )
ax.plot_surface(X, Y, Z, cmap='viridis')
ax.set_title('surface plot');
# CONTOUR3D
ax = plt.subplot(1,3,2,projection='3d')
ax.scatter3D(X_test['feature1'], X_test['feature2'], predictions, marker='+', color='red' )
ax.contour3D(X, Y, Z, 50, cmap='binary');
ax.set_title('contour 3D plot');
# WIREFRAME
ax = plt.subplot(1,3,3,projection='3d')
ax.scatter3D(X_test['feature1'], X_test['feature2'], predictions, marker='+', color='red' )
ax.plot_wireframe(X, Y, Z, rcount=10, ccount=10, color='black')
ax.set_title('wireframe');

plt.show()