Simple Linear Regression

Ajay Krishnan
6 min read · Nov 18, 2021


In Machine Learning, linear regression is a linear approach for modelling the relationship between a scalar target variable and one or more explanatory feature variables. The case of one explanatory variable is called simple linear regression; for more than one, the process is called multiple linear regression.

A simple linear regression model assumes a linear relationship between the input variable (x) and the single target output (y).

Linear regression is used to predict an output on a continuous spectrum.
For example: predicting salary based on years of experience, or predicting fuel economy based on the horsepower of a vehicle.
The output of linear regression is not bounded; it can take any real value.

Linear Regression

Dependent variable = y-intercept + slope * independent variable
or
Target variable = y-intercept + slope * feature variable
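
For instance, with a hypothetical y-intercept of 30,000 and slope of 5,000 (numbers chosen purely for illustration), a salary model would read:

salary = 30000 + 5000 * years_of_experience

so four years of experience would predict a salary of 30000 + 5000*4 = 50,000.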

In this tutorial we’ll look at how to implement linear regression from scratch using Python, and we’ll also compare our model with sklearn.linear_model.LinearRegression.

Linear Regression from scratch using python

Step 1 : We need some dummy data

import matplotlib.pyplot as plt
from sklearn.datasets import make_regression

# generate a one-feature regression dataset with a known coefficient
X, y, coef = make_regression(n_samples=1000,
                             n_features=1,
                             n_informative=1,
                             noise=20.0,
                             bias=1.0,
                             coef=True,
                             random_state=42)
# plot the data
plt.scatter(X, y, marker='+', c='seagreen');

Step 2 : Create a DataFrame

import pandas as pd

df = pd.DataFrame(data={
    'feature': X.reshape(-1),
    'target': y,
    'weight': coef,
    'bias': 1.0,
    'y=Wx+b': (X.reshape(-1)*coef)+1.0})
df.head()
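
Because we know the true weight and bias, we can sanity-check the dummy data: the gap between the noisy target and the clean line y=Wx+b should roughly match the noise level passed to make_regression. This check is an extra step, assuming the df built above:

# the residual standard deviation should be close to noise=20.0
residual = df['target'] - df['y=Wx+b']
print('noise std estimate:', residual.std())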

Step 3 : Split data into train and test

## split the data
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X.reshape(-1),
                                                    y,
                                                    test_size=0.2,
                                                    random_state=42)
print('Shape of training feature', X_train.shape)
print('Shape of training target', y_train.shape)
print('Shape of testing feature', X_test.shape)
print('Shape of testing target', y_test.shape)
Data splits

Step 4 : Weight and Bias

y = Wx + b

W = weights
x = features
b = bias

Here in this tutorial we have a single feature X and a target variable y.
Let’s randomly assign a weight and bias using NumPy.

import numpy as np

# this method will give us random floats as weight and bias
def get_weight_and_bias():
    weights = np.random.rand()
    bias = 0.01*np.random.rand()
    return weights, bias

Step 5 : Modeling

We now have our features X_train, our target variable y_train, and a method that returns random floats for the weight and bias. Now let’s create a method that performs linear regression given features, a weight, and a bias.

def linear_regression(features, weights, bias):
    y_hat = (features*weights) + bias
    return y_hat

Step 6 : Loss function

Loss functions play an important role in any machine learning model:
they define the objective against which the performance of the model is evaluated, and the parameters learned by the model are determined by minimizing the chosen loss function.

Let’s define mean squared error (MSE) as the loss function for this regression problem.

# Mean Squared Error
def loss_fn(ground_truth, predictions):
    return np.mean(np.square(ground_truth - predictions))
    # returns (1/len(ground_truth))*np.sum((ground_truth - predictions)**2)

Step 7 : Gradient descent

Gradient descent is a first-order iterative optimization algorithm for finding a local minimum of a differentiable function.

We use mean squared error as the loss function:

MSE = (1/N) * Σ (y_true - y_pred)²

where y_pred = Wx + b (linear regression), so the error is

error = y_true - (Wx + b)

Derivative of the loss with respect to the weights:
dW = (-2/N) * Σ (error * x)

Derivative of the loss with respect to the bias:
db = (-2/N) * Σ (error)

We want to move in the negative direction of the gradient to minimize the loss.

Updating the weights and bias:

W(k+1) = W(k) + (learning_rate * -dW)

b(k+1) = b(k) + (learning_rate * -db)

def gradient_descent(features, ground_truth, predictions):
    # difference btw true-values and predicted-values is the error
    error = ground_truth - predictions
    # derivative of loss wrt weights
    dW = -2*np.mean(error*features)
    # derivative of loss wrt bias
    db = -2*np.mean(error)
    return dW, db
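
As a quick sanity check (an extra step, not required for the tutorial), we can compare this analytic gradient with a finite-difference estimate of the loss; the starting values W0 and b0 below are arbitrary:

# estimate dLoss/dW numerically and compare with gradient_descent
W0, b0, eps = 0.5, 0.0, 1e-6
y_hat = linear_regression(X_train, W0, b0)
dW, db = gradient_descent(X_train, y_train, y_hat)
loss_plus = loss_fn(y_train, linear_regression(X_train, W0 + eps, b0))
loss_minus = loss_fn(y_train, linear_regression(X_train, W0 - eps, b0))
# the analytic and numeric values should agree closely
print('analytic dW:', dW, 'numeric dW:', (loss_plus - loss_minus)/(2*eps))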

Step 8 : Training a Linear model

def model_training(epochs, W, b, learning_rate=1e-3):
    for i in range(epochs):
        # get predictions
        y_hat = linear_regression(X_train, weights=W, bias=b)
        # compute loss
        loss = loss_fn(y_train, y_hat)

        # optimize the model parameters
        dW, db = gradient_descent(X_train, y_train, y_hat)
        # update the weights and bias
        W = W + (learning_rate*-dW)
        b = b + (learning_rate*-db)

        # print the loss
        print('epoch:', i, ' loss:', loss)

# Model Training -----------------------

w, b = get_weight_and_bias()
# lets train for just 10 epochs and check the loss
model_training(10, w, b)
Model Training

We can see that the loss decreases as the number of epochs increases. This is good: we have implemented linear regression in Python.

Complete code for Linear model

class Linear_reg:
    def __init__(self, learning_rate=1e-3, weight=np.random.rand(),
                 bias=0.001*np.random.rand()):
        self.learning_rate = learning_rate
        self.weight = weight
        self.bias = bias

    def linear_regression(self, features):
        return (features*self.weight) + self.bias

    def loss_fn(self, ground_truth, predictions):
        return np.mean(np.square(ground_truth - predictions))

    def gradient_descent(self, features, ground_truth, predictions):
        error = ground_truth - predictions
        dW = -2*np.mean(error*features)
        db = -2*np.mean(error)
        return dW, db

    def optimize_model_parameters(self, features, ground_truth, preds):
        dW, db = self.gradient_descent(features, ground_truth, preds)
        self.weight += self.learning_rate * -dW
        self.bias += self.learning_rate * -db

    def fit(self, X, y_true, epochs=10, to_print=False):
        history = {'epoch': [], 'loss': []}
        for epoch in range(epochs):
            y_hat = self.linear_regression(X)
            loss = self.loss_fn(y_true, y_hat)
            self.optimize_model_parameters(X, y_true, y_hat)
            if to_print:
                print('epoch:', epoch, 'loss:', loss)
            history['epoch'].append(epoch)
            history['loss'].append(loss)
        return history

    def predict(self, test_features):
        y_hat = self.linear_regression(test_features)
        return y_hat

    def get_model_coef(self):
        return self.weight, self.bias

    def evaluate(self, test_features, y_test):
        y_hat = self.predict(test_features)
        loss = self.loss_fn(y_test, y_hat)
        return loss

Let’s instantiate the model and make some predictions after training.
To understand the performance of the model, let’s first create some utility functions:
one to plot the learning curve and one to plot the predictions.

# utility to plot the learning curve
def plot_learning_curve(model, history):
    model_coef = model.get_model_coef()
    plt.plot(history['loss'])
    plt.title(f'Learning curve # Learned Weight:{model_coef[0]:.2f} and bias:{model_coef[1]:.2f}')
    plt.xlabel('Epochs')
    plt.ylabel('Mean squared error')
    plt.show()

# utility to plot the predictions
def plot_model_predictions(model, X_test=X_test, y_test=y_test):
    model_predictions = model.predict(X_test)
    plt.scatter(X_test, y_test, marker='*', color='seagreen')
    plt.scatter(X_test, model_predictions, marker='+', color='salmon')
    plt.plot(X_test, model_predictions)
    plt.title('Model predictions on test set')
    plt.show()

Machine Learning is all about experiments

model1 = Linear_reg()
history1 = model1.fit(X_train,
                      y_train,
                      epochs=1000)
plot_learning_curve(model1, history1)
plot_model_predictions(model1)

Now let’s check the model parameters: the weight and bias.

def print_model_coef(model_coef):
    print(f'Actual Weight:{coef :.2f} and bias:{df["bias"][0]}')
    print(f'Learned Weight:{model_coef[0] :.2f} and bias:{model_coef[1] :.2f}')

# best parameters learned by the model
model1_coef = model1.get_model_coef()
print_model_coef(model1_coef)
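
We can also score the trained model on the held-out test set, using the evaluate method defined in the class above:

# mean squared error on data the model has never seen
test_loss = model1.evaluate(X_test, y_test)
print('Test MSE:', test_loss)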

Now let’s compare the model we built from scratch with the sklearn linear model.

# Lets use sklearn Linear model and check the coef
from sklearn.linear_model import LinearRegression

sklearn_model = LinearRegression()
sklearn_model.fit(X_train.reshape(-1,1), y_train)
print('Sklearn LinearModel coef Weight:', sklearn_model.coef_,
      'bias:', sklearn_model.intercept_)
sklearn model coef and intercept
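
To compare more than just the coefficients, a short sketch like the following (using sklearn.metrics.mean_squared_error) puts both models’ test-set errors side by side:

from sklearn.metrics import mean_squared_error

# predictions from the sklearn model on the same test set
sk_preds = sklearn_model.predict(X_test.reshape(-1, 1))
print('sklearn test MSE:', mean_squared_error(y_test, sk_preds))
print('scratch test MSE:', model1.evaluate(X_test, y_test))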

Next steps

Standardize the data
It’s good practice to normalize features that have different scales and ranges.
This is important because the features are multiplied by the model weights, so the scale of the outputs and the scale of the gradients are affected by the scale of the inputs.
We can use StandardScaler to scale the features to zero mean and unit variance, or we can do it ourselves: compute the mean and standard deviation of the data, then for each data point subtract the mean and divide by the standard deviation, as in the sketch below.
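
A minimal sketch of both options, assuming the X_train and X_test arrays from earlier (StandardScaler expects 2-D input, hence the reshape):

from sklearn.preprocessing import StandardScaler

# option 1: sklearn's StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train.reshape(-1, 1))
X_test_scaled = scaler.transform(X_test.reshape(-1, 1))  # reuse training statistics

# option 2: by hand, using statistics from the training set only
mean, std = X_train.mean(), X_train.std()
X_train_manual = (X_train - mean) / std
X_test_manual = (X_test - mean) / std

Note that the mean and standard deviation come from the training set only, so no information from the test set leaks into training.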

Hyperparameter tuning
In machine learning and statistics, the learning rate is a tuning parameter in an optimization algorithm that determines the step size at each iteration while moving toward a minimum of a loss function.
We can change the learning rate of a model to see whether it improves learning; a small experiment is sketched below.
In setting a learning rate, there is a trade-off between the rate of convergence and overshooting.
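
As a minimal sketch of such an experiment, using the Linear_reg class defined above (the candidate learning rates are arbitrary choices, not recommendations):

# compare the final training loss for a few candidate learning rates
for lr in [1e-1, 1e-2, 1e-3, 1e-4]:
    model = Linear_reg(learning_rate=lr)
    history = model.fit(X_train, y_train, epochs=100)
    print(f'learning rate: {lr} -> final loss: {history["loss"][-1]:.2f}')

A very large learning rate can overshoot the minimum and make the loss diverge, while a very small one converges slowly; a sweep like this makes that trade-off visible.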

Resources

Colab notebook click for notebook
