Univariate Linear Regression with mathematics in Python



In general, linear regression is an approach to modelling the relationship between a dependent variable and one or more independent variables. Linear regression can also be considered the next step up after correlation: it is a function that predicts the value of the dependent output variable based on the value of an independent variable. Univariate and multivariate regression represent two approaches to statistical analysis. Univariate analysis involves a single variable, while multivariate analysis examines two or more variables; most multivariate analyses involve a dependent variable and multiple independent variables. In this short article, we will focus on univariate linear regression and determine the relationship between one independent (explanatory) variable and one dependent variable. For example, given training data consisting of house sizes and their respective prices, we would like to predict the price based on the size of a house.

Part of the notes from the lecture course conducted by Andrew Ng.

In this work, a linear regression algorithm is implemented without using a machine learning library such as Scikit-Learn or TensorFlow. The dataset used in this simple experiment is obtained from Kaggle.

Overview of the dataset.

Using the given x and y point values, we will investigate the relationship between x and y. It is a typical univariate regression case, as it considers only one feature and one dependent variable. Although the relationship between x and y is obvious to spot, it is still a good example to demonstrate the application of univariate linear regression.


Import data and libraries

Firstly, we have to import some of the essential libraries.

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

The downloaded dataset file is named fabricated_points.csv. The name may differ if you rename the file after downloading it.

data = pd.read_csv("fabricated_points.csv")
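For a quick look at what was loaded, printing the first few rows should show the two columns, x and y, from the Kaggle file:

print(data.head())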

We assign the feature values and corresponding output values to X and Y arrays accordingly.

X = np.array(data['x']).reshape(-1,1)
Y = np.array(data['y']).reshape(-1,1)

The X array consists of the values from the x column and is then reshaped into a single column, as shown below.

The sample values of X array

Determine θ using the Normal Equation

The motive in linear regression is to minimize the cost function. Before we build a linear regression model from scratch, it would be great to have something to verify that our model is working correctly. One method to determine the parameter values that minimize the cost is the Normal Equation. It is an analytical way to solve for the weights (also called parameters) with a least squares cost function.
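For the least squares cost, the closed-form solution is:

θ = (XᵀX)⁻¹ Xᵀ Y

Note that we fit only a single feature here and do not add a bias column of ones to X, so θ is a single number (the slope of a line through the origin).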


We can directly find the value of θ without using gradient descent; the θ value is then multiplied with the X input, which should give the respective Y values. It is a time-saving method; however, it is only effective when the number of features is small.

ne_theta = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(Y)

The resulting ne_theta (θ) value is 1.0074256. We can scatter the actual data points and plot the linear relationship.

plt.scatter(X[:,0],Y[:,0])
plt.plot(X, X.dot(ne_theta), c='red')
plt.title('r=%f'%ne_theta[0,0])
plt.show()

Linear Regression and Gradient Descent

There are a few optimization algorithms for finding a local minimum in regression. Gradient descent is an iterative algorithm used to optimize the learning; its purpose is to minimize the value of the cost function. Now, let’s try gradient descent to optimize the cost function with some learning rate, assuming no regularization is taken into consideration. Recall that the parameters of our model are the theta (θ) values.

The hypothesis can be described with this typical linear equation:
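hθ(x) = θ0 + θ1·x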

The cost function is
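J(θ0, θ1) = (1/2m) · Σ (hθ(x_i) − y_i)²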

where m is the number of data points.

As a reminder, the objective of linear regression is to minimize the cost function. Thus, the goal is to minimize J(θ0, θ1), and we will fit the linear regression parameters to the data using gradient descent. Two main functions are defined, derivative_J_theta() and gradient_descent(), to perform the gradient descent algorithm.

Part of the notes from the lecture course conducted by Andrew Ng.
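In each iteration of gradient descent, both parameters are updated simultaneously using the partial derivatives of the cost, where α is the learning rate:

θ0 := θ0 − α · (1/m) · Σ (hθ(x_i) − y_i)
θ1 := θ1 − α · (1/m) · Σ (hθ(x_i) − y_i) · x_i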

As shown in the code below, the computation mainly uses the equations expressed mathematically above. The derivative_J_theta() function computes the updated θ0 and θ1. Next, we measure the accuracy or loss of our hypothesis function using the cost function.

Part of the notes from the lecture course conducted by Andrew Ng.
def derivative_J_theta(x, y, theta_0, theta_1, learning_rate):
    # accumulate the partial derivatives of J(theta_0, theta_1)
    delta_J_theta0 = 0
    delta_J_theta1 = 0

    for i in range(len(x)):
        delta_J_theta0 += (((theta_1 * x[i]) + theta_0) - y[i])
        # the theta_1 term is additionally scaled by 1/m inside the loop,
        # which in effect gives the slope a much smaller step size than the intercept
        delta_J_theta1 += (1 / x.shape[0]) * (((theta_1 * x[i]) + theta_0) - y[i]) * x[i]

    # simultaneous update of both parameters
    temp0 = theta_0 - (learning_rate * ((1 / x.shape[0]) * delta_J_theta0))
    temp1 = theta_1 - (learning_rate * ((1 / x.shape[0]) * delta_J_theta1))

    return temp0, temp1

def gradient_descent(x, y, learning_rate, starting_theta_0, starting_theta_1, iteration_num):
    # keep the parameter and cost history for every iteration
    store_theta_0 = np.empty([iteration_num])
    store_theta_1 = np.empty([iteration_num])
    store_j_theta = np.empty([iteration_num])

    theta_0 = starting_theta_0
    theta_1 = starting_theta_1

    for i in range(iteration_num):
        theta_0, theta_1 = derivative_J_theta(x, y, theta_0, theta_1, learning_rate)
        store_theta_0[i] = theta_0[0]
        store_theta_1[i] = theta_1[0]
        # cost J(theta_0, theta_1) = (1/(2m)) * sum((h(x) - y)^2)
        store_j_theta[i] = (1 / (2 * x.shape[0])) * np.sum((((theta_1 * x) + theta_0) - y) ** 2)

    return theta_0, theta_1, store_theta_0, store_theta_1, store_j_theta
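As an optional sanity check, we could also wrap the cost into a small standalone helper (a minimal sketch for illustration), so we can confirm that J(θ0, θ1) keeps decreasing as the parameters are updated:

def compute_cost(x, y, theta_0, theta_1):
    # J(theta_0, theta_1) = (1/(2m)) * sum((h(x) - y)^2)
    m = x.shape[0]
    predictions = (theta_1 * x) + theta_0
    return np.sum((predictions - y) ** 2) / (2 * m)

Calling it before and after training, for example compute_cost(X, Y, 0, 0) versus compute_cost(X, Y, theta_0, theta_1), should show a clear drop in the cost value.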

We can now train the model with a small number of iterations first and observe the result.

x = X
y = Y
learning_rate = 0.01
iteration_num = 10
starting_theta_0 = 0
starting_theta_1 = 0

theta_0, theta_1, store_theta_0, store_theta_1, store_j_theta = gradient_descent(x, y, learning_rate, starting_theta_0, starting_theta_1, iteration_num)

print("theta_0 : %f" % theta_0[0])
print("theta_1 : %f" % theta_1[0])

The values we obtain for θ0 and θ1 are 3.0219 and 0.6846 respectively.

Let’s plot the line to see how well the hypothesis fits our data.

plt.scatter(X[:,0],Y[:,0])
plt.plot(X,(theta_1 * X) + theta_0, c='green')
plt.plot(X, X.dot(ne_theta), c='red')
plt.title('r=%f'%ne_theta[0,0])
plt.show()

The green line indicates our prediction, while the red line is the normal equation solution.

Almost there! It seems that more iterations will generate better results. We will increase the iteration number to 100.

iteration_num = 100
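Assuming the same x, y, learning rate, and starting values as before, we simply call gradient_descent() again with the larger iteration count and print the new parameters:

theta_0, theta_1, store_theta_0, store_theta_1, store_j_theta = gradient_descent(x, y, learning_rate, starting_theta_0, starting_theta_1, iteration_num)

print("theta_0 : %f" % theta_0[0])
print("theta_1 : %f" % theta_1[0])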

The θ0 value is now 3.3572 and θ1 is 0.9560. Let’s plot the graph again.

Great! The newly trained parameters θ0 and θ1 are both optimized, and the line almost aligns with the best-fit red line. This indicates that our gradient descent algorithm is working well.

From this example, we can clearly understand the mathematical fundamentals behind the univariate linear regression algorithm, which can be very useful for performing prediction in machine learning applications.

Additional information about the differences between gradient descent and the Normal Equation is summarized in the short notes below.

Part of the notes from the lecture course conducted by Andrew Ng.

You are welcome to discuss more about this simple yet useful analysis technique.

Thank you for taking the time to read this article.

Here is the related repository:

https://github.com/EeYeoKeat/Machine_Learning_Mathematics/blob/master/Linear_Regression/Univariate_Linear_Regression.ipynb

