### Fitting a simple linear regression line

In a simple linear regression, the best-fit line is:

$$y = a + bx$$

where $\;\; b = \dfrac{ \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y} }{\sum_{i=1}^{n} x_i^2 - n \bar{x}^2}$ and $\;\;a = \bar{y} - b\bar{x}$

$\bar{x}$ and $\bar{y}$ are the means of the $x$ and $y$ values, respectively.

A small example:

In [1]:
import numpy as np

In [2]:
points = np.array([
    [0, 0],
    [5, 7],
    [10, 10],
    [15, 13],
    [20, 20]])

In [3]:
x = points[:, 0]
y = points[:, 1]


The least-squares solution:

In [4]:
A = np.vstack([x, np.ones(len(x))]).T   # columns: [x, 1], so the solution is [b, a]
b, a = np.linalg.lstsq(A, y, rcond=None)[0]
print('y = {} + {}x'.format(a, b))

y = 0.8 + 0.92x
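As a sanity check, the closed-form formulas above give the same coefficients when computed directly (a self-contained sketch using the same data):

```python
import numpy as np

x = np.array([0, 5, 10, 15, 20])
y = np.array([0, 7, 10, 13, 20])
n = len(x)

# b = (sum(x_i*y_i) - n*mean(x)*mean(y)) / (sum(x_i^2) - n*mean(x)^2)
b = (np.sum(x * y) - n * x.mean() * y.mean()) / (np.sum(x**2) - n * x.mean()**2)
a = y.mean() - b * x.mean()
print('a = {}, b = {}'.format(a, b))   # a ≈ 0.8, b ≈ 0.92, matching lstsq
```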


Plot the line:

In [5]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.plot(x, y, 'o', label='Original data')
plt.plot(x, b*x + a, label='Fitted line')
plt.legend(bbox_to_anchor=(1.5, 1))
plt.grid()
plt.show()


Compute the error:

$error_i = y_i - \hat{y}_i$

where $y_i$ is the actual value and $\hat{y}_i$ is the predicted one.

The sum of squared errors (SSE):

$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

In [6]:
error = lambda x, y: (y - (a + b * x))**2   # squared error of a single point
sse = sum(error(i, j) for i, j in zip(x, y))
print('SSE = {}'.format(sse))

SSE = 6.4
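The same SSE can also be computed in one vectorized NumPy expression (a sketch that hard-codes the coefficients fitted above, a = 0.8 and b = 0.92):

```python
import numpy as np

x = np.array([0, 5, 10, 15, 20])
y = np.array([0, 7, 10, 13, 20])
a, b = 0.8, 0.92                 # coefficients fitted above

residuals = y - (a + b * x)      # y_i - ŷ_i for every point at once
sse = np.sum(residuals**2)
print('SSE = {}'.format(sse))    # ≈ 6.4
```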


The error for each point:

In [7]:
for i, j in zip(x, y):
    print('point: {}\terror: {:.2f}'.format((i, j), np.sqrt(error(i, j))))

point: (0, 0)	error: 0.80
point: (5, 7)	error: 1.60
point: (10, 10)	error: 0.00
point: (15, 13)	error: 1.60
point: (20, 20)	error: 0.80
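NumPy also offers `np.polyfit`, which fits the same line in a single call; for a degree-1 fit it returns the slope first, then the intercept (a self-contained sketch with the same data):

```python
import numpy as np

x = np.array([0, 5, 10, 15, 20])
y = np.array([0, 7, 10, 13, 20])

b, a = np.polyfit(x, y, 1)   # degree-1 fit returns [slope, intercept]
print('y = {:.2f} + {:.2f}x'.format(a, b))
```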


In [8]:
!whoami && date

Aziz
Mon Dec 14 23:02:58 EST 2015