General Regression: Least Squares Approximation

This notebook fits a quadratic function to data by least squares, using the pseudoinverse of a matrix $A$ built from the input variable $x$.

Given $n$ data points $(x_i, y_i)$, the matrix $A$ is generated from $x$ as follows:

$A = \begin{bmatrix} 1 & x_1 & x_1^2 \\ \vdots & \vdots & \vdots \\ 1 & x_n & x_n^2 \end{bmatrix}$
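As a minimal sketch of this construction, the matrix can be assembled column by column with NumPy. The input values here are the ones used in Example 1 below; `np.vander` with `increasing=True` is shown only as a cross-check and is not what the notebook itself uses.

```python
import numpy as np

# Input values taken from Example 1 of this notebook
x = np.array([-1, 1, 2, 3, 5], dtype=float)

# One row [1, x_i, x_i**2] per data point
A = np.column_stack([np.ones_like(x), x, x**2])

# Cross-check: a Vandermonde matrix with increasing powers has the same columns
A_vander = np.vander(x, N=3, increasing=True)

print(A)
print(np.allclose(A, A_vander))
```

Building the column of ones explicitly (rather than computing it from `x`) keeps the construction valid when some $x_i$ is zero.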

and, provided the columns of $A$ are linearly independent (so that $A^T A$ is invertible), the pseudoinverse $A^+$ is calculated as

$A^+ = (A^T A)^{-1} A^T$
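For context, here is a short sketch of where this formula comes from. It assumes $A^T A$ is invertible and writes $c$ for the coefficient vector $(c_1, c_2, c_3)^T$ and $E(c)$ for the squared error; the symbol $E$ is introduced here only for illustration. Setting the gradient of the squared error to zero gives the normal equations, whose solution is $A^+ y$:

```latex
\begin{aligned}
E(c) &= \lVert A c - y \rVert^2 \\
\nabla E(c) &= 2 A^T (A c - y) = 0 \\
A^T A \, c &= A^T y \\
c &= (A^T A)^{-1} A^T y = A^+ y
\end{aligned}
```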

Then multiply $A^+$ by $y$ to get the coefficient vector $c = A^+ y = (c_1, c_2, c_3)^T$ of the least-squares approximation:

$F(x) = c_1 + c_2 x + c_3 x^2$
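The whole procedure can be checked on a small example. The sketch below uses hypothetical data that lies exactly on $F(x) = 1 + 2x + 3x^2$, so the recovered coefficients should be close to 1, 2 and 3. It also compares the explicit formula with `np.linalg.pinv` and `np.linalg.lstsq`, which are offered here as alternatives that avoid forming the inverse directly; the notebook itself uses the explicit formula.

```python
import numpy as np

# Hypothetical data lying exactly on F(x) = 1 + 2x + 3x**2
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = 1 + 2 * x + 3 * x**2

# Matrix A with rows [1, x_i, x_i**2]
A = np.column_stack([np.ones_like(x), x, x**2])

# Explicit formula from the text: A+ = (A^T A)^-1 A^T
A_plus = np.dot(np.linalg.inv(np.dot(A.T, A)), A.T)
c_formula = np.dot(A_plus, y)

# Library alternatives that do not invert A^T A explicitly
c_pinv = np.dot(np.linalg.pinv(A), y)
c_lstsq = np.linalg.lstsq(A, y, rcond=None)[0]

print(c_formula)
print(np.allclose(c_formula, c_pinv), np.allclose(c_formula, c_lstsq))
```

When $A^T A$ is close to singular, the library routines are generally the more numerically robust choice.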

In [1]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

In [2]:
def approximate(x, y):
    """Fit F(x) = c1 + c2*x + c3*x**2 by least squares and plot the result."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # matrix A from x: one row [1, x_i, x_i**2] per data point
    # (np.ones_like replaces x/x, which divides by zero when some x_i == 0)
    a = np.array([np.ones_like(x), x, x**2]).T

    # A+: the pseudoinverse of A
    a_plus = np.dot(np.linalg.inv(np.dot(a.T, a)), a.T)

    # multiply A+ by y to obtain the coefficient vector
    coefficients = np.dot(a_plus, y)

    c1, c2, c3 = coefficients

    # the result
    fx = 'F(x) = {:.2f} + {:.2f} x + {:.2f} x**2'.format(c1, c2, c3)
    print('The approximated solution:\n{}'.format(fx))

    # sample the prediction curve on a range of x values
    x_range = np.arange(x.min() - 1, x.max() + 1, 0.2)
    prediction = c1 + c2 * x_range + c3 * x_range**2

    # plot the data points and the prediction
    plt.plot(x, y, 'o')
    plt.plot(x_range, prediction)
    plt.xlim(x.min() - 2, x.max() + 2)
    plt.ylim(y.min() - 2, y.max() + 2)
    plt.grid()
    plt.show()


Example 1

In [3]:
# data points
x = np.array([-1, 1, 2, 3, 5])  # input variable
y = np.array([2, 1, 1, 0, 3])   # outcome

approximate(x,y)

The approximated solution:
F(x) = 1.20 + -0.76 x + 0.21 x**2
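As an independent cross-check of this result, the same data can be passed to `np.polyfit`, which also performs a least-squares polynomial fit. This is a sketch for verification only, not part of the notebook's method; note that `np.polyfit` returns the highest-degree coefficient first, so the result is reversed to match the order $c_1, c_2, c_3$.

```python
import numpy as np

# Data points from Example 1
x = np.array([-1, 1, 2, 3, 5])
y = np.array([2, 1, 1, 0, 3])

# Degree-2 least-squares fit; reverse to get (c1, c2, c3)
c1, c2, c3 = np.polyfit(x, y, 2)[::-1]

print('F(x) = {:.2f} + {:.2f} x + {:.2f} x**2'.format(c1, c2, c3))
```

The coefficients agree with the ones printed above: solving the normal equations by hand gives $c_1 = 6/5$, $c_2 = -53/70$ and $c_3 = 3/14$.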


Example 2

In [4]:
x = np.array([2,0,-2,-1,1])
y = np.array([2,0,-2,1,-1])

approximate(x,y)

The approximated solution:
F(x) = 0.00 + 0.60 x + 0.00 x**2
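This result can be verified by hand from the normal equations $A^T A \, c = A^T y$. For this data the sums are $n = 5$, $\sum x_i = 0$, $\sum x_i^2 = 10$, $\sum x_i^3 = 0$, $\sum x_i^4 = 34$, $\sum y_i = 0$, $\sum x_i y_i = 6$ and $\sum x_i^2 y_i = 0$, so the system decouples and the best quadratic fit reduces to a straight line through the origin:

```latex
\begin{gathered}
A^T A = \begin{bmatrix} 5 & 0 & 10 \\ 0 & 10 & 0 \\ 10 & 0 & 34 \end{bmatrix},
\qquad
A^T y = \begin{bmatrix} 0 \\ 6 \\ 0 \end{bmatrix} \\
10\, c_2 = 6 \;\Rightarrow\; c_2 = 0.6 \\
5\, c_1 + 10\, c_3 = 0, \quad 10\, c_1 + 34\, c_3 = 0 \;\Rightarrow\; c_1 = c_3 = 0
\end{gathered}
```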


Example 3

In [5]:
# speed (mph) vs. stopping distance (ft)
cars = np.loadtxt("../cars.csv", delimiter=",")
x = cars.T[0]
y = cars.T[1]
approximate(x,y)

The approximated solution:
F(x) = 2.47 + 0.91 x + 0.10 x**2

In [6]:
!whoami && date

Aziz
Wed Dec 16 11:15:03 EST 2015