### Bayesian Decision Theory

#### Bayesian Classifier for a 2D Classification Problem:

The decision boundary is derived by solving the following equation:

$g_{ij}(x) = g_i(x) - g_j(x) = 0$

where $g(x)$ is the general form of the discriminant function for the (multivariate) normal density, defined as follows (based on the third case in Ch. 2 of Duda et al.):

$g_{i}(x) = x^t W_{i} x + w_{i}^t x + w_{i0}$

and,

$W_i = -\frac{1}{2}\Sigma^{-1}_i$

$w_i = \Sigma^{-1}_i \mu_i$

and the threshold

$w_{i0} = - \frac{1}{2} \mu_i^t \Sigma^{-1}_i \mu_i - \frac{1}{2} \ln |\Sigma_i| + \ln P(\omega_i)$

Note:

Solving the general form (above) also covers the special cases (i.e., when $\Sigma_i = \sigma^2 I$ or $\Sigma_i = \Sigma \; \forall i$).

In this case we assume equal priors, so the $\ln P(\omega_i)$ term is eliminated. (When $\Sigma_i = \Sigma$ for all $i$, the quadratic terms cancel in $g_{ij}(x)$ and the boundary becomes linear, as in the "identical covariances" example below.)

and

$\Sigma = \begin{bmatrix} \sigma_{11}&\sigma_{12}\\ \sigma_{21}&\sigma_{22} \end{bmatrix}$ is the covariance matrix of a training sample.

$\mu = \begin{bmatrix}\bar{x}_1\\ \bar{x}_2 \end{bmatrix}$ is the sample mean vector of a training sample.
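
As a quick numeric illustration, the minimal sketch below computes $W_i$, $w_i$, and $w_{i0}$ for one class directly with numpy, mirroring the formulas above (it reuses the first "duda" sample defined later in this notebook; any 2D sample would do):

import numpy as np

# one class's 2D training sample (rows are points); taken from the
# "duda" dataset defined further below
sample = np.array([[3, 8], [2, 6], [4, 6], [3, 4]], dtype=float)

mu = sample.mean(axis=0).reshape(2, 1)       # sample mean vector
cov = np.cov(sample.T, bias=True)            # sample covariance matrix
cov_inv = np.linalg.inv(cov)

W = -0.5 * cov_inv                           # quadratic term W_i
w = cov_inv @ mu                             # linear term w_i
# threshold w_i0 (equal priors assumed, so ln P(w_i) is dropped)
w0 = (-0.5 * mu.T @ cov_inv @ mu).item() - 0.5 * np.log(np.linalg.det(cov))

print(W)
print(w)
print(w0)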

## Concepts

IDEA:

In parametric estimation techniques, we use a training sample to estimate the parameters mean $\mu$ and variance $\sigma^2$ (or standard deviation $\sigma$) in order to estimate the overall distribution of some random variable.

That is, if we know $\mu$ and $\sigma$ of any (Gaussian) distribution, we can, by employing the properties of the Gaussian density, estimate the likelihood of any value that a random variable of that distribution can take.

Definitions:

pdf: The probability density function, or density, of a continuous random variable describes the relative likelihood of the variable taking on a given value; the probability that it falls within a range is given by the integral of its density over that range, $P(a \le X \le b) = \int_a^b f(x)\,dx$. [Wikipedia]

Mean: a weighted average of the possible values that the random variable $X$ can take; the expected value of a random variable is a measure of its central tendency.

Variance: measures how far a set of numbers or values is spread out; the variance of a random variable $X$ is the expected value of the squared deviation from its mean, $\mathrm{Var}(X) = E[(X - \mu)^2]$.

Standard deviation: the positive square root of the variance.
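
To make the idea concrete, here is a minimal sketch in plain numpy (the training sample is synthetic, drawn from a known Gaussian only so the estimates can be checked):

import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(loc=5.0, scale=2.0, size=100)  # synthetic 1D training sample

mu_hat = sample.mean()        # estimated mean
sigma_hat = sample.std()      # estimated standard deviation

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# estimated likelihood ("density") of observing the value 6.0
print(mu_hat, sigma_hat, gaussian_pdf(6.0, mu_hat, sigma_hat))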

## See it in practice with different example datasets using the code below:

In [4]:
__author__ = "A.Aziz Altowayan"
__email__ = "aa10212w@pace.edu"

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import sympy
from random import randrange
import seaborn as sns

In [5]:
class DecisionBoundary(object):
    """
    Input: class1 and class2 (each is the transpose of a 2D data-point matrix).
    Calculates and plots the decision boundary between two 2D distributions
    (binary classification).
    """

    def __init__(self, class1, class2):
        self.c1 = class1
        self.mu1 = np.array([[np.mean(self.c1[0])], [np.mean(self.c1[1])]])
        self.cov1 = np.cov(self.c1, bias=1)
        self.cov1_inv = np.linalg.inv(self.cov1)
        self.det1 = np.linalg.det(self.cov1)

        self.c2 = class2
        self.mu2 = np.array([[np.mean(self.c2[0])], [np.mean(self.c2[1])]])
        self.cov2 = np.cov(self.c2, bias=1)
        self.cov2_inv = np.linalg.inv(self.cov2)
        self.det2 = np.linalg.det(self.cov2)

    def Wi(self, c=0):
        """return: the matrix -1/2 * (inverse of the covariance matrix)"""
        if c == 1:
            return -0.5 * self.cov1_inv
        elif c == 2:
            return -0.5 * self.cov2_inv
        else:
            print("please choose a class: Wi(c=1) or Wi(c=2)")

    def wi(self, c=0):
        """return: the matrix (inverse of the covariance matrix) * mu"""
        if c == 1:
            return np.matrix(self.cov1_inv) * np.matrix(self.mu1)
        elif c == 2:
            return np.matrix(self.cov2_inv) * np.matrix(self.mu2)
        else:
            print("please choose a class: wi(c=1) or wi(c=2)")

    def threshold(self, c=0):
        """return: the threshold w_i0 defined above, without the prior term"""
        if c == 1:
            mu, inv, det = self.mu1, self.cov1_inv, self.det1
        elif c == 2:
            mu, inv, det = self.mu2, self.cov2_inv, self.det2
        else:
            print("please choose a class: threshold(c=1) or threshold(c=2)")
            return
        t = -0.5 * np.matrix(mu.T) * np.matrix(inv) * np.matrix(mu) - 0.5 * np.log(det)
        return float(t)

    # model (based on the sympy module)
    def disc_function(self, c=0):
        """return: sympy expression for the discriminant function
        g_i(x) = x^t * Wi * x + wi^t * x + wi0"""
        if c == 1:
            W, w, thresh = self.Wi(c=1), self.wi(c=1), self.threshold(c=1)
        elif c == 2:
            W, w, thresh = self.Wi(c=2), self.wi(c=2), self.threshold(c=2)
        else:
            print("please choose a class: disc_function(c=1) or disc_function(c=2)")
            return

        dimensions = range(1, len(W) + 1)
        x = sympy.Matrix(['x{}'.format(n) for n in dimensions])
        sym_matrix = (x.T * W * x) + (w.T * x)
        gx = sym_matrix[0] + thresh
        return gx

    def get_model(self, g, poly=0):
        """input: sympy expression g1(x) - g2(x)
        return: sympy expression solved for x2 (or for x1 when poly=1)"""
        self.poly = 1 if poly else 0

        if not poly:
            x2 = sympy.solve(g)[0]
            self.model = list(x2.values())[0]
        else:
            solutions = sympy.solve(g)
            x1 = solutions[1] if len(solutions) > 1 else solutions[0]
            self.model = list(x1.values())[0]
        return self.model

    def get_decision(self, x, model):
        """input: x1 values and the model (a sympy equation, e.g. x2 = 3*x1**2 - 2*x1)
        return: x2 values corresponding to x1"""
        expr = model  # .evalf(2)
        if not self.poly:
            # solved expression of the form x2 = ... (in terms of x1)
            x1 = sympy.symbols('x1')
            f = sympy.lambdify(x1, expr, "numpy")
        else:
            # solved expression of the form x1 = ... (in terms of x2)
            x2 = sympy.symbols('x2')
            f = sympy.lambdify(x2, expr, "numpy")
        return f(x)

    # Utility functions (plotting and data)
    def plot_boundary(self, x1, x2, density=0, lab1="class 1", lab2="class 2"):
        """input: x1 and x2 values
        optional: density flag to overlay kernel density estimates of the classes."""
        ax = plt.subplot()
        plt.title("Decision boundary between two classes")
        plt.scatter(self.c1[0], self.c1[1], c='r', label='{}'.format(lab1))
        plt.scatter(self.c2[0], self.c2[1], c='b', label='{}'.format(lab2))
        ax.plot(x1, x2, 'k--', lw=1, label='Decision boundary')

        if density:
            c1 = np.random.multivariate_normal(self.mu1.T[0], self.cov1, size=1000)
            sns.kdeplot(c1)
            c2 = np.random.multivariate_normal(self.mu2.T[0], self.cov2, size=1000)
            sns.kdeplot(c2)
        plt.legend(bbox_to_anchor=(1.5, 1))
        plt.show()

In [2]:
def get_data(case='random', x1=[10, 40], y1=[10, 50], x2=[10, 40], y2=[40, 90], n=10):
    """Return a 2D sample dataset for two classes (v1 and v2 for each class).
    case: dataset choice from below; 'random' is the default.
    """
    if case == 'random':
        # random data points; the classes are distributed between the
        # values x1, y1 (class 1) and x2, y2 (class 2)
        [x11, x12], [y11, y12] = x1, y1
        [x21, x22], [y21, y22] = x2, y2
        data1 = np.array([[randrange(x11, x12), randrange(y11, y12)] for _ in range(n)]).T
        data2 = np.array([[randrange(x21, x22), randrange(y21, y22)] for _ in range(n)]).T
        return data1, data2
    elif case == 1:
        # [case 1 in Duda] each class has the same proportional diagonal
        # covariance: Sigma_i = sigma^2 * I
        d1 = np.array([[3, 8], [2, 6], [4, 6], [3, 4]]).T
        d2 = np.array([[9, 24], [6, 18], [12, 18], [9, 12]]).T
        return d1, d2
    elif case == 2:
        # [case 2 in Duda] data points yield identical covariances for both classes
        d1 = np.array([[4, 8], [8, 8], [6, 10], [6, 6]]).T
        d2 = np.array([[14, 18], [18, 18], [16, 20], [16, 16]]).T
        return d1, d2
    elif case == "duda":
        # Duda Ch. 2, Exercise 1 dataset
        d1 = np.array([[3, 8], [2, 6], [4, 6], [3, 4]]).T
        d2 = np.array([[3, 0], [1, -2], [5, -2], [3, -4]]).T
        return d1, d2
    elif case == "buffalo":
        # HW2 of the Buffalo course
        d1 = np.array([[0, 0], [0, 1], [2, 2], [3, 1], [3, 2], [3, 3]]).T
        d2 = np.array([[6, 9], [8, 9], [9, 8], [9, 9], [9, 10], [8, 11]]).T
        return d1, d2
    elif case == "midterm":
        # CS855 midterm training samples for a three-class, 2D problem
        d1 = np.array([[0, 4], [-4, 0], [4, 0], [0, -4]]).T
        d2 = np.array([[-6, 17], [-10, 13], [-2, 13], [-6, 9]]).T
        d3 = np.array([[10, 14], [6, 10], [14, 10], [10, 6]]).T
        return d1, d2, d3
    else:
        print("invalid data case entry")


# Start from here

## To demo:

• Get and plot the training set.
• Show g(x) for each dataset sample [random | 1 'proportional cov' | 2 'identical cov' | "duda" | "buffalo" | "midterm"].
• Show the solution of gx1 - gx2.
• Plot the decision boundary.

Repeat with different datasets.

• The messed-up samples (with the densities plotted).
• Classifying 3 classes (midterm problem 2), using three instances of the DecisionBoundary class for the three pairwise classifiers: c1-c2, c2-c3, and c1-c3.
In [6]:
# w1, w2 = get_data(x1=[0,10],x2=[0,20],y1=[-50,0],y2=[-30,0]) # messed up samples
# w1, w2, w3 = get_data(case="midterm") #
w1, w2 = get_data(case="buffalo") # case=[random | 1 | 2 | "duda" | "buffalo" | "midterm"]
poly = 1
d = DecisionBoundary(w1, w2)
gx1 = d.disc_function(c=1)
gx2 = d.disc_function(c=2)
g = gx1 - gx2

if poly: model = d.get_model(g, poly=1)
else: model = d.get_model(g, poly=0)

In [8]:
gx1.evalf(2)

Out[8]:
$$x_{1} \left(- 0.56 x_{1} + 0.56 x_{2}\right) + 0.38 x_{1} + x_{2} \left(0.56 x_{1} - 1.1 x_{2}\right) + 1.3 x_{2} - 1.2$$
In [9]:
gx2.evalf(2)

Out[9]:
$$x_{1} \left(- 0.44 x_{1} - 0.028 x_{2}\right) + 7.7 x_{1} + x_{2} \left(- 0.028 x_{1} - 0.56 x_{2}\right) + 11.0 x_{2} - 83.0$$
In [10]:
# solution
model.evalf(2)

Out[10]:
$$4.8 x_{2} + 6.8 \cdot 10^{-17} \left(4.1 \cdot 10^{33} x_{2}^{2} - 7.9 \cdot 10^{34} x_{2} + 3.4 \cdot 10^{35}\right)^{0.5} - 30.0$$

## Plot Decision boundary

In [55]:
minr = min(min(w1[0]), min(w2[0]))  # optionally pad the plotting range here
maxr = max(max(w1[1]), max(w2[1]))
x = np.arange(minr, maxr, 0.1)
bound = d.get_decision(x, model)
print("cov1:\n{}\ncov2:\n{}".format(d.cov1, d.cov2))
if poly: d.plot_boundary(bound, x, density=0)
else: d.plot_boundary(x, bound, density=0)
if poly: d.plot_boundary(bound, x, density=1)
else: d.plot_boundary(x, bound, density=1)
model.evalf(3)

cov1:
[[ 1.80555556  0.91666667]
[ 0.91666667  0.91666667]]
cov2:
[[ 1.13888889 -0.05555556]
[-0.05555556  0.88888889]]


Out[55]:
$$4.83 x_{2} + 6.83 \cdot 10^{-17} \left(4.05 \cdot 10^{33} x_{2}^{2} - 7.93 \cdot 10^{34} x_{2} + 3.36 \cdot 10^{35}\right)^{0.5} - 30.0$$

### Midterm Problem 2: classifying 3 classes

By combining three classifiers
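
An equivalent way to combine them, sketched below, is to evaluate all three discriminant functions at a point and assign the label with the largest value (the argmax rule). The `classify` helper is a hypothetical illustration and is not used in the cells that follow, which instead plot the three pairwise boundaries:

import numpy as np
import sympy

def classify(point, discs):
    """point: (x1, x2); discs: the three sympy discriminant expressions g1, g2, g3.
    Returns the label (1, 2, or 3) of the class with the largest discriminant."""
    x1, x2 = sympy.symbols('x1 x2')
    scores = [float(g.subs({x1: point[0], x2: point[1]})) for g in discs]
    return int(np.argmax(scores)) + 1

# e.g., reusing the instances built in the next cell:
#   g1 = d.disc_function(c=1)    # discriminant of w1 (from DecisionBoundary(w1, w3))
#   g2 = d1.disc_function(c=1)   # discriminant of w2 (from DecisionBoundary(w2, w3))
#   g3 = d.disc_function(c=2)    # discriminant of w3
#   classify((0, 0), [g1, g2, g3])   # -> 1 (nearest to the w1 mean)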

In [6]:
w1, w2, w3 = get_data(case="midterm")

In [7]:
d = DecisionBoundary(w1, w3)
g = d.disc_function(c=1) - d.disc_function(c=2)
model = d.get_model(g, poly=1)

d1 = DecisionBoundary(w2, w3)
g1 = d1.disc_function(c=1) - d1.disc_function(c=2)
model_1 = d1.get_model(g1, poly=1)

d2 = DecisionBoundary(w2, w1)
g2 = d2.disc_function(c=1) - d2.disc_function(c=2)
model_2 = d2.get_model(g2, poly=1)

In [20]:
print("cov w1 == w2 == w3:\n{}".format(d.cov1))
print("eig-vals and eig-vecs:\n{}".format(np.linalg.eig(d.cov1)))

x = np.arange(1.5, 20, 0.1)
bound = d.get_decision(x, model)
x1 = np.arange(9, 30, 0.1)
bound1 = d1.get_decision(x1, model_1)
x3 = np.arange(0, 8.5, 0.1)
bound2 = d2.get_decision(x3, model_2)

plt.scatter(w1[0], w1[1], c='r', label='$w1\;points$')
plt.scatter(w2[0], w2[1], c='g', label='$w2\;points$')
plt.scatter(w3[0], w3[1], c='b', label='$w3\;points$')
plt.plot(x, bound, 'k--', lw=1, label='bound $w1-w3$')
plt.plot(bound1,x1, 'c--', lw=1, label='bound $w2-w3$')
plt.plot(bound2,x3, 'y--', lw=1, label='bound $w1-w2$')

plt.legend()
plt.show()

cov w1 == w2 == w3:
[[ 8.  0.]
[ 0.  8.]]
eig-vals and eig-vecs:
(array([ 8.,  8.]), array([[ 1.,  0.],
[ 0.,  1.]]))


In [66]:
# Duda exercise

Duda, ch.2 ex1:
[[ 0.5  0. ]
[ 0.   2. ]]
[[ 2.  0.]
[ 0.  2.]]


Out[66]:
$$0.1875 x_{1}^{2} - 1.125 x_{1} + 3.51$$
In [54]:
# Special cases: proportional covariance

proportional covariance:
[[ 0.5  0. ]
[ 0.   2. ]]
[[  4.5   0. ]
[  0.   18. ]]


Out[54]:
$$5.0 \cdot 10^{-8} \left(- 1.0 \cdot 10^{14} x_{2}^{2} + 9.0 \cdot 10^{14} x_{2} + 3.01 \cdot 10^{15}\right)^{0.5} + 2.25$$
In [57]:
# Special cases: identical covariances

identical covariances:
[[ 2.  0.]
[ 0.  2.]]
[[ 2.  0.]
[ 0.  2.]]


Out[57]:
$$- x_{2} + 24.0$$
In [61]:
# random dataset

random dataset:
[[  69.84   49.24]
[  49.24  136.49]]
[[  64.6   72.2]
[  72.2  224.8]]


Out[61]:
$$0.167 x_{2} + 4.04 \cdot 10^{-15} \left(3.74 \cdot 10^{28} x_{2}^{2} + 1.28 \cdot 10^{30} x_{2} - 8.78 \cdot 10^{31}\right)^{0.5} - 11.3$$
In [51]:
# Messed up dataset

Messed up dataset:
[[  10.05    5.7 ]
[   5.7   246.64]]
[[ 27.49   1.54]
[  1.54  82.24]]


Out[51]:
$$0.0256 x_{2} + 3.1 \cdot 10^{-17} \left(1.31 \cdot 10^{32} x_{2}^{2} + 2.64 \cdot 10^{33} x_{2} + 1.37 \cdot 10^{34}\right)^{0.5} + 3.27$$
In [27]:
# another random

Random dataset:
[[ 91.96 -54.14]
[-54.14  68.56]]
[[  92.61  -63.97]
[ -63.97  235.09]]


Out[27]:
$$- 1.77 x_{2} + 2.85 \cdot 10^{-15} \left(- 4.87 \cdot 10^{26} x_{2}^{2} + 1.81 \cdot 10^{30} x_{2} + 1.43 \cdot 10^{32}\right)^{0.5} + 50.4$$
In [31]:
# Nice, but why are the samples not completely classified?

In [30]:
# That's why! Plotting the distributions' densities makes the overlap at the boundary clear.