Bayesian Decision Theory

Bayesian Classifier for 2D Classification Problem:

The decision boundary is derived by solving the following equation for the discriminant functions:

$ g_{ij}(x) = g_i(x) - g_j(x) = 0$

Where \(g_i(x)\) is the general form of the discriminant function for a (multivariate) normal density, defined as follows (based on case 3 in Ch. 2 of Duda et al.):

\[ g_{i}(x) = x^t W_{i} x + w_{i}^t x + \omega_{0i}\]

and,

\[W_i = -\frac{1}{2}\Sigma^{-1}_i\]

\[w_i = \Sigma^{-1}_i \mu_i\]

and the threshold term

\[ \omega_{0i} = - \frac{1}{2} \mu_i^t \Sigma^{-1}_i \mu_i - \frac{1}{2} \ln |\Sigma_i| + \ln P(\omega_i) \]

Note:

Solving the general form above also covers the other scenarios (i.e. when \(\Sigma_i = \sigma^2 I\) or \(\Sigma_i = \Sigma \; \forall i\)).

In this case, we assume equal priors, so the \(\ln P(\omega_i)\) term cancels.
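
For example, when \(\Sigma_i = \Sigma\) for all classes (and with equal priors), the quadratic term \(x^t W_i x\) and the \(\ln|\Sigma_i|\) term are identical for every class and cancel in the difference, so the boundary reduces to a linear equation:

\[ g_{ij}(x) = \left(\Sigma^{-1}(\mu_i - \mu_j)\right)^t x - \frac{1}{2}\left(\mu_i^t \Sigma^{-1} \mu_i - \mu_j^t \Sigma^{-1} \mu_j\right) = 0 \]

This is the linear boundary that appears in the "identical covariances" example further below.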

and

\(\Sigma = \begin{bmatrix} \sigma_{11}&\sigma_{12}\\ \sigma_{21}&\sigma_{22} \end{bmatrix}\) is the covariance matrix of a training sample.

\(\mu = \begin{bmatrix}\bar{x}_1\\ \bar{x}_2 \end{bmatrix}\) is the sample mean vector of a training sample.
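
To make the mapping from these formulas to code concrete, here is a minimal NumPy sketch (using the first "duda" sample defined further below as a toy input) of computing \(W_i\), \(w_i\) and \(\omega_{0i}\) for one class; the DescisionBoundary class below does the same for both classes:

import numpy as np

# one class's training sample: rows are points, columns are the two features
sample = np.array([[3, 8], [2, 6], [4, 6], [3, 4]], dtype=float)

mu_i = sample.mean(axis=0).reshape(-1, 1)    # sample mean vector
cov_i = np.cov(sample.T, bias=True)          # biased sample covariance, as in the class below
cov_inv = np.linalg.inv(cov_i)

W_i = -0.5 * cov_inv                         # W_i = -1/2 * Sigma_i^-1
w_i = cov_inv @ mu_i                         # w_i = Sigma_i^-1 * mu_i
w_i0 = (-0.5 * mu_i.T @ cov_inv @ mu_i       # omega_0i; the ln P(omega_i) term is dropped
        - 0.5 * np.log(np.linalg.det(cov_i))).item()   # under the equal-priors assumption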

Concepts

IDEA:

In parametric estimation techniques, we use a training sample to estimate the parameters, the mean \(\mu\) and variance \(\sigma^2\) (or standard deviation \(\sigma\)), in order to estimate the overall distribution of some random variable.

That is, if we know \(\mu\) and \(\sigma\) of a distribution (and assume it is Gaussian), we can estimate the density, i.e. the likelihood, of any value that a random variable from that distribution can take.
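
A minimal sketch of this idea, assuming a univariate Gaussian and a made-up 1-D sample:

import numpy as np

sample = np.array([4.2, 3.9, 5.1, 4.7, 4.4])   # hypothetical 1-D training sample
mu, sigma = sample.mean(), sample.std()        # parameter estimates

def gaussian_pdf(x, mu, sigma):
    """Density (likelihood) of x under a normal distribution N(mu, sigma^2)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

print(gaussian_pdf(4.0, mu, sigma))            # estimated likelihood of a new observation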

Definitions:

pdf: The probability density function, or density, of a continuous random variable is a function that describes the relative likelihood for this random variable to take on a given value. The probability of the variable falling within a range is given by the integral of its density over that range. [Wikipedia]

Mean: a weighted average of the possible values that the random variable X can take; the expected value of a random variable is its measure of central tendency.

Variance: measures how far a set of numbers or values is spread out; the variance of the random variable X is the expected value of the squared deviation from the mean.

Standard Deviation: the positive square root of the variance.
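
For illustration, the three quantities computed with NumPy on a small made-up sample:

import numpy as np

x = np.array([2., 4., 4., 4., 5., 5., 7., 9.])   # toy sample
print(x.mean())   # mean (expected value): 5.0
print(x.var())    # variance (mean squared deviation from the mean): 4.0
print(x.std())    # standard deviation (positive square root of the variance): 2.0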

See it in practice with different example datasets using the code below:

In [4]:
__author__ = "A.Aziz Altowayan"
__email__ = "aa10212w@pace.edu"
In [1]:
import numpy as np
import matplotlib.pyplot as plt
import sympy
from random import randrange
import seaborn as sns
In [5]:
class DescisionBoundary(object):
    """
        input: class1 and class2 [both classes are the transpose of the 2D data points matrix]
        calculate and plot the decision boundary of 2D distributions, binary classification method.
    """
    
    def __init__(self, class1, class2):
        
        self.c1 = class1
        self.mu1 = np.array([ [np.mean(self.c1[0])], [np.mean(self.c1[1])] ])
        self.cov1 = np.cov(self.c1, bias=1)
        self.cov1_inv = np.linalg.inv(self.cov1)
        self.det1 = np.linalg.det(self.cov1)
        
        self.c2 = class2
        self.mu2 = np.array([ [np.mean(self.c2[0])], [np.mean(self.c2[1])] ])
        self.cov2 = np.cov(self.c2, bias=1)
        self.cov2_inv = np.linalg.inv(self.cov2)
        self.det2 = np.linalg.det(self.cov2)
        
    def Wi(self, c=0):
        """return: a matrix,  -  1/ 2 * inverse of cov_matrix """
        if c==1:
            return float(-1)/2 * self.cov1_inv
        elif c==2:
            return float(-1)/2 * self.cov2_inv
        else:
            print("pls, choose category Wi(c=1) or Wi(c=2)?")
            
    def wi(self, c=0):
        """ return: a matrix, (inverse of cov_matrix) * (mu) """
        if c==1:
            return np.matrix(self.cov1_inv) * np.matrix(self.mu1)
        elif c==2:
            return np.matrix(self.cov2_inv) * np.matrix(self.mu2)
        else:
            print("pls, choose class wi(c=1) or wi(c=2)?")

    def threshold(self, c=0):
        """ return: the threshold defined above without the prior """
        if c==1:
            cov, mu, inv, det = self.cov1, self.mu1, self.cov1_inv, self.det1
        elif c==2:
            cov, mu, inv, det = self.cov2, self.mu2, self.cov2_inv, self.det2
        else:
            print("pls, choose class threshold(c=1) or threshold(c=2)?")
            return
        t = float(-1)/2 * np.matrix(mu.T) * np.matrix(inv) * np.matrix(mu) - float(1)/2 * np.log(det)
        return float(t)


    # model (based on sy module)
    def disc_function(self, c=0):
        """ return: (sympy) expression for the discriminant function gi(x) = x^t * Wi * x + wi^t + wi0 """
        if c==1:
            W, w, thresh = self.Wi(c=1), self.wi(c=1), self.threshold(c=1)
        elif c==2:
            W, w, thresh = self.Wi(c=2), self.wi(c=2), self.threshold(c=2)
        else:
            print("pls, choose class disc_function(c=1) or disc_function(c=2)?")
            return
        
        dimensions = range(1, len(W) + 1)
        x = sympy.Matrix([ 'x{}'.format(n) for n in dimensions ])
        sym_matrix = (x.T * W * x ) + (w.T * x)  
        gx = sym_matrix[0] + thresh
        
        return gx
    
    def get_model(self, g, poly=0):
        """ input: sympy expr g1(x) - g2(x)
            return: sympy expr for the boundary, solved for x2 (poly=0) or for x1 in terms of x2 (poly=1) """

        if poly:    self.poly = 1
        else:       self.poly = 0
        
        if not poly:
            x2 = sympy.solve(g)[0]
            self.model = list(x2.values())[0]
        elif poly:
            if len(sympy.solve(g)) > 1:
                x1 = sympy.solve(g)[1]
            else:
                x1 = sympy.solve(g)[0]
            self.model = list(x1.values())[0]
        return self.model
    
    def get_decision(self, x, model):
        """ input: x1 values and the model (sympy expr equation, form e.g. x2 = 3x1**2 - 2x1 )
            return: x2 values correlated to x1"""

        expr = model #.evalf(2)
        if not self.poly:
            # solved expr of form e.g.) x2 = .. 3x1 - x1 + 14 ..
            x1 = sympy.symbols('x1')
            f = sympy.lambdify(x1, expr, "numpy")
            result = f(x)
            return result
        elif self.poly:
            # solved expr of form e.g.) x1 = .. 3x2**2 - 3x2 + 14 ..
            x2 = sympy.symbols('x2')
            f = sympy.lambdify(x2, expr, "numpy")
            result = f(x)
            return result

    # Utilities functions (Plotting and data)
    def plot_boundary(self, x1, x2, density=0, lab1="class 1", lab2="class 2"):
        """ input: x1 and x2 values, 
            optional: publish boolean to publish plot with decision boundary."""
        
        ax = plt.subplot()
        plt.title("Decision boundary between two classes")
        plt.scatter(self.c1[0],self.c1[1], c='r', label='{}'.format(lab1))
        plt.scatter(self.c2[0],self.c2[1], c='b', label='{}'.format(lab2))
        ax.plot(x1, x2, 'k--', lw=1, label='Decision boundary')

        if density:
            c1 = np.random.multivariate_normal(self.mu1.T[0], self.cov1, size=1000)
            sns.kdeplot(x=c1[:, 0], y=c1[:, 1])  # bivariate KDE of class 1
            c2 = np.random.multivariate_normal(self.mu2.T[0], self.cov2, size=1000)
            sns.kdeplot(x=c2[:, 0], y=c2[:, 1])  # bivariate KDE of class 2
        plt.legend(bbox_to_anchor = (1.5, 1))
        plt.show()
In [2]:
def get_data(case='random', x1=[10,40],y1=[10,50],x2=[10,40],y2=[40,90],n=10):
    """return 2-D sample dataset for two classes, v1 and v2 for each class.
    case: dataset choice from below, ranodm is default
    """

    if case=='random':
        """random data points, the classes distributed between values: x1, y1 (class1) and x2, y2 (class2)"""
        [x11,x12],[y11,y12] = x1, y1
        [x21,x22],[y21,y22] = x2, y2
        data1 = np.array([[randrange(x11,x12), randrange(y11,y12)] for _ in range(n)]).T
        data2 = np.array([[randrange(x21,x22), randrange(y21,y22)] for _ in range(n)]).T
        return data1, data2
    
    if case==1:
        """[case 1 in duda] each feature has the same covariance: ∑i = `\sigma^2`"""
        """diagonal covariance proportional"""
        d1 = np.array([[3,8], [2,6], [4,6], [3,4]]).T
        d2 = np.array([[9,24], [6,18],[12,18], [9,12]]).T
        return d1, d2

    if case==2:
        """[case 2 in duda] data points yeild identical covariances for both classes """
        d1 = np.array([[4,8],[8,8],[6,10],[6,6]]).T
        d2 = np.array([[14,18],[18,18],[16,20],[16,16]]).T
        return d1, d2
    
    if case=="duda":
        # Duda Ch.2 Ex1 dataset
        d1 = np.array([[3,8], [2,6], [4,6], [3,4]]).T    
        d2 = np.array([[3,0], [1,-2], [5,-2], [3,-4]]).T
        return d1, d2
    if case=="buffalo":
        # HW2 of Buffalo course
        d1 = np.array([[0,0], [0,1], [2,2], [3,1], [3,2], [3,3]]).T
        d2 = np.array([[6,9], [8,9], [9,8], [9,9], [9,10],[8,11]]).T
        return d1, d2
    if case=="midterm":
        # CS855 Midterm training samples for a three-class, 2D problem
        d1 = np.array([[0,4],[-4,0],[4,0],[0,-4]]).T
        d2 = np.array([[-6,17],[-10,13],[-2,13],[-6,9]]).T
        d3 = np.array([[10,14],[6,10],[14,10],[10,6]]).T
        return d1, d2, d3
    else:
        print("invalide data case entry")

Start from here:

To demo:

  • Get and plot the training set.
  • Show g(x) of each dataset sample [random | 1 'proportional cov' | 2 'identical cov' | "duda" | "buffalo" | "midterm"]
  • Show the solution of gx1 - gx2
  • Plot the decision boundary.

Repeat with different datasets:

  • The "messed up" samples (with density plotting).
  • Classifying 3 classes (midterm problem 2), using three instances of the DescisionBoundary class as pairwise classifiers between c1-c2, c2-c3 and c1-c3.
In [6]:
# w1, w2 = get_data(x1=[0,10],x2=[0,20],y1=[-50,0],y2=[-30,0]) # messed up samples
# w1, w2, w3 = get_data(case="midterm") # 
w1, w2 = get_data(case="buffalo") # case=[random | 1 | 2 | "duda" | "buffalo" | "midterm"]
poly = 1
d = DescisionBoundary(w1, w2)
gx1 = d.disc_function(c=1)
gx2 = d.disc_function(c=2)
g = gx1 - gx2

if poly: model = d.get_model(g, poly=1)
else: model = d.get_model(g, poly=0)
In [8]:
gx1.evalf(2)
Out[8]:
$$x_{1} \left(- 0.56 x_{1} + 0.56 x_{2}\right) + 0.38 x_{1} + x_{2} \left(0.56 x_{1} - 1.1 x_{2}\right) + 1.3 x_{2} - 1.2$$
In [9]:
gx2.evalf(2)
Out[9]:
$$x_{1} \left(- 0.44 x_{1} - 0.028 x_{2}\right) + 7.7 x_{1} + x_{2} \left(- 0.028 x_{1} - 0.56 x_{2}\right) + 11.0 x_{2} - 83.0$$
In [10]:
# solution
model.evalf(2)
Out[10]:
$$4.8 x_{2} + 6.8 \cdot 10^{-17} \left(4.1 \cdot 10^{33} x_{2}^{2} - 7.9 \cdot 10^{34} x_{2} + 3.4 \cdot 10^{35}\right)^{0.5} - 30.0$$

Plot Decision boundary

In [55]:
minr = min(min(w1[0]), min(w2[0])) - (0)
maxr = max(max(w1[1]), max(w2[1])) + (0)
x = np.arange(minr, maxr, 0.1)
bound = d.get_decision(x, model)
print("cov1:\n{}\ncov2:\n{}".format(d.cov1, d.cov2))
if poly: d.plot_boundary(bound, x, density=0)
else: d.plot_boundary(x, bound, density=0)
if poly: d.plot_boundary(bound, x, density=1)
else: d.plot_boundary(x, bound, density=1)
model.evalf(3)
cov1:
[[ 1.80555556  0.91666667]
 [ 0.91666667  0.91666667]]
cov2:
[[ 1.13888889 -0.05555556]
 [-0.05555556  0.88888889]]

Out[55]:
$$4.83 x_{2} + 6.83 \cdot 10^{-17} \left(4.05 \cdot 10^{33} x_{2}^{2} - 7.93 \cdot 10^{34} x_{2} + 3.36 \cdot 10^{35}\right)^{0.5} - 30.0$$

Midterm Problem 2: classifying 3 classes

By combining three classifiers

In [6]:
w1, w2, w3 = get_data(case="midterm")
In [7]:
d = DescisionBoundary(w1, w3)
g = d.disc_function(c=1) - d.disc_function(c=2)
model = d.get_model(g, poly=1)

d1 = DescisionBoundary(w2, w3)
g1 = d1.disc_function(c=1) - d1.disc_function(c=2)
model_1 = d1.get_model(g1, poly=1)

d2 = DescisionBoundary(w2, w1)
g2 = d2.disc_function(c=1) - d2.disc_function(c=2)
model_2 = d2.get_model(g2, poly=1)
In [20]:
print("cov w1 == w2 == w3:\n{}".format(d.cov1))
print("eig-vals and eig-vecs:\n{}".format(np.linalg.eig(d.cov1)))

x = np.arange(1.5, 20, 0.1)
bound = d.get_decision(x, model)
x1 = np.arange(9, 30, 0.1)
bound1 = d1.get_decision(x1, model_1)
x3 = np.arange(0, 8.5, 0.1)
bound2 = d2.get_decision(x3, model_2)

plt.scatter(w1[0], w1[1], c='r', label='$w1\;points$')
plt.scatter(w2[0], w2[1], c='g', label='$w2\;points$')
plt.scatter(w3[0], w3[1], c='b', label='$w3\;points$')
plt.plot(x, bound, 'k--', lw=1, label='bound $w1-w3$')
plt.plot(bound1,x1, 'c--', lw=1, label='bound $w2-w3$')
plt.plot(bound2,x3, 'y--', lw=1, label='bound $w1-w2$')

plt.legend()
plt.show()
cov w1 == w2 == w3:
[[ 8.  0.]
 [ 0.  8.]]
eig-vals and eig-vecs:
(array([ 8.,  8.]), array([[ 1.,  0.],
       [ 0.,  1.]]))

In [66]:
# Duda exercise
Duda, ch.2 ex1:
[[ 0.5  0. ]
 [ 0.   2. ]]
[[ 2.  0.]
 [ 0.  2.]]

Out[66]:
$$0.1875 x_{1}^{2} - 1.125 x_{1} + 3.51$$
In [54]:
# Special cases: proportional covariance
proportional covariance:
[[ 0.5  0. ]
 [ 0.   2. ]]
[[  4.5   0. ]
 [  0.   18. ]]

Out[54]:
$$5.0 \cdot 10^{-8} \left(- 1.0 \cdot 10^{14} x_{2}^{2} + 9.0 \cdot 10^{14} x_{2} + 3.01 \cdot 10^{15}\right)^{0.5} + 2.25$$
In [57]:
# Special cases: identical covariances
identical covariances:
[[ 2.  0.]
 [ 0.  2.]]
[[ 2.  0.]
 [ 0.  2.]]

Out[57]:
$$- x_{2} + 24.0$$
In [61]:
# random dataset
random dataset:
[[  69.84   49.24]
 [  49.24  136.49]]
[[  64.6   72.2]
 [  72.2  224.8]]

Out[61]:
$$0.167 x_{2} + 4.04 \cdot 10^{-15} \left(3.74 \cdot 10^{28} x_{2}^{2} + 1.28 \cdot 10^{30} x_{2} - 8.78 \cdot 10^{31}\right)^{0.5} - 11.3$$
In [51]:
# Messed up dataset
Messed up dataset:
[[  10.05    5.7 ]
 [   5.7   246.64]]
[[ 27.49   1.54]
 [  1.54  82.24]]

Out[51]:
$$0.0256 x_{2} + 3.1 \cdot 10^{-17} \left(1.31 \cdot 10^{32} x_{2}^{2} + 2.64 \cdot 10^{33} x_{2} + 1.37 \cdot 10^{34}\right)^{0.5} + 3.27$$
In [27]:
# another random
Random dataset:
[[ 91.96 -54.14]
 [-54.14  68.56]]
[[  92.61  -63.97]
 [ -63.97  235.09]]

Out[27]:
$$- 1.77 x_{2} + 2.85 \cdot 10^{-15} \left(- 4.87 \cdot 10^{26} x_{2}^{2} + 1.81 \cdot 10^{30} x_{2} + 1.43 \cdot 10^{32}\right)^{0.5} + 50.4$$
In [31]:
# Nice, but why are the samples not completely classified?
In [30]:
# That's why! Plotting the class density distributions makes the boundary clear
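
A minimal sketch (hypothetical, reusing the "messed up" random samples and the density option of plot_boundary) of how the densities can be overlaid; the exact plot varies with the random draw:

w1, w2 = get_data(x1=[0, 10], x2=[0, 20], y1=[-50, 0], y2=[-30, 0])  # "messed up" samples, as in the commented line above
d = DescisionBoundary(w1, w2)
g = d.disc_function(c=1) - d.disc_function(c=2)
model = d.get_model(g, poly=1)

x = np.arange(-50, 0, 0.1)                      # x2 range covered by both classes
bound = d.get_decision(x, model)
d.plot_boundary(bound, x, density=1)            # density=1 overlays the class KDEs on the scatter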