In [1]:
%matplotlib inline
from sklearn.cluster import k_means
import matplotlib.pyplot as plt
import pandas as pd

In [2]:
m = [[-6,4], [-3,7], [1,6], [-4,0], [0,-1], [11,7], [8,3], [13,3], [8,-1], [12,-2]]
df = pd.DataFrame(m, columns=['x', 'y'])

In [3]:
df.shape  # (n_samples, n_features)

Out[3]:
(10, 2)

## Support functions

In [4]:
def plot_clusters(cent, labels):
    # Centroids as cyan diamonds, then one scatter per cluster label.
    plt.scatter(cent[:, 0], cent[:, 1], marker='D', c='c', label='centroid')
    plt.scatter(df[labels == 0].x, df[labels == 0].y, c='r', label='cluster 1')
    plt.scatter(df[labels == 1].x, df[labels == 1].y, c='b', label='cluster 2')
    plt.grid()
    plt.legend(bbox_to_anchor=(1.5, 1))
    plt.show()

In [5]:
import math

def distance(p0, p1):
    # Euclidean distance between two 2-D points.
    return math.sqrt((p0[0] - p1[0])**2 + (p0[1] - p1[1])**2)


### 1) Starting pair: (-3, 7) and (11, 7)

In [6]:
# seed 1
s1 = pd.DataFrame([[-3,7], [11,7]])
centroid, labels, iner = k_means(df, 2, init=s1, n_init=1)

/Library/Python/2.7/site-packages/sklearn/cluster/k_means_.py:281: DeprecationWarning: Implicitly casting between incompatible kinds. In a future numpy release, this will raise an error. Use casting="unsafe" if this is intentional.
init -= X_mean

In [7]:
plot_clusters(centroid, labels)


### 2) Starting pair: (8, 3) and (8, -1)

In [8]:
# seed 2
s1 = pd.DataFrame([[8,3], [8,-1]])
centroid, labels, iner = k_means(df, 2, init=s1, n_init=1)


In [9]:
plot_clusters(centroid, labels)


### 3) Starting pair: (0, -1) and (11, 7)

In [10]:
# seed 3
s1 = pd.DataFrame([[0,-1], [11,7]])
centroid, labels, iner = k_means(df, 2, init=s1, n_init=1)


In [11]:
plot_clusters(centroid, labels)


### 4)

As the three runs above show, k-means converges to one of two partitions of this dataset, depending on the starting points:

1. Left cluster (5 points) and Right cluster (5 points), with final centroids (-2.4, 3.2) and (10.4, 2).
2. Upper cluster (6 points) and Lower cluster (4 points), with final centroids (4, 5) and (4, -1).
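
The centroid values listed above can be checked by hand, since each final centroid is just the mean of its cluster. The following is a minimal sketch in plain Python 3 (no scikit-learn); the helper `mean_point` and the split thresholds are assumptions introduced here for illustration, chosen to match the gaps visible in the data.

```python
# Points copied from the notebook's dataset `m`.
points = [(-6, 4), (-3, 7), (1, 6), (-4, 0), (0, -1),
          (11, 7), (8, 3), (13, 3), (8, -1), (12, -2)]

def mean_point(group):
    """Coordinate-wise mean of a list of (x, y) points (hypothetical helper)."""
    n = len(group)
    return (sum(x for x, _ in group) / n, sum(y for _, y in group) / n)

# Partition 1: split on x, using the gap between x = 1 and x = 8.
left = [p for p in points if p[0] < 4]
right = [p for p in points if p[0] >= 4]

# Partition 2: split on y, using the gap between y = 0 and y = 3.
upper = [p for p in points if p[1] > 2]
lower = [p for p in points if p[1] <= 2]

print(len(left), mean_point(left))    # 5 (-2.4, 3.2)
print(len(right), mean_point(right))  # 5 (10.4, 2.0)
print(len(upper), mean_point(upper))  # 6 (4.0, 5.0)
print(len(lower), mean_point(lower))  # 4 (4.0, -1.0)
```
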

### 5) The dataset has 10 data points, so the number of possible starting pairs is $\binom{10}{2} = \frac{10!}{2!\,8!} = 45$

In [12]:
from itertools import combinations
len(list(combinations(df.values, 2)))

Out[12]:
45
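
As a cross-check of the count, the factorial formula and a direct enumeration should agree. This is a small sketch in plain Python 3; the variable names are illustrative only.

```python
from itertools import combinations
from math import factorial

n, k = 10, 2

# Closed form: n! / (k! * (n - k)!)
by_formula = factorial(n) // (factorial(k) * factorial(n - k))

# Brute force: enumerate every unordered pair of indices.
by_enumeration = sum(1 for _ in combinations(range(n), k))

print(by_formula, by_enumeration)  # 45 45
```
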

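The centroids (-2.4, 3.2) and (10.4, 2) are the final means that k_means() converges to when it produces the Left and Right clusters. To count the starting pairs that lead to that result, we run k_means() from every pair and record the midpoint of the two final centroids: (4, 2.6) identifies the Left-Right clustering and (4, 2) identifies the Up-Down clustering.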

In [13]:
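from itertools import combinations

mids = []
for a, b in combinations(df.values, 2):
    # Use each pair of data points as the two initial centroids.
    s1 = pd.DataFrame([a, b])
    centroid, labels, iner = k_means(df, 2, init=s1, n_init=1)
    # The midpoint of the two final centroids identifies the clustering.
    midpoint = (centroid[1:] + centroid[:-1]) / 2
    mids.append(midpoint)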


In [14]:
mids

Out[14]:
[array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4.,  2.]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4.,  2.]]),
array([[ 4. ,  2.6]]),
array([[ 4.,  2.]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]])]
In [15]:
leftRight = [ 4, 2.6]   # midpoint of [-2.4, 3.2], [10.4, 2]
upDown = [4, 2]         # midpoint of [4, 5], [4, -1]


From the midpoints above, 42 of the 45 starting pairs yield the Left-Right clustering (midpoint (4, 2.6)), while only 3 yield the Up-Down clustering (midpoint (4, 2)).
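
The counting step can also be sketched without scikit-learn, using a plain-Python version of Lloyd's iterations and the same midpoint trick to label each outcome. This is only an approximation of what `k_means()` does: the helpers `lloyd` and `classify` are hypothetical names introduced here, ties go to the lower-indexed center, and the resulting tallies are not guaranteed to match the library's output exactly.

```python
from itertools import combinations

points = [(-6, 4), (-3, 7), (1, 6), (-4, 0), (0, -1),
          (11, 7), (8, 3), (13, 3), (8, -1), (12, -2)]

def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def lloyd(points, centers, max_iter=100):
    """Plain Lloyd iterations from the given starting centers (hypothetical helper)."""
    centers = [tuple(map(float, c)) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: nearest center, ties go to the lower index.
        labels = [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        # Update step: mean of each cluster; an empty cluster keeps its center.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append((sum(x for x, _ in members) / len(members),
                                    sum(y for _, y in members) / len(members)))
            else:
                new_centers.append(old)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels

def classify(centers):
    """Name the partition from the midpoint of the two final centroids."""
    mid = tuple(round((a + b) / 2, 6) for a, b in zip(*centers))
    return {(4.0, 2.6): "left-right", (4.0, 2.0): "up-down"}.get(mid, "other")

# Tally the outcome for every starting pair drawn from the dataset.
counts = {}
for a, b in combinations(points, 2):
    kind = classify(lloyd(points, [a, b])[0])
    counts[kind] = counts.get(kind, 0) + 1

print(counts)
```

Rounding the midpoint before the lookup avoids spurious mismatches from floating-point error in the centroid means.
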

### 6) Collecting pairs of points with distance >= 10

In [16]:
far = pd.DataFrame(columns=('point1', 'point2', 'distance'))
i = 1
for p1, p2 in combinations(df.values, 2):
    d = distance(p1, p2)
    if d >= 10:
        # Record the pair and its distance as a new row.
        far.loc[i] = p1, p2, d
        i += 1

In [17]:
far

Out[17]:
point1 point2 distance
1 [-6, 4] [11, 7] 17.262677
2 [-6, 4] [8, 3] 14.035669
3 [-6, 4] [13, 3] 19.026298
4 [-6, 4] [8, -1] 14.866069
5 [-6, 4] [12, -2] 18.973666
6 [-3, 7] [11, 7] 14.000000
7 [-3, 7] [8, 3] 11.704700
8 [-3, 7] [13, 3] 16.492423
9 [-3, 7] [8, -1] 13.601471
10 [-3, 7] [12, -2] 17.492856
11 [1, 6] [11, 7] 10.049876
12 [1, 6] [13, 3] 12.369317
13 [1, 6] [12, -2] 13.601471
14 [-4, 0] [11, 7] 16.552945
15 [-4, 0] [8, 3] 12.369317
16 [-4, 0] [13, 3] 17.262677
17 [-4, 0] [8, -1] 12.041595
18 [-4, 0] [12, -2] 16.124515
19 [0, -1] [11, 7] 13.601471
20 [0, -1] [13, 3] 13.601471
21 [0, -1] [12, -2] 12.041595
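
The same filter can be reproduced without pandas as a quick sanity check on the row count. This sketch uses only the standard library; `far_pairs` is an illustrative name, not one from the notebook.

```python
from itertools import combinations
from math import hypot

points = [(-6, 4), (-3, 7), (1, 6), (-4, 0), (0, -1),
          (11, 7), (8, 3), (13, 3), (8, -1), (12, -2)]

# Keep every unordered pair whose Euclidean distance is at least 10.
far_pairs = []
for p, q in combinations(points, 2):
    d = hypot(p[0] - q[0], p[1] - q[1])
    if d >= 10:
        far_pairs.append((p, q, d))

print(len(far_pairs))  # 21
```
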

Now let's check how many of those 21 pairs yield the Left-Right clustering by taking the midpoint of the final centroids.

In [18]:
from itertools import combinations

mids = []
for a, b in combinations(df.values, 2):
    d = distance(a, b)
    if d >= 10:
        # Only pairs at least 10 apart are used as initial centroids.
        s1 = pd.DataFrame([a, b])
        centroid, labels, iner = k_means(df, 2, init=s1, n_init=1)
        midpoint = (centroid[1:] + centroid[:-1]) / 2
        mids.append(midpoint)


In [19]:
mids

Out[19]:
[array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]]),
array([[ 4. ,  2.6]])]

As the midpoints above show, all 21 pairs yield (4, 2.6), which is the midpoint of the centroids (-2.4, 3.2) and (10.4, 2). Therefore, all 21 pairs yield the Left-Right clustering.
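
One observation that may help explain this result, though the notebook does not state it, is that every pair at distance >= 10 takes one point from the left group and one from the right group, so each run starts with a centroid on each side. The sketch below checks only that property; the set `left` is an assumption based on the Left cluster found earlier, and it does not by itself prove convergence.

```python
from itertools import combinations
from math import hypot

points = [(-6, 4), (-3, 7), (1, 6), (-4, 0), (0, -1),
          (11, 7), (8, 3), (13, 3), (8, -1), (12, -2)]

# Members of the Left cluster from the Left-Right partition.
left = {(-6, 4), (-3, 7), (1, 6), (-4, 0), (0, -1)}

far_pairs = [(p, q) for p, q in combinations(points, 2)
             if hypot(p[0] - q[0], p[1] - q[1]) >= 10]

# True when exactly one point of each far pair lies in the left group.
straddles = all((p in left) != (q in left) for p, q in far_pairs)

print(len(far_pairs), straddles)  # 21 True
```
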