In [1]:

```
%matplotlib inline
from sklearn.cluster import k_means
import matplotlib.pyplot as plt
import pandas as pd
```

In [2]:

```
m = [[-6,4], [-3,7], [1,6], [-4,0], [0,-1], [11,7], [8,3], [13,3], [8,-1], [12,-2]]
df = pd.DataFrame(m, columns=['x', 'y'])
```

In [3]:

```
df.shape # (n_sample, n_features)
```

Out[3]:

In [4]:

```
def plot_clusters(cent, labels):
    # Centroids as cyan diamonds, then the points of each cluster.
    plt.scatter(cent[:,0], cent[:,1], marker='D', c='c', label='centroid')
    plt.scatter(df[labels==0].x, df[labels==0].y, c='r', label='cluster 1')
    plt.scatter(df[labels==1].x, df[labels==1].y, c='b', label='cluster 2')
    plt.grid()
    plt.legend(bbox_to_anchor=(1.5, 1))
    plt.show()
```

In [5]:

```
import math

def distance(p0, p1):
    # Euclidean distance between two 2-D points.
    return math.sqrt((p0[0] - p1[0])**2 + (p0[1] - p1[1])**2)
```

In [6]:

```
# seed 1
s1 = pd.DataFrame([[-3,7], [11,7]])
centroid, labels, iner = k_means(df, 2, init=s1, n_init=1)
```

In [7]:

```
plot_clusters(centroid, labels)
```

In [8]:

```
# seed 2
s1 = pd.DataFrame([[8,3], [8,-1]])
centroid, labels, iner = k_means(df, 2, init=s1, n_init=1)
```

In [9]:

```
plot_clusters(centroid, labels)
```

In [10]:

```
# seed 3
s1 = pd.DataFrame([[0,-1], [11,7]])
centroid, labels, iner = k_means(df, 2, init=s1, n_init=1)
```

In [11]:

```
plot_clusters(centroid, labels)
```

As the plots above show, there are **two ways** (depending on the starting points) to split this dataset into two clusters:

- Right cluster (5 points) and Left cluster (5 points), where the final centroids are (10.4, 2) and (-2.4, 3.2).
- Upper cluster (6 points) and Lower cluster (4 points), where the final centroids are (4, 5) and (4, -1).
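As a quick check, each final centroid listed above should be the mean of the points in its cluster. The sketch below recomputes them in plain Python. The cluster memberships are read off the data by hand (an assumption, not taken from the `labels` output), and `mean_point` is a helper introduced only for this illustration.

```python
# Cluster memberships are assumed from inspecting the data, not from k_means output.
left = [(-6, 4), (-3, 7), (1, 6), (-4, 0), (0, -1)]
right = [(11, 7), (8, 3), (13, 3), (8, -1), (12, -2)]
upper = [(-6, 4), (-3, 7), (1, 6), (11, 7), (8, 3), (13, 3)]
lower = [(-4, 0), (0, -1), (8, -1), (12, -2)]

def mean_point(points):
    """Return the coordinate-wise mean of a list of 2-D points (hypothetical helper)."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

for name, pts in [("left", left), ("right", right), ("upper", upper), ("lower", lower)]:
    print(name, mean_point(pts))
```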

In [12]:

```
from itertools import combinations
len(list(combinations(df.values, 2)))
```

Out[12]:

Each of these starting pairs determines where **k_means()** converges. To find how many of them yield the Left and Right clusters, we will count the starting pairs that converge to those two centroids.
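The counting idea can be sketched without scikit-learn by running plain Lloyd iterations from every starting pair and tallying the final centroids. This is an illustrative stand-in, not the `k_means()` implementation: the names `lloyd` and `final_centroids` are hypothetical, and tie-breaking or empty-cluster handling may differ from the library, so the tallies need not match it exactly.

```python
from collections import Counter
from itertools import combinations
import math

points = [(-6, 4), (-3, 7), (1, 6), (-4, 0), (0, -1),
          (11, 7), (8, 3), (13, 3), (8, -1), (12, -2)]

def lloyd(points, centroids, max_iter=100):
    """Plain Lloyd iterations from the given starting centroids (illustrative stand-in)."""
    centroids = [tuple(map(float, c)) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        labels = [
            min(range(len(centroids)),
                key=lambda k: math.hypot(p[0] - centroids[k][0], p[1] - centroids[k][1]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                updated.append((sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels

def final_centroids(seed_pair):
    """Sorted, rounded final centroids so equal clusterings compare equal."""
    cents, _ = lloyd(points, seed_pair)
    return tuple(sorted((round(x, 6), round(y, 6)) for x, y in cents))

# Tally the final centroid pairs over all starting pairs.
counts = Counter(final_centroids(pair) for pair in combinations(points, 2))
for cents, n in counts.items():
    print(cents, n)
```

Sorting and rounding the centroids makes the result independent of which starting point ends up as which cluster, so two runs that find the same clustering land in the same `Counter` bucket.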

In [13]:

```
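from itertools import combinations

# Run k-means from every pair of data points and record the midpoint
# of the two final centroids, which identifies the resulting clustering.
mids = []
for a, b in combinations(df.values, 2):
    s1 = pd.DataFrame([a, b])
    centroid, labels, iner = k_means(df, 2, init=s1, n_init=1)
    midpoint = (centroid[1:] + centroid[:-1]) / 2
    mids.append(midpoint)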
```

In [14]:

```
mids
```

Out[14]:

In [15]:

```
leftRight = [ 4, 2.6] # midpoint of [-2.4, 3.2], [10.4, 2]
upDown = [4, 2] # midpoint of [4, 5], [4, -1]
```
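Because the two clusterings have different centroid midpoints, a midpoint can be matched against these reference values to tell which clustering a run produced. The following is a minimal sketch of that comparison. `classify_midpoint` and its tolerance `tol` are hypothetical, introduced here only for illustration.

```python
import math

LEFT_RIGHT = (4.0, 2.6)  # midpoint of (-2.4, 3.2) and (10.4, 2)
UP_DOWN = (4.0, 2.0)     # midpoint of (4, 5) and (4, -1)

def classify_midpoint(mid, tol=1e-6):
    """Name the clustering whose centroid midpoint matches `mid` (hypothetical helper)."""
    x, y = float(mid[0]), float(mid[1])
    for name, (rx, ry) in (("left/right", LEFT_RIGHT), ("up/down", UP_DOWN)):
        if math.isclose(x, rx, abs_tol=tol) and math.isclose(y, ry, abs_tol=tol):
            return name
    return "other"

print(classify_midpoint((4.0, 2.6)))
print(classify_midpoint((4, 2)))
```

Each entry of `mids` in this notebook has shape (1, 2), so it would be passed as `classify_midpoint(m[0])`. Comparing with a tolerance rather than `==` avoids false mismatches from floating-point rounding.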

In [16]:

```
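# Collect every pair of points that are at least 10 units apart.
rows = []
for p1, p2 in combinations(df.values, 2):
    d = distance(p1, p2)
    if d >= 10:
        rows.append((tuple(p1), tuple(p2), d))
far = pd.DataFrame(rows, columns=['point1', 'point2', 'distance'])
far.index += 1  # number the pairs starting from 1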
```

In [17]:

```
far
```

Out[17]:

In [18]:

```
from itertools import combinations

# Repeat the experiment using only starting pairs at least 10 units apart.
mids = []
for a, b in combinations(df.values, 2):
    d = distance(a, b)
    if d >= 10:
        s1 = pd.DataFrame([a, b])
        centroid, labels, iner = k_means(df, 2, init=s1, n_init=1)
        midpoint = (centroid[1:] + centroid[:-1]) / 2
        mids.append(midpoint)
```

In [19]:

```
mids
```

Out[19]: