SciPy - Interview Questions
What is the SciPy cluster package, and how is K-Means implemented in SciPy?
K-means clustering is a method for finding clusters and cluster centers in a set of unlabelled data. Intuitively, we can think of a cluster as a group of data points whose inter-point distances are small compared with the distances to points outside the cluster. Given an initial set of K centers, the K-means algorithm iterates the following two steps:
 
* For each center, identify the subset of training points (its cluster) that are closer to it than to any other center.
 
* Compute the mean of each feature over the data points in each cluster; this mean vector becomes the new center for that cluster.
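
The two steps above can be sketched in plain NumPy. This is a minimal illustration rather than SciPy's own implementation; the function name kmeans_step and the toy points are hypothetical, and the choice to leave an empty cluster's center unchanged is an assumption made for the sketch.

```python
import numpy as np

def kmeans_step(points, centers):
    """One iteration of the two K-means steps: assign points, then update centers."""
    # Step 1: Euclidean distance from every point to every center,
    # giving an array of shape (n_points, n_centers)
    distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    labels = distances.argmin(axis=1)  # index of the nearest center for each point

    # Step 2: move each center to the per-feature mean of its assigned points
    new_centers = centers.copy()
    for k in range(len(centers)):
        members = points[labels == k]
        if len(members) > 0:  # assumption: an empty cluster keeps its old center
            new_centers[k] = members.mean(axis=0)
    return new_centers, labels

# Hypothetical toy data: two obvious groups in two dimensions
points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
new_centers, labels = kmeans_step(points, centers)
```

Repeating kmeans_step until the centers stop moving gives the full iteration described above.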
 
 
K-Means Implementation in SciPy : The following steps show how to implement K-Means in SciPy.
 
Import K-Means : We will look at the usage of each imported function.
 
from scipy.cluster.vq import kmeans, vq, whiten

Data generation : We simulate some data to explore the clustering: two groups of 100 points with three features each, one group shifted by 0.5 in every feature.
 
from numpy import vstack,array
from numpy.random import rand

# data generation with three features
data = vstack((rand(100,3) + array([.5,.5,.5]),rand(100,3)))
 
Now, we check the data. Displaying the data array in an interactive session gives output similar to the following; the exact values differ on each run because the data is random.
 
array([[ 1.48598868e+00, 8.17445796e-01, 1.00834051e+00],
       [ 8.45299768e-01, 1.35450732e+00, 8.66323621e-01],
       [ 1.27725864e+00, 1.00622682e+00, 8.43735610e-01],
............
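
As a rough sketch of how the three imported functions are typically combined on data like this: whiten rescales each feature to unit variance, kmeans computes the centroids, and vq assigns each observation to its nearest centroid. The seeded generator, the variable names, and the choice of two clusters are assumptions made for illustration, not values taken from the text above.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq, whiten

# Same kind of data as above; the fixed seed is an assumption for repeatable data
rng = np.random.default_rng(0)
data = np.vstack((rng.random((100, 3)) + np.array([0.5, 0.5, 0.5]),
                  rng.random((100, 3))))

# whiten: divide each feature (column) by its standard deviation
whitened = whiten(data)

# kmeans: returns the centroids (the "codebook") and the mean distortion;
# 2 clusters is chosen here only for illustration
centroids, distortion = kmeans(whitened, 2)

# vq: assign every observation to its nearest centroid and
# report the distance to that centroid
labels, dists = vq(whitened, centroids)

print(centroids)
print(labels[:10])
```

Because kmeans starts from randomly chosen centroids, the centroid values and the label numbering can differ between runs.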