
Fast Kmeans 1.0 File ID: 81446 


License: Shareware | File Size: 10.0 KB | Downloads: 130



Fast Kmeans 1.0 Description 

Description: [L, C, D] = FKMEANS(X, k) partitions the vectors in the n-by-p matrix X into k (or, rarely, fewer) clusters by applying the well-known batch k-means algorithm. Rows of X correspond to points, columns correspond to variables. The output k-by-p matrix C contains the cluster centroids. The n-element output column vector L contains the cluster label of each point. The k-element output column vector D contains the residual cluster distortions as measured by the total squared distance of cluster members from the centroid.
FKMEANS(X, C0), where C0 is a k-by-p matrix, uses the rows of C0 as the initial centroids instead of choosing them randomly from X.
FKMEANS(X, k, options) allows optional parameter name/value pairs to be specified. Parameters are:
'weight' - n-by-1 weight vector used to adjust centroid and distortion calculations. Weights should be positive.
'careful' - binary option that determines whether the "careful seeding" recommended by Arthur and Vassilvitskii is used when choosing initial centroids. This option should be used with care, because numerical experiments suggest it may be counterproductive when the data is noisy.
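As a sketch of how the 'weight' option might enter the centroid and distortion calculations, here is a NumPy version of one weighted update step (the function name and signature are illustrative assumptions, not taken from the MATLAB source):

```python
import numpy as np

def weighted_update(X, L, w, k):
    """Recompute centroids and distortions with per-point weights.

    X : (n, p) data, L : (n,) labels in 0..k-1, w : (n,) positive weights.
    Returns the (k, p) centroid matrix C and (k,) weighted distortions D.
    """
    C = np.zeros((k, X.shape[1]))
    D = np.zeros(k)
    for j in range(k):
        members = L == j
        wj = w[members]
        if wj.size == 0:          # empty cluster: leave centroid at zero
            continue
        # weighted mean: sum_i w_i x_i / sum_i w_i
        C[j] = (wj[:, None] * X[members]).sum(axis=0) / wj.sum()
        # weighted distortion: sum_i w_i ||x_i - c_j||^2
        D[j] = (wj * ((X[members] - C[j]) ** 2).sum(axis=1)).sum()
    return C, D
```

With uniform weights this reduces to the ordinary arithmetic-mean centroid and total squared distance.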
NOTES
(1) The careful seeding procedure chooses the first centroid at random from X, and each successive centroid from the remaining points according to a categorical distribution with selection probabilities proportional to each point's minimum squared Euclidean distance from the already chosen centroids. This tends to spread the centroids out more evenly and, if the data consists of k well-separated clusters, is likely to choose an initial centroid from each cluster. This can speed convergence and reduce the likelihood of getting a bad solution [1]. However, in experiments where 5% uniformly distributed noise data was added to such naturally clustered data, the results were frequently worse than when centroids were chosen at random.
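The seeding procedure described in note (1) can be sketched in NumPy as follows (a hypothetical helper, not the submission's MATLAB code):

```python
import numpy as np

def careful_seed(X, k, rng=None):
    """k-means++ style seeding: pick k initial centroids from the rows of X.

    Each new centroid is drawn with probability proportional to the point's
    minimum squared Euclidean distance to the centroids chosen so far.
    """
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    centroids = [X[rng.integers(n)]]        # first centroid uniformly at random
    for _ in range(k - 1):
        # squared distance from each point to its nearest chosen centroid
        d2 = ((X[:, None, :] - np.array(centroids)[None]) ** 2).sum(-1).min(axis=1)
        probs = d2 / d2.sum()               # categorical selection probabilities
        centroids.append(X[rng.choice(n, p=probs)])
    return np.array(centroids)
```

Points far from all chosen centroids get large selection probabilities, which is why the seeds tend to land in distinct clusters.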
(2) If, as is possible, a cluster is empty at the end of an iteration, then there may be fewer than k clusters returned. In practice this seems to happen very rarely.
(3) Unlike the MathWorks KMEANS, this implementation does not perform the final, slow phase of incremental k-means (the 'OnlinePhase' option) that guarantees convergence to a local minimum.
References
[1] D. Arthur and S. Vassilvitskii, "k-means++: The Advantages of Careful Seeding", SODA 2007.
More Similar Code 

The k-means algorithm is widely used in a number of applications, such as speech processing and image compression.
This script implements the algorithm in a simple but general way. It performs four basic steps.
1. Define k arbitrary prototypes from the data samples.
2. Assign each sample to the nearest prototype.
3. Recalculate prototypes as arithmetic means.
4. If a prototype changes, repeat step (2).
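The four steps above can be sketched in NumPy as follows (an illustrative version with assumed names, not the script itself):

```python
import numpy as np

def kmeans(X, k, max_iter=100, rng=None):
    """Batch k-means: returns labels L, centroids C, per-cluster distortions D."""
    rng = np.random.default_rng(rng)
    # 1. define k arbitrary prototypes from the data samples
    C = X[rng.choice(X.shape[0], size=k, replace=False)].astype(float)
    L = np.full(X.shape[0], -1)
    for _ in range(max_iter):
        # 2. assign each sample to the nearest prototype
        d2 = ((X[:, None, :] - C[None]) ** 2).sum(-1)
        new_L = d2.argmin(axis=1)
        # 4. if no assignment changes, the prototypes are stable: stop
        if np.array_equal(new_L, L):
            break
        L = new_L
        # 3. recalculate prototypes as arithmetic means (skip empty clusters)
        for j in range(k):
            if np.any(L == j):
                C[j] = X[L == j].mean(axis=0)
    D = np.array([((X[L == j] - C[j]) ** 2).sum() for j in range(k)])
    return L, C, D
```

Each iteration can only decrease the total distortion, so the loop terminates once no label changes.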
Code for fuzzy k-means clustering, including k-means with extragrades, the Gustafson-Kessel algorithm, and fuzzy linear discriminant analysis. A performance measure is also calculated.
Hard and soft k-means implemented simply in Python (with NumPy). Quick and dirty; tested and works on large (10k+ observations, 210 features) real-world data.
An implementation of "k-Means Projective Clustering" by P. K. Agarwal and N. H. Mustafa.
This method of clustering is based on finding a few subspaces such that each point is close to one of them.
K-means image segmentation based on a histogram, which keeps memory usage constant for any image size.
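For a grayscale image, the histogram idea can be sketched like this (assumed names, not the submission's code): cluster the 256 gray levels, weighting each level by its pixel count, so memory depends only on the histogram rather than the image size.

```python
import numpy as np

def histogram_kmeans(hist, k, iters=50):
    """1-D k-means over gray levels 0..255, weighted by histogram counts.

    hist : (256,) pixel counts. Returns k centroid gray values.
    Memory use is fixed by the histogram, independent of image size.
    """
    levels = np.arange(256, dtype=float)
    # spread initial centroids evenly over the gray range
    C = np.linspace(0, 255, k)
    for _ in range(iters):
        L = np.abs(levels[:, None] - C[None]).argmin(axis=1)  # nearest centroid per level
        for j in range(k):
            w = hist * (L == j)
            if w.sum() > 0:
                C[j] = (w * levels).sum() / w.sum()           # count-weighted mean
    return C
```

Segmenting the image then amounts to mapping each pixel's gray level to its cluster via a 256-entry lookup table.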
This is a tool for k-means clustering. After trying several different approaches, I concluded that using simple loops to perform the distance calculation and comparison is most efficient and accurate because of the JIT acceleration in...
Usage: [means,c]=KNMCluster(k,indata)
KNMCluster is an implementation of the k-means clustering algorithm. It takes two inputs, k and indata: k is the initial guess of the number of clusters,
and indata is the aggregate data that you...
Description: DC is simple and effective, and can outperform the k-means and AP algorithms.
PBKM is simple and effective, and can outperform the k-means algorithm.
This is an implementation of the paper "k-means++: The Advantages of Careful Seeding".
It converges very quickly. 
