Fast version of kmeans clustering. Cluster the N x p matrix X into k clusters using the kmeans algorithm. It returns the cluster memberships for each data point in the N x 1 vector IDX and the K x p matrix of cluster means in C. This function is in some ways less general than Matlab's kmeans.m (for example it only uses euclidian distance), but it has some options that the Matlab version does not (for example, it has a notion of outliers and min-cluster size). It is also many times faster than matlab's kmeans. General kmeans help can be found in help for the matlab implementation of kmeans. Note that the although the names and conventions for this algorithm are taken from Matlab's implementation, there are slight alterations (for example, IDX==-1 is used to indicate outliers). IDX is a n-by-1 vector used to indicated cluster membership. Let X be a set of n points. Then the ID of X - or IDX is a column vector of length n, where each element is an integer indicating the cluster membership of the corresponding element in X. IDX(i)=c indicates that the ith point in X belongs to cluster c. Cluster labels range from 1 to k, and thus k=max(IDX) is typically the number of clusters IDX divides X into. The cluster label "-1" is reserved for outliers. IDX(i)==-1 indicates that the given point does not belong to any of the discovered clusters. Note that matlab's version of kmeans does not have outliers. USAGE [ IDX, C, d ] = kmeans2( X, k, [varargin] ) INPUTS X - [n x p] matrix of n p-dim vectors. k - maximum nuber of clusters (actual number may be smaller) prm - additional params (struct or name/value pairs) .k - [] alternate way of specifying k (if not given above) .nTrial - [1] number random restarts .maxIter - [100] max number of iterations .display - [0] Whether or not to display algorithm status .rndSeed - [] random seed for kmeans; useful for replicability .outFrac - [0] max frac points that can be treated as outliers .minCl - [1] min cluster size (smaller clusters get eliminated) .metric - [] metric for pdist2 .C0 - [] initial cluster centers for first trial OUTPUTS IDX - [n x 1] cluster membership (see above) C - [k x p] matrix of centroid locations C(j,:) = mean(X(IDX==j,:)) d - [1 x k] d(j) is sum of distances from X(IDX==j,:) to C(j,:) sum(d) is a typical measure of the quality of a clustering EXAMPLE See also DEMOCLUSTER Piotr's Computer Vision Matlab Toolbox Version 3.24 Copyright 2014 Piotr Dollar. [pdollar-at-gmail.com] Licensed under the Simplified BSD License [see external/bsd.txt]