kmeans2

PURPOSE ^

Fast version of kmeans clustering.

SYNOPSIS ^

function [ IDX, C, d ] = kmeans2( X, k, varargin )

DESCRIPTION ^

 Fast version of kmeans clustering.

 Cluster the N x p matrix X into k clusters using the kmeans algorithm. It
 returns the cluster memberships for each data point in the N x 1 vector
 IDX and the K x p matrix of cluster means in C.

 This function is in some ways less general than Matlab's kmeans.m (for
 example it only uses euclidian distance), but it has some options that
 the Matlab version does not (for example, it has a notion of outliers and
 min-cluster size).  It is also many times faster than matlab's kmeans.
 General kmeans help can be found in help for the matlab implementation of
 kmeans. Note that the although the names and conventions for this
 algorithm are taken from Matlab's implementation, there are slight
 alterations (for example, IDX==-1 is used to indicate outliers).

 IDX is a n-by-1 vector used to indicated cluster membership.  Let X be a
 set of n points.  Then the ID of X - or IDX is a column vector of length
 n, where each element is an integer indicating the cluster membership of
 the corresponding element in X.  IDX(i)=c indicates that the ith point in
 X belongs to cluster c. Cluster labels range from 1 to k, and thus
 k=max(IDX) is typically the number of clusters IDX divides X into. The
 cluster label "-1" is reserved for outliers. IDX(i)==-1 indicates that
 the given point does not belong to any of the discovered clusters. Note
 that matlab's version of kmeans does not have outliers.

 USAGE
  [ IDX, C, d ] = kmeans2( X, k, [varargin] )

 INPUTS
  X       - [n x p] matrix of n p-dim vectors.
  k       - maximum nuber of clusters (actual number may be smaller)
  prm     - additional params (struct or name/value pairs)
   .k         - [] alternate way of specifying k (if not given above)
   .nTrial    - [1] number random restarts
   .maxIter   - [100] max number of iterations
   .display   - [0] Whether or not to display algorithm status
   .rndSeed   - [] random seed for kmeans; useful for replicability
   .outFrac   - [0] max frac points that can be treated as outliers
   .minCl     - [1] min cluster size (smaller clusters get eliminated)
   .metric    - [] metric for pdist2
   .C0        - [] initial cluster centers for first trial

 OUTPUTS
  IDX    - [n x 1] cluster membership (see above)
  C      - [k x p] matrix of centroid locations C(j,:) = mean(X(IDX==j,:))
  d      - [1 x k] d(j) is sum of distances from X(IDX==j,:) to C(j,:)
           sum(d) is a typical measure of the quality of a clustering

 EXAMPLE

 See also DEMOCLUSTER

 Piotr's Computer Vision Matlab Toolbox      Version 3.24
 Copyright 2014 Piotr Dollar.  [pdollar-at-gmail.com]
 Licensed under the Simplified BSD License [see external/bsd.txt]

Generated by m2html © 2003