# pdist2

## PURPOSE

Calculates the distance between sets of vectors.

## SYNOPSIS

function D = pdist2( X, Y, metric )

## DESCRIPTION

```
Calculates the distance between sets of vectors.

Let X be an m-by-p matrix representing m points in p-dimensional space
and Y be an n-by-p matrix representing another set of points in the same
space. This function computes the m-by-n distance matrix D where D(i,j)
is the distance between X(i,:) and Y(j,:).  This function has been
optimized where possible, with most of the distance computations
requiring few or no loops.

The metric can be one of the following:

'euclidean' / 'sqeuclidean'
Euclidean / SQUARED Euclidean distance.  Note that 'sqeuclidean'
is significantly faster, since it avoids taking a square root of
every entry of D.

'chisq'
The chi-squared distance between two vectors is defined as:
d(x,y) = sum( (xi-yi)^2 / (xi+yi) ) / 2;
The chi-squared distance is useful when comparing histograms.

'cosine'
Distance is defined as one minus the cosine of the angle between two
vectors, so vectors pointing in the same direction have distance 0.

'emd'
Earth Mover's Distance (EMD) between positive vectors (histograms).
For 1D histograms that all have equal total weight there is a simple
closed form for the EMD: the EMD between histograms x and y is
sum(abs(cdf(x)-cdf(y))), where cdf is the cumulative distribution
function (computed simply by cumsum).

'L1'
The L1 distance between two vectors is defined as:  sum(abs(x-y));

USAGE
D = pdist2( X, Y, [metric] )

INPUTS
X        - [m x p] matrix of m p-dimensional vectors
Y        - [n x p] matrix of n p-dimensional vectors
metric   - ['sqeuclidean'], 'chisq', 'cosine', 'emd', 'euclidean', 'L1'

OUTPUTS
D        - [m x n] distance matrix

EXAMPLE
% simple example where points cluster well
[X,IDX] = demoGenData(100,0,5,4,10,2,0);
D = pdist2( X, X, 'sqeuclidean' );
distMatrixShow( D, IDX );
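% comparison to pdist: time r repetitions of each, then check agreement
n=500; d=200; r=100; X=rand(n,d);
tic, for i=1:r, D1 = pdist( X, 'euclidean' ); end, toc
tic, for i=1:r, D2 = pdist2( X, X, 'euclidean' ); end, toc
% pdist returns a condensed vector; expand it to n-by-n before differencing
D1=squareform(D1); del=D1-D2; sum(abs(del(:)))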
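% sketch: evaluate the 'chisq', 'L1' and 'emd' definitions above directly
% for a single pair of histograms. x and y are made-up illustrative values
% that each sum to 1; this does not call pdist2 and is not meant to
% reproduce its internal implementation, only the stated formulas.
x = [0.1 0.2 0.3 0.4];  y = [0.4 0.3 0.2 0.1];
dChi = sum( (x-y).^2 ./ (x+y) ) / 2;      % chi-squared: sum over bins, halved
dL1  = sum( abs(x-y) );                   % L1: sum of absolute differences
dEmd = sum( abs(cumsum(x)-cumsum(y)) );   % 1D EMD: cumsum plays the role of cdf
% assuming no bin has x(i)+y(i)==0 (that would give 0/0 in the chisq term)
disp( [dChi dL1 dEmd] )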