Compute channel feature pyramid given an input image.

While chnsCompute() computes channel features at a single scale, chnsPyramid() calls chnsCompute() multiple times on differently scaled images to create a scale-space pyramid of channel features.

In its simplest form, chnsPyramid() first creates an image pyramid, then calls chnsCompute() with the specified "pChns" on each scale of the image pyramid. The parameter "nPerOct" determines the number of scales per octave in the image pyramid (an octave is the set of scales up to half of the initial scale); a typical value is nPerOct=8, in which case each scale in the pyramid is 2^(-1/8)~=.917 times the size of the previous. The smallest scale of the pyramid is determined by "minDs": once either dimension of the resized image falls below minDs, pyramid creation stops. The largest scale in the pyramid is determined by "nOctUp", which specifies the number of octaves to compute above the original scale.

While calling chnsCompute() on each image scale works, it is unnecessary. For a broad family of features, including gradient histograms and all channel types tested, the feature responses computed at a single scale can be used to approximate feature responses at nearby scales. The approximation is accurate at least within an entire scale octave. For details, and to understand why this unexpected result holds, please see:
  P. Dollár, R. Appel, S. Belongie and P. Perona
  "Fast Feature Pyramids for Object Detection", PAMI 2014.

The parameter "nApprox" determines how many intermediate scales are approximated using the techniques described in the above paper. Roughly speaking, channels at approximated scales are computed by taking the corresponding channel at the nearest true scale (computed with chnsCompute) and resampling and re-normalizing it appropriately. For example, if nPerOct=8 and nApprox=7, then the 7 intermediate scales are approximated and only the power-of-two scales are actually computed (using chnsCompute).
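The scale schedule described above can be sketched as follows. This is an illustrative Python translation of the logic, not the toolbox API; the helper name pyramid_scales and its defaults are hypothetical (only nPerOct, nOctUp, nApprox, and minDs come from the parameters above).

```python
import math

def pyramid_scales(n_per_oct=8, n_oct_up=0, n_approx=7,
                   min_ds=(16, 16), sz=(480, 640)):
    """Hypothetical sketch: list each pyramid scale and whether it is a
    'real' scale (computed via chnsCompute) or an approximated one."""
    # number of scales: nPerOct scales per octave, from nOctUp octaves above
    # the input down to where the image would shrink below minDs
    n_scales = math.floor(n_per_oct * (n_oct_up + math.log2(
        min(sz[0] / min_ds[0], sz[1] / min_ds[1]))) + 1)
    # each scale is 2^(-1/nPerOct) times the previous
    scales = [2 ** (-i / n_per_oct + n_oct_up) for i in range(n_scales)]
    # real scales are spaced every (nApprox+1) steps; the rest are
    # approximated from the nearest real scale
    real = set(range(0, n_scales, n_approx + 1))
    return [(s, i in real) for i, s in enumerate(scales)]
```

With the defaults above (a 480x640 image, nPerOct=8, nApprox=7), only the power-of-two scales 1, 1/2, 1/4, ... are marked real, matching the example in the text.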
The parameter "lambdas" determines how the channels are normalized (see the above paper). The lambdas for a given set of channels can be computed using chnsScaling.m; alternatively, if no lambdas are specified, they are automatically approximated using two true image scales.

Typically, approximating all scales within an octave (by setting nApprox=nPerOct-1 or nApprox=-1) works well and results in large speed gains (~4x). See the example below for a visualization of the pyramid computed with and without the approximation. While there is a slight difference in the channels, during detection the approximated channels have been shown to be essentially as effective as the original channels.

While every effort is made to space the image scales evenly, this is not always possible. For example, given a 101x100 image, it is impossible to downsample it by exactly 1/2 along the first dimension; moreover, the exact scaling along the two dimensions will differ. Instead, the scales are tweaked slightly (e.g. for a 101x101 image the scale would go from 1/2 to something like 50/101), and the output contains the exact scaling factors used for both the heights and the widths ("scaleshw") as well as the approximate scale for both dimensions ("scales"). If "shrink">1, the scales are further tweaked so that the resized image has dimensions that are exactly divisible by shrink (for details please see the code).

If chnsPyramid() is called with no inputs, the output is the complete set of default parameters (pPyramid).

Finally, we describe the remaining parameters: "pad" controls the amount the channels are padded after being created (useful for detecting objects near boundaries); "smooth" controls the amount of smoothing after the channels are created (and thus the integration scale of the channels); and "concat" determines whether all channels at a single scale are concatenated in the output.

An emphasis has been placed on speed, with the code undergoing heavy optimization.
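The resample-and-renormalize step behind the approximation can be sketched as below. This is a hedged Python illustration of the power-law rule from the paper, f(s) ~ s^-lambda; the function name approx_channel is hypothetical, and nearest-neighbor indexing stands in for the toolbox's imResample.

```python
import numpy as np

def approx_channel(c_real, s_real, s_target, lam):
    """Hypothetical sketch: approximate a channel at scale s_target from
    the channel computed at the nearest true scale s_real by resampling
    and re-normalizing by (s_target / s_real) ** -lam."""
    ratio = s_target / s_real
    h, w = c_real.shape
    new_h = max(1, round(h * ratio))
    new_w = max(1, round(w * ratio))
    # nearest-neighbor resample to the target size (stand-in for imResample)
    rows = np.minimum((np.arange(new_h) / ratio).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / ratio).astype(int), w - 1)
    resampled = c_real[np.ix_(rows, cols)]
    # re-normalize per the power law f(s) ~ s^-lambda
    return resampled * ratio ** -lam
```

For example, halving the scale (ratio 1/2) with lambda=1 halves each spatial dimension and doubles the channel values; with lambda=0 (e.g. color channels) only the resampling is applied.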
Computing the full set of (approximated) *multi-scale* channels on a 480x640 image runs over *30 fps* on a single core of a machine from 2011 (although runtime depends on input parameters).

USAGE
 pPyramid = chnsPyramid()
 pyramid = chnsPyramid( I, pPyramid )

INPUTS
 I            - [hxwx3] input image (uint8 or single/double in [0,1])
 pPyramid     - parameters (struct or name/value pairs)
  .pChns        - parameters for creating channels (see chnsCompute.m)
  .nPerOct      - [8] number of scales per octave
  .nOctUp       - [0] number of upsampled octaves to compute
  .nApprox      - [-1] number of approx. scales (if -1 nApprox=nPerOct-1)
  .lambdas      - [] coefficients for power law scaling (see BMVC10)
  .pad          - [0 0] amount to pad channels (along T/B and L/R)
  .minDs        - [16 16] minimum image size for channel computation
  .smooth       - [1] radius for channel smoothing (using convTri)
  .concat       - [1] if true concatenate channels
  .complete     - [] if true does not check/set default vals in pPyramid

OUTPUTS
 pyramid      - output struct
  .pPyramid     - exact input parameters used (may change from input)
  .nTypes       - number of channel types
  .nScales      - number of scales computed
  .data         - [nScales x nTypes] cell array of computed channels
  .info         - [nTypes x 1] struct array (mirrored from chnsCompute)
  .lambdas      - [nTypes x 1] scaling coefficients actually used
  .scales       - [nScales x 1] relative scales (approximate)
  .scaleshw     - [nScales x 2] exact scales for resampling h and w

EXAMPLE
 I=imResample(imread('peppers.png'),[480 640]);
 pPyramid=chnsPyramid(); pPyramid.minDs=[128 128];
 pPyramid.nApprox=0; tic, P1=chnsPyramid(I,pPyramid); toc
 pPyramid.nApprox=7; tic, P2=chnsPyramid(I,pPyramid); toc
 figure(1); montage2(P1.data{2}); figure(2); montage2(P2.data{2});
 figure(3); montage2(abs(P1.data{2}-P2.data{2})); colorbar;

See also chnsCompute, chnsScaling, convTri, imPad

Piotr's Computer Vision Matlab Toolbox      Version 3.25
Copyright 2014 Piotr Dollar & Ron Appel.  [pdollar-at-gmail.com]
Licensed under the Simplified BSD License [see external/bsd.txt]