Implementation Details
Initial Window Size and Placement
In practice, we work with digital video images so our distributions are discrete. Since
CAMSHIFT is an algorithm that climbs the gradient of a distribution, the minimum search
window size must be greater than one in order to detect a gradient. Also, in order to center
the window, it should be of odd size. Thus for discrete distributions, the minimum window
size is set at three. For the same reason, whenever CAMSHIFT adapts its search window, the
window size is rounded up to the nearest odd number (an already odd size is kept). In
practice, at start up, we calculate the color probability of the whole scene and use the
zeroth moment to set the window size (see subsection below) and the centroid to set the
window center.
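The rounding rule and the start-up computation described above can be sketched as follows. This is a minimal illustration, not the original implementation: the function names `round_up_to_odd` and `image_moments` are hypothetical, and the probability image is assumed to be a plain list of rows of non-negative values.

```python
import math

def round_up_to_odd(size, minimum=3):
    """Round a window size up to the nearest odd integer, never below `minimum`."""
    n = max(int(math.ceil(size)), minimum)
    return n if n % 2 == 1 else n + 1

def image_moments(prob):
    """Return (M00, x_centroid, y_centroid) for a 2D probability image.

    `prob` is assumed to be a list of rows; x indexes columns, y indexes rows.
    The centroid is (None, None) when the image is entirely zero.
    """
    m00 = m10 = m01 = 0.0
    for y, row in enumerate(prob):
        for x, p in enumerate(row):
            m00 += p        # zeroth moment: total probability mass
            m10 += x * p    # first moment in x
            m01 += y * p    # first moment in y
    if m00 == 0:
        return 0.0, None, None
    return m00, m10 / m00, m01 / m00
```

At start-up, the centroid returned for the whole scene would give the initial window center, and the zeroth moment would be passed to the window size function discussed in the next subsection.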
Setting Adaptive Window Size Function
The function of the zeroth moment used to set the search window size in Step 3
of the CAMSHIFT algorithm depends on an understanding of the distribution that one wants
to track and the goal that one wants to achieve. The first consideration is to translate
the zeroth moment information into units that make sense for setting window size. Thus,
in Figure 4, the maximum distribution value per discrete cell is 206, so we divide the
zeroth moment by 206 to convert the calculated area under the search window to a number
of cells. Our goal is to track the whole color object, so we need an expansive
window. Thus, we further multiply the result by two so that the window grows to encompass
the connected distribution area. We then round to the next greatest odd search window size
so that the window has a center.
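As a worked sketch of the 1D rule just described (divide by the maximum cell value, multiply by two, round up to an odd size), assuming the Figure 4 maximum of 206 and the minimum window size of three stated earlier; the function name `window_size_1d` and its parameter names are hypothetical:

```python
import math

def window_size_1d(m00, max_value=206, scale=2, minimum=3):
    """Search window size for a 1D distribution from its zeroth moment."""
    cells = m00 / max_value            # area converted to a number of cells
    size = math.ceil(scale * cells)    # expand so the window covers the whole object
    size = max(size, minimum)          # a gradient needs at least three cells
    return size if size % 2 == 1 else size + 1  # odd size, so there is a center cell
```

For example, a zeroth moment of 618 corresponds to three full cells, which doubles to six and rounds up to a window of seven cells.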
For 2D color probability distributions where the maximum pixel value is 255, we set window size s to
$$ s = 2\sqrt{\frac{M_{00}}{256}} $$
We divide by 256 for the same reason stated above, and we take the square root to convert the resulting 2D region to a 1D length. In practice, for tracking faces, we set window width to s and window length to 1.2s since faces are somewhat elliptical.
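A minimal sketch of the 2D face-tracking window follows. It assumes the size function is s = 2 * sqrt(M00 / 256), which is what the description implies (division by 256, a square root, and the factor of two from the expansive-window argument above). The function name `face_window` is hypothetical, and rounding to odd pixel sizes is left to the caller.

```python
import math

def face_window(m00):
    """Return (width, length) of the search window for a 2D probability image.

    Assumes s = 2 * sqrt(M00 / 256); the length is 1.2 * s because faces
    are somewhat elliptical.
    """
    s = 2.0 * math.sqrt(m00 / 256.0)
    return s, 1.2 * s
```

With a zeroth moment of 25600 (the equivalent of 100 fully saturated pixels), this gives a width of 20 and a length of about 24 pixels.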
Comments on Software Calibration
Much of CAMSHIFT's robustness to noise, transient occlusions, and distractors depends on
the search window matching the size of the object being tracked—it is better to err on the
side of the search window being a little too small. The search window size depends on the
function of the zeroth moment M00 chosen above. To indirectly control
the search window size, we adjust the color histogram up or down by a constant, truncating
at zero or saturating at the maximum pixel value. This adjustment affects the pixel values
in the color probability distribution image which affects M00 and hence
window size. For 8-bit hue, we adjust the histogram down by 20 to 80 (out of a maximum of
255), which tends to shrink the CAMSHIFT window to just within the object being tracked
and also reduces image noise.
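The histogram adjustment can be sketched as a constant offset with clamping. The function name `shift_histogram` is hypothetical, and the histogram is assumed to be a list of bin values already scaled to the range 0 to 255.

```python
def shift_histogram(hist, offset, max_value=255):
    """Add a constant to every histogram bin, truncating at zero and
    saturating at `max_value`. A negative offset shrinks the search window."""
    return [min(max(v + offset, 0), max_value) for v in hist]
```

Shifting down by 40, for instance, zeroes every bin at or below 40, so weakly matching pixels no longer contribute to M00.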
Thresholds on HSV brightness and saturation are also applied, since hue is not well defined for very low or high brightness or for low saturation. The low and high thresholds are offset by 10% of the maximum pixel value.
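One way to read the thresholding rule is sketched below. It is an interpretation rather than the original code: it assumes 8-bit saturation and brightness values, a low threshold at 10% of the maximum, and a high threshold at 90% of the maximum. The function name `hue_is_reliable` is hypothetical.

```python
def hue_is_reliable(s, v, max_value=255, margin=0.10):
    """Return True if a pixel's hue should be used for tracking.

    Rejects pixels with low saturation, or with brightness that is
    very low or very high, where hue is poorly defined.
    """
    low = margin * max_value            # 25.5 for 8-bit values
    high = (1.0 - margin) * max_value   # 229.5 for 8-bit values
    return s >= low and low <= v <= high
```

Pixels that fail this test would simply be skipped when building the color histogram and the probability image.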
Comments on Hardware Calibration
To use CAMSHIFT as a video color object tracker, the camera's field of view (zoom) must be
set so that it covers the space that one intends to track in. Turn off automatic white balance
if possible to avoid sudden color shifts. Try to set (or auto-adjust) AGC, shutter speed,
iris or CCD integration time so that image brightness is neither too dim nor saturating.
The camera need not be in focus to track colors. CAMSHIFT will work well with cheap cameras
and does not need calibrated lenses.