

# Training Examples; Comparison with Nearest-Neighbor Matching of Whole Body

Training examples:

- 300,000 training images per tree, randomly selected from 1M training images.
- 2,000 training example pixels per image.
- Decision forest constructed in 1 day on a 1,000-core cluster.

Reference: Real-Time Human Pose Recognition in Parts from a Single Depth Image, Computer Vision and Pattern Recognition, 2011.

Challenges:

- Lots of variation in bodies, orientations, poses.
- Needs to be very fast (their algorithm runs at 200 fps on the Xbox 360 GPU).

Approach:

- Extract body pixels by thresholding depth.
- Classify each pixel into a body part class.
- Compute joint positions.

Design choices:

- What should we use for a feature? Difference in depth.
- What should we use for a classifier? Random decision forests.

Feature:

- d_I(x) is the depth image; (u, v) is the offset to the second pixel.
- The offset is scaled by the depth at the reference pixel.

Randomized decision forest: a collection of independently trained binary decision trees.

- Each tree is a classifier that predicts the likelihood of a pixel x belonging to body part class c.
   - A non-leaf node corresponds to a thresholded feature.
   - A leaf node corresponds to a conjunction of several features.
   - Each leaf node stores a learned distribution P(c|I, x).

Training each tree:

1. For each tree, pick a randomly sampled subset of the training data.
2. Randomly choose a set of features and thresholds at each node.
3. Pick the feature and threshold that give the largest information gain.
4. Recurse until a certain accuracy is reached or a maximum tree depth is reached.

Classification, testing phase:

1. Classify each pixel x in image I using all decision trees and average the results at the leaves. This average is the forest output probability of the ensemble model.
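
The depth-difference feature and the averaging of leaf distributions across trees can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. The sign convention d_I(x) - d_I(x + offset / d_I(x)), the reading of (u, v) as (horizontal, vertical) pixel offsets, the `BACKGROUND_DEPTH` constant for probes that fall outside the image, and the dictionary layout of a tree are all assumptions made for this example. The reference pixel is assumed to have positive depth.

```python
import numpy as np

BACKGROUND_DEPTH = 1e6  # assumed constant for probes that leave the image


def depth_at(depth, row, col):
    """Depth at (row, col), or a large constant outside the image."""
    h, w = depth.shape
    if 0 <= row < h and 0 <= col < w:
        return float(depth[row, col])
    return BACKGROUND_DEPTH


def depth_feature(depth, pixel, offset):
    """Depth at the reference pixel minus depth at a second, offset pixel."""
    row, col = pixel
    u, v = offset  # assumed: u is horizontal, v is vertical
    d = depth_at(depth, row, col)
    # Scale the offset by the depth at the reference pixel, so the probe
    # covers a similar physical distance for near and far bodies.
    probe_row = int(round(row + v / d))
    probe_col = int(round(col + u / d))
    return d - depth_at(depth, probe_row, probe_col)


def tree_posterior(tree, depth, pixel):
    """Walk one tree to a leaf and return its stored distribution P(c|I, x)."""
    node = tree
    while "dist" not in node:  # non-leaf node: one thresholded feature
        value = depth_feature(depth, pixel, node["offset"])
        node = node["left"] if value < node["threshold"] else node["right"]
    return np.asarray(node["dist"], dtype=float)


def forest_posterior(forest, depth, pixel):
    """Average the leaf distributions over all trees in the forest."""
    return np.mean([tree_posterior(t, depth, pixel) for t in forest], axis=0)


# Hypothetical two-class example: a near surface with a farther one to its right.
depth = np.full((5, 5), 2.0)
depth[:, 3:] = 4.0
forest = [
    {"offset": (2.0, 0.0), "threshold": 0.0,
     "left": {"dist": [0.8, 0.2]}, "right": {"dist": [0.1, 0.9]}},
    {"offset": (-2.0, 0.0), "threshold": -1.0,
     "left": {"dist": [0.5, 0.5]}, "right": {"dist": [0.4, 0.6]}},
]
print(forest_posterior(forest, depth, (2, 2)))
```

Each pixel is classified independently, which is what makes the per-pixel evaluation easy to parallelize.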

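The training step that picks the feature and threshold with the largest information gain can be illustrated as follows. This is a sketch under stated assumptions: gain is computed as the Shannon entropy of the labels minus the size-weighted entropy of the two children, and the candidate features and thresholds (randomly chosen in the slides) are simply passed in by the caller. The names `feature_table`, `best_split`, and `information_gain` are hypothetical.

```python
import numpy as np


def entropy(labels, num_classes):
    """Shannon entropy (in bits) of an integer label array."""
    counts = np.bincount(labels, minlength=num_classes)
    probs = counts[counts > 0] / len(labels)
    return float(-np.sum(probs * np.log2(probs)))


def information_gain(values, labels, threshold, num_classes):
    """Entropy reduction from splitting the samples at values < threshold."""
    left = labels[values < threshold]
    right = labels[values >= threshold]
    if len(left) == 0 or len(right) == 0:
        return 0.0  # a split that separates nothing gains nothing
    n = len(labels)
    children = (len(left) * entropy(left, num_classes)
                + len(right) * entropy(right, num_classes)) / n
    return entropy(labels, num_classes) - children


def best_split(feature_table, labels, thresholds, num_classes):
    """Return (gain, feature index, threshold) with the largest gain.

    feature_table[k] holds candidate feature k evaluated at every
    training pixel that reached this node.
    """
    best = (-1.0, None, None)
    for k, values in enumerate(feature_table):
        for t in thresholds:
            gain = information_gain(values, labels, t, num_classes)
            if gain > best[0]:
                best = (gain, k, t)
    return best


# Hypothetical node with four training pixels and two candidate features.
labels = np.array([0, 0, 1, 1])
feature_table = np.array([[1.0, 2.0, 3.0, 4.0],
                          [1.0, 3.0, 2.0, 4.0]])
print(best_split(feature_table, labels, thresholds=[2.5], num_classes=2))
```

A full trainer would call `best_split` at each node, partition the pixels, and recurse on both halves until the stopping rule is met.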

Human Body Recognition and Tracking: How the Kinect Works

Overview:

- Kinect (2010): color video camera + laser-projected IR dot pattern (IR laser projector) + IR camera.
- Kinect 1.5 due 5/2012.
- Goal: estimate body parts and joint poses.
- Pipeline: IR projector and IR sensor, stereo algorithm, segmentation and part prediction, application (e.g., game).

Part 1: Stereo from Projected Dots

- How it works for a projector/sensor pair
- Stereo algorithm used by PrimeSense (Kinect)

Some of the following slides are adapted from Steve Seitz and Lana Lazebnik.

Depth from stereo:

- Goal: recover depth by finding the image coordinate x' that corresponds to x.
   - Find the corresponding epipolar line in the right image.
   - Examine all pixels on the epipolar line and pick the best match.
   - Triangulate the matches to get depth information.
- Disparity is inversely proportional to depth, z.

Basic stereo matching algorithm:

- If necessary, rectify the two stereo images to transform epipolar lines into scanlines.
- For each pixel x in the first image:
   - Find the corresponding epipolar scanline in the right image.
   - Examine all pixels on the scanline and pick the best match x'.
   - Compute the disparity x - x' and set depth(x) = fB/(x - x').

Correspondence search:

- Slide a window along the right scanline and compare the contents of that window with the reference window in the left image.
- Matching cost: SSD or normalized correlation.
- Add constraints and solve with graph cuts (Zabih, Fast Approximate Energy Minimization via Graph Cuts, PAMI 2001).

Projected light:

- Basic principle: use a projector to create known features (e.g., points, lines).
- Light projection: if we project distinctive points, matching is easy, even on textureless surfaces.

Stereo algorithm used by PrimeSense (Kinect):

1. Detect dots (“speckles”) and label them unknown.
2. Randomly select a region anchor: a dot with unknown depth.
   a. Run a windowed search via normalized cross correlation along the scanline. Check that the best match score is greater than a threshold; if not, mark the anchor as “invalid” and go to 2.
   b. Grow the region from the anchor:
      - Neighboring pixels are added to a queue.
      - For each pixel in the queue, initialize by the anchor’s shift, then search a small local neighborhood; if matched, add its neighbors to the queue.
      - Stop when no pixels are left in the queue.
3. Stop when all dots have known depth or are marked “invalid”.

The in-camera ASIC computes an 11-bit 640 x 480 depth map at 30 Hz.
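
The windowed scanline search with normalized cross correlation and the relation depth(x) = fB/(x - x') from the stereo notes can be sketched as below. This is an illustrative sketch, not the PrimeSense implementation. It assumes rectified images, a reference window that lies fully inside the image, and a match located at x' = x - disparity in the right image. The parameter names (`half`, `max_disp`, `focal_px`, `baseline_m`) and the numeric values in the example are hypothetical.

```python
import numpy as np


def ncc(a, b):
    """Normalized cross correlation between two equally sized windows."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def match_along_scanline(left, right, row, col, half=2, max_disp=16):
    """Return (disparity, score) of the best NCC match for left[row, col]."""
    ref = left[row - half:row + half + 1, col - half:col + half + 1]
    best_disp, best_score = 0, -1.0
    for disp in range(max_disp + 1):
        c = col - disp  # candidate x' on the same scanline of the right image
        if c - half < 0:
            break  # candidate window would leave the image
        cand = right[row - half:row + half + 1, c - half:c + half + 1]
        score = ncc(ref, cand)
        if score > best_score:
            best_disp, best_score = disp, score
    return best_disp, best_score


def depth_from_disparity(disparity, focal_px, baseline_m):
    """depth(x) = f * B / (x - x'); the disparity must be nonzero."""
    return focal_px * baseline_m / disparity


# Synthetic check: the right image is the left image shifted by 4 pixels.
rng = np.random.default_rng(0)
left = rng.random((20, 40))
right = np.roll(left, -4, axis=1)
disp, score = match_along_scanline(left, right, row=10, col=20)
if score > 0.9:  # hypothetical threshold; below it the match would be "invalid"
    print(disp, depth_from_disparity(disp, focal_px=500.0, baseline_m=0.1))
```

Because depth is inversely proportional to disparity, a one-pixel matching error matters far more for distant surfaces than for near ones.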
