Humanising GrabCut: Learning to segment humans using the Kinect
Varun Gulshan, Victor Lempitsky and Andrew Zisserman
In: IEEE Workshop on Consumer Depth Cameras for Computer Vision, ICCV 2011, 6-13 November 2011, Barcelona.
The Kinect provides an opportunity to collect large
quantities of training data for visual learning algorithms
relatively effortlessly. To this end we investigate learning to
automatically segment humans from cluttered images (without depth information), given a bounding box. For this algorithm, obtaining a large dataset of images with segmented
humans is crucial as it enables the possible variations in
human appearances and backgrounds to be learnt.
We show that a large dataset of roughly 3400 humans can be automatically acquired very cheaply using
the Kinect. Segmenting humans is then cast as a learning
problem with linear classifiers trained to predict segmentation masks from sparsely coded local HOG descriptors.
These classifiers introduce top-down knowledge to obtain a
crude segmentation of the human, which is then refined using
bottom-up information from local color models in a SnapCut-like fashion. The method is quantitatively evaluated
on images of humans in cluttered scenes, and a high performance is obtained (88.5% overlap score). We also show that
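The classifier stage is only described at a high level here. As an illustration, a coarse-mask predictor of this kind might be sketched as below; all names, dimensions, and the top-k coding scheme are assumptions for the sketch, not the authors' actual pipeline:

```python
import numpy as np

# Hypothetical sketch: each image cell yields a local descriptor (e.g. one
# HOG block), the descriptor is sparse-coded against a learnt dictionary,
# and a linear classifier maps the code to a per-cell foreground score.

rng = np.random.default_rng(0)

def sparse_code(descriptor, dictionary, k=3):
    """Crude sparse coding: keep the k dictionary atoms with the largest
    absolute correlation to the descriptor, zero out the rest."""
    corr = dictionary @ descriptor           # (n_atoms,) correlations
    code = np.zeros_like(corr)
    top = np.argsort(np.abs(corr))[-k:]      # indices of the k strongest atoms
    code[top] = corr[top]
    return code

# toy dimensions: 36-D descriptors, 64-atom dictionary (both assumed)
dictionary = rng.standard_normal((64, 36))
dictionary /= np.linalg.norm(dictionary, axis=1, keepdims=True)
w = rng.standard_normal(64)                  # linear classifier weights
b = 0.0

# predict a coarse mask over an 8x8 grid of cells
cells = rng.standard_normal((8, 8, 36))      # stand-in for HOG descriptors
codes = np.array([[sparse_code(cells[i, j], dictionary)
                   for j in range(8)] for i in range(8)])
scores = codes @ w + b                       # (8, 8) per-cell scores
coarse_mask = scores > 0                     # threshold to a binary mask
print(coarse_mask.shape)                     # (8, 8)
```

In practice the classifier weights would be learnt from the Kinect-acquired segmentation masks rather than drawn at random as here.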
the method can be completely automated – segmenting humans given only the images, without requiring a bounding
box, and compare with a previous state-of-the-art method.
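A minimal sketch of the bottom-up refinement idea, assuming per-window foreground/background intensity histograms in place of full local colour models (the window scheme, histogram model, and all names are illustrative assumptions, not the paper's method):

```python
import numpy as np

def local_refine(window, mask, bins=8, eps=1e-6):
    """Relabel pixels in one window by comparing their likelihood under
    foreground/background histograms built from the current labels."""
    idx = np.minimum((window * bins).astype(int), bins - 1)  # bin per pixel
    fg_hist = np.bincount(idx[mask], minlength=bins) + eps   # fg colour model
    bg_hist = np.bincount(idx[~mask], minlength=bins) + eps  # bg colour model
    fg_hist = fg_hist / fg_hist.sum()        # normalise to probabilities
    bg_hist = bg_hist / bg_hist.sum()
    return fg_hist[idx] > bg_hist[idx]       # new foreground labels

# toy window: bright foreground on the left, dark background on the right,
# with a noisy initial mask standing in for the crude top-down segmentation
rng = np.random.default_rng(1)
truth = np.zeros((16, 16), dtype=bool)
truth[:, :8] = True
window = np.clip(np.where(truth, 0.8, 0.2)
                 + 0.05 * rng.standard_normal((16, 16)), 0.0, 1.0)
noisy = truth ^ (rng.random((16, 16)) < 0.1)  # flip ~10% of the labels
refined = local_refine(window, noisy)
```

In a SnapCut-style system such windows would be placed along the mask boundary and the relabelling would feed a graph-cut energy rather than a hard threshold; the sketch only shows the local colour-model idea.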
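The overlap score used for evaluation is typically computed as the intersection-over-union between the predicted and ground-truth masks; a small sketch (function and variable names assumed):

```python
import numpy as np

def overlap_score(pred, gt):
    """Overlap (IoU) = |pred AND gt| / |pred OR gt|, as a percentage."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return 100.0 * inter / union

pred = np.array([[1, 1, 0], [0, 1, 0]], dtype=bool)
gt   = np.array([[1, 0, 0], [0, 1, 1]], dtype=bool)
print(overlap_score(pred, gt))   # 2 pixels shared / 4 in the union -> 50.0
```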
EPrint Type: Conference or Workshop Item (Paper)
Deposited By: Sunando Sengupta
Deposited On: 28 December 2011