Recovering 3D human pose from monocular images
Ankur Agarwal and William Triggs
IEEE Transactions on Pattern Analysis and Machine Intelligence
We describe a learning-based method for recovering 3D human body pose
from single images and monocular image sequences. Our approach
requires neither an explicit body model nor prior labelling of body
parts in the image. Instead, it recovers pose by direct nonlinear
regression against shape descriptor vectors extracted automatically
from image silhouettes. For robustness against local silhouette
segmentation errors, silhouette shape is encoded by histogram-of-shape-contexts
descriptors. We evaluate several different regression methods: ridge
regression, Relevance Vector Machine (RVM) regression and Support Vector
Machine (SVM) regression over both linear and kernel bases. The RVMs provide
much sparser regressors without compromising performance, and kernel
bases give a small but worthwhile improvement in performance.
Loss of depth and limb labelling information often makes the recovery of 3D pose
from single silhouettes ambiguous. We propose two solutions to this ambiguity: the first
embeds the method in a tracking framework, using dynamics from the previous
state estimate to disambiguate the pose; the second uses a mixture-of-regressors
framework to return multiple solutions for each silhouette.
We show that the resulting system tracks long sequences stably, and is also
capable of accurately reconstructing 3D human pose from single images, giving
multiple possible solutions in ambiguous cases. For realism and good
generalization over a wide range of viewpoints, we train the regressors on
images resynthesized from real human motion capture data. The method is
demonstrated on a 54-parameter full body pose model, both quantitatively on
independent but similar test data, and qualitatively on real image sequences.
Mean angular errors of 4 to 5 degrees are obtained, a factor of 3 better than
the current state of the art for the much simpler upper body problem.
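
As a rough illustration of the regression step described above, the following Python sketch fits a ridge regressor over a kernel basis, mapping descriptor vectors to pose vectors, and scores it with a wrapped mean angular error. It is a minimal sketch under stated assumptions, not the authors' implementation. Only the 54-parameter pose vector and the idea of ridge regression over a kernel basis come from the abstract. The Gaussian kernel, the function names, the values of `beta` and `lam`, the 100-dimensional descriptors, the exact form of the error measure, and the synthetic data are all assumptions made for illustration. The RVM and SVM regressors, the shape-context descriptors, and the tracking and mixture-of-regressors stages are not shown.

```python
import numpy as np


def gaussian_kernel(A, B, beta):
    """Gaussian kernel matrix between rows of A and rows of B (assumed kernel)."""
    # Pairwise squared Euclidean distances, clipped at 0 against round-off.
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-beta * np.maximum(sq, 0.0))


def fit_kernel_ridge(X, Y, beta, lam):
    """Solve (K + lam * I) W = Y for the kernel weights W."""
    K = gaussian_kernel(X, X, beta)
    return np.linalg.solve(K + lam * np.eye(len(X)), Y)


def predict(X_train, W, X_new, beta):
    """Predict pose vectors for new descriptors from the fitted weights."""
    return gaussian_kernel(X_new, X_train, beta) @ W


def mean_angular_error(pred_deg, true_deg):
    """Mean absolute angle difference in degrees, wrapped to [-180, 180)."""
    diff = (pred_deg - true_deg + 180.0) % 360.0 - 180.0
    return np.abs(diff).mean()


# Synthetic stand-in data (hypothetical): 200 descriptors of dimension 100,
# each paired with a 54-parameter pose vector expressed in degrees.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 100))
Y = 90.0 * np.sin(X @ rng.normal(scale=0.1, size=(100, 54)))

W = fit_kernel_ridge(X, Y, beta=0.01, lam=1e-3)
Y_hat = predict(X, W, X, beta=0.01)
print("training mean angular error (degrees):", mean_angular_error(Y_hat, Y))
```

The error printed here is a training error on synthetic data and says nothing about the accuracy reported in the abstract. In practice `beta` and `lam` would be chosen by validation on held-out poses.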