3D Visual Tracking of Articulated Objects
Teo de Campos
PhD thesis, University of Oxford.
The ability to track multiple objects and articulated objects is important in many areas, not least
autonomous and teleoperated robotics, visual surveillance and human motion analysis. This thesis is
concerned with marker-free, real-time detection and tracking of articulated objects. It targets human hands,
with the aim of studying methods that can enhance the interaction between humans and 3D
(real or virtual) objects.
A survey summarises the methods used in the literature to approach this and related problems. It indicates
that, despite a large body of research over the past twenty or so years, the area remains challenging.
Two main approaches have been identified. The first, known as generative tracking, uses an explicit
kinematic representation of the linkages or constraints between object parts, and tracks by minimising
the error between projected control points and the image observations. In the second, known as the
discriminative approach, little is specified beforehand; instead, training data is used to learn a map
between image observations and 3D poses.
This thesis describes novel work in both areas.
In the generative area, a method for tracking articulated objects is described. It is a novel extension
of a method for tracking rigid objects: the motion constraints between the parts of the object are
imposed up front, within the tracking process itself. The inter-frame pose update is derived as the
solution of a linear system. The method has been applied to track articulated objects, including hands,
as well as multiple objects subject to motion constraints.
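To make "imposing the constraints up front" concrete, the sketch below uses a hypothetical linear toy rather than the thesis's actual formulation. Two parts each have `k` motion parameters; each row of `J` is one linearised measurement (for example, the error at one projected control point) with its target value in `e`; and the joint is assumed to force selected parameter components of the two parts to be equal. Writing the stacked motion as alpha = N beta, where the columns of `N` span only the motions the joint allows, turns the constrained problem into an ordinary least-squares solve for beta. The names `shared_component_basis` and `constrained_update` are illustrative assumptions, not functions from the thesis.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) < 1e-12:
            raise ValueError("singular system")
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def shared_component_basis(k: int, shared: List[int]) -> Matrix:
    """Hypothetical joint model: two parts with k parameters each, where the
    components listed in `shared` must be equal. Returns N (2k x p) whose
    columns span the allowed stacked motions alpha = [alpha_1, alpha_2]."""
    cols = []
    for j in range(k):            # one parameter per component of part 1 ...
        col = [0.0] * (2 * k)
        col[j] = 1.0
        if j in shared:
            col[k + j] = 1.0      # ... which also drives part 2 when shared
        cols.append(col)
    for j in range(k):            # remaining free components of part 2
        if j not in shared:
            col = [0.0] * (2 * k)
            col[k + j] = 1.0
            cols.append(col)
    return transpose(cols)


def constrained_update(J: Matrix, e: Vector, N: Matrix) -> Vector:
    """Least-squares update with the constraints built in: minimise
    |J N beta - e|^2 over beta, then return alpha = N beta."""
    JN = matmul(J, N)
    JNt = transpose(JN)
    normal = matmul(JNt, JN)                                   # (JN)^T (JN)
    rhs = [sum(x * y for x, y in zip(row, e)) for row in JNt]  # (JN)^T e
    beta = solve(normal, rhs)
    return [sum(x * y for x, y in zip(row, beta)) for row in N]
```

With identity measurements `J`, targets `e = [1, 2, 3, 4]` and component 0 shared (`shared_component_basis(2, [0])`), the update is `[2.0, 2.0, 2.0, 4.0]`: the shared component settles at the mean of the two parts' individual values (1 and 3), while the free components are unchanged. Because only allowed motions are ever parametrised, the solved system is smaller than the unconstrained one and the constraint holds exactly by construction.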
An alternative method estimates the motion of each subpart independently, thereby introducing redundant
degrees of freedom, and imposes the constraints afterwards in a lower-dimensional subspace.
This method is reviewed and compared with the method above in terms of accuracy, efficiency
and robustness.
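The alternative route can be sketched in the same kind of hypothetical linear toy. Each part's motion is first solved from its own normal equations (`blocks[i]` = J_i^T J_i, with the stacked right-hand sides J_i^T e_i in `g`), giving an estimate with redundant degrees of freedom; the constraint rows `C` (with C alpha = 0) are then enforced with Lagrange multipliers, weighting the correction by each part's inverse normal matrix. The function names are illustrative, and this projection rule is a standard constrained least-squares identity chosen for the sketch, not necessarily the exact formulation reviewed in the thesis.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) < 1e-12:
            raise ValueError("singular system")
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def block_solve(blocks: List[Matrix], v: Vector) -> Vector:
    """Apply the inverse of a block-diagonal matrix, one part at a time."""
    out: Vector = []
    i = 0
    for H in blocks:
        k = len(H)
        out += solve(H, v[i:i + k])
        i += k
    return out


def independent_then_project(blocks: List[Matrix], g: Vector, C: Matrix) -> Vector:
    """Step 1: solve every part on its own (redundant degrees of freedom).
    Step 2: alpha = alpha_hat - H^-1 C^T (C H^-1 C^T)^-1 C alpha_hat,
    where H is the block-diagonal matrix built from `blocks`."""
    alpha_hat = block_solve(blocks, g)
    X = transpose([block_solve(blocks, row) for row in C])  # H^-1 C^T
    S = matmul(C, X)                                         # C H^-1 C^T
    violation = [sum(c * a for c, a in zip(row, alpha_hat)) for row in C]
    lam = solve(S, violation)                                # Lagrange multipliers
    return [a - sum(x * l for x, l in zip(row, lam)) for a, row in zip(alpha_hat, X)]
```

With identity blocks, `g = [1, 2, 3, 4]` and the single constraint row `[1, 0, -1, 0]` (component 0 of both parts must agree), the result is `[2.0, 2.0, 2.0, 4.0]`. With unequal blocks, the shared value leans towards the better-determined part. In this purely linear, outlier-free setting the answer coincides with parametrising the constraints up front, so any differences in accuracy, efficiency or robustness between the two approaches must come from aspects the toy leaves out.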
In the discriminative area, an inference-based approach is adopted in which a non-parametric relation
between global image measurements and 3D poses is learnt using a multivariate regressor based on the
Relevance Vector Machine (RVM). This relation is a continuous map that allows fast and efficient pose
estimation from static images. Because the method can detect hands and estimate their 3D pose from a
single static image, it can be used to (re-)initialise the generative tracker.
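The learnt image-to-pose map can be illustrated with a simpler stand-in. The sketch below is kernel ridge regression, not a Relevance Vector Machine: it shares the RVM's form of a weighted sum of kernel functions centred on the training descriptors, y(x) = sum_n w_n K(x, x_n), with one weight per training example and pose dimension, but it omits the RVM's per-weight priors, which are what let the RVM discard most training examples and keep a sparse set of relevance vectors. The Gaussian kernel, the `width` and `ridge` parameters and the class name `KernelPoseRegressor` are assumptions for illustration; real descriptors and poses would be much higher-dimensional than the toy values used here.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy; a is untouched
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) < 1e-12:
            raise ValueError("singular system")
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def rbf(x: Vector, y: Vector, width: float) -> float:
    """Gaussian kernel between two descriptor vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * width ** 2))


class KernelPoseRegressor:
    """Kernel ridge regression from image descriptors to pose vectors
    (a hypothetical stand-in for a multivariate RVM)."""

    def __init__(self, width: float = 1.0, ridge: float = 1e-6) -> None:
        self.width = width
        self.ridge = ridge
        self.centres: Matrix = []
        self.weights: Matrix = []  # rows: training examples, columns: pose dimensions

    def fit(self, descriptors: Matrix, poses: Matrix) -> "KernelPoseRegressor":
        self.centres = [list(x) for x in descriptors]
        K = [[rbf(xi, xj, self.width) + (self.ridge if i == j else 0.0)
              for j, xj in enumerate(self.centres)]
             for i, xi in enumerate(self.centres)]
        dims = len(poses[0])
        # Every pose dimension is regressed on the same kernel basis.
        cols = [solve(K, [p[d] for p in poses]) for d in range(dims)]
        self.weights = [list(row) for row in zip(*cols)]
        return self

    def predict(self, x: Vector) -> Vector:
        k = [rbf(x, c, self.width) for c in self.centres]
        dims = len(self.weights[0])
        return [sum(kn * w[d] for kn, w in zip(k, self.weights)) for d in range(dims)]
```

Fitting three one-dimensional descriptors to two-dimensional poses and querying one of the training descriptors returns approximately its training pose, while queries far from every training descriptor decay to zero because the Gaussian kernel vanishes there. The fitted map is continuous in the descriptor, which matches the abstract's description of a continuous map from image measurements to poses.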
In this thesis, multiple views are used to reduce the ambiguities of both the generative and the
discriminative methods. Experiments with single and multiple views are described, and a novel extension
of the discriminative method to multiple views is proposed and evaluated.
EPrint Type: Thesis (PhD)
Deposited By: Teo de Campos
Deposited On: 21 February 2012