An evaluation of bags-of-words and spatio-temporal shapes for action recognition
Teo de Campos, Mark Barnard, Krystian Mikolajczyk, Josef Kittler, Fei Yan, William Christmas and David Windridge
In: IEEE Workshop on Applications of Computer Vision (WACV), 05-06 Jan 2011, Kona, Hawaii.
Bags-of-visual-Words (BoW) and Spatio-Temporal Shapes (STS) are two
very popular approaches for action recognition from video. The
former (BoW) is an unstructured global representation of videos which
is built using a large set of local features. The latter (STS) uses a
single feature located on a region of interest
(where the actor is) in the video. Despite the popularity of these
methods, no comparison between them has been made.
Moreover, because BoW and STS differ intrinsically in terms of
context inclusion and globality/locality of operation,
an appropriate evaluation framework has to be designed carefully.
This paper compares these two approaches using four different datasets
with varying degrees of space-time specificity of the actions and varying
relevance of the contextual background. We use the same local feature
extraction method and the same classifier for both approaches. In
addition to BoW and STS, we also evaluate novel variations of BoW constrained
in time or space. We observe that the STS approach leads to better
results in all datasets whose background is of little relevance to
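
To make the BoW idea described in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes local descriptors have already been extracted and that a codebook of visual words is available; the function names `bow_histogram` and `constrained_bow_histogram`, the `(x, y, t)` position layout, and the axis-aligned space-time box are all hypothetical choices made for illustration. The second function shows one plausible way a BoW could be constrained in space or time, by keeping only features that fall inside a region.

```python
import numpy as np


def bow_histogram(descriptors, codebook):
    """Quantise local descriptors against a codebook (hypothetical helper).

    descriptors: (n, d) array of local features from one video.
    codebook:    (k, d) array of visual words.
    Returns an L1-normalised histogram of length k (all zeros if n == 0).
    """
    descriptors = np.asarray(descriptors, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    if descriptors.shape[0] == 0:
        return np.zeros(codebook.shape[0])
    # Squared Euclidean distance from every descriptor to every word: (n, k).
    diffs = descriptors[:, None, :] - codebook[None, :, :]
    sq_dists = (diffs ** 2).sum(axis=2)
    # Each descriptor votes for its nearest visual word.
    words = sq_dists.argmin(axis=1)
    hist = np.bincount(words, minlength=codebook.shape[0]).astype(float)
    return hist / hist.sum()


def constrained_bow_histogram(descriptors, positions, codebook, lo, hi):
    """BoW restricted to features inside a space-time box (assumed variant).

    positions: (n, 3) array of (x, y, t) feature locations.
    lo, hi:    length-3 lower and upper corners of the box, inclusive.
    """
    descriptors = np.asarray(descriptors, dtype=float)
    positions = np.asarray(positions, dtype=float)
    inside = np.all((positions >= lo) & (positions <= hi), axis=1)
    return bow_histogram(descriptors[inside], codebook)
```

The unconstrained histogram discards all position information, which is what makes BoW unstructured and global; the constrained variant reintroduces locality only through the choice of box, while the quantisation step stays the same.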
EPrint Type: Conference or Workshop Item (Paper)
Deposited By: Teo de Campos
Deposited On: 03 December 2010