3D shape estimation in video sequences provides high precision evaluation of facial expressions
Person-independent and pose-invariant estimation of facial expressions and action unit (AU) intensities is important for situation analysis and automated video annotation. We evaluated the raw 2D shape data of the CK+ database using Procrustes transformation and multi-class SVM classification with leave-one-out cross-validation. Performance was close to 100%, demonstrating the relevance and strength of fine shape details. Precise 3D shape information was computed on video sequences by means of Constrained Local Models (CLM). Such sequences make it possible to compute a time-averaged '3D Personal Mean Shape' (PMS) from the estimated CLM shapes; subtracting the PMS from each shape yields person-independent emotion estimation. On CK+ data, PMS showed significant improvements over AU0 normalization, and performance reached and sometimes surpassed state-of-the-art results on emotion classification and AU intensity estimation. 3D PMS from 3D CLM also offers pose-invariant emotion estimation, which we studied by rendering a 3D emotional database for different poses and different subjects from the BU 4DFE database and evaluating the frontal shapes derived from CLM fits of the 3D shape. The results demonstrate that shape estimation alone can be used for robust, high-quality, pose-invariant emotion classification and AU intensity estimation.
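The PMS normalization described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the per-frame 3D landmark shapes have already been estimated (for example by a CLM fit) and rigidly aligned to a common frontal frame, and that they are stored as a NumPy array of shape (frames, landmarks, 3). The function names `personal_mean_shape` and `pms_normalize` are hypothetical.

```python
import numpy as np


def personal_mean_shape(shapes):
    """Time-averaged 3D shape of one subject.

    shapes: array of shape (T, N, 3), i.e. T frames of N 3D landmarks.
    Assumption: the frames are already pose-normalized (rigidly aligned).
    Returns an (N, 3) array.
    """
    shapes = np.asarray(shapes, dtype=float)
    if shapes.ndim != 3 or shapes.shape[2] != 3:
        raise ValueError("expected an array of shape (T, N, 3)")
    return shapes.mean(axis=0)


def pms_normalize(shapes):
    """Subtract the subject's mean shape from every frame.

    Returns a (T, 3 * N) feature matrix, one flattened residual shape
    per frame, which could then be passed to a classifier.
    """
    shapes = np.asarray(shapes, dtype=float)
    pms = personal_mean_shape(shapes)
    residuals = shapes - pms  # broadcasts the (N, 3) mean over all T frames
    return residuals.reshape(shapes.shape[0], -1)


if __name__ == "__main__":
    # Synthetic data only: two subjects with different face geometry
    # who perform the same sequence of deformations.
    rng = np.random.default_rng(0)
    T, N = 20, 66  # hypothetical frame and landmark counts
    deformation = rng.normal(scale=0.1, size=(T, N, 3))
    subject_a = rng.normal(size=(N, 3)) + deformation
    subject_b = rng.normal(size=(N, 3)) + deformation

    feats_a = pms_normalize(subject_a)
    feats_b = pms_normalize(subject_b)
    # The person-specific geometry cancels, so the features coincide.
    print(np.allclose(feats_a, feats_b))
```

In this toy setting the subject-specific geometry is a constant offset, so subtracting the per-subject mean removes it exactly. With real CLM output the cancellation would only be approximate, and the quality of the mean depends on how well the sequence samples the subject's range of expressions.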