Learning sign language by watching TV (using weakly aligned subtitles)
Patrick Buehler, Mark Everingham and Andrew Zisserman
In: CVPR, 20-25 Jun 2009, Miami, USA.
The goal of this work is to automatically learn a large number of British Sign Language (BSL) signs from TV broadcasts. We achieve this by using the supervisory information available from subtitles broadcast simultaneously with the signing.
This supervision is both weak and noisy: it is weak because of the correspondence problem, since the temporal offset between a sign and its subtitle is unknown and signing does not follow the text order; it is noisy because a subtitle can be signed in different ways, and because the occurrence of a subtitle word does not imply the presence of the corresponding sign.
The contributions are: (i) we propose a distance function to match signing sequences which includes the trajectory of both hands, the hand shape and orientation, and properly models the case of hands touching; (ii) we show that by optimizing a scoring function based on multiple instance learning, we are able to extract the sign of interest from hours of signing footage, despite the very weak and noisy supervision.
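Contribution (ii) can be illustrated with a toy sketch, under loose assumptions: fixed-length candidate windows represented as per-frame feature arrays, a simple Gaussian similarity in place of the paper's full hand trajectory/shape/orientation distance, and a brute-force search over candidates. All function names here are hypothetical, not the authors' implementation.

```python
import numpy as np

def window_distance(a, b):
    """Toy stand-in for a distance between two signing windows:
    mean squared difference over per-frame features (e.g. stacked
    left/right hand positions). Assumes equal-length windows."""
    return np.mean((a - b) ** 2)

def mil_select_exemplar(pos_bags, neg_windows, sigma=1.0):
    """Multiple-instance selection: each positive bag holds the candidate
    temporal windows from one subtitle occurrence of the target word, one
    of which may contain the sign. A candidate scores highly if every
    positive bag contains a similar window while negative footage
    (subtitles without the word) does not."""
    sim = lambda a, b: np.exp(-window_distance(a, b) / (2 * sigma ** 2))
    best, best_score = None, -np.inf
    for bag in pos_bags:
        for cand in bag:
            # Reward: each positive bag should hold a near match.
            pos_score = sum(max(sim(cand, w) for w in b) for b in pos_bags)
            # Penalty: similarity to windows where the sign should not occur.
            neg_score = sum(sim(cand, w) for w in neg_windows)
            if pos_score - neg_score > best_score:
                best, best_score = cand, pos_score - neg_score
    return best, best_score
```

With synthetic bags in which one common window recurs across all positive occurrences amid distractors, the selection recovers that recurring window; the real system replaces the Gaussian similarity with the learned distance over both hands' trajectories, hand shape and orientation.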
The method is automatic given only the English target word of the sign to be learnt. Results are presented for 210 words, including nouns, verbs and adjectives.