
This research is in the area of computer vision: making computers that can understand what is happening in photographs and video. However, most learning-based approaches require annotated data, which can be expensive to acquire. This project seeks to develop automated tools that allow temporal visual content, such as a human gesturing, using sign language, or interacting with objects or other humans, to be learnt from standard TV broadcast signals, using high-level annotation in the form of subtitles or scripts. This requires the development of models of the visual appearance and dynamics of actions, together with learning methods that can train such models using the weak supervision provided by the text. As such, there are two main domains to this work: Sign Language Recognition, and the more general understanding of actions and behaviour in broadcast footage. More details on some of these elements are given below.
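To give a feel for the weak supervision described above, the sketch below shows a toy multiple-instance view of the problem: subtitle text only says that a sign occurs *somewhere* in a video window, so the learner must find the segment that recurs across windows whose subtitles mention the word, while avoiding windows whose subtitles do not. The function names, the 2-D "features", and the weighting are invented for illustration; this is an assumption about the general flavour of such methods, not the project's actual algorithm.

```python
def similarity(a, b):
    """Negative squared Euclidean distance between two feature vectors."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))


def spot_sign(positive_bags, negative_bags, neg_weight=0.1):
    """Return (bag_index, segment_index) of the candidate segment that
    recurs across the positive bags while staying away from negatives.

    A "bag" is a video window represented as a list of segment feature
    vectors; positive bags are windows whose subtitle contains the
    target word, so the sign should occur somewhere in each of them.
    """
    best, best_score = None, float("-inf")
    for i, bag in enumerate(positive_bags):
        for j, seg in enumerate(bag):
            # Recurrence: similarity to the closest segment in every
            # other positive bag (the sign should appear in all of them).
            recurrence = sum(
                max(similarity(seg, s) for s in other)
                for k, other in enumerate(positive_bags) if k != i)
            # Penalty: similarity to the closest negative segment,
            # down-weighted (neg_weight is an illustrative heuristic).
            penalty = max(similarity(seg, s)
                          for bag_ in negative_bags for s in bag_)
            score = recurrence - neg_weight * penalty
            if score > best_score:
                best, best_score = (i, j), score
    return best


# Hand-made 2-D "features": the sign (a segment near (5, 5)) recurs in
# every positive window amid unrelated segments.
positives = [
    [(0.0, 1.0), (5.0, 5.1), (9.0, 2.0)],
    [(5.1, 4.9), (2.0, 8.0)],
    [(7.0, 1.0), (4.9, 5.0)],
]
negatives = [[(0.0, 0.5), (9.5, 9.5)], [(2.2, 7.8)]]

bag, seg = spot_sign(positives, negatives)
print(positives[bag][seg])  # one of the segments near (5, 5)
```

Real systems work with far richer appearance and pose features and noisy subtitle-to-video alignment, but the underlying idea is the same: the text narrows the search down to a handful of windows, and recurrence across those windows does the rest.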

Sign Language

Upper Body Pose Estimation and Tracking

Learning Sign Language by Watching TV

Sign Recognition using Sequential Pattern Trees

Additional pose data and CNN body pose software

Action Recognition

Recognising actions in 2D

Recognising actions in 3D

Tracking and Character Identification

2D detection and tracking

Tracking 3D objects from 2D footage

Tracking Hands in 3D

Identifying Characters in Footage

This page was last updated on 19/03/2016 by S Hadfield