Abstract

The Surrey Audio-Visual Expressed Emotion (SAVEE) database has been recorded as a prerequisite for the development of an automatic emotion recognition system. The database consists of recordings from 4 male actors in 7 different emotions, 480 British English utterances in total. The sentences were chosen from the standard TIMIT corpus and phonetically balanced for each emotion. The data were recorded in a visual media lab with high-quality audio-visual equipment, then processed and labeled. To check the quality of performance, the recordings were evaluated by 10 subjects under audio, visual and audio-visual conditions. Classification systems were built using standard features and classifiers for each of the audio, visual and audio-visual modalities, and speaker-independent recognition rates of 61%, 65% and 84% were achieved respectively.
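To make the experimental setup concrete, the sketch below shows one way a speaker-independent audio emotion classifier of this kind can be built. The abstract only states that standard features and classifiers were used, so the specific choices here (MFCC statistics as features, a linear SVM, a leave-one-speaker-out protocol, and the assumed directory layout, speaker codes and emotion codes) are illustrative assumptions, not the authors' actual configuration.

# Minimal sketch of a speaker-independent audio emotion classifier on
# SAVEE-style data. MFCC statistics + a linear SVM are illustrative
# assumptions, as are the per-speaker folders and label encoding below.
import os
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

SPEAKERS = ["DC", "JE", "JK", "KL"]   # assumed: one folder of WAV files per actor
                                      # 7 emotion classes assumed encoded in file names

def extract_features(wav_path):
    """Mean and std of 13 MFCCs over the utterance (one fixed-length vector)."""
    y, sr = librosa.load(wav_path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

def label_of(filename):
    """Emotion code is assumed to be the leading letters of the file name."""
    stem = os.path.splitext(filename)[0]
    return "".join(c for c in stem if c.isalpha())

def load_speaker(root, speaker):
    X, y = [], []
    folder = os.path.join(root, speaker)
    for fname in sorted(os.listdir(folder)):
        if fname.endswith(".wav"):
            X.append(extract_features(os.path.join(folder, fname)))
            y.append(label_of(fname))
    return np.array(X), np.array(y)

def leave_one_speaker_out(root="SAVEE/AudioData"):
    """Speaker-independent evaluation: train on 3 actors, test on the 4th."""
    data = {s: load_speaker(root, s) for s in SPEAKERS}
    accuracies = []
    for test_spk in SPEAKERS:
        X_tr = np.vstack([data[s][0] for s in SPEAKERS if s != test_spk])
        y_tr = np.concatenate([data[s][1] for s in SPEAKERS if s != test_spk])
        X_te, y_te = data[test_spk]
        clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
        clf.fit(X_tr, y_tr)
        accuracies.append(clf.score(X_te, y_te))
    return float(np.mean(accuracies))

if __name__ == "__main__":
    print("Mean speaker-independent accuracy:", leave_one_speaker_out())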


Conclusions

We have recorded an audio-visual database of expressed emotion covering the six basic emotions and neutral. The database consists of phonetically balanced TIMIT sentences uttered by 4 English actors, 480 utterances in total. The database was evaluated by 10 subjects with respect to the recognizability of each emotion in the audio, visual and audio-visual data. The subjective evaluation results show higher classification accuracy for the visual data than for the audio data, and the overall performance improved when the two modalities were combined. Reasonably high classification accuracy was achieved in speaker-dependent and speaker-independent experiments on the database, and these results follow the same pattern as the human evaluation: the visual data outperformed the audio data, and performance improved further for the combined audio-visual data. Together, the human evaluation and machine learning results demonstrate the usefulness of this database for research in emotion recognition.
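The observation that combining the two modalities improves performance can be illustrated with a simple fusion baseline. The database description does not prescribe a particular fusion scheme, so the feature-level concatenation sketched below is only one plausible approach; the per-utterance audio and visual feature vectors and the classifier are assumptions carried over from the sketch above.

# Illustrative sketch of feature-level audio-visual fusion: concatenate the
# per-utterance audio and visual feature vectors and train a single classifier.
# This is an assumed baseline, not the scheme used to obtain the reported 84%.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def fuse_features(audio_feats, visual_feats):
    """Concatenate aligned per-utterance audio and visual feature matrices.

    audio_feats:  array of shape (n_utterances, d_audio)
    visual_feats: array of shape (n_utterances, d_visual), rows aligned
                  to the same utterances as audio_feats.
    """
    assert len(audio_feats) == len(visual_feats)
    return np.hstack([audio_feats, visual_feats])

def train_audio_visual(audio_tr, visual_tr, y_tr):
    """Fit one classifier on the fused feature representation."""
    clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
    clf.fit(fuse_features(audio_tr, visual_tr), y_tr)
    return clf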


Acknowledgments

We are grateful to Kevin Lithgow, James Edge, Joe Kilner, Darren Cosker, Nataliya Nadtoka, Samia Smail, Idayat Salako, Affan Shaukat and Aftab Khan for their help with data capture and evaluation and for serving as subjects, to James Edge for his marker tracker, to Adrian Hilton for use of his 3dMD equipment, to Sola Aina for help with the description of phonetic symbols, and to the University of Peshawar (Pakistan) and CVSSP at the University of Surrey (UK) for funding.

Last update: 2 April 2015
Authors: Philip Jackson and Sanaul Haq