Emotion Recognition in Speech Signal: Experimental Study, Development and Applications

Published September 6, 2016, 5:07
In this talk I will give an overview of my research on emotion expression and emotion recognition in the speech signal and its applications. Two proprietary databases of emotional utterances were used in this research. The first database consists of 700 emotional utterances in English pronounced by 30 subjects portraying five emotional states: unemotional (normal), anger, happiness, sadness, and fear. The second database consists of 3660 emotional utterances in Russian by 61 subjects portraying six emotional states: unemotional, anger, happiness, sadness, fear, and surprise.

An experimental study was conducted to determine how well people recognize emotions in speech. Based on its results, the most reliably recognized utterances were selected for feature selection and for training recognizers. Several machine learning techniques were applied to create recognition agents, including k-nearest neighbor, neural networks, and ensembles of neural networks. The agents recognize the five emotional states with accuracy in the following ranges: 55-75% for the normal (unemotional) state, 70-80% for anger, and 35-55% for fear. The agents can be adapted to a particular environment depending on the parameters of the speech signal and the number of target emotional states. For a practical application, an agent was created that analyzes telephone-quality speech and distinguishes between two emotional states, agitation (which includes anger, happiness, and fear) and calm (which includes the normal state and sadness), with 77% accuracy. The agent was used as part of a decision support system for prioritizing voice messages and assigning an appropriate human agent to respond to each message in a call-center environment.

I will also summarize other research topics in the lab, including fast pitch-synchronous segmentation of the speech signal, the use of speech analysis techniques for language learning, and video clip recognition using a joint audio-visual model.
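To make the classification setup concrete, below is a minimal sketch of the two-class agitation/calm task using scikit-learn. It is not the author's implementation: the synthetic feature matrix stands in for prosodic statistics computed from the proprietary databases, and the choice of a k-nearest-neighbor model combined with a small neural network in a voting ensemble is only an illustration of the learner types named in the abstract.

```python
# Hedged sketch: agitation vs. calm classification from prosodic features.
# Feature values here are random placeholders for real per-utterance statistics
# (e.g. pitch mean/range, energy, speaking rate); labels 1 = agitation, 0 = calm.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import VotingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 12))          # 400 utterances, 12 prosodic features (assumed)
y = rng.integers(0, 2, size=400)        # 1 = agitation, 0 = calm

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Two of the learner families mentioned in the talk, combined by soft voting
# as a stand-in for the ensembles of neural networks described there.
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
mlp = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000,
                                  random_state=0))
ensemble = VotingClassifier([("knn", knn), ("mlp", mlp)], voting="soft")

ensemble.fit(X_train, y_train)
print("held-out accuracy:", ensemble.score(X_test, y_test))
```

With real prosodic features in place of the random matrix, the same pipeline can be retargeted to more emotion classes or to telephone-quality input by changing only the feature extraction and the label set.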