Data-driven methods in Description-based Audio Information Processing

Published September 6, 2016, 17:12
Digital consumer devices and Internet technology have together made it possible to create, share, and edit tremendous amounts of multimedia content quickly and easily. To access this vast amount of data efficiently, user-centric automatic methods of processing and indexing are necessary. My main research focus in this area is to develop a description-based framework for automatic audio information processing. Audio content is available in a variety of forms such as music, news, sports commentaries, and podcasts, and is also an integral part of video. Processing these forms requires a combination of clustering, segmentation, and classification techniques within a well-specified indexing framework. Since the audio medium is inherently rich and conveys varying levels of information depending on the time-scale and the description, developing an indexing framework based on language is challenging. Contemporary methods adopt a naïve approach to processing audio: they simply categorize audio into fixed high-level semantic categories (such as animal sounds, laughter, music, etc.) and attempt to identify them. These techniques also overlook the varying levels of human content description, which are based mainly on perception.

My research effort is geared towards developing representations and analysis techniques that are scalable in terms of time and description level. It considers both perception-based descriptions (using onomatopoeia) and high-level semantic descriptions. These methods apply universally to the domain of unstructured audio, which covers all forms of content in which the type, number, and duration of acoustic sources are highly variable. The ultimate goal of my work is to develop a full-duplex audio information processing system in which audio is categorized, segmented, and clustered using both signal-level measures and higher-level language-based descriptions.
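As a toy illustration of the signal-level side of such a pipeline (not the framework described above), the sketch below clusters short-time MFCC frames of an audio file and treats label changes as candidate segment boundaries; the use of librosa and scikit-learn, the file name example.wav, and the choice of four clusters are all assumptions made for illustration.

```python
# Minimal sketch: frame-level clustering as a first pass at segmenting
# unstructured audio, before any semantic or onomatopoeic labeling is applied.
import numpy as np
import librosa
from sklearn.cluster import KMeans

def segment_audio(path, n_clusters=4, sr=16000, hop_length=512):
    """Cluster MFCC frames and report times where the cluster label changes."""
    y, _ = librosa.load(path, sr=sr)                       # mono waveform
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                                hop_length=hop_length)     # shape (13, n_frames)
    frames = mfcc.T                                        # one row per frame
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(frames)
    # A candidate boundary is any frame where the cluster label changes.
    change = np.flatnonzero(np.diff(labels)) + 1
    boundaries = librosa.frames_to_time(change, sr=sr, hop_length=hop_length)
    return labels, boundaries

if __name__ == "__main__":
    labels, boundaries = segment_audio("example.wav")      # hypothetical file
    print("candidate segment boundaries (s):", np.round(boundaries, 2))
```

In a description-based system of the kind outlined above, such low-level segments would then be mapped to perceptual or semantic labels; the clustering step here only illustrates the signal-level measures mentioned in the abstract.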