Towards Spoken Term Discovery at Scale with Zero Resources

Microsoft Research

Published August 17, 2016, 21:18

The spoken term discovery task takes speech as input and identifies terms of possible interest. The challenge is to perform this task efficiently on large amounts of speech with zero resources (no training data and no dictionaries), where we must fall back to more basic properties of language. We find that long (~1 s) repetitions tend to be contentful phrases (e.g., "University of Pennsylvania") and propose an algorithm to search for these long repetitions without first recognizing the speech. To address efficiency concerns, we take advantage of (i) sparse feature representations and (ii) the inherently low occurrence frequency of long content terms to achieve orders-of-magnitude speedup relative to the prior art. We frame our evaluation in the context of spoken document information retrieval, and demonstrate our method's competence at identifying repeated terms in conversational telephone speech.
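The idea of searching for long repetitions without recognizing the speech can be illustrated with a toy sketch. It is not the algorithm presented in the talk. It assumes, purely for illustration, that each audio frame has already been quantized to an integer code, and it looks for long diagonal runs of matching frames. The function name `find_long_repeats` and the parameter `min_len` (minimum repeat length, in frames) are hypothetical.

```python
from collections import defaultdict


def find_long_repeats(codes, min_len):
    """Return (i, j, length) triples, i < j, where codes[i:i+length]
    equals codes[j:j+length] and length >= min_len.

    Toy illustration only: `codes` stands in for quantized speech frames
    (an assumption), and overlapping self-matches are not filtered out.
    """
    # Sparse view of the frame-by-frame match matrix: index positions by
    # code so that only matching frame pairs are ever materialized.
    positions = defaultdict(list)
    for idx, code in enumerate(codes):
        positions[code].append(idx)

    matches = set()
    for idxs in positions.values():
        for a in range(len(idxs)):
            for b in range(a + 1, len(idxs)):
                matches.add((idxs[a], idxs[b]))

    # A repetition shows up as a diagonal run (i, j), (i+1, j+1), ...
    repeats = []
    for i, j in sorted(matches):
        if (i - 1, j - 1) in matches:
            continue  # not the start of a run
        length = 0
        while (i + length, j + length) in matches:
            length += 1
        if length >= min_len:
            repeats.append((i, j, length))
    return repeats


if __name__ == "__main__":
    frames = [7, 1, 2, 3, 4, 9, 8, 1, 2, 3, 4, 5]
    print(find_long_repeats(frames, min_len=3))
```

Because rare codes produce few matching pairs, the set of candidate matches stays small when long repeated terms are infrequent, which is the intuition behind the efficiency argument in the abstract. Real speech would need approximate rather than exact frame matching.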