Speech Processing on Multi-Genre Broadcast Media

Microsoft Research

Published 13 September 2016, 20:45

There has been a great deal of research on broadcast speech since the mid-1990s, including transcription and diarization, but almost all of it has been limited to a single domain, typically broadcast news. This talk describes recent progress in speech processing on Multi-Genre Broadcast Media. Unlike most previous work on broadcast news, it uses a broad, multi-genre dataset spanning the whole range of BBC TV output across four channels over seven weeks. The training set provided by the BBC contained about 1,600 hours of broadcast audio, together with several hundred million words of subtitle text. The transcriptions for the acoustic training data were the broadcast subtitles, which have an average word error rate of about 33% (26% due to deletions) compared with verbatim transcripts. Evaluations of speech recognition, speaker diarization, and lightly supervised alignment were performed on the BBC TV recordings. The talk describes the Cambridge University Engineering Department (CUED) system for all of these evaluations. Key features of the system include lightly supervised decoding-based data selection, DNN-based segmentation, joint decoding of Hybrid and Tandem systems, DNN adaptation for the acoustic model (AM), RNN adaptation for the language model (LM), and exploration of the complementarity between HTK and Kaldi. With these advances, the CUED system won all evaluated tasks in the 1st Multi-Genre Broadcast Challenge.
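To make the subtitle error figures concrete, here is a minimal sketch of how a word error rate can be split into substitutions, deletions, and insertions using a standard Levenshtein alignment. The abstract does not say which scoring tool was used, so the function name `wer_breakdown` and the example sentences are hypothetical and serve only as an illustration.

```python
def wer_breakdown(reference, hypothesis):
    """Return (substitutions, deletions, insertions, wer) for two word lists.

    Illustrative sketch only: a plain Levenshtein alignment, not the
    scoring tool used in the talk (which the abstract does not name).
    """
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one word")

    # cost[i][j] = (total_errors, subs, dels, ins) for aligning
    # reference[:i] with hypothesis[:j].
    cost = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)  # every reference word deleted
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, j)  # every hypothesis word inserted

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t, s, d, ins = cost[i - 1][j - 1]
            if reference[i - 1] == hypothesis[j - 1]:
                diagonal = (t, s, d, ins)          # correct word
            else:
                diagonal = (t + 1, s + 1, d, ins)  # substitution
            t, s, d, ins = cost[i - 1][j]
            deletion = (t + 1, s, d + 1, ins)      # reference word missing
            t, s, d, ins = cost[i][j - 1]
            insertion = (t + 1, s, d, ins + 1)     # extra hypothesis word
            cost[i][j] = min(diagonal, deletion, insertion, key=lambda c: c[0])

    total, subs, dels, ins = cost[n][m]
    return subs, dels, ins, total / n


# Hypothetical example: a "subtitle" that drops one word and changes another.
verbatim = "the cat sat on the mat".split()
subtitle = "the cat on a mat".split()
subs, dels, ins, wer = wer_breakdown(verbatim, subtitle)
print(f"S={subs} D={dels} I={ins} WER={wer:.1%}")
```

Subtitles tend to condense speech, so deletions would be expected to dominate the error count, which is consistent with the deletion share quoted above.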
