Speaker Diarization: Optimal Clustering and Learning Speaker Embeddings

Microsoft Research

Published June 22, 2016, 19:13

Speaker diarization consists of automatically partitioning an input audio stream into homogeneous segments (segmentation) and grouping the segments spoken by the same speaker (speaker clustering). Diarization can improve readability by structuring an audio document. When used in conjunction with a speaker recognition system, it can also provide the speakers' true identities. In this seminar I will talk about two new methods: ILP Clustering and speaker embeddings.

In speaker clustering, a major problem with greedy agglomerative hierarchical clustering (HAC) is that it does not guarantee an optimal solution. I propose a new clustering model, called ILP Clustering, which reformulates the clustering problem as a linear program (i.e., an objective function optimized subject to linear equality and/or inequality constraints). An Integer Linear Programming (ILP) solver can then search for the optimal solution over the whole problem.

In the second part, I propose learning a set of high-level feature representations through deep learning, referred to as speaker embeddings. Speaker embedding features are taken from the hidden-layer neuron activations of Deep Neural Networks (DNNs) trained as classifiers to recognize a thousand speaker identities in a training set. Although learned through identification, the speaker embeddings are shown to be effective for speaker verification, in particular for recognizing speakers unseen in the training set. The experiments were conducted on ETAPE, a corpus of French broadcast news. On this corpus, the new ILP/speaker-embedding system decreases the diarization error rate (DER) by 4.79 points compared with the baseline diarization system based on HAC/GMM.
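The abstract does not spell out the integer program behind ILP Clustering. The following is a minimal sketch that assumes a center-based formulation; the talk's exact objective and constraints may differ. In this formulation, a binary variable x[k][j] attaches each initial cluster j to exactly one "center" cluster k. Opening a center costs 1, an attachment costs weight * dist[k][j], and attachments farther than a threshold delta are forbidden. Instead of calling an ILP solver, the sketch enumerates every possible set of centers. This returns the same global optimum but only scales to a handful of clusters. The names ilp_cluster, dist, delta and weight are hypothetical, and dist is assumed to be a symmetric matrix of pairwise cluster distances with a zero diagonal.

```python
from itertools import combinations


def ilp_cluster(dist, delta, weight):
    """Exactly solve a small center-based clustering integer program.

    Hypothetical formulation (illustrative assumption, not the talk's exact one):
      x[k][j] = 1 if cluster j is attached to center k, else 0
      minimize   sum_k x[k][k] + weight * sum_{k,j} dist[k][j] * x[k][j]
      subject to sum_k x[k][j] = 1            for every j
                 x[k][j] <= x[k][k]           only open centers receive clusters
                 dist[k][j] * x[k][j] <= delta
    Enumerating center sets is exact but exponential in n; a real system
    would hand the program to an ILP solver instead.
    """
    n = len(dist)
    best_cost, best_assign = float("inf"), None
    for size in range(1, n + 1):
        for centers in combinations(range(n), size):
            cost = float(size)  # one unit per open center (x[k][k] = 1)
            assign = []
            feasible = True
            for j in range(n):
                if j in centers:
                    k = j  # an open center can only be attached to itself
                else:
                    # With the centers fixed, the objective separates per j, so
                    # the closest admissible center is the optimal attachment.
                    options = [(dist[k][j], k) for k in centers if dist[k][j] <= delta]
                    if not options:
                        feasible = False
                        break
                    _, k = min(options)
                assign.append(k)
                cost += weight * dist[k][j]
            if feasible and cost < best_cost:
                best_cost, best_assign = cost, assign
    return best_assign, best_cost


if __name__ == "__main__":
    # Hypothetical distances: clusters {0, 1} and {2, 3} are close pairs.
    dist = [
        [0.0, 0.2, 0.9, 0.8],
        [0.2, 0.0, 0.85, 0.9],
        [0.9, 0.85, 0.0, 0.1],
        [0.8, 0.9, 0.1, 0.0],
    ]
    assign, cost = ilp_cluster(dist, delta=0.5, weight=1.0)
    print(assign)  # -> [0, 0, 2, 2]
```

Greedy HAC commits to one merge at a time. This search, like an ILP solver, instead scores whole partitions, so the returned assignment is globally optimal for the stated objective. The number of speakers found is simply the number of open centers (here, 2).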
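Speaker embeddings are hidden-layer activations of a DNN trained to classify the training speakers. This minimal sketch of that extraction step makes two assumptions the abstract does not state: a plain feed-forward network with ReLU hidden layers, and cosine similarity as the verification score. The toy weights and the decision threshold are hypothetical stand-ins for a network trained on the thousand training identities. At test time only the hidden layers are evaluated and the speaker-classification output layer is dropped. This is why the resulting vector can describe speakers the classifier never saw.

```python
import math


def relu(v):
    return [max(0.0, x) for x in v]


def dense(weights, bias, v):
    # One fully connected layer: `weights` holds one row per output unit.
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def embed(features, hidden_layers):
    """Speaker embedding = activations of the last hidden layer.

    The output layer that scored the training speakers is never applied here.
    """
    h = features
    for weights, bias in hidden_layers:
        h = relu(dense(weights, bias, h))
    return h


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    # Guard against all-zero ReLU outputs.
    return 0.0 if na == 0.0 or nb == 0.0 else dot / (na * nb)


def same_speaker(emb_a, emb_b, threshold=0.7):
    # Hypothetical verification rule: accept if cosine similarity >= threshold.
    return cosine(emb_a, emb_b) >= threshold


if __name__ == "__main__":
    # Hypothetical toy network: one hidden layer, 3 inputs -> 2 units.
    layers = [([[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]], [0.0, 0.0])]
    a = embed([1.0, 0.0, 0.2], layers)
    b = embed([0.9, 0.1, 0.0], layers)
    c = embed([0.0, 1.0, 0.0], layers)
    print(same_speaker(a, b), same_speaker(a, c))  # -> True False
```

In a diarization system, the same embeddings could also replace GMM-based cluster models as the input to the distance matrix used for clustering. Whether the talk scores them with cosine similarity or another back-end is not stated.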
