Microsoft Research · 335K
Published May 14, 2024, 15:32
Speaker(s): Eloi Moliner
Host: Hannes Gamper
Speech reverberation control involves the manipulation of acoustic characteristics in speech recordings, including tasks like speech dereverberation or reverberation time reduction. Diffusion implicit bridges are a recently proposed domain translation technique based on diffusion models and entropy-regularized optimal transport. They enable a bijective mapping between samples from different distributions by bridging through a prior Gaussian distribution. Diffusion bridges have the advantage of not requiring paired data samples for training and are optimized with a simple and stable Euclidean objective. This study applies diffusion implicit bridges to unsupervised speech reverberation control. We identify how a naive implementation of this method results in numerous undesired artifacts, such as speaker identity changes or babbling, and attribute them to the curvature in the sampling trajectories. To mitigate these issues, we propose training the model with a chunk-based optimal transport coupling between speech and noise samples, which significantly straightens the learned trajectories and improves the semantic consistency of the speech content. We study the performance of different configurations of the model through a comprehensive objective evaluation. To demonstrate the versatility of the method, we additionally conduct experiments on other tasks such as speech declipping or guitar distortion removal.
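The chunk-based coupling mentioned above can be illustrated with a small sketch. This is not the speakers' implementation: it is one possible reading, assuming that each speech and noise signal in a minibatch is cut into fixed-length chunks and that noise chunks are matched one-to-one to speech chunks by minimizing the total squared Euclidean distance. The function name `chunked_ot_coupling`, the `chunk_len` parameter, and the use of an exact assignment solver (rather than an entropy-regularized one) are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def chunked_ot_coupling(speech, noise, chunk_len):
    """Hypothetical helper: pair noise chunks with speech chunks.

    speech, noise: arrays of shape (batch, num_samples), where num_samples
    is a multiple of chunk_len. Returns the re-ordered noise (same shape)
    and the total squared Euclidean cost of the chosen pairing.
    """
    batch, num_samples = speech.shape
    if num_samples % chunk_len != 0:
        raise ValueError("num_samples must be a multiple of chunk_len")
    # Pool all chunks of the minibatch: (batch * n_chunks, chunk_len).
    x = speech.reshape(-1, chunk_len)
    z = noise.reshape(-1, chunk_len)
    # Pairwise squared Euclidean distances between speech and noise chunks.
    cost = ((x[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1)
    # Exact one-to-one assignment that minimizes the total cost.
    rows, cols = linear_sum_assignment(cost)
    paired = np.empty_like(z)
    paired[rows] = z[cols]  # noise chunk assigned to each speech chunk
    return paired.reshape(batch, num_samples), cost[rows, cols].sum()


rng = np.random.default_rng(0)
speech = rng.standard_normal((4, 64))  # stand-in for real speech segments
noise = rng.standard_normal((4, 64))   # Gaussian prior samples
paired_noise, total_cost = chunked_ot_coupling(speech, noise, chunk_len=16)
```

Under these assumptions, a model would be trained on the pairs (`speech`, `paired_noise`) instead of independently drawn pairs. Because the matching only permutes chunks of Gaussian noise, the pool of noise values is unchanged, while each speech chunk is paired with a nearby noise chunk, which is the intuition behind straighter trajectories.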
See more at microsoft.com/en-us/research/v...