Final intern talk: Improving Frechet Audio Distance for Generative Music Evaluation

Microsoft Research

Published September 22, 2023, 18:27

Speaker: Azalea Gui
Host: Hannes Gamper

As generative music models become more powerful and popular, there is a growing need for robust objective metrics of music quality that correlate with human perception. The Frechet Audio Distance (FAD) is a commonly used metric for this purpose. However, its performance may be hampered by several issues: sample size bias, limitations of the underlying audio embeddings, and the use of low-quality reference sets. We propose reducing sample size bias by extrapolating unbiased scores as the sample size approaches infinity. A comparison of various audio embeddings reveals that some are better suited than others for deriving FAD scores that capture aspects of musical or acoustic quality. Finally, our experiments underscore the importance of choosing a diverse and high-quality reference dataset for FAD calculation. Listening test results indicate that unbiased FAD scores calculated using suitable embeddings and reference music improve correlation with human ratings of musical and acoustic quality.
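The extrapolation idea in the abstract can be illustrated with a minimal sketch. It assumes, and the abstract does not state this, that the bias of a score computed from N samples shrinks roughly linearly in 1/N, so that a straight-line fit of score against 1/N has an intercept that estimates the score at infinite sample size. The function name `extrapolate_score` and the input numbers are hypothetical, and the code takes precomputed FAD scores as input; it is not the speaker's implementation.

```python
def extrapolate_score(sample_sizes, scores):
    """Estimate the score at infinite sample size.

    Fits scores ~ slope * (1 / N) + intercept by ordinary least squares
    and returns (intercept, slope). The intercept is the value of the
    fitted line at 1/N = 0. The linear-in-1/N bias is an assumption.
    """
    if len(sample_sizes) != len(scores) or len(sample_sizes) < 2:
        raise ValueError("need at least two (sample size, score) pairs")

    xs = [1.0 / n for n in sample_sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(scores) / len(scores)

    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("sample sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


if __name__ == "__main__":
    # Synthetic placeholder values, not measurements from the talk:
    # hypothetical scores computed on subsets of increasing size.
    sizes = [100, 200, 500, 1000]
    fad_at_size = [2.0 + 50.0 / n for n in sizes]
    unbiased, slope = extrapolate_score(sizes, fad_at_size)
    print(f"extrapolated score: {unbiased:.3f}, slope: {slope:.3f}")
```

In practice each input score would come from evaluating FAD on a random subset of the generated set at the given size, and averaging several subsets per size should make the fit less noisy.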
