Make some noise: Teaching the language of audio to an LLM using sound tokens

Published July 28, 2025, 19:56
August 22, 2024
Speaker: Shivam Mehta
Host: Hannes Gamper

We investigate using low-bitrate, causal, quantized audio representations to fine-tune large language models (LLMs) with LoRA for both comprehending and generating audio. Unlike earlier approaches, which rely on continuous audio representations for audio comprehension, our approach learns a discretized language of audio through causal variational quantization, yielding an ultra-low bitrate of 0.293 kbps. We then use these audio tokens to fine-tune the Llama 7B model for multimodal tasks involving audio understanding and generation. Because audio, like text, has a left-to-right inductive bias, treating it as a language lets us use these tokens to train a multimodal model and conduct a qualitative multimodal analysis.
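To give a feel for what a bitrate of 0.293 kbps means for a discrete token stream, here is a minimal sketch of the usual arithmetic: bitrate equals tokens per second times bits per token, where bits per token is log2 of the codebook size. The abstract states only the final bitrate; the codebook size (1024) and the number of codebooks (1) used below are illustrative assumptions, not values from the talk, and the function names are hypothetical. The sketch takes 1 kbps to be 1000 bits per second.

```python
import math


def bitrate_kbps(tokens_per_second: float, codebook_size: int, num_codebooks: int = 1) -> float:
    """Bitrate of a discrete token stream in kbps (1 kbps = 1000 bits/s)."""
    bits_per_token = math.log2(codebook_size)
    return tokens_per_second * num_codebooks * bits_per_token / 1000.0


def tokens_per_second_for(target_kbps: float, codebook_size: int, num_codebooks: int = 1) -> float:
    """Token rate needed to reach a target bitrate with the given codebook setup."""
    bits_per_step = num_codebooks * math.log2(codebook_size)
    return target_kbps * 1000.0 / bits_per_step


if __name__ == "__main__":
    # ASSUMPTION: a single codebook with 1024 entries (10 bits per token).
    # The talk abstract does not state the codebook size or frame rate.
    rate = tokens_per_second_for(0.293, codebook_size=1024)
    print(f"tokens per second at 0.293 kbps: {rate:.1f}")
    print(f"round trip: {bitrate_kbps(rate, 1024):.3f} kbps")
```

Under these assumed settings, one second of audio would occupy only a few dozen tokens of the LLM's context, which is the practical appeal of such a low bitrate when audio is modeled as a left-to-right token sequence.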