Direct Nash Optimization: Teaching language models to self-improve with..
Published September 3, 2024, 18:59
Corby Rosset, Senior Researcher, Microsoft Research AI Frontiers, discusses teaching language models to self-improve using a preference oracle like GPT-4, framing it as a two-player game to find an optimal policy at a Nash equilibrium, and achieving state-of-the-art win rates against GPT-4 Turbo on benchmarks such as Alpaca-Eval and MT-Bench.
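The abstract above frames self-improvement as a two-player game over a preference oracle. As a rough intuition only (not the paper's actual algorithm, which trains an LLM with a contrastive objective), the game dynamic can be sketched with a toy exponential-weights loop: a "policy" is a softmax over a few fixed candidate responses, and a hypothetical oracle (standing in for a judge like GPT-4) compares two sampled responses, with the winner's logit raised and the loser's lowered. The response set, the length-based oracle, and the learning rate `eta` are all illustrative assumptions.

```python
import math
import random

random.seed(0)

# Toy stand-in for the two-player preference game: the "policy" is a
# softmax distribution over a fixed set of candidate responses, and a
# hypothetical preference oracle compares two sampled responses.
responses = ["short", "a longer reply", "a very detailed long reply"]
logits = [0.0, 0.0, 0.0]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def oracle_prefers(a, b):
    # Illustrative oracle: prefers the more detailed (longer) response.
    # In DNO this role is played by a strong judge model such as GPT-4.
    return len(responses[a]) > len(responses[b])

def sample(probs):
    r = random.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

eta = 0.5  # step size for the exponential-weights update (assumption)
for _ in range(200):
    probs = softmax(logits)
    i, j = sample(probs), sample(probs)
    if i == j:
        continue
    # Winner's logit goes up, loser's goes down: the policy shifts
    # probability mass toward responses the oracle prefers.
    winner, loser = (i, j) if oracle_prefers(i, j) else (j, i)
    logits[winner] += eta
    logits[loser] -= eta

final = softmax(logits)
best = max(range(len(responses)), key=lambda k: final[k])
```

Under this oracle the loop concentrates the policy on the most-preferred response, which is the fixed-point behavior the Nash-equilibrium framing aims for at scale.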
This session aired on September 3, 2024 at Microsoft Research Forum, Episode 4.
Register for the series: aka.ms/registerresearchforumYT...
Continue watching episode 4: aka.ms/researchforumYTe4
Explore all previous episodes: aka.ms/researchforumYTplaylist