Published on 14 May 2026, 17:06
Language models are usually trained to predict the next word, but that does not always lead to the best overall answers. We introduce energy-based fine-tuning, a new method that trains models to produce better full responses, leading to stronger results without the need for complex reward models or verifiers.
Project: energy-based-fine-tuning.githu...
Paper: arxiv.org/abs/2603.12248
GitHub: github.com/sjelassi/ebft_openr...
This session aired on May 14, 2026, at Microsoft Research Forum, Season 2 Episode 4.
Register for the series to hear about new releases: microsoft.com/en-us/research/e...
Explore all previous episodes: aka.ms/researchforumYTplaylist






















