Microsoft Research · 355K
Published June 3, 2025, 16:07
Researcher Jindong Wang and Associate Professor Steven Euijong Whang explore the NeurIPS 2024 work ERBench. ERBench leverages relational databases to create LLM benchmarks that can verify model rationale via keywords in addition to checking answer correctness.
Show notes: microsoft.com/en-us/research/p...
Listen to the Abstracts series: microsoft.com/en-us/research/p...