Microsoft Research
Published January 2, 2025, 18:38
As Retrieval-Augmented Generation (RAG) systems gain prominence for grounding large language models (LLMs) in external knowledge, constructing evaluation frameworks is critical to accelerating development across diverse languages. This talk introduces a comprehensive multilingual RAG evaluation pipeline comprising three key components: retrieval, relevance assessment, and generation. These components are covered by MIRACL, a multilingual retrieval dataset with high-quality relevance judgments annotated by native speakers; NoMIRACL, a benchmark for assessing relevance in multilingual RAG, designed to measure LLM robustness against retrieval errors; and MIRAGE-Bench, an arena-based multilingual RAG evaluation framework integrating heuristic metrics and surrogate judge models for multilingual generation evaluation. Together, these resources provide a foundation for advancing multilingual information access and enhancing the robustness of RAG systems. The talk highlights key findings from each component, open challenges, and future directions for multilingual RAG research.
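To make the three-stage pipeline concrete, below is a minimal, self-contained Python sketch of such an evaluation loop (retrieval, relevance assessment, generation), including a NoMIRACL-style robustness check that the generator abstains when none of the retrieved passages are relevant. All function names, example data, and metric names are illustrative assumptions for this sketch, not the actual interfaces of MIRACL, NoMIRACL, or MIRAGE-Bench.

```python
# Illustrative three-stage multilingual RAG evaluation loop:
# retrieval -> relevance assessment -> generation.
# Every function, example, and metric name here is a hypothetical placeholder,
# not the actual interface of MIRACL, NoMIRACL, or MIRAGE-Bench.
from dataclasses import dataclass

@dataclass
class Example:
    query: str
    passages: list[str]            # output of the retrieval stage
    reference_answer: str | None   # None means no relevant evidence exists

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Toy lexical retriever: rank passages by word overlap with the query."""
    scored = sorted(corpus,
                    key=lambda p: len(set(query.split()) & set(p.split())),
                    reverse=True)
    return scored[:k]

def assess_relevance(query: str, passage: str) -> bool:
    """Toy relevance judge; a real pipeline would use an LLM judge or human labels."""
    return len(set(query.split()) & set(passage.split())) >= 2

def generate(query: str, passages: list[str]) -> str:
    """Toy generator that abstains when no retrieved passage looks relevant."""
    relevant = [p for p in passages if assess_relevance(query, p)]
    if not relevant:
        return "I don't know"
    return relevant[0]   # stand-in for an LLM answer grounded in the evidence

def evaluate(examples: list[Example]) -> dict[str, float]:
    """Answer when relevant evidence exists; abstain (do not hallucinate) otherwise."""
    answered, abstained, n_pos, n_neg = 0, 0, 0, 0
    for ex in examples:
        answer = generate(ex.query, ex.passages)
        if ex.reference_answer is not None:
            n_pos += 1
            answered += int(answer != "I don't know")
        else:
            n_neg += 1
            abstained += int(answer == "I don't know")
    return {"answer_rate_on_relevant": answered / max(n_pos, 1),
            "abstain_rate_on_irrelevant": abstained / max(n_neg, 1)}

if __name__ == "__main__":
    corpus = ["Waterloo is a city in Ontario Canada",
              "RAG systems ground LLM answers in retrieved passages",
              "Bananas are rich in potassium"]
    examples = [
        Example("Where is Waterloo located",
                retrieve("Where is Waterloo located", corpus), "Ontario, Canada"),
        Example("What is the capital of Mars",
                retrieve("What is the capital of Mars", corpus), None),
    ]
    print(evaluate(examples))  # expect both rates to be 1.0 on this toy data
```

In a full pipeline, the toy retriever and judge would be replaced by a multilingual retriever evaluated on MIRACL-style relevance judgments, the abstention check corresponds to the robustness behavior NoMIRACL measures, and the generation scoring would be handled by heuristic metrics and surrogate judge models as in MIRAGE-Bench.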
Speaker: Nandan Thakur, University of Waterloo, Canada