Large-scale Retrieval with Ivory and MapReduce

Published August 17, 2016, 3:40
It is commonly acknowledged that web-scale collections have outgrown the capabilities of individual machines, so many problems in information retrieval now require clusters. The release of the 25-terabyte, billion-page ClueWeb09 collection in 2009 and the increasing popularity of Hadoop, the open-source implementation of the MapReduce distributed framework, have motivated academic researchers to think more seriously about cluster-based distributed retrieval solutions. In this talk, we will first introduce Ivory, an end-to-end open-source distributed retrieval system built at the University of Maryland, College Park; Ivory takes full advantage of Hadoop and its underlying distributed file system for both indexing and retrieval. We will then present an overview of several research projects that have evolved around Ivory: approximate positional indexing for efficient ranked retrieval, scalable monolingual and cross-lingual pairwise document similarity, and automatically extracted pseudo test collections for learning ranking functions for web search.