The Scalable Hyperlink Store

Published September 7, 2016, 16:02
This talk describes the Scalable Hyperlink Store (SHS), a specialized database that gives very fast access to the forward and backward links of very large web graphs. SHS has been designed to scale to the size of the current MSN Search corpus (about 5 billion crawled web pages and 250 billion hyperlinks) and to provide link access times in the microsecond range. I am currently exploring cost-efficient fault-tolerance schemes and ways to support incremental updates to the database. SHS provides infrastructure for conducting research on properties of the web graph and could be a useful tool for MSN Search. Most interestingly, it could enable a class of search result ranking algorithms known as query-dependent link-based ranking, which have been widely studied in the scientific literature but have not been deployed by major search engines. Our plans for the summer are to implement a variety of such algorithms on top of SHS and to measure their performance and effectiveness.
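To make the idea of forward and backward link access concrete, here is a minimal sketch in Python. It is not the SHS design or API: `TinyLinkStore`, `forward_links`, `backward_links`, and `hits` are hypothetical names, and the store is a toy in-memory dictionary with none of the scale, fault tolerance, or update support discussed in the talk. The ranking function is HITS, one well-known query-dependent link-based algorithm from the literature; the abstract does not say which algorithms were planned.

```python
from collections import defaultdict


class TinyLinkStore:
    """Toy in-memory link store (hypothetical; not the SHS interface)."""

    def __init__(self, edges):
        self._fwd = defaultdict(set)  # page -> pages it links to
        self._bwd = defaultdict(set)  # page -> pages linking to it
        for src, dst in edges:
            self._fwd[src].add(dst)
            self._bwd[dst].add(src)

    def forward_links(self, page):
        return sorted(self._fwd.get(page, ()))

    def backward_links(self, page):
        return sorted(self._bwd.get(page, ()))


def _normalize(scores):
    # Scale to unit Euclidean length; leave an all-zero vector unchanged.
    norm = sum(v * v for v in scores.values()) ** 0.5 or 1.0
    return {p: v / norm for p, v in scores.items()}


def hits(store, root_set, iterations=20):
    """Run HITS on the root set expanded by one hop in both directions."""
    base = set(root_set)
    for page in root_set:
        base.update(store.forward_links(page))
        base.update(store.backward_links(page))

    hub = {p: 1.0 for p in base}
    auth = {p: 1.0 for p in base}
    for _ in range(iterations):
        # Authority score: sum of hub scores of in-neighbors inside the base set.
        auth = _normalize({
            p: sum(hub[q] for q in store.backward_links(p) if q in base)
            for p in base
        })
        # Hub score: sum of authority scores of out-neighbors inside the base set.
        hub = _normalize({
            p: sum(auth[q] for q in store.forward_links(p) if q in base)
            for p in base
        })
    return hub, auth
```

In this sketch the root set stands in for the pages a search engine would return for a query. Because the neighborhood graph is built at query time, every query triggers many forward and backward link lookups, which is why fast link access matters for this class of algorithms.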