Rhea: Automatic IO Filtering for Optimizing Cloud Analytics

9
Опубликовано 18 августа 2016, 19:42
Hadoop is the de-facto standard for processing large datasets. In close collaboration with the Microsoft Hadoop team we are developing an Azure service to accelerate Hadoop jobs. We take advantage of the observation that many Hadoop jobs are very selective and operate on just a fraction of their input data. In this cross-group project we have used static analysis techniques to examine the map phase of a job and automatically extract a filter that identifies the interesting rows and columns of the input data. Instead of sending all data from the Azure storage to the compute cluster, we automatically identify and send only the subset of interest. Using our filters on some example jobs, we have reduced network overheads by a factor of 5, and job completion times by a factor of 3 to 4.
Случайные видео
122 дня – 5 032 6761:50
Voices of Galaxy: Mejdi Schalck | Samsung
24.12.20 – 23 12312:03
3x3 SEO tips for JavaScript web apps
автотехномузыкадетское