MADDER and Self-Tuning Data Analytics on Hadoop with Starfish

Microsoft Research

Published August 17, 2016, 3:09

Timely and cost-effective analytics over 'big data' is now a key ingredient for success in businesses and scientific disciplines. The Hadoop platform (consisting of an extensible MapReduce execution engine, pluggable distributed storage engines, and a range of procedural to declarative interfaces to express analysis tasks) is an emerging choice for big data analytics. Hadoop's out-of-the-box performance can be poor, causing suboptimal use of resources, time, and money (e.g., in pay-as-you-go clouds). Unfortunately, practitioners of big data analytics, such as business analysts, computational scientists, and researchers, often lack the expertise to tune the Hadoop platform for good performance. I will introduce Starfish, a self-tuning system for big data analytics. Starfish builds on Hadoop while adapting to system workloads and user needs to provide good performance automatically, without any need for users to understand and manipulate the many tuning knobs in the Hadoop platform. While Starfish's design is guided by work on self-tuning database systems, I will discuss how new analysis practices over big data (dubbed the MADDER principles) pose new challenges, leading us to different design choices in Starfish. Starfish is under active development and is available at: cs.duke.edu/starfish
