Accelerating Advanced Analytics [1/4]

Published August 11, 2016, 8:11
Advanced analytics, the analysis of large and complex data using machine learning (ML), is becoming ubiquitous, and demand for advanced analytics tools is growing across enterprise domains. However, the end-to-end process of building and deploying advanced analytics applications faces several challenging bottlenecks. My research focuses on abstractions, algorithms, and systems that mitigate such bottlenecks and accelerate advanced analytics from a data management standpoint. In this talk, I will focus on my work on one such pervasive bottleneck in feature engineering for ML: joins of multiple tables. Many real-world datasets are multi-table, connected by key-foreign key relationships, but almost all ML toolkits expect single-table inputs. This forces data scientists to join all the tables and materialize a single table that collects all the features. Alas, such joins often blow up the size of the output, which slows down ML, increases costs, and leads to data maintenance headaches. In my work, I show how it is possible to mitigate these issues by avoiding
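To make the size blow-up from a key-foreign key join concrete, here is a minimal Python sketch. The schema (a `customers` fact table with a foreign key into a `stores` dimension table), the table sizes, and the `kfk_join` and `cell_count` helpers are hypothetical illustrations, not taken from the talk. The sketch only measures the cost of materializing the join; it does not implement the talk's technique for avoiding it.

```python
from typing import Dict, List, Tuple


def kfk_join(fact: List[Tuple], dim: Dict[int, Tuple], fk_index: int) -> List[Tuple]:
    """Materialize fact JOIN dim on fact[fk_index] = dim key (hypothetical helper)."""
    joined = []
    for row in fact:
        # Key-foreign key join: each fact row matches exactly one dimension row,
        # so that row's features are copied into every fact row that references it.
        joined.append(row + dim[row[fk_index]])
    return joined


def cell_count(rows: List[Tuple]) -> int:
    """Total number of stored values across all rows."""
    return sum(len(r) for r in rows)


# Hypothetical data: 1,000 customers, each referencing one of 10 stores;
# each store carries 50 numeric features.
n_customers, n_stores, n_store_feats = 1000, 10, 50
stores = {s: tuple(float(s) for _ in range(n_store_feats)) for s in range(n_stores)}
customers = [(c, 2.0 * c, c % n_stores) for c in range(n_customers)]  # (id, feature, fk)

joined = kfk_join(customers, stores, fk_index=2)

# Normalized storage: customer rows plus store rows (key + features).
normalized_cells = cell_count(customers) + n_stores * (1 + n_store_feats)
# Materialized single table: every customer row carries all 50 store features.
joined_cells = cell_count(joined)

print(normalized_cells, joined_cells)  # prints "3510 53000"
```

Under these assumed sizes, the materialized table stores roughly 15 times as many values as the two normalized tables, because the 50 store features are repeated for every customer that references the same store.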