Enabling Integrated Search and Exploration Over Large Multidimensional Data

Published August 11, 2016, 8:09
The need for rich, ad-hoc data analysis is key for pervasive discovery. However, generic and reusable systems tools for interactive search, exploration and mining over large data sets are lacking. Exploring large data sets interactively requires advanced data-driven search techniques that go well beyond the conventional database querying capabilities, whereas state-of-the-art search technologies are not designed and optimized to work for large out-of-core data sets. These requirements force users to roll their own custom solutions, typically by gluing together existing libraries, databases and custom scripts, only to end up with a solution that is difficult to develop, scale, optimize, maintain and reuse. To address these limitations, we propose a tight integration of data management and search technologies. This combination would not only allow users to perform search efficiently, but also offer a single, expressive framework that can support a wide variety of data-intensive search and exploration tasks. As the first step in this direction, we describe a custom search framework called Semantic Windows, which allows users to conveniently perform structured search via shape and content constraints over a multidimensional data space. As the second step, we describe a general-purpose exploration framework called Searchlight, which allows Constraint Programming (CP) machinery to run efficiently inside a Database Management System (DBMS) without the need to extract, transform and move the data. This marriage concurrently offers the rich expressiveness and efficiency of constraint-based search and optimization provided by modern CP solvers, and the ability of DBMSs to store and query data at scale, resulting in an enriched functionality that can effectively support both data- and search-intensive applications. 
As such, Searchlight is the first system to support generic search, exploration and mining over large multidimensional data collections, going beyond one-off algorithms each designed for a single search or mining task.
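
To make the idea of searching with shape and content constraints concrete, here is a minimal sketch in Python. It is not the Semantic Windows or Searchlight implementation, and every name in it (build_prefix_sums, find_windows, min_avg) is hypothetical. It assumes a small in-memory 2-D grid, treats the allowed window heights and widths as the shape constraint and a minimum average cell value as the content constraint, and simply enumerates all candidate windows; the systems described above target out-of-core data inside a DBMS, which this toy example does not attempt.

```python
def build_prefix_sums(grid):
    """Return a summed-area table with a zero border (one extra row and column)."""
    rows, cols = len(grid), len(grid[0])
    sat = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            sat[r + 1][c + 1] = (
                grid[r][c] + sat[r][c + 1] + sat[r + 1][c] - sat[r][c]
            )
    return sat


def find_windows(grid, min_shape, max_shape, min_avg):
    """Enumerate rectangular windows (row, col, height, width) that satisfy:
    shape constraint:   min_shape <= (height, width) <= max_shape
    content constraint: average cell value >= min_avg
    This is a brute-force illustration, not an optimized search strategy.
    """
    rows, cols = len(grid), len(grid[0])
    sat = build_prefix_sums(grid)
    results = []
    for h in range(min_shape[0], max_shape[0] + 1):
        for w in range(min_shape[1], max_shape[1] + 1):
            for r in range(rows - h + 1):
                for c in range(cols - w + 1):
                    # Window sum in O(1) from the summed-area table.
                    total = (
                        sat[r + h][c + w] - sat[r][c + w]
                        - sat[r + h][c] + sat[r][c]
                    )
                    if total / (h * w) >= min_avg:
                        results.append((r, c, h, w))
    return results


# Hypothetical toy data: a bright 2x2 region inside a dim 4x4 grid.
grid = [
    [1, 1, 1, 1],
    [1, 9, 9, 1],
    [1, 9, 9, 1],
    [1, 1, 1, 1],
]
matches = find_windows(grid, min_shape=(2, 2), max_shape=(2, 2), min_avg=9)
print(matches)
```

Exhaustive enumeration like this grows quickly with the size of the data and the range of allowed shapes, which is the kind of cost that motivates integrating the search with the data management layer rather than scripting it on top.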