High-Throughput Data-Intensive Computing: Shared-Scan Scheduling in Scientific Databases & the Cloud

Published August 17, 2016, 2:22
Data-intensive computing consists of batch-processing workloads that scan massive data sets in parallel. Because these workloads are dominated by data access, data movement, data ingest, and data production, they overwhelm the network and I/O capabilities of data centers and supercomputers. Major throughput improvements are available by co-scheduling tasks that access the same data, so that multiple tasks complete their processing while the data is read and transferred only once. Co-scheduled tasks share I/O, network data transfer, cache space, and even computation via SIMD or vector processing. This talk reviews the evolution of co-scheduling in data-intensive computing systems, including shared-scan scheduling for map/reduce workloads (Agrawal et al., VLDB 2008), data-driven batch processing for scientific databases (LifeRaft and JAWS), shared streaming I/O for spatial workloads, and shared join processing for Pig programs and Nova workflows.
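The core idea of the abstract can be illustrated with a minimal sketch: group pending tasks by the data set they scan, read each data set once, and hand every chunk to all tasks that need it, amortizing I/O across the batch. This is a hypothetical illustration, not the LifeRaft/JAWS or map/reduce implementation discussed in the talk; the function names and data are invented for the example.

```python
# Hypothetical sketch of shared-scan scheduling (not the systems from the talk).
from collections import defaultdict

def shared_scan(tasks, read_chunks):
    """tasks: list of (dataset_id, process_fn) pairs.
    read_chunks: callable mapping a dataset_id to an iterable of chunks."""
    # Group tasks by the data set they need, so each set is scanned once.
    by_dataset = defaultdict(list)
    for dataset_id, fn in tasks:
        by_dataset[dataset_id].append(fn)

    results = defaultdict(list)
    for dataset_id, fns in by_dataset.items():
        for chunk in read_chunks(dataset_id):   # single pass over the data
            for fn in fns:                      # every co-scheduled task sees the chunk
                results[fn].append(fn(chunk))
    return results

# Example: two tasks over the same data set share one scan.
data = {"survey": [1, 2, 3, 4]}
total = lambda c: c
square = lambda c: c * c
out = shared_scan([("survey", total), ("survey", square)],
                  lambda ds: iter(data[ds]))
```

Without co-scheduling, each task would trigger its own pass over `data["survey"]`; here both tasks complete after one traversal, which is the source of the throughput gains the talk describes.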