Cooperative Data and Computation Partitioning for Distributed Architectures

Published September 6, 2016, 16:26
The recent design shift toward chip multiprocessors has spawned a significant amount of research in program parallelization. Future performance gains will require programmer and compiler intervention to increase the amount of exploitable parallel work. The coming abundance of cores on a single chip offers many ways to exploit the underlying parallel resources. Much of the recent work in this area has focused on coarse-grain parallelization by the programmer through new programming models, such as transactional memory, and on different ways to exploit thread- and data-level parallelism.

In this talk, I will focus on a different angle for increasing the parallelism available to the cores: compiler techniques to detect and exploit fine-grain parallelism. These techniques create fine-grain threads by partitioning code, at the granularity of individual operations, and data across multiple cores and caches. First, I will present a profile-guided method for partitioning memory accesses that intelligently disperses data across multiple caches to minimize coherence traffic while balancing the working-set demands on each cache. Next, I will describe a method for partitioning a program's computation across multiple cores that creates a set of fine-grain threads which communicate scalar values directly. Finally, I will show how these methods combine synergistically and discuss future directions in this area.
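To make the first idea concrete, here is a minimal sketch of what a profile-guided data partitioner might look like. This is not the method from the talk; it is a hypothetical greedy heuristic under two assumptions stated in the abstract: each data object should live in the cache whose core accesses it most (minimizing coherence traffic), subject to keeping working-set sizes roughly balanced. The function name, profile format, and `slack` parameter are all illustrative.

```python
# Hypothetical sketch of profile-guided data partitioning.
# access_profile maps each data object to a per-core access count
# (gathered by profiling); sizes gives each object's footprint in bytes.
# Objects are placed in the cache of their most frequent accessor,
# unless that cache's working set is already far above average.

def partition_data(access_profile, num_caches, sizes, slack=1.5):
    """Return a {object: cache_index} placement."""
    load = [0] * num_caches          # bytes currently assigned per cache
    placement = {}
    # Place the hottest objects first so they get their preferred cache.
    hot_first = sorted(access_profile, key=lambda o: -sum(access_profile[o]))
    for obj in hot_first:
        counts = access_profile[obj]
        placed = False
        # Try caches in decreasing order of access frequency from their core.
        for cache in sorted(range(num_caches), key=lambda c: -counts[c]):
            avg = (sum(load) + sizes[obj]) / num_caches
            # Accept the cache only if it stays within `slack` of the
            # average working-set size (the balancing constraint).
            if load[cache] + sizes[obj] <= slack * max(avg, sizes[obj]):
                placement[obj] = cache
                load[cache] += sizes[obj]
                placed = True
                break
        if not placed:
            # Fall back to the least-loaded cache.
            cache = min(range(num_caches), key=lambda c: load[c])
            placement[obj] = cache
            load[cache] += sizes[obj]
    return placement
```

With a profile where object `a` is touched only by core 0 and `b` only by core 1, the heuristic keeps each object next to its accessor, so no coherence traffic crosses caches; the `slack` bound is what forces a hot cache to spill objects elsewhere when its working set grows too large.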