Candidate talk: On the Evaluation and Extraction of Thread-Level Parallelism in Ordinary Programs

Published September 6, 2016, 16:47
Multi-core systems such as IBM's Cell, Intel's Core 2 Duo and AMD's Barcelona are becoming ubiquitous. How efficiently the hardware parallelism of such systems can be exploited depends directly on the degree of program parallelization (or multithreading). Although a large amount of work has been done on multithreading, the lack of a rigorous performance evaluation methodology and of detailed application characterization makes it difficult to assess the performance gain achievable with existing techniques. As a first step, we present a rigorous performance evaluation methodology that establishes a solid baseline for the performance gain achievable via program parallelization at the loop level. Specifically, we determine the coverage of loops, defined as the percentage of total execution time spent in them, in ordinary programs as represented by the industry-standard SPEC CPU 2000/2006 (both integer and floating point) and EEMBC benchmarks. Subsequently, we dissect the loop coverage into two categories: inherently parallel (DOALL) and potentially parallel (non-DOALL) loops. The coverage of each category establishes an upper bound on the performance gain that any technique can achieve for that category. Next, we propose novel techniques for thread synchronization in non-DOALL loops and for load balancing in DOALL loops. We evaluate these techniques on applications from the SPEC CPU and OMPM benchmark suites and achieve better performance, on real machines, than the state of the art.
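To make the link between coverage and the performance bound concrete, here is a minimal sketch. It assumes the bound is read in the sense of Amdahl's law, which the abstract does not name explicitly, and it uses made-up profile numbers. The names `LoopProfile`, `coverage` and `amdahl_bound` are hypothetical and are not taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class LoopProfile:
    name: str       # hypothetical label for a profiled loop
    seconds: float  # time attributed to the loop (assumed non-overlapping)
    doall: bool     # True if iterations have no cross-iteration dependences


def coverage(loops, total_seconds, doall):
    """Percentage of total execution time spent in loops of one category."""
    spent = sum(loop.seconds for loop in loops if loop.doall == doall)
    return 100.0 * spent / total_seconds


def amdahl_bound(coverage_pct, cores=None):
    """Upper bound on whole-program speedup if only the covered part speeds up.

    Assumption: Amdahl's law. With cores=None the covered part is treated
    as taking zero time, which gives the limit for unlimited cores.
    """
    f = coverage_pct / 100.0
    if cores is None:
        return float("inf") if f >= 1.0 else 1.0 / (1.0 - f)
    return 1.0 / ((1.0 - f) + f / cores)


# Illustrative numbers only, not measurements from the talk.
profile = [
    LoopProfile("loop_a", 30.0, doall=True),
    LoopProfile("loop_b", 50.0, doall=False),
]
total = 100.0  # total program run time in seconds

doall_cov = coverage(profile, total, doall=True)       # 30.0
non_doall_cov = coverage(profile, total, doall=False)  # 50.0
print(doall_cov, amdahl_bound(doall_cov))
print(non_doall_cov, amdahl_bound(non_doall_cov, cores=4))
```

Under these assumptions, a category that covers 50% of the run time can never yield more than a 2x whole-program speedup, however good the parallelization technique is. This is why the coverage numbers serve as a baseline against which individual techniques can be judged.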