Correctness Checking Concepts and Tools for HPC: Call for Action

Published June 27, 2016, 19:42
Today's high-performance computing story is one in which science and engineering problems of ever-increasing scale must be solved under strict power budgets. This necessitates the use of heterogeneous computing elements (e.g., CPUs and GPUs) and is causing significant shifts in how established programming APIs are used (e.g., MPI mixed with OpenMP and CUDA). In addition to detecting defects such as data races and deadlocks in this context, a designer increasingly worries about emerging issues such as resilience, floating-point precision, and even the ability to replay executions. My talk will first give a broad overview of our efforts directed at these problems. It will then focus on our tool GKLEE, which helps locate data races in non-trivial CUDA kernels. I will close with two topics: (1) how the same kinds of concurrency errors pertaining to memory orderings keep being repeated, and (2) the hope that by emphasizing correctness checking in basic concurrency courses (in addition to the usual fixation on performance tuning), we might reduce these frequently committed mistakes.