Unearthing Concurrency Bugs in Cloud-Scale Distributed Systems

Published April 19, 2017, 23:17
Users demand 24/7 dependability from cloud services. Falling short of it is costly, yet reaching ideal dependability poses complex challenges. Behind cloud computing is a collection of hundreds of complex systems, written in millions of lines of code, that are brittle and prone to failure. In this talk, I discuss one of the unsolved problems in distributed systems: distributed concurrency bugs. These bugs are caused by nondeterministic orderings of distributed events such as message arrivals, crashes, and reboots. I present insights gained from our bug study, which can help much research on combating such bugs. I also present my effort to advance distributed system model checkers to unearth hidden bugs in systems. I propose a principle of semantic awareness to tackle the major problem of model checkers, state space explosion. In this work, I show that leveraging semantic knowledge of the system under test can help a model checker find bugs 2x to 340x faster than the state of the art.

See more on this video at microsoft.com/en-us/research/v...