What New Bugs Live in the Cloud? (and How to Exterminate Them)

Published June 29, 2016, 22:48
As more data and computation move from local to cloud settings, datacenter distributed systems have become a dominant backbone for many modern applications. However, the complexity of the cloud hardware and software ecosystem has outpaced existing testing, debugging, and verification tools. In this talk, I will describe three new classes of bugs that appear in cloud-scale distributed systems: distributed concurrency bugs (with multiple failures), scalability bugs, and non-deterministic performance bugs. (1) A distributed concurrency bug is a concurrency bug in a distributed system that is caused by distributed events (message arrivals, local computation, faults and reboots) that can occur in a non-deterministic order. (2) A scalability bug is a latent, scale-dependent bug that typically surfaces in large-scale deployments (100+ nodes) but does not necessarily appear in small or medium-scale deployments. (3) A non-deterministic performance bug is a performance fault that appears only in specific topological scenarios (e.g., particular task placements and locations of slow hardware). I will present our work on combating these new classes of bugs, including semantic-aware model checking (SAMC), a taxonomy of distributed concurrency bugs (TaxDC), scalability checks (SCk), performance verification (SPV), and path-based speculative execution (PBSE).
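To make definition (1) concrete, here is a minimal sketch in Python built on invented assumptions: a hypothetical two-phase participant that keeps its PREPARE flag only in memory, three events (`prepare`, `commit`, `crash_reboot`), and a single causal constraint that COMMIT is sent only after PREPARE. The protocol, event names, and helper functions are all illustrative, not taken from the talk. The code naively replays every causally valid ordering. This is brute-force exploration, not SAMC: as its name suggests, semantic-aware model checking uses knowledge of the protocol to avoid exploring redundant interleavings, whereas this toy tries all of them.

```python
from itertools import permutations

# Hypothetical toy protocol (illustration only): a participant receives
# PREPARE then COMMIT from a coordinator and may crash and reboot at any point.
EVENTS = ("prepare", "commit", "crash_reboot")

# Happens-before constraint: the coordinator only sends COMMIT after PREPARE.
HAPPENS_BEFORE = [("prepare", "commit")]


def respects_causality(order):
    """True if, for every (a, b) in HAPPENS_BEFORE, a occurs before b."""
    return all(order.index(a) < order.index(b) for a, b in HAPPENS_BEFORE)


def run(order):
    """Replay one ordering; return None on success or an error description."""
    volatile = {}  # lost on crash; the injected bug: 'prepared' is never persisted
    durable = {}   # survives crash
    for event in order:
        if event == "prepare":
            volatile["prepared"] = True
        elif event == "crash_reboot":
            volatile.clear()
        elif event == "commit":
            if not volatile.get("prepared"):
                return "COMMIT arrived but PREPARE state was lost"
            durable["committed"] = True
    return None


def explore():
    """Exhaustively replay every causally valid ordering of EVENTS."""
    results = {}
    for order in permutations(EVENTS):
        if respects_causality(order):
            results[order] = run(order)
    return results


if __name__ == "__main__":
    for order, error in explore().items():
        print(" -> ".join(order), ":", error or "ok")
```

Of the three causally valid orderings, only `prepare -> crash_reboot -> commit` fails, which illustrates why such bugs slip past tests that exercise only one or two schedules. The number of orderings grows factorially with the number of events, which is why naive enumeration stops scaling and smarter state-space reduction is needed.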