Towards Reliable Storage Systems

Published September 6, 2016, 18:35
Three trends will dominate the storage systems of tomorrow: users are storing ever larger amounts of data, storage software is growing more complex, and cheap, less reliable hardware is increasingly common. These trends pose a serious challenge: how can we promise users that storage systems work robustly despite their massive software complexity and the broad range of disk failures that can arise? Unfortunately, current approaches implement recovery in thousands of lines of intricate, low-level C code scattered throughout the system. As a result, current storage systems are not reliable. In this talk, I will present how we build a new generation of more robust and reliable storage systems by adhering to the idea that complexity is the enemy of reliability. Specifically, I will present two reliability frameworks, one online (I/O Shepherding) and one offline (SQCK), that advocate a higher-level strategy in which the logic of reliability policies can be described clearly and concisely. With I/O Shepherding, file system administrators can write disk-failure policies (such as retry, parity, mirrors, checksums, sanity checks, and data structure repairs) in a few lines of code in a single place. With SQCK, file system developers can separate the logic of hundreds of data structure repairs from their low-level implementation. I will also discuss other findings that show why storage system reliability is difficult to achieve with current approaches.
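To make the idea of a disk-failure policy written "in a few lines of code in a single place" more concrete, here is a minimal Python sketch. It is not the I/O Shepherding API: `FlakyDisk`, `DiskError`, and `read_policy` are hypothetical names invented for illustration, the disk is a toy in-memory model, and CRC32 stands in for whatever checksum a real system would use. The sketch only shows how retry, checksum verification, and mirror fallback might be composed in one function.

```python
import zlib


class DiskError(Exception):
    """Raised by the hypothetical block device when a read fails."""


class FlakyDisk:
    """Toy in-memory disk (an assumption for this sketch, not a real driver).

    The first `fail_reads` calls to read() raise DiskError, which models
    transient failures. Stored blocks may also be silently corrupt.
    """

    def __init__(self, blocks, fail_reads=0):
        self.blocks = dict(blocks)
        self.fail_reads = fail_reads

    def read(self, block_no):
        if self.fail_reads > 0:
            self.fail_reads -= 1
            raise DiskError(f"transient read failure on block {block_no}")
        return self.blocks[block_no]


def read_policy(block_no, primary, mirror, checksums, retries=2):
    """Whole read-failure policy in one place: retry, checksum, mirror."""
    for disk in (primary, mirror):
        for _ in range(retries + 1):
            try:
                data = disk.read(block_no)
            except DiskError:
                continue  # possibly transient: retry the same disk
            if zlib.crc32(data) == checksums[block_no]:
                return data  # checksum matches: the block is good
            break  # corruption is not transient: move to the next replica
    raise DiskError(f"block {block_no} is unrecoverable")


if __name__ == "__main__":
    good = b"superblock contents"
    checksums = {7: zlib.crc32(good)}
    primary = FlakyDisk({7: b"superblock c0ntents"})  # silently corrupt copy
    mirror = FlakyDisk({7: good}, fail_reads=1)       # one transient failure
    print(read_policy(7, primary, mirror, checksums))
```

Because the policy lives in one function, changing it (for example, raising `retries` or adding a parity reconstruction step before giving up) means editing one place rather than recovery code scattered across the file system.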