Microsoft Research · 334K
Published September 8, 2016, 19:07
Redundancy through state replication is the primary mechanism for achieving fault tolerance in distributed systems. State machine replication (SMR) is used extensively both within datacenters, where machine failures are common and must be tolerated, and in the wide area, to keep data close to all the clients that access it and to guard against data loss and service unavailability caused by datacenter outages. Today, the SMR protocol of choice in systems where performance and availability are critical is Paxos. Paxos does not depend on external failure detectors or reconfiguration services to tolerate the failure of a minority of replicas, and therefore, in theory, systems using Paxos have high availability. However, because of the need to optimize for high performance, the elegance of the core protocol does not fully extend to practical implementations. This work aims to put practical SMR implementation aspects on firm theoretical ground, and thus to enable SMR designs that achieve high throughput through near-perfect load balancing, near-optimal request processing latency (especially in the wide area), and robust performance when confronted with failures and slow replicas. The talk will focus on Egalitarian Paxos (EPaxos), a new variant of the Paxos protocol. In EPaxos, all replicas perform the same functions simultaneously to ensure load balancing, constant availability, and low commit latency. We will also show the benefits of in-depth exploration of other aspects of state machine replication, such as time leases, that have heretofore belonged only to the realm of practical optimizations.
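To make the "failure of a minority of replicas" claim concrete, here is a minimal sketch of the majority-quorum arithmetic that classic Paxos relies on. The function names `majority_quorum` and `tolerated_failures` are hypothetical, chosen for illustration only; the sketch covers simple majority quorums and does not model the quorum rules specific to EPaxos.

```python
def majority_quorum(n: int) -> int:
    """Smallest number of replicas that forms a strict majority of n.

    Hypothetical helper for illustration; not taken from the talk.
    """
    if n < 1:
        raise ValueError("need at least one replica")
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Largest number of crashed replicas that still leaves a majority alive."""
    return n - majority_quorum(n)


if __name__ == "__main__":
    # Columns: replicas, majority quorum size, tolerated failures
    for n in (3, 5, 7):
        print(n, majority_quorum(n), tolerated_failures(n))
```

Under these assumptions, a 5-replica group needs 3 replicas to agree and keeps making progress with 2 replicas down. Going from 3 to 4 replicas adds no fault tolerance, which is one reason odd group sizes are common.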