Microsoft Research333 тыс
Опубликовано 19 августа 2016, 19:14
For many problems, algorithms implemented in MapReduce or similar two-ranks-of-parallel-tasks systems exhibit a tradeoff between memory size and communication. More precisely, the tradeoff is between "reducer size" (the number of inputs received by the second rank of parallel tasks) and the "replication rate" (the average number of key-value pairs generated by the first rank in response to a single input). We begin with the simple but common "all-pairs" problem, where there is some output associated with every pair of inputs. For this problem, the reducer size and replication rate are seen to vary inversely. We then look at the different relationships that exist between these two parameters for a number of other problems, including (dense) matrix multiplication, similarity joins, and computing marginals
Свежие видео