Atrax, a distributed web crawler

Published September 9, 2016, 22:15
This talk describes Atrax, a very fast distributed web crawler. Running on a cluster of four DS20E Alpha servers, Atrax saturates our internet connection. During a recent crawl, it sustained a download rate of about 115 Mbit/s, or about 50 million web pages per day. Atrax has been used to collect the raw data for numerous web studies performed at Compaq Research.
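As a rough sanity check on the two throughput figures quoted above, the arithmetic can be sketched as follows. This is only a back-of-the-envelope estimate, not something stated in the talk: it assumes decimal megabits (1 Mbit = 1,000,000 bits), that the full 115 Mbit/s was spent on page content, and that the rate held for a full 24 hours. The function name `implied_avg_page_bytes` is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def implied_avg_page_bytes(mbits_per_sec, pages_per_day):
    """Average bytes per page implied by a link rate and a daily page count.

    Assumes decimal megabits and that all bandwidth carries page content.
    """
    bits_per_day = mbits_per_sec * 1_000_000 * SECONDS_PER_DAY
    bytes_per_day = bits_per_day / 8
    return bytes_per_day / pages_per_day


# Figures from the abstract: about 115 Mbit/s and about 50 million pages/day.
avg = implied_avg_page_bytes(115, 50_000_000)
print(f"Implied average page size: {avg / 1000:.1f} kB")
```

Under these assumptions the two figures imply an average of roughly 25 kB transferred per page, so they are consistent with each other to within the rounding implied by "about".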