Microsoft Research334 тыс
Следующее
Опубликовано 8 июля 2016, 0:58
CTR (Click Through Rate) prediction is extremely important for Internet advertisements. Data of users' impression and click logs possess two major challenges. First, the collected data set in just a few days contains billions or more instances. Second, the number of positive data (i.e., clicks) is relatively small, so the data set is highly unbalanced. We develop a distributed Newton method for training very large-sale logistic regression. We use real data to analyze the scalability of our method, the relationship between test accuracy and data size, the workflow of big-data experiments, and the various tools for implementing big-data machine learning packages.