Creating Diverse Ensemble Classifiers to Reduce Supervision

Published September 6, 2016, 5:47
For many predictive modeling tasks, acquiring supervised training data for building accurate classifiers is difficult or expensive. Training data may be limited, or additional data may be obtainable only at a cost. We study the problem of learning with reduced supervision in three settings. First, the pure supervised learning setting, where we try to maximize the utility of small datasets. Second, the traditional active learning setting, where a large pool of unlabeled examples is available and the learner can select training examples to be labeled. Third, the setting of active feature-value acquisition, where the data contain missing feature values that may be acquired at a cost. For these settings, we present methods that learn more accurate models at lower data-acquisition costs. Our methods are based on a new technique for building a diverse ensemble of classifiers using specially constructed artificial training examples. Experiments demonstrate that our method, DECORATE, performs consistently better than bagging, boosting, and Random Forests when training data is limited. We also show that DECORATE can be very effective for the tasks of active learning and active feature-value acquisition.
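The core idea, building a diverse ensemble from artificial training examples, can be sketched as follows. This is a minimal illustration, not the paper's exact procedure: it assumes scikit-learn decision trees as base learners, a simple per-feature Gaussian model for generating artificial examples, and a simplified labeling rule (label artificial points against the current ensemble's prediction). The names `decorate` and `majority_vote` are ours.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def majority_vote(ensemble, X, classes):
    """Predict by majority vote over ensemble members."""
    votes = np.stack([m.predict(X) for m in ensemble])
    return np.array([
        np.bincount(col.astype(int), minlength=len(classes)).argmax()
        for col in votes.T
    ])


def decorate(X, y, n_members=5, n_artificial=20, max_tries=15, rng=None):
    """Sketch of a DECORATE-style loop: each new member is trained on the
    real data plus artificial examples labeled *against* the current
    ensemble, forcing the member to disagree with the ensemble elsewhere
    in input space while still fitting the real data."""
    rng = np.random.default_rng(rng)
    classes = np.unique(y)
    ensemble = [DecisionTreeClassifier(random_state=0).fit(X, y)]
    base_err = np.mean(majority_vote(ensemble, X, classes) != y)

    tries = 0
    while len(ensemble) < n_members and tries < max_tries:
        tries += 1
        # Draw artificial examples from a per-feature Gaussian fit to the data
        # (a stand-in for the paper's data model).
        Xa = rng.normal(X.mean(axis=0), X.std(axis=0) + 1e-9,
                        size=(n_artificial, X.shape[1]))
        # Label them contrary to the current ensemble's prediction.
        pred = majority_vote(ensemble, Xa, classes)
        ya = np.array([rng.choice(classes[classes != p]) for p in pred])
        # Train a candidate member on real + artificial data.
        candidate = DecisionTreeClassifier(random_state=tries).fit(
            np.vstack([X, Xa]), np.concatenate([y, ya]))
        # Keep the candidate only if training error does not increase.
        trial = ensemble + [candidate]
        err = np.mean(majority_vote(trial, X, classes) != y)
        if err <= base_err:
            ensemble = trial
            base_err = err
    return ensemble
```

The rejection step is what keeps the ensemble accurate: diversity is injected through the artificially mislabeled points, but a member is admitted only if the committee's training error does not rise.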