Microsoft Research
Published June 22, 2016, 19:34
Machine learning with text data can be very useful for social network analytics, for instance to perform sentiment analysis. Extracting a "machine learnable" representation from raw text is an art in itself. In this session we will introduce the bag-of-words representation and its implementation in scikit-learn via its text vectorizers. We will discuss preprocessing with NLTK, n-gram extraction, TF-IDF weighting, and the use of SciPy sparse matrices. Finally, we will use that data to train and evaluate a Naive Bayes classifier and a linear Support Vector Machine.
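To make the bag-of-words idea concrete, here is a minimal plain-Python sketch of the representation described above: tokenize, extract unigrams and bigrams, count them, then apply TF-IDF weighting. It is an illustration only, not the code from the session and not scikit-learn's implementation. The function names, the sample sentences, the regex tokenizer (a crude stand-in for NLTK preprocessing), and the `log(N / df) + 1` weighting variant are all assumptions chosen for brevity. Rows are stored as `{column_index: value}` dictionaries, which mimics the idea behind sparse matrices: only non-zero entries are kept.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters, digits and apostrophes.
    # Assumption: a crude stand-in for real preprocessing (e.g. with NLTK).
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n_max=2):
    # Return all 1-grams up to n_max-grams, each joined with a space.
    out = []
    for n in range(1, n_max + 1):
        out.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return out


def bag_of_words(docs, n_max=2):
    # Build a vocabulary (term -> column index) and one sparse row of
    # raw counts per document, stored as {column index: count}.
    vocab = {}
    rows = []
    for doc in docs:
        term_counts = Counter(ngrams(tokenize(doc), n_max))
        row = {}
        for term, count in term_counts.items():
            idx = vocab.setdefault(term, len(vocab))
            row[idx] = count
        rows.append(row)
    return vocab, rows


def tfidf(rows):
    # Weight each count by an inverse document frequency, then scale every
    # row to unit Euclidean length.
    # Assumption: idf = log(N / df) + 1, one common textbook variant.
    n_docs = len(rows)
    df = Counter(idx for row in rows for idx in row)
    weighted = []
    for row in rows:
        w = {idx: tf * (math.log(n_docs / df[idx]) + 1.0) for idx, tf in row.items()}
        norm = math.sqrt(sum(v * v for v in w.values())) or 1.0
        weighted.append({idx: v / norm for idx, v in w.items()})
    return weighted


# Hypothetical toy corpus, invented for this sketch.
docs = ["I love this movie", "I hate this movie", "love love love"]
vocab, counts = bag_of_words(docs)
weights = tfidf(counts)
```

In this sketch, a term that appears in only one document (such as "hate") receives a larger weight than an equally frequent term shared by several documents (such as "movie"), which is the point of TF-IDF weighting. The resulting rows could then be fed to a classifier such as Naive Bayes or a linear SVM; that step is not shown here.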