Bayesian topic models

Published September 6, 2016, 16:30
Electronic documents provide vast amounts of information, but need to be organized in a way that lets people use that information. Topic models provide one way of approaching this problem, automatically identifying the topics that appear in a collection of documents, and indicating the extent to which each document reflects each topic. I will summarize the basic ideas behind one such model, Latent Dirichlet Allocation (Blei, Ng, & Jordan, 2003), and use this model to describe how tools from Bayesian statistics can be useful in statistical natural language processing. In particular, I will describe a simple algorithm for identifying topics from documents, based on Markov chain Monte Carlo, and show how this simple topic model can be extended to incorporate syntax, model the interests of authors, infer topic hierarchies, and pick out topically coherent segments of dialogue.
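
To make the Markov chain Monte Carlo idea concrete, here is a minimal sketch of one widely used scheme for Latent Dirichlet Allocation, collapsed Gibbs sampling. The abstract does not say which sampler the talk uses, so this choice is an assumption. The function name `lda_gibbs`, the hyperparameter defaults (`alpha`, `beta`, `iters`), and the toy corpus are likewise illustrative, not taken from the talk. Each word token is repeatedly reassigned to a topic with probability proportional to (count of that topic in the document + alpha) × (count of the word in that topic + beta) / (total words in that topic + vocabulary size × beta).

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampler for LDA (illustrative sketch).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (doc_topic, topic_word) count tables.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                # tokens in doc d assigned to topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # times word w is assigned to topic k
    topic_total = [0] * num_topics                              # total tokens assigned to topic k
    assign = []                                                 # current topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1

                # Unnormalised conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][w] + beta)
                    / (topic_total[j] + vocab_size * beta)
                    for j in range(num_topics)
                ]

                # Sample a new topic and add the token back into the counts.
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return doc_topic, topic_word


# Toy corpus (hypothetical): word ids 0-1 and 2-3 tend to co-occur.
docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 1, 2, 3]]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2, vocab_size=4)
```

After sampling, normalising each row of `doc_topic` (plus `alpha`) gives an estimate of how strongly each document reflects each topic, and normalising each row of `topic_word` (plus `beta`) gives each topic's distribution over words.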