Adding Domain Knowledge to Latent Topic Models

Published August 17, 2016, 2:31
Around the turn of the century, a favorite pastime in machine learning was to inject various forms of domain knowledge into clustering. Examples include must-links, where two items must be placed in the same cluster, and cannot-links, where they must not. Collectively known as constrained clustering, these methods produced clusters that were more relevant to domain experts. Fast forward a decade, and a new favorite pastime is to inject various forms of domain knowledge into Latent Dirichlet Allocation. The goal is to constrain the latent topic assignment of each word, so that latent topic modeling is informed by both data and domain knowledge, and the resulting topics are more relevant to domain experts. We present several examples from our group's work, ranging from simple topic-in-set knowledge, where the latent topic of a word is constrained to a small set of candidate topics; to the Dirichlet Forest prior, which allows must-links and cannot-links on topics while maintaining conjugacy for efficient inference; to a general framework named Fold.all. Fold.all allows domain experts to express arbitrary knowledge in human-friendly First-Order Logic and combines it with data using stochastic optimization. This approach enables domain experts to focus on high-level modeling goals instead of the low-level issues involved in creating a custom topic model.
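To make the topic-in-set idea concrete, below is a minimal sketch of collapsed Gibbs sampling for LDA in which the conditional distribution over a word's topic is masked to its allowed candidate set. The toy corpus, the variable names, and the example constraints in `ALLOWED` are illustrative assumptions, not taken from the talk.

```python
# A minimal sketch of topic-in-set knowledge in collapsed Gibbs sampling for LDA.
# All names (ALLOWED, n_dt, n_wt, ...) and the toy data are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

# Toy corpus: each document is a list of word ids.
docs = [[0, 1, 2, 1], [2, 3, 3, 0]]
V, K = 4, 3              # vocabulary size, number of topics
alpha, beta = 0.1, 0.01  # symmetric Dirichlet hyperparameters

# Topic-in-set knowledge: word id -> set of allowed topics.
# Words not listed here may take any topic (example constraints).
ALLOWED = {1: {0}, 3: {1, 2}}

# Count tables and a constraint-respecting initial assignment.
n_dt = np.zeros((len(docs), K))   # document-topic counts
n_wt = np.zeros((V, K))           # word-topic counts
n_t = np.zeros(K)                 # per-topic totals
z = []
for d, doc in enumerate(docs):
    zd = []
    for w in doc:
        t = rng.choice(sorted(ALLOWED.get(w, range(K))))
        zd.append(t)
        n_dt[d, t] += 1; n_wt[w, t] += 1; n_t[t] += 1
    z.append(zd)

for _ in range(200):                      # Gibbs sweeps
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]                   # remove the current assignment
            n_dt[d, t] -= 1; n_wt[w, t] -= 1; n_t[t] -= 1
            # Standard collapsed-Gibbs conditional, then zero out
            # topics outside the word's allowed set before resampling.
            p = (n_dt[d] + alpha) * (n_wt[w] + beta) / (n_t + V * beta)
            mask = np.zeros(K)
            mask[list(ALLOWED.get(w, range(K)))] = 1.0
            p *= mask
            t = rng.choice(K, p=p / p.sum())
            z[d][i] = t
            n_dt[d, t] += 1; n_wt[w, t] += 1; n_t[t] += 1
```

The masking step is the entire change relative to ordinary collapsed Gibbs sampling: the conditional is computed as usual and then renormalized over the candidate set, so unconstrained words are sampled exactly as in standard LDA.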