Who 'Dat? Identity resolution in large email collections

Microsoft Research

Published September 7, 2016, 17:52

Automated techniques that support search and sense-making in large email collections are increasingly important for a broad range of uses, including historical scholarship and e-discovery incident to civil litigation. In this talk, I'll briefly describe some of the work to date on searching large email collections, and then for most of the talk I will focus on the more challenging task of supporting sense-making. Specifically, I'll describe joint work with Tamer Elsayed on automatically resolving the identity of people who are mentioned ambiguously (e.g., just by first name) in a collection of email from a failed corporation (Enron). Our results indicate that, for people who are well represented in the collection, we can use a generative model to guess the right identity about 80% of the time. I'll conclude with a few remarks on our next directions for techniques, evaluation, and additional types of collections to which similar ideas might be applied.
