Words, links, and patterns: novel representations for Web-scale text mining

Published 6 September 2016, 5:01
Textual data is everywhere: in email and scientific papers, in online newspapers and on e-commerce sites. The Web contains more than 200 terabytes of text, not even counting the contents of dynamic textual databases. This enormous source of knowledge is seriously underexploited. Textual documents on the Web are very hard to model computationally: they are mostly unstructured, time-dependent, collectively authored, multilingual, and of uneven importance. Traditional grammar-based techniques don't scale up to address such problems; novel representations and analytical tools are needed. I will discuss several recent contributions related to text mining across a variety of genres. More specifically, these include (a) lexical models of the growth of the Web, (b) graph-based entity classification, (c) evolving news summarization, and (d) mining protein interactions from scientific papers. As it turns out, the right representations, when complemented with traditional NLP techniques, turn all of these into instances of better-studied problems in areas such as social networks, statistical mechanics, sequence analysis, and computational phylogenetics.