Semi-unsupervised learning of taxonomic and non-taxonomic relationships from the web

Published 6 September 2016, 6:25
Due to the size of the World Wide Web, it is necessary to develop tools for the automatic or semi-automatic analysis of web data, such as finding patterns and implicit information on the web, a task usually known as Web Mining. In particular, web content mining consists of automatically extracting, from textual web documents, data that can be represented with machine-readable semantic formalisms. More traditional approaches to Information Extraction from text, such as those applied at the Message Understanding Conferences in the 1990s, relied on small collections of documents with many semantic annotations. In contrast, the characteristics of the web (its size, its redundancy and the lack of semantic annotations in most texts) favor efficient algorithms that can learn from unannotated data. Furthermore, new types of web content, such as web forums, blogs and wikis, are also a source of textual information with an underlying structure from which specialist systems can benefit. This talk will describe an ongoing project for automatically acquiring ontological knowledge (both taxonomic and non-taxonomic relationships) from the web in a partially unsupervised way. The proposed approach combines distributional semantics techniques with rote extractors. Particular focus will be placed on automatically adding semantic tags to Wikipedia, with the aim of transforming it, with little effort, into a Semantic Wikipedia.
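To make the combination of rote extractors and distributional semantics more concrete, the following is a minimal sketch and not the system described in the talk. It assumes a toy three-sentence corpus and a single seed pair; the names `SENTENCES`, `learn_patterns`, `apply_patterns`, `context_vector` and `cosine` are all hypothetical. The rote extractor memorizes the literal text found between a known pair of related terms and reuses it to find new pairs, while the distributional step compares terms by the words that surround them.

```python
import math
import re
from collections import Counter

# Toy corpus (an assumption for illustration; a real system would use web text).
SENTENCES = [
    "Paris is the capital of France.",
    "Madrid is the capital of Spain.",
    "Rome is the capital of Italy.",
]


def learn_patterns(seed_pairs, sentences):
    """Rote extraction, step 1: memorize the literal text between each seed pair."""
    patterns = set()
    for left, right in seed_pairs:
        regex = re.compile(re.escape(left) + r"(.{1,40}?)" + re.escape(right))
        for sentence in sentences:
            match = regex.search(sentence)
            if match:
                patterns.add(match.group(1))
    return patterns


def apply_patterns(patterns, sentences):
    """Rote extraction, step 2: reuse the memorized text to find new pairs."""
    pairs = set()
    for pattern in patterns:
        regex = re.compile(r"(\w+)" + re.escape(pattern) + r"(\w+)")
        for sentence in sentences:
            pairs.update(regex.findall(sentence))
    return pairs


def context_vector(term, sentences, window=2):
    """Distributional step: count words within `window` tokens of `term`."""
    counts = Counter()
    for sentence in sentences:
        tokens = re.findall(r"\w+", sentence.lower())
        for i, token in enumerate(tokens):
            if token == term:
                start = max(0, i - window)
                neighbours = tokens[start:i] + tokens[i + 1:i + window + 1]
                counts.update(neighbours)
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# One seed pair is enough to learn the pattern " is the capital of ".
patterns = learn_patterns([("Paris", "France")], SENTENCES)
pairs = apply_patterns(patterns, SENTENCES)

# Terms used in similar contexts (two capitals) score higher than
# terms used in different contexts (a capital and a country).
similar = cosine(context_vector("madrid", SENTENCES), context_vector("rome", SENTENCES))
different = cosine(context_vector("madrid", SENTENCES), context_vector("spain", SENTENCES))
```

In this sketch the two signals are computed independently. One plausible way to combine them, again an assumption, is to use distributional similarity to filter or rank the pairs proposed by the rote patterns.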