Scalable Knowledge Harvesting

Published September 6, 2016, 5:41
The performance of many Natural Language Processing (NLP) systems has reached a plateau under existing techniques. There is broad consensus that systems must integrate semantic or world knowledge in some form to supply the additional information needed to improve the quality of results. But building adequately large semantic resources remains a difficult, unsolved problem. In my work, I attack the problem of very-large-scale acquisition of semantic knowledge by exploiting natural language text available on the Internet. In particular, I concentrate on one problem: extracting is-a relations from a very large corpus (70 million web pages, 26 billion words) downloaded from the Internet. Since the amount of data involved is two orders of magnitude greater than in previously published work, the algorithms had to be highly scalable. This was achieved by:

1. A novel pattern-based learning algorithm that exploits local features (see the first sketch below).
2. A clustering algorithm that uses randomized techniques over co-occurrence (global) features in linear time (see the second sketch below).

Using these algorithms, I extract is-a relations from text to build a huge table. The extracted relations are then evaluated through a host of different applications.
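The abstract does not name the specific local-feature patterns used. A classic choice for is-a extraction is Hearst-style surface patterns such as "Y such as X" and "X and other Y", so the following is a minimal sketch under that assumption; the pattern set, single-word noun phrases, and function names are all hypothetical simplifications, not the author's actual algorithm.

```python
import re

# Hypothetical, simplified pattern set: real systems use many more
# patterns and full noun-phrase chunking rather than single words.
HEARST_PATTERNS = [
    (re.compile(r"(\w+) such as (\w+)"), "class-first"),      # "cities such as Rome"
    (re.compile(r"(\w+) and other (\w+)"), "instance-first"), # "Rome and other cities"
]

def extract_isa(sentence):
    """Yield (instance, class) pairs matched by the surface patterns."""
    for pattern, order in HEARST_PATTERNS:
        for m in pattern.finditer(sentence):
            if order == "class-first":
                yield (m.group(2), m.group(1))
            else:
                yield (m.group(1), m.group(2))

print(list(extract_isa("I visited cities such as Rome.")))
# -> [('Rome', 'cities')]
```

Because each sentence is processed independently against a fixed pattern set, this style of extraction makes a single pass over the corpus, which is what lets it scale to billions of words.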
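For the clustering step, the abstract says only "randomized techniques" over co-occurrence features in linear time. One standard technique matching that description is locality-sensitive hashing with random hyperplanes, sketched below under that assumption; the vector data and function names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

def lsh_signature(vec, hyperplanes):
    """Bit signature: which side of each random hyperplane the vector falls on."""
    return tuple((hyperplanes @ vec) >= 0)

def bucket_by_signature(vectors, dim, n_bits=8):
    """One pass over the vectors: items with identical signatures share a bucket,
    so words with similar co-occurrence vectors tend to land together."""
    hyperplanes = rng.standard_normal((n_bits, dim))
    buckets = {}
    for word, vec in vectors.items():
        buckets.setdefault(lsh_signature(vec, hyperplanes), []).append(word)
    return buckets

# Toy co-occurrence vectors (hypothetical): similar words, similar vectors.
vecs = {
    "rome": np.array([1.0, 0.9, 0.1]),
    "paris": np.array([0.9, 1.0, 0.2]),
    "carburetor": np.array([0.0, 0.1, 1.0]),
}
for words in bucket_by_signature(vecs, dim=3).values():
    print(words)
```

The key property is that hashing each vector costs a fixed number of dot products, so grouping similar items takes time linear in the number of items rather than the quadratic cost of all-pairs comparison.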