Dealing with data: classification, clustering and ranking

Published 6 September 2016, 6:20
This talk focuses on three pieces of our work.

(1) How can unlabeled data be used in classification? In many real-world machine learning problems, such as web categorization, only a few labeled examples are available because labeling requires human effort, whereas unlabeled data are far easier to obtain. Naturally, one may wonder whether unlabeled data can be used in classification tasks. I will present a simple, powerful and mathematically clean approach to this problem, and show the good experimental results that third parties have reported for it on a number of machine learning benchmarks. The approach is regarded as state of the art in the machine learning literature.

(2) How can directed graphs such as the Web be partitioned? Spectral clustering for undirected graphs has been studied extensively since the mathematician Fiedler’s seminal work in the 1970s. The spectral method is so powerful that many people have attempted to generalize it to directed graphs. The most popular of these attempts is perhaps Jon Kleinberg’s HITS algorithm for both ranking web pages and detecting web communities. In 2003, Monika Henzinger, the former research director at Google Inc., listed this generalization issue as one of six algorithmic challenges in web search engines. I will show how we solve this problem thoroughly using Markov chain theory, and how our approach applies to real-world web data. The approach can be implemented in several lines of Matlab code.

(3) How can objects such as images and texts be ranked? Link-based ranking has enjoyed huge success in web search engines. In practice, however, many types of data, such as texts and images, have no link structure and are instead modeled as vectors in Euclidean spaces. A principled way of ranking such data is to explore and exploit their intrinsic geometric or manifold structure. I will show how we address this issue in a simple mathematical framework. Our approach has been widely used by communities ranging from image retrieval to bioinformatics.

In addition, I will talk about some theoretical analysis of these approaches and discuss future extensions.