Predicting the effectiveness of searches

Published 7 September 2016, 16:34
Text retrieval systems are designed to produce a ranking of results deemed relevant to the input query. However, performance is likely to vary from query to query, depending on the quality of each query. In a research setting, this performance, measured with standard metrics such as precision and recall, can be calculated from the available relevance judgements. But how can performance be estimated for new incoming queries, for which no judgements exist?

Predicting performance on a given query is particularly important because an acceptable level of performance must be provided for each individual query. Most users have had the experience of issuing a query and either finding a relevant document immediately or spending considerable time and effort to no avail. These latter searches frustrate users, and if they are sufficiently frequent, a search engine risks losing its users. It is therefore critical to understand why searches fail. And since some failures are inevitable (a retrieval system cannot find a relevant document that has not been indexed), it is also important to predict when such failures occur so that remedial action can be taken. Consequently, there is considerable interest within the Information Retrieval community in estimating search effectiveness.

This talk first surveys previous efforts towards this end. It then describes four properties of a query's result set: (i) the clustering tendency, (ii) the sensitivity to document perturbation, (iii) the sensitivity to query perturbation, and (iv) the local intrinsic dimensionality. Experimental evidence of the utility of these measures for query performance prediction is then presented. Recent results published at SIGIR 2006 showed that a combination of these features yields the highest correlation with average precision reported to date.
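To make the evaluation criterion concrete, a predictor is typically judged by computing the average precision (AP) each query actually achieves against the relevance judgements, then correlating those AP values with the predictor's scores. The sketch below assumes standard non-interpolated AP and uses the Pearson coefficient; the abstract does not say which correlation measure was used (rank correlations such as Kendall's tau or Spearman's rho are also common). The function names `average_precision`, `pearson` and `evaluate_predictor`, and the toy data, are hypothetical and for illustration only.

```python
from math import sqrt


def average_precision(ranking, relevant):
    """Non-interpolated average precision of one ranked list.

    ranking  -- document ids in the order the system returned them
    relevant -- set of ids judged relevant for the query
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant document's rank
    # Relevant documents that were never retrieved contribute zero precision.
    return precision_sum / len(relevant)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def evaluate_predictor(runs, predicted):
    """Correlate predicted effectiveness with the AP achieved on each query.

    runs      -- {query_id: (ranking, relevant_set)}   (hypothetical format)
    predicted -- {query_id: predicted effectiveness score}
    """
    query_ids = sorted(runs)
    ap = [average_precision(*runs[q]) for q in query_ids]
    scores = [predicted[q] for q in query_ids]
    return pearson(scores, ap)


if __name__ == "__main__":
    # Hypothetical toy data: three queries with rankings, judgements and predictor scores.
    runs = {
        "q1": (["d1", "d2", "d3", "d4"], {"d1", "d3"}),
        "q2": (["d5", "d6", "d7", "d8"], {"d8"}),
        "q3": (["d9", "d10", "d11", "d12"], {"d9", "d10"}),
    }
    predicted = {"q1": 0.6, "q2": 0.1, "q3": 0.9}
    print(round(evaluate_predictor(runs, predicted), 3))
```

Here the toy queries achieve AP values of about 0.83, 0.25 and 1.0, and because the made-up predictor ranks them in the same order, the correlation comes out close to 1. A predictor whose scores were unrelated to AP would give a value near 0.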