Size Estimation of Approximate Predicates

Published August 17, 2016, 20:53
Ever-increasing amounts of text are produced by end users or collected from multiple sources. Because such data typically contains many errors and lacks a standardized representation, similarity query processing has recently drawn significant interest; it has a wide range of applications, including query refinement for web search and near-duplicate document detection and elimination. In this talk, we discuss size estimation for similarity queries, which is crucial in query optimization. We consider string and substring matching problems under edit distance, as well as the set similarity join problem. The proposed techniques build on recent developments in similarity query processing, such as Min-Hashing. We show how such techniques can be combined with frequent pattern mining and sampling to estimate the size of similarity queries.
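
To make the idea of combining Min-Hashing with sampling concrete, the following is a minimal Python sketch. It is not the method presented in the talk, only an illustration under stated assumptions: every function name, the 61-bit prime, the use of CRC32 to map tokens to integers, and the uniform pair-sampling scheme are choices made here for the example. It estimates how many pairs in a collection of non-empty token sets have Jaccard similarity at or above a threshold.

```python
import random
import zlib

# Assumption: a Mersenne prime much larger than any 32-bit token id.
PRIME = (1 << 61) - 1


def make_hash_params(num_hashes, seed=0):
    """Draw (a, b) coefficients for hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def minhash_signature(tokens, params):
    """Return the Min-Hash signature of a non-empty collection of string tokens."""
    # CRC32 gives a token id that is stable across runs, unlike built-in hash().
    ids = [zlib.crc32(t.encode("utf-8")) for t in tokens]
    return [min((a * x + b) % PRIME for x in ids) for a, b in params]


def estimate_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching signature slots."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


def estimate_join_size(sets, threshold, params, num_samples=1000, seed=0):
    """Estimate the number of unordered pairs with similarity >= threshold.

    Samples pairs uniformly at random (requires at least two sets), checks
    each sampled pair with the Min-Hash estimate, and scales the hit rate
    up to the total number of pairs.
    """
    rng = random.Random(seed)
    sigs = [minhash_signature(s, params) for s in sets]
    n = len(sets)
    total_pairs = n * (n - 1) // 2
    hits = 0
    for _ in range(num_samples):
        i, j = rng.sample(range(n), 2)  # two distinct indices
        if estimate_jaccard(sigs[i], sigs[j]) >= threshold:
            hits += 1
    return total_pairs * hits / num_samples
```

Plain uniform sampling of this kind tends to work poorly when the threshold is high and qualifying pairs are rare, since most samples then miss; that weakness is one reason to combine sampling with other techniques, as the abstract describes.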