Exploration on efficient similar sentences extraction

Yanhui GU, Zhenglu YANG, Guandong XU, Miyuki NAKANO, Masashi TOYODA, Masaru KITSUREGAWA

Research output: Contribution to journal › Article › peer-review

7 Citations (Scopus)


Measuring the semantic similarity between sentences is essential to many applications, such as text summarization, Web page retrieval, question-answer models, and image extraction. Several studies have explored this problem using knowledge-based, corpus-based, and hybrid strategies. Most of these studies focus on improving effectiveness. In this paper, we address efficiency: given a sentence collection, how can the top-k sentences most semantically similar to a query be discovered efficiently? Previous methods cannot handle large data efficiently, because applying them directly requires testing every candidate sentence, which is time consuming. We propose efficient strategies for this problem based on a general framework. The basic idea is to build a corresponding index for each similarity measure during preprocessing. Traversing these indices at query time avoids testing many candidates and thus improves efficiency. Moreover, an optimal aggregation algorithm is introduced to combine these similarities. Our framework is general enough that many similarity metrics can be incorporated, as discussed in the paper. We conduct extensive experiments on three real datasets to evaluate the efficiency of our proposal and illustrate the trade-off between effectiveness and efficiency. The experimental results demonstrate that our proposal outperforms state-of-the-art techniques in efficiency while maintaining the same high precision. Copyright © 2013 Springer Science+Business Media New York.
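The index-traversal and aggregation idea described in the abstract can be illustrated with a small sketch. The abstract does not name the aggregation algorithm, so the code below swaps in a threshold-style top-k aggregation (in the spirit of Fagin's Threshold Algorithm) as an assumption; it is not the authors' implementation. The function name `top_k_threshold`, the weighted-sum combination, and the premise that each similarity measure already provides a score per sentence for the given query are all hypothetical simplifications.

```python
import heapq


def top_k_threshold(indices, weights, k):
    """Threshold-style top-k aggregation over per-measure score indices.

    ASSUMPTION: `indices` holds one dict per similarity measure, mapping a
    sentence id to that measure's score for the current query. The paper's
    actual index structures are not described in the abstract.
    Returns (top-k list of (sentence_id, score), number of candidates scored).
    """
    # "Preprocessing": one list per measure, sorted by descending score.
    sorted_lists = [sorted(ix.items(), key=lambda kv: -kv[1]) for ix in indices]

    def aggregate(sid):
        # Hypothetical combination rule: weighted sum of all measures.
        return sum(w * ix.get(sid, 0.0) for w, ix in zip(weights, indices))

    seen = set()
    heap = []  # min-heap of (aggregated score, sentence id), size <= k
    max_depth = max(len(lst) for lst in sorted_lists)

    for depth in range(max_depth):
        last_scores = []
        for lst in sorted_lists:
            if depth >= len(lst):
                last_scores.append(0.0)
                continue
            sid, score = lst[depth]
            last_scores.append(score)
            if sid not in seen:
                seen.add(sid)
                agg = aggregate(sid)
                if len(heap) < k:
                    heapq.heappush(heap, (agg, sid))
                elif agg > heap[0][0]:
                    heapq.heapreplace(heap, (agg, sid))
        # Upper bound on the aggregated score of any sentence not yet seen.
        threshold = sum(w * s for w, s in zip(weights, last_scores))
        if len(heap) == k and heap[0][0] >= threshold:
            break  # no unseen candidate can enter the top-k

    ranked = sorted(heap, key=lambda t: (-t[0], t[1]))
    return [(sid, score) for score, sid in ranked], len(seen)


# Made-up scores for two hypothetical measures over five sentences.
string_sim = {"s1": 0.9, "s2": 0.8, "s3": 0.2, "s4": 0.1, "s5": 0.05}
semantic_sim = {"s1": 0.7, "s2": 0.95, "s3": 0.3, "s4": 0.2, "s5": 0.1}
top, examined = top_k_threshold([string_sim, semantic_sim], [0.5, 0.5], k=2)
print([sid for sid, _ in top], examined)
```

In this toy input the traversal stops after scoring only two of the five sentences, which is the kind of candidate pruning the abstract attributes to index traversal. The early stop is only safe because a weighted sum with non-negative weights is monotone in each individual score.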

Original language: English
Pages (from-to): 595-626
Journal: World Wide Web
Early online date: Jan 2013
Publication status: Published - Jul 2014


Gu, Y., Yang, Z., Xu, G., Nakano, M., Toyoda, M., & Kitsuregawa, M. (2014). Exploration on efficient similar sentences extraction. World Wide Web, 17, 595-626. https://doi.org/10.1007/s11280-012-0195-z


  • Semantic similarity
  • Query aggregation
  • Top-k

