A topic modeling based approach to novel document automatic summarization

Zongda WU, Li LEI, Guiling LI, Hui HUANG, Chengren ZHENG, Enhong CHEN, Guandong XU

Research output: Contribution to journal › Article › peer-review

64 Citations (Scopus)

Abstract

Most existing automatic text summarization algorithms target multiple documents of relatively short length, and are therefore difficult to apply directly to novels, which are long and loosely structured. In this paper, we propose a topic modeling based approach to extractive automatic summarization of novel documents that seeks a good balance among compression ratio, summarization quality and machine readability. First, based on topic modeling, we extract the candidate sentences associated with topic words from a preprocessed novel document. Second, with the goals of compression ratio and topic diversity, we design an importance evaluation function to select the most important sentences from the candidate sentences and thus generate an initial summary of the novel. Finally, we smooth the initial summary to overcome the semantic confusion caused by ambiguous or synonymous words, so as to improve the readability of the summary. We evaluate the proposed approach experimentally on a real novel dataset. The experimental results show that, compared with summaries produced by other candidate algorithms, each automatic summary generated by our approach has not only a higher compression ratio but also better summarization quality. Copyright © 2017 Elsevier Ltd. All rights reserved.
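
The first two steps described in the abstract (collecting candidate sentences that contain topic words, then selecting a small, topically diverse subset) can be illustrated with a minimal Python sketch. This is not the authors' implementation: plain word-frequency counting stands in for a real topic model, the greedy coverage score is a hypothetical substitute for the paper's importance evaluation function, and the final smoothing step is not shown. All names (`STOPWORDS`, `topic_words`, `summarize`, `num_topic_words`, `max_sentences`) are assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "was", "it",
             "he", "she", "that", "on", "for", "with", "as", "at", "by"}


def split_sentences(text):
    """Split text into sentences on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase a sentence and return its non-stopword tokens."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOPWORDS]


def topic_words(sentences, k):
    """Stand-in for topic modeling: the k most frequent content words."""
    counts = Counter(w for s in sentences for w in tokenize(s))
    return {w for w, _ in counts.most_common(k)}


def summarize(text, num_topic_words=5, max_sentences=2):
    """Return an extractive summary as a list of sentences in source order."""
    sentences = split_sentences(text)
    topics = topic_words(sentences, num_topic_words)

    # Step 1: candidate sentences are those containing at least one topic word.
    candidates = [(i, set(tokenize(s)) & topics) for i, s in enumerate(sentences)]
    candidates = [(i, hits) for i, hits in candidates if hits]

    # Step 2: greedily pick the sentence that covers the most topic words not
    # yet covered (diversity), stopping at max_sentences (compression).
    covered, chosen = set(), []
    while candidates and len(chosen) < max_sentences:
        i, hits = max(candidates, key=lambda c: (len(c[1] - covered), -c[0]))
        if not hits - covered:
            break  # no remaining candidate adds a new topic word
        chosen.append(i)
        covered |= hits
        candidates = [c for c in candidates if c[0] != i]

    return [sentences[i] for i in sorted(chosen)]
```

Calling `summarize(text, num_topic_words=3, max_sentences=2)` on a short passage returns at most two sentences, each contributing at least one topic word that the other does not. Lowering `max_sentences` shortens the summary, at the cost of covering fewer topic words.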

Original language: English
Pages (from-to): 12-23
Journal: Expert Systems with Applications
Volume: 84
Early online date: May 2017
DOIs: https://doi.org/10.1016/j.eswa.2017.04.054
Publication status: Published - Oct 2017

Citation

Wu, Z., Lei, L., Li, G., Huang, H., Zheng, C., Chen, E., & Xu, G. (2017). A topic modeling based approach to novel document automatic summarization. Expert Systems with Applications, 84, 12-23. https://doi.org/10.1016/j.eswa.2017.04.054
