A topic modeling based approach to novel document automatic summarization

Zongda WU, Li LEI, Guiling LI, Hui HUANG, Chengren ZHENG, Enhong CHEN, Guandong XU

Research output: Contribution to journal › Article › peer-review

64 Citations (Scopus)


Most existing automatic text summarization algorithms target collections of relatively short documents, so they are difficult to apply directly to novel documents, which are long and free in structure. In this paper, we propose a topic modeling based approach to extractive automatic summarization of novel documents, with the aim of achieving a good balance among compression ratio, summarization quality and machine readability. First, based on topic modeling, we extract the candidate sentences associated with topic words from a preprocessed novel document. Second, with compression ratio and topic diversity as goals, we design an importance evaluation function to select the most important sentences from the candidate sentences and thus generate an initial summary of the novel. Finally, we smooth the initial summary to overcome the semantic confusion caused by ambiguous or synonymous words and so improve the readability of the summary. We evaluate the proposed approach experimentally on a real novel dataset. The experimental results show that, compared with summaries produced by the other candidate algorithms, each automatic summary generated by our approach has not only a higher compression ratio but also better summarization quality. Copyright © 2017 Elsevier Ltd. All rights reserved.
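The first two steps of the abstract (candidate extraction by topic words, then importance-based selection under a compression constraint) can be illustrated with a minimal sketch. It is not the authors' algorithm: the sentence splitter, the scoring formula in `score`, the `max_ratio` parameter and the function names are all assumptions made for illustration, the topic words are assumed to come from a topic model that has already been trained, and the final smoothing step is omitted.

```python
import re


def split_sentences(text):
    """Naive splitter on '.', '!' and '?' (an assumption, not the paper's preprocessing)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercased set of word tokens in a sentence."""
    return set(re.findall(r"[a-z']+", sentence.lower()))


def summarize(text, topic_words, max_ratio=0.3):
    """Greedy extractive summary (hypothetical scoring, for illustration only).

    topic_words: dict mapping a topic id to a set of topic words,
                 assumed to be produced beforehand by a topic model.
    max_ratio:   upper bound on summary length / document length, in characters.
    """
    sentences = split_sentences(text)
    budget = max_ratio * len(text)

    # Step 1: candidate sentences are those containing at least one topic word.
    candidates = {}
    for i, sentence in enumerate(sentences):
        words = tokenize(sentence)
        hits = {(t, w) for t, ws in topic_words.items() for w in ws if w in words}
        if hits:
            candidates[i] = hits

    # Step 2: greedily pick the sentence that adds the most uncovered topic
    # words and uncovered topics per character, while the budget allows it.
    chosen, covered, used = [], set(), 0

    def score(i):
        new_pairs = candidates[i] - covered
        new_topics = {t for t, _ in new_pairs} - {t for t, _ in covered}
        return (len(new_pairs) + len(new_topics)) / len(sentences[i])

    while candidates:
        best = max(candidates, key=score)
        if score(best) == 0:
            break  # nothing left that adds new topic coverage
        if used + len(sentences[best]) <= budget:
            chosen.append(best)
            covered |= candidates[best]
            used += len(sentences[best])
        del candidates[best]

    # Return the selected sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]
```

Dividing the coverage gain by sentence length is one simple way to trade topic diversity against compression; a real importance function would likely also weight topic words by their probability under the model.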

Original language: English
Pages (from-to): 12-23
Journal: Expert Systems with Applications
Early online date: May 2017
Publication status: Published - Oct 2017


Wu, Z., Lei, L., Li, G., Huang, H., Zheng, C., Chen, E., & Xu, G. (2017). A topic modeling based approach to novel document automatic summarization. Expert Systems with Applications, 84, 12-23. https://doi.org/10.1016/j.eswa.2017.04.054

