A weighted word embedding model for text classification

Haopeng REN, ZeQuan ZENG, Yi CAI, Qing DU, Qing LI, Haoran XIE

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

Neural bag-of-words (NBOW) models have achieved great success in text classification. They compute a sentence or document representation through simple mathematical operations, such as summing or averaging the word embeddings of the sequence elements. As a result, NBOW models have few parameters and a low computational cost. Intuitively, taking into account the importance of each word and word-order information should help produce an informative sentence or document representation for text classification. However, NBOW models hardly consider either factor when generating a representation. Meanwhile, term weighting schemes, which assign relatively high weights to important words, have performed well in traditional bag-of-words models, yet they are still seldom used in neural models. In addition, n-grams capture word-order information within a short context. In this paper, we propose the weighted word embedding model (WWEM), a variant of the NBOW model that introduces term weighting schemes and n-grams. Our model generates an informative sentence or document representation that accounts for both the importance of words and word-order information. We compare the proposed model with other popular neural models on five text classification datasets. The experimental results show that the proposed model achieves comparable or even superior performance. Copyright © 2019 Springer Nature Switzerland AG.
Original language: English
Title of host publication: Database systems for advanced applications: 24th International Conference, DASFAA 2019, Chiang Mai, Thailand, April 22–25, 2019, Proceedings, Part I
Editors: Guoliang LI, Jun YANG, Joao GAMA, Juggapong NATWICHAI, Yongxin TONG
Place of Publication: Cham
Publisher: Springer
Pages: 419-434
ISBN (Electronic): 9783030185763
ISBN (Print): 9783030185756
Publication status: Published - 2019

Bibliographical note

Ren, H., Zeng, Z., Cai, Y., Du, Q., Li, Q., & Xie, H. (2019). A weighted word embedding model for text classification. In G. Li, J. Yang, J. Gama, J. Natwichai, & Y. Tong (Eds.), Database systems for advanced applications: 24th International Conference, DASFAA 2019, Chiang Mai, Thailand, April 22–25, 2019, Proceedings, Part I (pp. 419-434). Cham: Springer.

Keywords

  • Neural bag-of-words models
  • Term weighting schemes
  • N-grams
  • Text classification