Neural bag-of-words (NBOW) models have achieved great success in text classification. They compute a sentence or document representation through simple operations, such as summing or averaging the word embeddings of the elements in a sequence. As a result, NBOW models have few parameters and a low computational cost. Intuitively, taking into account both the importance of each word and word-order information should help produce more informative sentence or document representations for text classification. However, NBOW models rarely consider either factor when generating a sentence or document representation. Meanwhile, term weighting schemes, which assign relatively high weights to important words, have performed well in traditional bag-of-words models, yet they are still seldom used in neural models. In addition, n-grams capture word-order information within a short context. In this paper, we propose the weighted word embedding model (WWEM), a variant of the NBOW model that incorporates term weighting schemes and n-grams. Our model generates informative sentence or document representations that account for both the importance of words and word-order information. We compare the proposed model with other popular neural models on five text classification datasets. The experimental results show that our model achieves comparable or even superior performance.
Copyright © 2019 Springer Nature Switzerland AG.
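The abstract contrasts plain NBOW averaging with a representation that weights each word by importance and also includes n-grams. The sketch below illustrates that general idea only, under loudly stated assumptions: smoothed IDF (the same form scikit-learn's `TfidfTransformer` uses by default) stands in for "a term weighting scheme", contiguous n-grams up to bigrams are treated as extra units with their own embeddings, and embeddings are random rather than trained. The names `units`, `idf_weights`, `EmbeddingTable`, `nbow`, and `weighted_embedding` are hypothetical; this is not the WWEM implementation or weighting scheme from the paper.

```python
import math
import random
from collections import Counter


def units(tokens, n_max=2):
    """Return unigrams followed by contiguous n-grams up to length n_max (hypothetical helper)."""
    out = []
    for n in range(1, n_max + 1):
        out.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return out


def idf_weights(docs_units):
    """Smoothed IDF, log((1 + N) / (1 + df)) + 1.

    Assumption: one common term-weighting scheme, used here as a stand-in;
    the paper's own schemes are not specified in the abstract.
    """
    n_docs = len(docs_units)
    df = Counter()
    for doc in docs_units:
        df.update(set(doc))  # document frequency: count each unit once per doc
    return {u: math.log((1 + n_docs) / (1 + c)) + 1.0 for u, c in df.items()}


class EmbeddingTable:
    """Hypothetical lookup table; random vectors are created lazily instead of trained."""

    def __init__(self, dim, seed=0):
        self.dim = dim
        self.rng = random.Random(seed)
        self.vectors = {}

    def __getitem__(self, unit):
        if unit not in self.vectors:
            self.vectors[unit] = [self.rng.uniform(-0.1, 0.1) for _ in range(self.dim)]
        return self.vectors[unit]


def nbow(tokens, table):
    """Plain NBOW: unweighted average of the word embeddings."""
    vecs = [table[t] for t in tokens]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def weighted_embedding(tokens, table, weights, n_max=2, default_weight=1.0):
    """Weighted average over word and n-gram embeddings (illustrative sketch)."""
    acc = [0.0] * table.dim
    total = 0.0
    for u in units(tokens, n_max):
        w = weights.get(u, default_weight)  # unseen units fall back to default_weight
        for i, x in enumerate(table[u]):
            acc[i] += w * x
        total += w
    # Normalize by the total weight; an empty input yields the zero vector.
    return [a / total for a in acc] if total > 0 else acc


if __name__ == "__main__":
    docs = [["the", "movie", "was", "great"], ["the", "plot", "was", "thin"], ["great", "acting"]]
    weights = idf_weights([units(d, 2) for d in docs])
    table = EmbeddingTable(dim=4)
    print(nbow(docs[0], table))
    print(weighted_embedding(docs[0], table, weights))
```

Because each bigram gets its own embedding and weight, word order enters the representation only locally, which matches the abstract's point that n-grams capture word order within a short context. With an empty weight dictionary and `n_max=1`, `weighted_embedding` reduces to the plain `nbow` average.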
Title of host publication: Database systems for advanced applications: 24th International Conference, DASFAA 2019, Chiang Mai, Thailand, April 22–25, 2019, Proceedings, Part I
Editors: Guoliang Li, Jun Yang, João Gama, Juggapong Natwichai, Yongxin Tong
Place of publication: Cham
Publication status: Published, 2019
Citation: Ren, H., Zeng, Z., Cai, Y., Du, Q., Li, Q., & Xie, H. (2019). A weighted word embedding model for text classification. In G. Li, J. Yang, J. Gama, J. Natwichai, & Y. Tong (Eds.), Database systems for advanced applications: 24th International Conference, DASFAA 2019, Chiang Mai, Thailand, April 22–25, 2019, Proceedings, Part I (pp. 419–434). Cham: Springer.
Keywords:
- Neural bag-of-words models
- Term weighting schemes
- Text classification