Abstract
As an important task in Asian language information processing, Chinese word embedding learning has attracted much attention recently. Based on either Skip-gram or CBOW, several methods have been proposed that exploit Chinese characters and sub-character components to learn Chinese word embeddings. Chinese characters combine meaning, structure, and phonetic information (pinyin). However, previous works cover only the first two aspects and cannot effectively explore the distinct semantics of characters. To address this issue, we develop a pinyin-enhanced Skip-gram model named rsp2vec, as well as a radical- and pinyin-enhanced Chinese word embedding (rPCWE) learning model based on CBOW. In our models, the phonetic information and semantic components of Chinese characters are encoded into embeddings simultaneously. Evaluations on word analogy reasoning, word relevance, text classification, named entity recognition, and case studies validate the effectiveness of our models. Copyright © 2022 The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
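To make the idea concrete, the sketch below illustrates one plausible way to encode radical and pinyin components alongside word embeddings in a CBOW-style objective, as the abstract describes. It is a minimal illustration, not the authors' rPCWE or rsp2vec implementation: the class name `RadicalPinyinCBOW`, the toy vocabulary sizes, the simple summation of component embeddings, and the full-softmax output layer are all assumptions for demonstration only.

```python
# Illustrative sketch (not the authors' released code): a CBOW-style model in
# which each context word is represented by the sum of its word, radical, and
# pinyin embeddings before predicting the target word. All sizes are toy values.
import torch
import torch.nn as nn

class RadicalPinyinCBOW(nn.Module):
    def __init__(self, n_words, n_radicals, n_pinyin, dim=100):
        super().__init__()
        self.word_emb = nn.Embedding(n_words, dim)
        self.radical_emb = nn.Embedding(n_radicals, dim)
        self.pinyin_emb = nn.Embedding(n_pinyin, dim)
        self.out = nn.Linear(dim, n_words)  # scores over the word vocabulary

    def forward(self, ctx_words, ctx_radicals, ctx_pinyin):
        # Each argument: (batch, context_size) index tensor.
        h = (self.word_emb(ctx_words)
             + self.radical_emb(ctx_radicals)
             + self.pinyin_emb(ctx_pinyin)).mean(dim=1)  # average the context
        return self.out(h)  # logits for the target word

# Toy usage: 2 training examples, context window of 4 tokens each.
model = RadicalPinyinCBOW(n_words=5000, n_radicals=300, n_pinyin=420)
ctx_w = torch.randint(0, 5000, (2, 4))
ctx_r = torch.randint(0, 300, (2, 4))
ctx_p = torch.randint(0, 420, (2, 4))
target = torch.randint(0, 5000, (2,))
loss = nn.functional.cross_entropy(model(ctx_w, ctx_r, ctx_p), target)
loss.backward()  # gradients flow into all three embedding tables
```

A Skip-gram variant in the spirit of rsp2vec would instead predict surrounding words from a center word's combined word, radical, and pinyin representation; large vocabularies would typically use negative sampling rather than a full softmax.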
| Original language | English |
| --- | --- |
| Pages (from-to) | 42805-42820 |
| Journal | Multimedia Tools and Applications |
| Volume | 81 |
| Issue number | 30 |
| Early online date | 10 Aug 2022 |
| DOIs | 10.1007/s11042-022-13488-6 |
| Publication status | Published - Dec 2022 |
Citation
Wang, F. L., Lu, Y., Cheng, G., Xie, H., & Rao, Y. (2022). Learning Chinese word embeddings from semantic and phonetic components. Multimedia Tools and Applications, 81(30), 42805-42820. doi: 10.1007/s11042-022-13488-6

Keywords
- Chinese word embedding
- Semantic components
- Phonetic information