Natural language processing (NLP) for Cantonese, which mixes Traditional Chinese, borrowed characters that represent spoken terms, and English, is largely underdeveloped. Applying NLP to detect social media posts that show suicide risk, a rare event in the general population, is even more challenging. This paper evaluates several text mining methods for classifying whether Cantonese comments on YouTube indicate suicide risk. Using word-vector features, classification algorithms such as SVM, AdaBoost, Random Forest, and LSTM are employed to detect each comment's risk level. To address the class imbalance in the data, both re-sampling and focal loss methods are used. With these improvements at the data and algorithm levels, the LSTM achieves more satisfactory test classification results (84.3% and 84.5% g-mean, respectively). The study demonstrates the potential of automatically detecting suicide risk in Cantonese social media posts. Copyright © 2019 Springer Nature Switzerland AG.
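As a rough illustration of the two imbalance-related ideas named in the abstract, the sketch below computes the g-mean and a binary focal loss in plain Python. It assumes the common definitions: g-mean as the square root of sensitivity times specificity, and focal loss as formulated by Lin et al. (2017). The function names and the `gamma` and `alpha` defaults are illustrative choices, not values reported in the paper.

```python
import math


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity (label 1 = at risk)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def binary_focal_loss(y_true, p_pred, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss; gamma and alpha here are assumed defaults."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)          # avoid log(0)
        p_t = p if t == 1 else 1.0 - p           # probability of the true class
        alpha_t = alpha if t == 1 else 1.0 - alpha
        # (1 - p_t) ** gamma down-weights examples that are already easy
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(y_true)
```

Unlike accuracy, the g-mean is zero for a classifier that labels every comment as low risk, which is why it suits a rare positive class. With `gamma=0` and `alpha=0.5` the focal loss reduces to half the ordinary cross-entropy.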
|Title of host publication||Proceedings of the Future Technologies Conference (FTC) 2018|
|Editors||Kohei ARAI, Rahul BHATIA, Supriya KAPOOR|
|Place of Publication||Cham|
|Publication status||Published - 2019|