Detecting comments showing risk for suicide in YouTube

Jiahui GAO, Qijin CHENG, Leung Ho Philip YU

Research output: Chapter in Book/Report/Conference proceeding > Chapters

3 Citations (Scopus)


Natural language processing (NLP) for Cantonese, which mixes Traditional Chinese, borrowed characters that represent spoken terms, and English, is largely underdeveloped. Applying NLP to detect social media posts that show suicide risk, a rare event in the general population, is even more challenging. This paper tried different text mining methods to classify whether Cantonese comments on YouTube indicate suicide risk. Using word vector features, classification algorithms such as SVM, AdaBoost, Random Forest, and LSTM are employed to detect the comments' risk level. To address the class imbalance in the data, both re-sampling and focal loss methods are used. With these improvements at the data and algorithm levels, the LSTM algorithm achieves more satisfactory test classification results (84.3% and 84.5% g-mean, respectively). The study demonstrates the potential of automatically detecting suicide risk in Cantonese social media posts. Copyright © 2019 Springer Nature Switzerland AG.
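The abstract names two ideas that are easy to state in code: the g-mean metric used to report results, and focal loss as an algorithm-level remedy for class imbalance. The sketch below is illustrative only and is not the authors' implementation. It assumes a binary setting (1 = comment shows risk, 0 = no risk), the common definition of g-mean as the geometric mean of sensitivity and specificity, and the standard focal loss form with hypothetical default values gamma=2.0 and alpha=0.25; the paper's own settings may differ. The function names `g_mean` and `focal_loss` are chosen here for illustration.

```python
import math


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity for binary labels.

    Assumption: label 1 marks an at-risk comment, label 0 a regular one.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    # Guard against a class being absent from the evaluation set.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def focal_loss(y_true, p_pred, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t).

    p_pred holds predicted probabilities of the positive class.
    gamma and alpha are illustrative defaults, not values from the paper.
    """
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
        p_t = p if t == 1 else 1.0 - p   # probability given to the true class
        alpha_t = alpha if t == 1 else 1.0 - alpha
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(y_true)
```

Unlike plain accuracy, g-mean drops to zero if a classifier ignores the rare class entirely, which is why it suits a setting where at-risk comments are scarce. The `(1 - p_t)**gamma` factor in focal loss shrinks the contribution of examples the model already classifies confidently, so training concentrates on the hard, typically minority-class, cases.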
Original language: English
Title of host publication: Proceedings of the Future Technologies Conference (FTC) 2018
Editors: Kohei ARAI, Rahul BHATIA, Supriya KAPOOR
Place of Publication: Cham
ISBN (Electronic): 9783030026868
ISBN (Print): 9783030026851
Publication status: Published - 2019


Gao, J., Cheng, Q., & Yu, P. L. H. (2019). Detecting comments showing risk for suicide in YouTube. In K. Arai, R. Bhatia, & S. Kapoor (Eds.), Proceedings of the Future Technologies Conference (FTC) 2018 (Vol. 1, pp. 385-400). Cham: Springer.

