Abstract
Harnessing data from social media to monitor health events is a promising avenue for public health surveillance. A key step is the detection of reports of a disease (referred to as 'health mention classification') amongst tweets that mention disease words. Prior work shows that figurative usage of disease words can be challenging for health mention classification. Since the experience of a disease is associated with negative sentiment, we present a method that utilises sentiment information to improve health mention classification. Specifically, our classifier combines pre-trained contextual word representations with sentiment distributions of words in the tweet. For our experiments, we extend a benchmark dataset of tweets for health mention classification, adding over 14k manually annotated tweets across diseases. We additionally annotate each tweet with a label that indicates whether the disease words are used in a figurative sense. Our classifier outperforms current state-of-the-art (SOTA) approaches in detecting both health-related and figurative tweets that mention disease words. We also show that disease words in tweets are used figuratively more often than in a health-related context, which proves challenging for classifiers targeting health-related tweets. Copyright © 2020 IW3C2 (International World Wide Web Conference Committee).
Original language | English |
---|---|
Title of host publication | Proceedings of The Web Conference 2020 |
Place of Publication | New York |
Publisher | Association for Computing Machinery (ACM) |
Pages | 1217-1227 |
ISBN (Electronic) | 9781450370233 |
DOIs | https://doi.org/10.1145/3366423.3380198 |
Publication status | Published - Apr 2020 |
Citation
Biddle, R., Joshi, A., Liu, S., Paris, C., & Xu, G. (2020). Leveraging sentiment distributions to distinguish figurative from literal health reports on Twitter. In Proceedings of The Web Conference 2020 (pp. 1217-1227). Association for Computing Machinery (ACM). https://doi.org/10.1145/3366423.3380198
Keywords
- Health mention classification
- Public health surveillance
- Figurative language