Ensemble learning with soft-prompted pretrained language models for fact checking

Shaoqin HUANG, Yue WANG, Eugene Y. C. WONG, Lei YU

Research output: Contribution to journal › Article › peer-review

Abstract

Infectious disease outbreaks, such as the COVID-19 pandemic, have led to a surge of information on the internet, including misinformation, necessitating fact-checking tools. However, fact-checking infectious-disease-related claims poses challenges due to the mismatch between informal claims and formal evidence and the presence of multiple aspects in a single claim. To address these issues, we propose a soft-prompt-based ensemble learning framework for COVID-19 fact checking. To understand complex assertions in informal social media texts, we explore various soft prompt structures that exploit the T5 language model and ensemble these prompt structures together. Soft prompts offer flexibility and better generalization compared to hard prompts. The ensemble model captures linguistic cues and contextual information in COVID-19-related data, and thus generalizes better to new claims. Experimental results demonstrate that prompt-based ensemble learning improves fact-checking accuracy and provides a promising approach to combating misinformation during the pandemic. In addition, the method shows strong zero-shot learning capability and can therefore be applied to a variety of fact-checking problems. Copyright © 2024 The Author(s). Published by Elsevier B.V.
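The core idea in the abstract — prepending learnable soft-prompt vectors to a claim's token embeddings and ensembling several prompt structures — can be illustrated with a minimal numpy sketch. This is not the authors' implementation: the dimensions, the single linear output head, and the mean-pooling classifier are hypothetical stand-ins for the T5-based members described in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical; real T5 embeddings are far larger).
d_model, n_tokens, n_prompt = 8, 5, 3
claim_embeds = rng.normal(size=(n_tokens, d_model))  # embedded claim tokens

def soft_prompted_logits(soft_prompt, token_embeds, w_out):
    """Prepend learnable soft-prompt vectors to the token embeddings,
    mean-pool the sequence, and project to 2-way (refuted/supported) logits."""
    seq = np.vstack([soft_prompt, token_embeds])  # (n_prompt + n_tokens, d_model)
    pooled = seq.mean(axis=0)
    return pooled @ w_out                          # (2,) logits

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Three differently initialized soft prompts stand in for the ensemble
# members built from different prompt structures.
prompts = [rng.normal(size=(n_prompt, d_model)) for _ in range(3)]
w_out = rng.normal(size=(d_model, 2))

# Ensemble by averaging the members' class probabilities.
probs = np.mean(
    [softmax(soft_prompted_logits(p, claim_embeds, w_out)) for p in prompts],
    axis=0,
)
label = ["refuted", "supported"][int(probs.argmax())]
```

In an actual soft-prompt setup the prompt vectors would be trained by gradient descent while the pretrained language model stays frozen; here they are random only to keep the sketch self-contained.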
Original language: English
Article number: 100067
Journal: Natural Language Processing Journal
Volume: 7
Early online date: Mar 2024
DOIs: https://doi.org/10.1016/j.nlp.2024.100067
Publication status: Published - Jun 2024

Citation

Huang, S., Wang, Y., Wong, E. Y. C., & Yu, L. (2024). Ensemble learning with soft-prompted pretrained language models for fact checking. Natural Language Processing Journal, 7, Article 100067. https://doi.org/10.1016/j.nlp.2024.100067

Keywords

  • Fact checking
  • Social media
  • Ensemble learning
  • Soft prompt
  • Language model
