Joint bilingual sentiment classification with unlabeled parallel corpora

Bin LU, Chenhao TAN, Claire CARDIE, Ka Yin Benjamin TSOU

Research output: Chapter in Book/Report/Conference proceeding › Chapters

77 Citations (Scopus)

Abstract

Most previous work on multilingual sentiment analysis has focused on methods to adapt sentiment resources from resource-rich languages to resource-poor languages. We present a novel approach for joint bilingual sentiment classification at the sentence level that augments available labeled data in each language with unlabeled parallel data. We rely on the intuition that the sentiment labels for parallel sentences should be similar and present a model that jointly learns improved monolingual sentiment classifiers for each language. Experiments on multiple data sets show that the proposed approach (1) outperforms the monolingual baselines, significantly improving the accuracy for both languages by 3.44%-8.12%; (2) outperforms two standard approaches for leveraging unlabeled data; and (3) produces (albeit smaller) performance gains when employing pseudo-parallel data from machine translation engines. Copyright © 2011 The Association for Computational Linguistics.
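The core intuition described in the abstract — that parallel sentences should receive similar sentiment labels, so unlabeled parallel data can tie two monolingual classifiers together — can be illustrated with a small sketch. This is not the authors' model (the paper formulates joint learning differently): below, two logistic-regression classifiers, one per language, are trained on their own labeled data while a co-regularization penalty pushes their predictions toward agreement on unlabeled parallel pairs. All data, feature layouts, and the `train_joint` helper are invented for illustration.

```python
# Hypothetical sketch of agreement-based joint bilingual training.
# NOT the paper's actual model; a co-regularized logistic regression
# stand-in for the idea "parallel sentences share sentiment labels."
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_joint(X1, y1, X2, y2, P1, P2, lam=1.0, lr=0.5, steps=500):
    """Jointly train two logistic-regression sentiment classifiers.

    X1, y1 / X2, y2: labeled data for language 1 / language 2.
    P1, P2: feature rows for the two sides of unlabeled parallel pairs
            (row i of P1 is the translation of row i of P2).
    lam:    weight of the agreement penalty (q1 - q2)^2 on parallel pairs.
    """
    w1 = np.zeros(X1.shape[1])
    w2 = np.zeros(X2.shape[1])
    for _ in range(steps):
        # supervised cross-entropy gradients for each language
        g1 = X1.T @ (sigmoid(X1 @ w1) - y1) / len(y1)
        g2 = X2.T @ (sigmoid(X2 @ w2) - y2) / len(y2)
        # agreement term: penalize squared prediction gap on parallel pairs
        q1, q2 = sigmoid(P1 @ w1), sigmoid(P2 @ w2)
        diff = q1 - q2
        g1 += lam * P1.T @ (diff * q1 * (1 - q1)) / len(diff)
        g2 -= lam * P2.T @ (diff * q2 * (1 - q2)) / len(diff)
        w1 -= lr * g1
        w2 -= lr * g2
    return w1, w2

# Toy run on synthetic bag-of-words-style features (entirely made up):
X1 = np.array([[1., 0.], [0., 1.], [1., 0.], [0., 1.]])  # "language 1"
y1 = np.array([1., 0., 1., 0.])
X2 = np.array([[0., 1.], [1., 0.]])                      # "language 2"
y2 = np.array([0., 1.])
P1 = np.array([[1., 0.], [0., 1.]])   # parallel pair sides, language 1
P2 = np.array([[1., 0.], [0., 1.]])   # and their language-2 translations
w1, w2 = train_joint(X1, y1, X2, y2, P1, P2)
```

With `lam=0` this reduces to two independent monolingual baselines; the agreement term is what lets unlabeled parallel data influence both classifiers at once, mirroring the joint setup the abstract describes.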
Original language: English
Title of host publication: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies
Place of Publication: Portland, Oregon
Publisher: The Association for Computational Linguistics
Pages: 320-330
Volume: 1
ISBN (Print): 9781932432879
Publication status: Published - 2011

Citation

Lu, B., Tan, C., Cardie, C., & Tsou, B. K. (2011). Joint bilingual sentiment classification with unlabeled parallel corpora. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (Vol. 1, pp. 320-330). Portland, Oregon: The Association for Computational Linguistics.
