A general framework for learning prosodic-enhanced representation of rap lyrics

Hongru LIANG, Haozheng WANG, Qian LI, Jun WANG, Guandong XU, Jiawei CHEN, Jin-Mao WEI, Zhenglu YANG

Research output: Contribution to journal › Articles › peer-review


Learning and analyzing rap lyrics is a significant basis for many Web applications, such as music recommendation, automatic music categorization, and music information retrieval, due to the abundant source of digital music in the World Wide Web. Although numerous studies have explored the topic, knowledge in this field is far from satisfactory, because critical issues, such as prosodic information and its effective representation, as well as appropriate integration of various features, are usually ignored. In this paper, we propose a hierarchical attention variational autoencoder framework (HAVAE), which simultaneously considers semantic and prosodic features for rap lyrics representation learning. Specifically, the representation of the prosodic features is encoded by phonetic transcriptions with a novel and effective strategy (i.e., rhyme2vec). Moreover, a feature aggregation strategy is proposed to appropriately integrate various features and generate prosodic-enhanced representation. A comprehensive empirical evaluation demonstrates that the proposed framework outperforms the state-of-the-art approaches under various metrics in different rap lyrics learning tasks. Copyright © 2019 Springer Science+Business Media, LLC, part of Springer Nature.

Original language: English
Pages (from-to): 2267-2289
Journal: World Wide Web
Early online date: Feb 2019
Publication status: Published - Nov 2019


Liang, H., Wang, H., Li, Q., Wang, J., Xu, G., Chen, J., Wei, J.-M., & Yang, Z. (2019). A general framework for learning prosodic-enhanced representation of rap lyrics. World Wide Web, 22, 2267-2289. https://doi.org/10.1007/s11280-019-00672-2


  • Representation learning
  • Variational autoencoder
  • Hierarchical attention mechanism
  • Rap lyrics

