SOAC: Supervised off-policy actor-critic for recommender systems

Shiqing WU, Guandong XU, Xianzhi WANG

Research output: Chapter in Book/Report/Conference proceeding › Chapters

4 Citations (Scopus)

Abstract

Improving users' long-term experience in recommender systems (RS) has become a growing concern for recommendation platforms. Reinforcement learning (RL) is an attractive approach because it can plan and optimize long-term returns sequentially. However, directly applying RL as an online learning method in the RS setting can significantly compromise users' satisfaction and experience. As a result, learning the recommendation policy from logged feedback collected under different policies has emerged as a promising direction. Such offline learning lets the agent use off-policy learning techniques, but several challenges, such as distribution shift, must still be addressed. In this paper, we propose a novel RL method, called Supervised Off-Policy Actor-Critic (SOAC), for learning the recommendation policy from logged feedback without exploration. SOAC addresses challenges including distribution shift and extrapolation errors, and focuses on improving the ranking of items in a recommendation list. The experimental results demonstrate that SOAC can achieve better recommendation performance than existing supervised RL methods. Copyright © 2023 IEEE.
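The abstract contrasts online RL with learning from logged feedback collected under a different (logging) policy. A common way to see why distribution shift matters is importance weighting: each logged interaction is re-weighted by how much more or less likely the target policy is to recommend the same item than the logging policy was. The following is a minimal, generic sketch under stated assumptions, not the SOAC objective defined in the paper: it assumes the logging policy's propensity `behavior_prob` is known and the critic's `advantage` is already computed, and the function name `off_policy_actor_loss` and the weight cap `clip` are hypothetical.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over the actor's item scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def off_policy_actor_loss(
    scores: List[float],   # actor's scores for every candidate item in this state
    action: int,           # index of the item actually recommended in the log
    behavior_prob: float,  # ASSUMPTION: logging policy's probability of `action` is known
    advantage: float,      # ASSUMPTION: critic's advantage estimate is already computed
    clip: float = 10.0,    # hypothetical cap on the importance weight
) -> Tuple[float, float]:
    """Return (supervised_loss, actor_loss) for one logged interaction.

    Generic illustration only, not the loss from the SOAC paper.
    """
    probs = softmax(scores)
    log_pi = math.log(probs[action])
    # Importance weight pi(a|s) / beta(a|s), clipped to bound its variance.
    # In an autodiff framework this weight would be detached (treated as a constant).
    weight = min(probs[action] / behavior_prob, clip)
    supervised_loss = -log_pi                  # cross-entropy on the logged item
    actor_loss = -weight * advantage * log_pi  # importance-weighted policy-gradient surrogate
    return supervised_loss, actor_loss
```

For example, `off_policy_actor_loss([2.0, 0.5, -1.0], action=0, behavior_prob=0.2, advantage=1.5)` returns a positive supervised term and a positive weighted actor term. Clipping the weight trades a small bias for bounded variance when the target and logging policies diverge sharply.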

Original language: English
Title of host publication: Proceedings of 23rd IEEE International Conference on Data Mining, ICDM 2023
Editors: Guihai CHEN, Latifur KHAN, Xiaofeng GAO, Meikang QIU, Witold PEDRYCZ, Xindong WU
Place of Publication: Danvers, MA
Publisher: IEEE
Pages: 1421-1426
ISBN (Electronic): 9798350307887
DOI: https://doi.org/10.1109/ICDM58522.2023.00185
Publication status: Published - 2023

Citation

Wu, S., Xu, G., & Wang, X. (2023). SOAC: Supervised off-policy actor-critic for recommender systems. In Chen, G., Khan, L., Gao, X., Qiu, M., Pedrycz, W., & Wu, X. (Eds.), Proceedings of 23rd IEEE International Conference on Data Mining, ICDM 2023 (pp. 1421-1426). IEEE. https://doi.org/10.1109/ICDM58522.2023.00185

Keywords

  • Recommender systems
  • Sequential recommendation
  • Reinforcement learning
  • Off-Policy Actor-Critic
