Abstract
Improving users' long-term experience in recommender systems (RS) has become a growing concern for recommendation platforms. Reinforcement learning (RL) is an attractive approach because it can sequentially plan for and optimize long-term returns. However, directly applying RL as an online learning method in the RS setting can significantly compromise users' satisfaction and experience. As a result, learning the recommendation policy from logged feedback collected under different policies has emerged as a promising direction. Learning offline enables the agent to utilize off-policy learning techniques, but several challenges, such as distribution shift, must be addressed. In this paper, we propose a novel RL method, called Supervised Off-Policy Actor-Critic (SOAC), for learning the recommendation policy from logged feedback without exploration. The proposed SOAC addresses challenges including distribution shift and extrapolation error, and focuses on improving the ranking of items in a recommendation list. Experimental results demonstrate that SOAC achieves better recommendation performance than existing supervised RL methods. Copyright © 2023 IEEE.
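The abstract does not spell out the SOAC update itself, so the following is only a rough orientation: a minimal, generic sketch of an importance-weighted off-policy actor-critic step on logged feedback, written in NumPy. It is not the paper's SOAC algorithm; the uniform logging policy, linear actor and critic, catalogue size, reward model, and all function names are illustrative assumptions.

```python
"""Illustrative sketch only: a generic off-policy actor-critic update on logged
recommendation feedback. This is NOT the SOAC algorithm from the paper; every
name and hyperparameter here is a hypothetical placeholder."""
import numpy as np

rng = np.random.default_rng(0)

N_ITEMS = 20    # size of the candidate item catalogue (assumed)
STATE_DIM = 8   # dimensionality of the user-state embedding (assumed)


def make_logged_batch(batch_size=256):
    """Synthetic stand-in for logged feedback: (state, action, reward, behaviour probs)."""
    states = rng.normal(size=(batch_size, STATE_DIM))
    behaviour_probs = np.full((batch_size, N_ITEMS), 1.0 / N_ITEMS)  # uniform logging policy
    actions = rng.integers(0, N_ITEMS, size=batch_size)
    rewards = rng.binomial(1, 0.3, size=batch_size).astype(float)    # e.g. click / no click
    return states, actions, rewards, behaviour_probs


# Linear softmax actor and linear critic Q(s, a), each a (STATE_DIM x N_ITEMS) weight matrix.
W_actor = np.zeros((STATE_DIM, N_ITEMS))
W_critic = np.zeros((STATE_DIM, N_ITEMS))


def policy(states):
    """Softmax distribution over items given user states."""
    logits = states @ W_actor
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


def train_step(states, actions, rewards, behaviour_probs, lr=0.05, clip=10.0):
    global W_actor, W_critic
    idx = np.arange(len(actions))

    # Critic: regress Q(s, a) towards the observed reward (one-step target).
    q_sa = (states @ W_critic)[idx, actions]
    td_err = rewards - q_sa
    for a in range(N_ITEMS):
        mask = actions == a
        if mask.any():
            W_critic[:, a] += lr * (states[mask] * td_err[mask][:, None]).mean(axis=0)

    # Actor: importance-weighted policy gradient. The ratio rho corrects for the
    # mismatch between the target policy and the logging policy; clipping it is
    # a common way to limit the variance of the off-policy correction.
    probs = policy(states)
    rho = np.clip(probs[idx, actions] / behaviour_probs[idx, actions], 0.0, clip)
    advantage = rewards - q_sa                      # critic value used as a baseline
    one_hot = np.zeros_like(probs)
    one_hot[idx, actions] = 1.0
    grad_logpi = states.T @ ((one_hot - probs) * (rho * advantage)[:, None])
    W_actor += lr * grad_logpi / len(actions)


for _ in range(200):
    train_step(*make_logged_batch())

# Diagnostic only: how concentrated the learned policy is on its top-ranked item.
print("mean max action probability:", policy(make_logged_batch()[0]).max(axis=1).mean())
```

Since the data above is synthetic, the sketch only demonstrates the update mechanics; any real application would plug in actual logged interactions and a learned user-state representation.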
| Original language | English |
| --- | --- |
| Title of host publication | Proceedings of 23rd IEEE International Conference on Data Mining, ICDM 2023 |
| Editors | Guihai CHEN, Latifur KHAN, Xiaofeng GAO, Meikang QIU, Witold PEDRYCZ, Xindong WU |
| Place of Publication | Danvers, MA |
| Publisher | IEEE |
| Pages | 1421-1426 |
| ISBN (Electronic) | 9798350307887 |
| DOIs | https://doi.org/10.1109/ICDM58522.2023.00185 |
| Publication status | Published - 2023 |
Citation
Wu, S., Xu, G., & Wang, X. (2023). SOAC: Supervised off-policy actor-critic for recommender systems. In Chen, G., Khan, L., Gao, X., Qiu, M., Pedrycz, W., & Wu, X. (Eds.), Proceedings of 23rd IEEE International Conference on Data Mining, ICDM 2023 (pp. 1421-1426). IEEE. https://doi.org/10.1109/ICDM58522.2023.00185
Keywords
- Recommender systems
- Sequential recommendation
- Reinforcement learning
- Off-Policy Actor-Critic