MGPolicy: Meta graph enhanced off-policy learning for recommendations

Xiangmeng WANG, Qian LI, Dianer YU, Zhichao WANG, Hongxu CHEN, Guandong XU

Research output: Chapter in Book/Report/Conference proceeding > Chapters

9 Citations (Scopus)

Abstract

Off-policy learning has drawn considerable attention in recommender systems (RS) because it lets reinforcement learning avoid expensive online training. However, off-policy learning from logged data suffers from biases caused by the policy shift between the target policy and the logging policy. Consequently, most off-policy learning methods resort to inverse propensity scoring (IPS), which tends to over-fit to exposed (or recommended) items and thus fails to explore unexposed items.

In this paper, we propose meta graph enhanced off-policy learning (MGPolicy), which is the first recommendation model for correcting the off-policy bias via contextual information. In particular, we explicitly leverage rich semantics in meta graphs for user state representation, and then train the candidate generation model to promote an efficient search in the action space. Moreover, our MGPolicy is designed with counterfactual risk minimization, which can correct policy learning bias and ultimately yield an effective target policy to maximize the long-run rewards for the recommendation. We extensively evaluate our method through a series of simulations and large-scale real-world datasets, achieving favorable results compared with state-of-the-art methods. Our code is currently available online. Copyright © 2022 Association for Computing Machinery.
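
For readers unfamiliar with inverse propensity scoring, the following is a minimal sketch of a generic IPS value estimate over logged interactions. It is not taken from the paper and does not reproduce MGPolicy; the function name `ips_value`, its parameters, and the optional weight clipping are illustrative assumptions. Each logged reward is reweighted by the ratio of the target policy's probability to the logging policy's probability for the logged item.

```python
def ips_value(rewards, logging_probs, target_probs, clip=None):
    """Generic IPS estimate of a target policy's average reward.

    Hypothetical helper for illustration only (not the paper's code).
    rewards[i]       : observed reward for the i-th logged recommendation
    logging_probs[i] : probability the logging policy gave to that item
    target_probs[i]  : probability the target policy gives to that item
    clip             : optional upper bound on the importance weight,
                       a common way to limit variance (an assumption here)
    """
    if not rewards:
        raise ValueError("at least one logged interaction is required")
    total = 0.0
    for r, p_log, p_tgt in zip(rewards, logging_probs, target_probs):
        if p_log <= 0:
            raise ValueError("logging propensities must be positive")
        weight = p_tgt / p_log          # importance weight for this log entry
        if clip is not None:
            weight = min(weight, clip)  # cap large weights
        total += weight * r
    return total / len(rewards)


# Toy logged data (made up for the example).
logged_rewards = [1.0, 0.0, 1.0, 1.0]
logged_logging_probs = [0.5, 0.5, 0.25, 0.1]
logged_target_probs = [0.5, 0.2, 0.5, 0.4]

unclipped = ips_value(logged_rewards, logged_logging_probs, logged_target_probs)
clipped = ips_value(logged_rewards, logged_logging_probs, logged_target_probs, clip=2.0)
```

Only items that appear in the log contribute to the sum, which illustrates why an estimate of this kind carries no signal about unexposed items.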

Original language: English
Title of host publication: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
Place of Publication: New York
Publisher: Association for Computing Machinery
Pages: 1369-1378
ISBN (Electronic): 9781450387323
DOIs: https://doi.org/10.1145/3477495.3532021
Publication status: Published - Jul 2022

Citation

Wang, X., Li, Q., Yu, D., Wang, Z., Chen, H., & Xu, G. (2022). MGPolicy: Meta graph enhanced off-policy learning for recommendations. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 1369-1378). Association for Computing Machinery. https://doi.org/10.1145/3477495.3532021

Keywords

  • Recommendation
  • Off-policy learning
  • Counterfactual risk minimization
  • Bias
