Off-policy learning over heterogeneous information for recommendation

Xiangmeng WANG, Qian LI, Dianer YU, Guandong XU

Research output: Chapter in Book/Report/Conference proceeding › Chapters

3 Citations (Scopus)


Reinforcement learning has recently become an active topic in recommender system research, where logged data recording the interactions between users and items, together with user feedback, is used to learn a policy. Off-policy learning, i.e., policy optimization with access only to logged feedback data, has likewise been a popular research topic in reinforcement learning. However, the log entries are biased: they over-represent actions favored by the recommender system, because user feedback contains only partial information limited to the particular items exposed to the user. As a result, a policy learned from such offline logged data tends to be biased with respect to the true behaviour policy. In this paper, we propose the first off-policy learning method augmented by meta-paths for recommendation. We argue that the heterogeneous information network (HIN), which provides rich contextual information about items and users, could scale the contribution of logged data to unbiased target policy learning. To this end, we develop a new HIN-augmented target policy model (HINpolicy), which explicitly leverages contextual information to scale the generated reward for the target policy. In addition, equipped with the HINpolicy model, our solution adaptively receives HIN-augmented corrections for counterfactual risk minimization, and ultimately yields an effective policy that maximizes long-run rewards for recommendation. Finally, we extensively evaluate our method through a series of simulations and on large-scale real-world datasets, obtaining favorable results compared with state-of-the-art methods. Copyright © 2022 Association for Computing Machinery.
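The abstract does not give HINpolicy's equations, so the following is only a minimal sketch of the generic off-policy machinery it builds on, not the authors' method: a clipped inverse propensity scoring (IPS) estimate of a target policy's reward from logged feedback, and a variance-penalized counterfactual risk objective (loss = negative reward). The names `LoggedEvent`, `ips_value` and `crm_objective`, the clipping constant `clip` and the penalty weight `lam` are hypothetical illustrations; the HIN-based reward scaling is not modeled.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class LoggedEvent:
    context: str       # user/context identifier (hypothetical field)
    action: str        # item shown by the logging (behaviour) policy
    propensity: float  # probability the logging policy gave to `action`
    reward: float      # observed feedback, e.g. 1.0 for a click

# A target policy returns pi(action | context).
Policy = Callable[[str, str], float]

def ips_weights(log: Sequence[LoggedEvent], target: Policy, clip: float) -> List[float]:
    # Importance weight pi(a|x) / p0(a|x), clipped at `clip` to bound variance.
    return [min(clip, target(e.context, e.action) / e.propensity) for e in log]

def ips_value(log: Sequence[LoggedEvent], target: Policy, clip: float = float("inf")) -> float:
    """Clipped IPS estimate of the target policy's expected reward."""
    w = ips_weights(log, target, clip)
    return sum(wi * e.reward for wi, e in zip(w, log)) / len(log)

def crm_objective(log: Sequence[LoggedEvent], target: Policy,
                  clip: float = 10.0, lam: float = 0.1) -> float:
    """Empirical counterfactual risk plus a sample-variance penalty (lower is better)."""
    n = len(log)
    w = ips_weights(log, target, clip)
    losses = [-wi * e.reward for wi, e in zip(w, log)]
    mean = sum(losses) / n
    var = sum((l - mean) ** 2 for l in losses) / (n - 1) if n > 1 else 0.0
    return mean + lam * math.sqrt(var / n)

# Usage: a uniform logging policy over items "a" and "b"; only "a" is clicked.
log = [LoggedEvent("u1", "a", 0.5, 1.0), LoggedEvent("u1", "b", 0.5, 0.0),
       LoggedEvent("u2", "a", 0.5, 1.0), LoggedEvent("u2", "b", 0.5, 0.0)]
always_a: Policy = lambda x, a: 1.0 if a == "a" else 0.0
uniform: Policy = lambda x, a: 0.5
print(ips_value(log, always_a))            # 1.0
print(ips_value(log, uniform))             # 0.5
print(ips_value(log, always_a, clip=1.5))  # 0.75 (clipping trades bias for variance)
print(crm_objective(log, always_a) < crm_objective(log, uniform))  # True
```

Reweighting by the propensity corrects for the logging policy over-representing its own favored items, which is the bias the abstract describes; clipping and the variance penalty keep the estimate stable when the target policy moves far from the logged behaviour.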

Original language: English
Title of host publication: Proceedings of The ACM Web Conference 2022, WWW '22
Place of publication: New York
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450390965
Publication status: Published - Apr 2022


Wang, X., Li, Q., Yu, D., & Xu, G. (2022). Off-policy learning over heterogeneous information for recommendation. In Proceedings of The ACM Web Conference 2022, WWW '22 (pp. 2348-2359). Association for Computing Machinery.


  • Recommendation
  • Off-policy Learning
  • Counterfactual risk minimization
  • Bias
  • Heterogeneous information network

