We study the relay selection problem in information-freshness-oriented two-way relay networks (TWRNs) operated with physical-layer network coding (PNC). Information freshness is quantified by the age of information (AoI), defined as the time elapsed since the generation of the latest received information update. Because PNC introduces mutual wireless interference in TWRNs, relay selection for the users becomes more complicated. To address this problem, this paper formulates relay selection as a multi-armed bandit (MAB) problem in order to dynamically learn the optimal mapping between users and relays. Specifically, the two end users act as agents: they interact with the environment, receive feedback as rewards in the MAB, and use this learning experience to optimize the system-level AoI performance. Simulation results demonstrate that the proposed MAB approach significantly outperforms the conventional relay selection scheme. Copyright © 2023 IEEE.
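The abstract does not state which bandit algorithm or reward definition the paper uses, so the following is only a minimal sketch of the general idea under stated assumptions. It uses epsilon-greedy as a stand-in learning rule, a single agent rather than the two cooperating end users, a hypothetical fixed delivery probability per relay (`success_prob`), and a slotted AoI model in which the age resets to one slot on a successful delivery and grows by one slot otherwise. None of these names or parameters come from the paper, and PNC interference is not modeled.

```python
import random


def simulate_relay_bandit(success_prob, num_slots=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy relay selection (illustrative assumption, not the paper's method).

    Returns (average AoI in slots, per-relay value estimates, per-relay pull counts).
    """
    rng = random.Random(seed)
    num_relays = len(success_prob)
    counts = [0] * num_relays      # times each relay was selected
    values = [0.0] * num_relays    # running mean reward per relay
    age = 1                        # instantaneous AoI in slots
    age_sum = 0

    for _ in range(num_slots):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if rng.random() < epsilon:
            relay = rng.randrange(num_relays)
        else:
            relay = max(range(num_relays), key=lambda r: values[r])

        # Hypothetical channel: delivery succeeds with a fixed per-relay probability.
        delivered = rng.random() < success_prob[relay]
        reward = 1.0 if delivered else 0.0

        # Incremental mean update for the selected relay (arm).
        counts[relay] += 1
        values[relay] += (reward - values[relay]) / counts[relay]

        # AoI: reset to one slot on a fresh delivery, otherwise grow by one slot.
        age = 1 if delivered else age + 1
        age_sum += age

    return age_sum / num_slots, values, counts


if __name__ == "__main__":
    avg_aoi, values, counts = simulate_relay_bandit([0.2, 0.5, 0.9])
    print("average AoI (slots):", avg_aoi)
    print("estimated success rates:", values)
    print("selection counts:", counts)
```

In this toy setting the learner should concentrate its selections on the relay with the highest delivery probability, which in turn keeps the time-averaged AoI low. A reward based directly on the observed AoI, or a different exploration rule such as UCB, would be equally plausible choices here.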
|Title of host publication|Proceedings of 2023 IEEE 20th Consumer Communications & Networking Conference (CCNC)|
|Place of Publication|USA|
|Publication status|Published - 2023|