Joint optimization of model inferencing and task offloading for MEC-empowered large vision model services

  • Xinyi ZHUANG
  • Jiaqi WU
  • Hongjia WU
  • Tingting ZHANG
  • Lin GAO

Research output: Chapter in Book/Report/Conference proceeding › Chapters

3 Citations (Scopus)

Abstract

With the rapid advancement of Large Vision Models (LVMs) such as Sora, the emerging ability of large AI models to comprehend physical laws has garnered significant attention, enabling them to interpret and apply physical principles with increasing accuracy and sophistication. Nevertheless, due to resource limitations and delay constraints, traditional cloud-based LVM services often fail to meet the diverse needs of users, particularly in scenarios requiring real-time responsiveness. In this work, we explore the scenario of Mobile Edge Computing (MEC)-empowered LVM services in wireless networks, where heterogeneous LVMs are deployed on both cloud and edge servers, and LVM Users (LUs) can offload computation tasks to edge servers to reduce delay and energy consumption. In this scenario, we focus on the joint optimization of model inferencing and task offloading for LUs, aiming to maximize the total service utility while minimizing delay and energy consumption. First, to characterize the utility of LVM services, we propose a multi-dimensional video quality metric based on real measurements, which incorporates both prompt-video alignment and classic video quality indicators. Then, to solve the problem in a decentralized manner, we propose a two-stage solution that combines learning and optimization techniques. In the first stage, we design a reinforcement learning-based Multi-Agent Proximal Policy Optimization (MAPPO) approach to make real-time model inferencing and task offloading decisions. In the second stage, we employ optimization-based Sequential Least Squares Programming (SLSQP) to make efficient resource allocation decisions. Simulation results show that our proposed solution outperforms other benchmarks, reducing delay and energy consumption by up to 17.2% and 21.7%, respectively, while increasing service utility by up to 3%. Copyright © 2025 IEEE.
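The abstract's second stage can be illustrated with a minimal sketch using SciPy's SLSQP solver. Everything in the snippet beyond the solver choice is an illustrative assumption, not the paper's actual formulation: the delay and energy models, the parameter names (task_bits, cycles_per_bit, F_total, kappa), and the objective weights are all hypothetical. The sketch simply allocates one edge server's CPU cycles among offloading users to minimize a weighted delay-energy objective under a capacity constraint.

```python
# Minimal sketch of SLSQP-based edge resource allocation (stage two).
# All models and parameters below are illustrative assumptions, not
# the paper's formulation.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
N = 4                                  # number of LUs offloading to one edge server (hypothetical)
task_bits = rng.uniform(1e6, 5e6, N)   # offloaded task sizes in bits (hypothetical)
cycles_per_bit = 1e3                   # CPU cycles required per bit (hypothetical)
F_total = 2e10                         # edge server CPU capacity in cycles/s (hypothetical)
kappa = 1e-27                          # effective switched-capacitance constant (hypothetical)
w_delay, w_energy = 1.0, 0.5           # weights trading off delay against energy

def cost(f):
    """Weighted sum of per-user inference delay and edge CPU energy for allocation f."""
    cycles = task_bits * cycles_per_bit
    delay = cycles / f                 # compute delay per user, decreasing in f
    energy = kappa * cycles * f**2     # dynamic CPU energy per user, increasing in f
    return w_delay * delay.sum() + w_energy * energy.sum()

# Inequality constraint: total allocated CPU must not exceed the server capacity.
constraints = [{"type": "ineq", "fun": lambda f: F_total - f.sum()}]
bounds = [(1e8, F_total)] * N          # keep each allocation strictly positive
f0 = np.full(N, F_total / N)           # start from an equal split

res = minimize(cost, f0, method="SLSQP", bounds=bounds, constraints=constraints)
print("allocated CPU (Gcycles/s):", np.round(res.x / 1e9, 2))
print("objective value:", res.fun)
```

Under this toy model the objective is convex in each allocation, so SLSQP converges quickly from the equal-split start; the paper's first-stage MAPPO decisions would determine which users appear in this allocation problem at each step.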

Original language: English
Title of host publication: Proceedings of IEEE Conference on Computer Communications, INFOCOM 2025
Place of Publication: USA
Publisher: IEEE
ISBN (Electronic): 9798331543051
DOI: https://doi.org/10.1109/INFOCOM55648.2025.11044689
Publication status: Published - 2025

Citation

Zhuang, X., Wu, J., Wu, H., Zhang, T., & Gao, L. (2025). Joint optimization of model inferencing and task offloading for MEC-empowered large vision model services. In Proceedings of IEEE Conference on Computer Communications, INFOCOM 2025. IEEE. https://doi.org/10.1109/INFOCOM55648.2025.11044689
