Action progression networks for temporal action detection in videos

Chong-Kai LU, Man-Wai MAK, Ruimin LI, Zheru CHI, Hong FU

Research output: Contribution to journal › Article › peer-review

Abstract

This study introduces an innovative Temporal Action Detection (TAD) model that is distinguished by its lightweight structure and capability for end-to-end training, delivering competitive performance. Traditional TAD approaches often rely on pre-trained models for feature extraction, sacrificing end-to-end training for efficiency and suffering from task misalignment and data shifts. Our method addresses these challenges by processing untrimmed videos on a snippet basis, enabling a snippet-level TAD model that is trained end-to-end. Central to our approach is a novel frame-level label, termed 'action progressions,' designed to encode temporal localization information. Predicting action progressions not only allows our snippet-level model to incorporate temporal information effectively but also provides a granular temporal encoding of how actions evolve, enhancing detection precision. Beyond a streamlined pipeline, our model offers several novel capabilities: (1) It learns directly from raw videos, unlike prevalent TAD methods that depend on frozen, pre-trained feature extraction models. (2) It can be trained flexibly with both trimmed and untrimmed videos. (3) It is the first TAD model to avoid detecting incomplete actions. (4) It can accurately detect long-lasting actions and actions with clear evolutionary patterns. Leveraging these advantages, our model achieves commendable performance on benchmark datasets, attaining averaged mean Average Precision (mAP) scores of 54.8%, 30.5%, and 78.7% on THUMOS14, ActivityNet-1.3, and DFMAD, respectively. Copyright © 2024 The Author.
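To make the idea of 'action progressions' concrete, the sketch below shows one plausible way to build such frame-level targets from action annotations, assuming progression is defined as a frame's normalized position within an action instance (this definition and the function name are illustrative assumptions; the paper's exact encoding may differ).

```python
import numpy as np

def action_progression_labels(num_frames, segments):
    """Frame-level action-progression targets for one untrimmed video.

    segments: list of (start_frame, end_frame) action instances.
    Frames inside an action receive a progression value rising linearly
    from 0 at the action's start to 1 at its end; background frames
    stay at 0. (Illustrative definition only, not the paper's exact one.)
    """
    labels = np.zeros(num_frames, dtype=np.float32)
    for start, end in segments:
        length = max(end - start, 1)  # avoid division by zero for 1-frame actions
        for t in range(start, min(end + 1, num_frames)):
            labels[t] = (t - start) / length
    return labels

# Example: a 100-frame video with one action spanning frames 20-60.
print(action_progression_labels(100, [(20, 60)])[18:23])
```

A snippet-level model trained to regress such targets would receive a temporal cue at every frame, which is the property the abstract attributes to action progressions.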

Original language: English
Pages (from-to): 126829-126844
Journal: IEEE Access
Volume: 12
Early online date: Aug 2024
DOI: https://doi.org/10.1109/ACCESS.2024.3451503
Publication status: Published - 2024

Citation

Lu, C.-K., Mak, M.-W., Li, R., Chi, Z., & Fu, H. (2024). Action progression networks for temporal action detection in videos. IEEE Access, 12, 126829-126844. https://doi.org/10.1109/ACCESS.2024.3451503

Keywords

  • Action recognition
  • Temporal action detection
  • Video analysis
