Abstract
Facial video-based blood volume pulse (BVP) signal measurement holds great potential for remote health monitoring, but existing methods are limited by the receptive field of convolutional kernels. This paper proposes an end-to-end multi-level constrained spatiotemporal representation structure for facial video-based BVP signal measurement. First, an intra- and inter-subject feature representation is proposed to strengthen the generation of BVP-related features at the high, semantic, and shallow levels. Second, a global-local association is presented to enhance learning of the periodic pattern of the BVP signal: global temporal features are introduced into the local spatial convolution of each frame through adaptive kernel weights. Finally, the multi-dimensional fused features are mapped to one-dimensional BVP signals by a task-oriented signal estimator. Experimental results on the publicly available MMSE-HR dataset demonstrate that the proposed structure outperforms state-of-the-art methods (e.g., AutoHR) in BVP signal measurement, with reductions of 20% and 40% in mean absolute error and root mean squared error, respectively. The proposed structure could be a powerful tool for telemedicine and non-contact heart health monitoring. Copyright © 2023 IEEE.
Original language | English |
---|---|
Pages (from-to) | 3948-3957 |
Journal | IEEE Journal of Biomedical and Health Informatics |
Volume | 27 |
Issue number | 8 |
Early online date | May 2023 |
DOIs | |
Publication status | Published - Aug 2023 |