Brief · PulseAugur

TOOL · arXiv cs.CV English(EN) · 13h

Head-Pose-Aware Visual Speech Recognition with FiLM Modulation

Researchers have proposed HP-VSR-ResFiLM, a framework that improves visual speech recognition (VSR) by explicitly incorporating head-pose information. The method uses a pose-conditioned residual Feature-wise Linear Modulation (FiLM) block to adapt visual features to head orientation, addressing challenges such as geometric distortions and occlusions. In experiments on the LRS2 and LRS3 datasets, it achieved competitive word error rates of 25.0% and 33.2%, respectively, indicating improved robustness in unconstrained VSR scenarios.
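To make the idea of pose-conditioned residual FiLM concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the brief does not give the exact formulation, so this assumes the common FiLM form (a per-channel scale gamma and shift beta predicted from the conditioning input) wrapped in a residual connection, with simple linear projections from a head-pose vector. The function names, the parameter dictionary, and the [yaw, pitch, roll] pose layout are all hypothetical.

```python
def affine(pose, weights, bias):
    """Project a pose vector to one value per feature channel."""
    return [
        sum(p * w for p, w in zip(pose, row)) + b
        for row, b in zip(weights, bias)
    ]


def residual_film(features, pose, params):
    """Pose-conditioned residual FiLM for a single video frame (assumed form).

    features: list of C channel activations for the frame
    pose:     list of P head-pose values, e.g. [yaw, pitch, roll] (assumed layout)
    params:   dict with hypothetical weights "w_gamma" and "w_beta"
              (C rows of P values) and biases "b_gamma" and "b_beta" (C values)
    """
    gamma = affine(pose, params["w_gamma"], params["b_gamma"])  # per-channel scale
    beta = affine(pose, params["w_beta"], params["b_beta"])     # per-channel shift
    # Identity path plus the FiLM-modulated correction gamma * x + beta.
    return [x + g * x + b for x, g, b in zip(features, gamma, beta)]


# Toy parameters: 2 feature channels, 3 pose values.
params = {
    "w_gamma": [[0.1, 0.0, 0.0], [0.0, 0.1, 0.0]],
    "b_gamma": [0.0, 0.0],
    "w_beta": [[0.0, 0.0, 0.0], [0.0, 0.0, 0.5]],
    "b_beta": [0.0, 0.0],
}
features = [1.0, 2.0]

frontal = residual_film(features, [0.0, 0.0, 0.0], params)  # zero pose
turned = residual_film(features, [1.0, 0.0, 0.0], params)   # nonzero yaw
print(frontal, turned)
```

With zero biases, a zero pose vector leaves the features unchanged, so the block only adjusts the representation when the head deviates from that reference pose. That property is one plausible motivation for the residual form, though the brief does not state the authors' reasoning.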

IMPACT Could make visual speech recognition more robust to head-pose variation in real-world, unconstrained video.

FiLM
Matthew Kit Khinn Teng
LRS2
LRS3
HP-VSR-ResFiLM