Figure 2 | EURASIP Journal on Advances in Signal Processing

Figure 2

From: Multi-pose lipreading and audio-visual speech recognition

Block diagram of the AV-ASR system adopted in our experiments. We take this system as a model and introduce a pose normalization block into it to allow multi-pose speech recognition. The lower part corresponds to the lipreading system, where pose normalization is applied after the mouth has been tracked and normalized. Pose normalization can take place in any of the feature spaces produced by the successive steps of the visual feature computation: first the images themselves, then a low-dimensional representation in the frequency domain, and finally the LDA features designed for the classification task. The audio-visual fusion is also adapted to take into account the reliability of the visual stream after pose normalization.
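
As a rough illustration of how fusion could account for the reliability of the visual stream, the sketch below combines per-class log-likelihoods from the two streams with a weighted sum. The caption does not specify the fusion rule, so the weighting scheme, the function names, the `visual_reliability` score in [0, 1], and the `max_visual_weight` cap are all assumptions made for this example and should not be read as the system's actual method.

```python
def fuse_log_likelihoods(audio_ll, visual_ll, visual_reliability,
                         max_visual_weight=0.5):
    """Weighted sum of per-class log-likelihoods from audio and visual streams.

    Assumption: visual_reliability is a hypothetical score in [0, 1] that is
    lower when pose normalization is expected to degrade the visual features.
    """
    if not 0.0 <= visual_reliability <= 1.0:
        raise ValueError("visual_reliability must lie in [0, 1]")
    if len(audio_ll) != len(visual_ll):
        raise ValueError("both streams must score the same set of classes")

    # The visual weight shrinks with reliability; the weights sum to one.
    w_visual = max_visual_weight * visual_reliability
    w_audio = 1.0 - w_visual
    return [w_audio * a + w_visual * v for a, v in zip(audio_ll, visual_ll)]


def best_class(scores):
    """Index of the class with the highest fused score."""
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical scores for two classes: the streams disagree.
audio = [-1.0, -3.0]
visual = [-4.0, -1.0]

unreliable = fuse_log_likelihoods(audio, visual, visual_reliability=0.0)
reliable = fuse_log_likelihoods(audio, visual, visual_reliability=1.0)
```

With an unreliable visual stream the decision follows the audio scores alone, while a fully reliable visual stream is weighted equally with audio under the assumed cap and can change the decision.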
