Figure 1. Standard AV-ASR system.
From: Multi-pose lipreading and audio-visual speech recognition

Structure of an audio-visual ASR system. The upper row corresponds to the audio system, where the features used for speech recognition are extracted and fed to the audio-visual integration block and classifier. The lower row corresponds to the lipreading system: first the mouth is tracked and a sequence of normalized mouth images is extracted, then the visual features are computed and finally used in the audio-visual integration and classification blocks.
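
The caption does not say how the integration block combines the two streams. As a minimal sketch, assuming feature-level (concatenative) fusion and a video frame rate lower than the audio frame rate, the two feature sequences could be aligned and joined as below. The function names `upsample_nearest` and `fuse_features`, the toy feature values, and the choice of fusion strategy are all illustrative assumptions, not taken from the article.

```python
from typing import List, Sequence

Vector = List[float]


def upsample_nearest(frames: Sequence[Vector], target_len: int) -> List[Vector]:
    """Repeat visual frames so the stream has target_len frames.

    Hypothetical helper: nearest-neighbour repetition stands in for
    whatever interpolation a real system would use.
    """
    if not frames:
        raise ValueError("no visual frames")
    n = len(frames)
    return [list(frames[i * n // target_len]) for i in range(target_len)]


def fuse_features(audio: Sequence[Vector], visual: Sequence[Vector]) -> List[Vector]:
    """Assumed feature-level integration: concatenate per-frame vectors."""
    visual_aligned = upsample_nearest(visual, len(audio))
    return [list(a) + v for a, v in zip(audio, visual_aligned)]


# Toy data: 4 audio frames (2-dim) and 2 video frames (1-dim).
audio_feats = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
visual_feats = [[1.0], [2.0]]

# Each fused vector would then be passed to the classifier.
fused = fuse_features(audio_feats, visual_feats)
```

Here each video frame is repeated twice to match the four audio frames, so every fused vector has three dimensions. Other integration schemes, such as combining the scores of separate audio and visual classifiers, would replace `fuse_features` entirely.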