Fig. 3 | EURASIP Journal on Advances in Signal Processing

Fig. 3

From: Free resources for forced phonetic alignment in Brazilian Portuguese based on Kaldi toolkit

Stages for training the neural network (TDNN-F) following Kaldi’s Mini-librispeech recipe. High-resolution, cepstral-normalized MFCCs (40 features instead of 13) and 100-dimensional i-vector features are extracted from a speed- and volume-augmented corpus and used as input to the DNN. Training labels, on the other hand, are provided by a GMM tri-SAT acoustic model. At the top or bottom of each block, there is a reference to a script (boldface) and the directory where resources created by that script are usually stored (normal text). For details on the training pipeline of the tri-SAT and all previous GMM-based acoustic models, the reader is referred to Fig. 2.
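
To make the input dimensions in the caption concrete, the following is a minimal sketch, not Kaldi code, of how 40-dimensional MFCC frames and a 100-dimensional i-vector could be combined into per-frame DNN input. The function names `cepstral_mean_normalize` and `build_dnn_input` are hypothetical. The sketch also assumes a single i-vector per utterance and plain concatenation; an actual recipe may estimate i-vectors online and splice neighboring frames, which is not modeled here.

```python
from typing import List

MFCC_DIM = 40      # high-resolution MFCCs per frame (from the caption)
IVECTOR_DIM = 100  # i-vector dimensionality (from the caption)


def cepstral_mean_normalize(frames: List[List[float]]) -> List[List[float]]:
    """Subtract the per-coefficient mean computed over all frames."""
    if not frames:
        return []
    n = len(frames)
    means = [sum(column) / n for column in zip(*frames)]
    return [[x - m for x, m in zip(frame, means)] for frame in frames]


def build_dnn_input(mfcc: List[List[float]],
                    ivector: List[float]) -> List[List[float]]:
    """Append one utterance-level i-vector to every normalized MFCC frame.

    Assumption (illustration only): one i-vector per utterance and simple
    concatenation, giving 40 + 100 = 140 values per frame.
    """
    if len(ivector) != IVECTOR_DIM:
        raise ValueError(f"expected a {IVECTOR_DIM}-dimensional i-vector")
    for frame in mfcc:
        if len(frame) != MFCC_DIM:
            raise ValueError(f"expected {MFCC_DIM} MFCCs per frame")
    normalized = cepstral_mean_normalize(mfcc)
    return [frame + list(ivector) for frame in normalized]
```

Calling `build_dnn_input` on an utterance with T frames returns T vectors of length 140 under these assumptions.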
