Table 10 Results for the female dataset: cumulative percentage of the differences between the forced-aligned audio and the ground-truth phonemes that fall below each tolerance threshold, in milliseconds (a metric also known as phone boundary)

From: Free resources for forced phonetic alignment in Brazilian Portuguese based on Kaldi toolkit

Cumulative tolerance per threshold

| Toolkit | < 10 ms | < 25 ms | < 50 ms | < 100 ms |
|---|---|---|---|---|
| UFPAlign (HTK) | 31.40% | 63.94% | 88.19% | 97.08% |
| EasyAlign | 36.59% | 78.12% | 94.06% | 98.91% |
| MFA (A) | 39.34% | 75.99% | 87.77% | 95.65% |
| MFA (T&A) | 37.65% | 78.69% | 95.16% | 99.08% |
| UFPAlign (mono) | 47.47% | 87.70% | 97.55% | 99.57% |
| UFPAlign (tri-\(\Delta\)) | 50.44% | 89.88% | 98.34% | 99.62% |
| UFPAlign (tri-LDA) | 47.48% | 89.22% | 98.27% | 99.76% |
| UFPAlign (tri-SAT) | 45.69% | 88.20% | 98.15% | 99.77% |
| UFPAlign (TDNN-F) | 34.41% | 75.94% | 97.61% | 99.87% |

  1. The notations in parentheses after MFA stand for the align-only (A) and train-and-align (T&A) procedures; after UFPAlign, they denote either the underlying toolkit (HTK) or the acoustic model.
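
The cumulative tolerance reported in the table can be illustrated with a minimal sketch. It is not the authors' evaluation script: the function name `cumulative_tolerance`, the use of the absolute difference, boundary times given in seconds, and a one-to-one pairing between predicted and ground-truth boundaries are all assumptions made for illustration, and the boundary times in the example are invented.

```python
from typing import Dict, Sequence


def cumulative_tolerance(
    predicted_s: Sequence[float],
    reference_s: Sequence[float],
    thresholds_ms: Sequence[float] = (10, 25, 50, 100),
) -> Dict[float, float]:
    """Percentage of boundaries whose error is below each threshold.

    Hypothetical helper: assumes boundary i of the forced alignment
    corresponds to boundary i of the ground truth, with times in seconds.
    """
    if len(predicted_s) != len(reference_s):
        raise ValueError("boundary lists must have the same length")
    if not predicted_s:
        raise ValueError("at least one boundary is required")

    # Assumption: the error is the absolute time difference, in milliseconds.
    errors_ms = [abs(p - r) * 1000.0 for p, r in zip(predicted_s, reference_s)]

    # Each threshold counts every boundary strictly below it, so the
    # percentages can only grow as the threshold grows (cumulative).
    return {
        t: 100.0 * sum(e < t for e in errors_ms) / len(errors_ms)
        for t in thresholds_ms
    }


# Invented boundary times (seconds); errors are roughly 5, 20, 60 and 120 ms.
predicted = [0.105, 0.230, 0.410, 0.700]
reference = [0.100, 0.250, 0.350, 0.580]
result = cumulative_tolerance(predicted, reference)
print(result)
```

Under these assumptions, each row of the table would correspond to one call of such a function over all boundaries produced by a given aligner, which is why the values in a row never decrease from left to right.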