
Table 11 Results for the male dataset: cumulative percentage of phone boundary differences (the differences between the forced-aligned audio and the ground-truth phonemes) that fall below each tolerance threshold, in milliseconds

From: Free resources for forced phonetic alignment in Brazilian Portuguese based on Kaldi toolkit

| Toolkit / Cumulative tolerance | < 10 ms | < 25 ms | < 50 ms | < 100 ms |
|---|---|---|---|---|
| UFPAlign (HTK) | 30.73% | 62.45% | 86.55% | 96.42% |
| EasyAlign | 31.53% | 67.51% | 89.69% | 96.95% |
| MFA (A) | 32.81% | 64.85% | 78.49% | 90.61% |
| MFA (T&A) | 45.12% | 83.34% | 97.23% | 99.66% |
| UFPAlign (mono) | 43.51% | 83.42% | 96.29% | 99.42% |
| UFPAlign (tri-\(\Updelta\)) | 46.28% | 85.55% | 97.13% | 99.74% |
| UFPAlign (tri-LDA) | 43.49% | 84.50% | 97.19% | 99.74% |
| UFPAlign (tri-SAT) | 42.14% | 83.51% | 97.19% | 99.78% |
| UFPAlign (TDNN-F) | 32.02% | 70.62% | 96.65% | 99.94% |

  1. For MFA, the notations stand for the align-only (A) and train-and-align (T&A) procedures; for UFPAlign, they denote either the underlying toolkit or the acoustic model
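
As a rough illustration of how a cumulative tolerance figure like those in the table can be computed, the sketch below takes paired boundary timestamps and reports the percentage of absolute differences that fall strictly below each threshold. The function name `cumulative_tolerance`, the one-to-one pairing of predicted and reference boundaries, and the use of the absolute difference are assumptions made for this example; the source does not describe the authors' actual evaluation script.

```python
from typing import Dict, Sequence


def cumulative_tolerance(
    predicted_ms: Sequence[float],
    reference_ms: Sequence[float],
    thresholds_ms: Sequence[float] = (10, 25, 50, 100),
) -> Dict[float, float]:
    """Percentage of phone boundaries whose error is below each threshold.

    Hypothetical helper: assumes predicted_ms[i] and reference_ms[i] refer
    to the same phone boundary, both expressed in milliseconds.
    """
    if len(predicted_ms) != len(reference_ms):
        raise ValueError("predicted and reference boundaries must be paired")
    if len(predicted_ms) == 0:
        raise ValueError("at least one boundary pair is required")

    # Absolute difference between aligned and ground-truth boundary times.
    errors = [abs(p - r) for p, r in zip(predicted_ms, reference_ms)]

    # For each threshold, count the errors strictly below it ("< t ms").
    return {
        t: 100.0 * sum(e < t for e in errors) / len(errors)
        for t in thresholds_ms
    }


if __name__ == "__main__":
    # Toy data, not taken from the paper: errors are 5, 12, 30 and 75 ms.
    predicted = [105.0, 212.0, 330.0, 475.0]
    reference = [100.0, 200.0, 300.0, 400.0]
    for threshold, pct in cumulative_tolerance(predicted, reference).items():
        print(f"< {threshold} ms: {pct:.2f}%")
```

Because the thresholds are cumulative, each percentage is at least as large as the one for the next smaller threshold, which matches the left-to-right increase in every row of the table.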