Table 5 Cross-word mismatches between transcriptions manually aligned by a phonetician (top) and those generated by our G2P software (bottom)

From: Free resources for forced phonetic alignment in Brazilian Portuguese based on Kaldi toolkit

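(a) “nada como um almoço ao ar livre” \(\rightarrow\) “nada como um almoçoar livre”

| Transcription | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| Phonetician | a | w | m | o | s | O | \(\varnothing\) | \(\varnothing\) | a | h/ |
| G2P | a | w | m | o | s | u | a | w | a | X |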

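(b) “paira um ar de arara rara no rio” \(\rightarrow\) “pairum ar de arara rara no rio”

| Transcription | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| Phonetician | p | a | j | 4 | \(\varnothing\) | \(\texttt{u}\sim\) | m | a | h/ |
| G2P | p | a | j | r | a | \(\texttt{u}\sim\) | \(\varnothing\) | a | X |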

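(c) “o baile inicia às nove e meia” \(\rightarrow\) “o baile inicia às novi meia”

| Transcription | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| Phonetician | 6 | \(\varnothing\) | Z | n | O | v | i | \(\varnothing\) |
| G2P | a | j | s | n | O | v | i | i |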

  
  1. The empty-set symbol (\(\varnothing\)) marks word-boundary losses, which typically occur in spoken language rather than in text. It also marks the deletion or addition of phonetic tokens that can later be merged into one (/u\(\sim\)/ /m/ \(\rightarrow\) /u\(\sim\)/) or split into two or more (/6/ /Z/ \(\rightarrow\) /a/ /j/ /s/), respectively.
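
As an illustration of how the padded rows can be read, the sketch below compares the two transcriptions of example (a) column by column and lists the positions where they disagree. It is a minimal sketch and not part of the paper's released tools: the function names `mismatches` and `strip_padding`, and the use of the string `"∅"` to stand for \(\varnothing\), are assumptions made here for illustration only.

```python
# Hypothetical helpers for reading the table; not the authors' software.
EMPTY = "∅"  # assumed stand-in for the empty-set padding symbol


def mismatches(reference, hypothesis):
    """Return (position, reference token, hypothesis token) for each differing column."""
    if len(reference) != len(hypothesis):
        raise ValueError("rows must be padded to the same length")
    return [
        (i, ref, hyp)
        for i, (ref, hyp) in enumerate(zip(reference, hypothesis))
        if ref != hyp
    ]


def strip_padding(tokens):
    """Drop the padding symbol to recover the plain phoneme sequence."""
    return [t for t in tokens if t != EMPTY]


# Example (a): phonetician (top row) versus G2P (bottom row).
phonetician = ["a", "w", "m", "o", "s", "O", EMPTY, EMPTY, "a", "h/"]
g2p = ["a", "w", "m", "o", "s", "u", "a", "w", "a", "X"]

for position, ref, hyp in mismatches(phonetician, g2p):
    print(position, ref, hyp)
```

With the padding in place both rows have the same number of columns, so a plain position-wise comparison is enough; `strip_padding` recovers the shorter sequence the phonetician actually transcribed.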