Table 3 VoxCeleb1 and VoxCeleb2 dataset

From: Text-independent speaker recognition based on adaptive course learning loss and deep residual network

| Layer name | Kernel size | Strides | Output size |
|---|---|---|---|
| Conv1 | 7×7, 32 | 1×1 | (32, 64, 200) |
| Res1 | \(\left[\begin{array}{l} 3\times 3, 32 \\ 3\times 3, 32 \end{array}\right] \times 3\) | 1×1 | (32, 64, 100) |
| Conv2 | 1×1, 64 | 2×2 | (64, 32, 100) |
| Res2 | \(\left[\begin{array}{l} 3\times 3, 64 \\ 3\times 3, 64 \end{array}\right] \times 4\) | 1×1 | (64, 32, 100) |
| Conv3 | 1×1, 128 | 2×2 | (128, 16, 50) |
| Res3 | \(\left[\begin{array}{l} 3\times 3, 128 \\ 3\times 3, 128 \end{array}\right] \times 6\) | 1×1 | (128, 16, 50) |
| Conv4 | 1×1, 256 | 2×2 | (256, 8, 25) |
| Res4 | \(\left[\begin{array}{l} 3\times 3, 256 \\ 3\times 3, 256 \end{array}\right] \times 3\) | 1×1 | (256, 8, 25) |
| Reshape | - | - | (2048, 25) |
| CASP | 1×512 | 1×1 | (512) |
| FC | - | - | (512) |
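
The output sizes in the table can be checked with simple stride arithmetic. The sketch below is only an illustration and rests on two assumptions that the table does not state: the input is a single-channel (1, 64, 200) feature map, and every convolution is padded so that only its stride changes the spatial size. The names `conv_out`, `LAYERS`, and `trace_shapes` are hypothetical helpers, not part of the original work.

```python
# Shape bookkeeping for the layer table.
# Assumption: "same"-style padding, so only the stride shrinks H and W.
# Assumption: the network input has shape (1, 64, 200).

def conv_out(shape, channels, stride):
    """Return (channels, H // stride, W // stride) for a (C, H, W) input."""
    _, height, width = shape
    return (channels, height // stride, width // stride)

# (layer name, output channels, stride), copied from the table.
LAYERS = [
    ("Conv1", 32, 1), ("Res1", 32, 1),
    ("Conv2", 64, 2), ("Res2", 64, 1),
    ("Conv3", 128, 2), ("Res3", 128, 1),
    ("Conv4", 256, 2), ("Res4", 256, 1),
]

def trace_shapes(input_shape=(1, 64, 200)):
    """Propagate a (C, H, W) shape through the stack and the final reshape."""
    shapes = {}
    shape = input_shape
    for name, channels, stride in LAYERS:
        shape = conv_out(shape, channels, stride)
        shapes[name] = shape
    channels, height, width = shape
    # Reshape folds the channel and height axes together: (C * H, W).
    shapes["Reshape"] = (channels * height, width)
    return shapes

if __name__ == "__main__":
    for name, shape in trace_shapes().items():
        print(name, shape)
```

Under these assumptions the trace reproduces the listed sizes from Conv2 onward, including the reshape to (2048, 25) as 256 × 8 channels by 25 frames. For Res1 it yields (32, 64, 200), whereas the table lists (32, 64, 100); the sketch does not try to resolve that difference.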