From: Spatial and temporal learning representation for end-to-end recording device identification
SFENN: DNN | SFENN: CNN | SFENN: ResNet | TFENN: LSTM | TFENN: Bi-LSTM |
---|---|---|---|---|
Input (2496) | Input (64*39*1) | Input | Input | Input |
FC (1024) | Conv (6*5*5) | BN+ReLU | LSTM (39) | Bi-LSTM (39) |
FC (1024) | pooling | Conv (16*3*3) | LSTM (78) | Bi-LSTM (78) |
FC (1024) | Conv (16*5*5) | BN+ReLU | FC (1024) | FC (1024) |
 | pooling | Conv (16*3*3) |  |  |
 | Conv (40*5*5) | BN+ReLU |  |  |
 | pooling | Conv (16*3*3) |  |  |
 | FC (1024) | BN+ReLU |  |  |
 |  | Conv (16*3*3) |  |  |
 |  | pooling (Input) |  |  |
 |  | Add() |  |  |
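
The layer sizes in the DNN and CNN columns can be checked with a little arithmetic. The following is a minimal sketch, not the authors' implementation. It assumes that each Conv entry reads as (filters * kernel height * kernel width), that convolutions use stride 1 without padding, that every pooling layer is a 2*2 window with stride 2, and that FC layers carry bias terms. None of these settings is stated in the table, and the helper names (`conv_out`, `cnn_shapes`, `dnn_params`) are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one axis for a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def cnn_shapes(height=64, width=39, channels=1):
    """Trace (height, width, channels) through the CNN column of the table."""
    shapes = [(height, width, channels)]
    for filters in (6, 16, 40):
        # Conv with a 5*5 kernel (assumed: stride 1, no padding)
        height, width = conv_out(height, 5), conv_out(width, 5)
        shapes.append((height, width, filters))
        # Pooling (assumed: 2*2 window, stride 2)
        height = conv_out(height, 2, stride=2)
        width = conv_out(width, 2, stride=2)
        shapes.append((height, width, filters))
    return shapes


def dnn_params(n_input=64 * 39, widths=(1024, 1024, 1024)):
    """Count weights and biases of the three FC (1024) layers in the DNN column."""
    total = 0
    for n_out in widths:
        total += n_input * n_out + n_out  # weight matrix plus bias vector (assumed)
        n_input = n_out
    return total


if __name__ == "__main__":
    for shape in cnn_shapes():
        print(shape)
    print("DNN parameters:", dnn_params())
```

Under these assumptions the DNN input of 2496 values is the flattened 64*39 CNN input, and the last pooling layer of the CNN leaves a 4*1*40 feature map, i.e. 160 values feeding FC (1024). Different padding or pooling settings would change these intermediate shapes.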