An efficient pruning scheme of deep neural networks for Internet of Things applications

EURASIP Journal on Advances in Signal Processing

Table 1 Notations and their definitions

Notation	Definition
L	The number of convolutional layers
p	The overall pruning rate for all channels
\(C^{l}\)	The original number of channels in layer \(l\), \(1\le l\le L\)
\(\mathbf {w}^{l}_{k}\)	The convolutional kernel, \(\mathbf {w}^{l}_{k}\in \mathbb {R}^{C^{l-1}\times 3\times 3}\), \(1\le l\le L\), \(1\le k \le C^{l}\)
\(H^{l}\)	The height of each channel in layer \(l\), \(1\le l\le L\)
\(W^{l}\)	The width of each channel in layer \(l\), \(1\le l\le L\)
\(\mathbf {z}^{l}_{k}\)	The feature map (channel), \(\mathbf {z}^{l}_{k}\in \mathbb {R}^{H^{l}\times W^{l}}\), \(1\le l\le L\), \(1\le k \le C^{l}\)
\(\mathbf {f}^{l}\)	The feature saliency of layer \(l\), \(\mathbf {f}^{l}\in \mathbb {R}^{C^{l}}\), with entries \(f^{l}_{k}\in \mathbf {f}^{l}\), \(1\le k\le C^{l}\)
\([a^{l}]_{i}\)	The remaining channels in each layer in the \(i\)-th training epoch, \(1\le l\le L\)
\(\Theta ^{l}_{k}\)	The significance evaluation of channel \(k\) in layer \(l\) with respect to a single mini-batch, \(1\le l\le L\), \(1\le k\le C^{l}\)
\([\xi^{l}]_{i}\)	The significance evaluation of layer \(l\) in the \(i\)-th training epoch, \(1\le l\le L\)
J	The loss function used to measure the difference between the observed values and the actual values
ε	The smoothing factor
s	The proportion of channels to be redistributed
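The shapes implied by this notation can be checked with a small sketch. It is a minimal illustration under stated assumptions, not the authors' implementation. It assumes \(C^{0}\) denotes the number of input channels (e.g. 3 for RGB images), which the table does not define. It also interprets the overall pruning rate p as the fraction of all \(\sum_{l} C^{l}\) channels that are removed. The names `ConvLayer`, `build_layers`, and `channel_budget` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConvLayer:
    """Shapes of layer l in the notation of Table 1 (hypothetical helper)."""
    in_channels: int   # C^{l-1}; for l = 1 this is the input channel count (assumed C^0)
    out_channels: int  # C^l, also the length of the saliency vector f^l
    height: int        # H^l
    width: int         # W^l

    def kernel_shape(self):
        # Each kernel w^l_k lies in R^{C^{l-1} x 3 x 3}.
        return (self.in_channels, 3, 3)

    def feature_map_shape(self):
        # Each channel z^l_k lies in R^{H^l x W^l}.
        return (self.height, self.width)

    def weight_count(self):
        # C^l kernels, each with C^{l-1} * 3 * 3 weights (biases ignored).
        return self.out_channels * self.in_channels * 3 * 3


def build_layers(channels, input_channels, spatial):
    """channels[l-1] = C^l and spatial[l-1] = (H^l, W^l) for l = 1..L."""
    layers = []
    prev = input_channels  # assumed C^0
    for c, (h, w) in zip(channels, spatial):
        layers.append(ConvLayer(prev, c, h, w))
        prev = c
    return layers


def channel_budget(channels, p):
    """Total channels kept across all L layers, assuming p is the
    fraction of all sum_l C^l channels that are pruned."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return round((1.0 - p) * sum(channels))
```

For example, with `channels = [64, 64, 128]`, `input_channels = 3`, and p = 0.5, the first layer's kernels have shape (3, 3, 3) and 128 of the 256 channels are kept. How these kept channels are spread across layers (the role of \([\xi^{l}]_{i}\) and s) is left to the paper and not modeled here.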