An efficient pruning scheme of deep neural networks for Internet of Things applications
EURASIP Journal on Advances in Signal Processing volume 2021, Article number: 31 (2021)
Abstract
Nowadays, deep neural networks (DNNs) have been rapidly deployed to realize a number of functionalities such as sensing, imaging, classification, and recognition. However, the computation-intensive nature of DNNs makes them difficult to apply on resource-limited Internet of Things (IoT) devices. In this paper, we propose a novel pruning-based paradigm that aims to reduce the computational cost of DNNs by uncovering a more compact structure and learning the effective weights therein, without compromising the expressive capability of DNNs. In particular, our algorithm achieves efficient end-to-end training that directly transfers a redundant neural network to a compact one with a specifically targeted compression rate. We comprehensively evaluate our approach on various representative benchmark datasets and compare it with typical advanced convolutional neural network (CNN) architectures. The experimental results verify the superior performance and robust effectiveness of our scheme. For example, when pruning VGG on CIFAR-10, our proposed scheme is able to reduce its FLOPs (floating-point operations) and number of parameters by 76.2% and 94.1%, respectively, while still maintaining a satisfactory accuracy. To sum up, our scheme could facilitate the integration of DNNs into the common machine-learning-based IoT framework and establish distributed training of neural networks in both cloud and edge.
1 Introduction
The Internet of Things (IoT), which aims to integrate the physical world by collecting and sharing information [1, 2], has been widely used in various areas, including smart city [3], smart transportation [1], smart home [4], and smart agriculture [5]. Moreover, extensive applications with IoT devices generate a large amount of data, which creates an incentive to utilize data-driven deep neural networks (DNNs) to extract accurate information [6]. For example, a large amount of biomedical data such as medical images can be recognized by a convolutional neural network (CNN) to monitor human health [7]. Moreover, CNNs have been widely used to process image data on IoT devices such as wireless sensor cameras [6, 8] and smartphones [7]. The authors in [3] propose several CNN-based applications in typical IoT scenarios, such as recognizing garbage images for waste management and monitoring parking spaces for smart parking lot management. Statistical evidence in [3] also shows that the CNN is one of the most extensively applied deep learning models for various IoT applications.
However, though it is generally believed that neural networks need to be complicated enough to represent real-world target objects [9, 10], deep convolutional neural networks, usually with huge overhead and complexity in both storage and computing, are very difficult to apply directly to resource-limited IoT devices [2]. It is therefore essential to reduce the model size of CNNs. To address this issue, previous work primarily focuses on reducing the computational overhead and storage cost of DNNs by carefully designing the network architecture, e.g., VGG [11], GoogLeNet [12], ResNet [13], and MobileNet [14], with regard to complex CNNs for processing images. Besides, network pruning is often adopted to compress the deep neural network itself by removing unimportant inter-layer connections [15, 16], neurons, or entire channels in CNNs [17–19]. An intuitive overview is depicted in Fig. 1.
Pruning at the scale of kernels in the convolutional layer, called filter-level pruning or channel pruning, has been extensively studied and has achieved exciting results, with huge reductions in computation and negligible loss in accuracy [17–20]. However, these pruning schemes generally follow the basic three-stage procedure (as shown in Fig. 2), i.e., training a redundant network from scratch, pruning it, and retraining it for accuracy recovery [15]. This procedure is cumbersome and time-consuming, especially for resource-limited IoT devices, leading to a huge gap between theoretical performance and practical applications. Therefore, improving the traditional pruning-based DNN compression process remains a critical concern before it can be applied efficiently in practical IoT scenarios.
To cope with the aforementioned issue, we put forward a more concise and lightweight deep learning scheme to reveal a compact CNN structure in a more efficient manner. The typical process of our proposed scheme is depicted in Fig. 2. Specifically, the proposed strategy can be divided into two phases, i.e., structure learning and weight learning, and the latter functions in the same way as the conventional training process. During structure learning, we focus on evaluating the significance of each channel and unveiling a compact yet effective structure. To achieve this objective, we propose to evaluate the channels’ significance by the Taylor criterion introduced in [17] and to redistribute the remaining channels, which stems from the weight redistribution proposed in [21, 22]. The Taylor-expansion criterion aims to discover those channels whose removal has a larger impact on the final loss. However, such a criterion is only calculated on the basis of a single mini-batch. In order to obtain the significance of all channels over the entire dataset, we propose a long-term assessment variable called feature saliency, which is computed as the moving average of each batch’s evaluation criterion. Simultaneously, considering the common finding that layers are not equally important in a deep neural network [18], we prefer to allocate more channels to sensitive layers, namely pruning fewer parameters in these layers. To achieve this goal, we extend the original weight-redistribution algorithm [21] to convolutional kernels and call it channel redistribution. Inspired by the basic weight-redistribution steps suggested in [21], we first calculate the saliency of the different layers, then temporarily remove a certain proportion of channels in each layer, and finally redistribute the removed channels according to the layer-wise saliency. We summarize the novel channel-redistribution algorithm in Fig. 3.
Next, we have to remove the surplus channels with their corresponding kernels and train the preserved weights after obtaining the compact structure.
It is noteworthy that the model trained at the final stage (i.e., weight learning) is much smaller than the original one in terms of both computational cost and number of parameters, implying that the training process is relatively fast. In other words, the time-consuming training of a large neural network in traditional pruning methods can be avoided in our proposed scheme. The process of learning the compact structure also addresses the problem of how to design an efficient DNN, namely determining an appropriate neural network structure for resource-stringent IoT devices. On the other hand, there is also research on mitigating the time-consuming training of DNNs for IoT applications by introducing a distributed architecture [2, 6, 23, 24]. A typical distributed learning process for IoT consists of training a redundant deep neural network at the cloud computing servers and then pushing it to edge nodes. Our scheme can further bridge the connection between redundant and compact neural networks at cloud and edge nodes, as depicted in Fig. 4. Considering an application scenario where a large DNN model has been trained at a cloud server, we can retrain and prune it to obtain compact networks that are more suitable for specific IoT tasks. Compared with directly training a compact neural network from scratch, our proposed scheme transfers the knowledge of the original neural network and is able to achieve better performance, e.g., faster convergence and higher efficiency.
Our major contribution can be summarized as follows.

Inspired by previous work on pruning [17, 21], we propose a novel training strategy for learning compact and efficient neural networks. The proposed scheme achieves comparatively good performance with significantly reduced model size and computational complexity and with negligible accuracy loss. Compared with the traditional pruning-based DNN compression methods, our scheme is more concise and realizes end-to-end DNN compression. Moreover, our scheme also overcomes the dilemma of designing neural network structures through adaptive structure learning.

We incorporate our lightweight scheme into common IoT applications and establish a novel paradigm for applying DNNs to IoT scenarios with resource constraints yet heavy tasks. The proposed paradigm is also capable of migrating large deep neural networks to edge computing nodes through compression and retraining, which makes it easier to adapt them to specific edge tasks.

We conduct extensive experiments on various standard benchmark datasets, including CIFAR-10 [25] and ILSVRC-12 [9], and compare with well-recognized advanced CNN architectures, including VGG [11], ResNet [13], and MobileNet [14]. The experimental results verify the effectiveness of our scheme.
The remainder of this paper is organized as follows. Section 2 provides the necessary background on deep neural networks and formulates the DNN-based IoT application scenario. Section 3 gives the details of our proposed pruning scheme in terms of mathematical formulation and algorithm, while Section 4 presents the detailed experimental results. Finally, Section 5 summarizes the paper and offers future directions.
2 Background of CNN pruning
2.1 DNN-powered IoT
As mentioned before, the large amount of data produced by IoT devices promotes the application of data-driven deep neural networks to automatically extract useful representations from raw data [2, 6]. Among many deep learning methods, the CNN has been extensively used to process two-dimensional data and is further applied to IoT devices, such as smart wireless cameras [6, 8], or applications [3, 6–8, 26]. Typically, a CNN, being composed of convolutional layers, pooling layers, and fully connected layers (as shown in Fig. 5), has a large number of parameters and a huge computational overhead that limits its extensive application on resource-constrained IoT devices. Therefore, reducing the complexity of CNNs has become an imperative research topic, and pruning is one popular means of doing so.
2.2 Related works on pruning
Unstructured pruning. Early works generally focus on pruning deep neural networks by removing redundant weights according to their magnitude [15, 16]. However, in order to obtain the significance of the various weights, they have to start by training a redundant neural network in advance. In addition, the weights to be pruned are determined by rigidly setting a global magnitude threshold for the whole deep neural network. Later work [27] proposes to improve the traditional pruning process by selectively learning the weights with greater impact on the loss while discarding the others by cutting off their gradient flow. Moreover, both [22] and [21] propose a smoother way, namely redistributing the remaining weights, in order to obtain a proper compact structure instead of setting a threshold. They both suggest allocating more weights to the sensitive layers, although the detailed approaches differ. Our scheme aims to extend their work to another kind of redistribution at the scale of the convolutional kernel, namely channel-wise redistribution for structured pruning.
Structured pruning. Because weight pruning does not significantly reduce the computational load, researchers have begun to pay attention to large-scale pruning, i.e., filter pruning or channel pruning. Specifically, both [28] and [19] introduce an extra “Group LASSO” loss to compel some kernels or the corresponding weights in batch-normalization layers [29] to zero and prune them at the end of each training run. In addition, the work in [30] introduces a discrimination-aware loss to keep channels that contribute to the discriminative power of neural networks. Some other methods propose to prune channels by optimizing the formulation of the reconstruction error [20, 31], reducing the similarity between features [32, 33], or directly evaluating channels’ significance [17, 18]. Our algorithm is based on the evaluation of channel saliency as well. Furthermore, some recent pruning methods introduce advanced machine-learning-based approaches, such as meta-learning [34] and generative adversarial learning [35], which also achieve remarkable results.
Pruning with new paradigms. Almost none of the aforementioned methods deviates from the three basic steps of pruning, that is, training an over-parameterized neural network, pruning it, and fine-tuning it. Based on the argument in [36] that a compact DNN model trained from scratch can reach competitive performance compared with its redundant counterpart, the traditional pruning strategy may be too time-consuming and outdated, and thereby not suitable for the cloud-to-edge distributed computing architecture for IoT applications. Recent work such as [37] introduces a novel pruning strategy, namely soft pruning, that temporarily removes unimportant kernels but keeps updating them during training. Moreover, the paper [38] proposes to prune the model from scratch on the basis of random initialization. The method in [38] finds a compact structure by introducing a group LASSO loss on the batch-normalization layers, in the same way as network slimming [19]. However, our scheme differs in that we are inspired by the works in [17] and [21] and design a completely different structure-learning algorithm that evaluates channels’ importance and redistributes channels accordingly. In addition, in contrast to [38], all the parameters of the neural network are also updated simultaneously during the structure-learning process, since some prior weights from the training of large neural networks are still effective for training the compact counterparts, which is better than random initialization. Furthermore, some other pruning methods [39] propose to learn an efficient structure by automatic search that functions in a similar way to Neural Architecture Search (NAS) [40]. In fact, many NAS schemes [41–43] aim to find a proper structure with excellent performance on specific datasets.
However, NAS-based schemes require much more computing resources and data to search for connections between neurons or convolutional channels from scratch, while pruning-oriented schemes, based on existing models, aim to reduce the complexity by searching over a smaller space with less resource overhead, and are therefore more suitable for IoT terminal deployment.
2.3 Potential applications in IoT scenarios
In order to reduce both the training and inference cost of DNNs, previous works take into account the cloud-and-edge computing architecture for data-heavy IoT applications and propose a distributed computing paradigm [2, 6, 23, 24]. As illustrated in Fig. 4, one may regard our proposed paradigm as a supplement to the original architecture: we improve the conventional process of copying the parameters from the cloud to the edge by introducing an efficient retraining scheme with structure learning and weight optimization, thereby making the model adaptable to personalized IoT applications as well as reducing the redundant parameters and computational overhead.
3 Methods
3.1 Notations
We first introduce the notation used throughout the paper. Suppose we have a deep neural network with L convolutional layers. We use \(\mathbf {w}^{l}_{k}\) and \(\mathbf {z}^{l}_{k}\) to represent a convolutional kernel and the corresponding individual output channel of the lth convolutional layer, respectively. The subscript k∈[1,⋯,C^{l}] represents the channel index, where C^{l} indicates the total number of output channels in the corresponding layer. We further use H^{l} and W^{l} to indicate the height and width of channels in the lth layer, respectively. Pruning the kth channel in layer l signifies removing the corresponding kernel \(\mathbf {w}^{l}_{k}\). Moreover, we define f^{l} to represent the long-term evaluation of channels, i.e., feature saliency, \(\mathbf {f}^{l}\in \mathbb {R}^{C^{l}}\). Overall, we summarize all notations in Table 1.
At the beginning of training, each layer retains the same proportion of channels, which is controlled by the pruning rate p. These preserved channels will be adaptively redistributed at the end of each training epoch. For brevity, the preserved channels are called activated channels. We further define [a^{l}]_{i} to represent the number of activated channels in the lth layer, where the subscript i refers to the training epoch. The initialized values of [a^{l}]_{i} are
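Assuming that "the same proportion" means each layer initially keeps a fraction \(1-p\) of its \(C^{l}\) channels, and that the result is rounded down (the rounding rule is an assumption made here), the initialization can be written as

\[
[a^{l}]_{0} = \left\lfloor (1-p)\, C^{l} \right\rfloor, \qquad l = 1, \ldots, L. \tag{1}
\]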
3.2 Criterion of channel significance
In order to evaluate the channels’ saliency, we adopt a criterion based on the Taylor expansion [17]. Considering a mini-batch B={X={x_{1},x_{2},...,x_{m}},Y={y_{1},y_{2},...,y_{m}}}, the final loss on the batch B can be defined as J(B,W), where W represents the network parameters. Suppose a kernel \(\mathbf {w}^{l}_{k}\) with its activation \(\mathbf {z}^{l}_{k}\) is removed; the corresponding impact on the cost function J can be expressed as
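Following the formulation of [17], and written in the notation above as a sketch rather than the exact typeset form, this impact is the absolute change in loss when the channel output is set to zero:

\[
\left|\Delta J\left(\mathbf{z}^{l}_{k}\right)\right| = \left| J\left(B, \mathbf{z}^{l}_{k}=0\right) - J\left(B, \mathbf{z}^{l}_{k}\right) \right|. \tag{2}
\]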
We use the Taylor series to expand the cost function at point \(\mathbf {z}^{l}_{k}=0\)
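A first-order expansion in the style of [17], stated here under the assumption that the higher-order remainder is denoted \(R_{1}\), reads

\[
J\left(B, \mathbf{z}^{l}_{k}=0\right) = J\left(B, \mathbf{z}^{l}_{k}\right) - \frac{\partial J}{\partial \mathbf{z}^{l}_{k}}\, \mathbf{z}^{l}_{k} + R_{1}\left(\mathbf{z}^{l}_{k}=0\right). \tag{3}
\]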
Ignoring the higher-order remainder and substituting (3) into (2), we have
The criterion can be regarded as a measure of the significance of feature maps for a single-entry mini-batch. For a channel with multivariate output, the item \(\Theta ^{l}_{k}\) can be rewritten as
where M is the total number of entries in the channel. The computation of the item \(\Theta _{k}^{l}\) requires the activation and the gradient, which can be calculated from the forward and backward propagation, respectively. Furthermore, we impose an extra rescaling step with max-normalization, that is
Such a normalization process is essential since we need to ensure that the evaluation values of each layer are on the same scale. Its function is similar to batch normalization [29], which ensures that the statistics of layer-wise evaluation values follow the same distribution. Equation (6) indicates that the maximum criterion values of the different layers are all normalized to 1, resulting in a comparable scale of the feature saliency f^{l}, which is defined as a long-term estimate for individual channels
where the hyperparameter ε is a smoothing factor set to 0.98 for all experiments in this paper. The feature saliency helps determine which channels are retained when the structure is fixed. Since the values of f^{l} are updated with each mini-batch in a training epoch, we omit the epoch index i for simplicity of representation.
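Dropping the higher-order remainder and taking the absolute value gives the following first-order form, a sketch consistent with the criterion of [17], in which \(\Theta^{l}_{k}\) is the criterion symbol used in the next paragraph:

\[
\Theta^{l}_{k} = \left|\Delta J\left(\mathbf{z}^{l}_{k}\right)\right| = \left| \frac{\partial J}{\partial \mathbf{z}^{l}_{k}}\, \mathbf{z}^{l}_{k} \right|. \tag{4}
\]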
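Assuming that the multivariate form averages the gradient-activation products over the entries of the channel, as in [17], and using \(m\) as a hypothetical entry index, it can be written as

\[
\Theta^{l}_{k} = \left| \frac{1}{M} \sum_{m=1}^{M} \frac{\partial J}{\partial z^{l}_{k,m}}\, z^{l}_{k,m} \right|, \tag{5}
\]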
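A max-normalization consistent with this description, with \(\hat{\Theta}^{l}_{k}\) as an assumed symbol for the rescaled value and \(j\) as a hypothetical channel index, divides each criterion by the largest one in its layer:

\[
\hat{\Theta}^{l}_{k} = \frac{\Theta^{l}_{k}}{\max_{j} \Theta^{l}_{j}}. \tag{6}
\]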
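Read as an exponential moving average (an assumption about the exact form, including which term the smoothing factor multiplies), and with \(\hat{\Theta}^{l}_{k}\) denoting the max-normalized criterion of channel k, the update applied at each mini-batch would be

\[
f^{l}_{k} \leftarrow \varepsilon\, f^{l}_{k} + (1-\varepsilon)\, \hat{\Theta}^{l}_{k}, \tag{7}
\]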
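The per-batch criterion, its max-normalization, and the moving-average update described in this subsection can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, activations and gradients are passed as nested lists (one inner list of M entries per channel), and the moving average is assumed to weight the previous saliency by ε.

```python
from typing import List


def taylor_criterion(activations: List[List[float]],
                     gradients: List[List[float]]) -> List[float]:
    """Per-channel |mean(gradient * activation)| for one layer and one batch.

    activations[k][m] is entry m of channel k; gradients has the same shape.
    """
    scores = []
    for act_k, grad_k in zip(activations, gradients):
        m = len(act_k)
        scores.append(abs(sum(a * g for a, g in zip(act_k, grad_k)) / m))
    return scores


def max_normalize(scores: List[float]) -> List[float]:
    """Rescale so that the largest score in the layer becomes 1."""
    top = max(scores)
    if top == 0.0:
        # Assumption: an all-zero layer stays all-zero instead of dividing by 0.
        return [0.0 for _ in scores]
    return [s / top for s in scores]


def update_saliency(saliency: List[float],
                    normalized: List[float],
                    eps: float = 0.98) -> List[float]:
    """Moving average: keep eps of the old value, add (1 - eps) of the new."""
    return [eps * f + (1.0 - eps) * t for f, t in zip(saliency, normalized)]
```

In a real training loop the activations and gradients would come from the forward and backward passes of the framework in use; the list-based interface above is only meant to make the arithmetic explicit.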
3.3 Channel redistribution
The proposed channel redistribution process occurs at the end of each training epoch, which is indicated by the subscript i. Since the aforementioned feature saliency is evaluated for a single channel, it is necessary to calculate the significance of each layer, which has several channels, in order to obtain an efficient structure. Suppose [ξ^{l}]_{i} indicates the corresponding layer’s significance for epoch i
where \(\hat {\mathbf {f}}^{l}\subseteq \mathbf {f}^{l}\) is the subset that contains the largest feature-saliency values of the corresponding layer, and the total number of elements in \(\hat {\mathbf {f}}^{l}\) is [a^{l}]_{i−1}. Next, we need to normalize all [ξ^{l}]_{i} so that the sum of these values is 1.
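One reading consistent with the definitions that follow, assuming the layer score is the sum of the retained saliency values (a mean over the same subset would be an equally plausible choice), is

\[
[\xi^{l}]_{i} = \sum_{f \in \hat{\mathbf{f}}^{l}} f, \tag{8}
\]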
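Written out, with \(j\) as a hypothetical layer index, this normalization is

\[
[\xi^{l}]_{i} \leftarrow \frac{[\xi^{l}]_{i}}{\sum_{j=1}^{L} [\xi^{j}]_{i}}. \tag{9}
\]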
Looking again at the channel redistribution process shown in Fig. 3, after obtaining each layer’s significance evaluation, the following step is to temporarily remove a fixed proportion of channels to release some reallocating space, followed by redistributing channels according to the calculated values about layers’ significance, that is, updating the number of activated channels [a^{l}]_{i−1} in each layer. Given that the updated value may exceed the maximum number of channels in the corresponding layer, the value of [a^{l}]_{i} is limited to C^{l} which is the total number of channels in the original structure as the following formula
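A form consistent with this description, assuming that a fraction \(s\) of each layer's activated channels is released into a shared pool that is then divided in proportion to the normalized layer significance, is

\[
[a^{l}]_{i} = \min\left( (1-s)\,[a^{l}]_{i-1} + [\xi^{l}]_{i}\; s \sum_{j=1}^{L} [a^{j}]_{i-1},\; C^{l} \right), \tag{10}
\]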
where the sparsity s is a hyperparameter which is predefined to indicate how many channels are reallocated each time. Different from the original work [21] that adjusts the value of s throughout training, the sparsity s is fixed to 0.5 in our experiments. Moreover, we allocate the extra channels evenly among the other layers if necessary. All the relevant details are shown in Algorithm 1.
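The redistribution step can be sketched as follows. This is a hedged illustration and not a transcription of Algorithm 1: the function names are hypothetical, the layer score is assumed to be the sum of the largest saliency values, fractional channel counts are rounded down, and channels that cannot be placed because a layer is full are handed out one at a time to layers that still have room.

```python
from typing import List


def layer_significance(saliency: List[List[float]],
                       active: List[int]) -> List[float]:
    """Normalized per-layer scores from the top-`active[l]` saliency values."""
    raw = [sum(sorted(f, reverse=True)[:a]) for f, a in zip(saliency, active)]
    total = sum(raw)
    if total == 0.0:
        # Assumption: fall back to a uniform split when nothing is salient.
        return [1.0 / len(raw) for _ in raw]
    return [r / total for r in raw]


def redistribute(active: List[int],
                 significance: List[float],
                 capacity: List[int],
                 s: float = 0.5) -> List[int]:
    """Release a fraction s of each layer's channels and reallocate them."""
    budget = sum(active)
    released = [int(s * a) for a in active]      # channels temporarily removed
    pool = sum(released)                         # shared pool to hand back out
    kept = [a - r for a, r in zip(active, released)]
    # Proportional share of the pool, capped at the layer's original width.
    new = [min(k + int(pool * xi), c)
           for k, xi, c in zip(kept, significance, capacity)]
    # Spread any remainder evenly over layers that still have free channels.
    leftover = budget - sum(new)
    while leftover > 0:
        open_layers = [l for l in range(len(new)) if new[l] < capacity[l]]
        if not open_layers:
            break
        for l in open_layers:
            if leftover == 0:
                break
            new[l] += 1
            leftover -= 1
    return new
```

With this design the total number of activated channels stays constant across epochs whenever the layer capacities allow it, so the overall pruning rate is preserved while the per-layer allocation shifts toward the more significant layers.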
Furthermore, after uncovering a suitable compact structure, we need to remove the insignificant channels together with their convolutional kernels and train the remaining weights to obtain the representative capability, as depicted in Fig. 2. During pruning, the remaining channels are determined according to their feature saliency f^{l} as well as the number of activated channels a^{l} in the corresponding layer. Overall, we summarize the whole process of our pruning scheme in Algorithm 2. Because channel pruning is applied only to convolutional layers, we have omitted the general batch-normalization [29] layers, activation layers, pooling layers, and fully connected layers for simplicity.
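The selection of the channels that survive pruning can be illustrated with a short sketch. It assumes that, within a layer, the channels with the highest feature saliency are kept; the function name is hypothetical.

```python
from typing import List


def channels_to_keep(saliency: List[float], num_active: int) -> List[int]:
    """Indices of the num_active most salient channels, in ascending order."""
    order = sorted(range(len(saliency)),
                   key=lambda k: saliency[k], reverse=True)
    return sorted(order[:num_active])
```

The returned indices identify which kernels of the layer would be copied into the compact model; the remaining kernels are discarded.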
3.4 Discussion
As most of the heavy computation is concentrated in the convolutional layers, we only need to pay attention to the computational overhead or saving in these layers. Suppose the output channel size of the lth layer is H^{l}×W^{l} and the final number of activated channels is A^{l}; accordingly, (C^{l}−A^{l}) kernels in the corresponding layer will be removed. Therefore, the dimension of the remaining channels in the lth layer is H^{l}×W^{l}×A^{l}, and the computation in terms of FLOPs (floating-point operations) in this layer decreases from K^{2}×C^{l−1}×H^{l}×W^{l}×C^{l} to K^{2}×A^{l−1}×H^{l}×W^{l}×A^{l}, where K indicates the kernel size. Compared to the raw FLOPs of an individual layer, a reduction ratio of \(\left (1-\frac {A^{l-1}A^{l}}{C^{l-1}C^{l}}\right)\) is obtained, leading to a large decrease in the computational cost of the CNN.
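As a worked illustration of the per-layer count above, the following sketch computes the FLOPs of a convolutional layer before and after pruning and the resulting reduction ratio. The function names and the example layer sizes are hypothetical.

```python
def conv_flops(kernel: int, c_in: int, c_out: int,
               height: int, width: int) -> int:
    """FLOPs of one convolutional layer: K^2 * C_in * H * W * C_out."""
    return kernel * kernel * c_in * height * width * c_out


def reduction_ratio(c_in: int, c_out: int, a_in: int, a_out: int) -> float:
    """Fraction of FLOPs removed: 1 - (A_in * A_out) / (C_in * C_out)."""
    return 1.0 - (a_in * a_out) / (c_in * c_out)


# Hypothetical layer: 3x3 kernels, 64 -> 128 channels on a 16x16 output,
# pruned to 32 -> 64 channels.
original = conv_flops(3, 64, 128, 16, 16)
pruned = conv_flops(3, 32, 64, 16, 16)
ratio = reduction_ratio(64, 128, 32, 64)
```

Halving both the input and output channels of a layer removes three quarters of its FLOPs, which is why moderate channel pruning can yield large computational savings.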
4 Results and discussion
4.1 Experimental setting
We evaluate our scheme on various representative benchmark datasets, including CIFAR-10 [25] and ILSVRC-12 [9], and compare with advanced DNN architectures, including VGG [11], ResNet [13], and MobileNet [14]. CIFAR-10 contains 50,000 training images and 10,000 testing images, which are categorized into 10 classes. We follow the common data augmentation suggested by [13] with shifting and mirroring. All architectures are trained from scratch using Stochastic Gradient Descent (SGD) with an initial learning rate of 0.1. The learning rate is divided by 10 after every third of the total number of iterations. The weight decay and momentum are 10^{−4} and 0.9, respectively. ILSVRC-12 contains 1.3 million training images and 50,000 validation images, without a test set. When evaluating on ILSVRC-12, we also follow the training settings and the data augmentation strategy suggested by [13]. All experiments are implemented in PyTorch [44]. Since advanced CNN architectures such as VGG and MobileNet are designed for large datasets like ImageNet, retraining and pruning them to match a small dataset like CIFAR-10 can be viewed as a suitable verification platform for the distributed training process in both cloud and edge nodes for IoT applications.
In order to verify the effectiveness of our proposed scheme, we compare its performance with that of various state-of-the-art pruning approaches, including PFEC [18], NS [19], CP [31], ThiNet [20], SFP [37], CFP [32], DCP [30], FPGM [45], COP [33], GAL [35], PFS [38], and ASS [39]. Moreover, we present the performance of our scheme in terms of both theoretical acceleration and practical acceleration with respect to various pruning rates to show the robustness and efficiency of our scheme. Overall, our proposed method achieves comparable and satisfactory results even with a more concise pruning procedure, which could be effectively incorporated into the common distributed training paradigm for anticipated IoT applications.
4.2 Experiments on CIFAR-10
Pruning VGG. Though VGG is not designed for a small dataset like CIFAR-10, previous work has studied its performance at extremely high pruning rates. We first train an original non-pruned 16-layer VGG as the baseline (no pruning) and then run several experiments with different pruning rates from scratch. We compare the testing accuracy with that of the previous state-of-the-art approaches and summarize the corresponding results in Table 2.
As shown in Table 2, our proposed scheme achieves results comparable with the aforementioned state-of-the-art methods at different levels of reduced FLOPs and parameters. For instance, a compact model with a 49.3% drop in FLOPs achieves superior accuracy compared to the baseline. In the case where the FLOPs and number of parameters are reduced by 72.6% and 94.1%, respectively, the pruned VGG based on our scheme can still maintain an applicable accuracy of 93.27% on this dataset.
Pruning ResNet. Since compact ResNet architectures with fewer channels in each layer are built in [13] for recognizing images from CIFAR-10, we adopt the recommended 32-layer and 56-layer ResNet as baselines (no pruning). Specifically, because the input and output numbers of channels within a residual block must be consistent to allow the shortcut connection, we only prune the output channels of the first layer in each block.
It can be observed from Table 3 that our proposed strategy achieves competitive results. For example, the compact ResNet-32 with a 49.0% reduction in FLOPs and a 60.1% reduction in parameters still retains an accuracy of 92.50% (i.e., 93.20% − 0.70% = 92.50%). In addition, more experiments on pruning ResNet-56 further verify the effectiveness of our algorithm. For example, in the case where the FLOPs reduction and the parameter reduction are 49.6% and 58.0%, respectively, the accuracy of the compact model established by our scheme only decreases by 0.17%.
Pruning MobileNet. We design a MobileNet-like neural network with fewer layers for simplicity. Its original structure contains ten blocks, with each block including a depth-wise convolutional layer and a point-wise convolutional layer [14]. Since the output channels of a depth-wise convolutional layer change as soon as the channel number of its previous point-wise layer changes, we only need to focus on pruning channels in the point-wise convolutional layers. The pruning results are shown in Table 4. Overall, our algorithm still achieves good performance even for such a computationally efficient architecture. For example, when the compression ratios of FLOPs and parameters increase to 61.3% and 92.9%, respectively, the accuracy loss is only 0.27%.
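The learning-rate schedule described above can be sketched as a step function of the iteration index. The function name is hypothetical, and the sketch assumes exactly three stages, with the rate divided by 10 at one third and two thirds of training.

```python
def learning_rate(iteration: int, total_iterations: int,
                  base_lr: float = 0.1) -> float:
    """Step decay: divide the rate by 10 after each third of training."""
    # Stage 0, 1, or 2; the min() keeps the final iteration in the last stage.
    stage = min(3 * iteration // total_iterations, 2)
    return base_lr / (10 ** stage)
```

For a run of 300 iterations, this gives 0.1 for iterations 0 to 99, 0.01 for iterations 100 to 199, and 0.001 afterwards.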
4.3 Experiments on ImageNet
We adopt the widely studied ResNet-50 architecture, as in previous pruning approaches. Different from the general ResNet architecture, ResNet-50 contains a special structure called a “bottleneck” [13], which includes three convolutional layers with only the middle layer being expressive in each residual block. Similar to pruning ResNet on CIFAR-10, we focus on pruning the channels of the first two layers in a bottleneck, so that we do not need to worry about the identity mapping when copying the parameters to a compact model. We summarize the experimental results on ILSVRC-12 in Table 5, where we report the performance of both the advanced approaches and ours. It can be observed that the pruned model based on our scheme reaches a comparable accuracy along with a significant reduction in both FLOPs and parameters.
Admittedly, our method is not as effective as some state-of-the-art algorithms. However, these advanced algorithms add extra training strategies or lengthen the training time, whereas our algorithm is efficient and simple and can therefore be deployed in a wide range of IoT scenarios.
4.4 Trade-off between performance and compression rate
In practical IoT scenarios, it is necessary to balance the performance and compression rate according to different computing requirements and energy consumption restrictions. In addition, showing the performance at various compression rates can also illustrate the robustness and efficiency of a pruning algorithm. Thus, in this section, we explore the performance of our scheme at different pruning rates. For all experiments with different network architectures, we use the same hyperparameter settings. We summarize the results in Figs. 6, 7, and 8, which correspond to ResNet, VGG, and MobileNet, respectively.
As can be observed in Fig. 6, the ResNet architecture is sensitive to pruning. When the proportion of FLOPs reduction increases to 0.6, the accuracy drops by nearly 1.0%. In contrast, when pruning VGG and MobileNet, our proposed scheme is more robust across the various levels of reduced FLOPs and pruned parameters. As depicted in Figs. 7 and 8 for VGG and MobileNet, respectively, our proposed strategy obtains efficient neural network structures with even higher testing accuracies than their baselines at low compression rates. Such interesting results also indicate that compact models may outperform redundant models to some extent, which implies that the premise of efficient training is to unveil a superior neural network with a suitable structure.
4.5 The uncovered compact structures
In this section, we examine the sub-network architectures revealed by our proposed method. A practical problem in deploying DNNs is how to design appropriate lightweight structures for resource-limited IoT computing tasks, so learning the compact structures can help us design efficient neural networks beyond the state-of-the-art architectures. As seen from Fig. 9, in the case of pruning VGG on CIFAR-10, our scheme keeps more channels in the middle layers while pruning more channels in the first layer and the last layers, compared with the original network with no pruning. The discovered structure suggests that the middle layers are more sensitive, whereas the first layer and the last layers are easier to prune, which is consistent with the previous findings in [18, 19] and indicates the effectiveness of our proposed method.
It can be observed from Fig. 10 that when pruning ResNet on CIFAR-10, the compact model tends to maintain more channels in layers where the number of channels doubles, suggesting that those layers are more salient. A similar phenomenon is found when pruning ResNet on ImageNet as well. As depicted in Fig. 11, although the distribution of the pruned channels appears to be disordered to some extent, more channels are still retained in the “turning-point” layers where the number of channels in the original neural network jumps abruptly. Our proposed compact ResNet structure is consistent with the conclusion of the sensitivity analysis in [18].
4.6 Acceleration in practice
In this section, we show the runningtime acceleration performance of the designed compressive neural networks in practice. We test all compact CNNs on several Intel E5 CPUs with the software platform of Pytorch deep learning framework in the operating system of Ubuntu 16.04. Due to the reason that the running time on GPUs is too short to manifest the differences among different methods as well as running on GPUs is not suitable for practical IoT devices, we have not shown the actual acceleration performance on such devices. For each compact neural network, we measure the time of forward propagation for 100 rounds and average them. The overall experimental results are organized in Table 6 where we present both theoretical amount of computation in FLOPs and practical acceleration results.
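The timing procedure can be sketched with a small helper that averages the wall-clock time of repeated forward passes. The helper name and the `forward` callable are hypothetical; in practice `forward` would wrap a single inference call of the compact model on a fixed input.

```python
import time
from typing import Callable


def average_forward_time(forward: Callable[[], object],
                         rounds: int = 100) -> float:
    """Average wall-clock seconds per call over `rounds` forward passes."""
    start = time.perf_counter()
    for _ in range(rounds):
        forward()
    return (time.perf_counter() - start) / rounds
```

Dividing the average time of the original model by that of the pruned model then gives the practical speed-up, which can be compared against the theoretical FLOPs reduction.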
As shown in Table 6, the test results in each row are obtained by reducing the FLOPs of the corresponding neural network model by 50%, and the practical acceleration is consistent across all representative CNN architectures. In addition, the actual acceleration of MobileNet is significantly higher than that of both ResNet-50 and ResNet-56, indicating its potential suitability for resource-stringent IoT devices.
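To make the distinction between theoretical and practical acceleration concrete, the sketch below counts the multiply-accumulate operations of a single standard convolution and compares a FLOPs-based speedup with a latency-based one. The layer sizes and latencies are hypothetical values chosen for illustration, not entries from Table 6, and the helper names `conv_flops` and `speedup` are assumptions. Counting conventions also vary (some works count one multiply-accumulate as two operations), so the absolute numbers depend on the convention used.

```python
def conv_flops(c_in, c_out, kernel, h_out, w_out):
    """Multiply-accumulate count of one standard convolution (bias ignored)."""
    return c_in * c_out * kernel * kernel * h_out * w_out


def speedup(baseline, compact):
    """Ratio baseline / compact; values above 1 favor the compact model."""
    if compact <= 0:
        raise ValueError("compact cost must be positive")
    return baseline / compact


# Hypothetical 3x3 layer on a 32x32 feature map, before and after
# pruning half of its input channels.
base = conv_flops(c_in=64, c_out=64, kernel=3, h_out=32, w_out=32)
slim = conv_flops(c_in=32, c_out=64, kernel=3, h_out=32, w_out=32)

print(f"FLOPs reduction: {1 - slim / base:.0%}")
print(f"theoretical speedup: {speedup(base, slim):.2f}x")
# The practical speedup uses measured latencies (hypothetical values, in ms).
print(f"practical speedup: {speedup(12.0, 8.0):.2f}x")
```

The practical figure is often lower than the theoretical one, since costs such as memory access and framework overhead do not necessarily shrink in proportion to FLOPs.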
4.7 Training time measurement
One important issue hindering the application of DNNs is the long training time. Our scheme is more efficient in this respect, since both structure learning and weight learning are relatively fast compared with common training, especially when the initial weights are transferred from post-training models (e.g., inheriting the network parameters from the cloud). Specifically, we experiment on one Nvidia RTX 2080 GPU, with PyTorch and the CIFAR-10 dataset. Figure 12 compares the normalized training time of all neural networks. It can be observed from Fig. 12 that the time cost of structure learning is much shorter than that of parameter optimization, which indicates that our scheme finds the compact structures efficiently. In addition, the total training time decreases as the pruning rate increases in all experiments, which further reflects the efficiency of our proposed scheme.
5 Conclusions
In this paper, we proposed a novel pruning-based paradigm that aims to apply DNNs, especially CNNs, to resource-limited IoT scenarios. Our proposed scheme can train and compress deep neural networks simultaneously. Specifically, we introduce a heuristic algorithm to learn both the architecture and the weights of the targeted neural network. Once a compression rate is given, our scheme can train a redundant and randomly initialized neural network into a compact, representative one. Extensive experiments have illustrated the effectiveness of our scheme, which can reduce the complexity of a redundant CNN while maintaining its performance. For example, the pruned VGG achieves a satisfactory accuracy of 93.27% with a dramatic reduction in FLOPs and in the number of parameters (i.e., 72.6% and 94.1%, respectively). In addition, the experiments verify the performance of our scheme under various pruning rates in terms of both theoretical acceleration and practical running-time reduction.
As mentioned before, our proposed strategy realizes efficient end-to-end training and compression of CNNs and can be incorporated into the conventional distributed computing paradigm to apply deep learning to resource-limited IoT applications. Moreover, our scheme is lightweight and can be easily extended to other types of DNNs. In future work, we will apply the proposed pruning scheme to actual IoT scenarios to further verify its effectiveness.
Availability of data and materials
Both the CIFAR-10 and ILSVRC-12 datasets are public and can be found online.
References
J. Lin, W. Yu, N. Zhang, X. Yang, H. Zhang, W. Zhao, A survey on Internet of Things: Architecture, enabling technologies, security and privacy, and applications. IEEE Internet Things J. 4(5), 1125–1142 (2017).
M. A. Al-Garadi, A. Mohamed, A. Al-Ali, X. Du, M. Guizani, A survey of machine and deep learning methods for Internet of Things (IoT) security. arXiv preprint arXiv:1807.11023 (2018).
M. Mohammadi, A. Al-Fuqaha, S. Sorour, M. Guizani, Deep learning for IoT big data and streaming analytics: a survey. IEEE Commun. Surv. Tutorials 20(4), 2923–2960 (2018).
E. Park, Y. Cho, J. Han, S. J. Kwon, Comprehensive approaches to user acceptance of Internet of Things in a smart home environment. IEEE Internet Things J. 4(6), 2342–2350 (2017).
O. Elijah, T. A. Rahman, I. Orikumhi, C. Y. Leow, M. N. Hindia, An overview of Internet of Things (IoT) and data analytics in agriculture: Benefits and challenges. IEEE Internet Things J. 5(5), 3758–3773 (2018).
H. Li, K. Ota, M. Dong, Learning IoT in edge: Deep learning for the Internet of Things with edge computing. IEEE Netw. 32(1), 96–101 (2018).
X. Ma, T. Yao, M. Hu, Y. Dong, W. Liu, F. Wang, J. Liu, A survey on deep learning empowered IoT applications. IEEE Access 7, 181721–181732 (2019).
X. Xie, K. H. Kim, in The 25th Annual International Conference on Mobile Computing and Networking. Source compression with bounded DNN perception loss for IoT edge computer vision (ACM, Los Cabos, 2019), pp. 1–16.
J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, L. Fei-Fei, in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). ImageNet: A large-scale hierarchical image database (IEEE, Miami, 2009).
A. Krizhevsky, I. Sutskever, G. E. Hinton, in Advances in Neural Information Processing Systems. ImageNet classification with deep convolutional neural networks (Curran Associates, Inc., Harrahs and Harveys, Lake Tahoe, 2012), pp. 1097–1105.
K. Simonyan, A. Zisserman, in International Conference on Learning Representations (ICLR). Very deep convolutional networks for large-scale image recognition (OpenReview.net, San Diego, 2015).
C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Going deeper with convolutions (IEEE, Boston, 2015).
K. He, X. Zhang, S. Ren, J. Sun, in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Deep residual learning for image recognition (IEEE, Las Vegas, Nevada, 2016).
A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, H. Adam, MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017).
S. Han, J. Pool, J. Tran, W. Dally, in Advances in Neural Information Processing Systems 28, ed. by C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett. Learning both weights and connections for efficient neural network (Curran Associates, Inc., Montreal, 2015), pp. 1135–1143.
S. Han, H. Mao, W. Dally, in International Conference on Learning Representations (ICLR). Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding (OpenReview.net, Caribe Hilton, San Juan, 2016).
P. Molchanov, S. Tyree, T. Karras, T. Aila, J. Kautz, in International Conference on Learning Representations (ICLR). Pruning convolutional neural networks for resource efficient inference (OpenReview.net, Palais des Congrès Neptune, Toulon, 2017).
H. Li, A. Kadav, I. Durdanovic, H. Samet, H. P. Graf, in International Conference on Learning Representations (ICLR). Pruning filters for efficient convnets (OpenReview.net, Palais des Congrès Neptune, Toulon, 2017).
Z. Liu, J. Li, Z. Shen, G. Huang, S. Yan, C. Zhang, in The IEEE International Conference on Computer Vision (ICCV). Learning efficient convolutional networks through network slimming (IEEE, Venice, 2017).
J. Luo, J. Wu, W. Lin, in The IEEE International Conference on Computer Vision (ICCV). ThiNet: A filter level pruning method for deep neural network compression (IEEE, Venice, 2017).
T. Dettmers, L. Zettlemoyer, Sparse networks from scratch: Faster training without losing performance. arXiv preprint arXiv:1907.04840 (2019).
H. Mostafa, X. Wang, in Proceedings of the 36th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 97, ed. by K. Chaudhuri, R. Salakhutdinov. Parameter efficient training of deep convolutional neural networks by dynamic sparse reparameterization (Long Beach, 2019), pp. 4646–4655.
E. De Coninck, T. Verbelen, B. Vankeirsbilck, S. Bohez, P. Simoens, P. Demeester, B. Dhoedt, in International Internet of Things Summit. Distributed neural networks for Internet of Things: The big-little approach (Springer, 2015), pp. 484–492.
R. Hu, Y. Guo, E. P. Ratazzi, Y. Gong, Differentially private federated learning for resource-constrained Internet of Things. arXiv preprint arXiv:2003.12705 (2020).
A. Krizhevsky, G. Hinton, Learning multiple layers of features from tiny images. Technical Report (2009).
H. Amroun, M. H. Temkit, M. Ammi, in 2017 IEEE International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData). Best feature for CNN classification of human activity using IoT network (IEEE, Exeter, 2017), pp. 943–950.
X. Ding, G. Ding, X. Zhou, Y. Guo, J. Han, J. Liu, in Advances in Neural Information Processing Systems 32. Global sparse momentum SGD for pruning very deep neural networks (Curran Associates, Inc., Vancouver, 2019).
W. Wen, C. Wu, Y. Wang, Y. Chen, H. Li, in Advances in Neural Information Processing Systems 29. Learning structured sparsity in deep neural networks (Curran Associates, Inc., Vancouver, 2016), pp. 2074–2082.
S. Ioffe, C. Szegedy, in Proceedings of the 32nd International Conference on Machine Learning (ICML). Batch normalization: Accelerating deep network training by reducing internal covariate shift (ACM, Lille, 2015).
Z. Zhuang, M. Tan, B. Zhuang, J. Liu, Y. Guo, Q. Wu, J. Huang, J. Zhu, in Advances in Neural Information Processing Systems 31. Discrimination-aware channel pruning for deep neural networks (Curran Associates, Inc., Montreal, 2018), pp. 875–886.
Y. He, X. Zhang, J. Sun, in The IEEE International Conference on Computer Vision (ICCV). Channel pruning for accelerating very deep neural networks (IEEE, Venice, 2017).
P. Singh, V. K. Verma, P. Rai, V. P. Namboodiri, Leveraging filter correlations for deep model compression. arXiv preprint arXiv:1811.10559 (2018).
W. Wang, C. Fu, J. Guo, D. Cai, X. He, in Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19. COP: Customized deep model compression via regularized correlation-based filter-level pruning (Morgan Kaufmann, Macao, 2019).
Z. Liu, H. Mu, X. Zhang, Z. Guo, X. Yang, K. T. Cheng, J. Sun, in Proceedings of the IEEE International Conference on Computer Vision (ICCV). MetaPruning: Meta learning for automatic neural network channel pruning (IEEE, Seoul, 2019), pp. 3296–3305.
S. Lin, R. Ji, C. Yan, B. Zhang, L. Cao, Q. Ye, F. Huang, D. Doermann, in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Towards optimal structured CNN pruning via generative adversarial learning (IEEE, Long Beach, 2019).
Z. Liu, M. Sun, T. Zhou, G. Huang, T. Darrell, in International Conference on Learning Representations (ICLR). Rethinking the value of network pruning (OpenReview.net, New Orleans, 2019).
Y. He, G. Kang, X. Dong, Y. Fu, Y. Yang, in IJCAI International Joint Conference on Artificial Intelligence. Soft filter pruning for accelerating deep convolutional neural networks (Morgan Kaufmann, Stockholm, 2018).
Y. Wang, X. Zhang, L. Xie, J. Zhou, H. Su, B. Zhang, X. Hu, Pruning from scratch. arXiv preprint arXiv:1909.12579 (2019).
M. Lin, R. Ji, Y. Zhang, B. Zhang, Y. Wu, Y. Tian, Channel pruning via automatic structure search. arXiv preprint arXiv:2001.08565 (2020).
T. Elsken, J. H. Metzen, F. Hutter, Neural architecture search: A survey. J. Mach. Learn. Res. 20(55), 1–21 (2019).
M. Tan, B. Chen, R. Pang, V. Vasudevan, M. Sandler, A. Howard, Q. V. Le, in 2019 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). MnasNet: Platform-aware neural architecture search for mobile (IEEE, Long Beach, 2019).
H. Liu, K. Simonyan, Y. Yang, in International Conference on Learning Representations (ICLR). DARTS: Differentiable architecture search (OpenReview.net, New Orleans, 2019).
Y. Xu, L. Xie, X. Zhang, X. Chen, G. J. Qi, Q. Tian, H. Xiong, in International Conference on Learning Representations (ICLR). PC-DARTS: Partial channel connections for memory-efficient architecture search (OpenReview.net, Virtual Conference, formerly Addis Ababa, Ethiopia, 2020).
A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, A. Lerer, in NIPS-W. Automatic differentiation in PyTorch (Curran Associates, Inc., Long Beach, 2017).
Y. He, P. Liu, Z. Wang, Z. Hu, Y. Yang, in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Filter pruning via geometric median for deep convolutional neural networks acceleration (IEEE, Long Beach, 2019).
Acknowledgements
Not applicable.
Funding
This work was supported in part by the National Key R&D Program of China (No. 2020YFB1804804), the National Natural Science Foundation of China (Nos. 62071425, 61731002), the Zhejiang Key Research and Development Plan (Nos. 2019C01002, 2019C03131), the Huawei Cooperation Project, the Project sponsored by Zhejiang Lab (No. 2019LC0AB01), and the Zhejiang Provincial Natural Science Foundation of China (No. LY20F010016).
Author information
Contributions
QC is the major contributor of this paper. She wrote most of the sections of the paper and carried out most of the simulations. SS completed the simulation program and adjusted the hyperparameters of the algorithm. RL is the corresponding author. He participated in discussing the core content of the paper and approved the submitted manuscript. QL participated in designing the simulation program and revising the final manuscript. JL analyzed the effectiveness of the algorithm and revised the paper. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Qi, C., Shen, S., Li, R. et al. An efficient pruning scheme of deep neural networks for Internet of Things applications. EURASIP J. Adv. Signal Process. 2021, 31 (2021). https://doi.org/10.1186/s13634-021-00744-4
DOI: https://doi.org/10.1186/s13634-021-00744-4