 Research
 Open Access
An incremental learning algorithm for the hybrid RBF-BP network classifier
EURASIP Journal on Advances in Signal Processing volume 2016, Article number: 57 (2016)
Abstract
This paper presents an incremental learning algorithm for the hybrid RBF-BP (IL-RBF-BP) network classifier. A potential function is introduced over the training sample space in the feature-mapping stage, and an incremental learning method for constructing RBF hidden neurons is proposed. The proposed method can incrementally generate RBF hidden neurons and effectively estimate their centers and number by measuring the density of different regions in the training sample space. A hybrid RBF-BP network architecture is designed to train the output weights. The output of the original RBF hidden layer is processed and connected to a multilayer perceptron (MLP) network; then, a back propagation (BP) algorithm is used to update the MLP weights. The RBF hidden neurons are used for nonlinear kernel mapping, and the BP network is then used for nonlinear classification, which further improves classification performance. The IL-RBF-BP algorithm is compared with other algorithms on artificial and UCI data sets, and the experiments demonstrate the superiority of the proposed algorithm.
1 Introduction
In the fields of pattern recognition and data mining, various methods and models have been proposed to solve different problems. Existing methods can be divided into two levels: the data level and the algorithmic level. The data level is mainly concerned with various sampling techniques [1]. The algorithmic level tries to apply or improve varieties of existing traditional learning algorithms such as fuzzy clustering [2], Markovian jumping systems [3–5], k-nearest neighbors [6], and neural networks, where single-layer feedforward networks (SLFNs) have been intensively studied over the past several decades and applied to various problems in different fields, such as image recognition [7], signal processing [8], disease prediction [9], and industrial fault diagnosis [10]; in particular, radial basis function (RBF) neural networks offer an effective mechanism for nonlinear mapping and classification. In a typical RBF network, the number of hidden neurons is assigned a priori [11, 12], which leads to poor adaptability to different sample sets. Several sequential learning algorithms have been proposed to determine proper sizes of RBF network architectures. A resource allocation network (RAN) for constructing the RBF network is proposed in [13], which uses the novelty of incoming data as the learning strategy. A RAN algorithm based on an extended Kalman filter (RAN-EKF) is proposed in [14], which uses the extended Kalman filter algorithm instead of the least mean squares (LMS) algorithm. In [15], a minimal resource allocation network (MRAN) is proposed, which allows the deletion of previously created centers. The deletion strategy is based on the overall contribution of each hidden unit to the network output. A sequential learning algorithm for growing and pruning the RBF network (GAP-RBF) is proposed in [16, 17]; this algorithm uses the significance of neurons as the learning strategy.
In [18], a Gaussian mixture model (GMM) is proposed to approximate the generalized growing and pruning evaluation formula; the GMM can be used for problems with a high-dimensional probability density distribution. In [19], an error correction (ErrCor) algorithm is used for function approximation; this algorithm can achieve a desired error rate with fewer RBF units. Other methods have also been established to identify a proper architecture while maintaining a desired accuracy [20–22].
Support vector machines (SVMs), which are maximal margin classifiers, can also be used to train SLFNs. RBFs and SVMs differ in that at the output layer, an SVM employs convex optimization to find an optimal linear classifier, whereas the output weights of an RBF network are typically estimated by a linear least squares algorithm, such as the LMS or recursive least squares (RLS) algorithm. Regarding other ways of training SLFNs, extreme learning machines (ELMs) are proposed in [23]; ELMs choose random hidden neuron parameters and calculate the output weights with the least squares algorithm. This method can achieve a fast training speed. Subsequently, an online sequential extreme learning machine (OS-ELM) algorithm that can learn from the input samples one by one or in data blocks is proposed in [24]. In ELMs, the number of hidden nodes is assigned a priori, and many non-optimal nodes may exist; thus, in [25–28], several types of growing and pruning techniques based on ELMs are proposed to effectively estimate the number of hidden neurons.
All of the algorithms for training SLFNs consist of two stages: (1) suitable feature mapping and (2) output weight adjustment. To train SLFNs efficiently, in this paper, a potential function is introduced over the training sample space in the feature-mapping stage, and an incremental learning method for constructing RBF hidden neurons is proposed. Note that although the sequential learning RBF algorithms can also generate RBF hidden neurons automatically, their adaptability to complex sample spaces may be poor because they lack global information about the sample space. In contrast to GAP-RBF, the proposed method does not require the assumption that the input samples obey a unified distribution. Furthermore, it does not need to fit the input sample distribution, unlike the algorithm proposed in [18]. The proposed method utilizes global information about each class of the training sample space and can generate RBF hidden neurons incrementally to adapt to the sample space. By using a potential function to measure the density in each class of the training sample space, corresponding RBF hidden neurons that cover different sample areas can be established. The center of the Gaussian kernel function is determined by learning the density of different regions in the training sample space. Once the width is given, a hidden neuron is generated and introduced into the RBF network, and a mechanism for eliminating the potentials of the original samples is applied, which prepares for the next learning step; thus, the RBF centers and the number of hidden neurons can be effectively estimated. In this way, a suitable network size for the RBF hidden layer that matches the complexity of the sample space can be built up. Thus, the proposed method solves the problem of dimension change from the sample space mapping to the feature space, and it reduces the restrictions on the sample sets, making it adaptable to more complex sample sets.
In this paper, a hybrid RBF-BP network architecture is designed for the output weight adjustment stage to further improve the generalization and classification performance. The output of the original RBF hidden layer is processed and connected to a new hidden layer; that is, the original RBF hidden layer's output, the new hidden layer, and the output layer constitute a multilayer perceptron (MLP), with the output of the original RBF hidden layer serving as the input of the MLP. Once the network architecture is established, a back propagation (BP) algorithm is used to update the weights of the MLP. In the hybrid RBF-BP network, the RBF hidden neurons are used for nonlinear kernel mapping, the complexity of the sample space is mapped onto the dimension of the BP network input layer, and the BP network is then used for nonlinear classification. The nonlinear kernel mapping can improve the separability of sample spaces, and a nonlinear BP classifier can then supply a better classification surface. In this manner, the improved network architecture combines the local response characteristics of the RBF network with the global response characteristics of the BP network, which simplifies the selection of the number of neurons in the BP network hidden layer while further reducing the dependence on space mapping in the RBF hidden layer.
The incremental learning algorithm for the hybrid RBF-BP network (IL-RBF-BP), which is a batch learning algorithm, is obtained by combining the proposed incremental learning algorithm with the hybrid RBF-BP network architecture. In this paper, the performance of the IL-RBF-BP algorithm is compared with other well-known learning algorithms, such as back propagation based on stochastic gradient descent (SGBP) [29], the RBF algorithm based on k-means clustering (KM-RBF) [12], GAP-RBF, SVM, and an ELM, on artificial data sets. To measure the unique features of the proposed method, the k-means clustering learning algorithm based on the hybrid RBF-BP network (KM-RBF-BP) is also compared with IL-RBF-BP on artificial data sets. Because SGBP and KM-RBF are not well suited to more complex problems, for multi-class data sets, in addition to batch learning algorithms such as SVM and ELM, other well-known sequential algorithms, such as MRAN, GAP-RBF, and OS-ELM, are also compared with the IL-RBF-BP algorithm. The results indicate that the IL-RBF-BP algorithm can provide a higher classification accuracy with comparable complexity.
The remainder of this paper is organized as follows. Section 2 describes the principal ideas of IL-RBF-BP, followed by a summary of the algorithm. Section 3 presents the experimental results and performance comparisons with other existing batch and sequential algorithms. Section 4 provides the conclusions of this study.
2 Main concepts of the IL-RBF-BP algorithm
In this section, the main concepts of the IL-RBF-BP algorithm are described. First, we provide the problem definition for the basic RBF network and then present the incremental learning algorithm for constructing RBF hidden neurons. Then, a hybrid RBF-BP network architecture is designed, and the IL-RBF-BP algorithm is summarized. Finally, a method of adjusting the output saturation for multi-class classification problems is proposed.
2.1 Problem definition
For a RBF network, the output can be given by

\( f\left(\mathbf{x}\right)={\displaystyle \sum_{k=1}^K{\omega}_k{\varphi}_k\left(\mathbf{x}\right)} \)

where

\( {\varphi}_k\left(\mathbf{x}\right)= \exp \left(-\frac{{\left\Vert \mathbf{x}-{\mu}_k\right\Vert}^2}{2{\sigma}_k^2}\right) \)

where K is the number of RBF hidden neurons; φ _{ k }(x) is the response of the kth hidden node for an input vector x, where x ∈ R ^{t}; ω _{ k } is its connecting weight to the output node, which determines the classification surface; and μ _{ k } and σ _{ k } are the center and width of the kth hidden node, respectively, where k = 1, 2, …, K.
A RBF network can localize the input sample space by mapping input samples to the interior of a hypercube, with each localized area lying near a vertex. The dimension of the hypercube is the number of RBF hidden neurons. Thus, the mapping performed by the RBF hidden layer on an input vector x ∈ R ^{t} can be denoted as f : R ^{t} → (0, 1]^{K}. Figure 1 shows the results of mapping input samples through the RBF hidden neurons, where the number of RBF hidden neurons is set to K = 3. In Fig. 1, we assume that every input sample vector is near the center of a RBF hidden neuron and that there is no overlap between the areas covered by different RBF hidden neurons.
Figure 1 illustrates that in a RBF network, to achieve good training performance, an effective method of mapping the input sample space should be established, which means completing the estimation of the parameter set \( {\left\{K,{\mu}_k,{\sigma}_k\right\}}_{k=1}^K \). Then, an effective classification surface is needed, which depends on the output weight adjustment.
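The localizing map f : R ^{t} → (0, 1]^{K} described above can be sketched as follows, assuming Gaussian kernels with a shared width; the function and variable names are illustrative and not from the paper.

```python
import numpy as np

def rbf_map(X, centers, width):
    """Map samples into (0, 1]^K via Gaussian kernels.

    X: (n, t) samples; centers: (K, t) kernel centers; width: shared Gaussian width.
    Returns an (n, K) array of activations, one hypercube coordinate per hidden neuron.
    """
    # Squared Euclidean distance from every sample to every center (broadcasting)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))
```

A sample lying exactly on a center maps to 1 in that coordinate and near 0 elsewhere, i.e., close to a hypercube vertex, as assumed in Fig. 1.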
2.2 Incremental learning algorithm for constructing RBF hidden neurons
In the fields of data mining and pattern recognition, potential functions can be used for density clustering and image segmentation (IS) [30]. Several methods of constructing potential functions are proposed in [31]; here, we choose the potential function
where γ(x _{ 1 }, x _{ 2 }) represents the interaction potential of two points x _{ 1 }, x _{ 2 } in the input sample space, d(x _{ 1 }, x _{ 2 }) represents the distance measure, and T is a constant, which can be regarded as the distance weighting factor.
Given a training sample set S, a specific label y ∈ {y _{ i }; i = 1, 2, …, h} is attached to each sample vector x in S, where h is the number of pattern classes. Let S _{ i } denote the set of feature vectors that are labeled y _{ i }, \( {S}_i=\left\{{\mathbf{x}}_{\mathbf{1}}^{\mathbf{i}},{\mathbf{x}}_{\mathbf{2}}^{\mathbf{i}},\dots, {\mathbf{x}}_{{\mathbf{N}}_{\mathbf{i}}}^{\mathbf{i}}\right\} \), where N _{ i } is the number of training samples in the ith pattern class. Thus, \( S={\cup}_{i=1}^h{S}_i \), S _{ i } ∩ S _{ j } = ∅, ∀ i ≠ j. For a pair of samples \( \left({\mathbf{x}}_{\mathbf{u}}^{\mathbf{i}},{\mathbf{x}}_{\mathbf{v}}^{\mathbf{i}}\right) \) in S _{ i }, its interaction potential can be denoted as \( \gamma \left({\mathbf{x}}_{\mathbf{u}}^{\mathbf{i}},{\mathbf{x}}_{\mathbf{v}}^{\mathbf{i}}\right) \).
Let \( {\mathbf{x}}_{\mathbf{v}}^{\mathbf{i}} \) be the baseline sample; then, the interaction potential of all other samples to \( {\mathbf{x}}_{\mathbf{v}}^{\mathbf{i}} \) can be denoted as

\( P\left({\mathbf{x}}_{\mathbf{v}}^{\mathbf{i}}\right)={\displaystyle \sum_{u=1,u\ne v}^{N_i}\gamma \left({\mathbf{x}}_{\mathbf{u}}^{\mathbf{i}},{\mathbf{x}}_{\mathbf{v}}^{\mathbf{i}}\right)} \)

Therefore, the potentials of the samples in S _{ i } are given by

\( \left\{P\left({\mathbf{x}}_{\mathbf{v}}^{\mathbf{i}}\right)\right\}_{v=1}^{N_i} \)
The potentials can be used to measure the density of different regions in the pattern class. Potentials are relatively large in dense regions, whereas they are relatively small in sparse regions. Once the potentials of each sample in S _{ i } are given, the sample with the maximum potential, denoted \( {\mathbf{x}}_{\mathbf{p}}^{\mathbf{i}} \), can be selected, that is,

\( {\mathbf{x}}_{\mathbf{p}}^{\mathbf{i}}=\underset{v=1,2,\dots, {N}_i}{ \arg \max }P\left({\mathbf{x}}_{\mathbf{v}}^{\mathbf{i}}\right) \)
In a RBF network, the activation response of the hidden neurons has local characteristics. The sample space is divided into different subspaces by establishing different Gaussian kernel functions. To generate valid Gaussian kernel functions, we find the most densely populated region in the sample space and then establish a Gaussian kernel to cover that region. For that purpose, the sample with the maximum potential is chosen as the center of the Gaussian kernel function, which is given below:

\( {\mu}_k={\mathbf{x}}_{\mathbf{p}}^{\mathbf{i}} \)

where k refers to the number of RBF hidden neurons generated so far. To simplify the calculation, the width is fixed and selected by cross validation.
When a hidden neuron is established, it is necessary to eliminate the potentials of the covered region so that the next center can be found among the remaining samples. The potentials are updated by
where \( {\mathbf{x}}_{\mathbf{p}}^{\mathbf{i}} \) is the center of the current hidden neuron. For the potential value update process, Eq. (9) shows that when a sample \( {\mathbf{x}}_{\mathbf{v}}^{\mathbf{i}} \) is close to the center \( {\mathbf{x}}_{\mathbf{p}}^{\mathbf{i}} \), its potential value is attenuated quickly, whereas when it is far away from the center, its potential value is attenuated slowly. When the following inequality is met,
a new hidden neuron is introduced into the RBF network, and the search for the next center begins; otherwise, the algorithm for constructing RBF hidden neurons in the current pattern class terminates, where δ is a threshold.
The above process is called the incremental learning algorithm for constructing RBF hidden neurons. Figure 2 shows a schematic diagram of incrementally generating RBF hidden neurons, where the serial numbers in the training sample space represent the regions covered by different RBF hidden neurons. These covered regions transition from dense to sparse. The incremental learning algorithm for constructing RBF hidden neurons is summarized in Algorithm 1.
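The loop above can be sketched as follows. Because the paper's exact potential function and elimination formulas are not reproduced here, this sketch assumes an exponential potential γ(x₁, x₂) = exp(−T · d(x₁, x₂)) and a multiplicative elimination P ← P · (1 − γ(·, x_p)); both are labeled assumptions, and `select_centers`, `delta` (taken here as a fraction of the initial maximum potential), and `max_neurons` are illustrative names.

```python
import numpy as np

def select_centers(S_i, T=1.0, delta=0.01, max_neurons=50):
    """Incrementally pick RBF centers for one pattern class S_i of shape (N, t).

    Assumed potential: gamma(x1, x2) = exp(-T * ||x1 - x2||).
    Assumed elimination: P <- P * (1 - gamma(., center)).
    """
    d = np.linalg.norm(S_i[:, None, :] - S_i[None, :, :], axis=2)
    gamma = np.exp(-T * d)
    P = gamma.sum(axis=1) - 1.0          # exclude each sample's self-interaction (gamma = 1)
    P0 = P.max()
    centers = []
    while len(centers) < max_neurons:
        p = int(P.argmax())              # densest remaining region
        if P[p] <= delta * P0 and centers:
            break                        # residual density below threshold: stop
        centers.append(S_i[p])
        P = P * (1.0 - gamma[:, p])      # attenuate potentials near the new center
    return np.array(centers)
```

On two well-separated dense clusters, the first center falls in one cluster, its neighborhood is attenuated, and the second center falls in the other cluster, matching the dense-to-sparse coverage order of Fig. 2.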
2.3 Hybrid RBF-BP network architecture
As noted above, in a typical RBF network, the output weights are usually estimated by a linear least squares algorithm, such as the LMS or RLS algorithm. In this section, we replace the linear least squares algorithm with a nonlinear one. For classification problems, a nonlinear algorithm can supply a classification surface that better fits the sample space. For that purpose, a hybrid RBF-BP network architecture is designed. The output of the RBF hidden neurons is processed and connected to an MLP network, and then, the nonlinear BP algorithm is used to update the weights of the MLP. The architecture of the hybrid RBF-BP network is shown in Fig. 3, which consists of four components:

1.
The input layer, which consists of t source nodes, where t is the dimensionality of the input vector

2.
The RBF hidden layer, which consists of a group of Gaussian kernel functions:

\( {\varphi}_k\left(\mathbf{x}\right)= \exp \left(-\frac{{\left\Vert \mathbf{x}-{\mu}_k\right\Vert}^2}{2{\sigma}_k^2}\right),\kern0.5em k=1,2,\dots, K \)

where μ _{ k } and σ _{ k } are the center and width of the kth hidden neuron, respectively, and K is the number of hidden neurons.

3.
The BP hidden layer, which consists of the neurons between the RBF hidden layer and the output layer. The induced local field \( {v}_j^{(l)} \) for neuron j in layer l of the BP network is

\( {v}_j^{(l)}={\displaystyle \sum_i{\omega}_{ji}^{(l)}{y}_i^{\left(l-1\right)}} \)

where \( {y}_i^{\left(l-1\right)} \) is the output signal of neuron i in the previous layer l − 1 of the BP network and \( {\omega}_{ji}^{(l)} \) is the synaptic weight of neuron j in layer l that is fed from neuron i in layer l − 1. Assuming the use of a sigmoid function, the output signal of neuron j in layer l is

\( {y}_j^{(l)}=a \tanh \left(b{v}_j^{(l)}\right) \)
where a and b are constants.
If neuron j is in the first BP network hidden layer, i.e., l = 1, set

\( {y}_i^{(0)}={g}_i\left(\mathbf{x}\right) \)

where g _{ j }(x) is the double polar (bipolar) output of φ _{ j }(x) and can be denoted as

\( {g}_j\left(\mathbf{x}\right)=2{\varphi}_j\left(\mathbf{x}\right)-1 \)

4.
The output layer. Let L be the depth of the BP network; note that the depth of the BP network equals the total number of its layers (the BP network input layer, the hidden layers, and the output layer), i.e., if l = 1, then L = 3, and the output can be given as

\( {\widehat{y}}_j={y}_j^{\left(L-1\right)} \)
In Fig. 3, the double polar processing can ensure the validity of the BP network input. The hybrid RBF-BP network architecture is designed such that the RBF network provides good stability: the activation response of the RBF hidden neurons has local characteristics and maps the output values to between 0 and 1. Thus, the original samples, including outliers, are confined to a finite space. When the mapped outputs of the RBF hidden neurons are processed and used as the input of the BP network, the convergence rate of the BP algorithm can be increased and local minima can be avoided. For a BP network, the activation response of the hidden neurons has global characteristics, which helps in regions not fully represented in the training set. Therefore, the hybrid RBF-BP network architecture is a reasonable model; it provides a new strategy that combines the local characteristics of the RBF network with the global characteristics of the BP network. In addition, the hybrid network simplifies the selection of the number of neurons in the BP hidden layer while further reducing the dependence on space mapping in the RBF hidden layer.
A single-hidden-layer MLP neural network can provide an approximate realization of any continuous input-output mapping [32]. Combined with the above discussion, in the hybrid network, we set the number of BP network hidden layers to l = 1.
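A forward pass through the hybrid network with l = 1 can be sketched as follows. The a · tanh(b · v) activation and the 2φ − 1 bipolar step are assumptions consistent with the description above; all names and the constants a and b are illustrative, not values from the paper.

```python
import numpy as np

def hybrid_forward(x, centers, width, W1, b1, W2, b2, a=1.7159, b=2.0 / 3.0):
    """Forward pass of the hybrid RBF-BP network with a single BP hidden layer.

    centers/width define the RBF layer; W1, b1, W2, b2 are the MLP weights.
    """
    d2 = ((x - centers) ** 2).sum(axis=1)
    phi = np.exp(-d2 / (2.0 * width ** 2))   # RBF kernel mapping, values in (0, 1]
    g = 2.0 * phi - 1.0                      # double polar (bipolar) processing
    h = a * np.tanh(b * (W1 @ g + b1))       # BP hidden layer
    return a * np.tanh(b * (W2 @ h + b2))    # output layer
```

The RBF stage confines any input, including outliers, to a bounded feature vector before the BP stage, which is the stability property the text attributes to the hybrid design.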
Combining the proposed incremental learning algorithm with the hybrid RBF-BP network architecture, the incremental learning RBF-BP (IL-RBF-BP) algorithm is summarized in Algorithm 2.
2.4 Adjustment of the output label values
The IL-RBF-BP algorithm can handle both binary and multi-class problems. For multi-class classification problems, suppose that the observation data set is given as \( {\left\{{\mathbf{x}}_{\mathbf{n}},{\mathbf{y}}_{\mathbf{n}}\right\}}_{n=1}^N \), where x _{ n } ∈ R ^{t} is a t-dimensional observation feature vector and y _{ n } ∈ R ^{h} is its coded class label. Here, h is the total number of classes, which is equal to the number of output neurons. If the observation data x _{ n } is assigned to the class label c, then the cth element of y _{ n } = [y _{1}, …, y _{ c }, …, y _{ h }]^{T} is 1 and the other elements are −1, which can be denoted as follows:

\( {y}_j=\left\{\begin{array}{cc}1,\hfill & j=c\hfill \\ {}-1,\hfill & j\ne c\hfill \end{array}\right. \)
The output tags of the IL-RBF-BP classifier are ŷ _{ n } = [ŷ _{1}, …, ŷ _{ c }, …, ŷ _{ h }]^{T}, where each ŷ _{ j } is obtained by thresholding the corresponding network output.
According to the coding rules, only one output tag value should be 1 and the other values −1. If this condition is not met, the output tag is saturated and must be adjusted. Therefore, we define an effective way to correct the saturation problem in the learning process, which is given as pseudo code in Algorithm 3.
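Since Algorithm 3's pseudo code is not reproduced here, the sketch below shows one plausible winner-take-all correction: the strongest raw output keeps the tag 1 and all others are forced to −1. This is an assumption about the adjustment rule, not the paper's exact procedure.

```python
import numpy as np

def adjust_output_tags(raw_outputs):
    """Correct a saturated coded label so exactly one entry is +1.

    raw_outputs: length-h vector of network outputs for one sample.
    """
    y = -np.ones_like(raw_outputs)
    y[int(np.argmax(raw_outputs))] = 1.0   # keep only the strongest class response
    return y
```

This guarantees the output always satisfies the one-positive coding rule, even when several raw outputs saturate toward +1 or none do.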
3 Performance evaluation of the IL-RBF-BP algorithm
In this section, we evaluate the performance of the IL-RBF-BP algorithm using two artificial classification problems from [33] and three classification problems from the UCI machine learning repository [34]. The artificial binary data sets, namely, the Double-moon and Twist problems, are used to measure the unique features of IL-RBF-BP and the main advantages of its results over the others. Table 1 provides a description of the classification data sets, where Double-moon, Twist, and IS are well-balanced data sets and Heart and vehicle classification (VC) are imbalanced data sets. For the balanced data sets, the numbers of training samples in each class are identical. For the Heart problem, the numbers of training samples in classes 1 and 2 are 33 and 40, respectively. For the VC problem, the numbers of training samples in classes 1–4 are 119, 118, 98, and 89, respectively.
The performance of IL-RBF-BP is compared with that of other well-known batch and sequential learning algorithms, namely, SGBP, KM-RBF, KM-RBF-BP, SVM, ELM, MRAN, GAP-RBF, and OS-ELM, on the different data sets. Note that the number of hidden neurons in SGBP, KM-RBF, KM-RBF-BP, ELM, and OS-ELM is selected manually: the number of hidden neurons is changed several times, and the one with the lowest overall testing error is selected as the suitable number. For multi-class problems, the method of adjusting output saturation is used. All simulations of each algorithm are performed ten times and are conducted in the MATLAB 2013 environment on an Intel(R) Core(TM) i5, 3.2 GHz CPU with 4 GB of RAM. The simulations for the SVM are carried out using the popular LIBSVM package in C [35].
3.1 Performance measures
In this paper, the overall and average per-class classification accuracies are used to measure performance. The confusion matrix Q is used to obtain the class-level and global performance of the various classifiers. Class-level performance is measured by the percentage classification (η _{ i }), which is defined as

\( {\eta}_i=\frac{q_{ii}}{N_i^T}\times 100\% \)

where q _{ ii } is the number of correctly classified samples and \( {N}_i^T \) is the number of samples for the class \( {\mathbf{y}}_{\mathbf{i}} \) in the training/testing data set. The overall (η _{ o }) and average per-class (η _{ a }) classification accuracies are defined as

\( {\eta}_o=\frac{{\displaystyle \sum_{i=1}^h{q}_{ii}}}{N^T}\times 100\%,\kern1em {\eta}_a=\frac{1}{h}{\displaystyle \sum_{i=1}^h{\eta}_i} \)

where h is the number of classes and N ^{T} is the number of training/testing samples.
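The three measures can be computed directly from the confusion matrix; the sketch below assumes rows of Q index the true class and columns the predicted class (an orientation not stated in the text).

```python
import numpy as np

def class_accuracies(Q):
    """Per-class, overall, and average per-class accuracies from a confusion matrix.

    Q[i, j] = number of class-i samples assigned to class j.
    Returns (eta_i, eta_o, eta_a) as percentages.
    """
    Q = np.asarray(Q, dtype=float)
    eta_i = 100.0 * np.diag(Q) / Q.sum(axis=1)   # per-class percentage classification
    eta_o = 100.0 * np.trace(Q) / Q.sum()        # overall accuracy
    eta_a = eta_i.mean()                         # average per-class accuracy
    return eta_i, eta_o, eta_a
```

On balanced data sets η_o and η_a coincide; on imbalanced sets such as Heart and VC they can differ, which is why both are reported.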
3.2 Performance comparison
3.2.1 Artificial binary data sets: Double-moon problem
The prototype and data set of the Double-moon classification problem are shown in Fig. 4a, b, respectively, where r = 10, ω = 6, and d = −6. The main parameters of IL-RBF-BP, namely, the distance weighting factor, width, incremental learning threshold, number of BP hidden neurons, and momentum constant, are set to T = 1, σ = 3, δ = 0.01, M = 5, and α = 0, respectively. Figure 4c shows the classification results for the testing samples under these parameters. The classification results illustrate that the proposed algorithm can provide a superior classification surface. Figure 5 shows the use of different width parameters to cover the training sample space, where each cover generates a RBF hidden neuron and the number of RBF hidden neurons increases incrementally; the bold lines represent the first coverage region in each pattern class. In Fig. 5, as the width parameter increases, the region covered by each RBF hidden neuron grows accordingly, which affects the location of the next center and thus generates different RBF hidden neurons. Although the number of RBF hidden neurons changes, IL-RBF-BP can still effectively cover each class of training samples. Thus, the incremental learning algorithm based on potential function clustering is feasible. IL-RBF-BP adapts well to the sample space and is an effective algorithm for incrementally generating RBF hidden neurons for the Double-moon problem.
Figure 6a, b demonstrates that when the number of training samples changes, KM-RBF-BP requires fewer RBF hidden neurons than KM-RBF. When the number of training samples exceeds 500, KM-RBF-BP achieves a higher classification accuracy than KM-RBF. These results show that the hybrid RBF-BP network architecture is effective: it can improve the classification accuracy and reduce the dependence on the original sample space mapping. In GAP-RBF and IL-RBF-BP, the number of RBF hidden neurons is generated automatically. IL-RBF-BP requires fewer RBF hidden neurons than GAP-RBF, and its overall testing accuracy outperforms that of GAP-RBF. The classification accuracy of IL-RBF-BP is comparable to those of SVM and KM-RBF-BP and outperforms those of ELM and KM-RBF. Note that the number of hidden neurons in KM-RBF and KM-RBF-BP is selected manually: the number of hidden neurons is changed several times, and the one with the highest overall testing accuracy is selected as the suitable number. Because IL-RBF-BP utilizes global information about each class of the training sample space, it can generate RBF hidden neurons incrementally to adapt to the sample space, and the hybrid RBF-BP network architecture improves the network performance further.
3.2.2 Artificial binary data sets: Twist problem
The prototype and data set for the Twist classification problem are shown in Fig. 7a, b, respectively, where d _{1} = 0.2, d _{2} = 0.5, and d _{3} = 0.8. Compared to the Double-moon problem, the Twist classification problem is more complex and can thus be used to evaluate the classification performance of the different algorithms. The main parameters of IL-RBF-BP, namely, the distance weighting factor, width, incremental learning threshold, number of BP hidden neurons, and momentum constant, are set to T = 200, σ = 0.15, δ = 0.01, M = 5, and α = 0, respectively. Figure 7c shows the classification results for the testing samples under these parameters. The classification results illustrate that the proposed algorithm still provides a superior classification surface for the Twist classification problem. Figure 8 shows the use of different width parameters to cover the training sample space, where each cover generates a RBF hidden neuron. In Fig. 8, the bold lines represent the first coverage region, which denotes the densest region in each pattern class. Although there is some overlap between the different coverage regions, IL-RBF-BP can still effectively cover each class of training samples and generate the corresponding RBF hidden neurons incrementally.
Figure 9a, b demonstrates that when the number of training samples changes, KM-RBF-BP requires fewer RBF hidden neurons than KM-RBF and achieves a higher classification accuracy. Thus, the hybrid RBF-BP network architecture improves the classification accuracy and reduces the dependence on the original sample space mapping. Note that in KM-RBF and KM-RBF-BP, when the number of training samples changes, the number of RBF hidden neurons has to be adjusted manually; otherwise, the classification accuracy is poor. Compared to KM-RBF and KM-RBF-BP, IL-RBF-BP adapts well to the training sample space; when the number of training samples changes, the number of RBF hidden neurons in IL-RBF-BP changes accordingly, and a higher classification accuracy is achieved. Compared to GAP-RBF, IL-RBF-BP can better adapt to changes in the sample space. The classification accuracy of IL-RBF-BP outperforms that of GAP-RBF as well as those of SVM and ELM. Thus, the incremental learning algorithm based on the potential function is effective, as it utilizes global information about each class of the training sample space to construct RBF hidden neurons incrementally, and the hybrid RBF-BP network architecture improves the network performance further.
3.2.3 UCI binary data set: Heart problem
In this section, the Heart problem from the UCI binary data sets is used to evaluate the performance of the IL-RBF-BP algorithm. In the Heart problem, the sample distribution values of each dimension are between 0 and 1, and the main parameters of IL-RBF-BP, namely, the distance weighting factor, width, incremental learning threshold, number of BP hidden neurons, and momentum constant, are set to T = 1, σ = 1.2, δ = 0.001, M = 5, and α = 0.1, respectively. As noted above, the Heart problem is an imbalanced classification problem. Thus, in addition to the overall testing accuracy η _{ o }, the average testing accuracy η _{ a } is also used to measure the performance of each algorithm.
The performance comparisons between IL-RBF-BP and the other batch learning algorithms are shown in Table 2. For the Heart problem, the overall and average testing accuracies of IL-RBF-BP are clearly higher than those of SGBP, and the proposed algorithm outperforms ELM and KM-RBF by approximately 2.5–5 %. The average testing accuracy of IL-RBF-BP is 1.74 % lower than that of the SVM; however, the overall testing accuracy is approximately 3 % higher than that of the SVM, and fewer hidden neurons are needed.
3.2.4 UCI multi-class data sets: IS and VC problems
In this section, the IS and VC problems are used to evaluate the performance of the IL-RBF-BP algorithm. The output saturation is adjusted for the multi-class classification problems in the IL-RBF-BP algorithm. For the IS problem, the sample distribution range in each dimension is different, so the inputs of each algorithm are scaled appropriately to between 0 and +1. The main parameters of IL-RBF-BP, namely, the distance weighting factor, width, incremental learning threshold, number of BP hidden neurons, and momentum constant, are set to T = 1, σ = 0.3, δ = 0.001, M = 8, and α = 0.2, respectively. The IS problem is a well-balanced data set; the number of training samples in each class is 30, and the overall testing accuracy η _{ o } is used to measure the performance of each algorithm. For the VC problem, the sample distribution values of each dimension are between −1 and +1, and the main parameters of IL-RBF-BP, namely, the distance weighting factor, width, incremental learning threshold, number of BP hidden neurons, and momentum constant, are set to T = 1, σ = 0.4, δ = 0.001, M = 9, and α = 0.1, respectively. The numbers of training samples in the four classes are 119, 118, 98, and 89. The VC problem is a highly imbalanced data set, where the strong overlap between the classes influences the performance of each algorithm. The overall testing accuracy η _{ o } and average testing accuracy η _{ a } are used to measure the performance of each algorithm.
Table 3 shows the performance comparisons for the IS and VC problems. For the IS problem, the overall testing accuracy of IL-RBF-BP is approximately 5–6 % higher than those of MRAN and GAP-RBF and approximately 0.9–1.3 % higher than those of OS-ELM, SVM, and ELM. For the VC problem, the overall and average testing accuracies of IL-RBF-BP are approximately 9–11 % higher than those of MRAN and GAP-RBF and approximately 1.2–2.5 % higher than those of the SVM, ELM, and OS-ELM. The number of RBF hidden neurons and the training time of IL-RBF-BP are the largest because the strong overlap in the sample space increases both the number of RBF hidden neurons and the learning time; however, this also yields a higher classification accuracy.
3.3 Analysis of the parameters in the IL-RBF-BP algorithm
In this section, the parameter selection for the IL-RBF-BP algorithm is discussed, mainly the distance weighting factor T, the width σ, and the number of BP hidden neurons.
3.3.1 Selection of distance weighting factor T
In this paper, the parameter T is used for distance weighting and controls the interaction potential between two samples. By changing T, the nonlinear mapping of the potential γ can be adjusted.
To determine a proper choice of T, the standard deviation is considered to measure its impact. Here, the Twist classification problem is used in the experiment. The number of training samples is 500, and the number of testing samples is 4000; the other parameters are given as follows:

1)
Twist 1: Set d _{1} = 0.2, d _{2} = 0.5, and d _{3} = 0.8; the standard deviations of the two dimensions are 0.3281 and 0.3196, respectively. The width parameter is set to σ = 0.1.

2)
Twist 2: Set d _{1} = 2, d _{2} = 5, and d _{3} = 8,the standard deviation in each dimension is 3.2744 and 3.2689, respectively. The width parameter is set as σ = 1
Figure 10a shows that when the samples are not normalized, the standard deviation of each dimension of the Twist 2 sample set is relatively large, and the classification performance decreases as T increases. For the Twist 1 sample set, the standard deviation of each dimension is relatively small and the sensitivity of the classification accuracy to T is reduced; the maximum classification accuracy is achieved when T is set to 200. Thus, the choice of T should be inversely proportional to the standard deviation of each dimension, that is, \( T \propto 1 / \max_{i=1,2,\dots,t}\{\alpha_i\} \), where α_i is the standard deviation of the ith dimension and t is the sample dimension. Figure 10b further indicates that when the samples are normalized to [−1, 1], the dependence on T is reduced and a relatively stable classification accuracy can be achieved.
In this paper, for the Double-moon data set, the maximum standard deviation over the two dimensions is 8.6448, so a small T should be used and T is set as T = 1. For the Twist data set, the maximum standard deviation over the two dimensions is 0.3281, and T is set as T = 200.
In high-dimensional space, the sample distribution is often relatively sparse, so T is also considered to be inversely proportional to the sample dimension, that is, T ∝ 1/t. In this paper, for the IS classification problem, the input values in each dimension are scaled appropriately between 0 and +1; for the Heart and VC classification problems, the values in each dimension lie between −1 and 1. Thus, the impact of the standard deviation on T is eliminated. Taking the dimension information into account, a small T should be used for the IS, Heart, and VC classification problems, and T is set as T = 1.
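The two proportionality rules above, T ∝ 1/max_i{α_i} and T ∝ 1/t, can be combined into a simple selection heuristic. The sketch below is our own reading of the discussion, not a formula from the paper; the proportionality constant c is a hypothetical tuning factor.

```python
import numpy as np

def choose_T(X, c=1.0):
    """Heuristic: T inversely proportional to the largest per-dimension
    standard deviation and to the sample dimension t. The constant c is
    an illustrative tuning factor, not a value from the paper."""
    X = np.asarray(X, dtype=float)
    t = X.shape[1]                    # sample dimension
    max_std = X.std(axis=0).max()     # max standard deviation over dimensions
    return c / (max(max_std, 1e-12) * t)
```

Consistent with the experiments above, data with a large spread (such as the Double-moon set) receives a small T, while tightly concentrated data (such as Twist 1) receives a large T.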
3.3.2 Impact of the width σ on ILRBFBP
The width parameter σ controls the classification accuracy and generalization performance of an RBF network. In the ILRBFBP algorithm, the width is fixed and selected by cross validation. To narrow the range over which the width parameter is selected, we preprocess the sample space: if the sample distribution values of the dimensions vary considerably, as in the IS data set, the inputs to each algorithm are scaled appropriately between 0 and +1, whereas the inputs remain unchanged for the Heart and VC data sets.
In the proposed incremental learning algorithm, the potential function approach used to construct RBF hidden neurons incrementally must achieve effective coverage of the training sample space. Because samples in high-dimensional space are relatively sparse, a width that is too small may lead to a Gaussian kernel being established at every sample, in which case the proposed incremental learning algorithm becomes invalid: although the potential value of each sample in the training sample space is measured, the generated RBF hidden neurons do not cover other samples during the elimination of sample potentials, which causes Eq. (9) to fail. Moreover, excessive RBF hidden neurons make the network architecture redundant, which degrades the classification performance of the BP network. Thus, in the proposed ILRBFBP algorithm, an effective kernel width should be provided that generates enough RBF hidden neurons to cover the sample space. Note that the number of generated RBF hidden neurons should not approach the number of training samples; otherwise, the proposed algorithm is invalid.
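The incremental construction of RBF hidden neurons described above can be sketched in the spirit of potential elimination (as in subtractive clustering): repeatedly pick the sample with the highest remaining potential as a new center, then subtract the potential already covered by the Gaussian kernel placed there. This is an illustrative sketch only; the paper's actual potential definition, update rule, and Eq. (9) are not reproduced here, and all names and the threshold logic are our own.

```python
import numpy as np

def incremental_centers(X, sigma=0.3, delta=0.001, max_centers=None):
    """Illustrative potential-elimination center selection: the densest
    remaining region supplies the next RBF center until all residual
    potentials fall below a fraction delta of the initial maximum."""
    X = np.asarray(X, dtype=float)
    # Pairwise squared distances and initial Gaussian potentials (density).
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    pot = np.sum(np.exp(-d2 / (2 * sigma ** 2)), axis=1)
    p0 = pot.max()
    centers = []
    while pot.max() > delta * p0:
        k = int(np.argmax(pot))
        centers.append(X[k])
        # Remove the potential already covered by the new hidden neuron.
        pot -= pot[k] * np.exp(-d2[k] / (2 * sigma ** 2))
        pot = np.maximum(pot, 0.0)
        if max_centers is not None and len(centers) >= max_centers:
            break
    return np.array(centers)
```

On two well-separated clusters, the first two centers land in the two densest regions. The validity caveat above can be checked directly: if `len(centers)` approaches `len(X)`, the chosen σ is too small for the sample space and should be enlarged.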
Figure 11a, b shows the impact of the width on the overall classification accuracy and on the number of RBF hidden neurons, respectively. Figure 11 illustrates that for the Heart and VC data sets, when the width parameter is small, such as σ = 0.1 and σ = 0.2, the overall classification accuracy is poor because effective coverage of the input sample space is not achieved.
When the value of the width parameter is in a suitable range, the number of generated RBF hidden neurons changes, but a relatively stable classification accuracy is achieved. Once the width is given, the proposed ILRBFBP algorithm learns the sample space automatically; changes in the width parameter affect the coverage of the RBF hidden neurons and generate different sets of hidden neurons. Thus, the incremental learning strategy counteracts the effect of the width to some extent.
3.3.3 Impact of the number of BP hidden neurons on ILRBFBP
In the hybrid RBFBP network architecture, the nonlinear BP algorithm is used to adjust the weights of the MLP, which further improves the classification results. However, this approach increases the number of parameters to be selected, in particular the number of BP hidden neurons. To examine this issue, we conduct experiments on the UCI data sets and discuss the results.
Figure 12 shows the impact of the number of BP hidden neurons on ILRBFBP. For the Heart, IS, and VC problems, when the number of BP hidden neurons is greater than or equal to 4, the overall classification accuracy does not change considerably. In the hybrid RBFBP network, the mapping results of the RBF hidden neurons are processed and used as the input of the BP network, which improves the stability of the BP network and effectively prevents the BP algorithm from falling into local minima. Thus, the dependence on the number of BP hidden neurons is reduced. When the sample set is more complex, a momentum term can be used to improve the BP algorithm further.
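The hybrid architecture described above, an RBF kernel mapping feeding a small MLP trained by back propagation with a momentum term, can be sketched as follows. This is a minimal illustration under our own assumptions (layer sizes, learning rate, epoch count, MSE loss, and sigmoid activations are illustrative choices), not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_features(X, centers, sigma):
    """Gaussian kernel mapping performed by the RBF hidden layer."""
    d2 = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def train_mlp(H, Y, M=8, alpha=0.2, lr=1.0, epochs=5000):
    """One-hidden-layer MLP trained by back propagation with momentum
    constant alpha, taking the RBF mapping H as its input."""
    n, d = H.shape
    k = Y.shape[1]
    W1 = rng.normal(0, 0.5, (d, M)); b1 = np.zeros(M)
    W2 = rng.normal(0, 0.5, (M, k)); b2 = np.zeros(k)
    vW1 = np.zeros_like(W1); vW2 = np.zeros_like(W2)
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    for _ in range(epochs):
        A1 = sig(H @ W1 + b1)                 # BP hidden layer
        A2 = sig(A1 @ W2 + b2)                # network output
        dO = (A2 - Y) * A2 * (1 - A2)         # output-layer delta (MSE loss)
        dH = (dO @ W2.T) * A1 * (1 - A1)      # hidden-layer delta
        vW2 = alpha * vW2 - lr * (A1.T @ dO) / n   # momentum updates
        vW1 = alpha * vW1 - lr * (H.T @ dH) / n
        W2 += vW2; W1 += vW1
        b2 -= lr * dO.mean(axis=0); b1 -= lr * dH.mean(axis=0)
    return W1, b1, W2, b2
```

Because the RBF layer already performs a nonlinear kernel mapping, the MLP receives nearly separable inputs, which is consistent with the observed insensitivity to the exact number of BP hidden neurons once it reaches a modest size.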
4 Conclusions
In this paper, an incremental learning algorithm for the hybrid RBFBP (ILRBFBP) network classifier is proposed. The ILRBFBP algorithm uses a potential function to measure the density of the training sample space and incrementally generates RBF hidden neurons, enabling the effective estimation of the center and number of RBF hidden neurons. In this way, an RBF hidden layer whose size matches the complexity of the sample space can be built up. A hybrid RBFBP network architecture is designed to improve classification performance further, which shows good stability and generalization performance. The hybrid network simplifies the selection of the number of neurons in the BP hidden layer while further reducing the dependence on the space mapping of the RBF hidden layer.
The performance of the ILRBFBP algorithm has been compared with that of batch learning algorithms, such as SGBP, KMRBF, SVM, and ELM, and sequential learning algorithms, such as MRAN, GAPRBF, and OSELM, on artificial data sets and UCI data sets. Output label values are adjusted to prevent the output saturation problem in multiclass classification. Experiments demonstrate the superiority of the ILRBFBP algorithm.
In the future, we will focus on the optimization of the kernel width and on imbalanced data classification problems. In the ILRBFBP algorithm, the width is fixed and selected by cross validation, and adjusting the width parameter affects the location of the next center as well as the network size. Therefore, it is necessary to design an adaptive width adjustment that adapts to different regions of the sample space. In addition, for the imbalanced data classification problem, the samples in the boundary regions contain more classification information, so how to measure and select these samples is particularly important. Further studies are needed to address these concerns.
References
M Lin, K Tang, X Yao, Dynamic sampling approach to training neural networks for multiclass imbalance classification. IEEE Trans Neural Netw and Learning Systems 24(4), 647–660 (2013)
LQ Li, WX Xie, Intuitionistic fuzzy joint probabilistic data association filter and its application to multitarget tracking. Signal Process 96, 433–444 (2014)
YL Wei, JB Qiu, HR Karimi, M Wang, Hinfinity model reduction for continuoustime Markovian jump systems with incomplete statistics of mode information. Int J Syst Sci 45(7), 1496–1507 (2014)
YL Wei, JB Qiu, HR Karimi, M Wang, Filtering design for twodimensional Markovian jump systems with statedelays and deficient mode information. Inform Sci 269, 316–331 (2014)
YL Wei, JB Qiu, HR Karimi, M Wang, A new design of H∞ filtering for continuoustime Markovian jump systems with timevarying delay and partially accessible mode information. Signal Process 93(9), 2392–2407 (2013)
FY Meng, X Li, JH Pei, A feature point matching based on spatial order constraints bilateralneighbor vote. IEEE Trans Image Process 24(11), 4160–4171 (2015)
LX Guan, WX Xie, JH Pei, Segmented minimum noise fraction transformation for efficient feature extraction of hyperspectral images. Pattern Recogn 48(10), 3216–3226 (2015)
HC Nejad, O Khayat, B Azadbakht, M Mohammadi, Using feed forward neural network for electrocardiogram signal analysis in chaotic domain. J Intelligent and Fuzzy Systems 27(5), 2289–2296 (2014)
CH Weng, CK Huang, RP Han, Disease prediction with different types of neural network classifiers. Telematics Inform 33(2), 277–292 (2016)
C Lu, N Ma, ZP Wang, Fault detection for hydraulic pump based on chaotic parallel RBF network. EURASIP J on Advances in Signal Processing 49 (2011). doi: 10.1186/1687-6180-2011-49
J Moody, CJ Darken, Fast learning in networks of locally-tuned processing units. Neural Comput 1(2), 281–294 (1989)
D Lowe, Characterising complexity by the degrees of freedom in a radial basis function network. Neurocomputing 19(1–3), 199–209 (1998)
J Platt, A resourceallocating network for function interpolation. Neural Comput 3(2), 213–225 (1991)
V Kadirkamanathan, M Niranjan, A function estimation approach to sequential learning with neural networks. Neural Comput 5(6), 954–975 (1993)
L Yingwei, N Sundararajan, P Saratchandran, A sequential learning scheme for function approximation using minimal radial basis function. Neural Comput 9(2), 461–478 (1997)
GB Huang, P Saratchandran, N Sundararajan, An efficient sequential learning algorithm for growing and pruning RBF (GAPRBF) networks. IEEE Trans Syst Man Cybern B Cybern 34(6), 2284–2292 (2004)
GB Huang, P Saratchandran, N Sundararajan, A generalized growing and pruning RBF (GAPRBF) neural network for function approximation. IEEE Trans Neural Netw 16(1), 57–67 (2005)
M Bortman, M Aladjem, A growing and pruning method for radial basis function networks. IEEE Trans Neural Netw 20(6), 1030–1045 (2009)
H Yu, PD Reiner, T Xie, T Bartczak, BM Wilamowski, An incremental design of radial basis function networks. IEEE Trans Neural Netw and Learning Systems 25(10), 1793–1803 (2014)
S Suresh, D Keming, HJ Kim, A sequential learning algorithm for selfadaptive resource allocation network classifier. Neurocomputing 73(16–18), 3012–3019 (2010)
T Xie, H Yu, J Hewlett, P Rózycki, B Wilamowski, Fast and efficient secondorder method for training radial basis function networks. IEEE Trans Neural Netw and Learning Systems 23(4), 609–619 (2012)
C Constantinopoulos, A Likas, An incremental training method for the probabilistic RBF network. IEEE Trans Neural Netw 17(4), 966–974 (2006)
GB Huang, QY Zhu, CK Siew, Extreme learning machine: a new learning scheme of feedforward neural networks, in Proceedings of the International Joint Conference on Neural Networks (IJCNN 2004), pp. 985–990
NY Liang, GB Huang, P Saratchandran, N Sundararajan, A fast and accurate online sequential learning algorithm for feedforward networks. IEEE Trans Neural Netw 17(6), 1411–1423 (2006)
GB Huang, L CHEN, CK Siew, Universal approximation using incremental constructive feedforward networks with random hidden nodes. IEEE Trans Neural Netw 17(4), 879–892 (2006)
GB Huang, L Chen, Convex incremental extreme learning machine. Neurocomputing 70(16–18), 3056–3062 (2007)
GB Huang, L Chen, Enhanced random search based incremental extreme learning machine. Neurocomputing 71(16–18), 3460–3468 (2008)
G Feng, GB Huang, Q Lin, Error minimized extreme learning machine with growth of hidden nodes and incremental learning. IEEE Trans Neural Netw 20(8), 1352–1357 (2009)
Y LeCun, L Bottou, GB Orr, KR Müller, Efficient backprop. Lecture Notes Comput Sci 1524, 9–50 (1998)
JH Pei, WX Xie, Adaptive multi thresholds image segmentation based on potential function clustering. Chinese J Computers 22(7), 758–762 (1999)
OA Bashkerov, EM Braverman, IB Muchnik, Potential function algorithms for pattern recognition learning machines. Autom Remote Control 25(5), 692–695 (1964)
G Cybenko, Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signal, and Systems 2, 303–314 (1989)
S Haykin, Neural networks and learning machines. Third Edition (China Machine Press, China, 2009), pp. 61–63
C Blake, C Merz, UCI repository of machine learning databases (Department of Information and Computer Sciences, University of California, Irvine, 1998). available at http://archive.ics.uci.edu/ml/
CC Chang, CJ Lin, LIBSVM: a library for support vector machines (Department of Computer Science and Information Engineering, National Taiwan University, Taiwan, 2003). available at http://www.csie.ntu.edu.tw/~cjlin/libsvm/index.html
Acknowledgements
The authors thank the support provided by the National Science Foundation of China (No. 61331021, U1301251) and the Shenzhen Science and Technology Plan Project (JCYJ20130408173025036). The authors would like to thank the EditorinChief, the Associate Editor, and the Anonymous Reviewers for their helpful comments and suggestions which have greatly improved the quality of presentation.
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Wen, H., Xie, W., Pei, J. et al. An incremental learning algorithm for the hybrid RBFBP network classifier. EURASIP J. Adv. Signal Process. 2016, 57 (2016). https://doi.org/10.1186/s13634-016-0357-8
Keywords
 Radial basis function (RBF)
 Back propagation (BP)
 Incremental learning
 Hybrid
 Neural network