
# An incremental learning algorithm for the hybrid RBF-BP network classifier

Hui Wen^{1}, Weixin Xie^{1}, Jihong Pei^{1} (corresponding author) and Lixin Guan^{1}

**2016**:57

https://doi.org/10.1186/s13634-016-0357-8

© Wen et al. 2016

**Received:** 3 November 2015 **Accepted:** 27 April 2016 **Published:** 10 May 2016

## Abstract

This paper presents an incremental learning algorithm for the hybrid RBF-BP (ILRBF-BP) network classifier. A potential function is introduced to the training sample space in the space mapping stage, and an incremental learning method for the construction of RBF hidden neurons is proposed. The proposed method can incrementally generate RBF hidden neurons and effectively estimate the center and number of RBF hidden neurons by determining the density of different regions in the training sample space. A hybrid RBF-BP network architecture is designed to train the output weights. The output of the original RBF hidden layer is processed and connected with a multilayer perceptron (MLP) network; then, a back propagation (BP) algorithm is used to update the MLP weights. The RBF hidden neurons are used for nonlinear kernel mapping and the BP network is then used for nonlinear classification, which improves classification performance further. The ILRBF-BP algorithm is compared with other algorithms on artificial and UCI data sets, and the experiments demonstrate the superiority of the proposed algorithm.

## Keywords

- Radial basis function (RBF)
- Back propagation (BP)
- Incremental learning
- Hybrid
- Neural network

## 1 Introduction

In the fields of pattern recognition and data mining, various methods and models have been proposed to solve different problems. Existing methods can be divided into two levels: the data level and the algorithmic level. Data-level methods are mainly concerned with various sampling techniques [1]. Algorithmic-level methods apply or improve existing traditional learning algorithms, such as fuzzy clustering [2], Markovian jumping systems [3–5], k-nearest neighbors [6], and neural networks. Among neural networks, single-layer feed-forward networks (SLFNs) have been intensively studied over the past several decades and applied to problems in different fields, such as image recognition [7], signal processing [8], disease prediction [9], and industrial fault diagnosis [10]; in particular, radial basis function (RBF) neural networks offer an effective mechanism for nonlinear mapping and classification. In a typical RBF network, the number of hidden neurons is assigned a priori [11, 12], which leads to poor adaptability to different sample sets. Several sequential learning algorithms have been proposed to determine proper sizes of RBF network architectures. A resource allocation network (RAN) for constructing the RBF network is proposed in [13], which uses the novelty of incoming data as the learning strategy. A RAN algorithm based on an extended Kalman filter (RANEKF) is proposed in [14], which uses the extended Kalman filter algorithm instead of the least mean squares (LMS) algorithm. In [15], a minimal resource allocation network (MRAN) is proposed, which allows previously added centers to be deleted. The deletion strategy is based on the overall contribution of each hidden unit to the network output. A sequential learning algorithm for growing and pruning the RBF (GAP-RBF) is proposed in [16, 17]; this algorithm uses the significance of neurons as the learning strategy.
In [18], a Gaussian mixture model (GMM) to approximate the generalized growing and pruning evaluation formula is proposed; the GMM can be used for problems with a high-dimensional probability density distribution. In [19], an error correction (ErrCor) algorithm is used for function approximation; this algorithm can achieve a desired error rate with fewer RBF units. Other methods have also been established to identify a proper architecture while maintaining a desired accuracy [20–22].

Support vector machines (SVMs), which are maximal margin classifiers, can also be used to train SLFNs. RBFs and SVMs differ in that at the output layer, an SVM employs convex optimization to find an optimal linear classifier, whereas the output weights of an RBF network are typically estimated by a linear least squares algorithm, such as the LMS or recursive least squares (RLS) algorithm. Among other methods for training SLFNs, extreme learning machines (ELMs) are proposed in [23]; ELMs choose random hidden neuron parameters and calculate the output weights with the least squares algorithm. This method can achieve a fast training speed. Subsequently, an online sequential extreme learning machine (OS-ELM) algorithm, which can learn the input samples one by one or in data blocks, is proposed in [24]. In ELMs, the number of hidden nodes is assigned a priori, and many nonoptimal nodes may exist; thus, in [25–28], several types of growing and pruning techniques based on ELMs are proposed to effectively estimate the number of hidden neurons.

All of the algorithms for training SLFNs consist of two stages: (1) suitable feature mapping and (2) output weight adjustment. To train SLFNs efficiently, in this paper, a potential function is introduced in the feature mapping stage to train the sample space, and an incremental learning method of constructing RBF hidden neurons is proposed. Note that although the sequence learning RBF algorithms can also generate RBF hidden neurons automatically, because of the lack of global information in the sample space, the adaptability of complex sample space may be poor. In contrast to GAP-RBF, the proposed method does not require an assumption that the input samples obey a unified distribution. Furthermore, it does not need to fit the input sample distribution, such as the algorithm proposed in [18]. The proposed method utilizes global information about each class of training sample space and can generate RBF hidden neurons incrementally to adapt the sample space. By using a potential function to measure the density in each class of training sample space, the corresponding RBF hidden neurons that cover different sample areas can be established. The center of the Gaussian kernel function can be determined by learning the density of different regions in the training sample space. Once the width is given, a hidden neuron is generated and introduced into the RBF network, and a mechanism for eliminating the potentials of original samples is presented. This mechanism is ready for the next learning step, and thus, the RBF centers and number of hidden neurons can be effectively estimated. In this way, a suitable network size for RBF hidden layer that matches the complexity of the sample space can be built up. Thus, the proposed method solves the problem of dimension change from sample space mapping to feature space, and it reduces the restrictions on the sample sets, which is adaptable to more complex sample sets.

In this paper, a hybrid RBF-BP network architecture is designed for the output weight adjustment stage to further improve the generalization and classification performance. The output of the original RBF hidden layer is processed and connected with a new hidden layer, so that the processed RBF output, the new hidden layer, and the output layer together form a multilayer perceptron (MLP) whose input is the output of the original RBF hidden layer. Once the network architecture is established, a back propagation (BP) algorithm is used to update the weights of the MLP. In the hybrid RBF-BP network, the RBF hidden neurons are used for nonlinear kernel mapping, the complexity of the sample space is mapped onto the dimension of the BP network input layer, and the BP network is then used for nonlinear classification. The nonlinear kernel mapping can improve the separability of sample spaces, and a nonlinear BP classifier can then supply a better classification surface. In this manner, the improved network architecture combines the local response characteristics of the RBF network with the global response characteristics of the BP network, which simplifies the neuron number selection in the BP network hidden layer while further reducing the dependence on space mapping in the RBF hidden layer.

The incremental learning algorithm for the hybrid RBF-BP (ILRBF-BP), which is a batch learning algorithm, is proposed by combining the proposed incremental learning algorithm with the hybrid RBF-BP network architecture. In this paper, the performance of the ILRBF-BP algorithm is compared with other well-known learning algorithms, such as back propagation based on stochastic gradient descent (SGBP) [29], the RBF algorithm based on k-means clustering (KM-RBF) [12], GAP-RBF, SVM, and ELM, on artificial data sets. To measure the unique features of the proposed method, the k-means clustering learning algorithm based on the hybrid RBF-BP network (KMRBF-BP) is also compared with ILRBF-BP on artificial data sets. Because SGBP and KM-RBF are not suitable for more complex problems, for multi-class data sets, in addition to batch learning algorithms, such as SVM and ELM, other well-known sequential algorithms, such as MRAN, GAP-RBF, and OS-ELM, are also compared with the ILRBF-BP algorithm. The results indicate that the ILRBF-BP algorithm can provide a higher classification accuracy with comparable complexity.

The remainder of this paper is organized as follows. Section 2 describes the principal ideas of the ILRBF-BP, followed by a summary of the algorithm. Section 3 presents the experimental results and performance comparisons with other existing batch and sequential algorithms. Section 4 provides the conclusions of this study.

## 2 Main concepts of the ILRBF-BP algorithm

In this section, the main concepts of the ILRBF-BP algorithm are described. First, we provide the problem definition of the basic RBF network and then present the incremental learning algorithm for constructing RBF hidden neurons. Then, a hybrid RBF-BP network architecture is designed, and the ILRBF-BP algorithm is summarized. Finally, a method of adjusting the output saturation for multi-class classification problem is proposed.

### 2.1 Problem definition

Here, \( K \) is the number of RBF hidden neurons; \( \varphi_k(\mathbf{x}) \) is the response of the \( k \)th hidden node for an input vector \( \mathbf{x} \), where \( \mathbf{x} \in R^t \); \( \omega_k \) is its connecting weight to the output node, which determines the classification surface; and \( \mu_k \) and \( \sigma_k \) are the center and width of the \( k \)th hidden node, respectively, where \( k = 1, 2, \dots, K \).

The mapping of an input \( \mathbf{x} \in R^t \) through the RBF hidden neurons can be denoted as \( f: R^t \to (0, 1]^K \). Figure 1 shows the results of mapping input samples through the RBF hidden neurons, where the number of RBF hidden neurons is set as \( K = 3 \). In Fig. 1, we assume that every input sample vector is near the center of an RBF hidden neuron and that there is no overlap between the areas covered by different RBF hidden neurons.

Figure 1 illustrates that, to train an RBF network well, an effective method of mapping the input sample space should first be established, which means estimating the parameter set \( {\left\{K,{\mu}_k,{\sigma}_k\right\}}_{k=1}^K \). Then, an effective classification surface is needed, which depends on the output weight adjustment.
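As a minimal sketch of the hidden-layer mapping described above, the following Python snippet evaluates the responses of \( K \) Gaussian hidden neurons for one input vector. The exact kernel scaling (here \( 2\sigma_k^2 \) in the denominator), the function name `rbf_hidden_layer`, and the example centers and widths are assumptions made for illustration, not values taken from the paper.

```python
import math


def rbf_hidden_layer(x, centers, widths):
    """Map an input vector x to the K Gaussian hidden-neuron responses."""
    responses = []
    for mu, sigma in zip(centers, widths):
        sq_dist = sum((xi - mi) ** 2 for xi, mi in zip(x, mu))
        # Assumed Gaussian form; the 2 * sigma^2 scaling is a convention chosen here.
        responses.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))
    return responses


# Hypothetical example: K = 3 hidden neurons in a 2-D input space.
centers = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]
widths = [1.0, 1.0, 1.0]
phi = rbf_hidden_layer((0.1, -0.1), centers, widths)
```

With well-separated centers, as assumed in Fig. 1, a sample close to one center produces a response near 1 for that neuron and responses near 0 for the others.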

### 2.2 Incremental learning algorithm for constructing RBF hidden neurons

Here, \( \gamma(\mathbf{x}_1, \mathbf{x}_2) \) represents the interaction potential of two points \( \mathbf{x}_1, \mathbf{x}_2 \) in the input sample space, \( d(\mathbf{x}_1, \mathbf{x}_2) \) represents the distance measure, and \( T \) is a constant, which can be regarded as the distance weighting factor.

Consider a training sample set \( S \), where a specific label \( \mathbf{y}_i \), \( \mathbf{y}_i \in \{\mathbf{y}_i;\ i = 1, 2, \dots, h\} \), is attached to each sample vector \( \mathbf{x} \) in \( S \), and \( h \) is the number of pattern classes. Let \( S_i \) denote the set of feature vectors that are labeled \( \mathbf{y}_i \), \( S_i = \{\mathbf{x}_1^i, \mathbf{x}_2^i, \dots, \mathbf{x}_{N_i}^i\} \), where \( N_i \) is the number of training samples in the \( i \)th pattern class. Thus, \( S = \cup_{i=1}^h S_i \), \( S_i \cap S_j = \emptyset \), \( \forall i \ne j \). For a pair of samples \( (\mathbf{x}_u^i, \mathbf{x}_v^i) \) in \( S_i \), the interaction potential is \( \gamma(\mathbf{x}_u^i, \mathbf{x}_v^i) \).

The potential of each sample in \( S_i \) is then obtained from its interaction potentials with the other samples in \( S_i \).

Once the potentials of all samples in \( S_i \) are given, the sample with the maximum potential can be selected; it is assumed that this sample is \( \mathbf{x}_p^i \).

Here, \( k \) refers to the number of RBF hidden neurons generated. To simplify the calculation, the width is fixed and selected by cross validation.

a new hidden neuron is introduced into the RBF network and is ready to search for the next center; otherwise, the algorithm of constructing RBF hidden neurons in the current pattern class is over, where *δ* is a threshold.
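The procedure described in this subsection can be sketched in Python for a single pattern class. Several details are assumptions made for illustration only: the interaction potential is taken as \( 1/(1 + T d^2) \) with Euclidean distance, the potential of a sample is taken as the sum of its interaction potentials with the other samples, the elimination step subtracts the peak potential attenuated by a Gaussian kernel of width \( \sigma \), and the loop stops when the largest remaining potential is no greater than a positive threshold \( \delta \). The function names are hypothetical.

```python
import math


def interaction_potential(x1, x2, T):
    # Assumed form: the potential decays with the squared Euclidean distance, weighted by T.
    d2 = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return 1.0 / (1.0 + T * d2)


def select_centers(samples, T, sigma, delta):
    """Pick RBF centers for ONE pattern class by repeatedly taking the densest sample.

    Assumes delta > 0 so that an already selected sample is never picked again.
    """
    n = len(samples)
    if n == 0:
        return []
    # Potential of each sample: sum of its interaction potentials with the others.
    potential = [
        sum(interaction_potential(samples[u], samples[v], T)
            for u in range(n) if u != v)
        for v in range(n)
    ]
    centers = []
    while True:
        p = max(range(n), key=lambda v: potential[v])
        peak = potential[p]
        if peak <= delta:  # assumed stopping rule based on the threshold delta
            break
        centers.append(samples[p])
        # Assumed elimination rule: remove the new neuron's share of every potential.
        for v in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(samples[v], samples[p]))
            potential[v] -= peak * math.exp(-d2 / (2.0 * sigma ** 2))
    return centers


# Hypothetical data: two tight, well-separated groups inside one class.
samples = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
           (10.0, 10.0), (10.1, 10.0), (10.0, 10.1), (10.1, 10.1)]
centers = select_centers(samples, T=1.0, sigma=1.0, delta=0.5)
```

Under these assumptions, each dense region contributes one center, and the number of hidden neurons follows from the data and the chosen width rather than being fixed in advance.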

### 2.3 Hybrid RBF-BP network architecture

1. The input layer, which consists of \( t \) source nodes, where \( t \) is the dimensionality of the input vector.

2. The RBF hidden layer, which consists of a group of Gaussian kernel functions, where \( \mu_k \) and \( \sigma_k \) are the center and width of the \( k \)th hidden neuron, respectively, and \( K \) is the number of hidden neurons.

3. The BP hidden layer, which consists of the neurons between the RBF hidden layer and the output layer. The induced local field \( v_j^{(l)} \) for neuron \( j \) in layer \( l \) of the BP network is a weighted sum of the output signals of the neurons \( i \) in the previous layer \( l-1 \) of the BP network, where \( \omega_{ji}^{(l)} \) is the synaptic weight of neuron \( j \) in layer \( l \) that is fed from neuron \( i \) in layer \( l-1 \). Assuming the use of a sigmoid function, the output signal of neuron \( j \) in layer \( l \) is obtained by applying this function, with constants \( a \) and \( b \), to the induced local field. If neuron \( j \) is in the first BP network hidden layer, i.e., \( l = 1 \), the input signal is set to \( g_j(\mathbf{x}) \), where \( g_j(\mathbf{x}) \) is the double polar output of \( \varphi_j(\mathbf{x}) \).

4. The output layer. Let \( L \) be the depth of the BP network. Note that the depth counts the BP network input layer, the hidden layers, and the output layer, i.e., if \( l = 1 \), then \( L = 3 \). The output of the network is given by the output signals of the output-layer neurons.

In Fig. 3, the double polar processing can ensure the validity of the BP network input. The hybrid RBF-BP network architecture is designed such that the RBF network has good stability, where the activation response in the RBF hidden neurons has local characteristics and maps the output value between 0 and 1. Thus, the original samples including outliers will be limited to a finite space. When the results of mapping the RBF hidden neurons are processed and used for the input of the BP network, the convergence rate of the BP algorithm can be increased and local minima can be avoided. For a BP network, the activation response in hidden neurons has global characteristics, especially those regions not fully displayed in the training set. Therefore, the hybrid RBF-BP network architecture is a reasonable model; it provides a new strategy that combines the local characteristics of the RBF network with the global characteristics of the BP network. In addition, the hybrid network simplifies the number of neurons in the BP hidden layer while further reducing the dependence on space mapping in the RBF hidden layer.

A single hidden layer MLP neural network with an input-output mapping can provide an approximate realization of any continuous mapping [32]. Combined with the above discussion, in the hybrid network, we set the number of BP network hidden layers as *l* = 1.
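A forward pass through the hybrid architecture with one BP hidden layer can be sketched as follows. This is an illustration under stated assumptions: the double polar processing is taken as \( g = 2\varphi - 1 \), the sigmoid is taken as \( a \tanh(b v) \), the values of `A` and `B` are hypothetical, and explicit bias terms are added for convenience. None of these specifics are confirmed by the text above.

```python
import math

A, B = 1.7159, 2.0 / 3.0  # hypothetical values for the sigmoid constants a and b


def bipolar(phi):
    # Assumed double polar processing: rescale RBF outputs from (0, 1] to (-1, 1].
    return [2.0 * p - 1.0 for p in phi]


def dense_layer(inputs, weights, biases):
    # Induced local field v_j = sum_i w_ji * y_i + bias_j, then y_j = a * tanh(b * v_j).
    outputs = []
    for w_row, bias in zip(weights, biases):
        v = sum(w * y for w, y in zip(w_row, inputs)) + bias
        outputs.append(A * math.tanh(B * v))
    return outputs


def hybrid_forward(phi, hidden_w, hidden_b, out_w, out_b):
    """RBF responses -> double polar input -> BP hidden layer -> output layer."""
    g = bipolar(phi)
    hidden = dense_layer(g, hidden_w, hidden_b)
    return dense_layer(hidden, out_w, out_b)


# Hypothetical sizes: K = 3 RBF neurons, 2 BP hidden neurons, 2 output neurons.
out = hybrid_forward(
    [1.0, 0.5, 0.25],
    [[0.5, -0.2, 0.1], [0.0, 0.3, -0.4]], [0.0, 0.1],
    [[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0],
)
```

In this sketch the input dimension of the BP part equals the number of RBF hidden neurons, which is how the complexity of the sample space carries over to the BP network input layer.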

### 2.4 Adjustment of the output label values

Here, \( \mathbf{x}_n \in R^t \) is a \( t \)-dimensional observation feature vector and \( \mathbf{y}_n \in R^h \) is its coded class label. Here, \( h \) is the total number of classes, which is equal to the number of output neurons. If the observation data \( \mathbf{x}_n \) is assigned to the class label \( c \), then the \( c \)th element of \( \mathbf{y}_n = [y_1, \dots, y_c, \dots, y_h]^T \) is 1 and the other elements are −1, which can be denoted as follows:

\[ y_j = \begin{cases} 1, & j = c \\ -1, & \text{otherwise} \end{cases} \qquad j = 1, 2, \dots, h \]

\( \hat{\mathbf{y}}_n = [\hat{y}_1, \dots, \hat{y}_c, \dots, \hat{y}_h]^T \), where

## 3 Performance evaluation of the ILRBF-BP algorithm

Descriptions of the classifying data sets

Data sets | No. of features | No. of classes | No. of training | No. of testing | Attribute | Sources |
---|---|---|---|---|---|---|
Double-moon | 2 | 2 | 200~2000 | 4000 | Balance | Artificial |
Twist | 2 | 2 | 200~2000 | 4000 | Balance | Artificial |
Heart | 13 | 2 | 73 | 230 | Imbalance | UCI |
IS | 19 | 7 | 210 | 2100 | Balance | UCI |
VC | 18 | 4 | 424 | 422 | Imbalance | UCI |

The performance of ILRBF-BP is compared with other well-known batch and sequential learning algorithms, such as SGBP, KM-RBF, KMRBF-BP, SVM, ELM, MRAN, GAP-RBF, and OS-ELM, on different data sets. Note that the numbers of hidden neurons for SGBP, KM-RBF, KMRBF-BP, ELM, and OS-ELM are selected manually: the number of hidden neurons is varied several times, and the one with the lowest overall testing error is selected. For multi-class problems, the method of adjusting the output saturation is used. All simulations for each algorithm are performed ten times and are conducted in the MATLAB 2013 environment on an Intel(R) Core(TM) i5 3.2 GHz CPU with 4 GB of RAM. The simulations for the SVM are carried out using the popular LIBSVM package in C [35].

### 3.1 Performance measures

The classification accuracy of each class (\( \eta_i \)) is defined in terms of \( q_{ii} \), the number of correctly classified samples, and \( N_i^T \), the number of samples for the class \( \mathbf{y}_i \) in the training/testing data set. The overall (\( \eta_o \)) and average per-class (\( \eta_a \)) classification accuracies are defined in terms of \( h \), the number of classes, and \( N^T \), the number of training/testing samples.
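The three measures can be computed from a confusion matrix, as in the following sketch. It assumes the usual definitions: the per-class accuracy is the percentage of correctly classified samples of that class, the overall accuracy is the percentage of correctly classified samples among all samples, and the average accuracy is the mean of the per-class values. The function name and the example matrix are hypothetical, and every class is assumed to contain at least one sample.

```python
def classification_accuracies(confusion):
    """confusion[i][j] = number of class-i samples predicted as class j."""
    h = len(confusion)
    # Per-class accuracy: correct samples of class i over all samples of class i.
    per_class = [100.0 * confusion[i][i] / sum(confusion[i]) for i in range(h)]
    total = sum(sum(row) for row in confusion)
    # Overall accuracy: all correct samples over all samples.
    overall = 100.0 * sum(confusion[i][i] for i in range(h)) / total
    # Average per-class accuracy: unweighted mean over the h classes.
    average = sum(per_class) / h
    return per_class, overall, average


# Hypothetical imbalanced two-class result: 100 samples of class 0, 20 of class 1.
confusion = [[90, 10], [5, 15]]
per_class, overall, average = classification_accuracies(confusion)
```

On imbalanced data the overall value is dominated by the larger class, which is why the average per-class value is reported alongside it for the imbalanced problems.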

### 3.2 Performance comparison

#### 3.2.1 Artificial binary data sets: Double-moon problem

For the Double-moon data set, \( r = 10 \), \( \omega = 6 \) and \( d = -6 \). The main parameters of ILRBF-BP, namely the distance weighting factor, width, incremental learning threshold, number of BP hidden neurons, and momentum constant, are set as \( T = 1 \), \( \sigma = 3 \), \( \delta = 0.01 \), \( M = 5 \), and \( \alpha = 0 \), respectively. Figure 4c shows the classification results for the testing samples under these parameters. The classification results illustrate that the proposed algorithm can provide a superior classification surface. Figure 5 shows the training sample space covered using different width parameters, where each cover generates an RBF hidden neuron and the number of RBF hidden neurons is increased incrementally; the bold lines represent the first coverage region in each pattern class. In Fig. 5, as the width parameter increases, the region covered by each RBF hidden neuron increases accordingly, which affects the location of the next center and thus generates different RBF hidden neurons. Although the number of RBF hidden neurons changes, ILRBF-BP can still effectively cover each class of training samples. Thus, the incremental learning algorithm based on potential function clustering is feasible. ILRBF-BP adapts well to the sample space and is an effective algorithm for incrementally generating RBF hidden neurons for the Double-moon problem.

#### 3.2.2 Artificial binary data sets: Twist problem

For the Twist data set, \( d_1 = 0.2 \), \( d_2 = 0.5 \) and \( d_3 = 0.8 \). Compared to the Double-moon problem, the Twist classification problem is more complex and can thus be used to evaluate the classification performance of the different algorithms. The main parameters of ILRBF-BP, namely the distance weighting factor, width, incremental learning threshold, number of BP hidden neurons, and momentum constant, are set as \( T = 200 \), \( \sigma = 0.15 \), \( \delta = 0.01 \), \( M = 5 \), and \( \alpha = 0 \), respectively. Figure 7c shows the classification results for the testing samples under these parameters. The classification results illustrate that the proposed algorithm still provides a superior classification surface for the Twist classification problem. Figure 8 shows the training sample space covered using different width parameters, where each cover generates an RBF hidden neuron. In Fig. 8, the bold lines represent the first coverage region, which denotes the densest region in each pattern class. Although there is some overlap between different coverage regions, ILRBF-BP can still effectively cover each class of training samples and generate the corresponding RBF hidden neurons incrementally.

#### 3.2.3 UCI binary data set: Heart problem

In this section, the Heart problem in the UCI binary data set is used to evaluate the performance of the ILRBF-BP algorithm. In the Heart problem, the sample distribution values of each dimension are between 0 and 1, and the main parameters of distance weighting factor, width, incremental learning threshold, number of BP hidden neurons, and momentum constant in ILRBF-BP are set as *T* = 1, *σ* = 1.2, *δ* = 0.001, *M* = 5, and *α* = 0.1, respectively. As noted above, the Heart problem is an imbalanced classification problem. Thus, in addition to the overall testing *η*_{o}, the average testing *η*_{a} is also used to measure the performance of each algorithm.

Performance comparison for the Heart problem

Method | | Training time (s) | Training (%) | Testing *η*_{o} (%) | Testing *η*_{a} (%) |
---|---|---|---|---|---|
SGBP | 7 | 0.95 | 95.01 | 46.09 | 48.42 |
KM-RBF | 7 | 0.78 | 82.19 | 75.22 | 75.30 |
SVM | 39 | 0.08 | 100 | 77.39 | 81.81 |
ELM | 10 | 0 | 87.67 | 77.83 | 77.66 |
ILRBF-BP | 11 and 5 | 0.66 | 91.78 | 80.43 | 80.07 |

#### 3.2.4 UCI multi-class data sets: IS and VC problems

In this section, the IS and VC problems are used to evaluate the performance of the ILRBF-BP algorithm. The output saturation is adjusted for the multi-class classification problem in the ILRBF-BP algorithm. For the IS problem, the sample distribution range in each dimension is different, so the inputs of each algorithm are scaled appropriately between 0 and +1. The main parameters of distance weighting factor, width, incremental learning threshold, number of BP hidden neurons, and momentum constant in ILRBF-BP are set as *T* = 1, *σ* = 0.3, *δ* = 0.001, *M* = 8, and *α* = 0.2, respectively. The IS problem is a well-balanced data set; the number of training samples in each class is 30, and the overall testing *η*_{o} is used to measure the performance of each algorithm. For the VC problem, the sample distribution values of each dimension are between −1 and +1, and the main parameters of distance weighting factor, width, incremental learning threshold, number of BP hidden neurons, and momentum constant in ILRBF-BP are set as *T* = 1, *σ* = 0.4, *δ* = 0.001, *M* = 9, and *α* = 0.1, respectively. The numbers of training samples in the four classes are 119, 118, 98, and 89. The VC problem is a highly imbalanced data set, where the strong overlap between the classes influences the performance of each algorithm. The overall testing *η*_{o} and average testing *η*_{a} are used to measure the performance of each algorithm.

Performance comparisons for the IS and VC problems

Data sets | Method | | Training time (s) | Testing *η*_{o} (%) | Testing *η*_{a} (%) |
---|---|---|---|---|---|
IS | SVM | 96 | 11.61 | 90.62 | – |
IS | MRAN | 78 | 11.68 | 85.82 | – |
IS | GAP-RBF | 87 | 5.77 | 86.34 | – |
IS | ELM | 49 | 0 | 90.23 | – |
IS | OS-ELM | 100 | 0.01 | 90.67 | – |
IS | ILRBF-BP | 77 and 8 | 2.09 | 91.57 | – |
VC | SVM | 234 | 10.74 | 68.72 | 67.99 |
VC | MRAN | 105 | 10.38 | 60.24 | 60.02 |
VC | GAP-RBF | 81 | 9.87 | 58.94 | 58.17 |
VC | ELM | 300 | 0.09 | 68.01 | 67.39 |
VC | OS-ELM | 300 | 0.12 | 68.95 | 67.56 |
VC | ILRBF-BP | 258 and 9 | 11.53 | 70.17 | 69.43 |

### 3.3 Analysis of the parameters in the ILRBF-BP algorithm

In this section, the parameter selection for the ILRBF-BP algorithm is discussed, which mainly refers to the distance weighting factor *T*, width *σ* and number of BP hidden neurons.

#### 3.3.1 Selection of distance weighting factor *T*

In this paper, parameter *T* is used for distance weighting, which can be used to control the interaction potential between two samples. By changing *T*, the nonlinear mapping of the potential *γ* can be achieved.

To select \( T \), the standard deviation is considered in this paper to measure the impact on \( T \). Here, the Twist classification problem is used in the experiment, with 500 training samples and 4000 testing samples; the other parameters are given as follows:

1) Twist 1: Set \( d_1 = 0.2 \), \( d_2 = 0.5 \), and \( d_3 = 0.8 \); the standard deviation in each dimension is 0.3281 and 0.3196, respectively. The width parameter is set as \( \sigma = 0.1 \).

2) Twist 2: Set \( d_1 = 2 \), \( d_2 = 5 \), and \( d_3 = 8 \); the standard deviation in each dimension is 3.2744 and 3.2689, respectively. The width parameter is set as \( \sigma = 1 \).

For the Twist 2 sample set, as \( T \) increases, the classification performance is reduced. For the Twist 1 sample set, the standard deviation of each dimension is relatively small and the sensitivity of the classification accuracy to \( T \) is reduced; however, when \( T \) is selected as 200, the maximum classification accuracy is achieved. Thus, the choice of \( T \) should be inversely proportional to the standard deviation of each dimension, that is, \( T \propto 1 / \max_{i=1,2,\dots,t}\left\{\alpha_i\right\} \), where \( \alpha_i \) is the standard deviation of the \( i \)th dimension and \( t \) is the sample dimension. Figure 10b further indicates that when the samples are normalized to [−1, 1], the dependence on \( T \) is reduced and a relatively stable classification accuracy can be achieved.

In this paper, for the Double-moon data set, the maximum standard deviation of two dimensions is 8.6448, so a small *T* should be provided and *T* is set as *T* = 1. For the Twist data set, the maximum standard deviation of two dimensions is 0.3281, and *T* is set as *T* = 200.

In high-dimensional space, the sample distribution is often relatively sparse. The sample dimension is considered to be inversely proportional to *T*, thus *T* ∝ 1/*t*. In this paper, for the IS classification problem, the input values in each dimension are scaled appropriately between 0 and +1. For the Heart and VC classification problems, the values in each dimension are between −1 and 1. Thus, the impact of the standard deviation on *T* is eliminated. Taking into account the dimension information, for the IS, Heart, and VC classification problems, a small *T* should be provided, and *T* is set as *T* = 1.
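The role of the standard deviation in choosing *T* can be illustrated with a small sketch. It computes the largest per-dimension standard deviation of a sample set, which the discussion above relates inversely to *T*, and shows that min-max scaling to [−1, 1] removes the difference between two data sets that differ only in scale. The population standard deviation, the helper names, and the toy samples are assumptions for illustration; the text does not state a proportionality constant, so none is computed.

```python
import statistics


def max_std(samples):
    """Largest per-dimension (population) standard deviation of equal-length vectors."""
    return max(statistics.pstdev(column) for column in zip(*samples))


def scale_to_range(samples, lo=-1.0, hi=1.0):
    """Min-max scale every dimension to [lo, hi]; constant dimensions map to lo."""
    columns = []
    for column in zip(*samples):
        c_min, c_max = min(column), max(column)
        span = (c_max - c_min) or 1.0
        columns.append([lo + (hi - lo) * (v - c_min) / span for v in column])
    return [list(row) for row in zip(*columns)]


# Hypothetical samples; `large` is `small` stretched by 10, as Twist 2 is to Twist 1.
small = [(0.2, 0.1), (0.5, 0.9), (0.8, 0.4), (0.3, 0.7)]
large = [(10 * a, 10 * b) for a, b in small]

small_scaled = scale_to_range(small)
large_scaled = scale_to_range(large)
```

Before scaling, `max_std(large)` is ten times `max_std(small)`, so the two sets would call for different values of *T*; after scaling, both sets have the same spread, which is consistent with using a single value of *T* for the normalized data sets.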

#### 3.3.2 Impact of the width *σ* on ILRBF-BP

The width parameter *σ* can be used to control the classification accuracy and generalization performance in a RBF network. In the ILRBF-BP algorithm, the width is fixed and selected by cross validation. To reduce the range of the width parameter value selection, we conduct preprocessing for the sample space. If the sample distribution values of each dimension vary considerably, such as in the IS data set, the inputs to each algorithm are scaled appropriately between 0 and +1, whereas the inputs to each algorithm remain unchanged in the Heart and VC data sets.

In the proposed incremental learning algorithm, the potential function approach must construct RBF hidden neurons incrementally so that they effectively cover the training sample space. Because the samples in high-dimensional space are relatively sparse, a width that is too small may lead to a Gaussian kernel being established at each sample, in which case the proposed incremental learning algorithm is invalid. The reason is that although the potential value of each sample in the training sample space is measured, in the process of eliminating the potential value of the sample, the generated RBF hidden neurons do not cover other samples, which leads to a failure of Eq. (9); in addition, excessive RBF hidden neurons lead to redundancy in the network architecture, which affects the classification performance of the BP network. Thus, in the proposed ILRBF-BP algorithm, an effective kernel width parameter should be provided, which can generate proper RBF hidden neurons to cover the sample space. Note that the number of generated RBF hidden neurons should not be close to the number of training samples; otherwise, the proposed algorithm is invalid.

For \( \sigma = 0.1 \) and \( \sigma = 0.2 \), the overall classification accuracy is poor, and effective coverage of the input sample space is not achieved.

When the value of the width parameter is in a suitable range, the number of generated RBF hidden neurons will change, but a relatively stable classification accuracy can be achieved. For the proposed ILRBF-BP algorithm, once the width is given, it can learn the sample space automatically, and the changes in the width parameter will affect the coverage of RBF hidden neurons and generate different RBF hidden neurons. Thus, the incremental learning strategy can counteract the effect of the width to some extent.

#### 3.3.3 Impact of the number of BP hidden neurons on ILRBF-BP

## 4 Conclusions

In this paper, an incremental learning algorithm for the hybrid RBF-BP (ILRBF-BP) network classifier is proposed. The ILRBF-BP algorithm uses a potential function to measure the density of the training sample space and incrementally generates RBF hidden neurons, enabling the effective estimation of the center and number of RBF hidden neurons. In this way, a suitable network size for the RBF hidden layer that matches the complexity of the sample space can be built up. A hybrid RBF-BP network architecture is designed to improve classification performance further, which shows good stability and generalization performance. The hybrid network simplifies the selection of the number of neurons in the BP hidden layer while further reducing the dependence on space mapping in the RBF hidden layer.

The performance of the ILRBF-BP algorithm has been compared with other batch learning algorithms, such as SGBP, KM-RBF, SVM, and ELM, and sequential learning algorithms, such as MRAN, GAP-RBF, and OS-ELM, on artificial and UCI data sets. The method of adjusting output label values is used to prevent the output saturation problem for multi-class classification. Experiments demonstrate the superiority of the ILRBF-BP algorithm.

In the future, we will focus on the optimization of kernel width and imbalanced data classification problems. In the ILRBF-BP algorithm, the width is fixed and selected by cross validation and the adjustment of width parameter will affect the location of next center, as well as the network size. Therefore, it is necessary to design an adaptive width adjustment to adapt to the different regions of the sample space. In addition, for the imbalanced data classification problem, the samples in the boundary regions contain more classification information, thus how to measure and select these samples is particularly important. Further studies are needed to address these concerns.

## Declarations

### Acknowledgements

The authors thank the support provided by the National Science Foundation of China (No. 61331021, U1301251) and the Shenzhen Science and Technology Plan Project (JCYJ20130408173025036). The authors would like to thank the Editor-in-Chief, the Associate Editor, and the Anonymous Reviewers for their helpful comments and suggestions which have greatly improved the quality of presentation.

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

## References

- M Lin, K Tang, X Yao, Dynamic sampling approach to training neural networks for multiclass imbalance classification. IEEE Trans Neural Netw and Learning Systems **24**(4), 647–660 (2013)
- L-Q Li, W-X Xie, Intuitionistic fuzzy joint probabilistic data association filter and its application to multitarget tracking. Signal Process **96**, 433–444 (2014)
- Y-L Wei, J-B Qiu, HR Karimi, M Wang, H-infinity model reduction for continuous-time Markovian jump systems with incomplete statistics of mode information. Int J Syst Sci **45**(7), 1496–1507 (2014)
- Y-L Wei, J-B Qiu, HR Karimi, M Wang, Filtering design for two-dimensional Markovian jump systems with state-delays and deficient mode information. Inform Sci **269**, 316–331 (2014)
- Y-L Wei, J-B Qiu, HR Karimi, M Wang, A new design of H∞ filtering for continuous-time Markovian jump systems with time-varying delay and partially accessible mode information. Signal Process **93**(9), 2392–2407 (2013)
- F-Y Meng, X Li, J-H Pei, A feature point matching based on spatial order constraints bilateral-neighbor vote. IEEE Trans Image Process **24**(11), 4160–4171 (2015)
- L-X Guan, W-X Xie, J-H Pei, Segmented minimum noise fraction transformation for efficient feature extraction of hyperspectral images. Pattern Recogn **48**(10), 3216–3226 (2015)
- HC Nejad, O Khayat, B Azadbakht, M Mohammadi, Using feed forward neural network for electrocardiogram signal analysis in chaotic domain. J Intelligent and Fuzzy Systems **27**(5), 2289–2296 (2014)
- CH Weng, CK Huang, RP Han, Disease prediction with different types of neural network classifiers. Telematics Inform **33**(2), 277–292 (2016)
- C Lu, N Ma, ZP Wang, Fault detection for hydraulic pump based on chaotic parallel RBF network. EURASIP J on Advances in Signal Processing **49**, (2011). doi: 10.1186/1687-6180-2011-49
- J Moody, CJ Darken, Fast learning in networks of locally-tuned processing. Neurocomputing **1**(2), 281–294 (1989)
- D Lowe, Characterising complexity by the degrees of freedom in a radial basis function network. Neurocomputing **19**(1-3), 199–209 (1998)
- J Platt, A resource-allocating network for function interpolation. Neural Comput **3**(2), 213–225 (1991)
- V Kadirkamanathan, M Niranjan, A function estimation approach to sequential learning with neural networks. Neural Comput **5**(6), 954–975 (1993)
- L Yingwei, N Sundararajan, P Saratchandran, A sequential learning scheme for function approximation using minimal radial basis function. Neural Comput **9**(2), 461–478 (1997)
- G-B Huang, P Saratchandran, N Sundararajan, An efficient sequential learning algorithm for growing and pruning RBF (GAP-RBF) networks. IEEE Trans Syst Man Cybern B Cybern **34**(6), 2284–2292 (2004)
- G-B Huang, P Saratchandran, N Sundararajan, A generalized growing and pruning RBF (GAP-RBF) neural network for function approximation. IEEE Trans Neural Netw **16**(1), 57–67 (2005)
- M Bortman, M Aladjem, A growing and pruning method for radial basis function networks. IEEE Trans Neural Netw **20**(6), 1030–1045 (2009)
- H Yu, PD Reiner, T Xie, T Bartczak, BM Wilamowski, An incremental design of radial basis function networks. IEEE Trans Neural Netw and Learning Systems **2**(10), 1793–1803 (2014)
- S Suresh, D Keming, HJ Kim, A sequential learning algorithm for self-adaptive resource allocation network classifier. Neurocomputing **73**(16-18), 3012–3019 (2010)
- T Xie, H Yu, J Hewlett, P Rózycki, B Wilamowski, Fast and efficient second-order method for training radial basis function networks. IEEE Trans Neural Netw and Learning Systems **23**(4), 609–619 (2012)
- C Constantinopoulos, A Likas, An incremental training method for the probabilistic RBF network. IEEE Trans Neural Netw **17**(4), 966–974 (2006)
- G-B Huang, Q-Y Zhu, C-K Siew, A new learning scheme of feedforward neural, in Proceedings of International Joint Conference on Neural Networks (IJCNN 2004), pp. 985–99
- N-Y Liang, G-B Huang, P Saratchandran, N Sundararajan, A fast and accurate online sequential learning algorithm for feedforward networks. IEEE Trans Neural Netw **17**(6), 1411–1423 (2006)
- G-B Huang, L Chen, C-K Siew, Universal approximation using incremental constructive feedforward networks with random hidden nodes. IEEE Trans Neural Netw **17**(4), 879–892 (2006)
- G-B Huang, L Chen, Convex incremental extreme learning machine. Neurocomputing **70**(16-18), 3056–3062 (2007)
- G-B Huang, L Chen, Enhanced random search based incremental extreme learning machine. Neurocomputing **71**(16-18), 3460–3468 (2008)
- G Feng, G-B Huang, Q Lin, Error minimized extreme learning machine with growth of hidden nodes and incremental learning. IEEE Trans Neural Netw **20**(8), 1352–1357 (2009)
- Y LeCun, L Bottou, GB Orr, K-R Müller, Efficient backprop. Lecture Notes Comput Sci **1524**, 9–50 (1998)
- J-H Pei, W-X Xie, Adaptive multi thresholds image segmentation based on potential function clustering. Chinese J Computers **22**(7), 758–762 (1999)
- OA Bashkerov, EM Braverman, IB Muchnik, Potential function algorithms for pattern recognition learning machines. Autom Remote Control **25**(5), 692–695 (1964)
- G Cybenko, Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signal, and Systems **2**, 303–314 (1989)
- S Haykin, *Neural networks and learning machines. Third Edition* (China Machine Press, China, 2009), pp. 61–63
- C Blake, C Merz, *UCI repository of machine learning databases* (Department of Information and Computer Sciences, University of California, Irvine, 1998). Available at http://archive.ics.uci.edu/ml/
- C-C Chang, C-J, *LIBSVM: a library for support vector machines* (Department of Computer Science and Information Engineering, National Taiwan University, Taiwan, 2003). Available at http://www.csie.ntu.edu.tw/~cjlin/libsvm/index.html