Bearing Fault Detection Using Artificial Neural Networks and Genetic Algorithm

A study is presented comparing the performance of bearing fault detection using three types of artificial neural networks (ANNs): the multilayer perceptron (MLP), the radial basis function (RBF) network, and the probabilistic neural network (PNN). The time-domain vibration signals of a rotating machine with normal and defective bearings are processed for feature extraction. The features extracted from the original and preprocessed signals are used as inputs to all three ANN classifiers for two-class (normal or fault) recognition. The characteristic parameters, namely, the number of hidden-layer nodes for the MLP and the width of the Gaussian function for the RBF and PNN, are optimized along with the selection of input features using genetic algorithms (GAs). For each trial, the ANNs are trained with a subset of the experimental data for known machine conditions and tested using the remaining data. The procedure is illustrated using the experimental vibration data of a rotating machine with and without bearing faults. The results show the relative effectiveness of the three classifiers in detecting the bearing condition.


INTRODUCTION
Machine condition monitoring is gaining importance in industry because of the need to increase reliability and to decrease the possibility of production loss due to machine breakdown. The use of vibration and acoustic emission (AE) signals is quite common in the field of condition monitoring of rotating machinery. By comparing the signals of a machine running in normal and faulty conditions, detection of faults like mass unbalance, rotor rub, shaft misalignment, gear failures, and bearing defects is possible. These signals can also be used to detect incipient failures of machine components through an online monitoring system, reducing the possibility of catastrophic damage and downtime. Some of the recent works in the area are listed in [1,2,3,4,5,6,7,8]. Although visual inspection of the frequency-domain features of the measured signals is often adequate to identify the faults, there is a need for a reliable, fast, and automated diagnostic procedure.
Artificial neural networks (ANNs) have potential applications in automated detection and diagnosis of machine conditions [3,4,7,8,9,10]. Multilayer perceptrons (MLPs) and radial basis function (RBF) networks are the most commonly used ANNs [11,12,13,14,15], though interest in probabilistic neural networks (PNNs) has also been increasing recently [16,17]. The main difference among these methods lies in how they partition the data into different classes. The applications of ANNs are mainly in the areas of machine learning, computer vision, and pattern recognition because of their high accuracy and good generalization capability [11,12,13,14,15,16,17,18]. Though MLPs have been used in machine condition monitoring for quite some time, the applications of RBFs and PNNs are relatively recent [3,19,20,21]. In [19], a procedure was presented for condition monitoring of rolling element bearings, comparing the performance of MLP and RBF classifiers with all calculated signal features and fixed classifier parameters. In that work, vibration signals were acquired under different operating speeds and bearing conditions. The statistical features of the signals, both original and preprocessed (differentiation and integration, high- and lowpass filtering), along with spectral data of the signals, were used for classification of bearing conditions.
However, there is a need to make the classification process faster and more accurate using the minimum number of features that primarily characterize the system conditions, with an optimized structure or parameters of the ANNs [3,22]. Genetic algorithms (GAs) have been used for automatic feature selection in machine condition monitoring [3,21,22,23]. In [22], a GA-based approach was introduced for selecting the input features and the number of neurons in the hidden layer. The features were extracted from the entire signal under each condition and operating speed [19]. In [23], some preliminary results of MLPs and GAs were presented for fault detection of gears using only the time-domain features of vibration signals. In that approach, the features were extracted from finite segments of two signals: one with normal condition and the other with defective gears.
In the present work, the procedure of [23] is extended to the diagnosis of bearing condition using vibration signals through three types of ANN classifiers. Comparisons are made between the performance of the three types of ANNs, both with and without automatic selection of input features and classifier parameters. The classifier parameters are the number of hidden-layer neurons in MLPs and the width of the radial basis function in RBFs and PNNs. Figure 1 shows a flow diagram of the proposed procedure. The selection of input features and classifier parameters is optimized using a GA-based approach. The features, namely, mean, root mean square, variance, skewness, kurtosis, and normalized higher-order (up to ninth) central moments, are used to distinguish between normal and defective bearings. Moments of order higher than nine are not considered in the present work, to keep the input vector within a reasonable size without sacrificing the accuracy of diagnosis. The roles of different vibration signals are investigated. The results show the effectiveness of the features extracted from the acquired and preprocessed signals in diagnosing the machine condition. The procedure is illustrated using the vibration data of an experimental setup with normal and defective bearings.

VIBRATION DATA
Figure 2 shows the schematic diagram of the experimental test rig. The rotor is supported on two ball bearings (MB 204) with eight rolling elements each. The rotor was driven by a three-phase AC induction motor through a flexible coupling. The motor could be run in the speed range of 0-10,000 rpm using a variable-frequency drive (VFD) controller. For the present experiment, the motor speed was maintained at 600 rpm. Two accelerometers were mounted at 90° to each other on the right-hand-side (RHS) bearing support to measure vibrations in the vertical and horizontal directions (x and y). Separate measurements were obtained for two conditions: one with normal bearings and the other with an induced fault on the outer race of the RHS bearing. The outer race fault was created as a small line using electro-discharge machining (EDM) to simulate the initiation of a bearing defect. It should be mentioned that only one type of bearing fault has been considered in the present study, to assess the effectiveness of the proposed approach for two-class recognition. Diagnosis of different types and levels of bearing faults is important for optimal maintenance but is outside the scope of the present work. Each accelerometer signal was connected through a charge amplifier and an anti-aliasing filter to a channel of a PC-based data acquisition system. One pulse per revolution of the shaft was sensed by a proximity sensor, and this signal was used as a trigger to start the sampling process. The vibration signals were sampled simultaneously at a rate of 49152 samples/s per channel. The lower and higher cutoff frequencies of each charge amplifier were set at 2 Hz and 100 kHz, respectively. The cutoff frequency of each anti-aliasing filter was set at 24 kHz, about half of the sampling rate. The number of samples collected for each channel was 24576 for each bearing condition: normal and faulty. The experiment was repeated under the same operating conditions, and a further set of 24576 data points was acquired for each accelerometer signal and bearing condition. These time-domain data were preprocessed to extract the features, similar to [10], for use as inputs to the ANNs. Half of the first data set was used for training and the other half for testing the ANNs, while the entire second data set was used for testing.

Signal statistical characteristics
Two sets of experimental data, each with normal and defective bearings, were acquired. For each set, two vibration signals consisting of 24576 samples (q_i) were obtained using the accelerometers in the vertical and horizontal directions to monitor the machine condition. The magnitude of the vibration was constructed from the two component signals as z = √(x² + y²). These signals were divided into 24 segments (bins) of n = 1024 samples each. An alternative approach would have been to take 24 individual measurements from 24 different runs. However, the present approach was used, similar to [10], to assess the effectiveness of the proposed procedure in situations where multiple runs of data may not be feasible, especially in an actual industrial setting. Each of these data segments was further processed to extract the following features (1-9): mean (μ), root mean square (RMS), variance (σ²), skewness (normalized third central moment γ₃), kurtosis (normalized fourth central moment γ₄), and normalized fifth to ninth central moments (γ₅-γ₉), as follows:

μ = E{q}, RMS = √(E{q²}), σ² = E{(q − μ)²}, γ_n = E{(q − μ)ⁿ}/σⁿ, n = 3, ..., 9,

where E{·} represents the expected value of the function.
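As a minimal illustration of the per-segment feature computation, the definitions above can be sketched in Python/NumPy (the original work used Matlab; the function name and population-form moments here are assumptions of this sketch):

```python
import numpy as np

def segment_features(q):
    """Compute the nine time-domain features (1-9) for one signal segment:
    mean, RMS, variance, and normalized central moments gamma_3..gamma_9."""
    q = np.asarray(q, dtype=float)
    mu = q.mean()                          # mean
    rms = np.sqrt(np.mean(q ** 2))         # root mean square
    var = np.mean((q - mu) ** 2)           # variance, E{(q - mu)^2}
    sigma = np.sqrt(var)
    # normalized central moments gamma_n = E{(q - mu)^n} / sigma^n, n = 3..9
    gammas = [np.mean((q - mu) ** n) / sigma ** n for n in range(3, 10)]
    return np.array([mu, rms, var] + gammas)

# Demo on a synthetic full-period sinusoid segment (illustrative data only)
q = np.sin(2 * np.pi * np.arange(1024) / 1024)
f = segment_features(q)
```

For a pure sinusoid over a full period the mean and skewness vanish, the RMS is 1/√2, and the kurtosis is 1.5, which gives a quick sanity check of the formulas.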
Figure 3 shows plots of some of these features extracted from the vibration signals (q_i) x, y, and z of the first set of data, each row representing the features for one signal. Only a few of the features are shown as representatives of the full feature set.
It is important to note that, in the present work, only two (normal and faulty) bearing conditions have been considered, and the sample size for feature extraction was chosen as 1024 to keep the length of the acquired data within a reasonable limit. The features were also calculated with double the number of samples, with no significant difference. However, for consideration of multiple fault conditions, data of longer duration (in terms of number of cycles or shaft revolutions) and a larger sample size for feature extraction, especially for the higher-order (fifth-ninth) moments, may be necessary.

Time derivative and integral of signals
The high-and low-frequency content of the raw signals can be obtained from the corresponding time derivatives and the integrals.In this work, the first time derivative (dq) and the integral (iq) have been defined, using sampling time as a factor, as follows:

dq(k)
The derivative and the integral of each signal were processed to extract an additional set of 18 features (10-27).
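These discrete derivative and integral operations can be sketched as follows (the exact difference scheme is assumed, since the text states only that the sampling time enters as a factor):

```python
import numpy as np

def derivative_integral(q, fs=49152.0):
    """First-difference derivative and cumulative-sum integral of a signal,
    scaled by the sampling time dt = 1/fs. The prepend keeps array lengths
    equal to the input; this boundary choice is an assumption."""
    q = np.asarray(q, dtype=float)
    dt = 1.0 / fs
    dq = np.diff(q, prepend=q[0]) / dt   # dq(k) ~ [q(k) - q(k-1)] / dt
    iq = np.cumsum(q) * dt               # iq(k) ~ iq(k-1) + q(k) * dt
    return dq, iq
```

Each of the two derived signals is then fed through the same nine-feature extraction as the raw signal, yielding the additional 18 features.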

High-and lowpass filtering
The raw signals were also processed through low- and highpass filters with a cutoff frequency of one-tenth (f/10) of the sampling rate (f = 49152 Hz). The cutoff frequency was chosen to minimize the effect of sampling on the low- and high-frequency characteristics of the signals. These filtered signals were processed to obtain another set of 18 features (28-45), leading to a total of 45 features.
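A possible realization of this band splitting is sketched below. The Butterworth filter type, order, and zero-phase filtering are assumptions; the paper specifies only the cutoff frequency:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def split_bands(q, fs=49152.0, order=4):
    """Low- and highpass the raw signal at fs/10 (here 4915.2 Hz).
    Filter type and order are illustrative assumptions."""
    fc = fs / 10.0
    b_lo, a_lo = butter(order, fc, btype='low', fs=fs)
    b_hi, a_hi = butter(order, fc, btype='high', fs=fs)
    # filtfilt gives zero-phase filtering (forward-backward pass)
    return filtfilt(b_lo, a_lo, q), filtfilt(b_hi, a_hi, q)
```

A 100 Hz test tone, far below the cutoff, should pass the lowpass branch nearly unchanged and be strongly attenuated by the highpass branch.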

Normalization
The total feature set is a 45 × 144 × 2 array, where each row represents a feature and the 144 columns cover the three signals, 24 segments per signal, and two bearing conditions, with the third dimension for the two sets of run. Each feature was normalized by dividing each row by its absolute maximum value, keeping it within ±1 for better speed and success of the network training. A second normalization scheme, giving each feature row zero mean and a standard deviation of 1, was also attempted. A third scheme was examined as well, making the features zero mean and then normalizing by the absolute maximum value. The results comparing the effectiveness of these normalization schemes are discussed in Section 6.5. However, it should be mentioned that the absolute-maximum (magnitude) normalization scheme exploits the large peaks present in the fault signal, relatively lowering the normal rotational components. This changes the relative statistics of the signals with and without faults, leading to better classification success.
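The three normalization schemes can be sketched as row-wise operations on the feature matrix (function and scheme names are illustrative):

```python
import numpy as np

def normalize_rows(F, scheme="magnitude"):
    """Row-wise feature normalization.
    'magnitude'  : divide each feature row by its absolute maximum (within +-1)
    'statistical': zero mean and unit standard deviation per row
    'centered'   : zero mean, then divide by the absolute maximum"""
    F = np.asarray(F, dtype=float)
    if scheme == "magnitude":
        return F / np.abs(F).max(axis=1, keepdims=True)
    mu = F.mean(axis=1, keepdims=True)
    if scheme == "statistical":
        return (F - mu) / F.std(axis=1, keepdims=True)
    if scheme == "centered":
        G = F - mu
        return G / np.abs(G).max(axis=1, keepdims=True)
    raise ValueError(f"unknown scheme: {scheme}")
```

Note how magnitude normalization preserves the sign and relative size of fault-induced peaks, which is the property the text credits for the better classification success.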

ARTIFICIAL NEURAL NETWORKS
In this section, the three types of ANNs are briefly discussed with reference to their structures and parameters, along with the main differences among them. Readers are referred to [13,17,24] for further details. Data from two different sets of run were used in the present work. For the first set of run, half of the data were used for training the ANNs and the rest for testing. The entire data of the second set of run were used for testing.

Multilayer perceptron
The feed-forward MLP network used in this work consists of three layers: input, hidden, and output. The input layer has nodes representing the normalized features extracted from the measured vibration signals. There are various methods, both heuristic and systematic, to select the neural network structure and activation functions [24]. The number of input nodes was varied from 2 to 45, and the number of output nodes was 2. The target values of the two output nodes can have only binary levels, representing "normal" (N) and "failed" (F) bearings. In the MLPs, sigmoidal activation functions were used in the hidden and output layers to keep the outputs close to 0 and 1. The outputs were rounded to binary levels (0 and 1) for comparison with the target values.
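A minimal sketch of the forward pass of such an MLP is given below, assuming already-trained weights; the placeholder weights in the test are illustrative, not from the paper:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def mlp_forward(x, W1, b1, W2, b2):
    """Three-layer feed-forward pass with sigmoidal hidden and output layers,
    outputs rounded to binary levels (0 and 1) as described in the text."""
    h = sigmoid(W1 @ x + b1)   # hidden layer activations
    y = sigmoid(W2 @ h + b2)   # two output nodes, close to 0 and 1
    return np.round(y)         # rounded to binary levels
```

With saturating weights into the output layer, the two output nodes take complementary binary values, matching the (1, 0)/(0, 1) target coding described later for normal and failed bearings.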

Radial basis function networks
The structure of an RBF network is similar to that of an MLP. The activation function of the hidden layer is a Gaussian spheroid function:

φ(x) = exp(−‖x − c‖²/(2σ²)).     (3)

The output of the hidden neuron gives a measure of the distance between the input vector x and the centroid c of the data cluster. The parameter σ, representing the radius of the hypersphere, is generally determined through an iterative process, selecting an optimum width on the basis of the full data sets. However, in the present work, the width is selected along with the relevant input features using a GA-based approach. The RBFs were created, trained, and tested using Matlab, through a simple iterative algorithm that adds neurons to the hidden layer until the performance goal is reached.
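The hidden-unit response of (3) can be written directly (the 1/(2σ²) scaling of the exponent is one common convention and is assumed here):

```python
import numpy as np

def rbf_activation(x, c, sigma):
    """Gaussian spheroid hidden-unit output:
    phi(x) = exp(-||x - c||^2 / (2 sigma^2)).
    Returns 1 at the centroid and decays with distance."""
    d2 = np.sum((np.asarray(x, float) - np.asarray(c, float)) ** 2)
    return np.exp(-d2 / (2.0 * sigma ** 2))
```

The activation equals 1 when the input coincides with the centroid c and falls to exp(−1) at a squared distance of 2σ², which makes the role of the GA-selected width σ concrete.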

Probabilistic neural networks
The structure of a PNN is similar to that of an RBF, both having a Gaussian spheroid activation function in the first of the two layers. The linear output layer of the RBF is replaced in the PNN by a competitive layer, which allows only one neuron to fire, with all others in the layer returning zero. A major drawback of PNNs used to be the computational cost of the potentially large hidden layer, whose size can equal the number of training vectors. The PNN can be viewed as a Bayesian classifier, approximating the probability density function (PDF) of a class using Parzen windows [17]. The generalized expression for the Parzen-approximated PDF of class A at a given point x in feature space is

g_A(x) = (1/((2π)^{p/2} σ^p N_A)) Σ_{i=1}^{N_A} exp(−‖x − x_{Ai}‖²/(2σ²)),     (4)

where p is the dimensionality of the feature vector and N_A is the number of examples of class A used for training the network. The parameter σ represents the spread of the Gaussian function and has significant effects on the generalization of a PNN.
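The Parzen estimate (4) and the winner-take-all decision of the competitive layer can be sketched as follows (function names and the two-class decision rule are illustrative):

```python
import numpy as np

def parzen_pdf(x, class_examples, sigma):
    """Parzen-window PDF estimate at point x with Gaussian kernels:
    g(x) = 1 / ((2 pi)^(p/2) sigma^p N) * sum_i exp(-||x - x_i||^2 / (2 sigma^2))."""
    X = np.asarray(class_examples, dtype=float)      # shape (N, p)
    N, p = X.shape
    d2 = np.sum((X - np.asarray(x, float)) ** 2, axis=1)
    k = np.exp(-d2 / (2.0 * sigma ** 2)).sum()
    return k / ((2.0 * np.pi) ** (p / 2.0) * sigma ** p * N)

def pnn_classify(x, class_a, class_b, sigma):
    """Competitive layer: the class with the larger estimated density wins.
    Returns 1 or 2, matching the PNN target coding used later in the paper."""
    return 1 if parzen_pdf(x, class_a, sigma) >= parzen_pdf(x, class_b, sigma) else 2
```

With equal priors, as in the balanced data sets of the present work, comparing the two class densities directly is the Bayes decision rule.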
One of the problems with the PNN is handling skewed training data, where the data from one class significantly outnumber those of the other class. The presence of skewed data is more likely in a real environment, as the number of data for normal machine condition would, in general, be much larger than the machine fault data. A basic assumption in the PNN approach concerns the so-called prior probabilities: the proportional representation of classes in the training data should match, to some degree, the actual representation in the population being modeled [16,17]. If the prior probability is different from the level of representation in the training cases, the accuracy of classification is reduced. To compensate for this mismatch, the a priori probabilities can be given as inputs to the network, and the class weightings are adjusted accordingly at the binary output nodes of the PNN [16,17]. If the a priori probabilities are not known, the training data set should be large enough for the PDF estimators to asymptotically approach the underlying probability density.
In the present work, the data sets have an equal number of samples from normal and faulty bearing conditions. The PNNs were created, trained, and tested using Matlab. The width parameter is generally determined through an iterative process, selecting an optimum value on the basis of the full data sets. However, in the present work, the width is selected along with the relevant input features using the GA-based approach, as in the case of the RBFs.

GENETIC ALGORITHMS
GAs have been considered with increasing interest in a wide variety of applications [25,26,27]. These algorithms search the solution space through simulated evolution based on "survival of the fittest." They are used to solve linear and nonlinear problems by exploring all regions of the state space and exploiting potential areas through mutation, crossover, and selection operations applied to individuals in the population [25,26]. The use of a GA requires consideration of six basic issues: chromosome (genome) representation, selection function, genetic operators (mutation and crossover) for the reproduction function, creation of the initial population, termination criteria, and the evaluation (fitness) function. In this work, a population size of ten individuals was used, starting with randomly generated genomes. This population size was chosen to ensure relatively high interchange among different genomes within the population and to reduce the likelihood of premature convergence.

Genome representation
In the present work, the GA is used to select the most suitable features and one variable parameter related to the particular classifier: the number of neurons in the hidden layer for MLPs and the width (σ) for RBFs and PNNs. Different mutation, crossover, and selection routines have been proposed for optimization [25]. In the present work, a GA-based optimization routine [28] was used.

MLP training
For MLPs, the genome X contains the row numbers of the selected features from the total set and the number of hidden neurons. For a training run needing N different inputs selected from a set of Q possible inputs, the genome string consists of N + 1 real numbers:

X = [x₁, x₂, ..., x_N, x_{N+1}]ᵀ.     (5)

The first N numbers (x_i, i = 1, ..., N) in the genome are constrained to the range 1 ≤ x_i ≤ Q, whereas the last number x_{N+1} has to be within the range S_min ≤ x_{N+1} ≤ S_max. The parameters S_min and S_max represent, respectively, the lower and upper bounds on the number of neurons in the hidden layer of the MLP.
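Decoding such a genome into usable feature indices and a hidden-layer size can be sketched as follows (rounding and clipping of the real-valued genes are assumptions of this sketch):

```python
import numpy as np

def decode_mlp_genome(genome, Q, S_min, S_max):
    """Decode an (N+1)-element MLP genome: the first N entries are feature
    row numbers in [1, Q], the last is the hidden-layer size in [S_min, S_max]."""
    g = np.asarray(genome, dtype=float)
    rows = np.clip(np.round(g[:-1]).astype(int), 1, Q)   # selected feature rows
    hidden = int(np.clip(round(g[-1]), S_min, S_max))    # hidden neuron count
    return rows, hidden
```

For the bounds used in the paper one would take Q = 45, S_min = 10, and S_max = 30.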

RBF and PNN training
For RBFs and PNNs, the first N entries of the (N + 1)-element genome represent the row numbers of the selected features, as in the case of MLPs. However, the last element x_{N+1} represents the spread (σ) of the Gaussian function of (3) and (4) for RBFs and PNNs, respectively. For the present work, this was taken between 0.1 and 1.0 with a step size of 0.1.

Selection function
In a GA, the selection of individuals to produce successive generations plays a vital role. A probabilistic selection is used based on the individuals' fitness, such that the better individuals have higher chances of being selected. There are various schemes for the selection process [25,26]. In this work, the normalized geometric ranking method was used because of its better performance [26,29]. In this method, the probability P_i of the ith individual being selected is given by

P_i = q′(1 − q)^{r−1},  q′ = q/(1 − (1 − q)^P),     (6)

where q represents the probability of selecting the best individual, r is the rank of the individual, and P denotes the population size. The parameter q is to be provided by the user. The best individual has a rank of 1 and the worst a rank of P. In the present work, a value of 0.08 was used for q.
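The normalized geometric ranking probabilities can be computed directly, which also shows that they sum to one and decay geometrically with rank:

```python
import numpy as np

def geometric_ranking_probs(P, q=0.08):
    """Normalized geometric ranking selection probabilities:
    P_i = q' (1 - q)^(r_i - 1), q' = q / (1 - (1 - q)^P),
    where rank r = 1 is the best individual and r = P the worst."""
    r = np.arange(1, P + 1)
    q_norm = q / (1.0 - (1.0 - q) ** P)
    return q_norm * (1.0 - q) ** (r - 1)
```

With P = 10 and q = 0.08, as in the present work, the best individual receives a selection probability of about 0.14 and the probabilities decrease monotonically with rank.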

Genetic operators
Genetic operators are the basic search mechanisms of the GA for creating new solutions based on the existing population. The operators are of two basic types: mutation and crossover. Mutation alters one individual to produce a single new solution, whereas crossover produces two new individuals (offspring) from two existing individuals (parents). Let X and Y denote two individuals (parents) from the population and X′ and Y′ denote the new individuals (offspring).

Mutation
In this work, a nonuniform mutation function [26] was used. It randomly selects one element x_i of the parent X and modifies it, giving X′ = [x₁, x₂, ..., x′_i, ..., x_N, x_{N+1}]ᵀ, after setting the element x′_i equal to a nonuniform random number in the following manner:

x′_i = x_i + (b_i − x_i) f(G)  if r₁ < 0.5,
x′_i = x_i − (x_i − a_i) f(G)  if r₁ ≥ 0.5,
f(G) = (r₂(1 − G/G_max))^s,     (7)

where r₁ and r₂ denote uniformly distributed random numbers in (0, 1); G is the current generation number; G_max denotes the maximum number of generations; s is a shape parameter used in the function f(G); and a_i and b_i represent, respectively, the lower and upper bounds for each variable i.
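A sketch of this operator follows; the shape parameter value is illustrative, and the push toward a bound shrinks as the generation count approaches G_max:

```python
import numpy as np

def nonuniform_mutate(x, i, a, b, G, G_max, s=3.0, rng=None):
    """Nonuniform mutation of gene i: move toward the upper bound b[i]
    (if r1 < 0.5) or the lower bound a[i] (otherwise) by the fraction
    f(G) = (r2 * (1 - G / G_max))**s, which decays with generation G."""
    rng = np.random.default_rng() if rng is None else rng
    r1, r2 = rng.random(), rng.random()
    f = (r2 * (1.0 - G / G_max)) ** s
    x = np.array(x, dtype=float)
    if r1 < 0.5:
        x[i] = x[i] + (b[i] - x[i]) * f   # toward upper bound
    else:
        x[i] = x[i] - (x[i] - a[i]) * f   # toward lower bound
    return x
```

Because f(G) → 0 as G → G_max, mutations become progressively finer, shifting the search from exploration to local refinement.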

Crossover
In this work, heuristic crossover [26] was used. This operator produces a linear extrapolation of two individuals using the fitness information. A new individual X′ is created as

X′ = X + r(X − Y),  Y′ = X,     (8), (9)

where r is a random number following the uniform distribution U(0, 1) and X is better than Y in terms of fitness. The feasibility of X′ is checked as

η = 1 if a_i ≤ x′_i ≤ b_i for all i; η = 0 otherwise.     (10)

If X′ is infeasible (η = 0 in (10)), a new random number r is generated and a new solution is created using (8). The choice of heuristic crossover was based on its main characteristic of using the fitness information to determine the search direction, for better performance [26].
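The operator can be sketched as below; the retry limit is an assumption of this sketch, added so that the loop always terminates:

```python
import numpy as np

def heuristic_crossover(X, Y, a, b, rng=None, max_tries=5):
    """Heuristic crossover with X the fitter parent: the first offspring
    extrapolates past X along (X - Y), X' = X + r (X - Y); the second
    offspring is a copy of X. If X' violates the bounds [a, b]
    (feasibility eta = 0), a new r is drawn, up to max_tries attempts."""
    rng = np.random.default_rng() if rng is None else rng
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    for _ in range(max_tries):
        Xp = X + rng.random() * (X - Y)
        if np.all(Xp >= a) and np.all(Xp <= b):   # feasible: eta = 1
            return Xp, X.copy()
    return X.copy(), X.copy()   # fall back to the fitter parent
```

Extrapolating beyond the fitter parent biases the search in the direction along which fitness has been observed to improve, which is the rationale cited for choosing this operator.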

Initialization, termination, and evaluation functions
To start the solution process, the GA has to be provided with an initial population. The most commonly used method is random generation of the initial solutions. The solution process continues from one generation to another, selecting and reproducing parents, until a termination criterion is satisfied. The most commonly used termination criterion is the maximum number of generations.
The creation of an evaluation function to rank the performance of a particular genome is very important for the success of the training process, since the GA optimizes solely with respect to the evaluation (fitness) function. The fitness function used in the present work returns the number of correct classifications of the test data; better classification results give a higher fitness index.
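The fitness function described here reduces to a count of correct classifications (the function name is illustrative):

```python
def classification_fitness(predicted, actual):
    """Fitness of a genome = number of correct classifications achieved by
    the classifier it encodes, evaluated on the test data."""
    return sum(int(p == a) for p, a in zip(predicted, actual))
```

The GA thus directly maximizes test-set classification success, which is also the quantity reported in the results tables.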

SIMULATION RESULTS
The data set (45 × 144 × 2) consisted of 45 normalized features for each of the three signals, split into 24 segments of 1024 samples each, with two bearing conditions and two sets of run. Two cases were studied. In the first case (Case A), the data of the first set of run were divided into two equal subsets. The first 12 bins of each signal were used for training the ANNs, giving a training set of 45 × 72, and the rest (45 × 72) were used for testing. In the second case (Case B), the complete data of the first set of run were used for training the ANNs, and the data of the second set of run were used for testing. In both cases, the testing data sets had no part in the training of the ANNs. In each case, the training was based on the training data sets only. No validation set was used for early stopping of the training process because of the limited size of the available data sets. However, for a larger data set, separate sets for training, validation, and testing would be preferred.
For each of the MLPs and RBFs, two output nodes were used, whereas for the PNNs only one output node was used. One output node would, in principle, have been enough for all classifiers; however, the classification success was not satisfactory with one output node for the MLPs and RBFs on the present data sets, with the particular choice of network structure and activation functions. The target value of the first output node was set as 1 for normal and 0 for failed bearings, with the values interchanged (0 and 1) for the second output node. For the PNNs, the target values were specified as 1 and 2, representing normal and faulty conditions, respectively. Results are presented to show the effects of accelerometer location (direction) and signal processing on the diagnosis of machine condition using ANNs, with and without GA-based feature selection. The training success in each case was 100%.

Performance comparison of ANNs without feature selection
In this section, classification results are presented for straight ANNs (without feature selection) for the data of the first set of run (Case A). For each straight MLP, the number of neurons in the hidden layer was kept at 24, and for the straight RBFs and PNNs, the widths (σ) were kept constant at 1.00 and 0.10, respectively. These values were found on the basis of several trials of training the ANNs.

Effect of sensor location
Table 1 shows the classification results for each of the signals x, y, and the resultant z using all input features (1-45). For all classifiers, the test success was mostly unsatisfactory: in the range of 87.50%-95.83% for MLPs, 50.00%-95.83% for RBFs, and 83.33% for PNNs. The classification error was the failure to recognize a fault, termed fault-not-recognized (FNR), which may suggest an overlap of the features of faulty bearings with those of normal bearings. The performance of the MLPs and PNNs is reasonably consistent for all signals; however, for the RBF, the signal z gives a classification success around 45% higher than the signals in the other two directions (x and y). This may be attributed to the better classification capability of the RBF using features extracted from the combined signal z.

Effect of signal preprocessing
Table 2 shows the effects of signal processing on the classification results for straight ANNs with all three signals. In each case, all the features from the signals, with and without signal processing, were used. To see the relative effectiveness of the lower- and higher-order features of the original signals, results were obtained for the feature ranges separately (1-4 and 5-9) and together (1-9). The use of the three signals x, y, and z together gave better classification success than using individual signals. This may be because the feature sets extracted from the three signals give a better representation of the bearing conditions than the individual signals. The classification performance using only the lower-order moments (1st-4th) was better than using the higher-order moments (5th-9th). The use of all nine features gave classification success better than the higher-order features alone, but slightly worse than the lower-order features.
The test success, based on the last four rows of data sets, was in the range of 90.97%-95.83% for MLPs, 98.61% for RBFs, and 94.44% for PNNs. Here again, the classification error was of type FNR for all cases, except for the PNN, where it was 4.17% FNR and 1.39% false alarm (FA). The misclassification suggests inadequate separation of the data sets (normal and faulty) for all three classifiers. From examination of the data sets, no particular explanation could be put forward for the difference in misclassification type (FNR or FA) for the PNNs, since in each case the data sets included an equal number of samples from the normal and faulty classes.

Performance comparison of ANNs with feature selection
In this section, classification results are presented for ANNs with GA-based feature selection for Case A. Only three features were selected from the corresponding ranges. For the MLPs, the number of neurons in the hidden layer was selected in the range of 10 to 30, whereas for the RBFs and PNNs, the Gaussian spread was selected in the range of 0.1 to 1.0 with a step size of 0.1.

Effect of sensor location
Table 3 shows the classification results, along with the selected parameters, for each of the signals x, y, and the resultant z. In all cases, the input features were selected by the GA from the entire range (1-45). The test success improved substantially in each case with feature selection, compared with the results of Table 1: 95.83%-100% for MLPs, 87.50%-100% for RBFs, and 100% for PNNs. The classification error was of type FNR for the MLPs and RBFs. The features selected by the different schemes are also shown for comparison. Though some of the features were selected by two of the three schemes, there was no apparent fixed combination of features. However, it should be noted that features from the higher-order moments (features 5-9, 14-18, 23-27, 32-36, and 41-45) were selected by the GAs quite often, justifying their inclusion in the feature sets.

Effect of signal preprocessing
Table 4 shows the effects of signal processing on the classification results for the signals x, y, and z with the GA. In all cases, only three features, from signals with and without preprocessing, were selected from each of the ranges. The effectiveness of the lower-order moments (1st-4th) was found to be better than that of the higher-order moments (5th-9th). In the case of the PNN, including a higher-order moment (5th) improved the classification success beyond using only the lower-order features. Here again, the selection of features from the higher-order moments was evident. The groupings of the features selected for the different cases showed no apparent bias or preference. From the results of the last four rows, the test success was 97.22%-100% for MLPs, 88.89%-100% for RBFs, and 94.44%-98.61% for PNNs. For the PNNs, the classification errors were 1.39%-4.17% FNR and 0%-1.39% FA.

Performance of PNNs with selection of six features
In this section, results are presented for PNNs with six features selected from the corresponding ranges, as shown in Tables 5 and 6. The test success was 100% for all cases with individual signals (Table 5) and also for all signals and features taken together (Table 6). Here again, features from the higher-order moments were selected by the GAs. Table 8 shows the effect of the number of input features on the ANN classification performance with a generation number of 40. In general, the test success improved with a higher number of input features; it was 100% for all classifiers with 8 features. The test success with six features was 100% for the MLP and PNN, and 99.31% for the RBF. Though the performance of the MLP was better than the other two classifiers with a lower number of features, the training time for the MLP was much higher.

Results with second test data set using statistical normalization
The data sets discussed so far were normalized in magnitude to keep the features within ±1. In this section, results are presented using the statistical normalization scheme with zero mean and unit standard deviation; see Table 9. The performance of the PNNs under the two normalization schemes can be compared from the results presented in the last columns of Tables 7 and 9. The classification success of the statistical normalization scheme (zero mean, standard deviation of 1) is slightly better than that of the magnitude normalization scheme for a lower number of features (up to 3). However, the test success deteriorated with statistical normalization for a higher number of features. The training time increased somewhat with a higher number of features, but not in direct proportion.
To investigate the separability of the data sets with and without bearing fault, three features selected by the GA were plotted, as shown in Figures 4a and 4b. Figure 4a shows the magnitude-normalized features, whereas Figure 4b shows the statistically normalized features. In both cases, the data clusters are not well separated and have considerable overlap. This can explain the unsatisfactory classification success with only three features. The smaller width selected by the GA for a lower number of features (up to 3) may be attributed to the closeness of the data clusters. However, the separation of classes is slightly better for the statistically normalized data than for the magnitude-normalized data. Another normalization scheme was also examined, making the features zero mean and then normalizing by the absolute maximum value. However, no significant difference was noticed in the classification performance of the magnitude-normalized data with and without zero mean.

CONCLUSIONS
A procedure is presented for the diagnosis of bearing condition using three classifiers, namely, MLP, RBF, and PNN, with GA-based feature selection from time-domain vibration signals. The selection of input features and the appropriate classifier parameters has been optimized using a GA-based approach. The roles of different vibration signals and preprocessing techniques have been investigated, along with the effects of the number of features and generations on the classification success. The use of six selected features gave 100% test success for most of the cases considered in this work. Though the classification performance of the MLP was comparable with that of the PNN with six features, the training time of the MLP was much higher. The false classifications with a lower number of features may be attributed to the overlap of the data sets with and without bearing faults. The features from the lower-order statistics were more effective than the higher-order moments. However, the frequent selection of features from the higher-order moments by the GAs justified the inclusion of these moments in the feature sets. The results show the potential application of GAs for the selection of input features and classifier parameters in ANN-based condition monitoring systems. However, in the present study, the data sets include equal representation of normal and faulty bearings under similar operating conditions. All the features have been considered from time-domain vibration signals, and the sample size used for feature extraction has been kept relatively small for the two-class (normal and faulty) problem considered here. For multiple fault conditions (multiclass problems), the issue of a suitable sample size for feature extraction needs to be examined. This leaves scope for future work, including consideration of skewed data sets, incorporation of frequency-domain data, study of the effects of varying machine conditions, and extension to multiclass problems covering different types and levels of bearing faults.

Figure 1 :
Figure 1: Flow chart of diagnostic procedure.

Figure 4 :
Figure 4: (a) Scatter plot of features with magnitude normalization. (b) Scatter plot of features with statistical normalization.

Table 1 :
Performance comparison of classifiers without feature selection for different sensor locations.

Table 2 :
Performance comparison of classifiers without feature selection for different signal preprocessing.

Table 3 :
Performance comparison of classifiers with feature selection for different sensor locations.

Table 4 :
Performance comparison of classifiers with feature selection for different signal preprocessing.

Table 5 :
PNN performance with six selected features for different sensor locations.

Table 6 :
PNN performance with six selected features for different signal preprocessing.

Results with second test data set
In the previous sections, both the training and test feature sets were derived from the same vibration signals of the first set of run (Case A), although the test data were not used in training. In this section, simulation results are presented for Case B, using the entire data of the first set of run for training the ANNs and the data of the second set of run for testing. The size of the training and test data was 24576 each. The normalization was carried out using the maximum values of the particular feature set [10]. Table 7 shows the results of different generation numbers on the classification performance of the PNNs.

Table 7 :
PNN performance with six selected features for different generation numbers.

Table 8 :
ANN performance with magnitude normalized data for different number of features selected.

Table 9 :
PNN performance with statistically normalized data for different number of selected features.