Recognition of human activities with wearable sensors

A novel approach for recognizing human activities with wearable sensors is investigated in this article. The key techniques of this approach are generalized discriminant analysis (GDA) and relevance vector machines (RVM). The feature vectors extracted from the measured signals are processed by GDA, which reduces their dimension from 350 to 12 while retaining the most discriminative information. The reduced feature vectors are then classified by the RVM technique according to an extended multiclass model, which shows good convergence characteristics. Experimental results on the Wearable Action Recognition Dataset demonstrate that our approach achieves a recognition rate of 99.2%, a true positive rate of 99.18%, and a false positive rate of 0.07%. Although the support vector machines model has more than 70 support vectors in most cases, the number of relevance vectors for each activity never exceeds 4, which implies a much simpler classifier structure. Our approach is expected to have potential in real-time applications or in problems with large-scale datasets, owing to its high recognition performance, strong feature-reduction ability, and simple classifier structure.


Introduction
Activity recognition has become one of the most active topics in context-aware computing, due to its wide application prospects in industrial, educational, and medical domains [1][2][3]. For instance, characterizing the activities of assembly-line workers can increase safety and reliability and improve productivity [1]; as another example, monitoring human activities of daily life can provide very useful information for medical diagnosis, elderly care, and the assessment of individuals' physical and mental conditions [2].
Early studies in activity recognition employed vision-based systems with single or multiple video cameras, which remain the most common means to date [4,5].
In general, such systems may be acceptable and practical in a laboratory or well-controlled environment. However, when the activities take place in a real home setting or outdoors, the accuracy of activity recognition is affected by variable lighting conditions or clutter [6]. Wearable-sensor-based systems offer an appropriate alternative for activity recognition [7][8][9][10][11], being inherently immune to shadow and occlusion effects. Furthermore, compared with vision-based systems, such systems do not capture additional private information, so the subjects may act more naturally, as in their daily life. Another major advantage of wearable-sensor-based systems is that the cost of storing and processing data can be greatly reduced. Therefore, in this article we focus on developing an effective approach for activity recognition with wearable sensors.
Feature dimension reduction (which can be seen as a feature selection operation) is an important and essential procedure before classification [6][7][8]. A high-dimensional feature set causes two problems. First, some features are irrelevant or redundant and cannot provide supportive information for classification (in the worst case, inappropriately emphasizing such features may even reduce the recognition accuracy). Second, training classifiers in a high-dimensional space is difficult and time consuming. Therefore, it is desirable to effectively reduce the dimension of the feature set before performing the classification task. Principal component analysis (PCA) and linear discriminant analysis (LDA) are two classical techniques for data dimension reduction [7][8][9][12][13]. The essence of PCA is to find the projection directions that maximally preserve the data variance. However, it does not take the class labels into account, so the 'optimal' projection directions may not give the most discriminating features. LDA, in contrast, seeks the projection directions that maximize the ratio of between-class scatter to within-class scatter. However, it cannot capture nonlinear relationships among samples. To overcome this weakness of LDA, generalized discriminant analysis (GDA) based on the kernel trick was proposed in [14]; it can be viewed as a kernel extension of LDA. GDA has been shown to be superior to LDA in many applications such as face recognition, document analysis, and image retrieval [15][16][17].
Most existing classification methods can be divided into two categories, depending on whether or not they are based on kernel learning. Non-kernel-based classification methods [for instance, the k-nearest neighbor (k-NN) method] usually give equal or comparable weight to each training sample, which may not be reasonable in some cases. Kernel-based classification methods [for instance, the support vector machines (SVM) method] try to pick out the samples that are informative for classification; these methods therefore usually achieve higher recognition accuracy while keeping the set of support vectors relatively sparse. In the last decade, Bayesian theory was introduced into the design of classification methods, of which the relevance vector machines (RVM) method [18] is a representative. As a Bayesian extension of the SVM, the RVM provides posterior probabilistic outputs for class memberships. Furthermore, the RVM requires dramatically fewer relevance vectors (RVs), which makes it more suitable for real-time applications.
Extensive studies have been carried out on human activity recognition with wearable sensors [7][8][9][10][11]. Fleury et al. [7] recognized seven kinds of human daily activities. Data associated with the activities were collected by infrared sensors, temperature and hygrometry sensors, and wearable kinematic sensors. A feature set was extracted from the raw sensor data by PCA, and the SVM method was then employed for classification. The cross-validation test achieved an overall recognition accuracy of 86%. Khan et al. [8] used LDA to generate the feature set and artificial neural networks (ANNs) to recognize human activities from data recorded by body-worn accelerometers. Altun et al. [9] constructed a hardware system with gyroscopes, accelerometers, and magnetometers, and carried out a comparative study on human activity recognition. They considered seven classification methods in total: Bayesian decision, decision tree, the least-squares method, k-NN, dynamic time warping, SVM, and ANN, of which the SVM achieved the best recognition performance in the 'leave-one-out' cross-validation process.
In this article, we put forward a new approach for human activity recognition. We employ GDA to reduce the feature dimension and then construct a multiclass RVM classifier to perform the classification task. To the best of the authors' knowledge, both the GDA and RVM techniques are applied in a wearable-sensor-based system for the first time.
The rest of this article is organized as follows. Section 2 provides a brief description of the Wearable Action Recognition Dataset (WARD) [19]. The feature extraction and feature dimension reduction processes are presented in detail in Section 3. In Section 4, we first review the RVM classification technique and then introduce the construction procedure for the multiclass RVM classifier. Section 5 reports the experimental results. Conclusions are given in Section 6.

Wearable action recognition dataset
In this article, we used a publicly available dataset called WARD (http://www.eecs.berkeley.edu/~yang/software/WAR/) for human activity recognition, which was introduced by Yang et al. [19]. To construct this dataset, 20 volunteers, i.e., 7 females and 13 males with a wide age range (from 15 to 75 years old), were enrolled to perform 13 activities. All 13 kinds of activities are listed in Table 1.
The data were recorded by five custom-built sensor boards, which were attached to different body parts: two on the wrists, one on the waist, and two on the ankles. Each sensor board is equipped with two sensors: a tri-axial accelerometer with a range of ±2 g and a biaxial gyroscope with a range of ±500°/s. Since the output of each accelerometer and gyroscope has 3 and 2 dimensions, respectively, the activity signal has 5 × (3 + 2) = 25 dimensions in total.
Figure 1 provides instances of the measured signal for three different activities performed by the same subject. The subfigures in the first row show the three-dimensional data recorded by the accelerometer, while the subfigures in the second row show the two-dimensional data recorded by the gyroscope located at the waist. As expected, the magnitude and period of the measured signal differ considerably among activities.

Feature extraction and reduction
In this section, we first describe the data preprocessing procedure, and then present the feature extraction and data normalization process. Finally, the GDA is introduced for feature reduction.

Data preprocessing
In general, it is unnecessary to analyze the entire recorded data for activity recognition. Therefore, we divided each raw data record into small windows before feature extraction. To capture sufficient information about the human activity and to suit the FFT-based computation of the frequency-domain features, the window length is set to $2^n$ samples. Four window lengths (32, 64, 128, and 256 samples) were investigated, and the best recognition performance was achieved with a window length of 256 samples. This can be explained as follows. At a sampling rate of 30 Hz, the four window lengths correspond to durations of approximately 1, 2, 4, and 8.5 s, respectively. Bouten et al. [20] pointed out that the frequency of human daily activity mainly ranges from 1 to 18 Hz; a duration of 8.5 s therefore covers at least several cycles of an activity, which makes the recognition more reliable, whereas a shorter window covers fewer activity cycles and yields less reliable recognition. Window lengths larger than 256 samples were not considered in this study, for two reasons. (1) For some activities, the raw data record contains fewer than 512 samples. For example, the record of the "Jog" activity of test subject 5 contains only 397 samples. (2) As shown in Table 2, with a window length of 256 samples our method already achieves a recognition rate as high as 99.2%. Increasing the window length may slightly improve the recognition rate, but will result in a longer delay.

Table 1 (excerpt): activity and description
…: The subject walks clockwise for more than 10 s
Turn left (TL): The subject stays at the same position and turns left for more than 10 s
Turn right (TR): The subject stays at the same position and turns right for more than 10 s
Go upstairs (Up): The subject goes up a flight of stairs
Go downstairs (Down): The subject goes down a flight of stairs
Jog (Jog): The subject jogs straight forward for more than 10 s
Jump (Jump): The subject stays at the same position and jumps more than 5 times
Push wheelchair (Push): The subject pushes a wheelchair/walker for more than 10 s
As a result, we set the window length to 256 samples in the following experiments. Each window therefore contains 256 × 25 data values, where 25 is the dimension of the activity signal (each of the five sensor nodes provides a five-dimensional signal). Consecutive windows overlap by 128 × 25 data values, i.e., by half a window.
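The windowing described above can be sketched as follows. This is a minimal illustration, assuming each recording is stored as a NumPy array of shape (samples, 25); the function name `sliding_windows` and the array layout are assumptions, not part of the original study.

```python
import numpy as np

def sliding_windows(record, window=256, step=128):
    """Split a (n_samples, n_channels) record into half-overlapping windows.

    Returns an array of shape (n_windows, window, n_channels). Trailing
    samples that do not fill a complete window are dropped (an assumption).
    """
    record = np.asarray(record)
    n_samples, n_channels = record.shape
    if n_samples < window:
        return np.empty((0, window, n_channels))
    starts = range(0, n_samples - window + 1, step)
    return np.stack([record[s:s + window] for s in starts])
```

For a record of 397 samples (the length quoted for the "Jog" activity of subject 5), this scheme yields two windows, starting at samples 0 and 128.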

Feature extraction
Features derived from both the time domain and the transform domain are used for activity recognition in most previous studies [9,21]. In this study, we simply adopt the features most frequently used by previous researchers rather than deliberately selecting them. Specifically, the time-domain features are the mean value, variance, skewness, and kurtosis of the $N = 256$ data values $s_{i,n}$ in each window, where $s_{i,n}$ represents the $n$th data value in the $i$th dimension of the window. The transform-domain features are the magnitudes of the five largest peaks of the fast Fourier transform (FFT) of the window, as well as the magnitudes of the five largest peaks of its cepstrum coefficients. These six types of feature are employed for activity recognition.
Figure 2 gives an instance of signal representations in both the time and the transform domain, for windows taken from two specific activities. Figure 2a, b shows the signal along the z-axis for the walking-forward and jump activities, recorded by the accelerometer located at the waist. Figure 2c, d shows the FFT of the signals in (a) and (b), respectively (with the five largest peaks of the FFT marked with 'O'), while Figure 2e, f shows the cepstrum of the signals in (a) and (b), respectively (with the five largest peaks of the cepstral coefficients marked with 'O').
According to the above feature extraction procedure, a feature vector with 14 elements is obtained from each window along every dimension, i.e., 4 elements in the time domain and the remaining 10 elements in the transform domain. Thus, with 25 signal dimensions, the total dimension of the feature vector is 14 × 25 = 350, i.e., 350 × 1.
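The 14 features per signal dimension can be sketched as follows. This is only an illustration under stated assumptions: the moments use the population (1/N) normalization, a "peak" is taken to be a local maximum of the magnitude sequence, and the cepstrum is taken as the inverse FFT of the log magnitude spectrum. The original equations are not available here, so these choices and all function names are hypothetical.

```python
import numpy as np

def top_peaks(values, k=5):
    """Magnitudes of the k largest local maxima, zero-padded if fewer exist."""
    v = np.asarray(values, dtype=float)
    is_peak = (v[1:-1] > v[:-2]) & (v[1:-1] >= v[2:])
    peaks = np.sort(v[1:-1][is_peak])[::-1][:k]
    return np.pad(peaks, (0, k - peaks.size))

def channel_features(s, k=5, eps=1e-12):
    """14 features of one channel: 4 time-domain, 5 FFT peaks, 5 cepstrum peaks."""
    s = np.asarray(s, dtype=float)
    mean, var = s.mean(), s.var()            # population (1/N) moments, assumed
    z = (s - mean) / (np.sqrt(var) + eps)
    skewness, kurtosis = np.mean(z ** 3), np.mean(z ** 4)
    spectrum = np.abs(np.fft.rfft(s))        # one-sided FFT magnitude
    cepstrum = np.abs(np.fft.irfft(np.log(spectrum + eps)))
    half = cepstrum[: cepstrum.size // 2]    # the real cepstrum is symmetric
    return np.concatenate(([mean, var, skewness, kurtosis],
                           top_peaks(spectrum, k), top_peaks(half, k)))

def window_features(window):
    """Concatenate the features of every channel of a (256, 25) window."""
    return np.concatenate([channel_features(window[:, c])
                           for c in range(window.shape[1])])
```

Applied to a 256 × 25 window, `window_features` returns a vector of length 14 × 25 = 350, with the 14 features of each channel stored contiguously.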

Data normalization
Commonly, the selected features are heterogeneous. Directly using the feature data acquired as in Section 3.2 for subsequent classification may lead to problems, especially when the distributions of the selected features differ dramatically. Therefore, all the features should be normalized before constructing the classifiers. For simplicity, we normalize each feature linearly as $\hat{x}_j = \frac{x_j - \min\{f_j\}}{\max\{f_j\} - \min\{f_j\}}$, where $x_j$ denotes the value of the $j$th feature before normalization, while $\max\{f_j\}$ and $\min\{f_j\}$, respectively, represent the maximum and the minimum values of this feature over the whole dataset. Thereafter, all the feature values fall into the range [0, 1].
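The linear normalization can be sketched as below, assuming the feature vectors are stacked row-wise in a matrix. The function name `minmax_normalize` and the handling of constant features (mapped to 0) are assumptions made for this illustration.

```python
import numpy as np

def minmax_normalize(F, eps=1e-12):
    """Scale each column (feature) of F linearly into [0, 1].

    F has shape (n_vectors, n_features). Columns whose range is (numerically)
    zero are mapped to 0 to avoid division by zero.
    """
    F = np.asarray(F, dtype=float)
    f_min = F.min(axis=0)
    f_max = F.max(axis=0)
    span = np.where(f_max - f_min > eps, f_max - f_min, 1.0)
    return (F - f_min) / span
```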

Feature dimension reduction by GDA
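As described above, GDA is a nonlinear dimension reduction method based on kernel learning, and it is used here to process the normalized feature vectors. Given a training dataset $\{x_i\}_{i=1}^{N}$ containing $L$ classes, with $n_l$ samples belonging to class $c_l$ (i.e., $N = \sum_{l=1}^{L} n_l$), GDA consists of two steps: first, the data $x_i$ are transformed from the original feature space $R$ into a new one $F$ via a nonlinear mapping $\phi: R \to F$, $x_i \mapsto \phi(x_i)$; then the classical LDA is carried out in this new feature space.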
The between-class scatter matrix $B$ and the within-class scatter matrix $V$ are defined in the feature space $F$ from the mapped samples $\phi(x_{lk})$, where $x_{lk}$ denotes the $k$th element of class $c_l$ and $\bar{\phi}_l = \frac{1}{n_l}\sum_{k=1}^{n_l}\phi(x_{lk})$ represents the mean of class $c_l$ in $F$. GDA finds the projection matrix $v$ that maximizes the ratio $\frac{v^T B v}{v^T V v}$. Note that explicitly carrying out the mapping $\phi$ is computationally demanding. Therefore, the reproducing-kernel technique is adopted when deriving the projection matrix. Since the rank of $B$ is no more than $L-1$, the number of projection directions $t$ is at most $L-1$. More details about the GDA are available in [14].
After performing GDA on the normalized activity feature vectors, their dimension is reduced to $n$ ($n \le L-1$). In our case, there are $L = 13$ activities in total, so the dimension of the feature vectors is reduced to no more than 12. We tested the performance of GDA with reduced dimensions ranging from 1 to 12, and finally set the dimension to 12, as it provides the best recognition performance. We also used PCA and LDA for comparison. The three features with the highest weights obtained by each method are plotted in Figure 3. It can be seen that GDA captures the discriminative information better than PCA and LDA, which bodes well for the recognition performance.
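The kernel-based computation can be sketched as follows. This is a minimal illustration of one common way to solve the GDA eigenproblem (centring the kernel matrix, eigen-decomposing it, and diagonalizing a block-structured class matrix), not a reproduction of the authors' implementation. The Gaussian kernel, its bandwidth, the synthetic three-class data, and the names `gaussian_kernel` and `gda_fit` are all assumptions.

```python
import numpy as np

def gaussian_kernel(X, Z, bandwidth):
    """Gaussian kernel matrix between the rows of X and the rows of Z."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def gda_fit(K, labels, n_components, tol=1e-8):
    """Return expansion coefficients alpha and the centred kernel matrix."""
    labels = np.asarray(labels)
    n = K.shape[0]
    ones = np.full((n, n), 1.0 / n)
    Kc = K - ones @ K - K @ ones + ones @ K @ ones       # centre in feature space
    same = (labels[:, None] == labels[None, :]).astype(float)
    W = same / same.sum(axis=1, keepdims=True)           # entries 1/n_l within class l
    gam, P = np.linalg.eigh(Kc)                          # Kc = P diag(gam) P^T
    keep = gam > tol * gam.max()                         # drop the numerical null space
    gam, P = gam[keep], P[:, keep]
    lam, beta = np.linalg.eigh(P.T @ W @ P)
    top = np.argsort(lam)[::-1][:n_components]           # largest eigenvalues first
    alpha = P @ (beta[:, top] / gam[:, None])
    scale = np.sqrt(np.sum(alpha * (Kc @ alpha), axis=0))
    return alpha / scale, Kc                             # unit-norm directions in F

# Synthetic example: 3 classes, so at most L - 1 = 2 discriminant directions.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((10, 2)) for c in centers])
labels = np.repeat(np.arange(3), 10)
alpha, Kc = gda_fit(gaussian_kernel(X, X, bandwidth=1.0), labels, n_components=2)
Z = Kc @ alpha                                           # projected training samples
```

Projecting unseen samples would additionally require centring their kernel vectors against the training set, which is omitted here for brevity.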

RVM classification techniques
In this section, we review the binary RVM theory, and then introduce a multiclass RVM model for solving the multiclass problem. On this basis, the multidimensional activity feature vectors can be classified.

The binary RVM
The RVM was originally designed for the binary classification problem. Consider a training dataset of $N$ input-target pairs $\{x_n, t_n\}_{n=1}^{N}$, where $x_n \in R^m$ is a training sample and $t_n \in \{0, 1\}$ is the target value of $x_n$. Suppose the posterior probability of $x_n$ with $t_n = 1$ is given by $P(t_n = 1 | x_n, w) = \sigma\{y(x_n; w)\}$, where $\sigma(y)$ is the logistic sigmoid function defined by $\sigma(y) = 1/(1 + e^{-y})$; then the posterior probability of $x_n$ with $t_n = 0$ can be expressed as $P(t_n = 0 | x_n, w) = 1 - \sigma\{y(x_n; w)\}$. On the assumption that the input variables $x_n$ are independent of each other, the likelihood of the entire set of training samples follows the Bernoulli distribution: $P(t | w) = \prod_{n=1}^{N} y_n^{t_n} (1 - y_n)^{1 - t_n}$, where $y_n = P(t_n = 1 | x_n, w)$ and $w$ is a weight vector represented by $w = [w_1, w_2, w_3, \ldots, w_N]^T$. Here, we assume that $w$ is well described by the zero-mean Gaussian distribution $p(w | \alpha) = N(w | 0, A^{-1})$, whose precisions are the hyperparameters $\alpha$. In order to find the 'most probable' weights $w_{MP}$, an iterative procedure based on a Laplace approximation is used. With a fixed value of $\alpha$, the logarithm of the posterior distribution over the weights $w$ is given by $\ln p(w | t, \alpha) = \sum_{n=1}^{N} [t_n \ln y_n + (1 - t_n) \ln(1 - y_n)] - \frac{1}{2} w^T A w + \mathrm{const}$ (13), where $A = \mathrm{diag}(\alpha_0, \alpha_1, \ldots, \alpha_N)$. At the maximum of this expression, the mean $w_{MP}$ and its covariance $\Sigma_{MP}$ are $w_{MP} = A^{-1} \Phi^T (t - y)$ (14) and $\Sigma_{MP} = (\Phi^T B \Phi + A)^{-1}$ (15), where $B$ is an $N \times N$ diagonal matrix with elements $\beta_n = y_n (1 - y_n)$, the vector $y = (y_1, y_2, \ldots, y_N)^T$, and $\Phi$ is the design matrix with elements $\Phi_{ni} = \phi_i(x_n)$. The derivation of these two equations is provided in the Appendix.
Consequently, the corresponding marginal likelihood can be approximated as $p(t | \alpha) \approx p(t | w_{MP}) \, p(w_{MP} | \alpha) \, (2\pi)^{M/2} |\Sigma_{MP}|^{1/2}$ (16), where $M$ is the number of weights. To maximize it, the parameter $\alpha_n$ is updated in each iteration as $\alpha_n^{new} = \frac{1 - \alpha_n \mathrm{Diag}_n}{w_n^2}$ (17), where $w_n$ denotes the $n$th element of the estimated posterior weights $w_{MP}$, and $\mathrm{Diag}_n$ represents the $n$th diagonal element of the posterior covariance matrix $\Sigma_{MP}$ from Equation (15). The above procedure via Equations (13)-(17) is repeated until the preset convergence criterion is met. At this point, the training stage for the binary classifier is complete. In the classification stage, the test sample $x$ is assigned to the class $t \in \{0, 1\}$ that maximizes the conditional probability $P(t | x, w)$.
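One cycle of the training procedure can be sketched as follows. This is a minimal illustration, assuming a Newton (iteratively reweighted least squares) step is used to move the weights toward the fixed point in Equation (14) for a fixed $\alpha$; the function names `laplace_step` and `update_alpha` are hypothetical, and pruning of basis functions with very large $\alpha_n$ is not shown.

```python
import numpy as np

def sigmoid(a):
    """Logistic sigmoid function."""
    return 1.0 / (1.0 + np.exp(-a))

def laplace_step(Phi, t, w, alpha):
    """One Newton update of the weights for fixed hyperparameters alpha.

    Phi: (N, M) design matrix, t: (N,) targets in {0, 1},
    w: (M,) current weights, alpha: (M,) prior precisions.
    Returns the updated weights and the covariance evaluated at the old w.
    """
    y = sigmoid(Phi @ w)
    A = np.diag(alpha)
    B = np.diag(y * (1.0 - y))
    grad = Phi.T @ (t - y) - A @ w              # gradient of the log posterior
    Sigma = np.linalg.inv(Phi.T @ B @ Phi + A)  # inverse of the negative Hessian
    return w + Sigma @ grad, Sigma

def update_alpha(w, Sigma, alpha):
    """Re-estimate each alpha_n as (1 - alpha_n * Sigma_nn) / w_n**2."""
    gamma = 1.0 - alpha * np.diag(Sigma)
    return gamma / (w ** 2)
```

In practice, `laplace_step` would be iterated until the gradient vanishes (so that the weights satisfy Equation (14)) before each call to `update_alpha`, and the two would alternate until the convergence criterion is met.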

Multiclass RVM
The traditional RVM solves the binary classification problem. However, practical activity recognition usually involves a multiclass task. For instance, the WARD dataset contains 13 kinds of activities in total. Therefore, the RVM must be extended to the multiclass situation. The first possible scheme is to directly generalize the RVM to a multiclass RVM as in [18]. However, in this case the size of the covariance matrix $\Sigma$ scales linearly with the total number of classes, which is a disadvantage from the computational perspective [18]. The second possible solution is to treat the $L$-class problem as a set of two-class problems. In this case, the simplest way, called 'one-versus-all', is to train $L$ individual binary classifiers and integrate them. The test sample $x$ is assigned to the class $t_i$ whose binary classifier yields the highest posterior probability, i.e., $i = \arg\max_{l \in \{1, \ldots, L\}} P(t_l = 1 | x, w_l)$, where $w_l$ denotes the weights of the $l$th binary classifier.
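The 'one-versus-all' decision can be sketched as follows, assuming the $L$ binary classifiers have already produced one posterior probability per class for each test sample. The function name and the probability values in the usage note are purely illustrative.

```python
import numpy as np

def one_versus_all_predict(prob_matrix, class_names):
    """Pick, for each sample, the class whose binary classifier is most confident.

    prob_matrix: (n_samples, L) array; column l holds P(t_l = 1 | x, w_l).
    class_names: list of L class labels.
    """
    prob_matrix = np.asarray(prob_matrix, dtype=float)
    winners = np.argmax(prob_matrix, axis=1)
    return [class_names[i] for i in winners]
```

For example, with class names `["ST", "SI", "LY"]` and probabilities `[[0.1, 0.8, 0.3], [0.9, 0.2, 0.4]]`, the two samples are assigned to `"SI"` and `"ST"`, respectively.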

Experimental results
In this section, we first examine the convergence characteristics of the constructed RVM classifiers and present the classification results of the proposed approach with threefold cross-validation on WARD (Section 5.1). Then, we compare the recognition performance obtained with different feature reduction techniques and different classification techniques, respectively. We also compare our approach with other existing methods on the same dataset (Section 5.2).

Recognition performance with the proposed approach
As described in Sections 3 and 4, the feature vectors are extracted from the measured activity signal and compressed into 12 dimensions by GDA, and are subsequently classified with the multiclass RVM technique. To evaluate the recognition performance of our approach, we performed threefold cross-validation on the entire WARD dataset. Specifically, all the feature vectors were randomly divided into three partitions, of which one partition was retained as the validation set for testing, and the remaining two partitions were used for training. This process was performed three times, so that each partition was used for validation once. The whole procedure was repeated ten times, and the resulting recognition rates were averaged.
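The evaluation protocol can be sketched as an index generator. This is a minimal illustration; the function name, the fixed random seed, and the use of `numpy.array_split` to form nearly equal partitions are assumptions.

```python
import numpy as np

def repeated_kfold_indices(n_samples, n_folds=3, n_repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated random k-fold validation."""
    rng = np.random.default_rng(seed)
    for _ in range(n_repeats):
        folds = np.array_split(rng.permutation(n_samples), n_folds)
        for i in range(n_folds):
            train_idx = np.concatenate(
                [folds[j] for j in range(n_folds) if j != i])
            yield train_idx, folds[i]
```

With the default settings this produces 10 × 3 = 30 train/test splits, whose recognition rates would then be averaged.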
During the cross-validation process, we established an RVM classifier for each kind of activity, i.e., there were 13 different RVM classifiers in total. To examine the convergence characteristics of the constructed classifiers, we monitored their marginal likelihood against the iteration number. As shown in Figure 4, the likelihood for the 'ST' activity (denoted by the solid line and squares) converged quickly, after about 12 iterations, and those of the remaining activities show a similar tendency, which indicates that the classifiers train consistently and are convenient to construct.
Figure 5 shows the recognition results obtained with different numbers of the feature types described in Section 3. It can be seen that the recognition accuracy gradually improves as the number of employed feature types increases, which indicates that the different types of feature considered in this article provide complementary information. As a result, all six types of feature are used for recognition in the following experiments.
The confusion matrix of the recognition result (with all six types of feature) is given in Table 3. It can be observed that the confusions are distributed in an unbalanced way. For instance, the three most confused pairs are 'Up' and 'Jog', 'SI' and 'ST', and 'ST' and 'SI'. The confusion rates between them reach 2.4, 1.45, and 1.31%, respectively, while some pairs such as 'LY' and 'Push', 'TR' and 'WR', and 'TL' and 'Jump' are never confused with each other. This can probably be explained as follows. The subjects do not have to move their ankles to perform either the 'SI' or the 'ST' activity, and the accompanying movement of the waist is always quite small. Thus, the sensor nodes provide less discriminative information for these two kinds of activities. 'Up' is sometimes misclassified as 'Jog' mainly because both activities involve raising the feet. A possible way to improve the discrimination between similar activities is to include more sensors located at the knees or thighs, or to deploy other kinds of sensors. For instance, location sensors can be used to keep track of a subject's body movement. Another interesting point is that the confusion rates within a specific pair are not necessarily symmetric. For instance, although the 'Up' activity may be judged as 'Jog' at a rate of 2.4%, 'Jog' is never mistaken for 'Up'. This phenomenon may provide cues for further improvement of the feature selection.
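Directional confusion rates of this kind can be read off a confusion matrix as sketched below. This is an illustration only: it assumes that rows hold the true class and columns the predicted class, and the function names are hypothetical.

```python
import numpy as np

def confusion_rates(counts):
    """Row-normalize a confusion matrix of counts (rows: true class)."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)

def most_confused(counts, names, top=3):
    """Return the `top` ordered (true, predicted, rate) pairs with highest rate."""
    rates = confusion_rates(counts)
    np.fill_diagonal(rates, 0.0)             # ignore correct classifications
    order = np.argsort(rates, axis=None)[::-1][:top]
    pairs = []
    for flat in order:
        i, j = np.unravel_index(flat, rates.shape)
        pairs.append((names[i], names[j], float(rates[i, j])))
    return pairs
```

Because the pairs are ordered, a rate for (true 'Up', predicted 'Jog') is reported separately from the rate for (true 'Jog', predicted 'Up'), which makes the asymmetry noted above directly visible.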

Comparative evaluation
Extensive comparisons have been made to thoroughly examine the performance of the proposed approach. This section reports the comparison results.
First, we compare the recognition performance of the proposed feature reduction with that of two other feature reduction techniques, i.e., PCA and LDA. The evaluation is performed on the WARD dataset, and the same classifier (RVM) is employed. The comparison results are listed in Table 4. GDA achieves a recognition rate of 99.2%, which is 22.9% higher than that of PCA and 58.9% higher than that of LDA. This result is as expected. On the one hand, the distribution of wearable sensor data is nonlinear and complex due to factors such as measurement noise, outliers, and other variations. On the other hand, both PCA and LDA are linear feature reduction techniques, so it is difficult for them to capture nonlinear relationships with a linear mapping. In contrast, GDA is a nonlinear extension of LDA. It transforms the original data space into a feature space by a nonlinear mapping through kernel methods, where the data are more likely to be linearly separable than in the original data space. Therefore, GDA provides a more reliable and robust solution to the activity recognition problem.
We subsequently compare the recognition performance of the RVM classifiers with that of three other popular classification techniques, i.e., k-NN, Bayesian decision, and SVM. The evaluation is also performed on the WARD dataset, and the same feature reduction technique (GDA) is employed. In general, recognition accuracy is sensitive to the parameters of the classifiers. For a fair comparison, it is desirable to adopt optimal parameters for each kind of classifier. Specifically, for the k-NN classifier, we set k = 3, since this gives the highest recognition accuracy in this article. For the RVM classifier, the Gaussian kernel is employed and the optimal bandwidth is found by a simple search: we increased the bandwidth with a constant step of 0.05 over the range [0.05, 1] and trained the RVM classifier over the whole training set. The bandwidth 0.15, which maximized the classification accuracy, was chosen for the following experiments. For the multiclass SVM classifiers, we also adopted the Gaussian kernel and the 'one-versus-all' strategy, as in the multiclass RVM model. The two controlling parameters of the SVM, i.e., the bandwidth and the regularizing parameter C, were fixed with the same search strategy as for the RVM (in this article, they are set to 0.25 and 10, respectively). Table 5 also gives the number of support vectors in the SVM model (N_SV) and the number of relevance vectors in the RVM model (N_RV), which reflect the classifiers' structural complexity [21]. Although the values of N_SV are larger than 70 in most cases, the values of N_RV never exceed 4, i.e., the RVM is much sparser than the SVM. Therefore, the results in Table 5 demonstrate that the RVM has a remarkable advantage in both recognition accuracy and sparsity.
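The bandwidth search can be sketched as a plain grid search. This is an illustration only: `score_fn` is a hypothetical callback that would train a classifier with the given bandwidth and return its accuracy, and is not part of the original study.

```python
import numpy as np

def grid_search_bandwidth(score_fn, start=0.05, stop=1.0, step=0.05):
    """Evaluate score_fn on an evenly spaced grid and return the best bandwidth.

    score_fn: callable mapping a bandwidth to a score to be maximized
              (for example, classification accuracy).
    Returns (best_bandwidth, best_score).
    """
    n_points = int(round((stop - start) / step)) + 1
    grid = np.round(start + step * np.arange(n_points), 2)
    scores = [score_fn(b) for b in grid]
    best = int(np.argmax(scores))
    return float(grid[best]), scores[best]
```

With the default arguments the grid contains the 20 values 0.05, 0.10, ..., 1.00; the same routine could be nested to search the SVM bandwidth and regularizing parameter C jointly.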
To further evaluate the performance of our approach, we also employed other conventional metrics [22,23], including the precision, the recall rate, the F index, and the specificity rate, which can be described as $\mathrm{Precision} = \frac{TP}{TP + FP}$, $\mathrm{Recall} = \frac{TP}{TP + FN}$, $F = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$, and $\mathrm{Specificity} = \frac{TN}{TN + FP}$, where TP (true positive) refers to the number of positive samples classified as positive; FP (false positive) refers to the number of negative samples classified as positive; FN (false negative) denotes the number of positive samples classified as negative; and TN (true negative) denotes the number of negative samples classified as negative. The comparison results for all these metrics are plotted in Figure 6. It can be seen that the RVM achieves the highest scores on all of them, namely 0.9918, 0.9903, 0.9910, and 0.9993, respectively.
Figure 7 highlights the relationship between the true positive rate (TPR) and the false positive rate (FPR) for the different classification techniques. It shows that the result of the RVM is close to ideal, with the highest TPR of 99.18% and the lowest FPR of 0.07%. The SVM and Bayesian decision methods perform slightly worse, while the k-NN method performs the worst, with the lowest TPR of 89.04% and the highest FPR of 1.08%.
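The four metrics can be computed from the per-class counts as sketched below. This is a minimal illustration using the standard definitions; the function name and the counts in the usage note are hypothetical and not taken from the reported experiments.

```python
def binary_metrics(tp, fp, fn, tn):
    """Precision, recall, F index, and specificity from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # also called the true positive rate
    f_index = 2 * precision * recall / (precision + recall)
    specificity = tn / (tn + fp)     # equals 1 minus the false positive rate
    return {"precision": precision, "recall": recall,
            "f_index": f_index, "specificity": specificity}
```

For example, `binary_metrics(tp=90, fp=10, fn=10, tn=890)` gives a precision, recall, and F index of 0.9 and a specificity of 890/900. In a multiclass setting, such counts are typically obtained per class in a one-versus-rest fashion and then averaged.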
Finally, we compare the performance of our approach with other existing methods on the same dataset. The quantitative results are reported in Table 2. It can be seen that our approach outperforms the alternatives in terms of recognition rate. Specifically, Yang et al. [19] employed a distributed sparsity classifier to classify human activities and reported a recognition accuracy of 93.46% with all five sensor nodes, which is about 6% lower than that of our approach. They modeled the distribution of multiple action classes as a mixture subspace model and represented the test sample as a linear approximation of all training samples. However, this linear representation may be limited in describing test samples of complex activities. Huynh [24] combined a generative model (multiple eigenspaces) with an SVM classifier in an activity recognition framework. Since the multiple eigenspace approach is good at representing the structure of the input data and the SVM has good discriminability, this framework achieved higher recognition accuracy than Yang et al.'s method, but still 2.23% lower than that of our approach. We also note that Yang et al.'s method can adapt to changes in the sensor configuration by constructing a global projection matrix; it therefore has an advantage over Huynh's approach and ours in handling failures of one or more sensor nodes (or sensors). One possible way for us to handle such failures is to train a classifier on the data from each sensor individually, and then fuse the classification results of the valid sensors at the decision level.
It is also worth noting that the test data we use are different from that used in the study of [25].In our test data, the number of sensor nodes, the number of test subjects, and the types of activities are 5, 20, and 13, respectively, while the corresponding numbers and types in [25] are 8, 3, and 12, respectively.Therefore, our results are not compared with those in [25].

Conclusions
We put forward a novel approach for the recognition of human activities with wearable sensors, by combining the GDA and the RVM techniques. To the best of the authors' knowledge, both of these techniques are applied to this domain for the first time.
As a powerful data dimension reduction method, GDA can sharply reduce the dimension of the feature space while retaining the most discriminative information among different activities. Specifically, in this article the dimension of the feature vectors has been reduced from 350 to merely 12, which can greatly speed up the subsequent training process. Meanwhile, the RVM also extends flexibly to multiclass classification problems. Experimental results on the WARD dataset demonstrate that the RVM technique not only provides the highest recognition rate, the highest TPR, and the lowest FPR among the conventional classification techniques compared, but also has a much simpler classifier structure than the SVM. In conclusion, our approach should have advantages in real applications or in problems with large-scale datasets, owing to its high recognition performance, strong feature-reduction ability, and simple classifier structure.
Appendix: Derivations of Equations (14) and (15)
The RVM model takes the form of a linear combination of basis functions transformed by a logistic sigmoid function: $y(x, w) = \sigma(w^T \phi(x)) = \frac{1}{1 + e^{-w^T \phi(x)}}$ (20). The gradient vector of the log posterior distribution, which follows from Equation (13), is given by $\nabla \ln p(w | t, \alpha) = \Phi^T (t - y) - A w$ (21), where the vector $y = (y_1, y_2, \ldots, y_N)^T$, and $\Phi$ is the design matrix with elements $\Phi_{ni} = \phi_i(x_n)$. By setting Equation (21) to zero, the mean $w_{MP}$ of the Laplace approximation is represented by $w_{MP} = A^{-1} \Phi^T (t - y)$.
The Hessian matrix of the log posterior distribution, which follows from Equation (13), is given by $\nabla \nabla \ln p(w | t, \alpha) = -(\Phi^T B \Phi + A)$, where $B$ is an $N \times N$ diagonal matrix with elements $\beta_n = y_n (1 - y_n)$. At convergence of the iteratively reweighted least squares algorithm, the negative Hessian represents the inverse covariance matrix of the Gaussian approximation to the posterior distribution [26]. Then, the covariance is $\Sigma_{MP} = (\Phi^T B \Phi + A)^{-1}$.

Figure 1 A set of activity signals recorded by the accelerometer and the gyroscope located at the waist.
Given a training dataset $\{x_i\}_{i=1}^{N}$ containing $L$ classes, with $n_l$ samples belonging to class $l$ (i.e., $N = \sum_{l=1}^{L} n_l$), the GDA operation on it consists of two steps: first, the data $x_i$ are transformed from the original feature space $R$ into a new one $F$ via a nonlinear mapping $\phi: R \to F$.

Figure 2 An instance of activity signal representations in both the time domain and the transform domain: (a, b) The original data in a window for the walking-forward and the jump activity, respectively, recorded along the z-axis by the accelerometer located at the waist. (c, d) FFT of the signals in (a, b), respectively (with the five largest peaks of the FFT marked with 'O'). (e, f) Cepstrum of the signals in (a, b), respectively (with the five largest peaks of the cepstral coefficients marked with 'O').

Figure 3 Scatter plots of the three most important features picked out by PCA (top), LDA (middle), and GDA (bottom), respectively.

Figure 4 The marginal likelihood versus iteration number.

Figure 5 Recognition accuracy of our approach versus number of feature types.

Figure 6 Evaluation of recognition performance with conventional metrics.

Table 1
Description of activities performed by each subject

Table 2
Recognition results with different feature reduction techniques

He et al. EURASIP Journal on Advances in Signal Processing 2012, 2012:108 http://asp.eurasipjournals.com/content/2012/1/108

Table 5 shows the recognition accuracies of these different classification techniques. The RVM gives a recognition rate as high as 99.2%, followed by the SVM and the Bayesian decision, with recognition rates of 97.7 and 95.9%, respectively, while the k-NN performs the worst, with a recognition rate of only 88.3%. The reasons may be as follows. k-NN is an instance-based classifier, which calculates the similarity between the test sample and the training samples. Since it does not take the data distribution into account, k-NN may not be suited to noisy data. The SVM is based on the principle of structural risk minimization. The final classifier obtained by the SVM depends only on the "borderline" training samples, i.e., the support vectors (SVs), which are located near the decision boundary of the classifier. This makes the SVM sensitive to noise, outliers, and wrongly classified patterns that lie near the separating hyperplane. The RVM is a Bayesian extension of the SVM. The final classifier obtained by the RVM depends on even fewer training samples, i.e., the RVs. Unlike the SVs, these RVs are formed by samples that appear to be more representative of the classes and are located away from the decision boundary of the classifier. Therefore, the RVM has a better generalization ability and is more robust to noise and outliers. Additionally, Table 5 also gives N_SV and N_RV, which represent the number of support vectors in the SVM model and the number of relevance vectors in the RVM model, respectively. They reflect the classifiers' structural complexity.

Table 3
Confusion matrix related to the ten times threefold validation by our approach

Table 4
Recognition results with different feature reduction techniques

Table 5
Recognition results with different classification techniques