 Research
 Open Access
Discriminant nonstationary signal features’ clustering using hard and fuzzy cluster labeling
 Behnaz Ghoraani^{1, 2} and
 Sridhar Krishnan^{2}
https://doi.org/10.1186/168761802012250
© Ghoraani and Krishnan; licensee Springer. 2012
Received: 7 March 2012
Accepted: 4 October 2012
Published: 27 November 2012
Abstract
Current approaches to improving pattern recognition performance mainly focus on either extracting nonstationary and discriminant features of each class, or employing complex and nonlinear feature classifiers. However, little attention has been paid to the integration of these two approaches. Combining nonstationary feature analysis with complex feature classifiers, this article presents a novel direction to enhance the discriminatory power of pattern recognition methods. This approach, which is based on a fusion of nonstationary feature analysis with clustering techniques, proposes an algorithm to adaptively identify the feature vectors according to their importance in representing the patterns of discrimination. Nonstationary feature vectors are extracted using a method based on time–frequency distribution and nonnegative matrix factorization. Clustering algorithms, including K-means and self-organizing tree maps, are utilized as unsupervised clustering methods followed by supervised labeling. Two labeling methods are introduced: hard and fuzzy labeling. The article covers in detail the formulation of the proposed discriminant feature clustering method. Experiments on pathological speech classification, T-wave alternans evaluation from the surface electrocardiogram, audio scene analysis, and telemonitoring of Parkinson’s disease produced desirable results. The outcome demonstrates the benefits of fusing nonstationary features with clustering methods for complex data analysis where existing approaches do not exhibit high performance.
Keywords
 K-means clustering
 The self-organizing tree map (SOTM)
 Time–frequency feature analysis
 Supervised classification
 Unsupervised clustering
 Discriminant cluster selection
1 Introduction
Advances in sensor technology have made it possible to gather huge amounts of data. On the one hand, this extends the applicability of signal analysis to a wide variety of fields, such as communications, security, biomedicine, biology, physics, finance, and geology. On the other hand, the large volume of data demands advanced and automated pattern recognition techniques to process the gathered data effectively. In the pattern detection context, the general purpose of any processing technique can be described as the analysis of a given dataset to make a certain decision based on the obtained information.
In a signal classification method, a feature extraction stage divides a signal into short-duration segments and maps the segments into features in an appropriate multidimensional space. Next, a classification scheme performs the actual task of classifying the signals relying on the extracted features. In general, classification techniques can be divided into two groups: supervised learning and unsupervised learning. In supervised learning, the classification scheme is usually based on the availability of a set of signals that have already been classified or described. Learning can also be unsupervised, in the sense that the system is not given a prior labeling of patterns. Instead, it establishes the classes based on the statistical or structural regularities of the patterns.
Supervised learning approaches are developed based on the assumption that the structures of signals from different classes are completely different. They find a discriminating pattern among signals by dividing the feature space into nonoverlapping subspaces, each representing one class. Although this approach might be satisfactory when the signals are separable in the feature space, it is too optimistic in applications where an overlap exists between different classes. This is a common issue in many real-world applications, especially in biomedical applications in which the aim is to distinguish abnormal signals from normal ones. In the majority of cases, the discriminative structure of an abnormal signal occurs over a short duration, so the entire signal is not abnormal. Hence, feature vectors that are extracted from the normal portion of an abnormal signal will overlap with the features extracted from the normal signals. In other words, natural similarities between different classes may result in some overlapping in the feature space. For example, in pathological speech recognition, both normal and pathological signals are speech, and only a few high-frequency contents or transient components discriminate between the two classes. Therefore, the extracted features may not necessarily represent the discriminating structures in each class, causing an overlap in the feature space. In addition, nonstationarities in real-world signals cause variations in the signals’ properties, which may result in the obtained feature vectors spreading and overlapping over the feature space.
Previous work has addressed this feature overlap in three main ways:
 (i) Employing complex classification algorithms: complex learning methods such as artificial neural networks (ANN) [1] have been developed in order to discriminate different classes in the presence of overlapping features.
 (ii) Applying feature selection methods: there have been previous attempts to select uncorrelated feature elements that are more related to the discriminative characteristics of each class in order to improve the classification accuracy. One of these approaches is the theory of rough sets, proposed by Pawlak [2, 3], a data analysis theory that accounts for overlaps between classes. In this theory, a rough membership function makes it possible to distinguish similar feature elements and measures the degree of overlap between a set of experimental values and a set of values representing a standard (e.g., a set of values typically associated with a biomedical abnormality). This approach has been applied in feature selection and extraction to reduce a large number of features and identify the representative features [4]. It is worth mentioning that these feature selection approaches differ from the subject of our study: the former select the uncorrelated feature elements in a feature vector to increase the accuracy rate, while the latter keeps all the feature elements and identifies the clusters of feature vectors that are unrelated to the discrimination between classes.
 (iii) Extracting the discriminant features: some attempts have been made in the literature to obtain the discriminative features of the signals, namely local discriminant base analysis [5] and time-width versus frequency band energy mapping [6]. While these analyses are active areas of research, the optimal choice of discriminant features highly depends on the nature of the dataset and the dissimilarity measures used to distinguish between classes. Furthermore, these analyses can only be used with decomposition-based time–frequency (TF) analysis such as wavelet or matching pursuit, and are restricted to TF analysis approaches.
In an unsupervised classification method, a clustering method (e.g., Gaussian mixture model or K-means clustering) is used to obtain clusters of features for each class. This training stage is performed sequentially for each class; there is no interaction between feature vectors of different classes. In the test stage, the unknown-class data are tested with respect to the discriminant clusters of each class. The predicted class is the one associated with the clusters with the maximum probability. Unsupervised classification is a natural way to proceed towards automatic pattern recognition systems for real-world applications with overlapping features, as it considers the possibility of overlapping features and clusters that share a common structure among different classes.
Since our goal is to enhance the discriminatory power of nonstationary feature extraction, in this study we focus on developing a new scheme for a combined unsupervised and supervised classification approach. This framework, which we call ‘discriminant cluster selection’, aims to improve the classification accuracy in decision-making systems by providing an alternative solution to the feature overlapping problem mentioned above. We also demonstrate the fusion of nonstationary feature analysis with the proposed unsupervised classification methods to cluster the nonoverlapping feature vectors as the discriminative pattern.
In this study, we employ and refine existing clustering approaches to develop a classification technique that improves the classification accuracy rate. We adopt the notion of unsupervised clustering; however, unlike commonly used unsupervised clustering methods, we propose to perform the clustering stage on all the training feature vectors obtained from the different classes and train one set of clusters for the entire set of training features. Next, we use the distribution of feature vectors in these clusters and their class labels to compute the presence of the discriminative pattern in each class. Two types of clusters are identified: discriminant clusters, which mainly consist of feature vectors from one specific class, and common clusters, which are a mixture of features from different classes. We propose that discriminant clusters identify the representative structures in each class, and common clusters represent the similarities between classes. The proposed scheme is different from feature selection techniques, which attempt to select the optimal feature elements in a feature vector to improve the classification performance. Our proposed work feeds all the elements of the feature vectors to the clustering stage, and then decides which feature clusters represent the discriminative structure between the classes. Feature selection techniques and the proposed method can be applied together to increase the classification accuracy; a combination of the two can be investigated in a future study. Our proposed framework is expected to improve the classification accuracy rate of signals. It should also improve our insight into the discrimination pattern in each class, which may be reconstructed or located using the feature vectors in the discriminant clusters.
The structure of this article is as follows: Section 2 explains the discriminant feature clustering methodology. Section 3 explains K-means clustering and the self-organizing tree map (SOTM) as the two unsupervised clustering techniques employed in this study. Two supervised cluster labeling techniques (hard and fuzzy labeling) are explained in Section 4. Section 5 explains the nonstationary signal features. Section 6 presents the application of the developed technique to three synthetic examples, and then investigates the proposed strategy for pathological speech detection, sudden cardiac death-risk stratification, audio scene classification, and telemonitoring of Parkinson’s disease (PD). Conclusions are provided in Section 7.
2 Methodology
where $g\left(\sigma ,\mu \right)$ is a Gaussian with mean μ and variance σ^{2}. The mean of this Gaussian function locates a component in time, and the variance specifies the duration of each component. The sine function localizes the component in the frequency domain. The normal signal is constructed to consist of seven frequency-modulated components.
To construct an abnormal synthetic signal, three of the components are transformed into transients. In many real-world applications such as biomedicine, transients are known to be the discriminative structures of abnormal signals, and are used in this example as one of the abnormality descriptors. Figure 1a–d displays the generated normal and abnormal signals in the time and TF domains. In this example, a spectrogram with an FFT size of 1024 points and a Kaiser window (parameter 5, length 256 samples, overlap 220 samples) was used to construct the TF representation of each signal. The TF domain provides the TF distribution (TFD), which is a three-dimensional TF representation with two dimensions representing the time and frequency domains, respectively. The third dimension (i.e., the intensity of the distribution) indicates the energy distribution of the signal at the corresponding time and frequency. While the time representation does not provide much information about the difference between these two synthetic signals, the TFD provides a better visual display of the discriminant structure, as indicated by the dashed circles. If the right quantification and classification algorithms are used, the TF representation may successfully be employed for automatic pattern recognition applications.
Six joint TF feature vectors [7] are extracted from each signal, and each vector consists of three features: S_{ h }, i.e., sparsity of the signal in the time domain; S_{ w }, i.e., sparsity of the signal in the frequency domain; and D_{ w }, i.e., abrupt changes in the frequency domain. The applied TF feature extraction method is fully explained in Section 5. The extracted TF feature vectors are shown in Figure 1e. Considering the relative location of the features in this feature space, two types of clusters can be detected: an overlapped cluster containing the frequency-modulated components that are common to the two signals, and an abnormality cluster consisting of features that correspond to the transients in the abnormal signal. Our proposed feature classification method is successful if it can separate the abnormality cluster from the normal one (i.e., in this example, the transient and normal feature groups, respectively), and use the abnormality cluster to detect any abnormal behavior in a test signal. The overlapped clusters do not play any role in the discrimination between the two classes. Therefore, once a feature vector is assigned to an overlapped cluster, it is excluded from the classification of its corresponding signal, and has no effect on labeling the signal as abnormal.
In the second stage, each cluster is labeled ({α}) based on the feature arrangements in the feature domain, determining whether the cluster consists of discriminant features or common features. Clusters that consist mainly of feature vectors from abnormal signals are labeled as the discriminant structure corresponding to the abnormality pattern. The outcome of this stage in Figure 1 indicates the left cluster in Figure 1f as the abnormality cluster, since all the features it contains belong to the abnormal signal. Similarly, the right-hand cluster is labeled as the common cluster because it contains a fairly equal number of feature vectors from the normal and abnormal signals.
Once the abnormal and normal clusters are labeled, the trained clusters along with their labels ({α}) are passed to the classification stage. In the test stage, each test feature vector is assigned to one of the cluster centers based on the minimum Euclidean distance (ED) measure. Next, feature vectors that belong to the overlapped clusters are excluded, and finally, based on the membership of the remaining test feature vectors, the class label of the corresponding signal is determined. Two methods are proposed to define the class label of each signal: hard labeling, which is based on a majority vote, and fuzzy labeling, which is based on a majority vote weighted by the membership distribution of each cluster. These stages are described in the following sections.
3 Clustering methods
One of the most popular clustering algorithms is the K-means clustering algorithm. Another popular clustering algorithm is the SOTM, which does not require any information about the number of clusters in the feature domain. This section explains these unsupervised clustering methods; the supervised cluster labeling is explained in the next section.
3.1 K-means clustering
 1. The method starts with K initial random centroids, ${\left\{{\overrightarrow{C}}_{u}\right\}}_{u=1,\dots ,K}$.
 2. It classifies the feature samples into the nearest centroid according to the squared ED. To do so, it first calculates the squared ED of any given sample to all the centroids as given in the following equation: $\left\{{e}_{z}^{2}\right\}=\sum _{u=1}^{K}{\left\Vert {\overrightarrow{f}}_{z}-{\overrightarrow{C}}_{u}\right\Vert }^{2}$ (2)
 Then, the algorithm assigns the sample to the centroid with minimum ED.
 3. The mean of the points in each cluster is computed as the new cluster centroid: ${\overrightarrow{C}}_{u}=\frac{1}{{Z}_{u}}\sum _{z=1}^{{Z}_{u}}{\overrightarrow{f}}_{z}^{u}$ (3)
 where Z_{ u } is the number of feature samples assigned to cluster u, and ${\left\{{\overrightarrow{f}}_{z}^{u}\right\}}_{z=1,\dots ,{Z}_{u}}$ are the samples assigned to cluster u.
 4. The algorithm iteratively repeats Steps 2 and 3 until the new cluster centers are the same as, or close enough to, the centroids of the previous iteration.
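The four steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' implementation: the function names, the tolerance `tol`, the choice of initial centroids from the training samples, and the handling of empty clusters are all assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(samples, k, max_iter=100, tol=1e-9, seed=0):
    """Cluster `samples` (a list of equal-length lists) into k groups."""
    rng = random.Random(seed)
    # Step 1: pick k distinct training samples as the initial centroids
    # (assumption: the text only says the centroids are random).
    centroids = [list(s) for s in rng.sample(samples, k)]
    assignment = []
    for _ in range(max_iter):
        # Step 2: assign every sample to the centroid with minimum squared ED.
        assignment = [
            min(range(k), key=lambda u: squared_distance(s, centroids[u]))
            for s in samples
        ]
        # Step 3: recompute each centroid as the mean of its members.
        new_centroids = []
        for u in range(k):
            members = [s for s, a in zip(samples, assignment) if a == u]
            if not members:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(centroids[u])
                continue
            dim = len(members[0])
            new_centroids.append(
                [sum(m[d] for m in members) / len(members) for d in range(dim)]
            )
        # Step 4: stop once the centroids no longer move noticeably.
        shift = max(
            squared_distance(c, n) for c, n in zip(centroids, new_centroids)
        )
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids, assignment
```

In the discriminant cluster selection setting, `samples` would hold the training feature vectors of all classes together, so that a single set of centroids is trained for the whole feature space.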
3.2 SOTM
 1. The weight vectors are initialized randomly, ${\left\{{\overrightarrow{C}}_{u}\left(t\right)\right\}}_{u=1,\dots ,K}$, where K is the number of clusters. The random value is usually a vector from the training set.
 2. For a new input vector, the distance between the input vector and each of the existing nodes, d_{ u }, is calculated as ${d}_{u}\left(\overrightarrow{f},{\overrightarrow{C}}_{u}\left(t\right)\right)={\left\{\sum _{z=1}^{Z}{\left[{\overrightarrow{f}}_{z}-{\overrightarrow{C}}_{u}\left(t\right)\right]}^{2}\right\}}^{1/2},\phantom{\rule{1em}{0ex}}u=1,\dots ,K$ (4)
 where ${\overrightarrow{C}}_{u}\left(t\right)$ is the node of the cluster u at time t.
 3. Select the node with the minimum distance d_{ u } as the winning node, u^{∗}: ${d}_{u\ast }\left(\overrightarrow{f},{\overrightarrow{C}}_{u}\left(t\right)\right)=\mathrm{min}\phantom{\rule{0.3em}{0ex}}{d}_{u}\left(\overrightarrow{f},{\overrightarrow{C}}_{u}\left(t\right)\right)$ (5)
 4. The minimum distance, ${d}_{u\ast }\left(\overrightarrow{f},{\overrightarrow{C}}_{u}\left(t\right)\right)$, is then compared with H(t), the hierarchical control function, which decreases over time. If the input vector is within the threshold H(t) of the winning node, the weight vector is updated based on the following update rule: ${\overrightarrow{C}}_{u}\left(t+1\right)={\overrightarrow{C}}_{u}\left(t\right)+\lambda \left(t\right)\left[\overrightarrow{f}-{\overrightarrow{C}}_{u}\left(t\right)\right]$ (6)
 where λ(t) is the learning rate, which decreases with time. When the input vector is farther from the winning node than the threshold, a new subnode is generated from the winning node at $\overrightarrow{f}$.
 5. Check the terminating conditions. The algorithm stops if any of the following conditions is fulfilled: the maximum number of iterations is reached, the maximum number of clusters is reached, or no significant change occurs in the structure of the tree.
 6. Otherwise, the algorithm is repeated from Step 2.
where τ_{H} is a time constant, which is bound to the projected size of the input feature F, H(0) is the initial value, t is the number of iterations (or sample presentations), and ζ is the number of iterations over which the linear version of H(t) would decay to the same level as the exponential version. One benefit of initializing H(t) to a large value, possibly larger than the maximum variation within the data, is that all levels of resolution across the data can be explored.
The learning rate in Equation (6), λ(t), is an important factor in organizing the network. λ(t) can operate in a number of different global or local modes. In global modes, a single learning rate is applied to all nodes, whereas in local modes an individual rate operates for each node or set of nodes. A few modalities have been proposed for the operation of the learning rate, and the details are discussed in [11, 12].
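The growth rule of the SOTM (Steps 2 to 5 and Equation (6)) can be illustrated with the following sketch. It is hedged in several ways: the exponential forms used for H(t) and λ(t), the parameter names (`h0`, `tau_h`, `lam0`, `tau_lam`), the single root node, and the random presentation order are assumptions of this example, and the "no significant change" stopping test is omitted for brevity. Parent and child links of the tree are not stored; only the node positions are kept.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (Equation (4))."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sotm(samples, h0, tau_h, lam0=0.1, tau_lam=1000.0,
         max_nodes=25, max_iter=500, seed=0):
    """Grow a set of cluster nodes from `samples` (list of equal-length lists)."""
    rng = random.Random(seed)
    # Assumption: start from one root node taken from the training set.
    nodes = [list(rng.choice(samples))]
    for t in range(max_iter):
        f = rng.choice(samples)  # present one input vector per iteration
        # Steps 2-3: find the winning node (minimum distance to the input).
        winner = min(range(len(nodes)), key=lambda u: euclidean(f, nodes[u]))
        d_win = euclidean(f, nodes[winner])
        # Assumed decay laws; the source only states that both decrease.
        threshold = h0 * math.exp(-t / tau_h)
        rate = lam0 * math.exp(-t / tau_lam)
        if d_win <= threshold:
            # Step 4, Equation (6): move the winner toward the input.
            nodes[winner] = [c + rate * (x - c) for x, c in zip(f, nodes[winner])]
        else:
            # Input is outside the threshold: spawn a new node at the input.
            nodes.append(list(f))
            if len(nodes) >= max_nodes:
                break  # Step 5: maximum number of clusters reached
    return nodes
```

With a large initial threshold the tree first separates coarse groups and, as H(t) shrinks, adds nodes for finer structure, which is why the number of clusters does not need to be fixed in advance.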
4 Cluster labeling
Assignment of the right label to each cluster is one of the critical concerns in our proposed discriminant cluster selection system. We propose two methods to label the obtained clusters and obtain the class label of the signals as explained in the following subsections.
4.1 Method 1: hard labeling
In an E-class classification problem, this method decides whether each cluster represents class 1, 2,…, or E.

First, the clusters are identified, say K clusters $\left\{{\overrightarrow{C}}_{1},{\overrightarrow{C}}_{2},\dots ,{\overrightarrow{C}}_{K}\right\}$. K≥E is the number of clusters and is not necessarily equal to the number of classes (i.e., E). The number depends on the application and the employed clustering method.

Next, we count the feature vectors of each class that are assigned to each cluster and denote these counts as NUM_{1}(u), NUM_{2}(u),…, NUM_{ E }(u), representing the number of class 1 to class E feature vectors in the u th cluster, respectively.

Then, clusters with a fairly equal mix of feature vectors from different classes are identified as overlapped clusters and labeled as common clusters (i.e., K_{ c } clusters). The remaining clusters (i.e., K_{ d } clusters) are discriminant clusters and are labeled based on the membership distribution of their feature vectors. The class with majority membership defines the label of each discriminant cluster. In order to quantify the significance of the overlap between different classes, the clusters with more than 30% of overlap are assigned to the common clusters, and the remaining clusters are identified as the discriminant clusters. The calculation proceeds as shown in the following equation: $\begin{array}{ll}{\alpha }_{u}=0,& \text{for}\phantom{\rule{0.5em}{0ex}}u\in \left\{{K}_{c}\right\}\\ {\alpha }_{u}=\mathrm{arg}\underset{e=1,\dots ,E}{\mathrm{max}}\left\{{\text{NUM}}_{e}\left(u\right)\right\},& \text{for}\phantom{\rule{0.5em}{0ex}}u\in \left\{{K}_{d}\right\}\end{array}$ (8)
where α_{ u } is the label defined for the u th cluster, and α_{ u }=0 represents a common cluster.

Once the training stage is completed, the estimated clusters and the calculated labels denoted with $\left\{{\alpha}_{1},{\alpha}_{2},\dots ,{\alpha}_{K}\right\}$ are passed to the test stage.

${\text{Cluster}}_{{\overrightarrow{f}}_{\text{test}}}$, the cluster to which each test feature belongs, is found as the nearest cluster based on the ED criterion: ${\text{Cluster}}_{{\overrightarrow{f}}_{\text{test}}}=\mathrm{arg}\underset{u=1,\dots ,K}{\mathrm{min}}\left\{\left\Vert {\overrightarrow{f}}_{\text{test}}-{\overrightarrow{C}}_{u}\right\Vert \right\}={u}_{f}$ (9)
where ${\overrightarrow{C}}_{u}$ is the center of each cluster constructed in the training stage.

The label of the above cluster is assigned to each test feature, and is used to determine the class to which ${\overrightarrow{f}}_{\text{test}}$ belongs: ${\overrightarrow{f}}_{\text{test}}\in \text{Class}\phantom{\rule{0.3em}{0ex}}e\phantom{\rule{1em}{0ex}}\text{if}\phantom{\rule{1em}{0ex}}{\alpha }_{{u}_{f}}=e$ (10)

Once all the feature vectors in a test signal are labeled, the feature vectors that are assigned to common clusters are excluded and the labels of the remaining feature vectors are used to classify the signal. A test signal is classified as a class e signal if the majority of its test feature vectors (i.e., excluding the feature vectors assigned to common clusters) belong to class e.
We call this procedure ‘hard labeling’ as each cluster is distinguished with one label.
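The training and test stages of hard labeling can be sketched as follows. This is an illustrative reading rather than the authors' code: the text does not spell out how the 30% overlap is measured, so the sketch assumes it is the share of a cluster's feature vectors that do not belong to the cluster's majority class. The function names and the `None` return value for a signal whose vectors all fall into common clusters are also assumptions.

```python
from collections import Counter


def label_clusters(assignments, classes, k, overlap_threshold=0.30):
    """Return alpha[u] for each of the k clusters: 0 marks a common cluster,
    otherwise the majority class (classes are numbered 1..E)."""
    counts = [Counter() for _ in range(k)]
    for u, e in zip(assignments, classes):
        counts[u][e] += 1  # NUM_e(u)
    alpha = []
    for u in range(k):
        total = sum(counts[u].values())
        if total == 0:
            alpha.append(0)  # assumption: an empty cluster is treated as common
            continue
        majority_class, majority_count = counts[u].most_common(1)[0]
        # Assumed overlap measure: share of vectors outside the majority class.
        overlap = 1.0 - majority_count / total
        alpha.append(0 if overlap > overlap_threshold else majority_class)
    return alpha


def nearest_cluster(f, centroids):
    """Index of the centroid closest to f (Equation (9))."""
    return min(
        range(len(centroids)),
        key=lambda u: sum((x - c) ** 2 for x, c in zip(f, centroids[u])),
    )


def classify_signal_hard(test_vectors, centroids, alpha):
    """Majority vote over the test vectors that fall in discriminant clusters."""
    votes = Counter()
    for f in test_vectors:
        label = alpha[nearest_cluster(f, centroids)]
        if label != 0:  # vectors in common clusters are excluded
            votes[label] += 1
    if not votes:
        return None  # assumption: no decision if every vector was excluded
    return votes.most_common(1)[0][0]
```

For example, `alpha = label_clusters(assignment, classes, k)` would be computed once on the training set, and `classify_signal_hard` would then be called for each test signal with the trained centroids.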
4.2 Method 2: fuzzy labeling
After all the feature vectors are clustered, clusters with large overlapping (i.e., containing more than 30% overlapping feature vectors) are identified as common clusters (i.e., K_{ c }), and the remaining clusters (i.e., K_{ d }), which are the discriminant clusters, are used in the training stage as follows:
where E is the number of classes and K_{ d }is the number of discriminant clusters.
where NUM_{ e }(u) is the number of features belonging to class e that exist in cluster u, and m_{ u } is the total number of features in the u th cluster. These coefficients are used in the calculation of the membership degree for each of the test vectors.
where s_{ u }is the number of the representing vectors for a test signal that fall within the u th cluster and K_{ d } is the number of discriminant clusters.
where M(:,e) is the e th vector of the membership matrix, M, and the signal is labeled to belong to the class associated with the maximum value of Φ(e).
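A possible reading of the fuzzy labeling procedure is sketched below. The quantities follow the definitions given above (NUM_e(u), m_u, s_u, M, and Φ), but the exact formulas are assumptions of this example: each membership coefficient is taken as NUM_e(u)/m_u, and Φ(e) is taken as the sum over discriminant clusters of s_u times the e th column of M. The function names and data layout are hypothetical.

```python
def membership_matrix(counts, discriminant):
    """Build M for the discriminant clusters.

    counts[u][e] is NUM_{e+1}(u), the number of training vectors of class
    e+1 in cluster u; `discriminant` lists the indices of the K_d clusters.
    Returns a dict mapping cluster index u to its row of ratios.
    """
    matrix = {}
    for u in discriminant:
        m_u = sum(counts[u])  # total number of features in cluster u
        matrix[u] = [num / m_u for num in counts[u]]
    return matrix


def classify_signal_fuzzy(test_clusters, matrix, num_classes):
    """Return (predicted class in 1..E or None, list of Phi values).

    test_clusters holds, for each feature vector of the test signal,
    the index of its nearest cluster.
    """
    # s_u: number of test vectors that fall within discriminant cluster u.
    s = {u: 0 for u in matrix}
    for u in test_clusters:
        if u in s:  # vectors in common clusters are ignored
            s[u] += 1
    # Assumed form: Phi(e) = sum_u s_u * M(u, e).
    phi = [
        sum(s[u] * matrix[u][e] for u in matrix) for e in range(num_classes)
    ]
    if sum(phi) == 0:
        return None, phi  # assumption: no decision without discriminant votes
    best = max(range(num_classes), key=lambda e: phi[e])
    return best + 1, phi
```

Compared with hard labeling, a test vector that lands in a cluster with a 0.7/0.3 split contributes to both classes in proportion to that split instead of casting a single vote.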
The advantage of hard and fuzzy labeling is that they identify the representative clusters for each class and discriminate them from the common clusters. However, the method requires that each class contributes the same number of feature vectors. Since the identification of representative clusters is based on comparing the membership of each class in the clusters, the number of normal and abnormal feature vectors should be the same in order to perform a fair comparison. The proposed solution in such scenarios is to reduce the sample size of every class to that of the smallest class.
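The class balancing step can be illustrated with a short sketch. The text only states that every class is reduced to the size of the smallest class; drawing the retained vectors uniformly at random without replacement, as done here, is an assumption, as are the function name and the dictionary layout.

```python
import random


def balance_classes(vectors_by_class, seed=0):
    """Subsample every class to the size of the smallest class.

    vectors_by_class maps a class label to its list of feature vectors.
    """
    rng = random.Random(seed)
    n_min = min(len(vectors) for vectors in vectors_by_class.values())
    # Assumption: the retained vectors are chosen at random.
    return {
        label: rng.sample(vectors, n_min)
        for label, vectors in vectors_by_class.items()
    }
```

The balanced dictionary would then be flattened into one training set before clustering, so that the per-cluster counts NUM_e(u) are comparable across classes.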
5 Nonstationary signal feature extraction
This approach captures the TF features by applying nonnegative matrix factorization (NMF) [13] to the TFD of each signal. A spectrogram can be used as a simple TF representation. Seven features are extracted from the decomposed vectors: ${\text{MO}}_{\overrightarrow{h}}^{\left(1\right)},{\text{MO}}_{\overrightarrow{w}}^{\left(1\right)},{\text{S}}_{\overrightarrow{h}},{\text{S}}_{\overrightarrow{w}},{\text{D}}_{\overrightarrow{h}},{\text{D}}_{\overrightarrow{w}}$, and ${\text{SH}}_{\overrightarrow{w}}$.
5.1 NMF
In these equations, $\langle A.B\rangle $ and $\frac{\langle A\rangle }{\langle B\rangle }$ are term-by-term multiplication and division of two matrices, and 1 is a matrix of ones. The KL divergence formulation is not a bound-constrained problem, since such a problem requires the objective function to be well defined at any point of the bounded region [16], whereas the log function in the KL divergence formula is not well defined if any element of matrix V or WH is zero. Hence, we do not consider the KL divergence formulation in this study. The least squares error approach is a standard bound-constrained optimization problem, and various minimization strategies have been proposed for it. In this study, we use the projected gradient bound-constrained optimization method proposed by Lin [16].
5.2 Features
As shown in Figure 3, features are extracted from each decomposed W and H matrices. The obtained features are explained as follows:
5.2.1 Joint TF moments
where ${\mu}_{{\overrightarrow{h}}_{j}}$ and ${\mu}_{{\overrightarrow{w}}_{j}}$ are the first moment of the j th coefficient and base vectors and are computed as follows: ${\mu}_{{\overrightarrow{h}}_{j}}=\sum _{n=0}^{N}n{\overrightarrow{h}}_{j}^{T}\left(n\right)$ and ${\mu}_{{\overrightarrow{w}}_{j}}=\sum _{m=0}^{M}m{\overrightarrow{w}}_{j}\left(m\right)$.
5.2.2 Sparsity
The sparsity feature is zero if and only if a vector contains a single nonzero component (i.e., maximum sparsity), and is negative infinity if and only if all the components are equal (i.e., minimum sparsity).
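One measure with exactly the two limiting properties stated above is the logarithm of Hoyer's sparseness, which compares the L1 and L2 norms of a vector. The sketch below uses that measure as an assumption; the function name is hypothetical, and it should be read as an illustration of the stated behavior rather than as the exact definition used for ${\text{S}}_{\overrightarrow{h}}$ and ${\text{S}}_{\overrightarrow{w}}$.

```python
import math


def log_sparsity(v):
    """Log of Hoyer's sparseness of a vector with at least two entries.

    Returns 0.0 for a vector with a single nonzero component and
    negative infinity when all components have equal magnitude.
    """
    n = len(v)
    l1 = sum(abs(x) for x in v)
    l2 = math.sqrt(sum(x * x for x in v))
    if n < 2 or l2 == 0.0:
        raise ValueError("need at least two entries and a nonzero vector")
    # Hoyer's sparseness lies in [0, 1]: 1 is maximally sparse, 0 is flat.
    hoyer = (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1.0)
    if hoyer <= 0.0:
        return float("-inf")  # all components equal: minimum sparsity
    return math.log(hoyer)
```

Applied to a coefficient vector, a value near zero indicates a component that is concentrated at a few time instants (a transient), while a strongly negative value indicates a component spread evenly over time.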
5.2.3 Discontinuity
${\text{D}}_{\overrightarrow{h}}$ and ${\text{D}}_{\overrightarrow{w}}$ capture the discontinuities and abrupt changes in coefficient and base vectors, respectively. A vector with a smaller value of discontinuity feature is smoother compared to a vector with a larger discontinuity feature.
5.2.4 Sharpness
6 Results
6.1 Synthetic dataset
6.1.1 Example 1
 (i) The TF representation (i.e., TF matrix) of each signal is constructed.
 (ii) The NMF matrix decomposition method is applied to the TF matrix, and 10 base and coefficient components (i.e., W and H, with r=10) are computed for each signal.
 (iii) A feature vector is extracted from each component pair as explained in Section 5; i.e., there are ten feature vectors for each signal and each feature vector contains the following feature values: $\left\{{\text{MO}}_{\overrightarrow{h}}^{\left(1\right)},{\text{MO}}_{\overrightarrow{w}}^{\left(1\right)},{\text{MO}}_{\overrightarrow{h}}^{\left(2\right)},{\text{MO}}_{\overrightarrow{w}}^{\left(2\right)},{\text{S}}_{\overrightarrow{h}},{\text{S}}_{\overrightarrow{w}},{\text{D}}_{\overrightarrow{h}},{\text{D}}_{\overrightarrow{w}},{\text{SH}}_{\overrightarrow{w}}\right\}$.
 (iv) SOTM clustering is used to train and then classify the signals in each class. The classifier is trained using 90% of the samples and evaluated over all the signals. SOTM is applied simultaneously to the Class 1 and Class 2 feature vectors and computes 25 clusters in the feature space. The number of feature vectors associated with Class 1 or Class 2 is counted in each cluster, and the distribution of feature vectors in these 25 clusters is computed and displayed in Figure 5. In both hard and fuzzy labeling, clusters with more than 30% overlap (i.e., clusters 1, 12, 13, 18, 20, 23, 24, and 25) are assigned to common clusters, and the remaining clusters are identified as discriminant clusters and are labeled depending on the labeling method proposed in Section 4 (Figure 6). In hard labeling, discriminant clusters in which Class 1 feature vectors form the majority are labeled as Class 1 (i.e., clusters 3, 6, 9, 10, 11, 15, 16, and 17) and the ones in which Class 2 feature vectors form the majority are labeled as Class 2 (i.e., clusters 2, 4, 5, 7, 8, 14, 19, 21, and 22). However, in fuzzy labeling, a membership ratio is assigned to each cluster as follows: $M={\left[\begin{array}{ccccccccccccccccc}{C}_{2}& {C}_{3}& {C}_{4}& {C}_{5}& {C}_{6}& {C}_{7}& {C}_{8}& {C}_{9}& {C}_{10}& {C}_{11}& {C}_{14}& {C}_{15}& {C}_{16}& {C}_{17}& {C}_{19}& {C}_{21}& {C}_{22}\\ 0.28& 0.99& 0.29& 0.33& 0.88& 0.26& 0.32& 0.96& 0.83& 0.99& 0.33& 0.74& 1.0& 1.0& 0.25& 0.29& 0.3193\\ 0.72& 0.01& 0.71& 0.67& 0.12& 0.74& 0.68& 0.04& 0.1& 0.011& 0.67& 0.26& 0.0& 0.0& 0.75& 0.71& 0.6807\end{array}\right]}^{\text{T}}$ (31)
 (v) All the signals are tested and labeled. Figure 6 displays the receiver operating characteristic (ROC) curve of the final classification.
6.1.2 Example 2
The TFD in panels C and D is constructed using the spectrogram method with an FFT size of 1,024 points and a Kaiser window (parameter 5, length 256 samples, overlap 220 samples). Features were extracted as explained in Section 5: NMF with a decomposition order of 10 was applied to the spectrograms of y_{1} and y_{2}. The decomposed vectors were ${\left[{\overrightarrow{w}}_{1}\left(i\right)\phantom{\rule{1em}{0ex}}{\overrightarrow{h}}_{1}^{\text{T}}\left(i\right)\right]}_{i=1:10}$ and ${\left[{\overrightarrow{w}}_{2}\left(i\right)\phantom{\rule{1em}{0ex}}{\overrightarrow{h}}_{2}^{\text{T}}\left(i\right)\right]}_{i=1:10}$, respectively. Seven TF features were extracted from each decomposed vector. Three of these features are shown in panel C of Figure 7, where the asterisk and circle correspond to the y_{1} and y_{2} signals, respectively. K-means clustering with three clusters was applied to all the features. Each cluster with the majority membership of a signal was marked as the corresponding signal’s discriminant pattern.
As can be seen in this feature plane, there was a group of features clustered in the middle. Using the discriminant feature selection method, this cluster was selected as the discriminant pattern in signal y_{2}: D_{y 2}. The same method identified the discriminant pattern in the y_{1} signal: D_{y 1}. The remaining features belonged to the commonalities between these two signals.
Panel D in Figure 7 displays the discriminant structures in y_{1} and y_{2}signals. These TF structures were built using the decomposed vectors corresponding to the D_{y 1} and D_{y 2} feature points: $\sum _{i={D}_{y1}}{\overrightarrow{w}}_{1}\left(i\right){\overrightarrow{h}}_{1}^{\text{T}}\left(i\right)$ and $\sum _{i={D}_{y2}}{\overrightarrow{w}}_{2}\left(i\right){\overrightarrow{h}}_{2}^{\text{T}}\left(i\right)$. As demonstrated in this example, the proposed method was able to successfully identify the discriminant structures in each signal. Once the discriminant clusters are selected, these clusters along with the proposed cluster labeling methods can be used to classify a new signal. The above example used only one signal from each class in arriving at the differences between TF structures. In practice, we have to use more number of signals in both classes before arriving at a robust discriminant pattern.
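The reconstruction of a discriminant TF structure from the selected NMF components, i.e., the sum of the outer products of the base and coefficient vectors whose features fall in a discriminant cluster, can be sketched as follows. The function name `reconstruct_structure` and the toy matrices are hypothetical, and plain Python lists stand in for the spectrogram matrices.

```python
def reconstruct_structure(bases, coeffs, selected):
    """Sum of outer products w_i h_i^T over the selected component indices.

    bases:    list of r base vectors w_i, each of length F (frequency bins).
    coeffs:   list of r coefficient vectors h_i, each of length T (time frames).
    selected: indices of the components located in a discriminant cluster.
    Returns an F x T matrix as a list of rows.
    """
    n_freq, n_time = len(bases[0]), len(coeffs[0])
    structure = [[0.0] * n_time for _ in range(n_freq)]
    for i in selected:
        for f in range(n_freq):
            for t in range(n_time):
                structure[f][t] += bases[i][f] * coeffs[i][t]
    return structure


# Toy decomposition with r = 2 components (hypothetical values)
W = [[1.0, 0.0], [0.0, 2.0]]            # two base vectors, F = 2
H = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]  # two coefficient vectors, T = 3
discriminant_part = reconstruct_structure(W, H, selected=[1])
```

Selecting every component recovers the full NMF approximation of the TFD, so the discriminant and common parts are complementary pieces of the same decomposition.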
6.1.3 Example 3
This experiment introduces more challenges to the identification of the discriminant structures between two classes. In this example, the discriminant structure overlaps with the common structure; i.e., the abnormal components are mixed with the normal structure. As is demonstrated in this example, the proposed discriminant cluster selection method provides a successful separation between the normality and abnormality structures.
6.2 Real dataset
Pathological voice classification, T-wave alternans (TWA) evaluation from the surface electrocardiogram (ECG), environmental audio classification, and telemonitoring of PD are selected as the applications of the developed discriminant cluster selection method. The first is performed using the hard labeling clustering method, and the latter three are evaluated employing the fuzzy labeling approach.
6.2.1 Hard labeling: pathological speech detection
Dysphonia, or pathological voice, refers to speech problems resulting from damage to or malformation of the speech organs. Currently, patients are required to visit a specialist routinely to follow up on their progress. Moreover, the traditional ways of diagnosing voice pathology are subjective, and different evaluations can result depending on the experience of the specialist. An automated technique saves time for both the patients and the specialist, and can improve the accuracy of the assessments. In a previous study from our group [7], we introduced joint TF feature extraction and classification for pathological speech verification. In this study, we revisit this application with a focus on nonstationary TF feature analysis combined with hard cluster labeling, and compare its performance with traditional clustering methods.
The proposed methodology was applied to the Massachusetts Eye and Ear Infirmary (MEEI) voice disorders database, distributed by Kay Elemetrics Corporation [21]. The database consists of 51 normal and 161 pathological speakers whose disorders spanned a variety of organic, neurological, traumatic, and psychogenic factors. The speech signal is sampled at 25 kHz and quantized at a resolution of 16 bits/sample. In this study, 25 abnormal and 25 normal signals were used to train the classifier. Each signal is divided into 80-ms segments and the TFD is constructed [22, 23]. Next, NMF with a base number of r=15 is applied to each TF representation, and 15 base and coefficient vectors are estimated as explained in Equation (15).
As explained in our previous study [7], abnormal speech behaves differently for voiced (i.e., vowel) and unvoiced (i.e., consonant) components. Therefore, prior to feature extraction, the base vectors are divided into two groups: (a) low frequency (LF): the bases with dominant energy in the frequencies lower than 4 kHz, and (b) high frequency (HF): the bases with major energy concentration in the higher frequencies. Four features $\left\{{S}_{\overrightarrow{h}},{D}_{\overrightarrow{h}},{S}_{\overrightarrow{w}},S{H}_{\overrightarrow{w}}\right\}$ are extracted from each LF base vector, and five features $\left\{{S}_{\overrightarrow{h}},{D}_{\overrightarrow{h}},M{O}_{\overrightarrow{w}}^{\left(1\right)},M{O}_{\overrightarrow{w}}^{\left(2\right)},M{O}_{\overrightarrow{w}}^{\left(3\right)}\right\}$ are obtained from each HF base vector.
The clustering and labeling are performed as explained in Sections 3.1 and 4.1, respectively. In the training stage, 100 and 20 clusters were found experimentally to be suitable choices for the number of clusters (K) for the LF and HF features, respectively. Of all the clusters, 25% were assigned as common clusters, and the remaining clusters were labeled as normal or abnormal according to the hard labeling scheme.
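The LF/HF grouping of the base vectors can be illustrated with a short sketch. It assumes that "dominant energy" means that more than half of the sum of a base vector's entries lies below the 4 kHz cutoff, and that the frequency bins are spaced uniformly from 0 to half the 25 kHz sampling rate; both conventions, and the name `split_bases`, are assumptions made for illustration.

```python
def split_bases(bases, fs=25000.0, cutoff_hz=4000.0):
    """Split NMF base vectors into low-frequency (LF) and high-frequency (HF) groups.

    bases: list of nonnegative base vectors, each with more than one entry,
           where entry k is ASSUMED to sit at frequency k * (fs / 2) / (n - 1).
    Returns (lf_indices, hf_indices).
    ASSUMPTION: a base is LF when more than half of its total energy
    (sum of entries) lies below cutoff_hz.
    """
    lf, hf = [], []
    for idx, w in enumerate(bases):
        n = len(w)
        bin_hz = (fs / 2.0) / (n - 1)  # spacing between frequency bins
        total = sum(w)
        low = sum(v for k, v in enumerate(w) if k * bin_hz < cutoff_hz)
        if total > 0 and low / total > 0.5:
            lf.append(idx)
        else:
            hf.append(idx)
    return lf, hf


# Toy bases on 6 bins: 0, 2.5, 5, 7.5, 10, and 12.5 kHz (hypothetical values)
toy_bases = [
    [3.0, 2.0, 0.0, 0.0, 0.0, 0.0],  # energy concentrated below 4 kHz
    [0.0, 0.0, 1.0, 2.0, 3.0, 1.0],  # energy concentrated above 4 kHz
]
lf_idx, hf_idx = split_bases(toy_bases)
```

The two index lists can then be used to route each base vector to the four-feature (LF) or five-feature (HF) extraction described above.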
6.2.2 Fuzzy labeling: TWA evaluation from the surface ECG
Next, the adaptive TFD [22, 23] of the aligned T-waves is constructed over each vertical sample (${\overrightarrow{A}}_{1}$, ${\overrightarrow{A}}_{2},\dots ,{\overrightarrow{A}}_{N}$). The adaptive TFD approach is a high-resolution TF representation capable of adaptively tracking nonstationary structures. It uses the matching pursuit method [22] to decompose the signal over a dictionary of TF atoms. At each iteration, the signal is projected over a dictionary of TF functions and the one that models the greatest fraction of the signal energy is selected. This TF function is then subtracted from the signal, and the residual signal is decomposed in further iterations until all or most of the signal energy is modeled. The matching pursuit decomposition with Gabor TF atoms was chosen in this study because of its superior TF resolution [22], cross-term-free nature, adaptivity, and suitability for pattern recognition applications. The adaptive TFD of each vertical sample is computed. If V_{1}, V_{2},…,V_{N} are the TFDs of the vertical samples, then in the next stage the TFD representative of the entire T-wave (denoted by V_{avg}) is calculated as the average of V_{1}, V_{2},…,V_{N}. Once the average TFD is constructed, features are extracted as explained below:
where T is the energy of the decomposed base (W) at a frequency of 0.5 cpb (cycles per beat), and μ_{noise} estimates the noise energy. Assuming white Gaussian noise, the noise has a constant spectral density over the entire spectral bandwidth. Since the T-wave alternation and respiratory activities do not have any spectral content over the band from 0.36 to 0.49 cpb, this band is used to estimate the noise energy. The last ten elements in each base component represent the spectral content of the T-wave signal. Essentially, any information about the T-wave, including the noise and the TWA value, exists in the spectral content of these last ten elements of the base vector. Therefore, the other ten features are chosen to be the last ten elements in each base component. As a result of this feature extraction, three feature vectors are extracted for each ECG segment, where each vector includes 11 feature values.
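The matching pursuit iteration used to build the adaptive TFD (project the residual on the dictionary, keep the atom that captures the most energy, subtract it, and repeat) can be sketched in a few lines. This is a simplified illustration under stated assumptions: the dictionary is a small hand-made list of real-valued, unit-norm Gabor atoms, and the names `gabor_atom` and `matching_pursuit`, the stopping rule, and all parameter values are hypothetical rather than those used in the study.

```python
import math


def gabor_atom(n, center, scale, freq):
    """Real-valued, unit-norm Gabor atom: Gaussian window times a cosine."""
    atom = [
        math.exp(-0.5 * ((t - center) / scale) ** 2)
        * math.cos(2.0 * math.pi * freq * (t - center))
        for t in range(n)
    ]
    norm = math.sqrt(sum(a * a for a in atom))
    return [a / norm for a in atom]


def matching_pursuit(signal, dictionary, max_iter=10, energy_fraction=0.99):
    """Greedy decomposition of `signal` over unit-norm atoms.

    Returns (picks, residual), where picks is a list of (atom_index, coefficient).
    Stops after max_iter atoms or once energy_fraction of the energy is modeled.
    """
    residual = list(signal)
    total_energy = sum(x * x for x in signal)
    picks = []
    for _ in range(max_iter):
        residual_energy = sum(x * x for x in residual)
        if residual_energy <= (1.0 - energy_fraction) * total_energy:
            break  # all or most of the signal energy is modeled
        best_index, best_coeff = None, 0.0
        for k, atom in enumerate(dictionary):
            coeff = sum(r * a for r, a in zip(residual, atom))  # projection
            if abs(coeff) > abs(best_coeff):
                best_index, best_coeff = k, coeff
        if best_index is None:
            break  # residual is orthogonal to every atom
        picks.append((best_index, best_coeff))
        best_atom = dictionary[best_index]
        residual = [r - best_coeff * a for r, a in zip(residual, best_atom)]
    return picks, residual


# Toy dictionary and a signal made of a single scaled atom (hypothetical values)
atoms = [
    gabor_atom(64, 16, 4.0, 0.10),
    gabor_atom(64, 32, 4.0, 0.25),
    gabor_atom(64, 48, 8.0, 0.05),
]
toy_signal = [2.0 * a for a in atoms[1]]
picks, residual = matching_pursuit(toy_signal, atoms)
```

A practical implementation would search a much larger, parameterized Gabor dictionary; the exhaustive loop here is only meant to make the selection and subtraction steps explicit.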
As explained in our previous study [25], real-world ECGs with inherent noise were obtained from 26 normal subjects who underwent two-channel ambulatory ECG recordings (GE Healthcare, Inc.) for 24–48 continuous hours at our institution. The ambulatory ECGs were recorded at a sampling rate of 125 Hz and then exported for custom analysis. The mean heart rate of these recordings was 78 ± 17 bpm (beats per minute) and the mean noise level was 40 ± 67 μV. Each ECG channel was included as a separate record.
Two groups of ambulatory ECGs were generated, one without simulated TWA (TWA magnitude = 0 μV) and the other with simulated TWA (TWA magnitude = 5 μV). The ECG signals were recorded from normal subjects and are therefore assumed to have zero TWA. A simulated TWA signal with an amplitude of 5 μV is added to the ECG signals by uniformly increasing the T-wave amplitude of even beats and decreasing the T-wave amplitude of odd beats across the T-wave. The use of a known TWA signal permits TWA quantification to be compared between the different methods. A TWA detection threshold of 5 μV was prespecified, as this cutpoint approximates the TWA magnitude measured by Klingenheben et al. [28] in patients with heart disease using a definition of TWA similar to that of our study. The features extracted by the NMF-adaptive SM were fed into two classifiers (SOTM clustering with fuzzy labeling, and an LDA) to train and classify the ECG segments as TWA present or absent.
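The TWA simulation step can be sketched as follows. The sketch assumes that the TWA magnitude is the difference between the even-beat and odd-beat T-wave amplitudes, so half of it is added to even beats and half subtracted from odd beats; the text does not spell out this split, so it, the function name `add_simulated_twa`, and the toy amplitudes are illustrative assumptions.

```python
def add_simulated_twa(t_waves, magnitude_uv=5.0):
    """Add an alternating offset to a sequence of T-waves (amplitudes in microvolts).

    t_waves: list of beats, each a list of T-wave samples.
    ASSUMPTION: even beats (index 0, 2, ...) are raised by magnitude/2 and odd
    beats lowered by magnitude/2, uniformly across the T-wave, so that the
    even-odd difference equals magnitude_uv.
    """
    half = magnitude_uv / 2.0
    simulated = []
    for beat_index, wave in enumerate(t_waves):
        offset = half if beat_index % 2 == 0 else -half
        simulated.append([sample + offset for sample in wave])
    return simulated


# Four identical toy T-waves (hypothetical amplitudes in microvolts)
clean = [[100.0, 120.0, 110.0] for _ in range(4)]
with_twa = add_simulated_twa(clean, magnitude_uv=5.0)
```

Because the offset is known, the even-odd difference in `with_twa` provides a ground truth against which any TWA estimator can be checked.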
Half of the dataset is used for the training stage and the other half is employed to test the accuracy of the TWA detection. SOTM is applied to the training dataset and the number of valid clusters is calculated for the classification. Clusters with a small number of samples are eliminated: we decided experimentally that clusters containing less than 1% of all the feature vectors are not valid. The clusters are formed as the data are presented to the network, and the number and size of the clusters are determined by parameters such as the hierarchical control function (H(t)) and the learning rate (λ(t)). The initial values of these functions are set according to the dataset. In the next stage, the membership coefficients are calculated for each cluster based on the distribution of the training signals. In the test stage, each feature vector of a test signal is assigned to one of the cluster centers based on the minimum ED measure. Finally, the class label of each signal is determined by the weighted sum of the feature vectors falling within each cluster multiplied by the membership coefficients. Note that since the data are presented to the SOTM in a random order, the number, shape, and size of the clusters might vary each time the clustering algorithm is run on the data. However, since there is no one-to-one correspondence between the clusters and the two groups, this has no considerable impact on the overall performance of the classifier. In addition, the results of several runs are averaged to further reduce this effect.
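The test stage described above can be sketched as follows. The sketch assumes that each feature vector of a test signal contributes the membership coefficients of its nearest cluster center to a per-class score, and that feature vectors falling in clusters without membership coefficients (common or invalid clusters) are ignored. The function name `classify_signal`, the tie-breaking rule, and the toy values are hypothetical.

```python
import math


def classify_signal(features, centers, membership):
    """Fuzzy-label classification of one signal from its feature vectors.

    features:   list of feature vectors extracted from the test signal.
    centers:    dict cluster id -> cluster center (same dimension as features).
    membership: dict cluster id -> (m_class1, m_class2), discriminant clusters only.
    Returns (label, scores), with label 1 or 2.
    ASSUMPTION: the score of a class is the sum of the membership coefficients
    of the nearest clusters; ties are resolved in favor of Class 1.
    """
    scores = [0.0, 0.0]
    for x in features:
        # nearest cluster center by Euclidean distance (ED)
        nearest = min(centers, key=lambda cid: math.dist(x, centers[cid]))
        if nearest in membership:
            m1, m2 = membership[nearest]
            scores[0] += m1
            scores[1] += m2
    label = 1 if scores[0] >= scores[1] else 2
    return label, scores


# Toy clusters: "a" and "b" are discriminant, "c" is common (hypothetical values)
toy_centers = {"a": (0.0, 0.0), "b": (10.0, 10.0), "c": (5.0, 5.0)}
toy_membership = {"a": (0.9, 0.1), "b": (0.2, 0.8)}
toy_features = [(0.5, 0.2), (9.0, 9.5), (10.0, 11.0), (5.0, 5.2)]
label, scores = classify_signal(toy_features, toy_centers, toy_membership)
```

In this toy case one feature vector falls in cluster "a", two in cluster "b", and one in the common cluster "c", so the Class 2 score dominates.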
Comparison of TWA detection rate for NMF-adaptive SM using the proposed fuzzy labeling classifier and LDA

| Method | Sensitivity (%) | Specificity (%) | Classifier | TWA magnitude (μV) | ECG database |
| --- | --- | --- | --- | --- | --- |
| NMF-adaptive SM | 92 | 95 | Fuzzy labeling | 5 | 52 Real ECGs |
| NMF-adaptive SM | 87 | 91 | LDA | 5 | 52 Real ECGs |
| SM [25] | 73 | 63 | LDA | 5 | 52 Real ECGs |
| MMA [25] | 92 | 58 | LDA | 5 | 52 Real ECGs |
| Wavelet 1 [29] | 77 | NS | Wilcoxon rank-sum test | 10 | 10 Synthetic ECGs |
| Wavelet 2 [30] | 96 | NS | LDA | 10 | 2050 Synthetic ECGs |
6.2.3 Fuzzy labeling: audio classification
Audio signals are an important source of information for understanding the content of multimedia. Therefore, developing audio classification techniques that better characterize audio signals plays an essential role in many multimedia applications, such as multimedia indexing and retrieval, and auditory scene analysis. With approximately 10% of the world population suffering from some form of hearing loss, one of the important applications of audio classification is in hearing aids (HA) for hearing-impaired people. In order to prevent noise signals from being magnified by the hearing aid, the HA is required to detect the audio class to which the incoming signal belongs, and then change the HA's parameters accordingly. A recent article from our group [31] presented the benefits of joint TF feature extraction in environmental audio classification. This section reports the performance of fuzzy cluster labeling combined with nonstationary joint TF features for audio scene analysis, and compares it with supervised classification.
In this study, we use an environmental audio dataset that was compiled in our signal analysis research group at Ryerson University [31]. The dataset consists of 192 audio signals of 5-s duration each, with a sampling rate of 22.05 kHz and a resolution of 16 bits/sample. It is designed to have 10 different classes: 20 aircraft, 17 helicopters, 20 drums, 15 flutes, 20 pianos, 20 animals, 20 birds, and 20 insects, and the speech of 20 males and 20 females. Most of the music samples were collected from the Internet and suitably processed to have a uniform sampling frequency and duration.
Three-second audio signals are transformed into the TF domain. Next, NMF with a decomposition order of 15 (r=15) decomposes each TFD into 15 base and coefficient vectors; r=15 was found experimentally to be a suitable choice for this application. Seven features (Section 5) are extracted from each base and coefficient vector. Finally, SOTM clustering and fuzzy labeling are employed to train and classify the signals.
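Confusion matrix for classifying human versus nonhuman audio signals

| Classes | | Total | TF + soft labeling: Human | TF + soft labeling: Nonhuman | TF + GMM: Human | TF + GMM: Nonhuman | MFCC + GMM: Human | MFCC + GMM: Nonhuman |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Human | (n) | 40 | 40 | 0 | 37 | 3 | 40 | 0 |
| Human | (%) | 100 | 100 | 0 | 92.5 | 7.5 | 100 | 0 |
| Nonhuman | (n) | 40 | 3 | 37 | 8 | 32 | 17 | 23 |
| Nonhuman | (%) | 100 | 7.5 | 92.5 | 20 | 80 | 42.5 | 57.5 |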
The human versus nonhuman sound discrimination is also performed using GMM, a successful traditional clustering method for audio signals. This classification resulted in a lower performance, with an overall accuracy rate of 86%. Fifteen mixtures were found experimentally to be sufficient and were used for the GMM classification. We also compared the accuracy of the TF feature extraction and clustering method with that of the well-known MFCC features. MFCCs are short-term spectral features that are widely used in audio and speech processing.
In this application, each signal is divided into 32-ms segments, and the first 13 MFCCs are computed for all the segments over the entire length of the audio signal and used as feature vectors. Using GMM with 15 mixtures, the MFCC features resulted in an overall classification accuracy rate of 79%. The MFCC features and GMM system successfully classify the human signals; however, the method is not very effective for classifying the nonhuman signals (i.e., a 57.5% accuracy rate). A likely explanation is that MFCC features and GMM clustering are well suited to human speech analysis, but are challenged by natural sounds from nonhuman sources. In contrast, the combination of the TF feature vectors and the proposed discriminant cluster labeling classifies both groups accurately (100% of the human and 92.5% of the nonhuman signals).
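The overall accuracy rates quoted in the text follow directly from the confusion matrix for human versus nonhuman signals. The short check below recomputes them from the reported counts; the function name `overall_accuracy` and the dictionary layout are illustrative choices.

```python
# Rows: true class (Human, Nonhuman); columns: predicted class (Human, Nonhuman).
# Counts are taken from the confusion matrix reported for 40 + 40 signals.
confusion = {
    "TF + soft labeling": [[40, 0], [3, 37]],
    "TF + GMM": [[37, 3], [8, 32]],
    "MFCC + GMM": [[40, 0], [17, 23]],
}


def overall_accuracy(matrix):
    """Percentage of signals on the diagonal of a 2 x 2 confusion matrix."""
    correct = matrix[0][0] + matrix[1][1]
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total


accuracies = {name: overall_accuracy(m) for name, m in confusion.items()}
# accuracies: 96.25, 86.25, and 78.75, which round to the reported 96, 86, and 79
```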
Different audio classes in the dataset and the number of signals in each class

| Classes | Dataset | TF + soft labeling, average accuracy (%) | TF + GMM, average accuracy (%) | MFCC + GMM, average accuracy (%) |
| --- | --- | --- | --- | --- |
| Human/nonhuman | Nonhuman: aircraft, piano, animal, bird; Human: male and female speeches | 96 | 86 | 79 |
| Human/music | Music: piano, flute, drum; Human: male and female speeches | 98 | 68 | 71 |
| Natural/artificial | Natural: male, female, bird, animal, insect; Artificial: helicopter, airplane, piano, flute, drum | 91 | 63 | 62 |
| Human/nature | Nature: animal, insect, bird; Human: male and female speeches | 98 | 83 | 75 |
| Aircraft/music | Music: piano, flute, drum; Aircraft: helicopter, airplane | 98 | 76 | 89 |
6.2.4 Fuzzy labeling: telemonitoring of PD
In this application, we present an assessment of the proposed discriminant clustering method for discriminating healthy people from people with PD by detecting dysphonia. The data for this application were obtained from Little et al. [32]. The dataset consists of 195 sustained vowel phonations from 31 male and female subjects, of whom 23 were diagnosed with Parkinson's disease. The time since diagnosis ranged from 0 to 28 years, and the ages of the subjects ranged from 46 to 85 years (mean 65.8, standard deviation 9.8). An average of six phonations was recorded from each subject, ranging from 1 to 36 s in length. See [32] for subject details. Little et al. [32] selected ten highly uncorrelated measures; an exhaustive search over all possible combinations of these measures found four that, in combination, led to an overall correct classification performance of 91.4% using a kernel support vector machine (SVM).
In this section, we take the ten features proposed in [32] and apply the proposed discriminant clustering method with the soft labeling strategy to discriminate between people with PD and healthy subjects. It is worth mentioning that, since this database provides only the extracted attributes and not the original signals, we could use only the given features. In this way, we could evaluate the proposed discriminant cluster selection method and investigate its efficiency in comparison with the exhaustive search and SVM classification used in [32].
7 Conclusion
The objective of this article was to improve the performance of pattern recognition systems when feature vectors overlap because of the nonstationarity of the signals or the commonalities that exist among different classes. To this end, the article introduced a different strategy for clustering techniques based on a fusion of unsupervised and supervised learning approaches. This method applied unsupervised clustering to the feature vectors from all the different classes, and then used a supervised labeling method to select two types of clusters: discriminant and common clusters. The supervised cluster labeling approach separated the discriminant clusters from the common ones according to their importance in representing each corresponding class. The obtained discriminant clusters represented the differentiating patterns that exist among signals from different classes. Therefore, in the classification stage, only the feature vectors located in the discriminant clusters were considered for the classification of a given signal. These feature vectors were better representatives of the signals’ characteristics, and resulted in a higher classification accuracy rate in the reported experiments.
In order to identify the discriminant clusters, two cluster labeling methods were proposed: hard and fuzzy labeling. In hard labeling, discriminant clusters were assigned to one of the possible classes, whereas in fuzzy labeling they were associated with each class through a relative membership value ranging from 0 to 1 (with 0 being the least contribution, and 1 being the most). Both proposed methods enhanced the commonly used supervised learning and clustering approaches. The K-means and SOTM clustering methods were explained for the applications studied in this article. An advantage of SOTM over the K-means method is that the number of clusters, which must be known beforehand in K-means, is determined adaptively by the SOTM algorithm.
In conclusion, experiments performed with synthetic signals as well as pathological speech, surface ECG, telemonitoring of PD, and environmental audio signals demonstrated the potential of the proposed discriminant feature clustering framework for becoming a powerful pattern recognition tool.
References
 1. Freeman G, Dony R, Areibi S: Audio environment classification for hearing aids using artificial neural networks with windowed input. In Proceedings of the IEEE Symposium on Computational Intelligence in Image and Signal Processing, vol. 2846. Honolulu, HI; April 2007:183–188.
 2. Pawlak Z: Rough sets. Int. J. Comput. Inf. Sci 1982, 11: 341–356. 10.1007/BF01001956
 3. Pawlak Z: Rough Sets: Theoretical Aspects of Reasoning About Data. Kluwer Academic Publishers, Norwell; 1992.
 4. Jensen R, Shen Q: New approaches to fuzzy-rough feature selection. IEEE Trans. Fuzzy Syst 2009, 17: 824–838.
 5. Saito N, Coifman R: Local discriminant bases and their applications. J. Math. Imag. Vis 1995, 5(4):337–358. 10.1007/BF01250288
 6. Umapathy K, Krishnan S: Time-width versus frequency band mapping of energy distributions. IEEE Trans. Signal Process 2007, 55: 978–989.
 7. Ghoraani B, Krishnan S: A joint time-frequency and matrix decomposition feature extraction methodology for pathological voice classification. EURASIP J. Adv. Signal Process 2009, 2009(ID 928974):11. http://dx.doi.org/10.1155/2009/928974
 8. Jain A, Duin R, Mao J: Statistical pattern recognition: a review. IEEE Trans. Pattern Anal. Mach. Intell 2000, 22(1):4–37. 10.1109/34.824819
 9. Duda R, Hart P, Stork D: Pattern Classification. Wiley, New York; 2001.
 10. Kong H, Guan L: Detection and removal of impulse noise by a neural network guided adaptive median filter. In Proceedings of the IEEE International Conference on Neural Networks, vol. 2. Perth, WA; November 1995:845–849.
 11. Kyan M: Unsupervised learning through dynamic self-organization: implications for microbiological image analysis. PhD thesis, School of Electrical and Information Engineering, University of Sydney; 2007.
 12. Kyan M, Jarrah J, Muneesawang P, Guan L: Strategies for unsupervised multimedia processing: self-organizing trees and forests. IEEE Comput. Intell. Mag 2006, 1: 27–40. 10.1109/MCI.2006.1626492
 13. Lee D, Seung H: Learning the parts of objects by nonnegative matrix factorization. Nature 1999, 401(6755):788–791. 10.1038/44565
 14. Paatero P, Tapper U: Positive matrix factorization: a nonnegative factor model with optimal utilization of error estimates of data values. Environmetrics 1994, 5: 111–126. 10.1002/env.3170050203
 15. Lee D, Seung H: Algorithms for nonnegative matrix factorization. In Advances in Neural Information Processing Systems 13. MIT Press, Cambridge, MA; 556–562.
 16. Lin CJ: Projected gradient methods for nonnegative matrix factorization. Neural Comput 2007, 19(10):2756–2779. 10.1162/neco.2007.19.10.2756
 17. Tacer B, Loughlin P: Time-frequency based classification. In Proceedings of the International Society for Optical Engineering (SPIE), vol. 2846. Denver, CO; August 1996:186–192.
 18. Groutage D, Bennink D: Feature sets for nonstationary signals derived from moments of the singular value decomposition of Cohen-Posch (positive time-frequency) distributions. IEEE Trans. Signal Process 2000, 48(5):1498–1503. 10.1109/78.840002
 19. Davy M, Doncarli C, Boudreaux-Bartels GF: Improved optimization of time-frequency based signal classifiers. IEEE Signal Process. Lett 2001, 8: 52–57.
 20. Davy M, Gretton A, Doucet A, Rayner P: Optimized support vector machines for nonstationary signal classification. IEEE Signal Process. Lett 2002, 9(12):442–445.
 21. Massachusetts Eye and Ear Infirmary: Voice Disorders Database, Version 1.03. Kay Elemetrics Corporation, Lincoln Park; 1994.
 22. Mallat SG, Zhifeng Z: Matching pursuits with time-frequency dictionaries. IEEE Trans. Signal Process 1993, 41(12):3397–3415. 10.1109/78.258082
 23. Krishnan S, Rangayyan R, Bell G, Frank C: Adaptive time-frequency analysis of knee joint vibroarthrographic signals for noninvasive screening of articular cartilage pathology. IEEE Trans. Biomed. Eng 2000, 47(6):773–783. 10.1109/10.844228
 24. Dibazar A, Narayanan S, Berger T: Feature analysis for automatic detection of pathological speech. In Proceedings of the Second Joint EMBS/BMES Conference, vol. 1. Houston, TX, USA; October 2002:182–183.
 25. Ghoraani B, Krishnan S, Selvaraj RJ, Chauhan VS: T wave alternans evaluation using adaptive time-frequency signal analysis and nonnegative matrix factorization. Med. Eng. Phys 2011, 33(6):700–711. 10.1016/j.medengphy.2011.01.007
 26. Nearing BD, Verrier RL: Modified moving average analysis of T-wave alternans to predict ventricular fibrillation with high accuracy. J. Appl. Physiol 2002, 92: 541–549.
 27. Smith JM, Clancy EA, Valeri CR, Ruskin JN, Cohen RJ: Electrical alternans and cardiac electrical instability. Circulation 1988, 77(1):110–121. 10.1161/01.CIR.77.1.110
 28. Klingenheben T, Ptaszynski P, Hohnloser S: Quantitative assessment of microvolt T-wave alternans in patients with congestive heart failure. J. Cardiovasc. Electrophysiol 2005, 16: 620–624. 10.1111/j.1540-8167.2005.40708.x
 29. Romero I, Grubb N, Clegg G, Robertson C, Addison P, Watson J: T-wave alternans found in preventricular tachyarrhythmias in CCU patients using a wavelet transform-based methodology. IEEE Trans. Biomed. Eng 2008, 55: 2658–2665.
 30. Boix M, Cantó B, Cuesta D, Micó P: Using the wavelet transform for T-wave alternans detection. Math. Comput. Model 2009, 50: 738–742. 10.1016/j.mcm.2009.05.002
 31. Ghoraani B, Krishnan S: Time-frequency matrix feature extraction and classification of environmental audio signals. IEEE Trans. Audio Speech Lang. Process 2011, 19(7):2197–2209.
 32. Little M, McSharry P, Roberts S, Costello D, Moroz I: Exploiting nonlinear recurrence and fractal scaling properties for voice disorder detection. BioMed. Eng. OnLine 2007, 6(23).
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.