- Open Access
Localization and classification of heart beats in phonocardiography signals —a comprehensive review
EURASIP Journal on Advances in Signal Processing volume 2018, Article number: 26 (2018)
Phonocardiogram (PCG) signal represents recording of sounds and murmurs resulting from heart auscultation. Analysis of these PCG signals is critical in diagnosis of different heart diseases. Over the years, a variety of methods have been proposed for automatic analysis of PCG signals in time, frequency, and time-frequency domains. This paper presents a comprehensive survey of different methods proposed for automatic analysis of PCG signals with the objective to evaluate the current state-of-the-art and to determine the potential domains of effective analysis. An important aspect of our contribution is that the review is carried out as a function of domains of analysis rather than simply discussing different methods. Our method further splits analysis into pre-processing, localization, and classification, and details are presented in terms of techniques and classifiers used during these phases. Finally, results are summarized for normal heart beat, noisy heart beat, and different pathologies using metrices like accuracy and detection rate. In addition to time and frequency domain, time-frequency based methods including wavelet, empirical mode decomposition (EMD) and time-frequency representation (TFR) are selected for detailed analysis. The review concludes that the time-frequency representations like EMD and wavelets represent areas of exploration in future along with perspective of using different time-frequency techniques together.
Human heart is the most important organ in the body that provides blood to all parts of the body using a pump-like action. During the pumping action, electrical and mechanical activities are carried out resulting in the flow of blood. Healthy heart is very important for the normal day to day working of human body as blood carries important nutrients to the organs. Heart-related diseases, known as cardiovascular diseases (CVD), are responsible for a major proportion of deaths all around the world. According to surveys conducted by the World Health Organization (WHO), 33% of all deaths are the result of CVDs . Different modalities are known to exist to monitor the health of heart. The most popular of these is the electrocardiogram or ECG. The electrical activity of heart is recorded in the form of electrocardiogram which is composed of three main waves, ‘P’ wave, ‘QRS’ complex, and ‘T’ wave. The middle wave of ECG is known as ‘QRS’ complex that, in general, comprises three deflections ‘Q’ (the first negative deflection) ‘R’ (the first positive deflection), and ‘S’ (the negative deflection following the ‘R’ wave). Another important modality is photoplethysmography (PPG) that employs light-based sensors to estimate the rate of flow of blood by measuring the changes in the reflected/transmitted light. Like ECG, PPG can also be used to monitor various cardiac conditions. Figure 1 illustrates an example each of ECG and PPG signals . A wide range of computerized systems have been proposed over the years for automatic analysis of ECG [3, 4] and PPG  signals as well as the combination of these and other modalities .
In addition to ECG and PPG, phonocardiogram (PCG), the recording of the sounds and murmurs made by heart during a cardiac cycle, can be effectively employed to study and monitor the activities of heart. Such sounds are typically recorded using a device called phonocardiograph. The mechanical activity of heart due to its physical movement produces four distinct sounds or beats which are named as first normal heart sound (S1), second normal heart sound (S2), third abnormal heart sound (S3), and fourth abnormal heart sound (S4). S1 and S2 are normal sounds while S3, S4, murmurs, and certain other sounds usually refer to some disease or anomaly. Murmur is a noisy heart sound that can be heard when the heart valve is shut but the blood continues to flow. S1 is usually low frequency and high-amplitude signal while S2 has high frequency and low amplitude. In some cases, it is also possible to encounter S1 having low amplitude than S2 as elaborated in .
ECG, PPG, and PCG, all represent a cyclostationary signal, i.e., a signal in which the statistics of the signal vary but are repetitive with a period. ECG and PCG are highly correlated signals  and are known to contain more information than the PPG signal. PCG, however, enjoys a distinct advantage over ECG and PPG signals as it records the acoustic properties of the signal. These properties are better suited for murmur detection which represents abnormal heart sound . Furthermore, PCG signal also has an excellent starting trigger in the form of S1 wave [10, 11].
The mechanical activity of the heart can be heard using a traditional or an electronic stethoscope. Auscultation or listening to heart is an old but very effective method of diagnosis for a number cardiovascular diseases. The recording of these sounds (PCG signal) has the same spectrum as that of audio signals. The recording process, however, also picks noisy sound signals from the environment which distort the periodicity of the PCG signal making its analysis a challenging task. A sample PCG signal is shown in Fig. 2 where it can be seen that the first heart sound S1 has lower amplitude than the second heart sound S2. Moreover, the amplitudes of S1 and S2 are both varying. The time duration from S1 to S2 is known as systole and the one between S2 and S1 is known as diastole. Systole and diastole shown as t12 and t21, respectively, in Fig. 2 serve as important beat classification features. The time durations t12 and t21 are also varying.
PCG, though a very complex signal, divides the normal heart sound signals into two beats (S1 and S2) and offers important details about a number of heart-related diseases [12–14]. With the advancements in technology as well as different areas of pattern classification and machine learning, efforts have been made to automate the analysis of PCG signals. The automatic analysis of these signals involves two primary challenges, localization and classification. In localization, the objective is to correctly detect the positioning of beats in the signal while classification deals with the categorizing the beats into S1 and S2 in case of normal heart sounds and into S3, S4, and murmurs etc. in case of abnormal sounds.
Localization algorithms are generally peak-based algorithms where candidate peaks are localized in a signal and windows are constructed around these potential peaks. Features are then calculated from each window, and the peaks are classified. Another approach is not to precisely localize the peaks for classification. Springer et al.  for instance, divides the signal into a sequence of segments which are classified and later processed for a precise localization. Heart beat is a repetitive or cyclic process (Fig. 2) and a normal PCG signal can be divided into four segments. These include the time window containing S1, the silence window, the segment containing S2, and again the silence window. A normal heart beat can hence be modeled using these states. However, since the sequence and duration of each state within the cycle can vary, probabilistic modeling is better suited for such scenarios as opposed to deterministic approaches. Hidden Markov model (HMM), for instance, has been employed for such modeling in [15–17]. The same idea can be extended to include the abnormal beats S3 and S4 (and different other pathologies) as illustrated in Fig. 3. Such techniques allow pre-classifying these segments into one of the states while the exact location of beats within these segments are localized subsequently.
The initial developments and findings in digital processing of PCG signals have been summarized in works by [18, 19]. In a relatively recent survey, Meziani et al.  discussed computerized analysis of PCG signals but limit their discussion to wavelet transforms-based methods only. In another short review , authors present a comparative study of EMD and wavelet-based methods for analysis of PCG signals. The study concludes that EMD is better suited for PCG as compared to wavelet, specially, when dealing with noisy signals. Authors highlighted the use of machine learning tools for better classification and suggested that the use of hybrid classifiers serves to enhance the performance. Nabih et al.  review the denoising, segmentation, and classification techniques for automated analysis of PCG signals. The key focus of their study is the comparison of different classifiers (artificial neural network (ANN), support vector machine (SVM), self-organizing map (SOM), and hybrid classifiers with the conclusion that while each of the processing steps is important, the choice of classifier is the most critical parameter in enhancing the overall system performance.
As discussed earlier, the present study is organized as function of time, frequency, and time-frequency methods. In time domain, the signal is sampled on time axis (Fig. 4a) and features (like amplitude, mean, and energy) are directly computed from signal. In spectral or frequency domain, signal spectra is divided into various spectral bands (Fig. 4b) and features (for example, spectral flux) are calculated from these bands. Unlike time and frequency methods, the time-frequency methods rely on simultaneous sampling on time and frequency axis. The general representations of time-frequency domain for fixed and variable time/frequency windows are illustrated in Fig. 4c, d, respectively. In our discussion, we considered time-frequency representations (like Wigner-Ville and its variants, short-time frequency transform (STFT)) and techniques based on EMD and wavelets under the category of time-frequency methods.
We first introduce the standard datasets commonly employed by researchers (to evaluate the localization and classification algorithms) in Section 2. Section 3 details the time, frequency, and time-frequency methods for analysis of PCG signals along with a critical discussion on the presented methods. Finally, Section 5 concludes the paper.
While many of the initial PCG localization and classification techniques reported results on private datasets, a number of publicly available datasets have also been developed allowing researchers meaningfully compare their developed systems using similar experimental protocols.
These public datasets include E-General Medical , PASCAL Heart Sound Challenge (HSC) dataset , the PhysioNet CinC Challenge dataset , and the Heart Sound and Murmur Library . In the following, we present the key characteristics of these datasets for a better interpretation of the quantitative results reported by different studies (presented later in paper).
PASCAL HSC dataset was first made available in 2011 for two challenges, segmentation (localization), and classification. The database is divided into two subsets, dataset ‘A’ and dataset ‘B’. Dataset ‘A’ has been collected using the iStethoscope Pro iPhone app while dataset ‘B’ has been gathered in a clinical setting using the digital stethoscope, DigiScope. The two datasets have 176 and 656 total auscultations, respectively. Files in both the datasets contain normal heart beats, murmurs, and extra systoles. In addition to these, dataset ‘B’ also contains files with artifacts and extra sounds and is more challenging.
The 2016 PhysioNet CinC database was also developed as a part of a challenge. Heart sounds in the dataset are gathered from clinical as well as non-clinical sources. Healthy individuals as well as those with different pathologies contributed to data collection. The sound classes in the challenge include ‘normal,’ ‘uncertain,’ and ‘abnormal.’ The training set labeled from ‘A’ to ‘E’ has a total of 3126 files while the validation set comprises 300 recordings. The durations of the files vary from 5 to 120 s. The test set has not been made publicly available and was only used to score the participants of challenge.
E-General Medical is a well-known vendor of medical equipment and datasets. Among other datasets, a part of the dataset with heart sounds has been freely made available by E-General Medical. The dataset comprises 64 recordings with normal beats, S3, S4, and different pathologies. Majority of pathologies are systolic and diastolic in nature. In many cases, only one sample per signal type is available. Examples of such one sample per signal type include late systolic, early systolic, normal split, open snap, pan systolic, late systolic aortic stenosis, severe systolic aortic stenosis, critical systolic aortic stenosis, and systolic aortic valve replacement. Other sounds like diastolic-fixed S2 split, diastolic wide S2 split, systolic mitral regurgitation, systolic mitral prolapse, systolic split, and diastolic-fixed S2 split have more than one sample. Complete details on the dataset can be found in .
Heart sound and Murmur Library, University of Michigan Health Systems, is another public dataset that comprises normal heart sounds, normal split S1, systolic click, S4 gallop, S3 gallop, single S2, split S2 persistent, split S2 transient, and various type of murmurs . A summary of the databases presented above can be found in Table 1.
As mentioned previously, the techniques proposed for automatic analysis of PCG signals can be categorized into time domain, frequency domain, and time-frequency domain. In the following, we discuss the time and frequency domains followed by the time-frequency domain.
Time and frequency analysis
The most general method of analysis is based on the time domain. In time domain, analysis is carried out on the signal itself. Statistics like peak value, mean, mode, median, and peak to peak duration of signal are typically employed for analysis. As indicated in Fig. 5, P1 and P2 are the peak values of the signal, A1 and A2 are peak to peak values, and △1 and △2 are time periods which may represent features of S1 and S2. These and similar features are extracted either globally from the complete signal or locally from different segments of the signal. The general representation of a signal in time domain is presented in Eq. 1; f(t) represents the signal which is modeled by amplitude A, frequency ω, and phase θ. Each of these parameters as well as different statistics derived from these parameters have been used as features for localization and classification.
For frequency domain analysis, the signal is first converted from time to frequency domain using fast Fourier transform (FFT) elaborated in Eq. 2.
Where f(t) is the signal and F(w) is the fast Fourier transform of f(t). After analysis of the frequency spectrum, any appropriate spectrum operation (for instance filtering) is applied, and the signal is converted back to time domain using the inverse fast Fourier transform (IFFT). Bandpass filtering is usually employed in heart beat analysis to remove the low and high frequency noise from stethoscope signals. Figure 6a illustrates a sample signal before and after filtering. Figure 6b shows the spectrum of the signal, filter response, and the frequency contents after filtering. Since the frequency contents of various heart signals are different, the frequency spectra can be exploited for classification of heart beats. Finding the optimal set of filters (along with cutoff frequencies), however, can be challenging especially for noisy signals. Consequently, a number of researchers investigated Mel-Frequency Cepstral Coefficients (MFCCs)  for frequency domain analysis of PCG signals. In MFCC calculation, pre-emphasis is first applied on the signal to enhance the high-frequency components of signal. The signal is then divided into windows (generally rectangular), and FFT is applied to each window. The Mel filter bank is then applied and logarithm of all filter bank energies is computed. The acoustic response of human ear is not linear to all frequencies. Mel scale uses this non-linearity and maps frequencies (f) linearly below 1000 Hz and logarithmically above 1000 Hz. Conversion from frequency to Mel scale is given in Eq. 3.
Finally, discrete cosine transform (DCT) is applied to get the MFCCs . An overview of the steps involved in MFCC calculation is presented in Fig. 7. The pre-emphases, windowing, and FFT calculation in MFCC computation act as pre-processing steps; hence, a separate pre-processing is generally not required.
Time and frequency analysis have been widely investigated to localize and classify normal as well as abnormal heart beats like third and fourth heart sound, clicks and murmurs. With a few exceptions [29, 30], analysis in the time domain usually starts with pre-processing which generally comprises decimation followed by a low pass filter with a cut-off frequency of around 200 Hz. The decimation decreases the computational load on the analysis module. Low pass filtering is used as pre-processing step as heart beat has characteristics of low frequency signal. Liang et al.  and Potes et al.  first decimate the signal and pass it through low pass and band pass filters, respectively. Chauhan et al.  employed low pass and median filtering along with Hamming window for removal of noise, ripples, and ringing effects. Likewise, Ari et al.  pass the signal through a low pass filter after decimation while Hussnain et al.  simply used the low pass filtering without decimation. Signal from the decimation and filtering stages is next fed to the normalization stage.
Normalization removes the amplitude variation for inter and intra signal classification. The signal is then converted to a form that is best suitable for localization and classification. Shannon energy envelop is most widely employed method investigated in most of the time-based techniques [16, 31, 34]. In short, researchers have mostly employed combinations of decimation, filtering, and Shannon energy for pre-processing in the time domain.
The signal after pre-processing has peaks at regular and non-regular intervals. Candidate peaks can be used for individual beat localization which in turn can be used for heart rate detection. The time duration between S1 and S2 (systole) and the one between S2 and S1 (diastole) are usually exploited for the localization and classification of candidate peaks [16, 31, 33].
Another common approach is to employ machine learning-based techniques for localization and classification. Signals represented in an appropriate feature space are used to train a learning algorithm.
Authors in , for instance, use support vector machine with a modified cuckoo search (MCS) optimizer using features extracted from linear predictive coding (LPC) for classification. Similarly, Chen et al.  investigated deep neural networks for recognition of heart sounds S1 and S2. In another study, Potes et al.  applied AdaBoost-abstain (modified version of conventional AdaBoost) and convolutional neural networks (CNN) to detect abnormal heart sounds.
As discussed previously, MFCC is usually used in the frequency-based methods and is found to be effective in speech processing including speech recognition and speaker identification . MFCC has been used in the feature extraction process in  and  for localization and classification. Likewise, MFCC-based features were used to train HMM classifiers in  and reported high classification rates.
As discussed earlier, heart beat signal is a time series data composed of different events (occurrence of S1, S2, etc.). The sequence of these events varies in case of abnormal signals. Human heart beat has therefore been modeled using HMM (Fig. 3) in a number of studies [15–17] where each state represents an event (silence, S1, S2, murmur, etc.). Although HMMs exploit this temporal dependency, the features fed to these HMMs can be extracted in the frequency domain. MFCC-based features, for instance, have been employed to train HMMs . Such modeling has been applied to simple two class problem (normal and abnormal heart sound) as well as for the detection of different pathologies like ventricular septal defect (VSD), mitral regurgitation (MR), aortic stenosis (AS), aortic regurgitation (AR), patent suctus arteriosus (PDA), pulmonary regurgitation (PR), and pulmonary stenosis (PS) .
Ajit et al.  used simple FFT for localization and classification without any further processing as frequency contents of different heart sounds have different spectrum contents. Authors in  used both frequency- and energy-based features for classification while Mandeep and Cheema  used time, frequency, cepstrum, and statistical features for localization as well as for classification. Features were evaluated using Ranker and Info Gain Attribute Evaluation algorithm, and multiple classifiers were investigated to recognize the beats. Among other notable studies, Samit et al.  after pre-processing, used wavelet decomposition for localization and least square SVM with least mean square (LMS) for classification. In another study, Nogueira et al.  exploited MFCC with time features of systole and diastole durations, and the relationship between ECG and PCG signal for heart sound classification. Ortiz et al.  employed dynamic time warping (DTW) along with MFCC for heart sound classification realizing a test score of 84% using SVM classifier. Tang et al.  proposed an ensemble of 324 features primarily composed of time, frequency, kurtosis, energy, and other features to classify a sound as normal or abnormal using ANN. Rubin et al.  converted PCG signal to heat map using MFCC for abnormal sound detection using convolutional neural network (CNN). It can be noticed from the discussion of the presented techniques that classifiers like ANN, SVM, and HMM have been commonly employed in time- and frequency-based methods. While HMM exploits the temporal dependencies in the signal, classifiers like ANN and SVM categories individual beats into one of the beat classes under study.
From the view point of heart sounds, S1 and S2 are considered in [30, 31, 36, 37] while  and , in addition to S1 and S2, also consider split S1, split S2, S3, S4, murmurs, clicks, and snaps. The work in  is focused on discrimination between normal heart sounds and murmurs. Likewise, authors in [32, 39–41] primarily distinguish between the two broad categories, normal (S1, S2) and abnormal heart sounds. Similarly, noise or spike detection is studied in  while the work in  considers normal heart conditions as well as pathological problems including aortic insufficiency, aortic stenosis, atrial septal defect, mitral regurgitation, and mitral stenosis.
Table 2 summarizes the research in explicit time [29–31, 33–35, 38–41] and frequency [16, 17, 32, 36, 37] domains. All studies except [30, 32, 34, 38–41] consider private datasets for evaluation of the proposed techniques. These private datasets were recorded at various settings, for instance, authors in  and  used clinical settings while 3M Poland microphone samples are employed in . Likewise, for public datasets, different subsets have been employed by different authors making a direct comparison of these techniques difficult.
The evaluation protocol in different studies also varies. Techni-ques proposed in [16, 17, 30–32, 35, 37–41] considered signals,  used cycles while  used beats. The general evaluation metrics include accuracy, specificity, sensitivity, and positive predictive value (PPV) as elaborated in the following:
Where ‘TP’ represents true positives or correctly identified, ‘TN’ represents true negatives or correctly rejected, ‘FP’ for false positives or incorrectly identified, and ‘FN’ represents false negatives or incorrectly rejected. Most of the studies [17, 29, 30, 32, 35, 36, 38–41] used these measures; authors in [16, 31] employed correct and incorrect cycle detection while  employed classification rate for quantifying the results.
Comparing different time-based methods, it can be observed from Table 2 that a number of studies achieved an overall score of above 90% on public datasets [24, 25]. On private datasets, classification accuracies of as high as 100% have also been reported [33, 35]. Springer algorithm  has been employed for localization in most of studies [38, 40, 41]. For classification, peak to peak time duration has shown to be a discriminative feature [31, 34]. A similar trend is observed for frequency based methods which report, on the average, accuracies of more than 80% on the public PASCAL challenge dataset while a classification rate of 100% is claimed on a private dataset . Features based on MFCC have been commonly employed in combination with variety of classifiers. In some cases, time- and frequency-based features have been combined [16, 30, 32] to further enhance the localization and classification performance. Furthermore, it is also common to employ time domain for classification frequency domain for classification [38–41].
Due to cyclostationary nature of PCG, a signal cannot be localized both in time and frequency due to uncertainty principle limitation. Consequently, time-frequency analysis of PCG signals is investigated in a number of studies. Such methods generally employ short-time frequency transform (STFT), time-frequency representations like Wigner-Ville, wavelet transform, and empirical mode decomposition (EMD).
Where WD stands for Wigner distribution, x is the signal, x∗ represents the conjugate of x while τ is the time delay and w is the frequency. For a mono component signal, Eq. 8 produces a frequency component that is representative of signal frequency. However, for multiple-component signal, this autocorrelation produces extra components known as ‘cross-terms.’ As an example, we consider a two-component signal x(t)=x1(t)+x2(t). The autocorrelation of signal x(t) will be composed of terms x(t)∗x(t)=x1(t)∗x1(t)+x2(t)∗x2(t)+x1(t)∗x2(t). While the first two terms represent the power components of signal under study, the third term is an additional undesirable component (cross-term). Since WD computes FFT of the autocorrelation of signal, the contribution from cross-terms also appears; hence, corrupting the spectrum. To minimize the impact of cross-terms, a windowed version of above equation is normally used as presented in Eq. 9.
Where w(τ) represents the smoothing window. Equation 9 is known as pseudo Wigner-Ville distribution . Figure 8 shows the output of the pseudo Wigner-Ville representation of heart beat signal from a sample file in the Pascal DigiScope dataset (Table 1). The peak at 1000th sample represents S1, and the one at 2000th sample represents S2. All other structures represent noise resulting from cross terms due the TFR manipulation. The noise samples result from autocorrelation of multi-spectral signals.
Vargas et al.  demonstrated that time-frequency methods can be used for analysis of non-stationary signal like EEG and PCG. Zhang et al.  used matching pursuit (MP) that works both in time and frequency domains, for PCG signal decomposition. For time scaling, the signal is expanded in a way that the frequency components remain the same. Likewise, for frequency scaling, the frequency contents are compressed and expanded while keeping the temporal contents intact. In frequency scaling, the frequency band may be shifted towards higher frequencies or towards lower spectrum. Despite the shift, the contents in time remain the same. It is not uncommon to use both time and frequency scaling. The MP method converts the signal into time-frequency atoms with time and frequency parameters. The summation of atoms gives back the original signal with error. Making use of these parameters, the signal can be transformed. This transformation results in a modified version of the signal which cannot be directly compared with original signal. An inverse scaling is therefore used for comparison. The time-frequency representations (TFR) can be calculated from original, scaled, and inversed versions. The sum of Wigner distribution of all atoms of a signal is termed as MP-based Wigner distribution. MP-based Wigner distribution has the distinct advantage of automatic removal of cross terms over conventional Wigner distribution when applied to signal. Also, MP-based Wigner distribution gives much clear presentation than the spectrogram. The time and frequency resolution of both spectrogram and MP-WV distribution is higher than original, and components are much clear in case of normal heart sound as well as aortic regurgitation.
In another study , authors employed spectrogram along with Renyi entropy for localization and classification. First, TFR using spectrogram is calculated, the optimal value for window length is then calculated using Renyi Entropy Measure (RME). The RME (Eq. 10) is also used for localization of the end of normal heart sound and start of any pathological sound based on a threshold. The method was tested for normal PCG, early aortic stenosis, late aortic stenosis, and pulmonary stenosis.
Debbal and Bereksi-Reguig  compared time-frequency methods of STFT (short-time frequency transform), CWT (continuous wavelet transform), and Wigner-Ville (WV) distribution primarily for the analysis of second heart sound. Analysis was also carried out for normal and pathological cases of early stenosis aortic analysis, late aortic stenosis, and pulmonary stenosis, etc. Authors concluded that for second heart sound, WV suffers from cross-term limitation while STFT joins the split. CW outperforms the other two techniques both in terms of localization and split. Gavrovska et al.  used affine PWVD along with Shannon envelope, ACF, and Haar wavelet lifting for S1 and S2 localization and classification. The basic difference between the normal TFR (time frequency distribution) and affine distribution (time-scale distribution or TSD) is that TFR is covariant to time and frequency of signal while TSD is covariant with time and scale of the signal . The developed technique relies on three main steps. The signal is first down sampled, and coarse detection is performed using PWVD followed by fine detection using Haar wavelet lifting scheme and normalized average Shannon energy (NASE) algorithm. The Haar wavelet-based lifting scheme emphasizes more on low frequency components of the PCG signal under consideration. Finally, identification of S1 and S2 is carried out.
Singh and Dutta  used time, probability, and spectrogram for automatic analysis of PCG for cardiac disorders. Barma  considered decomposition method called Hilbert Vibration Decomposition (HVD) along with time-frequency representation of smoothed pseudo Wigner-Ville distribution (SPWVD) for detection of third heart sound. HVD was proposed to overcome the limitations associated with Hilbert-Huang transform (HHT). HHT is unable to separate low-energy noise from heart sound which has major energy in low frequency spectrum. It also generates low amplitudes at low frequency which has no physical interpretation. In another work, Zhang et al.  proposed a method based on spectrogram and partial least squares regression to classify heart sound into normal, murmur, extra heart sound, extra systole, and artifacts. In an extension of this work , authors employed a number of machine learning algorithms for classification of heart sounds.
A summary of TFR-based localization and classification techniques is presented in Table 3. It should be noted that the evaluation datasets considered in [45, 48, 50] are private datasets comprising 11, 70, and 90 signals, respectively. The technique proposed in  is evaluated on the E-general Medical dataset with six signals, while no explicit details about the signals used for evaluations is given in . Public datasets PASCAL CHSC 2011, 2016 PhysioNet Challenege, E-General, and Michigan Library data have been employed for evaluations in [51–53]. Gavrovska et al. , Zhang et al. , and Barma et al.  employed precision, efficiency, and recall to quantify the system performance. Among notable contributions, recall of more than 90% and precision of 96.39% is reported in . Likewise, Barm et al.  realize an efficiency of 93.9% while Zhang et al. [52, 53] report a varying sensitivity of 50–100% depending upon the number and types of heart sound classes considered in the experiments.
Wavelet is a popular method of choice in many fields because of the flexibility it provides in terms of choice of kernel selection which in turn reflects different time and frequency resolutions. Wavelet transform divides the signal into wavelet and scale factor. The formation is given in Eq. 11 .
cj0 and d j are coefficients while φj0,k and ψj,k are scaling and wavelet functions, respectively. Haar, Mexican hat, Morlet, Daubechies, and Symlet wavelets are a few of the well-known functions from a very long list of wavelet functions. The different time and frequency resolutions are usually achieved by using filter banks, an example illustrated in Fig. 9 for S1 beat selected from the PASCAL dataset. In Fig. 9a, first the beat S1 data was converted to Hamming data which was then fed to wavelet transform converting it into one scale and 12 coefficients. The average signal is presented in Fig. 9a while Fig. 9b–d present the low-band, mid-band, and high-frequency band of the beat.
Wavelet transform is a powerful tool that has been employed for pre-processing, localization, and classification. Song et al.  denoised the signals in two phases. Environment noise in the heart sounds was removed using the LMS algorithm, while a second step of denoising was carried out using db3 wavelet. Wavelet analysis using db6 order 4 filter was performed for signal reconstruction. The authors concluded that the reconstructed signal gives better characteristics for classification between normal heart sound and murmurs. Features were computed using NASE algorithm, and fuzzy neural network with structure learning classifier was employed for classification .
In another study, Sepideh and Geranmayeh  considered modeling of the normal heart sound and three pathological disorders, namely, aortic insufficiency, the aortic stenosis, and the pulmonary stenosis sounds. Wavelet db4, order 5 analysis was then carried out on the signals producing statistical features of mean and standard deviation to train an ANN. Features were modified, and the inverse wavelet analysis using the same db4, order 5 was performed which produced the modeled sound .
Liang and Hartimo  segmented the PCG signal into four parts, i.e., systole, diastole, first heart sound, and second heart sound using wavelet decomposition and reconstruction. Feature vector is then constructed from the original PCG as well as from the third, fourth, and fifth coefficient of db6 order 5. The feature vector based on systole and diastole duration is normalized and fed to ANN classifier for classification of innocent murmur and pathological murmurs . The study was later extended to consider wavelet packet decomposition for heart beat classification .
Tu et al.  denoised the signal using soft and hard thresholding method with the aim to remove the murmur and environmental noise. The signal was then reconstructed using wavelet db6 and channels 5, 6, and 7. Sound envelop was then extracted using Hilbert transform, and features mainly based on time duration of systole and diastole were used for classification. Gupta et al.  proposed a segmentation algorithm based on homomorphic filtering and K-means clustering. The signal was first passed through pre-processing stages comprising normalization and low pass filtering. Segmentation was performed for complete and missed cycle using db2. Features extracted from the segmented signal were then used for classification of normal, systolic, and diastolic murmur.
Among other wavelet based contributions, Naseri and Homaeinezhad  proposed a framework for classification of heart sounds S1, S2, S3, S4, murmurs, and scuffles. The authors first carry out signal normalization followed by wavelet packet decomposition for noise removal and finally bias removal using Gaussian smoothing filter. Frequency- and amplitude-based features were employed for the detection of different heart sounds.
Pedrosa et al.  divide their work into two parts, first is to segment the signal into periodic and non-periodic (noisy) parts, and second is to classify the segmented parts into murmurs or normal beats. Morlet-based pre-processing is used in the segmentation stage and autocorrelation function (ACF) is used for detection of periodic and noisy parts. Any periodic parts are then classified as S1 or S2 using time and time-frequency features. Murmur detection is based on features extracted from Shannon energy, CWT, DWT, singular value decomposition (SVD), MFCC, bispectrum, variance fractal dimension, and Lyapunove exponents. Zheng et al.  carried out pathological signal detection based on DWT, MFCC, and dynamic time warping (DTW). First, DWT is used for envelogram segmentation followed by feature extraction using MFCC. Finally, DTW is employed to measure the distance between the signal under test and signal with known pathologies. In another work, Marques et al.  investigated stationary wavelet transform followed by hierarchical clustering for localization S1, S2, systole, and diastole.
Deng and Bentley  aimed segmentation of normal heart sounds and detection of murmurs. The signal is first down sampled and db4 order 6 is used to extract the beats. Peaks are identified based on the time between the identified systole and diastole periods. The identified peaks are then classified using the time duration between peaks as feature. Gupta et al.  preprocess the signal using normalization and low pass filtering. Segmentation is carried out using homomorphic filtering and K-means clustering. Once segmented, wavelet coefficients using db2 are computed to serve as features. The dimensionality of the feature vector is reduced using PCA. Classification for normal, systolic murmur, and diastolic murmur is carried out using multilayer perceptron-back propagation neural network (MLP-BP).
Abo-Zahhad et al.  proposed a human authentication algorithm based on discrete wavelet transform (DWT) for PCG signals. Jain and Tiwari  used adaptive thresholding method for denoising of PCG signals. Later, in an extension  to this work, adaptive algorithm was used for shrinking of wavelet coefficients for PCG denoising. Goda et al.  combined DWT, time features, and other features for classification of heart sounds. Abdollahpur et al.  used DWT with MFCC for heart sound classification while Boussaa et al.  compared MFCC with DWT for PCG classification. Likewise, Kay and Agarwal  used MFCC and continuous wavelet transform (DWT being the discrete counterpart) for heart sound classification.
There is a growing trend towards using wavelet in combination with other operators like Teager energy operator (TEO) and non-negative matrix factorization (NMF). Sattar et al.  used NMF for PCG segmentation. Ramovic et al. proposed a system for human authentication based on wavelet and TEO  while fetal heart sound detection is carried out by Koutsiana using wavelet and fractal dimensions .
A summarized overview of the wavelet based methods discussed in this section is presented in Table 4. Among the studies discussed, classification of normal (S1 and S2) and abnormal heart sounds is considered in [55, 61, 62, 68, 69, 71, 73, 77]. Murmurs were focused in [57, 58, 60–62, 64, 65, 67, 72], while special heart sounds like heart pathologies, scuffles, and artifacts are considered in [56, 63, 65]. Both private and public datasets have been employed for evaluations. Systems presented in [55–61, 66] have been evaluated on private datasets, while works reported in [63, 69] considered a major proportion of the E-General Medical PASCAL 11, 2016 PhysioNet challenge, Egeneral, MIT BIH, and BIOSec public datasets. Likewise, studies presented in [62, 64, 65, 67, 71, 73, 77] employed different subsets of the PHSC 2011 dataset and 2016 PhysioNet challenge. Among evaluation metrics, classical measures including accuracy, PPV, specificity, sensitivity, precision, and classification rate have been mostly used [55–58, 60–62, 65, 71, 73, 77]. Varying results have been reported in these studies ranging from as low as 50% to as high as 100%. Techniques based on wavelets report quantified results even in the earlier studies like [57, 58] where accuracies of 74.4 and 85% are reported (respectively) on private datasets. On public datasets, accuracies of 85 and 98% are reported in [71, 73] respectively on the PhysioNet database. Among other studies, Jain et al. [68, 69] realize a 100% accuracy on E-General and a private dataset while the work presented in  also reports an accuracy of 100% on the MIT BIH dataset. In general, in terms of performance, wavelet-based techniques exhibit a trend similar to that of other techniques where high accuracies are reported on private datasets while the publicly available and more difficult datasets still offer a number of challenges.
EMD and Hilbert-Huang transform
EMD has been a popular choice for the time-frequency analysis in many fields. Unlike other time-frequency algorithms, EMD operates in time domain and operates directly on the signal.
EMD was introduced by Huang et al.  along with 2D graphical representation known as Hilbert spectrum for non-linear and non-stationary time series analysis. The algorithm is based on the assumption that each data series is primarily composed of a finite set of simple oscillations which are AM/FM components called intrinsic mode functions (IMF) by sifting process [79, 80]. An IMF must follow the following properties:
The mean value of an IMF is zero.
The difference between the number of zero crossings and the number of extrema is at most one.
The first condition implies that IMF should be a narrow band signal, and the second refers to IMF being a mono-component signal . EMD comes in many flavors like bivariate EMD, complex EMD, and multivariable EMD. In our discussion, we will focus only on the standard EMD algorithm [81, 82] which is outlined in the following.
The signal x(t) decomposed by EMD can be represented as follows:
Where c i (t) are IMF and r(t) is the residue. Since c i (t) is monocomponent, by taking its Hilbert transform, it can be converted to an analytical signal. The derivation is given in the following:
X i (t) can be represented with amplitude and phase as given below:
If phase θ i is differentiated with the respect to time t, the result is instantaneous frequency ω i . The plot of θ i vs ω i is the graphical representation known as Hilbert-Huang transform [78–80]. Figure 10 shows an actual S1 beat selected from a data file in the iStethoscope dataset. Two plots are shown, one for the actual rectangular data and other from hamming-based data of the same rectangular window. b–d present the low-band, mid-band, and high-frequency band of the beat using standard EMD algorithm.
Among EMD-based techniques, Salman et al.  first filter the signal using low pass filter followed by EMD-based denoising. Authors claim that EMD offers better values for a number of noise measures including signal-to-noise ratio (SNR), root mean square error (RMSE), and percent root mean square difference (PRD). Cardiac cycle was calculated using autocorrelation of normalized Hilbert transformed signal, and segmentation was carried out using Shannon energy. Finally, the time duration between peaks was used for systole and diastole identification from which different beats like S1, S2, S3, and S4 were classified .
In another work, Zhao et al.  employ two methods for instantaneous frequency calculation. One based on Hilbert transform, and the other based on TFD (EMD). The Hilbert-based approach fails for wide band signals while EMD reports satisfactory results. Authors claim to make use of EMD in its basic form to detect coronary artery disease based on the instantaneous frequency calculated from diastolic murmurs using EMD and SVM. In another study  by the same authors, signal was denoised using db5 and, ensemble EMD (E-EMD) was then applied. EEMD removes the mode mixing problem in traditional EMD. Marginal spectrum was calculated on the Hilbert-Huang spectrum of EEMD output followed by DCT quantization and normalization of the marginal spectrum. Vector quantizer (VQ) was trained using Linde-Buzo-Gray algorithm, and classification was carried out using Euclidean distance.
One of the early attempts to analyze biomedical signals (including PCG) using EMD was made by Sun et al. . The authors argued that EMD is a powerful yet little explored tool for analysis of different biomedical signals. Later, the authors investigated EMD decomposition for instantaneous frequency estimation of PCG signals and concluded that EMD being a physical decomposition is more useful than mathematical decompositions like wavelet transform . Likewise, Gavrovska et al.  employed EMD with wavelet for PCG denoising.
Among other contributions, Moukhadem et al.  and Sun et al.  proposed algorithms for the classification of first and second heart sounds. In , Sun et al. used a combination of wavelet and EMD for classification. Signal is first decomposed using EMD, and the highest frequency IMF is denoised using db7 wavelet. The output signal is then reconstructed using denoised channels, and all other IMFs and NASE algorithm is applied on the resultant signal. In addition, cross correlation is calculated between the original signal and all IMFs. The IMF reporting maximum correlation is chosen and NASE is again calculated. Using separate thresholds, endpoints of S1 and S2 are detected from both denoised NASE and maximum correlation NASE signals. In , after pre-processing, SVD is used for feature extraction from the output of EMD and S-transform. SVD is also applied to the S-Matrix calculated from S-transform followed by k-NN for classification.
Boutana et al.  exploited EMD for heart sound segmentation and used it for clinical cases of late aortic stenosis, early aortic stenosis, and mitral regurgitation. The method for selection of IMFs is based on noise-only model which assumes that if the noise is additive white Gaussian noise, the logarithm-variance of each IMF varies but the variation is linear and with a parameter called the Hurst exponent. Papadaniil and Hadjileontiadis  employed ensemble EMD along with kurtosis for heart sound segmentation. Low pass and median filtering is first used for noise removal. Afterwards, EEMD-based IMFs are calculated. The IMFs which do not meet any of the energy criteria, instantaneous frequency criteria, and bootstrap kurtosis-based criteria are removed. S1 and S2 are then classified based on the Kurtosis measure.
Authors in  suggested that translated EMD outperforms the traditional EMD. Translated EMD takes the signal, modulates it, applies EMD, and then demodulates all IMFs. The results are presented for simulated as well as real PCG data. In a relatively recent study, Jimenez et al.  first normalize and resample the PCG signals. Three different types of EMD analysis, namely, EMD, EEMD, and adaptive EEMD are then carried out. Afterwards, S_MFCC obtained from the signal, ST_MFCC calculated from energy operator in the frequency domain, SW_MFCC from frequency bands of the spectral energy distribution, and SWT_MFCC from combination of all previous are calculated from two signals which are sum of the even and odd IMFs. Fuzzy rough set (FRS) algorithm is then employed to reduce statistical moments obtained from HHT. At the end, classification is carried out using ergodic HMM.
Banerjee et al.  used variational mode decomposition along with Shannon energy feature for heart sound localization. Variational mode decomposition was developed by Dragomiretskiy and Zosso  to remove noise sensitivity and sampling problems that accompany standard EMD. Salman et al.  used EMD for removal of white, colored, exponential, and alpha-stable noise and showed that EMD is superior when compared with wavelet and total variation denoising methods. Heart murmurs were detected by Jusak et al.  using complete ensemble empirical mode decomposition (CEEMD, another variant of standard EMD) and the Pearson distance metric. Authors concluded that CEEMD is computationally more complex as it extracts more modes in comparison to EMD.
Research efforts are being continuously made to enhance the EMD algorithm and proposition of new filtering techniques for EMD. A number of variants have been proposed to handle the overshoot and undershoot end effects related to the classical EMD algorithm. These effects are demonstrated in Fig. 11. The figure shows that upper envelop is well above while the lower envelop is fairly below the original signal. Both of these introduce errors in the mean envelop. Another problem in EMD-based techniques is associated with the stopping criteria. If the algorithm stops premature, it will not reveal all the details in signal, and if it continues for too many iterations, the physically meaningless IMF would appear. The third major problem with EMD is its decomposition sequence. EMD decomposes from high frequency towards lower frequency but not from high energy towards lower energy. This effect can be minimized by using HHT transform along with EMD .
A summary of the methods discussed in this section is presented in Table 5. It should be noted that all types of heart beat have been considered in the EMD-based research. Normal heart beats of class S1 and S2 are considered in most of the studies [83–87, 89, 90, 92–95, 97, 100]. Additionally, S3 and S4 sounds are considered in  while S3 an S4 with gallop are considered in . Studies in [86, 87] focus on abnormal heart sounds whereas various pathological states like regurgitation and stenosis are investigated in [91, 92, 100]. Likewise, murmurs have been taken into account in [94, 98, 100].
Techniques proposed in most of the aforementioned studies have been evaluated on private datasets. Only the works presented in [91, 100] are evaluated on the publicly available E-General Medical dataset. Likewise, systems reported in [83, 97, 98, 101] employed the University of Michigan dataset while PASCAL CHSC 2011 dataset is considered in . Standard metrics of accuracy, specificity, sensitivity, etc. have been employed in most cases. Specific measures like mean prediction power and mean accuracy , correct recognition rate (CRR) , correct and incorrect diagnosis , and SNR and ratio R  have also been reported. EMD algorithm is compared with other time-frequency algorithms like wavelet and DWT in [87, 88, 101]. Studies reported in [97, 98] used SNR and ΔSNR while  employed the average detection rate. Gavrovska et al.  compared DWT, EMD, and EEMD, and concluded that EEMD reports the best performance among the three. Similarly, authors in  compared EMD with other denoising techniques and argued that EMD outperforms other methods. For E-General Medical public dataset, Varghees and Ramachandaran  reported average sensitivity and positive predictivity of more than 90% while for private datasets, accuracies of as high as 99% are also reported .
Application of multiple time-frequency methods.
A recent trend in automated analysis of PCG signals is investigation of multiple time-frequency representations for effective classification and localization. Gavrovska et al. , for example, combined time-frequency representation of Wigner-Ville with Haar wavelet for normal heart beat detection in pediatric patients. Authors report a precision and recall of more 90% for 90 heart beat signals collected from 55 patients and 35 healthy subjects. The idea of multiple time-frequency methods was also exploited by Sun and Gong , and Gong and Nie  where the authors who employ EMD with wavelets for separation of S1 and S2 from noisy heart sounds. Both the studies [90, 102] report high accuracies of more than 99% and demonstrate the effectiveness of combining multiple time-frequency methods over single representation. It should however be noted that although these multiple time-frequency methods realize high accuracies, they are computationally expensive and hence, could not gain significant research attention of research community. Furthermore, the choice of different time-frequency methods and their sequence of application also remains a challenging problem.
Comparing different time-frequency representations, it can be observed from Table 3, SPWVD outperforms other representations like spectrogram. Among wavelet-based techniques, a large number of comprehensive investigations have been carried out using a variety of wavelet filters. DWT, CWT, and Db filters have been most popular choices reporting high localization and classification accuracies. Likewise, EMD along with its various enhanced variants has remained an attractive choice for researchers for this problem. Lately, techniques based on combination of EMD and wavelets have also been explored and are known to report higher accuracies as compared to EMD or wavelet-based methods. Further discussion on comparison of different techniques is presented later in the next section.
This section presents an analysis of time, frequency, and time-frequency methods discussed in the previous sections. Time-based methods have been used extensively in the past and have still maintained their popularity because of ease with which information can be extracted. They can provide, in theory, the best localization for beats in the signal under observation. These methods need pre-processing steps to make the signal fit for subsequent steps of localization and classification. With the development of machine learning techniques, time domain methods enjoy a renewed interest which is expected to continue in the near future. This trend is very well demonstrated by the recent research worked presented in Table 2. Quantitative comparison of results also shows that time-based methods, in general, are able to localize and classify beats with more than 70% specificity. The frequency-based methods for localization and classification usually start with MFCC which employs pre-emphasis, windowing and FFT to make the signal appropriate for classification and localization. An analysis of the quantitative results reported in Table 2 shows that on the average, the performance of frequency-based methods is more or less similar to that of time-based methods. It should however be noted that frequency-based methods are best suited for frequency band localization but fail to extract the time location of the frequency band under observation. Time-based methods, on the other hand, provide beat location in the signal but are limited in the sense that they do not provide explicit information about the frequency content of the signal.
The time-frequency methods decompose signals into different time and frequency resolutions and aim to overcome the limitations of time- and frequency-based methods. The time-frequency methods investigated in the literature include wavelets, EMD, and TFR. Although TFR in its basic form suffers from noise due to cross terms, a number of effective techniques based on TFR are proposed for heart beat analysis. These methods attempt to suppress the cross terms while keeping as much frequency resolution as possible. The recent popularity of TFR methods is generally attributed to pseudo WV distribution and HHT which serve to reduce the noise due to cross terms and allow a better representation of the signal under study. A comparison from Table 3 shows that the results reported for TFR methods exhibit high variation, mainly as a function of the database employed for evaluation.
Among other time-frequency methods, EMD and wavelets have been mostly employed for PCG analysis. EMD is a data driven decomposition technique that decomposes a signal from high to low frequencies by generating lower and upper envelops using spline interpolation. EMD-based techniques suffer from end-effects of undershoot and overshoot, the modeling errors inherited in the EMD algorithm. The stopping criteria for EMD are being continuously researched to extract IMFs which are of physical significance. EMD has witnessed a number of variations over the years. In addition to the standard EMD, noise-assisted EMD, ensemble EMD, multi-variable EMD, and EMD complemented by HHT have also been investigated. It can be observed from Table 5 that in general, high accuracies are reported by EMD-based techniques. The results however are not directly comparable as diverse datasets have been employed by different researchers.
In addition to EMD, wavelets and their variants have been a popular as well as an effective choice of researchers for analysis of PCG signals. Wavelet is a goal-driven algorithm which decomposes a signal using dyadic filter bank. This decomposition method is limited in the sense as it does not take into account the parameters of the signal under decomposition. Despite this limitation, wavelets have been employed in all stages from pre-processing to classification.
It can be seen from Table 4 that wavelet-based techniques have been comprehensively studied and evaluated on wide variety of datasets including all three public datasets presented in this paper. These methods consistently report high localization and classification accuracies on multiple datasets.
Summarizing the key findings of our analysis, the initial attraction of employing time-frequency based methods like EMD and wavelets was their ability to represent the signal at multiple resolutions unlike time and frequency analysis. The complexity of PCG signal, however, forced the researchers to borrow techniques and features from the time and frequency domains. Features like systole and diastole time duration, beat amplitude, and beat frequency, for instance, are in common use. In general, all techniques have evolved to an extent where they are able to successfully model human heart beat under clean environment. Nevertheless, automatic analysis of noisy heart beats still remains a challenging problem. Time and frequency methods, in most cases, are dependent on machine learning algorithms to enhance the localization and classification performance. Features extracted in the time or frequency domain are typically fed to a learning algorithm and the choice of the learning algorithm also influences the overall system performance. The time-frequency methods (especially EMD and wavelets), on the other hand, are not too sensitive to the choice of the learning algorithm. Furthermore, time and frequency methods have been mostly limited to the two-class problem of classifying the heart beats as normal or abnormal while wavelets and EMD present more sophisticated solutions detecting not only the abnormality but also classifying the pathology.
Studying the quantitative performance of time, frequency, and time-frequency methods (presented in the respective tables), it can be seen that a direct comparison of these methods is difficult due to the different types of challenges offered by each dataset. A general observation is that for small datasets and clean signals, time, frequency, and time-frequency-based methods report similar accuracies. For larger datasets and noisy signals, however, the performance of time-frequency methods, especially wavelets, remains relatively stable once compared to other methods. The PASCAL 2011 dataset A, for instance, is considered to be a very challenging set of noisy signals. Only a limited number of studies [30, 52, 53, 62, 67, 77, 95] have been evaluated on this set and among these wavelets-based techniques report the highest accuracies.
Time-frequency representations like Wigner-Ville and pseudo WV have remained relatively less explored primarily due to the problem of cross terms. Reducing the cross terms while keeping the maximum possible information in signal needs to be further investigated. EMD and wavelet-based techniques enjoy the advantage over other techniques in the sense that they decompose the signal at multiple resolutions hence removing the high-frequency noise and reducing the energy contribution from low frequencies. These noise handling capabilities make such techniques an attractive choice, especially, when dealing with noisy signals. Despite these characteristics, the problem of computerized analysis of PCG signals still remains very challenging for noisy environments. A clear evidence is the performance of different systems on the noisy signals of the PASCAL 2011 datasets which offer a great margin for enhancement. Another interesting direction could be to investigate the combination of EMD and wavelets, for instance, techniques like empirical-wavelet decomposition can be employed for analysis of noisy signals. Likewise, combination of algorithms and features from time, frequency, and time-frequency domains can also be investigated to propose robust localization and classification techniques which can deal with different types of signals (clean, noisy, various pathologies etc.). In our present work, we are exploring such a combination where EMD or wavelets (time-frequency domain) are being employed for noise removal while localization is carried out using Springer algorithm  (time domain). Furthermore, a combination of features extracted from the three representations is intended to be fed to different machine learning algorithms for classification. Such rich representations are likely to enhance the localization and classification accuracies of the system.
It is also worth mentioning that the primary focus of most of the research on automatic analysis of PCG signals has been on enhancing the localization and classification accuracies. From the view point of practical applications, development of computationally efficient solutions which may work in real time is also a challenging problem that needs further exploration. In addition, the presently available PCG datasets comprise a limited number of samples and do not cover the complete range of pathologies which are likely to be encountered in clinical settings. This necessitates the development and labeling of a comprehensive dataset of PCG signals encompassing a variety of signals and covering all major pathologies. Considering the complexity of the problem, modeling the heart beats and various pathologies is likely to offer more robust solutions as opposed to the conventional techniques relying on localization and classification. To the best of authors’ knowledge, very limited efforts have been made in this direction. Morlet wavelet filter, for instance, has been investigated to model the heart beat. However, modeling of such complex signals brings along its own challenges and requires significant amount of data for each pathology, a requirement that is hard to meet in the currently available datasets.
This paper presented an overview of the techniques proposed for computerized analysis of PCG signals which represent recordings of heart sound. Localization and classification of beats have been the key research areas with the objective to discriminate between normal and abnormal heart sounds. The variation in amplitude, frequency, and duration of beats makes PCG a very complex signal for automatic analysis. The domain has witnessed more than three decades of research, and this paper is an attempt to provide an overview of the current state-of-the-art on this subject. We organized the notable contributions to automatic analysis of PCG signals as function of domains of analysis, namely, time, frequency, and time-frequency methods. An analysis of the review techniques revealed that, in general, the performance of time, frequency, and time-frequency is similar for small datasets and clean signals. The more challenging scenario is analysis of noisy signals where time-frequency specially EMD and wavelets has been popular choice of researchers. These methods, however, bring with them the additional computational cost and algorithmic complexity. Consequently, simple features like amplitude, energy, beat duration, and spectral flux extracted from tine and frequency domain continue to sustain. These time and frequency methods have been complemented by using sophisticated machine learning to enhance the localization and classification performance.
Artificial neural network
Complete ensemble empirical mode decomposition
Convolutional neural networks
Correct recognition rate
Discrete cosine transform
Dynamic time warping
Discrete wavelet transform
Ensemble empirical mode decomposition
Empirical mode decomposition
Fast fourier transform
Hidden Markov model
Heart sound challenge
Hilbert vibration decomposition
Inverse fast fourier transform
Intrinsic mode functions
Least mean square
Linear predictive coding
Modified cuckoo search
Mel-frequency cepstral coefficients
Multilayer perceptron-back propagation neural network
Normalized average Shannon energy
Non-negative matrix factorization
Patent ductus arteriosus
Positive predictive value
Percent root mean square difference
Renyi entropy measure
Root mean square error
First normal heart sound
Second normal heart sound
Third abnormal heart sound
Fourth abnormal heart sound
Self organizing map
Smoothed Pseudo Wigner-Ville distribution
Short-time frequency transform
Singular value decomposition
Support vector machine
Teager energy operator
Ventricular septal defect
World Health Organization
Mortality due Cardiovascular diseases in world. World Health Organization. https://www.medicalnewstoday.com/articles/282929.php.
Z Zhang, IEEE Signal Processing Cup (2015). http://www.ubcieee.org/events/signal-processing-cup-2015. https://sites.google.com/site/researchbyzhang/ieeespcup2015.
F Castells, et al, Principal component analysis in ECG signal processing. EURASIP J. Adv. Signal Process.2007:, 074580 (2007).
F Minhthang Bui, D Hatzinakos, Biometric methods for secure communications in body sensor networks: resource-efficient key management and signal-level data scrambling. EURASIP J. Adv. Signal Process.529879 (2008). https://doi.org/10.1155/2008/529879.
M Sepulveda-Cano, et al, Selection of nonstationary dynamic features for obstructive sleep apnoea detection in children. EURASIP J. Adv. Signal Process.538314 (2011). https://doi.org/10.1155/2011/538314.
S Nemati, A Malhotra, GD Clifford, Data fusion for improved respiration rate estimation. EURASIP J. Adv. Signal Process 2010.926305 (2010). http://doi.org/10.1155/2010/926305.
SM Debbal, F Bereksi Reguig, Computerized heart sounds analysis. Comput. Biol. Med.38:, 263–280 (2008).
AH Osamah, et al, Multimodal biometrics based on identification and verification system. J. Biom. Biostat.4:, 2 (2013).
W Phanphaisarn, et al, Heart detection and diagnosis based on ECG and EPCG relationships. Med. Devices (Auckl). 4:, 133–144 (2011).
S Maderwald, et al, Is Phonocardiogram gating a reliable alternative to ECG gating in clinical routine for CINE and velocity-encoded phase ocrast imaging?Proc.Intl. Soc. Mag. Reson. Med.20: (2012).
LS Stodieck, MW Luttges, Relationships between the electrocardiogram and phonocardiogram: potential for improved heart monitoring. ISA Trans.23(4), 59–65 (1984).
I Maglogiannis, et al, Support vectors machine-based identification of heart valve diseases using heart sounds. Comput. Meth. Prog. Biomed.95:, 47–61 (2009).
H Moghaddasi, F Almasganj, A Zoroufian, Imaging of heart acoustic based on the sub-space methods using a microphone array. Comput. Meth. Prog. Biomed.146:, 133–42 (2017).
A Sepehri Amir, et al, A novel method for pediatric heart sound segmentation without using the ECG. Comput. Methods Prog. Biomed.99:, 43–48 (2010).
DB Springer, L Tarassenko, GD Clifford, Logistic regression-HSMM-based heart sound segmentation. IEEE Trans. Biomed. Eng.63:, 822–832 (2016).
S Chauhan, et al, A computer-aided MFCC-based HMM system for automatic auscultation. Comput. Biol. Med.38:, 221–33 (2008).
MA Kotb, H Nabih, F El Zahraa, M El Falaki, Improving the Recognition of Heart Murmur. Int. J. Adv. Comput. Sci. Appl. (IJACSA).7:, 1–10 (2016).
LG Durand, P Pibarot, Digital signal processing of the phonocardiogram: review of the most recent advancements. Crit. Rev. Biomed. Eng.23(3-4), 163–219 (1995).
RM Rangayyan, RJ Lehner, Phonocardiogram signal analysis: a review. Crit. Rev. Biomed. Eng.15(3), 211–369 (1987).
F Meziani, SM Debbal, A Atbi, Analysis of phonocardiogram signals using wavelet transform. J. Med. Eng. Technol.36(6), 283–302 (2012).
T Chakrabarti, et al, in 2015 International conference and workshop on computing and communication (IEMCON). Phonocardiogram signal analysis - practices, trends and challenges: a critical review. (IEEENew York, 2015).
NA Mohammed, ESA El-Dahshan, AS Yahia, A review of intelligent systems for heart sound. J. Med. Eng. Technol.41(7), 553–563 (2017).
eGeneral Medical Inc. USA. eGeneralMedical.com. http://www.egeneralmedical.com/listohearmur.html. Accessed 20 Apr 2018.
P Bentley, et al, PASCAL Classifying heart sounds challenge (2011). http://www.peterjbentley.com/heartchallenge/. Accessed 20 Apr 2018.
National Institute of General Medical Sciences and the National Institute of Biomedical Imaging and Bioengineering. PhysioNet/CinC challenge (2016). NIGMS & NIBIB. (2016). https://www.physionet.org/physiobank/database/challenge/2016/. Accessed 20 Apr 2018.
Michigan Medicine. Heart Sound & Murmur Library University of Michigan Health System. Ann Arbor. MI.http://www.med.umich.edu/lrc/psb/_open/html/repo/primer/_heartsound/primer/_heartsound.html. Accessed 20 Apr 2018.
S Davis, P Mermelstein, Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Trans. Acoust. Speech Signal Process.28(4), 357–366 (1980).
LR Rabiner, RW Schafer, Introduction to digital speech processing (Now Publishers Inc., 2007).
H Naseri, MR Homaeinezhad, H Pourkhajeh, Noise/spike detection in phonocardiogram signal as a cyclic random process with non-stationary period interval. Comput. Biol. Med. 43:, 1205–1213 (2013).
M Singh, A Cheema, Heart sounds classification using feature extraction of phonocardiography signal. Int. J. Comput. Appl.77:, 13–17 (2013).
H Liang, S Lukkarinen, I Hartimo, Heart sound segmentation algorithm based on heart sound envelolgram. Comput. Cardiol.24:, 105–8 (1997).
C Potes, et al, Ensemble of feature-based and deep learning-based classifiers for detection of abnormal heart sounds. Comput. Cardiol. Conf. (CinC), 621–624 (2016). https://doi.org/10.23919/CIC.2016.7868819.
S Ari, K Hembram, G Saha, Detection of cardiac abnormality from PCG signal using LMS based least square SVM classifier. 37:, 8019–26 (2010).
H Ali, TJ Ahmad, K Shoab, Heart sound signal modeling and segmentation based on improved Shannon energy envelogram using adaptive windows. Asialink Int. Conf. Biomed. Eng. Tech. (2007). Pearson Corp. London.
G Redlarski, D Gradolewski, A Palkowski, A system for heart sounds classification. PLoS ONE. 9(11), e112673 (2014). https://doi.org/10.1371/journal.pone.0112673.
TE Chen, et al, S1 and S2.heart sound recognition using deep neural networks. IEEE Trans. Biomed. Eng.64:, 372–80 (2017).
AP Yoganathan, et al, Use of the fast Fourier transform for frequency analysis of the first heart sound in normal man. Med. Biol. Eng.14:, 69–73 (1976).
ND Marcelo, et al, Classifying Heart Sounds Using Images of MFCC and Temporal Features (Springer, Cham, 2017).
JJG Ortiz, CP Phoo, J Wiens, Heart sound classification based on temporal alignment techniques (IEEE, New York, 2016).
H Tang, H Chen, T Li, M Zhong, Classification of normal/abnormal heart sound recordings based on multi-domain features and back propagation neural network. Comput. Cardiol. Conf. (CinC), 593–596 (2016). https://doi.org/10.23919/CIC.2016.7868812.
J Rubin, R Abreu, A Ganguli, S Nelaturi, I Matei, K Sricharan, in Computing in Cardiology Conference (CinC). Recognizing Abnormal Heart Sounds Using Deep Learning (Cornell University library, 2016), pp. 813–6.
I Orovic, S Stankovic, Time-frequency-based speech regions characterization and eigenvalue Decomposition applied to speech watermarking. EURASIP J. Adv. Signal Process. 2012. 2010:, 572748 (2010).
YS Shin, J-J Jeon, Pseudo Wigner-Ville time-frequency distribution and its application to machinery condition monitoring. Shock. Vib.1-1:, 65–76 (1993).
JD Martnez-Vargas, et al, Time-frequency based feature selection for discrimination of non-stationary biosignals. EURASIP J. Adv. Signal Process.2012(219), 2012 (2012).
X Zhang, et al, Time-frequency scaling transformation of the phonocardiogram based of the matching pursuit method. IEEE Trans. Biomed. Eng.45:, 972–9 (1998).
D Boutana, M Benidir, B Barkat, Segmentation and identification of some pathological phonocardiogram signals using time- frequency analysis. IET Signal Process.5:, 527–537 (2011).
SM Debbal, F Bereksi Reguig, Time-frequency analysis of the second cardiac sound in phonocardiogram signals. Med. Phys. 32:, 2911–7 (2011).
A Gavrovska, V Bogdanovic, I Reljin, B Reljin, Automatic heart sound detection in paediatric patients without electrocardiogram reference via pseudo-affine Wigner-Ville distribution and Haar wavelet lifting. Comput. Meth. Prog. Biomed.113:, 515–2 (2014).
P Goncalves, RG Baraniuk, Pseudo affine Wigner distributions: definition and kernel formulation. IEEE Trans. Signal Process.46:, 1505–1516 (1998).
A Sing, MK Dutta, CM Travieso, in International Conference and Workshop on Bioinspiblue Intelligence (IWOBI). Analysis of Heart Sound for Automated Diagnosis of Cardiac Disorders (IEEENew York, 2017), pp. 1–5.
A Barma, BW Chen, W Ji, S Rho, CH Chou, JF Wang, Detection of the third heart sound based on nonlinear signal decomposition and time-frequency localization. IEEE Trans. Biomed. Eng.63:, 1718–27 (2016).
W Zhang, J Han, S Deng, Heart sound classification based on scaled spectrogram and least squares regression. Biomed. Signal Process. Control. 32:, 20–28 (2017).
W Zhang, J Han, S Deng, Heart sound classification based on scaled spectrogram and tensor decomposition. 84:, 220–231 (2017).
RC Gonzalez, RE Woods, Digital image processing, 3rd edn. (Pearson Corp., London, 2007).
D Song, L Jia, Y Lu, L Tao, in International Conference on Systems and Informatics (ICSAI2012). Heart sounds monitor and analysis in noisy environments (IEEENew York, 2012), pp. 1677–1681.
S Babaei, A Geranmayeh, Heart sound reproduction based on neural network classification of cardiac valve disorders using wavelet transforms of PCG signals. Comput. Biol. Med. 39:, 8–15 (2009).
H Liang, I Hartimo, A heart sound feature extraction algorithm based on wavelet decomposition and reconstruction. Proc. 20th Ann. Int. Conf. IEEE Eng. Med. Biol. Soc.20:, 1539–42 (1998).
H Liang, I Hartimo, A feature extraction algorithm based on wavelet packet decomposition for heart sound signals (IEEE, New York, 1998).
Tu Zhihai, et al, in Sixth International Conference on Natural Computation (ICNC 2010). Improved methods for detecting main components of heart sounds (IEEENew York, 2010), pp. 3585–88.
CN Gupta, et al, Neural network classification of homomorphic segmented heart sounds. Appl. Soft Comput.7:, 286–97 (2007).
H Naseri, MR Homaeinezhad, Detection and boundary identification of phonocardiogram sounds using an expert frequency-energy based metric. Ann. Biomed. Eng.41:, 279–92 (2012).
J Pedrosa, A Castro, TTV Vinhoza, in 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society. Automatic heart sound segmentation and murmur detection in paediatric phonocardiograms (IEEENew York, 2014), pp. 2294–7.
X Zheng, M Dong, B Fu, X Fei, Automatic pathological signal detection of heart sound based on multi-resolution analysis. Math. Comput. Contemp. Sci. (2013).
N Marques, et al, Exploring the stationary wavelet transform detail coefficients for detection and identification of the S1 and S2 heart sounds. Comp. Cardiol.2013:, 891–894 (2013).
Y Deng, PJ Bentley, A robust heart sound segmentation and classification algorithm using wavelet decomposition and spectrogram (2012).
CN Gupta, et al, Segmentation and classification of heart sounds. Can. Conf. Electr. Comput. Eng.1674–1677 (2005). doi:10.1109/CCECE.2005.1557305.
M Abo-Zahhad Mohammed, et al, A comparative approach between cepstral features for human authentication using heart sounds. Signal Image Video Process. 10:, 843–851 (2016).
P Kumar Jain, A Kumar Tiwari, An adaptive method for shrinking of wavelet coefficients for phonocardiogram denoising (Elsevier, New York, 2016).
P Kumar Jain, A Kumar Tiwari, An adaptive thresholding method for the wavelet based denoising of phonocardiogram signal. Biomed. Signal Process. Control. 38:, 388–99 (2017).
MÁ Goda, P Hajas, Morphological determination of pathological PCG signals by time and frequency domain analysis. Comput. Cardiol. Conf. (CinC), 1133–1136 (2016). https://doi.org/10.23919/CIC.2016.7868947.
M Abdollahpur, et al, Cycle selection and neuro-voting system for classifying heart sound recordings. Comput. Cardiol.43:, 176–238 (2016).
M Boussaa, et al, in 11th International Conference on Intelligent Systems: Theories and Applications (SITA). Comparison of MFCC and DWT features extractors applied to PCG classification (IEEENew York, 2016), pp. 1–5.
E Kay, A Agarwal, DropConnected neural networks trained on time-frequency and inter-beat features for classifying heart sounds. Inst. Phys. Eng. Med. Physiol. Meas.38:, 1645–57 (2017).
F Sattar, et al, Time-scale-based segmentation for degraded PCG Signals Using NMF. Non-negative Matrix Factorization Tech., 79–194 (2016). Springer.
A Ramovic, et al, Wavelet and Teager Energy Operator (TEO) for Heart Sound Processing and Identification. IFMBE Proc. 1(1–2), 70–73 (2017).
K Elisavet, et al, Fetal heart sounds detection using wavelet transform and fractal dimension. Front. Bioeng. Biotechnol.5: (2017). https://doi.org/10.3389/fbioe.2017.00049.
M Nabih-Ali, et al, Denoising of Heart Sound Signals Using Discrete Wavelet Transform. Circ. Syst.8(2017), 184–190 (2017).
NE Huang, et al, The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. R. Soc.454:, 903–995 (1998).
A Zeiler, et al, in The 2010 International Joint Conference on Neural Networks (IJCNN). Empirical mode decomposition - an introduction. (IEEENew York, 2010), pp. 1–8.
C Park, et al, Classification of motor imagery BCI using multivariate empirical mode decomposition. IEEE Trans. Neural Syst. Rehabil. Eng. 21:, 10–22 (2013).
N Rehman, DP Mandic, Multivariate empirical mode decomposition. Proc. R. Soc. Lond. A Math. Phys. Eng. Sci.466:, 1291–302 (2010).
T Tanaka, DP Mandic, Complex empirical mode decomposition. IEEE Signal Proc. Lett.14:, 101–4 (2007).
S Amy, et al, in 4th International Conference on Instrumentation, Communications, Information Technology and Biomedical Engineering (ICICI-BME). Automatic segmentation and detection of heart sound components S1, S2, S3 and S4 (IEEENew York, 2015).
Z Zhao, Noninvasive diagnosis of coronary artery disease based on instantaneous frequency of diastolic murmurs and SVM, vol 2005 (IEEE, New York, 2005).
Z Zhao, Q Shen, F Ren, Heart sound biometric system based on marginal spectrum analysis. Sensors (Basel). 13(2), 2530–51 (2013).
L Sun, et al, in 7th International Conference on Signal Processing. A novel method of time-frequency Representation and its application to biomedical signal processing, vol. 1 (IEEENew York, 2004), pp. 236–9.
L Sun, et al, in 27th Annual International Conference of the Engineering in Medicine and Biology Society. Instantaneous frequency estimate of nonstationary phonocardiograph signals using Hilbert spectrum (IEEENew York, 2005), p. 2005.
A Gavrovska, M Slavkovic, I Reljin, B Reljin, Application of wavelet and EMD-based denoising to phonocardiograms. Int. Symp. Signals Circ. Syst. ISSCS2013, 1–4 (2013).
A Moukadem, A Dieterlen, C Brand, Study of two feature extraction methods to distinguish between the first and the second heart sounds. Int. Conf. Bio-inspiblue Syst. Signal Process. BIOSIGNALS.2012:, 346–50 (2012).
H Sun, W Chen, J Gong, An improved empirical mode decomposition wavelet algorithm for phonocardiogram signal denoising and its application in the first and second heart sound extraction.
DBB Barkat, M Benidir, Segmentation of pathological heart sound signal using empirical mode decomposition. Int. J. Comput. Electr. Eng., 26–29 (2013). https://doi.org/10.7763/IJCEE.2013.V5.655.
CD Papadaniil, LJ Hadjileontiadis, Efficient heart sound segmentation and extraction using ensemble empirical mode decomposition and kurtosis features. IEEE J. Biomed. Health Inf.18:, 1138–52 (2014).
S Christophe, et al, in 4th International Symposium on Applied Sciences in Biomedical and Communication Technologies. On the separation of heart sound components using a translated empirical mode decomposition, vol. 4. ISABEL (ACMNew York, 2011), pp. 1–5.
JA Jimenez, MA Becerra, E Delgado-Trejos, Heart murmur detection using ensemble empirical mode decomposition and derivations of the Mel-frequency cepstral coefficients on 4-area phonocardiographic signals. Comput. Cardiol.493–496 (2014).
S Banerjee, M Mishra, A Mukherjee, Segmentation and detection of first and second heart sounds (S1 and S2) using variational mode decomposition, 565–70 (2016). https://doi.org/10.1109/IECBES.2016.7843513.
K Dragomiretskiy, D Zosso, Variational mode decomposition. IEEE Trans. Signal Process. 62–63:, 531–544 (2014).
AH Salman, et al, Empirical mode decomposition (EMD) Based Denoising Method for Heart Sound Signal and Its Performance Analysis. Int. J. Electr. Comput. Eng. (IJECE). 6:, 2197–204 (2016).
J Jusak, I Puspasari, P Susanto, Heart murmurs extraction using the complete ensemble empirical mode decomposition and the Pearson distance metric (IEEE, New York, 2016).
A Nima, A Oveis, Short term wind power publiction based on improved kriging interpolation, empirical mode decomposition and closed-loop forecasting engine. Sustainability. 9(11), 2071–1050 (2017). http://www.mdpi.com/2071-1050/9/11/2104. Accessed 20 Apr 2018.
V Nivitha Varghees, KI Ramachandran, A novel heart sound activity detection framework for automated heart sound analysis. Biomed Signal Proc. Control.13:, 174–188 (2014).
AH Salman, et al, in 2015 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS). Performance comparison of denoising methods for heart sound signal (IEEENew York, 2015).
J Gong, S Nie, Y Wang, An improved empirical mode decomposition algorithm for phonocardiogram signal de-noising and its application in S1/S2 extraction, vol. 32 (IEEE, New York, 2015).
GB Moody, RG Mark, The impact of the MIT-BIH arrhythmia database. IEEE Eng. Med. Biol.20:, 45–50 (2001).
SE Schmidt, C Holst-Hansen, C Graff, E Toft, JJ Struijk, Segmentation of heart sound recordings by a duration-dependent hidden Markov model. Physiol. Meas.31(4), 513 (2010).
University of Toronto BioSec Lab. Biosec. PCG database. Lab BioSec.http://www.comm.utoronto.ca/biometrics/databases.html. Accessed 20 Apr 2018.
The authors would like to thank Bahria University, Islamabad, Pakistan, for providing us with the opportunity to carry out the reported research.
The reported research is not a part of any funded project.
Authors declare that they have no identifiable conflicting or competing interest.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Availability of data and materials
The review referred to the following datasets which are publicly available.
• 2015 IEEE Signal processing cup. Zhilin Zhang. 2015. https://sites.google.com/site/researchbyzhang/ieeespcup2015http://www.ubcieee.org/events/signal-processing-cup-2015
• PASCAL Classifying heart sounds challenge. Peter Bentley et al. 2011. http://www.peterjbentley.com/heartchallenge/
• eGeneral Medical Inc.USA. eGeneralMedical.com. http://www.egeneralmedical.com/listohearmur.html
• Michigan Medicine. Heart Sound and Murmur Library University of Michigan Health System. Ann Arbor. MI. http://www.med.umich.edu/lrc/psb_open/html/repo/primer_heartsound/primer_heartsound.html
• National Institute of General Medical Sciences and the National Institute of Biomedical Imaging and Bioengineering. PhysioNet/CinC challenge 2016. NIGMS & NIBIB. 2016. https://www.physionet.org/physiobank/database/challenge/2016/
• University of Toronto BioSec Lab. Biosec. PCG database. Lab BioSec. http://www.comm.utoronto.ca/~biometrics/databases.html
Authors referred to following implementation in the manuscript.
• ASTRES : Analysis, synthesis and transformations by reassignment, EMD and synchrosqueezing. http://www.ens-lyon.fr/PHYSIQUE/Equipe3/ANR_ASTRES/
• Massachusetts institute of technology. EMD. http://www.mit.edu/~gari/CODE/HRV/emd.m
About this article
Cite this article
Ismail, S., Siddiqi, I. & Akram, U. Localization and classification of heart beats in phonocardiography signals —a comprehensive review. EURASIP J. Adv. Signal Process. 2018, 26 (2018) doi:10.1186/s13634-018-0545-9
- PCG signals