Heart rate tracking in photoplethysmography signals affected by motion artifacts: a review

*Correspondence: shahiid.ismail@gmail.com 1Bahria University, Islamabad, Pakistan Full list of author information is available at the end of the article Abstract Non-invasive photoplethysmography (PPG) technology was developed to track heart rate during motion. Automated analysis of PPG has made it useful in both clinical and non-clinical applications. However, PPG-based heart rate tracking is a challenging problem due to motion artifacts (MAs) which are main contributors towards signal degradation as they mask the location of heart rate peak in the spectra. A practical analysis system must have good performance in MA removal as well as in tracking. In this article, we have presented state-of-art techniques in both areas of the automated analysis, i.e., MA removal and heart rate tracking, and have concluded that adaptive filtering and multi-resolution decomposition techniques are better for MA removal and machine learning-based approaches are future perspective of heart rate tracking. Hence, future systems will be composed of machine learning-based trackers fed with either empirically decomposed signal or from output of adaptive filter.

(2021) 2021: 5 Page 2 of 27 changes in blood [9]. PPG is of either transmittance or reflectance type. In the transmittance type, LED used as the light source is placed opposite to PD while LED and PD are on same side in the reflectance type PPG. Figure 1a shows both types where LED and PD are on same and opposite side of the finger for transmittance and reflectance type PPG, respectively. When light from LED is incident on living tissue, it is mainly absorbed by tissues and arteries. Figure 1b shows light absorption from light source (LED) to receiver (PD) and resultant PPG waveform which represents a quasiperiodic signal [11]. It is clear from Fig. 1b that waveform is composed mainly of non-pulsatile or DC component and pulsatile or AC component. The DC component corresponds to average or steady blood volume of both arterial and venous blood. The AC component refers to changes in blood volume during systole and diastole phase [10]. Changes in blood volume can be used to calculate the average heart rate (HR) which is one of the major applications of PPG among others as mentioned above. However, PPG has one distinct advantage over other HR calculation modalities like phonocardiography (PCG), electrocardiography (ECG), etc. as it does not require any specific technique to attach sensors at pre-defined positions in the body [12]. In fact, PPG can be collected from the earlobe, finger, or wrist [13]. This ease-of-use has made PPG attractive to calculate HR during exercise and other physical activities.
During physical activity, PPG channels (channels which are used for HR calculation) and motion channels (acceleration (ACC) channels) become correlated. Hence, PPG channels are directly affected due to noise contents added due to ACC channels. However, these noise contents are different from other noises like environment noise as they have high amplitude and are able to alter the signal morphology, hence changing its spectral contents. These special types of noise contents which arise due to physical activity are known as motion artifacts (MAs). Moreover, MAs can add contents in which spectral contents may overlap or become too close to actual HR. Figures 2a, b and 3a, b show the effect of MAs on a dual channel PPG signals. In Fig. 2a, the person is at rest or moving slowly. The peaks in both PPG channels represent the actual HR. But in Fig. 2b, the person is either doing exercise or moving using some energy and the major spectral peak Fig. 1 a Transmittance and reflectance type PPG [10]. b Absorption of light in tissue [10] Ismail et al. EURASIP Journal on Advances in Signal Processing in PPG1 and a minor peak in PPG2 represent the actual heart rate. In Fig. 3a, one of the minor peaks in PPG1 represents the actual heart rate. But PPG2 does not have a single peak which could represent the actual heart rate. Moreover, MAs in ACC channels appear directly in PPG channels. Finally, in Fig. 3b, both channels are devoid of peaks which could represent the actual heart rate. Moreover, both PPG channels in Figs. 2b and 3a, b are affected differently due to the intensity and type of exercise that the person was doing. Due to high probability of proximity of HR and MA spectral peaks and spectral peak randomness, removal of MAs is the single most important preprocessing step in classical signal processing to calculate HR calculation from PPG. In literature, PPG signals are not used to calculate HR directly but in fact, HR calculation is divided into phases like pre-filtering, MA removal, peak detection, and peak tracking. However, we will address the problem in two distinct stages, i.e., pre-filtering and peak tracking where pre-filtering is composed of noise and MA removal and peak tracking is consisted of peak detection and tracking. A general system which can be used to analyze the PPG for heart rate estimation and tracking is shown in Fig. 4. In the beginning, PPG and ACC signals are passed through a MA removal and filtering stage which cleans the signal to make it fit for processing in the tracking stage and we have the final heart rates from tracking stage.
Adaptive filtering complemented by other preprocessing techniques and multiresolution decomposition are dominant approaches used to pre-filter the signal. After the signal is made comparatively motion artifact (noise(s)) free, it is fed to the tracking algorithm which uses signal properties to track HR.
The article consists of 5 sections. Given below in the "Datasets" section are details of datasets used by researchers to develop and quantify their research findings. After datasets, MA removal using pre-filtering techniques of adaptive filtering and multi-resolution decomposition are discussed, followed by the trackers in the "Methods" section. A detailed discussion about the findings of review is presented in the "Discussion" section. The "Conclusion" section concludes the review.

Datasets
Datasets being used in the PPG research are both private as well as public in nature. They have been recorded from healthy as well as critically ill persons using variety of equipments. A brief introduction of more popular datasets is given below.

IEEE Signal Processing Cup (SPC)
IEEE SPC [14] is the first publicly available PPG HR dataset which is labeled as well. It is divided into three parts. The first chunk is composed of 12 training signals uploaded in August 2014. Another training signal and 10 test signals were uploaded in January 2015. The aim of uploading this dataset was a competition for HR calculation. All training data (12+1) is composed of 6 channels. The first channel is an ECG reference, next two are PPG-based signals, and the final three signals are acceleration (ACC) signals along x, y, and z axes. Test data (10 signals) is composed of five channels in which the first two channels are PPG and the three additional channels are ACC signals [14]. Reference ECG and ground truth are provided separately. All signals whether ECG, PPG, and ACC are sampled at 125 Hz using green LEDs which uses a wavelength of 515 nm. There are two types of exercise patterns reported which are given below Type01 exercise generally refers to whole body motion like running, jogging, etc. Type02 are exercises which usually involve forehand and wrist motion. Because of public availability, this dataset is used by researchers as a reference for HR estimation (MA removal and peak tracking ) along side the private datasets.

Chon Lab dataset
Chon Lab dataset is a private dataset which was recorded in Chon Labs using equipment (pulse oximeter and accelerometer) developed at the same lab [15]. It was gathered from 10 healthy persons using a forehead band. Both PPG and ACC signals were recorded at red and infrared wavelengths using sampling frequency of 80 Hz. The reference ECG was at much higher sampling frequency of 400 Hz recorded from the chest. Subjects performed walking, jogging, and running for 9 min followed by 1-min random arbitrary motion [15].

IEEE TBME Respiratory Rate Benchmark dataset
The IEEE TBME dataset is a dataset composed of capnometry (25 Hz) and PPG (100 Hz) signals. Both capnometry and PPG signals were recorded using S/5 Collect software using sampling frequency of 300 Hz [16]. It has 8 min, raw PPG signals for 42 persons. PPG pulse and artifacts are labeled. This dataset is basically a respiration dataset, but it also contains ECG and HR data. This dataset was first appeared in research work conducted by Karlen et al. [16].

BIDMC PPG and Respiration dataset
The BIDMC PPG and Respiration dataset is different from usual datasets as it was acquired from critically ill patients at the Beth Israel Deaconess Medical Centre, USA. The 53 annotated recordings are each 8 min long. PPG, impedance respiratory signal, and ECG are sampled at 125 Hz. HR, respiratory rate (RR), and blood oxygen saturation level (SpO2) are all sampled at 1 Hz. This dataset is available in three different formats, i.e., WFDB (WaveForm DataBase), CSV (comma-separated-value), and Matlab (R) formats. Pimentel et al. [17] used this dataset to estimate RR.

Synthetic dataset
Synthetic dataset is basically output of different computer algorithms which take baseline wander (BW), amplitude modulation (AM), or frequency modulation (FM) as input. This dataset is a collection of ECG and PPG signals. In order to make the dataset more realistic, it was implemented for a range of HRs and RRs. Charlton et al. [18] introduced this dataset. The total number of ECG and PPG signals is 192. Among the datasets discussed above, IEEE SPC and Chon Lab datasets are created solely for purposes of HR tracking from PPG signals. While IEEE SPC is the most famous public dataset for HR tracking and is usually used to quantify the performance of any HR calculation technique, Chon Lab dataset on the other hand is private in nature. IEEE TBME dataset and BIDMC PPG dataset are basically for RR calculation, and researchers usually use it for RR only [16,19]. However, as heart rates are also available, so they can be used for quantification of technique for heart rate calculation as well. But a recent trend is to use these datasets for other classifications like personal authentication as reported in works [20,21]. All the datasets mentioned till now are collected from real persons. However, Synthetic dataset is a computer-generated dataset which was created for purpose of algorithm testing.

Methods
As mentioned earlier, PPG analysis is divided into MA removal and HR tracking. The details of both stages will be discussed separately.

Motion artifact (MA) removal
As many MA removal techniques are proposed over years, we would like to discuss the findings by further categorizing techniques into adaptive filtering, multi-resolution, and other techniques.

Adaptive filtering-based preprocessing
In signal processing applications, signals are pre-filtered to make them suitable for the desired application. Usually, the noise we want to remove have spectral contents which does not overlap with signal contents. In such cases, static filters (in which filter coefficients remain constant) can be used. However, if spectral contents of noise overlap with signal or we have little or no information regarding the noise, then static filters cannot be used and we have to consider the adaptive filters. Adaptive filters are of different types but the configuration which is usually used in PPG is shown in Fig. 5.
These types of filters are known as least mean square (LMS) filters, and they are named after the error they reduce. For these types of adaptive filters, we need a reference signal. In PPG processing, we have ACC signals as noise reference. In comparatively earlier work, Chan and Zhang [22] used adaptive filtering to remove MAs. The authors quantified the performance of adaptive filtering by comparing correlation between unfiltered PPG and noise versus unfiltered PPG and output of LMS filter. Results showed that correlation between PPG and uncorrelated output of adaptive filter was improved which showed effectiveness of using adaptive filtering to remove noise from PPG. Comtois and Mendelson in their work [23] proved that tri-axis ACC data can be used an input for MA removal and HR calculation, aside from SpO2 calculation. For their research, the authors collected PPG and ACC signals from the forehead. In similar work, Kim et al. [24] put forward an idea that signal collected from the forehead had fewer MAs than signals collected from other body parts and hence HR collected from the forehead was more reliable. Pengfei et al. [25] designed a wrist band type PPG sensor in which triaxis Micro Electro-Mechanical Systems (MEMS) ACC sensors were used and put forth the idea that two ACC axis can be used for the reduction of MAs using fast transversal recursive least squares filter (FTRLSF) algorithm. Fast transversal filter (FTF) was used for fast recovery of clean PPG. Fallet and Vesin [26,27] made different combinations of PPG and ACC signals to track HR. In these works [26,27], they used normalized LMS (NLMS), a variant of classical LMS, to remove the motion-related noise. Mashhadi et al. [28] used all three ACC signals as reference to remove noise from two PPG signals. However, ACC signals were not fed directly to adaptive filters but first, they were decomposed using singular value decomposition (SVD)-based technique to generate reference MA signals. The approach used by Ahamed and Islam [29] is based on information from tri-acceleration signals. However, only signal having the highest band power in frequency range 0.45-2 Hz was considered for reference generation. The authors observed that ACC signal with the highest power in mentioned frequency range had peaks which were correlated with the highest peak in PPG signal in the same range.
Wood and Asada [30] designed a jogger-specific MA removal system in which PPG was sampled using a finger-worn ring ACC sensor. They modeled MA removal problem as system identification problem using Windrow adaptive noise canceler using finite impulse response (FIR) filter [30]. They extended their work to design FIR filter using Laguerre series and concluded that Laguerre-based FIR filter output closely resembled clean PPG signal [31]. In a couple of recent studies [32][33][34][35], the authors have used different variants of adaptive filters to suppress MAs. The authors in [32] removed artifacts using NLMS filter as this filter has low computational complexity and complex NLMS (CNLMS) was used in [34] to remove artifacts in cascaded format. Researchers in [33,35] complemented their adaptive filters with principal component analysis (PCA) and Newton adaptation. The studies discussed so far have used the original ACC signals as noise reference in one way or the other. Noise can be generated synthetically, and researchers like Ram et al. [36][37][38] and Yousafi et al. [39,40] have used synthetically generated noise in their research. Yousefi et al. [39] first focused on the fundamental frequency present in the PPG and used comb filter to generate reference. Ram et al. [36,38] generated noise reference by setting coefficients of cardiac and respiratory components to zero in FFTbased spectrum. However, the approach used in both researches [37,40] was similar and based on generation of noise using SVD, independent component analysis (ICA), and FFT. Another approach of MA removal is to model them using polynomials. In studies like [41][42][43], the authors modeled MAs present in ACC signals using non-linear second-order Volterra filters. In Eq. 1, signal x is modeled as non-linear second-order Volterra filter. (1) In Eq. 1, term 1 models signal x linearly while term 2 shows the non-linear part of the equation.
The approach in [41] was different from the other researches in that first it used cross bicoherence to detect MAs in PPG. A comparatively less used approach is to use Kalman filtering (KF) to suppress or remove MAs. In [44], Frigo et al. used KF to suppress the MAs but instead of using conventional KF only, they introduced the smoother version with constant step.
As discussed above, adaptive filtering appears as a powerful technique to remove MAs because of a variety of available implementations, ease of use, and compatibility with other methods.

Signal decomposition-based MA removal
As discussed earlier, MAs can change the morphology of PPG signals in such a way that spectral contents related with MAs can mask the spectral contents representing the actual HR contents as shown in Fig. 3b. Removing artifacts in such cases becomes a challenging problem. Signal decomposition is another, yet useful methodology to remove the MAs aside from adaptive filtering. In signal decomposition, the first signal is represented at multiple time and frequency resolutions using any suitable technique. Then, signal resolutions which are more related to MAs/noise(s) are removed and the signal is reconstructed using the remaining resolutions. In essence, we have a new yet clean representation of signal which can be used for further processing.
Wavelets are famous multi-resolution filters which are in use owing to functions, they make available for use in different applications. In one of such works [45], Kasambe and Rathor used very large-scale integration (VLSI)-based wavelet denoising to remove MAs. In studies [46][47][48][49][50], the authors used wavelet transform for PPG denoising. The use of wavelets was generally limited to denoising [46][47][48][49][50], and researches usually reconstructed PPG signal again which was used for HR estimation. However, Rojano et al. [47] went on to compare wavelet with Singular Value Decomposition of the Time Frequency Distribution (SVDTFD) and Ensemble Empirical Mode Decomposition (EEMD). They reported that DWT falls between SVDTFD and EEMD and is better than EEMD but lags behind SVDTFD.
Empirical Mode Decomposition (EMD) is another signal decomposition technique which is similar in working with wavelets with the difference that instead of using filters at various cutoff, it operates in signal domain. It resolves signal into multiple resolutions by considering minima and maxima in signal. Zhang et al. [51] and Emroz et al. [52] both employed EEMD for HR detection. EEMD is an updated version of EMD which is known to produce intrinsic mode functions (IMFs) which have better physical interpretation [53,54]. Tang et al. [55] combined EMD with wavelet to provide better denoising. The authors concluded that employment of EMD at the first stage enhanced the signal quality which when de-noised by wavelet enhanced signal quality to extend that was much better for noise removal. The techniques discussed so far operate on signal itself, but there are techniques which decompose signal using mathematical operations. Among these techniques are techniques like SVD, singular spectrum analysis (SSA), variational mode decomposition (VMD), sparse signal recovery (SSR), etc. In this section, we have restricted our discussion to SVD, PCA, ICA and SSR only.
SVD is a technique which converts a 1-D signal to matrix and decomposes it using relation 2 In the above relation, both U and V are orthogonal matrices containing left and right singular vectors. S is a diagonal matrix and its diagonal values contain singular values In the above matrix, S s corresponds to singular values related to clean signal, S s,n are noisy singular value, and S n are noise only values. But removing S n and suppressing S s,n systematically, signal can be de-noised. In one of the earlier attempts, Reddy and Kumar [56] and Lee et al. [57] de-noised the PPG signal using SVD. Biagetti et al. [58] complemented SVD with Hankel transform to remove MAs. Hankel transform is a closely related transform of the Fourier transform especially in case of radially symmetric functions. The relationship between Fourier and Hankel is given in Eq. 3 where f (r) is radially symmetric function and J 0 (kr) is zero-order Bessel's function. PCA and ICA are closely related techniques and also work similar to SVD to remove MAs. Both ICA and PCA are similar as both try to remove correlation. While PCA keeps higher order statistics but ICA tries to remove them as well. Kim and Yoo [59] made use of ICA to remove MAs.
Ghosal et al. [60] compared ICA and PCA and found that both ICA and PCA have comparable performance in signal reconstruction but PCA is better for noise reduction for SNR above 25 dB. Galli et al. [61,62] used maximal incorrelation between PCAs generated from PPG and ACC. In [62], the authors used PCA which have minimal correlation with ACC but they refined the criteria and used PCAs generated from all ACC signals and had incorrelation less than the threshold.
Sparse signal recovery (SSR) is another decomposing technique which is gaining popularity in recent years in signal processing domain. In SSR, signal is reconstructed using sparsity assumption, i.e., signal has limited number of non-zero components in its spectra. SSR performance is degraded if the signal is not sparse. Sparsity can be achieved using any signal decomposition technique like PCA, etc. Karna and Kumar used PCA for MA reduction during SSR [63]. In the following discussion, we have chosen research works which rely on SSA. In SSA, the signal passes through stages namely embedding, SVD decomposition, grouping, and reconstruction [64]. During the embedding stage, each signal is converted into a L-trajectory matrix which is then decomposed to linearly independent rank 1 matrices using SVD. Rank 1 matrices are classified into groups having the same or harmonically related oscillatory components. Groups which have frequency components inside specified range are chosen and signal is reconstructed again. The reconstructed signal after SSA generally has low artifacts. Ziling Zhang utilized SSR in a number of works [64,65] where at first SSA was performed followed by taking temporal difference of signals. The author observed that the first-and second-order difference maintained the fundamental frequency and harmonics. The difference signal was subsequently used to perform SSR. In [64,65], the authors utilized only the single ACC signal for reconstruction. However, researchers in [66,67] included all ACC signals as they observed that additional information from other channels was helpful in reducing MAs.
As signal decomposition-based techniques have been developed over a period of time and as such are able to remove MAs both in stationary and quasiperiodic signals.

Other techniques
Aside from adaptive filtering-and signal decomposition-based MA removal, methods based on signal statistics, spectrum subtraction, Wiener filtering, and heuristics, etc. are in wide use as well. Hayashi and Ooi [68] converted tri-axis ACC into exercise intensity and based on mean and standard deviation of exercise intensity detected and removed MAs by scaling the spectrogram using Gaussian distribution. Dubey et al. [69] considered PPG and ACC signals to be quasiperiodic and hence can be modeled by finite harmonic sum model (HSUM). The joint HSUM was used to remove MA frequency from HR frequency. Schack et al. [70] reduced the MAs by summing the squared spectra. This summation enhanced the common spectral components in PPG and reduced the MArelated components. Harmonically related components of MAs were further reduced by Gaussian bandpass filtering.
A very simple, yet, useful approach is to subtract the spectra of ACC from PPG spectra. However, subtracting ACC spectra from PPG spectra results in many negative peaks in resultant spectra. Moreover, at times, spectral components related to heart rate may either be totally removed or suppressed significantly. Nonetheless, this subtraction approach was used in number of works like [66,[71][72][73][74]. Researcher used different methodologies to deal with the abovementioned problems. While Zhu et al. [71] set all negative peaks to zero after subtraction, [72] used heuristics to counter their negative peaks. Nowak et al. [73] had applied non-negative matrix factorization before applying subtraction, and Sun et al. [74] introduced a new methodology named as Grid-less Spectral Estimation (GRESS) which is basically a sparsity-based method for spectral estimation. Mashhadi et al. [75] introduced a novel method of spectral division in which PPG spectra are divided by ACC spectra. The author had opinion that this method is useful as it automatically scales PPG spectra and has linear time complexity.
Wiener filter is another powerful technique to remove motion-related artifacts from PPG which was employed by [76][77][78]. Wiener filter is a filtering technique used in digital signal processing to enhance SNR. The formulation used by Temko et al. [76] is given below In Eq. 4, P XX (f ) is the power spectrum of noisy PPG and P NN (f ) is the spectrum of ACC signals. By studying the signal properties, one can introduce heuristics to remove motionrelated components. This heuristical approach has been used in studies like [79][80][81]. A summary of the different MA removal techniques is given in Table 1. From Table 1, it is apparent that the authors generally used both PPG and all ACC signals which were first subsampled and passed through a bandpass filter [0. [4][5]. Then, resultant subsampled and filtered signals were cleansed using techniques discussed above. Sometimes, the technique developed had specific requirements like ring type sensor in the case of Wood and Harry [30] or it used some specific model like the exercise model as in the case of Biagetti et al. [58].
From the discussion detailed above, it can be inferred that MAs introduce nonstationarity in cyclo-stationary PPG signals. However, techniques based on adaptive filtering, signal decomposition, heuristics, and other methods have been developed to the extent that they can remove MAs with satisfactory results.

Heart rate tracking
In PPG-based HR tracking, detection and tracking of HR is the complementary part. There are many approaches to track the HR and generally they use signal properties, heuristics, and features extracted from signal. We have categorized these techniques into signal decomposition, signal subspace, signal processing, and machine learning-based methods. The trackers are named as signal decomposition-based tracker (SDT), signal subspace-based trackers (SSTs), signal processing-based trackers (SPT), and machine learning-based trackers (MLT) based on the properties they are using. These tracking methods are discussed below starting with signal decomposition-based trackers.

SDT
SDT is a tracker which uses high-level signal decomposition in MA removal stage. Among decomposition techniques, only trackers based on SSR, EMD, and short-term Fourier transform (STFT) are discussed here. SSR was used by Zhang et al. in number of works [64][65][66]. The ground work was conducted in [64] in which the author used a single ACC signal as noise reference. TROIKA, as a full-fledge framework was presented in [65]. TROIKA is based on SSA, temporal difference of reconstructed signal, SSR, spectrum estimation, and spectral peak tracking with verification. These processing steps were grouped together in initialization, peak selection, and peak verification stages. Another framework closely related to TROIKA is Joint Sparse Spectrum Reconstruction (JOSS) [66]. Instead of utilizing single ACC signal as reference, it used all three ACC signals. Moreover, the authors included Smoother algorithm [82] to predict the value of HR when HR change was greater than 10 in the peak verification stage. It added peak discovery as the final stage during which Smoother algorithm was again used with wider search area to detect macro-trend. The authors reported the reduction of error from TROIKA [65] to JOSS [66].
EMD is a technique which operates in signal domain to remove MAs for better HR tracking. However, EEMD and Complex EEMD (CEEMD) are preferred over EMD  because of producing IMFs with better physical interpretations. CEEMD was used in research work [51] where authors concluded that after denoising (+MA removal) by CEEMD, signal was made clean to extend that simple rules based on limited HR change (≤ 10) between consecutive windows and two spectral peaks were enough to tracking HR. Among both peaks which ever has HR change from previous peak less than a defined reference was considered as final peak. If none of the peak had HR change less than the reference, then the final HR was based on direction and magnitude of error. EEMD was used by Emroz et al. [52], and they tracked HR changes from multiple sources including EEMD decomposition, RLS filtering, time domain processing, and crude PPG using different sampling frequencies (25,125,250, and 500 Hz). However, for tracking, technique worked in progressive way and tested whether HR calculated from raw PPG was similar to HR calculated from other sources. If yes, then PPG HR was considered as final. Otherwise, HR from other sources (one at a time) was final. For different sampling frequencies, the authors reported mean error of around 1 as shown in Table 2.
STFT is yet another powerful technique which decomposes the signals. However, it does not generate new signals like EMD or SSA. It simply divides the given signal into multiple time windows. By calculating FFT of each window, we can get useful information regarding the change in frequency contents with time. STFT-based approaches were adopted by [72,83]. Work in [72] was composed of normalization, conditioning, and masking of spectrogram to calculate the final HR. Spectrogram conditioning dealt with the spectral subtraction of ACC from PPG and it used dynamic time warping. Final HRs were based on the shortest path search problem. Insoo et al. [83] proposed a cost function to give score to PPG peaks based on normalized energy, closeness to motion, and previous HR peaks. Finally, HRs are detected using a model based on previous peaks. While work in Zhing and Jafari [72] evaluated their system performance on standard TROIKA dataset [65], Insoo et al. created their own dataset and used it to report and evaluate their system performance. Wavelet and VMD are other decomposition techniques which can remove MAs so that simple heuristics-based algorithm can be used to calculate HR. These approaches were adopted in [46,84] where authors concluded that HRs were tracked robustly.

SST
SSTs are based on orthogonality condition which states that spectral space related to HR is orthogonal to spectral space which contains MAs. Mathematically, this condition is represented in form of Eq. 5 where F HR and F MA are HR and MA spectra, respectively. The operator '. ' is dot product of both spectra and it is zero when F HR and F MA spectra are normal to each other. Closestsubspace algorithm for reducing motion artifact (CARMA) and spectral filter algorithm for MAs and HR reconstruction (SpaMA) are two of the most famous SSTs. CARMA, presented by Baca et al. [85], used Hankel transform to model heart and motion-related spectra. Then, it removed motion-related spectra from noisy PPG spectra employing SVD decomposition as SVD gives output in terms of orthonormal components. Afterwards, a tracking model was introduced which the authors claimed to track the HR even in the cases when initially tracker was off track. Biagetti et al. [58] updated the original CARMA  technique in another work where they introduced the exercise models to further enhance its capabilities to track HRs. SpaMA was developed by authors [80] who claimed that clean PPG could be constructed using just four spectral peaks. Out of four peaks, three peaks from PPG spectra and a major peak from acceleration spectrum are selected. The reconstructed clean PPG signal can be used for HR tracking. During the technique, first, both PPG and acceleration signals were downsampled and power spectral density was calculated. Then, three peaks from PPG and a single peak from acceleration were chosen and compared. If the output of the comparison was positive, then discard the first peak as it is corrupted and HR is the frequency associated with the second peak. If both first and second were corrupted, then HR was the third frequency. If all three peaks were corrupted, HR was HR of the previous window for real-time application and HR was calculated using cubic spline interpolation for offline application. Aside from HR calculation, the authors reconstructed the PPG using sample amplitude, phase, and frequency information which was used for HR variability. Reiss et al. [86] extended the algorithm based on the fact that HR from the last window was not so robust. The authors then tracked the HR using mean of the last six windows when HR was uncertain. This new tracker is known as SpaMa plus. Cross bicoherence test [41] and Harmonic sum (HSUM) [69] are two measures which consider PPG and ACC signals occur in spaces which are normal to each other. In their work [41], the authors observed that significant bicoherence was present for highly correlated PPG and ACC data segments. These contaminated signal segments were cleaned using second-order Voltera filter. The cleansed segments were fed to Eigenvalue method and finally HRs were calculated using a model. Dubey et al. [69] first employed HSUM to model ACC signals from which frequencies related to MAs were calculated. Another HSUM was developed for PPG which was basically a joint HSUM and contains frequencies corresponding to both ACC and HR. Since frequencies related to motion were already determined using ACC HSUM, the remaining frequencies corresponded to heart rate.
All SSTs except SpaMa reported their system performance on dataset of IEEE Signal Processing Cup containing training data of 12 subjects [65]. SpaMa used complete TROIKA and Chon Lab datasets [15]. The authors concluded that SpaMa had reported the least error among its competitors.

SPT
SPT are the ones which use classical signal processing techniques like filtering, correlation, etc. Schack et al. [70] introduced a tracker which they claimed to be useful for real-time applications as it used very less computational resources. During processing, both PPG and ACC signals were filtered using FIR bandpass filter with 0.5-6 Hz and subsequently downsampled to 25 Hz. The vectors composed of normalized autocorrelation of both PPG channels and a vector of cross correlation of both PPG were calculated. The squared spectra of these vectors were added together to enhance the common components in PPG. The resultant squared spectrum was then filtered using a Gaussian band stop filter. The parameters of bandstop were controlled using maximum energy values in ACC signals. Finally, HR was tracked using a cost function dependent on least square fit of the last three HRs.
Torres et al. [87] and Zhu and Du [67] both used bandwidth adjustment for HR calculation. Torres et al. [87] approach was basically a heuristically based approach composed of peak selection and verification stages which iteratively changed the bandwidth of Blackman Harris window filter. Zhu and Du [67] selected two peaks from cleansed PPG signal and constructed a 0.32 bandwidth filters around peaks. HR was calculated by considering weighted heart rate of both channels which was then passed through moving average filter (average of last 4 and future 4 heart rates plus present) and smoother algorithm ( for start (first five) and tail (last four)). Fallet and Vesin used adaptive notch filtering in both of their studies to segregate the HR from MAs [26,27]. HR modeling is yet another approach to calculate and track HR which appeared in number of research works. While [44,61,62] modeled HR using Kalman filtering which passes through prediction, estimation, and update stages, Ahamed and Islam [29] developed an empirical model for estimation. The estimation was then passed through mean filter and a smoother same as [67]. Phase coder is a technique to enhance the resolution of FFT by transforming to Polar coordinates. Phase coder was used in works conducted by [34,76] where it was complemented by simple heuristics and TROIKA [65] for HR calculation, respectively. Viterbi algorithm (VA) is a technique to explore the hidden states in a sequence. It was exploited by [73] where it was used along side particle filtering [88]. Particle filtering is self appeared to calculate HR. Phase vocoder (PV) and VA appeared together in a work [78] to calculate HR tracking, respectively.
Aside from classical signal processing techniques, adaptive filtering can also remove artifacts to such an extent that HR tracking can be done directly. Adaptive filtering is a powerful technique to remove noise/MAs from noisy PPG using ACC signal/signals. Strong heuristics can be created to estimate the HR while these heuristics may be empirical/observation based. Heuristics may be simple/complex based on proceeding noise/MA algorithm. A number of authors have used adaptive filtering to first clean the PPG signals from which HR were calculated using either modeling or heuristics. Heuristic-based trackers appeared in number of studies like [33,79,[89][90][91][92][93][94][95]. Pan et al. [89] employed HR modeling along with heuristics to reduce the error. Mashhadi et al. [75], instead of using heuristics, employed lazy tracker which was basically an averaging tracker. The strongest motivation in the development of signal processing-based tracker is to develop such a tracker which has low computational cost and as such can be used for realtime processing. The trackers discussed above generally fulfill this criterion, but they also have reported very small absolute errors and as such are very robust in nature as well.

MLT
Machine learning is a comparatively new approach to solve signal and image processing- The system introduced by Roy et al. [101] was a personalized system, and it focused on single PPG beat instead of the whole PPG signal. During the training phase, two multilayer perceptron-ANN (MLP-ANN) were trained. One of them accessed the quality of PPG as being clean or noisy and this ANN also generated a reference beat template. Input to other ANN are set of features and weight matrix. The features are generated by auto encoder neural network (AE-NN) after PPG beat matrix is updated after PCA reconstruction. The updated PPG beat matrix is optimized using Particle Swarm Optimization (PSO) resulting in weight matrix which is used as second input to NN. This second ANN uses PSO and AE-NN during the training phase but it deploys only AE-NN during test. The output of this second ANN is another weight matrix which rectified the noisy beats. Since beats vary in length, networks are trained in such a way that each incoming beat is truncated or extrapolated based on template beats. Another work which uses similar setting for processing is reported in MoDTRAP [100]. This system uses EEMD and LSTM for reference beat generation. However, VMD is used for heart rate tracking. The author suggests that VMD is able to extract heart rate even when heart rate is totally suppressed in the spectrum. Both of these systems reported a very high classification for heart rate tracking and respiration rate both a private as well as on SPC11 dataset.
Reiss et al. [86] and Biswas et al. [20] both used CNN and CNN and LSTM, respectively, for PPG-based heart rate tracking. CNN is basically a deep learning approach and image as well as signal can be given as input to CNN. Generally, images are given as input and network is trained. Since training period is usually extended, a number of CNNtrained networks are available online like Densely Connected Convolutional Networks [102], ImageNet [103], and Inception [104]. If, any of these off-the-shelf solutions is to be used, then signal must be converted to image [105]. However, if it is required that signal must be retained in original form due to inherent signal properties, then CNN has a unique capability that it can be used as feature extractor as well. In such a case, CNN is complemented by another network like LSTM. Since heart rate tracking is basically a regression problem, CNN and other classifier can be implemented in such a way that they can handle regression problem. LSTM being a memory module can model HR changes as a sequential process. Reiss et al. [86] introduced the use of CNN as an approach alternative to classical approaches for PPG-based HR tracking. Researchers concluded that it could remove or at least reduce the parameters used in classical approaches. The authors in Biswas [20], PPGnet [98], and PPnet [99] used PPG in its signal form unlike Reiss where it was first converted to image using STFT. Here 1-D CNN was used as feature extractor and the extracted feature were fed to LSTM which tracked the heart rate. While PPGnet is only a heart rate tracker, PPnet also gives diastolic blood pressure (DBP) and systolic blood pressure (SBP). Tracker introduced by Biswas was used for personal identification as well. All of these studies reported small average error. The main drawback of using the machine learning is the computational load associated with this approach. This computational load is especially very high during training phase. A number of recent studies [96,97] has focused on this problem, and instead of using real number for bias and weight, they have trained and evaluated their system performance using binary weights and biases. The resulting system is named as Binary Cornet (b-Cornet) which uses binary CNN (b-CNN) and binary LSTM (b-LSTM). The studies showed that they had results which are comparable to state-of-art. SVM is a linear classifier which is used for linear/non-linear classification and is generally used in PPG-based HR tracking to classify the spectral peak, related with HR. Sun et al. [74] and Xiang et al. [35] both trained their SVM empirically and did classification based on spectral peak separation and spectral peak ratio. RF is yet another choice for classification and was used by Yalan Ye et al. [42] and the author found it to be robust. Zhu et al. [71] developed a model for variation but employed neural network and linear regression for HR tracking.
Machine learning-based trackers suffer from limitation of input data for training purposes which is resource hungry phase. However, once a classifier is trained, testing is generally fast. In case of PPG-based HR monitoring, data is more scarce and TROIKA is one such publicly available downloadable source. However, despite this scarcity, researchers have put forward systems which can track HR robustly [35,71].
From the discussion about trackers presented above, it can be inferred that different trackers have been able to track heart rate both online and offline. A summary of all trackers is presented in Table 3. Any trackers can be broadly categorized into machine learning tracker (MLT) or non-machine leaning tracker (NMLT). While NMLT requires preprocessing which limits the solution space to spectral range between 0.4 and 6 Hz, these trackers generally use exercise model and ACC signals to decorrelate the heart rate from MAs. The ability of NMLT to track in real time is dependent on model as SST can be tailored to track in real time and SPT are designed to operate in real time. However, all NMLT have been tailored for general solution and are not subject specific in nature. On the other hand, MLT are dependent on training, and once trained, they are operated in real time. These trackers do not limit the search space to general heart rate spectra [0. [4][5][6]. Moreover, they are subject specific and will track multiple biometric markers simultaneously in future as shown [20].
From the detail discussion presented above in the "Methods" section, it is evident that both preprocessing and HR tracking have become mature. One way is to heavily preprocess the signal and track HR based on simple heuristics/modeling. The other way is to shift the main work load in tracking phase. Both the tracks are producing results which show their robustness and general trend towards error reduction.  Table 1 shows that PPG signals are usually filtered using bandpass filter with cutoff frequencies in range of 0.4-5 Hz. This is due to fact that frequencies below 0.4 Hz in the PPG spectrum refer to energy contribution from DC or non-pulsatile component of PPG signals. Also, frequencies above 5 Hz in spectrum related to heart rates above 300 bpm. Usually heart rates rarely jump above 300 bpm even during high-intensity exercises. Hence, PPG signals are usually processed in frequency range 0.4-5 Hz. Downsampling to lower frequency is often used with bandpass filtering to reduce computational cost of algorithm and to make algorithm fit for real-time applications. Researchers have used different combinations of PPG and ACC signals to detect and remove MAs. The actual MA removal techniques include adaptive filtering; ACC modeling like Volterra, etc.; multi-resolution signal decomposition like wavelet, etc.; matrix decomposition like SVD, etc.; and others. All the techniques mentioned are able to remove MAs to the extent that researcher deemed them fit for their tracking algorithm. However, sometimes, researchers have developed algorithms which needed additional requirements like Pengefi et al. [25] which required additional reference or Levi [30] which was specifically developed for Joggers. The dataset generally used for mentioned researches is IEEE SPC 2015, because it is a publicly available dataset and is labeled as well. The advantage of one preprocessing technique over the other for MA removal cannot be established directly as quantification measures are not available. Moreover, both PPG and acceleration signals might be single channel or multi-channel inputs. However, techniques which use all the information available, generally, have reduced errors (Error1 and Error2). This is due to the fact that probability of a clean channel among the multi-channel PPG is high. However, processing multi-channel PPG and ACC takes time. A composite ACC signal can be constructed directly from the relative intensities of ACC channels or it can be constructed using major spectral peaks from all ACC channels. Finally, tracking is based on a single yet relatively clean channel PPG and a composite ACC signal. Table 2 shows the HR tracking techniques along with quantified results. The results presented are for all types of trackers mentioned above, i.e., signal decomposition, signal subspace, signal processing, and machine learning based. The performance of these trackers is usually quantified using averaged absolute estimation errors, Error1 and Error2, their standard deviation, Pearson's correlation, and Bland-Altman's (BA) plot. Error1 and Error2 quantify overall performance of system including preprocessing and tracking as there is no separate measure to quantify the performance of MA removal technique. Error1 and Error2 are defined below where BPM est (i), BPM true (i) represent estimated and ground truth in ith window, respectively, and W is total number of time windows. Pearson correlation (PC) and Bland-Altman both show how close the estimated heart rate and ground truth are. PC is a well-known measure which varies between [0,1] where high values show high correlation. Bland-Altman is a graphical representation which is also used to show correlation. Figure 6 shows an example of PC graph and BA plot. The results discussed in Table 2 are in terms of Error1 and Error2 as it is a general observation that when Error1 and Error2 are reduced then other errors are reduced as well. Hence, the following discussion will be focused only on Error1 and Error2. An overview of Table 2 shows that Error1 and Error2 are getting reduced with every new research for each category. For example, TROIKA [65], published in 2015, one of major HR trackers, had errors of 2.42 and 1.82 in comparison to Emroz et al. [52], 2016, which have 1.02 respectively. Similarly, the error of 1.03 reported by Zhu et al. [71] is much less error reported by Biswas et al. [20]. Moreover, the smallest error in each category is around 1 for the class type01 exercises [14]. But when the whole dataset is considered, then the error is relatively high as shown by error of 1.93 and 3.56 by SpaMA [80] or 1.96 and 1.47 reported by [20]. This is due to the fact that the whole dataset is composed of additional types of exercises. In essence, techniques discussed above still need refinement to model types of exercise in which the forehand and wrist are used extensively. Hence, applications can be built based on additional hardware, if required, which are tailored to a specific type of motions. Methods mentioned in Table 2 can be directly compared in terms of mentioned errors, but they lack comparison in terms of resources. Any method which is using any multi-resolution decomposition/adaptive filtering will take more time than any heuristic method for MA removal and HR tracking. Hence, all methods mentioned were generally developed for offline processing except SPT and MLT. The development of real-time application poses many challenges like computational load should be minimum and the tracker should not be too much dependent on history and should be able to track even when it goes off-track due to any reason. Reducing computational load is a challenge as low-resolution signal will reduce useful information. But if signal is sampled at high frequency, then downsampling to a very low frequency is challenging as design of such narrow band filters will be a problem. Moreover, the advantage of sampling might be totally compromised if the signal is downsampled to a very low frequency. Using tracking history is usually an advantage but the tracker should be used to smooth the output rather then tune or control tracker parameters.
A number of research works presented in the "Methods" section uses TROIKA dataset [65]. These research works are based on an assumption that the signal at the beginning is taken from person at rest. Hence, this patch of signal has very low noise and as such can be used as reference for future processing. However, this assumption is very strong in nature as a person might start tracking his/her heart rate when they are already in brisk motion during high physical activity. Hence, methods independent of such assumption should be focused for any application. Pearson correlation calculation and plotting [43]. The figure shows how close the estimated heart rate to the actual heart rate Finally, the author would now like to propose a system which, in theory, could track the heart rate in an online and offline application. One such system is shown in Fig. 7. In online mode, Fig. 7a, a forehead-based PPG sensor samples the signal at 500 Hz which is then sampled down to 25 Hz and bandpass filtered to 0.5-4 Hz. Downsampling and filtering reduces the computational load as well as removes spectral contents outside normal heart rates. Twenty-five hertz is chosen as compromise between computational cost and signal fidelity. The range of 0.5-4 Hz generally covers the average range of the heart rate from rest to high work out. Next, the Schack algorithm is used to remove the motion artifacts. Then, the filtered signal is fed to b-Cornet where b-CNN to extract features to access the fidelity of signal and identify the exercise type and b-LSTM is used to track the heart rate changes. Based on the signal fidelity, exercise model, and LSTM output, the spectral contents reflecting heart rate can be tracked. The b-Cornet mentioned above are trained offline.
In the offline mode, the signal is just bandpassed through filter of cutoff frequency [0.5-4] without downsampling. In this way, much of signal information will be retained. Then, the signal is reconstructed using the EEMD algorithm while removing noise/MAs. Finally, from b-Cornet, the heart rate would be tracked. One important difference from online mode is continuous update of b-CNN and b-LSTM. Every new sample which is similar to any old samples will be added to the training dataset. In this way, the system will be trained continuously for future use. The use of head bound PPG sensor introduces less MAs as the head is generally still even during high-intensity motions. Hence, more focus should be on building these head-mounted sensors.

Conclusion
In this paper, we have presented an overview of techniques in two key areas of heart rate tracking, i.e., motion artifact removal and hear rate tracking. The research works on motion artifacts were grouped in terms of signal processing techniques that are mostly used. However, heart rate tracking was categorized in four types: signal decompositionbased trackers, signal subspace-based trackers, signal processing-based trackers, and machine learning-based trackers. From the review, it can be summarized that both areas of heart rate tracking are interlinked; hence, simultaneous work is needed for a good tracking system. We would like to point out that two key areas of exploration in this topic are real-time tracking system and creation of public and labeled datasets. The realm of creating a system which could track heart rate in real time is a challenging problem because of variance present in PPG signals. It is very difficult to capture this variance even in systems working offline. However, to capture this variance in real time is challenging owing to high computational resources associated with it. Different systems have been proposed in recent studies but a system which has passed clinical trials has not yet implemented. It is a very difficult to create and label a dataset which could capture heart rates under different human activities. Even creating a dataset from rest to different highintensity exercise is resource-hungry job. To create a dataset which contains normal and pathological conditions is still an open research area.