A robust multi-feature based method for distinguishing between humans and pets to ensure signal source in vital signs monitoring using UWB radar

Pets have become indispensable members of many families in modern life and are especially important to the elderly and the blind. However, they may cause false alarms when mistaken for the signal source in non-contact monitoring of vital signs using ultra-wideband (UWB) radar. Distinguishing between humans and pets can help ensure the correct signal source. Nevertheless, existing solutions are few or utilize only a single feature, which can hinder robustness and accuracy because of individual differences. In this study, we propose a robust multi-feature based method to solve the problem. First, 19 discriminative features were extracted to reflect differences in terms of energy, frequency, wavelet entropy, and correlation coefficient. Second, the features were ranked by the recursive feature elimination algorithm, and the top eight were then selected to build an optimal support vector machine (SVM) model. The area under the receiver operating characteristic curve (AUC) of the optimal SVM model reached 0.9620. The false and missing alarm rates for identifying humans were 0.0962 and 0.0600, respectively. Finally, comparison with the state-of-the-art method that employed only one feature validated the advantage and accuracy of the proposed method. The method is envisioned to facilitate UWB radar applications in non-contact and continuous vital signs monitoring.


Introduction
Non-contact and continuous vital signs monitoring is gaining increasing research interest because of its considerable market potential and significant promotion of people's welfare [1][2][3]. Ultra-wideband (UWB) radar is an effective means of realizing these goals because of its penetrability of nonmetallic obstacles, stability in light conditions, and protection of visual privacy [4][5][6]. Many studies have employed UWB radar to recognize hand-based gestures [7], detect fall incidents [8], monitor sleep [9], and obtain images [10]. However, a literature review showed that distinguishing between humans and pets in applications of vital signs monitoring using UWB radar has rarely been explored.
According to a survey [11], there were 73.55 million pet owners in China's cities and towns in 2018, and dog owners alone accounted for approximately 46.1%. In particular, guide dogs, which assist people with blindness, are indispensable: they accompany their owners in daily routines and medical settings where continuous and non-contact monitoring is desirable. However, the vital signs of many pets are closely similar to those of humans, which may cause false alarms when the pets' signs are mistaken for the signal source. Therefore, it is necessary to explore methods for distinguishing between humans and pets to ensure the correct signal source in non-contact health monitoring using UWB radar.
Some research groups have explored distinguishing between humans and animals using UWB radar. Björklund et al. [12] used a support vector machine (SVM) to distinguish between approaching humans and animals in perimeter protection. A combination of Gaussian mixture and hidden Markov models was proposed in [13] to distinguish between slow-moving animal and human targets for border safeguarding and wildlife anti-poaching operations. Otero [14] distinguished walking humans from animals by extracting key features based on the different motions of the torsos and limbs. These studies all focused on distinguishing moving targets by identifying motion differences. However, they can be unsuitable for distinguishing humans and pets in vital signs monitoring, where the targets are expected to remain stationary. Generally, only respiration and heartbeat signals can be obtained when using UWB radar for vital signs monitoring.
Wang Y. et al. [15] extracted wavelet entropy to distinguish between stationary humans and dogs under through-wall conditions using UWB radar for post-disaster rescues. Ma Y. et al. [16] further improved the accuracy for this application scenario by extracting 12 features and combining them using machine learning. Nevertheless, only respiration signals could be obtained under through-wall conditions using UWB radar. In addition, the differences in the extracted features under through-wall conditions were enlarged after signal processing such as filtering. The existence of the wall could introduce strong noises, whose intensity and order degrees are smaller than those of humans but can be close to those of the dogs. Thus, after filtering, most wall-introduced noises within humans' signals can be eliminated, whereas a large part of the noise was still left within dogs' signals. This could facilitate the distinguishing under through-wall conditions. However, there are usually few strong noises introduced by heavy obstacles in applications of vital signs monitoring. Thus, the methods in [15,16] may not remain effective for distinguishing in vital signs monitoring. Moreover, the heartbeat signals that are available in vital signs monitoring were not effectively used in the aforementioned methods.
Wang P. et al. [17] proposed a feature (energy ratio of respiration to heartbeat) to distinguish humans and animals in vital sign monitoring using a UWB radar with a center frequency of 7.29 GHz. However, the robustness and accuracy of the method cannot be guaranteed by the single feature because of individual differences.
In this study, after signal preprocessing, a total of 19 discriminative features were extracted to reflect the differences between humans and pets in terms of energy, frequency, wavelet entropy, and correlation coefficient. Dogs were chosen as the typical representative of pets herein considering the experimental feasibility and the largest proportion of dogs among pets. The 19 features were first ranked based on their contribution to the classification task, and the top eight features were then selected as the optimal feature subset. Finally, the optimal feature subset was combined using the SVM method to build an optimal SVM model, which achieved the highest classification performance. The false and missing alarm rates of the optimal SVM model for identifying the human target were 9.62% and 6%, respectively, which can meet the classification accuracy requirements for distinguishing humans from pets in vital signs monitoring. These results are envisioned to facilitate human-aimed UWB radar techniques such as vital signs monitoring.

UWB radar system
An X4M02 (XeThru/Novelda, Oslo, Norway) UWB radar system on a chip [18] was utilized in this study. The block diagram is illustrated in Fig. 1. First, a sequence of Gaussian-like pulses is generated by the transmitter, which functions as a direct radio frequency synthesizer. The center frequency and bandwidth of the pulses are programmable and regulated by the crystal oscillator (OSC) and phase-locked loop (PLL). Then, the pulses are sent to the transmitting antenna (TA) to probe the micro vibration on the chest caused by the respiration and heartbeat of the targets. Further, the return pulses are received through the receiving antenna (RA) and transferred in turn to the high-pass filter (HPF), low noise amplifier (LNA), and sampler for filtering, amplifying, and sampling, respectively. Finally, the preliminarily processed pulses are sent to the host computer through the serial peripheral interface (SPI) and USB cable for signal preprocessing and further classification tasks. The average output power of the radar is lower than −44 dBm/MHz, which meets the average transmitted power requirements of the Federal Communications Commission and the European Telecommunications Standards Institute.
The raw echo data are stored within the host computer in the form of waveforms and denoted by Raw(m, n), where m = {1, 2, ⋯, M} represents the range points along the fast-time index, and n = {1, 2, ⋯, N} represents the time points along the slow-time index. The fast-time index is associated with the detection range along each received waveform. The slow-time index is associated with the data acquisition time along the measuring duration, which is denoted by t. Here, M = 186 range points were recorded by the utilized radar, and the value of N depended on the practical data acquisition time. The slow-time scanning speed of the UWB radar was 17 Hz; thus, the number of slow-time points can be written as N = 17t, with t in seconds. The key parameters of the UWB radar utilized in this study are summarized in Table 1.

Signal preprocessing
The raw echo data contain environmental noise interference, making it necessary to implement signal preprocessing procedures for the extraction of true and clear respiration and heartbeat signals. The signal preprocessing involves three steps: direct current (DC) removal, low-pass (LP) filtering, and least mean square (LMS)-based adaptive filtering. Figure 2 illustrates the corresponding flow chart of the preprocessing steps. DC removal is first applied to remove the DC component and baseline drift; R_DC(m, n) denotes the signal after DC removal. Afterward, LP filtering with a cut-off frequency of 5 Hz is performed to filter out the high-frequency noise interference while retaining the respiration and heartbeat signals. LP filtering is determined as
$$R_{LP}(m, n) = R_{DC}(m, n) * h(t)$$
where R_LP(m, n) represents the signal after the 5 Hz LP filtering, h(t) represents the impulse response of the finite impulse response filter [19], and * represents the convolution operation. Finally, adaptive filtering based on the LMS algorithm is employed to suppress strong clutter and eliminate the respiration-like clutter. The detailed procedure and algorithm of adaptive filtering are described in [20].
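The first two preprocessing steps can be sketched as follows. This is a minimal illustration, not the authors' implementation: DC removal is assumed to be a subtraction of each range bin's slow-time mean, the FIR design (`firwin` with 31 taps and a Hamming window) is an illustrative choice, the data are synthetic, and the LMS adaptive filter of [20] is not reproduced.

```python
import numpy as np
from scipy.signal import firwin, lfilter

FS_SLOW = 17.0  # slow-time sampling rate in Hz (from the text)


def remove_dc(raw):
    """Assumed form of DC removal: subtract each range bin's slow-time mean."""
    return raw - raw.mean(axis=1, keepdims=True)


def lowpass_slow_time(r_dc, cutoff_hz=5.0, numtaps=31):
    """Convolve every range bin with an FIR low-pass h along slow time.

    numtaps = 31 and the default Hamming window are illustrative choices.
    """
    h = firwin(numtaps, cutoff_hz, fs=FS_SLOW)
    return lfilter(h, 1.0, r_dc, axis=1)


# Synthetic stand-in for Raw(m, n): M = 186 range bins, 30 s of slow time.
rng = np.random.default_rng(0)
M, N = 186, int(FS_SLOW * 30)
t = np.arange(N) / FS_SLOW
raw = 5.0 + 0.05 * rng.standard_normal((M, N))   # DC offset plus noise
raw[60] += np.sin(2 * np.pi * 0.3 * t)            # respiration-like tone
raw[60] += 0.5 * np.sin(2 * np.pi * 7.5 * t)      # interference above 5 Hz

r_dc = remove_dc(raw)
r_lp = lowpass_slow_time(r_dc)
```

Because `lfilter` performs a causal convolution, the output is delayed by about half the filter length; a zero-phase variant would be a different design choice from the convolution stated in the text.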
After the three signal preprocessing steps, a clear signal S(m, n) is obtained, making it easier to observe the true respiration and heartbeat signal of the target. All further feature extraction procedures are based on the preprocessed signal S(m, n). Figure 3 depicts a comparison of the raw echo data and the corresponding preprocessed radar signal when a human target is approximately 1 m away from the UWB radar under free-space conditions.
In vital signs monitoring, respiration and heartbeat signals can be obtained simultaneously using UWB radar. It is important to make full use of the available signal resources to better distinguish between humans and pets. This section describes the method for separating the respiration and heartbeat signals. All subsequent features are extracted on this basis. After signal preprocessing, the radar signal S(m, n) generally contains components of respiration, heartbeat, harmonics of respiration and heartbeat, and clutters, which can be expressed as:
$$S(m, n) = R(m, n) + H(m, n) + har(m, n) + C(m, n)$$
where R denotes the respiration signal, H denotes the heartbeat signal, har denotes the harmonics of respiration and heartbeat, and C denotes small clutters. Then, the respiration and heartbeat signals can be extracted through variational mode decomposition (VMD) [17].
VMD is a generalization of the classic Wiener filter into multiple adaptive bands. It can decompose the original signal into a set of variational intrinsic mode functions (VIMFs) through an entirely non-recursive method, which is quite robust to sampling and noise [21]. VMD can be understood as a constrained variational problem:
$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k = S$$
where S represents the original input signal, {u_k} = {u_1, u_2, ⋯, u_K} represents the decomposed VIMFs, ω_k represents the center frequency of the corresponding variational intrinsic mode function u_k, δ(t) is the Dirac delta function, j is the imaginary unit, * denotes convolution, and s.t. means subject to. The detailed algorithm is illustrated in [21]. When the number of VIMFs is set as K = 4, the respiration and heartbeat signals can be extracted and separated as R = VIMF1 = u_1 and H = VIMF2 = u_2, respectively [17]. The spectrum of the fast Fourier transform (FFT) of a preprocessed human signal at the target position is shown in Fig. 4. The four VIMFs obtained by VMD and the corresponding FFT spectra are illustrated in Fig. 5.

Feature extraction, sorting, and selection
To distinguish a stationary human target from a dog target in vital signs monitoring, it is important to extract features with strong discriminative abilities from different aspects. In this study, a total of 19 specific features belonging to four different categories were extracted from the preprocessed radar signal S(m, n), respiration signal R(m, n), and the heartbeat signal H(m, n). The outline of the 19 features is described in Fig. 6.

Energy ratio of respiration to heartbeat (ERRH)
Compared to humans, the heart of dogs is located closer to the front ribs of the chest, and the muscle and fat layers between the skin and the heart are thinner [22,23]. Thus, the heartbeat of a dog target can be detected with more relative intensity because of the smaller electromagnetic wave attenuation caused by fat and muscles. Moreover, the lung-to-heart weight ratio of a human target is approximately three times larger than that of a dog target [17,24]. These differences in body structure between human and dog targets can jointly cause apparent differences in ERRH, which is calculated as follows:
$$ERRH = \frac{\sum_{n=1}^{N} R(p, n)^2}{\sum_{n=1}^{N} H(p, n)^2}$$
where p denotes the target position, which can be determined by choosing the range point with maximum energy. The energy of one position can be calculated by summing the squares of all the amplitudes along the slow-time index.
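The target-position search and the energy ratio described above can be sketched as follows. The function names and the synthetic respiration and heartbeat tones are assumptions made for illustration, and array indices are zero-based, whereas the text counts range points from 1.

```python
import numpy as np


def energy_per_range(sig):
    """Energy of each range point: sum of squared amplitudes along slow time."""
    return np.sum(np.asarray(sig, dtype=float) ** 2, axis=1)


def target_position(s):
    """Index p of the range point with maximum energy in S(m, n)."""
    return int(np.argmax(energy_per_range(s)))


def errh(resp_p, heart_p):
    """Energy ratio of respiration to heartbeat at the target position."""
    return float(np.sum(resp_p ** 2) / np.sum(heart_p ** 2))


# Synthetic example: 30 s at 17 Hz, target placed at (zero-based) range bin 60.
fs = 17.0
t = np.arange(int(fs * 30)) / fs
resp = 1.0 * np.sin(2 * np.pi * 0.3 * t)    # respiration-like component
heart = 0.2 * np.sin(2 * np.pi * 1.2 * t)   # heartbeat-like component
s = np.zeros((186, t.size))
s[60] = resp + heart

p = target_position(s)
ratio = errh(resp, heart)   # amplitude ratio 5 gives an energy ratio of 25
```

In practice the two inputs to `errh` would be the VMD outputs R(p, n) and H(p, n) rather than known synthetic components.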

Energy change rate in fixed range window (ECR)
Within the preprocessed radar signal S(m, n), several range points are related to the micro vibration on the chest of the target because the human and dog chests have a certain thickness. Thus, the micro vibration on the chest, which is caused by breathing and heartbeat, should be reflected within a certain distance range. Hence, the feature of ECR is proposed because there is a significant difference in body size and chest vibration intensity between human and dog targets. The ECR is calculated within a fixed range window, where w_h represents the half-width of the window and was empirically chosen as w_h = 3 in this study. Thus, the fixed range window contains a total of seven range points. The equivalent detection range is approximately 0.18 m, close to the thickness of the human chest. The value of w_h may change when a UWB radar of different range resolution is used, and the width of the fixed range window should be adjusted to be close to the thickness of the human chest.

Range width between half maximum energy (RWHME)
Because the intensity of the micro vibration on the chest may differ between human and dog targets, the number of range points related to the micro vibrations should also differ between the two targets. The feature of RWHME is extracted to reflect the above difference and is determined by
$$RWHME = R_{hme} - L_{hme}$$
where R_hme and L_hme denote the two range points where the energy is half the maximum energy at the target position p.
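A minimal sketch of the half-maximum width is given below. It assumes that L_hme and R_hme are taken as the outermost range points around p whose energy is still at least half the maximum; the source does not state how non-integer crossings are handled, and the Gaussian energy profile is synthetic.

```python
import numpy as np


def rwhme(energy):
    """Return (width, left, right) of the half-maximum region around the peak.

    Assumption: left/right are the outermost contiguous range points around
    the peak whose energy is still >= half of the maximum energy.
    """
    energy = np.asarray(energy, dtype=float)
    p = int(np.argmax(energy))
    half = 0.5 * energy[p]
    left = p
    while left > 0 and energy[left - 1] >= half:
        left -= 1
    right = p
    while right < energy.size - 1 and energy[right + 1] >= half:
        right += 1
    return right - left, left, right


# Synthetic energy profile over 186 range points, peaked at bin 60.
m = np.arange(186)
profile = np.exp(-((m - 60) ** 2) / (2 * 3.0 ** 2))
width, l_hme, r_hme = rwhme(profile)   # bins 57..63 are above half maximum
```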

Energy ratio in the reference frequency band of respiration (ERR)
The normal frequency band of the respiratory rate of a human target is different from that of a dog target [25,26]. Thus, the energy ratio of a human's respiration in the human-respiration reference frequency band should be larger than that of a dog's respiration in the same band. Therefore, the feature of ERR is proposed to reflect the above difference. To calculate the ERR, ensemble empirical mode decomposition (EEMD) is first performed on the respiratory signal R to further eliminate small respiration-like clutters and obtain a much clearer respiratory signal. EEMD can decompose an original complicated signal into a collection of intrinsic mode functions (IMFs) by adding a random white noise signal [27,28], as illustrated below:
$$R_k = R + wn_k = \sum_{i=1}^{L_k} IMF_{i,k} + Res_k \tag{9}$$
where k = 1, 2, ⋯, K indexes the trials, namely, the times that Eq. (9) is repeated; wn_k represents the randomly added white noise at trial k; and R_k represents the synthesized signal. R_k is then decomposed into L_k IMFs and the corresponding residue Res_k. Here, K was chosen to be K = 50. Further, when the K trials are completed, the final number of IMFs is chosen by selecting the minimum value of L_k:
$$L = \min_{1 \le k \le K} L_k \tag{10}$$
The final IMFs that are regarded as the EEMD results are obtained by calculating the ensemble average of the corresponding IMFs, as illustrated below:
$$IMF_i = \frac{1}{K} \sum_{k=1}^{K} IMF_{i,k}, \quad i = 1, 2, \cdots, L \tag{11}$$
Some selected IMFs are then summed to synthesize a new clear signal, and the remaining IMFs are regarded as noise and discarded. The discriminant for selecting the truly needed IMFs is based on β_i, the energy concentration of das_i within the interval n ∈ [N − 1, N + 1], where das_i denotes the discrete autocorrelation sequence of IMF_i. When β_q > 2, the corresponding IMF_q is regarded as the first denoised IMF component [29]. Thus, the new clearer signal with small clutters eliminated can be resynthesized by
$$R_{re} = \sum_{i=q}^{L} IMF_i$$
The remaining IMF_i, where i ∈ [1, q − 1], represent small respiration-like clutters with a higher frequency.
Based on the above procedure, ERR can be calculated as follows:
$$ERR = \frac{ER_{re\_rfb}}{ER_{re}}$$
where ER_re_rfb represents the energy of R_re in the human-respiration reference frequency band, and ER_re represents the energy of R_re. The human-respiration reference frequency band is defined as [0.2 Hz, 0.4 Hz] [16,30].
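The band-energy ratio itself can be sketched as below. This is an illustration under stated assumptions: the energies are evaluated in the frequency domain from the FFT power spectrum, which the source does not specify, and the EEMD denoising step is skipped, so the input here is a synthetic signal rather than R_re.

```python
import numpy as np


def band_energy_ratio(x, fs, band):
    """Fraction of the signal energy that falls inside band = (f_low, f_high).

    Assumption: energies are taken from the one-sided FFT power spectrum.
    """
    x = np.asarray(x, dtype=float)
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return float(power[in_band].sum() / power.sum())


fs = 17.0
t = np.arange(int(fs * 30)) / fs
# Respiration-like tone inside the reference band plus a weaker tone outside it.
x = np.sin(2 * np.pi * 0.3 * t) + 0.5 * np.sin(2 * np.pi * 0.6 * t)
err = band_energy_ratio(x, fs, band=(0.2, 0.4))   # 1 / (1 + 0.25) = 0.8
```

The same helper applies to ERH by passing the denoised heartbeat signal and the band (1.0, 1.7).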

Energy ratio in the reference frequency band of heartbeat (ERH)
Similar to the procedure for calculating ERR, EEMD is also implemented for the heartbeat signal H, and a clearer signal H_re can be resynthesized with small high-frequency heartbeat-like clutters eliminated. ERH is determined as
$$ERH = \frac{EH_{re\_rfb}}{EH_{re}}$$
where EH_re_rfb represents the energy of H_re in the human heart rate reference frequency band, and EH_re represents the energy of H_re. Here, the human heart rate reference frequency band is defined as [1 Hz, 1.7 Hz].

Eight frequency-category features
As stated in the calculation description of ERR, there should be a difference in energy distribution at different frequencies between human and dog targets. To reflect this difference, two features of the energy ratio in the reference frequency band have been extracted. Here, eight frequency-category features are extracted to reflect the energy distribution differences more subtly. The Hilbert marginal spectrum can appropriately describe the energy contribution of each frequency value [30] and is suitable for nonlinear and nonstationary vital signals based on the EEMD algorithm. The detailed calculations are as follows.
First, the Hilbert transformation [31] is performed on the IMFs that are obtained by Eq. (11), as illustrated by
$$IMFH_i(t) = \frac{1}{\pi} P_H \int_{-\infty}^{+\infty} \frac{IMF_i(\tau)}{t - \tau} \, d\tau$$
where IMFH_i denotes the Hilbert transform of IMF_i, P_H denotes the Cauchy principal value of the singular integral, and t denotes the slow-time value. Then, the Hilbert marginal spectrum can be determined from α_i(t), the amplitude of the pre-envelope of IMF_i, and ω_Hi(t), the instantaneous phase.

1~3. ωH_R,1/4, ωH_R,3/4, and width between ωH_R,1/4 and ωH_R,3/4 (WH_R)
The difference in the middle part of the Hilbert marginal spectrum of respiration signals between humans and dogs is found to be the most significant [16]. Thus, ωH_R,1/4, ωH_R,3/4, and WH_R are extracted. ωH_R,1/4 is the frequency value where the cumulative energy value is 1/4 of the total energy in the Hilbert marginal spectrum of the respiration signal R. Similarly, ωH_R,3/4 is the frequency value where the cumulative energy value is 3/4 of the total energy in the Hilbert marginal spectrum of the respiration signal R. Then, the width between ωH_R,1/4 and ωH_R,3/4 is calculated by
$$WH_R = \omega H_{R,3/4} - \omega H_{R,1/4}$$

7~8. Respiratory rate (Rr) and heart rate (Hr)
Rr and Hr can be calculated by performing Fourier transformation on the respiration signal R and heartbeat signal H, respectively.
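The cumulative-energy frequencies and the rate estimates can be sketched as follows. The helper names are hypothetical, the spectrum passed in is a generic non-negative array standing in for the Hilbert marginal spectrum (which is not computed here), and the rates are assumed to be the dominant FFT peak of each separated signal.

```python
import numpy as np


def cumulative_energy_frequency(freqs, spectrum, fraction):
    """First frequency at which the cumulative energy reaches the given fraction."""
    cum = np.cumsum(np.asarray(spectrum, dtype=float))
    idx = int(np.searchsorted(cum, fraction * cum[-1]))
    return float(freqs[idx])


def dominant_rate(x, fs):
    """Frequency (Hz) of the largest FFT magnitude peak of a zero-mean signal."""
    x = np.asarray(x, dtype=float)
    mag = np.abs(np.fft.rfft(x - x.mean()))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    return float(freqs[np.argmax(mag)])


# Toy spectrum: uniform energy on 0.1, 0.2, ..., 1.0 Hz.
freqs = np.arange(1, 11) / 10.0
spectrum = np.ones(10)
w_quarter = cumulative_energy_frequency(freqs, spectrum, 0.25)
w_three_quarter = cumulative_energy_frequency(freqs, spectrum, 0.75)
wh = w_three_quarter - w_quarter   # width between the two frequencies

# Rates from synthetic separated signals (30 s at 17 Hz).
fs = 17.0
t = np.arange(int(fs * 30)) / fs
rr = dominant_rate(np.sin(2 * np.pi * 0.3 * t), fs)   # respiration-like
hr = dominant_rate(np.sin(2 * np.pi * 1.2 * t), fs)   # heartbeat-like
```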

Four wavelet-entropy-category features
Wavelet entropy can quantitatively provide information associated with the functional dynamics of order/disorder microstates based on an orthogonal discrete wavelet transform [15,32]. It is believed that the respiration and heartbeat of humans are more regular than those of dogs [15,16]. That is, the microstates of a human target should be more ordered. Thus, four wavelet-entropy-category features were extracted to reflect the above difference.

Mean wavelet entropy at target position (MWE)
In the calculation of MWE, a set of elementary functions called the wavelet family ψ_{a,b}(t) is generated by dilations and translations of a unique admissible mother wavelet ψ(t), as illustrated below:
$$\psi_{a,b}(t) = |a|^{-1/2} \, \psi\left(\frac{t - b}{a}\right)$$
where a and b are the scale and translation parameters, respectively. The wavelet becomes wider as |a| increases. For a special selection of the mother wavelet function ψ(t), a and b are chosen as a_l = 2^{−l} and b_{l,n} = 2^{−l} n with l ∈ Z. Z is the set of integers, l is chosen as l = {−1, −2, ⋯, −log_2 N}, and n is the time point along the slow-time index. The original signal at the target position can be reconstructed by discrete wavelet transformation:
$$S(t) = \sum_{l} \sum_{n} C_l(n) \, \psi_{l,n}(t)$$
where coefficients C_l(n) are the local residual errors between successive signal approximations at scales l and l + 1. Based on the above information, the normalized values of the relative wavelet energy rwe_l at each scale l can be calculated as
$$rwe_l = \frac{E_l}{\sum_{l} E_l}, \quad E_l = \sum_{n} |C_l(n)|^2$$
Next, the wavelet entropy at the target position, WE, can be determined by
$$WE = -\sum_{l} rwe_l \ln(rwe_l)$$
To observe instantaneous changes in each parameter more clearly, the input radar signal is divided into non-overlapping frames, with the width of each frame chosen as 128 slow-time points. Finally, MWE can be calculated as
$$MWE = \frac{1}{N_{fra}} \sum_{fra=1}^{N_{fra}} WE_{fra}$$
where WE_fra represents the WE when the input signal is only within the fra-th frame of the total target-position signal S(t), and N_fra is the number of frames.
The larger the value of MWE, the greater the disorder of the original signal. Thus, the value of MWE of the target-position human signal should be smaller than that of the dog target because the vital signs of a human target are considered to be more regular.
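The frame-wise wavelet entropy can be sketched as below. Several choices here are assumptions rather than the authors' settings: the Haar wavelet is used as the mother wavelet, only the detail coefficients enter the relative energies, the natural logarithm is used, and the frame-to-frame spread is computed as a population standard deviation.

```python
import numpy as np


def haar_detail_energies(x, levels):
    """Energy of the Haar detail coefficients at each decomposition level."""
    approx = np.asarray(x, dtype=float)
    energies = []
    for _ in range(levels):
        if approx.size < 2:
            break
        if approx.size % 2:               # drop a trailing sample if odd
            approx = approx[:-1]
        even, odd = approx[0::2], approx[1::2]
        detail = (even - odd) / np.sqrt(2.0)
        approx = (even + odd) / np.sqrt(2.0)
        energies.append(np.sum(detail ** 2))
    return np.array(energies)


def wavelet_entropy(x, levels=7):
    """Shannon entropy (natural log) of the relative wavelet energies."""
    e = haar_detail_energies(x, levels)
    if e.sum() == 0:
        return 0.0
    rwe = e / e.sum()
    rwe = rwe[rwe > 0]                    # 0 * log(0) is treated as 0
    return float(np.sum(rwe * np.log(1.0 / rwe)))


def mwe_swe(x, frame=128, levels=7):
    """Mean and standard deviation of WE over non-overlapping frames."""
    n_frames = len(x) // frame            # an incomplete last frame is dropped
    we = np.array([wavelet_entropy(x[i * frame:(i + 1) * frame], levels)
                   for i in range(n_frames)])
    return float(we.mean()), float(we.std()), we


fs = 17.0
t = np.arange(int(fs * 30)) / fs
rng = np.random.default_rng(1)
regular = np.sin(2 * np.pi * 0.3 * t)
irregular = regular + rng.standard_normal(t.size)
mwe_reg, swe_reg, we_reg = mwe_swe(regular)
mwe_irr, swe_irr, we_irr = mwe_swe(irregular)
```

With a 30 s record at 17 Hz (510 slow-time points), the 128-point framing yields three complete frames.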

Standard deviation of wavelet entropy at target position (SWE)
SWE is extracted to reflect the degree of fluctuation of WE. It is calculated as the standard deviation of WE_fra over all frames.

Mean of MWE in fixed range window (MMWE)
As illustrated above, several range points are related to the target's micro vibration on the chest because of the thickness of the chest. Therefore, MMWE is extracted to reflect the difference in MWE values around the target position. The width of the fixed range window here was chosen to be the same as that in the calculation of ECR, containing 2w_h + 1 = 7 range points. Then, MMWE can be determined as
$$MMWE = \frac{1}{2w_h + 1} \sum_{m = p - w_h}^{p + w_h} MWE(m)$$
where MWE(m) denotes the MWE calculated at range point m.

Ratio of MWE between range points inside and outside the fixed range window (RMWE)
RMWE is calculated as the ratio of the MWE of the range points inside the fixed range window to that of the range points outside it.

Two correlation-coefficient-category features
The correlation between signals along the slow-time index around the target position of a human target is much more intense than that of a dog target [16,33]. Thus, two correlation-coefficient-category features are extracted to reflect the difference from the aspect of correlation of signals at different range points.

Mean of correlation coefficient in fixed range window (MCC)
MCC is calculated as the mean of the correlation coefficients CC(p, m) between the target-position signal S(p, n) and the signals S(m, n) in the fixed range window, m ∈ [p − w_h, p + w_h], with
$$CC(p, m) = \frac{Cov(S(p, n), S(m, n))}{\sqrt{Var(S(p, n)) \, Var(S(m, n))}}$$
where Cov represents the covariance operation, and Var represents the variance operation. w_h = 3 is the half-width of the fixed range window.

Change rate of correlation coefficient in fixed range window (CRCC)
The trends of the correlation coefficient in the fixed range window can differ between human and dog targets. Thus, CRCC is extracted to reflect the above difference and is calculated from CC(p, m), the correlation coefficient between the target-position signal S(p, n) and one signal in the fixed range window.
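The correlation coefficients in the fixed range window and their mean can be sketched as follows. Including the trivial m = p term (which equals 1) in the mean is an assumption, the scene generator is a synthetic stand-in, and CRCC is not reproduced because its formula is not available here.

```python
import numpy as np


def window_correlations(s, p, w_h=3):
    """Pearson correlation between S(p, n) and S(m, n) for m in [p-w_h, p+w_h]."""
    return np.array([np.corrcoef(s[p], s[m])[0, 1]
                     for m in range(p - w_h, p + w_h + 1)])


def mcc(s, p, w_h=3):
    """Mean correlation coefficient over the window (m = p included by assumption)."""
    return float(window_correlations(s, p, w_h).mean())


rng = np.random.default_rng(0)
fs, p = 17.0, 60
t = np.arange(int(fs * 30)) / fs
vital = np.sin(2 * np.pi * 0.3 * t)


def synthetic_scene(noise_std):
    """Same vital-sign waveform on 7 range bins around p, plus white noise."""
    s = noise_std * rng.standard_normal((186, t.size))
    s[p - 3:p + 4] += vital
    return s


cc_clean = window_correlations(synthetic_scene(0.05), p)
mcc_clean = mcc(synthetic_scene(0.05), p)   # strongly correlated window
mcc_noisy = mcc(synthetic_scene(1.0), p)    # weakly correlated window
```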

Feature sorting and selection based on RFE-SVM
SVM is a machine learning method that can combine many extracted features for better and more robust classification [34]. Recursive feature elimination (RFE) is a popular algorithm for reducing the number of input features in SVM to speed up classification tasks and reduce computational complexity [35,36]. RFE can simultaneously improve the accuracy of SVM classifications. Thus, we adopted RFE-SVM in this study to achieve the highest classification accuracy with few features when distinguishing between stationary human and dog targets in an in-home environment using UWB radar. Let X = {x_1, x_2, ⋯, x_TS}^T be the vector of input features and Y = {y_1, y_2, ⋯, y_TS}^T, y_ts ∈ {−1, +1}, be the known class labels for TS samples. Then, a decision function D(x) can be built by SVM as follows:
$$D(x) = w \cdot x + b$$
where w is the weight vector, and b is a bias value [36]. The label of x is classified as +1 when D(x) > 0 and −1 when D(x) < 0.
The final output of the RFE-SVM procedure is a ranked feature list r. We then built SVM models with different features in r and chose the feature subset with the best classification performance as the optimal feature subset. Fivefold cross-validation was adopted to avoid overfitting and enhance the generalization ability of the model. It was implemented by first dividing the sample data randomly into five equal parts. Four parts were then utilized as the training set to build the SVM model, and the remaining part was regarded as the test set to verify the classification performance of the model. The procedure was repeated until each of the five parts had been used once as the test set. The detailed procedure for selecting the optimal feature subset is as follows:
1. Initialize the cycle number cn = 1 and repeat the following steps until cn = length(r);
2. Build an SVM model with the feature subset r(1 : cn);
3. Optimize the model parameters of the penalty coefficient Cp and kernel function parameter gamma based on the grid search technique;
4. Calculate the classification performance and set cn = cn + 1.
Finally, the optimal SVM model, using the optimal feature subset, can be obtained by choosing the highest classification performance. The penalty coefficient Cp and kernel function parameter gamma are two key parameters to prevent the model from overfitting and improve the classification performance [36].
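The ranking and subset-selection loop can be sketched with scikit-learn as below. This is a sketch under assumptions, not the authors' code: features are ranked by `RFE` wrapped around a linear-kernel `SVC`, the final models use an RBF kernel with standardized inputs, folds are stratified, the grid values are arbitrary, and the data are synthetic (150 samples, six features, two of them informative).

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def rank_features(X, y):
    """Feature indices ordered from most to least important (RFE, linear SVM)."""
    X_std = StandardScaler().fit_transform(X)
    rfe = RFE(SVC(kernel="linear"), n_features_to_select=1, step=1).fit(X_std, y)
    return np.argsort(rfe.ranking_)        # ranking_ == 1 marks the top feature


def select_subset(X, y, order, grid, seed=0):
    """Try r(1:cn) for cn = 1..len(r); keep the subset with the best CV AUC."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    best_auc, best_subset, best_params = -np.inf, None, None
    for cn in range(1, len(order) + 1):
        model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
        search = GridSearchCV(model, grid, cv=cv, scoring="roc_auc")
        search.fit(X[:, order[:cn]], y)
        if search.best_score_ > best_auc:
            best_auc = search.best_score_
            best_subset, best_params = order[:cn], search.best_params_
    return best_auc, best_subset, best_params


# Synthetic data: 100 samples labeled +1 and 50 labeled -1, as in the study.
rng = np.random.default_rng(0)
y = np.r_[np.ones(100), -np.ones(50)].astype(int)
X = rng.standard_normal((150, 6))
X[:, 0] += 1.25 * y                         # informative feature
X[:, 1] -= 1.25 * y                         # informative feature

grid = {"svc__C": [1.0, 10.0], "svc__gamma": [0.01, 0.1]}
order = rank_features(X, y)
best_auc, best_subset, best_params = select_subset(X, y, order, grid)
```

Here `svc__C` plays the role of the penalty coefficient Cp and `svc__gamma` that of the kernel parameter gamma; a finer grid would be needed to reproduce a search such as the one reported in the paper.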

Performance evaluation metrics
The classification performance of the SVM model was jointly assessed using five indicators: accuracy, precision, recall, F1-score, and the area under the receiver operating characteristic curve (AUC), which are detailed in [37]. The definitions of accuracy, precision, recall, and F1-score are shown in Appendix A.

Experimental subjects and scenarios
Dogs were chosen as the typical representative of pets herein by jointly considering the experimental feasibility and the largest proportion of dogs among pets. The participants in the experiments were 10 adult volunteers (five males and five females) aged from 24 to 43 years and five healthy Beagle dogs (two males and three females) aged about 1 year. All human volunteers gave informed consent. The Beagle dogs were obtained from the Experimental Animal Center of Fourth Military Medical University, and the animal experiment was conducted according to IACUC-20201201. The experimental scenarios are shown in Fig. 7. Each human target sat still during the collection of the radar data. To collect the radar data of the dog targets, the dogs were given food and water in advance to help calm them, and the data collection started when the dogs lay still. Thus, the postures of the dogs could differ from one collection to another.
The distance between the UWB radar and the target was approximately 1 m and varied slightly between collections. The duration of each data collection was approximately 30 s, which can meet the requirement for quick discrimination in practical applications. To expand the total number of target samples, data were collected for each target 10 times over a span of five days. By jointly considering the different collection distances, different postures for each collection, long time span, different body states, and different environmental interferences, we considered the 10 collections of a target as different data samples. Therefore, there was a total of 150 samples, which served as inputs for building the SVM models. The human and dog targets were labeled as "+1" and "−1," respectively. A detailed description of the samples is provided in Table 2.

Feature comparison between humans and pets
To validate the rationality and effectiveness of the 19 extracted features, some typical comparisons of the features between humans and pets are shown in this section. The comparison of energy changes in the fixed range window between human and dog targets is illustrated in Fig. 8. The comparison of ωH_R,1/4, ωH_R,3/4, and WH_R between human and dog targets is shown in Fig. 9. Figure 10 illustrates a typical comparison of correlation coefficients for signals around the target position S(m, n), m ∈ [p − w_h, p + w_h], between human and dog targets.
As shown in Fig. 11, the optimal feature subset with the highest AUC value of 0.9620 was r(1 : 8): r = [MCC, Hr, MMWE, RWHME, ERRH, MWE, ERH, ωH_R,3/4]. Among them, MCC had the highest feature ranking criterion. Namely, the difference described by MCC between humans and dogs in vital signs monitoring was the largest. As illustrated in Fig. 10, the correlation coefficient values for human targets in the fixed range window were all close to 1. This indicated that the point signals around the target position for human targets were closely correlated with the point signal at the target position. However, the correlation coefficient values for dog targets in the fixed range window decreased significantly at the edge of the window. This implied that the correlation between point signals around the target position and the target-position point signal for dog targets declined quickly as the distance to the target position increased. Thus, MCC, which denotes the mean of the correlation coefficient in the fixed range window, should be significantly higher for human targets than for dog targets. ωH_R,3/4 was the last feature within the optimal feature subset. It was extracted in terms of frequency. However, because the respiratory frequency band of human targets had a large overlap with that of dog targets [25,26], the difference described by features related to respiratory rate could be relatively small. Moreover, ωH_R,3/4 was the only respiratory-rate-related feature that was selected into the optimal feature subset. The search for the optimal penalty coefficient Cp and kernel function parameter gamma for feature subset r(1 : 8) using the grid search technique is illustrated in Fig. 12. The results are Cp = 9.1896 and gamma = 0.0825.
The classification performance of the optimal SVM model using the optimal feature subset r(1 : 8) is given in Table 3. The overall accuracy of the classification task for distinguishing a stationary human target from a dog target is approximately 0.8933. The precision is approximately 0.9038, which means that the ratio of true human targets to the total targets classified as human by the SVM model is 90.38%. The corresponding false alarm rate for human targets is approximately 9.62%. The recall is approximately 0.94, which means that 94% of the true human targets can be identified correctly. The corresponding missing alarm rate is about 6%. The F1-score is a joint indicator for evaluating the distinguishing ability of the built model, and it is about 0.9215.
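As a consistency check, the reported indicators can be reproduced from a single confusion matrix. The counts below (94 humans correctly identified, 6 humans missed, 10 dogs classified as humans, 40 dogs correctly identified) are inferred from the reported precision, recall, accuracy, and the 100/50 sample split; they are not stated in the text, and the formulas are the standard definitions.

```python
def classification_metrics(tp, fn, fp, tn):
    """Standard accuracy, precision, recall and F1 from confusion counts."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Inferred counts (an assumption): humans are the positive class.
m = classification_metrics(tp=94, fn=6, fp=10, tn=40)
false_alarm = 1 - m["precision"]    # share of "human" decisions that were dogs
missing_alarm = 1 - m["recall"]     # share of humans that were missed
```

With these counts the accuracy is 134/150 ≈ 0.8933, the precision is 94/104 ≈ 0.9038, the recall is 0.94, and the F1-score is 188/204 ≈ 0.9216, which agrees with the reported 0.9215 up to rounding.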

Analysis of distinguishing ability of the features
To analyze the contribution of the different feature categories, the features of each of the four categories were used separately to build SVM models. The AUC values for the different feature categories are listed in Table 4. The eight frequency-category features possessed the highest AUC value when used alone. This indicates that frequency-category features contribute the most to the classification performance. Thus, the frequency-category differences between human and dog targets should be the largest. Relatively, the other three categories (energy, wavelet entropy, and correlation coefficient) contributed much less. Among the eight frequency-category features, heart-rate-related features were apparently more important. Only two frequency-category features existed within the optimal feature subset r(1 : 8). Hr (heart rate) was the second in order, and ωH_R,3/4 (the frequency value where the cumulative energy value is 3/4 of the total energy in the Hilbert marginal spectrum of the heartbeat signal) was the eighth in order. Both features were extracted from the heartbeat signal and are heart-rate-related features.

Fig. 11 Selecting optimal feature subset evaluated using AUC values

Comparison between distinguishing in vital signs monitoring and through-wall condition
Our previous work [16] conducted classification tasks to distinguish stationary humans from dogs through a 28-cm-thick brick wall using a UWB radar with a 400 MHz center frequency. Each target was 2.5 m away from the brick wall, and the UWB radar was attached to the other side of the wall approximately 1.1 m above the ground. The dogs lay on a 0.8-m-high experimental table, and the stationary human targets stood facing the brick wall. The aim of that study was to optimize the distribution of resources in post-disaster rescue missions, such as those following earthquakes or mining accidents. It extracted 12 corresponding features from the same four categories used in the present study. However, the roles of the feature categories can differ markedly between vital signs monitoring and through-wall conditions. The role of a feature category is determined by its AUC value when only the features belonging to that category are used as input to build the SVM model. The corresponding comparison is shown in Fig. 13. The AUC values of the different feature categories under through-wall conditions can be found in [16]. The methods for building the SVM models and calculating the AUC values are the same under the two data collection conditions, and the kernel function of the SVM is the radial basis function. The chief differences are the data collection conditions and the detailed features within each category.
For distinguishing tasks under through-wall conditions, the wavelet-entropy-category features contributed the most, and their AUC value was very close to that of the optimal feature subset. This is because through-wall conditions can introduce fairly strong environmental noise. As the intensity of the noise is close to that of the dog's signals, it can increase the degree of disorder in the dog's signals. Conversely, the intensity of human signals can be much larger than that of the introduced noise. Moreover, most of the noise mixed into the human signals can be eliminated by signal preprocessing such as filtering. In this way, the difference in the degree of order/disorder between the signals of human and dog targets is enlarged by the brick wall. Thus, the wavelet-entropy-category features, which reflect the difference in the degree of order/disorder, contribute the most to the through-wall distinguishing tasks.
However, in vital signs monitoring, there is basically no strong environmental noise, because there are rarely heavy obstacles between the target and the UWB radar, or the obstacles attenuate only a small amount of electromagnetic wave energy. Thus, the noise in both the humans' and the dogs' signals can be largely eliminated by the signal preprocessing steps. The difference in the degree of signal order/disorder between human and dog targets is hence not enlarged and can be very small in this case. Therefore, most features that work effectively under through-wall conditions, such as the wavelet-entropy-category features, can play only a small role in vital signs monitoring. Moreover, the contributions of the four feature categories are very close under through-wall conditions, whereas the frequency-category features contribute much more than the other three categories in vital signs monitoring.
In particular, it is notable that the heartbeat signal can be stably extracted in vital signs monitoring. Thus, features related to the heartbeat signal can be extracted, such as ERRH, ERH, ωH H, 1/4, ωH H, 3/4, WH H, and Hr. Among them, Hr, ERRH, and ERH are all ranked within the top eight and are selected into the optimal feature subset. This indicates that there may be a large difference in the heartbeat signal between human and dog targets. By contrast, the method in [16] did not utilize the information of the heartbeat signal because that signal cannot be separated under through-wall conditions. The aim of the above analysis was to illustrate that distinguishing stationary humans from pets in vital signs monitoring and under through-wall conditions are two different scientific problems.
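The category comparison above relies on the AUC of an SVM model built from one feature category at a time. As a minimal sketch of the metric itself (not of the SVM training), the AUC can be computed from decision scores as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative sample, with ties counted as one half. The function name, the label convention (1 for human, 0 for dog), and the scores are hypothetical.

```python
def auc_from_scores(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    scores: classifier decision values, higher meaning 'more likely positive'.
    labels: 1 for the positive class, 0 for the negative class.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical decision scores for two feature categories on the same samples.
labels = [1, 0, 1, 0]
auc_by_category = {
    "category_a": auc_from_scores([0.9, 0.2, 0.7, 0.1], labels),
    "category_b": auc_from_scores([0.9, 0.8, 0.4, 0.3], labels),
}
best_category = max(auc_by_category, key=auc_by_category.get)
```

The pairwise form is quadratic in the number of samples, which is acceptable for a data set of this size; a rank-based formulation would be preferable for much larger sets.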

Discussion
A literature review showed that only the study in [17] addressed the distinction between humans and pets in non-contact vital signs monitoring using UWB radar. That study extracted a single feature, ERRH (energy ratio of respiration to heartbeat), and validated its effectiveness for the distinguishing task. Thus, our proposed method was compared with the method in [17]. Because the study in [17] did not specify a clear ERRH threshold for distinguishing humans from pets, we employed ERRH as the only input feature to build an SVM model for comparison.
The procedures were the same as those of our proposed method; the only difference was the features used to build the SVM model. The comparison of distinguishing performance on the 150 samples is shown in Table 5.
The recall of the state-of-the-art method is 1, which means that all the true human samples are identified correctly. However, its precision of 0.6667 means that only approximately 66.7% of the samples classified as human by the state-of-the-art method are truly human. The other 33.3% of the samples classified as human are actually dog samples. This can lead to the misuse of dog signals as human signals in vital signs monitoring and may further induce false alarms. By comparison, the precision and recall of our proposed method are 0.9038 and 0.94, respectively, which ensures a much better overall performance.
However, a few issues remain to be explored for more practical applications in vital signs monitoring. For example, the experimental scenarios should include humans and pets that are present at the same time and located very close to each other. Moreover, more animal species can be included.

Conclusions
A robust and accurate method for distinguishing between humans and pets in non-contact vital signs monitoring using UWB radar was proposed in this paper. After signal preprocessing of the raw radar data, a total of 19 discriminative synthetic features were extracted from the preprocessed signal, the separated respiration signal, and the separated heartbeat signal. Next, RFE-SVM was utilized to rank the 19 features, and the top eight features were selected as the optimal feature subset. An optimal SVM model was then built with the optimal feature subset. The performance of the optimal SVM model was demonstrated based on fivefold cross-validation, with an accuracy of 0.8933, precision of 0.9038, recall of 0.9400, F1-score of 0.9215, and AUC value of 0.9620. The false and missing alarms for human target identification were approximately 9.62% and 6%, respectively, which can meet the requirements of effectively distinguishing humans from dogs in non-contact vital signs monitoring using UWB radar. Therefore, the method can help ensure the correct signal source and avoid the use of erroneous health monitoring data or even false alarms. The proposed method is envisioned to facilitate the applications of UWB radar in vital signs monitoring, which can further promote the welfare of people who live with pets, especially the elderly and the blind.