Design and implementation of a compressive infrared sampling for motion acquisition

Liu, Tong; Liu, Jun

doi:10.1186/1687-6180-2014-20

Research
Open access
Published: 19 February 2014

EURASIP Journal on Advances in Signal Processing volume 2014, Article number: 20 (2014)

Abstract

This article proposes a compressive infrared sampling method that acquires and processes human motion simultaneously. The spatial-temporal changes caused by the movements of the human body are intrinsic clues for determining the semantics of motion, and the short-term changes caused by movement are sparsely distributed relative to the sensing region. Several pyroelectric infrared (PIR) sensors with pseudo-random-coded Fresnel lenses are introduced to acquire and compress motion information synchronously. The compressive PIR array records the changes in the thermal radiation field caused by movements and encodes the motion information directly into low-dimensional sensory outputs. Therefore, the problem of recognizing a high-dimensional image sequence is cast as a low-dimensional sequence recognition process. A database involving various kinds of motion performed by several people is built. Hausdorff distance-based template matching is employed for motion recognition. Experimental studies are conducted to validate the proposed method.

1 Introduction

Acquiring human motion information effectively is of key importance for analyzing and interpreting behavior. Wearable sensors and isomorphic vision sensors are the most widely used sensing methods for motion acquisition. Wearable sensors directly obtain the motion information of specific limbs or joints and yield a low-dimensional feature representation based on the sensor outputs [1]. However, wearable sensing is intrusive: the observed person must wear or strap the sensor to the body, which affects comfort.

Isomorphic vision sensor-based sensing is a non-intrusive method for motion acquisition. It has shown strong application prospects as sensors become cheaper and easier to network [2, 3]. To recognize and understand visual motion, useful information and features must be extracted from high-dimensional image data. Feature representations such as the geometric model of the human body, spatial-temporal patterns, appearance, area, contours, and the optical flow model have been demonstrated to be effective [4]. However, many studies have found that the dimension of visual image-based features is high, which increases the computational complexity of recognition and makes such features unsuitable for systems with real-time requirements.

Computer vision research has found that visual image data contain considerable redundant information, which makes data analysis computationally complicated [4, 5]. Further refinement is needed to remove the redundant parts of the data and form a low-dimensional feature. Many parsimony models, referred to as dimension reduction, seek efficient transformation and recognition based on a low-dimensional representation, such as principal component analysis (PCA) [6], isometric feature mapping (Isomap) [7], and locally linear embedding (LLE) [8]. They remove the redundant information sampled densely by generic sensors and improve the efficiency of the subsequent data analysis [9]. However, there is an imbalance between data acquisition and data utilization: great effort is spent compressing features to remove the curse of dimensionality, which wastes both sensing and computational resources. Moreover, compression algorithms for visual features rely heavily on software computing platforms, while little work has focused on integrating the compression directly into the physical hardware.

The recently proposed compressed sensing (CS) theory integrates sparse signal acquisition and compression in a single process [10–12]. This sensing mechanism avoids generating and processing redundant information. CS theory shows that a single-pixel camera can use an incoherent basis to sample the observed space, so that the scene is recorded in a low-dimensional form [13, 14]. The number of measurements depends only on the sparsity of the raw signal, and the incoherent basis takes the form of a random matrix. Random matrix-based measurement is the most common form of compression and can be embedded into physical hardware. In addition, a random matrix measurement is independent of the sparse data, and this non-adaptive form of compression is preferable. The raw image of the scene can be recovered by solving an optimization problem.

The a priori condition for CS is that the raw signal can be sparsely represented, either directly or in some transform domain. Some studies have focused on the parsimony of motion perception in human vision, such as moving light displays (MLDs) [15, 16] and spatial-temporal salient points [17–19]. Two major characteristics of human motion emerge from these studies. First, the moving and changing information of body motion is an effective clue for recognizing and understanding human behavior. Second, the short-term changes in human motion can be considered sparsely distributed in the spatial domain. This raises a simple question: for a given motion signal, is it possible to incorporate direct acquisition and compression into a single sensing process by means of a suitable physical coder?

In general, visible light-based vision sensors cannot acquire motion information directly. The pyroelectric infrared (PIR) sensor, however, has an inherent capability for motion detection and can acquire motion information directly [20–28]. In addition, a PIR sensor whose visibility is modulated by a Fresnel lens forms a multiplexed infrared sensing system. This sensing mechanism supports the implementation of compressed infrared sampling. Although much work has exploited the combination of PIR sensors and Fresnel lens arrays to form multiplexed infrared sensing, establishing a compressed infrared sampling model for acquiring motion information remains an unsolved problem.

In this article, we exploit compressive infrared sampling for the acquisition and processing of human motion. The spatial-temporal changes caused by the movement of the human body are intrinsic clues for determining the semantics of motion, and the short-term changes caused by movement are sparsely distributed relative to the interface region. Several PIR sensors with pseudo-random-coded Fresnel lenses are introduced to acquire and compress motion clues synchronously. The compressive PIR array records the changes in the thermal radiation field caused by human movements and encodes the motion information directly into low-dimensional sensory outputs. Therefore, the problem of recognizing a high-dimensional image sequence is cast as a low-dimensional sequence recognition process. Hausdorff distance-based template matching is employed to validate the usefulness of the proposed sensing method. In the experimental analysis, a database involving various kinds of motion performed by several people is built. We examine in detail how the compressive dimension relates to the correct recognition rate and to the time consumed by recognition, and we address the application of the proposed sensing paradigm to human-computer interaction.

The proposed method is based on the following assumption: the motion of the body is constrained to a predefined interface space. We also assume that each motion is normalized and that motions with the same semantics performed by different persons have small spatial-temporal variances. Although our assumption oversimplifies the general motion recognition problem, our purpose in this article is to present the idea of incorporating the PIR sensor and random compression theory into a single hardware measurement process. More solutions for the real challenges can be found in [29].

The rest of this article is organized as follows: In Section 2, we introduce the related infrared sensing model. Section 3 describes the random compression theory. Section 4 gives the design of compressive infrared sampling for motion acquisition. Section 5 presents the experimental setup and gives the Hausdorff distance-based recognition. The summary and conclusions of this article are given in Section 6.

2 Pyroelectric infrared sensor model

2.1 Sensing model

There has recently been considerable interest in PIR sensors for human motion detection and analysis [20–28]. The PIR sensor is made of pyroelectric material that is sensitive to thermal radiation with wavelengths between 5 and 14 µm. When thermal radiation reaches the sensor and changes its temperature, the pyroelectric material produces equal and opposite electric charges at its two poles, which yields a weak voltage. This sensing process has three advantages. First, it is sensitive only to human motion and supports motion extraction directly in hardware. Second, its performance is robust to illumination changes and complex backgrounds, so the corresponding difficulties of traditional camera-based vision systems are avoided. Third, by modulating the sensor’s field of view (FOV) with a Fresnel lens, the sensor can observe a specific area in an optically multiplexed pattern. If the incoherent projections from the interface region to the measurement space are designed appropriately, the main information can be recorded in a low-dimensional representation.

The human body exchanges thermal radiation with its surroundings at room temperature. By studying pyroelectric materials, Hossain and Rashid gave the simplified equation of the pyroelectric current [30]:

I (t) = p_{s} \frac{d T (t)}{d t} = \frac{p_{s} η}{C} e^{- \frac{G}{C} t} U (t) * \frac{d ϕ (t)}{d t},

(1)

where p_s is the pyroelectric constant, which depends on the pyroelectric material. $T (t)$ is the temperature difference between the sensor and the ambient environment and has the form $T (t) = \frac{η}{C} e^{- \frac{G}{C} t} U (t) * ϕ (t)$ , where U(t) is the unit step function, ∗ is the convolution operator, C is the heat capacity of the sensor, G is the total thermal conductivity between the sensor and the environment, η is the absorption rate of the sensor, and ϕ(t) is the thermal power received by the PIR sensor at time t, which can be simplified based on the Stefan-Boltzmann law as

ϕ (t) = \frac{A (t) ε_{h} k_{B} A_{s} (T_{h}^{4} - T_{c}^{4})}{d_{0}^{2}} + n (t),

(2)

where A_s is the surface area of the sensor, d₀ is the distance between the thermal radiation source and the sensor, $T_{h}$ is the temperature of the human body (37°C, about 310 K), and $T_{c}$ is the ambient temperature; both temperatures enter the equation in kelvin. ε_h and k_B are the emissivity factor and the Stefan-Boltzmann constant, respectively, and n(t) is the noise. A(t) is the surface area of the human body that can be observed by the sensor.
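As a worked illustration of Equation 2, the following minimal sketch evaluates the received thermal power for two visible body areas. The body temperature (37°C), ambient temperature (25°C), and distance (1.5 m) are the values quoted elsewhere in this article; the emissivity, the sensor area, the visible areas, and the function name `received_power` are assumptions chosen only for illustration, and the noise term n(t) is omitted.

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant in W m^-2 K^-4


def received_power(area_visible, emissivity, area_sensor,
                   t_human, t_ambient, distance):
    """Noise-free form of Equation 2; temperatures are in kelvin."""
    return (area_visible * emissivity * SIGMA * area_sensor
            * (t_human ** 4 - t_ambient ** 4) / distance ** 2)


# Values quoted in the article.
T_H = 37.0 + 273.15   # body temperature, K
T_C = 25.0 + 273.15   # ambient temperature, K
D0 = 1.5              # sensor-to-subject distance, m

# Assumed illustrative values (not taken from the article).
EMISSIVITY = 0.98     # assumed emissivity of skin
A_SENSOR = 4e-6       # hypothetical 2 mm x 2 mm sensing element, m^2

phi_small = received_power(0.05, EMISSIVITY, A_SENSOR, T_H, T_C, D0)
phi_large = received_power(0.10, EMISSIVITY, A_SENSOR, T_H, T_C, D0)
phi_no_contrast = received_power(0.10, EMISSIVITY, A_SENSOR, T_C, T_C, D0)

print(f"phi for A = 0.05 m^2: {phi_small:.3e} W")
print(f"phi for A = 0.10 m^2: {phi_large:.3e} W")
```

Under these assumptions the received power scales linearly with the visible area A(t), and it vanishes when the body and the background are at the same temperature, which is why the sensor responds to changes in A(t) rather than to a static scene.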

Figure 1 presents the transmission of thermal radiation and the sensing process of a PIR sensor. We set $H (t) = \frac{p_{s} η}{C} e^{- \frac{G}{C} t} U (t)$ and $s (r, t) = \frac{d ϕ (t)}{d t}$ , where H(t) is defined as the step response function and s(r, t) is the density distribution function of the changing thermal radiation at spatial position r. With an external resistor and an amplifier integrated, the PIR sensor’s output becomes

m (t) = H (t) * \int_{Ω} v (r) s (r, t) d r,

(3)

where v(r) is the visibility function, which is ‘1’ when r is visible to the sensor and ‘0’ otherwise. The visibility function v(r) is implemented physically by the Fresnel lens. The Fresnel lens is made of low-cost plastic and has two main functions. First, it focuses the changes of thermal radiation onto the sensor, which enhances the sensitivity. Second, it can reshape and code the FOVs of the sensor according to the requirements of the sensing task.

Shankar et al. used a black body as a simulated radiation source for the human body and found that the lower and upper cutoff frequencies of the sensor are 0.7 and 2 Hz, respectively [25]. The step response of the sensor can be approximated as H(t) ≈ C_vδ(t - τ), τ = 1.8 s, where C_v is the voltage constant, δ is the impulse function, and τ is the delay constant. To simplify the following discussion, we set H(t) = δ(t). Equation 3 can then be simplified as

m (t) = \int_{Ω} v (r) s (r, t) d r,

(4)

where Ω is the coverage area of the sensor.

2.2 Reference structure-based infrared sampling model

In the spatial imaging community, geometric reference structure-based tomography is a sensing mode that samples spatial information selectively according to the task [31]. Its core idea is to use reference structure-based coding modulation to build a projection mapping from the sampling space to the measurement space. In this article, by combining the PIR sensor with a Fresnel lens, we build an infrared sampling model, as shown in Figure 2a. Here, the reference structure is implemented physically by the Fresnel lens.

Let us first assume that the FOV Ω of a PIR sensor is divided into L non-overlapping sub-cells Ω_i, having the form

Ω = ⋃_{i = 1}^{L} Ω_{i}, Ω_{i} ⋂ Ω_{j} = \emptyset for i \neq j, i, j = 1, 2, \dots, L,

(5)

where Ω_i is the i th sub-cell of the raw interface space. With the FOV of each PIR sensor discretized in this way, we denote v_j as the visibility function of the j th sensor; the j th output of the sensor array is

\begin{align} m_{j} (t) & = \int_{Ω} v_{j} (r) s (r, t) d r = \sum_{i = 1}^{L} \int_{Ω_{i}} v_{j} (r) s (r, t) d r \\ & = \sum_{i = 1}^{L} v_{ji} \int_{Ω_{i}} s (r, t) d r = \sum_{i = 1}^{L} v_{ji} s_{i} (t), \end{align}

(6)

where v_{ji} bridges the visibility between the j th sensor and the i th sub-cell Ω_i and $s_{i} (t) = \int_{Ω_{i}} s (r, t) d r$ is the integral of the thermal radiation changes in the cell Ω_i at time t. We set v_j = row [v_j(r)] and s(t) = col [s_i(t)], respectively, and Equation 6 is rewritten in matrix notation as

m (t) = col [m_{j} (t)] = Vs (t),

(7)

where V = [ v_ji] determines the spatial transform of the thermal radiation and is able to be implemented by the Fresnel lens physically.
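The matrix form of Equation 7 can be sketched numerically as follows. This is a toy illustration, not the sensor array used in this article: the number of sensors `J`, the random binary visibility matrix, and the two active cells are assumptions made for the example; only the cell count of 25 follows the lens of Figure 2b.

```python
import numpy as np

L_CELLS = 25   # number of sub-cells, as in the 25-cell lens of Figure 2b
J = 4          # hypothetical number of sensors for this toy example

rng = np.random.default_rng(0)

# Binary visibility matrix V = [v_ji]: 1 if cell i is visible to sensor j.
V = rng.integers(0, 2, size=(J, L_CELLS))

# Sparse radiation-change vector s(t): only two cells change at this instant.
s = np.zeros(L_CELLS)
s[3] = 1.0
s[17] = -0.5

# Equation 7: all sensor outputs at once.
m = V @ s

# Equation 6 written out as an explicit sum, for comparison.
m_loop = np.array([sum(V[j, i] * s[i] for i in range(L_CELLS))
                   for j in range(J)])

print(m)
```

Each output m_j simply adds up the radiation changes of the active cells that sensor j can see, so a few sensors jointly encode which cells changed.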

In general, a PIR sensor commonly consists of single-, dual-, or quad-element detectors. The single-element sensor must add a thermal compensation module to remove the sensitivity to ambient temperature. Quad-element sensors have the inherent advantage that the output is the difference between the voltages obtained from each of the elements of the sensor [25]. The environmental effects can be removed. Figure 2a shows the sampling model of a quad-element PIR sensor, and its output is denoted as

m (t) = m_{1} (t) + m_{2} (t) - m_{3} (t) - m_{4} (t),

(8)

where m₁(t) … m₄(t) are the separate outputs of the four elements. Hence, the FOV of each sub-cell is further divided into four regions by the quad-element PIR sensor, denoted as

Ω_{i} = Ω_{i_{1}} \cup Ω_{i_{2}} \cup Ω_{i_{3}} \cup Ω_{i_{4}} .

(9)

The output of the sensor is refined as

\begin{align} m (t) & = \sum_{i = 1}^{L} v_{i} s_{i} (t) \\ & = \sum_{i = 1}^{L} (v_{i_{1}} s_{i_{1}} (t) + v_{i_{2}} s_{i_{2}} (t) - v_{i_{3}} s_{i_{3}} (t) - v_{i_{4}} s_{i_{4}} (t)) . \end{align}

(10)

Because each Fresnel lens mask makes a particular cell either wholly visible or wholly invisible to the quad-element PIR sensor, we have $v_{i} = v_{i_{1}} = v_{i_{2}} = v_{i_{3}} = v_{i_{4}}$ and

m (t) = \sum_{i = 1}^{L} v_{i} (s_{i_{1}} (t) + s_{i_{2}} (t) - s_{i_{3}} (t) - s_{i_{4}} (t)) .

(11)

Then, the output of the j th sensor is

m_{j} (t) = \sum_{i = 1}^{L} v_{ji} (s_{i_{1}} (t) + s_{i_{2}} (t) - s_{i_{3}} (t) - s_{i_{4}} (t)) .

(12)

Figure 2b shows the Fresnel lens containing 25 non-overlapping cells; each cell is further divided into four sub-cells by the quad-element PIR sensor to form a symmetrical subtraction.
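The differential output of Equation 12 can be illustrated with a short sketch. The random visibility matrix and the random sub-cell signals are assumptions made for the example; the sign pattern (+, +, -, -) follows Equation 8.

```python
import numpy as np

L_CELLS, J = 25, 16
rng = np.random.default_rng(1)

# Cell-level visibility v_ji (random binary mask, for illustration only).
V = rng.integers(0, 2, size=(J, L_CELLS))

# Signs of the four elements in Equation 8: m1 + m2 - m3 - m4.
SIGNS = np.array([1.0, 1.0, -1.0, -1.0])


def quad_outputs(V, s_quad):
    """Equation 12: s_quad has shape (L, 4), one column per sub-region."""
    s_diff = s_quad @ SIGNS          # s_i1 + s_i2 - s_i3 - s_i4 for each cell
    return V @ s_diff


# Arbitrary radiation changes in the four sub-regions of every cell.
s_quad = rng.normal(size=(L_CELLS, 4))
m = quad_outputs(V, s_quad)

# A uniform change (e.g. an ambient drift) is identical in all sub-regions.
m_ambient = quad_outputs(V, np.ones((L_CELLS, 4)))

print(m.shape, m_ambient)
```

A change that is identical in all four sub-regions cancels in the subtraction, which reflects the statement above that quad-element sensors remove environmental effects.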

2.3 Sparsity analysis on motion representation

The a priori condition for compressive sampling is that the raw signal can be sparsely represented, either directly or in some transform domain. It is therefore necessary to analyze the motion representation, which is the key to acquiring motion compressively in an efficient way.

Based on the previous PIR sensing model, the sensor generates an approximately impulsive response to the changing thermal radiation, and the changing thermal radiation is governed by the received thermal power ϕ(t). If we assume that the moving subject keeps a fixed distance from the sensor, that the body and ambient temperatures are each constant, and that the sensor’s noise is small, then $\frac{d ϕ (t)}{d t}$ depends only on the visible surface of the body A(t) and can be represented by the moving body parts.

To extract the moving body parts and demonstrate sparsity, we first designed a set of gymnastic exercises to build a motion database. There are 14 kinds of gymnastic motions, including local movements generated by the arms and legs and synergistic motions of the upper and lower limbs. Figure 3 gives the sequential images of each kind of motion. All motions are performed within a predefined region to keep a fixed distance from the sensor array. Five lab members participated in our experiments, and each member performed each motion six times. The members are of common heights and weights, with heights ranging from 160 to 180 cm. Thus, we collected 30 image sequences for each motion, sampled at 25 frames/s.

In what follows, a sophisticated optical flow method is employed to extract the changing body parts [32]. Examples of three motions are shown in Figure 4a,b,c. For visualization, we select three frames and three optical flow images from each category of motion and then compute the intensity of the motion flow to represent the changing body parts, as shown in the third row of each sub-figure. Large intensity coefficients are represented by light pixels and small coefficients by dark pixels. It can be observed that most of the motion flow coefficients are close to zero. We also compute the average distribution of motion flow intensity over the designed gymnastic motions and plot the corresponding histogram in Figure 4d. Again, most coefficients are very close to zero, meaning that the short-term changing body parts are sparsely distributed. This fact motivates us to set up a compressive infrared sampling for motion acquisition.
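The sparsity argument can be illustrated on synthetic data. The sketch below does not use the optical flow method of [32]; it substitutes plain frame differencing on a hypothetical 64 × 64 frame in which a small block (standing in for a moving limb) shifts by two pixels, and then measures the fraction of near-zero change coefficients. The frame size, block size, and threshold are all assumptions.

```python
import numpy as np


def moving_block_frame(pos, size=64, block=8):
    """Synthetic frame: a bright square block on a dark background."""
    frame = np.zeros((size, size))
    frame[pos:pos + block, pos:pos + block] = 1.0
    return frame


# Two consecutive frames: the block moves two pixels diagonally.
frame_prev = moving_block_frame(20)
frame_next = moving_block_frame(22)

# Frame differencing as a simple stand-in for motion-flow intensity.
change = np.abs(frame_next - frame_prev)

n_changed = int(np.count_nonzero(change > 1e-3))
near_zero_fraction = float(np.mean(change <= 1e-3))

print(f"changed pixels: {n_changed} of {change.size}")
print(f"fraction of near-zero coefficients: {near_zero_fraction:.4f}")
```

Only the leading and trailing edges of the block change between frames, so almost all coefficients are zero; this is the kind of short-term sparsity that the histogram in Figure 4d reports for real motions.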

3 Random matrix-based compression

In the data mining and dimension reduction communities, random matrix-based compression or projection has attracted the attention of a large number of researchers. It has the advantages of low generation complexity, low distortion of distances, and the ability to accelerate data processing. Given a high-dimensional and sparse data set, such as the thermal radiation space, it is natural to ask whether it can be embedded into a lower dimensional space without suffering great distortion.

The Johnson-Lindenstrauss (JL) lemma gives the intuition for designing infrared sampling as a non-adaptive and stable compressed acquisition method. The original formulation of the JL lemma is as follows [33]: given a parameter α > 0 and an integer n₀, if M is a positive integer with M > O(α^{-2} log n₀), then for any set S ⊂ R^N composed of n₀ points there exists a Lipschitz mapping V: R^N → R^M such that

(1 - α) ∥ S_{1} - S_{2} ∥_{2} \leq ∥ V S_{1} - V S_{2} ∥_{2} \leq (1 + α) ∥ S_{1} - S_{2} ∥_{2}

(13)

for every S₁, S₂ ∈ S. The JL lemma shows that the set S in N-dimensional Euclidean space can be mapped to M-dimensional Euclidean space by the compression matrix V. It thus suggests an approach to compression and dimension reduction: if we are able to design a suitable compression projection V, then data processing that would be carried out in the original high-dimensional space can be carried out in a low-dimensional space instead. Johnson and Lindenstrauss demonstrated inequality (13) and the existence of V from the perspective of geometric approximation. However, they did not indicate how to design V for a specific data set [33].

In subsequent studies, Dasgupta and Gupta proved the JL lemma using probability theory [34] and pointed out that the entries v_ij of the matrix V can be drawn as independent Gaussian random variables, that is, $v_{ij} \sim N (0, 1)$ . When the number of measurements satisfies M ≥ 4(α²/2 - α³/3)^{-1} ln n₀, inequality (13) holds with high probability. However, Gaussian entries v_ij are continuous-valued floating-point numbers, which are difficult to integrate or realize physically in many areas of engineering.
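To give a feeling for the size of the bound M ≥ 4(α²/2 - α³/3)^{-1} ln n₀, the following minimal sketch evaluates it for a few distortion levels. The function name `gaussian_jl_min_dim` and the chosen values of α and n₀ are assumptions made for illustration.

```python
import math


def gaussian_jl_min_dim(alpha, n0):
    """Smallest integer M with M >= 4 / (alpha^2/2 - alpha^3/3) * ln(n0)."""
    return math.ceil(4.0 / (alpha ** 2 / 2 - alpha ** 3 / 3) * math.log(n0))


# Illustrative point-set size; not a value from the article.
N_POINTS = 1000

for alpha in (0.1, 0.3, 0.5):
    print(f"alpha = {alpha}: M >= {gaussian_jl_min_dim(alpha, N_POINTS)}")
```

For example, with α = 0.5 and n₀ = 1000 the factor 4/(0.125 - 0.0417) equals 48 and ln 1000 ≈ 6.908, so M ≥ 332. The bound grows only logarithmically with the number of points but roughly as α^{-2} with the allowed distortion, and it is a sufficient condition for arbitrary point sets rather than a requirement for any particular data set.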

Achlioptas simplified the proof of the JL lemma from the perspective of probability theory [35] and gave a simpler, more easily implemented random compression matrix. If, for a parameter β > 0, M is an integer satisfying

M \geq (\frac{4 + 2 β}{α^{2} / 2 - α^{3} / 3}) ln (n_{0}),

(14)

and the projection entry v_ij follows the symmetric Bernoulli distribution

v_{ij} := \begin{cases} + 1 & \text{with probability } 0.5 \\ - 1 & \text{with probability } 0.5, \end{cases}

(15)

or the sparse distribution

v_{ij} := \sqrt{3} \begin{cases} + 1 & \text{with probability } \frac{1}{6} \\ 0 & \text{with probability } \frac{2}{3} \\ - 1 & \text{with probability } \frac{1}{6}, \end{cases}

(16)

then inequality (13) holds with high probability (at least 1 - n₀^{-β} in Achlioptas’s result [35]). Because of its simple physical interpretation, the Bernoulli distribution-based compressed projection matrix V is widely applied in engineering.
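The two distributions of Equations 15 and 16 and the bound of Equation 14 can be explored with a small numerical sketch. The dimensions, the random test points, and the helper names are assumptions made for illustration; the 1/√M scaling applied to the projection is the usual normalization in Achlioptas’s construction and is stated here as an assumption, since the text above leaves it implicit.

```python
import math
import numpy as np


def achlioptas_min_dim(alpha, beta, n0):
    """Smallest integer M satisfying Equation 14."""
    return math.ceil((4 + 2 * beta) / (alpha ** 2 / 2 - alpha ** 3 / 3)
                     * math.log(n0))


def bernoulli_matrix(M, N, rng):
    """Equation 15: entries +1 or -1 with probability 0.5 each."""
    return rng.choice([-1.0, 1.0], size=(M, N))


def sparse_matrix(M, N, rng):
    """Equation 16: sqrt(3) * {+1, 0, -1} with probabilities 1/6, 2/3, 1/6."""
    return math.sqrt(3) * rng.choice([-1.0, 0.0, 1.0], size=(M, N),
                                     p=[1 / 6, 2 / 3, 1 / 6])


def distance_ratios(V, points):
    """Ratio of projected to original distance for every pair of points."""
    projected = points @ V.T / math.sqrt(V.shape[0])  # assumed 1/sqrt(M) scaling
    ratios = []
    for a in range(len(points)):
        for b in range(a + 1, len(points)):
            ratios.append(np.linalg.norm(projected[a] - projected[b])
                          / np.linalg.norm(points[a] - points[b]))
    return np.array(ratios)


rng = np.random.default_rng(0)
N, M, N_POINTS = 1000, 400, 20            # illustrative sizes only
points = rng.normal(size=(N_POINTS, N))

V_sparse = sparse_matrix(M, N, rng)
ratios_bernoulli = distance_ratios(bernoulli_matrix(M, N, rng), points)
ratios_sparse = distance_ratios(V_sparse, points)

print("Equation 14 bound for alpha=0.5, beta=1, n0=1000:",
      achlioptas_min_dim(0.5, 1.0, 1000))
print("Bernoulli ratio range:", ratios_bernoulli.min(), ratios_bernoulli.max())
print("Sparse ratio range:   ", ratios_sparse.min(), ratios_sparse.max())
```

In this sketch the pairwise distance ratios stay close to 1 for both matrices, even though two thirds of the entries of the sparse matrix are zero. That zero fraction is what makes Equation 16 attractive for a physical mask, since a zero entry corresponds to a cell that is simply blocked.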

4 Compressive infrared sampling

4.1 Random matrix-based compressive infrared sampling

Random matrix-based dimension reduction and CS theory provide powerful tools for the design of compressive infrared sampling. According to Achlioptas’s statistical results [35], if v_ij is a random variable with the symmetric Bernoulli distribution, then the matrix V achieves dimension reduction with the approximate distance-preserving property described in inequality (13). As described in Section 2, a sensing matrix with random symmetric Bernoulli entries can be realized by optical multiplexing, which the combination of a PIR sensor and a Fresnel lens implements physically. Specifically, for the random matrix of Equation 15, compressive infrared sampling is achieved by single-element PIR sensors and Fresnel lenses encoded with random masks; for the random matrix of Equation 16, it is achieved by randomly rotated quad-element PIR sensors and Fresnel lenses encoded with random masks. In this article, we adopt the second physical method for designing the compressive infrared sensing.

We assume that the body’s movement is constrained to a fixed interaction space, so that a specific motion with certain semantics is composed of an infrared sequence of moving body parts. According to sampling theory for the radiation space, the original interface space can be divided into coarser-grained, non-overlapping cells. This division is achieved by the isometric mapping between the interface space and the Fresnel lens. The sub-cells on the Fresnel lens play a role analogous to the pixels of a traditional visual sensor. When the body moves into the sensing space, the feature of motion can be represented by the changes of thermal radiation. Following the theory of random compression proposed by Achlioptas [35], we use the random distribution of Equation 16 to modulate the FOVs of the Fresnel lens. Figure 2b shows the Fresnel lens containing 25 non-overlapping cells; each cell is further divided into four sub-cells by the quad-element PIR sensor to form a symmetrical subtraction. We first select two thirds of all the FOVs of the Fresnel lens on each PIR sensor and mask them, which makes the changes of thermal radiation in those cells invisible to the PIR sensor. Then, we randomly select half of the sensors and rotate them by 90°. The combined operation gives the sensing matrix of the sensor array a pseudo-random property and yields the compressive infrared sampling of motion information:

m (t) = Vs (t) .

(17)

Figure 5 presents the diagram of the proposed compressed infrared sampling. In this article, we employ 16 PIR sensors to measure the motion information in the raw space in parallel, so that $m \in R^{16}$ . Each sub-lens of the Fresnel lens is further divided into four sub-cells by the quad-element PIR sensor, so the original measured space is divided into 25 × 4 = 100 non-overlapping sub-cells and $s \in R^{100}$ . The measurement matrix V ∈ R^{16 × 100} compresses the original 100-dimensional state of the thermal radiation changes into the 16-dimensional sensor output in a non-adaptive way.
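The masking and rotation procedure described above can be sketched as the construction of a 16 × 100 matrix with entries in {-1, 0, +1}. This is a hedged illustration, not the matrix of Figure 6a: rounding two thirds of 25 cells to 17 masked cells, the ordering of the four sub-cells within a cell, and the two sign patterns used for unrotated and rotated sensors are all assumptions.

```python
import numpy as np

N_SENSORS, N_CELLS, N_SUB = 16, 25, 4

# Two thirds of the 25 cells are masked; rounding to 17 is an assumption.
N_MASKED = round(2 * N_CELLS / 3)
N_VISIBLE = N_CELLS - N_MASKED

# Assumed sub-cell order: top-left, top-right, bottom-left, bottom-right.
PATTERN = np.array([1, 1, -1, -1])       # unrotated quad-element sensor
PATTERN_ROT = np.array([1, -1, 1, -1])   # sensor rotated by 90 degrees


def build_measurement_matrix(rng):
    """Pseudo-random sensing matrix V of shape (16, 100)."""
    V = np.zeros((N_SENSORS, N_CELLS * N_SUB), dtype=int)
    rotated = rng.choice(N_SENSORS, size=N_SENSORS // 2, replace=False)
    for j in range(N_SENSORS):
        visible = rng.choice(N_CELLS, size=N_VISIBLE, replace=False)
        pattern = PATTERN_ROT if j in rotated else PATTERN
        for i in visible:
            V[j, i * N_SUB:(i + 1) * N_SUB] = pattern
    return V, rotated


rng = np.random.default_rng(0)
V, rotated = build_measurement_matrix(rng)

# Sparse state of the thermal radiation changes: a few active sub-cells.
s = np.zeros(N_CELLS * N_SUB)
s[rng.choice(s.size, size=5, replace=False)] = 1.0

m = V @ s   # Equation 17: 16-dimensional sensor output
print(V.shape, m)
```

Each row contains the same number of +1 and -1 entries, so a uniform change over the whole interface space produces no output, while a localized change excites only the sensors whose unmasked cells cover it.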

4.2 Statistical analysis of random measurement matrix

Figure 6a presents the pseudo-random measurement matrix used in this article. The white pixels in this figure represent the visible entries ‘1’, the black pixels represent the visible entries ‘-1’, and the gray pixels denote the invisible entries ‘0’. Given the measurement matrix, it is necessary to verify its effectiveness with respect to inequality (13). However, because the specific set S and its number of elements n₀ are unknown, it is hard to demonstrate directly that inequality (13) holds with high probability.

Kaski proposed using the cosine of the angle between two vectors to measure the distortion of similarity under random compression [36]. His method gives a quantitative assessment of random compression. In this article, we employ his statistical results to assess the effectiveness of the pseudo-random measurement matrix. First, given two vectors s₁ and s₂, the inner product of the two measurement vectors m₁ and m₂ obtained with the random matrix V can be expressed as follows [36]:

m_{1}^{T} m_{2} = s_{1}^{T} V^{T} V s_{2} .

(18)

The matrix V^TV can be decomposed as V^TV = I + ε, where I is the identity matrix and the matrix ε contains the off-diagonal entries:

ε_{ij} := \begin{cases} v_{i}^{T} v_{j} & \text{for } i \neq j, \\ 0 & \text{for } i = j . \end{cases}

(19)

Then, Equation 18 can be rewritten as

m_{1}^{T} m_{2} = s_{1}^{T} s_{2} + \sum_{i \neq j} ε_{ij} s_{1_{i}} s_{2_{j}} .

(20)

The diagonal entries of the matrix V^TV should be equal to unity since the measurement vector v_i has normalized weights with equal probability, while the off-diagonal entries ε_ij should ideally be equal to zero [36]. However, the vectors v_i and v_j are not orthogonal in practice, so the off-diagonal entries ε_ij are small but not zero. As Equation 20 shows, these non-zero entries ε_ij distort the similarities of the original vectors.

If the random measurement matrix is fixed, the statistical properties of the entries ε_ij can be used to analyze the distortions generated by compression. The ideal average of ε_ij is E[ε_ij] = 0, with approximate variance $σ_{ε}^{2} \approx 1 / M = 0.0625$ for M = 16. From the variance $σ_{ε}^{2}$ , we can infer that more measurements and sparser original vectors generate smaller distortions when random compression is used. The actual average value of ε_ij for the matrix described above is -0.0105, and the variance is 0.1518. Figure 6b gives the ideal and actual distributions of ε_ij. Although the actual variance is larger than the ideal one, the distortion contributed by ε_ij is smaller when the vectors are sparse.
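The statistics of ε_ij can be reproduced for an idealized matrix with a few lines of code. The sketch below does not use the hardware matrix of Figure 6a, so it is not expected to reproduce the measured values -0.0105 and 0.1518; it assumes a dense symmetric Bernoulli matrix whose columns are scaled to unit length, and it also checks the decomposition of Equation 20 on two hypothetical sparse vectors.

```python
import numpy as np

M, N = 16, 100
rng = np.random.default_rng(0)

# Idealized matrix (assumption): +/-1 entries, columns scaled to unit norm.
V = rng.choice([-1.0, 1.0], size=(M, N)) / np.sqrt(M)

gram = V.T @ V                      # V^T V = I + eps
eps = gram - np.eye(N)
off_diagonal = eps[~np.eye(N, dtype=bool)]

eps_mean = off_diagonal.mean()
eps_var = off_diagonal.var()
print(f"mean of eps_ij: {eps_mean:.4f} (ideal 0)")
print(f"variance of eps_ij: {eps_var:.4f} (ideal 1/M = {1 / M:.4f})")

# Equation 20 on two sparse vectors (hypothetical test signals).
s1 = np.zeros(N)
s2 = np.zeros(N)
s1[[4, 30, 77]] = [1.0, 0.5, -1.0]
s2[[4, 31, 77]] = [0.8, 0.5, -1.2]

lhs = (V @ s1) @ (V @ s2)           # m1^T m2
rhs = s1 @ s2 + s1 @ eps @ s2       # s1^T s2 + sum of eps_ij s1_i s2_j
print(lhs, rhs)
```

Because only the products s_{1_i} s_{2_j} of non-zero components enter the distortion term, sparse vectors pick up only a few of the ε_ij, which is consistent with the remark that sparser vectors are distorted less.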

5 Experiments and results

5.1 Experimental setup

Figure 7a presents a PIR sensor module; both its length and width are 4 cm. The PIR sensor unit is located at the center of the module. Figure 7b shows the prototype of our proposed sensing system for the acquisition of motion information. The sensor array is composed of 16 quad-element PIR sensors. Commercially available D205b PIR sensors are employed to sample the thermal radiation changes [37]. Both the horizontal and vertical ranges of the sensor’s FOV are about 95°. We use the CC2430 system-on-chip (SoC) to sample the signal at a frequency of 10 Hz. Figure 7c shows the experimental setup for real-time measurement of body motion. We assume that the movements are restricted to a virtual box in front of the person. The distance between the sensor unit and the subject is 1.5 m. When the limbs of the body move through the interface region, the corresponding sensors are activated.

In order to test the validity of the proposed method, we designed a set of gymnastics to build a motion database. There are 14 kinds of gymnastic motions, including the local movements generated by the arms and legs and the synergistic motions of the upper and lower limbs. The ambient temperature is kept at 25°C. Figure 3 gives the sequential images of each kind of motion. Figure 8 shows the typical sensor output vector when a person walks across the interface space.

In the following experiments, we assume that the gymnastic motion performed by each member is constrained to the interaction region, so that the spatial and temporal differences between motions with the same semantics are small. General motion recognition and behavior understanding are more challenging problems. Although our assumptions simplify the motion recognition, the purpose of this article is to exploit compressed infrared sampling to acquire human motion directly in a compressive form.

5.2 Measurement sequence segmentation

We use the energy-based detection method to segment the motion signal directly. First, the collected signal is normalized by removing its direct current component. Second, the normalized signal is enframed by splitting the signal into overlapping frames and denoted as

m_{s} (t) = {[m_{s} (t - d), \dots, m_{s} (t), \dots, m_{s} (t + d)]}^{T},

(21)

where m_s(t) is the s th sensor’s frame vector at time t and d is the half-width of the frame, so that each frame contains 2d + 1 samples. In our subsequent experiments, d is 5. Third, the energy signal is obtained by accumulating the squared values in each column of the enframed signal, and a predefined threshold is used to generate the start and end boundaries. The overall motion segmentation is determined from the combined enframed signal M(t) = [m₁(t) … m₁₆(t)]^T.
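The three segmentation steps can be sketched as follows. The threshold value, the zero padding at the sequence borders, and the synthetic test signal are assumptions made for illustration; the frame half-width d = 5 and the 16 sensors follow the text.

```python
import numpy as np


def segment_by_energy(signal, d=5, threshold=0.5):
    """Energy-based segmentation of a (T, S) array of S sensor outputs.

    Returns a list of (start, end) sample indices (end exclusive) and the
    per-sample frame energy. The threshold is a hypothetical value.
    """
    # Step 1: remove the direct current component of each sensor.
    x = signal - signal.mean(axis=0)
    n_samples = x.shape[0]

    # Step 2: frames of 2d + 1 samples centered at each t (zero-padded ends).
    padded = np.pad(x, ((d, d), (0, 0)))

    # Step 3: energy of each frame, accumulated over all sensors.
    energy = np.array([np.sum(padded[t:t + 2 * d + 1] ** 2)
                       for t in range(n_samples)])

    # Threshold the energy and locate the start/end boundaries.
    active = (energy > threshold).astype(int)
    edges = np.diff(active, prepend=0, append=0)
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return list(zip(starts.tolist(), ends.tolist())), energy


# Synthetic 20 s recording at 10 Hz from 16 sensors: silence, then a burst.
signal = np.zeros((200, 16))
burst = np.sin(2 * np.pi * np.arange(40) / 10.0)
signal[80:120, :] = burst[:, None]

segments, energy = segment_by_energy(signal)
print(segments)
```

Because each frame spans d samples on either side of t, the detected boundaries extend a few samples beyond the true burst; a larger d smooths the energy curve at the cost of coarser boundaries.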

5.3 Hausdorff distance-based recognition method

The Hausdorff distance is a lightweight tool for measuring the similarity between two temporal sequences. It tolerates sequences of different lengths and timing offsets, and its output implicitly includes temporal constraints. The Hausdorff distance is widely used in temporal sequence matching and recognition [38, 39]. Figure 9 gives the flow chart of the recognition method.

Let us assume that a test sequence M₁ = [m₁ (1), …, m₁(t₁), …, m₁(T₁)] is given, and the reference sequence is M₂ = [m₂(1), …, m₂(t₂), …, m₂(T₂)]. T₁ and T₂ are the length of the two signal sequences, respectively. The average Hausdorff distance between two sequences is

D_{h} (M_{1}, M_{2}) = \underset{1 \leq t_{1} \leq T_{1}}{mean} (\min_{1 \leq t_{2} \leq T_{2}} (∥ {\hat{M}}_{1} (t_{1}) - {\hat{M}}_{2} (t_{2}) ∥)),

(22)

where ${\hat{M}}_{1}$ and ${\hat{M}}_{2}$ are the energy-normalized sequences. The Hausdorff distance is then defined as

D_{hausdorff} (M_{1}, M_{2}) = \frac{1}{2} [D_{h} (M_{1}, M_{2}) + D_{h} (M_{2}, M_{1})] .

(23)
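Equations 22 and 23 can be sketched in a few lines. The sketch assumes each sequence is stored as a (T, S) array with one row per time frame and one column per sensor, and it takes "energy normalization" to mean scaling the whole sequence to unit Frobenius norm; the article does not state the exact normalization, so that choice and the function names are assumptions.

```python
import numpy as np

def energy_normalize(seq):
    """Scale a (T, S) sequence to unit Frobenius norm (assumed form of energy normalization)."""
    seq = np.asarray(seq, dtype=float)
    norm = np.linalg.norm(seq)
    return seq / norm if norm > 0 else seq

def directed_distance(a, b):
    """Eq. 22: mean over frames of a of the Euclidean distance to the nearest frame of b."""
    diff = a[:, None, :] - b[None, :, :]       # shape (T1, T2, S)
    pairwise = np.linalg.norm(diff, axis=2)    # shape (T1, T2)
    return pairwise.min(axis=1).mean()

def hausdorff_distance(m1, m2):
    """Eq. 23: average of the two directed distances, which makes the measure symmetric."""
    a, b = energy_normalize(m1), energy_normalize(m2)
    return 0.5 * (directed_distance(a, b) + directed_distance(b, a))
```

Because the minimum is taken over all frames of the other sequence, the two inputs may have different lengths T₁ and T₂.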

If reference samples of different motions have been pre-stored in the database, an incoming test sequence is matched against each reference sample. The category of the test sequence is determined by the following nearest neighbor rule:

w^{*} = \underset{1 \leq w \leq W}{arg min} D_{hausdorff} (M_{1}, M_{w}),

(24)

where w^∗ is the recognized category, M_w is the reference sequence of category w, and W is the total number of categories in the motion database.
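The nearest neighbor rule of Eq. 24 can be illustrated as below. The sketch is self-contained, so it uses a compact symmetric average Hausdorff distance without the energy normalization step; the labels, the (T, S) array layout, and the function names are hypothetical. When several templates are stored per category, taking the minimum over all of them is one natural reading of the rule.

```python
import numpy as np

def avg_hausdorff(a, b):
    """Symmetric average Hausdorff distance between (T, S) arrays (normalization omitted)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    pair = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)   # (T1, T2) frame distances
    return 0.5 * (pair.min(axis=1).mean() + pair.min(axis=0).mean())

def classify(test_seq, references, distance=avg_hausdorff):
    """references: list of (label, sequence) pairs; returns the label of the closest reference."""
    label, _ = min(references, key=lambda item: distance(test_seq, item[1]))
    return label

# Toy usage with two hypothetical motion templates
references = [
    ("wave", np.array([[0.0, 0.0], [1.0, 1.0]])),
    ("squat", np.array([[5.0, 5.0], [6.0, 6.0]])),
]
test_seq = np.array([[0.9, 1.1], [0.1, 0.0], [1.0, 1.0]])
predicted = classify(test_seq, references)
```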

5.4 Experimental results and discussion

The number of measurements (dimensions) is an important parameter in the proposed sensing method. Through statistical experiments, we compare the recognition performance for different numbers of dimensions. To build the reference set, we randomly select half of the samples from each motion set as template samples; the remaining samples are used for testing and analysis. The following statistical results are based on 30 cross-validations.
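One way to read this protocol is as 30 independent random splits, each assigning half of every motion class to the template set. The sketch below follows that reading, which is an assumption; `evaluate` is a hypothetical callback that receives the template and test indices and returns a correct rate.

```python
import numpy as np

def random_half_split(labels, rng):
    """Return (template_idx, test_idx); half of each class, chosen at random, become templates."""
    labels = np.asarray(labels)
    template_idx = []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        template_idx.extend(idx[: idx.size // 2].tolist())
    template_idx = np.sort(np.array(template_idx, dtype=int))
    test_idx = np.setdiff1d(np.arange(labels.size), template_idx)
    return template_idx, test_idx

def repeated_accuracy(labels, evaluate, repeats=30, seed=0):
    """Average the score returned by evaluate(template_idx, test_idx) over repeated random splits."""
    rng = np.random.default_rng(seed)
    scores = [evaluate(*random_half_split(labels, rng)) for _ in range(repeats)]
    return float(np.mean(scores))
```

Fixing the seed makes the 30 splits reproducible, which helps when comparing sampling dimensions on identical splits.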

We first report the average similarity of six typical categories of motion against the other motions under different sampling dimensions, as shown in Figure 10. The normalized similarity is calculated from the reciprocal of the average Hausdorff distance. The results show that, at low sampling dimensions, each kind of motion sample is confusingly similar to the reference samples of other motions; this is especially so for one-dimensional infrared sampling of synergistic motions of the upper and lower limbs. In that case, the acquired information is insufficient to determine the category of the test motion. When the compressed sampling dimension increases to 6 or 11, each test motion attains its maximum normalized similarity only with its true category.

Figure 11 shows the three-dimensional confusion matrices obtained with the proposed sensing method for four sampling dimensions. Each confusion matrix shows the motion reference category (left) versus the test category (right). Each bar (m_i, M_j) denotes the percentage of motion M_j recognized as motion m_i. The percentage of correctly recognized motions can be obtained from the trace of the matrix; the remaining, lower bars represent the percentage of misclassification. When the sampling dimension is 1, the compressive sampling signal cannot form a distance-preserving map of the original high-dimensional space, which causes a high error rate. When the compressed sampling dimension increases to 6 or 11, the proposed method achieves a higher correct recognition rate.
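The relation between the confusion matrix and the correct rate can be made concrete with a short sketch. It assumes integer class labels and normalizes each column to percentages, so that dividing the trace by the number of classes gives the average correct rate when every class appears in the test set; the function names are illustrative.

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Entry (i, j) is the percentage of test samples of motion j recognized as motion i."""
    counts = np.zeros((n_classes, n_classes))
    for t, p in zip(true_labels, predicted_labels):
        counts[p, t] += 1
    col_totals = counts.sum(axis=0, keepdims=True)
    col_totals[col_totals == 0] = 1      # guard against classes with no test samples
    return 100.0 * counts / col_totals

def average_correct_rate(cm):
    """Trace of the percentage matrix divided by the number of classes."""
    return float(np.trace(cm) / cm.shape[0])
```

Off-diagonal entries in column j then show which motions are confused with motion j.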

Table 1 presents the average correct rate of the proposed sensing method for four compressive dimensions. When the compressed sampling dimension is 1, the average correct recognition rate is low. The reason is that, with so few measurements, the mapping from the original high-dimensional motion state signal to the low-dimensional sampling signal cannot preserve distances with low distortion with high probability. When the compressive dimension reaches 6, the sensing method performs better.

Table 1 The average correct rates and recognition time for motion detection with four types of dimensionality

Table 1 also gives the average recognition time for the different compressed sampling dimensions. The algorithms in our experimental studies were implemented in Matlab and run on an Intel Pentium 4 2.8-GHz computer. It can be seen that a larger compressed sampling dimension consumes more processing time, and that the time grows approximately linearly with the dimension under Hausdorff distance matching. The recognition is fast because the test feature is represented by a low-dimensional sequence: even with 16 dimensions, the average recognition time does not exceed 36 ms. For a real-time recognition system, designers can trade off the number of sampling dimensions against the sensing resources, data processing time, and average recognition rate.

6 Conclusions

A compressive sampling-based PIR sensor array for human motion sensing has been developed and evaluated in this article. PIR sensors with pseudo-random masked Fresnel lens arrays are used for efficient motion feature transformation. Compressive dimension reduction theory indicates that sparse or compressible motion information preserves its statistical features in the measurement space. By modeling the low-dimensional sequential features, we achieve motion recognition, as confirmed by the experiments. The proposed sensing method has two main advantages. First, the sensing module acquires and compresses motion information synchronously. Second, the problem of recognizing a high-dimensional signal is transformed into a low-dimensional space, which reduces the computational time of recognition.

However, there are some limitations in practical applications. First, the sensing method relies on the assumption that the motion is constrained to a predefined interaction region, so the distance between the body and the sensor array is fixed; in practice, many motions involve more freedom. Second, unlike the classical compressive sensing paradigm, the low-dimensional sensor outputs are used for motion classification without reconstruction; developing a more effective performance analysis method for the sensor system is our future work. Third, other parameters, such as the location of the sensor unit and the distance between the sensor unit and the subject, could also be considered to improve the performance of the system. Despite these limitations, the prototype sensor system provides a proof of concept for combining PIR sensors and CS theory for compressive motion acquisition.

References

1. Yang C, Hsu Y: A review of accelerometry-based wearable motion detectors for physical activity monitoring. Sensors 2010, 10(8):7772-7788. 10.3390/s100807772
2. Yilmaz A, Javed O, Shah M: Object tracking: a survey. ACM Comput. Surv 2006, 38(4):1-45.
3. Turaga P, Chellappa R, Subrahmanian V, Udrea O: Machine recognition of human activities: a survey. IEEE Trans. Circ. Syst. Video Technol 2008, 18(11):1473-1488.
4. Poppe R: A survey on vision-based human action recognition. Image Vis. Comput 2010, 28(6):976-990. 10.1016/j.imavis.2009.11.014
5. Wang L, Suter D: Visual learning and recognition of sequential data manifolds with applications to human movement analysis. Comput. Vis. Image Understand 2008, 110: 153-172. 10.1016/j.cviu.2007.06.001
6. Jolliffe I: Principal Component Analysis. Berlin: Springer; 1986.
7. Tenenbaum J, Silva V, Langford J: A global geometric framework for nonlinear dimensionality reduction. Science 2000, 290(5500):2319-2323. 10.1126/science.290.5500.2319
8. Roweis S, Saul L: Nonlinear dimensionality reduction by locally linear embedding. Science 2000, 290(5500):2323-2326. 10.1126/science.290.5500.2323
9. Tošić I, Frossard P: Dictionary learning. IEEE Signal Process. Mag 2011, 28(2):27-38.
10. Donoho DL: Compressed sensing. IEEE Trans. Inf. Theory 2006, 52: 1289-1306.
11. Baraniuk R: Compressive sensing. IEEE Signal Process. Mag 2007, 24(4):118-121.
12. Candès E, Wakin M: An introduction to compressive sampling. IEEE Signal Process. Mag 2008, 25(2):21-30.
13. Duarte M, Davenport M, Takhar D, Laska J, Sun T, Kelly K, Baraniuk R: Single-pixel imaging via compressive sampling. IEEE Signal Process. Mag 2008, 25(2):83-91.
14. Romberg J: Imaging via compressive sampling. IEEE Signal Process. Mag 2008, 25(2):14-20.
15. Johansson G: Visual perception of biological motion and a model for its analysis. Atten. Percept. Psychophys 1973, 14: 201-211. 10.3758/BF03212378
16. Johansson G: Visual motion perception. Sci. Am 1975, 232: 76-88.
17. Oikonomopoulos A, Patras I, Pantic M: Spatiotemporal salient points for visual recognition of human actions. IEEE Trans. Syst., Man Cybern. Part B: Cybernetics 2005, 36: 710-719.
18. Dollar P, Rabaud V, Cottrell G, Belongie S: Behavior recognition via sparse spatio-temporal features. In 2nd Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, 2005. Piscataway: IEEE; 2005:65-72.
19. Niebles J, Wang H, Fei-Fei L: Unsupervised learning of human action categories using spatial-temporal words. Int. J. Comput. Vis 2008, 79(3):299-318. 10.1007/s11263-007-0122-4
20. Fuksis R, Greitans M, Hermanis E: Motion analysis and remote control system using pyroelectric infrared sensors. Electron. Electrical Eng 2008, 86(6):69-72.
21. Sixsmith A, Johnson N: A smart sensor to detect the falls of the elderly. IEEE Pervasive Comput 2004, 3(2):42-47.
22. Burchett J, Shankar M, Hamza AB, Guenther BD, Pitsianis N, Brady DJ: Lightweight biometric detection system for human classification using pyroelectric infrared detectors. Appl. Optics 2006, 45(13):3031-3037. 10.1364/AO.45.003031
23. Fang JS, Hao Q, Brady DJ, Guenther BD, Hsu KY: Real-time human identification using a pyroelectric infrared detector array and hidden Markov models. Opt. Express 2006, 14(15):6643-6658. 10.1364/OE.14.006643
24. Fang JS, Hao Q, Brady DJ, Guenther BD, Hsu KY: A pyroelectric infrared biometric system for real-time walker recognition by use of a maximum likelihood principal components estimation (MLPCE) method. Opt. Express 2007, 15(6):3271-3284. 10.1364/OE.15.003271
25. Shankar M, Burchett JB, Hao Q, Guenther BD, Brady DJ: Human-tracking systems using pyroelectric infrared detectors. Opt. Eng 2006, 45: 106401. 10.1117/1.2360948
26. Hao Q, Brady D, Guenther B, Burchett J, Shankar M, Feller S: Human tracking with wireless distributed pyroelectric sensors. IEEE Sensors J 2006, 6(6):1683-1696.
27. Liu T, Guo X, Wang G: Elderly-falling detection using distributed direction-sensitive pyroelectric infrared sensor arrays. Multidimensional Syst. Signal Process 2012, 23: 451-467. 10.1007/s11045-011-0161-4
28. Liu T, Liu J: Feature-specific biometric sensing using ceiling view based pyroelectric infrared sensors. EURASIP J. Adv. Signal Process 2012. 10.1186/1687-6180-2012-206
29. Mitra S, Acharya T: Gesture recognition: a survey. IEEE Trans. Syst. Man Cybernet. Part C: Appl. Rev 2007, 37(3):311-324.
30. Hossain A, Rashid M: Pyroelectric detectors and their applications. IEEE Trans. Ind. Appl 1991, 27(5):824-829. 10.1109/28.90335
31. Brady D, Pitsianis N, Sun X: Reference structure tomography. J. Opt. Soc. Am. A 2004, 21(7):1140-1147. 10.1364/JOSAA.21.001140
32. Lucas BD, Kanade T: An iterative image registration technique with an application to stereo vision. In The 7th International Joint Conference on Artificial intelligence, vol. 2. San Francisco: Morgan Kaufmann; 1981:674-679.
33. Johnson W, Lindenstrauss J: Extensions of Lipschitz mappings into a Hilbert space. In Conference in Modern Analysis and Probability, vol. 26. American Mathematical Society, Providence; 1984:189-206.
34. Dasgupta S, Gupta A: An elementary proof of the Johnson-Lindenstrauss Lemma. Technical report, International Computer Science Institute, Berkeley, 1999
35. Achlioptas D: Database-friendly random projections: Johnson-Lindenstrauss with binary coins. J. Comput. Syst. Sci 2003, 66: 671-687. 10.1016/S0022-0000(03)00025-4
36. Kaski S: Dimensionality reduction by random mapping: fast similarity computation for clustering. In The 1998 IEEE World Congress on Computational Intelligence, vol. 1. Piscataway: IEEE; 1998:413-418.
37. The PIR Sensor Co. Ltd. http://pirsensor.bloombiz.com. Accessed 7 May 2013
38. Sim DG, Kwon OK, Park RH: Object matching algorithms using robust Hausdorff distance measures. IEEE Trans. Image Process 1999, 8(3):425-429. 10.1109/83.748897
39. Kim SH, Park RH: An efficient algorithm for video sequence matching using the modified Hausdorff distance and the directed divergence. IEEE Trans. Circ. Syst. Video Technol 2002, 12(7):592-596. 10.1109/TCSVT.2002.800512

Acknowledgements

The authors would like to thank the anonymous reviewers for their constructive comments and suggestions. They also wish to thank all the staff of the Information Processing & Human-Robot Systems lab in Sun Yat-sen University for their aid in conducting the measurement experiments. This work is partly supported by the National Natural Science Foundation of Liaoning Province (grant no. 2013020008) and the National Natural Science Foundation of China (grant no. 61074167).

Author information

Authors and Affiliations

Department of Electronic Science, Huizhou University, Guangdong, 516001, China
Tong Liu
College of Physics & Electronic Information Engineering, Wenzhou University, Wenzhou, 325035, China
Jun Liu

Corresponding author

Correspondence to Jun Liu.

Additional information

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Liu, T., Liu, J. Design and implementation of a compressive infrared sampling for motion acquisition. EURASIP J. Adv. Signal Process. 2014, 20 (2014). https://doi.org/10.1186/1687-6180-2014-20

Received: 07 May 2013
Accepted: 16 January 2014
Published: 19 February 2014
DOI: https://doi.org/10.1186/1687-6180-2014-20
