

  • Research Article
  • Open Access

Efficient Human Action and Gait Analysis Using Multiresolution Motion Energy Histogram

  • 1,
  • 2,
  • 2 and
  • 2, 3
EURASIP Journal on Advances in Signal Processing 2010, 2010:975291

https://doi.org/10.1155/2010/975291

Received: 29 November 2009

Accepted: 10 February 2010

Published: 28 April 2010

Abstract

The Average Motion Energy (AME) image is an effective way to describe human motion. However, matching AME images becomes computationally expensive as the number of database templates grows. In this paper, we propose a histogram-based approach that improves computational efficiency by converting the human action/gait recognition problem into a histogram matching problem. To speed up recognition, we adopt a multiresolution structure on the Motion Energy Histogram (MEH). To exploit this multiresolution structure more efficiently, we propose an automated uneven partitioning method derived from the quadtree decomposition of the MEH. As a result, the computation time depends only on the number of partitioned histogram bins, which is much smaller than the number of pixels compared by the AME method. Two applications, action recognition and gait classification, are used in the experiments to demonstrate the feasibility and validity of the proposed approach.

Keywords

  • Recognition Rate
  • Action Recognition
  • Resolution Level
  • Motion Period
  • Human Action Recognition

1. Introduction

Analyzing human behavior or identity is an interesting research topic because humans are usually the objects of primary interest in applications such as surveillance systems and video understanding. This problem is usually addressed by two kinds of approaches: video-based approaches and sensor-based approaches [1, 2]. The advantage of video-based approaches is that individuals do not have to wear additional devices and the hardware cost is lower. For video-based approaches, previous researchers have produced a considerable body of work, such as template matching [3, 4], intensity-based features [5, 6], shape matching [7], and spatial-temporal features [8–13]. Among spatial-temporal features, motion energy images (MEIs) are a very useful feature that incorporates temporal information into spatial images. The idea of the MEI was first introduced by Bobick and Davis in [14]. The authors obtained the MEI by collecting a group of frames and extracted scale-invariant features for recognition. This idea was extended to the so-called average motion energy (AME) by aligning and normalizing the foreground silhouettes [15]. In this way, the AME depicts human motion in a two-dimensional space (the spatial domain) while preserving the temporal motion information. Unlike other approaches in [10], generating the AME is computationally inexpensive and can be employed in real-time applications. Moreover, other researchers have shown that the AME provides reliable accuracy for action recognition [16]. Recently, some researchers have focused on more challenging problems such as gender classification [17, 18] and gait classification [19–24]. The AME idea was also employed in [22], where the AME is called the gait energy image (GEI).

In [15], the authors directly employed the sum of absolute differences (SAD) for action recognition and obtained adequate recognition results. However, computing the SAD is inefficient for large images because its cost grows with the number of pixels. This is not a serious problem when the image database is small, but it becomes one as the database grows. To remedy this problem, we propose a histogram-based approach that efficiently computes the similarity between patterns. First, the AME image is converted to the motion energy histogram (MEH). Then, we adopt a multiresolution structure to construct the multiresolution motion energy histogram (MRMEH). Finally, we propose an uneven partitioning method that automatically identifies the important parts of the MEH, and we apply an efficient histogram matching algorithm that exploits the characteristics of the multiresolution histogram.
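To make the cost argument concrete, the following Python sketch implements an AME-SAD baseline with a 1-NN decision rule, assuming each AME image is stored as an equally sized 2-D list; the names `sad` and `recognize_by_sad` are illustrative and not taken from [15]. It also counts pixel differences to show that every template costs one operation per pixel.

```python
from typing import Dict, List, Optional, Tuple

Image = List[List[float]]  # an AME image stored row by row


def sad(a: Image, b: Image) -> float:
    """Sum of absolute differences over every pixel of two equally sized images."""
    return sum(abs(pa - pb) for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


def recognize_by_sad(query: Image, database: Dict[str, Image]) -> Tuple[Optional[str], int]:
    """1-NN matching; also returns how many pixel differences were evaluated."""
    best_label: Optional[str] = None
    best_dist = float("inf")
    pixel_ops = 0
    for label, template in database.items():
        dist = sad(query, template)
        pixel_ops += len(query) * len(query[0])  # every template costs H*W operations
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, pixel_ops


db = {"walk": [[0, 1], [1, 0]], "wave": [[1, 1], [0, 0]]}
print(recognize_by_sad([[0, 1], [1, 1]], db))
```

Each additional template adds H×W operations, so for 256-by-256 images the cost grows by 65536 operations per template regardless of how dissimilar the template is.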

2. Histogram-Based Method

In this section, we introduce the proposed histogram-based method for human action recognition. Section 2.1 gives an overview of the proposed framework. The generation of the AME and the MEH is addressed in Sections 2.2 and 2.3, respectively. Section 2.4 briefly describes the characteristics of the multiresolution histogram. The construction and utilization of the MRMEH are elaborated in Sections 2.5 and 2.6, followed by the time complexity analysis of the proposed method in Section 2.7.

2.1. System Overview

Figure 1 shows the overview of the system. Firstly, we extract the motion period from the input video. We can obtain the Average Motion Energy image based on the extracted motion period and convert it to the Motion Energy Histogram (MEH). Then we perform quadtree decomposition on the MEH to construct the Multiresolution Motion Energy Histogram (MRMEH). Finally we apply an efficient histogram matching algorithm on MRMEH to obtain the recognition result.
Figure 1

System Overview.

2.2. Average Motion Energy (AME)

Before elaborating on the proposed histogram-based method, we first introduce the average motion energy (AME) image proposed by Wang and Suter [15]. Given a set of aligned binary human silhouettes $B_t(x,y)$, $t = 1, \dots, N$, the AME is defined as follows:
$$\mathrm{AME}(x,y) = \frac{1}{N}\sum_{t=1}^{N} B_t(x,y), \qquad (1)$$
where $N$ is the number of frames. In their work, $N$ is set to the motion period of the input action. There have been many studies on periodicity detection of motions. For example, in [20], the authors adopted a simple strategy to extract the motion cycle: they observed the number of foreground pixels in the silhouette of each frame over time, denoted here by $n(t)$, and extracted the motion period by seeking the local maxima of $n(t)$. This strategy works fairly well for walking. However, for some other periodic motions such as waving hands or jumping, the size of the foreground silhouette does not change much. Figure 2 shows the variation of $n(t)$ for two motions, where Figure 2(a) corresponds to "walking" and Figure 2(b) to "waving one hand". There are obvious local maxima and minima of $n(t)$ in the walking motion but not in the waving motion. Therefore, we adopt a more robust method that measures the object's self-similarity over time [25]. If a periodic motion exists in the video sequence, similar poses appear in different frames; in other words, a high correlation value is obtained when comparing the foreground blobs of frames that are one period apart. The motion period can then be obtained by subtracting the indices of these frames.
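A minimal Python sketch of Eq. (1) and of period detection follows, assuming binary silhouettes stored as 2-D lists. The period search is a simplified stand-in for the self-similarity method of [25]: it only compares frame 0 with later frames and picks the lag with the fewest differing pixels, and `min_lag` is a hypothetical guard against trivially similar neighboring frames.

```python
from typing import List, Optional

Frame = List[List[int]]  # binary silhouette: 1 = foreground, 0 = background


def average_motion_energy(frames: List[Frame]) -> List[List[float]]:
    """Eq. (1): per-pixel mean of N aligned binary silhouettes."""
    n = len(frames)
    h, w = len(frames[0]), len(frames[0][0])
    return [[sum(f[y][x] for f in frames) / n for x in range(w)] for y in range(h)]


def frame_distance(a: Frame, b: Frame) -> int:
    """Number of pixels whose silhouette labels differ (0 = identical poses)."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def estimate_period(frames: List[Frame], min_lag: int = 2) -> Optional[int]:
    """Simplified self-similarity search: the smallest lag (>= min_lag) whose
    frame is most similar to frame 0 is taken as the motion period."""
    best_lag: Optional[int] = None
    best_dist: Optional[int] = None
    for lag in range(min_lag, len(frames)):
        d = frame_distance(frames[0], frames[lag])
        if best_dist is None or d < best_dist:  # strict '<' keeps the smallest lag on ties
            best_lag, best_dist = lag, d
    return best_lag


p0, p1, p2 = [[1, 0], [0, 0]], [[0, 1], [0, 0]], [[0, 0], [1, 0]]
video = [p0, p1, p2, p0, p1, p2, p0]
period = estimate_period(video)
ame = average_motion_energy(video[:period])
```

With real silhouettes, noise makes exact repeats rare, so a practical version would examine the whole self-similarity matrix rather than a single row.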
Figure 2

The size of the foreground silhouette of two motions over time: (a) walking and (b) waving one hand.

2.3. Motion Energy Histogram (MEH)

From (1), the pixel value $\mathrm{AME}(x,y)$ represents the motion intensity at position $(x,y)$. From a histogram point of view, we can regard the AME as a two-dimensional histogram whose bin value represents the frequency at position $(x,y)$ during the time interval of $N$ frames. Thus, we can convert the AME to the motion energy histogram (MEH) using the following equation:
(2)
After transforming the AME to the MEH, recognizing two different MEHs becomes a histogram matching problem, so various properties of histograms can be employed to improve recognition performance. We observe that the MEHs of two entirely different actions can be distinguished even at very low resolution. For example, Figure 3 illustrates the MEHs of two actions: walking and waving both hands. The left column shows the original MEHs of these two actions, and the right column shows the corresponding MEHs at low resolution. Even at such a low resolution, the two actions can still be distinguished perceptually. In other words, if we can classify actions at a lower resolution level rather than comparing them directly at the highest resolution level, the whole recognition process becomes more efficient because fewer histogram bins are compared. However, the recognition rate may decrease at low resolution levels. To maintain the same recognition rate as the method proposed in [15], we adopt a multiresolution structure on the histogram, as detailed in the following subsections.
Figure 3

MEHs of two periodic motions (a) and their corresponding low-resolution images (b).

2.4. Characteristic of Multiresolution Histogram

The basic idea of the multiresolution histogram was first introduced in [26] and was further extended to a general form by Yu et al. [27]. A specified partitioning of the histogram bins is used to reduce the resolution of a histogram, and performing this operation recursively yields a pyramid structure of multiresolution histograms. For a given histogram $X = (x_1, \dots, x_n)$ with $n$ bins, the nonuniform partitioning process for the multiresolution structure is as follows. We first divide the $n$ bins into $m$ disjoint subsets $S_1, \dots, S_m$, with $m < n$. A histogram $Y = (y_1, \dots, y_m)$ with $m$ bins is defined as the lower-resolution version of $X$, and the bin values of $Y$ are given by
$$y_j = \sum_{i \in S_j} x_i, \qquad j = 1, \dots, m. \qquad (3)$$
As Yu et al. show in [27], for two histograms $X_1$ and $X_2$ and their lower-resolution versions $Y_1$ and $Y_2$ obtained with the same partition, the following inequality holds:
$$d(Y_1, Y_2) \le d(X_1, X_2), \qquad (4)$$
where $d$ denotes the dissimilarity measurement function (the $L_1$-norm or the other distance measure considered in [27]). By applying the nonuniform partitioning method iteratively, we can form several multiresolution histograms, and the inequality in (4) extends to the chain
$$d\big(X_1^{(1)}, X_2^{(1)}\big) \le d\big(X_1^{(2)}, X_2^{(2)}\big) \le \cdots \le d\big(X_1^{(L)}, X_2^{(L)}\big), \qquad (5)$$

where $X^{(l)}$ denotes the histogram at resolution level $l$ with $n_l$ bins, and $n_1 \le n_2 \le \cdots \le n_L = |U|$, where $|U|$ denotes the number of elements in the universal bin set $U$. We will introduce how to construct a multiresolution motion energy histogram in the next subsection.
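For the $L_1$ distance, inequality (4) follows from the triangle inequality, since $|\sum_{i\in S_j}(x_{1,i}-x_{2,i})| \le \sum_{i\in S_j}|x_{1,i}-x_{2,i}|$ for every subset $S_j$. The small Python check below, with hypothetical histogram values, merges bins as in (3) and shows that coarser levels never report a larger distance.

```python
from typing import List, Sequence


def l1(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 distance between two histograms defined over the same bins."""
    return sum(abs(x - y) for x, y in zip(a, b))


def merge_bins(hist: Sequence[float], subsets: List[List[int]]) -> List[float]:
    """Eq. (3): each lower-resolution bin is the sum of one disjoint subset of bins."""
    return [sum(hist[i] for i in s) for s in subsets]


x1, x2 = [3, 0, 2, 5], [1, 2, 2, 1]
level2 = [[0, 1], [2, 3]]   # two subsets -> 2 bins
level1 = [[0, 1, 2, 3]]     # one subset  -> 1 bin
print(l1(merge_bins(x1, level1), merge_bins(x2, level1)),
      l1(merge_bins(x1, level2), merge_bins(x2, level2)),
      l1(x1, x2))
```

The three printed distances form the chain in (5): each coarser level gives a value no larger than the next finer one.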

2.5. Construct the Multiresolution Motion Energy Histogram (MRMEH)

As mentioned in Section 2.4, a multiresolution histogram can be constructed by summing certain bins at the higher-resolution level. However, to make this multiresolution structure effective, the selection of the disjoint subsets is very important. As addressed in [26], uniform partitioning is a simple and straightforward approach. However, Yu et al. [27] showed that nonuniform partitioning performs better than uniform partitioning. Hence, we create the subsets via a nonuniform partitioning method, and we adopt a straightforward strategy to generate them automatically. Since different actions have different characteristics, it is reasonable to use different partitions for different types of actions. For an image, a quadtree represents a partition of two-dimensional space obtained by recursively decomposing the image into four equal quadrants, subquadrants, and so on. Each node in the quadtree has either exactly four children or no children (a leaf node), and each leaf node contains the data of a specific subregion of the image. With a proper decomposition criterion, the quadtree automatically allocates finer partitions to the important parts of an image, which makes it an efficient structure for representing two-dimensional data. In our work, the decomposition rule is "nonzero value": if the current quadrant contains nonzero pixel values, it is decomposed into four subquadrants. Since quadtree decomposition divides the image into four subregions at each step, the width and height of the image must be a power of 2. Therefore, we set the number of MEH bins in each dimension to 256. Figure 4 shows the decomposition result of an MEH of the motion "walking" with a minimum region size of 64.
Figure 4

Quadtree decomposition results of a walking motion.
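The "nonzero value" decomposition rule can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation; it assumes the MEH is a square 2-D list whose side is a power of 2, and it treats `min_size` as the side length of the smallest allowed region (the text does not state whether the "minimum region size of 64" refers to a side or an area).

```python
from typing import List, Tuple

Block = Tuple[int, int, int]  # (row, col, size) of a square region


def has_nonzero(img: List[List[float]], r: int, c: int, size: int) -> bool:
    return any(img[y][x] != 0 for y in range(r, r + size) for x in range(c, c + size))


def quadtree_leaves(img: List[List[float]], min_size: int) -> List[Block]:
    """'Nonzero value' rule: a square region is split into four quadrants while it
    contains a nonzero value and its side is larger than min_size."""
    n = len(img)
    if n != len(img[0]) or n & (n - 1):
        raise ValueError("the MEH must be square with a power-of-2 side")
    leaves: List[Block] = []

    def visit(r: int, c: int, size: int) -> None:
        if size > min_size and has_nonzero(img, r, c, size):
            half = size // 2
            for dr, dc in ((0, 0), (0, half), (half, 0), (half, half)):
                visit(r + dr, c + dc, half)
        else:
            leaves.append((r, c, size))

    visit(0, 0, n)
    return leaves


meh = [[0] * 4 for _ in range(4)]
meh[0][0] = 5
print(quadtree_leaves(meh, min_size=1))
```

Only the quadrant containing the nonzero value is refined, while the empty quadrants stay as single large leaves, which is how the decomposition concentrates bins on the informative part of the MEH.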

In this paper, we construct the multiresolution motion energy histogram (MRMEH) from the quadtree decomposition of the MEH, as illustrated in Figure 5. The maximum resolution level is the depth of the tree. Nodes in different colors represent the histogram bins of certain resolution level(s), and the value of each node is the sum of all pixel values in the corresponding subregion. Level $l$ has at most $4^l$ bins, reached when every node at level $l-1$ is decomposed into four children. In real-world applications, however, each resolution level has fewer than $4^l$ bins because not every node is decomposed under the decomposition criterion. More specifically, at some resolution levels the bin set does not cover the universal bin set $U$, because not every node is decomposed down to that level. To solve this problem, if a bin at level $l$ is indecomposable, we copy it to the higher resolution levels to retain its information. In this way, the histogram bins at each resolution level always cover $U$. As shown in Figure 5, levels 2 and 3 then have 10 and 22 histogram bins, respectively. Figure 6 shows the MRMEH of the action "running", where the two-dimensional histograms are reshaped into one dimension for clarity. Figure 6(a) is the MEH and Figure 6(b) is the quadtree decomposition result; Figures 6(c) to 6(g) are the corresponding MRMEHs at resolution levels 1 to 5, respectively. In the next subsection, we introduce how to use the MRMEH to recognize human actions.
Figure 5

The idea of constructing an MRMEH.

Figure 6

A five-level MRMEH for the motion "running". (a) Original MEH. (b) Quadtree decomposition result. (c)–(g) Corresponding histogram from levels 1 to 5, respectively.
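The copying rule for indecomposable nodes can be expressed by cutting the quadtree recursion at a given depth: any node that stops splitting earlier simply stays in the partition of every finer level. The Python sketch below assumes the "nonzero value" criterion and an MEH stored as a square 2-D list; `level_blocks` and `mrmeh` are hypothetical names.

```python
from typing import List, Tuple

Block = Tuple[int, int, int]  # (row, col, size)


def has_nonzero(img: List[List[float]], r: int, c: int, size: int) -> bool:
    return any(img[y][x] != 0 for y in range(r, r + size) for x in range(c, c + size))


def block_sum(img: List[List[float]], r: int, c: int, size: int) -> float:
    return sum(img[y][x] for y in range(r, r + size) for x in range(c, c + size))


def level_blocks(img: List[List[float]], level: int) -> List[Block]:
    """Partition used at `level` (level 1 = the four quadrants of the MEH).
    A node that is not decomposed (no nonzero value, or a single pixel) is
    copied unchanged to every finer level, so the blocks always tile the MEH."""
    blocks: List[Block] = []

    def visit(r: int, c: int, size: int, depth: int) -> None:
        if depth < level and size > 1 and has_nonzero(img, r, c, size):
            half = size // 2
            for dr, dc in ((0, 0), (0, half), (half, 0), (half, half)):
                visit(r + dr, c + dc, half, depth + 1)
        else:
            blocks.append((r, c, size))

    visit(0, 0, len(img), 0)
    return blocks


def mrmeh(img: List[List[float]], max_level: int) -> List[List[float]]:
    """For each level 1..max_level, one bin per block, holding the sum of its pixels."""
    return [[block_sum(img, *b) for b in level_blocks(img, l)] for l in range(1, max_level + 1)]


meh = [[0] * 4 for _ in range(4)]
meh[0][0] = 5
for l, hist in enumerate(mrmeh(meh, 3), start=1):
    print(l, hist)
```

With this toy MEH, levels 1, 2, and 3 have 4, 7, and 7 bins, and every level sums to the same total because the blocks always tile the whole MEH.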

2.6. Efficient Action/Gait Recognition Using MRMEH

As mentioned in Section 2.4, for two histograms partitioned in the same way, the distances at different resolution levels obey inequality (4): the distance at a lower resolution level never exceeds the distance at a higher resolution level. Therefore, if the distance at a lower resolution level already exceeds the current threshold, further comparison at higher resolution levels is unnecessary, and we can speed up recognition by comparing fewer histogram bins. Note that all compared histograms must use the same partitioning so that the distances between different actions are comparable. Thus, we adopt a dynamic partitioning method: we derive the partition from the query MEH and apply it to all MEHs in the database. Assume that there are $K$ predefined actions in the database. The matching algorithm is described in Algorithm 1.

Algorithm 1: Algorithm for action recognition

Input: Q: MEH of the query action, of size N-by-N
Output: The best-matched action c*
(1) Initialize min_lv; max_lv; th ← Inf;
(2) Construct the MRMEH of Q from min_lv to max_lv based on the quadtree decomposition result
    of Q. Memorize the partitioning method M_p.
(3) for all D_k in the database (k = 1, ..., K)
(4)   for l = min_lv : max_lv
(5)     find the histogram D_k^l of D_k at resolution level l according to M_p
(6)     Dist ← d(Q^l, D_k^l);
(7)     if Dist ≥ th
(8)     then discard D_k and stop comparing it;
(9)     end
(10)    if l = max_lv
(11)    then c* ← k; th ← Dist;
(12)    end
(13)  end
(14) end
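A compact Python rendering of Algorithm 1 under the $L_1$ distance is sketched below. It is a hedged illustration rather than the authors' code: the partition $M_p$ is derived from the query only, each template $D_k$ is summed over the same blocks, and a template is discarded as soon as its lower-resolution distance reaches the current threshold `th`. For brevity it sums pixels directly instead of using the precomputed trees of Section 2.7.

```python
import math
from typing import Dict, List, Optional, Tuple

Image = List[List[float]]
Block = Tuple[int, int, int]  # (row, col, size)


def has_nonzero(img: Image, r: int, c: int, size: int) -> bool:
    return any(img[y][x] != 0 for y in range(r, r + size) for x in range(c, c + size))


def block_sum(img: Image, r: int, c: int, size: int) -> float:
    return sum(img[y][x] for y in range(r, r + size) for x in range(c, c + size))


def level_blocks(img: Image, level: int) -> List[Block]:
    """Quadtree partition of `img` at `level`; undecomposed nodes are copied down."""
    blocks: List[Block] = []

    def visit(r: int, c: int, size: int, depth: int) -> None:
        if depth < level and size > 1 and has_nonzero(img, r, c, size):
            half = size // 2
            for dr, dc in ((0, 0), (0, half), (half, 0), (half, half)):
                visit(r + dr, c + dc, half, depth + 1)
        else:
            blocks.append((r, c, size))

    visit(0, 0, len(img), 0)
    return blocks


def recognize(query: Image, database: Dict[str, Image], min_lv: int, max_lv: int
              ) -> Tuple[Optional[str], float, int]:
    """Branch-and-bound matching in the spirit of Algorithm 1 with the L1 distance.
    Returns (best label, its distance, number of bins compared)."""
    levels = range(min_lv, max_lv + 1)
    partition = {l: level_blocks(query, l) for l in levels}   # M_p, from the query only
    q_hist = {l: [block_sum(query, *blk) for blk in partition[l]] for l in levels}
    th, best, compared = math.inf, None, 0
    for label, meh in database.items():
        for l in levels:
            d_hist = [block_sum(meh, *blk) for blk in partition[l]]  # same partition for D_k
            compared += len(d_hist)
            dist = sum(abs(a - b) for a, b in zip(q_hist[l], d_hist))
            if dist >= th:    # lower bound already too large: discard D_k
                break
            if l == max_lv:   # survived every level: new best match
                best, th = label, dist
    return best, th, compared


q = [[0, 0, 0, 0], [0, 3, 1, 0], [0, 2, 0, 0], [0, 0, 0, 0]]
far = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 4, 0], [0, 0, 2, 0]]
db = {"b": far, "a": [row[:] for row in q], "c": far}
print(recognize(q, db, 1, 2))
```

In this toy run, template "b" sets the threshold, "a" matches exactly at both levels, and "c" is discarded after its 4 level-1 bins, giving 38 compared bins in total.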

2.7. Time Complexity Analysis

In this section, we analyze the time complexity of the MRMEH method and the AME-SAD method [15] by counting the number of operations. Assume that the MEH is an $N$-by-$N$ two-dimensional histogram. According to Algorithm 1, the proposed method must apply the quadtree decomposition before starting the matching procedure, so its cost must be taken into account. The number of operations of the quadtree decomposition lies between a lower and an upper bound determined by $N$; we denote this cost by $T_Q$. Since the number of bins at each resolution level is determined dynamically, it is difficult to estimate how many bins in total will be compared for each query. Here, we assume that this branch-and-bound algorithm compares $b$ bins to decide whether to eliminate a candidate MEH. Then, for each $D_k$, constructing the corresponding MRMEH requires reading every pixel of $D_k$ at least once (at least $N^2$ operations), and matching requires $b$ operations. This is even worse than the AME-SAD method, which takes $N^2$ operations per template. However, the cost for each $D_k$ can be reduced to $b$ by storing the decomposition information in advance. Hence, the total number of operations becomes $T_Q + Kb$, and the recognition time depends on $b$ rather than growing with $N^2$. In our experiments, $b$ is much smaller than $N^2$, and several quantitative results verify this approach.

To store the decomposition information in advance, we divide the recognition process into offline and online procedures. In the offline procedure, we construct a complete quadtree for each action in the database; that is, all leaf nodes lie at the deepest level of the tree, and the value of each nonleaf node is the sum of the values of its four children. In the online procedure, after obtaining the quadtree decomposition result of $Q$, we can directly read the value of each corresponding node from the preconstructed tree. In this way, the computation cost depends only on the number of bins compared with $D_k$, which is $b$ by assumption. For simplicity, we use a binary tree to illustrate this idea. As shown in Figure 7, the right column is the preconstructed tree in the database. Given the query tree structure in the left column of Figure 7, the corresponding histograms at levels 1 to 3 are shown from the second row to the fourth row of the right column, respectively. Thus, it is easy to find the nodes needed to construct the histogram at a specific resolution level.
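The offline/online split can be illustrated with a sum pyramid, which stores a complete quadtree level by level. The sketch assumes a square MEH with a power-of-2 side and blocks aligned to the quadtree grid, as produced by a quadtree decomposition; the function names are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def build_sum_pyramid(img: Grid) -> List[Grid]:
    """Offline: complete quadtree stored level by level. pyramid[0] is the
    full-resolution MEH; pyramid[k] holds the sums of 2**k-by-2**k blocks."""
    pyramid = [img]
    while len(pyramid[-1]) > 1:
        p = pyramid[-1]
        half = len(p) // 2
        # each parent node is the sum of its four children
        pyramid.append([
            [p[2 * y][2 * x] + p[2 * y][2 * x + 1] + p[2 * y + 1][2 * x] + p[2 * y + 1][2 * x + 1]
             for x in range(half)]
            for y in range(half)
        ])
    return pyramid


def lookup_bin(pyramid: List[Grid], r: int, c: int, size: int) -> float:
    """Online: value of the aligned square block (r, c, size) in a single lookup."""
    k = size.bit_length() - 1  # size == 2**k selects pyramid level k
    return pyramid[k][r // size][c // size]


img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
pyr = build_sum_pyramid(img)
print(lookup_bin(pyr, 0, 2, 2), lookup_bin(pyr, 2, 0, 2), lookup_bin(pyr, 0, 0, 4))
```

Storing the full pyramid costs roughly $\frac{4}{3}N^2$ values per template, after which every bin requested by the query partition is a single lookup instead of a block summation.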
Figure 7

An example of matching two histograms efficiently. (a) Tree of the query histogram. (e) The preconstructed tree in the database. (b)–(d) Compared bins at different levels. (f)–(h) Corresponding bin values in the database.

3. Experiments

In this section, we conduct experiments on two applications, human action recognition and gait classification, to demonstrate the feasibility, robustness, compatibility, and validity of the proposed approach. All statistics were computed using MATLAB 2007b on an Intel Core 2 Quad 2.4 GHz CPU with 2.0 GB of RAM.

3.1. Action Recognition

To evaluate the performance of the proposed method, we adopt the Weizmann database used in [10]. Because the motion period is hard to extract from nonperiodic motions, we use only the periodic motions in this database, which consists of 7 periodic actions, each performed by 9 different subjects. For simplicity, we label these 7 actions A1 to A7: walking (A1), running (A2), jumping forward (A3), jumping jack (A4), waving one hand (A5), waving two hands (A6), and galloping sideways (A7). The videos have a resolution of 180-by-144 at 25 fps. Moreover, to enlarge the database, we added 10 subjects from our own videos recorded in a different environment, so the experiments use 19 subjects in total. The additional videos have a resolution of 360-by-240 at 30 fps. Sample images of all subjects are shown in Figure 8.
Figure 8

Subjects used in the action recognition experiment.

For our own videos, we employ a robust background subtraction approach [28] to extract the silhouettes. For the database used in [10], we directly use the masks provided by the authors. All silhouettes are aligned by their center of mass. Since the width and height of the input image must be a power of 2 for quadtree decomposition, we use 256-by-256 images in our experiments. Because the same subject may appear at different heights owing to different distances from the camera, we normalize each silhouette by its height to ensure fair comparisons; in our experiments, silhouettes are normalized to 150 pixels high. Figure 9 shows the generated MEHs of all 7 actions from all subjects.
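The preprocessing just described (crop, scale to a fixed height, center by mass) might look like the following Python sketch. Nearest-neighbor resampling and rounding the center of mass to the nearest pixel are assumptions made here for simplicity, since the interpolation method is not stated; the defaults of 150 pixels and a 256-by-256 canvas follow the text.

```python
from typing import List, Tuple

Mask = List[List[int]]  # binary silhouette, 1 = foreground


def crop_to_bbox(mask: Mask) -> Mask:
    """Cut the mask down to the bounding box of its foreground pixels."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    return [row[cols[0]:cols[-1] + 1] for row in mask[rows[0]:rows[-1] + 1]]


def resize_nearest(mask: Mask, out_h: int) -> Mask:
    """Nearest-neighbor scaling to out_h rows, preserving the aspect ratio."""
    h, w = len(mask), len(mask[0])
    out_w = max(1, round(w * out_h / h))
    return [[mask[y * h // out_h][x * w // out_w] for x in range(out_w)]
            for y in range(out_h)]


def center_of_mass(mask: Mask) -> Tuple[float, float]:
    pts = [(y, x) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    return (sum(y for y, _ in pts) / len(pts), sum(x for _, x in pts) / len(pts))


def normalize_silhouette(mask: Mask, height: int = 150, canvas: int = 256) -> Mask:
    """Scale the silhouette to `height` pixels and place its center of mass at the
    center of a canvas-by-canvas image (pixels falling outside are dropped)."""
    small = resize_nearest(crop_to_bbox(mask), height)
    cy, cx = center_of_mass(small)
    oy, ox = canvas // 2 - round(cy), canvas // 2 - round(cx)
    out = [[0] * canvas for _ in range(canvas)]
    for y, row in enumerate(small):
        for x, v in enumerate(row):
            if v and 0 <= y + oy < canvas and 0 <= x + ox < canvas:
                out[y + oy][x + ox] = 1
    return out


mask = [[0] * 10 for _ in range(10)]
for y in range(2, 6):
    for x in range(3, 5):
        mask[y][x] = 1
aligned = normalize_silhouette(mask, height=8, canvas=16)
```

In the toy call, the 4-by-2 blob is scaled to 8-by-4 and centered on a 16-by-16 canvas; the real pipeline would use the default 150-pixel height and 256-by-256 canvas.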
Figure 9

MEH of 7 actions. (a) Walking. (b) Running. (c) Jumping. (d) Jumping jack. (e) Waving one hand. (f) Waving two hands. (g) Galloping sideways.

3.2. Recognition Accuracy Analysis

In the recognition process, we adopt leave-one-out cross-validation to obtain a fair recognition rate. The histogram dissimilarity function used in this paper is the $L_1$-norm. Although the other distance measure mentioned in Section 2.4 is also compatible with the multiresolution structure, we found that the $L_1$-norm already provides a high recognition rate in the experiments. Table 1 tabulates the recognition rate at different levels; each cell gives the number of correctly recognized samples and the corresponding recognition rate for each action at each resolution level. Figure 10 compares the AME-SAD method employed in [15] with the proposed MRMEH-$L_1$ method. The AME-SAD method always compares whole images with the database images, so its recognition rate does not depend on the resolution level. At resolution level 5, the recognition rate of MRMEH-$L_1$ equals that of the AME-SAD method. Therefore, we only need to construct the quadtree down to level 5 (whose minimum block size is 8-by-8), which saves computation time. Tables 2, 3, 4, 5, and 6 show the confusion matrices at each resolution level.
Table 1

The recognition rate of 7 motions at 5 resolution levels.

| Action | Lv 5 | Lv 4 | Lv 3 | Lv 2 | Lv 1 |
|---|---|---|---|---|---|
| A1 | 19/19 (100.00%) | 19/19 (100.00%) | 18/19 (94.73%) | 18/19 (94.73%) | 10/19 (52.63%) |
| A2 | 19/19 (100.00%) | 19/19 (100.00%) | 18/19 (94.73%) | 17/19 (89.47%) | 15/19 (78.95%) |
| A3 | 18/19 (94.73%) | 18/19 (94.73%) | 17/19 (89.47%) | 14/19 (73.68%) | 13/19 (68.42%) |
| A4 | 18/19 (94.73%) | 18/19 (94.73%) | 15/19 (78.95%) | 14/19 (73.68%) | 6/19 (31.58%) |
| A5 | 19/19 (100.00%) | 19/19 (100.00%) | 18/19 (94.73%) | 9/19 (60.00%) | 10/19 (52.63%) |
| A6 | 19/19 (100.00%) | 18/19 (94.73%) | 13/19 (68.42%) | 14/19 (73.68%) | 8/19 (50.00%) |
| A7 | 19/19 (100.00%) | 19/19 (100.00%) | 19/19 (100.00%) | 17/19 (89.47%) | 8/19 (50.00%) |

Table 2

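Confusion matrix at resolution level 5.

|  | A1 | A2 | A3 | A4 | A5 | A6 | A7 |
|---|---|---|---|---|---|---|---|
| A1 | 1.00 | 0 | 0 | 0 | 0 | 0 | 0 |
| A2 | 0 | 1.00 | 0 | 0 | 0 | 0 | 0 |
| A3 | 0 | 0 | 0.95 | 0 | 0.05 | 0 | 0 |
| A4 | 0 | 0 | 0 | 0.95 | 0 | 0.05 | 0 |
| A5 | 0 | 0 | 0 | 0 | 1.00 | 0 | 0 |
| A6 | 0 | 0 | 0 | 0 | 0 | 1.00 | 0 |
| A7 | 0 | 0 | 0 | 0 | 0 | 0 | 1.00 |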

Table 3

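Confusion matrix at resolution level 4.

|  | A1 | A2 | A3 | A4 | A5 | A6 | A7 |
|---|---|---|---|---|---|---|---|
| A1 | 1.00 | 0 | 0 | 0 | 0 | 0 | 0 |
| A2 | 0 | 1.00 | 0 | 0 | 0 | 0 | 0 |
| A3 | 0 | 0 | 0.95 | 0 | 0.05 | 0 | 0 |
| A4 | 0 | 0 | 0 | 0.95 | 0 | 0.05 | 0 |
| A5 | 0 | 0 | 0 | 0 | 1.00 | 0 | 0 |
| A6 | 0 | 0 | 0 | 0.05 | 0 | 0.95 | 0 |
| A7 | 0 | 0 | 0 | 0 | 0 | 0 | 1.00 |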

Table 4

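Confusion matrix at resolution level 3.

|  | A1 | A2 | A3 | A4 | A5 | A6 | A7 |
|---|---|---|---|---|---|---|---|
| A1 | 0.95 | 0 | 0 | 0 | 0 | 0 | 0.05 |
| A2 | 0 | 0.95 | 0.05 | 0 | 0 | 0 | 0 |
| A3 | 0 | 0 | 0.90 | 0 | 0.05 | 0 | 0.05 |
| A4 | 0 | 0 | 0 | 0.73 | 0 | 0.27 | 0 |
| A5 | 0 | 0 | 0 | 0 | 1.00 | 0 | 0 |
| A6 | 0 | 0 | 0.11 | 0.16 | 0.05 | 0.68 | 0 |
| A7 | 0 | 0 | 0 | 0 | 0 | 0 | 1.00 |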

Table 5

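Confusion matrix at resolution level 2.

|  | A1 | A2 | A3 | A4 | A5 | A6 | A7 |
|---|---|---|---|---|---|---|---|
| A1 | 0.95 | 0.05 | 0 | 0 | 0 | 0 | 0 |
| A2 | 0 | 0.89 | 0.11 | 0 | 0 | 0 | 0 |
| A3 | 0 | 0.11 | 0.74 | 0.05 | 0.05 | 0 | 0.05 |
| A4 | 0 | 0 | 0 | 0.73 | 0.05 | 0.11 | 0.11 |
| A5 | 0 | 0 | 0 | 0.16 | 0.57 | 0.27 | 0 |
| A6 | 0 | 0 | 0.11 | 0.16 | 0.05 | 0.68 | 0 |
| A7 | 0 | 0 | 0 | 0 | 0 | 0 | 1.00 |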

Table 6

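Confusion matrix at resolution level 1.

|  | A1 | A2 | A3 | A4 | A5 | A6 | A7 |
|---|---|---|---|---|---|---|---|
| A1 | 0.53 | 0.05 | 0.05 | 0 | 0.05 | 0 | 0.32 |
| A2 | 0 | 0.79 | 0.21 | 0 | 0 | 0 | 0 |
| A3 | 0.11 | 0.11 | 0.68 | 0.05 | 0.05 | 0 | 0 |
| A4 | 0.26 | 0 | 0 | 0.27 | 0.05 | 0.11 | 0.37 |
| A5 | 0.05 | 0 | 0 | 0.11 | 0.52 | 0.16 | 0.16 |
| A6 | 0.16 | 0 | 0 | 0 | 0.05 | 0.42 | 0.37 |
| A7 | 0.16 | 0 | 0.05 | 0.05 | 0.11 | 0.21 | 0.42 |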

Figure 10

Comparison between MRMEH-$L_1$ and AME-SAD in terms of recognition rate.
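For reference, leave-one-out evaluation with a 1-NN rule and a row-normalized confusion matrix, like those in Tables 2 to 6, can be sketched as below. The sample features are hypothetical flattened histograms, and the code compares each query with every remaining sample; it does not reproduce the branch-and-bound pruning or the exact template set used in the experiments.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[str, Sequence[float]]  # (action label, flattened MEH)


def l1(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def leave_one_out(samples: List[Sample],
                  dist: Callable[[Sequence[float], Sequence[float]], float] = l1):
    """Classify each sample by 1-NN against all the other samples.
    Returns per-class recognition rates and a row-normalized confusion matrix."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for i, (true_label, feat) in enumerate(samples):
        others = [s for j, s in enumerate(samples) if j != i]
        predicted = min(others, key=lambda s: dist(feat, s[1]))[0]
        counts[true_label][predicted] += 1
    labels = sorted(counts)
    matrix = {t: {p: counts[t][p] / sum(counts[t].values()) for p in labels} for t in labels}
    rates = {t: matrix[t][t] for t in labels}
    return rates, matrix


samples = [("walk", [1.0, 0.0]), ("walk", [0.9, 0.1]),
           ("wave", [0.0, 1.0]), ("wave", [0.1, 0.9]), ("wave", [0.7, 0.3])]
rates, matrix = leave_one_out(samples)
print(rates)
```

Each row of `matrix` sums to 1, matching the row-normalized presentation of the confusion tables above; here one "wave" sample lies closer to the "walk" samples and is misclassified.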

3.3. Recognition Efficiency Analysis

In the second experiment, we evaluate computational efficiency quantitatively. Table 7 tabulates the average number of bins for different actions at different resolution levels. At resolution level 5, the average number of bins ranges from 168.2 to 268.4, which is much less than for the AME-SAD method, which uses the whole 256-by-256 image for comparison. Furthermore, eliminating as many dissimilar MEHs as possible at lower resolution levels further improves computational efficiency; in our experiments, the average processed resolution level is 2.83 when the maximum resolution level is 5. Table 8 lists the average number of pixels (bins) that must be processed for each query. To construct the MRMEH, each pixel in the AME has to be retrieved several times. The average numbers of pixels retrieved and bins compared over all actions are shown in the second and third rows of Table 8, respectively. Note that the proposed MRMEH-$L_1$ (with either quadtree-based or uniform partitioning) and AME-SAD yield the same recognition rate (see Algorithm 1), so the only difference among these methods is the computation time. Although the proposed algorithm spends additional time organizing the MRMEH by quadtree decomposition or uniform partitioning, the multiresolution approach is still more efficient than the AME-SAD method, and its advantage grows as the number of database templates increases.
Table 7

Average number of bins at each resolution level.

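| Lv | A1 | A2 | A3 | A4 | A5 | A6 | A7 |
|---|---|---|---|---|---|---|---|
| 1 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| 2 | 16 | 16 | 16 | 16 | 16 | 16 | 16 |
| 3 | 40 | 40 | 39.8 | 40 | 40 | 39.9 | 39.9 |
| 4 | 84.7 | 86.5 | 76.8 | 97.9 | 81.6 | 87 | 88.6 |
| 5 | 204.4 | 219.5 | 168.2 | 268.3 | 184.7 | 207.1 | 213 |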

Table 8

Comparisons between AME-SAD and two kinds of partitioning methods in action recognition application.

| Number of operations | AME-SAD | Quadtree-$L_1$ | Uniform-$L_1$ |
|---|---|---|---|
| MRMEH construction (no. of pixels retrieved) | N/A | 191451.43 | 327680 |
| Action recognition (no. of bins compared) | 458752 | 1438.71 | 5114.69 |
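The figures in Table 8 can be put in proportion with a few lines of arithmetic. The AME-SAD entry equals $7 \times 256 \times 256$ and the uniform construction entry equals $5 \times 256 \times 256$; the ratios below relate each method's total operations to AME-SAD, using only the values copied from the table.

```python
N = 256
# Average number of operations per query, copied from Table 8
sad_compare = 458752
quad_build, quad_compare = 191451.43, 1438.71
unif_build, unif_compare = 327680, 5114.69

sad_images = sad_compare // (N * N)                      # 458752 = 7 * 256 * 256
unif_passes = unif_build // (N * N)                      # 327680 = 5 * 256 * 256
quad_share = (quad_build + quad_compare) / sad_compare   # construction + matching
quad_compare_share = quad_compare / sad_compare          # matching only
unif_share = (unif_build + unif_compare) / sad_compare

print(sad_images, unif_passes)
print(f"{quad_share:.1%} {quad_compare_share:.2%} {unif_share:.1%}")
```

Under these counts, the quadtree-based MRMEH needs about 42% of the AME-SAD operations when construction is included (about 0.3% for bin comparisons alone), versus about 72.5% for uniform partitioning.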

3.4. Real-Time Action Recognition

Here we demonstrate a system that quickly classifies different periodic motions in a single video sequence. As in [25], we adopt the self-similarity matrix $S$ in this application. If the motion cycle is successfully extracted at frame $t$, we empty the matrix $S$, obtain the corresponding MEH, and perform action recognition. If the motion period cannot be extracted at frame $t$, we keep computing the self-similarity information at frame $t+1$ and add it to $S$. Since a motion cycle spans several frames, our system tries to extract the motion period every 15 frames. Because computing the periodicity across two dissimilar actions is meaningless, we empty $S$ whenever a motion cycle is detected, which avoids computing the self-similarity between two different actions. Figure 11 shows the recognition results for a subject performing several motions sequentially: waving one hand, side walking, waving two hands, and running. Since the recognition time of the proposed method is very short, the system responds to human motions immediately and correctly.
Figure 11

Action recognition in a single video sequence.
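The buffering policy described above can be sketched in Python as a small class that grows the self-similarity matrix $S$ one frame at a time. The period test (the first lag at which frame 0 reappears within a tolerance `tol`) and the `classify` callback are simplifying assumptions standing in for the self-similarity analysis of [25] and for MEH-based recognition.

```python
from typing import Callable, List, Optional

Frame = List[List[int]]  # binary silhouette


def frame_distance(a: Frame, b: Frame) -> int:
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


class StreamingRecognizer:
    """Grows the self-similarity matrix S frame by frame, looks for a motion
    cycle every `check_every` frames, and empties S once a cycle is found."""

    def __init__(self, classify: Callable[[List[Frame]], str],
                 check_every: int = 15, min_lag: int = 2, tol: int = 0):
        self.classify = classify
        self.check_every = check_every
        self.min_lag = min_lag
        self.tol = tol
        self.frames: List[Frame] = []
        self.S: List[List[int]] = []

    def push(self, frame: Frame) -> Optional[str]:
        row = [frame_distance(frame, f) for f in self.frames]
        for i, d in enumerate(row):          # keep S symmetric
            self.S[i].append(d)
        self.S.append(row + [0])
        self.frames.append(frame)
        if len(self.frames) % self.check_every:
            return None                      # only try every check_every frames
        period = self._period()
        if period is None:
            return None                      # keep accumulating S
        label = self.classify(self.frames[:period])
        self.frames, self.S = [], []         # start afresh for the next action
        return label

    def _period(self) -> Optional[int]:
        # simplified: first lag at which frame 0 reappears within tolerance
        for lag in range(self.min_lag, len(self.frames)):
            if self.S[0][lag] <= self.tol:
                return lag
        return None


p0, p1, p2 = [[1, 0], [0, 0]], [[0, 1], [0, 0]], [[0, 0], [1, 0]]
rec = StreamingRecognizer(lambda fs: "cycle of %d" % len(fs), check_every=6)
results = [rec.push(f) for f in [p0, p1, p2, p0, p1, p2]]
```

Emptying `S` after each detection keeps the matrix small and prevents frames from the previous action from influencing the next period estimate.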

3.5. Gait Classification

In this experiment, we demonstrate the advantage of our method on another application: human gait classification. Unlike action recognition, gait classification involves many subjects (usually more than 100) performing similar actions, which makes the classification task more challenging. To identify each individual quickly in a large database, an efficient matching algorithm is required. We adopt the CASIA Gait Database (Dataset B) [19], which consists of the gaits of 124 subjects captured from 11 view angles (see Figure 12). Each subject walks 10 times: six natural walks, two walks carrying a bag, and two walks wearing a coat. For simplicity, we consider only the side-view natural walk sequences in our experiment, and we select 120 subjects whose silhouette images are acceptable for gait analysis. Figure 13 shows some examples of the AME images used in our experiments.
Figure 12

Human gaits captured from 11 different view angles.

Figure 13

The extracted AME images in our experiments.

We adopt leave-one-out cross-validation and the cumulative match score (CMS) to evaluate the performance of our method. The resolution levels of the MRMEH range from 1 to 7. In our experiment, the classification rate is 96.39% using AME-SAD with the 1-NN rule. Figure 14 shows the CMS for the top 10 matches at different resolution levels, where the vertical axis is the recognition rate. The recognition rate of MRMEH at resolution level 7 reaches the same accuracy as AME-SAD, whereas for the action recognition problem described in Section 3.2, level 5 already suffices. Because gait classification is more demanding than action recognition, a finer resolution level is needed to identify different people. Table 9 compares the average number of bins at different resolution levels between the nonuniform (quadtree-based) and uniform partitioning methods. At resolution level 7, the number of bins of the quadtree-based method is only 9.63% of that of the uniform partitioning method. Since both methods achieve the same recognition rate, the quadtree-based partitioning method is clearly the better choice for computational efficiency. Table 10 lists the average number of operations per query for AME-SAD and MRMEH. The MRMEH approach (with either quadtree-based or uniform partitioning) outperforms the AME-SAD method, and the quadtree-based partitioning method requires the fewest operations among the three methods (only 2% of the AME-SAD method).
Table 9

Number of bins at different resolution levels using the quadtree-based and uniform partitioning method.

| Resolution level | Quadtree-based partitioning | Uniform partitioning |
|---|---|---|
| 1 | 4 | 4 |
| 2 | 16 | 16 |
| 3 | 39.94 | 64 |
| 4 | 86.98 | 256 |
| 5 | 216.67 | 1024 |
| 6 | 581.73 | 4096 |
| 7 | 1578.30 | 16384 |

Table 10

Comparisons between AME-SAD and two kinds of partitioning methods in gait classification application.

| Number of operations | AME-SAD | Quadtree-$L_1$ | Uniform-$L_1$ |
|---|---|---|---|
| MRMEH construction (no. of pixels retrieved) | N/A | 205753.33 | 65536 |
| Action recognition (no. of bins compared) | 7864320 | 142225.45 | 1693850.2 |
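The percentages quoted in this section can be checked directly against Tables 9 and 10. The short Python computation below confirms that the uniform partition has $4^l$ bins at level $l$ and that quadtree-based level 7 uses 9.63% as many bins; it also notes that 7864320 is exactly $120 \times 256 \times 256$, and that the "2%" figure corresponds to bin comparisons alone (about 1.8%), while including the MRMEH construction cost gives about 4.4%.

```python
uniform_bins = [4, 16, 64, 256, 1024, 4096, 16384]              # Table 9
quadtree_bins = [4, 16, 39.94, 86.98, 216.67, 581.73, 1578.30]   # Table 9
sad_ops = 7864320                                                # Table 10, AME-SAD
quad_build, quad_compare = 205753.33, 142225.45                  # Table 10, quadtree

uniform_is_4_pow_l = uniform_bins == [4 ** l for l in range(1, 8)]
level7_ratio = quadtree_bins[-1] / uniform_bins[-1]              # quadtree vs uniform bins
full_images = sad_ops // (256 * 256)                             # 256-by-256 images per query
compare_share = quad_compare / sad_ops                           # bin comparisons only
total_share = (quad_build + quad_compare) / sad_ops              # including construction

print(uniform_is_4_pow_l, f"{level7_ratio:.2%}", full_images,
      f"{compare_share:.2%}", f"{total_share:.2%}")
```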

Figure 14

The CMS at different resolution levels.
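A cumulative match score curve like Figure 14 can be computed as sketched below, assuming leave-one-out over labeled gait histograms, the $L_1$ distance, and ranking subjects by their closest remaining sample. The data and helper names are hypothetical, and the evaluation protocol of the experiments may differ in detail.

```python
from typing import Dict, List, Sequence, Tuple


def l1(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def cumulative_match_scores(samples: List[Tuple[str, Sequence[float]]],
                            max_rank: int = 10) -> List[float]:
    """Leave-one-out CMS: entry k-1 is the fraction of queries whose true
    subject appears among the k nearest distinct subjects."""
    hits = [0] * max_rank
    for i, (subject, feat) in enumerate(samples):
        best: Dict[str, float] = {}  # closest distance to each other subject
        for j, (other, g) in enumerate(samples):
            if j != i:
                d = l1(feat, g)
                best[other] = min(d, best.get(other, d))
        ranking = sorted(best, key=best.get)
        if subject in ranking[:max_rank]:
            for k in range(ranking.index(subject), max_rank):
                hits[k] += 1
    return [h / len(samples) for h in hits]


samples = [("a", [0]), ("a", [1]), ("b", [10]), ("b", [12]), ("c", [6]), ("c", [30])]
print(cumulative_match_scores(samples, max_rank=3))
```

The curve is non-decreasing by construction, and its first entry equals the 1-NN classification rate.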

4. Conclusion

In this paper, we proposed a histogram-based approach to human action recognition. We transform the average motion energy (AME) image into a two-dimensional motion energy histogram (MEH), so that various characteristics of histograms can be exploited to improve similarity measurement. We found that the MEH generated within a motion period provides rich information for distinguishing different types of motions. To improve recognition efficiency, we adopt a multiresolution structure on the MEH; the resulting multiresolution motion energy histogram (MRMEH) remarkably speeds up the recognition process. To construct an adequate MRMEH for different actions or people, we proposed an automated partitioning method based on quadtree decomposition, which automatically identifies the important parts of each MEH. Experiments show that the proposed nonuniform partitioning method greatly reduces the number of operations, thus improving computational efficiency, while the multiresolution histogram matching algorithm keeps the recognition accuracy the same as that of the AME-SAD method. Because the approach is computationally inexpensive, a real-time system with high recognition accuracy is practical using the MRMEH. A series of quantitative experiments verified these claims and demonstrated the capabilities of the MRMEH in action recognition and gait classification applications.

Declarations

Acknowledgment

This work is supported by the National Science Council (no. 98-2218-E-238-002-).

Authors’ Affiliations

(1)
Department of Computer Science and Information Engineering, Vanung University, Chung-Li, Taiwan
(2)
Department of Computer Science and Information Engineering, National Central University, Chung-Li, Taiwan
(3)
Department of Informatics, Fo-Guang University, I-Lan, Taiwan

References

  1. Middleton L, Buss AA, Bazin A, Nixon MS: A floor sensor system for gait recognition. Proceedings of the 4th IEEE Workshop on Automatic Identification Advanced Technologies (AUTO ID '05), October 2005, New York, NY, USA 171-180.
  2. Gafurov D, Snekkenes E: Gait recognition using wearable motion recording sensors. EURASIP Journal on Advances in Signal Processing 2009, 2009:-16.
  3. Bobick AF, Davis JW: The recognition of human movement using temporal templates. IEEE Transactions on Pattern Analysis and Machine Intelligence 2001, 23(3):257-267. 10.1109/34.910878
  4. Lam THW, Lee RST, Zhang D: Human gait recognition by the fusion of motion and static spatio-temporal templates. Pattern Recognition 2007, 40(9):2563-2573. 10.1016/j.patcog.2006.11.014
  5. Schüldt C, Laptev I, Caputo B: Recognizing human actions: a local SVM approach. Proceedings of the International Conference on Pattern Recognition, 2004 3: 32-36.
  6. Fathi A, Mori G: Action recognition by learning mid-level motion features. Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), 2008 1-8.
  7. Carlsson S, Sullivan J: Action recognition by shape matching to key frames. Proceedings of the Workshop on Models versus Exemplars in Computer Vision (CVPR '01), December 2001, Kauai, Hawaii, USA.
  8. Bobick AF, Davis JW: The recognition of human movement using temporal templates. IEEE Transactions on Pattern Analysis and Machine Intelligence 2001, 23(3):257-267. 10.1109/34.910878
  9. Veeraraghavan A, Chowdhury AR, Chellappa R: Role of shape and kinematics in human movement analysis. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004 1: 730-737.
  10. Blank M, Gorelick L, Shechtman E, Irani M, Basri R: Action as space-time shapes. Proceedings of the 10th IEEE International Conference on Computer Vision (ICCV '05), 2005, Beijing, China 1395-1402.
  11. Han J, Bhanu B: Human activity recognition in thermal infrared imagery. Proceedings of the 6th IEEE Workshop on Object Tracking and Classification Beyond and in the Visible Spectrum (OTCBVS '05), June 2005, San Diego, Calif, USA.
  12. Lakany H: Extracting a diagnostic gait signature. Pattern Recognition 2008, 41(5):1644-1654.
  13. Davis J: Hierarchical motion history images for recognizing human motion. Proceedings of the IEEE Workshop on Detection and Recognition of Events in Video (EVENT '01), July 2001, Vancouver, BC, Canada.
  14. Bobick A, Davis J: The representation and recognition of action using temporal templates. IEEE Transactions on Pattern Analysis and Machine Intelligence 2001, 23(3):257-267. 10.1109/34.910878
  15. Wang L, Suter D: Informative shape representations for human action recognition. Proceedings of the International Conference on Pattern Recognition, 2006 2: 1266-1269.
  16. Zou X, Bhanu B: Human activity classification based on gait energy image and coevolutionary genetic programming. Proceedings of the 18th International Conference on Pattern Recognition, 2006 3: 556-559.
  17. Shan C, Gong S, McOwan PW: Fusing gait and face cues for human gender recognition. Neurocomputing 2008, 71(10–12):1931-1938.
  18. Li X, Maybank SJ, Yan S, Tao D, Xu D: Gait components and their application to gender recognition. IEEE Transactions on Systems, Man and Cybernetics Part C 2008, 38(2):145-155.
  19. Yu S, Tan D, Tan T: A framework for evaluating the effect of view angle, clothing and carrying condition on gait recognition. Proceedings of the International Conference on Pattern Recognition, 2006 4: 441-444.
  20. Sarkar S, Phillips PJ, Liu Z, Vega IR, Grother P, Bowyer KW: The humanID gait challenge problem: data sets, performance, and analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 2005, 27(2):162-177.
  21. Lee L, Grimson WEL: Gait analysis for recognition and classification. Proceedings of the 5th IEEE International Conference on Automatic Face and Gesture Recognition (FGR '02), May 2002, Washington, DC, USA 155-162.
  22. Han J, Bhanu B: Individual recognition using gait energy image. IEEE Transactions on Pattern Analysis and Machine Intelligence 2006, 28(2):316-322.
  23. Cheng M-H, Ho M-F, Huang C-L: Gait analysis for human identification through manifold learning and HMM. Pattern Recognition 2008, 41(8):2541-2553. 10.1016/j.patcog.2007.11.021
  24. Huang X, Boulgouris NV: Human gait recognition based on multiview gait sequences. EURASIP Journal on Advances in Signal Processing 2008, 2008:-8.
  25. Cutler R, Davis LS: Robust real-time periodic motion detection, analysis, and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence 2000, 22(8):781-796. 10.1109/34.868681
  26. Song B-C, Kim MJ, Ra JB: A fast multiresolution feature matching algorithm for exhaustive search in large image databases. IEEE Transactions on Circuits and Systems for Video Technology 2001, 11(5):673-678. 10.1109/76.920197
  27. Yu C-C, Jou F-D, Lee C-C, Fan K-C, Chuang TC: Efficient multi-resolution histogram matching for fast image/video retrieval. Pattern Recognition Letters 2008, 29(13):1858-1867. 10.1016/j.patrec.2008.06.004
  28. Horprasert T, Harwood D, Davis LS: A statistical approach for real-time robust background subtraction and shadow detection. Proceedings of the IEEE Frame-Rate Workshop (ICCV '99), 1999.

Copyright

© Chih-Chang Yu et al. 2010

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
