A Grassmann graph embedding framework for gait analysis


Gait recognition is important in a wide range of monitoring and surveillance applications. Gait information has often been used as evidence when other biometrics are indiscernible in surveillance footage. Building on recent advances in subspace-based approaches, we consider the problem of gait recognition on the Grassmann manifold. We show that by embedding the manifold into a reproducing kernel Hilbert space and applying the mechanics of graph embedding on this manifold, significant performance improvement can be obtained. In this work, the gait recognition problem is studied in a unified way applicable to both supervised and unsupervised configurations. Sparse representation is further incorporated into the learning mechanism to adaptively harness the local structure of the data. Experiments demonstrate that the proposed method effectively tolerates variations in appearance for gait identification.

1 Introduction

The use of CCTV video cameras for surveillance is common in public and commercial establishments like banks, shopping malls, parks, and railway stations. Most current video surveillance systems require human operators to constantly supervise the cameras. In other words, the effectiveness of the system is largely dependent on the vigilance of the person monitoring it. To resolve this shortcoming, research is under way to develop automated systems for real-time camera monitoring. Among these efforts, gait recognition is a popular approach to automatic human identification. Gait recognition is a biometric technology that identifies people based on the manner in which they walk. It is suitable for person identification at a distance, where other biometrics like face, iris, or fingerprint might be obscured or captured at too low a resolution. In many situations, gait is the only evidence available from a crime scene [1].

With the advent of visual surveillance, it is not difficult to obtain multiple-viewpoint shots of a subject or video outputs over a period of time. These multiple sets of images can be combined to yield better performance than single-shot images. Subspace-based approaches have been shown to be effective in modeling data consisting of multiple sets of images [2]. For example, Jacobs et al. [3] showed that illumination on human faces can be modeled as a nine-dimensional subspace under mild assumptions. Subsequent to this finding, sets of images of the same person under varying lighting conditions have often been modeled as low-dimensional subspaces [4-6]. While a subspace is a linear space, the collection of linear subspaces is a completely different space known as a Riemannian manifold [7]. More formally, the set of d-dimensional linear subspaces of ℝ^n is called the Grassmann manifold, named after the famous mathematician Hermann Günther Grassmann [8]. The Grassmann manifold has long been known for its fascinating mathematical properties. However, its applications in computer vision and machine learning have appeared rather recently.

Turaga et al. [9] demonstrated computer vision applications such as video-based face recognition, activity recognition, and image set-based object recognition on the Grassmann manifold. The Grassmann manifold structure of face shape is also utilized in [10] for age estimation and face verification. In [11], the geometrical structure of the Grassmann manifold was exploited for a visual tracking scheme.

Hamm and Lee [12] showed that, using a suitable Grassmann kernel, the Grassmann space can be embedded into a higher-dimensional reproducing kernel Hilbert space (RKHS) in which many Euclidean algorithms can be generalized. Subsequent to this finding, several studies extended dimension reduction methods to the Grassmann manifold [13, 14]. Considerable improvement in recognition accuracy has been reported for this application.

In this paper, we propose an approach called Grassmann graph embedding (GGE) for gait analysis. Motivated by the success of the graph embedding (GE) framework [15], we show how GE can be integrated in the Grassmann manifold for the gait recognition problem through the use of well-defined kernel functions on the manifold. We provide a general formulation that supports both supervised and unsupervised dimension reduction mechanisms. We further attach semantic meaning to the gait data by incorporating sparse representation in our learning mechanism.

The rest of the paper is organized as follows. In Section 2, we review the different approaches to gait recognition. In Section 3, we provide the background of the methods used in this paper. The overall framework for the proposed Grassmann GE learning is described in Section 4. In Section 5, we present experimental results on different settings. Lastly, some concluding remarks are given in Section 6.

2 Related work

We provide a background study of the methods addressing view angle, clothing, and speed factors in gait recognition. In addition, some subspace-based techniques related to our work are also reviewed in this section.

2.1 Gait recognition under various viewing angles

Appearance change due to varying view angles is one of the greatest challenges in gait analysis. Studies show that single-view gait recognition performance drops when the view angle changes [16, 17]. Current approaches to gait recognition under various viewing angles can be classified into one of the three major categories: (1) extraction of view-invariant gait feature, (2) generation of three-dimensional (3D) gait information, and (3) learning projection or mapping functions to transform gait features from various views into a common feature space.

The first approach attempts to find gait features that are invariant to view changes. Jean et al. [18] introduced body part trajectories as the view-invariant feature. The 2D trajectories of the feet and head were normalized to make them appear as if they were always seen from a fronto-parallel viewpoint. A method was proposed by Kale et al. [19] to synthesize the lateral view from an arbitrary view through perspective projection in the sagittal plane. Recently, Goffredo et al. [20] derived gait features based on estimated joint positions. A reconstruction method was employed to normalize the gait features from different viewpoints into the side plane. The methods in the first category can only work with a limited range of view angles, and their accuracy can be affected by self-occlusion.

The methods in the second category integrate 3D information from multiple cameras to construct a gait model. An image-based rendering method was employed by Bodor et al. [21] to reconstruct the 3D view of the subject from a blend of different views. Zhao et al. [22] used video sequences acquired by multiple cameras to set up a human 3D model. Matching for the 3D models was performed using a linear time normalization technique. Yamauchi et al. [23] captured the body data using a high-resolution projector-camera system and were able to obtain fairly accurate reconstructed synthetic human poses. The methods in the second category are able to provide reliable performance. However, these 3D analysis methods require a complicated setup of a calibrated multi-camera system. Moreover, these methods demand complex computation, which makes them unsuitable for practical applications.

The methods in the third category learn a mapping/projection function to normalize the gait features obtained from various viewing points to a shared feature space. Makihara et al. [24] extracted a frequency-domain gait feature using Fourier analysis. After that, a view transformation model (VTM) was used to learn a mapping function for the gait features obtained from different views. Some other variations based on VTMs have also been introduced [25-27]. Studies that utilize VTMs [24-27] assume that the feature matrix in the training set can be completely decomposed into view- and subject-independent submatrices without overlapping elements. However, the view angle may sometimes be difficult to obtain a priori.

In [28], the correlation of gait sequences from different views was modeled using canonical-correlation analysis (CCA). The CCA strengths were directly used to match two gait sequences. Lee and Elgammal [29] presented a multi-linear generative model using higher-order singular value decomposition. View factors, body configuration factors, and gait-style factors could be obtained using such model. The methods in the third category generate more stable gait features and are less sensitive towards noise as compared to the methods in the first category. Furthermore, the methods in the third category deploy a simpler camera setup as compared to those in the second category.

2.2 Gait recognition with clothing and carrying conditions

Clothing is another challenging factor for gait recognition. The appearance of a person changes when the person wears different types of clothes. Moreover, a recent study [30] shows that gait spoofing is possible by imitating the clothing of a person with a similar build. These observations imply that the clothing factor yields high intraclass variation and low interclass variation, which makes personal identification difficult.

Hossain et al. [31] attempted to address the clothing factor in gait recognition by proposing an approach that adaptively assigns weights to different body parts based on how much each area is affected by clothing variation. For example, the head will usually be affected if a person wears a hat, while the leg will be affected if the person wears a long skirt. The algorithm assigns less weight to the head when the person wears a hat and, similarly, less weight to the leg when the person wears a long skirt. This method thus reduces the influence of clothing through an adaptive weight tuning mechanism. However, the method makes strong assumptions about the types of clothing the person wears (e.g., the clothing types must be known beforehand), which makes it less practical in real-life applications.

Another study [32] approached the clothing factor using a random subspace method. Multiple subspaces were randomly formed using the coefficients generated by 2DPCA. A promising result was obtained, as the method combined evidence from multiple subspaces that provided different information about clothing aspects when classification was performed.

There is also a group of researchers who introduced the use of gait energy image (GEI) with sway alignment [33] to overcome the clothing and carrying effects. Instead of taking the whole body to generate GEIs, only the area below the knee was used. The authors claimed that their method produced better accuracy as they believed that the lower part of the body was usually unaffected by the clothing and carrying conditions. Nevertheless, this method easily fails when the person's leg is obscured (e.g., the person wears a long skirt or carries a briefcase).

2.3 Gait recognition across various walking speeds

The approaches towards the speed factor in gait recognition bear some resemblance to the methods addressing viewpoint variation. There are two general approaches to dealing with gait at varying speeds: (1) learning mapping functions to transform gait features from various speeds into a common walking speed and (2) extraction of speed-invariant gait features. In the first approach, Tanawongsuwan and Bobick [34] proposed a stride normalization technique to transform gait features across various speeds into a common walking speed. On the other hand, Tsuji et al. [35] viewed cross-speed gait recognition as a problem similar to cross-view gait recognition and applied the VTM [24] technique to transform gait from different speeds to a common speed for recognition.

In the second approach, Kusakunniran et al. [36] showed that the use of Procrustes shape analysis could tolerate the gait changes due to speed differences. They extended the technique to a higher-order shape configuration that could better represent the gait signature across speeds. They further introduced a differential composition model to assign different weights to different shape boundary to cope with large changes in walking speeds. Liu and Sarkar [37] proposed a population hidden Markov model to normalize the gait features based on a generic walking model. The proposed model, when combined with linear discriminant analysis, could distinguish the shapes of different subjects and suppress the differences of the same subject under various conditions, including speed changes. Tan et al. [38] represented the gait features using eight projective representations. The representation using projection from different directions yielded acceptable accuracy for gait recognition across speeds. Recently, Guan and Li [39] deployed the random subspace method [32] to address the cross-speed problem. This method also seemed to respond well towards speed changes.

2.4 Subspace-based approaches

In the computer vision community, the subspace method [40] has been used to represent an image set by a linear subspace spanned by all the images in the set. A number of algorithms have been proposed to measure the distances/similarities among subspaces. Among the many distance/similarity measures, the concept of principal angles [41] between two subspaces has been widely adopted due to its efficiency, accuracy, and robustness. Yamaguchi et al. [42] presented a method called the mutual subspace method (MSM) that directly used the angles between two subspaces as the similarity score of two face image sets. Li et al. [43] further introduced the idea of a weighted subspace distance to more effectively account for the characteristics of the underlying data distribution. This method was adopted by Liu et al. [44] in gait recognition to compare two subspaces comprising gait images captured from different view angles. A nonlinear extension of the principal angle method has also been presented in [45, 46].

Fukui and Yamaguchi proposed a constrained MSM (CMSM) [47] to learn a subspace in which the entire class exhibits small variance. This method greatly outperformed the original MSM. Later on, a nonlinear extension of the method using the kernel trick was presented in [48]. The concept of multiple CMSM was proposed in [49] to create multiple constrained subspaces using ensemble learning, with MSM used for classification. Inspired by linear discriminant analysis, Kim et al. [4] developed a technique that minimizes the canonical correlations of between-class sets and maximizes the canonical correlations of within-class sets. This method was shown to perform well in several object recognition problems.

2.5 Motivation and contribution

The subspace-based approach has been shown to be promising in modeling video sequences. Subspaces can accommodate the effect of a wide range of variations and capture the dynamic properties in the video sequences. In many video surveillance applications, multiple snapshots of the same subject at different time instances can be obtained for recognition. Similarly, multiple images of the same subject under varying viewpoints are also available in video camera networks. Therefore, it is natural to utilize these multiple sets of images instead of a conventional single snapshot in our recognition task.

Clearly, subspace-structured data resides on a nonlinear manifold. The non-Euclidean domain that suits subspace-structured data is the Grassmann manifold. The Grassmann manifold G(m, D) is the set of m-dimensional linear subspaces of ℝ^D. Hence, a set of linear subspaces can be perceived as points on the Grassmann manifold. Most computer vision algorithms are developed for data lying in ℝ^D. Applying these algorithms directly on the nonlinear manifold will yield poor accuracy, as the underlying geometry of the manifold is ignored. Therefore, this paper aims to generalize algorithms developed for ℝ^D to the Grassmann manifold through the use of well-defined Grassmann kernels.

Our primary contributions in this paper are (1) a formulation for modeling gait subspaces on the Grassmann manifold, (2) a framework to integrate supervised and unsupervised GE techniques on the Grassmann manifold, (3) a method to incorporate sparse representation in the learning algorithms, and (4) extensive experiments to corroborate the proposed approach.

A preliminary version of this paper was presented in [50], which explored gait recognition on the Grassmann manifold. That work laid the groundwork for modeling gait image sets on the Grassmann manifold. A local-based discriminant analysis method called Grassmann locality preserving discriminant analysis was deployed, and encouraging results were reported. In this paper, we provide a more detailed analysis and present a framework to integrate supervised and unsupervised GE methods. On top of that, we also propose three graph learning mechanisms, namely global, local, and adaptive learning, which operate around the GE framework and were not studied in the previous paper.

3 Preliminaries

Brief reviews of the Grassmann manifold and sparse representation are provided in this section. The theory behind the Grassmann kernel is helpful for understanding how points on the manifold can be compared. Some background knowledge of sparse representation is beneficial for understanding how adaptive learning is accomplished in this work.

3.1 Grassmann manifold

The geometric properties of the Grassmann manifold have received significant attention, and a good introduction to this topic can be found in [7]. For the image set matching problem, an image set comprising m images, each having D pixels, can be represented as a point on G(m, D). Two points on the Grassmann manifold, corresponding to two image sets, are equivalent if one can be mapped to the other by an m × m orthogonal matrix [7].

The distance between two subspaces can be measured by the canonical distance, which is the length of the geodesic path connecting two points on the Grassmann manifold. However, it is computationally more efficient to compute the distances between subspaces using the principal angles [51]. Given two subspaces P1 and P2, regarded as points on the Grassmann manifold, the principal angles are related to the geodesic distance by

D_Geo^2(P1, P2) = Σ_i θ_i^2

where θ = [θ_1, …, θ_m] is the vector of principal angles between span(P1) and span(P2). Principal angles can be conveniently computed using the singular value decomposition

P1^T P2 = U S V^T

where U = [u_1 … u_m], u_k ∈ span(P1), V = [v_1 … v_m], v_k ∈ span(P2), and S is the diagonal matrix S = diag(cos θ_1, …, cos θ_m).
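As a concrete illustration, the principal angles and geodesic distance above can be computed with a few lines of NumPy. This is a generic sketch, not the authors' implementation; the function names are ours, and P1 and P2 are assumed to already have orthonormal columns:

```python
import numpy as np

def principal_angles(P1, P2):
    """Principal angles between span(P1) and span(P2).

    P1 and P2 are D x m matrices with orthonormal columns; the singular
    values of P1^T P2 are the cosines of the principal angles."""
    s = np.linalg.svd(P1.T @ P2, compute_uv=False)
    # Clip to [0, 1] to guard against floating-point drift before arccos.
    return np.arccos(np.clip(s, 0.0, 1.0))

def geodesic_distance(P1, P2):
    """Arc-length (geodesic) distance: sqrt of the sum of squared angles."""
    return np.sqrt(np.sum(principal_angles(P1, P2) ** 2))
```

Orthonormal bases for raw image sets can be obtained beforehand with, e.g., `np.linalg.qr`.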

Various distances have been defined based on the principal angles; some well-known ones are the Binet-Cauchy, projection, and Procrustes distances. Among them, the projection distance, Binet-Cauchy distance, and canonical-correlation distance (based on the largest principal angle) are induced from positive definite kernels. This means that we can define corresponding kernels on the Grassmann manifold based on these metrics.

In this paper, the projection kernel and canonical-correlation kernel are adopted, as they are reported to provide good results [12, 14]. Given two points on the Grassmann manifold, X_i, X_j ∈ ℝ^{D×m}, the similarity between the points is defined as

k_proj(X_i, X_j) = ‖X_i^T X_j‖_F^2
k_cc(X_i, X_j) = max_{a_p ∈ span(X_i)} max_{b_q ∈ span(X_j)} a_p^T b_q

subject to a_p^T a_p = b_p^T b_p = 1 and a_p^T a_q = b_p^T b_q = 0 for p ≠ q; k_proj denotes the projection kernel, while k_cc signifies the canonical-correlation kernel.
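A minimal sketch of the two kernels, assuming orthonormal basis matrices. The singular values of X_i^T X_j are the cosines of the principal angles, so the canonical-correlation kernel reduces to the largest singular value; the function names are illustrative only:

```python
import numpy as np

def proj_kernel(Xi, Xj):
    """Projection kernel: squared Frobenius norm of Xi^T Xj.

    For orthonormal D x m bases this equals the sum of cos^2(theta_k)
    over the principal angles theta_k."""
    return np.linalg.norm(Xi.T @ Xj, 'fro') ** 2

def cc_kernel(Xi, Xj):
    """Canonical-correlation kernel: the largest singular value of
    Xi^T Xj, i.e., the cosine of the smallest principal angle."""
    return np.linalg.svd(Xi.T @ Xj, compute_uv=False)[0]
```

For identical subspaces, `proj_kernel` returns m and `cc_kernel` returns 1.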

3.2 Sparse representation

In the past few years, sparse representation (SR) has proven to be a powerful tool for computer vision, computational biology, statistics, pattern recognition, and other applications [32, 52, 53]. Given a signal (in our case, the column vector of an image) x_i ∈ ℝ^n and an overcomplete dictionary [54] with k bases, X = [x_1, x_2, …, x_k] ∈ ℝ^{n×k} (k > n), the goal of SR is to represent x_i using as few entries of X as possible. The objective function can be defined as follows:

min ‖S_i‖_0 s.t. x_i = X S_i

where S_i denotes the sparse coefficient vector and ‖·‖_0 denotes the l0-norm of a vector (its number of nonzero entries).

However, it is NP-hard to find the sparsest solution of Equation 2 via l0-minimization. As such, l1-minimization is often used to solve the problem [54]. In practical applications, there might be noise in the signal x_i. Therefore, the following optimization model is used to estimate S_i:

min ‖S_i‖_1 s.t. ‖x_i - X S_i‖_2 < ϵ

where ‖·‖_1 is the l1-norm and ϵ is the error-tolerance term.
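The l1 problem above can be solved with any off-the-shelf sparse solver. As an illustration only, the following sketch uses iterative soft thresholding (ISTA) on the Lagrangian form of the problem; the penalty weight `lam` is a hypothetical stand-in for the tolerance ϵ, and the function names are ours:

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding operator for the l1 penalty."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code_ista(x, X, lam=0.1, n_iter=500):
    """Minimal ISTA sketch for min_s 0.5*||x - X s||_2^2 + lam*||s||_1,
    a standard Lagrangian surrogate for the constrained l1 problem."""
    L = np.linalg.norm(X, 2) ** 2   # Lipschitz constant of the gradient
    s = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ s - x)    # gradient of the quadratic term
        s = soft_threshold(s - grad / L, lam / L)
    return s
```

For an orthonormal dictionary, this converges to the soft-thresholded projection of x onto the dictionary atoms.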

4 Proposed approach

The detail of the proposed approach is given in this section. The proposed method mainly consists of three stages: GEI construction, Grassmann projection, and GGE. Two types of GGE configurations are introduced: supervised and unsupervised. Three different graph learning mechanisms are further presented for each of the GGE learning modes. The general framework for the proposed approach is depicted in Figure 1.

Figure 1

General framework for the GGE method.

4.1 Gait energy image

The simple yet effective GEI [55] approach is deployed in this paper. Given a gait sequence {I_t}, t = 1, …, F, where I_t(i, j) is the pixel at position (i, j) in frame I_t and F is the total number of frames in the gait sequence, the GEI is defined as

GEI(i, j) = (1/F) Σ_{t=1}^{F} I_t(i, j).

One advantage of representing the gait feature using GEI is that we do not need to consider the underlying dynamics of the walking motion. This representation enables us to study the gait sequence from a holistic view by implicitly characterizing the structural statistics of the spatiotemporal patterns of the walking person. The original silhouette images and the resulting GEI images of three subjects are illustrated in Figure 2. We observe that the subjects can be favorably distinguished from the GEI images.

Figure 2

Samples of original silhouette images and the resulting GEI images.
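Since the GEI is just a per-pixel temporal average of the silhouette frames, it can be computed in a couple of lines. This is a minimal sketch, assuming the silhouettes are already cropped and aligned; the function name is illustrative:

```python
import numpy as np

def gait_energy_image(silhouettes):
    """Average a stack of F binary silhouettes (shape F x H x W)
    into a single gait energy image (shape H x W)."""
    frames = np.asarray(silhouettes, dtype=float)
    return frames.mean(axis=0)
```

Each GEI pixel then lies in [0, 1], with bright regions marking body parts that are static over the gait cycle.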

4.2 Grassmann projection

The set of GEI images taken from a video sequence is modeled as a collection of linear subspaces. In this way, the undesired variability due to view angle, pose, and appearance changes can be absorbed within subspaces, while the variability of subject identity is emphasized as variability among the subspaces. Most subspace-based learning techniques [4-6] employ an inconsistent mechanism, e.g., feature extraction is performed in the Euclidean space while non-Euclidean subspace distances are used. Optimization and convergence are difficult to achieve using this inconsistent approach [12]. Under the Grassmann framework, feature extraction and distance measurement can be integrated in a graceful manner, resulting in a simpler and more familiar algorithm.

Given sets of GEIs calculated using Equation 7, we compute the SVD over the image sets to obtain the corresponding subspaces {X_1, X_2, …, X_n}, where X_i ∈ ℝ^{D×m}, D refers to the length of the gait feature vector, and m signifies the number of images comprising the subspace. After that, the Grassmann kernel is applied to these subspaces. To this end, we have tested two types of kernel functions, namely the projection and canonical-correlation kernels [12, 14] given in Equations 3 and 4.
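The subspace extraction step can be sketched as follows: stack the vectorized GEIs as columns and keep the m leading left singular vectors as an orthonormal basis, i.e., a point on G(m, D). This is a generic sketch, not the authors' code, and the function name is ours:

```python
import numpy as np

def gait_subspace(gei_set, m):
    """Orthonormal basis (a D x m point on G(m, D)) for a set of GEIs.

    gei_set is a list of H x W arrays; each is flattened into a column
    of a D x n matrix, and the m leading left singular vectors span
    the gait subspace."""
    A = np.column_stack([g.ravel() for g in gei_set])
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, :m]
```

The returned columns are orthonormal, as required by the Grassmann kernels above.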

4.3 Grassmann graph embedding

Grassmann kernels allow us to embed the manifold in a higher-dimensional RKHS, to which many Euclidean algorithms can be generalized. Conventional dimension reduction techniques like linear discriminant analysis (LDA), principal component analysis (PCA), and locality preserving projection (LPP) can thus be applied on the Grassmann manifold to further improve recognition accuracy [12-14]. The GE framework [15] has proven to be effective in unifying the various dimension reduction algorithms. Given points from the underlying Grassmann manifold 𝒢, the local geometrical structure of 𝒢 can be modeled by constructing a similarity graph W. Let G = {V, W} denote an undirected weighted graph with vertex set V and similarity matrix W. The values of W can be obtained directly from the output of the Grassmann kernel. The diagonal matrix D and the Laplacian matrix L of the graph G are defined by L = D - W, where D_ii = Σ_j W_ij.

The task of GE is to determine a low-dimensional representation of the vertex set V that preserves similarities between vertex pairs in the original high-dimensional space. The solution can be directly obtained using eigenvalue decomposition [15]. In the following text, we formulate the GE dimension reduction problem over the Grassmann manifolds for unsupervised and supervised configurations.

The unsupervised GGE approach is suitable for open surveillance systems, such as applications that monitor pedestrians on the streets and customers in shopping malls. It is very difficult, if not impossible, to obtain the subjects' identities in such settings; thus, unsupervised GGE is useful for discerning an individual with unknown identity. On the contrary, supervised GGE is appropriate for closed-set identification, such as monitoring employees in a workplace. As the identities of the legitimate subjects are known, supervised GGE can classify the gait data reliably using identity information.

4.3.1 Unsupervised GGE

We formulate the unsupervised GGE method by first forming the similarity graph W. We want to find a mapping function F: Y_i → Z_i that maps the points on the Grassmann manifold 𝒢 to a new manifold 𝒢' while preserving the local geometry of the manifold. In other words, we want a transformation that maps points connected in W as close together as possible. The following objective function realizes this criterion:

min Σ_{ij} ‖Z_i - Z_j‖^2 W_ij.

This objective function incurs a heavy penalty, weighted by W_ij, if connected neighbors are mapped far apart in 𝒢'. Therefore, minimizing it ensures that Z_i and Z_j are close whenever Y_i and Y_j are close.

Suppose U is a projection matrix, Z^T = U^T Y, that fulfills objective function (8), where Y is the kernel matrix produced by the Grassmann kernel. By simple algebraic manipulation, the objective function can be reduced to

(1/2) Σ_{ij} ‖Z_i - Z_j‖^2 W_ij = (1/2) Σ_{ij} ‖U^T Y_i - U^T Y_j‖^2 W_ij = U^T Y (D - W) Y^T U = U^T Y L Y^T U

where D is a diagonal matrix given by D_ii = Σ_j W_ij. The optimization problem then reduces to finding

arg min_{U: U^T Y D Y^T U = 1} U^T Y L Y^T U.

The projection matrix U that minimizes Equation 8 is given by the minimum-eigenvalue solutions to the generalized eigenvalue problem:

Y L Y^T U = λ Y D Y^T U.
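A compact sketch of the unsupervised GGE solution, assuming Y is the n × n Grassmann kernel matrix and W the similarity graph. It uses SciPy's generalized symmetric eigensolver; the small ridge term is our added assumption for numerical stability and is not part of the formulation above:

```python
import numpy as np
from scipy.linalg import eigh

def unsupervised_gge(Y, W, dim):
    """Solve Y L Y^T u = lam Y D Y^T u and keep the eigenvectors
    associated with the smallest eigenvalues (LPP-style criterion)."""
    D = np.diag(W.sum(axis=1))
    L = D - W
    A = Y @ L @ Y.T
    B = Y @ D @ Y.T
    # Ridge keeps B positive definite for the generalized solver.
    B += 1e-6 * np.eye(B.shape[0])
    vals, vecs = eigh(A, B)        # eigenvalues returned in ascending order
    return vecs[:, :dim]
```

New points are then projected by multiplying their kernel vectors with the returned matrix.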

4.3.2 Supervised GGE

The unsupervised GGE method can be extended to a supervised version by constructing two similarity graphs, W_w,ij and W_b,ij, which denote the within-class and between-class similarity matrices, respectively. The extension is desirable as we can take advantage of class label information to improve classification accuracy. The mapping function for supervised GGE is slightly different from its unsupervised counterpart. The new mapping function F: Y_i → Z_i is formed such that the connected points of the within-class similarity matrix W_w,ij stay as close as possible while the connected points of the between-class similarity matrix W_b,ij stay as distant as possible. The class label information is used to discover the discriminant structure of the samples. The objective functions for supervised GGE are defined as follows:

min Σ_{ij} ‖Z_i - Z_j‖^2 W_w,ij
max Σ_{ij} ‖Z_i - Z_j‖^2 W_b,ij.

The first objective function incurs a heavy penalty, via W_w,ij, if neighboring points Z_i and Z_j are mapped far apart while they are actually in the same class. Likewise, the second objective function incurs a heavy penalty, via W_b,ij, if neighboring points Z_i and Z_j are mapped close together while they belong to different classes.

Suppose U is a projection matrix, Z^T = U^T Y, that realizes objective functions (12) and (13). By simple algebraic manipulation, objective function (12) can be reduced to

(1/2) Σ_{ij} ‖Z_i - Z_j‖^2 W_w,ij = (1/2) Σ_{ij} ‖U^T Y_i - U^T Y_j‖^2 W_w,ij = U^T Y D_w Y^T U - U^T Y W_w Y^T U

where D_w is a diagonal matrix given by D_w,ii = Σ_j W_w,ij. Similarly, objective function (13) can be condensed into the following form:

(1/2) Σ_{ij} ‖Z_i - Z_j‖^2 W_b,ij = (1/2) Σ_{ij} ‖U^T Y_i - U^T Y_j‖^2 W_b,ij = U^T Y (D_b - W_b) Y^T U = U^T Y L_b Y^T U

where D_b is a diagonal matrix obtained through D_b,ii = Σ_j W_b,ij. The optimization problem can be condensed into the following form:

arg max_{U: U^T Y D_w Y^T U = 1} U^T Y (L_b + W_w) Y^T U.

The projection matrix that maximizes Equation 16 can be obtained by solving the generalized eigenvalue problem:

Y (L_b + W_w) Y^T U = λ Y D_w Y^T U.

The procedure to implement GGE for supervised and unsupervised configurations is summarized in Algorithm 1.
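For illustration, the supervised step can likewise be sketched as a generalized eigenvalue problem. The criterion used here (maximize U^T Y (L_b + W_w) Y^T U subject to U^T Y D_w Y^T U = 1) is our reading of the supervised objective, not necessarily the authors' exact code, and the ridge term is an added numerical-stability assumption:

```python
import numpy as np
from scipy.linalg import eigh

def supervised_gge(Y, Ww, Wb, dim):
    """Supervised GGE sketch: Y is the n x n Grassmann kernel matrix,
    Ww/Wb the within- and between-class similarity graphs. Keeps the
    eigenvectors with the largest generalized eigenvalues."""
    Db = np.diag(Wb.sum(axis=1))
    Lb = Db - Wb                       # between-class graph Laplacian
    Dw = np.diag(Ww.sum(axis=1))
    A = Y @ (Lb + Ww) @ Y.T
    B = Y @ Dw @ Y.T + 1e-6 * np.eye(Y.shape[0])   # ridge for stability
    vals, vecs = eigh(A, B)            # ascending eigenvalue order
    return vecs[:, ::-1][:, :dim]      # largest eigenvalues first
```

The two graphs can come from any of the global, local, or adaptive constructions described next.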

4.3.3 Constructing the similarity graphs

Graph relations play a crucial role in the GE framework to determine how the methods behave based on the connectivity and weight assignment of the neighboring points in the data. We present three approaches for graph construction: global, local, and adaptive. The first approach constructs fully connected graphs where all nodes are connected using predefined weights. The representative methods for this approach are PCA and LDA for unsupervised and supervised configurations, respectively.

The second approach takes into consideration the neighborhood information where only the k neighboring nodes are connected in the graph. If k = N, the local approach is the same as the global approach. Some popular methods for this approach are LPP and locality preserving discriminant analysis [56] for unsupervised and supervised modes, respectively.

The third approach adaptively assigns weights to the nodes based on how the rest of the samples contribute to the sparse representation of the nodes. This is an unconventional approach for graph construction, and the detail of constructing the adaptive graph is given in the subsequent section.

Weight assignment for the global approach is straightforward: all nodes in the graph are connected with equal weights. For the unsupervised mode, the simplest graph structure sets W_ij = 1. Another way to form the similarity graph is the heat kernel W_ij = exp(-‖x̂_i - x̂_j‖^2 / t) [15], where t is an adjustable constant. In contrast to the unsupervised mode, two graphs are constructed in the supervised mode. Weights are assigned to the within-class similarity graph W_w,ij if two nodes share the same class label, and 0 otherwise. Similarly, weights are assigned to the between-class similarity graph W_b,ij if two nodes are not from the same class, and 0 otherwise.

For the local approach, the simplest graph structure is the simple-minded graph, where the similarity matrix entry W_ij is set to 1 if x̂_i is among the k nearest neighbors of x̂_j, and 0 otherwise. The weight can also be replaced by the heat kernel. The supervised method takes class information into consideration and sets the within-class similarity graph W_w,ij = 1 if x̂_i is among the k nearest neighbors of x̂_j in the same class, and 0 otherwise. In a similar manner, the between-class similarity graph assigns W_b,ij = 1 if x̂_i is among the k nearest neighbors of x̂_j in different classes, and 0 otherwise.
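The simple-minded k-nearest-neighbor graph described above can be sketched as follows. This is a generic illustration; the symmetrization step is our assumption to keep the graph undirected, and the function name is ours:

```python
import numpy as np

def knn_graph(X, k):
    """Simple-minded local similarity graph: W_ij = 1 if x_j is among
    the k nearest neighbors of x_i (then symmetrized), 0 otherwise.

    X is an n x d matrix of points in the kernel-induced space."""
    n = X.shape[0]
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)        # a point is not its own neighbor
    W = np.zeros((n, n))
    for i in range(n):
        W[i, np.argsort(d[i])[:k]] = 1.0
    return np.maximum(W, W.T)          # keep the graph undirected
```

Replacing the 0/1 entries with heat kernel values, or restricting neighbors by class label, yields the other variants described above.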

We also propose a self-adaptive graph structure. Suppose S(i, j) is the sparse output estimated by Equation 6 using the column vectors of X̂ (the output of the Grassmann kernel); the similarity graph for the unsupervised self-adaptive graph is defined as W_ij = S(i, j). For the supervised method, the within-class similarity graph is defined as W_w,ij = S_w(i, j), where S_w is the output from Equation 6 fulfilling the condition x̂_i ∈ N_w(x̂_j) or x̂_j ∈ N_w(x̂_i), with N_w(x̂_i) being the set of k neighbors sharing the same label as x̂_i. The between-class similarity graph is characterized by W_b,ij = S_b(i, j), where S_b is the output from Equation 6 with x̂_i ∈ N_b(x̂_j) or x̂_j ∈ N_b(x̂_i), where N_b(x̂_i) is the set of k neighbors having different labels. This is the basic approach to constructing an adaptive graph, where a single dictionary is learnt for all classes. Since the dictionary is learnt only once, some computational burden is saved.

A number of variations can be derived from this basic idea. For example, a class-specific dictionary can be learnt, where each class is modeled independently of the others. W_w can be built from the SR output computed over the column vectors of X̂_w, where X̂_w is the Grassmann output sharing the same label as the test sample. Likewise, W_b can be constructed from the SR output over the column vectors of X̂_b, where X̂_b is the Grassmann output whose labels differ from the test sample. This approach lets the learnt dictionary represent each class efficiently. However, dictionary learning has to be performed multiple times, once for each class.

If one wishes to uncover the semantic information only in the between-class similarity graph (perhaps because little of interest is revealed in the sparse within-class similarity graph, where large values are expected for nodes from the same class), the between-class similarity graph can be generated using the sparse approach while the within-class similarity graph is constructed using the simple-minded or heat kernel functions. This combination benefits from the flexibility of the sparse graph and the low computational cost of the fully connected graph. Table 1 summarizes the different graph construction methods for GGE.
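The basic shared-dictionary adaptive scheme can be sketched as follows. This is only an illustrative sketch: the l1 solver here is a generic ISTA (iterative soft-thresholding) loop standing in for the l1_ls solver used in the experiments, `X` is a hypothetical matrix with one sample per column, and each column is sparsely coded against the remaining columns before the coefficients are split by class label into W_w and W_b.

```python
import numpy as np

def lasso_ista(D, y, lam=0.1, n_iter=500):
    """Minimize 0.5*||y - D s||^2 + lam*||s||_1 by iterative
    soft-thresholding (ISTA); a stand-in for the l1_ls solver."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    s = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = D.T @ (D @ s - y)              # gradient of the quadratic term
        s = s - g / L
        s = np.sign(s) * np.maximum(np.abs(s) - lam / L, 0.0)  # soft threshold
    return s

def adaptive_graphs(X, labels, lam=0.1):
    """Shared-dictionary adaptive graphs: code each column of X against
    the other columns, then split the (absolute) sparse coefficients by
    label into within-class (W_w) and between-class (W_b) graphs."""
    n = X.shape[1]
    S = np.zeros((n, n))
    for j in range(n):
        mask = np.arange(n) != j           # leave the coded sample out
        S[mask, j] = np.abs(lasso_ista(X[:, mask], X[:, j], lam))
    same = (labels[:, None] == labels[None, :]).astype(float)
    return S * same, S * (1.0 - same)
```

Because the sparse coding is run once per sample against a single shared dictionary, splitting by label afterwards costs nothing extra, which is the computational saving noted above.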

Table 1 Summary of the different graph construction methods

5 Experiments

Two databases were used to evaluate the proposed method, namely the Chinese Academy of Sciences, Institute of Automation (CASIA) gait database: dataset B [57] and the Osaka University, Institute of Scientific and Industrial Research (OU-ISIR) gait database: datasets A and B [58]. The CASIA gait database is well suited to assessing the effect of view variation on gait as it contains a large number of subjects captured from different viewing angles. It consists of 124 subjects captured from 11 angles, ranging from 0° to 180° at intervals of 18°. There are ten walking sequences for each subject: six with subjects walking under normal conditions, two with subjects wearing coats, and two with subjects carrying bags. There are therefore 13,640 (10 × 11 × 124) gait sequences in the database. All the images were cropped and normalized to 120 × 120 pixels.

The OU-ISIR gait database is suitable for assessing the influence of speed changes and clothing variations on gait. The OU-ISIR gait database: dataset A contains 35 subjects captured from side view with speed variation from 2 to 7 km/h, at an interval of 1 km/h. There are two walking sequences for each speed level. Thus, there are 420 (2 × 6 × 35) gait sequences in this dataset. On the other hand, dataset B is made up of 68 subjects acquired from side view with clothing variations. There are many clothing combinations in this dataset which include pants, half shirt, rain coat, skirt, and cap. All the images for the OU-ISIR database were cropped and resized to 128 × 88 pixels.

5.1 Experiment result

5.1.1 Evaluation on view variations

The CASIA gait database was used to test the performance of the proposed method under view changes. All six gait sequences under the normal walking condition were used. For clear indication, each of the viewing angles {0°, 18°, …, 180°} was labeled θ = {1, 2, …, 11}. We formulated three cases to evaluate the proposed method against viewpoint changes, simulating realistic scenarios in which the multiple views could have been acquired from fairly different viewpoints:

  1. 1.

    Same view setting, θ test = θ train . In this setting, all the viewpoints used in the training and testing sets were the same, e.g., θ train = {1, …, 11} and θ test = {1, …, 11}.

  2. 2.

    Mixed view setting, θ_train ⊂ θ, θ_test ⊂ θ, θ_train ≠ θ_test. In this setting, we made the task challenging in that not all the poses in the testing set were available for training, e.g., θ_train = {2, 3, 4, 6, 8} and θ_test = {2, 4, 6, 7, 9}.

  3. 3.

    Different view setting, θ_test = θ − θ_train. This is a difficult case where the testing set contains viewpoints entirely absent from the training set, e.g., θ_train = {2, 4, 6, 9} and θ_test = {1, 3, 5, 8}. We further included more challenging scenarios to test how well the proposed method generalizes to unseen viewpoints, e.g., θ_train = {1, 2, 3, 4} and θ_test = {7, 8, 9, 10}. This is an interesting experiment to see how well the proposed method extrapolates beyond the known view angles. The previous setting, where the unseen view angles lie within the range of the known view angles (e.g., θ_train = {2, 4, 6, 9} and θ_test = {1, 3, 5, 8}), can be seen as an interpolation case.
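The three protocols reduce to simple set relations between the training and testing viewpoint label sets. A toy sketch of this bookkeeping, with the integer labels θ = {1, …, 11} used above (function and variable names are illustrative only):

```python
# Hypothetical integer labels theta = {1, ..., 11} standing for 0°..180°.
theta = set(range(1, 12))

def view_setting(train, test):
    """Classify a train/test viewpoint split as 'same' (identical sets),
    'mixed' (overlapping but unequal), or 'different' (disjoint)."""
    train, test = set(train), set(test)
    if train == test:
        return 'same'
    if train & test:
        return 'mixed'
    return 'different'
```

Applied to the examples above, θ_train = θ_test = {1, …, 11} is the same-view case, {2, 3, 4, 6, 8} vs {2, 4, 6, 7, 9} is mixed, and {2, 4, 6, 9} vs {1, 3, 5, 8} is different.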

The following setup was deployed to run the experiment. We randomly selected four gait sequences from each subject to form the training set, and the remaining sequences were for the testing set. The selected view angles for the different settings were modeled as the subspace for each sample in the training and testing sets. We then computed the similarity score for every pair of training–testing matches. The random division of the gait sequences into training and testing sets was repeated several times, and the average result was recorded. The k-nearest neighbor method was used to measure the similarity score between the training and testing sets.

For SR dictionary learning, we deployed the l1-regularized least squares solver distributed by Boyd's research group [59]. The algorithms are sensitive to several parameters, listed in Table 2 together with the values or ranges of values that generally yield good performance based on empirical tests. The results reported in this paper were obtained with the best-performing combination of the parameters. The rank-1 recognition rate was used as the performance indicator: a correct match is counted when the best (top-one) training match of a testing sample shares its identity.
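The rank-1 criterion itself is a nearest-neighbor rule over the similarity scores. A minimal sketch, assuming a hypothetical matrix `sim` in which `sim[i, j]` holds the similarity between training sample i and test sample j:

```python
import numpy as np

def rank1_rate(sim, train_labels, test_labels):
    """Fraction of test samples whose single best (most similar)
    training match carries the correct class label."""
    best = np.argmax(sim, axis=0)          # top-1 training index per test column
    return float(np.mean(train_labels[best] == test_labels))
```

The same score matrix supports higher ranks (rank-k) by replacing the argmax with a top-k selection, though only rank-1 is reported here.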

Table 2 List of parameters used in GGE

The experimental results for evaluating the changes in view angles are shown in Table 3. The canonical-correlation kernel is denoted 'CC', while the projection kernel is termed 'Proj'. The prefixes 'SM', 'HK', and 'SR' abbreviate simple-minded (the binary graph), heat kernel function, and sparse representation, respectively. We included a comparison with the multi-view subspace representation (MSR) method. On top of that, we also added the classical score-level fusion method to benchmark the algorithm; the scores from the different view angles were fused using the minimum dissimilarity selection rule [60].
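For reference, the two Grassmann kernels being compared admit a compact sketch. This is an illustrative sketch under stated assumptions, not the paper's exact implementation: `Y1` and `Y2` are orthonormal bases of two image-set subspaces, the projection kernel is the squared Frobenius norm of Y1ᵀY2 (the sum of squared principal-angle cosines), and the canonical-correlation score is taken here as the largest canonical correlation (the cosine of the smallest principal angle), one common choice among several CC-based forms.

```python
import numpy as np

def subspace_basis(A, d):
    """Orthonormal basis of the d-dimensional subspace spanned by the
    columns of a data matrix A (via thin SVD)."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, :d]

def projection_kernel(Y1, Y2):
    """k_proj = ||Y1.T @ Y2||_F^2, the sum of squared cosines of the
    principal angles between the two subspaces."""
    return np.linalg.norm(Y1.T @ Y2, 'fro') ** 2

def cc_kernel(Y1, Y2):
    """Largest canonical correlation: the top singular value of
    Y1.T @ Y2, i.e., the cosine of the smallest principal angle."""
    return np.linalg.svd(Y1.T @ Y2, compute_uv=False)[0]
```

The contrast in the results below is consistent with these definitions: the CC score is driven by the single best-aligned direction (the closest view), while the projection kernel aggregates over all principal angles.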

Table 3 Evaluating the effect of view angle changes for the same, mixed, and different view settings

When all of the viewing angles are used to train the system, 100% accuracy could be achieved by all the methods except score-level fusion and MSR. This good result is not surprising, because GGE captures the variations in viewpoint when recognition is performed. In the mixed view settings, promising results close to 100% accuracy are obtained. This is encouraging, as it shows that the proposed methods possess cross-view capability. The performance remains favorable in the different view settings when the viewpoints in the testing set are close to those in the training set. However, the accuracy drops when the viewpoints in the testing set are far from those in the training set (e.g., in the case of TR 1, 2, 3, 4; TT 7, 8, 9, 10). Such a result is expected, as extrapolating to unseen views very different from the existing ones is a challenging problem. Based on the results in Table 3, the proposed method with the CC kernel consistently yields good results, and the effectiveness of SR is verified by the experiments.

5.1.2 Variants of SR analysis

In this experiment, we evaluate the different variants derived from the SR similarity graph construction approach described in Section 4.3.3. Table 4 displays the performance of the different graph combination methods; adaptive supervised GGE with the CC kernel was applied throughout. SR variation 1 refers to the basic adaptive graph construction approach, which learns a shared dictionary for all classes and constructs the within-class and between-class similarity graphs as W_w,ij = S_w(i, j) and W_b,ij = S_b(i, j). The l1-minimization algorithm given in Equation 6 is run once per test sample, and the output is split into W_w,ij and W_b,ij based on the class labels.

Table 4 Rank-1 recognition rate (%) of the different combinations of SR similarity graphs

In contrast, SR variation 2 learns a class-specific dictionary. It builds the within-class similarity graph as W_w,ij = Ŝ_w(i, j), where Ŝ_w(i, j) is the result of running l1-minimization on the Grassmann output sharing the same class label as the test sample. The between-class similarity graph is constructed as W_b,ij = Ŝ_b(i, j), where Ŝ_b(i, j) is the result of running l1-minimization on the Grassmann output having class labels different from the test sample. In this respect, the l1-minimization algorithm is run twice per test sample: once on the training data from the same class to construct W_w,ij and once on the training data from the other classes to construct W_b,ij.

As for the SM + SR method, the simple-minded function was used to generate the within-class similarity graph, W w,ij , while SR was used to build the between-class similarity graph, W b,ij . Likewise, the heat kernel function and SR were used to generate W w,ij and W b,ij , respectively, for the HK + SR method.

We observe that the performances of the different SR variations do not deviate significantly. We use SR variation 1 in the subsequent sections as it gives slightly better results in most situations. Besides, it is less time-consuming than SR variation 2 and does not require the additional parameters, namely the number of neighbors k and the constant t, needed by SM + SR and HK + SR.

5.1.3 Evaluation on clothing and carrying conditions

We conducted experiments to examine the performance of the proposed methods under clothing variations. The main purpose of this experiment is to simulate the condition where suspects captured by surveillance cameras try to disguise themselves by wearing coverings such as a raincoat or hat. This experiment is also useful for assessing the ability of the proposed method to discriminate individuals who wear loose outfits such as baggy pants and skirts, which can obstruct the gait pattern from being observed properly.

The CASIA and OU-ISIR gait databases were used for this evaluation. For the CASIA database, we took four normal gait sequences as the training set and two bag-carrying and two coat-wearing sequences as the testing sets. All 11 viewing angles were applied in the test. Using two types of data for training and testing (one from the normal walking sequences and the other from the carrying/clothing conditions) is a more realistic setting in which the unknown carrying/clothing type must be generalized from the existing dataset: in a real-life scenario, there is no way to predict the type of clothes a person wears or the things the person carries while walking. As for the OU-ISIR database, six different clothing combinations were tested. Most of the clothing combinations were from types A (e.g., regular pants and parka) to M (e.g., baggy pants and down jacket) [58]. The clothing types were chosen to give the largest possible variations for the test. Only 16 subjects were tested in this experiment, because we could only identify 16 corresponding pairs between dataset A, the normal walking sequences, and dataset B, walking with clothing variations. Six sequences from dataset A were used as the training set, while the six sequences in dataset B were used as the testing set.

The results of the tests are shown in Table 5. We find that the variations in clothing alter an individual's appearance and make the problem of gait identification challenging. For example, the images depicted in Figure 3 are taken from the same subject. The images look different when different types of clothing are worn. The experiment suggests that further investigation has to be carried out to study gait recognition with substantial clothing variations. Nevertheless, the methods could handle the carrying condition satisfactorily.

Table 5 Evaluating the effect of clothing and carrying conditions
Figure 3

Images of the same person with different clothing types.

5.1.4 Evaluation on walking speeds

We have also conducted experiments to assess the effect of walking speed on gait. We are interested in this study as a perpetrator usually walks faster to leave the crime scene quickly. The OU-ISIR gait database was used for this evaluation. Using a treatment similar to the view angle evaluation, we labeled the speeds {2, 3, …, 7 km/h} as S = {1, 2, …, 6}. Table 6 shows the result of evaluating speed variations. Some methods could achieve 100% accuracy when all the speeds are used. Unlike clothing variations, speed changes do not drastically affect the accuracy of gait identification; the methods tolerate speed variations quite robustly.

Table 6 Evaluating the effect of speed variations using rank-1 recognition rate (%)

5.2 Summary and discussion

The important findings of this work are summarized below:

  •  The Grassmann manifold provides a platform to reduce the subspace-to-subspace matching problem to a point-to-point matching model. This is immensely useful for gait recognition as the gait video sequences naturally fall in the subspace learning paradigm (unlike face recognition which can be carried out using single image).

  •  The unsupervised and supervised GGE configurations provide different treatments to gait dataset of different natures (labeled and unlabeled). Nevertheless, the two approaches can be unified gracefully under a general formulation.

  •  GGE outperforms the benchmark methods for all cases. The proposed adaptive learning approach, in particular, yields considerable improvement in classification accuracy.

  •  Comparing the different graph construction approaches, the adaptive method clearly outperforms its counterparts. However, no conclusive remark can be drawn between the global and local methods: the global approach performs better under view angle changes, the opposite holds for the clothing and carrying conditions, and nearly identical results were obtained for speed variation. As such, we conjecture that the topological structure of the graph has a disparate impact in different scenarios; no single graph structure (global or local) works best for all cases.

  •  Unsupervised GGE surprisingly outperforms its supervised counterpart in a number of scenarios, e.g., gait image sets with varying view angles and different clothing appearances. The reason may be that the similarity graph W encodes general information about the relationships among the nodes, whereas the within- and between-class similarity graphs, W_w and W_b, may overlook some subtle discriminative connections. There may also be outliers in the labeled training set, for example, an image of a person wearing a thick hooded sweater that confuses the true appearance of the person, which would explain why the supervised method is slightly inferior to the unsupervised one. This counterintuitive result suggests that it may be better to resort to the unsupervised method when the cost of labeling the data is high and the class information would not lead to a dramatic improvement in recognition rate.

  •  The canonical-correlation kernel generally performs better than the projection kernel in the changing-view scenario. We attribute this to the nature of the canonical-correlation kernel, which is based on the notion of principal angles and always finds the closest view angle in the training set for comparison. This concept is illustrated in Figure 4. Nevertheless, the projection kernel performs better in the clothing and carrying conditions, perhaps because it treats the image subspaces more holistically. However, when the sample size is small, e.g., for the OU-ISIR dataset, the projection kernel has no advantage over the canonical-correlation kernel. The result suggests that the two kernels describe different aspects of the subspaces.

Figure 4

Subspace-based method will always find the closest view angle between training and testing subspaces for matching.

6 Conclusions

This paper demonstrates how it is possible to formulate the gait recognition problem on the Grassmann manifold. This formulation enables us to work in higher-order data structure to harness the nonlinear structure of the data and yet benefit from conventional vector-based computation. We present a method comprising unsupervised and supervised learning modes on the Grassmann manifold. We further introduce the concept of adaptive graph in the learning mechanism to adaptively tailor the graph content based on the nature of the dataset. Experimental results suggest that the proposed method has a potential for practical application as it demonstrates view- and speed-invariant capabilities.


References

  1. Matovski DS, Nixon MS, Mahmoodi S, Carter JN: The effect of time on gait recognition performance. IEEE Trans. Inf. Forensics Secur. 2012, 7(2):543-552.

  2. Zhao W: Face Processing. Burlington: Academic Press; 2005.

  3. Basri R, Jacobs DW: Lambertian reflectance and linear subspaces. IEEE Trans. Pattern Anal. Mach. Intell. 2003, 25(2):218-233. 10.1109/TPAMI.2003.1177153

  4. Kim TK, Kittler J, Cipolla R: Discriminative learning and recognition of image set classes using canonical correlations. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29(6):1005-1018.

  5. Cevikalp H, Triggs B: Face recognition based on image sets. In 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2010). Piscataway: IEEE; 2010:2567-2573.

  6. Wang R, Shan S, Chen X, Gao W: Manifold-manifold distance with application to face recognition based on image set. In 2008 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2008). Piscataway: IEEE; 2008:1-8.

  7. Edelman A, Arias TA, Smith ST: The geometry of algorithms with orthogonality constraints. SIAM J. Matrix Anal. Appl. 1999, 20(2):303-353.

  8. Wikipedia: Hermann Grassmann. Wikimedia Foundation, Inc; 2013. Accessed 13 January 2014.

  9. Turaga P, Veeraraghavan A, Srivastava A, Chellappa R: Statistical computations on Grassmann and Stiefel manifolds for image and video-based recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 33(11):2273-2286.

  10. Wu T, Turaga P, Chellappa R: Age estimation and face verification across aging using landmarks. IEEE Trans. Inf. Forensics Secur. 2012, 7(6):1780-1788.

  11. Khan ZH, Gu IYH: Visual tracking and dynamic learning on the Grassmann manifold with inference from a Bayesian framework and state space models. In 2011 18th IEEE International Conference on Image Processing (ICIP). Piscataway: IEEE; 2011:1433-1436.

  12. Hamm J, Lee DD: Grassmann discriminant analysis: a unifying view on subspace-based learning. In Proceedings of the 25th International Conference on Machine Learning. New York: ACM; 2008:376-383.

  13. Wang T, Shi P: Kernel Grassmannian distances and discriminant analysis for face recognition from image sets. Pattern Recogn. Lett. 2009, 30(13):1161-1165. 10.1016/j.patrec.2009.06.002

  14. Harandi MT, Sanderson C, Shirazi S, Lovell BC: Graph embedding discriminant analysis on Grassmannian manifolds for improved image set matching. In 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE; 2011:2705-2712.

  15. Yan S, Xu D, Zhang B, Zhang HJ, Yang Q, Lin S: Graph embedding and extensions: a general framework for dimensionality reduction. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29(1):40-51.

  16. Yu S, Tan D, Tan T: A framework for evaluating the effect of view angle, clothing and carrying condition on gait recognition. In 18th International Conference on Pattern Recognition (ICPR 2006), vol. 4; 2006:441-444.

  17. Kawai R, Makihara Y, Hua C, Iwama H, Yagi Y: Person re-identification using view-dependent score-level fusion of gait and color features. In Proceedings of the 21st International Conference on Pattern Recognition. Piscataway: IEEE; 2012:2694-2697.

  18. Jean F, Bergevin R, Albu AB: Computing and evaluating view-normalized body part trajectories. Image Vis. Comput. 2009, 27(9):1272-1284. 10.1016/j.imavis.2008.11.009

  19. Kale A, Chowdhury AKR, Chellappa R: Towards a view invariant gait recognition algorithm. In Proceedings of IEEE Conference on Advanced Video and Signal Based Surveillance. Piscataway: IEEE; 2003:143-150.

  20. Goffredo M, Bouchrika I, Carter JN, Nixon MS: Self-calibrating view-invariant gait biometrics. IEEE Trans. Syst. Man Cybern. B Cybern. 2010, 40(4):997-1008.

  21. Bodor R, Drenner A, Fehr D, Masoud O, Papanikolopoulos N: View-independent human motion classification using image-based reconstruction. Image Vis. Comput. 2009, 27(8):1194-1206. 10.1016/j.imavis.2008.11.008

  22. Zhao G, Liu G, Li H, Pietikäinen M: 3D gait recognition using multiple cameras. In 7th International Conference on Automatic Face and Gesture Recognition (FGR 2006). Piscataway: IEEE; 2006:529-534.

  23. Yamauchi K, Bhanu B, Saito H: Recognition of walking humans in 3D: initial results. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2009. Piscataway: IEEE; 2009:45-52.

  24. Makihara Y, Sagawa R, Mukaigawa Y, Echigo T, Yagi Y: Gait recognition using a view transformation model in the frequency domain. In Computer Vision - ECCV 2006. Lecture Notes in Computer Science, vol. 3953. Edited by: Leonardis A, Bischof H, Pinz A. Heidelberg: Springer; 2006:151-163.

  25. Kusakunniran W, Wu Q, Li H, Zhang J: Multiple views gait recognition using view transformation model based on optimized gait energy image. In 2009 IEEE 12th International Conference on Computer Vision Workshops (ICCV Workshops). Piscataway: IEEE; 2009:1058-1064.

  26. Kusakunniran W, Wu Q, Zhang J, Li H: Support vector regression for multi-view gait recognition based on local motion feature selection. In 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE; 2010:974-981.

  27. Kusakunniran W, Wu Q, Zhang J, Li H: Cross-view and multi-view gait recognitions based on view transformation model using multi-layer perceptron. Pattern Recogn. Lett. 2012, 33(7):882-889. 10.1016/j.patrec.2011.04.014

  28. Bashir K, Xiang T, Gong S: Cross-view gait recognition using correlation strength. In Proceedings of the British Machine Vision Conference. Edited by: Labrosse F, Zwiggelaar R, Liu Y, Tiddeman B. Guildford: BMVA Press; 2010:109.1-109.11.

  29. Lee CS, Elgammal A: Towards scalable view-invariant gait recognition: multilinear analysis for gait. In Proceedings of the 5th International Conference on Audio- and Video-Based Biometric Person Authentication. Heidelberg: Springer; 2005:395-405.

  30. Hadid A, Ghahramani M, Kellokumpu V, Pietikäinen M, Bustard J, Nixon M: Can gait biometrics be spoofed? In 2012 21st International Conference on Pattern Recognition (ICPR). Piscataway: IEEE; 2012:3280-3283.

  31. Altab Hossain M, Makihara Y, Wang J, Yagi Y: Clothing-invariant gait identification using part-based clothing categorization and adaptive weight control. Pattern Recogn. 2010, 43(6):2281-2291. 10.1016/j.patcog.2009.12.020

  32. Wright J, Yang AY, Ganesh A, Sastry SS, Ma Y: Robust face recognition via sparse representation. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 31(2):210-227.

  33. Singh S, Biswas KK: Biometric gait recognition with carrying and clothing variants. In Pattern Recognition and Machine Intelligence. Edited by: Chaudhury S, Mitra S, Murthy CA, Sastry PS, Pal SK. Heidelberg: Springer; 2009:446-451.

  34. Tanawongsuwan R, Bobick A: Modelling the effects of walking speed on appearance-based gait recognition. In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2. Piscataway: IEEE; 2004:II-783-II-790.

  35. Tsuji A, Makihara Y, Yagi Y: Silhouette transformation based on walking speed for gait identification. In 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE; 2010:717-722.

  36. Kusakunniran W, Wu Q, Zhang J, Li H: Gait recognition across various walking speeds using higher order shape configuration based on a differential composition model. IEEE Trans. Syst. Man Cybern. B Cybern. 2012, 42(6):1654-1668.

  37. Liu Z, Sarkar S: Improved gait recognition by gait dynamics normalization. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28(6):863-876.

  38. Tan D, Huang K, Yu S, Tan T: Uniprojective features for gait recognition. In Advances in Biometrics. Edited by: Lee S-W, Li SZ. Heidelberg: Springer; 2007:673-682.

  39. Guan Y, Li C-T: A robust speed-invariant gait recognition system for walker and runner identification. In 2013 International Conference on Biometrics (ICB). Piscataway: IEEE; 2013:1-8.

  40. Oja E: Subspace Methods of Pattern Recognition. Hertfordshire: Research Studies Press; 1983.

  41. Hotelling H: Relations between two sets of variates. Biometrika 1936, 28:321-377. 10.1093/biomet/28.3-4.321

  42. Yamaguchi O, Fukui K, Maeda K: Face recognition using temporal image sequence. In Proceedings of the Third IEEE International Conference on Automatic Face and Gesture Recognition, 1998. Piscataway: IEEE; 1998:318-323.

  43. Li F, Dai Q, Xu W, Er G: Weighted subspace distance and its applications to object recognition and retrieval with image sets. IEEE Signal Process. Lett. 2009, 16(3):227-230.

  44. Liu N, Lu J, Tan YP, Li M: Set-to-set gait recognition across varying views and walking conditions. In 2011 IEEE International Conference on Multimedia and Expo (ICME). Piscataway: IEEE; 2011:1-6.

  45. Kim TK, Arandjelović O, Cipolla R: Learning over sets using boosted manifold principal angles (BoMPA). In Proceedings of the British Machine Vision Conference 2005. Manchester: BMVA Press; 2005:779-788.

  46. Wolf L, Shashua A: Learning over sets using kernel principal angles. J. Mach. Learn. Res. 2003, 4:913-931.

  47. Fukui K, Yamaguchi O: Face recognition using multi-viewpoint patterns for robot vision. In Robotics Research. Edited by: Dario P, Chatila R. Heidelberg: Springer; 2005:192-201.

  48. Fukui K, Stenger B, Yamaguchi O: A framework for 3D object recognition using the kernel constrained mutual subspace method. In Computer Vision - ACCV 2006. Edited by: Narayanan PJ, Nayar SK, Shum H-Y. Heidelberg: Springer; 2006:315-324.

  49. Nishiyama M, Yamaguchi O, Fukui K: Face recognition with the multiple constrained mutual subspace method. In Audio- and Video-Based Biometric Person Authentication. Edited by: Kanade T, Jain A, Ratha NK. Heidelberg: Springer; 2005:71-80.

  50. Connie T, Michael GKO, Jin ATB: Grassmannian locality preserving discriminant analysis to view invariant gait recognition with image sets. In Proceedings of the 27th Conference on Image and Vision Computing New Zealand. New York: ACM; 2012:400-405.

  51. Golub GH, Van Loan CF: Matrix Computations (Johns Hopkins Studies in Mathematical Sciences), 3rd edition. Baltimore: The Johns Hopkins University Press; 1996.

  52. Yang J, Wright J, Huang TS, Ma Y: Image super-resolution via sparse representation. IEEE Trans. Image Process. 2010, 19(11):2861-2873.

  53. Abhari K, Marsousi M, Babyn P, Alirezaie J: Medical image denoising using low pass filtering in sparse domain. In 2012 Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). Piscataway: IEEE; 2012:114-117.

  54. Murray JF, Kreutz-Delgado K: Visual recognition and inference using dynamic overcomplete sparse learning. Neural Comput. 2007, 19(9):2301-2352. 10.1162/neco.2007.19.9.2301

  55. Han J, Bhanu B: Individual recognition using gait energy image. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28(2):316-322.

  56. Yang L, Gong W, Gu X, Li W, Liu Y: Bagging null space locality preserving discriminant classifiers for face recognition. Pattern Recogn. 2009, 42(9):1853-1858. 10.1016/j.patcog.2008.10.014

  57. CASIA Gait Database. Beijing: Center for Biometrics and Security Research; 2005. Accessed 13 January 2014.

  58. Makihara Y, Mannami H, Tsuji A, Hossain MA, Sugiura K, Mori A, Yagi Y: The OU-ISIR gait database comprising the treadmill dataset. IPSJ Trans. Comput. Vis. Appl. 2012, 4:53-62.

  59. Koh K, Kim SJ, Boyd S: l1_ls: simple Matlab solver for l1-regularized least squares problems. 2008. Accessed 13 January 2014.

  60. Jain A, Nandakumar K, Ross A: Score normalization in multimodal biometric systems. Pattern Recogn. 2005, 38(12):2270-2285. 10.1016/j.patcog.2005.01.012

Acknowledgements


This research was partially supported by Ministry of Science and Technology (MOSTI), Malaysia and Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT and Future Planning (2013006574).

Author information



Corresponding author

Correspondence to Andrew Beng Jin Teoh.

Additional information

Competing interests

The authors declare that they have no competing interests.


Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License ( ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


About this article

Cite this article

Connie, T., Goh, M.K.O. & Teoh, A.B.J. A Grassmann graph embedding framework for gait analysis. EURASIP J. Adv. Signal Process. 2014, 15 (2014).



  • Sparse Representation
  • View Angle
  • Similarity Graph
  • Grassmann Manifold
  • Locality Preserving Projection