A Grassmann graph embedding framework for gait analysis
EURASIP Journal on Advances in Signal Processing volume 2014, Article number: 15 (2014)
Abstract
Gait recognition is important in a wide range of monitoring and surveillance applications. Gait information has often been used as evidence when other biometric traits are indiscernible in surveillance footage. Building on recent advances in subspace-based approaches, we consider the problem of gait recognition on the Grassmann manifold. We show that by embedding the manifold into a reproducing kernel Hilbert space and applying the mechanics of graph embedding on this manifold, significant performance improvements can be obtained. In this work, the gait recognition problem is studied in a unified way applicable to both supervised and unsupervised configurations. Sparse representation is further incorporated into the learning mechanism to adaptively harness the local structure of the data. Experiments demonstrate that the proposed method can effectively tolerate variations in appearance for gait identification.
1 Introduction
The use of CCTV video cameras for surveillance is common in public and commercial establishments like banks, shopping malls, parks, and railway stations. Most current video surveillance systems require human operators to constantly supervise the cameras; in other words, the effectiveness of the system largely depends on the vigilance of the person monitoring it. To resolve this shortcoming, research is under way to develop automated systems for real-time camera monitoring. Among these efforts, gait recognition is a popular approach to automatic human identification. Gait recognition is a biometric technology that identifies people based on the manner in which they walk. This technology is suitable for person identification at a distance, when other biometrics like face, iris, or fingerprint might be obscured or captured at too low a resolution. In many situations, gait is the only evidence available from a crime scene [1].
With the advent of visual surveillance, it is not difficult to obtain multiple-viewpoint shots of a subject or video outputs over a period of time. These multiple sets of images can be combined to yield better performance than single-shot images. Subspace-based approaches have been shown to be effective in modeling data consisting of multiple sets of images [2]. For example, Jacobs et al. [3] showed that illumination on human faces can be modeled, under mild assumptions, by a nine-dimensional subspace. Following this finding, sets of images of the same person under varying lighting conditions are often modeled as low-dimensional subspaces [4–6]. While a subspace is a linear space, the collection of linear subspaces forms a completely different space, a Riemannian manifold [7]. More formally, the set of d-dimensional subspaces of ℝ^{n} is called the Grassmann manifold, named after the mathematician Hermann Günther Grassmann [8]. The Grassmann manifold has long been known for its fascinating mathematical properties, but its applications in computer vision and machine learning have appeared only recently.
Turaga et al. [9] demonstrated computer vision applications such as video-based face recognition, activity recognition, and image set-based object recognition on the Grassmann manifold. The Grassmann manifold structure of face shapes was also utilized in [10] for age estimation and face verification. In [11], the geometrical structure of the Grassmann manifold was exploited for a visual tracking scheme.
Hamm and Lee [12] showed that, using a suitable Grassmann kernel, the Grassmann space can be embedded into a higher-dimensional reproducing kernel Hilbert space (RKHS) where many Euclidean algorithms can be generalized. Following this finding, several studies extended the use of dimension reduction methods on the Grassmann manifold [13, 14], and considerable improvements in recognition accuracy have been reported for this application.
In this paper, we propose an approach called Grassmann graph embedding (GGE) for gait analysis. Motivated by the success of the graph embedding (GE) framework [15], we show how GE can be integrated on the Grassmann manifold for the gait recognition problem through the use of well-defined kernel functions on the manifold. We provide a general formulation that supports both supervised and unsupervised dimension reduction mechanisms. We further attach semantic meaning to the gait data by incorporating sparse representation into our learning mechanism.
The rest of the paper is organized as follows. In Section 2, we review the different approaches to gait recognition. In Section 3, we provide the background of the methods used in this paper. The overall framework for the proposed Grassmann GE learning is described in Section 4. In Section 5, we present experimental results on different settings. Lastly, some concluding remarks are given in Section 6.
2 Related work
We provide a background study of the methods addressing the view angle, clothing, and speed factors in gait recognition. In addition, some subspace-based techniques related to our work are also reviewed in this section.
2.1 Gait recognition under various viewing angles
Appearance change due to varying view angles is one of the greatest challenges in gait analysis. Studies show that single-view gait recognition performance drops when the view angle changes [16, 17]. Current approaches to gait recognition under various viewing angles fall into one of three major categories: (1) extraction of view-invariant gait features, (2) generation of three-dimensional (3D) gait information, and (3) learning projection or mapping functions to transform gait features from various views into a common feature space.
The first approach attempts to find gait features that are invariant to view changes. Jean et al. [18] introduced body part trajectories as the view-invariant feature. The 2D trajectories of the feet and head were normalized to make them appear as if they were always seen from a fronto-parallel viewpoint. Kale et al. [19] proposed a method to synthesize the lateral view from an arbitrary view through perspective projection in the sagittal plane. Recently, Goffredo et al. [20] derived gait features based on estimated joint positions; a reconstruction method was employed to normalize the gait features from different viewpoints into the side plane. The methods in this first category can only work with a limited range of view angles, and their accuracy can be affected by self-occlusion.
The methods in the second category integrate 3D information from multiple cameras to construct a gait model. An image-based rendering method was employed by Bodor et al. [21] to reconstruct the 3D view of the subject from a blend of different views. Zhao et al. [22] used video sequences acquired by multiple cameras to set up a 3D human model; matching of the 3D models was performed using a linear time normalization technique. Yamauchi et al. [23] captured the body data using a high-resolution projector-camera system and were able to obtain fairly accurate reconstructed synthetic human poses. The methods in the second category can provide reliable performance. However, these 3D analysis methods require a complicated setup with a calibrated multi-camera system, and their heavy computational demands make them unsuitable for practical applications.
The methods in the third category learn a mapping/projection function to normalize the gait features obtained from various viewpoints into a shared feature space. Makihara et al. [24] extracted a frequency-domain gait feature using Fourier analysis; a view transformation model (VTM) was then used to learn a mapping function for the gait features obtained from different views. Other variations based on VTMs have also been introduced [25–27]. Studies that utilize VTMs [24–27] assume that the feature matrix in the training set can be completely decomposed into view-independent and subject-independent submatrices without overlapping elements. However, the view angle may sometimes be difficult to obtain a priori.
In [28], the correlation of gait sequences from different views was modeled using canonical correlation analysis (CCA). The CCA strengths were directly used to match two gait sequences. Lee and Elgammal [29] presented a multilinear generative model using higher-order singular value decomposition, from which view factors, body configuration factors, and gait-style factors could be obtained. The methods in the third category generate more stable gait features and are less sensitive to noise than the methods in the first category. Furthermore, they deploy a simpler camera setup than those in the second category.
2.2 Gait recognition with clothing and carrying conditions
Clothing is another challenging factor for gait recognition. The appearance of a person changes when the person wears different types of clothes. Moreover, a recent study [30] shows that gait spoofing is possible by imitating the clothing of a person of similar build. These observations imply that the clothing factor yields high intra-class variation and low inter-class variation, which makes personal identification difficult.
Hossain et al. [31] addressed the clothing factor in gait recognition by adaptively assigning weights to different body parts based on how much each area is affected by clothing variation. For example, the head is usually affected if a person wears a hat, while the legs are affected if the person wears a long skirt. The algorithm assigns less weight to the head when the person wears a hat and, similarly, less weight to the legs when the person wears a long skirt. This method thus reduces the influence of clothing through an adaptive weight tuning mechanism. However, it makes strong assumptions about the types of clothing the person wears (the clothing types must be known beforehand), which makes it impractical in real-life applications.
Another study [32] approached the clothing factor using a random subspace method. Multiple subspaces were randomly formed using the coefficients generated by 2DPCA. Promising results were obtained as the method combined the evidence from multiple subspaces, each providing different information about the clothing aspects during classification.
There is also a group of researchers who introduced the use of gait energy image (GEI) with sway alignment [33] to overcome the clothing and carrying effects. Instead of taking the whole body to generate GEIs, only the area below the knee was used. The authors claimed that their method produced better accuracy as they believed that the lower part of the body was usually unaffected by the clothing and carrying conditions. Nevertheless, this method easily fails when the person's leg is obscured (e.g., the person wears a long skirt or carries a briefcase).
2.3 Gait recognition across various walking speeds
The approaches to the speed factor in gait recognition bear some resemblance to the methods addressing viewpoint variation. There are two general approaches that deal with gait at varying speeds: (1) learning mapping functions to transform the gait features from various speeds into a common walking speed and (2) extraction of speed-invariant gait features. In the first approach, Tanawongsuwan and Bobick [34] proposed a stride normalization technique to transform the gait features across various speeds into a common walking speed. On the other hand, Tsuji et al. [35] viewed cross-speed gait recognition as a problem similar to cross-view gait recognition and applied the VTM technique [24] to transform gaits from different speeds to a common speed for recognition.
In the second approach, Kusakunniran et al. [36] showed that Procrustes shape analysis could tolerate gait changes due to speed differences. They extended the technique to a higher-order shape configuration that better represents the gait signature across speeds, and further introduced a differential composition model to assign different weights to different parts of the shape boundary to cope with large changes in walking speed. Liu and Sarkar [37] proposed a population hidden Markov model to normalize the gait features based on a generic walking model. The proposed model, when combined with linear discriminant analysis, could distinguish the shapes of different subjects while suppressing the differences of the same subject under various conditions, including speed changes. Tan et al. [38] represented the gait features using eight projective representations; projection from different directions yielded acceptable accuracy for gait recognition across speeds. Recently, Guan and Li [39] deployed the random subspace method [32] to address the cross-speed problem, which also responded well to speed changes.
2.4 Subspace-based approaches
In the computer vision community, the subspace method [40] has been used to represent an image set by a linear subspace spanned by all the images in the set. A number of algorithms have been proposed to measure the distances/similarities among subspaces. Among the many distance/similarity measures, the concept of principal angles [41] between two subspaces has been widely adopted due to its efficiency, accuracy, and robustness. Yamaguchi et al. [42] presented a method called the mutual subspace method (MSM) that directly used the angles between two subspaces as the similarity score of two face image sets. Li et al. [43] further introduced the idea of a weighted subspace distance to more effectively account for the characteristics of the underlying data distribution. This method was adopted by Liu et al. [44] in gait recognition to compare two subspaces comprising gait images captured from different view angles. A nonlinear extension of the principal angle method has also been presented in [45, 46].
Fukui and Yamaguchi proposed a constrained MSM (CMSM) [47] to learn a subspace in which the entire class exhibits small variance. This method greatly outperformed the original MSM. Later, a nonlinear extension of the method using the kernel trick was presented in [48]. The concept of multiple CMSM was proposed in [49] to create multiple constrained subspaces using ensemble learning, with MSM used for classification. Inspired by linear discriminant analysis, Kim et al. [4] developed a technique that minimizes the canonical correlations of between-class sets and maximizes the canonical correlations of within-class sets. This method was shown to perform well in several object recognition problems.
2.5 Motivation and contribution
The subspace-based approach has been shown to be promising in modeling video sequences. Subspaces can accommodate the effect of a wide range of variations and capture the dynamic properties of video sequences. In many video surveillance applications, multiple snapshots of the same subject at different time instances can be obtained for recognition. Similarly, multiple images of the same subject under varying viewpoints are available in video camera networks. Therefore, it is natural to utilize these multiple sets of images instead of a conventional single snapshot in our recognition task.
Clearly, subspace-structured data resides on a nonlinear manifold. The non-Euclidean domain that suits such data is the Grassmann manifold. The Grassmann manifold G(m, D) is the set of m-dimensional linear subspaces of ℝ^{D}. Hence, a set of linear subspaces can be perceived as points on the Grassmann manifold. Most computer vision algorithms are developed for data lying in ℝ^{D}; applying them directly on the nonlinear manifold yields poor accuracy because the underlying geometry of the manifold is ignored. Therefore, this paper aims to generalize algorithms developed for ℝ^{D} to the Grassmann manifold through the use of well-defined Grassmann kernels.
Our primary contributions in this paper are (1) a formulation for modeling gait subspaces on the Grassmann manifold, (2) a framework to integrate supervised and unsupervised GE techniques on the Grassmann manifold, (3) a method to incorporate sparse representation in the learning algorithms, and (4) extensive experiments to corroborate the proposed approach.
A preliminary version of this paper was presented in [50], which explored gait recognition on the Grassmann manifold and laid the groundwork for modeling gait image sets there. A local-based discriminant analysis method called Grassmann locality preserving discriminant analysis was deployed, and encouraging results were reported. In this paper, we provide a more detailed analysis and present a framework that integrates supervised and unsupervised GE methods. In addition, we propose three graph learning mechanisms, namely global, local, and adaptive learning, built around the GE framework, which were not studied in the previous paper.
3 Preliminaries
Brief reviews of the Grassmann manifold and sparse representation are provided in this section. The theory behind the Grassmann kernel helps explain how points on the manifold can be compared, while some background on sparse representation clarifies how adaptive learning is accomplished in this work.
3.1 Grassmann manifold
The geometric properties of the Grassmann manifold have received significant attention, and a good introduction to the topic can be found in [7]. For the image set matching problem, an image set comprising m images, each with D pixels, can be represented as a point on G(m, D). Two points on the Grassmann manifold, which correspond to two image sets, are equivalent if one can be mapped onto the other by an m × m orthogonal matrix [7].
The distance between two subspaces can be measured by the canonical distance, which is the length of the geodesic path connecting two points on the Grassmann manifold. However, it is more computationally efficient to compute the distances between the subspaces using the principal angles [51]. Given two subspaces, P_{1} and P_{2}, or points on the Grassmann manifold, the principal angles are related to the geodesic distance by

d_{G}(P_{1}, P_{2}) = ∥θ∥_{2},     (1)

where θ = [θ_{1}, …, θ_{m}]^{T} denotes the vector of principal angles between span(P_{1}) and span(P_{2}). The principal angles can be conveniently computed using the singular value decomposition

P_{1}^{T}P_{2} = USV^{T},     (2)

where U = [u_{1}…u_{m}], u_{k} ∈ span(P_{1}), V = [v_{1}…v_{m}], v_{k} ∈ span(P_{2}), and S is the diagonal matrix S = diag(cos θ_{1}, …, cos θ_{m}).
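As a concrete illustration (our own NumPy sketch, not code from the paper), the principal angles and the geodesic distance above can be computed directly from the singular values of P_{1}^{T}P_{2}:

```python
import numpy as np

def principal_angles(P1, P2):
    """Principal angles between the subspaces spanned by the columns of
    P1 and P2 (D x m matrices with orthonormal columns). The singular
    values of P1^T P2 are cos(theta_1) >= ... >= cos(theta_m)."""
    s = np.linalg.svd(P1.T @ P2, compute_uv=False)
    s = np.clip(s, -1.0, 1.0)   # guard against round-off leaving [-1, 1]
    return np.arccos(s)

def geodesic_distance(P1, P2):
    """Arc-length distance on the Grassmann manifold: the 2-norm of the
    vector of principal angles (Equation 1)."""
    return np.linalg.norm(principal_angles(P1, P2))
```

For identical subspaces all angles are zero, while two mutually orthogonal subspaces give θ_{k} = π/2 for every k.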
Various distances have been defined based on the principal angles; some well-known ones are the Binet-Cauchy, projection, and Procrustes distances. Among them, the projection distance, Binet-Cauchy distance, and canonical correlation distance (based on the largest principal angle) are induced from positive definite kernels. This means that we can define corresponding kernels on the Grassmann manifold based on these metrics.
In this paper, the projection kernel and canonical correlation kernel are adopted as they are reported to provide good results [12, 14]. Given two points on a Grassmann manifold, X_{i} and X_{j} ∈ ℝ^{D×m}, the similarity between the points is defined as

k_proj(X_{i}, X_{j}) = ∥X_{i}^{T}X_{j}∥_{F}^{2},     (3)

k_cc(X_{i}, X_{j}) = max_{a_{p} ∈ span(X_{i}), b_{p} ∈ span(X_{j})} a_{p}^{T}b_{p},     (4)

subject to {a}_{p}^{T}{a}_{p}={b}_{p}^{T}{b}_{p}=1 and {a}_{p}^{T}{a}_{q}={b}_{p}^{T}{b}_{q}=0, p ≠ q; k_proj denotes the projection kernel while k_cc signifies the canonical correlation kernel.
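Both kernels reduce to simple operations on the m × m matrix X_{i}^{T}X_{j}; a minimal NumPy sketch (the function names are ours):

```python
import numpy as np

def k_proj(Xi, Xj):
    """Projection kernel: squared Frobenius norm of Xi^T Xj (Equation 3)."""
    return np.linalg.norm(Xi.T @ Xj, 'fro') ** 2

def k_cc(Xi, Xj):
    """Canonical correlation kernel (Equation 4): the largest singular
    value of Xi^T Xj, i.e., the cosine of the smallest principal angle."""
    return np.linalg.svd(Xi.T @ Xj, compute_uv=False)[0]
```

For an orthonormal basis X, k_proj(X, X) = m and k_cc(X, X) = 1, which is a quick sanity check on an implementation.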
3.2 Sparse representation
In the past few years, sparse representation (SR) has proven to be a powerful tool in computer vision, computational biology, statistics, pattern recognition, and other applications [32, 52, 53]. Given a signal (the column vector of an image in our case), x_{i} ∈ ℝ^{k}, and an overcomplete dictionary [54] with n bases, X = [x_{1}, x_{2}, …, x_{n}] ∈ ℝ^{k×n} (n > k), the goal of SR is to represent x_{i} using as few entries of X as possible. The objective function can be defined as follows:

min_{S_{i}} ∥S_{i}∥_{0}  subject to  x_{i} = XS_{i},     (5)

where S_{i} denotes the vector of sparse coefficients and ∥∙∥_{0} denotes the l_{0} norm of a vector.
However, it is NP-hard to find the sparsest solution of Equation 5 via l_{0} minimization. As such, l_{1} minimization is often used to solve the problem [54]. In practical applications, the signal x_{i} may be noisy. Therefore, the following optimization model is used to estimate S_{i}:

min_{S_{i}} ∥S_{i}∥_{1}  subject to  ∥x_{i} − XS_{i}∥_{2} ≤ ϵ,     (6)

where ∥∙∥_{1} is the l_{1} norm and ϵ is the error tolerance.
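As an illustrative sketch, the relaxed l_{1} problem can be solved with a basic iterative soft-thresholding (ISTA) loop. Here we minimize the unconstrained Lagrangian form with penalty `lam`, a standard reformulation of the error-tolerant model above; the choice of solver and of `lam` is ours, not the paper's.

```python
import numpy as np

def sparse_code(X, xi, lam=0.1, n_iter=500):
    """Approximate min_s 0.5*||xi - X s||_2^2 + lam*||s||_1 by ISTA.

    X  : (k, n) dictionary whose columns are the atoms
    xi : (k,) signal to be represented
    Returns the sparse coefficient vector s of length n."""
    s = np.zeros(X.shape[1])
    L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the gradient
    for _ in range(n_iter):
        g = X.T @ (X @ s - xi)             # gradient of the quadratic term
        z = s - g / L                      # gradient step
        s = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return s
```

With an orthonormal dictionary, ISTA converges to plain soft thresholding of the coefficients, which makes its behavior easy to verify.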
4 Proposed approach
The detail of the proposed approach is given in this section. The proposed method mainly consists of three stages: GEI construction, Grassmann projection, and GGE. Two types of GGE configurations are introduced: supervised and unsupervised. Three different graph learning mechanisms are further presented for each of the GGE learning modes. The general framework for the proposed approach is depicted in Figure 1.
4.1 Gait energy image
The simple yet effective GEI [55] approach is deployed in this paper. Given a gait sequence {I_{t}(i, j)}_{t=1}^{F}, where I_{t}(i, j) is the pixel at position (i, j) in frame I_{t} and F is the total number of frames in the gait sequence, the GEI is defined as

G(i, j) = (1/F) Σ_{t=1}^{F} I_{t}(i, j).     (7)
One advantage of representing the gait feature using GEI is that we do not need to consider the underlying dynamics of the walking motion. This representation enables us to study the gait sequence from a holistic view by implicitly characterizing the structural statistics of the spatiotemporal patterns of the walking person. The original silhouette images and the resulting GEI images of three subjects are illustrated in Figure 2. We observe that the subjects can be favorably distinguished from the GEI images.
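Computing a GEI from aligned, binarized silhouette frames is a pixel-wise average; a one-line NumPy sketch:

```python
import numpy as np

def gait_energy_image(silhouettes):
    """Average F aligned, binarized silhouette frames (Equation 7).

    silhouettes : iterable of (H, W) arrays with values in {0, 1}
    Returns the (H, W) gait energy image with values in [0, 1]."""
    return np.mean(np.asarray(silhouettes, dtype=float), axis=0)
```

Pixels that stay foreground throughout the gait cycle approach 1, while pixels swept only occasionally by the limbs take intermediate values, which is what encodes the motion statistics.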
4.2 Grassmann projection
The set of GEI images computed from a video sequence is modeled as a collection of linear subspaces. In this way, the undesired variability due to view angle, pose, and appearance changes can be absorbed within subspaces, while the variability of subject identity is emphasized as variability among the subspaces. Most subspace-based learning techniques [4–6] employ an inconsistent mechanism, e.g., feature extraction is performed in Euclidean space while non-Euclidean subspace distances are used. Optimization and convergence are difficult to achieve with such an inconsistent approach [12]. Under the Grassmann framework, feature extraction and distance measurement can be integrated gracefully, resulting in a simpler and more familiar algorithm.
Given sets of GEIs calculated using Equation 7, we compute the SVD over each image set to obtain the corresponding subspaces {X_{1}, X_{2}, …, X_{n}}, where X_{i} ∈ ℝ^{D×m}, D refers to the length of the gait feature, and m signifies the number of basis images spanning each subspace. After that, the Grassmann kernel is applied on these subspaces. To this end, we have tested two types of kernel functions, namely the projection and canonical correlation kernels [12, 14] given in Equations 3 and 4.
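A sketch of this stage, assuming each gait sequence is given as a D × N matrix of vectorized GEIs (the helper names are ours, not the paper's):

```python
import numpy as np

def subspace_basis(gei_set, m):
    """Orthonormal basis (D x m) of the subspace spanned by a set of GEIs.

    gei_set : (D, N) matrix whose columns are vectorized GEIs of one
    sequence. The leading m left singular vectors define a point on G(m, D)."""
    U, _, _ = np.linalg.svd(gei_set, full_matrices=False)
    return U[:, :m]

def gram_matrix(subspaces, kernel):
    """n x n kernel (Gram) matrix over a list of Grassmann points, using
    any Grassmann kernel such as the projection kernel of Equation 3."""
    n = len(subspaces)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = kernel(subspaces[i], subspaces[j])
    return K
```

The resulting Gram matrix plays the role of Y in the graph embedding formulation of the next subsection.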
4.3 Grassmann graph embedding
Grassmann kernels allow us to embed the manifold in a higher-dimensional RKHS to which many Euclidean algorithms can be generalized. Conventional dimension reduction techniques like linear discriminant analysis (LDA), principal component analysis (PCA), and locality preserving projection (LPP) can thus be applied on the Grassmann manifold to further improve recognition accuracy [12–14]. The GE framework [15] has proven to be effective in unifying the various dimension reduction algorithms. Given points from the underlying Grassmann manifold ℳ, the local geometrical structure of ℳ can be modeled by constructing a similarity graph W. Let G = {V, W} denote an undirected weighted graph with vertex set V and similarity matrix W. The values of W can be obtained directly from the output of the Grassmann kernel. The Laplacian matrix L of the graph G is defined as L = D − W, where D is the diagonal matrix with D_{ii} = Σ_{j} W_{ij}.
The task of GE is to determine a low-dimensional representation of the vertex set V that preserves the similarities between vertex pairs in the original high-dimensional space. The solution can be obtained directly using eigenvalue decomposition [15]. In the following, we formulate the GE dimension reduction problem over the Grassmann manifold for the unsupervised and supervised configurations.
The unsupervised GGE approach is suitable for open surveillance systems, such as applications that monitor pedestrians on the streets or customers in shopping malls. It is very difficult, if not impossible, to obtain the subjects' identities in such settings; thus, unsupervised GGE is useful for discerning individuals of unknown identity. In contrast, supervised GGE is appropriate for closed-set identification, such as monitoring employees in a workplace. As the identities of the legitimate subjects are known, supervised GGE can classify the gait data reliably using the identity information.
4.3.1 Unsupervised GGE
We formulate the unsupervised GGE method by first forming the similarity graph W. We want to find a mapping function F: Y_{i} → Z_{i} that maps the points on the Grassmann manifold ℳ to a new manifold ℳ′ while preserving the local geometry of the manifold. In other words, we want a transformation that maps points connected in W as close together as possible. The following objective function realizes this criterion:

min Σ_{i,j} ∥Z_{i} − Z_{j}∥^{2} W_{ij}.     (8)

Objective function (8) incurs a heavy penalty, weighted by W_{ij}, if connected neighbors are mapped far apart in ℳ′. Therefore, minimizing it ensures that Z_{i} and Z_{j} are close whenever Y_{i} and Y_{j} are close.
Suppose U is a projection matrix, Z^{T} = U^{T}Y, that fulfills objective function (8), where Y is the kernel matrix produced by the Grassmann kernel. By simple algebraic manipulation, the objective function can be reduced to

min_{U} tr(U^{T}YLY^{T}U),     (9)

where L = D − W and D is a diagonal matrix given by D_{ii} = Σ_{j} W_{ij}. The optimization problem can be reduced to finding

arg min_{U: U^{T}YDY^{T}U = I} tr(U^{T}YLY^{T}U).     (10)

The projection matrix U that minimizes objective (8) is given by the minimum eigenvalue solutions to the generalized eigenvalue problem

YLY^{T}u = λYDY^{T}u.     (11)
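A compact sketch of the resulting computation, assuming the Grassmann kernel matrix K (the Y above) and the graph W are given; the small ridge term and the use of `scipy.linalg.eigh` for the generalized symmetric eigenproblem are our implementation choices:

```python
import numpy as np
from scipy.linalg import eigh

def unsupervised_gge(K, W, n_dim):
    """Unsupervised GGE projection: solve K L K a = lambda K D K a and keep
    the eigenvectors with the smallest eigenvalues (K plays the role of Y)."""
    D = np.diag(W.sum(axis=1))
    L = D - W                                  # graph Laplacian
    A = K @ L @ K
    B = K @ D @ K + 1e-8 * np.eye(K.shape[0])  # ridge for numerical stability
    _, vecs = eigh(A, B)                       # eigenvalues in ascending order
    return vecs[:, :n_dim]
```

New samples are embedded by evaluating their kernel vector against the training set and projecting with the returned directions.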
4.3.2 Supervised GGE
The unsupervised GGE method can be extended to a supervised version by constructing two similarity graphs, W_{w,ij} and W_{b,ij}, which denote the within-class and between-class similarity matrices, respectively. The extension is desirable as we can take advantage of the class label information to improve classification accuracy. The mapping function for supervised GGE differs slightly from its unsupervised counterpart. The new mapping function F′: Y_{i} → Z′_{i} is formed such that the connected points of the within-class similarity matrix, W_{w,ij}, stay as close as possible while the connected points of the between-class similarity matrix, W_{b,ij}, stay as distant as possible. The class label information is used to discover the discriminant structure of the samples. The objective functions for supervised GGE are defined as follows:

min Σ_{i,j} ∥Z′_{i} − Z′_{j}∥^{2} W_{w,ij},     (12)

max Σ_{i,j} ∥Z′_{i} − Z′_{j}∥^{2} W_{b,ij}.     (13)

Objective function (12) incurs a heavy penalty if neighboring points Z′_{i} and Z′_{j} are mapped far apart while they actually belong to the same class. Likewise, objective function (13) incurs a heavy penalty if neighboring points Z′_{i} and Z′_{j} are mapped close together while they belong to different classes.
Suppose U is a projection matrix, Z′^{T} = U^{T}Y, that realizes objective functions (12) and (13). By simple algebraic manipulation, objective function (12) can be reduced to

min_{U} tr(U^{T}YL_{w}Y^{T}U),     (14)

where L_{w} = D_{w} − W_{w} and D_{w} is a diagonal matrix given by D_{w,ii} = Σ_{j} W_{w,ij}. Similarly, objective function (13) can be condensed to

max_{U} tr(U^{T}YL_{b}Y^{T}U),     (15)

where L_{b} = D_{b} − W_{b} and D_{b} is a diagonal matrix obtained through D_{b,ii} = Σ_{j} W_{b,ij}. The two criteria can be combined into the following optimization problem:

arg min_{U} tr(U^{T}YL_{w}Y^{T}U) / tr(U^{T}YL_{b}Y^{T}U).     (16)

The projection matrix that minimizes Equation 16 can be obtained by solving the generalized eigenvalue problem

YL_{w}Y^{T}u = λYL_{b}Y^{T}u,     (17)

retaining the eigenvectors associated with the smallest eigenvalues.
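The supervised variant follows the same template with the two Laplacians; again the ridge term and the SciPy solver are our own choices for a minimal sketch:

```python
import numpy as np
from scipy.linalg import eigh

def supervised_gge(K, Ww, Wb, n_dim):
    """Supervised GGE projection: solve K L_w K a = lambda K L_b K a and
    keep the smallest-eigenvalue eigenvectors, minimizing the within-class
    spread relative to the between-class spread (K plays the role of Y)."""
    Lw = np.diag(Ww.sum(axis=1)) - Ww             # within-class Laplacian
    Lb = np.diag(Wb.sum(axis=1)) - Wb             # between-class Laplacian
    A = K @ Lw @ K
    B = K @ Lb @ K + 1e-8 * np.eye(K.shape[0])    # ridge keeps B positive definite
    _, vecs = eigh(A, B)                          # ascending eigenvalues
    return vecs[:, :n_dim]
```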
The procedure to implement GGE for supervised and unsupervised configurations is summarized in Algorithm 1.
4.3.3 Constructing the similarity graphs
Graph relations play a crucial role in the GE framework to determine how the methods behave based on the connectivity and weight assignment of the neighboring points in the data. We present three approaches for graph construction: global, local, and adaptive. The first approach constructs fully connected graphs where all nodes are connected using predefined weights. The representative methods for this approach are PCA and LDA for unsupervised and supervised configurations, respectively.
The second approach takes the neighborhood information into consideration, where only the k neighboring nodes are connected in the graph. If k = N, where N is the total number of samples, the local approach reduces to the global approach. Popular methods of this kind are LPP and locality preserving discriminant analysis [56] for the unsupervised and supervised modes, respectively.
The third approach adaptively assigns weights to the nodes based on how the rest of the samples contribute to the sparse representation of the nodes. This is an unconventional approach for graph construction, and the detail of constructing the adaptive graph is given in the subsequent section.
Weight assignment for the global approach is straightforward: all nodes in the graph are connected with equal weights. For the unsupervised mode, the simplest graph structure is to set W_{ij} = 1. Another way to form the similarity graph is via the heat kernel W_{ij} = exp(−∥x̂_{i} − x̂_{j}∥^{2}/t) [15], where t is an adjustable constant. In contrast to the unsupervised mode, two graphs are constructed in the supervised mode. Weights are assigned to the within-class similarity graph, W_{w,ij}, if two nodes share the same class label, and 0 otherwise. Similarly, weights are assigned to the between-class similarity graph, W_{b,ij}, if two nodes are not from the same class, and 0 otherwise.
For the local approach, the simplest graph structure is the simple-minded graph where W_{ij} is set to 1 if x̂_{i} is among the k nearest neighbors of x̂_{j}, and 0 otherwise. The 0/1 weight can also be replaced by the heat kernel. The supervised method additionally takes the class information into account and sets W_{w,ij} = 1 if x̂_{i} is among the k nearest neighbors of x̂_{j} in the same class, and 0 otherwise. In a similar manner, the between-class similarity graph assigns W_{b,ij} = 1 if x̂_{i} is among the k nearest neighbors of x̂_{j} in different classes, and 0 otherwise.
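The local graphs above can be built as follows (a NumPy sketch; the symmetrization step is our convention for producing an undirected graph):

```python
import numpy as np

def knn_graph(X, k, t=None):
    """Local similarity graph: connect each sample to its k nearest neighbors.

    X : (n, d) row-wise samples. With t=None a 0/1 ("simple-minded") graph
    is built; otherwise heat-kernel weights exp(-||xi - xj||^2 / t) are used."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise sq. distances
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]                # skip self at index 0
        W[i, nbrs] = 1.0 if t is None else np.exp(-d2[i, nbrs] / t)
    return np.maximum(W, W.T)                            # symmetrize
```

The supervised graphs W_{w} and W_{b} follow by restricting the neighbor search to samples with the same or different labels, respectively.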
We also propose a self-adaptive graph structure. Suppose S(i, j) is the sparse output estimated by Equation 6 using the column vectors of X̂ (the output of the Grassmann kernel); the similarity graph for the unsupervised self-adaptive method is defined as W_{ij} = S(i, j). For the supervised method, the within-class similarity graph is defined as W_{w,ij} = S_{w}(i, j), where S_{w} is the output from Equation 6 subject to x̂_{i} ∈ N_{w}(x̂_{j}) or x̂_{j} ∈ N_{w}(x̂_{i}), and N_{w}(x̂_{j}) is the set of k neighbors sharing the same label as x̂_{j}. The between-class similarity graph is characterized by W_{b,ij} = S_{b}(i, j), where S_{b} is the output from Equation 6 subject to x̂_{i} ∈ N_{b}(x̂_{j}) or x̂_{j} ∈ N_{b}(x̂_{i}), and N_{b}(x̂_{i}) is the set of k neighbors having different labels. This is the basic approach to construct an adaptive graph, where a single dictionary is learnt for all classes. Since the dictionary is learnt only once, some computational burden is saved.
A number of variations can be derived from this basic idea. For example, a class-specific dictionary can be learnt where each class is modeled independently of the others. W_{w} can be modeled from the SR output using the column vectors of {\widehat{\mathbf{X}}}_{w}, where {\widehat{\mathbf{X}}}_{w} is the Grassmann output sharing the same labels as the test sample. W_{b} can likewise be constructed from the SR output using the column vectors of {\widehat{\mathbf{X}}}_{b}, where {\widehat{\mathbf{X}}}_{b} is the Grassmann output having different labels from the test sample. This approach enables the learnt dictionary to represent each class efficiently. However, dictionary learning has to be performed multiple times for the different classes.
If one wishes to uncover only the semantic information in the between-class similarity graph (perhaps because not much interesting information can be revealed in the sparse within-class similarity graph, as large values are expected for nodes coming from the same class), the between-class similarity graph can be generated using the sparse approach while the within-class similarity graph is constructed using the simple-minded or heat kernel functions. This combination benefits from the flexibility of the sparse graph and the low computational cost of the fully connected graph. Table 1 summarizes the different graph construction methods for GGE.
5 Experiments
Two databases were used to evaluate the proposed method, namely the Chinese Academy of Sciences, Institute of Automation (CASIA) gait database: dataset B [57] and the Osaka University, Institute of Scientific and Industrial Research (OU-ISIR) gait database: datasets A and B [58]. The CASIA gait database is well suited for assessing the effect of view variation on gait as it contains a large number of subjects captured from different viewing angles. It consists of 124 subjects captured from 11 different angles. The viewing angles range from 0° to 180°, separated by an interval of 18°. There are ten walking sequences for each subject: six samples of subjects walking under normal conditions, two samples of subjects walking with coats, and two samples of subjects carrying bags. Therefore, there are altogether 13,640 (10 × 11 × 124) gait sequences in the database. All the images were cropped and normalized to 120 × 120 pixels.
The OU-ISIR gait database is suitable for assessing the influence of speed changes and clothing variations on gait. The OU-ISIR gait database: dataset A contains 35 subjects captured from the side view with speed variation from 2 to 7 km/h, at an interval of 1 km/h. There are two walking sequences for each speed level. Thus, there are 420 (2 × 6 × 35) gait sequences in this dataset. On the other hand, dataset B is made up of 68 subjects acquired from the side view with clothing variations. There are many clothing combinations in this dataset, which include pants, half shirt, rain coat, skirt, and cap. All the images for the OU-ISIR database were cropped and resized to 128 × 88 pixels.
5.1 Experimental results
5.1.1 Evaluation on view variations
The CASIA gait database was used to verify the performance of the proposed method under view changes. All six gait sequences under the normal walking condition were used. For clear indication, each of the viewing angles {0°, 18°, …, 180°} was labeled as θ = {1, 2, …, 11}. We formulated three cases to evaluate the proposed method against viewpoint changes, simulating realistic scenarios where the multiple views could have been acquired from fairly different viewpoints:

1.
Same view setting, θ _{ test } = θ _{ train }. In this setting, all the viewpoints used in the training and testing sets were the same, e.g., θ _{train} = {1, …, 11} and θ _{test} = {1, …, 11}.

2.
Mixed view setting, θ _{test} ∩ θ _{train} ≠ ∅ but θ _{test} ≠ θ _{train}. In this setting, we made it more challenging in that not all the poses in the testing sets were available for training, e.g., θ _{train} = {2, 3, 4, 6, 8} and θ _{test} = {2, 4, 6, 7, 9}.

3.
Different view setting, θ _{test} ∩ θ _{train} = ∅. This is a difficult case where the testing set contains views that are totally different from those in the training set, e.g., θ _{train} = {2, 4, 6, 9} and θ _{test} = {1, 3, 5, 8}. We further included more challenging scenarios to test how well the proposed method generalizes to unseen viewpoints, e.g., θ _{train} = {1, 2, 3, 4} and θ _{test} = {7, 8, 9, 10}. This is an interesting experiment to see how well the proposed method performs in extrapolating view angles beyond the known view angles. The previous setting, where the estimation of view angles is within the range of known view angles (e.g., θ _{train} = {2, 4, 6, 9} and θ _{test} = {1, 3, 5, 8}), can be seen as an interpolation case.
The following setup was deployed to run the experiment. We randomly selected four gait sequences from each subject to form the training set, and the remaining sequences were for the testing set. The selected view angles for the different settings were modeled as the subspace for each sample in the training and testing sets. We then computed the similarity score for every pair of training-testing matches. The random division of the gait sequences into training and testing sets was repeated several times, and the average result was recorded. The k-nearest neighbor method was used to measure the similarity score between the training and testing sets.
For SR dictionary learning, we deployed the l_1-regularized least squares problem solver distributed by Boyd's research group [59]. The algorithms are sensitive to several parameters listed in Table 2. The values or ranges of values that generally yield good performance based on empirical tests are also given in Table 2. The results reported in this paper were obtained based on the best possible combination of the parameters. The rank-1 recognition rate was used as the performance indicator. A correct match was counted when the sample in the testing set was the best match (top one) from the training set.
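The rank-1 evaluation protocol can be sketched as follows (an illustrative snippet, assuming precomputed feature vectors and Euclidean dissimilarity in place of the kernel-based similarity score; the function name is our own):

```python
import numpy as np

def rank1_rate(train_feats, train_labels, test_feats, test_labels):
    """Rank-1 recognition rate: a test sample counts as correct when its
    single closest (top-one) training sample carries the same label."""
    # Squared distances between every test sample and every training sample.
    d2 = np.sum((test_feats[:, None, :] - train_feats[None, :, :])**2, axis=-1)
    pred = train_labels[np.argmin(d2, axis=1)]   # 1-NN decision per test sample
    return float(np.mean(pred == test_labels))
```

With a kernel-based similarity, `argmin` over dissimilarities would simply become `argmax` over similarity scores.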
The experimental results for evaluating the changes in view angles are shown in Table 3. The canonical-correlation kernel is denoted as 'CC', while the projection kernel is termed 'Proj'. The prefixes 'SM', 'HK', and 'SR' are the abbreviations for simple-minded (the binary graph), heat kernel function, and sparse representation, respectively. We included a comparison with the multi-view subspace representation (MSR) method. On top of that, we also added the classical score-level fusion method to benchmark the algorithm. The scores from the different view angles were fused together using the minimum dissimilarity selection rule [60].
When all of the viewing angles are used to train the system, 100% accuracy could be achieved for all the methods except score-level fusion and MSR. This good result is not surprising because GGE captures the variations in viewpoint when recognition is performed. In the mixed view settings, promising results close to 100% accuracy are obtained. This is encouraging, as the proposed methods are shown to possess cross-view capability. The performance is still favorable in the different view settings when the viewpoints in the testing set are close to those in the training set. However, the accuracy drops when the viewpoints in the testing set are far apart from the training set (e.g., in the case of TR 1, 2, 3, 4; TT 7, 8, 9, 10). This drop is expected, as extrapolating to unseen views that are very different from the existing views is a challenging problem. Based on the results shown in Table 3, we notice that the proposed method using the CC kernel consistently yields good results. The effectiveness of SR has also been verified in the experiments.
5.1.2 Variants of SR analysis
In this experiment, we evaluate the different variants derived from the SR similarity graph construction approach described in Section 4.3.3. Table 4 displays the performance of the different graph combination methods. Adaptive supervised GGE with the CC kernel was applied in this experiment. SR variation 1 refers to the basic adaptive graph construction approach which learns a shared dictionary for all classes. This method constructs the within-class and between-class similarity graphs using W_{w,ij} = S_{w}(i, j) and W_{b,ij} = S_{b}(i, j). The l_1-minimization algorithm given in Equation 6 is run once per test sample, and the output is split into W_{w,ij} and W_{b,ij} based on the class labels.
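The label-based split used by SR variation 1 amounts to masking one shared sparse code matrix, roughly as in this sketch (an assumption-level illustration, not the authors' code):

```python
import numpy as np

def split_sparse_graph(S, labels):
    """SR variation 1: the sparse code matrix S, obtained from a single
    shared dictionary, is split into the within-class graph (entries whose
    row and column carry the same label) and the between-class graph
    (entries whose labels differ)."""
    same = labels[:, None] == labels[None, :]
    return S * same, S * ~same     # W_w, W_b
```

Because the expensive l_1-minimization runs only once, this variant is cheaper than recoding each test sample against class-specific dictionaries.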
On the contrary, SR variation 2 learns a class-specific dictionary. It builds the within-class similarity graph using {W}_{w,ij}={\widehat{S}}_{w}\left(i,j\right), where {\widehat{S}}_{w}\left(i,j\right) is the result of running l_1-minimization on the Grassmann output sharing the same class labels as the test sample. The between-class similarity graph is constructed using {W}_{b,ij}={\widehat{S}}_{b}\left(i,j\right), where {\widehat{S}}_{b}\left(i,j\right) is the result of running l_1-minimization on the Grassmann output having different class labels from the test sample. In this respect, the l_1-minimization algorithm was run twice per test sample: first using the training data from the same class to construct W_{w,ij}, and then using the training data from the other classes to construct W_{b,ij}.
As for the SM + SR method, the simple-minded function was used to generate the within-class similarity graph, W_{w,ij}, while SR was used to build the between-class similarity graph, W_{b,ij}. Likewise, the heat kernel function and SR were used to generate W_{w,ij} and W_{b,ij}, respectively, for the HK + SR method.
We observe that the performances of the different SR variations do not deviate significantly. We use SR variation 1 in the subsequent sections as it gives slightly better results in most situations. Besides, it is less time-consuming than SR variation 2 and, unlike SM + SR and HK + SR, does not require additional parameters such as the number of neighbors k and the constant term t.
5.1.3 Evaluation on clothing and carrying conditions
We conducted experiments to examine the performance of the proposed methods under clothing variations. The main purpose of this experiment is to simulate the condition where suspects captured by surveillance cameras try to disguise themselves by wearing covers like a rain coat or hat. This experiment is also useful for assessing the ability of the proposed method to discriminate individuals who wear loose outfits like baggy pants and skirts, which can obstruct the gait pattern from being observed properly.
The CASIA and OU-ISIR gait databases were used for this evaluation. For the CASIA database, we took four normal gait sequences as the training set and two bag-carrying and two coat-wearing sequences as the testing sets. All 11 viewing angles were applied in the test. Using two types of data for training and testing (one from the normal walking sequences and the other from the carrying/clothing conditions) is a more realistic setting where we need to generalize the unknown carrying/clothing type from the existing dataset. In a real-life scenario, there is no way to predict the type of clothes a person wears or the items he/she carries while walking. As for the OU-ISIR database, six different clothing combinations were tested. Most of the clothing combinations were from types A (e.g., regular pants and parka) to M (e.g., baggy pants and down jacket) [58]. The clothing types were chosen such that we could get the largest possible variations for the test. Only 16 subjects were tested in this experiment because we could only identify 16 corresponding pairs between dataset A, the normal walking sequences, and dataset B, walking with clothing variations. Six sequences from dataset A were used as the training set, while the six sequences in dataset B were used as the testing set.
The results of the tests are shown in Table 5. We find that the variations in clothing alter an individual's appearance and make the problem of gait identification challenging. For example, the images depicted in Figure 3 are taken from the same subject. The images look different when different types of clothing are worn. The experiment suggests that further investigation has to be carried out to study gait recognition with substantial clothing variations. Nevertheless, the methods could handle the carrying condition satisfactorily.
5.1.4 Evaluation on walking speeds
We also conducted experiments to assess the effect of walking speed on gait. This study is of interest because a perpetrator usually walks faster in order to leave the crime scene quickly. The OU-ISIR gait database was used for this evaluation. Using a treatment similar to the view angle evaluation, we labeled the speeds {2, 3, …, 7 km/h} as S = {1, 2, …, 6}. Table 6 shows the results of evaluating speed variations. Some methods achieve 100% accuracy when all the speeds are used. Unlike clothing variations, speed changes do not drastically affect the accuracy of gait identification; the methods tolerate speed variations quite robustly.
5.2 Summary and discussion
The important findings of this work are summarized below:

The Grassmann manifold provides a platform to reduce the subspace-to-subspace matching problem to a point-to-point matching model. This is immensely useful for gait recognition as gait video sequences naturally fall into the subspace learning paradigm (unlike face recognition, which can be carried out using a single image).

The unsupervised and supervised GGE configurations provide different treatments for gait datasets of different natures (labeled and unlabeled). Nevertheless, the two approaches can be unified gracefully under a general formulation.

GGE outperforms the benchmark methods for all cases. The proposed adaptive learning approach, in particular, yields considerable improvement in classification accuracy.

Comparing the different graph construction approaches, the adaptive graph construction method clearly outperforms its counterparts. However, no conclusive remark can be drawn between the global and local methods. The global approach performs better than the local approach under view angle changes, the opposite holds for the clothing and carrying conditions, and almost identical results were obtained for speed variation. As such, we conjecture that the topological structure of the graph has a disparate impact in different scenarios: no single graph structure (global or local) works best for all cases.

Unsupervised GGE surprisingly outperforms its supervised counterpart in a number of scenarios, e.g., gait image sets with varying view angles and different clothing appearances. This may be because the similarity graph W encodes general information about the relationship among the nodes, whereas the within-class and between-class similarity graphs, W_{w} and W_{b}, may overlook some subtle discriminative connections in the graphs. There may also be outliers in the labeled training set, for example, an image of a person wearing a thick hooded sweater that confuses the true appearance of the person, which explains why the supervised method is slightly inferior to the unsupervised method. This counter-intuitive result suggests that it might be better to resort to the unsupervised method when the cost of labeling the data is high, since the class information would not lead to a dramatic improvement in recognition rate.

The canonical-correlation kernel generally performs better than the projection kernel in the changing view scenario. We attribute this to the nature of the canonical-correlation kernel, which is based on the notion of principal angles: it will always find the closest view angle in the training set for comparison. This concept is illustrated in Figure 4. Nevertheless, the projection kernel performs better than the canonical-correlation kernel in the clothing and carrying conditions. This may be because the projection kernel treats the image subspaces in a more holistic manner. However, if the sample size is small, e.g., for the OU-ISIR dataset, the projection kernel has no advantage over the canonical-correlation kernel. The results suggest that the kernels describe different aspects of the subspaces.
6 Conclusions
This paper demonstrates how the gait recognition problem can be formulated on the Grassmann manifold. This formulation enables us to work with higher-order data structures to harness the nonlinear structure of the data and yet benefit from conventional vector-based computation. We present a method comprising unsupervised and supervised learning modes on the Grassmann manifold. We further introduce the concept of an adaptive graph in the learning mechanism to tailor the graph content to the nature of the dataset. Experimental results suggest that the proposed method has potential for practical application as it demonstrates view- and speed-invariant capabilities.
References
Matovski DS, Nixon MS, Mahmoodi S, Carter JN: The effect of time on gait recognition performance. IEEE Trans. Inf. Forensics Secur. 2012, 7(2):543-552.
Zhao W: Face Processing. Burlington: Academic Press; 2005.
Basri R, Jacobs DW: Lambertian reflectance and linear subspaces. IEEE Trans. Pattern Anal. Mach. Intell. 2003, 25(2):218-233. 10.1109/TPAMI.2003.1177153
Kim TK, Kittler J, Cipolla R: Discriminative learning and recognition of image set classes using canonical correlations. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29(6):1005-1018.
Cevikalp H, Triggs B: Face recognition based on image sets. In 2010 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2010. Piscataway: IEEE; 2010:2567-2573.
Wang R, Shan S, Chen X, Gao W: Manifold-manifold distance with application to face recognition based on image set. In 2008 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2008. Piscataway: IEEE; 2008:1-8.
Edelman A, Arias TA, Smith ST: The geometry of algorithms with orthogonality constraints. SIAM J. Matrix Anal. Appl. 1999, 20(2):303-353.
Wikipedia: Hermann Grassmann. Wikimedia Foundation, Inc; 2013. http://en.wikipedia.org/wiki/Hermann_Grassmann. Accessed 13 January 2014
Turaga P, Veeraraghavan A, Srivastava A, Chellappa R: Statistical computations on Grassmann and Stiefel manifolds for image and video-based recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 33(11):2273-2286.
Wu T, Turaga P, Chellappa R: Age estimation and face verification across aging using landmarks. IEEE Trans. Inf. Forensics Secur. 2012, 7(6):1780-1788.
Khan ZH, Gu IYH: Visual tracking and dynamic learning on the Grassmann manifold with inference from a Bayesian framework and state space models. In 2011 18th IEEE International Conference on Image Processing (ICIP). Piscataway: IEEE; 2011:1433-1436.
Hamm J, Lee DD: Grassmann discriminant analysis: a unifying view on subspace-based learning. In Proceedings of the 25th International Conference on Machine Learning. New York: ACM; 2008:376-383.
Wang T, Shi P: Kernel Grassmannian distances and discriminant analysis for face recognition from image sets. Pattern Recogn. Lett. 2009, 30(13):1161-1165. 10.1016/j.patrec.2009.06.002
Harandi MT, Sanderson C, Shirazi S, Lovell BC: Graph embedding discriminant analysis on Grassmannian manifolds for improved image set matching. In 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE; 2011:2705-2712.
Yan S, Xu D, Zhang B, Zhang HJ, Yang Q, Lin S: Graph embedding and extensions: a general framework for dimensionality reduction. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29(1):40-51.
Yu S, Tan D, Tan T: A framework for evaluating the effect of view angle, clothing and carrying condition on gait recognition. 18th International Conference on Pattern Recognition, 2006. ICPR 2006, 4: 441-444.
Kawai R, Makihara Y, Hua C, Iwama H, Yagi Y: Person re-identification using view-dependent score-level fusion of gait and color features. In Proceedings of the 21st International Conference on Pattern Recognition. Piscataway: IEEE; 2012:2694-2697.
Jean F, Bergevin R, Albu AB: Computing and evaluating view-normalized body part trajectories. Image Vis. Comput. 2009, 27(9):1272-1284. 10.1016/j.imavis.2008.11.009
Kale A, Chowdhury AKR, Chellappa R: Towards a view invariant gait recognition algorithm. In Proceedings of IEEE Conference on Advanced Video and Signal Based Surveillance. Piscataway: IEEE; 2003:143-150.
Goffredo M, Bouchrika I, Carter JN, Nixon MS: Self-calibrating view-invariant gait biometrics. IEEE Trans. Syst. Man Cybern. B Cybern. 2010, 40(4):997-1008.
Bodor R, Drenner A, Fehr D, Masoud O, Papanikolopoulos N: View-independent human motion classification using image-based reconstruction. Image Vis. Comput. 2009, 27(8):1194-1206. 10.1016/j.imavis.2008.11.008
Zhao G, Liu G, Li H, Pietikäinen M: 3D gait recognition using multiple cameras. In 7th International Conference on Automatic Face and Gesture Recognition FGR 2006. Piscataway: IEEE; 2006:529-534.
Yamauchi K, Bhanu B, Saito H: Recognition of walking humans in 3D: initial results. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2009. Piscataway: IEEE; 2009:45-52.
Makihara Y, Sagawa R, Mukaigawa Y, Echigo T, Yagi Y: Gait recognition using a view transformation model in the frequency domain. Proceedings of the 9th European Conference on Computer Vision, Graz, Austria, 7-13 May 2006. Lecture Notes in Computer Science, vol. 3953. In Computer Vision - ECCV 2006. Edited by: Leonardis A, Bischof H, Pinz A. Heidelberg: Springer; 2006:151-163.
Kusakunniran W, Wu Q, Li H, Zhang J: Multiple views gait recognition using view transformation model based on optimized gait energy image. In 2009 IEEE 12th International Conference on Computer Vision Workshops (ICCV Workshops). Piscataway: IEEE; 2009:1058-1064.
Kusakunniran W, Wu Q, Zhang J, Li H: Support vector regression for multi-view gait recognition based on local motion feature selection. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE; 2010:974-981.
Kusakunniran W, Wu Q, Zhang J, Li H: Cross-view and multi-view gait recognitions based on view transformation model using multi-layer perceptron. Pattern Recogn. Lett. 2012, 33(7):882-889. 10.1016/j.patrec.2011.04.014
Bashir K, Xiang T, Gong S: Cross-view gait recognition using correlation strength. In Proceedings of the British Machine Vision Conference. Edited by: Labrosse F, Zwiggelaar R, Liu Y, Tiddeman B. Guildford: BMVA Press; 2010:109.1-109.11.
Lee CS, Elgammal A: Towards scalable view-invariant gait recognition: multilinear analysis for gait. In Proceedings of the 5th International Conference on Audio- and Video-Based Biometric Person Authentication. Heidelberg: Springer; 2005:395-405.
Hadid A, Ghahramani M, Kellokumpu V, Pietikainen M, Bustard J, Nixon M: Can gait biometrics be spoofed? In 2012 21st International Conference on Pattern Recognition (ICPR). Piscataway: IEEE; 2012:3280-3283.
Altab Hossain M, Makihara Y, Wang J, Yagi Y: Clothing-invariant gait identification using part-based clothing categorization and adaptive weight control. Pattern Recogn. 2010, 43(6):2281-2291. 10.1016/j.patcog.2009.12.020
Wright J, Yang AY, Ganesh A, Sastry SS, Ma Y: Robust face recognition via sparse representation. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 31(2):210-227.
Singh S, Biswas KK: Biometric gait recognition with carrying and clothing variants. In Pattern Recognition and Machine Intelligence. Edited by: Chaudhury S, Mitra S, Murthy CA, Sastry PS, Pal SK. Heidelberg: Springer; 2009:446-451.
Tanawongsuwan R, Bobick A: Modelling the effects of walking speed on appearance-based gait recognition. Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. 2: II-783-II-790.
Tsuji A, Makihara Y, Yagi Y: Silhouette transformation based on walking speed for gait identification. In 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE; 2010:717-722.
Kusakunniran W, Wu Q, Zhang J, Li H: Gait recognition across various walking speeds using higher order shape configuration based on a differential composition model. IEEE Trans. Syst. Man Cybern. B Cybern. 2012, 42(6):1654-1668.
Liu Z, Sarkar S: Improved gait recognition by gait dynamics normalization. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28(6):863-876.
Tan D, Huang K, Yu S, Tan T: Uniprojective features for gait recognition. In Advances in Biometrics. Edited by: Lee SW, Li SZ. Heidelberg: Springer; 2007:673-682.
Guan Y, Chang-Tsun L: A robust speed-invariant gait recognition system for walker and runner identification. In 2013 International Conference on Biometrics (ICB). Piscataway: IEEE; 2013:1-8.
Oja E: Subspace Methods of Pattern Recognition. Hertfordshire: Research Studies Press; 1983.
Hotelling H: Relations between two sets of variates. Biometrika 1936, 28: 321-377. 10.1093/biomet/28.3-4.321
Yamaguchi O, Fukui K, Maeda K: Face recognition using temporal image sequence. In Proceedings of Third IEEE International Conference on Automatic Face and Gesture Recognition, 1998. Piscataway: IEEE; 1998:318-323.
Li F, Dai Q, Xu W, Er G: Weighted subspace distance and its applications to object recognition and retrieval with image sets. IEEE Signal Process. Lett. 2009, 16(3):227-230.
Liu N, Lu J, Tan YP, Li M: Set-to-set gait recognition across varying views and walking conditions. In 2011 IEEE International Conference on Multimedia and Expo (ICME). Piscataway: IEEE; 2011:1-6.
Kim TK, Arandjelović O, Cipolla R: Learning over sets using boosted manifold principal angles (BoMPA). In Proceedings of British Machine Vision Conference 2005. Manchester: BMVA Press; 2005:779-788.
Wolf L, Shashua A: Learning over sets using kernel principal angles. J. Mach. Learn. Res. 2003, 4: 913-931.
Fukui K, Yamaguchi O: Face recognition using multi-viewpoint patterns for robot vision. In Robotics Research. Edited by: Dario P, Chatila R. Heidelberg: Springer; 2005:192-201.
Fukui K, Stenger B, Yamaguchi O: A framework for 3D object recognition using the kernel constrained mutual subspace method. In Computer Vision - ACCV 2006. Edited by: Narayanan PJ, Nayar SK, Shum HY. Heidelberg: Springer; 2006:315-324.
Nishiyama M, Yamaguchi O, Fukui K: Face recognition with the multiple constrained mutual subspace method. In Audio- and Video-Based Biometric Person Authentication. Edited by: Kanade T, Jain A, Ratha NK. Heidelberg: Springer; 2005:71-80.
Connie T, Michael GKO, Jin ATB: Grassmannian locality preserving discriminant analysis to view invariant gait recognition with image sets. In Proceedings of the 27th Conference on Image and Vision Computing New Zealand. New York: ACM; 2012:400-405.
Golub GH, Van Loan CF: Matrix Computations (Johns Hopkins Studies in the Mathematical Sciences). 3rd edition. Baltimore: The Johns Hopkins University Press; 1996.
Yang J, Wright J, Huang TS, Ma Y: Image super-resolution via sparse representation. IEEE Trans. Image Process. 2010, 19(11):2861-2873.
Abhari K, Marsousi M, Babyn P, Alirezaie J: Medical image denoising using low pass filtering in sparse domain. In 2012 Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). Piscataway: IEEE; 2012:114-117.
Murray JF, Kreutz-Delgado K: Visual recognition and inference using dynamic overcomplete sparse learning. Neural Comput. 2007, 19(9):2301-2352. 10.1162/neco.2007.19.9.2301
Han J, Bhanu B: Individual recognition using gait energy image. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28(2):316-322.
Yang L, Gong W, Gu X, Li W, Liu Y: Bagging null space locality preserving discriminant classifiers for face recognition. Pattern Recogn. 2009, 42(9):1853-1858. 10.1016/j.patcog.2008.10.014
CASIA Gait Database. Beijing: Center for Biometrics and Security Research; 2005. http://www.cbsr.ia.ac.cn/english/Gait%20Databases.asp. Accessed 13 January 2014
Makihara Y, Mannami H, Tsuji A, Hossain MA, Sugiura K, Mori A, Yagi Y: The OU-ISIR gait database comprising the treadmill dataset. IPSJ Trans. Comput. Vis. Appl. 2012, 4: 53-62.
Koh K, Kim SJ, Boyd S: l1_ls: simple Matlab solver for l1-regularized least squares problems. 2008. http://www.stanford.edu/~boyd/l1_ls/. Accessed 13 January 2014
Jain A, Nandakumar K, Ross A: Score normalization in multimodal biometric systems. Pattern Recogn. 2005, 38(12):2270-2285. 10.1016/j.patcog.2005.01.012
Acknowledgements
This research was partially supported by the Ministry of Science, Technology and Innovation (MOSTI), Malaysia, and the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT and Future Planning (2013006574).
Additional information
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Connie, T., Goh, M.K.O. & Teoh, A.B.J. A Grassmann graph embedding framework for gait analysis. EURASIP J. Adv. Signal Process. 2014, 15 (2014). https://doi.org/10.1186/1687-6180-2014-15
DOI: https://doi.org/10.1186/1687-6180-2014-15
Keywords
 Sparse Representation
 View Angle
 Similarity Graph
 Grassmann Manifold
 Locality Preserving Projection