Automated target tracking and recognition using coupled view and identity manifolds for shape representation
 Vijay Venkataraman^{1},
 Guoliang Fan^{1},
 Liangjiang Yu^{1},
 Xin Zhang^{2},
 Weiguang Liu^{3} and
 Joseph P Havlicek^{4}
https://doi.org/10.1186/1687-6180-2011-124
© Venkataraman et al; licensee Springer. 2011
Received: 31 May 2011
Accepted: 7 December 2011
Published: 7 December 2011
Abstract
We propose a new couplet of identity and view manifolds for multiview shape modeling that is applied to automated target tracking and recognition (ATR). The identity manifold captures both interclass and intraclass variability of target shapes, while a hemispherical view manifold is involved to account for the variability of viewpoints. Combining these two manifolds via a nonlinear tensor decomposition gives rise to a new target generative model that can be learned from a small training set. Not only can this model deal with arbitrary view/pose variations by traveling along the view manifold, it can also interpolate the shape of an unknown target along the identity manifold. The proposed model is tested against the recently released SENSIAC ATR database and the experimental results validate its efficacy both qualitatively and quantitatively.
Keywords
tracking and recognition; shape representation; shape interpolation; manifold learning
1 Introduction
Automated target tracking and recognition (ATR) is an important capability in many military and civilian applications. In this work, we mainly focus on tracking and recognition techniques for infrared (IR) imagery, which is a preferred imaging modality for most military applications. A major challenge in vision-based ATR is how to cope with the variations of target appearances due to different viewpoints and underlying 3D structures. Both factors, identity in particular, are usually represented by discrete variables in existing practical ATR algorithms [1–3]. In this paper, we account for both factors in a continuous manner by using view and identity manifolds. Coupling the two manifolds for target representation facilitates the ATR process by allowing us to meaningfully synthesize new target appearances, both for previously unknown targets and for known or unknown targets under previously unseen viewpoints.
Common IR target representations are nonparametric in nature, including templates [1], histograms [4], edge features [5], etc. In [5], the target is represented by intensity and shape features and a self-organizing map is used for classification. Histogram-based representations were shown to be simple yet robust under difficult tracking conditions [4, 6], but such representations cannot effectively discriminate among different target types due to the lack of higher-order structure. In [7], the shape variability due to different structures and poses is characterized explicitly using a deformable and parametric model that must be optimized for localization and recognition. This method requires high-resolution images where salient edges of a target can be detected, and may not be appropriate for ATR in practical IR imagery. On the other hand, some ATR approaches [1, 8, 9] depend on the use of multiview exemplar templates to train a classifier. Such methods normally require a dense set of training views for successful ATR tasks and they are often limited in dealing with unknown targets.
The remainder of this paper is organized as follows. In Section 2, we review some related work in the area of 3D object representation. In Section 3, we present our generative model where the identity and view manifolds are discussed in detail. In Section 4, we discuss the implementation of the particle filter-based inference algorithm that incorporates the proposed target model for ATR tasks. In Section 5, we report experimental results of target tracking and recognition on both IR sequences from the SENSIAC dataset and some visible-band video sequences, and we also discuss the limitations and possible extensions of the proposed generative model. Finally, we present our conclusions in Section 6.
2 Related Work
This section begins with a review of different ways to represent a 3D object and the reasons for our choice of a multiview silhouette-based method. Then we focus on several existing shape representation methods by examining their ability to parameterize shape variations, their ability to interpolate, and the ease of parameter estimation.
There are two commonly used approaches to represent 3D rigid objects. The first approach uses a set of representative 2D snapshots [11, 12] captured from multiple viewpoints. These snapshots may be represented in the form of simple shape silhouettes, contours, or complex features such as SIFT, HOG, or image patches. The second approach involves an explicit 3D object model [13] where common representations vary from simple polyhedrons to complex 3D meshes. In the first case, unknown views can be interpolated from the given sample set, whereas in the second case, the 3D model is used to match the observed view via 3D-to-2D projection. Accordingly, most object recognition methods can be categorized into one of two groups: those involving 2D multiview images [14–19] and those supported by explicit 3D models [20–23]. There are also hybrid methods [24] that make use of both the 3D shape and 2D appearances/features.
In this work, we choose to represent a target by its representative 2D views for two main reasons. First, this is theoretically supported by the psychophysical evidence presented in [25], which suggests that the human visual system is better described as recognizing objects by 2D view interpolation than by alignment or other methods that rely on object-centered 3D models. Second, it could be cumbersome to store and reference a large collection of detailed 3D models of different target types in a practical ATR system. Moreover, it is worth noting that many robust features (HOG, SIFT) used to represent objects were developed mainly for visible-band images and their use is limited by factors such as image quality and resolution. In IR imagery, the targets are often small and frequently lack sufficient resolution to support robust features. Finally, the IR sensors in the SENSIAC database are static, facilitating target segmentation by background subtraction. Thus the ability to efficiently extract target silhouettes and the simplicity of silhouette-based shape representation motivate us to use the silhouette for multiview target representation.
There are two related issues for shape representation. One is how to effectively represent the shape variation, and the other is how to infer the underlying shape variables, i.e., view and identity. As pointed out in [26], feature vectors obtained from common shape descriptors, such as shape contexts [27] and moment descriptors [28], are usually assumed to lie in a Euclidean space to facilitate shape modeling and recognition. However, in many cases the underlying shape space may be better described by a nonlinear low-dimensional (LD) manifold that can be learned by nonlinear dimensionality reduction (DR) techniques, where the learned manifold structures are often either target-dependent or view-dependent [29]. Another trend is to explore a shape space where every point represents a plausible shape and a curve between two points in this space represents a deformation path between two shapes. Though this method was shown to be successful in applications such as action recognition [26] and shape clustering [30], it is difficult to explicitly separate the identity and view factors during shape deformation, as is necessary in the context of ATR applications.
This brings us to the point of learning the LD embedding of the latent factors, e.g., view and identity, from the high-dimensional (HD) data, e.g., silhouettes. In an early work [31], PCA was used to find two separate eigenspaces for visual learning of 3D objects, one for the identity and one for the pose. The bilinear models [32] and tensor analysis [33] provide a more systematic multifactor representation by decomposing HD data into several independent factors. In [34], the view variable is related to the appearance through shape submanifolds which have to be learned for each object class. All of these methods are limited to a discrete identity variable where each object is associated with a separate view manifold. Our work draws inspiration from [35], where a nonlinear tensor decomposition method is used to learn an identity-independent view manifold for multiview dynamic motion data. A torus manifold that is a product of two circular-shaped manifolds, i.e., the view and pose manifolds, was also proposed in [36, 37] for the same purpose. In [36, 37, 35], the style factor of body shape (i.e., the identity) is a continuous variable defined in a linear space.
Our work presented in this paper is distinct from that in [36, 37, 35] in two main contributions. The first is our couplet of view and identity manifolds for multiview shape modeling: unlike [36, 37, 35] where the identity is treated linearly, for the first time we propose a 1D identity manifold to support a continuous nonlinear identity variable. Also, the view and pose manifolds in [36, 37, 35] have well-defined topologies due to their sequential nature. However, in our IR ATR application the topology of the identity manifold is not clear owing to a lack of understanding of the intrinsic LD structure spanning a diverse set of targets. Finding an appropriate ordering relationship among a set of targets is the key to learning a valid identity manifold for effective shape interpolation. To better support ATR tasks, the view manifold used here involves both the azimuth and elevation angles, compared with the case of a single variable in [36, 37, 35]. The second contribution is the development of a particle filter-based ATR approach that integrates the proposed model for shape interpolation and matching. This new approach supports joint tracking and recognition for both known and unknown targets and achieves superior results compared with traditional template-based methods in both IR and visible-band image sequences.
3 Target Generative Models
3.1 Identity manifold
The identity manifold that plays a central role in our work is intended to capture both interclass and intraclass shape variability among training targets. In particular, the continuous nature of the proposed identity manifold makes it possible to interpolate valid target shapes between known targets in the training data. There are two important questions to be addressed in order to learn an identity manifold with the desired interpolation capability. The first one is which space this identity manifold should span. In other words, should it be learned from the HD silhouette space or an LD latent space? We expect traversal along the identity manifold to result in gradual shape transition and valid shape interpolation between known targets. This would ideally require the identity manifold to span a space that is devoid of all other factors that contribute to the shape variation. Therefore the identity manifold should be learned in an LD latent space with only the identity factor rather than in the HD data space where the view and identity factors are coupled together. The second important question is how to learn a semantically valid identity manifold that supports meaningful shape interpolation for an unknown target. In other words, what kind of constraint should be imposed on the identity manifold to ensure that interpolated shapes correspond to feasible real-world targets? We defer further discussion of the first issue to Section 3.3 and focus here on the second one, which involves the determination of an appropriate topology for the identity manifold.
The topology determines the span of a manifold with respect to its connectivity and dimensionality. In this work, we suggest a 1D closed-loop structure to represent the identity manifold, and there are several important considerations to support this seemingly arbitrary but actually practical choice. First, the learning of a higher-dimensional manifold requires a large set of training samples that may not be available for a specific ATR application where only a relatively small candidate pool of possible targets-of-interest is available. Second, this identity manifold is assumed to be closed rather than open, because all targets in our ATR problem are man-made ground vehicles that share some degree of similarity, so extreme disparity is unlikely. Third, the 1D closed structure would greatly facilitate the inference process for online ATR tasks. As a result, the manifold topology is reduced to a specific ordering relationship of training targets along the 1D closed identity manifold. Ideally, we want targets of the same class or those with similar shapes to stay closer on the identity manifold compared with dissimilar ones. Thus we introduce a class-constrained shortest-closed-path method to find a unique ordering relationship for the training targets. This method requires a view-independent distance or dissimilarity measure between two targets. For example, we could use the shape dissimilarity between two 3D target models that can be approximated by the accumulated mean square errors of multiview silhouettes.
where $\Vert \cdot \Vert$ represents the Euclidean distance and β is a constant. The first term in (2) denotes a view-independent shape similarity measure between targets u and v, as it is averaged over all training views. The second term is a penalty term that encourages targets belonging to the same class to be grouped together. The manifold topology $T^{*}$ defined in (1) tends to group targets of similar 3D shapes and/or the same class together, enforcing the best local semantic smoothness along the identity manifold, which is essential for a valid shape interpolation between target types.
It is worth mentioning that the identity manifold to be learned according to $T^{*}$ will encompass multiple target classes, each of which has several subclasses. For example, we consider six classes of vehicles in this work, each of which includes six subclass types. Although it is easy to understand the feasibility and necessity of shape interpolation within a class to accommodate intraclass variability, the validity of shape interpolation between two different classes may seem less clear. Actually, $T^{*}$ not only defines the ordering relationship within each class but also the neighboring relationship between two different classes. For example, the six classes considered in this paper are ordered as: Armored Personnel Carriers (APCs) → Tanks → Pickup Trucks → Sedan Cars → Minivans → SUVs → APCs. Although APCs may not look like Tanks or SUVs in general, APCs are indeed located between Tanks and SUVs along the identity manifold according to $T^{*}$. This occurs because (1) finds an APC-Tank pair and an APC-SUV pair that have the least shape dissimilarity compared with all other pairs. Thus this ordering still supports sensible interclass shape interpolation, although it may not be as smooth as intraclass interpolation, as will be shown later in the experiments.
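The ordering problem described here can be illustrated with a minimal sketch. The cost below is an assumed form built only from the verbal description of (1) and (2): a view-averaged Euclidean distance between silhouettes plus a constant penalty β when two targets belong to different classes. The function names, the array layout and the exhaustive search are illustrative assumptions, not the authors' implementation, and exhaustive search is practical only for a handful of targets.

```python
from itertools import permutations

import numpy as np


def dissimilarity(sil_u, sil_v, same_class, beta=1.0):
    """View-averaged silhouette distance plus a class penalty (assumed form).

    sil_u, sil_v: arrays of shape (num_views, num_pixels), one row per view.
    """
    shape_term = np.mean(np.linalg.norm(sil_u - sil_v, axis=1))
    penalty = 0.0 if same_class else beta
    return shape_term + penalty


def pairwise_costs(silhouettes, labels, beta=1.0):
    """Symmetric matrix of dissimilarities between all training targets."""
    n = len(silhouettes)
    dist = np.zeros((n, n))
    for u in range(n):
        for v in range(u + 1, n):
            d = dissimilarity(silhouettes[u], silhouettes[v],
                              labels[u] == labels[v], beta)
            dist[u, v] = dist[v, u] = d
    return dist


def shortest_closed_path(dist):
    """Exhaustively search for the cyclic ordering with minimum total cost."""
    n = dist.shape[0]
    best_order, best_cost = None, float("inf")
    # Target 0 is fixed as the start: a closed loop has no preferred origin.
    for perm in permutations(range(1, n)):
        order = (0,) + perm
        cost = sum(dist[order[k], order[(k + 1) % n]] for k in range(n))
        if cost < best_cost:
            best_order, best_cost = order, cost
    return best_order, best_cost
```

With the class penalty in place, orderings that interleave classes pay β on every class change, so the minimum-cost loop tends to keep each class contiguous and to place the most similar cross-class pairs at the class boundaries.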
3.2 Conceptual view manifold
We need a view manifold to accommodate the view-induced shape variability for different targets. A common approach is to use nonlinear DR techniques, such as LLE or Laplacian eigenmaps, to find the LD view manifold for each target type [29]. One main drawback of using identity-dependent view manifolds is that they may lie in different latent spaces and have to be aligned together in the same latent space for general multiview modeling. Therefore, the view manifold here is designed to be a hemisphere that embraces almost all possible viewing angles around a ground vehicle as shown in Figure 1 and is characterized by two parameters: the azimuth and elevation angles Θ = {θ, ϕ}. This conceptual manifold provides a unified and intuitive representation of the view space and supports efficient dynamic view estimation.
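As a small illustration of the hemispherical parameterization, the sketch below maps a view Θ = {θ, ϕ} to a point on a unit hemisphere and measures the geodesic distance between two views. The axis convention, the elevation range [0, π/2] and the function names are assumptions made for this example only.

```python
import numpy as np


def view_to_point(azimuth, elevation):
    """Map (theta, phi), in radians, to a point on the unit upper hemisphere."""
    if not 0.0 <= elevation <= np.pi / 2:
        raise ValueError("elevation must lie in [0, pi/2] for a hemisphere")
    x = np.cos(elevation) * np.cos(azimuth)
    y = np.cos(elevation) * np.sin(azimuth)
    z = np.sin(elevation)  # height above the ground plane
    return np.array([x, y, z])


def view_distance(view_a, view_b):
    """Great-circle distance between two (azimuth, elevation) views."""
    pa = view_to_point(*view_a)
    pb = view_to_point(*view_b)
    # Clip guards against dot products marginally outside [-1, 1].
    return float(np.arccos(np.clip(pa @ pb, -1.0, 1.0)))
```

Because every target shares this one manifold, a distance such as `view_distance` is defined identically for all identities, which is what removes the need to align separate per-target view manifolds.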
3.3 Nonlinear Tensor Decomposition
We extend the nonlinear tensor decomposition in [35] to develop the proposed generative model. The key is to find a view-independent space for learning the identity manifold through the commonly shared conceptual view manifold (the first question raised in Section 3.1).
This equation supports shape interpolation along the view manifold. This is possible due to the interpolation-friendly nature of RBF kernels and the well-defined structure of the view manifold. However, it cannot be said with certainty that any arbitrary vector $\mathit{i}\in \mathrm{span}({\mathit{i}}^{1},\dots ,{\mathit{i}}^{N})$ will result in a valid shape interpolation due to the sparse nature of the training set in terms of the identity variation.
To support meaningful shape interpolation, we constrain the identity space to be a 1D structure that includes only those points on a closed B-spline curve connecting the identity basis vectors $\{{\mathit{i}}^{k}\mid k=1,\dots ,N\}$ according to the manifold topology defined in (1). We refer to this 1D structure as the identity manifold denoted by $\mathcal{M}\subset {\mathbb{R}}^{N}$. Then an arbitrary identity vector $\mathit{i}\in \mathcal{M}$ would be semantically meaningful due to its proximity to the basis vectors, and should support a valid shape interpolation. Although the identity manifold $\mathcal{M}$ has an intrinsic 1D closed-loop structure, it is still defined in the tensor space ${\mathbb{R}}^{N}$. To facilitate the inference process, we introduce an intermediate representation, i.e., a unit circle as an equivalent of $\mathcal{M}$ parameterized by a single variable. First, we map all identity basis vectors $\{{\mathit{i}}^{k}\mid k=1,\dots ,N\}$ onto a set of angles uniformly distributed along a unit circle, $\{{\alpha}_{k}=(k-1)\cdot 2\pi /N\mid k=1,\dots ,N\}$.
where α ∈ [0, 2π) is the identity variable and $\mathit{i}(\alpha )\in \mathcal{M}$ is its corresponding identity vector along the identity manifold in ${\mathbb{R}}^{N}$. Thus (9) defines a generative model for multiview shape modeling that is controlled by two continuous variables α and Θ defined along their own manifolds.
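The mapping from the identity variable α to an identity vector can be sketched as follows. This example places N ordered basis vectors at the angles $(k-1)\cdot 2\pi /N$ and interpolates between them with a closed Catmull-Rom spline, which is swapped in here for the closed B-spline of the text because it is short to write and passes exactly through the basis vectors. The function names and the use of the rows of an identity matrix as stand-in basis vectors are assumptions for illustration.

```python
import numpy as np


def identity_angles(n):
    """Angles alpha_k = (k - 1) * 2*pi / n for k = 1..n (0-based array)."""
    return 2.0 * np.pi * np.arange(n) / n


def identity_vector(alpha, basis):
    """Closed Catmull-Rom interpolation of ordered basis vectors at angle alpha.

    basis: array of shape (n, d); row k sits at angle 2*pi*k/n on the circle.
    """
    n = basis.shape[0]
    s = (alpha % (2.0 * np.pi)) / (2.0 * np.pi) * n  # position in [0, n]
    k = int(np.floor(s)) % n                          # segment start index
    u = s - np.floor(s)                               # local parameter in [0, 1)
    # Four control points around the segment, with wrap-around indexing.
    p0, p1, p2, p3 = (basis[(k + d) % n] for d in (-1, 0, 1, 2))
    return 0.5 * (2.0 * p1
                  + (-p0 + p2) * u
                  + (2.0 * p0 - 5.0 * p1 + 4.0 * p2 - p3) * u ** 2
                  + (-p0 + 3.0 * p1 - 3.0 * p2 + p3) * u ** 3)
```

Calling `identity_vector(alpha, basis)` for α between two neighbouring angles returns a blend dominated by the two adjacent basis vectors, so the ordering chosen for the targets directly determines which shapes are mixed.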
4 Inference Algorithm
where Δt is the time interval between two adjacent frames. The process noise associated with the target kinematics is Gaussian, i.e., ${w}_{t}^{\phi}\sim N(0,{\sigma}_{\phi}^{2})$, ${w}_{t}^{v}\sim N(0,{\sigma}_{v}^{2})$, ${w}_{t}^{x}\sim N(0,{\sigma}_{x}^{2})$, ${w}_{t}^{y}\sim N(0,{\sigma}_{y}^{2})$, and ${w}_{t}^{z}\sim N(0,{\sigma}_{z}^{2})$. The Gaussian variances should be chosen to reflect the possible target dynamics and ground conditions. For example, if the candidate pool includes highly maneuvering targets, then large values of ${\sigma}_{\phi}^{2}$ and ${\sigma}_{v}^{2}$ are needed, while tracking on a rough or uneven ground plane requires a larger value of ${\sigma}_{y}^{2}$.
where ${w}_{t}^{\alpha}\sim N(0,{\sigma}_{\alpha}^{2})$. This model allows the estimated identity value to evolve along the identity manifold and converge to the correct one during sequential estimation. There are two possible future improvements to make this approach more efficient. One is to add an annealing treatment to reduce ${\sigma}_{\alpha}^{2}$ over time and the other is to make ${\sigma}_{\alpha}^{2}$ view-dependent. In other words, the variance can be reduced near the side view, where the target is more discriminative, and increased near front/rear views, where it is more ambiguous.
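To make the role of the noise terms concrete, the following sketch propagates a kinematic state (heading φ, speed v, position x, y, z) by one frame. The update equations are an assumption: heading and speed follow random walks, and the position advances along the heading in the x-z ground plane while y only receives noise. The function name and the dictionary of standard deviations are hypothetical.

```python
import numpy as np


def propagate_kinematics(state, dt, sigmas, rng):
    """Advance [phi, v, x, y, z] by one frame under an assumed motion model.

    sigmas: dict with standard deviations for 'phi', 'v', 'x', 'y', 'z'.
    """
    phi, v, x, y, z = state
    scales = np.array([sigmas[k] for k in ("phi", "v", "x", "y", "z")])
    w_phi, w_v, w_x, w_y, w_z = rng.normal(0.0, scales)
    return np.array([
        phi + w_phi,                      # heading random walk
        v + w_v,                          # speed random walk
        x + v * np.sin(phi) * dt + w_x,   # ground-plane motion along heading
        y + w_y,                          # height changes only through noise
        z + v * np.cos(phi) * dt + w_z,
    ])
```

Raising `sigmas["phi"]` and `sigmas["v"]` spreads the predicted particles over more turns and speed changes, which matches the advice above for highly maneuvering targets.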
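A minimal sketch of the identity dynamics is given below, assuming a Gaussian random walk on the circle with wrap-around so that α stays in [0, 2π). The annealing schedule is one hypothetical way to realize the first improvement mentioned above; the geometric decay, its rate and the lower bound are illustrative choices and do not come from the text.

```python
import numpy as np


def propagate_identity(alpha, sigma_alpha, rng):
    """Random walk of the identity variable on the circle, wrapped to [0, 2*pi)."""
    alpha = np.asarray(alpha, dtype=float)
    noise = rng.normal(0.0, sigma_alpha, size=alpha.shape)
    return (alpha + noise) % (2.0 * np.pi)


def annealed_sigma(sigma0, t, decay=0.95, floor=0.01):
    """Hypothetical annealing: shrink the identity noise geometrically over frames."""
    return max(floor, sigma0 * decay ** t)
```

For example, `propagate_identity(alpha, annealed_sigma(0.5, t), rng)` explores the identity manifold broadly in early frames and narrows the search as t grows.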
Pseudocode for the particle filter-based ATR algorithm
• Initialization: Draw ${\mathbf{X}}_{0}^{j}\sim N({X}_{0},1)$ and set ${\alpha}_{0}^{j}={\alpha}_{0}$ ∀j ∈ {1,..., N_{p}}. Here X_{0} and α_{0} are the initial kinematic state and identity values, respectively.

• For t = 1,..., T (number of frames)
1. For j = 1,..., N_{p} (number of particles)
1.1 Draw samples ${\mathbf{X}}_{t}^{j}\sim p({\mathbf{X}}_{t}^{j}\mid {\mathbf{X}}_{t-1}^{j})$ and ${\alpha}_{t}^{j}\sim p({\alpha}_{t}^{j}\mid {\alpha}_{t-1}^{j})$ as in (10) and (11).
1.2 Compute weights ${w}_{t}^{j}=p({\mathbf{z}}_{t}\mid {\alpha}_{t}^{j},{\mathbf{X}}_{t}^{j})$ using (12).
End
2. Normalize the weights such that ${\sum}_{j=1}^{{N}_{p}}{w}_{t}^{j}=1$.
3. Compute the mean estimates of the kinematics and identity ${\widehat{\mathbf{X}}}_{t}={\sum}_{j=1}^{{N}_{p}}{w}_{t}^{j}{\mathbf{X}}_{t}^{j}$ and ${\widehat{\alpha}}_{t}={\sum}_{j=1}^{{N}_{p}}{w}_{t}^{j}{\alpha}_{t}^{j}$.
4. Set $[{\alpha}_{t}^{j},{\mathbf{X}}_{t}^{j}]=\text{resample}({\alpha}_{t}^{j},{\mathbf{X}}_{t}^{j},{w}_{t}^{j})$ to increase the effective number of particles [39].
• End
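One iteration of the loop above can be sketched in Python as follows. The dynamics and the likelihood are passed in as callables because their exact forms, (10) to (12), depend on the target model; the systematic resampling scheme, the uniform fallback for degenerate weights and all function names are assumptions of this sketch rather than details taken from the pseudocode.

```python
import numpy as np


def systematic_resample(weights, rng):
    """Return particle indices drawn by systematic resampling."""
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n
    cumulative = np.cumsum(weights)
    cumulative[-1] = 1.0  # guard against round-off in the last bin
    return np.searchsorted(cumulative, positions)


def particle_filter_step(X, alpha, z, propagate, likelihood, rng):
    """One frame of the filter: predict, weight, estimate, resample.

    X: (Np, d) kinematic particles; alpha: (Np,) identity particles.
    propagate(X, alpha, rng) -> (X, alpha) samples the dynamics (step 1.1).
    likelihood(z, alpha, X) -> (Np,) non-negative weights (step 1.2).
    """
    X, alpha = propagate(X, alpha, rng)
    w = np.asarray(likelihood(z, alpha, X), dtype=float)
    total = w.sum()
    # Step 2: normalize; fall back to uniform weights if all are zero.
    w = w / total if total > 0 else np.full(len(w), 1.0 / len(w))
    # Step 3: weighted mean estimates, as in the pseudocode.
    X_hat = w @ X
    alpha_hat = float(w @ alpha)
    # Step 4: resample so that low-weight particles are replaced.
    idx = systematic_resample(w, rng)
    return X[idx], alpha[idx], X_hat, alpha_hat
```

Note that the plain weighted mean of α follows the pseudocode and is adequate when the particles are concentrated away from the 0/2π boundary; a circular mean would be a possible variation when they straddle it.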
5 Experimental results
We have developed four particle filter-based ATR algorithms that share the same inference framework shown in Figure 4, with which we can evaluate the effectiveness of shape interpolation. Method-I uses the proposed target generative model involving both the view and identity manifolds for shape interpolation (i.e., both the identity and view variables are continuous). Method-II applies a simplified version where only the view manifold is involved for shape interpolation (i.e., the identity variable is discrete). Method-III involves shape interpolation along the identity manifold only (i.e., the view variable is discrete). Finally, Method-IV is a traditional template-based method that only uses the training data for shape matching without shape interpolation (i.e., both the view and identity variables are discrete).
We report three major experimental results in the following. First we present the learning of the proposed generative model along with some simulated results of shape interpolation. Then we introduce the SENSIAC dataset [10], followed by detailed results on a set of IR sequences of various targets at multiple ranges. We also include three visible-band video sequences for algorithm evaluation, among which two were captured from remote-controlled toy vehicles in a room and one was from a real-world surveillance video. Background subtraction [40] was applied to all testing sequences to obtain the initial target segmentation result in each frame and the distance transform [29] was applied to create the observation sequences that were used for shape matching.
5.1 Generative Model Learning

Shape interpolation along the view manifold: We selected one target from each of the six classes and created three interpolated shapes (after thresholding) between three training views, as shown in Figure 6(a). We observe smooth transitions between the interpolated shapes and training shapes, especially around the wheels of the targets.

Shape interpolation along the identity manifold within the same class: We generated six interpolated shapes along the identity manifold between three adjacent training targets for each of the six classes, as shown in Figure 6(b). Despite the fact that the three training targets are quite different in terms of their 3D structures, the interpolated shapes blend the spatial features from the two adjacent training targets in a natural way.

Shape interpolation along the identity manifold between two adjacent classes: It is also interesting to see the shape interpolation results between two adjacent target classes, as shown in Figure 6(c). Although the series of shape variations may not be as smooth as that in Figure 6(b), the generative model still produces intermediate shapes between two vehicle classes that are realistic looking.
The above results show that the target model supports semantically meaningful shape interpolation along the two manifolds, making it possible to handle not only a known target seen from a new view but also an unknown target seen from arbitrary views. Also, the continuous nature of the view and identity variables facilitates the ATR inference process.
5.2 Tests on the SENSIAC database
5.2.1 Tracking Evaluation
5.2.2 Recognition Evaluation
Overall recognition accuracies (%) of the four methods (Method-I/Method-II/Method-III/Method-IV) against 48 SENSIAC sequences
Range   Tanks        APCs         SUV            Pickup
1000 m  96/94/91/90  94/92/89/88  100/100/99/99  100/100/100/99
1500 m  93/91/88/86  88/86/85/82  100/99/98/98   100/100/100/98
2000 m  86/83/82/81  85/83/80/80  98/96/96/95    97/96/97/95
2500 m  78/73/72/69  76/72/71/70  92/90/89/86    90/88/88/86
3000 m  70/65/62/60  72/69/66/65  86/84/82/79    82/80/79/77
3500 m  68/62/58/57  70/65/64/62  78/76/75/70    73/72/70/65
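The table can be summarized numerically with a short script. The sketch below transcribes the accuracies into an array indexed by range, target and method, then computes the mean accuracy of each method and the per-cell gain of Method-I over Method-IV; the variable names are illustrative and the unweighted averaging over cells is an assumption, since the number of sequences per cell is not given here.

```python
import numpy as np

ranges_m = [1000, 1500, 2000, 2500, 3000, 3500]
targets = ["Tanks", "APCs", "SUV", "Pickup"]
methods = ["Method-I", "Method-II", "Method-III", "Method-IV"]

# accuracy[r, t, m]: recognition accuracy (%) at range r, target t, method m.
accuracy = np.array([
    [[96, 94, 91, 90], [94, 92, 89, 88], [100, 100, 99, 99], [100, 100, 100, 99]],
    [[93, 91, 88, 86], [88, 86, 85, 82], [100, 99, 98, 98], [100, 100, 100, 98]],
    [[86, 83, 82, 81], [85, 83, 80, 80], [98, 96, 96, 95], [97, 96, 97, 95]],
    [[78, 73, 72, 69], [76, 72, 71, 70], [92, 90, 89, 86], [90, 88, 88, 86]],
    [[70, 65, 62, 60], [72, 69, 66, 65], [86, 84, 82, 79], [82, 80, 79, 77]],
    [[68, 62, 58, 57], [70, 65, 64, 62], [78, 76, 75, 70], [73, 72, 70, 65]],
], dtype=float)

# Unweighted mean over all ranges and targets, one value per method.
mean_per_method = accuracy.mean(axis=(0, 1))

# Gain of full interpolation (Method-I) over template matching (Method-IV).
gain_over_templates = accuracy[..., 0] - accuracy[..., 3]

for name, value in zip(methods, mean_per_method):
    print(f"{name}: {value:.1f}%")
```

In every cell of the table, Method-I is at least as accurate as each of the other three methods, and `gain_over_templates` shows how that margin over Method-IV varies with range and target type.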
5.3 Results on Visible-band Sequences
5.4 Discussion and Limitations
Although these results are promising, we still consider this work preliminary for three main reasons. First, the computational complexity of the proposed algorithm (Method-I) is relatively high due to the shape interpolation using the generative model. Our experimental results are based on a non-optimized Matlab implementation. Shape interpolation requires approximately 0.03 s on an i7 PC (without parallel computation), and the inference with 200 particles requires about 6.9 s per frame. A faster implementation is still needed to support real-time processing. Second, we use a silhouette-based shape representation that requires target segmentation. The background subtraction used here assumes that the camera platform is not moving. In the case of a moving camera platform, the initial target segmentation could become a challenging issue. Third, we did not consider the issue of occlusion, which has to be accounted for in any practical ATR system. The silhouette is a global feature that could be sensitive to occlusion. An extension to other more salient and robust features such as SIFT and HOG would increase the applicability of the proposed method for real-world applications. Nevertheless, our main contribution is a new shape-based target model where, for the first time, both the view and identity variables are continuous and defined along their own respective manifolds.
6 Conclusion and Future Work
We have presented a new shape-based generative model that incorporates two continuous manifolds for multiview target modeling. Specifically, the identity manifold was proposed to capture both interclass and intraclass shape variability among different target types. The hemispherical view manifold is designed to reflect nearly all possible viewpoints. A particle filter-based ATR algorithm was presented that adopts the new target model for joint tracking and recognition. The experiments on both IR and visible-band video sequences show the advantages of shape interpolation along both the view and identity manifolds.
However, the current work only considers the silhouette-based shape for target representation, which may not be sufficiently distinctive in some challenging cases. This work could be extended to other more salient and robust features, thereby making the proposed model more promising for real-world applications. Another issue that needs further research is the structure and dimensionality of the identity manifold. In some sense, the 1D identity manifold used here is a practical simplification where a small set of training models (e.g., six models for each of the six classes, 36 in total in this work) is used for learning the generative model. It is possible that we can learn a 2D or even 3D identity manifold for more generalized target modeling given sufficient training data. However, there will be two major challenges in going to a higher-dimensional space. One is how to learn an appropriate manifold topology in 2D or 3D, which is much harder than the 1D learning we considered here. The other is how to infer the identity variable effectively in a 2D or 3D identity manifold. There should be a balanced consideration of both the complexity and efficiency when using the couplet of view and identity manifolds for real-world ATR applications.
Notes
^{1}Both AS90 and 2S3 are self-propelled howitzers.
Declarations
Acknowledgements
The authors would like to thank the anonymous reviewers for their valuable comments and suggestions that helped us improve this paper.
This work was supported in part by the U.S. Army Research Laboratory and the U.S. Army Research Office under grants W911NF0410221 and W911NF0810293, the National Science Foundation under Grant IIS0347613, and an OHRS award (HR09030) from the Oklahoma Center for the Advancement of Science and Technology.
Authors’ Affiliations
References
1. Mei X, Zhou SK, Wu H: Integrated Detection, Tracking and Recognition for IR Video-Based Vehicle Classification. Proc IEEE International Conference on Acoustics, Speech and Signal Processing 2006.
2. Miller MI, Grenander U, O'Sullivan JA, Snyder DL: Automatic target recognition organized via jump-diffusion algorithms. IEEE Trans Image Processing 1997, 6: 157-174. 10.1109/83.552104
3. Venkataraman V, Fan X, Fan G: Integrated Target Tracking and Recognition using Joint Appearance-Motion Generative Models. Proc IEEE Workshop on Object Tracking and Classification Beyond Visible Spectrum (OTCBVS'08) in conjunction with CVPR'08 2008.
4. Venkataraman V, Fan G, Fan X: Target Tracking with Online Feature Selection in FLIR Imagery. Proc IEEE Workshop on Object Tracking and Classification Beyond Visible Spectrum (OTCBVS'07) in conjunction with CVPR'07 2007.
5. Shaik J, Iftekharuddin K: Automated tracking and classification of infrared images. Proc International Joint Conference on Neural Networks 2003.
6. Venkataraman V, Fan G, Fan X, Havlicek J: Appearance Learning by Adaptive Kalman Filters for FLIR Tracking. Proc IEEE Workshop on Object Tracking and Classification Beyond Visible Spectrum (OTCBVS'09) in conjunction with CVPR'09 2009.
7. Zhang Z, Dong W, Huang K, Tan T: EDA Approach for Model Based Localization and Recognition of Vehicles. Proc IEEE International Conference on Computer Vision and Pattern Recognition 2007.
8. Chan L, Nasrabadi N: Modular wavelet-based vector quantization for automatic target recognition. Proc International Conference on Multisensor Fusion and Integration for Intelligent Systems 1996.
9. Wang L, Der S, Nasrabadi N: Automatic target recognition using a feature-decomposition and data-decomposition modular neural network. IEEE Trans Image Processing 1998, 7(8): 1113-1121. 10.1109/83.704305
10. Military Sensing Information Analysis Center (SENSIAC) 2008. [https://www.sensiac.org/]
11. Poggio T, Edelman S: A network that learns to recognize three-dimensional objects. Nature 1990, 343: 263-266. 10.1038/343263a0
12. Ullman S, Basri R: Recognition by Linear Combinations of Models. IEEE Trans Pattern Analysis and Machine Intelligence 1991, 13: 992-1006. 10.1109/34.99234
13. Ullman S: An Approach to Object Recognition: Aligning Pictorial Descriptions. Cognition 1989, 32: 193-254. 10.1016/0010-0277(89)90036-X
14. Khan S, Cheng H, Matthies D, Sawhney H: 3D model based vehicle classification in aerial imagery. Proc IEEE International Conf on Computer Vision and Pattern Recognition 2010.
15. Kushal A, Schmid C, Ponce J: Flexible object models for category-level 3D object recognition. Proc IEEE International Conference on Computer Vision and Pattern Recognition 2007.
16. Su H, Sun M, Fei-Fei L, Savarese S: Learning a dense multiview representation for detection, viewpoint classification and synthesis of object categories. Proc IEEE International Conference on Computer Vision 2009.
17. Ozcanli O, Tamrakar A, Kimia B: Augmenting shape with appearance in vehicle category recognition. Proc IEEE International Conference on Computer Vision and Pattern Recognition 2006.
18. Savarese S, Fei-Fei L: Multiview Object Categorization and Pose Estimation. Computer Vision, Volume 285 of Studies in Computational Intelligence, Springer 2010.
19. Toshev A, Makadia A, Daniilidis K: Shape-based object recognition in videos using 3D synthetic object models. Proc IEEE International Conference on Computer Vision and Pattern Recognition 2009.
20. Lou J, Tan T, Hu W, Yang H, Maybank S: 3D model-based vehicle tracking. IEEE Trans Image Processing 2005, 14: 1561-1569.
21. Leotta M, Mundy J: Predicting high resolution image edges with a generic, adaptive, 3D vehicle model. Proc IEEE International Conference on Computer Vision and Pattern Recognition 2009.
22. Sandhu R, Dambreville S, Yezzi A, A T: Non-rigid 2D-3D pose estimation and 2D image segmentation. Proc IEEE International Conference on Computer Vision and Pattern Recognition 2009.
23. Tsin Y, Gene Y, Ramesh V: Explicit 3D modeling for vehicle monitoring in non-overlapping cameras. Proc IEEE International Conference on Advanced Video and Signal based Surveillance 2009.
24. Liebelt J, Schmid C: Multiview object class detection with a 3D geometric model. Proc IEEE International Conference on Computer Vision and Pattern Recognition 2010.
25. Bülthoff H, Edelman S: Psychophysical support for a 2D view interpolation theory of object recognition. Proc of the National Academy of Science 1992, 89: 60-64. 10.1073/pnas.89.1.60
26. Abdelkader M, Abd-Almageed W, Srivastava A, Chellappa R: Silhouette-based gesture and action recognition via modeling trajectories on Riemannian shape manifolds. Computer Vision and Image Understanding 2011, 115(3): 439-455. 10.1016/j.cviu.2010.10.006
27. Belongie S, Malik J, Puzicha J: Shape matching and object recognition using shape contexts. IEEE Trans Pattern Analysis and Machine Intelligence 2002, 24(4): 509-522. 10.1109/34.993558
 Hu M: Visual pattern recognition by moment invariants. IRE Trans Information Theory 1962,8(2):179187. 10.1109/TIT.1962.1057692View ArticleMATHGoogle Scholar
 Elgammal A, Lee CS: Separating style and content on a nonlinear manifold. Proc IEEE International Conference on Computer Vision and Pattern Recognition 2004.Google Scholar
 Srivastava A, Joshi S, Mio W, Liu X: Statistical shape analysis: clustering, learning, and testing. IEEE Trans Pattern Analysis and Machine Intelligence 2005,27(4):590602.View ArticleGoogle Scholar
 Murase H, Nayar S: Visual learning and recognition of 3D objects from appearance. International Journal of Computer Vision 1995, 14: 524. 10.1007/BF01421486View ArticleGoogle Scholar
 Tenenbaum J, Freeman WT: Separating style and content with bilinear models. Neural Computation 2000, 12: 12471283. 10.1162/089976600300015349View ArticleGoogle Scholar
 Vasilescu MAO, Terzopoulos D: Multilinear analysis of image ensembles: Tensorfaces. Proc IEEE European Conference on Computer Vision 2002.Google Scholar
 Gosch C, Fundana K, Heyden A, Schnörr C: View point tracking of rigid objects based on shape submanifolds. Proc European Conference on Computer Vision 2008.Google Scholar
 Lee C, Elgammal A: Modeling View and Posture Manifolds for Tracking. Proc IEEE International Conference on Computer Vision 2007.Google Scholar
 Elgammal A, Lee CS: Tracking people on torus. IEEE Trans on Pattern Analysis and Machine Intelligence 2009, 31: 520538.View ArticleGoogle Scholar
 Lee C, Elgammal A: Simultaneous Inference of View and Body Pose using Torus Manifolds. Proc IEEE Int'l Conference on Pattern Recognition 2006.Google Scholar
 Vasilescu MAO, Terzopoulos D: Multilinear image analysis for facial recognition. Proc IEEE International Conferenec on Pattern Recognition 2002.Google Scholar
 Arulampalam S, Maskell S, Gordon N, Clapp T: A Tutorial on Particle Filters for Online Nonlinear/NonGaussian Bayesian Tracking. IEEE Trans Signal Processing 2002,50(2):174188. 10.1109/78.978374View ArticleGoogle Scholar
 Zivkovic Z, van der Heijden F: Efficient adaptive density estimation per image pixel for the task of background subtraction. Pattern Recognition Letters 2006, 27: 773780. 10.1016/j.patrec.2005.11.005View ArticleGoogle Scholar
 She K, Bebis G, Gu H, Miller R: Vehicle Tracking Using OnFusion of Color and Shape Features. Proc IEEE International Conference on Intelligent Transportation Systems 2004.Google Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.