Class-specific Gaussian-multinomial latent Dirichlet allocation for image annotation
 Zhiming Qian^{1},
 Ping Zhong^{1} and
 Runsheng Wang^{1}
https://doi.org/10.1186/s136340150224z
© Qian et al.; licensee Springer. 2015
Received: 26 November 2014
Accepted: 7 April 2015
Published: 1 May 2015
Abstract
Image annotation has been a challenging problem due to the well-known semantic gap between two heterogeneous information modalities, i.e., the visual modality, which refers to low-level visual features, and the semantic modality, which refers to high-level human concepts. To bridge the semantic gap, we present an extension of latent Dirichlet allocation (LDA), denoted as class-specific Gaussian-multinomial latent Dirichlet allocation (csGMLDA), in an effort to simulate the human visual perception system. An analysis of previous supervised LDA models shows that the topics discovered by generative LDA models are driven by general image regularities rather than by the semantic regularities needed for image annotation. To address this, csGMLDA uses class supervision at the level of visual features for multimodal topic modeling. The csGMLDA model combines the labeling strength of topic supervision with the flexibility of topic discovery, and the modeling problem can be solved effectively by a variational expectation-maximization (EM) algorithm. Moreover, as natural images usually generate an enormous amount of high-dimensional data in annotation applications, an efficient descriptor based on Laplacian regularized uncorrelated tensor representation is proposed to explicitly exploit the manifold structures in the high-order image space. Experimental results on two standard annotation datasets demonstrate the effectiveness of the proposed method in comparison with several state-of-the-art annotation methods.
Keywords
Image annotation; Latent Dirichlet allocation; Variational EM; Uncorrelated tensor representation; Laplacian regularization
1 Introduction
Automatic image annotation is a challenging task in visual scene understanding because of the well-known semantic gap [1]. Given an input image, the goal of image annotation is to assign meaningful tags that summarize its visual contents. Such methods are becoming increasingly important given the growing collections of both private and publicly available images. However, they face three main challenges: the inter-tag similarity problem, in which different tags may have similar visual contents; the tag locality problem, in which most tags relate only to their corresponding semantic regions; and the intra-tag diversity problem, in which the relevant regions for a tag can differ from image to image.
The inter-tag similarity problem reflects the fact that visual similarity does not always guarantee semantic similarity, which conflicts with the inherent assumption of many image annotation methods, e.g., methods [2,3] that propagate tags according to visual similarity. To cope with this problem, more discriminative visual features are urgently needed to separate the visual contents of different tags. However, traditional vector representations in the form of bag-of-features or bag-of-words, such as the visual descriptor that quantizes SIFT local features [3] and the colored pattern appearance model (CPAM) [4], are usually inadequate for this purpose. The reason is that these features usually ignore the high-order characteristics of natural images and may suffer from the curse of dimensionality when a relatively discriminative representation is required to describe the complex visual world. In practice, an image is intrinsically a two-dimensional or higher-order tensor. Tensor representations [5,6], which can explicitly describe multiple interrelated restrictions, capture these high-order characteristics and may help avoid the curse of dimensionality.
To tackle the tag locality problem, one may employ local image features instead of holistic image features to describe the visual contents of a certain tag. The work in [7] considered each image as a bag of multiple segmented regions and predicted the tag of each region with a multi-class bag classifier. This method, however, depends heavily on the segmentation performance, which is very sensitive to image noise. Recently, implicit image representations have attracted much attention for describing local regions. To reveal the tag locality, Bao et al. [8] introduced hidden concepts for decomposing a holistic image representation into tag representations. Mesnil et al. [9] learned implicit representations for both objects and their parts. Although these representations cannot explicitly describe the regions of a certain tag, they implicitly capture the tag’s local visual contents by learning from a large number of annotated images. Thus, implicit image representation is of considerable value for tackling the tag locality problem in large-scale datasets.
For the intra-tag diversity problem, a straightforward approach is to use class-specific techniques [10,11] that treat annotation tags as class labels and learn the visual contents within each class. Although these methods can identify sets of visual contents that are discriminative for the classes of interest, they do not explicitly model the inter-class and intra-class structures of visual distributions because they lack hierarchical content groupings. To facilitate the discovery of these structures, various hierarchical generative methods have recently been ported from the text literature to the vision literature. Among these methods, topic models such as latent Dirichlet allocation (LDA) [12] and probabilistic latent semantic analysis (pLSA) [13], which use probabilistic latent variable models for hierarchical learning, have attracted extensive interest. However, an analysis of previous supervised topic models [14] shows that the topics discovered by these models are driven by general image regularities rather than by the semantic regularities needed for image annotation. For example, it has been noted in [14] that, given a collection of movie reviews, LDA might discover topics that reflect movie properties, such as genres, which are not central to the annotation task. Therefore, incorporating a class label variable into a generative model may tackle the intra-tag diversity problem well. Such extensions have been successfully applied to the classification task, including class LDA (cLDA) [14], supervised LDA (sLDA) [15], and class-specific-simplex LDA (cssLDA) [16].
In this paper, we develop a new extension of LDA coupled with Laplacian regularized uncorrelated tensor representation for learning semantics in image data. Since a tensor representation can capture the high-order statistics and structures of the training data, the proposed representation method achieves an efficient compressed image representation by imposing non-correlation constraints and Laplacian regularization in tensor factorization. Based on this representation, a three-level hierarchical probabilistic model, denoted as class-specific Gaussian-multinomial latent Dirichlet allocation (csGMLDA), is developed by using class supervision at the level of visual features. In csGMLDA, latent variables, or topics, serve as middle-level concepts for building the correspondences between visual features and annotation tags.

The main contributions of this paper are as follows:

 1. A novel hierarchical probabilistic model, namely csGMLDA, is presented by combining the labeling strength of topic supervision with the flexibility of topic discovery; its parameters can be estimated effectively with a variational EM algorithm.

 2. An effective image representation method, namely Laplacian regularized uncorrelated tensor representation, is developed to explicitly consider the manifold structures in the high-order image space.

 3. By learning with csGMLDA, a unified framework is introduced to infer the hierarchies of multiple modalities and predict tags for a new image. The framework benefits from hierarchical probabilistic inference, which keeps annotation efficient.
The rest of this paper is organized as follows. We first discuss the related work in Section 2. Then, we present Laplacian regularized uncorrelated tensor representation in Section 3. After that, the proposed model is described in Section 4. Moreover, quantitative experiments validating strong improvements by the proposed method are presented in Section 5. Finally, Section 6 draws the conclusion.
2 Related work
In this section, we outline the research contributions most related to our work. We first review techniques for tensor-based image representation. Then, topic models are discussed.
2.1 Tensor-based image representation
It is believed that the specialized structures of a visual object are intrinsically in the form of a second- or even higher-order tensor [5]. To retain these high-order characteristics, tensors, or multidimensional arrays, are a natural choice for visual representation. In practice, an exact image representation as a full tensor is often redundant and infeasible when coping with a large number of images. However, an approximate image representation obtained with tensor subspace learning techniques can in many cases be helpful for describing various visual objects. In this paper, we discuss the two main kinds of tensor subspace learning (TSL) algorithms: supervised and unsupervised TSL.
Supervised TSL algorithms use concept-driven dimensionality reduction to obtain discriminant tensor subspaces by considering the subsequent classification or recognition tasks. This line of algorithms requires that either manual class labels or object priors in the training set be available for a particular image classification [5,6] or object representation [17,18] task. However, as an image annotation system needs to handle a large number of classes, and most classes may require many training samples due to significant intra-class shape and appearance variations, it is important that the learning does not involve any human interaction. This makes unsupervised TSL algorithms more appealing. Unsupervised TSL algorithms are actively explored for data-driven dimensionality reduction that uses low-rank tensors to approximate the exactly represented tensors. Extensions of principal component analysis (PCA) and singular value decomposition (SVD) are the most familiar methods in this line of research. By maximizing a variance measure, two-dimensional PCA (2DPCA) [19] represented an image by projecting it onto principal components along the vertical direction of the image data. Then, generalized PCA (GPCA) [20] employed bilinear subspace analysis for dimensionality reduction with matrices. Later, multilinear PCA (MPCA) [21] and uncorrelated MPCA (UMPCA) [22] were proposed for dimensionality reduction with tensors of any order. By minimizing the reconstruction error, the generalized low-rank approximation of matrices (GLRAM) [23] took into account the spatial correlation of image pixels within a localized neighborhood and applied bilinear transforms to the input image matrices. For higher-order tensors, the work in [24] used the high-order SVD (HOSVD) to decompose an ensemble of images into basis images that capture the different underlying factors of variation. Furthermore, concurrent subspaces analysis (CSA) [25] was presented as a generalization of GLRAM to higher-order tensors. Recently, multiple tensor rank-R decomposition (MTRD) [26] was proposed for approximating a higher-order tensor with a series of rank-R tensor approximations.
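The multilinear projections reviewed above share one basic operation: multiplying a tensor along each mode by a factor matrix to obtain a smaller core tensor. The following is a minimal HOSVD-style sketch of that operation, not the method of any cited paper. The function names `mode_k_product` and `multilinear_project`, the toy tensor size, and the ranks are illustrative assumptions.

```python
import numpy as np

def mode_k_product(tensor, matrix, k):
    """Multiply `tensor` along mode k by `matrix` of shape (new_dim, tensor.shape[k])."""
    # Bring mode k to the front, flatten the remaining modes, multiply, restore layout.
    moved = np.moveaxis(tensor, k, 0)
    unfolded = moved.reshape(moved.shape[0], -1)
    projected = matrix @ unfolded
    new_shape = (matrix.shape[0],) + moved.shape[1:]
    return np.moveaxis(projected.reshape(new_shape), 0, k)

def multilinear_project(tensor, factors):
    """Project a tensor onto a core tensor using one (I_k x R_k) factor per mode."""
    core = tensor
    for k, u in enumerate(factors):
        core = mode_k_product(core, u.T, k)
    return core

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 12, 3))  # toy third-order "image" tensor

# Assumed choice: orthonormal factors from the leading left singular vectors
# of each mode unfolding (truncated HOSVD), with arbitrary ranks (4, 5, 3).
factors = []
for k, rank in enumerate((4, 5, 3)):
    unfolding = np.moveaxis(image, k, 0).reshape(image.shape[k], -1)
    u, _, _ = np.linalg.svd(unfolding, full_matrices=False)
    factors.append(u[:, :rank])

core = multilinear_project(image, factors)
print(core.shape)
```

Vectorizing `core` gives a compact descriptor; methods such as MPCA or UMPCA differ in how the factor matrices are chosen, not in this projection step.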
In this paper, we propose an unsupervised method with Laplacian regularized uncorrelated tensor representation to explicitly consider manifold structures in the high-order image space. That is, data points that are close in the intrinsic geometry of the image space should also be close to each other under the factorized tensor basis. By combining unsupervised TSL and Laplacian regularization, we obtain a more discriminative descriptor, which is important for accurate semantic learning.
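The locality requirement above is commonly expressed as a graph Laplacian penalty. The sketch below illustrates the general idea under stated assumptions: a symmetric heat-kernel weight matrix with an arbitrary bandwidth, and one feature vector per row. It checks the standard identity \(\mathrm{tr}(Y^{\top} L Y) = \frac{1}{2}\sum_{i,j} W_{ij}\|y_{i}-y_{j}\|^{2}\) with \(L = D - W\); it is not the paper's exact objective.

```python
import numpy as np

def graph_laplacian(weights):
    """Unnormalized graph Laplacian L = D - W for a symmetric weight matrix W."""
    degree = np.diag(weights.sum(axis=1))
    return degree - weights

def smoothness_penalty(features, weights):
    """tr(Y^T L Y), where each row of `features` is the descriptor of one image."""
    lap = graph_laplacian(weights)
    return float(np.trace(features.T @ lap @ features))

rng = np.random.default_rng(0)
points = rng.standard_normal((6, 4))  # toy images in the original space

# Assumed weighting: heat kernel with bandwidth 1.0 and no self-loops.
sq_dists = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
weights = np.exp(-sq_dists / 1.0)
np.fill_diagonal(weights, 0.0)

features = rng.standard_normal((6, 2))  # toy low-dimensional descriptors
penalty = smoothness_penalty(features, weights)

# Same quantity written as a weighted sum of pairwise squared distances.
diffs = ((features[:, None, :] - features[None, :, :]) ** 2).sum(axis=-1)
pairwise = 0.5 * float((weights * diffs).sum())
print(penalty, pairwise)
```

A small penalty means that images with large weights between them receive similar descriptors, which is the behavior the regularizer is meant to encourage.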
2.2 Topic models for image annotation
Topic models annotate images as samples from a specific mixture of topics, where each topic is a distribution over image observations. Three alternative pLSA-based models were presented in [13], which use asymmetric learning for semantic indexing of large image collections. Then, a Gaussian-multinomial pLSA (GMpLSA) model [27] was presented to learn multimodal correlations from image data by using continuous feature vectors. Furthermore, the work in [28] extended pLSA to a higher-order formalism so that it becomes applicable to more than two observable variables. However, pLSA-based models are incomplete in that they provide no probabilistic model of how the training data are generated. In these models, each image is represented as a list of mixing proportions for topics, and there is no generative probabilistic model for these numbers. This leads to two problems: first, the number of model parameters grows linearly with the size of the training set, which leads to serious overfitting; second, it is not clear how to assign a probability to an image outside the training set. To overcome these problems, it is effective to endow the topic model with Dirichlet priors over topic parameters, as they are conjugate to the multinomial distribution of the associated tags. The correspondence LDA (CorrLDA) [29] was first presented for modeling the joint distribution of images and tags. To capture more general forms of association and to allow the number of topics in the two data modalities to differ, topic regression multimodal latent Dirichlet allocation (trmmLDA) [30] was proposed, which introduces a regression module to correlate the two sets of topics. Taking advantage of limited tagged training images and abundant untagged images, the work in [31] proposed a regularized semi-supervised latent Dirichlet allocation (rSSLDA) for learning visual concept classifiers in a semi-supervised way.
However, several supervised methods [14–16] show that the topics discovered by LDA models are driven by general image regularities rather than by the semantic regularities needed for image annotation. To address this, we propose a new three-level hierarchical probabilistic model that incorporates supervision into the extended LDA model, making annotation more effective than with previous LDA models.
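The parameter-growth argument made for pLSA earlier in this subsection can be made concrete with a simple count. The sketch below assumes the textbook parameterizations: pLSA stores one K-dimensional mixing vector per training image plus K topic-tag distributions, while LDA stores a single K-dimensional Dirichlet parameter plus the same K topic-tag distributions. The function names and the topic count K = 50 are hypothetical; 374 is the Corel5K vocabulary size.

```python
def plsa_param_count(num_images, num_topics, vocab_size):
    """Per-image mixing proportions plus topic-tag distributions."""
    return num_images * num_topics + num_topics * vocab_size

def lda_param_count(num_topics, vocab_size):
    """Dirichlet parameter alpha plus topic-tag distributions (independent of N)."""
    return num_topics + num_topics * vocab_size

K, C = 50, 374  # K is an assumed number of topics; C is the tag vocabulary size
for n in (1_000, 5_000, 20_000):
    print(n, plsa_param_count(n, K, C), lda_param_count(K, C))
```

The pLSA count increases by K for every additional training image, whereas the LDA count stays fixed, which is why a Dirichlet prior helps against overfitting and allows a probability to be assigned to unseen images.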
3 The proposed representation method
In this section, we first give the notations needed to define the multi-way problem. Then, a tensor-based method is proposed for visual representation.
3.1 Notations and definitions
List of key notations
Symbol | Description
\(\mathcal{X}_{n}, \tilde{\mathcal{X}}_{n}, \mathcal{G}_{n}\) | The original tensor, the centered tensor, and its core tensor
\(U^{(k)}, \tilde{U}_{(k)}\) | The k-mode transformation matrix and the Kronecker product of all transformation matrices except the k-th
\(\tilde{W}, \tilde{D}, \tilde{L}\) | The weight matrix of tensorial features, its diagonal matrix, and its Laplacian matrix
\(g_{n}, y_{n}\) | Vectorizations of the core tensor and the transformed tensor
\(\alpha, \beta, \mu_{c}, \sigma_{c}\) | Parameters of the Dirichlet distribution for latent topics, the multinomial distribution for tags, and the mean and variance of the Gaussian distribution for visual features
\(w_{m}, v_{d}\) | A tag and a visual feature
\(y_{m}, z_{d}\) | Latent topics for the tag and the visual feature
\(W, V\) | Collections of tags and visual features
\(Y, Z\) | Collections of latent topics for tags and visual features
3.2 Laplacian regularized uncorrelated tensor representation
where \(Y, C_{b}, C_{r}\) are the three color channels obtained by transforming the original RGB image.
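As an illustration of how an RGB image can be turned into a third-order tensor of \(Y, C_{b}, C_{r}\) channels, the sketch below uses the full-range ITU-R BT.601 (JPEG/JFIF) conversion. The choice of this particular conversion, the function name `rgb_to_ycbcr_tensor`, and the random test image are assumptions; the paper may use a different scaling.

```python
import numpy as np

# Full-range BT.601 coefficients as used in JPEG/JFIF (assumed conversion).
_BT601 = np.array([
    [0.299, 0.587, 0.114],          # Y
    [-0.168736, -0.331264, 0.5],    # Cb
    [0.5, -0.418688, -0.081312],    # Cr
])
_OFFSET = np.array([0.0, 128.0, 128.0])

def rgb_to_ycbcr_tensor(rgb):
    """Map an (H, W, 3) RGB array with values in [0, 255] to an (H, W, 3) YCbCr tensor."""
    rgb = np.asarray(rgb, dtype=np.float64)
    return rgb @ _BT601.T + _OFFSET

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(128, 192, 3))  # random stand-in for a real image
tensor = rgb_to_ycbcr_tensor(image)
print(tensor.shape)
```

Each frontal slice `tensor[:, :, c]` is one color channel, so the result can be fed directly to a multilinear projection over the two spatial modes and the channel mode.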
The pseudocode for the proposed representation method is given in Algorithm 1. The full solution follows the formalism in Equation 5. However, the cost of the alternating solution is quadratic in the number of images in the dataset, which is expensive when dealing with a large dataset. In practice, we therefore apply the representation method at a much smaller scale by first clustering the images with graph shift [34] and then learning the representation for each group. Note that, to preserve orthogonality, the projections onto the previously learned multilinear transformations should be subtracted from the image data of each group.
4 The proposed annotation method
In this section, we first describe the proposed method for image annotation. Then, we turn our attention to parameter estimation for the modeling problem. Finally, a unified framework is presented to infer the hierarchies of multiple modalities and predict tags for a new image.
4.1 Class-specific Gaussian-multinomial latent Dirichlet allocation
 1. Draw an image topic proportion \(\theta \sim P_{\Pi}(\theta;\alpha)\)
 2. For each associated tag \(w_{m} \in \mathcal{W}=\{1,\cdots,C\}, m=1,\cdots,M\):
    (a) Draw a topic assignment, \(y_{m} \sim P_{Y|\Pi}(y_{m}|\theta), y_{m} \in \mathcal{T}_{y}=\{1,\cdots,K\}\)
    (b) Draw a tag, \(w_{m} \sim P_{W|Y}(w_{m}|y_{m};\beta), w_{m} \in \mathcal{W}=\{1,\cdots,C\}\)
 3. For each visual word \(v_{d}, d=1,\cdots,D\):
    (a) Draw a topic assignment, \(z_{d} \sim P_{Z|\Pi}(z_{d}|\theta), z_{d} \in \mathcal{T}_{z}=\{1,\cdots,K\}\)
    (b) Draw a visual description, \(v_{d} \sim P_{V|Z,W}(v_{d}|z_{d},w_{m};\mu_{w_{m}},\sigma_{w_{m}})\)
In csGMLDA, the parameters \(\{\alpha,\beta,\mu_{1:C},\sigma_{1:C}\}\) are dataset-level parameters, assumed to be sampled once in the process of generating a set of images. The variable \(\theta_{n}\) is an image-level variable, sampled once per image. The variables \(y_{nm}\) and \(w_{nm}\) are tag-level variables, sampled once for each annotated tag. The variables \(z_{nd}\) and \(v_{nd}\) are structure-level variables, sampled once for each visual description. Structural models similar to that shown in Figure 1 are often studied in Bayesian statistical modeling, where they are referred to as conditionally independent hierarchical models. As we discuss in the following subsection, we adopt the empirical Bayes approach and estimate the parameters with a variational EM algorithm.
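The three sampling levels described above can be sketched as a small simulation. This is an illustrative reading of the generative process, not the authors' implementation. Several points are assumptions: \(P_{\Pi}\) is taken to be a Dirichlet, topic and tag draws are categorical, the Gaussian parameters are stored per class and per topic with a hypothetical feature dimension `F`, and each visual description is tied to one of the image's tags chosen uniformly at random, because the text does not state how the conditioning tag \(w_{m}\) is selected.

```python
import numpy as np

def sample_image(alpha, beta, mu, sigma, num_tags, num_descriptors, rng):
    """Draw one image (tags and visual descriptions) from the sketched process."""
    num_topics = alpha.shape[0]
    num_classes = beta.shape[1]
    theta = rng.dirichlet(alpha)                                  # step 1
    y = rng.choice(num_topics, size=num_tags, p=theta)            # step 2(a)
    tags = np.array([rng.choice(num_classes, p=beta[k]) for k in y])  # step 2(b)
    z = rng.choice(num_topics, size=num_descriptors, p=theta)     # step 3(a)
    # ASSUMPTION: the conditioning tag of each description is picked uniformly
    # among the image's tags.
    owners = tags[rng.integers(num_tags, size=num_descriptors)]
    descriptors = rng.normal(mu[owners, z], sigma[owners, z])     # step 3(b)
    return theta, y, tags, z, descriptors

rng = np.random.default_rng(0)
C, K, F = 5, 3, 4                          # classes, topics, feature dimension (toy sizes)
alpha = np.full(K, 0.5)                    # dataset-level Dirichlet parameter
beta = rng.dirichlet(np.ones(C), size=K)   # one tag distribution per topic, shape (K, C)
mu = rng.standard_normal((C, K, F))        # class- and topic-specific Gaussian means
sigma = np.full((C, K, F), 0.1)            # matching standard deviations

theta, y, tags, z, descriptors = sample_image(
    alpha, beta, mu, sigma, num_tags=3, num_descriptors=6, rng=rng
)
print(tags, descriptors.shape)
```

The dataset-level arrays are created once, `theta` is drawn once per call, and the tag-level and structure-level variables are drawn once per tag and per description, mirroring the hierarchy in the text.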
4.2 Parameter estimation via variational inference
where \(\{Y,Z\}\) are the collections of latent topics, and the Dirichlet parameter \(\eta_{n}\) and the multinomial parameters \(\{\phi_{nm},\zeta_{nd}\}\) are the free variational parameters.
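For orientation, a variational family consistent with these free parameters can be written as follows. This is a sketch that assumes the standard fully factorized (mean-field) form used for LDA-type models, with a Dirichlet factor for \(\theta\) and multinomial factors for the topic assignments; the second line is the usual Jensen lower bound on the log likelihood.

```latex
q(\theta, Y, Z \mid \eta_{n}, \phi_{n}, \zeta_{n})
  = q(\theta \mid \eta_{n})
    \prod_{m=1}^{M} q(y_{m} \mid \phi_{nm})
    \prod_{d=1}^{D} q(z_{d} \mid \zeta_{nd}),
\qquad
\log P_{W,V}(W, V \mid \alpha, \beta, \mu_{1:C}, \sigma_{1:C})
  \geq \mathbb{E}_{q}\!\left[\log P(\theta, Y, Z, W, V \mid \alpha, \beta, \mu_{1:C}, \sigma_{1:C})\right]
     - \mathbb{E}_{q}\!\left[\log q(\theta, Y, Z)\right].
```

Under this assumption, tightening the bound with respect to \(\{\eta,\phi,\zeta\}\) and then with respect to the model parameters gives the two alternating steps of variational EM.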
 1. E-step: For each image, find the optimizing values of the variational parameters \(\{\eta,\phi,\zeta\}\).
 2. M-step: Maximize the resulting lower bound \(\mathcal{L}(\eta,\phi,\zeta;\alpha,\beta,\mu_{1:C},\sigma_{1:C})\) on the log likelihood of \(P_{W,V}(W,V|\alpha,\beta,\mu_{1:C},\sigma_{1:C})\) with respect to the model parameters \(\{\alpha,\beta,\mu_{1:C},\sigma_{1:C}\}\).
We summarize the parameter estimation algorithm in Algorithm 2. This is a standard EM process, and the lower bound \(\mathcal{L}(\eta,\phi,\zeta;\alpha,\beta,\mu_{1:C},\sigma_{1:C})\) is a concave function; therefore, Algorithm 2 converges. From the pseudocode, each iteration of the E-step for csGMLDA requires \(\mathcal{O}(NMDK)\) operations. Empirically, we find that the number of iterations in this step is proportional to the number of tags and the dimensionality of visual features. Parameter estimation in the M-step for \(\{\beta,\mu_{1:C},\sigma_{1:C}\}\) requires \(\mathcal{O}(NCMDK)\) operations, and the number of iterations required by the Newton-Raphson method is linear in the dimensionality of \(\alpha\). Therefore, each EM iteration requires roughly \(\mathcal{O}\left(NM\left(C+M+D\right)DK\right)\) operations. The number of EM iterations is mainly determined by the number of parameters involved, which is proportional to \(C(1+D)K\). Thus, the complexity of building the proposed model is about \(\mathcal{O}\left(NCM\left(C+M+D\right)D^{2}K^{2}\right)\). For large-scale data (i.e., \(N \gg K,C,D\)), the complexity of our modeling system is approximately linear in the number of images, which compares favorably with typical quadratic annotation models (e.g., pLSA [13], which requires \(\mathcal{O}\left(N^{2}KC\right)\) operations, and GMpLSA [27], which requires roughly \(\mathcal{O}\left(N^{2}K^{2}C\right)\) operations).
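The scaling argument above can be checked with a few lines of arithmetic. The sketch below evaluates the leading terms quoted in the text with all hidden constants set to 1, so only the growth rates are meaningful, not the absolute values. The function names and the sample sizes are illustrative assumptions.

```python
def csgmlda_ops(n, c, m, d, k):
    """Leading term O(N C M (C + M + D) D^2 K^2) with the constant set to 1."""
    return n * c * m * (c + m + d) * d**2 * k**2

def plsa_ops(n, c, k):
    """Leading term O(N^2 K C) with the constant set to 1."""
    return n**2 * k * c

def gmplsa_ops(n, c, k):
    """Leading term O(N^2 K^2 C) with the constant set to 1."""
    return n**2 * k**2 * c

# Hypothetical sizes: C tags, M tags per image, D visual words, K topics.
C, M, D, K = 374, 4, 128, 50
for n in (5_000, 10_000, 20_000):
    print(n, csgmlda_ops(n, C, M, D, K), plsa_ops(n, C, K), gmplsa_ops(n, C, K))

# Doubling N doubles the csGMLDA term but quadruples the pLSA-type terms.
growth_csgmlda = csgmlda_ops(20_000, C, M, D, K) / csgmlda_ops(10_000, C, M, D, K)
growth_plsa = plsa_ops(20_000, C, K) / plsa_ops(10_000, C, K)
print(growth_csgmlda, growth_plsa)
```

Because the csGMLDA term carries large factors in \(D\) and \(K\), the linear growth only pays off in the regime the text names, where \(N\) is much larger than \(K\), \(C\), and \(D\).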
4.3 A unified framework for image annotation
5 Experiments
To evaluate the performance of our annotation framework, we set up several quantitative experiments. First, we investigate the effects of the parameter settings by conducting cross-validation to select the best parameters for the proposed model. Then, we compare different image representations and validate the effectiveness of our representation method. Finally, we evaluate the proposed method on two benchmark datasets and compare the results with those of state-of-the-art methods.
5.1 Datasets and representations
Statistics of two image datasets
Dataset | Number of images | Number of tags | Tags per image
Corel5K | 5,000 | 374 | 3.4
ESPGame | 20,000 | 268 | 4.7
To obtain a reasonable size that keeps the images from serious deterioration under our representation method, we fix the image size to 128×192 for Corel5K and 225×169 for ESPGame. The vectorization of the core tensor, which constitutes the D-dimensional Laplacian regularized uncorrelated tensorial vector (LGUTV), can be viewed as an image descriptor in which each item corresponds to an uncorrelated elementary multilinear projection. In our experiments, the dimensionality of LGUTV is fixed at 128 for each group in both datasets. We further divide Corel5K and ESPGame into five and ten groups, respectively, by graph shift [34], resulting in a 640-dimensional vector and a 1,280-dimensional vector for the two datasets. In addition, we compare the proposed representation method with several common representation methods: quantized color histograms with 16 bins per color channel for the RGB, LAB, and HSV representations (CHIST); quantized SIFT features extracted either densely on a multiscale grid (DSIFT) or at Harris-Laplacian interest points (HSIFT) [3]; local binary patterns (LBP) [38]; and CPAM [4]. For a fair comparison, we set the dimensionality of each of these descriptors equal to that of LGUTV.
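The descriptor sizes quoted above follow from concatenating one 128-dimensional vector per group. The sketch below only illustrates this bookkeeping; the helper name `concatenate_group_descriptors` and the zero-valued placeholder vectors are assumptions.

```python
import numpy as np

GROUP_DIM = 128  # per-group LGUTV dimensionality stated in the text

def concatenate_group_descriptors(group_vectors):
    """Stack the vectorized per-group core tensors into one image descriptor."""
    return np.concatenate([np.ravel(v) for v in group_vectors])

# Placeholder per-group vectors: five groups for Corel5K, ten for ESPGame.
corel_descriptor = concatenate_group_descriptors([np.zeros(GROUP_DIM)] * 5)
esp_descriptor = concatenate_group_descriptors([np.zeros(GROUP_DIM)] * 10)
print(corel_descriptor.shape, esp_descriptor.shape)
```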
5.2 Evaluation criteria and baselines
where \(Prec(i)\) is the precision of the correctly retrieved images at rank \(i\) in the ranking results of a query \(q\), \(rel(q)\) is the set of relevant images for this query, and \(N_{q}\) is the number of all queries.
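The quantities just defined can be computed as in the following sketch, which assumes the common definitions: average precision sums \(Prec(i)\) over the ranks \(i\) at which a relevant image is retrieved and divides by \(|rel(q)|\), mAP averages this over the \(N_{q}\) queries, and the \(F_{1}\) score is the harmonic mean of precision and recall. The function names and the toy ranking are illustrative.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average of Prec(i) over the ranks i at which a relevant image is retrieved."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked_ids, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant)

def mean_average_precision(rankings, relevance):
    """Mean of the per-query average precision over all queries."""
    scores = [average_precision(rankings[q], relevance[q]) for q in rankings]
    return sum(scores) / len(scores)

def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy example: two tag queries with ranked image ids.
rankings = {"sky": ["a", "b", "c", "d"], "tiger": ["d", "a", "b", "c"]}
relevance = {"sky": {"a", "c"}, "tiger": {"d"}}
print(mean_average_precision(rankings, relevance))
```

For the query "sky", relevant images appear at ranks 1 and 3, so its average precision is (1/1 + 2/3) / 2 = 5/6; the query "tiger" scores 1, giving a mean of 11/12.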
For all images in the two standard datasets, our method is compared with several closely related state-of-the-art methods, including TagProp [3], pLSA [13], GMpLSA [27], GMLDA [29], CorrLDA [29], topic regression multimodal latent Dirichlet allocation (trmmLDA) [30], and cssLDA [16].
5.3 Investigate the impact of the setting parameters
5.4 Evaluation of different image representations
5.5 Comparison with existing methods
Comparison of different methods on both Corel5K and ESPGame
Methods | Corel5K \(F_{1}\) score | Corel5K mAP | ESPGame \(F_{1}\) score | ESPGame mAP
TagProp | 0.342 | 0.374 | 0.323 | 0.281
pLSA | 0.193 | 0.235 | 0.185 | 0.156
GMpLSA | 0.254 | 0.289 | 0.213 | 0.179
GMLDA | 0.276 | 0.303 | 0.224 | 0.192
CorrLDA | 0.316 | 0.354 | 0.288 | 0.253
trmmLDA | 0.323 | 0.361 | 0.295 | 0.259
cssLDA | 0.350 | 0.391 | 0.325 | 0.282
csGMLDA | 0.353 | 0.394 | 0.334 | 0.290
6 Conclusions
In this paper, we propose a novel model, denoted as csGMLDA, based on Laplacian regularized uncorrelated tensor representation for image annotation. The proposed annotation method has two characteristics: 1) images are represented by a set of uncorrelated tensorial descriptions, and 2) class-specific information is integrated into semantic learning through an extension of the standard LDA model. The entire problem is formulated within the proposed framework, and csGMLDA is presented to bridge the semantic gap between image contents and annotated tags. The experimental results demonstrate the effectiveness of the proposed method. In future work along this line, we will further exploit region-based tensorial features for discriminative image representation and study the correlation of the class-specific information in a hierarchical LDA formalism.
7 Appendix
Declarations
Acknowledgement
This research was conducted with the support of National Natural Science Foundation of China (Grant No. 61271439).
Authors’ Affiliations
References
 1. H Ma, J Zhu, MT Lyu, I King, Bridging the semantic gap between image contents and tags. IEEE Trans. Multimedia. 12(5), 462–473 (2010).
 2. J Liu, M Li, Q Liu, H Lu, S Ma, Image annotation via graph learning. Pattern Recognit. 42, 218–228 (2009).
 3. M Guillaumin, T Mensink, J Verbeek, C Schmid, in Proc. IEEE Conf. Comput. Vis. Recognit. Tagprop: Discriminative metric learning in nearest neighbor models for image auto-annotation, (2009), pp. 309–316.
 4. N Zhou, W Cheung, G Qiu, X Xue, A hybrid probabilistic model for unified collaborative and content-based image tagging. IEEE Trans. Pattern Anal. Mach. Intell. 33(7), 1281–1294 (2011).
 5. S Yan, D Xu, Q Yang, L Zhang, X Tang, HJ Zhang, in Proc. IEEE Conf. Comput. Vis. Recognit. Discriminant analysis with tensor representation, (2005), pp. 526–532.
 6. Y Liu, Y Liu, S Zhong, K Chan, Tensor distance based multilinear globality preserving embedding: a unified tensor based dimensionality reduction framework for image and video classification. Expert Syst. Appl. 39, 10500–10511 (2012).
 7. Z Zhou, M Zhang, in Proc. Adv. Neural Inf. Process. Syst. Multi-instance multi-label learning with application to scene classification, (2006), pp. 1609–1616.
 8. BK Bao, T Li, S Yan, Hidden-concept driven multi-label image annotation and label ranking. IEEE Trans. Multimedia. 14(1), 199–210 (2012).
 9. G Mesnil, A Bordes, J Weston, G Chechik, Y Bengio, Learning semantic representations of objects and their parts. Mach. Learn. 94(2), 281–301 (2014).
 10. J Li, J Wang, Automatic linguistic indexing of pictures by a statistical modeling approach. IEEE Trans. Pattern Anal. Mach. Intell. 25(9), 1075–1088 (2003).
 11. Q Mao, IH Tsang, S Gao, Objective-guided image annotation. IEEE Trans. Image Process. 22(4), 1585–1597 (2013).
 12. D Blei, A Ng, M Jordan, Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003).
 13. F Monay, D Gatica-Perez, Modeling semantic aspects for cross-media image indexing. IEEE Trans. Pattern Anal. Mach. Intell. 29(10), 1802–1817 (2007).
 14. D Blei, J McAuliffe, in Proc. Adv. Neural Inf. Process. Syst. Supervised topic models, (2008), pp. 121–128.
 15. Q Guo, N Li, Y Yang, G Wu, in IEEE International Conference on Systems, Man, and Cybernetics. Supervised LDA for image annotation, (2011), pp. 471–476.
 16. N Rasiwasia, N Vasconcelos, Latent Dirichlet allocation models for image classification. IEEE Trans. Pattern Anal. Mach. Intell. 35(11), 2665–2679 (2013).
 17. S Yan, D Xu, Q Yang, L Zhang, X Tang, HJ Zhang, Multilinear discriminant analysis for face recognition. IEEE Trans. Image Process. 16(1), 212–220 (2007).
 18. D Tao, X Li, X Wu, S Maybank, General tensor discriminant analysis and Gabor features for gait recognition. IEEE Trans. Pattern Anal. Mach. Intell. 29(10), 1700–1715 (2007).
 19. J Yang, D Zhang, A Frangi, J Yang, Two-dimensional PCA: a new approach to appearance-based face representation and recognition. IEEE Trans. Pattern Anal. Mach. Intell. 26(1), 131–137 (2004).
 20. J Ye, R Janardan, Q Li, in Proc. ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining. GPCA: An efficient dimension reduction scheme for image compression and retrieval, (2004), pp. 354–363.
 21. H Lu, K Plataniotis, A Venetsanopoulos, MPCA: multilinear principal component analysis of tensor objects. IEEE Trans. Neural Netw. 19(1), 18–39 (2008).
 22. H Lu, K Plataniotis, A Venetsanopoulos, Uncorrelated multilinear principal component analysis for unsupervised multilinear subspace learning. IEEE Trans. Neural Netw. 20(11), 1820–1836 (2009).
 23. J Ye, Generalized low rank approximations of matrices. Mach. Learn. 16(1), 167–191 (2005).
 24. M Vasilescu, D Terzopoulos. Eur. Conf. Comput. Vis, (2002), pp. 447–460.
 25. D Xu, S Yan, L Zhang, S Lin, HJ Zhang, T Huang, Reconstruction and recognition of tensor-based objects with concurrent subspaces analysis. IEEE Trans. Circuits Syst. Video Technol. 18(1), 36–47 (2008).
 26. B Zhou, F Zhang, L Peng, Compact representation for dynamic texture video coding using tensor method. IEEE Trans. Circuits Syst. Video Technol. 23(2), 280–288 (2013).
 27. Z Li, Z Shi, X Liu, Z Shi, Modeling continuous visual features for semantic image annotation and retrieval. Pattern Recognit. Lett. 32, 516–523 (2011).
 28. S Nikolopoulos, S Zafeiriou, I Patras, I Kompatsiaris, High-order pLSA for indexing tagged images. Signal Process. 93, 2212–2228 (2013).
 29. D Blei, M Jordan, in Proc. 26th Ann. Int’l ACM SIGIR Conf. Research and Development in Information Retrieval. Modeling annotated data, (2003), pp. 127–134.
 30. D Putthividhya, H Attias, S Nagarajan, in Proc. IEEE Conf. Comput. Vis. Recognit. Topic regression multimodal latent Dirichlet allocation for image annotation, (2010), pp. 3408–3415.
 31. L Zhuang, H Gao, J Luo, Z Lin, Regularized semi-supervised latent Dirichlet allocation for visual concept learning. Neurocomputing. 119, 26–32 (2013).
 32. H Lu, K Plataniotis, A Venetsanopoulos, A survey of multilinear subspace learning for tensor data. Pattern Recognit. 44, 1540–1551 (2011).
 33. WY Ma, B Manjunath, Edgeflow: a technique for boundary detection and image segmentation. IEEE Trans. Image Process. 9(8), 1375–1388 (2000).
 34. H Liu, S Yan, in International Conference on Machine Learning. Robust graph mode seeking by graph shift, (2010), pp. 671–678.
 35. K Murphy, Machine Learning: A Probabilistic Perspective (The MIT Press, 2012).
 36. H Müller, S Marchand-Maillet, T Pun, in Proc. ACM Int’l Conf. Image and Video Retrieval. The truth about Corel evaluation in image retrieval, (2002), pp. 38–49.
 37. L Ahn, L Dabbish, in Proc. ACM SIGCHI Conf. on Human Factors in Computing Systems. Labeling images with a computer game, (2004), pp. 354–363.
 38. T Ahonen, A Hadid, M Pietikainen, Face description with local binary patterns: application to face recognition. Mach. Learn. 28(12), 2037–2041 (2006).
Copyright
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.