
Overview of constrained PARAFAC models

Abstract

In this paper, we present an overview of constrained parallel factor (PARAFAC) models where the constraints model linear dependencies among columns of the factor matrices of the tensor decomposition or, alternatively, the pattern of interactions between different modes of the tensor which are captured by the equivalent core tensor. Some tensor prerequisites, with a particular emphasis on mode combination using Kronecker products of canonical vectors that simplifies matricization operations, are first introduced. This Kronecker product-based approach is also formulated in terms of an index notation, which provides an original and concise formalism for both matricizing tensors and writing tensor models. Then, after a brief reminder of PARAFAC and Tucker models, two families of constrained tensor models, the so-called PARALIND/CONFAC and PARATUCK models, are described in a unified framework, for Nth-order tensors. New tensor models, called nested Tucker models and block PARALIND/CONFAC models, are also introduced. A link between PARATUCK models and constrained PARAFAC models is then established. Finally, new uniqueness properties of PARATUCK models are deduced from sufficient conditions for essential uniqueness of their associated constrained PARAFAC models.

1 Review

1.1 Introduction

Tensor calculus was introduced in differential geometry at the end of the nineteenth century, and tensor analysis was then developed in the context of Einstein's theory of general relativity, with the introduction of index notation, the so-called Einstein summation convention, at the beginning of the twentieth century, which simplifies and shortens physics equations involving tensors. Index notation is also useful for simplifying multivariate statistical calculations, particularly those involving cumulant tensors[1]. Generally speaking, tensors are used in physics and differential geometry for characterizing the properties of a physical system, representing fundamental laws of physics, and defining geometrical objects whose components are functions. When these functions are defined over a continuum of points of a mathematical space, the tensor forms what is called a tensor field, a generalization of the vector field used to solve problems involving curved surfaces or spaces, as is the case for curved space-time in general relativity. From a mathematical point of view, two other approaches are possible for defining tensors, in terms of tensor products of vector spaces, or of multilinear maps. Symmetric tensors can also be linked with homogeneous polynomials[2].

After the first tensor developments by mathematicians and physicists, the need to analyze collections of data matrices that can be seen as three-way data arrays gave rise to three-way models for data analysis, with the pioneering works of Tucker in psychometrics[3] and Harshman in phonetics[4], who proposed what are now referred to as the Tucker and parallel factor (PARAFAC) decompositions, respectively. The PARAFAC decomposition was independently proposed by Carroll and Chang[5] under the name canonical decomposition (CANDECOMP) and then called CANDECOMP/PARAFAC (CP) in[6]. For a history of the development of multi-way models in the context of data analysis, see[7]. Since the 1990s, multi-way analysis has enjoyed growing success in chemistry and especially in chemometrics (see Bro's thesis[8] and the book by Smilde et al.[9] for a description of various chemical applications of three-way models, with a pedagogical presentation of these models and of various algorithms for estimating their parameters). Over the same period, tensor tools were developed for signal processing applications, more particularly for solving the so-called blind source separation (BSS) problem using cumulant tensors (see[10]-[12] and De Lathauwer's thesis[13], where the concept of higher-order singular value decomposition (HOSVD) is introduced, a tensor tool generalizing the standard matrix SVD to arrays of order higher than two). A recent overview of BSS approaches and applications can be found in the handbook co-edited by Comon and Jutten[14].

Nowadays, (higher-order) tensors, also called multi-way arrays in the data analysis community, play an important role in many fields of application for representing and analyzing multidimensional data, as in psychometrics, chemometrics, the food industry, environmental sciences, signal/image processing, computer vision, neuroscience, information sciences, data mining, and pattern recognition, among many others. In these fields, they are simply considered as multidimensional arrays of numbers, constituting a generalization, to orders higher than two, of vectors and matrices, which are first- and second-order tensors, respectively. Tensor decompositions, also called tensor models, are very useful for analyzing multidimensional data in the form of signals, images, speech, music sequences, or texts, and also for designing new systems, as is the case for wireless communication systems since the publication of the seminal paper by Sidiropoulos et al.[15]. Besides the references already cited, overviews of tensor tools, models, algorithms, and applications can be found in[16]-[19].

Tensor models incorporating constraints (sparsity; non-negativity; smoothness; symmetry; column orthonormality of factor matrices; Hankel, Toeplitz, and Vandermonde structured matrix factors; allocation constraints...) have been the object of intensive work in recent years. Such constraints can be inherent to the problem under study or the result of a system design. An overview of the constraints on components of tensor models most often encountered in multi-way data analysis can be found in[7]. Incorporating constraints into tensor models may facilitate the physical interpretability of matrix factors. Moreover, imposing constraints may make it possible to relax uniqueness conditions and to develop specialized parameter estimation algorithms with improved performance, both in terms of accuracy and computational cost, as is the case for CP models with a column-wise orthonormal factor matrix[20]. One can classify the constraints into three main categories: i) sparsity/non-negativity, ii) structural, and iii) linear dependencies/mode interactions. It is worth noting that the three categories of constraints involve specific parameter estimation algorithms, the first two generally inducing an improvement of the uniqueness property of the tensor decomposition, while the third category implies a reduction of uniqueness, named partial uniqueness. We briefly review the main results concerning the first two types of constraints, Section 1.3 of this paper being dedicated to the third category.

Sparse and non‐negative tensor models have recently been the subject of many works in various fields of applications like computer vision[21, 22], image compression[23], hyperspectral imaging[24], music genre classification[25] and audio source separation[26], multi‐channel EEG (electroencephalography) and network traffic analysis[27], fluorescence analysis[28], data denoising and image classification[29], among many others. Two non‐negative tensor models have been more particularly studied in the literature, the so‐called non‐negative tensor factorization (NTF), i.e., PARAFAC models with non‐negativity constraints on the matrix factors, and non‐negative Tucker decomposition (NTD), i.e., Tucker models with non‐negativity constraints on the core tensor and/or the matrix factors. The crucial importance of NTF/NTD for multi‐way data analysis applications results from the very large volume of real‐world data to be analyzed under constraints of sparseness and non‐negativity of factors to be estimated, when only non‐negative parameters are physically interpretable. Many NTF/NTD algorithms are now available. Most of them can be viewed as high‐order extensions of non‐negative matrix factorization (NMF) methods, in the sense that they are based on an alternating minimization of cost functions incorporating sparsity measures (also named distances or divergences) with application of NMF methods to matricized or vectorized forms of the tensor to be decomposed (see for instance[16, 23, 28, 30] for NTF and[29, 31] for NTD). An overview of NMF and NTF/NTD algorithms can be found in[16].

The second category of constraints concerns the case where the core tensor and/or some matrix factors of the tensor model have a special structure. For instance, we recently proposed a nonlinear CDMA scheme for multiuser SIMO communication systems that is based on a constrained block‐Tucker2 model whose core tensor, composed of the information symbols to be transmitted and their powers up to a certain degree, is characterized by matrix slices having a Vandermonde or a Hankel structure[32, 33]. We also developed Volterra‐PARAFAC models for nonlinear system modeling and identification. These models are obtained by expanding high‐order Volterra kernels, viewed as symmetric tensors, by means of symmetric or doubly symmetric PARAFAC decompositions[34, 35]. Block structured nonlinear systems like Wiener, Hammerstein, and parallel‐cascade Wiener systems can be identified from their associated Volterra kernels that admit symmetric PARAFAC decompositions with Toeplitz factors[36, 37]. Symmetric PARAFAC models with Hankel factors and symmetric block PARAFAC models with block Hankel factors are encountered for blind identification of multiple‐input multiple‐output (MIMO) linear channels using fourth‐order cumulant tensors, in the cases of memoryless and convolutive channels, respectively[38, 39]. In the presence of structural constraints, specific estimation algorithms can be derived as it is the case for symmetric CP decompositions[40], CP decompositions with Toeplitz factors (in[41], an iterative solution was proposed, whereas in[42], a non‐iterative algorithm was developed), Vandermonde factors[43], circulant factors[44], banded and/or structured matrix factors[45, 46], and also for Hankel and Vandermonde structured core tensors[33].

The rest of this paper is organized as follows: In Section 1.2, we present some tensor prerequisites, with a particular emphasis on mode combination using Kronecker products of canonical vectors, which simplifies the matricization operations, especially for deriving matrix representations of tensor models. This Kronecker product-based approach is also formulated in terms of an index notation, which provides an original and concise formalism for both matricizing tensors and writing tensor models. Then, we present the two most common tensor models, the so-called Tucker and PARAFAC models, in a general framework, i.e., for Nth-order tensors. In Section 1.3, two families of constrained tensor models, the so-called PARALIND/CONFAC and PARATUCK models, are described in a unified way, with a generalization to Nth-order tensors. New tensor models, called nested Tucker models and block PARALIND/CONFAC models, are also introduced. A link between PARATUCK models and constrained PARAFAC models is also established. In Section 1.4, uniqueness properties of PARATUCK models are deduced using this link. The paper is concluded in Section 2.

Notations and definitions. $\mathbb{R}$ and $\mathbb{C}$ denote the fields of real and complex numbers, respectively. Scalars, column vectors, matrices, and higher-order tensors are denoted by lowercase, boldface lowercase, boldface uppercase, and calligraphic uppercase letters, e.g., $a$, $\mathbf{a}$, $\mathbf{A}$, and $\mathcal{A}$, respectively. The vector $\mathbf{A}_{i.}$ (resp. $\mathbf{A}_{.j}$) represents the $i$th row (resp. $j$th column) of $\mathbf{A}$.

$\mathbf{I}_N$, $\mathbf{1}_N^T$, and $e_n^{(N)}$ stand for the identity matrix of order $N$, the all-ones row vector of dimensions $1\times N$, and the $n$th canonical vector of the Euclidean space $\mathbb{R}^N$, respectively.

$\mathbf{A}^T$, $\mathbf{A}^H$, $\mathbf{A}^{\dagger}$, $\mathrm{tr}(\mathbf{A})$, and $r_{\mathbf{A}}$ denote the transpose, the conjugate (Hermitian) transpose, the Moore-Penrose pseudo-inverse, the trace, and the rank of $\mathbf{A}$, respectively. $D_i(\mathbf{A}) = \mathrm{diag}(\mathbf{A}_{i.})$ represents the diagonal matrix having the elements of the $i$th row of $\mathbf{A}$ on its diagonal. The operator $\mathrm{bdiag}(\cdot)$ forms a block-diagonal matrix from its matrix arguments, while the operator $\mathrm{vec}(\cdot)$ transforms a matrix into a column vector by stacking the columns of its matrix argument one on top of the other. In the case of a tensor, the $\mathrm{vec}(\cdot)$ operation is defined in (6).

The outer product (also called tensor product) and the matrix Kronecker, Khatri-Rao (column-wise Kronecker), and Hadamard (element-wise) products are denoted by $\circ$, $\otimes$, $\diamond$, and $\odot$, respectively.

Let us consider the set $S = \{n_1,\ldots,n_N\}$ obtained by permuting the elements of the set $\{1,\ldots,N\}$. For $\mathbf{A}^{(n)}\in\mathbb{C}^{I_n\times R_n}$ and $\mathbf{u}^{(n)}\in\mathbb{C}^{I_n\times 1}$, $n=1,\ldots,N$, we define

$$\mathop{\otimes}_{n\in S}\mathbf{A}^{(n)} = \mathbf{A}^{(n_1)}\otimes\mathbf{A}^{(n_2)}\otimes\cdots\otimes\mathbf{A}^{(n_N)} \in \mathbb{C}^{I_{n_1}\cdots I_{n_N}\times R_{n_1}\cdots R_{n_N}};$$
(1)
$$\begin{aligned} \mathop{\diamond}_{n\in S}\mathbf{A}^{(n)} &= \mathbf{A}^{(n_1)}\diamond\mathbf{A}^{(n_2)}\diamond\cdots\diamond\mathbf{A}^{(n_N)} \in \mathbb{C}^{I_{n_1}\cdots I_{n_N}\times R}, && \text{when } R_n = R,\ n=1,\ldots,N;\\ \mathop{\odot}_{n\in S}\mathbf{A}^{(n)} &= \mathbf{A}^{(n_1)}\odot\mathbf{A}^{(n_2)}\odot\cdots\odot\mathbf{A}^{(n_N)} \in \mathbb{C}^{I\times R}, && \text{when } I_n = I \text{ and } R_n = R,\ n=1,\ldots,N;\\ \mathop{\circ}_{n\in S}\mathbf{u}^{(n)} &= \mathbf{u}^{(n_1)}\circ\mathbf{u}^{(n_2)}\circ\cdots\circ\mathbf{u}^{(n_N)} \in \mathbb{C}^{I_{n_1}\times\cdots\times I_{n_N}}. \end{aligned}$$
(2)

The outer product of N non‐zero vectors defines a rank‐one tensor of order N.

By convention, the order of the dimensions is directly related to the order of variation of the associated indices. For instance, in (1) and (2), the product $I_{n_1} I_{n_2}\cdots I_{n_N}$ of dimensions means that $i_{n_1}$ is the index varying the most slowly while $i_{n_N}$ is the index varying the most rapidly in the computation of the Kronecker products.

For $S = \{1,\ldots,N\}$, we have the following identities:

$$\left(\mathop{\circ}_{n\in S}\mathbf{u}^{(n)}\right)_{i_1,\ldots,i_N} = \left(\mathop{\circ}_{n=1}^{N}\mathbf{u}^{(n)}\right)_{i_1,\ldots,i_N} = \prod_{n=1}^{N} u^{(n)}_{i_n},\qquad \left(\mathop{\otimes}_{n\in S}\mathbf{u}^{(n)}\right)_{i} = \left(\mathop{\otimes}_{n=1}^{N}\mathbf{u}^{(n)}\right)_{i} = \prod_{n=1}^{N} u^{(n)}_{i_n}\quad\text{with } i = i_N + \sum_{n=1}^{N-1}(i_n - 1)\prod_{j=n+1}^{N} I_j.$$
(3)

In particular, for $\mathbf{u}\in\mathbb{C}^{I\times 1}$, $\mathbf{v}\in\mathbb{C}^{J\times 1}$, $\mathbf{w}\in\mathbb{C}^{K\times 1}$,

$$\mathcal{X} = \mathbf{u}\circ\mathbf{v}\circ\mathbf{w}\in\mathbb{C}^{I\times J\times K} \;\Leftrightarrow\; x_{ijk} = u_i v_j w_k,\qquad \mathbf{x} = \mathbf{u}\otimes\mathbf{v}\otimes\mathbf{w}\in\mathbb{C}^{IJK\times 1} \;\Leftrightarrow\; x_{k+(j-1)K+(i-1)JK} = u_i v_j w_k.$$
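As a quick numerical illustration of this correspondence (a minimal NumPy sketch; the variable names and dimensions are ours and purely illustrative), the row-major vectorization of the rank-one tensor $\mathbf{u}\circ\mathbf{v}\circ\mathbf{w}$ coincides with the Kronecker product $\mathbf{u}\otimes\mathbf{v}\otimes\mathbf{w}$, with the last index varying fastest:

```python
import numpy as np

I, J, K = 2, 3, 4
rng = np.random.default_rng(0)
u, v, w = rng.standard_normal(I), rng.standard_normal(J), rng.standard_normal(K)

# Rank-one third-order tensor: x_{ijk} = u_i v_j w_k (outer product).
X = np.einsum('i,j,k->ijk', u, v, w)

# Row-major vectorization (i slowest, k fastest), as in the index formula of (3):
x_vec = X.reshape(-1)
x_kron = np.kron(np.kron(u, v), w)     # u ⊗ v ⊗ w
print(np.allclose(x_vec, x_kron))      # True
```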

Some useful matrix formulae are recalled in Appendix 1.

1.2 Tensor prerequisites

In this paper, a tensor is simply viewed as a multidimensional array of measurements. Depending on whether these measurements are real- or complex-valued, we have a real- or complex-valued tensor, respectively. The order $N$ of a tensor refers to the number of indices that characterize its elements $x_{i_1,\ldots,i_N}$, each index $i_n$ ($i_n = 1,\ldots,I_n$, for $n=1,\ldots,N$) being associated with a dimension, also called a way or a mode, $I_n$ denoting the mode-$n$ dimension.

An $N$th-order complex-valued tensor $\mathcal{X}\in\mathbb{C}^{I_1\times\cdots\times I_N}$, also called an $N$-way array, of dimensions $I_1\times\cdots\times I_N$, can be written as

$$\mathcal{X} = \sum_{i_1=1}^{I_1}\cdots\sum_{i_N=1}^{I_N} x_{i_1,\ldots,i_N}\;\mathop{\circ}_{n=1}^{N} e_{i_n}^{(I_n)}.$$
(4)

The coefficients $x_{i_1,\ldots,i_N}$ represent the coordinates of $\mathcal{X}$ in the canonical basis $\left\{\mathop{\circ}_{n=1}^{N} e_{i_n}^{(I_n)},\ i_n = 1,\ldots,I_n;\ n = 1,\ldots,N\right\}$ of the space $\mathbb{C}^{I_1\times\cdots\times I_N}$.

The identity tensor of order $N$ and dimensions $I\times\cdots\times I$, denoted by $\mathcal{I}_{N,I}$ or simply $\mathcal{I}$, is a diagonal hypercubic tensor whose elements $\delta_{i_1,\ldots,i_N}$ are defined by means of the generalized Kronecker delta, i.e., $\delta_{i_1,\ldots,i_N} = 1$ if $i_1 = \cdots = i_N$ and $0$ otherwise, with $I_n = I$, $n=1,\ldots,N$. It can be written as

$$\mathcal{I}_{N,I} = \sum_{i=1}^{I}\underbrace{e_i^{(I)}\circ\cdots\circ e_i^{(I)}}_{N\ \text{terms}}.$$

Different reduced-order tensors can be obtained by slicing the tensor $\mathcal{X}\in\mathbb{C}^{I_1\times\cdots\times I_N}$ along one mode or $p$ modes, i.e., by fixing one index $i_n$ or a set of $p$ indices $\{i_{n_1},\ldots,i_{n_p}\}$, which gives a tensor of order $N-1$ or $N-p$, respectively. For instance, by slicing $\mathcal{X}$ along its mode $n$, we get the $i_n$th mode-$n$ slice of $\mathcal{X}$, denoted by $\mathcal{X}_{i_n}$, which can be written as

$$\mathcal{X}_{i_n} = \sum_{i_1=1}^{I_1}\cdots\sum_{i_{n-1}=1}^{I_{n-1}}\sum_{i_{n+1}=1}^{I_{n+1}}\cdots\sum_{i_N=1}^{I_N} x_{i_1,\ldots,i_n,\ldots,i_N}\; e_{i_{n+1}}^{(I_{n+1})}\circ\cdots\circ e_{i_N}^{(I_N)}\circ e_{i_1}^{(I_1)}\circ\cdots\circ e_{i_{n-1}}^{(I_{n-1})} \in \mathbb{C}^{I_{n+1}\times\cdots\times I_N\times I_1\times\cdots\times I_{n-1}}.$$

For instance, by slicing the third-order tensor $\mathcal{X}\in\mathbb{C}^{I\times J\times K}$ along each mode, we get three types of matrix slices, respectively called horizontal, lateral, and frontal slices:

$$\mathbf{X}_{i..}\in\mathbb{C}^{J\times K},\quad \mathbf{X}_{.j.}\in\mathbb{C}^{K\times I},\quad\text{and}\quad \mathbf{X}_{..k}\in\mathbb{C}^{I\times J},\qquad\text{with } i=1,\ldots,I;\ j=1,\ldots,J;\ k=1,\ldots,K.$$

1.2.1 Tensor Hadamard product

Consider $\mathcal{A}\in\mathbb{C}^{R_1\times\cdots\times R_N\times I_1\times\cdots\times I_{P_1}}$ and $\mathcal{B}\in\mathbb{C}^{R_1\times\cdots\times R_N\times I_{P_1+1}\times\cdots\times I_P}$, where $\{i_1,\ldots,i_{P_1}\}$ and $\{i_{P_1+1},\ldots,i_P\}$ are two disjoint ordered subsets of the set of indices $\{i_1,\ldots,i_P\}$, and $\mathcal{R} = \{r_1,\ldots,r_N\}$.

We define the Hadamard product of $\mathcal{A}$ with $\mathcal{B}$ along their common modes as the tensor $\mathcal{C}\in\mathbb{C}^{R_1\times\cdots\times R_N\times I_1\times\cdots\times I_P}$ such that

$$\mathcal{C} = \mathcal{A}\odot_{\mathcal{R}}\mathcal{B} \;\Leftrightarrow\; c_{r_1,\ldots,r_N,i_1,\ldots,i_P} = a_{r_1,\ldots,r_N,i_1,\ldots,i_{P_1}}\, b_{r_1,\ldots,r_N,i_{P_1+1},\ldots,i_P}.$$

For instance, given two third-order tensors $\mathcal{A}\in\mathbb{C}^{R_1\times R_2\times I_1}$ and $\mathcal{B}\in\mathbb{C}^{R_1\times R_2\times I_2}$, the Hadamard product $\mathcal{A}\odot_{\{r_1,r_2\}}\mathcal{B}$ gives a fourth-order tensor $\mathcal{C}\in\mathbb{C}^{R_1\times R_2\times I_1\times I_2}$ such that

$$c_{r_1,r_2,i_1,i_2} = a_{r_1,r_2,i_1}\, b_{r_1,r_2,i_2}.$$

Such a tensor Hadamard product can be calculated by means of the matrix Hadamard product of matrix unfoldings of extended tensors, as defined in (21) and (22) (see also (94) to (96) in Appendix 2). For the example above, we have

$$\mathbf{C}_{R_1R_2\times I_1I_2} = \left[\mathbf{A}_{R_1R_2\times I_1}\left(\mathbf{I}_{I_1}\otimes\mathbf{1}_{I_2}^T\right)\right]\odot\left[\mathbf{B}_{R_1R_2\times I_2}\left(\mathbf{1}_{I_1}^T\otimes\mathbf{I}_{I_2}\right)\right].$$
1.2.1.0 Example

For $\mathbf{A}_{R\times I_1} = \begin{bmatrix} a_1 & a_2\\ a_3 & a_4\end{bmatrix}$, $\mathbf{B}_{R\times I_2} = \begin{bmatrix} b_1 & b_2\\ b_3 & b_4\end{bmatrix}$, and the tensor $\mathcal{C}$ such that $c_{r,i_1,i_2} = a_{r,i_1}\, b_{r,i_2}$, a mode-1 flat matrix unfolding of $\mathcal{C}$ is given by

$$\mathbf{C}_{R\times I_1I_2} = \left[\mathbf{A}_{R\times I_1}\left(\mathbf{I}_2\otimes\mathbf{1}_2^T\right)\right]\odot\left[\mathbf{B}_{R\times I_2}\left(\mathbf{1}_2^T\otimes\mathbf{I}_2\right)\right] = \begin{bmatrix} a_1 & a_1 & a_2 & a_2\\ a_3 & a_3 & a_4 & a_4\end{bmatrix}\odot\begin{bmatrix} b_1 & b_2 & b_1 & b_2\\ b_3 & b_4 & b_3 & b_4\end{bmatrix} = \begin{bmatrix} a_1b_1 & a_1b_2 & a_2b_1 & a_2b_2\\ a_3b_3 & a_3b_4 & a_4b_3 & a_4b_4\end{bmatrix}.$$
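This construction is easy to check numerically; the following NumPy sketch (with arbitrary dimensions of our own choosing) builds the tensor Hadamard product both element-wise and through the matrix Hadamard product of the extended unfoldings:

```python
import numpy as np

R, I1, I2 = 2, 3, 4
rng = np.random.default_rng(1)
A = rng.standard_normal((R, I1))     # unfolding A_{R x I1}
B = rng.standard_normal((R, I2))     # unfolding B_{R x I2}

# Tensor Hadamard product along the common mode r: c_{r,i1,i2} = a_{r,i1} b_{r,i2}.
C = np.einsum('ri,rj->rij', A, B)

# Same result via matrix Hadamard product of the extended unfoldings.
C_unf = (A @ np.kron(np.eye(I1), np.ones((1, I2)))) * \
        (B @ np.kron(np.ones((1, I1)), np.eye(I2)))
print(np.allclose(C.reshape(R, I1 * I2), C_unf))   # True
```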

1.2.2 Mode combination

Different contraction operations can be defined depending on the way the modes are combined. Let us partition the set $\{1,\ldots,N\}$ into $N_1$ ordered subsets $S_{n_1}$, composed of $p(n_1)$ elements, with $\sum_{n_1=1}^{N_1} p(n_1) = N$. Each subset $S_{n_1}$ is associated with a combined mode of dimension $J_{n_1} = \prod_{n\in S_{n_1}} I_n$. These mode combinations allow the $N$th-order tensor $\mathcal{X}\in\mathbb{C}^{I_1\times\cdots\times I_N}$ to be rewritten as an $N_1$th-order tensor $\mathcal{Y}\in\mathbb{C}^{J_1\times\cdots\times J_{N_1}}$ as follows:

$$\mathcal{Y} = \sum_{j_1=1}^{J_1}\cdots\sum_{j_{N_1}=1}^{J_{N_1}} x_{j_1,\ldots,j_{N_1}}\;\mathop{\circ}_{n_1=1}^{N_1} e_{j_{n_1}}^{(J_{n_1})}\qquad\text{with}\qquad e_{j_{n_1}}^{(J_{n_1})} = \mathop{\otimes}_{n\in S_{n_1}} e_{i_n}^{(I_n)}.$$
(5)
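With the ordering convention adopted above (within each combined mode, the last index varies fastest), such a mode combination corresponds, in a row-major array library such as NumPy, to a plain reshape; a small illustrative sketch (dimensions and names are ours):

```python
import numpy as np

I1, I2, I3, I4 = 2, 3, 4, 5
rng = np.random.default_rng(2)
X = rng.standard_normal((I1, I2, I3, I4))

# Combine modes {1,2} and {3,4}: J1 = I1*I2, J2 = I3*I4.
Y = X.reshape(I1 * I2, I3 * I4)

# Spot check (0-based indices): y_{j1,j2} = x_{i1,i2,i3,i4}
# with j1 = i2 + i1*I2 and j2 = i4 + i3*I4.
i1, i2, i3, i4 = 1, 2, 3, 4
print(np.isclose(Y[i2 + i1 * I2, i4 + i3 * I4], X[i1, i2, i3, i4]))   # True
```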

Two particular mode combinations corresponding to the vectorization and matricization operations are now detailed.

1.2.3 Vectorization

The vectorization of $\mathcal{X}\in\mathbb{C}^{I_1\times\cdots\times I_N}$ is associated with a combination of the $N$ modes into a single mode of dimension $J = \prod_{n=1}^{N} I_n$, which amounts to replacing the outer product in (4) by the Kronecker product:

$$\mathrm{vec}(\mathcal{X}) = \sum_{i_1=1}^{I_1}\cdots\sum_{i_N=1}^{I_N} x_{i_1,\ldots,i_N}\;\mathop{\otimes}_{n=1}^{N} e_{i_n}^{(I_n)} \in \mathbb{C}^{I_1\cdots I_N\times 1},$$
(6)

the element $x_{i_1,\ldots,i_N}$ of $\mathcal{X}$ being the $i$th entry of $\mathrm{vec}(\mathcal{X})$, with $i$ defined as in (3).

The vectorization can also be carried out after a permutation of indices π(i n ),n=1,…,N.

1.2.4 Matricization or unfolding

There are different ways of matricizing the tensor $\mathcal{X}$ according to the partitioning of the set $\{1,\ldots,N\}$ into two ordered subsets $S_1$ and $S_2$, composed of $p$ and $N-p$ indices, respectively. A general formula for the matricization, for $p\in\{1,\ldots,N-1\}$, is

$$\mathbf{X}_{S_1;S_2} = \sum_{i_1=1}^{I_1}\cdots\sum_{i_N=1}^{I_N} x_{i_1,\ldots,i_N}\left(\mathop{\otimes}_{n\in S_1} e_{i_n}^{(I_n)}\right)\left(\mathop{\otimes}_{n\in S_2} e_{i_n}^{(I_n)}\right)^T \in \mathbb{C}^{J_1\times J_2}$$
(7)

with $J_{n_1} = \prod_{n\in S_{n_1}} I_n$, for $n_1 = 1$ and $2$. From (7), we can deduce the following expression of the element $x_{i_1,\ldots,i_N}$ in terms of the matrix unfolding $\mathbf{X}_{S_1;S_2}$:

$$x_{i_1,\ldots,i_N} = \left(\mathop{\otimes}_{n\in S_1} e_{i_n}^{(I_n)}\right)^T\mathbf{X}_{S_1;S_2}\left(\mathop{\otimes}_{n\in S_2} e_{i_n}^{(I_n)}\right).$$
(8)

1.2.5 Particular case: mode‐ n matrix unfoldings X n

A flat mode-$n$ matrix unfolding of the tensor $\mathcal{X}$ corresponds to an unfolding of the form $\mathbf{X}_{S_1;S_2}$ with $S_1 = \{n\}$ and $S_2 = \{n+1,\ldots,N,1,\ldots,n-1\}$, which gives

$$\mathbf{X}_{I_n\times I_{n+1}\cdots I_N I_1\cdots I_{n-1}} = \mathbf{X}_n = \sum_{i_1=1}^{I_1}\cdots\sum_{i_N=1}^{I_N} x_{i_1,\ldots,i_N}\; e_{i_n}^{(I_n)}\left(\mathop{\otimes}_{m\in S_2} e_{i_m}^{(I_m)}\right)^T \in \mathbb{C}^{I_n\times I_{n+1}\cdots I_N I_1\cdots I_{n-1}}.$$
(9)

We can also define a tall mode-$n$ matrix unfolding of $\mathcal{X}$ by choosing $S_1 = \{n+1,\ldots,N,1,\ldots,n-1\}$ and $S_2 = \{n\}$. Then, we have $\mathbf{X}_{I_{n+1}\cdots I_N I_1\cdots I_{n-1}\times I_n} = \mathbf{X}_n^T \in \mathbb{C}^{I_{n+1}\cdots I_N I_1\cdots I_{n-1}\times I_n}$.

The column vectors of a flat mode-$n$ matrix unfolding $\mathbf{X}_n$ are the mode-$n$ vectors of $\mathcal{X}$, and the rank of $\mathbf{X}_n$, i.e., the dimension of the mode-$n$ linear space spanned by the mode-$n$ vectors, is called the mode-$n$ rank of $\mathcal{X}$, denoted by $\mathrm{rank}_n(\mathcal{X})$.

In the case of a third-order tensor $\mathcal{X}\in\mathbb{C}^{I\times J\times K}$, there are six different flat unfoldings, denoted $\mathbf{X}_{I\times JK}$, $\mathbf{X}_{I\times KJ}$, $\mathbf{X}_{J\times KI}$, $\mathbf{X}_{J\times IK}$, $\mathbf{X}_{K\times IJ}$, and $\mathbf{X}_{K\times JI}$. For instance, we have

$$\mathbf{X}_{I\times JK} = \mathbf{X}_{\{1\};\{2,3\}} = \sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K} x_{i,j,k}\; e_i^{(I)}\left(e_j^{(J)}\otimes e_k^{(K)}\right)^T.$$
(10)

Using the properties (84), (85), and (87) of the Kronecker product gives

$$\mathbf{X}_{I\times JK} = \sum_{j=1}^{J}\left(e_j^{(J)}\right)^T\otimes\left(\sum_{i=1}^{I}\sum_{k=1}^{K} x_{i,j,k}\; e_i^{(I)}\left(e_k^{(K)}\right)^T\right) = \sum_{j=1}^{J}\left(e_j^{(J)}\right)^T\otimes(\mathbf{X}_{.j.})^T = \begin{bmatrix}\mathbf{X}_{.1.}^T & \cdots & \mathbf{X}_{.J.}^T\end{bmatrix}\in\mathbb{C}^{I\times JK}.$$

Similarly, there are six tall matrix unfoldings, denoted $\mathbf{X}_{JK\times I}$, $\mathbf{X}_{KJ\times I}$, $\mathbf{X}_{KI\times J}$, $\mathbf{X}_{IK\times J}$, $\mathbf{X}_{IJ\times K}$, and $\mathbf{X}_{JI\times K}$, as for instance

$$\mathbf{X}_{JK\times I} = \sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K} x_{i,j,k}\left(e_j^{(J)}\otimes e_k^{(K)}\right)\left(e_i^{(I)}\right)^T = \mathbf{X}_{I\times JK}^T\in\mathbb{C}^{JK\times I}.$$
(11)

Applying (8) to (10) gives

$$x_{i,j,k} = \left(e_i^{(I)}\right)^T\mathbf{X}_{I\times JK}\left(e_j^{(J)}\otimes e_k^{(K)}\right) = \left[\mathbf{X}_{I\times JK}\right]_{i,(j-1)K+k}.$$
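In NumPy, with row-major storage, these flat and tall unfoldings of a third-order array reduce to reshapes and transposes; a short illustrative sketch (dimensions are arbitrary):

```python
import numpy as np

I, J, K = 2, 3, 4
rng = np.random.default_rng(3)
X = rng.standard_normal((I, J, K))

# Flat mode-1 unfolding X_{I x JK}: column index combines (j, k), k varying fastest.
X_I_JK = X.reshape(I, J * K)

# Element access as in (8)/(10): x_{ijk} = [X_{I x JK}]_{i, (j-1)K + k} (1-based),
# i.e., column index j*K + k with 0-based NumPy indexing.
i, j, k = 1, 2, 3
print(np.isclose(X_I_JK[i, j * K + k], X[i, j, k]))   # True

# The tall unfolding X_{JK x I} is simply its transpose, cf. (11).
X_JK_I = X_I_JK.T
```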

1.2.6 Mode‐ n product of a tensor with a matrix or a vector

The mode-$n$ product of $\mathcal{X}\in\mathbb{C}^{I_1\times\cdots\times I_N}$ with $\mathbf{A}\in\mathbb{C}^{J_n\times I_n}$ along the $n$th mode, denoted by $\mathcal{X}\times_n\mathbf{A}$, gives the tensor $\mathcal{Y}$ of order $N$ and dimensions $I_1\times\cdots\times I_{n-1}\times J_n\times I_{n+1}\times\cdots\times I_N$, such that[47]

$$y_{i_1,\ldots,i_{n-1},j_n,i_{n+1},\ldots,i_N} = \sum_{i_n=1}^{I_n} a_{j_n,i_n}\, x_{i_1,\ldots,i_{n-1},i_n,i_{n+1},\ldots,i_N},$$
(12)

which can be expressed in terms of the mode-$n$ matrix unfoldings of $\mathcal{Y}$ and $\mathcal{X}$ as

$$\mathbf{Y}_n = \mathbf{A}\,\mathbf{X}_n.$$

This operation can be interpreted as the linear map from the mode-$n$ space of $\mathcal{X}$ to the mode-$n$ space of $\mathcal{Y}$, associated with the matrix $\mathbf{A}$.
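A possible NumPy implementation of the mode-$n$ product and of the flat mode-$n$ unfolding, together with a check of $\mathbf{Y}_n = \mathbf{A}\mathbf{X}_n$ (the helper names `mode_n_product` and `unfold`, the 0-based mode indices, and the dimensions are ours):

```python
import numpy as np

def unfold(X, n):
    """Flat mode-n unfolding: rows indexed by mode n, columns by modes
    n+1, ..., N, 1, ..., n-1 (the last one varying fastest)."""
    order = [n] + list(range(n + 1, X.ndim)) + list(range(n))
    return np.transpose(X, order).reshape(X.shape[n], -1)

def mode_n_product(X, A, n):
    """Mode-n product X x_n A: contracts the columns of A with mode n of X."""
    return np.moveaxis(np.tensordot(A, X, axes=(1, n)), 0, n)

I1, I2, I3, J2 = 2, 3, 4, 5
rng = np.random.default_rng(4)
X = rng.standard_normal((I1, I2, I3))
A = rng.standard_normal((J2, I2))

Y = mode_n_product(X, A, 1)                            # shape (I1, J2, I3)
print(np.allclose(unfold(Y, 1), A @ unfold(X, 1)))     # Y_n = A X_n -> True
```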

The mode-$n$ product of $\mathcal{X}\in\mathbb{C}^{I_1\times\cdots\times I_N}$ with the row vector $\mathbf{u}^T\in\mathbb{C}^{1\times I_n}$ along the $n$th mode, denoted by $\mathcal{X}\times_n\mathbf{u}^T$, gives a tensor of order $N-1$ and dimensions $I_1\times\cdots\times I_{n-1}\times I_{n+1}\times\cdots\times I_N$, such that

$$y_{i_1,\ldots,i_{n-1},i_{n+1},\ldots,i_N} = \sum_{i_n=1}^{I_n} u_{i_n}\, x_{i_1,\ldots,i_{n-1},i_n,i_{n+1},\ldots,i_N},$$

which can be written in vectorized form as $\mathrm{vec}^T(\mathcal{Y}) = \mathbf{u}^T\mathbf{X}_n\in\mathbb{C}^{1\times I_{n+1}\cdots I_N I_1\cdots I_{n-1}}$.

When multiplying an $N$th-order tensor by row vectors along $p$ different modes, we get a tensor of order $N-p$. For instance, for a third-order tensor $\mathcal{X}\in\mathbb{C}^{I\times J\times K}$, we have

$$\mathbf{x}_{ij.} = \mathcal{X}\times_1\left(e_i^{(I)}\right)^T\times_2\left(e_j^{(J)}\right)^T,\qquad x_{ijk} = \mathcal{X}\times_1\left(e_i^{(I)}\right)^T\times_2\left(e_j^{(J)}\right)^T\times_3\left(e_k^{(K)}\right)^T.$$

Considering an ordered subset $S = \{m_1,\ldots,m_P\}$ of the set $\{1,\ldots,N\}$, a series of mode-$m_p$ products of $\mathcal{X}\in\mathbb{C}^{I_1\times\cdots\times I_N}$ with $\mathbf{A}^{(m_p)}\in\mathbb{C}^{J_{m_p}\times I_{m_p}}$, $p\in\{1,\ldots,P\}$, $P\le N$, will be concisely noted as

$$\mathcal{X}\times_{m_1}\mathbf{A}^{(m_1)}\cdots\times_{m_P}\mathbf{A}^{(m_P)} = \mathcal{X}\times_{m=m_1}^{m_P}\mathbf{A}^{(m)}.$$
1.2.6.0 Properties
  • For any permutation $\pi(\cdot)$ of $P$ distinct indices $m_p\in\{1,\ldots,N\}$ such that $q_p = \pi(m_p)$, $p\in\{1,\ldots,P\}$, with $P\le N$, we have

    $$\mathcal{X}\times_{q=q_1}^{q_P}\mathbf{A}^{(q)} = \mathcal{X}\times_{m=m_1}^{m_P}\mathbf{A}^{(m)},$$

    which means that the order of the mode‐ m p products is irrelevant when the indices m p are all distinct.

  • For two products of $\mathcal{X}\in\mathbb{C}^{I_1\times\cdots\times I_N}$ along the same mode $n$, with $\mathbf{A}\in\mathbb{C}^{J_n\times I_n}$ and $\mathbf{B}\in\mathbb{C}^{K_n\times J_n}$, we have[13]

    $$\mathcal{Y} = \mathcal{X}\times_n\mathbf{A}\times_n\mathbf{B} = \mathcal{X}\times_n(\mathbf{B}\mathbf{A})\in\mathbb{C}^{I_1\times\cdots\times I_{n-1}\times K_n\times I_{n+1}\times\cdots\times I_N}.$$
    (13)

1.2.7 Kronecker product‐based approach using index notation

In this subsection, we propose to reformulate our Kronecker product-based approach for tensor matricization in terms of an index notation introduced in[48]. Using this index notation, for $\mathbf{u}\in\mathbb{C}^{I\times1}$, $\mathbf{v}^T\in\mathbb{C}^{1\times J}$, and $\mathbf{X}\in\mathbb{C}^{I\times J}$, we can write

$$\mathbf{u} = \sum_{i=1}^{I} u_i\, e_i^{(I)} = u_i\, e_i,\qquad \mathbf{v}^T = \sum_{j=1}^{J} v_j\left(e_j^{(J)}\right)^T = v_j\, e^j,\qquad \mathbf{X} = \sum_{i=1}^{I}\sum_{j=1}^{J} x_{ij}\, e_i^{(I)}\left(e_j^{(J)}\right)^T = x_{ij}\, e_i^{\,j},\qquad \mathbf{X}^T = x_{ij}\, e_j^{\,i},\qquad \mathrm{vec}(\mathbf{X}) = x_{ij}\, e_{ji}.$$

As with the Einstein summation convention, the index notation allows summation signs to be dropped. If an index $i\in[1,I]$ is repeated in an expression (or, more generally, in a term of an equation), this means that the expression (or the term) must be summed over that index from 1 to $I$. However, it is worth noting two differences between the index notation used in this paper and the Einstein summation convention: (i) each index can be repeated more than twice in any expression, and (ii) the index notation can be used with ordered sets of indices. We have to notice that the index notation can be interpreted in terms of two separate combinations of indices, one associated with the column (superscript) indices and the other one with the row (subscript) indices, with the following rules:

  • the ordering of the column indices is independent of that of the row indices;

  • the ordering of the column and row indices cannot be changed.

Considering the set $S = \{n_1,\ldots,n_N\}$ obtained by permuting the elements of $\{1,\ldots,N\}$ and defining the ordered set of indices $\mathcal{I} = \{i_{n_1},\ldots,i_{n_N}\}$ associated with $S$, we denote by $e_{\mathcal{I}}$ and $e^{\mathcal{I}}$ the Kronecker products $\mathop{\otimes}_{n\in S} e_{i_n}$ and $\mathop{\otimes}_{n\in S}\left(e_{i_n}\right)^T$, respectively. So, we have

$$\mathop{\otimes}_{n\in S}\mathbf{u}^{(n)} = \left(\prod_{n\in S} u_{i_n}^{(n)}\right) e_{\mathcal{I}}.$$
(14)

Partitioning two ordered sets of indices $\mathcal{I}$ and $\mathcal{J}$ into two subsets $(\mathcal{I}_1,\mathcal{I}_2)$ and $(\mathcal{J}_1,\mathcal{J}_2)$, respectively, the rules stated previously imply the following identities:

$$e_{\mathcal{I}}^{\,\mathcal{J}} = e_{\mathcal{I}}\otimes e^{\mathcal{J}} = e^{\mathcal{J}}\otimes e_{\mathcal{I}} = e_{\mathcal{I}_1\mathcal{I}_2}^{\,\mathcal{J}_1\mathcal{J}_2} = e_{\mathcal{I}_1}\otimes e_{\mathcal{I}_2}\otimes e^{\mathcal{J}_1}\otimes e^{\mathcal{J}_2} = e_{\mathcal{I}_1}\otimes e^{\mathcal{J}_1}\otimes e_{\mathcal{I}_2}\otimes e^{\mathcal{J}_2} = e_{\mathcal{I}_1}\otimes e^{\mathcal{J}_1}\otimes e^{\mathcal{J}_2}\otimes e_{\mathcal{I}_2} = e^{\mathcal{J}_1}\otimes e^{\mathcal{J}_2}\otimes e_{\mathcal{I}_1}\otimes e_{\mathcal{I}_2} = e^{\mathcal{J}_1}\otimes e_{\mathcal{I}_1}\otimes e^{\mathcal{J}_2}\otimes e_{\mathcal{I}_2} = e^{\mathcal{J}_1}\otimes e_{\mathcal{I}_1}\otimes e_{\mathcal{I}_2}\otimes e^{\mathcal{J}_2}.$$

These identities directly result from the property that the Kronecker product of a column vector with a row vector is independent of the order of the vectors ($\mathbf{u}\otimes\mathbf{v}^T = \mathbf{v}^T\otimes\mathbf{u} = \mathbf{u}\mathbf{v}^T$), which implies that, in a sequence of Kronecker products of column and row vectors, a column vector can be permuted with a row vector without altering the final result, provided that the respective orderings of the column vectors and of the row vectors are not changed in the sequence ($\mathbf{u}_1\otimes\mathbf{u}_2\otimes\mathbf{v}^T = \mathbf{u}_1\otimes\mathbf{v}^T\otimes\mathbf{u}_2 = \mathbf{v}^T\otimes\mathbf{u}_1\otimes\mathbf{u}_2 \ne \mathbf{v}^T\otimes\mathbf{u}_2\otimes\mathbf{u}_1$ if $\mathbf{u}_1\ne\mathbf{u}_2$).

Using the index notation, the horizontal, lateral, and frontal slices of a third-order tensor $\mathcal{X}\in\mathbb{C}^{I\times J\times K}$ can be written as

$$\mathbf{X}_{i..} = x_{ijk}\, e_j^{\,k};\qquad \mathbf{X}_{.j.} = x_{ijk}\, e_k^{\,i};\qquad \mathbf{X}_{..k} = x_{ijk}\, e_i^{\,j}.$$

The Kronecker products of vectors ($\mathbf{u}\in\mathbb{C}^{I\times1}$, $\mathbf{v}\in\mathbb{C}^{J\times1}$) and matrices ($\mathbf{A}\in\mathbb{C}^{I\times J}$, $\mathbf{B}\in\mathbb{C}^{K\times L}$) can be concisely written as

$$\mathbf{u}\otimes\mathbf{v} = (u_i\, e_i)\otimes(v_j\, e_j) = u_i v_j\, e_{ij},\qquad \mathbf{u}^T\otimes\mathbf{v}^T = u_i v_j\, e^{ij},\qquad \mathbf{u}\otimes\mathbf{v}^T = u_i v_j\, e_i^{\,j},$$
$$\mathbf{A}\otimes\mathbf{B} = \left(a_{ij}\, e_i^{\,j}\right)\otimes\left(b_{kl}\, e_k^{\,l}\right) = a_{ij} b_{kl}\, e_{ik}^{\,jl},\qquad \mathbf{A}^T\otimes\mathbf{B}^T = a_{ij} b_{kl}\, e_{jl}^{\,ik}.$$

For $\mathbf{U} = \begin{bmatrix}\mathbf{u}^{(1)} & \cdots & \mathbf{u}^{(N)}\end{bmatrix}\in\mathbb{C}^{I\times N}$ and $\mathbf{V} = \begin{bmatrix}\mathbf{v}^{(1)} & \cdots & \mathbf{v}^{(N)}\end{bmatrix}\in\mathbb{C}^{J\times N}$, we have

$$\mathbf{U}\mathbf{V}^T = \sum_{n=1}^{N}\mathbf{u}^{(n)}\left(\mathbf{v}^{(n)}\right)^T = u_i^{(n)} v_j^{(n)}\, e_i^{\,j},$$
(15)

where the summation over $n$ is to be carried out after forming the matrices $\mathbf{u}^{(n)}\left(\mathbf{v}^{(n)}\right)^T$.

Using the index notation, the Khatri‐Rao product can be written as follows:

$$\mathbf{A}\diamond\mathbf{B} = a_{ik} b_{jk}\, e_{ij}^{\,k},\qquad (\mathbf{A}\diamond\mathbf{B})^T = a_{ik} b_{jk}\, e_k^{\,ij}.$$
(16)
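For later reference, a possible NumPy implementation of the Khatri-Rao product (the helper name `khatri_rao` and the small test matrices are ours; the check against column-wise Kronecker products follows directly from the definition):

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product: (A ◊ B)[:, r] = A[:, r] ⊗ B[:, r]."""
    I, R = A.shape
    J, R2 = B.shape
    assert R == R2, "A and B must have the same number of columns"
    return (A[:, None, :] * B[None, :, :]).reshape(I * J, R)

A = np.arange(6.0).reshape(3, 2)
B = np.arange(8.0).reshape(4, 2)
C = khatri_rao(A, B)                                       # shape (12, 2)
print(np.allclose(C[:, 1], np.kron(A[:, 1], B[:, 1])))     # True
```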

The Kronecker and Khatri-Rao products defined in (1) and (2), with $a_{i_n,r_n}^{(n)}$ as the entry of $\mathbf{A}^{(n)}$, can then be written as

$$\mathop{\otimes}_{n\in S}\mathbf{A}^{(n)} = \left(\prod_{n\in S} a_{i_n,r_n}^{(n)}\right) e_{i_{n_1},\ldots,i_{n_N}}^{\,r_{n_1},\ldots,r_{n_N}} = \left(\prod_{n\in S} a_{i_n,r_n}^{(n)}\right) e_{\mathcal{I}}^{\,\mathcal{R}},$$
(17)
$$\mathop{\diamond}_{n\in S}\mathbf{A}^{(n)} = \left(\prod_{n\in S} a_{i_n,r}^{(n)}\right) e_{i_{n_1},\ldots,i_{n_N}}^{\,r} = \left(\prod_{n\in S} a_{i_n,r}^{(n)}\right) e_{\mathcal{I}}^{\,r},$$
(18)

where $\mathcal{R} = \{r_{n_1},\ldots,r_{n_N}\}$.

Applying these results, the unfoldings (7), (10), and (11) and the formula (8) can be rewritten respectively as

$$\mathbf{X}_{S_1;S_2} = x_{i_1,\ldots,i_N}\; e_{\mathcal{I}_1}^{\,\mathcal{I}_2},$$
(19)
$$\mathbf{X}_{I\times JK} = x_{i,j,k}\, e_i^{\,jk},\qquad \mathbf{X}_{JK\times I} = x_{i,j,k}\, e_{jk}^{\,i},\qquad x_{i_1,\ldots,i_N} = e^{\mathcal{I}_1}\,\mathbf{X}_{S_1;S_2}\; e_{\mathcal{I}_2},$$
(20)

where $\mathcal{I}_1$ and $\mathcal{I}_2$ represent the sets of indices $i_n$ associated with the sets $S_1$ and $S_2$ of indices $n$, respectively.

We can also use the index notation for deriving matrix unfoldings of tensor extensions of a matrix $\mathbf{B}\in\mathbb{C}^{I\times J}$. For instance, if we define the tensor $\mathcal{A}\in\mathbb{C}^{I\times J\times K}$ such that $a_{i,j,k} = b_{i,j}$ for $k=1,\ldots,K$, mode-1 flat unfoldings of $\mathcal{A}$ are given by

$$\mathbf{A}_{I\times JK} = \sum_{i,j,k} a_{i,j,k}\; e_i\otimes e^j\otimes e^k = a_{i,j,k}\, e_i^{\,jk} = \sum_{i,j} b_{i,j}\, e_i^{\,j}\otimes\sum_{k=1}^{K} e^k = \mathbf{B}\otimes\mathbf{1}_K^T = \mathbf{B}\left(\mathbf{I}_J\otimes\mathbf{1}_K^T\right),$$
(21)
$$\mathbf{A}_{I\times KJ} = a_{i,j,k}\, e_i^{\,kj} = \sum_{k=1}^{K} e^k\otimes\sum_{i,j} b_{i,j}\, e_i^{\,j} = \mathbf{1}_K^T\otimes\mathbf{B} = \mathbf{B}\left(\mathbf{1}_K^T\otimes\mathbf{I}_J\right).$$
(22)

These two formulae will be used later for establishing the link between PARATUCK‐(2,4) models and constrained PARAFAC‐4 models.
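These two identities are easy to verify numerically; a minimal NumPy sketch (dimensions arbitrary, construction of our own choosing):

```python
import numpy as np

I, J, K = 2, 3, 4
rng = np.random.default_rng(5)
B = rng.standard_normal((I, J))

# Tensor extension of B along a third mode: a_{ijk} = b_{ij} for all k.
A = np.repeat(B[:, :, None], K, axis=2)

# Mode-1 flat unfoldings (21) and (22).
A_I_JK = A.reshape(I, J * K)                               # columns ordered (j, k)
A_I_KJ = np.transpose(A, (0, 2, 1)).reshape(I, K * J)      # columns ordered (k, j)
print(np.allclose(A_I_JK, np.kron(B, np.ones((1, K)))))    # B ⊗ 1_K^T  -> True
print(np.allclose(A_I_KJ, np.kron(np.ones((1, K)), B)))    # 1_K^T ⊗ B  -> True
```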

1.2.8 Basic tensor models

We now present the two most common tensor models, i.e., the Tucker[3] and PARAFAC[4] models. In[7], these models are introduced in a constructive way, in the context of three-way data analysis. The Tucker models are presented as extensions of the matrix singular value decomposition (SVD) to three-way arrays, which gave rise to their generalization as the HOSVD[13, 49], whereas the PARAFAC model is introduced by emphasizing Cattell's principle of parallel proportional profiles[50] that underlies this model, thus explaining the acronym PARAFAC. In the following, we adopt a more general presentation for multi-way arrays, i.e., tensors of any order N.

1.2.8.0 Tucker models

For an $N$th-order tensor $\mathcal{X}\in\mathbb{C}^{I_1\times\cdots\times I_N}$, a Tucker model is defined in element-wise form as

$$x_{i_1,\ldots,i_N} = \sum_{r_1=1}^{R_1}\cdots\sum_{r_N=1}^{R_N} g_{r_1,\ldots,r_N}\prod_{n=1}^{N} a_{i_n,r_n}^{(n)},$$
(23)

with $i_n = 1,\ldots,I_n$ for $n=1,\ldots,N$, where $g_{r_1,\ldots,r_N}$ is an element of the core tensor $\mathcal{G}\in\mathbb{C}^{R_1\times\cdots\times R_N}$ and $a_{i_n,r_n}^{(n)}$ is an element of the matrix factor $\mathbf{A}^{(n)}\in\mathbb{C}^{I_n\times R_n}$.

Using the index notation and defining the set of indices $\mathcal{R} = \{r_1,\ldots,r_N\}$, the Tucker model can also be written simply as

$$x_{i_1,\ldots,i_N} = g_{r_1,\ldots,r_N}\prod_{\mathcal{R}} a_{i_n,r_n}^{(n)}.$$
(24)

Taking the definition (4) into account and noting that $\sum_{i_n=1}^{I_n} a_{i_n,r_n}^{(n)}\, e_{i_n}^{(I_n)} = \mathbf{A}_{.r_n}^{(n)}$, this model can be written as a weighted sum of $\prod_{n=1}^{N} R_n$ outer products, i.e., rank-one tensors:

$$\mathcal{X} = \sum_{r_1=1}^{R_1}\cdots\sum_{r_N=1}^{R_N} g_{r_1,\ldots,r_N}\;\mathop{\circ}_{n=1}^{N}\mathbf{A}_{.r_n}^{(n)} = g_{r_1,\ldots,r_N}\;\mathop{\circ}_{\mathcal{R}}\mathbf{A}_{.r_n}^{(n)}\quad(\text{with the index notation}).$$
(25)

Using the definition (12) allows (23) to be written in terms of mode-$n$ products as

$$\mathcal{X} = \mathcal{G}\times_1\mathbf{A}^{(1)}\times_2\mathbf{A}^{(2)}\times_3\cdots\times_N\mathbf{A}^{(N)} = \mathcal{G}\times_{n=1}^{N}\mathbf{A}^{(n)}.$$
(26)

This expression shows that the Tucker model can be viewed as the transformation of the core tensor $\mathcal{G}$ resulting from its multiplication by the factor matrix $\mathbf{A}^{(n)}$ along its mode $n$, which corresponds to a linear map applied to the mode-$n$ space of $\mathcal{G}$, for $n=1,\ldots,N$, i.e., a multilinear map applied to $\mathcal{G}$. From a transformation point of view, $\mathcal{G}$ and $\mathcal{X}$ can be interpreted as the input tensor and the transformed tensor, or output tensor, respectively.

Matrix representations of the Tucker model. A matrix representation of a Tucker model is directly linked with a matricization of the tensor $\mathcal{X}$ as in (7), corresponding to the combination of two sets of modes $S_1$ and $S_2$. These combinations must be applied both to the tensor $\mathcal{X}$ and to its core tensor $\mathcal{G}$.

The matrix representation (7) of the Tucker model (23) is given by

$$\mathbf{X}_{S_1;S_2} = \left(\mathop{\otimes}_{n\in S_1}\mathbf{A}^{(n)}\right)\mathbf{G}_{S_1;S_2}\left(\mathop{\otimes}_{n\in S_2}\mathbf{A}^{(n)}\right)^T,$$
(27)

with $\mathbf{G}_{S_1;S_2}\in\mathbb{C}^{J_1\times J_2}$ and $J_{n_1} = \prod_{n\in S_{n_1}} R_n$, for $n_1 = 1$ and $2$.

Proof

See Appendix 3.

For the flat mode‐n unfolding, defined in (9), the formula (27) gives

$$\mathbf{X}_n = \mathbf{A}^{(n)}\,\mathbf{G}_n\left(\mathbf{A}^{(n+1)}\otimes\cdots\otimes\mathbf{A}^{(N)}\otimes\mathbf{A}^{(1)}\otimes\cdots\otimes\mathbf{A}^{(n-1)}\right)^T.$$
(28)

Applying the vec formula (92) to the right-hand side of (28), we obtain the vectorized form of $\mathcal{X}$ associated with its mode-$n$ unfolding $\mathbf{X}_n$:

$$\mathrm{vec}(\mathcal{X}) = \mathrm{vec}(\mathbf{X}_n) = \left(\mathbf{A}^{(n+1)}\otimes\cdots\otimes\mathbf{A}^{(N)}\otimes\mathbf{A}^{(1)}\otimes\cdots\otimes\mathbf{A}^{(n)}\right)\mathrm{vec}(\mathbf{G}_n).$$
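As an illustration, the following NumPy sketch builds a third-order Tucker model through successive mode-$n$ products and checks the flat mode-1 matrix representation (28) (the helper `mode_n_product` and all dimensions are ours, for illustration only):

```python
import numpy as np

def mode_n_product(X, A, n):
    """Mode-n product X x_n A (0-based mode index)."""
    return np.moveaxis(np.tensordot(A, X, axes=(1, n)), 0, n)

(R1, R2, R3), (I1, I2, I3) = (2, 3, 2), (4, 5, 6)
rng = np.random.default_rng(6)
G = rng.standard_normal((R1, R2, R3))                                  # core tensor
A = [rng.standard_normal((I, R)) for I, R in zip((I1, I2, I3), (R1, R2, R3))]

# Tucker model (26): X = G x_1 A^(1) x_2 A^(2) x_3 A^(3).
X = G
for n, An in enumerate(A):
    X = mode_n_product(X, An, n)

# Flat mode-1 representation (28): X_1 = A^(1) G_1 (A^(2) ⊗ A^(3))^T.
X1, G1 = X.reshape(I1, -1), G.reshape(R1, -1)
print(np.allclose(X1, A[0] @ G1 @ np.kron(A[1], A[2]).T))   # True
```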
1.2.8.0 Tucker‐(N 1 ,N) models

A Tucker-$(N_1,N)$ model for an $N$th-order tensor $\mathcal{X}\in\mathbb{C}^{I_1\times\cdots\times I_N}$, with $N_1\le N$, corresponds to the case where $N-N_1$ factor matrices are equal to identity matrices. For instance, assuming that $\mathbf{A}^{(n)} = \mathbf{I}_{I_n}$, which implies $R_n = I_n$, for $n = N_1+1,\ldots,N$, (23) and (26) become

$$x_{i_1,\ldots,i_N} = \sum_{r_1=1}^{R_1}\cdots\sum_{r_{N_1}=1}^{R_{N_1}} g_{r_1,\ldots,r_{N_1},i_{N_1+1},\ldots,i_N}\prod_{n=1}^{N_1} a_{i_n,r_n}^{(n)},\qquad \mathcal{X} = \mathcal{G}\times_1\mathbf{A}^{(1)}\times_2\cdots\times_{N_1}\mathbf{A}^{(N_1)}\times_{N_1+1}\mathbf{I}_{I_{N_1+1}}\cdots\times_N\mathbf{I}_{I_N}$$
(29)
$$= \mathcal{G}\times_{n=1}^{N_1}\mathbf{A}^{(n)}.$$
(30)

One such model that is commonly used in applications is the Tucker-(2,3) model, usually denoted Tucker2, for third-order tensors $\mathcal{X}\in\mathbb{C}^{I\times J\times K}$. Assuming $\mathbf{A}^{(1)} = \mathbf{A}\in\mathbb{C}^{I\times P}$, $\mathbf{A}^{(2)} = \mathbf{B}\in\mathbb{C}^{J\times Q}$, and $\mathbf{A}^{(3)} = \mathbf{I}_K$, such a model is defined by the following equations:

$$x_{ijk} = \sum_{p=1}^{P}\sum_{q=1}^{Q} g_{pqk}\, a_{ip}\, b_{jq},$$
(31)
$$\mathcal{X} = \mathcal{G}\times_1\mathbf{A}\times_2\mathbf{B},$$
(32)

with the core tensor $\mathcal{G}\in\mathbb{C}^{P\times Q\times K}$.

1.2.8.0 PARAFAC models

A PARAFAC model for an $N$th-order tensor $\mathcal{X}$ corresponds to the particular case of a Tucker model with an identity core tensor of order $N$ and dimensions $R\times\cdots\times R$:

$$\mathcal{G} = \mathcal{I}_{N,R} = \mathcal{I} \;\Leftrightarrow\; g_{r_1,\ldots,r_N} = \delta_{r_1,\ldots,r_N}.$$

Equations (23) to (26) then become, respectively,

$$x_{i_1,\ldots,i_N} = \sum_{r=1}^{R}\prod_{n=1}^{N} a_{i_n,r}^{(n)}$$
(33)
$$= \prod_{n=1}^{N} a_{i_n,r}^{(n)}\quad(\text{with the index notation}),$$
(34)
$$\mathcal{X} = \sum_{r=1}^{R}\mathop{\circ}_{n=1}^{N}\mathbf{A}_{.r}^{(n)},\qquad \mathcal{X} = \mathcal{I}_{N,R}\times_{n=1}^{N}\mathbf{A}^{(n)},$$
(35)

with the factor matrices $\mathbf{A}^{(n)}\in\mathbb{C}^{I_n\times R}$, $n=1,\ldots,N$.
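For concreteness, a minimal NumPy sketch of a third-order PARAFAC model (arbitrary dimensions and rank; the helper `khatri_rao` is ours), which also checks the flat mode-1 unfolding given later in (38):

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product."""
    return (A[:, None, :] * B[None, :, :]).reshape(-1, A.shape[1])

I, J, K, R = 4, 5, 6, 3
rng = np.random.default_rng(7)
A, B, C = (rng.standard_normal((d, R)) for d in (I, J, K))

# PARAFAC model (33): x_{ijk} = sum_r a_{ir} b_{jr} c_{kr}, a sum of R rank-one tensors.
X = np.einsum('ir,jr,kr->ijk', A, B, C)

# Flat mode-1 unfolding, cf. (38): X_1 = A (B ◊ C)^T.
print(np.allclose(X.reshape(I, J * K), A @ khatri_rao(B, C).T))   # True
```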

1.2.8.0 Remarks
  • The expression (33) as a sum of polyads is called a polyadic form of $\mathcal{X}$ by Hitchcock[51].

  • The PARAFAC model (33, 34 and 35) amounts to decomposing the tensor $\mathcal{X}$ into a sum of $R$ components, each component being a rank-one tensor. When $R$ is minimal in (33), it is called the rank of $\mathcal{X}$[52]. This rank is related to the mode-$n$ ranks by the following inequalities: $\mathrm{rank}_n(\mathcal{X})\le R$, $n=1,\ldots,N$. Furthermore, contrary to matrices, for which the rank is always at most equal to the smallest of the dimensions, for higher-order tensors the rank can exceed any mode-$n$ dimension $I_n$.

  • There exist different definitions of rank for tensors, like typical and generic ranks, or also the symmetric rank of a symmetric tensor (see[53, 54] for more details).

  • In telecommunication applications, the structure parameters (rank, mode dimensions, and core tensor dimensions) of a PARAFAC or Tucker model are design parameters that are chosen as a function of the performance desired for the communication system. However, in most applications, as for instance in multi-way data analysis, the structure parameters are generally unknown and must be determined a priori. Several techniques have been proposed for determining these parameters (see[55]-[58] and references therein).

  • The PARAFAC model is also sometimes defined by the following equation

    $$x_{i_1,\ldots,i_N} = \sum_{r=1}^{R} g_r\prod_{n=1}^{N} a_{i_n,r}^{(n)}\quad\text{with } g_r > 0.$$
    (36)
  • In this case, the identity tensor $\mathcal{I}_{N,R}$ in (35) is replaced by the diagonal tensor $\mathcal{G}\in\mathbb{C}^{R\times\cdots\times R}$ whose diagonal elements are equal to the scaling factors $g_r$, i.e.,

    $$g_{r_1,\ldots,r_N} = \begin{cases} g_r & \text{if } r_1 = \cdots = r_N = r,\\ 0 & \text{otherwise,}\end{cases}$$

    and all the column vectors $\mathbf{A}_{.r}^{(n)}$ are normalized, i.e., have unit norm, for $1\le n\le N$.

  • It is important to notice that the PARAFAC model (33) is multilinear (more precisely, $N$-linear) in its parameters, in the sense that it is linear with respect to each matrix factor. This multilinearity property is exploited for parameter estimation using the standard alternating least squares (ALS) algorithm[4, 5], which consists in alternately estimating each matrix factor by minimizing a least squares error criterion conditionally on the knowledge of the other matrix factors, fixed at their previously estimated values; a sketch of this procedure is given after this list.
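The following NumPy sketch outlines such an ALS procedure for a third-order PARAFAC model, using the flat mode-$n$ unfoldings introduced below in (38); it is a rough illustration under our own naming and stopping choices (fixed number of iterations, no normalization), not the authors' implementation:

```python
import numpy as np

def khatri_rao(A, B):
    return (A[:, None, :] * B[None, :, :]).reshape(-1, A.shape[1])

def unfold(X, n):
    """Flat mode-n unfolding with column modes ordered n+1, ..., N, 1, ..., n-1."""
    order = [n] + list(range(n + 1, X.ndim)) + list(range(n))
    return np.transpose(X, order).reshape(X.shape[n], -1)

def parafac_als(X, R, n_iter=200, seed=0):
    """Alternating least squares for a third-order PARAFAC model,
    based on X_n = A^(n) (A^(n+1) ◊ A^(n+2))^T (mode indices modulo 3)."""
    rng = np.random.default_rng(seed)
    A = [rng.standard_normal((I, R)) for I in X.shape]
    for _ in range(n_iter):
        for n in range(3):
            KR = khatri_rao(A[(n + 1) % 3], A[(n + 2) % 3])
            A[n] = unfold(X, n) @ np.linalg.pinv(KR.T)   # conditional LS update
    return A

# Usage: factor a noiseless rank-3 tensor and measure the reconstruction error.
rng = np.random.default_rng(1)
A0, B0, C0 = (rng.standard_normal((d, 3)) for d in (4, 5, 6))
X = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
A, B, C = parafac_als(X, R=3)
X_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
print(np.linalg.norm(X - X_hat) / np.linalg.norm(X))     # close to 0
```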

Matrix representations of the PARAFAC model. The matrix representation (7) of the PARAFAC model (33)‐(35) is given by

$$\mathbf{X}_{S_1;S_2} = \left(\mathop{\diamond}_{n\in S_1}\mathbf{A}^{(n)}\right)\left(\mathop{\diamond}_{n\in S_2}\mathbf{A}^{(n)}\right)^T.$$
(37)
Proof.

See Appendix 4.

1.2.8.0 Remarks
  • From (37), we can deduce that a mode combination results in a Khatri-Rao product of the corresponding factor matrices. Consequently, the tensor contraction (5) associated with the PARAFAC-$N$ model (35) gives a PARAFAC-$N_1$ model whose factor matrices are equal to $\mathop{\diamond}_{n\in S_{n_1}}\mathbf{A}^{(n)}\in\mathbb{C}^{J_{n_1}\times R}$, $n_1=1,\ldots,N_1$, with $J_{n_1} = \prod_{n\in S_{n_1}} I_n$.

  • For the PARAFAC model, the flat mode‐n unfolding, defined in (9), is given by

    $$\mathbf{X}_n = \mathbf{A}^{(n)}\left(\mathbf{A}^{(n+1)}\diamond\cdots\diamond\mathbf{A}^{(N)}\diamond\mathbf{A}^{(1)}\diamond\cdots\diamond\mathbf{A}^{(n-1)}\right)^T,$$
    (38)
  • and the associated vectorized form is obtained by applying the vec formula (93) to the right-hand side of the above equation, with $\mathbf{I}_R = \mathrm{diag}(\mathbf{1}_R)$:

    $$\mathrm{vec}(\mathcal{X}) = \mathrm{vec}(\mathbf{X}_n) = \left(\mathbf{A}^{(n+1)}\diamond\cdots\diamond\mathbf{A}^{(N)}\diamond\mathbf{A}^{(1)}\diamond\cdots\diamond\mathbf{A}^{(n)}\right)\mathbf{1}_R.$$
    (39)
  • In the case of the normalized PARAFAC model (36), (37) and (39) become, respectively,

    $$\mathbf{X}_{S_1;S_2} = \left(\mathop{\diamond}_{n\in S_1}\mathbf{A}^{(n)}\right)\mathrm{diag}(\mathbf{g})\left(\mathop{\diamond}_{n\in S_2}\mathbf{A}^{(n)}\right)^T,\qquad \mathrm{vec}(\mathcal{X}) = \mathrm{vec}(\mathbf{X}_n) = \left(\mathbf{A}^{(n+1)}\diamond\cdots\diamond\mathbf{A}^{(N)}\diamond\mathbf{A}^{(1)}\diamond\cdots\diamond\mathbf{A}^{(n)}\right)\mathbf{g},$$
  • where $\mathbf{g} = \begin{bmatrix} g_1 & \cdots & g_R\end{bmatrix}^T\in\mathbb{C}^{R\times1}$.

  • For the PARAFAC model of a third-order tensor $\mathcal{X}\in\mathbb{C}^{I\times J\times K}$ with factor matrices $(\mathbf{A},\mathbf{B},\mathbf{C})$, the formula (37) gives, for $S_1 = \{i,j\}$ and $S_2 = \{k\}$,

    $$\mathbf{X}_{IJ\times K} = \begin{bmatrix}\mathbf{X}_{1..}\\ \vdots\\ \mathbf{X}_{I..}\end{bmatrix} = (\mathbf{A}\diamond\mathbf{B})\,\mathbf{C}^T\in\mathbb{C}^{IJ\times K}.$$
  • Noting that $\mathbf{A}\diamond\mathbf{B} = \begin{bmatrix}\mathbf{B}\,D_1(\mathbf{A})\\ \vdots\\ \mathbf{B}\,D_I(\mathbf{A})\end{bmatrix}$, we deduce the following expression for the mode-1 matrix slices:

    $$\mathbf{X}_{i..} = \mathbf{B}\,D_i(\mathbf{A})\,\mathbf{C}^T.$$
  • Similarly, we have

    $$\mathbf{X}_{JK\times I} = (\mathbf{B}\diamond\mathbf{C})\,\mathbf{A}^T,\qquad \mathbf{X}_{KI\times J} = (\mathbf{C}\diamond\mathbf{A})\,\mathbf{B}^T,\qquad \mathbf{X}_{.j.} = \mathbf{C}\,D_j(\mathbf{B})\,\mathbf{A}^T,\qquad \mathbf{X}_{..k} = \mathbf{A}\,D_k(\mathbf{C})\,\mathbf{B}^T.$$
  • For the PARAFAC model of a fourth-order tensor $\mathcal{X}\in\mathbb{C}^{I\times J\times K\times L}$ with factor matrices $(\mathbf{A},\mathbf{B},\mathbf{C},\mathbf{D})$, we obtain

    $$\mathbf{X}_{IJK\times L} = (\mathbf{A}\diamond\mathbf{B}\diamond\mathbf{C})\,\mathbf{D}^T = \begin{bmatrix}(\mathbf{B}\diamond\mathbf{C})\,D_1(\mathbf{A})\\ \vdots\\ (\mathbf{B}\diamond\mathbf{C})\,D_I(\mathbf{A})\end{bmatrix}\mathbf{D}^T = \begin{bmatrix}\mathbf{C}\,D_1(\mathbf{B})\,D_1(\mathbf{A})\\ \vdots\\ \mathbf{C}\,D_J(\mathbf{B})\,D_I(\mathbf{A})\end{bmatrix}\mathbf{D}^T\in\mathbb{C}^{IJK\times L},\qquad \mathbf{X}_{ij..} = \mathbf{C}\,D_j(\mathbf{B})\,D_i(\mathbf{A})\,\mathbf{D}^T\in\mathbb{C}^{K\times L}.$$
    (40)

    Other matrix slices can be deduced from (40) by simple permutations of the matrix factors.

In the next section, we introduce two constrained PARAFAC models, the so‐called PARALIND and CONFAC models, and then PARATUCK models.

1.3 Constrained PARAFAC models

The introduction of constraints in tensor models can result from the system itself that is under study or from a system design. In the first case, the constraints are often interpreted as interactions or linear dependencies between the PARAFAC factors. Examples of such dependencies are encountered in psychometric and chemometric applications, which gave rise, respectively, to the PARATUCK-2 model[59] and to the parallel profiles with linear dependencies (PARALIND) model[60, 61], introduced in[47] under the name canonical decomposition with linear constraints (CANDELINC), for the multiway case. A first application of the PARATUCK-2 model in signal processing was made in[62] for blind joint identification and equalization of Wiener-Hammerstein communication channels. The PARALIND model was applied for identifiability and propagation parameter estimation purposes in a context of array signal processing[63, 64].

In the second case, the constraints are used as design parameters. For instance, in a telecommunications context, we proposed two constrained tensor models: the CONFAC (constrained factor) model[65] and the PARATUCK‐ (N1,N) model[66, 67]. The PARATUCK‐2 model was also applied for designing space‐time spreading‐multiplexing MIMO systems[68]. For these telecommunication applications of constrained tensor models, the constraints are used for resource allocation. We are now going to describe these various constrained PARAFAC models.

1.3.1 PARALIND models

Let us define the core tensor $\mathcal{G}$ of the Tucker model (26) as follows:

$$\mathcal{G} = \mathcal{I}_{N,R}\times_{n=1}^{N}\boldsymbol{\Phi}^{(n)},$$
(41)

where $\boldsymbol{\Phi}^{(n)}\in\mathbb{R}^{R_n\times R}$, $n=1,\ldots,N$, with $R\ge\max_n(R_n)$, are constraint matrices. In this case, $\mathcal{G}$ will be called the 'interaction tensor' or 'constraint tensor'.

The PARALIND model is obtained by substituting (41) into (26) and applying the property (13), which gives

$$\mathcal{X} = \mathcal{G}\times_{n=1}^{N}\mathbf{A}^{(n)} = \mathcal{I}_{N,R}\times_{n=1}^{N}\mathbf{A}^{(n)}\boldsymbol{\Phi}^{(n)}.$$
(42)

This equation leads to two different interpretations of the PARALIND model: as a constrained Tucker model whose core tensor admits a PARAFAC decomposition with factor matrices $\boldsymbol{\Phi}^{(n)}$, called 'interaction matrices,' and as a constrained PARAFAC model with constrained factor matrices $\bar{\mathbf{A}}^{(n)} = \mathbf{A}^{(n)}\boldsymbol{\Phi}^{(n)}$.

The interaction matrix $\boldsymbol{\Phi}^{(n)}$ allows linear dependencies between the columns of $\mathbf{A}^{(n)}$ to be taken into account, implying a rank deficiency for this factor matrix. When the columns of $\boldsymbol{\Phi}^{(n)}$ are formed with 0's and 1's, the dependencies simply consist in a repetition or an addition of certain columns of $\mathbf{A}^{(n)}$. In this particular case, the diagonal element $\xi_{r,r}^{(n)}\ge 1$ of the matrix $\boldsymbol{\Xi}^{(n)} = \boldsymbol{\Phi}^{(n)T}\boldsymbol{\Phi}^{(n)}\in\mathbb{R}^{R\times R}$ represents the number of columns of $\mathbf{A}^{(n)}$ that are added to form the $r$th column of the constrained factor $\mathbf{A}^{(n)}\boldsymbol{\Phi}^{(n)}$. The choice $\boldsymbol{\Phi}^{(n)} = \mathbf{I}_R$ means that there is no such dependency among the columns of $\mathbf{A}^{(n)}$.

Note that (42) can be written element‐wise as

$$x_{i_1,\ldots,i_N} = \sum_{r_1=1}^{R_1}\cdots\sum_{r_N=1}^{R_N} g_{r_1,\ldots,r_N}\prod_{n=1}^{N} a_{i_n,r_n}^{(n)} = \sum_{r=1}^{R}\prod_{n=1}^{N}\bar{a}_{i_n,r}^{(n)},\quad\text{with}\quad \bar{a}_{i_n,r}^{(n)} = \sum_{r_n=1}^{R_n} a_{i_n,r_n}^{(n)}\,\phi_{r_n,r}^{(n)}\quad\text{and}\quad g_{r_1,\ldots,r_N} = \sum_{r=1}^{R}\prod_{n=1}^{N}\phi_{r_n,r}^{(n)}.$$
(43)

This constrained PARAFAC model constitutes an N‐way form of the three‐way PARALIND model, used for chemometric applications in[60, 61].
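A minimal NumPy sketch of such a PARALIND model (all dimensions, the 0/1 matrices, and the construction are our own illustrative choices):

```python
import numpy as np

(I1, I2, I3), (R1, R2, R3), R = (4, 5, 6), (2, 3, 2), 4     # with R >= max_n R_n
rng = np.random.default_rng(8)
A = [rng.standard_normal((I, Rn)) for I, Rn in zip((I1, I2, I3), (R1, R2, R3))]

# 0/1 constraint (interaction) matrices Phi^(n) of size R_n x R:
Phi = [np.array([[1., 0., 1., 0.],          # 3rd column adds the two columns of A^(1)
                 [0., 1., 1., 1.]]),
       np.eye(R2)[:, [0, 1, 2, 0]],         # repeats the 1st column of A^(2)
       np.eye(R3)[:, [0, 1, 0, 1]]]

# PARALIND model (42): a PARAFAC model with constrained factors A^(n) Phi^(n).
A_bar = [An @ Pn for An, Pn in zip(A, Phi)]
X = np.einsum('ir,jr,kr->ijk', *A_bar)

# diag(Xi^(1)) = diag(Phi^(1)T Phi^(1)): number of columns of A^(1) added per component.
print(np.diag(Phi[0].T @ Phi[0]))           # [1. 1. 2. 1.]
```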

1.3.2 CONFAC models

When the constraint matrices $\boldsymbol{\Phi}^{(n)}\in\mathbb{R}^{R_n\times R}$ are full row rank and their columns are chosen as canonical vectors of the Euclidean space $\mathbb{R}^{R_n}$, for $n=1,\ldots,N$, the constrained PARAFAC model (42) constitutes a generalization to $N$th order of the third-order CONFAC model, introduced in[65] for designing MIMO communication systems with resource allocation. This CONFAC model was used in[69] for solving the problem of blind identification of underdetermined mixtures based on the cumulant generating function of the observations. In a telecommunications context where $\mathcal{X}$ represents the tensor of received signals, such a constraint matrix $\boldsymbol{\Phi}^{(n)}$ can be interpreted as an 'allocation matrix' allowing resources, like data streams, codes, and transmit antennas, to be allocated to the $R$ components of the signal to be transmitted. In this case, the core tensor $\mathcal{G}$ will be called the 'allocation tensor.' By assumption, each column of the allocation matrix $\boldsymbol{\Phi}^{(n)}$ is a canonical vector of $\mathbb{R}^{R_n}$, which means that there is only one value of $r_n$ such that $\phi_{r_n,r}^{(n)} = 1$, and this value of $r_n$ corresponds to the $n$th resource allocated to the $r$th component.

Each element $x_{i_1,\ldots,i_N}$ of the received signal tensor is equal to the sum of $R$ components, each component $r$ resulting from the combination of $N$ resources, each resource being associated with a column of the matrix factor $\mathbf{A}^{(n)}$, $n=1,\ldots,N$. This combination, determined by the allocation matrices, is defined by a set of $N$ indices $\{r_1,\ldots,r_N\}$ such that $\prod_{n=1}^{N}\phi_{r_n,r}^{(n)} = 1$. As for any $r\in[1,R]$ there is one and only one $N$-uplet $(r_1,\ldots,r_N)$ such that $\prod_{n=1}^{N}\phi_{r_n,r}^{(n)} = 1$, we can deduce that each component $r$ of $x_{i_1,\ldots,i_N}$ in (43) is the result of one and only one combination of the $N$ resources, under the form of the product $\prod_{n=1}^{N} a_{i_n,r_n}^{(n)}$. For the CONFAC model, we have

$$\sum_{r_n=1}^{R_n} D_{r_n}\!\left(\boldsymbol{\Phi}^{(n)}\right) = \mathbf{I}_R,\qquad n=1,\ldots,N,$$

meaning that each resource $r_n$ is allocated at least once, and the diagonal elements of $\boldsymbol{\Xi}^{(n)} = \boldsymbol{\Phi}^{(n)T}\boldsymbol{\Phi}^{(n)}$ are such that $\xi_{r,r}^{(n)} = 1$, $n=1,\ldots,N$, because only one resource $r_n$ is allocated to each component $r$. Moreover, we have to notice that the assumption $R\ge\max_n(R_n)$ implies that each resource can be allocated several times, i.e., to several components. Defining the interaction matrices

$$\boldsymbol{\Gamma}^{(n)} = \boldsymbol{\Phi}^{(n)}\boldsymbol{\Phi}^{(n)T}\in\mathbb{R}^{R_n\times R_n},\qquad \boldsymbol{\Gamma}^{(n_1,n_2)} = \boldsymbol{\Phi}^{(n_1)}\boldsymbol{\Phi}^{(n_2)T}\in\mathbb{R}^{R_{n_1}\times R_{n_2}},$$

the diagonal element $\gamma_{r_n,r_n}^{(n)}\in[1,\,R-R_n+1]$ represents the number of times the $r_n$th column of $\mathbf{A}^{(n)}$ is repeated, i.e., the number of times the $r_n$th resource is allocated to the $R$ components, whereas $\gamma_{r_{n_1},r_{n_2}}^{(n_1,n_2)}$ determines the number of interactions between the $r_{n_1}$th column of $\mathbf{A}^{(n_1)}$ and the $r_{n_2}$th column of $\mathbf{A}^{(n_2)}$, i.e., the number of times the $r_{n_1}$th and $r_{n_2}$th resources are combined in the $R$ components. If we choose $R_n = R$ and $\boldsymbol{\Phi}^{(n)} = \mathbf{I}_R$, $n=1,\ldots,N$, the PARALIND/CONFAC model (42) becomes identical to the PARAFAC model (35).
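As an illustration of these allocation constraints, the following NumPy sketch draws random allocation matrices whose columns are canonical vectors and inspects the interaction matrices $\boldsymbol{\Gamma}^{(n)}$ and $\boldsymbol{\Gamma}^{(n_1,n_2)}$ (the helper `allocation_matrix` and the drawing strategy are ours):

```python
import numpy as np

def allocation_matrix(Rn, R, rng):
    """Random R_n x R allocation matrix: each column is a canonical vector of R^{R_n}
    and each resource (row) is used at least once (requires R >= R_n)."""
    cols = np.concatenate([np.arange(Rn), rng.integers(0, Rn, R - Rn)])
    rng.shuffle(cols)
    return np.eye(Rn)[:, cols]

R, R1, R2 = 4, 2, 3
rng = np.random.default_rng(9)
Phi1, Phi2 = allocation_matrix(R1, R, rng), allocation_matrix(R2, R, rng)

print(np.diag(Phi1.T @ Phi1))   # xi_{r,r} = 1: one resource per component
print(np.diag(Phi1 @ Phi1.T))   # Gamma^(1): how often each mode-1 resource is reused
print(Phi1 @ Phi2.T)            # Gamma^(1,2): co-occurrences of mode-1/mode-2 resources
```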

The matrix representation (7) of the PARALIND/CONFAC model can be deduced from (37) by replacing $\mathbf{A}^{(n)}$ with $\mathbf{A}^{(n)}\boldsymbol{\Phi}^{(n)}$:

$$\mathbf{X}_{S_1;S_2} = \left(\mathop{\diamond}_{n\in S_1}\mathbf{A}^{(n)}\boldsymbol{\Phi}^{(n)}\right)\left(\mathop{\diamond}_{n\in S_2}\mathbf{A}^{(n)}\boldsymbol{\Phi}^{(n)}\right)^T.$$

Using the identity (86) gives

$$\mathbf{X}_{S_1;S_2} = \left(\mathop{\otimes}_{n\in S_1}\mathbf{A}^{(n)}\right)\left(\mathop{\diamond}_{n\in S_1}\boldsymbol{\Phi}^{(n)}\right)\left(\mathop{\diamond}_{n\in S_2}\boldsymbol{\Phi}^{(n)}\right)^T\left(\mathop{\otimes}_{n\in S_2}\mathbf{A}^{(n)}\right)^T,$$
(44)

or, equivalently,

$$\mathbf{X}_{S_1;S_2} = \left(\mathop{\otimes}_{n\in S_1}\mathbf{A}^{(n)}\right)\mathbf{G}_{S_1;S_2}\left(\mathop{\otimes}_{n\in S_2}\mathbf{A}^{(n)}\right)^T,$$

where the matrix representation $\mathbf{G}_{S_1;S_2}$ of the constraint/allocation tensor $\mathcal{G}$, defined by means of its PARAFAC model (41), can also be deduced from (37) as

$$\mathbf{G}_{S_1;S_2} = \left(\mathop{\diamond}_{n\in S_1}\boldsymbol{\Phi}^{(n)}\right)\left(\mathop{\diamond}_{n\in S_2}\boldsymbol{\Phi}^{(n)}\right)^T.$$

1.3.3 Nested Tucker models

The PARALIND/CONFAC models can be viewed as particular cases of a new family of tensor models that we shall call nested Tucker models, defined by means of the following recursive equation:

$$\mathcal{X}^{(p)} = \mathcal{X}^{(p-1)}\times_{n=1}^{N}\mathbf{A}^{(p,n)}\quad\text{for } p=1,\ldots,P,\qquad \mathcal{X}^{(P)} = \mathcal{G}\times_{n=1}^{N}\left(\prod_{q=P}^{1}\mathbf{A}^{(q,n)}\right),$$

with the factor matrices $\mathbf{A}^{(p,n)}\in\mathbb{C}^{R^{(p,n)}\times R^{(p-1,n)}}$ for $p=1,\ldots,P$, such that $R^{(0,n)} = R_n$ and $R^{(P,n)} = I_n$, for $n=1,\ldots,N$, the core tensor $\mathcal{X}^{(0)} = \mathcal{G}\in\mathbb{C}^{R_1\times\cdots\times R_N}$, and $\mathcal{X}^{(P)}\in\mathbb{C}^{I_1\times\cdots\times I_N}$. This equation can be interpreted as $P$ successive linear transformations applied to each mode-$n$ space of the core tensor $\mathcal{G}$. A set of $P$ nested Tucker models can thus be interpreted as a Tucker model whose factor matrices are products of $P$ matrices. When $\mathcal{G} = \mathcal{I}_{N,R}$, which implies $R^{(0,n)} = R_n = R$ for $n=1,\ldots,N$, we obtain nested PARAFAC models. The PARALIND/CONFAC models correspond to two nested PARAFAC models ($P=2$), with $\mathbf{A}^{(1,n)} = \boldsymbol{\Phi}^{(n)}$, $\mathbf{A}^{(2,n)} = \mathbf{A}^{(n)}$, $R^{(0,n)} = R$, $R^{(1,n)} = R_n$, and $R^{(2,n)} = I_n$, for $n=1,\ldots,N$.
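A minimal NumPy sketch of a two-stage nested PARAFAC model ($P=2$), checking that the recursion is equivalent to a single model with factor matrices $\mathbf{A}^{(2,n)}\mathbf{A}^{(1,n)}$ (the helper name and all dimensions are ours):

```python
import numpy as np

def mode_n_product(X, A, n):
    """Mode-n product X x_n A (0-based mode index)."""
    return np.moveaxis(np.tensordot(A, X, axes=(1, n)), 0, n)

N, R = 3, 4
R1 = (2, 3, 2)                                  # intermediate dimensions R^(1,n)
I = (5, 6, 7)                                   # output dimensions R^(2,n) = I_n
rng = np.random.default_rng(10)

G = np.zeros((R,) * N); np.fill_diagonal(G, 1.0)                    # identity core I_{3,R}
A1 = [rng.standard_normal((r1, R)) for r1 in R1]                    # stage p = 1
A2 = [rng.standard_normal((In, r1)) for In, r1 in zip(I, R1)]       # stage p = 2

# Nested model: apply the two stages of mode-n products successively ...
X = G
for stage in (A1, A2):
    for n, An in enumerate(stage):
        X = mode_n_product(X, An, n)

# ... which equals a single model with factors A^(2,n) A^(1,n), cf. property (13).
Y = G
for n in range(N):
    Y = mode_n_product(Y, A2[n] @ A1[n], n)
print(np.allclose(X, Y))    # True
```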

By considering nested PARAFAC models with $P=3$, $\mathbf{A}^{(1,n)} = \boldsymbol{\Phi}^{(n)}\in\mathbb{C}^{K_n\times R}$, $\mathbf{A}^{(2,n)} = \mathbf{A}^{(n)}\in\mathbb{C}^{J_n\times K_n}$, and $\mathbf{A}^{(3,n)} = \boldsymbol{\Psi}^{(n)}\in\mathbb{C}^{I_n\times J_n}$, for $n=1,\ldots,N$, we deduce doubly PARALIND/CONFAC models described by the following equation:

$$\mathcal{X} = \mathcal{I}_{N,R}\times_{n=1}^{N}\boldsymbol{\Psi}^{(n)}\mathbf{A}^{(n)}\boldsymbol{\Phi}^{(n)}.$$

Such a model can be viewed as a doubly constrained PARAFAC model, with factor matrices $\boldsymbol{\Psi}^{(n)}\mathbf{A}^{(n)}\boldsymbol{\Phi}^{(n)}$, the constraint matrix $\boldsymbol{\Psi}^{(n)}$, assumed to be full column rank, allowing linear dependencies between the rows of $\mathbf{A}^{(n)}$ to be taken into account. A third-order nested Tucker model is visualized in Figure 1.

Figure 1. Visualization of a third-order nested Tucker model.

1.3.4 Block PARALIND/CONFAC models

In some applications, the data tensor $\mathcal{X}\in\mathbb{C}^{I_1\times\cdots\times I_N}$ is written as a sum of $P$ sub-tensors $\mathcal{X}^{(p)}$, each sub-tensor admitting a tensor model with a possibly different structure. We can thus define a block PARALIND/CONFAC model as

$$\mathcal{X} = \sum_{p=1}^{P}\mathcal{X}^{(p)},$$
(45)
$$\mathcal{X}^{(p)} = \mathcal{G}^{(p)}\times_{n=1}^{N}\mathbf{A}^{(p,n)},\qquad \mathcal{G}^{(p)} = \mathcal{I}_{N,R^{(p)}}\times_{n=1}^{N}\boldsymbol{\Phi}^{(p,n)},$$
(46)

where $\mathbf{A}^{(p,n)}\in\mathbb{C}^{I_n\times R^{(p,n)}}$, $\boldsymbol{\Phi}^{(p,n)}\in\mathbb{C}^{R^{(p,n)}\times R^{(p)}}$, and $\mathcal{G}^{(p)}\in\mathbb{C}^{R^{(p,1)}\times\cdots\times R^{(p,N)}}$ are the mode-$n$ factor matrix, the mode-$n$ constraint/allocation matrix, and the core tensor of the PARALIND/CONFAC model of the $p$th sub-tensor, respectively. The matrix representation (44) then becomes

X S 1 ; S 2 = p = 1 P n S