In Section 3, we presented CP for three-way tensors. Various optimization algorithms exist to compute CP decomposition without constraint, as ALS or descent algorithms [7, 8, 19]. We subsequently present optimization algorithms to compute the CP decomposition (10), under the constraints of unit norm columns of loading matrices.
4.1 Alternating least squares algorithm
The ALS algorithm was proposed for CP computation by Carroll and Chang in [20] and Harshman in [3] and still stays the workhorse algorithm today, mainly owing to its ease of implementation [21]. ALS is hence the classical solution to minimize the cost function (10), despite its lack of convergence proof. This iterative algorithm alternates among the estimation of matrices A, B, and C.
The principle of the ALS algorithm is to convert a nonlinear optimization problem into three independent linear least squares (LS) problems. In the first steps, one of the three matrices, say, A is updated while the two others (B and C) are fixed to their values obtained in previous estimation steps. The estimate of A is given by:
(15)
where TI, KJ is the unfolding matrix of size I × J K defined in Section 2.2, and ()† is the Moore-Penrose pseudo inverse. By symmetry, the expressions are similar for and .
4.2 Proposed algorithms
Our optimization problem consists in minimizing the squared error Υ under a collection of 3R constraints, namely:
(16)
Therefore, we need to find three matrices A, B, and C of unit norm columns which minimize (16). Stack these three matrices in a I+J+K by R matrix denoted by X. The objective can now be also written Υ(X,Λ), for the sake of convenience.
The computation of loading matrices is performed by minimizing the quadratic cost function (10). One generates a series of iterates X(k) = [A(k)T, B(k)T, C(k)T]T, , with initial value X(0) arbitrarily chosen. Generally, the algorithm consists of choosing at the k th iteration a point X(k + 1) in a direction lying in the half space defined by the gradient of objective function Υ, defined by matrix D(k), which verifies [22]:
(17)
One possibility is to choose the direction opposite to the gradient:
(18)
The gradient components ∇Υ
A
(size I × R), ∇Υ
B
(J × R), and ∇Υ
C
(K × R) can be stacked into a single matrix of size I+J+K by R:
(19)
Since objective function (10) is real but its arguments are complex, its gradient can be computed in the sense of [23, 24]. Using the quadratic form proposed in [23], the objective function can be expanded into:
(20)
where and . Thus, the gradient of Υ with respect to A is:
where M of size R × R and N of size I × R. The gradient of Υ with respect to B and C is similar, taking into account the fact that matrices M and N need to be defined accordingly (for the gradient of Υ with respect to B and C, the dimension of matrix N is J × R and K × R, respectively, while the dimension of M is always R × R).
The difficulty that arises in constrained optimization is to make sure that the move remains within the feasible set,
, defined by the constraints. In the following subsections, we propose two versions of our algorithm, with two different ways of calculating scale factor Λ.
Descent algorithms are also determined by the steps that will be executed in the chosen direction. There are various methods for the step selection, and the most widely used are Backtracking and Armijo[22]. To compute the stepsize ℓ(k) in Algorithm 1 and Algorithm 2, we use a popular inexact line search method, very simple and quite effective, which is the backtracking line search. It depends on two fixed constants α, β with 0 < α < 0.5 and 0 < β < 1.
Backtracking algorithm
-
1.
Given a descent direction D for Υ, α ∈ (0,0.5), β ∈ (0,1).
-
2.
Initialization: ℓ = 1.
-
3.
while Υ (X +ℓ D;Λ) > Υ(X;Λ)+α ℓ∇Υ TD
-
4.
ℓ = β ℓ.
4.2.1 Algorithm 1
In the recent work on CP tensor decomposition, optimization of the objective function (10) was made without explicitly considering the factor Λ. More precisely in most contributions, the scaling factor is integrated into loading matrices, so that Λ may be set to the identity. The first solution we propose is to use a projected gradient algorithm while calculating Λ as the product of normalizing factors of matrices A, B, and C:
(22)
where ⊡ is the Hadamard product, Λ
A
=Diag{∥a1∥,⋯,∥a
R
∥}, and similar definitions for Λ
B
and Λ
C
. This approach, which we call ‘Algorithm 1’ can be described by the pseudo-code below:
4.2.2 Algorithm 2
The other approach is to consider Λ as an additional variable. By canceling the gradient of Υ(X,Λ) with respect to Λ, one obtains Equation 11, which can be solved for Λ when X is fixed. This gives the algorithms below.
4.2.2.0 Stopping criterion
The convergence is usually considered to be obtained at the k th iteration when the error between tensor
, and the tensor reconstructed from the estimated loading matrices, does not significantly change between iterations k and k + 1.
However, in the complex case, the phase of the entries of loading matrices found at the end of the algorithm – as defined by the stopping criterion above - is different from the original. To remedy this problem, we propose a new performance criterion in order to minimize the distance between the original and the estimated matrices. Although this criterion is not usable when actual loading matrices are unknown, it still permits to assess the performances effectively attained.