Using tensor contractions to derive the structure of slice-wise multiplications of tensors with applications to space–time Khatri–Rao coding for MIMO-OFDM systems

The slice-wise multiplication of two tensors is required in a variety of tensor decompositions (including PARAFAC2 and PARATUCK2) and is encountered in many applications, including the analysis of multidimensional biomedical data (EEG, MEG, etc.) or multi-carrier multiple-input multiple-output (MIMO) systems. In this paper, we propose a new tensor representation that is not based on a slice-wise (matrix) description, but can be represented by a double contraction of two tensors. Such a double contraction of two tensors can be efficiently calculated via generalized unfoldings. It leads to new tensor models of the investigated system that do not depend on the chosen unfolding (in contrast to matrix models) and reveal the tensor structure of the data model, such that all possible unfoldings can be seen at the same time. As an example, we apply this new concept to the design of new receivers for multi-carrier MIMO systems in wireless communications. In particular, we consider MIMO-orthogonal frequency division multiplexing (OFDM) systems with and without Khatri–Rao coding. The proposed receivers exploit the channel correlation between adjacent subcarriers, require the same amount of training symbols as traditional OFDM techniques, but have an improved performance in terms of the symbol error rate. Furthermore, we show that the spectral efficiency of the Khatri–Rao-coded MIMO-OFDM can be increased by introducing cross-coding such that the “coding matrix” also contains useful information symbols. Considering this transmission technique, we derive a tensor model and two types of receivers for cross-coded MIMO-OFDM systems using the double contraction of two tensors.

tools to derive such an explicit tensor structure in general. The received data in a MIMO-OFDM system are derived from such an explicit tensor structure, which is efficiently exploited at the receiver for a joint channel and symbol estimation. More specifically, we first present the double contraction between an uncoded signal tensor and a channel tensor for OFDM systems, yielding the same spectral efficiency as matrix-based approaches (since no additional spreading is used) [17]. We propose an application of the double contraction operator to Khatri-Rao-coded MIMO-OFDM systems [18]. Due to the Khatri-Rao coding, the signal tensor has a richer structure and can be recast as a constrained CP-like model. In fact, the Khatri-Rao space-time coding concept has been introduced in [19]. Later, it has been extended in [20] to Khatri-Rao space-time-frequency coding. In contrast to the state of the art [4,13,14,20], in this work we exploit the structure of the channel and the contraction properties using the transmit signal tensor and the known coding matrix to propose a receiver based on the LS-KRF. In addition, we reduce the number of required pilot symbols by exploiting the correlation of the channel in the frequency domain, which has not been exploited in these previous works. Finally, we propose a more spectrally efficient cross-coding model for MIMO-OFDM systems. In this case, the known and fixed Khatri-Rao coding matrix is eliminated, and two useful symbol matrices are cross-coded by means of the Khatri-Rao product. By exploiting the CP-like tensor structure of the received signal, we also design two types of receivers for the cross-coded MIMO-OFDM systems.
This paper is organized as follows. In Sect. 2, we introduce the tensor algebra notation and provide the mathematical tools to derive an explicit tensor structure from the slice-wise multiplication of two tensors. Section 3 describes the system model using the double contraction formalism for the traditional MIMO-OFDM transmission. In Sect. 4, we recast the tensor signal model for the Khatri-Rao-coded MIMO-OFDM case and present the two closed-form receiver designs for this system, which are based on the Khatri-Rao factorization. In Sect. 5, we consider a cross-coded MIMO-OFDM system with enhanced spectral efficiency and derive the corresponding semi-blind receivers. A discussion on the computational complexity of the different receivers is also carried out. In Sect. 6, numerical results are presented, and the paper is concluded in Sect. 7.

Notation
We use the following notation. Scalars are denoted either as capital or lower-case italic letters, $A$, $a$. Vectors and matrices are denoted as bold-faced lower-case and capital letters, $\mathbf{a}$, $\mathbf{A}$, respectively. Tensors are represented by bold-faced calligraphic letters $\mathcal{A}$. The superscripts $^{\mathrm{T}}$, $^{\mathrm{H}}$, $^{-1}$, and $^{+}$ denote transposition, Hermitian transposition, matrix inversion, and Moore-Penrose pseudo-inversion, respectively. The outer product, Kronecker product, and Khatri-Rao product are denoted as $\circ$, $\otimes$, and $\diamond$, respectively. Moreover, we denote the Hadamard product (element-wise multiplication) and the inverse Hadamard product (element-wise division) between two arrays of equal dimensions as $\odot$ and $\oslash$, respectively. The operators $\|\cdot\|_{\mathrm{F}}$ and $\|\cdot\|_{\mathrm{H}}$ denote the Frobenius norm and the higher order norm of a tensor, respectively, where the latter is defined as the square root of the sum of the squared absolute values of its elements. Moreover, the $n$-mode product between a tensor $\mathcal{A} \in \mathbb{C}^{I_1 \times I_2 \times \cdots \times I_N}$ and a matrix $\mathbf{B} \in \mathbb{C}^{J \times I_n}$ is denoted as $\mathcal{A} \times_n \mathbf{B}$, for $n = 1, 2, \ldots, N$ [21]. The identity $N$-way tensor of dimension $R \times R \times \cdots \times R$ is denoted as $\mathcal{I}_{N,R}$. Similarly, an identity matrix of dimension $R \times R$ is denoted as $\mathbf{I}_R$ and a vector of ones of length $R$ as $\mathbf{1}_R$. The $n$th three-mode slice of a tensor $\mathcal{A} \in \mathbb{C}^{I \times J \times N}$ is denoted as $\mathcal{A}_{(.,.,n)}$ and, accordingly, one element of this tensor is denoted as $\mathcal{A}_{(i,j,n)}$. The operator $\mathrm{diag}(\cdot)$ transforms a vector into a diagonal matrix and the operator $\mathrm{vec}(\cdot)$ transforms a matrix into a vector. Note that we distinguish between a super-diagonal (or identity) tensor and a diagonal tensor. A diagonal tensor is a tensor that consists of diagonal slices along one dimension. For instance, a diagonal tensor $\mathcal{D}_A \in \mathbb{C}^{M \times N \times N}$ that is diagonal along the first dimension has diagonal one-mode slices, i.e., $\mathcal{D}_{A(m,.,.)} = \mathrm{diag}(\mathbf{a}_m)$, for $m = 1, \ldots, M$, where $\mathbf{a}_m$ is an $N$-dimensional vector.
The concatenation of two tensors along their $m$th dimension is denoted as $\sqcup_m$ [22]. For two tensors $\mathcal{A} \in \mathbb{C}^{I \times I_2 \times I_3}$ and $\mathcal{B} \in \mathbb{C}^{J \times I_2 \times I_3}$, the concatenation along the first dimension yields $\mathcal{A} \sqcup_1 \mathcal{B} \in \mathbb{C}^{(I+J) \times I_2 \times I_3}$.

The CP decomposition and generalized tensor unfoldings
The CP tensor decomposition decomposes a given tensor into the minimum number of rank-one components. The CP decomposition of a four-way, rank-$R$ tensor $\mathcal{A} \in \mathbb{C}^{I \times J \times M \times N}$ can be written as $$\mathcal{A} = \mathcal{I}_{4,R} \times_1 \mathbf{F}_1 \times_2 \mathbf{F}_2 \times_3 \mathbf{F}_3 \times_4 \mathbf{F}_4, \quad (1)$$ where $\mathbf{F}_1 \in \mathbb{C}^{I \times R}$, $\mathbf{F}_2 \in \mathbb{C}^{J \times R}$, $\mathbf{F}_3 \in \mathbb{C}^{M \times R}$, and $\mathbf{F}_4 \in \mathbb{C}^{N \times R}$ are the factor matrices [21,23]. In addition to the $n$-mode unfoldings, generalized matrix unfoldings can be defined by using two subsets of any of the $N$ dimensions [24,25]. For instance, the set of modes $(1, 2, \ldots, N)$ of an $N$-way tensor $\mathcal{A}$ can be divided into two non-overlapping subsets with cardinality $P$ and $N - P$, $\alpha^{(1)} = [\alpha_1 \ldots \alpha_P]$ and $\alpha^{(2)} = [\alpha_{P+1} \ldots \alpha_N]$, respectively. This leads to the generalized unfolding $[\mathcal{A}]_{(\alpha^{(1)},\alpha^{(2)})}$, where the indices contained in $\alpha^{(1)}$ vary along the rows and the indices contained in $\alpha^{(2)}$ vary along the columns. Here, the index $\alpha_1$ varies the fastest between the rows, the index $\alpha_{P+1}$ varies the fastest between the columns, $P$ is any number between one and $N$, and $\alpha_n$ is any of the tensor dimensions. For instance, let us assume the four-way tensor $\mathcal{A} \in \mathbb{C}^{I \times J \times M \times N}$ defined in Eq. (1). In the generalized unfolding $[\mathcal{A}]_{([1,2],[3,4])}$ the first mode varies faster than the second mode along the rows and the third mode varies faster than the fourth mode along the columns. Moreover, for a tensor with a CP structure, its unfoldings and generalized unfoldings can be expressed in terms of the factor matrices. For instance, the generalized unfolding $[\mathcal{A}]_{([1,2],[3,4])}$ of the tensor $\mathcal{A}$ satisfies [18,25] $$[\mathcal{A}]_{([1,2],[3,4])} = (\mathbf{F}_2 \diamond \mathbf{F}_1)(\mathbf{F}_4 \diamond \mathbf{F}_3)^{\mathrm{T}}.$$ In a similar way, the rest of the tensor unfoldings and generalized unfoldings can be defined.
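The relation between a CP tensor and its generalized unfolding can be checked numerically. The following is a minimal NumPy sketch; the helper names `khatri_rao` and `unfold` are hypothetical (not from the paper), and the identity being tested, $[\mathcal{A}]_{([1,2],[3,4])} = (\mathbf{F}_2 \diamond \mathbf{F}_1)(\mathbf{F}_4 \diamond \mathbf{F}_3)^{\mathrm{T}}$, is written here under the convention stated in the text that the first listed mode varies fastest.

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product; the row index of B varies fastest."""
    return np.einsum("ir,jr->ijr", A, B).reshape(A.shape[0] * B.shape[0], -1)

def unfold(T, rows, cols):
    """Generalized unfolding with 0-based mode lists; the first listed mode varies fastest."""
    nr = int(np.prod([T.shape[m] for m in rows]))
    nc = int(np.prod([T.shape[m] for m in cols]))
    return np.reshape(np.transpose(T, rows + cols), (nr, nc), order="F")

rng = np.random.default_rng(0)
I, J, M, N, R = 2, 3, 4, 5, 3
F1, F2, F3, F4 = (rng.standard_normal((d, R)) for d in (I, J, M, N))

# Four-way CP tensor: sum of R rank-one terms.
A = np.einsum("ir,jr,mr,nr->ijmn", F1, F2, F3, F4)

lhs = unfold(A, [0, 1], [2, 3])                  # [A]_([1,2],[3,4])
rhs = khatri_rao(F2, F1) @ khatri_rao(F4, F3).T  # (F2 ⋄ F1)(F4 ⋄ F3)^T
print(np.allclose(lhs, rhs))
```

Using column-major (`order="F"`) reshaping after a mode permutation is one simple way to realize the "first index varies fastest" convention.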

Tensor contraction
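The contraction $\mathcal{A} \bullet_n^m \mathcal{C}$ between two tensors $\mathcal{A} \in \mathbb{C}^{I_1 \times I_2 \times \cdots \times I_N}$ and $\mathcal{C} \in \mathbb{C}^{J_1 \times J_2 \times \cdots \times J_N}$ represents an inner product of the $n$th mode of $\mathcal{A}$ with the $m$th mode of $\mathcal{C}$, provided that $I_n = J_m$ [26]. Contraction along several modes of compatible dimensions is also possible [3,4], and accordingly the contraction along two modes is denoted as $\mathcal{A} \bullet_{n,k}^{m,l} \mathcal{C}$. More specifically, the double contraction between the tensors $\mathcal{A} \in \mathbb{C}^{I \times J \times M \times N}$ and $\mathcal{C} \in \mathbb{C}^{M \times N \times K}$ is defined as [26] $$(\mathcal{A} \bullet_{3,4}^{1,2} \mathcal{C})_{(i,j,k)} = \sum_{m=1}^{M} \sum_{n=1}^{N} \mathcal{A}_{(i,j,m,n)} \, \mathcal{C}_{(m,n,k)}.$$ This example represents a contraction of the third and fourth mode of $\mathcal{A}$ with the first and second mode of $\mathcal{C}$, respectively.
Using the concept of the generalized unfoldings, it can be shown that the tensor contraction satisfies $$[\mathcal{A} \bullet_{3,4}^{1,2} \mathcal{C}]_{([1,2],[3])} = [\mathcal{A}]_{([1,2],[3,4])} \, [\mathcal{C}]_{([1,2],[3])}. \quad (2)$$ In the generalized unfolding $[\mathcal{A}]_{([1,2],[3,4])}$ the first mode varies faster than the second mode between the rows and the third mode varies faster than the fourth mode between the columns.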
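To illustrate that a double contraction reduces to a product of generalized unfoldings, the following sketch compares an index-wise evaluation (via `numpy.einsum`) with the matrix product $[\mathcal{A}]_{([1,2],[3,4])} [\mathcal{C}]_{([1,2],[3])}$. The helper `unfold` is a hypothetical name, and the dimensions are arbitrary test values.

```python
import numpy as np

def unfold(T, rows, cols):
    """Generalized unfolding with 0-based mode lists; the first listed mode varies fastest."""
    nr = int(np.prod([T.shape[m] for m in rows]))
    nc = int(np.prod([T.shape[m] for m in cols]))
    return np.reshape(np.transpose(T, rows + cols), (nr, nc), order="F")

rng = np.random.default_rng(1)
I, J, M, N, K = 2, 3, 4, 5, 6
A = rng.standard_normal((I, J, M, N))
C = rng.standard_normal((M, N, K))

# Double contraction: modes 3 and 4 of A with modes 1 and 2 of C.
T = np.einsum("ijmn,mnk->ijk", A, C)

# Same result through one matrix product of generalized unfoldings.
T_unf = unfold(A, [0, 1], [2, 3]) @ unfold(C, [0, 1], [2])
print(np.allclose(unfold(T, [0, 1], [2]), T_unf))
```

The matrix-product form is what makes the contraction efficient to compute, since it maps onto a single optimized matrix multiplication.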

Hadamard product via tensor contraction
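First, let us consider a Hadamard product (element-wise multiplication) between two vectors $\mathbf{a} \in \mathbb{C}^{M \times 1}$ and $\mathbf{b} \in \mathbb{C}^{M \times 1}$, $\mathbf{c}_{(m)} = \mathbf{a}_{(m)} \mathbf{b}_{(m)}$, $\forall m = 1, \ldots, M$ ($\mathbf{c} \in \mathbb{C}^{M \times 1}$). The Hadamard product can be expressed via the multiplication of a diagonal matrix and a vector, i.e., $\mathbf{a} \odot \mathbf{b} = \mathrm{diag}(\mathbf{a})\mathbf{b} = \mathrm{diag}(\mathbf{b})\mathbf{a}$. Using the fact that a matrix multiplication is equivalent to the contraction of the second mode of the matrix with the first mode of the vector, the Hadamard product can be written as $\mathbf{a} \odot \mathbf{b} = \mathrm{diag}(\mathbf{a}) \bullet_2^1 \mathbf{b} = \mathrm{diag}(\mathbf{b}) \bullet_2^1 \mathbf{a}$.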

Slice-wise multiplication via tensor contraction
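A slice-wise multiplication between two tensors $\mathcal{A} \in \mathbb{C}^{M \times N \times K}$ and $\mathcal{B} \in \mathbb{C}^{N \times J \times K}$ is defined as $\mathcal{T}_{1(.,.,k)} = \mathcal{A}_{(.,.,k)} \mathcal{B}_{(.,.,k)}$, $\forall k = 1, \ldots, K$. We depict this slice-wise multiplication in Fig. 1. To express this slice-wise multiplication as a contraction, we can diagonalize $\mathcal{B}$ to obtain a tensor $\mathcal{T}_2$ or, alternatively, diagonalize $\mathcal{A}$ to obtain $\mathcal{T}_3 = \mathcal{D}_A \bullet_{2,4}^{1,3} \mathcal{B} \in \mathbb{C}^{M \times K \times J}$ with $\mathcal{D}_{A(m,n,k,k)} = \mathcal{A}_{(m,n,k)}$ as diagonal elements (the nonzero elements of $\mathcal{D}_A$). Note that the tensors $\mathcal{T}_1$, $\mathcal{T}_2$, and $\mathcal{T}_3$ contain the same elements, but have permuted dimensions. However, the permuted order of the dimensions is not relevant, because we always explicitly declare which dimension is multiplied or unfolded.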
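The equivalence between a slice-wise multiplication and a double contraction with a diagonalized tensor can be verified with a few lines of NumPy. This is a sketch under the assumption that the diagonalized tensor $\mathcal{D}_A \in \mathbb{C}^{M \times N \times K \times K}$ carries $\mathcal{A}_{(m,n,k)}$ at position $(m,n,k,k)$ and that its second and fourth modes are contracted with the first and third modes of $\mathcal{B}$; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, J, K = 2, 3, 4, 5
A = rng.standard_normal((M, N, K))
B = rng.standard_normal((N, J, K))

# Reference: multiply the K slices one by one (result is M x J x K).
T1 = np.stack([A[:, :, k] @ B[:, :, k] for k in range(K)], axis=2)

# Diagonalized tensor D_A (M x N x K x K) with D_A[m, n, k, k] = A[m, n, k].
D_A = np.zeros((M, N, K, K))
idx = np.arange(K)
D_A[:, :, idx, idx] = A

# Double contraction of modes 2 and 4 of D_A with modes 1 and 3 of B (result is M x K x J).
T3 = np.einsum("mnkl,njl->mkj", D_A, B)

# Same elements as T1, with the last two dimensions permuted.
print(np.allclose(T3, T1.transpose(0, 2, 1)))
```

The explicit loop and the contraction give the same numbers; only the contraction form exposes the structure as a single tensor expression.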

Representation of diagonal matrices and diagonal tensors in terms of Khatri-Rao products
An explicit expression of the diagonalized tensor can be obtained by expressing its generalized unfolding in terms of a Khatri-Rao product with an identity matrix. First, let us consider the column vector $\mathbf{a} \in \mathbb{C}^{M}$. It can be easily shown that $\mathrm{diag}(\mathbf{a}) = \mathbf{I}_M \diamond \mathbf{a}^{\mathrm{T}}$. Next, let us consider the reshaping of the matrix $\mathbf{A} \in \mathbb{C}^{M \times N}$ into a diagonal tensor $\mathcal{D}^{(A)} = \mathcal{I}_{3,M} \times_3 \mathbf{A}^{\mathrm{T}}$. By studying the resulting tensor structure, the tensor unfoldings, and the properties of the Khatri-Rao product, we get $[\mathcal{D}^{(A)}]_{([3,2],[1])} = \mathbf{I}_M \diamond \mathbf{A}^{\mathrm{T}}$. Likewise, the generalized unfoldings of a diagonalized three-way tensor can be written as a Khatri-Rao product with an identity matrix. The expression of the diagonalized tensor in terms of its generalized unfoldings and the Khatri-Rao product with an identity matrix can also be obtained for $N$-way tensors. It is useful to note that there exists a link between the diagonalized tensor structures and their corresponding generalized unfoldings. The latter can always be expressed as a Khatri-Rao product between an identity matrix and a generalized unfolding of the tensor to be diagonalized, where the dimensions that are diagonalized are in the columns of the second matrix. This notation will be used later in this paper, and it is given in Table 1.
The element-wise or slice-wise multiplication between two arrays of the same order can be written in terms of a contraction if the unaffected mode vectors are transformed into a diagonal matrix (by adding an additional array dimension). This diagonalization can be performed using the Khatri-Rao product as shown in Table 1. As an example, please refer to the transformation of Eq. (4) to the equations at the beginning of Sect. 3.3 in this paper.
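The diagonalization via a Khatri-Rao product with an identity matrix can be illustrated numerically. The sketch below checks two identities that follow from the definitions above: $\mathrm{diag}(\mathbf{a}) = \mathbf{I}_M \diamond \mathbf{a}^{\mathrm{T}}$, and, for the diagonal tensor $\mathcal{I}_{3,M} \times_3 \mathbf{A}^{\mathrm{T}}$, an unfolding with the third and second modes in the rows that equals $\mathbf{I}_M \diamond \mathbf{A}^{\mathrm{T}}$. The specific unfolding chosen here is one consistent choice derived for this example, and the helper names are hypothetical.

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product; the row index of B varies fastest."""
    return np.einsum("ir,jr->ijr", A, B).reshape(A.shape[0] * B.shape[0], -1)

def unfold(T, rows, cols):
    """Generalized unfolding with 0-based mode lists; the first listed mode varies fastest."""
    nr = int(np.prod([T.shape[m] for m in rows]))
    nc = int(np.prod([T.shape[m] for m in cols]))
    return np.reshape(np.transpose(T, rows + cols), (nr, nc), order="F")

rng = np.random.default_rng(2)
M, N = 3, 4
a = rng.standard_normal(M)
A = rng.standard_normal((M, N))

# diag(a) as a Khatri-Rao product of the identity with the row vector a^T.
diag_ok = np.allclose(np.diag(a), khatri_rao(np.eye(M), a[None, :]))

# Diagonal tensor D[i, j, n] = A[i, n] for i == j (zero otherwise), i.e. I_{3,M} x_3 A^T.
I3 = np.zeros((M, M, M))
idx = np.arange(M)
I3[idx, idx, idx] = 1.0
D = np.einsum("ijm,nm->ijn", I3, A.T)

# Unfolding with modes (3, 2) in the rows and mode 1 in the columns.
unf_ok = np.allclose(unfold(D, [2, 1], [0]), khatri_rao(np.eye(M), A.T))
print(diag_ok, unf_ok)
```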

MIMO-OFDM
We assume a MIMO-OFDM system with $M_T$ transmit and $M_R$ receive antennas. One OFDM block consists of $N$ samples, which equals the discrete Fourier transform (DFT) length, under the assumption that all $N$ subcarriers are used for data transmission. If guard subcarriers are used, i.e., not all subcarriers are used for data transmission, the number of OFDM samples is smaller than the DFT length. All signals and equations used for the following derivation are in the frequency domain. Moreover, $N$ is the number of subcarriers and $K$ denotes the number of transmitted frames, where each frame consists of $N$ symbol periods. The received signal in the frequency domain $\tilde{\mathcal{Y}} \in \mathbb{C}^{N \times M_R \times K}$ after the removal of the cyclic prefix is defined by means of the contraction operator $$\tilde{\mathcal{Y}} = \mathcal{H} \bullet_{2,4}^{1,2} \mathcal{S} + \tilde{\mathcal{N}} = \tilde{\mathcal{Y}}_0 + \tilde{\mathcal{N}}. \quad (4)$$ We use $\sim$ to distinguish the frequency domain from the time domain, i.e., $\tilde{\mathcal{Y}} = \mathcal{Y} \times_1 \mathbf{F}_N$, where $\mathbf{F}_N \in \mathbb{C}^{N \times N}$ is the DFT matrix and $\mathcal{Y}$ is the received signal in the time domain. The transmit signal tensor is denoted as $\mathcal{S} \in \mathbb{C}^{N \times M_T \times K}$ and $\tilde{\mathcal{N}} \in \mathbb{C}^{N \times M_R \times K}$ represents the additive white Gaussian noise in the frequency domain. The tensor $\tilde{\mathcal{Y}}_0 \in \mathbb{C}^{N \times M_R \times K}$ represents the noiseless received signal in the frequency domain after the removal of the cyclic prefix. The frequency-selective propagation channel is represented by a channel tensor $\mathcal{H} \in \mathbb{C}^{N \times N \times M_R \times M_T}$, as we propose in [18], the structure of which is detailed as follows.

Channel tensor
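We assume that the frequency-selective channel between each transmit and receive antenna has an impulse response of $L$ taps, whose frequency-domain representation is obtained by multiplying the impulse response with the matrix $\mathbf{F}_{N \times L}$ [10,11]. Here, the matrix $\mathbf{F}_{N \times L} \in \mathbb{C}^{N \times L}$ contains the first $L$ columns of the DFT matrix of size $N \times N$. Collecting all the channel matrices in a four-way channel tensor $\mathcal{H}$, we get $$\mathcal{H}_{(.,.,m_R,m_T)} = \mathrm{diag}\big(\mathbf{h}^{(m_R,m_T)}\big) \in \mathbb{C}^{N \times N}. \quad (5)$$ For each receive-transmit antenna pair, the channel transfer matrix is a diagonal matrix that is represented by the corresponding slice of the tensor $\mathcal{H}$ as shown in (5). The vector $\mathbf{h}^{(m_R,m_T)} \in \mathbb{C}^{N \times 1}$ contains the frequency domain channel coefficients. An example of a MIMO system with $M_T = 2$ transmit antennas and $M_R = 3$ receive antennas and the corresponding channel vectors is depicted in Fig. 2. We assume that the channel stays constant during the $K$ frames. Note that only in the case of cyclic prefix OFDM does the channel tensor in the frequency domain contain diagonal matrices for each receive-transmit antenna pair. In a general multi-carrier system, the frequency domain channel matrix is not necessarily diagonal. However, Eq. (4) is still satisfied, which means that our general model is valid for any multi-carrier MIMO system (not only OFDM-based), including systems without orthogonality in the frequency domain and systems with different types of coding. In (5), we have defined the channel tensor. However, up to this point, we have not revealed the explicit tensor structure. In order to do so, let us first assume that all channel transfer matrices for the $m_T$th transmit and all receive antennas are collected in a three-way tensor $\mathcal{H}_R^{(m_T)} \in \mathbb{C}^{N \times N \times M_R}$ with diagonal three-mode slices. Based on this diagonal structure, the tensor $\mathcal{H}_R^{(m_T)}$ can be written as the following CP decomposition: $$\mathcal{H}_R^{(m_T)} = \mathcal{I}_{3,N} \times_1 \mathbf{I}_N \times_2 \mathbf{I}_N \times_3 \mathbf{H}^{(m_T)},$$ where $\mathbf{H}^{(m_T)} = [\mathbf{h}^{(1,m_T)} \cdots \mathbf{h}^{(M_R,m_T)}]^{\mathrm{T}} \in \mathbb{C}^{M_R \times N}$. The complete four-way channel tensor, defined in Eq. (5), can be obtained by concatenating the $\mathcal{H}_R^{(m_T)}$ tensors along the fourth dimension. Hence, the four-way channel tensor $\mathcal{H}$ can be expressed as $$\mathcal{H} = \sum_{m_T=1}^{M_T} \mathcal{D} \times_1 \mathbf{I}_N \times_2 \mathbf{I}_N \times_3 \mathbf{H}^{(m_T)} \times_4 \mathbf{e}_{m_T}. \quad (8)$$ Note that $\mathcal{H}$ satisfies a very special block term decomposition (BTD), where $\mathcal{D}_{(.,.,.,1)} = \mathcal{I}_{3,N} \in \mathbb{R}^{N \times N \times N \times 1}$ ($\mathcal{D} = \mathcal{I}_{4,1} \otimes \mathcal{I}_{3,N}$) and $\mathbf{e}_{m_T} \in \mathbb{R}^{M_T \times 1}$ is a pinning vector.
The BTD decomposes a tensor into block terms of smaller $n$-mode ranks [27]. We prove the BTD structure of the channel tensor $\mathcal{H}$ in the Appendix. There, we also show that the $([1,3],[2,4])$ generalized unfolding of the channel tensor can be expressed as $$[\mathcal{H}]_{([1,3],[2,4])} = \tilde{\mathbf{H}} \diamond (\mathbf{1}_{M_T}^{\mathrm{T}} \otimes \mathbf{I}_N), \quad (9)$$ where $\tilde{\mathbf{H}} \in \mathbb{C}^{M_R \times N M_T}$ is a matrix containing all nonzero elements of the tensor $\mathcal{H}$ and it is defined as $$\tilde{\mathbf{H}} = [\mathbf{H}^{(1)} \cdots \mathbf{H}^{(M_T)}], \quad (10)$$ with $\mathbf{H}^{(m_T)} \in \mathbb{C}^{M_R \times N}$ collecting the frequency domain channel vectors $\mathbf{h}^{(m_R,m_T)}$ of the $m_T$th transmit antenna as its rows.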
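The structure of the channel tensor and of its generalized unfolding can be made concrete with a small numerical example. The sketch below builds a channel tensor with diagonal slices from random $L$-tap impulse responses (the tap values and all variable names are assumptions made for illustration) and checks that its $([1,3],[2,4])$ unfolding equals $\tilde{\mathbf{H}} \diamond (\mathbf{1}_{M_T}^{\mathrm{T}} \otimes \mathbf{I}_N)$, with the subcarrier index varying fastest along the columns of $\tilde{\mathbf{H}}$.

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product; the row index of B varies fastest."""
    return np.einsum("ir,jr->ijr", A, B).reshape(A.shape[0] * B.shape[0], -1)

def unfold(T, rows, cols):
    """Generalized unfolding with 0-based mode lists; the first listed mode varies fastest."""
    nr = int(np.prod([T.shape[m] for m in rows]))
    nc = int(np.prod([T.shape[m] for m in cols]))
    return np.reshape(np.transpose(T, rows + cols), (nr, nc), order="F")

rng = np.random.default_rng(3)
N, MR, MT, L = 8, 3, 2, 3

# First L columns of the N x N DFT matrix and random L-tap impulse responses (assumed).
F_NL = np.fft.fft(np.eye(N))[:, :L]
h_time = rng.standard_normal((L, MR, MT)) + 1j * rng.standard_normal((L, MR, MT))
h_freq = np.einsum("nl,lrt->nrt", F_NL, h_time)  # N x MR x MT

# Channel tensor with a diagonal N x N slice per antenna pair.
H = np.zeros((N, N, MR, MT), dtype=complex)
n = np.arange(N)
H[n, n] = h_freq

# Matrix of nonzero elements (MR x N*MT), subcarrier index fastest along the columns.
H_mat = np.reshape(h_freq.transpose(1, 0, 2), (MR, N * MT), order="F")
constraint = np.kron(np.ones((1, MT)), np.eye(N))  # 1^T ⊗ I_N

ok = np.allclose(unfold(H, [0, 2], [1, 3]), khatri_rao(H_mat, constraint))
print(ok)
```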

Data transmission
The signal tensor $\mathcal{S}$ in Eq. (4) contains all data symbols in the frequency domain that are transmitted on $N$ subcarriers, $M_T$ transmit antennas, and $K$ frames. For notational simplicity, we define the following block matrix $\tilde{\mathbf{S}}$ as the transpose of the three-mode unfolding of $\mathcal{S}$ $$\tilde{\mathbf{S}} = [\mathbf{S}^{(1)} \cdots \mathbf{S}^{(M_T)}] \in \mathbb{C}^{K \times N M_T}, \quad (11)$$ where $\mathbf{S}^{(m_T)} \in \mathbb{C}^{K \times N}$ contains the symbols transmitted via the $m_T$th antenna.
Moreover, we assume that the symbol matrix consists of data and pilot symbols, $\tilde{\mathbf{S}} = \tilde{\mathbf{S}}_d + \tilde{\mathbf{S}}_p$. The matrices $\tilde{\mathbf{S}}_d$ and $\tilde{\mathbf{S}}_p$ represent the data symbols and the pilot symbols, respectively. The matrix $\tilde{\mathbf{S}}_d$ contains zeros at the positions of the pilot symbols. Accordingly, the matrix $\tilde{\mathbf{S}}_p$ contains nonzero elements only at the pilot positions. Typically, there are three ways of arranging the pilot symbols within the OFDM blocks (block, comb, and lattice-type) [7]. We assume a comb-type arrangement, where the pilot symbols are placed at non-consecutive positions with equidistant spacing in the time and the frequency domains, for each antenna. The spacing in the time domain is denoted by $\Delta K$. Moreover, we assume a spacing of $\Delta F$ subcarriers between two pilot symbols in the frequency domain. Furthermore, there are positions where neither pilot symbols nor data symbols are allowed to be transmitted. These positions are reserved for the pilot symbols corresponding to the remaining antennas. This results in $M_T \lfloor N / \Delta F \rfloor$ pilot symbols per frame. In comparison, other publications such as [4,12-14] use $N M_T$ pilot symbols per frame. By exploiting the channel correlation among adjacent subcarriers, a reduced number of pilot symbols can be used for channel estimation.
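As a quick worked example of the pilot overhead, the snippet below evaluates the comb-type pilot count, the number of transmit antennas times the floor of the number of subcarriers divided by the frequency-domain pilot spacing, and compares it with the $N M_T$ pilots per frame used by the referenced schemes. The function name is hypothetical; the numbers correspond to a 2 × 2 system with 128 subcarriers and pilots on every third subcarrier.

```python
def pilots_per_frame(num_subcarriers, spacing_f, num_tx):
    """Comb-type pilot count: one pilot every `spacing_f` subcarriers per transmit antenna."""
    return num_tx * (num_subcarriers // spacing_f)

N, MT, spacing_f = 128, 2, 3
comb = pilots_per_frame(N, spacing_f, MT)  # 2 * floor(128 / 3) = 84
full = N * MT                              # one pilot per subcarrier and antenna = 256
print(comb, full, round(comb / full, 3))
```

In this setting the comb-type arrangement needs roughly one third of the pilots of the full arrangement in a pilot-carrying frame.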

Receiver design
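Using the property of the generalized unfoldings in Eq. (2), the received signal in Eq. (4) becomes $$[\tilde{\mathcal{Y}}]_{([1,2],[3])} = [\mathcal{H}]_{([1,3],[2,4])} \, [\mathcal{S}]_{([1,2],[3])} + [\tilde{\mathcal{N}}]_{([1,2],[3])}.$$ Next, by substituting the corresponding tensor unfoldings in the above equation, with $\tilde{\mathbf{H}}$ and $\tilde{\mathbf{S}}$ as defined in Eqs. (10) and (11), we get $$[\tilde{\mathcal{Y}}]_{([1,2],[3])} = \big(\tilde{\mathbf{H}} \diamond (\mathbf{1}_{M_T}^{\mathrm{T}} \otimes \mathbf{I}_N)\big) \tilde{\mathbf{S}}^{\mathrm{T}} + [\tilde{\mathcal{N}}]_{([1,2],[3])}.$$ The above equation represents an unfolding of a noisy observation of a low-rank tensor with a CP structure. By applying an inverse unfolding for the received signal in the frequency domain after the removal of the cyclic prefix, we get the desired tensor description of the received data tensor $$\tilde{\mathcal{Y}} = \mathcal{I}_{3,N M_T} \times_1 (\mathbf{1}_{M_T}^{\mathrm{T}} \otimes \mathbf{I}_N) \times_2 \tilde{\mathbf{H}} \times_3 \tilde{\mathbf{S}} + \tilde{\mathcal{N}}. \quad (12)$$ Note that this model is a constrained CP-like model where the one-mode factor is a known constraint matrix. Our goal is to exploit (12) to jointly estimate the channel and the symbols, i.e., $\tilde{\mathbf{H}}$ and $\tilde{\mathbf{S}}$. The author of [16] proposes a similar model for the received signal of FBMC systems. In contrast to the model derived in this paper from contractions, the model in [16] is derived from the PARATUCK2 model. This means that the received signal should fit the PARATUCK2 decomposition in order to satisfy the received signal structure. On the other hand, the proposed derivation based on contractions in (4) is more general and it holds without such an assumption. More specifically, the proposed tensor contraction formalism that defines the signal model in Eq. (4) does not require the matrix slices of $\mathcal{H}$ defined in Eq. (5) to be diagonal. Therefore, the proposed model and the derived algorithms remain valid for nonorthogonal multi-carrier systems with an arbitrary structure of the equivalent channel tensor in Eq. (5). This aspect is not captured by the tensor modeling approach of [16].
Using the prior knowledge of the pilot symbols and their positions, the channel in the frequency domain can be estimated. Naturally, the channel is estimated only at those subcarrier positions where the pilot symbols are located. Afterwards, an interpolation is applied to get the complete channel estimate. Alternatively, as shown in [10,11], the channel can be first estimated in the time domain and then transformed into the frequency domain. Either way, this leads to a pilot-based channel estimate that we denote as $\mathbf{H}_p$, or $\mathbf{H}_{p1}$. The pilot-based channel estimate is then used to estimate the data symbols. In the remainder of this section, we discuss different ways to estimate the symbols. We use the pilot-based channel estimate to initialize the proposed algorithms. Traditionally, the estimate of the symbols is obtained in the frequency domain with a ZF receiver. In this case, the symbols are calculated by inverting the channel matrix for each subcarrier individually. Alternatively, if we compute the one-mode unfolding of the tensor $\tilde{\mathcal{Y}}$ in Eq. (12), we get $$[\tilde{\mathcal{Y}}]_{([1],[2,3])} = (\mathbf{1}_{M_T}^{\mathrm{T}} \otimes \mathbf{I}_N)(\tilde{\mathbf{S}} \diamond \tilde{\mathbf{H}})^{\mathrm{T}} + [\tilde{\mathcal{N}}]_{([1],[2,3])}.$$ Taking into account the structure of the matrices $\tilde{\mathbf{H}}$ in (10) and $\tilde{\mathbf{S}}$ in (11), the one-mode unfolding becomes $$[\tilde{\mathcal{Y}}]_{([1],[2,3])} = \sum_{m_T=1}^{M_T} \big(\mathbf{S}^{(m_T)} \diamond \mathbf{H}^{(m_T)}\big)^{\mathrm{T}} + [\tilde{\mathcal{N}}]_{([1],[2,3])}.$$ After transposition and omitting the noise term, we get $$[\tilde{\mathcal{Y}}]_{([1],[2,3])}^{\mathrm{T}} \approx \sum_{m_T=1}^{M_T} \mathbf{S}^{(m_T)} \diamond \mathbf{H}^{(m_T)}.$$ This sum of Khatri-Rao products can be resolved in a column-wise fashion. Let $\tilde{\mathbf{y}}_n$ denote the $n$th column of $[\tilde{\mathcal{Y}}]_{([1],[2,3])}^{\mathrm{T}}$. After reshaping this vector into the matrix $\tilde{\mathbf{Y}}_n \in \mathbb{C}^{M_R \times K}$, such that $\tilde{\mathbf{y}}_n = \mathrm{vec}(\tilde{\mathbf{Y}}_n)$, it is easy to see that this matrix satisfies $$\tilde{\mathbf{Y}}_n \approx \mathbf{H}_n \mathbf{S}_n, \quad (13)$$ where $\mathbf{H}_n = \mathcal{H}_{(n,n,.,.)} \in \mathbb{C}^{M_R \times M_T}$ and $\mathbf{S}_n = \mathcal{S}_{(n,.,.)} \in \mathbb{C}^{M_T \times K}$ are the $n$th slices of the channel and the signal tensor, respectively. Note that $\tilde{\mathbf{Y}}_n$ is the $n$th slice $\tilde{\mathcal{Y}}_{(n,.,.)}$. Using the pseudo inverse of the channel, we get the traditional ZF receiver.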
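The constrained CP-like structure of the noiseless received signal can be checked against the contraction-based definition. The sketch below assumes that the received signal is the double contraction of the second and fourth modes of the channel tensor with the first and second modes of the signal tensor, and that the CP factors are the constraint matrix $\mathbf{1}_{M_T}^{\mathrm{T}} \otimes \mathbf{I}_N$, the channel matrix, and the symbol matrix with the subcarrier index varying fastest along their columns. Channel values and symbols are random test data, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
N, MR, MT, K = 8, 3, 2, 5

# Frequency-domain channel coefficients and channel tensor with diagonal slices.
h_freq = rng.standard_normal((N, MR, MT)) + 1j * rng.standard_normal((N, MR, MT))
H = np.zeros((N, N, MR, MT), dtype=complex)
n = np.arange(N)
H[n, n] = h_freq

# 4-QAM symbols on N subcarriers, MT antennas, K frames.
S = (rng.choice([-1, 1], (N, MT, K)) + 1j * rng.choice([-1, 1], (N, MT, K))) / np.sqrt(2)

# Noiseless received signal via the double contraction.
Y0 = np.einsum("nprt,ptk->nrk", H, S)

# Factor matrices of the CP-like model (column index: subcarrier fastest, then antenna).
H_mat = np.reshape(h_freq.transpose(1, 0, 2), (MR, N * MT), order="F")
S_mat = np.reshape(S, (N * MT, K), order="F").T
constraint = np.kron(np.ones((1, MT)), np.eye(N))

Y_cp = np.einsum("nc,rc,kc->nrk", constraint, H_mat, S_mat)
print(np.allclose(Y0, Y_cp))
```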
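The per-subcarrier ZF receiver can be sketched as follows: for each subcarrier, the received slice is multiplied by the pseudo-inverse of the corresponding channel matrix and the result is projected onto the symbol alphabet. The sketch is noiseless and uses perfect channel knowledge purely for illustration (in practice the pilot-based channel estimate would be used); the function name `project_4qam` and the unit-power 4-QAM alphabet are assumptions.

```python
import numpy as np

def project_4qam(X):
    """Map each entry to the nearest unit-power 4-QAM symbol."""
    return (np.sign(X.real) + 1j * np.sign(X.imag)) / np.sqrt(2)

rng = np.random.default_rng(5)
N, MR, MT, K = 16, 3, 2, 10

# One MR x MT channel matrix and one MT x K symbol matrix per subcarrier.
H_n = rng.standard_normal((N, MR, MT)) + 1j * rng.standard_normal((N, MR, MT))
S = (rng.choice([-1, 1], (N, MT, K)) + 1j * rng.choice([-1, 1], (N, MT, K))) / np.sqrt(2)

# Batched matrix product: Y[n] = H_n[n] @ S[n] for every subcarrier n.
Y = H_n @ S

# ZF: pseudo-inverse of the channel on each subcarrier, followed by the projection.
S_zf = np.linalg.pinv(H_n) @ Y
S_hat = project_4qam(S_zf)
print(np.allclose(S_hat, S))
```

`numpy.linalg.pinv` accepts stacks of matrices, so the loop over subcarriers is implicit.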
Alternatively, the channel and the symbols on each subcarrier can be estimated by means of iterative or recursive LS algorithms. Similar algorithms were proposed in [28] and [29] for blind source separation on a single subcarrier. We extend two of the projection-based algorithms presented in [29] to our application, namely iterative least squares with projection (ILSP) and recursive least squares with projection (RLSP). We have proposed an extension of these algorithms using enumeration in [17]. In this paper, our focus is on finite alphabet projection-based algorithms since they are computationally less expensive than the algorithms based on enumeration.
The identifiability properties of the problem in Eq. (13) have already been studied in [29], where the authors present sufficient conditions for identifiability.
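The idea behind ILSP on a single subcarrier can be sketched as an alternation between an LS symbol estimate that is projected onto the finite alphabet and an LS channel estimate. The following is a simplified illustration for the model in which the received slice is approximately the channel matrix times the symbol matrix, not the exact algorithm of [17] or [29]; the initialization `H0` stands in for a pilot-based channel estimate, and the function names are hypothetical.

```python
import numpy as np

def project_4qam(X):
    """Map each entry to the nearest unit-power 4-QAM symbol."""
    return (np.sign(X.real) + 1j * np.sign(X.imag)) / np.sqrt(2)

def ilsp(Y, H0, n_iter=7):
    """Alternate projected LS symbol estimates and LS channel estimates."""
    H = H0.copy()
    for _ in range(n_iter):
        S = project_4qam(np.linalg.pinv(H) @ Y)  # LS symbols, projected onto the alphabet
        H = Y @ np.linalg.pinv(S)                # LS channel given the projected symbols
    return H, S

rng = np.random.default_rng(6)
MR, MT, K = 4, 2, 20
H = rng.standard_normal((MR, MT)) + 1j * rng.standard_normal((MR, MT))
S = (rng.choice([-1, 1], (MT, K)) + 1j * rng.choice([-1, 1], (MT, K))) / np.sqrt(2)
Y = H @ S  # noiseless observation on one subcarrier

# Slightly perturbed channel as a stand-in for the pilot-based initial estimate.
H0 = H + 0.01 * (rng.standard_normal((MR, MT)) + 1j * rng.standard_normal((MR, MT)))
H_hat, S_hat = ilsp(Y, H0)
print(np.allclose(S_hat, S), np.allclose(H_hat, H))
```

The default of seven iterations mirrors the iteration limit quoted in the simulation section; with noisy data a stopping rule based on the change of the estimates would be a natural addition.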

Khatri-Rao-coded MIMO-OFDM
In this section, we model a Khatri-Rao-coded MIMO-OFDM communication system as a double tensor contraction between a channel and a signal tensor that contains coded symbols. This double tensor contraction is essentially equivalent to the model in (4). However, we assume that the signal tensor contains Khatri-Rao-coded symbols.
As in Sect. 3, we assume a MIMO-OFDM communication system with $M_T$ transmit and $M_R$ receive antennas. One OFDM block consists of $N$ samples, which equals the DFT length. Moreover, all $N$ subcarriers are used for data transmission. Furthermore, we assume a frequency-selective channel model that stays constant over the transmission of $P$ frames. In contrast to the model presented in Sect. 3, here, we assume that the $P$ frames are divided into $K$ groups of $Q$ blocks ($Q$ corresponds to the spreading factor), $P = K \cdot Q$.
Accordingly, the received signal in the frequency domain is given by $$\tilde{\mathcal{Y}} = \mathcal{H} \bullet_{2,4}^{1,2} \mathcal{X} + \tilde{\mathcal{N}} = \tilde{\mathcal{Y}}_0 + \tilde{\mathcal{N}}, \quad (14)$$ where $\mathcal{H} \in \mathbb{C}^{N \times N \times M_R \times M_T}$ is the channel tensor and $\mathcal{X} \in \mathbb{C}^{N \times M_T \times K \times Q}$ is the signal tensor. The tensor $\tilde{\mathcal{N}} \in \mathbb{C}^{N \times M_R \times K \times Q}$ contains additive white Gaussian noise and $\tilde{\mathcal{Y}}_0 \in \mathbb{C}^{N \times M_R \times K \times Q}$ is the noiseless received signal.

Channel tensor
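In this section, we use the model of the channel tensor $\mathcal{H}$ defined in Eq. (8). Moreover, we use the generalized unfolding $$[\mathcal{H}]_{([1,3],[4,2])} = \bar{\mathbf{H}} \diamond (\mathbf{I}_N \otimes \mathbf{1}_{M_T}^{\mathrm{T}}), \quad (15)$$ where $\bar{\mathbf{H}} = \tilde{\mathbf{H}} \cdot \mathbf{P} \in \mathbb{C}^{M_R \times M_T N}$, the matrix $\tilde{\mathbf{H}}$ is defined in Eq. (10), and $\mathbf{P} \in \mathbb{R}^{N M_T \times M_T N}$ is a permutation matrix that reorders the columns such that the faster increasing index is $M_T$ instead of $N$.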

Data transmission
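We can impose a CP structure on the transmit signal tensor if we assume Khatri-Rao-coded symbols [19,20]. The coding is proportional to the number of transmit antennas if we use a spreading factor $Q = M_T$, for each subcarrier $n = 1, 2, \ldots, N$. Hence, the generalized unfolding of the signal tensor is $$[\mathcal{X}]_{([2,1],[4,3])} = \begin{bmatrix} (\mathbf{S}_1 \diamond \mathbf{C}_1)^{\mathrm{T}} \\ \vdots \\ (\mathbf{S}_N \diamond \mathbf{C}_N)^{\mathrm{T}} \end{bmatrix} = (\bar{\mathbf{S}} \diamond \mathbf{C})^{\mathrm{T}}, \quad (16)$$ where the matrix $\mathbf{S}_n \in \mathbb{C}^{K \times M_T}$ contains modulated data symbols and $\mathbf{C}_n \in \mathbb{C}^{Q \times M_T}$ is a Vandermonde coding matrix as defined in [19]. The matrices $\bar{\mathbf{S}} = [\mathbf{S}_1 \cdots \mathbf{S}_N] \in \mathbb{C}^{K \times M_T N}$ and $\mathbf{C} = [\mathbf{C}_1 \cdots \mathbf{C}_N] \in \mathbb{C}^{Q \times M_T N}$ contain all symbol and coding matrices for each subcarrier, respectively. Note that $\bar{\mathbf{S}} = \tilde{\mathbf{S}} \cdot \mathbf{P}$, where the matrix $\tilde{\mathbf{S}}$ is defined in Eq. (11) and $\mathbf{P} \in \mathbb{R}^{N M_T \times M_T N}$ is the above-mentioned permutation matrix that reorders the columns such that the faster increasing index is $M_T$ instead of $N$. Moreover, we assume that $\bar{\mathbf{S}}$ contains pilot symbols as explained after Eq. (11). As shown in [19] and as directly follows from (16)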

Receiver design
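Using Eqs. (2), (3), and (14), the noiseless received signal can be expressed as $$[\tilde{\mathcal{Y}}_0]_{([1,2],[4,3])} = [\mathcal{H}]_{([1,3],[4,2])} \, [\mathcal{X}]_{([2,1],[4,3])}.$$ Inserting the corresponding unfoldings of the channel and the signal tensor in Eqs. (15) and (16), respectively, the noiseless received signal in the frequency domain is given by $$[\tilde{\mathcal{Y}}_0]_{([1,2],[4,3])} = \big(\bar{\mathbf{H}} \diamond (\mathbf{I}_N \otimes \mathbf{1}_{M_T}^{\mathrm{T}})\big)(\bar{\mathbf{S}} \diamond \mathbf{C})^{\mathrm{T}}.$$ The above equation represents an unfolding of a four-way tensor with a CP structure. Therefore, the noiseless received signal tensor can be expressed as $$\tilde{\mathcal{Y}}_0 = \mathcal{I}_{4,M_T N} \times_1 (\mathbf{I}_N \otimes \mathbf{1}_{M_T}^{\mathrm{T}}) \times_2 \bar{\mathbf{H}} \times_3 \bar{\mathbf{S}} \times_4 \mathbf{C}. \quad (17)$$ Equation (17) represents the received signal in the frequency domain for all $N$ subcarriers, $M_R$ receive antennas, and $P$ frames after the removal of the cyclic prefix. Depending on the available a priori knowledge at the receiver side, channel estimation, symbol estimation, or joint channel and symbol estimation can be performed.
Let us compare the MIMO-OFDM tensor model and the Khatri-Rao-coded MIMO-OFDM tensor model in Eqs. (12) and (17), respectively. First, the factor matrices in these equations have different index orderings. In Eq. (12) the faster increasing index is $N$, whereas in Eq. (17) the faster increasing index is $M_T$ along the columns of the factor matrices. We use $\sim$ and $-$ to distinguish the different index orderings of the factor matrices. Recall that we have defined a permutation matrix $\mathbf{P}$ that accounts for the reordering of the columns of the factor matrices. Moreover, Eq. (17) has an additional tensor dimension (the four-mode) corresponding to the coding technique and the spreading factor $Q$. Furthermore, taking into account the permutation matrix $\mathbf{P}$, we get Eq. (12) from Eq. (17) for $Q = 1$ and $\mathbf{C} = \mathbf{1}_{M_T N}^{\mathrm{T}}$ (i.e., no coding and the spreading factor equals one).
Using Eq. (17), the channel and the data symbols can be jointly estimated from the $([1,4],[3,2])$ generalized unfolding of the noise-corrupted received signal $$[\tilde{\mathcal{Y}}]_{([1,4],[3,2])} \approx \big(\mathbf{C} \diamond (\mathbf{I}_N \otimes \mathbf{1}_{M_T}^{\mathrm{T}})\big)(\bar{\mathbf{H}} \diamond \bar{\mathbf{S}})^{\mathrm{T}}.$$ Under the assumption that $Q = M_T$, the matrix $\mathbf{C} \diamond (\mathbf{I}_N \otimes \mathbf{1}_{M_T}^{\mathrm{T}}) \in \mathbb{C}^{NQ \times M_T N}$ is a block diagonal, left invertible matrix that is known at the receiver. Using the properties of the coding matrices defined in [19], i.e., $\mathbf{C}_n^{\mathrm{H}} \mathbf{C}_n = M_T \mathbf{I}_{M_T}$, we have $$\bar{\mathbf{Y}} = \frac{1}{M_T} \big(\mathbf{C} \diamond (\mathbf{I}_N \otimes \mathbf{1}_{M_T}^{\mathrm{T}})\big)^{\mathrm{H}} [\tilde{\mathcal{Y}}]_{([1,4],[3,2])} \approx (\bar{\mathbf{H}} \diamond \bar{\mathbf{S}})^{\mathrm{T}}.$$ After transposition, $\bar{\mathbf{Y}}^{\mathrm{T}} \approx \bar{\mathbf{H}} \diamond \bar{\mathbf{S}}$ can be approximated by the Khatri-Rao product between the channel and the data symbols. Therefore, the channel and the data symbols can be jointly estimated based on the LS-KRF as in [30].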
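The four-way CP structure of the Khatri-Rao-coded received signal can be checked numerically. The sketch below assumes a DFT-type Vandermonde coding matrix that is identical on all subcarriers (one possible choice satisfying $\mathbf{C}_n^{\mathrm{H}} \mathbf{C}_n = M_T \mathbf{I}_{M_T}$, not necessarily the one of [19]), a signal tensor with entries given by the product of a symbol and a code entry, and factor matrices whose columns are ordered with the antenna index varying fastest. All names and test values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
N, MR, MT, K = 4, 3, 2, 5
Q = MT  # spreading factor equal to the number of transmit antennas

# Channel tensor with diagonal N x N slices.
h_freq = rng.standard_normal((N, MR, MT)) + 1j * rng.standard_normal((N, MR, MT))
H = np.zeros((N, N, MR, MT), dtype=complex)
n = np.arange(N)
H[n, n] = h_freq

# Symbols S_n (K x MT per subcarrier) and an assumed DFT-type Vandermonde code (Q x MT).
sym = (rng.choice([-1, 1], (N, K, MT)) + 1j * rng.choice([-1, 1], (N, K, MT))) / np.sqrt(2)
q, t = np.meshgrid(np.arange(Q), np.arange(MT), indexing="ij")
code = np.exp(2j * np.pi * q * t / MT)

# Signal tensor X[n, t, k, q] = S_n[k, t] * C_n[q, t] and noiseless received signal.
X = np.einsum("nkt,qt->ntkq", sym, code)
Y0 = np.einsum("nprt,ptkq->nrkq", H, X)

# CP factors with column index c = t + MT * n (antenna index fastest).
F1 = np.kron(np.eye(N), np.ones((1, MT)))              # I_N ⊗ 1^T
H_bar = h_freq.transpose(1, 0, 2).reshape(MR, N * MT)
S_bar = sym.transpose(1, 0, 2).reshape(K, N * MT)
C_bar = np.tile(code, (1, N))

Y_cp = np.einsum("nc,rc,kc,qc->nrkq", F1, H_bar, S_bar, C_bar)
print(np.allclose(Y0, Y_cp))
```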
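The least squares Khatri-Rao factorization (LS-KRF) treats each column of a Khatri-Rao product separately: the column is reshaped into a matrix that is rank one in the noiseless case, and its dominant singular vectors provide the two factor columns up to a scaling. The following is a minimal sketch of this standard procedure; the function names are hypothetical and the test data are random.

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product; the row index of B varies fastest."""
    return np.einsum("ir,jr->ijr", A, B).reshape(A.shape[0] * B.shape[0], -1)

def ls_krf(X, I, J):
    """Factor X (I*J x R) as khatri_rao(A, B) with A (I x R) and B (J x R), column by column."""
    R = X.shape[1]
    A = np.zeros((I, R), dtype=complex)
    B = np.zeros((J, R), dtype=complex)
    for r in range(R):
        Xr = X[:, r].reshape(I, J)  # equals a_r b_r^T in the noiseless case
        U, s, Vh = np.linalg.svd(Xr, full_matrices=False)
        A[:, r] = np.sqrt(s[0]) * U[:, 0]   # dominant left singular vector
        B[:, r] = np.sqrt(s[0]) * Vh[0, :]  # dominant right singular vector
    return A, B

rng = np.random.default_rng(8)
MR, K, R = 3, 6, 8
H = rng.standard_normal((MR, R)) + 1j * rng.standard_normal((MR, R))
S = rng.standard_normal((K, R)) + 1j * rng.standard_normal((K, R))
X = khatri_rao(H, S)

A_hat, B_hat = ls_krf(X, MR, K)
print(np.allclose(khatri_rao(A_hat, B_hat), X))
```

The factors are recovered only up to one complex scaling per column, which is why the Khatri-Rao product of the estimates, rather than the estimates themselves, is compared with the input.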
Using the LS-KRF, the matrices $\bar{\mathbf{H}}$ and $\bar{\mathbf{S}}$ can be identified up to one complex scaling factor ambiguity per column. Hence, the estimated matrices satisfy the following relations: $$\hat{\bar{\mathbf{H}}} = \bar{\mathbf{H}} \boldsymbol{\Lambda}, \qquad \hat{\bar{\mathbf{S}}} = \bar{\mathbf{S}} \boldsymbol{\Lambda}^{-1}, \quad (18)$$ where $\boldsymbol{\Lambda} \in \mathbb{C}^{M_T N \times M_T N}$ is a diagonal matrix with diagonal elements equal to the $M_T N$ complex scaling ambiguities. The simplest way to resolve the scaling ambiguity is by assuming the knowledge of one row of the matrix $\bar{\mathbf{S}} \in \mathbb{C}^{K \times M_T N}$. This corresponds to $M_T N$ pilot symbols, i.e., one pilot symbol per transmit antenna and subcarrier. Since traditional MIMO-OFDM communication systems use fewer pilot symbols than $M_T N$, we propose to use the same number of pilot symbols as these systems and to exploit the channel correlation between adjacent subcarriers in order to estimate the scaling matrix. We transmit pilot symbols on positions with equidistant spacing in the frequency and the time domain. With the prior knowledge of the pilot symbols and their positions, we can obtain an initial channel estimate as in traditional MIMO-OFDM systems (see Sect. 3). We denote this pilot-based channel estimate by $\tilde{\mathbf{H}}_p$ ($\bar{\mathbf{H}}_p$). The pilot-based channel estimate is then used to obtain an estimate $\hat{\boldsymbol{\Lambda}}$ of the scaling ambiguity in Eq. (18). By multiplying the solution of the LS-KRF with the diagonal matrix $\hat{\boldsymbol{\Lambda}}$, the scaling ambiguity in Eq. (18) is resolved and the data symbols can be demodulated. Note that the proposed Khatri-Rao receiver estimates the channel and the symbols in a semi-blind fashion. First, the channel and the symbols are jointly estimated without any a priori information. The pilot-based channel estimate is then used to resolve the scaling ambiguity affecting the columns of $\hat{\bar{\mathbf{H}}}$ and $\hat{\bar{\mathbf{S}}}$. Therefore, the optimal length and repetition of the piloting sequences are the same as for traditional OFDM systems. We summarize the steps of the proposed Khatri-Rao (KR) receiver in Algorithm 1.
Furthermore, the channel estimate resulting from the KR receiver can be used for channel tracking in future transmission frames if the channel has not changed drastically. If the channel estimate is used for tracking, it could be improved by means of an additional LS estimate from $[\tilde{\mathcal{Y}}]_{([2,4,1],[3])}$ with the knowledge of the estimated symbols projected onto the finite alphabet, i.e., $Q(\mathbf{S}) = \mathrm{proj}(\mathbf{S})$. The finite alphabet depends on the modulation type and the modulation order $M_o$.
However, we can also use this improved channel estimate to further improve the performance of the KR receiver. Using this updated channel estimate, an improved estimate of the diagonal scaling matrix $\hat{\boldsymbol{\Lambda}}$ can be calculated and, with that, an enhanced estimate of the symbols, $\hat{\mathbf{S}}_{\mathrm{LS}}$, using Eq. (18). Note that, instead of just one LS estimate of the channel and the symbols, the performance can be further enhanced with additional iterations, leading to an iterative receiver. Note that the symbol matrix $\hat{\mathbf{S}}_{\mathrm{LS}}$ can also be estimated in the least squares sense from the three-mode unfolding of Eq. (17), but the estimation of $\hat{\boldsymbol{\Lambda}}$ is computationally cheaper. The KR receiver with its enhancement via LS is summarized in Algorithm 2.
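The step that resolves the column-wise scaling ambiguity with the help of a pilot-based channel estimate can be sketched as follows. The estimator used here, a per-column least squares fit of the factorized channel to the pilot-based estimate, is an assumption chosen for illustration and is not necessarily the expression used in the paper; the ambiguity model (channel columns scaled by a factor, symbol columns scaled by its inverse) and all variable names are likewise assumptions of this sketch.

```python
import numpy as np

def project_4qam(X):
    """Map each entry to the nearest unit-power 4-QAM symbol."""
    return (np.sign(X.real) + 1j * np.sign(X.imag)) / np.sqrt(2)

rng = np.random.default_rng(9)
MR, K, R = 3, 6, 8  # R plays the role of the number of columns M_T * N

H = rng.standard_normal((MR, R)) + 1j * rng.standard_normal((MR, R))
S = (rng.choice([-1, 1], (K, R)) + 1j * rng.choice([-1, 1], (K, R))) / np.sqrt(2)

# Output of a Khatri-Rao factorization: true factors up to one complex scaling per column.
lam = (0.5 + rng.random(R)) * np.exp(2j * np.pi * rng.random(R))
H_krf = H * lam
S_krf = S / lam

# Pilot-based channel estimate, modeled here as the true channel plus a small error.
H_pilot = H + 0.01 * (rng.standard_normal((MR, R)) + 1j * rng.standard_normal((MR, R)))

# Per-column LS fit of H_krf[:, r] to lam_hat[r] * H_pilot[:, r] (assumed estimator).
lam_hat = np.sum(H_pilot.conj() * H_krf, axis=0) / np.sum(np.abs(H_pilot) ** 2, axis=0)

# Remove the ambiguity and demodulate.
S_hat = project_4qam(S_krf * lam_hat)
H_hat = H_krf / lam_hat
print(np.allclose(S_hat, S))
```

Because each scaling factor is estimated from a whole channel column, the estimate averages over the receive antennas, which helps when the pilot-based channel estimate is noisy.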
Due to the additional LS-based estimates, the KR+LS algorithm has higher computational complexity than the KR algorithm.

Khatri-Rao cross-coding MIMO-OFDM
In Sect. 4, we have proposed a tensor model for KR-coded MIMO-OFDM systems that introduces an additional CP-like structure to the signal tensor. The additional CP-like structure of the signal tensor is achieved by means of a simplified Khatri-Rao coding. However, using such a Khatri-Rao coding, we add additional spreading that reduces the spectral efficiency of the system. To overcome this issue, in this section we propose to keep the CP structure of the signal tensor proposed in Sect. 4, but introduce a cross-coding approach, where the known Khatri-Rao coding matrices C 1 , . . . , C N are replaced by symbol matrices containing useful information symbols to be transmitted.
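As in Sect. 4, the received signal in the frequency domain after the removal of the cyclic prefix is given by $$\tilde{\mathcal{Y}} = \mathcal{H} \bullet_{2,4}^{1,2} \mathcal{X} + \tilde{\mathcal{N}} = \tilde{\mathcal{Y}}_0 + \tilde{\mathcal{N}}. \quad (19)$$ Likewise, the $P = KQ$ frames are divided into $K$ groups of $Q$ blocks ("spreading factor"). We model the channel tensor $\mathcal{H}$ according to Eq. (8). Details regarding this model are also provided in the Appendix. In this section, we make use of the generalized unfolding $[\mathcal{H}]_{([1,3],[4,2])} = \bar{\mathbf{H}} \diamond (\mathbf{I}_N \otimes \mathbf{1}_{M_T}^{\mathrm{T}})$ that is defined in (15). The generalized unfolding $([2,1],[4,3])$ of the signal tensor is given by $$[\mathcal{X}]_{([2,1],[4,3])} = \begin{bmatrix} (\mathbf{S}_1^{(1)} \diamond \mathbf{S}_1^{(2)})^{\mathrm{T}} \\ \vdots \\ (\mathbf{S}_N^{(1)} \diamond \mathbf{S}_N^{(2)})^{\mathrm{T}} \end{bmatrix} = (\bar{\mathbf{S}}^{(1)} \diamond \bar{\mathbf{S}}^{(2)})^{\mathrm{T}}, \quad (20)$$ where the matrices $\mathbf{S}_n^{(1)} \in \mathbb{C}^{K \times M_T}$ and $\mathbf{S}_n^{(2)} \in \mathbb{C}^{Q \times M_T}$ are the first and second symbol matrices that carry information symbols, and $\bar{\mathbf{S}}^{(1)}$ and $\bar{\mathbf{S}}^{(2)}$ collect these matrices for all $N$ subcarriers. The first symbol matrix $\bar{\mathbf{S}}^{(1)}$ follows the structure of the symbol matrix in Sect. 4 and is composed of a pilot part and a data symbol part (cf. Eq. (11)). On the other hand, the second symbol matrix only contains data symbols, except for its first row, which contains known symbols (e.g., row vectors composed of ones). We refer to this transmission scheme as cross-coded MIMO-OFDM, due to the fact that $\mathbf{S}_n^{(1)}$ plays the role of a random KR coding with respect to $\mathbf{S}_n^{(2)}$. Using Eqs. (2) and (19), the noiseless received signal is given by $$[\tilde{\mathcal{Y}}_0]_{([1,2],[4,3])} = [\mathcal{H}]_{([1,3],[4,2])} \, [\mathcal{X}]_{([2,1],[4,3])}. \quad (21)$$ Inserting (15) and (20) into (21), we obtain $$[\tilde{\mathcal{Y}}_0]_{([1,2],[4,3])} = \big(\bar{\mathbf{H}} \diamond (\mathbf{I}_N \otimes \mathbf{1}_{M_T}^{\mathrm{T}})\big)(\bar{\mathbf{S}}^{(1)} \diamond \bar{\mathbf{S}}^{(2)})^{\mathrm{T}}$$ or, alternatively, using the $n$-mode product notation $$\tilde{\mathcal{Y}}_0 = \mathcal{I}_{4,M_T N} \times_1 (\mathbf{I}_N \otimes \mathbf{1}_{M_T}^{\mathrm{T}}) \times_2 \bar{\mathbf{H}} \times_3 \bar{\mathbf{S}}^{(1)} \times_4 \bar{\mathbf{S}}^{(2)}. \quad (22)$$ Depending on the available a priori knowledge at the receiver side, channel estimation, symbol estimation, or joint channel and symbol estimation can be performed. In contrast to the KR-coded system, where a known coding matrix is used, in the cross-coded MIMO-OFDM system this knowledge is not available, which makes the receiver design more challenging. A joint channel and symbol estimation now involves the estimation of three factor matrices from the noisy version of the four-way CP model (22). From the three-mode, four-mode, and two-mode unfoldings of $\tilde{\mathcal{Y}}$ in (19), and using (22), we can obtain the LS equations for estimating $\bar{\mathbf{S}}^{(1)}$, $\bar{\mathbf{S}}^{(2)}$, and $\bar{\mathbf{H}}$, respectively.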
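The alternating LS updates of the three unknown factor matrices of a four-way CP model with a known first factor can be sketched generically. The code below is a plain alternating least squares (ALS) sweep under the assumption that the received tensor follows a four-way CP model with the known constraint matrix $\mathbf{I}_N \otimes \mathbf{1}_{M_T}^{\mathrm{T}}$ as its first factor; it does not reproduce the initialization, the handling of the known first row of the second symbol matrix, or the stopping rule of the receivers in the paper. All function names are hypothetical.

```python
import numpy as np

def cp4(F1, F2, F3, F4):
    """Four-way CP tensor built from its factor matrices."""
    return np.einsum("ac,bc,dc,ec->abde", F1, F2, F3, F4)

def ls_update(Y, factors, mode):
    """LS estimate of factors[mode] from the corresponding unfolding, other factors fixed."""
    others = [factors[m] for m in range(4) if m != mode]
    Ym = np.moveaxis(Y, mode, 0).reshape(Y.shape[mode], -1)
    Z = np.einsum("ac,bc,dc->abdc", *others).reshape(Ym.shape[1], -1)
    return Ym @ np.linalg.pinv(Z.T)

def als_sweep(Y, F1, H, S1, S2):
    """One sweep: update the first symbol matrix, the second symbol matrix, then the channel."""
    S1 = ls_update(Y, [F1, H, S1, S2], 2)
    S2 = ls_update(Y, [F1, H, S1, S2], 3)
    H = ls_update(Y, [F1, H, S1, S2], 1)
    return H, S1, S2

def residual(Y, F1, H, S1, S2):
    return np.linalg.norm(Y - cp4(F1, H, S1, S2))

rng = np.random.default_rng(10)
N, MR, MT, K, Q = 4, 3, 2, 8, 3
R = N * MT
cplx = lambda *shape: rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
qam = lambda *shape: (rng.choice([-1, 1], shape) + 1j * rng.choice([-1, 1], shape)) / np.sqrt(2)

F1 = np.kron(np.eye(N), np.ones((1, MT)))  # known constraint matrix
H, S1, S2 = cplx(MR, R), qam(K, R), qam(Q, R)
S2[0] = 1.0                                # known first row of the second symbol matrix
Y = cp4(F1, H, S1, S2)                     # noiseless received tensor

# Start from perturbed factors and run one sweep.
H0, S1_0, S2_0 = H + 0.1 * cplx(MR, R), S1 + 0.1 * cplx(K, R), S2 + 0.1 * cplx(Q, R)
r0 = residual(Y, F1, H0, S1_0, S2_0)
H1, S1_1, S2_1 = als_sweep(Y, F1, H0, S1_0, S2_0)
r1 = residual(Y, F1, H1, S1_1, S2_1)
print(r1 <= r0)
```

Each update is an exact LS minimizer for one factor, so the fitting error cannot increase from one update to the next; the remaining scaling ambiguities still have to be resolved with the known symbols and the pilot-based channel estimate.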

Simulation results
In this section, we evaluate the performance of the proposed receivers for MIMO-OFDM systems using Monte Carlo simulations. First, we compare the performance of ZF, ILSP, and RLSP, using 5000 realizations. We consider a 2 × 2 OFDM system, with K frames, and N = 128 subcarriers. The pilot symbols are transmitted on every third subcarrier such that F = 3 and only during the first frame, i.e., K = K . Using these pilots, we obtain a pilot-based channel estimate with which we initialize all of the algorithms. The transmitted data symbols are independent and they are drawn from a quadrature amplitude modulation (4-QAM). The frequency selective propagation channel is modeled according to the 3rd Generation Partnership Project (3GPP) Pedestrian A channel (Ped A) [31]. The duration of the cyclic prefix is 32 samples and the weighting factor α = 1 , for the recursive LS. The maximum number of iterations for the iterative algorithm is set to 7. In Fig. 3, we compare the SER performance of the traditional frequency domain ZF receiver, the proposed Khatri-Rao (KR) receiver (see Algorithm 1) and the proposed Khatri-Rao receiver with one additional LS iteration (see Algorithm 2) for different numbers of transmitted blocks. In this case, note that the KR and the KR+LS receivers benefit from the increased number of frames as the channel has been kept constant during the P = Q · K frames. Moreover, as the number of frames increases, the advantages of the enhancement via LS become more pronounced.
Moreover, the SER comparison between the ZF and the Khatri-Rao-coded algorithms for N = 128, $Q = M_T$, K = 2 , K = 2 , F = 4, and different numbers of antennas is shown in Fig. 4. The KR and KR+LS receivers benefit from an increased number of transmit antennas due to the increased spreading factor, $Q = M_T$. The performance enhancement with the additional LS estimate is achieved for K > 2. However, the KR receiver outperforms the ZF receiver even without the LS enhancement. We observe that the performance of the Khatri-Rao-coded algorithms, i.e., KR and KR+LS, improves as the number of antennas grows. However, as shown in Table 2, the computational complexity of the receiver then increases linearly, since more rank-one approximations must be computed.
In Fig. 5, we depict the SERs for these two systems. The KR receiver achieves an accuracy similar to that of the ILSP and RLSP algorithms [17], and its accuracy improves with increasing SNR. The KR+LS receiver outperforms the ILSP algorithm and the KR algorithm in terms of SER. Recall that the KR-coded OFDM model in Eq. (17) has a richer tensor structure than the OFDM model in Eq. (12) due to the coding. The KR and KR+LS algorithms effectively exploit this structure to estimate the channel and the symbols. Note that the KR+LS algorithm computes an improved estimate of the scaling matrix. Therefore, KR+LS leads to lower SER levels than the ILSP and KR algorithms.
In Fig. 6, we provide an SER comparison for two scenarios. For both scenarios, we assume Q = 2, and the symbols are drawn from a 4-QAM constellation. Moreover, K = 5 , F = 10 , and K = 5 for the first scenario, whereas K = 3 , F = 5 , and K = 3 for the second scenario. Hence, in the first scenario we estimate more symbols than in the second scenario, using fewer pilot symbols. As expected, we achieve a lower SER if more pilot symbols are used because they lead to a more accurate initial pilot-based channel estimate. Moreover, in Fig. 6 we see that the CC-KR+ALS receiver outperforms the CC-KR receiver. Thus, we benefit from the additional iterations and from exploiting the complete tensor structure. In contrast to CC-KR, CC-KR+ALS also estimates the channel matrix. Furthermore, the accuracy gain of the CC-KR+ALS receiver is more pronounced if we initialize it with a less accurate pilot-based channel estimate (the gain is larger for the solid lines than for the dashed lines in Fig. 6).
Finally, in Fig. 7, we depict the SER performance for a 4 × 4 MIMO system, considering the following receivers: (i) ILSP receiver [17], (ii) RLSP receiver [17], (iii) KR receiver (Algorithm 1), (iv) KR+LS receiver (Algorithm 2), (v) CC-KR receiver (Algorithm 3), and (vi) CC-KR+ALS receiver (Algorithm 4). To ensure a fair comparison in terms of spectral efficiency, the following parameters were chosen for the different receivers. The KR-coded OFDM system assumes N = 128 , F = 10 , K = 2 , K = 2 , Q = 4 , P = KQ = 8, and the symbols are modulated using 16-QAM. For the cross-coded OFDM system, we assume N = 128 , F = 10 , K = 2 , K = 2 , Q = 4 , P = KQ = 8, and the symbols are drawn from a BPSK modulation. The uncoded OFDM system assumes N = 128 , F = 10 , K = 8 , K = 8 , and BPSK symbols. We see that the CC-KR receiver outperforms the ILSP and RLSP receivers from [17]. In addition, the SER curves of the KR and KR+LS receivers for KR-coded OFDM have a different slope than those of the uncoded OFDM and the cross-coded OFDM, and they exhibit a better performance, as expected.
In Table 2, we show the computational complexity of the compared algorithms. We take into account the main computational efforts, i.e., the computation of matrix inverses. For a matrix $A \in \mathbb{C}^{N \times M}$, we consider a cost of $\mathcal{O}(M^3)$ for its inversion and $\mathcal{O}(NM^2)$ for the computation of its rank-one SVD approximation. The Khatri-Rao factorization-based algorithms, i.e., the KR and the CC-KR algorithms, have the lowest computational effort. This is due to the fact that they compute $N M_T$ independent rank-one matrix approximations, while the remaining algorithms (ZF, RLSP, and …
On the other hand, in the proposed CC-KR receiver, two data symbol matrices ($S^{(1)}$ and $S^{(2)}$) are transmitted, which increases the spectral efficiency of the MIMO-OFDM system.
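The independent rank-one approximations mentioned above are the core of a least squares Khatri-Rao factorization (LS-KRF). The following is a minimal sketch of this idea on a generic, noiseless Khatri-Rao product; the dimensions, the factor names `A` and `B`, and the helper `ls_krf` are hypothetical and do not reproduce Algorithms 1 to 4. The scaling ambiguity of each column is resolved here by assuming that the first row of `B` is known and equal to one, in analogy to the known first row of the second symbol matrix.

```python
import numpy as np


def khatri_rao(A, B):
    """Column-wise Kronecker product: column r equals kron(A[:, r], B[:, r])."""
    return np.einsum("ir,jr->ijr", A, B).reshape(A.shape[0] * B.shape[0], -1)


def ls_krf(Z, n_rows_a, n_rows_b):
    """Factor Z ~ khatri_rao(A, B) with one rank-one SVD truncation per column."""
    n_cols = Z.shape[1]
    A_hat = np.empty((n_rows_a, n_cols), dtype=complex)
    B_hat = np.empty((n_rows_b, n_cols), dtype=complex)
    for r in range(n_cols):  # columns are independent and could be processed in parallel
        # Without noise, the reshaped column equals the outer product a_r b_r^T
        M = Z[:, r].reshape(n_rows_a, n_rows_b)
        U, s, Vh = np.linalg.svd(M, full_matrices=False)
        A_hat[:, r] = np.sqrt(s[0]) * U[:, 0]
        B_hat[:, r] = np.sqrt(s[0]) * Vh[0, :]
    return A_hat, B_hat


rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))
B = rng.standard_normal((5, 4)) + 1j * rng.standard_normal((5, 4))
B[0, :] = 1.0  # assumption: known first row, used to fix the scaling

Z = khatri_rao(A, B)
A_hat, B_hat = ls_krf(Z, 3, 5)

# Remove the per-column scaling ambiguity using the known first row of B
scale = B_hat[0, :].copy()
B_hat = B_hat / scale
A_hat = A_hat * scale

print(np.linalg.norm(A_hat - A), np.linalg.norm(B_hat - B))
```

With noisy data, the dominant singular triplet of each reshaped column gives the best rank-one fit in the LS sense, and the known row then fixes the scaling only approximately.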

Conclusion and discussion
In this paper, we have presented a tensor model for MIMO-OFDM systems using the double contraction between a channel tensor and a transmit signal tensor. The use of double contractions allows us to derive explicit CP-like, or Tucker-like, tensor models for the received signal, which are exploited for a joint channel and symbol estimation using semi-blind algorithms. The proposed model is a very general and flexible way of describing the received signal in MIMO-OFDM systems for all subcarriers jointly.
We have also proposed Khatri-Rao-coded MIMO-OFDM models and the corresponding semi-blind receivers based on the derived explicit CP-like tensor structure of the data model. In particular, the proposed KR-coded receivers, namely KR, KR+LS, CC-KR, and CC-KR+ALS, achieve a better performance in terms of the symbol error rate than the state-of-the-art schemes from the literature (ZF, ILSP, and RLSP). Also, the Khatri-Rao-based receivers (KR and CC-KR) can benefit from parallel processing and thus have a lower computational processing delay than the competing schemes. In addition, we have improved the performance of the Khatri-Rao-based receivers by means of an additional LS iteration (KR+LS) and an ALS procedure (CC-KR+ALS). Note that the Khatri-Rao coding strategy (KR and KR+LS) has a lower spectral efficiency than the uncoded MIMO-OFDM system. To overcome this limitation, we have proposed a cross-coded Khatri-Rao strategy (CC-KR and CC-KR+ALS algorithms), where the "coding matrix" contains useful data symbols. For this cross-coded system, two receivers have been proposed.
A natural continuation of this work is the extension of the proposed semi-blind receivers to other multi-carrier techniques such as universal filtered multi-carrier (UFMC) and FBMC modulation [16], to relay-assisted systems, and to multi-user systems. In the case of a multi-user system, the proposed CC-KR algorithm, and possibly the CC-KR+ALS algorithm, can be used such that the transmitted data symbols of multiple users serve as "coding matrices" to improve the total spectral efficiency of the system.