An Adaptive Constraint Method for Paraunitary Filter Banks with Applications to Spatiotemporal Subspace Tracking

This paper presents an adaptive method for maintaining paraunitary constraints on direct-form multichannel finite impulse response (FIR) filters. The technique is a spatiotemporal extension of a simple iterative procedure for imposing orthogonality constraints on nearly unitary matrices. A convergence analysis indicates that it has a large capture region, and its convergence rate is shown to be locally quadratic. Simulations of the method verify its capabilities in maintaining paraunitary constraints for gradient-based spatiotemporal principal and minor subspace tracking. Finally, as the technique is easily extended to multidimensional convolution forms, we illustrate such an extension for two-dimensional adaptive paraunitary filters using a simple image sequence encoding example.


INTRODUCTION
Paraunitary filters and their one-dimensional cousins, allpass filters, are important for a number of useful signal processing tasks, including coding, deconvolution and equalization, beamforming, and subspace processing [1][2][3][4][5][6][7][8][9][10][11][12]. Paraunitary filters are lossless devices, such that no spectral energy is lost or gained in any targeted spatial dimension of the multichannel input signal being filtered. The main use of paraunitary filters is to alter the phase relationships of the signals being sent through them. They are also typically used to reduce the spatial dimensionality of a multichannel signal with a minimal loss of signal power in the process.
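To make the losslessness property concrete, the following NumPy sketch (our own illustration, not code from the paper) builds a small two-channel paraunitary FIR filter as a rotation, a one-sample delay on the second channel, and another rotation, and then verifies that filtering white noise through it preserves signal power:

```python
import numpy as np

def rot(theta):
    """2x2 rotation (orthogonal) matrix."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# Paraunitary FIR filter W(z) = R1 * diag(1, z^-1) * R2, expressed by its taps:
# W_0 = R1 diag(1,0) R2 and W_1 = R1 diag(0,1) R2.
R1, R2 = rot(0.7), rot(-1.1)
W = np.stack([R1 @ np.diag([1.0, 0.0]) @ R2,
              R1 @ np.diag([0.0, 1.0]) @ R2])   # shape (L=2, 2, 2)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 50000))              # white two-channel input

# Multichannel FIR filtering: y(k) = sum_p W_p x(k - p)
L = W.shape[0]
y = np.zeros_like(x)
for p in range(L):
    y[:, p:] += W[p] @ x[:, :x.shape[1] - p]

print(np.mean(x**2), np.mean(y**2))  # input and output powers agree (losslessness)
```

Any cascade of orthogonal matrices and channel-wise delays is paraunitary by construction, which makes such cascades convenient test systems.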
Adaptive paraunitary filters are devices that adjust their characteristics to meet some prescribed task while maintaining paraunitary constraints on the multichannel system. For a general adaptive paraunitary filtering task, an n-input, m-output multichannel system operates on the vector input sequence x(k) = [x_1(k) ... x_n(k)]^T to produce the output sequence

y(k) = Σ_{p=0}^{L−1} W_p x(k − p),   (1)

where the (m × n)-dimensional matrix sequence {W_p}, 0 ≤ p ≤ L − 1, with L odd (we choose an odd-length FIR filter structure for notational convenience) contains the coefficients of the multichannel adaptive linear system. The goal is to minimize or maximize a cost function typically depending on the sequence {y(k)}, such as the mean-squared error E{||e(k)||²} with e(k) = d(k) − y(k) and d(k) being an m-dimensional desired response vector sequence, or the mean output power E{||y(k)||²}, while maintaining paraunitary constraints on {W_p}. These constraints can be described in the time domain as

Σ_{p=max{0,l}}^{min{L−1,L−1+l}} W_p W_{p−l}^T = δ(l) I_m,  |l| ≤ M,   (2)

where I_m is the m-dimensional identity matrix, (·)^T denotes the transpose operation, δ(l) is the Kronecker delta, and M = (L − 1)/2 is typically chosen. Alternatively, they can be described in the frequency domain as

W(e^{jω}) W^T(e^{−jω}) = I_m   (3)

for some discrete set of frequencies ω ∈ [−π, π], where W(z) is the z-transform of {W_p} given by

W(z) = Σ_{p=0}^{L−1} W_p z^{−p}.   (4)

Although the constraints in (2) or (3) imply a similarity among the rows of W_p or W(z), the cost function being optimized and/or the input signal statistics usually cause the parameters within these rows to converge to different, unique solutions. When m = n = 1, (3) implies that the unknown system has a unit-magnitude frequency response. Historically, there have been two basic approaches for adaptive paraunitary systems. The first approach builds the constraints defined by (2) or (3) into the system structure, such that the system is guaranteed by design to maintain the constraints. This approach uses a minimal parametrization, which is good for numerical reasons. The adaptation algorithm becomes more complicated, however,
and stability monitoring may be necessary. Examples of this approach include the adaptive allpass filter described in [1] and the adaptive paraunitary filter described in [3].
The second approach chooses a convenient, potentially overparametrized structure for the adaptive system, for example, a multichannel finite impulse response (FIR) filter, and adapts the coefficients of this structure in ways that approximately maintain allpass or paraunitary constraints on the system. These approaches are often simpler to implement due to their use of multiply/accumulate operations, and no stability monitoring is required for the FIR structures. Examples of such algorithms include the adaptive allpass filtering approach in [11] and the gradient-based adaptive paraunitary filtering algorithms in [12]. The overparametrized nature of their FIR-based system structure, however, means that they are prone to numerical accumulation of errors, and clever algorithm design is required to mitigate these effects in practice. Numerical error accumulation is a particular concern in subspace tracking, where such issues have made the design of minor subspace and component tracking algorithms especially problematic in the past, leading to efforts to stabilize such methods by appropriate algorithm modifications or the specification of new gradient flows [13][14][15][16]. Of course, in the simpler spatial-only case, it is possible to impose unitary constraints using a Gram-Schmidt procedure or via a symmetric square root operation, the latter of which is a projection in the Euclidean space of the vectorized system parameters [17]. For a review of such techniques, see [18]. Unfortunately, such methods are not easily extended to multichannel FIR filters, necessitating a novel approach to the task.
In this paper, we consider a third approach that might loosely be called a "step-and-constrain" method. In our procedure, the coefficients of the adaptive FIR system are adjusted to maximize or minimize a cost function, for example, by moving a small distance in the direction of the gradient of the cost, at which point the coefficients are adjusted back to the constraint space by a simple iterative procedure. Such ideas are not new in adaptive signal processing; see, for example, work on adaptation of coefficient vectors under unit-norm constraints [19] and the adaptation of unitary matrices [20]. What is new is our discovery of an iterative technique for imposing the autocorrelation constraints in (2) on a multichannel FIR system that has a number of useful properties, including fast convergence, a reasonably large capture region, and computational simplicity. The technique is a spatiotemporal extension of a classic technique for imposing unitary constraints on close-to-unitary matrices [21]. Through frequency-domain analysis of the iterative method, we analyze the dynamics of our proposed iterative procedure, showing that convergence of the method is locally quadratic. Numerical evaluations illustrate that the technique typically converges in tens of iterations when faced with significant deviations of the multichannel system away from paraunitariness, and convergence is much faster with smaller-magnitude deviations. Moreover, when combined with existing gradient-based spatiotemporal subspace tracking algorithms, the method is observed to stabilize the numerical performance of these algorithms using only a single iteration of the constraint update procedure at each time instant for both principal and minor subspace tracking tasks, and it allows much larger step sizes to be used in these algorithms for faster convergence. Finally, as the technique is easily described using convolution operations, it can be extended to multidimensional signal sets, and we provide a simple image sequence coding example to show how the method might be used in such cases.
As for notation, all signals and coefficients are assumed to be real-valued, although extensions of the described method to the complex-signal case are straightforward. As a portion of our analysis is in the frequency domain, however, we will make use of complex vectors and matrices for analytical purposes.

AN ADAPTIVE ALGORITHM FOR MAINTAINING PARAUNITARY CONSTRAINTS
In this paper, our focus is on a procedure that imposes paraunitary constraints on the matrix sequence {W_p} adaptively through its operation. Thus, the adjustment of {W_p} by some cost-driven procedure such as a gradient maximization or minimization approach is, for the moment, implied. The technique considered in this paper adapts W_p = W_p(t) iteratively for t ∈ {0, 1, 2, ...} after an update based on a cost-driven adaptive procedure has been applied, and this embedded stabilizing update is executed for as many iterations as needed to impose the constraints given by (2) to an accuracy that matches the needs of the signal processing application at hand. In later sections, we will consider such an embedding for gradient-based spatiotemporal subspace analysis. The proposed technique for imposing paraunitary constraints is

W_p(t + 1) = (3/2) W_p(t) − (1/2) Σ_{l=−M}^{M} C_l(t) W_{p−l}(t),  0 ≤ p ≤ L − 1,   (5)

where C_l(t) is defined as

C_l(t) = Σ_{p=max{0,l}}^{min{L−1,L−1+l}} W_p(t) W_{p−l}^T(t).   (6)

In both (5) and (6), the sequence {W_p(t)} is assumed to be zero outside the interval p ∈ [0, L−1]. In order to better see the structure of this algorithm, we can use the well-known connection between polynomial multiplication and convolution to describe (5) and (6). Defining the z-transform of W_p(t) as

W_t(z) = Σ_{p=0}^{L−1} W_p(t) z^{−p},   (7)

this algorithm can be written as

W_{t+1}(z) = [ (3/2) W_t(z) − (1/2) [W_t(z) W_t^T(z^{−1})]_{−M}^{M} W_t(z) ]_{−(L−1)}^{0},   (8)

where [·]_M^N denotes truncation of the polynomial to the range of powers within [M, N].
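As a concrete illustration, the update in (5)-(6) can be sketched in a few lines of NumPy. This is our own hypothetical re-implementation under the truncation rules above, not the MATLAB code of Algorithm 1; the array layout (taps along the first axis) is an assumption:

```python
import numpy as np

def paraunitary_step(W):
    """One iteration of the constraint update (5)-(6).

    W has shape (L, m, n): W[p] is the p-th (m x n) matrix tap.
    Returns 1.5*W[p] - 0.5*sum_l C_l(t) W[p-l], with the lag sequence
    {C_l} truncated to |l| <= M = (L-1)/2 and the result truncated
    back to taps 0 <= p <= L-1.
    """
    L, m, n = W.shape
    M = (L - 1) // 2
    Wnew = 1.5 * W.copy()
    for l in range(-M, M + 1):
        # C_l = sum_p W[p] W[p-l]^T  (autocorrelation of the tap sequence)
        C = sum(W[p] @ W[p - l].T for p in range(max(0, l), min(L, L + l)))
        for p in range(L):
            if 0 <= p - l < L:
                Wnew[p] -= 0.5 * C @ W[p - l]
    return Wnew

def constraint_error(W):
    """Squared Frobenius deviation of {C_l} from delta(l) I_m for |l| <= M."""
    L, m, n = W.shape
    M = (L - 1) // 2
    err = 0.0
    for l in range(-M, M + 1):
        C = sum(W[p] @ W[p - l].T for p in range(max(0, l), min(L, L + l)))
        err += np.sum((C - (l == 0) * np.eye(m)) ** 2)
    return err

# Demo: perturb a trivially paraunitary system and re-impose the constraints.
rng = np.random.default_rng(0)
L, m, n = 5, 2, 3
M = (L - 1) // 2
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
W = np.zeros((L, m, n))
W[M] = Q[:m]                       # orthonormal rows at the center tap
W += 0.01 * rng.standard_normal(W.shape)
for _ in range(100):
    W = paraunitary_step(W)
print(constraint_error(W))         # decreases toward machine precision
```

The fixed point is easy to check by hand: if C_l(t) = δ(l) I_m for all |l| ≤ M, the correction term reduces to (1/2) I_m W_p(t), and the update leaves {W_p(t)} unchanged.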
Several initial comments about this algorithm can be made.
(i) The technique is a spatiotemporal extension of a classic procedure for computing the best estimate of an orthogonal matrix [21], which for an (m × n) complex-valued matrix W(t) is given by

W(t + 1) = (3/2) W(t) − (1/2) W(t) W^H(t) W(t),   (9)

where (·)^H denotes the complex (Hermitian) transpose. This procedure has recently been rediscovered by the independent component analysis community as a simple method for maintaining orthogonality constraints in prewhitened blind source separation [22]. Our spatiotemporal procedure can be viewed as applying (9) to the frequency response W_t(e^{jω}) at every frequency ω. This frequency-domain connection is exact if truncation issues are ignored, or equivalently, if L → ∞, as then we can employ the substitution z = e^{jω} in (8) to obtain

W_{t+1}(e^{jω}) = (3/2) W_t(e^{jω}) − (1/2) W_t(e^{jω}) W_t^T(e^{−jω}) W_t(e^{jω}),   (10)

which is identical to (9) for W(t) = W_t(e^{jω}), since W_t^T(e^{−jω}) = W_t^H(e^{jω}) for real-valued coefficients. The filter truncation employed in (5)-(6) or (8) for finite L, however, makes our proposed algorithm novel and distinct from the frequency-domain algorithm in (10).
(ii) The technique can also be viewed as a spatiotemporal extension of a natural gradient prewhitening procedure popular for blind source separation that has been analyzed in [23, 24]. The properties of the proposed method are significantly different from these natural gradient prewhitening methods, however, because of the algorithm's large effective step size.
(iii) The technique requires approximately 1.25 m²nL² multiply/accumulate operations at each iteration. While several iterations are typically needed to move {W_p(t)} towards a paraunitary sequence, the number of iterations required in an online adaptive estimation setting depends on the cost function being optimized. As we will show, in some cases a single update of this procedure per time instant is sufficient to maintain good overall performance.
(iv) Since the technique involves convolution operations, fast convolution procedures can be employed to implement (5)-(6) when L is large, reducing its complexity to O(m²nL log L) at each iteration.
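For the L = 1 case, the classic update (9) is only a few lines of NumPy. A minimal sketch (our own, with an initial scaling chosen to keep the singular values inside the capture region analyzed below):

```python
import numpy as np

def orth_step(W):
    """One step of (9): W <- 1.5 W - 0.5 W W^H W."""
    return 1.5 * W - 0.5 * W @ W.conj().T @ W

rng = np.random.default_rng(1)
W = 0.3 * rng.standard_normal((4, 6))  # small scale: singular values in (0, sqrt(5))
for _ in range(30):
    W = orth_step(W)
# The rows of W are now orthonormal: W W^H = I.
```

Each step only changes the singular values of W, driving them toward unity while leaving the singular vectors fixed, which is precisely the content of Theorem 1 below.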
The ultimate utility of the technique in (5)-(6) depends on the theoretical and numerical properties of the update. We explore each of these issues in turn.

ALGORITHM ANALYSIS
In this section, we analyze the convergence behavior of the adaptive orthonormalization procedure given by (5)-(6). Initially, we consider the complex extension of this procedure in the single-matrix case, where L = 1. A portion of this analysis parallels that performed in [21], although we provide extensions of the results contained therein, particularly in terms of the capture region of the method. In the sequel, we extend these results for the single-matrix algorithm to the convolutive form given in (5)-(6) for an unconstrained-length (i.e., doubly infinite noncausal) paraunitary impulse response. Consider the update in (9) for a single (m × n) complex-valued matrix W(t). The first three of the following four theorems pertain to this update.

Theorem 1. Define a modified singular value decomposition of W(t) as

W(t) = U(t) Σ(t) J(t) V^H(t),   (11)

where U(t) and V(t) contain orthonormal columns, Σ(t) = diag[σ_1(t), ..., σ_m(t)] has positive real-valued unordered entries, and the matrix J(t) is a diagonal matrix whose diagonal entries J_i(t) are constrained to be either (+1) or (−1). Then, it is possible to define the diagonal matrix sequences Σ(t) and J(t) such that

U(t) = U(0) and V(t) = V(0) for all t ≥ 0.   (12)

Equivalently, the following two relations hold:

σ_i(t + 1) = (1/2) σ_i(t) |3 − σ_i²(t)|,  J_i(t + 1) = J_i(t) sgn[3 − σ_i²(t)].   (13)

Proof. Let W(t_0) = U(t_0) Σ(t_0) J(t_0) V^H(t_0) be the modified singular value decomposition of W(t) at time t = t_0. Then, substituting for W(t_0) in (9), we obtain after some simplification

W(t_0 + 1) = U(t_0) [ (1/2) Σ(t_0) (3 I_m − Σ²(t_0)) J(t_0) ] V^H(t_0).   (14)

Clearly, the matrix inside the large brackets on the right-hand side of (14) is diagonal, implying that

U^H(t_0) W(t_0 + 1) V(t_0)   (15)

is diagonal. One possible situation that guarantees the diagonal nature of U^H(t_0) W(t_0 + 1) V(t_0) is U(t_0) = U(t_0 + 1) and V(t_0) = V(t_0 + 1), such that

W(t_0 + 1) = U(t_0) Σ(t_0 + 1) J(t_0 + 1) V^H(t_0).   (16)

Define the sequences

σ_i(t_0 + 1) = (1/2) σ_i(t_0) |3 − σ_i²(t_0)|,   (17)
J_i(t_0 + 1) = J_i(t_0) sgn[3 − σ_i²(t_0)].   (18)

Then, setting t_0 = {0, 1, 2, ...}, the result follows.
Theorem 2. The algorithm in (9) causes the singular values of W(t) to converge to unity if the following two conditions hold: (1) the singular values of W(0) satisfy 0 < σ_i(0) < √5 for 1 ≤ i ≤ m; (2) none of the singular values of W(0) lead to the condition σ_i(t_0) = √3 for some t_0 ≥ 1.
Proof. Neglect the ordering of the singular values of W(t), and consider the evolution of the diagonal entries of Σ(t) in (11), as defined by (17). Consider first the possibility that σ_i(0) = √3, in which case σ_i(t) = 0 for all t ≥ 1, a clearly undesirable condition. Moreover, if σ_i(t_0) = √3 for some t_0, then σ_i(t) = 0 for all t ≥ t_0 + 1. Thus, values of σ_i(0) that lead to σ_i(t) = √3 must be avoided if convergence of σ_i(t) to unity is desired. This verifies the second part of the theorem.
To prove the first part of the theorem, define the error criterion

γ_i(t) = σ_i²(t) − 1.   (19)

Then, (17) becomes

σ_i(t + 1) = (1/2) σ_i(t) |2 − γ_i(t)|.   (20)

Squaring both sides of (20), we get

σ_i²(t + 1) = (1/4) σ_i²(t) [2 − γ_i(t)]².   (21)

Substituting σ_i²(t) = γ_i(t) + 1, we have after some simplification the result

γ_i(t + 1) = (1/4) γ_i²(t) [γ_i(t) − 3].   (22)

We wish to guarantee that γ_i(t) → 0, which will be the case if |γ_i(t + 1)/γ_i(t)| < 1 for all t. Thus, for convergence,

(1/4) |γ_i(t) [γ_i(t) − 3]| < 1.   (23)

Since γ_i(t) ≥ −1, we can guarantee that |γ_i(t + 1)/γ_i(t)| < 1 if we satisfy the following two inequalities:

γ_i(t) [γ_i(t) − 3] < 4,  γ_i(t) [3 − γ_i(t)] < 4.   (24)

Employing the constraint that γ_i(t) ≥ −1, it can be shown after further study that both inequalities are satisfied if

−1 < γ_i(t) < 4.   (25)

This will be the case if

0 < σ_i(0) < √5.   (26)

Finally, if σ_i(0) satisfies (26), monotonic convergence of |γ_i(t)| to zero, and hence of σ_i(t) to unity, is guaranteed by the inequality |γ_i(t + 1)/γ_i(t)| < 1 over the interval (0, √5), so long as σ_i(t) ≠ √3 for any t. Thus, the first part of the theorem follows. Finally, we note that the ordering of the singular values does not affect their numerical evolutions as defined by (17), which completes the proof of the theorem.
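The scalar dynamics in the proof are easy to explore numerically. A small sketch of the magnitude map σ → |σ(3 − σ²)|/2, with the sign change absorbed by J_i(t) in the modified SVD:

```python
def sv_map(s):
    """Magnitude evolution of a singular value under (9)/(17)."""
    return abs(s * (3.0 - s * s) / 2.0)

# Initial values inside the capture region (0, sqrt(5)), excluding sqrt(3),
# are driven to unity.
for s0 in [0.2, 0.9, 1.5, 2.2]:
    s = s0
    for _ in range(25):
        s = sv_map(s)
    print(s0, "->", s)          # each trajectory converges to 1

# sqrt(3) maps (up to roundoff) to zero, which is why it must be excluded.
print(sv_map(3.0 ** 0.5))       # essentially zero
```

Note that starting values in (√3, √5) first overshoot (the polynomial becomes negative there, flipping the corresponding entry of J) before being pulled back toward unity, which is why the capture region is larger than the interval (0, √3) reported in [21].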

Theorem 3. Convergence of σ_i²(t) to unity is locally quadratic.
Proof. This fact can be seen from the form of (22), where it can be seen for γ_i(t) near zero that

γ_i(t + 1) ≈ −(3/4) γ_i²(t).   (27)

Theorem 4. Define the z-transform of the sequence W_p(t) as in (7). Furthermore, assume that the multichannel system function is stable, such that the multichannel system frequency response W_t(e^{jω}) satisfies tr[W_t(e^{jω}) W_t^H(e^{−jω})] < ∞. Then, for L → ∞, the algorithm in (5)-(6) obeys all of the results of Theorems 1, 2, and 3, namely:

(a) the update in (9) only changes the singular values of W_t(e^{jω}) over time; it does not change the orientations of the left- or right-singular vectors of W_t(e^{jω});
(b) the singular values of W_t(e^{jω}) converge to unity as long as (i) the singular values of W_0(e^{jω}) satisfy 0 < σ_i(0) < √5 for 1 ≤ i ≤ m, and (ii) none of the singular values of W_0(e^{jω}) lead to the condition σ_i(t_0) = √3 for some t_0 ≥ 1;
(c) convergence of σ_i²(t) to unity is locally quadratic.

Proof. The above results are easily seen for the case L → ∞ given the connection between (5)-(6) and (10). All that is needed is the stability of W_t(z), which is a condition given in the statement of the theorem. In such situations, Theorems 1, 2, and 3 hold for the spatiotemporal extension in (5)-(6).
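The locally quadratic rate in (27) can be checked directly on the error recursion γ(t + 1) = γ²(t)[γ(t) − 3]/4 from the proof of Theorem 2; the ratio γ(t + 1)/γ²(t) should approach −3/4 as γ → 0:

```python
g = 1e-3                      # small deviation of a squared singular value from 1
for _ in range(3):
    g_next = g * g * (g - 3.0) / 4.0
    print(g_next / (g * g))   # ratio approaches -0.75
    g = g_next
```

Each iteration roughly squares the error, so the number of correct digits doubles per step once γ is small.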
Remark 1. The results of Theorems 2 and 4 indicate that the capture region of the algorithm is somewhat larger than that predicted by the analysis in [21] for the algorithm in the L = 1 case, in which the constraint 0 < σ_i(0) < √3 was determined.¹ As the squares of the singular values in the spatial-only algorithm analysis correspond to the multichannel frequency response of the system W_t(e^{jω}) W_t^T(e^{−jω}), the algorithm will remain stable and essentially monotonically convergent if

λ[W_0(e^{jω}) W_0^T(e^{−jω})] < 5 for all ω ∈ [−π, π],   (28)

where λ(M) denotes the spectral radius of the Hermitian symmetric matrix M. When combined with a cost-driven iterative procedure, this fact means that one should limit the step size of the cost-based portion of the overall algorithm so that the coefficients {W_p(t)} remain in the stable capture region of the iterative procedure in (5)-(6). For gradient-based approaches, this issue is of little concern in practice, as indicated in our simulations. Explicit stabilization of the method in more aggressive adaptation scenarios is also possible. For example, if an estimate of or bound on the largest singular value σ_max(0) of W_0(e^{jω}) is available, then one can scale all W_p(0) by the inverse of this bound prior to employing the proposed iterative algorithm. An example of such a bound is

σ_max(0) ≤ Σ_{p=0}^{L−1} σ_max[W_p(0)],   (29)

although the computation of this bound is computationally burdensome. Simpler approaches to stabilization involving implicit coefficient normalization can be developed but will not be considered in this paper.
Remark 2. Many subspace tracking algorithms, including gradient-based approaches and power-iteration-based methods, are linearly convergent [18]. Thus, our proposed procedure is ideally suited for such methods, as the quadratic convergence of our method to the constraint space means that the algorithm's overall dynamics will not be limited by the adaptive procedure in (5)-(6).
Remark 3. Although the analytical results above justify the use of (5)-(6) as an iterative procedure for imposing paraunitary constraints on {W_p(t)}, they do not justify the choice of impulse response truncation within the algorithm, whereby {C_l(t)} is nonzero only for |l| ≤ (L − 1)/2 within the update in (5). Our use of truncation is motivated by the observed performance of the procedure, in which {W_p(t)} converges to a sequence satisfying

Σ_{p=max{0,l}}^{min{L−1,L−1+l}} W_p(t) W_{p−l}^T(t) = δ(l) I_m   (30)

for |l| ≤ (L − 1)/2 up to the numerical precision of the computing environment if it is allowed to run long enough.

¹ The condition in part 2 of Theorem 2 does not preclude the existence of a dense subset of an interval in (0, √5) whose elements lead to σ_i(t_0) = √3 for some t_0; such initial values, however, form a set of measure zero.
Algorithm 1 provides a MATLAB implementation of the adaptive paraunitary constraint procedure. The two functions orthW and gfun apply the update in (5)-(6) to the (nL × m) matrix Wp to obtain the paraunitary system response in W. The overall program paraunitarytest generates a perturbed paraunitary system for testing the iterative procedure in a manner that we use to explore its intrinsic numerical performance in the next section.

VERIFICATION OF NUMERICAL PERFORMANCE
We now explore the behavior of the procedures in (9) and (5)-(6) via numerical simulations. The performance metric used for these simulations is the averaged value of

η(t) = Σ_{l=−M}^{M} || C_l(t) − δ(l) I_m ||_F²,   (31)

which reduces to ||W(t) W^H(t) − I_m||_F² when L = 1, as computed from a set of simulation runs with different initial conditions W(0) or {W_p(0)}. The first set of simulations is designed to verify that the convergence analysis of (9) is accurate for L = 1. For each simulation run, a ten-by-ten matrix W(0) is generated with random orthonormal real-valued left and right singular vectors and a set of ten singular values uniformly distributed in the range (0, √5). The procedure in (9) is then applied to this initial matrix. The averaged value of the performance criterion in (31) is computed from 1000 different simulation runs of the procedure, where m = n = 10. Shown in Figure 1 is the evolution of η(t) in dB, indicating that the algorithm causes W(t) to converge quickly to an orthonormal matrix if the singular values of W(0) lie within the algorithm's monotonic capture region.
The second set of simulations is designed to verify that the proposed spatiotemporal procedure in (5)-(6) can be used to impose paraunitary constraints on {W_p(t)}. In these simulations, m = 4, n = 7, L = 11, and {W_p(0)} is initialized as

W_p(0) = W̄_p + sig · N_p,   (32)

where {W̄_p} is a paraunitary matrix impulse response and N_p is a sequence of jointly Gaussian matrices having uncorrelated zero-mean entries with standard deviation of either sig = 0.1 or sig = 0.01 (see Algorithm 1). One hundred simulation runs have been averaged to compute the performance curve shown in Figure 2. Although convergence of the performance metric is slower than that in the spatial-only case, the results show that the proposed method does cause {W_p(t)} to converge to a paraunitary system. Moreover, if enough iterations are taken, the performance metric reaches the machine precision of the computing environment. For small initial perturbations away from paraunitariness, convergence of the algorithm is extremely fast, requiring only a few iterations to decrease the performance metric by more than 30 dB.

APPLICATIONS TO SPATIOTEMPORAL SUBSPACE ANALYSIS
Consider a sequence of n-dimensional vectors x(k) from a wide-sense stationary random process in which

R_xx(l) = E{x(k) x^T(k − l)}   (33)

is the autocorrelation function matrix at lag l. The goal of spatiotemporal subspace analysis is to determine an n-input, m-output paraunitary system, m < n, with impulse response W_p such that the output sequence

y(k) = Σ_{p=0}^{L−1} W_p x(k − p)   (34)

has either maximum or minimum total energy E{||y(k)||²}, where ||y(k)|| denotes the L₂ or Euclidean norm of y(k). If E{||y(k)||²} is maximized, then the adjoint-filtered sequence

x̂(k) = Σ_{p=0}^{L−1} W_p^T y(k + p)   (35)

is the optimal rank-m linear filtered approximation to the vector sequence x(k) in a mean-square-error sense. Such techniques could be used to code multichannel signals, among other applications. Minimization of E{||y(k)||²} under paraunitary constraints yields the spatiotemporal extension of the minor subspace analysis task, which is important for direction-of-arrival estimation in wideband array processing systems [2, 3, 25, 26].
In [12], simple iterative gradient-based algorithms were derived for principal and minor subspace analysis tasks. The spatiotemporal principal subspace algorithm is given by

y(k) = Σ_{p=0}^{L−1} W_p(k) x(k − p),   (36)
e(k) = x(k) − Σ_{p=0}^{L−1} W_p^T(k) y(k + p),   (37)
W_p(k + 1) = W_p(k) + μ(k) y(k) e^T(k − p),   (38)

where μ(k) is the algorithm step size and suitably delayed versions of the quantities above are employed to obtain a causal implementation. This algorithm is the spatiotemporal extension of the well-known principal subspace rule [27]. A spatiotemporal minor subspace algorithm is also provided in [12]; it is the spatiotemporal extension of the self-stabilized algorithm in [14]. These algorithms are stochastic-gradient procedures that only approximately maintain the paraunitary constraints through their adaptive behaviors, and their ability to maintain the constraints is linked to the step size chosen for the adaptive procedure.
The proposed iterative procedure in this paper provides a potential solution to the numerical stabilization of these gradient-based algorithms, in which the imposition of the constraint is met by embedding (5)-(6) within the updates in (36)-(38). In this algorithm design, we may choose to use a limited number of iterations of (5)-(6) to improve the numerical performance of the overall algorithm, a choice that is motivated by the fast convergence of the constraint procedure. As is now shown, even a single iteration of (5)-(6), when used in conjunction with (36)-(38), enables fast and accurate convergence to either a principal or minor subspace estimate, depending on the sign of the step size μ(k). The simulations that follow explore these issues further.
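A sketch of such an embedding is given below. This is our own illustration, not the exact update of [12]: it takes a stochastic-gradient ascent step on the instantaneous output power ||y(k)||² and then applies a single iteration of the constraint procedure per sample; the array layout and the rank-two signal model are assumptions for the demonstration:

```python
import numpy as np

def constrain(W, M):
    """One lag-truncated iteration of the paraunitary constraint update (5)-(6)."""
    L = W.shape[0]
    Wn = 1.5 * W.copy()
    for l in range(-M, M + 1):
        C = sum(W[p] @ W[p - l].T for p in range(max(0, l), min(L, L + l)))
        for p in range(max(0, l), min(L, L + l)):
            Wn[p] -= 0.5 * C @ W[p - l]
    return Wn

# "Step-and-constrain" principal subspace tracking: gradient ascent on the
# output power E{||y(k)||^2} plus one constraint iteration per time instant.
m, n, L, mu = 2, 4, 7, 0.005
M = (L - 1) // 2
rng = np.random.default_rng(2)
W = np.zeros((L, m, n))
W[M, :, :m] = np.eye(m)                  # "center-spike" initialization
A = rng.standard_normal((n, 2))          # rank-2 spatial mixing (assumed model)
xbuf = np.zeros((L, n))                  # delay line holding x(k), ..., x(k-L+1)
for k in range(2000):
    x = A @ rng.standard_normal(2) + 0.01 * rng.standard_normal(n)
    xbuf = np.roll(xbuf, 1, axis=0)
    xbuf[0] = x
    y = sum(W[p] @ xbuf[p] for p in range(L))   # y(k) = sum_p W_p x(k-p)
    for p in range(L):
        W[p] += mu * np.outer(y, xbuf[p])       # ascend the power gradient
    W = constrain(W, M)                         # pull back toward paraunitariness
```

Because the constraint step converges quadratically, a single iteration per sample suffices to keep the accumulated gradient drift in check, mirroring the behavior reported in the simulations that follow.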
Consider the example in [12], in which the input signal x(k) is generated by the signal model described therein, where ν_i(k), i ∈ {1, 2, 3, 4}, are independent zero-mean Gaussian signals with r_{νν,i}(l) = σ_ν² δ(l) and σ_ν² = 10^{−4}. We compare the performance of (36)-(38) with and without one iteration of the adaptive constraint procedure in (5)-(6) per time instant, where m = 2, n = 4, L = 14, μ(k) = 0.008 for the algorithm with the adaptive constraint method, μ(k) = 0.005 for the algorithm without the adaptive constraint method, and w_{ij,p}(0) is unity if i = j and p = L/2 and is zero otherwise. Note that the step size for the algorithm with the constraint method is eight times larger than that used in the simulations in [12], and the step size for the algorithm without the constraint method was chosen to obtain the fastest convergence without instability. Shown in Figures 3(a) and 3(b) are the evolutions of ρ_PSA(k) and of η(k) in (31), respectively, as averaged over one hundred different simulation runs. As can be seen, the proposed algorithm with a single iteration of the adaptive constraint method per time instant converges to an accurate subspace estimate that minimizes the low-rank mean-squared error criterion. The steady-state value of ρ_PSA(k) is approximately 3.1 × 10^{−4}, which is near the minimum value of 2 × 10^{−4} theoretically obtainable from the data model. In contrast, the original spatiotemporal principal subspace algorithm converges more slowly due to the stability limits on the algorithm step size. Larger step sizes caused this latter algorithm to diverge; that is, it could not maintain the paraunitary constraints with a larger step size despite being locally stable to the constraint space as μ(k) → 0.
Although not easily proven, the reason for the poor performance of the original method for larger step sizes could be the delayed-gradient approximation employed in its derivation, in which past coefficient values appear within the coefficient updates in the error terms {e(k − p)}. Such delayed-gradient terms are known to limit the convergence performance of filtered-gradient algorithms in multichannel active noise control systems [28]. Computing the coefficient update terms using the most recent coefficient values requires more than 3mnL² multiply/accumulate operations, which for small m is close to the complexity of a single step of the adaptive constraint procedure. Our adaptive projection method alleviates the convergence difficulties introduced by the delayed-gradient approximations and enables the algorithm to function properly for large step sizes.
We now explore the behavior of (36)-(38) with one iteration of the adaptive constraint procedure in (5)-(6) per time instant when applied to the spatiotemporal minor subspace analysis task, in which μ(k) < 0. Note that, without taking any corrective measures to maintain the coefficient constraints, the update in (36)-(38) is unstable in this context, as is the spatial-only principal subspace rule that is obtained when L = 1 [27]. Figures 4(a) and 4(b) show the corresponding performance metrics and η(k) in (31) for the same input signal model as in the previous simulation, where μ(k) = −0.005. The algorithm without stabilization (dashed lines) quickly diverges. The algorithm with the proposed stabilization method performs minor subspace analysis successfully in this situation, and its convergence speed is much faster than that of the self-stabilized algorithm described in [12], which requires approximately 30 000 iterations to converge under these same conditions.

MULTIDIMENSIONAL EXTENSIONS
The adaptive procedure described in (5)-(6) could be compactly and approximately defined as

W_p(t + 1) = (3/2) W_p(t) − (1/2) [W(t) ∗ W̃(t) ∗ W(t)]_p,  W̃_p(t) = W_{−p}^T(t),   (39)

where "∗" denotes discrete-time convolution over the index p and the all-important truncation issues associated with the finite-length convolutions have been ignored. This form of the adaptive procedure inspires us to consider versions of the algorithm for higher-dimensional data, such as images, video, and hyperspectral imagery. It is reasonable to assume that, with an appropriately defined convolution operator, one could extend the procedure in (5)-(6) to these other data types. For example, consider an n-input, m-output two-dimensional (2D) FIR linear filter of the form

y(k, l) = Σ_{p=0}^{L−1} Σ_{q=0}^{L−1} W_{p,q} x(k − p, l − q),   (40)

where {W_{p,q}} contain the coefficients of the multichannel system. A multichannel 2D paraunitary filter would impose the constraints

Σ_{p,q} W_{p,q} W_{p−a,q−b}^T = δ(a) δ(b) I_m   (41)

on the coefficients of the linear system. Translating the proposed multichannel one-dimensional paraunitary constraint procedure to this two-dimensional structure, we obtain the update in polynomial form as

W_{t+1}(z_1, z_2) = [ (3/2) W_t(z_1, z_2) − (1/2) W_t(z_1, z_2) W_t^T(z_1^{−1}, z_2^{−1}) W_t(z_1, z_2) ]_M^N,   (42)

where

W_t(z_1, z_2) = Σ_{p=0}^{L−1} Σ_{q=0}^{L−1} W_{p,q}(t) z_1^{−p} z_2^{−q}   (43)

is the 2D z-transform of {W_{p,q}(t)} and [·]_M^N here denotes truncation of its two-dimensional polynomial argument to the individual powers for z_1 and z_2 within the range [M, N]. We can illustrate the usefulness of this particular procedure with a simple video coding example, described using the MATLAB technical computing environment.
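Under the same reading of the truncation rules as in the one-dimensional case, the 2D update can be sketched directly in NumPy. This is a hypothetical illustration of the extension (the paper's own implementation uses MATLAB's filter2 rather than explicit loops):

```python
import numpy as np

def constrain2d(W):
    """One iteration of the 2D paraunitary constraint update.

    W has shape (L, L, m, n): W[p, q] is the (m x n) tap at spatial lag (p, q).
    The 2D autocorrelation {C_{a,b}} is truncated to lags |a|, |b| <= M.
    """
    L = W.shape[0]
    M = (L - 1) // 2
    Wn = 1.5 * W.copy()
    for a in range(-M, M + 1):
        for b in range(-M, M + 1):
            # C_{a,b} = sum_{p,q} W[p,q] W[p-a,q-b]^T
            C = sum(W[p, q] @ W[p - a, q - b].T
                    for p in range(max(0, a), min(L, L + a))
                    for q in range(max(0, b), min(L, L + b)))
            for p in range(max(0, a), min(L, L + a)):
                for q in range(max(0, b), min(L, L + b)):
                    Wn[p, q] -= 0.5 * C @ W[p - a, q - b]
    return Wn

# Center-spike initialization for an n=3-input, m=1-output bank of 3x3 kernels,
# perturbed away from paraunitariness and then constrained iteratively.
m, n, L = 1, 3, 3
rng = np.random.default_rng(3)
W = np.zeros((L, L, m, n))
W[1, 1, 0, 1] = 1.0                      # spike in the center of the second kernel
W += 0.01 * rng.standard_normal(W.shape)
for _ in range(100):
    W = constrain2d(W)
```

The constrained kernels can then be used exactly as in the video coding example that follows.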
Consider the task of designing a three-input (n = 3), one-output (m = 1) paraunitary system for a set of three similar images, in which the convolution kernel for each image is of size (L × L), where L is odd. Let W1, W2, and W3 denote the corresponding 2D convolution kernel matrices, such that W_t(z_1, z_2) is a (1 × 3) vector of polynomials in z_1 and z_2. Then, a few lines of MATLAB code employing the function filter2 suffice to impose paraunitary constraints on the filter coefficient set {W1, W2, W3}. To illustrate that this procedure works as designed, consider a simple video compression example. Given an image sequence, we first calculate a sequence of difference images. For every three difference images, we estimate a principal component image y(k, l) by maximizing the output power of the image pixels from the three-input, one-output paraunitary system while imposing a paraunitary constraint via the above adaptive procedure. In this procedure, we used a "center-spike" initialization strategy, where W1 and W3 were set to zero matrices and W2 had one nonzero value in the center of its impulse response. We then reconstruct the first and third difference images from the single principal component image y(k, l) using W1 and W3, resulting in the reconstructed difference images u_1(k, l) and u_3(k, l), respectively. Finally, we use the reconstructed difference images to calculate the two intermediate frames from every third "key" frame within the image sequence using adds and subtracts, respectively. The result is a compressed image sequence, because for every three frames, one only needs on average one "key" image frame, one principal component image frame y(k, l), and the two filtering kernels W1 and W3 to represent three images within the sequence. Of course, such a compression scheme cannot compete with more common motion-based image compression schemes, but the success of a 2D adaptive paraunitary filter in such an application illustrates the capability and flexibility of the proposed constraint method.
We applied the above video compression scheme to a spatially downsampled version of the Cronkite video sequence obtained from the USC SIPI database, where L = 3. In this case, the images were downsampled to size 128 × 128 pixels, and a gradient-based principal component analysis procedure was used in conjunction with the adaptive 2D paraunitary constraint procedure with numiter = 50 to maximize the output powers in the principal component images. From the sixteen-frame sequence, the ten resulting reconstructed images had an average PSNR of 26.75 dB with a standard deviation of 2.17 dB. Shown in Figure 5 are the original (left) and reconstructed (right) versions of the eleventh (top) and twelfth (bottom) frames, respectively. As can be seen, the quality of reconstruction is high, and the proposed paraunitary constraint method can be employed to solve this approximation task.
