A Fast Algorithm for Selective Signal Extrapolation with Arbitrary Basis Functions

Signal extrapolation is an important task in digital signal processing for extending known signals into unknown areas. The Selective Extrapolation is a very effective algorithm to achieve this. Thereby, the extrapolation is obtained by generating a model of the signal to be extrapolated as weighted superposition of basis functions. Unfortunately, this algorithm is computationally very expensive and, up to now, efficient implementations exist only for basis function sets that emanate from discrete transforms. Within the scope of this contribution, a novel efficient solution for Selective Extrapolation is presented for utilization with arbitrary basis functions. The proposed algorithm mathematically behaves identically to the original Selective Extrapolation, but is several decades faster. Furthermore, it is able to outperform existent fast transform domain algorithms which are limited to basis function sets that belong to the corresponding transform. With that, the novel algorithm allows for an efficient use of arbitrary basis functions, even if they are only numerically defined.


I. INTRODUCTION
T HE extrapolation of signals is a very important area in digital signal processing, especially in image and video signal processing. Thereby, unknown or not accessible samples are estimated from known surrounding samples. In image and video processing, signal extrapolation tasks arise e. g. in the area of concealment of transmission errors as described in [1] or for prediction in hybrid video coding as shown in [2].
In general, signal extrapolation can be regarded as underdetermined problem as there are infinitely many different solutions for the signal to be estimated, based on the known samples. According to [3], sparsity-based algorithms are well suited for solving underdetermined problems as these algorithms are able to cover important signal characteristics, even if the underlying problem is underdetermined. These algorithms can be applied well to image and video signals, as in general natural signals are sparse [4] in certain domains, meaning that they can be described by only few coefficients.
As has been shown in [5], [6], out of the group of sparse algorithms the greedy sparse algorithms are of interest, as these algorithms are able to robustly solve the problem. One algorithm out of this group is e. g. Matching Pursuits from [7]. Another powerful greedy sparse algorithm is the Selective Extrapolation (SE) from [8]. SE iteratively generates a model of the signal to be extrapolated as weighted superposition of basis functions. In the past years, this extrapolation algorithm also has been adopted by several others like [9], [10] to solve extrapolation problems in their contexts.
Unfortunately, SE as it exists up to now is computationally very expensive. This holds except for the case that basis func-J. Seiler  tion sets are regarded that emanate from discrete transforms. In such a case, the algorithm can be efficiently carried out in the transform domain. The functions of the Discrete Fourier Transform (DFT) [11] are one example for such a basis function set. Using this set, an efficient implementation in the Fourier domain exists by Frequency Selective Extrapolation (FSE) [8]. If basis function sets are regarded that do not emanate from discrete transforms or overcomplete basis function sets or even only numerically defined basis functions, such transform domain algorithms cannot exist.
Although Fourier basis functions have proven to form a good set for a wide range of signals, there also exist signals where other basis function sets lead to better extrapolation results. This holds for example for the case that the support area on which the extrapolation is based is very unequal or in the case that very steep signal changes occur as e. g. in artificial signals. Fig. 1 shows three examples for such signals. The left column shows the original signal, the second column shows a distorted signal with the area to be extrapolated marked in black. The signals in the third column result from applying FSE which utilizes Fourier basis functions. In the last column, Selective Extrapolation is carried out with different basis function sets. In the first row, the basis function set results from the union of the functions from the Discrete Cosine Transform (DCT) [12] and the Walsh-Hadamard Transform (WHT) [13]. In the second row, a binarized version of DFT functions is used in order to reconstruct the steep changes in this artificial signal. In the third row, the basis function set emanates from the union of DFT functions and binarized DFT functions. The three examples have in common that the used basis function sets produce significantly better subjective as well as objective results than the Fourier-based extrapolation does. But they have also in common that for such sets no efficient transform domain implementation can exist which would be necessary for a fast implementation.
Within the scope of this contribution we want to introduce a novel spatial domain solution for SE which is called Fast Selective Extrapolation (FaSE). This algorithm is able to generate a model of the signal for arbitrary basis functions in the same way as the original SE, even in the case that the basis function set does not possess any structure and the basis functions are only numerically defined or in the case that an overcomplete basis function set is regarded. But at the same time, the algorithm is very fast as it can efficiently trade computational complexity versus memory consumption. The paper is organized as follows: first, SE will be reviewed for the general case of complex-valued basis functions. With that, an overview of the algorithm is given and the computationally most expensive steps are pointed out. After that, the novel Fast Selective Extrapolation is presented in detail and its complexity is compared to SE. Finally, simulation results are given for proving the abilities of the novel algorithm.

II. REVIEW OF SELECTIVE EXTRAPOLATION
For the presentation of Selective Extrapolation (SE) a scenario as shown in Fig. 2 is regarded. There, signal parts which have to be extrapolated are subsumed in loss area B. For extrapolating the signal, surrounding correctly received signal parts are used. These signal parts form the support area A. The two areas together form the so called extrapolation area L which is of size M × N samples and is depicted by the spatial coordinates m and n. The signal in L is denoted by s [m, n], but is only available in the support area A. The extrapolation of square blocks is used for presentational reasons at this point only. In general, arbitrarily shaped regions can be extrapolated. In addition to that, in general, the used basis functions can as well be larger than the regarded extrapolation area. In such a case, the extrapolation area has to be padded with zeros to be of the same size as the basis functions. But, for presentational reasons we also assume that the extrapolation area and the basis functions have the same size subsequently.
As described in [8], SE aims at generating a parametric model g the individual basis functions, one expansion coefficientĉ k is assigned to each basis function ϕ k [m, n]. The challenge is to determine which basis functions to use for the model and to calculate the corresponding weights. SE solves this problem iteratively, at which in every iteration one basis function is selected and the corresponding weight is estimated. This is achieved by successively approximating signal s [m, n] in support area A and identifying the dominant basis functions of the signal. In doing so, the signal can be continued well into area B, if an appropriate set of basis functions is used. Initially, model g (0) [m, n] is set to zero and with that the initial approximation residual is equal to the original signal. At the beginning of each iteration, in general the ν-th iteration, a weighted projection of the residual onto each basis function is conducted. For every basis function, this leads to the projection coefficient which results from the quotient of the weighted scalar product between the residual and the basis function and the weighted scalar product between the basis function and itself. In this context, the weighting function has two tasks. Firstly, it is used to mask area B from the calculation of the scalar product as there is no information available about the signal. Secondly, using the function ρ [m, n] it can control the influence different samples have on the model generation depending on their position. For instance, samples far away from loss area B can get a smaller weight and due to this weaker influence on the model generation compared to the samples close to area B. In [14], an exponentially decreasing weight is proposed withρ controlling the decay. After the projection coefficients have been calculated for all basis functions, one basis function has to be selected to be added to the model in the actual iteration. The choice falls on the basis function that minimizes the weighted distance between the approximation residual r (ν−1) [m, n] and the projection p (ν) k ϕ k [m, n] onto the according basis function. In this process, again weighting function w [m, n] from above is used. Hence, the index u (ν) of the basis function to be added in the ν-th iteration is: (7) Subsequent to the basis function selection, the corresponding weight has to be determined. In this process it has to be noted, that although the basis functions may have been orthogonal with respect to the complete extrapolation area L they cannot be anymore if the scalar products are evaluated in combination with the required weighting function. This effect is not considered in the original paper from [8] and is called orthogonality deficiency and is described in detail in [15]. In [16] fast orthogonality deficiency compensation is proposed to efficiently estimate the expansion coefficient by taking only the fraction γ of the projection coefficient: The factor γ is between zero and one and depends on the extrapolation scenario, as described in detail in [16]. After one basis function has been selected and the corresponding weight has been determined, the model and the residual have to be updated by adding the selected basis function to the model generated so far: The approximation residual can be updated in the same way and results in The above described iterations are repeated until the predefined number of I iterations is reached. Finally, area B is cut out of the model and is used for replacing the lost signal.
Alg. 1 shows the pseudo code of SE for giving a compact overview of this algorithm. Regarding this code and taking into account the equations above, the weighted projection onto all the basis functions in every iteration can be identified as computationally most expensive step. To obtain the projection, a weighted scalar product between the residual and every basis function has to be carried out, leading to a large number of multiplications and additions. Compared to this, the actual basis function selection, the expansion coefficient estimation and the model and residual update have a very small complexity.

III. FAST SELECTIVE EXTRAPOLATION
In order to solve the dilemma of the huge computational complexity of SE, we propose a novel formulation of this algorithm that also operates in the spatial domain but is as fast as transform domain algorithms which have been mentioned at the beginning. With that, the advantages of both approaches are combined: the high speed of transform domain algorithms and the independence from certain basis function sets, offered by the spatial domain SE algorithm. The high speed of the novel algorithm results from the fact that the weighted scalar products only have to be evaluated once, prior to the first iteration. In the successive iterations they can be replaced by a recursive calculation. The novel algorithm is called Fast Selective Extrapolation (FaSE) and is outlined in detail for the general complex-valued scenario subsequently. If only realvalued signals and basis functions are regarded, the conjugate complex operations can just be discarded.
Although the principal behavior of FaSE is similar to SE, not the residual r [m, n] in the spatial domain is regarded, but rather the weighted scalar products between the residual and the basis functions. This yields for depicting the weighted scalar product between the residual and the basis function with index k in the ν-th iteration. This scalar product has to be evaluated only once explicitly. This has to be done for the initial step, where the residual is equal to the original signal, leading to After the initial R (0) k have been determined, all subsequent calculations can be carried out with respect to the weighted scalar products and no explicit evaluation of the scalar products is necessary anymore.
Using R (ν) k and exploiting the fact that the square root is a monotonic increasing function for positive arguments, the basis function selection from (7) can be simplified to Using expression R for the weighted scalar product between the selected basis function and the residual from the previous iteration, the estimate for the expansion coefficient results tô . (14) Here, again fast orthogonality deficiency compensation is used to derive the estimate for the expansion coefficient from the projection coefficient. Finally, the update of the model in every iteration can be carried out according to (9). For the subsequent iterations the weighted scalar products can be updated by applying definition (11) on the residual update from (10), yielding Obviously, the weighted scalar product between the residual and a certain basis function can be easily updated from one iteration to the other by subtracting the weighted scalar product between the actual basis function and the selected one, further weighted by the estimated expansion coefficient. Since the update only incorporates the weighted scalar product between two basis functions and is independent of the actual residual, it can be carried out very fast by calculating the different weighted scalar products of all basis functions in advance. This novel formulation of the SE algorithm has two advantages. First of all, the residual now does not have to be calculated explicitly in every iteration step anymore, rather the weighted scalar products between the residual and the basis functions are updated. But more important is the fact, that the most complex calculations can be carried out in advance and can be tabulated. Namely, these are the weighted scalar products between every two basis functions and one over the square root of the weighted scalar product between a basis function and itself. This leads to the matrix containing the weighted scalar products between every two basis functions and the vector holding the inverse of the square root of the weighted scalar products. Obviously, C (k,l) and D k are independent of the input signal and the residual. Hence, they only have to be calculated once and do not have to be calculated for every extrapolation process. Thus, they can either be computed at the beginning of the extrapolation process or read from storage. During the whole computation, they are kept in memory. Furthermore, as C (k,l) is of size |D| 2 and D k has length |D|, the memory consumption is manageable without any problems.
Here, the expression |D| expresses the cardinality of dictionary D that contains all possible basis functions. Regarding the two equations above, one can see that they both depend on the weighting function. If different weighting functions are used, C (k,l) and D k have to be adapted according to the weighting function. But, regarding typical signal extrapolation tasks as e. g. error concealment or prediction, the same patterns or only a small number of different patterns occur. Therefore, this also is no problem, as C (k,l) and D k can be calculated for the different patterns in advance as well. During the generation of C (k,l) the complex symmetry of this matrix can be exploited and only |D| 2 +|D| 2 weighted scalar products have to be actually calculated.
Using these pre-calculated and tabulated values, the basis function selection from (13) can be rewritten as In addition to that, the estimation of the expansion coefficient from (14) can also be expressed very compactly bŷ Furthermore, the update of the weighted scalar products between the residual and all possible basis functions from (15) can also be formulated very efficiently by Regarding the three equations above, one can recognize that instead of evaluating the weighted scalar products in every iteration step explicitly, only one value has to be read from memory for every calculation. Thus, the very high computational load from the original spatial domain SE is traded against an increased memory consumption. But as the memory consumption still is easily manageable this is a quite reasonable exchange. The novel FaSE implementation has the further advantage that no divisions are required. With that, this implementation is suited more for fixed point or integer implementations than the original SE. In such a scenario, D k could be calculated with high accuracy and then quantized to integer or fixed point values. Thus, no expensive divisions have to be carried out within the iteration loop and the effect of error propagation due to a restricted word length can be reduced. Depending on the architecture on which the extrapolation is carried out and the regarded application, it may be preferable to store 1 and to calculate |·| 2 instead of |·|. By using this modification, the complexity could be reduced a little bit more, if the platform on which the extrapolation runs directly supports the relevant operations. Nevertheless, at this point a sufficiently high computational accuracy is assumed for the above outlined calculations. For a hardware implementation or an implementation on a digital signal processor, finite-word length effects have to be considered and further research is necessary for determining the required bit-depth of the tables and the impact of fixed-point arithmetic. In order to give a final overview of FaSE, Alg. 2 and 3 show the pseudo code for generating the tabulated values and for the actual model generation. The table generation is separated from the model generation for emphasizing again that the generation of the tables only has to be carried out once. Regarding the operations that have to be carried out within the iteration loop, one can recognize that only very simple operations have to be performed which can furthermore be processed very fast. The only computational expensive operation is the initial calculation of R (0) k , but compared to the original SE, this complex step only has to be carried out only once instead of in every iteration.

IV. COMPLEXITY EVALUATION
Regarding the two previous sections, one can recognize that FaSE is able to outperform the original SE since the computational complexity within the iteration loop is reduced and since as many calculations as possible are carried out in advance and are tabulated. To quantify the complexity of SE and FaSE, the number of operations is regarded that is necessary for generating the model by each of the algorithms. In Tab. I for SE, FaSE and the table generation for FaSE, the number of operations is listed, depending on the extent M, N of extrapolation area L, dictionary size |D| and the number of iterations I to be carried out. Here, the operations are separated into three groups, the number of multiplications (MUL), the number of additions (ADD) and the number of  Table Generation MUL  This plot only shows the overall number of operations, i. e. the sum of MUL, ADD and OTHER, in order to give a rough impression of the overall complexity and compare the different algorithms. The fact that complex operations like divisions require more processing time than a simple multiplication is omitted for this plot. It can be easily recognized that the number of operations that is necessary for generating the model by SE is several decades larger than for FaSE. The plot further shows the number of operations that is required for generating the tabulated C (k,l) and D k , indicated by a rhomb. In addition to that, the number of operations for the table generation is displayed as dashed line over the complete iteration range. It has to be noted that the table generation is independent of the iterations and this illustration is only chosen for comparing the complexity of the table generation with SE. Therewith, it can be recognized that the table generation requires roughly as many operations as 1000 iterations of SE would require. Since the number of iterations for generating the model can easily reach values larger than 200 as has been shown in [16], the expense for generating the tables amortize even after a small number of blocks. Taking into account that in typical scenarios a large number of blocks is extrapolated with the same weighting function, the complexity for generating the tables very soon becomes negligible.

V. RESULTS FOR ARBITRARY BASIS FUNCTIONS
In order to support the complexity evaluation from the previous section, the processing time for SE and FaSE is further examined. The first results presented are for arbitrary two-dimensional basis functions. In this case, only the original SE and the novel FaSE can be used, as transform domain algorithms like FSE cannot deal with arbitrary basis functions. For the runtime evaluation, the model generation has been implemented in C, compiled with gcc 4.3.2 and optimizations -O3, and the simulations have been carried out on an Intel Core2 Quad@2.83 GHz, equipped with 8 GB RAM. In order to reduce the influence from the operating system, multiple runs of the simulations have been conducted and the computation has been limited to the usage of only one single core. For the simulations, a block of size 16×16 samples is extrapolated from its surrounding samples. Furthermore, different sizes of extrapolation area L between 48 × 48 and 96 × 96 samples are regarded. Fig. 4 shows the extrapolation time per block for different numbers of candidate basis functions and for 250 iterations performed for model generation. For this plot, the cardinality of the dictionary is selected to be of the same size as the extrapolation area. Thus, D varies between |D| = 2304 basis functions of size 48 × 48 and |D| = 9216 basis functions of size 96 × 96. Comparing the two curves of SE and FaSE one can easily recognize that FaSE is about 250 times faster than the original SE, independently of the problem size. This is due to the fact, that for FaSE the computationally expensive weighted scalar products only have to be evaluated once, namely prior to the first iteration. In the later iterations, the expensive steps can be avoided by making use of the tabulated values and avoiding the update of the residual. For these evaluations, the calculation time for generating the tabulated values is not considered, as they only have to be computed once and can be stored. The very high computational cost of the weighted scalar products can also  Fig. 3 cannot be directly translated into the processing time shown in Fig. 5 since not all regarded operations consume the same processing time and since the analytical evaluation cannot account for optimizations introduced by the compiler. Fig. 6 shows the processing time for generating the tables for different dictionary sizes |D| and for different sizes of extrapolation area L. Comparing these results with the ones shown in Fig. 5 one can recognize that for an extrapolation area of size 64 × 64, a dictionary size of |D| = 4096 and 250 iterations, the table generation only takes as long as SE would roughly need for extrapolating 6 blocks. This corresponds well to the theoretical results presented in the analytical evaluation. The discrepancy follows from the fact that different operations consume unequal amounts of processing time while in the analytical evaluation only the absolute number of operations has been counted.
Since the proposed novel spatial domain solution does not affect the model generation principle of SE, still a very high extrapolation quality can be achieved. Due to the acceleration of the algorithm, now very good extrapolation results can be achieved at a manageable complexity for arbitrary basis functions. To prove this, Table II shows the average extrapolation quality in terms of PSNR and the processing time for extrapolating 126 blocks of size 16 × 16 samples in every image from the Kodak image database. For comparison, the Total Variation Image Reconstruction (TV) algorithm from [17], the patch-based algorithm from [18] that uses Stochastic Factor Graphs (SFG) and the simple but very fast Spatial-Domain Interpolation (SDI) from [19] are regarded. The comparison has been carried out in MATLAB R2008b, and again only one core of the above mentioned computer has been used. Apparently, FaSE provides the highest extrapolation quality among the considered algorithms only with SFG coming close. But at the same time, it is second fastest algorithm.

VI. MODIFICATIONS FOR TRANSFORM-BASED BASIS FUNCTION SETS
As aforementioned, for FaSE the weighted scalar products only have to be evaluated prior to the first iteration. In the case that the regarded basis function set contains a subset of basis functions that emanate from a discrete transform as e. g. functions of the DCT or the DFT, the explicit evaluation of the weighted scalar products can be simplified by replacing the summation over the product between the weighted signal and the basis function by the corresponding transform coefficient of the weighted signal which can be achieved through a fast transform. To give an example, the idea that the basis function set contains some basis functions which emanate from the DFT will be extended. In this case, a basis function is defined by with vertical frequency µ k and horizontal frequency η k . Then, the summation from (12) at frequency µ k , η k . Thus, the weighted scalar products for many basis functions can be efficiently evaluated simultaneously by making use of fast transforms like the Fast Fourier Transform [20] or respectively a fast transform that is appropriate to the regarded basis functions. It has to be noted that the utilization of fast transforms is only reasonable if a large number of transform domain coefficients has to be calculated at the same time. The fast transforms only speed up the parallel calculation of many coefficients. The calculation of just a single coefficient would take as long as the explicit evaluation of the weighted scalar product. The above described property could also be used for speeding up the table generation in (16). Regarding again the example of a subset of DFT basis functions, the product between a basis function and a conjugate complex second one is equal to a basis function where the horizontal and vertical frequency results from the difference of the original frequencies: Hence, (16) can also be expressed by the corresponding coefficients from the DFT. For other transform-based basis function sets similar properties exist. In addition to the results for arbitrary basis functions shown in Section V, the performance of FaSE and SE is compared to a transform domain algorithm. For this, FSE is regarded that utilizes Fourier functions for extrapolation. Here the circumstance has to be considered, that, as described in [8], FSE does not generate a complex-valued model. FSE selects in every iteration step one basis function and its corresponding conjugate complex one, in such a way that the model always is real-valued. Hence, in most cases two basis functions are selected in an iteration, with the exception of the real-valued constant basis function and the function with the highest possible alternation. Thus, the number of iterations has to be doubled for SE and FaSE for a fair comparison as they select only one basis function per iteration. Fig. 7 shows the processing time per block for the different approaches with |D| = 4096 Fourier basis functions of size 64 × 64. For these simulations, the initial scalar products for FaSE are expressed by the transform coefficients according to (22). Although FaSE needs twice the number of iterations as FSE for generating the model, it is still significantly faster than FSE and furthermore several magnitudes faster than the original spatial domain SE. Taking all the results from the two previous sections into account, the following recommendations can be given. In the case that the Selective Extrapolation is carried out with Fourier basis functions or other basis function sets that are based on a discrete transform, one can decide either to use a transform domain algorithm or the novel FaSE. If always the same extrapolation scenario is considered, the tables only have to calculated once and the time gain of FaSE prevails, otherwise the transform domain algorithm is the better choice as no calculation of the tables is necessary. If the extrapolation process is carried out with basis functions for which no transform domain implementation is possible, FaSE should be preferred over the original SE. FaSE is able to efficiently trade computational complexity versus memory consumption as the expensive operations only have to be carried out once. Thus, the actual iterations for generating the model become very simple and very fast.

VII. CONCLUSION
Within the scope of this contribution, we presented Fast Selective Extrapolation for image and video signal extrapolation. For this, Selective Extrapolation, a powerful signal extrapolation algorithm has been reviewed and its most complex parts have been identified. The novel algorithm behaves mathematically identical to the original algorithm but is able to outspeed the original algorithm by several decades by effectively trading memory consumption versus processing time. Furthermore, the novel algorithm is able to outperform existent fast transform domain extrapolation algorithms which are even limited to certain basis function sets. With that, it opens the door for further research on carrying out the extrapolation with different basis function sets. Up to now, the extrapolation only has been computationally manageable for special basis function sets that are based on discrete transforms. But by using Fast Selective Extrapolation, the extrapolation can be carried out for arbitrary basis functions which may even be only numerically defined. This ability allows for further research on extrapolation with signal adapted basis functions, obtained through the Karhunen-Loève Transform [21], [22], which has not been computationally feasible up to now.
Although the algorithm has been introduced only for twodimensional data sets, it can be extended straightforwardly to three dimensions by making use of the ideas from [23] and four dimensions by using [24]. There, a three-dimensional or respectively a four-dimensional model is generated in the same way as described above for two dimensions.