Orthogonality is superiority in piecewise-polynomial signal segmentation and denoising

Segmentation and denoising of signals often rely on the polynomial model which assumes that every segment is a polynomial of a certain degree and that the segments are modeled independently of each other. Segment borders (breakpoints) correspond to positions in the signal where the model changes its polynomial representation. Several signal denoising methods successfully combine the polynomial assumption with sparsity. In this work, we follow on this and show that using orthogonal polynomials instead of other systems in the model is beneficial when segmenting signals corrupted by noise. The switch to orthogonal bases brings better resolving of the breakpoints, removes the need for including additional parameters and their tuning, and brings numerical stability. Last but not the least, it comes for free!


Introduction
Polynomials are an essential instrument in signal processing.They are indispensable in theory, as in the analysis of signals and systems [1] or in signal interpolation and approximation [2,3], but they have been used also in specialized application areas such as blind source separation [4], channel modeling and equalization [5], to name a few.Orthonormal polynomials often play a special role [2,6].
Segmentation of signals is one of the important applications in digital signal processing, while the most prominent sub-area is the segmentation of images.A plethora of methods exists which try to determine individual nonoverlapping parts of the signal.The neighboring segments should be identified such that they contrast in their "character." For digital signal processing, such a vague word has to be mathematically expressed in terms of signal features, which then differ from segment to segment.As examples, the segments could differ in their level, statistics, frequency content, texture properties, etc.In this article, we rely on the assumption of smoothness of individual segments, which means that segments can be distinguished by their respective underlying polynomial description.The point in signal where the character changes is called a breakpoint, i.e., a breakpoint indicates the location of segment border.The features involved in the segmentation are chosen or designed a priori (i.e., model-based class), while the other class of methods aims at learning discriminative features from the training data [7,8].
Within the first of the two classes, i.e., within approaches based on modeling, one can distinguish explicit and implicit types of models.In the "explicit" type, the signal is modeled such that it is a composition of sub-signals which often can be expressed analytically [9][10][11][12][13][14][15][16].In the "implicit" type of models, the signal is characterized by features that are derived from the signal by using an operator [17][18][19][20][21].The described differences are in an analogy to the "synthesis" and "analysis" approaches, respectively, recognized in the sparse signal processing literature [22,23].Although the two types of models are different in their nature, connections can be found, for example, the recent article [24] showing the relationship between splines and generalized total variation regularization or [21] discussing the relationship between "trend filtering" and spline-based smoothers.
Note that signal denoising and segmentation often rely on similar or even identical models.Indeed, when borders of segments are found, denoising can be easily done as postprocessing.Conversely, the byproducts of denoising can be used to detect segment borders.This paradigm is also true for our model, which can provide segmentation and signal denoising/approximation at the same time.As examples of other works that aim at denoising but can be used for segmentation as well, we cite [19,20,25,26].
The method described in this article belongs to the "explicit" type of models.We work with noisy onedimensional signals, and our underlying model assumes that individual segments can be well approximated by polynomials.The number of segments is supposed to be much lower than the number of signal samples-this natural assumption at the same time justifies the use of sparsity measures involved in segment identification.The model and algorithm presented for 1D in this article can be easily generalized to a higher dimension.For example, images are commonly modeled as piecewise smooth 2D-functions [27][28][29][30][31].
In [9,13,15], the authors build explicit signal segmentation/denoising models based on the standard polynomial basis 1, t, t 2 , . . ., t K .In our previous articles, e.g., [11,32], we used this basis as well.This article shows that modeling with orthonormal bases instead of the standard basis (which is clearly non-orthogonal) brings significant improvement in detection of the signal breakpoints and thus in the eventual denoising performance.It is worth noting that this improvement comes virtually for free, since the cost of generating an orthonormal basis is negligible compared to the cost of the algorithm which finds, in the iterative fashion, the numerical solution with such a basis fixed.
Worth to note that the method closest to ours is the one from [9], which was actually the initial inspiration of our work in the discussed direction.Similar to us, the authors of [9] combine sparsity, overcompleteness, and a polynomial basis; however, they approximate the solution to the model by greedy algorithms, while we rely on convex relaxation techniques.The other, above-cited methods do not exploit overcompleteness.Out of those, an interesting study [21] is similar to our model in that it allows piecewise polynomials of arbitrary (fixed) degree; however, it can be shown that their model does not allow jumps in signals, while our model does.This makes a significant difference, as will be shown later in the article.
The article is structured as follows: Section 2 introduces the mathematical model of segmentation/denoising, and it suggests the eventual optimization problem.The numerical solution to this problem by the proximal algorithm is described in Section 3. Finally, Sections 4 and 5 provide the description of experiments and analyze the results.

Problem formulation
In continuous time, a polynomial signal of degree K can be written as a linear combination of basis polynomials: where x k , k = 0, . . ., K, are the expansion coefficients in such a basis.If the standard basis is used, i.e., the respective scalars x k correspond to the intercept, slope, etc. Assume a discrete-time setting and limit the time instants to n = 1, . . ., N. Elements of a polynomial signal are then represented as In this formula, the signal is constructed by a linear combination of sampled polynomials.
Assuming the polynomials p k , k = 0, . . ., K, are fixed, every signal given by ( 3) is determined uniquely by the set of coefficients {x k }.In contrast to this, we introduce a time index also to these coefficients, allowing them to change in time: This may seem meaningless at this moment; however, such an excess of parameters will play a principal role shortly.It will be convenient to write this relation in a more compact form, for which we need to introduce the notation for k = 0, . . ., K.After this, we can write or even more shortly where the length of the vector x is (K + 1) times N and P is a fat matrix of size N × (K + 1)N.
Such a description of signal of dimension N is obviously overcomplete-there are (K + 1)N parameters to characterize it.Nevertheless, assume now that y is a piecewise polynomial and that it consists of S independent segments.Each segment s ∈ {1, . . ., S} is then described by K + 1 polynomials.In our notation, this can be achieved by letting vectors x k be constant within time indexes belonging to particular segments.(The polynomials in P are fixed).Figure 1 shows an illustration.The reason for not using a single number describing each segment is that the positions of the segment breakpoints are unknown and will be subject to search.
Following the above argumentation, if x k are piecewise constant, the finite difference operator ∇ applied to vectors x k produces sparse vectors.Operator ∇ computes simple differences of each pair of adjacent elements in the vector, i.e., ∇ : R . Actually, not only ∇ applied to each parameterization vector produces S − 1 nonzeros at maximum, but also the nonzero components of each ∇x k occupy the same positions across k = 0, . . ., K.
Together with the assumption that the observed signal is corrupted by an i.i.d.Gaussian noise, it motivates us to formulate the denoising/segmentation problem as finding x = arg min x reshape(Lx) 21 s.t.y − PWx 2 ≤ δ. (8) In this optimization program, W is the square diagonal matrix of size (K + 1)N that enables us to adjust the lengths of vectors placed in P and operator L represents the stacked differences such that The operator reshape() takes the stacked vector Lx to the form of a matrix with disjoint columns: Fig. 1 Illustration of the signal parameterization.The top plot shows four segments of a piecewise-polynomial signal (both the samples and the underlying continuous-time model); each segment is of the second order.The middle plot are the three basis polynomials, i.e., the diagonals of matrices P k (in this particular case, the respective sampled vectors are mutually orthonormal, actually).The parameterization coefficients shown in the bottom plot are vectors x 0 , x 1 , and x 2 .Notice that infinitely many other combinations of values in x 0 , x 1 , and x 2 generate the same signal, but we show the piecewise-constant case which is of the greatest interest for our study It is necessary to organize the vectors in such a way for the purpose of the 21 -norm which is explained below.The first term of ( 8) is the penalty.Piecewise-constant vectors x k suggest that these vectors are sparse under the difference operation ∇.As an acknowledged substitute of the true sparsity measure, the 1 -norm is widely used [33,34].Since the vectors should be jointly sparse, we utilize the 21 -norm [35] that acts on a matrix Z with p rows and is formally defined by i.e., the 2 -norm is applied to the particular rows of Z and the resulting vector is measured by the 1 -norm.Such a penalty promotes sparsity across matrix' rows, and therefore, the 21 -norm enforces the nonzero components in the matrix to lie on the same rows.
The second term in (8) is the data fidelity term.The Euclidean norm reflects the fact that gaussianity of the noise is assumed.The level of the error is required to fall below δ.Finally, vector x contains the achieved optimizers.
When standard polynomial basis {1, t, . . ., t K } is used for the definition of P, the high-order components blow up so rapidly that it brings two problems: First, the difference vectors follow the scale of the respective polynomials.In the absence of normalization, i.e., when W is identity, this is not fair with respect to the 21 -norm, since no polynomial should be preferred.In this regard, the polynomials should be "normalized" such that W contains the reciprocal of 2 -norms of the respective polynomials.It is worth noting that in our former work, in particular in [12], we basically used model ( 8), but with the difference that there has been no weighting matrix and we used (9).Finding suitable values of τ k has been a demanding trial-and-error process.In this perspective, simple substitution Wx → x brings us in fact to the model from [12], and we see that τ k should correspond to the norms of the respective polynomial.However, it still holds true that manual adjustments of these parameters can increase the success rate of the breakpoint detection, as they depend, unfortunately, on the signal itself (recall that a part of a signal can correspond to locally high parameterization values while other part does not).This is however out of scope of this article.
Second, there is the numerics issue, meaning that the algorithms (see below) used to find the solution x failed due to the too wide range of the processed values.However, for short signals (like N ≤ 500), this problem was solved by taking the time instants not as integers, but linearly spaced values from 1/N to 1, as the authors of [9] did.
This article shows that the simple idea of shifting to orthonormal polynomials solves the two problems with no extra requirements.At the same time, orthonormal polynomials result in better detection of the breakpoints.
One may also think of an alternative, unconstrained formulation of the problem: This formulation is equivalent to (8) in the sense that to a given δ, there exists λ such that the optima are identical.However, the constrained form is preferable since changing the weight matrix W does not induce any change in δ, in opposite to a possible shift in λ in (12).

Algorithms
We utilize the so-called proximal splitting methodology for solving optimization problem (8).Proximal algorithms (PA) are algorithms suitable for finding minimum of a sum of convex functions.Proximal algorithms perform iterations involving simple computational tasks such as evaluation of gradient or/and proximal operators related to the individual functions.
It is proven that under mild conditions, PA provide convergence.The speed of convergence is influenced by properties of the functions involved and by the parameters used in the algorithms.

Condat algorithm solving (8)
The generic Condat algorithm (CA) [36,37] represents one possibility for solving problems of type over x, where functions h 1 and h 2 are convex and L 1 and L 2 are linear operators.In our paper [12], we have compared two variants of CA; in the current work, we utilize the variant that is easier to implement-it does not require a nested iterative projector.
To connect ( 13) with (8), we assign Algorithm solving (8) is described in Algorithm 1. Therein, two operators are involved: Operator soft row τ (Z) takes matrix Z and performs the row-wise group soft thresholding with threshold τ on it, i.e., it maps each element of Z such that Projector proj B 2 (y,δ) (z) finds the closest point to z in the All particular operations in Algorithm 1 are quite simple, and they are obtained in O(N) time.It is worth emphasizing, however, that the number of iterations necessary to achieve convergence grows with the number of time samples N. A better notion of the computational cost is provided by Table 1.It shows that both the cost per iteration and the number of necessary iterations grow linearly, resulting in an overall O N 2 complexity of the algorithm.The cost of postprocessing (described in Section 3.2) is negligible compared to such a quantity of operations.
Algorithm 1: The Condat Algorithm solving (8) Convergence of the algorithm is guaranteed when it holds ξσ L 1 L 1 + L 2 L 2 ≤ 1.For the use of the inequality , it is necessary to have the upper bound on the operator norms.The upper bound of L 1 is: (16) and thus The operator norm of PW satisfies PW 2 = PWW P , and thus, it suffices to find the maximum eigenvalue of PW 2 P .Since PW has the multi-diagonal structure (cf.relation ( 7)), PW 2 P is diagonal, and in effect, it is enough to find the maximum on its diagonal.Altogether, the convergence is guaranteed when ξσ max diag PW 2 P + 4(K + 1) ≤ 1.

Signal segmentation/denoising
Vectors x as the optimizers of problem (8) allow a means to estimate the underlying signal; it can be done simply by ŷ = PWx.However, this way we do not obtain the segment ranges.Second disadvantage of this approach is that the jumps are typically underestimated in size, which comes from the bias inherent to the 1 norm [38][39][40] as the part of the optimization problem.
The nonzero values in ∇ x0 , . . ., ∇ xK indicate segment borders.In practice, it is almost impossible to achieve truly piecewise-constant optimizers [38] as in the model case in Fig. 1, and vectors ∇ xk are crowded by small elements, besides larger values indicating possible segment borders.We apply a two-part procedure to obtain the segmented and denoised signal: the breakpoints are detected first, and then, each detected segment is denoised individually.
Recall that the 21 -norm cost promotes significant values in vectors ∇ xk situated at the same positions.As a base for breakpoint detection, we gather ∇ xk s to a single vector using the weighted 2 -norm according to the formula where α k = 1/ max( ∇ xk ) are positive factors serving to normalize the range of values in the parameterization vectors differences.The computations in (17) are elementwise.
The comparisons presented in this article will be concerned only with the detection of breakpoints, and thus, in our further analysis, we process no more than the vector d.However, in case we would like to recover the denoised signal, we would proceed as in our former works [11,12], where first a moving median filter is applied to d and subtracted from d, allowing to keep the significant values and at the same time to push small ones toward zero.Put simply, values larger than a selected threshold then indicate the breakpoints positions.The second step is denoising itself, which is done by least squares on each segment separately, using (any) polynomial basis of degree K.

Experiment-does orthogonality help in signals with jumps?
The experiment has been designed to find out whether substituting non-orthogonal bases with the orthogonal ones reflects in emphasizing the positions of breakpoints when exploring the vector d.

Signals
As test signals, five piecewise quadratic signals (K = 2) of length N = 300 were randomly generated.They are generated such that they contain polynomial segments similar to the 1D test signals presented in [9].All signals consist of six segments of random lengths.There are noticeable jumps in value between neighboring segments, which is the difference to the test signals in [9].Five SNR values were prescribed for the experiment: 15, 20, 25, 30, and 35 dB.These numbers entered into the calculation of the respective noise standard deviation σ such that It is clear that the resulting σ is influenced by energy of the clean signal as well.For each signal and each SNR, 100 realizations of noise were generated, making a set of 2500 noisy signals in total.

Bases
Since the test signals are piecewise quadratic, the bases subject to testing all consist of three linearly independent discrete-time polynomials.For the sake of this section, the three basis vectors can be viewed as the columns of the N × 3 matrix.The connection to problem (8) is that these vectors form the diagonals of the system matrix PW.In the following, the N × 3 basis matrices will be distinguished by the letter indicating the means of their generation:

Non-orthogonal bases (B)
Most of the papers that explicitly model the polynomials utilize directly the standard basis (2), which is clearly not orthogonal either in continuous nor discrete setting.The norms of such polynomials differ significantly as well.We generated 50 B bases using formula B = SD 1 AD 2 .
Here, the elements of the standard basis-the columns of S-are first normalized using a diagonal matrix D 1 , then mixed using a random Gaussian matrix A and finally dilated to different lengths using D 2 having uniformly random entries at the diagonal.This way we acquired 50 bases, which are non-orthogonal and non-normalized at the same time.

Normalized bases (N)
Another set of 50 bases, the N bases, were obtained by simply normalizing the length of the B basis polynomials, N = BD 3 .We want to find out whether this simple step helps in detecting the breakpoints.

Orthogonal bases (O)
Orthogonal bases were obtained by orthogonalization of N bases.The process was as follows: A matrix N was decomposed by the SVD, i.e., Matrix U consists of three orthonormal columns of length N. The new orthonormal system is simply the matrix O = U.
One could doubt whether the new basis O spans the same space as N does.Since N has full rank, contains three positive values on its diagonal.Because V is also orthogonal, the answer to the above question is positive.A second question could be whether the new system is still consistent with any polynomial basis on R. The answer is yes again, since both matrices N and U can be substituted by their continuous-time counterparts, thus generating the identical polynomial.

Random orthogonal bases (R)
The last class consists of random orthogonal polynomial bases.The R bases were generated as follows: First, the SVD has been applied to the matrix N as in (20), now symbolized using the subscripts, N = U N N V N .Next, a random matrix A of size 3 × 3 was generated, each element of whose independently follows the Gauss distribution.This matrix is then decomposed to A = U A A V A .The new basis R is obtained as R = U N U A .Note that since both matrices on the right hand side are orthonormal, the columns of R form an orthonormal basis spanning the desired space.Elements of U A determine the linear combinations used in forming R.
We generated 50 such random bases, meaning that in total 200 bases (B, N, O, R) were ready for the experiment.

A note on other polynomial bases
One could think of using predefined polynomial bases as Chebychev or Legendre bases, for example.Note that such bases are defined in continuous time and are therefore orthogonal with respect to an integral scalar product [6].Sampling such systems at equidistant time-points does not lead to orthogonal bases; actually when preparing this article, we found out that their orthogonalization via the SVD (as done above) significantly changes the course of the basis vectors.As far as we know, there are no predefined discrete-time orthogonal polynomial systems.In combination with the observation that neither the sampled nor the orthogonalized systems perform better than other non-ortho-or orthosystems, respectively, we did not include any such system in our experiments.

Experiment
The algorithm of breakpoint detection that we utilized in the experiments has been described in Section 3.2.We used formula (17) for computing the input vector.The Condat algorithm run for 2000 iterations which was sufficient in all cases.Three items were subject to vary within the experiments, configuring the problem (8): • The input signal y, • parameter δ controlling the modeling error, • the basis of polynomials PW (induced from the columns of matrices B, N, O, or R).
Each signal entered into calculation with each of the bases, making 2500 × 200 experiments in total in signal breakpoints detection.

Setting parameter δ
For each of the 2500 noisy signals, the parameter δ was calculated.Since both the noisy and clean signals are known in our simulation, δ should be close to the actual 2 error caused by the noise.We found out that particular δ leading to best breakpoint detection varies around the 2 error, however.For the purpose of our comparison, we fixed a universal value of δ determined according to δ = y noisy − y clean 2 • 1.05 (21) meaning that we allowed the model error to deviate from the ground truth by 5% at maximum. Figure 3 shows the distribution of values of δ.For different signals, δ is concentrated around a different quantity.This effect is due to the noise generation, wherein the resulting SNR (18) was set and fixed at first, while δ is linked to the noise deviation σ that depends on the signal, cf.(19).Values of δ vary within the signal (which is given by particular realizations of the noise) and between the signals (which is due to fixing the SNR rather than the noise power) Note that in practice, however, δ would have to take into account not only the (even unknown) noise level, but also the modeling error, since real signals do not follow the polynomial model exactly.A good choice of δ unfortunately requires a trial process.

Evaluation
The focus of the article is to study whether orthogonal polynomials lead to better breakpoint detection than the non-orthogonal polynomials.To evaluate this, several values that indicate the quality of breakpoint detection process were computed.These markers are based on vector d.
But first, for each single signal in test, define two disjoint sets of indexes, chosen out of {1, . . ., N}: Highest values (HV): Recall that each of the clean test signals contains five breakpoints.Note also that d defined by ( 17) is nonnegative.The HV group thus gathers the indexes of the five values in d that are likely to represent breakpoints.These five indexes are selected iteratively: At first, the largest value is chosen to belong to HV.Then, since it can happen that multiple high values sit next to each other, the two neighboring indexes to the left and two to the right are omitted from further consideration.The remaining four steps select the rest of the HV members in the same manner.

Other values (OV):
The second group consists of the remaining indexes in d.The indexes excluded during the HV selection are not considered in OV.This way, the number of elements in OV is 274 at least and 289 at most, depending on the particular course of the HV selection process.
For each signal, the ratio of the averages of the values belonging to HV versus the average of the values in OV is computed; we denote this ratio AAR.We also computed the MMR indicator, which we define as the ratio of the minimum of values of HV to the maximum of the OV values.Both these indicators, and especially the MMR, should be as high as possible to enable safe recognition of the breakpoints.
The next parameter in evaluation was the number of correctly detected breakpoints (NoB).We are able to introduce NoB in our report since the true positions of the breakpoints are known.The breakpoint positions are not always found exactly, especially due to the influence of the noise (will be discussed later), and therefore, we consider the breakpoint as detected correctly if the indicated position lies within an interval of ± two indexes from the ground truth.
In addition, classical mean square error (MSE) has been involved to complete the analysis.The MSE measures the average distance of the denoised signal from the noiseless original and is defined as As y denoised , two variants were considered: (a) the direct signal estimate computed as ŷ = PWx, where x is the solution to ( 8) and (b) the estimate where the ordinary least squares have been used separately on each of the detected segments with a polynomial of degree two.
Note that approach (b) is an instance of the so-called debiasing methods, which is sometimes done in regularized regression, based on the a priori knowledge that the regularizer biases the estimate.As an example, debiasing is commonly done in LASSO estimation [39,41], where the biased solution is used only to fix the sparse vector support and least squares are then used to make a better fit on the reduced subset of regressors, see also related works [12,33,42].
The results from approach a will be abbreviated "CA" in the following, meaning "Condat Algorithm", and the results from the least squares adjustment by "LS."

Results and discussion
Using orthogonal bases reflects in significantly better results than working with non-orthogonal bases.The improvement can be observed in all parameters in consideration.The AAR, MMR, and NoB indicators increase with orthogonal bases and the MSE decreases.
An example comparison of the three types of bases in terms of the AAR is depicted in Fig. 4. A larger AAR  that it is not possible to fairly fuse results for different signals, since the signal shape and size of the jumps influence the values of the considered parameters.Another reason is that the energy of the noise differs across signals, even when the SNR is fixed (see discussion of this effect above).
However, looking at the complete list of figures which are available at the accompanying webpage 1 , the same trend is observed in all of the figures: the orthogonal(ized) bases perform better than the non-orthogonal bases.At the same time, there is no clear suggestion whether R bases are better than O bases; while Fig. 5 shows superiority of R bases, other plots at the website contain various results.The NoB is naturally the ultimate criterion for measuring the quality of segmentation.Histograms of the NoB parameter for one particular signal are shown in Fig. 6.From this figure, as well as from the supplementary material at the webpage, we can conclude that B bases are beaten by N bases.Most importantly, the two orthogonal classes of bases (O, R) perform better than the N bases in a majority of cases (although one finds situation when the systems perform on par).Looking closer to obtain a final statement whether O bases or R bases are preferable, we can see that R bases usually provide better detection of breakpoints; however, the difference is very small.This might be the result of the test database being too small.Does the distribution of NoB in Fig. 6 also suggest that some of the bases may perform better than others within the same class, when the signal and the SNR are fixed?It is not fair to make such a conclusion based on the histograms; histograms cannot reveal whether the effect on NoB is due to the particular realization of noise or it is due to differences between the bases, regardless of noise.Let us take some effort to find the answer to the question.Figures 7 and 8 show selected maps of NoB.It is clearly visible that for mild noise levels, there are bases that perform better than the others and that also a few bases perform significantly worse-in a uniform manner.In the low SNR regime, on contrary, the horizontal structures in the images prevail, meaning that specific noise shape takes over.This effect can be explained easily: the greater is the amplitude of the noise, the greater is the probability that an "undesirable" noise sample in the nearness of the breakpoint spoils its correct identification.
In practice, nevertheless, the signal to be denoised/segmented is given including the noise.In light of the presented NoB analysis (Figs. 7 and 8 in particular), it means that (especially) when SNR is high, it may be beneficial to run the optimization task (8) multiple times, i.e., with different bases, fusing the particular results for a final decision.
The last measure of performance is the MSE.First, Fig. 9 presents an example of denoising using the direct and least squares approach (those are described in Section 4.4).
Figures 10 and 11 show successful and distracting results in terms of MSE, respectively.While with signals "1" to "4, " orthobases improve MSE, it is not the case of signal "5." It is interesting to note that signal "5" does not exhibit great performance in terms of the other indicators (AAR, MMR, NoB) neither.

Software
The experiment has been done in MATLAB (2017a) on a PC with Intel i7 processor and with 16 GB of RAM.For some proximal algorithm components we benefited from using the flexible UnLocBox toolbox [43].The code related to the experiments is available via the mentioned webpage.
It is computationally cheap to generate an orthogonal polynomial system, compared to the actual cost of iterations in the numerical algorithm.For N = 300, convergence has been achieved after performing 2000 iterations of Algorithm 1.While one iteration takes about 0.5 ms, generation of one orthonormal basis (dominated by the SVD) takes up to 1 ms.

Experiment-the effect of jumps
Another experiment has been performed focusing on the sensitivity of the breakpoint detection in relation to the size of the jumps in signal.For this study, we utilized a single signal, depicted in blue in Fig. 12; the signal length was again of length N = 300.It contains five segments of similar length, and quadratic polynomials are used, similar to test signals in [9].The signal is designed such that there are no jumps on the segment borders.Nine new signals were generated from this signal in a way that segments two and four were raised up by a constant value; nine constants uniformly ranging from 5 to 45 were applied.Each signal was corrupted by gaussian noise 100 times independently, with 10 different variances.This way, 10 000 signals were available in this study in total.
As the polynomial systems, three O bases and three B bases were randomly chosen out of the set of 50 of the same kind from the experiment above.We ran the optimization problem (8) on the signals with δ set according to (21).Each solution was then transformed to the vector d (see formula (17)).Four largest elements (since there are five true segments) in d were selected and their positions were considered the estimated breakpoints.Evaluation of correctly detected breakpoints (NoB) was performed as in the above experiment, with the difference that ± 4 samples from the true position were accepted as a successful detection.
Figure 13 shows the average results.It is clear that the presence of even small jumps prioritize the use of O bases, while, interestingly, in case of little or no jumps, B bases perform slightly better (note, however, that both We the results such that although our model includes cases when the signal does not contain jumps, such cases could benefit from extending the model by the additional condition that the segments have to tie up at the breakpoints.For small jumps, our model does not resolve the breakpoints correctly, independent of the choice of the basis.

Conclusion
The experiment confirmed that using orthonormal bases is highly preferable over the non-orthogonal bases when solving the piecewise-polynomial signal segmentation/denoising problem.It has been shown that the number of correctly detected breakpoints is increased when orthobases are used.Also other performance indicators are improved on average with orthobases, and the plots show that the improvement is the more pronounced Fig. 12 Test signal with no jumps.In blue the clean signal, in red its particular noisy observation (SNR 14.2 dB, i.e., σ = 13.45), in green the recovery using the proposed method Fig. 13 The effect of jump size in signal.The plots show average NoB scores for B bases (left) and O bases (right).Color lines correspond to different σ (i.e., to the noise level), and the horizontal axis represents the size of jumps of both the second and fourth segments in signal from Fig. 12 the higher is the noise level.The effect comes almost for free, since it is cheap to generate an orthogonal system, relative to the cost of the numerical algorithm that utilizes the system.In addition, the new approach avoids demanding hands-on setting of "normalization" weights that has been done both by us and by other researchers previously.The user still has to choose δ, the quantity which includes the noise level and the model error.
Our experiment revealed that some orthonormal bases are better than others in a particular situation; our results indicate that it could be beneficial to merge detection results of multiple runs with different bases.Such a fusion process could be an interesting direction of future research.
During the revision process of this article, our paper that generalizes the model (8) to two dimensions has been published, see [44].It shows that it is possible to detect edges in images using this approach; however, it does not aim at comparing different polynomial bases.

Fig. 2
Fig. 2 Example of two noiseless and noisy test signals used in the experiment.Signals of length N = 300 consist of six segments of various length, with a perceptible jump between each two segments.The SNRs used for this illustrations were 25 and 15 dB

Fig. 3
Fig. 3 Distribution of δ parameter across the five groups of test signals.The SNR is 25 dB in this example.The box plots show the maximum and minimum, first quartile and the third quartile forming the edges of the rectangle, and the median value within the box.Values of δ vary within the signal (which is given by particular realizations of the noise) and between the signals (which is due to fixing the SNR rather than the noise power)

Fig. 4 Fig. 5
Fig. 4 Results of the AAR indicator for test signal "1."Five different SNRs in use are indicated by the subscripts.The box plot shows the distribution of the AAR under 100 realizations of random noise.In terms of the AAR distribution, random bases R and the orthonormalized bases O perform better than the other two systems.Normalization of the B bases resulted in a slight decrease of the AAR variance

Fig. 6 of 15 Fig. 7 Fig. 8
Fig.6 Results in terms of the NoB indicator.The respective 3D histograms show the frequency of the number of correctly detected breakpoints when the SNR changes, here for signal "4".For each SNR and a specific basis type, 5000 experiments were performed (50 bases times 100 noise realizations).An expected trend is pronounced that increasing value of SNR lowers the number of correctly detected breakpoints, independently of the choice of the basis.The worst results are obtained using non-orthogonal bases (B)

15 Fig. 9 2 Fig. 10 Fig. 11
Fig. 9 Example of time-domain reconstruction, test signal "1".Left side shows the noiseless and noisy signals, the plot on the right hand presents the direct signal estimate ŷ = PWx (CA), and the respective least squares refit (LS), on top of the noiseless signal.Clearly, LS radically improves the adherence to the data (and thus improves the MSE).The bias of the CA is explained in Section 3.2

Table 1
Time spent per iteration (in seconds) and the total number of iterations until convergence with respect to N, for an orthonormal polynomial base, fixed K = 2