Distributed Gram-Schmidt orthogonalization with simultaneous elements refinement
Ondrej Slučiak^{1}, Hana Straková^{2}, Markus Rupp^{1} and Wilfried Gansterer^{2}
https://doi.org/10.1186/s13634-016-0322-6
© Slučiak et al. 2016
Received: 28 May 2015
Accepted: 10 February 2016
Published: 24 February 2016
Abstract
We present a novel distributed QR factorization algorithm for orthogonalizing a set of vectors in a decentralized wireless sensor network. The algorithm is based on the classical Gram-Schmidt orthogonalization with all projections and inner products reformulated in a recursive manner. In contrast to existing distributed orthogonalization algorithms, all elements of the resulting matrices Q and R are computed simultaneously and refined iteratively after each transmission. Thus, the algorithm allows a trade-off between run time and accuracy. Moreover, the number of transmitted messages is considerably smaller in comparison to state-of-the-art algorithms. We thoroughly study its numerical properties and performance from various aspects. We also investigate the algorithm’s robustness to link failures and provide a comparison with existing distributed QR factorization algorithms in terms of communication cost and memory requirements.
Keywords
Distributed processing, Gram-Schmidt orthogonalization, QR factorization

1 Introduction
Orthogonalizing a set of vectors is a well-known problem in linear algebra. Representing the set of vectors by a matrix \({\mathbf {A}}\in \mathbb {R}^{n\times m}\), with n≥m, several orthogonalization methods are possible. One example is the so-called reduced QR factorization (matrix decomposition), A=Q R, with a matrix \({\mathbf {Q}}\in \mathbb {R}^{n\times m}\) having orthonormal columns, and an upper triangular matrix \({\mathbf {R}}\in \mathbb {R}^{m\times m}\) containing the coefficients of the basis transformation [1]. In the signal processing area, QR factorization is widely used in many applications, e.g., for solving linear least squares problems or for decorrelation [2–4]. In adaptive filtering, a decorrelation method is typically used as a pre-step for increasing the learning rate of the adaptive algorithm [5], ([6], p. 351), ([7], p. 700).
From an algorithmic point of view, there are many methods for computing the QR factorization with different numerical properties. A standard approach is the Gram-Schmidt orthogonalization algorithm, which computes a set of orthonormal vectors spanning the same space as the given set of vectors. Other methods include Householder reflections or Givens rotations, which are not considered in this paper.
Optimization of QR factorization algorithms for a specific target hardware has been addressed in the literature several times (e.g., [8, 9]). Parallel algorithms for computing QR factorization, which are applicable for reliable systems with fixed, regular, and globally known topology, have been investigated extensively (e.g., [10–13]).
Besides parallel algorithms, there are two other potential approaches for computation across a distributed network. In the standard, centralized, approach, the data are collected from all nodes and the computation is performed at a fusion center. Another approach is to consider distributed algorithms for fully decentralized networks without any fusion center, where all nodes have the same functionality and each of them communicates only with its neighbors. Such an approach is typical for sensor-actuator networks or autonomous swarms of robotic networks [14]. Nevertheless, the investigation of distributed QR factorization algorithms designed for loosely coupled distributed systems with independently operating distributed memory nodes and with possibly unreliable communication links has only started recently [3, 15, 16]. In the following, we focus on algorithms for such decentralized networks.
1.1 Motivation
The main goal of this paper is to present a novel distributed QR factorization algorithm, DSCGS, which is based on the classical Gram-Schmidt orthogonalization. The algorithm does not require any fusion center and assumes only local communication between neighboring nodes without any global knowledge about the topology. In contrast to existing distributed approaches, the DSCGS algorithm computes the approximations of all elements of the new orthonormal basis simultaneously, and as the algorithm proceeds, the values at all nodes are refined iteratively, approximating the exact values of Q and R. Therefore, it can deliver an estimate of the full matrix result at any moment of the computation. As we will show, this approach is, among others, superior to existing methods in terms of the number of transmitted messages in the network.
In Section 2, we briefly recall the concept of a consensus algorithm which we use later in the distributed orthogonalization algorithm. In Section 3, we review the basics of the QR decomposition and existing distributed methods. In Section 4, we describe the proposed distributed Gram-Schmidt orthogonalization algorithm with simultaneous refinements of all elements (DSCGS). We experimentally compare DSCGS with other distributed approaches in Section 5, where we also investigate the properties of DSCGS from many different viewpoints. Section 6 concludes the paper.
1.2 Notation and terminology
In what follows, we use k as the node index, \(\mathcal {N}_{k}\) denotes the set of neighbors of node k, N denotes the (known) number of nodes in the network, \(\mathcal {E}\) the set of edges (links) of the network, d _{ k } the kth node degree (\(d_{k}=|\mathcal {N}_{k}|\)), \(\bar {d}\) the average node degree of the network, and t a discrete time (iteration) index.
We will describe the behavior of the distributed algorithm from a network (global) point of view with the corresponding vector/matrix notation. For example, the (column) vector of all ones, denoted by 1, corresponds to all nodes having value 1. In general, we denote the number of rows of a matrix by n and the number of columns by m. Element-wise division of two vectors is denoted as \({\mathbf {z}} = \frac {{\mathbf {x}}}{{\mathbf {y}}} \equiv \frac {x_{i}}{y_{i}}, \forall i\), element-wise multiplication of two vectors as z=x∘y≡x _{ i } y _{ i },∀i and of two matrices as Z=X∘Y. The operation \({\mathbf {X}}\circledast {\mathbf {Y}}\) is defined as follows: given two matrices X=(x _{1},x _{2},…,x _{ m }) and Y=(y _{1},y _{2},…,y _{ m }), the resulting matrix \({\mathbf {Z}}={\mathbf {X}}\circledast {\mathbf {Y}}\) is a stacked matrix of all matrices Z _{ i } such that \({\mathbf {Z}}_{i}=({\mathbf {x}}_{1},{\mathbf {x}}_{2},\dots,{\mathbf {x}}_{i})\circ ((\underbrace {1,1,\dots,1}_{i})\otimes {\mathbf {y}}_{i+1})\) (⊗ denotes the Kronecker product; i = 1,2,…,m−1), thus creating a big matrix containing combinations of column vectors: \({\mathbf {Z}}\in \mathbb {R}^{n\times \frac {m^{2}-m}{2}}\). This later corresponds in our algorithm to the off-diagonal elements of the matrix R. Also note that all variables with the “hat” symbol, e.g., \(\hat {{\mathbf {u}}}(t)\), represent variables that are computed locally at the nodes, while variables with the “tilde” symbol, e.g., \(\tilde {{\mathbf {u}}}(t)\), are updated based on the information from neighbors.
2 Average consensus algorithm
We model a wireless sensor network (WSN) by synchronously working nodes which broadcast their data to their neighborhood within a radius ρ (so-called geometric topology). The WSN is considered to be static, connected, and with error-free transmissions (except for Section 5.4 ahead). Although the practicality of synchronicity can be argued [17, 18], we note that it is not an unrealizable assumption [19].
In the following, we briefly review the classical consensus algorithm for computing the average of values distributed in a network. Note that the algorithm can be easily adapted to computing a sum by multiplying the final average value (arithmetic mean) by the total number of nodes N.
The selection of the weight matrix W, representing the connections in a strongly connected network, crucially influences the convergence of the average consensus algorithm [20–22]. The main condition for the algorithm to converge is that the largest eigenvalue of W is equal to 1, i.e., λ _{max} = 1, with multiplicity one, and that each row of W sums up to 1. It can then be directly shown [20] that the value x _{ k }(t) at each node converges to a common global value, e.g., average of the initial values.
These weights guarantee that the consensus algorithm converges to the average of the initial values.
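As a concrete illustration, the consensus iteration x(t+1) = W x(t) with Metropolis weights can be sketched as follows. This is a minimal NumPy sketch; the 6-node topology (a ring plus one chord) is a hypothetical example, not one from the paper.

```python
import numpy as np

def metropolis_weights(adj):
    """Metropolis weight matrix for an undirected graph given as a 0/1
    adjacency matrix: w_kl = 1/(1 + max(d_k, d_l)) on each edge (k,l),
    and the diagonal absorbs the remaining mass so every row sums to 1."""
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    W = np.zeros((n, n))
    for k in range(n):
        for l in range(n):
            if adj[k, l]:
                W[k, l] = 1.0 / (1.0 + max(deg[k], deg[l]))
        W[k, k] = 1.0 - W[k].sum()
    return W

# Hypothetical connected 6-node topology: a ring plus one chord.
adj = np.zeros((6, 6), dtype=int)
for k in range(6):
    adj[k, (k + 1) % 6] = adj[(k + 1) % 6, k] = 1
adj[0, 3] = adj[3, 0] = 1

W = metropolis_weights(adj)
x = np.array([4.0, 1.0, 7.0, 2.0, 5.0, 3.0])   # initial node values
for _ in range(200):                            # x(t+1) = W x(t)
    x = W @ x
# Every node now holds (approximately) the network-wide average.
```

Since the Metropolis matrix is symmetric and row-stochastic, it is doubly stochastic, which is what guarantees convergence to the average rather than to some other weighted combination.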
3 QR factorization
As mentioned in Section 1, there exist many algorithms for computing the QR factorization with different properties [1, 23]. In this paper, we utilize the QR decomposition based on the classical Gram-Schmidt orthogonalization method (in ℓ ^{2} space).
3.1 Centralized classical Gram-Schmidt orthogonalization
where \(\left \|{\mathbf {u}}\right \|_{2} = \sqrt {\sum _{i=1}^{n}{{u_{i}^{2}}}}\) and \(\langle {\mathbf {q}}, {\mathbf {a}}\rangle = \sum _{i=1}^{n}q_{i}a_{i}\).
It is known that the algorithm is numerically sensitive, depending on the singular values (condition number) of matrix A; even in floating-point precision, it can produce vectors q _{ i } far from orthogonal when the matrix A is close to being rank deficient [23]. Numerical stability can be improved by other methods, e.g., the modified Gram-Schmidt method, Householder transformations, or Givens rotations [1, 23].
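For reference, the centralized classical Gram-Schmidt factorization described above can be sketched in NumPy as follows. This is a minimal sketch with our own variable names; it follows the textbook recursion u_i = a_i − Σ_{j<i} ⟨q_j, a_i⟩ q_j, q_i = u_i/∥u_i∥₂.

```python
import numpy as np

def cgs_qr(A):
    """Reduced QR via classical Gram-Schmidt.  R collects the projection
    coefficients <q_j, a_i> and the norms ||u_i||_2 on the diagonal."""
    n, m = A.shape
    Q = np.zeros((n, m))
    R = np.zeros((m, m))
    for i in range(m):
        u = A[:, i].copy()
        for j in range(i):
            R[j, i] = Q[:, j] @ A[:, i]   # <q_j, a_i>, against the *original* column
            u -= R[j, i] * Q[:, j]        # remove the projection onto q_j
        R[i, i] = np.linalg.norm(u)       # ||u_i||_2
        Q[:, i] = u / R[i, i]
    return Q, R

rng = np.random.default_rng(1)
A = rng.random((8, 4))
Q, R = cgs_qr(A)
```

Note that the classical variant projects against the original column a_i (unlike the modified Gram-Schmidt variant, which projects against the running residual), which is exactly the source of the sensitivity discussed above.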
3.2 Existing distributed methods
Assuming that each node k stores its local values \({u_{k}^{2}}\) and q _{ k } a _{ k }, it is straightforward to redefine the CGS in a distributed way, suitable for a WSN, by following the definition of the ℓ ^{2} norm, i.e., \(\left \|{\mathbf {u}}\right \|^{2}_{2} ={u_{1}^{2}}+{u_{2}^{2}}+\dots +{u_{n}^{2}}\) (cf. (4)), and of the inner products, 〈q,a〉=q _{1} a _{1}+q _{2} a _{2}+⋯+q _{ n } a _{ n } (cf. (5)). The summations can then be computed using any distributed aggregation algorithm, e.g., average consensus [20]^{1} (see Section 2) or asynchronous gossiping algorithms [24], using only communication with the neighbors.
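This reformulation of the sums can be illustrated as follows: each node holds one entry of the vectors, runs average consensus on the local products, and rescales by the known n (cf. footnote 1). The ring topology and the simple symmetric weights below are a hypothetical sketch, not the paper's setup.

```python
import numpy as np

# Each of the n nodes holds one entry of u (and of q and a).
rng = np.random.default_rng(2)
n = 8
u, q, a = rng.random(n), rng.random(n), rng.random(n)

# Simple symmetric, doubly stochastic weights on a ring (hypothetical).
W = np.zeros((n, n))
for k in range(n):
    W[k, k] = 0.5
    W[k, (k + 1) % n] = W[k, (k - 1) % n] = 0.25

x_norm = u * u        # local values u_k^2
x_inner = q * a       # local values q_k * a_k
for _ in range(400):  # average consensus on both quantities in parallel
    x_norm = W @ x_norm
    x_inner = W @ x_inner

norm_sq = n * x_norm  # each node's estimate of ||u||_2^2
inner = n * x_inner   # each node's estimate of <q, a>
```

The two consensus runs can share the same transmissions, which is why the per-message payload grows with the number of tracked quantities rather than the number of rounds.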
Nevertheless, to our knowledge, all existing distributed algorithms for orthogonalizing a set of vectors are based on the gossip-based push-sum algorithm [16, 24]. Specifically, in [3], the authors used a distributed CGS based on gossiping for solving a distributed least squares problem, and in [15], a gossip-based distributed algorithm for modified Gram-Schmidt orthogonalization (MGS) was designed and analyzed. The authors also provided a quantitative comparison to existing parallel algorithms for QR factorization. A slight modification of the latter algorithm was introduced in [25], which we use for comparison in this paper. We denote the two gossip-based distributed Gram-Schmidt orthogonalization algorithms as GCGS [3] and GMGS [25], respectively.
Since the classical Gram-Schmidt orthogonalization computes each column of the matrix Q from the previous columns recursively (i.e., to know vector q _{2}, we need to compute the norm of u _{2}, which depends on vector q _{1}), the existing distributed algorithms always need to wait for the convergence of one column before proceeding with the next. This may be a major disadvantage in WSNs, as it requires many transmissions. Also, if the algorithm fails at some moment, e.g., due to transmission errors, the matrices Q and R are incomplete and unusable for further processing.
In contrast, the distributed algorithm proposed in this paper overcomes these disadvantages and computes approximations of all elements of the matrices Q and R simultaneously. All the norms and inner products are refined iteratively which leads to a significant decrease of transmitted messages, and also the algorithm brings an intermediate approximation of the whole matrices Q and R at any time instance.
4 Distributed classical Gram-Schmidt with simultaneous elements refinement
As mentioned in Section 3.2, the Gram-Schmidt orthogonalization method can be computed in a distributed way using any distributed aggregation algorithm. We refer to CGS based on the average consensus (see Section 2) as ACCGS. ACCGS, as well as GCGS [3] and GMGS [25], has the following substantial drawback.
In all Gram-Schmidt orthogonalization methods, the computation of the norms ∥u _{ i }∥ and the inner products 〈q _{ j },a _{ i }〉,〈q _{ j },q _{ j }〉, occurring in the matrices Q and R, depends on the norms and inner products computed from the previous columns of the input matrix A. Therefore, each node k must wait until the estimates of the previous norms ∥u _{ j }∥ (j < i) have achieved an acceptable accuracy before processing the next norm ∥u _{ i }∥ (a “cascading” approach; see [15]). The same holds also for computing the inner products. We here present a novel approach overcoming this drawback.
Similarly to the state-of-the-art methods (see Section 3.2), we further assume that the matrices \({\mathbf {A}}\in \mathbb {R}^{n\times m}\) and \({\mathbf {Q}}\in \mathbb {R}^{n\times m}\) are distributed over the network row-wise, meaning that each node stores at least one row of the matrix A and the corresponding rows of the matrix Q, while each node stores the whole matrix R. If n>N, several rows must be stored at each node, and each node must sum its data locally before broadcasting to its neighbors. Obviously, the data distribution over the network influences the speed of convergence of the algorithm, as can also be seen in the simulations ahead (see Section 5).
The notations A _{ k } and Q _{ k }(t) here represent the rows of the matrices A and Q stored at a given node k at time t. If several rows are stored at one node, A _{ k } and Q _{ k }(t) are matrices; otherwise, they are row vectors. Matrix R ^{(k)}(t) represents the whole matrix R at node k at time t.
From a global (network) point of view, the algorithm is defined in Algorithm 1.
Proof of convergence of DSCGS.
For the first column (i=1), \(\hat {{\mathbf {u}}}_{1}(t) = {\mathbf {a}}_{1}\), and thus the convergence results of the average consensus (see Section 2) apply, i.e., as t→∞, the nodes will monotonically reach the common values \(\tilde {{\mathbf {u}}}_{1}(t)=\frac {1}{N}\|{\mathbf {a}}_{1}\|^{2}_{2}\,{\mathbf {1}}\), and thus also \(\hat {{\mathbf {q}}}_{1}(t)=\frac {{\mathbf {a}}_{1}}{\|{\mathbf {a}}_{1}\|_{2}}\), \(\tilde {{\mathbf {q}}}_{1}(t)=\frac {1}{N}{\mathbf {1}}\), \(\tilde {{\mathbf {p}}}_{1}^{(1)}(t)=\frac {1}{N}\|{\mathbf {a}}_{1}\|^{2}_{2}\,{\mathbf {1}}\), and \(\tilde {{\mathbf {p}}}^{(2)}_{1}(t)=\frac {1}{N}\frac {\langle {\mathbf {a}}_{1}, {\mathbf {a}}_{2}\rangle }{\|{\mathbf {a}}_{1}\|_{2}}\,{\mathbf {1}}\).
Furthermore, for all columns i>1, all the elements depend only on the preceding columns; e.g., for i=2, from Eq. (7), \(\hat {{\mathbf {u}}}_{2}(t)={\mathbf {a}}_{2}-\frac {\tilde {{\mathbf {p}}}^{(2)}_{1}(t-1)\circ \hat {{\mathbf {q}}}_{1}(t-1)}{\tilde {{\mathbf {q}}}_{1}(t-1)}\), with, from Eq. (6), \(\hat {{\mathbf {q}}}_{1}(t) = \frac {\hat {{\mathbf {u}}}_{1}(t)}{\sqrt {N\tilde {{\mathbf {u}}}_{1}(t-1)}}\). Thus, eventually, \(\hat {{\mathbf {u}}}_{2}(t)\) will converge to u _{2} (Eq. (5)), and so will all norms and inner products (Eqs. (4) and (5)) of the matrices Q and R.
Intuitively, we can see that as \(\tilde {{\mathbf {u}}}_{1}(t)\) converges to its steady state, all other variables converge, with some “delay,” to their steady states as well. We may say that as the first column converges, it “drags” the other elements to their steady states. In the worst case, a subsequent column starts to converge only when the previous column has fully converged. This behavior differs from the known methods, where we have to wait for \(\tilde {{\mathbf {u}}}_{1}(t)\) to converge before computing the other terms.
We note that the normalization constant N (or ω(t), respectively) affects only^{3} the orthonormality of the columns of the matrix Q(t) (the columns remain orthogonal but are not normalized), and if orthogonality alone is sufficient, as in [26], we can omit this constant. We can thus avoid the need to know the number of nodes, or reduce the amount of transmitted data in the network, respectively.
4.1 Relation to dynamic consensus algorithm
The dynamic consensus algorithm is a distributed algorithm which is able to track the average of a timevarying input signal. There exist many variations of the algorithm, e.g., [27–33]. Comparing the proposed DSCGS algorithm with a dynamic consensus algorithm from [30, 32], we observe an interesting resemblance.
5 Performance of DSCGS
In our simulations, we consider a connected WSN with N = 30 nodes. We explore the behavior of DSCGS for various topologies: fully connected (each node is connected to every other node), regular (each node has the same degree d), and geometric (each (randomly deployed) node is connected to all nodes within some radius ρ—a WSN model). If not stated otherwise, the randomly generated input matrix \({\mathbf {A}}\in ~ \mathbb {R}^{300\times 100}\) has uniformly distributed elements from the interval [0,1] and a low condition number κ(A)=35.7. In Section 5.3.2, we, however, investigate the influence of various input matrices with different condition numbers on the algorithm’s performance.
Also, except for Sections 5.3.1 and 5.4, we use the Metropolis weights (Eq. (2)) for the consensus weight matrix.
The confidence intervals were computed from several instantiations using a bootstrap method [35].
5.1 Orthogonality and factorization error

- Relative factorization error, \(\frac {\left \|{\mathbf {A}}-{\mathbf {Q}}(t){\mathbf {R}}^{(k)}(t)\right \|_{2}}{\left \|{\mathbf {A}}\right \|_{2}}\), which measures the accuracy of the QR factorization at node k,

- Orthogonality error, \(\left \|{\mathbf {I}}-{\mathbf {Q}}(t)^{\top }{\mathbf {Q}}(t)\right \|_{2}\), which measures the orthogonality of the matrix Q(t) (see step 2 of the algorithm).
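The two error measures can be evaluated as in the following small sketch; the reference factors here come from NumPy's built-in QR rather than from DSCGS, so the errors land near machine precision.

```python
import numpy as np

def factorization_error(A, Q, R):
    """Relative factorization error ||A - Q R||_2 / ||A||_2 (spectral norm)."""
    return np.linalg.norm(A - Q @ R, 2) / np.linalg.norm(A, 2)

def orthogonality_error(Q):
    """Orthogonality error ||I - Q^T Q||_2."""
    m = Q.shape[1]
    return np.linalg.norm(np.eye(m) - Q.T @ Q, 2)

rng = np.random.default_rng(6)
A = rng.random((20, 5))
Q, R = np.linalg.qr(A)   # exact (reduced) factors as a reference point
```

In the distributed setting, each node k would plug its local rows of Q(t) and its copy R^{(k)}(t) into these formulas, which is what makes the factorization error a candidate local stopping criterion.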
Note that the error at the beginning stage in Fig. 1 is caused by the disagreement across the nodes and the not yet converged norms and inner products, i.e., the values of \(\tilde {{\mathbf {u}}}(t)\), \(\tilde {{\mathbf {Q}}}(t)\), \(\tilde {{\mathbf {P}}}^{(1)}(t)\), and \(\tilde {{\mathbf {P}}}^{(2)}(t)\). We also observe that the error floor^{4} is strongly influenced by the network topology, the weights of matrix W, and the condition number of input matrix A. We investigate these properties in Section 5.3.
5.2 Initial data distribution
If n>N, some nodes store more than one row of A. Thus, before doing distributed summation (broadcasting to neighbors), every node has to locally sum the values of its local rows.
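The local pre-summation can be sketched as follows: each node sums the squared entries of its own rows, and the network then averages the N partial sums. The row assignment and the 4-node topology below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n, N = 12, 4                                 # 12 rows of A spread over 4 nodes
u = rng.random(n)
rows_of = np.array_split(np.arange(n), N)    # hypothetical row assignment

# Per-node partial sums: each node condenses its rows into one local value.
local = np.array([np.sum(u[r] ** 2) for r in rows_of])

W = np.array([[0.50, 0.25, 0.00, 0.25],      # symmetric weights on a 4-ring
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])
x = local.copy()
for _ in range(200):                         # consensus on the partial sums
    x = W @ x
norm_sq = N * x                              # every node's estimate of ||u||_2^2
```

Note that the rescaling factor is the number of nodes N, not the number of rows n, because consensus averages the N partial sums.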
We observe that not only the initial distribution of the data influences the convergence behavior but also the topology of the underlying network. In the case of a regular topology (Fig. 2 a), the influence of the distribution is relatively weak in terms of convergence time but stronger in terms of the final accuracy achieved. We recognize that the difference between the nodes comes only from the variance of the values in input matrix A. On the other hand, in the case of a highly irregular geometric topology (see Fig. 2 b), where the node with the most neighbors stores most of the data, the algorithm converges much faster than when most of the data are stored at a node with only few neighbors.
5.3 Numerical sensitivity
As mentioned in Section 3.1, the classical GramSchmidt orthogonalization possesses some undesirable numerical properties [1, 23]. In comparison to centralized algorithms, numerical stability of DSCGS is furthermore influenced by the precision of the mixing weight matrix W, the network topology, and properties of input matrix A, i.e., its condition number (see Fig. 5 ahead) and the distribution of the numbers in the rows of the matrix (see Figs. 2 and 3). In this section, we provide simulation results showing these dependencies.
5.3.1 Weights
5.3.2 Condition numbers
It is well known that the classical Gram-Schmidt orthogonalization is numerically unstable [23]. When input matrix A is ill-conditioned (high condition number) or rank deficient (contains linearly dependent columns), the computed columns of Q can be far from orthogonal even when computed with high precision.
In this section, we study the influence of the condition number of input matrix A on the accuracy of the orthogonality. The condition number is defined with respect to inversion as the ratio of the largest to the smallest singular value. In comparison to the classical (centralized) Gram-Schmidt orthogonalization, we observe (Fig. 5 a) that the DSCGS algorithm behaves similarly, although it reaches neither the accuracy of ACCGS nor that of the centralized algorithm (even in the fully connected network). We observe in all of the simulations that the orthogonality error in the first phase can reach very high values (due to divisions by numbers close to zero), which may influence the numerical accuracy in the final phase.
Figure 5 also shows that GMGS is the most robust of the compared methods. This is caused by the usage of the modified Gram-Schmidt orthogonalization instead of the classical one.
5.3.3 Mixing precision
Another factor influencing the algorithm’s performance is the numerical precision of the mixing weights W. Here, we simulate the case of a geometric topology with the Metropolis weights model, where the weights are of given precision—characterized by the number of variable decimal digits (4, 8, 16, 32, “Infinite”).^{5}
As we see in Fig. 6, the error floor moves with the mixing precision. However, we note that even for the “infinite” mixing precision, the orthogonality error stalls at an accuracy (∼10^{−12}) below the machine precision used, even taking into account the conversion to double precision. From the simulations, we conclude that this is caused by the high numerical dynamic range in the first phases of the algorithm as well as by the errors created by the disagreement among the nodes during the transient phase of the algorithm.
5.4 Robustness to link failures
For distributed algorithms, robustness against network failures is of great importance. Typical failures in WSNs are message losses or link failures, which occur for many reasons, e.g., channel fading, congestion, message collisions, moving nodes, or dynamic topology.
We model link failures as a temporary dropout of a bidirectional connection between two nodes, meaning that no message can be transmitted between them. In every time step, we randomly remove some percentage of the links in the network. As a weight model, we picked the constant weights model, Eq. (8), due to its property that every node can compute its weights at each time step locally, based only on the number of received messages (d _{ i }). Thus, no global knowledge is required. However, the nodes must still work synchronously.^{6}
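The link-failure experiment can be sketched as follows. We assume a constant-weights rule of the common form in which each node weighs every received message by 1/N and keeps the rest for itself (the exact form of Eq. (8) is not reproduced here), so that each step matrix stays symmetric, doubly stochastic, and locally computable from the received-message count alone; the base topology is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 10
# Base topology: a ring plus two chords (hypothetical).
adj = np.zeros((N, N), dtype=bool)
for k in range(N):
    adj[k, (k + 1) % N] = adj[(k + 1) % N, k] = True
adj[0, 5] = adj[5, 0] = True
adj[2, 7] = adj[7, 2] = True

x = rng.random(N)
target = x.mean()
p_fail = 0.2                      # 20 % of the links drop out in each step

for _ in range(2000):
    # Bidirectional dropouts: a failed link is removed for both endpoints.
    live = adj.copy()
    for k in range(N):
        for l in range(k + 1, N):
            if adj[k, l] and rng.random() < p_fail:
                live[k, l] = live[l, k] = False
    # Constant-weights step: 1/N per received message, remainder on the
    # diagonal; symmetric and doubly stochastic in every time step.
    W = live / N + np.diag(1.0 - live.sum(axis=1) / N)
    x = W @ x
```

Because every step matrix is doubly stochastic, the network-wide sum is preserved through every failure pattern, so the iteration still converges to the true average despite the dropouts.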
It is worth noting that moving nodes and dynamic network topology can be modeled in the same way. We therefore argue that the algorithm is robust also to such scenarios (assuming that synchronicity is guaranteed).
5.5 Performance comparison with existing algorithms
We compare our new DSCGS algorithm with ACCGS, GCGS, and GMGS introduced in Section 3.2. Although all approaches have iterative aspects, the cost per iteration strongly differs for each algorithm. Thus, instead of providing a comparison in terms of number of iterations to converge, we compare the communication cost needed for achieving a certain accuracy of the result. We investigate the total number of messages sent as well as the total amount of data (real numbers) exchanged.
Because the message size of ACCGS is even smaller than in the gossip-based approaches, it sends the highest number of messages. Since the energy consumption in a WSN is dominated by the number of transmissions [36, 37], it is better to transmit as few messages as possible (whatever the payload size); therefore, DSCGS is the most suitable method for a WSN scenario. However, we note that in many cases, DSCGS does not achieve the same final accuracy as the other methods.
Note that in fully connected networks, ACCGS delivers a highly accurate result from the beginning, because within the first iterations, all nodes exchange the required information with all other nodes.
Comparison of various distributed QR factorization algorithms

| Algorithm | Total number of sent messages | Total amount of data (real numbers) | Local memory requirements per node |
|---|---|---|---|
| DSCGS | \(N\cdot I^{(d)}\) | \(N\cdot I^{(d)} \cdot \frac {m^{2}+5m}{2}\) | O(mn/N + m ^{2}) |
| ACCGS | \(N \cdot I^{(s)}\cdot \frac {(m+1)m}{2}\) | \(N \cdot I^{(s)} \cdot \frac {(m+1)m}{2}\) | O(mn/N + m ^{2}) |
| GCGS | N·R·(2m−1) | \(N \cdot R \cdot \frac {m^{2}+5m-2}{2}\) | O(nm/N) |
| GMGS | N·R·(2m−1) | \(N \cdot R \cdot \frac {m^{2}+5m-2}{2}\) | O(nm/N) |
6 Conclusions
We presented a novel distributed algorithm for computing the QR decomposition and provided an analysis of its properties. In contrast to existing methods, which compute the columns of the resulting matrix Q consecutively, our method iteratively refines all elements at once. Thus, at any moment, the algorithm can deliver an estimate of both matrices Q and R. The algorithm dramatically outperforms known distributed orthogonalization algorithms in terms of transmitted messages, which makes it suitable for energy-constrained WSNs. Based on our empirical observations, we argue that the evaluation of the local factorization error at each node might lead to a suitable stopping criterion for the algorithm. We also provided a thorough study of its numerical properties, analyzing the influence of the precision of the mixing weights and the condition number of the input matrix. We furthermore analyzed the robustness of the algorithm to link failures and showed that the algorithm is capable of reaching a certain accuracy even for a high percentage of link failures.
The biggest drawback of the algorithm is the necessity of synchronously working nodes, which leads to poor robustness when messages are sent (or lost) asynchronously. As we showed, since the algorithm originates from the classical Gram-Schmidt orthogonalization, its numerical sensitivity is also a significant issue and needs to be addressed in the future. Optimizing the weights and designing the algorithm so that it avoids a large dynamic numerical range, especially in the first phases, is also of interest.
An alternative approach, not considered here, which could be worthy of future research, would be to derive a distributed algorithm from an optimization problem, e.g., \(\min \left \|{\mathbf {A}}-{\mathbf {Q}}{\mathbf {R}}\right \|\) s.t. \({\mathbf {Q}}^{\top }{\mathbf {Q}}={\mathbf {I}}\). In the literature, there exist many distributed optimization methods, e.g., [38, 39], which could lead to superior algorithms with faster convergence and smaller error floors.
Last but not least, theoretical bounds on the convergence time and rate of DSCGS remain an open issue. A first application of the algorithm has already been proposed in [26]. Also, since the proposed algorithm is not restricted to wireless sensor networks, a transfer onto so-called network-on-chip platforms [40] could lead to further interesting and practical applications.
7 Endnotes
^{1}Knowing n, \(\left \|{\mathbf {u}}\right \|^{2}_{2} = n{\lim }_{\textit {t}\to \infty }{\mathbf {W}}^{t}({\mathbf {u}}\circ {\mathbf {u}})=\sum _{i=1}^{n}{u_{i}^{2}}\).
^{2} \({\lim }_{\textit {t}\to \infty }\boldsymbol {\omega }(t)=1/N{\mathbf {1}}\).
^{3}Not considering numerical properties.
^{4}The error level at which the algorithm stalls at a given computational precision.
^{5}The simulations were performed in Matlab R2011b 64bit using the Symbolic Math Toolbox with variable precision arithmetic. “Infinite” precision denotes weights represented as an exact ratio of two numbers. The depicted result after “infinite” precision multiplication was converted to double precision.
^{6}If there is a link, nodes see each other and immediately exchange messages. From a mathematical point of view, this implies that weight matrix W will be doubly stochastic [1] in every time step.
8 Appendix: local algorithm
For better clarity, we here reformulate the DSCGS algorithm from the point of view of an individual node i (local point of view). Note that input matrix A is stored row-wise in the nodes, and for simplicity, we show the case where the number of rows of matrix \({\mathbf {A}}\in \mathbb {R}^{n\times m}\) is equal to the number of nodes in the network. For a formulation from the network (global) point of view and an arbitrary size of matrix A, see Section 4.
Declarations
Acknowledgements
This work was supported by the Austrian Science Fund (FWF) under project grants S10608-N13 and S10611-N13 within the National Research Network SISE. Preliminary parts of this work were previously published at the 46th Asilomar Conf. Sig., Syst., Comp., Pacific Grove, CA, USA, Nov. 2012 [32].
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Authors’ Affiliations
References
 GH Golub, CF Van Loan, Matrix Computations, 3rd Ed. (Johns Hopkins Univ. Press, Baltimore, USA, 1996).
 JM Lees, RS Crosson, in Spatial Statistics and Imaging, 20, ed. by A Possolo. Bayesian ART versus conjugate gradient methods in tomographic seismic imaging: an application at Mount St. Helens, Washington (IMS Lecture Notes-Monograph Series, Hayward, CA, 1991), pp. 186–208.
 C Dumard, E Riegler, in Int. Conf. on Telecom. ICT ’09. Distributed sphere decoding (IEEE, Marrakech, 2009), pp. 172–177.
 G Tauböck, M Hampejs, P Svac, G Matz, F Hlawatsch, K Gröchenig, Low-complexity ICI/ISI equalization in doubly dispersive multicarrier systems using a decision-feedback LSQR algorithm. IEEE Trans. Signal Process. 59(5), 2432–2436 (2011).
 E Hänsler, G Schmidt, Acoustic Echo and Noise Control (Wiley, Chichester, 2004).
 PSR Diniz, Adaptive Filtering—Algorithms and Practical Implementation (Springer, US, 2008).
 AH Sayed, Adaptation, Learning, and Optimization over Networks, vol. 7 (Foundations and Trends in Machine Learning, Boston-Delft, 2014).
 KJ Cho, YN Xu, JG Chung, in IEEE Workshop on Signal Processing Systems. Hardware efficient QR decomposition for GDFE (IEEE, Shanghai, China, 2007), pp. 412–417.
 X Wang, M Leeser, A truly two-dimensional systolic array FPGA implementation of QR decomposition. ACM Trans. Embed. Comput. Syst. 9(1), 3:1–3:17 (2009).
 A Buttari, J Langou, J Kurzak, J Dongarra, in Proc. of the 7th International Conference on Parallel Processing and Applied Mathematics. Parallel tiled QR factorization for multicore architectures (Springer, Berlin, Heidelberg, 2008), pp. 639–648.
 J Demmel, L Grigori, MF Hoemmen, J Langou, Communication-optimal parallel and sequential QR and LU factorizations. Technical report no. UCB/EECS-2008-89, EECS Department, University of California, Berkeley (2008).
 F Song, H Ltaief, B Hadri, J Dongarra, in International Conference for High Performance Computing, Networking, Storage and Analysis. Scalable tile communication-avoiding QR factorization on multicore cluster systems (IEEE Computer Society, Washington, DC, USA, 2010), pp. 1–11.
 M Shabany, D Patel, PG Gulak, A low-latency low-power QR-decomposition ASIC implementation in 0.13 μm CMOS. IEEE Trans. Circ. Syst. I. 60(2), 327–340 (2013).
 A Nayak, I Stojmenović, Wireless Sensor and Actuator Networks: Algorithms and Protocols for Scalable Coordination and Data Communication (Wiley, Hoboken, NJ, 2010).
 H Straková, WN Gansterer, T Zemen, in Proc. of the 9th International Conference on Parallel Processing and Applied Mathematics, Part I. Lecture Notes in Computer Science, vol. 7203. Distributed QR factorization based on randomized algorithms (Springer, Berlin, Heidelberg, 2012), pp. 235–244.
 H Straková, Truly distributed approaches to orthogonalization and orthogonal iteration on the basis of gossip algorithms. PhD thesis, University of Vienna (2013).
 O Slučiak, M Rupp, in Proc. of the 36th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Reaching consensus in asynchronous WSNs: algebraic approach (IEEE, Prague, 2011), pp. 3300–3303.
 O Slučiak, M Rupp, in Proc. of Statistical Sig. Proc. Workshop (SSP). Almost sure convergence of consensus algorithms by relaxed projection mappings (IEEE, Ann Arbor, MI, USA, 2012), pp. 632–635.
 F Sivrikaya, B Yener, Time synchronization in sensor networks: a survey. IEEE Netw. Mag., Special Issue on Ad Hoc Netw. Data Commun. Topol. Control. 18(4), 45–50 (2004).
 R Olfati-Saber, JA Fax, RM Murray, Consensus and cooperation in networked multi-agent systems. Proc. IEEE. 95(1), 215–233 (2007).
 L Xiao, S Boyd, Fast linear iterations for distributed averaging. Syst. Control Lett. 53, 65–78 (2004).
 L Xiao, S Boyd, S Lall, in Proc. ACM/IEEE IPSN-05. A scheme for robust distributed sensor fusion based on average consensus (IEEE, Los Angeles, USA, 2005), pp. 63–70.
 LN Trefethen, D Bau III, Numerical Linear Algebra (SIAM: Society for Industrial and Applied Mathematics, Philadelphia, 1997).
 D Kempe, A Dobra, J Gehrke, in Proc. of the 44th Annual IEEE Symposium on Foundations of Computer Science. Gossip-based computation of aggregate information (2003), pp. 482–491. ISSN: 0272-5428, doi:10.1109/SFCS.2003.1238221.
 H Straková, WN Gansterer, in 21st Euromicro Int. Conf. on Parallel, Distributed, and NetworkBased Processing (PDP). A distributed eigensolver for loosely coupled networks (IEEEBelfast, UK, 2013), pp. 51–57.Google Scholar
 O Slučiak, M Rupp, Network size estimation using distributed orthogonalization. IEEE Sig. Proc. Lett.20(4), 347–350 (2013).View ArticleGoogle Scholar
 P Braca, S Marano, V Matta, in Proc. Int. Conf. Inf. Fusion (FUSION 2008). Running consensus in wireless sensor networks (IEEECologne, Germany, 2008), pp. 152–157.Google Scholar
 W Ren, in Proc. of the 2007 American Control Conference. Consensus seeking in multivehicle systems with a timevarying reference state (IEEENew York, NY, 2007), pp. 717–722.View ArticleGoogle Scholar
 V Schwarz, C Novak, G Matz, in Proc. 43rd Asilomar Conf. on Sig., Syst., Comp. Broadcastbased dynamic consensus propagation in wireless sensor networks (IEEEPacific Grove, CA, 2009), pp. 255–259.Google Scholar
 M Zhu, S Martínez, Discretetime dynamic average consensus. Automatica. 46(2), 322–329 (2010).View ArticleMathSciNetMATHGoogle Scholar
 O Slučiak, O Hlinka, M Rupp, F Hlawatsch, PM Djurić, in Rec. of the 45th Asilomar Conf. on Signals, Systems, and Computers. Sequential likelihood consensus and its application to distributed particle filtering with reduced communications and latency (IEEEPacific Grove, CA, 2011), pp. 1766–1770.Google Scholar
 O Slučiak, H Straková, M Rupp, WN Gansterer, in Rec. of the 46th Asilomar Conf. on Signals, Systems, and Computers. Distributed GramSchmidt orthogonalization based on dynamic consensus (IEEEPacific Grove, CA, 2012), pp. 1207–1211.Google Scholar
 P Braca, S Marano, V Matta, AH Sayed, in Proc. of the 39th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Large deviations analysis of adaptive distributed detection (IEEEFlorence, Italy, 2014), pp. 6153–6157.Google Scholar
 O Slučiak, Convergence analysis of distributed consensus algorithms (2013). PhD thesis, TU Vienna.Google Scholar
 B Efron, RJ Tibshirani, An Introduction to the Bootstrap (Chapman & Hall/CRC Monographs on Statistics & Applied Probability 57, London, UK, 1994).Google Scholar
 P Rost, G Fettweis, in GLOBECOM Workshops, 2010 IEEE. On the transmissioncomputationenergy tradeoff in wireless and fixed networks (IEEEMiami, FL, 2010), pp. 1394–1399.View ArticleGoogle Scholar
 R Shorey, A Ananda, MC Chan, WT Ooi, Mobile, Wireless, and Sensor Networks: Technology, Applications, and Future Directions (Wiley, Hoboken, NJ, 2006).Google Scholar
 B Johansson, On distributed optimization in networked systems (2008). PhD thesis, KTH, Stockholm.Google Scholar
 I Matei, JS Baras, Performance evaluation of the consensusbased distributed subgradient method under random communication topologies. IEEE J. Sel. Top. Signal Process.5(4), 754–771 (2011).View ArticleGoogle Scholar
 L Benini, GD Micheli, Networks on chips: a new SoC paradigm. IEEE Comput.35(1), 70–78 (2002).View ArticleGoogle Scholar