The problem we investigate in this paper is that of determining the locations of all sensors in a network, given noisy distance measurements between some of them. It is usually assumed that the positions of some of the sensors, referred to as anchors, are known. This is called a localization problem. Specifically, we present a distributed algorithm that solves the maximum likelihood estimation problem for localization when the measurements are corrupted by Gaussian noise. Our algorithm is based on applying the Levenberg-Marquardt algorithm to the resulting nonlinear least-squares problem. It requires a good initialization, and we initialize it with an approximate estimate obtained from the algorithm proposed in [1], which is based on a convex relaxation of our nonlinear least-squares formulation.

Over the years there has been considerable interest in wireless sensor network localization using inter-sensor distance or range measurements [2–4]. Wireless sensor networks consist of small and inexpensive devices with low energy consumption and limited computing resources. Each sensor node comprises sensing, processing, transmission, and power units, some with mobilizers [5, 6]. The applications are many, e.g., natural disaster relief, patient tracking, military targeting, automated warehouses, weather monitoring, smart spaces, environmental monitoring, detection of pollutant sources, and mobile peer-to-peer computing, as well as underwater applications [7]. The information collected through a sensor network can be used more effectively if it is known where it is coming from and where it needs to be sent. Therefore, it is often very useful to know the positions of the sensor nodes in a network. The global positioning system is a very expensive solution for this [8]. Instead, techniques to estimate node positions are needed that rely only on measurements of distances and angles between neighboring nodes [2, 9]. Deployment of a sensor network for these applications can be done in a random fashion, e.g., dropped from an airplane in a disaster management application, or manually, e.g., fire alarm sensors in a facility or sensors planted underground for precision agriculture [5].

Localization in this setting is mostly based on optimizing some cost function that depends on the model uncertainties. The most widely used technique is based on maximizing a likelihood function, known as maximum likelihood estimation, which in general is equivalent to a non-convex optimization problem of high dimensionality [10, 11]. Both centralized and distributed algorithms have been used to solve the problem [12]. Centralized algorithms require that each sensor/agent sends its information to a central unit, where an estimate of the sensor positions can be computed using, for example, second-order optimization methods. The results are then sent back to the agents.

The disadvantage of these algorithms is that the processing in the central unit can be computationally heavy, especially when the number of sensors is large. Distributed algorithms overcome this obstacle. They solve the problem through collaboration and communication between several computational agents, which could correspond to the sensors, without the need for a centralized computational unit. The disadvantage of distributed algorithms is that they might not produce position estimates as accurate as those of centralized algorithms. Moreover, they might require excessive communication between the sensors, especially for large network sizes. The algorithm we propose in this paper lies between centralized and distributed algorithms in the sense that, instead of having one central unit as in centralized algorithms, the sensors are grouped together in a structured way, with one computational agent assigned to each group. The sensors send their measurements to their group's computational agent, and those agents in turn carry out the computations by communicating with one another. As a result, the computations in the proposed algorithm are not as heavy as in centralized algorithms, and the communication burden is not as intensive as in distributed algorithms in which all adjacent sensors communicate in order to find a solution to the localization problem.

There have been various techniques developed to distribute the computations. We now survey different distributed methods for localization. First, we discuss approaches based on the original non-convex maximum likelihood formulation; all of them address the maximum likelihood problem directly rather than a relaxation of it. The authors in [13] propose a distributed multidimensional scaling algorithm that minimizes multiple local nonlinear least-squares problems, each solved using quadratic majorizing functions. In [14], the authors present two distributed optimization approaches, namely a distributed gradient method with Barzilai-Borwein step sizes and a distributed Gauss-Newton approach. In [15], a decentralized algorithm is devised based on the incremental subgradient method. In [11], the authors propose a distributed alternating direction method of multipliers approach. To this end, an equivalent equality-constrained version of the original nonlinear least-squares problem is considered by introducing duplicate variables into the optimization problem, which allows for a distributed solution. In [16], the authors reformulate the problem to obtain a cost function with a Lipschitz-continuous gradient, which in turn enables them to propose a distributed algorithm based on a majorization-minimization approach. The main shortcoming of the surveyed approaches is that they take many iterations to converge, and hence are slow, since many communications are required to reach a solution. The reason is that they are either based on first-order optimization methods or, as is the case for the Gauss-Newton method, rely on a consensus algorithm to compute the search direction. It is also difficult to initialize these algorithms effectively, since the likelihood function might have several local maxima. The latter problem can be overcome by using some approximate localization algorithm that is easy to initialize, and then using its solution to initialize the non-convex optimization problem solver. Good approximate problems can be obtained from convex relaxations of the maximum likelihood problem.

We now continue the survey with methods based on convex relaxations of the maximum likelihood formulation. These are not only used for initializing non-convex formulations, but are also of interest in their own right, provided the relaxation yields a sufficiently accurate localization. A good survey of semi-definite programming relaxation methods is given in [4]. The authors in [3] and [12] use the relaxation in [4] to devise distributed algorithms based on the alternating direction method of multipliers and second-order cone programming, respectively. A nice property of these algorithms is that they have convergence guarantees. This, however, comes at the cost of solving a semi-definite program at every iteration of the algorithm, which imposes a substantial computational burden. In [17], a distributed algorithm is proposed in which only linear systems of equations have to be solved at each iteration; the computations are distributed using message-passing over a tree. Another way to decrease the computational cost at each iteration is to consider a disk relaxation of the localization problem instead of a semi-definite programming relaxation. Based on this idea, the authors in [1] and [18] devise distributed algorithms for solving the resulting problem that rely on projection-based methods and Nesterov's optimal gradient method, respectively. In [19], the authors propose a hybrid approach based on the disk relaxation in [1] and a semi-definite programming relaxation, fusing range and angular measurements. It should be stressed that the solutions from relaxation-based methods do not provide global optima. The quality of the solutions is highly dependent on how tight the relaxation is.

As mentioned above, solutions from relaxed formulations may be used to initialize the non-relaxed formulations. In [2], a semi-definite programming relaxation of the problem, combined with a regularization term, is used to initialize a gradient-descent method for solving the exact maximum likelihood problem. In [20], the authors propose a hybrid solution to the localization problem. To this end, they apply a distributed alternating direction method of multipliers approach in two stages. In the first stage, they use the disk relaxation formulation of the localization problem as in [1], and then there is a smooth transition to the second stage, where they use the original non-convex formulation as in [11].

Although the algorithm in [20] has a faster convergence rate than the one presented in [1], the number of communications per sensor is not significantly lower, and in addition there is an extra communication overhead due to the several duplicate variables that need to be passed among the sensors. Notice that the number of duplicate variables in each sensor is proportional to the number of sensors it can communicate with, which causes a considerable amount of computation and communication, especially if the network size is large. Also note that for the algorithms in both [1] and [20], the computations are distributed in such a way that each sensor has to carry out its own computations and exchange messages with adjacent sensors. The authors in [20] argue that this way of distributing the computations might require an excessive communication burden, especially for large network sizes. Because of this, they discuss the possibility of devising a regional reinterpretation of their algorithm, where the sensors are partitioned into regions and there is one computational agent per region, responsible for carrying out the computations and exchanging messages with the adjacent computational agents. We will see later that in our proposed algorithm, we also distribute the computations in such a way that not every sensor has to be involved in the computations.

In this paper, we propose a distributed algorithm based on the Levenberg-Marquardt method [21], with a localization accuracy that is better than that of the algorithm in [1], but with far fewer communications per sensor/agent. The accuracy is better because we solve the maximum likelihood problem and not only an approximation of it. This shows that the claim in [19], that the algorithm in [1] has localization accuracy equal to that of the one in [20], is debatable. We use an approximate estimate obtained from the relaxation-based algorithm presented in [1] as the initial starting point for our algorithm. Since the number of communications between agents in our algorithm is far smaller than in the algorithm presented in [1], our algorithm can be utilized on top of the algorithm in [1] in order to improve the estimate, achieving better accuracy with far fewer iterations than are used in [1]. Note that the algorithms in [1] and [20], which are based on Nesterov's gradient and alternating direction method of multipliers approaches, respectively, are first-order methods, whereas the Levenberg-Marquardt algorithm is a pseudo-second-order method, as it uses approximate Hessian information. It is known that, in general, second-order methods require fewer iterations to converge than first-order methods, because they use both gradient and curvature information of the objective function, whereas first-order methods rely solely on the gradient. As a result, the number of communications between agents in the distributed Levenberg-Marquardt algorithm is expected to be lower than for the algorithms in [1] and [20].

### 1.1 Contributions

We propose a distributed algorithm that solves the maximum likelihood localization problem to high accuracy using few communications and computations.

### 1.2 Example

In order to introduce the notation and to exemplify what results will be derived, a simple one-dimensional example will be considered. Relevant applications of this are, e.g., a metro line between two stations with anchors at the stations, or a mine tunnel with anchors at the intersections of tunnels. The anchors are positioned at \(p_{a}^{1}\) and \(p_{a}^{2}\), and the positions of the other sensors, \(p_{s}^{j}, j = 1, 2, 3, 4\), are such that \(p_{a}^{1} \leq p_{s}^{1} \leq \dots \leq p_{s}^{4} \leq p_{a}^{2}\). Moreover, we assume that each sensor can measure the distance to its adjacent sensors. We depict this in what is known as the inter-sensor measurement graph, shown to the left in Fig. 1. The nodes represent the sensors and anchors, and there is an edge between two nodes if they can measure the distance to one another. Assume that we are given measurements \(\mathcal R_{ij}\) between sensors *i* and *j*, and measurements \(\mathcal Y_{ij}\) between sensor *i* and anchor *j*, with Gaussian measurement errors with zero mean and unit variance. Then, the maximum-likelihood problem of estimating the positions of the sensors is equivalent to

$$\begin{array}{*{20}l} \underset{P}{\text{minimize}} \:\:\:\:&\left(p_{s}^{1} - p_{a}^{1} - \mathcal Y_{11}\right)^{2} + \left(p_{s}^{2} - p_{s}^{1} - \mathcal R_{12}\right)^{2} \\ &+\left(p_{s}^{3} - p_{s}^{2} - \mathcal R_{23}\right)^{2} + \left(p_{s}^{4} - p_{s}^{3} - \mathcal R_{34}\right)^{2} \\ & +(p_{a}^{2} - p_{s}^{4} - \mathcal Y_{42})^{2} \end{array} $$

where \(P = [p_{s}^{1} \dots p_{s}^{4}]\), which is a linear least-squares problem.
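As a concrete illustration, the linear least-squares problem above can be solved in closed form. The following sketch is self-contained but uses illustrative assumptions not taken from the paper: anchors placed at 0 and 5, evenly spaced true positions, and a small noise level.

```python
import numpy as np

# Illustrative setup (not from the paper): anchors at 0 and 5,
# true sensor positions evenly spaced in between.
pa1, pa2 = 0.0, 5.0
p_true = np.array([1.0, 2.0, 3.0, 4.0])

rng = np.random.default_rng(0)
sigma = 0.01  # small noise so the estimate is visibly close to p_true
Y11 = (p_true[0] - pa1) + sigma * rng.standard_normal()
R12 = (p_true[1] - p_true[0]) + sigma * rng.standard_normal()
R23 = (p_true[2] - p_true[1]) + sigma * rng.standard_normal()
R34 = (p_true[3] - p_true[2]) + sigma * rng.standard_normal()
Y42 = (pa2 - p_true[3]) + sigma * rng.standard_normal()

# Each squared term of the objective is one row of the residual A @ P - b.
A = np.array([[ 1.,  0.,  0.,  0.],
              [-1.,  1.,  0.,  0.],
              [ 0., -1.,  1.,  0.],
              [ 0.,  0., -1.,  1.],
              [ 0.,  0.,  0., -1.]])
b = np.array([Y11 + pa1, R12, R23, R34, Y42 - pa2])
P_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
print(P_hat)  # close to p_true
```

Note that each row of `A` has at most two nonzeros, reflecting the chain structure of the measurement graph; this sparsity is exactly what the clique-tree distribution below exploits.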

The maximal subgraphs of the inter-sensor measurement graph in Fig. 1 that are complete, i.e., contain an edge from every node to every other node, are called cliques and are given by \(C_{1} = \left \{p^{1}_{a}, p^{1}_{s}\right \}, C_{2} = \left \{p^{1}_{s}, p^{2}_{s}\right \}, C_{3} = \left \{p^{2}_{s}, p^{3}_{s}\right \}, C_{4} = \left \{p^{3}_{s}, p^{4}_{s}\right \}\) and \(C_{5} = \left \{p^{4}_{s}, p^{2}_{a}\right \}\). They can be arranged in a tree, as seen to the right in Fig. 1. This tree is called a clique tree. It is not unique, but it can always be arranged in such a way that any element in the intersection of two cliques is also an element of every clique on the path between the two cliques. This is called the clique intersection property. It is not possible to derive such a clique tree for an arbitrary inter-sensor measurement graph. For this to be possible, the graph has to be what is called chordal. We will discuss this in more detail later. For our example, however, the graph is chordal, i.e., any cycle of length four or more in the graph has a chord. It is now possible to solve the least-squares problem over this clique tree by using each of the cliques as a computational agent. This is done by associating terms of the objective function with different cliques. A valid assignment requires that the variables of the terms assigned to a clique belong to that clique. Therefore, we assign

$$ f_{1}(p_{s}^{1})=\left(p_{s}^{1} - p_{a}^{1} - \mathcal Y_{11}\right)^{2} $$

to *C*_{1},

$$ f_{2}\left(p_{s}^{1},p_{s}^{2}\right)= \left(p_{s}^{2} - p_{s}^{1} - \mathcal R_{12}\right)^{2} $$

to *C*_{2},

$$ f_{3}\left(p_{s}^{2},p_{s}^{3}\right)= \left(p_{s}^{3} - p_{s}^{2} - \mathcal R_{23}\right)^{2} $$

to *C*_{3},

$$ f_{4}\left(p_{s}^{3},p_{s}^{4}\right)= \left(p_{s}^{4} - p_{s}^{3} - \mathcal R_{34}\right)^{2} $$

to *C*_{4}, and

$$ f_{5}\left(p_{s}^{4}\right)=\left(p_{a}^{2} - p_{s}^{4} - \mathcal Y_{42}\right)^{2} $$

to *C*_{5}. Hence, the least-squares problem is equivalent to

$$ \underset{P}{\text{minimize}} \:\:\:\: f_{1}\left(p_{s}^{1}\right) + f_{2}\left(p_{s}^{1},p_{s}^{2}\right) + f_{3}\left(p_{s}^{2},p_{s}^{3}\right) + f_{4}\left(p_{s}^{3},p_{s}^{4}\right) + f_{5}\left(p_{s}^{4}\right) $$
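The clique intersection property that makes this decomposition valid is easy to verify programmatically. The sketch below (variable names are illustrative labels, not notation from the paper) checks it for the path-shaped clique tree \(C_1 - C_2 - \dots - C_5\):

```python
# Cliques of the chain-shaped measurement graph, listed in the order
# they appear along the path-shaped clique tree C1 - C2 - ... - C5.
cliques = [{"pa1", "ps1"}, {"ps1", "ps2"}, {"ps2", "ps3"},
           {"ps3", "ps4"}, {"ps4", "pa2"}]

def has_clique_intersection_property(path):
    """For a path-shaped clique tree: the intersection of any two
    cliques must be contained in every clique between them."""
    for i in range(len(path)):
        for j in range(i + 1, len(path)):
            common = path[i] & path[j]
            if not all(common <= path[k] for k in range(i + 1, j)):
                return False
    return True

print(has_clique_intersection_property(cliques))  # True
```

In this chain, non-adjacent cliques have empty intersections and adjacent cliques share exactly one position variable, so the property holds trivially; for general trees the same check runs over tree paths instead of index ranges.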

We then start with the leaf clique *C*_{5} and its corresponding function \(f_{5}\left (p^{4}_{s}\right)\) and minimize it with respect to the variables that are not shared with its parent. There is no such variable, and hence no minimization is carried out. We then let \(m_{54}\left (p^{4}_{s}\right)=f_{5}\left (p_{s}^{4}\right)\), which is called a message function or value function. This is added to the objective function corresponding to the parent of *C*_{5}, i.e., to \(f_{4}\left (p^{3}_{s}, p^{4}_{s}\right)\). Notice that any quadratic function can be represented by a matrix and a vector, and hence this is the only information that has to be passed to the parent. We then again minimize the resulting function with respect to the variables that are not shared with its parent, i.e., \(p^{4}_{s}\). Since the problem is convex and quadratic, this is equivalent to solving a linear equation, and after back-substitution of the solution, the objective function value will be a quadratic function of \(p^{3}_{s}\), which we denote by \(m_{43}\left (p^{3}_{s}\right)\). We then add this message function to the objective function corresponding to the parent of *C*_{4}, i.e., to \(f_{3}\left (p^{2}_{s}, p^{3}_{s}\right)\), and repeat the procedure until we reach the root clique. For the root clique *C*_{1}, we can now optimize \(f_{1}\left (p^{1}_{s}\right) + m_{21}\left (p^{1}_{s}\right)\) with respect to the remaining variable \(p^{1}_{s}\), where \(m_{21}(p^{1}_{s})\) is the message from the child clique. By passing this solution down the clique tree, the remaining optimal variables can be computed, assuming that the parametric solutions have been stored in the nodes of the clique tree.
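For a quadratic objective, the elimination sweep just described amounts to repeated Schur complements on the pair \((H, g)\) representing \(\tfrac{1}{2}P^{T}HP + g^{T}P\). A minimal sketch, assuming anchors at 0 and 5 and noise-free unit-spacing measurements (illustrative values, not from the paper):

```python
import numpy as np

# Build the quadratic 0.5 * P^T H P + g^T P from the example's
# stacked residuals A @ P - b (anchors assumed at 0 and 5,
# noise-free measurements all equal to 1).
pa1, pa2 = 0.0, 5.0
Y11 = R12 = R23 = R34 = Y42 = 1.0
A = np.array([[ 1.,  0.,  0.,  0.], [-1.,  1.,  0.,  0.],
              [ 0., -1.,  1.,  0.], [ 0.,  0., -1.,  1.],
              [ 0.,  0.,  0., -1.]])
b = np.array([Y11 + pa1, R12, R23, R34, Y42 - pa2])
H, g = 2 * A.T @ A, -2 * A.T @ b

# Upward sweep: eliminate p4, p3, p2 in turn. Each step is a Schur
# complement; the reduced (H, g) is the quadratic "message" passed to
# the parent, and the elimination rule is stored for back-substitution.
backsub = []
for _ in range(3):
    k = H.shape[0] - 1
    h12, h22 = H[:k, k], H[k, k]
    backsub.append((h12 / h22, g[k] / h22))   # x_k = -coeff.x - const
    H = H[:k, :k] - np.outer(h12, h12) / h22
    g = g[:k] - h12 * g[k] / h22

# Root: solve the remaining scalar problem for p1, then sweep back down.
P = [-g[0] / H[0, 0]]
for coeff, const in reversed(backsub):
    P.append(-coeff @ np.array(P) - const)
print(P)  # [1.0, 2.0, 3.0, 4.0] up to rounding
```

Only the reduced matrix and vector travel between agents, which is exactly the "matrix and a vector" message noted above.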

The fact that the problem is convex and quadratic makes it easy to compute the messages. In general, this is not the case, but we will use this procedure not for the optimization problem itself but for computing the search directions in a nonlinear least-squares method, in particular the Levenberg-Marquardt method [21]. The search-direction equations are linear and correspond to a quadratic approximation of the problem at the current iterate of the Levenberg-Marquardt method. All other computations in the Levenberg-Marquardt method also distribute over the clique tree. We see that what we are doing in this example is nothing but serial dynamic programming. In general, the clique tree will not be a chain, and then we will carry out dynamic programming or message passing over a tree; see [22] for details. The clique tree is not unique. For the example, we could just as well make *C*_{5} the root and *C*_{1} the leaf. Moreover, we could take *C*_{3} as the root and get two branches with *C*_{1} and *C*_{5} as leaves. This would facilitate parallel computations.
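To fix ideas before the distributed version is developed, a generic (non-distributed) Levenberg-Marquardt iteration can be sketched as follows. The damped normal equations \((J^{T}J + \lambda I)d = -J^{T}r\) are the linear equations referred to above. The trilateration setup below is an illustrative toy problem, not the paper's algorithm:

```python
import numpy as np

def levenberg_marquardt(r, J, x0, lam=1e-2, iters=50):
    """Minimal Levenberg-Marquardt sketch: damped Gauss-Newton steps
    (J^T J + lam*I) d = -J^T r with a simple accept/reject rule."""
    x = np.asarray(x0, dtype=float)
    cost = np.sum(r(x) ** 2)
    for _ in range(iters):
        Jx, rx = J(x), r(x)
        d = np.linalg.solve(Jx.T @ Jx + lam * np.eye(x.size), -Jx.T @ rx)
        new_cost = np.sum(r(x + d) ** 2)
        if new_cost < cost:      # step decreased the cost: accept it
            x, cost, lam = x + d, new_cost, lam / 2  # trust model more
        else:                    # step failed: increase the damping
            lam *= 4
    return x

# Toy 2-D trilateration (illustrative, not the paper's setup): one
# unknown sensor, three anchors, noise-free range measurements.
anchors = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]])
p_true = np.array([1.0, 1.0])
ranges = np.linalg.norm(anchors - p_true, axis=1)

def r(p):   # residuals ||p - a_k|| - R_k
    return np.linalg.norm(p - anchors, axis=1) - ranges

def J(p):   # Jacobian rows (p - a_k)^T / ||p - a_k||
    diff = p - anchors
    return diff / np.linalg.norm(diff, axis=1, keepdims=True)

p_hat = levenberg_marquardt(r, J, x0=np.array([2.0, 2.0]))
print(p_hat)  # converges to p_true
```

In the distributed setting of this paper, the linear solve inside the loop is what gets replaced by message passing over the clique tree.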

### 1.3 Outline

In Section 2, we review the maximum likelihood formulation of the localization problem. In Section 3, we discuss how to find the clique tree and how to assign subproblems in order to distribute the computations for a general optimization method. In Section 4, we review the Levenberg-Marquardt algorithm for solving nonlinear least-squares problems. In Section 5, we discuss how to distribute the computations in the Levenberg-Marquardt algorithm using the clique tree. Numerical experiments are presented in Section 6, and we conclude the paper in Section 7.

### 1.4 Notations and definitions

We denote by \(\mathbb {R}\) the set of real scalars and by \(\mathbb {R}^{n\times m}\) the set of real *n*×*m* matrices. The transpose of a matrix *A* is denoted by *A*^{T}. We denote the set of positive integers \(\{1,2,\dots,p\}\) by \(\mathbb {N}_{p}\). With *x*_{i}, we denote the *i*th component of the vector *x*. For a square matrix *X*, we denote by diag(*X*) a vector whose elements are the diagonal elements of *X*.

A graph is denoted by \(G(V, \mathcal {E})\), where \(V = \{1,\dots,n\}\) is its set of vertices or nodes and \(\mathcal {E} \subseteq V \times V\) denotes its set of edges. Vertices *i*,*j*∈*V* are adjacent if \((i, j) \in \mathcal {E}\), and we denote the set of adjacent vertices of *i* by \(Ne(i) = \left \{j \in V |(i, j) \in \mathcal {E}\right \}\). A graph is said to be complete if all its vertices are adjacent. The graph induced by *V*^{′}⊆*V* on \(Q(V, \mathcal {E})\) is the graph \(Q_{I}(V', \mathcal {E}')\), where \(\mathcal {E}' = \mathcal {E} \cap V' \times V'\). A clique *C*_{i} of \(Q(V, \mathcal {E})\) is a maximal subset of *V* that induces a complete subgraph on *Q*, i.e., no clique is properly contained in another clique [23]. Assume that all cycles of length at least four of \(Q(V, \mathcal {E})\) have a chord, where a chord is an edge between two non-consecutive vertices in a cycle. Such a graph is called chordal [24, Ch. 4]. It is possible to make a graph chordal by adding edges; the resulting graph is then referred to as a chordal embedding. Let \(C_{Q} = \left \{C_{1},\dots,C_{q}\right \}\) denote the set of its cliques, where *q* is the number of cliques of the graph. Then there exists a tree defined on *C*_{Q} such that for every *C*_{i},*C*_{j}∈*C*_{Q} with *i*≠*j*, *C*_{i}∩*C*_{j} is contained in all the cliques on the path connecting the two cliques in the tree. This property is called the clique intersection property [23]. Trees with this property are referred to as clique trees.
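Chordality can be tested with the classical maximum cardinality search of Tarjan and Yannakakis: a graph is chordal if and only if the reverse MCS order is a perfect elimination ordering. The following sketch (an illustrative implementation, not part of the paper's algorithm) performs this check:

```python
def is_chordal(adj):
    """Maximum cardinality search (Tarjan-Yannakakis) chordality test.
    adj maps each node to the set of its neighbours (undirected)."""
    weight = {v: 0 for v in adj}
    order, numbered = [], set()
    while len(order) < len(adj):
        # Pick an unnumbered vertex with the most numbered neighbours.
        v = max((u for u in adj if u not in numbered), key=weight.get)
        order.append(v)
        numbered.add(v)
        for u in adj[v]:
            if u not in numbered:
                weight[u] += 1
    # Perfect-elimination check: the earlier-numbered neighbours of each
    # vertex, minus the latest of them, must all be adjacent to it.
    pos = {v: i for i, v in enumerate(order)}
    for v in order:
        earlier = {u for u in adj[v] if pos[u] < pos[v]}
        if earlier:
            w = max(earlier, key=pos.get)
            if not (earlier - {w}) <= adj[w]:
                return False        # a chordless cycle exists
    return True

# The example's chain graph is (trivially) chordal; a 4-cycle is not.
chain = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4, 6}, 6: {5}}
cycle4 = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
print(is_chordal(chain), is_chordal(cycle4))  # True False
```

For a non-chordal measurement graph, edges flagged by such a test would be added to form the chordal embedding mentioned above.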