2.1 Overview
The TALA is a batch estimation algorithm that utilises all measurements generated within a time window by an array of sensors, in order to detect and localise an unknown number of target events (i.e. intermittent signals, such as acoustic impulses or radio frequency transmissions). Initially, the TALA generates “candidate” target locations, and then performs “soft” nearest neighbour data association (e.g. [21]), allowing each measurement to be associated with more than one candidate location. This approach removes the need to perform global multisensor/multitarget data association, e.g. as is necessary in [20], thereby maintaining computational feasibility, even for large-scale problems.
Using the measurements associated with each candidate location, ML estimation is then performed in order to localise each potential target. The ML estimation problem cannot be solved analytically, and an iterative GN approach (e.g. [24]) is used to solve an equivalent nonlinear least squares problem. The GN approach performs iterative gradient descent, and in order to combat potential divergence, line search and randomisation are used to ensure that each iteration increases the value of the likelihood. It is noted that alternative techniques, such as the Newton-Raphson (NR) approach (e.g. [25]) or the Levenberg-Marquardt algorithm [26, 27], could also be used to perform the gradient descent and may offer similar performance.
2.2 Summary of the main steps
The main steps in the TALA are as follows:

1.
Step 1: Determine initial candidate locations

If possible (generally only in two-dimensional emitter geolocation), determine the intersection between measurements generated by each pair of sensors.

More generally, determine a candidate location that minimises a Mahalanobis-based distance metric using measurements generated by each pair of sensors.

These points form the initial candidate (target) location set.

In cases for which performing measurement intersection or Mahalanobis distance minimisation is problematic/impossible, initial candidate locations should be randomly sampled within the surveillance region.

2.
Step 2: Associate measurements and determine likelihood for each candidate location

Determine the measurement from each sensor that has the greatest individual likelihood (or equivalently the smallest Mahalanobis distance) for each candidate location.

This measurement is associated with the location provided that the individual likelihood is greater than a pre-specified threshold value.

Each measurement is allowed to be associated with more than one candidate location.

The overall likelihood of each candidate location is calculated using all of the associated measurements.

3.
Step 3: Candidate location deletion

The number of candidate locations can be large.

To reduce the computational expense of the algorithm, at this stage, some of the candidate locations are deleted.

A candidate location is deleted if it has too few measurements associated with it, or if it shares identical associations with another candidate target location that has a greater overall likelihood.

Optionally, the candidate location is deleted if it shares any associations with another candidate target location that has a greater overall likelihood.

4.
Step 4: Maximum likelihood estimation

Using the candidate locations retained from Step 3, plus the measurements associated with each location, determine ML estimates via an iterative GN approach.

Optionally, measurement reassociation may be performed on each iteration of the GN algorithm.

5.
Step 5: Final downselection/outputs
An illustrative example of the TALA is shown in Figs. 1 and 2.
2.3 Step 1: Determine initial candidate locations
An N-sensor system is considered. Let n(i) denote the number of measurements, each of dimensionality d_{i}, generated by sensor i. Let z(i,j) denote the jth measurement generated by sensor i. It is assumed that target-generated measurements are corrupted by additive Gaussian noise. Hence, for a target located at coordinates \(\boldsymbol {X}\in \mathbb {R}^{3}\), each target-generated measurement at sensor i is given as follows:
$$\begin{array}{@{}rcl@{}} \boldsymbol{z}(i,.) &=& \boldsymbol{f}(\boldsymbol{X};i) + \boldsymbol{e}(i) \end{array} $$
(1)
where \(\boldsymbol {f}(\boldsymbol {X};i)\triangleq \left (f_{1}(\boldsymbol {X};i) \ \ldots \ f_{d_{i}}(\boldsymbol {X};i)\right)'\). Each measurement error \(\boldsymbol {e}(i)\sim {\mathcal N}(\boldsymbol{0}, \boldsymbol {\Sigma }_{i})\), with Σ_{i} denoting the error covariance of each target-generated measurement at sensor i.
The first step in the TALA is to generate a set of initial candidate location hypotheses, with these hypotheses then manipulated in order to determine ML estimates of the locations of an unknown number of targets. Therefore, it would seem prudent to choose candidate locations that are consistent with the measurements. To this end, the following methodology is used to generate initial candidate locations:

1.
If the focal problem is concerned with the geolocation of targets within a two-dimensional region (e.g. the geolocation of ground-based targets within a geographically flat region), initial candidate locations can be determined as the intersection of each pair of measurements (if such an intersecting point exists). In later simulations, the intersections between pairs of AOA measurements and pairs of DDOA measurements are used to generate initial candidate locations.

2.
For more complex applications in which measurement intersection cannot be performed (e.g. three-dimensional target geolocation, in which case the measurements are extremely unlikely to intersect because of the presence of measurement errors), for each pair of measurements \(\boldsymbol {\hat z}\triangleq \left (\boldsymbol {z}(i_{1},.)' \ \boldsymbol {z}(i_{2},.)'\right)'\), with i_{1}≠i_{2}, a candidate location X_{c} can be determined by minimising the Mahalanobis distance between \(\boldsymbol {\hat z}\) and \(\boldsymbol {\hat f}(\boldsymbol {X})\triangleq \left (\boldsymbol {f}(\boldsymbol {X};i_{1})' \ \boldsymbol {f}(\boldsymbol {X};i_{2})'\right)'\), i.e.
$$\begin{array}{@{}rcl@{}} \boldsymbol{X}_{c} & = & \mathop{\text{arg min}}\limits_{\boldsymbol{X}\in \mathbb{R}^{3}} \left[ \boldsymbol{\epsilon}(\boldsymbol{X})' \boldsymbol{\hat \Sigma}^{-1} \boldsymbol{\epsilon}(\boldsymbol{X}) \right] \end{array} $$
(2)
where \(\boldsymbol {\epsilon }(\boldsymbol {X})\triangleq (\boldsymbol {\hat z}-\boldsymbol {\hat f}(\boldsymbol {X}))\); and \(\boldsymbol {\hat \Sigma }\) is the error covariance of the measurement \(\boldsymbol {\hat z}\).
It may be necessary to limit the number of candidate locations by not considering all combinations of sensor measurements in determining the intersections (in two-dimensional applications) or minimising (2) (in three-dimensional applications). Moreover, in three-dimensional geolocation applications, the optimisation in Eq. (2) may not be straightforward, and it may be more efficient to randomly select candidate locations within the surveillance region.
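The pair-wise initialisation of Eq. (2) can be sketched as follows. This is an illustrative example only: a two-dimensional AOA (bearing) model is assumed, and the sensor positions, 1-degree bearing errors, and coarse-to-fine grid search are assumptions, not values or methods prescribed by the text.

```python
import numpy as np

# Hypothetical 2-D AOA model: f(X; i) is the bearing from sensor i to X.
sensors = np.array([[0.0, 0.0], [1000.0, 0.0]])  # assumed sensor positions (m)

def f_hat(X):
    """Stacked predicted bearings (radians) from the sensor pair to X."""
    d = X - sensors
    return np.arctan2(d[:, 1], d[:, 0])

z_hat = f_hat(np.array([400.0, 700.0]))          # noise-free measurement pair
Sigma_hat = np.diag([np.radians(1.0) ** 2] * 2)  # assumed bearing error covariance

def mahalanobis_sq(X):
    eps = z_hat - f_hat(X)                       # epsilon(X) of Eq. (2)
    return eps @ np.linalg.solve(Sigma_hat, eps)

# Minimise Eq. (2) by a simple coarse-to-fine grid search over the region.
X_c, half = np.array([500.0, 500.0]), 1000.0
for _ in range(12):                              # halve the search window each pass
    xs = np.linspace(X_c[0] - half, X_c[0] + half, 21)
    ys = np.linspace(X_c[1] - half, X_c[1] + half, 21)
    grid = np.array([[x, y] for x in xs for y in ys])
    X_c = grid[np.argmin([mahalanobis_sq(p) for p in grid])]
    half /= 2.0
```

With noise-free measurements the minimiser recovers the generating location; in practice any local optimiser (including the GN iteration of Step 4) could replace the grid search.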
2.4 Step 2: Associate measurements and determine likelihood for each candidate location
It is assumed that for each sensor, a maximum of one measurement is generated by each target in the time window under consideration. Furthermore, for each candidate target location X, and each sensor i:

1.
The measurement that is associated is the one whose index a(X;i)∈{1,…,n(i)} gives the largest individual likelihood, i.e. nearest neighbour data association is performed (e.g. see [21]).

2.
If every measurement generated by a sensor has an individual likelihood that is less than 100ξ% of the maximum value l_{i}(max), then no measurement from that sensor is associated with the location. The threshold ξ∈[0,1] is pre-specified. It is noted that this approach is equivalent to gating the measurement Mahalanobis distance with a threshold \(g= \sqrt {-2\ln \xi }\).
Therefore, a(X;i) is given as follows:
$$\begin{array}{@{}rcl@{}}a(\boldsymbol{X};i) & \triangleq & \left\{ {{\begin{array}{ll} \underset{j=1,\ldots,n(i)}{\text{arg max}} \ {l_{i}(\boldsymbol{X};j)} &\quad \text{if} \ \underset{j=1,\ldots,n(i)}{\max} l_{i}(\boldsymbol{X};j) \geq \xi \times l_{i}(\max)\qquad \\ \\ -1 &\quad \text{otherwise} \end{array}}} \right.\\ \end{array} $$
(3)
where l_{i}(X;j) denotes the individual likelihood at sensor i as a result of associating the jth measurement z(i,j) with candidate target location X. For the measurement model (1), this individual likelihood is given as follows:
$$ l_{i}(\boldsymbol{X};j) = \frac{1}{(2\pi)^{d_{i}/2}\det(\boldsymbol{\Sigma}_{i})^{1/2}}\exp\left(-\frac{1}{2} \left[\boldsymbol{z}(i,j)-\boldsymbol{f}(\boldsymbol{X};i)\right]' \boldsymbol{\Sigma}_{i}^{-1}\left[\boldsymbol{z}(i,j)-\boldsymbol{f}(\boldsymbol{X};i)\right]\right) $$
(4)
It is noted that a(X;i)=−1 denotes that no measurement from sensor i is associated with the location X.
The overall measurement likelihood is then given as follows:
$$ L(\boldsymbol{X}) = \frac{1}{(2\pi)^{D/2}\det(\boldsymbol{\Sigma})^{1/2}}\exp\left\{-\frac{1}{2}[\boldsymbol{Z}-\boldsymbol{f}(\boldsymbol{X})]' \boldsymbol{\Sigma}^{-1}[\boldsymbol{Z}-\boldsymbol{f}(\boldsymbol{X})]\right\} $$
(5)
where:
$$\begin{array}{@{}rcl@{}} N_{a} &\triangleq& \text{total number of measurements associated with the location \(\boldsymbol{X}\)} \end{array} $$
(6)
$$\begin{array}{@{}rcl@{}} &=&\sum_{i=1}^{N} \sum_{a(\boldsymbol{X};i)>-1}1 \end{array} $$
(7)
$$\begin{array}{@{}rcl@{}} \boldsymbol{Z} &\triangleq& \text{concatenated vector of associated measurements} \end{array} $$
(8)
$$\begin{array}{@{}rcl@{}} & = &\left(\boldsymbol{z}(1,a(\boldsymbol{X};1))' \ \ldots \ \boldsymbol{z}(N_{a},a(\boldsymbol{X};N_{a}))'\right)' \end{array} $$
(9)
$$\begin{array}{@{}rcl@{}} && \text{(with the sensor indices reordered to } 1, \ldots, N_{a}\text{)}\\ \boldsymbol{f}(\boldsymbol{X}) &=& \left(\boldsymbol{f}(\boldsymbol{X};1)' \ \ldots \ \boldsymbol{f}(\boldsymbol{X};N_{a})'\right)' \end{array} $$
(10)
$$\begin{array}{@{}rcl@{}} D &=& \text{dimensionality of the concatenated vector of all associated measurements} \end{array} $$
(11)
$$\begin{array}{@{}rcl@{}} &=& \sum_{i=1}^{N} \sum_{a(\boldsymbol{X};i)>-1} d_{i} \end{array} $$
(12)
$$\begin{array}{@{}rcl@{}} \boldsymbol{\Sigma} &=& \left(\begin{array}{cccc} \boldsymbol{\Sigma}_{1} & \boldsymbol{\Sigma}_{2,1} & \ldots & \boldsymbol{\Sigma}_{N_{a},1} \\ \boldsymbol{\Sigma}_{1,2} & \boldsymbol{\Sigma}_{2} & \ddots & \vdots \\ \vdots & \ddots & \ddots & \boldsymbol{\Sigma}_{N_{a},N_{a}-1} \\ \boldsymbol{\Sigma}_{1,N_{a}} & \ldots & \boldsymbol{\Sigma}_{N_{a}-1,N_{a}} & \boldsymbol{\Sigma}_{N_{a}} \end{array} \right) \end{array} $$
(13)
$$\begin{array}{@{}rcl@{}} \boldsymbol{\Sigma}_{i,j} & \triangleq & \text{correlation between the measurements at sensors \(i\) and \(j\)} \end{array} $$
(14)
It is noted that if the measurements from all sensors are uncorrelated (i.e. Σ_{i,j}=0 for all i≠j), the overall measurement likelihood at each candidate location X is given by the product of the individual likelihood values of the associated measurements, i.e.
$$\begin{array}{@{}rcl@{}} L(\boldsymbol{X}) & = & \prod_{i=1}^{N} \prod_{a(\boldsymbol{X};i)>-1} l_{i}(\boldsymbol{X};a(\boldsymbol{X};i)) \end{array} $$
(15)
More importantly, in this case, the ensemble of measurements that satisfy Eq. (3), for i=1,…, N, also maximises the overall measurement likelihood.
There is no practical reason why the nearest neighbour data association approach cannot be used if the measurements from different sensors are correlated. However, it should be noted that the resulting measurement set is not guaranteed to be close to optimal in maximising the overall measurement likelihood. In such cases, performing measurement reassociation during gradient descent (see Section 2.6.3) may be helpful in correctly resolving the complex multisensor/multitarget data association problem.
An exemplar likelihood map is shown in Fig. 1. It is noted that this map is shown for illustration only. The reader is reminded that the TALA calculates the likelihood only at the initial candidate locations and at the locations determined on subsequent iterations of the gradient descent algorithm.
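The gated nearest neighbour association of Eqs. (3)–(4) can be sketched as below, using the equivalence between the likelihood gate ξ and the Mahalanobis gate g = √(−2 ln ξ). The scalar identity measurement model and the toy measurements are assumptions made purely for illustration.

```python
import numpy as np

def associate(X, measurements, f, Sigmas, xi=0.05):
    """a(X; i) of Eq. (3) for each sensor i: pick the measurement with the
    smallest squared Mahalanobis distance (equivalently, largest individual
    likelihood), gated at g^2 = -2 ln(xi); -1 means no measurement associated."""
    a = []
    for i, Z in enumerate(measurements):
        Sinv = np.linalg.inv(Sigmas[i])
        r = Z - f(X, i)                            # residuals, one row per measurement
        d2 = np.einsum("nd,dk,nk->n", r, Sinv, r)  # squared Mahalanobis distances
        j = int(np.argmin(d2))
        a.append(j if d2[j] <= -2.0 * np.log(xi) else -1)
    return a

# Toy example: two scalar sensors with an assumed identity model f(X; i) = X.
f = lambda X, i: np.array([X])
meas = [np.array([[0.1], [5.0]]), np.array([[9.0]])]
a = associate(0.0, meas, f, [np.eye(1), np.eye(1)])
```

Here the first sensor's measurement 0.1 falls inside the gate (distance² = 0.01 ≤ −2 ln 0.05 ≈ 5.99) while the second sensor's only measurement is gated out, so `a == [0, -1]`.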
2.5 Step 3: Candidate location deletion
Clearly, the number of candidate locations can be large. To reduce the computational expense of the algorithm, at this stage, some of the candidate locations are deleted. A candidate location is deleted if any of the following are true.

1.
Deletion criterion 1: The candidate location does not have at least μP_{d}N measurements associated with it (i.e. it is not consistent enough with the data). This value is set by noting that the average number of measurements generated by each target is P_{d}N for a system with N sensors and a probability P_{d} that each target is detected by each sensor. In simulations, a value of μ=0.5 was shown to produce excellent results.

2.
Deletion criterion 2: The candidate location has exactly the same measurements associated with it as another candidate location that has greater overall likelihood.

3.
Deletion criterion 3 (optional): The candidate location has one or more measurements associated with it that are also associated with another candidate location that has greater overall likelihood. The procedure for implementing this deletion criterion is as follows:

(a)
The overall likelihood is calculated for each candidate location, using the procedure described in Step 2 of the algorithm.

(b)
The candidate location with the greatest overall likelihood is accepted as a potential target location.

(c)
Recursively, consider the candidate location with the next greatest likelihood. If this candidate location does not share any associations with any of the previously accepted candidate locations, it is also accepted as a potential target location; otherwise, it is deleted.
Deletion criterion 3 has the advantage of significantly reducing the number of candidate locations that need to be manipulated, and this can significantly reduce the computational expense of the algorithm. The disadvantage is that by deleting candidate locations at this early stage, the TALA has a reduced probability of detecting all target events. This criterion therefore trades estimator performance for increased computational speed.
In Fig. 2b, the results of the intersection deletion step are shown for the exemplar scenario. It is noted that deletion criterion 3 is not used in this example.
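The greedy acceptance procedure of deletion criterion 3 (also reused for the final downselection in Step 5) can be sketched as follows; representing each candidate as a (likelihood, set of association identifiers) pair is an illustrative choice, not the paper's data structure.

```python
def greedy_downselect(candidates):
    """Deletion criterion 3: accept candidates in decreasing order of overall
    likelihood, deleting any candidate that shares a measurement association
    with an already-accepted candidate.

    candidates: list of (likelihood, set_of_association_ids) pairs."""
    accepted, used = [], set()
    for L, assoc in sorted(candidates, key=lambda c: -c[0]):
        if assoc & used:
            continue                  # shares an association: delete this candidate
        accepted.append((L, assoc))   # accept as a potential target location
        used |= assoc
    return accepted

# Toy candidates labelled by hypothetical "sensor:measurement" identifiers.
cands = [(0.9, {"s0:m1", "s1:m0"}), (0.5, {"s1:m0", "s2:m2"}), (0.4, {"s2:m0"})]
kept = greedy_downselect(cands)
```

The middle candidate shares `s1:m0` with the most likely one and is deleted, so only the candidates with likelihoods 0.9 and 0.4 survive.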
2.6 Step 4: Maximum likelihood estimation
2.6.1 Background — standard Gauss-Newton approach
Consider the set of N_{a} measurements associated with a candidate location, calculated via Eq. (3). The ML estimate \(\hat {\boldsymbol {X}}_{MLE}\) of the target location, based on these measurements, is given as follows:
$$\begin{array}{@{}rcl@{}} \hat{\boldsymbol{X}}_{MLE} & = & \mathop{\text{arg max}}\limits_{\boldsymbol{X}} L(\boldsymbol{X}) \end{array} $$
(16)
$$\begin{array}{@{}rcl@{}} & = & \mathop{\text{arg min}}\limits_{\boldsymbol{X}} \left[ [\boldsymbol{Z}-\boldsymbol{f}(\boldsymbol{X})]' \boldsymbol{\Sigma}^{-1}[\boldsymbol{Z}-\boldsymbol{f}(\boldsymbol{X})]\right] \end{array} $$
(17)
with Z, f(X), and Σ given in Eqs. (9), (10), and (13) respectively.
The nonlinear least squares problem (17) can be solved using the GN approach (e.g. [24]). The GN approach performs iterative gradient descent, starting with an initial estimate X_{0}. It generates a sequence of estimates as follows:
$$\begin{array}{@{}rcl@{}} \boldsymbol{X}_{k+1} &=& \boldsymbol{X}_{k} + \boldsymbol{\delta}_{k} \end{array} $$
(18)
where the full “Newton step” δ_{k} is given as follows:
$$ \boldsymbol{\delta}_{k} =\left[ \boldsymbol{F}(\boldsymbol{X}_{k})'\boldsymbol{\Sigma}^{-1}\boldsymbol{F}(\boldsymbol{X}_{k}) \right]^{-1} \boldsymbol{F}(\boldsymbol{X}_{k})' \boldsymbol{\Sigma}^{-1}[\boldsymbol{Z}-\boldsymbol{f}(\boldsymbol{X}_{k})] $$
(19)
The Jacobian matrix F(X_{k}) is given as follows:
$$ \boldsymbol{F}(\boldsymbol{X}_{k}) = \left(\nabla_{\boldsymbol{X}_{k}}\,\boldsymbol{f}(\boldsymbol{X}_{k})' \right)' = \left(\nabla_{\boldsymbol{X}_{k}}\,\boldsymbol{f}(\boldsymbol{X}_{k};1)' \ \ldots \ \nabla_{\boldsymbol{X}_{k}}\,\boldsymbol{f}(\boldsymbol{X}_{k};N_{a})'\right)' $$
(20)
where \(\nabla _{\boldsymbol {X}_{k}}\phantom {\dot {i}\!}\) is the firstorder partial derivative operator with respect to \(\boldsymbol {X}_{k}\in \mathbb {R}^{3}\).
If the iterative scheme given in Eq. (18) converges, it will do so to a stationary point, thereby providing a ML estimate. However, convergence is not guaranteed and is highly dependent on the proximity of the initial estimate X_{0} to the stationary value.
2.6.2 Implementation — Gauss-Newton approach with an adaptive step size
In light of the potential for the GN approach to diverge, the implementation herein allows steps smaller than, and in the opposite direction to, the full “Newton step”, whilst attempting to maximise the increase in the overall measurement likelihood on each iteration. Specifically, the GN approach is initialised with each initial candidate location. On each subsequent iteration, the location is modified as follows:
$$\begin{array}{@{}rcl@{}} \boldsymbol{X}_{k+1} &=& \boldsymbol{X}_{k} + \boldsymbol{\Lambda}_{k} \end{array} $$
(21)
where, either:

Λ_{k} is the increment from the set {αδ_{k}/m: α=−m,…,−1,1,…,m} that results in the greatest increase in the overall measurement likelihood, where δ_{k} is the full Newton step (19) and m is a pre-specified positive integer;
or, if no step from the above set increases the overall measurement likelihood:

Λ_{k} is a step in a randomly generated direction (i.e. drawn from a Uniform distribution on [−π,π]) of magnitude δ_{M} (nominally, δ_{M} = 200 metres). This random step is accepted if it increases the overall measurement likelihood.
The GN approach is terminated if either:

1.
A total of 20 random steps have been attempted.

2.
The magnitude of each component of the gradient of the normalised sum-of-squared errors (GNSSE) is smaller than a pre-specified value (nominally 10^{−3}). Only in this case is successful convergence to a ML estimate deemed to have been achieved.
This “line search” adaptation of the GN approach is similar to the line search approach detailed in Section 9.7 of [25]. In Fig. 2c, ML estimates calculated using the GN approach are shown for the exemplar scenario.
2.6.3 Reassociation during gradient descent
In scenarios in which the measurement errors are large, each initial candidate location (e.g. generated from the intersection of a pair of measurements) may be distant from the ML estimate. In such cases, the measurements associated with the initial candidate location may not be the nearest to each of the subsequent iterates X_{k}, k=1,2,…, of the GN algorithm.

Motivated by this, in cases in which the measurements are inaccurate, reassociation can be performed after each iteration of the GN approach. That is, having determined iterate X_{k}, reassociation is performed, and the measurements associated with location X_{k} are used to determine the next increment δ_{k} and next iterate X_{k+1}.
Performing reassociation can significantly improve performance when measurement errors are large. However, this is at the cost of (i) increasing the computational expense of the algorithm and (ii) making the algorithm less likely to converge to a ML estimate, hence reducing the number of target events located.
2.7 Step 5: Final downselection/outputs
Having determined the ML estimates in Step 4, downselection is performed in order to ensure that each measurement is associated with no more than one ML estimate. The procedure for performing this downselection is exactly the same as given in the optional deletion criterion 3 of Step 3. It is noted that if the optional criterion is performed in Step 3, and provided that reassociation is not performed during the gradient descent in Step 4, then this downselection has already been performed.
A final downselection step also deletes estimates that lie within the sensor perimeter. Such estimates are rare, but can occur because of incorrect associations, or convergence to the wrong point of intersection of the associated measurements.
The remaining ML estimates provide estimates of the target event locations. The approximate error covariance (denoted \(\boldsymbol {\mathcal C}(\boldsymbol {X}^{\star })\)) of each estimate X^{⋆} is given by the inverse of the observed Fisher information matrix [28]. This covariance is as follows:
$$\begin{array}{@{}rcl@{}} \boldsymbol{\mathcal C}(\boldsymbol{X}^{\star}) &\approx & \left[\boldsymbol{F}(\boldsymbol{X}^{\star})'\boldsymbol{\Sigma}^{-1}\boldsymbol{F}(\boldsymbol{X}^{\star})\right]^{-1} \end{array} $$
(22)
The matrix Σ is again given by Eq. (13), and the matrix F(.) is given by Eq. (20). In Fig. 2d, the final outputs of the target localisation algorithm are shown for the exemplar scenario.