### 2.1 Scattering model

#### 2.1.1 Common scattering model

In previous studies, a variety of scattering models have been proposed, among which the most commonly used are the ROS and DOS models, as shown in Fig. 2. The ROS model assumes that the scattering points are randomly distributed on a circle of a given radius centered on the target, whereas the DOS model assumes that the scattering points obey a two-dimensional Gaussian distribution within the circle. In the traditional scheme, each multipath that is introduced adds an unknown parameter to the system of equations, and too many unknown parameters complicate the solving process.

#### 2.1.2 Scattering area model

To address the insufficient information available in traditional NLOS single-station positioning systems, this paper proposes a new signal reflection model for the NLOS environment. When obstacles in the environment are dense, there are often areas near the fixed BSs that reflect signals, such as tall or dense buildings. The signal transmitted by the terminal is mainly reflected in these areas, so the scattering points are mainly distributed there. To facilitate analysis, the scattering area model in this paper is designed as follows. Based on the spatial layout of the environment near the BS, each scattering area is set as a circular area with a fixed center and a fixed radius. The scattering points lie inside the scattering area and obey a two-dimensional Gaussian distribution whose mean is the center coordinate of the scattering area. The signal entering the scattering area is reflected by these scattering points and received by the BS. The scattering area model is shown in Fig. 3.

In Fig. 3, each blue solid circle is the center of a scattering area (*x*_{i},*y*_{i}),*i*=1,2,...,*N*, where *N* is the number of scattering areas, and each dotted circle is a defined scattering area with radius *r*_{i}. The scattering points inside each circle obey a two-dimensional Gaussian distribution whose mean is the center of the scattering area. In practice, scattering points that can reflect signals are not limited to the defined scattering areas but also exist far away from them. In this paper, these are called interference scattering points and are shown as red triangles in Fig. 3. The signal transmitted by the target is reflected by a scattering point and received by the BS. Letting *n* be the number of measured multipaths, the coordinates of the scattering points are (*x*_{sj},*y*_{sj}),*j*=1,2,...,*n*, and (*x*_{B},*y*_{B}) is the coordinate of the BS.
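The scattering area model described above can be simulated with a short sketch. The function name `sample_scatterers`, the choice of standard deviation *r*_{i}/3, the redrawing of samples that fall outside the circle, and the uniform placement of interference points are all assumptions made for illustration; the text does not specify how the Gaussian spread is tied to the radius.

```python
import math
import random

def sample_scatterers(centers, radii, per_area, n_interf, extent, seed=0):
    """Draw scattering points for the scattering area model.

    centers: list of area centers (x_i, y_i); radii: list of r_i.
    Returns a list of (x, y, area_index); area_index is -1 for
    interference scattering points.
    """
    rng = random.Random(seed)
    points = []
    for idx, ((cx, cy), r) in enumerate(zip(centers, radii)):
        sigma = r / 3.0  # assumption: 3-sigma spread equals the radius
        kept = 0
        while kept < per_area:
            x, y = rng.gauss(cx, sigma), rng.gauss(cy, sigma)
            # assumption: samples outside the circle are redrawn
            if math.hypot(x - cx, y - cy) <= r:
                points.append((x, y, idx))
                kept += 1
    kept = 0
    while kept < n_interf:
        # assumption: interference points are uniform over a square
        # of half-width `extent` and lie outside every scattering area
        x, y = rng.uniform(-extent, extent), rng.uniform(-extent, extent)
        if all(math.hypot(x - cx, y - cy) > r
               for (cx, cy), r in zip(centers, radii)):
            points.append((x, y, -1))
            kept += 1
    return points
```

For example, `sample_scatterers([(50.0, 60.0), (-40.0, 80.0)], [10.0, 5.0], 20, 4, 150.0)` yields 20 points in each of the two areas plus 4 interference points.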

#### 2.1.3 Measurement parameters

In this scheme, the TOA and the AOA are extracted from the channel state information (CSI). Under NLOS propagation, the TOA of the signal is the flight time of the signal from the terminal to the scattering point and then from the scattering point to the BS. The AOA is the angle of the scattering point relative to the BS. In 4G and 5G wireless communication systems, the wireless signals rely on orthogonal frequency division multiplexing (OFDM) to divide the system frequency band into several separate subcarriers. The CSI on each subcarrier contains the relevant parameters of the carrier signal, and the information needed for positioning can be extracted by using super-resolution parameter estimation. The scheme proposed in this paper works in the NLOS environment, that is, an environment with dense obstacles and no LOS path.

The AOA and the TOA of the multipath signals arriving at the BS are represented as (*θ*_{j},*τ*_{j}),*j*=1,2,...,*n*, where *θ*_{j} is the AOA and *τ*_{j} is the TOA of the *j*th multipath received at the BS. The propagation speed of electromagnetic waves is taken to be constant and denoted by *c*, so the relationship between the measured parameters and the scattering point is expressed as

$$ \left\{ {\begin{array}{*{20}{l}} {{\theta_{j}}{{ = }}\arctan \left({\frac{{{y_{sj}} - {y_{B}}}}{{{x_{sj}} - {x_{B}}}}}\right)}\\ {c{\tau_{j}} = c{{\tau '}_{j}} + \sqrt {{{\left({{x_{sj}} - {x_{B}}}\right)}^{2}} + {{\left({{y_{sj}} - {y_{B}}}\right)}^{2}}} } \end{array}} \right. $$

(1)

where *τ*^{′}_{j} is the time for the signal propagating from the target terminal to the *j*th scattering point.
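As a minimal sketch of (1), the following function computes the noise-free measurement pair for one single-bounce path. The function name `measure` and the numerical value of `C` are illustrative assumptions, and `math.atan2` is used in place of the plain arctangent in (1) so that the angle is resolved in the correct quadrant.

```python
import math

C = 299_792_458.0  # assumed propagation speed c in m/s

def measure(target, scatterer, bs, c=C):
    """Return (theta_j, tau_j) for one path target -> scatterer -> BS."""
    x, y = target
    xs, ys = scatterer
    xb, yb = bs
    # AOA of the scattering point as seen from the BS; atan2 is a
    # substitution for arctan that keeps the quadrant information
    theta = math.atan2(ys - yb, xs - xb)
    d_target_scatterer = math.hypot(xs - x, ys - y)   # equals c * tau'_j
    d_scatterer_bs = math.hypot(xs - xb, ys - yb)
    tau = (d_target_scatterer + d_scatterer_bs) / c
    return theta, tau
```

With the BS at the origin, a scattering point at (30, 40), and a target at (30, 100), the two legs are 60 m and 50 m, so *cτ*_{j} is 110 m.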

### 2.2 Joint clustering algorithm

According to the model constructed in this paper, the scattering points in the same scattering area are aggregated, so the corresponding AOAs and TOAs have similar distributions in the two-dimensional space and a clustering algorithm can be used to process these parameters. Clustering is a type of unsupervised learning that can be used to classify data without labels. We let (*θ*_{j},*τ*_{j}) be the measurement parameters of the BS and the sample set used for clustering be *S*={*s*_{1},*s*_{2},⋯,*s*_{n}}, where *s*_{j} is the sample parameter, as shown in (2).

$$ {s_{j}} = \left\{ {\left. {\left({{\alpha_{j}},{\beta_{j}}}\right)} \right|{\alpha_{j}} = c{\tau_{j}} \cdot \cos \left({{\theta_{j}}}\right),{\beta_{j}} = c{\tau_{j}} \cdot \sin \left({{\theta_{j}}}\right)} \right\} $$

(2)

Here, *s*_{j} is also called the pseudo-target coordinate value determined by the *j*th multipath measurement parameter, so each scattering point (*x*_{sj},*y*_{sj}) corresponds to a sample parameter *s*_{j}.
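The mapping in (2) can be sketched as follows. The function name `pseudo_target` and the value of `C` are assumptions for illustration. Because (2) contains no BS offset, the returned coordinates are relative to the BS: the pseudo-target lies on the ray from the BS through the scattering point, at the full path length *cτ*_{j}.

```python
import math

C = 299_792_458.0  # assumed propagation speed c in m/s

def pseudo_target(theta, tau, c=C):
    """Map one (AOA, TOA) pair to the sample s_j = (alpha_j, beta_j) of (2)."""
    path_length = c * tau
    return path_length * math.cos(theta), path_length * math.sin(theta)
```

For a path with *cτ*_{j} = 110 m arriving from the direction of (30, 40), the pseudo-target is (66, 88), which is farther from the BS than the scattering point itself.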

The scattering point \(\left ({x_{sj}^{i},y_{sj}^{i}}\right)\) in the scattering area *i* is regarded as obeying a Gaussian distribution whose expectation is the center of the scattering area (*x*_{i},*y*_{i}). If there are enough scattering points, the mean of the scattering point coordinates in a scattering area approximately equals the center of the scattering area (*x*_{i},*y*_{i}). Then, the average of the pseudo-target coordinate values corresponding to all the scattering points in the scattering area *i* is approximately equal to the pseudo-target coordinate value formed by a signal reflected at the center of the scattering area (*x*_{i},*y*_{i}). If the cluster average is taken as the pseudo-target coordinate value of the signal reflected from the center of the scattering area (*x*_{i},*y*_{i}), and the corresponding AOA and propagation distance are calculated from this average, a set of equations can be established for positioning.

Many clustering algorithms exist in the literature. Among them, the k-means clustering algorithm measures similarity by the Euclidean distance between data points, so it is simple to operate and its time complexity is generally close to linear [26]. However, the result of k-means clustering is easily affected by noise and isolated sample points. The result of k-means clustering is shown in Fig. 4. It can be seen that an outlier participating in clustering has a great influence on the clustering result. This problem can be solved by using the mean shift clustering algorithm [27], which measures similarity by the density of the data distribution. Because the interference points are scattered and deviate from the main scattering area, the algorithm separates them into their own clusters. However, the sliding window size of mean shift clustering has an important influence on clustering performance. In this paper, the window size is related to the radius of the scattering area. If all data points are clustered in a single pass, the window size must be set according to the largest scattering area, and the clustering performance for the scattering areas with smaller radii is then poor. The results of mean shift clustering are shown in Fig. 5, in which the dashed circle represents the size of the clustering window. A fixed-size sliding window does not cluster well when the radii of the scattering areas differ too much, and some points are not even assigned to the correct cluster.

In this paper, the two clustering algorithms are combined into a joint clustering algorithm. First, all the data are roughly partitioned by k-means clustering, and then mean shift clustering is applied to each k-means result to filter out the interference points. After k-means clustering, we obtain the parametric clustering results (*α*^{′}_{i},*β*^{′}_{i}). Each result corresponds to a scattering area with center (*x*_{i},*y*_{i}) and radius *r*_{i}. For each k-means clustering result, the sliding window size of the mean shift clustering is

$$ {w_{i}} = {r_{i}} \cdot \frac{{2\sqrt {{{\left({{{\bar \alpha }_{i}} - {x_{B}}}\right)}^{2}} + {{\left({{{\bar \beta }_{i}} - {y_{B}}}\right)}^{2}}} }}{{\sqrt {{{\left({{x_{i}} - {x_{B}}}\right)}^{2}} + {{\left({{y_{i}} - {y_{B}}}\right)}^{2}}} }} $$

(3)
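A direct transcription of (3) is sketched below; the function and argument names are assumptions for illustration. The window grows with the ratio of the distance between the cluster mean and the BS to the distance between the scattering area center and the BS, so a scattering area whose pseudo-targets lie farther out receives a proportionally larger window.

```python
import math

def window_size(r_i, cluster_mean, area_center, bs):
    """Sliding-window size w_i of (3) for one k-means cluster."""
    a, b = cluster_mean          # cluster mean, written with bars in (3)
    xi, yi = area_center         # scattering area center (x_i, y_i)
    xb, yb = bs                  # BS coordinate (x_B, y_B)
    numerator = 2.0 * math.hypot(a - xb, b - yb)
    denominator = math.hypot(xi - xb, yi - yb)
    return r_i * numerator / denominator
```

For instance, with *r*_{i} = 5, a cluster mean of (66, 88), an area center of (30, 40), and the BS at the origin, the ratio is 220/50 and the window size is 22.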

The clustering result of the joint clustering algorithm is shown in Fig. 6. The size of each dotted circle represents the sliding window size of the mean shift clustering within one k-means clustering result, and each dotted circle represents a cluster. In Fig. 6, the clusters with many points are the clustering results of the pseudo-targets corresponding to signals reflected by scattering points inside the scattering areas, and the clusters with few points are the clustering results of the pseudo-targets corresponding to the interference scattering points. The clustering results with the largest number of pseudo-target points are selected as references, and their cluster centers are used as parameters to establish the equation set, whereas the other clustering results are discarded as belonging to interference scattering points.
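The two-stage procedure can be sketched in plain Python as follows. This is an illustrative reading of the joint algorithm, not the authors' implementation: the initial k-means centers are supplied by the caller, the mean shift uses a flat (uniform) kernel, and `window_for` is a hypothetical callable that returns the window size for each k-means cluster index, for example a value computed from (3).

```python
import math

def mean_of(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def kmeans(points, init_centers, iters=100):
    """Plain Lloyd iterations starting from caller-supplied centers."""
    centers = list(init_centers)
    k = len(centers)
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append(mean_of(members) if members else centers[c])
        if updated == centers:
            break
        centers = updated
    return labels, centers

def mean_shift_mode(points, seed, window, iters=100, tol=1e-9):
    """Flat-kernel mean shift: move the window to the mean of the points
    it contains until it stops moving. `seed` must be one of `points`."""
    center = seed
    for _ in range(iters):
        inside = [p for p in points if math.dist(p, center) <= window]
        new_center = mean_of(inside)
        moved = math.dist(new_center, center)
        center = new_center
        if moved < tol:
            break
    inside = [p for p in points if math.dist(p, center) <= window]
    return center, inside

def joint_cluster(points, init_centers, window_for):
    """Coarse k-means split, then keep the densest mean-shift mode of
    each k-means cluster; points outside that window are treated as
    interference and dropped."""
    labels, centers = kmeans(points, init_centers)
    results = []
    for c, coarse in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == c]
        best_mode, best_support = coarse, []
        for seed in members:
            mode, support = mean_shift_mode(members, seed, window_for(c))
            if len(support) > len(best_support):
                best_mode, best_support = mode, support
        results.append((coarse, best_mode, best_support))
    return results
```

Each returned triple holds the k-means centroid, the refined cluster center, and the points that support it. When a cluster contains a distant outlier, the k-means centroid is pulled toward the outlier while the refined center stays on the dense group, which is the behavior the joint algorithm relies on.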

### 2.3 Target position calculation

The available information comprises the center positions of the scattering areas (*x*_{i},*y*_{i}),*i*=1,2,...,*N* and the clustering results obtained by the joint clustering algorithm \(\left ({{{\bar \alpha }_{i}},{{\bar \beta }_{i}}}\right)\). The corresponding parameters \(\left ({{{\bar \theta }_{i}},{{\bar \tau }_{i}}}\right)\) are deduced from (1), and then the AOA between the center of the scattering area (*x*_{i},*y*_{i}) and the BS is calculated as

$$ {\theta '_{i}} = \arctan \left({\frac{{{y_{i}}}}{{{x_{i}}}}}\right),\quad i = 1,2,...,N $$

(4)

The parameters \({\bar \theta _{i}}\) and *θ*^{′}_{i} are matched according to the principle of the minimum difference, and \(\left ({{{\bar \theta }_{i}},{{\bar \tau }_{i}}}\right)\) is taken as the approximate measurement parameter with the scattering center (*x*_{i},*y*_{i}). If the coordinate of the target terminal is (*x*,*y*), the relations of the target, the scattering point, and the BS are shown in Fig. 7.
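The matching step can be sketched as below. All names are illustrative assumptions. The cluster parameters are obtained here by inverting the mapping in (2), which is one way to carry out the deduction, and each scattering area is simply assigned the cluster with the smallest wrapped angular difference; this nearest-angle rule does not enforce a one-to-one assignment.

```python
import math

C = 299_792_458.0  # assumed propagation speed c in m/s

def cluster_to_measurement(alpha_bar, beta_bar, c=C):
    """Recover (theta_bar, tau_bar) from a cluster center by inverting (2)."""
    theta_bar = math.atan2(beta_bar, alpha_bar)
    tau_bar = math.hypot(alpha_bar, beta_bar) / c
    return theta_bar, tau_bar

def angle_diff(a, b):
    """Absolute difference between two angles, wrapped to [0, pi]."""
    return abs((a - b + math.pi) % (2.0 * math.pi) - math.pi)

def match_by_min_difference(cluster_thetas, area_thetas):
    """For each area angle theta'_i, return the index of the cluster
    whose angle theta_bar is closest to it."""
    return [min(range(len(cluster_thetas)),
                key=lambda j: angle_diff(cluster_thetas[j], t))
            for t in area_thetas]
```

Wrapping the difference matters near ±π: a cluster at −3.1 rad and an area center at 3.1 rad are only about 0.08 rad apart, not 6.2 rad.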

According to the geometric structure in Fig. 7, we can establish

$$ \left\{ \begin{array}{l} \sqrt {{{\left({x - {x_{1}}}\right)}^{2}} + {{\left({y - {y_{1}}}\right)}^{2}}} = c{{\bar \tau }_{1}} - \sqrt {{{\left({{x_{1}} - {x_{B}}}\right)}^{{2}}}{{ + }}{{\left({{y_{1}} - {y_{B}}}\right)}^{{2}}}} \\ \sqrt {{{\left({x - {x_{2}}}\right)}^{2}} + {{\left({y - {y_{2}}}\right)}^{2}}} = c{{\bar \tau }_{2}} - \sqrt {{{\left({{x_{2}} - {x_{B}}}\right)}^{{2}}}{{ + }}{{\left({{y_{2}} - {y_{B}}}\right)}^{{2}}}} \\ \quad \quad \quad \quad \vdots \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \vdots \\ \sqrt {{{\left({x - {x_{N}}}\right)}^{2}} + {{\left({y - {y_{N}}}\right)}^{2}}} = c{{\bar \tau }_{N}} - \sqrt {{{\left({{x_{N}} - {x_{B}}}\right)}^{{2}}}{{ + }}{{\left({{y_{N}} - {y_{B}}}\right)}^{{2}}}} \end{array} \right. $$

(5)

In actual communication between the target terminal and the BS, the measured parameters contain a synchronization error because the clocks of the transmitter and the receiver are not synchronized, that is

$$ {\tau_{TOA}} = {\tau_{TOA\_true}} + {\tau_{s\_err}} + {\tau_{n}} $$

(6)

where \({\tau _{TOA\_true}}\) is the true TOA, \({\tau _{s\_err}}\) is the delay caused by the synchronization error, and *τ*_{n} is the measurement error caused by white noise. In a single-station positioning system, the clock synchronization error is the same for every multipath within one signal transmission, so difference equations between pairs of multipath TOAs are established to eliminate it. Denoting the distance from the target to the *i*th scattering center by *l*_{i}(*i*=1,2,⋯,*N*), we have

$$ {l_{i}} = c{\bar \tau_{i}} - \sqrt {{{\left({{x_{i}} - {x_{B}}}\right)}^{{2}}}{{ + }}{{\left({{y_{i}} - {y_{B}}}\right)}^{{2}}}} $$

(7)

If the number of scattering areas is *N*, then *N*−1 equations can be established as

$$ \left\{ \begin{array}{l} \sqrt {{{\left({x - {x_{N}}}\right)}^{2}} + {{\left({y - {y_{N}}}\right)}^{2}}} - \sqrt {{{\left({x - {x_{1}}}\right)}^{2}} + {{\left({y - {y_{1}}}\right)}^{2}}} = {l_{N}} - {l_{1}}\\ \sqrt {{{\left({x - {x_{N}}}\right)}^{2}} + {{\left({y - {y_{N}}}\right)}^{2}}} - \sqrt {{{\left({x - {x_{2}}}\right)}^{2}} + {{\left({y - {y_{2}}}\right)}^{2}}} = {l_{N}} - {l_{2}}\\ \quad \quad \quad \quad \vdots \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \vdots \\ \sqrt {{{\left({x - {x_{N}}}\right)}^{2}} + {{\left({y - {y_{N}}}\right)}^{2}}} - \sqrt {{{\left({x - {x_{N - 1}}}\right)}^{2}} + {{\left({y - {y_{N - 1}}}\right)}^{2}}} = {l_{N}} - {l_{N - 1}} \end{array} \right. $$

(8)
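The right-hand sides of (8) follow directly from (7), as the sketch below shows; the function names and the value of `C` are assumptions for illustration. By (6), a common synchronization error adds the same amount *cτ*_{s\_err} to every *l*_{i}, so it cancels in each difference *l*_{N}−*l*_{i}.

```python
import math

C = 299_792_458.0  # assumed propagation speed c in m/s

def path_lengths(tau_bar, centers, bs, c=C):
    """l_i of (7): total path length minus the center-to-BS leg."""
    xb, yb = bs
    return [c * t - math.hypot(xi - xb, yi - yb)
            for t, (xi, yi) in zip(tau_bar, centers)]

def range_differences(lengths):
    """Right-hand sides l_N - l_i of (8) for i = 1, ..., N-1,
    using the last scattering area as the reference."""
    return [lengths[-1] - li for li in lengths[:-1]]
```

Adding the same offset to every entry of `tau_bar` leaves the output of `range_differences` unchanged up to rounding, which is the reason the difference form is used.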

In practice, the equalities in (8) do not hold exactly because of measurement errors. Solving the deterministic equations can therefore be transformed into solving an optimization problem. First, we rewrite (8) as

$$ \begin{array}{l} {\varphi_{i}}\left({x,y}\right) = \sqrt {{{\left({x - {x_{N}}}\right)}^{2}} + {{\left({y - {y_{N}}}\right)}^{2}}} - \sqrt {{{\left({x - {x_{i}}}\right)}^{2}} + {{\left({y - {y_{i}}}\right)}^{2}}} - \left({{l_{N}} - {l_{i}}}\right),\\ \quad \quad \;\;\;i = 1,2, \cdots,N - 1 \end{array} $$

(9)

where *φ*_{i}(*x*,*y*) is the error objective function. Then, the solution that minimizes the objective function shown in (10) is the position of the target.

$$ \varepsilon \left({x,y}\right) = \frac{1}{2}\sum\limits_{i = 1}^{N - 1} {\varphi_{i}^{2}\left({x,y}\right)} $$

(10)

This paper uses the LM algorithm to solve (10). The LM algorithm is an iterative algorithm for finding the extreme value of a function and can be used to solve nonlinear least squares problems. The positioning scheme in this paper introduces the spatial layout as supplementary information and uses a clustering algorithm to process the multipath parameters, which greatly simplifies the equations and helps avoid the situation in which the iteration does not converge within the limited domain. In the LM algorithm, the solution that minimizes the objective error function is the optimal solution, which is the estimate of the target terminal position.
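To make the minimization of (10) concrete, the following is a minimal hand-written Levenberg-Marquardt sketch for the two unknowns (*x*, *y*). It is an illustration under stated assumptions rather than the implementation used in this paper: the damping schedule (divide or multiply by 10), the Marquardt diagonal scaling, the stopping thresholds, and all function names are choices made here. In practice a library routine such as `scipy.optimize.least_squares` with `method="lm"` could be used instead.

```python
import math

def residuals(p, centers, lengths):
    """phi_i(x, y) of (9), with the last area as the reference."""
    x, y = p
    xn, yn = centers[-1]
    dn = math.hypot(x - xn, y - yn)
    return [dn - math.hypot(x - xi, y - yi) - (lengths[-1] - li)
            for (xi, yi), li in zip(centers[:-1], lengths[:-1])]

def jacobian(p, centers):
    """Partial derivatives of each phi_i with respect to x and y.
    Undefined if p coincides with a scattering center."""
    x, y = p
    xn, yn = centers[-1]
    dn = math.hypot(x - xn, y - yn)
    rows = []
    for xi, yi in centers[:-1]:
        di = math.hypot(x - xi, y - yi)
        rows.append(((x - xn) / dn - (x - xi) / di,
                     (y - yn) / dn - (y - yi) / di))
    return rows

def levenberg_marquardt(p0, centers, lengths, lam=1e-3, iters=200, tol=1e-12):
    """Minimize the cost (10); returns (estimated position, final cost)."""
    p = (float(p0[0]), float(p0[1]))
    r = residuals(p, centers, lengths)
    cost = 0.5 * sum(v * v for v in r)
    for _ in range(iters):
        J = jacobian(p, centers)
        # entries of J^T J and of the gradient J^T r
        a = sum(jx * jx for jx, _ in J)
        b = sum(jx * jy for jx, jy in J)
        d = sum(jy * jy for _, jy in J)
        gx = sum(jx * v for (jx, _), v in zip(J, r))
        gy = sum(jy * v for (_, jy), v in zip(J, r))
        # damped normal equations with Marquardt diagonal scaling
        ad = a + lam * (a + 1e-12)
        dd = d + lam * (d + 1e-12)
        det = ad * dd - b * b
        if det == 0.0:
            break
        dx = (-gx * dd + b * gy) / det
        dy = (-ad * gy + b * gx) / det
        cand = (p[0] + dx, p[1] + dy)
        r_new = residuals(cand, centers, lengths)
        cost_new = 0.5 * sum(v * v for v in r_new)
        if cost_new < cost:
            # accept the step and trust the Gauss-Newton model more
            p, r, cost = cand, r_new, cost_new
            lam = max(lam / 10.0, 1e-12)
            if math.hypot(dx, dy) < tol:
                break
        else:
            # reject the step and move toward gradient descent
            lam *= 10.0
            if lam > 1e12:
                break
    return p, cost
```

Because there are two unknowns and *N*−1 residuals, at least three scattering areas are needed, and more are preferable since two hyperbolic equations can intersect at more than one point. The initial guess `p0` is an input; a point inside the region spanned by the scattering centers is a reasonable starting choice.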