On the application of generalized linear mixed models for predicting path loss in LTE networks

To meet the ever-growing demand for higher data rates, accurate channel models are needed for optimal design and deployment of mobile wireless networks. This work proposes a new method for modeling path loss in a suburban environment at 800 MHz based on field measurements. Using generalized linear mixed models, we develop a new statistical model that accounts for the autocorrelation among measurements taken at the same distance at different times. The proposed method allows linear, quadratic, and cubic relationships between the path loss measurements and the natural logarithm of the distance, a choice that is almost unexplored because existing models use a straight-line relationship. A comparison study evaluates nine propagation models in terms of the mean absolute prediction error. The new model performs over 30% better than the existing models for the considered measurements. We also show that a cubic relationship between path loss measurements and the logarithm of distance could be more suitable than a straight-line form for prediction purposes. The results show that generalized linear mixed models significantly improve the prediction power regardless of the form of the model (linear, quadratic, or cubic).


Introduction
In the deployment of mobile wireless networks, propagation models play a significant role in predicting coverage area and tower locations. With new technologies and deployment environments, it is essential to tune existing models and propose new models to ensure accurate predictions of signals in the deployment area.
Long-term evolution (LTE) is an advanced fourth-generation mobile wireless technology that is also used by fifth-generation devices [1]. The wireless and mobile industries were quick to adopt and deploy the technology to meet the ever-growing demand for bandwidth. By the end of 2020, there were over 5.95 billion LTE mobile subscribers [2]. The standards support different carrier frequencies, and the deployment carrier frequency is determined by the region. The frequencies 800, 1800, and 2600 MHz are currently deployed [3]. RF power analysis, measurements, and modeling are critical for the successful deployment of a mobile network. After deployment, modeling and analysis are used to optimize network performance. Operators use propagation and path loss models to predict the coverage area and ensure the quality of service for customers.
Propagation models can be classified as empirical, semi-deterministic, or deterministic. Empirical models are statistical models based on measurements (data). Deterministic models are developed using the physical laws governing electromagnetic propagation; they require high computation power and detailed site-specific parameters. Semi-deterministic models are based on empirical data and deterministic algorithms. Driven by new deployment environments, path loss modeling continues to be researched to develop more accurate models [4][5][6].
The path loss for a non-line-of-sight (NLOS) microcell environment was studied in [5]. The work extended applicable models for frequencies below 6 GHz; the models modify the frequency term to reduce the root mean square error. Artificial neural networks were proposed for path loss prediction using macrocell data [6]. The results showed that the model outperformed the autoregressive moving average (ARMA) and COST-231 models. Path loss at 1900 MHz was studied in [4], where different antenna heights were examined. The results of that investigation question common models and recommend modifications of their original forms to ensure more reliable predictions. In [7], a hybrid model based on a neural network was proposed to predict path loss for a suburban area at 800 MHz and 2600 MHz. The hybrid error-based model is shown to be more accurate than the other methods in the tested scenarios. A study of LTE at 1800 MHz was conducted in [8] using three propagation models: the Walfisch-Ikegami model, the SUI model, and the Ericsson model. For line of sight (LOS), the Ericsson model has the highest prediction accuracy, while for NLOS the SUI C model performed best. The paper proposed combining models to improve accuracy: combining the free space loss (FSL) model with the SUI C model or the Ericsson model improves the accuracy by 2 dB. Nonlinear exponential regression was proposed in [9] to adjust the standard propagation model, and the results show an improvement in the performance of the adjusted model. Both fixed-location and drive-test measurements were presented in [3] for an urban environment. While path loss at fixed locations can be accurately predicted using modern channel models, the models fail to accurately predict path loss for drive tests.
An empirical study of fixed wireless links in a vegetated residential environment is given in [10]. To validate the new model, field measurements were collected in suburban microcell channels at 5.8 GHz. The study developed a path loss model that incorporates vegetation effects and observed that external effects, such as wind, affect the signal performance. Terrain was another factor that needs to be considered before the deployment of wireless links. The work in [11] examined propagation modeling for a WiMAX network; a new model based on curve fitting was found to provide more accurate predictions than accepted models. A new model was proposed in [12] for an environment with vegetation operating at 700-800 MHz. The work noted that current models present a pessimistic estimate of path loss. In [13], field measurements were conducted at 800 MHz with a 15 MHz bandwidth, and the accuracy of different propagation models in predicting path loss was compared.
In this work, we propose a new approach for modeling path loss using generalized linear mixed models (GLMM). The new model takes into account the autocorrelation among measurements taken at the same distance at different times. We investigate a cubic relationship between path loss data and the logarithm of distance. The model utilizes data in [11]. A comparison is conducted with nine existing models for suburban environments operating at 800 MHz.
The paper is organized as follows: Sect. 2 reviews the propagation models. Section 3 introduces the proposed model. Section 4 introduces measurements and results. Finally, Sect. 5 concludes the paper.

Review of propagation models
In wireless communication systems, electromagnetic waves are responsible for carrying information between antennas. Due to interaction with the environment, waves suffer a loss on their path. Propagation models attempt to predict the path loss to help in RF planning and network deployment.
Several propagation models have been proposed and studied to predict path loss. These models have traditionally been applied to frequencies below 2 GHz [14][15][16][17][18][19]. The models and their parameters are discussed in [13]. The free space (FS) path loss model is an analytical model that predicts the strength of the received signal when a clear line-of-sight path exists. Since it does not account for the multiple paths the signal takes, it cannot be used for point-to-multipoint radio links; however, it is included as a reference. The Stanford University Interim (SUI) model is an empirical model recommended by the standardizing committee [15]. The model contains frequency and height correction factors. The ECC-33 model, also known as the Hata-Okumura extended model, is based on the Okumura model [18, 19].
COST-231 Hata is an extension of the Hata-Okumura model that includes a correction factor for the environment. While the Hata-Okumura model was developed for 500-1550 MHz, the COST-231 model extends the frequency range up to 2 GHz [13]. The model can be used for urban, suburban, and rural areas. The Ericsson model, which is based on a modified Hata-Okumura model, allows its parameters to be changed according to the environment. Another form of the COST-231 Hata model is the standard propagation model (SPM), which has been adopted in several RF planning tools. The SPM is based on empirical formulas and is particularly suitable for prediction in the 150-3500 MHz band over long distances.
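As a point of reference for the free space model mentioned above, the sketch below evaluates the standard free-space path loss expression, FSPL (dB) = 20 log10(d) + 20 log10(f) + 32.44, with d in km and f in MHz. The formula is the textbook form and is not reproduced from this paper; the function name and the example distances are illustrative assumptions.

```python
import math

def free_space_path_loss_db(distance_km: float, frequency_mhz: float) -> float:
    """Free-space path loss in dB (distance in km, frequency in MHz)."""
    if distance_km <= 0 or frequency_mhz <= 0:
        raise ValueError("distance and frequency must be positive")
    return 20 * math.log10(distance_km) + 20 * math.log10(frequency_mhz) + 32.44

# Illustrative evaluation at the 800 MHz carrier for a few example distances.
for d_km in (0.5, 1.0, 2.855):
    loss = free_space_path_loss_db(d_km, 800.0)
    print(f"{d_km:6.3f} km -> {loss:.2f} dB")
```

Because the expression ignores clutter, terrain, and multipath, it typically gives a lower bound on the loss observed in a suburban deployment, which is why it is treated only as a baseline.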

Methods and models
Path loss measurements are often recorded at a set of distances, and a predictive model is then fitted to the path loss using variables such as distance, frequency, and many others. When measurements are repeated at the same distance, the observations are generally highly autocorrelated. For example, Fig. 1 shows the autocorrelation function of path loss measurements from our experiment at different distances. Autocorrelation is present since the values are significantly different from zero (outside the blue interval). For example, at a distance of 2855 m the correlation coefficient at lag 1, $\mathrm{cor}(y_t, y_{t-1})$, is approximately equal to 0.4. This autocorrelation is often ignored when modeling path loss because the basic approach consists of finding the mean at each distance and then using these averages to fit a model. We aim to propose a generalized linear mixed model that takes into account the correlation among path loss observations.
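The autocorrelation check described above can be sketched in a few lines. The example below computes the sample autocorrelation function of a series and flags lags that fall outside the approximate 95% band of ±1.96/√n. The series is synthetic (an AR(1) process with coefficient 0.4 around 143 dB), used only as a stand-in for repeated readings at one distance; the function names and all parameter values are assumptions.

```python
import random

def sample_acf(series, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]
    denom = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]

def simulate_ar1(n, phi, sigma, seed=0):
    """Generate n values of x_t = phi * x_{t-1} + z_t with Gaussian noise z_t."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, sigma)]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, sigma))
    return x

# Synthetic stand-in for repeated path loss readings (dB) at a single distance.
path_loss = [143.0 + e for e in simulate_ar1(n=2000, phi=0.4, sigma=1.0, seed=1)]
acf = sample_acf(path_loss, max_lag=5)
band = 1.96 / len(path_loss) ** 0.5  # approximate 95% limits under no autocorrelation

for lag, r in enumerate(acf):
    note = "outside band" if lag > 0 and abs(r) > band else ""
    print(f"lag {lag}: r = {r:+.3f} {note}")
```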

Generalized linear mixed models
Generalized linear mixed models are a class of statistical models that allow the data to be non-normal and that can include random effects in addition to fixed effects. Generalized linear models (GLM) assume that the response variable comes from the exponential dispersion family of distributions given by [20]:
$$P(y; \theta, \phi) = a(y, \phi)\,\exp\left\{\frac{y\theta - b(\theta)}{\phi}\right\} \qquad (1)$$
where $\theta$ is the canonical parameter, $\phi$ is the dispersion parameter, $b(\theta)$ is the cumulant function, and $a(y, \phi)$ is a normalizing function ensuring that Eq. (1) is a probability function. For example, the Normal probability density function $P(y; \mu, \sigma^2) = (2\pi\sigma^2)^{-1/2}\exp\{-(y-\mu)^2/(2\sigma^2)\}$ is a special case of Eq. (1), where $\theta = \mu$ is the canonical parameter, $b(\theta) = \theta^2/2$, and $\phi = \sigma^2$. The normalizing term is $a(y, \phi) = (2\pi\sigma^2)^{-1/2}\exp\{-y^2/(2\sigma^2)\}$. A crucial aspect of working with a GLM is the relationship between the mean and the variance of the distribution of the response variable (path loss). The variance can be written as a function of the mean as follows:
$$\mathrm{var}[y] = \phi\, V(\mu) \qquad (2)$$
where $V(\mu)$ is called the variance function. The mean-variance relationship in Eq. (2) helps to determine which distribution to consider in the modeling process. For instance, if the variance does not depend on the mean, then a Normal model is a potential model to consider. If the variance is proportional to the squared mean, a Gamma model could be of interest when the observations are continuous positive values. A generalized linear mixed model (GLMM) for $y$ is given by:
$$y_{ij} = \mathbf{x}_{ij}\boldsymbol{\beta} + \mathbf{z}_{ij}\mathbf{u}_i + \varepsilon_{ij}, \qquad \mathbf{u}_i \sim N(\mathbf{0}, \Sigma_u) \qquad (3)$$
where $\boldsymbol{\beta}$ is a $p \times 1$ vector of fixed effects and $\mathbf{u}_i$ is a $q \times 1$ vector of random effects. This model decomposes $y_{ij}$ into a fixed term $\mathbf{x}_{ij}\boldsymbol{\beta}$ for the mean, a random term $\mathbf{z}_{ij}\mathbf{u}_i$ for the variability between distances, and a term $\varepsilon_{ij}$ for the variability within distances. $\Sigma_u$ is the variance-covariance matrix of the random-effect terms. Often, a random-intercept model is used, in which case $u_i \sim N(0, \sigma_u^2)$ is a univariate random variable.
Figure 2 shows the mean-variance relationship in our measurements, obtained by computing the mean and the variance at each distance; there is no obvious relationship. The plot shows that the variance does not change as the mean increases. This property corresponds to a Normal model; therefore, we consider a Normal linear mixed model for predicting the path loss. Note that most distances in our data set have a mean path loss close to 143 dB, which is why the points form a cloud around 143.
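The mean-variance check behind Fig. 2 can be sketched as follows: compute the mean and variance of the repeated readings at each distance and inspect whether the variance moves with the mean. The data below are synthetic (50 groups of 100 readings with a constant spread), so the numbers say nothing about the actual measurements; group sizes, levels, and helper names are assumptions.

```python
import random
import statistics

def mean_variance_by_group(groups):
    """Return one (mean, variance) pair per group of repeated measurements."""
    return [(statistics.fmean(g), statistics.variance(g)) for g in groups]

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(0)
# Synthetic data: each group has its own mean level but the same spread (2 dB).
groups = []
for _ in range(50):
    level = rng.uniform(130.0, 150.0)
    groups.append([level + rng.gauss(0.0, 2.0) for _ in range(100)])

pairs = mean_variance_by_group(groups)
means = [m for m, _ in pairs]
variances = [v for _, v in pairs]
print(f"correlation(mean, variance) = {pearson(means, variances):+.3f}")
```

A correlation near zero is what a Normal model with constant variance would produce; a variance that grows with the squared mean would instead point toward a Gamma model.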

Normal mixed model for path loss
The experiment of recording the path loss measurements can be seen as a multivariate data collection where each distance (cluster) $i$ has $n_i$ observations. It is reasonable to assume that the observations are correlated within a cluster, and models need to account for that correlation. Models that ignore the correlations among observations have invalid standard errors. For instance, at distance $i$ the measurements can be denoted as $\mathbf{y}_i = (y_{i1}, y_{i2}, \ldots, y_{in})^t$, $i = 1, 2, \ldots, d$, where $d$ (without subscript) is the number of distances and $n$ is the number of measurements taken at the same distance $i$. A first model for path loss (PL) is given by:
$$PL_{ij} = \beta_0 + \beta_1 d_i + u_i + \varepsilon_{ij} \qquad (4)$$
where $d_i$ is the logarithm of the distance (scaled as in Eq. (5)) and the $u_i$ are assumed to be independent and to follow $N(0, \sigma_u^2)$. Often the $\varepsilon_{ij}$ are also assumed to be independent; however, a more realistic assumption is to allow $(\varepsilon_{i1}, \varepsilon_{i2}, \ldots, \varepsilon_{in})$ to be correlated, for example with an autoregressive moving average correlation structure. The model in Eq. (4) assumes a straight-line relationship between path loss and the logarithm of the distance; however, a closer examination of this relationship suggests a cubic function between path loss and the logarithm of the distance (see Fig. 3).
Therefore, our proposed model for path loss is given by:
$$PL_{ij} = \beta_0 + \beta_1 d_i + \beta_2 d_i^2 + \beta_3 d_i^3 + u_i + \varepsilon_{ij} \qquad (5)$$
where $d_i = \log(\mathrm{distance}_i/100)$, $u_i \sim N(0, \sigma_u^2)$ is the random component, and the errors within each distance follow an AR(1) process, which means $\varepsilon_t = \phi\varepsilon_{t-1} + z_t$, where $z_t$ is white noise with distribution $N(0, \sigma_z^2)$. The advantage of using linear mixed models for prediction depends on whether the new data points to be predicted come from one of the sites (distances) already measured. If that is the case, then the proposed model (Eq. 5) should perform very well by using the information known about that distance. If the new data do not come from one of the measured distances, which is often the case, then the proposed model should perform at least as well as a fixed-effect model, which provides population-level predictions.
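To make the structure of Eq. (5) concrete, the sketch below simulates readings from a cubic model with a random intercept per distance and AR(1) errors within each distance. It only generates data; it does not fit the model. The coefficients in `beta` are hypothetical placeholders, not values from this paper, and the function name and distances are likewise assumptions.

```python
import math
import random

def simulate_path_loss(distances_m, n_obs, beta, sigma_u, phi, sigma_z, seed=0):
    """Draw n_obs readings per distance from a cubic mixed model with AR(1) errors."""
    rng = random.Random(seed)
    data = {}
    for dist in distances_m:
        d = math.log(dist / 100.0)                             # d_i = log(distance_i / 100)
        fixed = sum(b * d ** k for k, b in enumerate(beta))    # beta0 + beta1*d + beta2*d^2 + beta3*d^3
        u = rng.gauss(0.0, sigma_u)                            # random intercept for this distance
        eps = rng.gauss(0.0, sigma_z / math.sqrt(1.0 - phi ** 2))  # start from the stationary law
        series = []
        for _ in range(n_obs):
            series.append(fixed + u + eps)
            eps = phi * eps + rng.gauss(0.0, sigma_z)          # AR(1) update of the error
        data[dist] = series
    return data

# Hypothetical fixed-effect coefficients, for illustration only.
beta = (100.0, 10.0, 1.0, -0.2)
sim = simulate_path_loss([200.0, 500.0, 1000.0, 2855.0], n_obs=100, beta=beta,
                         sigma_u=2.0, phi=0.5, sigma_z=1.4, seed=42)
for dist, series in sim.items():
    print(f"{dist:7.0f} m: mean simulated PL = {sum(series) / len(series):.2f} dB")
```

In the simulated output, readings at the same distance share the same `u`, so they are shifted together, while the AR(1) term makes consecutive readings resemble each other more than readings far apart in time.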

Model fitting and selection
To select a correlation structure for the error term of the model in Eq. (5), we first divide the data into two data sets: training (80%) and testing (20%). The training data set is used to determine the correlation structure of the error term as follows:
• Candidate correlation structures within distances ($\varepsilon_i$): AR(1), AR(2), ARMA(2, 1), and ARMA(1, 2). These choices are based on examining the sample autocorrelation function and partial autocorrelation function (see Fig. 1).
• Cross-validation simulations were run with 5 folds and 30 replications to select the best model according to a cost function.
• The best model is selected using the mean absolute prediction error (MAPE) as the cost function, defined by:
$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \qquad (6)$$
where $y_i$ is a measured path loss value, $\hat{y}_i$ is the corresponding prediction, and $n$ is the number of predictions.
For model selection, the split into training (80%) and testing (20%) sets is combined with a split in time: only the first 80% of the data points in time are used to fit the model. The testing data set therefore covers all distances, some of which were used in training the model while others are new to the model. The part of the testing data set that contains only new distances is used to test the best model and its generalization capacity compared with other models. None of the models in this step has seen the testing data set before. MAPE is used to compare the predictions with the actual values; lower values of MAPE indicate better models.
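The cost function and the fold construction described above can be sketched as follows. The `mape` function implements the mean absolute prediction error (the average of |y − ŷ|, in the units of the data, not a percentage). Folding by distance index is an assumption made for this illustration, as are the helper names and the toy values.

```python
import random

def mape(actual, predicted):
    """Mean absolute prediction error: average of |y - y_hat|."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / len(actual)

def k_fold_indices(n_items, n_folds, rng):
    """Shuffle item indices and deal them into n_folds roughly equal folds."""
    order = list(range(n_items))
    rng.shuffle(order)
    return [order[f::n_folds] for f in range(n_folds)]

# Toy check of the cost function (illustrative values in dB).
print(mape([140.0, 143.0, 145.0], [141.0, 143.0, 143.0]))

# One replication of 5-fold cross-validation over 186 distances.
rng = random.Random(0)
folds = k_fold_indices(n_items=186, n_folds=5, rng=rng)
for k, held_out in enumerate(folds):
    train = [i for j, fold in enumerate(folds) if j != k for i in fold]
    print(f"fold {k}: {len(train)} training distances, {len(held_out)} held-out distances")
```

Repeating the fold construction with 30 different shuffles and averaging the per-fold MAPE of each candidate correlation structure would mirror the 5-fold, 30-replication scheme in the text.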

Measurement and results
Field measurements collected in the suburb of a metropolitan area covered by a base station that employs LTE (see Fig. 4) are used to validate the proposed model. The site uses a three-sector antenna at a height of 25 m. The coverage area is flat and borders the sea. The area has urban clutter and is characterized by high humidity (i.e., 60%). Measurements were conducted along a highway that is 43 m wide (both directions, including sidewalks). The buildings are residential and of two sizes, 26 m and 11 m, and are spaced 13 m and 11 m apart. Most buildings are two stories high. A user equipment (UE) with a modem logging tool is used to extract and log field measurements.
The received signal strength indicator (RSSI) in dBm, as well as other related parameters such as RSRP, SINR, and RSRQ, are logged. The measured data (RSSI, RSRP, SINR, and RSRQ) are tabulated along with location coordinates. For each location, over a hundred measurements are averaged as shown in Fig. 4.
Path loss can be calculated by considering the relationship between transmitted power, received power, gains, and losses, i.e.,
$$PL = P_T + G_T + G_R - L_T - L_R - P_R \qquad (7)$$
where $P_T$ is the transmitted power, $P_R$ is the received power, $G_T$ is the transmitter gain, $G_R$ is the receiver gain, $L_T$ is the transmitter loss, $L_R$ is the receiver feeder loss, and $PL$ is the propagation path loss.
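The link-budget relationship above, written as PL = P_T + G_T + G_R − L_T − L_R − P_R with powers in dBm and gains and losses in dB, can be evaluated directly. The numerical values in this sketch are hypothetical and are not the parameters of the measured site; the function and argument names are assumptions.

```python
def path_loss_db(p_tx_dbm, p_rx_dbm, g_tx_dbi=0.0, g_rx_dbi=0.0,
                 l_tx_db=0.0, l_rx_db=0.0):
    """Path loss (dB) from transmit/receive power (dBm), gains (dBi), and losses (dB)."""
    return p_tx_dbm + g_tx_dbi + g_rx_dbi - l_tx_db - l_rx_db - p_rx_dbm

# Hypothetical link parameters, for illustration only.
example = path_loss_db(p_tx_dbm=43.0, p_rx_dbm=-85.0,
                       g_tx_dbi=17.0, g_rx_dbi=0.0,
                       l_tx_db=2.0, l_rx_db=0.0)
print(f"path loss = {example:.1f} dB")
```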

Path loss modeling using GLMM
Recall that our data set consists of 186 distances with 100 observations at each distance. To fit the model for prediction purposes, we consider linear, quadratic, and cubic models with and without random terms and correlated errors. In total, six models are examined and evaluated. Table 1 shows the results for different correlation structures within distances. The evaluation demonstrates that the autoregressive model of order 1 is a good correlation structure for prediction purposes since it provides the lowest prediction error, equal to 1.0785. It is worth noting that the AR(2) model gives a similar performance, but it is a more complicated model than AR(1). Similar results were obtained with the linear, quadratic, and cubic models. Therefore, we decide to use the AR(1) model for the error correlation.
The model equation is given by:
$$PL_{ij} = \beta_0 + \beta_1 d_i + \beta_2 d_i^2 + \beta_3 d_i^3 + u_i + \varepsilon_{ij} \qquad (8)$$
where $d_i = \log(\mathrm{distance}_i/100)$, $u_i \sim N(0, \sigma_u^2)$, and $\varepsilon_t$ is an AR(1) process, which means $\varepsilon_t = \phi\varepsilon_{t-1} + z_t$, where $z_t$ is white noise with distribution $N(0, \sigma_z^2)$. We call the proposed model in Eq. (8) LMM3 (linear mixed model of order 3). Similarly, we fit LMM1 and LMM2, which are the models of orders 1 and 2, respectively. We also consider models without the random-effect terms ($u_i$) and autocorrelated errors, which we call LM1 (linear model), LM2, and LM3. Because the training data set is randomly selected from the original data set, the MAPE changes whenever the training and testing data sets change. To account for this variability and to avoid over-fitting, we repeat the process of generating the training data set and fitting the models 100 times, and we present the quantiles of the MAPE values with their averages for each model in Fig. 5.
Two main results are demonstrated:
• The cubic model (LM3 or LMM3) outperforms the straight-line and quadratic models regardless of whether the model is mixed-effects or fixed-effects. This may indicate that a cubic model can significantly improve the prediction performance of a path loss model compared with the straight-line model often considered in the literature.
• The generalized linear mixed models are superior to the linear models in predicting path loss in all cases.
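The repeated-split evaluation described above can be sketched for the fixed-effect models LM1, LM2, and LM3. The example fits polynomials of order 1 to 3 in the log-distance on 100 random 80/20 splits of the distances and reports the mean and quartiles of the resulting MAPE values. The data are synthetic, with a hypothetical S-shaped mean curve, so the output illustrates the procedure only; the mixed-effects variants with AR(1) errors are not fitted here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption): 60 distances, 20 readings each, hypothetical cubic mean.
n_dist, n_obs = 60, 20
d = rng.uniform(0.4, 3.4, n_dist)                       # log-distance values
mean_pl = 100.0 + 30.0 * d - 12.0 * d**2 + 2.0 * d**3   # hypothetical curve (dB)
site_effect = rng.normal(0.0, 1.0, n_dist)              # per-distance shift
y = mean_pl[:, None] + site_effect[:, None] + rng.normal(0.0, 1.0, (n_dist, n_obs))

def split_mape(order, train_idx, test_idx):
    """Fit a polynomial of the given order on training distances; MAPE on held-out ones."""
    x_train = np.repeat(d[train_idx], n_obs)
    coef = np.polyfit(x_train, y[train_idx].ravel(), order)
    pred = np.polyval(coef, d[test_idx])[:, None]
    return float(np.mean(np.abs(y[test_idx] - pred)))

results = {1: [], 2: [], 3: []}
cut = int(0.8 * n_dist)
for _ in range(100):                                    # 100 random 80/20 splits
    perm = rng.permutation(n_dist)
    for order in results:
        results[order].append(split_mape(order, perm[:cut], perm[cut:]))

for order, vals in results.items():
    q25, q50, q75 = np.quantile(vals, [0.25, 0.5, 0.75])
    print(f"LM{order}: mean={np.mean(vals):.3f} quartiles=({q25:.3f}, {q50:.3f}, {q75:.3f})")
```

Using the same splits for every polynomial order keeps the comparison paired, so differences in MAPE reflect the model form and not the luck of a particular split.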
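Therefore, the best model equation is as follows:
$$\widehat{PL}_{ij} = \hat{\beta}_0 + \hat{\beta}_1 d_i + \hat{\beta}_2 d_i^2 + \hat{\beta}_3 d_i^3 + u_i + \varepsilon_{ij} \qquad (9)$$
where $\hat{\beta}_0, \ldots, \hat{\beta}_3$ are the estimated fixed-effect coefficients, $d_i = \log(\mathrm{distance}_i/100)$, $u_i \sim N(\mu = 0, \sigma_u^2 = 4.28)$, and $\varepsilon_t$ is an AR(1) process, which means $\varepsilon_t = 0.55\,\varepsilon_{t-1} + z_t$, where $z_t$ is white noise with distribution $N(0, \sigma_z^2 = 1.94)$. The model equation can be used to predict path loss in an environment similar to the one presented in this work. More importantly, the model presented here should provide crucial knowledge when developing a new prediction model in any other environment where other factors and variables are available to include in the model.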
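The reported variance components can be turned into an implied correlation between two readings at the same distance. Under the stated model (random intercept plus stationary AR(1) errors, with the two components independent), the stationary error variance is σ_z²/(1 − φ²), and the correlation between readings k time steps apart is (σ_u² + σ_ε² φ^k)/(σ_u² + σ_ε²). The sketch below evaluates this using σ_u² = 4.28, φ = 0.55, and σ_z² = 1.94 from the text; reading σ_z² as the white-noise (innovation) variance is an assumption, as is the function name.

```python
def within_distance_correlation(sigma_u2, phi, sigma_z2, lag):
    """Correlation of two readings at the same distance, `lag` time steps apart."""
    sigma_eps2 = sigma_z2 / (1.0 - phi ** 2)   # stationary variance of the AR(1) error
    return (sigma_u2 + sigma_eps2 * phi ** lag) / (sigma_u2 + sigma_eps2)

# Values reported for the fitted model in the text.
sigma_u2, phi, sigma_z2 = 4.28, 0.55, 1.94

for lag in (0, 1, 2, 5, 50):
    rho = within_distance_correlation(sigma_u2, phi, sigma_z2, lag)
    print(f"lag {lag:2d}: implied correlation = {rho:.4f}")
```

As the lag grows, the AR(1) contribution dies out and the correlation settles at the share of variance due to the random intercept, which is the part of the dependence that a model fitted to per-distance averages cannot represent.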

Comparison and discussion
It is anticipated that the LMM3 model will perform better than the existing models presented in this paper (see Table 2) because LMM3 was fitted to a similar set of measurements. To establish that, a confirmatory comparison is conducted using a testing data set. Recall that the testing data set was not used in fitting any of the models. The comparison is given in terms of the mean absolute prediction error (MAPE). Table 2 shows the values of MAPE and their standard errors for all considered models. The proposed method (LMM3) provides superior performance in predicting path loss, with the lowest MAPE value of 1.82. The proposed model decreases the MAPE by more than 30% compared with the best existing model (COST-231) applied to the testing data set. The free space model has the largest MAPE value among the models, equal to 40.96 in this case. Figure 6 shows the actual values (testing data) of the path loss measurements and the values predicted using different propagation models. The visualization is consistent with the results reported in Table 2. The goal of this paper was to highlight the application of generalized linear mixed models to path loss data. The generalized linear mixed model for path loss prediction proposed in this paper differs from machine learning approaches such as support vector regression [21]. GLMM is a statistical model that can accommodate the autocorrelation among observations and the correlation within a cluster of observations. A model with a random-effect term can improve the prediction performance regardless of the form of the relationship (linear, quadratic, or cubic) compared with fixed-effect models. The GLMM is a statistical modeling approach that allows an understanding of the relationship between variables.
On the other hand, support vector regression is a machine learning-based approach to predicting values, but it does not allow an explicit understanding of the relationship between variables (black box process). The comparison in this paper covers nine other models, none of which is a machine learning approach.

Limitations
In this study, we developed a predictive model for path loss using (1) GLMM to account for the correlation in the data and (2) different shapes of the relationship between path loss and distance (linear, quadratic, and cubic).
However, only one set of measurements from one location and one frequency (800 MHz) was used to evaluate and validate the model. These limitations may affect the generalization capacity of the proposed model, and the reader should be aware of that. Future research could extend this work by evaluating the proposed model using various data sets at 800 MHz. In addition, a similar model can easily be developed to include frequency as a parameter of the model if data are available.
On the other hand, the findings from this study show that (1) mixed models can improve the predictive performance of the path loss model, and (2) the cubic model is a solid alternative to the linear path loss model.

Conclusion
This paper proposed a generalized linear mixed model for path loss predictions. The proposed statistical model accommodates (1) the autocorrelation among observations and (2) the variability from distance to distance using a random-effects term. Several autoregressive moving average models were compared to capture the autocorrelation. Results showed that a model with a random-effect term dramatically improved the prediction performance regardless of the form of the relationship (linear, quadratic, or cubic). The results also revealed that a cubic relationship between path loss measurements and the natural logarithm of distance could be more appropriate than straight-line or quadratic forms. We compared the proposed LMM3 model with existing models in terms of the mean absolute prediction error (MAPE). We also demonstrated that mixed-effects models are more effective than fixed-effects models in predicting path loss.