Path loss measurements are often recorded at several distances, and a predictive model is then fitted to the path loss using variables such as distance, frequency, and others. When measurements are repeated at the same distance, we generally obtain highly autocorrelated observations. For example, Fig. 1 shows the autocorrelation function of the path loss measurements from our experiment at different distances. Autocorrelation is present because several values are significantly different from zero (outside the blue interval). For example, at distance 2855 m the lag-1 correlation coefficient \(\mathrm{cor}(y_t,y_{t-1})\) is approximately 0.4. This autocorrelation is often ignored when modeling path loss, because the basic approach consists of computing the mean at each distance and then fitting a model to these averages. We propose a generalized linear mixed model that accounts for the correlation among path loss observations.
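For reference, the lag-1 sample autocorrelation quoted above can be computed as below. This is a minimal sketch using numpy; a synthetic AR(1)-style trace stands in for our measurements (the 0.4 coefficient and the 143 dB level come from the text, everything else is illustrative).

```python
import numpy as np

def lag1_autocorr(y):
    """Sample autocorrelation at lag 1, cor(y_t, y_{t-1})."""
    y = np.asarray(y, dtype=float)
    yc = y - y.mean()
    return float(np.sum(yc[1:] * yc[:-1]) / np.sum(yc * yc))

# Synthetic path loss trace at a fixed distance (illustrative only):
# autocorrelated fluctuations around a 143 dB mean.
rng = np.random.default_rng(0)
eps = np.zeros(500)
for t in range(1, 500):
    eps[t] = 0.4 * eps[t - 1] + rng.normal()
pl = 143.0 + eps

r1 = lag1_autocorr(pl)  # close to 0.4, well outside the +/- 1.96/sqrt(n) band
```

Values outside \(\pm 1.96/\sqrt{n}\) (the blue interval in Fig. 1) indicate significant autocorrelation.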
3.1 Generalized linear mixed models
Generalized linear mixed models are a class of statistical linear models that allow the response to be non-normal and that can include random effects in addition to fixed effects. Generalized linear models (GLMs) assume that the response variable comes from the exponential dispersion family of distributions, given by [20]:
$$\begin{aligned} P(y;\theta , \phi ) = a(y,\phi )\exp \bigg ( \frac{y\theta - b(\theta )}{\phi } \bigg ), \end{aligned}$$
(1)
where \(\theta\) is the canonical parameter, \(\phi\) is the dispersion parameter, \(b(\theta )\) is the cumulant function, and \(a(y,\phi )\) is a normalizing function ensuring that Eq. (1) is a probability function. For example, the Normal probability density function \(\left(P(y;\mu ,\sigma ^2)=\frac{1}{\sigma \sqrt{2\pi }}\exp {\Big (-\frac{(y-\mu )^2}{2\sigma ^2}\Big )}\right)\) is a special case of Eq. (1), where \(\theta =\mu\) is the canonical parameter, \(b(\theta )=\theta ^2/2\), and \(\phi =\sigma ^2\). The normalizing term is \(\frac{1}{\sigma \sqrt{2\pi }}\exp {\Big (-\frac{y^2}{2\sigma ^2}\Big )}\).
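As a sanity check, the Normal special case can be verified numerically: substituting \(\theta=\mu\), \(b(\theta)=\theta^2/2\), \(\phi=\sigma^2\), and the normalizing term above into Eq. (1) reproduces the usual Normal density. A minimal sketch:

```python
import numpy as np

def edf_normal(y, mu, sigma2):
    """Normal density written in the exponential-dispersion form of Eq. (1):
    theta = mu, b(theta) = theta**2 / 2, phi = sigma2,
    a(y, phi) = exp(-y**2 / (2*phi)) / sqrt(2*pi*phi)."""
    theta, phi = mu, sigma2
    a = np.exp(-y**2 / (2 * phi)) / np.sqrt(2 * np.pi * phi)
    return a * np.exp((y * theta - theta**2 / 2) / phi)

def normal_pdf(y, mu, sigma2):
    """Standard form of the Normal probability density function."""
    return np.exp(-(y - mu)**2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

y = np.linspace(-3, 3, 7)
vals_edf = edf_normal(y, 1.2, 0.7)   # arbitrary mu and sigma^2 for the check
vals_pdf = normal_pdf(y, 1.2, 0.7)   # the two agree pointwise
```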
A crucial aspect of working with a GLM is the relationship between the mean and variance of the distribution of the response variable (path loss). The variance can be rewritten as a function of the mean as follows:
$$\begin{aligned} Var[Y]=\phi \frac{d \mu }{d \theta } = \phi V(\mu ), \end{aligned}$$
(2)
where \(V(\mu )\) is called the variance function. The mean–variance relationship in (2) can help determine which distribution to consider in the modeling process. For instance, if the variance does not depend on the mean, a Normal model is a natural candidate; if the variance is proportional to the mean squared and the observations are continuous positive values, a Gamma model could be of interest. A generalized linear mixed model (GLMM) for y is given by:
$$\begin{aligned} y_{ij} = x_{ij} \pmb {\beta } + z_{ij} \pmb {u_i} + \varepsilon _{ij}, \end{aligned}$$
(3)
where \(\pmb {\beta }\) is a \(p\times 1\) vector of fixed effects and \(\pmb {u_i} \sim N(\pmb {0}, \pmb {\Sigma _u})\) is a \(q\times 1\) vector of random effects. This model decomposes \(y_{ij}\) into a fixed term \(x_{ij} \pmb {\beta }\) for the mean, a random term \(z_{ij} \pmb {u_i}\) for between-distance variability, and a term \(\varepsilon _{ij}\) for within-distance variability. \(\pmb {\Sigma _u}\) is the variance–covariance matrix of the random-effect terms. Often a random-intercept model is used, in which case \(u_i \sim N(0,\sigma ^2_u)\) is a univariate random variable.
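The between-/within-distance decomposition in Eq. (3) can be illustrated by simulation. The sketch below assumes a pure random-intercept model with hypothetical variance components (\(\sigma_u = 3\), \(\sigma_\varepsilon = 2\)) and no covariate, for clarity:

```python
import numpy as np

# Simulate y_ij = 143 + u_i + eps_ij with u_i ~ N(0, sigma_u^2)
# and eps_ij ~ N(0, sigma_e^2); values are illustrative placeholders.
rng = np.random.default_rng(3)
d, n = 50, 100                        # d distances (clusters), n measurements each
sigma_u, sigma_e = 3.0, 2.0
u = rng.normal(0, sigma_u, d)         # between-distance random intercepts
eps = rng.normal(0, sigma_e, (d, n))  # within-distance noise
y = 143.0 + u[:, None] + eps

# Empirical decomposition recovers the two variance components:
between = y.mean(axis=1).var(ddof=1)        # approx sigma_u^2 + sigma_e^2 / n
within = y.var(axis=1, ddof=1).mean()       # approx sigma_e^2
```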
Figure 2 shows the mean–variance relationship in our measurements; there is no obvious relationship. The plot was created by computing the means and variances at each distance. It shows that the variance does not change as the mean increases, a property consistent with a Normal model; therefore, we will consider a Normal linear mixed model for predicting the path loss. Note that in our data set most distances have a path loss mean close to 143 dB, which is why the points form a cloud around 143.
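The diagnostic behind Fig. 2 amounts to computing per-distance means and variances. A sketch on synthetic data with constant within-distance variance (the distances, means, and sample sizes below are illustrative, not our measurements):

```python
import numpy as np

def mean_variance_by_distance(distances, pl):
    """Group path loss by distance; return {distance: (mean, variance)}."""
    distances = np.asarray(distances)
    pl = np.asarray(pl, dtype=float)
    out = {}
    for d in np.unique(distances):
        grp = pl[distances == d]
        out[d] = (grp.mean(), grp.var(ddof=1))
    return out

# Synthetic check: variance constant regardless of the mean (Normal-like).
rng = np.random.default_rng(1)
dist = np.repeat([100, 500, 1000, 2855], 200)
mu = {100: 120.0, 500: 135.0, 1000: 143.0, 2855: 150.0}
y = np.array([mu[d] for d in dist]) + rng.normal(0, 2.0, dist.size)
stats = mean_variance_by_distance(dist, y)
```

Plotting the variances against the means from `stats` reproduces the kind of flat cloud seen in Fig. 2 when the Normal assumption holds.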
3.2 Normal mixed model for path loss
The experiment of recording the path loss measurements can be seen as a multivariate data collection where each distance (cluster) i has \(n_i\) observations. It is reasonable to assume the observations are correlated within a cluster, and models need to account for that correlation; models that ignore it have invalid standard errors. At distance i the measurements can be denoted \(\pmb {y_i}=(y_{i1},y_{i2}, \ldots , y_{in_i})^t\), \(i=1,2,\ldots ,d\), where d is the number of distances and \(n_i\) is the number of measurements taken at distance i. A first model for path loss (PL) is given by:
$$\begin{aligned} PL_{i} = \beta _{0} + \beta _{1} \log (\text {distance}_i/100) + u_i + \varepsilon _{i} \end{aligned}$$
(4)
where the \(u_i\) are assumed independent and \(N(0,\sigma ^2_u)\). Often the \(\varepsilon _{i}\) are also assumed independent; however, a more realistic assumption is to allow \((\varepsilon _{i1}, \varepsilon _{i2}, \ldots , \varepsilon _{in_i})\) to be correlated, for example with an autoregressive moving average correlation structure. The model in Eq. (4) assumes a straight-line relationship between path loss and the logarithm of distance; however, examining this relationship closely suggests a cubic function of the logarithm of distance, see Fig. 3.
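Ignoring the random effect, fitting the straight-line model of Eq. (4) reduces to ordinary least squares on \(\log(\text{distance}/100)\). A sketch on simulated data with hypothetical coefficients (the true values 100 and 20 below are placeholders, not estimates from our experiment):

```python
import numpy as np

# Simulate PL = b0 + b1 * log(distance/100) + noise with illustrative
# coefficients b0 = 100, b1 = 20, then recover them by least squares.
rng = np.random.default_rng(4)
distance = rng.uniform(100, 3000, 1000)
x = np.log(distance / 100.0)
pl = 100.0 + 20.0 * x + rng.normal(0, 2.0, distance.size)

X = np.column_stack([np.ones_like(x), x])        # design matrix [1, log(d/100)]
beta, *_ = np.linalg.lstsq(X, pl, rcond=None)    # beta = (b0_hat, b1_hat)
```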
Therefore, our proposed model for path loss is given by:
$$\begin{aligned} PL_{i} = \beta _0 + \beta _1 d_i + \beta _2 d^2_i + \beta _3 d^3_i + u_i + \varepsilon _{i}, \end{aligned}$$
(5)
where \(d_i=\log (\text {distance}_i/100)\), \(u_i \sim N(0, \sigma _u^2)\) is the random component (a random intercept per distance), and \(\varepsilon _t\) follows an \(\textrm{AR}(1)\) process, i.e., \(\varepsilon _t = \phi \varepsilon _{t-1}+z_t\), where \(z_t\) is white noise with \(z_t \sim N(0,\sigma _z^2)\).
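The AR(1) error process can be simulated to check that its lag-1 autocorrelation equals \(\phi\); a minimal sketch:

```python
import numpy as np

def simulate_ar1(n, phi, sigma_z, rng):
    """Simulate eps_t = phi * eps_{t-1} + z_t with z_t ~ N(0, sigma_z^2)."""
    eps = np.zeros(n)
    z = rng.normal(0, sigma_z, n)
    for t in range(1, n):
        eps[t] = phi * eps[t - 1] + z[t]
    return eps

rng = np.random.default_rng(2)
eps = simulate_ar1(20_000, phi=0.4, sigma_z=1.0, rng=rng)

# For an AR(1) process the theoretical lag-1 autocorrelation is phi.
r1 = np.corrcoef(eps[1:], eps[:-1])[0, 1]
```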
The advantage of using linear mixed models for prediction depends on whether the new data points to be predicted come from one of the sites (distances) we have already measured. If so, the proposed model (Eq. 5) should perform very well by using the information we have about that distance. If the new data are not from one of the measured distances, which is often the case, the proposed model should perform at least as well as a fixed-effects model by providing population-level predictions.
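This prediction rule can be sketched as follows: add the estimated random intercept when the distance was seen in training, and fall back to the population-level fixed-effects prediction otherwise. The coefficients and `u_hat` values below are hypothetical placeholders, not fitted values from our data:

```python
import numpy as np

def predict_pl(dist, beta, u_hat):
    """Mixed-model prediction sketch for Eq. (5): cubic fixed part in
    log(distance/100), plus the estimated random intercept u_hat[dist]
    when the distance was seen in training, else 0 (population level)."""
    x = np.log(dist / 100.0)
    fixed = beta[0] + beta[1] * x + beta[2] * x**2 + beta[3] * x**3
    return fixed + u_hat.get(dist, 0.0)

beta = (100.0, 18.0, 1.5, -0.2)           # illustrative coefficients
u_hat = {500.0: 2.3}                       # random intercept for a seen distance

p_seen = predict_pl(500.0, beta, u_hat)    # uses the cluster-specific intercept
p_new = predict_pl(750.0, beta, u_hat)     # population-level prediction
```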
3.3 Model fitting and selection
To select a correlation structure for the model error in Eq. (5), we first divide the data into two sets: training (\(80\%\)) and testing (\(20\%\)). The training set is used to determine the correlation structure of the error term when building our model as follows:

Correlation structures within distances for \(\varepsilon _i\): \(\textrm{AR}(1)\), \(\textrm{AR}(2)\), ARMA(2, 1), and ARMA(1, 2). These choices are based on examining the sample autocorrelation and partial autocorrelation functions, see Fig. 1.

Cross-validation simulations were run with 5 folds and 30 replications to select the best model according to a cost function.

The best model is selected using the mean absolute prediction error (the cost function), defined by:
$$\begin{aligned} {\text {MAPE}}=\frac{1}{N} \sum _{i=1}^N \mid {\text {Actual}}_i - {\text {Predicted}}_i \mid \end{aligned}$$
(6)
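Eq. (6) is straightforward to implement (note that, despite the acronym, it is a mean absolute error rather than a percentage error):

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute prediction error as defined in Eq. (6)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))

# Illustrative path loss values in dB (not from our measurements):
err = mape([143.0, 150.0, 138.0], [141.0, 151.0, 140.0])  # (2 + 1 + 2) / 3
```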
To select a model, we divide the data into training (\(80\%\)) and testing (\(20\%\)) sets, using only the first \(80\%\) of the data points in time to fit the model. The testing set therefore contains all distances: some were used in training the model and others are new to it.
The portion of the testing set with distances new to the model is used to assess the best model and its generalization capacity against the other models. None of the models has seen the testing set before. MAPE is used to compare predictions to actual values; lower MAPE values indicate better models.
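The repeated 5-fold cross-validation scored by Eq. (6) can be sketched as follows. Here `fit` and `predict` are user-supplied callables standing in for the candidate mixed models; the toy straight-line model below is illustrative only, not our actual selection pipeline:

```python
import numpy as np

def repeated_kfold_mape(y, X, fit, predict, k=5, reps=30, seed=0):
    """Repeated k-fold cross-validation scored by the mean absolute
    prediction error of Eq. (6); returns the score averaged over all
    reps * k held-out folds."""
    rng = np.random.default_rng(seed)
    n = len(y)
    scores = []
    for _ in range(reps):
        idx = rng.permutation(n)
        for fold in np.array_split(idx, k):
            train = np.setdiff1d(idx, fold)       # indices not in this fold
            model = fit(X[train], y[train])
            pred = predict(model, X[fold])
            scores.append(np.mean(np.abs(y[fold] - pred)))
    return float(np.mean(scores))

# Toy check with a straight-line model fitted by least squares.
rng = np.random.default_rng(5)
X = rng.uniform(0, 3, 300)
y = 100.0 + 20.0 * X + rng.normal(0, 1.0, 300)
fit = lambda Xt, yt: np.polyfit(Xt, yt, 1)
predict = lambda m, Xv: np.polyval(m, Xv)
score = repeated_kfold_mape(y, X, fit, predict, k=5, reps=30)
```

Candidate models (e.g., the AR(1) versus ARMA error structures) would each supply their own `fit`/`predict` pair, and the structure with the lowest averaged score would be retained.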