As shown in Fig. 1, we consider a wireless edge cache-enabled network in which the BS is equipped with storage and connected via backhaul links to *M* relays {*R*_{m}|*m*=1,2,…,*M*}, each with cache space. Each relay node covers *N* MDs {*MD*_{n}|*n*=1,2,…,*N*}, and the buffer space of each relay node is *C*. The MEC server is located at the BS; it periodically predicts file popularity from the collected historical data and updates the caching strategies of the relay nodes.

### 2.1 System model

Let {*T*_{i}(*α*_{i},*β*_{i},*γ*_{i})|*i*=1,2,…,*I*} denote the information of the computational tasks stored at the BS, where *α*_{i} is the input data size of task *T*_{i}, *β*_{i} is the number of CPU cycles required to accomplish the task, and *γ*_{i} is the size of the computation result of the task. To make maximum use of the limited space at the BS and relays, and to satisfy the needs of most MDs, we need to accurately predict which contents the MDs around the BS will request, and then compute those contents in advance at the BS. We therefore use the cache hit rate to measure prediction performance. In particular, the cache hit rate *P*_{hit} is defined as

$$\begin{array}{*{20}l} P_{hit}=\frac{\sum_{n=1}^{N}\sum_{i=1}^{I} x_{n,i}}{U}, \end{array} $$

(1)

where *U* is the total number of requests sent by users around the BS, and *x*_{n,i} is the caching strategy defined as,

$$ x_{n,i}= \left\{ \begin{array}{ll} 1 & \text{if the result of file } i \text{ requested by user } n \text{ is cached at the BS}, \\ 0 & \text{otherwise}. \end{array} \right. $$

(2)
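Eqs. (1) and (2) can be illustrated with a minimal sketch: counting how many requests in a log fall on files whose results are already cached. The request log and cached set below are illustrative assumptions, not values from the paper.

```python
# Hedged sketch of the cache hit rate in Eq. (1): the number of requests
# whose results are already cached (sum of x_{n,i}), divided by the total
# number of requests U.

def cache_hit_rate(requests, cached_files):
    """requests: list of (user n, file i) pairs; cached_files: set of file ids."""
    U = len(requests)                                            # total requests
    hits = sum(1 for (_n, i) in requests if i in cached_files)   # sum of x_{n,i}
    return hits / U if U > 0 else 0.0

requests = [(1, 3), (1, 5), (2, 3), (3, 7), (3, 3)]  # hypothetical request log
cached = {3, 5}                                      # hypothetical BS cache
print(cache_hit_rate(requests, cached))              # 4 hits out of 5 -> 0.8
```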

The BS determines which files users are likely to request by predicting file popularity, computes these files at the edge server, and then sends the results to the corresponding relay nodes for caching. Based on Shannon's theorem, the data rate of the wireless link between the BS and relay *R*_{m} is given by

$$\begin{array}{*{20}l} C_{B,m}=W_{B,m}\log_{2}\left(1+\frac{P_{B,m}|h_{B,m}|^{2}}{\sigma_{B,m}^{2}}\right), \end{array} $$

(3)

where *W*_{B,m} denotes the wireless bandwidth and \(h_{B,m}\sim \mathcal {CN}(0,\epsilon _{B,m})\) denotes the channel gain between the BS and relay *R*_{m} [20–22]. *P*_{B,m} is the transmit power at the BS, and \(\sigma _{B,m}^{2}\) is the variance of the additive white Gaussian noise at the receiving relay [23–25]. Let *f*_{B} denote the computational capability of the BS; the computational latency can then be expressed as

$$\begin{array}{*{20}l} L_{compute}^{i} = \frac{\beta_{i}}{f_{B}}. \end{array} $$

(4)

The transmission latency incurred when the BS sends the task result *T*_{i} to relay *R*_{m} can be calculated as

$$\begin{array}{*{20}l} L_{B,m}^{i} = \frac{\gamma_{i}}{C_{B,m}}. \end{array} $$

(5)
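Eqs. (3)–(5) can be sketched numerically as follows. All parameter values (bandwidth, transmit power, channel gain, noise variance, CPU frequency, task sizes) are illustrative assumptions, not values from the paper.

```python
import math

# Hedged sketch of Eqs. (3)-(5): Shannon rate of the BS-relay link,
# computation latency at the BS, and transmission latency of the result.

def shannon_rate(W, P, h_gain_sq, noise_var):
    """Eq. (3): C = W * log2(1 + P*|h|^2 / sigma^2), in bit/s."""
    return W * math.log2(1.0 + P * h_gain_sq / noise_var)

W_Bm = 10e6      # bandwidth W_{B,m}: 10 MHz (assumed)
P_Bm = 1.0       # transmit power P_{B,m}: 1 W (assumed)
h_sq = 1e-7      # channel gain |h_{B,m}|^2 (assumed)
sigma2 = 1e-9    # noise variance sigma_{B,m}^2 (assumed)

C_Bm = shannon_rate(W_Bm, P_Bm, h_sq, sigma2)   # link rate, bit/s

beta_i = 1e9     # CPU cycles required by task T_i (assumed)
f_B = 10e9       # BS computational capability: 10 GHz (assumed)
gamma_i = 2e6    # result size gamma_i: 2 Mbit (assumed)

L_compute = beta_i / f_B      # Eq. (4): computation latency at the BS
L_transmit = gamma_i / C_Bm   # Eq. (5): BS-to-relay transmission latency
```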

Similarly, based on Shannon's theorem, the data rate of the wireless link between relay *R*_{m} and user *MD*_{n} is given by

$$\begin{array}{*{20}l} C_{m,n}=W_{m,n}\log_{2}\left(1+\frac{P_{m,n}|h_{m,n}|^{2}}{\sigma_{m,n}^{2}}\right), \end{array} $$

(6)

where *W*_{m,n} denotes the wireless bandwidth and \(h_{m,n}\sim \mathcal {CN}(0,\epsilon _{m,n})\) denotes the channel gain between relay *R*_{m} and user *MD*_{n}. *P*_{m,n} is the transmit power at relay *R*_{m}, and \(\sigma _{m,n}^{2}\) is the variance of the additive white Gaussian noise at the receiving user. The transmission latency incurred when relay *R*_{m} sends the task result *T*_{i} to user *MD*_{n} can be calculated as

$$\begin{array}{*{20}l} L_{m,n}^{i} = \frac{\gamma_{i}}{C_{m,n}}. \end{array} $$

(7)

In the proposed scenario, the BS uses its idle time to compute tasks and, according to the prediction, transmits the task results to nearby relays in advance. When a user requests a task result, the user can obtain the corresponding result from a nearby relay without waiting. The waiting latency is mainly composed of the computation latency at the BS and the transmission latency from the BS to the relay. We reduce the user's waiting latency by increasing the predictive cache hit rate, so the latency reduction can be expressed as

$$\begin{array}{*{20}l} L_{re}^{i} = x_{n,i} \left(L_{compute}^{i} + L_{B,m}^{i} \right). \end{array} $$

(8)

A higher \(L_{re}^{i}\) indicates a higher cache hit rate.
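Eq. (8) amounts to gating the saved compute-plus-backhaul latency by the caching decision; a minimal sketch, with hypothetical latency values:

```python
# Minimal sketch of Eq. (8): the latency a user avoids when the requested
# result was proactively cached (x_{n,i} = 1). The 0.10 s compute latency
# and 0.03 s backhaul latency below are hypothetical.

def latency_reduction(x_ni, L_compute_i, L_Bm_i):
    """Eq. (8): saved latency = compute latency + backhaul latency, on a hit."""
    return x_ni * (L_compute_i + L_Bm_i)

print(latency_reduction(1, 0.10, 0.03))  # cache hit: compute + backhaul saved
print(latency_reduction(0, 0.10, 0.03))  # cache miss: nothing saved
```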

### 2.2 Problem formulation

The problem in this study consists of two subproblems: maximizing cache hit ratio and minimizing request latency.

#### 2.2.1 Maximizing cache hit ratio

The problem of making the best use of the limited cache space of the BS can be translated into the problem of maximizing the cache hit rate, which can be written as

$$\begin{array}{*{20}l} \max \quad &P_{hit} \end{array} $$

(9a)

$$\begin{array}{*{20}l} \text{s.t.} \quad &C_{1}: \sum_{i=1}^{I} T_{i} \leq L, \end{array} $$

(9b)

where *L* is the maximum cache space of the BS. *C*_{1} indicates that the files cached at the BS cannot exceed the cache space limit of the BS. Accordingly, *P*_{hit} increases as \(L_{re} = \sum _{i} L_{re}^{i} \) increases.
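One simple heuristic for problem (9) is a greedy selection that favors files with high predicted popularity per unit of cache space, stopping when constraint *C*_{1} would be violated. This is only an illustrative sketch, not the paper's method; the popularity counts and result sizes below are assumptions.

```python
# Hedged greedy sketch of problem (9): cache the task results with the
# best predicted-popularity-to-size ratio until the BS cache space L is
# exhausted (constraint C_1).

def greedy_cache(popularity, size, L):
    """popularity[i]: predicted requests for file i; size[i]: result size of i."""
    order = sorted(popularity, key=lambda i: popularity[i] / size[i], reverse=True)
    cached, used = set(), 0
    for i in order:
        if used + size[i] <= L:   # constraint C_1: stay within cache space L
            cached.add(i)
            used += size[i]
    return cached

popularity = {1: 50, 2: 30, 3: 20, 4: 5}   # hypothetical predicted request counts
size = {1: 4, 2: 2, 3: 3, 4: 1}            # hypothetical result sizes
print(greedy_cache(popularity, size, L=6)) # files 2 and 1 fill the cache
```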

#### 2.2.2 Minimizing request latency

When there are multiple relays around a user, the BS needs to decide which relay should receive the results after prediction, so that the user takes less time to obtain the file results. From (6), we can see that by considering the transmission channel conditions and bandwidth between the relays and the user, user *MD*_{n} can obtain the results of task *T*_{i} in less time. For the considered system, the latency objective in the content placement process can be expressed as

$$\begin{array}{*{20}l} \min \quad L_{pl} = \sum_{m=1}^{M}\sum_{n=1}^{N}\sum_{i=1}^{I} L_{m,n}^{i}. \end{array} $$

(10)
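For a single user, minimizing the per-task term of (10) reduces to pushing the result to the relay with the highest downlink rate *C*_{m,n}, since the latency is *γ*_{i}/*C*_{m,n}. A hedged sketch, with all link parameters as illustrative assumptions:

```python
import math

# Hedged sketch of the placement objective (10): pick, per user, the relay
# whose Eq. (6) rate is highest, which minimizes gamma_i / C_{m,n}.

def rate(W, P, h_sq, sigma2):
    """Eq. (6): Shannon rate W * log2(1 + P*|h|^2 / sigma^2) of a relay-user link."""
    return W * math.log2(1.0 + P * h_sq / sigma2)

def best_relay(links, gamma_i):
    """links: {relay m: (W, P, |h|^2, sigma^2)}; returns (best m, its latency)."""
    latencies = {m: gamma_i / rate(*p) for m, p in links.items()}
    m = min(latencies, key=latencies.get)
    return m, latencies[m]

links = {1: (10e6, 1.0, 1e-7, 1e-9),   # relay 1: receive SNR = 100 (assumed)
         2: (10e6, 1.0, 4e-7, 1e-9)}   # relay 2: receive SNR = 400 (assumed)
print(best_relay(links, gamma_i=2e6))  # relay 2 has the higher rate
```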

The conventional approach to predictive modeling involves complex hand-crafted feature engineering and deep manual analysis of the data. In this paper, we aim to improve the hit rate by learning user preferences from large amounts of historical data. A deep neural network (DNN) [26–28] based predictive framework is developed to learn user preferences. We can feed the data directly into the network without manual processing and mine more information from the data. We therefore use a DNN to solve the content popularity prediction problem, which is given as follows.
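The shape of such a predictor can be sketched as a small feed-forward network mapping a historical-request feature vector to a probability distribution over the *I* files. The architecture, layer sizes, and random (untrained) weights below are illustrative assumptions; training, e.g. by cross-entropy loss and gradient descent, is omitted.

```python
import math
import random

# Minimal sketch of a DNN-style popularity predictor: one ReLU hidden
# layer followed by a softmax over the I candidate files. Weights are
# random placeholders standing in for a trained model.

random.seed(0)
D, H, I = 4, 6, 3   # feature dim, hidden units, number of files (assumed)

def mat(rows, cols):
    """Random weight matrix as a list of lists."""
    return [[random.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]

W1, W2 = mat(D, H), mat(H, I)

def predict_popularity(x):
    """Forward pass: ReLU hidden layer, softmax over the I files."""
    h = [max(0.0, sum(x[d] * W1[d][j] for d in range(D))) for j in range(H)]
    z = [sum(h[j] * W2[j][k] for j in range(H)) for k in range(I)]
    m = max(z)
    e = [math.exp(v - m) for v in z]      # numerically stable softmax
    s = sum(e)
    return [v / s for v in e]

p = predict_popularity([1.0, 0.5, -0.2, 0.3])  # predicted popularity per file
```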