Content popularity prediction for cache-enabled wireless B5G networks

In this paper, we study the cache prediction problem for mobile edge networks with one base station (BS) and multiple relays. For the considered mobile edge computing (MEC) network, we propose a cache prediction framework that solves the content prediction and caching problems by means of neural networks and relay selection, exploiting users' historical request data and the channels between the relays and users. The framework is trained to learn users' preferences from their historical request data, and several caching policies are proposed based on the channel conditions. The cache hit rate and latency are used to measure the performance of the proposed framework. Simulation results demonstrate the effectiveness of the proposed framework, which increases the cache hit rate while reducing the latency of the considered MEC networks.

(2021) 2021:69
offloading strategy in the system of multiple users and multiple computational access nodes.
In a mobile edge network, when users send content requests to a remote server through the central base station (BS), a data transfer between the BS and the remote content-providing server is required. When a large number of users send requests in a short period of time, this process places tremendous pressure on the network and degrades the user experience. By combining caching with mobile edge networks, we can reduce transmission latency and energy consumption, avoid repeated transmission of the same content, and improve transmission efficiency and the user experience. In [14], the authors combined caching with mobile edge networks to investigate a cache-assisted MEC network. Their results showed that wireless caching can effectively reduce transmission time and can significantly mitigate the impact of increasing computational task size. In [15][16][17], the authors considered the communication, caching, and computation problems in multi-user cache-assisted MEC systems and proposed a joint caching and offloading scheme.
To cope with the explosive growth of data traffic and user demand, caching has become a promising technology in recent years and has received considerable attention for reducing network traffic and alleviating the backhaul load. Because a huge cache space at the BSs and relays increases vendors' costs in practice, we generally consider limited cache space in the MEC scenario. In [18], the authors considered the caching problem in MEC scenarios with limited storage. Using edge caching technology, we can shift a large amount of data traffic to the edge of the network, such as access points, small cell base stations, and mobile users. We can also use various wireless access technologies to improve the spectrum efficiency, coverage, and capacity of the network in order to reduce communication costs. In [19], the authors investigated the problem of content popularity prediction in fog radio access networks and used deep learning to predict content popularity. It has also been shown that reinforcement learning can be used to find the optimal cache replacement policy with the aim of maximizing the cache hit ratio.
In this paper, we propose a cache prediction framework for the MEC network to maximize the cache hit rate and reduce latency. First, we consider a wireless MEC network with a BS and several relays and address the problems of content prediction and caching policy. By analyzing the communication cost, we formulate the problem as two subproblems: maximizing the cache hit rate and minimizing the request latency. Next, we perform content prediction by training deep neural networks to learn the preferences of users around the BS. We then analyze the relay-to-user channels to determine where the requested content is placed and which caching policy is used. Finally, simulation results demonstrate that the proposed approach improves the cache hit rate and reduces system latency.
The rest of this article is organized as follows. Section 2 presents the system model and formulates the problem for the considered system. Section 3 describes the proposed solution framework. Section 4 presents the simulation results, and Section 5 concludes the paper.

System model and problem formulation
As shown in Fig. 1, we consider a cache-enabled wireless edge network in which the BS is equipped with storage and is connected through backhaul links to $M$ relays $\{R_m \mid m = 1, 2, \ldots, M\}$, each with its own cache space. The relay nodes cover $N$ MDs (users) $\{\mathrm{MD}_n \mid n = 1, 2, \ldots, N\}$, and the cache space of each relay node is $C$. The MEC server is located at the BS; it regularly predicts the popularity of files from the collected historical data and updates the caching strategies of the relay nodes.

System model
Let $\{T_i = (\alpha_i, \beta_i, \gamma_i) \mid i = 1, 2, \ldots, I\}$ denote the information of the computational tasks stored at the BS, where $\alpha_i$ is the size of the input of the computational task, $\beta_i$ is the number of CPU cycles required to accomplish the task, and $\gamma_i$ denotes the size of the computation result of the task. To make the best use of the limited space of the BS and relays and to satisfy the needs of most MDs, we need to accurately predict the content requested by the MDs around the BS and then compute this content in advance at the BS. Therefore, we use the cache hit rate to measure the performance of the prediction. In particular, the cache hit rate $P_{\mathrm{hit}}$ is defined as

$$P_{\mathrm{hit}} = \frac{1}{U} \sum_{n=1}^{N} \sum_{i=1}^{I} x_{n,i}, \tag{1}$$

where $U$ is the total number of requests sent by users around the BS, and $x_{n,i}$ is the caching strategy defined as

$$x_{n,i} = \begin{cases} 1, & \text{if the task } T_i \text{ requested by } \mathrm{MD}_n \text{ is cached}, \\ 0, & \text{otherwise}. \end{cases} \tag{2}$$

The BS obtains the files that users may request by predicting the file popularity, computes these files at the edge server, and then sends the results to the corresponding relay node for caching. We assume that the data rate of the wireless link between the BS and relay $R_m$, based on Shannon theory, is given by

$$C_{B,m} = W_{B,m} \log_2\left(1 + \frac{P_B |h_{B,m}|^2}{\sigma_B^2}\right), \tag{3}$$

where $W_{B,m}$ denotes the wireless bandwidth and $h_{B,m} \sim \mathcal{CN}(0, \epsilon_{B,m})$ denotes the channel gain between the BS and relay $R_m$ [20][21][22]. $P_B$ is the transmit power at the BS and $\sigma_B^2$ is the variance of the additive white Gaussian noise at the BS [23][24][25]. Let $f_B$ denote the computational capability of the BS; the computational latency of task $T_i$ can then be expressed as

$$L_i^{\mathrm{comp}} = \frac{\beta_i}{f_B}. \tag{4}$$

The transmission latency caused by the BS sending the result of task $T_i$ to the relay $R_m$ can be calculated as

$$L_{B,m}^{i} = \frac{\gamma_i}{C_{B,m}}. \tag{5}$$

Similarly, the data rate of the wireless link between the relay $R_m$ and the user $\mathrm{MD}_n$, based on Shannon theory, is given by

$$C_{m,n} = W_{m,n} \log_2\left(1 + \frac{P_{m,n} |h_{m,n}|^2}{\sigma_{m,n}^2}\right), \tag{6}$$

where $W_{m,n}$ denotes the wireless bandwidth and $h_{m,n} \sim \mathcal{CN}(0, \epsilon_{m,n})$ denotes the channel gain between the relay $R_m$ and the user $\mathrm{MD}_n$. $P_{m,n}$ is the transmit power at the relay $R_m$ and $\sigma_{m,n}^2$ is the variance of the additive white Gaussian noise at the relay $R_m$.

The transmission latency caused by the relay $R_m$ sending the result of task $T_i$ to the user $\mathrm{MD}_n$ can be calculated as

$$L_{m,n}^{i} = \frac{\gamma_i}{C_{m,n}}. \tag{7}$$

In this paper, the BS uses its idle time to compute the tasks and, according to the prediction, transmits the results to the nearby relay in advance. When a user requests a task result, the user can obtain the corresponding result from the nearby relay without waiting. The waiting latency is mainly caused by the computation latency at the BS and the transmission latency from the BS to the relay, so a higher predictive cache hit rate reduces the waiting latency of the users. The latency reduction for task $T_i$ can therefore be expressed as

$$L_{\mathrm{re}}^{i} = \sum_{n=1}^{N} x_{n,i} \left( L_i^{\mathrm{comp}} + L_{B,m}^{i} \right), \tag{8}$$

where $R_m$ is the relay serving $\mathrm{MD}_n$. A higher $L_{\mathrm{re}}^{i}$ indicates a higher cache hit rate.
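To make the rate and latency expressions concrete, the following Python sketch evaluates a Shannon-type link rate and the resulting computation and transmission latencies. It assumes the common form W log2(1 + P|h|^2/σ^2) for the rate. The function names and all numeric values other than the 10 W and 5 W transmit powers (bandwidth, channel gains, noise variances, task size, CPU capability) are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
import math


def shannon_rate(bandwidth_hz, tx_power_w, gain_abs2, noise_var_w):
    """Link rate in bit/s, assuming the form W * log2(1 + P * |h|^2 / sigma^2)."""
    return bandwidth_hz * math.log2(1.0 + tx_power_w * gain_abs2 / noise_var_w)


def computation_latency(cpu_cycles, cpu_hz):
    """Seconds the BS needs to execute a task of cpu_cycles cycles."""
    return cpu_cycles / cpu_hz


def transmission_latency(result_bits, rate_bps):
    """Seconds needed to send a task result of result_bits over a link."""
    return result_bits / rate_bps


# Hypothetical task: beta_i = 2e9 CPU cycles, gamma_i = 8e6 bits of result.
beta_i, gamma_i = 2e9, 8e6
f_b = 4e9  # assumed computational capability of the BS in cycles/s

# Assumed link parameters; the SNRs come out as 15 and 7.
rate_bs_relay = shannon_rate(1e6, 10.0, 0.75, 0.5)  # BS -> relay
rate_relay_user = shannon_rate(1e6, 5.0, 1.4, 1.0)  # relay -> user

# Waiting latency avoided when the result is already cached at the relay.
saved_latency = (computation_latency(beta_i, f_b)
                 + transmission_latency(gamma_i, rate_bs_relay))
# Latency that remains even on a cache hit: the relay-to-user delivery.
delivery_latency = transmission_latency(gamma_i, rate_relay_user)

print(f"BS->relay rate: {rate_bs_relay:.3e} bit/s")
print(f"saved on a hit: {saved_latency:.3f} s, delivery: {delivery_latency:.3f} s")
```

The split between `saved_latency` and `delivery_latency` mirrors the argument in the text: prediction removes the computation and BS-to-relay terms from the waiting time, while the relay-to-user term depends on which relay holds the result.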

Problem formulation
The problem in this study consists of two subproblems: maximizing cache hit ratio and minimizing request latency.

Maximizing cache hit ratio
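The problem of making the best use of the cache space of the BS can be translated into the problem of maximizing the cache hit rate, which can be written as

$$\max_{\{x_{n,i}\}} \; P_{\mathrm{hit}} \quad \text{s.t.} \quad C_1: \left|\{\, i : T_i \text{ is cached at the BS} \,\}\right| \le L, \tag{9}$$

where $L$ is the maximum cache space of the BS. $C_1$ indicates that the number of files cached at the BS cannot exceed the cache space limit of the BS. Similarly, $P_{\mathrm{hit}}$ increases as the total latency reduction $L_{\mathrm{re}} = \sum_{i=1}^{I} L_{\mathrm{re}}^{i}$ increases.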
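As an illustration of the hit-rate objective and the cache-size constraint, the sketch below computes the fraction of requests served from a cache and fills the cache with the highest-scoring tasks. The request log, the task names, and the helper functions are hypothetical. Ranking by observed request counts stands in for the predicted popularity that the framework would supply.

```python
from collections import Counter


def cache_hit_rate(requests, cached_tasks):
    """Fraction of (user, task) requests whose task is in the cache."""
    if not requests:
        return 0.0
    hits = sum(1 for _user, task in requests if task in cached_tasks)
    return hits / len(requests)


def fill_cache(popularity, cache_size):
    """Pick at most cache_size tasks with the highest popularity score."""
    ranked = sorted(popularity, key=popularity.get, reverse=True)
    return set(ranked[:cache_size])


# Hypothetical request log: (user index n, task identifier).
requests = [(0, "T1"), (1, "T1"), (2, "T2"), (0, "T3"), (1, "T1"), (2, "T2")]

# Stand-in for predicted popularity: observed request counts per task.
popularity = Counter(task for _user, task in requests)

cache_size = 2  # assumed cache space limit of the BS, counted in tasks
cached = fill_cache(popularity, cache_size)

assert len(cached) <= cache_size  # the cache-space constraint holds
print(sorted(cached), cache_hit_rate(requests, cached))
```

Because every task is assumed to occupy one unit of cache space, keeping the highest-scoring tasks maximizes the hit rate for a given score; with unequal task sizes the selection would become a knapsack-type problem.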

Minimizing request latency
When there are multiple relays around the user, the BS needs to decide, after the prediction, which relay should store and send the results so that the user takes less time to obtain them. From (6), we can see that the time the user $\mathrm{MD}_n$ takes to obtain the result of task $T_i$ depends on the channel condition and bandwidth between the relays and the user. For the considered system, the latency objective of the content placement process can be expressed as

$$\min_{m} \; L_{\mathrm{pl}} = \sum_{n=1}^{N} \sum_{i=1}^{I} x_{n,i} \frac{\gamma_i}{C_{m,n}}, \tag{10}$$

where $C_{m,n}$ is the relay-to-user data rate in (6) and the minimization is over the relay $R_m$ that serves each request.

The conventional approach to predictive modeling involves complex feature engineering and in-depth manual analysis of the data. In this paper, we aim to improve the hit rate by learning user preferences from a large amount of historical data. To this end, a predictive framework based on a deep neural network (DNN) [26][27][28] is developed to learn user preferences. With a DNN, we can feed the data directly into the network without manual processing and mine more information from the data. We therefore use a DNN to solve the problem of content popularity prediction, as described in the following.

The proposed problem solution method
In this paper, we propose a framework to solve the problems of content prediction and content caching in the MEC system. Specifically, the BS adopts a DNN to make cache predictions after collecting the users' information and caches the corresponding files according to the locations of the users and the BS. The proposed framework is described in Algorithm 1.

Neural network for request prediction
First, users need to send requests to the nearby BS when they perform online activities, such as browsing and shopping. The BS can observe and record historical information about the users' request behavior. In our system model, the BS collects user information, which includes user attributes and requested content. From the information collected by the BS, we can mine the users' request behavior.
Second, we need to preprocess the collected user information. When the user and item datasets contain categorical features, the values of these features are discrete rather than continuous, so we need to convert them into numbers. Categorical features are commonly converted into one-hot encodings. However, because a categorical feature can take many distinct values, this approach leads to sparse encodings and huge input dimensions. Therefore, when preprocessing the user and item datasets in this paper, we convert the categorical features into integers, which are used as indices into an embedding matrix of size $(K, L)$, where $K$ is the number of categorical features and $L$ is the size of the input layer. The embedding matrix is applied before the training phase to transform each positive-integer input into a fixed-size vector.
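The difference between one-hot encoding and an embedding lookup can be sketched in a few lines of Python. The example below maps hypothetical categorical values (movie genres) to integer indices, builds a small randomly initialized embedding matrix, and shows that looking up a row gives the same vector as multiplying the one-hot vector by the matrix. The genre list, the embedding width of 4, the initialization range, and all function names are assumptions for illustration. In a trained model the matrix entries would be learned.

```python
import random


def build_index(values):
    """Map each distinct categorical value to an integer (first-seen order)."""
    index = {}
    for value in values:
        index.setdefault(value, len(index))
    return index


def one_hot(idx, size):
    """Sparse encoding: a vector of length size with a single 1."""
    return [1.0 if j == idx else 0.0 for j in range(size)]


def init_embedding(num_values, dim, seed=0):
    """Small random matrix with one row of length dim per categorical value."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(dim)]
            for _ in range(num_values)]


genres = ["comedy", "drama", "action", "drama", "comedy"]  # hypothetical data
index = build_index(genres)
num_values, dim = len(index), 4
embedding = init_embedding(num_values, dim)

# Embedding lookup: the integer index selects one dense row.
drama_vec = embedding[index["drama"]]

# The same vector obtained the expensive way, as one_hot(idx) times the matrix.
hot = one_hot(index["drama"], num_values)
via_one_hot = [sum(hot[k] * embedding[k][j] for k in range(num_values))
               for j in range(dim)]

print(index, len(hot), len(drama_vec))
```

The lookup avoids materializing the one-hot vector, whose length grows with the number of distinct values, while the embedding width stays fixed.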
We then generate the training and testing samples from the preprocessed user and item data. Figure 2 shows the proposed content prediction framework. We use two independent neural networks to extract the user features $\{u_1, u_2, \ldots, u_N\}$ and the file features $\{r_1, r_2, \ldots, r_I\}$, and then feed the extracted user and file features into the next network. Next, we obtain the feature matrices of the MDs and the content, each of size $(1, L)$. The activation function of the output layer is ReLU, which from [29] is generally defined as

$$f(x) = \max(0, x). \tag{11}$$

The preference values of the MDs are generated by the inner product of the feature matrices. In this work, we need to predict not a predefined category but an arbitrary real number, which is a regression problem. Neural networks for regression problems generally have only one output node, and the output value of this node is the predicted value. The loss function reflects the gap between the trained network model and the actual data. We gradually adjust the network to reduce the loss and thereby make the prediction model more accurate. For regression problems, we use the mean squared error (MSE) in this paper, which is the expected value of the squared difference between the estimated value and the true value. The MSE loss function can be obtained from [30] as

$$\mathrm{MSE} = \frac{1}{S} \sum_{s=1}^{S} \left( y_s - f(x_s) \right)^2, \tag{12}$$

where $y_s$ is the true value and $f(x_s)$ is the predicted value of the model for the $s$-th of $S$ samples.

Finally, we can use the trained network to evaluate new computational tasks. If a new task has high popularity, it matches the preferences of most users around the BS and is likely to be requested in the future, so it can be computed and cached in advance by the BS with the MEC server.
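A forward pass of the prediction model described above can be sketched as follows: two small fully connected layers with ReLU produce a user feature vector and a file feature vector, their inner product gives the predicted preference, and the MSE compares predictions with true values. The layer sizes, weights, inputs, target values, and function names are hypothetical, and training (the gradient updates that reduce the loss) is omitted.

```python
def relu(x):
    """ReLU activation: max(0, x)."""
    return max(0.0, x)


def dense_relu(inputs, weights, biases):
    """One fully connected layer with ReLU; weights[j] feeds output unit j."""
    return [relu(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def preference(user_feat, file_feat):
    """Predicted preference: inner product of the two feature vectors."""
    return sum(u * r for u, r in zip(user_feat, file_feat))


def mse(y_true, y_pred):
    """Mean squared error between true and predicted values."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical user tower: 2 inputs -> 2 features.
user_feat = dense_relu([1.0, 2.0], [[0.5, 0.5], [1.0, -1.0]], [0.0, 0.0])
# Hypothetical file tower: 2 inputs -> 2 features.
file_feat = dense_relu([2.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [0.0, 1.0])

score = preference(user_feat, file_feat)

# Loss over two samples: this prediction and a second one assumed exact.
loss = mse([4.0, 2.0], [score, 2.0])
print(user_feat, file_feat, score, loss)
```

Here the second unit of the user tower receives a negative pre-activation and is clipped to zero by ReLU, so only the first feature contributes to the inner product.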
In this paper, the proposed prediction framework uses fully connected DNN layers to learn the users' preferences. The computational complexity of the framework comes from the matrix operations of the fully connected layers. Therefore, the total computational complexity of the proposed framework is $O\left(L \sum_{j=1}^{J} K_{j-1} K_j\right)$, where $L$ is the input dimension, $J$ is the number of network layers, and $K_j$ represents the number of neurons in the $j$th layer ($1 \le j \le J$).

Caching strategy
After predicting the content and computing the results at the BS, we select the relays at which the results from the BS are stored. We propose a cache policy for storing the results so that the users can obtain the requested task results in less time. For the result of task $T_i$, the BS selects the relay according to the channel conditions between the relays and the user who requests the task $T_i$. When there are $M$ relays around the user $\mathrm{MD}_n$ who requests the task $T_i$, the transmission data rate of each relay can be obtained from (6), and the BS places the result at the relay with the highest data rate to the user,

$$m^{*} = \arg\max_{1 \le m \le M} C_{m,n}, \tag{13}$$

where $C_{m,n}$ is the data rate between relay $R_m$ and user $\mathrm{MD}_n$. The cache policy is then obtained according to (13).
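The relay selection step can be illustrated with a short Python sketch that draws Rayleigh-faded channel power gains, computes the relay-to-user rates, and picks the relay with the highest rate. Choosing the best-rate relay follows the statement in the results that task results are placed at the relay with the best channel to the intended user. The bandwidth, noise variance, unit channel variance, result size, random seed, and function names are assumptions for illustration only.

```python
import math
import random


def link_rate(bandwidth_hz, tx_power_w, gain_abs2, noise_var_w):
    """Shannon-type rate in bit/s for one relay-to-user link."""
    return bandwidth_hz * math.log2(1.0 + tx_power_w * gain_abs2 / noise_var_w)


def select_relay(rates):
    """Index of the relay whose link to the user has the highest rate."""
    return max(range(len(rates)), key=lambda m: rates[m])


rng = random.Random(1)  # fixed seed so the sketch is repeatable
num_relays = 4

# Under Rayleigh fading with unit variance, |h|^2 is exponential with mean 1.
gains = [rng.expovariate(1.0) for _ in range(num_relays)]
rates = [link_rate(1e6, 5.0, g, 1.0) for g in gains]

best = select_relay(rates)
worst = min(range(num_relays), key=lambda m: rates[m])

result_bits = 8e6  # hypothetical size of the task result
latency_best = result_bits / rates[best]
latency_worst = result_bits / rates[worst]
print(f"relay {best}: {latency_best:.2f} s, worst relay {worst}: {latency_worst:.2f} s")
```

The `worst` index corresponds to the "Worst" baseline discussed in the results, which places the result at the relay with the poorest channel.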

Results and discussion
In this section, we present the performance of the proposed framework. We used the MovieLens dataset containing over 100,000 ratings from 6000 users on almost 4000 movies. Considering that a user's rating reflects the user's level of preference for a movie, we use the number of high ratings of a movie as the number of users who request it. A movie with higher ratings is one that more users prefer, and highly rated movies are more likely to be requested in the future than those with low ratings. The training phase on the MovieLens dataset is illustrated in Fig. 3. In addition, we assume that the channels in the considered system follow Rayleigh flat fading [31,32] and that all computational tasks have the same size. The transmit powers at the BS and relays are set to 10 W and 5 W, respectively. The bandwidth between the BS and relays  Table 1.

The training process of the proposed prediction framework is shown in Fig. 4, which indicates that the loss decreases as the number of training steps increases and finally converges.

Figure 5 compares the cache hit rate as the cache size of the BS ranges from 50 to 500. We can observe from Fig. 5 that the cache hit rate increases with the cache size of the BS, because the BS can store more computation tasks and is therefore more likely to hit the requests. For comparison, we also plot the "Random" strategy, in which the BS stores tasks randomly without considering users' preferences. The cache hit rate of the proposed framework is higher than that of "Random" when the number of users is 3400. For example, when the cache size is 150, the cache hit rate of the proposed framework is about twice that of random caching. This is because the proposed framework learns users' preferences during training and selects the tasks that the users around the BS are most likely to request.
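The conversion of ratings into request counts and the comparison with the "Random" baseline described above can be sketched as follows. The rating triples are hypothetical stand-ins in the style of MovieLens, and the threshold of 4 for a "high" rating is an assumption, since the text does not state one. Ranking by observed counts again stands in for the popularity predicted by the trained network.

```python
import random
from collections import Counter


def request_counts(ratings, min_rating=4):
    """Treat each rating >= min_rating as one request for that movie."""
    return Counter(movie for _user, movie, score in ratings
                   if score >= min_rating)


def hit_rate(counts, cached):
    """Share of all counted requests that fall on cached movies."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(c for movie, c in counts.items() if movie in cached) / total


# Hypothetical (user, movie, rating) triples.
ratings = [
    (1, "A", 5), (2, "A", 4), (3, "A", 5),
    (1, "B", 4), (2, "B", 2),
    (3, "C", 3),
    (1, "D", 5), (2, "D", 4),
]
counts = request_counts(ratings)
cache_size = 2

# Popularity-ranked cache versus a randomly filled cache of the same size.
ranked_cache = {movie for movie, _ in counts.most_common(cache_size)}
all_movies = sorted({movie for _user, movie, _score in ratings})
random_cache = set(random.Random(0).sample(all_movies, cache_size))

print("ranked:", hit_rate(counts, ranked_cache))
print("random:", hit_rate(counts, random_cache))
```

For equal-sized items, the popularity-ranked cache can never score below a random cache of the same size on the counts it was ranked by, which is the qualitative gap that the comparison in Fig. 5 measures for predicted popularity.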
When the number of users becomes smaller, a given amount of cache space can satisfy a larger share of the users' requests. So, for the proposed framework, the cache hit rate with 1700 users is higher than that with 3400 or 4100 users. Conversely, when the number of users is large, more cache space is needed to meet the users' requests. These results verify that the proposed framework can predict the users' preferences accurately as the cache size varies from 50 to 500.

Table 1 The parameters of simulation

Figure 6 compares the cache hit rate as the number of users ranges from 500 to 3000. We can observe from Fig. 6 that the cache hit rate increases with the number of users. The reason is that, for a fixed storage space, we cache the tasks with the highest predicted popularity. Tasks with higher popularity are wanted by more users, so we have a higher probability of hitting users' requests when the number of users is large. In addition, the hit rate of the proposed prediction strategy remains higher than that of random caching for the same cache size of the BS. For example, when the number of users is 2500, the cache hit rate of the proposed prediction strategy is about 25% higher than that of the random strategy. This is because the BS stores the tasks with higher predicted popularity after learning. With more storage space, the BS is able to store more tasks for a given number of user requests, which increases the probability of satisfying them. So, when the cache sizes are 450 and 500, the cache hit rates of both schemes are higher than when the cache size is 400. This further demonstrates that the proposed strategy is effective for predictive caching.

Figure 7 compares the cache hit rate as the number of tasks ranges from 500 to 1500. From Fig. 7, we observe that the cache hit rate decreases as the number of tasks increases. The reason is that the number of tasks the users request may exceed the limited cache space of the BS. Moreover, for the same cache size of the BS, the hit rate of the proposed prediction caching is higher than that of random caching. For example, when the number of tasks is 700, the cache hit rate of the proposed prediction caching is about twice that of the random strategy. In addition, we find that the cache hit rate is also related to the cache size of the BS. When the cache sizes are 450 and 500, the cache hit rates of both schemes are higher than with cache size 400. This is because a limited cache space can store only a fixed number of tasks and satisfy a limited number of user requests. The results show that the cache hit rate is affected by the number of tasks and the cache space.

Figure 8 shows the latency reduction as the cache size of the BS ranges from 50 to 500. For comparison, we plot the results of "Random" and "Without prediction," where "Random" indicates that the BS selects tasks randomly, while "Without prediction" indicates that the BS cannot store tasks. We observe from Fig. 8 that the latency reduction increases with the cache size of the BS. This is because the BS can store more computation tasks and compute them during idle time. The more tasks are stored at the BS, the more time the considered system can save. In particular, the latency reduction of the proposed predictive caching is higher than those of both the "Random" and "Without prediction" strategies when the number of users is 3400. From Fig. 8, we can see that when the number of users is 3400 and the cache size is 200, the $L_{\mathrm{re}}$ of prediction caching is about 70% higher than that of "Random." This is because the BS stores and computes the tasks in advance, which reduces the computation and transmission latency. The proposed strategy can select tasks according to user preferences, while "Random" cannot. When the number of users becomes smaller, the latency reduction for the whole system is lower than that of the prediction scheme with a larger number of users. In contrast, the more cache space the BS has, the more the latency is reduced. These results verify that the proposed strategy reduces the system latency effectively for different cache sizes of the BS.

Figure 9 shows the latency reduction as the number of users ranges from 500 to 3000. We observe from Fig. 9 that the latency reduction increases with the number of users. The reason is that more users send more requests, so when the BS stores computation tasks and computes them during idle time, the latency reduction of the considered system increases. In particular, the latency reduction of the proposed predictive caching is the highest compared with the "Random" and "Without prediction" strategies when the cache size of the BS is 400. For example, prediction caching saves the most time when the number of users is 2500, about 20% more than "Random." This is because the considered system predicts and stores the content most likely to be requested by users, effectively reducing latency. In addition, when the cache sizes of the BS are 450 and 500, the latency reductions of both schemes are higher than with cache size 400. Therefore, the more cache space the BS has, the more the latency is reduced. The results show that the latency reduction is affected by the number of users and the cache size.
Figure 10 shows the latency reduction as the number of tasks ranges from 500 to 1500. From Fig. 10, we observe that the latency reduction increases with the number of tasks. The reason is that when the BS stores computation tasks and computes them in advance, the users of the considered system experience less latency in obtaining the results they want. In particular, the latency reductions of the "Random" and "Without prediction" strategies are lower than that of the proposed predictive caching when the cache size of the BS is 400. For example, prediction caching saves the most time when the number of tasks is 1100, about 55% more than "Random." This is because, by predicting, storing, and computing the content most likely to be requested by the users, the proposed system takes less time. In addition, the latency reduction is related to the cache size of the BS. When the cache size of the BS is 400, the latency reduction is lower than those of the schemes with cache sizes 450 and 500. This is because a bigger cache means that more tasks are computed at the BS after prediction, so the system can satisfy more of the requests sent by the users. The latency reduction for the whole system therefore becomes higher as the cache size becomes larger. This further confirms that the proposed predictive caching is effective.

Figure 11 compares the latency $L_{\mathrm{pl}}$ as the cache size of each relay ranges from 50 to 150. For comparison, we plot the results of "LCD" and "Worst," where "LCD" denotes the largest content diversity placement strategy, while "Worst" indicates that the task results are placed on the relay with the worst channel. We can observe from Fig. 11 that the latency $L_{\mathrm{pl}}$ increases with the cache size of each relay. The reason is that the relays can store more task results and the users can obtain more task results from the relays, so the considered system spends more time delivering them.

In addition, the latency $L_{\mathrm{pl}}$ of the proposed placement strategy is lower than those of "LCD" and "Worst." Comparing the three methods, when the cache size of each relay is 110, the proposed placement strategy has the lowest latency, about 50% lower than "LCD" and about 70% lower than "Worst." This is because we place each task result at the relay with the best channel to the intended user. These results confirm that the proposed placement strategy can reduce system latency effectively.

Conclusions
In this paper, we considered the problem of predictive caching in the proposed MEC network. The problems of content prediction and content caching are solved by the proposed framework in the considered MEC system. We used the cache hit rate and latency to measure the performance of the prediction. Specifically, we used a neural network for request prediction, which was trained to learn users' preferences. The cache policies were obtained from the channel conditions between the relays and users. Simulation results showed that the proposed framework improves the cache hit rate and reduces system latency. In the future, we will continue to focus on cache prediction and explore other wireless technologies to extend our MEC model. Moreover, we will incorporate other intelligent algorithms, such as deep learning based algorithms [33][34][35] or federated learning based algorithms [36][37][38], into the considered systems in order to further enhance the MEC system performance.