Computing resource allocation scheme of IoV using deep reinforcement learning in edge computing environment

With the emergence and development of 5G technology, Mobile Edge Computing (MEC) has been closely integrated with Internet of Vehicles (IoV) technology, which can effectively support and improve network performance in IoV. However, the high-speed mobility of vehicles and the diversity of communication quality make computing task offloading strategies more complex. To solve this problem, this paper proposes a computing resource allocation scheme based on deep reinforcement learning for MEC scenarios in IoV. Firstly, the task resource allocation model for IoV in the corresponding edge computing scenario is determined, taking the computing capacity of service nodes and the moving speed of vehicles as constraints. A mathematical model for task offloading and resource allocation is then established with the minimum total computing cost as the objective function. Next, a deep Q-learning network (DQN) is used to solve this mathematical model. Moreover, the experience replay method is used to address the instability caused by the nonlinear function approximation of the neural network, which avoids the curse of dimensionality and meets the low-overhead and low-latency requirements of resource allocation. Finally, simulation results show that the proposed scheme can effectively allocate the computing resources of IoV in an edge computing environment. When the volume of user-uploaded data is 10 Kbits and the number of terminals is 15, the scheme still achieves low overhead and low latency.


Introduction
In recent years, driven by the transformation of information and communication technology, the automobile industry has brought tremendous changes to people's lives. The applications installed on vehicles can provide drivers and passengers with useful information, such as safety information, surrounding environmental conditions and traffic information [1][2][3].
The emergence of the Internet of Vehicles (IoV) makes it possible to integrate the information provided by multiple applications to solve many problems in transportation [4][5][6]. The IoV network uses vehicles as the basic information unit. Within a certain communication range, road entities such as pedestrians, vehicles and roadside facilities are connected to the traffic management network by sensor technology, information acquisition technology, access technology, transmission technology and networking technology. The mobile network is connected to the backbone network, which serves applications such as vehicle safety, traffic control, information services and user network access. IoV aims to establish an intelligent, comprehensive network system that improves traffic conditions and travel efficiency and expands the forms of information interaction.
Traditional IoV network communication can only meet part of the network needs of vehicle users, and is mainly suitable for applications with a small computation load and low delay sensitivity [7]. With the popularization and development of IoV technology, intelligent vehicle technology is gradually being widely adopted. With the development and popularization of fifth-generation communication technology, the IoV market has also spawned a large number of new service applications (such as unmanned intelligent driving), which have more stringent requirements for network bandwidth, offloading delay, etc. [8,9]. Therefore, traditional IoV communication can no longer meet current operating requirements, which brings huge challenges to IoV in terms of computing and communication capabilities.
In recent years, Mobile Edge Computing (MEC), as a key 5G technology, has become of great significance for alleviating congestion in the cloud network or the core layer of the data center in IoV. MEC deploys computing and storage resources at the network edge to provide IT services and cloud computing capabilities for mobile networks. It can greatly accelerate the execution of computing tasks [10,11], compensate for the insufficient computing resources of the vehicle itself, and provide users with ultra-low-latency and high-bandwidth network services.
Task offloading is one of the key technologies of MEC. Handing part or all of the computing tasks of in-vehicle devices over to an edge computing server in a well-planned manner can effectively relieve the limitations of in-vehicle devices in terms of resource storage, computing performance and energy efficiency, and reduce communication and calculation delay. This in turn enables real-time operation of the IoV network and higher responsiveness [12,13]. At the same time, the complex network scenarios of IoV also bring many problems to the application of MEC technology. The high-speed mobility of vehicles and the diversity of communication quality in IoV make computing task offloading strategies more complicated. Thus, research on offloading decision-making and execution resource allocation has become a key issue that urgently needs to be solved in vehicle edge computing.

Methods
The in-depth integration of IoV and MEC technology relies on a new generation of information and communication technology to build a new form of smart vehicle. This can realize friendly information interaction between vehicles and the outside world, and can support the development needs of the next generation of "vehicle-to-everything" connectivity [14]. However, as IoV becomes more intelligent and information-driven, in-vehicle terminal applications have gradually shifted towards multimedia entertainment, which has caused an explosive growth of task data. This has put heavy pressure on scarce network resources [15]. Therefore, given the limited resources of IoV, rational allocation of the vehicle's own resources allows IoV to maintain efficient network computing capabilities even when vehicles are moving fast, improving the quality of user experience and traffic efficiency.
The joint management of wireless networks and computing resources is the key to achieving high efficiency and low latency in IoV networks. A network architecture in which the MEC server and the wireless access point coexist promotes the realization of related technologies [16]. Scholars have carried out related research on resource management and offloading decisions in MEC systems. Reference [17] formulated a convex optimization problem to minimize the total energy consumption of mobile devices. The optimal strategy for controlling the size of offloaded data and the time allocation had a simple threshold-based structure. The offloading priority function was derived from channel conditions and local computation energy consumption, and full offloading or minimum offloading was performed according to a given threshold. Reference [18] used dynamic voltage and frequency scaling to minimize local execution energy consumption for tasks with strict execution deadlines, and used data transmission scheduling to optimize the energy consumption of computation offloading. Reference [19] proposed a network-assisted end-to-end communication task offloading framework, which can realize resource sharing among mobile users. Reference [20] proposed a cooperative downloading scheme to offload traffic from cellular networks via VANETs. Appropriate data were obtained from the cellular network and distributed to vehicles in an approximately optimal way, and a storage time aggregation graph was designed for planning data transmission. Reference [21] proposed a cloud-edge-based MEC vehicle network offloading framework, which reduces the time consumption of computing tasks and the impact of vehicle mobility.
Existing traditional optimization algorithms can feasibly solve the problems of MEC computation offloading and resource allocation. However, the time slots into which an MEC system is divided are very short, while traditional optimization algorithms generally require complicated operations and many iterations to obtain a result. Thus, traditional optimization algorithms are not well suited to MEC systems with strict real-time requirements.
Reinforcement Learning (RL) is well suited to decision-making problems such as computation offloading decisions [22]. Unlike traditional optimization algorithms, an RL algorithm builds up experience and achieves the optimization goal through a trial-and-error feedback mechanism. A deep learning algorithm can learn the characteristics of historical data, and once training is completed it is far more efficient than traditional optimization algorithms. If data produced by traditional algorithms are used for training, the advantages of both can be combined. Reference [23] proposed a distributed wireless resource allocation scheme based on multi-agent theory and reinforcement learning. It allowed devices to independently select resource blocks and power levels, ensuring low complexity and low signaling overhead in the network system. Reference [24] developed an optimal and adaptive vehicle cloud resource allocation model for IoV systems based on the Semi-Markov Decision Process (SMDP) and reinforcement learning. It considered the balance between IoV network resource costs and system revenue, and made optimization decisions on IoV network service quality and vehicle user experience quality to optimize the total system overhead of the IoV network. Reference [25] proposed a new architecture that uses reinforcement learning to dynamically orchestrate edge computing and cache resources. It improved the practicability of the system and maximized its utility. Reference [26] proposed a task scheduling and resource allocation model based on hybrid ant colony optimization and deep reinforcement learning. This model took the shortest overall task completion time and the highest utilization rate of idle resources as its goals. Space complexity was reduced and network performance improved by using weighted values to construct a binary ordered traversal tree together with a deep reinforcement learning algorithm.
Oriented to the mobility characteristics and task allocation needs of IoV users, and drawing on existing research on MEC task management, this paper proposes a computing resource allocation scheme using deep reinforcement learning in an edge computing environment. The main contributions of this paper are as follows:
1) To make the mathematical model of the proposed MEC task distribution algorithm explicit, this paper first determines the system network model, computing model and communication model of task offloading and resource allocation. On this basis, the computing power of service nodes, the vehicle speed and the cache capacity of service nodes are taken as constraints, and a mathematical model of task offloading and resource allocation is established with the minimum total computing cost of the system as the objective function.
2) To achieve fast and efficient allocation of vehicle network computing resources, and to avoid the dimensional limitations of a traditional Q-learning network when solving the task resource allocation problem, this paper proposes a task computing resource allocation scheme based on a deep Q network. This scheme uses the experience replay method for training to address the instability of the Q network caused by nonlinear function approximation. It realizes the optimal allocation of task resources, so that resource allocation runs faster while keeping overhead low.
The rest of this paper is organized as follows. Section 3 introduces the vehicle network resource allocation system model and the corresponding mathematical problem description. Section 4 introduces task distribution and offloading based on the DQN algorithm. Section 5 builds simulation scenarios based on related protocols to verify the performance of the proposed method. Section 6 concludes the paper.

System model
This paper analyzes the resource allocation scheme using a vehicle-cloud collaborative edge caching model as the network model. The specific vehicle network model is shown in Fig. 1. In this model, $L$ RSUs are deployed along the road, denoted as $\mathcal{L} = \{\mathcal{M}_1, \mathcal{M}_2, \mathcal{M}_3, \cdots, \mathcal{M}_L\}$, and each RSU is equipped with an MEC server. The Poisson distribution is suitable for describing the number of random events per unit time (or space). Therefore, it is assumed that the $N$ vehicles on the road follow a Poisson distribution [27]; they are expressed as $V = \{v_1, v_2, v_3, \cdots, v_N\}$. Since both MEC servers and neighboring vehicles have computing and caching capabilities, they are collectively referred to as service nodes $W = \{w_1, w_2, w_3, \cdots, w_M\}$. $n$ vehicles are randomly distributed within the coverage area of each RSU; that is, the set of vehicles within the coverage area of the RSU, or the service area of $\mathcal{M}_j$, is $V_j = \{v_1, v_2, \cdots, v_n\}$. The vehicle's 802.11p OBU has an 802.11p network interface and a cellular network interface. Vehicles can offload tasks to MEC servers via the RSU, or offload them to neighboring vehicles through V2V communication. In order to reuse spectrum effectively, the V2I mode and the V2V mode work in the same frequency band. The spectrum is evenly divided into $K$ sub-channels, denoted as $\mathcal{K} = \{1, 2, 3, \cdots, K\}$, and the bandwidth of each sub-channel is $B$ Hz. The vehicle offloading strategy set is expressed as $A = \{a_1, a_2, a_3, \cdots, a_N\}$. If $a_i = 1$, the task of $v_i$ is offloaded to a service node for calculation. If $a_i = 0$, $v_i$ performs its computing task locally. Assume that at time $t$ there are some tasks in the buffer pool. When a vehicle issues a task request and the task is cached on a service node, the service node informs the vehicle that the task exists there. When the service node completes the calculation, the result is sent directly back to the vehicle. In this way, the vehicle does not need to perform a task offloading operation, which can effectively reduce the energy consumption of mobile devices and the delay of task offloading. If the requested task is not cached on any service node, the vehicle needs to make an offloading decision and then allocate resources. When a service node completes a requested task for the first time, it considers the cache decision.
The cache strategy set of service node $w_m$ is denoted as $G_m = \{g_{m,1}, g_{m,2}, g_{m,3}, \cdots, g_{m,n_1}\}$. If $g_{m,n_1} = 1$, service node $w_m$ caches computing task $n_1$. This reduces network transmission and calculation delay for the next request.
The cache collection of all service nodes is denoted as $AG = \{G_1, G_2, G_3, \cdots, G_M\}$.

Computing model
Based on the system model built above, it is assumed that each task-requesting vehicle has a computing task $Z_i = \{d_i, s_i, t_i^{max}\}$, $i \in N$, to be processed, where $d_i$ represents the input size of task $Z_i$, $s_i$ represents the number of CPU cycles required to complete computing task $Z_i$, and $t_i^{max}$ is the maximum delay that computing task $Z_i$ can tolerate. The vehicle can offload tasks to MEC servers via the RSU, offload them to neighboring vehicles for processing, or execute them locally.
For offloaded computing, when the limited computing power of the vehicle itself cannot meet the delay requirement of a task, the task needs to be offloaded to a service node for calculation. Processing the task inevitably incurs delay and energy consumption. Since the data volume of the returned results is small, the delay and energy consumption of the return process are ignored, and only the upload delay, calculation delay and transmission energy consumption are considered [28,29].
In this paper, the cost for the task-requesting vehicle $v_i$ to offload its task to service node $w_j$ is defined as the weighted combination of delay and energy consumption, expressed as:
$$C_i^{off} = \alpha t_i^{off} + \beta e_i^{off}$$
where $\alpha$ and $\beta$ are the non-negative weighting factors of delay and energy consumption, respectively, and satisfy $\alpha + \beta \le 1$. $t_i^{off} = \frac{d_i}{r_{i,j}} + \frac{s_i}{f_i^j}$ represents the sum of offloading delay and calculation delay, where $r_{i,j}$ is the transmission rate from $v_i$ to $w_j$ and $f_i^j$ represents the computing resources allocated by service node $w_j$ to vehicle $v_i$. $e_i^{off} = \frac{p_i d_i}{r_{i,j}}$ represents the energy consumption of the transmission process, where $p_i$ is the transmission power of $v_i$.
For local calculation, suppose that the computing power of vehicle $v_i$ is $F_i^l$, which differs between vehicles. When vehicle task $Z_i$ is calculated locally, the cost that vehicle $v_i$ needs to bear is:
$$C_i^{l} = \alpha t_i^{l} + \beta e_i^{l}$$
where $t_i^{l} = \frac{s_i}{F_i^l}$ is the time delay required for calculation, and $e_i^{l} = \varphi s_i (F_i^l)^2$ represents the energy consumed to perform the task. $\varphi$ is the power coefficient of energy consumed per CPU cycle [30].
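The two cost expressions of the computing model can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the names `Task`, `offload_cost` and `local_cost` are hypothetical, the offloading delay is assumed to be upload time plus computation time ($d_i / r_{i,j} + s_i / f_i^j$), and the default weights $\alpha = \beta = 0.5$ are arbitrary.

```python
from dataclasses import dataclass


@dataclass
class Task:
    d: float      # input size d_i in bits
    s: float      # CPU cycles s_i required to complete the task
    t_max: float  # maximum tolerable delay t_i^max in seconds


def offload_cost(task, rate, f_alloc, p_tx, alpha=0.5, beta=0.5):
    """Weighted cost of offloading: alpha * delay + beta * energy.

    rate    -- assumed uplink rate r_ij in bit/s
    f_alloc -- CPU cycles/s allocated by the service node (f_i^j)
    p_tx    -- transmission power p_i in watts
    """
    t_off = task.d / rate + task.s / f_alloc  # upload delay + computation delay
    e_off = p_tx * task.d / rate              # transmission energy
    return alpha * t_off + beta * e_off


def local_cost(task, f_local, phi, alpha=0.5, beta=0.5):
    """Weighted cost of local execution with CPU frequency F_i^l."""
    t_loc = task.s / f_local                  # local computation delay
    e_loc = phi * task.s * f_local ** 2       # energy: phi * s_i * (F_i^l)^2
    return alpha * t_loc + beta * e_loc
```

Comparing `offload_cost` and `local_cost` for the same task gives a simple per-vehicle baseline for the offloading decision $a_i$; the scheme in this paper instead optimizes the decisions of all vehicles jointly.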

Communication model
When traditional orthogonal multiple access (OMA) is applied in an MEC system, each terminal user has a dedicated transmission channel to ensure stable signal transmission. The delay $T_v^{OMA}$ for completing task offloading in this scenario is expressed as follows: where $p_v^{OMA}$ represents the transmission power of user $v$, $h_v$ represents the channel gain between the user and the edge server, $p_v$ represents the noise interference power of the user, and $B$ represents the channel transmission bandwidth of the user. Thus, the total time delay $T^{OMA}$ to complete the offloading of all vehicle users is expressed as:
In a communication network based on hybrid NOMA-MEC, where NOMA denotes non-orthogonal multiple access, multiple vehicle users can complete task transmission and offloading in the same time slot or frequency band. Suppose that two IoV users $m$ and $n$ request task offloading at the same time, with $D_n \ge D_m$, $m, n \in \{1, 2, \ldots, v\}$. In this mode, users $m$ and $n$ can simultaneously offload tasks to MEC servers in time slot $D_m$. The transmission powers of vehicle users $m$ and $n$ are $p_m^{OMA}$ and $p_n^{OMA}$, respectively. It should be pointed out that if the information of user $m$ is decoded in the second stage of successive interference cancellation, the performance of user $m$ is the same as in OMA. Therefore, the transmission delay of user $m$ is not affected [31]. The expression of the transmission rate $R_n$ of user $n$ in time slot $D_m$ is: where $p_{nm}^{NOMA}$ represents the transmission power of vehicle user $n$ in time slot $D_m$, and $h_m$ and $h_n$ represent the channel gains of vehicle users $m$ and $n$, respectively.
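As a rough illustration of how an OMA offloading delay depends on the quantities named above, the sketch below assumes a Shannon-capacity uplink rate, $B \log_2(1 + p h / \text{noise})$, and a delay equal to data size divided by rate. Both are common modelling assumptions and are not taken from this paper's equations; summing per-user delays for the total also assumes users offload one after another. All function names are hypothetical.

```python
import math


def oma_rate(bandwidth_hz, tx_power, channel_gain, noise_power):
    """Assumed Shannon-capacity uplink rate in bit/s."""
    snr = tx_power * channel_gain / noise_power
    return bandwidth_hz * math.log2(1.0 + snr)


def oma_offload_delay(data_bits, bandwidth_hz, tx_power, channel_gain, noise_power):
    """Time to upload data_bits over a dedicated OMA channel."""
    return data_bits / oma_rate(bandwidth_hz, tx_power, channel_gain, noise_power)


def total_oma_delay(users, bandwidth_hz, noise_power):
    """Sum of per-user delays, assuming users offload sequentially.

    users -- iterable of (data_bits, tx_power, channel_gain) tuples
    """
    return sum(
        oma_offload_delay(bits, bandwidth_hz, power, gain, noise_power)
        for bits, power, gain in users
    )
```

Under these assumptions, doubling the data size doubles the delay, while increasing transmission power only helps logarithmically.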
Offloading the tasks of end users by NOMA generates more energy consumption than the OMA mode [32]. Therefore, this paper uses a hybrid NOMA-MEC method to offload the tasks requested by mobile terminal users. The specific steps are: firstly, user $m$ and user $n$ perform task offloading at the same time within time where $p_{nn}^{NOMA}$ represents the transmission power used by vehicle user $n$ for offloading in the second part. The actual offloading delay $T_m$ of vehicle user $m$ is expressed as:

Problem description
When a smart vehicle requests a task calculation, it first checks whether the content is cached in its own buffer pool. If the content is available locally, there is no need to post a task request. Otherwise, the vehicle scans the surrounding service nodes for a cached copy; if one exists, the result is returned after the service node completes the calculation. If none exists, the vehicle needs to decide whether to offload the task.
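The request flow described above can be sketched as a simple lookup cascade. This is only an illustration under assumed data structures: caches are modelled as Python sets of task identifiers, and the function name and return labels are hypothetical.

```python
def handle_request(task_id, local_cache, service_node_caches):
    """Return (outcome, node) for a task request.

    local_cache         -- set of task ids cached in the vehicle's own buffer pool
    service_node_caches -- dict mapping service node id -> set of cached task ids
    """
    # 1) Content already available locally: no request needs to be posted.
    if task_id in local_cache:
        return ("local_cache_hit", None)

    # 2) Scan surrounding service nodes for a cached copy.
    for node, cache in service_node_caches.items():
        if task_id in cache:
            return ("service_node_cache_hit", node)

    # 3) No cache anywhere: an offloading decision is required.
    return ("needs_offloading_decision", None)
```

Only the third outcome leads to the optimization problem formulated next; the first two avoid the offloading cost entirely.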
After the task is offloaded to a service node and the calculation is completed, the service node considers whether to update its cache. After the content is returned, the service ends. This paper aims to minimize the system overhead through proper offloading and caching decisions, as well as the allocation of communication and computing resources. Thus, the optimization goal is to minimize the total system overhead over the decision variables $A$, $C$, $P$, $\mathcal{F}$ and $AG$, subject to constraints C1 to C8, where $A$ represents the offloading decision set of all task-requesting vehicles, $C$ represents the channel allocation status, $P$ is the task transmission power set of the offloading vehicles, $\mathcal{F}$ is the computing resource allocation strategy, and $AG$ represents the cache decision of service nodes.
In equations (9) to (16), constraints C1 and C3 indicate that the offloading decision is a 0-1 decision. C2 indicates that the channel allocation matrix is a binary variable. C4 ensures that the power allocation is non-negative and does not exceed the range of uplink transmission power. C5 and C6 indicate that the computing resource allocation does not exceed the maximum computing capacity of service nodes. C7 represents the delay constraint, where $L_j$ is the coverage of RSU $j$, $V_u$ is the moving speed of the task-requesting vehicle, $V_v$ is the moving speed of the service vehicle, and $d_{interrupt}$ is the maximum interruption distance. C8 indicates that the cached content of a service node cannot exceed its maximum cache capacity.

Offloading decision based on deep reinforcement learning
As an optimization problem, the IoV network resource allocation problem is essentially a mixed integer nonlinear programming model. Solving this model with traditional optimization algorithms tends to yield sub-optimal solutions [33,34]. In order to solve the mathematical model quickly and efficiently, this paper uses a deep Q network to handle the nonlinear problem. This avoids the curse of dimensionality to which a traditional Q-learning network is prone, so that vehicle network resource allocation runs faster while keeping overhead low.

Q-learning Network
Q-learning is a classic reinforcement learning algorithm based on recording Q-values. Each state-action pair has a value $Q(s, a)$. At each step, the agent calculates $Q(s, a)$ and stores it in the Q table. This value can be regarded as the expectation of the long-term return. The update formula of $Q(s, a)$ can be expressed as: where $(s, a)$ is the current state and action, and $(s', a')$ is the state and action of the next time slot. This paper calls $\gamma$ the learning rate; it is a constant satisfying $0 \le \gamma \le 1$ and plays the role of a discount factor. If $\gamma$ tends to 0, the agent mainly considers the current instantaneous return. If $\gamma$ tends to 1, the agent is also very concerned about future returns. At each step, the value of $Q(s, a)$ is iterated. In this way, the optimal $A$ can be obtained.
Algorithm 1 shows the corresponding operation process of the Q-learning algorithm.
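For readers unfamiliar with tabular Q-learning, the sketch below shows one update step and an $\varepsilon$-greedy action choice. It uses the standard textbook update $Q(s,a) \leftarrow Q(s,a) + \text{lr}\,[r + \gamma \max_{a'} Q(s',a') - Q(s,a)]$ with reward maximization, which may differ in detail from Algorithm 1 of this paper. The separate step size `lr`, the function names and the encoding of actions as 0 (local) and 1 (offload) are assumptions made for illustration.

```python
import random
from collections import defaultdict


def q_learning_update(q_table, state, action, reward, next_state, actions,
                      lr=0.1, gamma=0.9):
    """Apply one standard Q-learning update and return the new Q(s, a)."""
    # Best estimated value achievable from the next state.
    best_next = max(q_table[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    # Move Q(s, a) a fraction lr of the way towards the TD target.
    q_table[(state, action)] += lr * (td_target - q_table[(state, action)])
    return q_table[(state, action)]


def epsilon_greedy(q_table, state, actions, epsilon, rng=random):
    """Explore with probability epsilon, otherwise pick the greedy action."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q_table[(state, a)])


# Q table with default value 0.0 for unseen (state, action) pairs.
q = defaultdict(float)
ACTIONS = [0, 1]  # assumed encoding: 0 = compute locally, 1 = offload
```

Because the table holds one entry per state-action pair, its size grows with the product of the state and action space sizes, which is the dimensionality problem that motivates replacing the table with a neural network.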

Offloading decision algorithm based on DQN
In order to further reduce the computation required for IoV network computing resource allocation and improve the real-time performance of the algorithm, a Deep Q-learning Network (DQN) is used to approximate $Q(s, a)$. It allows enough sample states to be traversed for the algorithm to meet the needs of a practical engineering environment.
The DQN algorithm enables V-UEs to dynamically make the best offloading decision based on their own behavior and the behavior of the edge cloud. This process is formulated as a finite Markov Decision Process (MDP), defined as a tuple $M = (S, A, R)$, where $S$ and $A$ represent the state and behavior spaces, and $R(s, a)$ represents the immediate reward for performing action $a$ in state $s$. $\pi$ is a strategy that maps a state $s$ to a behavior $a$, i.e., $\pi(s) = a$. The main goal of V-UEs is to find the optimal strategy $\pi^*$ to minimize the utility obtained by users, thereby minimizing energy consumption and delay.
The state space $S$ consists of three parts: the number of task offloading requests $Q_u$ of V-UEs, the size of the remaining tasks in the edge cloud $Q_c$, and the distance $D$ between V-UEs and the edge cloud. It is defined as $S = (Q_u, Q_c, D)$. The behavior space $A$ is expressed as: where $a_0$ represents the task sequence processed locally, $a_x$ represents the sequence offloaded to the edge cloud, and $a_{max}$ is the maximum number of tasks that are processed locally or offloaded to the cloud in each decision cycle. The total number of tasks for each behavior is less than or equal to the number of tasks currently waiting in the user queue.
The instant return is the cost of V-UEs making the optimal offloading decision in each system state. Thus, the instant reward matrix $R(s, a)$ for a given behavior $a$ in state $s$ is: where $U(s, a)$ and $C(s, a)$ are the instant utility matrix and the cost matrix, respectively. The immediate utility can be expressed as: where $\rho$ is the utility constant. Correspondingly, the cost matrix $C(s, a)$ can be expressed as: where $\eta_1$ and $\eta_2$ are constants, and $E(s, a)$ and $T(s, a)$ are the energy consumption and delay matrices, respectively, expressed as follows:
The Q matrix is an online learning scheme of a model-free deep learning algorithm. In this scheme, V-UEs select behavior $a_t$ in state $s_t$ at time step $t$ to minimize the immediate future return [35]. The Q matrix can be expressed as: where $r_t$ is the minimum reward for adopting an offloading strategy $\pi$ after performing behavior $a$ in state $s$ at time step $t$, $E[\cdot]$ represents the expectation function, and $\gamma$ is the attenuation coefficient. The Q matrix is a neural network approximator $Q(s, a; \theta)$, where $\theta$ is a weighting factor. In each decision cycle, the state vector $S = (Q_u, Q_c, D)$ first observed by V-UEs is used as the input of the Q matrix, and all possible behaviors $A$ are used as its output. V-UEs then select the behavior according to the $\varepsilon$-greedy method. In addition, $\theta$ is iteratively adjusted to minimize the loss function. The loss function at time step $t$ can be defined as: In other words, given a transition $\langle s_t, a_t, r_t, s_{t+1} \rangle$ and weight factor $\theta$, the Q matrix is updated by minimizing the squared error between the current predicted Q value $Q(s_t, a_t)$ and the target Q value.
In addition, the experience replay method is used for training to address the instability of the Q network caused by the nonlinear approximation function in DQN. More specifically, the user experience $e_t = \langle s_t, a_t, r_t, s_{t+1} \rangle$ is stored in the memory $\Omega = \{e_{t-\psi}, \ldots, e_t\}$. At each time step $t$, a random mini-batch of transitions is selected from the memory to train the Q network, instead of only the most recent transition $e_t$.
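The experience replay mechanism and the squared-error loss can be sketched as follows. This is a minimal, framework-free illustration rather than the authors' implementation: the class and function names are hypothetical, `q_values_fn` stands in for the neural network $Q(s, \cdot\,; \theta)$, and the target uses the standard DQN form $r + \gamma \max_{a'} Q(s', a')$ with the same network for prediction and target, which is an assumption.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-size memory of transitions (s_t, a_t, r_t, s_{t+1})."""

    def __init__(self, capacity):
        # deque with maxlen silently drops the oldest transition when full.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size, rng=random):
        # Uniform random mini-batch, which breaks temporal correlation.
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_targets(batch, q_values_fn, gamma=0.9):
    """Target value r + gamma * max_a' Q(s', a') for each transition."""
    return [r + gamma * max(q_values_fn(s_next)) for (_, _, r, s_next) in batch]


def squared_loss(batch, q_values_fn, gamma=0.9):
    """Mean squared error between predicted Q(s_t, a_t) and the targets."""
    targets = td_targets(batch, q_values_fn, gamma)
    errors = [
        (target - q_values_fn(s)[a]) ** 2
        for (s, a, _, _), target in zip(batch, targets)
    ]
    return sum(errors) / len(errors)
```

In a full training loop, the gradient of `squared_loss` with respect to $\theta$ would be used to update the network after each sampled mini-batch; that step depends on the chosen deep learning framework and is omitted here.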
Figure 2 shows the flow chart of the DQN-based offloading decision algorithm. As Fig. 2 shows, steps 2-4 of the algorithm are executed recursively: the Q value is estimated by the Q network, and the offloading decision made by users at the beginning of each decision period is produced. Steps 5-7 use the experience replay method to train the Q network.

Simulation setting
In this section, the MATLAB simulation platform is used to verify the performance of the proposed DQN-based resource allocation mechanism for vehicle networks in an edge computing environment. The experiment follows the IEEE 802.11p vehicle network scenario standard and the MEC white paper, and uses the channel gain model proposed in 3GPP standardization. The simulation scenario is a one-way straight road, on which vehicles can communicate with roadside base stations as well as with each other (vehicle-to-vehicle communication). The purpose is to simulate the proposed MEC task distribution algorithm based on deep reinforcement learning and evaluate its performance in different situations. This paper mainly considers three cells along the roadside. Each cell is equipped with an RSU and an MEC server, and the coverage radius of each RSU is 500 meters. The specific simulation parameters are shown in Table 1.

Algorithm sensitivity analysis
In order to verify the advantages of the proposed method for allocating computing resources to IoV tasks, the discussion and analysis are carried out from two aspects: the total system computing overhead and the time delay. The results demonstrate that the proposed method achieves low overhead and strong real-time performance in task allocation.

Sensitivity analysis of total system overhead
In this paper, two baseline methods, "full local calculation" and "full offload calculation", are compared with the proposed method. The relationships between the total computing overhead of the system and the number of users, the computing capacity of servers, and the volume of uploaded data are discussed and analyzed. "Full local calculation" means that all users choose local calculation. "Full offload calculation" means that all users choose to offload their calculation; in this case, the computing resources of MEC servers are divided equally among the users. Figure 3 shows the relationship between the total overhead and the number of users. On the whole, as the number of users increases, the total cost of all three methods rises.
In Fig. 3, the performance of the proposed DQN method is relatively stable and achieves the best results. When the number of users reaches 15 vehicles, its total system overhead remains low compared with the baseline methods. There is almost no difference between the curve of the full offloading method and that of DQN when the number of users is 4. However, as the number of vehicles increases, the total cost of full offloading increases rapidly. The reason is that when the number of users increases and all of them choose to offload computing, MEC servers with limited computing resources cannot provide sufficient computing resources for each user, which reduces the overall performance.
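The behavior of the two baselines can be illustrated with a short sketch. It assumes the weighted delay-plus-energy cost used in the computing model, a common uplink rate for every user, and hypothetical function names; the numbers it produces are not the paper's simulation results.

```python
def full_local_cost(tasks, f_local, phi, alpha=0.5, beta=0.5):
    """Total cost when every user computes locally.

    tasks -- list of (d, s) tuples: input bits and required CPU cycles
    """
    total = 0.0
    for _d, s in tasks:
        delay = s / f_local
        energy = phi * s * f_local ** 2
        total += alpha * delay + beta * energy
    return total


def full_offload_cost(tasks, rate, f_server, p_tx, alpha=0.5, beta=0.5):
    """Total cost when every user offloads and the MEC capacity is split equally."""
    f_each = f_server / len(tasks)  # equal share of server CPU cycles/s per user
    total = 0.0
    for d, s in tasks:
        delay = d / rate + s / f_each   # upload delay + computation delay
        energy = p_tx * d / rate        # transmission energy
        total += alpha * delay + beta * energy
    return total
```

Because each user's share `f_each` shrinks as the number of users grows, the per-user computation delay under full offloading grows linearly with the number of users and the total cost grows roughly quadratically, which is consistent with the rapid increase described for Fig. 3.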
Figure 4 shows the influence of the computing capacity of MEC servers on the weighted total overhead. As the computing capacity of servers increases, the total system overhead of the proposed method always remains lower than that of the baseline methods, showing a clear advantage in computing performance.
It can be seen from Fig. 4 that the full local calculation curve is a special case: its weighted total overhead does not change with the computing capacity of MEC servers, because the amount of computing resources on MEC servers has no effect on the local computing process. The other two curves show a downward trend as $F$ increases. This is because the larger $F$ is, the more computing resources the server can allocate to users, thereby reducing processing time and energy consumption. The curve of the proposed DQN method is always at the bottom and performs best.
Figure 5 shows the performance of the algorithms under different uploaded data volumes. It can be seen from Fig. 5 that as the size of uploaded data increases, the curves of all algorithms show an upward trend. A larger amount of data means more time to upload and process the data, which also increases energy consumption and therefore the total system overhead. According to Fig. 5, the proposed DQN method performs best because its curve rises the slowest of the three. The full local calculation curve rises much faster than the other two, and its performance gap to the other two algorithms keeps widening.

Sensitivity analysis of system time delay
For the distribution of computing tasks in IoV, delay is also an important indicator of the quality of resource allocation. In order to show that the proposed algorithm can meet the needs of practical engineering applications, the algorithms of references [25] and [26] are selected as comparison methods and compared with the method proposed in this paper.
Figure 6 shows the simulated relationship between the number of users requesting task offloading and the total time delay of task offloading. Compared with the algorithms in [25] and [26], the time delay of the proposed DQN algorithm increases more slowly. Even when the number of users reaches 15, where the offloading delay reaches its upper limit of 235 ms, the proposed algorithm retains a clear advantage in computing speed.
It can be seen from Fig. 6 that as the number of users increases, the total delay of task offloading gradually increases, and the gap in total delay between the different modes widens. The reason is that when few users request task offloading, the channel resources in all three modes are sufficient to let the users offload at the same time. As the number of users increases further, channel resources become insufficient. The users in [25] and [26] then have to offload their tasks in sequence, each waiting for other users to finish before it can offload. The offloading strategy proposed in this paper allows more users to offload tasks at the same time under limited channel resources.
Figure 7 shows the simulated relationship between task offloading delay and data size for a single user in the different modes. The simulation results show that the data size of the offloaded task is linearly and positively correlated with the offloading delay. When the size of the offloaded task is the same, the offloading delay differs little across the three modes. The reason is that when a single user requests task offloading, the channel resources of the communication network model are abundant, so the offloading request can be transmitted with the optimal channel bandwidth.
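The queueing effect described for Fig. 6 can be illustrated with a small sketch. It assumes a hypothetical setup that is not taken from this paper: every user has the same upload time, there are `n_channels` channels, and each channel serves `users_per_channel` users at once (1 for sequential offloading, more when users share a channel). Sharing a channel is assumed not to slow down an individual upload, which is an idealization.

```python
def total_offload_delay(n_users, n_channels, upload_time, users_per_channel=1):
    """Sum of per-user completion times when users are served in rounds.

    Users that do not fit into the current round wait for the next one,
    so a user served in round k finishes after (k + 1) * upload_time.
    """
    per_round = n_channels * users_per_channel
    total = 0.0
    for user in range(n_users):
        round_index = user // per_round   # number of rounds this user waits
        total += (round_index + 1) * upload_time
    return total


if __name__ == "__main__":
    # Hypothetical numbers: 4 channels, 10 ms upload time per user.
    for n in (3, 6, 9, 12, 15):
        sequential = total_offload_delay(n, 4, 0.010, users_per_channel=1)
        shared = total_offload_delay(n, 4, 0.010, users_per_channel=2)
        print(n, sequential, shared)
```

While the number of users does not exceed the number of channels, both configurations give the same total delay. Once users have to queue, the sequential configuration accumulates waiting time faster, which is the kind of widening gap described above.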
In summary, compared with other current task resource allocation methods, the DQN-based task resource allocation method for IoV proposed in this paper performs well in the edge computing environment. The algorithm not only guarantees low-overhead computing performance for the system, but also achieves lower-latency communication, which provides a better service experience for IoV users.

Conclusion
The high-speed mobility of vehicles and the diversity of communication quality in current IoV make offloading strategies for computing tasks more complicated. To solve this problem, this paper proposes a computing resource allocation scheme based on a deep reinforcement learning network for MEC scenarios. Taking the computing capacity of service nodes and the vehicle moving speed as constraints, the scheme builds a task resource allocation model for the edge computing scenario with the minimum total system computing cost as the objective function. A deep Q-learning network is then used to solve the mathematical model of resource allocation, and the experience replay method is used to avoid the curse of dimensionality and to meet the low-overhead and low-latency operation requirements of resource allocation. Simulation results show that the proposed scheme still delivers excellent network performance, with low overhead and low latency, when the amount of user upload data is 10K bits and the number of terminals is 15.
Future research will explore the platformization of the proposed method and work toward its commercialization.

Fig. 3 Relationship between total cost and number of users

Fig. 4 Relationship between total cost and server computing capacity

Fig. 5 Relationship between total cost and the size of uploaded data

Fig. 6

Fig. 7 Relationship between time delay and the size of uploaded data

Zhang et al. EURASIP Journal on Advances in Signal Processing (2021) 2021:33

$D_m$. Secondly, after user $m$ completes task offloading, user $n$ needs to continue the task offloading in OMA manner. It takes $T_n^{re}$ to complete the offloading of this part of the task, so the total time delay $T_n$ of vehicle user $n$ is:

Table 1 Experimental simulation parameter settings