Adaptive resource optimization for edge inference with goal-oriented communications

Goal-oriented communications represent an emerging paradigm for efficient and reliable learning at the wireless edge, where only the information relevant for the specific learning task is transmitted to perform inference and/or training. The aim of this paper is to introduce a novel system design and algorithmic framework to enable goal-oriented communications. Specifically, inspired by the information bottleneck principle and targeting an image classification task, we dynamically change the size of the data to be transmitted by exploiting banks of convolutional encoders at the device in order to extract meaningful and parsimonious data features in a totally adaptive and goal-oriented fashion. Exploiting knowledge of the system conditions, such as the channel state and the computation load, such features are dynamically transmitted to an edge server that takes the final decision, based on a proper convolutional classifier. Hinging on Lyapunov stochastic optimization, we devise a novel algorithmic framework that dynamically and jointly optimizes communication, computation, and the convolutional encoder classifier, in order to strike a desired trade-off between energy, latency, and accuracy of the edge learning task. Several simulation results illustrate the effectiveness of the proposed strategy for edge learning with goal-oriented communications.

will give rise to more autonomous, i.e., zero-touch, networks enabling a truly pervasive deployment of intelligent services, subject to a variety of constraints, in terms of learning and inference reliability, latency, and energy consumption.
The need to tightly control latency and limit energy consumption motivates the shift toward edge intelligence (EI) architectures [3], where the information exchange and processing are kept as local as possible. In the EI framework, every device may have access only to a tiny fraction of the data and low-latency inference/training tasks need to be performed collectively and distributively at the wireless network edge.
An efficient design of the EI platform calls for the adoption of a holistic approach, where communication, computation, learning, and control are jointly orchestrated to achieve new target levels of reliability, energy efficiency, and sustainability. This trend motivates the current widespread interest in distributed, low-latency and reliable machine learning (ML) tools, calling for a major departure from cloud-based, centralized training and inference. In EI, the mobile devices, also called user equipment (UE), need to perform AI/ML tasks by partially offloading their computations to edge servers (ESs), placed at the edge of the wireless network [4]. The overall system must be then designed in order to achieve an optimal balance between accuracy of the ML tasks and usage of the network resources, by dynamically allocating transmission and computational parameters, such as transmission rates and central processing unit (CPU) clock frequencies, as well as the scheduling of transmission and computation tasks, possibly under uncertainties about the wireless channel state and task arrival rates.
The resource optimization problems formulated in this scenario are mainly focused on the trade-off between energy, latency and learning accuracy [5][6][7][8][9]. However, looking at the predictions about the exponential increase in traffic in next-generation networks [10], it is evident that it is time to envisage a new communication paradigm able to support EI while preventing the data rate explosion. A possible paradigm shift in this direction may come from semantic and goal-oriented communications (GOC) [11]. In this new context, the focus is not anymore on the reliable recovery of the transmitted bits, but instead on the meaning (semantics) conveyed by those bits or the goal motivating the transmission of bits.
It is clear that EI and GOC find application whenever a set of devices with limited capabilities needs to perform specific tasks in a timely manner and with prescribed reliability requirements. In vehicular edge computing (VEC) scenarios [12], for instance, we can imagine that the UEs installed onboard the vehicles need to perform an object classification task (e.g., detection of traffic signs in the scene) or collision avoidance/pose estimation. If, due to the limited resources, the devices are not able to perform the task in time with the required quality, they may ask an edge server to perform the task and send back the outcome. A smart communication scheme, capable of extracting and transmitting only the data that are relevant to the task, would therefore be very attractive both energy-wise and delay-wise.
IoT scenarios are another notable field where EI is widely deployed [13]. As an example, we can imagine decentralized estimation tasks, possibly based on energy harvesting devices [14], where smart compression schemes are fundamental to parsimoniously offload data toward the edge cloud, in order to save as many transmission resources as possible while guaranteeing negligible estimation error degradation. Another application of EI that would benefit from a GOC architecture is real-time automatic video surveillance, where a continuous flow of video data must be processed in a timely manner by specific inference models (e.g., neural networks). Also in this case, a fully local deployment at the IoT device may be impractical or impossible, making offloading a valuable solution to guarantee the service [13].
The goal of this work is to propose a dynamic communication strategy and an optimal allocation of all the network resources, including communication, computational, and ML resources, in order to implement a dynamic goal-oriented scheme, which aims to transmit only the data that are informative to the fulfillment of the specified goal (e.g., image recognition), under constraints on decision accuracy, service delay, and energy consumption.

Related works
Deep neural networks (DNNs) have already been proposed to design joint source/channel coding (JSCC), as an alternative to the conventional cascade of source and channel encoders, to achieve superior performance in the finite block-length regime for image retrieval applications [15]. Designing the JSCC encoder by focusing directly on the recognition accuracy, rather than performing image reconstruction and then classification separately, was investigated in [16]. In [17], the authors proposed an image retrieval scheme where, instead of sending the image, the feature vectors are first extracted and then mapped into channel input symbols, while the noisy channel output is used by the server to retrieve the most relevant images, without involving any explicit channel code. This approach has been extended in [18], where the encoder outputs are quantized prior to the mapping on the channel symbols, while in [19] a deep-JSCC with channel output feedback exploitation is proposed. In contrast to most of the works, which consider AWGN channels, the authors in [20] design a communication scheme for flat-fading channels based on an OFDM system. Other interesting works can be found in [21] and [22], where a combination of JSCC and nonlinear transform coding (NTC) [23] is proposed.
In applications such as text transmission, the semantics underlying the text has also been explicitly exploited in designing a JSCC, such as in [24] and [25], where a noise-aware JSCC system is described. The authors of [26] designed speech recognition-oriented semantic communications to directly convert speech signals into text. The work in [27] exploits a hybrid automatic repeat request (HARQ) scheme in order to improve reliability in sentence semantic transmission. Semantic communications for multimodal data were considered in [28] for serving the visual question answering problem, which adopts long short-term memory for text transmission and a convolutional neural network for the image transmission. More recently, a transformer-based approach has also been investigated in [29] to support both image and text transmission. Alternative methods were also proposed in [30] and [31], to define an optimized common language between a listener and a speaker, employing reinforcement learning (RL) and curriculum learning (CL). Other interesting examples can be found in [32] and [33], concerning, respectively, image classification in an unmanned aerial vehicle (UAV) scenario and visual question answering (VQA) tasks.
A more principled approach, based on the information bottleneck principle [34,35], to limit transmission only to the information that is relevant for the intended goal of communication, was recently proposed in [36], for the Gaussian case, and in the more general case in [37], using a variational IB (VIB) approach. Recently, VIB has also been considered for multi-device cooperative edge inference [38]. Some rate distortion approaches were also proposed in [39] and in [40] to support goal-oriented communications.
In all the above works, except [36], the focus was on the communication system, but without optimizing the usage of the available resources, namely communication, computational, and semantic-related resources. Resource optimization has been considered in [41] and [5,6]. Specifically, the authors in [41] propose to tune the GOC resources, e.g., bandwidths and powers, as well as the size of the goal-oriented compressed representation of the data, in order to optimize the success probability of the task under flat-fading zero-mean Gaussian channels. This optimization, which includes training of the compressive and classification (C&C) architecture and the choice of the data compression ratio, is performed once, by exploiting knowledge of the average statistics (e.g., standard deviation) of the flat-fading channel. The optimal bandwidths and transmission powers obtained this way, as well as the C&C neural network architecture and the compression ratio, are fixed for a given scenario (average SNR, etc.), and they are used over all the possible channel states that the GOC system may experience. This fixed allocation of both resources and C&C architecture is a distinctive difference with respect to our approach, where we dynamically adapt all the energy and hardware resources according to the system state, as we will further clarify.
Conversely, the dynamic analysis and optimization of the trade-offs between decision accuracy, overall (i.e., transmission and computation) energy consumption, and service delay has been considered in [5,6], where the trade-off is achieved by dynamically adapting the source encoding rate and the scheduling of transmission and computation tasks. This approach was recently extended in [42], for energy-efficient edge classification with reliability guarantees, in [43] for ensemble inference at the edge, and in [36] by incorporating the information bottleneck principle to identify and transmit only the information relevant to the task.

Contributions
In this paper, we focus on the dynamical joint management and optimization of computation, communication, and semantic-extracting resources of a GOC system, where transmitter and receiver architectures incorporate a pair of variable size convolutional encoders (CE) and classifiers (CC). A finite set of CE/CC pairs, each having a different dimension of the CE output, is pre-trained offline, to make possible the selection of the most suitable pair to be used online, depending both on how well the overall communication system is fulfilling the goal and on the constraints of the communication link. The proposed communication scheme is reported in Fig. 1, where, inspired by the IB principle, the bottleneck is made time-varying, by adaptively selecting in each time slot the most suitable CE/CC pair, according to a strategy resulting from the solution of two possible constrained optimization problems: (i) minimum energy consumption under average service delay and accuracy constraints (MEDA); (ii) maximum accuracy under average service delay and energy constraints (MADE). This is significantly different from the static optimization proposed in [41], where scheduling (and buffering) is not considered as a fundamental ingredient to make best use of the available resources in a dynamic fashion.
Furthermore, we adapt the assignment of computation and transmission resources, as well as the size of the compressed data, to the buffer load and channel condition, by a dynamic choice of the proper CE/CC pair at each time slot. Note that this dynamic use of multiple low-complexity CE/CC (neural network) pairs makes our approach quite different from [41] and the recent literature on semantic communications and GOC [15,17,19,33,44], where typically a fixed (very complex) DNN architecture is split among transmitter and receiver. The single DNN architecture that these GOC schemes have to train is (typically) very complex because it has to work well for the variety of channel states, noise levels, and required task performance that the GOC system may have to face. Conversely, in our case, we train a set of NNs, where each NN is much simpler because it is well matched to (and will be used with) a much more restricted variety of conditions. In particular, each NN has a different output size (the bottleneck) and we adapt the bottleneck dimension online to optimize performance.
To address the dynamic management of the overall goal-oriented architecture, we hinge on Lyapunov stochastic optimization tools [6,45], which implement the solution in a time-slotted fashion. Specifically, in each time slot, we perform a deterministic optimization of the involved variables, valid also in the general situation where some of the involved variables, such as the channel state and the task arrival rates, are random, with unknown probability distribution. Under proper feasibility conditions, the proposed approach is shown to achieve the optimal solution, while respecting the given constraints. The simulation results confirm the effectiveness of the proposed approach to manage the system resources in an adaptive way and strike an optimal trade-off between average energy, delay, and accuracy.
Outline
The paper is organized as follows. In Sect. 2, we present the scheme of our goal-oriented communication system, including the joint training procedure of the CE/CC pair, assuming as goal the classification of the images sent by the UE. In Sect. 3, we introduce the overall system model supporting the offloading of the learning task from the UE to the ES, defining all quantities of interest, e.g., latency, learning accuracy, and energy, involved in the proposed resource optimization problems, which are then solved in Sect. 4, exploiting stochastic Lyapunov optimization. Section 5 presents the simulation results, and, finally, Sect. 6 draws some conclusions and highlights future research directions.

Proposed design of goal-oriented communications
We consider as an example of application of our proposed strategy, the transmission of images from a UE to an ES, where the goal is image classification. The key point of the proposed approach is to exploit knowledge of the system state (channel condition, computation load, buffer load, etc.) to dynamically compress the images to be transmitted, and then classified, using a GOC perspective, where the goal is not to recover the image at the receiver side, but only to achieve the desired classification accuracy.
To this end, the inspiring principle is the Information Bottleneck (IB) framework, whose purpose is to find a (probabilistic) compact representation U of the random variable X emitted by a source, in order to preserve as much information as possible about the classification output variable Y, while minimizing the complexity associated with the representation of X through U. The IB is based on the following functional optimization problem (in Lagrangian form) [34]:

$$\min_{p(u|x)} \; I(X;U) - \beta\, I(U;Y) \qquad (1)$$

where the mutual information I(X; U) represents the complexity, in terms of number of bits used to represent X by U; the term I(U; Y) represents the relevance of U in conveying information about Y; β is the parameter used to control the trade-off between complexity and relevance. Since problem (1) depends on the (joint) probability density function (pdf) of X, U, and Y, the optimal solution can be found only in specific cases, e.g., when the involved random variables are either Gaussian [46] or discrete. In the latter case, the solution is known only in an iterative form [34]. However, except for the Gaussian case, (1) is quite difficult to solve in practice, especially when the dimension of the data X is very large, as happens with images [47].
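As a small numerical illustration of the complexity/relevance trade-off in (1), the following sketch evaluates the Lagrangian I(X;U) − βI(U;Y) for discrete variables. It assumes the standard IB form and the Markov chain Y − X − U; the function names and the toy distributions are illustrative assumptions, not part of the original work.

```python
import math

def mutual_information(joint):
    """I(A;B) in bits for a joint pmf given as a nested list joint[a][b]."""
    pa = [sum(row) for row in joint]
    pb = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for a, row in enumerate(joint):
        for b, p in enumerate(row):
            if p > 0:
                mi += p * math.log2(p / (pa[a] * pb[b]))
    return mi

def ib_objective(p_xy, p_u_given_x, beta):
    """Evaluate I(X;U) - beta * I(U;Y), assuming the Markov chain Y - X - U."""
    nx, ny, nu = len(p_xy), len(p_xy[0]), len(p_u_given_x[0])
    px = [sum(row) for row in p_xy]
    # Joint pmf of (X, U) induced by the encoder p(u|x)
    p_xu = [[px[x] * p_u_given_x[x][u] for u in range(nu)] for x in range(nx)]
    # Joint pmf of (U, Y), obtained by marginalizing X out
    p_uy = [[sum(p_u_given_x[x][u] * p_xy[x][y] for x in range(nx))
             for y in range(ny)] for u in range(nu)]
    return mutual_information(p_xu) - beta * mutual_information(p_uy)

# Toy source: X is a fair bit and the label Y equals X.
p_xy = [[0.5, 0.0], [0.0, 0.5]]
identity_encoder = [[1.0, 0.0], [0.0, 1.0]]   # U = X: 1 bit of complexity, 1 bit of relevance
constant_encoder = [[1.0, 0.0], [1.0, 0.0]]   # U ignores X: zero complexity, zero relevance
```

With β = 2 the identity encoder attains a lower (better) objective than the constant one, because the relevance gain outweighs the complexity cost.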
Due to the aforementioned issues, in this work we pursue a simpler approach. While inspired by the IB principle in (1) and the associated GOC scheme for the Gaussian case [36], it implements a practical goal-oriented communication scheme that performs a tunable data compression at the UEs, using a convolutional encoder that is trained offline to learn how to extract the relevant information necessary to achieve the accuracy of the inference task, while consuming the minimum amount of resources by properly compressing the input data. Since we focus on image classification, we choose the structure of both the encoder and the decoder (classifier) as two convolutional neural networks, incorporating a layer-by-layer max-pooling strategy [48] to adapt the dimension of the data to be transmitted. The proposed goal-oriented communication scheme is illustrated in Fig. 1.
The design of the CEs has been driven by two main strategies:
• Short-CE: The compression is obtained by using a single convolutional layer, followed by a max-pooling layer, which directly implements the desired compression factor.
• Deep-CE: The compression is obtained by cascading a set of convolutional layers, each one followed by a max-pooling step that implements a compression factor equal to 2. The number of layers $n_l$ to be used is imposed by the total compression factor ρ that is desired at the output, e.g., $\rho = 2^{n_l}$.
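The relation between the total compression factor and the depth of a Deep-CE can be sketched as follows, assuming that ρ is a power of two and that each stage contributes a factor of exactly 2. The helper name `deep_ce_layers` is hypothetical.

```python
def deep_ce_layers(rho):
    """Number of (convolution + max-pooling) stages n_l such that rho = 2**n_l."""
    if rho < 1 or rho & (rho - 1):
        raise ValueError("rho must be a positive power of two")
    return rho.bit_length() - 1
```

For instance, under these assumptions a compression factor ρ = 32 requires five stages, whereas a Short-CE would reach the same factor with a single pooling layer.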
It is worth emphasizing that the CNN architecture we use is not necessarily optimal. There certainly exist alternative architectures that may perform better, although, as we do in our resource management, the ultimate classification performance should always take into account complexity and energy expenditure, which may be critical for mobile and simple UEs. Our choice is thus dictated simply by the need for a few simple alternative architectures that make it possible to keep the complexity and energy spent for processing at the devices as small as possible.
The training of each CE/CC pair, at the UE and ES sides, has been performed jointly and offline, as a solution of the following problem:

$$\min_{\theta_\rho,\,\varphi_\rho} \; \frac{1}{N_t} \sum_{n=1}^{N_t} L_{ce}\big(Y_n, \hat{Y}_n(X_n, \varphi_\rho, \theta_\rho)\big) \qquad (2)$$

where $L_{ce}$ is a suitable loss function, while $\theta_\rho$ and $\varphi_\rho$ represent the parameters of the CNNs used at the CE and CC, respectively, for a given compression (bottleneck) parameter ρ, and $N_t$ is the size of the training set. More specifically, dealing with a multi-class classification task, we used the categorical cross-entropy as the loss function, so that $L_{ce}$ reads as [49]:

$$L_{ce}\big(Y_n, \hat{Y}_n\big) = -\sum_{k=1}^{K} Y_k(X_n)\, \log \hat{Y}_k(X_n, \varphi_\rho, \theta_\rho) \qquad (3)$$

where K is the number of classes; $Y_k(X_n) \in \{0, 1\}$ are the one-hot encoded true probabilities, i.e., those identifying the ground-truth labels, for the k-th class and n-th training sample; and $\hat{Y}_k(X_n, \varphi_\rho, \theta_\rho)$ are the soft probabilities estimated at the output of the classification network, i.e., those generating the predicted labels.
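A minimal sketch of the categorical cross-entropy and of the empirical training objective (its average over the training set) is given below. It assumes natural logarithms and one-hot labels; the function names and the clipping constant `eps` are illustrative assumptions, and the actual training of the CE/CC pair would of course rely on a deep learning framework.

```python
import math

def categorical_cross_entropy(y_true, y_prob, eps=1e-12):
    """Cross-entropy between a one-hot label and the predicted class probabilities."""
    # eps only guards against log(0); it is an implementation detail assumed here
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_prob))

def empirical_loss(labels, predictions):
    """Average per-sample loss over the N_t training samples."""
    total = sum(categorical_cross_entropy(t, p) for t, p in zip(labels, predictions))
    return total / len(labels)

# Two samples, three classes: the correct class always receives probability 0.5.
labels = [[1, 0, 0], [0, 1, 0]]
predictions = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25]]
loss = empirical_loss(labels, predictions)   # equals ln(2) for this toy case
```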
A key feature of the IB formulation in (1) is that the balance between complexity and relevance of the compressed representation U is tuned by acting on the trade-off parameter β. In our setup, this balance is tuned by acting on the dimension of the CE/CC pair, as depicted in Fig. 1. Hence, the architecture used in each time slot to encode the images and extract the relevant information is selected, slot-by-slot, depending on the service delay and accuracy constraints, as a function of the current values of the system parameters, such as wireless channel state and data arrivals. We remark that the training procedure is performed offline, while the selection of the most suitable architecture to be used in each time slot is performed in a dynamic fashion according to the criteria described in the next section. While the IB looks for a probabilistic mapping of the data source X to the compressed representation U [36,46], in our setup, the mapping is deterministic. Nevertheless, the proposed training scheme has an important link with the IB principle, as it was proved that the cross-entropy loss $L_{ce}(Y_n, \hat{Y}_n)$ is a good proxy for the mutual information I(U; Y) [50]. In particular, minimizing the cross-entropy loss (over the training set) leads to the maximization of I(U; Y) for a deterministic mapping. Furthermore, IB arguments can be used to explain the performance of a deep neural network trained by a cross-entropy loss [51], which further motivates why the IB represents an information-theoretic justification of our practical procedure. In principle, we could also make our compression law probabilistic by adding noise in the encoding step as well as in the training phase, as this has been recognized as a method to improve the generalization capability of a CNN and reduce the overfitting errors [52].

Remark 2
Differently from works inspired by JSCC where the encoders directly map the input data to the symbols to be transmitted [15,17,19,20,37], we foresee a more traditional approach, where after compression we transmit bits over a conventional, capacity-achieving communication link, which makes use of ideal channel co-decoding, i.e., with zero bit error rate (BER). Although certainly interesting, we leave for future work the quantification, and proper handling, of the impact on classification accuracy of a residual BER in the communication link, due for instance to finite-length channel coding, where also JSCC schemes find their motivations.
Specifically, we split the encoder in a convolutional encoder (CE) followed by a lossless compression, as depicted in Fig. 1, where the compression is obtained using the lossless JPEG2000 and TIFF codecs. We follow this strategy for the sake of simplifying the overall adaptive strategy that selects, slot-by-slot, the most suitable communication architecture, and to enable an easy control at each time slot of the specific dimension of the (goal-oriented) data that have to be transmitted for every image, depending on how well the system is behaving in terms of balance between classification accuracy, service delay and energy consumption.
The relation between the (data) compression ratio ρ to choose from, the dimension of the CE output, and the size (number of bits) of the data to be transmitted, before and after compression, is reported in Table 1. The values for lossless compression for ρ = 32, 64 are not available (N/A), since the overhead due to the zipping algorithm is higher than the file size reduction. We remark that state-of-the-art lossless compression after the CE at the UE allows us to save information bits to be transmitted, without impacting the overall accuracy granted by the offline training of the proposed CE/CC structure, under capacity-achieving ideal assumptions. Obviously, the price to be paid is a higher computational complexity of the system, which also has to perform the lossless decompression at the ES before feeding the convolutional classifier. We will take into account this computational complexity, as well as the associated delay and energy expenditure, in the resource management policies and optimization.

Binucci et al. EURASIP Journal on Advances in Signal Processing (2022) 2022:123

System model
The envisaged goal-oriented communication scenario includes a UE, with limited computational (or energy) capabilities, which is connected to an ES with higher computational resources and energy, through a wireless link with an access point (AP). The overall scheme is depicted in Fig. 1. We focus on image classification at the edge, assuming a pre-trained set of goal-oriented CE/CC schemes, as described in Sect. 2. We assume that the system state evolves in a time-slotted fashion with time-varying context parameters (i.e., wireless channels and data arrivals); each time slot t has a fixed duration τ. In our procedure, data (i.e., images) are generated/collected at the UE, with an arbitrary distribution of the arrival time, and uploaded to an ES for inference purposes. In particular, we design a procedure where data are: (i) collected and buffered locally at the device; (ii) encoded in a goal-oriented fashion, zipped, and transmitted; (iii) remotely buffered and processed by the ES for classification.
The goal of our optimization procedures is to provide inference results within a finite end-to-end (E2E) delay, considering: (i) the minimum energy consumption at the mobile device, under a prescribed inference reliability and decision delay; (ii) the maximum accuracy for a given energy consumption and delay. In this context, several resources must be optimized and adapted over time depending on dynamic system conditions, e.g., wireless channels, data arrivals, and buffered images. In particular, the UE must select its transmission rate R(t) toward the ES, its local computational clock frequency $f_d(t)$, as well as the data compression factor ρ(t), to generate the compressed latent representation U. At the same time, the ES has to allocate its computational clock frequency $f_c(t)$ in order to complete the specific learning task, i.e., image classification. The above quantities represent the optimization variables for the proposed resource allocation strategies. In the sequel, we illustrate the adopted model for latency, energy, and classification accuracy.

Latency model
The dynamicity of the system is modeled using queues, which are also used to control the overall delay of the service. In particular, our model involves two queues:
• A local queue at the UE, where the collected data units (images) wait to be encoded and transmitted.
• A remote computation queue at the ES, where the received data units wait to be classified.
In the sequel, we introduce some important assumptions for the resource optimization problem we are going to design.

Assumption 1
Each data unit must be compressed and transmitted by the UE in the same time slot. It is indeed impossible to choose in advance the optimal compression factor for a data unit that would have to be stored and transmitted in the future, unless we could also reliably predict the future system state (e.g., the wireless channel condition, energy status, queue lengths, computational power, etc.) at the time slot in which the data unit would actually be transmitted. Therefore, compression and transmission operations must be done sequentially within the same time slot.

Assumption 2
We assume that, while the UE transmits some data units, it may also simultaneously compress other data units.
The maximum number of data units that could be transmitted at the t-th time slot is expressed by

$$N_{tx}(t) = \left\lfloor \frac{\tau R(t)}{W(\rho(t))} \right\rfloor \qquad (4)$$

where R(t) and ρ(t) are, respectively, the transmission rate and the compression factor chosen by the device for such a time slot, and $W(\rho(t)) = M(\rho(t))\,N(\rho(t))$ is the average number of bits per data unit. Here, M(ρ(t)) is the data unit size (in pixels) for the compression factor ρ(t), and N(ρ(t)) is the associated number of bits that are necessary (on average) to encode a pixel in the compressed and encoded pseudo-image, which we will detail in the simulation results. On the other hand, the number of data units that can be compressed during time slot t is given by

$$N_{c}(t) = \left\lfloor \tau f_d(t)\, J_d(\rho(t)) \right\rfloor \qquad (5)$$

where $J_d(\rho(t))$ denotes the number of data units compressed in a clock cycle (which depends on the chosen compression factor ρ(t)), while $f_d(t)$ denotes the clock frequency chosen by the UE during the t-th time slot. By Assumptions 1 and 2, the UE cannot transmit more data units than it can (simultaneously) compress, which suggests that in (4) we have to use a rate

$$R(t) \le W(\rho(t))\, J_d(\rho(t))\, f_d(t) \qquad (6)$$

Furthermore, although we are assuming parallel compression and transmission of (the previously compressed) data units, the very first data unit needs a time $1/(f_d(t) J_d(\rho(t)))$ to be compressed before transmission can start. This means that the number $N_{UE}(t)$ of data units that the UE can actually compress and transmit in a time slot is given by

$$N_{UE}(t) = \left\lfloor \left(\tau - \frac{1}{f_d(t)\, J_d(\rho(t))}\right) \frac{R(t)}{W(\rho(t))} \right\rfloor \qquad (7)$$

which will be exploited later on to solve the optimization problems.
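The per-slot UE throughput described above can be sketched numerically as follows. The sketch assumes that the number of transmittable units is the floor of (available time × rate / bits per unit), that the rate is capped by the compression speed W·f_d·J_d, and that a time 1/(f_d·J_d) is lost to compress the first unit. The function names and the numerical values are purely illustrative assumptions.

```python
import math

def max_compressed_units(tau, f_d, j_d):
    """Data units that the UE can compress in a slot of duration tau."""
    return math.floor(tau * f_d * j_d)

def ue_units_per_slot(tau, rate, bits_per_unit, f_d, j_d):
    """Data units that the UE can compress and transmit in one slot."""
    # The UE cannot transmit faster than it compresses (rate cap, an assumption)
    rate = min(rate, bits_per_unit * f_d * j_d)
    # Time needed to compress the very first data unit before transmission starts
    t_first = 1.0 / (f_d * j_d)
    return max(0, math.floor((tau - t_first) * rate / bits_per_unit))

# Hypothetical numbers: 50 ms slot, 1e5 bits per unit, 1 GHz CPU, 2.5e-7 units/cycle
tau, w, f_d, j_d = 0.05, 1e5, 1e9, 2.5e-7
slow_link = ue_units_per_slot(tau, 1e7, w, f_d, j_d)   # link-limited case
fast_link = ue_units_per_slot(tau, 1e9, w, f_d, j_d)   # compression-limited case
```

In the compression-limited case the result stays below `max_compressed_units(tau, f_d, j_d)`, reflecting the time spent on the first data unit.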
We can now write the dynamic evolution of the queue at the UE, which is fed by the arrival/acquisition of new data units (images) and is drained by the transmission of data units to the ES, thus reading as:

$$Q_{UE}(t+1) = \max\big(0,\, Q_{UE}(t) - N_{UE}(t)\big) + A(t) \qquad (8)$$

where A(t) is a data arrival process, whose statistical properties are generally unknown. Once the data units arrive at the ES, they are put into a computational queue $Q_{ES}(t)$. To make explicit the dynamic evolution of $Q_{ES}(t)$, we need to quantify the number of data units that can be processed by the ES at time slot t. To this aim, let $1/J_s(\rho)$ denote the number of clock cycles that are necessary to process (classify) a data unit encoded with a compression factor ρ. Then, the maximum number $N_{ES}(t)$ of data units that can be processed at time slot t by the ES is given by

$$N_{ES}(t) = \max_{n \le Q_{ES}(t)} \; n \quad \text{s.t.} \quad \sum_{i=1}^{n} \frac{1}{J_s(\rho_i)} \le \tau f_c(t) \qquad (9)$$

where $\{\rho_i\}_{i=1}^{Q_{ES}(t)}$ is the set containing the compression factors associated with each data unit in the ES queue, during the t-th time slot and indexed from the oldest to the newest. Indeed, problem (9) maximizes the number of processed data units in the queue, whose total processing cost clearly must be less than or equal to the ES computational capability, i.e., $\tau f_c(t)$. Finally, the ES computation queue evolves as

$$Q_{ES}(t+1) = \max\big(0,\, Q_{ES}(t) - N_{ES}(t)\big) + \min\big(Q_{UE}(t),\, N_{UE}(t)\big) \qquad (10)$$

In such a queued dynamic system, the overall latency experienced by a data unit before processing depends on the sum of the two queues in (8) and (10), i.e.,

$$Q_{tot}(t) = Q_{UE}(t) + Q_{ES}(t) \qquad (11)$$

In fact, assuming an average data arrival rate $\bar{A} = E\{A(t)/\tau\}$, the average long-term delay is defined by Little's law as [53]:

$$\bar{D} = \frac{1}{\bar{A}} \lim_{T\to\infty} \frac{1}{T} \sum_{t=1}^{T} E\{Q_{tot}(t)\} \qquad (12)$$

Thus, we can attain an average delay $D_{avg}$ by constraining the average queue length in (11) as:

$$\lim_{T\to\infty} \frac{1}{T} \sum_{t=1}^{T} E\{Q_{tot}(t)\} \le Q_{avg} \qquad (13)$$

with $Q_{avg} = D_{avg}\,\bar{A}$. In the sequel, we introduce the model for the system energy consumption.
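The queue dynamics and the Little's law delay estimate can be sketched as below. The sketch assumes that the UE queue is drained by at most the units it holds, that the units leaving the UE enter the ES queue in the same update, and that the ES serves its queue oldest-first within a budget of τ·f_c clock cycles. All function names and numbers are illustrative assumptions.

```python
def max_processed_units(cycles_per_unit, tau, f_c):
    """Count queued units (oldest first) that fit in the tau * f_c cycle budget."""
    budget, count = tau * f_c, 0
    for cycles in cycles_per_unit:
        if cycles > budget:
            break
        budget -= cycles
        count += 1
    return count

def step_queues(q_ue, q_es, arrivals, n_ue, n_es):
    """One-slot update of the UE and ES queues, expressed in data units."""
    sent = min(q_ue, n_ue)                      # units actually leaving the UE
    q_ue_next = q_ue - sent + arrivals          # remaining backlog plus new arrivals
    q_es_next = max(0, q_es - n_es) + sent      # ES backlog plus newly received units
    return q_ue_next, q_es_next

def average_delay(total_queue_samples, arrival_rate):
    """Little's law: average delay = average total queue length / arrival rate."""
    return sum(total_queue_samples) / len(total_queue_samples) / arrival_rate
```

For example, `step_queues(3, 1, 2, 5, 4)` empties the UE backlog of 3 units, adds 2 arrivals, and leaves 3 units at the ES.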

Energy model
The system energy consumption involves three parts:
• Transmission energy at the UE, needed to transmit the data units to the ES.
• Computation energy at the UE, needed to compress/encode the data units.
• Computation energy at the ES, needed to classify the data units transmitted by the UE.
Assuming a capacity-achieving transmission system in a flat-fading channel, the transmission power $p_{tx}(t)$ can be inferred from the Shannon capacity [54]:

$$R(t) = B \log_2\left(1 + \frac{h(t)\, p_{tx}(t)}{N_0 B}\right) \qquad (14)$$

where h(t) is the channel gain, $N_0$ denotes the noise power spectral density at the (ES) receiver side, while B is the bandwidth allocated to the UE. The flat-fading channel assumption simplifies the analysis and the optimal resource management, which already contains several optimization variables. Conceptually, the proposed framework can also be extended to frequency-selective channels, by employing OFDM, which converts the channel into a set of parallel flat-fading channels. This would require adding to the optimization problems described in the following an extra vector of optimization variables to dynamically split the available transmission power among all the parallel channels, to maximize the overall system transmission rate. This solution would lead to a water-filling-like problem, which is a well-studied topic in the literature. This possible extension is, however, left for future work, which could possibly build upon the results of this manuscript.
Thus, inverting (14), the energy required for transmission during a time slot of duration τ is given by:

$$E_{tx}(t) = \tau\, p_{tx}(t) = \tau\, \frac{N_0 B}{h(t)} \left(2^{R(t)/B} - 1\right) \qquad (15)$$

From the computation perspective, we exploit the model in [55], which assumes a cubic dependence of the computing power on the clock frequency. Thus, letting $f_d(t)$ and $f_s(t)$ be the CPU clock frequencies of the UE and ES, respectively, the corresponding energies needed for computation read as:

$$E_d(t) = \tau\, \kappa_d\, f_d^3(t), \qquad E_s(t) = \tau\, \kappa_s\, f_s^3(t) \qquad (16)$$

where the constants $\kappa_d$ and $\kappa_s$ represent the effective switched capacitance of the UE and ES processing units, respectively. Finally, we introduce a weighted energy function $E_\alpha(t)$, which quantifies the energy consumption of the overall system during the t-th time slot:

$$E_\alpha(t) = \alpha \big(E_{tx}(t) + E_d(t)\big) + (1-\alpha)\, E_s(t) \qquad (17)$$

where α ∈ [0, 1] is a weighting parameter to be chosen. For instance, choosing α = 1 leads to a pure user-centric strategy, whereas α = 0 determines a pure network-centric strategy. An intermediate strategy, which we term holistic, can be obtained with an intermediate value of α (e.g., α = 0.5). The use of this weighting parameter introduces more degrees of freedom and flexibility in the resource optimization, depending on the needs of the operators, users, and service providers.
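The energy model can be sketched numerically as follows, assuming the Shannon rate formula inverted for the transmit power, a computation power equal to κf³, and a weighted sum in which α multiplies the UE-side terms and (1 − α) the ES-side term. The function names and all numerical values are hypothetical and are not taken from the paper.

```python
def tx_energy(rate, h, bandwidth, n0, tau):
    """Transmit energy in one slot, from the inverted Shannon rate formula."""
    p_tx = (n0 * bandwidth / h) * (2.0 ** (rate / bandwidth) - 1.0)
    return tau * p_tx

def cpu_energy(kappa, freq, tau):
    """Computation energy in one slot, with power cubic in the clock frequency."""
    return tau * kappa * freq ** 3

def weighted_energy(alpha, e_tx, e_ue, e_es):
    """alpha = 1: user-centric; alpha = 0: network-centric; in between: holistic."""
    return alpha * (e_tx + e_ue) + (1.0 - alpha) * e_es

# Hypothetical values: 1 MHz bandwidth, 1 Mbit/s rate, 50 ms slot
e_tx = tx_energy(rate=1e6, h=1e-10, bandwidth=1e6, n0=4e-21, tau=0.05)
e_ue = cpu_energy(kappa=1e-27, freq=1e9, tau=0.05)
```

Since the transmit energy grows exponentially with R(t)/B while the CPU energy grows cubically with the clock frequency, small reductions in rate or frequency can yield large energy savings, which is what the dynamic allocation exploits.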

Accuracy model
It is generally difficult to establish an analytic expression that relates the accuracy of the classification task over an available test set to the compression factor adopted by our goal-oriented communication scheme. Thus, in this paper we use a more practical approach, where the accuracy function G(ρ(t)) for the ES-based learning/classification task is cast in the optimization problem through a look-up table (LUT) indexed by the compression factor ρ(t), whose entries are obtained by testing offline each CE/CC pair associated with a specific compression factor. Examples of LUTs for the considered classification tasks are provided in the sequel in Tables 2 and 3 and in Fig. 2. The LUT is instrumental in defining constraints on the average accuracy we want to guarantee for the image classification task, as detailed in the two resource management policies described in the sequel. Note that, by the rate in (14), we are ideally assuming a capacity-achieving communication system, which also simplifies the analysis and the mathematical tractability of the problem. Such a Shannon rate can be practically approached with long channel codes, which also grant an (almost) zero (coded) bit error rate (BER). Thus, coherently with (14), we train the CE-CCs without taking into account possible accuracy degradation induced by a finite BER, and the LUTs are likewise obtained by testing the CE-CC neural networks with zero BER in the communication link. Although certainly interesting, the design of CE-CC networks capable of handling, and possibly mitigating, a non-negligible BER in the communication system is out of the scope of this manuscript and could be the subject of further studies. In any case, the results we obtain for the energy, accuracy, and delay trade-offs can still be considered bounds on those obtainable in finite (coded) BER scenarios, which are tight and achievable up to a maximum BER (that depends on the specific task).
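The role of the LUT in an average accuracy constraint can be sketched as follows. The table below contains only the two deep-CE accuracies quoted in the text (0.951 for ρ = 8 and 0.82 for ρ = 64); the full content of Tables 2 and 3 is not reproduced, and the function names are hypothetical.

```python
# Partial LUT: only the two deep-CE values quoted in the text.
DEEP_CE_LUT = {8: 0.951, 64: 0.82}


def accuracy(rho, lut=DEEP_CE_LUT):
    """Accuracy G(rho) read from the look-up table."""
    return lut[rho]


def average_accuracy(rho_sequence, lut=DEEP_CE_LUT):
    """Empirical average of G(rho(t)) over the compression factors used per slot."""
    return sum(lut[rho] for rho in rho_sequence) / len(rho_sequence)


def meets_constraint(rho_sequence, g_avg, lut=DEEP_CE_LUT):
    """True if the average accuracy is at least the target g_avg."""
    return average_accuracy(rho_sequence, lut) >= g_avg
```

For instance, using ρ = 8 in two slots out of three and ρ = 64 in the remaining one gives an average accuracy of about 0.907, so a target of 0.9 would be met while a target of 0.95 would not.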

Problem formulation and methodology
The latency, energy, and accuracy models defined in the previous section can be exploited in the formal definition of two dynamic resource optimization strategies, which are described in the sequel.

MEDA: minimum energy under average service delay and accuracy constraints strategy
In the first resource allocation strategy, we formulate a long-term optimization problem that aims at minimizing the average energy consumption of the system, subject to average delay and accuracy constraints. The problem can be mathematically cast as:
$$\begin{aligned} \min_{R(t),\, \rho(t),\, f_s(t),\, f_d(t)} \quad & \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\{E_\alpha(t)\} \\ \text{s.t.} \quad & (a)\ \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\{Q_{\mathrm{tot}}(t)\} \le Q_{\mathrm{avg}}, \\ & (b)\ \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\{G(\rho(t))\} \ge G_{\mathrm{avg}}, \\ & (c)\ 0 \le R(t) \le R_{\max}, \\ & (d)\ \rho(t) \in S,\ f_s(t) \in F_s,\ f_d(t) \in F_d, \end{aligned} \tag{18}$$
where the vector [f s (t), f d (t), ρ(t)] collects the discrete optimization variables. In (18), we impose two long-term constraints: (a) the average queue length must be lower than Q avg, i.e., we impose a maximum average service delay equal to D avg = Q avg /A (cf. (13)); (b) the average classification accuracy must be greater than G avg. The others are feasibility constraints: (c) imposes an instantaneous constraint on the transmission rate, which must be non-negative and cannot exceed a maximum value R max, obtained as in (14) using the maximum transmission power, say P max, available at the UE; finally, (d) specifies the discrete feasible sets S, F s, F d for the goal-oriented compression factor and for the ES and UE computational clock frequencies, respectively. Since we do not assume any knowledge of the statistics of the quantities involved in the system (e.g., data arrivals, radio channels, etc.), solving (18) is very challenging. However, resorting to stochastic Lyapunov optimization [45], we derive low-complexity dynamic solutions for the original optimization problem, as detailed in the following. According to [45], we associate each long-term constraint in problem (18) with a specific virtual queue, namely Z(t) for the delay constraint (a) and Y(t) for the accuracy constraint (b), whose evolution is given in (19). The parameters ν and µ are step sizes, used to adjust the convergence speed of the algorithm. As detailed in [45], guaranteeing the mean-rate stability of the queues in (19) is equivalent to satisfying the constraints (a) and (b) in (18). In the sequel, we collect the virtual queues employed in the system in a vector Θ(t) = [Z(t), Y(t)]. Then, to stabilize all the queues, we introduce the Lyapunov function
$$L(t) = \frac{1}{2} \left( Z^2(t) + Y^2(t) \right)$$
and the associated Lyapunov drift
$$\Delta(t) = \mathbb{E}\{ L(t+1) - L(t) \mid \Theta(t) \}.$$
Minimizing the Lyapunov drift Δ(t) leads to the stabilization of the virtual queues, but possibly with an unjustified and uncontrolled energy consumption. Thus, to trade off system stability against energy consumption, the Lyapunov drift is augmented with a term dependent on the objective function of (18), thus obtaining the following Lyapunov drift-plus-penalty function [45]:
$$\Delta_p(t) = \mathbb{E}\{ L(t+1) - L(t) + V\, E_\alpha(t) \mid \Theta(t) \}. \tag{20}$$
In particular, the drift-plus-penalty function is the conditional expected change of L(t) over successive slots, plus a penalty term that weights the objective function of (18) through the parameter V. Now, if Δ p (t) is lower than a finite constant for all t, the virtual queues are stable and the optimal solution of (18) is asymptotically approached as V increases [45, 39, Th. 4.8]. In practical scenarios with finite V, the higher V is, the more importance is given to the energy consumption rather than to the virtual queue backlogs, thus pushing the solution toward optimality while still guaranteeing the stability of the system.
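To make the role of the virtual queues concrete, the sketch below implements the update commonly used in Lyapunov optimization: each queue grows when its constraint is violated in the current slot and is clipped at zero otherwise. The exact time indexing and the pairing of the step sizes with the queues (µ for the delay queue Z, ν for the accuracy queue Y) are assumptions made for this illustration, and all names are hypothetical.

```python
def update_virtual_queues(z, y, q_tot_next, g_next, q_avg, g_avg, mu, nu):
    """One-slot update of the delay queue Z and of the accuracy queue Y.

    Z grows when the physical queue exceeds its average target q_avg;
    Y grows when the accuracy falls below its average target g_avg.
    """
    z_next = max(0.0, z + mu * (q_tot_next - q_avg))
    y_next = max(0.0, y + nu * (g_avg - g_next))
    return z_next, y_next


def lyapunov(z, y):
    """Quadratic Lyapunov function of the virtual-queue vector [Z, Y]."""
    return 0.5 * (z * z + y * y)
```

Keeping both queues bounded over time is what enforces the long-term constraints, while larger step sizes make the queues react faster to violations.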
Following similar arguments as in [45], we proceed by minimizing an upper bound of the drift-plus-penalty function in (20) in a stochastic fashion. After some simple algebra (similar to [6] and omitted here due to space limitations), we obtain the per-slot problem (21), to be solved at each time t, where Q tx (t) = 2µ² (Q UE (t) − Q ES (t)) + µZ(t) and Q comp (t) = 2µ² Q ES (t) + µZ(t). In the sequel, we show how (21) can be split into subproblems that admit low-complexity solution procedures for the optimal UE resources (i.e., rate, compression factor, local CPU clock frequency) and for the computation resources at the ES (i.e., remote CPU clock frequency).

UE's resource optimization for MEDA
The resource allocation problem at the UE aims at optimizing the transmission rate R(t), the compression factor ρ(t), and the UE CPU clock frequency f d (t) in (21). In the sequel, to ease the notation, the dependence on the time index t is omitted. It is clear from (21) that the UE allocation problem can be decoupled from the optimization of the ES computation resources, thus obtaining the subproblem (22) at the UE, where the transmission rate is upper bounded by the value R + max defined in (23). Exploiting (7) and ⌊x⌋ ≥ x − 1, the cost function is a (tight) upper bound of the original one, with the same optimal solution because Q tx does not depend on the optimization variables. Assumptions 1 and 2 mean that, in practice, it does not make sense for the transmission rate to exceed the value R + max in (23), which is the minimum among three terms: (i) R max, i.e., the maximum rate obtainable by the radio interface; (ii) the rate necessary to empty the UE local queue Q UE (t), by compressing all the data units with a specific compression factor ρ; (iii) the maximum rate necessary to transmit all the data units that can be compressed during the t-th time slot using a compression factor ρ and a CPU frequency f d.
The problem in (22) is a mixed-integer optimization problem, since both the compression factor ρ ∈ S and the device frequency f d ∈ F d take values in discrete sets. However, in our case S and F d have a limited cardinality, allowing for an exhaustive search of the optimal values in a short time. Furthermore, since the objective function in (22) is (strictly) convex with respect to R for any fixed frequency f d and compression factor ρ, by Lagrange theory and the KKT conditions we obtain a unique closed-form solution R * for the transmission rate if Q tx > 0, and R * = 0 otherwise. The overall procedure for UE resource allocation is summarized in Algorithm 1.
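The structure of the UE procedure (closed-form rate for each candidate pair, followed by an exhaustive search) can be illustrated with the sketch below. It does not reproduce the objective in (22): as a stand-in, it assumes a per-pair cost of the form −w R + V α E tx (R) plus a rate-independent term, where w ≥ 0 is a hypothetical queue-dependent weight and E tx follows the Shannon-based energy model. The callbacks `weight_fn`, `cap_fn`, and `extra_cost_fn` are likewise hypothetical placeholders for the quantities that depend on ρ and f d.

```python
import math
from itertools import product


def optimal_rate(w, v, alpha, h, n0, bandwidth, tau, r_cap):
    """Minimize -w*R + v*alpha*E_tx(R) over 0 <= R <= r_cap (convex in R)."""
    if w <= 0.0 or r_cap <= 0.0:
        return 0.0                      # no incentive (or no room) to transmit
    if v * alpha == 0.0:
        return r_cap                    # energy not penalized: use the cap
    # Stationary point of the convex cost, then projection onto [0, r_cap].
    arg = w * h / (v * alpha * tau * n0 * math.log(2.0))
    if arg <= 1.0:
        return 0.0
    return min(r_cap, bandwidth * math.log2(arg))


def rate_cost(rate, w, v, alpha, h, n0, bandwidth, tau):
    """Stand-in cost: queue reward plus weighted transmission energy."""
    e_tx = tau * (n0 * bandwidth / h) * (2.0 ** (rate / bandwidth) - 1.0)
    return -w * rate + v * alpha * e_tx


def ue_search(rhos, freqs, weight_fn, cap_fn, extra_cost_fn, **link):
    """Exhaustive search over (rho, f_d); the rate is set in closed form."""
    best = None
    for rho, f_d in product(rhos, freqs):
        w = weight_fn(rho)
        rate = optimal_rate(w, r_cap=cap_fn(rho, f_d), **link)
        cost = rate_cost(rate, w, **link) + extra_cost_fn(rho, f_d)
        if best is None or cost < best[0]:
            best = (cost, rho, f_d, rate)
    return best
```

Because the rate is available in closed form for each pair, the search only has to evaluate |F d ||S| candidates per slot, which is what keeps the per-slot complexity low.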

ES' resource optimization for MEDA
The resource allocation problem at the ES aims at optimizing the CPU clock frequency f s (t) in (21), thus leading to the optimization problem (25). Note that (25) is an integer optimization problem, where the number N ES (t) of processable data units also depends on f s (t) through (9). Since the number of possible CPU frequencies in F s is small, we proceed with an exhaustive search, which can be summarized in the following steps:
1. For each possible clock frequency f s (t) ∈ F s, observe Q comp (t), evaluate N ES (t) by (9), and compute the value of the objective function in (25).
2. Select the frequency f * s (t) that leads to the lowest objective value.
The main steps of the procedure are summarized in Algorithm 2.
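The two steps of the ES search can be sketched as follows. Since the expressions of (9) and (25) are not reproduced here, the sketch assumes that the number of processable data units is the floor of τ f s J s (capped by the ES backlog) and that the cost trades the queue reward Q comp N ES against the weighted ES energy; these modeling choices and all names are assumptions made for illustration only.

```python
import math


def es_frequency_search(freqs, q_comp, q_es, j_s, tau, kappa_s, v, alpha):
    """Exhaustive search of the ES clock frequency over a small discrete set."""
    best_f, best_cost = None, math.inf
    for f_s in freqs:
        # Step 1: data units processable in the slot (assumed model),
        # never more than what is waiting in the ES queue.
        n_es = min(q_es, math.floor(tau * f_s * j_s))
        energy = tau * kappa_s * f_s ** 3
        cost = -q_comp * n_es + v * (1.0 - alpha) * energy
        # Step 2: keep the frequency with the lowest objective value.
        if cost < best_cost:
            best_f, best_cost = f_s, cost
    return best_f, best_cost
```

When the queue weight is zero the lowest frequency is always selected, whereas a large backlog justifies the cubic energy cost of a faster clock.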

Overall edge learning procedure
The two resource optimization procedures at the UE and ES jointly contribute to the overall dynamic resource allocation procedure for edge learning, which is summarized in Algorithm 3. Lyapunov optimization theory guarantees that, as V increases, Algorithm 3 minimizes the average energy consumption while respecting the average latency and accuracy constraints.

MADE: maximum accuracy under average service delay and energy constraints strategy
In this section, we introduce an alternative strategy for optimizing edge learning with goal-oriented communications. In particular, the aim of this strategy is to maximize the long-term average accuracy, under long-term latency and energy constraints. Let E UE (t) = E d (t) + E tx (t) be the overall energy spent by the UE at time slot t. Then, the long-term optimization problem can be cast as in (26), where the vector [f s (t), f d (t), ρ(t)] collects the discrete optimization variables. In this case, the roles of the objective function and of the constraints are exchanged with respect to (18). Indeed, in (26) we have the following long-term constraints: (a) the average queue length must be lower than Q avg (as in (18)); (b) the average energy spent at the UE must be lower than E UE,avg; (c) the average energy spent at the ES must be lower than E s,avg. Finally, (d) and (e) impose instantaneous constraints on the optimization variables, similarly to (d) in (18).
To handle the long-term latency constraint (a), we use the same virtual queue Z(t) introduced for the previous problem, which evolves according to (19). Furthermore, we introduce the virtual queues S(t) and O(t), associated with the two energy constraints (b) and (c), respectively, each with its own step size (one of which is denoted by η) that controls the convergence speed of the algorithm. Then, proceeding as in the previous case, we write the Lyapunov function and the Lyapunov drift-plus-penalty function (29). Exploiting the same Lyapunov framework [45], we proceed by minimizing an upper bound of the drift-plus-penalty function in (29) in a stochastic fashion. After some simple derivations, we obtain the per-slot problem (30), to be solved at each time t. As for the MEDA strategy, it is easy to see that (30) decouples into two separate optimization problems, as detailed in the following two subsections.

UE's resource optimization for MADE
The resource allocation problem at the UE aims at optimizing the transmission rate R(t), the compression factor ρ(t), and the UE CPU clock frequency f d (t) in (30) at every time t. Omitting the time index t, the subproblem at the UE can be cast as problem (31), a mixed-integer optimization program that, by the same arguments and bounds used for the MEDA problem, can be proved to be strictly convex with respect to the transmission rate R for any fixed compression factor ρ and computational clock frequency f d, with an optimal closed-form solution for Q tx (t) > 0, and R * = 0 otherwise. Thus, the overall optimal solution R * (t) can be found by an exhaustive search in the product space F d × S of the UE clock frequencies and compression factors, by comparing the objective values obtained in (31) for the |F d ||S| potential solutions R * (ρ, f d ). The procedure follows the same steps already described in Algorithm 1.

ES' resource allocation for MADE
The resource allocation problem at the ES aims at optimizing the CPU clock frequency f s (t) in (30). Similarly to (25), the ES frequency f s (t) takes values in the discrete frequency set F s and, consequently, the problem can be solved by an exhaustive search similar to the one proposed for the MEDA strategy. The only two differences are: (i) the cost function, and (ii) the presence of the queue O(t), which is used to control the energy constraint at the ES. Thus, the main steps are the same as those listed in Algorithm 2. Finally, the overall resource allocation procedure following the MADE design can be described by Algorithm 3, with the aforementioned modifications for the UE's and ES's resource allocations.

Numerical results and discussion
In this section, we assess the performance of the proposed strategies for edge learning with goal-oriented communications. As previously mentioned in Sect. 3.3, we need to build a LUT that quantifies the accuracy of the proposed goal-oriented learning scheme as a function of the adopted compression factor ρ. To this aim, Tables 2 and 3 report, for the deep-CE and short-CE models, respectively, the values of the accuracy G(ρ), the data units J d (ρ) that the UE can at most compress (and zip by JPEG2000) in a clock cycle, the data units J zip (ρ) it can zip by JPEG2000 in a clock cycle, and the data units J c (ρ) that it can compress in a clock cycle. Also, Table 4 reports the data units J s (ρ) that the ES can at most classify in a clock cycle, as well as the image size M(ρ) and the average number of bits/pixel N(ρ), which are shared by the short-CE and the deep-CE when using JPEG2000. As far as the wireless channel model is concerned, we modeled the local scattering according to a Rayleigh flat-fading channel, whose statistical evolution in time obeys Clarke's autocorrelation function [56], which has been used to set the time slot duration. We considered two operating scenarios, as summarized in Table 5, where σ 2 0 represents the average power path loss, computed according to the Alpha-Beta-Gamma model [57]. Finally, the UE's and ES's CPU clock frequency sets are selected as F d = {0.1, 0.2, . . . , 0.9, 1} × 1.4 GHz and F s = {0.1, 0.2, . . . , 0.9, 1} × 4.5 GHz, respectively, assuming a switched capacitance κ = 1.097 × 10⁻²⁷ [s/cycles]³ (equal for both UE and ES). To give further insight, we report in Tables 6 and 7 the maximum number of data units the UE and ES can process with specific computation capabilities, in Channels A and B, respectively, when dealing with images from the dataset described in the following.
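To give a feeling for the orders of magnitude implied by these settings, the following sketch builds the two frequency sets and evaluates the cubic power model at the maximum clock frequencies. It assumes that the computing power is κ f³ in watts and uses a slot duration τ = 50 ms, the value adopted in the simulations; variable and function names are hypothetical.

```python
KAPPA = 1.097e-27          # effective switched capacitance (UE and ES)
TAU = 0.05                 # slot duration in seconds (50 ms)

# Ten equally spaced levels up to the maximum clock frequency (Hz).
F_D = [k / 10 * 1.4e9 for k in range(1, 11)]   # UE
F_S = [k / 10 * 4.5e9 for k in range(1, 11)]   # ES


def cpu_power(freq, kappa=KAPPA):
    """Computing power under the cubic model."""
    return kappa * freq ** 3


def cpu_energy_per_slot(freq, tau=TAU, kappa=KAPPA):
    """Computing energy spent in one slot at a fixed clock frequency."""
    return tau * cpu_power(freq, kappa)


ue_max_power = cpu_power(F_D[-1])   # about 3 W at 1.4 GHz
es_max_power = cpu_power(F_S[-1])   # about 100 W at 4.5 GHz
```

Under these assumptions, the UE spends at most about 0.15 J per slot for computation and the ES at most about 5 J, and halving the clock frequency reduces the computing power by a factor of eight.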

Compression-accuracy trade-off
In the experimental setup, we used the German Traffic Sign Recognition Benchmark (GTSRB) [58] dataset, which includes 1213 pictures of German road signs divided into 43 different classes, thus representing a quite challenging classification task. The dataset has been split into an 80% training set, composed of 970 images, and a 20% test set, composed of 243 images. During the data loading phase, all the images have been normalized to a size of 256 × 256 and then converted to three-channel images (one channel for each RGB color), such that the initial size of each data unit is 256 × 256 × 3. We considered compression factors ρ ∈ {2, 4, 8, 16, 32, 64}.
In Fig. 2, we illustrate the accuracy of the proposed scheme versus the compression factor, for different architectures: (i) deep-CE; (ii) short-CE; and (iii) a simple image down-sampling procedure with anti-aliasing filter. As expected, and as shown in Fig. 2, the accuracy G(ρ) decreases monotonically with the compression factor. The deep-CE always has the best performance, even if, for lower compression factors (up to 8), the difference among the three architectures is almost negligible. In contrast, at large compression factors (i.e., 16, 32, 64), there is a clear advantage in using the deep-CE architecture. For compression factor ρ = 64, we get output tensors with a size of 4 × 4 × 3 = 48 entries. Interestingly, although images of this size have clearly undergone a heavy transformation, the deep-CE still allows the ES CC to classify them with an 82% accuracy. For this compression factor, neither image down-sampling nor short-CE allows a meaningful classification. In the next sections, we extensively assess the trade-off between energy, latency, and performance of the proposed edge learning strategies with goal-oriented communications.
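The tensor sizes associated with the considered compression factors can be checked with a few lines of Python. The sketch assumes that ρ scales each spatial dimension of the 256 × 256 × 3 input while the three channels are preserved, which is consistent with the 4 × 4 × 3 output quoted for ρ = 64; the function names are hypothetical.

```python
def compressed_shape(rho, side=256, channels=3):
    """Output tensor shape when each spatial dimension is divided by rho."""
    if side % rho != 0:
        raise ValueError("rho must divide the image side")
    return (side // rho, side // rho, channels)


def num_entries(shape):
    """Total number of tensor entries."""
    height, width, channels = shape
    return height * width * channels


# Number of entries for each compression factor considered in the paper.
sizes = {rho: num_entries(compressed_shape(rho)) for rho in (2, 4, 8, 16, 32, 64)}
```

Under this assumption, the number of entries shrinks by a factor ρ² with respect to the original 196608 entries of each data unit.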

MEDA with deep-CE
In this section, we illustrate the performance of the proposed goal-oriented scheme with the MEDA strategy. We considered the wireless channel scenario A in Table 5 and the holistic paradigm that minimizes the energy consumption of the whole system, which corresponds to setting α = 1/2 in (17). The time slot duration τ has been set to 50 ms, which fits within the coherence time over which the channel can be considered constant. The image arrival process, whose statistical knowledge is not exploited, has been modeled as a Poisson process with an average rate A avg = 2 arrivals/slot. This situation is compatible, for instance, with a webcam that transmits images at a rate of 40 frames/s. The average latency constraint has been set to 150 ms. In the sequel, we consider only the deep-CE learning architecture, which is the one with the best performance, as illustrated in Fig. 2.
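The relation between these simulation parameters can be verified with a short computation based on the delay/queue relation D avg = Q avg /A recalled in the problem formulation. The sketch below expresses the latency in slots and derives the corresponding bound on the average queue length; the function names are hypothetical.

```python
TAU = 0.05  # slot duration in seconds (50 ms)


def frames_per_second(arrivals_per_slot, tau=TAU):
    """Average arrival rate expressed in data units per second."""
    return arrivals_per_slot / tau


def avg_queue_bound(delay_s, arrivals_per_slot, tau=TAU):
    """Average queue length matching a target average delay (Little's law):
    Q_avg = D_avg (in slots) * A_avg (data units per slot)."""
    return (delay_s / tau) * arrivals_per_slot


rate_fps = frames_per_second(2)          # about 40 frames/s
q_avg = avg_queue_bound(0.150, 2)        # about 6 data units
```

With two arrivals per slot, a 150 ms latency target corresponds to three slots and hence to an average backlog of about six data units; doubling the arrival rate doubles this bound.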
In Fig. 3, we show the average system energy versus the average latency (i.e., the energy-latency trade-off), for different accuracy constraints and learning architectures. Specifically, from (20), (22), and (25), the parameter V is used to explore the trade-off between energy, latency, and accuracy. As the parameter V increases, we move on Fig. 3 from the right to the left, reducing the energy at the expense of a higher latency, up to the maximum latency constraint, which corresponds to the optimal solution of the problem. As expected, the trade-off curves reported in Fig. 3 show that a stricter accuracy constraint also implies a higher system energy consumption and latency, according to the Energy/Accuracy and Latency/Accuracy trade-offs [6]. Then, the proposed deep-CE strategy is compared with the one performing compression with down-sampling, which is depicted using dashed lines in Fig. 3. As we can notice from Fig. 3, the proposed goal-oriented compression strategy enables a considerable saving in terms of energy consumption, while satisfying the same accuracy and delay constraints. This gain is obtained thanks to the proposed deep-CE learning scheme, which is capable of granting quite high accuracy while employing smaller data units, thus paying on average a lower energy/delay cost for transmission and classification.

Ensemble of goal-oriented compression schemes
Looking at Tables 2 and 3, we notice that also the short-CE and the classical down-sampling compression can lead to quite good accuracy results for low compression factors ρ ∈ {2, 4, 8, 16} , while requiring a lower computational complexity than deep-CE. Thus, it makes sense to consider an edge-based classification scheme equipped with an ensemble of all the available compression strategies, i.e., deep-CE, short-CE, and down-sampling, which might lead to enhanced performance. To this aim, in Fig. 4, we illustrate again the energy-latency trade-off curve of the system, for different accuracy constraints, comparing the ensemble of goal-oriented compression strategies (solid curves) with deep-CE (dashed curves). As we can notice from Fig. 4, there is a remarkable gain obtained by using the proposed ensemble compression scheme, since the system has more degrees of freedom (in terms of accuracy, complexity, and latency) to adapt to the instantaneous variations of the system parameters, i.e., queues, wireless channels, data arrivals, etc. The gain is even more appreciable if we consider the UE's energy consumption, whose behavior with respect to average latency is shown in Fig. 5, for the same accuracy constraints. This result shows that looking for a flexible, scalable, and finely tunable network for compression and classification is an interesting research direction.
Finally, Fig. 6 reports the actual average accuracy values obtained for the same simulation results shown in Figs. 4 and 5 (i.e., for several values of the parameter V), comparing them with the accuracy constraints (dashed lines). From Fig. 6, we can notice how the system strictly respects the (minimum) accuracy prescribed by the constraints, without unnecessarily wasting energy or increasing the delay.

Comparison with static resource allocation
As anticipated in the Introduction, the joint dynamic adaptation of the system resources and of the learning models (i.e., the adaptivity of the CE-CC network) is one of the main strengths of the proposed framework with respect to most of the literature. Thus, in order to properly highlight the advantages of the framework, we compare it with: (i) a dynamic resource allocation strategy with a fixed CE-CC pair, which is capable of respecting the average constraint imposed on our approach. This approach is quite similar to [6], where a fixed learning model is considered and the optimization of the transmission resources at the UE side acts on the quantization bits; (ii) a completely static resource allocation strategy, which not only employs the fixed CE-CC pair, but also fixes the optimal static transmission resources (e.g., rate and power) by exploiting the knowledge of the average channel statistics and of the average image arrival rate; (iii) a hybrid static/dynamic optimization strategy, where the transmission resources (e.g., rate and power) are fixed according to the average channel statistics, while the CE-CC learning architecture and (only) the computational resources are jointly and dynamically optimized. In particular, this approach is similar in philosophy to the one in [41], where a single network is considered, whose compression degree is made adaptive by selecting only the most significant features for increasing compression ratios. However, differently from [41], we also consider the computational cost and the task scheduling. Specifically, the static resource allocation fixes the transmission power to the minimum value that guarantees a transmission rate, computed through the capacity formula for flat-fading channels [59], that makes the UE queue stable (e.g., an average number of transmitted images per slot equal to the average number of image arrivals per slot).
For the selected learning model, we fixed the UE clock frequency to the minimum one capable of respecting Assumptions 1 and 2. In this set of simulations we considered the MEDA strategy, with channel scenario A in Table 5, an arrival rate A avg = 2 DU/slot, and an accuracy constraint set to G avg = 0.95. Furthermore, we considered a UE-centric paradigm for the energy consumption, which corresponds to setting α = 1 in (17). In both strategies (i) and (ii), we considered the deep-CE with ρ = 8 as the single learning model, which has a fixed accuracy equal to 0.951.
As expected, and as witnessed by the trade-off curves presented in Fig. 7, any dynamic resource allocation strategy that exploits instantaneous knowledge of the system status outperforms a static allocation based on the knowledge of the average system statistics. Specifically, by letting the system jointly and adaptively choose the best compression factor (i.e., the best CE-CC network) and the system resources, as envisaged by our framework, we obtain a significantly better energy-latency trade-off, and a much lower (minimum) UE energy consumption for the optimal solution (i.e., for the largest values of V) of the MEDA strategy.
In a second set of simulations, we tested the capability of the dynamic policies to adapt to changes in the system statistics, such as the (average) image arrival rate. To this end, we considered simulation runs with a duration of 2 × 10⁴ time slots, where the average arrival rate suddenly doubles to A avg = 4 DU/slot after 5 × 10³ slots. In this case, according to Little's theorem, the same average delay constraint corresponds to a doubled average length of the image queue Q tot (t). Recalling that the proposed problems target average performance and constraints, we performed 100 simulation runs. Figure 8 shows the sample mean of the UE's queue lengths Q tot (t) for each competing strategy, while the shaded areas identify the associated standard deviations, computed over the 100 runs. From Fig. 8, it is possible to appreciate that, while our approach is capable of keeping the system stable also in a non-stationary environment by rapidly doubling the image queue length, the policies with a static allocation of the system resources experience an explosion of the latency queue Q tot, as a consequence of the mismatched A avg used to allocate transmission rate and power. Conversely, the mixed policy that uses a fixed CE-CC network, even if it pays a price in energy consumption as shown in Fig. 7, is capable of adapting the queue length to the correct value, although with a longer transient and a higher standard deviation with respect to our policy. Figure 9 shows the associated energy consumption for the same 100 simulation runs and confirms that the optimization policies that adapt the learning strategy online grant the minimum UE energy consumption.

Performance of MADE
In this section, we assess the performance of the MADE goal-oriented strategy. In the sequel, we consider the channel scenario B of Table 5, while the other parameters are the same as those considered for the first set of simulations in the previous subsection. Channel B is characterized by a huge attenuation, making the UE's transmission energy comparable with its computation energy. Also in this case, we start by comparing the deep-CE compression with down-sampling, which are, respectively, the best and the worst strategies from the accuracy perspective (cf. Fig. 2). Figure 10 shows the average latency versus the accuracy of the learning task, for different UE energy constraints, while the ES energy constraint has been set equal to 3 Joule per time slot, i.e., a power of 60 W, which largely satisfies the task requests and pushes the algorithm toward the optimization of the UE's resources. As expected, Figs. 10 and 11 show that, while both compression strategies satisfy the UE energy constraint, the deep-CE leads to a better latency-energy-accuracy trade-off, since it allows higher accuracy values even when transmitting smaller data units, which obviously induce a lower transmission energy and latency.
Finally, we show the optimum (maximum) accuracy versus the UE's energy constraint, achieved by different CE-CC architectures (i.e., deep-CE and ensemble) in different channel scenarios (i.e., A and B in Table 5). The values in Fig. 12 are obtained for the largest value of V (i.e., V = 10⁵), which pushes the system to tightly attain the latency constraint of 150 ms. As far as Channel B is concerned, due to the large channel attenuation (see Table 5), the transmission cost is more critical than the compression cost. For this reason, we do not observe for Channel B significant differences between the CE-CC ensemble and the deep-CE-CCs, because the CE-CC ensemble tends to employ the deep-CE model in almost every slot, since this strategy is capable of granting quite good accuracy also for large compression factors, which are the most appealing for this harsh and energy-expensive channel. On the other hand, when the channel conditions are moderately good, such as for Channel A in Table 5, the limiting factor is represented by the compression energy, which is higher for deep-CEs. In this case, for the strictest energy constraints, the UE with deep-CE tends to apply the highest compression factors, which save energy because they do not require the lossless zipping phase, but suffers from some degradation of the classification accuracy. Vice versa, the ensemble CE-CCs tend to use also those CE-CCs with the lowest compression factors (e.g., 4/8/16), whose zipping phase is (computationally and energy-wise) less expensive with respect to the same compression factors of the deep-CE model. This fact explains the performance advantage of the ensemble CE-CCs over the deep-CE-CCs alone.

Future directions
Several extensions and interesting research directions are open for investigation. For instance, the trade-off curves have shown that a careful choice of the regularization parameter V is needed to drive the system toward the optimal solutions, i.e., those that are close to the constraint bounds. Thus, an interesting research direction for system deployment in practical scenarios is to develop algorithms that make the convergence fast, stable, and adaptive by properly controlling the regularization parameter V and the step sizes of the queue evolutions (e.g., η).
We may also adapt the resource management strategy to scenarios where (low) latency or (high) accuracy constraints have to be almost always guaranteed, e.g., in URLLC network slices, and not just on average as we did in this manuscript. This could be done by imposing constraints on key performance indicators such as the out-of-service probability, as suggested in [60].
A key feature of our proposal is to enable the UE to dynamically select the most suitable CNN architecture to be used in every time slot, within a pool of possible architectures. Certainly, we might expand the pool by introducing other CNN architectures, with different number of nodes per layer, or different layers, or even alternative NN structures. Clearly, although we may want to expand the set of available architectures to choose from, with those capable to improve the accuracy performance, these new architectures may reasonably also require additional computational complexity and, possibly, larger power consumption at the mobile device. Then, an interesting research question is how to make a better trade-off between not only accuracy, energy consumption and service delay, but also complexity.
A further possibility would be to perform data compression to a goal-oriented latent variable with dynamically adjustable size, by exploiting a single classification network that could possibly dynamically reconfigure itself to different compression factors. The use of the variational IB principle [37,61] is a possible step toward this direction, which deserves to be further explored. Another possibility is to consider an opportunistic offloading to the ES, for those UEs that have enough computational capabilities to perform themselves the task, when for instance either the channel conditions are too bad, or the ES queues too long, to respect latency constraints.
Together with the extension to OFDM modulations for frequency-selective channels that we already mentioned, the proposed scenario could also be extended to a multi-user and/or multi-server scenario, where the UE and ES optimization problems may be strongly coupled.
Finally, the design of a proper online training procedure is another interesting research direction for the proposed framework.

Conclusions
In this paper, we have analyzed the trade-offs between energy, latency, and accuracy in an edge learning scenario equipped with goal-oriented communications, designing an adaptive classification network based on CEs and CCs. For such a goal-oriented communication system, we designed two resource optimization strategies, hinging on the Lyapunov stochastic optimization framework. The proposed strategies dynamically and jointly optimize the communication parameters (i.e., rates, compression factors) and the computation resources (i.e., CPU clock frequencies of UE and ES), with the aim of striking the best trade-off between energy, latency, and accuracy of the edge learning task. Even in the complex dynamic learning scenario considered in the paper, the proposed approaches require only low-complexity procedures at each time slot and enable online adaptation of the CE at the UE to dynamically control the goal-oriented communication mechanism. The presence of tunable parameters, which can be used to dynamically weight the different terms of the cost functions, makes the resource management very flexible. Finally, our experimental results have shown that using CEs to compress images at the UE leads to good performance at the ES, even with extreme compression factors, for a quite challenging classification task with 43 classes. Several simulations assess the good performance of the proposed strategies, illustrating the potential gain and adaptation capabilities.