Coupled-decompositions: exploiting primal–dual interactions in convex optimization problems

Morell, Antoni; Vicario, José López; Seco-Granados, Gonzalo

doi:10.1186/1687-6180-2013-41

Research
Open access
Published: 05 March 2013

Antoni Morell¹,
José López Vicario¹ &
Gonzalo Seco-Granados¹

EURASIP Journal on Advances in Signal Processing volume 2013, Article number: 41 (2013)

Abstract

Decomposition techniques implement the so-called “divide and conquer” strategy in convex optimization problems, with primal and dual decompositions being the two classical approaches. Although both approaches achieve the goal of splitting the original program into several smaller problems (called the subproblems), they generally converge slowly. This is a limiting factor in practice and, in order to circumvent this drawback, we develop in this article the coupled-decompositions method. As a result, the number of iterations can be reduced by more than one order of magnitude. Furthermore, the new technique is self-adjustable, i.e., it does not depend on user-defined parameters, as opposed to classical strategies. Given that in signal processing applied to communications and networking we usually deal with a variety of problems that exhibit certain coupling structures, our method is useful to design decentralized as well as centralized optimization schemes with advantages over the existing techniques in the literature. In this article, we present different resource allocation problems where the proposed method is successfully applied.

1 Introduction

Convex optimization theory [1, 2] has provided in the last decades a powerful framework to solve optimization problems in many distinct areas. Besides the numerous applications existing in the signal processing literature, it is also possible to find examples in topics such as filter design, machine learning, or finance, among others. This great success is explained by three facts: (i) convex optimization provides relevant insights into each specific problem, thanks to a mature theoretical framework, (ii) some problems can be solved analytically or semi-analytically by applying the so-called Karush–Kuhn–Tucker (KKT) optimality conditions, and (iii) efficient numerical methods, e.g., interior point methods, have been developed to solve generic convex problems in polynomial time.

In many engineering areas, optimization problems with a partially coupled structure arise. In particular, we consider programs where the objective can be expressed as a sum of functions that depend on disjoint sets of variables, which are additionally coupled by the problem constraints (e.g., [3–5]). The optimization of such programs is the topic addressed by decomposition methods [6] and a common strategy is to split the original problem into several smaller subproblems that are somehow coordinated until they reach the optimal solution. Additionally and as a by-product, the resulting methods can deal more naturally with decentralized implementations [7, 8].

However, existing decomposition methods exhibit some drawbacks in practice. Roughly speaking, the speed of convergence of the algorithms is in general slow (this can be appreciated, for instance, in the numerical examples of [6]) and furthermore, it is necessary to manually adjust the step-size used in the successive updates of the algorithms. Since there is no universal rule to do that optimally, the performance of the methods is compromised [9]. In order to overcome these drawbacks, we introduce a novel technique, the coupled-decompositions method (CDM). It can be applied to decentralized implementations and furthermore, due to its superior computational performance in terms of convergence speed, the new technique is also competitive when compared to well-established centralized methods.

In the following, we synthesize the main contributions of this article: (i) development of new interactions between the primal and dual domains in convex decomposition problems, (ii) development of a new method based on these novel interactions for problems with a single coupling constraint, (iii) convergence proof of the proposed method, (iv) further analysis of the method when it is applied to a subset of the problems of interest, and (v) presentation of numerical examples that show the benefits of having an unsupervised and efficient solution (in terms of both computational cost and convergence speed).

The remainder of the article is organized as follows. Section 2 formulates the type of problems that we deal with and it also reviews the classical decomposition techniques. Section 3 describes the proposed CDM and proves its convergence to the optimal solution whereas Section 4 provides further analysis on the proposed method when the problem is particularized. Finally, Section 5 presents numerical examples of the proposed method and Section 6 concludes the article.

2 Problem formulation and existing solutions

In this section, we first define the type of problems that we deal with throughout the text. Thereafter, the existing decomposition techniques in the literature are reviewed.

2.1 Problem formulation

Let us consider the following optimization problem,

\begin{array}{ll} \min_{\{x_{j}\}} & \sum_{j = 1}^{J} f_{j} (x_{j}) \\ \text{s.t.} & x_{j} \in X_{j}, \quad j = 1, \dots, J \\ & \sum_{j = 1}^{J} h_{j} (x_{j}) \leq C \end{array}

(1)

with variables $x_{j} \in R^{n_{j}}$. The functions $f_{j}, h_{j} : R^{n_{j}} \to R$ are assumed convex and differentiable in the sets $X_{j}$, which are themselves convex and compact. These sets are defined as $X_{j} = \{x_{j} \mid g_{j} (x_{j}) \preceq 0\}$ ^a with $g_{j} (x_{j}) = {[g_{j}^{1} (x_{j}), \dots, g_{j}^{G_{j}} (x_{j})]}^{T}$, where the functions $g_{j}^{k} : R^{n_{j}} \to R$ are convex and differentiable. Therefore, Equation (1) defines a convex problem and, if we further assume that its feasible region has non-empty relative interior, then strong duality holds.

Note that we may interpret (1) as the distribution of a quantity C of resources among J entities, where the jth entity aims to set the values of the variables in $x_{j}$ (constrained to lie in $X_{j}$) in order to minimize the global cost function $\sum_{j = 1}^{J} f_{j} (x_{j})$ without violating the coupling constraint $\sum_{j = 1}^{J} h_{j} (x_{j}) \leq C$. The presented formulation applies, among others, to fair dynamic bandwidth allocation (DBA) in point-to-multipoint networks [4], to problems related to multiple-input multiple-output design [3, 10, 11], and to problems related to OFDM system design [12, 13].

The problem in (1) is suitable for a dual decomposition approach and also for a primal decomposition if it is adequately reformulated. In the next sections, those classical solutions are reviewed.

2.2 Primal decomposition

Let us consider the following modified version of (1),

\begin{array}{ll} \min_{\{x_{j}\}, y} & \sum_{j = 1}^{J} f_{j} (x_{j}) \\ \text{s.t.} & x_{j} \in X_{j}, \quad j = 1, \dots, J \\ & h_{j} (x_{j}) \leq y_{j}, \quad j = 1, \dots, J \\ & \sum_{j = 1}^{J} y_{j} \leq C \\ & y \in Y, \quad Y = Y_{1} \times \dots \times Y_{J} \end{array}

(2)

where we have introduced the coupling variables $y = {[y_{1}, \dots, y_{J}]}^{T}$. The subsets $Y_{j} \subset R$ are defined as the images of $X_{j}$ through the functions $h_{j}$, i.e., $h_{j} : X_{j} \to Y_{j}$. Since the functions $h_{j}$ are convex over $X_{j}$, and hence continuous, the subsets $Y_{j}$ are guaranteed to be compact ([14], Th. 5.2.2). Therefore, each $Y_{j}$ has both a minimum and a maximum.

In primal decomposition, we assume that the coupling variables are fixed to a given value $y \in Y$ (more details can be found in [15], Sec. 6.4.2). Then, the problem in (2) is solved as J independent problems in the variables $x_{j}$. They are called the subproblems and they are expressed as

p_{j} (y_{j}) = \left\{ \begin{array}{ll} \min_{x_{j}} & f_{j} (x_{j}) \\ \text{s.t.} & h_{j} (x_{j}) \leq y_{j} \\ & x_{j} \in X_{j} \end{array} \right.

(3)

Interestingly, we know from ([15], Sec. 5.4.4) that $-λ_{j}$, i.e., minus the Lagrange multiplier associated with the constraint $h_{j} (x_{j}) \leq y_{j}$, is in fact a subgradient^b of $p_{j}$ at $y_{j}$.

Having defined the primal subproblems, we can rewrite (2) as

\begin{array}{ll} \min_{\{y_{j}\}} & \sum_{j = 1}^{J} p_{j} (y_{j}) \\ \text{s.t.} & \sum_{j = 1}^{J} y_{j} \leq C \\ & y \in Y \end{array}

(4)

and (4) is referred to as the primal master problem. Note that since the subgradients of the primal subproblems are obtained at no cost, we can use a projected gradient approach ([15], Sec. 2.3) to solve the problem. In other words, the following recursion (k indexes iterations)

y^{k + 1} = {[y^{k} - α^{k} s^{k}]}^{‡}

(5)

with $s^{k} = - {[λ_{1}^{*} (y_{1}^{k}), \dots, λ_{J}^{*} (y_{J}^{k})]}^{T}$ and where ${[\cdot]}^{‡}$ is the projection onto the feasible set (i.e., $\{y \mid y \in Y, \sum_{j = 1}^{J} y_{j} \leq C\}$) converges to $y^{*}$. The interested reader can find more details about primal decomposition in ([15], Sec. 6.4.2) and also in [6]. However, note that it is necessary to appropriately adjust the value of $α^{k}$ in order to guarantee the convergence to the desired solution [6, 9].

2.3 Dual decomposition

Dual decomposition is the dual-domain alternative to primal decomposition. Let us compute the partial Lagrangian of (1) by means of relaxing only the coupling constraint

q (μ) = \sum_{j = 1}^{J} \min_{x_{j} \in X_{j}} \{f_{j} (x_{j}) + μ h_{j} (x_{j})\} - μ C

(6)

Clearly, the problem in (6) decouples into J independent problems, called the dual subproblems and defined as

q_{j} (μ) = \left\{ \begin{array}{ll} \min_{x_{j}} & f_{j} (x_{j}) + μ h_{j} (x_{j}) \\ \text{s.t.} & x_{j} \in X_{j} \end{array} \right.

(7)

Note that the dual subproblems are convex programs for μ ≥ 0 and that, given a value of μ, the values of the variables in $x_{j}$ are found after solving the subproblems in (7) for j = 1,…,J, which can be computed in parallel. In particular, the optimal values of the primal variables, i.e., $\{x_{j}\}$, are obtained from an optimal value of the dual variable, i.e., $μ^{*}$.

Using the dual subproblems, the dual master problem is written as

\begin{matrix} max_{μ} & \sum_{j = 1}^{J} q_{j} (μ) - μC \\ s.t. & μ \geq 0 \end{matrix}

(8)

and, as in primal decomposition, a projected gradient approach can be applied ([15], Sec. 6.4.1) to finally get $μ^{*}$. The recursion is

μ^{k + 1} = {[μ^{k} + α^{k} s^{k}]}^{+}

(9)

where k indexes iterations and $[a]^{+} = \max \{0, a\}$. As in primal decomposition, it can be shown that a subgradient of $q_{j}$ at $μ^{k}$ is readily found as^c $h_{j} (x_{j}^{*} (μ^{k}))$ once the dual subproblems are solved ([15], Sec. 6.1) and therefore, a subgradient of q at $μ^{k}$ is given by $s^{k} = \sum_{j = 1}^{J} h_{j} (x_{j}^{*} (μ^{k})) - C$. Finally, note that a user-defined step-size is also necessary in dual decomposition and, as we discuss later, this is a serious drawback of classical decomposition methods in practice.
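The recursion in (9) can be sketched in a few lines for a toy instance. This is a minimal illustration under stated assumptions, not an example taken from the article: the cost $f_{j} (x_{j}) = - w_{j} \log x_{j}$, the linear resource function $h_{j} (x_{j}) = x_{j}$, the box $X_{j} = [lo_{j}, hi_{j}]$ and all function and parameter names are hypothetical choices that give the dual subproblem (7) a closed-form solution.

```python
def dual_subproblem(w, lo, hi, mu):
    """Minimise -w*log(x) + mu*x over lo <= x <= hi (toy instance of (7))."""
    if mu <= 0.0:
        return hi  # the cost is decreasing in x, so the upper bound is optimal
    return min(max(w / mu, lo), hi)  # stationary point w/mu clipped to the box


def dual_decomposition(w, lo, hi, C, alpha, mu0=0.0, iters=200):
    """Projected subgradient ascent on the dual master problem, as in (9)."""
    mu = mu0
    for _ in range(iters):
        x = [dual_subproblem(wj, lj, hj, mu) for wj, lj, hj in zip(w, lo, hi)]
        s = sum(x) - C                 # subgradient of q at mu, since h_j(x) = x
        mu = max(0.0, mu + alpha * s)  # [.]^+ keeps the price non-negative
    x = [dual_subproblem(wj, lj, hj, mu) for wj, lj, hj in zip(w, lo, hi)]
    return mu, x


# Assumed toy data: three participants sharing C = 6 units of resource.
mu, x = dual_decomposition([1.0, 2.0, 3.0], [0.1] * 3, [10.0] * 3, 6.0, alpha=0.05)
```

In this toy instance the update satisfies $μ^{k + 1} - 1 = (μ^{k} - 1) (1 - 6 α / μ^{k})$ while no bound of the box is active, so α = 0.05 contracts the error by roughly 0.7 per iteration near $μ^{*} = 1$, whereas any α above 1/3 makes that factor smaller than −1 there and the iterates fail to settle. This is the step-size sensitivity that the text refers to.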

2.4 Primal–dual techniques

There is a huge list of methods in the literature that are termed primal–dual but, to the best of our knowledge, the essentials in our proposed CDM have not been established previously. In general, all the reviewed methods suffer from (i) slow speed of convergence to the optimal solution (this restricts the number of practical applications), (ii) no consideration for the separated nature of the problem (i.e., the techniques are not decomposition-based approaches), and/or (iii) the decentralized implementation of the methods is not taken into account. On the contrary, all these aspects are addressed in the proposed CDM.

A first group of existing primal–dual techniques focuses on iteratively finding a saddle-point of the Lagrangian, which is a convex and concave function of the primal and dual variables, respectively. Although these methods were not originally conceived from a decomposition perspective, they can be applied to the problems of interest in this article (and also implemented in a decentralized manner). Among these techniques, we find the classical work of Arrow et al. [16] or the more recent Mean Value Cross (MVC) decompositions method [17, 18]. However, both techniques need to fix a step-size (explicit or implicit as in the MVC decompositions method) and, as a consequence, their convergence speed suffers in practice.

In a second group of techniques we include all the possible combinations of classical primal and dual decompositions, as described in [6]. The idea in this case is to solve some parts of the problem with a primal decomposition approach while other parts are tackled by means of a dual decomposition. Therefore, these solutions do not consider full primal–dual interactions as in the proposed CDM, where each part considers both domains simultaneously. Furthermore, they also suffer from slow convergence speeds due, in part, to the manually adjusted step-sizes. However, it is important to remark that in the last decade significant progress has been made in dual-decomposition-based solutions using smoothing [19] or path-following [20] strategies, improving the number of iterations of the classical dual decomposition by an order of magnitude. Nevertheless, these methods tackle problems with linear constraints and are not designed with a decentralized implementation in mind.

Finally, let us mention the primal–dual interior point methods ([2], Sec. 11.7) and their variants [21, 22] as the third group of primal–dual approaches. In this case, the basic idea is to iteratively solve the KKT conditions of the problem using numerical methods typically applied to the resolution of systems of nonlinear equations, such as the Newton method. These techniques have received great attention during the past years due to their good performance in terms of convergence speed when used in generic convex problems. However, since they were not conceived to exploit the separability of the problem (if it exists), it is not straightforward to derive decentralized solutions from this third group of techniques (one of the goals in this article).

3 The CDM

In order to overcome the detected drawbacks, we design our CDM with the aim to (i) exploit the primal and dual domains in convex optimization problems and (ii) simultaneously benefit from the separability of the problem in order to derive decentralized solutions. Although there are solutions in the literature that exploit both solution domains as discussed, the development of a fast technique satisfying (i) and (ii) is still pending. In the following, we first describe our proposed CDM and thereafter, we prove that the iterates of the method converge to the optimal solution.

3.1 Description of the method

The proposed method has four building blocks: the primal subproblems, the dual subproblems, the primal projection, and the dual projection. These blocks are connected as depicted in Figure 1 and in what follows, we describe the actions taken at each step of the method and we provide a summary of the technique in algorithmic form. Thereafter, the convergence of the successive updates of the CDM, i.e., $μ^{k}$, towards an optimal value of the dual variable, i.e., $μ^{*}$, is proved (and the same is valid for the remaining primal and dual variables).

3.1.1 Step 1: dual subproblems

From $μ^{k}$, the primal value $y_{j}^{k}$ is obtained after solving the following convex optimization problem $d_{j}$

d_{j} (μ^{k}) = \left\{ \begin{array}{ll} \min_{x_{j}, y_{j}^{k}} & f_{j} (x_{j}) + μ^{k} y_{j}^{k} \\ \text{s.t.} & h_{j} (x_{j}) \leq y_{j}^{k} \\ & x_{j} \in X_{j} \end{array} \right.

(10)

Note that $d_{j} (μ^{k})$ coincides with (7) if we substitute $y_{j}^{k}$ by $h_{j} (x_{j})$. Note also that $λ_{j}^{k}$, i.e., the dual variable associated with the constraint $h_{j} (x_{j}) \leq y_{j}^{k}$, always takes the value of $μ^{k}$. This can be checked using one of the KKT optimality conditions of the problem as follows

\frac{\partial L (x_{j}, y_{j}^{k}, λ_{j}^{k}, \dots)}{\partial y_{j}^{k}} = μ^{k} - λ_{j}^{k} = 0 \to λ_{j}^{k} = μ^{k}

(11)

where $L (x_{j}, y_{j}^{k}, λ_{j}^{k}, \dots)$ stands for the Lagrangian function of the problem. The interested reader can find more details on the Lagrangian function as well as on the KKT optimality conditions of convex problems in ([2], Sec. 5.1, Sec. 5.5).

3.1.2 Step 2: primal projection

In the second step of the method, the values $y_{j}^{k}$ from all the subproblems are grouped in $y^{k} = {[y_{1}^{k}, \dots, y_{J}^{k}]}^{T}$ and projected onto the subset $Y \cap \{y \mid \sum_{i} y_{i} = C\}$ if $μ^{k} > 0$ and onto the subset $Y \cap \{y \mid \sum_{i} y_{i} \leq C\}$ otherwise. Note that both projections force the values in ${\hat{y}}^{k}$ to be feasible and that the choice of the projection subset depending on μ is in accordance with the complementary slackness condition $μ (\sum_{j = 1}^{J} y_{j} - C) = 0$ of the problem.

In the most usual case, that is, for $μ^{k} > 0$, the following convex problem has to be solved

\begin{array}{ll} \min_{{\hat{y}}^{k}} & \| y^{k} - {\hat{y}}^{k} \|^{2} \\ \text{s.t.} & \sum_{j = 1}^{J} ŷ_{j}^{k} = C \\ & {\hat{y}}^{k} \in Y \end{array}

(12)

which can be done semi-analytically as discussed in “Proof of Proposition 2” in Appendix.
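The projection in (12) can be sketched as follows, assuming that every $Y_{j}$ is an interval $[lo_{j}, hi_{j}]$ and that $\sum_{j} lo_{j} \leq C \leq \sum_{j} hi_{j}$. The sketch relies on the standard result that the Euclidean projection onto a box intersected with a sum constraint subtracts a common shift from every entry and clips the result to the box; the derivation in the Appendix is not reproduced here, and the function name, the bisection search and the `equality` flag are illustrative assumptions.

```python
def project_primal(y, lo, hi, C, equality=True, iters=200):
    """Project y onto {lo <= y <= hi, sum(y) = C}, or sum(y) <= C if not equality."""
    def clipped(nu):
        # candidate projection: common shift nu, then clipping to the box
        return [min(max(yj - nu, lj), hj) for yj, lj, hj in zip(y, lo, hi)]

    if not equality and sum(clipped(0.0)) <= C:
        return clipped(0.0)  # the budget constraint is inactive
    # sum(clipped(nu)) is non-increasing in nu, so bisect on the shift
    a = min(yj - hj for yj, hj in zip(y, hi))  # every entry sits at its upper bound
    b = max(yj - lj for yj, lj in zip(y, lo))  # every entry sits at its lower bound
    for _ in range(iters):
        nu = 0.5 * (a + b)
        if sum(clipped(nu)) > C:
            a = nu  # still over budget: a larger shift is needed
        else:
            b = nu
    return clipped(0.5 * (a + b))


# Assumed toy data: requests [2, 4, 6] exceed the budget C = 6.
y_hat = project_primal([2.0, 4.0, 6.0], [0.1] * 3, [10.0] * 3, 6.0)
```

For these inputs the shift is 2.05, so the first entry is clipped to its lower bound 0.1 and the result is approximately [0.1, 1.95, 3.95]. The first entry then lies on the boundary of its interval, which is the situation that step 3 of the method treats separately.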

3.1.3 Step 3: primal subproblems

The j th primal subproblem is defined as

p_{j} (ŷ_{j}^{k}) = \left\{ \begin{array}{ll} \min_{x_{j}} & f_{j} (x_{j}) \\ \text{s.t.} & h_{j} (x_{j}) \leq ŷ_{j}^{k} \\ & x_{j} \in X_{j} \end{array} \right.

(13)

and it can be solved once $ŷ_{j}^{k}$ is available. In this case, we are interested in the optimal value of the Lagrange multiplier associated with $h_{j} (x_{j}) \leq ŷ_{j}^{k}$, that is, ${\hat{λ}}_{j}^{k}$. As discussed later in Section 3.4, step 4 of the method uses only the values of ${\hat{λ}}_{j}^{k}$ that result from $ŷ_{j}^{k} \notin bd Y_{j}$, where $bd A$ stands for the boundary of the subset $A$. The selected values are then grouped in the list $\{\breve{λ}_{j}^{k}\}$, which is the input of the dual projection. Note that if $\sum_{j = 1}^{J} ŷ_{j}^{k} = C$, then the list is guaranteed to be non-empty as shown in Proposition 3 (Section 3.4). Besides, it is important to solve the primal subproblem in (13) consistently with its dual version in (10). In other words, if $ŷ_{j}^{k}$ is fixed in the jth primal subproblem, then ${\hat{λ}}_{j}^{k}$ (not necessarily unique) is accepted as valid only if $d_{j} ({\hat{λ}}_{j}^{k})$ gives $y_{j}^{k} = ŷ_{j}^{k}$.

3.1.4 Step 4: dual projection

If $\sum_{j = 1}^{J} ŷ_{j}^{k} = C$, a new update of μ, i.e., $μ^{k + 1}$, is obtained as the solution of the following optimization problem

\begin{array}{ll} \min_{μ^{k + 1}} & \| μ^{k + 1} - μ^{k} \|^{2} \\ \text{s.t.} & μ^{k + 1} \in \{\breve{λ}_{j}^{k}\} \end{array}

(14)

In other words, $μ^{k + 1}$ takes the value in $\{\breve{λ}_{j}^{k}\}$ that is the closest to $μ^{k}$. As discussed in Section 3.4, this is equivalent to setting $μ^{k + 1} = \min \{\breve{λ}_{j}^{k}\}$ if $μ^{k} < μ^{*}$ and $μ^{k + 1} = \max \{\breve{λ}_{j}^{k}\}$ if $μ^{k} > μ^{*}$.

If $\sum_{j = 1}^{J} ŷ_{j}^{k} < C$, then $μ^{k + 1}$ is fixed to 0, which is in accordance with the complementary slackness condition $μ (\sum_{j = 1}^{J} y_{j} - C) = 0$.
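Step 4 reduces to a selection over a short list, which the following sketch makes explicit. It assumes that each $Y_{j}$ is an interval $[lo_{j}, hi_{j}]$, so that the boundary test becomes a strict inequality with a small numerical tolerance; the function name, the argument names and the tolerance `tol` are hypothetical.

```python
def dual_projection(mu, lam_hat, y_hat, lo, hi, C, tol=1e-9):
    """Return mu^{k+1} from the primal-subproblem multipliers, as in (14)."""
    if sum(y_hat) < C - tol:
        return 0.0  # slack budget: complementary slackness forces a zero price
    # keep only the multipliers whose y_hat_j lies strictly inside [lo_j, hi_j]
    candidates = [lam for lam, yj, lj, hj in zip(lam_hat, y_hat, lo, hi)
                  if lj + tol < yj < hj - tol]
    # closest value to the current price mu
    return min(candidates, key=lambda lam: abs(lam - mu))


# Assumed toy data: the first allocation sits on its lower bound and is ignored.
mu_next = dual_projection(0.5, [10.0, 2.0 / 1.95, 3.0 / 3.95],
                          [0.1, 1.95, 3.95], [0.1] * 3, [10.0] * 3, 6.0)
```

Here the candidates are 2/1.95 and 3/3.95, and the update selects 3/3.95 because it is the closest to the current price 0.5. The sketch assumes a non-empty candidate list, which Proposition 3 guarantees when the projected allocation uses the whole budget.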

3.2 The CDM in algorithmic form

Let us consider without loss of generality a decentralized implementation of the proposed method with a controller and J independent participants. Each participant is able to solve the corresponding primal and dual subproblems whereas the task of the controller is to compute the primal and dual projections. Note that both operations involve simple computations as discussed in the steps 2 and 4 above. The proposed CDM is then summarized in the following algorithm,

Choose an initial value $μ^{0}$ and repeat

1. The controller sends $μ^{k}$ to the participants, which compute $d_{j} (μ^{k})$ in (10) and return $y_{j}^{k}$.
2. With $y^{k} = [y_{1}^{k}, y_{2}^{k}, \dots, y_{J}^{k}]$, the controller computes ${\hat{y}}^{k}$ using the primal projection (step 2 above) and sends $ŷ_{j}^{k}$ to the participants if $ŷ_{j}^{k} \notin bd Y_{j}$.
3. The participants compute $p_{j} (ŷ_{j}^{k})$ in (13) and return ${\hat{λ}}_{j}^{k}$ to the controller.
4. The controller fixes $μ^{k + 1}$ to the received value that is closest to $μ^{k}$.

Until convergence.
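The four steps can be sketched end to end for a toy instance. This is an illustration under stated assumptions rather than the authors' implementation: the cost $f_{j} (x_{j}) = - w_{j} \log x_{j}$, the linear resource function $h_{j} (x_{j}) = x_{j}$ and the intervals $Y_{j} = X_{j} = [lo_{j}, hi_{j}]$ are hypothetical choices for which both subproblems have closed forms ($y_{j} = w_{j} / μ$ clipped to the interval in (10), and ${\hat{λ}}_{j} = w_{j} / ŷ_{j}$ for interior $ŷ_{j}$ in (13)). The controller and the participants are collapsed into a single loop, and the stopping rule on successive prices is also an assumption.

```python
def clip(v, lo, hi):
    return min(max(v, lo), hi)


def project(y, lo, hi, C, iters=200):
    """Projection onto {lo <= y <= hi, sum(y) = C} by bisection on a shift nu."""
    a = min(yj - hj for yj, hj in zip(y, hi))
    b = max(yj - lj for yj, lj in zip(y, lo))
    for _ in range(iters):
        nu = 0.5 * (a + b)
        total = sum(clip(yj - nu, lj, hj) for yj, lj, hj in zip(y, lo, hi))
        a, b = (nu, b) if total > C else (a, nu)
    nu = 0.5 * (a + b)
    return [clip(yj - nu, lj, hj) for yj, lj, hj in zip(y, lo, hi)]


def cdm(w, lo, hi, C, mu0=0.0, tol=1e-10, max_iter=500, eps=1e-9):
    """Toy CDM loop; returns (price, allocation, number of iterations)."""
    mu = mu0
    for k in range(max_iter):
        # Step 1: dual subproblems (10); at mu = 0 each participant asks for hi_j
        y = [hj if mu <= 0.0 else clip(wj / mu, lj, hj)
             for wj, lj, hj in zip(w, lo, hi)]
        if mu <= 0.0 and sum(y) <= C:
            return mu, y, k  # decoupled case: mu = 0 is already optimal
        # Step 2: primal projection (12)
        y_hat = project(y, lo, hi, C)
        # Step 3: primal subproblems (13), kept only for interior y_hat_j
        lam = [wj / yj for wj, yj, lj, hj in zip(w, y_hat, lo, hi)
               if lj + eps < yj < hj - eps]
        # Step 4: dual projection (14), the candidate closest to mu
        mu_next = min(lam, key=lambda v: abs(v - mu))
        if abs(mu_next - mu) <= tol:
            return mu_next, y_hat, k + 1
        mu = mu_next
    return mu, y_hat, max_iter


# Assumed toy data: the optimum is mu* = 1 with allocation [1, 2, 3].
mu, y_hat, n_iter = cdm([1.0, 2.0, 3.0], [0.1] * 3, [10.0] * 3, 6.0)
```

Starting from $μ^{0} = 0$, the first prices produced by this sketch are approximately 0.5, 0.759 and 0.905, which increase monotonically towards $μ^{*} = 1$ without any step-size being supplied.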

3.3 Resource–price interpretation

Often in convex optimization, primal variables are interpreted as resources and dual variables as prices to be paid for them. In the sequel, we revisit the proposed technique under this resource–price perspective. Initially, a global price μ ^k is fixed and sent to the parts. Given that price, the parts estimate the amount of resources they want to buy. Intuitively, there will be a deficit of resources (a total request over C) if the price is too low and an excess if it is too high. In both cases, the primal projection corrects the allocation in order to distribute all the available resources among the parts. However, there is no guarantee that the distribution follows a common market law. In order to correct the situation, the primal subproblems estimate the price to be paid for the new resource allocation and, in case the individual prices differ, the dual projection fixes a new common price μ ^{k + 1} in order to advance towards a consensus price μ ^∗.

3.4 Proof of the method

Before proving that the successive updates of the proposed method converge to the optimal solution, let us establish the relationship between primal and dual variables in the subproblems with the following proposition.

Proposition 1. Take the jth primal subproblem $p_{j}$ in (13) and the jth dual subproblem $d_{j}$ in (10) of the CDM. Then, the following two statements hold: (i) ${\hat{λ}}_{j}^{k} (ŷ_{j}^{k})$ is non-increasing in $ŷ_{j}^{k}$ in (13) and (ii) $y_{j}^{k} (μ^{k})$ is non-increasing in $μ^{k}$ in (10).

Proof. See “Proof of Proposition 1” in Appendix. □

Next, the goal is to verify that the primal and dual projections effectively coordinate the subproblems towards the optimal solution. Let us assume, without loss of generality, that the initial guess is $μ^{0} = 0$ so that $μ^{0} \leq μ^{*}$. From that value, the CDM starts by solving the dual subproblems in (10) in order to obtain $y^{0}$. As a result, there are two possibilities, namely, (i) $\sum_{j} y_{j}^{0} \leq C$ and (ii) $\sum_{j} y_{j}^{0} > C$. In the first situation, $μ^{0}$ as well as $y^{0}$ and the corresponding values in $\{x_{j}\}$ are optimal. Note that the subproblems are in this case decoupled and therefore the individual optimization carried out in the dual subproblems is globally optimal, too. For the sake of brevity, we do not discuss here the outputs of the following steps and iterations of the method, but it can be checked that the solution remains unaltered as expected. In the second case, $μ^{0} = 0$ is clearly non-optimal and in the sequel we show how the successive updates of $μ^{k}$ converge to an optimal value of the dual variable, that is, $μ^{*} > 0$.

Let us then revisit a complete iteration of the method starting at the dual subproblems in (10) with $μ^{k} < μ^{*}$, which holds at least for k = 0. Since $y_{j}^{k}$ is a non-increasing function of $μ^{k}$ in the jth dual subproblem (see Proposition 1), $μ^{k} < μ^{*}$ and $y_{j}^{k} (μ^{*}) = y_{j}^{*}$, it is true that $y_{j}^{k} \geq y_{j}^{*}$. Moreover, if we take into account that $\sum_{j = 1}^{J} y_{j}^{*} = C$ (we are considering the case where the optimal solution is coupled), we can establish that $\sum_{j = 1}^{J} y_{j}^{k} > C$ unless $y^{k} = y^{*}$.

Thereafter, it is verified in the second step of the method (primal projection) that ${\hat{y}}^{k} ≼ y^{k}$ ( $ŷ_{j}^{k} < y_{j}^{k}$ for some j) according to Proposition 2 next.

Proposition 2. Given the optimization problem in (12) and $y^{k} \succeq y^{*}$ ($y^{k} \neq y^{*}$), its optimal solution can be expressed as ${\hat{y}}^{k} = y^{k} - r$ with $r \succeq 0$ ($r_{j} > 0$ for some j).

Proof. See “Proof of Proposition 2” in Appendix. □

In the third step of the method, the jth primal subproblem defined in (13) computes the individual price ${\hat{λ}}_{j}^{k}$ and the list of individual prices $\{{\hat{λ}}_{j}^{k}\}$ is constructed with the values obtained from the J independent subproblems, indexed by j = 1,…,J. Note, however, that our main interest is not in the prices ${\hat{λ}}_{j}^{k}$ but in finding a global consensus price $μ^{*}$. Fortunately, if we come back to the problem definition in (2), we notice that there is a dependence between the dual variable associated with the constraint $h_{j} (x_{j}) \leq y_{j}$, i.e., $λ_{j}$, and the dual variable associated with the constraint $\sum_{j = 1}^{J} y_{j} \leq C$, i.e., μ (in terms of the proposed algorithm, $ŷ_{j}^{k}$, ${\hat{λ}}_{j}^{k}$ and $μ^{k}$ play the role of $y_{j}$, $λ_{j}$ and μ, respectively). This dependence motivates the selection of some of the values in the list $\{{\hat{λ}}_{j}^{k}\}$ in our algorithm. To be more specific, the value ${\hat{λ}}_{j}^{k}$ is chosen if the corresponding primal variable $ŷ_{j}^{k}$ satisfies $ŷ_{j}^{k} \notin bd Y_{j}$, as discussed next.

Let us first write the Lagrangian of the problem in (2), that is

\begin{array}{rl} L (\{x_{j}\}, y, λ, μ, \{ξ_{j}\}, \{ψ_{j}\}) = & \sum_{j = 1}^{J} f_{j} (x_{j}) + \sum_{j = 1}^{J} ξ_{j}^{T} g_{j} (x_{j}) + \sum_{j = 1}^{J} ψ_{j}^{T} q_{j} (y_{j}) \\ & + \sum_{j = 1}^{J} λ_{j} (h_{j} (x_{j}) - y_{j}) + μ (\sum_{j = 1}^{J} y_{j} - C) \end{array}

(15)

where the set of convex functions $q_{j} (y_{j})$ with associated Lagrange multipliers $ψ_{j}$ defines the subset $Y_{j}$. From the Lagrangian function we derive some of the KKT optimality conditions of the convex optimization problem, since the optimal values of the variables form a saddle-point of the Lagrangian. In particular, let us consider the following condition

\frac{\partial L}{\partial y_{j}} = μ - λ_{j} + ψ_{j}^{T} \frac{\partial}{\partial y_{j}} q_{j} (y_{j}) = 0

(16)

that reveals

μ = λ_{j} - {ψ_{j}}^{T} \frac{\partial}{\partial y_{j}} q_{j} (y_{j})

(17)

This equality is not very useful in general, nor from an algorithmic point of view, because the values of the multipliers in $ψ_{j}$ are unknown. However, we can make use of the complementary slackness conditions ([2], Sec. 5.5.2) of the problem, compactly written as $ψ_{j} \odot q_{j} (y_{j}) = 0$,^d and observe that if $y_{j} \notin bd Y_{j}$ then $q_{j} (y_{j}) \prec 0$ and consequently $ψ_{j} = 0$. In that case, the link between μ and $λ_{j}$ is clear,

μ = λ_{j} \quad \text{if } y_{j} \notin \text{bd } Y_{j}

(18)

Back to the algorithm, this result motivates the use of ${\hat{λ}}_{j}^{k}$ only if it is derived from $ŷ_{j}^{k} \notin bd Y_{j}$ and so a new list $\{\breve{λ}_{j}^{k}\}$ that contains all these suitable dual values is constructed. Besides, it is necessary to guarantee that the new list $\{\breve{λ}_{j}^{k}\}$ is non-empty or, equivalently, that after the primal projection at least one value in ${\hat{y}}^{k}$ satisfies $ŷ_{j}^{k} \notin bd Y_{j}$. This is the result of Proposition 3 next.

Proposition 3. Let ${\hat{y}}^{k} = y^{k} - r$ (${\hat{y}}^{k} \neq y^{*}$) be a primal point resulting from the primal projection of the CDM with the value of $r \succeq 0$ suitable to fulfill $\sum_{j = 1}^{J} ŷ_{j}^{k} = C$, ${\hat{y}}^{k} \in Y$. Then, at least one value in $\{ŷ_{j}^{k}\}$ verifies $ŷ_{j}^{k} \notin bd Y_{j}$ and also $ŷ_{j}^{k} > y_{j}^{*}$.

Proof. See “Proof of Proposition 3” in Appendix. □

Finally, we need to prove that the last step of the method, i.e., the dual projection in (14), is able to find an update of the global price μ from the list $\{\breve{λ}_{j}^{k}\}$ such that $μ^{k} \to μ^{*}$ as $k \to \infty$. Since we have assumed that primal and dual subproblems are reciprocal in the sense that they agree on the values of the dual variables ${\hat{λ}}_{j}^{k}$ and $λ_{j}^{k}$ when $y_{j}^{k} = ŷ_{j}^{k}$ (see Section 3.1, step 3), a consequence is that $\breve{λ}_{j}^{k} (y_{j}^{*})$ computed in (13) equals $λ_{j}^{*} = μ^{*}$, just as $y_{j}^{k} (μ^{*}) = y_{j}^{*}$ in (10). Note that we have intentionally written $\breve{λ}_{j}^{k}$ instead of ${\hat{λ}}_{j}^{k}$ because our focus is only on the primal subproblems with $ŷ_{j}^{k} \notin bd Y_{j}$, which ensures $λ_{j} = μ$ according to (18). Additionally, the following two claims can be made: (i) all the values in $\{\breve{λ}_{j}^{k}\}$ satisfy $\breve{λ}_{j}^{k} \geq μ^{k}$ and (ii) at least one value in the list verifies $\breve{λ}_{j}^{k} \leq μ^{*}$. The first statement uses Proposition 1 and in particular that ${\hat{λ}}_{j}^{k}$ (or, equivalently, $\breve{λ}_{j}^{k}$) is a non-increasing function of $ŷ_{j}^{k}$ in the primal subproblems. Recalling that $p_{j} (y_{j}^{k})$ in (13) would produce $λ_{j}^{k}$ as its Lagrange multiplier and that $λ_{j}^{k} = μ^{k}$ according to (11), it is true that $\breve{λ}_{j}^{k} \geq μ^{k}$ since $ŷ_{j}^{k} \leq y_{j}^{k}$ (as a result of the primal projection). The second statement is verified in a similar manner, taking into account that at least one value in $\{ŷ_{j}^{k}\}$ verifies $ŷ_{j}^{k} \notin bd Y_{j}$ and also $ŷ_{j}^{k} > y_{j}^{*}$ (see Proposition 3). Since $\breve{λ}_{j}^{k} (y_{j}^{*}) = λ_{j}^{*} = μ^{*}$ in the jth primal subproblem, Proposition 1 establishes that $\breve{λ}_{j}^{k} \leq μ^{*}$.

Figure 2a explains the effects of the three steps of the CDM graphically from the dual domain point of view. Each bar represents an entity (J in total) and a point in that bar indicates the value of the dual variable $λ_{j}^{k}$ or ${\hat{λ}}_{j}^{k}$. The higher the point, the higher the value. At the beginning of the kth iteration, the dual subproblems enforce $λ_{j}^{k} = μ^{k} \ \forall j$ and translate these dual values to the primal variables in $y^{k}$. Immediately after the primal projection, the corrected values in ${\hat{y}}^{k}$ are converted again to dual variables, i.e., $\{{\hat{λ}}_{j}^{k}\}$. In the figure, we appreciate the effect of the primal projection on the Lagrange multipliers of interest. In short, we notice that (i) all values increase and (ii) there is at least one value below $μ^{*}$.

The role of the dual projection in (14) is then to update to $μ^{k + 1}$ by selecting the closest value to $μ^{k}$ from the list $\{\breve{λ}_{j}^{k}\}$, that is, $μ^{k + 1} = \min \{\breve{λ}_{j}^{k}\}$ if $μ^{k} < μ^{*}$, as depicted in Figure 2b. Together with the previous results, i.e., $\breve{λ}_{j}^{k} \geq μ^{k}$ for all values and $\breve{λ}_{j}^{k} \leq μ^{*}$ for at least one value, the new update verifies $μ^{k + 1} \in [μ^{k}, μ^{*}]$ and thus our initial hypothesis ($μ^{k} < μ^{*}$) is also satisfied for the next iteration unless $μ^{k + 1} = μ^{*}$. Therefore, successive iterations confirm $μ^{k} \to μ^{*}$ as $k \to \infty$ and, accordingly, $ŷ_{j}^{k} \to y_{j}^{*} \ \forall j$. This concludes the proof of the proposed method.

4 Convergence rate analysis and stopping criterion

This section provides additional insights into the proposed CDM by means of the following particularization of (1),

\begin{array}{ll} \min_{\{x_{j}\}, y} & \sum_{j = 1}^{J} f_{j} (x_{j}) \\ \text{s.t.} & x_{j} \in X_{j}, \quad j = 1, \dots, J \\ & h_{j} (x_{j}) \leq y_{j}, \quad j = 1, \dots, J \\ & \sum_{j = 1}^{J} y_{j} \leq C \\ & y \in Y, \quad Y = Y_{1} \times \dots \times Y_{J} \end{array}

(19)

where the variables in $\{x_{j}\}$ as well as the subsets in $\{X_{j}\}$ are uni-dimensional. To be precise, the following convergence analysis does not cover every problem that can be formulated as in (19), but only those that exhibit the following dependence between the primal variable $y_{j}$ and the dual variable $λ_{j}$ in the subproblems of the CDM. This case is still of interest, since common problems in the literature exhibit that relationship (see some examples in Section 5),

y_{j} = a_{j} {(λ_{j})}^{- α} + b_{j}, \quad \text{for a certain value of } α > 0, \ a_{j} > 0 \text{ and } b_{j} \in R

(20)

In the general case, the relationship between y _jand λ _jcan be established again, thanks to the KKT optimality conditions of the problem. Therefore, let us construct the Lagrangian of (19), that is,

\begin{array}{rl} L (\{x_{j}\}, y, \{λ_{j}\}, μ, \{ξ_{j}\}) = & \sum_{j = 1}^{J} f_{j} (x_{j}) + μ \left( \sum_{j = 1}^{J} y_{j} - C \right) \\ & + \sum_{j = 1}^{J} λ_{j} (h_{j} (x_{j}) - y_{j}) \\ & + \sum_{j = 1}^{J} ξ_{j}^{T} g_{j} (x_{j}) + \sum_{j = 1}^{J} ψ_{j}^{T} q_{j} (y_{j}) \end{array}

(21)

and consider the following optimality condition,

\frac{\partial L}{\partial x_{j}} = {\dot{f}}_{j} (x_{j}) + λ_{j} {\dot{h}}_{j} (x_{j}) + ξ_{j}^{T} \frac{\partial}{\partial x_{j}} g_{j} (x_{j}) = 0

(22)

where $\dot{f}$ and $\dot{h}$ stand for the first derivatives of the functions f and h, respectively. Note that if $x_{j} \notin bd X_{j}$ then $ξ_{j} = 0$ due to complementary slackness and

λ_{j} = - \frac{{\dot{f}}_{j} (x_{j})}{{\dot{h}}_{j} (x_{j})}

(23)

Moreover, if the constraint $h_{j} (x_{j}) \leq y_{j}$ is satisfied with equality (the usual case as we consider coupled problems) then $x_{j} = h_{j}^{- 1} (y_{j})$ and the relationship between $λ_{j}$ and $y_{j}$ is established.

Finally, as we show in Section 5, (20) is found for common functions f and h appearing in usual problems. Furthermore, the convergence rate of the proposed method can be derived assuming (20) and a stopping criterion that enhances the performance of the CDM can be designed. These two issues are developed in the following subsections.
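As an illustrative check, consider $f_{j} (x_{j}) = - \log (1 + x_{j} / σ_{j}^{2})$ and $h_{j} (x_{j}) = x_{j}$, where $σ_{j}^{2} > 0$ is a constant assumed here for concreteness (this choice anticipates the water-filling problem of Section 5.2). Applying (23) with $x_{j} = y_{j}$ gives a relationship of the form (20):

```latex
λ_{j} = - \frac{\dot{f}_{j} (x_{j})}{\dot{h}_{j} (x_{j})} = \frac{1}{σ_{j}^{2} + x_{j}}
\quad \Longrightarrow \quad
y_{j} = x_{j} = (λ_{j})^{-1} - σ_{j}^{2},
\qquad \text{i.e., } α = 1, \; a_{j} = 1, \; b_{j} = - σ_{j}^{2}.
```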

4.1 Convergence rate analysis

In order to find the convergence rate of the proposed method, let us compare the value of $| {(μ^{k})}^{- α} - {(μ^{*})}^{- α} |$ in two successive iterations, i.e., k and k + 1. First, let us classify the optimal primal variables ${y_{j}^{*}}$ into three groups: $I^{*}$ includes the indexes j corresponding to the variables that satisfy $y_{j}^{*} = inf Y_{j}$, $S^{*}$ contains the indexes where $y_{j}^{*} = sup Y_{j}$ and finally, $A^{*}$ contains the remaining indexes, i.e., those associated with $y_{j}^{*} \notin bd Y_{j}$. Using (20) and recalling the optimality condition $λ_{j}^{*} = μ^{*}$ seen in (11), it is true that

y_{j}^{*} = \begin{cases} a_{j} {(μ^{*})}^{- α} + b_{j} & j \in A^{*} \\ m_{j} & j \in I^{*} \\ d_{j} & j \in S^{*} \end{cases}

(24)

where $m_{j} = inf Y_{j}$ and $d_{j} = sup Y_{j}$ . Assuming that $\sum_{j = 1}^{J} y_{j}^{*} = C$ is fulfilled, we get

{(μ^{*})}^{- α} = \frac{C - \sum_{j \in A^{*}} b_{j} - \sum_{j \in I^{*}} m_{j} - \sum_{j \in S^{*}} d_{j}}{\sum_{j \in A^{*}} a_{j}}

(25)

For any other value $μ^{k} \neq μ^{*}$ we define

y_{j}^{k} = \begin{cases} a_{j} {(μ^{k})}^{- α} + b_{j} & j \in A^{k} \\ m_{j} & j \in I^{k} \\ d_{j} & j \in S^{k} \end{cases}

(26)

where the subsets $A^{k}$, $I^{k}$, and $S^{k}$ are defined like $A^{*}$, $I^{*}$, and $S^{*}$ but refer to the indexes of the variables in ${y_{j}^{k}}$.

Let us assume $μ^{k} < μ^{*}$ and let us obtain ${y_{j}^{k}}$ from (26). Clearly, since ${(μ^{k})}^{- α} > {(μ^{*})}^{- α}$, it holds that $y_{j}^{k} \geq y_{j}^{*} \forall j$. As a result of the primal projection in (12), now with the objective modified by the diagonal weighting matrix $W = diag (1 / a_{1}, \dots, 1 / a_{J})$, i.e., $| | W^{1 / 2} (y^{k} - {\hat{y}}^{k}) | |^{2}$, the corrected ${\hat{y}}_{j}^{k}$ values can be expressed as

{\hat{y}}_{j}^{k} = \begin{cases} a_{j} {(μ^{k})}^{- α} + b_{j} - a_{j} K & j \in A^{k} \\ m_{j} & j \in I^{k} \\ d_{j} - a_{j} K & j \in S^{k} \end{cases}

(27)

for a value of K > 0 to be determined. The proof is very similar to the case W = I in “Proof of Proposition 1” in the Appendix and the convergence of the method is not affected. We use the weighted projection in this particularized version because it offers better performance. It was not used before because, in the general case, we had no means to find a better weighting matrix than the identity matrix.
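The shift by $a_{j} K$ in (27) can be checked with a short Lagrangian argument. The sketch below ignores the box constraints in Y (an assumption that only holds for the entries that are free to move) and uses ν as a hypothetical name for the multiplier of the sum constraint:

```latex
\min_{\hat{y}^{k}} \; \sum_{j} \frac{1}{a_{j}} (\hat{y}_{j}^{k} - y_{j}^{k})^{2}
\quad \text{s.t.} \quad \sum_{j} \hat{y}_{j}^{k} = C
\\
\frac{2}{a_{j}} (\hat{y}_{j}^{k} - y_{j}^{k}) + ν = 0
\quad \Longrightarrow \quad
\hat{y}_{j}^{k} = y_{j}^{k} - a_{j} K, \qquad K = ν / 2 .
```

Each free entry is therefore shifted in proportion to $a_{j}$, which is why a single value of K is shared by all the indexes in (28).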

At the third step of the method, i.e., the dual subproblems, the reduced list ${{\overset{̆}{λ}}_{j}^{k}}$ is obtained from the values $ŷ_{j}^{k}$ in (27) with $j \in A^{k} \cup S^{k}$ . In other words, reversing (20) we find

{({\overset{̆}{λ}}_{j}^{k})}^{- α} = \frac{{\hat{y}}_{j}^{k} - b_{j}}{a_{j}} = \begin{cases} {(μ^{k})}^{- α} - K, & \forall j \in A^{k} \\ \frac{d_{j} - b_{j}}{a_{j}} - K, & \forall j \in S^{k} \end{cases}

(28)

Finally, in the dual projection we select the minimum value in ${{\overset{̆}{λ}}_{j}^{k}}$, which is in this case the closest to $μ^{k}$ given $μ^{k} < μ^{*}$ because $μ^{k + 1} \in [μ^{k}, μ^{*}]$ (see Section 3.3),

μ^{k + 1} = min {{\overset{̆}{λ}}_{j}^{k}}

(29)

or equivalently,

{(μ^{k + 1})}^{- α} = \max_{j} {({\overset{̆}{λ}}_{j}^{k})}^{- α} = {(μ^{k})}^{- α} - K

(30)

since $\frac{d_{j} - b_{j}}{a_{j}}$ is always lower than ${(μ^{k})}^{- α}$ for $j \in S^{k}$; otherwise, the index j would belong to $A^{k}$. Note in (27) that the definition of the subsets $A^{k}$ and $S^{k}$ implies $d_{j} < a_{j} {(μ^{k})}^{- α} + b_{j}$ for $j \in S^{k}$.

Using the previous results, we can state that

| {(μ^{k + 1})}^{- α} - {(μ^{*})}^{- α} | = | {(μ^{k})}^{- α} - {(μ^{*})}^{- α} - K |

(31)

This can be further refined if K is developed using (27) and $\sum_{j = 1}^{J} ŷ_{j}^{k} = C$ ,

\begin{array}{rl} K = & \frac{{(μ^{k})}^{- α} \sum_{j \in A^{k}} a_{j} + \sum_{j \in A^{k}} b_{j} + \sum_{j \in I^{k}} m_{j} + \sum_{j \in S^{k}} d_{j} - C}{\sum_{j \in A^{k} \cup S^{k}} a_{j}} \\ = & \frac{\sum_{j \in A^{k}} a_{j}}{\sum_{j \in A^{k} \cup S^{k}} a_{j}} {(μ^{k})}^{- α} - \frac{\sum_{j \in A^{k}} a_{j}}{\sum_{j \in A^{k} \cup S^{k}} a_{j}} \left[ \frac{C - \sum_{j \in A^{k}} b_{j} - \sum_{j \in I^{k}} m_{j} - \sum_{j \in S^{k}} d_{j}}{\sum_{j \in A^{k}} a_{j}} \right] \end{array}

(32)

In particular, note that the expression within brackets in (32) is exactly ${(μ^{*})}^{- α}$, as given by (25), when the subsets $A^{k}$, $I^{k}$ and $S^{k}$ coincide with the optimal ones. We say that the algorithm is in the optimal zone when the sets $(A^{k}, I^{k}, S^{k})$ coincide with $(A^{*}, I^{*}, S^{*})$.

Finally, we can conclude that the speed of convergence within the optimal zone obeys the following rule, which is obtained by plugging (32) into (31),

| {(μ^{k + 1})}^{- α} - {(μ^{*})}^{- α} | = | {(μ^{k})}^{- α} - {(μ^{*})}^{- α} | \left( 1 - \frac{\sum_{j \in A^{*}} a_{j}}{\sum_{j \in A^{*} \cup S^{*}} a_{j}} \right)

(33)

In other words, ${(μ^{k})}^{- α}$ converges linearly to ${(μ^{*})}^{- α}$ except when $S^{*} = \emptyset$, in which case the factor in (33) vanishes and the convergence is superlinear. Alternatively, if the initial hypothesis is $μ^{0} > μ^{*}$, the convergence is also linear except for $I^{*} = \emptyset$, in which case it is superlinear. Note in both cases that since (1) and (19) are assumed coupled problems, $A^{*} \neq \emptyset$.
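The rule in (33) can be checked numerically. The following is a minimal sketch, not the authors' implementation: it iterates (26)-(30) in terms of $t = μ^{- α}$ for the case $μ^{k} < μ^{*}$, computes K from the first line of (32), and assumes that the shift in (27) keeps every corrected value above its lower bound. The function names and the data (four entries, one of them saturated at $d_{4}$) are hypothetical.

```python
import numpy as np

def cdm_step(t, a, b, m, d, C):
    """One CDM iteration for the model (20), written in t = mu**(-alpha).

    Follows (26)-(30) for mu^k < mu^*. Assumes the shift in (27) keeps
    every corrected value above its lower bound m_j.
    """
    y = np.clip(a * t + b, m, d)              # dual subproblems, (26)
    free = y > m                              # indexes in A^k or S^k
    K = (y.sum() - C) / a[free].sum()         # first line of (32)
    y_hat = np.where(free, y - a * K, m)      # weighted primal projection, (27)
    if np.any(y_hat < m):
        raise ValueError("projection left the box; (27) does not apply")
    lam_inv = (y_hat[free] - b[free]) / a[free]   # (28)
    return lam_inv.max()                      # dual projection, (30)

def contraction_factor(a, in_A, in_S):
    """Factor that multiplies the error in (33)."""
    return 1.0 - a[in_A].sum() / a[in_A | in_S].sum()

# Hypothetical data: entries 1-3 are unsaturated, entry 4 saturates at d_4.
a = np.array([1.0, 2.0, 3.0, 1.0])
b = np.zeros(4)
m = np.zeros(4)
d = np.array([10.0, 10.0, 10.0, 0.5])
C = 6.5
t_star = (C - d[3]) / a[:3].sum()             # (25), equals 1.0 for this data

t = 1.25                                      # mu^0 = 0.8 < mu^* = 1 (alpha = 1)
errors = [abs(t - t_star)]
for _ in range(4):
    t = cdm_step(t, a, b, m, d, C)
    errors.append(abs(t - t_star))
ratios = [e1 / e0 for e0, e1 in zip(errors, errors[1:])]
```

For this data the ratio between consecutive errors should match `contraction_factor` evaluated on the optimal sets, since every iterate stays in the optimal zone.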

4.2 Stopping criterion

The previous convergence rule in (33) can be used to define a stopping criterion for the CDM. It is based on the particular evolution followed by μ ^k inside the optimal zone. For that purpose, let us take three consecutive values of μ, i.e., μ ^k, μ ^{k + 1}, and μ ^{k + 2}, all of them in the optimal zone. The successive application of (33) leads to

| {(μ^{k + l})}^{- α} - {(μ^{*})}^{- α} | = | {(μ^{k})}^{- α} - {(μ^{*})}^{- α} | {\left( 1 - \frac{\sum_{j \in A^{*}} a_{j}}{\sum_{j \in A^{*} \cup S^{*}} a_{j}} \right)}^{l}, \quad l = 0, 1, 2

(34)

From (34) it is verified that

\frac{{(μ^{k + 2})}^{- α} - {(μ^{k + 1})}^{- α}}{{(μ^{k + 1})}^{- α} - {(μ^{k})}^{- α}} = 1 - \frac{\sum_{j \in A^{*}} a_{j}}{\sum_{j \in A^{*} \cup S^{*}} a_{j}}

(35)

and therefore, in the optimal zone, the left side of (35) is a constant number regardless of k. From the practical point of view and thanks to this result, we can monitor the evolution of

SC^{k} = \frac{{(μ^{k + 2})}^{- α} - {(μ^{k + 1})}^{- α}}{{(μ^{k + 1})}^{- α} - {(μ^{k})}^{- α}}, \forall k

(36)

and stop the iterations when $SC^{k}$ stabilizes to a constant value. Afterwards, the optimal solution is readily obtained since at that point we know which allocations saturate to either $m_{j}$ or $d_{j}$ and the exact value of $μ^{*}$ can be computed by means of (25).
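A minimal sketch of this stopping rule is given next. It is an illustration under stated assumptions rather than the authors' code: `step` is a hypothetical callable that performs one full CDM iteration on $t = μ^{- α}$, the relative tolerance `tol` is a free choice, and the sets are identified from the unclipped value of (20) at the last iterate. The rule divides by the difference of consecutive iterates, so it assumes that the method has not already converged exactly.

```python
import numpy as np

def stop_by_sc(step, t0, tol=1e-2, max_iter=100):
    """Iterate t <- step(t) until SC^k in (36) stabilizes (t = mu**(-alpha))."""
    ts = [t0, step(t0)]
    ts.append(step(ts[-1]))
    sc_prev = (ts[2] - ts[1]) / (ts[1] - ts[0])
    for _ in range(max_iter):
        ts.append(step(ts[-1]))
        sc = (ts[-1] - ts[-2]) / (ts[-2] - ts[-3])
        if abs(sc - sc_prev) <= tol * abs(sc):   # SC has become constant
            break
        sc_prev = sc
    return ts

def exact_solution(t, a, b, m, d, C):
    """Classify the entries at the current iterate and apply (25)."""
    raw = a * t + b                              # unclipped value of (20)
    in_I, in_S = raw <= m, raw >= d
    in_A = ~(in_I | in_S)
    t_star = (C - b[in_A].sum() - m[in_I].sum() - d[in_S].sum()) / a[in_A].sum()
    return t_star, np.clip(a * t_star + b, m, d)

# Hypothetical data; the lambda stands in for CDM iterates that obey (33)
# with factor 1/7 and limit t^* = 1.
a = np.array([1.0, 2.0, 3.0, 1.0])
b = np.zeros(4)
m = np.zeros(4)
d = np.array([10.0, 10.0, 10.0, 0.5])
C = 6.5
iterates = stop_by_sc(lambda t: 1.0 + (t - 1.0) / 7.0, 1.25)
t_star, y_star = exact_solution(iterates[-1], a, b, m, d, C)
```

In this example the rule stops after four iterates, although the last one is still about $7 \cdot 10^{-4}$ away from the limit, and (25) then returns the exact value.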

4.3 Graphical comparison among decomposition techniques

In the sequel, we include a graphical comparison among decomposition techniques and the goal is to highlight the manner in which the different methods operate in essence. We do this with the support of the following toy optimization problem

\begin{array}{lll} \min_{x_{1}, x_{2}, y_{1}, y_{2}} & a_{1} {(x_{1} - c_{1})}^{2} + a_{2} {(x_{2} - c_{2})}^{2} & \\ \text{s.t.} & x_{i} \leq y_{i}, & i = 1, 2 \\ & y_{1} + y_{2} \leq C & \\ & 0 \leq x_{i} \leq x_{i}^{max}, & i = 1, 2 \\ & 0 \leq y_{i} \leq x_{i}^{max}, & i = 1, 2 \end{array}

(37)

where we have included the variables $y_{i}$ to match the formulation of the proposed CDM and of a primal decomposition as well. In Figure 3, we compare our proposed method to the classical decomposition techniques. In all the cases, the feasibility region of the problem in terms of the variables $y_{1}, y_{2}$ is marked in grey. Also, the contour lines of the objective function (centered at $c = {[c_{1}, c_{2}]}^{T}$) are represented in the plots (even though the objective function depends on $x_{1}, x_{2}$ instead of $y_{1}, y_{2}$).

As depicted in the figure, a primal decomposition approach updates $y^{k}$ by adding the subgradient to the point and, if the result is not feasible, a projection corrects the situation by finding the closest point in the feasible set. In the figure, note that arrows represent subgradients and dashed lines projections. In this way, the successive projections tend to the optimal solution, i.e., $y^{*}$. Next, let us consider a dual decomposition approach. In order to analyze it from the perspective of the primal variables, we need to establish first the relationship between $y_{j}$ and $λ_{j}$ and also between $λ_{j}$ and μ. For that purpose, we consider again the Lagrangian of the problem, that is

\begin{array}{rl} L (x, y, λ, μ, ξ_{1}, ξ_{2}, ψ_{1}, ψ_{2}) = & a_{1} {(x_{1} - c_{1})}^{2} + a_{2} {(x_{2} - c_{2})}^{2} + λ^{T} (x - y) \\ & + μ (y_{1} + y_{2} - C) \\ & - \sum_{i = 1}^{2} ξ_{1, i} x_{i} + \sum_{i = 1}^{2} ξ_{2, i} (x_{i} - x_{i}^{max}) \\ & - \sum_{i = 1}^{2} ψ_{1, i} y_{i} + \sum_{i = 1}^{2} ψ_{2, i} (y_{i} - y_{i}^{max}) \end{array}

(38)

For the case $x_{i} \in (0, x_{i}^{max}), i = 1, 2$ and $y_{i} = x_{i}$, all the dual variables in $ψ_{1}, ψ_{2}$ and $ξ_{1}, ξ_{2}$ are equal to zero due to complementary slackness (note that $y_{i}^{max} = x_{i}^{max}, i = 1, 2$). Under these conditions, the KKT optimality condition $\partial L / \partial x_{i} = 0$ forces

x_{i} = c_{i} - \frac{λ_{i}}{2 a_{i}}

(39)

which follows from $\partial L / \partial x_{i} = 2 a_{i} (x_{i} - c_{i}) + λ_{i}$. Furthermore, if we take into account that $y_{i} = x_{i}$ in the case of interest and $λ_{i} = μ$ due to the KKT optimality condition $\partial L / \partial y_{i} = 0$, then we verify that

y_{i} (μ) = c_{i} - \frac{μ}{2 a_{i}}

(40)

The dashed line in Figure 3 shows all the points that can be obtained by changing the value of μ in (40). Note in particular that for μ = 0 the optimum of the unconstrained version of (37) is achieved. Using the dual decomposition technique (in the figure we start with $μ^{0} = 0$), the successive updates move along this line (with the direction and magnitude defined by the subgradient) until the optimal solution is achieved.

In the two classical decomposition approaches, primal and dual decomposition, a good choice of the step-size that scales the subgradient plays a central role. It is recommended to choose a value that diminishes with the iteration number in order to prevent the successive updates from moving indefinitely around the optimal solution without reaching it. This issue is in fact an important practical impairment of both solutions. In contrast, the method we propose in this article avoids the use of a user-defined step-size. As the reader can appreciate in the figure, once the initial guess $y^{0}$ derived from $μ^{0} = 0$ is projected onto the feasible subset, the proposed method finds several candidates to update $μ^{0}$ to $μ^{1}$ (two in this case, i.e., ${\overset{̆}{λ}}_{1}^{1}$ and ${\overset{̆}{λ}}_{2}^{1}$). These two candidates provide the possible updates $y^{1} ({\overset{̆}{λ}}_{1}^{1})$ and $y^{1} ({\overset{̆}{λ}}_{2}^{1})$ and the method always chooses the dual candidate that provides the smallest possible update. This operation guarantees that the primal update, i.e., $y^{1}$ in this case, remains in the same half-space (with respect to the frontier $y_{1} + y_{2} = C$). This simple example highlights the distinguishing feature of the proposed method in comparison with the other techniques, that is, the step is automatically controlled.

5 Applications and numerical results

In the sequel, we present three different applications of the proposed method. The first two are related to power allocation problems and the third one deals with DBA in satellite networks. In the first problem, a decentralized solution is required to reduce the amount of signaling information. In the second one, a centralized implementation of the CDM is used to solve a time-varying water-filling problem and the aim is to show the benefits of having an unsupervised method under such changing conditions. Finally, the third example shows the advantages of the proposed technique when a small allocation time is required in order to accommodate a large number of users.

5.1 Decentralized power allocation for cognitive radios

Let us consider a communication device that is able to establish simultaneous communication links by joining several networks or using multiple channels within the same system (e.g., this is possible in IEEE 802.11n). To do so, the device integrates multiple radio transceivers [23, 24] which, in turn, operate over multiple subchannels or subcarriers in order to combat the multipath fading (see Figure 4). We assume that the device can sense the wireless channel and determine the unused subcarriers in each subsystem, as is usual in cognitive scenarios. Furthermore, each transceiver is able to optimally allocate the available power among its subcarriers using the water-filling solution. This is advantageous from the system design point of view because we can employ off-the-shelf radio transceivers and simply balance the device power among them. Finally, there is a central controller that performs the distribution task, with the global objective of maximizing the total sum rate capacity. Note that, depending on the signal strength and capacity in each subsystem, some of the transceivers may remain temporarily idle.

The problem can be formulated for M radio transceivers as

\begin{array}{lll} \max_{\{P_{i}\}} & \sum_{i = 1}^{M} r_{i} (P_{i}) & \\ \text{s.t.} & \sum_{i = 1}^{M} P_{i} \leq P_{T} & \\ & P_{i} \geq 0, & i = 1, \dots, M \end{array}

(41)

where r _i(P _i) is the transmission rate of the i th transceiver when power P _i is allocated to it and P _T is the total available power. Each of the transmission rates is actually the result of another optimization problem, that is,

r_{i} (P_{i}) = \left\{ \begin{array}{ll} \max_{\{p_{j}^{i}\}} & \sum_{j = 1}^{N_{s}^{i}} BW_{i} \log \left( 1 + \frac{p_{j}^{i} H_{j}^{i}}{N_{0} BW_{i}} \right) \\ \text{s.t.} & p_{j}^{i} \geq 0 \\ & \sum_{j = 1}^{N_{s}^{i}} p_{j}^{i} \leq P_{i} \end{array} \right.

(42)

where $N_{s}^{i}$ is the number of subcarriers of the i th radio, $BW_{i}$ stands for the subcarrier bandwidth, $N_{0}$ is the noise power spectral density, and $H_{j}^{i}$ is the channel gain at the j th subcarrier of the i th transceiver.

Note that given the separability of the problem, i.e., there are many independent transceivers coupled by a total power constraint, a decentralized optimization method is adequate both from a mathematical and a practical point of view. In this approach, the controller decides the total power per transceiver and each subsystem computes its own optimal allocation. The application of the CDM to solve (41) is briefly detailed next.

Given μ ^k< μ ^∗, each transceiver computes the following dual subproblem,

\begin{array}{ll} \max_{\{p_{i, j}\}, P_{i}^{k}} & \sum_{j = 1}^{N_{s}^{i}} BW_{i} \log \left( 1 + \frac{p_{i, j} H_{i, j}}{N_{0} BW_{i}} \right) - μ^{k} P_{i}^{k} \\ \text{s.t.} & \sum_{j = 1}^{N_{s}^{i}} p_{i, j} \leq P_{i}^{k} \\ & p_{i, j} \geq 0 \end{array}

(43)

and the application of the KKT conditions gives the solution

\begin{matrix} p_{i, j} = B W_{i} {(\frac{1}{μ^{k}} - \frac{N_{0}}{H_{i, j}})}^{+} \\ P_{i}^{k} = \sum_{j = 1}^{N_{s}^{i}} p_{i, j} \end{matrix}

(44)

As a result of the dual subproblems we obtain $P^{k} = {[P_{1}^{k}, \dots, P_{M}^{k}]}^{T}$ and we use it as the input for the primal projection, that is,

\begin{array}{ll} \min_{\{{\hat{P}}_{i}^{k}\}} & | | {\hat{P}}^{k} - P^{k} | |^{2} \\ \text{s.t.} & \sum_{i = 1}^{M} {\hat{P}}_{i}^{k} = P_{T} \\ & {\hat{P}}_{i}^{k} \geq 0 \end{array}

(45)

where ${\hat{P}}^{k} = {[{\hat{P}}_{1}^{k}, \dots, {\hat{P}}_{M}^{k}]}^{T}$ .

The corrected values ${\hat{P}}_{i}^{k}$ are then used in the primal subproblems defined in (42) and each transceiver computes the optimal allocation on its own. As a result of the primal subproblems, the Lagrange multipliers associated with the constraints $\sum_{j = 1}^{N_{s}^{i}} p_{i, j} \leq {\hat{P}}_{i}^{k}$ at the k th iteration, i.e., ${\hat{λ}}_{i}^{k}$, are obtained. Discarding the values that result from ${\hat{P}}_{i}^{k} = 0$, we obtain the reduced list ${{\overset{̆}{λ}}_{i}^{k}}$ and finally, the dual projection updates $μ^{k}$ from ${{\overset{̆}{λ}}_{i}^{k}}$ by doing

μ^{k + 1} = min {{\overset{̆}{λ}}_{i}^{k}}

(46)
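The four steps (43)-(46) can be put together in a short sketch. It is only an illustration under stated assumptions: the logarithm is natural, the initial price satisfies $μ^{0} < μ^{*}$, the projection (45) is computed with the standard sort-based simplex projection, and each primal subproblem (42) is solved with a sort-based water-filling routine. All function names and the small tolerance used to discard idle transceivers are hypothetical.

```python
import numpy as np

def dual_subproblem(mu, H, bw, n0):
    """Total power requested by one transceiver at price mu, as in (44)."""
    return float(np.sum(bw * np.maximum(1.0 / mu - n0 / H, 0.0)))

def project_to_budget(P, P_T):
    """Euclidean projection onto {P >= 0, sum(P) = P_T}, i.e., problem (45)."""
    u = np.sort(P)[::-1]
    shift = (np.cumsum(u) - P_T) / np.arange(1, P.size + 1)
    tau = shift[np.nonzero(u - shift > 0)[0][-1]]
    return np.maximum(P - tau, 0.0)

def primal_multiplier(P_hat, H, bw, n0):
    """Multiplier of the power constraint in (42) for a budget P_hat > 0."""
    g = np.sort(n0 / H)                          # inverse gains, best first
    level = (P_hat / bw + np.cumsum(g)) / np.arange(1, g.size + 1)
    return 1.0 / level[np.nonzero(level > g)[0][-1]]   # 1 / water level

def cdm_power_allocation(H_list, bw_list, n0, P_T, mu0, iters=60):
    """Repeat dual subproblems, primal projection, primal subproblems, (46)."""
    mu = mu0
    for _ in range(iters):
        P = np.array([dual_subproblem(mu, H, bw, n0)
                      for H, bw in zip(H_list, bw_list)])
        P_hat = project_to_budget(P, P_T)
        # Reduced list: transceivers left without power are discarded.
        lam = [primal_multiplier(p, H, bw, n0)
               for p, H, bw in zip(P_hat, H_list, bw_list)
               if p > 1e-12 * P_T]
        mu = min(lam)                            # dual projection, (46)
    return mu, P_hat

# Hypothetical example: two transceivers with 2 and 1 subcarriers.
H_list = [np.array([1.0, 1.0 / 3.0]), np.array([0.5])]
mu_opt, P_opt = cdm_power_allocation(H_list, [1.0, 1.0], 1.0, 6.0, mu0=0.1)
```

In this example the water level is 4, so the sketch should return a price close to 0.25 and the per-transceiver powers 4 and 2. Only one scalar per transceiver travels in each direction per iteration, which is the signaling advantage discussed in the text.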

A completely different approach is to gather all the information at the controller and to compute there the optimal power allocation. Afterwards, the result is sent back to the transceivers. Note that this centralized solution has an important drawback in terms of signaling because the powers in all the subcarriers and all the transceivers need to be exchanged. On the contrary, decentralized solutions benefit from transceiver-level signaling. In the numerical results below, we compare the CDM to other approaches.

5.1.1 Numerical results

We consider a device with three different OFDM transmitters. The first transmitter employs 256 subcarriers spanning a total bandwidth of 1.536 MHz (6 kHz per subchannel), the second one has 256 subcarriers as well and 3.072 MHz of bandwidth (12 kHz per subchannel) and the third one manages 128 subcarriers in 1.28 MHz of bandwidth (10 kHz per subchannel), so that a total of 640 subcarriers and 5.888 MHz have to be controlled (see Table 1). We assume frequency selective Rayleigh-fading channels in all three systems with a channel length of 20 taps and an exponential power delay profile where the delay spread is 1 ms. Mean channel gain is 0 dB in system 1, −10 dB in system 2, and −5 dB in system 3. Moreover, we assume that the noise power spectral density is flat over frequency with $N_{0} = σ_{n}^{2} / BW_{1}$, where $BW_{1}$ is the subcarrier bandwidth in system 1. Initially, we set up a uniform power allocation in all the methods and the total available power is always $P_{T} (dB) = σ_{n}^{2} (dB) + 10 \log_{10} (640) + 5$.

Table 1 Description of the subsystems


Figure 5 shows a multi-system water-filling allocation example. The plot at the top depicts one channel realization for the three systems whereas the plot at the bottom shows the optimal power allocation. As expected, most of the power is allocated to transceivers 1 and 3, which are the ones that have the best channel condition. On the contrary, subsystem 2 only allocates power to a few subcarriers that have the highest channel gains. Notwithstanding, in absolute terms, transceiver 2 receives quite a large allocation in order to exploit the higher subcarrier bandwidth.

Figure 6 shows the evolution of the Normalized Mean Squared Error (NMSE) in the power allocation with respect to the number of messages exchanged between the transmission subsystems and the central controller. The optimal power allocation is computed using the bisection method (relative error below 10⁻⁵). We compare the proposed CDM to the classical primal and dual decomposition techniques, the classical primal–dual algorithm of Arrow et al. [16] and also to a centralized approach. The classical decomposition techniques use $α^{k} = 1 / \sqrt{k}$ as step-size. The Arrow–Hurwicz method initializes the dual variable μ to 0 and the primal variables with a uniform power allocation, and its step-size is fixed to 0.1. On the one hand, results show that the proposed CDM is the best option, whereas the remaining alternatives require at least twice the amount of signaling in order to achieve the same allocation error. On the other hand, note that the classical decomposition techniques as well as the primal–dual approach are penalized in terms of convergence speed even though we have manually adjusted the step-size of each method in order to achieve the best possible result. Finally, note also that a centralized approach is not efficient at all because the allocation error becomes small enough only when the entire allocation has been transmitted. This requires 640 messages in our case to send the channel gains to the controller and 640 more messages to return the optimal power allocation values to the radios.

5.2 Power allocation in a conventional OFDM transmission

In the following, we apply our method to a classical water-filling problem where a decentralized solution is not necessary. On this occasion, we are interested in the adaptability of the method in time-varying scenarios.

Let us consider the well-known single-user water-filling solution over parallel Gaussian channels ([25], Sec. 10.4), which provides the optimal power allocation to the subcarriers of an OFDM-based system in order to maximize the mutual information given a total power constraint. Mathematically,

\begin{array}{ll} \max_{\{p_{i}\}} & \sum_{i = 1}^{N_{s}} \log \left( 1 + \frac{p_{i}}{σ_{n_{i}}^{2}} \right) \\ \text{s.t.} & p_{i} \geq 0 \\ & \sum_{i = 1}^{N_{s}} p_{i} \leq P \end{array}

(47)

where N _s is the total number of subcarriers or parallel channels in the system, P is the total transmission power, $σ_{n_{i}}^{2}$ is the noise variance in the i th subcarrier and p _i stands for the allocated power. The application of the KKT optimality conditions to (47) leads to the solution

p_{i} = {(\frac{1}{μ} - σ_{n_{i}}^{2})}^{+}

(48)

where (a)⁺ = max {0,a} and $\frac{1}{μ}$ is denoted as the water-level and shall be chosen in order to satisfy the total power constraint. Typically, the bisection method is employed to find μ ^∗. However, note that (47) can be rewritten in the form of (2) and also (19). Therefore, we can apply the proposed CDM as well. Indeed, (48) and the relationship in (20) match if we identify p _i with y _i and μ with λ _i (remember that the required relationship applies only to $y_{i} \notin bd Y_{i}$ , that is, y _i= p _i> 0).

5.2.1 Numerical results

Let us assume $N_{s} = 512$ subcarriers. The channel is time-varying and frequency selective; it has 20 taps. The power delay profile is assumed exponential with a delay spread of 1 ms and the baseband sampling time is 1 μs. We now compare the proposed CDM to the bisection method and also to the classical primal–dual algorithm in [16]. It is remarkable that the CDM requires no modification at all (it is completely unsupervised) and the same holds for the primal–dual algorithm. On the contrary, the bisection method requires a slight modification to be able to track the time-varying scenario. For that purpose, we introduce the updating factor $α_{u}$. Initially, the method is applied as usual, that is, having the initial hypotheses $μ_{l}^{0}$ and $μ_{u}^{0}$ (two values that are below and above $μ^{*}$, respectively), we compute $μ^{1} = 1 / 2 (μ_{l}^{0} + μ_{u}^{0})$ and we update $μ_{l}^{1}$ to $μ^{1}$ if $\sum_{i = 1}^{N_{s}} p_{i} (μ^{1}) > P$ or $μ_{u}^{1}$ to $μ^{1}$ otherwise. In the subsequent iterations, given that the channel is time-varying, we need to check first whether $μ_{l}^{k}$ and $μ_{u}^{k}$ are still valid. If $\sum_{i = 1}^{N_{s}} p_{i} (μ_{l}^{k}) > P$ does not hold, we update $μ_{l}^{k}$ to $\frac{μ_{l}^{k}}{α_{u}}$ and we repeat this until the condition holds. Similarly, if $\sum_{i = 1}^{N_{s}} p_{i} (μ_{u}^{k}) < P$ does not hold, we modify $μ_{u}^{k}$ to $α_{u} \cdot μ_{u}^{k}$ and we repeat this until the condition holds. Then, we compute $μ^{k + 1} = 1 / 2 (μ_{l}^{k} + μ_{u}^{k})$ and we update the hypotheses accordingly, as in the normal version of the technique.
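The modified bisection can be sketched as follows. This is a minimal illustration and not the code used for the reported results: the function names are hypothetical, the allocation follows (48), and, as a small deviation assumed here for numerical robustness, a hypothesis that meets the power constraint with exact equality is still treated as valid.

```python
import numpy as np

def allocated_power(mu, noise):
    """Total power of the water-filling allocation (48) for a given mu."""
    return float(np.sum(np.maximum(1.0 / mu - noise, 0.0)))

def tracking_bisection_step(mu_l, mu_u, noise, P, alpha_u=1.05):
    """One bisection iteration preceded by the bracket repair with alpha_u."""
    while allocated_power(mu_l, noise) < P:    # lower hypothesis became invalid
        mu_l /= alpha_u
    while allocated_power(mu_u, noise) > P:    # upper hypothesis became invalid
        mu_u *= alpha_u
    mu = 0.5 * (mu_l + mu_u)
    if allocated_power(mu, noise) > P:         # too much power: mu is too low
        mu_l = mu
    else:
        mu_u = mu
    return mu_l, mu_u, mu

# Hypothetical two-subcarrier example with a channel change after 50 steps.
P = 4.0
mu_l, mu_u = 0.1, 1.0
for _ in range(50):
    mu_l, mu_u, mu_first = tracking_bisection_step(mu_l, mu_u, np.array([1.0, 3.0]), P)
for _ in range(50):
    mu_l, mu_u, mu_second = tracking_bisection_step(mu_l, mu_u, np.array([0.5, 1.5]), P)
```

With the first noise profile the water level is 4 and with the second it is 3, so the two runs should settle near 0.25 and 1/3. The value of `alpha_u` sets how fast the bracket reopens after a channel change, which is the parameter that has to be tuned by hand.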

Figure 7 plots the NMSE of the power allocation for the compared methods as a function of the mean SNR. As in the previous application example, we compute the optimal power allocation using the bisection method (relative error below 10⁻⁵). Moreover, all the algorithms are initialized to the optimal solution for the current channel condition, $α_{u} = 1.05$ in the bisection technique, the step-size is fixed to 0.001 in the Arrow–Hurwicz method and we have considered three different channel velocities, namely, (i) $T_{c}^{1} = 10 \cdot T_{CDM}$, (ii) $T_{c}^{2} = 100 \cdot T_{CDM}$, and (iii) $T_{c}^{3} = 1000 \cdot T_{CDM}$, where $T_{CDM}$ is the time taken by one complete iteration of the CDM and $T_{c}^{i}$ is the coherence time of the channel in the i th scenario. Note that we have manually adjusted $α_{u}$ in the bisection method and the step-size in the primal–dual algorithm in order to achieve the best possible performance at the worst channel condition, that is, when the channel coherence time is the smallest one, i.e., $T_{c}^{1}$.

Results show that the CDM usually outperforms the bisection method and it is far better than the primal–dual algorithm. Indeed, it performs worse than the bisection only for $T_{c}^{1}$ and at low SNR. Note that since the CDM has no user-defined parameter, it automatically adapts to the different channel velocities. On the contrary, this adaptation does not occur in the other two methods. This is reflected in Figure 7, where, for example, the bisection method saturates to an NMSE around 10⁻⁴ for $T_{c}^{1}$ , $T_{c}^{2}$ , and $T_{c}^{3}$ as the SNR grows.

5.3 Fair DBA

The fair DBA problem arises in many-to-one communication systems [26, 27] and the goal is to fairly distribute the available bandwidth. In many cases, and especially in systems with a huge number of users [28], the computational cost of the techniques plays an important role. Additionally, let us remark that recent works on the topic aim at providing mechanisms for QoS differentiation [4, 29] to modify a plain fair allocation. Therefore, we consider the following network utility maximization (NUM) formulation to solve a fair DBA problem,

\begin{array}{lll} \max_{\{r_{j}\}} & \sum_{j = 1}^{N} U_{j} (r_{j}; p_{j}) & \\ \text{s.t.} & m_{j} \leq r_{j} \leq d_{j}, & j = 1, \dots, N \\ & \sum_{j = 1}^{N} r_{j} \leq B & \end{array},

(49)

where B is the available bandwidth, r _j is the rate allocated to the j th flow, and U _j is the j th utility function (the terms bandwidth and rate are used interchangeably). The parameters m _j, d _j (with 0 ≤ m _j< d _j), and p _j> 0 are used to define the QoS requirements for each ongoing connection and they represent the minimum necessary rate, the required (maximum) bandwidth and the priority of the j th flow, respectively. Furthermore, we assume that $\sum_{j} m_{j} < B < \sum_{j} d_{j}$ , i.e., the problem is coupled. As argued before, the utility functions can adequately be chosen in order to achieve a fair distribution of resources in different degrees. The following family of functions parameterized by γ

U_{j} (r_{j}; p_{j}, γ) = \begin{cases} p_{j} \log (r_{j}), & γ = 1 \\ p_{j} \frac{r_{j}^{(1 - γ)}}{1 - γ}, & γ \neq 1 \end{cases}

(50)

defines different types of fairness, with γ → ∞ (max–min fairness) and γ = 1 (proportional fairness) being the most relevant ones [29].

Note that (49) can be rewritten in the form of (19) and in particular, the problem is strictly convex and we assume that strong duality holds, i.e., there is at least one strictly feasible point. Therefore, we can apply the KKT optimality conditions to solve (49) semi-analytically. In this case, the optimal rates must verify^e

r_{j}^{*} (μ) = {[{(\frac{p_{j}}{μ})}^{\frac{1}{γ}}]}_{m_{j}}^{d_{j}}

(51)

and the optimal value of μ is such that $\sum_{j = 1}^{N} r_{j}^{*} (μ^{*}) = B$. The bisection method is a classical technique widely used in the literature in order to approximate $μ^{*}$ but, alternatively, we can also apply the enhanced version of the CDM. Specifically, by adding the new variables $\{y_{j}\}$, identifying $f_{j} (r_{j})$ with $- U_{j} (r_{j})$ and $h_{j} (r_{j}) = r_{j}$, and applying (23) together with $r_{j} = h_{j}^{- 1} (y_{j})$, (51) turns into

y_{j} = {(\frac{p_{j}}{λ_{j}})}^{\frac{1}{γ}}

(52)

when m _j< y _j< d _j and has the required form in (20). Therefore, once the subsets $S^{*}$ , $I^{*}$ , and $A^{*}$ are known, the optimal value of μ is readily found according to (25) as

μ^{*} = {\left[ \frac{\sum_{i \in A^{*}} \sqrt[γ]{p_{i}}}{B - \sum_{i \in I^{*}} m_{i} - \sum_{i \in S^{*}} d_{i}} \right]}^{γ} .

(53)
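The closed form can be illustrated with a short sketch: a coarse bisection identifies which flows saturate, and the expression for $μ^{*}$ is then evaluated with the minimum rates summed over the flows at their lower bound and the demands summed over the flows at their upper bound. The function names, the data, and the lower starting point of the bisection (a small positive number instead of 0, to avoid a division by zero) are assumptions made for this example.

```python
import numpy as np

def rates(mu, p, m, d, gamma):
    """Rates in (51): the unconstrained value clipped to [m_j, d_j]."""
    return np.clip((p / mu) ** (1.0 / gamma), m, d)

def bisect_mu(p, m, d, B, gamma, lo=1e-6, hi=10.0, iters=30):
    """Plain bisection on mu; the total rate decreases as mu grows."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if rates(mid, p, m, d, gamma).sum() > B:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def mu_closed_form(mu_approx, p, m, d, B, gamma):
    """Closed-form mu^* using the index sets identified at mu_approx."""
    raw = (p / mu_approx) ** (1.0 / gamma)
    in_I, in_S = raw <= m, raw >= d            # flows at m_j and at d_j
    in_A = ~(in_I | in_S)                      # unsaturated flows
    num = np.sum(p[in_A] ** (1.0 / gamma))
    den = B - m[in_I].sum() - d[in_S].sum()
    return (num / den) ** gamma

# Hypothetical flows with proportional fairness (gamma = 1).
p = np.array([1.0, 2.0, 4.0, 0.3])
m = np.array([1.0, 0.0, 0.0, 1.0])
d = np.array([5.0, 5.0, 2.0, 3.0])
B, gamma = 7.0, 1.0
mu_approx = bisect_mu(p, m, d, B, gamma)
mu_exact = mu_closed_form(mu_approx, p, m, d, B, gamma)
```

Here the third flow saturates at its demand and the fourth stays at its minimum rate, so the closed form should give 3 / (7 − 1 − 2) = 0.75 and the resulting rates should add up to B.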

5.3.1 Numerical results

Let us draw the values of $m_{j}$ from an integer uniform distribution between 0 and 10. Each request $d_{j}$ is obtained by adding $m_{j}$ and an integer random number between 0 and 100. The priority values $p_{j}$ are drawn from a uniform distribution that takes values between 0.25 and 5 in steps of 0.25 and γ = 1. Figure 8 plots the mean allocation time, i.e., execution time, of the CDM when centrally computed in combination with the stopping criterion defined in Section 4.2. The algorithm has been executed on an Intel Core 2 Duo CPU running at 2.2 GHz and programmed in Matlab. We have considered three different values for the total available bandwidth, namely $B_{1} = \sum_{j} m_{j} + 0.25 \sum_{j} d_{j}$, $B_{2} = \sum_{j} m_{j} + 0.5 \sum_{j} d_{j}$, and $B_{3} = \sum_{j} m_{j} + 0.75 \sum_{j} d_{j}$. The results of the CDM have been compared to the classical bisection method and to the hypothesis testing method [30]. Since the allocation time is not sensitive to the available capacity for the latter methods, in Figure 8 we distinguish among $B_{1}$, $B_{2}$, and $B_{3}$ only for the CDM.

In order to provide a fair comparison among the methods, it is necessary to take into account the accuracy with respect to the optimal solution. The hypothesis testing strategy always achieves the exact optimal solution (see the details in [30]). The bisection method has been adjusted to achieve a relative error in the allocation lower than 10⁻⁶, or in other words,

\frac{| | r^{BI} - r^{*} | |}{| | r^{*} | |} \leq 10^{- 6},

(54)

where $r^{*}$ is the optimal allocation (which can be obtained with the hypothesis testing method) and $r^{BI}$ is the allocation achieved by the bisection method. Initially, the two hypotheses for the values of μ are 0 and 10. In the CDM we stop the iterations when

\frac{| SC^{k + 1} - SC^{k} |}{| SC^{k + 1} |} \leq 10^{- 2} .

(55)

Note that as the number of users grows, the difference in time between the proposed method and the others also grows, especially when the system is more restricted in terms of capacity, i.e., for $B_{1}$. In this case, the CDM is able to compute the allocation in half the time required by classical methods. In terms of accuracy in the solution, (55) gives the exact optimal solution for $B_{1}$ and $B_{2}$, and a relative allocation error lower than 10⁻⁴ for $B_{3}$. Overall, the solution is good in practice; it is optimal in capacity-constrained scenarios and nearly optimal in less critical situations. In order to illustrate the selection of the threshold for the stopping criterion, we plot in Figure 9 the evolution of the relative error and allocation time as a function of the accuracy in the stopping criterion. Note that a threshold of 10⁻² provides a good trade-off between both performance metrics for the worst scenario, i.e., for $B_{3}$. Finally, if we consider a higher available bandwidth, e.g., 99% of the whole system demand, this threshold value keeps the allocation time as small as in $B_{3}$ at the expense of a higher allocation error (around 5%). However, the accuracy degradation appears only in this extreme case and it is not critical in practice because all the users nearly reach their demands.

6 Conclusions and future work

This article has contributed novel decomposition ideas that efficiently intertwine the classical primal and dual decomposition approaches in a single iteration of a new technique, called the CDM. It solves generic convex optimization problems that have one coupling constraint and retains the known advantage of decomposition-based approaches, that is, the implementation of decentralized solutions. In addition, it reduces the number of iterations by more than one order of magnitude with respect to the classical primal and dual decomposition solutions and, furthermore, it is completely unsupervised, that is, there is no parameter that requires manual adjustment. Moreover, when the problem is particularized (but still of interest), additional results regarding the convergence rate of the proposed technique are obtained and a stopping criterion is derived that enhances the performance of the method (in terms of the number of iterations required to achieve the optimal solution).

The proposed method has been tested in three different problems, two dealing with power allocation in OFDM-based systems and a third one dealing with DBA. In the first two cases, the goal is to find the well-known water-filling solution in power. In one case, we benefit from a decentralized approach that suits the system architecture whereas in the other case, the proposed method is applied to a conventional OFDM transmission that deals with a time-varying channel. In both examples, we have compared our solution to other decomposition strategies, and our approach performs significantly better than the available alternatives when a decentralized solution is required. In particular, our results show that the signaling requirements can be reduced at least by a factor of 2. Moreover, in the centralized application of the method, that is, the conventional OFDM transmission, the proposed method benefits from being unsupervised, and the channel variability does not compromise its performance as it does with classical methods. In particular, in the high-SNR regime, the difference in the NMSE of the power allocation between the proposed method and the bisection method is more than two orders of magnitude in all the explored scenarios.

Finally, when applied to a NUM problem and thanks to the enhanced version of the method, the proposed technique reduces the computational time by a factor of 2 with respect to well-established techniques such as the bisection method. This reduction is very important in systems with a large number of users (as happens in satellite communication networks), where the bandwidth allocation has to be computed in a short time interval.

Appendix

Proof of Proposition 1

For the sake of simplicity, in what follows we omit the iteration index k as well as the modifier $\hat{(\cdot)}$ in $\hat{y}_{j}^{k}$, $\hat{λ}_{j}^{k}$, and $μ^{k}$. Let us first consider the dependence of $λ_{j}$ on $y_{j}$ in the primal subproblems. The function $p_{j}$ in (13) has already been studied in the convex optimization literature, where it is known as the primal function ([15], Sec. 5.4.4). We recall two main results related to the primal function: (i) $p_{j}$ is a convex function over the set $P_{j} = \{y_{j} \mid p_{j} (y_{j}) < \infty\}$ and (ii) the optimal value of the dual variable associated with the constraint $h_{j} (x_{j}) \leq y_{j}$, taken with opposite sign, say $- λ_{j}^{*} (y_{j})$, is a subgradient of $p_{j}$ at $y_{j}$. In our case, note that $Y_{j} \subseteq P_{j} \; \forall j$ and thus these results can be applied to (13). Furthermore, since $p_{j}$ is convex in the region of interest, it is guaranteed to be continuous although not necessarily differentiable. However, given that the objective function and all the constraints in the problem (a finite number of them) are differentiable, for each singular value in $Y_{j}$ we can find an open interval that includes it where $p_{j}$ is differentiable (except, of course, at the singular point itself). Then, taking into account that the first derivative of a convex function is non-decreasing and noting that the subgradient equals the gradient where the function is differentiable, we obtain the desired result. In other words, $- λ_{j}^{*} (y_{j})$ is non-decreasing in the intervals where $p_{j}$ is differentiable, because there the subgradient and the first derivative coincide, whereas at the singular points it takes a value between the left and right derivatives, thus preserving the non-decreasing property. Finally, by removing the minus sign we can state that $λ_{j}^{*} (y_{j})$ is non-increasing in $y_{j}$.

Second, we prove that $y_{j} (μ)$ is non-increasing in μ in (10). Let us first rewrite (10) as
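As a numerical illustration of the first statement, the sketch below uses a toy subproblem that is not taken from the article: minimize -log(1 + x) subject to x ≤ y and 0 ≤ x ≤ X_MAX, whose primal function has a singular point at y = X_MAX. The names `X_MAX`, `primal_function` and `multiplier_estimates` are hypothetical, and the multiplier is only estimated through secant slopes of p.

```python
import math

X_MAX = 2.0  # hypothetical box constraint; it creates a kink in p at y = X_MAX

def primal_function(y):
    """p(y) = min -log(1 + x) s.t. x <= y, 0 <= x <= X_MAX (closed form)."""
    return -math.log(1.0 + min(y, X_MAX))

def multiplier_estimates(y_grid):
    """Secant estimates of lambda*(y) = -dp/dy between neighbouring grid points."""
    return [-(primal_function(b) - primal_function(a)) / (b - a)
            for a, b in zip(y_grid, y_grid[1:])]

y_grid = [0.25 * i for i in range(17)]  # y from 0 to 4, kink at y = 2
lam = multiplier_estimates(y_grid)

# The estimates should never increase with y, including across the kink.
is_non_increasing = all(l2 <= l1 + 1e-12 for l1, l2 in zip(lam, lam[1:]))
```

In this toy case the estimates decrease while the constraint x ≤ y is active and drop to zero once y exceeds X_MAX, which is the behaviour described above for singular points.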

d_{j} (μ) = min_{y_{j}} \{p_{j} (y_{j}) + μ y_{j}\}

(56)

Then take any two values of μ, say $μ_{1}$ and $μ_{2}$, that satisfy: (i) $0 \leq μ_{1} < μ_{2}$ and (ii) there exist two values in $Y_{j}$, $y_{j, 1}^{*}$ and $y_{j, 2}^{*}$, such that $d_{j} (μ_{1}) = p_{j} (y_{j, 1}^{*}) + μ_{1} y_{j, 1}^{*}$ and $d_{j} (μ_{2}) = p_{j} (y_{j, 2}^{*}) + μ_{2} y_{j, 2}^{*}$. In other words, $y_{j, 1}^{*}$ and $y_{j, 2}^{*}$ are minimizers that attain $d_{j} (μ_{1})$ and $d_{j} (μ_{2})$, respectively. Now, since $y_{j, 1}^{*}$ is not necessarily a minimizer for $μ_{2}$, we can establish the following inequality,

\begin{align} d_{j} (μ_{2}) & = p_{j} (y_{j, 2}^{*}) + μ_{1} y_{j, 2}^{*} + (μ_{2} - μ_{1}) y_{j, 2}^{*} \\ & \leq p_{j} (y_{j, 1}^{*}) + μ_{1} y_{j, 1}^{*} + (μ_{2} - μ_{1}) y_{j, 1}^{*} \end{align}

(57)

Next, we proceed by contradiction and assume $y_{j, 2}^{*} > y_{j, 1}^{*}$. In this case, it is true that (i) $(μ_{2} - μ_{1}) y_{j, 2}^{*} > (μ_{2} - μ_{1}) y_{j, 1}^{*}$ since $μ_{2} - μ_{1} > 0$ and (ii) $p_{j} (y_{j, 2}^{*}) + μ_{1} y_{j, 2}^{*} \geq p_{j} (y_{j, 1}^{*}) + μ_{1} y_{j, 1}^{*}$ since $y_{j, 1}^{*}$ is a minimizer for $μ_{1}$. Observations (i) and (ii) together contradict the inequality in (57) and therefore $y_{j, 2}^{*}$ must necessarily satisfy $y_{j, 2}^{*} \leq y_{j, 1}^{*}$. This proves our second statement.
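The second statement can be checked numerically on a simple instance. The sketch below assumes a toy primal function p(y) = -log(1 + y) on a grid that stands in for $Y_{j}$ = [0, 10]; neither the function nor the grid comes from the article, and `dual_minimizer` is a hypothetical helper that evaluates (56) by exhaustive search.

```python
import math

def dual_minimizer(mu, y_grid):
    """Grid minimizer of p(y) + mu * y in (56), with the toy p(y) = -log(1 + y)."""
    return min(y_grid, key=lambda y: -math.log(1.0 + y) + mu * y)

y_grid = [0.01 * i for i in range(1001)]   # stand-in for Y_j = [0, 10]
mus = [0.05 * i for i in range(1, 41)]     # increasing prices 0.05, 0.10, ..., 2.00
y_of_mu = [dual_minimizer(mu, y_grid) for mu in mus]

# Each minimizer should be no larger than the one obtained for the previous mu.
is_non_increasing = all(b <= a for a, b in zip(y_of_mu, y_of_mu[1:]))
```

For this toy function the unconstrained minimizer is 1/μ − 1, so the grid minimizer starts at the upper end of the interval for small μ and reaches zero once μ ≥ 1.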

Proof of Proposition 2

Let us apply the KKT optimality conditions corresponding to (12). The Lagrangian is

\begin{align} L (\hat{y}^{k}, μ, α, β) = & \sum_{j = 1}^{J} {(y_{j}^{k} - \hat{y}_{j}^{k})}^{2} + μ \left(\sum_{j = 1}^{J} \hat{y}_{j}^{k} - C\right) \\ & + α^{T} (y_{min} - \hat{y}^{k}) + β^{T} (\hat{y}^{k} - y_{max}) \end{align}

(58)

where $y_{min} = [min Y_{1}, \dots, min Y_{J}]$ and $y_{max} = [max Y_{1}, \dots, max Y_{J}]$ and, if we look at the optimality condition $\partial L / \partial ŷ_{j}^{k} = 0$ , we get

ŷ_{j}^{k} = y_{j}^{k} - μ - β_{j} + α_{j}, j = 1, \dots, J

(59)

Therefore, if the j th optimal primal value, i.e., $\hat{y}_{j}^{k, *}$, lies inside the interval $(min Y_{j}, max Y_{j})$, then $β_{j} = α_{j} = 0$ due to complementary slackness and $\hat{y}_{j}^{k} = y_{j}^{k} - μ$ (with $μ \in R$). If this is not the case, then either $\hat{y}_{j}^{k, *} = min Y_{j}$ or $\hat{y}_{j}^{k, *} = max Y_{j}$. In both cases, note that we can choose an adequate value of the free dual multiplier $α_{j}$ or $β_{j}$, respectively, in order to satisfy (59). Finally, taking these results into account, we can conclude that the solution to the primal projection is

ŷ_{j}^{k} = {[y_{j}^{k} - μ]}_{min Y_{j}}^{max Y_{j}}

(60)

where μ is adjusted to satisfy $\sum_{j = 1}^{J} \hat{y}_{j}^{k} = C$. Since $y^{k} \succeq y^{*}$ (so that $\sum_{j} y_{j}^{k} > C$ unless $y_{j}^{k} = y_{j}^{*} \; \forall j$) and $\sum_{j = 1}^{J} y_{j}^{*} = C$, it is necessary that μ > 0.
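One possible way to evaluate the projection (60) numerically is to search for μ by bisection, since the sum of the clipped values is non-increasing in μ. The sketch below is an assumption about how this could be implemented, not the procedure used in the article; the function name `primal_projection`, its arguments, and the fixed number of bisection steps are all hypothetical.

```python
def primal_projection(y, y_min, y_max, capacity, steps=100):
    """Evaluate (60): clip y_j - mu to [y_min_j, y_max_j] so that the sum is C."""
    def clipped(mu):
        return [min(max(yj - mu, lo), hi) for yj, lo, hi in zip(y, y_min, y_max)]

    if not sum(y_min) <= capacity <= sum(y_max):
        raise ValueError("capacity is not reachable inside the box constraints")

    # Bracket mu: at lo_mu every entry is clipped to y_max, at hi_mu to y_min.
    lo_mu = min(yj - hi for yj, hi in zip(y, y_max))
    hi_mu = max(yj - lo for yj, lo in zip(y, y_min))

    # Bisection on mu; sum(clipped(mu)) is non-increasing in mu.
    for _ in range(steps):
        mid = 0.5 * (lo_mu + hi_mu)
        if sum(clipped(mid)) > capacity:
            lo_mu = mid
        else:
            hi_mu = mid
    mu = 0.5 * (lo_mu + hi_mu)
    return clipped(mu), mu

# Hypothetical data: the unprojected values sum to 8, above the capacity C = 6.
y_hat, mu = primal_projection(
    y=[4.0, 3.0, 1.0], y_min=[0.0, 0.0, 0.5], y_max=[5.0, 5.0, 5.0], capacity=6.0
)
```

In this example the third component hits its lower bound, so the remaining two absorb the whole reduction and the resulting μ is positive, in line with the statement above.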

Proof of Proposition 3

From Proposition 2 it follows that, for a non-optimal point y with $\sum_{j = 1}^{J} y_{j} > C$, the primal projection decreases the values $y_{j}$ in order to achieve $\sum_{j = 1}^{J} \hat{y}_{j} = C$, unless $y_{j} = min Y_{j}$, in which case $\hat{y}_{j} = y_{j}$. Let us distinguish two subsets of variables: $I$ includes the indexes of the values $y_{j}$ that satisfy $y_{j} = min Y_{j}$ and $\bar{I}$ the rest. Note that $\bar{I}$ contains exactly the indexes of the $\breve{λ}_{j}$ candidates used in the dual projection. Then it is true that

\hat{y_{j}} \leq y_{j}^{*}, \forall j \in I \Rightarrow \sum_{j \in I} ŷ_{j} \leq \sum_{j \in I} y_{j}^{*},

(61)

and therefore,

\sum_{j \in \bar{I}} ŷ_{j} > \sum_{j \in \bar{I}} y_{j}^{*}

(62)

since the equality constraints $\sum_{j = 1}^{J} \hat{y}_{j} = \sum_{j = 1}^{J} y_{j}^{*} = C$ are always fulfilled. This fact ensures that there is at least one value $\hat{y}_{j}$ with $j \in \bar{I}$ that satisfies $\hat{y}_{j} > y_{j}^{*}$, unless all the values are already optimal.

Endnotes

^a Notation: ≼, ≺, ≽, and ≻ stand for component-wise inequalities.

^b The vector s is a subgradient of the function $f : R^{n} \to R$ at $x \in R^{n}$ if $f (y) \geq f (x) + {(y - x)}^{T} s, \forall y \in R^{n}$. If f is differentiable at x, the subgradient s and the gradient ∇f(x) coincide. Otherwise, there exist many subgradients.

^c The notation $x_{j}^{*} (μ^{k})$ stands for the optimal solution of the j th subproblem given $μ^{k}$.

^d Notation: ⊙ stands for the vector element-wise product. If a = [a₁, a₂, …, a_N]^T and b = [b₁, b₂, …, b_N]^T, then a ⊙ b = [a₁·b₁, a₂·b₂, …, a_N·b_N]^T.

^e Notation: ${[a]}_{m_{j}}^{d_{j}}$ equals a if $m_{j} < a < d_{j}$, $m_{j}$ if $a \leq m_{j}$, and $d_{j}$ if $a \geq d_{j}$.

References

1. Bertsekas D, Nedić A, Ozdaglar A: Convex Analysis and Optimization. Belmont, MA, USA: Athena Scientific; 2003.
2. Boyd S, Vandenberghe L: Convex Optimization. Cambridge, UK: Cambridge University Press; 2004.
3. Palomar D, Cioffi J, Lagunas M: Joint Tx-Rx beamforming design for multicarrier MIMO channels: a unified framework for convex optimization. IEEE Trans. Signal Process 2003, 51(9):2381-2401. 10.1109/TSP.2003.815393
4. Yaïche H, Mazumdar R, Rosenberg C: A game theoretic framework for bandwidth allocation and pricing in broadband networks. IEEE/ACM Trans. Netw 2000, 8(5):667-678. 10.1109/90.879352
5. Xiao L, Johansson M, Boyd S: Simultaneous routing and resource allocation via dual decomposition. IEEE Trans. Commun 2004, 52(7):1136-1144. 10.1109/TCOMM.2004.831346
6. Palomar D, Chiang M: Alternative decompositions for distributed maximization of network utility: framework and applications. IEEE Trans. Autom. Control 2007, 52(12):2254-2269.
7. Tan CW, Palomar D, Chiang M: Distributed optimization of coupled systems with applications to network utility maximization. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Toulouse; May 2006:981-984.
8. Liao S, Cheng W, Liu W, Yang Z, Ding Y: Distributed optimization for utility-energy tradeoff in wireless sensor networks. In IEEE International Conference on Communications (ICC). Glasgow; June 2007:3190-3194.
9. Xiao L, Boyd S: Optimal scaling of a gradient method for distributed resource allocation. J. Optim. Theory Appl. (JOTA) 2006, 129(3):469-488. 10.1007/s10957-006-9080-1
10. Palomar D, Fonollosa J: Practical algorithms for a family of waterfilling solutions. IEEE Trans. Signal Process 2005, 53(2):686-695.
11. Palomar DP, Bengtsson M, Ottersten B: Minimum BER linear transceivers for MIMO channels via primal decomposition. IEEE Trans. Signal Process 2005, 53(8):2866-2882.
12. Scaglione A, Barbarossa S, Giannakis GB: Optimal adaptive precoding for frequency-selective Nakagami-m fading channels. In IEEE 52nd Vehicular Technology Conference (VTC Fall 2000). Boston; September 2000:1291-1295.
13. Marqués AG, Digham FF, Giannakis GB: Optimizing power efficiency of OFDM using quantized channel state information. IEEE J. Sel. Areas Commun 2006, 24(8):1581-1592.
14. Arkhangel’skii A, Fedorchuk V: The Basic Concepts and Constructions of General Topology. In General Topology, I, Encyclopedia of the Mathematical Sciences. New York: Springer; 1990.
15. Bertsekas D: Nonlinear Programming. Belmont, MA, USA: Athena Scientific; 1999.
16. Arrow KJ, Hurwicz L, Uzawa H: Iterative Methods in Concave Programming. In Studies in Linear and Nonlinear Programming. Palo Alto: Stanford University Press; 1958:154-165.
17. Holmberg K, Kiwiel K: Mean value cross decomposition for nonlinear convex problems. Optim. Methods Softw 2006, 21(3):401-417. 10.1080/10556780500098565
18. Holmberg K: Primal and Dual Decomposition as Organizational Design: Price and/or Resource Directive Decomposition. In Design Models for Hierarchical Organizations: Computation, Information, and Decentralization. Dordrecht: Kluwer Academic Publishers; 1995:61-92.
19. Necoara I, Suykens JAK: Application of a smoothing technique to decomposition in convex optimization. IEEE Trans. Autom. Control 2008, 53(11):2674-2679.
20. Dinh QT, Necoara I, Savorgnan C, Diehl M: An inexact perturbed path-following method for Lagrangian decomposition in large-scale separable convex optimization. SIAM J. Optim 2013, 23(1):95-125. 10.1137/11085311X
21. Tseng P, Bertsekas D: On the convergence of the exponential multipliers method for convex programming. Math. Program 1993, 60(1-3):1-19. 10.1007/BF01580598
22. Polyak R: Primal–dual exterior point method for convex optimization. Optim. Methods Softw 2008, 23(1):141-160. 10.1080/10556780701363065
23. Akyildiz I, Mohanty S, Xie J: A ubiquitous mobile communication architecture for next-generation heterogeneous wireless systems. IEEE Commun. Mag 2005, 43(6):S29-S36.
24. Wang D, Miao K, John V, Rungta S, Chan W: Considering wireless mesh network with heterogeneous multiple radios. In IEEE WiCom. Shanghai; September 2007:1681-1684.
25. Cover T, Thomas J: Elements of Information Theory. New York: Wiley; 1991.
26. IEEE: Air Interface for Fixed and Mobile Broadband Wireless Access Systems; Amendment 2: Physical and Medium Access Control Layers for Combined Fixed and Mobile Operation in Licensed Band and Corrigendum 1. IEEE Standards; 2006.
27. ETSI: Digital Video Broadcasting (DVB); Interaction Channel for Satellite Distribution Systems. ETSI EN 301 790; 2005.
28. Acar G, Rosenberg C: Weighted fair bandwidth-on-demand (WFBoD) for geostationary satellite networks with on-board processing. Comput. Netw 2002, 39(1):5-20. 10.1016/S1389-1286(01)00295-X
29. Mo J, Walrand J: Fair end-to-end window-based congestion control. IEEE/ACM Trans. Netw 2000, 8(5):556-567. 10.1109/90.879343
30. Seco-Granados G, Vazquez-Castro M, Morell A, Vieira F: Algorithm for fair bandwidth allocation with QoS constraints in DVB-S2/RCS. In Proceedings of the IEEE Global Telecommunication Conference (GLOBECOM). San Francisco, USA; November 2006:1-5.

Acknowledgements

This work is supported by the Spanish Government under project TEC2011-28219 and the Catalan Government under grant 2009 SGR 298.

Author information

Authors and Affiliations

Telecommunications and System Engineering Department (TES), Universitat Autonoma de Barcelona (UAB), Q Building, 08193, Cerdanyola del Valles, Spain
Antoni Morell, José López Vicario & Gonzalo Seco-Granados

Corresponding author

Correspondence to Antoni Morell.

Additional information

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Morell, A., Vicario, J.L. & Seco-Granados, G. Coupled-decompositions: exploiting primal–dual interactions in convex optimization problems. EURASIP J. Adv. Signal Process. 2013, 41 (2013). https://doi.org/10.1186/1687-6180-2013-41

Received: 24 October 2012
Accepted: 23 January 2013
Published: 05 March 2013
DOI: https://doi.org/10.1186/1687-6180-2013-41
