Incentive-compatible demand-side management for smart grids based on review strategies
 Jie Xu^{1} and
 Mihaela van der Schaar^{1}
https://doi.org/10.1186/s1363401502359
© Xu and van der Schaar. 2015
Received: 28 October 2014
Accepted: 3 June 2015
Published: 19 June 2015
Abstract
Demand-side load management can significantly improve the energy efficiency of smart grids. Since the electricity production cost depends on the aggregate energy usage of multiple consumers, an important incentive problem emerges: self-interested consumers want to increase their own utilities by consuming more than the socially optimal amount of energy during peak hours, since the increased cost is shared among the entire set of consumers. To incentivize self-interested consumers to take the socially optimal scheduling actions, we design a new class of protocols based on review strategies. These strategies work as follows: first, a review stage takes place in which a statistical test is performed on the daily prices of the previous billing cycle to determine whether or not the other consumers schedule their electricity loads in a socially optimal way. If the test fails, the consumers trigger a punishment phase in which, for a certain time, they adjust their energy scheduling in such a way that everybody in the consumer set is punished through an increased price. Under a carefully designed protocol based on such review strategies, consumers have incentives to take the socially optimal load scheduling in order to avoid entering this punishment phase. We rigorously characterize the impact of deploying protocols based on review strategies on the system’s as well as the users’ performance and determine the optimal design (optimal billing cycle, punishment length, etc.) for various smart grid deployment scenarios. Even though this paper considers a simplified smart grid model, our analysis provides important and useful insights for designing incentive-compatible demand-side management schemes based on aggregate energy usage information in a variety of practical scenarios.
1 Introduction
Governments and relevant industries are making a significant effort to develop next-generation energy grids (“smart grids”) that meet new environmental requirements as well as increased usage demands [1]. To address these challenges, demand-side management (DSM) techniques for smart grids (see, e.g., [1, 2]) were proposed as a way to significantly save energy. In a typical configuration, the energy producer (e.g., a utility company) periodically receives usage information from the smart meters affiliated with consumers via a communication network. The energy producer then manages the energy generation/purchase/transmission and bills the consumers based on this usage information. To save energy, reduce cost, and increase reliability, the energy producer can use, for instance, smart pricing to encourage consumers to shift peak-hour consumption to off-peak hours [3].
We model participating consumers’ interactions as a repeated game [4] in which the energy consumption game is played repeatedly (e.g., every day). We assume that the energy delivery system deploys a protocol designed by the utility company or a third party with the aim of maximizing the system efficiency (i.e., the sum utility of the consumers in this paper). Besides designing a smart pricing scheme based on the aggregate usage pattern of the consumers, this protocol designer also constructs a protocol which recommends a set of scheduling actions to consumers based on the past prices, which depend on the history of all the consumers’ energy consumption and scheduling. The protocol designer is only active during the design stage of the protocol and is passive at run time. In implementing the protocol, the utility company observes the (aggregate) energy consumption pattern of the consumer set every day and, based on this, performs billing every billing cycle. The consumers use the past prices to determine their future energy consumption scheduling actions in a self-interested and completely decentralized manner. Because the protocol designer is passive at run time, it cannot oblige the consumers to follow the recommended energy scheduling. The consumers will only adopt the recommended consumption scheduling if it is in their self-interest to do so, i.e., if they are better off following the recommended protocol rather than deviating from it. Such a protocol is called incentive-compatible (IC).
There are two important points that are worth noting. First, even though the consumers who did not deviate will also receive low utilities due to the raised prices, the protocol is designed in such a way that consumers still want to carry out such a punishment. Second, the utilities obtained by the consumers in the punishment phase are lower than those in the review phase. However, they are the same as in the scenario where no review-strategy protocol is deployed. Therefore, the review-strategy-based protocol is guaranteed to achieve a system performance at least as good as that obtained without deploying the protocol. Importantly, the protocol designer needs to carefully design the protocol such that this punishment does not take place too frequently, since it reduces all consumers’ utilities. We rigorously characterize the performance of deploying review strategies and determine the optimal design (optimal billing cycle, punishment phase length, etc.) for various smart grid deployment scenarios. It is also important to note that the proposed review-strategy-based protocol can be deployed in conjunction with any DSM scheme, not only the smart pricing scheme used for illustration in this paper, including schemes that take into account fine-grained load scheduling (e.g., hourly scheduling). The scheduling results of any DSM scheduling scheme can serve as the input of our design framework, and the output is the review-strategy-based protocol.
The rest of this paper is organized as follows. In Section 2, related works are discussed. Section 3 models the repeated energy consumption game and formulates the design problem of the DSM with usage data aggregation. Section 4 formally introduces the proposed incentive protocol based on review strategies. Section 5 determines the optimal protocol parameters and evaluates its performance. Section 6 provides simulation results. Section 7 concludes the paper.
2 Related works
A main issue for the efficient deployment of smart grids is the design of DSM [5]. A large body of literature assumes the deployment of smart meters and designs smart pricing schemes to encourage individual consumers to manage their own loads (e.g., by shifting their energy consumption from peak hours to off-peak hours). Among them, real-time pricing (RTP) [6], time-of-use pricing (TOUP) [7], and critical-peak pricing (CPP) [8] represent popular options.
Recent works [3, 9–14] considered consumers’ discomfort costs and aimed to jointly minimize the consumers’ billing and discomfort costs by assuming some utility functions. These works can be classified into two categories. In the first category, consumers are assumed to be price-taking, meaning that they do not consider how their consumption will affect the prices. In this case, the decision-making of a single foresighted consumer is formulated as a stochastic control problem aiming to maximize its long-term utility [11–13]. Alternatively, in [15, 16], multiple myopic consumers aim to maximize their utility, and their decisions are formulated as static optimization problems among cooperative users.
The second category assumes that consumers are myopic and price-anticipating, meaning that they take into account how their consumption will affect the prices. In this case, each consumer’s electricity usage affects the other consumers’ billing costs. These works [3, 17, 18] model the interaction among myopic consumers as one-shot games and study the Nash equilibrium (NE) of the resulting game. In this paper, we also model the consumers as price-anticipating. However, consumers interact with each other repeatedly and are foresighted, thereby engaging in a repeated game. It is well known that the Nash equilibrium in one-shot games with myopic players is often inefficient. In this paper, we design a novel class of incentive protocols based on review strategies in order to achieve the socially optimal load scheduling in smart grid systems. Prior work [19] also studied DSM in a repeated game setting. However, that work assumes that each individual consumer’s action can be perfectly observed, while in this work, only the aggregate scheduling of a set of participating consumers can be observed, and only with noise.
This paper adopts a pricing method similar to that in [3], where pricing is performed based on the aggregate usage pattern of a set of consumers and consumers are charged in proportion to their total daily energy consumption. It is argued in [3] that this proportional charging model is consistent with the existing residential metering models. Nevertheless, our work can be used in conjunction with a variety of existing DSM scheduling methods [6–8]. Augmenting these methods with our proposed incentive protocols is especially important when consumers have incentives to deviate from the optimal scheduling given by a DSM scheme (e.g., when it is performed on the aggregate energy usage information).
The approach proposed in this paper contributes to both the smart grid literature and the game-theoretic literature dedicated to engineering applications. Review strategies have been adopted in principal-agent games with discounting in economics [20], but such games differ significantly from the smart grid deployment scenario considered in this paper. In [20], the game is played between only two players (i.e., a principal and an agent). In the considered smart grid scenario, however, there are multiple players (i.e., a set of consumers) and their utilities exhibit negative externalities.
Comparison with existing works
 | Price-taking or price-anticipating | Myopic or foresighted | Model | Observe individual or aggregate usage
 | Price-taking | Myopic | Optimization | Individual
 | Price-anticipating | Myopic | One-shot game | Individual
[19] | Price-anticipating | Foresighted | Repeated game | Individual
This work | Price-anticipating | Foresighted | Repeated game | Aggregate
3 System model
3.1 Power system
We consider a smart grid system with multiple consumers and one energy producer, e.g., a utility company. These consumers receive electricity from the same aggregator which distributes electricity to the consumers. Each consumer is equipped with an energy consumption scheduler (ECS) for scheduling the household energy consumption. A smart meter is connected to the set of consumers from which it collects and analyzes the energy consumption. This smart meter gathers (almost) accurate readings automatically, at requested time intervals, and relays them to the utility company. Using this information, the aggregator (utility company) can adjust its energy generation, purchase, and transmission accordingly. The communication between the utility company and the consumers’ smart meters is done through the local area network (LAN) by using appropriate communication protocols. Let \(\mathcal {N}\) denote the set of consumers that share the same aggregator, where the number of consumers is N. Figure 1 illustrates the system model.
where \(b_{\text{peak}} > b_{\text{off-peak}} > 0\) and \(B(\cdot)\) is a benefit function.
where κ is the revenue/cost ratio. If κ=1, then the billing system is budget-balanced and the utility company charges the consumers only the cost of generating/providing the energy (i.e., the utility company does not make money and serves simply as a benevolent energy provider). If κ>1, then the difference between the total charges to the consumers and the total energy cost represents the profit made by the utility company. In this paper, the protocol designer is considered to be benevolent and to represent the consumers’ interests rather than to maximize the utility company’s profit. Thus, κ=1 in the subsequent analysis.
We make the following standard assumptions on the benefit function and cost functions throughout this paper.
Assumption 1.
(1) The benefit function \(B(x)\) is increasing and concave in x, and \(B(0)=0\). (2) The cost functions \(C_{\text{peak}}(x)\) and \(C_{\text{off-peak}}(x)\) are increasing and strictly convex in x, with \(C_{\text{peak}}(0)=C_{\text{off-peak}}(0)\) and \(C^{\prime}_{\text{peak}}(x) \geq C^{\prime}_{\text{off-peak}}(x), \forall x \geq 0\).
where ε is induced by the monitoring noise of the usage pattern. Because we assume that the schedulable demand vector d is fixed, the price \(p(D_{\text{peak}}, D_{\text{off-peak}})\) only depends on the consumption scheduling actions a of the consumers. Therefore, we alternatively write the price as a function p(a) of the scheduling action profile. Note that p(a) is the expected price if a is taken and \(\hat {p}({\boldsymbol {a}})\) is the actual realized price if a is taken in the noisy environment. When consumers are making scheduling decisions, only p(a) matters, since consumers can only compute the expected price but not the actual price, which has not been realized yet.
A billing cycle consists of L days. At the beginning of each billing cycle, the consumers determine the consumption scheduling actions for the next L days. At the end of each billing cycle, the utility company posts bills as well as the electricity price of the previous cycle. Using the pricing information, the consumers are able to infer the aggregate daily usage pattern in the last L days. However, since the price does not perfectly reflect the aggregate usage pattern due to the noise term, the consumers’ knowledge of the aggregate usage pattern is imperfect.
3.2 Energy consumption game: stage game

Players: consumers in the set \(\mathcal {N}\).

Actions: each consumer \(n \in \mathcal {N}\) selects its energy consumption scheduling action \(a_{n} \in \mathcal {A} = [0,1]\), i.e., the fraction of its energy consumption in peak hours ^{3}.

Payoffs: the utility for consumer n is the benefit obtained by the energy consumption minus the payment to the utility company as in (4). By separating consumer n’s action from other consumers’ actions, the utility can also be written as$$\begin{array}{*{20}l} &u_{n}\left(a_{n};{\boldsymbol{a}}_{-n}\right) = b_{n}\left(a_{n}\right) - p\left(a_{n}, {\boldsymbol{a}}_{-n}\right) d_{n} \end{array} $$(5)
where \({\boldsymbol{a}}_{-n}\), by convention, is the action profile of all consumers except consumer n.
Since consumers are self-interested, they will want to selfishly maximize their own utilities. We use Nash equilibrium as the solution concept of this energy consumption game.
Definition 1 (NE).
A Nash equilibrium action profile \({\boldsymbol{a}}^{\text{NE}}\) is such that, \(\forall n \in \mathcal {N}, \forall \tilde {a}_{n}, u_{n}\left (a^{\text{NE}}_{n},{\boldsymbol {a}}^{\text{NE}}_{-n}\right) \geq u_{n}\left (\tilde {a}_{n}, {\boldsymbol {a}}^{\text{NE}}_{-n}\right)\).
In NE, no consumer can improve its own utility by unilaterally changing its own action. Theorem 1 proves the existence of NE of the considered energy consumption game.
Theorem 1.
There exists at least one NE in the energy consumption game.
Proof.
We will show that the energy consumption game is a strictly concave N-person game. The existence of NE for this type of game then follows directly from [21].
 (1) By Assumption 1, we know that \(B(\cdot)\) is a concave function. It is then straightforward to see that \(b_{n}(a_{n})\) is concave in \(a_{n}\).
 (2) Now consider the price term. Taking the second-order derivative with respect to \(a_{n}\) gives$$\begin{array}{*{20}l} {}\frac{\partial^{2} p\left({\boldsymbol{a}}\right)}{\partial a_{n}^{2}} = \frac{{d_{n}^{2}} \left(C^{\prime\prime}_{\text{peak}}\left(D_{\text{peak}}\right) + C^{\prime\prime}_{\text{off-peak}}\left(D_{\text{off-peak}}\right)\right)}{\sum_{m \in \mathcal{N}} d_{m}} > 0 \end{array} $$(6)
Hence, the payment term is strictly convex in a _{ n }.
In sum, u _{ n }(a) is strictly concave in a _{ n }. Thus, there exists at least one NE in the energy consumption game.
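For reference, the second derivative in (6) can be traced through the chain rule. The sketch below assumes that, with κ=1, the price is the per-unit cost \(p({\boldsymbol{a}}) = \left(C_{\text{peak}}(D_{\text{peak}}) + C_{\text{off-peak}}(D_{\text{off-peak}})\right)/\sum_{m} d_{m}\) with \(D_{\text{peak}} = \sum_{m} a_{m} d_{m}\) and \(D_{\text{off-peak}} = \sum_{m} (1-a_{m}) d_{m}\). This form is inferred from (6) and from the two-consumer example rather than quoted from the price definition, so it should be read as an assumption.
$$\begin{array}{*{20}l} \frac{\partial D_{\text{peak}}}{\partial a_{n}} = d_{n}, \qquad \frac{\partial D_{\text{off-peak}}}{\partial a_{n}} = -d_{n}, \\ \frac{\partial p\left({\boldsymbol{a}}\right)}{\partial a_{n}} = \frac{d_{n}\left(C^{\prime}_{\text{peak}}\left(D_{\text{peak}}\right) - C^{\prime}_{\text{off-peak}}\left(D_{\text{off-peak}}\right)\right)}{\sum_{m} d_{m}}, \\ \frac{\partial^{2} p\left({\boldsymbol{a}}\right)}{\partial a_{n}^{2}} = \frac{d_{n}^{2}\left(C^{\prime\prime}_{\text{peak}}\left(D_{\text{peak}}\right) + C^{\prime\prime}_{\text{off-peak}}\left(D_{\text{off-peak}}\right)\right)}{\sum_{m} d_{m}}. \end{array} $$
The two minus signs from the off-peak load cancel in the second derivative, which is why both curvature terms enter with a positive sign.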
It is well known that NE is often not efficient in terms of Pareto optimality. If an action profile is Pareto-optimal, then no consumer can gain a higher utility by using a different action profile without decreasing at least one other consumer’s utility. We provide a formal definition as follows.
Definition 2 (Pareto-optimal (PO)).
An action profile a is Pareto-optimal if there does not exist any other action profile \(\tilde {{\boldsymbol {a}}}\) such that \(u_{n}\left (\tilde {{\boldsymbol {a}}}\right) \geq u_{n}\left ({\boldsymbol {a}}\right),\forall n\), with strict inequality for at least one n.
Even though a PO action profile \({\boldsymbol{a}}^{\text{PO}}\) is superior to \({\boldsymbol{a}}^{\text{NE}}\), consumers do not have incentives to automatically adopt this action profile in the energy consumption game since they are self-interested. This is because for any action profile that is not a NE, there will always be some consumers who want to schedule a different amount of energy consumption in peak hours to increase their own utilities. In order to provide incentives for the consumers to play the PO action profile, in this paper, we will design a protocol that exploits the ongoing nature of consumers’ energy consumption interactions. For notational simplicity, we denote \(u_{n}({\boldsymbol{a}}^{\text{NE}})\) as \(u_{n}^{\text {NE}}\) and \(u_{n}({\boldsymbol{a}}^{\text{PO}})\) as \(u_{n}^{\text {PO}}\). Before we proceed, we provide a simple example to illustrate the difference between \({\boldsymbol{a}}^{\text{NE}}\) and \({\boldsymbol{a}}^{\text{PO}}\) and the inefficiency of \({\boldsymbol{a}}^{\text{NE}}\).
Example (Two consumers). Consider two consumers: N=2. Let \(\kappa=1\), \(b_{\text{peak}}=2\), \(b_{\text{off-peak}}=1\), \(d_{1}=d_{2}=1\), and \(B(x)=x\). The cost functions are simple quadratic functions as in [22]: \(C_{\text{peak}}(x)=x^{2}\), \(C_{\text{off-peak}}(x)=0.5x^{2}\). The utility of consumer n is therefore \(u_{n}(a_{1},a_{2})=1+a_{n}-0.5\left[(a_{1}+a_{2})^{2}+0.5(2-a_{1}-a_{2})^{2}\right]\).
In this two-consumer example, the unique symmetric NE action profile is \(a^{\text{NE}}_{1}=a^{\text{NE}}_{2}=2/3\). That is, both consumers schedule 2/3 of their energy consumption in peak hours and no one wants to unilaterally schedule a different amount since that will only reduce its own utility. The corresponding utilities are \(u^{\text{NE}}_{1}=u^{\text{NE}}_{2}=2/3\). However, the sum utility is maximized when both consumers choose \(a^{\text{PO}}_{1}=a^{\text{PO}}_{2}=0.5\), which is the unique symmetric PO action profile. Substituting into the utility expression above, the corresponding utilities are \(u^{\text{PO}}_{1}=u^{\text{PO}}_{2}=1.5-0.75=3/4\). The sum utility obtained with this action profile (i.e., 3/2) is 12.5 % higher than that obtained with the NE action profile (i.e., 4/3), thereby indicating the inefficiency of NE.
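The two-consumer example can be checked numerically. The following is a minimal sketch that evaluates the utility expression stated in the example on an exact rational grid; the grid resolution and the helper names (`price`, `utility`, `best_reply`) are illustrative choices, not part of the paper.

```python
from fractions import Fraction as F

def price(a1, a2):
    """Expected price for the example: 0.5 * [s^2 + 0.5 * (2 - s)^2], s = a1 + a2."""
    s = a1 + a2
    return F(1, 2) * (s ** 2 + F(1, 2) * (2 - s) ** 2)

def utility(own, other):
    """Stage utility u_n = 1 + a_n - p(a) with d_n = 1 (symmetric in the two consumers)."""
    return 1 + own - price(own, other)

# Exact grid on [0, 1]; 600 steps so that 1/2, 2/3 and 5/6 are grid points.
GRID = [F(k, 600) for k in range(601)]

def best_reply(other):
    """Action on the grid that maximizes a consumer's own utility against `other`."""
    return max(GRID, key=lambda a: utility(a, other))

# Symmetric NE candidate: the best reply to 2/3 should be 2/3 itself.
a_ne = F(2, 3)
reply_to_ne = best_reply(a_ne)

# Symmetric profile that maximizes the sum utility.
a_po = max(GRID, key=lambda a: 2 * utility(a, a))

# Most profitable unilateral deviation from the sum-utility maximizer.
a_dev = best_reply(a_po)

print("NE check:", reply_to_ne, "utility", utility(a_ne, a_ne))
print("PO action:", a_po, "utility", utility(a_po, a_po))
print("Best deviation from PO:", a_dev, "utility", utility(a_dev, a_po))
```

The last line illustrates the incentive problem: against a consumer who keeps the sum-utility-maximizing action, the other consumer's best reply is a larger peak-hour fraction.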
3.3 Repeated energy consumption game
That is, the long-term utility is the normalized sum of the discounted expected stage utilities induced by the strategy, where the expectation is taken over the probability distribution of the sequences of action profiles \(\left \{{\boldsymbol {a}}^{t}\right \}_{t=1}^{\infty }\).
3.4 Problem formulation
The constraint requires that all consumers have incentives to follow the protocol Ψ.
4 Protocol based on review strategies
A possible incentive-compatible protocol is the grim-trigger strategy [4], which uses the strongest punishment that can be imposed upon deviation. In the grim-trigger strategy, as soon as there is any evidence that any consumer has deviated from a previous recommendation, the protocol designer recommends that each consumer consume a large amount of energy in peak hours from then on (i.e., revert to the NE action profile forever). In the two-consumer example, the protocol designer recommends that both consumers take the PO actions \(a^{\text{PO}}_{1}=a^{\text{PO}}_{2}=0.5\) in each billing cycle. If there is a deviation, the protocol designer recommends that both consumers take the NE actions \(a^{\text{NE}}_{1}=a^{\text{NE}}_{2}=2/3\) forever (i.e., until the consumers unsubscribe from this service). Therefore, the long-term utility that consumer 1 receives by following the recommendation is \(u^{\text {PO}}_{1} + \sum _{t=1}^{\infty } \delta ^{t} u^{\text {PO}}_{1}\), and the long-term utility by deviating is \({u^{d}_{1}} + \sum _{t=1}^{\infty } \delta ^{t} u^{\text {NE}}_{1}\), where \({u^{d}_{1}} > u^{\text {PO}}_{1}\) is the one-shot utility by deviation. Because \(u^{\text{PO}}_{1}>u^{\text{NE}}_{1}\), the grim-trigger strategy may provide sufficient incentives for the consumers to play the PO actions, namely when the discount factor δ is large enough. Since no consumer then wants to deviate, the system never enters the NE phase, thereby leading to the highest social welfare. However, if there is noise in the prices, the designer will almost surely conclude at some point that some consumer has deviated, even when nobody has, and hence the system will eventually enter the punishment phase with probability one and stay there indefinitely. This leads to the lowest social welfare. Therefore, a protocol which allows stopping the punishment after a certain time is needed in such imperfect monitoring scenarios.
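Both halves of this argument can be made concrete. The sketch below compares the two long-term utilities written above to obtain the smallest discount factor at which grim trigger deters a one-shot deviation, and then shows how quickly a noisy test ends the cooperative phase. The stage utility is the expression from the two-consumer example; the deviation action 5/6 (the best reply to 0.5 under that expression), the per-period false alarm probability, and the function names are assumptions made for illustration.

```python
from fractions import Fraction as F

def stage_utility(own, other):
    """Two-consumer example: u_n = 1 + a_n - 0.5 * [s^2 + 0.5 * (2 - s)^2]."""
    s = own + other
    return 1 + own - F(1, 2) * (s ** 2 + F(1, 2) * (2 - s) ** 2)

def critical_discount_factor(u_po, u_ne, u_dev):
    """Smallest delta with u_po/(1-delta) >= u_dev + delta*u_ne/(1-delta).

    Rearranging gives delta >= (u_dev - u_po) / (u_dev - u_ne).
    """
    return (u_dev - u_po) / (u_dev - u_ne)

def survival_probability(q_false_alarm, periods):
    """Probability that no false alarm has occurred after `periods` noisy tests."""
    return (1 - q_false_alarm) ** periods

u_po = stage_utility(F(1, 2), F(1, 2))   # both follow the PO recommendation
u_ne = stage_utility(F(2, 3), F(2, 3))   # both play the NE action
u_dev = stage_utility(F(5, 6), F(1, 2))  # assumed deviation: best reply to 1/2

delta_min = critical_discount_factor(u_po, u_ne, u_dev)
print("critical discount factor:", delta_min)

# With imperfect monitoring, cooperation under grim trigger does not last:
for periods in (10, 50, 200):
    print(periods, survival_probability(0.05, periods))
```

Since the survival probability decays geometrically for any positive false alarm probability, the permanent punishment is reached with probability one, which is the motivation for a punishment phase of finite length.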
4.1 Protocols based on review strategies
Importantly, note that the utilities obtained by the consumers in the punishment phase are lower than in the review phase, and they are the same as in the scenario where no review-strategy protocol is deployed. Therefore, the review-strategy-based protocol is guaranteed to achieve a system performance at least as good as that obtained without deploying the protocol. However, the designer still needs to carefully design the protocol such that the system enters the punishment phase as rarely as possible.
Given this review strategy structure, the protocol can be fully characterized by the length of the review phase (i.e., the billing cycle length) L, the length of the punishment phase KL and the statistical test G. Therefore, we write a protocol based on review strategies as Ψ(L,K,G).
Information and computation of protocol based on review strategies
The design stage  
1.  Each consumer reports d _{ n } and B(·) to designer. 
2.  Designer computes a ^{PO} and a ^{NE}. 
3.  Designer determines the protocol Ψ(L,K,G). 
4.  Designer informs each consumer about a ^{PO},a ^{NE},p(a ^{PO}), and Ψ(L,K,G). 
The operation stage (each billing cycle m):  
1.  Each consumer performs its load schedule \({a^{t}_{n}}\), ∀t=m L+1,…,(m+1)L. 
2.  Designer determines \(\hat {p}^{t}, \forall t = mL + 1,\ldots, (m+1)L\) based on the actual load schedule. 
3.  Each consumer is charged according to its load d _{ n } and the price \(\hat {p}^{t}, \forall t = mL + 1, \ldots, (m+1)L\) at the end of the current cycle. 
4.  The statistical test G is performed using \(\hat {p}^{t}\), ∀t=m L+1,…,(m+1)L to determine which phase (review or punishment) the next billing cycle is in. 
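The operation stage above can be sketched as a small state machine. This is an illustrative sketch only: it assumes that the test G compares the average daily price of the cycle, minus the expected price \(p({\boldsymbol{a}}^{\text{PO}})\), against a threshold `p_th`, and that a failed test triggers K billing cycles of punishment. The class and method names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ReviewProtocol:
    L: int                 # billing cycle (review phase) length in days
    K: int                 # punishment lasts K billing cycles, i.e. K * L days
    p_po: float            # expected daily price under the PO profile
    p_th: float            # threshold of the assumed average-price test
    punish_left: int = 0   # punishment cycles still to be served

    def recommended_action(self, a_po, a_ne):
        """PO action in a review phase, NE action in a punishment phase."""
        return a_ne if self.punish_left > 0 else a_po

    def end_of_cycle(self, daily_prices):
        """Process one finished billing cycle.

        Returns True if the next billing cycle is a review phase.
        """
        if len(daily_prices) != self.L:
            raise ValueError("expected one realized price per day of the cycle")
        if self.punish_left > 0:
            # A punishment cycle has been served; no test is run on its prices.
            self.punish_left -= 1
        else:
            average_excess = sum(daily_prices) / self.L - self.p_po
            if average_excess > self.p_th:
                # Test failed: start K cycles of punishment.
                self.punish_left = self.K
        return self.punish_left == 0

protocol = ReviewProtocol(L=3, K=2, p_po=0.75, p_th=0.1)
print(protocol.end_of_cycle([0.75, 0.80, 0.70]))  # prices near p_po: stay in review
print(protocol.end_of_cycle([1.00, 1.00, 1.00]))  # high prices: punishment starts
```

Keeping the phase in a single counter mirrors the fact that consumers only need the posted prices of the last cycle, plus the number of punishment cycles remaining, to know which action is recommended.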
4.2 Performance metrics
To evaluate the protocols based on review strategies, we need to define several performance metrics. First, the protocol should be IC since the consumers are self-interested and will only follow recommendations when this is in their self-interest. In particular, consumers should have incentives to take the PO scheduling actions in the review phase.
Proposition 1.
In a PO action profile, a self-interested consumer will not want to schedule less than the recommended energy consumption in peak hours in the stage game.
Proof.
If consumer n can increase its stage utility by choosing \({a^{d}_{n}} < a^{\text{PO}}_{n}\), then all other consumers’ utilities are also increased because the price is reduced as the aggregate usage in peak hours decreases. This contradicts the definition of Pareto optimality.
Proposition 1 states that, given the daily desired energy demand of a consumer, if the designer recommends the PO scheduling action to it, then the self-interested consumer will schedule no less than the PO energy consumption in peak hours in order to maximize its own utility, assuming that the other consumers comply with the recommended PO scheduling. Note that this applies even when the consumer’s daily demand is low, since the consumer is simply rescheduling its desired amount of energy consumption to different hours. In the two-consumer example, given the recommended PO action profile \(a_{1}=a_{2}=0.5\), a consumer can only increase its utility by scheduling more than half of its energy consumption in peak hours. Therefore, we only need to focus on the case \({a^{d}_{n}} > a^{\text {PO}}_{n}\) when considering consumers’ incentive problems, and we denote the corresponding utility by \({u^{d}_{n}}\) for consumer n. Note that consuming more energy in peak hours increases the price and, hence, consumer n’s own payment. However, because the costs are shared among the entire consumer set, consumer n is still able to receive a higher utility \({u^{d}_{n}} > u^{\text {PO}}_{n}\) by unilaterally increasing its own consumption in peak hours when \(\nabla_{n} u_{n}({\boldsymbol{a}}^{\text{PO}})>0\).
At the beginning of each billing cycle, the consumers determine the scheduling actions for this billing cycle based on the previous energy consumption history. We will focus on constant strategies during a billing cycle, i.e., consumer n chooses a constant scheduling action every day during a billing cycle. However, our analysis can also be extended to more sophisticated strategies (i.e., the consumer may use different scheduling actions during a billing cycle) by taking the equivalent average scheduling action. Let \(U_{n}(\sigma^{R} \mid s=1)\) and \(U_{n}(\sigma^{R} \mid s=0)\) denote the long-term utilities of consumer n at the beginning of a review phase and a punishment phase, respectively, if all consumers follow the strategy \(\sigma^{R}\). Let \(\tilde {\sigma }^{R}_{n}\) denote a strategy in which consumer n deviates to some \({a^{d}_{n}} > a^{\text {PO}}_{n}\) in the current review phase and follows the review strategy afterwards, while all other consumers follow \(\sigma^{R}\) all the time. The following proposition characterizes the IC condition of a protocol.
Proposition 2.
Proof.
This follows directly from the one-shot deviation principle in repeated games [4].
Proposition 2 uses the one-shot deviation principle in repeated games and shows that if a consumer cannot gain by unilaterally deviating from the recommended strategy \(\sigma^{R}\) in the current billing cycle and following it afterwards, then it cannot gain by switching to any other strategy either, and vice versa. In the punishment phase, the recommended strategy is the NE scheduling action profile \({\boldsymbol{a}}^{\text{NE}}\), and hence, the consumers will not have incentives to deviate from the recommended action. Therefore, we only need to check whether the consumers have incentives to take the recommended PO profile \({\boldsymbol{a}}^{\text{PO}}\), i.e., whether the one-shot deviation principle holds in the review phase. The left-hand side of (10) represents the long-term utility of consumer n when following \(\sigma^{R}\), and the right-hand side represents its long-term utility when deviating to \({a^{d}_{n}}\) only in the current review phase while all other consumers follow \(\sigma^{R}\). In Section 5.1, we will use Proposition 2 to construct IC protocols based on review strategies.
The goal of the protocol designer is to maximize the sum utility of all the consumers. The maximum sum utility is achieved when all consumers take the PO scheduling actions in each time slot and is denoted by \(V^{*} = \sum _{n} u_{n}^{\text {PO}}\). We call \(V^{*}\) the “first-best” utility. The efficiency loss of a protocol is defined by \(C(\Psi)=(V^{*}-V(\Psi))/V^{*}\).
Definition 3.
A protocol Ψ is said to be Δ-Pareto optimal (Δ-PO) if C(Ψ)<Δ.
A Δ-PO protocol yields a sum utility no less than a fraction 1−Δ of the first-best sum utility. If a protocol Ψ prescribes the scheduling action profile \({\boldsymbol{a}}^{\text{PO}}\) every day regardless of the history, then \(V(\Psi)=V^{*}\) and thus Ψ achieves the first-best sum utility (0-PO). However, such a protocol is not IC, and hence, the first-best is not achievable.
4.3 Statistical test
If some consumers deviate by consuming more than the recommended amount of energy during the peak hours, the price of that day is increased^{4}. Due to the monitoring noise, the actual price is \(\hat {p}\left ({a^{d}_{n}}, {\boldsymbol {a}}^{\text {PO}}_{-n}\right) = p\left ({a^{d}_{n}}, {\boldsymbol {a}}^{\text {PO}}_{-n}\right) + \epsilon \).

False alarm probability \(q_{F}(L,p_{\text{th}})\): the probability that z=0 when all consumers follow the recommended strategy \(\sigma^{R}\), i.e., they all schedule the optimal amount of energy consumption in peak hours.

Miss detection probability q _{ M,n }(L,p _{ th }): the probability that z=1 when consumer n deviates to the action \({a^{d}_{n}}\) in the previous billing cycle, i.e., it schedules more than the optimal amount of energy consumption in peak hours while all other consumers take the optimal scheduling.
Note that our design can also be extended to analyze the collusion problem where a subset of consumers colludes in order to gain higher utilities. In that case, we can simply take the consumers who collude as a single consumer with its demand being the aggregate demand.
The following proposition characterizes the impact of the threshold and the billing cycle length on the error probabilities.
Proposition 3.
\(\forall p_{\text {th}} \in \left (0, p\left ({a^{d}_{n}}, {\boldsymbol {a}}^{\text {PO}}_{-n}\right) - p\left ({\boldsymbol {a}}^{\text {PO}}\right)\right), \lim \limits _{L\to \infty } q_{F}\left (L, p_{\text {th}}\right) = 0\) and \(\lim \limits _{L\to \infty } q_{M,n}\left (L, p_{\text {th}}\right) = 0\).
Proof.
By the law of large numbers, the sample averages converge in probability and almost surely to the expected value as the number of samples tends to infinity. Therefore, when the threshold is larger than 0, the false alarm probability goes to 0, and when the threshold is smaller than \(p\left ({a^{d}_{n}}, {\boldsymbol {a}}^{\text {PO}}_{-n}\right) - p\left ({\boldsymbol {a}}^{\text {PO}}\right)\), the miss detection probability goes to 0.
Proposition 3 informs the protocol designer’s selection of the statistical test (i.e., the test threshold). It also shows that, in order to accurately detect self-interested consumers’ excessive usage in peak hours, a longer billing cycle should be chosen so that the effect of the monitoring noise is mitigated. However, a very long billing cycle also reduces the consumers’ valuation of utilities in the future billing cycles, and hence, the punishment may not be strong enough to induce consumers’ compliance. Therefore, the billing cycle length as well as the punishment length should be carefully designed to induce the optimal performance of the smart grid system.
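The effect of the billing cycle length on the two error probabilities can be illustrated in closed form under an additional assumption. The sketch below assumes i.i.d. zero-mean Gaussian daily noise with standard deviation `sigma` and a test that passes when the average daily price exceeds \(p({\boldsymbol{a}}^{\text{PO}})\) by at most `p_th`; neither the noise distribution nor the exact form of the test is taken from the text. The numbers (a price gap of 0.25, as obtained in the two-consumer example for a deviation to 5/6, and `sigma = 0.25`) are illustrative.

```python
import math

def gaussian_tail(x):
    """P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def false_alarm(L, p_th, sigma):
    """P(test fails | nobody deviates): the average of L noise terms exceeds p_th."""
    return gaussian_tail(p_th * math.sqrt(L) / sigma)

def miss_detection(L, p_th, price_gap, sigma):
    """P(test passes | deviation): price_gap plus the average noise stays below p_th.

    price_gap is the expected price increase caused by the deviation.
    """
    return gaussian_tail((price_gap - p_th) * math.sqrt(L) / sigma)

price_gap, sigma, p_th = 0.25, 0.25, 0.125   # threshold strictly inside (0, price_gap)
for L in (1, 10, 30, 100):
    print(L, false_alarm(L, p_th, sigma), miss_detection(L, p_th, price_gap, sigma))
```

Under these assumptions the averaged noise has standard deviation `sigma / sqrt(L)`, so both error probabilities shrink as L grows whenever the threshold lies strictly between 0 and the price gap.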
5 Design and performance analysis

We first establish the existence of IC protocols based on review strategies and provide the IC conditions such that the consumers have incentives to perform the recommended scheduling in the review phase.

Next, we propose a greedy algorithm to determine the optimal design of review strategies.

We then evaluate the performance of the optimal protocol based on review strategies. Specifically, the proposed protocol is able to achieve Δ-PO of the first-best efficiency.
5.1 Incentive compatibility
The first term in (15) is the utility gain by the deviation in the current review phase which is larger than \(u^{\text {PO}}_{n}\). The second term is the continuation utility after the review phase. With probability 1−q _{ M,n }(L,p _{th}) the deviation is detected, so the system moves to a punishment phase; with probability q _{ M,n }(L,p _{th}) the system remains in the review phase.
Let us examine the physical meaning of this incentive ratio. Essentially, the numerator represents the long-term utility gain due to deviation, and the denominator represents the maximal long-term utility loss due to the punishment. To induce consumers to cooperatively optimize their energy consumption, this loss should be positive and larger than the gain obtained when deviating. Therefore, the incentive ratio should be in the range [0,1). Note that \(g_{n}(L,p_{\text{th}})\) should be strictly less than 1 since the denominator is only an upper bound on the loss induced by the punishment, not the actual loss (which depends on \(L, K, p_{\text{th}}\)). Theorem 2 provides a condition under which a protocol based on review strategies is IC against a deviation action \({a^{d}_{n}}\). This condition serves as a guideline for the choice of the protocol parameters L, K, and \(p_{\text{th}}\).
Theorem 2.
Proof.
The necessary and sufficient condition for the above to hold is 0≤g _{ n }(L,p _{th})<1 and K L≥ log _{δ}(1−g _{ n }(L,p _{th})).
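As a minimal sketch of how this condition can be used, the smallest integer punishment length K for a given review length L follows directly from the inequality above. Because 0<δ<1, the condition is equivalent to δ^{KL}≤1−g _{ n }(L,p _{th}). The function name and argument names below are illustrative assumptions, and the incentive ratio g is taken as an input rather than computed.

```python
import math


def min_punishment_length(g: float, delta: float, L: int) -> int:
    """Smallest integer K with K * L >= log_delta(1 - g).

    g     : incentive ratio g_n(L, p_th), assumed already computed
    delta : discount factor, 0 < delta < 1
    L     : review phase length (positive integer)
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    if not 0.0 <= g < 1.0:
        raise ValueError("the IC condition requires 0 <= g < 1")
    if L < 1:
        raise ValueError("L must be a positive integer")
    # log base delta of (1 - g); non-negative since both logarithms are <= 0
    required = math.log(1.0 - g) / math.log(delta)
    return math.ceil(required / L)


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the paper
    print(min_punishment_length(g=0.5, delta=0.9, L=2))
```

With these illustrative values, δ^{KL}=0.9^{8}≈0.43≤0.5 holds for K=4, whereas K=3 gives 0.9^{6}≈0.53 and violates the condition.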
5.2 Optimal protocol parameters
We first determine the efficiency of a given protocol. If a protocol is IC, then all consumers follow the recommended strategy and play the PO action profile. Therefore, the efficiency depends on the probability that the system is in review phases and punishment phases due to monitoring errors. Denote these two probabilities by η _{ R }(Ψ) and η _{ P }(Ψ)=1−η _{ R }(Ψ), respectively. The efficiency of an IC protocol is thus \(V(\Psi) = \sum _{n} (u^{\text {PO}}_{n} \eta _{R}(\Psi) + u^{\text {NE}}_{n} \eta _{P}(\Psi))\) where η _{ R }(Ψ) is determined in the following Lemma.
Lemma 1.
\(\eta_{R}(\Psi) = \frac{1}{1 + K q_{F}\left(p_{\text{th}}, L\right)}\)
Proof.
Lemma 1 implies that in order to maximize the system efficiency, the protocol designer should choose L,K,p _{th} such that K q _{ F }(L,p _{th}) is minimized subject to the IC conditions in Theorem 2. These design parameters are coupled in a complex manner, and thus, to find the optimal design parameters, it is better to work backwards.
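The efficiency expression and the review-phase probability of Lemma 1 can be evaluated numerically as in the following sketch. The function names are assumptions made for illustration; the false alarm probability q _{ F } is treated as a given input.

```python
from typing import Sequence


def review_fraction(K: float, q_F: float) -> float:
    """eta_R = 1 / (1 + K * q_F), as stated in Lemma 1."""
    return 1.0 / (1.0 + K * q_F)


def efficiency(u_po: Sequence[float], u_ne: Sequence[float],
               K: float, q_F: float) -> float:
    """V(Psi) = sum_n (u_PO_n * eta_R + u_NE_n * eta_P), with eta_P = 1 - eta_R."""
    eta_r = review_fraction(K, q_F)
    eta_p = 1.0 - eta_r
    return sum(po * eta_r + ne * eta_p for po, ne in zip(u_po, u_ne))


if __name__ == "__main__":
    # Two consumers with illustrative utilities (not taken from the paper)
    print(review_fraction(K=4, q_F=0.25))
    print(efficiency(u_po=[2.0, 4.0], u_ne=[1.0, 2.0], K=4, q_F=0.25))
```

The sketch makes the design objective concrete: for fixed utilities, the efficiency depends on the protocol only through the product K q _{ F }.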
In general, the statistical test threshold p _{th} has two opposite effects on K ^{∗}(L,p _{th})q _{ F }(L,p _{th}). Minimizing q _{ F }(L,p _{th}) often leads to a large q _{ M,n }(L,p _{th}), and hence, q _{ F }(L,p _{th})+q _{ M,n }(L,p _{th}) may also be large. By (16), this induces a large K ^{∗}(L,p _{th}). Therefore, when selecting p _{ th }, the protocol designer must trade off minimizing the false alarm probability q _{ F }(L,p _{ th }) against minimizing the punishment phase length K ^{∗}(L,p _{th}).
Step 3. The previous two steps provide the optimal \(p^{*}_{\text {th}}\) and \(K^{*}(L, p_{\text {th}}^{*})\) given the review phase length L. However, the space of L comprises all positive integers and is therefore infinite. In the following, we determine the upper bound on L such that an IC protocol can be designed.
Proposition 4.
Proof.
If (22) does not hold, then there must exist \(n \in \mathcal {N}\) such that g _{ n }(L,p _{th})>1 which violates the IC condition in Theorem 2. Therefore, for a protocol to be IC, (22) must be satisfied.
Proposition 4 leads to a crucial tradeoff in the review phase length. On the one hand, the protocol designer wants to choose a longer review phase L because it improves the monitoring accuracy, and hence, there is a smaller probability that the system goes into a punishment phase due to monitoring errors. On the other hand, a longer review phase increases the current gain of a consumer in the review phase upon deviation while it reduces the future loss due to the punishment because of the discounting of the future utility. This requires a stronger punishment (longer punishment phase), and hence, it induces more energy consumption in peak hours as the system stays longer in the punishment phase. More importantly, if the review phase is too long, then even the strongest punishment (i.e., a trigger strategy that prescribes staying in the punishment phase forever upon deviations) is not able to provide sufficient incentives for the consumers to schedule the PO energy consumption in the current review phase. Therefore, given consumers’ valuation of the future utility (i.e., the discount factor δ) and the structure of the stage game (i.e., \({u^{d}_{n}}, u^{\text {PO}}_{n}, u^{\text {NE}}_{n},\forall n \in \mathcal {N}\)), there is a maximum length of the review phase, and hence, the billing cycle should not be too long. To make the protocol IC, the protocol designer must choose a review phase no longer than the upper bound determined in Proposition 4. The optimal review phase length L (i.e., billing cycle) is thus the one that minimizes the product of \(K^{*}\left (L, p^{*}_{\text {th}}\right)\) and \(q_{F}\left (p^{*}_{\text {th}}, L\right)\), which have been determined in the previous two steps for a fixed L. Based on the above three design steps, we propose a greedy algorithm (presented in Table 2) to determine the optimal protocols based on review strategies, which requires only a finite number of iterations on L.
If the candidate threshold p _{th} belongs to a continuous interval, then we need to quantize p _{th} to solve (19) in finite iterations, and hence, the algorithm leads to a suboptimal protocol.
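The three design steps can be illustrated by a simple search over a bounded range of L and a quantized set of thresholds. The sketch below is an exhaustive grid search written for illustration and is not the algorithm of Table 2. The callables `q_false_alarm` and `incentive_ratio`, the bound `L_max` (standing in for the upper bound of Proposition 4), and the use of the worst-case incentive ratio over consumers are all assumptions.

```python
import math
from typing import Callable, Iterable, Optional, Tuple

Design = Tuple[float, int, int, float]  # (K * q_F, L, K, p_th)


def search_protocol(delta: float,
                    L_max: int,
                    thresholds: Iterable[float],
                    q_false_alarm: Callable[[int, float], float],
                    incentive_ratio: Callable[[int, float], float]
                    ) -> Optional[Design]:
    """Return the IC design minimizing K * q_F over the grid, or None.

    q_false_alarm(L, p_th)   -> false alarm probability q_F (hypothetical)
    incentive_ratio(L, p_th) -> worst-case g_n over consumers (hypothetical)
    """
    grid = list(thresholds)
    best: Optional[Design] = None
    for L in range(1, L_max + 1):
        for p_th in grid:
            g = incentive_ratio(L, p_th)
            if not 0.0 <= g < 1.0:
                continue  # IC condition cannot be met for this (L, p_th)
            # Smallest K with K * L >= log_delta(1 - g)
            K = math.ceil(math.log(1.0 - g) / math.log(delta) / L)
            objective = K * q_false_alarm(L, p_th)
            if best is None or objective < best[0]:
                best = (objective, L, K, p_th)
    return best


if __name__ == "__main__":
    # Toy model functions, chosen only to exercise the search
    result = search_protocol(
        delta=0.9,
        L_max=3,
        thresholds=[1.0, 2.0],
        q_false_alarm=lambda L, p: 1.0 / (L * p),
        incentive_ratio=lambda L, p: 0.3 * L,
    )
    print(result)
```

As noted above, when p _{th} is drawn from a quantized grid, the result of such a search is in general only an approximation of the optimal protocol.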
5.3 Performance evaluation
In the previous subsection, we determined the optimal protocol design. In this subsection, we characterize the performance of these optimal protocols. Specifically, we are interested in determining whether a protocol based on review strategies Ψ can be Δ-PO.
Recall that the first-best efficiency is \(V^{*} = \sum _{n} u^{\text {PO}}_{n}\), and the sum utility of a given IC protocol can be determined using the result in Proposition 3. Therefore, if η _{ R }(Ψ) is close enough to 1, then the relative efficiency loss (V ^{∗}−V(Ψ))/V ^{∗} can be made less than a given Δ. According to Proposition 3, this can be done if q _{ F } is close enough to 0 and if K is finite. That is, an accurate enough statistical test is required. By Proposition 4, if L is long enough, then the monitoring can be accurate enough. However, a long L can be chosen only when the consumers value their future utilities sufficiently highly. The following theorem characterizes the condition under which a protocol is Δ-PO.
Theorem 3.
\(\forall p_{\text{th}} \in \left[0, p^{d}_{n} - p^{\text{PO}}_{n}\right]\), for a given Δ∈[0,1], there exists δ _{min}∈(0,1), such that ∀δ≥δ _{min} there exists a protocol Ψ(L,K,p _{th}) that is Δ-PO.
Proof.
Note that Δ∈(0,1)→A∈(0,1), Δ=0→A=1, and Δ=1→A=0.
Such \(\bar {L}\) exists due to the law of large numbers (see Proposition 3).
This implies that there exists a δ _{min} close to 1 such that for every δ≥δ _{min} the above inequality defines an IC protocol (the inequality must be strict). Since (28) is exactly the same as (25), the constructed protocol \(\Psi (\bar {L}, \bar {K}, \bar {p}_{\text {th}})\) is IC for δ≥δ _{min}.
Theorem 3 characterizes the asymptotic performance of the protocols based on review strategies. Specifically, when the consumers highly value their future utilities, the protocol designer can design a protocol based on review strategies whose sum utility is close enough to the first-best. That is, consumers will comply with the recommended scheduling most of the time.
Corollary 1.
If δ→1, then the efficiency loss of the optimal protocol goes to 0.
The corollary states that if the consumers do not discount their future utilities, then the protocol designer can design a review-strategy-based protocol which is IC and also asymptotically achieves the full efficiency.
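As a worked illustration, the link between Δ and the protocol parameters can be made explicit by combining the efficiency expression V(Ψ) with Lemma 1. The shorthand \(V^{\text{NE}} = \sum_{n} u^{\text{NE}}_{n}\) and the symbol \(\rho = (V^{*} - V^{\text{NE}})/V^{*}\) are introduced here only for this illustration. The derivation assumes that Δ-PO is read as a relative efficiency loss of at most Δ and that \(V^{*} > V^{\text{NE}}\) and \(V^{*} > 0\).

\[
V^{*} - V(\Psi) = \eta_{P}(\Psi)\left(V^{*} - V^{\text{NE}}\right), \qquad \eta_{P}(\Psi) = 1 - \eta_{R}(\Psi) = \frac{K q_{F}}{1 + K q_{F}}
\]

\[
\frac{V^{*} - V(\Psi)}{V^{*}} = \rho\,\frac{K q_{F}}{1 + K q_{F}} \leq \Delta
\quad \Longleftrightarrow \quad
K q_{F} \leq \frac{\Delta}{\rho - \Delta} \qquad (\Delta < \rho)
\]

Under these assumptions, any Δ≥ρ is met by every IC protocol, while for Δ<ρ the tolerable product K q _{ F } shrinks to 0 as Δ→0, which is consistent with the requirement of a small false alarm probability and a finite K.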
Remark: Our analysis depends on the deviation strategy a ^{ d }. To compute \({a^{d}_{n}}\), we consider the unilateral deviation by consumer n at the optimal action profile a ^{OPT}, i.e., \({a^{d}_{n}} = a^{*}_{n} = \arg \max _{a_{n}} \mu _{n}\left (a_{n}; {\boldsymbol {a}}^{\text {OPT}}_{-n}\right)\). In this way, a consumer chooses to deviate to the action that maximizes its own utility. However, if the consumer is “smarter”, it can deviate to a slightly lower action than \(a^{*}_{n}\) to avoid being detected while still gaining some increased utility (of course, since there is noise, the probability of being detected is higher if its selected action is closer to \({a^{d}_{n}}\)). Hence, a more practical way to set \({a^{d}_{n}}\) is to use a maximal tolerable deviation action \({a^{d}_{n}} = (1+\gamma)a^{\text {OPT}}_{n}\), where γ<1 depends on the maximal tolerable social welfare loss. Since, by doing this, the (one-shot) social welfare loss is at most \(u_{n}\left ({\boldsymbol {a}}^{\text {OPT}}_{n}\right) - u_{n}\left ((1+\gamma){\boldsymbol {a}}^{\text {OPT}}\right)\), the designer can determine γ according to the maximal tolerable social welfare loss and set \({a^{d}_{n}}\) accordingly.
6 Simulations
In this section, we present numerical results and assess the performance of our proposed framework. For illustration purposes, we assume that the benefit function takes the linear form B _{ n }(a _{ n })=b _{peak} a _{ n } d _{ n }+b _{off-peak}(1−a _{ n })d _{ n }, and that the cost functions take the quadratic forms [3] C _{peak}(x)=c _{ p } x ^{2}, C _{off-peak}(x)=0.5c _{ p } x ^{2}. We regard c _{ p } as a parameter determined by the power generation technology and the power market and investigate its impact on the protocol design and performance.
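The benefit and cost functions used in the simulations can be written down directly, as in the following sketch. It assumes that a _{ n } is the fraction of demand d _{ n } scheduled in peak hours and that the peak and off-peak loads are the sums over consumers; the function names and the numerical values are illustrative only.

```python
from typing import Sequence


def benefit(a: float, d: float, b_peak: float, b_offpeak: float) -> float:
    """Linear benefit B_n(a_n) = b_peak * a_n * d_n + b_offpeak * (1 - a_n) * d_n."""
    return b_peak * a * d + b_offpeak * (1.0 - a) * d


def cost_peak(x: float, c_p: float) -> float:
    """Quadratic peak-hour cost C_peak(x) = c_p * x^2."""
    return c_p * x ** 2


def cost_offpeak(x: float, c_p: float) -> float:
    """Quadratic off-peak cost C_offpeak(x) = 0.5 * c_p * x^2."""
    return 0.5 * c_p * x ** 2


def total_cost(actions: Sequence[float], demands: Sequence[float],
               c_p: float) -> float:
    """Total generation cost, assuming loads aggregate additively."""
    x_peak = sum(a * d for a, d in zip(actions, demands))
    x_off = sum((1.0 - a) * d for a, d in zip(actions, demands))
    return cost_peak(x_peak, c_p) + cost_offpeak(x_off, c_p)


if __name__ == "__main__":
    print(benefit(a=0.5, d=4.0, b_peak=2.0, b_offpeak=1.0))
    print(total_cost(actions=[0.5, 0.5], demands=[4.0, 4.0], c_p=0.5))
```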
Figure 5 shows how the optimal design of the billing cycle varies depending on consumers’ valuation of future utilities (reflected by the discount factor δ). On the one hand, a longer billing cycle improves the monitoring accuracy, and therefore, the system has a smaller probability of going into the punishment phase. Hence, the efficiency is increasing in the billing cycle. On the other hand, an overly long billing cycle reduces the consumers’ valuation of future utilities and, beyond some point, it violates the consumers’ incentives to comply. Moreover, the optimal billing cycle depends on how much the consumers value the future utilities, which is parameterized by the discount factor δ. When δ is large, consumers place more value on their future utilities, and hence, a longer billing cycle can be used.
In Fig. 8, we consider a set of eight consumers with various load demands. The first subplot illustrates the consumers’ energy consumption scheduling actions with and without the proposed protocol deployment. With the deployment of the protocol, consumers tend to consume less energy in peak hours. At the same time, their individual utilities are increased as illustrated in the second subplot. This is because the energy cost is smaller, and hence, the bills charged to consumers are smaller. The third subplot further shows the prices with respect to various discount factors. Note that the price is proportional to the energy cost. Therefore, the lower the price is, the more energy-efficient the system is. When the consumers place more value on the future utilities (i.e., the discount factor is larger), they tend to schedule less energy consumption in peak hours, and therefore, the price is also lower.
Figure 9 illustrates a real-time curve of the prices for the considered set of consumers. Note that the price also reflects the energy generation and transmission cost of this consumer set. The energy price is low in the review phase since consumers schedule less power in peak hours. Due to the imperfect monitoring, if the average price of the previous billing cycle exceeds the predetermined threshold, the system goes into the punishment phase. In the punishment phase, the energy price is high because consumers schedule more consumption in peak hours. When the punishment phase ends, the system automatically goes back to a new review phase and the price drops again.
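The alternation between review and punishment phases described above can be sketched as a small simulation. This is an illustration under stated assumptions and not the simulation setup of the paper: the daily price is modeled as a fixed level plus Gaussian noise, the punishment phase is taken to last K·L days (consistent with the review-phase probability in Lemma 1), and all names and numerical values are hypothetical.

```python
import random
from typing import List, Tuple


def simulate_prices(num_days: int, L: int, K: int, p_th: float,
                    p_po: float, p_ne: float, noise_std: float,
                    seed: int = 0) -> Tuple[List[float], List[str]]:
    """Simulate daily prices under a review strategy.

    A review phase lasts L days at price level p_po. If the average price
    over the review phase exceeds p_th, a punishment phase of K * L days
    at price level p_ne follows; then a new review phase starts.
    """
    rng = random.Random(seed)
    prices: List[float] = []
    phases: List[str] = []
    while len(prices) < num_days:
        review = [p_po + rng.gauss(0.0, noise_std) for _ in range(L)]
        prices.extend(review)
        phases.extend(["review"] * L)
        if sum(review) / L > p_th:  # statistical test fails
            punish = [p_ne + rng.gauss(0.0, noise_std) for _ in range(K * L)]
            prices.extend(punish)
            phases.extend(["punish"] * (K * L))
    # Truncate to the requested horizon
    return prices[:num_days], phases[:num_days]


if __name__ == "__main__":
    prices, phases = simulate_prices(num_days=60, L=5, K=2, p_th=1.2,
                                     p_po=1.0, p_ne=2.0, noise_std=0.3)
    share = phases.count("review") / len(phases)
    print(f"share of days in review phase: {share:.2f}")
```

Raising the threshold p _{th} or lengthening L in this sketch reduces how often noise alone triggers a punishment phase, which mirrors the false alarm tradeoff discussed in Section 5.2.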
7 Conclusions
In this paper, we augment existing DSM schemes using aggregate usage information of a set of participating consumers by proposing a novel framework which incentivizes consumers to reduce their consumption in peak hours. The proposed technology consists of review-strategy-based protocols. It is general and can be deployed in conjunction with any DSM scheme proposed for smart grids to make it incentive-compatible (i.e., selfish consumers find it in their self-interest to follow it). The proposed protocols are simple, and thus, they are suitable for practical implementation in smart grids where the energy scheduling actions cannot be perfectly monitored. Even though this paper considers a simplified smart grid model, our analysis provides important and useful insights for designing incentive-compatible demand-side management schemes based on aggregate energy usage information in a variety of practical scenarios.
The success of DSM relies heavily on the energy producer’s knowledge of individual consumers’ energy usage information. However, a huge concern is that smart meters threaten consumers’ privacy as data mining techniques are applied to energy consumption traces in order to infer consumers’ habits and behaviors [2, 23, 24]. This information may be used for other purposes besides improving the efficiency of smart grids, thereby giving rise to privacy concerns [25]. To respond to these concerns, in 2010, California’s new smart meter privacy law [26] was adopted, which mandates privacy protection for the consumers’ energy consumption data. The proposed scheme does not require detailed knowledge of household electricity demand profiles at the appliance level or for a fine time granularity. Instead, it only needs the aggregate usage pattern of individual consumers at large time scales, e.g., the aggregate usage in peak hours of a day and the aggregate usage in off-peak hours of a day. This information is very limited for the detection of the actual consumption traces of individual consumers, and hence, we believe that the proposed scheme can be applied in a wide range of practical deployment scenarios.
8 Endnotes
^{1} The proposed protocol can also be extended to directly recommend per-appliance scheduling actions. However, such an extension would only complicate the notation and presentation of the proposed methods while the proposed methods will remain unchanged.
^{2} In practice, the utility function may dynamically change over time. In this paper, we make the common assumption that the utility function is fixed. One way to handle the dynamically changing utility functions is by periodically updating the design of the proposed protocol to adapt to the changes.
^{3} The proposed method allows both continuous and discrete value spaces of actions. For illustration purposes, we consider only a continuous value space in this paper.
^{4} If consumers consume less than the recommended amount of energy in peak hours, the price is reduced. Since self-interested consumers only have incentives to deviate to a higher consumption in peak hours (by Proposition 1), we do not regard consuming less in peak hours as a deviation.
Declarations
Acknowledgements
This research is supported by NSF CCF 1218136.
Authors’ Affiliations
References
 SM Amin, BF Wollenberg, Toward a smart grid: power delivery for the 21st century. IEEE Power Energy Mag. 3(5), 34–41 (2005).
 Z Fan, P Kulkarni, S Gormus, C Efthymiou, G Kalogridis, M Sooriyabandara, Z Zhu, S Lambotharan, WH Chin, Smart grid communications: overview of research challenges, solutions, and standardization activities. IEEE Commun. Surv. Tutorials. 15(1), 21–38 (2013).
 AH Mohsenian-Rad, VW Wong, J Jatskevich, R Schober, A Leon-Garcia, Autonomous demand-side management based on game-theoretic energy consumption scheduling for the future smart grid. IEEE Trans. Smart Grid. 1(3), 320–331 (2010).
 GJ Mailath, L Samuelson, Repeated Games and Reputations, vol. 2 (Oxford University Press, Oxford, UK, 2006).
 CW Gellings, JH Chamberlin, Demand-side management: concepts and methods (The Fairmont Press Inc., Lilburn, GA, 1987).
 H Allcott, Real time pricing and electricity markets (Harvard University, 2009).
 C Triki, A Violi, Dynamic pricing of electricity in retail markets. 4OR. 7(1), 21–36 (2009).
 K Herter, Residential implementation of critical-peak pricing of electricity. Energy Policy. 35(4), 2121–2130 (2007).
 B Ramanathan, V Vittal, A framework for evaluation of advanced direct load control with minimum disruption. IEEE Trans. Power Syst. 23(4), 1681–1688 (2008).
 M Alizadeh, A Scaglione, RJ Thomas, From packet to power switching: digital direct load scheduling. IEEE J. Selected Areas Commun. 30(6), 1027–1036 (2012).
 L Jia, L Tong, in Communication, Control, and Computing (Allerton), 2012 50th Annual Allerton Conference on. Optimal pricing for residential demand response: a stochastic optimization approach (IEEE, Allerton Park, IL, USA, 2012), pp. 1879–1884.
 HI Su, A El Gamal, in Communication, Control, and Computing (Allerton), 2011 49th Annual Allerton Conference on. Modeling and analysis of the role of fast-response energy storage in the smart grid (IEEE, Allerton Park, IL, USA, 2011), pp. 719–726.
 L Huang, J Walrand, K Ramchandran, in Smart Grid Communications (SmartGridComm), 2012 IEEE Third International Conference on. Optimal demand response with energy storage management (IEEE, Tainan, ROC, 2012), pp. 61–66.
 Z Yu, L Jia, MC Murphy-Hoye, A Pratt, L Tong, Modeling and stochastic control for home energy management. IEEE Trans. Smart Grid. 4(4), 2244–2255 (2013).
 N Li, L Chen, SH Low, in Power and Energy Society General Meeting, 2011 IEEE. Optimal demand response based on utility maximization in power networks (IEEE, San Diego, CA, USA, 2011), pp. 1–8.
 C Joe-Wong, S Sen, S Ha, M Chiang, Optimized day-ahead pricing for smart grids with device-specific scheduling flexibility. IEEE J. Selected Areas Commun. 30(6), 1075–1085 (2012).
 C Ibars, M Navarro, L Giupponi, in Smart grid communications (SmartGridComm), 2010 first IEEE international conference on. Distributed demand management in smart grid with a congestion game (IEEE, Gaithersburg, MD, USA, 2010), pp. 495–500.
 HK Nguyen, JB Song, Z Han, in Computer Communications Workshops (INFOCOM WKSHPS), 2012 IEEE Conference on. Demand side management to reduce peak-to-average ratio using game theory in smart grid (IEEE, 2012), pp. 91–96.
 L Song, Y Xiao, M van der Schaar, Demand side management in smart grids using a repeated game framework. IEEE J. Selected Areas Commun. 32, 1412–1424 (2014).
 R Radner, Repeated principal-agent games with discounting. Econometrica: J. Econ. Soc. 53(5), 1173–1198 (1985).
 JB Rosen, Existence and uniqueness of equilibrium points for concave n-person games. Econometrica: J. Econ. Soc. 33(3), 520–534 (1965).
 AJ Wood, BF Wollenberg, Power Generation, Operation, and Control (Wiley, USA, 2012).
 P McDaniel, S McLaughlin, Security and privacy challenges in the smart grid. IEEE Secur. Privacy. 7(3), 75–77 (2009).
 MK El Mahrsi, S Vignes, G Hébrail, ML Picard, in Research Challenges in Information Science, 2009. RCIS 2009. Third International Conference on. A data stream model for home device description (IEEE, Fez, Morocco, 2009), pp. 395–402.
 G Kalogridis, C Efthymiou, SZ Denic, TA Lewis, R Cepeda, in Smart Grid Communications (SmartGridComm) 2010. First IEEE International Conference on. Privacy for smart meters: towards undetectable appliance load signatures (IEEE, Gaithersburg, MD, USA, 2010), pp. 232–237.
 C King, California’s new landmark smart meter privacy law (2010). http://www.emeter.com/2010/californiasnewlandmarksmartmeterprivacylaw/.
Copyright
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.