Incentive-compatible demand-side management for smart grids based on review strategies
EURASIP Journal on Advances in Signal Processing volume 2015, Article number: 51 (2015)
Abstract
Demand-side load management can significantly improve the energy efficiency of smart grids. Since the electricity production cost depends on the aggregate energy usage of multiple consumers, an important incentive problem emerges: self-interested consumers want to increase their own utilities by consuming more than the socially optimal amount of energy during peak hours, since the increased cost is shared among the entire set of consumers. To incentivize self-interested consumers to take the socially optimal scheduling actions, we design a new class of protocols based on review strategies. These strategies work as follows: first, a review stage takes place in which a statistical test is performed on the daily prices of the previous billing cycle to determine whether or not the other consumers scheduled their electricity loads in a socially optimal way. If the test fails, the consumers trigger a punishment phase in which, for a certain time, they adjust their energy scheduling so that everybody in the consumer set is punished by an increased price. Under a carefully designed protocol based on such review strategies, consumers then have incentives to adopt the socially optimal load scheduling to avoid entering this punishment phase. We rigorously characterize the impact of deploying protocols based on review strategies on the performance of both the system and the users, and determine the optimal design (optimal billing cycle, punishment length, etc.) for various smart grid deployment scenarios. Even though this paper considers a simplified smart grid model, our analysis provides important and useful insights for designing incentive-compatible demand-side management schemes based on aggregate energy usage information in a variety of practical scenarios.
Introduction
Governments and relevant industries are making a significant effort to develop next-generation energy grids (“smart grids”) that meet new environmental requirements as well as increased usage demands [1]. To address these challenges, demand-side management (DSM) techniques for smart grids (see, e.g., [1, 2]) have been proposed as a way to significantly save energy. In a typical configuration, the energy producer (e.g., a utility company) periodically receives usage information from the smart meters affiliated with consumers via a communication network. The energy producer then manages energy generation/purchase/transmission and bills the consumers based on this usage information. To save energy, reduce cost, and increase reliability, the energy producer can, for instance, use smart pricing to encourage consumers to shift peak-hour consumption to off-peak hours [3].
In this paper, we focus on DSM that operates on aggregate information about the real-time energy usage of a set of consumers rather than of each individual consumer (see Fig. 1). Aggregate information may be used instead of individual consumers’ information for various reasons, e.g., the high cost of large-scale deployment of smart meters in individual houses. Specifically, the utility company generates/purchases/transmits electricity according to the real-time aggregate usage information of a set of consumers and charges the consumers for their electricity usage at a price proportional to the cost of energy generation/purchase/transmission. Since the cost incurred in peak hours is higher than in off-peak hours, the price is higher if consumers schedule more loads in peak hours. If consumers were cooperative, such a pricing scheme would result in socially optimal load scheduling, meaning that the sum utility of the consumers is maximized. However, the self-interested nature of individual consumers poses important challenges for the deployment of such solutions. Because each consumer’s utility (i.e., the benefit from energy consumption minus the payment) depends on the other consumers’ scheduling actions, an energy consumption game emerges: self-interested consumers may want to consume more energy in peak hours while paying a relatively low cost, since this cost is shared among all the participating consumers. Therefore, a crucial problem for such smart grid systems is how to incentivize individual consumers to perform the socially optimal load scheduling in order to maximize the overall grid performance. In this paper, we augment existing DSM schemes by proposing a novel technology that incentivizes consumers to reduce their consumption in peak hours.
This technology is general and can be deployed in conjunction with many utility-function-based DSM schemes proposed for smart grids, so that selfish consumers find it in their self-interest to follow the DSM scheme.
We model the participating consumers’ interactions as a repeated game [4] in which the energy consumption game is played repeatedly (e.g., every day). We assume that the energy delivery system deploys a protocol designed by the utility company or a third party with the aim of maximizing system efficiency (in this paper, the sum utility of the consumers). Besides designing a smart pricing scheme based on the aggregate usage pattern of the consumers, this protocol designer also constructs a protocol that recommends scheduling actions to consumers based on past prices, which depend on the history of all the consumers’ energy consumption and scheduling. The protocol designer is active only during the design stage and is passive at run time. In implementing the protocol, the utility company observes the (aggregate) energy consumption pattern of the consumer set every day and, based on this, performs billing once every billing cycle. The consumers use the past prices to determine their future energy consumption scheduling actions in a self-interested and completely decentralized manner. Because the protocol designer is passive at run time, it cannot oblige the consumers to follow the recommended energy scheduling. The consumers will adopt the recommended consumption scheduling only if it is in their self-interest to do so, i.e., if they are better off following the recommended protocol than deviating from it. Such a protocol is called incentive-compatible (IC).
To ensure that self-interested consumers take the recommended scheduling actions, the protocol designer incorporates into its recommendation a punishment phase that is triggered when deviations are observed. The protocol designer designs a review-strategy-based protocol consisting of two phases: the review phase and the punishment phase (its operation is depicted in Fig. 2). In the review phases, the consumers are recommended to take the socially optimal scheduling actions. At the end of each review phase, a statistical test is performed (either by the consumers or by the utility company) on the prices announced by the utility company. If the test fails (i.e., there is evidence that some consumers did not follow the recommendation), the system enters a punishment phase in which, for a certain time, the consumers carry out the strongest punishment that can be imposed on a self-interested consumer. The punishment is implemented by the participating consumers through a different energy consumption schedule. In particular, upon a deviation, all consumers shift more of their energy consumption to peak hours and, hence, the prices for the next billing cycles rise. Therefore, the consumer who deviated is punished by receiving a low utility.
Two points are worth noting. First, even though the consumers who did not deviate also receive low utilities due to the raised prices, the protocol is designed so that consumers still want to carry out the punishment. Second, the utilities obtained by the consumers in the punishment phase are lower than those in the review phase; however, they equal the utilities obtained when no review-strategy protocol is deployed. Therefore, the review-strategy-based protocol is guaranteed to achieve better system performance than can be obtained without deploying the protocol. Importantly, the protocol designer needs to design the protocol carefully so that punishment does not take place too frequently, since it reduces all consumers’ utilities. We rigorously characterize the performance of deploying review strategies and determine the optimal design (optimal billing cycle, punishment phase length, etc.) for various smart grid deployment scenarios. It is also important to note that, besides the smart pricing scheme used for illustration in this paper, the proposed review-strategy-based protocol can be deployed in conjunction with any DSM scheme, including schemes that take fine-grained load scheduling (e.g., hourly scheduling) into account. The scheduling results of any DSM scheme can serve as the input to our design framework, and the output is the review-strategy-based protocol.
The rest of this paper is organized as follows. In Section 2, related works are discussed. Section 3 models the repeated energy consumption game and formulates the design problem of the DSM with usage data aggregation. Section 4 formally introduces the proposed incentive protocol based on review strategies. Section 5 determines the optimal protocol parameters and evaluates its performance. Section 6 provides simulation results. Section 7 concludes the paper.
Related works
A main issue for the efficient deployment of smart grids is the design of DSM [5]. A large body of literature assumes the deployment of smart meters and designs smart pricing schemes to encourage individual consumers to manage their own loads (e.g., by shifting their energy consumption from peak hours to off-peak hours). Among these, real-time pricing (RTP) [6], time-of-use pricing (TOUP) [7], and critical-peak pricing (CPP) [8] are popular options.
Recent works [3, 9–14] considered consumers’ discomfort costs and aimed to jointly minimize the consumers’ billing and discomfort costs, assuming certain utility functions. These works can be classified into two categories. In the first category, consumers are assumed to be price-taking, meaning that they do not consider how their consumption will affect the prices. In this case, the decision-making of a single foresighted consumer is formulated as a stochastic control problem aiming to maximize its long-term utility [11–13]. Alternatively, in [15, 16], multiple myopic consumers aim to maximize their utility, and their decisions are formulated as static optimization problems among cooperative users.
The second category assumed that consumers are myopic and price-anticipating, meaning that they take into account how their consumption will affect the prices. In this case, each consumer’s electricity usage affects the other consumers’ billing costs. These works [3, 17, 18] modeled the interaction among myopic consumers as a one-shot game and studied the Nash equilibrium (NE) of that game. In this paper, we also model the consumers as price-anticipating. However, the consumers interact with each other repeatedly and are foresighted, thereby engaging in a repeated game. It is well known that the Nash equilibrium of one-shot games with myopic players is often inefficient. In this paper, we design a novel class of incentive protocols based on review strategies in order to achieve socially optimal load scheduling in smart grid systems. Prior work [19] also studied DSM in a repeated game setting. However, that work assumes that each individual consumer’s action can be perfectly observed, whereas in this work only the aggregate scheduling of a set of participating consumers can be observed, and only with noise.
This paper adopts a pricing method similar to that in [3], where pricing is based on the aggregate usage pattern of a set of consumers and consumers are charged in proportion to their total daily energy consumption. It is argued in [3] that this proportional charging model is consistent with existing residential metering models. Nevertheless, our work can be used in conjunction with a variety of existing DSM scheduling methods [6–8]. Augmenting these methods with our proposed incentive protocols is especially important when consumers have incentives to deviate from the optimal scheduling given by a DSM scheme (e.g., when the scheme operates on aggregate energy usage information).
The approach proposed in this paper contributes to both the smart grid literature and the game-theoretic literature dedicated to engineering applications. In economics, review strategies have been adopted in principal-agent games with discounting [20], and such games differ significantly from the smart grid deployment scenario considered in this paper. In [20], the game is played between only two players (i.e., a principal and an agent), whereas in the considered smart grid scenario there are multiple players (i.e., a set of consumers) whose utilities exhibit negative externalities.
Table 1 provides a comparison with existing works.
System model
Power system
We consider a smart grid system with multiple consumers and one energy producer, e.g., a utility company. These consumers receive electricity from the same aggregator which distributes electricity to the consumers. Each consumer is equipped with an energy consumption scheduler (ECS) for scheduling the household energy consumption. A smart meter is connected to the set of consumers from which it collects and analyzes the energy consumption. This smart meter gathers (almost) accurate readings automatically, at requested time intervals, and relays them to the utility company. Using this information, the aggregator (utility company) can adjust its energy generation, purchase, and transmission accordingly. The communication between the utility company and the consumers’ smart meters is done through the local area network (LAN) by using appropriate communication protocols. Let \(\mathcal {N}\) denote the set of consumers that share the same aggregator, where the number of consumers is N. Figure 1 illustrates the system model.
Time is discrete, and each time slot represents one day. Every day, there are peak hours, when consumers’ energy consumption demand is high, and off-peak hours, when it is low. Nevertheless, our analysis can easily be extended to finer-grained demand intensities, e.g., hourly scheduling. For consumer \(n\in \mathcal {N}\), we assume that its daily schedulable demand does not change and denote it by \(d_{n}\). We denote the schedulable demand vector of all consumers in the set \(\mathcal {N}\) by \(\boldsymbol{d}\). Consumers can schedule these loads in different hours within a day. Since we consider only peak and off-peak hours, we write \(a_{n}\) for the fraction of consumer n’s load that is scheduled in peak hours and \(1-a_{n}\) for the fraction scheduled in off-peak hours¹. Consumers prefer consuming energy in peak hours to off-peak hours. For consumer n, its energy consumption benefit per day is
where \(b_{\text{peak}} > b_{\text{off-peak}} > 0\) and \(B(\cdot)\) is a benefit function.
The utility company charges the consumers for their electricity consumption at a price \(p(D_{\text{peak}},D_{\text{off-peak}})\) that depends on the electricity usage in peak and off-peak hours, where \(D_{\text{peak}} = \sum_{n} a_{n} d_{n}\) is the total consumption in peak hours and \(D_{\text{off-peak}} = \sum_{n} (1-a_{n}) d_{n}\) is the total consumption in off-peak hours. Clearly, \(D_{\text{peak}} + D_{\text{off-peak}} = \sum_{n} d_{n}\) is the total usage of all consumers. Let \(C_{\text{peak}}(x)\) and \(C_{\text{off-peak}}(x)\) represent the costs of generating and distributing x units of electricity by the energy source in peak hours and off-peak hours, respectively. The price is set to be proportional to the average unit cost of electricity generation and distribution, i.e.,
$$p\left(D_{\text{peak}},D_{\text{off-peak}}\right) = \kappa\,\frac{C_{\text{peak}}\left(D_{\text{peak}}\right) + C_{\text{off-peak}}\left(D_{\text{off-peak}}\right)}{D_{\text{peak}} + D_{\text{off-peak}}} \qquad (2)$$
where κ is the revenue/cost ratio. If κ=1, the billing system is budget-balanced: the utility company charges the consumers only its cost of generating and providing energy (i.e., it makes no profit and serves simply as a benevolent energy provider). If κ>1, the difference between the total charges to the consumers and the total energy cost represents the profit made by the utility company. In this paper, the protocol designer is considered to be benevolent and to represent the consumers’ interests rather than to maximize the utility company’s profit. Thus, κ=1 in the subsequent analysis.
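As a small illustration of the pricing rule \(p = \kappa\,(C_{\text{peak}}(D_{\text{peak}}) + C_{\text{off-peak}}(D_{\text{off-peak}}))/(D_{\text{peak}} + D_{\text{off-peak}})\), the sketch below computes the aggregate loads and the price for given peak fractions and demands, and checks that with κ=1 the total charges equal the total cost (budget balance). The function names are hypothetical, and the quadratic costs are those of the two-consumer example used later in this section.

```python
from typing import Callable, Sequence, Tuple

CostFn = Callable[[float], float]

def aggregate_loads(a: Sequence[float], d: Sequence[float]) -> Tuple[float, float]:
    """Return (D_peak, D_offpeak) for peak fractions a_n and daily demands d_n."""
    d_peak = sum(an * dn for an, dn in zip(a, d))
    d_off = sum((1.0 - an) * dn for an, dn in zip(a, d))
    return d_peak, d_off

def price(a: Sequence[float], d: Sequence[float],
          c_peak: CostFn, c_off: CostFn, kappa: float = 1.0) -> float:
    """Price proportional (ratio kappa) to the average unit cost of supply."""
    d_peak, d_off = aggregate_loads(a, d)
    return kappa * (c_peak(d_peak) + c_off(d_off)) / (d_peak + d_off)

def quad_peak(x: float) -> float:  # example cost C_peak(x) = x^2
    return x ** 2

def quad_off(x: float) -> float:   # example cost C_offpeak(x) = 0.5 x^2
    return 0.5 * x ** 2

a, d = [0.5, 0.5], [1.0, 1.0]
p = price(a, d, quad_peak, quad_off)
d_peak, d_off = aggregate_loads(a, d)
total_charges = p * sum(d)                # each consumer pays p * d_n
total_cost = quad_peak(d_peak) + quad_off(d_off)
print(p, total_charges, total_cost)       # 0.75 1.5 1.5
```

With κ>1 the same code returns a proportionally higher price, and the surplus over `total_cost` is the utility company's profit.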
We make the following standard assumptions on the benefit function and cost functions throughout this paper.
Assumption 1.
(1) The benefit function B(x) is increasing and concave in x, with B(0)=0. (2) The cost functions \(C_{\text{peak}}(x)\) and \(C_{\text{off-peak}}(x)\) are increasing and strictly convex in x, with \(C_{\text{peak}}(0)=C_{\text{off-peak}}(0)\) and \(C'_{\text{peak}}(x)\geq C'_{\text{off-peak}}(x)\) for all \(x\geq 0\).
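Because the smart meter sends usage information to the utility company only periodically, monitoring of the usage pattern \((D_{\text{peak}},D_{\text{off-peak}})\) is imperfect due to monitoring noise. Hence, the price is also a noisy function of \((D_{\text{peak}},D_{\text{off-peak}})\). In particular, we model this by adding a noise term to the price function:
$$\hat{p}\left(D_{\text{peak}},D_{\text{off-peak}}\right) = p\left(D_{\text{peak}},D_{\text{off-peak}}\right) + \varepsilon \qquad (3)$$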
where ε is induced by the monitoring noise of the usage pattern. Because the schedulable demand vector \(\boldsymbol{d}\) is assumed to be fixed, the price \(p(D_{\text{peak}},D_{\text{off-peak}})\) depends only on the consumers’ scheduling actions \(\boldsymbol{a}\). We therefore alternatively write the price as \(p(\boldsymbol{a})\), a function of the scheduling action profile. Note that \(p(\boldsymbol{a})\) is the expected price if \(\boldsymbol{a}\) is taken, and \(\hat{p}(\boldsymbol{a})\) is the price actually realized in the noisy environment. When consumers make scheduling decisions, only \(p(\boldsymbol{a})\) matters, since consumers can compute the expected price but not the actual price, which has not yet been realized.
Taking both the energy consumption benefit and the payment into consideration, consumer n’s (expected) utility can be written as²
$$u_{n}(\boldsymbol{a}) = b_{n}(a_{n}) - p(\boldsymbol{a})\, d_{n} \qquad (4)$$
A billing cycle consists of L days. At the beginning of each billing cycle, the consumers determine the consumption scheduling actions for the next L days. At the end of each billing cycle, the utility company posts bills as well as the electricity price of the previous cycle. Using the pricing information, the consumers are able to infer the aggregate daily usage pattern in the last L days. However, since the price does not perfectly reflect the aggregate usage pattern due to the noise term, the consumers’ knowledge of the aggregate usage pattern is imperfect.
Energy consumption game: stage game
The consumers’ energy consumption scheduling action profile a determines the price at which the consumers will be charged, thereby leading to the following noncooperative game \(\mathcal {G} = \langle \mathcal {N}, \mathcal {A}, \{u_{n}(\cdot)\}_{n\in \mathcal {N}}\rangle \) among consumers. The elements of this game are elaborated below:

Players: consumers in the set \(\mathcal {N}\).

Actions: each consumer \(n \in \mathcal {N}\) selects its energy consumption scheduling action \(a_{n} \in \mathcal {A} = [0,1]\), i.e., the fraction of its energy consumption in peak hours³.

Payoffs: the utility for consumer n is the benefit obtained by the energy consumption minus the payment to the utility company as in (4). By separating consumer n’s action from other consumers’ actions, the utility can also be written as
$$u_{n}\left(a_{n};\boldsymbol{a}_{-n}\right) = b_{n}\left(a_{n}\right) - p\left(a_{n}, \boldsymbol{a}_{-n}\right) d_{n} \qquad (5)$$where \(\boldsymbol{a}_{-n}\), by convention, is the action profile of all consumers except consumer n.
Since consumers are self-interested, each one selfishly maximizes its own utility. We use the Nash equilibrium as the solution concept of this energy consumption game.
Definition 1 (NE).
A Nash equilibrium action profile \(\boldsymbol{a}^{\text{NE}}\) is such that, \(\forall n \in \mathcal {N}, \forall \tilde{a}_{n} \in \mathcal{A}\), \(u_{n}\left(a^{\text{NE}}_{n},\boldsymbol{a}^{\text{NE}}_{-n}\right) \geq u_{n}\left(\tilde{a}_{n}, \boldsymbol{a}^{\text{NE}}_{-n}\right)\).
At an NE, no consumer can improve its own utility by unilaterally changing its action. Theorem 1 proves the existence of an NE in the considered energy consumption game.
Theorem 1.
There exists at least one NE in the energy consumption game.
Proof.
We show that the energy consumption game is a strictly concave N-person game. The existence of an NE for this type of game then follows directly from [21].
To show that the game is concave, consider any consumer n. We need to show that the utility function \(u_{n}(\boldsymbol{a})\) is strictly concave in \(a_{n}\). We examine the two parts of the utility function separately as follows.

(1)
By Assumption 1, \(B(\cdot)\) is a concave function. It is then straightforward to see that \(b_{n}(a_{n})\) is concave in \(a_{n}\).

(2)
Now consider the price term. Taking the second-order derivative with respect to \(a_{n}\) (with κ=1) gives
$$\frac{\partial^{2} p(\boldsymbol{a})}{\partial a_{n}^{2}} = \frac{d_{n}^{2}\left(C''_{\text{peak}}\left(D_{\text{peak}}\right) + C''_{\text{off-peak}}\left(D_{\text{off-peak}}\right)\right)}{\sum_{m} d_{m}} > 0 \qquad (6)$$
Hence, the payment term \(p(\boldsymbol{a})\,d_{n}\) is strictly convex in \(a_{n}\).
In summary, \(u_{n}(\boldsymbol{a})\) is strictly concave in \(a_{n}\). Thus, there exists at least one NE in the energy consumption game.
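The second-derivative expression in the proof can be checked numerically. The sketch below uses the two-consumer example introduced below (κ=1, \(d_1=d_2=1\), \(C_{\text{peak}}(x)=x^2\), \(C_{\text{off-peak}}(x)=0.5x^2\), so \(C''_{\text{peak}}=2\) and \(C''_{\text{off-peak}}=1\)) and compares a central finite difference of the price in \(a_1\) with \(d_n^2(C''_{\text{peak}}+C''_{\text{off-peak}})/\sum_m d_m\). The helper names are hypothetical.

```python
def price_example(a1: float, a2: float) -> float:
    """Price in the two-consumer example (kappa = 1, d_1 = d_2 = 1,
    C_peak(x) = x^2, C_offpeak(x) = 0.5 x^2, total demand 2)."""
    d_peak = a1 + a2
    d_off = 2.0 - d_peak
    return (d_peak ** 2 + 0.5 * d_off ** 2) / 2.0

def second_derivative_formula(d_n: float, total_demand: float,
                              c2_peak: float, c2_off: float) -> float:
    """d_n^2 (C''_peak + C''_offpeak) / sum_m d_m, as in the proof."""
    return d_n ** 2 * (c2_peak + c2_off) / total_demand

def central_second_difference(f, x: float, h: float = 1e-3) -> float:
    """Central finite-difference approximation of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h ** 2

# Vary a_1 around 0.3 while consumer 2 holds a_2 = 0.5.
numeric = central_second_difference(lambda a1: price_example(a1, 0.5), 0.3)
analytic = second_derivative_formula(1.0, 2.0, c2_peak=2.0, c2_off=1.0)
print(analytic)  # 1.5
```

Because the example price is quadratic in \(a_1\), the finite difference matches the formula up to rounding error, and the positive value confirms that the payment term is strictly convex in the consumer's own action.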
It is well known that an NE is often inefficient in terms of Pareto-optimality. If an action profile is Pareto-optimal, then no consumer can obtain a higher utility under a different action profile without decreasing at least one other consumer’s utility. We provide a formal definition as follows.
Definition 2 (Pareto-optimal (PO)).
An action profile \(\boldsymbol{a}\) is Pareto-optimal if there does not exist any other action profile \(\tilde{\boldsymbol{a}}\) such that \(u_{n}(\tilde{\boldsymbol{a}}) \geq u_{n}(\boldsymbol{a})\) for all n, with strict inequality for at least one n.
Even though a PO action profile \(\boldsymbol{a}^{\text{PO}}\) is superior to \(\boldsymbol{a}^{\text{NE}}\), self-interested consumers do not automatically adopt it in the energy consumption game. This is because for any action profile that is not an NE, some consumers will always want to schedule a different amount of energy consumption in peak hours to increase their own utilities. To provide incentives for the consumers to play the PO action profile, in this paper we design a protocol that exploits the ongoing nature of the consumers’ energy consumption interactions. For notational simplicity, we denote \(u_{n}(\boldsymbol{a}^{\text{NE}})\) by \(u_{n}^{\text{NE}}\) and \(u_{n}(\boldsymbol{a}^{\text{PO}})\) by \(u_{n}^{\text{PO}}\). Before proceeding, we provide a simple example that illustrates the difference between \(\boldsymbol{a}^{\text{NE}}\) and \(\boldsymbol{a}^{\text{PO}}\), as well as the inefficiency of \(\boldsymbol{a}^{\text{NE}}\).
Example (Two consumers). Consider two consumers: N=2. Let κ=1, \(b_{\text{peak}}=2\), \(b_{\text{off-peak}}=1\), \(d_{1}=d_{2}=1\), and B(x)=x. The cost functions are simple quadratic functions as in [22]: \(C_{\text{peak}}(x)=x^{2}\) and \(C_{\text{off-peak}}(x)=0.5x^{2}\). The utility of consumer n is therefore
$$u_{n}(a_{1},a_{2}) = 1 + a_{n} - 0.5\left[(a_{1}+a_{2})^{2} + 0.5(2-a_{1}-a_{2})^{2}\right]$$
In this two-consumer game, the unique symmetric NE action profile is \(a^{\text{NE}}_{1}=a^{\text{NE}}_{2}=2/3\). That is, both consumers schedule 2/3 of their energy consumption in peak hours, and neither wants to unilaterally schedule a different amount, since that would only reduce its own utility. The corresponding utilities are \(u^{\text{NE}}_{1}=u^{\text{NE}}_{2}=2/3\). However, the sum utility is maximized when both consumers choose \(a^{\text{PO}}_{1}=a^{\text{PO}}_{2}=0.5\), which is the unique symmetric PO action profile. (The first-order condition \(\partial u_{n}/\partial a_{n} = 2 - 1.5(a_{1}+a_{2}) = 0\) gives \(a_{1}+a_{2}=4/3\) at an interior NE, whereas the sum utility \(2 + S - S^{2} - 0.5(2-S)^{2}\), with \(S=a_{1}+a_{2}\), is maximized at S=1.) The corresponding utilities are \(u^{\text{PO}}_{1}=u^{\text{PO}}_{2}=3/4\). The sum utility obtained with this action profile (i.e., 3/2) is 12.5 % higher than that obtained with the NE action profile (i.e., 4/3), which indicates the inefficiency of the NE.
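The equilibrium and the optimum of this example can be reproduced by brute force. The sketch below searches a grid of peak fractions with exact rational arithmetic (Python's `fractions` module): the symmetric NE is the grid point that is a best response to itself, and the symmetric PO profile maximizes the sum utility along \(a_1=a_2\). The 1/60 grid is an assumption chosen so that the exact solutions lie on it, and the function names are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
GRID = [Fraction(i, 60) for i in range(61)]  # candidate peak fractions in [0, 1]

def utility(a_own, a_other):
    """Two-consumer example: u_n = 1 + a_n - 0.5[(a_1+a_2)^2 + 0.5(2-a_1-a_2)^2]."""
    s = a_own + a_other
    return 1 + a_own - HALF * (s ** 2 + HALF * (2 - s) ** 2)

def best_response(a_other):
    """Grid point that maximizes a consumer's own utility against a_other."""
    return max(GRID, key=lambda a: utility(a, a_other))

# Symmetric NE: a grid point that is a best response to itself.
a_ne = next(a for a in GRID if best_response(a) == a)
# Symmetric PO: maximize the sum utility along a_1 = a_2.
a_po = max(GRID, key=lambda a: 2 * utility(a, a))

u_ne, u_po = utility(a_ne, a_ne), utility(a_po, a_po)
gain = (2 * u_po) / (2 * u_ne) - 1  # relative sum-utility improvement
print(a_ne, a_po, u_ne, u_po, gain)  # 2/3 1/2 2/3 3/4 1/8
```

Exact fractions avoid floating-point ties, so the printed values can be compared directly with the closed-form results.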
Repeated energy consumption game
In the repeated energy consumption game, consumers play the energy consumption stage game repeatedly. At the beginning of each billing cycle, the consumers determine the scheduling actions for this billing cycle based on the previous energy consumption history. Specifically, consumer n determines \(a^{t}_{n}\) for \(t=mL+1, mL+2,\ldots,(m+1)L\) at the beginning of billing cycle m+1, based on the prices \(p^{t}, t=(m-1)L+1, (m-1)L+2,\ldots,mL\) of the previous billing cycle. Alternatively, consumer n could determine \(a^{mL+\tau}_{n}\) when day \(mL+\tau\) arrives. However, because consumers obtain no additional information about the past energy consumption history during a billing cycle (prices are announced only at the end of each billing cycle), the two approaches are equivalent for the analysis. In this paper, we consider that consumers determine the scheduling actions at the beginning of each billing cycle. Consumers do not directly know the energy consumption history of other consumers; they can only infer it from a public signal \(z \in \mathcal {Z}\). In this paper, the public signal is binary, indicating whether or not the statistical test is passed. Based on this information, the consumers take actions to maximize their long-term utilities, which are defined as follows
That is, the long-term utility is the normalized sum of the discounted expected stage utilities induced by the strategy, where the expectation is taken over the distribution of the sequences of action profiles \(\left\{\boldsymbol{a}^{t}\right\}_{t=1}^{\infty}\).
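Written out under the usual convention for discounted repeated games (an assumption here, with δ∈[0,1) the discount factor that also appears in the grim-trigger discussion and σ the strategy profile followed by the consumers), the normalized long-term utility of consumer n takes the form

$$U_{n}(\sigma) = E\left\{(1-\delta)\sum_{t=1}^{\infty}\delta^{t-1}u_{n}\left(\boldsymbol{a}^{t}\right)\right\}$$

Since \((1-\delta)\sum_{t=1}^{\infty}\delta^{t-1}=1\), a constant stage utility u yields \(U_{n}=u\), which keeps long-term and stage utilities on the same scale.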
Figure 3 illustrates the timeline of the system and the decision flow of consumers.
Problem formulation
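In this paper, we assume a benevolent designer (i.e., κ=1) who aims to design a protocol that maximizes the expected sum utility of all consumers, i.e., \(\sum_{n}E\left\{u_{n}(\boldsymbol{a})\right\}\). Other performance criteria can also be used in our framework. Formally, the designer wants to achieve an action profile \(\boldsymbol{a}^{\text{PO}}\) that solves the following optimization:
$$\boldsymbol{a}^{\text{PO}} = \arg\max_{\boldsymbol{a}} \sum_{n} u_{n}(\boldsymbol{a}) \quad \text{s.t.} \quad u_{n}(\boldsymbol{a}) \geq u_{n}^{\text{NE}},\ \forall n; \quad \boldsymbol{a} \text{ is Pareto-optimal}$$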
The first constraint requires that no consumer be worse off than at the NE. The second constraint requires the solution to be Pareto-optimal. To provide consumers with incentives to take this action profile, the designer designs a protocol that exploits the ongoing interactions of consumers. Let Ψ denote a protocol and define \(V(\Psi) = \sum_{n} E\{u_{n}(\Psi)\}\) as the protocol efficiency, where \(E\{u_{n}(\Psi)\}\) is the expected utility of consumer n if all consumers follow the protocol Ψ. The protocol design problem is then formally stated as follows:
$$\max_{\Psi}\ V(\Psi) \quad \text{s.t.} \quad \Psi \text{ is IC}$$
The constraint requires that all consumers have incentives to follow the protocol Ψ.
Protocol based on review strategies
A possible incentive-compatible protocol is the grim-trigger strategy [4], which uses the strongest punishment that can be imposed upon a deviation. In the grim-trigger strategy, from any point in time at which there is evidence that some consumer has deviated from a previous recommendation, the protocol designer recommends that each consumer consume a large amount of energy in peak hours from then on (i.e., revert to the NE action profile forever). In the two-consumer example, the protocol designer recommends that both consumers take the PO actions \(a^{\text{PO}}_{1}=a^{\text{PO}}_{2}=0.5\) in each billing cycle. If there is a deviation, the protocol designer recommends that both consumers take the NE actions \(a^{\text{NE}}_{1}=a^{\text{NE}}_{2}=2/3\) forever (i.e., until the consumers unsubscribe from this service). Therefore, the long-term utility that consumer 1 receives by following the recommendation is \(u^{\text{PO}}_{1} + \sum_{t=1}^{\infty} \delta^{t} u^{\text{PO}}_{1}\), and the long-term utility from deviating is \(u^{d}_{1} + \sum_{t=1}^{\infty} \delta^{t} u^{\text{NE}}_{1}\), where \(u^{d}_{1} > u^{\text{PO}}_{1}\) is the one-shot utility from deviating and δ is the discount factor. Because \(u^{\text{PO}}_{1} > u^{\text{NE}}_{1}\), the grim-trigger strategy can provide sufficient incentives for the consumers to play the PO actions when δ is large enough. Since no consumer then wants to deviate, the system never enters the NE (punishment) phase, which leads to the highest social welfare. However, if the prices are noisy, the test will almost surely flag a deviation eventually, even when no consumer has deviated; hence, the system enters the punishment phase with probability one and stays there indefinitely. This leads to the lowest social welfare. Therefore, in such imperfect-monitoring scenarios, a protocol that allows the punishment to stop after a certain time is needed.
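For the two-consumer example, the grim-trigger condition can be made explicit. Comparing the two long-term utilities above, following is at least as good as deviating exactly when \(\delta \geq (u^{d}_{1}-u^{\text{PO}}_{1})/(u^{d}_{1}-u^{\text{NE}}_{1})\). The sketch below evaluates this threshold with exact fractions under perfect monitoring, computing the best one-shot deviation against \(a_2=0.5\) on an assumed 1/60 grid of actions. The helper names are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
GRID = [Fraction(i, 60) for i in range(61)]  # assumed action grid on [0, 1]

def utility(a_own, a_other):
    """Two-consumer example: u_n = 1 + a_n - 0.5[(a_1+a_2)^2 + 0.5(2-a_1-a_2)^2]."""
    s = a_own + a_other
    return 1 + a_own - HALF * (s ** 2 + HALF * (2 - s) ** 2)

def follow_value(u_po, delta):
    """u^PO + sum_{t>=1} delta^t u^PO = u^PO / (1 - delta)."""
    return u_po / (1 - delta)

def deviate_value(u_dev, u_ne, delta):
    """u^d + sum_{t>=1} delta^t u^NE = u^d + delta u^NE / (1 - delta)."""
    return u_dev + delta * u_ne / (1 - delta)

a_po, a_ne = HALF, Fraction(2, 3)
u_po, u_ne = utility(a_po, a_po), utility(a_ne, a_ne)
a_dev = max(GRID, key=lambda a: utility(a, a_po))  # best one-shot deviation
u_dev = utility(a_dev, a_po)
delta_min = (u_dev - u_po) / (u_dev - u_ne)        # grim-trigger IC threshold
print(a_dev, u_dev, delta_min)  # 5/6 5/6 1/2
```

In this example, a consumer who discounts the future less heavily than δ = 1/2 prefers to follow the PO recommendation, provided that deviations are observed without noise.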
Protocols based on review strategies
In a protocol based on review strategies, there are two types of phases: review phases and punishment phases. Each review phase consists of one billing cycle. Since the length of the billing cycle is chosen by the protocol designer, the length of a review phase is a design parameter. Each punishment phase consists of an integer number K of billing cycles and hence lasts KL time slots (i.e., days). The protocol designer recommends a review strategy \(\sigma^{R}\) to all consumers as follows: the recommended scheduling action profile is \(\boldsymbol{a}^{\text{PO}}\) on each day of the review phase and \(\boldsymbol{a}^{\text{NE}}\) on each day of the punishment phase. At the end of each billing cycle in the review phase, a statistical test is performed (either by the consumers themselves or by the utility company) on the prices of the billing cycle that has just ended to determine whether or not the other consumers followed the recommended strategy. A signal \(z \in \mathcal{Z} = \{0,1\}\) is generated from the test result, with z=1 indicating that the test is passed (all consumers are deemed to have followed the recommended strategy) and z=0 indicating that it failed (some consumers are deemed to have deviated). If z=1, the system moves to another review phase; if z=0, it moves to the punishment phase. When a punishment phase ends, the system automatically moves to a new review phase. This protocol based on review strategies is depicted in Fig. 4.
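To see why false alarms matter, the review/punishment cycle can be simulated when every consumer obeys the protocol. The sketch below does not implement the paper's test G: it assumes i.i.d. Gaussian price noise ε with standard deviation `sigma` and a hypothetical threshold test that fails (z=0) when the average announced price over a billing cycle exceeds \(p(\boldsymbol{a}^{\text{PO}})\) by more than `threshold`. The per-day sum utilities 3/2 (at \(\boldsymbol{a}^{\text{PO}}\)) and 4/3 (at \(\boldsymbol{a}^{\text{NE}}\)) and the price 0.75 are evaluated from the two-consumer example's utility and price expressions; all names and the remaining parameter values are illustrative.

```python
import random
from dataclasses import dataclass

@dataclass
class ReviewProtocol:
    """Illustrative parameters of a review-strategy protocol (names hypothetical)."""
    L: int            # billing-cycle (= review-phase) length in days
    K: int            # punishment length in billing cycles
    threshold: float  # hypothetical test: fail if mean price > p_po + threshold

def simulate(protocol: ReviewProtocol, p_po: float, w_po: float, w_ne: float,
             sigma: float, n_cycles: int, seed: int = 0) -> float:
    """Average expected daily sum utility when every consumer obeys the protocol.

    Review cycle: all play a^PO, so announced prices are p_po + eps with
    eps ~ N(0, sigma^2) i.i.d. per day (assumed noise model); sum utility w_po/day.
    Punishment cycle: all play a^NE for K cycles; sum utility w_ne/day.
    """
    rng = random.Random(seed)
    total, days, punish_left = 0.0, 0, 0
    for _ in range(n_cycles):
        if punish_left > 0:                    # punishment phase
            total += protocol.L * w_ne
            punish_left -= 1
        else:                                  # review phase + end-of-cycle test
            prices = [p_po + rng.gauss(0.0, sigma) for _ in range(protocol.L)]
            total += protocol.L * w_po
            if sum(prices) / protocol.L > p_po + protocol.threshold:
                punish_left = protocol.K       # z = 0: necessarily a false alarm here
        days += protocol.L
    return total / days

proto = ReviewProtocol(L=30, K=2, threshold=0.05)
avg = simulate(proto, p_po=0.75, w_po=1.5, w_ne=4 / 3, sigma=0.2, n_cycles=10_000)
print(round(avg, 3))  # lies between 4/3 (always punishing) and 1.5 (never)
```

Longer billing cycles average out more noise and so trigger fewer false alarms, while longer punishments cost more per false alarm; this is the kind of trade-off that the choice of L, K, and the test has to balance.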
Importantly, the utilities obtained by the consumers in the punishment phase are lower than those in the review phase and equal to those obtained when no review-strategy protocol is deployed. Therefore, the review-strategy-based protocol is guaranteed to achieve better system performance than can be obtained without deploying it. However, the designer still needs to design the protocol carefully so that the system enters the punishment phase as rarely as possible.
Given this review strategy structure, the protocol can be fully characterized by the length of the review phase (i.e., the billing cycle length) L, the length of the punishment phase KL and the statistical test G. Therefore, we write a protocol based on review strategies as Ψ(L,K,G).
In Table 2, we summarize what information the designer and the consumers have and what computations they need to perform. Note that when performing appliance-level load scheduling, a consumer takes into account its specific electricity demands. However, since the protocol is designed to be incentive-compatible, consumers schedule according to \(\boldsymbol{a}^{\text{PO}}\) in the review phases and according to \(\boldsymbol{a}^{\text{NE}}\) in the punishment phases.
Performance metrics
To evaluate protocols based on review strategies, we need to define several performance metrics. First, the protocol should be IC, since the consumers are self-interested and will follow recommendations only when doing so is in their self-interest. In particular, consumers should have incentives to take the PO scheduling actions in the review phase.
Proposition 1.
In a PO action profile, a self-interested consumer will not want to schedule less than the recommended energy consumption in peak hours in the stage game.
Proof.
If consumer n could increase its stage utility by choosing \({a^{d}_{n}} < a^{\text{PO}}_{n}\), then all other consumers’ utilities would also increase, because the price falls as the aggregate usage in peak hours decreases. This contradicts the Pareto optimality of \(\boldsymbol{a}^{\text{PO}}\).
Proposition 1 states that, given the daily desired energy demand of a consumer, if the designer recommends the PO scheduling action to it, then the self-interested consumer will schedule no less than the PO energy consumption in peak hours in order to maximize its own utility, assuming the other consumers comply with the recommended PO scheduling. This holds even when the consumer’s daily demand is low, since the consumer is simply shifting its desired amount of energy consumption between hours. In the two-consumer example, given the recommended PO action profile \(a_{1}=a_{2}=0.5\), a consumer can increase its utility only by scheduling more than half of its energy consumption in peak hours. Therefore, we only need to consider the case \({a^{d}_{n}} > a^{\text{PO}}_{n}\) when studying consumers’ incentive problems, and we denote the corresponding utility of consumer n by \({u^{d}_{n}}\). Consuming more energy in peak hours increases the price and hence consumer n’s own payment. However, because the cost is shared among the entire consumer set, consumer n still obtains a higher utility \({u^{d}_{n}} > u^{\text{PO}}_{n}\) by unilaterally increasing its peak-hour consumption whenever \(\nabla_{n} u_{n}(\boldsymbol{a}^{\text{PO}}) > 0\).
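To make this incentive problem concrete, the following sketch evaluates a hypothetical two-consumer instance. Everything numeric here is an assumption rather than part of the model above: unit daily demands, the linear benefit and quadratic cost forms later used in the Simulations section with \(b_{\text{peak}}=2\), \(b_{\text{offpeak}}=1\), \(c_p=1\) (chosen so that \(a_1=a_2=0.5\) is socially optimal), and a daily price equal to total cost divided by total demand, so that the cost is shared in proportion to demand.

```python
B_PEAK, B_OFF, C_P = 2.0, 1.0, 1.0  # hypothetical benefit/cost parameters
DEMAND = (1.0, 1.0)                  # daily demands d_n (assumed)


def utility(n, a):
    """Stage utility of consumer n for peak-hour fractions a = (a_1, a_2)."""
    x = sum(ai * d for ai, d in zip(a, DEMAND))        # aggregate peak usage
    y = sum((1 - ai) * d for ai, d in zip(a, DEMAND))  # aggregate off-peak usage
    price = (C_P * x**2 + 0.5 * C_P * y**2) / sum(DEMAND)  # assumed: cost shared by demand
    benefit = B_PEAK * a[n] * DEMAND[n] + B_OFF * (1 - a[n]) * DEMAND[n]
    return benefit - price * DEMAND[n]


def best_deviation(n, a, steps=10_000):
    """Grid search for consumer n's best unilateral action, others fixed."""
    def with_action(an):
        return tuple(an if i == n else ai for i, ai in enumerate(a))
    return max((k / steps for k in range(steps + 1)),
               key=lambda an: utility(n, with_action(an)))


a_po = (0.5, 0.5)
a_d = best_deviation(0, a_po)
print(f"a_d = {a_d:.4f}")
print(utility(0, a_po), utility(0, (a_d, 0.5)), utility(1, (a_d, 0.5)))
```

Under these assumptions, consumer 1 raises its peak share from 0.5 to about 5/6, its own utility rises from 0.75 to about 0.833, and consumer 2's utility falls to about 0.5, so the sum utility drops below 1.5. This is exactly the situation \({u^{d}_{n}} > u^{\text{PO}}_{n}\) described above.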
At the beginning of each billing cycle, the consumers determine their scheduling actions for that billing cycle based on the previous energy consumption history. We focus on strategies that are constant within a billing cycle, i.e., consumer n chooses the same scheduling action every day of a billing cycle. Our analysis can, however, be extended to more sophisticated strategies (e.g., where a consumer uses different scheduling actions within a billing cycle) by considering the equivalent average scheduling action. Let \(U_{n}(\sigma^{R}\mid s=1)\) and \(U_{n}(\sigma^{R}\mid s=0)\) denote the long-term utilities of consumer n at the beginning of a review phase and of a punishment phase, respectively, when all consumers follow \(\sigma^{R}\). Let \(\tilde{\sigma}_{n}\) denote the strategy in which consumer n deviates to some \({a^{d}_{n}} > a^{\text{PO}}_{n}\) in the current review phase and follows \(\sigma^{R}\) afterwards, while all other consumers follow \(\sigma^{R}\) all the time. The following proposition characterizes the IC condition of a protocol.
Proposition 2.
A protocol based on a review strategy \(\sigma^{R}\) is IC against a strategy \(\tilde{\sigma}_{n}\) with the deviation action \({a^{d}_{n}} > a^{\text{PO}}_{n}\) for consumer n if and only if
\(U_{n}(\sigma^{R}\mid s=1) \ge U_{n}(\tilde{\sigma}_{n}\mid s=1).\) (10)
Proof.
This follows directly from the one-shot deviation principle for repeated games [4].
Proposition 2 uses the one-shot deviation principle for repeated games and shows that a consumer cannot gain by switching to any other strategy if and only if it cannot gain by unilaterally deviating from the recommended strategy \(\sigma^{R}\) in the current billing cycle and following it afterwards. In the punishment phase, the recommended action profile is the NE profile \(\boldsymbol{a}^{\text{NE}}\), so the consumers have no incentive to deviate from the recommended action. Therefore, we only need to check whether the consumers have incentives to take the recommended PO profile \(\boldsymbol{a}^{\text{PO}}\), i.e., whether the one-shot deviation condition holds in the review phase. The left-hand side of (10) is consumer n’s long-term utility from following \(\sigma^{R}\), and the right-hand side is its long-term utility from deviating to \({a^{d}_{n}}\) only in the current review phase while all other consumers follow \(\sigma^{R}\). In Section 4.2, we use Proposition 2 to construct IC protocols based on review strategies.
The goal of the protocol designer is to maximize the sum utility of all the consumers. The maximum sum utility, \(V^{*} = \sum_{n} u_{n}^{\text{PO}}\), is achieved when all consumers take the PO scheduling actions in every time slot; we call \(V^{*}\) the “first-best” sum utility. Letting V(Ψ) denote the sum utility achieved by a protocol Ψ, the efficiency loss of Ψ is defined as \(C(\Psi) = (V^{*} - V(\Psi))/V^{*}\).
Definition 3.
A protocol Ψ is said to be Δ-Pareto optimal (Δ-PO) if \(C(\Psi) \le \Delta\).
A Δ-PO protocol yields a sum utility of at least a fraction 1−Δ of the first-best sum utility. A protocol Ψ that prescribes the scheduling action profile \(\boldsymbol{a}^{\text{PO}}\) every day regardless of the history would achieve V(Ψ)=V^{∗}, i.e., the first-best sum utility (0-PO). However, such a protocol is not IC, since each consumer could then gain by deviating without ever being punished; hence, the first-best is not achievable.
Statistical test
In this subsection, we discuss how the public signal is constructed from the prices. The consumers cannot observe the actual scheduling actions of the other consumers; they can only observe the prices announced by the utility company at the end of each billing cycle. According to (3), if all consumers take the recommended optimal scheduling actions, the observed price on a given day is \(\hat{p}(\boldsymbol{a}^{\text{PO}}) = p(\boldsymbol{a}^{\text{PO}}) + \epsilon\), where ε denotes the zero-mean monitoring noise.
If some consumer deviates by consuming more than the recommended amount of energy during peak hours, the price on that day increases^{4}. Due to the monitoring noise, the observed price is then \(\hat{p}({a^{d}_{n}}, \boldsymbol{a}^{\text{PO}}_{-n}) = p({a^{d}_{n}}, \boldsymbol{a}^{\text{PO}}_{-n}) + \epsilon\).
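The statistical test determines whether the consumer set followed the recommended strategy in the previous billing cycle by comparing the average of the observed daily prices \(\hat{p}_{1}, \dots, \hat{p}_{L}\) of that billing cycle with a threshold value \(p_{\text{th}}\):
\(z = 1\) if \(\frac{1}{L}\sum_{t=1}^{L}\hat{p}_{t} - p(\boldsymbol{a}^{\text{PO}}) \le p_{\text{th}}\), and \(z = 0\) otherwise.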
The statistical test can make two kinds of errors, whose probabilities we define as follows.

False alarm probability \(q_{F}(L,p_{\text{th}})\): the probability that z=0 when all consumers follow the recommended strategy \(\sigma^{R}\), i.e., when they all schedule the optimal amount of energy consumption in peak hours.

Miss detection probability q _{ M,n }(L,p _{ th }): the probability that z=1 when consumer n deviates to the action \({a^{d}_{n}}\) in the previous billing cycle, i.e., it schedules more than the optimal amount of energy consumption in peak hours while all other consumers take the optimal scheduling.
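Both error probabilities can be estimated by simulation. The sketch below assumes i.i.d. Gaussian daily price noise (an assumption; the noise distribution is not specified here) and a test that compares the average price of a billing cycle, minus \(p(\boldsymbol{a}^{\text{PO}})\), with \(p_{\text{th}}\). The price levels `p_po` and `p_dev` and the noise level `sigma` are hypothetical values.

```python
import random
from statistics import fmean


def review_test(prices, p_po, p_th):
    """Signal z: 1 if the average price exceeds p(a^PO) by at most p_th, else 0."""
    return 1 if fmean(prices) - p_po <= p_th else 0


def estimate_error_probs(L, p_th, p_po, p_dev, sigma, trials=20_000, seed=0):
    """Monte Carlo estimates of (q_F, q_M) for a billing cycle of L days (Gaussian noise assumed)."""
    rng = random.Random(seed)
    false_alarms = misses = 0
    for _ in range(trials):
        compliant = [p_po + rng.gauss(0.0, sigma) for _ in range(L)]   # everyone plays a^PO
        deviating = [p_dev + rng.gauss(0.0, sigma) for _ in range(L)]  # one consumer deviates
        false_alarms += review_test(compliant, p_po, p_th) == 0
        misses += review_test(deviating, p_po, p_th) == 1
    return false_alarms / trials, misses / trials


for L in (5, 30):
    print(L, estimate_error_probs(L, p_th=0.125, p_po=0.75, p_dev=1.0, sigma=0.5))
```

With the threshold halfway between the two price levels, both estimated error rates drop noticeably as the billing cycle grows from 5 to 30 days.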
Note that our design can also be extended to the collusion problem, in which a subset of consumers colludes to obtain higher utilities. In that case, the colluding consumers can simply be treated as a single consumer whose demand is their aggregate demand.
The following proposition characterizes the impact of the threshold and the billing cycle length on the error probabilities.
Proposition 3.
\(\forall p_{\text{th}} \in \left(0,\, p({a^{d}_{n}}, \boldsymbol{a}^{\text{PO}}_{-n}) - p(\boldsymbol{a}^{\text{PO}})\right)\): \(\lim_{L\to\infty} q_{F}(L, p_{\text{th}}) = 0\) and \(\lim_{L\to\infty} q_{M,n}(L, p_{\text{th}}) = 0\).
Proof.
By the law of large numbers, the sample average of the observed prices converges (in probability and almost surely) to its expected value as the number of samples L tends to infinity, i.e., to \(p(\boldsymbol{a}^{\text{PO}})\) when all consumers comply and to \(p({a^{d}_{n}}, \boldsymbol{a}^{\text{PO}}_{-n})\) when consumer n deviates. Therefore, when the threshold is larger than 0, the false alarm probability goes to 0, and when the threshold is smaller than \(p({a^{d}_{n}}, \boldsymbol{a}^{\text{PO}}_{-n}) - p(\boldsymbol{a}^{\text{PO}})\), the miss detection probability goes to 0.
Proposition 3 guides the protocol designer’s choice of the statistical test (i.e., the test threshold). It also shows that, to accurately detect self-interested consumers’ excessive usage in peak hours, a longer billing cycle should be chosen so that the monitoring errors are mitigated. However, a very long billing cycle also reduces the consumers’ valuation of utilities in future billing cycles, so the punishment may not be strong enough to induce compliance. Therefore, the billing cycle and the punishment length should be designed jointly to optimize the performance of the smart grid system.
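Proposition 3 can also be seen in closed form under an additional, hypothetical assumption: if the daily noise terms are i.i.d. Gaussian with standard deviation σ, the L-day average price has standard deviation \(\sigma/\sqrt{L}\), so \(q_F = 1-\Phi(p_{\text{th}}\sqrt{L}/\sigma)\) and \(q_{M,n} = \Phi((p_{\text{th}}-\Delta p)\sqrt{L}/\sigma)\), where \(\Delta p = p({a^{d}_{n}}, \boldsymbol{a}^{\text{PO}}_{-n}) - p(\boldsymbol{a}^{\text{PO}})\) and Φ is the standard normal CDF. The sketch tabulates both for a few billing cycle lengths; the numbers are illustrative.

```python
from math import erf, sqrt


def std_normal_cdf(x: float) -> float:
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def gaussian_error_probs(L: int, p_th: float, delta_p: float, sigma: float):
    """(q_F, q_M) for the average-price test, assuming i.i.d. N(0, sigma^2) daily noise."""
    scale = sqrt(L) / sigma                          # 1 / std. dev. of the L-day average
    q_f = 1.0 - std_normal_cdf(p_th * scale)         # compliant average is centered at p(a^PO)
    q_m = std_normal_cdf((p_th - delta_p) * scale)   # deviating average is delta_p higher
    return q_f, q_m


for L in (1, 5, 20, 60):
    q_f, q_m = gaussian_error_probs(L, p_th=0.125, delta_p=0.25, sigma=0.5)
    print(f"L={L:3d}  q_F={q_f:.4f}  q_M={q_m:.4f}")
```

Both probabilities shrink as L grows when \(0 < p_{\text{th}} < \Delta p\). At the endpoint \(p_{\text{th}} = 0\), by contrast, \(q_F\) stays at 1/2 for every L under this Gaussian assumption, which is why the threshold must lie strictly inside the interval.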
Design and performance analysis
In this section, we design the optimal incentivecompatible protocols based on review strategies and analyze their performance. The outline of this section is as follows.

We first establish the existence of IC protocols based on review strategies and provide the IC conditions such that the consumers have incentives to perform the recommended scheduling in the review phase.

Next, we propose a greedy algorithm to determine the optimal design of review strategies.

We then evaluate the performance of the optimal protocol based on review strategies. Specifically, we show that it can be Δ-PO, i.e., achieve at least a fraction 1−Δ of the first-best efficiency.
Incentive-compatibility
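Recall that Proposition 2 provides a simple method to determine whether a protocol based on review strategies is IC. In this subsection, we study how to design the protocol parameters according to Proposition 2. To do so, we need to compute \(U_{n}(\sigma^{R}\mid s=1)\), \(U_{n}(\sigma^{R}\mid s=0)\), and \(U_{n}(\tilde{\sigma}_{n}\mid s=1)\), where s=1 denotes that the system is in a review phase and s=0 denotes that it is in a punishment phase. With the per-day discount factor δ and normalized discounted utilities, these utilities depend on each other as follows:
\(U_{n}(\sigma^{R}\mid s=1) = (1-\delta^{L})\,u^{\text{PO}}_{n} + \delta^{L}\left[(1-q_{F}(L,p_{\text{th}}))\,U_{n}(\sigma^{R}\mid s=1) + q_{F}(L,p_{\text{th}})\,U_{n}(\sigma^{R}\mid s=0)\right]\) (13)
and
\(U_{n}(\sigma^{R}\mid s=0) = (1-\delta^{KL})\,u^{\text{NE}}_{n} + \delta^{KL}\,U_{n}(\sigma^{R}\mid s=1).\) (14)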
The first term in (13) is the utility in the current review phase, and the second term is the continuation utility after it: with probability \(1-q_{F}(L,p_{\text{th}})\) the system remains in the review phase, and with probability \(q_{F}(L,p_{\text{th}})\) it moves to a punishment phase due to a monitoring error (a false alarm). The utility of consumer n when it chooses a deviation scheduling action \({a^{d}_{n}}\) at the beginning of a review phase is
\(U_{n}(\tilde{\sigma}_{n}\mid s=1) = (1-\delta^{L})\,{u^{d}_{n}} + \delta^{L}\left[(1-q_{M,n}(L,p_{\text{th}}))\,U_{n}(\sigma^{R}\mid s=0) + q_{M,n}(L,p_{\text{th}})\,U_{n}(\sigma^{R}\mid s=1)\right].\) (15)
The first term in (15) is the utility obtained by deviating in the current review phase, which exceeds the corresponding term in (13) because \({u^{d}_{n}} > u^{\text{PO}}_{n}\). The second term is the continuation utility after the review phase: with probability \(1-q_{M,n}(L,p_{\text{th}})\) the deviation is detected and the system moves to a punishment phase; with probability \(q_{M,n}(L,p_{\text{th}})\) the system remains in the review phase.
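The recursions described above are linear and can be solved in closed form: substituting the punishment-phase utility into the review-phase equation leaves a single linear equation in \(U_{n}(\sigma^{R}\mid s=1)\). The sketch below does this for one consumer and applies the test of Proposition 2. It assumes that δ is a per-day discount factor and that utilities are normalized discounted averages (consistent with the \((1-\delta^{L})\) weights); the function names and all numbers are hypothetical.

```python
def long_term_utilities(u_po, u_ne, u_d, delta, L, K, q_f, q_m):
    """Return (U_R, U_P, U_dev) for one consumer.

    Assumed recursions (per-day discount delta, normalized utilities):
      U_R   = (1-d^L) u_po + d^L [(1-q_f) U_R + q_f U_P]
      U_P   = (1-d^KL) u_ne + d^KL U_R
      U_dev = (1-d^L) u_d  + d^L [(1-q_m) U_P + q_m U_R]
    """
    dL, dKL = delta ** L, delta ** (K * L)
    # Substituting U_P into U_R gives U_R as a weighted average of u_po and u_ne.
    w_po, w_ne = 1 - dL, dL * q_f * (1 - dKL)
    u_r = (w_po * u_po + w_ne * u_ne) / (w_po + w_ne)
    u_p = (1 - dKL) * u_ne + dKL * u_r
    u_dev = (1 - dL) * u_d + dL * ((1 - q_m) * u_p + q_m * u_r)
    return u_r, u_p, u_dev


def is_ic(**kw):
    """Proposition 2: IC iff following sigma^R is at least as good as a one-shot deviation."""
    u_r, _, u_dev = long_term_utilities(**kw)
    return u_r >= u_dev


# Hypothetical stage utilities with u_d > u_po > u_ne and given error probabilities.
params = dict(u_po=0.75, u_ne=2 / 3, u_d=5 / 6, delta=0.99, L=5, q_f=0.05, q_m=0.05)
for K in (1, 2, 10):
    print(K, is_ic(**params, K=K))
```

In this example a one-cycle punishment is too weak to deter the deviation, while two or more punishment billing cycles make following \(\sigma^{R}\) optimal.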
We define the following “incentive ratio” of a protocol based on review strategies for consumer n:
Let us examine the meaning of this incentive ratio. The numerator represents the long-term utility gain due to deviation, and the denominator represents the maximal long-term utility loss due to the punishment. To induce consumers to cooperatively optimize their energy consumption, this loss must be positive and larger than the gain from deviating. Therefore, the incentive ratio should lie in the range [0,1]. Moreover, \(g_{n}(L,p_{\text{th}})\) must be strictly less than 1, since the denominator is only an upper bound on the loss induced by the punishment, not the actual loss (which depends on L, K, and \(p_{\text{th}}\)). Theorem 2 provides a condition under which a protocol based on review strategies is IC against a deviation action \({a^{d}_{n}}\). This condition serves as a guideline for choosing the protocol parameters L, K, and \(p_{\text{th}}\).
Theorem 2.
The protocol \(\Psi(L,K,p_{\text{th}})\) is IC against \({a^{d}_{n}}\) for consumer n if and only if the billing cycle length satisfies \(0 \le g_{n}(L,p_{\text{th}}) < 1\) and the punishment phase is long enough, i.e.,
\(KL \ge \log_{\delta}\left(1-g_{n}(L,p_{\text{th}})\right).\)
Proof.
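According to Proposition 2, we only need to check the sign of the utility difference
\(U_{n}(\sigma^{R}\mid s=1) - U_{n}(\tilde{\sigma}_{n}\mid s=1) = (1-\delta^{L})(u^{\text{PO}}_{n} - {u^{d}_{n}}) + \delta^{L}\left(1-q_{F}(L,p_{\text{th}})-q_{M,n}(L,p_{\text{th}})\right)\left[U_{n}(\sigma^{R}\mid s=1) - U_{n}(\sigma^{R}\mid s=0)\right].\)
In this expression, the magnitude of the first term, \((1-\delta^{L})(u^{\text{PO}}_{n} - {u^{d}_{n}}) < 0\), is the utility gain from deviating in the current billing cycle of the review phase, and the remaining term is the future utility loss due to the punishment. For Ψ to be IC, the utility loss due to the punishment must be at least the utility gain due to deviation, i.e.,
\(\delta^{L}\left(1-q_{F}(L,p_{\text{th}})-q_{M,n}(L,p_{\text{th}})\right)\left[U_{n}(\sigma^{R}\mid s=1) - U_{n}(\sigma^{R}\mid s=0)\right] \ge (1-\delta^{L})({u^{d}_{n}} - u^{\text{PO}}_{n}).\)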
The sufficient and necessary condition for the above inequality to hold is \(0 \le g_{n}(L,p_{\text{th}}) < 1\) and \(KL \ge \log_{\delta}\left(1-g_{n}(L,p_{\text{th}})\right)\).
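Given the incentive ratio \(g_{n}(L,p_{\text{th}})\) of each consumer (taken here as an input, since its expression is not repeated in this sketch), the condition of Theorem 2 translates directly into the smallest admissible number of punishment billing cycles. The helpers below are hypothetical: they return `None` when \(0 \le g_{n} < 1\) fails, in which case no K makes the protocol IC, they assume K is at least one billing cycle, and they take the maximum over consumers because the condition must hold for every n.

```python
import math
from typing import Optional, Sequence


def min_punishment_cycles(g: float, delta: float, L: int) -> Optional[int]:
    """Smallest integer K >= 1 with K*L >= log_delta(1 - g), or None if g is outside [0, 1)."""
    if not 0.0 <= g < 1.0:
        return None
    required_days = math.log(1.0 - g) / math.log(delta)  # log base delta, with 0 < delta < 1
    return max(1, math.ceil(required_days / L))


def punishment_length(gs: Sequence[float], delta: float, L: int) -> Optional[int]:
    """K must satisfy Theorem 2 for every consumer n, so take the maximum over n."""
    ks = [min_punishment_cycles(g, delta, L) for g in gs]
    return None if any(k is None for k in ks) else max(ks)


print(punishment_length([0.2, 0.5], delta=0.99, L=5))
```

Here the consumer with the larger incentive ratio (0.5) needs about 69 punishment days, i.e., 14 billing cycles of 5 days, and therefore determines K.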
Optimal protocol parameters
We first determine the efficiency of a given protocol. If a protocol is IC, then all consumers follow the recommended strategy and play the PO action profile. Therefore, the efficiency depends on the probability that the system is in review phases and punishment phases due to monitoring errors. Denote these two probabilities by η _{ R }(Ψ) and η _{ P }(Ψ)=1−η _{ R }(Ψ), respectively. The efficiency of an IC protocol is thus \(V(\Psi) = \sum _{n} (u^{\text {PO}}_{n} \eta _{R}(\Psi) + u^{\text {NE}}_{n} \eta _{P}(\Psi))\) where η _{ R }(Ψ) is determined in the following Lemma.
Lemma 1.
\(\eta_{R}(\Psi) = \frac{1}{1+K q_{F}(L,p_{\text{th}})}\)
Proof.
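Solving for the stationary distribution of the Markov chain in Fig. 6, whose states are billing cycles, yields the result. Let R denote a review billing cycle and \(P_{k}\) the k-th billing cycle of a punishment phase. The transitions are R→R with probability \(1-q_{F}\), R→\(P_{1}\) with probability \(q_{F}\), and \(P_{k}\)→\(P_{k+1}\) (k<K) and \(P_{K}\)→R with probability 1. The stationary distribution therefore assigns probability \(\eta_{R}\) to R and \(q_{F}\eta_{R}\) to each \(P_{k}\), so that \(\eta_{R}(1+Kq_{F})=1\).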
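Lemma 1 can be checked numerically by iterating the billing-cycle Markov chain described in the proof (states R, \(P_{1}, \dots, P_{K}\)) until it reaches its stationary distribution and comparing the review-state probability with \(1/(1+Kq_{F})\). The sketch also evaluates the resulting sum utility V(Ψ) and efficiency loss C(Ψ) of an IC protocol; the function names and the stage utilities in the example are hypothetical, and the power iteration assumes \(0 < q_F < 1\) so that the chain is aperiodic.

```python
def review_fraction(K: int, q_f: float, iters: int = 5_000) -> float:
    """Stationary probability of the review state, by power iteration (0 < q_f < 1)."""
    pi = [1.0] + [0.0] * K            # index 0 = R, index k = P_k
    for _ in range(iters):
        nxt = [0.0] * (K + 1)
        nxt[0] += pi[0] * (1 - q_f)   # R -> R (no false alarm)
        nxt[1] += pi[0] * q_f         # R -> P_1 (false alarm)
        for k in range(1, K):
            nxt[k + 1] += pi[k]       # P_k -> P_{k+1}
        nxt[0] += pi[K]               # P_K -> R (punishment ends)
        pi = nxt
    return pi[0]


def efficiency_loss(u_po, u_ne, K, q_f):
    """C(Psi) = (V* - V(Psi)) / V*, with V(Psi) = sum_n u_po*eta_R + u_ne*eta_P."""
    eta_r = review_fraction(K, q_f)
    v_star = sum(u_po)
    v = sum(up * eta_r + un * (1 - eta_r) for up, un in zip(u_po, u_ne))
    return (v_star - v) / v_star


K, q_f = 3, 0.1
print(review_fraction(K, q_f), 1 / (1 + K * q_f))
print(efficiency_loss([0.75, 0.75], [2 / 3, 2 / 3], K, q_f))
```

Comparing the returned loss with Δ checks the Δ-PO condition of Definition 3 for a given design.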
Lemma 1 implies that, to maximize the system efficiency, the protocol designer should choose L, K, and \(p_{\text{th}}\) such that \(K q_{F}(L,p_{\text{th}})\) is minimized subject to the IC conditions in Theorem 2. These design parameters are coupled in a complex manner, so we determine the optimal parameters by working backwards in three steps.
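Step 1. We first determine the optimal \(K^{*}(L,p_{\text{th}})\) for given L and \(p_{\text{th}}\), which fix the false alarm probability \(q_{F}(L,p_{\text{th}})\) and the miss detection probabilities \(q_{M,n}(L,p_{\text{th}})\). If \(0 \le g_{n}(L,p_{\text{th}}) < 1\) for all n, then, since \(\eta_{R}\) decreases in K, the optimal K is the smallest integer satisfying the condition of Theorem 2 for every consumer:
\(K^{*}(L,p_{\text{th}}) = \max_{n \in \mathcal{N}} \left\lceil \frac{\log_{\delta}\left(1-g_{n}(L,p_{\text{th}})\right)}{L} \right\rceil.\)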
Step 2. Given L, the statistical test determines \(q_{F}(L,p_{\text{th}})\) and \(q_{M,n}(L,p_{\text{th}})\) for all \(n \in \mathcal{N}\). Note that \(K^{*}(L,p_{\text{th}})\) depends on the statistical test through the term \(q_{F}(L,p_{\text{th}})+q_{M,n}(L,p_{\text{th}})\). Therefore, the optimal threshold is chosen as
\(p^{*}_{\text{th}}(L) = \arg\min_{p_{\text{th}}} K^{*}(L,p_{\text{th}})\, q_{F}(L,p_{\text{th}}).\) (19)
In general, the threshold \(p_{\text{th}}\) has two opposing effects on \(K^{*}(L,p_{\text{th}})\,q_{F}(L,p_{\text{th}})\). Minimizing \(q_{F}(L,p_{\text{th}})\) (i.e., raising the threshold) often leads to a large \(q_{M,n}(L,p_{\text{th}})\), so that \(q_{F}(L,p_{\text{th}})+q_{M,n}(L,p_{\text{th}})\) may also be large, which by (16) induces a large \(K^{*}(L,p_{\text{th}})\). Therefore, when selecting \(p_{\text{th}}\), the protocol designer has to trade off the false alarm probability \(q_{F}(L,p_{\text{th}})\) against the punishment phase length \(K^{*}(L,p_{\text{th}})\).
Step 3. The previous two steps provide the optimal \(p^{*}_{\text {th}}\) and \(K^{*}(L, p_{\text {th}}^{*})\) given the review phase length L. However, the space of L includes all positive integer numbers and is infinite. In the following, we determine the upper bound on L such that an IC protocol can be designed.
Proposition 4.
If Ψ(L,K,p _{ th }) is IC, then
Proof.
Note that
If (22) does not hold, then there must exist \(n \in \mathcal {N}\) such that g _{ n }(L,p _{th})>1 which violates the IC condition in Theorem 2. Therefore, for a protocol to be IC, (22) must be satisfied.
Proposition 4 reveals a crucial trade-off in the review phase length. On the one hand, the protocol designer wants a longer review phase L because it improves the monitoring accuracy, so the system is less likely to enter a punishment phase due to monitoring errors. On the other hand, a longer review phase increases a consumer’s current gain from deviating in the review phase and reduces its future loss due to the punishment, because future utility is discounted. This requires a stronger punishment (a longer punishment phase), which in turn induces more energy consumption in peak hours, since the system stays longer in the punishment phase. More importantly, if the review phase is too long, then even the strongest punishment (i.e., a trigger strategy that prescribes staying in the punishment phase forever after a detected deviation) cannot provide sufficient incentives for the consumers to schedule the PO energy consumption in the current review phase. Therefore, given the consumers’ valuation of future utility (i.e., the discount factor δ) and the structure of the stage game (i.e., \({u^{d}_{n}}, u^{\text{PO}}_{n}, u^{\text{NE}}_{n}, \forall n \in \mathcal{N}\)), there is a maximum length of the review phase, and hence, the billing cycle should not be too long. To make the protocol IC, the protocol designer must choose a review phase no longer than the upper bound in Proposition 4. The optimal review phase length L (i.e., billing cycle) is then the one that minimizes the product of \(K^{*}(L, p^{*}_{\text{th}})\) and \(q_{F}(L, p^{*}_{\text{th}})\), which were determined in the previous two steps for a fixed L. Based on these three design steps, we propose a greedy algorithm (presented in Table 2) that determines the optimal protocol based on review strategies with only a finite number of iterations over L.
If the candidate threshold \(p_{\text{th}}\) belongs to a continuous interval, then \(p_{\text{th}}\) must be quantized to solve (19) in a finite number of iterations, and hence, the algorithm may yield a suboptimal protocol.
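The three design steps can be combined into a simple search. The sketch below only stands in for the greedy algorithm of Table 2: it is an exhaustive search over a quantized threshold grid, and it rests on several assumptions that are not part of this section, namely i.i.d. Gaussian price noise (so that \(q_F\) and \(q_M\) have closed forms), a single representative consumer, an IC check obtained by solving the long-term utility recursions directly instead of through \(g_n\), and a hypothetical cap `L_max` in place of the bound of Proposition 4. All numbers are hypothetical.

```python
from math import erf, sqrt


def std_normal_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def error_probs(L, p_th, delta_p, sigma):
    """(q_F, q_M) of the average-price test under i.i.d. N(0, sigma^2) noise (assumed)."""
    scale = sqrt(L) / sigma
    return 1.0 - std_normal_cdf(p_th * scale), std_normal_cdf((p_th - delta_p) * scale)


def is_ic(u_po, u_ne, u_d, delta, L, K, q_f, q_m):
    """Compare U(sigma^R | s=1) with the one-shot deviation utility (Proposition 2)."""
    dL, dKL = delta ** L, delta ** (K * L)
    w_po, w_ne = 1 - dL, dL * q_f * (1 - dKL)
    u_r = (w_po * u_po + w_ne * u_ne) / (w_po + w_ne)
    u_p = (1 - dKL) * u_ne + dKL * u_r
    u_dev = (1 - dL) * u_d + dL * ((1 - q_m) * u_p + q_m * u_r)
    return u_r >= u_dev


def design_protocol(u_po, u_ne, u_d, delta, delta_p, sigma,
                    L_max=40, n_levels=20, K_max=200):
    """Return (K*q_F, L, K, p_th) minimizing K*q_F over IC designs, or None."""
    best = None
    for L in range(1, L_max + 1):                      # Step 3: finite range of L
        for i in range(1, n_levels):                   # quantized p_th in (0, delta_p)
            p_th = delta_p * i / n_levels
            q_f, q_m = error_probs(L, p_th, delta_p, sigma)
            # Step 1: smallest IC punishment length (IC only gets easier as K grows).
            K = next((k for k in range(1, K_max + 1)
                      if is_ic(u_po, u_ne, u_d, delta, L, k, q_f, q_m)), None)
            # Steps 2 and 3: keep the (L, p_th) pair with the smallest K * q_F.
            if K is not None and (best is None or K * q_f < best[0]):
                best = (K * q_f, L, K, p_th)
    return best


print(design_protocol(u_po=0.75, u_ne=2 / 3, u_d=5 / 6, delta=0.99,
                      delta_p=0.25, sigma=0.5))
```

Because a longer punishment widens the gap between the review-phase and punishment-phase utilities, scanning K upwards and stopping at the first IC value yields \(K^{*}\) for each \((L, p_{\text{th}})\) pair.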
Performance evaluation
In the previous subsection, we determined the optimal protocol design. In this subsection, we characterize the performance of these optimal protocols. Specifically, we determine whether a protocol Ψ based on review strategies can be Δ-PO.
Recall that the first-best efficiency is \(V^{*} = \sum_{n} u^{\text{PO}}_{n}\), and the sum utility of a given IC protocol can be determined using Lemma 1. Therefore, if \(\eta_{R}(\Psi)\) is close enough to 1, then \((V^{*}-V(\Psi))/V^{*}\) can be made less than a given Δ. According to Lemma 1, this is the case if \(q_{F}\) is close enough to 0 and K is finite; that is, an accurate enough statistical test is required. By Proposition 3, the monitoring becomes accurate enough if L is long enough. However, by Proposition 4, a long L can be chosen only when the consumers value future utilities sufficiently highly. The following theorem characterizes when a Δ-PO protocol exists.
Theorem 3.
For every \(p_{\text{th}} \in \left(0, {p^{d}_{n}} - p^{\text{PO}}_{n}\right)\) and a given Δ∈(0,1), there exists \(\delta_{\min} \in (0,1)\) such that, for all \(\delta \ge \delta_{\min}\), there exists an IC protocol \(\Psi(L,K,p_{\text{th}})\) that is Δ-PO.
Proof.
Let us write ΔPO condition in terms of η _{ R } and q _{ F }. V(Ψ)≥(1−Δ)V ^{∗} implies:
Note that Δ∈(0,1)→A∈(0,1), Δ=0→A=1, and Δ=1→A=0.
Fix \(p_{\text{th}} = \bar{p}_{\text{th}} \in \left(0, {p^{d}_{n}} - p^{\text{PO}}\right)\) and
Now, we select \(L = \bar {L}\) such that
Such \(\bar {L}\) exists due to the law of large numbers (see Proposition 3).
\(\Psi(\bar{L}, \bar{K}, \bar{p}_{\text{th}})\) is a Δ-PO protocol by the above construction. We now show that it is also IC for δ close enough to 1. The condition under which \(\Psi(\bar{L}, \bar{K}, \bar{p}_{\text{th}})\) is an IC protocol is:
For δ→1, the above inequality becomes
This implies that there exists a \(\delta_{\min}\) close to 1 such that, for every \(\delta \ge \delta_{\min}\), the above inequality defines an IC protocol (the inequality must be strict). Since (28) is exactly the same as (25), the constructed protocol \(\Psi(\bar{L}, \bar{K}, \bar{p}_{\text{th}})\) is IC for \(\delta \ge \delta_{\min}\).
Theorem 3 characterizes the asymptotic performance of the protocols based on review strategies. Specifically, when the consumers value their future utilities highly enough, the protocol designer can design a protocol based on review strategies whose sum utility is arbitrarily close to the first-best. That is, the system spends most of its time in review phases, in which consumers follow the recommended PO scheduling.
Corollary 1.
If δ→1, then the efficiency loss of the optimal protocol goes to 0.
The corollary states that, as the consumers discount their future utilities less and less (δ→1), the protocol designer can design a review-strategy-based protocol that is IC and asymptotically achieves full efficiency.
Remark: Our analysis depends on the deviation action \({a^{d}_{n}}\). To compute \({a^{d}_{n}}\), we consider the unilateral deviation of consumer n from the optimal action profile \(\boldsymbol{a}^{\text{OPT}}\), i.e., \({a^{d}_{n}} = a^{*}_{n} = \arg\max_{a_{n}} u_{n}(a_{n}; \boldsymbol{a}^{\text{OPT}}_{-n})\). In this way, a consumer deviates to the action that maximizes its own utility. However, a “smarter” consumer can deviate to an action slightly lower than \(a^{*}_{n}\) to avoid being detected while still gaining some utility (since there is noise, the closer its action is to \(a^{*}_{n}\), the higher the probability of being detected). Hence, a more practical way to set \({a^{d}_{n}}\) is as a maximal tolerable deviation action \({a^{d}_{n}} = (1+\gamma)a^{\text{OPT}}_{n}\), where 0<γ<1 depends on the maximal tolerable social welfare loss. Since, in this case, the (one-shot) social welfare loss is at most \(u_{n}(\boldsymbol{a}^{\text{OPT}}_{n}) - u_{n}((1+\gamma)\boldsymbol{a}^{\text{OPT}})\), the designer can determine γ from the maximal tolerable social welfare loss and set \({a^{d}_{n}}\) accordingly.
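For intuition about choosing γ, the sketch below uses a hypothetical two-consumer instance (unit demands, \(b_{\text{peak}}=2\), \(b_{\text{offpeak}}=1\), \(c_p=1\), for which \(a^{\text{OPT}}_1=a^{\text{OPT}}_2=0.5\)). It measures the one-shot loss in sum utility when consumer 1 plays \((1+\gamma)a^{\text{OPT}}_1\), and finds by bisection the largest γ whose loss stays within a tolerance. Measuring the loss as the drop in total benefit minus total generation cost is an assumption of this sketch, as are all names and numbers.

```python
B_PEAK, B_OFF, C_P = 2.0, 1.0, 1.0   # hypothetical parameters, unit demands
A_OPT = (0.5, 0.5)                   # optimal peak-hour fractions for these parameters


def welfare(a):
    """Sum utility = total benefit minus total generation cost (payments cancel out)."""
    x, y = sum(a), sum(1 - ai for ai in a)
    benefit = B_PEAK * x + B_OFF * y
    return benefit - (C_P * x**2 + 0.5 * C_P * y**2)


def one_shot_loss(gamma):
    """Welfare loss when consumer 1 plays (1 + gamma) * a_1^OPT and consumer 2 complies."""
    deviated = ((1 + gamma) * A_OPT[0], A_OPT[1])
    return welfare(A_OPT) - welfare(deviated)


def max_tolerable_gamma(max_loss, tol=1e-10):
    """Largest gamma in [0, 1] with one_shot_loss(gamma) <= max_loss (loss increases in gamma here)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if one_shot_loss(mid) <= max_loss:
            lo = mid
        else:
            hi = mid
    return lo


gamma = max_tolerable_gamma(0.06)
print(round(gamma, 4), (1 + gamma) * A_OPT[0])  # gamma and the resulting a_1^d
```

In this instance, the loss grows quadratically in γ, so tolerating a one-shot welfare loss of 0.06 allows γ = 0.4, i.e., a deviation action of 0.7.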
Simulations
In this section, we present numerical results and assess the performance of the proposed framework. For illustration purposes, we assume that the benefit function takes the linear form \(B_{n}(a_{n}) = b_{\text{peak}}\, a_{n} d_{n} + b_{\text{offpeak}}(1-a_{n}) d_{n}\) and that the cost functions have the quadratic forms [3] \(C_{\text{peak}}(x) = c_{p} x^{2}\) and \(C_{\text{offpeak}}(x) = 0.5\,c_{p} x^{2}\). We regard \(c_{p}\) as a parameter determined by the power generation technology and the power market, and we investigate its impact on the protocol design and performance.
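With these functional forms, the socially optimal and the equilibrium peak-hour usage can be computed directly. The sketch below additionally assumes (hypothetically) that each day's total cost is billed in proportion to daily demand, i.e., a price equal to total cost divided by total demand. Under that assumption, the social optimum has aggregate peak usage \(x^{\text{PO}} = (b_{\text{peak}}-b_{\text{offpeak}}+c_p D)/(3c_p)\) with \(D=\sum_n d_n\), and each consumer's first-order condition singles out a preferred aggregate, so the NE can be found by best-response iteration. Function names and numbers are illustrative.

```python
def peak_usage(demands, b_peak, b_off, c_p, iters=1000, tol=1e-12):
    """Return (x_PO, x_NE, a_NE): optimal and equilibrium aggregate peak-hour usage."""
    D = sum(demands)

    def clip(v):
        return min(1.0, max(0.0, v))

    # Social optimum: maximize total benefit minus total cost over aggregate peak usage x.
    x_po = min(D, max(0.0, (b_peak - b_off + c_p * D) / (3 * c_p)))
    # Nash equilibrium: consumer n's first-order condition pins the aggregate it prefers.
    a = [0.0] * len(demands)
    for _ in range(iters):
        largest_change = 0.0
        for n, d_n in enumerate(demands):
            x_others = sum(a_i * d_i for i, (a_i, d_i) in enumerate(zip(a, demands)) if i != n)
            x_target = (D + (b_peak - b_off) * D / (c_p * d_n)) / 3
            new_a = clip((x_target - x_others) / d_n)   # best response, kept in [0, 1]
            largest_change = max(largest_change, abs(new_a - a[n]))
            a[n] = new_a
        if largest_change < tol:
            break
    x_ne = sum(a_i * d_i for a_i, d_i in zip(a, demands))
    return x_po, x_ne, a


print(peak_usage([1.0, 2.0, 3.0], b_peak=2.0, b_off=1.0, c_p=1.0))
```

In this instance, the equilibrium aggregate peak usage (3) exceeds the socially optimal one (7/3), which is the kind of over-consumption in peak hours that the review-strategy protocol is designed to prevent.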
Figure 5 shows how the optimal billing cycle depends on the consumers’ valuation of future utilities, reflected by the discount factor δ. On the one hand, a longer billing cycle improves the monitoring accuracy, so the system is less likely to enter the punishment phase; hence, the efficiency increases with the billing cycle length. On the other hand, an overly long billing cycle reduces the consumers’ valuation of future utilities and, beyond some point, violates the consumers’ incentives to comply. Moreover, the optimal billing cycle depends on how much the consumers value future utilities, as parameterized by δ: when δ is large, consumers value their future utilities more, and hence, a longer billing cycle can be used.
Figures 6 and 7 show the impact of environment parameters (the consumer set size and the cost function) on the performance of the proposed protocols. In Fig. 6, we investigate the impact of the consumer set size. As the number of consumers increases, the protocol is able to provide sufficient incentives for consumers to follow the recommended optimal energy scheduling actions, and therefore, improved performance is attained. Fig. 7 varies the cost function parameter \(c_{p}\) in peak hours and illustrates the corresponding efficiency for the optimal protocol and for two protocols with fixed design parameters. As the cost in peak hours increases, the efficiency decreases. The system efficiency of the optimal protocol is 80 to 250 % higher than that obtained when no incentive protocol is deployed, and the optimal protocol outperforms the fixed protocols by 50 % on average. This highlights that optimizing the protocol design can significantly improve the system efficiency. Varying the cost functions corresponds to different utility functions for the consumers. Since these different utilities require different optimal scheduling schemes, these simulations also highlight the capability of the proposed framework to design review-strategy-based protocols that are applicable to a variety of DSM schemes.
In Fig. 8, we consider a set of eight consumers with various load demands. The first subplot shows the consumers’ energy consumption scheduling actions with and without the proposed protocol. With the protocol deployed, consumers tend to consume less energy in peak hours. At the same time, their individual utilities increase, as shown in the second subplot, because the energy cost is lower and hence the bills charged to consumers are smaller. The third subplot shows the prices for various discount factors. Since the price is proportional to the energy cost, a lower price indicates a more energy-efficient system. When the consumers value future utilities more (i.e., the discount factor is larger), they schedule less energy consumption in peak hours, and therefore, the price is also lower.
Figure 9 shows a real-time price curve for the considered set of consumers. Note that the price also reflects the energy generation and transmission cost of this consumer set. The energy price is low in the review phase, since consumers schedule less energy consumption in peak hours. Due to imperfect monitoring, if the average price of the previous billing cycle exceeds the predetermined threshold, the system enters the punishment phase. In the punishment phase, the energy price is high because consumers schedule more consumption in peak hours. When the punishment phase ends, the system automatically returns to a new review phase and the price drops again.
Conclusions
In this paper, we augment existing DSM schemes that use aggregate usage information of a set of participating consumers by proposing a novel framework that incentivizes consumers to reduce their consumption in peak hours. The proposed framework is based on review-strategy-based protocols. It is general and can be deployed in conjunction with any DSM scheme proposed for smart grids to make it incentive-compatible (i.e., so that selfish consumers find it in their self-interest to follow it). The proposed protocols are simple and thus suitable for practical implementation in smart grids, where the energy scheduling actions cannot be perfectly monitored. Even though this paper considers a simplified smart grid model, our analysis provides important and useful insights for designing incentive-compatible demand-side management schemes based on aggregate energy usage information in a variety of practical scenarios.
The success of DSM relies heavily on the energy producer’s knowledge of individual consumers’ energy usage information. However, a major concern is that smart meters threaten consumers’ privacy, since data mining techniques can be applied to energy consumption traces to infer consumers’ habits and behaviors [2, 23, 24]. This information may be used for purposes other than improving the efficiency of smart grids, which gives rise to privacy concerns [25]. In response to these concerns, California’s new smart meter privacy law [26] was adopted in 2010; it mandates privacy protection for consumers’ energy consumption data. The proposed scheme does not require detailed knowledge of household electricity demand profiles at the appliance level or at a fine time granularity. Instead, it only needs the aggregate usage of individual consumers at large time scales, e.g., the aggregate usage in the peak hours and in the off-peak hours of a day. Such coarse information reveals little about the actual consumption traces of individual consumers, and hence, we believe that the proposed scheme can be applied in a wide range of practical deployment scenarios.
Endnotes
^{1} The proposed protocol can also be extended to directly recommend per-appliance scheduling actions. However, such an extension would only complicate the notation and presentation, while the proposed methods would remain unchanged.
^{2} In practice, the utility function may dynamically change over time. In this paper, we make the common assumption that the utility function is fixed. One way to handle the dynamically changing utility functions is by periodically updating the design of the proposed protocol to adapt to the changes.
^{3} The proposed method allows both continuous and discrete action spaces. For illustration purposes, we consider only a continuous action space in this paper.
^{4} If consumers consume less than the recommended amount of energy in peak hours, the price decreases. Since self-interested consumers only have incentives to deviate to a higher consumption in peak hours (by Proposition 1), we do not regard consuming less in peak hours as a deviation.
References
 1
SM Amin, BF Wollenberg, Toward a smart grid: power delivery for the 21st century. IEEE Power Energy Mag. 3(5), 34–41 (2005).
 2
Z Fan, P Kulkarni, S Gormus, C Efthymiou, G Kalogridis, M Sooriyabandara, Z Zhu, S Lambotharan, WH Chin, Smart grid communications: overview of research challenges, solutions, and standardization activities. IEEE Commun. Surv. Tutorials. 15(1), 21–38 (2013).
 3
AH Mohsenian-Rad, VW Wong, J Jatskevich, R Schober, A Leon-Garcia, Autonomous demand-side management based on game-theoretic energy consumption scheduling for the future smart grid. IEEE Trans. Smart Grid. 1(3), 320–331 (2010).
 4
GJ Mailath, L Samuelson, Repeated Games and Reputations, vol. 2 (Oxford University Press, Oxford, UK, 2006).
 5
CW Gellings, JH Chamberlin, Demand-side management: concepts and methods (The Fairmont Press Inc., Lilburn, GA, 1987).
 6
H Allcott, Real time pricing and electricity markets (Harvard University, 2009).
 7
C Triki, A Violi, Dynamic pricing of electricity in retail markets. 4OR. 7(1), 21–36 (2009).
 8
K Herter, Residential implementation of critical-peak pricing of electricity. Energy Policy. 35(4), 2121–2130 (2007).
 9
B Ramanathan, V Vittal, A framework for evaluation of advanced direct load control with minimum disruption. IEEE Trans. Power Syst. 23(4), 1681–1688 (2008).
 10
M Alizadeh, A Scaglione, RJ Thomas, From packet to power switching: digital direct load scheduling. IEEE J. Selected Areas Commun. 30(6), 1027–1036 (2012).
 11
L Jia, L Tong, in Communication, Control, and Computing (Allerton), 2012 50th Annual Allerton Conference on. Optimal pricing for residential demand response: a stochastic optimization approach (IEEE, Allerton Park, IL, USA, 2012), pp. 1879–1884.
 12
HI Su, A El Gamal, in Communication, Control, and Computing (Allerton), 2011 49th Annual Allerton Conference on. Modeling and analysis of the role of fast-response energy storage in the smart grid (IEEE, Allerton Park, IL, USA, 2011), pp. 719–726.
 13
L Huang, J Walrand, K Ramchandran, in Smart Grid Communications (SmartGridComm), 2012 IEEE Third International Conference on. Optimal demand response with energy storage management (IEEE, Tainan, ROC, 2012), pp. 61–66.
 14
Z Yu, L Jia, MC Murphy-Hoye, A Pratt, L Tong, Modeling and stochastic control for home energy management. IEEE Trans. Smart Grid. 4(4), 2244–2255 (2013).
 15
N Li, L Chen, SH Low, in Power and Energy Society General Meeting, 2011 IEEE. Optimal demand response based on utility maximization in power networks (IEEE, San Diego, CA, USA, 2011), pp. 1–8.
 16
C Joe-Wong, S Sen, S Ha, M Chiang, Optimized day-ahead pricing for smart grids with device-specific scheduling flexibility. IEEE J. Selected Areas Commun. 30(6), 1075–1085 (2012).
 17
C Ibars, M Navarro, L Giupponi, in Smart grid communications (SmartGridComm), 2010 first IEEE international conference on. Distributed demand management in smart grid with a congestion game (IEEE, Gaithersburg, MD, USA, 2010), pp. 495–500.
 18
HK Nguyen, JB Song, Z Han, in Computer Communications Workshops (INFOCOM WKSHPS), 2012 IEEE Conference on. Demand side management to reduce peak-to-average ratio using game theory in smart grid (IEEE, 2012), pp. 91–96.
 19
L Song, Y Xiao, M van der Schaar, Demand side management in smart grids using a repeated game framework. IEEE J. Selected Areas Commun. 32, 1412–1424 (2014).
 20
R Radner, Repeated principal-agent games with discounting. Econometrica: J. Econ. Soc. 53(5), 1173–1198 (1985).
 21
JB Rosen, Existence and uniqueness of equilibrium points for concave N-person games. Econometrica: J. Econ. Soc. 33(3), 520–534 (1965).
 22
AJ Wood, BF Wollenberg, Power Generation, Operation, and Control (Wiley, USA, 2012).
 23
P McDaniel, S McLaughlin, Security and privacy challenges in the smart grid. IEEE Secur. Privacy. 7(3), 75–77 (2009).
 24
MK El Mahrsi, S Vignes, G Hébrail, ML Picard, in Research Challenges in Information Science, 2009. RCIS 2009. Third International Conference on. A data stream model for home device description (IEEE, Fez, Morocco, 2009), pp. 395–402.
 25
G Kalogridis, C Efthymiou, SZ Denic, TA Lewis, R Cepeda, in Smart Grid Communications (SmartGridComm), 2010 First IEEE International Conference on. Privacy for smart meters: towards undetectable appliance load signatures (IEEE, Gaithersburg, MD, USA, 2010), pp. 232–237.
 26
C King, California’s new landmark smart meter privacy law (2010). http://www.emeter.com/2010/californiasnewlandmarksmartmeterprivacylaw/.
Acknowledgements
This research is supported by NSF CCF 1218136.
Additional information
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0), which permits use, duplication, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Xu, J., van der Schaar, M. Incentive-compatible demand-side management for smart grids based on review strategies. EURASIP J. Adv. Signal Process. 2015, 51 (2015). https://doi.org/10.1186/s13634-015-0235-9
Keywords
 Nash Equilibrium
 Smart Grid
 Schedule Action
 Utility Company
 Review Strategy