In this section, we design the optimal incentive-compatible protocols based on review strategies and analyze their performance. The outline of this section is as follows.

We first establish the existence of IC protocols based on review strategies and provide the IC conditions under which the consumers have incentives to follow the recommended scheduling in the review phase.

Next, we propose a greedy algorithm to determine the optimal design of review strategies.

We then evaluate the performance of the optimal protocol based on review strategies. Specifically, the proposed protocol is Δ-PO, i.e., it achieves at least a (1−Δ) fraction of the first-best efficiency.
5.1 Incentive compatibility
Recall that Proposition 2 provides us with a simple method to determine whether a protocol based on review strategies can be IC. In this subsection, we study how to design the protocol parameters according to Proposition 2. To do this, we need to compute \(U_{n}(\sigma^{R}|s=1)\), \(U_{n}(\sigma^{R}|s=0)\), and \(U_{n}(\tilde{\sigma}_{n}|s=1)\), where s=1 denotes that the system is in a review phase and s=0 denotes that the system is in a punishment phase. These utilities depend on each other as follows:
$$ \begin{aligned} &U_{n}\left(\sigma^{R}|s=1\right)\\ &= \left(1-\delta\right)\sum\limits_{t=0}^{L-1} \delta^{t} u^{\text{PO}}_{n} + \delta^{L}\left[ q_{F}\left(L, p_{\text{th}}\right)U_{n}\left(\sigma^{R}|s=0\right)\right.\\ &\quad\left.+\left(1-q_{F}\left(L, p_{\text{th}}\right)\right)U_{n}\left(\sigma^{R}|s=1\right)\right] \end{aligned} $$
(13)
and
$$ \begin{aligned} &U_{n}\left(\sigma^{R}|s=0\right)\\ &= \left(1-\delta\right)\sum\limits_{t=0}^{KL-1}\delta^{t} u^{\text{NE}}_{n} + \delta^{KL} U_{n}\left(\sigma^{R}|s=1\right) \end{aligned} $$
(14)
The first term in (13) is the utility in the current review phase. The second term is the continuation utility after the review phase: with probability \(1-q_{F}(L,p_{\text{th}})\), the system remains in the review phase; with probability \(q_{F}(L,p_{\text{th}})\), the system moves to a punishment phase due to the monitoring error. The utility of consumer n who chooses a deviation scheduling action \({a^{d}_{n}}\) at the beginning of a review phase is given by
$$ \begin{aligned} &U_{n}\left(\tilde{\sigma}_{n}|s=1\right)\\ &=\left(1-\delta\right)\sum\limits_{t=0}^{L-1}\delta^{t} u^{d}_{n} + \delta^{L}\left[q_{M,n}\left(L, p_{\text{th}}\right)U_{n}\left(\sigma^{R}|s=1\right)\right.\\ &\quad\left.+\left(1-q_{M,n}\left(L, p_{\text{th}}\right)\right)U_{n}\left(\sigma^{R}|s=0\right)\right] \end{aligned} $$
(15)
The first term in (15) is the utility in the current review phase, in which the per-period deviation utility \(u^{d}_{n}\) is larger than \(u^{\text{PO}}_{n}\). The second term is the continuation utility after the review phase: with probability \(1-q_{M,n}(L,p_{\text{th}})\), the deviation is detected and the system moves to a punishment phase; with probability \(q_{M,n}(L,p_{\text{th}})\), the deviation goes undetected and the system remains in the review phase.
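As a concrete check, the coupled recursions (13) and (14) can be solved in closed form for the two phase utilities. The sketch below does this numerically; it is a minimal illustration under assumed parameter values (δ, L, K, per-period utilities, and a false alarm probability), not code from the paper, and it keeps the continuation terms outside the (1−δ) normalization, the standard convention for average discounted utility.

```python
def phase_utilities(L, K, delta, u_po, u_ne, q_f):
    """Solve the coupled recursions (13)-(14) for U(s=1) and U(s=0).

    (13): U1 = (1-delta) * sum_{t<L} delta^t * u_po
               + delta^L * (q_f * U0 + (1 - q_f) * U1)
    (14): U0 = (1-delta) * sum_{t<KL} delta^t * u_ne + delta^{K L} * U1
    Substituting (14) into (13) leaves a single linear equation in U1.
    """
    a, b = delta**L, delta**(K * L)
    g_rev = (1 - a) * u_po   # closed form of the review-phase sum
    g_pun = (1 - b) * u_ne   # closed form of the punishment-phase sum
    U1 = (g_rev + a * q_f * g_pun) / (1 - a * (q_f * b + 1 - q_f))
    U0 = g_pun + b * U1
    return U1, U0
```

With perfect monitoring (q_f = 0), U(s=1) collapses to \(u^{\text{PO}}_{n}\), as expected; with q_f > 0, it lies strictly between \(u^{\text{NE}}_{n}\) and \(u^{\text{PO}}_{n}\).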
We define the following “incentive ratio” of a protocol based on review strategies for consumer n:
$$\begin{array}{*{20}l} g_{n}(L, p_{\text{th}}) = \frac{1}{1 - q_{F} - q_{M,n}}\cdot\frac{1-\delta^{L}}{\delta^{L}}\cdot\frac{u^{d}_{n} - u^{\text{PO}}_{n}}{u^{\text{PO}}_{n} - u^{\text{NE}}_{n}} \end{array} $$
(16)
Let us examine the physical meaning of this incentive ratio. Essentially, the numerator represents the long-term utility gain due to deviation, and the denominator represents the maximal long-term utility loss due to the punishment. To induce consumers to cooperatively optimize their energy consumption, this loss should be positive and larger than the gain obtained by deviating. Therefore, the incentive ratio should lie in the range [0,1). Note that \(g_{n}(L,p_{\text{th}})\) must be strictly less than 1 because the denominator is only an upper bound on the loss induced by the punishment, not the actual loss (which depends on L, K, and \(p_{\text{th}}\)). Theorem 2 provides a condition under which a protocol based on review strategies is IC against a deviation action \({a^{d}_{n}}\). This condition serves as a guideline for choosing the protocol parameters L, K, and \(p_{\text{th}}\).
Theorem 2.
The protocol \(\Psi(L,K,p_{\text{th}})\) is IC against \({a^{d}_{n}}\) for consumer n if and only if the incentive ratio satisfies \(0\leq g_{n}(L,p_{\text{th}})<1\) and the punishment phase length is large enough, i.e.,
$$\begin{array}{*{20}l} K\geq \frac{1}{L}\log_{\delta}\left(1-g_{n}\left(L, p_{\text{th}}\right)\right) \end{array} $$
(17)
Proof.
According to Proposition 2, we only need to check that the utility difference below is nonnegative:
$$ \begin{aligned} &U_{n}\left(\sigma^{R}|s=1\right) - U_{n}\left(\tilde{\sigma}_{n}|s=1\right)\\ &=\left(1-\delta\right)\sum\limits_{t=0}^{L-1} \delta^{t} \left(u^{\text{PO}}_{n} - u^{d}_{n}\right) \\ &\quad+ \delta^{L} \left(1 - q_{F} - q_{M, n}\right)\left(U_{n}\left(\sigma^{R}|s=1\right) - U_{n}\left(\sigma^{R}|s=0\right)\right)\\ &= \left(1-\delta^{L}\right)\left(u^{\text{PO}}_{n} - u^{d}_{n}\right) \\ &\quad+ \delta^{L}\left(1-\delta^{KL}\right)\left(1-q_{F} - q_{M, n}\right)\left(u^{\text{PO}}_{n} - u^{\text{NE}}_{n}\right) \end{aligned} $$
(18)
In the last equality, \((1-\delta^{L})(u^{d}_{n} - u^{\text{PO}}_{n})\) is the utility gain in the current review-phase billing cycle due to deviation, and the remaining term is the future utility loss due to the punishment. For Ψ to be IC, the utility loss due to the punishment should exceed the utility gain due to deviation, i.e.,
$$ \begin{aligned} &\left(1-\delta^{L}\right)\left(u^{d}_{n} - u^{\text{PO}}_{n}\right) \leq \\ &\delta^{L}\left(1-\delta^{KL}\right)\left(1-q_{F} - q_{M, n}\right)\left(u^{\text{PO}}_{n} - u^{\text{NE}}_{n}\right) \end{aligned} $$
(19)
The necessary and sufficient condition for the above to hold is \(0\leq g_{n}(L,p_{\text{th}})<1\) and \(KL \geq \log_{\delta}(1-g_{n}(L,p_{\text{th}}))\), which is equivalent to (17).
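The condition in Theorem 2 is easy to evaluate numerically. Below is a minimal sketch, assuming illustrative utility values and monitoring error probabilities; none of these numbers come from the paper.

```python
import math

def incentive_ratio(L, delta, q_f, q_m, u_d, u_po, u_ne):
    """Incentive ratio g_n(L, p_th) of (16); q_f and q_m are the monitoring
    error probabilities induced by the chosen (L, p_th)."""
    return (1.0 / (1.0 - q_f - q_m)) \
        * ((1.0 - delta**L) / delta**L) \
        * ((u_d - u_po) / (u_po - u_ne))

def min_punishment_length(L, delta, g):
    """Smallest integer K satisfying (17): K >= (1/L) * log_delta(1 - g).
    Returns None when g lies outside [0, 1), i.e., no IC protocol exists."""
    if not 0.0 <= g < 1.0:
        return None
    return math.ceil(math.log(1.0 - g, delta) / L)
```

For instance, with L = 2, δ = 0.95, q_F = q_M = 0.05, and utilities (u^d, u^PO, u^NE) = (12, 10, 4), the incentive ratio is about 0.04 and a single punishment cycle (K = 1) already suffices.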
5.2 Optimal protocol parameters
We first determine the efficiency of a given protocol. If a protocol is IC, then all consumers follow the recommended strategy and play the PO action profile. Therefore, the efficiency depends on the probabilities that the system is in a review phase and in a punishment phase (entered due to monitoring errors). Denote these two probabilities by \(\eta_{R}(\Psi)\) and \(\eta_{P}(\Psi)=1-\eta_{R}(\Psi)\), respectively. The efficiency of an IC protocol is thus \(V(\Psi) = \sum_{n} (u^{\text{PO}}_{n} \eta_{R}(\Psi) + u^{\text{NE}}_{n} \eta_{P}(\Psi))\), where \(\eta_{R}(\Psi)\) is determined in the following lemma.
Lemma 1.
\(\eta _{R}(\Psi) = \frac {1}{1+{Kq}_{F}\left (L, p_{\textit {th}}\right)}\)
Proof.
Solving for the stationary distribution of the Markov chain in Fig. 6 yields the result. The chain transitions from R to R with probability \(1-q_{F}\), from R to P with probability \(q_{F}\), and through the punishment states (P to P, and finally P back to R) with probability 1.
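Lemma 1 can be sanity-checked by simulating the two-phase Markov chain. The sketch below follows the chain structure in the proof; the numeric values are purely illustrative.

```python
import random

def review_phase_fraction(K, q_f):
    """Stationary probability of the review state R from Lemma 1."""
    return 1.0 / (1.0 + K * q_f)

def simulate_fraction(K, q_f, cycles=200_000, seed=0):
    """Monte Carlo estimate: state 0 is a review phase (one billing cycle);
    states 1..K are the K punishment cycles, traversed deterministically."""
    rng = random.Random(seed)
    state, in_review = 0, 0
    for _ in range(cycles):
        if state == 0:
            in_review += 1
            state = 1 if rng.random() < q_f else 0  # false alarm -> punish
        else:
            state = (state + 1) % (K + 1)           # P_i -> P_{i+1}, P_K -> R
    return in_review / cycles
```

With K = 3 and q_F = 0.2, both the closed form and the simulation give a review-phase fraction of about 1/(1 + 0.6) = 0.625.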
Lemma 1 implies that, in order to maximize the system efficiency, the protocol designer should choose L, K, and \(p_{\text{th}}\) such that \(Kq_{F}(L,p_{\text{th}})\) is minimized subject to the IC conditions in Theorem 2. These design parameters are coupled in a complex manner; to find the optimal design parameters, it is easier to work backwards.
Step 1. We first determine the optimal \(K^{*}(L,p_{\text{th}})\) given L and \(p_{\text{th}}\). Fixing L and \(p_{\text{th}}\) also fixes the false alarm probability \(q_{F}(L,p_{\text{th}})\) and the miss detection probabilities \(q_{M,n}(L,p_{\text{th}})\). If \(0\leq g_{n}(L,p_{\text{th}})<1\) for all n, the optimal K is chosen as
$$\begin{array}{*{20}l} K^{*}\left(L, p_{\text{th}}\right) = \left\lceil\max\limits_{n} \left(\frac{1}{L}\log_{\delta}\left(1-g_{n}\left(L, p_{\text{th}}\right)\right)\right)\right\rceil \end{array} $$
(20)
Step 2. Given L, the statistical test determines \(q_{F}(L, p_{\text{th}})\) and \(q_{M, n}(L, p_{\text{th}}), \forall n \in \mathcal {N}\). Note that \(K^{*}(L,p_{\text{th}})\) depends on the statistical test through the term \(q_{F}(L,p_{\text{th}})+q_{M,n}(L,p_{\text{th}})\). Therefore, the optimal statistical threshold is chosen as
$$\begin{array}{*{20}l} p^{*}_{\text{th}} = \arg\min_{p_{\text{th}}} K^{*}\left(L, p_{\text{th}}\right) q_{F}\left(L, p_{\text{th}}\right) \end{array} $$
(21)
In general, the statistical test threshold \(p_{\text{th}}\) has two opposite effects on \(K^{*}(L,p_{\text{th}})q_{F}(L,p_{\text{th}})\). Minimizing \(q_{F}(L,p_{\text{th}})\) often leads to a large \(q_{M,n}(L,p_{\text{th}})\), so the sum \(q_{F}(L,p_{\text{th}})+q_{M,n}(L,p_{\text{th}})\) may also be large, which by (16) induces a large \(K^{*}(L,p_{\text{th}})\). Therefore, when selecting \(p_{\text{th}}\), the protocol designer faces a trade-off between minimizing the false alarm probability \(q_{F}(L,p_{\text{th}})\) and minimizing the punishment phase length \(K^{*}(L,p_{\text{th}})\).
Step 3. The previous two steps provide the optimal \(p^{*}_{\text {th}}\) and \(K^{*}(L, p_{\text {th}}^{*})\) for a given review phase length L. However, L ranges over all positive integers, an infinite set. In the following, we determine an upper bound on L such that an IC protocol can be designed.
Proposition 4.
If \(\Psi(L,K,p_{\text{th}})\) is IC, then
$$\begin{array}{*{20}l} L\leq \min\limits_{n} \log_{\delta} \frac{u^{d}_{n} - u^{\text{PO}}_{n}}{u^{d}_{n} - u^{\text{NE}}_{n}} \end{array} $$
(22)
Proof.
Note that
$$\begin{array}{*{20}l} g_{n}(L, p_{\text{th}}) > \frac{1\delta^{L}}{\delta^{L}}\frac{{u^{d}_{n}}  u^{\text{PO}}_{n}}{u^{\text{PO}}_{n}  u^{\text{NE}}_{n}} \end{array} $$
((23))
If (22) does not hold, then there must exist \(n \in \mathcal {N}\) such that g
_{
n
}(L,p
_{th})>1 which violates the IC condition in Theorem 2. Therefore, for a protocol to be IC, (22) must be satisfied.
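The bound (22) is straightforward to evaluate. A small sketch, with illustrative utility triples \((u^{d}_{n}, u^{\text{PO}}_{n}, u^{\text{NE}}_{n})\) that are assumptions for the example only:

```python
import math

def max_review_length(delta, profiles):
    """Largest integer L allowed by (22); `profiles` lists one
    (u_d, u_po, u_ne) triple per consumer (illustrative values)."""
    return min(
        math.floor(math.log((u_d - u_po) / (u_d - u_ne), delta))
        for (u_d, u_po, u_ne) in profiles
    )
```

For δ = 0.9 and a single consumer with (u^d, u^PO, u^NE) = (12, 10, 4), the argument of the logarithm is 0.25 and the bound allows review phases of up to 13 billing cycles.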
Proposition 4 reveals a crucial trade-off in the review phase length. On the one hand, the protocol designer wants to choose a longer review phase L because it improves the monitoring accuracy, and hence there is a smaller probability that the system enters a punishment phase due to monitoring errors. On the other hand, a longer review phase increases a consumer's current gain from deviating during the review phase while reducing the future loss due to the punishment, because future utility is discounted. This requires a stronger punishment (a longer punishment phase), and hence induces more energy consumption in peak hours as the system stays longer in punishment phases. More importantly, if the review phase is too long, then even the strongest punishment (i.e., a trigger strategy that prescribes staying in the punishment phase forever upon deviation) cannot provide sufficient incentives for the consumers to schedule the PO energy consumption in the current review phase. Therefore, given consumers' valuation of future utility (i.e., the discount factor δ) and the structure of the stage game (i.e., \({u^{d}_{n}}, u^{\text {PO}}_{n}, u^{\text {NE}}_{n},\forall n \in \mathcal {N}\)), there is a maximum review phase length, and hence the billing cycle should not be too long. To make the protocol IC, the protocol designer must choose a review phase no longer than the upper bound determined in Proposition 4. The optimal review phase length L (i.e., billing cycle) is thus the one that minimizes the product of \(K^{*}\left (L, p^{*}_{\text {th}}\right)\) and \(q_{F}\left (L, p^{*}_{\text {th}}\right)\), which were determined in the previous two steps for fixed L. Based on the above three design steps, we propose a greedy algorithm (presented in Table 2) to determine the optimal protocols based on review strategies, which requires only finitely many iterations on L. If the candidate threshold \(p_{\text{th}}\) lies in a continuous interval, then we need to quantize \(p_{\text{th}}\) to solve (21) in finitely many iterations, in which case the algorithm yields a suboptimal protocol.
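Putting Steps 1-3 together, the greedy search can be sketched as follows. This is only a sketch of the algorithm in Table 2: the monitoring-error models `q_f_fn` and `q_m_fn`, the threshold grid, and the utility values are all assumptions for illustration, and the paper's actual test statistics would replace them.

```python
import math

def design_protocol(delta, profiles, q_f_fn, q_m_fn, p_grid, L_max):
    """Greedy search over (L, p_th, K) following Steps 1-3.

    profiles: list of (u_d, u_po, u_ne) per consumer.
    q_f_fn(L, p), q_m_fn(L, p, n): assumed monitoring-error models.
    p_grid: quantized candidate thresholds; L_max: bound from Proposition 4.
    Returns (K * q_F, L, p_th, K) minimizing K * q_F, or None.
    """
    best = None
    for L in range(1, L_max + 1):
        for p in p_grid:
            q_f = q_f_fn(L, p)
            # Step 1: incentive ratios (16) and minimal K via (20)
            ratios = [
                (1 / (1 - q_f - q_m_fn(L, p, n)))
                * ((1 - delta**L) / delta**L)
                * ((u_d - u_po) / (u_po - u_ne))
                for n, (u_d, u_po, u_ne) in enumerate(profiles)
            ]
            if not all(0 <= g < 1 for g in ratios):
                continue  # no IC protocol for this (L, p_th)
            K = max(1, max(math.ceil(math.log(1 - g, delta) / L)
                           for g in ratios))
            # Steps 2-3: keep the (L, p_th, K) minimizing K * q_F
            if best is None or K * q_f < best[0]:
                best = (K * q_f, L, p, K)
    return best
```

Under a toy model in which both error probabilities decay as 1/L, the search favors the longest admissible review phase, matching the intuition that longer reviews improve monitoring accuracy.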
5.3 Performance evaluation
In the previous subsection, we determined the optimal protocol design. In this subsection, we characterize the performance of these optimal protocols. Specifically, we are interested in determining whether a protocol Ψ based on review strategies can be Δ-PO.
Recall that the first-best efficiency is \(V^{*} = \sum _{n} u^{\text {PO}}_{n}\), and the sum utility of a given IC protocol can be determined using the result in Proposition 3. Therefore, if \(\eta_{R}(\Psi)\) is close enough to 1, then \((V^{*}-V(\Psi))/V^{*}\) can be made smaller than a given Δ. According to Proposition 3, this can be done if \(q_{F}\) is close enough to 0 and K is finite; that is, a sufficiently accurate statistical test is required. By Proposition 3, if L is long enough, then the monitoring can be made accurate enough. However, a long L can be chosen only when the consumers value future utility sufficiently highly. The following theorem characterizes the condition under which a protocol is Δ-PO.
Theorem 3.
For all \(p_{\textit {th}} \in \left [0, {p^{d}_{n}} - p^{PO}_{n}\right ]\) and a given Δ∈[0,1], there exists \(\delta_{\min}\in(0,1)\) such that for all \(\delta\geq\delta_{\min}\) there exists a protocol \(\Psi(L,K,p_{\text{th}})\) that is Δ-PO.
Proof.
Let us write the Δ-PO condition in terms of \(\eta_{R}\) and \(q_{F}\). \(V(\Psi)\geq(1-\Delta)V^{*}\) implies:
$$ \begin{array}{ll} &\eta_{R}\sum_{n\in \mathcal{N}} u^{\text{PO}}_{n} + (1-\eta_{R})\sum_{n\in\mathcal{N}} u^{\text{NE}}_{n} \geq (1-\Delta) \sum_{n\in \mathcal{N}} u^{\text{PO}}_{n}\\ \Rightarrow& \eta_{R}\geq \frac{(1-\Delta)\sum_{n\in\mathcal{N}}u^{\text{PO}}_{n} - \sum_{n\in\mathcal{N}}u^{\text{NE}}_{n}}{\sum_{n\in\mathcal{N}}u^{\text{PO}}_{n} - \sum_{n\in\mathcal{N}}u^{\text{NE}}_{n}} \triangleq A\\ \Rightarrow& q_{F}(L, p_{\text{th}}) \leq \frac{1-A}{K A} \end{array} $$
(24)
Note that Δ∈(0,1) implies A∈(0,1); Δ=0 gives A=1; and Δ=1 gives A=0.
Fix \(p_{\text {th}} = \bar {p}_{\text {th}} \in [0, {p^{d}_{n}} - p^{PO}_{n}]\) and
$$\begin{array}{*{20}l} K = \bar{K} > \max\limits_{n\in \mathcal{N}} \frac{u^{d}_{n} - u^{\text{PO}}_{n}}{u^{\text{PO}}_{n} - u^{\text{NE}}_{n}} \end{array} $$
(25)
Now, we select \(L = \bar {L}\) such that
$$\begin{array}{*{20}l} q_{F}(\bar{L}, \bar{p}_{\text{th}}) \leq \frac{1-A}{\bar{K}A} \end{array} $$
(26)
Such \(\bar {L}\) exists due to the law of large numbers (see Proposition 3).
\(\Psi (\bar {L}, \bar {K}, \bar {p}_{\text {th}})\) is a Δ-PO protocol by the above construction. Now, we show that it is also IC for δ close enough to 1. The condition under which \(\Psi (\bar {L}, \bar {K}, \bar {p}_{\text {th}})\) is an IC protocol is:
$$\begin{array}{*{20}l} \frac{1-\delta^{\bar{L}}}{1-\delta^{\bar{K}\bar{L}}} \leq \delta^{\bar{L}}\left(1- q_{F} - q_{M, n}\right)\frac{u^{\text{PO}}_{n} - u^{\text{NE}}_{n}}{u^{d}_{n} - u^{\text{PO}}_{n}}, \forall n \in \mathcal{N} \end{array} $$
(27)
For δ→1, the above inequality becomes
$$\begin{array}{*{20}l} \bar{K} \geq \max\limits_{n\in\mathcal{N}} \frac{u^{d}_{n} - u^{\text{PO}}_{n}}{u^{\text{PO}}_{n} - u^{\text{NE}}_{n}} \end{array} $$
(28)
This implies that there exists a \(\delta_{\min}\) close to 1 such that for every \(\delta\geq\delta_{\min}\) the inequality (27) holds, and hence the protocol is IC (this is why the inequality in (25) must be strict). Since (28) is exactly the same bound as in (25), the constructed protocol \(\Psi (\bar {L}, \bar {K}, \bar {p}_{\text {th}})\) is IC for \(\delta\geq\delta_{\min}\).
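The chain of implications in (24) and the bound (26) are easy to evaluate numerically. A small sketch, with assumed aggregate utilities (the inputs are illustrative, not from the paper):

```python
def delta_po_requirements(Delta, K, u_po_sum, u_ne_sum):
    """Compute the threshold A and the false-alarm bound from (24).
    u_po_sum = sum_n u^PO_n and u_ne_sum = sum_n u^NE_n."""
    A = ((1 - Delta) * u_po_sum - u_ne_sum) / (u_po_sum - u_ne_sum)
    return A, (1 - A) / (K * A)
```

For example, with Δ = 0.1, K = 2, Σ u^PO = 100, and Σ u^NE = 40, this gives A = 5/6, so any statistical test with q_F ≤ 0.1 makes the protocol Δ-PO.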
Theorem 3 characterizes the asymptotic performance of the protocols based on review strategies. Specifically, when the consumers highly value their future utilities, the protocol designer can construct a protocol based on review strategies whose sum utility is as close to the first-best as desired. That is, consumers comply with the recommended scheduling most of the time.
Corollary 1.
If δ→1, then the efficiency loss of the optimal protocol goes to 0.
The corollary states that if the consumers do not discount their future utilities, then the protocol designer can design a review-strategy-based protocol that is IC and also asymptotically achieves full efficiency.
Remark: Our analysis depends on the deviation action \(a^{d}\). To compute \({a^{d}_{n}}\), we consider the unilateral deviation by consumer n at the optimal action profile \(a^{\text{OPT}}\), i.e., \({a^{d}_{n}} = a^{*}_{n} = \arg \max _{a_{n}} \mu _{n}\left (a_{n}; {\boldsymbol {a}}^{\text {OPT}}_{-n}\right)\). In this way, a consumer deviates to the action that maximizes its own utility. However, a "smarter" consumer could deviate to an action slightly below \(a^{*}_{n}\) to reduce the chance of being detected while still gaining some utility (of course, since there is noise, the probability of being detected is higher the closer its selected action is to \(a^{*}_{n}\)). Hence, a more practical way to set \({a^{d}_{n}}\) is to fix a maximal tolerable deviation action \({a^{d}_{n}} = (1+\gamma)a^{\text {OPT}}_{n}\), where γ<1 depends on the maximal tolerable social welfare loss. Since the (one-shot) social welfare loss of such a deviation is at most \(u_{n}\left ({\boldsymbol {a}}^{\text {OPT}}\right) - u_{n}\left ((1+\gamma){\boldsymbol {a}}^{\text {OPT}}\right)\), the designer can determine γ according to the maximal tolerable social welfare loss and set \({a^{d}_{n}}\) accordingly.
^{OPT}, i.e., \({a^{d}_{n}} = a^{*}_{n} = \arg \max _{a_{n}} \mu _{n}\left (a_{n}; {\boldsymbol {a}}^{\text {OPT}}_{n}\right)\). In this way, a consumer chooses to deviate to the action that maximizes its own utility. However, if the consumer is “smarter”, it can deviate to a slightly lower action than \(a^{*}_{n}\) to avoid being detected but still gains some increased utility (of course, since there is noise, the probability of being detected is higher if its selected action is closer to \({a^{d}_{n}}\)). Hence, a more practical way to set \({a^{d}_{n}}\) is by setting a maximal tolerable deviation action \({a^{d}_{n}} = (1+\gamma)a^{\text {OPT}}_{n}\) where γ<1 depends on the maximal tolerable social welfare loss. Since, by doing this the (oneshot) social welfare loss is at most \(u_{n}\left ({\boldsymbol {a}}^{\text {OPT}}_{n}\right)  u_{n}\left ((1+\gamma){\boldsymbol {a}}^{\text {OPT}}\right)\), the designer can determine γ according to the maximal tolerable social welfare loss and set \({a^{d}_{n}}\) accordingly.