In the following, we consider the compensation rules of several uncertainty decoding techniques from a Bayesian view.

### 4.1 General example of uncertainty decoding

A fundamental example of uncertainty decoding can, e.g., be extracted from [1, 37–42]. The underlying observation model can be identified as

$$\begin{array}{@{}rcl@{}} \mathbf{y}_{n} = \mathbf{x}_{n} + \mathbf{b}_{n} \ \text{with} \ \mathbf{b}_{n} \sim \mathcal{N}(\mathbf{0}, \mathbf{C}_{\mathbf{b}_{n}}), \end{array} $$

((10))

where \(\mathbf{y}_{n}\) and \(\mathbf{C}_{\mathbf{b}_{n}}\) often play the role of an enhanced feature vector, e.g., from a Wiener filtering front-end [40], and of a measure of uncertainty from the enhancement process, respectively. Thus, the point estimate \(\mathbf{y}_{n}\) can be seen as being enriched by the additional reliability information \(\mathbf{C}_{\mathbf{b}_{n}}\). The observation model is representable by the Bayesian network in Fig. 2a. Exploiting the conditional independence properties of Bayesian networks [26], the compensation of the observation likelihood in (5) leads to [26]

$$\begin{array}{@{}rcl@{}} p(\mathbf{y}_{n}| q_{n}) &=& \int p(\mathbf{x}_{n}|q_{n}) \, p(\mathbf{y}_{n}|\mathbf{x}_{n}) d\mathbf{x}_{n} \\ &=& \int \mathcal{N}(\mathbf{x}_{n} ; \boldsymbol{\mu}_{\mathbf{x}|q_{n}}, \mathbf{C}_{\mathbf{x}|q_{n}}) \, \mathcal{N}(\mathbf{y}_{n} ; \mathbf{x}_{n}, \mathbf{C}_{\mathbf{b}_{n}}) d\mathbf{x}_{n} \\ &=& \mathcal{N}(\mathbf{y}_{n} ; \boldsymbol{\mu}_{\mathbf{x}|q_{n}}, \mathbf{C}_{\mathbf{x}|q_{n}} + \mathbf{C}_{\mathbf{b}_{n}}). \end{array} $$

((11))

Without loss of generality, a single Gaussian pdf \(p(\mathbf{x}_{n}|q_{n})\) is assumed since, in the case of a Gaussian mixture model (GMM), the linear mismatch function (10) can be applied to each Gaussian component separately.
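
As a minimal numerical sketch of the compensation rule (11), the following snippet evaluates the compensated likelihood for a single Gaussian component; all values, dimensions, and the helper name `compensated_likelihood` are illustrative assumptions and not part of [1, 37–42]:

```python
import numpy as np
from scipy.stats import multivariate_normal

def compensated_likelihood(y, mu_x_q, C_x_q, C_b):
    """Evaluate p(y_n | q_n) = N(y_n; mu_{x|q}, C_{x|q} + C_b), cf. (11)."""
    return multivariate_normal.pdf(y, mean=mu_x_q, cov=C_x_q + C_b)

# Illustrative 2-D example with an enhanced feature vector and its uncertainty.
y      = np.array([0.8, -0.3])   # point estimate from the enhancement front-end
mu_x_q = np.array([1.0,  0.0])   # state-conditional clean-speech mean
C_x_q  = np.diag([0.5,  0.5])    # state-conditional clean-speech covariance
C_b    = np.diag([0.2,  0.1])    # reliability information from the front-end

print(compensated_likelihood(y, mu_x_q, C_x_q, C_b))
```

For \(\mathbf{C}_{\mathbf{b}_{n}} \to \mathbf{0}\), the rule reduces to the conventional plug-in evaluation of the clean-speech likelihood, while a large \(\mathbf{C}_{\mathbf{b}_{n}}\) flattens the score.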

### 4.2 Dynamic variance compensation

The concept of dynamic variance compensation [2] is based on a reformulation of the log-sum observation model [39]:

$$\begin{array}{@{}rcl@{}} \mathbf{y}_{n} = \mathbf{x}_{n} + \log\left(1+\exp(\widehat{\mathbf{r}}_{n} - \mathbf{x}_{n})\right) + \mathbf{b}_{n} \end{array} $$

((12))

with \(\widehat{\mathbf{r}}_{n}\) being a noise estimate of any noise tracking algorithm and \(\mathbf{b}_{n} \sim \mathcal{N}(\mathbf{0}, \mathbf{C}_{\mathbf{b}_{n}})\) a residual error term. Since the analytical derivation of \(p(\mathbf{y}_{n}|q_{n})\) is intractable, an approximate pdf is evaluated based on the assumptions that \(p(\mathbf{x}_{n}|\mathbf{y}_{n})\) is Gaussian and that the compensation can be applied to each Gaussian component of the GMM separately [2]. According to Fig. 2a, the observation likelihood in (5), in its scaled version \(\mathring{p}(\mathbf{y}_{n}|q_{n})\), hence becomes:

$$\begin{array}{@{}rcl@{}} \mathring{p}(\mathbf{y}_{n}|q_{n}) &=& \int p(\mathbf{x}_{n}|q_{n}) \, \frac{p(\mathbf{x}_{n}|\mathbf{y}_{n})}{p(\mathbf{x}_{n})} d\mathbf{x}_{n} \\ &\approx& \int p(\mathbf{x}_{n}|q_{n}) \, p(\mathbf{x}_{n}|\mathbf{y}_{n}) d\mathbf{x}_{n} \end{array} $$

((13))

$$\begin{array}{@{}rcl@{}} &\approx& \int \mathcal{N}(\mathbf{x}_{n} ; \boldsymbol\mu_{\mathbf{x}|q_{n}}, \mathbf{C}_{\mathbf{x}|q_{n}}) \mathcal{N}(\mathbf{x}_{n} ; \boldsymbol\mu_{\mathbf{x}|\mathbf{y}_{n}}, \mathbf{C}_{\mathbf{x}|\mathbf{y}_{n}}) d\mathbf{x}_{n} \\ &=& \mathcal{N}(\boldsymbol\mu_{\mathbf{x}|q_{n}}; \boldsymbol\mu_{\mathbf{x}|\mathbf{y}_{n}}, \mathbf{C}_{\mathbf{x}|q_{n}} + \mathbf{C}_{\mathbf{x}|\mathbf{y}_{n}}), \end{array} $$

((14))

where the approximation (13) can be justified if \(p(\mathbf{x}_{n})\) is assumed to be significantly “flatter,” i.e., of larger variance, than \(p(\mathbf{x}_{n}|\mathbf{y}_{n})\). The estimation of the moments \(\boldsymbol{\mu}_{\mathbf{x}|\mathbf{y}_{n}}\), \(\mathbf{C}_{\mathbf{x}|\mathbf{y}_{n}}\) of \(p(\mathbf{x}_{n}|\mathbf{y}_{n})\) represents the core of [2].
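
The resulting score (14) can be sketched as follows, assuming the posterior moments of \(p(\mathbf{x}_{n}|\mathbf{y}_{n})\) have already been obtained from the estimator of [2]; the concrete values below are hypothetical:

```python
import numpy as np
from scipy.stats import multivariate_normal

def dvc_score(mu_x_q, C_x_q, mu_x_y, C_x_y):
    """Evaluate N(mu_{x|q}; mu_{x|y}, C_{x|q} + C_{x|y}), cf. (14)."""
    return multivariate_normal.pdf(mu_x_q, mean=mu_x_y, cov=C_x_q + C_x_y)

mu_x_y = np.array([0.7, -0.2])   # hypothetical posterior mean of p(x_n | y_n)
C_x_y  = np.diag([0.3,  0.4])    # hypothetical posterior covariance
mu_x_q = np.array([1.0,  0.0])   # mean of the Gaussian component of the model
C_x_q  = np.diag([0.5,  0.5])    # covariance of the Gaussian component

print(dvc_score(mu_x_q, C_x_q, mu_x_y, C_x_y))
```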

### 4.3 Uncertainty decoding with SPLICE

The stereo piecewise linear compensation for environment (SPLICE) approach, first introduced in [43] and further developed in [44, 45], is a popular method for cepstral feature enhancement based on a mapping learned from stereo (i.e., clean and noisy) data [25]. While SPLICE can be used to derive a minimum mean square error (MMSE) [44] or MAP [43] estimate that is fed into the recognizer, it is also applicable in the context of uncertainty decoding [3], which we focus on in the following. In order to derive a Bayesian network representation of the uncertainty decoding version of SPLICE [3], we first note from [3] that one fundamental assumption is

$$\begin{array}{@{}rcl@{}} p(\mathbf{x}_{n}|\mathbf{y}_{n},s_{n}) = \mathcal{N}(\mathbf{x}_{n}; \mathbf{y}_{n} + \mathbf{r}_{s_{n}}, \boldsymbol{\Gamma}_{s_{n}}), \end{array} $$

((15))

where \(s_{n}\) denotes a discrete region index, \(\mathbf{r}_{s_{n}}\) is its bias, and \(\boldsymbol{\Gamma}_{s_{n}}\) the uncertainty in that region. Exploiting the symmetry of the Gaussian pdf

$$\begin{array}{@{}rcl@{}} p(\mathbf{x}_{n}|\mathbf{y}_{n},s_{n}) &=& \mathcal{N}(\mathbf{x}_{n}; \mathbf{y}_{n} + \mathbf{r}_{s_{n}}, \boldsymbol{\Gamma}_{s_{n}}) \\ &=& \mathcal{N}(\mathbf{y}_{n} - \mathbf{x}_{n}; -\mathbf{r}_{s_{n}}, \boldsymbol{\Gamma}_{s_{n}}) \end{array} $$

((16))

and defining \(\mathbf{b}_{n} = \mathbf{y}_{n} - \mathbf{x}_{n}\), we identify the observation model to be

$$ \mathbf{y}_{n} = \mathbf{x}_{n} + \mathbf{b}_{n} $$

((17))

given a certain region index \(s_{n}\). In the general case of \(s_{n}\) depending on \(\mathbf{x}_{n}\), the observation model can be expressed by the Bayesian network in Fig. 2b with

$$\begin{array}{@{}rcl@{}} p(\mathbf{b}_{n} | s_{n}) = \mathcal{N}(\mathbf{b}_{n}; -\mathbf{r}_{s_{n}}, \boldsymbol{\Gamma}_{s_{n}}). \end{array} $$

((18))

This reveals that the introduction of different regions \(s_{n}\) is equivalent to assuming an affine model (18) with \(p(\mathbf{b}_{n})\) being a GMM instead of a single Gaussian density, as in (10). By introducing a separate prior model

$$\begin{array}{@{}rcl@{}} p(\mathbf{y}_{n}) &=& \sum\limits_{s_{n}} p(s_{n}) \, p(\mathbf{y}_{n}|s_{n}) \\ &=& \sum\limits_{s_{n}} p(s_{n}) \, \mathcal{N}(\mathbf{y}_{n}; \boldsymbol{\mu}_{\mathbf{y}|s_{n}}, \mathbf{C}_{\mathbf{y}|s_{n}}), \end{array} $$

((19))

for the distorted speech \(\mathbf{y}_{n}\), the likelihood in (5) can be adapted according to

$$ \begin{aligned} p(\mathbf{y}_{n}|q_{n}) &= \int p(\mathbf{x}_{n}|q_{n}) \, p(\mathbf{y}_{n}|\mathbf{x}_{n}) d\mathbf{x}_{n}\\ &= \int p(\mathbf{x}_{n}|q_{n}) \, \frac{p(\mathbf{x}_{n},\mathbf{y}_{n})}{p(\mathbf{x}_{n})} d\mathbf{x}_{n} = \int p(\mathbf{x}_{n}|q_{n}) \\ &\quad \cdot \frac{\sum_{s_{n}} p(\mathbf{x}_{n}|\mathbf{y}_{n},s_{n}) \, p(\mathbf{y}_{n}|s_{n}) \, p(s_{n})}{\sum_{s_{n}} \int {p(\mathbf{x}_{n}|\mathbf{y}_{n},s_{n}) \, p(\mathbf{y}_{n}|s_{n}) \, p(s_{n}) d{\mathbf{y}}_{n}}} d\mathbf{x}_{n}. \;\;\;\;\;\; \end{aligned} $$

((20))

Although analytically tractable, both the numerator and the denominator in (20) are typically approximated for the sake of runtime efficiency [3].
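
As a hedged sketch of one such simplification (not necessarily the approximation chosen in [3]), the following snippet keeps only the dominant region selected via the prior (19) and then combines (15) with \(p(\mathbf{x}_{n}|q_{n})\) in the spirit of the product integral (13); all model parameters are hypothetical:

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

def splice_ud_score(y, mu_x_q, C_x_q, priors, mu_y_s, C_y_s, r_s, Gamma_s):
    # Select the dominant region via the distorted-speech prior (19).
    s = int(np.argmax([p * mvn.pdf(y, mean=m, cov=C)
                       for p, m, C in zip(priors, mu_y_s, C_y_s)]))
    # Region-conditional compensation, cf. (15): x_n ~ N(y_n + r_s, Gamma_s),
    # combined with p(x_n | q_n) via the Gaussian product integral as in (13).
    return mvn.pdf(y + r_s[s], mean=mu_x_q, cov=C_x_q + Gamma_s[s])

# Hypothetical two-region SPLICE model in 2-D.
priors  = [0.6, 0.4]
mu_y_s  = [np.zeros(2), np.array([2.0, 1.0])]
C_y_s   = [np.eye(2), np.eye(2)]
r_s     = [np.array([0.1, 0.0]), np.array([-0.2, 0.1])]
Gamma_s = [np.diag([0.2, 0.2]), np.diag([0.4, 0.3])]

print(splice_ud_score(np.array([0.5, -0.2]), np.array([1.0, 0.0]),
                      np.diag([0.5, 0.5]), priors, mu_y_s, C_y_s, r_s, Gamma_s))
```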

### 4.4 Joint uncertainty decoding

Model-based joint uncertainty decoding [4] assumes an affine observation model in the cepstral domain

$$\begin{array}{@{}rcl@{}} \mathbf{y}_{n} = \mathbf{A}_{k_{n}} \mathbf{x}_{n} + \mathbf{b}_{n} \end{array} $$

((21))

with the deterministic matrix \(\mathbf{A}_{k_{n}}\) and \(p(\mathbf{b}_{n} | k_{n}) = \mathcal{N}(\mathbf{b}_{n}; \boldsymbol{\mu}_{\mathbf{b}|k_{n}}, \mathbf{C}_{\mathbf{b}|k_{n}})\) depending on the considered Gaussian component \(k_{n}\) of the GMM of the current HMM state \(q_{n}\):

$$\begin{array}{@{}rcl@{}} p(\mathbf{x}_{n} | q_{n}) = \sum_{k_{n}} p(k_{n}) \, p(\mathbf{x}_{n} | k_{n}). \end{array} $$

((22))

The Bayesian network is depicted in Fig. 2c, implying the following compensation rule:

$$\begin{array}{@{}rcl@{}} p(\mathbf{y}_{n}|k_{n}) = \int p(\mathbf{x}_{n} | k_{n}) \, p(\mathbf{y}_{n} | \mathbf{x}_{n}, k_{n}) d\mathbf{x}_{n}, \end{array} $$

((23))

which can be analytically derived analogously to (11). In practice, the compensation parameters \(\mathbf{A}_{k_{n}}\), \(\boldsymbol{\mu}_{\mathbf{b}|k_{n}}\), and \(\mathbf{C}_{\mathbf{b}|k_{n}}\) are not estimated for each Gaussian component \(k_{n}\) but for each regression class comprising a set of Gaussian components [4].
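
Since (21) is linear-Gaussian given \(k_{n}\), the compensation (23) has the closed form \(p(\mathbf{y}_{n}|k_{n}) = \mathcal{N}(\mathbf{y}_{n}; \mathbf{A}_{k_{n}}\boldsymbol{\mu}_{\mathbf{x}|k_{n}} + \boldsymbol{\mu}_{\mathbf{b}|k_{n}}, \mathbf{A}_{k_{n}}\mathbf{C}_{\mathbf{x}|k_{n}}\mathbf{A}_{k_{n}}^{T} + \mathbf{C}_{\mathbf{b}|k_{n}})\), which the following minimal sketch evaluates for hypothetical regression-class parameters:

```python
import numpy as np
from scipy.stats import multivariate_normal

def jud_score(y, A, mu_x_k, C_x_k, mu_b_k, C_b_k):
    """p(y_n | k_n) = N(y_n; A mu_{x|k} + mu_{b|k}, A C_{x|k} A^T + C_{b|k})."""
    return multivariate_normal.pdf(
        y, mean=A @ mu_x_k + mu_b_k, cov=A @ C_x_k @ A.T + C_b_k)

A      = np.array([[0.9, 0.1],    # hypothetical regression-class transform
                   [0.0, 1.1]])
mu_x_k = np.array([1.0,  0.0]); C_x_k = np.diag([0.5, 0.5])
mu_b_k = np.array([0.1, -0.1]); C_b_k = np.diag([0.2, 0.3])

print(jud_score(np.array([0.8, -0.3]), A, mu_x_k, C_x_k, mu_b_k, C_b_k))
```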

### 4.5 REMOS

Like many other techniques, the reverberation modeling for speech recognition (REMOS) concept [5, 46] assumes the environmental distortion to be additive in the mel-spectral domain. However, REMOS also considers the influence of the \(L\) previous clean speech feature vectors \(\mathbf{x}_{n-L:n-1}\) in order to model the dispersive effect of reverberation and to relax the conditional independence assumption of conventional HMMs. In the log-mel-spectral domain, the observation model reads:

$$\begin{array}{@{}rcl@{}} \mathbf{y}_{n} &=& \log\left(\vphantom{\sum\limits_{l=1}^{L}}\exp(\mathbf{c}_{n}) + \exp(\mathbf{h}_{n} + \mathbf{x}_{n})\right. \\ && +\left. \exp(\mathbf{a}_{n}) \odot \sum\limits_{l=1}^{L}{\exp({\boldsymbol\mu}_{l} + \mathbf{x}_{n-l})}\right), \end{array} $$

((24))

where the normally distributed random variables \(\mathbf{c}_{n}\), \(\mathbf{h}_{n}\), and \(\mathbf{a}_{n}\) model the additive noise components, the early part of the room impulse response (RIR), and the weighting of the late part of the RIR, respectively, and the parameters \(\boldsymbol{\mu}_{1:L}\) represent a deterministic description of the late part of the RIR. The Bayesian network is depicted in Fig. 3 with \(\mathbf{b}_{n} = [\mathbf{c}_{n}, \mathbf{a}_{n}, \mathbf{h}_{n}]\). In contrast to most of the other compensation rules reviewed in this article, the REMOS concept necessitates a modification of the Viterbi decoder due to the introduced cross-connections in Fig. 3. In order to arrive at a computationally feasible decoder, the marginalization over the previous clean speech components \(\mathbf{x}_{n-L:n-1}\) is circumvented by employing estimates \(\widehat{\mathbf{x}}_{n-L:n-1}(q_{n-1})\) that depend on the best partial path, i.e., on the previous HMM state \(q_{n-1}\). The resulting analytically intractable integral is then approximated by the maximum of its integrand:

$$ \begin{aligned} &{p(\mathbf{y}_{n} | q_{n}, \widehat{\mathbf{x}}_{n-L:n-1}(q_{n-1}))}\\ &\quad = \int p(\mathbf{y}_{n}|\mathbf{x}_{n},\widehat{\mathbf{x}}_{n-L:n-1}(q_{n-1})) \, p(\mathbf{x}_{n}|q_{n}) d\mathbf{x}_{n} \\ &\quad \approx \max_{\mathbf{x}_{n}} \; p(\mathbf{y}_{n}|\mathbf{x}_{n},\widehat{\mathbf{x}}_{n-L:n-1}(q_{n-1})) \, p(\mathbf{x}_{n}|q_{n}). \;\;\;\;\; \end{aligned} $$

((25))

The determination of a global solution to (25) represents the core of the REMOS concept. The estimates \(\widehat {\mathbf {x}}_{n-L:n-1}(q_{n-1})\) in turn are the solutions to (25) at previous time steps. We refer to [5] for a detailed derivation of the corresponding decoding routine.

It is worth noting that the simplification in (25) represents a variant of the MAP integral approximation, as often applied in Bayesian estimation [26]. To show this, we first omit the dependency on \(\widehat{\mathbf{x}}_{n-L:n-1}(q_{n-1})\) for notational convenience and define

$$ \begin{aligned} \mathbf{x}_{n}^{\text{MAP}} &= \arg\max_{\mathbf{x}_{n}} \; p(\mathbf{y}_{n} | \mathbf{x}_{n}) \, p(\mathbf{x}_{n} | q_{n})\\ &= \arg\max_{\mathbf{x}_{n}} \; \frac{p(\mathbf{y}_{n}, \mathbf{x}_{n} | q_{n})}{p(\mathbf{y}_{n}|q_{n})} = \arg\max_{\mathbf{x}_{n}} \; p(\mathbf{x}_{n} | \mathbf{y}_{n}, q_{n}), \;\;\;\;\;\; \end{aligned} $$

((26))

where we scaled the objective function in the second step by the constant \(1/p(\mathbf{y}_{n}|q_{n})\). We can now reformulate the Bayesian integral, leading to a novel derivation of (25):

$$\begin{array}{@{}rcl@{}} p(\mathbf{y}_{n}|q_{n}) &=& \int p(\mathbf{y}_{n}|\mathbf{x}_{n}) \, p(\mathbf{x}_{n}|q_{n}) d\mathbf{x}_{n} \\ &=& \int p(\mathbf{y}_{n}|q_{n}) \, p(\mathbf{x}_{n}|\mathbf{y}_{n}, q_{n}) d\mathbf{x}_{n}\\ &\approx& \int p(\mathbf{y}_{n}|q_{n}) \, p\left(\mathbf{x}_{n}|\mathbf{y}_{n}, q_{n}\right) \, \delta\left(\mathbf{x}_{n} - \mathbf{x}_{n}^{\text{MAP}}\right) d\mathbf{x}_{n} \\ &=& p(\mathbf{y}_{n}|q_{n}) \, p\left(\mathbf{x}_{n}^{\text{MAP}}|\mathbf{y}_{n}, q_{n}\right) \end{array} $$

((27))

$$\begin{array}{@{}rcl@{}} &=& p\left(\mathbf{y}_{n}|\mathbf{x}_{n}^{\text{MAP}}\right) \, p\left(\mathbf{x}_{n}^{\text{MAP}}|q_{n}\right). \end{array} $$

((28))

We see that the assumption underlying (25) is a modified MAP approximation:

$$ p(\mathbf{x}_{n}|\mathbf{y}_{n}, q_{n}) \approx p(\mathbf{x}_{n}|\mathbf{y}_{n}, q_{n}) \, \delta\left(\mathbf{x}_{n} - \mathbf{x}_{n}^{\text{MAP}}\right), $$

((29))

which slightly differs from the conventional MAP approximation:

$$ p(\mathbf{x}_{n}|\mathbf{y}_{n}, q_{n}) \approx \delta\left(\mathbf{x}_{n} - \mathbf{x}_{n}^{\text{MAP}}\right). $$

((30))

The obvious disadvantage of (29) is that the resulting score (25) does not represent a normalized likelihood w.r.t. \(\mathbf{y}_{n}\). On the other hand, the modified MAP approximation (29) leads to a scaled version of the exact likelihood \(p(\mathbf{y}_{n}|q_{n})\), cf. (27), with the scaling factor \(p(\mathbf{x}_{n}^{\text{MAP}}|\mathbf{y}_{n}, q_{n})\) growing as the approximation (29) becomes more accurate.
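
The relation between (25), (27), and (28) can be verified numerically in a surrogate linear-Gaussian setting; this is an assumption for illustration only, since the actual REMOS model (24) is nonlinear and requires the dedicated optimizer of [5]. With \(p(\mathbf{y}_{n}|\mathbf{x}_{n}) = \mathcal{N}(\mathbf{y}_{n}; \mathbf{x}_{n}, \mathbf{C}_{\mathbf{b}})\), the mode \(\mathbf{x}_{n}^{\text{MAP}}\) is available in closed form, and (28) equals the exact likelihood (11) scaled by \(p(\mathbf{x}_{n}^{\text{MAP}}|\mathbf{y}_{n}, q_{n})\):

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

mu, C = np.array([1.0, 0.0]), np.diag([0.5, 0.5])   # p(x_n | q_n)
C_b   = np.diag([0.2, 0.1])                         # p(y_n | x_n) = N(y; x, C_b)
y     = np.array([0.8, -0.3])

# Closed-form posterior p(x_n | y_n, q_n) and its mode x_MAP, cf. (26).
C_post = np.linalg.inv(np.linalg.inv(C) + np.linalg.inv(C_b))
x_map  = C_post @ (np.linalg.solve(C, mu) + np.linalg.solve(C_b, y))

score_map   = mvn.pdf(y, x_map, C_b) * mvn.pdf(x_map, mu, C)   # (28)
score_exact = mvn.pdf(y, mu, C + C_b)                          # exact, cf. (11)
alpha       = mvn.pdf(x_map, x_map, C_post)                    # p(x_MAP | y, q)

print(np.isclose(score_map, score_exact * alpha))              # True
```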

### 4.6 Ion and Haeb-Umbach

Similarly to REMOS, the generic uncertainty decoding approach given in [24], and first proposed in [20], considers cross-connections in the Bayesian network in order to relax the conditional independence assumption of HMMs. The concept, as described in [24], is an example of uncertainty decoding where the compensation rule can be defined by a modified Bayesian network structure—given in Fig. 4a—without fixing a particular functional form of the involved pdfs via an analytical observation model. In order to derive the compensation rule, we start by introducing the sequence \(\mathbf{x}_{1:N}\) of latent clean speech vectors in each summand of (4):

$$\begin{array}{@{}rcl@{}} \,p(\mathbf{y}_{1:N},q_{1:N})\!\! &=&\! \int p(\mathbf{y}_{1:N},\mathbf{x}_{1:N},q_{1:N}) d\mathbf{x}_{1:N} \\ &=&\! \int\! p(\mathbf{y}_{1:N}|\mathbf{x}_{1:N})\! \!\prod\limits_{n=1}^{N} p(\mathbf{x}_{n}|q_{n}) p(q_{n}|q_{n-1}) d\mathbf{x}_{1:N}\\ &\sim&\!\! \int\!\! \frac{p(\mathbf{x}_{1:N}|\mathbf{y}_{1:N})}{p(\mathbf{x}_{1:N})} \!\!\prod\limits_{n=1}^{N}\! p(\mathbf{x}_{n}|q_{n}) p(q_{n}|q_{n-1}) d\mathbf{x}_{1:N},\!\!\\ \end{array} $$

((31))

where we exploited the conditional independence properties defined by Fig. 4a (respecting the dashed links) and dropped \(p(\mathbf{y}_{1:N})\) in the last line of (31) as it represents a constant factor with respect to a varying state sequence \(q_{1:N}\). The pdf in the numerator of (31) is next turned into

$$\begin{array}{@{}rcl@{}} p(\mathbf{x}_{1:N}|\mathbf{y}_{1:N}) &=& p(\mathbf{x}_{1}|\mathbf{y}_{1:N}) \prod\limits_{n=2}^{N} p(\mathbf{x}_{n}|\mathbf{y}_{1:N}, \mathbf{x}_{1:n-1}) \\ &\approx& \prod\limits_{n=1}^{N} p(\mathbf{x}_{n}|\mathbf{y}_{1:N}), \end{array} $$

((32))

where the conditional dependence (due to the head-to-head relation) of \(\mathbf{x}_{n}\) and \(\mathbf{x}_{1:n-1}\) is neglected. This corresponds to omitting the respective dashed links in Fig. 4a for each factor in (32) separately. The denominator in (31) can also be further decomposed if the dashed links in Fig. 4b, i.e., the head-to-tail relations in \(q_{n}\), are disregarded:

$$ p(\mathbf{x}_{1:N}) \approx \prod\limits_{n=1}^{N} p(\mathbf{x}_{n}). $$

((33))

With (32) and (33), the updated rule (31) is finally turned into the following simplified form:

$$ \begin{aligned} {p(\mathbf{y}_{1:N},q_{1:N})}\sim \prod\limits_{n=1}^{N} \int \frac{p(\mathbf{x}_{n}|\mathbf{y}_{1:N})}{p(\mathbf{x}_{n})} \, p(\mathbf{x}_{n}|q_{n}) d\mathbf{x}_{n} \, p(q_{n}|q_{n-1}) \end{aligned} $$

((34))

that is given in [24]. Due to the approximations in Fig. 4a, b, the compensation rule defined by (34) exhibits the same decoupling as (5) and can thus be carried out without modifying the underlying decoder. In practice, \(p(\mathbf{x}_{n})\) may, e.g., be modeled as a separate Gaussian density and \(p(\mathbf{x}_{n}|\mathbf{y}_{1:N})\) as a separate Markov process [24].
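
For scalar features and Gaussian choices of these models (an illustrative assumption), the per-frame factor of (34) can be evaluated by simple numerical quadrature:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def frame_score(mu_q, sig_q, mu_post, sig_post, mu_prior, sig_prior):
    """Evaluate the integral of p(x|y_{1:N}) / p(x) * p(x|q) dx, cf. (34)."""
    f = lambda x: (norm.pdf(x, mu_post, sig_post)
                   / norm.pdf(x, mu_prior, sig_prior)
                   * norm.pdf(x, mu_q, sig_q))
    val, _ = quad(f, -20.0, 20.0)
    return val

# The flatter the prior p(x_n) relative to the posterior p(x_n | y_{1:N}),
# the closer the score comes to the simple product rule of (13).
print(frame_score(mu_q=1.0, sig_q=0.7, mu_post=0.8, sig_post=0.4,
                  mu_prior=0.0, sig_prior=3.0))
```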

### 4.7 Significance decoding

Assuming the affine model (10), the concept of significance decoding [9] first derives the moments of the posterior \(p(\mathbf{x}_{n}|\mathbf{y}_{n},q_{n})\):

$$ \begin{aligned} p(\mathbf{x}_{n}|\mathbf{y}_{n},q_{n}) &= \frac{p(\mathbf{y}_{n}|\mathbf{x}_{n},q_{n}) \, p(\mathbf{x}_{n}|q_{n})}{\int p(\mathbf{y}_{n}|\mathbf{x}_{n},q_{n}) \, p(\mathbf{x}_{n} | q_{n}) d\mathbf{x}_{n}} \\ &= \frac{p(\mathbf{y}_{n}|\mathbf{x}_{n}) \, p(\mathbf{x}_{n}|q_{n})}{\int p(\mathbf{y}_{n}|\mathbf{x}_{n}) \, p(\mathbf{x}_{n} | q_{n}) d\mathbf{x}_{n}} \\ &= \mathcal{N}(\mathbf{x}_{n}; \boldsymbol{\mu}_{\mathbf{x}|\mathbf{y}_{n},q_{n}}, \mathbf{C}_{\mathbf{x}|\mathbf{y}_{n},q_{n}}), \end{aligned} $$

((35))

where the Bayesian network properties of Fig. 2a have been exploited in the numerator and the denominator, and a single Gaussian pdf \(p(\mathbf{x}_{n}|q_{n})\) is assumed without loss of generality. In a second step, the clean likelihood \(p(\mathbf{x}_{n}|q_{n})\) is evaluated at \(\boldsymbol{\mu}_{\mathbf{x}|\mathbf{y}_{n},q_{n}}\) after adding the variance \(\mathbf{C}_{\mathbf{x}|\mathbf{y}_{n},q_{n}}\) to \(\mathbf{C}_{\mathbf{x}|q_{n}}\), cf. (36).

In terms of probabilistic notation, this compensation rule corresponds to replacing the score calculation in (5) by an expected likelihood, similarly to [47, 48]:

$$\begin{array}{@{}rcl@{}} p(\mathbf{y}_{n}|q_{n}) &\approx& \mathcal{E}_{\mathbf{x}_{n}|\mathbf{y}_{n},q_{n}}\{p(\mathbf{x}_{n}|q_{n})\} \\ &=& \int p(\mathbf{x}_{n}|\mathbf{y}_{n},q_{n}) \, p(\mathbf{x}_{n}|q_{n}) d\mathbf{x}_{n} \\ &=& \mathcal{N}\left(\boldsymbol{\mu}_{\mathbf{x}|\mathbf{y}_{n},q_{n}}; \boldsymbol{\mu}_{\mathbf{x}|q_{n}}, \mathbf{C}_{\mathbf{x}|\mathbf{y}_{n},q_{n}} \! + \! \mathbf{C}_{\mathbf{x}|q_{n}}\right). \end{array} $$

((36))
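
For the affine model (10), both steps have closed forms, which the following sketch combines; the numerical values are again illustrative assumptions:

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

def significance_score(y, mu_q, C_q, C_b):
    # Posterior moments of p(x_n | y_n, q_n) under model (10), cf. (35).
    C_post  = np.linalg.inv(np.linalg.inv(C_q) + np.linalg.inv(C_b))
    mu_post = C_post @ (np.linalg.solve(C_q, mu_q) + np.linalg.solve(C_b, y))
    # Expected likelihood, cf. (36).
    return mvn.pdf(mu_post, mu_q, C_post + C_q)

y    = np.array([0.8, -0.3])
mu_q = np.array([1.0,  0.0]); C_q = np.diag([0.5, 0.5])
C_b  = np.diag([0.2,  0.1])

print(significance_score(y, mu_q, C_q, C_b))
```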

For the case of single Gaussian densities, the (in the Bayesian sense) exact score \(p(\mathbf{y}_{n}|q_{n})\) is given by (11). Extending previous work [9], we show (11) to be bounded from above by the modified score (36):

$$ \begin{aligned} \mathcal{E}_{\mathbf{x}_{n}|\mathbf{y}_{n},q_{n}}\{p(\mathbf{x}_{n}|q_{n})\} &= \int p(\mathbf{x}_{n}|\mathbf{y}_{n},q_{n}) \, p(\mathbf{x}_{n}|q_{n}) d\mathbf{x}_{n}\\ &= \underbrace{\frac{\int p(\mathbf{y}_{n}|\mathbf{x}_{n}) \, p^{2}(\mathbf{x}_{n}|q_{n}) d\mathbf{x}_{n}}{p^{2}(\mathbf{y}_{n}|q_{n})}}_{\alpha} \, p(\mathbf{y}_{n} | q_{n}), \end{aligned} $$

((37))

where *α* can be evaluated exploiting the product rules of Gaussians [26]:

$$ \begin{aligned} \alpha& = \sqrt{\frac{\det(\mathbf{C}_{\mathbf{x}|q_{n}}\,+\,\mathbf{C}_{\mathbf{b}_{n}})}{\det(\mathbf{C}_{\mathbf{x}|q_{n}})}} \frac{\mathcal{N}(\mathbf{y}; \boldsymbol{\mu}_{\mathbf{x}|q_{n}}, \frac{1}{2}\mathbf{C}_{\mathbf{x}|q_{n}} \! + \! \mathbf{C}_{\mathbf{b}_{n}})}{\mathcal{N}(\mathbf{y}; \boldsymbol\mu_{\mathbf{x}|q_{n}}, \frac{1}{2}\mathbf{C}_{\mathbf{x}|q_{n}} \! + \! \frac{1}{2}\mathbf{C}_{\mathbf{b}_{n}})}\\ &\geq \sqrt{\frac{\det(\mathbf{C}_{\mathbf{x}|q_{n}}\,+\,\mathbf{C}_{\mathbf{b}_{n}})\det(\frac{1}{2}\mathbf{C}_{\mathbf{x}|q_{n}}\,+\,\frac{1}{2}\mathbf{C}_{\mathbf{b}_{n}})}{\det(\mathbf{C}_{\mathbf{x}|q_{n}})\det(\frac{1}{2}\mathbf{C}_{\mathbf{x}|q_{n}}\,+\,\mathbf{C}_{\mathbf{b}_{n}})}} \geq 1. \end{aligned} $$

((38))

A closer inspection of \(\alpha\) reveals that the expected likelihood computation scales up \(p(\mathbf{y}_{n}|q_{n})\) for large values of \(\mathbf{C}_{\mathbf{b}_{n}}\), which alleviates the (potentially overly strong) flattening effect of \(\mathbf{C}_{\mathbf{b}_{n}}\) on \(p(\mathbf{y}_{n}|q_{n})\), cf. (11).
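
A small numerical check of this behavior, reusing the hypothetical quantities from the sketch after (36), confirms that the ratio \(\alpha\) of the expected likelihood (36) to the exact score (11) stays above one and grows with \(\mathbf{C}_{\mathbf{b}_{n}}\):

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

y    = np.array([0.8, -0.3])
mu_q = np.array([1.0,  0.0]); C_q = np.diag([0.5, 0.5])

for scale in (0.1, 1.0, 10.0):
    C_b     = scale * np.eye(2)
    C_post  = np.linalg.inv(np.linalg.inv(C_q) + np.linalg.inv(C_b))
    mu_post = C_post @ (np.linalg.solve(C_q, mu_q) + np.linalg.solve(C_b, y))
    expected = mvn.pdf(mu_post, mu_q, C_post + C_q)   # modified score (36)
    exact    = mvn.pdf(y, mu_q, C_q + C_b)            # exact score (11)
    print(f"C_b scale {scale}: alpha = {expected / exact:.3f}")  # alpha >= 1
```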