 Research
 Open Access
 Published:
Multiple importance sampling revisited: breaking the bounds
EURASIP Journal on Advances in Signal Processing volume 2018, Article number: 15 (2018)
Abstract
We revisit the multiple importance sampling (MIS) estimator and investigate the bound on the efficiency improvement over balance heuristic estimator with equal count of samples established in Veach’s thesis. We revise the proof for this and come to the conclusion that there is no such bound and henceforth it makes sense to look for new estimators that improve on balance heuristic estimator with equal count of samples. Next, we examine a recently introduced nonbalance heuristic MIS estimator that is provably better than balance heuristic with equal count of samples, and we improve it both in variance and efficiency. We then obtain an equally provably better onesample balance heuristic estimator, and finally, we introduce a heuristic for the count of samples that can be used when the individual techniques are biased. All in all, we present three new sampling strategies to improve on both variance and efficiency on the balance heuristic using nonequal count of samples.
Our scheme requires the previous knowledge of several quantities, but those can be obtained in an adaptive way. The results also show that by a careful examination of the variance and properties of the estimators, even better estimators could be discovered in the future. We present examples that support our theoretical findings.
Introduction
The multiple importance sampling (MIS) estimator [1, 2], and in particular balance heuristic, which is equivalent to the Monte Carlo estimator with a mixture of probability density functions (pdfs), has been used for many years with a big success, being a reliable and robust estimator that allows an easy and straightforward combination of different sampling techniques. MIS with balance heuristic estimator has been almost exclusively used with equal count of samples for each technique, mainly following the recommendations based on certain quasioptimality theorems in Veach’s thesis. Specifically, these theorems stated that a balance heuristic estimator could not be worse than n times any other MIS estimator with equal count of samples where n is the number of combined methods. In papers addressing the variance of MIS, strategies were assumed to have equal number of samples [3, 4], and the combination of MIS with jittered sampling was studied in [5].
MIS is often used in rendering applications of computer graphics [6] where the reflected illumination is the product of the intensity in the illumination direction that can be represented by an environment map, and the probability of the reflection from a surface. As it is not feasible to sample with this product of two functions, sampling mimics either directions from where reflection is likely or directions of high intensity. The first approach is called light source sampling and the second approach is called BRDF sampling after the name of the bidirectional reflection distribution function (BRDF).
Lu et al. [7] propose an improvement to balance heuristic estimators for environment map illumination by using a Taylor’s second order approximation of the variance around the equal weights 1/2 to obtain the counts of samples according to the BRDF and the environment map, which is accurate only if the optimal sample numbers are not too far from equal sampling. Heuristics have also been used that prefer certain sampling methods based on the local properties, for example, BRDF sampling is advantageous on highly specular surfaces and light source sampling on diffuse surfaces. Pajot et al. proposed a framework, called representativity, to develop such heuristic [8]. Havran and Sbert [9, 10] introduce a heuristic for unbiased techniques based on assigning a count of samples proportional to the inverse of the variance of each technique that in some cases seemed to put into question the quasioptimality rules of equal sampling. More recently, based on the analysis of the variance of MIS estimators, Sbert et al. [11] discovered a MIS nonbalance heuristic estimator that is provably better than balance heuristic with equal count of samples. The introduction of costs into those schemes allowed an even bigger increase in efficiency. We improve on their work by clarifying the quasioptimality rules in Veach’s thesis by introducing new balance heuristic estimators that are provably better than balance heuristic with equal count of samples, and by introducing a new heuristic valid when the individual estimators are biased. We believe that our work can inspire new provably better estimators or at least sound heuristics for heuristically better estimators.
Putting our results in a wider perspective, we should consider adaptive variance reduction methods that learn some properties of the integrand, e.g., the location of its peak, and refine the sampling strategy onthefly by analyzing the samples that have been already generated. In case of mixing different pdfs with which sampling is straightforward, the weights of the individual methods are the target of the adaptation. Adaptive methods can be imagined as a sequence of Monte Carlo quadrature steps and parameter estimation steps. Care should be exercised to implement adaptive methods since making the sampling pdf dependent on the generated samples might make the estimator biased, which can be attacked by freezing the adaptation after certain samples letting the approach be consistent. Another challenge is the control of the adaptation with randomly sampled, i.e., noisy data. To handle this, adaptive MCMC approaches often choose an artificial but robust criterion, like the acceptance rate, instead of the variance of the estimation [12, 13]. In computer graphics, this technique has been used to control the mutation of MCMC methods [14, 15] and to distribute different numbers of samples among the different techniques [16].
Owen and Zhou [17] survey the principles of adaptively mixing different sampling strategies, including defensive importance sampling and the simultaneous application of control variates and importance sampling. They also investigate MIS and point out that there is a room for improvement. Douc et al. [18] investigated the population Monte Carlo method and derived sufficient convergence conditions for adaptive mixtures, and also iterative formulae for the optimal weights [19]. As population Monte Carlo explores the sampling space with a Markov stochastic process, it uses the information of only the current samples directly. Cornuet et al. [20] improve this in their adaptive multiple importance sampling algorithm and present optimal exploitation of previous samples, which automatically stabilizes the process. Marin et al. [21] argue that the Markov property is important to be able to prove the consistency of the estimator, and the stability of the process can be achieved by increasing the sample size in each iteration.
In their recent work, Martino, Elvira et al. proposed adaptive population importance sampling [22] and gradient adaptive population importance sampling [23] that adapt the mean and the covariance of proposal Gaussian pdfs, and reported significant improvements with respect to MIS that keeps the individual proposal pdfs fixed. We note that in computer graphics the not continuous integrands may pose problems to such gradientbased methods. Elvira et al. also examined a discrete weighting scheme [24] where only a subset of available sampling strategies is selected with the aim of reducing the computational complexity of the sampling process.
The rest of the paper is organized as follows: In Section 2, we review the basic formulae for the variance of MIS. In Section 3, we discuss Theorem 9.5 of Veach’s thesis and present a new proof for a slightly modified theorem. In Section 4, we present a multisample balance heuristic estimator that is provably better than multisample balance heuristic with equal count of samples. In Section 5, we obtain a provably better onesample balance heuristic estimator, and in Section 6, we justify a new heuristic for the count of samples that does not require the individual techniques be unbiased. In Section 7, we discuss the results, and in Section 8, summarize our conclusions. Some proofs are given in Appendices A and B, and numeric 1D examples are presented in Appendix C.
MIS variance analysis
We review here the variance of MIS estimators. The naming convention for the estimators is given in Table 1.
General multisample MIS estimator
The MIS estimator introduced by Veach and Guibas [1] to estimate the value of integral \(I = \int f(x) {\mathrm d} x\) has the following form:
where the weights w_{ i }(X_{ ij }) are such that
and
In this combination scheme, sampling method i uses probability density function p_{ i }(x) to generate N_{ i } number of random samples {X_{ ij }}, (j=1,…,N_{ i }). If we have n techniques, the total number of samples is \(\sum _{i=1}^{n} N_{i} =N\). Integral estimator F is unbiased, as its expected value μ is equal to integral I:
The variance of the estimator is given in the proof of Theorem 9.2 of Veach’s thesis [2]. Define F_{ ij } as
For a fixed method i and all j values, the estimators {F_{ ij }} are independent identically distributed random variables with expected value μ_{ i }:
Observe that
and that the variance of F_{ ij } is
If samples are statistically independent, the variance of the integral estimator is
There are two different components of a MIS estimator that determine the variance of the estimates: weighting scheme w_{ i } and the number of samples N_{ i } allocated to different methods. Weighting functions may or may not depend on the number of samples. Additionally, the number of samples can be predetermined, called the multisample model, or selected also randomly, which leads to the onesample model.
Balance heuristic estimator
Veach defined a balance heuristic estimator setting the weights as the following:
These weights lead to the following estimator:
Let us now consider the variance of the combined estimator when method i takes N_{ i } samples and predetermining the number of samples in advance (multisample model):
where α_{ k }=N_{ k }/N.
Balance heuristic with equal count of samples
Observe that if we take equal count of samples for each technique, i.e., for all i, N_{ i } = N/n or α_{ i } = 1/n, then the weights become
the estimator is
and the variance is equal to
where \(\sigma ^{2}_{i,{\text {eq}}}\) is equal to
and μ_{i,eq} is
A provably better nonbalance heuristic estimator
Sbert et al. [11] introduced an estimator provably better than balance heuristic with equal count of samples. Taking the weights as in Eq. 14, the estimator F in Eq. 1 is
The estimators F_{ ij } in Eq. 5 become
and the variances \(\sigma ^{2}_{i}\) defined by Eq. 9 are equal to \(\sigma ^{2}_{i,{\text {eq}}}\). The variance, Eq. 10, is then
Sbert et al. showed [11] that taking N_{ i }∝σ_{i,eq} guarantees that we improve on balance heuristic with equal sampling by using estimator Eq. 19, up to the statistical error in estimating the σ_{i,eq} values.
Taking into account the cost c_{ i } of sampling with technique i, the sample numbers that guarantee that we improve the efficiency (i.e., the inverse of cost times variance) over balance heuristic with equal sampling are
Breaking the bounds for the relative acceleration
In this section, we investigate the bounds for the improvement on the variance of balance heuristic estimator with equal count of samples.
This problem has been attacked by Veach establishing an inequality (Theorem 9.5 of his thesis [2]) for the variance of the balance heuristic estimator with equal count of sampling \({\hat F}_{{\text {eq}}}\):
where F is any multiple importance sampling estimator using the same total number of samples N. Veach interpreted Theorem 9.5 as a proof of quasioptimality of balance heuristic with equal count of samples, saying “According to this result, changing the N_{ i } can improve the variance by at most a factor of n, plus a small additive term. In contrast, a poor choice of the w_{ i } can increase variance by an arbitrary amount. Thus, the sample allocation is not as important as choosing a good combination strategy.”
The proof of Eq. 23 is based on the following inequality, which compares a general multiple importance sample estimator F with arbitrary number of samples {N_{ i }} with the same estimator (i.e., using the same weights w_{ i }) but with equal count of samples, F_{eq} as follows:
But this inequality is not valid when the weights w_{ i }(x) depend on the number of samples N_{ i }, see Appendix A for a proof. Just to give a single counter example, let us consider the case when zero variance estimator is possible by properly setting the number of samples, making V[F] zero, but the equal count of samples estimator will not have zero variance. However, we show that Theorem 9.5 can be generalized to such cases as well, but it requires the full reconsideration of the original proof.
The interpretation by Veach of Theorem 9.5 is based on the assumption that additive term μ^{2}(n−1)/N is small if the total number of samples, N, gets larger. However, denominator N is implicitly included in the other terms of Eq. 23 as well, thus the considered additive term is, in fact, not negligible. As a result, the selection of N_{ i } sample numbers or weights α_{ i } can make a significant difference in the variance, which is worth examining and opens possibilities to find better estimators.
Onesample balance heuristic
The general MIS onesample primary estimator is
where technique i is selected with probability α_{ i }. It can be easily shown that it is unbiased, i.e., its expected value is μ. Using the balance heuristic weights,
the estimator becomes the onesample balance heuristic estimator,
Onesample balance heuristic is the same as the Monte Carlo estimator using the mixture of probabilities \(p(x)~=~\sum _{k~=~1}^{n} \alpha _{k} p_{k}(x)\), \(\sum _{k~=~1}^{n} \alpha _{k}~=~1\). The α_{ i } values are called the mixture coefficients and represent the average count of samples from each technique. The variance of this estimator can be obtained by the application of the definition of variance,
Theorem 1
If \(V\left [\!\hat {\mathcal {F}}_{{\text {eq}}}^{1}\right ]\) is the variance of the onesample balance heuristic estimator with equal weights, and \(V\left [\!\hat {\mathcal {F}}^{1}\right ]\) the variance of the onesample balance heuristic estimator with any distribution of weights {α_{ k }}, then the following inequality holds:
Proof
The variance of the onesample balance heuristic with equal weights is
As \(\alpha _{\max } \sum _{k=1}^{n} p_{k}(x) \ge \sum _{k=1}^{n} \alpha _{k} p_{k}(x)\) where α_{max}>0 is the maximum of the values of α_{ k } and p_{ k } is not negative, we have \(\frac {1}{n} \alpha _{\max } \sum _{k=1}^{n} p_{k}(x) \ge \frac {1}{n} \sum _{k=1}^{n} \alpha _{k} p_{k}(x)\) and thus:
□
When for all i, α_{ i }=1/n, Eq. 29 becomes an equality.Observing that α_{max}≤1, the following corollary is immediate.
Corollary 1
Equations 29 and 32 do not imply that the improvement with respect to the equal count of samples is limited by a factor of n, since μ can be large in comparison with the variances. So, it is worth trying to obtain α_{ i } values that can reduce the variance.
Equations 29 and 32 can be extended to the onesample MIS in general using Theorem 9.4 of Veach’s thesis, which states that the variance of the onesample MIS estimator is minimal when it is a balance heuristic one, i.e., \(V[\hat {\mathcal {F}}] \le V[{\mathcal {F}}]\). For instance, for Eq. 32
And for N samples,
Observe that Eq. 34 is similar to Eq. 23 except for the fact that it holds for the onesample estimator. Using \(V[\!\hat {F}] \le V[\!\hat {\mathcal {F}}]\) (as shown in [11]) we also have:
Finally, note that we do not state that Eq. 23 is invalid, but that its proof by Veach in his thesis [2] is wrong and also his interpretation is not correct. On the other hand, we have proved a formally identical relationship, Eq. 34, for the onesample MIS estimator instead of the multisample estimator.
A provably better balanceheuristic multisample estimator
In Section 1, we reviewed two strategies for the allocation of samples to individual methods, balance heuristic with equal count of samples (Section 2.2.1) and nonbalance heuristic with sample numbers proportional to the variance (Section 2.3). The variance of the latter is better than that of the equal count estimator. This section shows that we can do it even better if the sample allocation strategy of Section 2.3 is applied to balance heuristic.
We can use Theorem 9.2 from Veach’s thesis, which states that for any MIS estimator F, the balance heuristic estimator \(\hat {F}\) with the same count of samples as F has always less variance up to an asymptotically decreasing factor, i.e.,
to obtain a provably better estimator than the one described in Section 2.3. Let us now take the estimator F as Eq. 19 from Section 2.3 where N_{ i }∝σ_{i,eq}. This is provably better than the balance heuristic with equal count of samples, \(\hat {F}_{{\text {eq}}}\), i.e., \(V[{F}] \le V[\hat {F}_{{\text {eq}}}]\). Thus,
Observe that \(\hat {F}\) is the balance heuristic estimator defined in Eq. 12 where the N_{ i }∝σ_{i,eq}. Similar results can be obtained for the case when the sampling cost is also taken into account, i.e. consider now as F the MIS estimator with \(N_{i} \propto {\sigma _{i,{\text {eq}}}}/{\sqrt {c_{i}}}\).
Estimator \(\hat {F}\) is now the balance heuristic estimator defined in Eq. 12 with \(N_{i} \propto {\sigma _{i,{\text {eq}}}}/{\sqrt {c_{i}}}\).
Discussion
Let us rewrite Eq. 37 with variances \(V_{1}[\!\hat {F}], V_{1}[\!{F}]\), and \(V_{1}[\!\hat {F}_{{\text {eq}}}]\) normalized to N = 1, i.e., \(V[\!\hat {F}]=V_{1}[\!\hat {F}]/N, V[\!{F}]~=~V_{1}[\!{F}]/N, V[\!\hat {F}_{{\text {eq}}}]~=~V_{1}[\!\hat {F}_{{\text {eq}}}]/N\):
The efficiency in this case is
Thus, the relative difference between variances will be constant disregarding N. This is a similar situation to that encountered in Section 3.
A provably better onesample balance heuristic estimator
The variance of the estimator in Eq. 25 is the second moment minus the square of μ:
where the second moments \(M_{i}^{2}\) are defined as
Observe that, as μ^{2} is a constant, minimizing \(V[{\mathcal {F}}^{1}]\) means minimizing the first term of the last equality in Eq. 41. When the weights {w_{ i }(x)} are independent of the coefficients {α_{ i }}, we get the following theorem:
Theorem 2
When the weights {w_{ i }(x)} are independent of the {α_{ i }}, the minimum variance in Eq. 41 is obtained when
Proof
The proof fully given in Appendix B follows from applying the Lagrange multipliers method to the target function:\(\Lambda (\alpha _{i}, \lambda)=\sum _{i=1}^{n} \frac {1}{\alpha _{i}} M_{i}^{2}  \mu ^{2} + \lambda \left (\sum _{k=1}^{n} \alpha _{k} 1\right)\). □
Let us use the weights of Section 2.3, i.e., \(w_{i}(x) = {p_{i}(x)}/{\sum _{k=1}^{n} p_{k}(x)}\). These weights do not depend on the {α_{ i }}, and thus we can apply Theorem 2. The minimum variance is then given by substituting the optimal {α_{ i }} from Eq. 43 to Eq. 41, resulting in
where the M_{i,eq} values are the square root of
obtained by substituting the weights of Section 2.3 into Eq. 42. As they correspond to the minimum variance, the values obtained in Eq. 43 will provide a variance less than using any other count of samples, in particular when taking equal count of samples, thus we get
Observe that the right side in Eq. 46 also corresponds to the variance of the onesample balance heuristic estimator with equal count of samples, i.e., for all i, α_{ i } = 1/n. Thus, we can guarantee that the onesample estimator given by Eq. 25 using weights of Section 2.3 is better than the onesample balance heuristic estimator with equal count of samples. Applying now Theorem 9.4 of Veach’s thesis, that reads
we have that the onesample balance heuristic estimator
using α_{ i }∝M_{i,eq} weights is guaranteed to have less variance than the onesample balance heuristic estimator with equal count of samples. If we take into account the costs, we have the following theorem:
Theorem 3
When the weights {w_{ i }(x)} are independent of the {α_{ i }}, the maximum efficiency of MIS onesample estimator is obtained when
See Appendix B for a proof. Observe, however, that Eq. 49 is not an explicit expression for α_{ i } since it contains these values in the right hand side, as \(C_{T} = \sum _{k~=~1}^{n} \alpha _{k} c_{k}\). Thus, some approximation is needed in practice. One possibility is to take
another is to substitute in the right side of Eq. 49 the optimal values independent of the costs, i.e., α_{ i }∝M_{ i }. Under certain circumstances, we can guarantee that the onesample balance heuristic estimator using the values
is more efficient than the onesample balance heuristic with equal count of sampling.
Theorem 4
The onesample balance heuristic estimator using the {α_{ i }} values in Eq. 51 is more efficient than the onesample balance heuristic estimator with equal count of sampling when the {α_{ i }} values are decreasing with costs {c_{ i }}.
A proof can be found in Appendix B. The estimator would be now
When the variation of M_{i,eq} values is small relatively to the variation of c_{ i } values, then we can consider that \(\alpha _{i} \propto {M_{i,{\text {eq}}}}/{\sqrt {c_{i}}}\) values are decreasing with c_{ i }, and the efficiency of MIS estimator using \(\alpha _{i} \propto {M_{i,{\text {eq}}}}/{\sqrt {c_{i}}}\) would be better than taking α_{ i } = 1/n.
Discussion
One might ask why we are interested in the onesample estimator when its variance is always higher than its multisample version due to the additional randomization. In addition, as the costs C_{ T } are the same, as we use on average the same number of samples from each technique for both the onesample and the multisample estimators, we also have \(C_{T} V[\!{F}] \le C_{T} V[\!\mathcal {F}]\), this is, given a onesample estimator, its corresponding multisample counterpart will always be more efficient. However, there might be cases where only the onesample estimator is applicable, e.g., when we do not know a priori the number of samples or we need just a single sample.
A new heuristic
In [9, 10] the heuristic α_{ i }∝1/v_{ i } was presented, where \(v_{i}=\int \frac {f^{2}(x)}{p_{i}(x)} {\mathrm {d}}x \mu ^{2}\) is the variance of technique i. This heuristic is only valid when all techniques p_{ i }(x) are unbiased. We will present here a generalized heuristic that is valid even when the techniques are biased and based on the following theorems.
Theorem 5
The variance \(V[\hat {\mathcal {F}}^{1}]\) is bounded by
where \(m_{k}^{2} = \int \frac {f^{2}(x)}{p_{k}(x)} {\mathrm {d}}x\) is the second moment of technique k.
Proof
Using the inequality between harmonic and arithmetic means, we get
Observe that we do not require the {p_{ k }} techniques be unbiased. □
If all techniques are unbiased, then
which follows from that μ^{2} can be written as \(\sum _{k} \alpha _{k} \mu ^{2}\). Note that from \(V[\!\hat {F}] \le V[\!\hat {\mathcal {F}}]\), the bound in Eqs. 53 and 55 also holds for \(V[\!\hat {F}]\).
Observe also that if \(m_{\max }^{2} = \max _{k} \left \{m_{k}^{2}\right \}\) and \(v_{\max } = \max _{k}^{2} \{v_{k}\}\), then from Eq. 53
and in case of unbiased techniques from Eq. 55
That is, whichever combination {α_{ k }} is taken, we cannot make it worse than just using the worst technique.
Let us use Theorem 5 to obtain a heuristic valid for biased techniques.
Theorem 6
Consider a function f(x) and n not necessarily unbiased Monte Carlo integration techniques, {p_{ i }(x)}, with second moments \(\left \{m_{i}^{2}\right \}\), and the sequence of n positive values {α_{ i }}, \(\sum _{i} \alpha _{i}=1\). The rearrangement that minimizes the bound in Eq. 53 happens when the {α_{ i }} are ordered inversely to \(\left \{m_{i}^{2}\right \}\).
Proof
As μ is a constant value only depending on f(x), the rearrangement that minimizes the bound in Eq. 53 is the one that minimizes \(\sum _{k=1}^{n} \alpha _{k} m_{k}^{2}\), and by virtue of the rearrangement inequality [25, page 261][26], it happens when the {α_{ i }} are ordered inversely to \(\left \{m_{i}^{2}\right \}\). □
Theorem 6 justifies the introduction of the new heuristic \(\alpha _{i} \propto {1}/{m_{i}^{2}}\). We can extend this heuristic to take the costs of the techniques into account, in the same way as in [9] and [10] by using \(\alpha _{i} \propto {1}/{\left (c_{i} m_{i}^{2}\right)}\).
Results
1D functions
We have compared the following estimators: equal count of samples with balance heuristic, count inversely proportional to variances of independent techniques with balance heuristic [9, 10], optimal count nonbalance heuristic in [11], and the new balance heuristic estimators defined in Sections 4, 5, and 6 using values α_{ k }∝σ_{k,eq},α_{ k }∝M_{k,eq}, and \(\alpha _{k} \propto \frac {1}{m^{2}_{k}}\) for equal cost of sampling, and \(\alpha _{k} \propto \frac {\sigma _{k,eq}}{\sqrt {c_{k}}}, \alpha _{k} \propto \frac {M_{k,eq}}{\sqrt {c_{k}}}\), and \(\alpha _{k} \propto \frac {1}{c_{k} m^{2}_{k}}\) for unequal cost of sampling. The new estimators are tried for both the onesample and the multisample case. The results for 1D example functions and pdfs of Fig. 1 are presented in Appendix C and summarized in Tables 2, 3, 4, 5, and 6. We additionally provide in supplemental material the Mathematica code to try any function with any pdfs. In the obtained results, we can see the following patterns:

As expected [11] multisample estimators are always better than the corresponding onesample estimators with the same distribution of samples

As expected the multisample estimators defined in [11] (Section 2.2.1) are better than balance heuristic with equal count of samples, both in variance and efficiency.

The new multisample estimators defined in Section 4 have similar variance values than balance heuristic with equal number of samples, but have better values for efficiency, both for the onesample and the multisample estimator.

The new multisample estimators defined in Section 4 provide similar variance as the ones in [11] but better efficiency, except in the fourth example where the efficiency is slightly worse.

The optimal distribution of samples obtained in Section 5 performs always better for both the onesample and the multisample estimator cases than the balance heuristic estimator with equal count of samples, in both variance and efficiency. This improvement is notable, especially when we consider that example 4 is particularly well suited for the equal count of sampling.

The multisample estimators using the optimal distribution of samples obtained in Section 5 perform better than the multisample estimators defined in [11], in both variance and efficiency.

The heuristically based estimators defined in [9] and in Section 6 give the best results for the first three examples, but are by far the worst for the fourth example.

The heuristically based estimators for unbiased techniques defined in [9] and [10] give better results than the one defined in Section 6 for possibly biased techniques, except for the third example, where the new heuristic performs much better.

We obtain in one case an acceleration of more than 30 times over the variance of the balance heuristic estimator with equal count of samples.
Adaptive results for 1D functions
We present also results for the four 1D examples computed with adaptive sampling algorithms. A first batch using equal number of samples from each technique computes order 1 approximation of the different quantities, which in turn allows us to obtain a first approximation of the α values needed for the second batch. The samples of batch \({\mathcal {N}}+1\) are obtained with the order \({\mathcal {N}}\) approximation of the α values, and incrementally update the \({\mathcal {N}}\)order quantities. We provide in supplemental material the C++ code to compute all estimators adaptively. The pseudocode for the estimators defined in [9] and [10] can be found in the supplementary material of [10]. The charts with variances normalized to 1 sample, (i.e., variance times the number of samples), and variances times cost (i.e., inefficiencies) for specific costs as described in the Appendix C, and averaged for 100 runs are shown in Figs. 2 and 3 for samples count from 35 to 5000.
2D analytical environment map example with LafortunePhong BRDF model
Let us consider here the BRDF formula for the physically based variant of the Phong model presented by Lafortune [27], assuming the case when the outgoing direction ω_{ o } is the surface normal:
where ρ_{ d } is the diffuse albedo, ρ_{ s } is the specular albedo, m is the shininess, and θ is the angle between incident direction ω_{ i } and the surface normal.
Let us consider also an environment map of intensity R(ω_{ i })=R· cosθ. As dω= sinθdθdϕ the outgoing radiance is given then by the integral for the hemisphere in spherical parametrization θ,ϕ of the incident direction
and integrating over ϕ we get
By the variable substitution of x= cosθ, we obtain
We have thus the integral of the product p_{1}(x)p_{2}(x), where p_{1}(x) corresponds to the BRDF times x= cosθ,
and p_{2}(x)=x is the environment map.
We used typical values, m = 5, ρ_{ d } = ρ_{ s } = 0.5, and compared for equal cost and for the empirically obtained sampling costs, c_{1} = 1,c_{2} = 4.8. The detailed results are in the last columns of Tables 2, 3, 4, 5, and in the last two columns of Table 6. The results confirm the patterns found in our 1D examples.
Conclusions
This paper discussed an important topic of MIS estimators and the limited acceleration over equal count of samples balance heuristic. We have shown that in reality, this bound on acceleration does not hold, which justifies the search for better and more efficient estimators. We have obtained, both without and with taking into account the cost of sampling, new balance heuristic estimators that are provably better than balance heuristic with equal count of samples, and new heuristic estimators valid even when the independent techniques are biased. We have analyzed their behavior with 1D examples and with a 2D environment map example.
Appendix A: Inequality in Theorem 9.5 of Veach’s thesis does not hold for balance heuristic
Let us repeat first the proof of Eq. 24 such as it appears in Veach’s thesis
where V[ F_{eq}] is computed assuming the same weights as V[ F] but with equal count of samples. Let us see that inequality 61 does not hold when the weights w_{ i }(x) depend on the N_{ i }. We will see it for the balance heuristic estimator, where the weights are defined by
Eq. 61 would be
as the last but two expression is clearly not the variance of a balance heuristic estimator with equal count of samples (Eq. 16). Inequality 61 thus holds only when the weights w_{ i }(x) do not depend on counts N_{ i }, and thus the proof of Theorem 9.5 is not valid in this case.
Appendix B: Section 5 proofs
We present the proof of Theorem 2. This proof is similar to the proof of Theorem 2 in [11].
Proof
To optimize the variance \(V\big [\!\hat {\mathcal {F}^{1}}\big ] \,=\, \sum _{i=1}^{n} \frac {1}{\alpha _{i}} M^{2}_{i}\mu ^{2}\), we use Lagrange multipliers with the constraint \(\sum _{i=1}^{n} \alpha _{i} = 1\) and objective function
Computing the partial derivatives and equating them to zero, as the M_{ i } do not depend on the α_{ i }, we get
Thus for all j values \(\lambda = {M^{2}_{j}}/{\alpha _{j}^{2}}\), which implies that α_{ j }∝M_{ j }. The Hessian matrix, obtained with the second derivatives of \(V[\hat {\mathcal {F}^{1}}]\), is a diagonal matrix with positive diagonal values
and thus is positivedefinite. The variance function is then strictly convex in its convex domain \(\sum _{i=1}^{n} \alpha _{i} = 1\), where for all i, 0<α_{ i }<1, meaning that the critical point is unique and a minimum.
Substituting the optimal values, i.e., α_{ j }∝M_{ j }, we find the minimum variance,
□
We present now the proof of Eq. 49.
Proof
Let us consider c_{ i } the cost of sampling with technique i. The average cost (or cost for one total sample) is thus \(C_{T} = \sum _{i} \alpha _{i} c_{i}\). We want to minimize now the cost times variance, i.e., \(C_{T} \times V\left [\!{\mathcal F}^{1}\right ]\), which is the inverse of efficiency. We use Lagrange multipliers with the constraint \(\sum _{i=1}^{n} \alpha _{i} = 1\) and objective function
Taking partial derivatives and equating them to zero, as the M_{ i } do not depend on the α_{ i },
Multiplying by α_{ j } and adding over j, we obtain
and thus λ=C_{ T }μ^{2}. Substituting this value in Eq. 68 we find that the optimal sampling counts are the ones in Eq. 49. □
We present here the proof of Theorem 4.
Proof
We compare first the efficiency of the general onesample MIS estimator with weights given by Eq. 14 and \(\alpha _{i} \propto {M_{i,{\text {eq}}}}/{\sqrt {c_{i}}}\), with the balance heuristic estimator with equal count of samples. The inverse of efficiency, i.e., cost times variance, is given in both cases by Eq. 70:
Let us compare first the terms corresponding to the second moments, i.e., the term \(\left (\sum _{i=1}^{n} \alpha _{i} c_{i}\right) \left (\sum _{i=1}^{n} \frac {M_{i,{\text {eq}}}^{2}}{\alpha _{i}}\right)\), for α_{ i }=1/n and \(\alpha _{i} \propto {M_{i,{\text {eq}}}}/{\sqrt {c_{i}}}\). Using CauchySchwartz inequality, we can easily check that Eq. 71 is true, where the first term is the second moment for \(\alpha _{i} \propto {M_{i,{\text {eq}}}}/{\sqrt {c_{i}}}\) and the second term for α_{ i }=1/n, because it holds that
Let us consider now the term of Eq. 70 that contains μ^{2}, i.e., \(\left (\sum _{i=1}^{n} \alpha _{i} c_{i}\right) \mu ^{2}\). Whenever the α_{ i } values are decreasing with c_{ i }, the following inequality can be proved [26, 28].
Thus, we can guarantee that the general MIS estimator with weights w_{ i }(x) given by Eq. 14 and \(\alpha _{i} \propto {M_{i,eq}}/{\sqrt {c_{i}}}\) is more efficient than the onesample balance heuristic estimator with equal count of samples whenever the α_{ i } values are decreasing with c_{ i }. Considering now the onesample balance heuristic estimator with \(\alpha _{i} \propto {M_{i,{\text {eq}}}}/{\sqrt {c_{i}}}\), as the costs are the same and the variance is less than the general MIS estimator considered by virtue of Theorem 9.4 in Veach’s thesis, its efficiency is higher, and in turn, higher than the balance heuristic estimator with equal count of sampling. □
Appendix C: 1D examples
We compare the results for the functions and pdfs in Fig. 1 with the following estimators: equal count of samples, count inversely proportional to variances of independent techniques [9], the new heuristic defined in Section 6 of this paper, optimal count in [11], and the two new balance heuristic provably better estimators defined in Sections 4 and 5. These two new estimators are tried for both onesample and multisample.
Example 1
Suppose we want to solve the integral
by MIS sampling on functions x, \(\left (x^{2}\frac {x}{\pi }\right)\), and sin(x), respectively. We first find the normalization constants: \(\int _{\frac {3}{2\pi }}^{\pi } x {\mathrm {d}}x = 4.82, \int _{\frac {3}{2\pi }}^{\pi } \left (x^{2}\frac {x}{\pi }\right) {\mathrm d}x = 8.76, \int _{\frac {3}{2\pi }}^{\pi } \sin (x) {\mathrm {d}}x=1.89\). We take into account the sampling costs given in [9], i.e., c_{1} = 1,c_{2} = 6.24,c_{3} = 3.28.
Example 2
Let us solve the integral
using the same functions x, \(\left (x^{2}\frac {x}{\pi }\right)\), and sin(x) as before.
Example 3
As the third example, let us solve the integral
using the same functions as before.
Example 4
As the last example, consider the integral of the sum of the three pdfs
In this case, we know the optimal (zero variance) α values: (0.3,0.3,0.4). This case should be most favorable to equal count of samples.
References
 1
E Veach, LJ Guibas, in Proceedings of the 22nd Annual Conference on Computer Graphics and Interactive Techniques. SIGGRAPH ’95. Optimally Combining Sampling Techniques for Monte Carlo Rendering (ACMNew York, 1995), pp. 419–428. https://doi.org/10.1145/218380.218498. http://doi.acm.org/10.1145/218380.218498.
 2
E Veach, Robust Monte Carlo Methods for light transport simulation. PhD thesis, Stanford University (1997).
 3
V Elvira, L Martino, D Luengo, MF Bugallo, Efficient multiple importance sampling estimators. IEEE Signal Process. Lett.22(10), 1757–1761 (2015). https://doi.org/10.1109/LSP.2015.2432078.
 4
V Elvira, L Martino, D Luengo, MF Bugallo, Generalized multiple importance sampling. ArXiv eprints (2015). http://arxiv.org/abs/1511.03095.
 5
K Subr, D Nowrouzezahrai, W Jarosz, J Kautz, K Mitchell, Error analysis of estimators that use combinations of stochastic sampling strategies for direct illumination. Comput. Graph. Forum Proc. EGSR. 33(4), 93–102 (2014). https://doi.org/10.1111/cgf.12416.
 6
M Pharr, G Humphreys, Physically based rendering, Second Edition: From Theory To Implementation, 2nd edn (Morgan Kaufmann Publishers Inc., San Francisco, 2010).
 7
H Lu, R Pacanowski, X Granier, Secondorder approximation for variance reduction in multiple importance sampling. Comput. Graph. Forum. 32(7), 131–136 (2013). https://doi.org/10.1111/cgf.12220.
 8
A Pajot, L Barthe, M Paulin, P Poulin, Representativity for robust and adaptive multiple importance sampling. IEEE Trans. Vis. Comput. Graph.17(8), 1108–1121 (2011). https://doi.org/10.1109/TVCG.2010.230.
 9
V Havran, M Sbert, in Proceedings of the 13th ACM SIGGRAPH International Conference on VirtualReality Continuum and Its Applications in Industry, VRCAI ’14. Optimal Combination of Techniques in Multiple Importance Sampling (ACMNew York, 2014), pp. 141–150. https://doi.org/10.1145/2670473.2670496. http://doi.acm.org/10.1145/2670473.2670496.
 10
M Sbert, V Havran, Adaptive multiple importance sampling for general functions. Vis. Comput.1–11 (2017). https://doi.org/10.1007/s0037101713981.
 11
M Sbert, V Havran, L SzirmayKalos, Variance analysis of multisample and onesample multiple importance sampling. Comput. Graph. Forum. 35(7), 451–460 (2016). https://doi.org/10.1111/cgf.13042.
 12
GO Roberts, A Gelman, WR Gilks, Weak convergence and optimal scaling of random walk Metropolis algorithms. Ann. Appl. Probab.7:, 110–120 (1997).
 13
C Andrieu, J Thoms, A tutorial on adaptive MCMC. Stat. Comput.18(4), 343–373 (2008).
 14
T Hachisuka, HW Jensen, Robust adaptive photon tracing using photon path visibility. ACM Trans. Graph.30(5), 114–111411 (2011). https://doi.org/10.1145/2019627.2019633.
 15
L SzirmayKalos, L Szécsi, Improved stratification for metropolis light transport. Comput. Graph.68(Supplement C), 11–20 (2017). https://doi.org/10.1016/j.cag.2017.07.032.
 16
T Hachisuka, AS Kaplanyan, C Dachsbacher, Multiplexed metropolis light transport. ACM Trans. Graph.33(4), 100–110010 (2014). https://doi.org/10.1145/2601097.2601138.
 17
A Owen, Y Zhou, Safe and Effective Importance Sampling. J. Am. Stat. Assoc.95:, 135–143 (2000).
 18
R Douc, A Guillin, JM Marin, CP Robert, Convergence of Adaptive Mixtures of Importance Sampling Schemes. Ann. Stat.35(1), 420–448 (2007). https://doi.org/10.1214/009053606000001154.
 19
R Douc, A Guillin, JM Marin, CP Robert, Minimum variance importance sampling via population monte carlo. ESAIM: Probab. Stat.11:, 424–447 (2007).
 20
JM Cornuet, JM Marin, A Mira, CP Robert, Adaptive multiple importance sampling. Scand. J. Stat.39(4), 798–812 (2012). https://doi.org/10.1111/j.14679469.2011.00756.x.
 21
JM Marin, P Pudlo, M Sedki, Consistency of the adaptive multiple importance sampling. Preprint arXiv:1211.2548 (2012). http://arxiv.org/abs/1211.2548.
 22
L Martino, V Elvira, D Luengo, J Corander, An adaptive population importance sampler: Learning from uncertainty. IEEE Trans. Signal Process. 63(16), 4422–4437 (2015). https://doi.org/10.1109/TSP.2015.2440215.
 23
V Elvira, L Martino, D Luengo, J Corander, in Proceedings of ICASSP 2015. A gradient adaptive population importance sampler (IEEE, Brisbane, 2015), pp. 4075–4079.
 24
V Elvira, L Martino, D Luengo, MF Bugallo, Efficient multiple importance sampling estimators. IEEE Signal Process. Lett. 22(10), 1757–1761 (2015). https://doi.org/10.1109/LSP.2015.2432078.
 25
GH Hardy, JE Littlewood, G Pólya, Inequalities (Cambridge University Press, Cambridge, 1952).
 26
M Sbert, J Poch, A necessary and sufficient condition for the inequality of generalized weighted means. J. Inequalities Appl.2016(1), 292 (2016). https://doi.org/10.1186/s1366001612337.
 27
EP Lafortune, YD Willems, Using the modified Phong reflectance model for physically based rendering. Technical Report report CW197, Dept. of Computer Science, K.U.Leuven (1994).
 28
MM Marjanović, Z Kadelburg, A proof of chebyshev’s inequality. Teach. Math.X(2), 107–108 (2007).
Acknowledgements
The authors acknowledge the comments by anonymous reviewers that helped to improve a preliminary version of the paper.
Funding
The authors are funded in part by Czech Science Foundation research program GA1419213S by grant TIN201675866C33R from the Spanish Government and by grant OTKA K124124 and VKSZ14 PET/MRI 7T.
Author information
Affiliations
Contributions
Equal contribution from all authors. All authors read and approved the final manuscript.
Corresponding author
Correspondence to Mateu Sbert.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Sbert, M., Havran, V. & SzirmayKalos, L. Multiple importance sampling revisited: breaking the bounds. EURASIP J. Adv. Signal Process. 2018, 15 (2018) doi:10.1186/s1363401805312
Received
Accepted
Published
DOI
Keywords
 Global illumination
 Rendering equation analysis
 Multiple importance sampling
 Monte Carlo
AMS Subject Classification
 primary Computer Graphics G.3 Mathematics of Computing/PROBABILITY AND STATISTICS probabilistic algorithms
 Computer Graphics G.3 Mathematics of Computing / PROBABILITY AND STATISTICS probabilistic algorithms
 secondary Computer Graphics G.3 Mathematics of Computing / PROBABILITY AND STATISTICS probabilistic algorithms