EURASIP Journal on Applied Signal Processing 2005:15, 2470–2485 c ○ 2005 Hindawi Publishing Corporation Cosmological Non-Gaussian Signature Detection: Comparing Performance of Different Statistical Tests

Currently, it appears that the best method for non-Gaussianity detection in the cosmic microwave background (CMB) consists in calculating the kurtosis of the wavelet coefficients. We know that wavelet-kurtosis outperforms other methods such as the bispectrum, the genus, ridgelet-kurtosis, and curvelet-kurtosis on an empirical basis, but relatively few studies have compared other transform-based statistics, such as extreme values, or more recent tools such as higher criticism (HC), or proposed "best possible" choices for such statistics. In this paper, we consider two models for transform-domain coefficients: (a) a power-law model, which seems suited to the wavelet coefficients of simulated cosmic strings, and (b) a sparse mixture model, which seems suitable for the curvelet coefficients of filamentary structure. For model (a), if power-law behavior holds with finiteth moment, excess kurtosis is an asymptotically optimal detector, but if theth moment is not finite, a test based on extreme values is asymptotically optimal. For model (b), if the transform coefficients are very sparse, a recent test, higher criticism, is an optimal detector, but if they are dense, kurtosis is an optimal detector. Empirical wavelet coefficients of simulated cosmic strings have power-law character, infiniteth moment, while curvelet coefficients of the simulated cosmic strings are not very sparse. In all cases, excess kurtosis seems to be an effective test in moderate-resolution imagery.


Introduction
The Cosmic Microwave Background (CMB), discovered in 1965 by Penzias and Wilson [39], is a relic of radiation emitted some 13 billion years ago, when the Universe was about 370.000 years old. This radiation exhibits characteristic of an almost perfect blackbody at a temperature of 2.726 Kelvin as measured by the FIRAS experiment on board COBE satellite [20]. The DMR experiment, again on board COBE, detected and measured angular small fluctuations of this temperature, at the level of a few tens of micro Kelvin, and at angular scale of about 10 degrees [43]. These so-called temperature anisotropies were predicted as the imprints of the initial density perturbations which gave rise to present large scale structures as galaxies and clusters of galaxies. This relation between the present-day universe and its initial conditions has made the CMB radiation one of the preferred tools of cosmologists to understand the history of the universe, the formation and evolution of the cosmic structures and physical processes responsible for them and for their clustering.
As a consequence, the last several years have been a particularly exciting period for observational cosmology focussing on the CMB. With CMB balloon-borne and ground-based experiments such as TOCO [36], BOOMERanG [16], MAXIMA [24], DASI [23] and Archeops [8], a firm detection of the so-called "first peak" in the CMB anisotropy angular power spectrum at the degree scale was obtained. This detection was very recently confirmed by the WMAP satellite [7], which detected also the second and third peaks. WMAP satellite mapped the CMB temperature fluctuations with a resolution better that 15 arc-minutes and a very good accuracy marking the starting point of a new era of precision cosmology that enables us to use the CMB anisotropy measurements to constrain the cosmological parameters and the underlying theoretical models. Figure 1: Courtesy of the WMAP team (reference to the website). All sky map of the CMB anisotropies measured by the WMAP satellite.
In the framework of adiabatic cold dark matter models, the position, amplitude and width of the first peak indeed provide strong evidence for the inflationary predictions of a flat universe and a scale-invariant primordial spectrum for the density perturbations. Furthermore, the presence of second and third peaks, confirm the theoretical prediction of acoustic oscillations in the primeval plasma and shed new light on various cosmological and inflationary parameters, in particular, the baryonic content of the universe. The accurate measurements of both the temperature anisotropies and polarised emission of the CMB will enable us in the very near future to break some of the degeneracies that are still affecting parameter estimation. It will also allow us to probe more directly the inflationary paradigm favored by the present observations.
Testing the inflationary paradigm can also be achieved through detailed study of the statistical nature of the CMB anisotropy distribution. In the simplest inflation models, the distribution of CMB temperature fluctuations should be Gaussian, and this Gaussian field is completely determined by its power spectrum. However, many models such as multi-field inflation (e.g. [9] and references therein), super strings or topological defects, predict non-Gaussian contributions to the initial fluctuations [33,28,22]. The statistical properties of the CMB should discriminate models of the early universe. Nevertheless, secondary effects like the inverse Compton scattering, the Doppler effect, lensing and others add their own contributions to the total non-Gaussianity.
All these sources of non-Gaussian signatures might have different origins and thus different statistical and morphological characteristics. It is therefore not surprising that a large number of studies have recently been devoted to the subject of the detection of non-Gaussian signatures. Many approaches have been investigated: Minkowski functionals and the morphological statistics [37,41], the bispectrum (3-point estimator in the Fourier domain) [11,49,40], the trispectrum (4-point estimator in the Fourier domain) [31], wavelet transforms [1,21,26,5,15,29,44], and the curvelet transform [44]. Different wavelet methods have been studied, such as the isotropic a trous algorithm [46] and the bi-orthogonal wavelet transform [34]. (The bi-orthogonal wavelet transform was found to be the most sensitive to non-Gaussianity [44]). In [3,44], it was shown that the wavelet transform was a very powerful tool to detect the non-Gaussian signatures. Indeed, the excess kurtosis (4th moment) of the wavelet coefficients outperformed all the other methods (when the signal is characterised by a non-zero 4th moment).
Nevertheless, a major issue of the non-Gaussian studies in CMB remains our ability to disentangle all the sources of non-Gaussianity from one another. Recent progress has been made on the discrimination between different possible origins of non-Gaussianity. Namely, it was possible to separate the non-Gaussian signatures associated with topological defects (cosmic strings (CS)) from those due to Doppler effect of moving clusters of galaxies (both dominated by a Gaussian CMB field) by combining the excess kurtosis derived from both the wavelet and the curvelet transforms [44].
This success argues for us to construct a "toolkit" of well-understood and sensitive methods for probing different aspects of the non-Gaussian signatures.
In that spirit, the goal of the present study is to consider the advantages and limitations of detectors which apply kurtosis to transform coefficients of image data. We will study plausible models for transform coefficients of image data and compare the performance of tests based on kurtosis of transform coefficients to other types of statistical diagnostics.
At the center of our analysis are two facts [A] The wavelet/curvelet coefficients of CMB are Gaussian (we implicitly assume the most simple inflationary scenario).
[B] The wavelet/curvelet coefficients of topological defect and Doppler effect simulations are non-Gaussian.
We develop tests for non-Gaussianity for two models of statistical behavior of transform coefficients. The first, better suited for wavelet analysis, models transform coefficients of cosmic strings as following a power law. The second, theoretically better suited for curvelet coefficients, assumes that the salient features of interest are actually filamentary (it can be residual strips due do a non perfect calibration), which gives the curvelet coefficients a sparse structure.
We review some basic ideas from detection theory, such as likelihood ratio detectors, and explain why we prefer non-parametric detectors, valid across a broad range of assumptions.
In the power-law setting, we consider two kinds of non-parametric detectors. The first, based on kurtosis, is asymptotically optimal in the class of weakly dependent symmetric non-Gaussian contamination with finite 8-th moments. The second, the Max, is shown to be asymptotically optimal in the class of weakly dependent symmetric non-Gaussian contamination with infinite 8-th moment. While the evidence seems to be that wavelet coefficients of CS have about 6 existing moments -indicating a decisive advantage for extreme-value statistics -the performance of kurtosis-based tests and Max-based tests on moderate sample sizes (eg. 64K transform coefficients) does not follow the asymptotic theory; excess kurtosis works better at these sample sizes.
In the sparse-coefficients setting, we consider kurtosis, the Max, and a recent statistic called Higher Criticism (HC) [19]. Theoretical analysis suggests that curvelet coefficients of filamentary features should be sparse, with about n 1/4 substantial nonzero coefficients out of n coefficients in a subband; this level of sparsity would argue in favor of Max/HC. However, empirically, the curvelet coefficients of actual CS simulations are not very sparse. It turns out that kurtosis outperforms Max/HC in simulation.
Summarizing, the work reported here seems to show that for all transforms considered, the excess kurtosis outperforms alternative methods despite their strong theoretical motivation. A reanalysis of the theory supporting those methods shows that the case for kurtosis can also be justified theoretically based on observed statistical properties of the transform coefficients not used in the original theoretic analysis.

Detecting Faint Non-Gaussian Signals Superposed on a Gaussian Signal
The superposition of a non-Gaussian signal with a Gaussian signal can be modeled as Y = N +G, where Y is the observed image, N is the non-Gaussian component and G is the Gaussian component. We are interested in using transform coefficients to test whether N ≡ 0 or not.

Hypothesis Testing and Likelihood Ratio Test (LRT).
Transform coefficients of various kinds [Fourier, wavelet, etc.] have been used for detecting non-Gaussian behavior in numerous studies. Let X 1 , X 2 , . . . , X n be the transform coefficients of Y ; we model these as where λ > 0 is a parameter, z i iid ∼ N (0, 1) are the transform coefficients of the Gaussian component G, w i iid ∼ W are the transform coefficients of the non-Gaussian component N , and W is some unknown symmetrical distribution. Here without loss of generality, we assume the standard deviation for both z i and w i are 1.
Phrased in statistical terms, the problem of detecting the existence of a non-Gaussian component is equivalent to discriminating between the hypotheses: 2)  Figure 2: Detectable regions in the α − r plane. With (α, r) in the white region on the top or the undetectable region, all methods completely fail for detection. With (α, r) in the white region on the bottom, both excess kurtosis and Max/HC are able to detect reliably. While in the blue region to the left, Max/HC is able to detect reliably, but excess kurtosis completely fails, and in the yellow region to the right, excess kurtosis is able to detect reliably, but Max/HC completely fail.
and N ≡ 0 is equivalent to λ ≡ 0. We call H 0 the null hypothesis H 0 , and H 1 the alternative hypothesis.
When both W and λ are known, then the optimal test for Problem (2.2) -(2.3) is simply the Neyman-Pearson Likelihood ratio test (LRT), [32,Page 74 ]. The size of λ = λ n for which reliable discrimination between H 0 and H 1 is possible can be derived using asymptotics. If we assume that the tail probability of W decays algebraically, (we say W has a power-law tail), and we calibrate λ to decay with n, so that increasing amounts of data are offset by increasingly hard challenges: then there is a threshold effect for the detection problem (2.2) -(2.3). In fact, define: then as n → ∞, LRT is able to reliably detect for large n when r < ρ * 1 (α), and is unable to detect when r > ρ * 1 (α); this is proved in [18]. Since LRT is optimal, it is not possible for any statistic to reliably detect when r > ρ * 1 (α). We call the curve r = ρ * 1 (α) in the α-r plane the detection boundary; see Figure 2.
In fact, when r < 1/4, asymptotically LRT is able to reliably detect whenever W has a finite 8-th moment, even without the assumption that W has a power-law tail. Of course, the case that W has an infinite 8-th moment is more complicated, but if W has a power-law tail, then LRT is also able to reliably detect if r < 2/α. Despite its optimality, LRT is not a practical procedure. To apply LRT, one needs to specify the value of λ and the distribution of W , which seems unlikely to be available. We need nonparametric detectors, which can be implemented without any knowledge of λ or W , and depend on X i 's only. In the section below, we are going to introduce two non-parametric detectors: excess kurtosis and Max; later in Section 4.3, we will introduce a third non-parametric detector: Higher Criticism (HC).

Excess Kurtosis and Max
We pause to review the concept of p-value briefly. For a statistic T n , the p-value is the probability of seeing equally extreme results under the null hypothesis: here P H 0 refers to probability under H 0 , and t n (X 1 , X 2 , . . . , X n ) is the observed value of statistic T n . Notice that the smaller the p-value, the stronger the evidence against the null hypothesis. A natural decision rule based on p-values rejects the null when p < α for some selected level α, and a convenient choice is α = 5%. When the null hypothesis is indeed true, the p-values for any statistic are distributed as uniform U (0, 1). This implies that the p-values provide a common scale for comparing different statistics.
We now introduce two statistics for comparison. Excess Kurtosis (κ n ). Excess kurtosis is a widely used statistic, based on the 4-th moment. For any (symmetrical) random variable X, the kurtosis is: The kurtosis measures a kind of departure of X from Gaussianity, as κ(z) = 0. Empirically, given n realizations of X, the excess kurtosis statistic is defined as: When the null is true, the excess kurtosis statistic is asymptotically normal: thus for large n, the p-value of the excess kurtosis is approximately: whereΦ(·) is the survival function (upper tail probability) of N (0, 1). It is proved in [18] that the excess kurtosis is asymptotically optimal for the hypothesis testing of (2 However, when E[W 8 ] = ∞, even though kurtosis is well-defined (E[W 4 ] < ∞), there are situations in which LRT is able to reliably detect but excess kurtosis completely fails. In fact, by assuming (2.4) -(2.5) with an α < 8, if (α, r) falls into the blue region of Figure 2, then LRT is able to reliably detect, however, excess kurtosis completely fails. This shows that in such cases, excess kurtosis is not optimal; see [18]. Max (M n ). The largest (absolute) observation is a classical and frequently-used nonparametric statistic: under the null hypothesis, M n ≈ 2 log n, and moreover, by normalizing M n with constants c n and d n , the resulting statistic converges to the Gumbel distribution E v , whose cdf is e −e −x : where approximately hereX and S n are the sample mean and sample standard deviation of {X i } n i=1 respectively. Thus a good approximation of the p-value for M n is: We have tried the above experiment for n = 244 2 , and found that taking c n = 4.2627, d n = 0.2125 gives a good approximation. Assuming (2.4) -(2.5) and α < 8, or λ = n −r and that W has a power-law tail with α < 8, it is proved in [18] that Max is optimal for hypothesis testing (2.2) -(2.3). Recall if we further assume 1 4 < r < 2 α , then asymptotically, excess kurtosis completely fails; however, Max is able to reliably detect and is competitive to LRT.
On the other hand, recall that excess kurtosis is optimal for the case α > 8. In comparison, in this case, Max is not optimal. In fact, if we further assume 2 α < r < 1 4 , then excess kurtosis is able to reliably detect, but Max will completely fail.
In Figure 2, we compared the detectable regions of the excess kurtosis and Max in the α-r plane.
To conclude this section, we mention an alternative way to approximate the p-values for any statistic T n . This alternative way is important in case that an asymptotic (theoretic) approximation is poor for moderate large n, an example is the statistic HC * n we will introduce in Section 4.3; this alternative way is helpful even when the asymptotic approximation is accurate. Now the idea is, under the null hypothesis, we simulate a large number (N = 10 4 or more) of n , we then tabulate them. For the observed value t n (X 1 , X 2 , . . . , X n ), the p-value will then be well approximated by: . . , X n )}, and the larger the N , the better the approximation.

Heuristic Approach
We have exhibited a phase-change phenomenon, where the asymptotically optimal test changes depending on power-law index α. In this section, we develop a heuristic analysis of detectability and phase change.
The detection property of Max follows from comparing the ranges of data. Recall that for large n, notice that: thus if and only if r < 2 α , M n for the alternative will differ significantly from M n for the null, and so the criterion for detectability by Max is r < 2 α . Now we study detection by excess kurtosis. Heuristically, thus if and only if r < 1 4 will κ n for the alternative differ significantly from κ n under the null, and so the criterion for detectability by excess kurtosis is r < 1 4 . This analysis shows the reason for the phase change. In Figure 2, when the parameter (α, r) is in the blue region, for sufficiently large n, n 1 α − r 2 √ 2 log n and the strongest evidence against the null is in the tails of the data set, which M n is indeed using. However, when (α, r) moves from the blue region to the yellow region, n 1 α − r 2 √ 2 log n, the tails no longer contain any important evidence against the null, instead, the central part of the data set contain the evidence. By symmetry, the 1 st and the 3 rd moments vanishes, and the 2 nd moment is 1 by the normalization; so the excess kurtosis is in fact the most promising candidate of detectors based on moments.
The heuristic analysis is the essence for theoretic proof as well as empirical experiment. Later in Section 3.4, we will have more discussions for comparing the excess kurtosis with Max down this vein.

Simulated Astrophysical Signals
The temperature anisotropies of the CMB contain the contributions of both the primary cosmological signal, directly related to the initial density perturbations, and the secondary anisotropies. The latter are generated after matter-radiation decoupling [51]. They arise from the interaction of the CMB photons with the neutral or ionised matter along their path [48,38,50].
In the present study, we assume that the primary CMB anisotropies are dominated by the fluctuations generated in the simple single field inflationary Cold Dark Matter model with a non-zero cosmological constant. The CMB anisotropies have therefore a Gaussian distribution. We allow for a contribution to the primary signal from topological defects, namely cosmic strings (CS), as suggested in [10].
We use for our simulations the cosmological parameters obtained from the WMAP satellite [6] and a normalization parameter σ 8 = 0.9. Finally, we obtain the so-called "simulated observed map", D, that contains the two previous astrophysical components. It is obtained from D λ = √ 1 − λCMB + √ λCS, where CMB and CS are respectively the CMB and the cosmic string simulated maps. λ = 0.18 is an upper limit constant derived by [10]. All the simulated maps have 500 × 500 pixels with a resolution of 1.5 arcminute per pixel.  Table 1: Empirical estimate 4-th, 5-th, 6-th, 7-th, and 8-th moments calculated using a subsamples of size n/2 k of {|w i |} n i=1 , with k = 0, 1, 2, 3, 4. The table suggests that the 4-th, 5-th, and 6-th moments are finite, but the 7-th and 8-th moments are infinite.

Evidence for E[W 8 ] = ∞
For the wavelet coefficients on the finest scale of the cosmic string map in the right panel of Figure 3, by throwing away all the coefficients related to pixels on the edge of the map, we have n = 244 2 coefficients; we then normalize these coefficients so that the empirical mean and standard deviation are 0 and 1 respectively; we denote the resulting dataset by , we randomly draw sub-samples of size n/2 k from {w i } n i=1 , and then take the average of the 8-th power of this subsequence; we repeat this process 50, 000 times, and we letm is obtained from all n samples. The results correspond to the first wavelet band are summarized in Table 1. ¿From the table, we have seen thatm ; this supports that E[W 8 ] = ∞. Similar results were obtained from the other bands. In comparison, in Table 1, we also list the 4-th, 5-th, 6-th, and 7-th moments. It seems that the 4-th, 5-th, and 6-th moments are finite, but the 7-th and 8-th moments are infinite.

Power-law Tail of W
Typical models for heavy-tailed data include exponential tails and power-law tails. We now compare such models to the data on wavelet coefficients for W ; the Gaussian model is also included as comparison.
We sort the |w i |'s in descending order, |w| (1) > |w| (2) > . . . > |w| (n) , and take the 50 largest samples |w| (1) > |w| (2) > . . . > |w| (50) . For a power-law tail with index α, we expect that for some constant C α , so there is a strong linear relationship between log( i n ) and log(|w| (i) ). Similarly, for the exponential model, we expect a strong linear relationship between log( i n ) and |w| (i) , and for the Gaussian model, we expect a strong linear relationship between log( i n ) and |w| 2 (i) . For each model, to measure whether the "linearity" is sufficient to explain the relationship between log( i n ) and log(|w| (i) ) (or |w| (i) , or |w| 2 (i) ), we introduce the following z-score: wherep i is the linear fit using each of the three models. If the resulting z-scores is random and have no specific trend, the model is appropriate; otherwise the model may need improvement.
The results are summarized in Figure 5. The power-law tail model seems the most appropriate: the relationship between log( i n ) and log(|w| (i) ) looks very close to linear, the z-score looks very small, and the range of z-scores much narrower than the other two. For the exponential model, the linearity is fine at the first glance, however, the z-score is decreasing with i, which implies that the tail is heavier than estimated. The Gaussian model fits much worse than exponential. To summarize, there is strong evidence that the tail follows a power-law. Now we estimate the index α for the power-law tail. A widely-used method for estimating α is the Hills' estimator [25]:α (l) where l is the number of (the largest) |w| (i) to include for estimation. In our situation, l = 50 andα =α (50) H = 6.134; we also found that the standard deviation of this estimate ≈ 0.9. Table 2 gives estimates of α for each band of the wavelet transform. This shows that α is likely to be only slightly less than 8: this means the performance of excess kurtosis and Max might be very close empirically.

Comparison of Excess Kurtosis vs. Max with Simulation
To test the results in Section 3.3, we now perform a small simulation experiment. A complete cycle includes the following steps. (n = 244 2 and {w i } n i=1 are the same as in Section 3.3).
Based on these simulations, first, we have estimated the probability of detection under various λ, for each detector: Fraction of detections = number of cycles with a p-value ≤ 0.05 500 .
Results are summarized in Figure 6. Second, we pick out those simulated values for λ = 0.05 alone, and plot the ROC curves for each detector. The ROC curve is a standard way to evaluate detectors [35]; the x-axis gives the fraction of false alarms (the fraction of detections when the null is true (i.e. λ = 0)); the y-axis gives the corresponding fraction of true detections). Results are shown in Figure 6. The figure suggests that the excess kurtosis is slightly better than M n . We also show an adaptive test, HC n in two forms (HC * n and HC + n ); these will be described later. We now interpret. As our analysis predicts that W has a power-law tail with E[W 8 ] = ∞, it is surprising that excess kurtosis still performs better than Max.
In Section 2.3, we compared excess kurtosis and Max in a heuristic way; here we will continue that discussion, using now empirical results. Notice that for the data set (w 1 , w 2 , . . . , w n ), the largest (absolute) observation is: and the excess kurtosis is: In the asymptotic analysis of Section 2.3, we assumed κ(W ) is a constant. However for n = 244 2 , we get a very large excess kurtosis 27.08 ≈ n 0.3 ; this will make excess kurtosis very favorable in the current situation. Now, in order for M n to work successfully, we have to take λ to be large enough that √ λM > 2 log n so λ > 0.072. The p-value of M n is then: moreover, the p-value for excess kurtosis is heuristicallȳ setting them to be equal, we can solve κ in terms of M : The curve κ = κ 0 (M ) separates the M -κ plane into 2 regions: the region above the curve is favorable to the excess kurtosis, and the region below the curve is favorable to Max. See Figure  7. In the current situation, the point (M, κ) = (17.48, 27.08) falls far above the curve; this explains why excess kurtosis is better than Max for the current data set.

CMB + CS
We study the relative sensitivity of the different wavelet-based statistical methods when the signals are added to a dominant Gaussian noise, i.e. the primary CMB. We ran 5000 simulations by adding the 100 CMB realisations to the CS (D(λ, i) = √ 1 − λCMB i + √ λCS, i = 1 . . . 100), using 50 different values for λ, ranging linearly between 0 and 0.18. Then we applied the bi-orthogonal wavelet transform, using the standard 7/9 filter [4] to these 5000 maps. On each band b of the wavelet transform, for each dataset D(λ, i), we calculate the kurtosis value K D(b,λ,i) . In order to calibrate and compare the departures from a Gaussian distribution, we have simulated for each image D(λ, i) a Gaussian Random Field G(λ, i) which has the same power spectrum as D(λ, i), and we derive its kurtosis values K G(b,λ,i) . For a given band b and a given λ, we derive for each each kurtosis K D(b,λ,i) its p-value p K (b, λ, i) under the null hypothesis (i.e. no CS) using the distribution of K G(b,λ, * ) . The mean p-valuep K (b, λ) is obtained by taking the mean of p K (b, λ, * ). For a given band b, the curvep K (b, λ) versus λ shows the sensitivity of the method to detect CS. Then we do the same operation, but replacing the kurtosis by HC and Max. Figure 8 shows the mean p-value versus λ for the nine finest scale subbands of the wavelet transform. The first three subbands correspond to the finest scale (high frequencies) in the three directions, respectively horizontal, vertical and diagonal. Bands 4,5, and 6 correspond to the second resolution level and bands 7,8,9 to the third. Results are clearly in favor of the excess kurtosis.
The same experiments have been repeated, but replacing the bi-orthogonal wavelet transform by the undecimated isotropicà trous wavelet transform. Results are similarly in favor of the excess kurtosis. Table 3 gives the λ values (multiplied 100) for which the CS are detected at a 95% confidence level. Only bands where this level is achieved are given. Smaller the λ, better the the sensibility of the method to the detect the CS. These results show that the excess kurtosis outperforms clearly HC and Max, whatever the chosen multiscale transform and the analyzed scale.
No method is able to detect the CS at a 95% confidence level after the second scale in these simulations. In practice, the presence of noise makes the detection even more difficult, especially in the finest scales.

CMB + SZ
We now consider a totally different contamination. Here, we take into account the secondary anisotropies due to the kinetic Sunyaev-Zel'dovich (SZ) effect [48]. The SZ effect represents the Compton scattering of CMB photons by the free electrons of the ionised and hot intra-cluster gas. When the galaxy cluster moves with respect to the CMB rest frame, the Doppler shift induces additional anisotropies; this is the so-called kinetic SZ (KSZ) effect. The kinetic SZ maps are simulated following Aghanim et al [2] and the simulated observed map D is obtained from D λ = CMB + λKSZ, where CMB and KSZ are respectively the CMB and the kinetic   SZ simulated maps. We ran 5000 simulations by adding the 100 CMB realisations to the KSZ (D(λ, i) = CMB i + λKSZ, i = 1 . . . 100), using 50 different values for λ, ranging linearly between 0 and 1. The p-values are calculated just as in the previous section. Table 4 gives the λ values for which SZ is detected at a 95% confidence level for the three multiscale transforms. Only bands were this level is achieved are given. Results are again in favor of the Kurtosis.

Curvelet Coefficients of Filaments
Curvelet Analysis was proposed by Candès and Donoho (1999) [12] as a means to efficiently represent edges in images; Donoho and Flesia (2001) [17] showed that it could also be used to describe non-Gaussian statistics in natural images. It has also been used for a variety of image processing tasks: [13,45,47]. We now consider the use of curvelet analysis for detection of non-Gaussian cosmological structures which are filamentary.

Curvelet Coefficients of Filaments
Suppose we have an image I which contains within it a single filament, i.e. a smooth curve of appreciable length L. We analyse it using the curvelet frame. Applying analysis techniques described carefully in [14], we can make precise the following claim: at scale s = 2 −j there will be about O(L2 j/2 ) significant coefficients caused by this filamentary feature, and they will all be of roughly similar size. The remaining O(4 j ) coefficients at that scale will be much smaller, basically zero in comparison.
The pattern continues in this way if there is a collection of m filaments of individual lengths L i and total length L = L 1 + · · · + L m . Then we expect roughly O(L2 j/2 ) substantial coefficients at level j, out of 4 j total.
This suggests a rough model for the analysis of non-Gaussian random images which contain apparent 'edgelike' phenomena. If we identify the edges with filaments, then we expect to see, at a scale with n coefficients, about Ln 1/4 nonzero coefficients. Assuming all the edges are equally 'pronounced', this suggests that we view the curvelet coefficients of I at a given scale as consisting of a fraction = L/n 3/4 nonzeros and the remainder zero. Under this model, the curvelet coefficients of a superposition of a Gaussian random image should behave like: where are the fraction of large curvelet coefficients corresponding to filaments, and µ is the amplitude of these coefficients of the non-Gaussian component N . The problem of detecting the existence of such a non-Gaussian mixture is equivalent to discriminating between the hypotheses: N (µ n , 1), (4.11) and N ≡ 0 is equivalent to n ≡ 0.
There is a threshold effect: setting then as n → ∞, LRT is able to reliably detect for large n when s > ρ * 2 (β), and is unable to detect when s < ρ * 2 (β), [30], [27], and [19]. Since LRT is optimal, it is not possible for any statistic to reliably detect when s < ρ * 2 (α). We call the curve s = ρ * 2 (β) in the β-s plane the detection boundary; see Figure 9.
We also remark that if the sparsity parameter β < 1/2, it is possible to discriminate merely using the value of the empirical variance of the observations or some other simple moments, and so there is no need for advanced theoretical approaches.  Figure 9: The detection boundary separates the square in the β-s plane into the detectable region and the undetectable region. When (β, s) falls into the estimable region, it is possible not only to reliably detect the presence of the signals, but also estimate them.

Adaptive Testing using Higher Criticism
The Higher Criticism statistic (HC), was proposed in [19], where it was proved to be asymptotically optimal in detecting (4.10) -(4.11).
To define HC first we convert the individual X i 's into p-values for individual z-tests. Let p i = P {N (0, 1) > X i } be the i th p-value, and let p (i) denote the p-values sorted in increasing order; the Higher Criticism statistic is defined as: or in a modified form: we let HC n refer either to HC * n or HC + n whenever there is no confusion. The above definition is slightly different from [19], but the ideas are essentially the same.
With an appropriate normalization sequence: a n = 2 log log n, b n = 2 log log n + 0.5 log log log n − 0.5 log(4π), the distribution of HC n converges to the Gumbel distribution E 4 v , whose cdf is exp(−4exp(−x)), [42]: a n HC n − b n → w E 4 v , so the p-values of HC n are approximately: exp(−4exp(−[a n HC n − b n ])). (4.14) For moderately large n, in general, the approximation in (4.14) is accurate for the HC + n , but not for HC * n . For n = 244 2 , taking a n = 2.2536 and b n = 3.9407 in (4.14) gives a good approximation for the p-value of HC + n . A brief remark comparing Max and HC. Max only takes into account the few largest observations, HC takes into account those outliers, but also moderate large observations; as a result, in general, HC is better than Max, especially when we have unusually many moderately large observations. However, when the actual evidence lies in the middle of the distribution both HC and Max will be very weak.

Curvelet Coefficients of Cosmic Strings
In Section 3, we studied wavelet coefficients of simulated cosmic strings. We now study the curvelet coefficients of the same simulated maps.
We now discuss empirical properties of Curvelet coefficients of (simulated) cosmic strings. This was first deployed on a test image showing a simple 'bar' extending vertically across the image. The result, seen in Figure 10 shows the image, the histogram of the curvelet coefficients at the next-to-finest scale, and the qq-plot against the normal distribution. The display matches in general terms the sparsity model of section 4. That display also shows the result of superposing Gaussian noise on the image; the curvelet coefficients clearly have the general appearance of a mixture of normals with sparse fractions at nonzero mean, just as in the model.
We also applied the curvelet transform to the simulated cosmic string data. Figure 11 shows the results, which suggest that the coefficients do not match the simple sparsity model. Extensive modelling efforts, not reported here, show that the curvelet coefficients transformed by |v| 0.815 have an exponential distribution.
This discrepancy from the sparsity model has two explanations. First, cosmic string images contain (to the naked eye) both point-like features and curvelike features. Because curvelets are not specially adapted to sparsifying point-like features, the coefficients contain extra information not expressible by our geometric model. Second, cosmic string images might contain filamentary features at a range of length scales and a range of density contrasts. If those contrasts exhibit substantial amplitude variation, the simple mixture model must be replaced by something more complex. In any event, the curvelet coefficients of cosmic strings do not have the simple structure proposed in Section 4.
When applying various detectors of non-Gaussian behavior to curvelet coefficients, as in the simulation of Section 3.5, we find that, despite the theoretical ideas backing the use of HC as an optimal test for sparse non-Gaussian phenomena, the kurtosis consistently has better performance. The results are included in Tables 3 and 4. Note that, although the curvelet coefficients are not as sensitive detectors as wavelets in this setting, that can be an advantage, since they are relatively immune to point-like features such as SZ contaimination. Hence they are more specific to CS as opposed to SZ effects.

Conclusion
The kurtosis of the wavelet coefficients is very often used in astronomy for the detection non-Gaussianities in the CMB. It has been shown [44] that it is also possible to separate the non-Gaussian signatures associated with cosmic strings from those due to SZ effect by combining the excess kurtosis derived from these both the curvelet and the wavelet transform. We have studied in this paper several other transform-based statistics, the MAX and the Higher Criticism, and we have compared them theoretically and experimentally to the kurtosis. We have shown that kurtosis is asymptotically optimal in the class of weakly dependent symmetric non-Gaussian contamination with finite 8-th moments, while HC and MAX are asymptotically optimal in the class of weakly dependent symmetric non-Gaussian contamination with infinite 8-th moment. Hence depending on the nature of the non-Gaussianity, a statitic is better than another one. This is a motivation for using several statistics rather than a single one, for analysing CMB data. Finally, we have studied in details the case of cosmic string contaminations on simulated maps. Our experiment results show clearly that kurtosis outperforms Max/HC.