Application of the HLSVD Technique to the Filtering of X-Ray Diffraction Data

A filter based on the Hankel Lanczos Singular Value Decomposition (HLSVD) technique is presented and applied for the first time to X-ray diffraction (XRD) data. Synthetic and real powder XRD intensity profiles of nanocrystals are used to study the filter performances with different noise levels. Results show the robustness of the HLSVD filter and its capability to extract easily and efficiently the useful crystallographic information. These characteristics make the filter an interesting and user-friendly tool for processing XRD data.


INTRODUCTION
In many applications of X-ray diffraction (XRD) techniques to the study of crystal properties, a key step in the data processing chain is an effective and adaptive noise filtering [1][2][3][4]. A correct noise removal can facilitate the separation of the useful crystallographic information from the background signal, and the estimation of crystal structure and domain size. Important issues of XRD data filtering are performances in noise suppression, capability to preserve the peak position, computational cost, and finally, the possibility of being used as a blackbox tool. Different digital filters have been applied to XRD data, in spatial and frequency domains. Simple procedures are based on polynomial filtering (and fitting) in the spatial domain [1]. A standard practice when working in frequency domain is to use Fourier smoothing. It consists in removing the high-frequency components of the spectrum [5]. Since the truncation of high-frequency components can be problematic in the case of high-level noise, a different approach based on the Wiener-Fourier (WF) filter has been proposed to clean XRD data [6]. A different approach, which makes use of the singular value decomposition (SVD), has been successfully applied to time-resolved XRD data to reduce noise level [3,4].
In this work, we describe an application of the Hankel-Lanczos singular value decomposition (HLSVD) algorithm to filter XRD intensity data. The proposed filter is based on a subspace-based parameter estimation method, called Hankel singular value decomposition (HSVD) [7], which is currently applied to nuclear magnetic resonance spectroscopy data for solvent suppression [8]. The HSVD method computes a "signal" subspace and a "noise" subspace by means of the SVD of the Hankel matrix H, whose entries are the noisy signal data points. Its computationally most intensive part consists of the computation of the SVD of the matrix H. Recently, several improved versions of the algorithm have been developed in order to reduce the needed computational time [8].
In this paper, we choose to apply the HSVD method based on the Lanczos algorithm with partial reorthogonalization (HLSVD-PRO), which is proved to be the most accurate and efficient version available in the literature. A comparison in terms of numerical reliability and computational efficiency of HSVD with its Lanczos-based variants can be found in [8].
A criterion is presented to facilitate the separation of noise from the useful crystallographic signal. It is completely user-independent since it is based on a numerical method. It will be described in more detail in Section 4. It enables the design of a blackbox filter to be used in the processing of XRD data. Here, the filter is applied to nanocrystalline XRD data. Nanocrystals are characterized by chemical and physical properties different from those of the bulk [9].

EURASIP Journal on Advances in Signal Processing
At a scale of a few nanometers, metals can crystallize in a structure different from that of bulk. Nowadays, different branches of science and engineering are benefiting from the properties of nanocrystalline materials [10]. In particular, recent XRD experiments have shown that intensities, measured as a function of the scattering angle, could be useful to extract structural and domain size information about nanocrystalline materials. Experimental XRD data were collected on the XRD beam line at the Brazilian Synchrotron Light Facility (LNLS-Campinas, Brazil) using 8.040 keV photons at room temperature (for further details, see [11]). In this case, samples with different diameters of powder of gold nanoparticles underwent diffraction measurements in the Xray domain. The diffracted intensity was recorded varying the diffraction angle, namely, the angle between the incident beam and the scattered one. Synthetic XRD datasets are generated by computing the X-ray scattered intensity from nanocrystalline samples of different sizes and properties by using an analytic expression (see (6)). Synthetic datasets are processed and filter performance is studied when considering different levels of noise. Numerical tests on real XRD data of Au nanocrystalline samples of different sizes and properties show the robustness of the proposed filter and its capability to extract easily and efficiently the useful crystallographic information. These characteristics make this filter an interesting and user-friendly tool for the interactive processing of XRD data.
The paper has the following structure. Section 2 is devoted to the theoretical aspects of the proposed approach. The dataset used to study the filter properties is described in Section 3. Numerical results are reported in Section 4. Finally, some conclusions are drawn in Section 5.

THE SUBSPACE-BASED PARAMETER ESTIMATION METHOD HSVD
Let us denote with I n the samples of the diffracted intensity signal collected at angles ϑ n , n = 0, . . . , N − 1. They are modelled as the sum of K exponentially damped complex sinusoids where I n and I 0 n , respectively, represent the measured and modelled intensities at the nth scattering angle ϑ n = nΔϑ+ϑ 0 , with Δϑ the sampling angular interval and ϑ 0 the initial scattering angular position, a k is the amplitude, ϕ k the phase, d k the damping factor, and f k the frequency of the kth sinusoid, k = 1, . . . , K, with K the number of damped sinusoids, and e n is complex white noise with a circular Gaussian distribution. It is worth noting that the value of K increases or decreases by 2 in order to guarantee that the modelled intensity is real. This constraint is enforced in the filtering process. The algorithm is described in detail in Appendix A.
It allows to estimate the parameters d k and f k appearing in (1). These are inserted into the model (1), which yields the set of equations with n = 0, 1, . . . , N − 1. The least-squares solution c k = a k exp i ϕ k of (2) provides the amplitude a k and phase ϕ k estimates of the model sinusoids. The computationally most intensive part of this method is the computation of the SVD of the Hankel matrix H. Various algorithms are available for computing the SVD of a matrix. The most reliable algorithm for dense matrices is due to Golub and Reinsch [12] and is available in LAPACK [13]. The Golub-Reinsch method computes the full SVD in a reliable way and takes approximately 2LM 2 + 4M 3 complex multiplications for an L × M matrix. However, when only the computation of a few largest singular values and corresponding singular vectors is needed, the method is computationally too expensive. More suitable algorithms exist which are based on the Lanczos method. The proposed approach relies on the latter.
A key step in the filtering procedure is the selection of the number K of damped sinusoids characterizing the model function of the HLSVD-PRO filter. Here, a possible approach is presented, which is based on the following frequency selection criterion: the singular values λ k , k = 1, . . . , r, are plotted versus the corresponding frequencies f k of the sinusoids in (1). This choice facilitates a direct comparison of the results of the proposed filter with those obtained by other filters based on a frequency approach. It was observed that, generally, crystallographic XRD intensity signals show a clear transition from a low-frequency region, characterized by high singular values λ k , to a high-frequency region with small singular values whose variability is below 10% with respect to the asymptotic value λ r , namely (λ K − λ r )/λ K < 0.1. We exploit this feature in order to automatically find a threshold. The index K of the frequency f K corresponding to the transition provides the number of damped sinusoids to be used in the HLSVD-PRO filter. The filter performance was evaluated using the measure where I exp/th/fil are the experimental/theoretical/filtered intensities, respectively.

DATASET
The HLSVD-PRO filter was applied to synthetic as well as real XRD data. In this section, the generation of XRD intensity profiles and the experimental setup for the acquisition of real data are described. Both synthetic and real XRD data refer to Au nanocrystalline samples. Nanocrystals are made of clusters of three different structure types: cuboctahedral C, icosahedral I, and decahedral D. For each fixed structure type X (X = C, I, D), the size n of clusters follows a lognormal distribution M. Ladisa et al.

3
with mode ξ X and logarithmic width s X . Structural distances for the different structure types X are generally studied independently of the actual nanomaterial. The nearest distance between atoms in the crystals is chosen as a reference length and is arbitrarily set to 1/ √ 2, a constant in various structures X and for all sizes n of the clusters. Actual distances in nanoclusters are then recovered by applying a correction factor a X (n) for strain, supposed to be uniform and isotropic. A convenient description of the strain factor as a function of the structure type and cluster size is given in terms of the four parameters [n 0 X , Ω X , Ξ X , w X ]. Intensities scattered by nanoclusters with size n and structure type X are computed by using the diffractional model based on the Debye function method [14,15]: where I 0 is the incident X-ray intensity, T(q ) is the Debye-Waller factor, f (q) is the atomic form factor, are, respectively, the dimensionless and the usual scattering vector lengths with a f.c.c. being the face-centered cubic (f.c.c.) bulk lattice constant; N X (n) is the number of atoms in the cluster, u X,n i, j the distance between the ith and jth atom, a X (n) the strain factor. The total scattered intensity is computed as where S X denotes the size of the largest cluster of type X, x X is the number fraction of each structure type ( X x X = 1), and f X (n) is the value of log-normal size distribution (4). It is worth noting that both intensities in (6) and (7) are actually functions of the scattering angle ϑ being q = 2a f.c.c. λ −1 sin ϑ. Experimental XRD intensity profiles are collected by counting, at each scattering angle ϑ n , the number of scattered photons giving the diffracted intensity signal I n . For such events, data are affected by Poisson noise. Since the number of photons scattered at each angle ϑ n is large, the Poisson-distributed noise can be approximated by a Gaussian-distributed noise as required in Section 2 [16]. Noisy synthetic XRD intensity profiles were built by generating Poisson-distributed random profiles with intensity I (7) taken as the mean value of the Poisson process. As a measure of the noise level, the noise-to-signal ratio (NSR) was defined as follows: where P (I) denotes a Poisson process with mean value I. Different NSR values were obtained by scaling the scattered intensity (7) by a factor F. Figure 1 displays XRD intensity Parameter  Figure 1: Synthetic XRD intensity profiles as a function of the scattering angle. From the upper to the lower profile, the NSR increases from 2% to 5% (see Figure 2 and text for details).
profiles with increasing NSRs. They were obtained by setting λ = 0.15418 nm and a f.c.c. = 0.40786 nm in (6). The set of parameters used to compute the synthetic profiles are summarized in Table 1. Figure 2 shows the NSR of the synthetic profiles as a function of the scaling factor F ranging from 0 to 2. This range contains the NSR values usually measured in experimental profiles.
We also considered real data in order to validate our method. Three different samples were prepared with resultant mean diameters of 2.0, 3.2, and 4.1 nm, respectively (as measured by transmission electron microscope). The size distributions were approximately characterized by the same full width at half-maximum (≈ 1 nm) for all three systems.

NUMERICAL RESULTS
Noisy synthetic XRD patterns were generated corresponding to nanocrystalline samples of increasing size from 2 to 4 nm, and Poisson-distributed noise with increasing NSR from 2% to 10%. The HLSVD-PRO filter was then applied to the noisy synthetic XRD signals in order to study their  (3)) of the filter performance as a function of the order K of the filter. The synthetic XRD intensity data refer to different sample sizes and NSRs. The best performance corresponds to the order K reported in the middle row of each NSR value.  properties under controlled conditions. Figure 3 displays an example of application of the HLSVD-PRO filter. A noisy synthetic XRD intensity profile is shown at the top of the figure. It corresponds to X-ray scattering from an Au sample having a 3 nm size with a Poisson-like noise with NSR = 10%. The filtered signal shown in the middle of the figure was obtained by setting K = 9 in the HLSVD-PRO filter. In the following (see Table 2 for results), we discuss in more detail the performance of the method when the values K = 7 and K = 11 are used. This value was estimated according to the criterion illustrated in Section 2. The values of λ k were plotted by first sorting frequencies f k in ascendant order. Specifically, a transition from high to small λ k was observed at frequency f K = 35 rad −1 , which represents the Kth frequency in the set of the sorted frequencies starting from the smallest one. For a comparison, the discrete Fourier transform (DFT) of the noisy synthetic XRD signal is reported at the bottom of  sition frequency is much more difficult to localize than in the HLSVD-PRO filter case. The same behavior is observed when using the WF filter. This makes troublesome the application of DFT and WF filters to clean noisy XRD data. It is worth noting that this difference between the HLSVD-PRO and Fourier-frequency-based filters is relevant when the filter is intended to be used during interactive XRD data analysis. In this case, the successful application of an easy-to-use blackbox filter becomes crucial.
Coming back to Figure 3, the difference between the values of noisy and filtered profiles is shown at the bottom. To quantify the performance of the filter, the filtered signal was compared with the noiseless synthetic XRD signal (see Figure 5). For the sake of completeness, we also report in Figure 5 the residue between the noiseless and the filtered signals. This can be done only with synthetic signals as experimental XRD data without noise are not available. To give a statistical significance to these measures a Monte Carlo experiment was carried out. More precisely, the HLSVD-PRO was applied to 1000 noisy synthetic profiles generated by considering samples of the same size undergoing different NSRs. For each filtered profile, the filter performance measure E , defined in (3), was estimated by calculating the mean value and the standard deviation. For each sample size and NSR, the mean and standard deviation are obtained using 1000 synthetic XRD intensity profiles with different M. Ladisa et al. noise realizations having the same NSR. The sensitivity to the number K of sinusoids of the HLSVD-PRO filter was also studied. This number was slightly varied around the optimal K value selected by using the threshold criterion. The performance results were compared in order to validate the choice of the optimal K value. In particular, K was increased and decreased by 2, as discussed in Section 1. The results of such a comparison are summarized in Table 2 and they show that the proposed threshold criterion provides the value of K corresponding to the best performance of the HLSVD-PRO filter.
The filter was also applied to real XRD intensity profiles of Au samples of sizes 2, 3.2, and 4.1 nm. Figure 6 shows at top the profile of a 3.2 nm Au sample with NSR = 2.3%. The latter is computed as σ / I , where σ and I are vectors with the measured error and the intensity values, respectively. Since in the case of XRD signals, the noise follows the Poisson distribution, σ is given by √ I. The result obtained by HLSVD-PRO is displayed in the middle of the figure. At bottom, the plot of singular values is depicted versus the frequency. Components with a frequency higher than f K = 34 rad −1 , due to noise, were removed. Denoising a real XRD profile of 500 intensity data samples, as typical ones used in the present study, requires about 11 seconds, using Matlab 7 on a machine with an Intel Xeon 2.80 GHz processor and a 512 KB cache size.
Finally, as a matter of comparison, we applied two wellknown parametric algorithms that are commonly used for spectral analysis: MUSIC and ESPRIT [17]. Such methods are generally expected to be more effective spectral tools compared to DFT since they rely on the use of a model func- tion. However, in the present case where the signal is better modelled with damped sinusoids, the aforementioned methods are not able to correctly filter the signal. This limitation comes from the use of prescribed model functions that do not account for damping. Extensive simulation studies by using synthetic as well as real data show that MUSIC and ES-PRIT fail. For instance, for the real XRD intensity reported in Figure 6, we computed the residue-to-signal ratio (RSR). We obtain the following results: RSR = 54% (ESPRIT), 51% (MUSIC), 2% (HLSVD).

CONCLUSIONS
A filter based on the HLSVD-PRO method has been presented. It has been applied to filter XRD patterns of nanocluster powders. The filter performance has been studied on synthetic and real XRD patterns with different NSRs. Results show that the proposed filter is robust and computationally efficient. A further advantage is its user-friendliness that makes it a useful blackbox tool for the processing of XRD data.

APPENDICES
HSVD is a subspace-based parameter estimation method in which the noisy signal is arranged in a Hankel matrix H. Its SVD allows to compute a "signal" subspace and a "noise" subspace. In fact, if H is constructed from a noiseless signal, the data matrix H has exactly rank equal to K, the number of exponentials that models the underlying signal. Due to the presence of the noise, H becomes a full-rank matrix. However, as long as the SNR of the signal is not too low, one can still define the "numerical" rank being approximately equal to K. Then, the "signal" subspace is found by truncating the SVD of the matrix H to rank K.
In the following subsections, the method will be derived in the context of linear algebra.

A. HSVD: THE ALGORITHM
The N data points defined in (1) The SVD of the Hankel matrix is computed as where Σ = diag{λ 1 , λ 2 , . . . , λ r }, λ 1 ≥ λ 2 ≥ · · · ≥ λ r ≥ 0, r = min(L, M), U and V are orthogonal matrices, and the superscript H denotes the Hermitian conjugate. The SVD is computed by using the Lanczos bidiagonalization algorithm with partial reorthogonalization [18]. This algorithm computes two matrix-vector products at each step. Exploiting the structure of the matrix (A.1) by using the FFT, the latter computation requires O((L+M) log 2 (L+M)) rather than O(LM). In order to obtain the "signal" subspace, the matrix H is truncated to a matrix H K of rank K, where U K , V K , and Σ K are defined by taking the first K columns of U and V , and the K × K upper-left matrix of Σ, respectively. The way of choosing K is described at the beginning of Section 4. As a subsequent step, the least-squares solution E of the following overdetermined set of equations is computed as where U (bottom) The model of (1) can be rewritten in terms of complex amplitudes c k and signal poles z k as follows: where c k = a k exp (iϕ k ) exp (− d k + i2π f k )ϑ 0 and z k = exp(− d k + i2π f k )Δϑ. Using this model function, the Hankel matrix H can be factorized as follows: This factorization is called Vandermonde decomposition and from it the signal parameters can immediately be derived. A well-known algorithm to directly compute the Vandermonde decomposition is available in the literature and is called Prony's method [19][20][21][22]. Here, a more reliable approach, based on an indirect computation of the parameters, is adopted. This approach is described below. From (B.3), it can be easily proved that the matrix S satisfies the so-called shiftinvariance property, that is, where S ↑ and S ↓ are derived from S by deleting its first and last rows, respectively, and Z is a K × K complex diagonal matrix with entries equal to the K signal poles z k , k = 1, . . . , K. The rank of the matrix H is equal to K, and thus, its SVD has the following form: where U K ∈ C L×K , U 2 ∈ C L×(L−K) , Σ K ∈ C K×K , V K ∈ C M×K , V 2 ∈ C M×(M−K) . From the comparison of (B.3) and (B.5), it follows that S and U K span the same column space, and hence are equal up to a multiplication by a nonsingular matrix Q ∈ C K×K , that is, Using (B.6), the shift-invariance property of (B.4) becomes The matrix Q −1 ZQ can be determined as the least-squares solution of (B.7). Several reliable and efficient algorithms are available in the literature and they exploit well-known algebraic tools such as the QR decomposition, the SVD decomposition, and so forth. The reader is referred to [12,23], where an exhaustive overview on the computation of the least-squares solution of a system of equations is provided.
Since the eigenvalues of Q −1 ZQ and Z are equal, the signal poles are easily derived as where the function eig(·) determines the eigenvalues of the matrix between brackets. From the signal poles, frequency and damping factors are estimated. By filling in these estimates into the model function (B.2), a new system of equations is obtained with unknowns equal to the complex variables c k . Its solution provides estimates for the amplitudes and the phases.

C. HSVD: NOISY DATA
When noise affects the data, as in real MRS signals, relation (B.5) no longer holds. Although no exact solution of the shift-invariance property exists, if the noise is small compared to the signal, H can be approximated by the truncated SVD, that is, where U K and V K are the first K columns of U and V , respectively, and Σ K is the K × K upper-left submatrix of Σ.
The matrix H K has rank K but its Hankel structure has been destroyed by the truncation of the SVD. Therefore, there exists no exact solution of the system in (C.1). However, estimates of the signal poles can still be obtained by solving the aforementioned system in an LS sense and the signal parameters can be derived from such estimates as in the noiseless case. Further details about the derivation of HSVD can be found in [7,24].