 Research
 Open Access
Discrete shearlet transform on GPU with applications in anomaly detection and denoising
 Xavier Gibert^{1},
 Vishal M Patel^{1},
 Demetrio Labate^{2} and
 Rama Chellappa^{1}
https://doi.org/10.1186/1687-6180-2014-64
© Gibert et al.; licensee Springer. 2014
 Received: 4 November 2013
 Accepted: 19 April 2014
 Published: 10 May 2014
Abstract
Shearlets have emerged in recent years as one of the most successful methods for the multiscale analysis of multidimensional signals. Unlike wavelets, shearlets form a pyramid of well-localized functions defined not only over a range of scales and locations, but also over a range of orientations and with highly anisotropic supports. As a result, shearlets are much more effective than traditional wavelets in handling the geometry of multidimensional data, and this has been exploited in a wide range of applications in image and signal processing. However, despite their desirable properties, the wider applicability of shearlets is limited by the computational complexity of current software implementations. For example, denoising a single 512 × 512 image using a current implementation of the shearlet-based shrinkage algorithm can take between 10 s and 2 min, depending on the number of CPU cores, and much longer processing times are required for video denoising. On the other hand, due to the parallel nature of the shearlet transform, it is possible to use graphics processing units (GPUs) to accelerate its implementation. In this paper, we present an open source standalone implementation of the 2D discrete shearlet transform using CUDA C++ as well as GPU-accelerated MATLAB implementations of the 2D and 3D shearlet transforms. We have instrumented the code so that we can analyze the running time of each kernel on different GPU hardware. In addition to denoising, we describe a novel application of shearlets for detecting anomalies in textured images. In this application, computation times can be reduced by a factor of 50 or more, compared to multicore CPU implementations.
Keywords
 Shearlets
 Wavelets
 Image processing
 Parallelism
 Multicore
 GPU
1 Introduction
During the last decade, a new generation of multiscale systems has emerged which combines the power of the classical multiresolution analysis with the ability to process directional information with very high efficiency. Some of the most notable examples of such systems include the curvelets [1], the contourlets [2], and the shearlets [3]. Unlike classical wavelets, the elements of such systems form a pyramid of well-localized waveforms ranging not only across various scales and locations, but also across various orientations and with highly anisotropic shapes. Due to their richer structure, these more sophisticated multiscale systems are able to overcome the poor directional sensitivity of traditional multiscale systems and have been used to derive several state-of-the-art algorithms in image and signal processing (cf. [4, 5]).
Shearlets, in particular, offer a unique combination of remarkable features: they have a simple and well-understood mathematical structure derived from the theory of affine systems [3, 6]; they provide optimally sparse representations, in a precise sense, for a large class of images and other multidimensional data where wavelets are suboptimal [7, 8]; and their directionality is controlled by shear matrices rather than rotations. This last property, in particular, enables a unified framework for the continuum and discrete settings, since shear transformations preserve the rectangular lattice, and it is an advantage in deriving faithful digital implementations [9, 10].
The shearlet decomposition has been successfully employed in many problems from applied mathematics and signal processing, including decomposition of operators [11], inverse problems [12, 13], edge detection [14–16], image separation [17], and image restoration [18–20]. However, one major bottleneck to the wider applicability of the shearlet transform is that current discrete implementations tend to be very time consuming, making its use impractical for large data sets and for real-time applications. For instance, the current (CPU-based) MATLAB implementation^{a} of the 2D shearlet transform, run on a typical desktop PC, takes about 2 min to denoise a noisy image of size 512 × 512 [9, 21]. The running time of the current (CPU-based) MATLAB implementation of the 3D shearlet transform for denoising a video sequence of size 192^{3} is about 5 min [20]. Running times for alternative shearlet implementations from ShearLab [10] as well as for the current implementation of the curvelet transform [22] are comparable.
In recent years, general-purpose graphics processing units (GPGPUs) have become ubiquitous not only on high-performance computing (HPC) clusters, but also on workstations. For example, Titan, which was until recently the world’s fastest supercomputer, contains 18,688 NVIDIA Tesla K20X GPUs. These GPUs provide about 90% of Titan’s peak computing performance, which is greater than 20 PetaFLOPS (quadrillion floating point operations per second). Due to their energy efficiency and capabilities, GPGPUs are also becoming mainstream on mobile platforms, such as iOS and Android devices. There are two main architectures for GPGPU computing: CUDA and OpenCL. CUDA was designed by NVIDIA and has been around since 2006. OpenCL was originally designed by Apple, Inc., and was introduced in 2008. OpenCL is an open standard maintained by the Khronos Group, whose members include Intel, AMD, NVIDIA, and many others, so it has broader industry acceptance than any other architecture. In 2009, Microsoft introduced DirectCompute as an alternative architecture for GPGPU computing, which is only available in Windows Vista and later. OpenCL has been designed to provide the developer with a common framework for doing computation on heterogeneous devices. One of the advantages of OpenCL is that it can potentially support any computing device, such as CPUs, GPUs, and FPGAs, as long as an OpenCL compiler is available for that processor. NVIDIA provides CUDA/OpenCL drivers, libraries, and development tools for the three major operating systems (Linux, Windows, and Mac OS X), while AMD/ATI™ and Intel provide OpenCL drivers and tools for their respective GPUs.
The objective of this paper is to introduce and demonstrate a new implementation of the 2D and 3D discrete shearlet transform which takes advantage of the computational capabilities of the graphics processing unit (GPU). To demonstrate the effectiveness of the proposed implementations, we will illustrate their application to problems of image and video denoising and to a problem of feature recognition aimed at crack detection in railway components. In particular, we will show that our new implementation takes about 40 ms to denoise an image of size 512 × 512, which is a 233× speedup compared to a single-core CPU, and about 3 s to denoise a video of size 192^{3}, which is a 551× speedup compared to a single-core CPU.
The organization of the paper is as follows. In Section 2, we recall the construction of 2D and 3D shearlets. Next, in Section 3, we present our implementation of the discrete shearlet transform, and in Section 4, we benchmark our implementation using three specific applications. Finally, concluding remarks and future work are discussed in Section 5.
2 Shearlets
In this section, we recall the construction of 2D and 3D shearlets (cf. [6, 7]).
2.1 2D shearlets
consisting of

the coarse-scale shearlets $\{\tilde{\psi}_{-1,k} = \Phi(\cdot - k) : k \in \mathbb{Z}^2\}$;

the interior shearlets $\{\tilde{\psi}_{j,\ell,k,\nu} = \psi_{j,\ell,k}^{(\nu)} : j \ge 0, |\ell| < 2^j, k \in \mathbb{Z}^2, \nu = 1, 2\}$, where the functions $\psi_{j,\ell,k}^{(\nu)}$ are given by (2);

the boundary shearlets $\{\tilde{\psi}_{j,\ell,k} : j \ge 0, \ell = \pm 2^j, k \in \mathbb{Z}^2\}$, obtained by joining together slightly modified versions of $\psi_{j,\ell,k}^{(1)}$ and $\psi_{j,\ell,k}^{(2)}$, for $\ell = \pm 2^j$, after they have been restricted in the Fourier domain to the cones $\mathcal{P}_1$ and $\mathcal{P}_2$, respectively. We refer to [6] for details.
where $M = M_C \cup M_I \cup M_B$ are the indices associated with coarse-scale shearlets, interior shearlets, and boundary shearlets, respectively. We have the following result from [6]:
Theorem 2.1.
All elements $\{\tilde{\psi}_{\mu}, \mu \in M\}$ are $C^{\infty}$ and compactly supported in the Fourier domain.
As mentioned above, it is proved in [7] that the 2D Parseval frame of shearlets $\{\tilde{\psi}_{\mu}, \mu \in M\}$ provides essentially optimal approximations for functions of two variables which are $C^2$ regular away from discontinuities along $C^2$ curves.
The mapping from $f \in L^2(\mathbb{R}^2)$ into the elements $\langle f, \tilde{\psi}_{\mu} \rangle$, $\mu \in M$, is called the 2D shearlet transform.
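To make the index sets concrete, the following minimal sketch enumerates the orientation indices of the interior and boundary shearlets at a given scale, following the conditions $|\ell| < 2^j$, $\nu = 1, 2$ and $\ell = \pm 2^j$ stated in the list above. The function name is hypothetical, and the translation index $k$ is left out for brevity.

```python
def shearlet_orientation_indices(j):
    """Return (interior, boundary) orientation indices at scale j >= 0.

    interior: list of (ell, nu) with |ell| < 2**j and nu in {1, 2}
    boundary: list of ell with ell = -2**j or ell = +2**j
    The translation index k is omitted in this sketch.
    """
    if j < 0:
        raise ValueError("scale j must be non-negative")
    bound = 2 ** j
    interior = [(ell, nu) for nu in (1, 2) for ell in range(-bound + 1, bound)]
    boundary = [-bound, bound]
    return interior, boundary


for j in range(3):
    interior, boundary = shearlet_orientation_indices(j)
    # Number of distinct orientations per scale: interior plus boundary.
    print(j, len(interior) + len(boundary))
```

Under these conditions, each scale $j$ contributes $2(2^{j+1} - 1)$ interior orientations and 2 boundary orientations, so the number of orientations doubles from one scale to the next.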
2.2 3D shearlets
which again can be identified as the coarse-scale, interior, and boundary shearlets. It turns out that the 3D system of shearlets is a Parseval frame of $L^2(\mathbb{R}^3)$ [6] and it provides essentially optimal approximations for functions of three variables which are $C^2$ regular away from discontinuities along $C^2$ surfaces [8].
3 Discrete implementation
A faithful numerical implementation of the 2D shearlet transform was originally presented in [9]. Let us briefly recall the main steps of this implementation.
3.1 2D discrete shearlet transform
where $g_j(u,w) = \widehat{f_d^j}(\xi_1, \xi_2)$. This shows that the directional components are obtained by simply translating the window function $V$. The discrete samples $g_j[n_1, n_2] = g_j(n_1, n_2)$ are the values of the DFT of $f_d^j[n_1, n_2]$ on a pseudo-polar grid.
where ∗ denotes the one-dimensional convolution along the $n_2$ axis and $\mathcal{F}_1$ is the one-dimensional discrete Fourier transform. Thus, (6) gives the algorithmic implementation for computing the discrete samples of $g_j(u,w)\, v(2^j w - \ell)$. At this point, to compute the shearlet coefficients in the discrete domain, it suffices to compute the inverse PDFT or directly reassemble the Cartesian sampled values and apply the inverse two-dimensional FFT. Figure 3 illustrates the cascade of Laplacian pyramid and directional filtering. Recall that once the discrete shearlet coefficients are obtained, the inverse shearlet transform is computed using the following steps: (i) convolution of discrete shearlet coefficients and synthesis directional filters, (ii) summation of all directional components, and (iii) reconstruction by inverse Laplacian pyramidal transformation.
3.2 2D GPU-based implementation
Table 1 Comparison of processing times for denoising a single precision 512 × 512 image
Step | 4-core CPU time (s) | 4-core CPU % time | GTX 690 GPU time (ms) | GTX 690 GPU % time
Laplacian pyramid | 2.787 | 31.6 | 18.282 | 47.3
Directional filters | 4.386 | 49.7 | 18.350 | 47.5
Hard threshold | 0.375 | 4.2 | 1.967 | 5.1
Other | 1.281 | 14.5 | 0.063 | 0.2
Total time | 8.829 | | 38.662 |
Since most of the computing time for performing a discrete shearlet transform is spent in FFT function calls, it is crucial to have the best possible library to perform FFTs. The two main GPU vendors provide optimized FFT libraries: NVIDIA provides cuFFT as part of its CUDA Toolkit, and AMD provides clAmdFft as part of its Accelerated Parallel Processing Math Libraries (APPML). We have decided to use CUDA as our development architecture both because it is better documented and because its development tools are more mature. We have implemented the device code in CUDA C++, while the host code is pure C++. Since both CUDA C/C++ and OpenCL are based on the C programming language, porting the code from CUDA to OpenCL should not be difficult. However, for code compactness, we have made extensive use of templates and operator overloading, which are supported in CUDA C++, but not in OpenCL, which is based on C99.
To facilitate the development, we have used GPUmat from the GPyou Group, a free (GPLv3) GPU engine for MATLAB^{®} (source code is available from http://sourceforge.net/projects/gpumat/). This framework provides two new classes, GPUsingle and GPUdouble, which encapsulate vectors of numerical data allocated on GPU memory and allow mathematical operations on objects of such classes via function and operator overloading. Transfers between CPU and GPU memory are as simple as doing type casting, and memory allocation and deallocation are done automatically. The idea is that existing MATLAB functions could be reused without any code changes. In practice, however, in order to get acceptable performance, it is necessary to handtune the code or even use lower level languages such as C/C++.
Fortunately, the GPUmat framework provides an interface for manipulating these objects from MEX files, and a mechanism for loading custom kernels. Although there are commercial alternatives to GPUmat such as Jacket from AccelerEyes, or the Parallel Computing Toolbox from MathWorks, we have found that GPUmat is robust and adds very little overhead to the execution time as long as we follow good programming practices such as in-place operations and reuse of preallocated buffers.
Our implementation supports both single precision (32-bit) and double precision (64-bit) IEEE 754 floating point numbers (double precision is only supported on devices with compute capability 2.0 or newer due to limitations in the maximum amount of shared memory available per multiprocessor). We generate the filter bank of directional filters using the Fourier-domain approach from [9], where directional filters are designed as Meyer-type window functions in the Fourier domain. Since this step only needs to be run once and does not depend on the image dimensions, we precompute these directional filters using the original MATLAB implementation.
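As an illustration of what a Meyer-type window looks like, the sketch below uses a polynomial auxiliary function that is a common choice in Meyer wavelet constructions. Whether [9] uses exactly this polynomial is an assumption here, and the function names are hypothetical. The point of the sketch is the property that matters for a Parseval filter bank: the squares of two complementary windows sum to one across the transition band.

```python
import math


def meyer_aux(x):
    """Smooth step: 0 for x <= 0, 1 for x >= 1, polynomial in between.

    Assumed form (a common textbook choice, not confirmed by the source):
    x**4 * (35 - 84 x + 70 x**2 - 20 x**3).
    """
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return x ** 4 * (35.0 - 84.0 * x + 70.0 * x ** 2 - 20.0 * x ** 3)


def window_pair(x):
    """Two complementary windows on the transition band [0, 1].

    The falling window is cos(pi/2 * nu(x)) and the rising window is
    sin(pi/2 * nu(x)), so their squares always sum to 1.
    """
    angle = 0.5 * math.pi * meyer_aux(x)
    return math.cos(angle), math.sin(angle)


for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    falling, rising = window_pair(x)
    print(x, falling ** 2 + rising ** 2)
```

Because such windows depend only on normalized frequency, they can be tabulated once and reused, which is consistent with precomputing the directional filters.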
For the Laplacian pyramidal decomposition, we ported the à trous algorithm using symmetric extension [2] to CUDA. This algorithm requires performing non-separable convolutions with decimated signals. For efficiency, the kernel that performs à trous convolutions preloads blocks of data into shared memory, so that each GPU thread accesses global memory only once.
With the above GPU-based Laplacian pyramid and directional filter implementation, it is just a matter of applying convolutions on the GPU to compute the forward and inverse shearlet transforms.
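The à trous idea can be sketched on the CPU in a few lines. The example below is a one-dimensional simplification (the implementation described above uses non-separable 2D convolutions in CUDA): the filter taps are spaced `2**level` samples apart, and out-of-range indices are reflected back into the signal. The function names and the whole-sample reflection rule are assumptions made for illustration.

```python
def reflect_index(i, n):
    """Map any integer index into [0, n) by whole-sample symmetric reflection."""
    if n == 1:
        return 0
    period = 2 * (n - 1)
    i = abs(i) % period
    return i if i < n else period - i


def atrous_filter(signal, taps, level):
    """Filter `signal` with taps spaced 2**level samples apart ("with holes").

    The taps are applied as a correlation; for the symmetric taps typically
    used in a Laplacian pyramid this equals a convolution.
    """
    step = 2 ** level
    center = len(taps) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, tap in enumerate(taps):
            acc += tap * signal[reflect_index(i + (k - center) * step, n)]
        out.append(acc)
    return out


impulse = [0.0] * 9
impulse[4] = 1.0
# At level 1 the three taps land two samples apart around the impulse.
print(atrous_filter(impulse, [0.25, 0.5, 0.25], level=1))
```

In a GPU kernel, the inner loop is the part that benefits from staging a block of the input (plus a halo of `center * step` samples) in shared memory before any thread starts accumulating.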
Main steps of the shearlet transform
Step | Forward transform | Inverse transform
1 | Laplacian decomposition | Forward FFT of directional components
2 | Forward FFT of Laplacian components | Modulation with complex conjugate directional filter bank
3 | Modulation of Laplacian components with directional filter bank | Inverse FFT of directional components
4 | Inverse FFT of directional components | Laplacian reconstruction
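The Fourier-domain steps in the table above can be sketched on a toy one-dimensional signal. This is a minimal illustration under stated assumptions, not the CUDA implementation: it uses a naive DFT instead of cuFFT, omits the Laplacian decomposition and reconstruction steps, and uses a hypothetical two-filter bank whose squared magnitudes sum to one at every frequency, so that modulation with the complex conjugate filters recovers the input.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (stand-in for an FFT library)."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def forward_directional(band, filters):
    """Forward steps 2-4: FFT, multiply by each filter, inverse FFT."""
    spectrum = dft(band)
    return [dft([s * h for s, h in zip(spectrum, filt)], inverse=True)
            for filt in filters]


def inverse_directional(components, filters):
    """Inverse steps 1-3: FFT each component, multiply by the conjugate
    filter, sum over directions, inverse FFT. The sum is taken in the
    Fourier domain, which is equivalent by linearity."""
    n = len(filters[0])
    total = [0j] * n
    for comp, filt in zip(components, filters):
        spec = dft(comp)
        for k in range(n):
            total[k] += spec[k] * filt[k].conjugate()
    return dft(total, inverse=True)


# Hypothetical filter bank with |H1|^2 + |H2|^2 = 1 at every frequency.
angles = [0.3 * k for k in range(8)]
filters = [[complex(math.cos(a), 0.0) for a in angles],
           [complex(0.0, math.sin(a)) for a in angles]]

signal = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 4.0]
components = forward_directional(signal, filters)
reconstructed = [v.real for v in inverse_directional(components, filters)]
print(max(abs(a - b) for a, b in zip(signal, reconstructed)))
```

The same structure explains why FFT calls dominate the running time: every directional component costs one inverse FFT in the forward transform and one forward FFT in the inverse transform.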
3.3 3D discrete shearlet transform
The algorithm for the discretization of the 3D shearlet transform is very similar to the 2D shearlet transform, and our implementation of the 3D discrete shearlet transform adapts the code available from http://www.math.uh.edu/~dlabate/3Dshearlet_toolbox.zip and described in [20]. The main practical difference is that storing the 3D shearlet coefficients is much more memory-intensive. Since the memory requirement can easily exceed the available GPU memory, in our algorithm, we compute one convolution at a time in CUDA and add the result to the output.
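The memory-saving pattern described above, computing one convolution at a time and adding it to a running output, can be sketched as follows. This is a one-dimensional CPU illustration with hypothetical names. The bands are supplied by a lazy iterator, so only one band and the accumulator need to be resident at any time.

```python
def circular_convolve(x, h):
    """Circular convolution of signal x with a short filter h."""
    n = len(x)
    return [sum(x[(i - k) % n] * h[k] for k in range(len(h))) for i in range(n)]


def accumulate_bands(band_filter_pairs, length):
    """Add one filtered band at a time into a single output buffer.

    `band_filter_pairs` may be a generator, so bands can be produced (or
    loaded) on demand instead of being stored all at once.
    """
    output = [0.0] * length
    for band, filt in band_filter_pairs:
        filtered = circular_convolve(band, filt)
        for i in range(length):
            output[i] += filtered[i]
    return output


def lazy_bands():
    """Hypothetical band source; each pair is created only when requested."""
    yield [1.0, 2.0, 3.0], [1.0]
    yield [10.0, 20.0, 30.0], [0.5, 0.5]


print(accumulate_bands(lazy_bands(), 3))
```

The trade-off is that peak memory stays at roughly one band plus the output, at the cost of not being able to batch the convolutions.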
4 Applications
In the following, we illustrate the advantages of our new implementation of the discrete shearlet transform by considering three applications: denoising of natural images corrupted with white Gaussian noise, detection of cracks in textured images, and denoising of videos. The source code, sample data, as well as the MATLAB scripts used to generate all the figures in this paper are publicly available at http://www.umiacs.umd.edu/~gibert/ShearCuda.zip.
Specifications and computing environments for each of the graphics processors used in our benchmarks
GPU model | Memory (GB) | Number of cores | Compute capability (CC) | OS | CUDA version
Tesla C1060 | 4 | 240 | 1.3 | RHEL 5 | 5.0.35
GeForce GTX 480 | 1.5 | 448 | 2.0 | RHEL 6 | 4.2.9
Tesla C2050 | 3 | 448 | 2.0 | RHEL 6 | 4.2.9
GeForce GTX 690^{a} | 2 | 1,536 | 3.0 | RHEL 6 | 5.0.35
Tesla K20c | 4.8 | 2,496 | 3.5 | RHEL 6 | 5.0.35
4.1 Image denoising
As a first test, we evaluated the performance of our implementation of the discrete shearlet transform on a problem of image denoising, using a standard denoising algorithm based on hard thresholding of the shearlet coefficients. The setup is similar to the one described in [9]. That is, given an image $f \in \mathbb{R}^{N^2}$, we observe a noisy version of it given by $u = f + \epsilon$, where $\epsilon \in \mathbb{R}^{N^2}$ is an additive white Gaussian noise process which is independent of $f$, i.e., $\epsilon \sim N(0, \sigma^2 \mathbf{I}_{N^2 \times N^2})$. Our goal is to compute an estimate $\tilde{f}$ of $f$ from the noisy data $u$ by applying a classical hard thresholding scheme [24] on the shearlet coefficients of $u$. The threshold levels are given by $\tau_{i,j,n} = \sigma_{\epsilon_{i,j}}^2 / \sigma_{i,j,n}^2$, as in [2, 9, 25], where $\sigma_{i,j,n}^2$ denotes the variance of the $n$th coefficient at the $i$th directional subband in the $j$th scale, and $\sigma_{\epsilon_{i,j}}^2$ is the noise variance at scale $j$ and directional band $i$. The variances $\sigma_{\epsilon_{i,j}}^2$ are estimated by using a Monte Carlo technique in which the variances are computed for several normalized noise images and then the estimates are averaged.
where $\|\cdot\|_F$ is the Frobenius norm, the given image $f$ is of size $N \times N$, and $\tilde{f}$ denotes the estimated image.
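The two generic ingredients of this scheme, hard thresholding and the Monte Carlo estimate of the per-subband noise variance, can be sketched as follows. The example is a minimal illustration: `toy_transform` is a hypothetical two-band stand-in for the shearlet transform, the threshold value is supplied by the caller rather than derived from the formula above, and the variance is computed as a mean square under the assumption that noise coefficients have zero mean.

```python
import random


def hard_threshold(coeffs, tau):
    """Keep coefficients whose magnitude exceeds tau; set the rest to zero."""
    return [c if abs(c) > tau else 0.0 for c in coeffs]


def subband_noise_variance(transform, length, trials=20, seed=0):
    """Monte Carlo estimate of the noise variance in each subband.

    Unit-variance Gaussian noise is pushed through `transform` several
    times and the per-subband mean squares are averaged over the trials.
    """
    rng = random.Random(seed)
    sums = None
    for _ in range(trials):
        noise = [rng.gauss(0.0, 1.0) for _ in range(length)]
        bands = transform(noise)
        variances = [sum(c * c for c in band) / len(band) for band in bands]
        sums = variances if sums is None else [s + v for s, v in zip(sums, variances)]
    return [s / trials for s in sums]


def toy_transform(x):
    """Hypothetical two-band transform (pairwise average and difference)."""
    low = [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x) - 1, 2)]
    high = [x[i] - x[i + 1] for i in range(0, len(x) - 1, 2)]
    return [low, high]


noise_var = subband_noise_variance(toy_transform, length=4096)
print(noise_var)
print(hard_threshold([0.2, -1.5, 0.9, 3.0], tau=1.0))
```

The estimate is needed because a redundant, non-orthonormal transform scales white noise differently in each subband, so a single global noise level would over- or under-threshold some bands.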
In order to minimize latency as well as bandwidth usage on the PCIe bus, we first transferred the input image to GPU memory, then we let all the computation happen on the GPU and we finally transferred the results back to CPU memory. We have verified that both CPU and GPU implementations provide an output PSNR of 29.9 dB when the input PSNR is 22.1 dB. At these noise levels, there is no difference in PSNR between the single and the double precision implementations.
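For reference, the PSNR values quoted above can be computed as in the sketch below. It assumes the standard definition for 8-bit images, $20 \log_{10}(255 N / \|f - \tilde{f}\|_F)$ for an $N \times N$ image, which is consistent with the Frobenius norm and image size mentioned in the text. The function name and the `peak` parameter are illustrative.

```python
import math


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are lists of rows. For an N x N image the result equals
    20 * log10(peak * N / ||reference - estimate||_F).
    """
    n_rows = len(reference)
    n_cols = len(reference[0])
    squared_error = sum((reference[r][c] - estimate[r][c]) ** 2
                        for r in range(n_rows) for c in range(n_cols))
    if squared_error == 0.0:
        return math.inf  # identical images
    frobenius = math.sqrt(squared_error)
    return 20.0 * math.log10(peak * math.sqrt(n_rows * n_cols) / frobenius)


clean = [[0.0, 0.0], [0.0, 0.0]]
noisy = [[25.5, 25.5], [25.5, 25.5]]
print(psnr(clean, noisy))
```

A uniform error of one tenth of the peak value gives 20 dB, which is a convenient sanity check for any implementation of this formula.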
Table 1 shows the breakdown of different parts of the image denoising algorithm both on CPU and GPU.
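The per-step speedups implied by Table 1 follow from simple arithmetic on the published timings. The short script below copies the 4-core CPU times (in seconds) and the GTX 690 GPU times (in milliseconds) from the table and computes their ratios. The variable and function names are illustrative.

```python
# Timings copied from Table 1 (4-core CPU in seconds, GTX 690 GPU in ms).
cpu_seconds = {
    "Laplacian pyramid": 2.787,
    "Directional filters": 4.386,
    "Hard threshold": 0.375,
    "Other": 1.281,
}
gpu_milliseconds = {
    "Laplacian pyramid": 18.282,
    "Directional filters": 18.350,
    "Hard threshold": 1.967,
    "Other": 0.063,
}


def step_speedups(cpu_s, gpu_ms):
    """Ratio of CPU time to GPU time for each step, in common units."""
    return {step: cpu_s[step] * 1000.0 / gpu_ms[step] for step in cpu_s}


speedups = step_speedups(cpu_seconds, gpu_milliseconds)
total_speedup = sum(cpu_seconds.values()) * 1000.0 / sum(gpu_milliseconds.values())

for step, ratio in speedups.items():
    print(f"{step}: {ratio:.1f}x")
print(f"Total: {total_speedup:.1f}x")
```

With these numbers the overall ratio relative to the 4-core CPU run is roughly 228×, and the Laplacian pyramid and directional filtering steps account for almost all of the remaining GPU time.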
4.2 Crack detection
Detection of cracks on concrete structures is a difficult problem due to the changes in width and direction of the cracks, as well as the variability in the surface texture. This problem has received considerable attention recently. Redundant representations, such as undecimated wavelets, have been extensively used for crack detection [26, 27]. However, wavelets have poor directional sensitivity and have difficulties in detecting weak diagonal cracks. To overcome this limitation, Ma et al. [28] proposed the use of the nonsubsampled contourlet transform [2] for crack detection. However, all these methods rely on the assumption that the background surface can be modeled as additive white Gaussian noise, and this assumption leads to matched filter solutions. In practice, textures in real images are highly correlated, and applying linear filters leads to poor performance.
To address this problem, we propose a new approach to crack detection based on separating the image into morphologically distinct components using sparse representations, adaptive thresholding, and variational regularization. This technique was pioneered by Starck et al. [29] and later extended and generalized by many authors (e.g., [17, 18, 30]). In particular, we will use the Iterative Shrinkage Algorithm with a combined dictionary of shearlets and wavelets to separate cracks from background texture.
where for an $n$-dimensional vector $b$, the $\ell_1$ norm is defined as $\|b\|_1 = \sum_i |b_i|$. This image separation problem can be solved efficiently using an iterative shrinkage algorithm proposed in [17] (Figure 5).
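The basic building block of iterative shrinkage algorithms is the soft-thresholding operator, which shrinks each coefficient toward zero and is the standard tool for handling an $\ell_1$ penalty. The sketch below shows that operator together with the $\ell_1$ norm defined above. It is not the full separation algorithm of [17], and the function names are illustrative.

```python
import math


def l1_norm(b):
    """Sum of absolute values of the entries of b."""
    return sum(abs(v) for v in b)


def soft_threshold(coeffs, tau):
    """Shrink each coefficient toward zero by tau; small ones become zero.

    For each entry c the result is sign(c) * max(|c| - tau, 0).
    """
    return [math.copysign(max(abs(c) - tau, 0.0), c) for c in coeffs]


coeffs = [3.0, -0.5, -2.0]
shrunk = soft_threshold(coeffs, tau=1.0)
print(shrunk, l1_norm(coeffs), l1_norm(shrunk))
```

In a separation scheme of this kind, one would typically alternate between the two dictionaries, applying the shrinkage to the coefficients of each component while the threshold is gradually lowered; the details of that schedule are specific to [17] and are not reproduced here.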
1. Shearlet-C. This method takes advantage of the Parseval property of the shearlet transform and performs crack detection directly in the transform domain. We first decompose the image into cracks and texture components using iterative shrinkage with a shearlet dictionary and a wavelet one. Instead of using the reconstructed image, we analyze the values of the shearlet transform coefficients. For each scale in the shearlet transform domain, we analyze the directional components corresponding to each displacement and collect the maximum magnitude across all directions. If the sign of the shearlet coefficient corresponding to the maximum magnitude is positive, we classify the corresponding pixel as background; otherwise, we assign the norm of the vector containing the maximum responses at each scale to each pixel and we apply a threshold.
2. Shearlet-I. We first decompose the image into cracks and texture components as described for the previous method. Then, we apply an intensity threshold on the reconstructed cracks image.
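The transform-domain decision rule of the first method can be sketched as follows. This is one possible reading of the description, stated as an assumption: a pixel is treated as background as soon as the strongest directional response at any scale is positive, and otherwise its score is the Euclidean norm of the strongest responses over all scales. The data layout `coeffs[scale][direction][pixel]` (one coefficient per pixel, as in an undecimated transform) and the function names are hypothetical.

```python
import math


def shearlet_c_scores(coeffs):
    """Per-pixel crack score from coefficients indexed [scale][direction][pixel]."""
    n_pixels = len(coeffs[0][0])
    scores = []
    for p in range(n_pixels):
        squared_sum = 0.0
        background = False
        for scale in coeffs:
            # Signed coefficient with the largest magnitude across directions.
            strongest = max((band[p] for band in scale), key=abs)
            if strongest > 0:
                background = True  # assumed rule: positive response means background
                break
            squared_sum += strongest * strongest
        scores.append(0.0 if background else math.sqrt(squared_sum))
    return scores


def detect_cracks(coeffs, threshold):
    """Binary crack mask obtained by thresholding the scores."""
    return [score > threshold for score in shearlet_c_scores(coeffs)]


# Two scales, two directions, three pixels (made-up values).
example = [
    [[-3.0, 1.0, -0.1], [1.0, -0.5, 0.05]],
    [[-4.0, -2.0, 0.0], [2.0, 0.0, 0.0]],
]
print(shearlet_c_scores(example))
print(detect_cracks(example, threshold=1.0))
```

Sweeping the threshold over the range of scores yields the operating points from which detection curves can be computed.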
1. Intensity. This is the most basic approach, which only uses image intensity. After compensating for slow variations of intensity in the image, we apply a global threshold.
2. Canny. We use the Canny [31] edge detector as implemented in MATLAB using the default $\sigma = \sqrt{2}$ and the default high to low threshold ratio of 40%.
After using a low-level detector, it may be necessary to remove small isolated regions corresponding to false detections due to random noise. This post-processing step may reduce the false detection rate on intensity-based methods. However, to provide an objective comparison, we have generated the experimental results without running any post-processing. We leave the performance analysis of a complete crack detector for future work.
In this paper, we report the peak F_{1} score for all methods. The Canny edge detection method estimates the location of the crack boundary, while the other three methods estimate the location of the crack itself. To have a meaningful comparison, we have generated separate ground truth masks for the crack outline, so we can use the same matching metric on the Canny method. For each method, we have used the same algorithm parameters on all the images.
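A peak F_{1} score can be computed by sweeping the detection threshold and keeping the best value, as in the sketch below. It assumes a plain pixel-by-pixel comparison between the detection mask and the ground truth; the matching metric used in the experiments may differ (for example, by allowing a spatial tolerance), and the function names are illustrative.

```python
def f1_score(predicted, truth):
    """Harmonic mean of precision and recall for two boolean sequences."""
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    fn = sum(1 for p, t in zip(predicted, truth) if not p and t)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)


def peak_f1(scores, truth):
    """Best F1 over all thresholds taken from the observed scores."""
    best = 0.0
    for threshold in sorted(set(scores)):
        predicted = [s >= threshold for s in scores]
        best = max(best, f1_score(predicted, truth))
    return best


scores = [0.9, 0.8, 0.3, 0.1]
truth = [True, True, False, False]
print(peak_f1(scores, truth))
```

Reporting the peak value removes the dependence on a particular threshold, which makes detectors with differently scaled outputs comparable.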
Comparison of detection performance for different crack detection algorithms (best results are emphasized in italics)
Image | Method | AUC | F_{1} score | PD at PF = 10^{-3} | PD at PF = 10^{-4}
1 | Shearlet-C | 0.99915 | 0.79916 | 0.8398 | 0.6746
1 | Shearlet-I | 0.99908 | 0.65810 | 0.7140 | 0.4247
1 | Intensity | 0.99874 | 0.73188 | 0.7411 | 0.5722
1 | Canny | 0.94457 | 0.27752 | 0.2114 | 0.1099
2 | Shearlet-C | 0.99999 | 0.98841 | 0.9989 | 0.9895
2 | Shearlet-I | 0.99557 | 0.62705 | 0.4837 | 0.3964
2 | Intensity | 0.99037 | 0.55404 | 0.4371 | 0.3342
2 | Canny | 0.99043 | 0.81787 | 0.6425 | 0.4462
3 | Shearlet-C | 0.99934 | 0.76418 | 0.8368 | 0.5874
3 | Shearlet-I | 0.99977 | 0.82353 | 0.9101 | 0.7098
3 | Intensity | 0.99650 | 0.45992 | 0.0543 | 0.0000
3 | Canny | 0.96248 | 0.19436 | 0.0000 | 0.0000
4.3 Video denoising
5 Conclusions
The shearlet transform is an advanced multiscale method which has emerged in recent years as a refinement of the traditional wavelet transform and was shown to perform very competitively over a wide range of image and data processing problems. However, standard CPU-based numerical implementations are very time-consuming and make the application of this method to large data sets and real-time problems impractical.
In this paper, we have described how to speed up the computation of the 2D/3D discrete shearlet transform by using GPU-based implementations. The development of algorithms on GPUs used to be tedious and required very specialized knowledge of the hardware. Using CUDA, this is no longer the case, and scientists with C/C++ programming skills can quickly develop efficient GPU implementations of data-intensive algorithms. In this paper, we have taken advantage of the GPU-based implementation of the fast Fourier transform and used the capabilities of MATLAB for quick prototyping. The results presented in this paper illustrate the practical benefits of this approach. For example, a GeForce 480 GTX, a $200 graphics card, can perform video denoising 58 times faster than an expensive 64-core machine while consuming much less power.
Our new implementation enables the efficient application of the shearlet decomposition to a variety of image and data processing tasks for which the required CPU resources would be prohibitive. There are further improvements and extensions that can be achieved such as precalculating the filter coefficients and porting the code to OpenCL so it can also run on AMD and Intel GPUs, but this would go beyond the scope of this paper.
Endnote
^{a} Note that this code also includes some C routines to speed up the computation time. This is true both for the 2D and 3D implementations.
Declarations
Acknowledgements
The authors thank Amtrak, ENSCO, Inc. and the Federal Railroad Administration for providing the images used in Section 4.2. This work was supported by the Federal Railroad Administration under contract DTFR5313C00032. DL acknowledges support from NSF grant DMS 1008900/1008907 and DMS 1005799.
Authors’ Affiliations
References
1. Candès EJ, Donoho DL: New tight frames of curvelets and optimal representations of objects with C^{2} singularities. Comm. Pure Appl. Math 2004, 57: 219-266. 10.1002/cpa.10116
2. Cunha A, Zhou J, Do M: The nonsubsampled contourlet transform: theory, design, and applications. IEEE Trans. Image Process 2006, 15(10):3089-3101.
3. Labate D, Lim W, Kutyniok G, Weiss G: Sparse multidimensional representation using shearlets. Wavelets XI (San Diego, CA, 2005), Volume SPIE Proc. 5914 2005, 254-262.
4. Kutyniok G, Labate D: Shearlets: Multiscale Analysis for Multivariate Data. Birkhäuser, Boston; 2012.
5. Starck JL, Murtagh F, Fadili JM: Sparse image and signal processing: wavelets, curvelets, morphological diversity. In Shearlets: Multiscale Analysis for Multivariate Data. Cambridge books online, Cambridge University Press Cambridge; 2010.
6. Guo K, Labate D: The construction of smooth Parseval frames of shearlets. Math. Model. Nat. Phenom 2013, 8: 82-105. 10.1051/mmnp/20138106
7. Guo K, Labate D: Optimally sparse multidimensional representation using shearlets. SIAM J. Math. Anal 2007, 9: 298-318.
8. Guo K, Labate D: Optimally sparse representations of 3D data with C^{2} surface singularities using Parseval frames of shearlets. SIAM J. Math. Anal 2012, 44: 851-886. 10.1137/100813397
9. Easley GR, Labate D, Lim W: Sparse directional image representations using the discrete shearlet transform. Appl. Comput. Harmon. Anal 2008, 25: 25-46. 10.1016/j.acha.2007.09.003
10. Kutyniok G, Shahram M, Zhuang X: ShearLab: a rational design of a digital parabolic scaling algorithm. SIAM J. Imaging Sci 2012, 5(4):1291-1332. 10.1137/110854497
11. Guo K, Labate D: Representation of Fourier integral operators using shearlets. J. Fourier Anal. Appl 2008, 14: 327-371. 10.1007/s00041-008-9018-0
12. Colonna F, Easley GR, Guo K, Labate D: Radon transform inversion using the shearlet representation. Appl. Comput. Harmon. Anal 2010, 29(2):232-250. 10.1016/j.acha.2009.10.005
13. Vandeghinste B, Goossens B, Van Holen R, Vanhove C, Pizurica A, Vandenberghe S, Staelens S: Iterative CT reconstruction using shearlet-based regularization. IEEE Trans. Nuclear Sci 2013, 60(5):3305-3317.
14. Guo K, Labate D: Characterization and analysis of edges using the continuous shearlet transform. SIAM J. Imaging Sci 2009, 2: 959-986. 10.1137/080741537
15. Guo K, Labate D: Analysis and detection of surface discontinuities using the 3D continuous shearlet transform. Appl. Comput. Harmon. Anal 2011, 30: 231-242. 10.1016/j.acha.2010.08.004
16. Yi S, Labate D, Easley GR, Krim H: A shearlet approach to edge analysis and detection. IEEE Trans. Image Process 2009, 18(5):929-941.
17. Kutyniok G, Lim W: Image separation using wavelets and shearlets. In Curves and Surfaces (Avignon, France, 2010), 416–430, Lecture Notes in Computer Science 6920. Springer Berlin Heidelberg; 2011.
18. Easley G, Labate D, Negi PS: 3D data denoising using combined sparse dictionaries. Math. Model. Nat. Phenom 2013, 8: 60-74.
19. Patel VM, Easley G, Healy D: Shearlet-based deconvolution. IEEE Trans. Image Process 2009, 18: 2673-2685.
20. Negi P, Labate D: 3D discrete shearlet transform and video processing. IEEE Trans. Image Process 2012, 21: 2944-2954.
21. Easley G, Labate D, Patel VM: Directional multiscale processing of images using wavelets with composite dilations. J. Math. Imaging Vis 2014, 48(1):13-43. 10.1007/s10851-012-0385-4
22. Candès EJ, Demanet L, Donoho D, Ying L: Fast discrete curvelet transforms. SIAM Multiscale Model. Simul 2006, 5(3):861-899. 10.1137/05064182X
23. Burt PJ, Adelson EH: The Laplacian pyramid as a compact image code. IEEE Trans. Commun 1983, 31(4):532-540. 10.1109/TCOM.1983.1095851
24. Donoho D, Johnstone I: Adapting to unknown smoothness via wavelet shrinkage. J. Am. Stat. Assoc 1995, 90: 1200-1224. 10.1080/01621459.1995.10476626
25. Chang SG, Yu B, Vetterli M: Adaptive wavelet thresholding for image denoising and compression. IEEE Trans. Image Process 2000, 9(9):1532-1546. 10.1109/83.862633
26. Subirats P, Dumoulin J, Legeay V, Barba D: Automation of pavement surface crack detection using the continuous wavelet transform. IEEE International Conference on Image Processing, Atlanta, GA 3037.
27. Chambon S, Moliard J: Automatic road pavement assessment with image processing: review and comparison. Int. J. Geophys. article ID 989354 2011, 20 pages. doi:10.1155/2011/989354
28. Ma C, Zhao C, Hou Y: Pavement distress detection based on nonsubsampled contourlet transform. Int. Conf. Comput. Sci. Softw. Eng. 2008, 1: 28-31.
29. Starck JL, Elad M, Donoho D: Image decomposition via the combination of sparse representation and a variational approach. IEEE Trans. Image Process 2005, 14(10):1570-1582.
30. Bobin J, Starck JL, Fadili M, Moudden Y, Donoho D: Morphological component analysis: an adaptive thresholding strategy. IEEE Trans. Image Process 2007, 16(11):2675-2681.
31. Canny J: A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 1986, 8(6):679-698.
32. Oliveira H, Correia P: Automatic road crack detection and characterization. IEEE Trans. Intell. Transport. Syst. 2013, 14: 155-168.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.