Performance versus energy consumption of hyperspectral unmixing algorithms on multicore platforms
EURASIP Journal on Advances in Signal Processing volume 2013, Article number: 68 (2013)
Abstract
Hyperspectral imaging is a growing area in remote sensing in which an imaging spectrometer collects hundreds of images (at different wavelength channels) for the same area on the surface of the Earth. Hyperspectral images are extremely high-dimensional, and require onboard processing algorithms able to satisfy near real-time constraints in applications such as wildland fire monitoring, mapping of oil spills and chemical contamination, etc. One of the most widely used techniques for analyzing hyperspectral images is spectral unmixing, which allows for subpixel data characterization. This is particularly important since the available spatial resolution in hyperspectral images is typically of several meters, and therefore it is reasonable to assume that several spectrally pure substances (called endmembers in hyperspectral imaging terminology) can be found within each imaged pixel. There have been several efforts towards the efficient implementation of hyperspectral unmixing algorithms on architectures that can be mounted onboard imaging instruments, including field programmable gate arrays (FPGAs) and graphics processing units (GPUs). While FPGAs are generally difficult to program, GPUs are difficult to adapt to onboard processing requirements in spaceborne missions due to their extremely high power consumption. In turn, with the increase in the number of cores, multicore platforms have recently emerged as an easier-to-program alternative to FPGAs that also offers more tolerable radiation and power consumption requirements. However, a detailed assessment of the performance versus energy consumption of these architectures has not yet been conducted in the field of hyperspectral imaging, in which it is particularly important to achieve processing results in real time.
In this article, we provide a thoughtful perspective on this relevant issue and further analyze the performance versus energy consumption ratio of different processing chains for spectral unmixing when implemented on multicore platforms.
1 Introduction
Hyperspectral imaging instruments are capable of collecting hundreds of images, corresponding to different wavelength channels, for the same area on the surface of the Earth [1]. For instance, NASA is continuously gathering imagery data with instruments such as the Airborne Visible InfraRed Imaging Spectrometer (AVIRIS), which operates in the 0.4–2.5 μm spectral range, with 10 nm spectral resolution and 30 m spatial resolution [2]. As a follow-up to the success of AVIRIS (an airborne instrument), a new generation of satellite instruments for Earth observation is operating or under development (see Table 1). As indicated there, most current hyperspectral missions are spaceborne in nature [3].
One of the main problems in the analysis of hyperspectral data cubes is the presence of mixed pixels [4, 5], which arise when the spatial resolution of the sensor is not fine enough to separate spectrally distinct materials (see Figure 1). Spectral unmixing [6–8] is one of the most popular techniques to analyze hyperspectral data. It involves the separation of a pixel spectrum into its pure component (endmember) spectra [9, 10], and the estimation of the abundance value for each endmember [11, 12]. The linear mixture model assumes single scattering between the endmember substances resulting from the fact that they are sitting side-by-side within the field of view of the imaging instrument (see Figure 2a). On the other hand, the nonlinear mixture model [13–15] assumes nonlinear interactions and multiple scattering between endmember substances (see Figure 2b). In practice, the linear model is more flexible and can be easily adapted to different analysis scenarios [16]. It can be simply defined as follows [17]:
$$\mathbf{y}=\mathbf{E}\mathbf{a}+\mathbf{n},$$(1)
where y is an n-dimensional pixel vector given by a collection of values at different wavelengths, $\mathbf{E}={\left\{{\mathbf{e}}_{i}\right\}}_{i=1}^{p}$ is a matrix containing p endmembers, $\mathbf{a}=[{a}_{1},{a}_{2},\dots ,{a}_{p}]$ is a p-dimensional vector containing the abundance fractions for each of the p endmembers in y, and n is a noise term. Generalizing this expression for all the hyperspectral pixels in the scene (in compact matrix notation) yields Y = E A + N, where Y is the full hyperspectral image with m pixels, each with n bands, E is the endmember matrix with dimensions n × p, A is a p × m matrix containing the endmember abundances for each pixel of the scene, and N is an n × m noise matrix. With the aforementioned notation in mind, solving the linear mixture model involves: (1) estimating the number of endmembers, p, in the hyperspectral scene; (2) identifying a collection of $\mathbf{E}={\left\{{\mathbf{e}}_{i}\right\}}_{i=1}^{p}$ endmembers; and (3) estimating the fractional abundances of the p endmembers for each pixel in the hyperspectral data set.
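To make the notation concrete, the following minimal sketch builds one noise-free mixed pixel from the linear mixture model. The endmember values, the abundances and the helper name `mix_pixel` are invented for illustration and do not come from the data used in this article.

```python
# Toy illustration of the linear mixture model y = E a + n.
# All numbers below are made up; mix_pixel is a hypothetical helper.

def mix_pixel(E, a, noise=None):
    """Return y = E a + n, with E given as a list of n rows of length p."""
    n_bands = len(E)
    if noise is None:
        noise = [0.0] * n_bands          # noise-free case
    return [sum(E[i][j] * a[j] for j in range(len(a))) + noise[i]
            for i in range(n_bands)]

# n = 4 bands, p = 2 endmembers stored as the columns of E.
E = [[0.10, 0.80],
     [0.20, 0.60],
     [0.40, 0.30],
     [0.70, 0.10]]
a = [0.25, 0.75]        # abundance fractions of the two endmembers
y = mix_pixel(E, a)     # the mixed pixel that a sensor would observe
print(y)
```

Unmixing runs this construction in reverse: given only y (and, for a whole scene, Y), it has to recover p, E and a.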
Several techniques have been proposed in recent years to solve this problem under the linear mixture model assumption (see [18–43], among several others), but all of them are computationally expensive. Although these techniques map nicely to high performance computing platforms such as commodity clusters [44], these systems are difficult to adapt to the onboard processing requirements introduced by applications with real-time constraints such as wildland fire tracking, biological threat detection, and monitoring of oil spills and other types of chemical contamination [45–47]. In those cases, low-weight integrated components such as field programmable gate arrays (FPGAs) [48–50] and graphics processing units (GPUs) [51, 52] have the potential to reduce payload in current and future Earth observation missions, which are mainly spaceborne in nature as indicated by Table 1. Furthermore, GPUs offer fast processing at low cost (in the literature, GPUs have so far been the only platform shown to be able to process hyperspectral images in real time) and easy programmability, which are very appealing for future remote sensing missions [53–58]. On the negative side, FPGAs are difficult to program and there is currently a lack of efficient implementations of a full spectral unmixing chain for hyperspectral image processing on this type of architecture. Besides, GPUs are currently not suitable for spaceborne missions due to their high power consumption and lack of radiation tolerance. These aspects are critical for the definition of the mission (payload) and its overall success and lifetime. In turn, with the increase in the number of cores, multicore platforms have recently emerged as an easier-to-program alternative to FPGAs that is also more flexible in accommodating radiation tolerance and power consumption requirements.
Nevertheless, a detailed assessment of the performance versus energy consumption of these architectures has not yet been conducted in the field of hyperspectral imaging, in which it is particularly important to achieve processing results in real time with low energy cost.
In this article, we provide a thoughtful perspective on this relevant issue and further analyze the performance versus energy consumption ratio of different processing chains for spectral unmixing when implemented on multicore platforms. This kind of analysis has not been previously conducted in the literature, and in our opinion it is very important in order to properly assess the possibility of using multicore platforms for efficient hyperspectral image processing in real remote sensing missions. The remainder of the article is organized as follows. Section 2 reviews the different modules that make up the unmixing chains considered in this work. Section 3 describes the parallel implementation of these modules on multicore platforms. Section 4 presents an experimental evaluation of the proposed implementations in terms of unmixing accuracy, parallel performance and energy consumption, reporting several multicore implementations able to provide real-time analysis performance and discussing their energy consumption requirements. Section 5 concludes the article with some remarks and hints at plausible future research lines.
2 Spectral unmixing modules
In this section, we describe different modules for spectral unmixing of hyperspectral data. These modules will then be used to define spectral unmixing chains, which are composed of three main stages: (1) estimating the number of endmembers in the original hyperspectral scene; (2) identifying a collection of endmembers in the scene; and (3) estimating the fractional abundances of endmembers in each pixel of the scene. In the following, we describe different methods in each category, offering a few remarks that describe computational aspects such as the operations that are involved and the cost of the algorithm in terms of floating-point arithmetic operations (flops). In the following cost expressions, for simplicity, we neglect lower order terms, taking into account that in practice p, n ≪ m.
2.1 Methods for estimating the number of endmembers
This section introduces two different methods for estimating the number of endmembers: virtual dimensionality (VD) [59] and hyperspectral signal identification by minimum error (HySime) [60].
2.1.1 Virtual dimensionality (VD)
The VD method first calculates the eigenvalues of the covariance matrix ${\mathbf{K}}_{L\times L}=1/N{(\mathbf{Y}-\overline{\mathbf{Y}})}^{T}(\mathbf{Y}-\overline{\mathbf{Y}})$ and the correlation matrix ${\mathbf{R}}_{L\times L}={\mathbf{K}}_{L\times L}+\overline{\mathbf{Y}}{\overline{\mathbf{Y}}}^{T}$, referred to as covariance-eigenvalues and correlation-eigenvalues, for each of the spectral bands in the original hyperspectral image Y. The VD concept follows the “pigeonhole principle”. If we represent a signal source by a pigeon and a spectral band by a hole, we can use a spectral band to accommodate one source. Thus, if a distinct spectral signature makes a contribution to the eigenvalue-represented signal energy in one spectral band, then its associated correlation eigenvalue will be greater than its corresponding covariance eigenvalue in this particular band. Otherwise, the correlation eigenvalue would be very close to the covariance eigenvalue, in which case only noise would be present in this particular band. By applying this concept, a Neyman–Pearson detector [59] is introduced to formulate the issue of whether a distinct signature is present or not in each of the spectral bands of Y as a binary hypothesis testing problem. Here, the decision is made based on an input parameter of the algorithm that is called the false alarm probability or P _{ F }, which is used to establish the sensitivity of the algorithm in terms of how much error can be tolerated in the identification of the actual number of endmembers in the image data. With this interpretation in mind, the issue of determining an estimate $\widehat{p}$ for the number of endmembers is further simplified and reduced to a specific value of P _{ F } that is preset by the Neyman–Pearson detector.
From the computational point of view, the most complex operation in this algorithm is the calculation of the covariance and correlation matrices, which need to be compared in order to determine the number of endmembers. If we recall that the number of bands of the hyperspectral image is denoted by n, the total cost of each calculation is given by n ^{2} flops.
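The relation between the two matrices used by VD can be checked on a toy example. The sketch below forms K and R = K + mean·meanᵀ for a tiny image stored with one pixel per row (an assumption about the data layout); the function name and the sample values are hypothetical. It stops before the eigenvalue comparison and the Neyman–Pearson test, which are not reproduced here.

```python
# Covariance and correlation matrices of a tiny image (one pixel per row).
# Sketch only: names and sample data are illustrative assumptions.

def covariance_and_correlation(Y):
    N, L = len(Y), len(Y[0])                       # pixels, bands
    mean = [sum(row[b] for row in Y) / N for b in range(L)]
    # K = (1/N) (Y - mean)^T (Y - mean)
    K = [[sum((row[i] - mean[i]) * (row[j] - mean[j]) for row in Y) / N
          for j in range(L)] for i in range(L)]
    # R = K + mean mean^T, which equals (1/N) Y^T Y
    R = [[K[i][j] + mean[i] * mean[j] for j in range(L)] for i in range(L)]
    return K, R

Y = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 9.0]]
K, R = covariance_and_correlation(Y)

# The traces (sums of eigenvalues) differ by the squared norm of the mean
# spectrum, which is the extra energy the correlation eigenvalues carry.
trace_gap = sum(R[i][i] - K[i][i] for i in range(len(K)))
print(trace_gap)
```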
2.2 HySime method
The HySime method consists of two parts. Algorithm 1 describes the noise estimation part, which obtains an N × L matrix $\hat{\xi}$ containing an estimation of the noise present in the original hyperspectral image Y [60]. This algorithm follows an approach which addresses the high correlation exhibited by close spectral bands. The main advantage of Algorithm 1 is that the computational complexity is substantially lower than that of other algorithms for noise estimation in hyperspectral data in the literature. Additional details about Algorithm 1 can be found in [60] and we do not repeat them for space considerations. On the other hand, Algorithm 2 describes the signal subspace identification part of the algorithm, which first computes the noise correlation matrix ${\hat{\mathbf{R}}}_{n}$ and then computes the signal correlation matrix ${\hat{\mathbf{R}}}_{x}$. Next, the eigenvectors of the signal correlation matrix are obtained and sorted in ascending order. Finally, a minimization function is applied to obtain an estimate $\widehat{p}$ of the number of endmembers in the subspace $\hat{\mathbf{X}}$. The main purpose of this algorithm is to select the subset of eigenvectors that best represents the signal subspace in the minimum mean squared error sense. As in the case of the previous algorithm, the most complex operations are due to the calculation of the covariance and correlation matrices. Again, the total cost of each calculation is given by n ^{2} flops, where n is the number of bands of the hyperspectral image.
Algorithm 1 Noise estimation
Algorithm 2 Signal subspace estimation
2.3 Methods for endmember identification
This section introduces two different methods for identifying the endmember signatures in the hyperspectral data: orthogonal subspace projection with Gram-Schmidt orthogonalization (OSP-GS) [18] and N-FINDR [23].
2.3.1 Orthogonal subspace projection with Gram-Schmidt orthogonalization (OSP-GS)
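The OSP algorithm [18] was originally developed to find spectrally distinct signatures using orthogonal projections. For this work, we have used an optimization of this algorithm (see [61, 62]) which allows calculating the OSP without requiring the computation of the inverse of the matrix that contains the endmembers already identified in the image. Avoiding this inversion, which is difficult to implement in parallel, is accomplished using the Gram-Schmidt method for orthogonalization. This process selects a finite set of linearly independent vectors A = {a _{1}, …, a _{ p }} in the inner product space R ^{n} in which the original hyperspectral image is defined, and generates an orthogonal set of vectors B = {b _{1}, …, b _{ p }} which spans the same p-dimensional subspace of R ^{n} (p ≤ n) as A. In particular, B is obtained as follows:
$$\begin{array}{ll}{\mathbf{b}}_{1}={\mathbf{a}}_{1},& {\mathbf{e}}_{1}=\frac{{\mathbf{b}}_{1}}{\Vert {\mathbf{b}}_{1}\Vert }\\ {\mathbf{b}}_{2}={\mathbf{a}}_{2}-{\text{proj}}_{{\mathbf{b}}_{1}}({\mathbf{a}}_{2}),& {\mathbf{e}}_{2}=\frac{{\mathbf{b}}_{2}}{\Vert {\mathbf{b}}_{2}\Vert }\\ \vdots & \vdots \\ {\mathbf{b}}_{p}={\mathbf{a}}_{p}-\sum _{j=1}^{p-1}{\text{proj}}_{{\mathbf{b}}_{j}}({\mathbf{a}}_{p}),& {\mathbf{e}}_{p}=\frac{{\mathbf{b}}_{p}}{\Vert {\mathbf{b}}_{p}\Vert }\end{array}$$(2)
where the projection operator is defined as
$${\text{proj}}_{\mathbf{b}}(\mathbf{a})=\frac{<\mathbf{a},\mathbf{b}>}{<\mathbf{b},\mathbf{b}>}\mathbf{b},$$(3)
and < a, b > denotes the inner product of vectors a and b.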
The sequence b _{1}, …, b _{ p } in Equation (2) represents the set of orthogonal vectors generated by the Gram-Schmidt method, and thus, the normalized vectors e _{1}, …, e _{ p } in (2) form an orthonormal set. Since B spans the same p-dimensional subspace of R ^{n} as A, an additional vector b _{ p+1} computed by following the procedure stated in (2) is also orthogonal to all the vectors included in A and B. This algebraic assertion constitutes the cornerstone of the OSP method with Gram-Schmidt orthogonalization.
From the computational point of view, this algorithm has to be augmented with some sort of column pivoting that, at each step of the orthogonalization, detects the pixel with maximum projection value among those of the image (see [63] for details). Unfortunately, this requires that each projector is applied to all pixels of the scene, not only to p, yielding a significant increase in the arithmetic cost of the algorithm. Given the 3n flops required to apply the projector (3) to one pixel, and the p endmembers that have to be identified, the result is a total cost for the algorithm of 3mnp flops.
2.3.2 N-FINDR
The N-FINDR algorithm [23] is one of the most widely used and successfully applied methods for automatically determining endmembers in hyperspectral image data without using a priori information. This algorithm looks for the set of pixels with the largest possible volume by inflating a simplex inside the data. The procedure begins with a random initial selection of pixels (see Figure 3a). Every pixel in the image must be evaluated in order to refine the estimate of endmembers, looking for the set of pixels that maximizes the volume of the simplex defined by the selected endmembers. The mathematical definition of the volume of a simplex formed by a set of endmember candidates is proportional to the determinant of the set augmented by a row of ones. The determinant is only defined in the case where the number of features is p − 1, p being the number of desired endmembers [9]. Since in hyperspectral data typically n ≫ p, a transformation that reduces the dimensionality of the input data is required. In this work, we use the principal component transform (PCT) [64] for this purpose. The corresponding volume is calculated for every pixel in each endmember position by replacing that endmember and finding the resulting volume. If the replacement results in an increase of volume, the pixel replaces the endmember. This procedure is repeated in iterative fashion until there are no more endmember replacements (see Figure 3b). The method can be summarized by a step-by-step algorithmic description which is given below for clarity:

1. Feature reduction. Apply a dimensionality reduction transformation such as PCT to reduce the dimensionality of the data from n to d = p − 1, where p is an input parameter to the algorithm (number of endmembers to be extracted). The basic idea of PCT is to orthogonally project the data into a new coordinate system, defined by the variance of the original data, i.e. the direction that accounts for the greatest variance of the original data will be the first coordinate (the principal component) of the transformed system, the second dimension will be the direction with the second largest variance, and so on. PCT requires the computation of the singular values and right singular vectors of $\widetilde{\mathbf{Y}}={(\mathbf{Y}-\overline{\mathbf{Y}})}^{T}(\mathbf{Y}-\overline{\mathbf{Y}})$. In particular, consider the singular value decomposition (SVD) $\widetilde{\mathbf{Y}}=\mathbf{U}\Sigma {\mathbf{V}}^{T}$ [64, 65]. Then, the PCT performs the dimension reduction n → (d = p − 1) by replacing Y with the first p − 1 columns of $\widetilde{\mathbf{Y}}\mathbf{V}$.

2. Initialization. Let $\{{\mathbf{e}}_{1}^{\left(0\right)},{\mathbf{e}}_{2}^{\left(0\right)},\dots ,{\mathbf{e}}_{p}^{\left(0\right)}\}$ be a set of endmembers randomly extracted from the input data.

3. Volume calculation. At iteration k ≥ 0, calculate the volume defined by the current set of endmembers as follows:
$$V\left({\mathbf{e}}_{1}^{\left(k\right)},{\mathbf{e}}_{2}^{\left(k\right)},\dots ,{\mathbf{e}}_{p}^{\left(k\right)}\right)=\frac{\left|\det \left[\begin{array}{cccc}1& 1& \dots & 1\\ {\mathbf{e}}_{1}^{\left(k\right)}& {\mathbf{e}}_{2}^{\left(k\right)}& \dots & {\mathbf{e}}_{p}^{\left(k\right)}\end{array}\right]\right|}{(p-1)!}.$$(4)
4. Replacement. For each pixel vector y in the input hyperspectral data, recalculate the volume by testing the pixel in all p endmember positions, i.e., first calculate $V\left(\mathbf{y},{\mathbf{e}}_{2}^{\left(k\right)},\dots ,{\mathbf{e}}_{p}^{\left(k\right)}\right)$, then calculate $V\left({\mathbf{e}}_{1}^{\left(k\right)},\mathbf{y},\dots ,{\mathbf{e}}_{p}^{\left(k\right)}\right)$, and so on until $V\left({\mathbf{e}}_{1}^{\left(k\right)},{\mathbf{e}}_{2}^{\left(k\right)},\dots ,\mathbf{y}\right)$. If none of the p recalculated volumes is greater than $V\left({\mathbf{e}}_{1}^{\left(k\right)},{\mathbf{e}}_{2}^{\left(k\right)},\dots ,{\mathbf{e}}_{p}^{\left(k\right)}\right)$, then no endmember is replaced. Otherwise, the combination with maximum volume is retained. Let us assume that the endmember absent in the combination resulting in the maximum volume is denoted by ${\mathbf{e}}_{i}^{(k+1)}$. In this case, a new set of endmembers is produced by letting ${\mathbf{e}}_{i}^{(k+1)}=\mathbf{y}$ and ${\mathbf{e}}_{l}^{(k+1)}={\mathbf{e}}_{l}^{\left(k\right)}$ for l ≠ i. The replacement step is repeated for all the pixel vectors in the input data until all the pixels have been exhausted.
Computationally, this algorithm requires two major operations: feature reduction and volume calculation + replacement. Because n ≪ m and only a few columns of $\widetilde{\mathbf{Y}}\mathbf{V}$ are required, the PCT (first operation) can be computed in only 2m n ^{2} flops, which are basically due to the calculation of the SVD of $\widetilde{\mathbf{Y}}$. The determination of the volumes is much more expensive. In particular, a straightforward implementation of the computations of the determinants in steps (3)–(4), via e.g. the LU factorization (with partial pivoting), renders a total cost of 2m p ^{4} / 3 flops, which results from having to compute mp factorizations of p × p matrices, with a cost of 2p ^{3} / 3 flops per LU factorization. In [63], we describe a refined alternative that, by exploiting simple properties of the LU factorization, reduces this cost to mp ^{3} + 2p ^{4} / 3 flops.
As a final comment, it has been observed that different random initializations of N-FINDR may produce different final solutions. Thus, our N-FINDR algorithm was implemented in iterative fashion, so that each sequential run was initialized with the previous algorithm solution, until the algorithm converges to a simplex volume that cannot be further maximized. Our experiments show that, in practice, this approach allows the algorithm to converge in a few iterations only.
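Steps 3 and 4 can be illustrated with a straightforward sketch that recomputes every determinant from scratch by Gaussian elimination with partial pivoting. This corresponds to the unrefined variant costed above, not to the refined scheme of [63]; the function names and the two-dimensional sample data (p = 3, so p − 1 = 2 features after reduction) are assumptions made for the example.

```python
from math import factorial

def det(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [list(row) for row in M]
    n = len(A)
    d = 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(A[r][c]))
        if A[piv][c] == 0.0:
            return 0.0
        if piv != c:
            A[c], A[piv] = A[piv], A[c]
            d = -d                      # a row swap flips the sign
        d *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= f * A[c][k]
    return d

def simplex_volume(endmembers):
    """Equation (4): p vertices, each with p - 1 coordinates."""
    p = len(endmembers)
    # Row of ones on top, then one column per endmember.
    M = [[1.0] * p] + [[e[i] for e in endmembers] for i in range(p - 1)]
    return abs(det(M)) / factorial(p - 1)

def replacement_pass(pixels, endmembers):
    """One sweep of the replacement step over all pixels."""
    current = [list(e) for e in endmembers]
    best = simplex_volume(current)
    for y in pixels:
        best_pos = None
        for i in range(len(current)):   # test y in every endmember position
            trial = current[:i] + [list(y)] + current[i + 1:]
            v = simplex_volume(trial)
            if v > best:
                best, best_pos = v, i
        if best_pos is not None:        # keep the combination of largest volume
            current[best_pos] = list(y)
    return current, best

pixels = [[0.0, 0.0], [4.0, 0.0], [0.0, 4.0],
          [1.0, 1.0], [2.0, 1.0], [1.0, 2.0]]
start = [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0]]   # deliberately poor initial guess
final, volume = replacement_pass(pixels, start)
print(final, volume)
```

In this toy case a single sweep already moves the simplex to the three extreme pixels; on real data the sweep would be repeated, as described above, until the volume stops growing.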
2.4 Methods for abundance estimation
This section introduces two different methods for estimating the abundance fractions: unconstrained least squares (ULS) [65] and nonnegative constrained least squares (NCLS) [11].
2.4.1 Unconstrained least squares
Once a set of $\mathbf{E}={\left\{{\mathbf{e}}_{j}\right\}}_{j=1}^{p}$ endmembers has been estimated using an endmember extraction algorithm, an unconstrained p-dimensional estimate of the endmember abundances in a given pixel y can be simply obtained (in least squares sense) from the following expression [65]:
$${\widehat{\mathbf{a}}}^{\text{UC}}={({\mathbf{E}}^{T}\mathbf{E})}^{-1}{\mathbf{E}}^{T}\mathbf{y}.$$(5)
In the computation of (5), we can leverage that the term M = (E ^{T} E)^{−1} E ^{T} remains fixed for all the pixels of the image. Thus, by explicitly obtaining M first, the cost of computing ${\widehat{\mathbf{a}}}^{\text{UC}}$ for all the scene pixels is basically reduced to 2m n p flops, since n, p ≪ m and, therefore, the number of arithmetic operations that are necessary to form M is negligible compared to that.
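The reuse of M across pixels can be sketched as follows. The code forms M = (EᵀE)⁻¹Eᵀ once with a small Gauss-Jordan inverse and then applies it to a pixel with one matrix-vector product; the helper names and the toy endmember matrix are assumptions for illustration, and a production code would call BLAS/LAPACK routines instead.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(x * y for x, y in zip(row, col)) for col in Bt] for row in A]

def inverse(A):
    """Gauss-Jordan inverse with partial pivoting (adequate for small p x p)."""
    n = len(A)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if aug[piv][c] == 0.0:
            raise ValueError("E^T E is singular")
        aug[c], aug[piv] = aug[piv], aug[c]
        d = aug[c][c]
        aug[c] = [v / d for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def uls_operator(E):
    """M = (E^T E)^{-1} E^T, computed once per scene (p x n)."""
    Et = transpose(E)
    return matmul(inverse(matmul(Et, E)), Et)

def unmix(M, y):
    """Unconstrained abundances of one pixel: about 2 n p flops."""
    return [sum(m * v for m, v in zip(row, y)) for row in M]

# Toy data: n = 4 bands, p = 2 endmembers (columns of E), made-up values.
E = [[0.10, 0.80],
     [0.20, 0.60],
     [0.40, 0.30],
     [0.70, 0.10]]
y = [0.625, 0.5, 0.325, 0.25]      # equals E applied to (0.25, 0.75)
M = uls_operator(E)
a_hat = unmix(M, y)
print(a_hat)
```

For a full scene, applying M to all m pixels at once is a single p × n by n × m matrix product, which is the operation that dominates the cost quoted above.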
The main advantages of the unconstrained abundance estimation approach in Eq. (5) are the simplicity of its implementation and its fast execution. However, under this unconstrained model, the derivation of negative abundances is possible if the model endmembers are not pure or if they are affected by variability caused by spatial or temporal variations [9]. To address this issue, two physical constraints can be introduced into the model described in Eq. (1): the abundance non-negativity constraint (ANC), i.e., a _{ j } ≥ 0, and the abundance sum-to-one constraint (ASC), i.e., $\sum _{j=1}^{p}{a}_{j}=1$ [12]. Imposing the ASC results in the following optimization problem:
$$\underset{\mathbf{a}}{\text{min}}\;{\Vert \mathbf{y}-\mathbf{E}\mathbf{a}\Vert }^{2}\quad \text{subject to}\quad \sum _{j=1}^{p}{a}_{j}=1.$$(6)
Similarly, imposing the ANC results in the optimization problem:
$$\underset{\mathbf{a}}{\text{min}}\;{\Vert \mathbf{y}-\mathbf{E}\mathbf{a}\Vert }^{2}\quad \text{subject to}\quad {a}_{j}\ge 0,\; j=1,\dots ,p.$$(7)
As indicated in [12], a fully constrained (i.e. ASC-constrained and ANC-constrained) estimate can be obtained in least-squares sense by solving the optimization problems in Eq. (6) and Eq. (7) simultaneously. While partially constrained solutions imposing only the ANC have found success in the literature [11], the ASC is prone to criticism because, in a real image, there is a strong signature variability [66] that, at the very least, introduces positive scaling factors varying from pixel to pixel in the signatures present in the mixtures. As a result, the signatures are defined up to a scale factor, and thus, the ASC should be replaced with a generalized ASC of the form $\sum _{j=1}^{p}{\xi}_{j}\cdot {a}_{j}=1$, in which the weights ξ _{ j } denote the pixel-dependent scale factors [67]. We conclude that the nonnegativity of the endmembers automatically imposes a generalized ASC. For this reason, in the following section we describe a solution that does not explicitly impose the ASC but only the ANC.
2.4.2 Nonnegative constrained least squares
An NCLS algorithm can be used to obtain a solution to the ANC-constrained problem described in Equation (7) in iterative fashion [11]. A successful approach for this purpose in different applications has been the image space reconstruction algorithm (ISRA) [68], a multiplicative algorithm for solving NCLS problems. The algorithm is based on the following iterative expression:
$${\widehat{\mathbf{a}}}^{k+1}={\widehat{\mathbf{a}}}^{k}\left(\frac{{\mathbf{E}}^{T}\mathbf{y}}{{\mathbf{E}}^{T}\mathbf{E}{\widehat{\mathbf{a}}}^{k}}\right),$$(8)
where the endmember abundances at pixel y are iteratively estimated, so that the abundances at the (k+1)th iteration, ${\widehat{\mathbf{a}}}^{k+1}$, depend on the abundances estimated at the kth iteration, ${\widehat{\mathbf{a}}}^{k}$. The procedure starts with an unconstrained abundance estimation ${\widehat{\mathbf{a}}}^{\text{UC}}$ which is progressively refined in a given number of iterations. For illustrative purposes, Algorithm 3 shows the ISRA pseudocode for unmixing one hyperspectral pixel vector y using a set of E endmembers. For simplicity, in the pseudocode y is treated as an n-dimensional vector, and E is treated as an n × p-dimensional matrix. The estimated abundance vector $\widehat{\mathbf{a}}$ is a p-dimensional vector, and variable iters denotes the number of iterations per pixel in the abundance estimation process (in this work, we set iters = 200 as we have found good results empirically using this parameter setting). The pseudocode is subdivided into the numerator and denominator calculations in Equation (8). When these terms are obtained, they are divided and multiplied by the previous abundance vector. It is important to emphasize that the calculations of the fractional abundances for each pixel are independent, and therefore they can be calculated simultaneously without data dependencies, thus increasing the possibility of parallelization.
Algorithm 3 Pseudocode of ISRA algorithm for unmixing one hyperspectral pixel vector y using a set E of p endmembers
The pseudocode for the ISRA algorithm reveals that this procedure is composed of very simple arithmetic operations, but also that the innermost loop, for variable s, dominates its arithmetic cost. In particular, as two arithmetic operations are performed at each iteration of this loop, this yields a total cost for the algorithm of 2np ^{2} · iters flops.
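As a rough illustration of the multiplicative update, the sketch below unmixes one pixel with ISRA, taking the product and quotient element by element. It is not a transcription of Algorithm 3: for simplicity it starts from a uniform positive vector rather than from the unconstrained estimate, it assumes nonnegative E and y so that every denominator stays positive, and it organizes the denominator as two matrix-vector products. The function name and the toy data are hypothetical.

```python
def isra(y, E, iters=200):
    """Multiplicative ISRA iterations for one pixel (illustrative sketch)."""
    n, p = len(E), len(E[0])
    a = [1.0 / p] * p        # assumption: uniform positive starting point
    # Numerator E^T y does not change between iterations.
    num = [sum(E[i][j] * y[i] for i in range(n)) for j in range(p)]
    for _ in range(iters):
        # Denominator E^T (E a), built from the current reconstruction E a.
        recon = [sum(E[i][j] * a[j] for j in range(p)) for i in range(n)]
        den = [sum(E[i][j] * recon[i] for i in range(n)) for j in range(p)]
        # Elementwise update: positive factors keep the abundances >= 0.
        a = [a[j] * num[j] / den[j] for j in range(p)]
    return a

# Toy data: n = 4 bands, p = 2 endmembers (columns of E), made-up values.
E = [[0.10, 0.80],
     [0.20, 0.60],
     [0.40, 0.30],
     [0.70, 0.10]]
y = [0.625, 0.5, 0.325, 0.25]      # equals E applied to (0.25, 0.75)
a_hat = isra(y, E)
print(a_hat)
```

Because each call touches only one pixel, the outer loop over the m pixels of a scene has no data dependencies, which is the property that makes this stage easy to distribute across cores.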
3 Multicore implementations
The six numerical methods introduced in the previous section for the different stages of hyperspectral unmixing can be decomposed into a collection of basic and advanced dense linear algebra operations. Among the basic ones, we can find, e.g., vector scalings, inner (dot) products, matrix-vector products, solution of triangular systems, matrix–matrix products, etc. The advanced ones comprise the solution of linear systems of equations, matrix inversion, eigenvalue problems, and singular value problems, among others. Fortunately, these operations are quite common to many other scientific and engineering applications, and nowadays there exist linear algebra libraries offering highly tuned and numerically reliable implementations of most of these operations for a variety of computer architectures, including multicore processors.
In particular, Basic Linear Algebra Subprograms (BLAS) [69–71] defines the specification (the interface and functionality) of a collection of routines for basic linear algebra operations such as those listed above. There is a legacy implementation of BLAS publicly available at http://www.netlib.org, but the aspect that makes BLAS really useful is the existence of implementations developed by most hardware vendors and highly tuned for their specific products. These developments include Intel MKL, AMD ACML, and IBM ESSL for their multicore designs, but also more generic efforts like GotoBLAS2 and ATLAS. This approach has proved so successful that NVIDIA, manufacturer of more specialized hardware architectures such as GPUs, also offers its customers its own specialized implementation, CUBLAS. For the type of architectures considered in our work, i.e. multicore processors, an appealing property of these libraries is that they can exploit the existence of hardware concurrency, in the form of several cores, by carefully using optimized multithreaded codes. For example, the implementation of the matrix–matrix product kernel from MKL (routine _gemm), executed on a single core of an Intel Xeon processor, attains more than 90% of the peak performance of the architecture when operating on matrices of moderate to large size. If several cores are used, and the problem dimension is scaled proportionally, the routine still achieves a similar performance rate.
The contents of BLAS are structured into three separate levels (BLAS1, BLAS2 and BLAS3) according to the number of flops and memory operations (memops) carried out by the kernels. Thus, routines from BLAS1 perform a linear number of flops on a linear number of data items and, therefore, memops; an example of a BLAS1 routine is the inner product of two vectors. For BLAS2, both flops and memops are quadratic in the number of data items; the classical example for this level is the matrix-vector product. Finally, for BLAS3 the flops are cubic while the memops are quadratic; e.g., the matrix–matrix product. The type of routine (level) has important implications on performance as current architectures feature a wide difference between the floating-point performance (flops/sec.) of the processor and the memory bandwidth (memops/sec.), and this gap continues growing. Concretely, only the routines from BLAS3 exhibit enough data reuse to exploit the hierarchical structure of the memory subsystem of current computers, with several layers of cache, and thus hide the large latencies incurred when accessing data that lie in the main memory. Developers leverage this property by designing so-called blocked algorithms for their implementations of BLAS3 kernels that retrieve data from the main memory to the processor by blocks (square or rectangular submatrices), and operate with them as much as possible before returning the results back to memory. This is clearly not possible for BLAS1 and BLAS2 as the routines in these levels exhibit a flops/memops ratio that is O(1). An additional advantage of BLAS3 over the two other levels is that, in general, the use of multiple cores in a concurrent execution is only justified if the arithmetic cost of the operation is cubic. In consequence, when implementing the spectral unmixing methods, it will be very important to identify numerical operations that can be cast in terms of the most convenient routine from BLAS, preferably BLAS3.
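A back-of-the-envelope calculation makes the difference between the levels visible. The sketch below uses simple textbook operation counts for one representative kernel per level on operands of order n; the counts are approximations chosen for illustration (for example, each operand is assumed to be moved exactly once), and the function name is hypothetical.

```python
def blas_ratios(n):
    """Approximate flops-per-data-item ratio for one kernel of each level."""
    counts = {
        # kernel: (flops, data items moved)
        "dot":  (2 * n,      2 * n),          # x^T y: read two n-vectors
        "gemv": (2 * n * n,  n * n + 2 * n),  # y = A x: read A and x, write y
        "gemm": (2 * n ** 3, 3 * n * n),      # C = A B: read A and B, write C
    }
    return {name: flops / items for name, (flops, items) in counts.items()}

for n in (100, 1000):
    print(n, blas_ratios(n))
```

Under these assumptions the ratio stays near 1 or 2 for the BLAS1 and BLAS2 kernels regardless of n, while for the matrix–matrix product it grows linearly with n, which is what gives blocked BLAS3 implementations room to reuse data held in cache.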
Linear Algebra PACKage (LAPACK) [72] provides advanced methods for dense linear algebra operations such as those mentioned above. There is a legacy implementation at http://www.netlib.org, but some hardware vendors also include tuned versions of certain routines in their mathematical libraries (e.g., Intel MKL and AMD ACML). The routines in LAPACK make heavy use of kernels from BLAS, thus inheriting the performance (and parallelism) intrinsic to the latter. LAPACK provides specialized implementations that can leverage special matrix properties like symmetry, band structure, positive definiteness, etc., when solving linear systems or linear least squares problems as well as calculating the eigenvalues/singular values of a matrix. To improve numerical accuracy and increase performance, it is very important to select the appropriate routine from LAPACK as, in many cases, this library offers different solvers to tackle one particular problem.
Our task of developing high performance, possibly parallel, codes for the spectral unmixing methods started by (i) carefully selecting the appropriate data structures to hold the data (image and intermediate results); and (ii) developing an initial sequential implementation of the method, while simultaneously identifying basic and advanced linear algebra operations that could be performed by invoking the appropriate routines from BLAS and LAPACK. For those parts of the method that could be performed using kernels from these libraries, the implementation/optimization task was over.
However, during the implementation of the methods, we detected certain parts of the methods that had to be manually encoded, usually in the form of (nested) loops. For many of these code fragments, special care was taken to apply basic optimizations such as avoiding expensive operations, eliminating common subexpressions, avoiding branches, selecting the appropriate order in nested loops, using the appropriate compiler optimizations, etc. After this stage, the execution of the code was profiled to identify possible performance bottlenecks. For those fragments of code, in particular loops, that exhibited a significant cost (execution time), we analyzed the possibility of reducing their impact by leveraging loop concurrency via OpenMP [73]. This is a standard parallelization tool that is available in most current compilers (e.g., Intel icc, GNU gcc) and allows an easy and, in most cases, efficient parallelization of C/Fortran codes on multicore processors.
Each one of these implementation and optimization stages was carefully monitored from the point of view of correctness, experimental accuracy, and performance. The result of this process was the spectral unmixing routines that we evaluate in the next section.
4 Experimental results
This section is organized as follows. Section 4.1 describes the hyperspectral data set used in experiments. In Section 4.2 we describe the multicore processing platforms. Finally, Section 4.3 performs a detailed assessment of the performance versus energy consumption of the considered multicore architectures when executing the different unmixing chains that can be formed with the processing modules described in Section 2.
4.1 Hyperspectral data
The hyperspectral data set used in experiments was collected by the AVIRIS sensor over the Cuprite mining district in Nevada in the summer of 1997 (see Figure 4). It is available online (in reflectance units) after atmospheric correction [74]. The portion used in experiments corresponds to a 350 × 350-pixel subset of the sector labeled as f970619t01p02_r02_sc03.a.rfl in the online data, which comprises 188 spectral bands in the range from 400 to 2,500 nm and has a total size of around 50 MB. Water absorption bands as well as bands with low signal-to-noise ratio (SNR) were removed prior to the analysis. The site is well understood mineralogically, and has several exposed minerals of interest, including alunite, buddingtonite, calcite, kaolinite, and muscovite. Reference ground signatures of the above minerals, available in the form of a USGS library [75], have been used in the literature for evaluation purposes [16].
For illustrative purposes, Table 2 provides the values of p (number of endmembers) estimated by the VD method for the considered hyperspectral scene, using different values of the false alarm probability (P_F). The number of endmembers estimated by HySime was p = 19 for the Cuprite scene. As shown by Table 2, a consensus between VD and HySime was observed for P_F = 10^{-5} and P_F = 10^{-6}. On the other hand, Table 3 shows the spectral angles (in degrees) between the most similar endmembers extracted by the OSP-GS algorithm and the reference USGS spectral signatures available for this scene. The range of values for the spectral angle is [0°, 90°], with values close to 0° indicating higher spectral similarity. As shown by Table 3, the endmembers extracted by both the OSP-GS and N-FINDR algorithms are very similar, spectrally, to the USGS reference signatures, despite the potential variations (due to possible interferers still remaining after the atmospheric correction process) between the ground signatures and the airborne data. For illustrative purposes, Figure 5 plots the estimated endmembers against the ground-truth spectra for the considered endmember extraction algorithms.
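The spectral angle used in Table 3 can be sketched as follows. This is the usual definition (the arccosine of the normalized inner product of two spectra), written as a minimal standard-library example; the function name `spectral_angle_deg` and the sample vectors are illustrative and do not come from the scene.

```python
import math


def spectral_angle_deg(a, b):
    """Return the angle, in degrees, between two spectra of equal length."""
    if len(a) != len(b):
        raise ValueError("spectra must have the same number of bands")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("spectra must be non-zero")
    # Clamp to [-1, 1] to guard against rounding just outside the domain of acos.
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cosine))


if __name__ == "__main__":
    # A scaled copy of a spectrum gives an angle close to 0 degrees.
    print(spectral_angle_deg([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
    # Orthogonal vectors give 90 degrees.
    print(spectral_angle_deg([1.0, 0.0], [0.0, 1.0]))
```

Because reflectance spectra are non-negative, the inner product is non-negative and the angle stays within [0°, 90°]; the measure is also insensitive to a uniform scaling of either spectrum.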
4.2 Multicore platforms
In 2004, the evolution of processor architecture shifted from a progressive increase in clock frequency to a growth in the number of cores. Thus, although current processors still feature a moderate number of cores (between 4 and 16), the trend indicates that future generations will include a larger number. Moreover, as of today it is possible to build a commodity shared-memory multiprocessor with four sockets, each accommodating a 16-core processor, for a total of 64 cores in a single desktop platform.
Following this trend towards high levels of hardware concurrency at the core level, all the experiments were conducted on a platform equipped with 4 AMD Opteron 6172 processors, with 12 cores per processor, for a total of 48 cores in the platform. The software employed in the experiments included the Intel MKL v10.3 implementation of the BLAS and LAPACK libraries and the Intel icc v12.1.3 compiler. The codes were compiled with the optimization flag -O3, and single-precision arithmetic was employed in all experiments. The explosion of hardware concurrency in multicore processors requires the development of efficient parallel software that attains a significant fraction of the platform peak performance. However, in some cases the target method does not exhibit enough concurrency to efficiently exploit all the computational units in the platform. When executing this kind of method, the use of all the cores results in an increase in the execution time (due to the overhead introduced by the synchronization, communication and management of the cores) and a corresponding waste of energy. In this context, it is preferable to limit the number of cores employed, idling the rest so that the OS can move them into an energy-saving state (C-state) that yields a significant reduction of dynamic power.
In our study of the spectral unmixing methods, each code was evaluated separately to determine the optimal number of cores from the point of view of execution time. Given that energy consumption equals the product of power dissipation and execution time, reducing the execution time is, in general, an important step towards increasing energy efficiency.
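The selection rule described above can be sketched in a few lines. The timings and the power figure in the usage example are hypothetical placeholders chosen only to exercise the functions, not measurements from this study, and the names `best_core_count` and `energy_joules` are illustrative.

```python
def best_core_count(time_by_cores):
    """Return the core count with the smallest measured execution time.

    time_by_cores maps a number of cores to an execution time in seconds.
    """
    if not time_by_cores:
        raise ValueError("at least one measurement is required")
    return min(time_by_cores, key=time_by_cores.get)


def energy_joules(avg_power_w, time_s):
    """Energy as the product of average power (W) and execution time (s)."""
    return avg_power_w * time_s


if __name__ == "__main__":
    # Hypothetical timings: using every core is slower than using 12.
    timings = {1: 4.0, 12: 0.9, 48: 1.3}
    cores = best_core_count(timings)
    # Hypothetical average power of 200 W during the run.
    print(cores, energy_joules(200.0, timings[cores]))
```

In practice the average power also depends on how many cores are active, so each configuration would need its own power measurement before energies are compared.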
In our experiments, the power consumption was measured using an internal DC powermeter. This device obtains 25 samples per second and is attached to the 12 V lines connecting the power supply with the motherboard (chipset plus processors) of the platform (see Figure 6). With this configuration, the results are not affected by inefficiencies of the power supply unit, or by the "noise" due to the operation of other hardware components such as fans and disks. In addition, samples from the powermeter are collected in a separate system, to prevent the measurement process from disturbing the tests. The error of the device is less than 5%. The powermeter is controlled via our library pmlib, which requires minimal changes to the code.
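A trace sampled at a fixed rate can be reduced to the quantities reported later (energy and maximum power) as sketched below. This is a generic rectangle-rule integration that assumes evenly spaced samples in watts; it does not use or reproduce the pmlib interface, and `summarize_power_trace` is a hypothetical helper name.

```python
SAMPLE_RATE_HZ = 25.0  # sampling rate of the powermeter stated in the text


def summarize_power_trace(samples_w, sample_rate_hz=SAMPLE_RATE_HZ):
    """Summarize a list of evenly spaced power samples (in watts).

    Energy is approximated with the rectangle rule: each sample is taken
    to hold for one sampling interval.
    """
    if not samples_w:
        raise ValueError("the trace must contain at least one sample")
    dt = 1.0 / sample_rate_hz
    duration_s = len(samples_w) * dt
    energy_j = sum(samples_w) * dt
    return {
        "duration_s": duration_s,
        "energy_j": energy_j,
        "avg_power_w": energy_j / duration_s,
        "max_power_w": max(samples_w),
    }


if __name__ == "__main__":
    # Synthetic trace: 50 samples at a constant 100 W, i.e. 2 s of execution.
    print(summarize_power_trace([100.0] * 50))
```

With 25 samples per second the time resolution is 40 ms, so very short kernels contribute only a handful of samples to the estimate.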
4.3 Assessment of performance versus energy consumption
In this section, we analyze the processing times, energy consumption and maximum power obtained by different combinations of the modules reported in Section 2 for estimating the number of endmembers (VD and HySime), identifying the endmember signatures (OSP-GS and N-FINDR), and estimating the abundances (ULS and ISRA). In all cases, the number of endmembers to be extracted was set to p = 19. Table 4 shows the processing times, energy consumption and maximum power for different full unmixing chains formed using combinations of the aforementioned modules. In Table 4, we also indicate (in parentheses) the number of cores from the AMD Opteron 6172 system that were used in the multicore implementation of each method. It is important to emphasize that, in order to satisfy the real-time processing constraint for the AVIRIS Cuprite scene, we should be able to process it in less than 1.98 s, which is the time needed by the instrument to collect the data. As shown in Table 4, only the chain VD + OSP-GS + ULS achieved real-time performance, with two other chains (VD + OSP-GS + ISRA and VD + N-FINDR + ISRA) providing near real-time performance on the target multicore architecture. It is remarkable that the inclusion of HySime for the identification of the number of endmembers significantly increases both the processing time and the energy consumption.
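The real-time criterion stated above amounts to comparing the total time of a chain with the 1.98 s acquisition time of the 350 × 350-pixel scene. The sketch below encodes that check; the per-module times in the usage example are hypothetical placeholders rather than values from Table 4, and the function names are illustrative.

```python
ACQUISITION_TIME_S = 1.98  # time needed by the instrument to collect the scene
SCENE_PIXELS = 350 * 350   # spatial size of the Cuprite subset


def chain_time(module_times_s):
    """Total time of an unmixing chain given the time of each module."""
    return sum(module_times_s)


def is_real_time(module_times_s, budget_s=ACQUISITION_TIME_S):
    """True if the whole chain finishes within the acquisition time."""
    return chain_time(module_times_s) < budget_s


def required_throughput(pixels=SCENE_PIXELS, budget_s=ACQUISITION_TIME_S):
    """Pixels per second that a chain must sustain to run in real time."""
    return pixels / budget_s


if __name__ == "__main__":
    # Hypothetical module times (estimation, extraction, abundances), in seconds.
    print(is_real_time([0.5, 0.7, 0.6]))
    print(round(required_throughput()))
```

Under this criterion a chain must sustain roughly 62 thousand pixels per second over all three stages combined.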
In order to investigate the individual contributions of the methods in different parts of the unmixing chain to the total processing time, Table 5 reports the processing times measured for all the individual methods when processing the AVIRIS Cuprite scene. It can be observed that HySime is by far the most computationally expensive method, while VD provides an alternative for this part of the chain that is about 100× faster. Table 5 also reveals that the OSP-GS method used for endmember identification is about 9× faster than N-FINDR on the multicore platform. Finally, ISRA is more than 100× slower than ULS because it imposes the non-negativity constraint in the abundance estimation. The results in Table 5 suggest that the combination VD + OSP-GS + ULS provides a good basis for fast spectral unmixing on the considered multicore platform (in fact, this combination is the only one that results in real-time performance in our experiments).
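The cost difference between ULS and ISRA can be illustrated for a single pixel with a small sketch. It uses the textbook forms of both estimators, which may differ in detail from the implementations evaluated here: ULS solves the normal equations (MᵀM)a = Mᵀx once, whereas ISRA repeats the multiplicative update a ← a · (Mᵀx) / (MᵀM a), which keeps the abundances non-negative when M, x and the starting point are non-negative. The function names, the fixed iteration count and the toy data are assumptions made for illustration.

```python
def gram_and_rhs(M, x):
    """Return G = M^T M (p x p) and b = M^T x for an L x p matrix M (list of rows)."""
    p = len(M[0])
    G = [[sum(row[i] * row[j] for row in M) for j in range(p)] for i in range(p)]
    b = [sum(row[i] * xi for row, xi in zip(M, x)) for i in range(p)]
    return G, b


def uls(M, x):
    """Unconstrained least squares: solve the normal equations once."""
    G, b = gram_and_rhs(M, x)
    p = len(b)
    A = [G[i][:] + [b[i]] for i in range(p)]  # augmented matrix [G | b]
    for k in range(p):  # Gaussian elimination with partial pivoting
        piv = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[piv] = A[piv], A[k]
        for r in range(k + 1, p):
            f = A[r][k] / A[k][k]
            for c in range(k, p + 1):
                A[r][c] -= f * A[k][c]
    a = [0.0] * p
    for i in reversed(range(p)):  # back substitution
        s = sum(A[i][j] * a[j] for j in range(i + 1, p))
        a[i] = (A[i][p] - s) / A[i][i]
    return a


def isra(M, x, iterations=200):
    """Multiplicative ISRA update, repeated a fixed number of times.

    Assumes non-negative M and x so that every denominator stays positive.
    """
    G, b = gram_and_rhs(M, x)
    p = len(b)
    a = [1.0] * p  # strictly positive starting point
    for _ in range(iterations):
        Ga = [sum(G[i][j] * a[j] for j in range(p)) for i in range(p)]
        a = [a[i] * b[i] / Ga[i] for i in range(p)]
    return a


if __name__ == "__main__":
    # Toy example: 3 bands, 2 endmembers, pixel mixed with abundances (0.3, 0.7).
    M = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    x = [0.3, 0.7, 1.0]
    print(uls(M, x), isra(M, x))
```

ULS costs one small solve per pixel, while ISRA pays one matrix-vector product per iteration and per pixel, so its cost grows with the number of iterations needed to converge. A production code would presumably delegate the ULS solve to a LAPACK routine rather than the hand-written elimination shown here.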
For comparative purposes, Table 6 shows the processing times measured for the same methods reported in Table 5 when implemented on the NVidia™ GeForce GTX 580 GPU [76], which features 512 processor cores operating at 1.544 GHz, a total dedicated memory of 1,536 MB clocked at 2,004 MHz (with a 384-bit GDDR5 interface) and a memory bandwidth of 192.4 GB/s. The GPU is connected to an Intel Core i7 920 CPU at 2.67 GHz with eight cores, mounted on an Asus P6T7 WS SuperComputer motherboard. As shown in Table 6, the processing times on the GPU are similar to those reported for the multicore platform for the OSP-GS and ULS algorithms. On the other hand, the times for HySime, N-FINDR and ISRA are noticeably lower on the GPU, with only VD being slightly faster on the multicore platform than on the GPU. However, the energy consumption is much higher on the GPU, which still makes the multicore platform more attractive from an operational point of view.
5 Conclusions
In this article, we have addressed hyperspectral imaging via spectral unmixing on multicore processors, presenting a detailed evaluation of the performance and energy requirements of efficient parallel codes for all the stages in the spectral unmixing chain. Specifically, we have implemented modules for (i) the estimation of the number of endmembers, (ii) the identification of a collection of these, and (iii) the estimation of the fractional abundances, using kernels from highly tuned linear algebra libraries and OpenMP directives on a platform equipped with 48 AMD cores.
Our study offers two major conclusions:

One of the unmixing chains attains real-time performance (VD + OSP-GS + ULS) and two others come close to it (VD + OSP-GS + ISRA and VD + N-FINDR + ULS), as the underlying modules VD, OSP-GS, N-FINDR, ULS and ISRA exhibit a high degree of algorithmic concurrency that can be leveraged to yield efficient parallel implementations on current multicore processors. On the other hand, HySime presents a performance bottleneck that renders the chains that use this module inappropriate for real-time image processing.

With the expected increase in the number of cores in future architectures, shared-memory platforms equipped with a few multicore processors are a competitive approach to efficiently tackle computationally expensive hyperspectral imaging applications on cheap commodity hardware. Compared with FPGAs, conventional multicore processors offer the clear advantage of being much easier to program, which considerably shortens the software development cycle. Furthermore, general-purpose cores offer an appealing performance-energy ratio, and they are clearly better suited than GPUs to incorporating radiation-avoidance mechanisms.
As future work, we plan to analyze in more detail the energy consumption of other types of architectures, including FPGAs or GPUs, which are currently considered as candidate specialized hardware platforms for onboard hyperspectral image processing.
References
 1.
Goetz AFH, Vane G, Solomon JE, Rock BN: Imaging spectrometry for earth remote sensing. Science 1985, 228: 1147-1153. 10.1126/science.228.4704.1147
 2.
Green RO, Eastwood ML, Sarture CM, Chrien TG, Aronsson M, Chippendale BJ, Faust JA, Pavri BE, Chovit CJ, Solis M: Imaging spectroscopy and the airborne visible/infrared imaging spectrometer (AVIRIS). Remote Sens. Envir 1998, 65(3):227-248. 10.1016/S0034-4257(98)00064-9
 3.
Plaza A, Plaza J, Paz A, Sanchez S: Parallel hyperspectral image and signal processing. IEEE Signal Process. Mag 2011, 28(3):119-126.
 4.
Plaza A, Benediktsson JA, Boardman J, Brazile J, Bruzzone L, Camps-Valls G, Chanussot J, Fauvel M, Gamba P, Gualtieri J, Marconcini M, Tilton TC, Trianni G: Recent advances in techniques for hyperspectral image processing. Remote Sens. Envir 2009, 113: 110-122.
 5.
Plaza A, Du Q, Bioucas-Dias JM, Jia X, Kruse F: Foreword to the special issue on spectral unmixing of remotely sensed data. IEEE Trans. Geosci. Remote Sens 2011, 49(11):4103-4110.
 6.
Johnson PE, Smith MO, Taylor-George S, Adams JB: A semiempirical method for analysis of the reflectance spectra for binary mineral mixtures. J. Geophys. Res 1983, 88: 3557-3561. 10.1029/JB088iB04p03557
 7.
Adams JB, Smith MO, Johnson PE: Spectral mixture modeling: a new analysis of rock and soil types at the Viking Lander 1 site. J. Geophys. Res 1986, 91: 8098-8112. 10.1029/JB091iB08p08098
 8.
Keshava N, Mustard JF: Spectral unmixing. IEEE Signal Process. Mag 2002, 19: 44-57. 10.1109/79.974727
 9.
Plaza A, Martinez P, Perez R, Plaza J: A quantitative and comparative analysis of endmember extraction algorithms from hyperspectral data. IEEE Trans. Geosci. Remote Sens 2004, 42(3):650-663. 10.1109/TGRS.2003.820314
 10.
Du Q, Raksuntorn N, Younan NH, King RL: Endmember extraction for hyperspectral image analysis. Appl. Opt 2008, 47: F77-F84. 10.1364/AO.47.000F77
 11.
Chang CI, Heinz D: Constrained subpixel target detection for remotely sensed imagery. IEEE Trans. Geosci. Remote Sens 2000, 38: 1144-1159. 10.1109/36.843007
 12.
Heinz D, Chang CI: Fully constrained least squares linear mixture analysis for material quantification in hyperspectral imagery. IEEE Trans. Geosci. Remote Sens 2001, 39: 529-545. 10.1109/36.911111
 13.
Borel CC, Gerstl SAW: Nonlinear spectral mixing model for vegetative and soil surfaces. Remote Sens. Envir 1994, 47(3):403-416. 10.1016/0034-4257(94)90107-4
 14.
Liu W, Wu EY: Comparison of nonlinear mixture models. Remote Sens. Envir 2004, 18: 1976-2003.
 15.
Raksuntorn N, Du Q: Nonlinear spectral mixture analysis for hyperspectral imagery in an unknown environment. IEEE Geosci. Remote Sens. Lett 2010, 7(4):836-840.
 16.
Plaza A, Martin G, Plaza J, Zortea M, Sanchez S: Recent developments in spectral unmixing and endmember extraction. In Optical Remote Sensing. Edited by: Prasad S, Bruce LM, Chanussot J. Berlin, Germany: Springer; 2011:235-267.
 17.
Settle JJ, Drake NA: Linear mixing and the estimation of ground cover proportions. Int. J. Remote Sens 1993, 14: 1159-1177. 10.1080/01431169308904402
 18.
Harsanyi JC, Chang CI: Hyperspectral image classification and dimensionality reduction: An orthogonal subspace projection. IEEE Trans. Geosci. Remote Sens 1994, 32(4):779-785. 10.1109/36.298007
 19.
Boardman JW, Kruse FA, Green RO: Mapping target signatures via partial unmixing of AVIRIS data. In Proc. JPL Airborne Earth Science Workshop. USA: Pasadena; 1995:23-26.
 20.
Bowles JH, Palmadesso PJ, Antoniades JA, Baumback MM, Rickard LJ: Use of filter vectors in hyperspectral data analysis. Proc. SPIE Infrared Spaceborne Remote Sens. III 1995, 2553: 148-157. 10.1117/12.221352
 21.
Neville RA, Staenz K, Szeredi T, Lefebvre J, Hauff P: Automatic endmember extraction from hyperspectral data for mineral exploration. In Proc. 21st Canadian Symp. Remote Sens. Canada: Ottawa; 1999:21-24.
 22.
Ifarraguerri A, Chang CI: Multispectral and hyperspectral image analysis with convex cones. IEEE Trans. Geosci. Remote Sens 1999, 37(2):756-770. 10.1109/36.752192
 23.
Winter ME: N-FINDR: An algorithm for fast autonomous spectral endmember determination in hyperspectral data. Proc. SPIE 1999, 3753: 266-277. 10.1117/12.366289
 24.
Du Q, Ren H, Chang CI: A comparative study for orthogonal subspace projection and constrained energy minimization. IEEE Trans. Geosci. Remote Sens 2003, 41(6):1525-1529. 10.1109/TGRS.2003.813704
 25.
Berman M, Kiiveri H, Lagerstrom R, Ernst A, Dunne R, Huntington JF: ICE: a statistical approach to identifying endmembers in hyperspectral images. IEEE Trans. Geosci. Remote Sens 2004, 42(10):2085-2095.
 26.
Nascimento JMP, Bioucas-Dias JM: Vertex component analysis: a fast algorithm to unmix hyperspectral data. IEEE Trans. Geosci. Remote Sens 2005, 43(4):898-910.
 27.
Plaza A, Chang CI: Impact of initialization on design of endmember extraction algorithms. IEEE Trans. Geosci. Remote Sens 2006, 44(11):3397-3407.
 28.
Chang CI, Plaza A: A fast iterative algorithm for implementation of pixel purity index. IEEE Geosci. Remote Sens. Lett 2006, 3: 63-67. 10.1109/LGRS.2005.856701
 29.
Rogge DM, Rivard B, Zhang J, Feng J: Iterative spectral unmixing for optimizing per-pixel endmember sets. IEEE Trans. Geosci. Remote Sens 2006, 44(12):3725-3736.
 30.
Wang J, Chang CI: Applications of independent component analysis in endmember extraction and abundance quantification for hyperspectral imagery. IEEE Trans. Geosci. Remote Sens 2006, 44(9):2601-2616.
 31.
Chang CI, Wu CC, Liu W, Ouyang YC: A new growing method for simplex-based endmember extraction algorithm. IEEE Trans. Geosci. Remote Sens 2006, 44(10):2804-2819.
 32.
Zare A, Gader P: Sparsity promoting iterated constrained endmember detection for hyperspectral imagery. IEEE Geosci. Remote Sens. Lett 2007, 4(3):446-450.
 33.
Zare A, Gader P: Hyperspectral band selection and endmember detection using sparsity promoting priors. IEEE Geosci. Remote Sens. Lett 2008, 5(2):256-260.
 34.
Zortea M, Plaza A: A quantitative and comparative analysis of different implementations of N-FINDR: a fast endmember extraction algorithm. IEEE Geosci. Remote Sens. Lett 2009, 6: 787-791.
 35.
Tao X, Wang B, Zhang L: Orthogonal bases approach for the decomposition of mixed pixels in hyperspectral imagery. IEEE Geosci. Remote Sens. Lett 2009, 6: 219-223.
 36.
Zare A, Gader P: PCE: piecewise convex endmember detection. IEEE Trans. Geosci. Remote Sens 2010, 48(6):2620-2632.
 37.
Chang CI, Wu CC, Lo CS, Chang ML: Real-time simplex growing algorithms for hyperspectral endmember extraction. IEEE Trans. Geosci. Remote Sens 2010, 48(4):1834-1850.
 38.
Schmidt F, Schmidt A, Tréguier E, Guiheneuf M, Moussaoui S, Dobigeon N: Implementation strategies for hyperspectral unmixing using Bayesian source separation. IEEE Trans. Geosci. Remote Sens 2010, 48(11):4003-4013.
 39.
Duran O, Petrou M: Robust endmember extraction in the presence of anomalies. IEEE Trans. Geosci. Remote Sens 2011, 49(6):1986-1996.
 40.
Zhang B, Sun X, Gao L, Yang L: Endmember extraction of hyperspectral remote sensing images based on the ant colony optimization (ACO) algorithm. IEEE Trans. Geosci. Remote Sens 2011, 49(7):2635-2646.
 41.
Shoshany M, Kizel F, Netanyahu N, Goldshlager N, Jarmer T, Even-Tzur G: An iterative search in endmember fraction space for spectral unmixing. IEEE Geosci. Remote Sens. Lett 2011, 8(4):706-709.
 42.
Chang CI, Wu CC, Chen HM: Random pixel purity index. IEEE Geosci. Remote Sens. Lett 2010, 7(2):324-328.
 43.
Dowler S, Andrews M: On the convergence of N-FINDR and related algorithms: to iterate or not to iterate. IEEE Geosci. Remote Sens. Lett 2011, 8: 4-8.
 44.
Plaza A, Valencia D, Plaza J, Martinez P: Commodity cluster-based parallel processing of hyperspectral imagery. J. Parall. Distr. Comput 2006, 66(3):345-358. 10.1016/j.jpdc.2005.10.001
 45.
Plaza A, Chang CI: High Performance Computing in Remote Sensing. Boca Raton, FL: Taylor & Francis; 2007.
 46.
Plaza A, Du Q, Chang YL, King RL: High performance computing for hyperspectral remote sensing. IEEE J. Sel. Top. Appl. Earth Obser. Remote Sens 2011, 4(3):528-544.
 47.
Lee CA, Gasster SD, Plaza A, Chang CI, Huang B: Recent developments in high performance computing for remote sensing: a review. IEEE J. Sel. Top. Appl. Earth Obser. Remote Sens 2011, 4(3):508-527.
 48.
Plaza A, Chang CI: Clusters versus FPGA for parallel processing of hyperspectral imagery. Int. J. High Perfor. Comput. Appl 2008, 22(4):366-385. 10.1177/1094342007088376
 49.
Gonzalez C, Resano J, Mozos D, Plaza A, Valencia D: FPGA implementation of the pixel purity index algorithm for remotely sensed hyperspectral image analysis. EURASIP J. Adv. Signal Process 2010, 969806: 1-13.
 50.
Gonzalez C, Mozos D, Resano J, Plaza A: FPGA implementation of the N-FINDR algorithm for remotely sensed hyperspectral image analysis. IEEE Trans. Geosci. Remote Sens 2012, 50(2):374-388.
 51.
Setoain J, Prieto M, Tenllado C, Tirado F: GPU for parallel onboard hyperspectral image processing. Int. J. High Perfor. Comput. Appl 2008, 22(4):424-437. 10.1177/1094342007088379
 52.
Hsueh M, Chang CI: Field programmable gate arrays (FPGA) for pixel purity index using blocks of skewers for endmember extraction in hyperspectral imagery. Int. J. High Perfor. Comput. Appl 2008, 22: 408-423. 10.1177/1094342007088378
 53.
Tarabalka Y, Haavardsholm TV, Kasen I, Skauli T: Real-time anomaly detection in hyperspectral images using multivariate normal mixture models and GPU processing. J. Real-Time Image Process 2009, 4: 1-14. 10.1007/s11554-009-0112-6
 54.
Yang H, Du Q, Chen G: Unsupervised hyperspectral band selection using graphics processing units. IEEE J. Sel. Top. Appl. Earth Obser. Remote Sens 2011, 4(3):660-668.
 55.
Goodman JA, Kaeli D, Schaa D: Accelerating an imaging spectroscopy algorithm for submerged marine environments using graphics processing units. IEEE J. Sel. Top. Appl. Earth Obser. Remote Sens 2011, 4(3):669-676.
 56.
Mielikainen J, Huang B, Huang A: GPU-accelerated multi-profile radiative transfer model for the infrared atmospheric sounding interferometer. IEEE J. Sel. Top. Appl. Earth Obser. Remote Sens 2011, 4(3):691-700.
 57.
Christophe E, Michel J, Inglada J: Remote sensing processing: from multicore to GPU. IEEE J. Sel. Top. Appl. Earth Obser. Remote Sens 2011, 4(3):643-652.
 58.
Sanchez S, Paz A, Martin G, Plaza A: Parallel unmixing of remotely sensed hyperspectral images on commodity graphics processing units. Concur. Comput. Pract. Exp 2011, 23(13):1538-1557. 10.1002/cpe.1720
 59.
Chang CI, Du Q: Estimation of number of spectrally distinct signal sources in hyperspectral imagery. IEEE Trans. Geosci. Remote Sens 2004, 42(3):608-619. 10.1109/TGRS.2003.819189
 60.
Bioucas-Dias JM, Nascimento JMP: Hyperspectral subspace identification. IEEE Trans. Geosci. Remote Sens 2008, 46(8):2435-2445.
 61.
Lopez S, Horstrand P, Callico GM, Lopez JF, Sarmiento R: A low-computational-complexity algorithm for hyperspectral endmember extraction: modified vertex component analysis. IEEE Geosci. Remote Sens. Lett 2012, 9(3):502-506.
 62.
Bernabé S, López S, Plaza A, Sarmiento R: GPU implementation of an automatic target detection and classification algorithm for hyperspectral image analysis. IEEE Geosci. Remote Sens. Lett 2013, 10(2):221-225. 10.1109/LGRS.2012.2198790
 63.
Remón A, Sanchez S, Paz A, Quintana-Ortí ES, Plaza A: Real-time endmember extraction on multicore processors. IEEE Geosci. Remote Sens. Lett 2011, 8(5):924-928.
 64.
Richards JA, Jia X: Remote Sensing Digital Image Analysis: An Introduction. New York: Springer-Verlag; 2006.
 65.
Chang CI: Hyperspectral Imaging: Techniques for Spectral Detection and Classification. New York: Kluwer Academic/Plenum Publishers; 2003.
 66.
Bateson CA, Asner GP, Wessman CA: Endmember bundles: a new approach to incorporating endmember variability into spectral mixture analysis. IEEE Trans. Geosci. Remote Sens 2000, 38(2):1083-1094. 10.1109/36.841987
 67.
Iordache MD, Bioucas-Dias J, Plaza A: Sparse unmixing of hyperspectral data. IEEE Trans. Geosci. Remote Sens 2011, 49(6):2014-2039.
 68.
Daube-Witherspoon ME, Muehllehner G: An iterative image space reconstruction algorithm suitable for volume ECT. IEEE Trans. Med. Imag 1986, 5: 61-66.
 69.
Lawson CL, Hanson RJ, Kincaid DR, Krogh FT: Basic linear algebra subprograms for Fortran usage. ACM Trans. Math. Soft 1979, 5(3):308-323. 10.1145/355841.355847
 70.
Dongarra JJ, Du Croz J, Hammarling S, Hanson RJ: An extended set of FORTRAN basic linear algebra subprograms. ACM Trans. Math. Soft 1988, 14: 1-17. 10.1145/42288.42291
 71.
Dongarra JJ, Du Croz J, Hammarling S, Duff I: A set of level 3 basic linear algebra subprograms. ACM Trans. Math. Soft 1990, 16: 1-17. 10.1145/77626.79170
 72.
Anderson E, Bai Z, Bischof C, Blackford LS, Demmel J, Dongarra JJ, Croz JD, Hammarling S, Greenbaum A, McKenney A, Sorensen D: LAPACK Users’ guide. Philadelphia: SIAM; 1999.
 73.
The OpenMP API specification for parallel programming [http://www.openmp.org/]
 74.
NASA: AVIRIS (Airborne Visible/Infrared Imaging Spectrometer) - Data. aviris.jpl.nasa.gov/data/free_data.html
 75.
U.S. Geological Survey (U.S. Dep. of the Interior): USGS Spectroscopy Lab - Spectral Library. speclab.cr.usgs.gov/spectrallib.html.
 76.
NVIDIA Corporation: NVIDIA GTX580 specifications. [http://www.nvidia.com/object/productgeforcegtx580us.html]
Acknowledgements
Enrique S Quintana-Ortí and Alfredo Remón were supported by the CICYT project TIN2011-23283 of the Ministerio de Economía y Competitividad and FEDER. Funding from the Spanish Ministry of Science and Innovation (CEOS-SPAIN project, reference AYA2011-29334-C02-02) is also gratefully acknowledged.
Additional information
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Remón, A., Sánchez, S., Bernabé, S. et al. Performance versus energy consumption of hyperspectral unmixing algorithms on multicore platforms. EURASIP J. Adv. Signal Process. 2013, 68 (2013). https://doi.org/10.1186/1687-6180-2013-68
Keywords
 Hyperspectral Image
 Hyperspectral Data
 Spectral Unmixing
 Energy Consumption Ratio
 Principal Component Transform