EURASIP Journal on Applied Signal Processing 2004:4, 1–8
© 2004 Hindawi Publishing Corporation

Optimization of Color Conversion for Face Recognition

This paper concerns the conversion of color images to monochromatic form for the purpose of human face recognition. Many face recognition systems operate using monochromatic information alone even when color images are available. In such cases, simple color transformations are commonly used that are not optimal for the face recognition task. We present a framework for selecting the transformation from face imagery using one of three methods: Karhunen-Loève analysis, linear regression of color distribution, and a genetic algorithm. Experimental results are presented for both the well-known eigenface method and for extraction of Gabor-based face features to demonstrate the potential for improved overall system performance. Using a database of 280 images, our experiments using these methods resulted in performance improvements of approximately 4% to 14%.


INTRODUCTION
Most single-view face recognition systems operate using intensity (monochromatic) information alone. This is true even for systems that accept color imagery as input. The reason for this is not that multispectral data is lacking in information content, but often because of practical considerations: difficulties associated with illumination and color balancing, for example, as well as compatibility with legacy systems. Associated with this is a lack of color image databases with which to develop and test new algorithms. Although work is in progress that will eventually aid in color-based tasks (e.g., through color constancy [1]), those efforts are still in the research stage.
When color information is present, most of today's face recognition systems convert the image to monochromatic form using simple transformations. For example, a common mapping [2,3] produces an intensity value $I_i$ by taking the average of the red, green, and blue (RGB) values ($I_r$, $I_g$, and $I_b$, respectively):

$$I_i = \frac{I_r + I_g + I_b}{3}. \tag{1}$$

The resulting image is then used for feature extraction and analysis.
We argue that more effective system performance is possible if a color transformation is chosen that better matches the task at hand. For example, the mapping in (1) implicitly assumes a uniform distribution of color values over the entire color space. For a task such as face recognition, color values tend to be more tightly confined to a small portion of the color space, and it is possible to exploit this narrow concentration during color conversion. If the transformation is selected based on the expected color distribution, then it is reasonable to expect improved recognition accuracies. This paper presents a task-oriented approach for selecting the color-to-grayscale image transformation. Our intended application is face recognition, although the framework that we present is applicable to other problem domains.
We assume that frontal color views of the human face are available, and we develop a method for selecting alternate weightings of the separate color values in computing a single monochromatic value. Given the rich color content of the human face, it is desirable to maximize the use of this content even when full-color computation and matching is not used. As an illustration of this framework, we have used the Karhunen-Loève (KL) transformation (also known as principal components analysis) of observed distributions in the color space to determine the improved mapping.
Other work [4] has suggested that alternative color spaces provide no real benefit for locating skin in images because these spaces do not increase the separability of the skin and nonskin classes. However, to extract features for face recognition, we do not wish to discriminate skin from nonskin regions, but rather to extract meaningful image features within the skin area. Queisser [5] used the properties of color distributions of a set of similar images to select a new color space for object classification. Abbott and Zhao [6,7] developed a color-space quantization approach for the recognition of naturally textured objects, but did not consider that for face recognition. Torres has demonstrated that color information can provide additional accuracy for the "eigenface" approach [8], although there is no discussion of optimal color representation. Heseltine et al. [9] measured the performance effect on eigenface-based face recognition of a number of preprocessing techniques, including several color transformations (RGB to hue, brightness-insensitive hue, etc.) and found that these color methods actually degraded the recognition accuracy. However, the techniques that they explored were general color transformations that were not based on the content of the images.
The remainder of this paper is organized as follows. Section 2 presents our approach for using KL analysis to determine a suitable single color axis for a given set of RGB images, and Section 3 presents experimentally derived color transformation data using this method. In Section 4, we investigate the use of KL analysis on color data in CIE L-a-b format. Section 5 describes an alternative method based on linear regression analysis of RGB pixel data, while Section 6 discusses our experimental use of a genetic algorithm to select the color conversion. Section 7 presents the face recognition accuracy improvement observed with the eigenface method from using the KL derived color transformation, and Section 8 describes the effect of the optimal color conversion on feature vectors ("jets") extracted using complex Gabor filters. Finally, Section 9 presents concluding remarks.

KL COLOR CONVERSION-RGB
Pixels in the original color image can be represented as the vector $\mathbf{I}(x, y) = [I_r(x, y)\ I_g(x, y)\ I_b(x, y)]^T$, where the r, g, and b subscripts denote the red, green, and blue color planes, respectively. As described in (1), face recognition systems typically use an intensity plane derived as $I_i(x, y) = \frac{1}{3}[1\ 1\ 1]\,\mathbf{I}(x, y)$. We propose that human face images exhibit common characteristics that can be exploited in the conversion from a full-color representation to a monochrome image. In the hue-saturation plane, for example, face pixels from a mixture of ethnic groups are well clustered [10], with only the intensity plane varying markedly. This suggests that the standard intensity plane is in fact more sensitive to variation due to ethnic type, which is undesirable.
To determine an improved linear transformation, we want to find the optimum transformation vector $\mathbf{w}$ such that $M(x, y) = \mathbf{w}^T \mathbf{I}(x, y)$, where $\mathbf{I}$ is the original color image and $M$ is the resulting single-plane image. We make the assumption that the optimum transformation corresponds closely to the expected distribution of pixel values within the original color space. With this in mind, it is possible to select $\mathbf{w}$ by using the KL transformation to determine the projection with uncorrelated axes. The resulting color space has been called the "Karhunen-Loève color space" for an unspecified pixel population [11,12]; here, we specifically restrict it to the face area. For a given distribution of pixel values, the eigenvector corresponding to the largest eigenvalue defines the direction along which the data is the least correlated, and therefore most likely to be of use in recognition tasks.
The KL transformation is determined from the covariance matrix of the distribution. For this application, the input data are the ensemble of pixel values from a set of training images, taken from the region containing the face. We form the covariance matrix $S$ from $p$, the collection of $M$ color pixel vectors. The KL transformation is then given by the eigenvectors $\{u_i\}$ of $S$, concatenated into the matrix $U = [u_1\ u_2\ u_3]$. The eigenvector $u_1$, associated with the largest eigenvalue, is of primary interest here; it represents the direction of most variability in the data within the original space. Projection of RGB values onto this axis represents a color-to-grayscale conversion with the highest potential for discrimination.
The normalization of the conversion vector $\mathbf{w}$ requires consideration. A unit vector will, by definition, not change the magnitude of the vector quantity that it operates on. However, this is not appropriate for conversion of three-component color quantities (where each component can range up to full scale) to monochrome, since any three-color vector with magnitude greater than unity will saturate in the monochrome plane. We prevent saturation by normalizing the vector having RGB components at full scale to a magnitude of 1. Therefore, the conversion vectors that we compute are normalized accordingly.
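The procedure in this section can be illustrated with a minimal sketch. It assumes the face pixels are already gathered into an array with one RGB vector per row; the function names `kl_conversion_vector` and `to_monochrome` are hypothetical, and the sample covariance from NumPy stands in for the covariance matrix $S$. The sketch returns the unit-length eigenvector $u_1$ and does not apply any further normalization.

```python
import numpy as np

def kl_conversion_vector(pixels):
    """Return the unit-length principal eigenvector u1 of the RGB covariance.

    pixels: array of shape (M, 3), one RGB vector per row (assumed layout).
    """
    pixels = np.asarray(pixels, dtype=float)
    # 3 x 3 sample covariance of the color channels (stands in for S).
    S = np.cov(pixels, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(S)
    u1 = eigvecs[:, -1]
    # An eigenvector's sign is arbitrary; pick the one with a positive
    # component sum so that brighter pixels map to brighter gray levels.
    if u1.sum() < 0:
        u1 = -u1
    return u1

def to_monochrome(image, w):
    """Project an (H, W, 3) RGB image onto the conversion vector w."""
    return np.asarray(image, dtype=float) @ np.asarray(w, dtype=float)

# Equal-weight conversion of (1) as a special case of the same projection.
gray = to_monochrome(np.ones((2, 2, 3)), np.full(3, 1.0 / 3.0))
```

Passing the equal-weight vector reproduces the mapping in (1), so the same projection routine covers both the standard conversion and a data-derived one.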

RESULTS OF KL ANALYSIS ON RGB DATA
The images used in this study are frontal-view, color face images from two databases (described in [13,14]). Each image is of size 240 rows by 300 columns. Prior to this study, the images were spatially registered so that the centers of the eye sockets are at fixed locations, the line between the eye centers is horizontal, and the distance between eye centers is 60 pixels, in accordance with developing standards for face recognition image interchange [15]. No effort was made to color-correct or contrast-equalize the images. To determine the color conversion that is most suited for the face features, we process only a portion of the face image that represents the area of the face with minimal included background and hair. The extent to be processed, a region 90 pixels wide by 140 pixels high, is indicated in Figure 1.
The KL analysis described in Section 2 yields an eigenvector $u_1$ describing the axis of projection with the largest variance in the original data, which we call the conversion vector. Let the three components of this vector be represented by $u_1 = [u_{11}\ u_{12}\ u_{13}]^T$. Because this vector has unit length, $r = \sqrt{u_{11}^2 + u_{12}^2 + u_{13}^2} = 1$, we can represent it using spherical coordinates and completely describe the color mapping by the two angular quantities θ and φ.
To illustrate the meaningfulness of the transformation, several scatter diagrams are shown in Figure 2. Four collections of face images are represented as well as some natural images of random content. For each image, the color histogram was computed and the conversion vector $u_1$ obtained. The resulting conversion vectors are indicated as points in [φ, θ] space. Each face image collection consists of several sets of 21 images, each for a single individual. The natural images contain a mix of object types including landscapes, photographs of sporting events, and astronomy images.
It can be seen that the optimal color conversion vectors $u_1$ computed for the face images are distinct from those for more general natural images, indicating that the red, green, and blue color planes carry different degrees of information for the specific class of face images. The figure also indicates the position in this space of an equal-weighted color conversion, which appears to represent a good estimate for the optimal conversion for general natural images, but is not well suited for the face image collections. The face databases used in our testing contain color distributions that generally correspond to [φ = 1.01, θ = 0.662], which in turn corresponds to a conversion vector that should be compared with the equal-weighted values of $[0.333\ 0.333\ 0.333]^T$. We observe that the KL procedure for these images results in a color space that more heavily weights the red color component than the green and blue. This indicates that face images contain more uncorrelated variation in the red plane than in the green or blue planes. Note that the preceding eigenvalue-eigenvector analysis concerns only the color-to-monochrome conversion process, and is independent of the face recognition approach that is used. We propose that any face recognition technique could benefit from a careful examination of the initial conversion from color to monochrome images.

Figure 1: Illustration of image extent to be processed for color conversion and recognition. This monochrome image is an "average" image.

KL COLOR CONVERSION-L-a-b
RGB is not always the most convenient space in which to process color information. The CIE tristimulus system represents a color in terms of its three coordinates relative to a reference color, usually a standard illuminant [16]. However, equal distances in the XYZ space are perceived as unequal, so the L-a-b color space is defined so that color distances are perceived as linear.
The L-a-b space is defined in [16]. Our KL-based approach for selecting the color conversion produces a linear transformation of the RGB color values; thus, we could expect that using the KL process on the XYZ values would produce the same result within computation accuracy. However, the relation between RGB and L-a-b is nonlinear, and the L-a-b space is in some sense more relevant to human perception, so that application of the KL procedure defined in Section 2 would be expected to produce useful results.
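For reference, the standard CIE 1976 definition of the L-a-b coordinates in terms of the tristimulus values $X$, $Y$, $Z$ and the reference white $(X_n, Y_n, Z_n)$ is given below. This is the textbook form and may differ in notation or in the treatment of the low-intensity branch from the form used in [16].

```latex
\begin{aligned}
L^{*} &= 116\, f\!\left(\frac{Y}{Y_n}\right) - 16, \\
a^{*} &= 500 \left[ f\!\left(\frac{X}{X_n}\right) - f\!\left(\frac{Y}{Y_n}\right) \right], \\
b^{*} &= 200 \left[ f\!\left(\frac{Y}{Y_n}\right) - f\!\left(\frac{Z}{Z_n}\right) \right], \\
f(t) &=
\begin{cases}
t^{1/3}, & t > \delta^{3}, \\[4pt]
\dfrac{t}{3\delta^{2}} + \dfrac{4}{29}, & \text{otherwise},
\end{cases}
\qquad \delta = \frac{6}{29}.
\end{aligned}
```

The cube root in $f$ is the source of the nonlinearity between RGB (or XYZ) and L-a-b noted above.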
In fact, as can be seen in Figure 3, the KL transformation on L-a-b data does not yield distinctive data for face pixels as opposed to image pixels from more general scenes. This suggested that the "optimal" color conversion obtained from L-a-b data does not provide any beneficial added feature content. Experimentation with the eigenvalues of face images converted to L-a-b representation, and then projected onto the axis found by using KL on the resulting histogram data (as described in Section 7) showed that this was the case; information contained in the most significant n axes was not greater (and in fact frequently less) than that for the L plane of the corresponding L-a-b images. It is possible that a transformation resulting in a linear perception of color distance inherently concentrates useful detail information in the L plane.

COLOR CONVERSION THROUGH LINEAR REGRESSION
Queisser discusses (in [5]) the use of a least-squared-error line-fit to RGB data to define a new color axis that is best suited to images of a particular class of object. In his study, images of wood panels and food products were shown to be more suited for object detection and inspection in the resulting single-color plane than in any of the HSI axes. The other axes relate to additional magnitude and chromaticity information.
We consider a similar approach in the RGB space. We performed least-squared-error fits to our RGB data with the added constraint that the new axis of projection should pass through the RGB origin. The purpose of this is to force a pixel with zero in all color planes to map to a black pixel in the new space. Applying the resulting transformation (7) to the pixel data from the face box areas of the sample databases, we obtain the data presented in Figure 2 as the "line-fit" data. As before, we are only interested in the primary axis, β in this transformation. The results are very similar to those obtained by the KL method, but with much lower computational cost because only the red, green, and blue sample means are required.

COLOR CONVERSION THROUGH GENETIC ALGORITHM SEARCH
To further investigate the determination of the color projection by optimizing the face recognition accuracy, we applied a genetic algorithm to the color vector selection process. Each individual in the population consisted of a [φ, θ] pair as defined in (3). The optimization algorithm had the following properties:
(i) population size of 100;
(ii) breeding by averaging of [φ, θ] values;
(iii) population initialized with random values;
(iv) "roulette wheel" selection model with elitism (the best two candidates in each generation will persist [17]);
(v) mutation by perturbation of a random individual (probability of mutation was 0.005).
This study therefore attempted to maximize the performance of the face recognition system as simulated by the sum of the largest 8 eigenvalues.
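The properties listed above can be sketched as a small search loop. This is an illustration under stated assumptions, not the implementation used in the study: the function name `run_ga`, the [0, π/2] initialization range, the Gaussian perturbation with width `sigma`, and the per-child application of the mutation probability are all assumptions. The fitness function is supplied by the caller; in the study it would be the sum of the largest 8 eigenvalues.

```python
import numpy as np

def run_ga(fitness, generations=100, pop_size=100, p_mut=0.005,
           n_elite=2, sigma=0.1, seed=0):
    """Search for the [phi, theta] pair that maximizes fitness(phi, theta)."""
    rng = np.random.default_rng(seed)
    # (iii) random initial population; the [0, pi/2] range is an assumption.
    pop = rng.uniform(0.0, np.pi / 2, size=(pop_size, 2))
    best_history = []
    for _ in range(generations):
        scores = np.array([fitness(phi, theta) for phi, theta in pop])
        order = np.argsort(scores)[::-1]
        best_history.append(scores[order[0]])
        # (iv) elitism: the best two candidates persist unchanged.
        elite = pop[order[:n_elite]]
        # (iv) roulette wheel: selection probability proportional to
        # fitness, shifted so that every weight is positive.
        weights = scores - scores.min() + 1e-12
        probs = weights / weights.sum()
        n_children = pop_size - n_elite
        parents_a = pop[rng.choice(pop_size, size=n_children, p=probs)]
        parents_b = pop[rng.choice(pop_size, size=n_children, p=probs)]
        # (ii) breeding by averaging of [phi, theta] values.
        children = 0.5 * (parents_a + parents_b)
        # (v) mutation by perturbation (Gaussian noise is an assumption).
        mutate = rng.random(n_children) < p_mut
        children[mutate] += rng.normal(0.0, sigma, size=(mutate.sum(), 2))
        pop = np.vstack([elite, children])
    scores = np.array([fitness(phi, theta) for phi, theta in pop])
    best_history.append(scores.max())
    return pop[np.argmax(scores)], best_history
```

Because the elite candidates are carried over unchanged, the best score never decreases from one generation to the next, so any further gain after the population has converged by averaging must come from mutation.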
The results of 100 generations of testing on one sample database are summarized in Table 1. The testing data suggested that the error surface was slowly changing and not unimodal. After only 100 iterations, the convergence is clearly dominated by the effect of mutation rather than breeding. The resulting vector differs in error from the results obtained by KL computation by only 0.00077.
The advantage of the genetic algorithm for this purpose is its flexibility in that it is possible to define the error function in terms of any computable metric of overall system performance. For example, this could be biased toward a particular combination of Type I (false positive) and Type II (false negative) errors of recognition on a given database. The major disadvantage of this method is its computational requirements. For a relatively modest database of fifty individuals, 100 generations took more than six hours to run on a 1 GHz Pentium III machine. In addition, the genetic algorithm has unpredictable convergence behavior and a set of performance parameters that may require tuning. Our experimentation with a GA roughly confirmed the earlier computed results.

EFFECT OF OPTIMIZED COLOR CONVERSION ON FACE RECOGNITION ACCURACY
To evaluate the effect of our color conversion method on face recognition accuracy, we considered the effect on performance of the well-known eigenface method [18,19]. This technique uses principal components analysis of a collection of face images, treated as one-dimensional vectors, to determine the linear combinations of pixel locations that form the best projective axes for the collection. Early work in this area focused on the use of a small set of these projections to adequately represent a face image, while later work (beginning around 1990) applied this same technique to recognition. The new "face space" defined by the M most significant basis vectors, called "eigenfaces", is used for pattern recognition based on a distance measure. For any principal component analysis, the ratio of an eigenvalue to the sum of all the eigenvalues is proportional to the mean squared error implied by exclusion of the corresponding eigenvector [20]. Thus, we can examine the cumulative sum of eigenvalues 1 through n, plotted versus n, to compare the information contained in the first n eigenfaces (the "principal components"). In this way, we can predict the performance of the eigenface method on the two databases. Table 2 shows the individual and cumulative eigenvalues for a typical database of face images. Figure 4 shows a plot of the cumulative eigenvalues, which gives a measure of the accuracy achievable by truncating all higher eigenvalues. Using the optimized color conversion produces a modest, yet consistent, improvement in the potential accuracy. The increased information contained is more pronounced for the more significant eigenvalues.
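The cumulative-eigenvalue comparison described above can be sketched as follows. The sketch assumes the monochrome face images are flattened into the rows of an array; the function name `cumulative_eigenvalue_fraction` is hypothetical, and the eigenvalues are obtained from the singular values of the mean-centered data rather than from an explicit covariance matrix.

```python
import numpy as np

def cumulative_eigenvalue_fraction(images):
    """Fraction of total variance captured by the first n eigenfaces.

    images: (N, P) array, one flattened monochrome face image per row.
    Returns an array whose entry n-1 is the sum of eigenvalues 1..n
    divided by the sum of all eigenvalues.
    """
    X = np.asarray(images, dtype=float)
    X = X - X.mean(axis=0)  # subtract the average face
    # Singular values come back in descending order; the covariance
    # eigenvalues are s**2 / (N - 1).
    s = np.linalg.svd(X, compute_uv=False)
    eigvals = s ** 2 / (X.shape[0] - 1)
    return np.cumsum(eigvals) / eigvals.sum()
```

Running this once on images converted with the equal-weight vector and once on images converted with the derived vector gives two curves that can be compared in the manner of Figure 4.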
By comparison, we also evaluated the magnitude of the initial eigenvalues for the eigenface method when using the line-fit method described in Section 5. The cumulative eigenvalues computed by using the β axis as the new image plane are shown in Table 3 and exhibit a similar increase in information in the lowest eigenfaces. In fact, for all of the databases we examined, the use of the line fit gave essentially equal performance as measured by the normalized eigenvalues.
For confirmation of these predictions of increased performance, we measured the face recognition accuracy on a complete eigenface recognition implementation. We will not describe the specifics of the eigenface method here as they are covered well in [18,19]. For our test, a training phase and a test phase were implemented. The training phase computes the desired transformation by solving for the eigenvalues of the matrix composed of the concatenation of the training images. Testing is performed by applying this transformation to a set of probe images of the same individuals and measuring the Euclidean distance from the probe image data to the exemplars of each individual, defined as the average in "face space" of each training image of that individual. The probe images were not present in the training set. Note that the eigenface implementation was fairly simplistic; our objective was not to achieve overall high recognition accuracy but to measure the effect of using our color conversion.
To measure the performance in a consistent fashion, we adopted the method used in the NIST FERET studies [21]. The results for each probe image are ranked in order of increasing Euclidean distance. The performance score for a particular experiment $R_n$ is defined to be the ratio of the number of times that the correct identity is in the top n candidates (the n nearest exemplars to the probe image) to the total number of probe images tested. Table 4 summarizes the eigenface performance for three values of n (2, 5, and 10) for a particularly difficult database of 280 images. Many of the images exhibit poor contrast, and there is significant variation in expression by the human subjects. Two sets of results are shown: the first for a typical equal-weighted conversion from RGB to monochrome and the second for a transformation vector derived using the KL procedure described above. The results show significant improvements in performance scores (roughly in the range of 6% to 14%) when the KL conversion was used. Although the database was relatively small, and therefore care must be taken in extrapolating these accuracy values to larger sets, they provide a strong indication that the color conversion process can have a sizable impact on face recognition performance.

Figure 4: Comparison of cumulative eigenvalues for the eigenface procedure. The optimized RGB to monochrome conversion results in more significant information in the first n eigenfaces.

Because the face images had a noticeable increase in contrast as a result of the KL derived RGB to monochrome transformation, there was a concern that the KL derived method was doing no more than could be obtained from a common histogram equalization on the color image. To explore this idea, the eigenface performance was also measured with and without the use of a histogram equalization preprocessing step. Each color plane in the original RGB space was enhanced using a standard 256-to-64-bin histogram flattening procedure. The results show that, rather than a performance increase similar to that obtained from the optimized color conversion, the histogram equalization actually produced a severe decrease in accuracy. It is believed that this is due to the global nature of the process, which may have resulted in a suppression of the facial features that are useful for recognition. We conclude that color histogram equalization is not a useful preprocessing step for eigenface face recognition, regardless of the choice of method for color transformation.
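The rank-based performance score $R_n$ used in this section can be sketched in a few lines. The sketch assumes the Euclidean distances from each probe to each individual's exemplar are already available as a matrix; the function name `rank_n_score` and the argument layout are hypothetical.

```python
import numpy as np

def rank_n_score(distances, true_ids, n):
    """Fraction of probes whose correct identity is among the n nearest exemplars.

    distances: (num_probes, num_individuals) distances in face space.
    true_ids:  correct exemplar (column) index for each probe.
    """
    distances = np.asarray(distances, dtype=float)
    true_ids = np.asarray(true_ids)
    # Column indices of the n nearest exemplars for every probe.
    nearest = np.argsort(distances, axis=1)[:, :n]
    hits = (nearest == true_ids[:, None]).any(axis=1)
    return hits.mean()

# Three probes, three enrolled individuals (made-up distances).
example = [[0.1, 0.5, 0.9],
           [0.4, 0.3, 0.2],
           [0.7, 0.1, 0.6]]
scores = [rank_n_score(example, [0, 0, 2], n) for n in (1, 2, 3)]
```

In this toy example the correct identities are ranked first, third, and second, so the scores for n = 1, 2, 3 are 1/3, 2/3, and 1.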

EFFECT ON FACE FEATURE DISCRIMINABILITY USING GABOR FILTERS
Another technique for face recognition is based on the application of a family of Gabor filters to monochrome face images [22,23,24]. A two-dimensional Gabor filter is a directed complex sinusoid in the image plane, decaying exponentially as a function of distance from the filter's origin. At a set of preselected locations on the face, Gabor filters at various related directions and sinusoidal frequencies are applied and the complex responses are assembled into a feature vector known as a "jet". Several techniques exist for performing face recognition using these jets. We have evaluated the potential improvement in Gabor-based methods from the use of an optimized color transformation by evaluating the relative distances between the Gabor jets from the same point on different faces, with and without the use of the KL derived color transformation. To obtain the interjet distances, we consider the jets as 80-element vectors and determine the Mahalanobis distance by the usual method. To measure the effectiveness of a set of jets for face discrimination, we consider (at each facial landmark) the ratio of the minimum interjet distance between two different faces to the maximum interjet distance, as well as the ratio of the minimum interjet distance to the average of all interjet distances for that landmark. Ratios were used to provide some normalization.
When the KL derived color transformation is used, the min-to-max ratio improved by 4.4% on a set of ten facial landmarks over the test database, while the min-to-average ratio increased by 6%. Interestingly, the average interjet distance actually decreased slightly, indicating that the minimum interjet distances were larger than when the usual monochrome intensity images were used. This is an initial indication that Gabor-based methods may have greater discrimination between different individuals when the KL derived color-to-monochrome transformation is used, since the underlying features are more distinctive.

Table 4: Improvement in face recognition performance with new color conversion procedure. The monochrome eigenface recognition procedure was used on a database of 280 color images. The second and third columns show the recognition accuracy values that were obtained when the images were color-converted with the standard equal-weight method and with our KL method, respectively.
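The two discriminability ratios defined above can be sketched for a single landmark as follows. Several details are assumptions rather than the procedure used in the study: the jets are treated as real-valued vectors, the covariance for the Mahalanobis distance is estimated from the jets themselves unless one is supplied, and a pseudo-inverse is used because that estimate is singular when there are fewer faces than jet elements. The function name `interjet_ratios` is hypothetical.

```python
import numpy as np

def interjet_ratios(jets, cov=None):
    """Return (min/max, min/mean) of pairwise Mahalanobis interjet distances.

    jets: (F, D) array, one real-valued jet per face at a single landmark.
    cov:  optional (D, D) covariance; estimated from the jets if omitted.
    """
    jets = np.asarray(jets, dtype=float)
    if cov is None:
        cov = np.cov(jets, rowvar=False)
    # The pseudo-inverse tolerates a singular covariance estimate.
    cov_inv = np.linalg.pinv(np.asarray(cov, dtype=float))
    # All unordered pairs of different faces.
    i, j = np.triu_indices(len(jets), k=1)
    diff = jets[i] - jets[j]
    d2 = np.einsum('nd,de,ne->n', diff, cov_inv, diff)
    d = np.sqrt(np.maximum(d2, 0.0))
    return d.min() / d.max(), d.min() / d.mean()

# Toy check with an identity covariance (plain Euclidean distance):
# pairwise distances are 5, 10, and 5.
r_max, r_mean = interjet_ratios([[0, 0], [3, 4], [6, 8]], cov=np.eye(2))
```

Larger values of either ratio mean that the closest pair of different faces is less easily confused relative to the overall spread of jets at that landmark.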

CONCLUSIONS
This paper has presented a new approach for converting color images to monochromatic form. By tailoring the conversion process to the needs of a particular task, such as human face recognition, it is possible to improve the overall system performance. Most existing face recognition systems operate using monochromatic information alone, even when color information is available. In such cases, a simple and suboptimal conversion process is typically used. We argue that recognition accuracies can be improved if the color-conversion process is selected based on the expected color distributions. We explored three such approaches to determine an improved mapping empirically: Karhunen-Loève analysis of the color pixel distributions, a least-squared-error line fit in RGB space, and a genetic algorithm.
The color-conversion method presented in this paper is independent of the actual face recognition approach that is used. For testing purposes, however, we have used the well-known eigenface method. Our experiments using the eigenface method for recognition resulted in performance improvements in the range of approximately 6% to 14% for a database of 280 color images. Relative distance measurements of Gabor jets of the face area also showed an increase in discriminability of 4% to 6%. Evaluation of the cumulative eigenvalues produced by an eigenface analysis of intensity images and images converted to grayscale form using the computed conversion vector showed a modest yet consistent improvement in the potential accuracy in retaining only the most important n basis vectors.