Data Fusion Boosted Face Recognition Based on Probability Distribution Functions in Different Colour Channels

A new, high-performance face recognition system based on combining the decisions obtained from the probability distribution functions (PDFs) of pixels in different colour channels is proposed. The PDFs of the equalized and segmented face images are used as statistical feature vectors for the recognition of faces by minimizing the Kullback-Leibler Divergence (KLD) between the PDF of a given face and the PDFs of the faces in the database. Several data fusion techniques, such as the median rule, sum rule, max rule, product rule, and majority voting, as well as feature vector fusion as a source fusion technique, have been employed to improve the recognition performance. The proposed system has been tested on the FERET, the Head Pose, the Essex University, and the Georgia Tech University face databases. The superiority of the proposed system is shown by comparing it with state-of-the-art face recognition systems.


Introduction
The earliest work in computer recognition of faces was reported by Bledsoe [1], where manually located feature points are used. Statistical face recognition systems such as principal component analysis-(PCA-) based eigenfaces introduced by Turk and Pentland [2] attracted a lot of attention. Belhumeur et al. [3] introduced the fisherfaces method which is based on linear discriminant analysis (LDA).
Many of these methods are based on greyscale images; however, colour images are increasingly being used, since they add additional biometric information for face recognition [4]. The colour PDFs of a face image can be considered as the signature of the face, which can be used to represent the face image in a low-dimensional space. Images with small changes in translation, rotation, and illumination still possess high correlation in their corresponding PDFs, which prompts the idea of using PDFs for face recognition.
The PDF of an image is a normalized version of its histogram. Hence, published face recognition papers using histograms indirectly use PDFs for recognition; there is also some published work on the application of histograms to object detection [5]. However, there are few publications on histogram- or PDF-based methods in face recognition. Yoo and Oh used chromatic histograms of faces [6]. Ahonen et al. [7] and Rodriguez and Marcel [8] divided a face into several blocks, extracted Local Binary Pattern (LBP) feature histograms from each block, and concatenated them into a single global feature histogram to represent the face image; the face was then recognized by simple distance-based grey-level histogram matching. Demirel and Anbarjafari [9] introduced a high-performance pose-invariant face recognition system based on greyscale histograms of faces, where the cross-correlation coefficient between the histogram of the query image and the histograms of the training images was used as a similarity measure.
Face segmentation is one of the important preprocessing phases of face recognition. There are several methods for this task, such as skin tone-based face detection for face segmentation. Skin is a widely used feature in human image processing, with a range of applications [10]. Human skin can be detected by identifying the presence of skin-colour pixels, and many methods have been proposed for achieving this. Chai and Ngan [11] modelled the skin colour in the YCbCr colour space. One of the recent methods for face detection, proposed by Nilsson et al. [12], uses the local Successive Mean Quantization Transform (SMQT). In the present paper, the local SMQT algorithm has been adopted for face detection and cropping in the preprocessing stage. Colour PDFs in the HSI and YCbCr colour spaces of the isolated face images are used as the face descriptors. Face recognition is achieved using the Kullback-Leibler Divergence (KLD) between the PDF of the input face and the PDFs of the faces in the training set. Different data and source fusion methods have been used to combine the decisions of the different colour channels to increase the recognition performance. In order to reduce the effect of illumination, singular value decomposition-based image equalization has been used. Figure 1 illustrates the phases of the proposed system, which combines the decisions of the classifiers in different colour channels for improved recognition performance.
The system has been tested on the Head Pose (HP) [13], the FERET [14], the Essex University [15], and the Georgia Tech University [16] face databases, where the faces have varying backgrounds and illumination as well as pose changes.

Preprocessing of Face Images
There are several approaches used to eliminate the illumination problem of colour images [17]. One of the most frequently used and simplest methods is to equalize the colour image in the RGB colour space by applying histogram equalization (HE) in each colour channel separately. Previously we proposed the singular value equalization (SVE) technique, which is based on singular value decomposition (SVD) [18, 19]. In general, for any intensity image matrix Ξ_A, A ∈ {R, G, B}, the SVD can be written as

Ξ_A = U_A Σ_A V_A^T, (1)

where U_A and V_A are orthogonal square matrices (hanger and aligner matrices), and the matrix Σ_A contains the sorted singular values on its main diagonal (stretcher matrix). As reported in [20], Σ_A represents the intensity information of a given image intensity matrix. If an image has low contrast, this problem can be corrected by replacing the Σ_A of the image with another singular value matrix obtained from a normal image with no contrast problem. Any pixel of an image can be considered as a random variable with distribution function Ψ. According to the central limit theorem (CLT), the normalized sum of a sequence of random variables X_1, X_2, ..., X_n with mean μ and standard deviation σ tends to a standard normal distribution with mean 0 and standard deviation 1:

(X_1 + X_2 + ... + X_n − nμ) / (σ√n) → N(0, 1) as n → ∞. (2)

Hence a normalized image with no intensity distortion (i.e., no external condition forces the pixel values to be close to a specific value, so the distribution of each pixel is identical) has a normal distribution with mean 0 and variance 1. A synthetic matrix of the same size as the original image can easily be obtained by generating random pixel values drawn from a normal distribution with mean 0 and variance 1.
Then the ratio of the largest singular value of the generated normalized matrix to that of the normalized image can be calculated according to

ξ_A = max(Σ_N(μ=0,σ=1)) / max(Σ_A), (3)

where Σ_N(μ=0,σ=1) is the singular value matrix of the synthetic intensity matrix. This coefficient can be used to regenerate a new singular value matrix, which gives the equalized intensity matrix of the image:

Ξ_equalized,A = U_A (ξ_A Σ_A) V_A^T = ξ_A Ξ_A, (4)

where Ξ_equalized,A represents the equalized image in colour channel A. As (4) states, the equalized image is simply the multiplication of ξ_A with the original image. From the computational complexity point of view, the singular value decomposition of a matrix is an expensive process which takes a significant amount of time to compute the orthogonal matrices U_A and V_A, even though they are not used in the equalization process. Hence, finding a cheaper way of obtaining ξ_A improves the technique. Recall that

||A||_2 = √λ_max, (5)

where λ_max is the maximum eigenvalue of A^T A. By using the SVD,

A^T A = (U Σ V^T)^T (U Σ V^T) = V Σ^T Σ V^T. (6)

It follows that the eigenvalues of A^T A are the squares of the elements on the main diagonal of Σ, and that the singular values satisfy

σ_i = √λ_i, (7)

where λ_i is the ith eigenvalue of A^T A. Thus, the 2-norm of a matrix is equal to its largest singular value. Therefore ξ_A can easily be obtained from

ξ_A = ||Ξ_N(μ=0,σ=1)||_2 / ||Ξ_A||_2, (8)

where Ξ_N(μ=0,σ=1) is a random matrix with mean 0 and variance 1, and Ξ_A is the intensity image in the R, G, or B channel.
Hence the equalized image can be obtained by

Ξ_equalized,A = ξ_A Ξ_A, (9)

which shows that there is no need to compute the singular value decomposition of the intensity matrices. This procedure simplifies the equalization step. Note that Ξ_A is a normalized image with intensity values between 0 and 1, and that after the generation of Ξ_N it is likewise normalized so that its values lie between 0 and 1. This equalization of the face images eliminates the illumination problem. The new image can then be used as the input for the face detector provided by Nilsson [21] in order to segment the face region and eliminate the undesired background.
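As a concrete illustration, the SVD-free shortcut described above (scaling the image by the ratio of the largest singular values, computed as matrix 2-norms) can be sketched in a few lines of NumPy. The function name sve_equalize and the exact [0, 1] normalization details are our own illustrative choices, not code from the paper:

```python
import numpy as np

def sve_equalize(channel, rng=None):
    """Singular value equalization (SVE) sketch: scale a colour channel by
    the ratio of the largest singular values (matrix 2-norms) of a synthetic
    N(0, 1) matrix and of the channel itself, avoiding a full SVD."""
    rng = np.random.default_rng() if rng is None else rng
    # Channel normalized to [0, 1], as the text requires.
    x = channel.astype(np.float64)
    x = (x - x.min()) / (x.max() - x.min() + 1e-12)
    # Synthetic matrix with mean 0 and variance 1, then normalized to [0, 1].
    n = rng.standard_normal(x.shape)
    n = (n - n.min()) / (n.max() - n.min())
    # xi_A = ||Xi_N||_2 / ||Xi_A||_2 : for 2-D arrays, ord=2 in
    # np.linalg.norm returns the largest singular value.
    xi = np.linalg.norm(n, 2) / np.linalg.norm(x, 2)
    return np.clip(xi * x, 0.0, 1.0)
```

Clipping to [0, 1] after scaling is an extra safeguard we added so the output stays a valid normalized image.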
The local successive mean quantization transform (SMQT) can be explained as follows. The SMQT can be considered as an adjustable tradeoff between the number of quantization levels in the result and the computational load [22]. "Local" refers to the division of an image into blocks of a predefined size. Let x be a pixel of a local block D; the SMQT of level L maps the block onto a new set of values:

SMQT_L : D(x) → M(x), (10)

where M(x) is a new set of values insensitive to gain and bias [22]. These two properties are desirable because the intensity image is a product of reflection and illumination. A common approach to separating reflection and illumination is based on the assumption that illumination is spatially smooth, so that it can be taken as constant in a local area. Therefore each local pattern with a similar structure yields similar SMQT features for a specified level L. The sparse network of winnows (SNoW) learning architecture is also employed in order to create a look-up table for classification. As Nilsson et al. proposed in [22], an image is scanned for faces with a patch of 32 × 32 pixels, and the image is also downscaled and resized with a scale factor to enable the detection of faces of different sizes. The choice of the local area and the level of the SMQT are vital for successful practical operation, as the level of the transform controls the information gained from each feature. As reported in [22], a 3 × 3 local area and level L = 1 provide a proper balance for the classifier. The face and nonface tables are trained in order to create the split-up SNoW classifier. Overlapping detections are disambiguated using geometrical locations and classification scores: given two detections overlapping each other, the detection with the higher classification score is kept and the other is removed. This operation is repeated until no overlapping detections remain.
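A minimal sketch of a level-1 SMQT on a single local block follows, assuming that at L = 1 the transform reduces to quantizing each pixel to two levels by comparison against the block mean; the function name smqt_level1 is our own illustrative choice:

```python
import numpy as np

def smqt_level1(block):
    """Level-1 SMQT on one local block (a sketch): quantize each pixel to
    0/1 by comparing it against the block mean. The result is insensitive
    to gain (positive scaling) and bias (constant offset) of the input."""
    block = np.asarray(block, dtype=np.float64)
    return (block > block.mean()).astype(np.uint8)
```

Because scaling a block by a positive gain and shifting it by a constant bias scales and shifts its mean identically, the comparison pattern, and hence the feature, is unchanged.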
The segmented face images are used for the generation of PDFs in the H, S, I, Y, Cb, and Cr colour channels of the HSI and YCbCr colour spaces. If there is no face in the image, the face detector produces no output; hence the probability of accepting random noise that has the same colour distribution as a face but a different shape is zero, which makes the proposed method reliable. The proposed equalization has been tested on the Oulu face database [23] as well as on the FERET, the HP, the Essex University, and the Georgia Tech University face databases. Figure 2 shows the general steps of the preprocessing phase of the proposed system.

Colour Images versus Greyscale Images
Many face recognition systems use greyscale face images. From the information point of view, however, a colour image carries more information than a greyscale image, so we propose not to discard the available information by converting a colour image into a greyscale one. In order to compare the amount of information in colour and greyscale images, the entropy of an image can be used, which can be calculated by

H = −Σ_i p_i log2(p_i), (11)

where p_i is the occurrence probability of the ith intensity level and H measures the information content of the image. The average amount of information measured over 2650 face images of the FERET, HP, Essex University, and Georgia Tech University face databases is shown in Table 1. The entropy values indicate that there is a significant amount of information in the different colour channels which should not simply be ignored by considering only the greyscale image.
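The entropy measure above can be computed per colour channel from the channel's 256-bin PDF; channel_entropy is an illustrative name, not code from the paper:

```python
import numpy as np

def channel_entropy(channel, bins=256):
    """Shannon entropy H = -sum(p_i * log2(p_i)) of one colour channel,
    computed from its normalized 256-bin histogram (i.e., its PDF)."""
    hist, _ = np.histogram(channel, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]  # log2(0) terms contribute nothing to the sum
    return float(-np.sum(p * np.log2(p)))
```

A constant channel carries no information (H = 0), while an 8-bit channel with a perfectly uniform PDF carries the maximum of 8 bits.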

PDF-Based Face Recognition
The PDF of an image is a statistical description of the distribution, in terms of occurrence probabilities, of its pixel intensities, and it can be considered as a feature vector representing the image in a lower-dimensional space [18]. In a general mathematical sense, an image PDF is simply a mapping η_i representing the probability of the pixel intensity levels falling into various disjoint intervals, known as bins. The number of bins determines the size of the PDF vector; in this work it is set to 256. For a monochrome image with N pixels in total, the PDF entries η_j satisfy

η_j ≥ 0, Σ_{j=1}^{256} η_j = 1. (12)

Then the PDF feature vector H is defined by

H = [η_1, η_2, ..., η_256], (13)

where η_i is the occurrence probability of the ith intensity level in a colour channel. The Kullback-Leibler Divergence can be used to measure the distance between the PDFs of two images, although in general it is not a distance metric; it is sometimes referred to as the Kullback-Leibler Distance (KLD) [24]. Given a query PDF vector q and a training PDF vector p_i, the KLD, κ, is defined as

κ(q, p_i) = Σ_{j=1}^{β} q_j log(q_j / p_ij), i = 1, ..., M, (14)

where β is the number of bins and M is the number of images in the training set. In order to avoid the three undefined possibilities, division by zero in log(q_j / p_ij) where p_ij = 0, log(0) where q_j = 0, or both together, we modify the formula into the following form:

κ(q, p_i) = Σ_{j=1}^{β} (q_j + δ) log((q_j + δ) / (p_ij + δ)), (15)

where δ ≪ 1/β, for example, δ = 10^−7. One should note that before normalization the histogram counts are nonnegative integers, with minimum value zero and maximum value equal to the number of pixels in the image. Then, given a query face image, the PDF q of the query image can be used to calculate the KLD between q and the PDFs of the images in the training set:

χ_r = min_i κ(q, p_i). (16)

Here, χ_r is the minimum KLD, reflecting the similarity between the rth image in the training set and the query face. The training image with the lowest KLD from the query is declared to be the identified image in the set.
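The KLD-based matching described above can be sketched as follows, using δ-smoothed PDFs built from 256-bin histograms; the names pdf, kld, and recognize are our own illustrative choices:

```python
import numpy as np

DELTA = 1e-7  # small constant avoiding log(0) and division by zero

def pdf(channel, bins=256):
    """256-bin PDF (normalized histogram) of one colour channel."""
    hist, _ = np.histogram(channel, bins=bins, range=(0, 256))
    return hist / hist.sum()

def kld(q, p, delta=DELTA):
    """Modified Kullback-Leibler divergence between query PDF q and a
    training PDF p, with both smoothed by delta as in the text."""
    q = q + delta
    p = p + delta
    return float(np.sum(q * np.log(q / p)))

def recognize(query_pdf, training_pdfs):
    """Index of the training PDF with minimum KLD from the query."""
    divergences = [kld(query_pdf, p) for p in training_pdfs]
    return int(np.argmin(divergences))
```

A query whose intensities are only slightly perturbed from a training image keeps a nearly identical PDF and is therefore matched to it, which mirrors the robustness argument made in the Introduction.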
Figure 3 shows two subjects with two different poses and their segmented faces from the FERET face database, which is well known for its pose changes; the images also have different backgrounds with slight illumination variation. The intensity of each image has been equalized using SVE to minimize the illumination effect. The colour PDFs used in the proposed system are generated only from the segmented face, and hence the effect of the background regions is eliminated. The performance of the proposed system is tested on the FERET, the HP, the Essex University, and the Georgia Tech University face databases, with changing poses, backgrounds, and illumination, respectively. The details of these databases are given in the Results and Discussions section. The faces in these datasets are converted from RGB to the HSI and YCbCr colour spaces, and each dataset is divided into training and test sets. In this setup the training set contains n images per subject, and the remaining images are used for the test set.

Fusion of Decision in Different Colour Channels
The face recognition procedure explained in the previous section can be applied to different colour channels such as H, S, I, Y, Cb, and Cr. Hence, a given face image can be represented in these colour spaces with a dedicated colour PDF for each channel. Different colour channels contain different information about the image; therefore all six PDFs can be combined to represent a face image. There are many techniques for combining the resulting decisions.
In this paper, the sum rule, median rule, max rule, product rule, majority voting, and feature vector fusion methods have been used for this combination [25]. These data fusion techniques operate on the probabilities of the decisions provided by the classifiers, so it is necessary to derive the probability of the decision of each classifier from its minimum KLD value. This is achieved by calculating the probability of the decision in each colour channel, κ_C:

κ_C = 1 − σ_C, C ∈ {H, S, I, Y, Cb, Cr}, (17)

where σ_C is the minimum KLD value in channel C normalized over all n × M KLD values, n is the number of face samples in each class, and M is the number of classes. The highest similarity between two PDF vectors occurs when the minimum KLD value is zero. This represents a perfect match, that is, a selection probability of 1; a zero KLD thus corresponds to a probability of 1, which is why σ_C is subtracted from 1. The maximum probability corresponds to the probability of the selected class. The sum rule adds all the probabilities of a class in the different colour channels and declares the class with the highest accumulated probability to be the selected class. The max rule, as its name implies, simply takes the maximum among the probabilities of a class in the different colour channels and declares the class with the highest probability to be the selected class. The median rule similarly takes the median of the probabilities of a class in the different channels. The product rule takes the product of all the probabilities of a class in the different colour channels; it is very sensitive, as a single low probability (close to 0) removes any chance of that class being selected [25]. Majority voting (MV) is another data fusion technique whose main idea is to achieve an increased recognition rate by combining the decisions of the different colour channels. The MV procedure can be explained as follows.
Given the probabilities of the decisions, κ_C, in all colour channels (C ∈ {H, S, I, Y, Cb, Cr}), the most frequently occurring decision among the channels is declared to be the overall decision.
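The fusion rules above can be sketched as one dispatch function over a channels × classes matrix of decision probabilities; fuse is an illustrative name, and the tie-breaking behaviour (first class wins) is a NumPy artefact rather than something specified in the text:

```python
import numpy as np

def fuse(channel_probs, rule="sum"):
    """Combine per-class decision probabilities from the six colour-channel
    classifiers. channel_probs is a (channels x classes) array; the
    selected class is the one maximizing the fused score."""
    p = np.asarray(channel_probs, dtype=np.float64)
    if rule == "sum":
        score = p.sum(axis=0)
    elif rule == "max":
        score = p.max(axis=0)
    elif rule == "median":
        score = np.median(p, axis=0)
    elif rule == "product":
        score = p.prod(axis=0)  # one near-zero term vetoes a class
    elif rule == "vote":
        # Majority voting: each channel votes for its own best class.
        score = np.bincount(p.argmax(axis=1), minlength=p.shape[1])
    else:
        raise ValueError(rule)
    return int(np.argmax(score))
```

Note how a single channel assigning a very high probability to one class can flip the max rule but not the sum, median, or voting rules, which is the robustness difference discussed in the Results section.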
Data fusion is not the only way to improve the decision making: the PDF vectors can also simply be concatenated in the feature vector fusion (FVF) process, which is a source fusion technique and can be explained as follows. Let {p_1, p_2, ..., p_M}_C be the set of PDFs of the training face images in colour channel C ∈ {H, S, I, Y, Cb, Cr}. Then, for a given query face image, fvf_q is defined as the concatenation of all the PDFs of the query image q:

fvf_q = [q_H, q_S, q_I, q_Y, q_Cb, q_Cr]. (18)

This new PDF can be used to calculate the KLD between fvf_q and the fvf_{p_i} of the images in the training set:

χ_r = min_i κ(fvf_q, fvf_{p_i}), i = 1, ..., M, (19)

where M is the number of images in the training set. Thus, the similarity between the rth image in the training set and the query face is reflected by χ_r, the minimum KLD value, and the training image with the lowest KLD is declared to represent the recognized subject. With the proposed system using PDFs in different colour channels as the face feature vector, the ensemble-based decision-making systems discussed above have been tested on the FERET, the Essex University, the Georgia Tech University, and the HP face databases. The correct recognition rates in percent are given in Table 4. Each result is the average of 100 runs, in which the faces in each class were randomly shuffled.
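The FVF concatenation can be sketched as follows. Renormalizing the concatenated vector so that it sums to 1 is our own assumption (so it remains a valid PDF for the KLD comparison; the text does not state this step), and feature_vector_fusion is an illustrative name:

```python
import numpy as np

def feature_vector_fusion(channel_pdfs):
    """Feature vector fusion (FVF) sketch: concatenate the six per-channel
    PDFs (H, S, I, Y, Cb, Cr) into one long feature vector, then
    renormalize so the result sums to 1 (our assumption, keeping it a
    valid PDF for the KLD comparison)."""
    fvf = np.concatenate([np.asarray(p, dtype=np.float64)
                          for p in channel_pdfs])
    return fvf / fvf.sum()
```

With 256 bins per channel, the fused vector has 6 × 256 = 1536 entries, and matching proceeds exactly as in the single-channel KLD case.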

Results and Discussions
The experimental results have been obtained by testing the system on the following face databases: the HP face database, containing 150 faces of 15 classes with 10 rotational poses varying from −90° to +90° per class; a subset of the FERET face database, containing 500 faces of 50 classes with 10 poses varying from −90° to +90° per class; the Essex University face database, containing 1500 faces of 150 classes with 10 slightly varying poses and illumination changes; and the Georgia Tech University face database, containing 500 faces of 50 classes with 10 varying poses, illumination conditions, and backgrounds. The correct recognition rates in percent on the aforementioned face databases using the PDF-based face recognition system in the different colour channels are shown in Table 2. Each result is the average of 100 runs, in which the faces in each class were randomly shuffled. It is important to note that the performance of each colour channel is different, which means that a person may be recognized in one channel while the same person fails to be recognized in another.
In order to show the superiority of the proposed PDF-based face recognition over PCA-based face recognition in each colour channel, the performance of the PCA-based face recognition system on the aforementioned face databases in the different colour channels is shown in Table 3.
The results of the proposed system using data and source fusion techniques on the different face databases are shown in Table 4. They show that the performance of the product rule drops dramatically as the number of images per subject in the training set increases: with more training images per subject, the chance of encountering a low probability increases, and a single low probability is enough to cancel the effect of several high probabilities. The median rule is marginally better than the sum rule on some occasions, but from the computational complexity point of view it is more expensive than the sum rule because it requires sorting. The marginal improvement of the median rule is due to the fact that a single out-of-range probability does not affect the median, though it does affect the sum. The minimum rule is not discussed in this work, as it is not logical to give priority to decisions that have a low probability of occurrence. The same data fusion techniques have been applied to the PCA-based system in the different colour channels to improve its final recognition rate; the recognition rates are stated in Table 5. A comparison between Tables 4 and 5 indicates the high performance of the proposed system.
In order to show the superiority of the proposed method over available state-of-the-art and conventional face recognition systems, we have compared its recognition rate with those of the conventional PCA-based face recognition system and of state-of-the-art techniques such as Nonnegative Matrix Factorization (NMF) [26, 27], supervised incremental NMF (INMF) [28], LBP [8], and LDA [3] based face recognition systems on the FERET face database. The experimental results are shown in Table 6. Figure 4 graphically illustrates the superiority of the proposed data fusion boosted colour PDF-based face recognition system over the aforementioned systems on the FERET face database, using two selected fusion techniques, FVF and the median rule. The results clearly indicate that this superiority is achieved by using PDF-based face recognition in different colour channels backed by the data fusion techniques.
In an attempt to show the effectiveness of the proposed SVD-based equalization technique, a comparison between the proposed method and HE in terms of the final recognition scores is shown in Table 7. As the results indicate, HE is not a suitable preprocessing technique for the proposed face recognition system, because it transforms the input image such that the PDF of the output image approaches a uniform distribution. This process dramatically reshapes the PDFs of the segmented face images, which results in poor recognition performance.

Conclusion
In this paper we introduced a high-performance face recognition system based on combining the decisions obtained from PDFs in different colour channels. A new preprocessing procedure was employed to equalize the images. Furthermore, the local SMQT technique was employed to isolate the faces from the background, and KLD-based PDF matching was used to perform face recognition, where the minimum KLD between the PDF of a given face and the PDFs of the faces in the database identifies the match. Several decision-making techniques, such as the sum rule, max rule, median rule, product rule, majority voting, and feature vector fusion, were employed to improve the performance of the proposed PDF-based system. The results clearly show the superiority of the proposed system over conventional and state-of-the-art face recognition systems.