On combining multi-normalization and ancillary measures for the optimal score level fusion of fingerprint and voice biometrics

In this paper, we consider the utility of multi-normalization and ancillary measures for the optimal score level fusion of fingerprint and voice biometrics. An efficient matching score preprocessing technique based on multi-normalization is employed to improve the performance of the multimodal system under various noise conditions. Ancillary measures derived from the feature space and the score space are used in addition to the matching score vectors to weight the modalities based on their relative degradation. Reliability (dispersion) and separability (inter-/intra-class distance and d-prime statistics) measures under various noise conditions are estimated from the individual modalities during the training/validation stage. The 'best integration weights' are then computed by algebraically combining these measures using the weighted sum rule. The computed integration weights are then optimized against the recognition accuracy using techniques such as grid search, genetic algorithm and particle swarm optimization. The experimental results show that the proposed biometric solution leads to considerable improvement in recognition performance even under low signal-to-noise ratio (SNR) conditions and reduces the false acceptance rate (FAR) and false rejection rate (FRR), making the system useful for security as well as forensic applications.


Introduction
The recognition accuracy of a biometric system is highly sensitive to the quality of the biometric input, and noisy data can significantly reduce the performance of the system. One of the main problems associated with biometric systems is undesired variation in the biometric data. These variations arise from a variety of factors, such as the sensors used to capture the biometric data and non-ideal operating conditions such as background noise and non-uniform illumination [1][2][3][4][5][6][7][8][9]. Multimodal systems are more robust to environmental and sample quality variations because they draw on multiple sources of evidence [10][11][12]. This is an added advantage of multibiometric systems. Compared to fingerprint systems, voice recognition systems are severely degraded by the presence of noise and intra-class variations. They are also strongly affected by behavioural and physiological factors. These variations are often reflected in the matching scores, which in turn influence the overall efficiency of the biometric system [13]. All these factors make reducing the error rates of a biometric system a challenging enterprise.
*Correspondence: p090004ec@nitc.ac.in Department of Electronics and Communication Engineering, National Institute of Technology Calicut, Calicut, Kerala, India-673601
The score outputs of a classifier often vary widely when it is presented with feature vectors corrupted by noise. In this scenario, some impostors obtain higher scores and some genuine users obtain lower scores than under clean conditions, thereby increasing the FAR and FRR [11]. Most matchers have to deal with such situations in real time in spite of the enhancement algorithms and the feature sets they use. In order to reduce the classification errors, this paper aims at quantifying the amount of trust that can be given to each individual classifier's decision, taking into consideration the effect of environmental noise conditions and the behaviour of the classifiers on evaluation data. Here, we present a new combinational approach for fusing the scores derived from fingerprint and voice biometric matchers, with multi-normalization and weighting measures derived from ancillary information. As the matching score values from the fingerprint and the voice matchers follow nonhomogeneous statistical distributions, we have employed tanh and min-max normalization techniques, respectively, for the complementary modalities [14]. Ancillary information includes measures indicating the quality of the acquired biometric samples or certain additional information about the user [11]. Here, the relative quality information from the individual classifiers is used in the fusion process.
http://asp.eurasipjournals.com/content/2014/1/10
In multimodal systems, confidence measures [15] are widely used as integration weights for biometric fusion. In a multibiometric system, the weight vector assigned to the matching score vectors represents the relative importance of the different biometric matchers, provided that the scores of the matchers have been normalized [6]. The proposed technique involves emphasizing or de-emphasizing the matching scores of the individual modalities, depending on the estimate of their relative degradation. Let us assume that during a particular access attempt by the user, the fingerprint image is of poor quality but the voice samples are sufficiently good. In this case, we can assign a higher weight to the voice matching result and a lower weight to the fingerprint matching result. Even for the same biometric modality, different representations and matching algorithms may exhibit different levels of sensitivity to the quality of the biometric data [6]. The aim of this fusion technique is to combine the information from the fingerprint and voice classifiers such that the resulting performance is greater than or equal to the performance of the best individual source. Anything less than this is termed 'catastrophic fusion' [16] and is undesirable. The inter-/intra-class separability measures derived from the feature space, and the reliability (dispersion) and d-prime separability measures from the match score space, are estimated separately for each noise condition in the training/validation phase using the 'leave-one-out' cross validation technique. These measures are then algebraically combined to differentially weight each subsystem and so improve the performance of fusion. The basic assumption followed in this experiment is that the fingerprint biometric trait has higher permanence than voice; hence, its performance under various noise conditions is not explored.
As the quality of the voice biometric degrades with noise, its performance under varying noise conditions is demonstrated by artificially degrading the training/testing samples with additive white Gaussian noise (AWGN). We have considered voice samples with noise content varying from −10 to 20 dB SNR. We have compared the performance of the proposed method with the baseline techniques for score level fusion. The experimental results show that optimal integration weights estimated using multi-normalization and ancillary measures improve the performance of multibiometric systems even under low SNR conditions.

Previous work
Though a lot of work has been done in biometric fusion, little has been done to improve the efficiency of the multimodal systems under various noise conditions. Lewis et al. shed some light on audio-visual speech recognition systems using dispersion measures as the integration weights [16]. These measures are based upon the values assigned to the individual classes by the matcher module. Poh et al. proposed a margin-derived confidence measure while fusing two system opinions [15]. Jain et al. examined the effect of different score normalization techniques on the performance of the multimodal biometric system [13]. Kryszczuk et al. proposed a method of performing multimodal fusion using face and speech data, combining signal quality measures and reliability estimates [17]. Bendris et al. introduced quality measures in audio-visual identity verification [18]. Alsaade et al. showed that score normalization and quality-based fusion improve the accuracy of multimodal biometrics [2]. Optimal integration weight estimation using least squares technique was reported in [19]. Reliability-based optimal integration weight estimation for audio-visual decision fusion was reported in [20]. In our earlier work, we presented an optimal integration weight estimation scheme for fingerprint and voice biometric under various noise conditions, without using ancillary information [21]. We also presented a reliability-based optimal integration weight estimation scheme for the fingerprint and voice modalities in [22]. In this paper, we propose an efficient integration weight optimization strategy incorporating both the reliability measures from the score space (dispersion measure) and the separability measures from the feature space (inter-/intra-class distance) and score space (d-prime statistic). The optimal integration weights are estimated using a multi-normalization framework [14].

Major contributions
The major contributions of this work are as follows:
1. We have proposed an efficient multi-normalization-based matching score preprocessing technique to transform the scores obtained from the individual modalities, for reducing the classification errors (the overlap between the genuine and the impostor score distributions).
2. Ancillary information such as reliability (from the score space) and separability (from the feature space and score space) measures are combined algebraically to find the 'best integration weights' (γ) for fusion. Thus, we have utilized the rate of relative degradation of the samples from the feature space and the expert score space for finding the 'best integration weights'.
3. The 'optimal integration weights' (β) are estimated in the training/validation stage. Standard optimization techniques such as grid search and random search techniques like genetic algorithm and particle swarm optimization are used to find the 'optimal integration weights'. The 'optimal integration weights' thus obtained in the training stage are used as the integration weights for fusion in the testing stage.
To the best of our knowledge, the proposed optimal fusion strategy using multi-normalization, ancillary measures, and optimization techniques has not been attempted until now.

Organization of the paper
The rest of this paper is structured as follows: First, in section 2, the modelling and pattern-matching approaches used with the fingerprint and voice modality are briefly discussed. The proposed method is discussed in section 3. The optimal fusion using ancillary measures is detailed in section 4. Experimental results are described in section 5. Finally, a brief summary is presented.

Fingerprint classifier
We have used the minutiae-based fingerprint matching technique [23] using ridge counting. Given two sets of minutiae from the template ($T$) and the input fingerprint ($I$) images, the matching algorithm compares the minutiae points in the two images and returns a degree of similarity. Each minutia is represented as a triplet $m = \{x, y, \theta\}$ that indicates the $(x, y)$ minutia location coordinates and the minutia angle $\theta$. A minutia $m_i$ in $T$ and a minutia $m_j$ in $I$ are considered matching if the spatial distance (sd) between them is less than a given tolerance $r_0$ and the direction difference (dd) between them is less than an angular tolerance $\theta_0$ [24].
An elastic matching algorithm is used to perform matching between the two fingerprints. The match score for the reference and the test print is given by [24]
$$\text{Matching score} = \frac{100\,N_{pair}}{\max\{M, N\}},$$
where $N_{pair}$ is the number of matched minutiae, $M$ is the number of minutiae in the template set, and $N$ is the number of minutiae in the test set. The maximum similarity criterion is used for fingerprint pattern classification.
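As a rough illustration of the pairing rule and the score formula above, the following Python sketch counts matched minutiae and computes 100·N_pair/max{M, N}. It assumes that the two minutiae sets are already aligned and it uses a simple greedy one-to-one pairing, which is a simplification of the elastic matching algorithm of [24]. The function names and the default tolerances `r0` and `theta0` are hypothetical choices for illustration, not values taken from the paper.

```python
import math


def minutiae_match(m_i, m_j, r0=10.0, theta0=15.0):
    """Return True if two minutiae (x, y, theta in degrees) agree within tolerance."""
    # Spatial distance (sd): Euclidean distance between the two locations.
    sd = math.hypot(m_j[0] - m_i[0], m_j[1] - m_i[1])
    # Direction difference (dd): smallest angle between the two directions.
    diff = abs(m_j[2] - m_i[2]) % 360.0
    dd = min(diff, 360.0 - diff)
    return sd < r0 and dd < theta0


def matching_score(template, probe, r0=10.0, theta0=15.0):
    """Score 100 * N_pair / max(M, N), using greedy one-to-one pairing (assumption)."""
    if not template or not probe:
        return 0.0
    used = set()  # indices of probe minutiae that are already paired
    n_pair = 0
    for m_i in template:
        for j, m_j in enumerate(probe):
            if j not in used and minutiae_match(m_i, m_j, r0, theta0):
                used.add(j)
                n_pair += 1
                break
    return 100.0 * n_pair / max(len(template), len(probe))


template = [(10, 10, 30), (50, 50, 90), (80, 20, 350)]
probe = [(12, 11, 35), (52, 49, 100), (200, 200, 0), (81, 21, 2)]
score = matching_score(template, probe)
```

In this toy example three of the template minutiae find a partner among the four probe minutiae, including one pair whose angles wrap around 0°/360°.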

Voice classifier
Short-time spectral analysis is used to characterize the quasi-stationary voice samples. To represent the voice samples in a parametric way, we have considered the cepstral representation, as this has been found to be a more robust and reliable feature set for voice recognition than other forms of representation [16,20]. The number of Mel-frequency cepstral coefficients (MFCCs) is taken as 16 in this study. A Gaussian mixture model (GMM) is used to represent the acoustic feature vectors. The mean vectors, covariances, and mixture weights parametrize the complete GMM. These parameters are collectively represented by [25]
$$\lambda = \{w_i, \mu_i, \Sigma_i\},$$
where $w_i$, $\mu_i$ and $\Sigma_i$ are the weight, mean vector and covariance matrix of the $i$-th mixture component. Thus, by using the MFCC feature vectors and the statistical GMM, each enrolled speaker is uniquely represented by a specific $\lambda$. In the training stage itself, each enrolled speaker in $g$, where $g = \{\hat{g}_1, \hat{g}_2, \ldots, \hat{g}_G\}$, is represented by a unique GMM ($\lambda$). In the testing stage, the features from the unknown speaker's utterances are compared with the statistical models of the voices of the speakers known to the system. The GMM approach uses the Bayes classification strategy. According to this rule, the test samples are allocated to the class $\hat{g}_k$ having the highest posterior probability, that is [25],
$$\hat{g} = \arg\max_{1 \le k \le G} p(X \mid \lambda_k),$$
where $p(X \mid \lambda_k)$ is the likelihood of the observation sequence $X$ under the model $\lambda_k$; when all speakers are equally likely a priori, maximizing it is equivalent to maximizing the posterior probability.
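The decision rule described above can be sketched in a few lines of Python. This is a minimal illustration, assuming diagonal covariance matrices and equal speaker priors, so that the speaker with the highest total log-likelihood is selected. The model layout (a list of `(weight, mean, variance)` tuples per speaker), the function names, and the toy two-dimensional "features" are all hypothetical; real models would be trained on 16-dimensional MFCC vectors.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance at point x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(frames, gmm):
    """Sum over frames of log p(x | lambda); gmm is a list of (weight, mean, var)."""
    total = 0.0
    for x in frames:
        terms = [math.log(w) + log_gauss_diag(x, mu, var) for w, mu, var in gmm]
        peak = max(terms)  # log-sum-exp for numerical stability
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total


def identify(frames, models):
    """Return the speaker whose model gives the highest log-likelihood."""
    return max(models, key=lambda name: gmm_log_likelihood(frames, models[name]))


models = {
    "spk_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "spk_b": [(1.0, [5.0, 5.0], [1.0, 1.0])],
}
test_frames = [[4.8, 5.1], [5.2, 4.9]]
speaker = identify(test_frames, models)
```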

Proposed method
Various approaches are reported in the literature for improving the performance of biometric fusion at the matching score level. Most of them are based on either of the following two strategies [2,15,17,18,26]:
1. Maximizing the separation between the genuine and impostor scores.
2. Finding the best weighting factor for fusion.
In the proposed method, we have combined both of the above-mentioned strategies in a unified framework. As mentioned earlier, classification errors are often inevitable in biometric systems [11]. Two types of such errors are the FAR and the FRR. The decision threshold determines the FAR and FRR of the system. The FAR is the fraction of impostor scores exceeding the threshold, and the FRR is the fraction of genuine scores falling below the threshold. Figure 1 shows the genuine and impostor distributions for a typical biometric matcher. Note that, given two distributions, the FAR and the FRR cannot be reduced simultaneously by adjusting the decision threshold ($t$). However, the classification errors can be minimized if we minimize the overlap between the genuine and the impostor score distributions. Hence, in this paper, an efficient matching score preprocessing technique using multi-normalization is proposed to improve the separation between the locations of the genuine and impostor score distributions. Here, we also present a technique for incorporating the quality measures into the matching score fusion scheme. In this approach, we have combined ancillary information such as reliability (dispersion) and separability (inter-/intra-class distance ratio and d-prime statistic) measures, in addition to the matching score vectors, to find the 'optimal integration weight' for fusion under varying noise conditions. The performance of the proposed method is compared with the baseline techniques for score level fusion.
Experimental studies reveal that the proposed weighting scheme gives improved recognition accuracy even under low SNR conditions and reduces the FAR and FRR considerably compared to the baseline systems.
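The threshold trade-off described in this section can be made concrete with a short Python sketch. It computes the FAR and FRR of a similarity-score matcher at a given threshold; the function name, the toy scores, and the convention that a score equal to the threshold is accepted are assumptions made for illustration.

```python
def far_frr(genuine, impostor, t):
    """Return (FAR, FRR) at threshold t for similarity scores.

    A sample is accepted when its score is >= t (tie handling is an assumption).
    """
    far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
    frr = sum(s < t for s in genuine) / len(genuine)     # genuine users rejected
    return far, frr


genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.1, 0.3, 0.5, 0.6]

low_t = far_frr(genuine_scores, impostor_scores, 0.5)
high_t = far_frr(genuine_scores, impostor_scores, 0.75)
```

With these overlapping toy distributions, raising the threshold from 0.5 to 0.75 lowers the FAR but raises the FRR, which is why reducing the overlap itself, rather than moving the threshold, is the lever that lowers both.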

Multi-normalization
Figure 1 The genuine and impostor score distributions for a typical biometric matcher.
Score normalization is essentially a transformation technique that effectively normalizes any unwanted peculiarities involved in the raw similarity computations [13]. As the matching score values from the unimodal matchers follow nonhomogeneous statistical distributions, we propose different score normalization techniques for the complementary modalities employed (multi-normalization) [14]. Various score normalization techniques have been employed in the literature. For a good normalization scheme, the estimates of the location and scale parameters of the matching score distribution must be robust and efficient. Not all normalization techniques are equally suited to the different match score distributions. Here, we use min-max and tanh normalization techniques for the voice and fingerprint similarity scores, respectively, to enhance the efficiency and robustness of the system under varying noise conditions. Min-max (MM) [27] transforms all the raw scores to the [0, 1] range while retaining the original score distribution except for a scaling factor. Given a set of matching scores, $s = \{s_i\}$, $i = 1, 2, \ldots, n$, the normalized score $s'_i$ is obtained by [13]
$$s'_i = \frac{s_i - \min(s)}{\max(s) - \min(s)}, \qquad (6)$$
where $\max(s)$ and $\min(s)$ are the maximum and the minimum values of the estimated score range. Tanh (TH) normalization is one of the robust and efficient normalization methods. The transformed scores can be obtained using
$$s'_i = \frac{1}{2}\left[\tanh\left(0.01\,\frac{s_i - \mu(s)}{\sigma(s)}\right) + 1\right], \qquad (7)$$
where $\mu(s)$ and $\sigma(s)$ denote the mean and standard deviation of the genuine scores, respectively. There are several versions of Equation 7 in the literature. We have adopted the version in [13]. Instead of using Hampel estimators, the mean and the standard deviation are estimated from the matching scores themselves.
This is because for a training set not containing artificially introduced outliers, the use of Hampel estimators gives a nearly identical multimodal system performance as when using the real values of μ(s) and σ (s) [28]. The constant 0.01 in the expression for tanh normalization determines the spread of normalized genuine scores.
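The two normalization schemes can be sketched in Python as follows. The sketch uses the commonly cited forms from [13]: min-max maps a score s to (s − min)/(max − min), and tanh maps it to 0.5·[tanh(0.01·(s − μ)/σ) + 1], with μ and σ taken from the genuine scores. The function names are hypothetical, and the use of the population standard deviation (`pstdev`) rather than the sample standard deviation is an assumption.

```python
import math
from statistics import mean, pstdev


def min_max_normalize(scores):
    """Map raw scores linearly onto [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        raise ValueError("min-max normalization needs at least two distinct scores")
    return [(s - lo) / (hi - lo) for s in scores]


def tanh_normalize(scores, genuine_scores):
    """Tanh normalization with mean/std estimated from genuine scores."""
    mu = mean(genuine_scores)
    sigma = pstdev(genuine_scores)  # population std: an assumption
    return [0.5 * (math.tanh(0.01 * (s - mu) / sigma) + 1.0) for s in scores]


voice_norm = min_max_normalize([2.0, 4.0, 6.0])
finger_norm = tanh_normalize([30.0, 50.0, 75.0], [40.0, 50.0, 60.0])
```

A score equal to the genuine mean maps to exactly 0.5 under the tanh rule, and the small constant 0.01 keeps the mapping nearly linear around that point.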

Estimation of ancillary measures
In a biometric system, the smaller the overlap between the genuine and the impostor scores, the better is the recognition rate. Thus, the class separability and the score reliability (dispersion) or separability measures give an indication of the quality of the biometric samples and the matcher [22]. The global recognition rate of the multibiometric system can be improved by incorporating the reliability and separability measures as integration weights in the fusion module. Here, we have considered inter-/intra-class distance measure from the feature space and the reliability as well as the d-prime separability measure from the matching score space.

Estimation of separability measures
In a biometric system, the smaller the overlap between the genuine and the impostor scores, the better is the recognition rate. Thus, the class separability and the score separability measures give an indication of the quality of the biometric samples and the matcher. The global recognition rate of the multibiometric system can be improved by incorporating these separability measures as integration weights in the fusion module. Here, we have considered inter-/intra-class distance measure from the feature space and the d-prime separability measure from the matching score space.

Estimation of inter-/intra-class distance
Inter-/intra-class separability measures are derived from the feature space of the two modalities. This distance measure is based on the Euclidean distance between pairs of feature vectors in the training set. Here, the basic assumption is that the class-dependent distributions are such that the expectation vectors of the different classes are discriminating [29]. Let $S_T$ be a labelled training set with $S_N$ feature vectors. The classes $k$ are represented by subsets $S_k \subset S_T$, each class having $|S_k|$ feature vectors ($\sum_k |S_k| = S_N$). Feature vectors in $S_T$ without reference to their classes are denoted by $\zeta_n$. Feature vectors in $S_k$ (i.e. vectors coming from class $k$) are denoted by $\zeta_{k,n}$. The sample mean of class $k$ is given by
$$\hat{\mu}_k = \frac{1}{|S_k|} \sum_{n=1}^{|S_k|} \zeta_{k,n}.$$
The sample mean of the entire training set is given by
$$\hat{\mu} = \frac{1}{S_N} \sum_{n=1}^{S_N} \zeta_n.$$
In order to quantify the scattering of feature vectors in the space, we consider the scatter matrices [29]. Scatter matrices are among the most popular measures for quantifying the way feature vectors 'scatter' in the respective space.
The scatter matrix gives some information about the dispersion of the feature vectors around their mean. The matrix that describes the scattering of vectors from class $k$ is
$$SM_k = \frac{1}{|S_k|} \sum_{n=1}^{|S_k|} (\zeta_{k,n} - \hat{\mu}_k)(\zeta_{k,n} - \hat{\mu}_k)^T.$$
$SM_k$ is the estimate of the class-dependent covariance matrix. $SM_k$ not only provides information about the average distance of the scattering, but also gives information about the eccentricity and orientation of the scattering. The scatter matrix representing the noise, when averaged over all the classes, is given by
$$SM_w = \frac{1}{S_N} \sum_k |S_k|\, SM_k,$$
and the scattering of the class means is described by
$$SM_b = \frac{1}{S_N} \sum_k |S_k|\, (\hat{\mu}_k - \hat{\mu})(\hat{\mu}_k - \hat{\mu})^T.$$
$SM_w$ and $SM_b$ are the within-class scatter matrix and between-class scatter matrix, respectively. $SM_w$ describes the average scattering within the classes, while $SM_b$ gives the scattering of the class-dependent sample means around the overall average. With the above definitions, the average squared distance is proportional to the trace of the matrix $SM_w + SM_b$:
$$\bar{\rho}^2 = 2\,\mathrm{trace}(SM_w + SM_b) = 2\,\mathrm{trace}(SM_w) + 2\,\mathrm{trace}(SM_b).$$
This expression indeed shows that the average distance has a contribution due to differences in expectation and a contribution due to noise. This average distance is not an adequate performance measure, since a large value of $\bar{\rho}^2$ does not imply that the classes are well separated. The performance measure well suited to express the separability of classes is the ratio between the inter-class and intra-class distances [29],
$$J = \frac{\mathrm{trace}(SM_b)}{\mathrm{trace}(SM_w)},$$
which can be considered a 'signal-to-noise ratio' [29]. These $J$ values measure the separability among all classes. The inter-/intra-class separability measures are estimated from the feature space of both the fingerprint and voice modalities.
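A small Python sketch may help make the inter-/intra-class ratio concrete. It assumes the usual definitions of the within-class and between-class scatter matrices (class scatter averaged with class-size weights, and class means scattered around the overall mean) and takes the ratio trace(SM_b)/trace(SM_w). Because only the traces are needed, the sketch sums squared Euclidean distances instead of forming the matrices. The function names and the toy two-dimensional data are hypothetical.

```python
def _mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def _sqdist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inter_intra_ratio(classes):
    """trace(SM_b) / trace(SM_w) for a dict mapping class label -> feature vectors."""
    all_vectors = [v for vecs in classes.values() for v in vecs]
    total = len(all_vectors)
    overall_mean = _mean(all_vectors)
    trace_w = 0.0  # accumulates within-class squared deviations
    trace_b = 0.0  # accumulates size-weighted squared deviations of class means
    for vecs in classes.values():
        class_mean = _mean(vecs)
        trace_w += sum(_sqdist(v, class_mean) for v in vecs)
        trace_b += len(vecs) * _sqdist(class_mean, overall_mean)
    return (trace_b / total) / (trace_w / total)


near = {"a": [[0.0, 0.0], [0.0, 2.0]], "b": [[4.0, 0.0], [4.0, 2.0]]}
far = {"a": [[0.0, 0.0], [0.0, 2.0]], "b": [[8.0, 0.0], [8.0, 2.0]]}
j_near = inter_intra_ratio(near)
j_far = inter_intra_ratio(far)
```

Moving the second class further away leaves the within-class scatter unchanged and increases the ratio, which is the behaviour expected of a separability measure. The sketch does not guard against a zero within-class scatter.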

Estimation of reliability measures
Reliability estimates have been demonstrated to be an elegant tool for incorporating quality measures into the process of estimating the probability of correctness of the decisions [16,17,20,30]. They can be used as auxiliary quality information for the score level fusion. The reliability gives the degree of trust in the recognition results drawn from the individual information sources [16]. In this approach, the integration weight is determined from the relative reliability of the two modalities. The reliability parameters can be measured at either the signal level or at the expert score level. The score-based reliability measures are mainly categorized as score entropy, dispersion, variance, cross classifier coherence, and score difference [31].
We have proposed a dynamic reliability measure by considering the dispersion of the scores from the fingerprint and voice matchers. When the voice samples do not contain any noise, there are large differences in the matching score values. As the voice samples become noisy, these differences tend to become small. Given that a very high level of noise in a signal is likely to produce very similar scores across all the models, we would expect the estimate of the error variance to be very small. Based on this observation, the reliability of a modality can be defined in several ways, as mentioned in [16]. The modalities' reliability parameters are estimated from the variances of the matching scores. The usual approach is to calculate the variance around the best or the least score rather than the mean or median [20,30]. Here, the reliability of each modality is calculated as follows: where $N$ is the number of test samples considered from all the classes and $m$ stands for either the fingerprint ($F$) or the voice ($V$) modality. This quantity measures the dispersion of the score values about the least score rather than the mean. The reliability of each stream should satisfy the following conditions: The reliability measures are estimated from the score space of both the fingerprint and voice modalities during the training/validation stage.

Estimation of d-prime separability measures
The d-prime gives a measure of how well the non-match score probability density and the match score probability density are separated. We calculate the d-prime separability measure from the matching score matrix of both the modalities. The d-prime statistic provides the separation between the means of the genuine and the impostor score distributions, in units of the standard deviation of the score distributions [32].
In this statistic, $\mu_m$ and $\sigma_m^2$ are the mean and variance of the genuine scores, and $\mu_n$ and $\sigma_n^2$ are the mean and variance of the impostor scores. A higher d-prime indicates that the genuine scores can be more readily detected. This sensitivity index, measured from the score space, captures both the separation and the spread of the genuine and impostor score distributions.
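As an illustration, the d-prime statistic can be computed from two lists of scores as sketched below. The sketch assumes one widely used form, |μ_m − μ_n| / sqrt((σ_m² + σ_n²)/2); some texts omit the factor 1/2 in the denominator, and the form intended in [32] should be checked before reuse. The function name, the use of population variances, and the toy scores are assumptions.

```python
import math
from statistics import mean, pvariance


def d_prime(genuine, impostor):
    """Separation of the genuine and impostor means in pooled-std units.

    Assumed form: |mu_m - mu_n| / sqrt((var_m + var_n) / 2).
    """
    pooled_var = 0.5 * (pvariance(genuine) + pvariance(impostor))
    return abs(mean(genuine) - mean(impostor)) / math.sqrt(pooled_var)


clean = d_prime([8.0, 10.0, 12.0], [2.0, 4.0, 6.0])
noisy = d_prime([5.0, 8.0, 11.0], [2.0, 5.0, 8.0])
```

In the second call the means are closer and the spreads larger, so the statistic drops, mirroring the degradation expected as noise increases.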

Optimal fusion with ancillary measures
We have combined the reliability and the separability measures using the mean (average) rule. Let $\rho_F$ and $\rho_V$ denote the inter-/intra-class distance measures obtained from the fingerprint and voice modalities, respectively. The reliability measures obtained from the two modalities are $\lambda_F$ and $\lambda_V$. $d_F$ and $d_V$ denote the d-prime separability measures obtained from the preprocessed scores of the fingerprint and voice modalities, respectively. The ancillary information estimated in the training stage is shown in Table 1, and it is evident from the table that the reliability and the separability measures decrease as the noise increases. The following parameters are defined to obtain the fused scores, and Table 1 presents their numerical values.
The multi-normalized match scores from the two modalities are combined by the weighted sum rule to produce the final decision scores. Given the speaker scores $S^{(vc)}$ and the finger scores $S^{(fc)}$, the fused scores can be obtained by linearly combining the two scores.
The weighting factor γ (0 ≤ γ ≤ 1) determines the contribution of each modality to the final decision. Even though the integration weight given by Equation 22 can improve noise robustness under certain noise conditions, it is not always optimal. Hence, a modified integration weight β, given by Equation 24, is employed to obtain better performance under low SNR conditions [20], where $x_{opt}$ is the scaling factor that needs to be optimized. In order to emphasize or de-emphasize the scores obtained from the unimodal systems, the integration weight factor must be adaptive and optimal. That is, the weights must be appropriate for, and adapt to, the fluctuating inputs. We therefore propose optimization techniques for estimating the optimal integration weights for fusion. The optimal integration weights are obtained in the training/validation stage using 'leave-one-out' cross validation. The proposed method systematically chooses the best scaling factor $x_{opt}$ from a defined domain ($0 \le x_{opt} \le 1$) so as to maximize the objective function (recognition accuracy) [20,30]. The objective function is given by Equation 25, where $C_{Mat}$ is the confusion matrix. We have employed a direct search optimization method (grid search) and random search optimization methods (genetic algorithm and particle swarm optimization) for finding the optimal integration weight β. The following subsections give a brief overview of the methods employed.

Grid search
The directed one-dimensional grid search (GS) method determines the minimum of a real-valued function based on an initial estimate of the location of the minimum point from the lower ($L$) and upper ($U$) bounds of the decision variable $x_{opt}$. This method involves setting up a grid in the decision space and evaluating the objective function at each grid point. The point that corresponds to the best value of the objective function is considered to be the optimum solution. The one-dimensional grid search method can be formulated as the mapping $f: \mathbb{R}^1 \to \mathbb{R}^1$ such that $L \le x_{opt_1} \le \cdots \le x_{opt_n} \le U$, where $x_{opt_1}, \ldots, x_{opt_n}$ are the $n$ test points [33]. The number of test points $n$ in each iteration step determines the rate of convergence of the algorithm.
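A single pass of such a grid search over a fusion weight can be sketched in Python as follows. This is a simplified illustration, not the paper's procedure: it searches the weight of a weighted-sum fusion directly over [0, 1] and maximizes identification accuracy on labelled validation samples, whereas the paper optimizes the scaling factor $x_{opt}$ inside the modified weight β. Assigning the weight `w` to the voice scores and `1 - w` to the fingerprint scores, as well as all names and the toy data, are assumptions.

```python
def fuse(voice, finger, w):
    """Weighted-sum fusion of two normalized score vectors (one score per class)."""
    return [w * v + (1.0 - w) * f for v, f in zip(voice, finger)]


def accuracy(samples, w):
    """Fraction of samples whose highest fused score falls on the true class."""
    correct = 0
    for voice, finger, label in samples:
        fused = fuse(voice, finger, w)
        predicted = max(range(len(fused)), key=fused.__getitem__)
        correct += predicted == label
    return correct / len(samples)


def grid_search(samples, lower=0.0, upper=1.0, n=11):
    """Evaluate n equally spaced weights and return the first best one."""
    grid = [lower + i * (upper - lower) / (n - 1) for i in range(n)]
    return max(grid, key=lambda w: accuracy(samples, w))


# Each validation sample: (voice scores, fingerprint scores, true class index).
validation = [
    ([0.9, 0.1], [0.4, 0.6], 0),  # voice is right, fingerprint is wrong
    ([0.7, 0.3], [0.2, 0.8], 1),  # fingerprint is right, voice is wrong
]
best_w = grid_search(validation)
```

In this toy set neither modality alone classifies both samples correctly, but an intermediate weight does, so the search settles on a value strictly between the two extremes.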

Genetic algorithm
The genetic algorithm (GA) is a directed random search technique modelled on the natural evolution/selection process towards the survival of the fittest. It efficiently utilizes historical information to obtain new search points with expected enhanced performance. In every generation, a new set of artificial individuals is created using information from the best of the old generation. The genetic algorithm combines the survival of the fittest from the old population with a randomized information exchange that helps to form new individuals with higher fitness. The algorithm consists of initialization, evaluation, reproduction (selection), crossover and mutation. GA is expected to find the global minimum solution even when the objective function has several extrema, including local extrema and saddle points. The final solution gives the integration weight scale factor for the score level fusion [34].

Particle swarm optimization
The particle swarm optimization (PSO) algorithm is a stochastic optimization strategy inspired by the social behaviour of flocks and swarms. Here, the underlying concept is that, at every time instant, the velocity of each particle (potential solution) is adjusted toward its pbest and lbest locations [35]. The particle associated with the best solution (fitness value) acts as the leader, and each particle keeps track of the coordinates in the solution space associated with the best fitness it has achieved so far. This fitness value is stored and is referred to as pbest. Another 'best' value tracked by the particle swarm optimizer is the best value obtained so far by any particle in the neighbourhood of the particle. This location is called lbest. When a particle takes the whole population as its topological neighbours, the best value is a global best and is called gbest. The algorithm is as follows:
1. Randomly generate initial candidate solutions.
2. Assign the position and velocity of the associated particles randomly.
3. Evaluate the fitness (objective function) of each particle.
4. Compare each particle's objective function value with that particle's personal best value. If it is better, update pbest and record the current position as the particle's personal best position.
5. Find the lowest objective function value among all the particles. If the value is better than gbest, replace gbest with this objective function value and record the global best position.
6. Change velocities and positions [35].
The velocity and position updates are given by
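For illustration, a textbook form of these updates is v ← w·v + c1·r1·(pbest − x) + c2·r2·(gbest − x), followed by x ← x + v, where r1 and r2 are uniform random numbers in [0, 1]. The Python sketch below applies this form to a one-dimensional search. It is an assumption-laden sketch rather than the paper's exact variant: it uses the gbest topology, an inertia weight `w`, acceleration constants `c1` and `c2` with commonly used values, position clamping to the search bounds, and it maximizes the objective (as is natural for recognition accuracy) rather than minimizing it. All names and parameter values are hypothetical.

```python
import random


def pso_maximize(f, lower=0.0, upper=1.0, n_particles=20, n_iter=50,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """One-dimensional gbest PSO that maximizes f over [lower, upper]."""
    rng = random.Random(seed)
    span = upper - lower
    # Steps 1-2: random initial positions and small random velocities.
    pos = [rng.uniform(lower, upper) for _ in range(n_particles)]
    vel = [rng.uniform(-0.1 * span, 0.1 * span) for _ in range(n_particles)]
    # Step 3: evaluate fitness and initialise personal and global bests.
    pbest = pos[:]
    pbest_val = [f(x) for x in pos]
    g = max(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g], pbest_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            # Step 6: velocity and position updates (textbook form).
            vel[i] = (w * vel[i]
                      + c1 * r1 * (pbest[i] - pos[i])
                      + c2 * r2 * (gbest - pos[i]))
            pos[i] = min(upper, max(lower, pos[i] + vel[i]))
            val = f(pos[i])
            # Steps 4-5: update personal best, then global best.
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i], val
                if val > gbest_val:
                    gbest, gbest_val = pos[i], val
    return gbest, gbest_val


# Toy objective with a single peak at x = 0.3 (stands in for accuracy vs. x_opt).
best_x, best_val = pso_maximize(lambda x: -(x - 0.3) ** 2)
```

Because the recognition accuracy is piecewise constant in the weight, a real run would typically see many particles tie on the best value; the smooth toy objective here is used only to keep the example easy to follow.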

Results and discussions
Finger images from the FVC2002 fingerprint database [24] and voice samples from the ELSDSR database [36] have been employed for the experimentation. The ELSDSR database contains nine text-independent speech samples from each of twenty-three persons. Accordingly, finger images of twenty-three different persons with nine impressions per finger are considered. Out of the nine impressions per finger and the nine speech samples for each individual, seven samples are used for training the individual classifiers and two samples are used for testing. This choice is made to improve the recognition accuracy even under adverse conditions. Since the number of available biometric samples is limited, 'leave-one-out' cross validation is employed to fine-tune the training/validation phase and estimate the best optimal weights under various noise conditions. As the fingerprint biometric is more robust, the performance of the fingerprint classifier under varying noise conditions is not considered. We define noise as 'any unwanted change in the signal'. The influence of noise on the clean voice samples is modelled by adding AWGN to the voice samples.
The performance of the system under varying SNR conditions is considered systematically from the feature extraction and model building stage to the testing stage. MFCC feature vectors of order 12, 16 and 20 and GMMs with 12 and 16 mixtures are considered for the simulation studies, as they are widely used. Different model combinations are considered in order to select the model that gives the best recognition accuracy. The voice model with 16 MFCCs and 12 Gaussian mixtures gives improved recognition accuracy under normal operating conditions (−10 to 20 dB SNR). So, this model combination is considered for the experimental study. The outputs of the two classifiers are consolidated into a single vector of scores using the weighted sum rule of fusion. We have compared the performance of the proposed method with baseline techniques such as bimodal systems with equal weighting, the optimal integration weight estimation scheme without ancillary measures [21] and integration weight estimation using reliability measures [20,22].

Equal prior weights
This method weighs the classifiers equally, without making any prior assumptions about the quality of each data source. We have weighted the contributions of the fingerprint and voice data equally for the identification problem. A constant value of γ = 0.5 is assigned as the integration weight at all SNR conditions. The score transformation is achieved by the following equation: $S = 0.5\,S^{(fc)} + 0.5\,S^{(vc)}$, where S denotes the fused scores, S^(fc) the finger scores and S^(vc) the voice scores. The testing accuracy of this technique is presented in Table 2. This technique does not favour one modality over the other. At low SNR, the fusion system starts to exhibit catastrophic fusion. From the score density plot (Figure 2), it is evident that as the noise increases, the overlap between the genuine and the impostor distribution curves increases. The detection error tradeoff (DET) curves in Figure 3 show the FAR and FRR performances.
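As a minimal sketch of the equal-weight sum rule followed by a maximum-similarity decision, assuming both score vectors are already normalized to a common range: the function names and score values below are hypothetical. The code applies γ to the voice scores, which is an assumption; at γ = 0.5 the choice makes no difference.

```python
def fuse_scores(finger_scores, voice_scores, gamma=0.5):
    """Weighted sum rule; gamma weights the voice scores (assumed convention)."""
    return [gamma * v + (1.0 - gamma) * f
            for f, v in zip(finger_scores, voice_scores)]

def identify(fused_scores):
    """Index of the enrolled class with the highest fused score."""
    return max(range(len(fused_scores)), key=fused_scores.__getitem__)

# Hypothetical normalized scores of one probe against three enrolled persons.
finger = [0.20, 0.90, 0.40]
voice = [0.70, 0.60, 0.10]
fused = fuse_scores(finger, voice)  # equal weighting, gamma = 0.5
predicted = identify(fused)         # class with the largest fused score
```

Because the weight is fixed, a heavily degraded voice score contributes as much as a clean fingerprint score, which is consistent with the catastrophic fusion observed at low SNR.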

Fusion with optimal integration weight without ancillary information
The performance of the proposed method is compared with the optimal integration weight estimation technique reported in [21]. In [21], a blind optimization of the parameter β (0 ≤ β ≤ 1) was performed to maximize the recognition accuracy of the multimodal system. The recognition accuracy (testing) of this method is shown in Table 2, where the abbreviation IWWAM denotes 'integration weight without ancillary measures'. This method performs better than the sum rule of fusion with equal weighting. Moreover, it is more accurate than either unimodal system under normal operating conditions and maintains the accuracy of the better unimodal system under all adverse conditions. Further insight can be obtained from the DET plots in Figure 4. The disadvantage of the method is that, under extreme noise conditions, the fusion module assigns zero weight to the voice modality and there is no substantial reduction in the FAR and FRR. The score density plots of this method have been presented in [21]. It is evident from those plots that as the noise increases, the overlap between the genuine and the impostor score distributions also increases, and under low SNRs the method shows its worst-case performance.
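A blind search of this kind can be sketched as an exhaustive grid search over β that keeps the value giving the highest identification accuracy on validation data. This is an illustration under stated assumptions, not the implementation of [21]: the helper names, the step size and the score values are hypothetical, and β is assumed to weight the voice scores.

```python
def identification_accuracy(finger, voice, labels, beta):
    """Share of probes whose highest fused score falls on the true class."""
    correct = 0
    for f_row, v_row, label in zip(finger, voice, labels):
        # Assumed convention: beta weights voice, (1 - beta) weights fingerprint.
        fused = [beta * v + (1.0 - beta) * f for f, v in zip(f_row, v_row)]
        if max(range(len(fused)), key=fused.__getitem__) == label:
            correct += 1
    return correct / len(labels)

def grid_search_beta(finger, voice, labels, steps=100):
    """Try beta = 0, 1/steps, ..., 1 and keep the first value with the best accuracy."""
    best_beta, best_acc = 0.0, -1.0
    for i in range(steps + 1):
        beta = i / steps
        acc = identification_accuracy(finger, voice, labels, beta)
        if acc > best_acc:
            best_beta, best_acc = beta, acc
    return best_beta, best_acc

# Hypothetical validation scores: 3 probes (rows) against 2 enrolled classes.
finger = [[0.5, 0.6], [0.1, 0.9], [0.8, 0.3]]
voice = [[0.9, 0.1], [0.6, 0.5], [0.7, 0.2]]
labels = [0, 1, 0]
beta_opt, acc_opt = grid_search_beta(finger, voice, labels)
```

In this toy data each modality misclassifies one probe on its own, so only an intermediate β classifies all three probes correctly.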

Optimal fusion with reliability measures
We have also compared the proposed method with the reliability-based optimal integration weight estimation scheme presented in [20]. A direct implementation of the techniques reported in [20] was presented in [22] for fusing fingerprint and voice biometrics. Even though this method performs better, in terms of recognition accuracy and FAR, than both the sum rule of fusion with equal weighting and the method discussed in subsection 5.1.2, it shows attenuating fusion under extreme noise conditions [22]. This is evident from the testing accuracy given in Table 2. The DET plots in Figure 5 reveal that the FAR shows a more pronounced reduction when the quality measures are used for finding the optimal integration weight. A notable disadvantage of this method is that it does not achieve a considerable reduction in the FRR. Figure 6 shows the score density plots for this method.
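For reference, the FAR and FRR compared in these subsections can be computed from genuine and impostor scores once a decision threshold is fixed. The sketch below assumes similarity scores, where a higher value means a better match; the helper name `far_frr` and the score values are hypothetical.

```python
def far_frr(genuine, impostor, threshold):
    """FAR: share of impostor scores accepted; FRR: share of genuine scores rejected."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr

# Hypothetical fused similarity scores.
genuine = [0.91, 0.85, 0.78, 0.55, 0.95]
impostor = [0.10, 0.35, 0.60, 0.20, 0.15]

# Each threshold gives one (FAR, FRR) operating point of a DET curve.
det_points = [(t, far_frr(genuine, impostor, t)) for t in (0.3, 0.5, 0.7)]
```

Raising the threshold lowers the FAR at the cost of a higher FRR, which is why a method can reduce one error rate without a comparable reduction in the other.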

Fusion with separability measures
To improve the recognition performance of the multibiometric system, we have presented a multi-normalization-based integration weight estimation scheme using separability measures in [22,37]. This is a non-optimization technique. The fingerprint and speech similarity scores are pre-processed with cohort and tanh normalization methods, respectively. Table 2 and Figure 7 show the results. This method improves the recognition accuracy [22,37] and reduces the FAR and FRR (Figure 7) compared to the baseline techniques, the optimization technique and the reliability-based integration weight estimation technique. One limitation of this method lies in the cohort selection for normalization. Additionally, the FAR needs further reduction.
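To illustrate what a separability measure captures, the sketch below computes a d-prime statistic from genuine and impostor scores. The exact definition used in [22,37] is not reproduced in this section, so the commonly used form d' = |μ_g − μ_i| / sqrt((σ_g² + σ_i²)/2) is assumed here; the function name and all score values are hypothetical.

```python
import math
import statistics

def d_prime(genuine, impostor):
    """Separation of two score distributions in units of their pooled spread."""
    mu_g, mu_i = statistics.mean(genuine), statistics.mean(impostor)
    var_g, var_i = statistics.pvariance(genuine), statistics.pvariance(impostor)
    return abs(mu_g - mu_i) / math.sqrt((var_g + var_i) / 2.0)

# Hypothetical scores of one modality under clean and noisy conditions.
d_clean = d_prime([0.90, 0.85, 0.95, 0.88], [0.10, 0.15, 0.20, 0.12])
d_noisy = d_prime([0.60, 0.40, 0.75, 0.50], [0.35, 0.55, 0.30, 0.45])
```

A larger value indicates better-separated genuine and impostor distributions, so the ratio of such values for the two modalities is one way to express their relative degradation.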

Fusion with the proposed method
From the training/validation stage, we have obtained the optimal integration weights β = x_opt × γ for different noise conditions (−10 to 20 dB SNR). The integration weights are estimated so as to maximize the recognition accuracy. We have applied the GS, GA and PSO techniques for optimizing the integration weight factor. The relative ancillary ratio estimates of the two modalities and the optimal integration weight β estimated under various noise conditions are presented in Table 1.
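The optimization of the weight factor can be sketched with a minimal one-dimensional particle swarm that searches for x_opt so that β = x_opt × γ maximizes the validation accuracy. This is a sketch under stated assumptions rather than the configuration used in the experiments: the swarm size, the inertia and acceleration constants, the value of γ, the score values and the convention that β weights the voice scores are all hypothetical.

```python
import random

def pso_maximize(objective, lower, upper, particles=10, iterations=30, seed=0):
    """Minimal 1-D global-best particle swarm; returns (best_x, best_value)."""
    rng = random.Random(seed)
    # Spread the initial particles evenly so both bounds are always evaluated.
    xs = [lower + (upper - lower) * i / (particles - 1) for i in range(particles)]
    vs = [0.0] * particles
    pbest_x = xs[:]
    pbest_f = [objective(x) for x in xs]
    g = max(range(particles), key=pbest_f.__getitem__)
    gbest_x, gbest_f = pbest_x[g], pbest_f[g]
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia and acceleration constants (assumed)
    for _ in range(iterations):
        for i in range(particles):
            vs[i] = (w * vs[i]
                     + c1 * rng.random() * (pbest_x[i] - xs[i])
                     + c2 * rng.random() * (gbest_x - xs[i]))
            xs[i] = min(upper, max(lower, xs[i] + vs[i]))  # clamp to the bounds
            f = objective(xs[i])
            if f > pbest_f[i]:
                pbest_x[i], pbest_f[i] = xs[i], f
                if f > gbest_f:
                    gbest_x, gbest_f = xs[i], f
    return gbest_x, gbest_f

# Hypothetical validation scores (rows: probes, columns: enrolled classes).
finger = [[0.5, 0.6], [0.1, 0.9], [0.8, 0.3]]
voice = [[0.9, 0.1], [0.6, 0.5], [0.7, 0.2]]
labels = [0, 1, 0]
gamma = 0.4  # hypothetical weight derived from the ancillary measures

def accuracy_for(x):
    beta = x * gamma  # beta = x_opt * gamma, applied to the voice scores (assumed)
    hits = 0
    for f_row, v_row, label in zip(finger, voice, labels):
        fused = [beta * v + (1.0 - beta) * f for f, v in zip(f_row, v_row)]
        hits += max(range(len(fused)), key=fused.__getitem__) == label
    return hits / len(labels)

# Restrict x so that beta stays within [0, 1].
x_opt, best_acc = pso_maximize(accuracy_for, 0.0, 1.0 / gamma)
beta = x_opt * gamma
```

Identification accuracy is piecewise constant in x, so many values of x are equally good; a grid search or a genetic algorithm could be substituted for `pso_maximize` without changing the objective.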
The β values thus estimated during the training/validation stage are used for testing. We have employed the maximum similarity classifier, which assigns each test sample to the class with the highest matching score. The overall testing accuracy of the proposed method is given in Table 3. It is evident from Table 3 that the proposed method eliminates catastrophic fusion under extreme noise conditions (0, −5 and −10 dB). The ancillary information provided by the reliability and separability ratios effectively captures the relative noise degradation of the individual modalities. This helps to weight the complementary modalities based on their relative degradation, which further improves the overall efficiency of the system. Even though the method discussed in subsection 5.1.4 shows similar performance in terms of recognition accuracy, the FRR of that method needs further reduction. This is achieved with the proposed method. An added advantage of the proposed technique is that both classification errors, the FAR and the FRR, are reduced considerably even under low SNR conditions. The DET performance plots (Figures 8, 9 and 10) and the score density plot (Figure 11) highlight this observation. This performance improvement is mainly due to the combined effects of the multi-normalization-based score preprocessing technique and the optimal weighting strategy using ancillary information. Here, the genuine scores from the fingerprint matcher are transformed with tanh normalization and the voice matching scores are transformed with min-max normalization. The use of the tanh normalization technique plays a vital role in reducing the overlap between the score density plots.
It is noticed that this transformation improves the separation between the genuine and impostor score distribution curves. It clamps the genuine score values, as the mean and standard deviation of the genuine scores are used for tanh normalization [13]. In our experiment, the standard deviation of the genuine scores was estimated for various noise conditions. We observe that the genuine fingerprint scores have a standard deviation that is approximately 10 times the standard deviation of the genuine voice scores. Hence, the constant factor in the tanh normalization was set to 0.1 [13,38]. A recognition accuracy of 100% does not always indicate that the classification errors FAR and FRR are zero. The operating point used for calculating the FAR and FRR from the DET curve determines their values. This is because, if any of the genuine scores is comparable to the impostor score values (or is less than the decision threshold) even after score normalization, crossover may occur in the score density plots and the FAR and FRR take non-zero values. The experimental studies therefore reveal that the overall performance of the multibiometric system is improved when the multi-normalization-based score preprocessing technique and ancillary quality measures are employed in a unified framework for finding the optimal integration weights. This method successfully overcomes attenuating fusion ('catastrophic fusion') under various noise conditions. The score density plots (Figure 11) indicate that the proposed method effectively reduces the overlap between the genuine and the impostor score distributions, which also reduces the classification errors.

Figure 11 Score density plots for GA-based optimization (Proposed). Axes: fused score y for −10 dB versus p(y); curves: genuine scores and impostor scores.
http://asp.eurasipjournals.com/content/2014/1/10
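The two normalization schemes discussed above can be sketched as follows. The tanh form 0.5 × (tanh(k × (s − μ)/σ) + 1), with μ and σ taken from the genuine training scores and k = 0.1, is the commonly cited tanh-estimator form and is assumed here rather than copied from the experimental code; the function names and all score values are hypothetical.

```python
import math
import statistics

def tanh_normalize(scores, genuine_scores, k=0.1):
    """Map scores into (0, 1) using the mean and spread of the genuine scores."""
    mu = statistics.mean(genuine_scores)
    sigma = statistics.pstdev(genuine_scores)
    return [0.5 * (math.tanh(k * (s - mu) / sigma) + 1.0) for s in scores]

def minmax_normalize(scores, lo, hi):
    """Map scores linearly so that lo -> 0 and hi -> 1."""
    return [(s - lo) / (hi - lo) for s in scores]

# Hypothetical raw fingerprint scores and genuine training scores.
finger_genuine_train = [110.0, 130.0, 125.0, 115.0]
finger_raw = [120.0, 35.0, 60.0]
finger_norm = tanh_normalize(finger_raw, finger_genuine_train)

# Hypothetical raw voice log-likelihood scores with training-set extremes.
voice_raw = [-42.0, -55.0, -48.0]
voice_norm = minmax_normalize(voice_raw, lo=-60.0, hi=-40.0)
```

Both outputs lie in a common range, so they can be combined directly with the weighted sum rule; the min-max bounds would normally come from the training scores, since unseen test scores may fall outside them.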

Statistical significance test
To compare the performance of the different fusion methods, we have used the Friedman test [39]. This non-parametric test is suited for multi-class data with any sample distribution. We have considered n = 7 data sets (different noise conditions) and k = 5 fusion methods to test whether the proposed fusion method performs better than the other methods. The null hypothesis is that the mean accuracy of the proposed method is equal to the mean accuracies of the other fusion methods. The null hypothesis is rejected when the P value is small (usually < 0.05). Using k − 1 = 4 degrees of freedom, we obtain a P value of 9.173 × 10⁻⁵. This is much smaller than the significance level of 0.05, so we accept the alternative hypothesis that the mean accuracies of the weighted mean fusion method differ from those of the other methods. Using the mean ranks and the standard deviation obtained from the Friedman test, a multiple comparison test has been carried out using one-way ANOVA to compare the performance of the different fusion methods. Figure 12 shows which means (and the comparison intervals around them) are significantly different and which are not. It can be seen that, in terms of mean rank, the proposed method outperforms all other methods except the one discussed in subsection 5.1.4 (separability method with cohort normalization).
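The mechanics of the Friedman test for n = 7 conditions and k = 5 methods can be sketched as below. The accuracy table is a synthetic placeholder and does not reproduce the results in Tables 2 and 3; the function name is hypothetical, no tie correction is applied, and the closed-form P value is valid only for an even number of degrees of freedom (here k − 1 = 4).

```python
import math

def friedman_test(table):
    """table[i][j]: accuracy of method j under condition i.

    Returns (chi-square statistic, P value, mean rank per method).
    """
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        order = sorted(range(k), key=row.__getitem__)
        ranks = [0.0] * k
        pos = 0
        while pos < k:
            end = pos
            # Group tied values and give each the average of the ranks they span.
            while end + 1 < k and row[order[end + 1]] == row[order[pos]]:
                end += 1
            avg_rank = (pos + end) / 2.0 + 1.0
            for idx in order[pos:end + 1]:
                ranks[idx] = avg_rank
            pos = end + 1
        for j in range(k):
            rank_sums[j] += ranks[j]
    chi2 = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    df = k - 1
    if df % 2:
        raise ValueError("this sketch only handles an even number of degrees of freedom")
    # Chi-square survival function in closed form for even df.
    half = chi2 / 2.0
    p = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))
    return chi2, p, [r / n for r in rank_sums]

# Synthetic placeholder accuracies (%): 7 noise conditions x 5 fusion methods.
table = [[50 + 5 * i + 2 * j for j in range(5)] for i in range(7)]
chi2, p, mean_ranks = friedman_test(table)
```

In practice a library routine such as `scipy.stats.friedmanchisquare` can be used instead; it also corrects for ties, which this sketch omits.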

Conclusions
Integration weight optimization techniques based on multi-normalization and ancillary measures are proposed here for improving the performance of the fingerprint and voice biometric system. Experimental studies have been carried out under various noise conditions from −10 to 20 dB SNR. The matching score preprocessing technique based on multi-normalization effectively suppresses unwanted peculiarities in the raw similarity computation of the individual matchers. The use of tanh and min-max normalization schemes with the fingerprint and voice matching score vectors helps in reducing the overlap between the genuine and the impostor score distributions, which in turn reduces the classification errors. Moreover, by estimating the best integration weight (γ) using ancillary measures derived from the feature space and the score space, we could weigh the two modalities based on their relative degradation. The optimal integration weights (β) are estimated in the training/validation stage using optimization techniques such as grid search, genetic algorithm and particle swarm optimization. Hence, by incorporating ancillary measures within the multi-normalization framework, we could achieve better recognition performance even at low SNR conditions. The proposed method outperforms the baseline systems in terms of recognition accuracy, FAR and FRR. The proposed system performs more reliably in controlled environments, such as offices and laboratories, than in uncontrolled environments, such as outdoors. This work can be extended to databases with a larger number of subjects, and the fusion study can be conducted in the two recognition modes of identification and verification. Thus, the benefits of this work span a wide range of areas that are capable of improving the quality of life of people.
As this method can reduce the FAR and FRR, it may be highly suitable for applications like sharing networked computer resources, granting access to nuclear facilities, performing remote financial transactions, and forensic applications like criminal investigation, parenthood determination, etc.