 Research
 Open Access
On combining multinormalization and ancillary measures for the optimal score level fusion of fingerprint and voice biometrics
EURASIP Journal on Advances in Signal Processing volume 2014, Article number: 10 (2014)
Abstract
In this paper, we consider the utility of multinormalization and ancillary measures for the optimal score level fusion of fingerprint and voice biometrics. An efficient matching score preprocessing technique based on multinormalization is employed for improving the performance of the multimodal system under various noise conditions. Ancillary measures derived from the feature space and the score space are used in addition to the matching score vectors for weighing the modalities based on their relative degradation. Reliability (dispersion) and separability (inter/intraclass distance and d-prime statistics) measures under various noise conditions are estimated from the individual modalities during the training/validation stage. The ‘best integration weights’ are then computed by algebraically combining these measures using the weighted sum rule. The computed integration weights are then optimized against the recognition accuracy using techniques such as grid search, genetic algorithm and particle swarm optimization. The experimental results show that the proposed biometric solution leads to considerable improvement in the recognition performance even under low signal-to-noise ratio (SNR) conditions and reduces the false acceptance rate (FAR) and false rejection rate (FRR), making the system useful for security as well as forensic applications.
1 Introduction
The recognition accuracy of a biometric system is highly sensitive to the quality of the biometric input. Noisy data can result in a significant reduction in the performance of the system. One of the main problems associated with biometric systems is the undesired variation in the biometric data. These variations can arise from a variety of factors, such as the sensors used in capturing the biometric data and various non-ideal operating conditions such as background noise and non-uniform illumination [1–9]. Multimodal systems are more robust to environmental and sample quality variations due to the presence of multiple sources of evidence [10–12]. This is an added advantage of multibiometric systems. Compared to fingerprint systems, voice recognition systems are severely degraded by the presence of noise and intraclass variations. They are also strongly affected by behavioural and physiological factors. These variations are often reflected in the matching scores, which in turn influence the overall efficiency of the biometric system [13]. All these factors make the reduction of the error rates of a biometric system a challenging enterprise.
The score outputs of a classifier often show tremendous variations when presented with feature vectors corrupted by noise. In this scenario, some impostors will obtain higher scores and some genuine users will obtain lower scores compared to clean conditions, thereby increasing the FAR and FRR [11]. Most matchers have to deal with such situations in real time in spite of the enhancement algorithms and the feature sets they use. In order to reduce the classification errors, this paper aims at quantifying the amount of trust that can be placed in each individual classifier’s decision, taking into consideration the effect of environmental noise conditions and the behaviour of the classifiers on evaluation data. Here, we present a new combinational approach for fusing the scores derived from fingerprint and voice biometric matchers, with multinormalization and weighting measures derived from ancillary information. As the matching score values from the fingerprint and the voice matchers follow non-homogeneous statistical distributions, we have employed tanh and min-max normalization techniques, respectively, for the complementary modalities [14]. Ancillary information includes measures indicating the quality of the acquired biometric samples or certain additional information about the user [11]. Here, the relative quality information from the individual classifiers is used in the fusion process.
In multimodal systems, confidence measures [15] are widely used as integration weights for biometric fusion. The weight vector represents the weight assigned to the matching score vectors. In a multibiometric system, the weight vector represents the relative importance of the different biometric matchers, provided that the scores of the matchers have been normalized [6]. The proposed technique involves emphasizing or de-emphasizing the matching scores of the individual modalities, depending on the estimate of their relative degradation. Let us assume that during a particular access attempt by the user, the fingerprint image is of poor quality but the voice samples are sufficiently good. In this case, we can assign a higher weight to the voice matching result and a lower weight to the fingerprint matching result. Even for the same biometric modality, different representations and matching algorithms may exhibit different levels of sensitivity to the quality of the biometric data [6]. The aim of this fusion technique is to combine the information from the fingerprint and voice classifiers such that the resulting performance is greater than or equal to the performance of the best individual source. Anything less than this is termed ‘catastrophic fusion’ [16], and it is of course undesirable. The inter/intraclass separability measures derived from the feature space, and the reliability (dispersion) as well as the d-prime separability measures from the match score space, are estimated separately for each noise condition in the training/validation phase using the ‘leave-one-out’ cross-validation technique. These measures are then algebraically combined to differentially weigh each subsystem to improve the performance of fusion. The basic assumption in this experiment is that the fingerprint biometric trait has higher permanence than voice; hence, its performance under various noise conditions is not explored.
As the quality of the voice biometric degrades with noise, its performance under varying noise conditions is demonstrated by artificially degrading the training/testing samples with additive white Gaussian noise (AWGN). We have considered voice samples with noise contents varying from −10 to 20 dB SNR. We have compared the performance of the proposed method with the baseline techniques on score level fusion. The experimental results show that the optimal integration weights estimated using multinormalization and ancillary measures improve the performance of the multibiometric system even under low SNR conditions.
1.1 Previous work
Though a lot of work has been done in biometric fusion, little has been done to improve the efficiency of multimodal systems under various noise conditions. Lewis et al. shed some light on audio-visual speech recognition systems using dispersion measures as the integration weights [16]. These measures are based upon the values assigned to the individual classes by the matcher module. Poh et al. proposed a margin-derived confidence measure while fusing two system opinions [15]. Jain et al. examined the effect of different score normalization techniques on the performance of the multimodal biometric system [13]. Kryszczuk et al. proposed a method of performing multimodal fusion using face and speech data, combining signal quality measures and reliability estimates [17]. Bendris et al. introduced quality measures in audio-visual identity verification [18]. Alsaade et al. showed that score normalization and quality-based fusion improve the accuracy of multimodal biometrics [2]. Optimal integration weight estimation using the least squares technique was reported in [19]. Reliability-based optimal integration weight estimation for audio-visual decision fusion was reported in [20]. In our earlier work, we presented an optimal integration weight estimation scheme for fingerprint and voice biometrics under various noise conditions, without using ancillary information [21]. We also presented a reliability-based optimal integration weight estimation scheme for the fingerprint and voice modalities in [22]. In this paper, we propose an efficient integration weight optimization strategy incorporating both the reliability measure from the score space (dispersion measure) and the separability measures from the feature space (inter/intraclass distance) and score space (d-prime statistic). The optimal integration weights are estimated using a multinormalization framework [14].
1.2 Major contributions
The major contributions of this work are as follows:

1.
We have proposed an efficient multinormalization-based matching score preprocessing technique to transform the scores obtained from the individual modalities, for reducing the classification errors (the overlap between the genuine and the impostor score distributions).

2.
Ancillary information such as reliability (from the score space) and separability (from the feature space and score space) measures are combined algebraically to find the ‘best integration weights’ (γ) for fusion. Thus, we have utilized the rate of relative degradation of the samples from the feature space and the expert score space for finding the ‘best integration weights’.

3.
The ‘optimal integration weights’ (β) are estimated in the training/validation stage. A direct search technique (grid search) and random search techniques (the genetic algorithm and particle swarm optimization) are used to find the ‘optimal integration weights’. The ‘optimal integration weights’ thus obtained in the training stage are used as the integration weights for fusion in the testing stage.
To the best of our knowledge, the proposed optimal fusion strategy using multinormalization, ancillary measures, and optimization techniques has not been attempted until now.
1.3 Organization of the paper
The rest of this paper is structured as follows: in Section 2, the modelling and pattern-matching approaches used with the fingerprint and voice modalities are briefly discussed. The proposed method is presented in Section 3. The optimal fusion using ancillary measures is detailed in Section 4. Experimental results are described in Section 5. Finally, a brief summary is presented.
2 Individual classifiers
2.1 Fingerprint classifier
We have used a minutiae-based fingerprint matching technique [23] using ridge counting. Given two sets of minutiae from the template (T) and the input fingerprint (I) images, the matching algorithm compares the minutiae points in the two images and returns a degree of similarity. Each minutia is represented as a triplet m={x,y,θ} that indicates the x, y minutiae location coordinates and the minutiae angle θ. A minutia m_{ i } in T and a minutia ${m}_{j}^{\prime}$ in I are considered matching if the spatial distance (sd) between them is less than a given tolerance r_{0} and the direction difference (dd) between them is less than an angular tolerance θ_{0} [24].
An elastic matching algorithm is used to perform matching between the two fingerprints. The match score for the reference and the test print is a normalized function of the quantities N_{pair}, M and N [24], where N_{pair} is the number of matched minutiae, M is the number of minutiae in the template set, and N is the number of minutiae in the test set. The maximum similarity criterion is used for fingerprint pattern classification.
2.2 Voice classifier
Short-time spectral analysis is used to characterize the quasi-stationary voice samples. To represent the voice samples in a parametric way, we have considered the cepstral representation, as this has been found to be a more robust and reliable feature set for voice recognition than other forms of representation [16, 20]. The number of Mel-frequency cepstral coefficients (MFCCs) is taken as 16 in this study. A Gaussian mixture model (GMM) is considered for representing the acoustic feature vectors. The mean vectors, covariances, and mixture weights parametrize the complete GMM. These parameters are collectively represented by [25]
$$\lambda =\left\{{w}_{k},{\mu}_{k},{\Sigma}_{k}\right\},\phantom{\rule{1em}{0ex}}k=1,\dots ,K$$
Thus, by using the MFCC feature vectors and the statistical GMM, each enrolled speaker is uniquely represented by a specific λ. In the training stage itself, each enrolled speaker in g, where $\mathbf{\text{g}}=\left\{\widehat{{g}_{1}},\widehat{{g}_{2}},\dots ,\widehat{{g}_{G}}\right\}$, is represented by a unique GMM (λ). In the testing stage, the features from the unknown speaker’s utterances are compared with the statistical models of the voices of speakers known to the system. The GMM approach uses the Bayes classification strategy. According to this rule, the test samples are allocated to the class $\widehat{{g}_{k}}$ having the highest posterior probability, that is [25],
$$\widehat{{g}_{k}}=\underset{1\le k\le G}{arg\phantom{\rule{0.3em}{0ex}}max}\phantom{\rule{0.3em}{0ex}}p\left(X\mid {\lambda}_{k}\right)$$
where p(X∣λ_{ k }) is the likelihood of the observation sequence X given the speaker model λ_{ k } (with equal priors, maximizing the likelihood is equivalent to maximizing the posterior probability).
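A minimal sketch of this Bayes decision rule over per-speaker GMMs, using a hand-rolled diagonal-covariance mixture log-likelihood. The function names and the toy one-dimensional models below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def gmm_loglik(X, weights, means, variances):
    """Per-frame log-likelihood of feature vectors X under a
    diagonal-covariance GMM with weights w_k, means mu_k, variances."""
    X = np.atleast_2d(X)
    logps = []
    for w, mu, var in zip(weights, means, variances):
        # log N(x; mu, diag(var)) for every frame, summed over dimensions
        ll = -0.5 * np.sum(np.log(2 * np.pi * var) + (X - mu) ** 2 / var, axis=1)
        logps.append(np.log(w) + ll)
    # Log-sum-exp over mixture components gives the mixture log-likelihood.
    return np.logaddexp.reduce(logps, axis=0)

def identify(X, models):
    """Bayes rule with equal priors: choose the speaker model lambda_k
    with the highest total log-likelihood for the observation sequence X."""
    scores = {k: gmm_loglik(X, *lam).sum() for k, lam in models.items()}
    return max(scores, key=scores.get)
```

For example, frames drawn near 0 are attributed to a model centred at 0 rather than one centred at 5.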
3 Proposed method
Various approaches are reported in the literature for improving the performance of biometric fusion at the matching score level. Most of them are based on either of the two following strategies [2, 15, 17, 18, 26]:

1.
Maximizing the separation between the genuine and impostor scores.

2.
Finding the best weighting factor for fusion.
In the proposed method, we have combined both of the above-mentioned strategies in a unified framework. As mentioned earlier, classification errors are often inevitable in biometric systems [11]. Two types of such errors are the FAR and the FRR. The decision threshold determines the FAR and FRR of the system. The FAR is the fraction of impostor scores exceeding the threshold, and the FRR is the fraction of genuine scores falling below the threshold. Figure 1 shows the genuine and impostor distributions for a typical biometric matcher. Note that, given two distributions, the FAR and the FRR cannot be reduced simultaneously by adjusting the decision threshold (t). However, the classification errors can be minimized if we minimize the overlap between the genuine and the impostor score distributions. Hence, in this paper, an efficient matching score preprocessing technique using multinormalization is proposed to improve the separation between the locations of the genuine and impostor score distributions. We also present a technique for incorporating quality measures into the matching score fusion scheme. In this approach, we have combined ancillary information such as reliability (dispersion) and separability (inter/intraclass distance ratio and d-prime statistic) measures, in addition to the matching score vectors, to find the ‘optimal integration weight’ for fusion under varying noise conditions. The performance of the proposed method is compared with the baseline techniques on score level fusion. Experimental studies reveal that the proposed weighting scheme gives improved recognition accuracy even under low SNR conditions and reduces the FAR and FRR considerably compared to the baseline systems.
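The threshold trade-off can be illustrated with a short sketch that counts FAR and FRR empirically from score samples, assuming higher scores indicate better matches; the function name is hypothetical.

```python
import numpy as np

def far_frr(genuine, impostor, t):
    """FAR: fraction of impostor scores at or above threshold t.
       FRR: fraction of genuine scores below t (similarity scores)."""
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    far = np.mean(impostor >= t)
    frr = np.mean(genuine < t)
    return far, frr
```

Raising t lowers FAR but raises FRR, and vice versa; only reducing the overlap of the two distributions reduces both.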
3.1 Multinormalization
Score normalization is essentially a transformation technique that effectively normalizes any unwanted peculiarities involved in the raw similarity computations [13]. As the matching score values from the unimodal matchers follow non-homogeneous statistical distributions, we propose different score normalization techniques for the complementary modalities employed (multinormalization) [14]. Various score normalization techniques have been employed in the literature. For a good normalization scheme, the estimates of the location and scale parameters of the matching score distribution must be robust and efficient. Not all normalization techniques are equally suited to the different match score distributions. Here, we use min-max and tanh normalization techniques for the voice and fingerprint similarity scores, respectively, to enhance the efficiency and robustness of the system under varying noise conditions. Min-max (MM) [27] transforms all the raw scores to the [0, 1] range while retaining the original score distribution except for a scaling factor. Given a set of matching scores, s={s_{ i }}, i=1,2,…,n, the normalized score ${s}_{i}^{\prime}$ is obtained by [13]
$${s}_{i}^{\prime}=\frac{{s}_{i}-min\left(\mathbf{s}\right)}{max\left(\mathbf{s}\right)-min\left(\mathbf{s}\right)}$$
where max(s) and min(s) are the maximum and the minimum values of the estimated score range. Tanh (TH) normalization is one of the robust and efficient normalization methods. The transformed scores can be obtained using
$${s}_{i}^{\prime}=\frac{1}{2}\left\{tanh\left(0.01\left(\frac{{s}_{i}-\mu \left(\mathbf{s}\right)}{\sigma \left(\mathbf{s}\right)}\right)\right)+1\right\}$$
where μ(s) and σ(s) denote the mean and standard deviation of the genuine scores, respectively. There are several versions of Equation 7 in the literature; we have adopted the version in [13]. Instead of using Hampel estimators, the mean and the standard deviation are estimated from the matching scores themselves. This is because, for a training set not containing artificially introduced outliers, the use of Hampel estimators gives nearly identical multimodal system performance to using the real values of μ(s) and σ(s) [28]. The constant 0.01 in the expression for tanh normalization determines the spread of the normalized genuine scores.
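A sketch of the two normalizers in the min-max and tanh forms described above, with μ and σ estimated from the scores themselves (as the text notes, in place of Hampel estimators); the function names are illustrative.

```python
import numpy as np

def min_max_norm(scores):
    """Min-max: map raw scores linearly onto [0, 1]."""
    s = np.asarray(scores, dtype=float)
    return (s - s.min()) / (s.max() - s.min())

def tanh_norm(scores, mu=None, sigma=None):
    """Tanh normalization; mu and sigma should come from the genuine
    score distribution (estimated from the scores themselves here).
    The constant 0.01 controls the spread of the normalized scores."""
    s = np.asarray(scores, dtype=float)
    mu = s.mean() if mu is None else mu
    sigma = s.std() if sigma is None else sigma
    return 0.5 * (np.tanh(0.01 * (s - mu) / sigma) + 1.0)
```

Scores at the estimated mean map to 0.5 under tanh normalization, with values compressed smoothly toward 0 and 1 on either side.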
3.2 Estimation of ancillary measures
In a biometric system, the smaller the overlap between the genuine and the impostor scores, the better the recognition rate. Thus, the class separability and the score reliability (dispersion) or separability measures give an indication of the quality of the biometric samples and the matcher [22]. The global recognition rate of the multibiometric system can be improved by incorporating the reliability and separability measures as integration weights in the fusion module. Here, we have considered the inter/intraclass distance measure from the feature space and the reliability as well as the d-prime separability measure from the matching score space.
3.3 Estimation of separability measures
As noted above, the separability measures, namely the inter/intraclass distance from the feature space and the d-prime statistic from the matching score space, indicate the quality of the biometric samples and the matcher, and are incorporated as integration weights in the fusion module.
3.3.1 Estimation of inter/intraclass distance
Inter/intraclass separability measures are derived from the feature space of the two modalities. This distance measure is based on the Euclidean distance between pairs of feature vectors in the training set. Here, the basic assumption is that the class-dependent distributions are such that the expectation vectors of the different classes are discriminating [29]. Let S_{ T } be a labelled training set with S_{ N } feature vectors. The classes Ω_{ k } are represented by subsets S_{ k }⊂S_{ T }, each class having S_{ k } feature vectors ($\sum {S}_{k}={S}_{N}$). Feature vectors in S_{ T } without reference to their classes are denoted by ζ_{ n }. Feature vectors in S_{ k } (i.e. vectors coming from the class Ω_{ k }) are denoted by ζ_{k,n}. The sample mean of class Ω_{ k } is given by
$${\widehat{\mu}}_{k}=\frac{1}{{S}_{k}}\sum_{n=1}^{{S}_{k}}{\zeta}_{k,n}$$
The sample mean of the entire training set is given by
$$\widehat{\mu}=\frac{1}{{S}_{N}}\sum_{n=1}^{{S}_{N}}{\zeta}_{n}$$
In order to quantify the scattering of the feature vectors in the space, we consider the scatter matrices [29]. Scatter matrices are among the most popular measures for quantifying the way feature vectors ‘scatter’ in their space. The scatter matrix gives some information about the dispersion of the feature vectors around their mean. The matrix that describes the scattering of vectors from class Ω_{ k } is
$${\mathit{SM}}_{k}=\frac{1}{{S}_{k}}\sum_{n=1}^{{S}_{k}}\left({\zeta}_{k,n}-{\widehat{\mu}}_{k}\right){\left({\zeta}_{k,n}-{\widehat{\mu}}_{k}\right)}^{T}$$
SM_{ k } is the estimate of the class-dependent covariance matrix. SM_{ k } not only provides information about the average distance of the scattering, but also gives information about the eccentricity and orientation of the scattering. The scatter matrix representing the noise, averaged over all the classes, is given by
$${\mathit{SM}}_{w}=\sum_{k}\frac{{S}_{k}}{{S}_{N}}{\mathit{SM}}_{k}$$
and the scatter matrix describing the scattering of the class-dependent sample means around the overall average is
$${\mathit{SM}}_{b}=\sum_{k}\frac{{S}_{k}}{{S}_{N}}\left({\widehat{\mu}}_{k}-\widehat{\mu}\right){\left({\widehat{\mu}}_{k}-\widehat{\mu}\right)}^{T}$$
SM_{ w } and SM_{ b } are the within-class scatter matrix and the between-class scatter matrix, respectively. SM_{ w } describes the average scattering within the classes, while SM_{ b } gives the scattering of the class-dependent sample means around the overall average.
With the above definitions, the average squared distance is proportional to the trace of the matrix SM_{ w }+SM_{ b }.
This expression indeed shows that the average distance has a contribution due to differences in expectation and a contribution due to noise. This average distance is not an adequate performance measure, since a large value of $\stackrel{\u0304}{{\rho}^{2}}$ does not imply that the classes are well separated. A performance measure well suited to express the separability of the classes is the ratio between the interclass and intraclass distance [29]:
$$J=\frac{{J}_{\text{INTER}}}{{J}_{\text{INTRA}}}=\frac{\text{trace}\left({\mathit{SM}}_{b}\right)}{\text{trace}\left({\mathit{SM}}_{w}\right)}$$
The term J_{INTER}=trace(SM_{ b }) gives the interclass distance and the term J_{INTRA}=trace(SM_{ w }) gives the intraclass distance. J_{INTER} denotes the fluctuations of the conditional expectations around the overall expectation, i.e. the fluctuations of the signal, while J_{INTRA} measures the fluctuations due to noise. Hence, $\frac{{J}_{\text{INTER}}}{{J}_{\text{INTRA}}}$ can be considered a ‘signal-to-noise ratio’ [29]. These measures quantify the separability among all the classes. The inter/intraclass separability measures are estimated from the feature space of both the fingerprint and voice modalities.
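The inter/intraclass ratio can be sketched from the scatter-matrix definitions above. The class-prior weighting S_k/S_N is an assumption consistent with averaging the noise scatter over classes, and the function name is illustrative.

```python
import numpy as np

def inter_intra_ratio(features, labels):
    """J_INTER / J_INTRA = trace(SM_b) / trace(SM_w):
    SM_w averages the class-conditional scatter (noise),
    SM_b is the scatter of class means around the grand mean (signal)."""
    X = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    grand_mean = X.mean(axis=0)
    dim = X.shape[1]
    sm_w = np.zeros((dim, dim))
    sm_b = np.zeros((dim, dim))
    for k in np.unique(labels):
        Xk = X[labels == k]
        mk = Xk.mean(axis=0)
        d = Xk - mk
        # Class-conditional covariance, weighted by the class prior S_k/S_N.
        sm_w += (Xk.shape[0] / len(X)) * (d.T @ d) / Xk.shape[0]
        diff = (mk - grand_mean)[:, None]
        sm_b += (Xk.shape[0] / len(X)) * (diff @ diff.T)
    return np.trace(sm_b) / np.trace(sm_w)
```

Well-separated, tight classes yield a large ratio; classes whose means coincide yield zero.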
3.3.2 Estimation of reliability measures
Reliability estimates have been demonstrated to be an elegant tool for incorporating quality measures into the process of estimating the probability of correctness of the decisions [16, 17, 20, 30]. They can be used as auxiliary quality information for score level fusion. The reliability gives the degree of trust in the recognition results drawn from the individual information sources [16]. In this approach, the integration weight is determined from the relative reliability of the two modalities. The reliability parameters can be measured at either the signal level or the expert score level. The score-based reliability measures are mainly categorized as score entropy, dispersion, variance, cross-classifier coherence, and score difference [31].
We have proposed a dynamic reliability measure by considering the dispersion of the scores from the fingerprint and voice matchers. When the voice samples do not contain any noise, there are large differences among the matching score values. As the voice samples become noisy, these differences tend to become small. Given that a very high level of noise in a signal is likely to produce very similar scores across all the models, we would expect the estimate of the error variance to be very small. Considering this observation, the reliability of a modality can be defined in several ways, as mentioned in [16]. The modalities’ reliability parameters are estimated based upon the variances of the matching scores. The usual measure is to calculate the variance around the best or the least score rather than the mean or median [20, 30]. Here, the reliability of each modality is calculated as
$${\lambda}_{m}=\frac{1}{N}\sum_{i=1}^{N}{\left({s}_{m,i}-\underset{i}{min}\phantom{\rule{0.3em}{0ex}}{s}_{m,i}\right)}^{2}$$
where N is the number of test samples considered from all the classes and m stands for either the fingerprint (F) or the voice (V) modality. This quantity measures the dispersion of the score values about the least score rather than the mean. The reliability of each stream should satisfy the conditions 0≤λ_{ m }≤1 and λ_{ F }+λ_{ V }=1.
The reliability measures are estimated from the score space of both the fingerprint and voice modality during the training/validating stage.
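A sketch of the dispersion-based reliability: the mean squared deviation of the scores from the least score, followed by a normalization so the two modality weights sum to one. The helper names are hypothetical, and any further normalization the paper applies is not reproduced here.

```python
import numpy as np

def reliability(scores):
    """Dispersion-based reliability: mean squared deviation of the N
    match scores from the least score (rather than from the mean).
    Noisy inputs yield similar scores across models, hence small dispersion."""
    s = np.asarray(scores, dtype=float)
    return np.mean((s - s.min()) ** 2)

def normalized_reliabilities(r_f, r_v):
    """Scale the fingerprint and voice reliabilities to sum to one,
    so they can serve directly as integration weights."""
    total = r_f + r_v
    return r_f / total, r_v / total
```

A matcher that spreads its scores widely (a confident, clean input) scores higher reliability than one whose scores bunch together under noise.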
3.3.3 Estimation of dprime separability measures
The d-prime statistic gives a measure of how well the non-match score probability density and the match score probability density are separated. We calculate the d-prime separability measure from the matching score matrices of both the modalities. The d-prime statistic provides the separation between the means of the genuine and the impostor score distributions, in units of the standard deviation of the score distributions [32]:
$${d}^{\prime}=\frac{\left|{\mu}_{m}-{\mu}_{n}\right|}{\sqrt{\left({\sigma}_{m}^{2}+{\sigma}_{n}^{2}\right)/2}}$$
where μ_{ m } is the mean of the genuine scores, ${\sigma}_{m}^{2}$ is the variance of the genuine scores, μ_{ n } is the mean of the impostor scores, and ${\sigma}_{n}^{2}$ is the variance of the impostor scores. A higher d-prime value indicates that the genuine scores can be more readily detected. This sensitivity index, measured from the score space, captures both the separation and the spread of the genuine and impostor score distributions.
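A sketch of the d-prime computation in the common pooled-variance form stated above (the function name is illustrative):

```python
import numpy as np

def d_prime(genuine, impostor):
    """d' = |mu_m - mu_n| / sqrt((sigma_m^2 + sigma_n^2) / 2):
    separation of the genuine and impostor score means in units of the
    pooled standard deviation of the two distributions."""
    g = np.asarray(genuine, dtype=float)
    n = np.asarray(impostor, dtype=float)
    return abs(g.mean() - n.mean()) / np.sqrt((g.var() + n.var()) / 2.0)
```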
4 Optimal fusion with ancillary measures
We have combined the reliability and the separability measures using the mean (average) rule. Let ρ_{ F } and ρ_{ V } denote the inter/intraclass distance measures obtained from the fingerprint and voice modalities, respectively. The reliability measures obtained from the two modalities are λ_{ F } and λ_{ V }. ${d}_{F}^{\prime}$ and ${d}_{V}^{\prime}$ denote the d-prime separability measures obtained from the preprocessed scores of the fingerprint and voice modalities, respectively. The ancillary information estimated in the training stage is shown in Table 1, and it is evident from the table that the reliability and the separability measures decrease with increasing noise. The following parameters are defined to obtain the fused scores, and Table 1 presents their numerical values.
The multinormalized match scores from the two modalities are combined by the weighted sum rule to produce the final decision scores. Given the speaker scores S^{(vc)} and the finger scores S^{(fc)}, the fused scores can be obtained by linearly combining the two scores:
$${S}^{\left(\text{fused}\right)}=\gamma \phantom{\rule{0.3em}{0ex}}{S}^{\left(\text{fc}\right)}+\left(1-\gamma \right){S}^{\left(\text{vc}\right)}$$
The weighting factor γ (0≤γ≤1) determines the contribution of each modality to the final decision. Even though the integration weight given by Equation 22 can improve noise robustness under certain noise conditions, it is not always optimal. Hence, a modified integration weight β, given by Equation 24 in terms of a scaling factor x_{opt}, is employed to obtain better performance under low SNR conditions [20]. In order to emphasize or de-emphasize the scores obtained from the unimodal systems, the integration weight factor must be adaptive and optimal. That is, the weights must be most appropriate and self-adaptive to the fluctuating inputs. Therefore, we propose optimization techniques for estimating the optimal integration weights for fusion. The optimal integration weights are obtained in the training/validation stage using ‘leave-one-out’ cross-validation. The proposed method systematically chooses the best scaling factor x_{opt} from a defined domain (0≤x_{opt}≤1) so as to maximize the objective function, the recognition accuracy computed from the confusion matrix C_{Mat} [20, 30]. We have employed a direct search optimization method (grid search) and random search optimization methods (genetic algorithm and particle swarm optimization) for finding the optimal integration weight β. The following subsections give a brief overview of the methods employed.
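The weighted sum fusion and the recognition-accuracy objective can be sketched as follows, assuming rows of the score matrix are probes and columns are enrolled classes; the function names are illustrative.

```python
import numpy as np

def fuse(finger_scores, voice_scores, gamma):
    """Weighted sum rule: gamma weighs the fingerprint scores,
    (1 - gamma) the voice scores, with 0 <= gamma <= 1."""
    return gamma * np.asarray(finger_scores) + (1 - gamma) * np.asarray(voice_scores)

def accuracy(score_matrix, true_labels):
    """Recognition accuracy: fraction of probes whose top fused score
    picks the correct class, i.e. the trace of the confusion matrix
    over the number of trials."""
    predicted = np.argmax(score_matrix, axis=1)
    return np.mean(predicted == np.asarray(true_labels))
```

An optimizer then treats `accuracy(fuse(F, V, beta), labels)` as the objective to be maximized over the integration weight.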
4.1 Grid search
The directed one-dimensional grid search (GS) method determines the optimum of a real-valued function based on an initial estimate of the location of the optimum point from the lower (L) and upper (U) bounds of the decision variable x_{opt}. This method involves setting up a grid in the decision space and evaluating the values of the objective function at each grid point. The point which corresponds to the best value of the objective function is considered to be the optimum solution. The one-dimensional grid search method can be formulated as the mapping f : R^{1}→R^{1} such that $L\le {x}_{{\text{opt}}_{1}}\le \cdots \le {x}_{{\text{opt}}_{n}}\le U$, where ${x}_{{\text{opt}}_{1}},\cdots \phantom{\rule{0.3em}{0ex}},{x}_{{\text{opt}}_{n}}$ are the ‘n’ test points [33]. The number of test points ‘n’ in each iteration step determines the rate of convergence of the algorithm.
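A minimal sketch of the one-dimensional grid search over [L, U], here maximizing the objective, as the paper does with recognition accuracy; the single-pass grid (no iterative refinement) is a simplifying assumption.

```python
import numpy as np

def grid_search(objective, lower=0.0, upper=1.0, n=101):
    """Directed 1-D grid search: evaluate the objective at n evenly
    spaced test points in [lower, upper] and keep the best one."""
    points = np.linspace(lower, upper, n)
    values = [objective(x) for x in points]
    best = int(np.argmax(values))
    return points[best], values[best]
```

With n = 101 the grid resolution is 0.01, so the returned x_opt is within half a grid step of the true optimum for a smooth objective.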
4.2 Genetic algorithm
The genetic algorithm (GA) is a directed random search technique modelled on the natural evolution/selection process and the survival of the fittest. It efficiently utilizes historical information to obtain new search points with expected enhanced performance. In every generation, a new set of artificial individuals is created using information from the best of the old generation. The genetic algorithm combines survival of the fittest from the old population with a randomized information exchange that helps to form new individuals with higher fitness. The algorithm consists of initialization, evaluation, reproduction (selection), crossover and mutation. GA is expected to find the global optimum even when the objective function has several extrema, including local extrema and saddle points. The final solution gives the integration weight scale factor for the score level fusion [34].
4.3 Particle swarm optimization
The particle swarm optimization (PSO) algorithm is a stochastic optimization strategy inspired by the social behaviour of bird flocking. Here, the underlying concept is that, at every time instant, the velocity of each particle (potential solution) changes between its pbest and lbest locations [35]. The particle associated with the best solution (fitness value) acts as the leader, and each particle keeps track of its coordinates in the solution space. The best fitness value achieved so far by a particle is stored and is referred to as pbest. Another ‘best’ value that is tracked by the particle swarm optimizer is the best value obtained so far by any particle in the neighbourhood of the particle. This location is called lbest. When a particle takes the whole population as its topological neighbours, the best value is a global best and is called gbest. The algorithm is as follows:

1.
Randomly generate initial candidate solutions.

2.
Assign the position and velocity of the associated particles randomly.

3.
Evaluate the fitness (objective function) of each particle.

4.
Compare each particle’s objective function value with the particle’s personal best value. If better, update pbest and record the current position as the particle’s personal best position.

5.
Find the best objective function value among all the particles. If this value is better than gbest, replace gbest with this objective function value and record the global best position.

6.
Change velocities and positions [35]. The velocity and position updates are given by
$$v\left[\text{new}\right]=w\cdot v\left[\text{old}\right]+{c}_{1}\cdot {\text{rand}}_{1}\cdot \left(\text{pbest}\left[\text{old}\right]-\text{present}\left[\text{old}\right]\right)+{c}_{2}\cdot {\text{rand}}_{2}\cdot \left(\text{gbest}\left[\text{old}\right]-\text{present}\left[\text{old}\right]\right)$$ (26)
$$\text{present}\left[\text{new}\right]=\text{present}\left[\text{old}\right]+v\left[\text{new}\right]$$ (27)
where w is the inertia weight, v[·] is the particle velocity, and present[·] is the current particle (solution). rand_{1} and rand_{2} are random numbers in [0, 1], and c_{1}, c_{2} are learning factors; usually c_{1}=c_{2}=2.

7.
Repeat steps 3 to 6 until the stop criteria are satisfied.
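The steps above can be sketched as a minimal one-dimensional PSO (a maximization variant, matching the recognition-accuracy objective); the bounds handling, coefficient values and fixed random seed are illustrative assumptions.

```python
import random

def pso(objective, lower=0.0, upper=1.0, n_particles=10,
        iterations=50, w=0.7, c1=2.0, c2=2.0, seed=1):
    """Minimal 1-D particle swarm optimizer following the steps above:
    velocities are pulled toward each particle's pbest and the swarm's
    gbest, per Equations 26 and 27."""
    rng = random.Random(seed)
    # Steps 1-2: random initial positions and velocities.
    pos = [rng.uniform(lower, upper) for _ in range(n_particles)]
    vel = [rng.uniform(-1, 1) * (upper - lower) * 0.1 for _ in range(n_particles)]
    # Step 3: initial fitness evaluation; pbest starts at the initial positions.
    pbest = list(pos)
    pbest_val = [objective(x) for x in pos]
    g = max(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g], pbest_val[g]          # step 5 (global topology)
    for _ in range(iterations):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            # Step 6: velocity and position updates (Equations 26-27).
            vel[i] = (w * vel[i] + c1 * r1 * (pbest[i] - pos[i])
                      + c2 * r2 * (gbest - pos[i]))
            pos[i] = min(max(pos[i] + vel[i], lower), upper)
            val = objective(pos[i])                    # steps 3-4
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i], val
                if val > gbest_val:                    # step 5
                    gbest, gbest_val = pos[i], val
    return gbest, gbest_val
```

On a smooth unimodal objective such as a negated quadratic, the swarm converges close to the maximizer within a few dozen iterations.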
5 Results and discussions
Finger images from the FVC2002 fingerprint database [24] and voice samples from the ELSDSR database [36] have been employed for the experimentation. The ELSDSR database contains nine text-independent speech samples for each of twenty-three persons, so finger images of twenty-three different persons with nine impressions per finger are considered. Out of the nine impressions per finger and nine speech samples for each individual, seven samples are used for training the individual classifiers and two samples are used for testing. This choice is made to improve the recognition accuracy even under adverse conditions. Since the number of available biometric samples is limited, ‘leave-one-out’ cross-validation is employed to fine-tune the training/validation phase and estimate the best optimal weights under various noise conditions. As the fingerprint biometric is more robust, the performance of the fingerprint classifier under varying noise conditions is not considered. We define noise as ‘any unwanted change in the signal’. The influence of noise on the clean voice samples is modelled by adding AWGN to the voice samples.
The performance of the system under varying SNR conditions is considered systematically from the feature extraction and model building stage to the testing stage. MFCC feature vectors of order 12, 16 and 20 and GMMs with 12 and 16 mixtures are considered for the simulation studies, as they are widely used. Different model combinations are evaluated to select the one giving the best recognition accuracy. The voice model with 16 MFCC feature vectors and 12 Gaussian mixtures gives improved recognition accuracy under normal operating conditions (10 to 20 dB SNR), so this combination is used for the experimental study. The outputs of the two classifiers are consolidated into a single vector of scores using the weighted sum rule of fusion. We have compared the performance of the proposed method with baseline techniques such as bimodal systems with equal weighting, the optimal integration weight estimation scheme without ancillary measures [21] and integration weight estimation using reliability measures [20, 22].
5.1 Baseline and state-of-the-art techniques
5.1.1 Equal prior weights
This method weighs the classifiers equally, without making any prior assumptions about the quality of each data source. We have weighted the contributions of the fingerprint and voice data equally for the identification problem: a constant value of γ = 0.5 is assigned as the integration weight at all SNR conditions. The score transformation is achieved by the following equation:
$$S_{\text{fused}} = \gamma S^{(\text{fc})} + (1 - \gamma) S^{(\text{vc})}$$
where S^{(fc)} denotes the fingerprint scores and S^{(vc)} the voice scores. The testing accuracy of this technique is presented in Table 2. This technique does not favour one modality over the other. At low SNR, the fusion system starts to exhibit catastrophic fusion. From the score density plot (Figure 2), it is evident that as the noise increases, the overlap between the genuine and impostor distribution curves increases. The detection error trade-off (DET) curves in Figure 3 show the FAR and FRR performances.
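A minimal sketch of the equal-weight sum rule might look as follows (the function name and NumPy-based implementation are illustrative, not the authors' code):

```python
import numpy as np

def fuse_equal(s_finger, s_voice, gamma=0.5):
    """Weighted sum rule of fusion; gamma = 0.5 weights both modalities equally."""
    return gamma * np.asarray(s_finger, dtype=float) \
        + (1.0 - gamma) * np.asarray(s_voice, dtype=float)
```

For instance, fingerprint and voice scores of 0.8 and 0.4 for the same claimed identity fuse to 0.6 under equal weighting.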
5.1.2 Fusion with optimal integration weight without ancillary information
The performance of the proposed method is compared with the optimal integration weight estimation technique reported in [21]. In [21], a blind optimization of the parameter β (0 ≤ β ≤ 1) was performed to maximize the recognition accuracy of the multimodal system. The recognition accuracy (testing) of this method is shown in Table 2, where the abbreviation IWWAM denotes 'integration weight without ancillary measures'. This method performs better than the sum rule of fusion with equal weighting. Moreover, it shows higher accuracy than either unimodal system under normal operating conditions and maintains the accuracy of the better unimodal system under all adverse conditions. Further insight can be obtained from the DET plots in Figure 4. The disadvantage of the method is that, under extreme noise conditions, the fusion module assigns zero weight to the voice modality and there is no substantial reduction in the FAR and FRR. The score density plots of this method have been presented in [21]. It is evident from those plots that as the noise increases, the overlap between the genuine and impostor score distributions also increases, and under low SNRs the method shows its worst-case performance.
5.1.3 Optimal fusion with reliability measures
We have also compared the proposed method with the reliability-based optimal integration weight estimation scheme presented in [20]. A direct implementation of the techniques reported in [20] was presented in [22] for fusing fingerprint and voice biometrics. Even though the method performs better, in terms of recognition accuracy and FAR, than the sum rule of fusion with equal weighting and the method discussed in subsection 5.1.2, it exhibits attenuation fusion under extreme noise conditions [22]. This is evident from the testing accuracy depicted in Table 2. The DET plots in Figure 5 reveal that the FAR shows a more pronounced reduction when quality measures are used for finding the optimal integration weight. A noted disadvantage of this method is that it does not attain a considerable reduction in the FRR. Figure 6 shows the score density performance of this method.
5.1.4 Fusion with separability measures
For improving the recognition performance of the multibiometric system, we have presented a multinormalization-based integration weight estimation scheme using separability measures in [22, 37]. This is a non-optimization technique. To improve the performance of the system, the fingerprint and speech similarity scores are preprocessed with cohort and tanh normalization methods, respectively. Table 2 and Figure 7 show the results. This method gives improved recognition accuracy [22, 37] and reduces the FAR and FRR (Figure 7) compared with the baseline techniques, the optimization technique and the reliability-based integration weight estimation techniques. One limitation of this method is the cohort selection for normalization; additionally, the FAR needs further reduction.
5.2 Fusion with the proposed method
From the training/validation stage, we have obtained the optimal integration weights β = x_{opt} × γ for different noise conditions (−10 to 20 dB SNR). The integration weights are estimated so as to maximize the recognition accuracy. We have applied the GS, GA and PSO techniques for optimizing the integration weight factor. The relative ancillary ratio estimates of the two modalities and the optimal integration weight β estimated under various noise conditions are presented in Table 1. The β values estimated during the training/validation stage are then used for testing. We have employed the maximum similarity classifier, which allocates each test sample to the class with the highest matching score. The overall testing accuracy of the proposed method is depicted in Table 3. It is evident from Table 3 that the proposed method eliminates catastrophic fusion under extreme noise conditions (0, −5 and −10 dB). The ancillary information provided by the reliability and separability ratios effectively captures the relative noise degradation of the individual modalities. This helps to weight the complementary modalities based on their relative degradation, which further improves the overall efficiency of the system. Even though the method discussed in subsection 5.1.4 shows similar recognition accuracy, its FRR needs further reduction; this is achieved with the proposed method. The added advantage of the proposed technique is that the classification errors, both the FAR and the FRR, are reduced considerably even under low SNR conditions. The DET performance plots (Figures 8, 9 and 10) and the score density plot (Figure 11) highlight this observation.
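The grid search variant of the weight estimation, together with the maximum similarity classifier described above, might be sketched as follows, assuming score matrices of shape (probes, classes); the function names and the 0.01 step size are illustrative assumptions:

```python
import numpy as np

def identify(s_finger, s_voice, beta):
    """Maximum similarity classifier: each probe row is assigned to the class
    with the highest fused matching score."""
    fused = beta * s_finger + (1.0 - beta) * s_voice   # shape: (probes, classes)
    return np.argmax(fused, axis=1)

def grid_search_beta(s_finger, s_voice, labels, step=0.01):
    """Scan beta over [0, 1] and keep the value maximizing identification accuracy."""
    best_beta, best_acc = 0.0, -1.0
    for beta in np.arange(0.0, 1.0 + step, step):
        acc = np.mean(identify(s_finger, s_voice, beta) == labels)
        if acc > best_acc:
            best_beta, best_acc = beta, acc
    return best_beta, best_acc
```

The GA and PSO variants would optimize the same accuracy objective over β instead of scanning a fixed grid.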
This performance improvement is mainly due to the combined effects of the multinormalization-based score preprocessing technique and the optimal weighting strategy using ancillary information. Here, the scores from the fingerprint matcher are transformed with tanh normalization and the voice matching scores with min-max normalization. The tanh normalization technique plays a vital role in reducing the overlap between the score density plots: this transformation improves the separation between the genuine and impostor score distribution curves, since it clamps the genuine score values (the mean and standard deviation of the genuine scores are used for tanh normalization [13]). In our experiment, the standard deviation of the genuine fingerprint scores is 0.1646, while for the genuine speaker scores it is 0.0598 (20 dB), 0.0621 (15 dB), 0.0654 (10 dB), 0.0730 (5 dB), 0.0937 (0 dB), 0.1262 (−5 dB) and 0.1395 (−10 dB) under the various noise conditions. Since the genuine fingerprint scores thus have a standard deviation roughly 10 times that of the genuine voice scores, the constant factor in the tanh normalization was set to 0.1 [13, 38]. A recognition accuracy of 100% does not always mean that the classification errors FAR and FRR are zero; their values depend on the operating point used for calculating the FAR and FRR from the DET curve. This is expected because, if any genuine score is comparable to the impostor scores (or falls below the decision threshold) even after score normalization, crossover may occur in the score density plots and the FAR and FRR exhibit non-zero values.
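The two normalization steps could be sketched as follows (a hedged illustration: the function names are our own, the statistics are assumed to come from the training scores, and k = 0.1 is the constant factor discussed above):

```python
import numpy as np

def tanh_norm(scores, mu_gen, sigma_gen, k=0.1):
    """Tanh normalization into (0, 1); mu_gen and sigma_gen are the mean and
    standard deviation of the genuine training scores, k the constant factor."""
    s = np.asarray(scores, dtype=float)
    return 0.5 * (np.tanh(k * (s - mu_gen) / sigma_gen) + 1.0)

def minmax_norm(scores, s_min, s_max):
    """Min-max normalization into [0, 1] using the training score range."""
    s = np.asarray(scores, dtype=float)
    return (s - s_min) / (s_max - s_min)
```

A score at the genuine mean maps to 0.5 under tanh normalization, while min-max simply rescales the observed score range.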
The experimental studies therefore reveal that the overall performance of the multibiometric system is improved when the multinormalization-based score preprocessing technique and the ancillary quality measures are employed in a unified framework for finding the optimal integration weights. This method successfully overcomes attenuating ('catastrophic') fusion under various noise conditions. The score density plots (Figure 11) indicate that the proposed method effectively reduces the overlap between the genuine and impostor score distributions, which also reduces the classification errors.
5.3 Statistical significance test
To compare the performance of the different fusion methods, we have considered the Friedman test [39]. This test is well suited to multiclass data with any sample distribution. We have considered n = 7 data sets (the different noise conditions) and k = 5 fusion methods, to test whether the proposed fusion method performs better than the others. The null hypothesis is that the mean accuracy of the proposed method is equal to the mean accuracies of the other fusion methods; the hypothesis is rejected when the P value is small (usually <0.05). Using k − 1 = 4 degrees of freedom, we obtain a P value of 9.173 × 10^{−5}. Since this is much smaller than the 0.05 level of significance, we reject the null hypothesis and accept the alternative that the mean accuracy of the proposed weighted fusion method differs from those of the other methods. Using the mean ranks and the standard deviation obtained from the Friedman test, a multiple comparison test has been carried out using one-way ANOVA to compare the performance of the different fusion methods. Figure 12 shows which means (and the comparison intervals around them) are significantly different and which are not. It can be seen that the mean rank obtained by the proposed method outperforms all other methods except the one discussed in subsection 5.1.4 (the separability method with cohort normalization).
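A Friedman test of this kind can be reproduced with SciPy's `friedmanchisquare`; the accuracy values below are synthetic placeholders for illustration, not the paper's results:

```python
import numpy as np
from scipy.stats import friedmanchisquare

# Hypothetical accuracies (%) of k = 5 fusion methods over n = 7 noise
# conditions (20, 15, 10, 5, 0, -5, -10 dB); values are illustrative only.
acc = np.array([
    [95, 93, 90, 84, 77, 69, 61],   # equal weights
    [96, 94, 92, 87, 81, 74, 67],   # IWWAM
    [97, 95, 93, 89, 84, 78, 71],   # reliability measures
    [98, 96, 95, 92, 89, 84, 79],   # separability measures
    [99, 97, 96, 94, 92, 88, 84],   # proposed method
])

# Each noise condition acts as a block; methods are ranked within each block.
stat, p = friedmanchisquare(*acc)
print(f"chi-square = {stat:.3f}, p = {p:.3e}")  # a small p rejects equal mean ranks
```

With k − 1 = 4 degrees of freedom, a chi-square statistic this large corresponds to a P value far below 0.05, mirroring the rejection of the null hypothesis reported above.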
6 Conclusions
Integration weight optimization techniques based on multinormalization and ancillary measures have been proposed for improving the performance of a fingerprint and voice biometric system. Experimental studies have been carried out under various noise conditions from −10 to 20 dB SNR. The matching score preprocessing technique based on multinormalization effectively suppresses unwanted peculiarities in the raw similarity computation of the individual matchers. The use of tanh and min-max normalization schemes with the fingerprint and voice matching score vectors helps reduce the overlap between the genuine and impostor score distributions, which in turn reduces the classification errors. Moreover, by estimating the best integration weight (γ) using ancillary measures derived from the feature space and the score space, we could weigh the two modalities based on their relative degradation. The optimal integration weights (β) are estimated in the training/validation stage using optimization techniques such as grid search, genetic algorithm and particle swarm optimization. Hence, by incorporating ancillary measures in the multinormalization framework, we achieve better recognition performance even at low SNR. The proposed method outperforms the baseline systems in terms of recognition accuracy, FAR and FRR. The proposed system performs more reliably in controlled environments, such as offices and laboratories, than in uncontrolled environments, such as outdoors. This work can be extended to databases with a larger number of subjects, and the fusion study can be conducted in both recognition modes, identification and verification. Thus, the benefits of this work span a wide range of areas capable of improving the quality of life of people.
As this method can reduce the FAR and FRR, it may be highly suitable for applications such as sharing networked computer resources, granting access to nuclear facilities and performing remote financial transactions, as well as forensic applications such as criminal investigation and parenthood determination.
References
 1.
Alonso-Fernandez F, Fierrez J, Ramos D, Gonzalez-Rodriguez J: Quality-based conditional processing in multi-biometrics: application to sensor interoperability. Syst. Man Cybernet. Part A: Syst. Hum. IEEE Trans 2010, 40(6):1168-1179.
 2.
Alsaade F, Ariyaeeinia A, Malegaonkar A, Pillay S: Qualitative fusion of normalised scores in multimodal biometrics. Pattern Recognit. Lett 2009, 30(5):564-569. 10.1016/j.patrec.2008.12.008
 3.
Grother P, Tabassi E: Performance of biometric quality measures. Pattern Anal. Mach. Intell. IEEE Trans 2007, 29(4):531-543.
 4.
Kryszczuk K, Richiardi J, Drygajlo A: Impact of combining quality measures on biometric sample matching. In IEEE 3rd International Conference on Biometrics: Theory, Applications, and Systems, 2009, BTAS'09. Washington, DC; 28–30 September 2009:1-6.
 5.
Mishra A: Multimodal biometrics it is: need for future systems. Int. J. Comput. Appl. IJCA 2010, 3(4):28-33.
 6.
Nandakumar K, Chen Y, Jain AK, Dass SC: Quality-based score level fusion in multibiometric systems. In Proceedings of 18th IEEE International Conference on Pattern Recognition (ICPR) 2006. Hong Kong; 2006:473-476.
 7.
Nandakumar K, Ross A, Jain AK: Incorporating ancillary information in multibiometric systems. In Handbook of Biometrics. Heidelberg: Springer US; 2008:335-355.
 8.
Poh N, Kittler J: A unified framework for multimodal biometric fusion incorporating quality measures. IEEE Trans. Pattern Anal. Mach. Intell 2012, 34(34):38.
 9.
Poh N, Kittler J, Bourlai T: Quality-based score normalization with device qualitative information for multimodal biometric fusion. Syst. Man Cybernet. Part A: Syst. Hum. IEEE Trans 2010, 40(3):539-554.
 10.
Harbi A, Ma'en ZA: Article: A survey of multibiometric systems. Int. J. Comput. Appl 2012, 43(15):36-43.
 11.
Ross AA, Nandakumar K, Jain AK: Handbook of Multibiometrics (International Series on Biometrics). Secaucus, NJ, USA: Springer; 2006.
 12.
Sahoo SK, Choubisa T, Mahadeva Prasanna SR: Multimodal biometric person authentication: a review. IETE Tech. Rev. 2012, 29(1):54. 10.4103/0256-4602.93139
 13.
Jain AK, Nandakumar K, Ross AA: Score normalization in multimodal biometric systems. Pattern Recognition 2005, 38(12):2270-2285. 10.1016/j.patcog.2005.01.012
 14.
Anzar SM, Sathidevi PS: Multinormalization: a new method for improving biometric fusion. In Proceedings of the International Conference on Advances in Computing, Communications and Informatics (ICACCI). Chennai, India; 3–5 August 2012:931-937.
 15.
Poh N, Bengio S: Improving fusion with margin-derived confidence in biometric authentication tasks. In Proceedings of 5th International Conference on Audio- and Video-Based Biometric Person Authentication (AVBPA). Hilton Rye Town, NY, USA; 20–22 July 2005:474-483.
 16.
Lewis TW, Powers DMW: Sensor fusion weighting measures in audio-visual speech recognition. In Proceedings of the 27th Australasian Conference on Computer Science (ACSC). Darlinghurst, Australia: Australian Computer Society, Inc.; 2004:305-314.
 17.
Kryszczuk K, Richiardi J, Prodanov P, Drygajlo A: Reliabilitybased decision fusion in multimodal biometric verification systems. EURASIP J. Appl. Signal Process 2007, 2007(1):7474.
 18.
Bendris M, Charlet D, Chollet G: Introduction of quality measures in audio-visual identity verification. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Taipei; 19–24 April 2009:1913-1916.
 19.
Toh KA: Fingerprint and speaker verification decisions fusion. In Proceedings of the 12th International Conference on Image Analysis and Processing, 2003. Mantova; 17–19 September 2003:626-631.
 20.
Rajavel R, Sathidevi PS: A new GA optimised reliability ratio based integration weight estimation scheme for decision fusion audio-visual speech recognition. Int. J. Signal Imaging Syst. Eng 2011, 4(2):123-131. 10.1504/IJSISE.2011.041605
 21.
Anzar SM, Sathidevi PS: Optimization of integration weights for a multibiometric system with score level fusion. In Proceedings of Second International Conference on Advances in Computing and Information Technology (ACITY). Chennai, India; 13–15 July 2012:833-842.
 22.
Anzar SM, Sathidevi PS: Optimal score level fusion using modalities reliability and separability measures. Int. J. Comput. Appl 2012, 51(16):1-8.
 23.
Fons M, Fons F, Cantó E, López M: FPGA-based personal authentication using fingerprints. J. Signal Process. Syst 2012, 66(2):153-189. 10.1007/s11265-011-0629-3
 24.
Maltoni D, Maio D, Jain AK, Prabhakar S: Handbook of Fingerprint Recognition. London: Springer; 2009.
 25.
Reynolds D: Gaussian mixture models. In Encyclopedia of Biometrics. London: Springer; 2008.
 26.
Morizet N, Gilles J: A new adaptive combination approach to score level fusion for face and iris biometrics combining wavelets and statistical moments. In Proceedings of 4th International Symposium, ISVC 2008. Las Vegas, NV, USA; 1–3 December 2008:661671.
 27.
He M, Horng SJ, Fan P, Run RS, Chen RJ, Lai JL, Khan MK, Sentosa KO: Performance evaluation of score level fusion in multimodal biometric systems. Pattern Recognit 2010, 43(5):1789-1800. 10.1016/j.patcog.2009.11.018
 28.
Ribaric S, Fratric I: A matching-score normalization technique for multimodal biometric systems. In Proceedings of Third COST 275 Workshop on Biometrics on the Internet. University of Hertfordshire, UK; 27–28 October 2005:55-58.
 29.
Van der Heijden F, Duin RPW, De Ridder D, Tax DMJ: Classification, Parameter Estimation and State Estimation. The Atrium, Chichester: Wiley; 2004.
 30.
Rajavel R, Sathidevi PS: Adaptive reliability measure and optimum integration weight for decision fusion audio-visual speech recognition. J. Signal Process. Sys 2012, 68(1):83-93. 10.1007/s11265-011-0578-x
 31.
Tariquzzaman M, Gyu SM, Young KJ, You NS, Rashid MA: Performance improvement of audio-visual speech recognition with optimal reliability fusion. In Proceedings of International Conference on Internet Computing & Information Services (ICICIS). Hong Kong; 17–18 September 2011:203-206.
 32.
Bolle R: Guide to Biometrics. New York: Springer; 2004.
 33.
Kim J: Iterated grid search algorithm on unimodal criteria. PhD thesis, Virginia Polytechnic Institute and State University; 1997.
 34.
Yang W, Cao W, Chung TS, Morris J: Applied Numerical Methods Using MATLAB. Daryaganj, New Delhi: Wiley India Edition, Wiley India (P.) Ltd.; 2005.
 35.
Sumathi S, Paneerselvam S: Computational Intelligence Paradigms: Theory & Applications Using MATLAB. Boca Raton, London, New York: CRC Press, Taylor and Francis Group; 2009.
 36.
Feng L, Hansen LK: A New Database for Speaker Recognition. DTU, Kongens Lyngby: IMM, Informatics and Mathematical Modelling; 2005. http://www2.imm.dtu.dk/pubdb/p.php?3662i
 37.
Anzar SM, Sathidevi PS: Fusion of fingerprints and voice using multinormalization and separability measures. In Proceedings of the International Conference on Power Electronics, Systems and Applications. Kuala Lumpur; 25–26 August 2012:205-209.
 38.
Singh Y, Gupta P: Quantitative evaluation of normalization techniques of matching scores in multimodal biometric systems. In Proceedings of International Conference on Advances in Biometrics, ICB 2007. Seoul, Korea; 27–29 August 2007:574-583.
 39.
James AP, Dimitrijev S: Nearest neighbor classifier based on nearest feature decisions. Comput. J 2012, 55(9):1072-1087. 10.1093/comjnl/bxs001
Acknowledgements
The authors would like to acknowledge Dr. M. N. Bandyopadhyay, the Director of National Institute of Technology Calicut and Dr. V. H. Abdul Salam, the Principal of MES College of Engineering Kuttipuram for rendering all help and support needed for the successful completion of this work. Our appreciation also goes to the editors and reviewers as their comments greatly helped to improve the earlier version of this manuscript.
Author information
Affiliations
Corresponding author
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors’ original submitted files for images
Below are the links to the authors’ original submitted files for images.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Mohammed Anzar, S.T., Sathidevi, P.S. On combining multinormalization and ancillary measures for the optimal score level fusion of fingerprint and voice biometrics. EURASIP J. Adv. Signal Process. 2014, 10 (2014). https://doi.org/10.1186/16876180201410
Received:
Accepted:
Published:
Keywords
 Ancillary measures
 Dispersion measures
 Separability measures
 Multinormalization
 Integration weights
 Noise robustness