 Research
 Open Access
Practical security and privacy attacks against biometric hashing using sparse recovery
 Berkay Topcu^{1, 2},
 Cagatay Karabat^{1},
 Matin Azadmanesh^{2} and
 Hakan Erdogan^{2}
https://doi.org/10.1186/s1363401603961
© The Author(s) 2016
Received: 10 August 2015
Accepted: 2 September 2016
Published: 15 September 2016
Abstract
Biometric hashing is a cancelable biometric verification method that has received research interest recently. This method can be considered as a two-factor authentication method which combines a personal password (or secret key) with a biometric to obtain a secure binary template which is used for authentication. We present novel practical security and privacy attacks against biometric hashing when the attacker is assumed to know the user’s password in order to quantify the additional protection due to biometrics when the password is compromised. We present four methods that can reconstruct a biometric feature and/or the image from a hash and one method which can find the closest biometric data (i.e., face image) from a database. Two of the reconstruction methods are based on 1-bit compressed sensing signal reconstruction, for which the data acquisition scenario is very similar to biometric hashing. Previous literature introduced simple attack methods, but we show that compressed sensing recovery techniques pose a more severe security threat. In addition, we present privacy attacks which reconstruct a biometric image that resembles the original image. We quantify the performance of the attacks using detection error trade-off curves and equal error rates under advanced attack scenarios. We show that conventional biometric hashing methods suffer from high security and privacy leaks under practical attacks, and we believe more advanced hash generation methods are necessary to avoid these attacks.
Keywords
 Biometric verification
 Biometric hashing
 Advanced attack model
 Rainbow attack
1 Introduction
Biometric recognition provides an alternative to the traditional authentication mechanisms based on passwords or tokens such as ID cards due to the inalienable and distinctive nature of biometric traits. Biometric recognition systems enable fast, reliable, and secure electronic authentication; however, their large-scale deployment in real-world applications causes privacy and security concerns [1–3]. Biometric systems are not foolproof, and a critical vulnerability that is unique to biometric systems is the acquisition of the stored templates by adversaries [4]. Biometric data might reveal sensitive information such as race, gender, and certain medical conditions. Since biometric traits are supposed to be permanent and unique to an individual, stolen templates can be used as unique identifiers to link information across different applications. Moreover, biometric modalities are limited in number, and unlike passwords, they cannot be easily revoked to obtain another template. Therefore, it is essential to ensure the security of biometric templates and to protect biometric data. In the literature, several biometric template protection methods have been proposed [5] (e.g., fuzzy commitment scheme [6] and biohashing [7]) to overcome these concerns by securing biometric templates (e.g., face and fingerprint). Biometric template protection methods store a modified version of the biometric template and reveal as little information about the original biometric trait as possible without losing the capability to identify a person.
1.1 Biometric template protection and biohashing
Template protection methods can be categorized into two groups: (i) biometric cryptosystems [5] (e.g., fuzzy commitment [6], fuzzy vault [8]) and (ii) transformation-based methods/salting [9] (e.g., biohashing [10]). Biometric cryptosystems either bind secrets into biometric data to form a secure biometric template or generate secrets from biometric data with the help of some auxiliary data. The secrets can be successfully retrieved during a genuine verification attempt. The helper or auxiliary data does not reveal significant information about the biometric or the key. On the other hand, transformation-based approaches distort or randomize biometric data with the use of non-invertible functions so that the original data cannot be reconstructed from transformed templates. Biometric templates are transformed based on parameters derived from external information such as user keys or passwords.
Biohashing or biometric hashing [10] is one of the transformation-based methods, in which the biometric template of the user is transformed into a protected binary string through multiplication with a pseudorandom projection matrix and quantization. Due to increased inter-class variation and preservation of intra-class variation, biohashing significantly improves verification accuracy when the secret key is kept secure and unknown to the adversaries. In this paper, we use the terms biohashing and biometric hashing synonymously, even though we think biometric hashing is a more descriptive name.
Biometric hashing uses a unique secret key in order to randomize the biometric template of each user. It is a two-factor authentication system in which both the biometric modality and the secret key of a user have to be presented during authentication. Although biohashing methods have become very popular due to their high authentication performance and easy deployment into match-on-card applications, recent research showed that they might suffer from serious security and privacy problems [4, 11–13].
We believe that it is necessary to study the security and privacy preservation capabilities of biometric hashing especially when the secret key is compromised. If the key is always assumed to be kept secure, an authentication system which checks the accuracy of the entered key will achieve a zero verification error even without any need for biometric data.
1.2 Attacks against biohashing—biohash inversion
In the first study that investigates the invertibility of a biometric hashing algorithm, it was assumed that the biohash of a user and the corresponding random projection matrix are available to an adversary. Each dimension of the biohash vector was mapped to the set {−1,1} (by mapping [0]→[−1] and [1]→[1]) and the resulting vector was multiplied with the pseudoinverse of the random projection matrix. A new biohash created from the estimated biometric feature vector was used to perform imposter attacks. A similar approach that uses the pseudoinverse of a random projection matrix was also presented in [17]. In [18], a new method was proposed to generate a biometric feature from biohashes using genetic algorithms. For each biohash in a database, the proposed genetic algorithm was applied to approximate the value of the biometric feature given the corresponding secret key.
A detailed analysis of the irreversibility of biohashes was performed by Feng et al. [19], where the parameters of the random projection are estimated using perceptron learning. It was assumed that the attacker does not have the secret key of the user, and the parameters of the random projection are estimated using stolen biohashes and a local biometric database. The main difference of this study is that the method requires several stolen biohashes from several distinct subjects (68 subjects, 105 images/subject for one database and 350 subjects, 40 images/subject for another database) for parameter estimation. It was assumed that the whole system is available to the adversary as a black box and the matching scores could be eavesdropped. A local face dataset (3500 different local faces) was presented to the system along with a common token and every local binary template was matched against every stolen template. Using the matching scores and the stolen biohashes, local binary biohashes corresponding to the local face database were calculated, which were used for iterative perceptron learning to estimate the projection parameters. Once the parameters of the random projection were estimated, they could be used to generate synthetic real-valued features from a stolen biohash, which is another perceptron problem. Our proposed methods cannot be compared directly with this method, because it cannot estimate the parameters from a single stolen biohash and requires several biohashes from different subjects, whereas our methods require only a single biohash for the inversion.
Nagar et al. [4] have proposed a promising approach that is comparable to our proposed methods. In that approach, given the binary biohash vector of a subject and the transformation parameters, a close approximation to the original biometric features is recovered by formulating the problem as an optimization problem. A database of unrelated biometric features was used for optimization. For each unrelated biometric feature vector from the database, a new feature vector was estimated by minimizing the Euclidean distance between the new feature vector and the unrelated biometric feature vector subject to the consistency criterion (i.e., the new biohash created from the estimated feature vector exactly matches the original biohash). The estimated feature vector was computed by taking the weighted average of t number of trials where the weight was the Hamming distance between the original biohash and the estimated one. Since this approach attempts to invert biohashes in a similar setup with our proposed methods, we compared it with our algorithms in terms of verification errors and computation times.
1.3 Contributions of this paper
In this paper, we propose four different novel optimization-based methods that aim to predict the feature vector and/or the biometric image itself. Here, we assume that an adversary gains access to the biohash vector of a valid system user and the corresponding secret key and estimates a new real-valued feature vector from the binary biohash in order to authenticate to the system. Novel feature estimation methods are the focus of this study. The first two proposed methods are based on the 1-bit compressive sensing approach and related feature reconstruction algorithms. Compressive sensing is a new signal acquisition technology with the potential of reducing the number of measurements required to acquire signals that are sparse or compressible in some domain. Rather than uniformly sampling the signal, compressive sensing computes inner products with a randomized dictionary of test functions. The signal is then recovered by a convex optimization which ensures that the recovered signal is consistent with the measurements. The one-bit measurement setting is a more restricted case in which only the sign information of the random measurements is preserved. In our framework, we solve the biohash invertibility problem by using two different reconstruction approaches, namely, linear programming [20] and binary iterative hard thresholding [21].
We also discuss minimum norm solutions for approximating feature vectors from biohashes and present \(L_{2}\) and \(L_{1}\) norm minimization for this problem. Finally, we describe the rainbow attack to compromise the security of a biometric hashing scheme. The rainbow attack is different from feature approximation methods and does not aim at predicting a new feature vector. With the help of a huge database of biometric features along with the biohash vector of a valid user and the corresponding secret key, a biometric image that creates a sufficiently close biohash to the desired one is found and used for illegitimate authentication.
We propose practical attacks and study their performances instead of using theoretical metrics. Furthermore, we analyze the privacy issues related to the invertibility of biohash templates, and as a case study, we visually inspect reconstructed face images of the subjects. Authentication performance of the reconstructed feature vectors in a conventional verification setup, in which the plain features are used for matching, is also investigated.
Table 1 Existing biohash inversion attacks

| Method | Assumptions | Security | Privacy |
| --- | --- | --- | --- |
| Multiply with the pseudoinverse of the random projection | Random projection matrix is available; threshold is fixed and it is 0; wavelet FMT face features | Attack with biohash from estimated features: existing key; a new key is assigned and stolen again | |
| Genetic algorithms [18] | Random projection matrix is available; threshold is fixed and it is 0; Fingercode features | 1) Attack with biohash from estimated features: existing key; a new key is assigned and stolen again. 2) Average distance between real and approximated features | |
| Perceptron learning with hill climbing and MLP modeling with customized hill climbing [19] | Several biohashes of various different subjects are available (other methods assume availability of a single stolen biohash); attacker can access the matching scores of the system; secret key of the user is available | Identification scenario, where biohash generated from each synthetic face is matched against the stolen templates | Adversary has access to output of feature extractor given a face image and applies hill climbing attack to generate synthetic face images |
| Solve a constrained minimization of distance between estimated features and unrelated feature vector [4] | Random projection matrix is available; threshold is available; a database of unrelated features; Eigenface features | Attack with biohash from estimated features: existing key; a new key is assigned and stolen again | Reconstructed face images from estimated vector using PCA inversion |
| Methods proposed and discussed in this study: sparse recovery; min-norm solutions | Random projection matrix is available; threshold is available; Eigenface features | 1) Attack with biohash from estimated features: existing key; a new key is assigned and is unknown; a new key is assigned and stolen again. 2) Verification accuracy using the real features as gallery and approximated features as probe | Orthogonal linear face features (i.e., PCA, LDA): transformation matrix is known and its inverse is used to reconstruct face images |
First, we review the biometric hashing scheme in Section 2. The proposed feature approximation methods are presented in Section 3, which is followed by description of the rainbow attack in Section 4. Section 5 presents the experimental results of the proposed approaches and finally, we summarize our findings and conclusions in Section 6.
2 Biometric hashing
Biometric hashing schemes are simple yet powerful biometric template protection methods [22–26]. A biohash is a binary and pseudorandom representation of a biometric template (e.g., face or fingerprint), and biometric hashing schemes perform an automatic verification of a user based on her biohash, which is a binary string. The two inputs to a biometric hashing scheme are (i) the biometric template and (ii) the user-specific secret key. A biometric feature vector is transformed into another space using a pseudorandom set of vectors which are generated from the user’s secret key. Then, the result is binarized to produce a pseudorandom bit string which is called the biohash. What is unique or specific to each user is the random projection matrix, and it can be stored in a USB token or smartcard. In a practical system, the user-specific random matrix is calculated through a pseudorandom number generator from a seed (the user-specific secret key) that is stored in a USB token or smartcard microprocessor. The seed is the same as the one recorded for the user during enrollment and is different among different users and different applications [7].
In an ideal case, the distance between the biohashes belonging to the biometric templates of the same user is expected to be relatively small. On the other hand, the distance between the biohashes belonging to different users is expected to be sufficiently high which enables higher recognition rates. The user is enrolled to the system at the enrollment stage. Then, the user again provides her biometric data and secret key to the system at the authentication stage in order to prove her identity.
In the following subsections, we describe the random projection (RP) based biohashing scheme proposed by Ngo et al. [10] for face verification.
2.1 Enrollment stage
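At the enrollment stage, the face image of the user, written as a vector \(\mathbf{I}\in\Re^{(mn)\times 1}\), is first projected onto the PCA subspace:
\[\mathbf{x}=\mathbf{A}(\mathbf{I}-\boldsymbol{\mu}),\]
where A∈ℜ^{ k×(m n)} is the PCA matrix trained by the face images in the training set, μ is the mean face vector, and x∈ℜ^{ k×1} is the vector containing PCA coefficients.
The feature vector is then multiplied with the random projection matrix \(\mathbf{R}\in\Re^{\ell\times k}\) generated from the secret key of the user:
\[\mathbf{z}=\mathbf{R}\mathbf{x},\]
where z∈ℜ^{ ℓ×1} is an intermediate biohash vector.
Finally, each element of z is quantized to one bit:
\[b_{i}=\begin{cases}1, & z_{i}\geq \beta\\ 0, & \text{otherwise,}\end{cases}\]
where b∈ {0,1}^{ ℓ } denotes the biohash vector of the user and β denotes the quantization threshold which can be 0 (sign operator) or the mean value of the intermediate biohash vector z, depending on the system design.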
After enrollment, biometric hashes are stored in a database or in a smart card.
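The enrollment steps (PCA feature extraction, key-seeded random projection, thresholding) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the use of an unorthogonalized Gaussian matrix, the `z >= beta` convention, and the random stand-ins for a trained PCA matrix and a face image are all assumptions made for the example.

```python
import numpy as np


def projection_matrix(secret_key: int, hash_len: int, feat_dim: int) -> np.ndarray:
    """Pseudorandom projection matrix R (hash_len x feat_dim) seeded by the key."""
    rng = np.random.default_rng(secret_key)  # the secret key acts as the PRNG seed
    return rng.standard_normal((hash_len, feat_dim))


def make_biohash(image_vec, pca_matrix, mean_face, secret_key, hash_len, beta=0.0):
    """Feature extraction, random projection and thresholding, in that order."""
    x = pca_matrix @ (image_vec - mean_face)   # PCA coefficients
    R = projection_matrix(secret_key, hash_len, x.shape[0])
    z = R @ x                                  # intermediate biohash vector
    return (z >= beta).astype(np.uint8)        # one bit per projection


# Toy data: random stand-ins for a trained PCA matrix and a 64-pixel "face".
toy = np.random.default_rng(0)
pca_matrix = toy.standard_normal((20, 64))
mean_face = toy.standard_normal(64)
image_vec = toy.standard_normal(64)

biohash = make_biohash(image_vec, pca_matrix, mean_face, secret_key=1234, hash_len=32)
same_key = make_biohash(image_vec, pca_matrix, mean_face, secret_key=1234, hash_len=32)
other_key = make_biohash(image_vec, pca_matrix, mean_face, secret_key=4321, hash_len=32)
```

With the same key the biohash is reproducible, while a different key yields an unrelated bit string from the same biometric, which is what makes the template cancelable.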
2.2 Authentication stage
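At the authentication stage, a test biohash is computed from the presented biometric and secret key in the same way as during enrollment. The test biohash and the enrolled biohash are compared bit by bit with the binary XOR (exclusive OR) operator ⊕, and the number of disagreeing bits gives their Hamming distance, which is compared with a distance threshold ε. The distance threshold ε is an integer between 0 and ℓ (the number of bits in a biohash). In a biometric hashing system, the selection of ε depends on system design, and it is chosen such that the desired false acceptance rate (FAR) and false rejection rate (FRR) are satisfied.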
The system computes the Hamming distance between the test biometric hash vector and the claimed user’s reference biometric hash vector stored in the database. If the Hamming distance is below the predetermined distance threshold, the claimer is accepted; otherwise, the claimer is rejected (Fig. 1).
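The matching rule described above can be written in a few lines. The sketch below assumes biohashes are stored as NumPy arrays of 0/1 values; the function names and the example bit strings are illustrative only.

```python
import numpy as np


def hamming_distance(b_ref: np.ndarray, b_test: np.ndarray) -> int:
    """Number of disagreeing bits, i.e. the sum of the bitwise XOR."""
    return int(np.count_nonzero(np.bitwise_xor(b_ref, b_test)))


def verify(b_ref: np.ndarray, b_test: np.ndarray, eps: int) -> bool:
    """Accept the claim only if the distance is below the threshold eps."""
    return hamming_distance(b_ref, b_test) < eps


b_ref = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=np.uint8)
b_test = np.array([1, 1, 1, 0, 0, 0, 1, 0], dtype=np.uint8)  # differs in 2 positions

distance = hamming_distance(b_ref, b_test)
accepted_loose = verify(b_ref, b_test, eps=3)   # 2 < 3, accepted
accepted_strict = verify(b_ref, b_test, eps=2)  # 2 < 2 is false, rejected
```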
3 The proposed feature approximation methods from biohash
The success probability of an inversion attack on the biometric hashing system, in which an adversary estimates a feature vector \(\hat{\mathbf{x}}\) from a stolen biohash b, can be measured as \(P(d(\text{sign}(\mathbf{R}\hat{\mathbf{x}}),\mathbf{b})<\varepsilon)\), where d(·,·) is the Hamming distance between two biohashes (i.e., the number of disagreeing bits). This metric is also called the intrusion rate due to inversion for the same biometric system (IRIS) by Nagar et al. [4]. In the next sections, we present various methods to obtain a feature vector \(\hat{\mathbf{x}}\) that allows illegitimate access to a biometric system given b and the transformation parameters.
3.1 Onebit compressive sensing approach
In 1-bit compressive sensing, only the signs of the random measurements are retained, so the acquisition model coincides with biometric hashing with a zero threshold:
\[\mathbf{b}=\text{sign}(\mathbf{R}\mathbf{x}),\]
where b is the biohash vector, R is the matrix representing the random projection matrix (the measurement system), and the 1-bit quantization function sign(·) is applied element-wise to the vector R x.
As the compressive sensing measurements are quantized to 1 bit, it is clear that the scale (absolute amplitude) of the signal is lost, and it is not immediately evident that the remaining information is enough for signal reconstruction. Nonetheless, there is strong empirical evidence that signal reconstruction is possible [28]. One-bit compressive sensing by linear programming [20] and binary iterative hard thresholding [21] are two reconstruction methods from the compressive sensing literature that we implement separately for obtaining inverse images of biohashes and finding biometric feature vectors that provide biohash vectors which are acceptable by the verification system (i.e., with a distance to the original biohash vector that is less than a threshold).
3.1.1 Onebit compressive sensing by linear programming
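Following [20], a sparse feature vector that is consistent with the biohash can be sought by solving
\[\min_{\hat{\mathbf{x}}}\ \|\hat{\mathbf{x}}\|_{1}\quad\text{subject to}\quad \mathbf{b}=\text{sign}(\mathbf{R}\hat{\mathbf{x}}),\quad \|\hat{\mathbf{x}}\|_{2}=1,\qquad (8)\]
which is relaxed to
\[\min_{\hat{\mathbf{x}}}\ \|\hat{\mathbf{x}}\|_{1}\quad\text{subject to}\quad \mathbf{BR}\hat{\mathbf{x}}\geq \mathbf{0},\quad \|\mathbf{R}\hat{\mathbf{x}}\|_{1}=m,\qquad (9)\]
where B=diag(b).
The first constraint, \(\mathbf{BR}\hat{\mathbf{x}}\geq \mathbf{0}\), keeps the solution consistent with the original biohash vector, and it is defined by the relation \(\langle R_{i},\hat{\mathbf{x}}\rangle \cdot b_{i}\geq 0\) for i=1,2,…,m, where \(R_{i}\) is the i-th row of the random projection matrix R. The second constraint in the original problem definition (8) contains the \(L_{2}\) norm, which is a quadratic term and can be replaced with the linear \(L_{1}\) norm, so that the optimization becomes a linear program. The second constraint, \(\|\mathbf{R}\hat{\mathbf{x}}\|_{1}=m\), serves to prevent the program from returning the zero solution, and under the first constraint it is linear, as it can be represented as one linear equation \(\sum_{i=1}^{m}b_{i}\langle R_{i},\hat{\mathbf{x}}\rangle =m\), where m is the length of the biohash vector. Therefore, (9) is a convex minimization problem and can easily be represented as a linear program (see Algorithm 1).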
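One way to hand the linear program with the two constraints described above (sign consistency and the normalization \(\sum_{i}b_{i}\langle R_{i},\hat{\mathbf{x}}\rangle =m\)) to an off-the-shelf solver is sketched below. It is not the paper's Algorithm 1: the choice of `scipy.optimize.linprog`, the standard split \(\hat{\mathbf{x}}=\mathbf{u}-\mathbf{v}\) with \(\mathbf{u},\mathbf{v}\geq \mathbf{0}\), the function name, and the toy data are assumptions of this example, and the biohash is assumed to be given in {−1,+1} form.

```python
import numpy as np
from scipy.optimize import linprog


def invert_biohash_lp(R: np.ndarray, b_pm: np.ndarray) -> np.ndarray:
    """Minimise ||x||_1 subject to B R x >= 0 and sum_i b_i <R_i, x> = m."""
    m, k = R.shape
    BR = b_pm[:, None] * R                  # row i is b_i * R_i, i.e. B R
    c = np.ones(2 * k)                      # sum(u) + sum(v) equals ||x||_1 at the optimum
    A_ub = np.hstack([-BR, BR])             # -(B R)(u - v) <= 0
    b_ub = np.zeros(m)
    col_sum = BR.sum(axis=0, keepdims=True)
    A_eq = np.hstack([col_sum, -col_sum])   # one linear equation replaces ||R x||_1 = m
    b_eq = np.array([float(m)])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:k] - res.x[k:]


# Toy example: a random "feature vector" and its sign measurements.
rng = np.random.default_rng(1)
R = rng.standard_normal((64, 20))
x_true = rng.standard_normal(20)
b_pm = np.where(R @ x_true >= 0, 1.0, -1.0)

x_hat = invert_biohash_lp(R, b_pm)
margins = b_pm * (R @ x_hat)                # all entries should be non-negative
```

Because the consistency constraint is non-strict, some projections of the solution may sit at (numerically) zero, so the biohash recomputed from `x_hat` is not guaranteed to agree with the stolen one on those bits.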
3.1.2 Binary iterative hard thresholding
Binary iterative hard thresholding (BIHT) [21] is a modification of iterative hard thresholding (IHT), which is a real-valued algorithm designed for compressive sensing [30]. Proposed for the recovery of K-sparse signals, the IHT algorithm consists of two steps. The first step is a gradient descent to reduce the least squares objective \(\|\mathbf{y}-\mathbf{Rx}\|_{2}^{2}/2\). At each iteration, IHT proceeds by setting \(\mathbf{a}^{l+1}=\mathbf{x}^{l}+\mathbf{R}^{T}(\mathbf{y}-\mathbf{R}\mathbf{x}^{l})\). The second step imposes a sparse signal model by selecting the K elements of \(\mathbf{a}^{l+1}\) that are largest in magnitude.
BIHT replaces the least squares residual with the sign consistency residual, which gives the iteration
\[\mathbf{a}^{l+1}=\mathbf{x}^{l}+\frac{\tau}{2}\mathbf{R}^{T}\left(\mathbf{b}-\text{sign}(\mathbf{R}\mathbf{x}^{l})\right),\qquad \mathbf{x}^{l+1}=\eta_{K}(\mathbf{a}^{l+1}),\]
where τ is a scalar that controls the gradient descent step size, and the function \(\eta_{K}\) computes the best K-term approximation of \(\mathbf{a}^{l+1}\) (see Algorithm 2). In our experiments, we choose K as 25 % of the feature vector length, i.e., K=50 for 200-dimensional feature vectors and K=256 for 1024-dimensional feature vectors. As stated in Section 3.1, 25 % of the PCA coefficients capture 70 % of the total energy. In addition, we analyzed PCA coefficients of natural face images (see Fig. 4) and concluded that the 25 % of the coefficients that are largest in magnitude are enough to reconstruct a typical face image that is visually similar to the original face image.
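A compact NumPy sketch of BIHT, as it is usually stated in [21], is given below. The step size default `tau = 1/m`, the fixed iteration count, the final normalization to unit length, and the toy problem sizes are assumptions of this example and are not taken from the paper's Algorithm 2; the biohash is assumed to be in {−1,+1} form.

```python
import numpy as np


def hard_threshold(a: np.ndarray, K: int) -> np.ndarray:
    """eta_K: keep the K largest-magnitude entries of a and zero the rest."""
    out = np.zeros_like(a)
    keep = np.argpartition(np.abs(a), -K)[-K:]
    out[keep] = a[keep]
    return out


def biht(R: np.ndarray, b_pm: np.ndarray, K: int, tau=None, n_iter: int = 100) -> np.ndarray:
    """Binary iterative hard thresholding for b = sign(R x), b in {-1,+1}^m."""
    m, k = R.shape
    if tau is None:
        tau = 1.0 / m                       # assumed step size for this sketch
    x = np.zeros(k)
    for _ in range(n_iter):
        a = x + 0.5 * tau * R.T @ (b_pm - np.sign(R @ x))  # sign-consistency gradient step
        x = hard_threshold(a, K)                           # enforce K-sparsity
    norm = np.linalg.norm(x)
    return x / norm if norm > 0 else x      # the scale is lost in 1-bit measurements


# Toy example: a K-sparse vector and its sign measurements.
rng = np.random.default_rng(2)
m, k, K = 256, 40, 10
R = rng.standard_normal((m, k))
x_true = np.zeros(k)
x_true[rng.choice(k, K, replace=False)] = rng.standard_normal(K)
b_pm = np.where(R @ x_true >= 0, 1.0, -1.0)

x_hat = biht(R, b_pm, K)
b_hat = np.where(R @ x_hat >= 0, 1.0, -1.0)
mismatches = int(np.count_nonzero(b_hat != b_pm))  # Hamming distance to the stolen hash
```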
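3.2 Minimum \(L_{1}\) and \(L_{2}\) norm solutions
In this section, we present and discuss minimum-norm-based feature reconstruction methods for biohashes in addition to the solutions we propose in the 1-bit compressive sensing framework.
Given an estimate \(\hat{\mathbf{z}}\) of the intermediate vector, the feature vector is estimated as
\[\hat{\mathbf{x}}=\arg\min_{\mathbf{x}}\ \|\mathbf{x}\|_{n}\quad\text{subject to}\quad \mathbf{R}\mathbf{x}=\hat{\mathbf{z}}.\]
In this work, we study minimum norm solutions for n=1 and n=2, namely the \(L_{1}\) and \(L_{2}\) norms.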
3.2.1 Inversion of the quantization step
Solutions in the 1-bit compressive sensing framework implicitly handle the quantization of the randomly projected feature z within the optimization process. However, \(L_{1}\) and \(L_{2}\) norm-based reconstruction requires an explicit inversion of the thresholding step of the biometric hashing scheme.
To be consistent with the solutions described in the 1-bit compressive sensing approach, we assume that the signs of the elements of the intermediate vector z are used to obtain the biohash (i.e., the threshold at the quantization step is 0). However, various quantization methods and thresholding mechanisms are proposed in the literature for biometric hashing, one of them being the mean value of the intermediate vector and another one being its median value. If the system uses the mean value of the intermediate vector as the quantization threshold, the mean value of \(z_{f}\) can be calculated. In our experiments, the threshold equals 0; thus, the mean value is not used, and the intermediate vector is computed as \(\hat{z}(i)=\hat{b}(i)\sigma\).
3.2.2 Minimum \(L_{2}\) norm solution
The closed form solution that gives the minimum \(L_{2}\) norm for the estimated feature vector is given by the Moore-Penrose pseudoinverse. For linear systems A x=b with non-unique solutions (i.e., underdetermined systems), the pseudoinverse is used to reconstruct the solution of minimum Euclidean norm ∥x∥_{2} among all solutions. So the solution to the above minimization problem to estimate the feature vector from biohash b is calculated as \(\hat{\mathbf{x}}=\mathbf{R}^{\dag}\hat{\mathbf{z}}\).
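The pseudoinverse attack amounts to two lines of linear algebra, sketched below. The example assumes that \(\hat{b}\) is the biohash mapped from {0,1} to {−1,+1} and that σ is a positive scale factor (set to 1 here); both are readings of the notation rather than details confirmed by the text, and the function name and toy dimensions are illustrative.

```python
import numpy as np


def invert_biohash_min_l2(R: np.ndarray, bits: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Minimum L2-norm feature estimate x_hat = pinv(R) @ z_hat."""
    b_pm = 2.0 * np.asarray(bits, dtype=float) - 1.0  # map {0,1} -> {-1,+1}
    z_hat = sigma * b_pm                              # inverted quantization step
    return np.linalg.pinv(R) @ z_hat                  # Moore-Penrose pseudoinverse


# Toy underdetermined case: 16-bit biohash from a 40-dimensional feature vector.
rng = np.random.default_rng(3)
R = rng.standard_normal((16, 40))
x_true = rng.standard_normal(40)
bits = (R @ x_true >= 0).astype(np.uint8)

x_hat = invert_biohash_min_l2(R, bits)
bits_hat = (R @ x_hat >= 0).astype(np.uint8)          # biohash of the estimate
```

In this underdetermined toy case the estimate reproduces the stolen biohash exactly, because \(\mathbf{R}\hat{\mathbf{x}}=\hat{\mathbf{z}}\) holds with equality; the estimate itself is generally different from `x_true`.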
3.2.3 Minimum \(L_{1}\) norm solution
In the 1-bit compressive sensing approach by linear programming, the \(L_{1}\) norm of the estimated feature vector is minimized according to the constraints that include the quantization step. However, the minimum \(L_{1}\) norm solution handles the quantization step separately, and the minimization is carried out using the intermediate real-valued vector \(\hat{\mathbf{z}}\). The minimization problem still has linear constraints, and minimization of the \(L_{1}\) norm can easily be expressed as a linear program and solved accordingly.
For both \(L_{1}\) and \(L_{2}\) norm minimizations, if the PCA dimension is less than the biohash length (i.e., if the random projection step does not reduce the dimension), the linear system is overdetermined and an exact solution might not exist (i.e., solutions could be inconsistent with the observations). Instead, it is possible to minimize the residual between the observation and the solution (i.e., \(\|\hat{\mathbf{z}}-\mathbf{R}\hat{\mathbf{x}}\|_{n}\)) and to obtain a feature vector that provides a biohash that is close to the original one.
3.3 Reconstructing the face image
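Given an estimated feature vector \(\hat{\mathbf{x}}\), an approximation \(\hat{\mathbf{I}}\) of the face image is obtained by inverting the PCA projection:
\[\hat{\mathbf{I}}=\mathbf{A}^{\dag}\hat{\mathbf{x}}+\boldsymbol{\mu},\]
where A∈ℜ^{ k×(m n)} is the PCA matrix, A ^{ † } is the pseudoinverse of A, and μ is the mean face vector.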
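The image reconstruction step, applying the pseudoinverse of the PCA matrix and adding back the mean face, can be illustrated as follows. The random orthonormal matrix standing in for a trained PCA basis, the 64-pixel "image", and the function name are assumptions of this sketch.

```python
import numpy as np


def reconstruct_face(x_hat: np.ndarray, pca_matrix: np.ndarray, mean_face: np.ndarray) -> np.ndarray:
    """Map PCA coefficients back to pixel space: pinv(A) @ x_hat + mu."""
    return np.linalg.pinv(pca_matrix) @ x_hat + mean_face


rng = np.random.default_rng(4)
# Stand-in for a trained PCA matrix: 20 orthonormal rows in a 64-pixel space.
pca_matrix = np.linalg.qr(rng.standard_normal((64, 20)))[0].T
mean_face = rng.standard_normal(64)
image_vec = rng.standard_normal(64)

x = pca_matrix @ (image_vec - mean_face)          # forward PCA projection
image_rec = reconstruct_face(x, pca_matrix, mean_face)
x_again = pca_matrix @ (image_rec - mean_face)    # re-project the reconstruction
```

Only the part of the image that lies in the PCA subspace is recovered, so `image_rec` reproduces the PCA coefficients exactly but not necessarily every pixel of `image_vec`.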
3.4 Other thresholding methods—apart from the “sign” operator
In cases where the thresholding after the random projection step is not the sign operator, some alternatives can also be formulated within our proposed framework. Assuming that an adversary has the full knowledge of the system, i.e., the specific thresholding method, he can also invert the biohashes.
3.4.1 Fixed or userspecific threshold
Apart from using the sign operator, one can use a predefined fixed threshold or a user-specific threshold, i.e., b=sign(R x−T), where T denotes the threshold vector. Entries of T can be the same number or different numbers at each dimension. T can also be specific to each user (it is shown as \(\mathbf{T}_{i}\), where i denotes the subject number). By augmenting the random projection matrix with the negated threshold vector, \(\hat{\mathbf{R}} = \left[\begin{array}{cc} \mathbf{R} & -\mathbf{T}_{i} \\ \end{array}\right]\), we can reformulate the biohashing operation as \(\mathbf{b} = \text{sign}\left(\hat{\mathbf{R}}\left[\begin{array}{cc} \mathbf{x}^{T} & 1 \\ \end{array}\right]^{T}\right)\) and perform the same operations for inverting biohashes.
3.4.2 Mean value is the threshold
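When the mean value of the intermediate vector is used as the threshold, the mean is subtracted from every element of R x before the sign is taken. Since subtracting the mean is a linear operation, it can be absorbed into the projection matrix:
\[\hat{\mathbf{R}}=\left(\mathbf{I}-\frac{1}{N}\underline{\mathbf{1}}\right)\mathbf{R},\]
where I is the identity matrix, N is the biohash length, \(\underline{\mathbf{1}}\) is an N×N matrix of ones, and the biohash vector becomes \(\mathbf{b} = \text{sign}(\hat{\mathbf{R}}\mathbf{x})\). An adversary can use the modified matrix \(\hat{\mathbf{R}}\), and all inversion methods that we discuss are still valid in this setup.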
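Both threshold variants reduce to a plain sign operator once the projection matrix is modified, which can be checked numerically. In the sketch below, the fixed threshold is handled by appending the column −T to R and a constant 1 to x, and the mean threshold by left-multiplying R with a centering matrix; the function names and random test data are assumptions of this example.

```python
import numpy as np


def augment_fixed_threshold(R: np.ndarray, T: np.ndarray) -> np.ndarray:
    """Append -T as an extra column so that [R, -T] @ [x; 1] equals R @ x - T."""
    return np.hstack([R, -T[:, None]])


def absorb_mean_threshold(R: np.ndarray) -> np.ndarray:
    """Left-multiply R by (I - ones/N) so that the mean of R @ x is removed."""
    N = R.shape[0]
    return (np.eye(N) - np.ones((N, N)) / N) @ R


rng = np.random.default_rng(5)
R = rng.standard_normal((32, 10))
x = rng.standard_normal(10)
T = rng.standard_normal(32)
z = R @ x

fixed_direct = z - T                                          # threshold applied directly
fixed_augmented = augment_fixed_threshold(R, T) @ np.append(x, 1.0)

mean_direct = z - z.mean()                                    # mean used as threshold
mean_absorbed = absorb_mean_threshold(R) @ x
```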
4 Rainbow attack
In the previous section, we proposed four different optimization methods for recovering features from an original biohash vector that is stolen by an attacker. Having the corresponding secret key and using the knowledge of system parameters, one can estimate a real-valued feature vector \(\hat{\mathbf{x}}\) with the consistency criterion such that \(\mathbf{b}=\text{sign}(\mathbf{R}\hat{\mathbf{x}})\) in order to gain illegitimate access to the biometric system. The rainbow attack is different from these methods in the sense that it does not aim at inverting a biohash vector to obtain a valid pre-image. Instead, using the knowledge of the system and the secret key of the user, with the help of a large database of biometric features, an adversary may find a face image which, when combined with the secret key of the user, results in a biohash vector that is sufficiently close to the original biohash b.
In the cryptography literature, a rainbow table is a precomputed table for reversing cryptographic hash functions, usually for cracking password hashes. Any computer system that requires password authentication must contain a database of passwords, either hashed or in plaintext, and utilize different methods to store passwords. Because the tables are vulnerable to thefts, storing passwords as plain texts is dangerous. Most databases therefore store a cryptographic hash of a user’s password in the database. When a user enters his password for authentication, it is hashed and compared to the stored password entry of that user (which is also hashed before being stored in the database). If the two hashes match, the access is granted. A rainbow table is a large dictionary with precalculated hashes and the passwords from which they were calculated. When an attacker steals a long list of password hashes from the system, he can quickly check if any of them are in the rainbow table. If that is the case, the rainbow table will also contain the original string that they were hashed from.
A biometric authentication system that protects biometric templates using biometric hashing methods operates in a similar way; the biohash of a user is stored and compared to the query biohash during verification. If an adversary having a large database of biometric features of various users steals the biohash of a system user and knows his secret key, the adversary can compute biohashes of each biometric feature in the database using the random projection matrix of the user and create a table of biohashes and their corresponding feature vectors. If any of the biohashes in the table is sufficiently close to the stolen biohash (i.e., their Hamming distance is less than a threshold), the corresponding feature vector can be used for illegitimate access to the biometric system.
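The table lookup described above can be sketched as a nearest-neighbor search in Hamming distance. The example assumes a sign (zero-threshold) biohash and a feature database stored as one row per sample; the function name, the synthetic database, and the planted near-duplicate at row 57 are illustrative only.

```python
import numpy as np


def rainbow_attack(R: np.ndarray, stolen_bits: np.ndarray, feature_db: np.ndarray, eps: int):
    """Hash every database feature with the victim's R and return the closest match.

    Returns (index, distance) if the best distance is below eps, else (None, distance).
    """
    table = (feature_db @ R.T >= 0).astype(np.uint8)        # one biohash per database row
    dists = np.count_nonzero(table != stolen_bits, axis=1)  # Hamming distances
    best = int(np.argmin(dists))
    best_dist = int(dists[best])
    return (best if best_dist < eps else None), best_dist


rng = np.random.default_rng(6)
R = rng.standard_normal((64, 30))            # victim's projection matrix (key is known)
victim = rng.standard_normal(30)
stolen_bits = (R @ victim >= 0).astype(np.uint8)

feature_db = rng.standard_normal((200, 30))  # adversary's local feature database
feature_db[57] = victim + 0.01 * rng.standard_normal(30)  # a look-alike sample

match_index, match_dist = rainbow_attack(R, stolen_bits, feature_db, eps=10)
```

The attack succeeds only when the database happens to contain a sample whose biohash falls within the acceptance threshold, which is why a large database is required.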
Different from previously described attacks which try to approximate a feature vector that gives a close biohash vector to the stolen one, the rainbow attack is a practical attack that aims to compromise the security of a biometric hashing scheme. Furthermore, assuming that one authentication factor (the secret key of a user) is known, the rainbow attack also poses a privacy threat since look-alike faces can be found.
5 Experiments and results
In this section, the performance of our proposed attack methods is analyzed and discussed. The database that is used and the experimental setup are described, and attack models and their corresponding error rates are given.
5.1 Database and experimental setup
In order to provide the performance analysis of the security of biohashes based on the feature approximation methods, we implement our proposed methods on a face verification setup.
We have obtained face verification results on BioSecureds2 [31] face database. Faces are detected in an automatic fashion using the Viola-Jones face detector [32], and detected face images are resized to 64×64 pixels. In order to normalize a grayscale face image, its mean intensity value is subtracted from each pixel and each pixel is divided by its standard deviation. The resulting face images have zero mean and unit variance.
The BioSecureds2 face database consists of 210 users, equally balanced between female and male subjects. There are two sessions for each person. For each person and for each session, there exist six color images (two webcam acquisitions and four standard camera acquisitions, two with flash and two without flash). Standard camera acquisitions of 210 users, 8 images per person, are used in our experiments.
M-dimensional PCA coefficients are calculated for all 8 samples of 210 subjects (a total of 1680 (210 × 8) face images are used in our experiments). Two different PCA dimensions (M=200 and M=1024) are used in this study. M=200 is a typical choice for the PCA dimension of face images. We also analyze M=1024 in order to see to what extent the increased feature dimension affects the inversion process. PCA training is done using the first session images only. Applying the standard biometric hashing procedure, a bit string is created by taking inner products between the pseudorandom vectors and the M-dimensional PCA coefficients and deciding each bit based on the sign of the corresponding entry. Using random projection matrices of different sizes, one can obtain bit strings of various lengths. We present our results using bit strings of lengths 128, 256, 512, and 1024, in order to analyze how the proposed methods perform for different biohash lengths.
In a verification setting, we use all possible combinations for matching genuine pairs and the first sample of each subject is chosen for imposter matches (5880 (210×8×7/2) genuine comparisons and 21945 (210×209/2) imposter comparisons) in order to evaluate the performance of the biometric hashing scheme. For validating the consistency of approximated features using the proposed methods, we compare the biohashes created from these features with the original biohashes leading to one imposter score for each sample in the database (1680 imposter matches). Equal error rates (EER) in each case are reported.
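The comparison counts and the equal error rate can be reproduced with a short helper. The threshold sweep below is one common way to estimate the EER from two sets of Hamming distances and is not necessarily the procedure used for the reported results; the function name and the toy distance lists are assumptions of this sketch.

```python
import numpy as np


def equal_error_rate(genuine_dists, impostor_dists) -> float:
    """Sweep the distance threshold and return the error rate where FAR and FRR meet."""
    genuine = np.asarray(genuine_dists, dtype=float)
    impostor = np.asarray(impostor_dists, dtype=float)
    best_gap, eer = np.inf, 1.0
    for t in np.unique(np.concatenate([genuine, impostor])):
        far = np.mean(impostor <= t)   # impostors accepted at threshold t
        frr = np.mean(genuine > t)     # genuine users rejected at threshold t
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2.0
    return float(eer)


# Number of comparisons in the setup described above.
n_genuine = 210 * 8 * 7 // 2    # all sample pairs within each subject
n_impostor = 210 * 209 // 2     # first samples of all subject pairs

eer_separated = equal_error_rate([1, 2, 3], [10, 11, 12])      # no overlap
eer_identical = equal_error_rate([1, 2, 3, 4], [1, 2, 3, 4])   # complete overlap
```

In the attack experiments, a higher EER indicates a more successful attack, since the impostor distances produced by inverted features become indistinguishable from genuine ones.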
5.2 Performance of the biometric hashing scheme
Table 2 Equal error rates (%) for biohash vectors of different lengths
Bit length  PCA 200: Biohash  PCA 200: Biohash (stolen key)  PCA 1024: Biohash  PCA 1024: Biohash (stolen key)
128  6.295  12.571  6.593  13.565
256  4.570  11.457  4.813  12.216
512  4.137  11.595  4.328  11.634
1024  2.875  11.118  2.934  11.553
For all bit lengths, the biometric hashing scheme performs better than the baseline PCA approach, and lower EERs are obtained with the protected templates. In cases where an adversary steals the secret key of a user but does not possess the claimed person’s biometric information, the adversary sends his own biometric (or an arbitrary biometric) together with the secret key of the genuine user in order to be authenticated. This is a serious threat to the system, as the pseudorandom vectors generated using the secret key have a considerable influence on the generated bit string and therefore on the matching score. However, even if the attacker knows the secret key, the verification accuracy of the biometric hashing system is still in the same range as the performance of the unprotected PCA vectors.
One obvious addition to the biometric hashing scheme is the direct comparison of secret keys (i.e., the one stored during enrollment and the one presented during authentication) prior to biohash comparison. This way, an EER of 0 % is achieved if the attacker does not have the secret key of a valid user. The error rates presented in Table 2 are the results of biohash comparison alone; if the key-checking mechanism is applied as illustrated in Fig. 1, the EERs for the first scenario would be 0 %. Therefore, we study here the added security that the biometric provides through the use of biohashes in cases where an attacker obtains the secret key.
5.3 Performance of the feature approximation from biohash methods
Since the database that we use has 1680 samples from 210 subjects, we use the PCA coefficients and the secret key of each subject to create 1680 biohashes, each corresponding to a different sample. It is assumed that an adversary obtains the biohash and the secret key of a user and, with this knowledge, aims to find a feature vector by inverting the biohash. With this new feature vector, a new biohash can be calculated and used for authentication purposes. For each biohash in the database, we obtain a new feature vector and create its corresponding biohash. We use the new biohash to perform an imposter attack on the original one only; we do not attack the other genuine samples. We use all possible combinations to match genuine pairs (5880 (210 × 8 × 7/2)), and the number of imposter comparisons is 1680 (one for each biohash). The performance of each method is reported in terms of the equal error rate (EER), where a higher EER indicates a more successful attack (i.e., 100 % EER means that the inversion of all biohashes in the database is successful and the approximated features yield biohashes that match the original ones).
In order to evaluate the security that biometric hashing provides, we follow three consecutive scenarios:
Advanced attack model (AAM): The attacker, who knows the system details and possesses the biohash of a user and his secret key, calculates an estimated feature vector. Using this feature vector and the secret key of the subject, a new biohash is created and compared with the original one.
Security after key change (SAKC): Upon the detection of a security breach, the secret key of the user is changed by the system administrator. Using the previous biometric data, a new biohash is created from the new secret key and stored as the new gallery template in the system. The adversary has access to neither the new secret key nor the new biohash. The adversary makes an authentication attempt using the feature vector found in the advanced attack model and the previous (or an arbitrary) secret key. It should be noted that these errors occur only when the system does not perform key checking prior to biohash comparison. As the attacker does not know the secret key of the user, the EER in a key-checking system is 0 %.
Table 3 Equal error rates (%) when the adversary has the true biometric features but does not possess the associated secret key
PCA dimension  128 bits  256 bits  512 bits  1024 bits
200  6.199  4.290  4.243  2.917
1024  6.497  4.902  4.375  3.044
Attack in the long term (ALT): The adversary obtains the new secret key of the user but not the new biohash. Using the feature vector found in the advanced attack model and the new secret key, the adversary makes an authentication attempt. This is different from the advanced attack model in the sense that the biohash vector of the user is not known to the adversary and the authentication attempt is performed using the approximated feature vector which is obtained from the previous biohash of the user.
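The three scenarios can be summarized with a small simulation. It is a rough sketch under several assumptions: synthetic Gaussian vectors stand in for PCA features, the key is assumed to seed a Gaussian projection matrix, and the biohash is inverted with a minimum L2-norm solution computed by a pseudo-inverse. All function names are hypothetical, and the reported quantities are normalized Hamming distances, not the EERs of the tables.

```python
import numpy as np

def projection_matrix(secret_key, n_bits, dim):
    # Assumption: the secret key seeds a Gaussian pseudorandom matrix.
    return np.random.default_rng(secret_key).standard_normal((n_bits, dim))

def biohash(features, secret_key, n_bits):
    return projection_matrix(secret_key, n_bits, features.shape[0]) @ features >= 0

def hamming(a, b):
    return float(np.mean(a != b))

def run_scenarios(features, old_key, new_key, n_bits):
    stolen_hash = biohash(features, old_key, n_bits)
    R_old = projection_matrix(old_key, n_bits, features.shape[0])
    # Minimum L2-norm vector whose projections equal the stolen signs (+1/-1).
    estimate = np.linalg.pinv(R_old) @ (2.0 * stolen_hash - 1.0)
    new_template = biohash(features, new_key, n_bits)  # stored after the key change
    return {
        "AAM": hamming(biohash(estimate, old_key, n_bits), stolen_hash),
        "SAKC": hamming(biohash(estimate, old_key, n_bits), new_template),
        "ALT": hamming(biohash(estimate, new_key, n_bits), new_template),
    }

rng = np.random.default_rng(12345)
results = [run_scenarios(rng.standard_normal(200), 1000 + 2 * u, 1001 + 2 * u, 128)
           for u in range(20)]
mean_distance = {k: float(np.mean([r[k] for r in results]))
                 for k in ("AAM", "SAKC", "ALT")}
```

In this toy setting the AAM distance is zero because the estimate reproduces the stolen bits, the SAKC distance stays near 0.5 because the old key is useless against the new template, and the ALT distance falls in between because the estimate is only an approximation of the true features.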
5.3.1 Results for 1-bit compressive sensing approaches
Table 4 Equal error rates (%) for 1-bit compressive sensing approaches: linear programming (LP) method
PCA  Bit length  AAM  SAKC  ALT
200  128  100.00  7.262  48.333
200  256  100.00  5.225  65.570
200  512  100.00  4.018  78.958
200  1024  100.00  3.308  89.987
1024  128  100.00  7.530  40.187
1024  256  100.00  5.128  53.342
1024  512  100.00  4.286  68.907
1024  1024  100.00  3.444  80.863
Table 5 Equal error rates (%) for 1-bit compressive sensing approaches: BIHT method
PCA  Bit length  AAM  SAKC  ALT
200  128  100.00  7.381  33.767
200  256  100.00  4.851  49.388
200  512  100.00  3.958  74.809
200  1024  100.00  3.367  90.536
1024  128  100.00  6.667  16.314
1024  256  100.00  5.306  19.887
1024  512  100.00  4.252  28.759
1024  1024  100.00  3.474  47.653
In the security after key change scenario, when the secret key of the user is changed and the new key is not known to the adversary, the EERs are in line with the cases where the adversary has access to only one of the factors, either the true biometric or the true secret key (see Tables 2 and 3). In the attack in the long term (ALT) scenario, the attacker can gain unauthorized access to the system most of the time, especially when the PCA dimension is smaller and the biohash is longer (see the ALT column in Tables 4 and 5).
Boundary conditions are an issue in the LP implementation: some entries of R x are very small before thresholding (i.e., the values passed to the sign operator are close to zero). This leads to numerical inconsistencies in the inequality constraint of the linear program (i.e., B R x ≥ 0). The problem can be solved by replacing the inequality constraint with B R x ≥ ε, where ε is a small positive number (machine epsilon in MATLAB).
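Besides the LP solver, the second 1-bit compressive sensing method evaluated in this subsection is BIHT. A rough sketch of a BIHT-style iteration is given below. It is not the implementation used in the experiments: the step size 1/m, the per-iteration normalization, the choice to keep the best iterate, and the Gaussian test data are all assumptions, and the function name `biht` is hypothetical.

```python
import numpy as np

def biht(R, bits, n_iter=100, sparsity=None):
    """Estimate a unit-norm vector whose projection signs agree with `bits`."""
    m, dim = R.shape
    k = dim if sparsity is None else sparsity   # k == dim disables thresholding
    y = 2.0 * bits - 1.0                        # map {0, 1} to {-1, +1}
    x = np.zeros(dim)
    best_x, best_err = x, 1.0
    for _ in range(n_iter):
        # Gradient-like step that pushes x toward sign consistency.
        a = x + (R.T @ (y - np.sign(R @ x))) / m
        if k < dim:
            a[np.argsort(np.abs(a))[:-k]] = 0.0  # keep the k largest entries
        norm = np.linalg.norm(a)
        if norm == 0:
            break
        x = a / norm                             # sign measurements carry no scale
        err = float(np.mean((R @ x >= 0) != (bits == 1)))
        if err < best_err:
            best_x, best_err = x, err
        if err == 0:
            break
    return best_x, best_err

R = np.random.default_rng(99).standard_normal((1024, 200))
x_true = np.random.default_rng(3).standard_normal(200)
bits = (R @ x_true >= 0).astype(int)
x_hat, mismatch = biht(R, bits)
cosine = float(x_hat @ x_true / np.linalg.norm(x_true))
```

Here `mismatch` is the fraction of bits on which the estimate disagrees with the target biohash, and `cosine` measures how closely the direction of the estimate follows the true feature vector.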
5.3.2 Results for minimum norm solutions
Equal error rates (%) for minimum norm solutions: L2 norm
PCA  Bit length  AAM  SAKC  ALT
200  128  100.00  7.113  31.233
200  256  99.843  5.196  34.753
200  512  99.239  4.018  72.513
200  1024  98.444  3.219  86.599
1024  128  100.00  7.117  17.623
1024  256  100.00  5.544  21.003
1024  512  100.00  4.256  28.703
1024  1024  100.00  3.474  36.947
Equal error rates (%) for minimum norm solutions: L1 norm
PCA  Bit length  AAM  SAKC  ALT
200  128  100.00  6.815  30.965
200  256  97.113  5.106  28.563
200  512  92.491  3.839  61.173
200  1024  92.751  3.431  77.564
1024  128  100.00  6.577  17.534
1024  256  100.00  5.723  20.765
1024  512  100.00  4.196  28.346
1024  1024  100.00  3.474  36.947
In the SAKC scenario, the performance of the minimum norm solutions is similar to that of the 1-bit compressive sensing solutions. If the new key of the user is stolen (the ALT scenario), the 1-bit compressive sensing approaches yield significantly higher error rates, which shows that these attack methods are more successful.
FAR1000 values for the proposed methods under the attack in the long term (ALT) scenario (columns: PCA dimension → biohash length in bits)
Method  200→1024  1024→1024
LP  97.9932  95.9864
BIHT  97.6190  66.1565
L2  94.1190  54.7789
L1  89.3027  54.7789
A special case of the norm minimization problem arises when the PCA feature vector dimension is equal to the length of the biohash in bits. In approximating the 1024-dimensional PCA vector from a biohash of length 1024 bits, there is a single unique solution. However, the condition number of the random projection matrix is very high, which leads to inaccurate solutions. We improve the solution by decreasing the condition number of the random projection matrix. Following common practice, 20 % of the maximum singular value of the matrix R is added to all of its singular values. This way, the condition number of R decreases by ∼10^{2}.
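The singular value adjustment can be sketched with an SVD. The function name `regularize_singular_values` and the 256×256 Gaussian test matrix are assumptions for illustration; the experiments use a 1024×1024 matrix.

```python
import numpy as np

def regularize_singular_values(R, fraction=0.2):
    """Add `fraction` of the largest singular value to every singular value of R."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    s_reg = s + fraction * s.max()
    return (U * s_reg) @ Vt   # scales column j of U by s_reg[j]

rng = np.random.default_rng(0)
R = rng.standard_normal((256, 256))
R_reg = regularize_singular_values(R)

cond_before = float(np.linalg.cond(R))
cond_after = float(np.linalg.cond(R_reg))

# Solve the square system with the better-conditioned matrix.
signs = np.where(R @ rng.standard_normal(256) >= 0, 1.0, -1.0)
x_hat = np.linalg.solve(R_reg, signs)
```

With a 20 % shift, the regularized condition number is (1.2 σ_max)/(σ_min + 0.2 σ_max), which can never exceed 6 regardless of how ill-conditioned the original matrix is.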
5.3.3 Computation times for the proposed feature approximation methods
5.4 Results for the rainbow attack

Collusion model (CM): Keys are known to the attacker and using an available database, he finds the faces that provide the closest biohash given the secret key of the valid user.

Security after key change (SAKC): Secret keys of users are changed by the system administrator. The attacker does not know the new key but uses the face found in the CM scenario.

Attack in the long term (ALT): The attacker obtains the new key. He uses the face found in the CM scenario and the new key to create biohashes.
Equal error rates (%) for the rainbow attack
PCA  Bit length  CM  SAKC  ALT
200  128  53.597  6.964  38.571
200  256  49.787  4.762  40.179
200  512  47.177  4.043  41.820
200  1024  46.054  3.342  43.469
1024  128  56.467  7.440  38.746
1024  256  51.786  5.795  41.417
1024  512  48.206  4.524  42.543
1024  1024  46.794  3.296  43.439
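The database search behind the collusion model can be sketched as a nearest-neighbor lookup in Hamming distance. This is a minimal sketch under assumptions: Gaussian vectors stand in for the PCA features of an available face database, the key is assumed to seed a Gaussian projection matrix, and the names `projection_matrix` and `rainbow_attack` are hypothetical. A near copy of the victim is planted in the database only so that the search result can be checked; in the scenario above the database faces belong to other people.

```python
import numpy as np

def projection_matrix(secret_key, n_bits, dim):
    # Assumption: the secret key seeds a Gaussian pseudorandom matrix.
    return np.random.default_rng(secret_key).standard_normal((n_bits, dim))

def rainbow_attack(target_hash, secret_key, database):
    """Return (index, distance) of the database entry with the closest biohash."""
    n_bits = target_hash.shape[0]
    R = projection_matrix(secret_key, n_bits, database.shape[1])
    hashes = database @ R.T >= 0                      # one biohash per row
    dists = (hashes != target_hash.astype(bool)).mean(axis=1)
    best = int(np.argmin(dists))
    return best, float(dists[best])

rng = np.random.default_rng(1)
database = rng.standard_normal((500, 200))
victim = rng.standard_normal(200)
database[42] = victim + 0.1 * rng.standard_normal(200)  # planted for checking only

key = 2024
target = projection_matrix(key, 512, 200) @ victim >= 0
best_index, best_distance = rainbow_attack(target, key, database)
```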
5.5 Privacy assessment for the proposed methods
5.5.1 Visual results of the attacks
The first two sets of images (Figs. 7 and 8) belong to two different subjects from the database, and the reconstruction is carried out on biohashes of length 1024 bits that are obtained from 200-dimensional PCA features. All four methods provide face images that look similar to the subject’s original face image.
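Mapping a feature vector back to an image relies on the invertibility of PCA. The sketch below shows the projection and back-projection steps on synthetic data; the function names, image size, and number of components are assumptions. Note that a biohash constrains only the direction of the feature vector (the sign of R x is unchanged when x is multiplied by a positive factor), so any rescaling of an approximated feature vector is left out of this sketch.

```python
import numpy as np

def fit_pca(images, n_components):
    """Return the mean image (flattened) and the top principal directions as rows."""
    X = images.reshape(len(images), -1).astype(np.float64)
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def project(image, mean, W):
    """PCA coefficients of an image."""
    return W @ (image.ravel() - mean)

def reconstruct(coeffs, mean, W, shape):
    """Back-project coefficients to pixel space using the transpose of W."""
    return (mean + W.T @ coeffs).reshape(shape)

rng = np.random.default_rng(0)
train = rng.standard_normal((40, 16, 16))     # stand-in for normalized face images
mean, W = fit_pca(train, n_components=20)
coeffs = project(train[0], mean, W)
approx = reconstruct(coeffs, mean, W, train[0].shape)
```

Because the rows of `W` are orthonormal, projecting the reconstructed image again returns the same coefficients, so an attacker who knows the PCA matrix can turn any approximated feature vector into an image.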
5.5.2 Cross-linking different systems
Equal error rates (%) for direct feature-level comparisons (200-dimensional PCA feature vectors, biohash length = 1024 bits)
Method  EER (%)
LP  91.161
BIHT  91.773
L2  88.338
L1  78.720
6 Conclusions
Biometric template protection is a critical problem that needs to be addressed to enhance the public acceptance of biometric technologies, and it is essential to develop a set of measures that can evaluate the strength of template protection techniques. Although biometric cryptosystems can be analyzed using information-theoretic metrics such as entropy and mutual information, the theoretical analysis of transformation-based methods rests on the hardness of inverting the transformation.
When a user’s biohash is obtained by an adversary, it can seriously undermine the security of the biometric system and privacy of users. If the secret key of a user is known to the adversary, the biometric feature of the user can be reconstructed from the user’s biohash which might harm the subject’s privacy and lead to illegitimate authentication to a system. Biometric hashing is claimed to be irreversible due to the random projection and quantization steps; however, our study shows that an attacker is able to invert the transformed template to obtain a close approximation to the original biometric template.
This paper proposes four novel ways to approximate the original biometric feature from the transformed template in a biometric hashing scheme and reveals security and privacy problems concerning the associated biometric system. We define three different attack scenarios under which we analyze the protection capability of biohashing. From the security point of view, these attacks enable an adversary to recover a biometric template under realistic assumptions and to perform intrusion attacks on the biometric system. This study is the first to analyze the inversion of biohashes in a 1-bit compressive sensing framework. Experimental results show the superiority of this approach over minimum norm solutions. Biohashes that are created from feature vectors obtained by using the LP and BIHT solutions to the 1-bit compressive sensing problem are equal to the original biohashes stored during enrollment, and this is a serious threat to the security of the system. In addition, this study introduces the rainbow attack in order to find a biometric template from a biometric database and use it to obtain a biohash that is the same as or close to the original biohash of a subject.
The biometric hashing scheme is a generic template protection scheme that can be applied to various types of biometric features. In this paper, we focus on an orthogonal linear transform of face images, namely PCA (i.e., Eigenfaces). Several other studies on biohashing also use PCA ([4, 10]) or LDA ([19]) (i.e., Fisherface), which is another invertible linear transform. Using the knowledge of the linear transform and its inversion, we analyze the privacy issues by reconstructing face images.
If the adversary knows the system details (i.e., the PCA matrix, the user’s secret key, and other parameters), the obtained feature vectors can be used to reconstruct face images of the subject, which is a direct threat to the privacy of system users. The quality of the reconstructed images depends on the number of bits and the length of the original feature vector, and the images illustrated in the last section visually confirm the success of the methods in reverting the biohash vectors. In this work, we study feature reconstruction, and image reconstruction is carried out as a separate step. Directly approximating images from biohash vectors may also be possible by integrating the PCA transformation with the random projection matrix and solving the optimization problem by enforcing sparsity in the DCT or block-DCT domain. However, our initial experiments in this direction indicate that the image-level approximation approach lowers the performance from both the security perspective (evaluated through EERs) and the privacy perspective (evaluated through visual inspection of the reconstructed face images), because the number of dimensions to be approximated is higher for images.
In the future, the effects of the various improvements proposed for the biometric hashing scheme might be investigated with respect to security and privacy by carrying out similar attacks on the improved versions of biometric hashing. In addition, weaknesses of the biometric hashing scheme should be explored, and possible modifications should be introduced for better security and privacy protection in light of the inversion attacks proposed in this study.
Declarations
Acknowledgements
This work has been partially supported by the BEAT project 7th Framework Research Programme of the European Union (EU), grant agreement number: 284989. The authors would like to thank the EU for the financial support and the partners within the consortium for a fruitful collaboration. For more information about the BEAT consortium please visit http://www.beateu.org.
Competing interests
The authors declare that they have no competing interests.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
1. T Ignatenko, FMJ Willems, Information leakage in fuzzy commitment schemes. IEEE Trans. Inf. Forensics Secur. 5(2), 337–348 (2010).
2. S Prabhakar, S Pankanti, AK Jain, Biometric recognition: security and privacy concerns. IEEE Secur. Priv. 1(2), 33–42 (2003).
3. NK Ratha, JH Connell, RM Bolle, in Proceedings of the Third International Conference on Audio and Video-Based Biometric Person Authentication. An Analysis of Minutiae Matching Strength, AVBPA ’01 (Springer, London, 2001), pp. 223–228.
4. A Nagar, K Nandakumar, AK Jain, in Media Forensics and Security, 7541, ed. by ND Memon, J Dittmann, AM Alattar, and EJ Delp. Biometric Template Transformation: A Security Analysis, SPIE Proceedings (SPIE, United States, 2010), p. 75410, doi:10.1117/12.839976.
5. AK Jain, K Nandakumar, A Nagar, Biometric Template Security. EURASIP J. Adv. Signal Process. 2008, 113–117 (2008). doi:10.1155/2008/579416.
6. A Juels, M Wattenberg, in Proceedings of the 6th ACM Conference on Computer and Communications Security. A Fuzzy Commitment Scheme, CCS ’99 (ACM, New York, 1999), pp. 28–36, doi:10.1145/319709.319714.
7. ATB Jin, DNC Ling, A Goh, Biohashing: two factor authentication featuring fingerprint data and tokenised random number. Pattern Recogn. 37(11), 2245–2255 (2004). doi:10.1016/j.patcog.2004.04.011.
8. U Uludag, S Pankanti, A Jain, in Audio and Video-Based Biometric Person Authentication, 3546, ed. by T Kanade, A Jain, and N Ratha. Fuzzy Vault for Fingerprints, Lecture Notes in Computer Science (Springer, New York, 2005), pp. 310–319, doi:10.1007/1152792332.
9. D Maltoni, D Maio, AK Jain, S Prabhakar, Handbook of Fingerprint Recognition, 2nd edn. (Springer, London, 2009).
10. DCL Ngo, ABJ Teoh, A Goh, Biometric Hash: High-Confidence Face Recognition. IEEE Trans. Circ. Syst. Video Technol. 16(6), 771–775 (2006). doi:10.1109/TCSVT.2006.873780.
11. X Zhou, Privacy and security assessment of biometric template protection. IT - Inf. Technol. 54(4), 197–200 (2012).
12. B Yang, C Busch, P Bours, D Gafurov, in Media Forensics and Security, 7541, ed. by ND Memon, J Dittmann, AM Alattar, and EJ Delp. Robust Minutiae Hash for Fingerprint Template Protection, SPIE Proceedings (SPIE, United States, 2010), p. 75410.
13. K Kümmel, C Vielhauer, T Scheidat, D Franke, J Dittmann, in Proceedings of the 11th IFIP TC 6/TC 11 International Conference on Communications and Multimedia Security. Handwriting Biometric Hash Attack: A Genetic Algorithm with User Interaction for Raw Data Reconstruction, CMS’10 (Springer, Berlin, 2010), pp. 178–190.
14. A Kong, KH Cheung, D Zhang, MS Kamel, J You, An analysis of biohashing and its variants. Pattern Recognit. 39(7), 1359–1368 (2006). doi:10.1016/j.patcog.2005.10.025.
15. X Zhou, T Kalker, in Media Forensics and Security, 7541, ed. by ND Memon, J Dittmann, AM Alattar, and EJ Delp. On the Security of Biohashing, SPIE Proceedings (SPIE, United States, 2010), p. 75410, doi:10.1117/12.839165.
16. O Goldreich, Foundations of Cryptography: Volume 1 (Cambridge University Press, New York, 2006).
17. Y Lee, Y Chung, K Moon, in IEEE Workshop on Computational Intelligence in Biometrics: Theory, Algorithms, and Applications, CIB. Inverse Operation and Preimage Attack on Biohashing (2009), pp. 92–97, doi:10.1109/CIB.2009.4925692.
18. P Lacharme, E Cherrier, C Rosenberger, in International Conference on Security and Cryptography, SECRYPT, ed. by P Samarati. Preimage Attack on Biohashing (SciTePress, Reykjavík, 2013), pp. 363–370, doi:10.5220/0004524103630370.
19. YC Feng, M Lim, PC Yuen, Masquerade attack on transform-based binary-template protection based on perceptron learning. Pattern Recogn. 47(9), 3019–3033 (2014). doi:10.1016/j.patcog.2014.03.003.
20. Y Plan, R Vershynin, One-bit compressed sensing by linear programming. Commun. Pur. Appl. Math. 66(8), 1275–1297 (2013). doi:10.1002/cpa.21442.
21. L Jacques, JN Laska, PT Boufounos, RG Baraniuk, Robust 1-bit compressive sensing via binary stable embeddings of sparse vectors. IEEE Trans. Inf. Theory 59(4) (2013). doi:10.1109/TIT.2012.2234823.
22. C Karabat, H Erdogan, in Fifth International Conference on Intelligent Information Hiding and Multimedia Signal Processing. A Cancelable Biometric Hashing for Secure Biometric Verification System (2009), pp. 1082–1085, doi:10.1109/IIHMSP.2009.121.
23. Z Bai, D Hatzinakos, in 11th International Conference on Control Automation Robotics Vision (ICARCV). Lbp-Based Biometric Hashing Scheme for Human Authentication (2010), pp. 1842–1847, doi:10.1109/ICARCV.2010.5707216.
24. Y Wai Kuan, ABJ Teoh, DCL Ngo, Secure hashing of dynamic hand signatures using wavelet-Fourier compression with BioPhasor mixing and 2N discretization. EURASIP J. Adv. Signal Process. 2007(1), 32–32 (2007). doi:10.1155/2007/59125.
25. C Rathgeb, A Uhl, in 20th International Conference on Pattern Recognition (ICPR). Iris-Biometric Hash Generation for Biometric Database Indexing (2010), pp. 2848–2851, doi:10.1109/ICPR.2010.698.
26. A Lumini, L Nanni, An Improved Biohashing for Human Authentication. Pattern Recognit. 40(3), 1057–1065 (2007). doi:10.1016/j.patcog.2006.05.030.
27. M Turk, A Pentland, Eigenfaces for recognition. J. Cogn. Neurosci. 3(1), 71–86 (1991). doi:10.1162/jocn.1991.3.1.71.
28. PT Boufounos, RG Baraniuk, in 42nd Annual Conference on Information Sciences and Systems, CISS. 1-bit Compressive Sensing (2008), pp. 16–21, doi:10.1109/CISS.2008.4558487.
29. EJ Candes, MB Wakin, An introduction to compressive sampling. IEEE Signal Proc. Mag. 25(2), 21–30 (2008). doi:10.1109/msp.2007.914731.
30. T Blumensath, ME Davies, Iterative hard thresholding for compressed sensing. Appl. Comput. Harmon. Anal. 27(3), 265–274 (2009). doi:10.1016/j.acha.2009.04.002.
31. J Ortega-Garcia, J Fierrez, F Alonso-Fernandez, J Galbally, MR Freire, J Gonzalez-Rodriguez, C Garcia-Mateo, JL Alba-Castro, E Gonzalez-Agulla, E Otero-Muras, S Garcia-Salicetti, L Allano, B Ly-Van, B Dorizzi, J Kittler, T Bourlai, N Poh, F Deravi, M Ng, M Fairhurst, J Hennebert, A Humm, M Tistarelli, L Brodo, J Richiardi, A Drygajlo, H Ganster, FM Sukno, SK Pavani, A Frangi, L Akarun, A Savran, The multiscenario multienvironment BioSecure multimodal database (BMDB). IEEE Trans. Pattern. Anal. Mach. Intell. 32(6), 1097–1111 (2010). doi:10.1109/TPAMI.2009.76.
32. P Viola, MJ Jones, Robust Real-Time Face Detection. Int. J. Comput. Vis. 57(2), 137–154 (2004). doi:10.1023/B:VISI.0000013087.49260.fb.
33. KH Cheung, AWK Kong, J You, D Zhang, in Proceedings of The 2005 International Conference on Imaging Science, Systems, and Technology: Computer Graphics, CISST, ed. by HR Arabnia. An Analysis on Invertibility of Cancelable Biometrics Based on Biohashing (CSREA Press, Las Vegas, 2005), pp. 40–45.