 Research
 Open Access
An analysis on equal-width quantization and linearly separable subcode encoding-based discretization and its performance resemblances
EURASIP Journal on Advances in Signal Processing volume 2011, Article number: 82 (2011)
Abstract
Biometric discretization extracts a binary string from a set of real-valued features per user. Upon error correction, this representative string can be used as a cryptographic key in many security applications. Discretization performance should not degrade significantly from the classification performance of the actual continuous features. However, numerous discretization approaches based on ineffective encoding schemes have been put forward, and the correlation between such discretization and classification has therefore never been made clear. In this article, we aim to bridge the gap between the continuous and Hamming domains, and reveal how discretization based on equal-width quantization and linearly separable subcode encoding affects the classification performance in the Hamming domain. We further illustrate how such discretization can be applied to obtain a highly resembled classification performance under the general Lp distance and the inner product metrics. Finally, empirical studies conducted on two benchmark face datasets vindicate our analysis results.
1. Introduction
The explosion of biometric-based cryptographic applications (see e.g. [1–12]) in the recent decade has sharply increased the demand for stable binary strings for identity representation. Biometric features extracted by most current feature extractors, however, do not exist in binary form by nature. Where binary processing is needed, biometric discretization becomes necessary in order to transform such an ordered set of continuous features into a binary string. Note that discretization is referred to as a process of 'binarization' throughout this article. The general block diagram of a biometric discretization-based binary string generator is illustrated in Figure 1.
Biometric discretization can be decomposed into two essential components: biometric quantization and feature encoding. These components are governed by a static or a dynamic bit allocation algorithm, which determines whether the quantity of binary bits allocated to every dimension is fixed or optimally different, respectively. Typically, given an ordered set of real-valued feature elements per identity, each single-dimensional feature space is initially quantized into a number of non-overlapping intervals according to the chosen quantization scheme. The quantity of these intervals is determined by the corresponding number of bits assigned by the bit allocation algorithm. Each feature element captured by an interval is then mapped to a short binary string with respect to the label of the corresponding interval. Eventually, the binary output from each dimension is concatenated to form the user's final bit string.
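As a minimal sketch of this two-stage pipeline (quantize each dimension, encode each interval index, concatenate), consider the following Python fragment. The function names, the per-dimension [lo, hi] ranges and the plain binary labels are illustrative assumptions, not the specific scheme analyzed later:

```python
def equal_width_index(v, lo, hi, S):
    """Map a feature value v to the index of its equal-width interval."""
    width = (hi - lo) / S
    i = int((v - lo) // width)
    return min(max(i, 0), S - 1)          # clamp outliers to boundary intervals

def encode_index(i, n_bits):
    """Label an interval index with an n-bit binary string (plain DBR here)."""
    return format(i, "0{}b".format(n_bits))

def discretize(features, bounds, S=8):
    """Concatenate per-dimension codewords into the user's final bit string."""
    n_bits = S.bit_length() - 1           # log2(S) bits per dimension
    return "".join(encode_index(equal_width_index(v, lo, hi, S), n_bits)
                   for v, (lo, hi) in zip(features, bounds))

bits = discretize([0.12, 0.83], [(0.0, 1.0), (0.0, 1.0)], S=8)  # → "000110"
```

With S = 8 intervals per dimension, each dimension contributes 3 bits; two dimensions yield a 6-bit string.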
Apart from the above consideration, information about the constructed feature space for each dimension is stored in the form of helper data to enable reproduction of the same binary string for the same user. However, it is required that such helper data, upon compromise, should neither leak any helpful information about the output binary string, nor that of the biometric feature itself.
In general, there are three aspects that can be used in assessing a biometric discretization scheme:

(1)
Performance: Upon extraction of distinctive features, it is important for a discretization scheme to preserve the significance of real-valued feature elements in the Hamming domain in order to maintain the actual classification performance. A better scheme usually incorporates a feature selection or bit allocation process to ensure that only reliable feature components are extracted or highly weighted, yielding improved performance.

(2)
Security: Helper data, upon revelation, must not expose any crucial information that may assist the adversary in obtaining a false accept. Therefore, the binary string of the user should contain adequate entropy and should be completely uncorrelated to the helper data. Generally, entropy is a measure that quantifies the expected value of information contained in a binary string. In the context of biometric discretization, the entropy of a binary string is referred to as the sum of the entropies of all single-dimensional binary outputs. With the probability p_{ i } of every binary output i ∈ {1, ⋯, S} in a dimension, the entropy can be calculated as H=-{\sum}_{i=1}^{S}{p}_{i}{log}_{2}{p}_{i}. As such, the probability p_{ i } will be reduced when the number of outputs S is increased, signifying higher entropy and security against adversarial brute force attack.
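The entropy computation above can be sketched numerically as follows; the per-dimension output probabilities are illustrative:

```python
import math

def string_entropy(per_dim_probs):
    """Total entropy of the final binary string: the sum over dimensions of
    H^d = -sum_i p_i * log2(p_i) over that dimension's S outputs."""
    return sum(-sum(p * math.log2(p) for p in probs if p > 0)
               for probs in per_dim_probs)

# Two dimensions with S = 4 equiprobable outputs each: 2 bits per dimension,
# 4 bits in total.
print(string_entropy([[0.25] * 4, [0.25] * 4]))        # → 4.0
# A skewed distribution over the same 4 outputs carries less entropy.
print(string_entropy([[0.5, 0.25, 0.125, 0.125]]))     # → 1.75
```

This also illustrates why equal-probable quantization (uniform p_i) maximizes entropy, while equal-width quantization generally falls short of it.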

(3)
Privacy: A high level of protection needs to be exerted against the adversary who could be interested in all userspecific information other than the verification decision of the system. Apart from the biometric data applicable for discretization, it is important that unnecessary yet sensitive information such as ethnic origin, gender and medical condition should also be protected. Since biometric data is inextricably linked to the user, it can never be reissued or replaced once compromised. Therefore, helper data must be uncorrelated to such information in order to defeat any adversary's privacy violation attempt upon revealing it.
1.1 Related works
Biometric discretization in the literature can generally be divided into two broad categories: supervised and unsupervised discretization (discretization that makes use of class labels of the samples and discretization that does not, respectively).
Unsupervised discretization can be subcategorized into threshold-based discretization [7–9, 11]; equal-width quantization-based discretization [12, 13]; and equal-probable quantization-based discretization [5, 10, 14–16]. In threshold-based discretization, each single-dimensional feature space is segmented into two intervals based on a pre-fixed threshold. Each interval is labeled with a single bit '0' or '1'. A feature element that falls into an interval is mapped to the corresponding 1-bit output label. Examples of threshold-based discretization schemes include Monrose et al.'s [7, 8], Teoh et al.'s [9] and Verbitsky et al.'s [11] schemes. However, determining the best threshold can be a hurdle in achieving optimal performance. On top of that, this discretization scheme is only able to produce a 1-bit output per feature dimension, which may be insufficient in practice to meet the current entropy requirement (indicating the level of toughness against brute force attacks).
On the other hand, unsupervised equal-width quantization-based discretization [12, 13] partitions each single-dimensional feature space into a number of non-overlapping equal-width intervals during quantization, in accordance with the quantity of bits required to be extracted from each dimension. These intervals are labeled with binary reflected gray code (BRGC) [17] for encoding; both quantization and encoding require the number of constructed intervals to be a power of 2 in order to avoid loss of entropy. Based on equal-width quantization and BRGC encoding, Teoh et al. [13] designed a user-specific equal-width quantization-based dynamic bit allocation algorithm that assigns a different number of bits to each dimension based on an intra-class variation measure. Equal-width quantization does not incur privacy issues. However, it cannot offer maximum entropy, since the probability of every quantization output in a dimension is rarely equal. Moreover, the width of quantization intervals can easily be affected by outliers.
The last subcategory of unsupervised biometric discretization, known as equal-probable quantization-based discretization [14], segments each single-dimensional feature space into multiple non-overlapping equal-probable intervals, whereby every interval is constructed to encapsulate an equal portion of the background probability mass during quantization. As a result, the constructed intervals are of different widths if the background distribution is not uniform. BRGC is used for encoding. Subsequently, two efficient dynamic bit allocation schemes were proposed by Chen et al. in [15] and [16] based on equal-probable quantization and BRGC encoding, where the detection rate (genuine acceptance rate) [15] as well as the area under the FRR curve [16] is used as the evaluation measure for bit allocation.
Tuyls et al. [10] and Kevenaar et al. [5] used a similar equal-probable discretization technique, but the bit allocation is limited to at most one bit per dimension. However, a feature selection technique is incorporated to identify reliable components based on the training bit statistics [10] or a reliability function [5], so that unreliable dimensions can be eliminated from the overall bit extraction and the discretization performance can eventually be improved. Equal-probable quantization offers maximum entropy. However, information regarding the background pdf of every dimension needs to be stored so that the exact intervals can be constructed during verification. This may pose a privacy threat [18] to the users.
On the other hand, supervised discretization [1, 3, 14, 19] potentially improves classification performance by exploiting the genuine user's feature distribution or user-specific dependencies to extract segmentations that are useful for classification. In Chang et al.'s [1] and Hao-Chan's [3] schemes, a single-dimensional interval defined by [μ_{ j } − kσ_{ j }, μ_{ j } + kσ_{ j }] (also known as the genuine interval) is first tailored to the Gaussian user pdf (with mean μ_{ j } and standard deviation σ_{ j }) of the genuine user, with a free parameter k. The remaining intervals of the same width are then constructed outwards from the genuine interval. Finally, the boundary intervals are formed by the leftover widths. In fact, the number of bits extractable from each dimension relies on the relative number of formable intervals in that dimension and is controllable through k. This scheme uses direct binary representation (DBR) for encoding. Chen et al. proposed a similar discretization scheme [14], except that BRGC encoding is adopted, the genuine interval is determined by the likelihood ratio pdf, and the remaining intervals are constructed equal-probably. Kumar and Zhang [19] employed an entropy-based quantizer to reduce class impurity/entropy in the intervals by recursively splitting every interval until a stopping criterion is met. The final intervals result in such a way that the majority of samples enclosed within each interval belong to a specific identity.
Despite being able to achieve a better classification performance than the unsupervised approaches, a critical problem with these supervised discretization schemes is the potential exposure of the genuine measurements or the genuine user pdf, since the constructed intervals give the adversary a clue as to where the user pdf or measurements could be located. As a result, the number of possible locations of the user pdf/genuine measurements might be reduced to the number of quantization intervals in that dimension, thus potentially facilitating malicious privacy violation attempts.
1.2 Motivations and contributions
Past research attention was mostly devoted to proposing discretization schemes with new quantization techniques, without realizing the effect of encoding on discretization performance. This can be seen from the recent revelation of the inappropriateness of DBR and BRGC for feature encoding in classification [20], although they were the most commonly seen encoding schemes for multi-bit discretization in the literature [1, 3, 12–16]. For this reason, the performance of multi-bit discretization schemes remains a mystery when it comes to linking the classification performance in the Hamming domain (discretization performance) with the relative performance in the continuous domain (classification performance of continuous features). To date, no explicit study has been conducted to resolve this ambiguity.
A common goal of discretization is to convert real-valued features into a binary string that at least preserves the actual classification performance without significantly compromising the security and privacy aspects. To achieve this, appropriate quantization and encoding schemes have to be adopted. A new encoding scheme known as linearly separable subcode (LSSC) has lately been proposed [20], with which features can be encoded much more efficiently than with DBR or BRGC. Since combining it with an elegant quantization scheme would produce satisfactory classification results in the Hamming domain, we adopt the unsupervised equal-width quantization scheme in our analysis due to its simplicity and its lower susceptibility to privacy attacks. However, a lower entropy could result when the class distribution is not uniform (with respect to the equal-probable quantization approach). This shortfall can simply be tackled by utilizing a larger number of feature dimensions or by allocating a larger quantity of bits to each dimension to compensate for the entropy loss.
It is the objective of this article to extend the work of [20] to justify and analyze the deterministic discrete-to-binary mapping behavior of LSSC encoding, as well as the approximate continuous-to-discrete mapping behavior of equal-width quantization when the number of quantization intervals in each dimension is substantial. We reveal the essential correspondence of distance between the Hamming domain and the rescaled L1 domain for equal-width quantization and LSSC encoding-based (EW + LSSC) discretization. We further generalize this fundamental correspondence to Lp distance metrics and inner product-based classifiers to obtain the desired performance resemblances. These important resemblances in fact open up the possibility of applying powerful classifiers in the Hamming domain, such as the binary support vector machine (SVM), without having to suffer from a poorer discretization performance with reference to the actual classification performance.
Empirically, we justify the superiority of LSSC over DBR and BRGC and the aforementioned performance resemblances in the Hamming domain by adopting the face biometric as our subject of study. Note that such experiments could also be conducted using other biometric modalities, as long as the relevant biometric features can be represented as an ordered feature vector.
The organization of this paper is as follows. In the next section, equal-width quantization and LSSC encoding are described as a continuous-to-discrete mapping and a discrete-to-binary mapping, respectively, and both mapping functions are derived. These mappings are then combined to reveal the performance resemblance of EW + LSSC discretization to that of rescaled L1 distance-based classification. In Section 3, methods to extend the basic performance resemblance of EW + LSSC discretization to different metrics and classifiers are described. In Section 4, the approximate performance of EW + LSSC discretization with respect to L1 distance-based classification performance is experimentally justified, and results showing the resemblance of altered EW + LSSC discretization to the performance of several different distance metrics/classifiers are presented. Finally, several insightful concluding remarks are drawn in Section 5.
2. Biometric discretization
For binary extraction, biometric discretization can be described as a two-stage mapping process: each segmented feature space is first mapped to the respective index of a quantization interval; subsequently, the index of each interval is mapped to a unique n-bit codeword in a Hamming space. The overall mapping process can be mathematically described by
where v^{d} denotes a continuous feature, i^{d} denotes the discrete index of an interval, {b}_{{i}^{d}}^{d} denotes the short binary string associated with i^{d}, f:ℝ → ℤ denotes a continuous-to-discrete map and g:ℤ → {0, 1}^{n} denotes a discrete-to-binary map. Note that the superscript d specifies the dimension to which a variable belongs; it by no means denotes an integer power. We shall define both these functions in the following subsections.
2.1 Continuous-to-discrete mapping f(·)
A continuous-to-discrete mapping f(·) is achieved by applying quantization to a continuous feature space. Recall that equal-width quantization divides a one-dimensional feature space evenly to form the quantization intervals and subsequently maps each interval-captured portion of the background probability density function (pdf) to a discrete index. Hence, the probability mass {p}_{{i}^{d}}^{d} associated with each index i^{d} precisely represents the probability density captured by the interval with the same index. This equality can be described by
where {p}_{bg}^{d}\left(\cdot \right) denotes the dth dimensional background pdf, \mathit{int}{}_{{i}^{d}\left(max\right)}^{d} and \mathit{int}{}_{{i}^{d}\left(min\right)}^{d} denote the upper and lower boundaries of the interval with index i^{d} in the dth dimension, and S^{d} denotes the number of constructed intervals in the dth dimension. Conspicuously, the resultant background pmf is an approximation of the original pdf upon the mapping.
Suppose that a feature element captured by an interval \mathit{int}{}_{{i}^{d}}^{d} with index i^{d} is to be mapped to a fixed point within that interval. Let {c}_{{i}^{d}}^{d} be the fixed point in \mathit{int}{}_{{i}^{d}}^{d} to which every feature element {v}_{{i}^{d}{j}^{d}}^{d} that falls within the interval has to be mapped, where i^{d} ∈ {0, 1, ⋯, S^{d} − 1} denotes the interval index and j^{d} ∈ {1, 2, ⋯} denotes the feature element index. The distance of {v}_{{i}^{d}{j}^{d}}^{d} from {c}_{{i}^{d}}^{d} is
Suppose now we are to match each index i^{d} of the S^{d} intervals with the corresponding {c}_{{i}^{d}}^{d} through some scaling ŝ and translation \widehat{t}:
To make {ŝ}^{d} and {\widehat{t}}^{d} globally derivable for all intervals, it is necessary to keep the distance between {c}_{{i}^{d}}^{d} and {c}_{{i}^{d}+1}^{d} constant for every i^{d} ∈ {0, ⋯, S^{d} − 2}. In order to preserve such a distance between any two different intervals, {c}_{{i}^{d}}^{d} in every interval should, therefore, take an identical distance from its corresponding \mathit{int}{}_{{i}^{d}\left(min\right)}^{d}. Without loss of generality, we let {c}_{{i}^{d}}^{d} be the central point of \mathit{int}{}_{{i}^{d}}^{d}, such that
With this, the upper bound of distance of {v}_{{i}^{d}{j}^{d}}^{d} from {c}_{{i}^{d}}^{d} upon mapping in (3) becomes
To obtain the parameters {ŝ}^{d} and {\widehat{t}}^{d}, we normalize both the feature and index spaces to (0, 1) and shift every normalized index i^{d} by \frac{1}{2{S}^{d}} to the right to fit the respective {c}_{{i}^{d}}^{d}, such that
Through some algebraic manipulation, we have
Thus, {ŝ}^{d}=\frac{\mathit{int}{}_{{S}^{d}-1\left(max\right)}^{d}-\mathit{int}{}_{0\left(min\right)}^{d}}{{S}^{d}} and {\widehat{t}}^{d}=0.5.
Combining the results from (3), (4) and (8), the continuous-to-discrete mapping function f(·) can be written as
Suppose we are to compute the L1 distance between two arbitrary points {v}_{{i}_{1}^{d},{j}_{1}^{d}}^{d} and {v}_{{i}_{2}^{d},{j}_{2}^{d}}^{d} for all {i}_{1}^{d},{i}_{2}^{d}\in \left[0,{S}^{d}-1\right],{j}_{1}^{d},{j}_{2}^{d}\in \left\{1,2,...\right\} in the dth dimensional continuous feature space, and the relative distance between the corresponding mapped elements in the dth dimensional discrete index space; then it is easy to find that the deviation between these two distances can be bounded as follows:
From (4), this inequality becomes
Note that the upper bound of such distance deviation is equivalent to the width of an interval in (6), such that
Therefore, it is clear that an increase or reduction in the width of each equalwidth interval could significantly affect the upper bound of such deviation. For instance, when the number of intervals constructed over a feature space is increased/reduced by a factor of β (i.e. S^{d} → βS^{d} or {S}^{d}\to \frac{1}{\beta}{S}^{d}), the width of each equalwidth interval will be reduced/increased by the same factor. Hence, the resultant upper bound for the distance deviation becomes \frac{2{\epsilon}_{{i}^{d},{j}^{d}(\mathrm{max})}^{d}}{\beta} and 2\beta {\epsilon}_{{i}^{d},{j}^{d}\left(max\right)}^{d}, respectively.
Finally, when static bit allocation is adopted where an equal number of equalwidth intervals is constructed in all D feature dimensions, the total distance deviation incurred by the continuoustodiscrete mapping can be upper bounded by 2D{\epsilon}_{{i}^{d},{j}^{d}\left(max\right)}^{d}.
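As a numerical sanity check of the mapping and deviation bound above, the sketch below (Python; the [0, 1] feature range and S = 8 intervals are illustrative assumptions) maps two feature values to interval indices and confirms that the rescaled index distance deviates from the true feature distance by at most one interval width, i.e. 2ε_max:

```python
def f(v, lo, hi, S):
    """Continuous-to-discrete map: v falls into the equal-width interval of
    index i and is effectively represented by the interval centre c_i."""
    s_hat = (hi - lo) / S                          # interval width (scaling)
    i = min(max(int((v - lo) // s_hat), 0), S - 1) # clamp boundary cases
    c_i = lo + s_hat * (i + 0.5)                   # fixed point of interval i
    return i, c_i

S, lo, hi = 8, 0.0, 1.0
(i1, c1), (i2, c2) = f(0.14, lo, hi, S), f(0.71, lo, hi, S)

# The rescaled index distance deviates from the true feature distance by at
# most one interval width (twice the per-point deviation bound).
s_hat = (hi - lo) / S
assert abs(abs(i2 - i1) * s_hat - abs(0.71 - 0.14)) <= s_hat
```

Doubling S halves `s_hat` and hence halves the bound, matching the β-scaling argument above.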
2.2 Discretetobinary mapping g(·)
The discrete-to-binary mapping can be defined in a more direct manner than the previous mapping. Suppose that in the dth dimension, we have S^{d} discrete elements to be mapped from the index space. We therefore require the same number of elements in the Hamming space to be mapped to. In fact, these elements in the Hamming space (also known as the codewords) may have different orders and indices depending on the encoding scheme being employed. With this, the discrete-to-binary mapping can, therefore, be specified by
where ℂ(i^{d} ) denotes a codeword with index i^{d} from an encoding scheme ℂ. We shall look into the available options of ℂ and their individual effect on the discretetobinary mapping in the following subsections.
2.2.1 Encoding schemes

(a)
Direct binary representation (DBR)
In DBR, decimal indices are directly converted into their binary equivalents. Depending on the required size S of a code, the length of DBR is selected to be n_{DBR} = ⌈log_{2}S⌉. Collections of DBRs fulfilling S = 4, 8 and 16 are illustrated in Table 1.
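A DBR table such as Table 1 can be generated in a few lines of Python (a sketch; S is assumed to be a power of 2, as the quantizer requires):

```python
def dbr(S):
    """Direct binary representation: index i → its ceil(log2 S)-bit binary form."""
    n = max(1, (S - 1).bit_length())   # ceil(log2 S) when S is a power of 2
    return [format(i, "0{}b".format(n)) for i in range(S)]

print(dbr(4))   # → ['00', '01', '10', '11']
```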

(b)
Binary reflected gray code (BRGC) [17]
BRGC is a special code that restricts the Hamming distance between every consecutive pair of codewords to unity. As with DBR, each decimal index is uniquely mapped to one out of S number of n_{BRGC}-bit codewords, where n_{BRGC} = ⌈log_{2}S⌉. If L_{ n }_{BRGC} denotes the listing of n_{BRGC}-bit binary strings, then the n_{BRGC}-bit BRGC can be defined recursively as follows:
Here, bL denotes the list constructed from L by adding bit b in front of every element of L, and L̄ denotes list L in reverse order. In Table 2, instances of BRGCs meeting different values of S are shown.
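The reflect-and-prefix recursion above translates directly into code; this Python sketch builds the n-bit listing and verifies the unit-Hamming-distance property of adjacent codewords:

```python
def brgc(n):
    """n-bit binary reflected Gray code via the recursion
    L_n = 0·L_{n-1} followed by 1·reverse(L_{n-1})."""
    if n == 0:
        return [""]
    prev = brgc(n - 1)
    return ["0" + c for c in prev] + ["1" + c for c in reversed(prev)]

codes = brgc(3)
# Every adjacent pair of codewords differs in exactly one bit position.
assert all(sum(a != b for a, b in zip(codes[k], codes[k + 1])) == 1
           for k in range(len(codes) - 1))
```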

(c)
Linearly separable subcode (LSSC) [20]
Out of the 2^{n_{LSSC}} codewords in total for any positive integer n_{LSSC}, LSSC contains (n_{LSSC} + 1) number of n_{LSSC}-bit codewords, where every adjacent pair of codewords differs by a single bit and every non-adjacent pair of codewords differs by q bits, with q denoting the corresponding index difference. Beginning with an initial codeword, say the all-zero codeword, the next n_{LSSC} number of codewords can simply be constructed by complementing one bit at a time from the lowest order (rightmost) bit position to the highest order (leftmost) bit position. The resultant n_{LSSC}-bit LSSCs fulfilling S = 4, 8 and 16 are shown in Table 3.
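Because each successive LSSC codeword just flips the next bit from the right, the whole code forms a thermometer (unary) pattern and can be generated in one line per codeword, as in this Python sketch:

```python
def lssc(n):
    """n-bit LSSC: (n + 1) codewords; codeword i has exactly i ones filled in
    from the rightmost bit position (a thermometer/unary pattern)."""
    return [format((1 << i) - 1, "0{}b".format(n)) for i in range(n + 1)]

codes = lssc(3)   # → ['000', '001', '011', '111']
```

The Hamming distance between codewords i and j is then exactly |i − j|, the property exploited in Section 2.2.2.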
2.2.2 Mappings and correspondences
In a Hamming space where Hamming distance is crucial, a one-to-one correspondence between each binary codeword and the corresponding Hamming distance incurred with respect to any reference codeword is essentially desired. We can observe clearly from Figure 2 that even though the widely used DBR and BRGC have each of their codewords associated with a unique index, most mapped elements eventually overlap each other as far as Hamming distance is concerned. In other words, although the distance deviation in the prior continuous-to-discrete mapping is minimal, the deviation effect led by such an overlapping discrete-to-binary mapping can be tremendous, causing continuous feature elements originating from multiple different non-adjacent intervals to be mapped to a common Hamming distance away from a specific codeword.
Taking DBR as an instance in Figure 2a, feature elements associated with intervals 1, 2 and 4 are mapped to codewords '001', '010' and '100', respectively, which are all 1 Hamming distance away from '000' (interval 0). This implies that if there is a scenario where we have a genuine template feature captured by interval 0, a genuine query feature by interval 1, two imposters' query features by intervals 2 and 4, all query features will be mapped to 1 Hamming distance away from the template and could not be differentiated. Likewise, the same problem occurs when BRGC is employed, as illustrated in Figure 2b. Therefore, these imprecise mappings caused by DBR and BRGC greatly undermine the actual discriminability of the feature elements and could probably be detrimental to the overall recognition performance.
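The collisions described above can be reproduced numerically. In this Python sketch (S = 8 intervals; code generators inlined for self-containment), DBR maps several non-adjacent intervals to the same Hamming distance from interval 0's codeword, whereas LSSC preserves the full index gap:

```python
# Hamming distance from interval 0's codeword to every other interval's
# codeword, for S = 8.
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

dbr  = [format(i, "03b") for i in range(8)]            # 3-bit DBR
lssc = [format((1 << i) - 1, "07b") for i in range(8)] # 7-bit LSSC

print([hamming(dbr[0], c) for c in dbr])    # → [0, 1, 1, 2, 1, 2, 2, 3]
print([hamming(lssc[0], c) for c in lssc])  # → [0, 1, 2, 3, 4, 5, 6, 7]
```

The first list shows intervals 1, 2 and 4 all collapsing to distance 1, exactly the genuine/imposter confusion described above; the second is strictly increasing in the index.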
In contrast, LSSC does not suffer from such a drawback. As shown in Figure 2c, LSSC links each of its codewords to a unique Hamming distance away from any reference codeword. More precisely, a definite mapping behaviour is obtained when each index is mapped to an LSSC codeword. The probability mass distribution in the discrete space is completely preserved upon the discrete-to-binary mapping and thus, a precise mapping from the L1 distance to the Hamming distance can be expected, such that given two indices {i}_{1}^{d}=f\left({v}_{{i}_{1}^{d},{j}_{1}^{d}}^{d}\right), {i}_{2}^{d}=f\left({v}_{{i}_{2}^{d},{j}_{2}^{d}}^{d}\right) and their respective LSSC-based binary outputs {b}_{{i}_{1}^{d}}^{d}=g\left(f\left({v}_{{i}_{1}^{d},{j}_{1}^{d}}^{d}\right)\right), {b}_{{i}_{2}^{d}}^{d}=g\left(f\left({v}_{{i}_{2}^{d},{j}_{2}^{d}}^{d}\right)\right),
where H_{ D } denotes the Hamming distance operator. The only disadvantage of LSSC is the larger bit length a system may need to afford in meeting a similar number of discretization outputs compared to DBR and BRGC. In the case where a total of S^{d} intervals needs to be constructed for each dimension, LSSC introduces R^{d} = S^{d} − log_{2}S^{d} − 1 redundant bits to maintain the optimal one-to-one discrete-to-binary mapping in the dth dimension. Thus, upon concatenation of the outputs from all feature dimensions, the length of the LSSC-based final binary string could be significantly larger.
2.3 Combinations of both mappings
Through combining both continuoustodiscrete and discretetobinary mappings, the overall mapping can be expressed as
where {ŝ}^{d}=\frac{\mathit{int}{}_{{S}^{d}-1\left(max\right)}^{d}-\mathit{int}{}_{0\left(min\right)}^{d}}{{S}^{d}} and {\widehat{t}}^{d}=0.5. This equation can typically be used to derive the codeword {b}_{{i}^{d}}^{d} based on the continuous feature value {v}_{{i}^{d}{j}^{d}}^{d}.
In view of different encoding options, three discretization configurations can be deduced. They are:

Equal Width + Direct Binary Representation (EW + DBR)

Equal Width + Binary Reflected Gray Code (EW + BRGC)

Equal Width + Linearly Separable SubCode (EW + LSSC)
Table 4 summarizes the behaviours of both mappings discussed so far. Among them, a much poorer performance by EW + DBR and EW + BRGC can be anticipated due to their intrinsic indefinite mapping deficiency. On the contrary, only the combination EW + LSSC leads to approximate and definite discretization results. Since for LSSC, {H}_{D}\left({b}_{{i}_{2}^{d}}^{d},{b}_{{i}_{1}^{d}}^{d}\right)=\left|{i}_{2}^{d}-{i}_{1}^{d}\right| and {S}^{d}={n}_{\mathsf{\text{LSSC}}}^{d}+1, integrating these LSSC properties with (3) and (4) yields
Here the RHS of (17) corresponds to a rescaled L1 distance.
By concatenating the distances of all D individual dimensions, the overall discretization performance of EW + LSSC is therefore very likely to resemble the relative performance of rescaled L1 distance-based classification:
Hence, matching plain bit strings in a biometric verification system guarantees a rescaled L1 distance-based classification performance when {S}^{d}={n}_{\mathsf{\text{LSSC}}}^{d}+1 is adequately large. However, for cryptographic key generation applications, where a bit string is derived directly from the helper data of each user for further cryptographic usage, (18) then implies a relation between the bit discrepancy of an identity's bit string with reference to the template bit string and the L1 distance of their continuous counterparts in each dimension.
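The correspondence in (18) can be checked numerically. In the following Python sketch (the [0, 1] feature range, S = 8 intervals per dimension and the two feature vectors are illustrative assumptions), the Hamming distance between two concatenated EW + LSSC strings exactly reproduces the L1 distance between the interval indices:

```python
def index(v, lo, hi, S):
    """Equal-width interval index of feature value v."""
    return min(max(int((v - lo) / ((hi - lo) / S)), 0), S - 1)

def lssc_word(i, n):
    """LSSC codeword of index i: i ones filled in from the right."""
    return format((1 << i) - 1, "0{}b".format(n))

def bit_string(feats, S=8):
    """EW + LSSC bit string: S - 1 bits per dimension, concatenated."""
    return "".join(lssc_word(index(v, 0.0, 1.0, S), S - 1) for v in feats)

u1, u2 = [0.10, 0.55], [0.35, 0.60]
h = sum(a != b for a, b in zip(bit_string(u1), bit_string(u2)))
l1 = sum(abs(index(a, 0.0, 1.0, 8) - index(b, 0.0, 1.0, 8))
         for a, b in zip(u1, u2))
assert h == l1   # Hamming distance equals the index-space (rescaled L1) distance
```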
3. Performance resemblances
When binary matching is performed, the basic resemblance in (18) can further be exploited to obtain resemblance with other distance metric-based and machine learning-based classification performance. The key idea for such an extension lies in flexibly altering the matching function, or representing each continuous feature element individually with its binary approximation, to obtain near-equivalent classification behaviour in the continuous domain. As such, rather than confining the binary matching method to pure Hamming distance calculation, these extensions significantly broaden the practicality of performing binary matching and enable a strong performance resemblance of a powerful classifier such as a multilayer perceptron (MLP) [21] or an SVM [22] when the bit allocation to each dimension is substantially large. In this section, 'ζ_{ ϕ } ' denotes the matching score of the 'ϕ' dissimilarity/similarity measure.
3.1 Lp Distance metrics
In the case where an Lp distance metric-based classification performance is desired, the resemblance equation in (18) can easily be modified and applied to obtain an approximate performance in the Hamming domain by
provided that the number of bits allocated to each dimension is substantially large, or equivalently, the quantization intervals in each dimension are of great number. As long as \left|{v}_{{i}_{2}^{d},{j}_{2}^{d}}^{d}-{v}_{{i}_{1}^{d},{j}_{1}^{d}}^{d}\right| can be linked to the desired distance computation, (14) can then be modified and applied directly. According to (11), the total difference in distance of (19) is upper bounded by \sqrt[p]{{\sum}_{d=1}^{D}{\left(2{\epsilon}_{{i}_{2}^{d},{j}_{2}^{d}\left(max\right)}^{d}\right)}^{p}}.
Likewise, to achieve a resembled performance of kNN classifier [23] and RBF network [24] that use Euclidean distance (L2) as the distance metric, the RHS of (19) can simply be amended and subsequently adopted for binary matching by setting p = 2.
3.2 Inner product
For the inner product similarity measure, which cannot be directly associated with \left|{v}_{{i}_{2},{j}_{2}}^{d}-{v}_{{i}_{1},{j}_{1}}^{d}\right|, the simplest way to obtain the approximate performance resemblance is to transform each continuous feature value into its binary approximate individually and substitute it into the actual formula. By exploiting results from (3), (8) and (15), we have
leading to an approximate binary representation of the continuous feature value.
Considering the inner product (IP) between two column feature vectors v_{1} and v_{2} as an instance, we represent every continuous feature element in each feature vector with its binary approximate to obtain an approximately equal similarity measure:
The total similarity deviation of (21) turns out to be upper bounded by {\sum}_{d=1}^{D}{\left({\epsilon}_{{i}^{d},{j}^{d}\left(max\right)}^{d}\right)}^{2}.
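The binary-approximate inner product in (20)–(21) can be sketched as follows: each feature is replaced by its interval centre, recovered from the Hamming weight of its LSSC codeword. The feature vectors, the [0, 1] range and S = 16 are illustrative assumptions:

```python
def centre_from_word(word, lo=0.0, hi=1.0):
    """Recover the interval centre from an LSSC codeword: the Hamming weight
    of the word is the interval index."""
    S = len(word) + 1                       # S = n_LSSC + 1 intervals
    i = word.count("1")                     # Hamming weight = interval index
    return lo + (hi - lo) / S * (i + 0.5)

def lssc_word(i, n):
    return format((1 << i) - 1, "0{}b".format(n))

v1, v2 = [0.21, 0.74], [0.33, 0.58]
S = 16
words1 = [lssc_word(min(int(v * S), S - 1), S - 1) for v in v1]
words2 = [lssc_word(min(int(v * S), S - 1), S - 1) for v in v2]

ip_true   = sum(a * b for a, b in zip(v1, v2))
ip_approx = sum(centre_from_word(a) * centre_from_word(b)
                for a, b in zip(words1, words2))
# Each centre lies within (hi - lo)/(2S) of its feature value, so the
# similarity deviation shrinks as S grows.
```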
For another instance, the similarity measure adopted by the SVM [22] in classifying an unknown data point is likewise inner product-based. Let n_{ s } be the number of support vectors, y_{ k } = ±1 be the class label of the kth support vector, v_{ k } be the kth D-dimensional support (column) vector, v be the D-dimensional query (column) vector, {\widehat{\lambda}}_{k} be the optimized Lagrange multiplier of the kth support vector and {\widehat{w}}_{o} be the optimized bias. The performance resemblance of the binary SVM to that of its continuous counterpart follows directly from (21) in such a way that
The expected upper bound of the total difference in similarity of (22) is then quantified by $\max_{y_k} \left( \sum_{k=1}^{n_{y_k}} \sum_{d=1}^{D} y_k \left( \epsilon^{d}_{i_2^{d}, j_2^{d}(\max)} \right)^{2} \right)$, where $y_k = \pm 1$ and $n_{y_k}$ denotes the number of support vectors with class label $y_k$.
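A small sketch of (22), assuming the standard soft-margin SVM decision value and hypothetical helper names: every element of the support and query vectors is replaced by its quantization-interval center before the inner products are taken.

```python
import numpy as np

def quantize_center(x, lo, hi, s):
    """Center of x's equal-width interval (hypothetical helper)."""
    idx = min(max(int((x - lo) / (hi - lo) * s), 0), s - 1)
    return lo + (idx + 0.5) * (hi - lo) / s

def svm_decision(sv, y, lam, bias, v, lo=0.0, hi=1.0, s=None):
    """Evaluate sum_k lam_k * y_k * <v_k, v> + bias; when s is given, every
    element is first replaced by its interval center, as in (22)."""
    if s is not None:
        sv = np.vectorize(lambda x: quantize_center(x, lo, hi, s))(sv)
        v = np.array([quantize_center(x, lo, hi, s) for x in v])
    return float((lam * y) @ (sv @ v) + bias)
```

The decision value computed from the quantized vectors approaches the continuous one as the number of intervals per dimension increases.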
In fact, the individual element transformation illustrated in (20) generalizes to any other inner product-based measure or classifier, such as Pearson correlation [25] and the MLP [21], in order to obtain a performance resemblance when matching is carried out in the Hamming domain.
4. Performance evaluation
4.1 Data sets and experiment settings
To evaluate the discretization performance of the three discretization schemes (EW + DBR, EW + BRGC and EW + LSSC) and to justify the performance resemblances by EW + LSSC in particular, our experiments were conducted based on the following two popular face data sets:
AR
The employed data set is a random subset of the AR face data set [26], which contains a total of 684 images corresponding to 114 identities with 6 images per person. The images were taken under controlled illumination conditions with moderate variations in facial expressions. The images were aligned according to standard landmarks, such as eyes, nose and mouth. Each extracted raw feature vector consists of 56 × 46 grey pixel elements. Histogram equalization was applied to these images before they were processed by the feature extractor.
FERET
The employed data set is a random subset of the FERET face data set [27], in which the images were collected under a semi-controlled environment. It contains a total of 2400 images, with 12 images for each of 200 identities. Proper alignment is applied to the images based on the standard face landmarks. Due to possibly strong variation in hair style, only the face region is extracted for recognition by cropping each raw image to the size of 61 × 73. The images were preprocessed with histogram equalization before feature extraction. Note that the SVM performance resemblance experiments in Figures 3Ib, IIb and 4Ib, IIb only utilize images from the first 75 identities to reduce the computational complexity of our experiments.
For each identity in both data sets, half of the images are randomly selected for training while the remaining half is used for testing. To measure the false acceptance rate (FAR) of the system, each image of every identity is matched against a random image of every other identity within the testing partition (without overlapping selection), while to evaluate the system FRR, each image is matched against every other image of the same identity for every identity within the testing partition. In the following experiments, the equal error rate (EER) (the error rate at which FAR = FRR) is used to compare the classification and discretization performances, since it is a quick and convenient way to compare accuracy. The lower the EER, the better the performance is considered to be, and vice versa.
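The FAR/FRR matching protocol and EER computation described above can be sketched as a simple threshold sweep over distance scores (function name and interpolation choice are ours, not the article's):

```python
import numpy as np

def eer(genuine_scores, impostor_scores):
    """Equal error rate from genuine/impostor distance scores (lower = better
    match): sweep candidate thresholds and return the operating point where
    FAR and FRR are closest, averaging the two rates there."""
    thresholds = np.unique(np.concatenate([genuine_scores, impostor_scores]))
    best_gap, best_rate = 1.0, 1.0
    for t in thresholds:
        frr = np.mean(genuine_scores > t)    # genuine pairs rejected
        far = np.mean(impostor_scores <= t)  # impostor pairs accepted
        if abs(far - frr) < best_gap:
            best_gap, best_rate = abs(far - frr), (far + frr) / 2
    return best_rate
```

When the genuine and impostor score distributions are perfectly separated, this returns 0; overlapping distributions yield the crossover error rate.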
4.2 Performance assessment
The conducted experiments can be categorized into two parts. The first part examines the performance superiority of EW + LSSC over the remaining schemes and justifies the fundamental performance resemblance with the rescaled L1 distance-based classification performance in (18). The second part vindicates the applicability of EW + LSSC discretization in obtaining a resembled performance for different metrics and classifiers, including the L1, L2 and L3 distance metrics, the inner product similarity metric and an SVM classifier, as exhibited in (19) and (21). Note that in this part, features from each dimension have been min-max normalized (by dividing both sides of (19) and (21) by $\left( int^{d}_{S^{d}-1(\max)} - int^{d}_{0(\min)} \right)$) before they are classified/discretized.
Both parts of experiments were carried out based on static bit allocation. To ensure consistency of the results, two different dimensionality reduction techniques (principal component analysis (PCA) [28] and Eigenfeature regularization and extraction (ERE) [29]) with two wellknown face data sets (AR and FERET) were used. The raw dimensions of AR (2576) and FERET (4453) images were both reduced to D = 64 by PCA and ERE in all parts of experiment.
In general, discretization based on static bit allocation assigns n bits equally to each of the D feature dimensions, thereby yielding a Dn-bit binary string to represent every identity upon concatenating the short binary outputs from all individual dimensions. Note that LSSC has a code length different from DBR and BRGC when labelling a given number of intervals. Thus, it is unfair to compare the performance of EW + LSSC with the remaining schemes by equalizing the bit length of the binary strings generated by the different encoding schemes, since the number of dimensions utilized by LSSC-based discretization would be far smaller than that of DBR-based and BRGC-based discretization at common bit lengths.
A better way to compare these discretization schemes is in terms of the entropy L of the final bit string. Denoting the entropy of the dth dimension by $l^{d}$ and the probability of the ith output of the dth dimension by $p^{d}_{i^{d}}$, we have

$L = \sum_{d=1}^{D} l^{d} = -\sum_{d=1}^{D} \sum_{i^{d}=0}^{S^{d}-1} p^{d}_{i^{d}} \log_{2} p^{d}_{i^{d}} \qquad (23)$
Note that due to static bit allocation, S^{d} = S for all d, where S = 2^{n} for BRGC and DBR while S = n_{LSSC} + 1 for LSSC when n (respectively n_{LSSC}) bits are allocated per dimension; substituting these into Equation (23) gives the scheme-specific entropy.
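A small sketch of this entropy-based comparison (helper names are ours): per-dimension Shannon entropy is summed over the D dimensions as in (23), while the per-dimension code length depends on the encoding scheme.

```python
import math

def dimension_entropy(probs):
    """Shannon entropy l^d (in bits) of one dimension's output distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def total_entropy(prob_table):
    """L = sum over d of l^d, the entropy of the final bit string."""
    return sum(dimension_entropy(p) for p in prob_table)

def code_bits(s, scheme):
    """Bits used to label s intervals: n = log2(s) for DBR/BRGC (s = 2^n),
    but n_LSSC = s - 1 for LSSC (s = n_LSSC + 1)."""
    return s - 1 if scheme == "LSSC" else math.ceil(math.log2(s))
```

For S = 8 intervals, LSSC spends 7 bits per dimension where DBR/BRGC spend 3, which is why comparing the schemes at equal bit length would be unfair to LSSC.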
Figure 3 illustrates the EER and ROC performances of equal-width-based discretization and the performance resemblances of EW + LSSC discretization based on the AR face data set. As depicted in Figure 3Ia, IIa for experiments on PCA- and ERE-extracted features, EW + DBR and EW + BRGC discretizations fail to preserve the distances in the index space and therefore deteriorate critically as the number of quantization intervals constructed in each dimension increases or, nearly proportionally, as the entropy L increases. EW + LSSC, on the other hand, achieves not only a definite mapping but also the lowest EER among the discretization schemes, especially at high L, owing to its capability of approximately preserving the rescaled L1 distance-based classification performance.
Another noteworthy observation is that the initially large deviation of the EW + LSSC performance from the rescaled L1 distance-based performance tends to decrease as L increases and then fluctuates only trivially beyond a certain point of L. This can be explained by (6): for each dimension, the difference between each continuous value and the central point of its interval (to which we have chosen to scale the discretization output) is upper-bounded by half the interval width, $\epsilon^{d}_{i^{d}, j^{d}(\max)}$. To augment the entropy L produced by a discretization scheme, the number of intervals/possible outputs of each dimension needs to be increased, which shrinks the interval width. As a result, a greatly reduced upper bound on the overall deviation, $2D\epsilon^{d}_{i^{d}, j^{d}(\max)}$, can eventually be obtained. Therefore, the more intervals constructed per dimension, or in other words the higher the desired overall entropy, the stronger the observed resemblance. Note that the similar observations in Figure 3Ia, IIa illustrate the independence of the resemblances from the feature extraction method employed.
Perhaps the only limitation arising in achieving such performance resemblances is the inevitable derivation of a long binary string per user when high entropy strength is desired by a system. As shown in Figure 3Ia, IIa, a bit string at least four times longer than the entropy is needed to offer 222- and 224-bit entropy, respectively, while a bit string at least six times longer is required to fulfil a 285- and 288-bit system-specific entropy, respectively. Such bit lengths do impose a higher processing load; however, with the current state of technology, processing these binary strings is not expected to pose a critical burden to current systems.
For the performance resemblance experiments on PCA- and ERE-extracted features in Figure 3Ib, IIb, the tendency of the performance resemblance is similar to the previous case: the difference in EER performance between the continuous and Hamming domains is noticeable at low L, and an approximate performance resemblance can be observed when L ≥ 159.2 in Figure 3Ib and L ≥ 161.7 in Figure 3IIb. Therefore, similar explanations apply.
Similar performance trends can be seen in Figure 4, where the FERET data set was used. Note that as L increases, EW + DBR and EW + BRGC continue to deteriorate badly, as shown in Figure 4Ia, IIa; their deficiency of an indefinite discrete-to-binary mapping can again be blamed. In contrast, the performance of EW + LSSC discretization remains the lowest, and it resembles nearly exactly the rescaled L1 distance-based performance when L ≥ 338.77 in Figure 4Ia and L ≥ 276.52 in Figure 4IIa. In Figure 4Ib, IIb, the initial performance deviation between each pair of schemes is slightly lower than that in Figure 3Ib, IIb, although a near-perfect resemblance can similarly be observed at high L.
4.3 Summary and discussions
By and large, our general findings can be summarized in the following three aspects:

- When substantial quantization intervals are constructed, or a large number of bits is allocated to each feature dimension, equal-width (EW) quantization offers an approximate continuous-to-discrete mapping. LSSC outperforms DBR and BRGC in preserving a definite discrete-to-binary mapping behaviour. Overall, adopting equal-width quantization with LSSC as a discretizer yields an approximate end-to-end mapping.

- As far as EW + LSSC is concerned, the distance between two mapped elements in the Hamming domain is fundamentally associated with an approximately rescaled L1 distance between the two continuous counterparts.

- The basic performance resemblance of EW + LSSC discretization to L1 distance-based classification can be extended to Lp distance-based and inner product-based classifications, either by flexibly modifying the matching function or by substituting every continuous feature element individually with its binary approximate, in order to obtain a classification behaviour similar to that in the continuous domain.
We believe that the clarification of the underlying mapping behaviours of EW + LSSC discretization would benefit not only the cryptographic and biometric communities, but also the machine learning and data mining communities (relevant applications include image retrieval [30], image categorization [31] and text categorization [32]). In fact, EW + LSSC discretization can be adopted in any other application that requires transforming continuous data into binary bit strings and involves similarity/dissimilarity matching in the Hamming domain, so as to attain a deterministically resembled performance of the continuous counterpart.
5. Conclusion
Biometric discretization aims to facilitate numerous security applications by deriving stable representative binary strings in practice. Therefore, understanding how discretization influences classification performance is important for warranting optimal classification performance when discretization is performed. In this paper, we have decomposed equal-width discretization into a two-stage mapping process and performed a detailed analysis in the continuous, discrete and Hamming domains in view of the different mapping associations among them. Our analysis shows that equal-width quantization exhibits an approximate continuous-to-discrete mapping when sufficiently many quantization intervals are constructed, while the LSSC encoding scheme offers a definite discrete-to-binary mapping. We have shown that the combination of these quantization and encoding schemes results in a discretization scheme that offers an approximate rescaled L1 distance-based classification performance in the Hamming metric. Further, we have illustrated how this fundamental resemblance can be exploited to obtain other approximate classification performances when binary matching is concerned. These analysis outcomes have been experimentally supported, and the performance resemblances shown are dependent on neither the feature extraction technique (PCA and ERE) nor the data set (AR and FERET).
Abbreviations
BRGC: binary reflected gray code
DBR: direct binary representation
EER: equal error rate
ERE: Eigenfeature regularization and extraction
EW: equal width
FAR: false acceptance rate
IP: inner product
LSSC: linearly separable subcode
MLP: multilayer perceptron
PCA: principal component analysis
SVM: support vector machine
References
Chang Y, Zhang W, Chen T: Biometric-based cryptographic key generation. Proceedings of IEEE International Conference on Multimedia and Expo (ICME 2004), 2004.
Dodis Y, Ostrovsky R, Reyzin L, Smith A: Fuzzy extractors: how to generate strong keys from biometrics and other noisy data. Eurocrypt 2004, LNCS 2004, 3027: 523-540. 10.1007/978-3-540-24676-3_31
Hao F, Chan CW: Private key generation from on-line handwritten signatures. Inf Manage Comp Security 2002, 10(4): 159-164. 10.1108/09685220210436949
Juels A, Wattenberg M: A fuzzy commitment scheme. 6th ACM Conference on Computer and Communication Security (CCS '99) 1999, 28-36.
Kevenaar TAM, Schrijen GJ, van der Veen M, Akkermans AHM, Zuo F: Face recognition with renewable and privacy preserving binary templates. Proceedings of 4th IEEE Workshop on Automatic Identification Advanced Technologies (AutoID '05) 2005, 21-26.
Linnartz JP, Tuyls P: New shielding functions to enhance privacy and prevent misuse of biometric templates. Proceedings of 4th International Conference on Audio and Video Based Person Authentication (AVBPA 2004), LNCS 2003, 2688: 238-250.
Monrose F, Reiter MK, Li Q, Wetzel S: Cryptographic key generation from voice. Proceedings of IEEE Symposium on Security and Privacy (S&P 2001) 2001, 202-213.
Monrose F, Reiter MK, Li Q, Wetzel S: Using voice to generate cryptographic keys. Proceedings of Odyssey 2001, The Speaker Verification Workshop, 2001.
Teoh ABJ, Ngo DCL, Goh A: Personalised cryptographic key generation based on FaceHashing. Comput Security 2004, 23(7): 606-614. 10.1016/j.cose.2004.06.002
Tuyls P, Akkermans AHM, Kevenaar TAM, Schrijen GJ, Bazen AM, Veldhuis NJ: Practical biometric authentication with template protection. Proceedings of 5th International Conference on Audio- and Video-based Biometric Person Authentication, LNCS 2005, 3546: 436-446. 10.1007/11527923_45
Verbitskiy E, Tuyls P, Denteneer D, Linnartz JP: Reliable biometric authentication with privacy protection. 24th Benelux Symposium on Information Theory 2003, 125-132.
Yip WK, Goh A, Ngo DCL, Teoh ABJ: Generation of replaceable cryptographic keys from dynamic handwritten signatures. Proceedings of 1st International Conference on Biometrics, Lecture Notes in Computer Science 2006, 3832: 509-515.
Teoh ABJ, Yip WK, Toh KA: Cancellable biometrics and user-dependent multi-state discretization in BioHash. Pattern Anal Appl 2009, 13(3): 301-307.
Chen C, Veldhuis R, Kevenaar T, Akkermans A: Multi-bits biometric string generation based on the likelihood ratio. Proceedings of IEEE International Conference on Multimedia and Expo (ICME 2004) 2004, 3: 2203-2206.
Chen C, Veldhuis R, Kevenaar T, Akkermans A: Biometric quantization through detection rate optimized bit allocation. EURASIP J Adv Signal Process 2009, 16.
Chen C, Veldhuis R: Extracting biometric binary strings with minimal area under the FRR curve for the Hamming distance classifier. Signal Process 2011, 91: 906-918. 10.1016/j.sigpro.2010.09.008
Gray F: Pulse code communication. U.S. Patent 2,632,058, 1953.
Galbally J, Fierrez J, Ortega-Garcia J, McCool C, Marcel S: Hill-climbing attack to an Eigenface-based face verification system. 1st IEEE International Conference on Biometrics, Identity and Security (BIdS) 2009, 1-6.
Kumar A, Zhang D: Hand geometry recognition using entropy-based discretization. IEEE Trans Inf Forens Security 2007, 2(2): 181-187.
Lim MH, Teoh ABJ: Linearly separable subcode: a novel output label with high separability for biometric discretization. Proceedings of 5th IEEE Conference on Industrial Electronics and Applications (ICIEA '10) 2010.
Haykin S: Neural Networks: A Comprehensive Foundation. 2nd edition. Prentice Hall, New York; 1998.
Cortes C, Vapnik V: Support-vector networks. Mach Learn 1995, 20(3): 273-297.
Cover TM, Hart PE: Nearest neighbor pattern classification. IEEE Trans Inform Theory 1967, 13(1): 21-27.
Buhmann MD: Radial Basis Functions: Theory and Implementations. Cambridge University Press, Cambridge, United Kingdom; 2003.
Pearson K: Notes on the history of correlation. Biometrika 1920, 13(1): 25-45.
Martinez AM, Benavente R: The AR Face Database. CVC Technical Report #24, 1998.
Phillips PJ, Moon H, Rauss PJ, Rizvi S: The FERET evaluation methodology for face-recognition algorithms. IEEE Trans Pattern Anal Mach Intell 2000, 22(10).
Turk M, Pentland A: Eigenfaces for recognition. J Cognit Neurosci 1991, 3(1): 71-86. 10.1162/jocn.1991.3.1.71
Jiang XD, Mandal B, Kot A: Eigenfeature regularization and extraction in face recognition. IEEE Trans Pattern Anal Mach Intell 2008, 30(3): 383-394.
Datta R, Joshi D, Li J, Wang JZ: Image retrieval: ideas, influences, and trends of the new age. ACM Comput Surveys 2008, 40(2): 1-60.
Chen Y, Wang JZ: Image categorization by learning and reasoning with regions. J Mach Learn Res 2004, 5: 913-939.
Sebastiani F: Machine learning in automated text categorization. ACM Comput Surveys 2002, 34(1): 1-47. 10.1145/505282.505283
Acknowledgements
This work was supported by the MKE (The Ministry of Knowledge Economy), Korea, under IT/SW Creative research program supervised by the NIPA (National IT Industry Promotion Agency) (NIPA2010(C181010020016)).
Additional information
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Lim, M.-H., Teoh, A.B.J. & Toh, K.-A. An analysis on equal width quantization and linearly separable subcode encoding-based discretization and its performance resemblances. EURASIP J. Adv. Signal Process. 2011, 82 (2011). https://doi.org/10.1186/1687-6180-2011-82
DOI: https://doi.org/10.1186/1687-6180-2011-82
Keywords
 Classification Performance
 Encode Scheme
 Discretization Scheme
 Binary String
 Equal Error Rate