Robustness against degradation factors is essential for reliable verification. A typical source of error in iris recognition systems is a lack of similarity between two iris patterns belonging to the same individual. This mainly stems from texture deformation, occluded regions, and degradation factors such as motion blur and defocus. The more a method relies on texture details, the more prone it is to verification failure. Generally, existing methods dealing with NIR iris images tend to capture sharp variations of the texture and detailed information about the muscular structure, such as the position and orientation of fibers. However, no high-frequency information can be obtained from blurred and unfocused iris images. Such dramatic performance degradation can be observed in the experiments conducted in [36].
The goal of our feature extraction strategy is to minimize the adverse effects of these degradations by extracting texture information that is minimally affected by the noise factors. To do this, we combine global variations of the texture along the angular direction with local but soft variations. The global variations can reduce the adverse effects of local noisy regions, while the local variations make it possible to extract essential texture information from blurred and unfocused images. To take advantage of both feature sets, we adopt an SVM-based fusion rule on the outputs of the matching module. Figure 2 depicts an algorithmic overview of the proposed method.
In the following, we explain the proposed local and global variations in detail, including the parameters obtained from the training sets and the lengths of the final binary feature vectors. The values reported as optimal parameters are identical for both NIR and VL images; however, the reported code lengths for the local and global feature vectors apply only to the VL images. These lengths depend on the image size, and since the NIR images are twice the size of the VL images in the angular direction, the corresponding values for NIR images are twice those stated for VL images.
3.1. Global Variations
Due to the different textural behavior of the pupillary and ciliary zones, and also to reduce the negative effects of local noisy regions, the image is divided into two distinct parts by the green dashed line depicted in Figure 3. The following strategy is performed on each part, and the resulting codes are concatenated to form the final global feature vector.
On each column, a 10-pixel-wide window is placed, and the average of the intensity values within the window is computed. Repeating this process for all columns yields a 1D signature that reflects the global intensity variation of the texture along the angular direction. The signature includes some high-frequency fluctuations, probably created by noise; another possible source is high contrast and quality of the texture in the corresponding regions. Even in the best case, the high-frequency components of the signature are not reliable. Since the purpose is to robustly reveal the similarity of two iris patterns, and given that these fluctuations are sensitive to image quality, the signature is smoothed to achieve a more reliable representation. To smooth the signature, a 20-sample moving average filter is applied. Although more reliable for comparison, the smoothed signatures lose a considerable amount of information. To compensate for the missing information, we adopt a method that locally, and in a redundant manner, extracts salient features of the signature: a 1D DCT is performed on overlapped segments of the signature. To that end, the signature is divided into segments 20 samples in length, each sharing 10 samples with its adjacent segments. On each segment, a 1D DCT is performed and a subset of the coefficients is selected. Because of the smooth behavior of the smoothed signature, the essential information is roughly summarized in the first five DCT coefficients. The first coefficient of each segment is then put into a sequence; performing the same task for the other four coefficients results in five sequences of numbers that can be regarded as five 1D signals. Thus, instead of the original signature, five informative 25-sample signals are obtained, and the smoothed signature is compressed to half its original length.
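The steps above (column averaging, moving-average smoothing, overlapped-segment DCT) can be sketched as follows. The region is a 2D intensity array; treating the 10-pixel window as a plain column average is our simplification, and the unnormalized DCT-II is used since only the low-order coefficients are kept:

```python
import math

def dct_1d(seg):
    """Plain (unnormalized) DCT-II of a short segment; only low-order
    coefficients are kept downstream, so scaling is immaterial."""
    n = len(seg)
    return [sum(seg[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i in range(n)) for k in range(n)]

def global_signature(region, smooth=20, seg_len=20, step=10, n_coeffs=5):
    """Column-average signature -> 20-sample moving-average smoothing ->
    1D DCT on 20-sample segments with 10-sample overlap, keeping the
    first n_coeffs coefficients of each segment."""
    rows, cols = len(region), len(region[0])
    # 1D signature: mean intensity of each column of the region
    sig = [sum(region[r][c] for r in range(rows)) / rows for c in range(cols)]
    # moving-average smoothing (window shrinks at the borders)
    sm = []
    for i in range(cols):
        win = sig[max(0, i - smooth // 2):i + smooth // 2]
        sm.append(sum(win) / len(win))
    # overlapped segments and per-segment DCT coefficient sequences
    segments = [sm[s:s + seg_len] for s in range(0, cols - seg_len + 1, step)]
    return [[dct_1d(seg)[k] for seg in segments] for k in range(n_coeffs)]
```

With a 260-column region, the 20-sample segments at a 10-sample step yield the 25 samples per coefficient signal stated above; the actual angular size of each region is an assumption here.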
To encode the obtained signals, we apply two different coding strategies in accordance with the characteristics of the selected coefficients. The 1D signal generated from the first DCT coefficient contains positive values representing the average value of each segment. Therefore, a coding strategy based on the first derivative of this signal is performed; that is, positive and negative derivatives are substituted with one and zero, respectively. Since the remaining four generated signals vary around zero, a zero-crossing detector is adopted to encode them. Finally, corresponding to each part of the iris, a binary code containing 125 bits is generated. Concatenating the two codes leads to a 250-bit global binary vector. Figure 3 illustrates how the binary vector pertaining to the global variations of the lower region is created.
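The two coding strategies can be sketched as below. A circular first difference is assumed for the derivative code so that each of the five 25-sample signals contributes 25 bits (5 × 25 = 125 per part); the paper does not spell out its boundary handling, so this is our reading:

```python
def encode_global(coeff_signals):
    """Binarize the five per-coefficient signals: sign of the first
    derivative for the DC-coefficient signal, plain sign (zero-crossing)
    for the other four. Circular differencing (an assumption) keeps the
    derivative code at the full 25-bit length."""
    first = coeff_signals[0]
    nxt = first[1:] + first[:1]          # circular shift for the difference
    bits = [1 if b - a > 0 else 0 for a, b in zip(first, nxt)]
    for sig in coeff_signals[1:]:
        bits += [1 if v > 0 else 0 for v in sig]   # zero-crossing coding
    return bits
```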
3.2. Local Variations
The proposed method for encoding the local variations is founded on the idea of intensity signals suggested by Ma et al. [25, 26], and the goal is to extract soft variations that are robust against the degradation factors. To that end, we exploit the energy compaction property of the DCT and the multiresolution property of wavelet decomposition to capture the soft changes of the intensity signals. To generate the intensity signals, we divide the normalized iris into overlapping horizontal patches, as depicted in Figure 4. Each patch is then averaged along the radial direction, which results in a 1D intensity signal. We use patches 10 pixels in height with five overlapping rows; thus, 24 intensity signals are obtained.
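The patch averaging can be sketched as follows; a 125-row normalized image is assumed purely so that 10-row patches with a 5-row step yield the 24 signals stated above:

```python
def intensity_signals(norm_iris, patch_h=10, overlap=5):
    """One 1D intensity signal per overlapping horizontal patch:
    each 10-row patch (sharing 5 rows with its neighbor) is averaged
    along the radial (row) direction."""
    rows, cols = len(norm_iris), len(norm_iris[0])
    step = patch_h - overlap
    signals = []
    for top in range(0, rows - patch_h + 1, step):
        signals.append([sum(norm_iris[top + r][c] for r in range(patch_h)) / patch_h
                        for c in range(cols)])
    return signals
```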
When using wavelet decomposition, the key point is to ascertain which subband best matches the smooth behavior of the intensity signals. For this purpose, reconstructions of the intensity signals based on different subbands were visually examined. As confirmed by our experiments, the approximation coefficients at the third level of decomposition can efficiently capture the low-frequency variations of the intensity signals. To encode the coefficients, a zero-crossing representation is used, and a binary vector containing 32 bits is obtained. Applying the same strategy to all 24 intensity signals, a 768-bit binary vector is achieved.
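A minimal sketch of this step is given below, with two loudly flagged assumptions: the paper does not name its mother wavelet, so Haar stands in for it, and the coefficients are mean-removed before the sign code, since raw approximation coefficients of an intensity signal are all positive and a plain sign code would be degenerate:

```python
import math

def haar_approx(signal, levels=3):
    """Approximation coefficients after `levels` Haar decompositions:
    each level keeps the scaled pairwise sums (the low-pass half).
    Haar is a stand-in; the paper's wavelet is unspecified."""
    a = list(signal)
    for _ in range(levels):
        a = [(a[i] + a[i + 1]) / math.sqrt(2) for i in range(0, len(a) - 1, 2)]
    return a

def encode_zero_crossing(coeffs):
    """Sign code of the mean-removed coefficients (mean removal is our
    assumption, see the lead-in)."""
    m = sum(coeffs) / len(coeffs)
    return [1 if c - m > 0 else 0 for c in coeffs]
```

With a 256-sample angular signal, three decomposition levels leave 32 approximation coefficients, matching the 32-bit code per signal.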
In the second approach, the goal is to summarize the information content of the soft variations in a few DCT coefficients. To that end, we smooth each intensity signal with a moving average filter and divide it into nonoverlapping 10-sample segments. After performing a 1D DCT on each segment, the first two DCT coefficients are selected. Concatenating the DCT coefficients obtained from consecutive segments results in two 1D signals, each containing 25 samples. To obtain a binary representation, zero-crossing of the signals' first derivative is applied. This algorithm produces a 1200-bit binary vector for a given iris pattern. The final 1968-bit local binary vector is produced by concatenating the vectors obtained from the two approaches above.
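The DCT-based encoding can be sketched per intensity signal as follows; as before, the unnormalized DCT-II and a circular first difference (keeping 25 bits per coefficient sequence, hence 50 bits per signal and 24 × 50 = 1200 in total) are our assumptions:

```python
import math

def dct_1d(seg):
    """Plain (unnormalized) DCT-II of a short segment."""
    n = len(seg)
    return [sum(seg[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i in range(n)) for k in range(n)]

def encode_local_dct(signal, seg_len=10, smooth=10):
    """Smooth, cut into nonoverlapping 10-sample segments, keep the
    first two DCT coefficients per segment, and binarize each
    coefficient sequence by the sign of its circular first difference."""
    sm = []
    for i in range(len(signal)):
        win = signal[max(0, i - smooth // 2):i + smooth // 2]
        sm.append(sum(win) / len(win))
    segments = [sm[s:s + seg_len] for s in range(0, len(sm) - seg_len + 1, seg_len)]
    bits = []
    for k in (0, 1):
        series = [dct_1d(seg)[k] for seg in segments]
        nxt = series[1:] + series[:1]
        bits += [1 if b - a > 0 else 0 for a, b in zip(series, nxt)]
    return bits
```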
3.3. Matching
To compare two iris images, we use the nearest neighbor approach as the classifier and the Hamming distance as the similarity measure. To compensate for eye rotation during the acquisition process, we store eight additional local and global binary feature vectors, obtained by horizontally shifting the normalized images by 3, 6, 9, and 12 pixels to either side. During verification, the local binary feature vector of a test iris image is compared against the nine vectors of the stored template, and the minimum distance is chosen. The same procedure is repeated for all training samples, and the minimum result is selected as the matching Hamming distance based on the local feature vector. A similar approach yields the matching Hamming distance based on the global feature vector. To decide the identity of the test iris image, the fusion rule explained below is applied to the computed matching distances to obtain the final similarity.
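The rotation-compensated matching reduces to a minimum over precomputed codes, as sketched below (the shifted codes are assumed to have been extracted from the shifted normalized images at enrollment):

```python
def hamming(a, b):
    """Normalized Hamming distance between equal-length bit vectors."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def match_distance(test_code, template_codes):
    """Minimum distance over a stored template and its eight shifted
    variants (codes precomputed from the +/-3, 6, 9, and 12 pixel
    shifts of the normalized image), i.e. rotation compensation."""
    return min(hamming(test_code, c) for c in template_codes)
```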
3.4. Fusion Strategy
The SVM provides a powerful tool for many pattern recognition problems in which the observations lie in a high-dimensional feature space. One of the main advantages of the SVM is that it provides an upper bound on the generalization error based on the number of support vectors in the training set. Although traditionally used for classification, the SVM has recently been adopted as a strong score fusion method. For instance, it has been successfully applied to iris recognition (e.g., [12, 14]), giving rise to better performance than statistical fusion rules or kernel-based match score fusion methods. Besides, the SVM classifier has advantages over Artificial Neural Networks (ANNs) and often outperforms them. In contrast to ANNs, which suffer from multiple local minima, SVM training always finds a global minimum. While ANNs are prone to overfitting, an SVM classifier provides a soft decision boundary and hence superior generalization capability. Above all, an SVM classifier is insensitive to the relative numbers of training examples in the positive and negative classes, which plays a critical role in our classification problem. Accordingly, to take advantage of both the local and global features derived from the iris texture, the SVM is employed to fuse the dissimilarity values. In the following, we briefly explain how the SVM serves as a fusion rule.
The output of the matching module, the two Hamming distances, represents a point in a 2D distance space. To compute the final matching distance, genuine and impostor classes must first be defined on the training set. The pairs of Hamming distances computed between every two iris images of the same individual constitute the points of the genuine class, while the impostor class comprises the pairs of Hamming distances expressing the dissimilarity between every two iris images of different individuals. Here, ascertaining the fusion strategy means mapping all points in the distance space into a 1D space in which the points of the two classes attain maximum separability. For this purpose, the SVM is adopted to determine the separating boundary between the genuine and impostor classes. Using different kernels makes it possible to define linear and nonlinear boundaries and, consequently, a variety of linear and nonlinear fusion rules. The position and distance of a new test point relative to the decision boundary determine the sign and the absolute value of the fused distance, respectively.
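For a linear kernel, the mapping from the 2D distance space to the fused 1D score reduces to the signed distance of the point from the boundary, sketched below. The weights and bias would come from SVM training on genuine/impostor pairs; the values in the test are illustrative, not learned, and a nonlinear kernel would replace the dot product with kernel evaluations against the support vectors:

```python
import math

def svm_fuse(point, w, b):
    """Fused distance for a (local, global) Hamming-distance pair:
    signed distance from a linear boundary w.x + b = 0.  The sign
    indicates the side of the boundary (genuine vs. impostor); the
    magnitude, how far the point lies from it."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, point)) + b) / norm
```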