Facial Recognition in Uncontrolled Conditions for Information Security
© Q. Xiao and X.-D. Yang. 2010
Received: 1 December 2009
Accepted: 3 February 2010
Published: 8 April 2010
With the increasing use of computers, information security is becoming an important issue for private companies and government organizations. Various security technologies have been developed, such as authentication, authorization, and auditing. However, once a user logs on, it is assumed that the system remains under the control of the same person. To address this flaw, we developed a demonstration system that uses facial recognition technology to periodically verify the identity of the user. If the authenticated user's face disappears, the system automatically performs a log-off or screen-lock operation. This paper presents our further efforts in developing image preprocessing algorithms and dealing with angled facial images. The objective is to improve the accuracy of facial recognition under uncontrolled conditions. To allow comparison with other published results, the frontal pose subset of the Face Recognition Technology (FERET) database was used for testing. The experiments showed that the proposed algorithms provided promising results.
1. Introduction
With the growing need to exchange information and share resources, information security has become more important than ever in both the public and private sectors. Although many technologies have been developed to control access to files or resources, to enforce security policies, and to audit network usage, no existing technology verifies that the person using the system is the same person who logged in. In view of the heightened security requirements of military organizations for exchanging information, Defence Research and Development Canada (DRDC) Ottawa started a research project in 2004 to develop a demonstration system that automatically logs the user off the computer or locks the screen when the authenticated user cannot be identified from images of the person sitting in front of the computer. Facial recognition technology has been adopted to monitor the presence of the authenticated user throughout a session. Consequently, only the legitimate user can operate the computer, and unauthorized entities have less chance to hijack the session. The objective is to enhance the level of system security by periodically checking the user's identity without disrupting the user's activities.
Various biometric technologies, which measure human physiological or behavioural characteristics, have been proposed for user authentication [2–4]. Physiological biometric traits, such as fingerprints, hand geometry, retina, iris, and facial images, are collected from direct measurements of the human body, while behavioural biometric characteristics, such as signature, keystroke rhythms, gait pattern, and voice recordings, are associated with a specific set of actions of a person. Based on the level of user involvement when capturing the biometric traits, biometrics can be further classified as either active or passive. Passive biometrics do not require the user to actively submit a measurement, while active biometrics need cooperation from the user. For approaches that enable continuous verification of identity without interrupting the user's activity, passive biometric technologies, such as keystroke analysis and facial recognition, have shown great potential [5–8]. However, the user alternates between the mouse and the keyboard, which makes monitoring with keystroke rhythms difficult. Recently, some researchers have investigated the possibility of using multiple biometric modalities to continuously authenticate the user [9, 10]. It has been demonstrated that a multimodal biometric system provides a higher level of authentication assurance, but needs more computational resources than a unimodal biometric system. Therefore, we developed a video-based facial presence monitoring demonstration system, which acquires images from a video camera and runs on a Windows-based computer in near-real time.
Experiments have been carried out in which users were allowed to perform different tasks, such as answering a phone call or drinking a soda, while moving freely within their normal working space in front of a camera. A major challenge is that facial images are taken under uncontrolled conditions, with changes in illumination, pose, facial expression, and so forth. It has been claimed that "In such uncontrolled conditions, all current commercial and academic face recognition systems fail". This motivated us to conduct further research on new algorithms to improve recognition accuracy.
The rest of the paper is organized as follows. Section 2 reviews the video-based facial recognition technologies. Section 3 briefly introduces the research background and previous work. Section 4 presents the image preprocessing algorithms. Section 5 deals with multiangle facial image analysis. Section 6 presents performance evaluation and experimental results, and the conclusion and future work are discussed in Section 7.
2. Video-Based Facial Recognition
Video-based facial recognition is a promising technology that allows covert and unobtrusive monitoring of individuals. Generally, video sequences are a collection of sequential static frames, thereby allowing the use of still-image-based techniques. However, in video-based techniques, one can utilize the temporal continuity of the image sequences to enhance the robustness of the recognition process. Chellappa and Zhou proposed a system that uses static images as the training data and video sequences as the probe data. They used a state space model to fuse the temporal information contained in the video sequences by tracking the subject identity using kinematics. A computationally efficient sequential importance sampling (SIS) algorithm was developed to estimate the posterior distribution. By propagating the joint posterior distribution of the motion (denoted by θ_t) and the subject identity n at each time instant t, a marginalization procedure yielded a robust estimate of identity. Evaluation was performed on two databases containing 12 and 30 subjects. The training data consisted of a single frontal face image of each subject, and the probe data were videos of each of the subjects walking straight towards the camera. The first database contained images with no variation in illumination or pose, while the second, larger database contained images with large illumination variation and slight variations in pose. With the first database, the proposed system achieved a recognition rate of 100%; interestingly, the posterior probability of identity increased with time, whereas the conditional entropy decreased. Using the second database, with large fluctuations of illumination, the system produced an average classification rate of 90.75%.
Chen et al. used the spatio-temporal nature of video sequences to model the motion characteristics of individual faces. For a given subject, they extracted motion flow fields from the video sequences using wavelet transforms. The high-dimensional vectors encoding these flow fields were reduced in size by applying Principal Component Analysis (PCA) followed by Linear Discriminant Analysis (LDA). Recognition was performed using a nearest neighbour classifier. The training data were collected by recording 28 subjects pronouncing two words in Mandarin. For each subject, nine video sequences were captured under different poses. For the testing data, they used the same sequences but applied an artificial light source of varying intensity, as the goal of their evaluation was to measure robustness to illumination variations. Face alignment was performed by cropping the faces below the positions of the eyes, which were indicated manually. This method was evaluated against the Fisherface algorithm and exhibited much more stable performance across a wide range of illumination. It also achieved a correct classification rate of 70% versus 20% for Fisherface on the equivalent test data.
Liu and Chen applied adaptive Hidden Markov Models (HMMs) for pose-varying video-based face recognition. All face images were reduced to low-dimensional feature vectors by using PCA. In the training process, an HMM was generated to learn both the statistics of the video sequences and the temporal dynamics of each subject. During the recognition stage, the temporal characteristics of the probe face sequence were analyzed over time by the HMM corresponding to each subject. The likelihood scores provided by the HMMs were compared, and the identity of a face in the video sequence was taken to be the subject whose HMM gave the maximum likelihood score. Face regions were manually extracted from the images. The test database included 21 subjects. Two sequences were collected for each subject: one contained 322 frames for training and the other had around 400 frames for testing. The recognition performance of the proposed algorithm was compared with an individual PCA method: the error rate was 4.0% for the HMM versus 9.9% for PCA.
3. Previous Work and System Overview
Traditionally, the authentication process verifies the identity of a user only once, at login or sign-on. Afterward, the assumption is that the system remains under the control of the same authenticated user. This authentication mechanism is fairly secure for one-time applications, such as accessing a protected file or withdrawing money from an automatic banking machine. However, there is a security threat if an unauthorized user takes over the session after the legitimate user has successfully logged in. A facial presence monitoring system was developed in our previous work to verify the user's identity throughout the entire session.
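The session-monitoring behaviour described in this paper (lock the screen or log off once the authenticated user's face has been absent for an adjustable time interval) can be sketched as a small state machine. This is only an illustrative sketch: the class name `PresenceMonitor`, the parameters `timeout_seconds` and `min_score`, and the returned strings are hypothetical and are not taken from the demonstration system.

```python
from typing import Optional


class PresenceMonitor:
    """Tracks when the authenticated user was last verified (hypothetical sketch)."""

    def __init__(self, timeout_seconds: float, min_score: float,
                 start_time: float = 0.0) -> None:
        self.timeout_seconds = timeout_seconds  # adjustable absence interval
        self.min_score = min_score              # assumed acceptance threshold
        self.last_verified = start_time         # the user has just logged in

    def update(self, now: float, best_score: Optional[float]) -> str:
        """Process one frame; best_score is None when no face was detected."""
        if best_score is not None and best_score >= self.min_score:
            # The authenticated user was recognized in this frame.
            self.last_verified = now
            return "active"
        if now - self.last_verified >= self.timeout_seconds:
            # The user has been absent (or unrecognized) for too long.
            return "lock"
        return "active"


monitor = PresenceMonitor(timeout_seconds=10.0, min_score=0.6)
states = [monitor.update(t, s) for t, s in [(1.0, 0.9), (5.0, None), (11.0, 0.2)]]
```

Using the time of the last successful verification, rather than counting failed frames, keeps the behaviour independent of the frame rate.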
The system captures video images and processes either 24-bit or 32-bit colour images. The captured images are displayed on the computer screen via DirectX in real time.
(2) Face Detection
A face detection algorithm subsequently examines the captured images. The locations of potential human faces are recorded. Later, when a video image is finally rendered to the monitor, a red rectangle encloses each potential face. Because of the near-real time requirement, this algorithm cannot be expected to perform flawlessly. Therefore, it is anticipated that some detected objects are not actually faces. In order to obtain an accurate result, the system examines more than one image frame to determine if an object is likely a face or not, which is one of the advantages of using video-based facial recognition. When an object presumed to be a face is discovered, the corresponding region in the previous frame is examined. If there was no face found in that area in the previous frame, then the current object is unlikely to be a face. As a result, the object will not be recorded as a potential face. Conversely, if there had been a face present in that region, the likelihood that the current object represents a face is greater. In such a situation, the system assumes that the object is a face.
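The frame-to-frame check described above can be sketched as follows. The sketch assumes that detections are axis-aligned boxes, that "the corresponding region" means a sufficiently large overlap with a detection from the previous frame, and that an overlap of 0.5 is a reasonable cut-off; the function names and that threshold are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def overlap_ratio(a: Box, b: Box) -> float:
    """Intersection area divided by the area of the smaller box."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = min(ax + aw, bx + bw) - max(ax, bx)
    ih = min(ay + ah, by + bh) - max(ay, by)
    if iw <= 0 or ih <= 0:
        return 0.0
    return (iw * ih) / min(aw * ah, bw * bh)


def confirm_faces(current: List[Box], previous: List[Box],
                  min_overlap: float = 0.5) -> List[Box]:
    """Keep only current detections that overlap a detection in the previous frame."""
    return [c for c in current
            if any(overlap_ratio(c, p) >= min_overlap for p in previous)]


previous_frame = [(100, 100, 50, 50)]
current_frame = [(105, 102, 50, 50), (300, 40, 30, 30)]
confirmed = confirm_faces(current_frame, previous_frame)
```

Here the second detection has no counterpart in the previous frame, so it is treated as a probable false detection and is not recorded.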
(3) Face Segmentation
Because the position of a face computed by the face detection module is not accurate, a more precise location is necessary for a good face-matching performance. Since the size of a user's face appearing in a video frame also varies depending on the distance of the user from the web camera, the face image must be normalized to a standard size. There are some features in face images that may change from time to time. For example, the hairstyle can change significantly from one day to another. In order to reduce the effects of such dynamic features, a standard elliptical region with a fixed aspect ratio is used to extract the face region.
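A minimal sketch of the two normalization steps mentioned above, scaling to a standard size and keeping only an elliptical region, is given below. It assumes a greyscale image stored as a list of rows, nearest-neighbour resampling, and an ellipse inscribed in the normalized image; the paper does not specify the interpolation method or the ellipse parameters, so these choices and all function names are illustrative.

```python
from typing import List

Image = List[List[int]]


def resize_nearest(image: Image, new_w: int, new_h: int) -> Image:
    """Scale an image to a standard size with nearest-neighbour sampling."""
    old_h, old_w = len(image), len(image[0])
    return [[image[y * old_h // new_h][x * old_w // new_w]
             for x in range(new_w)] for y in range(new_h)]


def elliptical_mask(width: int, height: int) -> List[List[bool]]:
    """True for pixels inside the ellipse inscribed in a width x height image."""
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    rx, ry = width / 2.0, height / 2.0
    return [[((x - cx) / rx) ** 2 + ((y - cy) / ry) ** 2 <= 1.0
             for x in range(width)] for y in range(height)]


def apply_mask(image: Image, mask: List[List[bool]], fill: int = 0) -> Image:
    """Replace pixels outside the mask (hair, background) with a fill value."""
    return [[p if m else fill for p, m in zip(row, mask_row)]
            for row, mask_row in zip(image, mask)]


face = resize_nearest([[10, 20], [30, 40]], 8, 10)   # toy 2x2 "face"
segmented = apply_mask(face, elliptical_mask(8, 10))
```

Because the output size is fixed, the aspect ratio of the ellipse is fixed as well, which matches the fixed-aspect-ratio region described in the text.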
(4) Face Matching
Turk and Pentland pioneered the eigenface method, which relies on the Karhunen-Loeve (KL) transform, also known as Principal Component Analysis (PCA). To improve the performance of the eigenface method, it is important to have a good alignment between the live and the stored face images. This means that the nose has to be in the middle, the eyes have to be at a stable vertical position, and the scale of the face images must be normalized. A significant portion of our efforts addressed these issues. An elliptical facial region extracted from a video frame is matched against the facial models stored in a database. Each face image is first converted to a vector. This vector is projected onto the eigenfaces through inner product calculations, so that each face produces a weight vector. The Euclidean distance between two weight vectors is used to measure the similarity between the two faces. This distance is then mapped to a normalized matching score.
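The matching steps just described (projection by inner products, Euclidean distance between weight vectors, mapping to a normalized score) can be sketched as below. The sketch assumes that the eigenfaces and mean face have already been computed, that the mean face is subtracted before projection as in the standard eigenface formulation, and that the distance is mapped to a score with 1 / (1 + d / scale). The paper does not state its mapping, so that formula, the `scale` parameter, and the function names are assumptions.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def project(face: Vector, mean_face: Vector, eigenfaces: List[Vector]) -> Vector:
    """Weight vector: inner product of the centred face with each eigenface."""
    centred = [p - m for p, m in zip(face, mean_face)]
    return [sum(c * e for c, e in zip(centred, ef)) for ef in eigenfaces]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def matching_score(distance: float, scale: float = 1.0) -> float:
    """Assumed mapping from a distance in [0, inf) to a score in (0, 1]."""
    return 1.0 / (1.0 + distance / scale)


def best_match(probe: Vector, gallery: Dict[str, Vector], mean_face: Vector,
               eigenfaces: List[Vector], scale: float = 1.0) -> Tuple[float, str]:
    """Return (score, name) of the stored face closest to the probe."""
    w = project(probe, mean_face, eigenfaces)
    scored = [(matching_score(euclidean(w, project(f, mean_face, eigenfaces)), scale), name)
              for name, f in gallery.items()]
    return max(scored)


# Toy data: 4-pixel "images" and two orthonormal eigenfaces.
mean = [0.0, 0.0, 0.0, 0.0]
eigs = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
stored = {"alice": [1.0, 2.0, 0.0, 0.0], "bob": [5.0, 5.0, 0.0, 0.0]}
result = best_match([1.0, 2.0, 9.0, 9.0], stored, mean, eigs)
```

In the toy data the probe differs from the stored "alice" image only in pixels that the two eigenfaces ignore, so the weight vectors coincide and the distance is zero.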
This module provides a histogram-based intensity mapping function to normalize the intensity distribution of the segmented face image. It is noted that some areas, such as the eyes, can be very dark due to light direction. It is potentially beneficial to enhance the features in dark regions to improve the recognition performance.
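One common histogram-based intensity mapping is histogram equalization, sketched below for a flat list of 8-bit pixel values. This is offered as an example of the general technique; the paper does not say which mapping function the module actually uses, and the function name `equalize` is hypothetical.

```python
from typing import List


def equalize(pixels: List[int], levels: int = 256) -> List[int]:
    """Histogram equalization: spread intensities over the full range."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Cumulative distribution of the intensity histogram.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    total = len(pixels)
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        return list(pixels)  # a constant image cannot be stretched

    # Look-up table mapping each input level to an output level.
    lut = [max(0, round((c - cdf_min) * (levels - 1) / (total - cdf_min)))
           for c in cdf]
    return [lut[p] for p in pixels]


dark_region = [10, 10, 20, 30]       # toy values clustered at the dark end
enhanced = equalize(dark_region)
```

After the mapping, the darkest input level goes to 0 and the brightest to 255, which increases contrast in dark regions such as the eyes.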
(6) Facial Database
It is assumed that data from up to eight users may be saved in the database. Each user is required to take at least one picture in the user's normal working environment within the normal sitting space and under the normal lighting conditions.
Live video: the detected face in the scene is surrounded by a red rectangle.
Matching results: the segmented face image from the current test scene is displayed, along with up to five candidate faces from the database in descending priority order.
Performance Data: several performance data are displayed in real time, such as the overall frame rate, the face detection time, the face recognition time, and the best matching score.
4. Image Preprocessing
A study of image preprocessing algorithms has been carried out to improve recognition accuracy. It focused on the steps that affect the accuracy of facial recognition, namely geometric correction, face alignment, masking, and photometric normalization.
4.1. Face and Eye Detection
4.2. Face Alignment and Masking
4.3. Photometric Normalization
To convolve an image with a separable filter kernel, each row in the image is first convolved with the horizontal projection of the filter to obtain an intermediate image. Next, each column of the intermediate image is convolved with the vertical projection of the filter. The resulting image is identical to the direct convolution of the original image with the filter kernel. Direct convolution of an N × N image with an M × M filter kernel requires time proportional to N²M². In comparison, convolution in the separable fashion requires time proportional only to N²M. The processing speed is therefore improved, which helps to achieve real-time operation.
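The equivalence between direct and separable convolution can be checked with a short sketch. It assumes "valid" convolution (no border padding) and a kernel that is the outer product of a vertical vector `v` and a horizontal vector `h`; the function names and the toy image are illustrative only.

```python
from typing import List

Image = List[List[int]]


def convolve2d_valid(image: Image, kernel: Image) -> Image:
    """Direct 2-D convolution (kernel flipped), output restricted to the valid region."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[y + i][x + j] * kernel[kh - 1 - i][kw - 1 - j]
                 for i in range(kh) for j in range(kw))
             for x in range(ow)] for y in range(oh)]


def convolve_rows(image: Image, h: List[int]) -> Image:
    """Convolve every row with the horizontal projection h."""
    n = len(h)
    return [[sum(row[x + j] * h[n - 1 - j] for j in range(n))
             for x in range(len(row) - n + 1)] for row in image]


def convolve_cols(image: Image, v: List[int]) -> Image:
    """Convolve every column with the vertical projection v."""
    n = len(v)
    return [[sum(image[y + i][x] * v[n - 1 - i] for i in range(n))
             for x in range(len(image[0]))] for y in range(len(image) - n + 1)]


def convolve_separable(image: Image, v: List[int], h: List[int]) -> Image:
    """Rows first, then columns of the intermediate image."""
    return convolve_cols(convolve_rows(image, h), v)


v, h = [1, 2, 1], [1, 0, -1]                       # separable Sobel-like filter
kernel = [[a * b for b in h] for a in v]           # outer product of v and h
image = [[(3 * y + 5 * x * x) % 17 for x in range(6)] for y in range(5)]

direct = convolve2d_valid(image, kernel)
separable = convolve_separable(image, v, h)
```

For each output pixel the direct form performs one multiplication per kernel element, while the separable form performs one per element of `h` plus one per element of `v`, which is the source of the speed-up.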
5. MultiAngle Facial Image Analysis
In the experiments, thirty subjects were randomly selected to construct the training set and twelve subjects were used as impostors to evaluate the false accept rate (FAR). In order to compare with other published algorithms, the proposed approach was also tested using the widely used FERET database. The fb (frontal pose) subset, consisting of 725 subjects, was used: 580 subjects were selected randomly as the training set and the remaining 145 were used as impostors. The performance metrics consisted of the mean classification rate, the false accept rate, and the false reject rate (FRR). The FAR is the likelihood of incorrectly accepting an impostor, while the FRR is the likelihood of incorrectly rejecting an individual in the training set. Following common practice, the thresholds were adjusted based on the classification confidence values to evaluate the trade-off between FAR and FRR.
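The two error rates defined above can be computed from lists of confidence values as in the following sketch. It assumes that a probe is accepted when its confidence is at or above the threshold, which is consistent with a higher threshold lowering the FAR and raising the FRR; the function name and the toy scores are illustrative and are not experimental data.

```python
from typing import List, Tuple


def far_frr(genuine_scores: List[float], impostor_scores: List[float],
            threshold: float) -> Tuple[float, float]:
    """Return (FAR, FRR) for a given acceptance threshold."""
    # An impostor whose confidence reaches the threshold is falsely accepted.
    false_accepts = sum(s >= threshold for s in impostor_scores)
    # An enrolled subject whose confidence falls below it is falsely rejected.
    false_rejects = sum(s < threshold for s in genuine_scores)
    return (false_accepts / len(impostor_scores),
            false_rejects / len(genuine_scores))


# Toy confidence values, not results from the paper.
genuine = [0.9, 0.8, 0.6, 0.4]
impostor = [0.7, 0.3, 0.2, 0.1]
far, frr = far_frr(genuine, impostor, threshold=0.5)
```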
6. Experimental Evaluation
Figure 11(b) presents the FAR and FRR, individually, as functions of the threshold value. Since higher threshold values increase the FRR and decrease the FAR, these results are meaningful only when the FAR and FRR are considered together. The point at which the FAR equals the FRR is called the equal error rate (EER), another commonly used measure of the overall performance of biometric systems. The result in Figure 11(b) shows that, as expected, error is minimized when training with a greater angular range. Training with the widest angular range yields an EER of 4.86%, whereas the EER increases to 6.80% and 7.42% for the two narrower ranges.
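A simple way to locate the EER is to sweep the threshold over the observed confidence values and pick the point where FAR and FRR are closest. The sketch below does this under the assumption that a probe is accepted when its confidence is at or above the threshold; the function name and toy scores are illustrative, and with finite data the two rates may not cross exactly, in which case their average at the closest point is reported.

```python
from typing import List, Tuple


def equal_error_rate(genuine: List[float], impostor: List[float]) -> Tuple[float, float]:
    """Return (threshold, EER estimate) where |FAR - FRR| is smallest."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)
        frr = sum(s < t for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, t, (far + frr) / 2)
    return best[1], best[2]


# Toy confidence values, not results from the paper.
threshold, eer = equal_error_rate([0.9, 0.8, 0.6, 0.4], [0.7, 0.3, 0.2, 0.1])
```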
To deal with outdoor scenarios, a preliminary test has been conducted by using static images of eight of the subjects taken outdoors; six of the eight subjects (75%) were successfully classified. Note that the training data for these individuals came exclusively from indoor data. Due to the very limited number of testing samples, these results should not be taken as definitive statements of performance.
7. Conclusions and Future Work
As networks become larger, more complex, and more distributed, information security has become more critical than it has ever been in the past. Many efforts have been made aiming to accurately authenticate and authorize trusted individuals, and audit their activities. Once a user is successfully logged in, the prevailing technique assumes that the system is controlled by the same person. Focusing on this security challenge, we developed an enhanced authentication method that uses video-based facial recognition technology to monitor the user during the entire session in which the person is using the system. It can automatically lock the screen or log out the user when the authenticated user's face disappears from the vicinity of the computer system for an adjustable time interval.
In order to improve recognition accuracy, further research has been conducted on image preprocessing algorithms and on using multiangle facial images in training. The experiments conducted on the CIM and FERET face databases showed promising results. On the FERET database, an EER of 7.5% was obtained, which is comparable to the best published EER of 6% in the literature. A major advantage of video-based face recognition is that a set of images of the same subject can be captured in a video sequence, while its main problem lies in the low image quality of video frames. In order to improve recognition accuracy, an effort has been put into combining frontal and angled face images.
Uncontrolled face recognition from video is still a challenging task. During our experiments, it became clear that most classification errors were due to instability in the alignment process. Specifically, we noticed that even though the eye detector accurately finds the eyes, the detected position tends to oscillate within the dark area of the pupils, which causes fluctuations in the computed in-plane head rotation angle. One area for future research is to investigate bootstrapping and the integration of spatio-temporal filtering methods in the eye detector to mitigate this issue. We will also perform more research on the relationships between frontal and angled face images to extract nose features that cannot be obtained or accurately measured from the frontal face alone, such as nose slope and depth. We will use the newly developed algorithms to improve the facial presence monitoring system, and we will also explore other application areas that would benefit from uncontrolled face recognition. In addition, we need to conduct research to analyze the legal and social aspects of monitoring the user's presence at a workplace.
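To illustrate why small oscillations in the detected eye position matter, the sketch below computes the in-plane rotation angle from the two eye centres and applies an exponential moving average to the detected positions. The moving average is only one simple example of temporal filtering and is not the method proposed for future work; the function names, the smoothing factor `alpha`, and the toy coordinates are assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def rotation_angle(left_eye: Point, right_eye: Point) -> float:
    """In-plane head rotation in degrees, from the line joining the eye centres."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))


def smooth_positions(points: List[Point], alpha: float = 0.3) -> List[Point]:
    """Exponential moving average over successive detections of one eye."""
    sx, sy = points[0]
    smoothed = []
    for x, y in points:
        sx = alpha * x + (1 - alpha) * sx
        sy = alpha * y + (1 - alpha) * sy
        smoothed.append((sx, sy))
    return smoothed


# A 2-pixel vertical jitter at a 60-pixel eye distance changes the angle by about 1.9 degrees.
jitter_angle = rotation_angle((100.0, 100.0), (160.0, 102.0))

# Toy track of a right eye whose detected height oscillates between frames.
raw = [(160.0, 100.0), (160.0, 102.0), (160.0, 100.0), (160.0, 102.0)]
filtered = smooth_positions(raw)
```

The filtered track varies much less than the raw one, at the cost of responding more slowly to genuine head movement.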
The authors would like to thank Dr. Martin Levine and Dr. Jeremy Cooperstock for their great contribution under contract W7714-071076. Without their contribution, the authors would not have been able to achieve the results presented herein. The authors would also like to thank Dr. Daniel Charlebois for his support and valuable comments throughout the course of this research project.
- Yang XD, Kort P, Dosselmann R: Automatically log off upon disappearance of facial image. In Contract Report. DRDC, Ottawa, Canada; March 2005.
- Reid P: Biometrics for Network Security. Prentice-Hall, Upper Saddle River, NJ, USA; 2003.
- Chirillo J, Blaul S: Implementing Biometric Security. John Wiley & Sons, Indianapolis, Ind, USA; 2003.
- Manoj R: Biometric security: the making of biometrics era. InfoSecurity July 2007, 16-22.
- Xiao Q, Yang XD: A facial presence monitoring system for information security. Proceedings of the IEEE Workshop on Computational Intelligence in Biometrics: Theory, Algorithms, and Applications (CIB '09), March 2009 69-76.
- Janakiraman R, Kumar S, Zhang S, Sim T: Using continuous face verification to improve desktop security. Proceedings of the 7th IEEE Workshop on Applications of Computer Vision (WACV '07), January 2007 501-507.
- Rao B: Continuous keystroke biometric system, M.S. thesis. Media Arts and Technology, University of California, Santa Barbara, Calif, USA; 2005.
- Yap RHC, Sim T, Kwang GXY, Ramnath R: Physical access protection using continuous authentication. Proceedings of the IEEE International Conference on Technologies for Homeland Security (HST '08), May 2008 510-512.
- Sim T, Zhang S, Janakiraman R, Kumar S: Continuous verification using multimodal biometrics. IEEE Transactions on Pattern Analysis and Machine Intelligence 2007, 29(4):687-700.
- Azzini A, Marrara S: Impostor users discovery using a multimodal biometric continuous authentication fuzzy system. In Proceedings of the 12th International Conference on Knowledge-Based Intelligent Information and Engineering Systems (KES '08), September 2008, Lecture Notes in Computer Science. Volume 5178. Springer; 371-378.
- Prince SJD: Latent identity variables: a generative framework for face recognition in uncontrolled conditions. EP/E065872/1, EPSRC, September 2007.
- Chellappa R, Zhou S: Face tracking and recognition from videos. In Handbook of Face Recognition. Edited by: Li SZ, Jain AK. Springer, Berlin, Germany; 2005:169-192.
- Chen L-F, Liao H-YM, Lin J-C: Person identification using facial motion. Proceedings of the IEEE International Conference on Image Processing (ICIP '01), October 2001 2: 677-680.
- Liu X, Chen T: Video-based face recognition using adaptive hidden Markov models. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, June 2003 1: 340-345.
- Turk M, Pentland A: Eigenfaces for recognition. Journal of Cognitive Neuroscience 1991, 3(1):71-86. 10.1162/jocn.1991.3.1.71
- Viola P, Jones M: Robust real-time object detection. International Journal of Computer Vision 2004, 57(2):137-154.
- Williams DB, Madisetti V: Digital Signal Processing Handbook. CRC Press, Boca Raton, Fla, USA; 1999.
- Phillips PJ, Moon H, Rizvi SA, Rauss PJ: The FERET evaluation methodology for face-recognition algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence 2000, 22(10):1090-1104. 10.1109/34.879790
- Schuckers M: Some statistical aspects of biometric identification device performance. Stats Magazine September 2001, 3.
- Davis D, Higgins P, Kormarinski P, Marques J, Orlans N, Wayman J: State of the art biometrics excellence roadmap: technology assessment: volume 1. MITRE Corporation; 2008. http://www.biometriccoe.gov/SABER/index.htm
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.