- Research Article
- Open Access
Gabor Directional Binary Pattern: An Image Descriptor for Gaze Estimation
© Hongzhi Ge. 2010
- Received: 27 April 2010
- Accepted: 24 August 2010
- Published: 29 August 2010
This paper proposes an image descriptor, Gabor Directional Binary Pattern (GDBP), for robust gaze estimation. In GDBP, Gabor magnitude information is first extracted from a cropped subimage. Local directional derivatives are then used to encode binary patterns in the given orientations. As an image descriptor, GDBP suppresses noise and is robust to illumination variations, while its encoding pattern emphasizes boundaries. We extract GDBP features from eye regions and adopt Support Vector Regression (SVR) to approximate the gaze mapping function, which is then used to predict the gaze direction with respect to the camera coordinate system. In the person-independent experiments, our dataset includes 4089 samples from 11 persons. Experimental results show that the gaze estimation can achieve an accuracy of less than by using the proposed GDBP and SVR.
- Local Binary Pattern
- Gabor Filter
- Lighting Variation
- Gabor Wavelet
- Camera Coordinate System
In Human-Computer Interaction (HCI) scenarios, eye gaze refers to the vector pointing from the viewer's two eyes to an object, and gaze is a very useful natural input modality. Combined with sign language recognition or speech recognition, eye gaze tracking can greatly improve usability for disabled persons, and it can also be applied in specialized fields such as ophthalmology, neurology, and psychology. Many researchers in the computer vision and pattern recognition community have focused on this topic, and several gaze estimation methods can be found in the related literature. According to how they represent the position of the pupil's center in the eye socket, these methods fall into two categories: model-based methods and appearance-based methods. Model-based solutions, such as Purkinje image [2, 3] and limbus tracking, use an explicit geometric eye model and geometric features to estimate the gaze direction. Appearance-based solutions treat an eye image as a high-dimensional feature instead of using explicit geometric characteristics. These appearance-based approaches are usually more robust in experiments because they better exploit statistical properties. Sugano et al. take the cropped eye region as a point in a local manifold model and estimate gaze by clustering learning samples with similar head poses and constructing their local manifold model. In the approach proposed by Lu et al., Local Binary Pattern (LBP) represents the "pupil-glint" vector information related to gaze direction by capturing the texture changes of eye images. An appearance-based method, Local Pattern Model (LPM), has also been presented; this model combines the improved Pixel-Pattern-Based Texture Feature (PPBTF) and the LBP texture feature. Although the existing appearance-based methods have made significant progress in gaze estimation, their accuracy and robustness need to be further improved.
In this paper, we present an appearance-based gaze estimation method based on a novel image operator, Gabor Directional Binary Pattern (GDBP), and Support Vector Regression (SVR). In GDBP, multiscale and multiorientation Gabor wavelets are used to decompose an eye image, followed by the Directional Binary Pattern (DBP) operator. We use the GDBP operator to represent the texture changes of the eye images caused by the pupil centers, which keep moving in the eye sockets when a person at a certain head pose gazes in different directions. By combining the advantages of Gabor filters with local directional differentiation information, GDBP is not only robust to illumination variations but also highly discriminative. In applications, these patterns are useful in representing the horizontal and vertical pupil movements. As appearance-based features, GDBP is fed into SVR to approximate the gaze mapping function. The output gaze direction is represented in terms of Euler angles with respect to the camera coordinate system. Our experimental results show the validity of the proposed operator, and additionally we have achieved an accuracy of less than .
The rest of the paper is organized as follows. In Section 2, we describe the computation of the proposed GDBP operator in detail and analyze its robustness to lighting variations and its discriminating power in different orientations. Gaze estimation with a fixed frontal head pose based on GDBP is presented in Section 3, followed by experimental results and comparisons with other approaches. In the last section, brief conclusions are drawn, together with a discussion of further work.
In this section, we first define the Directional Binary Pattern (DBP), and then extend it to GDBP, using multiscale and multiorientation Gabor filters. Finally, we analyze the robustness and discriminating power of the GDBP. The details are given as follows.
2.1. Directional Binary Pattern (DBP)
2.2. Extending DBP with Gabor Filters
As stated above, a cropped eye image is encoded into GDBP by the following procedure. The image is normalized and transformed to obtain multiple Gabor magnitude maps in the frequency domain by applying multiscale and multiorientation Gabor filters. The Directional Binary Pattern is then extracted from these maps.
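The procedure above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the paper's implementation: the filtering is done by direct spatial-domain correlation rather than in the frequency domain, the kernel parameters (`half_size`, `wavelength`, `sigma`) are hypothetical, and, because the DBP definition is not reproduced in this text, the binary code is assumed to be the sign of the one-pixel directional difference of the Gabor magnitude. All function names are made up for the example.

```python
import cmath
import math


def gabor_kernel(half_size, wavelength, theta, sigma):
    """Complex Gabor kernel: Gaussian envelope times a complex sinusoid."""
    kernel = []
    for y in range(-half_size, half_size + 1):
        row = []
        for x in range(-half_size, half_size + 1):
            # Coordinate measured along the filter orientation theta.
            x_rot = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            carrier = cmath.exp(2j * math.pi * x_rot / wavelength)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def gabor_magnitude(image, kernel):
    """Magnitude of the filter response, computed with replicated borders.

    This is a correlation; for a real image its magnitude equals that of
    the convolution with the same (conjugate-symmetric) Gabor kernel.
    """
    h, w = len(image), len(image[0])
    half = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0j
            for ky in range(-half, half + 1):
                for kx in range(-half, half + 1):
                    yy = min(max(y + ky, 0), h - 1)
                    xx = min(max(x + kx, 0), w - 1)
                    acc += image[yy][xx] * kernel[ky + half][kx + half]
            out[y][x] = abs(acc)
    return out


def directional_binary_map(magnitude, dy, dx):
    """ASSUMED encoding: 1 where the magnitude does not decrease along (dy, dx)."""
    h, w = len(magnitude), len(magnitude[0])
    bits = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            yy = min(max(y + dy, 0), h - 1)
            xx = min(max(x + dx, 0), w - 1)
            bits[y][x] = 1 if magnitude[yy][xx] - magnitude[y][x] >= 0 else 0
    return bits


def gdbp_maps(image, wavelengths, thetas, offsets):
    """One binary map per (scale, filter orientation, derivative direction)."""
    maps = []
    for wavelength in wavelengths:
        for theta in thetas:
            # Hypothetical parameter choice: 5x5 kernel, sigma tied to wavelength.
            kernel = gabor_kernel(2, wavelength, theta, 0.5 * wavelength)
            magnitude = gabor_magnitude(image, kernel)
            for dy, dx in offsets:
                maps.append(directional_binary_map(magnitude, dy, dx))
    return maps


# Toy 6x6 "eye image": dark left half, bright right half.
toy_image = [[0.0] * 3 + [1.0] * 3 for _ in range(6)]
# Pixel offsets assumed for the 0, 45, 90 and 135 degree directions.
directions = [(0, 1), (-1, 1), (-1, 0), (-1, -1)]
toy_maps = gdbp_maps(toy_image, [4.0], [0.0, math.pi / 2], directions)
```

In a full descriptor the binary maps would typically be summarized further (for example as regional histograms) before regression; that step is omitted here because its details are not given in this text.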
2.3. Robustness Analysis of the GDBP
3.1. Experimental Data Collection
The FASTRAK has a transmitter, which defines the world coordinate system with its origin at the centre of the transmitter, and four receivers, each with its own local coordinate system. Only three receivers are used in our data collection procedure; they are mounted on the head of the viewer, on the camera, and on the top-left corner of the screen, respectively. Each receiver reports its position and orientation relative to the transmitter as six values: three position coordinates in cm, and Azimuth, Elevation, and Roll in degrees. From these values, our system calculates the receiver's translation and rotation matrices relative to the transmitter's coordinate system. In Figure 7, the first receiver is mounted on top of the camera, and its output gives a translation and a calculated rotation matrix relative to the transmitter's coordinate system. The camera's coordinate system has its own three axes. We suppose that the receiver's coordinate system is related to the camera's coordinate system by a fixed translation and a rotation matrix equal to I (the identity matrix). The second receiver is mounted on the top-left corner of the screen, and its output likewise gives a translation and a calculated rotation matrix relative to the transmitter's coordinate system. For each generated cursor used as a gazed point, we assume that the receiver's coordinate system is related to the screen's coordinate system by a fixed translation and a rotation matrix equal to I. The third receiver is mounted on top of the viewer's head, and its output gives a translation and a calculated rotation matrix relative to the transmitter's coordinate system. We assume that the centre of the two eyes has a fixed translation, in cm, relative to the third receiver; this offset is a statistical average, and the error caused by differences between individual eye centres can be ignored, as tested by the experiments. We keep one axis of the receiver parallel to the line between the two eyes and another axis upright, so that the remaining axis is parallel to the direction of the head pose.
Our system synchronizes image capture with the computation of the gaze direction. During data collection, the distance between the subjects' heads and the screen is around 600 mm. To simplify the experiments in this paper, we keep the head pose in frontal view by holding fixed the direction of the receiver mounted on top of the head. At any one time, only one of 16 predefined points appears on the monitor, and its world coordinate is calculated from its screen coordinate using the translation and rotation matrices. We provide a database for further research.
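The coordinate bookkeeping in this subsection can be illustrated with a short sketch. It assumes that Azimuth, Elevation, and Roll are rotations about the Z, Y, and X axes composed as Rz·Ry·Rx; this convention should be checked against the tracker's manual. It also uses the stated identity rotation between the camera receiver and the camera, so the camera frame is taken to be the receiver frame (a fixed translation between them does not change a direction vector). The function names and all numeric readings are hypothetical.

```python
import math


def rotation_matrix(azimuth_deg, elevation_deg, roll_deg):
    """Rotation matrix Rz(azimuth) * Ry(elevation) * Rx(roll) (assumed order)."""
    a, e, r = (math.radians(v) for v in (azimuth_deg, elevation_deg, roll_deg))
    ca, sa = math.cos(a), math.sin(a)
    ce, se = math.cos(e), math.sin(e)
    cr, sr = math.cos(r), math.sin(r)
    return [
        [ca * ce, ca * se * sr - sa * cr, ca * se * cr + sa * sr],
        [sa * ce, sa * se * sr + ca * cr, sa * se * cr - ca * sr],
        [-se, ce * sr, ce * cr],
    ]


def local_to_world(point, rotation, translation):
    """Map a point from a receiver's local frame to the transmitter frame."""
    return [
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]


def world_to_local(point, rotation, translation):
    """Inverse mapping: transposed rotation applied to (point - translation)."""
    d = [point[j] - translation[j] for j in range(3)]
    return [sum(rotation[j][i] * d[j] for j in range(3)) for i in range(3)]


def gaze_in_camera_frame(eye_world, target_world, cam_rotation, cam_translation):
    """Unit vector from the eye centre to the gazed point, in the camera frame."""
    eye = world_to_local(eye_world, cam_rotation, cam_translation)
    target = world_to_local(target_world, cam_rotation, cam_translation)
    v = [t - e for t, e in zip(target, eye)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


# Hypothetical readings: positions in cm, angles in degrees.
cam_R = rotation_matrix(90.0, 0.0, 0.0)
cam_t = [10.0, 0.0, 0.0]
eye_world = [10.0, 60.0, 0.0]      # eye centre in the transmitter frame
target_world = [10.0, 0.0, 0.0]    # gazed screen point in the transmitter frame
gaze = gaze_in_camera_frame(eye_world, target_world, cam_R, cam_t)
```

The resulting unit vector can then be converted to whatever angle parameterization the regression targets use.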
3.2. Gaze Estimation Based on GDBP and Experimental Results
In our experiments, SVR uses the Gaussian kernel function. The sixteen regions of double eye images are used and the average error is around . It is important to note that our eye gaze method is noninvasive, fast, and stable; its stability comes from the robustness of our novel features to lighting variations.
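As a rough illustration of how a trained SVR with a Gaussian kernel maps a feature vector to one gaze angle, the sketch below evaluates the standard SVR decision function f(x) = Σ_i c_i K(s_i, x) + b. It does not train a model; the support vectors, coefficients, bias, and `gamma` are made-up values, and predicting each angle with a separate regressor is an assumption rather than something stated in the text.

```python
import math


def gaussian_kernel(x, y, gamma):
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def svr_predict(x, support_vectors, dual_coefs, bias, gamma):
    """Evaluate f(x) = sum_i coef_i * K(sv_i, x) + bias for a trained SVR."""
    return bias + sum(
        coef * gaussian_kernel(sv, x, gamma)
        for sv, coef in zip(support_vectors, dual_coefs)
    )


# Hypothetical trained model for one gaze angle (all values are made up).
support_vectors = [[0.0, 1.0], [1.0, 0.0]]   # stand-ins for GDBP feature vectors
dual_coefs = [2.0, -1.0]                     # alpha_i - alpha_i^* per support vector
bias = 0.5
gamma = 0.5

angle = svr_predict([0.0, 1.0], support_vectors, dual_coefs, bias, gamma)
```

In practice the coefficients would come from fitting an SVR library on the training features and their measured gaze angles.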
In this paper, a robust image descriptor, GDBP, is proposed for gaze estimation. GDBP captures not only the local binary pattern but also the texture change information along the given directions. Other advantages of GDBP include noise suppression and robustness to lighting variations. GDBP features are finally fed into SVR to estimate the gaze direction with respect to the camera coordinate system. In the future, we will investigate how to match two GDBPs and how to apply the discriminative capacity of the GDBP operator to other tasks.
- Sugano Y, Matsushita Y, Sato Y, Koike H: An incremental learning method for unconstrained gaze estimation. Proceedings of the European Conference on Computer Vision, 2008, Lecture Notes in Computer Science 5304: 656-667.
- Cornsweet TN, Crane HD: Accurate two-dimensional eye tracker using first and fourth Purkinje images. Journal of the Optical Society of America 1973, 63(8):921-928. 10.1364/JOSA.63.000921
- Ohno T, Mukawa N, Yoshikawa A: FreeGaze: a gaze tracking system for everyday gaze interaction. Proceedings of the Eye Tracking Research and Applications Symposium (ETRA '02), March 2002 125-132.
- Matsumoto Y, Zelinsky A: An algorithm for real-time stereo vision implementation of head pose and gaze direction measurement. Proceedings of the 4th IEEE International Conference on Automatic Face and Gesture Recognition (AFGR '00), 2000 499-504.
- Hansen D, Ji Q: In the eye of the beholder: a survey of models for eyes and gaze. IEEE Transactions on Pattern Analysis and Machine Intelligence 2010, 32(3):478-500.
- Lu H-C, Wang C, Chen Y-W: Gaze tracking by binocular vision and LBP features. Proceedings of IEEE 19th International Conference on the Pattern Recognition (ICPR '08), August 2008 1-4.
- Ojala T, Pietikäinen M, Mäenpää T: Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence 2002, 24(7):971-987. 10.1109/TPAMI.2002.1017623
- Lu H, Fang G, Wang C, Chen Y: A novel method for gaze tracking by local pattern model and support vector regressor. EURASIP Signal Processing 2010, 90(4):1290-1299.
- Smola A, Schölkopf B, et al.: A tutorial on support vector regression. Royal Holloway College, University of London, London, UK; 1998.
- Shan S, Gao W, Chang Y, Cao B, Yang P: Review the strength of Gabor features for face recognition from the angle of its robustness to mis-alignment. Proceedings of the 17th International Conference on Pattern Recognition (ICPR '04), August 2004 338-341.
- Zhang B, Wang Z, Zhong B: Kernel learning of histogram of local Gabor phase patterns for face recognition. EURASIP Journal on Advances in Signal Processing 2008, 2008:-8.
- Polhemus FASTRAK, http://www.polhemus.com/?page=Motion_Fastrak
- Niu Z, Shan S, Chen X, Ma B, Gao W: Enhance ASMs based on AdaBoost-based salient landmarks localization and confidence-constraint shape modeling. Proceedings of the International Workshop on Biometric Recognition Systems (IWBRS '05), 2005, Lecture Notes in Computer Science 3781: 9-14.
- EyeLink II, http://www.sr-research.com/EL_II.html
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.