Open Access

A procedure to locate the eyelid position in noisy videokeratoscopic images

EURASIP Journal on Advances in Signal Processing 2016, 2016:136

DOI: 10.1186/s13634-016-0433-0

Received: 29 June 2016

Accepted: 1 December 2016

Published: 13 December 2016

Abstract

In this paper, we propose a new procedure to robustly determine the eyelid position in high-speed videokeratoscopic images. This knowledge is crucial in videokeratoscopy to study the effects of the eyelids on the cornea and on the tear film dynamics. Difficulties arise due to the very low contrast of videokeratoscopic images and because of the occlusions caused by the eyelashes. The proposed procedure uses robust M-estimation to fit a parametric model to a set of eyelid edge candidate pixels. To detect these pixels, firstly, nonlinear image filtering operations are performed to remove the eyelashes. Secondly, we propose an image segmentation approach based on morphological operations and active contours to provide the set of candidate pixels. Subsequently, a verification procedure reduces this set to pixels that are likely to contribute to an accurate fit of the eyelid edge. We propose a complete framework, for which each stage is evaluated using real-world videokeratoscopic images. This methodology allows for automatic localization of the eyelid edges and can replace the currently used time-consuming manual labeling approach while maintaining its accuracy.

Keywords

Videokeratoscopy; Eyelid detection; Nonlinear filtering; Image segmentation; Robust M-estimation

1 Introduction

A keratoscope is an ophthalmological instrument that allows for non-invasive imaging of the topography of the human cornea, which is the outer surface of the eye [1]. The cornea is the largest contributor to the eye’s refractive power, and its topography is of critical importance when determining the quality of vision and corneal health. For example, astigmatism may occur if the cornea has an irregular or toric curvature. Videokeratoscopy allows for studying the dynamics of the corneal topography [2–5].

Another important application of videokeratoscopy is the analysis of tear film stability in the inter-blink interval. Ocular discomfort can be caused by dry spots which occur if the tear film is destabilized. The tear film build-up and break-up times can be estimated from videokeratoscopic images if the data acquisition rate is sufficiently high [6–9]. Videokeratoscopy is also involved in the study of the dynamic response of the corneal anterior surface to mechanical forces. These mechanical forces are exerted by the eyelids during horizontal eye movements in a downward gaze. More information on the applications of high-speed videokeratoscopy can be found in [10].

Figure 1 displays the principle of videokeratoscopy. Concentric rings are projected by a Placido disk onto the cornea which is covered by a tear film. The reflection of the ring pattern is recorded by a video camera and analyzed to produce contour maps and 3D reconstruction of the corneal surface. Equally spaced symmetric reflections from the corneal surface would indicate perfect vision, while distortions in the ring pattern represent aberrations.
Fig. 1

Principle of videokeratoscopy [40]. Illuminated rings of pre-defined geometry are projected onto the cornea

One of the first high-speed videotopographic methods could record four images per second [11]. The Contact Lens and Visual Optics Laboratory (CLVOL) at the School of Optometry, Queensland University of Technology, in Brisbane, Australia, has developed a high-speed videokeratoscope which can operate at a sampling frequency of 50 Hz by combining a commercially available videokeratoscope with an additional dynamic image acquisition system [10]. Only at this high sampling rate is it possible to reasonably study the tear film behavior immediately before and after a blink. All videokeratoscopic data used in this paper were recorded at CLVOL. An example of a videokeratoscopic image is given in Fig. 2.
Fig. 2

A videokeratoscopic image

Eyelid localization in images is an active area of research, with important applications in, for example, iris recognition systems and drowsiness detection [12–15]. To the best of our knowledge, the case of videokeratoscopic images is still an open research question. In fact, even today, the very time-consuming manual selection of candidate pixels, followed by a parametric fit of a parabola in the least squares sense, is still routine practice.

In videokeratoscopy, the contrast of the images is low and edges are potentially blurred, which makes edge detection [16, 17] inapplicable. Further, severe occlusions especially by the upper eyelashes and their shadows may occur, as seen in Fig. 2. Figure 3 illustrates the application of a Canny edge detector [16] to videokeratoscopic images. Clearly, the Placido disk ring pattern produces strong gradients in all directions. Further, it is evident that the upper eyelid edge is much more difficult to detect than the lower eyelid because it is severely affected by the eyelashes and their shadows. For this reason, we focus on the upper eyelid.
Fig. 3

Canny edge detection applied to a videokeratoscopic image

In addition to the difficulty of localizing the image’s region of interest, videokeratoscopy for eye research imposes strong requirements concerning the accuracy of the model of the eyelid edge. The conventional approach to fit a parabola does not always provide a sufficiently accurate approximation to the real curvature. In some images, a non-symmetrical model may be necessary to describe the entire eyelid including the parts covering the sclera. In this paper, we therefore propose and evaluate some alternative models.

Contributions: In this paper, a new procedure is proposed to robustly determine the eyelid position in high-speed videokeratoscopic images. The proposed method allows for automatic localization of the eyelid edges which replaces the currently used time-consuming manual labeling. We propose to use robust M-estimation to fit a parametric model to a set of eyelid edge candidate pixels. In this way, we account for outliers in the candidate pixels. These are present due to the very low contrast of videokeratoscopic images and because of the occlusions caused by the eyelashes. In the case of the parabola, an alternative robust fit by the Hough transform is also discussed. To detect these pixels, first, nonlinear image filtering operations are performed to remove the eyelashes. In particular, we propose a method based on the gradient direction variance and a wavelet-based method which adapts the procedure of [14] to videokeratoscopic images. Subsequently, an image segmentation approach based on morphological operations and active contours is proposed to provide the set of candidate pixels. We propose and evaluate new linear and nonlinear eyelid curvature models as alternatives to the conventionally used parabola. A real-world data performance analysis is provided to examine the error rates of the proposed models.

Organization: Section 2 is dedicated to the proposal and description of the robust procedure to locate the eyelid position in noisy videokeratoscopic images. Section 3 provides real-data experiments and results. Section 4 concludes the paper.

2 The proposed procedure for eyelid position estimation in videokeratoscopy

In this section, we introduce a new robust procedure for eyelid localization in noisy videokeratoscopic images. Our method is divided into three steps: nonlinear image filtering, candidate pixel detection and verification, and robust model fitting. Figure 4 shows an overview.
Fig. 4

The proposed framework for the estimation of the eyelid edges (top) and the investigated filtering, verification, and fitting methods and eyelid edge models (bottom)

2.1 Nonlinear image filtering for eyelash removal

In this step, videokeratoscopic images are processed such that the subsequent algorithms are able to detect candidate pixels that are located on the eyelid edge. As in iris recognition systems [12, 14], an important factor that affects the quality of the eyelid position estimation is the eyelashes. Additional challenges to be considered in videokeratoscopy are the blur of the image and the ring pattern of the Placido disk.

We investigate two different approaches to remove the eyelashes from videokeratoscopic images. The first is based on the gradient direction variance and the second is a wavelet-based method.

2.1.1 Gradient direction variance-based method

We briefly revisit the method by Zhang et al. [18] that is based on nonlinear conditional directional filtering and describe its adaptation to videokeratoscopic images.

The first step is to detect whether a pixel lies in an image area that is contaminated by the eyelashes. Then, the eyelashes’ direction is estimated, and each affected pixel is filtered along the direction perpendicular to the eyelashes. To find the eyelashes’ positions and to estimate their direction, 3×3 Sobel edge filters are applied to the image, as illustrated in Table 1.
Table 1

Sobel edge filters: x-direction, image region, and y-direction (from left to right)

 
The gradients in the x- and y-directions are denoted as \(G_{x}\) and \(G_{y}\), respectively. These, the local gradient direction \(\phi\), and the magnitude \(\nabla\) are calculated as follows:
$$\begin{array}{@{}rcl@{}} G_{{x}} & = & (z_{7} + 2z_{8} + z_{9}) - (z_{1} + 2z_{2} + z_{3}) \end{array} $$
(1)
$$\begin{array}{@{}rcl@{}} G_{{y}} & = & (z_{3} + 2z_{6} + z_{9}) - (z_{1} + 2z_{4} + z_{7}) \end{array} $$
(2)
$$\begin{array}{@{}rcl@{}} \nabla & = & \sqrt{G_{\mathrm{x}}^{2} + G_{\mathrm{y}}^{2} } \end{array} $$
(3)
$$\begin{array}{@{}rcl@{}} \phi &=& \arctan{\frac{G_{\mathrm{y}}}{G_{\mathrm{x}}}}. \end{array} $$
(4)
Next, a window of size 8×8 is defined and the gradient direction variance is calculated as
$$ \sigma_{\nabla}^{2} = \frac{1}{63} \sum_{i=1}^{64}{ \left(\phi_{i} - \overline{\phi} \right)^{2}}. $$
(5)
Herein, \(\overline {\phi }\) represents the sample mean of the gradient directions included in the window. If the gradient direction variance is small, which indicates the presence of an edge, then the pixel is classified as being affected by an eyelash. The underlying threshold is empirically determined to be 2.7 from the distributions of gradient direction variances in the eyelash and non-eyelash areas, see Fig. 5. In the case of an eyelash pixel, a 1-D median filter of length L is applied to the surrounding pixels to determine a new value of the classified pixel.
Fig. 5

Histogram of the gradient direction variances for the eyelash area and the non-eyelash area
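The per-pixel classification described above can be sketched as follows. This is a simplified sketch: the function names are ours, atan2 replaces arctan(G_y/G_x) to avoid division by zero, and the full method additionally uses the magnitude of Eq. (3) so that flat regions are not flagged.

```python
import math

def sobel_gradients(img, x, y):
    """Sobel responses of Eqs. (1)-(2); z1..z9 is the 3x3 neighborhood
    of pixel (x, y), read row by row."""
    z = [img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    z1, z2, z3, z4, _, z6, z7, z8, z9 = z
    gx = (z7 + 2 * z8 + z9) - (z1 + 2 * z2 + z3)
    gy = (z3 + 2 * z6 + z9) - (z1 + 2 * z4 + z7)
    return gx, gy

def gradient_direction_variance(img, x0, y0, w=8):
    """Unbiased variance of the gradient directions in a w x w window, Eq. (5)."""
    phis = []
    for y in range(y0, y0 + w):
        for x in range(x0, x0 + w):
            gx, gy = sobel_gradients(img, x, y)
            phis.append(math.atan2(gy, gx))  # quadrant-safe direction
    mean = sum(phis) / len(phis)
    return sum((p - mean) ** 2 for p in phis) / (len(phis) - 1)

def is_eyelash_pixel(img, x0, y0, threshold=2.7):
    """Small direction variance indicates a coherent edge, i.e., an eyelash."""
    return gradient_direction_variance(img, x0, y0) < threshold
```

A window covering a single straight edge yields nearly identical directions and hence a variance well below the threshold, whereas incoherent texture spreads the directions over the circle.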

For distinguishable gradients, as caused by the eyelashes in well-focused images, the method has achieved reasonable results [18]. The filter has been reported to have little effect on the regions without gradients, e.g., the iris, the sclera, or the facial skin. In videokeratoscopic images, the eyelash removal is less effective, as illustrated in Fig. 6 which displays the output of the method. While the eyelashes on the lower eyelid are almost entirely removed, this does not hold for the upper eyelashes region, especially if multiple eyelashes overlap.
Fig. 6

Videokeratoscopic image (same as Fig. 2) after applying the gradient direction variance method

2.1.2 Wavelet-based method

For iris recognition systems, an effective wavelet-based method for eyelash removal was introduced by Aligholizadeh et al. [14]. Wavelets can be used to decompose the eye image into components that appear at different resolutions. The key advantage of the wavelet transform, compared to the traditional Fourier transform, is its position-frequency localization property, allowing features that occur at the same position and resolution to be matched up.

We adapted the eyelash removal algorithm by [14] and extended it to be applicable to videokeratoscopic images. For this, in each level, we decomposed the videokeratoscopic image into four sub-bands of wavelet coefficients, as shown in Fig. 7.
Fig. 7

Wavelet coefficients at level 1 of a videokeratoscopic image: approximation (top left), horizontal (top right), vertical (bottom left), and diagonal coefficients (bottom right)

In videokeratoscopy, the eyelashes are mainly vertically or diagonally aligned. Therefore, it can be expected that they influence the coefficients in the vertical and diagonal sub-band images, whereas the eyelid is mainly horizontally or to some extent diagonally oriented. Based on these assumptions, Daubechies wavelets are used to decompose the image, and all vertical and diagonal coefficient values in each level are set to zero. After applying the inverse wavelet transform, the wavelet-filtered image should contain fewer eyelashes than the original image. Figure 8 shows the wavelet-filtered image and the difference image, i.e., the image containing the removed eyelashes.
Fig. 8

Wavelet-filtered image and difference image based on the videokeratoscopic image shown in the upper left part of Fig. 7
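The sub-band zeroing step can be sketched with a one-level Haar transform (the text uses Daubechies wavelets and several levels; Haar is chosen here only for brevity, and even image dimensions are assumed):

```python
def haar_filter_eyelashes(img):
    """One-level 2-D Haar decomposition in which the vertical and diagonal
    detail sub-bands (responding to the mostly vertical/diagonal eyelashes)
    are zeroed before reconstruction. Even height and width assumed."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(0, h, 2):
        for x in range(0, w, 2):
            a, b = img[y][x], img[y][x + 1]
            c, d = img[y + 1][x], img[y + 1][x + 1]
            ll = (a + b + c + d) / 2.0   # approximation (kept)
            lh = (a + b - c - d) / 2.0   # horizontal detail (kept: eyelid)
            hl = 0.0                     # vertical detail (zeroed: eyelashes)
            hh = 0.0                     # diagonal detail (zeroed)
            out[y][x] = (ll + lh + hl + hh) / 2.0
            out[y][x + 1] = (ll + lh - hl - hh) / 2.0
            out[y + 1][x] = (ll - lh + hl - hh) / 2.0
            out[y + 1][x + 1] = (ll - lh - hl + hh) / 2.0
    return out
```

Vertical structures within a 2×2 block are averaged away, while horizontal transitions such as the eyelid edge pass through unchanged.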

Videokeratoscopic images are more challenging compared to the images considered in [14]. For this reason, applying the above method only results in a reduction and not the removal of eyelashes. Additional steps are necessary to determine eyelid edge pixels. Our proposed approach is introduced subsequently.

2.2 Active contours method for eyelid edge pixel candidate detection

After applying nonlinear image filters to the initial videokeratoscopic image, an active contours image segmentation method is used to detect pixels in videokeratoscopic images that lie on the eyelid edge. This method outperformed the other image segmentation approaches that we studied, such as region growing [19], watershed segmentation [20], and empirical and gradient-based methods [16], which we do not report on for space considerations.

Active contours are widely used in image segmentation to delineate an object contour within an image. The general idea of Kass et al. [21], who introduced the active contour model (also called snakes), was to minimize the energy associated with the current contour as a sum of an internal and an external energy. The internal energy term controls the smoothness of the contour and is minimized when the snake’s shape matches the shape of the sought object; the external energy term attracts the contour towards the object and is minimized when the snake is at the object’s boundary. An initial estimate is required, which is refined by means of energy minimization.

“Snakes” are active deformable models and can be represented as a set of n points \(v_{i}=(x_{i},y_{i})\), where \(i=0,\ldots,n-1\). The deformation of their contours depends on their energy function
$$\begin{array}{@{}rcl@{}} E_{\text{snake}} & = & {\int_{0}^{1}} E_{\text{snake}} \left(v(s)\right)ds \\ & = & {\int_{0}^{1}} E_{\text{internal}}\left(v(s)\right)+E_{\text{external}}\left(v(s)\right)ds \\ & = & {\int_{0}^{1}} E_{\text{internal}}\left(v(s)\right)+E_{\text{image}}\left(v(s)\right)+E_{\text{con}}\left(v(s)\right)ds \end{array} $$
(6)

with \(E_{\text{internal}}\) representing the internal energy of the snake, \(E_{\text{image}}\) denoting the image forces acting on the spline, and \(E_{\text{con}}\) representing the external constraint forces introduced by the user. \(E_{\text{image}}\) and \(E_{\text{con}}\) form the external energy acting on the spline.

In the case of videokeratoscopy, the recurrent structure of the image allows us to incorporate higher-level prior knowledge to obtain an initial estimate. We propose to apply morphological operations to the nonlinearly filtered image (the output of stage 1). In particular, the nonlinearly filtered image is eroded and dilated with morphological discs.

If A is a set in \(\mathbb {Z}^{2}\), then \(a=(a_{1},a_{2})\) is an element of A if \(a \in A\). This corresponds to a pixel lying within a region of the image. Dilation is thereby defined as
$$ \delta_{B}(A) = A \oplus B = \{z \in E \ |\ (B^{s})_{z} \bigcap A \ne \emptyset \}, $$
(7)
where A and B denote sets within \(\mathbb {Z}^{2}\) and \(B^{s}\) is the reflection of set B
$$ B^{s} = \{w|w = -b, \ \text{for} \ b \in B\}. $$
(8)
Dilation can be interpreted geometrically as the locus of points covered by B when the center of B moves inside A. Erosion is defined as
$$ \epsilon_{B}(A) = A \ominus B = \{z \in E \ |\ B_{z} \subseteq A \}. $$
(9)
The translation of a set B by a point \(z=(z_{1},z_{2})\), denoted as \(B_{z}\), becomes
$$ B_{z} = \{b + z \ |\ b \in B \}, \quad \forall z \in E. $$
(10)
Erosion is interpreted as the locus of points reached by the center of B when B moves inside A. In our approach, we combine the two operations, which is referred to in mathematical morphology as opening: the erosion of A by B, followed by dilation of the result by B.
$$ \gamma_{B}(A) = A \circ B = (A \ominus B) \oplus B $$
(11)

Opening generally smooths the contour of a set, breaks its narrow isthmuses, and eliminates thin protrusions.

The proposed approach uses morphological discs for B with radius \(r_{d}=2\) pixels for dilation and radius \(r_{e}=16\) pixels for erosion. If the size of the structuring element is chosen properly, the eyelashes can be effectively suppressed. From the resulting image, a global image threshold is calculated by Otsu’s method [22] to convert the image to a binary image. Figure 9 displays an example of an obtained binary image that is used as an initial estimate for active contours.
Fig. 9

Initial estimate for active contours obtained from morphological dilation and erosion operations on the nonlinearly filtered image using gradient direction variance
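The binary morphology of Eqs. (7)–(11) can be sketched as follows. The sketch uses tiny discs rather than the radii from the text, and treats out-of-image pixels as background, which is a border-handling choice of ours:

```python
def disc(r):
    """Offsets of a disc-shaped structuring element of radius r."""
    return [(dx, dy) for dy in range(-r, r + 1) for dx in range(-r, r + 1)
            if dx * dx + dy * dy <= r * r]

def erode(img, se):
    """Eq. (9): a pixel survives if the translated disc fits inside the set."""
    h, w = len(img), len(img[0])
    return [[all(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                 for dx, dy in se)
             for x in range(w)] for y in range(h)]

def dilate(img, se):
    """Eq. (7): a pixel is set if the disc around it hits the set."""
    h, w = len(img), len(img[0])
    return [[any(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                 for dx, dy in se)
             for x in range(w)] for y in range(h)]

def opening(img, se_erode, se_dilate):
    """Eq. (11): erosion followed by dilation. The text uses different disc
    radii for the two steps (r_e = 16, r_d = 2)."""
    return dilate(erode(img, se_erode), se_dilate)
```

Thin structures such as eyelashes do not survive the erosion step, while large connected regions shrink and are then (mostly) regrown by the dilation.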

By minimizing the energy function in Eq. (6), the contour of the initial estimate iteratively adapts and converges to the eyelid contour. Here, the videokeratoscopic image can be cut into an upper and a lower half, so that the upper and lower eyelid contours are estimated separately. As we focus on the upper eyelid in this work, candidate pixels are finally drawn from the upper edge of the white contour in the binary image. An example of the resulting candidate pixels for the upper eyelid edge superimposed onto the original image is shown in Fig. 10.
Fig. 10

Candidate pixels drawn from the resulting contour

2.3 Candidate verification by using image statistics and polar coordinate fit

Before candidate pixels are fit to a parametric model, a candidate verification algorithm analyzes characteristics of the candidate pixels in order to remove pixels which are unlikely to contribute to an accurate fit of the eyelid.

The verification is based on a set of characteristics: the intensity averages, the column intensity decline, and a polar coordinate fit, which are combined to obtain a verification of candidates.

2.3.1 Intensity averages

For this characteristic, first-order statistics of the column and row of a candidate pixel are evaluated and compared to the overall averages of the remaining rows and columns. Figure 11 depicts the average row intensity values of a typical videokeratoscopic image and the empirically determined threshold which flags whether a row belongs to the eyelid region. The threshold is chosen carefully and conservatively across all videokeratoscopic images. The box in Fig. 11 illustrates the value range of the actual eyelid areas. The same strategy is pursued for the columns. Since the intensity averages of the columns are less significant, they are given less weight in the final pixel verification.
Fig. 11

Average row intensity of a videokeratoscopic image. The box indicates the value range of the actual eyelid areas
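A minimal sketch of the row-average characteristic is given below. The direction of the comparison (eyelid rows are darker than their surroundings) is our reading of Fig. 11; the threshold value itself is empirical:

```python
def flag_eyelid_rows(img, threshold):
    """Average intensity per row; rows whose average falls below the
    empirical threshold are flagged as belonging to the dark eyelid region."""
    return [sum(row) / len(row) < threshold for row in img]
```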

2.3.2 Column intensity decline

The next characteristic is motivated by the fact that, in general, in videokeratoscopic images, the pixels above an eyelid are brighter (skin) than the eyelid itself, which is characterized by a dark region. Thus, the intensity decline serves as an indicator that a candidate pixel belongs to the set of eyelid edge pixels.

Before the intensity gradients are calculated, it is necessary to filter the column intensity values to reduce the effect of the ring patterns from the Placido disk. In our experiments, a median filter of length \(L_{2}=15\) is applied to suppress the ring patterns, and a moving average filter of length \(L_{1}=25\) further smooths the intensity curve.

After the smoothing, an extended differentiation is performed on the column intensity values, which calculates the difference of values having a distance of 15 pixels. The distance between the pixels is empirically determined by the average transition range of an eyelid in a videokeratoscopic image. An example is shown in Fig. 12.
Fig. 12

Differentiated column intensity values of a videokeratoscopic image

Positive weight is given towards the overall decision if the differentiated column intensity values of the candidate pixel and that of its adjacent columns fall below zero. Thus, the candidate pixel lies inside an intensity decline.
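The smoothing and the extended differentiation can be sketched as follows (a sketch with our helper names; the windows are clamped at the column ends, which is a border-handling choice not specified in the text):

```python
import statistics

def smooth_column(col, l_med=15, l_avg=25):
    """Median filter of length L2 (suppresses the Placido ring pattern)
    followed by a moving average of length L1."""
    n = len(col)
    med = [statistics.median(col[max(0, i - l_med // 2):i + l_med // 2 + 1])
           for i in range(n)]
    out = []
    for i in range(n):
        win = med[max(0, i - l_avg // 2):i + l_avg // 2 + 1]
        out.append(sum(win) / len(win))
    return out

def extended_diff(col, gap=15):
    """Difference of samples 'gap' pixels apart, matching the average
    transition range of an eyelid; negative values indicate a decline."""
    return [col[i + gap] - col[i] for i in range(len(col) - gap)]
```

On a column crossing the skin-to-eyelid transition, the extended difference is negative over the whole transition range, which is the condition checked above.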

2.3.3 Polar coordinate fit

The third characteristic to verify the candidate pixels is based on a robust parabolic fit of the candidate pixels in the polar coordinate domain. After shifting the center of the image to the center of the pupil, a polar image can be determined by a Cartesian to polar coordinate transformation. In polar coordinates, the eyelids’ shape is similar to a parabolic curve.

The algorithm to calculate the polar coordinate fit consists of four steps which we discuss in the sequel.

Step 1: Finding the center of the pupil. A videokeratoscopic image is characterized by the ring pattern which is projected onto the iris. The center of the rings is also the center of the pupil. This fact can be exploited by a circular Hough transform [23], which is a robust method to find circles in an image. The circular Hough transform is applied to the original videokeratoscopic images to find circles with radii ρ between 75 and 125 pixels and center coordinates \(c_{x}\) and \(c_{y}\). The parametrized equation of the circle is given by
$$ \rho = \sqrt{(y-c_{y})^{2}+(x-c_{x})^{2}}. $$
(12)

The maximum in the Hough space is determined to find the best fitting parameter set.
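Step 1 can be sketched as a coarse search over a discretized Hough space; the candidate-center grid and the distance tolerance below are our simplifications:

```python
import math

def circular_hough(edge_points, radii, centers):
    """Coarse circular Hough search: each candidate (c_x, c_y, rho) collects
    votes from edge pixels lying close to the circle of Eq. (12); the
    best-supported parameter set is returned."""
    best, best_votes = None, -1
    for cx, cy in centers:
        for rho in radii:
            votes = sum(1 for x, y in edge_points
                        if abs(math.hypot(x - cx, y - cy) - rho) <= 0.75)
            if votes > best_votes:
                best, best_votes = (cx, cy, rho), votes
    return best
```

In the full transform the radii run from 75 to 125 pixels; the tolerance of 0.75 pixels absorbs the rounding of edge coordinates to the pixel grid.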

Step 2: Transformation in the polar domain. Based on the center coordinates of the pupil, a Cartesian to polar transformation can be performed. The new coordinate system consists of the variable ρ for the radius as in Eq. (12) and φ for the angle which can be derived by
$$ \varphi = \arctan\left(\frac{y-c_{y}}{x-c_{x}}\right). $$
(13)

Due to the circular domain of the new coordinate system, the corners of the rectangular original image are cropped. For our purpose, the cropping can be neglected since only insignificant image areas are dropped: information would be lost only if the center of the iris were very far from the center of the image, which is not usually the case for videokeratoscopic images.
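Step 2 can be sketched as nearest-neighbor sampling on a polar grid around the pupil center (the sampling resolution and out-of-image handling are our choices):

```python
import math

def to_polar(img, cx, cy, n_rho, n_phi):
    """Sample the image on a (rho, phi) grid centered at the pupil, Eqs.
    (12)-(13); samples outside the image (the cropped corners) are set to 0."""
    h, w = len(img), len(img[0])
    polar = [[0] * n_phi for _ in range(n_rho)]
    for ir in range(n_rho):
        for ip in range(n_phi):
            phi = 2 * math.pi * ip / n_phi
            x = round(cx + ir * math.cos(phi))
            y = round(cy + ir * math.sin(phi))
            if 0 <= x < w and 0 <= y < h:
                polar[ir][ip] = img[y][x]
    return polar
```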

Step 3: Robust fitting of a parabolic curve. In this step, a robustly estimated parabolic model is fitted to the candidate pixels. See Section 2.4 for possible robust estimation methods. An example is provided in Fig. 13.
Fig. 13

Eyelid candidates (white) and robustly fitted curve (yellow)

Step 4: Outlier detection. The last step is to compare the candidate pixels with the fitted curve of step 3. For this purpose, the smallest distances d i between the candidate pixels and the fitted curve are calculated. Then, the corresponding robust estimate of the standard deviation is determined by
$$ \hat{\sigma}_{\text{rob}} = \frac{1}{\Phi^{-1}(3/4)}\cdot \text{MAD}(d_{i}) = 1.4826\cdot \text{MAD}(d_{i}), $$
(14)

where the median absolute deviation (MAD) is given as \(\text{MAD}(d_{i})=\text{median}_{i}(|d_{i}-\text{median}_{j}(d_{j})|)\) and \(\Phi^{-1}\) is the inverse of the cumulative distribution function of the standard normal distribution. To detect outliers, a threshold is set to \(T_{1} = 3\cdot \hat {\sigma }_{\text {rob}}\). The 3-\(\hat {\sigma }_{\text {rob}} \) rule is justified by the fact that for \(d_{i} \sim \mathcal {N}(\mu,\sigma ^{2})\), it is unlikely that \(d_{i}\) deviates from its mean by more than 3σ, i.e., \(\text{Pr}(|d_{i}-\mu|<3\sigma)=0.9973\).
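Step 4 can be sketched as follows. Flagging by deviation from the median of the distances is our reading of the 3-sigma rule stated above:

```python
import statistics

def robust_std(d):
    """Eq. (14): 1.4826 * MAD(d), a consistent estimate of sigma under
    Gaussian data."""
    med = statistics.median(d)
    return 1.4826 * statistics.median(abs(x - med) for x in d)

def flag_outliers(d):
    """Flag distances whose deviation from the median exceeds
    T1 = 3 * sigma_rob."""
    t1 = 3.0 * robust_std(d)
    med = statistics.median(d)
    return [abs(x - med) > t1 for x in d]
```

Because both the location (median) and the scale (MAD) are themselves robust, a single gross outlier cannot inflate the threshold and mask itself, as it would with the sample mean and standard deviation.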

2.3.4 Candidate verification

For candidate verification, the normalized decisions c i,k of each characteristic i are weighted for each candidate k according to its significance and compared to a threshold:
$$ \frac{1}{3}c_{1,k} + \frac{2}{9}c_{2,k} + \frac{4}{9}c_{3,k} > \frac{7}{9}. $$
(15)
Both the weights and the threshold are determined empirically. Figure 14 shows a videokeratoscopic image with accepted (white) and dismissed (yellow) candidate pixels.
Fig. 14

Videokeratoscopic image with accepted (white) and dismissed (yellow) candidate pixels
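The decision rule of Eq. (15) is a one-liner; note that the inequality is strict, so a candidate supported only by the intensity averages and the polar fit (weights 1/3 + 4/9 = 7/9) is rejected:

```python
def verify_candidate(c1, c2, c3):
    """Weighted combination of the three normalized characteristics, Eq. (15):
    intensity averages (c1), column intensity decline (c2), polar fit (c3)."""
    return c1 / 3 + 2 * c2 / 9 + 4 * c3 / 9 > 7 / 9
```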

2.4 Robust model fitting

In this section, we present robust approaches to fit linear and nonlinear parameterized curve models to the verified eyelid edge candidate pixels. Due to the eyelashes and the low image quality of videokeratoscopic images, we suggest using robust M-estimation for the unknown model parameters. We chose M-estimators because, even after candidate pixel verification, the Gaussian assumption may only hold approximately. An alternative robust fit via the Hough transform is also discussed, using the quadratic polynomial as an example.

2.4.1 Curve models

To provide the best possible accuracy, we investigate the applicability of a wide range of curve models.

The quadratic polynomial
$$ y(x;a,b,c) = ax^{2} + bx + c $$
(16)
is a linear function of the parameters and is also the most frequently applied parametrization of the eyelid edge. Drawbacks are its symmetry and its single maximum, which do not always accurately represent the true eyelid edge. We thus also consider the cubic
$$ y(x;a,b,c,d) = ax^{3} + bx^{2} + cx + d, $$
(17)
and the fourth-order polynomial
$$ y(x;a,b,c,d,e) = ax^{4} + bx^{3} + cx^{2} + dx + e. $$
(18)

Higher polynomial orders are not considered since they would result in curvatures that over-fit the data. As there is no physical motivation to restrict our attention to linear models, we also consider some nonlinear models that are potentially suitable parametrizations of the eyelid edge.

Rational functions, which are described by a numerator polynomial P(x) and a denominator polynomial Q(x)
$$ y(x) = \frac{P(x)}{Q(x)} $$
(19)
are a natural extension of the polynomial models. Based on experimental evaluation, we restrict the class of rational functions to a parabolic numerator and a linear denominator, i.e.,
$$ y(x;a,b,c,p) = \frac{ax^{2} + bx + c}{x + p}. $$
(20)
The second nonlinear model that we consider is the first-order Fourier series
$$ y(x;a_{0},a_{1},b_{1},\omega) = a_{0} + a_{1} \cos(x \omega) + b_{1} \sin(x \omega), $$
(21)
for which the Fourier coefficients \(a_{0}\), \(a_{1}\), and \(b_{1}\) are given by
$$\begin{array}{@{}rcl@{}} a_{0} & = & \frac 1\pi \int_{-\pi}^{\pi} x dx = 0, \end{array} $$
(22)
$$\begin{array}{@{}rcl@{}} a_{n} & = & \frac 1\pi \int_{-\pi}^{\pi} x \cos(nx)dx = 0, \quad n \ge 1, \end{array} $$
(23)
$$\begin{array}{@{}rcl@{}} b_{n} & = & \frac 1\pi \int_{-\pi}^{\pi} x \sin(nx)dx \end{array} $$
(24)
$$\begin{array}{@{}rcl@{}} & = & -\frac 2n \cos(n\pi) + \frac{2}{\pi n^{2}} \sin(n\pi) \end{array} $$
(25)
$$\begin{array}{@{}rcl@{}} & = & 2 \frac{(-1)^{n+1}}{n}, \quad n \ge 1. \end{array} $$
(26)

Also in this case, higher orders were excluded to avoid over-fitting the data and to avoid modeling artefacts that would be introduced by the periodicity.

The third class of curve models is based on probability density functions (pdf), which are characterized by
$$ \int_{-\infty}^{\infty} f(x) dx = 1 $$
(27)

with f(x)≥0. The motivation for applying pdf-type functions is that shifted and rotated versions of f(x) are able to parametrize the eyelid edge well using only a few parameters. Since there is no theoretical justification or practical investigation that suggests a particular distribution, we consider the following candidates.

(i) The Weibull pdf [24, 25]
$$ f(x;\sigma,\lambda) = \frac{\sigma}{\lambda} \left(\frac x \sigma \right)^{\lambda-1} e^{- \left(x / \sigma \right)^{\lambda}} $$
(28)

is described by the scale parameter σ and its shape parameter λ. Here, x≥0 and σ,λ>0.

(ii) The Gamma pdf
$$ f(x;\sigma,\lambda) = \frac{1}{\sigma \Gamma(\lambda)} \left(\frac x \sigma \right)^{\lambda-1} e^{-\frac x \sigma}, $$
(29)

with x≥0 and σ,λ>0.

(iii) The Fréchet pdf
$$ f(x;\sigma,\lambda) = \frac{\lambda}{\sigma} \left(\frac x \sigma\right)^{-1-\lambda} e^{-\left(\frac x \sigma \right)^{-\lambda}}, $$
(30)

where x≥0 and σ,λ>0.

(iv) The Type I Dagum pdf [26, 27]
$$ f(x;a,b,p) = \frac{ap}{x} \left(\frac{\left(\frac{x}{b} \right)^{ap}}{ \left(\left(\frac{x}{b} \right)^{a} + 1 \right)^{p+1}} \right), $$
(31)

where x≥0 and a,b,p>0.

(v) The log-logistic pdf
$$ f(x;\alpha,\beta) = \frac{\frac \beta\alpha \left(\frac x\alpha \right)^{\beta-1}}{ \left(1 + \left(\frac x\alpha \right)^{\beta} \right)^{2}} $$
(32)

with scale parameter α and shape parameter β, where x≥0 and α,β>0.

(vi) The Rice pdf [28]
$$ f(x;\nu,\sigma) = \frac{x}{\sigma^{2}} e^{\frac{- \left(x^{2} + \nu^{2} \right)}{2 \sigma^{2}}} I_{0} \left(\frac{x \nu}{\sigma^{2}} \right), $$
(33)

where x≥0 and ν,σ≥0, with ν being the distance between the reference point and the center of the bivariate distribution, σ the scale parameter, and \(I_{0}(x)\) the modified Bessel function of the first kind with order zero.

(vii) The skew normal pdf [29]
$$ f(x;\alpha,\xi,\omega) = \frac 2\omega \phi \left(\frac{x - \xi}{\omega} \right) \Phi \left(\alpha \left(\frac{x-\xi}{\omega} \right) \right), $$
(34)
where α represents the skew, ξ the location, and ω the scale parameter. ϕ(x) is the standard normal pdf
$$ \phi(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{x^{2}}{2}} $$
(35)
and Φ(x) denotes the cumulative distribution function given by
$$ \Phi(x) = \int_{-\infty}^{x} \phi(t) dt = \frac 12 \left(1 + \text{Erf}\left(\frac{x}{\sqrt{2}} \right) \right). $$
(36)
Here, the error function Erf(z) is defined as
$$ \text{Erf}(z)=\frac{2}{\sqrt{\pi}}{\int_{0}^{z}} e^{-t^{2}} \: dt. $$
(37)

For α=0, the skew normal distribution reduces to the normal distribution. For an increasing absolute value of α, the skewness also increases. The distribution is right skewed for α>0, and for α<0, the distribution is left skewed.
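The skew normal density of Eq. (34) is straightforward to evaluate using the error-function representation of Φ in Eq. (36); the sketch below uses our own parameter defaults:

```python
import math

def skew_normal_pdf(x, alpha, xi=0.0, omega=1.0):
    """Skew normal density, Eq. (34): (2/omega) * phi(z) * Phi(alpha * z)
    with z = (x - xi) / omega."""
    z = (x - xi) / omega
    phi = math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)  # Eq. (35)
    Phi = 0.5 * (1.0 + math.erf(alpha * z / math.sqrt(2.0)))  # Eqs. (36)-(37)
    return 2.0 / omega * phi * Phi
```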

Before fitting these models, the candidate pixels must be aligned and normalized to account for rotation and scaling. For this, a ground line is drawn from the lowest candidate pixel on the left to the lowest candidate pixel on the right of the image. The ground line is then rotated to a horizontal line, and all candidate pixels are rotated by the same angle. The scale is normalized to one in both axes, and the fitting is performed on these transformed candidate pixels.
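A sketch of this alignment step follows. Taking the "lowest" pixel per image half (with the image y-axis pointing downward) and normalizing to the bounding box are our interpretation of the description:

```python
import math

def align_and_normalize(pts):
    """Rotate candidates so the ground line (lowest point in the left half to
    lowest point in the right half) is horizontal, then map both axes to [0, 1]."""
    xs = [p[0] for p in pts]
    mid = (min(xs) + max(xs)) / 2.0
    left = max((p for p in pts if p[0] <= mid), key=lambda p: p[1])
    right = max((p for p in pts if p[0] > mid), key=lambda p: p[1])
    ang = math.atan2(right[1] - left[1], right[0] - left[0])
    ca, sa = math.cos(-ang), math.sin(-ang)
    rot = [(x * ca - y * sa, x * sa + y * ca) for x, y in pts]
    x0, x1 = min(p[0] for p in rot), max(p[0] for p in rot)
    y0, y1 = min(p[1] for p in rot), max(p[1] for p in rot)
    sx, sy = (x1 - x0) or 1.0, (y1 - y0) or 1.0
    return [((x - x0) / sx, (y - y0) / sy) for x, y in rot]
```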

2.4.2 Robust estimation

The above described curve models represent either a linear or a nonlinear regression. The linear regression model is given by
$$ Y_{n} = \mathbf{X}_{n}^{\top}\boldsymbol{\theta} + V_{n}, \quad n = 1, \ldots, N, $$
(38)
where \(Y_{n}\) is a scalar random variable, \(\mathbf{X}_{n}\) is a vector of random variables, θ are the unknown parameters of interest, and \(V_{n}\) is a random variable modeling the errors. The residuals \(R_{n}\) can be obtained by
$$ R_{n} = Y_{n} - \mathbf{X}_{n}^{\top}\hat{\boldsymbol{\theta}}, \quad n = 1, \ldots, N. $$
(39)

In our notation, vectors and matrices are bold and random variables are in uppercase.

The nonlinear regression model states that
$$ Y_{n} = f(\mathbf{X}_{n},\boldsymbol{\theta}) + V_{n}, \quad n = 1, \ldots, N, $$
(40)
where \(f(\mathbf{X}_{n},\boldsymbol{\theta})\) is a nonlinear function and the corresponding residuals \(R_{n}\) are obtained by
$$ R_{n} = Y_{n} - f(\mathbf{X}_{n},\hat{\boldsymbol{\theta}}), \quad n = 1, \ldots, N. $$
(41)
The Gaussian maximum likelihood estimate is defined by
$$ \hat{\boldsymbol{\theta}} = \arg\!\min_{\boldsymbol{\theta}}\sum_{n=1}^{N} \rho \left({R_{n}(\boldsymbol{\theta})} \right), $$
(42)

where the loss function \(\rho(x)=x^{2}\) coincides with that of a least squares estimator (LSE). It is well known that this estimator is very sensitive to departures from the Gaussian data assumption. Robust statistics formalizes the theory of approximate parametric models [30]. On the one hand, like classical parametric methods, robust methods are able to leverage a parametric model; on the other hand, they do not depend critically on the exact fulfilment of the model assumptions. In this sense, robust statistics is very close to engineering intuition and signal processing demands [31]. M-estimators robustify maximum likelihood estimation (MLE) by introducing a bounded score function \(\psi (x) = \frac {\partial \rho (x)}{\partial x}\).

A common M-estimator is the least absolute deviations (LAD) estimator, for which estimates are obtained by solving
$$ \hat{\boldsymbol{\theta}}_{\text{rob}} = \arg\!\min_{\boldsymbol{\theta}}\sum_{n=1}^{N} \rho \left(\frac{R_{n}(\boldsymbol{\theta})}{\hat{\sigma}_{\text{rob}}} \right) $$
(43)

with ρ(x)=|x| and \(\hat {\sigma }_{\text {rob}}\) as given by (14). It belongs to the class of M-estimators with a monotone score function.

As a member of the class of M-estimators with redescending score functions, we consider Tukey's bisquare estimator, which uses
$$ \rho(x) = \left\{\begin{array}{ll} 1-\lbrack 1 - (x/k)^{2} \rbrack^{3} & \text{if }|x|\leq k\\ 1 & \text{if }|x|>k. \end{array}\right. $$
(44)

Choosing the tuning constant as k=4.685 ensures 95% efficiency w.r.t. the MLE when the data exactly follow the nominal Gaussian model [32]. For linear models, the minimization problem of (43) is easily solved using an iteratively reweighted least squares approach, as described in [32]; the LAD estimate can serve as a starting point for Tukey's biweight method. For nonlinear models, we used the trust-region method [33], which represents an improvement over the popular Levenberg-Marquardt algorithm [34, 35].
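For illustration, the iteratively reweighted least squares idea can be sketched for a simple straight-line model as follows. This is our own minimal version, not the authors' implementation: it starts from an ordinary least squares fit rather than the LAD, and uses the normalized MAD of the residuals as the robust scale estimate.

```python
import statistics

def tukey_weight(r, k=4.685):
    """IRLS weight w(r) = psi(r) / r for Tukey's bisquare loss, Eq. (44)."""
    if abs(r) >= k:
        return 0.0  # residuals beyond k are rejected entirely
    u = 1.0 - (r / k) ** 2
    return u * u

def weighted_line_fit(x, y, w):
    """Solve the weighted normal equations for the line y = a*x + b."""
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = sw * swxx - swx * swx
    return (sw * swxy - swx * swy) / det, (swxx * swy - swx * swxy) / det

def irls_tukey_line(x, y, n_iter=30):
    """Robust line fit by iteratively reweighted least squares."""
    a, b = weighted_line_fit(x, y, [1.0] * len(x))  # plain LS start
    for _ in range(n_iter):
        res = [yi - (a * xi + b) for xi, yi in zip(x, y)]
        # Robust scale: normalized median absolute deviation.
        scale = statistics.median(abs(r) for r in res) / 0.6745
        if scale < 1e-12:
            break  # (near-)exact fit reached
        w = [tukey_weight(r / scale) for r in res]
        if sum(w) < 2.0:
            break  # guard against all points being downweighted
        a, b = weighted_line_fit(x, y, w)
    return a, b
```

With a gross outlier in the data, the outlier's residual grows relative to the robust scale over the iterations, its Tukey weight drops to zero, and the fit converges to the clean points.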

2.4.3 Hough transform for parabolic curve detection

The Hough transform [36] is widely used in digital image processing and computer vision to isolate features of a particular shape within an image. Circular and parabolic Hough transforms have been applied to accurately detect the iris and eyelid boundaries, respectively [37–39].

Based on geometrical limitations, boundaries for the parameters in Eq. (16) are determined so as to span a finite-size 3-D accumulator array, the Hough space. Within these boundaries, all possible parabolas are evaluated for each candidate pixel. Whenever a parametrized parabola passes through a candidate pixel, the corresponding cell in the Hough space is incremented.
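A toy version of this voting scheme, assuming the parabola parametrization y = a(x − x0)² + y0 and illustrative parameter grids (not those used in the paper):

```python
def hough_parabola(pixels, a_vals, x0_vals, y0_vals, tol=0.5):
    """Vote in a 3-D accumulator over (a, x0, y0) for y = a*(x - x0)**2 + y0
    and return the parameters of the cell with the most votes."""
    acc = {}
    for px, py in pixels:
        for i, a in enumerate(a_vals):
            for j, x0 in enumerate(x0_vals):
                for k, y0 in enumerate(y0_vals):
                    # A parabola passing close enough to the pixel gets a vote.
                    if abs(a * (px - x0) ** 2 + y0 - py) <= tol:
                        acc[(i, j, k)] = acc.get((i, j, k), 0) + 1
    i, j, k = max(acc, key=acc.get)
    return a_vals[i], x0_vals[j], y0_vals[k]
```

The triple loop over the parameter grid for every candidate pixel makes the cubic cost of this approach explicit, which is the computational drawback discussed in the results below.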

3 Real-data experiments

This section presents the evaluation metric, the experimental setup, and the results of the proposed procedure.

3.1 Experimental setup

As with many real-world problems, there is no objective ground truth for the exact eyelid curvature in videokeratoscopic images. Therefore, reference pixels \(y_{m}^{\text {ref}}, m = 1,\ldots,N^{\text {ref}},\) are manually determined in the videokeratoscopic images to serve as ground truth for the evaluation. In each of the selected videokeratoscopic images, about ten reference pixels are placed at positions where an expert human observer locates the eyelid with high confidence. In our study, we considered ten different videokeratoscopic images. All of them are challenging, as they contain blur and severe eyelash occlusions. An example is shown in Fig. 15.
Fig. 15

Reference pixels in a videokeratoscopic image

3.2 Evaluation metric

As the evaluation metric for a fitted curve, the root mean square (RMS) deviation of each reference pixel to the closest pixel of the fitted curve is calculated by
$$ \text{RMS}(\hat{y}_{m}(\boldsymbol{\hat{\theta}}), y_{m}^{\text{ref}}) = \sqrt{\frac{\sum_{m=1}^{N^{\text{ref}}}(\hat{y}_{m}(\boldsymbol{\hat{\theta}})-y_{m}^{\text{ref}})^{2}}{N^{\text{ref}}}}. $$
(45)

Here, \(\boldsymbol {\hat {\theta }}\) represents the estimated parameters, \(N^{\text{ref}}\) is the number of reference pixels, and \(\hat {y}_{m}(\boldsymbol {\hat {\theta }})\) denotes the curve pixel closest to the reference pixel \(y_{m}^{\text {ref}}\). A pixel in a videokeratoscopic image corresponds to approximately 20 μm.
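Under the reading that the deviation is the Euclidean distance to the closest sampled curve pixel, Eq. (45) can be sketched as follows; the helper name is ours:

```python
import math

def rms_deviation(curve_pixels, ref_pixels):
    """RMS of each reference pixel's distance to its closest pixel
    of the fitted curve, cf. Eq. (45)."""
    total = 0.0
    for rx, ry in ref_pixels:
        # Brute-force search for the closest sampled curve pixel.
        total += min((rx - cx) ** 2 + (ry - cy) ** 2
                     for cx, cy in curve_pixels)
    return math.sqrt(total / len(ref_pixels))
```

Multiplying the returned pixel value by roughly 20 μm per pixel converts the metric to physical units.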

After evaluating Eq. (45) for all reference images, we report the mean and standard deviation (STD) taken over all images. We also report the median and MAD over all images, since they are robust counterparts of the mean and standard deviation that are not unduly influenced by divergent results on single images.

3.3 Results

Table 2 lists the overall results of the proposed methods for all possible combinations of the stages shown in Fig. 4. As can be seen, the Rice curve model outperforms the parabola. Furthermore, the LAD achieves better results than Tukey's bisquare for both the Rice function and the parabola. The performance of the Hough transform is disappointing, not only because of its enormously high computational complexity but also in terms of the accuracy of fitting the parabola to the candidate pixels. The gradient direction variance (GDV) filtering performs much better than the wavelet-based filtering.
Table 2

Overall results of the proposed methods for all possible combinations of the stages shown in Fig. 4

Rank  Filtering     Model          Fitting    Mean ± STD
1     GDV           Rice           LAD        3.54 ± 1.44
2     GDV           Rice           Bisquare   3.68 ± 1.49
3     GDV           Parabola       LAD        4.32 ± 2.07
4     GDV           Cubic          Bisquare   4.34 ± 1.51
5     GDV           Cubic          LAD        4.42 ± 1.72
6     GDV           Parabola       Bisquare   4.51 ± 2.64
7     GDV           Rational       LAD        4.64 ± 1.91
8     GDV           Fourier        LAD        4.69 ± 2.10
9     GDV           Fourth-order   LAD        4.75 ± 2.11
10    GDV           Fourier        Bisquare   4.80 ± 1.99
11    GDV           Dagum          LAD        4.84 ± 1.54
13    GDV           Skew-normal    Bisquare   4.86 ± 2.42
17    GDV           Weibull        LAD        5.11 ± 2.09
42    GDV           Log-logistic   LAD        5.71 ± 2.86
71    Wavelet       Fréchet        Bisquare   6.10 ± 3.36
78    Wavelet       Parabola       Hough      6.13 ± 4.42
184   No filtering  Gamma          Bisquare   7.17 ± 1.41

The top ten and the best result for each curve model are listed in order of decreasing accuracy

Interestingly, when calculating the median of the RMS deviation over all reference images, the Hough transform outperforms the M-estimation and the Rice function falls behind, as can be seen in Table 3. This indicates that the Hough transform performs well in the majority of cases but is largely outperformed by the M-estimators on a small subset of images.
Table 3

Overall results of the proposed methods (in decreasing accuracy order)

Rank  Filtering  Model          Fitting    Median ± MAD
1     GDV        Parabola       Hough      3.50 ± 1.37
2     GDV        Rice           LAD        3.52 ± 1.04
3     GDV        Parabola       Bisquare   3.54 ± 0.91
4     GDV        Fourth-order   LAD        3.61 ± 1.80
5     GDV        Rice           Bisquare   3.62 ± 1.37
6     GDV        Parabola       LAD        3.66 ± 1.17
7     GDV        Rational       Bisquare   3.82 ± 0.61
8     GDV        Fourier        Bisquare   3.83 ± 1.57
9     GDV        Fourth-order   Bisquare   3.91 ± 2.28
10    GDV        Skew-normal    Bisquare   4.03 ± 2.22
11    GDV        Cubic          LAD        4.04 ± 1.74
32    Wavelet    Weibull        Bisquare   4.62 ± 2.11
40    Wavelet    Fréchet        Bisquare   4.86 ± 2.27
42    GDV        Dagum          LAD        4.90 ± 1.08
60    GDV        Log-logistic   LAD        5.16 ± 2.40
113   GDV        Gamma          LAD        5.71 ± 3.03

The results are the median and MAD of the RMS deviations over all ten images

We next assess the performance of each individual stage of our proposed procedure.

Table 4 evaluates the nonlinear image filtering stage. The results of the wavelet- and GDV-based eyelash suppression are also compared to the case when no filtering is performed. The GDV is more accurate when looking at the median of the results, whereas the wavelet-based method outperforms the GDV on average.
Table 4

Results of the two nonlinear image filtering approaches and without any filtering, in RMS deviations over all images

              GDV            Wavelet       None
Mean ± STD    12.27 ± 13.26  10.31 ± 7.32  11.51 ± 8.60
Median ± MAD  6.60 ± 4.58    8.23 ± 4.80   8.73 ± 4.89

Table 5 shows the results of the candidate verification stage. A minor improvement is obtained in terms of average performance; however, for the considered cases, the curve estimators appear to be sufficiently robust even without the verification stage.
Table 5

Results of the candidate verification procedure in RMS deviations over all images

              Verification   No verification
Mean ± STD    13.62 ± 15.73  13.67 ± 14.47
Median ± MAD  9.27 ± 5.26    9.35 ± 5.06

Table 6 compares the Hough transform to the robust M-estimation approach. Despite its considerably higher computational complexity, the Hough transform is also inferior to the M-estimator in terms of mean accuracy.
Table 6

Results of the robust fitting approaches in RMS deviations over all images

              M-estimator    Hough transform
Mean ± STD    12.19 ± 13.62  18.46 ± 25.39
Median ± MAD  8.81 ± 5.12    10.48 ± 5.75

Table 7 compares Tukey's bisquare M-estimator to the LAD estimator. The choice between the two loss functions is of minor importance, as both methods achieve similar accuracy. However, the computational complexity of Tukey's estimator is slightly higher than that of the LAD.
Table 7

Results of the two robust estimation methods in RMS deviations over all ten images

              Tukey's bisquare  LAD
Mean ± STD    13.43 ± 14.24     13.38 ± 14.50
Median ± MAD  9.30 ± 5.16       9.25 ± 5.11

Figure 16 depicts a result of both fitting methods for a parabola in a videokeratoscopic image. For comparison, Fig. 17 shows a fit using the Rice model.
Fig. 16

Videokeratoscopic image with estimated eyelid edge fitted by Hough transform (white) and robust regression (yellow)

Fig. 17

Videokeratoscopic image with estimated eyelid edge using the Rice model

Based on the presented results, we suggest that further eyelid localization research consider M-estimators instead of the Hough transform, as they achieve similar accuracy at a significantly lower computational cost. Furthermore, we recommend also considering curvature models other than the parabola. Candidate verification does not seem to be required when robust estimators are used.

4 Conclusions

We proposed a new procedure to robustly estimate the position of the eyelid edges in high-speed videokeratoscopic images. The proposed method applies eyelash removal before segmenting the image with an active contours approach that is initialized by a contour obtained from morphological opening and closing operations. The eyelid candidate pixels are then verified and, finally, parametric curve models are fitted to the selected pixels using robust parameter estimators. Real-data experiments showed that the Rice model and the parabola achieved the best results. Furthermore, robust regression outperforms the Hough transform as a robust fitting method in terms of processing time while being similar in terms of accuracy. The overall precision of the proposed approach is on the order of 10^{-2} mm, which allows for replacing the currently used time-consuming manual labeling.

Declarations

Acknowledgements

The authors would like to thank D.R. Iskander and the staff at Contact Lens and Visual Optics Laboratory (CLVOL) at the School of Optometry, Queensland University of Technology, in Brisbane, Australia, for their efforts in collecting the videokeratoscopic data and for their advice and to the anonymous reviewers for their useful comments on the proposed approach. The work of M. Muma was supported by the project HANDiCAMS which acknowledges the financial support of the Future and Emerging Technologies (FET) Programme within the Seventh Framework Programme for Research of the European Commission (HANDiCAMS), under FET-Open grant number: 323944.

Competing interests

The authors declare that they have no competing interests.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Authors’ Affiliations

(1)
Signal Processing Group, Technische Universität Darmstadt

References

  1. AE Reynolds, Corneal Topography: Measuring and Modifying the Cornea (Springer, New York, 1992).
  2. W Alkhaldi, DR Iskander, AM Zoubir, MJ Collins, Enhancing the standard operating range of a Placido disk videokeratoscope for corneal surface estimation. IEEE Trans. Biomed. Eng. 56(3), 800–809 (2009).
  3. W Alkhaldi, DR Iskander, AM Zoubir, Model-order selection in Zernike polynomial expansion of corneal surfaces using the efficient detection criterion. IEEE Trans. Biomed. Eng. 57(10), 2429–2437 (2010).
  4. W Alkhaldi, Statistical signal and image processing techniques in corneal modeling. PhD thesis (Technische Universität Darmstadt, Germany, 2010).
  5. M Muma, AM Zoubir, in 2011 IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP). Robust model order selection for corneal height data based on τ estimation (Prague, 2011), pp. 4096–4099.
  6. DR Iskander, MJ Collins, B Davis, Evaluating tear film stability in the human eye with high-speed videokeratoscopy. IEEE Trans. Biomed. Eng. 52(11), 1939–1949 (2005).
  7. D Alonso-Caneiro, J Turuwhenua, DR Iskander, MJ Collins, Diagnosing dry eye with dynamic-area high-speed videokeratoscopy. J. Biomed. Opt. 16(7), 076012 (2011).
  8. DH Szczesna-Iskander, DR Iskander, Future directions in non-invasive measurements of tear film surface kinetics. Optom. Vis. Sci. 89(5), 749–759 (2012).
  9. DH Szczesna-Iskander, D Alonso-Caneiro, DR Iskander, Objective measures of pre-lens tear film dynamics versus visual responses. Optom. Vis. Sci. 93(8), 872–880 (2016).
  10. DR Iskander, MJ Collins, Applications of high-speed videokeratoscopy. Clin. Exp. Optom. 88(4), 223–231 (2005).
  11. J Nemeth, B Erdelyi, B Csakany, P Gaspar, A Soumelidis, F Kahlesz, Z Lang, High-speed videotopographic measurement of tear film build-up time. Invest. Ophthalmol. Vis. Sci. 43(6), 1783–1790 (2002).
  12. YK Jang, BJ Kang, KR Park, A study on eyelid localization considering image focus for iris recognition. Pattern Recognit. Lett. 29(11), 1698–1704 (2008).
  13. X Liu, P Li, Q Song, in Proceedings of the 3rd International Conference on Advances in Biometrics (ICB '09). Lecture Notes in Computer Science, 5558. Eyelid localization in iris images captured in less constrained environment (Alghero, Italy, 2009), pp. 1140–1149.
  14. MJ Aligholizadeh, SH Javadi, R Sabbaghi-Nadooshan, K Kangarloo, in International Conference on Biometrics and Kansei Engineering. An effective method for eyelashes segmentation using wavelet transform (Takamatsu, 2011), pp. 185–188.
  15. F Bernard, CE Deuter, P Gemmar, H Schachinger, Eyelid contour detection and tracking for startle research related eye-blink measurements from high-speed video records. Comput. Methods Prog. Biomed. 112(1), 22–37 (2013).
  16. JF Canny, A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 8(6), 679–698 (1986).
  17. G Sicuranza, Nonlinear Image Processing (Academic Press, San Diego, 2000).
  18. D Zhang, DM Monro, S Rakshit, in IEEE International Conference on Image Processing. Eyelash removal method for human iris recognition (Atlanta, 2006), pp. 285–288.
  19. R Adams, L Bischof, Seeded region growing. IEEE Trans. Pattern Anal. Mach. Intell. 16(6), 641–647 (1994).
  20. S Beucher, F Meyer, The morphological approach to segmentation: the watershed transformation. Opt. Eng. 34, 433 (1992).
  21. M Kass, A Witkin, D Terzopoulos, Snakes: active contour models. Int. J. Comput. Vis. 1, 321–331 (1988).
  22. N Otsu, A threshold selection method from gray-level histograms. Automatica. 11, 23–27 (1975).
  23. D Ballard, Generalizing the Hough transform to detect arbitrary shapes. Pattern Recognit. 13, 111–122 (1981).
  24. P Rosin, E Rammler, The laws governing the fineness of powdered coal. J. Inst. Fuel. 7, 29–36 (1933).
  25. W Weibull, A statistical distribution function of wide applicability. J. Appl. Mech. 18, 293–297 (1951).
  26. C Dagum, in Proceedings of the 40th Session of the International Statistical Institute, 46. A model of income distribution and the conditions of existence of moments of finite order (Warsaw, 1975), pp. 196–202.
  27. C Dagum, A new model of personal income distribution: specification and estimation. Economie Appliquée. 30, 413–436 (1977).
  28. SO Rice, Mathematical analysis of random noise. Bell Syst. Tech. J. 24, 146–156 (1945).
  29. A Azzalini, A Dalla Valle, The multivariate skew-normal distribution. Biometrika. 83(4), 715–726 (1996).
  30. PJ Huber, Robust estimation of a location parameter. Ann. Math. Statist. 35, 73–101 (1964).
  31. AM Zoubir, V Koivunen, Y Chakhchoukh, M Muma, Robust estimation in signal processing: a tutorial-style treatment of fundamental concepts. IEEE Signal Process. Mag. 29(4), 61–80 (2012).
  32. RA Maronna, DR Martin, VJ Yohai, Robust Statistics. Wiley Series in Probability and Statistics (Wiley, Chichester, 2006).
  33. MA Branch, TF Coleman, Y Li, A subspace, interior, and conjugate gradient method for large-scale bound-constrained minimization problems. SIAM J. Sci. Comput. 21, 1–23 (1999).
  34. K Levenberg, A method for the solution of certain problems in least squares. Quart. Appl. Math. 2, 164–168 (1944).
  35. DW Marquardt, An algorithm for least-squares estimation of nonlinear parameters. J. Soc. Indus. Appl. Math. 11(2), 431–441 (1963).
  36. PV Hough, BW Powell, A method for faster analysis of bubble chamber photographs. Nuovo Cimento Ser. 10. 18, 1184–1191 (1960).
  37. L Masek, Recognition of human iris patterns for biometric identification. Technical report (The University of Western Australia, 2003).
  38. P Li, X Liu, L Xiao, Q Song, Robust and accurate iris segmentation in very noisy iris images. Image Vis. Comput. 28(2), 246–253 (2010).
  39. DS Jeong, JW Hwang, BJ Kang, KR Park, CS Won, D-K Park, J Kim, A new iris segmentation method for non-ideal iris images. Image Vis. Comput. 28(2), 254–260 (2010).
  40. M Muma, Robust estimation and model order selection for signal processing. PhD thesis (2014).

Copyright

© The Author(s) 2016