A fully automated method to detect and segment a manufactured object in an underwater color image

Abstract: In this work we propose a fully automated, active contour based method for the detection and segmentation of a moored manufactured object in an underwater image. Detection of objects in underwater images is difficult due to variable lighting conditions and shadows on the object. The proposed technique is based on the information contained in the color maps. To overcome the above problems, it uses the visual attention method combined with a statistical approach for the detection, and an active contour for the segmentation of the object. In the classical active contour method the region descriptor is fixed and the convergence of the method depends on the initialization. Our approach removes this dependence by initializing the contour from the visual attention results and by using a criterion to select the best region descriptor. This improves the convergence and the processing time while providing the advantages of a fully automated method.

Christian Barat and Ronald Phlypo, Member, IEEE
Index Terms: Image Processing, Detection, Segmentation, Active contour, Visual Attention.

I. INTRODUCTION
The objective of this work is to present a method which detects and segments manufactured objects (particularly underwater mines) in underwater video images. Underwater video is increasingly used as a sensor complementary to sonar, especially for the detection of objects or animals; see [1] [2] [3] [4]. However, underwater images present particular difficulties, including natural and artificial illumination, color alteration, light attenuation [5] and marine snow. The method must tackle these problems and is developed under two constraints: no human intervention and low processing time. To segment an object in an image, we first need to detect the presence of the object. The method is therefore composed of two main steps:
1) Object detection.
2) Object segmentation if an object is detected.
To detect or track objects in underwater images, a pre-processing step is generally applied to enhance the images. For example, in [6], homomorphic filtering is used to correct the illumination, together with wavelet denoising, anisotropic filtering to improve the segmentation, adjustment of image intensity, and some other processing. A self-tuning image restoration filter is applied in [7], but the illumination is considered to be uniform, which is a restrictive hypothesis. In [4] the constant background features are estimated for each frame by computing the sliding average over the ten preceding frames, and the average is subtracted from the frame. This pre-processing cannot be applied to our images because our objects are large and move slowly in the image, so the average would contain the object. Also, we want to detect an object in a single image. The main problem encountered in our images is shadow, a problem which is not discussed in previous works. We propose to use the visual attention approach [2] [8] with some modifications. Once the saliency map is extracted (using the visual attention method) we apply a Linear Discriminant Analysis (LDA) [9] to estimate the probability of object presence.

For the second step, we use an active contour. Since the work of Kass et al. [10], extensive research on snakes (active contours) has been carried out to segment images. The early approaches [10], [11] minimized an energy function to move the active contour toward the object's edges. In [12], region information has been used to overcome some problems inherent to the edge approach. In this paper we put forward a region based snake method in the Minimum Description Length (MDL) framework. This approach is adapted to our problem of segmenting objects of uniform color. Since we require low processing times, we use an explicit (polygonal) snake. The implicit approach (level set) needs more computational time, even with the fast marching method [13].

The problem encountered in the classical snake method is the dependence on the contour initialization. We propose to use the visual attention based method to find the region of interest to segment in the image (there is generally only one manufactured object to be found in underwater images). Moreover, in the classical region-based snake approach, the region descriptor is fixed for the whole video sequence, while in this paper we propose to select the best region descriptor adapted to each image. The idea is to use the information extracted with the visual attention to select, based on a kurtosis criterion, the region descriptor for each image to segment. This approach reduces the shadow effect during the segmentation step. The methods are presented in Section II; our approach is developed in Sections II-D and II-E. Experimental results on real images are reported in Section III-B.

C. Barat is with the Laboratoire I3S, Université de Nice Sophia-Antipolis, France, e-mail: barat@i3s.unice.fr. R. Phlypo is with Ghent University - IBBT, IbiTech/MEDISIP, Belgium, e-mail: ronald.phlypo@ugent.be.

II. METHODS
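The general method for image segmentation is composed of six steps, the first three steps corresponding to the object detection and the last three corresponding to the object segmentation:
1) Extraction of a saliency map using visual attention.
2) Detection of the most salient part of the map.
3) Classification of the most salient part as object or not object (Section II-C).
4) Initialization of the active contour by visual attention (Section II-D).
5) Selection of the adaptive region descriptor (Section II-E).
6) Segmentation by active contour (Section II-F).
In what follows we supply the details for each of the above sub-parts.

hal-00528288, version 1 - 21 Oct 2010
Author manuscript, published in "EURASIP Journal on Advances in Signal Processing (2010) 1-10"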

A. Visual Attention
The method we adapt is based on the work of Itti and Koch [8]. Low-level vision features (orientation, brightness, and color channels tuned to red, green, blue and yellow hues) are extracted from the original color image at several spatial scales (depending on the image size). The different spatial scales are obtained through the use of a dyadic Gaussian pyramid. We then combine the different maps, after a normalization step, to obtain a saliency map. The final step is the determination of the most salient part of the image using a Winner Take All (WTA) neural network; see Figure 1. The initial image is a color image ($r$ = red, $g$ = green, $b$ = blue channels), from which we extract the intensity image $I$ using the following equation:

$$I = \frac{r + g + b}{3}$$

From $I$, four orientation-selective pyramids $O(\sigma, \theta)$ are also created using Gabor filtering at $\theta$ = 0, 45, 90 and 135 degrees. Feature maps for each pyramid are calculated using a center-surround operation, as the pixel-by-pixel difference between a fine scale $\sigma_c \in \{2, 3, 4\}$ and a coarse scale $\sigma_s = \sigma_c + \delta$, $\delta \in \{3, 4\}$:

$$I(\sigma_c, \sigma_s) = |I(\sigma_c) \ominus I(\sigma_s)| \quad (1)$$

$$O(\sigma_c, \sigma_s, \theta) = |O(\sigma_c, \theta) \ominus O(\sigma_s, \theta)| \quad (2)$$

where $\ominus$ is the center-surround operation: the difference between maps at different scales is obtained by interpolation to the finer scale followed by a pixel-by-pixel subtraction. For color channels, the feature maps are calculated for Green/Red and Blue/Yellow opponency:

$$RG(\sigma_c, \sigma_s) = |(R(\sigma_c) - G(\sigma_c)) \ominus (G(\sigma_s) - R(\sigma_s))| \quad (3)$$

$$BY(\sigma_c, \sigma_s) = |(B(\sigma_c) - Y(\sigma_c)) \ominus (Y(\sigma_s) - B(\sigma_s))| \quad (4)$$

Forty-two feature maps are finally obtained: 6 for the intensity, 12 for the color and 24 for orientation. Then, we calculate the conspicuity maps ($\bar{O}(\sigma)$, $\bar{I}(\sigma)$, $\overline{RG}(\sigma)$, $\overline{BY}(\sigma)$) by linear combination of the feature maps at scale $\sigma = 4$, followed by a normalization between 0 and 1. Compared to the classical approach, the normalization related to the maximum of each feature map is not applied here; see [8]. This normalization is not needed since our subsequent processing step is invariant to scale; see Section II-E. Finally, the color conspicuity map and the saliency map $S$ are calculated from these conspicuity maps.
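The center-surround operation described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the intensity is the mean of the three color channels, replaces the Gaussian filter of the pyramid with a 2x2 box average, and uses nearest-neighbour interpolation. All function names are hypothetical.

```python
def intensity(rgb):
    """Intensity image, assumed here to be (r + g + b) / 3 per pixel."""
    return [[(r + g + b) / 3.0 for (r, g, b) in row] for row in rgb]


def downsample(img):
    """Halve the resolution by averaging 2x2 blocks.

    Assumption: a box filter stands in for the Gaussian filter of the pyramid.
    """
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * i][2 * j] + img[2 * i][2 * j + 1]
              + img[2 * i + 1][2 * j] + img[2 * i + 1][2 * j + 1]) / 4.0
             for j in range(w)] for i in range(h)]


def pyramid(img, levels):
    """Return the dyadic pyramid [scale 0, scale 1, ..., scale `levels`]."""
    out = [img]
    for _ in range(levels):
        out.append(downsample(out[-1]))
    return out


def upsample_to(img, h, w):
    """Nearest-neighbour interpolation of `img` to h x w pixels."""
    sh, sw = len(img), len(img[0])
    return [[img[i * sh // h][j * sw // w] for j in range(w)] for i in range(h)]


def center_surround(pyr, c, s):
    """Feature map |pyr[c] (-) pyr[s]|.

    The coarse scale s is interpolated to the fine scale c, then the two maps
    are subtracted pixel by pixel and the absolute value is taken.
    """
    fine = pyr[c]
    h, w = len(fine), len(fine[0])
    coarse = upsample_to(pyr[s], h, w)
    return [[abs(fine[i][j] - coarse[i][j]) for j in range(w)] for i in range(h)]
```

On a uniform image every feature map produced this way is zero, which is why only regions that contrast with their surroundings contribute to the saliency map.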

B. Detection of the most salient part
In this step we select only the most salient pixel of the saliency map. The WTA neural network is not used since we have only one object to detect:

$$(x_m, y_m) = \arg\max_{(x,y)} S_{(x,y)}$$
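As a minimal sketch of this selection, assuming the saliency map is stored as a nested list of values, the most salient pixel is simply the position of the maximum. The function name is hypothetical.

```python
def most_salient_pixel(saliency):
    """Return the (row, column) position of the maximum of a 2-D saliency map."""
    best_value, best_pos = float("-inf"), None
    for i, row in enumerate(saliency):
        for j, value in enumerate(row):
            if value > best_value:
                best_value, best_pos = value, (i, j)
    return best_pos
```

When several pixels share the maximum value, this sketch keeps the first one in scan order; the source does not say how ties are resolved.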

C. Classification
A classic Linear Discriminant Analysis (LDA) is used to classify the area around the maximum detected in the previous step (Section II-B) as 'object' (1) or 'not object' (0). As this is a supervised approach, the parameters of the model are estimated using reference videos. An object is considered detected if the LDA decision rule assigns the area to class 1. The rule is expressed in terms of $\pi_i$, $i \in \{0, 1\}$, the prior probabilities of classes 0 and 1, $\mu$, the vector of expected feature values, and $x$, the feature vector.
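For illustration, the textbook two-class LDA rule can be sketched as follows. This is the standard discriminant with a covariance matrix shared by both classes, which is an assumption here: the exact decision rule and the features used in the paper are not recoverable from the text. The names `lda_score` and `object_detected` are hypothetical, and the model parameters would come from the reference videos.

```python
import math


def lda_score(x, mean, cov_inv, prior):
    """Linear discriminant x' S^-1 m - 0.5 m' S^-1 m + log(prior).

    x and mean are feature vectors (lists); cov_inv is the inverse of the
    covariance matrix shared by both classes (assumption of classic LDA).
    """
    n = len(mean)
    s_inv_mean = [sum(cov_inv[r][c] * mean[c] for c in range(n)) for r in range(n)]
    return (sum(xi * si for xi, si in zip(x, s_inv_mean))
            - 0.5 * sum(mi * si for mi, si in zip(mean, s_inv_mean))
            + math.log(prior))


def object_detected(x, means, cov_inv, priors):
    """Return True when class 1 ('object') scores higher than class 0."""
    score_1 = lda_score(x, means[1], cov_inv, priors[1])
    score_0 = lda_score(x, means[0], cov_inv, priors[0])
    return score_1 > score_0
```

A call such as `object_detected([1.8, 1.9], [[0.0, 0.0], [2.0, 2.0]], [[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5])` compares the two class scores for a feature vector lying close to the class 1 mean.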

D. Initialization by Visual Attention
The idea is to use the saliency map to initialize the active contour. In our application, the underwater image generally contains only one object on a dark and noisy background. The visual attention scheme is well adapted to such images, where a single region of interest contrasts with the background. We propose to use only the most salient part of the image to initialize the position of the active contour.

E. Adaptive region descriptor for Active contour
Usually the choice of the region descriptor depends on the application and is fixed a priori. In this work, we introduce a data-driven region descriptor. The information used to choose the descriptor is deduced from the saliency map. Once we have detected the most salient part, we can select the most informative conspicuity map. Since we are dealing with illuminated objects superposed on a dark background, an informative map is one whose pixels are easily classifiable into either the object class or the background (non-object) class. In other words, the probability density function of the pixel values would ideally be unimodal with a mode at the background value ($\approx 0$) and the object constituting the distribution tail (values up to 1), whereas a non-informative pixel map would result in a more uniform distribution of its pixel values. A well adapted criterion to differentiate an informative from a non-informative pixel map is the entropy of the pixel map, which is, under some general assumptions, related to the more easily calculable normalized kurtosis of a map [14], [15]. The pixel map of choice $\hat{k}$ is:

$$\hat{k} = \arg\max_{k} \, \mathrm{kurt}(X_k) \quad (9)$$

where $X_k = \{x_k(i, j)\}$ and $x_k(i, j)$ is the pixel intensity for a pixel belonging to the conspicuity map $k \in \{I, O, RG, BY\}$.

In [16], the map that contributes most to the activity at the most salient location $(x_m, y_m)$ is found by looking back at the conspicuity maps. Examining the feature maps that gave rise to the conspicuity map $k$, with $k \in \{I, C, O\}$, leads to the one that contributes most to its activity at the winning location. The winning feature map is segmented using region growing and adaptive thresholding. In Section III-D, we compare this approach with ours and show the improvement.

F. Active contour
An active contour (snake) is a curve which minimizes an energy functional by moving through the spatial domain of an image. The total energy consists of two energies: internal $E_{int}$ and external $E_{ext}$. The internal energy preserves the smoothness and continuity of the contour. The external energy is derived from the information in the image $I$. A typical energy function for a snake using image gradients is given in [11]; its terms are weighted by positive coefficients. For a snake using the region information in the image, based on the Minimum Description Length (MDL), we have [12]:

$$E[\Gamma, \{\alpha_i\}] = \sum_{i=1}^{M} \left\{ \frac{\mu}{2} \oint_{\partial R_i} ds - \iint_{R_i} \log P(I_{(x,y)} \mid \alpha_i) \, dx \, dy + \lambda \right\}$$

where $R_i$ is the segmented region $i$, $\mu$ is the code length for unit arc length, $M$ is the number of regions and $\alpha_i$ the parameters of the distribution describing the region $R_i$. $\lambda$ is the code length needed to describe the distribution and code system for region $R_i$. The external force is computed using a window $W_{(x,y)}$ of $m$ pixels around the control point. In the resulting motion equation of the contour, $Q(v) = \{k \mid v \text{ lies on } \Gamma_k\}$, $\kappa_k(v)$ is the curvature and $n_k(v)$ is the unit normal to $\Gamma_k$ at point $v$.

As the objects of interest are simple manufactured objects, we propose to add a constraint on the shape of the snake: the contour is constrained to be an ellipse, which yields a smooth segmentation of the image. Related works in the literature already propose adding to the energy a penalty function that increases with the distance of the curve to an ellipse [17], [18]. We propose to estimate the ellipse parameters directly, rather than the control points of the curve as in [19], while allowing for a rotation of the ellipse as in [18]. The ellipse is expressed parametrically as a function of $\theta$, with $(x_c(t), y_c(t))$ the ellipse center, $a$ and $b$ the half lengths of the ellipse axes, and $\varphi(t)$ the angle between the x-axis and the major axis of the ellipse. Since we add a shape constraint, we can eliminate the internal energy in equation (10) and the internal force in equation (16). We then express the evolution of the ellipse parameters for a discretized curve, with weighting coefficients that control the speed of the active contour.

The rotation around the centre of the ellipse can be found by calculating the angular momentum for a solid object, as with Newton's second law. For an object with a moment of inertia $J$ on which a torque $\tau$ is exerted, the rotation is driven by the force $F$ applied at the point $v$, normal to $R$. This generalises to forces applied along the contour, with a coefficient $k_\varphi$ that controls the speed of evolution of the contour and absorbs the constant factors.
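The elliptical constraint and the torque-driven rotation can be sketched as follows. This is a sketch under stated assumptions, not the authors' update rule: it uses the standard parametrization of a rotated ellipse, the moment of inertia of a uniform elliptical plate, $m(a^2 + b^2)/4$, and a first-order explicit step in which the angle changes in proportion to the net torque divided by the moment of inertia. The function names, `mass` and `k_phi` are hypothetical.

```python
import math


def ellipse_points(xc, yc, a, b, phi, n=64):
    """Sample n contour points of an ellipse.

    (xc, yc) is the centre, a and b are the half lengths of the axes and phi
    is the angle between the x-axis and the major axis.
    """
    points = []
    for k in range(n):
        theta = 2.0 * math.pi * k / n
        x = xc + a * math.cos(theta) * math.cos(phi) - b * math.sin(theta) * math.sin(phi)
        y = yc + a * math.cos(theta) * math.sin(phi) + b * math.sin(theta) * math.cos(phi)
        points.append((x, y))
    return points


def net_torque(points, forces, xc, yc):
    """Sum of the 2-D torques (z component of r x F) about the centre."""
    return sum((x - xc) * fy - (y - yc) * fx
               for (x, y), (fx, fy) in zip(points, forces))


def rotate_step(phi, points, forces, xc, yc, a, b, mass=1.0, k_phi=1.0):
    """One explicit update of the ellipse angle (assumed first-order scheme)."""
    inertia = mass * (a * a + b * b) / 4.0
    return phi + k_phi * net_torque(points, forces, xc, yc) / inertia
```

Forces that point along the line joining the centre to the contour point produce no torque in this sketch, so only the force component normal to that line rotates the ellipse.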

III. RESULTS
In this section, we first present the performance criteria and then show the results obtained on real images.

A. Criterion
The criteria express the segmentation quality in terms of $\Omega_{in}$, the internal region of the snake, $\Omega_o$, the region of the object to detect, $\Omega_b$, the background region, and $\mathrm{Card}(\Omega)$, the cardinality (the number of elements) of a set $\Omega$.
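As an illustration only, two region-overlap measures of this kind can be computed from sets of pixel coordinates. The definitions below are assumptions, not the paper's formulas: the detected fraction is taken as the share of object pixels inside the snake, and the misclassified fraction as the share of background pixels inside the snake. The function name is hypothetical.

```python
def segmentation_criteria(snake_region, object_region, background_region):
    """Return (detected fraction, misclassified fraction).

    Each region is an iterable of (row, column) pixel coordinates.
    Assumed definitions:
      detected      = Card(inside and object)     / Card(object)
      misclassified = Card(inside and background) / Card(background)
    """
    inside = set(snake_region)
    obj = set(object_region)
    background = set(background_region)
    detected = len(inside & obj) / len(obj)
    misclassified = len(inside & background) / len(background)
    return detected, misclassified
```

With these definitions, a snake that covers the whole object and nothing else gives a detected fraction of 1 and a misclassified fraction of 0.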

B. Illustration of the method on real image
The method has been tested on real images of underwater mines. Since we have only partial knowledge of the recording conditions, we cannot use an illumination model or any a priori information to restore the images. The first result presented is a mine in a seawater environment with low visibility, some acquisition noise and compression noise; see Figure 2. We select the best conspicuity map (Figure 3) using the kurtosis criterion (9). Figure 4 shows the histograms of the conspicuity maps and the calculated kurtosis. The maximum kurtosis for this image is obtained for the Blue/Yellow map. Once the maximum of the saliency map is detected, we initialize an active contour at this position; see Figure 5. After convergence of the snake we obtain the segmentation of the object displayed in Figures 6 and 7. The segmentation quality criteria of Section III-A were computed for this image.

C. Results on different images
In this section, we present the segmentation results on a set of underwater images. The images have been recorded under different conditions of depth, illumination, noise, acquisition system, camera, etc.
The method is robust to the acquisition conditions. Even under bad lighting conditions the mine is detected; for an example without enough light, see Figure 11. An example with artificial light is shown in Figure 9 and another with natural light in Figure 12. The misclassification rates are very low, from 0 to 2% (see Figures 8 and 10), and in the worst case we detect more than 25% of the object; see Figure 14.

D. Comparison with other methods
In this section we compare our method with two alternative approaches: the approach proposed in [6], developed for the same images, and the approach proposed in [16], which uses the maximum of activity to select the feature map; see Section II-E.
1) First Method: The image is corrected using the following pre-processing steps, and the segmentation is then applied using RGB features: removal of the aliasing effect due to the digital conversion of the images; conversion of the color space from RGB to YCbCr; correction of non-uniform illumination using homomorphic filtering; wavelet denoising; and anisotropic filtering to improve image segmentation.
We obtain similar results with the method proposed in [6] for objects without shadows; see Figures 8 and 15. However, our method is more robust to shadows; see for example Figure 16, and compare with Figure 7. For some images the active contour segmentation cannot converge because the contours are blurred and the contrast between the object and the background is low. Using the conspicuity map reduces this effect.
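As an illustration of the color-space conversion step in the pre-processing chain of the first method, the sketch below converts one RGB pixel to YCbCr. It assumes the full-range ITU-R BT.601 coefficients used by JPEG; the source does not state which YCbCr variant [6] uses. The function name is hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to full-range YCbCr.

    Assumption: ITU-R BT.601 coefficients in the full-range (JPEG) form,
    with the chroma components offset by 128.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr
```

Separating luminance (Y) from chrominance (Cb, Cr) in this way lets the illumination correction act on the luminance channel alone.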
2) Second Method: We implement the method that selects the feature map based on the maximum activity and apply the active contour segmentation. As with the previous approach, we obtain very similar results for objects with uniform illumination and no shadows. However, as illustrated in Figures 17 and 18 compared to the results presented in Figures 6 and 7, this method focuses on the illuminated part of the object. The criterion used to select the map is local compared to our criterion [Eq. (9)], and the selection thus focuses on detail.

E. Results obtained during the evaluation of the project TOPVISION
The following results were obtained by applying our method to the test videos of the TOPVISION project [20]. Note that these videos were never used during the development of the method. The project is composed of 4 steps, the first two of which are object detection and position finding. For the detection step, we only try to detect whether an object is present in the image. We use the output of the LDA classifier for this step; see Section II-C. For the position finding step, the position is correct if it lies inside the object. We use the maximum of the saliency map as the position; see Section II-D. Only the first two steps have been evaluated, on 11 videos (20000 images). We obtained 64.77% of good object detection (with 2.82% of false alarms) and 85.28% of correct position finding. The method presented in [6] has been tested on the same images for the detection step and obtains 56.55% of good object detection and 0.79% of false alarms. That method therefore produces fewer false alarms, but its detection rate is lower. It uses the detection of geometric forms to determine the object presence, which requires more computation than our method because, after the pre-processing step (see Section III-D), it extracts lines and circles.

F. Limits of the method
The method provides satisfactory results in numerous cases, but it is limited when the object has the same color as the background, as illustrated in Figure 19. In this case it is difficult for the method to find the object. To obtain this result we have removed the ellipsoidal constraint.

IV. CONCLUSION
In this paper we have presented a fully automated method to detect and segment manufactured objects in underwater images. The method uses two well known approaches with some original adaptations: the visual attention scheme and active contours. We have described the successive steps of the method and presented some results on real images. The method performs well even on noisy images and is robust to different acquisition conditions (illumination, camera settings, shadow, etc.). The limitations of the approach have also been discussed. Essentially, to ensure good results the object to segment must be uniform and sufficiently contrasted with respect to the background. For the future, we are developing a method to track these objects in video based on the conspicuity map and particle filtering. Currently, we can track the object by using the snake estimated in the previous frame as the initialization in the current frame.

REFERENCES
[1] S. Bazeille, L. Jaulin, and I. Quidu, "Identification of underwater manmade object using a colour criterion," in Proceedings of the Institute of Acoustics, 2007.
For an ellipse with uniform density and mass $m$, the moment of inertia is $J = m(a^2 + b^2)/4$, and the torque at any point $v$ is calculated from the force applied at that point.

Fig. 5. Detected maximum (green cross) and the initial snake (red). The image is presented at the initial scale ($\sigma = 0$) after an up-sampling operation on the Blue/Yellow conspicuity map.

Fig. 7. Segmentation presented on the real image. In red, the obtained segmentation; in blue, the ground truth. The blue square is the area of interest which can be used for the recognition step.

Fig. 15. Segmentation of a mine at sea with pre-processing and segmentation using RGB information.

Fig. 16. Segmentation of a mine at sea with pre-processing and segmentation using RGB information.

Fig. 17. Segmentation of a mine using the feature map selected by maximum activity.