- Research Article
- Open Access
A Fully Automated Method to Detect and Segment a Manufactured Object in an Underwater Color Image
EURASIP Journal on Advances in Signal Processing volume 2010, Article number: 568092 (2010)
We propose a fully automated active contours-based method for the detection and the segmentation of a moored manufactured object in an underwater image. Detection of objects in underwater images is difficult due to the variable lighting conditions and shadows on the object. The proposed technique is based on the information contained in the color maps and uses the visual attention method, combined with a statistical approach for the detection and an active contour for the segmentation of the object to overcome the above problems. In the classical active contour method the region descriptor is fixed and the convergence of the method depends on the initialization. With our approach, this dependence is overcome with an initialization using the visual attention results and a criterion to select the best region descriptor. This approach improves the convergence and the processing time while providing the advantages of a fully automated method.
The objective of this work is to present a method that detects and segments manufactured objects (particularly underwater mines) in underwater video images. Underwater video is increasingly used as a complementary sensor to sonar, especially for the detection of objects or animals; see [1–4]. However, underwater images present particular difficulties, including natural and artificial illumination, color alteration, light attenuation, and marine snow. The method must tackle these problems and is developed under two constraints: no human intervention and low processing time. To segment an object in an image, we first need to detect the presence of the object. The method is therefore composed of two main steps:
object detection,
object segmentation if an object is detected.
For detecting or tracking objects in underwater images, a preprocessing step is generally applied to enhance the images. For example, in , homomorphic filtering is used to correct for the illumination, together with wavelet denoising, anisotropic filtering to improve the segmentation, adjustment of image intensity, and some other processing. A self-tuning image restoration filter is applied in , but the illumination is considered to be uniform, which is a restrictive hypothesis. In  the constant background features are estimated for each frame by computing the sliding average over the ten preceding frames, and the average is subtracted from the frame. This preprocessing cannot be applied to our images because of the large size of our objects and their slow movement in the image: the average would contain the object. Also, we want to detect an object in only one image. The main problem encountered in our images is shadow, a problem which is not discussed in previous works. We propose to use the visual attention approach [2, 8] with some modifications. Once the saliency map is extracted (using the visual attention method), we apply a Linear Discriminant Analysis (LDA)  to estimate the probability of object presence. For the second step, we use an active contour. Since the publication of the work of Kass et al. , extensive research on snakes (active contours) has been carried out to segment images. The early approaches [10, 11] minimized an energy function to move the active contour toward the object's edges. In , region information has been used to overcome some problems inherent to the edge approach. In this paper we put forward a region-based snake method in the Minimum Description Length (MDL) framework. This approach is adapted to our problem of segmenting uniformly colored objects. Since we require low processing times, we use an explicit (polygonal) snake. The implicit (level set) approach needs more computation time, even with the fast marching method .
The problem encountered in the classical snake method is the dependence on the contour initialization. We propose to use the visual-attention-based method to find the region of interest to segment in the image (there is generally only one manufactured object to be found in underwater images). Moreover, in the classical region-based snake approach, the region descriptor is fixed for the whole video sequence, whereas in this paper we propose to select the region descriptor best adapted to each image. The idea is to use the information extracted with the visual attention to select, based on a kurtosis criterion, the region descriptor for each image to segment. This approach reduces the shadow effect during the segmentation step. The methods are presented in Section 2, and our approach is developed in Sections 2.4 and 2.5. Experimental results on real images are reported in Section 3.2.
The general method for image segmentation is composed of six steps, the first three steps corresponding to the object detection and the last three corresponding to the object segmentation:
extraction of a saliency map using visual attention,
detection of the most salient part of the map,
classification in class "object" or "no object",
selection of a region descriptor,
initialization of the snake on the most salient part of the map,
segmentation of the image by the snake.
In what follows we supply the details for each of the above subparts.
2.1. Visual Attention
The method we adapt is based on the work of Itti et al. . Low-level vision features (orientation, brightness, color channels tuned to red, green, blue and yellow hues) are extracted from the original color image at several spatial scales (depending on the image size). The different spatial scales are obtained through the use of a dyadic Gaussian pyramid. Then we combine the different maps, after a normalization step, to obtain a saliency map. The final step is the determination of the most salient part of the image using a Winner-Take-All (WTA) neural network; see Figure 1.
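The initial image is a color image ($r$, $g$, and $b$ channels), from which we can extract the intensity image $I$ using the following equation:
\[ I = \frac{r + g + b}{3}, \]
and four color channels are created:
\[ R = r - \frac{g + b}{2}, \qquad G = g - \frac{r + b}{2}, \qquad B = b - \frac{r + g}{2}, \qquad Y = \frac{r + g}{2} - \frac{|r - g|}{2} - b. \]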
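As a minimal NumPy sketch of this channel decomposition, the function below assumes the standard definitions of Itti et al. for the intensity and the four broadly tuned color channels, with negative responses clipped to zero. The function name `itti_channels` and the array layout (height, width, 3) are illustrative assumptions, not taken from the article.

```python
import numpy as np

def itti_channels(rgb):
    """Return intensity I and color channels R, G, B, Y of an (H, W, 3) image.

    Assumes the definitions of Itti et al.; negative responses are set to zero.
    """
    rgb = np.asarray(rgb, dtype=float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    intensity = (r + g + b) / 3.0
    red = np.clip(r - (g + b) / 2.0, 0.0, None)
    green = np.clip(g - (r + b) / 2.0, 0.0, None)
    blue = np.clip(b - (r + g) / 2.0, 0.0, None)
    yellow = np.clip((r + g) / 2.0 - np.abs(r - g) / 2.0 - b, 0.0, None)
    return intensity, red, green, blue, yellow
```

A pure red pixel responds only in the red channel, while a pure yellow pixel (equal red and green) gives a full yellow response and half responses in red and green.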
One Gaussian pyramid $I(\sigma)$ is estimated from the intensity image $I$, where $\sigma$ is the scale. Four Gaussian pyramids $R(\sigma)$, $G(\sigma)$, $B(\sigma)$, and $Y(\sigma)$ are created from these color channels. From $I$, four orientation-selective pyramids $O(\sigma, \theta)$ are also created using Gabor filtering at $\theta$ = 0, 45, 90, and 135 deg. Feature maps for each pyramid are calculated using a center-surround operation, as the pixel-by-pixel difference between a fine scale $c$ and a coarse scale $s$:
\[ I(c, s) = |I(c) \ominus I(s)|, \]
where $\ominus$ is the center-surround operation. For the center-surround operation, the difference between maps at different scales is obtained by interpolation to the finer scale followed by a pixel-by-pixel subtraction. For the color channels, the feature maps are calculated for the Green/Red and Blue/Yellow opponencies:
\[ RG(c, s) = |(R(c) - G(c)) \ominus (G(s) - R(s))|, \qquad BY(c, s) = |(B(c) - Y(c)) \ominus (Y(s) - B(s))|. \]
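The center-surround operation can be sketched in a few lines of NumPy. This is an illustration under simplifying assumptions: the dyadic pyramid is built by 2x2 block averaging (a crude stand-in for Gaussian filtering followed by decimation), and the coarse map is brought back to the fine scale by nearest-neighbor replication rather than a smoother interpolation. The helper names `downsample2`, `pyramid`, and `center_surround` are hypothetical.

```python
import numpy as np

def downsample2(img):
    """Halve the resolution by 2x2 block averaging (stand-in for a Gaussian pyramid step)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def pyramid(img, levels):
    """Return [scale 0, scale 1, ..., scale `levels`] of a dyadic pyramid."""
    pyr = [np.asarray(img, dtype=float)]
    for _ in range(levels):
        pyr.append(downsample2(pyr[-1]))
    return pyr

def center_surround(pyr, c, s):
    """Absolute difference between fine scale c and coarse scale s (s > c).

    The coarse map is upsampled to the fine scale, then subtracted pixel by pixel.
    """
    fine, coarse = pyr[c], pyr[s]
    factor = 2 ** (s - c)
    up = np.kron(coarse, np.ones((factor, factor)))  # nearest-neighbor upsampling
    h = min(fine.shape[0], up.shape[0])
    w = min(fine.shape[1], up.shape[1])
    return np.abs(fine[:h, :w] - up[:h, :w])
```

A small bright spot on a dark background survives the subtraction almost unchanged, because the coarse scale has averaged it away; large uniform regions cancel out.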
Forty-two feature maps are finally obtained: 6 for the intensity, 12 for the color, and 24 for orientation. Then, we calculate the conspicuity maps () by linear combination of the feature maps at scale , followed by a normalization between 0 and 1. Compared to the classical approach, the normalization related to the maximum of each feature map is not applied here; see . This normalization is not needed since our subsequent processing step is invariant to scale; see Section 2.5. Finally, we calculate the conspicuity color map and the saliency map :
2.2. Detection of the Most Salient Part
In this step we select only the most salient pixel of the saliency map $S$. The WTA neural network is not used since we have only one object to detect:
\[ (x_{\max}, y_{\max}) = \arg\max_{(x, y)} S(x, y). \]
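In code, selecting the most salient pixel reduces to a single arg-max over the saliency map. The sketch below assumes the map is stored as a 2-D NumPy array indexed as (row, column); the function name `most_salient_pixel` is illustrative.

```python
import numpy as np

def most_salient_pixel(saliency):
    """Return the (row, col) position of the maximum of a 2-D saliency map."""
    saliency = np.asarray(saliency)
    row, col = np.unravel_index(np.argmax(saliency), saliency.shape)
    return int(row), int(col)
```

If several pixels share the maximum value, `np.argmax` returns the first one in row-major order.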
A classic Linear Discriminant Analysis (LDA) is used to classify the area around the maximum detected in the previous step (Section 2.2) as "object" (1) or "not object" (0). As this is a supervised approach, the parameters of the model are estimated using reference videos. An object is considered detected if
where is the prior probability for classes 0, and 1, the vector of expected features values and the feature vector.
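For illustration, a textbook two-class LDA decision rule can be sketched as follows. It assumes a pooled covariance matrix shared by both classes and the usual linear discriminant score (feature vector times inverse covariance times class mean, minus half the quadratic term of the mean, plus the log prior); the features actually used by the authors are not specified here, and the names `fit_lda`, `discriminant`, and `is_object` are hypothetical.

```python
import numpy as np

def fit_lda(X0, X1):
    """Estimate priors, class means and pooled covariance from labelled features.

    X0: (n0, d) features of class 0 ("not object"); X1: (n1, d) features of class 1 ("object").
    """
    X0, X1 = np.asarray(X0, dtype=float), np.asarray(X1, dtype=float)
    n0, n1 = len(X0), len(X1)
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    scatter = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    cov = scatter / (n0 + n1 - 2)              # pooled within-class covariance
    priors = np.array([n0, n1]) / (n0 + n1)    # class frequencies as priors
    return priors, (mu0, mu1), cov

def discriminant(x, prior, mu, cov_inv):
    """Linear discriminant score of feature vector x for one class."""
    return x @ cov_inv @ mu - 0.5 * mu @ cov_inv @ mu + np.log(prior)

def is_object(x, priors, means, cov):
    """Decide "object" when the class-1 score exceeds the class-0 score."""
    x = np.asarray(x, dtype=float)
    cov_inv = np.linalg.inv(cov)
    return bool(discriminant(x, priors[1], means[1], cov_inv)
                > discriminant(x, priors[0], means[0], cov_inv))
```

In a real system the training features would come from reference videos, as described above; the pooled covariance must be non-singular for the inversion to succeed.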
2.4. Initialization by Visual Attention
The idea is to use the saliency map to initialize the active contour. Generally, in our application, the underwater image is composed of only one object in a dark and noisy background. The visual attention scheme is well-adapted to these particular images where a single region of interest is contrasted with a background. We propose to use only the most salient part of the image to initialize the position of the active contour.
2.5. Adaptive Region Descriptor for Active Contour
Usually the choice of the region descriptor depends on the application and is fixed a priori. In this work, we introduce a data-driven region descriptor. The information used to choose the descriptor is deduced from the saliency map. Once we have detected the most salient part, we can select the most informative conspicuity map. Since we are dealing with illuminated objects superposed on a dark background, an informative map is one whose pixels are easily classifiable into either the object class or the background (nonobject) class. In other words, the probability density function of the pixel values would ideally be unimodal with a mode at the background value () and the object constituting the distribution tail (), whereas a noninformative pixel map would result in a more uniform distribution of its pixel values. A well-adapted criterion to differentiate an informative from a noninformative pixel map is the entropy of the pixel map, which is, under some general assumptions, related to the more easily calculable normalized kurtosis of a map [14, 15]. The pixel map of choice is
where , and is the pixel intensity for a pixel belonging to the conspicuity map . In , the map that contributes most to the activity at the most salient location is found by looking back at the conspicuity maps. Examining the feature maps that gave rise to the conspicuity map with leads to the one that contributes most to its activity at the winning location. The winning feature map is then segmented using region growing and adaptive thresholding. In Section 3.4, we compare this approach with ours and show the improvement.
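A kurtosis-based map selection can be sketched as below. The sketch assumes that "normalized kurtosis" means the fourth central moment divided by the squared variance, minus 3 (so that a Gaussian scores zero), and that the map with the largest value is retained. The dictionary keys and the function names `normalized_kurtosis` and `select_map` are illustrative assumptions.

```python
import numpy as np

def normalized_kurtosis(values):
    """Fourth central moment over squared variance, minus 3 (assumed definition)."""
    x = np.asarray(values, dtype=float).ravel()
    centered = x - x.mean()
    var = np.mean(centered ** 2)
    if var == 0.0:
        return 0.0  # assumption: a constant map is treated as non-informative
    return float(np.mean(centered ** 4) / var ** 2 - 3.0)

def select_map(conspicuity_maps):
    """Return the name of the map with maximum kurtosis, plus all scores."""
    scores = {name: normalized_kurtosis(m) for name, m in conspicuity_maps.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

A map with a dark background and a small bright object has a heavy-tailed histogram and therefore a large positive kurtosis, whereas a map whose values are spread uniformly has a negative one, which matches the informative versus noninformative distinction made above.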
2.6. Active Contour
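A parametric snake is a curve $v(s) = (x(s), y(s))$, $s \in [0, 1]$, which minimizes an energy functional
\[ E_{\mathrm{snake}} = \int_0^1 \left[ E_{\mathrm{int}}(v(s)) + E_{\mathrm{ext}}(v(s)) \right] ds \]
by moving through the spatial domain of an image. The total energy consists of two energies: internal $E_{\mathrm{int}}$ and external $E_{\mathrm{ext}}$. The internal energy preserves the smoothness and continuity. The external energy is derived from the information in the image $I$. A typical energy function for a snake using image gradients is given as 
\[ E = \int_0^1 \left[ \alpha\, |v'(s)|^2 + \beta\, |v''(s)|^2 - \gamma\, |\nabla I(v(s))|^2 \right] ds, \]
where $\alpha$, $\beta$, and $\gamma$ are positive coefficients. For a snake using the region information in the image based on Minimum Description Length (MDL) we have 
\[ E[\Gamma, \{\Theta_i\}] = \sum_{i=1}^{M} \left\{ \frac{\mu}{2} \oint_{\partial R_i} ds - \iint_{R_i} \log P\left(I_{(x,y)} \mid \Theta_i\right) dx\, dy + \lambda \right\}, \]
where $R_i$ is the segmented region $i$, $\Gamma$ the set of region boundaries $\partial R_i$, $\mu$ is the code length for unit arc length, $M$ is the number of regions, and $\Theta_i$ the parameters of the distribution describing the region $R_i$. $\lambda$ is the code length needed to describe the distribution and code system for region $R_i$. The external force is processed using a window of pixels around the control point. Then the equation is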
If we assume that each pixel has a multiplicative mixture distribution, we obtain
The motion equation for point is
where lies on , is the curvature, is the unit normal to at point .
As the objects of interest are simple manufactured objects, we propose to add a constraint on the shape of the snake. This allows a smooth segmentation of the image. The constrained shape is elliptical. Related works in the literature already propose to add to the energy a penalty function increasing with the distance of the curve to an ellipse [17, 18]. We propose to estimate directly the ellipse parameters and not the control points of the curve as in , but allowing for a rotation of the ellipse as in . Using the parametric expression of ellipses as a function of $t \in [0, 2\pi)$, we have
\[ x(t) = x_c + a \cos t \cos\varphi - b \sin t \sin\varphi, \qquad y(t) = y_c + a \cos t \sin\varphi + b \sin t \cos\varphi, \]
where $(x_c, y_c)$ is the ellipse center, $a$ and $b$ the half-lengths of the ellipse axes, and $\varphi$ the angle between the x-axis and the major axis of the ellipse. As we add a shape constraint, we can eliminate, respectively, the internal energy and the internal force in (9) and (15). Then (15) becomes
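The discretized contour can be generated directly from the five ellipse parameters. The sketch below uses the standard parametric form of a rotated ellipse; the parameter names (`xc`, `yc`, `a`, `b`, `phi`) and the number of control points `n` are illustrative assumptions.

```python
import numpy as np

def ellipse_points(xc, yc, a, b, phi, n=32):
    """Sample n points on an ellipse with center (xc, yc), half-axes a and b,
    and angle phi (radians) between the x-axis and the major axis."""
    t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    x = xc + a * np.cos(t) * np.cos(phi) - b * np.sin(t) * np.sin(phi)
    y = yc + a * np.cos(t) * np.sin(phi) + b * np.sin(t) * np.cos(phi)
    return np.stack([x, y], axis=1)
```

With this representation the snake has only five degrees of freedom, whatever the number of sampled points, which is what makes the shape constraint a replacement for the internal smoothness energy.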
Then, we express the evolution of the ellipse parameters if we consider a discretized curve to be an -tuple of points:
, , , and are weighting coefficients controlling the speed of the active contour. The rotation around the center of the ellipse can be found by calculating the angular momentum for a solid object, as with Newton's second law. For an object with a moment of inertia on which a torque is exerted, we have
Since for an ellipse with uniform density and mass the moment of inertia is
and the torque in any point can be calculated as
where is the force at the point normal to . We can generalise for forces applied along the contour
where is a coefficient controlling the speed of evolution of the contour including the constants and .
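One plausible discretization of such a parameter update is sketched below; it is an illustration, not the authors' exact equations. It assumes that each parameter moves in proportion to the projection of the contour forces onto the direction in which that parameter displaces the contour points, and that the rotation is driven by the total torque divided by a moment of inertia proportional to the sum of the squared half-axes (the mass and constant factors are absorbed into the gain). The function name `ellipse_step` and the gain tuple `k` are hypothetical.

```python
import numpy as np

def ellipse_step(xc, yc, a, b, phi, forces, t, k=(1.0, 1.0, 1.0, 1.0)):
    """One explicit update of the ellipse parameters.

    forces: (n, 2) force vectors applied at the contour points of parameter t (n,).
    k: gains for the center, the two half-axes and the rotation (assumed values).
    """
    k_c, k_a, k_b, k_phi = k
    forces = np.asarray(forces, dtype=float)
    u = np.array([np.cos(phi), np.sin(phi)])    # unit vector along the major axis
    v = np.array([-np.sin(phi), np.cos(phi)])   # unit vector along the minor axis
    # Lever arms from the ellipse center to each contour point.
    r = a * np.cos(t)[:, None] * u + b * np.sin(t)[:, None] * v
    d_center = k_c * forces.mean(axis=0)
    d_a = k_a * np.mean((forces @ u) * np.cos(t))
    d_b = k_b * np.mean((forces @ v) * np.sin(t))
    # z-component of the total torque: only the force normal to the lever arm contributes.
    torque = np.sum(r[:, 0] * forces[:, 1] - r[:, 1] * forces[:, 0])
    d_phi = k_phi * torque / (a ** 2 + b ** 2)
    return xc + d_center[0], yc + d_center[1], a + d_a, b + d_b, phi + d_phi
```

With forces pointing along the outward normal everywhere, the ellipse grows without translating or rotating; a purely rotational force field changes only the angle.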
In this section, we first present some performance criteria and then show the results obtained on real images.
The criterion expresses the segmentation quality:
where is the internal region of the snake, the region of the object to detect, the background region and defines the cardinality (the number of elements in set ).
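As an illustration of how region-overlap criteria of this kind can be computed from binary masks, the sketch below returns two ratios: the fraction of the object covered by the snake, and the fraction of the background wrongly included in the snake. These two definitions are an assumption chosen to match the quantities discussed in the results (percentage of the object detected and misclassification rate), not a transcription of the article's formula; the function name `segmentation_scores` is hypothetical.

```python
import numpy as np

def segmentation_scores(snake_mask, object_mask):
    """Return (detected fraction of the object, misclassified fraction of the background).

    snake_mask: boolean mask of the region inside the snake.
    object_mask: boolean ground-truth mask of the object; the rest is background.
    """
    snake = np.asarray(snake_mask, dtype=bool)
    obj = np.asarray(object_mask, dtype=bool)
    background = ~obj
    detected = np.count_nonzero(snake & obj) / max(np.count_nonzero(obj), 1)
    misclassified = np.count_nonzero(snake & background) / max(np.count_nonzero(background), 1)
    return detected, misclassified
```

Set cardinality corresponds here to counting the `True` pixels of a mask.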
3.2. Illustration of the Method on Real Image
The method has been tested on real images of underwater mines (the data used in this publication are ac-rov data recorded at Lanvéoc (France) by the GESMA (Groupe d'Etudes Sous-Marines de l'Atlantique)). Since we have only partial knowledge of the recording conditions, we cannot use an illumination model or any a priori information to restore the images. The first result presented is a mine in a seawater environment with low visibility, some acquisition noise, and compression noise; see Figure 2. We select the best conspicuity map, Figure 3, using the kurtosis criterion (8). Figure 4 shows the histograms for the conspicuity maps and the calculated kurtosis. The maximum kurtosis for this image is obtained for the Blue/Yellow map. Once the maximum on the saliency map is detected, we initialize an active contour at this position; see Figure 5. Then, after convergence of the snake, we obtain the segmentation of the object displayed in Figures 6 and 7. For this image we obtain and .
3.3. Results on Different Images
In this section, we present the segmentation results on a set of underwater images. The images have been recorded with different conditions of depth, illumination, noise, acquisition system, camera, and so forth.
The method is robust to the acquisition conditions. Even under poor lighting conditions the mine is detected; see Figure 11 for an example with insufficient light. An example with artificial light is shown in Figure 9 and another with natural light in Figure 12. The misclassification rates are very low, from 0% to 2% (see Figures 8 and 10), and we detect a minimum of 25% of the object; see Figure 14.
3.4. Comparison with Other Methods
In this section we compare our method with two alternative approaches: the approach proposed by , also developed for the same images, and the approach proposed by , which uses the maximum of activity to select the feature map; see Section 2.5.
3.4.1. First Method
The image is corrected using the following pre-processing and the segmentation is applied using RGB features:
removing aliasing effect due to digital conversion of the images,
converting color space from RGB to YCbCr,
correction of nonuniform illumination using homomorphic filtering,
anisotropic filtering to improve image segmentation,
adjusting image intensity,
converting from YCbCr to RGB,
equalizing color mean.
We obtain similar results with the method proposed in  for objects without shadows; see Figures 8 and 15. However, our method is more robust to shadow; see, for example, Figure 16, and compare with Figure 7. For some images the segmentation using active contour cannot converge as the contours are blurred and the contrast is low between the object and the background; see Figures 12 and 13. Using the conspicuity map, we can reduce this effect.
3.4.2. Second Method
We implement the method to select the feature map based on the maximum activity and use active contour segmentation. As in the previous approach, we obtain very similar results for objects with a uniform illumination and no shadows. But, as illustrated in Figures 17 and 18 compared to the results presented in Figures 6 and 7, this method focuses on the illuminated part of the object. The criterion to select the map is local compared to our criterion (8) and the selection will thus focus on detail.
3.5. Results Obtained during the Evaluation of the Project TOPVISION
The following results were obtained by applying our method to test videos for the project TOPVISION (OPerational Trials of UnderWater Videos for Identification of Harmfull Article) . Note that these videos were never used during the development of the method. The project is composed of 4 steps.
For the detection step, we only try to detect whether an object is present in the image. We use the output of the LDA classifier for the detection step; see Section 2.3. For the position finding, the position is correct if it lies inside the object. We use the maximum of the saliency map as the position; see Section 2.4. Only the first two steps have been evaluated, on 11 videos (20000 images), and we have obtained 64.77% of good object detection (and 2.82% of false alarms) and 85.28% of correct position finding. The method presented in  has been tested on the same images for the detection step and obtains 56.55% of good object detection and 0.79% of false alarms. That method is more robust in terms of false alarms, but its detection rate is lower. It uses the detection of geometric shapes to determine the object presence. This needs more computation than our method because, after the preprocessing step (see Section 3.4), it extracts lines and circles.
3.6. Limits of the Method
The method provides satisfactory results in numerous cases, but it shows some limitations when the color of the object is indistinguishable from that of the background, as illustrated in Figure 19. In that case it is difficult for the method to find the object. To obtain an acceptable result we had to remove the ellipsoidal constraint.
In this paper we have presented a fully automated method to detect and segment manufactured objects in underwater images. The method uses two well-known approaches with some original adaptations: the visual attention scheme and active contours. We have described the successive steps of the method and presented some results on real images. The method performs well even with noisy images and is robust to different acquisition conditions (illumination, camera settings, shadow, etc.). The limitations of the approach have also been discussed. Essentially, to ensure good results the object to segment must be uniform and sufficiently contrasted with respect to the background. For the future, we are developing a method to track these objects in videos based on the conspicuity map and particle filtering. Currently, we can track the object by using the snake estimated in the previous frame as the initialization in the current frame.
[1] Bazeille S, Jaulin L, Quidu I: Identification of underwater manmade object using a colour criterion. Proceedings of the 4th International Conference on Bio-Acoustics, April 2007, Loughborough, UK, Proceedings of the Institute of Acoustics
[2] Edgington D, Walther D, Cline D, Sherlock R, Koch C: Detecting and tracking animals in underwater video using a neuromorphic saliency-based attention system. Proceedings of the American Society of Limnology and Oceanography (ASLO) Summer Meeting, June 2004, Savannah, Ga, USA
[3] Olmos A, Trucco E: Detecting man-made objects in unconstrained subsea videos. Proceedings of the British Machine Vision Conference, 2002 517-526.
[4] Walther D, Edgington DR, Koch C: Detection and tracking of objects in underwater video. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004 1: 544-549.
[5] Schechner YY, Karpel N: Clear underwater vision. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004 1: 536-543.
[6] Bazeille S, Quidu I, Jaulin L, Malkasse J-P: Automatic underwater image pre-processing. Proceedings of the SEA TECH WEEK Caracterisation du Milieu Marin, October 2006, Brest, France
[7] Trucco E, Olmos-Antillon AT: Self-tuning underwater image restoration. IEEE Journal of Oceanic Engineering 2006, 31(2):511-519. 10.1109/JOE.2004.836395
[8] Itti L, Koch C, Niebur E: A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 1998, 20(11):1254-1259. 10.1109/34.730558
[9] Fukunaga K: Introduction to Statistical Pattern Recognition. Academic Press, San Diego, Calif, USA; 1990.
[10] Kass M, Witkin A, Terzopoulos D: Snakes: active contour models. International Journal of Computer Vision 1988, 1(4):321-331. 10.1007/BF00133570
[11] Xu C, Prince J: Gradient vector flow: a new external force for snakes. Proceedings of the Conference on Computer Vision and Pattern Recognition, June 1997, San Juan, Puerto Rico, USA 66-71.
[12] Zhu SC, Yuille A: Region competition: unifying snakes, region growing, and Bayes/MDL for multiband image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 1996, 18(9):884-900. 10.1109/34.537343
[13] Sethian JA: Evolution, implementation, and application of level set and fast marching methods for advancing fronts. Journal of Computational Physics 2001, 169(2):503-555. 10.1006/jcph.2000.6657
[14] Girolami M, Fyfe C: Negentropy and kurtosis as projection pursuit indices provide generalised ICA algorithms. Advances in Neural Information Processing Systems 1996.
[15] Friedman JH, Tukey JW: A projection pursuit algorithm for exploratory data analysis. IEEE Transactions on Computers 1974, 23(9):881-890.
[16] Walther D, Rutishauser U, Koch C, Perona P: Selective visual attention enables learning and recognition of multiple objects in cluttered scenes. Computer Vision and Image Understanding 2005, 100(1-2):41-63. 10.1016/j.cviu.2004.09.004
[17] Ray N, Acton ST, Ley K: Tracking leukocytes in vivo with shape and size constrained active contours. IEEE Transactions on Medical Imaging 2002, 21(10):1222-1235. 10.1109/TMI.2002.806291
[18] Pluempitiwiriyawej C, Moura JMF, Wu Y-JL, Kanno S, Ho C: Stochastic active contour for cardiac MR image segmentation. Proceedings of the International Conference on Image Processing (ICIP '03), September 2003 2: 1097-1100.
[19] Zugaj D, Lattuati V: Contours actifs: suivi de structures linéiques déformables. Extension à la résolution de problèmes d'optimisation sous contraintes. Proceedings of the Congrès GRETSI 97: Seizième Colloque sur le Traitement du Signal et des Images, 1997, Grenoble, France 16: 407-410.
The authors would like to thank V. Zarzoso for the interesting discussions on this work.
About this article
Cite this article
Barat, C., Phlypo, R. A Fully Automated Method to Detect and Segment a Manufactured Object in an Underwater Color Image. EURASIP J. Adv. Signal Process. 2010, 568092 (2010). https://doi.org/10.1155/2010/568092
- Linear Discriminant Analysis
- Visual Attention
- Active Contour
- Minimum Description Length
- Wavelet Denoising