A Fully Automated Method to Detect and Segment a Manufactured Object in an Underwater Color Image
© C. Barat and R. Phlypo. 2010
Received: 1 July 2009
Accepted: 2 February 2010
Published: 23 March 2010
We propose a fully automated active-contour-based method for the detection and segmentation of a moored manufactured object in an underwater image. Detecting objects in underwater images is difficult due to variable lighting conditions and shadows on the object. The proposed technique is based on the information contained in the color maps and combines the visual attention method with a statistical approach for the detection and an active contour for the segmentation of the object, overcoming the above problems. In the classical active contour method, the region descriptor is fixed and the convergence of the method depends on the initialization. Our approach removes this dependence by initializing the contour from the visual attention results and by selecting the best region descriptor with a dedicated criterion. This improves convergence and processing time while retaining the advantages of a fully automated method.
The method comprises two steps: object detection, followed by object segmentation if an object is detected.
For detecting or tracking objects in underwater images, a preprocessing step is generally applied to enhance the images. For example, in [6], homomorphic filtering is used to correct the illumination, together with wavelet denoising, anisotropic filtering to improve the segmentation, intensity adjustment, and other processing. A self-tuning image restoration filter is applied in [7], but the illumination is assumed to be uniform, which is a restrictive hypothesis. In [4], the constant background features are estimated for each frame by computing the sliding average over the ten preceding frames, and the average is subtracted from the frame. This preprocessing cannot be applied to our images: because our objects are large and move slowly in the image, the average would contain the object. Moreover, we want to detect an object in a single image. The main problem encountered in our images is shadow, a problem which is not discussed in previous works. We propose to use the visual attention approach [2, 8] with some modifications. Once the saliency map is extracted (using the visual attention method), we apply a Linear Discriminant Analysis (LDA) [9] to estimate the probability of object presence. For the second step, we use an active contour. Since the publication of the work of Kass et al. [10], extensive research on snakes (active contours) has been carried out for image segmentation. The early approaches [10, 11] minimized an energy function to move the active contour toward the object's edges. In [12], region information has been used to overcome some problems inherent to the edge-based approach. In this paper we put forward a region-based snake method in the Minimum Description Length (MDL) framework, well adapted to our problem of segmenting uniformly colored objects. Since we require low processing times, we use an explicit (polygonal) snake; the implicit (level-set) approach needs more computation time even with the fast marching method [13].
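For reference, the sliding-average background subtraction described above can be sketched as follows (a minimal numpy version; the frame-handling details are our assumptions, not code from the cited work):

```python
import numpy as np

def subtract_sliding_background(frames, window=10):
    """Subtract from each frame the mean of its `window` preceding frames,
    so that static background structure cancels out.  This fails when the
    object is large and nearly static, since it then pollutes the average."""
    residuals = []
    for i in range(window, len(frames)):
        background = np.mean(frames[i - window:i], axis=0)
        residuals.append(frames[i] - background)
    return residuals
```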
The problem encountered in the classical snake method is the dependence on the contour initialization. We propose to use the visual-attention-based method to find the region of interest to segment in the image (there is generally only one manufactured object to be found in underwater images). Moreover, in the classical region-based snake approach, the region descriptor is fixed for the whole video sequence, while in this paper we propose to select the best region descriptor for each image. The idea is to use the information extracted by the visual attention step to select, based on a kurtosis criterion, the region descriptor for each image to segment. This approach reduces the shadow effect during the segmentation step. The methods are presented in Section 2; our approach is developed in Sections 2.4 and 2.5. Experimental results on real images are reported in Section 3.2. The proposed method consists of the following steps:
extraction of a saliency map using visual attention,
detection of the most salient part of the map,
classification in class "object" or "no object",
selection of a region descriptor,
initialization of the snake on the most salient part of the map,
segmentation of the image by the snake.
In what follows we supply the details for each of the above subparts.
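The pipeline above can be sketched end to end. The following is a minimal, self-contained toy version in which the saliency map, the "LDA" test, and the segmentation are simple stand-ins, not the actual operators of Sections 2.1–2.6:

```python
import numpy as np

def detect_and_segment(image, detect_thresh=0.5):
    """Toy pipeline: saliency -> most salient pixel -> detection test ->
    segmentation around the peak.  All operators are simplified stand-ins."""
    saliency = image / (image.max() + 1e-9)             # stand-in saliency map
    peak = np.unravel_index(np.argmax(saliency), saliency.shape)
    if saliency[peak] < detect_thresh:                  # stand-in detection test
        return None                                     # no object detected
    mask = saliency > 0.5 * saliency[peak]              # stand-in segmentation
    return peak, mask
```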
2.1. Visual Attention
From the r, g, and b components of the image, an intensity image I = (r + g + b)/3 is computed, and four color channels are created:

R = r − (g + b)/2,  G = g − (r + b)/2,  B = b − (r + g)/2,  Y = (r + g)/2 − |r − g|/2 − b.
One Gaussian pyramid I(σ) is estimated from the intensity image I, where σ is the scale. Four Gaussian pyramids R(σ), G(σ), B(σ), and Y(σ) are created from these color channels. From I, four orientation-selective pyramids O(σ, θ) are also created using Gabor filtering at θ = 0, 45, 90, and 135 deg. Feature maps for each pyramid are calculated using the center-surround operation, as the difference (pixel by pixel) between a fine scale c and a coarse scale s:

I(c, s) = |I(c) ⊖ I(s)|,
where ⊖ is the center-surround operation: the difference between maps at different scales is obtained by interpolating the coarser map to the finer scale, followed by a pixel-by-pixel subtraction. For the color channels, the feature maps are calculated for Green/Red and Blue/Yellow opponency:

RG(c, s) = |(R(c) − G(c)) ⊖ (G(s) − R(s))|,
BY(c, s) = |(B(c) − Y(c)) ⊖ (Y(s) − B(s))|.
Forty-two feature maps are finally obtained: 6 for intensity, 12 for color, and 24 for orientation. Then, we calculate the conspicuity maps by linear combination of the feature maps at a common scale, followed by a normalization between 0 and 1. Compared to the classical approach, the normalization by the maximum of each feature map is not applied here; see [8]. This normalization is not needed since our subsequent processing step is invariant to scale; see Section 2.5. Finally, we calculate the conspicuity color map and the saliency map.
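The Gaussian pyramid and the center-surround difference can be illustrated as follows (a simplified stand-in using `scipy.ndimage` rather than the exact filters of [8]; the fixed blur width `sigma=1.0` is an assumption):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def gaussian_pyramid(img, levels=5):
    """Dyadic Gaussian pyramid: blur then downsample by 2 at each level."""
    pyr = [img.astype(float)]
    for _ in range(levels - 1):
        pyr.append(gaussian_filter(pyr[-1], sigma=1.0)[::2, ::2])
    return pyr

def center_surround(pyr, c, s):
    """Feature map |P(c) (-) P(s)|: interpolate the coarse map up to the
    fine scale, then subtract pixel by pixel, as described above."""
    fine, coarse = pyr[c], pyr[s]
    up = zoom(coarse, (fine.shape[0] / coarse.shape[0],
                       fine.shape[1] / coarse.shape[1]), order=1)
    return np.abs(fine - up)
```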
2.2. Detection of the Most Salient Part
In this step we select only the most salient pixel of the saliency map. The winner-take-all (WTA) neural network of the classical model is not used, since we have only one object to detect.
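In code, this selection reduces to a single arg-max over the saliency map:

```python
import numpy as np

def most_salient_pixel(saliency):
    """Return the (row, col) coordinates of the single most salient pixel."""
    return np.unravel_index(np.argmax(saliency), saliency.shape)
```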
2.3. Classification
A classic Linear Discriminant Analysis (LDA) [9] is used to classify the area around the maximum detected in the previous step (Section 2.2) as "object" (1) or "no object" (0). As this is a supervised approach, the parameters of the model are estimated on reference videos. An object is considered detected if the LDA classifier assigns the area to the class "object".
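The classification step can be sketched with a plain Fisher discriminant. The feature vectors below are synthetic stand-ins; the paper's actual features and training videos are not reproduced:

```python
import numpy as np

def fit_lda(X0, X1):
    """Fisher discriminant: w = Sw^{-1} (mu1 - mu0), threshold midway
    between the projected class means."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)
    w = np.linalg.solve(Sw, mu1 - mu0)
    b = w @ (mu0 + mu1) / 2.0
    return w, b

def lda_predict(w, b, x):
    """Return 1 ("object") if the projected sample lies past the threshold."""
    return int(np.asarray(x) @ w > b)

# Synthetic training data standing in for features from reference videos.
rng = np.random.default_rng(0)
X_bg = rng.normal(0.0, 1.0, size=(50, 3))   # class "no object"
X_obj = rng.normal(2.0, 1.0, size=(50, 3))  # class "object"
w, b = fit_lda(X_bg, X_obj)
```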
2.4. Initialization by Visual Attention
The idea is to use the saliency map to initialize the active contour. Generally, in our application, the underwater image is composed of only one object in a dark and noisy background. The visual attention scheme is well-adapted to these particular images where a single region of interest is contrasted with a background. We propose to use only the most salient part of the image to initialize the position of the active contour.
2.5. Adaptive Region Descriptor for Active Contour
Usually the choice of the region descriptor depends on the application and is fixed a priori. In this work, we introduce a data-driven region descriptor; the information needed to choose it is deduced from the saliency map. Once the most salient part has been detected, we can select the most informative conspicuity map. Since we are dealing with illuminated objects superposed on a dark background, an informative map is one whose pixels are easily classifiable into an object class or a background (nonobject) class. In other words, its probability density function is ideally unimodal, with a mode at the background value and the object constituting the tail of the distribution, whereas a noninformative map has a more uniform distribution of pixel values. A well-adapted criterion to differentiate an informative from a noninformative map is the entropy of the pixel map, which, under some general assumptions, is related to the more easily computed normalized kurtosis of the map [14, 15]. The pixel map of choice is the conspicuity map C with maximal normalized kurtosis

κ(C) = E[(x − x̄)⁴] / E[(x − x̄)²]² − 3,

where x is the pixel intensity of a pixel belonging to the conspicuity map C and x̄ its mean. In [16], the conspicuity map that contributes most to the activity at the most salient location is found by looking back from the saliency map. Examining the feature maps that gave rise to this conspicuity map leads to the one that contributes most to its activity at the winning location. The winning feature map is then segmented using region growing and adaptive thresholding. In Section 3.4, we compare this approach with ours and show the improvement.
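The kurtosis-based selection can be sketched as follows (the map names are illustrative); a heavy-tailed, peaked pixel distribution signals a dark background with a compact bright object:

```python
import numpy as np

def normalized_kurtosis(conspicuity_map):
    """kappa(C) = E[(x - mean)^4] / E[(x - mean)^2]^2 - 3 over the pixels."""
    x = conspicuity_map.ravel() - conspicuity_map.mean()
    return np.mean(x**4) / np.mean(x**2) ** 2 - 3.0

def select_map(conspicuity_maps):
    """Pick the map whose pixel distribution is the most heavy-tailed."""
    return max(conspicuity_maps,
               key=lambda name: normalized_kurtosis(conspicuity_maps[name]))
```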
2.6. Active Contour
An active contour (snake) is a curve v(s) that segments an object by moving through the spatial domain of an image so as to minimize its total energy. The total energy consists of two terms: the internal energy E_int and the external energy E_ext. The internal energy preserves smoothness and continuity; the external energy is derived from the information in the image I. A typical energy function for a snake using image gradients is

E = ∫ [ α |v′(s)|² + β |v″(s)|² − γ |∇I(v(s))|² ] ds,
where α, β, and γ are positive coefficients. For a snake using the region information in the image based on Minimum Description Length (MDL) we have, following [12],

E = Σ_i [ (μ/2) ∮_{∂R_i} ds − ∬_{R_i} log P(I(x, y) | θ_i) dx dy + λ ],

where the R_i are the image regions with boundaries ∂R_i, P(· | θ_i) is the statistical model of region i, and μ and λ are positive constants.
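A region-based energy of this kind can be sketched for a binary labelling, with a Gaussian code per region plus a boundary-length penalty (`mu_len` is an illustrative weight, not a value from the paper):

```python
import numpy as np

def region_energy(image, mask, mu_len=1.0):
    """MDL-style energy: negative Gaussian log-likelihood of each region's
    pixels plus a penalty proportional to the boundary length."""
    energy = 0.0
    for region in (mask, ~mask):
        pix = image[region]
        if pix.size:
            var = pix.var() + 1e-9
            # coding cost of the region's pixels under its Gaussian model
            energy += 0.5 * pix.size * np.log(2 * np.pi * var) + 0.5 * pix.size
    # boundary length approximated by label changes between 4-neighbours
    length = np.count_nonzero(mask[1:, :] != mask[:-1, :]) \
           + np.count_nonzero(mask[:, 1:] != mask[:, :-1])
    return energy + mu_len * length
```

A labelling matching the true object boundary yields near-zero variance in each region, hence a much lower energy than a misplaced one.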
As the objects of interest are simple manufactured objects, we propose to add a constraint on the snake shape; this allows a smooth segmentation of the image. The constrained shape is ellipsoidal. Related works in the literature already propose to add to the energy a penalty function increasing with the distance of the curve to an ellipse [17, 18]. We propose to estimate directly the ellipse parameters rather than the control points of the curve as in [17], while allowing for a rotation of the ellipse as in [18]. Using the parametric expression of an ellipse as a function of the parameter t, we have

x(t) = x_c + a cos(t) cos(θ) − b sin(t) sin(θ),
y(t) = y_c + a cos(t) sin(θ) + b sin(t) cos(θ),
where (x_c, y_c) is the ellipse center, a and b are the half-lengths of the ellipse axes, and θ is the angle between the x-axis and the major axis of the ellipse. As we add this shape constraint, we can eliminate, respectively, the internal energy and the internal force in (9) and (15); (15) then reduces to its external force term.
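The parametric form above translates directly into code (a small sketch):

```python
import numpy as np

def ellipse_points(xc, yc, a, b, theta, n=64):
    """Sample n points on the rotated ellipse with centre (xc, yc),
    half-axes a and b, and rotation angle theta."""
    t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    x = xc + a * np.cos(t) * np.cos(theta) - b * np.sin(t) * np.sin(theta)
    y = yc + a * np.cos(t) * np.sin(theta) + b * np.sin(t) * np.cos(theta)
    return x, y
```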
The force-weighting coefficients control the speed of the active contour. The rotation around the center of the ellipse can be found by calculating the angular momentum as for a solid object, using Newton's second law: for an object with moment of inertia J on which a torque Γ is exerted, we have Γ = J d²θ/dt².
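As an illustration, integrating Γ = J d²θ/dt² with a simple semi-implicit Euler scheme under a constant torque recovers the expected quadratic angle growth θ(t) ≈ Γt²/(2J); the step size and values below are arbitrary:

```python
def integrate_rotation(torque, J, dt, steps, theta0=0.0, omega0=0.0):
    """Semi-implicit Euler integration of J * theta'' = torque."""
    theta, omega = theta0, omega0
    for _ in range(steps):
        omega += (torque / J) * dt   # angular velocity update
        theta += omega * dt          # angle update
    return theta
```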
In this section, we first present some performance criteria and then show the results obtained on real images.
The criterion expresses the segmentation quality.
3.2. Illustration of the Method on a Real Image
3.3. Results on Different Images
In this section, we present the segmentation results on a set of underwater images recorded under different conditions of depth, illumination, noise, acquisition system, camera, and so forth.
3.4. Comparison with Other Methods
In this section we compare our method with two alternative approaches: the approach proposed in [1], also developed for the same images, and the approach proposed in [16], which uses the maximum of activity to select the feature map; see Section 2.5.
3.4.1. First Method
removing the aliasing effect due to digital conversion of the images,
converting the color space from RGB to YCbCr,
correcting nonuniform illumination using homomorphic filtering,
applying anisotropic filtering to improve image segmentation,
adjusting the image intensity,
converting from YCbCr back to RGB,
equalizing the color mean.
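The homomorphic illumination correction in the pipeline above can be sketched as follows; the log-domain low-pass here is a Gaussian blur, and `sigma` is an assumed illustrative value:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def homomorphic_correct(img, sigma=8.0):
    """Homomorphic filtering sketch: in the log domain, illumination is the
    low-frequency component; subtracting a blurred version of the log image
    removes the slowly varying illumination, keeping the reflectance."""
    log_img = np.log1p(img.astype(float))
    illumination = gaussian_filter(log_img, sigma=sigma)
    return np.expm1(log_img - illumination + illumination.mean())
```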
3.4.2. Second Method
3.5. Results Obtained during the Evaluation of the Project TOPVISION
The results presented next were obtained with our method applied to the test videos of the project TOPVISION (OPerational Trials of UnderWater Videos for Identification of Harmful Article) [20]. Note that these videos were never used during the development of the method. The project is composed of four steps.
For the detection step, we only try to detect whether an object is present in the image, using the output of the LDA classifier; see Section 2.3. For the position finding, the position is correct if it lies inside the object; we use the maximum of the saliency map as the position (Section 2.4). Only the first two steps have been evaluated, on 11 videos (about 20,000 images): we obtained 64.77% good object detection (with 2.82% false alarms) and 85.28% correct position finding. The method presented in [1] has been tested on the same images for the detection step and obtains 56.55% good object detection with 0.79% false alarms. It is more robust in terms of false alarms, but its detection rate is lower. That method uses the detection of geometric shapes to determine object presence, which requires more computation than our method because, after the preprocessing step (see Section 3.4), it extracts lines and circles.
3.6. Limits of the Method
In this paper we have presented a fully automated method to detect and segment manufactured objects in underwater images. The method combines two well-known approaches with some original adaptations: the visual attention scheme and active contours. We have described the successive steps of the method and presented results on real images. The method performs well even on noisy images and is robust to different acquisition conditions (illumination, camera settings, shadow, etc.). The limitations of the approach have also been discussed: essentially, to ensure good results, the object to segment must be uniform and sufficiently contrasted with respect to the background. In the future, we plan to track these objects in videos using the conspicuity map and particle filtering. Currently, we can track the object by using the snake estimated in the previous frame as the initialization in the current frame.
The authors would like to thank V. Zarzoso for the interesting discussions on this work.
- Bazeille S, Jaulin L, Quidu I: Identification of underwater manmade object using a colour criterion. Proceedings of the 4th International Conference on Bio-Acoustics, April 2007, Loughborough, UK, Proceedings of the Institute of Acoustics.
- Edgington D, Walther D, Cline D, Sherlock R, Koch C: Detecting and tracking animals in underwater video using a neuromorphic saliency-based attention system. Proceedings of the American Society of Limnology and Oceanography (ASLO) Summer Meeting, June 2004, Savannah, Ga, USA.
- Olmos A, Trucco E: Detecting man-made objects in unconstrained subsea videos. Proceedings of the British Machine Vision Conference, 2002, 517-526.
- Walther D, Edgington DR, Koch C: Detection and tracking of objects in underwater video. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004, 1: 544-549.
- Schechner YY, Karpel N: Clear underwater vision. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004, 1: 536-543.
- Bazeille S, Quidu I, Jaulin L, Malkasse J-P: Automatic underwater image pre-processing. Proceedings of the SEA TECH WEEK, Caractérisation du Milieu Marin, October 2006, Brest, France.
- Trucco E, Olmos-Antillon AT: Self-tuning underwater image restoration. IEEE Journal of Oceanic Engineering 2006, 31(2): 511-519. doi:10.1109/JOE.2004.836395
- Itti L, Koch C, Niebur E: A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 1998, 20(11): 1254-1259. doi:10.1109/34.730558
- Fukunaga K: Introduction to Statistical Pattern Recognition. Academic Press, San Diego, Calif, USA; 1990.
- Kass M, Witkin A, Terzopoulos D: Snakes: active contour models. International Journal of Computer Vision 1988, 1(4): 321-331. doi:10.1007/BF00133570
- Xu C, Prince J: Gradient vector flow: a new external force for snakes. Proceedings of the Conference on Computer Vision and Pattern Recognition, June 1997, San Juan, Puerto Rico, USA, 66-71.
- Zhu SC, Yuille A: Region competition: unifying snakes, region growing, and Bayes/MDL for multiband image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 1996, 18(9): 884-900. doi:10.1109/34.537343
- Sethian JA: Evolution, implementation, and application of level set and fast marching methods for advancing fronts. Journal of Computational Physics 2001, 169(2): 503-555. doi:10.1006/jcph.2000.6657
- Girolami M, Fyfe C: Negentropy and kurtosis as projection pursuit indices provide generalised ICA algorithms. Advances in Neural Information Processing Systems 1996.
- Friedman JH, Tukey JW: A projection pursuit algorithm for exploratory data analysis. IEEE Transactions on Computers 1974, 23(9): 881-890.
- Walther D, Rutishauser U, Koch C, Perona P: Selective visual attention enables learning and recognition of multiple objects in cluttered scenes. Computer Vision and Image Understanding 2005, 100(1-2): 41-63. doi:10.1016/j.cviu.2004.09.004
- Ray N, Acton ST, Ley K: Tracking leukocytes in vivo with shape and size constrained active contours. IEEE Transactions on Medical Imaging 2002, 21(10): 1222-1235. doi:10.1109/TMI.2002.806291
- Pluempitiwiriyawej C, Moura JMF, Wu Y-JL, Kanno S, Ho C: Stochastic active contour for cardiac MR image segmentation. Proceedings of the International Conference on Image Processing (ICIP '03), September 2003, 2: 1097-1100.
- Zugaj D, Lattuati V: Contours actifs: suivi de structures linéiques déformables. Extension à la résolution de problèmes d'optimisation sous contraintes. Proceedings of the Congrès GRETSI 97: Seizième Colloque sur le Traitement du Signal et des Images, 1997, Grenoble, France, 16: 407-410.
- GESMA http://topvision.gesma.fr/articles.php?lng=en%26pg=6
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.