# Weakly supervised object extraction with iterative contour prior for remote sensing images

- Chu He^{1, 2} (corresponding author)
- Yu Zhang^{1}
- Bo Shi^{1}
- Xin Su^{3}
- Xin Xu^{1}
- Mingsheng Liao^{2}

**2013**:19

https://doi.org/10.1186/1687-6180-2013-19

© He et al.; licensee Springer. 2013

**Received: **28 December 2011

**Accepted: **5 November 2012

**Published: **13 February 2013

## Abstract

This article presents a weakly supervised approach based on a Markov random field model for the extraction of objects (e.g., aircraft) in optical remote sensing images. The approach localizes and then segments objects in optical remote sensing images by relying only on several object samples without artificial labels. Unlike direct combinations of object detection and segmentation, the proposed method develops a contour prior model based on the detection results, thereby improving segmentation performance. Furthermore, we iteratively update the contour prior information based on the expectation-maximization algorithm. Numerical experiments illustrate that the proposed method can successfully be applied to the extraction of aircraft in optical remote sensing images.

## 1 Introduction

Object detection and segmentation have received considerable attention as important procedures in automatic object identification in fields such as computer vision and remote sensing image processing. Across the large number of works in which object detection and segmentation have been applied, a key distinction between the two can be found: object segmentation is usually interactive and incorporates guidance from the user throughout the analysis process, as in GraphCut [1] and Snake [2], whereas object detection needs learning samples and/or supervising information from the user at the beginning of the analysis, as in latent support vector machines (LSVMs) [3] and Wu et al.’s Active Basis [4]. Nevertheless, object detection and object segmentation share numerous theoretical and methodological features, which, if explored, can benefit both. In this article, object detection results based on Active Basis [4] replace the supervised learning samples in object segmentation, so that segmentation results can be obtained by providing only several object samples. Furthermore, this combination employs a contour prior model based on the detection results, thereby improving the segmentation performance.

From a methodological perspective, the numerous methods that have recently been used for object detection and segmentation can be divided into shape-based methods and feature-based methods. Shape-based methods, such as Felzenszwalb et al.’s LSVMs [3], Wu et al.’s Active Basis [4], Laptev et al.’s Snake [2], and Ferrari’s kAS [5], exploit shape similarities between objects by using different strategies and then obtain results by connecting the segments. Shape-based methods run automatically without the need for human assistance. However, these methods rarely yield complete segmentation results. Feature-based methods, such as Cheng et al.’s [6] hierarchical lane detection system, Hassaballah et al.’s [7] independent components analysis, Borenstein and Ullman’s [8] top-down bottom-up segmentation, Weisenssel et al.’s [9] Markov random field (MRF) model-based method, and Jia and Hong-qi’s [10] interactive segmentation based on graph cuts, utilize different representations of image pixels or regions (color or texture features, distribution models) to distinguish objects from the background. However, these methods all require human assistance or strongly supervised information.

Building on Active Basis [4], we obtain detection results by using Morph-ActiveBasis [11]. This article combines object detection (Morph-ActiveBasis [11]) with object segmentation (MRF) and uses this combination to propose a contour prior model that improves segmentation performance, as detailed in Section 3.

## 2 Morph-ActiveBasis: from fragments to rough contours

Morph-ActiveBasis, which is presented in [11], is based on Active Basis. Morph-ActiveBasis can determine the basic edge contours of objects from a set of object samples without the need for artificial labels and can detect similar objects in given images. Unlike the scattered fragments obtained by using the original Active Basis, Morph-ActiveBasis employs fragment connection to link scattered fragments thus forming a sketch of the object contour (for details, see [11]).

### 2.1 Fragment detection

The Active Basis [4] model is utilized to detect the basic edge contours of objects. Active Basis represents contours through a set of Gabor wavelet bases [12, 13]. Moreover, Active Basis does not require human guidance and can automatically detect objects in the image. However, the detected results are only scattered fragments that cannot represent the complete contour of an object. Therefore, we propose the use of fragment connection to link the scattered fragments, thus forming a sketch of the object contour.

### 2.2 Fragment connection

The principle of fragment connection in [11] is based on the structure information among fragments.

### 2.3 Rough contours extraction

$s_i$ is the $i$th pixel in the image. The segmentation assigns each of the $N_I$ pixels a label $y_i\in\{0,1\}$, where 0 and 1 represent objects and backgrounds, respectively; the segmentation result is thus $Y=\{y_1,y_2,\dots,y_{N_I}\}$.

Here, $\Theta=\{\mu,\sigma\}$ is the parameter set of the Gaussian distribution, where $\mu$ and $\sigma$ represent the mean and variance of the Gaussian distribution, respectively. The likelihood probability $p_{L_i}$ is obtained from the Gaussian distribution models of the objects and the background, as shown in Equation (6). $P_{N_I}$ denotes the Potts model.
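As an illustration of the Gaussian likelihood described above, the following minimal sketch computes the posterior probability that a pixel belongs to the object class from two one-dimensional Gaussian models. It is not the article's Equation (6): the function names, the equal class prior `prior_obj`, and the numeric parameters are assumptions chosen only for the example.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def posterior_object(x, theta_obj, theta_bg, prior_obj=0.5):
    """Posterior probability that a pixel of intensity x belongs to the object.

    theta_obj and theta_bg are (mean, variance) pairs. prior_obj is an assumed
    class prior; the article does not specify one.
    """
    like_obj = gaussian_pdf(x, *theta_obj) * prior_obj
    like_bg = gaussian_pdf(x, *theta_bg) * (1.0 - prior_obj)
    return like_obj / (like_obj + like_bg)


# Hypothetical parameters: bright objects on a darker background.
theta_obj = (200.0, 400.0)
theta_bg = (80.0, 900.0)
for intensity in (70, 140, 210):
    print(intensity, posterior_object(intensity, theta_obj, theta_bg))
```

Intensities near the object mean receive a posterior close to 1, and intensities near the background mean receive a posterior close to 0.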

## 3 Expectation-maximization (EM) contour MRF: from rough contours to further segmentation

In this section, we present a contour prior model of objects to improve segmentation performance under the MRF model framework. The idea of this contour prior model is to assign pixels located inside an object contour a higher probability of being labeled *object*, and pixels outside the contour a higher probability of being labeled *background*. The probability of the contour prior is based on the distance between each pixel and its nearest contour point.

### 3.1 Contour prior information based on rough contours

$\Omega$ is the initial contour of the object; $s_{01}$, $s_{02}$, $s_{11}$, $s_{12}$ are four arbitrary pixels in the image ($s_{01}$, $s_{02}$ are located inside $\Omega$, whereas $s_{11}$, $s_{12}$ are outside $\Omega$); and $d_{01}$, $d_{02}$, $d_{11}$, $d_{12}$ denote the distances from $s_{01}$, $s_{02}$, $s_{11}$, $s_{12}$ to $\Omega$, respectively. Inside the contour, a pixel that is located farther from the contour is more likely to become an object. However, outside the contour, a pixel that is farther from the contour has a higher probability of becoming background. For instance, in Figure 1a, $s_{02}$ is more likely to become an object as compared with $s_{01}$ because $d_{01}<d_{02}$, whereas $s_{12}$ is more likely to become background as compared with $s_{11}$ because $d_{11}<d_{12}$.

$\mu$ is a constant that indicates the distance coefficient; $\mathrm{sign}(s_i,\Omega)=1$ when $s_i$ is inside $\Omega$ and $\mathrm{sign}(s_i,\Omega)=-1$ when $s_i$ is outside $\Omega$; and $\mathrm{loc}(s)$ is the coordinate position of pixel $s$ in the image.
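The distance-based prior described in this subsection can be sketched as follows. The exact form of Equations (3)–(5) is not reproduced here; the logistic mapping from signed distance to probability, the square stand-in contour, and all function names are assumptions made only to show the qualitative behavior (deeper inside the contour gives a higher object probability, farther outside gives a lower one).

```python
import math


def square_contour(lo, hi):
    """Boundary pixels of an axis-aligned square, standing in for a detected contour."""
    points = set()
    for k in range(lo, hi + 1):
        points.update({(lo, k), (hi, k), (k, lo), (k, hi)})
    return sorted(points)


def sign(pixel, lo, hi):
    """+1 strictly inside the square contour, -1 otherwise."""
    x, y = pixel
    return 1 if lo < x < hi and lo < y < hi else -1


def signed_distance(pixel, contour, lo, hi):
    """sign(s, contour) times the distance from loc(s) to the nearest contour point."""
    d = min(math.dist(pixel, c) for c in contour)
    return sign(pixel, lo, hi) * d


def contour_prior(pixel, contour, lo, hi, mu=0.5):
    """Assumed logistic prior: probability of 'object' grows with signed distance.

    mu plays the role of the distance coefficient.
    """
    return 1.0 / (1.0 + math.exp(-mu * signed_distance(pixel, contour, lo, hi)))


contour = square_contour(2, 8)
for s in [(5, 5), (3, 5), (9, 5), (12, 5)]:
    print(s, contour_prior(s, contour, 2, 8))
```

In this toy setting the pixel (5, 5) plays the role of $s_{02}$ and (3, 5) that of $s_{01}$, while (9, 5) and (12, 5) correspond to $s_{11}$ and $s_{12}$.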

### 3.2 Segmentation based on the MRF model and on EM iteration

To obtain better segmentation results from the contour prior information, we combine the EM algorithm [17] with GraphCut optimization. We apply the segmentation result to update the parameters of the Gaussian distribution model and then utilize the new distribution parameters to initialize GraphCut optimization. The E- and M-steps are as follows:

$y_i$ is computed based on the current parameters $\Theta^{(t)}=\{\mu^{(t)},\sigma^{(t)}\}$ of the Gaussian distribution, where $\mu$ and $\sigma$ represent the mean and variance of the Gaussian distribution, respectively, and is expressed as follows:

$p_{s_i}$ and $P_{N_I}$, which is a pairwise probability in the MRF. The probability $p_{L_i}$ obtained in the E-step is combined with the contour prior $p_{s_i}$ of the object and the prior probability $P_{N_I}$ of the Potts model to obtain the MRF model in Equations (7) and (8). The GraphCut optimization algorithm is then applied to solve Equations (7) and (8), thus obtaining the new labeling data $Y$. The new Gaussian distribution parameters $\Theta^{(t+1)}=\{\mu^{(t+1)},\sigma^{(t+1)}\}$ of objects and background are computed, and the shape $\Omega^{(t+1)}$ of the object is updated. The EM algorithm repeats the E- and M-steps until the convergence condition is satisfied.

The right-hand side of Equation (8) consists of three parts, which represent $p_{L_i}$, $p_{s_i}$, and $P_{N_I}$, respectively. The first two parts represent the Gaussian probability and the contour prior; the last one is the Potts prior model. $C_i$ is the set of cliques of pixel $s_i$. We use eight-neighborhood cliques and define the potential (energy) function for these cliques through the Gibbs distribution, which is computed by the Potts prior model. $d(s_i,\Omega^{t-1})$ is the distance from $s_i$ to the contour $\Omega^{t-1}$, and $\alpha$, $\beta$, $\gamma$ are the normalized parameters.
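To make the three-part structure concrete, the sketch below evaluates an energy for a candidate labeling as a weighted sum of a negative log-likelihood term, a negative log contour-prior term, and a Potts penalty over the eight-neighborhood. This is an assumed form for illustration, not a transcription of Equations (7) and (8); the weights `alpha`, `beta`, `gamma` and the helper names are hypothetical.

```python
import math

OBJECT, BACKGROUND = 0, 1  # label convention of Section 2.3
NEIGHBORS_8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def potts_penalty(labels, r, c):
    """Number of 8-neighbors whose label differs from that of pixel (r, c)."""
    rows, cols = len(labels), len(labels[0])
    count = 0
    for dr, dc in NEIGHBORS_8:
        rr, cc = r + dr, c + dc
        if 0 <= rr < rows and 0 <= cc < cols and labels[rr][cc] != labels[r][c]:
            count += 1
    return count


def label_probability(p_object, y, eps=1e-9):
    """Probability of label y given the object probability, clamped away from 0."""
    p = p_object if y == OBJECT else 1.0 - p_object
    return min(max(p, eps), 1.0)


def energy(labels, p_like, p_contour, alpha=1.0, beta=1.0, gamma=0.5):
    """Assumed energy: data term + contour term + Potts term.

    p_like[r][c] and p_contour[r][c] are probabilities of the OBJECT label.
    Each disagreeing neighbor pair is counted once from each side.
    """
    total = 0.0
    for r, row in enumerate(labels):
        for c, y in enumerate(row):
            total -= alpha * math.log(label_probability(p_like[r][c], y))
            total -= beta * math.log(label_probability(p_contour[r][c], y))
            total += gamma * potts_penalty(labels, r, c)
    return total


p_like = [[0.8] * 3 for _ in range(3)]
p_contour = [[0.7] * 3 for _ in range(3)]
smooth = [[OBJECT] * 3 for _ in range(3)]
noisy = [row[:] for row in smooth]
noisy[1][1] = BACKGROUND  # one isolated background pixel
print(energy(smooth, p_like, p_contour), energy(noisy, p_like, p_contour))
```

The isolated background pixel raises the energy through all three terms, which is the behavior a minimizer such as GraphCut would exploit to remove it.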

## 4 Flow of the proposed algorithm

The algorithm first obtains the initial contour $\Omega^{0}$ of the object based on the Active Basis model. It then constructs the MRF on the initial contour of the object and adopts an iterative optimization approach combined with the EM algorithm to achieve improved object segmentation results. The algorithm proceeds as follows:

### Initialization

Use the Morph-ActiveBasis method to obtain the initial contour $\Omega^{0}$ of the objects.

**Step 1:** E-step: (a) Use Equations (3)–(5) to compute the shape prior $p_{s_i}$ of each pixel in the testing image based on the current contour $\Omega^{t}$ of the object; (b) compute the likelihood $p(s_i|y_i,\Theta)$ and the posterior probability $p_{L_i}$ of the pixels according to the current distribution parameters of the objects and background, as shown in Equation (6).

**Step 2:** M-step: (a) Use the GraphCut optimization algorithm to solve Equations (7) and (8) and obtain the new labeling data based on Equations (3)–(6); (b) update the object’s shape $\Omega^{(t+1)}$ as well as the Gaussian distribution parameters $\Theta^{(t+1)}$.

**Termination:** Repeat Steps 1 and 2 until the change in the segmented object regions is less than a certain threshold, that is, until the iteration converges.

**end**
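The iteration above can be illustrated with a deliberately simplified toy on a one-dimensional row of pixels. Several things here are assumptions rather than the article's method: the GraphCut step is replaced by an independent per-pixel maximum of likelihood times contour prior (no pairwise Potts term), the "contour" is the set of positions where the label changes, the prior is a logistic function of signed distance, and all names and data values are hypothetical.

```python
import math

OBJECT, BACKGROUND = 0, 1  # label convention of Section 2.3


def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def estimate(values):
    """Mean and variance of a sample; the variance is floored at 1.0."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, max(var, 1.0)


def contour_prior(i, labels, mu=0.5):
    """Assumed logistic prior on the signed distance to the nearest label change."""
    changes = [k for k in range(1, len(labels)) if labels[k] != labels[k - 1]]
    if not changes:
        return 0.5
    d = min(abs(i - (k - 0.5)) for k in changes)
    signed = d if labels[i] == OBJECT else -d
    return 1.0 / (1.0 + math.exp(-mu * signed))


def em_segment(pixels, init_labels, max_iter=20):
    labels = list(init_labels)
    for _ in range(max_iter):
        obj = [x for x, y in zip(pixels, labels) if y == OBJECT]
        bg = [x for x, y in zip(pixels, labels) if y == BACKGROUND]
        if not obj or not bg:
            break
        # Re-estimate the Gaussian parameters from the current labeling.
        theta_obj, theta_bg = estimate(obj), estimate(bg)
        new_labels = []
        for i, x in enumerate(pixels):
            # Combine the likelihood with the prior from the current contour.
            p_s = contour_prior(i, labels)
            score_obj = gaussian_pdf(x, *theta_obj) * p_s
            score_bg = gaussian_pdf(x, *theta_bg) * (1.0 - p_s)
            # Stand-in for GraphCut: independent per-pixel decision.
            new_labels.append(OBJECT if score_obj >= score_bg else BACKGROUND)
        if new_labels == labels:  # convergence: no label changed
            break
        labels = new_labels
    return labels


pixels = [10, 12, 11, 50, 52, 49, 51, 13, 9, 11]
rough = [1, 1, 0, 0, 0, 0, 1, 1, 1, 1]  # rough contour shifted by one pixel
print(em_segment(pixels, rough))
```

Starting from a rough contour that is shifted by one pixel, the loop moves the object boundary onto the bright run of pixels and then stops because the labeling no longer changes.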

## 5 Experimental results and analysis

### 5.1 Experimental data

For the experiments, two new and significantly extended databases were formed by capturing screenshot images from Google Earth. One database contains 100 small images of size 128×128 obtained from the Beijing Capital International Airport; the other database consists of 10 large images of size 5000×5000 obtained from Atlanta and other airports around the world. To obtain clearer images of the aircraft, the eye altitude is set to lower than 500 m when using Google Earth to obtain screenshots. Aircraft targets from the airports are selected as experimental targets from which the training and testing image samples are taken. At least one image is needed for training. We use ten images for training in the experiments.

### 5.2 Experiment setting

Comparisons are performed by using the conditional random field (CRF) [18] with RCC prior. CRF needs artificial labels to start segmentation. For the same experimental setting, we used the initial object pixels and background pixels (in Step 2, as shown in Figure 2) as the input of CRF.

To quantitatively evaluate the performance of the proposed method in terms of object segmentation, we adopt the measures of completeness and correctness, which are similar to recall and precision in image retrieval [19]. Completeness denotes the ratio of correctly segmented object pixels to the total number of true object pixels, whereas correctness denotes the ratio of correctly segmented object pixels to the total number of pixels in the object segmentation result. True object pixels are manually labeled in the original image. The detection rate and error detection rate are also employed to assess target detection performance: the former is the ratio of the number of detected target pixels to the total number of targets, whereas the latter is the ratio of the number of wrongly detected target pixels to the total number of targets.
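The two segmentation measures defined above can be computed from a predicted mask and a manually labeled mask as in the following sketch. The function name, the flat-list mask representation (1 for object pixels, 0 otherwise), and the choice to return 0.0 for an empty denominator are assumptions made for the example.

```python
def completeness_correctness(predicted, truth):
    """Return (completeness, correctness) for two flat binary masks.

    completeness = correctly segmented object pixels / true object pixels
    correctness  = correctly segmented object pixels / segmented object pixels
    """
    if len(predicted) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    correct = sum(1 for p, t in zip(predicted, truth) if p and t)
    n_true = sum(1 for t in truth if t)
    n_pred = sum(1 for p in predicted if p)
    completeness = correct / n_true if n_true else 0.0
    correctness = correct / n_pred if n_pred else 0.0
    return completeness, correctness


truth = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 0, 0]
print(completeness_correctness(predicted, truth))
```

In this toy case the segmentation finds only half of the true object pixels but labels no background pixel as object, which mirrors the trade-off between lower completeness and higher correctness discussed in the results.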

### 5.3 Results

#### 5.3.1 Experiment 1

**Completeness and correctness of object segmentation in the images taken from the Beijing Capital International Airport with a single aircraft**

Methods | Completeness (%) | Correctness (%) |
---|---|---|
CRF | 72.14 | 88.29 |
Morph-ActiveBasis + EMContourMRF | 72.10 | 94.08 |

#### 5.3.2 Experiment 2

**Detection rate and error detection rate of aircraft detection in the Atlanta Airport and Charles de Gaulle Airport**

Image | Detection rate (%) | Error detection rate |
---|---|---|
Morph-ActiveBasis | 81.54 | 0 |
CRF | 61.15 | 0 |

**Completeness and correctness of aircraft detection in the Atlanta Airport and Charles de Gaulle Airport**

Methods | Completeness (%) | Correctness (%) |
---|---|---|
CRF | 81.17 | 76.3 |
Morph-ActiveBasis + EMContourMRF | 74.23 | 90.18 |

**Completeness and correctness of aircraft detection in ten international airports**

Methods | Completeness (%) | Correctness (%) |
---|---|---|
CRF | 90.26 | 72.91 |
Morph-ActiveBasis + EMContourMRF | 73.16 | 85.57 |

Although the completeness of the proposed method is lower than that of CRF, the proposed method significantly improves correctness. These results indicate that the weakly supervised MRF model with an iterative shape prior can also achieve satisfactory performance in multi-target extraction.

## 6 Conclusion

An automatic object extraction algorithm for optical remote sensing images is proposed in this article. First, the Active Basis algorithm is used to detect fragments of the object contour, and the initial object contour is obtained through morphology-based fragment connection. The Gaussian mixture models of objects and background are then built under the Bayesian framework, and the prior information on object shapes is introduced. Finally, EM iteration and GraphCut optimization are combined for object segmentation. Our algorithm has the following advantages: (1) Morph-ActiveBasis needs only multiple images containing the objects or simple hand-drawn sketches of objects, without other prior instructions, so the proposed method requires less human intervention than other object segmentation methods that are based on feature classification; (2) EMContourMRF introduces prior information on object shapes and uses contours to constrain the segmentation process, which overcomes the problem of unorganized segmentation results that occurs in general models with only a neighborhood prior, and thus obtains better object contours; (3) EMContourMRF combines EM iteration and GraphCut optimization to estimate and optimize the distribution models of objects and backgrounds repeatedly, and therefore achieves relatively higher segmentation precision than object detection methods that are based on shape fragments.

The accuracy of object detection and segmentation is significantly affected by shadows. Thus, the removal of shadows is the focus of our follow-up work. Moreover, with the development of network information, automatic acquisition of images that contain more objects to construct a complete training database is a key step in our follow-up work to achieve better segmentation results independently.

## Declarations

### Acknowledgements

This study was supported by the National Basic Research Program of China (973 Program) under Grant No. 2013CB733404, NSFC grant (Nos. 60702041, 41174120, 41021061), the China Postdoctoral Science Foundation funded project, and the LIESMARS Special Research Funding.

## Authors’ Affiliations

## References

- Rother C, Kolmogorov V, Blake AM: Grabcut: interactive foreground extraction using iterated graph cuts. *Proceedings of the ACM Siggraph* 2004, 23(3):309-314. 10.1145/1015706.1015720
- Laptev I, Mayer H, Lindeberg T, Eckstein W, Steger C, Baumgartner A: Automatic extraction of roads from aerial images based on scale space and snakes. *Mach. Vis. Appl.* 2000, 12(12):23-31.
- Felzenszwalb P, McAllester D, Ramanan D: A discriminatively trained, multiscale, deformable part model. In *Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition*. Anchorage, AK; 2008:1-8.
- Wu YN, Si ZZ, Gong HF, Zhu SC: Learning active basis model for object detection and recognition. *Int. J. Comput. Vis.* 2010, 90(2):198-235. 10.1007/s11263-009-0287-0
- Ferrari V, Fevrier L, Jurie F, Schmid C: Groups of adjacent contour segments for object detection. *IEEE Trans. Pattern Anal. Mach. Intell.* 2008, 30(1):36-51.
- Cheng HY, Yu CC, Tseng CC, Fan KC, Hwang JN, Jeng BS: Environment classification and hierarchical lane detection for structured and unstructured roads. *Inst. Eng. Technol.* 2010, 4(1):37-49.
- Hassaballah M, Kanazawa T, Ido S: Efficient eye detection method based on grey intensity variance and independent components analysis. *Inst. Eng. Technol.* 2010, 4(4):261-271.
- Borenstein E, Ullman S: Learning to segment. In *European Conference on Computer Vision*. Prague; 2004:315-328.
- Weisenssel RA, Karl WC, Castanon DA, Power GJ, Douville P: Markov random field segmentation methods for SAR target chips. In *Algorithms for Synthetic Aperture Radar Imagery*. Orlando, FL; 1999:462-473.
- Jia L, Hong-qi W: A graph cuts based interactive image segmentation method. *J. Electron. Inf. Technol.* 2008, 30(8):1973-1976.
- Qian C, He C, Deng XP, Sun H: Object contour detection in remote sensing image. In *Multispectral Image Processing and Pattern Recognition*. Yichang, China; 2009:1-6.
- Lee TS: Image representation using 2D Gabor wavelets. *IEEE Trans. Pattern Anal. Mach. Intell.* 1996, 18(10):959-971.
- Lyons M, Akamatsu S, Kamachi M, Gyoba J: Coding facial expressions with Gabor wavelets. In *Proceedings of the Conference on Automatic Face and Gesture Recognition*. Nara; 1998:200-205.
- Boykov Y, Veksler O, Zabih R: Fast approximate energy minimization via graph cuts. *IEEE Trans. Pattern Anal. Mach. Intell.* 2001, 23(11):1222-1239. 10.1109/34.969114
- Kumar MP, Torr PHS, Zisserman A: OBJ CUT. In *Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition*. San Diego, CA, USA; 2005:18-25.
- Kumar MP, Torr PHS, Zisserman A: OBJ CUT: efficient segmentation using top-down and bottom-up cues. *IEEE Trans. Pattern Anal. Mach. Intell.* 2009, 32:530-545.
- Dempster A, Laird N, Rubin D: Maximum likelihood from incomplete data via the EM algorithm. *J. R. Stat. Soc. Ser. B* 1977, 39(1):1-38.
- Inglada J, Michel J: Qualitative spatial reasoning for high-resolution remote sensing analysis. *IEEE Trans. Geosci. Remote Sens.* 2009, 47(2):599-612.
- Lu LZ: *Study on Remote Sensing Image-base Content Retrieval Based on Database Model*. Beijing: China Meteorological Press; 2005.

## Copyright

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.