Open Access

Weakly supervised object extraction with iterative contour prior for remote sensing images

EURASIP Journal on Advances in Signal Processing 2013, 2013:19

Received: 28 December 2011

Accepted: 5 November 2012

Published: 13 February 2013


This article presents a weakly supervised approach based on the Markov random field (MRF) model for the extraction of objects (e.g., aircraft) in optical remote sensing images. The approach localizes and then segments objects in optical remote sensing images by relying only on several object samples without artificial labels. Unlike direct combinations of object detection and segmentation, the proposed method develops a contour prior model from the detection results, thereby improving segmentation performance. Furthermore, we iteratively update the contour prior information with the expectation-maximization (EM) algorithm. Numerical experiments illustrate that the proposed method can successfully extract aircraft in optical remote sensing images.

1 Introduction

Object detection and segmentation have received considerable attention as important procedures in automatic object identification in fields such as computer vision and remote sensing image processing. A key distinction between the two tasks emerges from the large body of work on them: object segmentation is usually interactive and incorporates guidance from the user throughout the analysis process, as in GraphCut [1] and Snake [2], whereas object detection needs learning samples and/or supervising information from the user at the beginning of the analysis, as in latent support vector machines (LSVMs) [3] and Wu et al.'s Active Basis [4]. Nevertheless, object detection and object segmentation share numerous theoretical and methodological features which, if explored, can benefit each other. In this article, object detection results based on Active Basis [4] replace supervised learning samples in object segmentation, so that segmentation results can be obtained from only a few object samples. Furthermore, this combination employs a contour prior model based on the detection results, thereby improving segmentation performance.

From a methodological perspective, the numerous methods recently used for object detection and segmentation can be divided into shape-based and feature-based methods. Shape-based methods, such as Felzenszwalb et al.'s LSVMs [3], Wu et al.'s Active Basis [4], Laptev et al.'s Snake [2], and Ferrari et al.'s kAS [5], exploit shape similarities between objects using different strategies and then obtain results by connecting the segments. Shape-based methods run automatically without human assistance; however, they rarely yield complete segmentation results. Feature-based methods, such as Cheng et al.'s [6] hierarchical lane detection system, Hassaballah et al.'s [7] independent components analysis, Borenstein and Ullman's [8] top-down bottom-up segmentation, Weisenssel et al.'s [9] Markov random field (MRF) model-based method, and Jia and Hong-qi's [10] interactive segmentation based on graph cuts, utilize different representations (color or texture features, distribution models) of image pixels or regions to distinguish objects from the background. However, these methods all require human assistance or strong supervised information.

This article combines object detection (Morph-ActiveBasis [11], which builds on Active Basis [4]) with object segmentation (MRF) and proposes a contour prior model based on the detection results to improve segmentation performance, which is detailed in Section 3.

2 Morph-ActiveBasis: from fragments to rough contours

Morph-ActiveBasis, which is presented in [11], is based on Active Basis. Morph-ActiveBasis can determine the basic edge contours of objects from a set of object samples without the need for artificial labels and can detect similar objects in given images. Unlike the scattered fragments obtained by using the original Active Basis, Morph-ActiveBasis employs fragment connection to link scattered fragments thus forming a sketch of the object contour (for details, see [11]).

2.1 Fragment detection

The Active Basis [4] model is utilized to detect the basic edge contours of objects. Active Basis represents contours through a set of Gabor wavelet bases [12, 13]. Moreover, Active Basis does not require human guidance and can automatically detect objects in the image. However, the detected results are only scattered fragments that cannot represent the complete contour of the object. Therefore, we propose the use of fragment connection to link the scattered fragments, thus forming a sketch of the object contour.

2.2 Fragment connection

The principle of fragment connection in [11] is based on the structure information among fragments.

2.3 Rough contours extraction

The Morph-ActiveBasis detection algorithm estimates the Gaussian distribution models of objects and backgrounds by using contour sketches, and then segments the contours by utilizing the GraphCut segmentation [14] algorithm. Given an image $I = \{s_1, s_2, \ldots, s_{N_I}\}$, where $s_i$ is the $i$th pixel, the segmentation assigns each of the $N_I$ pixels a label $y_i \in \{0, 1\}$, where 1 and 0 denote object and background, respectively, thus yielding the result $Y = \{y_1, y_2, \ldots, y_{N_I}\}$:

$$\bar{Y} = \arg\max_Y \{ P(Y \mid I, \Theta) \} \quad (1)$$

$$P(Y \mid I, \Theta) \propto \prod_{i=1}^{N_I} p_{L_i}(y_i \mid s_i, \Theta)\, p_{N_i}(y_i \mid y_{N_i}) \quad (2)$$

Here, $\Theta = \{\mu, \sigma\}$ are the parameters of the Gaussian distribution, where $\mu$ and $\sigma$ represent its mean and variance, respectively. The likelihood probability $p_{L_i}$ is obtained from the Gaussian distribution models of object and background, as shown in Equation (6). $p_{N_i}$ denotes the Potts model.
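As a concrete illustration, the likelihood term $p_{L_i}$ can be evaluated from the two class-conditional Gaussians and normalized over the classes. The following Python sketch is ours, not from the paper; the function name and the sample intensities are illustrative only:

```python
import numpy as np

def gaussian_likelihood(pixels, mu, sigma):
    """Per-pixel Gaussian likelihood p(s_i | y_i, Theta) for one class."""
    return np.exp(-0.5 * ((pixels - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

pixels = np.array([0.1, 0.5, 0.9])                        # toy intensities
p_obj = gaussian_likelihood(pixels, mu=0.8, sigma=0.2)    # object model
p_bg = gaussian_likelihood(pixels, mu=0.2, sigma=0.2)     # background model
# Normalizing over the two classes gives the per-pixel label probability:
p_L = p_obj / (p_obj + p_bg)
```

A bright pixel (0.9) lands near the object mean and receives a label probability close to 1, while a dark pixel (0.1) favors the background model.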

3 Expectation-maximization (EM) contour MRF: from rough contours to further segmentation

In this section, we present a contour prior model of objects to improve segmentation performance under the MRF model framework. The idea of this contour prior model is to assign pixels located inside an object contour a higher probability of belonging to the object; by contrast, pixels outside the contour are assigned a higher probability of belonging to the background. The contour prior probability is based on the distance between each pixel and its nearest contour point.

3.1 Contour prior information based on rough contours

As shown in Figure 1a, Ω is the initial contour of the object; s_01, s_02, s_11, s_12 are four arbitrary pixels in the image (s_01 and s_02 are located inside Ω, whereas s_11 and s_12 are outside Ω); and d_01, d_02, d_11, d_12 denote the distances from s_01, s_02, s_11, s_12 to Ω, respectively. Inside the contour, a pixel located farther from the contour is more likely to belong to the object; outside the contour, a pixel farther from the contour has a higher probability of belonging to the background. For instance, in Figure 1a, s_02 is more likely than s_01 to belong to the object because d_01 < d_02, whereas s_12 is more likely than s_11 to belong to the background because d_11 < d_12.
Figure 1

Sketch map of contour prior. (a) Distance from point to contour; (b) probability map of shape prior.

Figure 1b shows the sketch map of the shape prior probability $p_{s_i}$. The relations shown in Figure 1 can be expressed mathematically by Equations (3)–(5), as in [15, 16]. Prior knowledge on the background and the object is given in Equations (3) and (4), respectively. Equation (5) gives the signed distance of a pixel to the contour, where $\mu$ is a constant distance coefficient; $\mathrm{sign}(s_i, \Omega) = 1$ when $s_i$ is inside Ω and $\mathrm{sign}(s_i, \Omega) = -1$ when $s_i$ is outside Ω; and $\mathrm{loc}(s)$ is the coordinate position of pixel $s$ in the image.
$$p_{s_i}(y_i = 0 \mid \Omega) \propto \frac{1}{1 + \exp(\mu \,\mathrm{dist}(s_i, \Omega))} \quad (3)$$

$$p_{s_i}(y_i = 1 \mid \Omega) \propto 1 - \frac{1}{1 + \exp(\mu \,\mathrm{dist}(s_i, \Omega))} \quad (4)$$

$$\mathrm{dist}(s_i, \Omega) = \mathrm{sign}(s_i, \Omega) \min_{s \in \Omega} \left\| \mathrm{loc}(s_i) - \mathrm{loc}(s) \right\| \quad (5)$$
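Equations (3)–(5) can be evaluated directly once the contour is represented as a point set. The sketch below is a hypothetical helper of our own (the interface, including passing the inside/outside test as a boolean mask, is an assumption, not the paper's implementation):

```python
import numpy as np

def contour_prior(points, contour, inside_mask, mu=1.0):
    """Evaluate Equations (3)-(5): signed distance to the contour Omega and
    the resulting background/object prior probabilities.

    points:      (N, 2) pixel coordinates loc(s_i)
    contour:     (M, 2) coordinates of the contour points of Omega
    inside_mask: (N,) bool, True where sign(s_i, Omega) = +1
    Returns (p_background, p_object) per pixel, i.e. p(y_i = 0) and p(y_i = 1).
    """
    # Equation (5): signed distance to the nearest contour point
    diffs = points[:, None, :] - contour[None, :, :]
    d = np.sqrt((diffs ** 2).sum(axis=-1)).min(axis=1)
    dist = np.where(inside_mask, 1.0, -1.0) * d
    # Equations (3) and (4): sigmoid of the signed distance
    p_bg = 1.0 / (1.0 + np.exp(mu * dist))
    return p_bg, 1.0 - p_bg
```

A pixel deep inside the contour gets a large positive distance and hence an object probability near 1; a pixel far outside gets a large negative distance and an object probability near 0, matching the behavior described for Figure 1a.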

3.2 Segmentation based on the MRF model and on EM iteration

To obtain better segmentation results from the contour prior information, we combine the EM algorithm [17] with GraphCut optimization. We apply the segmentation result to update the parameters of the Gaussian distribution model and then utilize the new distribution parameters to initialize GraphCut optimization. The E- and M-steps are as follows:

E-step: The likelihood probability of each pixel belonging to class $y_i$ is computed from the current Gaussian distribution parameters $\Theta^{(t)} = \{\mu^{(t)}, \sigma^{(t)}\}$, where $\mu$ and $\sigma$ represent the mean and variance of the Gaussian distribution, as follows:
$$p_{L_i}(y_i \mid s_i, \Theta^{(t)}) = \frac{p(s_i \mid y_i, \Theta^{(t)})}{\sum_{c=0}^{1} p(s_i \mid y_i = c, \Theta^{(t)})} \quad (6)$$
M-step: Under the MRF framework, each pixel is a node whose likelihood is described by $p_{L_i}$; the relationship between neighboring pixels (nodes) is represented by the combination of $p_{s_i}$ and $p_{N_i}$, which form the pairwise terms of the MRF. The probability $p_{L_i}$ obtained in the E-step is combined with the contour prior $p_{s_i}$ of the object and the Potts prior $p_{N_i}$ to form the MRF model in Equations (7) and (8). The GraphCut optimization algorithm is then applied to solve Equations (7) and (8), thus obtaining the new labeling $Y$. The new Gaussian distribution parameters $\Theta^{(t+1)} = \{\mu^{(t+1)}, \sigma^{(t+1)}\}$ of objects and background are computed, and the shape $\Omega^{(t+1)}$ of the object is updated. The EM algorithm repeats the E- and M-steps until the convergence condition is satisfied.
$$\bar{Y} = \arg\max_Y \left\{ P(Y \mid I, \Omega^{(t-1)}) \right\} = \arg\max_Y \left\{ P(I \mid Y, \Omega^{(t-1)})\, P(Y) \right\} \quad (7)$$

$$P(Y^{(t)} \mid I, \Omega^{(t-1)}, \Theta^{(t-1)}) \propto \exp\left\{ \sum_{i \in I} \left[ \alpha f(s_i \mid y_i, \Theta^{(t-1)}) + \beta f(y_i \mid d(s_i, \Omega^{(t-1)})) + \gamma \sum_{j \in C_i} \delta(y_i, y_j) \right] \right\} \quad (8)$$

The right-hand side of Equation (8) consists of three parts representing $p_{L_i}$, $p_{s_i}$, and $p_{N_i}$, respectively: the first two are the Gaussian probability and the contour prior, and the last is the Potts prior model. $C_i$ is the set of cliques of pixel $s_i$; we use eight-neighborhood cliques, and the potential (energy) function of these cliques is defined through the Gibbs distribution computed from the Potts prior model. $d(s_i, \Omega^{(t-1)})$ is the distance from $s_i$ to the contour $\Omega^{(t-1)}$, and $\alpha, \beta, \gamma$ are normalization parameters.
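To make the structure of Equation (8) concrete, the following sketch of ours evaluates its exponent (the log-potential) for a given 2-D label field; the term arrays and parameter values are placeholders, and the 8-neighborhood Potts sum is implemented with array shifts:

```python
import numpy as np

def log_potential(labels, f_lik, f_prior, alpha=1.0, beta=1.0, gamma=1.0):
    """Exponent of Equation (8) for a 2-D label field.

    labels:  (H, W) integer labels Y in {0, 1}
    f_lik:   (H, W, 2) likelihood term f(s_i | y_i, Theta) per class
    f_prior: (H, W, 2) contour-prior term f(y_i | d(s_i, Omega)) per class
    The Potts term counts agreements delta(y_i, y_j) over the 8-neighborhood;
    each of the four offsets below visits every unordered pair exactly once.
    """
    h, w = labels.shape
    ii, jj = np.mgrid[0:h, 0:w]
    total = (alpha * f_lik[ii, jj, labels] + beta * f_prior[ii, jj, labels]).sum()
    pairs = [(labels[:, :-1], labels[:, 1:]),     # horizontal neighbors
             (labels[:-1, :], labels[1:, :]),     # vertical neighbors
             (labels[:-1, :-1], labels[1:, 1:]),  # diagonal neighbors
             (labels[:-1, 1:], labels[1:, :-1])]  # anti-diagonal neighbors
    for a, b in pairs:
        total += gamma * (a == b).sum()
    return total
```

GraphCut then seeks the labeling that maximizes this quantity; flipping an isolated pixel against its neighbors lowers the Potts contribution, which is what discourages unorganized segmentations.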

4 Flow of the proposed algorithm

The proposed weakly supervised object extraction algorithm aims to achieve improved object fragment detection and segmentation results with minimal supervision. As shown in Figure 2, the algorithm first utilizes fragment connection and GraphCut segmentation to obtain the initial contour $\Omega_0$ of the object based on the Active Basis model. It then constructs the MRF on this initial contour and adopts an iterative optimization approach combined with the EM algorithm to achieve improved object segmentation results. The algorithm proceeds as follows:
Figure 2

Framework of Morph-ActiveBasis + EMContourMRF algorithm.


Initialization: Use the Morph-ActiveBasis method to obtain the initial contour $\Omega_0$ of the objects;

Step 1: E-Step: a: Use Equations (3)–(5) to compute the shape prior $p_{s_i}$ of each pixel in the testing image based on the current contour $\Omega^{(t)}$ of the object; b: Compute the likelihood $p(s_i \mid y_i, \Theta)$ and the posterior probability $p_{L_i}$ of each pixel according to the current distribution parameters of the objects and background, as shown in Equation (6);

Step 2: M-Step: a: Use the GraphCut optimization algorithm to solve Equations (7)–(8) and obtain the new labeling based on Equations (3)–(6); b: Update the object's shape $\Omega^{(t+1)}$ as well as the Gaussian distribution parameters $\Theta^{(t+1)}$;

Termination: Repeat Steps 1 and 2 until the change in the segmented object regions is less than a certain threshold, that is, until the iteration converges.
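The loop above can be sketched as follows. This illustrative version of ours replaces the GraphCut solve of Equations (7)–(8) with a simple per-pixel maximum-likelihood decision and omits the contour and Potts priors, so it only mirrors the E/M parameter updates and the termination test:

```python
import numpy as np

def em_segmentation(image, init_mask, n_iter=20, tol=1e-3):
    """Skeleton of the E/M loop; a per-pixel MAP decision stands in for the
    GraphCut optimization of Equations (7)-(8).

    image:     (H, W) float intensities
    init_mask: (H, W) bool initial object region (from the rough contour Omega_0)
    """
    mask = init_mask.copy()
    for _ in range(n_iter):
        # Update Gaussian parameters of background (class 0) and object (class 1)
        mu = [image[~mask].mean(), image[mask].mean()]
        sd = [image[~mask].std() + 1e-6, image[mask].std() + 1e-6]
        # E-step: class likelihoods per pixel (numerator of Equation (6))
        lik = [np.exp(-0.5 * ((image - mu[c]) / sd[c]) ** 2) / sd[c] for c in (0, 1)]
        new_mask = lik[1] > lik[0]
        changed = np.mean(new_mask != mask)  # termination criterion
        mask = new_mask
        if changed < tol:
            break
    return mask
```

On a synthetic image with a bright square on a dark background, a coarse (offset) initial mask converges to the true object region in a few iterations, which is the behavior the EM iteration relies on.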


5 Experimental results and analysis

5.1 Experimental data

For the experiments, two new databases were formed by capturing screenshot images from Google Earth. One database contains 100 small images of size 128×128 obtained from the Beijing Capital International Airport; the other consists of 10 large images of size 5000×5000 obtained from Atlanta and other airports around the world. To obtain clearer images of the aircraft, the eye altitude in Google Earth is set below 500 m when taking the screenshots. Aircraft at the airports are selected as the experimental targets from which the training and testing image samples are taken. At least one image is needed for training; we use ten images for training in the experiments.

5.2 Experiment setting

Comparisons are performed with the conditional random field (CRF) [18] with an RCC prior. The CRF needs artificial labels to start segmentation; to keep the experimental setting identical, we used the initial object pixels and background pixels (in Step 2, as shown in Figure 2) as the input of the CRF.

To quantitatively evaluate the segmentation performance of the proposed method, we adopt completeness and correctness measures, analogous to recall and precision in image retrieval [19]: completeness denotes the ratio of correctly segmented object pixels to the total number of true object pixels, whereas correctness denotes the ratio of correctly segmented object pixels to the total number of pixels in the segmentation result. True object pixels are manually labeled in the original image. The detection rate and the error detection rate are also employed to assess target detection performance: the former is the ratio of the number of correctly detected targets to the total number of targets, whereas the latter is the ratio of the number of wrongly detected targets to the total number of targets.
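The two segmentation measures can be computed directly from binary masks; below is a minimal sketch (the function name is ours, not from the paper):

```python
import numpy as np

def completeness_correctness(pred, truth):
    """Completeness (recall) and correctness (precision) of a binary
    segmentation mask against a manually labeled ground truth."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.logical_and(pred, truth).sum()      # correctly segmented object pixels
    return tp / truth.sum(), tp / pred.sum()    # completeness, correctness
```

For example, a prediction that recovers 3 of 4 true object pixels while adding 1 false pixel scores 75% on both measures.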

5.3 Results

5.3.1 Experiment 1

We first test the proposed method on the 100 small images taken from the Beijing Capital International Airport (size: 128×128, each containing a single aircraft). Figure 3 shows ten samples of the experimental results. As shown in Figure 3, the contours are relatively well segmented. The proposed Morph-ActiveBasis + EMContourMRF algorithm is compared with the CRF method; Table 1 shows the segmentation accuracy. The completeness of Morph-ActiveBasis is higher than that of the CRF, but its correctness is significantly lower. Built on Morph-ActiveBasis, EMContourMRF not only enhances the contours of objects but also significantly improves the segmentation accuracy.
Figure 3

Experimental results of single-target detection. Top, the original; Middle, results of the CRF method; Bottom, results of proposed method.

Table 1

Completeness and correctness of object segmentation in the images taken from the Beijing Capital International Airport with a single aircraft. Columns: Completeness (%), Correctness (%); rows: CRF, Morph-ActiveBasis + EMContourMRF (cell values omitted).

5.3.2 Experiment 2

The second experiment involves the extraction of multiple targets from images taken from the Atlanta Airport (size: 5000×5000, each containing more than one aircraft), as shown in Figure 4; Figure 5 displays the results. More aircraft are correctly detected by the Morph-ActiveBasis method than by the CRF approach. Table 2 shows the numbered bounding boxes, the detection rate, and the error detection rate, and Figure 5 illustrates some of the corresponding segmented results. The completeness and correctness of segmentation for each image are shown in Table 3, and the results over all ten images are shown in Table 4.
Figure 4

Experimental images for multi-target detection. Top, Atlanta Airport and Charles de Gaulle Airport; Bottom, Ground Truth.

Figure 5

Extraction results and some segmentation results of multi-aircrafts: (a) CRF approach and (b) Morph-ActiveBasis + EMContourMRF algorithm.

Table 2

Detection rate and error detection rate of aircraft detection in the Atlanta Airport and Charles de Gaulle Airport. Columns: Detection rate (%), Error detection rate (cell values omitted).

Table 3

Completeness and correctness of aircraft detection in the Atlanta Airport and Charles de Gaulle Airport. Columns: Completeness (%), Correctness (%); rows: CRF, Morph-ActiveBasis + EMContourMRF (cell values omitted).

Table 4

Completeness and correctness of aircraft detection in ten international airports. Columns: Completeness (%), Correctness (%); rows: CRF, Morph-ActiveBasis + EMContourMRF (cell values omitted).



Although the completeness of the proposed method is lower than that of the CRF, the former significantly improves correctness. We conclude that the weakly supervised MRF model with an iterative shape prior can also achieve satisfactory performance in multi-target extraction.

6 Conclusion

An automatic object extraction algorithm for optical remote sensing images is proposed in this article. First, the Active Basis algorithm is used to detect fragments of the object contour, and the initial object contour is obtained through morphology-based fragment connection. The Gaussian mixture models of objects and background are then built under the Bayesian framework, and prior information on object shapes is introduced. Finally, EM iteration and GraphCut optimization are combined for object segmentation. Our algorithm has the following advantages: (1) Morph-ActiveBasis needs only multiple images containing the objects, or simple hand-drawn sketches of the objects, without other prior instructions; less human intervention is required than in object segmentation methods based on feature classification. (2) EMContourMRF overcomes the unorganized segmentation results that occur in general models with only a neighborhood prior, because it introduces prior information on object shapes and uses contours to constrain the segmentation process; EMContourMRF thus obtains better object contours. (3) EMContourMRF combines EM iteration and GraphCut optimization to repeatedly estimate and optimize the distribution models of objects and background, achieving higher segmentation precision than object detection methods based on shape fragments.

The accuracy of object detection and segmentation is significantly affected by shadows; thus, shadow removal is the focus of our follow-up work. Moreover, the automatic acquisition of images containing more objects from the web, in order to construct a complete training database, is another key step in our follow-up work toward better segmentation results.



Acknowledgements

This study was supported by the National Basic Research Program of China (973 Program) under Grant No. 2013CB733404, NSFC grants (Nos. 60702041, 41174120, 41021061), the China Postdoctoral Science Foundation, and the LIESMARS Special Research Funding.

Authors’ Affiliations

School of Electronic Information, Wuhan University
The State Key Laboratory for Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University
Institut Telecom, Telecom ParisTech, LTCI


  1. Rother C, Kolmogorov V, Blake A: GrabCut: interactive foreground extraction using iterated graph cuts. ACM Trans. Graph. 2004, 23(3):309-314. doi:10.1145/1015706.1015720
  2. Laptev I, Mayer H, Lindeberg T, Eckstein W, Steger C, Baumgartner A: Automatic extraction of roads from aerial images based on scale space and snakes. Mach. Vis. Appl. 2000, 12(12):23-31.
  3. Felzenszwalb P, McAllester D, Ramanan D: A discriminatively trained, multiscale, deformable part model. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Anchorage, AK; 2008:1-8.
  4. Wu YN, Si ZZ, Gong HF, Zhu SC: Learning active basis model for object detection and recognition. Int. J. Comput. Vis. 2010, 90(2):198-235. doi:10.1007/s11263-009-0287-0
  5. Ferrari V, Fevrier L, Jurie F, Schmid C: Groups of adjacent contour segments for object detection. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 30(1):36-51.
  6. Cheng HY, Yu CC, Tseng CC, Fan KC, Hwang JN, Jeng BS: Environment classification and hierarchical lane detection for structured and unstructured roads. Inst. Eng. Technol. 2010, 4(1):37-49.
  7. Hassaballah M, Kanazawa T, Ido S: Efficient eye detection method based on grey intensity variance and independent components analysis. Inst. Eng. Technol. 2010, 4(4):261-271.
  8. Borenstein E, Ullman S: Learning to segment. In European Conference on Computer Vision. Prague; 2004:315-328.
  9. Weisenssel RA, Karl WC, Castanon DA, Power GJ, Douville P: Markov random field segmentation methods for SAR target chips. In Algorithms for Synthetic Aperture Radar Imagery. Orlando, FL; 1999:462-473.
  10. Jia L, Hong-qi W: A graph cuts based interactive image segmentation method. J. Electron. Inf. Technol. 2008, 30(8):1973-1976.
  11. Qian C, He C, Deng XP, Sun H: Object contour detection in remote sensing image. In Multispectral Image Processing and Pattern Recognition. Yichang, China; 2009:1-6.
  12. Lee TS: Image representation using 2D Gabor wavelets. IEEE Trans. Pattern Anal. Mach. Intell. 1996, 18(10):959-971.
  13. Lyons M, Akamatsu S, Kamachi M, Gyoba J: Coding facial expressions with Gabor wavelets. In Proceedings of the Conference on Automatic Face and Gesture Recognition. Nara; 1998:200-205.
  14. Boykov Y, Veksler O, Zabih R: Fast approximate energy minimization via graph cuts. IEEE Trans. Pattern Anal. Mach. Intell. 2001, 23(11):1222-1239. doi:10.1109/34.969114
  15. Kumar MP, Torr PHS, Zisserman A: OBJ CUT. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. San Diego, CA, USA; 2005:18-25.
  16. Kumar MP, Torr PHS, Zisserman A: OBJ CUT: efficient segmentation using top-down and bottom-up cues. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 32:530-545.
  17. Dempster A, Laird N, Rubin D: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B 1977, 39(1):1-38.
  18. Inglada J, Michel J: Qualitative spatial reasoning for high-resolution remote sensing analysis. IEEE Trans. Geosci. Remote Sens. 2009, 47(2):599-612.
  19. Lu LZ: Study on Remote Sensing Image-based Content Retrieval Based on Database Model. Beijing: China Meteorological Press; 2005.


© He et al.; licensee Springer. 2013

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.