Building detection from urban SAR image using building characteristics and contextual information

Zhao, Lingjun; Zhou, Xiaoguang; Kuang, Gangyao

doi:10.1186/1687-6180-2013-56

Published: 20 March 2013

EURASIP Journal on Advances in Signal Processing, volume 2013, Article number: 56 (2013)


Abstract

With the urgent demand for urban synthetic aperture radar (SAR) image interpretation, this article addresses the detection of buildings from a single high-resolution SAR image. Building on our previous work on building detection from SAR images, and aiming to extract buildings with complete and accurate boundaries from built-up areas, we introduce a general framework that uses the marker-controlled watershed transform to combine building characteristics with contextual information. First, the characteristics of the buildings and their surroundings are extracted as markers by target detection techniques. Second, the edge strength image of the SAR image is computed using the ratio of exponentially weighted averages detector. The marker-controlled watershed transform is then applied to the markers and the edge strength image to segment buildings from the background. Finally, building features are used to remove false alarms. In particular, a shape analysis method, called direction correlation analysis, is designed to keep linear or L-shaped objects. We apply the proposed method to high-resolution SAR images of different scenes, and the results show that the new method is effective, with a high detection rate, a low false-alarm rate, and good localization performance. Furthermore, a comparison between the new method and our previous method reveals that introducing contextual information plays an important role in improving building detection performance.

1. Introduction

Synthetic aperture radar (SAR) is an active microwave sensor capable of acquiring high-resolution imagery independent of time of day and weather conditions. It has played a key role in the field of Earth remote sensing. Recently, urban environment analysis has become an important issue in SAR image interpretation [1]. The acquisition of more and more high-resolution SAR data over urban areas (meter-resolution spaceborne data such as TerraSAR-X and Cosmo-SkyMed images, and decimeter-resolution airborne SAR images) has created an urgent demand for interpretation methods for such images.

Buildings are the dominant structures in the urban environment. Various methods of building extraction from SAR images have been presented in the literature. Chellappa [2] used constant false-alarm rate (CFAR) processing and the Hough transform to detect bright and L-shaped streaks of potential buildings, and then applied supervised maximum likelihood (ML) segmentation to find shadow regions down-range from the potential buildings. Tupin et al. [3] proposed a line detector for extracting bright linear features from SAR images. The detected bright lines can further be selected as the features corresponding to the partial footprint of the building on the ground [4, 5]. In radargrammetric frameworks, linear or L-shaped lines are exploited in stereoscopic structure extraction for building recognition [6–8]. Xu and Jin [9] used a CFAR edge detector and a Hough transform technique for parallel line segment pairs to extract parallelogram-like images of the building walls in SAR images. Hill et al. [10] developed an active contour approach to extract building shadows, which can be used for estimating building dimensions [11]. Bolter and Leberl [12] used a rotating mask to reconstruct building walls from multiple-view slant range shadows. More recently, as SAR resolution has improved, much attention has been paid to extracting more detailed and accurate information about buildings. Michaelsen et al. [13] and Soergel et al. [14, 15] used principles from perceptual grouping to detect building features such as long thin roof edge lines, groups of salient point scatterers, and symmetric configurations from SAR images with resolutions on the order of a decimetre. Guida et al. [16] proposed a more refined model accounting for both the geometrical and electromagnetic properties of the building. Based on this model, an approach to extract parameters describing the shape and materials of a generic building was proposed in [17]. Ferro et al. [18] presented a method for detecting buildings and reconstructing radar footprints based on the extraction of a set of low-level features from images and on their combination into more structured primitives. Brunner et al. [19] presented a building height estimation method that iteratively simulates a SAR image and matches it with the actual SAR image to test the building hypothesis. Cellier et al. [20] presented a building reconstruction technique for InSAR data based on building hypothesis management. A method for the 3D reconstruction of buildings using very high resolution (VHR) optical data and a SAR image was presented in [21].

Among the aforementioned studies, many early works focus on merging building features from multiple SAR images or multi-sensor data. This strategy of fusing information from multiple images is due, to some extent, to the relatively coarse image quality. It also implies that the investigated area is observed more than once with different viewing angles or directions, which is an obvious limitation in applications such as emergency response. With the acquisition of VHR SAR images, more information can be exploited in a single SAR image, and recent works have begun to address the problem of building extraction from a single SAR image. However, due to the high complexity of VHR SAR images over built-up areas, building extraction from dense urban areas remains a challenging task.

In this article, which improves on the work presented in [22], a general framework for building detection using both building characteristics and contextual information is proposed. The method is applied to single SAR images and has the following abilities. (1) It is able to detect and segment isolated buildings even when they are densely distributed. When the gray-level values of a building fluctuate greatly, the different parts of that single building can still be merged. (2) It is suitable for common buildings with different shapes, either linear or L-shaped. Due to imaging conditions, buildings may differ in shape in different SAR images. The most common building shapes are linear or L-shaped lines, and the lines can be very thin (only several pixels wide) or have a certain width. Usually, the two situations are dealt with separately by adopting different algorithms. We consider a method that adapts to different building shapes to be more flexible in practical applications. (3) It can locate accurate building boundaries. Here, boundary means the boundary of a building as it appears in the SAR image, not the boundary or footprint in the real world. In some studies, detection is not the final purpose; instead, detection results provide useful information for building reconstruction or 3D dimension extraction [23, 24]. The accuracy of the detection results, such as building boundaries (mainly the boundaries of layover regions in SAR images), directly determines the performance of the reconstruction [4, 9–11, 25].

2. Overview of the proposed method

Similar to our previous work in [22], the new general framework proposed in this article utilizes the marker-controlled watershed transform [26], for two reasons. First, when it comes to separating objects with closed and accurate boundaries, the watershed transform is a very efficient and widely used method. Second, markers help segment objects of interest by introducing their characteristics. In the previous article, only building characteristics were considered, which sometimes causes failures in segmenting entire boundaries, especially for buildings with fluctuating gray-level values. To solve this problem, contextual information is introduced in the new framework. As will be analyzed later, contextual information can preserve the entirety of each building and prevent adjacent buildings from being merged even when they are close to each other.

The watershed transform is an important segmentation approach proposed by Vincent and Soille [27]. It is an intuitive and fast method that produces closed segmentation boundaries, but it suffers from oversegmentation due to noise and other local irregularities of the gradient. Oversegmentation can be serious enough to render the result of the algorithm virtually useless. Several methods have been proposed to solve this problem, such as filling up the basins to a predetermined level to eliminate the lowest peaks, which may be insignificant in terms of boundary detection [28], or computing the dynamics of contours, a contrast criterion measuring the grey-level difference between peaks and surrounding minima [29]. Essentially, these improvements take no account of the characteristics of the objects. For example, when a predetermined level is used, only the value of the gradient magnitude is considered, so an improper predetermined level will break the boundaries of objects.

Another approach to controlling oversegmentation is based on the concept of markers [26], which is the one adopted in our building detection method. A marker is a connected component belonging to an image; it can be an internal marker associated with an object of interest or an external marker associated with the background. Markers are used to modify the gradient image in order to suppress insignificant regional minima. By using the marker-controlled watershed transform, we can therefore reduce the number of regional minima and confine them to the regions of interest to prevent oversegmentation. The marker-controlled watershed transform is often used to segment objects with some similarities (grey level, texture, shape, etc.), and is thus quite suitable for extracting buildings in SAR images, which have distinct characteristics and strong similarities.

The marker-controlled watershed transform relies on two key steps: extracting markers and modifying the gradient image. Most buildings have strong radar backscatter energy. Thus, a characteristic feature of buildings is the presence of bright-pixel clusters in a SAR image, which can be used as internal markers. In addition, building shadows and the surrounding roads form dark, netlike structures in SAR images, which can be used as external markers. Based on these two features, internal markers are extracted by a CFAR detector and external markers are extracted by a power ratio (PR) detector. An edge strength image is obtained by the ratio of exponentially weighted averages (ROEWA) edge detector, which is specially designed for SAR images. The minima imposition technique is then combined with the markers to modify the edge strength image of the original SAR image, so that most spurious minima are removed. The potential building boundaries are obtained by computing the watersheds of the modified edge strength image. Finally, a postprocessing stage is used to remove false alarms. Figure 1 depicts the whole process of our framework. In Sections 3 to 5, we describe the algorithmic details of each module in the framework, namely marker extraction, computation and modification of the edge strength image, and postprocessing.

3. Extraction of markers

As mentioned in Section 2, internal markers are associated with building characteristics and external markers with contextual information. Therefore, a CFAR detector is used to extract the bright pixels of buildings as the internal markers, and a PR detector is adopted to extract building shadows and roads as the external markers.

3.1 Bright pixel detection based on OS-CFAR detector

CFAR processing is useful for detecting strong reflectors in the background clutter and has widely been used in man-made objects detection from SAR images. In this pixel-based method, the signal at the pixel under test is compared with an adaptive threshold, generated from a sliding window of reference pixels from the background. The reference pixels are used to estimate the parameters of the underlying clutter statistical distribution. For high-resolution SAR images, the square hollow-stencil sliding-window is usually adopted (see Figure 2). A guard area is set in the window according to the target size and it can help prevent the target pixels from influencing the parameter estimation.

According to different methods for clutter parameter estimation, CFAR detectors can be classified as the cell averaging CFAR (CA-CFAR), order statistic CFAR (OS-CFAR), greatest of CFAR (GO-CFAR), etc. [30–33]. Theoretically, they are all capable of detecting the bright pixels of buildings. However, they are suitable for different situations. The CA-CFAR technique works well in situations where a single target is present in locally homogeneous clutter. In the presence of heterogeneous environment (including the clutter edge and multi-target situations), the performance of the CA-CFAR detector degrades rapidly. The OS-CFAR algorithm is designed to overcome the problem of the loss in detection performance suffered by the CA-CFAR when interfering targets are in the background cells and clutter statistics estimation is corrupted. Therefore, it has significant advantage when detecting targets in multi-target situations. The GO-CFAR algorithm provides good detection performance in clutter edge situations.

Since building detection is facing a typical multi-target situation, the OS-CFAR algorithm is considered for detecting bright pixels of buildings. In a general form of OS-CFAR detection, M reference cells are sorted in an increasing order according to their values. The threshold is obtained by selecting the k th ranked cell to represent the noise and clutter level [34]. However, it is difficult to theoretically derive the optimal decision statistic. To achieve robust performance, Ritcey [35] proposed an OS-based two-parameter CFAR, which is used to detect bright pixels in this article. The reference cells in the sliding window (as shown in Figure 2) are sorted in an increasing order, i.e.,

p_{1} \leq p_{2} \leq \dots \leq p_{k} \leq \dots \leq p_{n_{c}},

where p _i is the i th ranked cell and n _c is the number of clutter cells. The ordered statistics of the clutter region are used as estimation of the mean value and the standard deviation. The detection rules are

➣ if $\frac{p_{t} - p_{50}}{p_{75} - p_{25}} > T_{CFAR},$ the cell under test is a target pixel (a bright pixel);

➣ if $\frac{p_{t} - p_{50}}{p_{75} - p_{25}} \leq T_{CFAR},$ the cell under test is a background pixel,

where p_t is the test cell and p₅₀ is the median value ${\hat{μ}}_{c}$ of the ranked cells, which is used as an approximation to the mean value of the clutter region. p₂₅ and p₇₅ are, respectively, the [0.25 · n_c]th and [0.75 · n_c]th values of the ranked cells, where [x] denotes the integer nearest to x. The difference p₇₅ - p₂₅ is used as an approximation to the standard deviation ${\hat{σ}}_{c}$ of the clutter region. The decision statistic $\frac{p_{t} - p_{50}}{p_{75} - p_{25}}$ therefore stands in for the two-parameter CFAR decision statistic $\frac{p_{t} - {\hat{μ}}_{c}}{{\hat{σ}}_{c}}$. T_CFAR is the CFAR detection threshold under the Gaussian distribution, which is commonly used in two-parameter CFAR detectors. The CFAR adaptive threshold T_CFAR and a given false-alarm rate P_FA are related by

1 - P_{FA} = \int_{0}^{T_{CFAR}} P_{G} (I) dI

(1)

where P _G(I) is the Gaussian distribution of the clutter intensity. T _CFAR is obtained by solving Equation (1).
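The detection rule above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `cfar_threshold` and `os_cfar`, the default window sizes, and the reflective border padding are assumptions made for illustration. The threshold is obtained here from the standard normal quantile, which assumes that the normalised decision statistic is approximately standard Gaussian; this is one possible reading of Equation (1), not a statement of how the authors solved it.

```python
import numpy as np
from statistics import NormalDist


def cfar_threshold(p_fa):
    """Threshold exceeded by a standard normal variable with probability p_fa (assumption)."""
    return NormalDist().inv_cdf(1.0 - p_fa)


def os_cfar(image, window=25, guard=15, p_fa=0.01):
    """Quartile-based two-parameter CFAR; returns a boolean map of bright pixels.

    window and guard are odd side lengths of the sliding window and the guard
    area; the default values are illustrative only.
    """
    if window % 2 == 0 or guard % 2 == 0 or guard >= window:
        raise ValueError("window and guard must be odd, with guard < window")
    img = np.asarray(image, dtype=float)
    half_w, half_g = window // 2, guard // 2
    # Hollow stencil: True on the outer ring (clutter cells), False in the guard area.
    ring = np.ones((window, window), dtype=bool)
    ring[half_w - half_g:half_w + half_g + 1,
         half_w - half_g:half_w + half_g + 1] = False
    padded = np.pad(img, half_w, mode="reflect")  # border handling is an assumption
    t_cfar = cfar_threshold(p_fa)
    out = np.zeros(img.shape, dtype=bool)
    for r in range(img.shape[0]):
        for c in range(img.shape[1]):
            cells = np.sort(padded[r:r + window, c:c + window][ring])
            n_c = cells.size
            # Ranks in the text are 1-based, hence the "- 1" when indexing.
            p25 = cells[max(int(round(0.25 * n_c)) - 1, 0)]
            p50 = cells[max(int(round(0.50 * n_c)) - 1, 0)]
            p75 = cells[max(int(round(0.75 * n_c)) - 1, 0)]
            spread = p75 - p25
            if spread > 0:
                out[r, c] = (img[r, c] - p50) / spread > t_cfar
    return out
```

The explicit double loop keeps the correspondence with the ranked cells visible; a practical implementation would probably vectorise the quartile computation.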

After CFAR detection, a binary image B(x, y), 1 ≤ x ≤ m, 1 ≤ y ≤ n, is obtained, where 1 indicates a bright pixel and 0 indicates a background pixel. Regions with fewer than T_A pixels are removed, where T_A is a threshold on the number of pixels in a region. A low T_A is preferred so that small false alarms are removed while the building objects are kept as complete as possible. The remaining regions are denoted by {R_i}, i = 1, …, N_R, where R_i stands for the i th region and N_R is the total number of these regions. A new binary image is then defined as follows:

B^{'} (x, y) = \begin{cases} 1, & (x, y) \in R_{i},\ i = 1, \dots, N_{R} \\ 0, & \text{otherwise} \end{cases}

(2)

The binary internal marker image B_in(x, y) is obtained by applying region filling to B′(x, y). In B_in(x, y), the pixel value 1 represents the internal markers and 0 represents the background.
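The small-region removal of Equation (2) and the subsequent region filling can be sketched as follows. This is a minimal illustration using SciPy; the function name `internal_markers` and the parameter `min_area` (playing the role of T_A) are hypothetical, and the default 4-connectivity of `scipy.ndimage.label` is an assumption, since the connectivity is not specified in the text.

```python
import numpy as np
from scipy import ndimage


def internal_markers(bright, min_area):
    """Drop connected regions with fewer than min_area pixels, then fill holes."""
    bright = np.asarray(bright, dtype=bool)
    labels, _ = ndimage.label(bright)        # 4-connected components by default
    areas = np.bincount(labels.ravel())      # areas[k] = pixel count of label k
    keep = areas >= min_area
    keep[0] = False                          # label 0 is the background
    b_prime = keep[labels]                   # Equation (2)
    return ndimage.binary_fill_holes(b_prime)
```

For example, a ring-shaped detection is kept and its interior is filled, while an isolated bright pixel is discarded when `min_area` is larger than 1.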

3.2 Shadow/road detection based on PR detector

In the marker-controlled watershed transform, external markers are used to mark the background. More specifically, external markers can restrict each object in a certain region according to background information. If there is no prior information about the background, a convenient way of extracting external markers is to compute the watershed lines of the internal markers image, which is adopted in our previous article. However, the precondition is that internal markers can represent objects correctly and entirely. If the internal marker corresponding to a single object falls into several parts, the external markers will separate them in different regions. Consequently, the object will also be segmented into several parts. To solve the problem, we introduce the contextual information to mark the background instead. In other words, external marker extraction is independent of the extracted internal markers and this will improve the robustness of our detection method.

In the built-up areas in SAR images, shadows and roads form black and netlike structures, which provide the main contextual information of buildings. Such structures can be extracted and used as effective external markers. As mentioned in the introduction, building shadows were extracted using the ML method [2], the active-contour-model-based segmentation method [10], the mask-based method [12], etc. More recently, methods based on morphological profiles are used for feature extraction from urban remote sensing data [36, 37] and find use in street tracking from SAR images [38]. Most methods aim at accurate shadow contour segmentation for building reconstruction and they are relatively sophisticated. In our building detection framework, however, the purpose of shadow/road detection is different from them. The following aspects are considered. First, the connectivity of the extracted structure should be kept. Second, the accurate contours of the structure are not required here. Third, simple extraction algorithm with fewer procedures and parameters is preferred. Therefore, the PR detector [39], a simple and flexible method, is adopted to extract the whole netlike structure.

Similar to the CFAR detector, the PR detector also uses a sliding window (see Figure 2). Differently, the central square region (not only the test cell) in the window is used to compute the shadow power and the surrounding annular region is used to compute the clutter power. Dark regions (shadows and roads) in the image are detected, with the test:

\frac{{\hat{μ}}_{ROI}}{{\hat{μ}}_{C}} < λ_{L} \Rightarrow \text{presence of a shadow/road pixel},

(3)

where ${\hat{μ}}_{ROI}$ and ${\hat{μ}}_{C}$ are, respectively, the average power inside the central window and the clutter power estimated in the annular region, and λ_L is the detection threshold. Using this rule, each pixel in the SAR image is classified as a shadow/road pixel or not. The result is a binary image, where 1 indicates a shadow/road pixel and 0 indicates a non-shadow/road pixel. The external marker image B_ex(x, y) is obtained by applying morphological operations such as thinning and skeleton extraction to this binary image.
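The test in Equation (3) can be sketched with box filters, as below. The function name `pr_detector` is hypothetical, the input is assumed to be an intensity (power) image, and the clutter ring is taken as the sliding window minus the guard region. The default sizes follow the 15 × 15, 11 × 11, and 5 × 5 pixel dimensions quoted in the experiments; the morphological thinning step is not shown.

```python
import numpy as np
from scipy import ndimage


def pr_detector(intensity, lam, window=15, guard=11, center=5):
    """Return a boolean map where mean(center) / mean(ring) < lam (Equation (3))."""
    img = np.asarray(intensity, dtype=float)

    def box_sum(k):
        # uniform_filter returns the local mean; multiply by k*k to get the sum.
        return ndimage.uniform_filter(img, size=k, mode="reflect") * (k * k)

    mu_roi = box_sum(center) / center ** 2
    n_ring = window ** 2 - guard ** 2
    mu_c = (box_sum(window) - box_sum(guard)) / n_ring
    # Written as a product to avoid dividing by a zero clutter power.
    return mu_roi < lam * mu_c
```

A narrow dark stripe on a bright background is detected because the central mean is low while the ring mostly covers the bright surroundings.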

3.3 Marker image

With the internal and external marker images, the final marker image is defined as follows:

f_{m} (x, y) = \begin{cases} 0, & B_{in} (x, y) = 1 \text{ or } B_{ex} (x, y) = 1 \\ t_{max}, & \text{otherwise} \end{cases},

(4)

where t_max is the maximum value of the edge strength image of the SAR intensity image I(x, y). Since noise in SAR images is modeled as multiplicative, typical edge detectors for optical images are not suitable here. We adopt the ROEWA detector [40], designed for SAR images, to compute the edge strength image g, which plays the same role here as the gradient image in a typical watershed-transform-based segmentation of optical images.
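Equation (4) translates almost directly into array operations. In the following sketch the function name `marker_image` is hypothetical, and the edge strength image is assumed to be available as an array (the ROEWA computation itself is not shown).

```python
import numpy as np


def marker_image(b_in, b_ex, edge_strength):
    """Equation (4): 0 on internal or external markers, t_max elsewhere."""
    t_max = float(np.max(edge_strength))
    markers = np.asarray(b_in, dtype=bool) | np.asarray(b_ex, dtype=bool)
    return np.where(markers, 0.0, t_max)
```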

4. Modification of the edge strength image

As mentioned before, noise and other local irregularities of the edge strength image usually cause oversegmentation when the watershed transform is directly used. Therefore, the edge strength image must be filtered to remove all the irrelevant minima and obtain meaningful segmentation result. Under the marker-controlled framework, the minima imposition technique [26] is an appropriate choice. The minima imposition technique is a kind of morphological reconstruction, which concerns the filtering of the image minima. It requires a set of markers marking relevant objects or background. It is based on geodesic erosion and reconstruction by erosion, both of which involve a mask image (the image to be processed) and a marker image. In this article, the mask image is the edge strength image g.

The geodesic erosion of size n of the marker image f _m with respect to the mask image g is an iterative process, which has the form

ε_{g}^{(n)} (f_{m}) = ε_{g}^{(1)} [ε_{g}^{(n - 1)} (f_{m})],

(5)

ε_{g}^{(1)} (f_{m}) = ε^{(1)} (f_{m}) \lor g,

(6)

where ε⁽¹⁾ is the elementary erosion operator, ∨ is the point-wise maximum operator, and ε_g⁽⁰⁾(f_m) = f_m. According to (6), a geodesic erosion of size n = 1 first erodes the marker image and then takes the point-wise maximum with the mask image. When the geodesic erosion of f_m with respect to g is iterated until stability is reached, we obtain the reconstruction by erosion of g from f_m:

R_{g}^{ε} (f_{m}) = ε_{g}^{(i)} (f_{m})

(7)

where i is such that $ε_{g}^{(i)} (f_{m}) = ε_{g}^{(i + 1)} (f_{m})$.

Based on the reconstruction by erosion, the imposition of the minima of the edge strength image is performed in two steps:

Step 1. The point-wise minimum between the edge strength image and the marker image is computed: (g + 1) ∧ f_m. The resulting image (g + 1) ∧ f_m is lower than or equal to the marker image.

Step 2. The reconstruction by erosion of (g + 1) ∧ f _m from the marker image f _m is computed as the modified edge strength image $g^{'} = R_{(g + 1) \land f_{m}}^{ε} (f_{m})$ .
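Equations (5) to (7) and the two steps above can be sketched as follows. This is a plain iterative illustration, not an efficient or authoritative implementation: the function names are hypothetical, the elementary erosion is assumed to use a full-connectivity 3-pixel-wide flat structuring element, and the increment of 1 in step 1 is taken literally from the text, which suits integer-valued edge strength images.

```python
import numpy as np
from scipy import ndimage


def reconstruction_by_erosion(marker, mask):
    """Iterate the geodesic erosion of Equations (5)-(6) until stability (Equation (7))."""
    current = np.asarray(marker, dtype=float)
    mask = np.asarray(mask, dtype=float)
    if np.any(current < mask):
        raise ValueError("the marker image must be >= the mask image everywhere")
    footprint = ndimage.generate_binary_structure(current.ndim, current.ndim)
    while True:
        eroded = ndimage.grey_erosion(current, footprint=footprint)
        nxt = np.maximum(eroded, mask)   # point-wise maximum with the mask
        if np.array_equal(nxt, current):
            return current
        current = nxt


def impose_minima(edge_strength, f_m):
    """Minima imposition: step 1 builds (g + 1) ^ f_m, step 2 reconstructs from f_m."""
    g = np.asarray(edge_strength, dtype=float)
    f_m = np.asarray(f_m, dtype=float)
    mask = np.minimum(g + 1.0, f_m)      # step 1
    return reconstruction_by_erosion(f_m, mask)  # step 2
```

On the 1D signal g = [2, 6, 1, 4, 0, 7, 3, 5, 2] with markers at positions 2 and 8 (so f_m is 0 there and 7 elsewhere), this sketch returns [7, 7, 0, 5, 5, 7, 6, 6, 0]: the only remaining minima are the two marker positions, the unmarked minimum at position 4 has been filled, and the peak between the markers is preserved.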

Figure 3 illustrates the imposition of minima on a 1D signal. After filtering with the minima imposition technique, the minima of the markers are imposed on the edge strength image and the other, insignificant minima of the edge strength image are suppressed. According to Equation (4), the minima of the markers correspond to building pixels and shadow/road pixels. The peaks (corresponding to the building boundaries) between the markers are still kept. Therefore, when the watershed transform is applied to the filtered edge strength image, the extracted boundaries are mostly the building boundaries.

Figure 4 gives an example of segmenting a simulated image by applying the watershed transform to its edge strength image and to the modified edge strength image obtained by the minima imposition technique. Figure 4a is a simulated image (297 × 245 pixels) with multiplicative noise following the Gamma distribution. There are four bright objects in this image with different shapes and orientations. In particular, object B has a dark part in it, imitating a strong fluctuation of gray levels. Figure 4b shows the external (red) markers marking the background and the internal (green) markers marking the objects. All these markers are manually extracted. Two internal marker regions are used for object B, since it tends to be detected as two parts by a CFAR detector. Figure 4c is the binary image in which only the marker pixels are bright. Figure 4d shows the edge strength image of Figure 4a obtained by the ROEWA detector. Very strong responses occur at the locations of the object boundaries. Strong responses caused by the dark part in object B are also evident, and many local peaks of edge strength exist in the background. Figure 4e is the gray-level profile of Row 78 of the edge strength image in Figure 4d. This row passes horizontally through objects A and B. In Figure 4e, we can see the peaks corresponding to the boundaries of objects A and B, indicated by p_A and p_B, respectively. We can also find two lower peaks caused by the gray-level fluctuation in object B. At other locations, the values of edge strength are not identical, with many local peaks and minima. Because of these many peaks and minima, whether high or low, oversegmentation (see Figure 4f) occurs when the watershed algorithm is applied to Figure 4d. Figure 4g shows the modified edge strength image obtained with the minima imposition technique, and Figure 4h is the gray-level profile of Row 78 of Figure 4g. According to Figure 4h, the edge strength values become zero at the marker pixels. Real boundary peaks between markers, such as p_A and p_B, are kept. Since the two false peaks in object B fall between two internal markers, they are flattened and merged into one. Other minima between markers are suppressed. In other words, only the meaningful peaks are kept. Figure 4i shows the result of applying the watershed segmentation algorithm to the modified edge strength image. We can see that the boundaries of the four objects are correctly extracted. Moreover, although a watershed line exists within object B, it is easy to merge the two parts. The segmentation result for object B demonstrates that even if a building in a SAR image is detected as several parts, the parts can be merged by our method as long as they are surrounded by correct external markers (the shadow/road structure in this article).

5. Postprocessing

After the modified edge strength image is obtained, the watershed algorithm is applied to it. The segmented objects include:

1) real buildings;
2) non-building objects with strong backscattering;
3) artificial objects caused by the watershed algorithm when a region partitioned by the external markers has no internal markers in it.

Objects of types 2) and 3) are the false alarms to be removed. For 2), we can distinguish them from real buildings based on geometric features such as shape and area. For 3), the objects have no corresponding internal markers. Accordingly, we have the following rules for deciding whether a segmented object is a building.

Rule 1: a building should have corresponding internal markers;

Rule 2: the area of a building is larger than an area threshold;

Rule 3: buildings are linear or L-shaped.
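Rules 1 and 2 can be checked directly on a label image, as in the minimal sketch below. The function name `apply_rules_1_and_2`, the convention that label 0 denotes watershed lines or background, and the parameter `min_area` are assumptions made for illustration; Rule 3 is not covered here.

```python
import numpy as np


def apply_rules_1_and_2(segments, b_in, min_area):
    """Return the labels of segmented objects that satisfy Rules 1 and 2."""
    segments = np.asarray(segments)
    b_in = np.asarray(b_in, dtype=bool)
    kept = []
    for lab in np.unique(segments):
        if lab == 0:                      # assumed: 0 marks boundaries/background
            continue
        region = segments == lab
        has_marker = bool(np.any(region & b_in))        # Rule 1
        large_enough = int(region.sum()) > min_area     # Rule 2
        if has_marker and large_enough:
            kept.append(int(lab))
    return kept
```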

Rules 1 and 2 are easy to check. Rule 3, however, needs further analysis. Instead of accurate shape fitting, we focus on designing an approach to quickly select regions that are approximately linear or L-shaped structures. Since most of these regions have a certain width, the commonly used methods for fitting single-pixel-width lines, such as the Hough transform, are not used directly here. We designed a shape analysis method, called direction correlation analysis (DCA), which is based on the correlation of pixels in a region [22].

The DCA method is designed to test whether a region in a binary image is linear or L-shaped. In [22], the DCA method was used in the stage of internal marker extraction to determine whether a region detected by the CFAR detector corresponds to a building or not. With further experiments we find that if a building is detected as several parts, the DCA method will remove them and the building will be missed. Therefore, it is more reasonable to use DCA to determine whether a segmented object is a building or not in the postprocessing stage. The main idea of the DCA method is as follows. Suppose we have a region corresponding to a candidate object in a binary image. For an arbitrary pixel (x, y) in this region, we can draw a line passing through (x, y) with angle θ in the image and the line intersects the image boundaries. The number of pixels both in the region and along the line is defined as the length of the line. With different θ, we can draw a number of lines and find the longest one. We denote the angle of the longest line going through (x, y) by θ _{(x, y)}. If a region is approximately a linear structure, it has one major direction and then pixels in it will have similar θ _{(x, y)}. If a region is approximately an L-structure, it will have two major directions with an angle difference close to 90°. Based on this, we can define the measurements of direction correlation of a region R as follows:

DC_{1} (R) = \mathrm{Var} \{θ_{(x_{i}, y_{i})} \mid (x_{i}, y_{i}) \in R\} / N_{pix}^{R}

(8)

DC_{2} (R) = \mathrm{Var} \{|θ_{(x_{i}, y_{i})} - 45^{\circ}| \mid (x_{i}, y_{i}) \in R\} / N_{pix}^{R}

(9)

where $N_{pix}^{R}$ is the number of pixels of R. According to Equations (8) and (9), a linear region R has a low DC₁(R), and an L-shaped region R has a low DC₂(R). If both DC₁(R) and DC₂(R) are high, R is unlikely to be a linear or L-shaped structure. Therefore, given a threshold T_DC, if DC₁(R) < T_DC or DC₂(R) < T_DC, R is retained; otherwise, R is removed.

When computing DC₁(R) and DC₂(R), it is difficult to quickly determine θ_{(x, y)} for each pixel. To solve this problem, the Radon transform is applied to a local window centered at each pixel (see Figure 5). The result of the Radon transform gives the lengths of the lines passing through this pixel along different directions. Since the Radon transform can be performed efficiently, the computation time is greatly reduced.
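The measurements of Equations (8) and (9) can be illustrated with the sketch below. Note that it does not use the Radon transform: for simplicity, it samples discrete lines through each pixel directly within a local window, which is a substitute technique chosen for this illustration. The function names, the window half-length `half_len`, the tie-breaking rule (the first angle with the maximum length wins), and the use of degrees for θ are all assumptions, so the resulting values are not necessarily on the same scale as the threshold T_DC quoted in the experiments.

```python
import numpy as np


def dominant_angles(region, angles_deg=tuple(range(0, 180, 10)), half_len=10):
    """For each region pixel, the angle (degrees) of the longest line through it."""
    region = np.asarray(region, dtype=bool)
    rows, cols = np.nonzero(region)
    steps = np.arange(-half_len, half_len + 1)
    best_len = np.full(rows.size, -1)
    best_angle = np.zeros(rows.size)
    for ang in angles_deg:
        t = np.deg2rad(ang)
        # Step one pixel at a time along the major axis so no pixel is counted twice.
        major = max(abs(np.cos(t)), abs(np.sin(t)))
        dr = np.rint(-steps * np.sin(t) / major).astype(int)  # rows grow downwards
        dc = np.rint(steps * np.cos(t) / major).astype(int)
        rr = rows[:, None] + dr[None, :]
        cc = cols[:, None] + dc[None, :]
        inside = (rr >= 0) & (rr < region.shape[0]) & (cc >= 0) & (cc < region.shape[1])
        hits = np.zeros(rr.shape, dtype=bool)
        hits[inside] = region[rr[inside], cc[inside]]
        length = hits.sum(axis=1)          # number of region pixels along the line
        better = length > best_len
        best_len[better] = length[better]
        best_angle[better] = ang
    return best_angle


def direction_correlation(region, **kwargs):
    """Return (DC1, DC2) following Equations (8) and (9)."""
    theta = dominant_angles(region, **kwargs)
    n_pix = theta.size
    dc1 = np.var(theta) / n_pix
    dc2 = np.var(np.abs(theta - 45.0)) / n_pix
    return dc1, dc2
```

For a horizontal bar, every pixel selects the same angle, so DC₁ is zero; for an axis-aligned L-shaped region, the two arms select angles 90° apart, so DC₁ is large while DC₂ stays small.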

6. Experiments and analysis

6.1 Dataset description

Different test areas were chosen from the city of Hefei, China. The X-band airborne SAR data were provided by the East China Research Institute of Electronic Engineering, with a spatial resolution of 1 m in both azimuth and range. To evaluate the performance of the proposed method under a variety of conditions, the test SAR data were chosen based on the following considerations. First, in terms of scene complexity, we consider both dense urban areas and industrial areas. Second, individual buildings with different dimensions, roof structures, and materials are considered. They appear as bright rectangles, thin lines, or L-shaped structures with different intensity fluctuations in the test SAR images.

6.2 Building detection results over highly urban areas

Figure 6a–h depicts the whole experimental process of the proposed method on the first test site. Figure 6a is the initial SAR image, with a size of 220 × 187 pixels. The distances between adjacent buildings are small, and the gray values within each building fluctuate greatly. In particular, building A (marked by a green rectangle) appears as several bright blob-like regions. Figure 6b shows the result of bright pixel detection, which is used as the internal markers. In the OS-CFAR detection, a 25 × 25-pixel sliding window with a 24 × 24-pixel guard area is used. The dimensions of the sliding window and guard region are selected according to the building sizes in the SAR images. If the guard area is too small to cover the target, target pixels leak into the clutter region and bias the parameter estimation. To avoid merging adjacent buildings into one, a low P_fa = 0.01 is used here. After small regions are removed and region filling is applied, the internal marker image is obtained. From Figure 6b, we can see that building A is divided into many parts, and some other buildings have a similar problem. Figure 6c shows the external marker image obtained by the PR detector with λ_L = 1 followed by morphological operations. The dimensions of the sliding window, guard region, and central region are 15 × 15 pixels, 11 × 11 pixels, and 5 × 5 pixels, respectively. The netlike external markers effectively partition the image into regions, and each building, represented by its internal markers, is located in one of those regions. Figure 6d is the edge strength image computed by the ROEWA operator; strong responses occur at the building boundaries. In Figure 6e, both the internal and external markers are superimposed on the edge strength image. Combining the internal and external markers clearly constrains the regions in which building boundaries may lie. Therefore, by applying the minima imposition technique, peaks (corresponding to the building boundaries) in the constrained regions are kept, while other insignificant peaks are suppressed. Figure 6f shows the segmentation result of the watershed transform following the minima imposition technique. The gray regions represent the segmented regions and the white contours represent the region boundaries. Some regions share parts of their boundaries (e.g., building A). This is because, when two or more parts of an internal marker are surrounded by a closed external marker, the watershed transform connects them through a common boundary segment. We can therefore easily merge such regions to prevent a single building from being partitioned into several parts. Then, according to Rules 1 and 2 in Section 5, small segmented objects and objects with no corresponding internal markers are eliminated (see Figure 6g). Finally, the DCA method is applied to the remaining objects. The purpose of DCA is not accurate direction estimation, so only the projections on the angle set 0°, 10°, 20°, …, 170° are computed. The direction correlation measurements DC₁ and DC₂ of each region are shown in Figure 7. Since most buildings are linear, the corresponding DC₁ and DC₂ are very low; only a few DC₁ and DC₂ values are quite high. With the threshold T_DC = 0.15, four false alarms are removed. The value of T_DC is set according to experiments on a number of buildings from images acquired by the same airborne SAR system. After postprocessing, the final detected building boundaries are given in Figure 6h.
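The first two postprocessing rules mentioned above (discarding small objects and objects without an internal marker) can be illustrated with a minimal Python sketch. It is not the authors' Matlab implementation: the `Region` record, the `apply_rules` helper, and the area threshold of 50 pixels are hypothetical and chosen only for illustration. The list of DCA projection angles follows the angle set 0°, 10°, …, 170° stated in the text.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical summary of one segmented region."""
    label: int
    area: int                  # number of pixels in the region
    has_internal_marker: bool  # True if an internal marker lies inside it


def apply_rules(regions, min_area):
    """Keep regions that are large enough and contain an internal marker.

    min_area is an assumed threshold; the article does not give its value
    in this section.
    """
    return [r for r in regions
            if r.area >= min_area and r.has_internal_marker]


# Projection angles (degrees) used by DCA: 0, 10, ..., 170.
DCA_ANGLES = list(range(0, 180, 10))

# Made-up regions: one valid, one too small, one without a marker.
regions = [
    Region(label=1, area=420, has_internal_marker=True),
    Region(label=2, area=12, has_internal_marker=True),
    Region(label=3, area=300, has_internal_marker=False),
]
kept = apply_rules(regions, min_area=50)
print([r.label for r in kept])
```

Only the regions that survive this filter would then be passed to the DCA shape analysis.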

To further explain how the contextual information helps improve the detection performance, we compare the new method with our previous work. Both detection methods consist of several steps and differ in the concrete algorithms used, such as CFAR detection and postprocessing. However, we think the essential improvement of the new method is that the external markers are computed from the contextual information. Therefore, we mainly consider the effect of different external markers on the detection results, and keep the other detection steps the same. Figure 6i gives the external marker image obtained by computing the watershed transform of the internal marker image, as in our previous method. Take building A as an example: its internal markers consist of several parts. Consequently, the external markers separate these parts (see Figure 6i). Building A is therefore segmented into small parts (see Figure 6j), which are removed in postprocessing (see Figure 6k). Although these parts could be clustered by setting a distance threshold, it is difficult to choose the threshold adaptively, especially when buildings are densely distributed. In the new method, this problem is solved by introducing external markers determined by the contextual information. Therefore, we think the new method is more robust and practical. In addition, the new method performs better in boundary localization. Figure 6l gives the reference building boundaries, which are extracted manually.

Figure 8 gives the experimental results for the second test site. Figure 8a is the initial SAR image, with a size of 206 × 144 pixels. This area is characterized by linear buildings with fluctuating gray-level values. The dimensions of the sliding windows for the CFAR and PR detectors, as well as the other thresholds, are the same as those in the first experiment. Figure 8b shows the edge strength image with the internal (white pixels) and external (black pixels) markers. Figure 8c shows the objects segmented by the watershed transform. After small regions are removed, the DCA method is applied to 15 regions. Figure 9 shows the results of the DCA method. Regions with either DC₁ or DC₂ higher than T_DC are eliminated. Figure 8d shows the final result with the detected building boundaries. One false alarm remains in the final detection result, caused by strong backscattering at the road boundary. Since this false alarm has a shape and size similar to those of real buildings, it is difficult to remove. Figure 8e–g shows the results obtained by our previous method. Some buildings are segmented into two parts (see Figure 8f), and the small parts are removed in postprocessing. Therefore, in the final result (Figure 8g), two buildings are missed and some boundaries do not cover the whole objects. Figure 8h gives the reference building boundaries, which are also extracted manually.

Figure 10 gives the experimental results for the third test site. Figure 10a is the initial SAR image, with a size of 312 × 363 pixels. There are three industrial buildings in this area. Reflections from the flat roofs are hardly visible. Very bright lines appear along the surface discontinuity formed by the building and the ground, due to double-bounce reflections along the building walls. The three buildings therefore appear as L-shaped lines in the SAR image. Since the background is relatively simple and the buildings are much brighter than the background, a low P_fa of 0.001 is set to reduce false alarms. The dimensions of the sliding windows for the CFAR and PR detectors, as well as the other thresholds, are the same as those in the first experiment. Figure 10b shows the edge strength map with the internal (white pixels) and external (black pixels) markers. Figure 10c shows the objects segmented by the watershed transform. Only five regions are segmented. The region at the top left has no corresponding internal marker. After this region is removed, the DCA method is applied to the four remaining regions. Figure 11 shows the results of the DCA method. The region with high DC₁ and DC₂ is eliminated. Figure 10d shows the final result with the detected building boundaries. Figure 10e–g shows the results obtained by our previous method. Since the internal markers correspond perfectly to the buildings, the external markers correctly surround every internal marker, and the final detection result is also satisfactory. Figure 10h gives the reference building boundaries, which are also extracted manually.

The experiments on test sites I, II, and III examine in detail the performance of the proposed method in three representative cases of building detection from urban SAR images. The results validate the effectiveness of the proposed method and its superiority over our previous method. Furthermore, we apply our method to another SAR image with a more complex scene. Figure 12a is the initial SAR image, with a size of 312 × 363 pixels, and Figure 12b shows the manually extracted reference building boundaries. This test site contains buildings of different sizes, shapes, and orientations. Sixty-one regular buildings (marked A1, A2, …, A61) and three other buildings (marked B1, B2, and B3) are considered here. Figure 12c is the edge strength image with the markers superimposed on it. The parameters of the CFAR and PR detectors are the same as those in the first experiment. Figure 12d shows the objects segmented by the watershed transform. After postprocessing, some false alarms are removed. Figure 12e gives the final detection result and Figure 12f shows the extracted building boundaries. Most buildings are correctly detected, although some are merged into one because they are too close to each other, e.g., A1 and A14. The buildings at the bottom right of the image (such as A55–A61) show fluctuating gray-level values. A59 is missed because no internal marker represents it, and B2 is missed because of its small size. Two false alarms remain near the road/shadow boundary in the final detection result. As can be seen, our method is not sensitive to building orientation. The size and shape of the buildings also have no prominent influence on the segmentation, as shown in Figure 12d. However, in the postprocessing stage of this experiment, it is somewhat more difficult to determine the threshold for DC₁ and DC₂ because of the differences in building shape. In particular, the values of DC₁ and DC₂ for buildings whose shapes are less strictly linear or L-shaped (such as B1 and B3) are slightly higher. A suitable threshold should be chosen to keep these buildings while removing false alarms. Therefore, we think that more rules should be considered when detecting buildings with complex shapes.

6.3 Quantitative performance evaluation of the proposed method

To quantitatively evaluate the performance of the proposed method, we consider metrics for object space and for boundary precision. All comparisons are made between the detected buildings and the manually extracted reference buildings.

The metrics for object space are defined as follows [41]:

▪ TP (True Positive): a detected object that is also in the reference.
▪ FP (False Positive): a detected object that is not in the reference; also termed a false alarm.
▪ FN (False Negative): an object in the reference that is not detected.

To evaluate performance, the numbers of TP, FP and FN objects are counted, and then the following metrics are computed:

▪ Detection rate:
$DR = \frac{TP}{TP + FN}$
(10)
▪ False alarm rate:
$FAR = \frac{FP}{TP + FP}$
(11)
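As a small worked example of Equations (10) and (11), the following Python sketch computes both rates from object counts. The function names and the counts used in the example are hypothetical; they are chosen for illustration and are not taken from Table 1.

```python
def detection_rate(tp: int, fn: int) -> float:
    """Equation (10): DR = TP / (TP + FN)."""
    if tp + fn == 0:
        raise ValueError("the reference contains no objects")
    return tp / (tp + fn)


def false_alarm_rate(tp: int, fp: int) -> float:
    """Equation (11): FAR = FP / (TP + FP)."""
    if tp + fp == 0:
        raise ValueError("no objects were detected")
    return fp / (tp + fp)


# Hypothetical counts: 8 buildings detected correctly, 1 false alarm,
# and no missed buildings.
tp, fp, fn = 8, 1, 0
dr = detection_rate(tp, fn)
far = false_alarm_rate(tp, fp)
print(f"DR = {dr:.1%}, FAR = {far:.1%}")
```

Note that FAR as defined here is normalized by the number of detected objects, not by the number of reference objects, so DR and FAR do not share a denominator.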

To evaluate the precision of the detected building boundaries, the following steps are performed. A similar method can be found in [42].

▪ A binary image is obtained from the reference building boundaries. Boundary pixels are assigned a value of 1 and non-boundary pixels a value of 0. A distance image $f_{dist}$ is then computed from the binary image.
▪ The locations of the detected boundary pixels are recorded as $\{(x_{b}^{i}, y_{b}^{i})\}$, $i = 1, \ldots, N_{b}$, where $N_{b}$ is the number of detected boundary pixels.
▪ The parameter ${\bar{D}}_{offset}$, which measures the distance between the detected boundary and the reference boundary, is computed as
${\bar{D}}_{offset} = \frac{\sum_{i = 1}^{N_{b}} f_{dist} (x_{b}^{i}, y_{b}^{i})}{N_{b}}$
(12)
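The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function name `mean_boundary_offset` is hypothetical, the distance is assumed to be Euclidean (the text does not name the metric), and instead of building the full distance image the sketch evaluates the distance to the nearest reference boundary pixel only at the detected boundary pixels, which yields the same values that Equation (12) reads from the distance image.

```python
import math


def mean_boundary_offset(reference_pixels, detected_pixels):
    """Mean distance from each detected boundary pixel to the nearest
    reference boundary pixel, as in Equation (12).

    Both arguments are lists of (x, y) pixel coordinates. Euclidean
    distance is an assumption of this sketch.
    """
    if not reference_pixels or not detected_pixels:
        raise ValueError("both boundaries must contain at least one pixel")
    total = 0.0
    for x, y in detected_pixels:
        # Value of the distance image at (x, y), computed by brute force.
        total += min(math.hypot(x - rx, y - ry)
                     for rx, ry in reference_pixels)
    return total / len(detected_pixels)


# Toy example: the detected boundary deviates by one pixel at one location.
reference = [(0, 0), (1, 0), (2, 0)]
detected = [(0, 0), (1, 1), (2, 0)]
print(mean_boundary_offset(reference, detected))
```

The brute-force search is quadratic in the number of boundary pixels; for large images a precomputed distance transform of the reference boundary image would be the more practical choice.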

Table 1 gives the performance statistics for all the test sites. The DR is 100% for test sites I, II, and III and 95.3% for test site IV. The FAR is 0 for test sites I and III, 11.1% for test site II, and 0.2% for test site IV. An overall DR of 96.6% and FAR of 2.3% show that the proposed method performs well in detecting buildings and removing false alarms. As for boundary precision, ${\bar{D}}_{offset}$ is 0.5 pixel for test site I and 0.6 pixel for test site III. Slightly higher values of 0.7 pixel are obtained for test sites II and IV, due to fluctuating gray-level values and scene complexity, respectively. The results indicate that the extracted building boundaries are very close to the reference boundaries. In other words, our method also performs well in precise boundary localization, which is important for building dimension extraction or reconstruction.

Table 1 Performance of the proposed method

Table 2 shows the running time of the proposed method on the different test sites. All the experiments were implemented in Matlab (version 7.5.0) on a Pentium(R) D CPU at 2.80 GHz with 1 GB of RAM. According to Table 2, marker extraction and postprocessing are the two most time-consuming phases. Marker extraction is computed locally with a sliding window. Since the test images in our experiments are small and the window size (which depends on the building size) is not large, the computational time of marker extraction is acceptable. However, if a large image is processed or a large window is used, fast algorithms should be designed for the CFAR and PR detectors. Similar solutions can be found in [43]. The computational load of postprocessing increases as the number of targets increases (see the results for test sites I and IV) or as the targets become larger (see the results for test site III).

Table 2 Running time of the proposed method

7. Conclusion

Since most existing methods of building detection from SAR images are not robust to complex scenes or to varying building appearances, this article proposes a method for detecting buildings from a single high-resolution SAR image, with the aim of extracting buildings with whole and accurate boundaries from the built-up area. By introducing a general framework based on the marker-controlled watershed transform, our method makes use of not only the characteristics of the buildings, namely strong scattering and high gray-level values, but also the characteristics of the context, namely the dark netlike structures formed by roads and shadows. As the experimental results show, combining the characteristics of buildings and background overcomes two problems: linking neighboring buildings in complex scenes, and dividing a building into several parts when its gray-level values fluctuate greatly. In addition, the new method yields closed building boundaries. Since the ROEWA edge detector, an edge detector for SAR images with good localization performance, is used, the detected building boundaries are also accurately localized. Furthermore, based on the typical shapes of buildings in SAR images, a shape analysis method called direction correlation analysis is used to remove false alarms. The quantitative performance evaluation validates that the proposed method is effective, with a high detection rate, a low false-alarm rate, and good localization performance. The detection results can be useful for extracting the geometrical information of buildings.

In the future, we expect to continue refining and validating our research on a wider set of SAR imagery. Although using markers introduces knowledge about the buildings and their surroundings, how to set the thresholds for marker extraction automatically is still an open problem. The relation between these thresholds and knowledge of the images (e.g., resolution), as well as knowledge of the objects in the real world (e.g., the distribution rules of the objects), can be considered to improve threshold setting. For example, if knowledge of the buildings of interest, such as their sizes and spacing, is available, building sizes in the real world can be converted to pixels in the image space, since the resolutions of SAR images are usually provided. Thus, parameters such as the window sizes of the detectors and the area threshold used to remove false alarms can be set adaptively. Moreover, this article provides a general framework for building detection. So far, we have mainly applied it to detecting buildings with simple shapes. More complicated scenes may require more complex rules. How to extend the framework to buildings with more complex shapes, and how to detect buildings in environments with strong disturbances, e.g., tree clutter, also need further research.
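The size-to-pixel conversion suggested above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the helper names `size_in_pixels` and `odd_window`, the 30 m building extent, the 0.5 m resolution, and the 4-pixel clutter ring are hypothetical and do not describe how the parameters in the experiments were chosen.

```python
import math


def size_in_pixels(size_m: float, resolution_m: float) -> int:
    """Convert a real-world extent in meters to a pixel count,
    rounding up so that the whole extent is covered."""
    if size_m <= 0 or resolution_m <= 0:
        raise ValueError("size and resolution must be positive")
    return math.ceil(size_m / resolution_m)


def odd_window(n_pixels: int) -> int:
    """Return the smallest odd size >= n_pixels, so that a sliding
    window has a well-defined central pixel."""
    return n_pixels if n_pixels % 2 == 1 else n_pixels + 1


# Hypothetical prior knowledge: largest building extent and image resolution.
largest_building_m = 30.0
resolution_m = 0.5
clutter_ring_px = 4  # assumed width of the clutter ring around the guard area

guard_px = odd_window(size_in_pixels(largest_building_m, resolution_m))
window_px = guard_px + 2 * clutter_ring_px
print(guard_px, window_px)
```

The same conversion could be applied to an area threshold by converting a minimum building footprint in square meters to a pixel count using the squared resolution.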

References

[1] Soergel U: Radar remote sensing of urban areas. Springer; 2010.
[2] Chellappa R: Advanced automatic target recognition. Center for automation research, University of Maryland, College Park; 1998.
[3] Tupin F, Maitre H, Mangin JF, Nicolas JM, Pechersky E: Detection of linear features in SAR images: application to road network extraction. IEEE Trans Geosci Rem Sens 1998, 36: 434-453. 10.1109/36.662728
[4] Tupin F: Extraction of 3D information using overlay detection on SAR images. In 2nd GRSS/ISPRS Joint Workshop on Remote Sensing and Data Fusion over Urban Areas. Berlin, Germany; 2003:72-76.
[5] Tupin F, Roux M: Detection of building outlines based on the fusion of SAR and optical Features. ISPRS J Photogramm Remote Sens 2003, 58: 71-82. 10.1016/S0924-2716(03)00018-2
[6] Simonetto E, Oriot H, Garello R: Extraction of industrial buildings from stereoscopic airborne radar images. In Proceedings of SPIE Conference No. 4543 on SAR Image Analysis, Modeling, and Techniques IV. Toulouse, France; 2002. SPIE 4543, 121–129
[7] Simonetto E, Oriot H, Garello R: Rectangular building extraction from stereoscopic airborne radar images. IEEE Trans Geosci Remote Sens 2005, 43: 2386-2395.
[8] Soergel U, Michaelsen E, Thiele A: Radargrammetric extraction of building features from high resolution multi-aspect SAR data. In 2006 IEEE International Conference on Geoscience and Remote Sensing Symposium. Denver, CO, USA; 2006:3635-3638.
[9] Xu F, Jin YQ: Automatic reconstruction of building objects from multiaspect meter-resolution SAR images. IEEE Trans Geosci Remote Sens 2007, 45: 2336-2353.
[10] Hill RD, Moate CP, Blacknell D: Urban scene analysis from SAR image sequences. In Proceedings of the SPIE Algorithm Synthetic Aperture Radar Imagery XIII. Orlando, USA; 2006:623702-1-623702-12. 6237
[11] Bennett AJ, Blacknell D: The extraction of building dimensions from high resolution SAR imagery. In Proceedings of the International Radar Conference. Huntsville, Alabama, USA; 2003:182-187.
[12] Bolter R, Leberl F: Shape-from-shadow building reconstruction from multiple view SAR images. In 24th Workshop of the Austrian Association for Pattern Recognition. Villach, Carinthia, Austria; 2000:199-206.
[13] Michaelsen E, Soergel U, Thoennessen U: Perceptual grouping for automatic detection of man-made structures in high-resolution SAR data. Pattern Recognit Lett 2006, 27: 218-225. 10.1016/j.patrec.2005.08.002
[14] Soergel U, Thoennessen U, Brenner A, Stilla U: High-resolution SAR data: new opportunities and challenges for the analysis of urban areas. IEE Proc Radar Sonar Navigation 2006, 153: 294-300. 10.1049/ip-rsn:20045088
[15] Soergel U, Michaelsen E, Thiele A, Cadario E, Thoennessen U: Stereo analysis of high-resolution SAR images for building height estimation in cases of orthogonal aspect directions. ISPRS J Photogramm Remote Sens 2009, 64: 490-500. 10.1016/j.isprsjprs.2008.10.007
[16] Guida R, Iodice A, Riccio D, Stilla U: Model-based interpretation of high-resolution SAR images of buildings. IEEE J Sel Top Appl Earth Observations Rem Sens 2008, 1: 107-119.
[17] Guida R, Iodice A, Riccio D: Height retrieval of isolated building from single high-resolution SAR images. IEEE Trans Geosci Remote Sens 2010, 48: 2967-2979.
[18] Ferro A, Brunner D, Bruzzone L: Building detection and radar footprint reconstruction from single VHR SAR images. In Proc. IEEE IGARSS. Honolulu, HI; 2010:292-295.
[19] Brunner D, Lemoine G, Bruzzone L, Greidanus H: Building height retrieval from VHR SAR imagery based on an iterative simulation and matching technique. IEEE Trans Geosci Remote Sens 2010, 48: 1487-1504.
[20] Cellier F, Oriot H, Nicolas J-M: Hypothesis management for building reconstruction from high resolution InSAR imagery. In Proc. IEEE IGARSS. Denver, CO; 2006:3639-3642.
[21] Sportouche H, Tupin F, Denise L: Extraction and three dimensional reconstruction of isolated buildings in urban scenes from high-resolution optical and SAR spaceborne images. IEEE Trans Geosci Remote Sens 2011, 49: 3932-3946.
[22] Zhao LJ, Gao G, Kuang GY: Building Detection from a Single High resolution SAR image using Watershed Transform. J Astronaut 2008, 29: 1984-1990.
[23] Mayer H: Automatic object extraction from aerial imagery-a survey focusing on buildings. Comput Vis Image Understand 1999, 74: 138-149. 10.1006/cviu.1999.0750
[24] Hill RD, Moate CP, Blacknell D: Estimating building dimensions from synthetic aperture radar image sequences. IET Radar Sonar Navigation 2008, 2: 189-199. 10.1049/iet-rsn:20070077
[25] Quartulli M, Datcu M: Stochastic geometrical modeling for built-up area understanding from a single SAR intensity image with meter resolution. IEEE Trans Geosci Rem Sens 2004, 42: 1996-2003.
[26] Soille P: Morphological image analysis: principles and applications. Second edition. Springer-Verlag, Berlin, New York; 2003.
[27] Vincent L, Soille P: Watersheds in digital spaces: an efficient algorithm based on immersion simulations. IEEE Trans Pattern Anal Mach Intell 1991, 13: 583-598. 10.1109/34.87344
[28] Marthon P, Paci B, Castan EC: Finding the structure of a satellite image. Proc Europ Image Signal Process Rem Sens SPIE 1994, 2315: 669-679. 10.1117/12.196767
[29] Najman L, Schmitt M: Geodesic saliency of watershed contours and hierarchical segmentation. IEEE Trans Pattern Anal Mach Intell 1996, 18: 1163-1173.
[30] Novak LM, Hesse SR: On the performance of order-statistics CFAR detectors. In IEEE Twenty-Fifth Asilomar Conference on Signals, System, and Computers. Pacific Grove, California, USA; 1991:835-840. 2
[31] Bisceglie MD, Galdi C: CFAR detection of extended objects in high-resolution SAR Images. IEEE Trans Geosci Remote Sens 2005, 43: 833-842.
[32] Kuttikkad S, Chellappa R: Non-Gaussian CFAR techniques for target detection in high resolution SAR images. In Proceedings of ICIP. Austin, Texas, USA; 1994:914-910. 1
[33] Hofele FX: An innovative CFAR algorithm. In 2001 CIE International Conference on Radar. Beijing, China; 2001:329-333.
[34] Shor M, Levanon N: Performance of order statistics CFAR. IEEE Trans Aerosp Electron Syst 1991, 27: 214-224. 10.1109/7.78295
[35] Ritcey J: An Order-Statistics-Based CFAR for SAR Applications. Electrical Engineering Dept, University of Washington, Seattle, WA; 1990. September
[36] Benediktsson JA, Pesaresi M, Arnason K: Classification and feature extraction for remote sensing images from urban areas based on morphological transformations. IEEE Trans Geosci Remote Sens 2003, 41: 1940-1949. 10.1109/TGRS.2003.814625
[37] Mura MD, Benediktsson JA, Waske B, Bruzzone L: Morphological attribute profiles for the analysis of very high resolution images. IEEE Trans Geosci Remote Sens 2010, 48: 3747-3762.
[38] Sigurjonsson SO, Benediktsson JA, Sveinsson JR, Lisini G, Gamba P, Chanussot J: Street tracking based on SAR data from urban areas. In Proceedings of IGARSS. Seoul, Korea; 2005:1273-1276.
[39] Lombardo P, Sciotti M, Kaplan LM: SAR prescreening using both target and shadow information. In Proc. IEEE Radar Conference. Atlanta, USA; 2001:147-152.
[40] Fjørtoft R, Marthon P, Lopès A, Cubero-Castan E: An optimum multiedge detector for SAR image segmentation. IEEE Trans Geosci Remote Sens 1998, 36: 793-802. 10.1109/36.673672
[41] Shufelt JA: Performance evaluation and analysis of monocular building extraction from aerial imagery. IEEE Trans Pattern Anal Mach Intell 1999, 21: 311-326. 10.1109/34.761262
[42] Dellepiane S, De Laurentiis R, Giordano F: Coastline extraction from SAR images and a method for the evaluation of the coastline precision. Pattern Recognit Lett 2004, 25: 1461-1470. 10.1016/j.patrec.2004.05.022
[43] Gao G, Liu L, Zhao LJ, Shi GT, Kuang GY: An adaptive and fast CFAR algorithm based on automatic censoring for target detection in high-resolution SAR images. IEEE Trans Geosci Remote Sens 2009, 47: 1685-1697.

Acknowledgements

This study was supported by the National Natural Science Foundation of China under Grant no. 61201338. The authors acknowledge the anonymous reviewers for their comments, which helped to improve the article.

Author information

Authors and Affiliations

School of Electronic Science and Engineering, National University of Defense Technology, 47 Yanwachi, Changsha, Hunan, 410073, China
Lingjun Zhao & Gangyao Kuang
Southwest Electronics and Telecommunication Technology Research Institute, Wuhou, Chengdu, Sichuan, 610041, China
Xiaoguang Zhou

Corresponding author

Correspondence to Lingjun Zhao.

Additional information

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Zhao, L., Zhou, X. & Kuang, G. Building detection from urban SAR image using building characteristics and contextual information. EURASIP J. Adv. Signal Process. 2013, 56 (2013). https://doi.org/10.1186/1687-6180-2013-56

Received: 02 July 2012
Accepted: 18 February 2013
Published: 20 March 2013
DOI: https://doi.org/10.1186/1687-6180-2013-56
