
Optimal guidance whale optimization algorithm and hybrid deep learning networks for land use land cover classification

Abstract

Satellite image classification provides information about land use land cover (LULC), which is required in many applications such as urban planning and environmental monitoring. Recently, deep learning techniques have been applied to satellite image classification and achieved high efficiency. Existing techniques for satellite image classification suffer from overfitting because the convolutional neural network (CNN) model generates a large number of features. This research proposes the optimal guidance-whale optimization algorithm (OG-WOA) technique to select the relevant features and reduce the overfitting problem. The optimal guidance technique increases the exploitation of the search by changing the position of the search agent relative to the best fitness value. This increase in exploitation helps to select the relevant features and avoid overfitting. The input images are normalized and applied to the AlexNet–ResNet50 model for feature extraction. The OG-WOA technique is applied to the extracted features to select the relevant ones. Finally, the selected features are classified using Bi-directional long short-term memory (Bi-LSTM). The proposed OG-WOA–Bi-LSTM technique achieves an accuracy of 97.12% on AID, 99.34% on UCM, and 96.73% on NWPU, whereas the SceneNet model achieves 89.58% on AID and 95.21% on NWPU.

1 Introduction

Satellite images have gained interest from various fields, including government and business, for purposes such as biodiversity monitoring, agriculture, surface-change detection, forestry, and weather. Recent research in remote sensing has applied deep learning techniques to satellite imagery to extract useful information [1]. Continuous efforts have been made to extract more discriminative features for satellite image classification. Traditional methods focus heavily on handcrafted features such as color and texture. Mid-level techniques were developed to build richer representations that support high-order statistical methods [2]. In remote sensing, scene classification of images is a challenging and important task in real applications such as urban planning from High-Spatial Resolution (HSR) imagery, environmental monitoring, geographic image retrieval, geospatial object detection, and natural hazard detection [3]. In many real-world applications of remote sensing, scene classification is an important step. Deep neural networks, especially CNNs, are considered state-of-the-art for satellite image classification. However, CNNs require large amounts of labeled data for training and are prone to overfitting [4, 5].

Recently, CNN-based models have made great progress in super-resolution (SR) of remote sensing images. CNN-based methods learn useful feature representations from plenty of low- and high-resolution counterparts. In remote sensing images, many similar ground targets recur within the image itself, both across different scales and at the same scale [6, 7]. The CNN model is efficient for the identification and classification of satellite images from remote sensing imagery. The traditional CNN model has two limitations: (1) insufficient training data and (2) imbalanced distribution of the training and validation sets [8, 9]. Existing CNN models provide excellent performance in satellite image classification, but the structure of the CNN model is becoming more complex, and low-level feature learning is difficult to interpret [10]. The objectives and contributions of this research are as follows:

  1. The Optimal Guidance-Whale Optimization Algorithm (OG-WOA) is applied to increase the exploitation in the search process and select relevant features for classification. The OG-WOA technique changes the position of the search agent relative to the best fitness value to increase the exploitation.

  2. The AlexNet and ResNet50 models are applied to extract features from the input images for efficient feature extraction. The OG-WOA technique is applied to select relevant features for classification, which is performed by Bi-directional Long Short-Term Memory (Bi-LSTM).

  3. The proposed OG-WOA–Bi-LSTM technique has higher efficiency than existing feature selection and deep learning techniques because it selects relevant features and avoids overfitting.

The organization of the paper is given as follows: Literature review of the satellite image classification is given in Sect. 2, and the OG-WOA technique explanation is given in Sect. 3. The simulation result is given in Sect. 4, and the result and discussion are presented in Sect. 5. The conclusion of this research paper is given in Sect. 6.

2 Literature survey

Recently, CNN-based models have been widely applied for scene classification in satellite images due to their efficiency. Recent CNN techniques in satellite image classification are reviewed here to understand their performance.

Xie et al. [11] applied label augmentation to process the data, and a joint label was assigned to each generated image to consider data augmentation and labeling at the same time. The intra-class diversity of the training set was increased by the augmented samples, which were applied to the classification process. Kullback–Leibler (KL) divergence was applied to constrain the output distributions of two samples of the same category, generating consistent output distributions. The KL-divergence approach provides considerable performance in satellite image classification but lower efficiency when classes overlap. Bazi et al. [12] applied vision transformers, considered state-of-the-art in natural language processing, for satellite image classification. A multi-head attention technique was used as a building block to provide long-range contextual relations between pixels in the images. Position embeddings were applied to the patches to track their positions, and the resulting sequence was fed to the multi-head attention technique. The softmax classification layer was applied to the token sequence for classification. The multi-head attention technique suffers from a vanishing gradient problem. Xu et al. [13] applied a Graph Convolutional Network (GCN) with deep feature aggregation for satellite image classification. A CNN pre-trained on ImageNet was applied for multi-layer feature extraction, and the GCN model was applied to reveal patch-to-patch correlations in the convolutional feature maps, extracting more refined features. Multiple features were integrated using a weighted concatenation method based on three weighting coefficients, and the semantic classes of query images were determined using a linear classifier. The GCN model's performance was affected by the overfitting problem of the CNN model.

Alhichri et al. [14] applied a deep attention CNN model for satellite image classification, where an attention mechanism computes a new feature map as a weighted average of the original feature maps. EfficientNet-B3-Attn-2 was an attention technique built on a pre-trained CNN model for satellite image classification. A dedicated branch was applied to measure the required weights, and end-to-end backpropagation was used to learn the CNN weights. The model has lower efficiency compared to state-of-the-art techniques. Ma et al. [15] applied multi-objective neural evolution (SceneNet) for satellite image classification. An evolutionary algorithm was applied for network architecture search and coding, enabling flexible hierarchical feature extraction for satellite image classification. The performance error and computational complexity of the searched networks were balanced using a multi-objective optimization method, and a Pareto solution set was obtained using competitive neural architectures. The SceneNet model has an overfitting problem in classification due to the generation of many features in the CNN. Naushad et al. [16] applied transfer learning to CNN training by fine-tuning VGG16 and Wide Residual Networks (WRNs), where additional layers replaced the final layers for LULC classification on the EuroSAT dataset. The developed method used data augmentation, adaptive learning rates, gradient clipping, and early stopping. The VGG16-WRN network has considerable performance but suffers from overfitting.

Tang et al. [17] applied a new CNN-based Attention Consistent Network (ACNet) built on a Siamese network. The ACNet dual-branch structure was applied to input image pairs generated by spatial rotation. Different attention techniques were applied to mine object information from satellite images using similarities and spatial rotation. ACNet was applied to unify salient regions and the semantic categories of satellite images, and the learned features were used for the classification task. Li et al. [18] applied a Gated Recurrent Multi-Attention Neural Network (GRMA-Net) for satellite image classification. Because informative features occur at multiple stages of a network, multi-level attention modules were applied to focus on informative regions to extract features. A deep Gated Recurrent Unit (GRU) was used on spatial sequences to capture contextual relationships and long-range dependency. Li et al. [19] applied a locality-preserving deep cross-modal embedding network that fully assimilates pairwise intra-modal and inter-modal relations in an end-to-end manner to reduce the inconsistency between the two hybrid spaces. Large-scale satellite images were used to evaluate the model's classification performance.

Wang et al. [20] applied an enhanced feature pyramid network with Deep Semantic Embedding (DSE) for satellite image classification. The DSE module was applied to generate discriminative features based on multi-level and multi-scale features, and a Two-branch Deep Feature Fusion (TDFF) module fused features effectively at various levels. Zhang et al. [21] applied a meta-learning technique for few-shot classification in which a feature extractor was trained to represent the inputs. The classifier was optimized in metric space in the meta-training stage using cosine distance with a learnable scale parameter. The developed model shows considerable performance on two datasets but suffers from overfitting. Zhang et al. [22] developed a suitable CNN model, Remote Sensing-DARTS, to find an optimal network architecture for satellite image classification. Several new techniques were applied in the search phase for a smoother process and better operator selection, and the optimal cell was stacked to build the final network for classification.

Two observations follow. First, both global information and local features are crucial for distinguishing Remote Sensing (RS) images. Existing networks are good at capturing global features thanks to the CNN's hierarchical structure and nonlinear fitting capacity, but local features are not always emphasized. Second, to obtain satisfactory classification results, the distances between RS images of the same class should be minimized and those between different classes maximized. Nevertheless, these key points of pattern classification do not receive the attention they deserve.

3 Proposed method

The AlexNet and ResNet50 models are applied to extract the features from the input images. The OG-WOA technique is applied to select the relevant features from extracted features of AlexNet–ResNet50. The overall process in satellite image classification is shown in Fig. 1.

Fig. 1
figure 1

The overall process in satellite image classification

3.1 CNN models for feature extractions

In deep learning, Recurrent Neural Networks (RNNs) and CNNs are widely used models, and CNNs are particularly suited to handling two-dimensional images. A CNN consists of Fully Connected Layers (FCLs), Pooling Layers (PLs), and Convolutional Layers (CLs) [22,23,24,25,26]. The CNN model shows improved performance over many classifier techniques such as the Naïve Bayesian classifier, decision tree, and Support Vector Machine (SVM). The CNN learns the feature representation during training and significantly reduces the time required for feature design, i.e., selecting the most distinguishing features.

The most important operation in a CNN is convolution: the convolutional layer applies kernel filters to the input during forward propagation. Each convolutional layer is initialized with random kernel weights, which are updated at each iteration based on the loss function during network training. The final learned kernels respond to certain types of patterns present in the input images.

Figure 2 shows three steps: (i) convolution, (ii) stacking, and (iii) a NonLinear Activation Function (NLAF). Given an input matrix \(X\), a convolutional layer output \(O\), and a set of kernels \({F}_{j}\), \(\forall j\in [1,\dots ,J]\), the convolution output \(C(j)\) is given in Eq. (1).

Fig. 2
figure 2

Convolutional neural network on the feature extraction

$$C\left(j\right)=X\otimes {F}_{j},\forall j\in [1,\dots ,J]$$
(1)

The \(J\) activation maps \(C(j)\) are stacked to form \(D\), as given in Eq. (2).

$$D=S(C\left(1\right),\dots ,C(J))$$
(2)

where \(J\) is the total number of filters and \(\mathcal{S}\) is the stacking operation along the channel direction.

The stacked activation map \(D\) is passed through the nonlinear activation function to produce the final output, as in Eq. (3).

$$O=NLAF(D)$$
(3)

The sizes \(S\) of the three important matrices (input, filters, and output) are given in Eq. (4).

$$S\left(x\right)=\left\{\begin{array}{cc}{V}_{I}\times {Q}_{I}\times {H}_{I}& x=X\\ {V}_{K}\times {Q}_{K}\times {H}_{K}& x={F}_{j},\forall j\in [1,\dots ,J]\\ {V}_{O}\times {Q}_{O}\times {H}_{O}& x=O\end{array}\right.$$
(4)

where the three variables \((V, Q, H)\) denote the activation map height, width, and channel size, respectively, and the subscripts \(I, K,\) and \(O\) denote the input, filter, and output. Two equalities hold: the input channel count equals the filter channel count, \({H}_{I}={H}_{K}\), and the output channel count equals the number of filters, \({H}_{O}=J\). The CNN model for feature extraction is shown in Fig. 2.
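Equations (1)–(3) can be sketched directly in NumPy. The input values, the two \(2\times 2\) kernels, and the choice of ReLU as the NLAF below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def conv2d_valid(x, f):
    """Valid 2-D cross-correlation (as deep learning frameworks implement
    'convolution') of a single-channel input with one kernel."""
    vi, qi = x.shape
    vk, qk = f.shape
    out = np.zeros((vi - vk + 1, qi - qk + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(x[r:r + vk, c:c + qk] * f)
    return out

def conv_layer(x, filters):
    # Steps (i)-(ii): compute C(j) = X (*) F_j for each kernel, then stack
    # the J maps along the channel direction to form D (Eqs. 1-2)
    d = np.stack([conv2d_valid(x, f) for f in filters], axis=-1)
    # Step (iii): nonlinear activation function, ReLU assumed here (Eq. 3)
    return np.maximum(d, 0.0)

x = np.arange(16, dtype=float).reshape(4, 4)
filters = [np.eye(2), -np.eye(2)]          # J = 2 illustrative kernels
o = conv_layer(x, filters)
print(o.shape)  # (3, 3, 2): the output channel count H_O equals J
```

Note how the output shape confirms the second equality above: stacking one map per filter makes \({H}_{O}=J\).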

3.1.1 AlexNet

Deep architectures contain many hidden layers, and these hidden layers extract features in useful ways [27,28,29,30]. Deep networks achieve a higher image classification rate than other techniques. AlexNet is a popular model that consists of several hidden layers to extract features. Numerous enhancements are incorporated into its trained parameters, and the overall architecture is shown in Fig. 3.

Fig. 3
figure 3

AlexNet architecture in feature extraction

The first improvement concerns the activation function. In classical neural networks, the nonlinearity is limited to the logistic function, tanh, arctan, etc. These activation functions have significant gradients only in a small range around 0, so they fall into the gradient vanishing problem. The Rectified Linear Unit (ReLU) is a newer activation function applied to overcome this problem: its gradient is exactly 1 whenever the input is greater than 0, so the gradient does not vanish. The training process is accelerated using ReLU, as in Eq. (5).

$$y={\max}(0,x)$$
(5)

The network can be viewed as containing several sub-networks that share the same loss function, and each sub-network can overfit. Dropout addresses this by randomly disabling some neurons: in each iteration only part of the neurons are trained, which reduces joint adaptation between neurons because they can no longer rely on specific co-operating partners, and this enhances generalization. Dropout is applied to the fully connected layers to improve performance. The entire network output approximates the average of the sub-networks, so dropout also increases robustness.
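The dropout behavior described above can be sketched as inverted dropout; the 0.5 rate and the 1000-neuron layer below are illustrative (the experiments in Sect. 4 use a rate of 0.1).

```python
import numpy as np

rng = np.random.default_rng(3)

def dropout(activations, rate=0.1, training=True):
    """Inverted dropout: randomly zero neurons during training; identity at test time."""
    if not training:
        return activations
    keep = rng.random(activations.shape) >= rate   # each neuron kept with prob 1 - rate
    return activations * keep / (1.0 - rate)       # rescale so the expected value is unchanged

a = np.ones(1000)
d = dropout(a, rate=0.5)
print((d == 0).any())   # some neurons were dropped
print(d.mean())         # close to 1.0: the expectation is preserved
```

The rescaling by \(1/(1-\text{rate})\) is what lets the full network at test time approximate the average of the trained sub-networks without any extra computation.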

Convolutional layers automatically extract features, which are then reduced by the pooling layer. For an image \(I\) with width \(w\) and height \(h\) and a convolutional kernel \(m\) with width \(c\) and height \(b\), convolution is defined in Eq. (6).

$$C\left(h,w\right)=\left(I\ast m\right)\left(h,w\right)={\sum }_{b}{\sum }_{c}I(h-b,w-c)m(b,c)$$
(6)

Convolution learns image features, and parameter sharing reduces model complexity. The extracted features are reduced using pooling layers, which summarize neighboring pixels of the feature map into representative values. AlexNet uses max pooling to reduce the feature map: each \(2\times 2\) window is replaced by its maximum value, so a \(4\times 4\) block of the feature map is reduced to a \(2\times 2\) block that contains the maximum values.
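The \(4\times 4\) to \(2\times 2\) max-pooling reduction described above can be sketched as follows; the feature-map values are invented for illustration.

```python
import numpy as np

def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2: each 2x2 window keeps only its maximum."""
    h, w = fmap.shape
    return fmap.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 5],
                 [0, 1, 9, 2],
                 [3, 2, 4, 8]], dtype=float)
pooled = max_pool_2x2(fmap)
print(pooled)  # [[4. 5.] [3. 9.]]
```

Each entry of the \(2\times 2\) result is the maximum of one non-overlapping \(2\times 2\) window of the input, halving both spatial dimensions.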

Cross-channel normalization, a local normalization technique, improves feature generalization. Each feature map is normalized using the sum over the same spatial position of several adjacent maps and then passed to the next layer. Fully connected layers are used for the classification, with the softmax activation function, as in Eq. (7).

$$softmax{\left(x\right)}_{i}=\frac{{\exp}\left({x}_{i}\right)}{{\sum }_{j=1}^{n}{\exp}\left({x}_{j}\right)}\quad {\mathrm{for}}\; i=1,2,\dots ,n$$
(7)

The softmax output lies in the range 0 to 1, which is its main advantage for representing neuron activations. AlexNet is trained using different techniques, and the AlexNet model for feature extraction is shown in Fig. 3.
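Equation (7) can be computed directly; the subtraction of the maximum below is a standard numerical-stability step, and the logit values are invented for illustration.

```python
import math

def softmax(x):
    """Softmax of Eq. (7): outputs lie in (0, 1) and sum to 1."""
    m = max(x)                           # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in x]
    s = sum(exps)
    return [e / s for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(sum(probs))                        # ~1.0: a valid probability distribution
print(probs.index(max(probs)))           # 0: the largest logit gets the largest probability
```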

3.1.2 ResNet50

Residual networks with 50 layers are named ResNet50 [31,32,33,34,35]. ResNet50 consists of additional identity mapping capacity compared to VGG-16.

ResNet50 predicts the delta (residual) required to map one layer's output to the next, rather than the full transformation. ResNet provides an alternate shortcut path for the gradient to flow through, which solves the vanishing gradient problem. ResNet50 uses identity mapping, allowing the model to bypass a CNN weight layer if the current layer is not necessary. This mitigates overfitting on the training set, and the ResNet50 model consists of 50 layers for feature extraction. The ResNet50 model for feature extraction is shown in Fig. 4.

Fig. 4
figure 4

ResNet50 model in feature extraction

AlexNet is one of the most common image classification techniques; however, it also offers benefits when used for feature extraction. The initial values that the AlexNet features achieve are ideal because it has two parallel CNN pipelines, trained on two GPUs and connected crosswise, which fit the image easily. ResNet50, in turn, is far deeper than AlexNet, yet its architecture size is significantly smaller because it relies less on fully connected layers. ResNet50 makes it simple to train networks with many layers without raising the training error percentage.

Furthermore, AlexNet is not as deep as ResNet50, which leads to more architectural errors. The subspace value is ideal when ResNet50 is considered, but there is a possibility of overlap in the feature subspace; as a result, the subspace error value of particular classes changes when those features are used throughout the training and testing stages. Additionally, ResNet50 often requires more training time, which limits its use in real-world applications.

In this stage, 4096 and 64 features are retrieved from AlexNet and ResNet50, respectively. Optimal values from the AlexNet and ResNet50 models are gathered in order to acquire more useful features. The results from AlexNet and ResNet50 are then combined for a better representation of the object, and these extracted features serve as the input to the feature selection procedure.

3.2 Feature selection

Once feature extraction is done, feature selection is carried out using the OG-WOA algorithm. The feature selection process is treated as a global combinatorial optimization problem that seeks to reduce noisy and redundant data while maintaining a uniform level of classification accuracy. Current feature selection approaches tend to choose a large number of irrelevant features for classification rather than selecting features adaptively. The Whale Optimization Algorithm (WOA) selects the pertinent features because it searches the feature space adaptively. The discrete search space for WOA consists of all feasible combinations of attributes that can be chosen from the dataset.
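Viewed as combinatorial optimization, each candidate solution is a binary mask over the extracted features, scored by a fitness function. The sketch below uses hypothetical per-feature relevance scores and a 0.9/0.1 weighting between relevance and subset size; both are assumptions for illustration, not the paper's actual fitness function.

```python
import numpy as np

def subset_fitness(mask, scores):
    """Toy fitness of a binary feature mask: reward the mean relevance of the
    chosen features, lightly penalize large subsets."""
    if not mask.any():
        return 0.0                                  # an empty subset is worthless
    relevance = scores[mask].mean()
    return 0.9 * relevance + 0.1 * (1.0 - mask.mean())

scores = np.array([0.9, 0.1, 0.8, 0.2, 0.05])       # hypothetical relevance per feature
good = np.array([True, False, True, False, False])  # picks the two relevant features
poor = np.array([False, True, False, True, True])   # picks the three noisy ones
print(subset_fitness(good, scores))                 # ~0.825
print(subset_fitness(good, scores) > subset_fitness(poor, scores))  # True
```

An optimizer such as WOA then moves candidate masks through this discrete space, keeping the mask with the best fitness found so far.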

3.2.1 Whale optimization algorithm

The Whale Optimization Algorithm (WOA) is inspired by the hunting behavior of humpback whales [36,37,38]. Humpback whales use the bubble-net hunting technique to encircle and catch prey in small fish groups. The prey position \({X}^{*}\) corresponds to the best whale position in WOA, and the other whales update their positions based on \({X}^{*}\). The three behaviors of whales are searching for prey (exploration), bubble-net attacking (exploitation), and encircling prey, as modeled in the following definitions.

Encircling prey: In the whales' hunting process, the prey is surrounded; whales can detect the prey's position and encircle it. The current best whale \({X}^{*}\) is considered to be the prey or close to it. \({X}^{*}\) is used to update the position of all other whales, as in Eqs. (8) and (9).

$$D=|C\times {X}^{*}\left(t\right)-X(t)|$$
(8)
$$X\left(t+1\right)={X}^{*}\left(t\right)-A\times D$$
(9)

where \(X(t)\) is the current whale, \(D\) is the distance between the whale and the prey \({X}^{*}(t)\), and \(t\) is the iteration counter. The coefficient vectors \(A\) and \(C\) are calculated using Eqs. (10) and (11).

$$A=2\times a\times r-a$$
(10)
$$C=2\times r$$
(11)

where \(a\) value is linearly reduced from 2 to 0 over the iterations and random number \(r\) in the range of [0, 1].

Bubble-net attacking: Whales spin around the prey using either the spiral updating position or the shrinking encircling technique; this behavior is given in Eq. (12).

$$X\left(t+1\right)=\left\{\begin{array}{cc}{X}^{*}\left(t\right)-A\times D& {\mathrm{if }}\; p<0.5\\ {D}^{{\prime}}\times {e}^{bl}\times {\cos}\left(2\times \pi \times l\right)+{X}^{*}(t)& {\mathrm{if}}\; p\ge 0.5\end{array}\right.$$
(12)

where the spiral updating position is used if \(p\ge 0.5\) and the shrinking encircling technique if \(p<0.5\), with \(p\) a random number in [0, 1] that decides which update each whale uses. \(A\) is a random value in the range \([-a, a]\), where \(a\) linearly decreases from 2 to 0 throughout the iterations. \({D}^{{\prime}}\) denotes the distance between the prey \({X}^{*}\) and the current whale \(X\) in the spiral updating position, the constant \(b\) defines the shape of the spiral movement, and \(l\) is a random number in [− 1, 1].


Searching for prey: Whales perform a global search of the search space to find new prey. This occurs when the absolute value of the vector \(A\) is greater than or equal to 1, which triggers exploration rather than exploitation. In the exploration phase, the position of the whales is updated relative to a random whale \({X}_{rand}\) instead of the best whale \({X}^{*}\), as calculated using Eqs. (13) and (14).

$$D=\left|C\times {X}_{rand}-X\left(t\right)\right|$$
(13)
$$X\left(t+1\right)={X}_{rand}-A\times D$$
(14)

where \({X}_{rand}\) is a whale randomly selected from the current population.
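The three behaviors of Eqs. (8)–(14) combine into one per-whale position update. The sketch below follows the standard WOA formulation (spiral term added to the best position, spiral constant \(b=1\)); the population size, dimensionality, and random seed are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def woa_update(whales, best, a, b=1.0):
    """One WOA position update per whale: encircle, spiral, or explore (Eqs. 8-14)."""
    new = np.empty_like(whales)
    for i, x in enumerate(whales):
        r = rng.random(x.shape)
        A = 2.0 * a * r - a                    # Eq. (10)
        C = 2.0 * rng.random(x.shape)          # Eq. (11)
        p = rng.random()
        if p >= 0.5:                           # spiral updating position (Eq. 12)
            l = rng.uniform(-1.0, 1.0)
            D = np.abs(best - x)
            new[i] = D * np.exp(b * l) * np.cos(2.0 * np.pi * l) + best
        elif np.all(np.abs(A) < 1):            # shrinking encircling (Eqs. 8-9)
            D = np.abs(C * best - x)
            new[i] = best - A * D
        else:                                  # exploration toward a random whale (Eqs. 13-14)
            x_rand = whales[rng.integers(len(whales))]
            D = np.abs(C * x_rand - x)
            new[i] = x_rand - A * D
    return new

whales = rng.random((5, 3))                    # 5 whales in a 3-dimensional space
best = whales[0].copy()                        # stand-in for the best-fitness whale X*
updated = woa_update(whales, best, a=1.5)
print(updated.shape)  # (5, 3)
```

In a full run, `a` would be decreased linearly from 2 to 0 across iterations and `best` replaced whenever a whale attains a better fitness.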

3.2.2 Optimal guidance

In the optimal guidance process, a weight coefficient \((w)\) borrowed from the PSO algorithm is applied to improve WOA performance: the weight factor changes adaptively and guides the randomly moving whale toward the best individual in the swarm. As a result, each whale gathers more information about its own behavior, the group remains diverse, and a balance between the exploration and exploitation stages is encouraged, which increases the algorithm's search efficiency. Consequently, every possible subset of characteristics can be considered given the limited number of features. The modified position update equation for state \(i\) is given in Eq. (15).

$${X}^{i,iter+1}=w\left(t\right)\times {x}^{i,iter}+{r}_{i}\times {f}^{i,iter}\times \left(gbest-{x}^{i,iter}\right)$$
(15)

where \(gbest\) is the optimal solution at the current iteration, and \(w(t)\) changes adaptively according to Eq. (16).

$$w\left(t\right)={w}_{\max}\times {\exp}\left(-\frac{{t}^{2}}{2\times {\left(\frac{ite{r}_{\max}}{40}\right)}^{2}}\right)$$
(16)

where \({w}_{\max}\) denotes the maximum initial value of \(w\); the maximum weight value of OG-WOA mainly depends on the population size. An adaptive weight coefficient is included in the position update to enhance exploitation, because conventional WOA gradually loses its exploitation ability over the iterations. The value of the weight coefficient depends on \({w}_{\max}\), the current iteration, and the maximum iteration count. Incorporating the adaptive weight coefficient into the WOA position update enhances search efficiency while choosing the features. The \(w\) value is larger in the early stage, which benefits exploration, and becomes smaller in the later stage, which benefits exploitation. In total, 4160 features are extracted, of which 2560 are selected and processed for classification. After merging small-, medium-, and large-scale spatial and visual histograms, the Bi-LSTM network is finally used to classify the remote sensing scene images.
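Equation (16) can be evaluated directly. Since the paper ties \({w}_{\max}\) to the population size without fixing a value, \({w}_{\max}=0.9\) and 100 iterations are assumed here purely for illustration.

```python
import math

def og_weight(t, iter_max, w_max=0.9):
    """Adaptive weight coefficient w(t) of Eq. (16): a Gaussian decay in t."""
    return w_max * math.exp(-t ** 2 / (2.0 * (iter_max / 40.0) ** 2))

iter_max = 100
weights = [og_weight(t, iter_max) for t in range(0, iter_max + 1, 10)]
print(weights[0])   # 0.9 at t = 0: a large weight favors exploration early
print(weights[-1])  # near 0 at t = iter_max: a small weight favors exploitation late
```

The weight decays monotonically, which is exactly the exploration-to-exploitation shift the paragraph above describes.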

3.3 Classification

Once feature selection is done, the selected features are processed for classification using Bi-LSTM, one of the strongest deep learning models for this task, which has been shown to achieve a high level of accuracy.

3.3.1 Bi-LSTM

A Bi-LSTM network is trained as a deep neural network to classify sequence data. In Bi-LSTM, the OG-WOA-selected features are used as inputs, and the number of classes defines the output. Since Bi-LSTMs are adept at remembering specific patterns, they perform noticeably better. Earth observation satellites typically take a series of images of the same location, so the interval between subsequent images enhances the temporal resolution. The major reason for utilizing Bi-LSTM is that the temporal pattern of the scenes is exploited across the image time series. The Bi-LSTM processes the input sequence \(i={i}_{1},{i}_{2},\dots ,{i}_{n}\) in both orders, producing a forward hidden sequence \({\overrightarrow{f}}_{t}=\left({\overrightarrow{f}}_{1},{\overrightarrow{f}}_{2},\dots ,{\overrightarrow{f}}_{n}\right)\) and a backward hidden sequence \({\overleftarrow{f}}_{t}=\left({\overleftarrow{f}}_{1},{\overleftarrow{f}}_{2},\dots ,{\overleftarrow{f}}_{n}\right)\). The encoded vector \({v}_{t}\) is calculated by combining the forward and backward outputs, \({v}_{t}=\left[{\overrightarrow{f}}_{t},{\overleftarrow{f}}_{t}\right]\), as in Eqs. (17)–(19).

$${\overrightarrow{f}}_{t}=\delta \left({W}_{{\overrightarrow{f}}_{i}}{i}_{t}+{W}_{\overrightarrow{f}\overrightarrow{f}}{\overrightarrow{f}}_{t-1}+{q}_{\overrightarrow{f}}\right)$$
(17)
$${\overleftarrow{f}}_{t}=\delta \left({W}_{{\overleftarrow{f}}_{i}}{i}_{t}+{W}_{\overleftarrow{f}\overleftarrow{f}}{\overleftarrow{f}}_{t-1}+{q}_{\overleftarrow{f}}\right)$$
(18)
$${v}_{t}={W}_{{v}_{\overrightarrow{f}}}{\overrightarrow{f}}_{t}+{W}_{{v}_{\overleftarrow{f}}}{\overleftarrow{f}}_{t}+{q}_{v}$$
(19)

where \(\delta\) is the logistic sigmoid function, and the output sequence of the first hidden layer is \(v={v}_{1},{v}_{2},\dots ,{v}_{t},\dots ,{v}_{n}\).
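Equations (17)–(19) describe a bidirectional recurrent pass. A minimal NumPy sketch follows; for brevity it uses the plain sigmoid recurrences written above rather than full LSTM gating, and the weight matrices and input sequence are random placeholders.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bidirectional_pass(seq, Wf, Wff, qf, Wb, Wbb, qb, Wvf, Wvb, qv):
    """Bidirectional recurrent pass of Eqs. (17)-(19), LSTM gates omitted."""
    n, h = len(seq), qf.shape[0]
    fwd, bwd = np.zeros((n, h)), np.zeros((n, h))
    f_prev = np.zeros(h)
    for t in range(n):                     # forward hidden sequence (Eq. 17)
        f_prev = sigmoid(Wf @ seq[t] + Wff @ f_prev + qf)
        fwd[t] = f_prev
    b_prev = np.zeros(h)
    for t in reversed(range(n)):           # backward hidden sequence (Eq. 18)
        b_prev = sigmoid(Wb @ seq[t] + Wbb @ b_prev + qb)
        bwd[t] = b_prev
    return fwd @ Wvf.T + bwd @ Wvb.T + qv  # combined output v_t (Eq. 19)

rng = np.random.default_rng(1)
d, h, n = 4, 3, 5                          # feature dim, hidden dim, sequence length
seq = rng.standard_normal((n, d))
v = bidirectional_pass(seq,
                       rng.standard_normal((h, d)), rng.standard_normal((h, h)), np.zeros(h),
                       rng.standard_normal((h, d)), rng.standard_normal((h, h)), np.zeros(h),
                       rng.standard_normal((h, h)), rng.standard_normal((h, h)), np.zeros(h))
print(v.shape)  # (5, 3): one combined output per time step
```

Each output row depends on both the past (forward pass) and the future (backward pass) of the sequence, which is what distinguishes Bi-LSTM from a unidirectional LSTM.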

4 Simulation setup

The implementation details of the OG-WOA technique are discussed in this section.

Dataset: The UCM dataset consists of 2100 satellite images in 21 classes. The spatial resolution is 0.3 m, each image is \(256\times 256\) pixels, and 100 images are present in each category. Samples of UCM, AID, and NWPU are shown in Figs. 5, 6 and 7, respectively.

Fig. 5
figure 5

Samples of UCM dataset

Fig. 6
figure 6

Samples of AID dataset

Fig. 7
figure 7

The samples of NWPU dataset

The AID dataset consists of 10,000 images in 30 classes. The spatial resolution ranges from 8 m to 0.5 m, and each class consists of 220 to 420 images.

The NWPU dataset, collected from Google Earth, consists of 31,500 images of \(256\times 256\) pixels. The dataset has 45 classes with 700 images in each class, and the spatial resolution ranges from 30 m to 0.2 m.


Metrics: Accuracy, sensitivity, and specificity were measured from the output of the OG-WOA technique; their formulas are given in Eqs. (20)–(22).

$$Accuracy=\frac{TP+TN}{TP+TN+FP+FN}\times 100$$
(20)
$$Sensitivity=\frac{TP}{TP+FN}\times 100$$
(21)
$$Specificity=\frac{TN}{TN+FP}\times 100$$
(22)
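Equations (20)–(22) follow directly from confusion-matrix counts; the counts below are invented for illustration.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, and specificity as percentages (Eqs. 20-22)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn) * 100.0
    sensitivity = tp / (tp + fn) * 100.0   # true positive rate
    specificity = tn / (tn + fp) * 100.0   # true negative rate
    return accuracy, sensitivity, specificity

acc, sens, spec = classification_metrics(tp=90, tn=85, fp=5, fn=10)
print(round(acc, 2), round(sens, 2), round(spec, 2))  # 92.11 90.0 94.44
```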

Parameter settings: The parameter settings of the AlexNet–ResNet50 model are 20 epochs, a 0.1 dropout rate, a 0.1 learning rate, and the Adam optimizer.


System requirement: The proposed OG-WOA–Bi-LSTM model is implemented on a system with an Intel i7 processor, 128 GB of RAM, and a 22 GB graphics card.

5 Results

The proposed OG-WOA–Bi-LSTM technique is evaluated in three datasets and compared with existing techniques.

5.1 AID dataset

The OG-WOA technique is compared with various feature selection and deep learning techniques on the AID dataset. Table 1 reports the accuracy, sensitivity, and specificity of Bi-LSTM on the AID dataset for the different optimization algorithms used for feature selection.

Table 1 Performance analysis of different optimization algorithm for feature selection on AID dataset

Feature selection techniques such as Particle Swarm Optimization (PSO), the Firefly Optimization Algorithm (FOA), Gray Wolf Optimization (GWO), and the Whale Optimization Algorithm (WOA) are compared with the OG-WOA technique, as in Table 1 and Fig. 8. The existing optimization techniques PSO, FOA, and GWO suffer from local optima traps, and WOA suffers from low exploitation in feature selection. The optimal guidance technique modifies the position of the search agent based on the best fitness values, which benefits exploitation in the search. Table 1 clearly shows that the OG-WOA technique achieves higher performance: 97.12% accuracy, 97.43% sensitivity, and 97.21% specificity.

Fig. 8
figure 8

Comparison of feature selection methods on AID dataset

Deep learning techniques such as LSTM, CNN, Recurrent Neural Network (RNN), and Generative Adversarial Network (GAN) models are compared with the proposed Bi-LSTM technique on the AID dataset, as shown in Table 2 and Fig. 9. The RNN, GAN, and CNN models suffer from overfitting due to the number of features generated in the network. The LSTM model handles sequences of data efficiently but suffers from vanishing gradients. The proposed Bi-LSTM has 97.12% accuracy, 97.43% sensitivity, and 97.21% specificity, whereas the GAN model has 91.27% accuracy, 93.84% sensitivity, and 93.71% specificity in satellite image classification.

Table 2 Performance analysis of different deep learning methods for classification on AID dataset
Fig. 9
figure 9

Comparison of deep learning methods on AID dataset

5.2 UCM dataset

Deep learning and feature selection techniques were applied to the UCM dataset and compared with the OG-WOA technique. Table 3 reports the accuracy, sensitivity, and specificity of Bi-LSTM on the UCM dataset for the different optimization algorithms used for feature selection.

Table 3 Performance analysis of different optimization algorithm for feature selection on UCM dataset

Feature selection techniques are applied to the UCM dataset and compared with the OG-WOA technique, as in Table 3 and Fig. 10. PSO, FOA, and GWO suffer from local optima traps and slow convergence in feature selection. The WOA model has low exploitation and tends to lose potential solutions in the classification. The optimal guidance technique increases the exploitation by changing the position of the search agent based on a higher fitness value. The OG-WOA technique has 99.34% accuracy, 99.44% sensitivity, and 99.31% specificity, whereas the GWO method has 93.22% accuracy, 93.8% sensitivity, and 91.7% specificity.

Fig. 10

Comparison of feature selection methods on UCM dataset

The CNN, GAN, RNN, and LSTM models are compared with the proposed OG-WOA–Bi-LSTM technique on the UCM dataset, as shown in Table 4 and Fig. 11. The CNN model overfits due to the large number of features generated in the model. The LSTM model stores relevant features over the long term but suffers from the vanishing gradient problem. The OG-WOA technique has the advantage of increasing exploitation and selecting relevant features for classification. From the results, the Bi-LSTM technique achieves 99.34% accuracy, 99.44% sensitivity, and 99.31% specificity, whereas the AlexNet model achieves 95.89% accuracy, 95.69% sensitivity, and 94.37% specificity on the UCM dataset.

Table 4 Performance analysis of different deep learning methods for classification on UCM dataset
Fig. 11

Comparison of deep learning methods on UCM dataset

5.3 NWPU dataset

Deep learning techniques and feature selection techniques were applied to the NWPU dataset and compared with the OG-WOA model. Table 5 presents the accuracy, sensitivity, and specificity of Bi-LSTM on the NWPU dataset for the different optimization algorithms used for feature selection.

Table 5 Performance analysis of different optimization algorithm for feature selection on NWPU dataset

Feature selection techniques such as PSO, FOA, GWO, and WOA are compared with the OG-WOA technique, as shown in Table 5 and Fig. 12. The PSO, FOA, and GWO techniques suffer from local optima traps, and the WOA technique has lower exploitation in the search process. The Optimal Guiding technique changes the position of the search agent based on the best fitness values to increase exploitation and select relevant features for classification. The proposed Bi-LSTM achieves 96.73% accuracy, 97.21% sensitivity, and 97.24% specificity, whereas Firefly achieves 90.08% accuracy, 90.81% sensitivity, and 88.89% specificity in satellite image classification.

Fig. 12

Comparison of feature selection Methods on NWPU dataset

Deep learning techniques such as RNN, GAN, CNN, and LSTM models are compared with the proposed Bi-LSTM technique on the NWPU dataset, as shown in Table 6 and Fig. 13. The CNN model overfits and fails to distinguish between highly similar categories. The LSTM model performs reasonably well in classification but suffers from the vanishing gradient problem. The proposed Bi-LSTM achieves 96.73% accuracy, 97.21% sensitivity, and 97.24% specificity, whereas the ResNet50 model achieves 90.99% accuracy, 93.74% sensitivity, and 92.75% specificity.

Table 6 Performance analysis of different deep learning methods for classification on NWPU dataset
Fig. 13

Comparison of deep learning methods on NWPU dataset

5.4 Comparative analysis

The existing satellite image classification models were compared with the proposed OG-WOA–Bi-LSTM technique.

The OG-WOA–Bi-LSTM technique is compared with recent techniques in satellite image classification, as shown in Table 7 and Fig. 14. The existing techniques achieve considerable performance in satellite image classification. The KL divergence [11] technique has lower efficiency in handling overlapping data and in differentiating highly correlated images. The vision transformer [12] technique suffers from the vanishing gradient problem, which degrades the performance of the model. The GCN [13] and EfficientNet-B3-Attn-2 [14] models suffer from overfitting, which degrades classification efficiency. The SceneNet [15] technique suffers from overfitting due to the large number of features generated in the network. The proposed OG-WOA–Bi-LSTM technique has the advantage of increased exploitation in the search for feature selection. The incorporation of an adaptive weight coefficient helps balance exploration and exploitation, which improves search efficiency during feature selection. Therefore, it achieves an accuracy of 97.12% on AID, 99.34% on UCM, and 96.73% on NWPU, whereas the SceneNet model achieves an accuracy of 89.58% on AID and 95.21% on NWPU.

Table 7 Comparative accuracy (%) analysis for satellite image classification
Fig. 14

Comparison of the proposed method in satellite image classification

6 Conclusion

Satellite image classification is required for applications such as urban planning and agriculture. The existing techniques suffer from overfitting due to the large number of features generated in the CNN model. This research proposes the OG-WOA technique to increase exploitation in feature selection. The AlexNet–ResNet50 model is applied to extract features from the input images, and the OG-WOA technique is then applied to select relevant features from the extracted features. The OG-WOA technique changes the position of the search agent based on the best fitness value to increase exploitation, which helps the search escape local optima traps and improves convergence, limitations of the existing feature selection techniques. Finally, the selected features are classified using Bi-LSTM. In classification, the proposed OG-WOA–Bi-LSTM attains a higher accuracy of 97.12% on AID, 99.34% on UCM, and 96.73% on NWPU, whereas the SceneNet model has an accuracy of 89.58% on AID and 95.21% on NWPU. Future work will apply an attention technique to further improve classification performance.

Availability of data and materials

1. The datasets generated during and/or analyzed during the current study are available in the [UC Merced 15 datasets] repository, http://weegee.vision.ucmerced.edu/datasets/landuse.html.
2. The datasets generated during and/or analyzed during the current study are available in the [AID datasets] repository, https://captain-whu.github.io/AID/.
3. The datasets generated during and/or analyzed during the current study are available in the [NWPU datasets] repository, https://1drv.ms/u/s!AmgKYzARBl5ca3HNaHIlzp_IXjs.

References

  1. M. Alkhelaiwi, W. Boulila, J. Ahmad, A. Koubaa, M. Driss, An efficient approach based on privacy-preserving deep learning for satellite image classification. Remote Sens. 13(11), 2221 (2021)

  2. G. Cheng, X. Sun, K. Li, L. Guo, J. Han, Perturbation-seeking generative adversarial networks: a defense framework for remote sensing image scene classification. IEEE Trans. Geosci. Remote Sens. 60, 1–11 (2021)

  3. J. Kim, M. Chi, SAFFNet: self-attention-based feature fusion network for remote sensing few-shot scene classification. Remote Sens. 13(13), 2532 (2021)

  4. B. Bučko, E. Lieskovská, K. Zábovská, M. Zábovský, Computer vision based pothole detection under challenging conditions. Sensors 22(22), 8878 (2022)

  5. A. Shakya, M. Biswas, M. Pal, Evaluating the potential of pyramid-based fusion coupled with convolutional neural network for satellite image classification. Arab. J. Geosci. 15(8), 1–22 (2022)

  6. S. Lei, Z. Shi, Hybrid-scale self-similarity exploitation for remote sensing image super-resolution. IEEE Trans. Geosci. Remote Sens. 60, 1–10 (2021)

  7. Y. Xu, W. Luo, A. Hu, Z. Xie, X. Xie, L. Tao, TE-SAGAN: an improved generative adversarial network for remote sensing super-resolution images. Remote Sens. 14(10), 2425 (2022)

  8. S. Qin, X. Guo, J. Sun, S. Qiao, L. Zhang, J. Yao, Q. Cheng, Y. Zhang, Landslide detection from open satellite imagery using distant domain transfer learning. Remote Sens. 13(17), 3383 (2021)

  9. X. Feng, W. Zhang, X. Su, Z. Xu, Optical remote sensing image denoising and super-resolution reconstructing using optimized generative network in wavelet transform domain. Remote Sens. 13(9), 1858 (2021)

  10. C. Xu, G. Zhu, J. Shu, A lightweight and robust lie group-convolutional neural networks joint representation for remote sensing scene classification. IEEE Trans. Geosci. Remote Sens. 60, 1–15 (2021)

  11. H. Xie, Y. Chen, P. Ghamisi, Remote sensing image scene classification via label augmentation and intra-class constraint. Remote Sens. 13(13), 2566 (2021)

  12. Y. Bazi, L. Bashmal, M.M.A. Rahhal, R.A. Dayil, N.A. Ajlan, Vision transformers for remote sensing image classification. Remote Sens. 13(3), 516 (2021)

  13. K. Xu, H. Huang, P. Deng, Y. Li, Deep feature aggregation framework driven by graph convolutional network for scene classification in remote sensing. IEEE Trans. Neural Netw. Learn. Syst. 33, 5751–5765 (2021)

  14. H. Alhichri, A.S. Alswayed, Y. Bazi, N. Ammour, N.A. Alajlan, Classification of remote sensing images using EfficientNet-B3 CNN model with attention. IEEE Access 9, 14078–14094 (2021)

  15. A. Ma, Y. Wan, Y. Zhong, J. Wang, L. Zhang, SceneNet: remote sensing scene classification deep learning network using multi-objective neural evolution architecture search. ISPRS J. Photogramm. Remote Sens. 172, 171–188 (2021)

  16. R. Naushad, T. Kaur, E. Ghaderpour, Deep transfer learning for land use and land cover classification: a comparative study. Sensors 21(23), 8083 (2021)

  17. X. Tang, Q. Ma, X. Zhang, F. Liu, J. Ma, L. Jiao, Attention consistent network for remote sensing scene classification. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 14, 2030–2045 (2021)

  18. B. Li, Y. Guo, J. Yang, L. Wang, Y. Wang, W. An, Gated recurrent multiattention network for VHR remote sensing image classification. IEEE Trans. Geosci. Remote Sens. 60, 1–13 (2021)

  19. Y. Li, Z. Zhu, J.G. Yu, Y. Zhang, Learning deep cross-modal embedding networks for zero-shot remote sensing image scene classification. IEEE Trans. Geosci. Remote Sens. 59(12), 10590–10603 (2021)

  20. X. Wang, S. Wang, C. Ning, H. Zhou, Enhanced feature pyramid network with deep semantic embedding for remote sensing scene classification. IEEE Trans. Geosci. Remote Sens. 59(9), 7918–7932 (2021)

  21. P. Zhang, Y. Bai, D. Wang, B. Bai, Y. Li, Few-shot classification of aerial scene images via meta-learning. Remote Sens. 13(1), 108 (2021)

  22. Z. Zhang, S. Liu, Y. Zhang, W. Chen, RS-DARTS: a convolutional neural architecture search for remote sensing image scene classification. Remote Sens. 14(1), 141 (2021)

  23. D. Roy, Snatch theft detection in unconstrained surveillance videos using action attribute modelling. Pattern Recognit. Lett. 108, 56–61 (2018)

  24. R. Saini, N.K. Jha, B. Das, S. Mittal, C.K. Mohan, ULSAM: ultra-lightweight subspace attention module for compact convolutional neural networks, in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 1627–1636 (2020)

  25. E.P. Ijjina, C.K. Mohan, Human action recognition based on recognition of linear patterns in action bank features using convolutional neural networks, in 2014 13th International Conference on Machine Learning and Applications, pp. 178–182. IEEE (2014)

  26. M. Srinivas, D. Roy, C.K. Mohan, Discriminative feature extraction from X-ray images using deep convolutional neural networks, in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 917–921. IEEE (2016)

  27. J. Chen, Z. Wan, J. Zhang, W. Li, Y. Chen, Y. Li, Y. Duan, Medical image segmentation and reconstruction of prostate tumor based on 3D AlexNet. Comput. Methods Programs Biomed. 200, 105878 (2021)

  28. P. Dhar, S. Dutta, V. Mukherjee, Cross-wavelet assisted convolution neural network (AlexNet) approach for phonocardiogram signals classification. Biomed. Signal Process. Control 63, 102142 (2021)

  29. B. Bučko, K. Zábovská, M. Zábovský, Ontology as a modeling tool within model driven architecture abstraction, in 42nd International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), pp. 1525–1530 (2019)

  30. D. Roy, K.S.R. Murty, C.K. Mohan, Unsupervised universal attribute modeling for action recognition. IEEE Trans. Multimed. 21(7), 1672–1680 (2018)

  31. A.A. Alnuaim, M. Zakariah, C. Shashidhar, W.A. Hatamleh, H. Tarazi, P.K. Shukla, R. Ratna, Speaker gender recognition based on deep neural networks and ResNet50. Wirel. Commun. Mobile Comput. 2022, 1–13 (2022)

  32. M. Elpeltagy, H. Sallam, Automatic prediction of COVID-19 from chest images using modified ResNet50. Multimed. Tools Appl. 80(17), 26451–26463 (2021)

  33. X. Feng, X. Gao, L. Luo, A ResNet50-based method for classifying surface defects in hot-rolled strip steel. Mathematics 9(19), 2359 (2021)

  34. N. Perveen, D. Roy, C.K. Mohan, Spontaneous expression recognition using universal attribute model. IEEE Trans. Image Process. 27(11), 5575–5584 (2018)

  35. D. Roy, T. Ishizaka, C.K. Mohan, A. Fukuda, Vehicle trajectory prediction at intersections using interaction based generative adversarial networks, in 2019 IEEE Intelligent Transportation Systems Conference (ITSC), pp. 2318–2323. IEEE (2019)

  36. S. Chakraborty, A.K. Saha, S. Sharma, S. Mirjalili, R. Chakraborty, A novel enhanced whale optimization algorithm for global optimization. Comput. Ind. Eng. 153, 107086 (2021)

  37. S. Chakraborty, A.K. Saha, R. Chakraborty, M. Saha, An enhanced whale optimization algorithm for large scale optimization problems. Knowl.-Based Syst. 233, 107543 (2021)

  38. M. Abdel-Basset, R. Mohamed, S. Mirjalili, A novel whale optimization algorithm integrated with Nelder–Mead simplex for multi-objective optimization problems. Knowl.-Based Syst. 212, 106619 (2021)


Acknowledgements

None.

Funding

This publication was realized with support of Operational Program Integrated Infrastructure 2014–2020 of the project: Innovative Solutions for Propulsion, Power and Safety Components of Transport Vehicles, code ITMS 313011V334, co-financed by the European Regional Development Fund.

Author information

Authors and Affiliations

Authors

Contributions

The paper investigation, resources, data curation, writing—original draft preparation, writing—review and editing, and visualization were conducted by VNV. The paper conceptualization, and software were conducted by JAB. The validation, formal analysis, methodology, supervision, writing—review and editing, and funding acquisition of the version to be published were conducted by JF. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Jaroslav Frnda.

Ethics declarations

Competing interests

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article


Cite this article

Vinaykumar, V.N., Babu, J.A. & Frnda, J. Optimal guidance whale optimization algorithm and hybrid deep learning networks for land use land cover classification. EURASIP J. Adv. Signal Process. 2023, 13 (2023). https://doi.org/10.1186/s13634-023-00980-w


Keywords