Hierarchical agglomerative clustering for SOMs
Because hierarchical agglomerative clustering (HAC) can find arbitrary cluster shapes with an appropriate criterion for cluster similarity, and suits high-dimensional data that are often hard to describe with parametric models, it is often preferred for SOM clustering [11, 16–18, 20, 34]. Each SOM prototype is considered a singleton cluster, and the two clusters that are most similar according to a predefined (dis)similarity criterion are merged iteratively until a predetermined number of clusters is obtained. A common approach is to use a criterion based on (Euclidean) distances between SOM prototypes, such as centroid linkage in [16, 18] and Ward’s measure in [17, 34]. Since any similarity measure based solely on the distances between SOM prototypes underutilizes available SOM knowledge such as data topology and data distribution, recent studies merge distance and density information. Brugger et al. [35] use a recursive flooding of a Gaussian surface based on pairwise distances and receptive field sizes of SOM prototypes, resulting in partitionings similar to those of k-means clustering. Wu and Chow [19] evaluate similarity by single linkage and the density distribution at the separation boundaries using a cluster validity index from [36], producing good partitionings for datasets with well-separated clusters.
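The distance-based variant above can be sketched in a few lines: a minimal, illustrative example of HAC with Ward's measure applied to SOM prototypes, assuming the prototypes have already been obtained from a trained SOM (here they are replaced by synthetic points for self-containment).

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Stand-in for trained SOM prototypes: two well-separated groups of
# prototype weight vectors (in practice these come from the SOM).
rng = np.random.default_rng(0)
prototypes = np.vstack([rng.normal(0.0, 0.1, (10, 2)),
                        rng.normal(3.0, 0.1, (10, 2))])

# Ward's linkage, as in [17, 34]: each prototype starts as a singleton
# cluster and the most similar pair is merged iteratively.
Z = linkage(prototypes, method="ward")

# Cut the dendrogram at a predetermined number of clusters (K = 2).
labels = fcluster(Z, t=2, criterion="maxclust")
```

The same `linkage` call with `method="centroid"` would give the centroid-linkage variant of [16, 18].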
A recent study [20] proposes CONN linkage, which is average linkage with CONN similarity based on the detailed local density distribution, instead of traditional distance-based similarity. CONN, originally proposed in [14] for informative SOM visualization, is a symmetric matrix showing the pairwise similarities of the SOM prototypes. Each pairwise similarity, CONN(i,j), is
\mathrm{CONN}(i,j)=\left|\mathrm{RF}_{ij}\right|+\left|\mathrm{RF}_{ji}\right|
(4)
where RF_{ij} is the portion of RF_{i} (the receptive field of w_{i}) in which w_{j} is the second BMU, and |·| is the cardinality of the set. Therefore, CONN(i,j) not only indicates the neighborhood relations of the prototypes with respect to the dataset but also indicates how data samples are distributed within their receptive fields with respect to the neighboring prototypes, providing density information more detailed than at the prototype level. Consequently, CONN linkage is shown to outperform distance-based linkages for several real datasets including a remote sensing image [20]. In addition, since CONN does not depend on the SOM grid structure, it can be used as a similarity measure for prototypes obtained by any other quantization method (such as neural gas or k-means).
Spectral clustering for SOMs
Similarly to HAC, spectral clustering (SC) can extract arbitrary cluster shapes and can be easily implemented with high accuracy, as supported by empirical studies [24]. Contrary to HAC, SC is principally a manifold learning method based on the eigendecomposition of a similarity matrix, aiming to change the data representation so as to easily capture submanifolds (i.e., clusters). Being associated with relaxed optimization of graph-cut problems via a graph Laplacian matrix, L, various methods exist for SC [21, 22, 37]; however, no clear advantage exists among them as long as a normalized L is considered [23, 38]. Referring to [23, 39] for a detailed overview of the different methods, we briefly explain the method in [22] utilized for this study.
Let G = (V, S) be a weighted, undirected graph, whose nodes (V) represent the N samples (prototypes in this study) \mathcal{W}=\{{w}_{1},{w}_{2},\dots ,{w}_{N}\} to be clustered, and whose edges are defined by S, an N × N similarity matrix. A common way to construct the edges is to define pairwise similarities based on the (Euclidean) distances,
s(i,j)={e}^{-\frac{{\left\|{w}_{i}-{w}_{j}\right\|}^{2}}{2{\sigma}^{2}}}
(5)
with a decaying parameter σ to be determined properly, either by experimentally finding the optimum σ value [22] or by an automated setting of σ (a different σ_{i} for each prototype w_{i}, changing the denominator to 2σ_{i}σ_{j}) [28, 40, 41]. The latter is done by defining σ_{i} as the distance to the k-th nearest neighbor of w_{i}, introducing another parameter (k) to be set by the user.
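The locally scaled variant of Equation (5) can be sketched as follows; this is an illustrative implementation under the stated definition (σ_{i} as the k-th nearest-neighbor distance, denominator 2σ_{i}σ_{j}), with the function name chosen here for exposition.

```python
import numpy as np

def local_scaling_similarity(W, k=2):
    """Gaussian similarity with automated local scaling: sigma_i is
    the distance from w_i to its k-th nearest neighbor, and the
    denominator of Equation (5) becomes 2 * sigma_i * sigma_j."""
    # Pairwise Euclidean distances between prototypes.
    d = np.sqrt(((W[:, None, :] - W[None, :, :]) ** 2).sum(-1))
    # Column 0 after sorting is the point itself (distance 0), so
    # column k is the k-th nearest neighbor.
    sigma = np.sort(d, axis=1)[:, k]
    S = np.exp(-d ** 2 / (2 * np.outer(sigma, sigma)))
    np.fill_diagonal(S, 0.0)  # no self-similarity edges
    return S
```

Note that the matrix stays symmetric because 2σ_{i}σ_{j} is symmetric in i and j, while each row's scale adapts to the local density around w_{i}.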
Let D be the diagonal matrix denoting the degrees of the N nodes, where d_{i} = ∑_{j} s(i,j). Then the Laplacian matrix, L, is constructed in various ways depending on the approach for graph-cut optimization [23, 39]. Ng et al. [22] define a normalized Laplacian matrix, L_{norm}, based on S and D,
{L}_{\text{norm}}={D}^{-1/2}S{D}^{-1/2}.
(6)
Then, K clusters are extracted using K eigenvectors associated with the K greatest eigenvalues, by the following algorithm [22]:

1.
Calculate the similarity matrix S (Equation (5)), its degree matrix D, and normalized Laplacian, L _{norm} (Equation (6))

2.
Find the K eigenvectors {e_{1}, e_{2}, …, e_{K}} of L_{norm} associated with the K greatest eigenvalues {λ_{1}, λ_{2}, …, λ_{K}}

3.
Construct the N × K matrix E = [e_{1} e_{2} … e_{K}] and obtain the N × K matrix U by normalizing the rows of E to have unit norm, i.e., {u}_{ij}=\frac{{e}_{ij}}{\sqrt{\sum _{k}{e}_{ik}^{2}}}

4.
Cluster the N rows of U with the k-means algorithm into K clusters.
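The four steps above can be sketched compactly; this is a minimal implementation under the assumption of a symmetric similarity matrix with positive node degrees, and the function name is illustrative.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.cluster.vq import kmeans2

def spectral_clustering(S, K, seed=0):
    """Steps 1-4: normalized Laplacian (Equation (6)), top-K
    eigenvectors, row normalization, k-means on the embedding."""
    d = S.sum(axis=1)                       # node degrees (assumed > 0)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_norm = D_inv_sqrt @ S @ D_inv_sqrt    # Equation (6)
    # eigh returns eigenvalues in ascending order; keep the K largest.
    _, vecs = eigh(L_norm)
    E = vecs[:, -K:]                        # N x K matrix E
    U = E / np.linalg.norm(E, axis=1, keepdims=True)
    _, labels = kmeans2(U, K, minit="++", seed=seed)
    return labels

# Block-diagonal similarity: two groups of three fully connected nodes.
S = np.zeros((6, 6))
S[:3, :3] = 1.0
S[3:, 3:] = 1.0
np.fill_diagonal(S, 0.0)
labels = spectral_clustering(S, 2)
```

Because the similarity enters the algorithm only through S, the same routine accepts the CONN matrix of Equation (4) in place of the Gaussian similarity of Equation (5).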
Recently, we utilized this algorithm as an SOM clustering method [42], using similarity matrices calculated with either a global σ or local σ_{i}. This approach often outperforms HAC with distance-based linkages or with CONN linkage, for synthetic and real datasets [42]. However, a σ or k value (to determine the local σ_{i}), specific to the dataset, must be set optimally [42, 43]. Contrary to distance-based similarity, which requires user-set parameters, CONN similarity can be advantageous for SC due to its parameter-free construction from intrinsic data details, its sparse nature by definition, and previous studies [20, 44] showing its superior performance. Therefore we modify the algorithm above by replacing S (Equation (5)) with CONN (Equation (4)).
Proposed method for the LPIS assessment
The proposed method aims to find anomalies in the LPIS. Based on the SOM-based spectral clustering described in the previous section and the current LPIS, the method first finds a land cover mapping (with a predetermined number of clusters) in an unsupervised manner, then constructs an eligibility mask by checking whether each cluster is eligible or ineligible according to the current LPIS. The difference between the resulting eligibility mask and the LPIS indicates possible anomalies in the system. A step-by-step explanation of the proposed method for the LPIS assessment follows:

1.
Set the number of neural units, N, and the number of clusters, K.

2.
Train the SOM with N units to obtain the prototypes (Section ‘Self-organizing maps’).

3.
Construct the similarity measure CONN for the SOM prototypes (Equation (4)).

4.
Obtain K clusters of the SOM prototypes by spectral clustering with CONN (Section ‘Spectral clustering for SOMs’).

5.
Assign the cluster label of prototypes to the data samples in their corresponding receptive fields.

6.
Use the current LPIS (eligible/ineligible) to determine the eligibility of each cluster, i.e., if the majority of the data samples in a cluster are eligible in the LPIS, then that cluster is eligible; otherwise it is ineligible.

7.
Determine the areas where the resulting eligibility mask and LPIS have different labels.
The proposed method is automated, given that N and K are known a priori. Since the SOM is used as an intermediate quantization of the remote sensing images, and K can be determined from the LPIS, setting N and K is not a limitation for this study.
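Steps 6 and 7 of the pipeline can be sketched as a majority vote per cluster; this is an illustrative sketch, with function and variable names chosen here (per-sample cluster labels and a binary LPIS eligibility array are assumed as inputs).

```python
import numpy as np

def eligibility_anomalies(cluster_labels, lpis_eligible):
    """Step 6: a cluster is eligible if the majority of its samples
    are eligible in the current LPIS. Step 7: flag samples where the
    resulting eligibility mask disagrees with the LPIS."""
    mask = np.zeros_like(lpis_eligible)
    for c in np.unique(cluster_labels):
        members = cluster_labels == c
        # Majority vote over the cluster's samples.
        mask[members] = lpis_eligible[members].mean() > 0.5
    return mask != lpis_eligible  # True = possible anomaly

# Illustrative example: two clusters of three samples each.
clusters = np.array([0, 0, 0, 1, 1, 1])
lpis = np.array([1, 1, 0, 0, 0, 1])   # 1 = eligible in current LPIS
anomalies = eligibility_anomalies(clusters, lpis)
```

In this example, the third sample sits in a majority-eligible cluster but is marked ineligible in the LPIS, so it is flagged, and symmetrically for the last sample.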