This section introduces the methods used in this work. The antenna selection problem formulation is presented first, followed by a review of the relationship between the coarray, sparse arrays, and the minimum sidelobe level. We then describe the proposed approach, the corresponding training data generation procedure, and the CNN structure.

### 3.1 Antenna selection problem formulation

Given a ULA with *N* elements, the number of possible ways of selecting *M* elements from the ULA is given by [7]

$$\begin{array}{@{}rcl@{}} G={}_{N} \mathrm{C}_{M}= \frac{N!}{M!(N-M)!}. \end{array} $$

(5)

As in [10], all possible subarrays in (5) are treated as classes. Assuming that the set \(\boldsymbol {\mathcal {G}}\) contains all classes in (5), and that each \(\boldsymbol {g} \in \boldsymbol {\mathcal {G}}\) is associated with coordinates *x*_{m} and *y*_{m}, for *m*=1,…,*M*, in the xy-plane, the *g*th class, consisting of all antenna elements in the *g*th subarray, can be denoted as \(\mathcal {Z}_{g}=\left \{z_{1}^{g}, z_{2}^{g}, \ldots, z_{M}^{g}\right \}\). As a result, \(\boldsymbol {\mathcal {G}}\) can be redefined as \(\boldsymbol {\mathcal {G}}=\left \{\mathcal {Z}_{1}, \mathcal {Z}_{2}, \ldots, \mathcal {Z}_{{G}}\right \}\) [10]. These assumptions recast the antenna selection problem from a combinatorial optimization framework into an ML classification framework. Any ML- or DL-based classification algorithm can then be employed to identify the desired class (subarray) using appropriate metrics that characterize the best class (the sparse subarray configuration in our case). However, this is possible only if labels for the classes are known [11].
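As an illustration, the class set in (5) can be enumerated directly; the sizes *N*=10 and *M*=5 below are arbitrary choices for the example, not values from this work.

```python
# Enumerate all G = C(N, M) candidate subarrays (classes) of an N-element ULA.
# N = 10 and M = 5 are illustrative sizes only.
from itertools import combinations
from math import comb

N, M = 10, 5
G = comb(N, M)                             # number of classes, Eq. (5)
classes = list(combinations(range(N), M))  # Z_g: sensor indices of the g-th subarray

print(G, classes[0])                       # → 252 (0, 1, 2, 3, 4)
```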

### 3.2 Coarray, sparse arrays, and minimum sidelobe level

In sparse array processing, it is well known that, for efficient spatial sampling, an array whose coarray has neither redundancy nor holes is considered a perfect array [7, 19]. Assuming no holes exist (*H*=0), the perfect array aperture can be defined as

$$\begin{array}{@{}rcl@{}} |\mathcal{Z}_{a}| = \frac{N(N-1)}{2}, \end{array} $$

(6)

where \(|\mathcal {Z}_{a}|\) is the array aperture and *N* is the number of antennas. Unfortunately, such an array does not exist for *N*>4 [20]. Alternatively, one can construct a sparse array that has no holes but retains the largest possible aperture \(|\mathcal {Z}_{a}|\) to approximate the perfect array. Consequently, several methods have been proposed in [7, 19, 20] for the design of minimum redundancy arrays (MRAs) as well as minimum hole arrays (MHAs). Here, the former is the array that minimizes the redundancy *R* subject to *H*=0 for a given number of elements *N*, whereas the latter minimizes the number of holes in the coarray.
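To make the coarray notions concrete, the following sketch (assuming sensor positions on a half-wavelength integer grid) computes the difference coarray and its holes; the 4-sensor array [0, 1, 4, 6] is the classical example of a perfect array, whose coarray covers every lag up to the aperture with no holes.

```python
# Difference coarray of a sparse array, with a hole count.
# Positions are assumed to lie on a half-wavelength integer grid.
import numpy as np

def difference_coarray(z):
    """All pairwise differences z_i - z_j (the coarray support set)."""
    z = np.asarray(z)
    return set((z[:, None] - z[None, :]).ravel().tolist())

def holes(z):
    """Missing lags between -aperture and +aperture (H in the text)."""
    q = difference_coarray(z)
    aperture = max(q)
    return set(range(-aperture, aperture + 1)) - q

mra = [0, 1, 4, 6]                       # perfect for N = 4: aperture 6 = N(N-1)/2
print(sorted(difference_coarray(mra)))   # all lags -6..6
print(holes(mra))                        # set(): hole-free
print(holes([0, 1, 2, 6]))               # {-3, 3}: lags ±3 are missing
```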

These arrays are attractive due to their good beampattern properties, namely low PSLs and a narrow main lobe [7]. For instance, Fig. 2 compares the beampattern responses of a ULA, the conventional DL method proposed in [10], the PSL-constrained array proposed in [8], and a sparse array with a hole-free difference coarray. It can be observed that the beampattern response of the conventional DL method exhibits high PSLs compared to those of the PSL-constrained array and the sparse array with a hole-free difference coarray, both of which show well-suppressed PSLs. Although the relationship between the two concepts is not directly proved here, the connection was thoroughly investigated in [7, 19–21], and sparse arrays with hole-free coarrays or with minimum redundancy and minimum holes were recommended as the best solution for array thinning because of their narrow main lobes and minimum PSLs. Hence, inspired by the beampattern properties of MRAs and MHAs as well as the work in [8], we impose a hole-free constraint on the solution set \(\boldsymbol {\mathcal {G}}\) used to create the training dataset, in a bid to improve the performance of the DL-based antenna selection technique.

### 3.3 Proposed DL-based antenna selection approach

Motivated by the beampattern properties of MHAs and MRAs [7], we extend and enhance the DL-based antenna selection technique proposed in [10]. By taking advantage of the essential sensor properties of the array introduced in [19, 20], we constrain the subarrays in the solution set (5) to retain a hole-free difference coarray, thereby enforcing a well-distributed sensor layout within the subarrays that form the feature space.

The idea is as follows: given the solution set \(\boldsymbol {\mathcal {G}}\), which consists of subarrays as possible solutions to an _{N}C_{M} antenna selection problem, we use only those \(\boldsymbol {g} \in \boldsymbol {\mathcal {G}}\) that retain a hole-free difference coarray to generate the training dataset and discard the rest. To this end, we implement a basic search algorithm that searches through \(\boldsymbol {\mathcal {G}}\), reserving all \(\boldsymbol {g} \in \boldsymbol {\mathcal {G}}\) with a hole-free difference coarray, i.e., \({\mathcal {Q}_{g}} = \mathcal {Q}_{ULA}\), and omitting those without, i.e., \({\mathcal {Q}_{g}}\ne \mathcal {Q}_{ULA}\), where \({\mathcal Q}_{g} \) and \({\mathcal {Q}_{ULA}} \) are the difference coarrays of an *M*-sensor subarray \(\boldsymbol {g} \in \boldsymbol {\mathcal {G}}\) and the *N*-sensor ULA, respectively. These steps are summarized in Algorithm 1.
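The filtering step above can be sketched as follows; this is a sketch of the search, not the authors' implementation, and *N*=7, *M*=4 are illustrative values.

```python
# Sketch of Algorithm 1: keep only the subarrays g whose difference coarray
# equals that of the full N-sensor ULA (the hole-free constraint Q_g == Q_ULA).
from itertools import combinations

def difference_coarray(z):
    """All pairwise differences z_i - z_j of the sensor positions."""
    return {a - b for a in z for b in z}

def hole_free_subarrays(N, M):
    q_ula = difference_coarray(range(N))     # Q_ULA of the N-sensor ULA
    L = []
    for g in combinations(range(N), M):      # every g in the solution set G
        if difference_coarray(g) == q_ula:   # Q_g == Q_ULA -> reserve g
            L.append(g)
    return L                                 # the reduced solution set L

L = hole_free_subarrays(7, 4)
print(len(L), L)        # → 2 [(0, 1, 4, 6), (0, 2, 5, 6)]
```

For a 7-sensor ULA, only two 4-sensor subarrays preserve the full coarray, which illustrates how strongly the constraint prunes the class set in (5).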

In other words, the difference coarray constraint on every \(\boldsymbol {g} \in \boldsymbol {\mathcal {G}}\) can be expressed in terms of the *essential property* of the sensors of an array. For multiple sensor failures or omissions, the essential property states that

**Property 1**

(*k*-essential property [19]) \(\mathcal {S} \subset \mathcal {Z}\) is said to be *k*-essential when (1) \(|\mathcal {S}|=k\) and (2) the difference coarray changes when \(\mathcal {S}\) is removed from \(\mathcal {Z}\), i.e., \(\hat {\mathcal {Q}} \ne \mathcal {Q}\), where \(\hat {\mathcal {Q}}\) and \({\mathcal {Q}}\) are the difference coarrays of \( \mathcal {Z}\backslash \mathcal {S}\) and \( {\mathcal {Z}} \), respectively.

This entails that the *N*−*M* sensors that are not essential for the preservation of the *N*-sensor array's difference coarray can be discarded from the *N*-sensor array without changing the array aperture or the difference coarray. Therefore, we can reformulate Property 1 and define it with respect to the antenna selection problem as follows.
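The essentialness test in Property 1 amounts to comparing two difference coarrays, as the following self-contained sketch illustrates for a 7-sensor ULA.

```python
# Property 1 check: is a subset S essential for array Z, i.e., does removing
# S change the difference coarray? Positions on an integer grid (assumption).
def difference_coarray(z):
    """All pairwise differences z_i - z_j of the sensor positions."""
    return {a - b for a in z for b in z}

def is_essential(Z, S):
    """True iff the difference coarray of Z \\ S differs from that of Z."""
    return difference_coarray(set(Z) - set(S)) != difference_coarray(Z)

ula = range(7)
print(is_essential(ula, {3}))   # False: {0,1,2,4,5,6} still yields all lags -6..6
print(is_essential(ula, {0}))   # True: removing an end sensor shrinks the aperture
```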

**Property 2**

Let \(\mathcal {Q}\) be the difference coarray of a physical subarray \(\mathcal {Z}_{g}\) such that \(\mathcal {Z}_{g} \triangleq \boldsymbol {g} \in \boldsymbol {\mathcal {G}}\). If \(\boldsymbol {\mathcal {G}}\) consists of all possible subarrays as solutions to an (*N*,*M*) antenna selection problem, then for all \(\boldsymbol {g} \in \boldsymbol {\mathcal {G}}\), \(\boldsymbol {g}\) is essential with respect to \(\mathcal {Z}_{ULA}\) if the difference coarray of the large array \(\mathcal {Z}_{N}\) changes when \(\boldsymbol {g}\) is removed; that is, if \({\mathcal {Z}_{g}}=\mathcal {Z}_{ULA} \backslash \boldsymbol {g}\), then \(\grave {\mathcal {Q}} \ne \mathcal {Q}\), where \(\grave {\mathcal Q}\) and \({\mathcal {Q}}\) are the difference coarrays of \(\mathcal {Z}_{g}\) and \( {\mathcal {Z}}_{ULA} \), respectively.

Note that the use of hole-free subarrays not only assists in the realization of sparser subarrays with well-distributed sensors but also yields sparse arrays with improved beampattern characteristics [7]. As a result, instead of using (5) as in [10] when preparing the training dataset, we use \(\boldsymbol {\mathcal {L}}\), the output of Algorithm 1. Henceforth, for clarity's sake, we refer to the implementation using \(\boldsymbol {\mathcal {L}}\) as the proposed method and the one using \(\boldsymbol {\mathcal {G}}\), or a portion of (5), as the conventional method [10].

### 3.4 Training dataset generation for antenna selection problem

In this section, we consider training dataset generation, i.e., the input data samples and the corresponding labels or ground truths. The feature space comprises the phase, real, and imaginary components of a sample covariance matrix \(\boldsymbol {\hat {R}}\). Thus, the input data are *N*×*N*×3 real-valued matrices \(\{ \boldsymbol {H}_{i}\}_{i=1}^{3}\) whose (*i*,*j*)th entries consist of \([\boldsymbol {H}_{1}]_{i,j}=\angle [\boldsymbol {\hat {R}}]_{i,j}\), \([\boldsymbol {H}_{2}]_{i,j}=\mathbb {R}\mathrm {e} [\boldsymbol {\hat {R}}]_{i,j}\), and \([\boldsymbol {H}_{3}]_{i,j}=\mathbb {I}\mathrm {m} [\boldsymbol {\hat {R}}]_{i,j}\), denoting the phase, real, and imaginary components of the sample covariance matrix \(\boldsymbol {\hat {R}}\) [10].
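As an illustration, the input tensor *H* can be assembled as follows; the covariance matrix here is estimated from random snapshots purely to fix the shapes, and *N*=8, *T*=100 are arbitrary example sizes.

```python
# Build the N×N×3 real-valued input H from a sample covariance matrix R_hat:
# channel 1 = phase, channel 2 = real part, channel 3 = imaginary part.
import numpy as np

rng = np.random.default_rng(0)
N, T = 8, 100                        # illustrative sizes (assumption)
X = rng.standard_normal((N, T)) + 1j * rng.standard_normal((N, T))
R_hat = X @ X.conj().T / T           # sample covariance from T snapshots

H = np.stack([np.angle(R_hat), R_hat.real, R_hat.imag], axis=-1)
print(H.shape)                       # → (8, 8, 3)
```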

To generate input-output training dataset pairs, we need to determine the subarrays with the best performance within the solution set \(\boldsymbol {\mathcal {L}}\) to act as ground truths or labels. For simplicity, as in [10], we adopt the CRB as the benchmark for determining the best array configurations. Therefore, we assume that the received signal vector at the *l*th subarray with *M* elements is defined as

$$\begin{array}{@{}rcl@{}} \boldsymbol{x}_{l}(t) = \boldsymbol{A}_{l} \boldsymbol{s}_{l}(t) + \boldsymbol{n}_{l}(t), \end{array} $$

(7)

where *A*_{l} is the subarray steering matrix, *s*_{l}(*t*) denotes the signal vector, and *n*_{l}(*t*) is the noise vector corresponding to the *l*th subarray position set \(\mathcal {Z}_{l}\) at the *t*th snapshot. As in (2), we assume that *s*_{l}(*t*) and *n*_{l}(*t*) are spatially and temporally uncorrelated [14, 16]. Furthermore, we assume a constant signal variance \(\sigma _{s}^{2}\) and noise variance \(\sigma _{n}^{2}\). Hence, the signal-to-noise ratio (SNR) in dB is expressed as \(\text {SNR}=10~\text {log}_{10} \left ({\sigma _{s}^{2}}/{\sigma _{n}^{2}}\right)\).
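A minimal simulation of the model in (7) is sketched below, assuming half-wavelength grid spacing; the subarray positions, DOAs, snapshot count, and variances are illustrative assumptions, not values from this work.

```python
# Simulate T snapshots of the subarray model x_l(t) = A_l s_l(t) + n_l(t), Eq. (7).
import numpy as np

def steering_matrix(positions, thetas_deg):
    """M×D steering matrix for positions in half-wavelength units (assumption)."""
    p = np.asarray(positions, float)[:, None]
    th = np.deg2rad(np.asarray(thetas_deg))[None, :]
    return np.exp(1j * np.pi * p * np.sin(th))

rng = np.random.default_rng(1)
Z_l = [0, 1, 4, 6]                   # example subarray position set (M = 4)
thetas = [-10.0, 20.0]               # D = 2 source DOAs (illustrative)
T, sigma_s, sigma_n = 200, 1.0, 0.1  # snapshots and std deviations (illustrative)

A = steering_matrix(Z_l, thetas)     # M×D
S = sigma_s * (rng.standard_normal((2, T)) + 1j * rng.standard_normal((2, T))) / np.sqrt(2)
Nn = sigma_n * (rng.standard_normal((4, T)) + 1j * rng.standard_normal((4, T))) / np.sqrt(2)
X = A @ S + Nn                       # all T snapshots of x_l(t), stacked columnwise

snr_db = 10 * np.log10(sigma_s**2 / sigma_n**2)
print(X.shape, snr_db)               # → (4, 200) 20.0
```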

Therefore, following assumptions in [15], the CRB_{θ} for every \(\boldsymbol {l} \in \boldsymbol {\mathcal {L}}\) can be expressed as

$$\begin{array}{@{}rcl@{}} \mathcal{C}(\theta, \mathcal{Z}_{l})=\frac{\sigma_{n}^{2}}{2T}{ \left[ \Re \left\{ \left(\boldsymbol{B}^{H} \boldsymbol{P}^{\bot}_{A} \boldsymbol{B} \right) \odot {\left(\boldsymbol{R}_{s} {\boldsymbol{A}}^{H}_{l} {\boldsymbol{R}}_{l}^{-1} \boldsymbol{A}_{l} \boldsymbol{R}_{s} \right)}^{T} \right\} \right]^{-1}}, \end{array} $$

(8)

where \(\boldsymbol {P}^{\bot }_{A} = \boldsymbol {I}-\boldsymbol {A}_{l}\left (\boldsymbol {A}_{l}^{H}\boldsymbol {A}_{l}\right)^{-1}\boldsymbol {A}^{H}_{l}\) is the orthogonal projection onto the null space of \(\boldsymbol {A}_{l}^{H}\), \(\boldsymbol {B}=\left [ \boldsymbol {b}(\theta _{1}), \boldsymbol {b}(\theta _{2}), \ldots, \boldsymbol {b}(\theta _{D})\right ]\) with \( \boldsymbol {b}(\theta _{i}) =\frac {\partial }{\partial \theta _{i}} \boldsymbol {A}_{l}(\theta _{i})\) for \(i=1,2, \ldots,D\), and

$$\begin{array}{@{}rcl@{}} {\boldsymbol{R}_{l}}=E \left[ {\boldsymbol{x}_{l}(t)} {\boldsymbol{x}_{l}^{H}(t)} \right] ={\boldsymbol{A}_{l} \boldsymbol{R}_{s} \boldsymbol{A}^{H}_{l}+\sigma^{2} \boldsymbol{I}_{M}}. \end{array} $$

(9)
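The CRB in (8)–(9) can be evaluated numerically as in the sketch below; this is a sketch under a half-wavelength-grid assumption with illustrative source and noise parameters, not the authors' implementation.

```python
# Numerical sketch of the CRB in Eq. (8) for one subarray position set Z_l,
# with R_l formed as in Eq. (9). Parameters are illustrative assumptions.
import numpy as np

def crb(positions, thetas_deg, sigma_s=1.0, sigma_n=0.1, T=200):
    p = np.asarray(positions, float)[:, None]
    th = np.deg2rad(np.asarray(thetas_deg))[None, :]
    A = np.exp(1j * np.pi * p * np.sin(th))          # M×D steering matrix A_l
    B = 1j * np.pi * p * np.cos(th) * A              # columns b(theta_i) = dA/dtheta_i
    M, D = A.shape
    Rs = sigma_s**2 * np.eye(D)                      # uncorrelated, equal-power sources
    R = A @ Rs @ A.conj().T + sigma_n**2 * np.eye(M) # Eq. (9)
    P = np.eye(M) - A @ np.linalg.solve(A.conj().T @ A, A.conj().T)  # P_A^perp
    M1 = B.conj().T @ P @ B                          # B^H P_A^perp B
    M2 = (Rs @ A.conj().T @ np.linalg.solve(R, A) @ Rs).T  # (Rs A^H R^-1 A Rs)^T
    return sigma_n**2 / (2 * T) * np.linalg.inv(np.real(M1 * M2))  # ⊙ = elementwise

C = crb([0, 1, 4, 6], [-10.0, 20.0])
print(np.diag(C))       # per-DOA CRB values; smaller is better
```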

Next, for various DOAs, we construct sample covariance matrices *R*_{l} for *l*=1,2,…,*L* and compute the CRBs for all \(\boldsymbol {l} \in \boldsymbol {\mathcal {L}}\). Then, the subarrays with the lowest CRBs for the various DOAs are selected and saved into \(\boldsymbol {\mathcal {W}}\). Here, \(\boldsymbol {w}_{i} \in \boldsymbol {\mathcal {W}}\) for *i*=1,2,…,*W* represents the class labels, where *w* is defined as

$$\begin{array}{@{}rcl@{}} \boldsymbol{w}= \underset{l=1, 2, \ldots,L}{\text{argmin}} \mathcal{C} (\theta_{d}, \mathcal{Z}_{l}). \end{array} $$

(10)

Following (10) and the realization of \(\boldsymbol {\mathcal {W}}\), we construct input-output data pairs (*H*,*w*), where *H* is the real-valued input data obtained from the covariance matrix and \(\boldsymbol {w} \in \boldsymbol {\mathcal {W}}\) is the label representing the best subarray sensor positions for the sample covariance matrix \(\boldsymbol {\hat {R}}\) [10]. The above training dataset generation procedure is summarized in Algorithm 2.
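The labeling step in (10) then reduces to an argmin over the candidate subarrays. In the toy sketch below, the CRB values are made-up placeholder numbers standing in for evaluations of \(\mathcal{C}(\theta_d, \mathcal{Z}_l)\) via (8), purely to show the selection mechanics.

```python
# Eq. (10) as an argmin over candidate subarrays; crb_values are toy
# placeholders, not computed CRBs.
candidates = [(0, 1, 4, 6), (0, 2, 5, 6), (0, 1, 2, 6)]  # l = 0, 1, 2
crb_values = [0.012, 0.015, 0.027]                       # assumed CRB traces per l
w = min(range(len(candidates)), key=lambda l: crb_values[l])
print(w, candidates[w])              # → 0 (0, 1, 4, 6): the label for this DOA setting
```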

### 3.5 Convolutional neural network architecture

In this work, we adopt a general CNN structure consisting of 9 sections, as in [10]. The first layer accepts the 2D input, and the last (9th) layer is a classification layer with *l* units, where a softmax function is used to obtain the probability distribution over the classes [22]. The second and fourth layers are max-pooling layers with 2×2 kernels to reduce the dimension, whereas the third and fifth layers are convolutional layers with 64 filters of size 3×3.

Finally, the seventh and eighth layers are fully connected layers with 1024 units. Note that rectified linear units (ReLU), with ReLU(*x*)=max(*x*,0), are used after each convolutional and fully connected layer [11]. Furthermore, during the training phase, 90% and 10% of the data are allocated for training and validation, respectively. Stochastic gradient descent with momentum (SGDM) is used with a learning rate of 0.03 and a mini-batch size of 500 for 50 epochs [10].
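Assuming 'same' padding for the convolutions and an illustrative input size *N*=20 (neither is stated above), the feature-map sizes through these layers can be traced as follows.

```python
# Trace feature-map sizes through the pooling/convolution layers described
# above. 'Same' padding and N = 20 are assumptions for this sketch.
def conv_same(size, filters):   # 3×3 convolution, stride 1, 'same' padding
    return size, filters

def maxpool(size, channels):    # 2×2 max-pooling, stride 2
    return size // 2, channels

n, c = 20, 3                    # 1st layer: input H is N×N×3
n, c = maxpool(n, c)            # 2nd layer: 2×2 max-pool  -> 10×10×3
n, c = conv_same(n, 64)         # 3rd layer: 64 3×3 filters -> 10×10×64
n, c = maxpool(n, c)            # 4th layer: 2×2 max-pool  -> 5×5×64
n, c = conv_same(n, 64)         # 5th layer: 64 3×3 filters -> 5×5×64
flat = n * n * c                # flattened size fed to the 1024-unit dense layers
print(n, c, flat)               # → 5 64 1600
```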