EURASIP Journal on Applied Signal Processing 2005:1, 4–12 c ○ 2005 Hindawi Publishing Corporation Determining the Number of Discrete Alphabet Sources from Sensor Data

Determining the number of sources in a received wave field is a well-known and a well-investigated problem. In this problem, the number of sources impinging on an array of sensors is to be estimated. The common approach for solving this problem is to use an information theoretic criterion like the minimum description length (MDL) or the Akaike information criterion. Under the assumption that the transmitted signals are Gaussian, the MDL estimator takes both a simple and an intuitive form. Therefore this estimator is commonly used even when the signals are known to be non-Gaussian communication signals. However, its ability to resolve signals (resolution capacity) is limited by the number of sensors minus one. In this paper, we study the MDL estimator that is based on the correct, non-Gaussian signal distribution of digital signals. We show that this approach leads to both improved performance and improved resolution capacity, that is, the number of signals that can be detected by the resulting MDL processor is larger than the number of array sensors. In addition, a novel asymptotic performance analysis, which can be used to predict the performance of the MDL estimator analytically, is presented. Simulation results support the theoretical conclusions.


INTRODUCTION
One of the fundamental problems in the field of array processing is to determine the number of signals that impinge on a passive sensors array. The problem arises in several fields of array processing such as radar, and seismology (see, e.g., [1,2,3,4,5,6]). This question received considerable interest not only because of its nature, but also because most algorithms for direction-of-arrival (DOA) estimation assume a prior knowledge of the number of signals. Moreover, with the advent of new communication systems and the growing lack of frequency spectrum, efficient techniques of communication are being developed. One of these approaches is to use an antenna array to generate a narrow beam focused at the receiver. This, in turn, enables us to reuse the frequency spectrum for communication with other receivers. The method is being developed for cellular communication where the users move and the cells should sense the advent of new users. New users are detected by monitoring the change in the number of users that the cell senses. The ability to detect the number of transmitters and their location is also required in spectrum monitoring and control applications.
The most common approach for estimating this number is to apply information theoretic criteria like the minimum description length (MDL) or the Akaike information criterion (AIC) [7]. Since 1985 [3], when a Gaussian version of the MDL was first suggested for estimating the number of narrowband sources impinging on an array of sensors, this MDL-based estimator became a standard tool for accomplishing this task.
for the received signal vector is x(t) = As(t) + n(t), (1) where A = [a 1 , a 2 , . . . , a q ] is a p × q matrix composed of q pdimensional vectors, a 1 , . . . , a q . A is referred to as the steering matrix. s(t) = [s1(t) · · · s q (t)] T is a q×1 signals vector and n(t) is a p × 1 vector of the additive noise. Many problems may be formulated using this simple, linear model (see [8,9] and references therein). These problems differ in the structure of the mixing matrix, A, in the assumed knowledge about the unknown parameters, or in the statistical modeling of both the signal and the noise. For example, n and/or s can be assumed to be Gaussian or non-Gaussian; the additive noise correlation matrix can be assumed to be white or not; or the mixing matrix A can be assumed to be of full rank or not.
In this paper we are concerned with the problem of estimating q when the sources are digital signals. Denote by D = [d 1 , . . . , d |D| ] an arbitrary signal constellation, where d i ∈ C, and |D| is the size of the constellation. In the sequel, we make the following assumptions.
(1) The additive noise is zero mean, both spatially and temporal white, complex Gaussian random process, with correlation matrix σ 2 n I. (2) The signal vector is uniformly distributed over D q , independent from snapshot to snapshot. (3) The additive noise and the signal vector are independent.
Based on these assumptions, the probability distribution function (pdf) of the received signal is where CN (x, R) = (1/π|R|)e −x H R −1 x , and H denotes the complex transpose. We note that, opposed to the common problem formulation [3], we do not assume that A is a fullrank matrix. As will be shown in the sequel, the full-rank requirement is not necessary for identifiability when the signals are digital signals. Note that our model, (1), is also used to represent the reception of any multiple access system, for example, codedivision-multiple-access (CDMA) systems transmitting over flat fading channels [10]. Our problem is to estimate the number of sources, q, given N independent snapshots of the array output, x(t 1 ), . . . , x(t N ), described by the pdf of (2).

The MDL approach
The information theoretic criteria approach is a general one for choosing a model that fits the data mostly from a family of possible models [7,11]. That is, given a parameterized family of probability densities, f X (X|θ (q) ), θ (q) ∈ Θ q , for various q, selectq such that where L(θ (q) ) = log f X (X|θ (q) ) is the log likelihood of the measurements, denoted by X = [x 1 , . . . , x N ], P(q) is a general penalty function associated with the qth family, andθ is the maximum likelihood estimate of the unknown parameters, given the qth family of distributions.
The MDL estimator is a special case of (3) with a certain penalty function. It is given by minimizing the MDL metric [7], that is, where )) + 0.5|Θ q | log(N)}, and |Θ q | is the minimum number of free parameters in Θ q . It is well known that asymptotically, under certain regularity conditions, the MDL estimator minimizes the description length (measured in bits) of both the measurements, X, and the model,θ (q) [7]. Although in many problems associated with array processing, for example, direction-of-arrival (DOA) estimation, one has some prior knowledge on the signals' statistical properties or on the array geometry, when the number of sources is estimated, this prior knowledge is usually ignored. The reason for this is that by assuming Gaussian sources and regardless of the array geometry, the resulting MDL estimator (4), termed the Gaussian minimum description length (GMDL) estimator, has a simple closed-form expression given bŷ where l 1 ≥ l 2 ≥ · · · ≥ l p are the eigenvalues of the empirical received signal's correlation matrix,R = (1/N) x i x H i . It is well known that the GMDL estimator is a consistent estimator of the number of sources [12].
In our problem the unknown parameters are the matrix A and the noise level. Since we do not restrict our attention to full-rank mixing matrices, when the number of sources is assumed to be q, the number of unknown parameters is 2pq + 1. We denote byÂ q and σ 2 nq the MLEs of the unknown parameters assuming q sources. The MDL estimator (4) for our problem (2) becomeŝ The reader can already spot an important difference between the commonly used GMDL (5) and the exact MDL solution (6). While the GMDL can detect up to p − 1 sources, the MDL estimator can detect any number of sources. This property of the MDL estimator will be discussed in Section 2. One should note that in practice, although the MDL can detect any number of sources, this number will be limited by the overall system resources, for example, the number of distinct receivers.

Paper organization
The paper is organized as follows. Section 2 discusses the resolution capacity of this problem and the asymptotic characteristics of the MDL estimator. Section 3 is devoted to numerical study of the performance of the MDL estimator; whereas Section 4 is devoted to analytical study of the performance of the MDL estimator. Section 5 provides a summary and some concluding remarks.

Identifiability
Consider a parameterized family of probability density functions f X (x|θ), θ ∈ Θ. This family of densities is said to be identifiable if, for every θ = θ , the divergence between f X (x|θ) and f X (x|θ ) is greater than zero, that is, f log( f /g)dx is the Kullback-Leibler divergence between f (x) and g(x). This condition ensures that there is a one-toone relationship between the parameter space and the statistical properties of the measurements.
The model order selection problem [13] discussed in Section 1.1 is unidentifiable if it is possible to find, for some k = l, two points in the parameter space, θ k ∈ Θ k and θ l ∈ Θ l , such that f (·|θ k ) = f (·|θ l ). If this is the case, one cannot distinguish between k sources and l sources because the statistical distribution of the received measurements is insensitive to whether θ k or θ l was transmitted. The following theorem establishes the identifiability of our problem. Theorem 1 points to very important difference between the problem of detecting Gaussian sources and detecting digital sources. For Gaussian sources, the problem is identifiable only when the number of sources is smaller than the number of sensors and the mixing matrix is of full rank. For the case of digital sources, on the other hand, the problem is identifiable for every number of sources and for every mixing matrix A, whether full rank or not. This fact has a significant importance in communication systems since we can estimate the number of users utilizing a given channel (say using CDMA scheme) with only one antenna. It should be noted that the identifibility was proved under the assumption that all the transmitters use the same constellation. This result can be extended to the case of unequal signal constellations, as long as any linear combination of the received signal is not equal to any other received signal.

Consistency of the MDL estimator
In the previous subsection it was proven that the estimation problem defined in Section 1.1 is identifiable. Once the estimation problem has been shown to be identifiable, it is possible to infer the number of sources from the measurements. However for a specific estimator, the issue of consistency must be considered. In model order selection problems, the common performance measure is the probability of error, that is, P e = P{q = q} = 1 − P c . In what follows the MDL estimator is proven to be a consistent estimator, that is lim N→∞ P e = 0.

Theorem 2. The MDL estimator is a consistent estimator of the number of sources.
See Appendix B for the proof. Both the proof of Theorem 2 and its practical implications deserve special attention. The proof of Theorem 2 is divided into two parts. In the first part it is argued that asymptotically the probability of underestimation, that is, P{q < q}, approaches zero as N → ∞; whereas in the second part it is argued that, asymptotically, the probability of overestimation, that is, P{q > q}, approaches zero as N → ∞. The first part is easily proven based on a more general theorem which appears in [12]. The second part, however, is proven using Wilks' theorem. The reader might recall that the first published consistency proof of the GMDL was based on Wilks' theorem [3]. Later, it was demonstrated that Wilks' theorem cannot be used in this problem [14], and an alternate proof, which is based on Taylor's expansion, was proposed in [11]. Since then, Taylor's expansion is the only tool used for this type of proofs. In what follows, we briefly explain why Wilks' theorem can be used in our proof.
Wilks' theorem asserts that under certain regularity conditions, the log likelihood ratio is distributed as a chi-square random variable with a certain number of degrees of freedom. The main regularity condition in Wilks' theorem is that the unknown parameters under the null hypothesis are not on the border of the parameter space of the unknown parameters under the alternative hypothesis. Simple example will clarify this point. Consider a random variable x. Assume that under the null hypothesis x ∼ N (0, 1), and under the alternate hypothesis x ∼ N (θ, 1), θ ∈ Θ. Now, if Θ = (−∞, ∞), the null hypothesis is, with some abuse of notation, "inside" the parameter space of the alternate hypothesis. However, if Θ = (0, ∞), the null hypothesis is, with some abuse of notation, "on the boarder" of the parameter space of the alternate hypothesis (the alternate hypothesis does not "wrap" the null hypothesis as required).
Recall that for Gaussian signals it is required that the matrix AR S A H be a rank-q matrix. Therefore, assuming that we have q sources, the unknown parameters are [λ 1 , . . . , where σ 2 n is the unknown noise level, λ 1 > 0, . . . , λ q > 0 are the eigenvalues of the matrix AR S A H , and v 1 , . . . , v q are the eigenvectors corresponding to these eigenvalues. It is easily proven that the parameter space that corresponds to q sources lies on the border of the parameter space that corresponds to q + 1 sources, and not inside it. The intuition for this is very simple. The same way the point 0 is on the boarder of the parameter space (0, T), the points , which compose the parameter space corresponding to q sources, are on the border of the , which compose the parameter space corresponding to q + 1 sources. The regularity conditions in Wilks' theorem require that the parameter space corresponding to q sources be inside the parameter space corresponding to q + 1 sources and not on its boarder. Since in the Gaussian case this condition does not hold, Wilks' theorem cannot be used.
Consider the case of digital signals. From Theorem 1 it is clear that assuming q sources, the parameter space is [a 11 , . . . , a p1 , a 21 , . . . , a pq , σ 2 n > 0], where a i j ∈ C for every 1 ≤ i ≤ p, 1 ≤ j ≤ q. Now, the same way the point 0 is inside the parameter space [−T, T], the points [a 11 , . . . , a pq , 0, . . . , 0, σ 2 n > 0], which compose the parameter space corresponding to q sources, are inside the points [a 11 , . . . , a p(q+1) , σ 2 n > 0], which compose the parameter space corresponding to q + 1 sources. Therefore, Wilks' theorem can be used in our problem for proving the consistency of the MDL estimator.
Theorem 1 demonstrates that one can estimate the number of digital sources even when the number of receivers is one. The importance of Theorem 2 lies in demonstrating that indeed one can estimate the number of sources (transmitters) with diminishing probability of error even with one receiving antenna. This is of special importance in CDMA systems that use blind multiuser detection. In these systems it is assumed that the number of users is smaller than the processing gain. However, the MDL estimator proposed in this paper can be used to estimate any number of users and hence can be used to increase the range of operation of blind multiuser detectors.

SIMULATION STUDY
In what follows we compare the performance of the GMDL estimator (5) and the MDL estimator (6), which exploits the special structure of the transmitted signals. We compare the performance of the two processors when either the number of sensors is larger than the number of sources or the number of sensors is smaller than the number of sources. Note that in the latter, the GMDL cannot be used and the MDL estimator is the only existing option. For simplicity we use BPSK signals; hence D = {±1}. In addition, in the MDL simulation, we calculated the MDL metric for k = 0, . . . , q + 2, where q is the true number of sources in the scenario. The reason is that the probability to overestimate the number of true sources by more than one is negligible.
It is obvious that the GMDL estimator is much more simple than the MDL estimator. The complexity of the GMDL estimator is O(p 3 ), which is the complexity of computing the eigenvalues. However, the complexity of the MDL estimator grows exponentially with p. In order to have a fair compression, assume that q = p. In this case the complexity of the MDL estimator is O(ND p ). The actual complexity of the MDL estimator is even larger because of the lack of closed form expressions for the unknown parameters. In order to overcome this complexity issue, in the simulations, we used the EM algorithm for computing the MLEs of the unknown parameters [15].
First we consider the performance of the MDL and GMDL estimators in the presence of one source. Consider a uniform linear array (ULA) with three sensors and one transmitter located at the array boresight. In the first simulation we study the performance of the GMDL and MDL estimators as a function of the number of snapshots, whereas in the second simulation we examine the performance of the GMDL and the MDL estimators as a function of the source signalto-noise ratio (SNR). Figures 1 and 2 depict the probability of correct decision as a function of the number of snapshots and as a function of the source SNR, respectively. For the first simulation the SNR per element was set to −6 dB, and for the second simulation the number of snapshots was set to 60. For each point in the graph, 2000 Monte Carlo runs were made.
In the graphs, we see the clear advantage of the MDL over the GMDL estimator. The GMDL requires about 2 dB additional power or about 3 times the number of snapshots in order to achieve the same performance as the MDL estimator that exploits the special signal structure. The uniform performance improvement of the MDL over the GMDL should not come as a surprise. In [16] it was demonstrated that by using additional a priori information, one can improve the estimator performance considerably.
In the next set of simulations we examine the case of detecting more than one source. Figure 3 depicts the probability of correct detection of two sources as a function of their spatial separation. For the first scenario, one source is located at angle θ = 0 • and the second at angle ρ where ρ is varied between 0 • and 30 • . We consider two scenarios. In the first, the sources' SNR was set to −3 dB and the number of snapshots was set to 50, while in the second, the sources' SNR was set to 0 dB and the number of snapshots was set to 100. Here again the MDL demonstrates uniform performance improvement over the GMDL. More surprisingly, however, is that with a sufficient SNR and or number  of snapshots, the MDL can perfectly resolve two sources of no spatial separation. This is in agreement with Theorems 1 and 2 where it was shown that the MDL estimator can resolve any number of sources with only one receiver. The MDL estimator, therefore, uses the special signal structure, and not spatial diversity, to resolve multiple sources. Thus, the MDL estimator can resolve multiple sources even without any spatial separation between them, as Figure 3 demonstrates. As already mentioned, one of the main advantages of the MDL estimator is its ability to operate even in the presence of more sources than sensors. In the next and final simulations, we explore this feature. We consider one receiving antenna and two equal-power sources, transmitting with 0-dB SNR. Figure 4 depicts the probability of a correct decision as a function of the number of snapshots for the MDL estimator only (since the GMDL estimator is invalid in this scenario). It is evident from the graph that the MDL estimator is a consistent estimator. Moreover, only a limited number of snapshots are required for achieving reliable probability of detection.

PERFORMANCE ANALYSIS OF THE MDL ESTIMATOR
The probability of error of the MDL estimator is an important figure of merit, and usually extensive simulations are carried out in order to study its performance. It is desired to have a simple tool for predicting the performance of the MDL estimator which can be used not only to study it but also to design a penalty function in (3) that meets some required performance. In what follows we provide such simple tool for predicting the performance of the MDL estimator. Denote by P M the probability of miss (the probability of underestimation), that is, P M = P(q < q), and P FA the probability of false alarm (the probability of overestimation), that is, P FA = P(q > q). The probability of error, denoted by P e , is the sum of the probability of miss and the probability of false alarm, that is, P e = P M + P FA . The probability of miss can be approximated by P M ≈ P(MDL(q − 1) < MDL(q)) while the probability of false alarm can be approximated by P FA = P(MDL(q + 1) < MDL(q)). These approximations are due to the fact that the MDL function, MDL(q), is a convex function with a single minimum [12].
The following two lemmas establish asymptotic approximations for P M and P FA , respectively. Lemma 1. Asymptotically, for large N, where F χ 2 2p (·) is the cumulative distribution function of the chisquare random variable with 2p degrees of freedom.
See (B.2) in the proof of Theorem 2 for the proof. Lemma 2. Asymptotically, for large N, where where D( f g) is the divergence between the two distributions.
See [12,Theorem 1] for the proof. These lemmas provide a simple tool to approximate the performance of the MDL estimator. The approximation improves with the number of snapshots or the SNR.  In order to validate our analysis we consider the following example. Assume a ULA with three sensors and one source that transmits a BPSK signal at the array boresight. Figure 5 depicts the probability of detection P c = 1 − P e , which has been derived empirically by simulations, and the probability of detection predicted by the two lemmas, as a function of the SNR. The number of snapshots was set to N = 60, and 2000 Monte Carlo runs were made. From the figure, one sees that even under nonasymptotic conditions (N = 60), our theoretical analysis predicts the probability of detection quite well, as the results of the theoretical analysis differ from the results of the simulation by no more than 1 dB. This demonstrates not only the validity of our analysis, but also its applicability as a synthesis tool.

SUMMARY AND CONCLUDING REMARKS
In this paper, we investigated the problem of estimating the number of communication sources impinging an array of sensors. We proved that the resolution capacity is not limited by the number of sensors and that one can estimate any number of sources. This is in contrast to the usual paradigm, which assumes that the number of resolvable sources is smaller than the number of sensors. We also proved that the MDL estimator is a consistent estimator of the number of sources.
The performance of the MDL estimator is shown to be uniformly superior over the commonly used GMDL estimator. In two important scenarios, namely, of more sources than sensors and of more than one source, the MDL estimator significantly outperforms the GMDL estimator.
When more sources than sensors are to be detected, one cannot use the GMDL estimator and the MDL is the only estimator that can solve the problem. When two or more sources exist, the MDL estimator exploits both the signal structure and the spatial separation between the sources, which results in substantial performance improvement over the GMDL estimator.

A. PROOF OF THEOREM 1
In this appendix we prove that the problem is identifiable. In particular we prove that there is a one-to-one correspondent between the measurements' statistical distribution and the spatial scenario. We assume, without loss of generality, that BPSK signals are transmitted, that is, we assume that Recall that given the mixing matrix, A, and the noise level, σ 2 n , the pdf of the received signal vector is where we used the assumption that the transmitted signals are BPSK signals. Let A be a p × q complex matrix, and x a p × 1 complex vector. We next define a simple operation on the matrix A and the vector x. From A we defineĀ as a 2p × q matrix according tō and from x we define a 2p × 1 vector denoted byx as follows: where [A] i: denotes the ith row of the matrix A, [x] i denotes the ith element of the vector x, is the real part of the number, and is the imaginary part of the number. It is easy to verify that n ) can be written as follows: where δ is the Dirac delta function, and * denotes the convolution operator. Now, the moment-generating function of x, given A and σ 2 n , or, equivalently, ofx, givenĀ and σ 2 n , is where [Ā] :i denotes the ith column of the matrixĀ. Note that the first product is the Fourier transform of the normal distribution, while the second product is the Fourier transform of the convolutions of the various delta functions. We now turn to prove the theorem. It is easily seen that if m = n, σ 2 n =σ 2 n , and A=Ã up to column permutation, f (x|A, σ) = f (x|Ã,σ 2 n ). This proves the sufficiency. We now turn to prove the necessity. That is, if f (x|A, σ) = f (x|Ã,σ 2 n ), then σ 2 n =σ 2 n and A =Ã. We first prove that σ 2 n =σ 2 n , that is, there are no two different scenarios that result in the same distribution at the output of the array. Lemma 3. Let A andÃ be two arbitrary p × m and p × n matrices, respectively. In addition, let σ 2 n > 0 andσ 2 Proof. Assume that f (x|A, σ 2 n ) = f (x|Ã,σ 2 n ). Consequently, whereĀ is the result of (A.2) applied toÃ.
Assume that σ 2 n =σ 2 n , and assume, without loss of generality, that σ 2 n >σ 2 n . Dividing both sides of (A.7) by 2p i=1 eσ 2 n ω 2 results in the following equality: It is now obvious that σ 2 n =σ 2 n is a necessary condition for the equivalence between the two sides of (A.9).
We now proceed to prove that m = n is also a necessary condition for f (x|A, σ 2 n ) = f (x|Ã,σ 2 n ). Since σ 2 n =σ 2 n , (A.9) becomes Consider the left-hand side of (A.10) and assume, without loss of generality, that the first row ofĀ does not contain any zeros. In this case the maximum value of [x] 1 is q j=1 |[Ā] 1 j | α. Consequently, the coefficient of product that includes δ([x] 1 − α) is 1/2 m . Using the same reasoning, we can demonstrate that there at least the coefficient of one product on the right-hand side of (A.10) is 1/2 n . Therefore, it is necessary that n = m for the equality to hold. If every row ofĀ contains at least one zero, a somewhat more complicated approach has to be taken in order to prove that m = n [15].
The uniqueness ofĀ is now easily proven from (A.10) by noting that the delta functions having the coefficients have to be on the same point in R 2p .

B. PROOF OF THEOREM 2
In this appendix the consistency of the MDL estimator is proven. Specifically it is shown that the probability of error of the MDL estimator converges to zero as the number of snapshots increases to infinity. An error event will occur if and only if there exists k = q such that MDL(q) − MDL(k) > 0. Thus in order to prove the lemma, it suffice to prove that, for every k = q, P(MDL(q) − MDL(k) > 0) → 0.
Assume that k < q. It was previously shown that under very weak conditions, the probability of miss of every MDL estimator converges to zero as N → ∞ [12]. In particular the probability of miss of the MDL estimator considered in this paper, which satisfies the condition stated in [12], converges to zero as N → ∞. Now, assume that k > q. MDL(q) − MDL(k) can be written as follows:

MDL(q) − MDL(k)
= − log f X X|θ In what follows we are going to use Wilks' theorem to compute the asymptotic distribution of MDL(q) − MDL(k).
Consider the difference between the log likelihoods, given q and k > q sources, that is, −2 log( f X (X|θ (q) )) + 2 log( f X (X|θ (k) )). According to Wilks' theorem, under some regularity conditions, asymptotically − log( f X (X|θ (q) )) + log( f X (X|θ (k) )) is distributed as a chi-square random variable with 2p(k − q) degrees of freedom. The regularity conditions are the ones that insure that the MLE is unique, efficient, and asymptotically normal. In [17], it is shown that these regularity conditions hold in our problem.
Using the asymptotic distribution of − log( f X (X|θ We have proven that for k = q, the probability of the event P(MDL(q) − MDL(k) > 0) N→∞ − −−→ 0. Therefore, the probability of error approaches zero as well.