In this section, we derive a new approximation of the BER. The purpose of this approximation is to estimate the BER by an expression which is a computationally simple function of the equalizer weights, paving the way towards real-time channel equalization.
In order to derive a bound on the BER, one can note that the Gaussian error function appearing in the summands tends rapidly to zero for negative arguments. As a result, the terms in the summation can differ by several orders of magnitude. This gives rise to the idea of collecting only the dominant terms to provide a lower bound on the BER. This lower bound has been commonly used in other domains, such as reliability analysis, where it is referred to as the Li-Silvester bound [17]: the tiresome calculation of an expected value over a large state space is approximated by using only the dominant terms in the summation.
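The idea can be illustrated with a short sketch. The margins below are purely illustrative (they are not taken from the paper's channel model); the point is only that, because the Gaussian tail decays so quickly, a handful of dominant terms already accounts for almost all of the sum, and keeping only those terms yields a valid lower bound.

```python
import math

def q(x):
    """Gaussian tail function Q(x) = P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

# Hypothetical margins feeding the summation (illustrative values only).
margins = [1.0, 2.0, 3.0, 4.0, 5.0]
terms = sorted((q(m) for m in margins), reverse=True)

full_sum = sum(terms)        # the "tiresome" full summation
dominant = sum(terms[:2])    # keep only the two dominant terms

assert dominant <= full_sum            # a valid lower bound...
assert dominant / full_sum > 0.99      # ...capturing over 99% of the sum
```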
In our case, this bound can be obtained by looking upon (8) as an expected value:
Introducing the following notations, one can obtain
Let us then separate the set of information sequences $\mathcal{Y}$ into two disjoint subsets $\mathcal{Y}_s$ and $\bar{\mathcal{Y}}_s$ ($\mathcal{Y}_s \cap \bar{\mathcal{Y}}_s = \emptyset$, $\mathcal{Y}_s \cup \bar{\mathcal{Y}}_s = \mathcal{Y}$). Let the number of elements in $\mathcal{Y}_s$ be $s$, so that $\mathcal{Y}_s$ contains the first $s$ vectors belonging to the $s$ largest terms of the summation. From the properties of the Gaussian error function, it follows that the terms belonging to $\bar{\mathcal{Y}}_s$ are negligible. This gives rise to the two-sided bound on the BER in (15), yielding the lower bound in (16), where $|\mathcal{Y}_s|$ denotes the cardinality of $\mathcal{Y}_s$.
Remark 4.
In this case, the information sequence is subject to uniform distribution; thus the left-hand side of (15) can be very tight if $s$ is chosen reasonably high. At the same time, the upper bound tends to be loose, as the terms belonging to $\bar{\mathcal{Y}}_s$ have rather inaccurate upper bounds. In Section 4.2, we will evaluate the tightness of the bound based on the $s$th dominant sample.
The most "harmful" sequence in $\mathcal{Y}$, denoted by $\mathbf{y}^{(1)}$ (i.e., among the sequences beginning with $+1$), is the one whose components oppose the signs of the corresponding channel coefficients, since it maximizes the intersymbol interference. This indicates the term belonging to $\mathbf{y}^{(1)}$ to be the absolute dominant term in the summation in (12).
In order to determine the dominant terms that form the set $\mathcal{Y}_s$, let us introduce the following notation: let $\pi$ be an index array pointing to the different elements of the impulse response, ordered so that $|h_{\pi(1)}| \leq |h_{\pi(2)}| \leq \cdots$. Note that $\pi(j)$ points to the $j$th smallest element of the impulse response in absolute value. The extension of the index array $\pi$ to the case of a nonzero channel delay will be given in Section 4.3. The second dominant term, belonging to $\mathbf{y}^{(2)}$, can be deduced from $\mathbf{y}^{(1)}$ by changing the sign of the component with index $\pi(1)$, because in this case the distortion decreases by $2|h_{\pi(1)}|$, which is the smallest possible value by which the PD can be decreased.
Applying the same reasoning, the first $s$ largest terms can be given as follows:
- (1) take $\mathbf{y}^{(1)}$, the sequence opposing the signs of the channel coefficients;
- (2) take $\mathbf{y}^{(1)}$ and change the sign of the component with index $\pi(1)$, yielding $\mathbf{y}^{(2)}$;
- (3) take $\mathbf{y}^{(1)}$ and change the sign of the component with index $\pi(2)$, yielding $\mathbf{y}^{(3)}$;
- (4) FOR $j = 4$ TO $s$: find the index set $I_j$ for which $\sum_{i \in I_j} 2|h_{\pi(i)}|$ is minimal but which differs from all previously used index sets, and obtain $\mathbf{y}^{(j)}$ from $\mathbf{y}^{(1)}$ by changing the sign of the components indexed by $I_j$;
- (5) form the set $\mathcal{Y}_s = \{\mathbf{y}^{(1)}, \ldots, \mathbf{y}^{(s)}\}$ to be used in the lower bound given in (16).
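Under the assumption of a binary alphabet and real-valued taps, the search in step (4) can be sketched as an exhaustive subset enumeration (the function name `dominant_sequences` and the tap values in the test are ours, not the paper's). The sketch deliberately mirrors the exponential-complexity caveat noted later in the text: every index subset is scored by its total weight $\sum_{i} 2|h_i|$, and the $s$ cheapest flips of the worst-case sequence are returned.

```python
from itertools import chain, combinations

def dominant_sequences(h, s):
    """Return the s most 'harmful' +/-1 sequences for the ISI tap
    vector h (main tap excluded), in order of decreasing harm.

    The worst case sets every component against the tap sign; each
    further sequence flips the index subset whose total weight
    sum(2*|h_i|) is smallest among the not-yet-used subsets.
    """
    n = len(h)
    worst = [-1 if hi > 0 else 1 for hi in h]   # y_i = -sign(h_i)
    # Enumerate all index subsets and sort them by distortion decrease.
    subsets = chain.from_iterable(
        combinations(range(n), r) for r in range(n + 1))
    scored = sorted(subsets, key=lambda idx: sum(2 * abs(h[i]) for i in idx))
    seqs = []
    for idx in scored[:s]:
        y = worst[:]
        for i in idx:
            y[i] = -y[i]                        # flip the chosen components
        seqs.append(y)
    return seqs
```

For example, with the hypothetical taps `h = [0.5, -0.2, 0.1]`, the subset weights are $0, 0.2, 0.4, 0.6, \ldots$, so the four dominant sequences flip nothing, the third tap, the second tap, and then both, respectively.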
It is easy to see that the case of $s = 1$ (when $\mathcal{Y}_s = \{\mathbf{y}^{(1)}\}$) results in a cost function which attains its minimum over the same coefficient vector as the peak distortion does. Increasing the value of $s$, the lower bound in (16) tends to the exact BER, and finally the case of $\mathcal{Y}_s = \mathcal{Y}$ (i.e., $s = |\mathcal{Y}|$) results in the exact minimum of BER.
In general, one can derive an algorithm which identifies the dominant terms for any arbitrary $s$, where $1 \leq s \leq |\mathcal{Y}|$. The following procedure results in the first 4 dominant terms, which in practice seems to be a good compromise between $s = 1$ (yielding the PD criterion) and $s = |\mathcal{Y}|$ (for further details, see Section 4.2):
- (1) take $\mathbf{y}^{(1)}$, the sequence opposing the signs of the channel coefficients;
- (2) take $\mathbf{y}^{(1)}$ and change the sign of the component with index $\pi(1)$, yielding $\mathbf{y}^{(2)}$;
- (3) take $\mathbf{y}^{(1)}$ and change the sign of the component with index $\pi(2)$, yielding $\mathbf{y}^{(3)}$;
- (4) IF $2|h_{\pi(1)}| + 2|h_{\pi(2)}| < 2|h_{\pi(3)}|$ THEN take $\mathbf{y}^{(1)}$ and change the sign of the components with indices $\pi(1)$ and $\pi(2)$, yielding $\mathbf{y}^{(4)}$; ELSE take $\mathbf{y}^{(1)}$ and change the sign of the component with index $\pi(3)$, yielding $\mathbf{y}^{(4)}$;
- (5) form the set $\mathcal{Y}_4 = \{\mathbf{y}^{(1)}, \ldots, \mathbf{y}^{(4)}\}$ to be used in the lower bound (16).
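The five steps above can be sketched directly. This is a minimal sketch: the helper names are ours, the channel is assumed real-valued with at least three ISI taps, and the main tap is excluded from `h`.

```python
def first_four_dominant(h):
    """Steps (1)-(5): build the 4 dominant +/-1 sequences for the ISI
    taps h (main tap excluded), with pi[0] pointing at the smallest
    |h_i|, pi[1] at the second smallest, and so on."""
    pi = sorted(range(len(h)), key=lambda i: abs(h[i]))

    def flipped(y, idx):
        y = y[:]
        for i in idx:
            y[i] = -y[i]
        return y

    y1 = [-1 if hi > 0 else 1 for hi in h]       # (1) oppose the tap signs
    y2 = flipped(y1, [pi[0]])                    # (2) flip component pi(1)
    y3 = flipped(y1, [pi[1]])                    # (3) flip component pi(2)
    if 2 * abs(h[pi[0]]) + 2 * abs(h[pi[1]]) < 2 * abs(h[pi[2]]):
        y4 = flipped(y1, [pi[0], pi[1]])         # (4) flip pi(1) and pi(2)
    else:
        y4 = flipped(y1, [pi[2]])                # (4) flip pi(3) instead
    return [y1, y2, y3, y4]                      # (5) the set used in (16)
```

The IF in step (4) simply asks which costs less: undoing the two smallest taps together, or the third smallest tap alone.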
Unfortunately, calculating the set $\mathcal{Y}_s$ to find the $s$ largest terms is in general of exponential complexity, as the terms must be calculated and arranged in monotone order over all possible information sequences.
4.1. Optimization of the Bound
The gradient of the lower bound in (16) is a truncated version of the gradient of the true BER (10), obtained by carrying out the summation over $\mathcal{Y}_s$ instead of $\mathcal{Y}$:
Using the gradient, the following adaptive algorithm can be used for weight optimization:
Of course, one can use a variable step size $\mu_k$ in algorithm (19) to improve the speed of convergence. For example, the Armijo rule [15] can be applied; however, simulations showed no improvement from applying this rule, and a further problem with this method is its high complexity (the gradient has to be evaluated several times). On the other hand, we may introduce a heuristically chosen diminishing step size. The convergence of (19) with such a step size is guaranteed by the Kushner-Clark theorem (for more details, see [16]). In the simulation section, the improvement in convergence achieved by the variable step size method is also illustrated.
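As an illustration only: the gradient of the actual bound is the truncated expression above, but here a simple quadratic surrogate stands in for it, and the $\mu_k = c/k$ schedule is one common choice of diminishing step size (steps that decay to zero yet sum to infinity, as required for convergence).

```python
def optimize(grad, w0, steps=2000, c=1.0):
    """Gradient iteration w_{k+1} = w_k - mu_k * grad(w_k) with the
    diminishing step size mu_k = c/k."""
    w = list(w0)
    for k in range(1, steps + 1):
        mu = c / k
        g = grad(w)
        w = [wi - mu * gi for wi, gi in zip(w, g)]
    return w

# Illustrative surrogate cost (w - 2)^2 in place of the BER bound;
# its gradient is 2*(w - 2), so the iteration should approach w = 2.
w_star = optimize(lambda w: [2.0 * (w[0] - 2.0)], [0.0])
assert abs(w_star[0] - 2.0) < 1e-9
```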
4.2. A Numerical Example for Calculating the New Bound
For the sake of a better understanding of the algorithm developed for finding the dominant terms, a numerical example is given as follows. Let us take the following overall impulse response, where $h_0$ denotes the main tap. At first, the index array $\pi$ is defined, containing the indices of the smallest, second smallest, and so forth, elements of the impulse response in absolute value (omitting $h_0$):
The information sequence resulting in maximum distortion, $\mathbf{y}^{(1)}$, can be calculated by opposing the signs of the coefficients. The information sequence $\mathbf{y}^{(2)}$ can be derived from $\mathbf{y}^{(1)}$ by changing the sign of the element with index $\pi(1)$, and $\mathbf{y}^{(3)}$ comes from $\mathbf{y}^{(1)}$ by changing the sign of the element with index $\pi(2)$. To determine $\mathbf{y}^{(4)}$, one can evaluate the sum $2|h_{\pi(1)}| + 2|h_{\pi(2)}|$. Since it is smaller than $2|h_{\pi(3)}|$, $\mathbf{y}^{(4)}$ can be derived from $\mathbf{y}^{(1)}$ by changing the sign of the elements with indices $\pi(1)$ and $\pi(2)$. If one calculates all possible terms, the bit error probability can be calculated exactly using (12).
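Such an example can also be reproduced numerically. The taps, the noise level, and the normalization below are hypothetical stand-ins (the paper's concrete values are not repeated here); the sketch compares the exact BER, obtained by enumerating all sequences, with the truncated sum over the four dominant terms.

```python
import math
from itertools import product

def q(x):
    """Gaussian tail function Q(x) = P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

# Hypothetical overall impulse response: main tap h0 plus ISI taps.
h0, h = 1.0, [0.5, -0.2, 0.1]
sigma = 0.3                          # assumed noise standard deviation

def term(y):
    # Error probability conditioned on the ISI pattern y, for a
    # transmitted reference symbol of +1.
    return q((h0 + sum(yi * hi for yi, hi in zip(y, h))) / sigma)

terms = sorted((term(y) for y in product((-1, 1), repeat=len(h))),
               reverse=True)
exact = sum(terms) / len(terms)      # average over equiprobable sequences
lower = sum(terms[:4]) / len(terms)  # keep only the 4 dominant terms

assert lower <= exact                # a valid lower bound
assert lower / exact > 0.999         # four terms already suffice here
```

With these values the four dominant terms carry essentially the whole sum, illustrating why a small $s$ can approximate the BER so well.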
In Figure 1, one can see the terms of the summation for two different SNR values. In this figure, a logarithmic scale is used on the vertical axis, and terms below a small threshold are omitted. Note that the number of negligible terms depends on the noise level. This is explained by the fact that, when the noise level is increased, the samples fall into a region where the Gaussian error function decreases rather fast. The difference between the smallest and largest terms is of several orders of magnitude. Furthermore, one may note that there are no samples on the positive side, which proves that the channel can indeed be equalized; in this case there are no local minima, as proven in [13].
In order to derive a method to identify $s$, let us first express the bit error probability as a sum of two expressions, where the second term can be upper bounded by using the dominant samples. In this way, one obtains the bound in (32). Since the sharpness of (32) depends on the value of the $(s+1)$th dominant sample (it becomes sharp if this value is small), this expression can be used to estimate the number of dominant samples needed for an efficient bound on the BER. If the bound contributed by the $(s+1)$th dominant sample drops below a predefined value $\varepsilon$, then the number of samples $s$ needed to approximate the BER can be obtained accordingly, where the BER itself can be substituted by its approximation using the first $s$ dominant samples. Figure 2 analyzes the accuracy of the bound obtained by the dominant samples.
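A sketch of this stopping rule (paraphrased, not the paper's exact formula; the function name and the term values in the test are illustrative): keep accumulating dominant samples until the next one is negligible relative to the running BER estimate.

```python
def needed_samples(terms, eps):
    """Smallest s such that the next dominant term, relative to the
    running approximation of the BER, drops below the threshold eps."""
    terms = sorted(terms, reverse=True)   # dominant terms first
    approx = 0.0
    for s, t in enumerate(terms, start=1):
        approx += t                       # BER approximated by s terms
        nxt = terms[s] if s < len(terms) else 0.0
        if nxt < eps * approx:            # next term is negligible
            return s
    return len(terms)
```

As in the figure discussion, a noisier channel spreads the terms more evenly, so the same threshold demands more samples.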
Figure 2 shows two curves belonging to SNR values of 10 and 20 dB, respectively. From this figure, the necessary number of samples $s$ for a given threshold $\varepsilon$ can be read off; at SNR $= 20$ dB it is small. This necessary sample number increases as the SNR decreases (in the case of SNR $= 10$ dB, the number of samples is seven). This is in line with the reasoning detailed above.
4.3. Handling the Channel Delay
If there is some delay $d$ in the overall channel impulse response, a more efficient equalization can be carried out by the decision rule given in (3). The cost function based on the lower bound in (16) has only to be slightly modified in order to handle the delay parameter $d$: since in this case the $d$th element of the information sequence (instead of the first) must be set to $+1$, the index array used for calculating the dominant terms has to be changed to