In this section, we derive a new approximation of the BER. The purpose of this approximation is to estimate the BER with an expression that is a computationally simple function of the equalizer weights. This paves the way towards real-time channel equalization.
In order to derive a bound on the BER, one can note that the function tends rapidly to zero for negative arguments. As a result, the terms in the summation can differ by several orders of magnitude. This gives rise to the idea of collecting only the dominant terms to provide a lower bound on the BER. This lower bound has been commonly used in other domains, such as reliability analysis, where it is referred to as the Li-Silvester bound [17]; there, the tedious calculation of an expected value over a large state space is approximated by using only the dominant terms in the summation.
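The idea can be sketched in code. Since the symbols in this section are elided, the model below is an illustrative assumption: the BER is taken as an average of Gaussian tail terms Q(·) over all ±1 interfering data sequences for a channel with taps `h` and noise level `sigma`; averaging only the `m` largest terms then gives a Li-Silvester-style lower bound.

```python
import math
from itertools import product

def q_func(x):
    """Gaussian tail function Q(x) = P(N(0, 1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def ber_terms(h, sigma):
    """All Q-terms of the BER sum, one per +/-1 interfering sequence
    (h[0] is taken as the desired tap, h[1:] as the interferers)."""
    return [q_func((h[0] + sum(s * t for s, t in zip(seq, h[1:]))) / sigma)
            for seq in product((-1.0, 1.0), repeat=len(h) - 1)]

def ber_exact(h, sigma):
    """Full average over all 2^(len(h)-1) sequences."""
    terms = ber_terms(h, sigma)
    return sum(terms) / len(terms)

def ber_lower_bound(h, sigma, m):
    """Li-Silvester-style bound: keep only the m dominant terms."""
    terms = sorted(ber_terms(h, sigma), reverse=True)
    return sum(terms[:m]) / (2 ** (len(h) - 1))

h, sigma = [1.0, 0.4, -0.2, 0.1], 0.3   # illustrative channel and noise
lb = ber_lower_bound(h, sigma, m=4)
exact = ber_exact(h, sigma)
assert 0.0 < lb <= exact  # the truncated sum never exceeds the full one
```

Because Q(·) decays so quickly, the four dominant terms already carry almost all of the mass in this toy example.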
In our case, this bound can be obtained as follows. We may look upon (8) as an expected value, given as follows:
Introducing the following notation, one can obtain
Let us then separate the set into two disjoint subsets and ( , ). Let the number of elements in be ; it contains the vectors belonging to the first largest values of , for which
From the properties of it follows that . This gives rise to the following bounds on BER:
yielding
where denotes the cardinality of .
Remark 4.
In this case, is subject to a uniform distribution; thus, the left-hand side of (15) can be very tight if is chosen reasonably high. At the same time, the upper bound tends to be loose, as the terms in have rather inaccurate upper bounds. In Section 4.2, we will evaluate the tightness of the bound based on the th dominant sample.
The most "harmful" sequence, denoted by in (i.e., the sequence beginning with ), is , since , which indicates that it is the absolutely dominant term in the summation in (12).
In order to determine the dominant terms that form the set , let us introduce the following notation: let be an index array pointing to the different elements of , where
Note that points to the th smallest element of in absolute value. The extension of index array for will be given in Section 4.3. The second dominant term can be deduced from by changing the sign of the component , because in this case , where is the smallest possible value for decreasing the PD.
Applying the same reasoning, the first largest terms can be given as follows:
(1) ;
(2) and change the sign of the component ;
(3) and change the sign of the component of ; ;
(4) FOR TO find the index set for which is minimal, but and ;
(5) Form the set to be used in the lower bound given in (16).
It is easy to see that the case of (when ) results in a cost function which attains its minimum at the same coefficient vector as the peak distortion. Increasing the value of , the lower bound in (16) tends to the exact BER, and finally the case of and yields the exact minimum of the BER.
More generally, one can derive an algorithm which identifies the dominant terms for any arbitrary where . The following procedure yields the first 4 dominant terms, which in practice seems to be a good compromise between (yielding the PD criterion) and (for further details, see Section 4.2):
(1) ;
(2) and change the sign of the component ; ;
(3) and change the sign of the component ; ;
(4) IF THEN and change the sign of the components and ; ; ELSE and change the sign of the component ; ;
(5) Form the set to be used in the lower bound (16).
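The five steps above can be sketched as follows. Since the source symbols are elided, the sketch assumes the standard ISI setting: `h[0] > 0` is the decision tap, the worst-case sequence sets each bit against the sign of the corresponding tap, and the later sequences flip the signs attached to the smallest-magnitude taps; step (4) compares the joint flip of the two smallest taps with the single flip of the third smallest.

```python
def dominant_sequences(h):
    """First four dominant (+/-1) interfering sequences for taps h,
    assuming h[0] > 0 is the decision tap."""
    taps = h[1:]
    # index array: positions of taps sorted by increasing magnitude
    idx = sorted(range(len(taps)), key=lambda i: abs(taps[i]))
    # worst case: every bit opposes its tap, s_i = -sign(h_i)
    worst = [-1.0 if t > 0 else 1.0 for t in taps]

    def flipped(base, positions):
        s = list(base)
        for p in positions:
            s[p] = -s[p]
        return s

    seqs = [worst,
            flipped(worst, [idx[0]]),   # flip the smallest-|h_i| bit
            flipped(worst, [idx[1]])]   # flip the 2nd smallest
    # step (4): joint flip of the two smallest vs. the 3rd smallest alone
    if abs(taps[idx[0]]) + abs(taps[idx[1]]) < abs(taps[idx[2]]):
        seqs.append(flipped(worst, [idx[0], idx[1]]))
    else:
        seqs.append(flipped(worst, [idx[2]]))
    return seqs

print(dominant_sequences([1.0, 0.4, -0.2, 0.1]))
```

For these illustrative taps, the joint flip (0.1 + 0.2 = 0.3) beats the single flip of the 0.4 tap, so the ELSE branch is not taken.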
Unfortunately, calculating the set containing the largest terms is of exponential complexity, as must be calculated and arranged in monotone order for all possible .
4.1. Optimization of the Bound
The gradient of the lower bound in (16) is a truncated version of the gradient of the true BER (10), obtained by carrying out the summation over instead of :
Using the gradient, the following adaptive algorithm can be used for weight optimization
Of course, one can use a variable step size in algorithm (19) to improve the speed of convergence. For example, the Armijo rule [15] can be applied to speed up the convergence; however, simulations showed no improvement from applying this rule, and it has high complexity (the gradient has to be evaluated several times). Alternatively, we may introduce a heuristically chosen step size, such as . With this step size, the convergence of (19) is guaranteed by the Kushner-Clark theorem (for more details, see [16]). In the simulation section, the improvement in convergence achieved by the variable step size method is also illustrated.
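As an illustration of such a heuristic schedule (the concrete formula in the source is elided; a diminishing step size such as mu_k = c/(k+1), which satisfies the Kushner-Clark conditions that the steps sum to infinity while their squares sum to a finite value, is one common choice), the following sketch runs the gradient iteration of the form (19) on a toy one-dimensional cost.

```python
def gradient_descent_diminishing(grad, w0, c=1.0, iters=2000):
    """Iteration w_{k+1} = w_k - mu_k * grad(w_k) with the diminishing
    step size mu_k = c / (k + 1), as in the Kushner-Clark setting."""
    w = w0
    for k in range(iters):
        w -= (c / (k + 1)) * grad(w)
    return w

# toy cost J(w) = (w - 3)^2 with gradient 2*(w - 3); minimum at w = 3
w_star = gradient_descent_diminishing(lambda w: 2.0 * (w - 3.0), w0=0.0)
```

In the equalization problem the same update would be driven by the truncated BER gradient of (19) instead of this toy gradient.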
4.2. A Numerical Example for Calculating the New Bound
For a better understanding of the algorithm developed for finding the dominant terms, a numerical example is given as follows. Let us take the following overall impulse response function
where . First, the index array is defined, containing the indices of the smallest, second smallest, and so forth, elements of (omitting ):
The information sequence resulting in maximum distortion can be calculated as
yielding
The information sequence can be derived from changing the sign of the element :
resulting in
and comes from by changing the sign of the element
yielding
To determine , one can evaluate the sum . Since it is smaller than , can be derived from by changing the sign of the elements and :
resulting in
If one calculates all possible terms up to (which equals ), the bit error probability can be obtained using (12) as
In Figure 1, one can see the terms versus for two different values. In this figure, a logarithmic scale is used on the vertical axis, and values smaller than are omitted. Note that the number of negligible terms depends on the noise level: as the noise level increases, the samples fall into a region where decreases rather fast. The difference between the smallest and the largest terms is of magnitudes. Furthermore, one may note that there are no samples on the positive side, which proves that the channel can indeed be equalized; in this case there are no local minima, as proven in [13].
In order to derive a method for identifying , let us first express the bit error probability as a sum of two expressions
where the second term can be upper bounded by using the th dominant sample. In this way, one obtains the following bound:
Since the sharpness of (32) depends on the value of the th dominant sample (it becomes sharp if the value of the th dominant sample is small), this expression can be used to estimate the number of dominant samples needed to give an efficient bound on the BER. If the bound using the th dominant sample drops below a predefined value , then the number of samples needed to approximate the BER can be obtained as follows:
where can be substituted by its approximation using the first dominant samples. Figure 2 analyzes the accuracy of the bound obtained by the dominant samples.
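In code, this estimate amounts to sorting the Q-terms in decreasing order and returning the smallest M whose successor falls below the threshold. The model and the numbers below are illustrative assumptions, since the source values are elided.

```python
import math
from itertools import product

def q_func(x):
    """Gaussian tail function Q(x) = P(N(0, 1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def num_dominant_samples(h, sigma, eps):
    """Smallest M such that the (M+1)-th largest Q-term falls below eps,
    so the first M terms suffice to approximate the BER sum."""
    n = len(h) - 1
    terms = sorted(
        (q_func((h[0] + sum(s * t for s, t in zip(seq, h[1:]))) / sigma)
         for seq in product((-1.0, 1.0), repeat=n)),
        reverse=True)
    for m, t in enumerate(terms[1:], start=1):
        if t < eps:
            return m
    return len(terms)  # no term is negligible at this threshold

print(num_dominant_samples([1.0, 0.4, -0.2, 0.1], sigma=0.1, eps=1e-8))
```

Lowering `sigma` (raising the SNR) makes the tail terms collapse faster, so fewer dominant samples are needed, in line with the discussion of Figure 2.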
Figure 2 shows two curves, belonging to 10 and 20 dB, respectively. From this figure, it can be seen that if and dB, then the necessary number of samples is . This number increases as the SNR decreases (in the case of dB, the number of samples is seven). This is in line with the reasoning detailed above.
4.3. Handling the Channel Delay
If there is some delay in the overall channel impulse response function , a more efficient equalization can be carried out by the decision rule given in (3). The cost function based on the lower bound in (16) only has to be slightly modified in order to handle the delay parameter . Since in this case the th element of (instead of the first element) must be set to , the index array used for calculating the dominant terms has to be changed to
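The modification can be sketched as follows: when the decision is taken on the tap at position `delay`, that tap is omitted from the index array, and the remaining taps are ordered by increasing magnitude exactly as before. The function name and the example taps are illustrative assumptions.

```python
def dominance_index_array(h, delay=0):
    """Index array for the dominant-term search when the decision is
    taken on the tap at position `delay`: that tap is omitted and the
    remaining taps are sorted by increasing absolute value."""
    idx = [i for i in range(len(h)) if i != delay]
    idx.sort(key=lambda i: abs(h[i]))
    return idx

# smallest |h[i]| first, the tap at position `delay` omitted
print(dominance_index_array([0.2, 1.0, 0.4, -0.1], delay=1))
```

With `delay=0` this reduces to the index array used in the non-delayed case above.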