Partial Crosstalk Cancellation for Upstream VDSL

Crosstalk is a major problem in modern DSL systems such as VDSL. Many crosstalk cancellation techniques have been proposed to help mitigate crosstalk, but whilst they lead to impressive performance gains, their complexity grows with the square of the number of lines within a binder. In binder groups which can carry up to hundreds of lines, this complexity is beyond what current implementations can support. In this paper, we investigate partial crosstalk cancellation for upstream VDSL. The majority of the detrimental effects of crosstalk are typically limited to a small subset of lines and tones. Furthermore, significant crosstalk is often only seen from neighbouring pairs within the binder configuration. We present a number of algorithms which exploit these properties to reduce the complexity of crosstalk cancellation. These algorithms are shown to achieve the majority of the performance gains of full crosstalk cancellation with significantly reduced run-time complexity.


INTRODUCTION
VDSL is the next step in the ongoing evolution of DSL systems. Supporting data rates of up to 52 Mbps downstream, VDSL offers the potential of bringing truly broadband access to the consumer market. VDSL supports such high data rates by operating over short line lengths and transmitting at frequencies up to 12 MHz.
The twisted pairs in the access network are distributed within large binder groups which typically contain anything from 20 to 100 individual pairs. As a result of the close distance between twisted pairs within binders and the high frequencies used in VDSL transmission, there is significant electromagnetic coupling between nearby pairs. This electromagnetic coupling leads to interference or crosstalk between the different systems operating within a binder.
There are two types of crosstalk: near-end crosstalk (NEXT) and far-end crosstalk (FEXT). NEXT occurs when the upstream (US) signal of one modem couples into the downstream signal of another, or vice versa. FEXT occurs when two signals traveling in the same direction couple. In VDSL, NEXT is avoided through the use of frequency-division duplexing (FDD). FEXT, on the other hand, is still present. FEXT is typically 10-15 dB larger than the background noise and is the dominant source of performance degradation in VDSL.
Many crosstalk cancellation schemes have been proposed for VDSL based on linear pre- and postfiltering [1,2], successive interference cancellation [3,4], and turbo coding [5]. These schemes are applicable to US transmission, where the receiving modems are colocated. In downstream transmission, it is also possible to precompensate for crosstalk, since the transmitters are then colocated at the central office (CO) [3,6]. Cancellation of crosstalk from alien systems such as HPNA and HDSL has also been investigated [7,8].
Since crosstalk is the dominant source of performance degradation in VDSL, removing it leads to spectacular performance gains, for example, 50-130 Mbps in the US direction [3]. Whilst the benefits of crosstalk cancellation are large, complexity can be extremely high. For example, in a bundle with 20 users all transmitting on 4096 tones and operating at a block rate of 4000 blocks per second, the complexity of linear crosstalk cancellation exceeds 6.5 billion multiplications per second. This is outside the scope of present-day implementation and may remain infeasible economically for several years. Other techniques such as soft-interference cancellation and nonlinear crosstalk cancellation add even more complexity.
What is required is a crosstalk cancellation scheme with scalable complexity. It should support both conventional single-user detection (SUD) and full crosstalk cancellation. Furthermore, it should exhibit graceful performance degradation as complexity is reduced. We present a US crosstalk cancellation scheme which exhibits these properties. It is shown that by exploiting the space- and frequency-selective nature of crosstalk channels, this crosstalk cancellation scheme can achieve the majority of the performance gains of full crosstalk cancellation with a fraction of the run-time complexity.
This paper is organised as follows. In Section 2, we describe the system model for the crosstalk environment. Section 3 describes crosstalk cancellation, its performance and complexity. Due to the high complexity of full crosstalk cancellation, in Sections 4 and 5, we introduce the concept of partial crosstalk cancellation, which exploits both the space- and frequency-selectivity of the crosstalk channel. This takes advantage of the fact that the majority of the crosstalk experienced by a modem comes from only a few other crosstalkers in the binder. Furthermore, since crosstalk coupling varies dramatically with frequency, the worst effects of crosstalk are limited to a small selection of tones. Exploiting these two properties leads to significant reductions in complexity. In Section 6, we describe a partial cancellation algorithm which exploits space-selectivity. An algorithm which exploits frequency-selectivity only is described in Section 7. As we will see, achieving the largest possible reduction in run-time complexity requires algorithms to exploit both forms of selectivity, and in Section 8 we describe such algorithms. The performance of the algorithms is compared in Section 9 and conclusions are drawn in Section 10.

UPSTREAM SYSTEM MODEL
We begin by assuming that all receiving modems are colocated at the CO, as is the case in US transmission. This is a prerequisite for crosstalk cancellation, since signal-level coordination is required between receivers. Through synchronized transmission and the cyclic structure of DMT blocks, crosstalk can be modelled independently on each tone. We assume there are $N+1$ users within the binder group, so that each user has $N$ interferers. Transmission of a single DMT block can be modelled as

$$\mathbf{y}_k = \mathbf{H}_k\mathbf{x}_k + \mathbf{z}_k, \qquad y_k^n = h_k^{(n,n)}x_k^n + \sum_{m \neq n} h_k^{(n,m)}x_k^m + z_k^n,$$

where $\mathbf{x}_k \triangleq [x_k^1, \dots, x_k^{N+1}]^T$, $\mathbf{y}_k$ and $\mathbf{z}_k$ are defined similarly, and $[\mathbf{H}_k]_{n,m} = h_k^{(n,m)}$. Here $x_k^n$ and $y_k^n$ denote the symbols transmitted and received, respectively, by user $n$ on tone $k$. The tone index $k$ lies in the range $1, \dots, K$, where $K$ is the number of tones in the DMT system (e.g., for VDSL, $K = 4096$). $h_k^{(n,n)}$ is the direct channel of user $n$ at tone $k$, and $h_k^{(n,m)}$ is the crosstalk channel from user $m$ into user $n$. $z_k^n$ represents the additive noise experienced by user $n$ on tone $k$ and is assumed to be spatially white and Gaussian, such that $E\{\mathbf{z}_k\mathbf{z}_k^H\} = \sigma_k^2\mathbf{I}_{N+1}$. We denote the transmit autocorrelation on tone $k$ as

$$\mathbf{S}_k \triangleq E\{\mathbf{x}_k\mathbf{x}_k^H\},$$

and write $s_k^n \triangleq [\mathbf{S}_k]_{n,n}$ for the transmit power of user $n$ on tone $k$. Note that $\mathbf{S}_k$ is a diagonal matrix, since coordination is not available between the different customer premises (CP) transmitters.
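A matrix $\mathbf{A}$ is said to be column-wise diagonal dominant if it satisfies

$$|[\mathbf{A}]_{n,n}| \gg |[\mathbf{A}]_{m,n}|, \quad \forall m \neq n, \tag{2}$$

where $[\mathbf{A}]_{m,n}$ denotes the element in row $m$ and column $n$ of $\mathbf{A}$, and row-wise diagonal dominant if it satisfies

$$|[\mathbf{A}]_{n,n}| \gg |[\mathbf{A}]_{n,m}|, \quad \forall m \neq n. \tag{3}$$

If $\mathbf{A}$ satisfies both (2) and (3), it is said to be strictly diagonal dominant.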
In DSL channels with colocated receivers, the channel matrix $\mathbf{H}_k$ is column-wise diagonal dominant and satisfies the following property:

$$|h_k^{(m,m)}| \gg |h_k^{(n,m)}|, \quad \forall n \neq m. \tag{4}$$

In other words, the direct channel of any user always has a much larger gain than the channel from that user's transmitter into any other user's receiver. This property has been verified through extensive cable measurements (see the semiempirical crosstalk channel models in [9]). It will be exploited in the remaining sections.
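The column-wise property can be checked numerically on a measured or modelled channel matrix. The following Python sketch (NumPy assumed; the function name `is_columnwise_dominant` and both 3-line matrices are hypothetical) tests, for each transmitter $m$, whether its direct gain $|h_k^{(m,m)}|$ exceeds its gain into every other receiver. It checks a strict inequality only and does not measure by how much the direct gain dominates.

```python
import numpy as np


def is_columnwise_dominant(H: np.ndarray) -> bool:
    """Return True if, in every column m, |H[m, m]| exceeds |H[n, m]| for all n != m.

    Column m holds the gains from transmitter m into every receiver, so this is
    the element-wise form of the property stated above, for one tone k.
    """
    mag = np.abs(H)
    for m in range(H.shape[1]):
        others = np.delete(mag[:, m], m)  # column m without its diagonal entry
        if np.any(others >= mag[m, m]):
            return False
    return True


# Hypothetical 3-line channel on one tone: each direct gain dominates its column.
H_ok = np.array([[1.00, 0.05, 0.02],
                 [0.03, 0.80, 0.04],
                 [0.01, 0.06, 0.90]], dtype=complex)

# Same channel, but transmitter 0 now couples into receiver 1 more strongly
# than into its own receiver, which violates the property.
H_bad = H_ok.copy()
H_bad[1, 0] = 1.5

print(is_columnwise_dominant(H_ok))   # True
print(is_columnwise_dominant(H_bad))  # False
```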

Optimal crosstalk cancellation
When both the transmitters and the receivers of the modems within a binder are colocated, channel capacity can be achieved in a simple fashion [1,2]. Using the singular value decomposition (SVD), define

$$\mathbf{H}_k = \mathbf{U}_k\boldsymbol{\Lambda}_k\mathbf{V}_k^H, \tag{5}$$

where the columns of $\mathbf{U}_k$ and $\mathbf{V}_k$ are the left and right singular vectors of $\mathbf{H}_k$, respectively, and the singular values are $\lambda_k^n \triangleq [\boldsymbol{\Lambda}_k]_{n,n}$. It is assumed that $\mathbf{H}_k$ is nonsingular, which is ensured by (4) provided that $h_k^{(n,n)} \neq 0$ for all $n$.
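Define the true set of symbols $\tilde{\mathbf{x}}_k \triangleq [\tilde{x}_k^1, \dots, \tilde{x}_k^{N+1}]^T$, with diagonal autocorrelation $\tilde{\mathbf{S}}_k \triangleq E\{\tilde{\mathbf{x}}_k\tilde{\mathbf{x}}_k^H\}$ and powers $\tilde{s}_k^n \triangleq [\tilde{\mathbf{S}}_k]_{n,n}$. The optimal transmitter structure prefilters $\tilde{\mathbf{x}}_k$ with the matrix

$$\mathbf{P}_k = \mathbf{V}_k,$$

such that $\mathbf{x}_k = \mathbf{P}_k\tilde{\mathbf{x}}_k$ and the transmit autocorrelation is $\mathbf{S}_k = \mathbf{V}_k\tilde{\mathbf{S}}_k\mathbf{V}_k^H$. At the receiver, we apply the filter $\mathbf{w}_k^n = \mathbf{e}_n^H\boldsymbol{\Lambda}_k^{-1}\mathbf{U}_k^H$ to generate our estimate of the transmitted symbol

$$\hat{x}_k^n = \mathbf{w}_k^n\mathbf{y}_k = \tilde{x}_k^n + \tilde{z}_k^n,$$

where $\mathbf{e}_n \triangleq [\mathbf{I}_{N+1}]_{\text{col } n}$, $\mathbf{I}_{N+1}$ is the $(N+1)\times(N+1)$ identity matrix, and $\tilde{z}_k^n \triangleq \mathbf{e}_n^H\boldsymbol{\Lambda}_k^{-1}\mathbf{U}_k^H\mathbf{z}_k$. Here we use $[\mathbf{A}]_{\text{row } n}$ and $[\mathbf{A}]_{\text{col } n}$ to denote the $n$th row and column of matrix $\mathbf{A}$, respectively. Note that $E\{|\tilde{z}_k^n|^2\} = \sigma_k^2(\lambda_k^n)^{-2}$. The pre- and postfiltering operations remove crosstalk without causing noise enhancement. Applying a conventional slicer to $\hat{x}_k^n$ achieves the following rate for user $n$ on tone $k$:

$$c_k^n = \log_2\left(1 + \frac{1}{\Gamma}\frac{(\lambda_k^n)^2\,\tilde{s}_k^n}{\sigma_k^2}\right),$$

where $\Gamma$ represents the SNR gap to capacity and is a function of the target BER, coding gain, and noise margin [10]. The maximum achievable rate of the multiline DSL channel is

$$C = \sum_k \log_2\left|\mathbf{I}_{N+1} + \frac{1}{\Gamma\sigma_k^2}\mathbf{H}_k\mathbf{S}_k\mathbf{H}_k^H\right|.$$

It is straightforward to show that $\sum_n\sum_k c_k^n = C$. So, through the application of a simple linear pre- and postfilter and a conventional slicer, it is possible to operate at the maximum achievable rate of the DSL channel for the given $\mathbf{S}_k$. Unfortunately, application of a prefilter requires the transmitting modems to be colocated. In US DSL, this is typically not the case, since the transmitting modems are located at different CPs.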
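The SVD-based pre- and postfiltering can be checked numerically. The sketch below is a minimal illustration assuming NumPy and a randomly generated, hypothetical single-tone channel: prefiltering with $\mathbf{V}_k$ and filtering with $\boldsymbol{\Lambda}_k^{-1}\mathbf{U}_k^H$ recovers the symbols exactly when there is no noise, and the per-user rates add up to the log-determinant rate for the resulting transmit autocorrelation. The gap of 9.8 dB and the noise variance are assumed values, not taken from this paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N1 = 4                       # N + 1 lines (hypothetical)
sigma2 = 1e-3                # noise variance sigma_k^2 (assumed)
gap = 10 ** (9.8 / 10)       # SNR gap Gamma, 9.8 dB (assumed value)

# Hypothetical nonsingular channel H_k for one tone.
H = np.diag(rng.uniform(0.5, 1.0, N1)) + 0.05 * (
    rng.standard_normal((N1, N1)) + 1j * rng.standard_normal((N1, N1)))
s = rng.uniform(0.5, 1.5, N1)        # powers of the symbols before prefiltering

U, lam, Vh = np.linalg.svd(H)        # H = U @ diag(lam) @ Vh, so V_k = Vh^H
P = Vh.conj().T                      # prefilter P_k = V_k
W = np.diag(1 / lam) @ U.conj().T    # row n is w_k^n = e_n^H Lambda_k^-1 U_k^H

x_true = np.sqrt(s / 2) * (rng.standard_normal(N1) + 1j * rng.standard_normal(N1))
x_hat = W @ (H @ (P @ x_true))       # noiseless: crosstalk removed exactly
print(np.allclose(x_hat, x_true))    # True

# Per-user rates c_k^n and the log-determinant rate for S_k = P diag(s) P^H.
c = np.log2(1 + lam ** 2 * s / (gap * sigma2))
S = P @ np.diag(s) @ P.conj().T
C = np.log2(np.linalg.det(np.eye(N1) + H @ S @ H.conj().T / (gap * sigma2)).real)
print(np.isclose(c.sum(), C))        # True
```

The equality holds because $\mathbf{H}_k\mathbf{S}_k\mathbf{H}_k^H = \mathbf{U}_k\boldsymbol{\Lambda}_k\tilde{\mathbf{S}}_k\boldsymbol{\Lambda}_k\mathbf{U}_k^H$ and $\mathbf{U}_k$ is unitary, so the determinant factors into one term per singular value.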

Simplified, near-optimal crosstalk cancellation
As a result of the column-wise diagonal dominance of $\mathbf{H}_k$, rates close to the maximum can be achieved with a very simple receiver structure. Furthermore, prefiltering is not required, so such rates can be achieved without colocated transmitting modems. We now show why this is true.

Theorem 1. Any column-wise diagonal dominant matrix $\mathbf{H}_k$ which satisfies (A.7) can be decomposed as

$$\mathbf{H}_k = \mathbf{Q}_k\boldsymbol{\Sigma}_k, \tag{10}$$

such that $\mathbf{Q}_k$ is unitary and $\boldsymbol{\Sigma}_k$ is strictly diagonal dominant with positive diagonal elements. Furthermore, the off-diagonal elements of $\boldsymbol{\Sigma}_k$ can be bounded using (A.27) and (A.30).
Proof. See the appendix.
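The strict diagonal dominance of $\boldsymbol{\Sigma}_k$ allows us to make the approximations

$$\boldsymbol{\Sigma}_k \approx \operatorname{diag}\{\boldsymbol{\Sigma}_k\}, \qquad \boldsymbol{\Sigma}_k^{-1} \approx \operatorname{diag}\{\boldsymbol{\Sigma}_k\}^{-1}. \tag{11}$$

Hence

$$\mathbf{H}_k \approx \mathbf{Q}_k\operatorname{diag}\{\boldsymbol{\Sigma}_k\}\,\mathbf{I}_{N+1}.$$

Comparison with (5) yields $\mathbf{U}_k \approx \mathbf{Q}_k$, $\boldsymbol{\Lambda}_k \approx \operatorname{diag}\{\boldsymbol{\Sigma}_k\}$, and $\mathbf{V}_k^H \approx \mathbf{I}_{N+1}$. So the optimal transmit/receive structure of Section 3.1 is well approximated by $\mathbf{P}_k = \mathbf{V}_k \approx \mathbf{I}_{N+1}$ and

$$\begin{aligned}
\mathbf{w}_k^n &= \mathbf{e}_n^H\boldsymbol{\Lambda}_k^{-1}\mathbf{U}_k^H \\
&\approx \mathbf{e}_n^H\operatorname{diag}\{\boldsymbol{\Sigma}_k\}^{-1}\mathbf{Q}_k^H \\
&\approx \mathbf{e}_n^H\boldsymbol{\Sigma}_k^{-1}\mathbf{Q}_k^H = \mathbf{e}_n^H\mathbf{H}_k^{-1},
\end{aligned}$$

where we use (11) to go from line 2 to 3. In [6], an upper bound is proposed for the capacity loss incurred due to the above approximation. This loss is shown to be minimal for all practical DSL channels. Since $\mathbf{P}_k \approx \mathbf{I}_{N+1}$, prefiltering is not required. This is important since, in US DSL, the transmitting modems are not colocated. Furthermore, the optimal receiver structure is well approximated by a linear zero-forcing (ZF) design. Thus we can achieve close to the maximum rate using the following estimate:

$$\hat{\mathbf{x}}_k = \mathbf{H}_k^{-1}\mathbf{y}_k = \mathbf{x}_k + \mathbf{H}_k^{-1}\mathbf{z}_k.$$

Note that noise enhancement is not a problem, since $\mathbf{H}_k^{-1} = \boldsymbol{\Sigma}_k^{-1}\mathbf{Q}_k^H$: $\mathbf{Q}_k^H$ is unitary, hence it does not alter the statistics of the noise, and $\boldsymbol{\Sigma}_k^{-1}$ is approximately diagonal, hence it scales the signal and noise equally.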
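A minimal NumPy sketch of the simplified receiver, assuming a hypothetical single-tone channel with weak random crosstalk: the ZF estimate $\mathbf{H}_k^{-1}\mathbf{y}_k$ recovers the transmitted block without any prefilter, and the noise gain of each ZF row stays close to that of a single-user FEQ, $1/|h_k^{(n,n)}|^2$. This is the "no noise enhancement" argument above in numerical form; the channel statistics are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N1 = 6                                     # N + 1 lines (hypothetical)
direct = rng.uniform(0.5, 1.0, N1)         # direct gains |h_k^(n,n)|
# Weak crosstalk, so the channel is column-wise diagonal dominant.
H = 0.01 * (rng.standard_normal((N1, N1)) + 1j * rng.standard_normal((N1, N1)))
np.fill_diagonal(H, direct)

H_inv = np.linalg.inv(H)
x = (rng.standard_normal(N1) + 1j * rng.standard_normal(N1)) / np.sqrt(2)
x_hat = H_inv @ (H @ x)                    # ZF estimate, no prefilter (P_k = I)
print(np.allclose(x_hat, x))               # True

# For white noise, the noise power after ZF is sigma^2 * ||row n of H^-1||^2.
# Dividing by the single-user FEQ value sigma^2 / |h^(n,n)|^2 gives the enhancement.
enhancement = np.sum(np.abs(H_inv) ** 2, axis=1) * direct ** 2
print(enhancement)                         # entries close to 1
```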
Using this scheme, crosstalk cancellation of one user at one tone requires $N$ multiplications per DMT block. So crosstalk cancellation for $N+1$ users on $K$ tones at a block rate $b$ (DMT blocks per second) requires $(N^2+N)Kb$ multiplications per second. Thus the complexity grows rapidly with the number of users in a bundle. For example, in a 20-user system ($N = 19$) with 4096 tones and a block rate of 4000 blocks per second, this amounts to about 6.2 billion multiplications per second, or about 6.5 billion if the direct-line multiplication of each user is also counted. So whilst crosstalk cancellation leads to significant performance gains, it can be extremely complex, certainly beyond the complexity available in present-day systems. This is the motivation behind partial crosstalk cancellation.
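The complexity count can be checked with a few lines of Python. The function name is hypothetical; it simply evaluates $(N^2+N)Kb$. For 20 users ($N = 19$), 4096 tones and 4000 blocks per second this formula gives about $6.2\times10^9$ multiplications per second, whereas $6.5\times10^9$ corresponds to $(N+1)^2Kb$, i.e., also counting each user's direct-line multiplication.

```python
def full_cancellation_mults_per_second(n_users: int, n_tones: int, block_rate: int) -> int:
    """Multiplications per second for full linear cancellation, (N^2 + N) K b.

    Each of the N + 1 users needs N crosstalk multiplications per tone and block.
    """
    N = n_users - 1
    return (N ** 2 + N) * n_tones * block_rate


print(full_cancellation_mults_per_second(20, 4096, 4000))  # 6225920000, about 6.2e9
print(20 ** 2 * 4096 * 4000)                               # 6553600000, (N + 1)^2 K b
```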

CROSSTALK SELECTIVITY
In Figure 1, some crosstalk transfer functions are plotted from a set of measurements of a British Telecom cable consisting of 8 × 0.5 mm pairs. Examining this plot, we can make two observations. First, from a particular user's perspective, some crosstalkers cause significant amounts of interference, whilst others cause little interference at all. We refer to this as the space-selectivity of crosstalk, since the crosstalk channels vary significantly between lines. Space-selectivity arises naturally from the physical layout of binders. A 25-pair binder is depicted in Figure 4. As can be seen, each pair is typically surrounded by 4-5 neighbours. Since electromagnetic coupling decreases rapidly with distance, each pair will experience significant crosstalk from only a few surrounding pairs within the binder: twisted pairs that are close together within a binder group cause each other more crosstalk. The near-far effect also gives rise to space-selectivity. In US transmission, modems which are located closer to the CO will cause more crosstalk than those located further away.
To illustrate the space-selectivity of crosstalk, we calculated the proportion of total crosstalk energy that is caused by the $i$ largest crosstalkers of user $n$ on tone $k$. All users have identical transmit PSDs; hence, from the perspective of user $n$, crosstalker $m$ is said to be larger than crosstalker $q$ at tone $k$ if $|h_k^{(n,m)}| > |h_k^{(n,q)}|$. The result was averaged across all tones $k$ and every line $n$ within the binder. The measurements were done using the British Telecom cable, and the result is shown in Figure 2. As can be seen, on average, approximately 90% of …

Second, crosstalk channels vary significantly with frequency. So whilst a user may experience significant crosstalk on one tone, weak crosstalk may be experienced on other tones. We refer to this as the frequency-selectivity of crosstalk, which arises naturally from the frequency-dependent nature of electromagnetic coupling.
To illustrate the frequency-selectivity of crosstalk, we calculated the proportion of total crosstalk energy contained within the $i$ worst tones. From the perspective of user $n$ and crosstalker $m$, tone $k$ is said to be worse than tone $l$ if $|h_k^{(n,m)}| > |h_l^{(n,m)}|$. The result is shown in Figure 3. Approximately 90% of the crosstalk is contained within half of the tones.
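Both selectivity measures reduce to sorting crosstalk energies and taking the share held by the top $i$ entries. The following is a hypothetical Python sketch, assuming equal transmit PSDs and crosstalk gains stored as a nested list `G[n][m][k]` $= |h_k^{(n,m)}|^2$. The space measure averages over users and tones as described for Figure 2; averaging the frequency measure over all (user, crosstalker) pairs is an assumption, since the text does not say how Figure 3 was averaged. The toy data is invented and is not the British Telecom measurement.

```python
def space_selectivity(G, i):
    """Average share of crosstalk energy caused by the i largest crosstalkers.

    G[n][m][k] = |h_k^(n,m)|^2 (entries with m == n are ignored); averaged over
    every user n and tone k.
    """
    num_lines, num_tones = len(G), len(G[0][0])
    shares = []
    for n in range(num_lines):
        for k in range(num_tones):
            xt = sorted((G[n][m][k] for m in range(num_lines) if m != n), reverse=True)
            shares.append(sum(xt[:i]) / sum(xt))
    return sum(shares) / len(shares)


def frequency_selectivity(G, i):
    """Average share of crosstalk energy in the i worst tones of each (user, crosstalker) pair."""
    num_lines = len(G)
    shares = []
    for n in range(num_lines):
        for m in range(num_lines):
            if m != n:
                xt = sorted(G[n][m], reverse=True)
                shares.append(sum(xt[:i]) / sum(xt))
    return sum(shares) / len(shares)


# Invented toy binder: 3 lines, 4 tones. Each user has one near neighbour and one
# distant pair whose coupling is 10 dB weaker on every tone.
near = [6.0, 2.0, 1.0, 1.0]
far = [0.6, 0.2, 0.1, 0.1]
G = [[[0.0] * 4 if m == n else (near if m == (n + 1) % 3 else far) for m in range(3)]
     for n in range(3)]

print(space_selectivity(G, 1))      # about 0.909: the near neighbour dominates
print(frequency_selectivity(G, 2))  # about 0.8: the 2 worst of 4 tones hold 80%
```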
So the effects of crosstalk vary considerably with both space and frequency. Furthermore, the majority of its effects are contained within a relatively small subset of tones and crosstalkers. These observations suggest that we can achieve the majority of the performance gains of crosstalk cancellation by cancelling only the largest crosstalkers on each tone, an approach we refer to as partial crosstalk cancellation.
Some tones will see more significant crosstalkers than others, so we can scale between conventional SUD and full crosstalk cancellation on a tone-by-tone basis. On each tone, we choose the degree of crosstalk cancellation based on the severity of the crosstalk experienced. By cancelling only the largest crosstalkers and by varying the degree of crosstalk cancellation on each tone, partial crosstalk cancellation can approach the performance of full crosstalk cancellation with a fraction of the run-time complexity.

Partial crosstalk canceller structure
We now describe the design of partial crosstalk cancellation in more detail. In the detection of user $n$, we observe the direct line of user $n$ (to recover the signal) and $p_{k,n}$ additional lines (to enable crosstalk cancellation). $p_{k,n}$ varies with both the tone $k$ and the user $n$ to match the severity of crosstalk seen by that user on that tone. Note that $p_{k,n} = N$ corresponds to full crosstalk cancellation, whilst $p_{k,n} = 0$ corresponds to none (i.e., SUD). Define the set of extra observation lines

$$\mathcal{M}_k^n \triangleq \{m_{k,n}(1), \dots, m_{k,n}(p_{k,n})\} \tag{15}$$

and the corresponding received signals, i.e., the entries of $\mathbf{y}_k$ indexed by $\{n\}\cup\mathcal{M}_k^n$:

$$\bar{\mathbf{y}}_k^n \triangleq [\mathbf{y}_k]_{\text{rows } \{n\}\cup\mathcal{M}_k^n}.$$

We also define the set of lines which are not observed in the detection of user $n$ on tone $k$,

$$\check{\mathcal{M}}_k^n \triangleq \{1, \dots, N+1\} \setminus \left(\{n\}\cup\mathcal{M}_k^n\right),$$

where $\mathcal{A}\setminus\mathcal{B}$ denotes the elements contained in set $\mathcal{A}$ and not in set $\mathcal{B}$. We form an estimate of the transmitted symbol using a linear combination of the received signals on the observation lines only:

$$\hat{x}_k^n = \mathbf{w}_k^n\bar{\mathbf{y}}_k^n.$$

Note that crosstalk cancellation for user $n$ at tone $k$ now requires only $p_{k,n}$ multiplications per DMT block, in contrast to the $N$ multiplications required for full crosstalk cancellation. This technique has many similarities to hybrid selection/combining from the wireless field [11,12]. There, selection is also used between receive antennas to reduce run-time complexity and the number of analog front ends (AFEs) required.

Partial crosstalk canceller design
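We now describe the design of the partial cancellation coefficients $\mathbf{w}_k^n$. We begin with a reduced system model which only contains the signals observed in the detection of user $n$ at tone $k$:

$$\bar{\mathbf{y}}_k^n = \bar{\mathbf{H}}_k^n\bar{\mathbf{x}}_k^n + \check{\mathbf{H}}_k^n\check{\mathbf{x}}_k^n + \bar{\mathbf{z}}_k^n.$$

$\bar{\mathbf{x}}_k^n$ contains the signals transmitted onto the set of observed lines,

$$\bar{\mathbf{x}}_k^n \triangleq [\mathbf{x}_k]_{\text{rows } \{n\}\cup\mathcal{M}_k^n},$$

and $\bar{\mathbf{H}}_k^n$ contains the corresponding channels,

$$\bar{\mathbf{H}}_k^n \triangleq [\mathbf{H}_k]_{\text{rows } \{n\}\cup\mathcal{M}_k^n,\ \text{cols } \{n\}\cup\mathcal{M}_k^n},$$

where $[\mathbf{A}]_{\text{rows } \mathcal{A},\ \text{cols } \mathcal{B}}$ denotes the submatrix formed from the rows $\mathcal{A}$ and columns $\mathcal{B}$ of matrix $\mathbf{A}$. $\check{\mathbf{x}}_k^n$ contains the signals transmitted onto the set of nonobserved lines $\check{\mathcal{M}}_k^n$:

$$\check{\mathbf{x}}_k^n \triangleq [\mathbf{x}_k]_{\text{rows } \check{\mathcal{M}}_k^n},$$

and $\check{\mathbf{H}}_k^n$ contains the corresponding channels,

$$\check{\mathbf{H}}_k^n \triangleq [\mathbf{H}_k]_{\text{rows } \{n\}\cup\mathcal{M}_k^n,\ \text{cols } \check{\mathcal{M}}_k^n}.$$

$\bar{\mathbf{z}}_k^n$ contains the noise seen on the observed lines:

$$\bar{\mathbf{z}}_k^n \triangleq [\mathbf{z}_k]_{\text{rows } \{n\}\cup\mathcal{M}_k^n}.$$

We choose a ZF design, which was shown in Section 3.2 to be a near-optimal transmit/receive structure. The partial cancellation filter is designed to remove all crosstalk from crosstalkers in the set $\mathcal{M}_k^n$:

$$\mathbf{w}_k^n = \bar{\mathbf{e}}_n^H\left(\bar{\mathbf{H}}_k^n\right)^{-1},$$

where $\bar{\mathbf{e}}_n$ is the column of $\mathbf{I}_{p_{k,n}+1}$ corresponding to the position of line $n$ within $\{n\}\cup\mathcal{M}_k^n$. Hence

$$\hat{x}_k^n = x_k^n + \mathbf{w}_k^n\check{\mathbf{H}}_k^n\check{\mathbf{x}}_k^n + \mathbf{w}_k^n\bar{\mathbf{z}}_k^n.$$

The first term is the transmitted signal, whilst the second and third terms are the residual crosstalk and filtered noise, respectively.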
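A minimal NumPy sketch of the partial ZF canceller for one user on one tone, assuming a hypothetical 4-line channel. It forms the observation set $\{n\}\cup\mathcal{M}_k^n$, computes $\mathbf{w}_k^n = \bar{\mathbf{e}}_n^H(\bar{\mathbf{H}}_k^n)^{-1}$, and estimates the symbol from the observed received signals only. The helper name `partial_zf_weights` is hypothetical.

```python
import numpy as np


def partial_zf_weights(H, n, extra):
    """Hypothetical helper: partial ZF coefficients for user n on one tone.

    H is the (N+1) x (N+1) channel matrix H_k, n the user being detected and
    extra the set M_k^n of additional observed lines. Returns the sorted
    observation indices {n} U M_k^n and the row vector w_k^n = e^H (H_bar)^-1,
    which has p_{k,n} + 1 entries.
    """
    obs = sorted({n} | set(extra))
    H_bar = H[np.ix_(obs, obs)]      # channels between the observed lines
    e = np.zeros(len(obs))
    e[obs.index(n)] = 1.0            # picks line n's position within obs
    return obs, e @ np.linalg.inv(H_bar)


# Hypothetical 4-line channel on one tone (column-wise diagonal dominant).
H = np.array([[1.00, 0.05, 0.04, 0.03],
              [0.02, 0.90, 0.05, 0.02],
              [0.03, 0.04, 0.80, 0.05],
              [0.01, 0.02, 0.03, 0.95]])

obs, w = partial_zf_weights(H, n=0, extra={1, 2})   # line 3 is not observed

x = np.array([1.0, -1.0, 1.0, 0.0])    # line 3 silent, noise ignored
x_hat = w @ (H @ x)[obs]               # estimate uses y_bar_k^n only
print(np.isclose(x_hat, x[0]))         # True: crosstalk from lines 1 and 2 removed

x[3] = 1.0                             # now line 3 transmits as well
residual = w @ (H @ x)[obs] - x[0]     # equals w_k^n H_check x_check
print(residual, H[0, 3] / H[0, 0])     # residual coefficient vs h^(0,3)/h^(0,0)
```

With line 3 silent, crosstalk from the observed lines is removed exactly; once line 3 transmits, a residual term remains whose coefficient is slightly below $h_k^{(0,3)}/h_k^{(0,0)} = 0.03$, consistent with the residual-interference approximation derived in the next section.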

LINE SELECTION
In DSL, the majority of the crosstalk that a particular user experiences comes from only a few of the other users within the system. We have referred to this effect as the space-selectivity of the crosstalk channel, and we exploit it to reduce the complexity of crosstalk cancellation. In practice, this corresponds to observing only the subset $\mathcal{M}_k^n$ of the lines at the CO, in addition to line $n$ itself, when detecting user $n$.
In this section, we investigate the optimal choice for the subset $\mathcal{M}_k^n$. Our problem is thus

$$\max_{\mathcal{M}_k^n \subseteq \{1,\dots,N+1\}\setminus\{n\}}\ c_k^n \quad \text{s.t.} \quad |\mathcal{M}_k^n| = p_{k,n}, \tag{27}$$

where $|\mathcal{A}|$ denotes the cardinality of set $\mathcal{A}$ and $c_k^n$ is the rate of user $n$ on tone $k$.

Residual interference
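Column-wise diagonal dominance in $\mathbf{H}_k$ implies the same in $\bar{\mathbf{H}}_k^n$. Hence we can use the decomposition defined in Theorem 1,

$$\bar{\mathbf{H}}_k^n = \bar{\mathbf{Q}}_k^n\bar{\boldsymbol{\Sigma}}_k^n,$$

where $\bar{\mathbf{Q}}_k^n$ is unitary and $\bar{\boldsymbol{\Sigma}}_k^n$ is strictly diagonal dominant. Hence

$$\mathbf{w}_k^n = \bar{\mathbf{e}}_n^H\left(\bar{\boldsymbol{\Sigma}}_k^n\right)^{-1}\left(\bar{\mathbf{Q}}_k^n\right)^H.$$

We define ρ … Now, since the diagonal elements of $\bar{\boldsymbol{\Sigma}}_k^n$ are positive, taking the norm of both sides of (30) yields …, where we use the column-wise diagonal dominance of $\bar{\mathbf{H}}_k^n$ and the observation … . Hence … . From (30), … . Thus we find … . Using (4), we can make the approximation …; hence the residual interference is

$$\mathbf{w}_k^n\check{\mathbf{H}}_k^n\check{\mathbf{x}}_k^n \approx \sum_{m\in\check{\mathcal{M}}_k^n}\frac{h_k^{(n,m)}}{h_k^{(n,n)}}\,x_k^m.$$

The power of the residual interference is thus

$$E\left\{\left|\mathbf{w}_k^n\check{\mathbf{H}}_k^n\check{\mathbf{x}}_k^n\right|^2\right\} \approx \sum_{m\in\check{\mathcal{M}}_k^n}\frac{|h_k^{(n,m)}|^2 s_k^m}{|h_k^{(n,n)}|^2}, \tag{37}$$

where $s_k^m \triangleq [\mathbf{S}_k]_{m,m}$ is the transmit power of user $m$ on tone $k$, and we use the fact that $\mathbf{S}_k$ is diagonal.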

Filtered noise
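Using (4) and (33), we can make the approximation

$$\mathbf{w}_k^n\bar{\mathbf{z}}_k^n \approx \frac{z_k^n}{h_k^{(n,n)}}.$$

The power of the filtered noise is thus

$$E\left\{\left|\mathbf{w}_k^n\bar{\mathbf{z}}_k^n\right|^2\right\} \approx \frac{\sigma_k^2}{|h_k^{(n,n)}|^2}.$$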

SINR after partial crosstalk cancellation
After crosstalk cancellation, we have the following estimate of the transmitted signal:

$$\hat{x}_k^n = x_k^n + \mathbf{w}_k^n\check{\mathbf{H}}_k^n\check{\mathbf{x}}_k^n + \mathbf{w}_k^n\bar{\mathbf{z}}_k^n.$$

The signal-to-interference-plus-noise ratio (SINR) at the input of the decision device is thus

$$\mathrm{SINR}_k^n \approx \frac{|h_k^{(n,n)}|^2 s_k^n}{\sum_{m\in\check{\mathcal{M}}_k^n}|h_k^{(n,m)}|^2 s_k^m + \sigma_k^2}, \tag{41}$$

where $s_k^m \triangleq [\mathbf{S}_k]_{m,m}$, with the approximation becoming exact in strongly column-wise diagonal dominant channels. There are two interesting observations to make at this point. First, as expected, the ZF crosstalk canceller perfectly removes the crosstalk caused by the modems in the set $\mathcal{M}_k^n$. Second, and more surprisingly, the ZF crosstalk canceller does not change the statistics of the crosstalk caused by the modems in $\check{\mathcal{M}}_k^n$ (i.e., outside $\{n\}\cup\mathcal{M}_k^n$), nor does it change the statistics of the noise. So the column-wise diagonal dominance of $\mathbf{H}_k$ ensures that a ZF partial crosstalk canceller will not enhance the noise or the crosstalk caused by modems outside $\mathcal{M}_k^n$.
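To get a feel for the accuracy of (41), the following NumPy sketch compares it with the SINR computed from the exact partial ZF filter, for a hypothetical 5-line channel with weak crosstalk and equal transmit powers. Function names and channel values are assumptions. For SUD (no extra observed lines) the two expressions coincide; with partial cancellation they differ by a few percent here, and the difference shrinks as crosstalk becomes weaker relative to the direct channels.

```python
import numpy as np


def sinr_exact(H, n, extra, s, sigma2):
    """SINR of the partial ZF estimate, evaluated with the exact filter w_k^n."""
    obs = sorted({n} | set(extra))
    unobs = [m for m in range(H.shape[0]) if m not in obs]
    e = np.zeros(len(obs))
    e[obs.index(n)] = 1.0
    w = e @ np.linalg.inv(H[np.ix_(obs, obs)])
    residual = np.abs(w @ H[np.ix_(obs, unobs)]) ** 2 @ s[unobs]   # residual crosstalk
    noise = sigma2 * np.sum(np.abs(w) ** 2)                        # filtered noise
    return s[n] / (residual + noise)


def sinr_approx(H, n, extra, s, sigma2):
    """Approximation (41): only unobserved crosstalkers and the noise remain."""
    unobs = [m for m in range(H.shape[0]) if m != n and m not in extra]
    crosstalk = sum(abs(H[n, m]) ** 2 * s[m] for m in unobs)
    return abs(H[n, n]) ** 2 * s[n] / (crosstalk + sigma2)


# Hypothetical 5-line channel on one tone with weak crosstalk, equal powers.
H = np.array([[1.000, 0.010, 0.008, 0.006, 0.004],
              [0.009, 0.900, 0.010, 0.007, 0.005],
              [0.007, 0.009, 0.950, 0.010, 0.006],
              [0.005, 0.006, 0.008, 0.850, 0.010],
              [0.004, 0.005, 0.006, 0.009, 0.800]])
s = np.ones(5)
sigma2 = 1e-5

for extra in (set(), {1, 2}):
    print(extra, sinr_exact(H, 0, extra, s, sigma2), sinr_approx(H, 0, extra, s, sigma2))
```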

Line selection algorithm
Maximizing $\mathrm{SINR}_k^n$, and thus the rate $c_k^n$, corresponds to minimizing the amount of interference from the set $\check{\mathcal{M}}_k^n$ of nonobserved lines. Note that we assume a sufficient number of noise sources and crosstalkers such that the background noise and residual interference are approximately Gaussian. So, to maximize the rate $c_k^n$, we simply choose $\mathcal{M}_k^n$ to contain the largest crosstalkers of user $n$ on tone $k$. Define the indices of the crosstalkers of user $n$ on tone $k$, sorted in order of crosstalk strength, as $q_{k,n}(1), \dots, q_{k,n}(N)$, where

$$|h_k^{(n,q_{k,n}(1))}|^2 s_k^{q_{k,n}(1)} \geq |h_k^{(n,q_{k,n}(2))}|^2 s_k^{q_{k,n}(2)} \geq \dots \geq |h_k^{(n,q_{k,n}(N))}|^2 s_k^{q_{k,n}(N)}$$

and $s_k^m \triangleq [\mathbf{S}_k]_{m,m}$.

Remark 1 (optimal line selection). In column-wise diagonal dominant channels, the set $\mathcal{M}_k^n$ which maximizes the rate of user $n$ on tone $k$ subject to a complexity constraint of $p_{k,n}$ multiplications/DMT block (see the optimization in (27)) is

$$\mathcal{M}_k^n = \{q_{k,n}(1), \dots, q_{k,n}(p_{k,n})\}. \tag{43}$$

Proof. Follows from examination of (41).
At this point, we can propose a simple approach to partial crosstalk cancellation.

Algorithm 1. Assume we operate under a complexity limit of $cK$ multiplications/DMT block/user,

$$\sum_k p_{k,n} \leq cK.$$

This corresponds to $c$ times the complexity of a conventional frequency-domain equalizer (FEQ) as currently implemented in VDSL modems. In this algorithm, we simply cancel the $c$ largest crosstalkers on each tone; hence

$$\mathcal{M}_k^n = \{q_{k,n}(1), \dots, q_{k,n}(c)\}, \quad \forall k.$$

The reduction in run-time complexity from this algorithm comes from space-selectivity only. Since the degree of partial cancellation stays constant across all tones, this algorithm cannot exploit the frequency-selectivity of the crosstalk channel. As we will see, this leads to suboptimal performance when compared with algorithms which exploit both space- and frequency-selectivity. The advantage of this algorithm is its simplicity. The algorithm requires only $O(KN)$ multiplications and $K$ sorting operations of $N$ values to initialize the partial crosstalk canceller for one user. Here we define initialization complexity as the complexity of determining $\mathcal{M}_k^n$ for all $k$. Initialization complexity does not include the actual calculation of the crosstalk cancellation coefficients $\mathbf{w}_k^n$ for each tone. This requires $O\left(\sum_k (p_{k,n}+1)^3\right)$ multiplications for user $n$, regardless of the partial cancellation algorithm employed. We assume that the direct and crosstalk channel gains $|h_k^{(n,m)}|^2$ for all $n, m, k$ are available and do not need to be calculated.
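Algorithm 1 reduces to one sort per tone. The sketch below is a hypothetical Python rendering for a single user $n$, assuming equal transmit PSDs so that crosstalkers can be ranked by $|h_k^{(n,m)}|^2$ alone; `xt_gains[k]` maps a crosstalker index $m$ to that gain on tone $k$, and the values are invented.

```python
def algorithm1(xt_gains, c):
    """Cancel the c largest crosstalkers on every tone (space-selectivity only).

    xt_gains[k] maps crosstalker index m to |h_k^(n,m)|^2 for the user n being
    detected; equal transmit PSDs are assumed, so these gains rank the crosstalkers.
    Returns M_k^n for every tone k; the total cost is c*K multiplications/DMT block.
    """
    selection = []
    for gains in xt_gains:
        ranked = sorted(gains, key=gains.get, reverse=True)   # q_{k,n}(1), q_{k,n}(2), ...
        selection.append(set(ranked[:c]))
    return selection


# Invented gains for one user with crosstalkers 1, 2, 3 on three tones.
xt_gains = [{1: 0.5, 2: 0.1, 3: 0.2},
            {1: 0.05, 2: 0.3, 3: 0.01},
            {1: 0.2, 2: 0.21, 3: 0.9}]
print(algorithm1(xt_gains, c=1))   # [{1}, {2}, {3}]
print(algorithm1(xt_gains, c=2))   # [{1, 3}, {1, 2}, {2, 3}]
```

Every tone receives exactly $c$ cancelled crosstalkers, regardless of how much crosstalk that tone actually suffers, which is why this scheme cannot exploit frequency-selectivity.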
The initialization complexity (in terms of multiplications and logarithm operations per user) of the different partial cancellation algorithms is listed in Table 1. The required number of sort operations of each size is listed in Table 2. All algorithms have equal run-time complexity.

TONE SELECTION
In the previous section, we presented Algorithm 1 for partial crosstalk cancellation. This algorithm exploits the space-selectivity of the crosstalk channel, that is, the fact that crosstalk varies significantly between different lines. Crosstalk coupling also varies significantly with frequency and this can also be exploited to reduce run-time complexity.
At low frequencies, crosstalk coupling is minimal, so we would expect minimal gains from crosstalk cancellation. At higher frequencies, on the other hand, crosstalk coupling can be severe. However, at high frequencies the direct channel attenuation is also high, so the channel can only support minimal bit-loading even in the absence of crosstalk. This limits the potential gains of crosstalk cancellation. The largest gains from crosstalk cancellation will be experienced at intermediate frequencies, and this is where most of the run-time complexity should be allocated. Define the rate achieved by user $n$ on tone $k$ when the $p_{k,n}$ largest crosstalkers are cancelled as

$$r_{k,n}(p_{k,n}) \triangleq \log_2\left(1 + \frac{1}{\Gamma}\frac{|h_k^{(n,n)}|^2 s_k^n}{\sum_{i=p_{k,n}+1}^{N}|h_k^{(n,q_{k,n}(i))}|^2 s_k^{q_{k,n}(i)} + \sigma_k^2}\right), \tag{46}$$

where $s_k^m \triangleq [\mathbf{S}_k]_{m,m}$, and define the gain of full crosstalk cancellation on tone $k$ as $g_{k,n} \triangleq r_{k,n}(N) - r_{k,n}(0)$. Note that by operating on a logarithmic scale, $g_{k,n}$ can be calculated by dividing the arguments of the logarithms in $r_{k,n}(N)$ and $r_{k,n}(0)$. We can now define another partial crosstalk cancellation algorithm.

Algorithm 2. This algorithm simply employs full crosstalk cancellation on the $cK/N$ tones with the largest gain $g_{k,n}$ and no cancellation on all other tones. This leads to a run-time complexity of $cK$ multiplications/DMT block/user.
Note that in this algorithm, $p_{k,n}$ is restricted to take only the values 0 or $N$. As a result, it is not possible to cancel only the largest crosstalkers, and this algorithm cannot exploit space-selectivity. The initialization complexity of this algorithm is $O(KN)$ multiplications and one sort of size $K$, per user.
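A hypothetical Python sketch of Algorithm 2 for one user, assuming equal transmit PSDs on all lines and invented per-tone gains for four tones and two crosstalkers. It evaluates $r_{k,n}(0)$ and $r_{k,n}(N)$ from (46), ranks tones by $g_{k,n}$, and applies full cancellation on the $\lfloor cK/N \rfloor$ best tones. In the toy data, tone 0 (low frequency) has little crosstalk and tone 3 (high frequency) is heavily attenuated, so the complexity goes to the intermediate tones.

```python
import math


def rate_after_cancelling(direct_gain, xt_gains, p, psd, sigma2, gap):
    """r_{k,n}(p) from (46) on one tone, assuming every line uses the same PSD value.

    direct_gain = |h_k^(n,n)|^2 and xt_gains lists |h_k^(n,m)|^2 over the N crosstalkers.
    """
    uncancelled = sorted(xt_gains, reverse=True)[p:]   # q_{k,n}(p+1), ..., q_{k,n}(N)
    sinr = direct_gain * psd / (sum(uncancelled) * psd + sigma2)
    return math.log2(1 + sinr / gap)


def algorithm2(direct, xt, c, psd, sigma2, gap):
    """Return p_{k,n} per tone: N on the floor(cK/N) tones with largest g_{k,n}, else 0."""
    K, N = len(direct), len(xt[0])
    gain = [rate_after_cancelling(direct[k], xt[k], N, psd, sigma2, gap)
            - rate_after_cancelling(direct[k], xt[k], 0, psd, sigma2, gap)
            for k in range(K)]
    n_full = int(c * K // N)
    chosen = set(sorted(range(K), key=lambda k: gain[k], reverse=True)[:n_full])
    return [N if k in chosen else 0 for k in range(K)]


# Invented example: 4 tones from low (tone 0) to high frequency (tone 3), 2 crosstalkers.
direct = [1.0, 0.5, 0.1, 0.001]            # direct channel gains fall with frequency
xt = [[1e-4, 1e-4], [0.05, 0.02], [0.02, 0.01], [1e-5, 1e-5]]
print(algorithm2(direct, xt, c=1, psd=1.0, sigma2=1e-3, gap=1.0))   # [0, 2, 2, 0]
```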

JOINT TONE-LINE SELECTION
In Sections 6 and 7, we described partial cancellation algorithms which exploit only one form of selectivity in the crosstalk channel. To achieve the maximum reduction in run-time complexity, it is necessary to exploit both space- and frequency-selectivity. We should adapt the degree of crosstalk cancellation done on each tone, $p_{k,n}$, to match the potential gains. In practice, this means that we allow $p_{k,n}$ to take on values other than 0 and $N$, whilst also allowing $p_{k,n}$ to vary from tone to tone.

Simple joint tone-line selection
As we saw in Section 6.3, observing the direct line of a crosstalker allows us to remove the crosstalk it causes to the user being detected. Hence line selection is equivalent to choosing which subset of crosstalkers we desire to cancel. When combined with tone selection, our problem is effectively to choose which (crosstalker, tone) pairs to cancel in the detection of a certain user.
The rate improvement from cancelling a particular crosstalker on a particular tone is dependent on the other crosstalkers that will be cancelled on that tone. As such, there is an inherent coupling in crosstalker selection which greatly complicates matters [13]. In this algorithm, we remove this coupling by ignoring the effect of other crosstalkers in the system. This greatly simplifies (crosstalker, tone) pair selection with only a small performance penalty, as will be demonstrated in Section 9.
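Define the gain of cancelling crosstalker $m$ on tone $k$ in the detection of user $n$, in the absence of all other crosstalkers, as

$$g_{k,n}(m) \triangleq \log_2\left(1 + \frac{1}{\Gamma}\frac{|h_k^{(n,n)}|^2 s_k^n}{\sigma_k^2}\right) - \log_2\left(1 + \frac{1}{\Gamma}\frac{|h_k^{(n,n)}|^2 s_k^n}{|h_k^{(n,m)}|^2 s_k^m + \sigma_k^2}\right),$$

where $s_k^m \triangleq [\mathbf{S}_k]_{m,m}$. Note that if we work on a logarithmic scale, then $g_{k,n}(m)$ can be calculated by simply dividing the arguments of each log function. Define the (crosstalker, tone) pair $d_n(i) \triangleq (m_n(i), k_n(i))$ and its corresponding gain $g_n(d_n(i)) \triangleq g_{k_n(i),n}(m_n(i))$. This allows us to define the indices of (crosstalker, tone) pairs ordered by gain, such that $g_n(d_n(1)) \geq g_n(d_n(2)) \geq \dots \geq g_n(d_n(KN))$.

Algorithm 3. This algorithm cancels the $cK$ (crosstalker, tone) pairs with the largest gains, that is,

$$\mathcal{M}_k^n = \{m_n(i) : k_n(i) = k,\ i = 1, \dots, cK\}.$$

This leads to a run-time complexity of $cK$ multiplications/DMT block/user. The benefit of this algorithm is its low complexity. Pair selection for one user has a complexity of $O(KN)$ multiplications and one sort of size $KN$. Furthermore, this algorithm exploits both the space- and frequency-selectivity of the crosstalk channel, allowing it to cancel the largest crosstalkers on the tones where they do the most harm. In Section 9, we will see that this algorithm leads to near-optimal performance.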
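The pair selection can be sketched in a few lines of Python. This is a hypothetical rendering for one user, assuming equal transmit PSDs and the same invented gains as a four-tone, two-crosstalker toy example; crosstalkers are indexed by their position in `xt[k]`. Each pair's gain is computed with all other crosstalkers ignored, and the $cK$ best pairs are kept.

```python
import math


def pair_gain(direct_gain, xt_gain, psd, sigma2, gap):
    """g_{k,n}(m): gain from cancelling one crosstalker when no others are present."""
    snr_clean = direct_gain * psd / (gap * sigma2)
    snr_with_xt = direct_gain * psd / (gap * (xt_gain * psd + sigma2))
    return math.log2((1 + snr_clean) / (1 + snr_with_xt))   # ratio of log arguments


def algorithm3(direct, xt, c, psd, sigma2, gap):
    """Return the set M_k^n of cancelled crosstalkers for each tone k."""
    K, N = len(direct), len(xt[0])
    pairs = [(pair_gain(direct[k], xt[k][m], psd, sigma2, gap), m, k)
             for k in range(K) for m in range(N)]
    pairs.sort(reverse=True)                 # (crosstalker, tone) pairs by decreasing gain
    selected = [set() for _ in range(K)]
    for _, m, k in pairs[:int(c * K)]:       # budget of cK multiplications/DMT block
        selected[k].add(m)
    return selected


direct = [1.0, 0.5, 0.1, 0.001]
xt = [[1e-4, 1e-4], [0.05, 0.02], [0.02, 0.01], [1e-5, 1e-5]]
print(algorithm3(direct, xt, c=1, psd=1.0, sigma2=1e-3, gap=1.0))
print(algorithm3(direct, xt, c=0.75, psd=1.0, sigma2=1e-3, gap=1.0))
```

With $c = 0.75$ the budget of three multiplications is spent on both crosstalkers of tone 1 but only the stronger crosstalker of tone 2, an allocation Algorithm 2 cannot produce.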

Optimum joint tone-line selection
It is interesting to evaluate the suboptimality of the algorithms described so far through an upper bound achieved by a truly optimal partial cancellation algorithm. The problem of partial cancellation is effectively a resource allocation problem. Given $cK$ multiplications per user, we need to distribute these across tones such that the largest rate is achieved:

$$\max_{\{\mathcal{M}_k^n\}_k}\ \sum_k c_k^n \quad \text{s.t.} \quad \sum_k |\mathcal{M}_k^n| \leq cK.$$

Algorithm 4: Optimal tone-line selection.

    Initialize v_{k,n}(p) = (r_{k,n}(p) − r_{k,n}(0))/p,  ∀k, p > 0
    Repeat
        (k_s, p_s) = arg max_{(k,p)} v_{k,n}(p)
        M^n_{k_s} = {q_{k_s,n}(1), ..., q_{k_s,n}(p_s)}
        v_{k_s,n}(p) = 0,  p = 1, ..., p_s
        v_{k_s,n}(p) = (r_{k_s,n}(p) − r_{k_s,n}(p_s))/(p − p_s),  p = p_s + 1, ..., N
    While Σ_k |M^n_k| < cK
Since the channel is column-wise diagonal dominant, Remark 1 allows us to determine, in a simple fashion, the best set of lines to observe in the detection of user $n$. Hence our problem simplifies to

$$\max_{\{p_{k,n}\}_k}\ \sum_k r_{k,n}(p_{k,n}) \quad \text{s.t.} \quad \sum_k p_{k,n} \leq cK,\quad p_{k,n}\in\{0,\dots,N\}.$$

An exhaustive search could require us to evaluate up to $N^K$ different allocations. In VDSL, $K = 4096$, which makes any such search numerically intractable. Due to the structure of the problem, it is possible to devise a greedy algorithm, Algorithm 4, which iteratively finds the optimal allocation for some values of $c$. The algorithm cannot find a solution for an arbitrary value of $c$; however, the values of $c$ generated by the algorithm are so closely spaced that this is not a practical problem. Define the value of cancelling $p$ crosstalkers on tone $k$ as

$$v_{k,n}(p) = \frac{r_{k,n}(p) - r_{k,n}(0)}{p}. \tag{54}$$

Recall that $r_{k,n}(p)$ is the rate achieved by user $n$ on tone $k$ when the $p$ largest crosstalkers are cancelled, and is evaluated using (46). Value is the increase in rate (benefit) divided by the increase in run-time complexity (cost): it measures the increase in bit rate per multiplication when $p$ multiplications are spent on tone $k$. The algorithm begins by initializing $v_{k,n}(p)$ for all values of $p$ and $k$. It then proceeds as follows:

(1) Find the choice of tone $k$ and number of cancelled crosstalkers $p$ with the largest value $v_{k,n}(p)$. Store this in $(k_s, p_s)$.
(2) Set $\mathcal{M}_{k_s}^n = \{q_{k_s,n}(1), \dots, q_{k_s,n}(p_s)\}$, i.e., cancel the $p_s$ largest crosstalkers on tone $k_s$.
(3) Set $v_{k_s,n}(p) = 0$ for $p = 1, \dots, p_s$, since these allocations have already been made.
(4) Update the value of cancelling additional crosstalkers on tone $k_s$: $v_{k_s,n}(p) = (r_{k_s,n}(p) - r_{k_s,n}(p_s))/(p - p_s)$ for $p = p_s+1, \dots, N$.

The algorithm iterates through steps (1)-(4) until the allocated complexity reaches or exceeds $cK$. This yields an upper bound on the partial crosstalk cancellation performance for a given complexity. Since the algorithm allocates at most $N$ multiplications in each iteration, the total allocated complexity will be at most $cK + N$. With $K = 4096$, typically $cK \gg N$. Hence the difference between the desired run-time complexity and that of the solution provided by the algorithm is minimal, and the upper bound is thus tight.
Like Algorithm 3, this algorithm can exploit both the space- and frequency-selectivity of crosstalk to reduce run-time complexity. At the end of each iteration, this algorithm generates a resource allocation which is optimal. That is, of all the resource allocations of equal run-time complexity, the one generated by this algorithm achieves the highest rate. Unfortunately, this algorithm is considerably more complex than Algorithm 3. Pair selection for a single user requires $O(KN^2)$ multiplications and $O(KN)$ logarithm operations. It is hard to define the exact sorting complexity, since it varies significantly with the scenario. Sorting complexity is typically much higher than for any of the other algorithms and can require up to $KN$ sort operations, which can have sizes as large as $KN$.
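A hypothetical Python sketch of the greedy value-based allocation of Algorithm 4 for one user, assuming equal transmit PSDs and invented gains for four tones and two crosstalkers. It follows steps (1)-(4) above; the early exit when no option has positive value is a safeguard added here, not part of the original pseudocode, to avoid looping when no further rate can be gained.

```python
import math


def rates_per_tone(direct_gain, xt_gains, psd, sigma2, gap):
    """[r_{k,n}(0), ..., r_{k,n}(N)] from (46) for one tone, equal PSDs assumed."""
    ranked = sorted(xt_gains, reverse=True)
    return [math.log2(1 + direct_gain * psd / (gap * (sum(ranked[p:]) * psd + sigma2)))
            for p in range(len(ranked) + 1)]


def algorithm4(direct, xt, c, psd, sigma2, gap):
    """Greedy value-based allocation; returns p_{k,n} = |M_k^n| for every tone."""
    K, N = len(direct), len(xt[0])
    r = [rates_per_tone(direct[k], xt[k], psd, sigma2, gap) for k in range(K)]
    p_alloc = [0] * K
    # v[k][p]: value (rate gain per multiplication) of cancelling p crosstalkers, see (54)
    v = [[0.0] + [(r[k][p] - r[k][0]) / p for p in range(1, N + 1)] for k in range(K)]
    while sum(p_alloc) < c * K:
        best, ks, ps = max((v[k][p], k, p) for k in range(K) for p in range(1, N + 1))
        if best <= 0.0:
            break          # safeguard (not in the original pseudocode): nothing left to gain
        p_alloc[ks] = ps                                   # M_ks = {q(1), ..., q(ps)}
        for p in range(1, ps + 1):
            v[ks][p] = 0.0                                 # options already taken
        for p in range(ps + 1, N + 1):
            v[ks][p] = (r[ks][p] - r[ks][ps]) / (p - ps)   # value of going further
    return p_alloc


direct = [1.0, 0.5, 0.1, 0.001]
xt = [[1e-4, 1e-4], [0.05, 0.02], [0.02, 0.01], [1e-5, 1e-5]]
print(algorithm4(direct, xt, c=1, psd=1.0, sigma2=1e-3, gap=1.0))    # [0, 2, 2, 0]
print(algorithm4(direct, xt, c=0.5, psd=1.0, sigma2=1e-3, gap=1.0))  # [0, 2, 0, 0]
```

Because each iteration may jump straight to full cancellation on a tone, the allocated complexity can overshoot $cK$ by up to $N$ multiplications, as noted above.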

Complexity distribution between users
So far we have limited the run-time complexity of detecting each user to $cK$, such that

$$\sum_k |\mathcal{M}_k^n| \leq cK, \quad \forall n.$$

If crosstalk cancellation of all lines in a binder is integrated into a single processing module at the CO, then multiplications can be shared between users. That is, the true constraint is on the total complexity of crosstalk cancellation for all users:

$$\sum_n\sum_k |\mathcal{M}_k^n| \leq cK(N+1).$$

The available complexity can be divided between users based on our desired rates for each. Denote the number of multiplications/DMT block allocated to user $n$ as $\kappa_n$; then

$$\kappa_n = \mu_n\, cK(N+1) \quad \text{s.t.} \quad \sum_n \mu_n = 1.$$
Here µ n is a parameter which determines the proportion of computing resources allocated to user n. This allows us to view partial cancellation as a resource allocation problem not just across tones but across users as well. Given a fixed number of multiplications, we must divide them between users based on the desired rate of each user. In a similar fashion to work done in multiuser power allocation (see, e.g., [14,15]), we can define a rate region as the set of all achievable rate tuples under a given total complexity constraint. This allows us to visualise the different trade-offs that can be achieved between the rates of different users inside a binder. Limiting crosstalk cancellation on each tone to the users who benefit the most leads to further reductions in run-time complexity with minimal performance loss. This is demonstrated in Section 9.2.

PERFORMANCE
We now compare the performance of the partial crosstalk cancellation algorithms described in Sections 6, 7, and 8.
Performance is compared over a range of scenarios with crosstalk channels which exhibit both space- and frequency-selectivity. As we show, the ability to exploit both space- and frequency-selectivity is essential for achieving low run-time complexity in all scenarios. We use semiempirical transfer functions from the ETSI VDSL standards [9]. Note that in these channel models, each user sees identical crosstalk channels from all crosstalkers of equal line length. That is, the variation of crosstalk channel attenuation with the distance between lines within the binder is not modelled. When a binder consists of lines of varying length, the model does capture the near-far effect: all users will see the modems located closest to the CO (near-end) as the largest sources of crosstalk. On the other hand, when a binder consists of lines of equal length, all users will see equal crosstalk from all other users, so there will be no space-selectivity in the crosstalk channel model.
In reality, we would expect more space-selectivity than is contained within these channel models. Hence we can expect the reduction in run-time complexity to be even larger than that shown here. The number of lines in the binder is always 8, so N = 7. Other simulation parameters are listed in Table 3.

Equidistant lines (8 × 1000 m)
In the first scenario, the binder contains 8 × 1000 m lines. Since the lines are of equal length, the crosstalk channels exhibit frequency-selectivity only; no space-selectivity is present. Figure 5 shows the rate achieved by each algorithm versus run-time complexity. Complexity is shown as a percentage relative to full crosstalk cancellation (c = N).
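Since full crosstalk cancellation corresponds to $c = N$, the complexity percentages quoted in this section follow from the ratio $c/N$. The small helper below (a hypothetical name, assuming run-time cost scales linearly with $c$) reproduces them for $N = 7$.

```python
def relative_complexity(c, N):
    """Run-time complexity of partial cancellation as a percentage of full cancellation.

    c: average number of crosstalkers cancelled per tone for a user.
    N: number of crosstalkers per user (full cancellation is c = N).
    Assumes run-time cost scales linearly with c.
    """
    if not 0 <= c <= N:
        raise ValueError("c must lie in [0, N]")
    return 100.0 * c / N


# 8 lines in the binder, so N = 7 crosstalkers per user.
for c in (2, 3, 7):
    print(c, round(relative_complexity(c, 7), 1))
```

This prints 28.6, 42.9 and 100.0 for $c = 2, 3, 7$, i.e. the 29% and 43% operating points discussed in this section.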
Algorithm 1 can only exploit space-selectivity. There is no space-selectivity in this scenario, so this algorithm gives extremely poor performance. Worse still, the rate versus run-time complexity curve is nonconvex, so partial crosstalk cancellation with Algorithm 1 performs worse than time sharing. In other words, we could do full crosstalk cancellation for some fraction of the time and none for the rest, and this would lead to better performance than Algorithm 1 with the same run-time complexity. The reason for this is as follows: as we increase the number of crosstalkers cancelled $p_{k,n}$, the SIR gain from each additional cancelled crosstalker grows rapidly, where SIR denotes the signal-to-interference ratio.
We illustrate this with the following example. Consider a binder with 7 crosstalkers. We assume that the crosstalkers all have identical crosstalk channels $\chi_k^{(n)}$ to user $n$, as is the case in our simulation. Cancelling the first crosstalker causes the SIR to increase from $(1/7)|h_k^{(n,n)}|^2/|\chi_k^{(n)}|^2$ to $(1/6)|h_k^{(n,n)}|^2/|\chi_k^{(n)}|^2$, a gain of only $(1/42)|h_k^{(n,n)}|^2/|\chi_k^{(n)}|^2$. Cancelling the sixth crosstalker gives a much larger SIR increase, from $(1/2)|h_k^{(n,n)}|^2/|\chi_k^{(n)}|^2$ to $|h_k^{(n,n)}|^2/|\chi_k^{(n)}|^2$. In general, cancelling the $p$th crosstalker (for $p < N$) leads to an SIR increase of
$$\left(\frac{1}{N-p} - \frac{1}{N-p+1}\right)\frac{|h_k^{(n,n)}|^2}{|\chi_k^{(n)}|^2} = \frac{1}{(N-p)(N-p+1)}\,\frac{|h_k^{(n,n)}|^2}{|\chi_k^{(n)}|^2}.$$
So the increase in SIR grows rapidly with $p$ as $p \to N$. Recall that $c_k^n = \log(1 + \mathrm{SINR}_k^n) \approx \mathrm{SINR}_k^n$ for low $\mathrm{SINR}_k^n$. So when crosstalkers have equal strength and the SINR is low, the data-rate gain grows rapidly with the number of crosstalkers cancelled $p$. This is why cancelling $N$ crosstalkers typically gives more than $N$ times the data-rate gain of cancelling one crosstalker. This leads to the nonconvex rate-complexity curve of Figure 5.
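The SIR increments in this example can be checked with exact rational arithmetic. The sketch below assumes, as in the text, that all $N$ crosstalkers have identical channels and neglects noise; SIR is expressed in units of $|h_k^{(n,n)}|^2/|\chi_k^{(n)}|^2$, and the function names are illustrative.

```python
from fractions import Fraction


def sir(N, p):
    """SIR (in units of |h|^2 / |chi|^2) with p of N equal crosstalkers cancelled.

    Noise is neglected; p must stay below N because cancelling every
    crosstalker would leave no interference at all.
    """
    if not 0 <= p < N:
        raise ValueError("need 0 <= p < N")
    return Fraction(1, N - p)


def sir_increase(N, p):
    """SIR gain from cancelling the p-th crosstalker (1 <= p < N)."""
    return sir(N, p) - sir(N, p - 1)


N = 7
for p in range(1, N):
    print(p, sir_increase(N, p))
```

For $N = 7$ this prints the increments 1/42, 1/30, 1/20, 1/12, 1/6 and 1/2, matching $1/((N-p)(N-p+1))$: each extra cancelled crosstalker is worth more than the previous one.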
When the channel exhibits space-selectivity, the strongest crosstalker causes much more interference than the second strongest, and so on. This effect counteracts the rapid growth of SIR with $p$. As a result, the best trade-off between performance and complexity usually lies somewhere between no crosstalk cancellation and full crosstalk cancellation.
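The effect of space-selectivity on the rate-complexity trade-off can be seen in a toy single-tone model. The sketch below assumes made-up interference powers and noise level, a rate of $\log_2(1 + \mathrm{SINR})$, and cancellation of the strongest remaining crosstalker first; it compares each partial-cancellation point with time sharing between no cancellation and full cancellation.

```python
import math


def greedy_rates(interference, noise, signal=1.0):
    """Rate log2(1 + SINR) after cancelling the p strongest crosstalkers, p = 0..N."""
    ordered = sorted(interference, reverse=True)  # strongest crosstalker first
    rates = []
    for p in range(len(ordered) + 1):
        residual = sum(ordered[p:]) + noise  # uncancelled crosstalk plus noise
        rates.append(math.log2(1.0 + signal / residual))
    return rates


def beats_time_sharing(rates, p):
    """True if cancelling p crosstalkers beats time sharing between none and all."""
    n = len(rates) - 1
    chord = rates[0] + (p / n) * (rates[n] - rates[0])
    return rates[p] > chord


# Illustrative numbers only: same total crosstalk power, different spread.
equal = greedy_rates([0.08] * 7, noise=0.01)
dominant = greedy_rates([0.5] + [0.01] * 6, noise=0.01)
print([beats_time_sharing(equal, p) for p in range(1, 7)])
print([beats_time_sharing(dominant, p) for p in range(1, 7)])
```

With seven equal crosstalkers the first list is all `False`: no partial point beats time sharing, qualitatively mirroring the nonconvexity in Figure 5. With one dominant crosstalker, cancelling it alone ($p = 1$) already lies well above the time-sharing line.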
Algorithm 2 cannot exploit space-selectivity. In this scenario, this is not a problem, since all crosstalkers have equal strength. Algorithm 2 can implement a form of frequency sharing. This is analogous to the time sharing just discussed and allows this algorithm to cancel, for example, 6 crosstalkers on half of the tones instead of 3 crosstalkers on all of the tones. For this reason, Algorithm 2 always gives a convex rate versus complexity curve. Comparing Algorithm 2 with the optimal algorithm, Algorithm 4, we see that it gives near-optimal performance in this scenario. Algorithm 3 also gives near-optimal performance. Note that with 29% of the complexity of full crosstalk cancellation, we can achieve 89% of the performance gains.
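Algorithm 2 is specified in an earlier section; as a hypothetical illustration of frequency sharing only, the sketch below spends a budget of $cK$ cancellations per user on full cancellation (all $N$ crosstalkers) of the tones with the largest gain. The per-tone gains are assumed to be given, and the function name is ours.

```python
def frequency_only_selection(full_cancel_gain, N, c):
    """Hypothetical frequency-only selector: full cancellation on the best tones.

    full_cancel_gain[k]: rate gain on tone k from cancelling all N crosstalkers.
    c: average number of crosstalkers cancelled per tone (0 <= c <= N).
    Returns (sorted list of selected tones, total rate gain).
    """
    if not 0 <= c <= N:
        raise ValueError("c must lie in [0, N]")
    K = len(full_cancel_gain)
    n_tones = int(c * K // N)  # each fully cancelled tone costs N cancellations
    ranked = sorted(range(K), key=lambda k: full_cancel_gain[k], reverse=True)
    chosen = sorted(ranked[:n_tones])
    return chosen, sum(full_cancel_gain[k] for k in chosen)


# Illustrative gains for K = 4 tones, N = 6 crosstalkers, c = 3.
tones, gain = frequency_only_selection([5.0, 1.0, 3.0, 2.0], N=6, c=3)
print(tones, gain)  # [0, 2] 8.0
```

With $N = 6$ and $c = 3$ the budget covers exactly half of the tones, as in the example above. Because tones are taken in order of decreasing gain, each additional tone adds no more than the previous one, so the rate gain never grows faster than linearly in the budget and the achievable rate-complexity region stays convex.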

Near-far scenario (4 × 300 m, 4 × 1200 m)
We now evaluate the selection algorithms in a binder consisting of 4 × 300 m loops and 4 × 1200 m loops. In this configuration, the lines suffer from the near-far effect, which causes all users to see the 300 m near-end lines as the largest sources of crosstalk. This space-selectivity assists the partial cancellation algorithms in reducing run-time complexity.
Frequency-selectivity is present in this scenario and is most pronounced on far-end lines. Near-end lines have relatively flat channels and benefit less from algorithms which exploit frequency-selectivity alone. Figure 6 contains the rates of the 300 m near-end users versus complexity under the different algorithms. Figure 7 contains the same for the 1200 m far-end users.
Algorithm 1 cannot exploit frequency-selectivity. On near-end lines, frequency-selectivity is minimal, and reasonable performance is still achieved. Again we see a nonconvex rate-complexity curve; however, above 43% complexity, Algorithm 1 gives near-optimal performance. For far-end users, frequency-selectivity is pronounced and Algorithm 1 gives poor performance.
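A space-only selector, the only kind of selection Algorithm 1 can perform, cancels the same set of crosstalkers on every tone. The hypothetical sketch below ranks crosstalkers by crosstalk power summed over all tones; the aggregation rule, the example powers and the names are assumptions, not the exact definition of Algorithm 1.

```python
def space_only_selection(xtalk_power, c):
    """Hypothetical space-only selector: cancel the same c crosstalkers on every tone.

    xtalk_power[m][k]: crosstalk power from crosstalker m on tone k.
    c: number of crosstalkers to cancel (integer, 0 <= c <= N).
    Returns a dict mapping each tone to the sorted list of cancelled crosstalkers.
    """
    N, K = len(xtalk_power), len(xtalk_power[0])
    if not 0 <= c <= N:
        raise ValueError("c must lie in [0, N]")
    total = [sum(row) for row in xtalk_power]  # aggregate power over all tones
    ranked = sorted(range(N), key=lambda m: total[m], reverse=True)
    chosen = sorted(ranked[:c])
    return {k: list(chosen) for k in range(K)}


# Illustrative coupling powers: 3 crosstalkers (rows) on 3 tones (columns).
xtalk = [[1.0, 1.5, 1.2],
         [5.0, 0.1, 0.1],
         [2.0, 2.0, 2.0]]
print(space_only_selection(xtalk, c=2))  # {0: [1, 2], 1: [1, 2], 2: [1, 2]}
```

Crosstalker 1 is chosen because of its large coupling on tone 0, yet on tones 1 and 2 crosstalker 0 is stronger. A rule that ignores frequency cannot take advantage of this, which is the weakness observed on the far-end lines.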
Algorithm 2 cannot exploit space-selectivity, and for near-end users this leads to poor performance that is virtually identical to time sharing. For far-end users, frequency-selectivity is pronounced, and this algorithm still achieves reasonable performance despite its inability to exploit space-selectivity.
Algorithm 3 can exploit both space- and frequency-selectivity. As a result, it gives near-optimal performance for both near- and far-end users. With 43% complexity, this algorithm achieves 99% of the performance gains for near-end users. For far-end users, 29% complexity achieves 97% of the performance gains.
We now examine the distribution of run-time complexity between users, as described in Section 8.3. Figure 8 contains the achievable rate regions under varying complexities $c$ using Algorithm 3. The rate region was constructed by dividing multiplications between two classes: near-end and far-end users. All users within a class receive the same number of multiplications per DMT block: $2\mu_{\text{near}} cK$ for each near-end user and $2\mu_{\text{far}} cK$ for each far-end user. By varying the parameter $\mu_{\text{far}}$, with $\mu_{\text{near}} = 1 - \mu_{\text{far}}$, we can trace out the boundary of the rate region. We see in Figure 8 that with $c = 2$ (29% of the run-time complexity of full crosstalk cancellation), we can achieve most of the operating points in the full-cancellation rate region.
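The per-class budget split translates directly into code. In the sketch below, `rate_near` and `rate_far` are hypothetical hooks that map a per-user multiplication budget to an achieved rate (for example, by running a selection algorithm for that class); only the budget split itself comes from the text, and the value of $K$ in the usage line is an assumption.

```python
def class_budgets(mu_far, c, K):
    """Multiplications per DMT block for each near-end and each far-end user.

    Implements 2 * mu_near * c * K and 2 * mu_far * c * K with
    mu_near = 1 - mu_far.
    """
    if not 0.0 <= mu_far <= 1.0:
        raise ValueError("mu_far must lie in [0, 1]")
    mu_near = 1.0 - mu_far
    return 2 * mu_near * c * K, 2 * mu_far * c * K


def trace_boundary(rate_near, rate_far, c, K, steps=10):
    """Sweep mu_far over [0, 1] and collect (near rate, far rate) boundary points.

    rate_near, rate_far: hypothetical caller-supplied functions mapping a
    per-user budget to an achievable rate for that class.
    """
    points = []
    for i in range(steps + 1):
        near_budget, far_budget = class_budgets(i / steps, c, K)
        points.append((rate_near(near_budget), rate_far(far_budget)))
    return points


# K = 100 tones is assumed; identity "rates" simply expose the budgets.
pts = trace_boundary(lambda b: b, lambda b: b, c=2, K=100, steps=4)
print(pts)
```

This prints five budget pairs running from (400.0, 0.0) to (0.0, 400.0). At $\mu_{\text{far}} = 0.5$ every user receives $cK$ multiplications, the same as an even split across all users.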
In Figure 9, the achievable rate regions of the different partial cancellation algorithms are compared for $c = 2$. Note the considerably larger rate region achieved by Algorithms 3 and 4, which exploit both space- and frequency-selectivity.

Distributed scenario (300 : 100 : 1000 m)
Simulations were also run in a distributed scenario consisting of 8 lines ranging from 300 m to 1000 m in 100 m increments. Algorithm 3 exhibited near-optimal performance and increased the average rate from 9.7 Mbps to 23.7 Mbps with only 29% of the complexity of full crosstalk cancellation. This is equivalent to twice the complexity of a conventional FEQ.
We have seen that the performance of algorithms that exploit only one type of selectivity, namely Algorithms 1 and 2, varies considerably with the scenario. By exploiting both space- and frequency-selectivity, Algorithm 3 consistently gave near-optimal performance. This algorithm is also considerably less complex than the optimal algorithm, Algorithm 4.

CONCLUSIONS
Crosstalk is the limiting factor in VDSL performance. Many crosstalk cancellation techniques have been proposed, and these lead to significant performance gains. Unfortunately, crosstalk cancellation has a high run-time complexity, which grows with the square of the number of users in a binder. Crosstalk channels in the DSL environment exhibit both space- and frequency-selectivity. The majority of the effects of crosstalk are limited to a small number of crosstalkers and tones. Partial crosstalk cancellation exploits this by only performing crosstalk cancellation on the tones and lines where it gives the most benefit. This allows it to give close to the performance of full crosstalk cancellation with considerably reduced run-time complexity.
In this paper, we presented several partial crosstalk cancellation algorithms for upstream transmission. We saw that designing a partial crosstalk canceller requires us to choose which lines to observe when detecting each user on each tone. This is equivalent to choosing the (crosstalker, tone) pairs to cancel in the detection of each user. We described different algorithms for choosing these pairs. These included simple algorithms such as Algorithm 1, which exploits space-selectivity only, and Algorithm 2, which exploits frequency-selectivity only. In Section 9, we saw that the performance of these two algorithms varies greatly depending on the scenario. Robust performance requires us to exploit both space- and frequency-selectivity together.
We presented an optimal algorithm (Algorithm 4) for partial crosstalk cancellation. Whilst this algorithm is highly complex, its ability to exploit both space- and frequency-selectivity led to good performance in all scenarios. Partial crosstalk canceller initialization for one user in this algorithm requires $O(KN^2)$ multiplications and $O(KN)$ logarithms.
A simple joint selection algorithm (Algorithm 3) was described, which decouples the problem of (crosstalker, tone) pair selection, thereby reducing initialization complexity significantly. This algorithm gave near-optimal performance in all of the scenarios we evaluated and has an initialization complexity of only $O(KN)$ multiplications per user.
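To make the idea of decoupled (crosstalker, tone) selection concrete, the sketch below scores every pair independently and keeps the $cK$ best. Scoring by crosstalk power is our assumption, since the exact metric of Algorithm 3 is defined in an earlier section; the function name and example powers are hypothetical. The scoring pass touches each of the $KN$ pairs once, and `heapq.nlargest` adds a logarithmic factor for the final selection.

```python
import heapq


def joint_selection(xtalk_power, c):
    """Hypothetical joint selector: keep the c*K strongest (crosstalker, tone) pairs.

    Each pair is scored on its own by crosstalk power (an assumed metric), so
    the scoring pass visits each of the K*N pairs once.
    Returns a dict mapping each tone to the sorted list of cancelled crosstalkers.
    """
    N, K = len(xtalk_power), len(xtalk_power[0])
    if not 0 <= c <= N:
        raise ValueError("c must lie in [0, N]")
    budget = int(c * K)  # c cancellations per tone on average
    pairs = ((xtalk_power[m][k], m, k) for m in range(N) for k in range(K))
    selection = {k: [] for k in range(K)}
    for _, m, k in heapq.nlargest(budget, pairs):
        selection[k].append(m)
    return {k: sorted(ms) for k, ms in selection.items()}


# Illustrative coupling powers: 3 crosstalkers (rows) on 3 tones (columns).
xtalk = [[1.0, 1.5, 1.2],
         [5.0, 0.1, 0.1],
         [2.0, 2.0, 2.0]]
print(joint_selection(xtalk, c=2))  # {0: [1, 2], 1: [0, 2], 2: [0, 2]}
```

Unlike a rule that fixes one set of crosstalkers for all tones, the selection here changes from tone to tone: crosstalker 1 is cancelled only on tone 0, where its coupling is strong, and crosstalker 0 takes its place on tones 1 and 2.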
With Algorithm 3, it is possible to increase the average rate from 9.7 to 23.7 Mbps using only twice the run-time complexity of a conventional single-user detector (SUD), that is, the frequency-domain equalizer (FEQ) currently implemented in VDSL modems. With this complexity, the algorithm achieves 89% of the performance gains of full crosstalk cancellation.
By treating computational complexity as a resource to be divided across tones and users, we developed rate regions in Section 9. These allow us to visualise all of the achievable rate tuples under a given run-time complexity constraint. This parallels work on multiuser power allocation (see, e.g., [14,15]); here, however, we consider the allocation of computing resources rather than transmit power.
Whilst this paper has focused on crosstalk cancellation in VDSL, the techniques here are also applicable to MIMO-CDMA systems. Taking into account the processing gain, the interference path typically has 15-20 dB more attenuation than the main path [16]. Hence the MIMO-CDMA channel is column-wise diagonally dominant, and the partial crosstalk cancellation techniques developed here can be directly applied.
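Column-wise diagonal dominance can be checked numerically. The sketch below (a hypothetical helper) reports, for each column of a square channel matrix, how far in dB the direct gain lies above the strongest off-diagonal gain in that column. Taking 15 dB as the threshold is an assumption based on the figure quoted above, and the example matrix is illustrative.

```python
import math


def column_dominance_margins_db(H):
    """Per-column margin (dB) of the direct gain over the strongest off-diagonal gain.

    H is a square channel matrix (list of rows, real or complex entries);
    column n holds user n's gains into every receiver.
    """
    n = len(H)
    if n < 2 or any(len(row) != n for row in H):
        raise ValueError("H must be a square matrix of size at least 2")
    margins = []
    for col in range(n):
        direct = abs(H[col][col])
        strongest_other = max(abs(H[row][col]) for row in range(n) if row != col)
        if strongest_other == 0:
            margins.append(math.inf)  # no coupling at all in this column
        else:
            margins.append(20.0 * math.log10(direct / strongest_other))
    return margins


H = [[1.0, 0.1j],
     [0.1, 1.0]]
margins = column_dominance_margins_db(H)
print(margins, min(margins) >= 15.0)  # 15 dB threshold is an assumption
```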
In this work, we have considered crosstalk cancellation, which is applicable only to upstream DSL, where the receivers are co-located at the CO. In downstream DSL, it is also possible to mitigate the effects of crosstalk through crosstalk precompensation [3,6]. The development of partial crosstalk precompensation algorithms with reduced run-time complexity is the subject of ongoing research.
The simulations here neglected the problem of power loading and assumed flat transmit PSDs. The use of nonflat PSDs through multiuser water filling or power back-off is currently the subject of much activity in the research community (see, e.g., [14,15,17,18]). The use of nonflat PSDs increases space- and frequency-selectivity and would allow partial cancellation to achieve even greater run-time complexity reductions whilst maintaining similar performance. The combination of multiuser power allocation and partial cancellation will lead to even larger achievable rates with implementable run-time complexities. This is an important area for future work.
where we use (A.12) to get from line 2 to 3. So