 Research
 Open Access
Expander chunked codes
 Bin Tang^{1},
 Shenghao Yang^{2},
 Baoliu Ye^{1},
 Yitong Yin^{1} and
 Sanglu Lu^{1}
https://doi.org/10.1186/s13634-015-0297-8
© Tang et al. 2015
 Received: 28 June 2015
 Accepted: 7 December 2015
 Published: 22 December 2015
Abstract
Chunked codes are efficient random linear network coding (RLNC) schemes with low computational cost, where the input packets are encoded into small chunks (i.e., subsets of the coded packets). During the network transmission, RLNC is performed within each chunk. In this paper, we first introduce a simple transfer matrix model to characterize the transmission of chunks and derive some basic properties of the model to facilitate the performance analysis. We then focus on the design of overlapped chunked codes, a class of chunked codes whose chunks are non-disjoint subsets of input packets, which are of special interest since they can be encoded with negligible computational cost and in a causal fashion. We propose expander chunked (EC) codes, the first class of overlapped chunked codes that have an analyzable performance, where the construction of the chunks makes use of regular graphs. Numerical and simulation results show that in some practical settings, EC codes can achieve rates within 91 to 97 % of the optimum and outperform the state-of-the-art overlapped chunked codes significantly.
Keywords
 Random linear network coding
 Chunked codes
 Iterative decoding
 Random regular graph
1 Introduction
Random linear network coding (RLNC) has great potential for data dissemination over communication networks [1–4]. RLNC can be implemented in a distributed fashion due to its random nature and is shown to be asymptotically capacity-achieving for networks with packet loss in a wide range of scenarios [5–7]. In this paper, we propose a low-complexity RLNC scheme called expander chunked (EC) codes and analyze the achievable rates of EC codes.
1.1 Background
For ordinary RLNC studied in the literature [3–7], all participating nodes forward coded packets formed by random linear combinations of all the packets received so far. Major issues in applying ordinary RLNC include the computational cost and the coefficient vector overhead. Consider the dissemination of k input packets, each consisting of L symbols from a finite field. For encoding, RLNC requires \(\mathcal {O}(kL)\) finite field operations to generate a coded packet, and for decoding, a destination node takes \(\mathcal {O}(k^{2}+kL)\) finite field operations per packet if Gaussian elimination is employed. Moreover, to recover the transfer matrices of network coding at the destination node, a coefficient vector of k symbols is usually included in each of the transmitted packets [3]. Since the packet length L has an upper bound in real-world communication networks,^{1} using large values of k reduces the transmission efficiency. When there are hundreds of input packets, the computational cost and the coefficient vector overhead would make RLNC difficult to implement in real-world systems.
To resolve these issues, chunked (network) codes have been proposed [8], where the input packets are encoded into multiple small chunks (also called generations, classes, etc.), each of which is a subset of the coded packets. When using chunked codes, an intermediate network node can only combine the packets of the same chunk. The encoding and decoding complexities per packet of chunked codes are usually \(\mathcal {O}(mL)\) and \(\mathcal {O}(mL+m^{2})\), respectively, where m is the chunk size, i.e., the number of packets in each chunk. The coefficient vector overhead also reduces to m symbols per packet since only the transfer matrices of the chunks are required at the destination nodes. Even so, the chunk size should be a small value (e.g., 16 or 32) for the purpose of practical implementation, as demonstrated in [9].
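The overhead argument above can be made concrete with a small calculation. The sketch below assumes, for illustration only, that a transmitted packet consists of the coefficient vector followed by L payload symbols; the function name and the numeric values are hypothetical and not taken from the paper.

```python
def coefficient_overhead(num_coefficients: int, payload_symbols: int) -> float:
    """Fraction of a transmitted packet occupied by the coefficient vector.

    Assumes packet = coefficient vector + payload (an illustrative model).
    """
    return num_coefficients / (num_coefficients + payload_symbols)


# Hypothetical parameters: payload length L, number of input packets k, chunk size m.
L = 1024
k, m = 512, 32

print(f"ordinary RLNC (k coefficients): {coefficient_overhead(k, L):.3f}")
print(f"chunked code  (m coefficients): {coefficient_overhead(m, L):.3f}")
```

Under this model the overhead depends only on the ratio of the number of coefficients to L, which is why keeping m a small constant matters when L is bounded.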
Existing chunked codes fall into two categories: overlapped chunked codes and coded chunked codes. In overlapped chunked codes, the chunks are subsets of the input packets with possibly nonempty intersections. The first several designs of chunked codes all belong to this category. However, the existing designs of overlapped chunks are mostly based on heuristics, and no rigorous performance analysis is available for them [10–12]. In coded chunked codes, chunks are generated by combining multiple input packets. By generalizing fountain codes and LDPC codes, nearly throughput-optimal chunked codes have been designed, including BATS codes [13, 14], Gamma codes [15, 16], and L-chunked (LC) codes [17]. Overlapped chunks can be viewed as a degraded class of coded chunks where chunks are generated using certain repetition codes.
Overlapped chunked codes, however, can have lower encoding complexity and latency than general coded chunked codes. First, as no new packets are necessarily generated during the encoding, the encoding complexity is dominated by generating the indices for the packets in each chunk, which does not incur any finite field operation or depend on the packet length L. In contrast, coded chunked codes incur a computational cost that is linear in L to generate a coded packet. For instance, BATS codes require on average \(\bar {\Psi }mL\) finite field operations for encoding a chunk, where \(\bar {\Psi }\gtrapprox 3m\). Therefore, compared to general coded chunked codes, the computational cost of overlapped chunked codes is usually negligible.
Second, overlapped chunks can be encoded in a causal fashion. Suppose that the input packets arrive at the encoder gradually. The first chunk can be generated after collecting m input packets, and for every m input packets collected in the following, at least one new chunk can be formed. Therefore, the generation as well as the transmission of chunks can be performed in parallel with the collection of the input packets, reducing the total transmission latency. In contrast, how to achieve causal encoding for general coded chunked codes is not clear: BATS codes and Gamma codes usually require a large fraction of the input packets for encoding chunks.
These advantages motivate us to study overlapped chunked codes, which are especially suitable for delay-sensitive applications and networks where the source node has limited computation and storage power, e.g., wireless sensors and satellites.
1.2 Our contribution
We propose expander chunked (EC) codes, the first class of overlapped chunked codes that has analyzable performance. In an EC code, the overlapping between chunks is generated using a regular graph: each chunk corresponds to a node in the graph and two adjacent chunks share an input packet. EC codes can be encoded causally and share the same belief propagation (BP) decoding of general overlapped chunked codes.
We analyze the BP decoding performance of EC codes generated based on random regular graphs. By exploring the locally tree-like property of random regular graphs and then conducting a tree-based analysis similar to that of LT/LDPC codes, we obtain a lower bound on the achievable rate depending only on the chunk size, the degree of the regular graph, and the rank distribution of the transfer matrices.
The achievable rates of EC codes are evaluated and compared with other chunked codes in two scenarios. We first compare the achievable rates of EC codes with representative coded chunked codes for randomly sampled rank distributions of the transfer matrices, where the purpose is to understand the general performance of EC codes. We find that the performance of EC codes highly depends on the rank distributions: when the expected rank is relatively large, the average achievable rate (over the rank distributions sampled) of EC codes is close to 90 % of the representative coded chunked codes, as well as a theoretical upper bound. But for relatively small expected ranks, the achievable rate of EC codes varies significantly for different rank distributions.
To further see the real-world potential of EC codes, we then evaluate the performance for a near-optimal chunk transmission scheme over line-topology (line) networks [18]. As most practical routing schemes are single-path based, line networks have attracted a lot of interest [19–21]. Also, the chunked code scheme for line networks can be extended to general network scenarios, including general unicast networks [14, 18], two-way relay networks [22], and wireless broadcast networks [23]. For a wide range of the packet loss rates, with proper optimization of the transmission scheme, EC codes achieve rates very close to those of the coded chunked codes and about 91 % to 97 % of the theoretical upper bounds. Besides, we show by simulation that EC codes perform much better than the existing overlapped chunked codes in line networks.
Comparison among EC/BATS/LC codes where achievable rates are evaluated over line networks with chunk transmission scheme given in [18]
Design  Causal encoding  Encoding complexity  Decoding complexity  Achievable rates 

EC  Support  \(\mathcal {O}(n)\)  \(\mathcal {O}(nL)\)  91 % ∼ 97 % of opt. 
BATS  Unknown  \(\mathcal {O}(nL)\)  \(\mathcal {O}(nL)\)  >99 % of opt. 
LC  Unknown  \(\mathcal {O}(nL)\)  \(\mathcal {O}(nL)\)  >98 % of opt. 
As another contribution, a simple transfer matrix model is proposed to characterize the transmission of chunks over networks with packet loss. Compared with a similar model proposed in [14], which is more suitable for BATS codes, our model incorporates some more practical features of network operations for general chunked codes, making the design of efficient network transmission protocols easier. Therefore, our model is of independent interest for chunked codes. We derive some properties of this transfer matrix model for the performance analysis, which can apply to general chunked codes.
1.3 Related work
The simplest way to form a chunked code is to use disjoint subsets of the input packets as chunks [8], which has been used in some applications of RLNC [9, 24, 25]. To decode a chunk, the transfer matrix of the chunk must have full rank of m; otherwise, none of the packets in the chunk could be recovered with high probability. However, it is not always a simple task to guarantee the success of decoding a chunk at the destination node. One approach is to use a feedback-based chunk transmission mechanism [24]. While some efficient feedback protocols for specific applications have been developed [25, 26], in general, such feedback incurs an inevitable delay and also consumes network resources, resulting in degraded system performance. Besides, for some scenarios such as satellite and deep-space communications, feedback is not even available. Another approach is to employ a random scheduling-based chunk transmission scheme [27], where every network node always randomly selects a chunk for transmission. But this scheme has poor performance for small chunk sizes [10, 11].
Instead of using disjoint chunks of input packets, chunks with overlaps, i.e., different chunks share some input packets in common, have been proposed by several groups independently [10–12]. It is shown via simulations that overlapped chunked codes have much better performance than disjoint chunks [10, 11]. The random annex codes proposed by Li et al. [12] demonstrate better performance in simulation than the overlapped chunked codes in [10, 11], but only heuristic analysis of the design is provided.
BATS code [13, 14] is the first class of chunked codes that uses coded chunks. Each chunk in a BATS code is generated as linear combinations of a random subset of the input packets. BATS codes can be regarded as a matrix generalization of fountain codes [28, 29] and preserve the ratelessness of fountain codes.
Another kind of coded chunked codes consists of chunks that satisfy some parity-check constraints, similar to those of LDPC codes. The first class of such codes is Gamma codes [15, 16, 30], where the parity-check constraints are applied on the whole chunk [15] or on the individual packets in chunks [30]. Another class of such codes is L-chunked codes [17], which consider more general parity-check constraints and show better performance. Note that the original Gamma code paper [15] was published in parallel with the conference version of this paper [31], while the refined Gamma codes [30] and L-chunked codes were published after our conference version.
Various chunked code-based transmission schemes have been designed and implemented recently [18, 22, 32], which are consistent with our transfer matrix model.
2 Overlapped chunked codes
In this section, we give a general formulation of overlapped chunked codes, including causal encoding and belief propagation (BP) decoding. We also provide a transfer matrix model for general chunked codes.
2.1 Encoding of chunks
Consider transmitting a set of k input packets b _{1}, b _{2}, …, b _{ k } from a source node to a destination node over a network with packet loss. Each input packet consists of L symbols from the finite field \(\mathbb {F}_{q}\) and is regarded as a column vector in \(\mathbb {F}_{q}^{L}\) henceforth.
Definition 1 (Chunked codes).
A chunk is a set of packets each of which is a linear combination of the input packets, and a chunked code is a collection of chunks. A chunked code is said to be overlapped if its chunks are subsets of the input packets with possibly nonempty overlapping.
In this paper, we focus on the design of overlapped chunked codes. Evidently, an overlapped chunked code can be generated by repeating some input packets. As in most related work, we assume that all the chunks in a chunked code have the same cardinality m, which is called the chunk size. As the chunk size is related to the encoding/decoding computational complexities and the coefficient vector overhead, for the sake of applicability in common networks, we regard the chunk size m as a fixed constant which does not change with the number of input packets.
An overlapped chunked code can be more concisely represented by a collection of index sets of size m. For any integer n, let \(\mathcal {I}_{1},\mathcal {I}_{2},\ldots, \mathcal {I}_{n}\) be subsets of {1,…,k} with size m. Let \(\mathbf {B}_{j} =\{\mathbf {b}_{i}:i\in \mathcal {I}_{j}\}\). We call either \(\mathcal {I}_{j}\) or B _{ j } a chunk and the subscript j the chunk ID. An overlapped chunked code of n chunks can be given by either \(\{\mathcal {I}_{j}:j=1,\ldots,n\}\) or {B _{ j }:j=1,…,n}.
Since each chunk is a subset of the input packets, it is not necessary to duplicate the existing input packets for chunk encoding. During the encoding, only the address in the memory of each packet in a chunk needs to be recorded. Furthermore, every overlapped chunked code can be encoded causally, which is explained in the following.
Definition 2 (Causal encoding).

these chunks are formed by the first i input packets, and

each of the first \(m\times \lfloor \frac {i}{m} \rfloor \) input packets is used for generating these chunks at least once.
It is worth mentioning that, when m=1, systematic encoding of a linear code is a special case of causal encoding. For any overlapped chunked code where each input packet is included by at least one chunk, we can always apply some proper permutation of the indices such that, for any j≤n, the indices of the packets among the first j chunks are 1,2,…,k _{ j }, where k _{ j }≤mj. In this sense, every overlapped chunked code can be encoded causally. One example is given when introducing our EC codes in Section 3. Now, consider a scenario where the input packets arrive at the source node sequentially (e.g., the source node is a sensor which keeps on collecting data and encapsulating data into packets). Then, for any m input packets collected consecutively, the source node can generate one new chunk for transmission. Hence, the source node does not necessarily collect all the input packets before encoding and the chunks can be transmitted in parallel with the collection of succeeding input packets. Therefore, by applying an overlapped chunked code, the end-to-end transmission latency could be significantly reduced.
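The index permutation mentioned above can be sketched as a relabeling by order of first appearance. The function names below are hypothetical; chunks are assumed to be given as lists of packet indices in the order in which the chunks are to be sent.

```python
def relabel_causally(chunks):
    """Renumber packet indices (1-based) in order of first appearance."""
    new_label = {}
    relabeled = []
    for chunk in chunks:
        out = []
        for i in chunk:
            if i not in new_label:
                new_label[i] = len(new_label) + 1
            out.append(new_label[i])
        relabeled.append(out)
    return relabeled


def is_causally_ordered(chunks, m):
    """Check that the first j chunks use exactly packets 1..k_j with k_j <= m*j."""
    seen = set()
    for j, chunk in enumerate(chunks, start=1):
        seen.update(chunk)
        k_j = len(seen)
        if seen != set(range(1, k_j + 1)) or k_j > m * j:
            return False
    return True


chunks = [[5, 2], [2, 9], [9, 5]]          # hypothetical chunks with m = 2
print(relabel_causally(chunks))
print(is_causally_ordered(relabel_causally(chunks), m=2))
```

After relabeling, the j-th chunk only refers to packets with indices at most mj, so it can be formed as soon as those packets have arrived.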
2.2 Transmission of chunks
Each transmitted packet in the network is of the form (j,c,b), where j specifies a chunk ID, \(\mathbf {c}\in \mathbb {F}_{q}^{m}\) is the coefficient vector, and b=B _{ j } c, a linear combination of packets in B _{ j }, is the payload. Here, with some abuse of notation, B _{ j } is also treated as a matrix formed by juxtaposing the packets in B _{ j }. For convenience, we refer to a packet with chunk ID j as a j-packet.
Now, we describe a chunk transmission model through a network employing linear network coding, which is consistent with the recent design and implementation of chunked code-based network protocols [18, 22, 32]. Consider the jth chunk of packets \(\mathbf {b}_{j_{1}}\), \(\mathbf {b}_{j_{2}}\), \(\ldots, \mathbf {b}_{j_{m}}\). The source node first attaches a coefficient vector to each packet and generates \(\tilde {\mathbf {b}}_{j_{i}} = (\mathbf {e}_{i}, \mathbf {b}_{j_{i}})\), i=1,…,m, where e _{ i } is the ith column of the m×m identity matrix. The source node then generates M _{ j } random linear combinations of \(\tilde {\mathbf {b}}_{j_{i}}\) and transmits these linear combinations after attaching the chunk ID, where M _{ j } is an integer-valued random variable.
where ϕ _{ i }, i=1,2,…,h, are chosen from \(\mathbb {F}_{q}\). A network node does not transmit combinations of packets of different IDs. Note that in (1), we only need to combine the j-packets with linearly independent coefficient vectors. For the scheduling issue, i.e., how each intermediate node chooses a chunk B _{ j } for each transmission, please refer to some recently proposed network protocols [18, 22, 32].
The proposed chunk transmission model does not depend on a particular chunked code and hence can be used for the analysis of other chunked codes. A similar model has been used for BATS codes [14]. Our model, however, explicitly incorporates a parameter M _{ j } indicating the number of packets transmitted of a chunk, which has a clear operational meaning in chunked code-based network protocols. Intuitively, when the network has a higher packet loss rate, we intend to use a larger value of M _{ j } to gain the benefit of network coding. Readers can find more discussion about this parameter in [18].
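The source-side step of this model, generating coded packets of the form (j,c,b) with b=B_j c, can be sketched as follows. This is a minimal illustration that assumes q=2 so that field addition is XOR; each input packet is represented as a Python integer holding its L payload bits, and the function name and arguments are hypothetical.

```python
import random


def encode_chunk(chunk_id, packets, num_coded, rng):
    """Return num_coded coded packets (j, c, b) for one chunk over GF(2).

    packets: list of m ints, each holding the payload bits of one input packet.
    num_coded plays the role of M_j in the text.
    """
    m = len(packets)
    coded = []
    for _ in range(num_coded):
        c = [rng.randrange(2) for _ in range(m)]   # random coefficient vector
        b = 0
        for coeff, packet in zip(c, packets):
            if coeff:
                b ^= packet                        # addition in GF(2) is XOR
        coded.append((chunk_id, c, b))
    return coded


rng = random.Random(0)
for j, c, b in encode_chunk(3, [0b1010, 0b0110, 0b0001], num_coded=4, rng=rng):
    print(j, c, format(b, "04b"))
```

For a larger field such as q=256, the XOR would be replaced by finite field multiplication and addition; that part is omitted here.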
Now, we present a key result about the transfer matrices, which shows that the column space of each transfer matrix with a fixed dimension is uniformly distributed over all the subspaces with the same dimension.
Lemma 1.
Proof.
where the first step follows by the invertibility of A.
The proof is completed by combining the above equality with (3).
2.3 BP decoding
The destination node tries to decode the input packets by solving the local linear systems Y _{ j }=B _{ j } T _{ j }, j=1,2,…,n. These local linear systems for chunks jointly give a global linear system of equations on the k input packets, but solving the global linear system without considering the chunk structure usually has high computational cost. Therefore, we consider the following BP decoding of overlapped chunked codes.

in the first phase, decode every decodable chunk that has not been decoded by solving its associated linear system using, e.g., Gaussian elimination, and

in the second phase, for each input packet b in B _{ j } that is decoded in the last phase and each chunk B _{ i }≠B _{ j } that includes b, substitute the value of b into the linear system of B _{ i }, reducing the number of unknown input packets in this linear system.
The BP decoding stops when all the chunks have been decoded or all the chunks that have not been decoded are not decodable.
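The two decoding phases described above can be sketched over GF(2). This is an illustrative sketch, not the authors' implementation: each received packet is assumed to be stored as a pair (mask, payload), where bit i of mask is set when input packet i takes part in the combination, and all names are hypothetical.

```python
def solve_gf2(equations, num_unknowns):
    """Gauss-Jordan elimination over GF(2) on (mask, payload) pairs.

    Returns {packet_index: payload} if every unknown is determined, else None.
    """
    pivots = {}                                    # pivot bit -> reduced (mask, payload)
    for mask, payload in equations:
        for bit, (pm, pp) in pivots.items():       # reduce against existing pivots
            if (mask >> bit) & 1:
                mask ^= pm
                payload ^= pp
        if mask:
            bit = mask.bit_length() - 1
            for b2, (pm, pp) in list(pivots.items()):
                if (pm >> bit) & 1:                # keep earlier rows fully reduced
                    pivots[b2] = (pm ^ mask, pp ^ payload)
            pivots[bit] = (mask, payload)
    if len(pivots) < num_unknowns:
        return None
    return {bit: pp for bit, (_, pp) in pivots.items()}


def bp_decode(chunks, received):
    """chunks[j]: packet indices of chunk j; received[j]: its (mask, payload) list."""
    known = {}
    done = [False] * len(chunks)
    progress = True
    while progress:
        progress = False
        for j, idx in enumerate(chunks):
            if done[j]:
                continue
            unknowns = [i for i in idx if i not in known]
            eqs = []
            for mask, payload in received[j]:
                for i in idx:                      # second phase: substitute known packets
                    if i in known and (mask >> i) & 1:
                        mask ^= 1 << i
                        payload ^= known[i]
                eqs.append((mask, payload))
            sol = solve_gf2(eqs, len(unknowns))    # first phase: decode if decodable
            if sol is not None:
                known.update(sol)
                done[j] = True
                progress = True
    return known


# Chunk 1 has rank 1 only, but becomes decodable once chunk 0 reveals packet 1.
chunks = [[0, 1], [1, 2]]
received = [[(0b001, 5), (0b011, 5 ^ 9)], [(0b110, 9 ^ 12)]]
print(bp_decode(chunks, received))
```

The loop stops when a full pass decodes no new chunk, which corresponds to the stopping condition stated above.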
Now, we analyze the time cost of the BP decoding algorithm measured in finite field operations. Solving the linear system of a chunk can be done by first inverting the coefficient matrix, which costs \(\mathcal {O}(m^{3})\) and then using the inverse to recover all the unknown input packets, which costs \(\mathcal {O}(m^{2}L)\). As there are n chunks, all the first phases cost \(\mathcal {O}((m^{3}+m^{2}L)n)\) in total. The substitution of an input packet into a linear system costs \(\mathcal {O}(mL)\) and can happen at most mn times, so all the second phases cost \(\mathcal {O}(m^{2}Ln)\) in total. Therefore, the BP decoding algorithm costs \(\mathcal {O}((m^{3}+m^{2}L)n)\) finite field operations.
Assume that rk(T _{ j }) follows the probability distribution t=(t _{0},t _{1},…,t _{ m }), i.e., \(\Pr \{\text {rk}(\mathbf {T}_{j})=i\}=t_{i}\) for i=0,1,…,m. We have the following theorem, which is the cornerstone for the analysis of the above BP decoding algorithm.
Theorem 2.
Proof.
Some values of \({\zeta _{i}^{w}}\) for different i and w
i  w=2, q=2  w=2, q=256  w=4, q=2  w=4, q=256  w=6, q=2  w=6, q=256
26  -  -  -  -  0.2933  0.9961
27  -  -  -  -  0.5687  1.0000
28  -  -  0.3076  0.9961  0.7823  1.0000
29  -  -  0.6152  1.0000  0.8940  1.0000
30  0.3750  0.9961  0.8203  1.0000  0.9536  1.0000
31  0.7500  1.0000  0.9375  1.0000  0.9844  1.0000
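The tabulated values can be reproduced numerically. The closed form of \({\zeta _{i}^{w}}\) does not appear in the text above, so the product used below is an assumption, chosen because for m=32 it matches the table entries; the one exception is i=27, w=6, q=2, where the product gives 0.5867 rather than the listed 0.5687. The function name and default arguments are hypothetical.

```python
def zeta(i, w, m=32, q=256):
    """Assumed form: prod_{j=i-(m-w)+1}^{w} (1 - q^(-j)), and 0 when i < m - w."""
    if i < m - w:
        return 0.0
    p = 1.0
    for j in range(i - (m - w) + 1, w + 1):
        p *= 1.0 - float(q) ** (-j)
    return p


for w in (2, 4, 6):
    for i in range(32 - w, 32):
        print(f"w={w} i={i}  q=2: {zeta(i, w, q=2):.4f}  q=256: {zeta(i, w, q=256):.4f}")
```

The gap between the q=2 and q=256 columns shows how strongly a small field penalizes chunks whose rank is just large enough.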
2.4 Achievable rate
Definition 3 (Achievable rate).
We say that a rate R is achievable by chunked codes using BP decoding if for any constant ε>0, there exists a chunked code with k≥(R−ε)m n input packets and n chunks for all sufficiently large n such that with probability at least 1−ε, when the BP decoding stops, at least (R−ε)m n input packets are recovered.
Remark 1.
It is not necessary that the chunked code recovers all the input packets. When all the input packets are required to be recovered by the destination node, we can either retransmit the input packets that are not recovered or use the precode technique as in Raptor codes [29].
Our objective is to design an efficient class of overlapped chunked codes according to the given rank distribution. A natural upper bound on the achievable rates of chunked codes is established as follows.
Proposition 3.
Proof.
Assume that \(\lambda =\bar {t}/m+\delta \), δ>0 is achievable by chunked codes. Fix ε=δ/2, by the definition of achievable rates, there exists a chunked code with n chunks for all sufficiently large n such that at least (λ−ε)m n input packets are recovered with probability at least 1−ε.
where the last inequality follows from the Chernoff bound. For a sufficiently large n, we have P _{err}>ε, a contradiction!
3 Expander chunked codes
In this section, we introduce a family of overlapped chunked codes, named EC codes.^{2}
3.1 Code description
 1. Label each edge e∈E with a distinct integer in {1,…,k}, and denote the integer by i _{ e }. Assign the remaining k−nd/2=(m−d)n integers in {1,…,k} evenly to the n nodes in V, and denote the set of integers assigned to node v by \(\mathcal {I}_{v}'\).
 2. Form n chunks \(\{\mathcal {I}_{v}, 1\leq v \leq n\}\), where
$$ \mathcal{I}_{v}=\mathcal{I}_{v}' \cup \{i_{e}: e \text{ is incident to node } v\}. $$
Due to the one-to-one correspondence between nodes in G and the chunks, we equate a node with its corresponding chunk henceforth in the discussion. We call \(\mathcal {I}_{v}\) chunk v and i _{ e } an overlapping packet of chunk v.
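The two construction steps can be sketched as follows. The sketch assumes a d-regular generator graph on n nodes with k = nd/2 + (m−d)n input packets, which is implied by the count of remaining integers in step 1. The graph is drawn with a simple pairing-and-reject procedure, used here only as a stand-in for whichever generator is preferred; all function names are hypothetical.

```python
import random


def random_regular_graph(n, d, rng):
    """Pair up n*d stubs at random and retry until the graph is simple."""
    if (n * d) % 2 or d >= n:
        raise ValueError("need n*d even and d < n")
    stubs = [v for v in range(n) for _ in range(d)]
    while True:
        rng.shuffle(stubs)
        edges = set()
        for a, b in zip(stubs[::2], stubs[1::2]):
            e = (min(a, b), max(a, b))
            if a == b or e in edges:               # self-loop or repeated edge
                break
            edges.add(e)
        else:
            return sorted(edges)


def ec_chunks(n, d, m, rng):
    """Return (k, chunks), where chunks[v] lists the 1-based packet indices of chunk v."""
    if d > m:
        raise ValueError("need d <= m")
    edges = random_regular_graph(n, d, rng)
    chunks = [[] for _ in range(n)]
    # Step 1: one distinct label per edge, shared by its two endpoint chunks.
    for label, (u, v) in enumerate(edges, start=1):
        chunks[u].append(label)
        chunks[v].append(label)
    # Step 1 (continued): spread the remaining labels evenly, m - d per node.
    nxt = len(edges) + 1
    for v in range(n):
        chunks[v].extend(range(nxt, nxt + m - d))
        nxt += m - d
    return nxt - 1, chunks


k, chunks = ec_chunks(n=8, d=3, m=5, rng=random.Random(1))
print(k, chunks)
```

Because the construction is driven entirely by the random generator, storing its seed is enough to rebuild the same chunks at the decoder.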
3.2 Achievable rates
The performance of an EC code with a particular generator graph is difficult to analyze. Instead, we analyze the performance of an EC code with a random d-regular graph as the generator. There are various probability models for random d-regular graphs. We adopt the uniform model, i.e., G is uniformly chosen from all d-regular graphs with node set V. One can obtain similar results for the permutation model, the perfect matching model [35], etc.
Theorem 4.
EC codes with the degree d and chunk size m can achieve a rate at least τ _{ d }(1−d/m)+λ _{ d } d/(2m).
Note that, for any fixed degree d, the achievable rate given in Theorem 4 is easy to calculate numerically. Thus, we can easily find a proper degree d to maximize the achievable rate.
3.3 Performance analysis
We provide an analysis of the BP decoding of the EC code with a random d-regular graph as the generator and prove Theorem 4.
Definition 4.
For any generator graph G=(V,E), the l-neighborhood of a node v∈V, denoted by G _{ l }(v), is the subgraph of G induced by all the nodes u with distance at most l to v.
After l+1 iterations of the BP decoding, whether all the input packets in chunk v are recovered is determined by G _{ l }(v). Hence, we study the BP decoding performance on G _{ l }(v).
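Definition 4 can be made concrete with a breadth-first search, and the same routine can check whether a given l-neighborhood is a tree, which is the property the analysis relies on. The sketch assumes the graph is given as a dictionary mapping each integer node to the set of its neighbors; the function names are hypothetical.

```python
from collections import deque


def l_neighborhood(adj, v, l):
    """Set of nodes within distance l of v."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == l:
            continue                               # do not expand beyond distance l
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist)


def neighborhood_is_tree(adj, v, l):
    """True if the subgraph induced by the l-neighborhood of v is a tree."""
    nodes = l_neighborhood(adj, v, l)
    # Count each induced edge once; the induced subgraph is connected by construction.
    edges = sum(1 for u in nodes for w in adj[u] if w in nodes and u < w)
    return edges == len(nodes) - 1


cycle = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}   # 2-regular example
print(neighborhood_is_tree(cycle, 0, 2), neighborhood_is_tree(cycle, 0, 3))
```

Running the check over all nodes of a generator graph gives the fraction of nodes whose l-neighborhoods are trees.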
Definition 5.
For any generator graph G=(V,E), a node v∈V is said to be l-decodable if all the input packets in chunk v can be decoded when the decoding process is applied on G _{ l }(v).
We first show that a random regular graph has the locally tree-like property, i.e., almost all the nodes in G have their l-neighborhoods being trees.
Lemma 5.
Proof.
Now, we bound the probability that a node v is l-decodable given that G _{ l }(v) is a tree. Note that the tree-based analysis of EC codes can be viewed as a variation of the and-or-tree analysis used for LT and LDPC codes.
Lemma 6.

the probability that chunk v is l-decodable is at least (1−ε)τ _{ d }, and

the probability that an overlapping packet in chunk v can be recovered by BP decoding on G _{ l }(v) is at least (1−ε)λ _{ d }.
Proof.
We first prove the first part. Consider the tree G _{ l }(v) rooted at v. Clearly, the root v has d child nodes and all other internal nodes have d−1 child nodes. Let h _{ i } be the probability that a node u at level i (here, we assume that the node v is at level l and the leaves are at level 0) is decodable when the decoding process of u is restricted within the subtree of G _{ l }(v) rooted at u. In the following, we calculate h _{ i } in a bottom-up fashion.
Next, we prove the second part. Let u be an arbitrary child of node v. According to the above analysis, we know that node u is decodable with probability \(h_{l-1}={\alpha _{d}^{l}}(0)\). Meanwhile, under the condition that chunk u is not decodable, we can consider a new tree obtained by deleting the subtree rooted at u from G _{ l }(v). Similarly, we can show that node v can be decoded on the new tree with probability \(\alpha _{d}(h_{l-1})=\alpha _{d}^{l+1}(0)\). Therefore, the common packet of chunk u and chunk v can be decoded with probability at least \(1-(1-{\alpha _{d}^{l}}(0))(1-\alpha _{d}^{l+1}(0))\), which approaches λ _{ d } when n goes to infinity. This completes the proof.
Lemma 5 and Lemma 6 together give a bound on the expected number of packets that can be recovered by BP decoding. Finally, we complete the proof of Theorem 4 by showing that the number of recovered packets is sharply concentrated to its expectation.
Proof of Theorem 4.
Applying the Azuma-Hoeffding inequality [37], we have
Finally, since T≥(1−ε/2)n almost surely according to Lemma 5, and Z is a natural lower bound on the number of packets that can be decoded by the BP decoding algorithm, we complete the proof of Theorem 4.
3.4 Generator graph design
The above performance analysis implies that most d-regular graphs have the locally tree-like structure and hence the corresponding EC codes have the desired BP decoding performance. Therefore, the generator graph G can be designed randomly. That is, we randomly generate a d-regular graph as the generator graph, which can be done in expected \(\mathcal {O}(n)\) time by the McKay-Wormald algorithm [38]. We will use this approach in our performance evaluation.
Since a randomly generated d-regular graph lacks structure, we may need the whole adjacency matrix to store the graph. Note that the adjacency matrix is sparse and hence can be compressed. Alternatively, we may just save the seed of the pseudorandom generator used for generating the d-regular graph.
Structured d-regular graphs can further simplify the generation and/or storage of the EC code. When d=8, Margulis’ method [39] gives a structured 8-regular graph. However, currently, we do not have an efficient algorithm for generating structured regular graphs with arbitrary parameters d and n. Construction of structured regular graphs is of independent interest in mathematics and computer science, and much research has been conducted on developing new approaches [40].
4 Performance evaluation
In this section, we evaluate the performance of EC codes with comparison against the state-of-the-art overlapped chunked codes (H2T codes [11] and random annex codes (RAC) [12]) and coded chunked codes (BATS codes [14] and L-chunked (LC) codes [17]). In all the evaluations, unless specified, we use m=32 and q=256, which gives a good balance between the achievable rates and the encoding/decoding cost.
4.1 Random transfer rank distributions
The performance of EC codes, as well as of BATS codes and LC codes, depends on the rank distribution t=(t _{0},t _{1},…,t _{ m }). So, we first evaluate the performance of EC codes for general rank distributions, which may provide some guidance on the application of EC codes.
Recall that the achievable rate of chunked codes is upper bounded by \(\bar {t}/m\) (see Proposition 3). For each fixed value \(\bar {t}/m=0.5,0.6,0.7,0.8\), we sample a number of rank distributions^{4} and derive the corresponding achievable rates of EC codes, BATS codes, and LC codes numerically. For BATS and LC codes, the achievable rate is obtained by solving the corresponding degree distribution optimization problem. For EC codes, the achievable rate is given by Theorem 4 with an optimized d. In particular, in order to see how the finite field size q affects the performance of EC codes, the achievable rate of EC codes for q=2 is also evaluated.
Achievable rates of EC/BATS/LC codes with 10 random rank distributions
\(\bar {t}/m=0.5\)  \(\bar {t}/m=0.6\)  \(\bar {t}/m=0.7\)  \(\bar {t}/m=0.8\)  

Average  Min.  Max.  Average  Min.  Max.  Average  Min.  Max.  Average  Min.  Max.  
EC (q=2)  0.103  0.015  0.172  0.500  0.487  0.511  0.569  0.547  0.589  0.677  0.645  0.691 
EC  0.294  0.184  0.411  0.523  0.508  0.532  0.591  0.569  0.619  0.719  0.694  0.740 
BATS  0.497  0.495  0.498  0.598  0.598  0.598  0.698  0.696  0.699  0.798  0.798  0.759 
LC  0.478  0.470  0.486  0.581  0.570  0.592  0.687  0.673  0.697  0.786  0.778  0.792 
When the value of \(\bar t/m\) becomes larger, the achievable rate of EC codes consistently becomes closer to \(\bar t/m\). When \(\bar t/m=0.8\), for example, the average achievable rate of EC codes is nearly 90 % of \(\bar t/m\). It is not surprising to see that both BATS codes and LC codes outperform EC codes due to the much more complicated encoding process and degree distribution optimization in the former codes.
By comparing the maximum and minimum achievable rates, we notice that the performance of EC codes varies significantly for different rank distributions, especially when \(\bar t/m\) is relatively small. When \(\bar t/m = 0.5\), for some rank distributions, EC codes achieve more than 80 % of \(\bar t/m\); while for some other rank distributions, EC codes can only achieve less than half of the rate of BATS/LC codes.
In many potential applications of chunked codes, the rank distributions of the transfer matrices have certain features, instead of occurring purely randomly. For instance, the number of packets in a chunk received by the destination node is a summation of multiple binomial random variables, which can be roughly approximated by a Poisson random variable. Also, in an optimized transmission scheme, if the average packet loss rate over the network is higher, the number M _{ j } of packets transmitted for each chunk usually also becomes larger, so that the average rank \(\bar {t}\) has a relatively large value [18]. In practice, EC codes can benefit from these features of rank distributions and achieve much higher rates than with randomly generated rank distributions. Therefore, in the remainder of this section, we focus on the performance of EC codes in a practical scenario.
4.2 Line networks
We use the near-optimal chunk transmission scheme described in [18] over the line network. In this scheme, the chunks are transmitted in a sequential manner, and every node v, except for the destination node, transmits \(M_{j}^{(v)}\) packets of each chunk B _{ j }, where \(M_{j}^{(v)}\) is an integer-valued random variable. For the source node s, \(M_{j}^{(s)}\) is just the variable M _{ j } defined in Section 2.2. For all the network nodes and chunks, \(M_{j}^{(v)}\) has the same mean value \(\bar {M}\). For a fixed \(\bar {M}\), the distribution of \(M_{j}^{(v)}\) is optimized hop-by-hop according to the number of j-packets received/possessed by node v. The value of \(\bar {M}\) is chosen such that \(\bar {t}/\bar {M}\) is maximized, which is an upper bound on the network transmission rate that can be achieved by any chunked code under this transmission scheme.
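To get a feel for the quantity \(\bar {t}/\bar {M}\), the following Monte Carlo sketch estimates it for a heavily simplified line network. It is not the optimized scheme of [18]: it assumes that every node sends a fixed number M of packets per chunk, that each packet is lost independently with probability eps on every hop, and that the field is large enough for the rank after a hop to equal the minimum of the previous rank and the number of packets received. All names and parameters are hypothetical.

```python
import random


def simulate_rank(m, M, eps, hops, rng):
    """Rank of one chunk's transfer matrix at the destination (large-field idealization)."""
    rank = m
    for _ in range(hops):
        received = sum(1 for _ in range(M) if rng.random() >= eps)
        rank = min(rank, received)
    return rank


def estimate_normalized_rate(m, M, eps, hops, trials, seed=0):
    """Monte Carlo estimate of (average rank) / M."""
    rng = random.Random(seed)
    total = sum(simulate_rank(m, M, eps, hops, rng) for _ in range(trials))
    return total / trials / M


for M in (32, 34, 36, 40):
    print(M, round(estimate_normalized_rate(32, M, eps=0.1, hops=2, trials=5000), 3))
```

Sweeping M in this way illustrates the trade-off behind the choice of \(\bar {M}\): sending more packets raises the expected rank but lowers the rate per transmitted packet.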
Achievable network transmission rates of chunked codes in line networks with ε=0.1
| Network length | EC | LC | BATS | \(\bar {t}/\bar {M}\) | \(\bar {M}\) |
|---|---|---|---|---|---|
| 2 | 0.851 | 0.874 | 0.878 | 0.879 | 32 |
| 3 | 0.825 | 0.852 | 0.866 | 0.866 | 33 |
| 4 | 0.817 | 0.853 | 0.857 | 0.857 | 33 |
| 5 | 0.809 | 0.847 | 0.850 | 0.850 | 33 |
| 6 | 0.795 | 0.840 | 0.844 | 0.845 | 34 |
Achievable network transmission rates of chunked codes in line networks with ε=0.2
| Network length | EC | LC | BATS | \(\bar {t}/\bar {M}\) | \(\bar {M}\) |
|---|---|---|---|---|---|
| 2 | 0.743 | 0.764 | 0.772 | 0.773 | 35 |
| 3 | 0.718 | 0.752 | 0.756 | 0.757 | 36 |
| 4 | 0.702 | 0.741 | 0.745 | 0.746 | 36.5 |
| 5 | 0.691 | 0.732 | 0.737 | 0.738 | 37 |
| 6 | 0.682 | 0.727 | 0.731 | 0.731 | 37.5 |
Achievable network transmission rates of chunked codes in line networks with ε=0.4
| Network length | EC | LC | BATS | \(\bar {t}/\bar {M}\) | \(\bar {M}\) |
|---|---|---|---|---|---|
| 2 | 0.533 | 0.559 | 0.569 | 0.570 | 44 |
| 3 | 0.523 | 0.543 | 0.553 | 0.554 | 46 |
| 4 | 0.504 | 0.539 | 0.542 | 0.543 | 48 |
| 5 | 0.493 | 0.523 | 0.534 | 0.534 | 49 |
| 6 | 0.484 | 0.523 | 0.527 | 0.528 | 50 |
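As a reading aid for the three tables above, the following sketch computes, for each loss rate ε and network length, the fraction of the upper bound \(\bar {t}/\bar {M}\) that the EC rate attains. The numbers are copied from the tables; the variable and function names are illustrative only.

```python
# Rows: (network length, EC rate, upper bound t_bar / M_bar), copied from the tables.
TABLES = {
    0.1: [(2, 0.851, 0.879), (3, 0.825, 0.866), (4, 0.817, 0.857),
          (5, 0.809, 0.850), (6, 0.795, 0.845)],
    0.2: [(2, 0.743, 0.773), (3, 0.718, 0.757), (4, 0.702, 0.746),
          (5, 0.691, 0.738), (6, 0.682, 0.731)],
    0.4: [(2, 0.533, 0.570), (3, 0.523, 0.554), (4, 0.504, 0.543),
          (5, 0.493, 0.534), (6, 0.484, 0.528)],
}


def ec_fraction_of_bound(tables):
    """Map each loss rate to {network length: EC rate / upper bound}."""
    return {
        eps: {length: ec / bound for length, ec, bound in rows}
        for eps, rows in tables.items()
    }


fractions = ec_fraction_of_bound(TABLES)
for eps, by_length in fractions.items():
    cells = ", ".join(f"{length}: {frac:.1%}" for length, frac in by_length.items())
    print(f"eps={eps}: {cells}")

all_fractions = [f for by_length in fractions.values() for f in by_length.values()]
print(f"min {min(all_fractions):.1%}, max {max(all_fractions):.1%}")
```

With these table values, the EC rate stays between about 91.7 % and 96.8 % of the bound, with the smallest fraction at ε=0.4 and network length 6 and the largest at ε=0.1 and network length 2.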
4.3 Comparison with overlapped chunked codes
5 Conclusions
In this paper, we studied the performance of overlapped chunked codes with constant chunk sizes. We proposed and analyzed EC codes, a novel class of random regular graph-based chunked codes, which outperform state-of-the-art overlapped chunked codes. Compared with coded chunked codes, EC codes can achieve a rate very close to that of BATS codes and L-chunked codes in line networks with a proper optimization of the transmission scheme, while also supporting causal encoding and having lower encoding complexity.
6 Endnotes
^{1} For example, network protocols usually have a maximum transmission unit (MTU) ranging from hundreds to thousands of bytes.
^{2} EC codes were motivated by expander graphs, and the expansion property was applied in the first analysis of EC codes to obtain a lower bound on the achievable rates [31]. In this paper, we provide a better bound on the achievable rate without an explicit application of the expansion property, but the name of the code is preserved.
^{3} If both d and n are odd, then an EC code with n chunks and degree d can be generated by attaching an arbitrary chunk to an EC code with n−1 chunks and degree d using the described method, which does not affect the asymptotic performance of EC codes.
^{4} To the best of our knowledge, no efficient algorithms have been developed for uniformly sampling a rank distribution with a given mean value. Here, we use the following method for randomly sampling rank distributions. For a fixed \(\bar {t}\), denote \(a=\lfloor \bar {t}\rfloor \). We first sample a distribution (t _{0},t _{1},…,t _{ a }) over the set {0,1,…,a} and a distribution (t _{ a+1},t _{ a+2},…,t _{ m }) over the set {a+1,a+2,…,m} using the method in [41], which gives almost uniform sampling of distributions over the corresponding set. Let \(\eta =\left (\sum _{i=a+1}^{m} i t_{i}-\bar {t}\right)/\left (\sum _{i=a+1}^{m} i t_{i}-\sum _{i=0}^{a} i t_{i}\right)>0\). Then, we get a distribution (η t _{0},η t _{1},…,η t _{ a },(1−η)t _{ a+1},(1−η)t _{ a+2},…,(1−η)t _{ m }), whose expectation is equal to \(\bar {t}\).
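The mixing step in endnote 4 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the two component distributions are drawn by normalizing independent exponential variates, which samples uniformly from the probability simplex and is used here as a stand-in for the procedure of [41]. The function names and the requirement \(0 \le \bar {t} < m\) are assumptions made for this sketch.

```python
import random


def sample_simplex(n, rng):
    """Uniform sample from the probability simplex with n coordinates.

    Normalized i.i.d. exponential variates follow Dirichlet(1, ..., 1);
    this is an assumed stand-in for the sampling method of [41].
    """
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [x / total for x in draws]


def sample_rank_distribution(m, t_bar, rng=None):
    """Sample (t_0, ..., t_m) with mean exactly t_bar, following endnote 4."""
    if not 0 <= t_bar < m:
        raise ValueError("this sketch assumes 0 <= t_bar < m")
    rng = rng or random.Random()
    a = int(t_bar)  # floor of t_bar, since t_bar is non-negative
    low = sample_simplex(a + 1, rng)   # distribution over {0, ..., a}
    high = sample_simplex(m - a, rng)  # distribution over {a+1, ..., m}
    mean_low = sum(i * p for i, p in enumerate(low))
    mean_high = sum(i * p for i, p in enumerate(high, start=a + 1))
    # mean_high >= a + 1 > t_bar >= a >= mean_low, hence 0 < eta <= 1.
    eta = (mean_high - t_bar) / (mean_high - mean_low)
    return [eta * p for p in low] + [(1 - eta) * p for p in high]


# Example with hypothetical parameters: chunk size 32, average rank 16.
dist = sample_rank_distribution(m=32, t_bar=16.0, rng=random.Random(1))
mean = sum(i * p for i, p in enumerate(dist))
print(round(sum(dist), 6), round(mean, 6))
```

Because the mean of the mixture is \(\eta \mu _{\text {low}}+(1-\eta)\mu _{\text {high}}\), solving for the target \(\bar {t}\) gives exactly the expression for η in the endnote.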
Declarations
Acknowledgements
This work was partially supported by NSFC Grants (Nos. 61501221, 61170069, 61373014, 91218302, 61321491, and 61471215); Natural Science Foundation of Jiangsu Province Grant (No. BK20150588), Science and Technology Pillar Program (Industry) of Jiangsu Province Grant (No. BE2013116); Collaborative Innovation Center of Novel Software Technology and Industrialization; EU FP7 IRSES MobileCloud Project Grant (No. 612212); and a grant from the University Grants Committee of the Hong Kong Special Administrative Region (Project No. AoE/E02/08). The authors would like to thank the anonymous reviewers for their valuable comments and suggestions to improve the quality of the paper.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Authors’ Affiliations
References
[1] R Ahlswede, N Cai, SYR Li, RW Yeung, Network information flow. IEEE Trans. Inf. Theory. 46(4), 1204–1216 (2000).
[2] SYR Li, RW Yeung, N Cai, Linear network coding. IEEE Trans. Inf. Theory. 49(2), 371–381 (2003).
[3] T Ho, R Koetter, M Medard, DR Karger, M Effros, in Proc. IEEE International Symposium on Information Theory. The benefits of coding over routing in a randomized setting, (2003), p. 442.
[4] T Ho, M Medard, R Koetter, DR Karger, M Effros, J Shi, B Leong, A random linear network coding approach to multicast. IEEE Trans. Inf. Theory. 52(10), 4413–4430 (2006).
[5] Y Wu, in Proc. International Symposium on Information Theory. A trellis connectivity analysis of random linear network coding with buffering, (2006), pp. 768–772.
[6] AF Dana, R Gowaikar, R Palanki, B Hassibi, M Effros, Capacity of wireless erasure networks. IEEE Trans. Inf. Theory. 52(3), 789–804 (2006). doi:10.1109/TIT.2005.864424.
[7] DS Lun, M Medard, R Koetter, M Effros, On coding for reliable communication over packet networks. Phys. Commun. 1(1), 3–20 (2008).
[8] PA Chou, Y Wu, K Jain, in Proc. 41st Allerton Conference on Communication, Control, and Computing. Practical network coding, (2003), pp. 40–49.
[9] Z Liu, C Wu, B Li, S Zhao, in Proc. IEEE International Conference on Computer Communications. UUSee: Large-scale operational on-demand streaming with random network coding, (2010), pp. 1–9.
[10] D Silva, W Zeng, FR Kschischang, in Proc. Workshop on Network Coding. Sparse network coding with overlapping classes, (2009), pp. 74–79.
[11] A Heidarzadeh, AH Banihashemi, in Proc. IEEE Information Theory Workshop. Overlapped chunked network coding, (2010), pp. 1–5.
[12] Y Li, E Soljanin, P Spasojevic, Effects of the generation size and overlap on throughput and complexity in randomized linear network coding. IEEE Trans. Inf. Theory. 57(2), 1111–1123 (2011).
[13] S Yang, RW Yeung, in Proc. IEEE International Symposium on Information Theory. Coding for a network coded fountain (Saint Petersburg, Russia, 2011), pp. 2647–2651.
[14] S Yang, RW Yeung, Batched sparse codes. IEEE Trans. Inf. Theory. 60(9), 5322–5346 (2014).
[15] K Mahdaviani, M Ardakani, H Bagheri, C Tellambura, in Proc. International Symposium on Network Coding. Gamma codes: a low-overhead linear-complexity network coding solution, (2012), pp. 125–130.
[16] K Mahdaviani, R Yazdani, M Ardakani, in Proc. International Symposium on Network Coding. Overhead-optimized gamma network codes, (2013), poster.
[17] S Yang, B Tang, in Proc. IEEE Information Theory Workshop. From LDPC to chunked network codes, (2014), pp. 406–410.
[18] B Tang, S Yang, B Ye, S Lu, S Guo, Near-optimal one-sided scheduling for coded segmented network coding. IEEE Trans. Comput. (2015), to appear.
[19] P Pakzad, C Fragouli, A Shokrollahi, in Proc. IEEE International Symposium on Information Theory. Coding schemes for line networks, (2005), pp. 1853–1857. doi:10.1109/ISIT.2005.1523666.
[20] U Niesen, C Fragouli, D Tuninetti, On capacity of line networks. IEEE Trans. Inf. Theory. 53(11), 4039–4058 (2007).
[21] BN Vellambi, N Torabkhani, F Fekri, Throughput and latency in finite-buffer line networks. IEEE Trans. Inf. Theory. 57(6), 3622–3643 (2011). doi:10.1109/TIT.2011.2137070.
[22] Q Huang, K Sun, X Li, D Wu, in Proc. ACM International Symposium on Mobile Ad Hoc Networking and Computing. Just fun: a joint fountain coding and network coding approach to loss-tolerant information spreading (Philadelphia, PA, USA, 2014), pp. 83–92.
[23] X Xu, PKM Gandhi, YL Guan, PHJ Chong, Two-phase cooperative broadcasting based on batched network code. arXiv preprint arXiv:1504.04464, (2015). http://arxiv.org/abs/1504.04464.
[24] S Chachulski, M Jennings, S Katti, D Katabi, Trading structure for randomness in wireless opportunistic routing. SIGCOMM Comput. Commun. Rev. 37(4), 169–180 (2007).
[25] Y Lin, B Li, B Liang, in Proc. IEEE International Conference on Network Protocols. CodeOR: Opportunistic routing in wireless mesh networks with segmented network coding, (2008), pp. 13–22.
[26] D Koutsonikolas, CC Wang, YC Hu, Efficient network-coding-based opportunistic routing through cumulative coded acknowledgments. IEEE/ACM Trans. Netw. 19(5), 1368–1381 (2011).
[27] P Maymounkov, NJA Harvey, DS Lun, in Proc. 44th Allerton Conference on Communication, Control, and Computing. Methods for efficient network coding, (2006).
[28] M Luby, in Proc. 43rd Annual IEEE Symposium on Foundations of Computer Science. LT codes, (2002), pp. 271–282.
[29] A Shokrollahi, Raptor codes. IEEE Trans. Inf. Theory. 52(6), 2551–2567 (2006).
[30] K Mahdaviani, R Yazdani, M Ardakani, Linear-complexity overhead-optimized random linear network codes. arXiv preprint arXiv:1311.2123 (2013). http://arxiv.org/abs/1311.2123.
[31] B Tang, S Yang, Y Yin, B Ye, S Lu, in Proc. IEEE International Symposium on Information Theory. Expander graph based overlapped chunked codes (Cambridge, MA, USA, 2012), pp. 2451–2455.
[32] S Yang, RW Yeung, HF Cheung, HHF Yin, in Proc. 52nd Allerton Conference on Communication, Control, and Computing. BATS: Network coding in action, (2014), pp. 1204–1211.
[33] GE Andrews, The Theory of Partitions vol. 2 (Cambridge University Press, New York, NY, USA, 1998).
[34] M Gadouleau, Z Yan, Packing and covering properties of subspace codes for error control in random linear network coding. IEEE Trans. Inf. Theory. 56(5), 2097–2108 (2010).
[35] L Shi, N Wormald, in Surveys in combinatorics. Models of random regular graphs (LMS Lecture Note Series 267, 1999), pp. 239–298.
[36] BD McKay, NC Wormald, B Wysocka, Short cycles in random regular graphs. Electron. J. Comb. 11(1), 1–12 (2004).
[37] M Mitzenmacher, E Upfal, Probability and Computing: Randomized Algorithms and Probabilistic Analysis (Cambridge University Press, New York, NY, USA, 2004).
[38] BD McKay, NC Wormald, Uniform generation of random regular graphs of moderate degree. J. Algoritm. 11(1), 52–67 (1990).
[39] G Margulis, Explicit construction of concentrators. Probl. Peredachi Inf. 9(4), 325–332 (1973).
[40] S Hoory, N Linial, A Wigderson, Expander graphs and their applications. Bull. Amer. Math. Soc. 43(4), 439–561 (2006).
[41] NA Smith, RW Tromble, Sampling uniformly from the unit simplex. http://www.cs.cmu.edu/~nasmith/papers/smith+tromble.tr04.pdf.