A reconfigurable and compact subpipelined architecture for AES encryption and decryption
EURASIP Journal on Advances in Signal Processing volume 2023, Article number: 5 (2023)
Abstract
AES is used in many applications to provide data confidentiality. This paper presents a new 32-bit reconfigurable and compact architecture for AES encryption and decryption, implemented in a non-BRAM FPGA. It can be reconfigured for different key sizes, which lets users apply AES flexibly in various application environments. The proposed design employs a single-round architecture and subpipelining to minimize the hardware cost. Performing encryption/decryption and the keyschedule entirely in the composite field GF((2^{4})^{2}) leads to lower hardware complexity and efficient subpipelining for the 32-bit data path. In addition, a new subpipelined on-the-fly keyschedule over the composite field GF((2^{4})^{2}) is proposed for all standard key sizes (128, 192 and 256 bits), which generates the roundkeys simultaneously and efficiently. This feature is useful whenever the main key changes, since AES is a symmetric-key cipher and the session key usually changes frequently. The proposed reconfigurable and compact design has higher throughput and lower hardware cost. It achieves throughputs of 375 Mbits/s with a 128-bit key, 318 Mbits/s with a 192-bit key and 275 Mbits/s with a 256-bit key on a VIRTEX XC4VSX25-12, and the total number of slices is 1766. The proposed reconfigurable and compact AES architecture can be efficiently applied in computing-restricted environments such as wireless and embedded devices.
1 Introduction
The Advanced Encryption Standard (AES), based on the Rijndael encryption algorithm, has been used to replace DES in security services [1,2,3]. Hardware AES implementations are attractive because they provide better throughput as well as higher physical security. Compared with application-specific integrated circuits (ASICs), field programmable gate arrays (FPGAs) have become increasingly popular because of their scalability, reprogrammability, and fast development.
Numerous FPGA [4,5,6,7,8,9,10] and ASIC [7, 11, 12] implementations of AES have been presented and evaluated. Other AES implementations have also been proposed, such as GPU-based [13], multicore processor-based [14], and rapid single-flux-quantum circuit-based [15] implementations. Fully unrolled schemes [6, 8] can achieve high throughput, but their much higher area and energy cost makes them suitable only for high-end applications. Another approach is to implement only a single-round unit and reuse the same unit in different rounds.
In this paper, a compact and reconfigurable design of AES with low hardware cost and adequate throughput is proposed and implemented in a non-BRAM FPGA. This design applies a 32-bit single-round unit, which costs much less hardware area than the 128-bit fully unrolled schemes. In order to reduce the hardware complexity further, we convert the arithmetic operations of AES from the field GF(2^{8}) to the field GF((2^{4})^{2}). Unlike the previous designs in [6, 8, 12, 16], where a partial composite field AES is applied, we conduct all AES operations in GF((2^{4})^{2}) to minimize the overhead of the isomorphic mapping functions. In our design, only two forward mapping functions and one backward mapping function are used. In addition, subpipelining is applied to improve the throughput/area ratio.
The standard announced by NIST [2] specifies AES as a block cipher with a 128-bit block size and key sizes of 128, 192 and 256 bits. These three key sizes are specified for various security levels. The capability to handle all key sizes makes reconfigurability an important feature of AES implementations. Previous work [6, 8, 12, 17,18,19,20] applied an on-the-fly key generator to support instant key changes. The design in [8] used a subpipelined keyschedule, but it supported only the 128-bit key size. When a subpipelined on-the-fly keyschedule is employed in an AES implementation, the stages in the keyschedule must be synchronized with the stages in the cipher, because they share the same clock. In this design, we propose a subpipelined on-the-fly keyschedule over the field GF((2^{4})^{2}), which supports all three key sizes.
Secure communication in computing-restricted environments, such as personal digital assistants (PDAs), wireless devices, and many other embedded devices, has recently become more important. To be applied in these devices, AES implementations must be cost efficient. The objective of this research is to design a reconfigurable and compact AES architecture that can be applied to computing-restricted devices. The proposed architecture can be reconfigured for the three AES key sizes, which is useful when users change the main key or switch key sizes for different security levels; because AES is a symmetric-key cipher, the session key usually changes frequently. We also propose a subpipelined on-the-fly keyschedule for the three key sizes, which makes the proposed architecture easy to implement on a non-BRAM FPGA.
The remainder of the paper is organized as follows. In Sect. 2, the AES algorithm is introduced. The proposed compact and reconfigurable AES architecture is presented in Sect. 3. Implementation and performance are covered in Sect. 4. Sections 5 and 6 present the conclusion and future work.
2 AES algorithm
AES is a symmetric block cipher with a block size of 128 bits and three key sizes (128, 192 or 256 bits) [1,2,3]. The AES parameters depend on the key size (Table 1; the word size is 32 bits). AES runs iteratively on four transformations (inv/Subbytes, inv/ShiftRows, inv/MixColumns and addroundkey), applied in different sequences in encryption and decryption. Figure 1 illustrates the basic architecture of AES. In the initial round (r = 0), only addroundkey is performed; the final round (r = Nr) skips inv/MixColumns. The keyschedule module expands the cipherkey to (Nr + 1) × 4 words of roundkeys. Each round applies a unique 128-bit roundkey in the addroundkey operation [1, 2].
2.1 Subbytes
Subbytes is the only nonlinear transformation in AES; it is also called the SBox. The SBox is a 16 × 16 matrix containing all 256 possible 8-bit values, which is used to perform a nonlinear byte-by-byte substitution of the state.
Considering a byte {x_{7}x_{6}x_{5}x_{4}x_{3}x_{2}x_{1}x_{0}}, Subbytes transformation has two steps [1]:

(i) Compute {x’_{7} x’_{6} x’_{5} x’_{4} x’_{3} x’_{2} x’_{1} x’_{0}}, the multiplicative inverse of the byte in the field GF(2^{8}), modulo the irreducible polynomial m(x) = x^{8} + x^{4} + x^{3} + x + 1; the multiplicative inverse of {00000000} is defined to be itself;

(ii) Apply an affine transformation over GF(2) to the inverse of {x_{7}x_{6}x_{5}x_{4}x_{3}x_{2}x_{1}x_{0}} (Eq. 1 [1]).
$$\left[ {\begin{array}{*{20}c} {y_{0} } \\ {y_{1} } \\ {y_{2} } \\ {y_{3} } \\ {y_{4} } \\ {y_{5} } \\ {y_{6} } \\ {y_{7} } \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} 1 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\ 1 & 1 & 0 & 0 & 0 & 1 & 1 & 1 \\ 1 & 1 & 1 & 0 & 0 & 0 & 1 & 1 \\ 1 & 1 & 1 & 1 & 0 & 0 & 0 & 1 \\ 1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 1 & 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 & 1 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 \\ \end{array} } \right]\;\;\left[ {\begin{array}{*{20}c} {x^{\prime}_{0} } \\ {x_{1}^{^{\prime}} } \\ {x_{2}^{^{\prime}} } \\ {x_{3}^{^{\prime}} } \\ {x_{4}^{^{\prime}} } \\ {x_{5}^{^{\prime}} } \\ {x_{6}^{^{\prime}} } \\ {x_{7}^{^{\prime}} } \\ \end{array} } \right] + \left[ {\begin{array}{*{20}c} 1 \\ 1 \\ 0 \\ 0 \\ 0 \\ 1 \\ 1 \\ 0 \\ \end{array} } \right]$$(1)
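As an illustration of the two steps above, the following Python sketch computes Subbytes directly in GF(2^{8}). The function names (`gf256_mul`, `gf256_inv`, `affine`, `subbyte`) are hypothetical, and the inverse is found by exhaustive search for clarity only; this is a software reference, not the hardware method proposed in this paper.

```python
def gf256_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo m(x) = x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:          # reduce as soon as the degree reaches 8
            a ^= 0x11B
    return result


def gf256_inv(a: int) -> int:
    """Step (i): multiplicative inverse; {00} maps to itself."""
    if a == 0:
        return 0
    return next(c for c in range(1, 256) if gf256_mul(a, c) == 1)


def affine(x: int) -> int:
    """Step (ii), Eq. 1: y_i = x_i ^ x_{i+4} ^ x_{i+5} ^ x_{i+6} ^ x_{i+7} ^ c_i."""
    y = 0
    for i in range(8):
        bit = (x >> i) & 1
        for k in (4, 5, 6, 7):
            bit ^= (x >> ((i + k) % 8)) & 1   # indices taken modulo 8
        y |= bit << i
    return y ^ 0x63                            # constant column of Eq. 1


def subbyte(x: int) -> int:
    return affine(gf256_inv(x))
```

For example, `subbyte(0x00)` returns `0x63`, the constant vector of Eq. 1, because the inverse of {00} is {00}.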
2.2 ShiftRows
This transformation circularly shifts each row of the state to the left during encryption. As in Fig. 2 [1], the top row of the state is denoted row(0), and the bottom row is denoted row(3). ShiftRows performs an i-byte circular left shift on row(i) (i = 0, 1, 2, 3).
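A minimal Python sketch of this rule, assuming the state is stored as a list of four rows (`state[i][j]` is the byte in row i, column j; this layout and the name `shift_rows` are chosen here for illustration):

```python
def shift_rows(state):
    """Rotate row(i) of a 4x4 state left by i bytes."""
    return [row[i:] + row[:i] for i, row in enumerate(state)]


# Each byte value encodes its original (row, column) position.
state = [
    [0x00, 0x01, 0x02, 0x03],
    [0x10, 0x11, 0x12, 0x13],
    [0x20, 0x21, 0x22, 0x23],
    [0x30, 0x31, 0x32, 0x33],
]
shifted = shift_rows(state)
```

After the call, row(0) is unchanged and row(1) starts with the byte that was in column 1.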
2.3 MixColumns
This transformation treats each column of the state as a four-term polynomial over GF(2^{8}) and transforms each column into a new one by multiplying it with the constant polynomial a(x) = {03}x^{3} + {01}x^{2} + {01}x + {02} modulo x^{4} + 1. Equation 2 [1] is the matrix form of MixColumns.
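The multiplication by a(x) modulo x^{4} + 1 can be sketched in Python for a single column as follows. The names `xtime` and `mix_column` are illustrative; each output byte combines the four input bytes with the coefficients {02}, {03}, {01}, {01} in cyclically rotated order, which is the standard matrix form from the AES specification.

```python
def xtime(b: int) -> int:
    """Multiply a byte by {02} in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    b <<= 1
    return b ^ 0x11B if b & 0x100 else b


def mul3(b: int) -> int:
    """Multiply by {03} = {02} ^ {01}."""
    return xtime(b) ^ b


def mix_column(col):
    """Apply MixColumns to one column [s0, s1, s2, s3]."""
    s0, s1, s2, s3 = col
    return [
        xtime(s0) ^ mul3(s1) ^ s2 ^ s3,   # row {02} {03} {01} {01}
        s0 ^ xtime(s1) ^ mul3(s2) ^ s3,   # row {01} {02} {03} {01}
        s0 ^ s1 ^ xtime(s2) ^ mul3(s3),   # row {01} {01} {02} {03}
        mul3(s0) ^ s1 ^ s2 ^ xtime(s3),   # row {03} {01} {01} {02}
    ]
```

A column whose four bytes are equal is left unchanged, since {02} ⊕ {03} ⊕ {01} ⊕ {01} = {01}.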
2.4 Addroundkey
The addroundkey is a simple logical XOR of the current state with a roundkey which is generated by the keyschedule.
2.5 Keyschedule
Keyschedule derives roundkeys from the cipherkey. It consists of key expansion and roundkey selections. Figure 3 [2] shows the keyschedule algorithm which generates roundkeys for AES128, AES192 and AES256. The functions used in keyschedule are the following [1, 2]:

Rotword: One-byte circular left shift on a word.

Subword: Using SBox to perform a byte substitution on each byte.

XOR with Rcon: XORing with a round constant Rcon[j], Rcon[j] = (RC[j], 0, 0, 0) with RC[1] = 1, RC[j] = 2 · RC[j − 1].
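The three functions above combine into the key expansion of the AES standard. The sketch below is a plain software reference in Python over GF(2^{8}), not the subpipelined composite-field keyschedule proposed later in this paper; the names `gf_mul`, `sbox`, `rot_word`, `sub_word` and `expand_key` are hypothetical, and words are assumed to be stored big-endian so that RC[j] occupies the most significant byte.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        b >>= 1
        a = ((a << 1) ^ (0x11B if a & 0x80 else 0)) & 0xFF
    return r


def sbox(x: int) -> int:
    """Subbytes: multiplicative inverse, then the affine transformation of Eq. 1."""
    inv = next((c for c in range(1, 256) if gf_mul(x, c) == 1), 0)
    y = inv
    for k in range(1, 5):                       # XOR of four rotated copies
        y ^= ((inv << k) | (inv >> (8 - k))) & 0xFF
    return y ^ 0x63


SBOX = [sbox(x) for x in range(256)]


def rot_word(w: int) -> int:
    """Rotword: one-byte circular left shift on a 32-bit word."""
    return ((w << 8) | (w >> 24)) & 0xFFFFFFFF


def sub_word(w: int) -> int:
    """Subword: SBox substitution on each byte of the word."""
    return int.from_bytes(bytes(SBOX[b] for b in w.to_bytes(4, "big")), "big")


def expand_key(key: bytes) -> list:
    """Return the (Nr + 1) x 4 roundkey words for a 16-, 24- or 32-byte key."""
    nk = len(key) // 4
    if len(key) not in (16, 24, 32):
        raise ValueError("key must be 128, 192 or 256 bits")
    nr = nk + 6
    w = [int.from_bytes(key[4 * i:4 * i + 4], "big") for i in range(nk)]
    rc = 1                                      # RC[1] = 1
    for i in range(nk, 4 * (nr + 1)):
        temp = w[i - 1]
        if i % nk == 0:
            temp = sub_word(rot_word(temp)) ^ (rc << 24)   # XOR with Rcon
            rc = gf_mul(rc, 2)                  # RC[j] = 2 * RC[j - 1]
        elif nk == 8 and i % nk == 4:           # extra Subword for 256-bit keys
            temp = sub_word(temp)
        w.append(w[i - nk] ^ temp)
    return w
```

The returned list has 44, 52 or 60 words for the 128-, 192- and 256-bit key sizes, respectively.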
3 32-bit subpipelined reconfigurable and compact architecture for AES
In this section, the 32-bit reconfigurable and compact AES architecture is proposed. In our design, the data path is 32 bits wide. That is, each operation (for example, the SBox) is applied four times to process the 128 bits of one plaintext block. The subpipelined on-the-fly keyschedule for different key sizes is also presented; it provides the roundkeys simultaneously and efficiently. In addition, the equivalent cipher [1] is adopted so that encryption and decryption have the same data flow and can share the reusable units.
3.1 32-bit single-round unit
The unrolled (unfolded) architecture is widely used to achieve high throughput. It processes multiple rounds of one block by implementing more than one round unit in hardware. The more round units the architecture includes, the higher the hardware cost. The alternative scheme, called the single-round unit architecture, can be applied to reduce the hardware complexity. Instead of unfolding all the round units in the device, it implements a single round unit, which costs approximately 1/N_{r} of the area of the unfolded scheme.
We propose a 32-bit single-round unit for a compact AES architecture. It iterates four times to perform one round on a 128-bit block, processing 32 bits per iteration.
3.2 Full composite field architecture with keyschedule
Many high-end FPGA devices possess Block RAMs (BRAMs), which are efficient for implementing the SBox. The SBox, also referred to as Subbytes, is an important and complicated operation in both the encryptor/decryptor and keyschedule modules. However, BRAM-based designs cannot be implemented in low-cost devices that do not have BRAMs. An alternative approach is to implement the SBox in combinational logic, but this method may lead to high hardware complexity because of the mathematical operations in AES over the finite field GF(2^{8}).
The key step of SBox is calculating multiplicative inverse of each byte. Since the introduction of composite field GF((2^{4})^{2}), the calculation of multiplicative inverse over GF((2^{4})^{2}) has been investigated [6, 8, 12, 16]. The architectures in [5, 18, 21] applied the field GF((2^{4})^{2}) to affine transformation in SBox. By decomposing these operations from GF(2^{8}) to its subfield GF(2^{4}), the hardware complexity of SBox can be decreased dramatically.
In Fig. 4a, each round needs an isomorphic mapping function (MAP) from GF(2^{8}) to GF((2^{4})^{2}) before the SBox and the inverse mapping (MAP^{−1}) afterwards. If the key size is 128 bits, the SBox is applied 10 times to the plaintext and 10 times to the cipherkey, which means that 20 MAPs and 20 MAP^{−1}s are needed to encrypt 128 bits of data. In order to save the cost of MAP and MAP^{−1}, we propose a new 32-bit complete composite field approach (Fig. 4b). The field GF((2^{4})^{2}) is used in all transformations in the encryptor/decryptor and the keyschedule. As illustrated in Fig. 4b, one MAP and one MAP^{−1} are applied in encryption, and one MAP is applied in the keyschedule. This is a constant overhead that is not affected by the number of rounds.
We use the composite field defined by Wolkerstorfer et al. [21]. There are two irreducible polynomials (Eqs. 3 and 4) involved in multiplication and inversion in GF((2^{4})^{2}).
The irreducible polynomial for the field GF(2^{8}) in AES is:
$$m(x) = x^{8} + x^{4} + x^{3} + x + 1$$(5)
The isomorphic mapping functions between field GF(2^{8}) and field GF((2^{4})^{2}) are determined by the irreducible polynomials of field GF(2^{8}) (Eq. 5) and field GF((2^{4})^{2}) (Eqs. 3 and 4). We use the following mapping formulas in [21] to convert the representations between GF(2^{8}) and GF((2^{4})^{2}).
In Eq. 6, a is an element in field GF(2^{8}). MAP(a) converts a to its isomorphic element in GF((2^{4})^{2}), which is represented as a_{h}x + a_{l}.
In Eq. 7, a_{h}x + a_{l} is an element in field GF((2^{4})^{2}). MAP^{−1}(a_{h}x + a_{l}) converts a_{h}x + a_{l} to its isomorphic element in GF(2^{8}), which is represented as a.
3.3 Subpipelined encryptor/decryptor and keyschedule
Pipelining is applied in the design to optimize the speed/area ratio of AES. By inserting registers into the combinational logic, multiple hardware stages run simultaneously. The clock frequency of the design is determined by the maximum delay between registers. We reduce the maximum delay and increase the frequency by optimizing the balance between stages. A 32-bit single-round subpipelined architecture in the full composite field is proposed, in which one round unit is implemented and subpipelined into eight substages. To generate the roundkeys synchronously, we present an on-the-fly keyschedule. The encryption/decryption unit and the key expansion unit share the same clock, so the clock frequency is determined by the maximum delay over both units. This makes balancing the substages in the keyschedule as important as in the encryptor/decryptor. We propose a new subpipelined keyschedule on the composite field for all standard key sizes. The most costly part of the keyschedule is still the SBox. We divide it into the same substages as in the encryptor/decryptor.
3.4 Double-block subpipelined architecture
The proposed architecture for the encryptor is illustrated in Fig. 5. Decryption can be easily implemented by the equivalent cipher [1]. Eight 32-bit registers (four in ShiftRows, three in Subbytes and one between Subbytes and MixColumns) cut one round unit into eight substages, which leads to an initial delay of eight clock cycles before the first 32 bits of ciphertext are generated. The clk counter is a clock register counter generated in the keyschedule. It is used to synchronize the encryptor/decryptor and the keyschedule. We use a double-block (block A and block B) data flow in our subpipelined architecture.
Figure 5a illustrates the subpipelining in the ShiftRows operation, and Fig. 5b shows the subpipelining in the Subbytes operation. The mappings from GF(2^{8}) to GF((2^{4})^{2}) are required only once, right after the plaintext and cipherkey inputs. The inverse mapping (GF((2^{4})^{2}) to GF(2^{8})) is applied to the final output in order to obtain the ciphertext.
The 3-to-1 multiplexer (“mul”) is controlled by the clk counter:

Case a In the initial round, where 0 ≤ clk counter < 8, the 128-bit plaintext is MAPed into GF((2^{4})^{2}) and XORed with the corresponding roundkey in four clock cycles, 32 bits at each clock. The result is the outcome of the initial round (r = 0), which is the input of the following round;

Case b In normal rounds, where 8 ≤ clk counter < Nr × 8, the output of MixColumns is XORed with the corresponding roundkey;

Case c In the last round, where Nr × 8 ≤ clk counter < (Nr + 1) × 8, the output of Subbytes is XORed with the corresponding roundkey, and the ciphertext is obtained.
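The three cases amount to a range test on the clk counter. The Python sketch below models only this selection logic; the function name `mux_case` and the string labels are hypothetical, and the eight counts per round are assumed to cover both blocks A and B (four 32-bit words each).

```python
def mux_case(clk_counter: int, nr: int) -> str:
    """Return which input the 3-to-1 multiplexer selects for a given clk counter.

    "a": MAPed plaintext (initial round)
    "b": output of MixColumns (normal rounds)
    "c": output of Subbytes (last round, MixColumns skipped)
    """
    if not 0 <= clk_counter < (nr + 1) * 8:
        raise ValueError("clk counter outside one encryption pass")
    if clk_counter < 8:
        return "a"
    if clk_counter < nr * 8:
        return "b"
    return "c"
```

With Nr = 10 (128-bit key), counts 0 to 7 select case a, counts 8 to 79 select case b, and counts 80 to 87 select case c.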
The detailed operations of ShiftRows, Subbytes and MixColumns are presented below.
3.4.1 ShiftRows
We use our previously proposed ShiftRows operation [22] in the design. It includes sixteen 8-bit registers and three 2-to-1 multiplexers. The block of data is shifted column by column. Two blocks of data are processed in the pipeline.
Our ShiftRows operation is designed in a column fashion (Fig. 6). In this architecture, the 32-bit data are shifted in column order instead of row order. Each column is composed of four shift registers, and each register has 8 bits. Transforming the ShiftRows operation into a column-wise operation makes the design of the MixColumns operation easier, since all the data in one column are required by MixColumns.
The ShiftRows procedure for encryption is as follows.

(1)
First row No shift. We just let the data flow through.

(2)
Second row Circular left shift operation. In this case, we connect the output of register R1C2 and the output of R1C3 to a multiplexer in order to select the output.

(3)
Third row Switch data. Switch the data between first element and third element, second element and fourth element in the row. The outputs of R2C1 and R2C3 are connected to a Multiplexer.

(4)
Fourth row Circular right shift operation. Similar to the case of second row, we connect the output of register R3C0 and the output of R3C3 to a Multiplexer.
Similarly, we can derive the procedures for Inverse ShiftRows (InvShiftRows) operations:

(1)
First row No shift.

(2)
Second row Circular right shift operation. We connect the output of register R1C0 and the output of R1C3 to a Multiplexer.

(3)
Third row Switch Data. Same as the operation in ShiftRows of encryption.

(4)
Fourth row Circular left shift operation. We connect the output of register R3C2 and the output of R3C3 to a multiplexer in order to select the output.
The multiplexers are controlled by some clock counters and the encryption/decryption signals.
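The per-row operations listed above can be checked against the definition of ShiftRows with a short Python sketch. It models only the data movement, not the register and multiplexer structure of Fig. 6; the function names are hypothetical.

```python
def rotate_left(row, n):
    """Circular left shift of a 4-byte row by n positions (n may be negative)."""
    n %= len(row)
    return row[n:] + row[:n]


def swap_halves(row):
    """Third row: swap elements 0 <-> 2 and 1 <-> 3 (a two-byte rotation)."""
    return [row[2], row[3], row[0], row[1]]


def shift_rows_enc(state):
    """ShiftRows: no shift, left shift, switch data, right shift."""
    return [
        list(state[0]),
        rotate_left(state[1], 1),
        swap_halves(state[2]),
        rotate_left(state[3], -1),   # right shift by 1 == left shift by 3
    ]


def shift_rows_dec(state):
    """InvShiftRows: no shift, right shift, switch data, left shift."""
    return [
        list(state[0]),
        rotate_left(state[1], -1),
        swap_halves(state[2]),
        rotate_left(state[3], 1),
    ]
```

The fourth row uses a one-byte right shift because it is equivalent to the three-byte left shift required by the standard, and the third-row switch is its own inverse, which is why it is shared by encryption and decryption.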
3.4.2 Subpipelined Subbytes
The key step of Subbytes is the calculation of the multiplicative inverse. Figure 7 illustrates the architecture of Subbytes proposed in [8]. It uses multiplication in GF(2^{4})^{2} three times. It also needs one inversion (x^{−1}), one constant multiplier with {e} (×e; {e} is in hexadecimal notation, which is ‘1110’ in binary notation), one squarer and two 4-bit XORs (⊕).
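The operation count above matches the usual closed form for inversion in a quadratic extension. The Python sketch below assumes the subfield polynomial x^{4} + x + 1 for GF(2^{4}) and the extension polynomial x^{2} + x + {e}; both are assumptions of this sketch, chosen because they are consistent with the constant {e} and with Eqs. 12 and 28 of this paper. All function names are hypothetical, and no correspondence to the multiplier labels of Fig. 7 is implied.

```python
E = 0xE  # the constant {e}


def gf16_mul(a: int, b: int) -> int:
    """Multiply in GF(2^4) modulo x^4 + x + 1 (assumed subfield polynomial)."""
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a
        b >>= 1
        a = ((a << 1) ^ (0x13 if a & 0x8 else 0)) & 0xF
    return r


def gf16_inv(a: int) -> int:
    """Inverse in GF(2^4) by search; 0 maps to 0."""
    return next((c for c in range(1, 16) if gf16_mul(a, c) == 1), 0)


def composite_mul(a: int, b: int) -> int:
    """Multiply a_h x + a_l by b_h x + b_l modulo x^2 + x + {e}."""
    ah, al, bh, bl = a >> 4, a & 0xF, b >> 4, b & 0xF
    hh = gf16_mul(ah, bh)
    high = hh ^ gf16_mul(ah, bl) ^ gf16_mul(al, bh)
    low = gf16_mul(hh, E) ^ gf16_mul(al, bl)
    return (high << 4) | low


def composite_inv(a: int) -> int:
    """Invert a_h x + a_l using one squarer, one x{e}, three multiplications,
    one GF(2^4) inversion and two 4-bit XORs."""
    ah, al = a >> 4, a & 0xF
    t = gf16_mul(gf16_mul(ah, ah), E)    # squarer, then constant multiplier x{e}
    s = ah ^ al                          # first 4-bit XOR
    d = t ^ gf16_mul(s, al)              # first multiplication, second XOR
    d_inv = gf16_inv(d)                  # x^{-1} in GF(2^4)
    return (gf16_mul(ah, d_inv) << 4) | gf16_mul(s, d_inv)   # two more multiplications
```

Here d = a_h^2{e} ⊕ a_h a_l ⊕ a_l^2, and the result is (a_h d^{−1})x + (a_h ⊕ a_l)d^{−1}.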
We previously proposed a 32-bit subpipelined compact SBox architecture in the composite field GF(2^{4})^{2} with balanced substages and efficient performance [23]. Consider x, y, z ∈ GF(2^{4}), represented in binary notation as \(x=\left\{{x}_{3}{x}_{2}{x}_{1}{x}_{0}\right\}, y=\left\{{y}_{3}{y}_{2}{y}_{1}{y}_{0}\right\},z=\left\{{z}_{3}{z}_{2}{z}_{1}{z}_{0}\right\}\). Let a, b, c, d, e and f be 1-bit values, each equal to 0 or 1. ⊕ stands for the XOR operation, and x_{0}y_{1} means x_{0}∧y_{1}. Equations 8, 9, 10 and 11 [21] are used to calculate squaring, constant multiplication with {e}, multiplication and the multiplicative inverse.
In our design which is illustrated in Fig. 5, Subbytes should be cut into four substages. The key to an efficient subpipelining technology is to balance the delays of these substages.
We derive a new Eq. 12 from Eq. 11 to reduce the delay caused by x^{−1}.
Equation 12 is derived in three steps:

(1)
In Eq. 11, replace “a” by its expression:
$$\begin{gathered} y_{0} = x_{1} \oplus x_{2} \oplus x_{3} \oplus \left( {x_{1} x_{2} x_{3} } \right) \oplus x_{0} \oplus \left( {x_{0} x_{2} } \right) \oplus \left( {x_{1} x_{2} } \right) \oplus \left( {x_{0} x_{1} x_{2} } \right) \hfill \\ y_{1} = \left( {x_{0} x_{1} } \right) \oplus \left( {x_{0} x_{2} } \right) \oplus \left( {x_{1} x_{2} } \right) \oplus x_{3} \oplus \left( {x_{1} x_{3} } \right) \oplus \left( {x_{0} x_{1} x_{3} } \right) \hfill \\ y_{2} = \left( {x_{0} x_{1} } \right) \oplus x_{2} \oplus \left( {x_{0} x_{2} } \right) \oplus x_{3} \oplus \left( {x_{0} x_{3} } \right) \oplus \left( {x_{0} x_{2} x_{3} } \right) \hfill \\ y_{3} = x_{1} \oplus x_{2} \oplus x_{3} \oplus \left( {x_{1} x_{2} x_{3} } \right) \oplus \left( {x_{0} x_{3} } \right) \oplus \left( {x_{1} x_{3} } \right) \oplus \left( {x_{2} x_{3} } \right) \hfill \\ \end{gathered}$$
(2)
The expressions in step 1 can be equally changed to:
$$\begin{gathered} y_{0} = x_{1} \oplus x_{2} \oplus \left( {x_{1} x_{2} } \right) \oplus \left( {x_{0} x_{2} } \right) \oplus \left( {x_{0} \oplus x_{3} } \right)\left( {1 \oplus \left( {x_{1} x_{2} } \right)} \right) \hfill \\ y_{1} = \left( {x_{0} x_{1} } \right) \oplus \left( {x_{0} x_{2} } \right) \oplus \left( {x_{1} x_{2} } \right) \oplus x_{3} \left( {1 \oplus x_{1} \oplus \left( {x_{0} x_{1} } \right)} \right) \hfill \\ y_{2} = \left( {x_{0} x_{1} } \right) \oplus x_{2} \oplus \left( {x_{0} x_{2} } \right) \oplus x_{3} \left( {1 \oplus x_{0} \oplus \left( {x_{0} x_{2} } \right)} \right) \hfill \\ y_{3} = x_{1} \oplus x_{2} \oplus x_{3} \left( {1 \oplus x_{0} \oplus x_{1} \oplus x_{2} \oplus \left( {x_{1} x_{2} } \right)} \right) \hfill \\ \end{gathered}$$
(3)
Let \(a = x_{1} x_{2} , b = x_{0} x_{2} , c = x_{0} x_{1} , d = x_{1} \oplus x_{2} , e = 1 \oplus a, f = b \oplus c\), we have:
$$\begin{gathered} y_{0} = a \oplus b \oplus d \oplus \left( {\left( {x_{0} \oplus x_{3} } \right)e} \right) \hfill \\ y_{1} = a \oplus f \oplus x_{3} \left( {1 \oplus x_{1} \oplus c} \right) \hfill \\ y_{2} = f \oplus x_{2} \oplus x_{3} \left( {1 \oplus x_{0} \oplus b} \right) \hfill \\ y_{3} = d \oplus x_{3} \left( {e \oplus x_{0} \oplus d} \right) \hfill \\ \end{gathered}$$(12)
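Equation 12 can be transcribed bit by bit and checked exhaustively over the 16 elements of GF(2^{4}). The Python sketch below assumes the subfield polynomial x^{4} + x + 1, which is an assumption of this sketch; the names `gf16_mul` and `gf16_inv_eq12` are hypothetical.

```python
def gf16_mul(a: int, b: int) -> int:
    """Reference multiplication in GF(2^4) modulo x^4 + x + 1 (assumed)."""
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a
        b >>= 1
        a = ((a << 1) ^ (0x13 if a & 0x8 else 0)) & 0xF
    return r


def gf16_inv_eq12(x: int) -> int:
    """Multiplicative inverse in GF(2^4), transcribed from Eq. 12."""
    x0, x1, x2, x3 = [(x >> i) & 1 for i in range(4)]
    # Shared intermediate terms of step (3)
    a = x1 & x2
    b = x0 & x2
    c = x0 & x1
    d = x1 ^ x2
    e = 1 ^ a
    f = b ^ c
    y0 = a ^ b ^ d ^ ((x0 ^ x3) & e)
    y1 = a ^ f ^ (x3 & (1 ^ x1 ^ c))
    y2 = f ^ x2 ^ (x3 & (1 ^ x0 ^ b))
    y3 = d ^ (x3 & (e ^ x0 ^ d))
    return y0 | (y1 << 1) | (y2 << 2) | (y3 << 3)
```

For instance, `gf16_inv_eq12(0x2)` returns `0x9`, and the zero element maps to zero, matching the convention used for the SBox.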
According to Eq. 12, we design the logic circuit illustrated in Fig. 8 to perform x^{−1} over GF(2^{4})^{2}. Besides the multiplicative inversion, the other operations in Fig. 7 are the three multiplications (×_{1}, ×_{2} and ×_{3}). In order to decrease the maximum delay caused by multiplication, we separate each multiplication into two steps and place the steps in different substages. The register between the substages stores the result of the first step of the multiplication and passes it to the second step. We decompose these three multipliers in two different manners (AB-type and MN-type) to achieve the best balance.
AB-type The AB-type multiplication is based on Eq. 13, which is derived from Eq. 10. Step A calculates the values of all the binomials; Step B XORs every four values to generate z_{0}, z_{1}, z_{2} and z_{3}. A register is inserted between Step A and Step B to store p_{0}, p_{1}, …, p_{15}. The multiplication “×_{1}” in Fig. 7 is separated into ×_{1A} and ×_{1B} in Fig. 8;
Step A:
Step B:
MN-type The MN-type multiplication is based on Eq. 14, which is also derived from Eq. 10. Step M creates the values of a, b and c; Step N implements the rest of Eq. 10. A register is inserted between Step M and Step N to store a, b and c. The multiplications “×_{2}” and “×_{3}” in Fig. 7 are separated into ×_{2M} and ×_{2N}, and ×_{3M} and ×_{3N}, in Fig. 8.
Step M:
Step N:
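Both two-step decompositions described above can be sketched in Python. The bit-level formulas below are derived here directly from multiplication modulo x^{4} + x + 1 (an assumed subfield polynomial); the shared terms a = x_{0} ⊕ x_{3}, b = x_{2} ⊕ x_{3}, c = x_{1} ⊕ x_{2}, the ordering of p_{0}, …, p_{15} and all function names are assumptions of this sketch and may differ from the paper's Eqs. 13 and 14.

```python
def bits4(v: int):
    return [(v >> i) & 1 for i in range(4)]


def gf16_mul(a: int, b: int) -> int:
    """Reference multiplication in GF(2^4) modulo x^4 + x + 1 (assumed)."""
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a
        b >>= 1
        a = ((a << 1) ^ (0x13 if a & 0x8 else 0)) & 0xF
    return r


def step_a(x: int, y: int):
    """AB-type, first step: the 16 product terms (stored in a register)."""
    x0, x1, x2, x3 = bits4(x)
    y0, y1, y2, y3 = bits4(y)
    a, b, c = x0 ^ x3, x2 ^ x3, x1 ^ x2
    return [x0 & y0, x3 & y1, x2 & y2, x1 & y3,    # terms of z0
            x1 & y0, a & y1, b & y2, c & y3,       # terms of z1
            x2 & y0, x1 & y1, a & y2, b & y3,      # terms of z2
            x3 & y0, x2 & y1, x1 & y2, a & y3]     # terms of z3


def step_b(p) -> int:
    """AB-type, second step: XOR every four stored terms into one output bit."""
    z = 0
    for i in range(4):
        z |= (p[4 * i] ^ p[4 * i + 1] ^ p[4 * i + 2] ^ p[4 * i + 3]) << i
    return z


def step_m(x: int):
    """MN-type, first step: only the shared XOR terms a, b, c are registered."""
    x0, x1, x2, x3 = bits4(x)
    return x0 ^ x3, x2 ^ x3, x1 ^ x2


def step_n(x: int, y: int, abc) -> int:
    """MN-type, second step: all ANDs and the remaining XORs."""
    x0, x1, x2, x3 = bits4(x)
    y0, y1, y2, y3 = bits4(y)
    a, b, c = abc
    z0 = (x0 & y0) ^ (x3 & y1) ^ (x2 & y2) ^ (x1 & y3)
    z1 = (x1 & y0) ^ (a & y1) ^ (b & y2) ^ (c & y3)
    z2 = (x2 & y0) ^ (x1 & y1) ^ (a & y2) ^ (b & y3)
    z3 = (x3 & y0) ^ (x2 & y1) ^ (x1 & y2) ^ (a & y3)
    return z0 | (z1 << 1) | (z2 << 2) | (z3 << 3)
```

The two splits compute the same product but cut the logic at different depths: the AB-type register holds sixteen bits after the AND layer, while the MN-type register holds only three bits after a single XOR layer, which is what allows the delays of adjacent substages to be balanced differently.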
The last operation in Subbytes is the affine transformation. We derive Eq. 21 to perform the affine transformation in \(GF\left( {2^{4} } \right)^{2}\) based on Eqs. 1, 6 and 7.
Consider \(p \in {\text{GF}}\left( {2^{4} } \right)^{2} , \;q \in {\text{GF}}\left( {2^{8} } \right):p = \left\{ {p_{7} p_{6} p_{5} p_{4} p_{3} p_{2} p_{1} p_{0} } \right\}, q = \left\{ {q_{7} q_{6} q_{5} q_{4} q_{3} q_{2} q_{1} q_{0} } \right\}\)
For Eq. 6:

(1)
Replace \(a_{A} , a_{B} , a_{C}\) with their expressions:
$$\begin{gathered} a_{l0} = a_{4} \oplus a_{6} \oplus a_{0} \oplus a_{5} , a_{l1} = a_{1} \oplus a_{2} , a_{l2} = a_{1} \oplus a_{7} , a_{l3} = a_{2} \oplus a_{4} \hfill \\ a_{h0} = a_{4} \oplus a_{6} \oplus a_{5} , a_{h1} = a_{1} \oplus a_{7} \oplus a_{4} \oplus a_{6} , a_{h2} = a_{5} \oplus a_{7} \oplus a_{2} \oplus a_{3} , a_{h3} = a_{5} \oplus a_{7} \hfill \\ \end{gathered}$$
(2)
Let \(p\) replace \(a_{h} x + a_{l}\) and \(q\) replace \(a\); we derive Eq. 15:
$$\begin{gathered} {\mathbf{p}} = {\mathbf{MAP}}\left( {\mathbf{q}} \right),\;{\mathbf{p}} \in {\mathbf{GF}}\left( {2^{4} } \right)^{2} ,\;{\mathbf{q}} \in {\mathbf{GF}}\left( {2^{8} } \right) \hfill \\ p_{0} = q_{0} \oplus q_{4} \oplus q_{5} \oplus q_{6} , p_{1} = q_{1} \oplus q_{2} , p_{2} = q_{1} \oplus q_{7} ,p_{3} = q_{2} \oplus q_{4} \hfill \\ p_{4} = q_{4} \oplus q_{5} \oplus q_{6} , p_{5} = q_{1} \oplus q_{4} \oplus q_{6} \oplus q_{7} , p_{6} = q_{2} \oplus q_{3} \oplus q_{5} \oplus q_{7} , p_{7} = q_{5} \oplus q_{7} \hfill \\ \end{gathered}$$(15)
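Equation 15 is a GF(2)-linear map on the eight bits of q and can be transcribed directly. In the Python sketch below, the helper names `bits8` and `map_fwd` are hypothetical; bit 0 is taken as the least significant bit, with p_{3}…p_{0} forming a_{l} and p_{7}…p_{4} forming a_{h}.

```python
def bits8(v: int):
    """Return the bits of a byte, least significant first."""
    return [(v >> i) & 1 for i in range(8)]


def map_fwd(q_byte: int) -> int:
    """MAP of Eq. 15: GF(2^8) representation -> GF((2^4)^2) representation."""
    q = bits8(q_byte)
    p = [
        q[0] ^ q[4] ^ q[5] ^ q[6],   # p0
        q[1] ^ q[2],                 # p1
        q[1] ^ q[7],                 # p2
        q[2] ^ q[4],                 # p3
        q[4] ^ q[5] ^ q[6],          # p4
        q[1] ^ q[4] ^ q[6] ^ q[7],   # p5
        q[2] ^ q[3] ^ q[5] ^ q[7],   # p6
        q[5] ^ q[7],                 # p7
    ]
    return sum(bit << i for i, bit in enumerate(p))
```

With this transcription, the constants {01}, {02} and {03} of GF(2^{8}) map to {01}, {26} and {27}, and the 256 inputs produce 256 distinct outputs, so the map is invertible.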
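Similar steps are applied in Eq. 7. Equation 16 is derived:
$$\begin{gathered} {\mathbf{q}} = {\mathbf{MAP}}^{ - 1} \left( {\mathbf{p}} \right),\;{\mathbf{p}} \in {\mathbf{GF}}\left( {2^{4} } \right)^{2} ,\;{\mathbf{q}} \in {\mathbf{GF}}\left( {2^{8} } \right) \hfill \\ q_{0} = p_{0} \oplus p_{4} , q_{1} = p_{4} \oplus p_{5} \oplus p_{7} , q_{2} = p_{1} \oplus p_{4} \oplus p_{5} \oplus p_{7} , q_{3} = p_{1} \oplus p_{4} \oplus p_{5} \oplus p_{6} \hfill \\ q_{4} = p_{1} \oplus p_{3} \oplus p_{4} \oplus p_{5} \oplus p_{7} , q_{5} = p_{2} \oplus p_{4} \oplus p_{5} , q_{6} = p_{1} \oplus p_{2} \oplus p_{3} \oplus p_{4} \oplus p_{7} , \hfill \\ q_{7} = p_{2} \oplus p_{4} \oplus p_{5} \oplus p_{7} \hfill \\ \end{gathered}$$(16)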
In the following, we derive Eq. 21 based on Eqs. 1, 15 and 16.
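Let \(x^{\prime}\) and \(y\) be elements of \({\text{GF}}\left( {2^{8} } \right)\): \(x^{\prime} = \left\{ {x^{\prime}_{7} x^{\prime}_{6} x^{\prime}_{5} x^{\prime}_{4} x^{\prime}_{3} x^{\prime}_{2} x^{\prime}_{1} x^{\prime}_{0} } \right\}, y = \left\{ {y_{7} y_{6} y_{5} y_{4} y_{3} y_{2} y_{1} y_{0} } \right\}\).
According to Eq. 1:
$$\begin{gathered} y_{0} = x^{\prime}_{0} \oplus x^{\prime}_{4} \oplus x^{\prime}_{5} \oplus x^{\prime}_{6} \oplus x^{\prime}_{7} \oplus 1, y_{1} = x^{\prime}_{0} \oplus x^{\prime}_{1} \oplus x^{\prime}_{5} \oplus x^{\prime}_{6} \oplus x^{\prime}_{7} \oplus 1 \hfill \\ y_{2} = x^{\prime}_{0} \oplus x^{\prime}_{1} \oplus x^{\prime}_{2} \oplus x^{\prime}_{6} \oplus x^{\prime}_{7} , y_{3} = x^{\prime}_{0} \oplus x^{\prime}_{1} \oplus x^{\prime}_{2} \oplus x^{\prime}_{3} \oplus x^{\prime}_{7} \hfill \\ y_{4} = x^{\prime}_{0} \oplus x^{\prime}_{1} \oplus x^{\prime}_{2} \oplus x^{\prime}_{3} \oplus x^{\prime}_{4} , y_{5} = x^{\prime}_{1} \oplus x^{\prime}_{2} \oplus x^{\prime}_{3} \oplus x^{\prime}_{4} \oplus x^{\prime}_{5} \oplus 1 \hfill \\ y_{6} = x^{\prime}_{2} \oplus x^{\prime}_{3} \oplus x^{\prime}_{4} \oplus x^{\prime}_{5} \oplus x^{\prime}_{6} \oplus 1, y_{7} = x^{\prime}_{3} \oplus x^{\prime}_{4} \oplus x^{\prime}_{5} \oplus x^{\prime}_{6} \oplus x^{\prime}_{7} \hfill \\ \end{gathered}$$(17)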
We convert \(y\) to \({\text{GF}}\left( {2^{4} } \right)^{2}\) and also represent \(x^{\prime}\) in \({\text{GF}}\left( {2^{4} } \right)^{2}\) to derive the affine transformation in \({\text{GF}}\left( {2^{4} } \right)^{2} .\)

(1)
Let \(w\) represent \(y\) in \(GF\left( {2^{4} } \right)^{2}\). By Eq. 15 (map from \(GF\left( {2^{8} } \right)\) to \(GF\left( {2^{4} } \right)^{2}\)):
$$\begin{gathered} w_{0} = y_{0} \oplus y_{4} \oplus y_{5} \oplus y_{6} , w_{1} = y_{1} \oplus y_{2} , w_{2} = y_{1} \oplus y_{7} ,w_{3} = y_{2} \oplus y_{4} ,w_{4} = y_{4} \oplus y_{5} \oplus y_{6} \hfill \\ w_{5} = y_{1} \oplus y_{4} \oplus y_{6} \oplus y_{7} , w_{6} = y_{2} \oplus y_{3} \oplus y_{5} \oplus y_{7} , w_{7} = y_{5} \oplus y_{7} \hfill \\ \end{gathered}$$(18) 
(2)
Let \(z\) be the \(GF\left( {2^{4} } \right)^{2}\) format of \(x^{\prime}.\) From Eq. 16:
$$\begin{gathered} x^{\prime}_{0} = z_{0} \oplus z_{4} , x^{\prime}_{1} = z_{4} \oplus z_{5} \oplus z_{7} , x^{\prime}_{2} = z_{1} \oplus z_{4} \oplus z_{5} \oplus z_{7} ,x^{\prime}_{3} = z_{1} \oplus z_{4} \oplus z_{5} \oplus z_{6} \hfill \\ x^{\prime}_{4} = z_{1} \oplus z_{3} \oplus z_{4} \oplus z_{5} \oplus z_{7} , x^{\prime}_{5} = z_{2} \oplus z_{4} \oplus z_{5} , x^{\prime}_{6} = z_{1} \oplus z_{2} \oplus z_{3} \oplus z_{4} \oplus z_{7} , \hfill \\ x^{\prime}_{7} = z_{2} \oplus z_{4} \oplus z_{5} \oplus z_{7} \hfill \\ \end{gathered}$$(19) 
(3)
Replace \(y\) in Eq. 18 with x’ in Eq. 17 and replace x’ with its \(GF\left( {2^{4} } \right)^{2}\) format z:
$$\begin{aligned} w_{0} & = y_{0} \oplus y_{4} \oplus y_{5} \oplus y_{6} \\ & = \left( {x^{\prime}_{0} \oplus x^{\prime}_{4} \oplus x^{\prime}_{5} \oplus x^{\prime}_{6} \oplus x^{\prime}_{7} \oplus 1} \right) \oplus \left( {x^{\prime}_{0} \oplus x^{\prime}_{1} \oplus x^{\prime}_{2} \oplus x^{\prime}_{3} \oplus x^{\prime}_{4} } \right) \\ & \quad \oplus \left( {x^{\prime}_{1} \oplus x^{\prime}_{2} \oplus x^{\prime}_{3} \oplus x^{\prime}_{4} \oplus x^{\prime}_{5} \oplus 1} \right) \oplus \left( {x^{\prime}_{2} \oplus x^{\prime}_{3} \oplus x^{\prime}_{4} \oplus x^{\prime}_{5} \oplus x^{\prime}_{6} \oplus 1} \right)\;\;\left( {\text{by Eq. 17}} \right) \\ & = x^{\prime}_{2} \oplus x^{\prime}_{3} \oplus x^{\prime}_{5} \oplus x^{\prime}_{7} \oplus 1 \\ & = \left( {z_{1} \oplus z_{4} \oplus z_{5} \oplus z_{7} } \right) \oplus \left( {z_{1} \oplus z_{4} \oplus z_{5} \oplus z_{6} } \right) \oplus \left( {z_{2} \oplus z_{4} \oplus z_{5} } \right) \\ & \quad \oplus \left( {z_{2} \oplus z_{4} \oplus z_{5} \oplus z_{7} } \right) \oplus 1\;\;\left( {\text{by Eq. 19}} \right) \\ & = z_{6} \oplus 1 = \left( {z_{6} } \right)^{\prime} \\ \end{aligned}$$
Similarly, we can get:
$$\begin{gathered} w_{1} = (z_{1} \oplus z_{2} \oplus z_{7} )^{\prime}, w_{2} = (z_{0} \oplus z_{5} \oplus z_{6} \oplus z_{3} )^{\prime}, w_{3} = z_{1} \oplus z_{5} \oplus z_{6} \oplus z_{7} \hfill \\ w_{4} = z_{0} \oplus z_{2} \oplus z_{4} \oplus z_{5} \oplus z_{6} \oplus z_{7} , w_{5} = z_{1} \oplus z_{5} \oplus z_{6} , w_{6} = (z_{2} \oplus z_{6} \oplus z_{7} )^{\prime} \hfill \\ w_{7} = (z_{3} \oplus z_{5} )^{\prime} \hfill \\ \end{gathered}$$(20) 
(4)
For the consistency of the other equations in this paper, we replace w by y, z by x (x,y \(\in GF\left( {2^{4} } \right)^{2}\)) in Eq. 20 and let \(a = x_{5} \oplus x_{6} \oplus x_{7}\), we derive
$$\begin{gathered} {\mathbf{y}}\,{\mathbf{ = }}\,{\mathbf{AFF\_TRAN}}\left( {\mathbf{x}} \right){\mathbf{:}} \hfill \\ a = x_{5} \oplus x_{6} \oplus x_{7} \hfill \\ y_{0} = (x_{6} )^{\prime}, y_{1} = (x_{1} \oplus x_{2} \oplus x_{7} )^{\prime},y_{2} = (x_{0} \oplus x_{3} \oplus x_{5} \oplus x_{6} )^{\prime},y_{3} = x_{1} \oplus a \hfill \\ y_{4} = x_{0} \oplus x_{2} \oplus x_{4} \oplus a, y_{5} = x_{1} \oplus x_{5} \oplus x_{6} ,y_{6} = \left( {x_{2} \oplus x_{6} \oplus x_{7} } \right)^{\prime},y_{7} = \left( {x_{3} \oplus x_{5} } \right)^{\prime} \hfill \\ \end{gathered}$$(21)
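Equation 21 can be checked against Eq. 1 by mapping both sides with Eq. 15: applying AFF_TRAN to MAP(x′) should give MAP of the GF(2^{8}) affine output for every byte x′. The Python sketch below performs this check; the function names are hypothetical, bit 0 is the least significant bit, and the primed terms of Eq. 21 are implemented as bit complements.

```python
def bits8(v: int):
    return [(v >> i) & 1 for i in range(8)]


def pack(b) -> int:
    return sum(bit << i for i, bit in enumerate(b))


def map_fwd(q_byte: int) -> int:
    """Eq. 15: GF(2^8) -> GF((2^4)^2)."""
    q = bits8(q_byte)
    return pack([q[0] ^ q[4] ^ q[5] ^ q[6], q[1] ^ q[2], q[1] ^ q[7], q[2] ^ q[4],
                 q[4] ^ q[5] ^ q[6], q[1] ^ q[4] ^ q[6] ^ q[7],
                 q[2] ^ q[3] ^ q[5] ^ q[7], q[5] ^ q[7]])


def map_inv(p_byte: int) -> int:
    """Bit equations of Eq. 19: GF((2^4)^2) -> GF(2^8)."""
    z = bits8(p_byte)
    return pack([z[0] ^ z[4], z[4] ^ z[5] ^ z[7], z[1] ^ z[4] ^ z[5] ^ z[7],
                 z[1] ^ z[4] ^ z[5] ^ z[6], z[1] ^ z[3] ^ z[4] ^ z[5] ^ z[7],
                 z[2] ^ z[4] ^ z[5], z[1] ^ z[2] ^ z[3] ^ z[4] ^ z[7],
                 z[2] ^ z[4] ^ z[5] ^ z[7]])


def affine_gf256(x: int) -> int:
    """Eq. 1 written as XOR of rotated copies plus the constant {63}."""
    y = x
    for k in range(1, 5):
        y ^= ((x << k) | (x >> (8 - k))) & 0xFF
    return y ^ 0x63


def aff_tran(x_byte: int) -> int:
    """Eq. 21: affine transformation carried out in GF((2^4)^2)."""
    x = bits8(x_byte)
    a = x[5] ^ x[6] ^ x[7]
    return pack([1 ^ x[6],
                 1 ^ x[1] ^ x[2] ^ x[7],
                 1 ^ x[0] ^ x[3] ^ x[5] ^ x[6],
                 x[1] ^ a,
                 x[0] ^ x[2] ^ x[4] ^ a,
                 x[1] ^ x[5] ^ x[6],
                 1 ^ x[2] ^ x[6] ^ x[7],
                 1 ^ x[3] ^ x[5]])
```

Because the check holds for all 256 inputs, the affine step needs no conversion back to GF(2^{8}) inside the round, which is what keeps the mapping overhead constant.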
Figure 8 describes the proposed subpipelined architecture of Subbytes in GF((2^{4})^{2}). The dashed lines stand for the registers.
We cut an AES round unit into 8 substages with the maximum delay determined by part II (Fig. 8) in Subbytes. The inverse Sbox can use the same multiplicative inverse in encryption except that the inverse affine transformation is applied before the multiplicative inverse. We also derive the following formula for the inverse affine transformation in GF(2^{4})^{2}:
Figure 9 illustrates the design of the SBox for encryption and decryption. Each unit processes an 8-bit input in GF(2^{4})^{2}, so four units are required for the 32-bit data path.
3.4.3 MixColumns on GF((2^{4})^{2})
MixColumns is another transformation that involves mathematical operations in GF((2^{4})^{2}). We derive the following formulas to perform MixColumns in the composite field.
Since GF((2^{4})^{2}) is an isomorphic field to GF(2^{8}), and {02}, {03}, {01} in GF(2^{8}) are mapped to {26}, {27}, {01}, respectively, in GF((2^{4})^{2}), the MixColumns operation described by Eq. 2 can be mapped directly to Eq. 23.
Observing that in GF((2^{4})^{2}), {27} = {26} ⊕ {01}, Eq. 23 is equal to Eq. 24, where j = 0, 1, 2, 3:
Equation 24 presents the MixColumn transformation of one column of a state. The MixColumn transformation can be implemented by the parallel structure in Fig. 10.
In the following, we derive Eq. 28 to calculate x × 26 in GF((2^{4})^{2}). That is, we represent the results of x × {02} in GF((2^{4})^{2}).

(1)
Let x,y \(\in GF\left( {2^{8} } \right),\) \(y = x \times \left\{ {02} \right\}:\)
$$y_{0} = x_{7} , y_{1} = x_{0} \oplus x_{7} , y_{2} = x_{1} , y_{3} = x_{2} \oplus x_{7} , y_{4} = x_{3} \oplus x_{7} , y_{5} = x_{4} , y_{6} = x_{5} , y_{7} = x_{6}$$(25) 
(2)
Convert y to the element in GF((2^{4})^{2}). Let w represent y in GF((2^{4})^{2}), that is Eq. 18.

(3)
Let \(z\) be the \(GF\left( {2^{4} } \right)^{2}\) format of x:
$$\begin{gathered} x_{0} = z_{0} \oplus z_{4} , x_{1} = z_{4} \oplus z_{5} \oplus z_{7} , x_{2} = z_{1} \oplus z_{4} \oplus z_{5} \oplus z_{7} ,x_{3} = z_{1} \oplus z_{4} \oplus z_{5} \oplus z_{6} \hfill \\ x_{4} = z_{1} \oplus z_{3} \oplus z_{4} \oplus z_{5} \oplus z_{7} , x_{5} = z_{2} \oplus z_{4} \oplus z_{5} , x_{6} = z_{1} \oplus z_{2} \oplus z_{3} \oplus z_{4} \oplus z_{7} , \hfill \\ x_{7} = z_{2} \oplus z_{4} \oplus z_{5} \oplus z_{7} \hfill \\ \end{gathered}$$(26) 
(4)
Replace x and y with their corresponding GF((2^{4})^{2}) format z and w:
$$\begin{aligned} w_{0} & = y_{0} \oplus y_{4} \oplus y_{5} \oplus y_{6} \;\;\left( {\text{by Eq. 18}} \right) \\ & = x_{7} \oplus \left( {x_{3} \oplus x_{7} } \right) \oplus x_{4} \oplus x_{5} \;\;\left( {\text{by Eq. 25}} \right) \\ & = x_{3} \oplus x_{4} \oplus x_{5} \\ & = \left( {z_{1} \oplus z_{4} \oplus z_{5} \oplus z_{6} } \right) \oplus \left( {z_{1} \oplus z_{3} \oplus z_{4} \oplus z_{5} \oplus z_{7} } \right) \oplus \left( {z_{2} \oplus z_{4} \oplus z_{5} } \right)\;\;\left( {\text{by Eq. 26}} \right) \\ & = z_{2} \oplus z_{3} \oplus z_{4} \oplus z_{5} \oplus z_{6} \oplus z_{7} \\ \end{aligned}$$
Through the same procedure, we can derive:
$$\begin{gathered} w_{0} = z_{2} \oplus z_{3} \oplus z_{4} \oplus z_{5} \oplus z_{6} \oplus z_{7} , w_{1} = z_{0} \oplus z_{2} \oplus z_{4} , w_{2} = z_{0} \oplus z_{1} \oplus z_{3} \oplus z_{4} \oplus z_{5} \hfill \\ w_{3} = z_{1} \oplus z_{2} \oplus z_{4} \oplus z_{5} \oplus z_{6} , w_{4} = z_{3} \oplus z_{6} , w_{5} = z_{0} \oplus z_{3} \oplus z_{6} \oplus z_{7} \hfill \\ w_{6} = z_{1} \oplus z_{4} \oplus z_{7} , w_{7} = z_{2} \oplus z_{5} \hfill \\ \end{gathered}$$(27)

(5)
For consistency, rename z to x and w to y, where \(x, y \in GF((2^{4})^{2})\):
$$\begin{gathered} y = x \times \{ 26\} ,\quad x,y \in GF((2^{4})^{2}) \hfill \\ a = x_{2} \oplus x_{4} , b = x_{3} \oplus x_{6} \oplus x_{7} , c = x_{1} \oplus x_{5} \hfill \\ y_{0} = a \oplus b \oplus x_{5} , y_{1} = a \oplus x_{0} , y_{2} = c \oplus x_{0} \oplus x_{3} \oplus x_{4} , y_{3} = c \oplus a \oplus x_{6} \hfill \\ y_{4} = x_{3} \oplus x_{6} , y_{5} = b \oplus x_{0} , y_{6} = x_{1} \oplus x_{4} \oplus x_{7} , y_{7} = x_{2} \oplus x_{5} \hfill \\ \end{gathered}$$(28)
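The derivation in steps (1) to (5) can be checked numerically. The following is a minimal Python sketch, not the authors' hardware description: it transcribes Eq. 25, Eq. 26 and Eq. 28 bit by bit and confirms that multiplying by {26} in GF((2^{4})^{2}) and then converting back with Eq. 26 gives the same byte as converting first and then applying Eq. 25. The function names (`xtime_gf256`, `to_gf256`, `mul26`) and the convention that bit i of an integer holds x_i are assumptions made for illustration.

```python
def bits(v):
    """Split a byte into its bits [v_0, ..., v_7], with v_0 the least significant."""
    return [(v >> i) & 1 for i in range(8)]


def pack(b):
    """Inverse of bits(): rebuild a byte from [b_0, ..., b_7]."""
    return sum(bit << i for i, bit in enumerate(b))


def xtime_gf256(v):
    """Eq. 25: multiply by {02} in GF(2^8)."""
    x = bits(v)
    return pack([x[7], x[0] ^ x[7], x[1], x[2] ^ x[7],
                 x[3] ^ x[7], x[4], x[5], x[6]])


def to_gf256(v):
    """Eq. 26: convert a GF((2^4)^2) element z back to GF(2^8)."""
    z = bits(v)
    return pack([
        z[0] ^ z[4],
        z[4] ^ z[5] ^ z[7],
        z[1] ^ z[4] ^ z[5] ^ z[7],
        z[1] ^ z[4] ^ z[5] ^ z[6],
        z[1] ^ z[3] ^ z[4] ^ z[5] ^ z[7],
        z[2] ^ z[4] ^ z[5],
        z[1] ^ z[2] ^ z[3] ^ z[4] ^ z[7],
        z[2] ^ z[4] ^ z[5] ^ z[7],
    ])


def mul26(v):
    """Eq. 28: multiply by {26} in GF((2^4)^2), sharing the terms a, b, c."""
    x = bits(v)
    a = x[2] ^ x[4]
    b = x[3] ^ x[6] ^ x[7]
    c = x[1] ^ x[5]
    return pack([
        a ^ b ^ x[5],
        a ^ x[0],
        c ^ x[0] ^ x[3] ^ x[4],
        c ^ a ^ x[6],
        x[3] ^ x[6],
        b ^ x[0],
        x[1] ^ x[4] ^ x[7],
        x[2] ^ x[5],
    ])


# Eq. 28 must agree with Eq. 25 once both sides are viewed in GF(2^8).
consistent = all(to_gf256(mul26(z)) == xtime_gf256(to_gf256(z))
                 for z in range(256))
print(consistent)          # True
print(hex(mul26(0x01)))    # 0x26
```

The second printed value reflects that {01} times {26} is {26}, the composite-field image of {02}. Sharing the intermediate terms a, b and c, as Eq. 28 does, reduces the number of XOR operations compared with evaluating each output bit of Eq. 27 separately.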
In this design, for both encryption and decryption, we modify the MixColumn and InvMixColumn architecture proposed by Fischer et al. [24], mapping it from GF(2^{8}) to GF((2^{4})^{2}). Only the “xtime” operation needs to change: it must be computed in GF((2^{4})^{2}), where it becomes the multiplication by {26} given in Eq. 28.
3.4.4 Subpipelined keyschedule
There are two approaches to implementing the keyschedule: (1) a precalculated keyschedule and (2) an on-the-fly keyschedule. In the precalculated keyschedule, the (Nr + 1) 128-bit roundkeys are generated before encryption or decryption begins and are stored in memory. The addroundkey operation then fetches each roundkey from its address in memory. The advantage of this approach is that the keyschedule needs to be performed only once per cipherkey; however, its drawbacks include:

(i)
The (Nr + 1) roundkeys require (Nr + 1) × 128 bits of memory;

(ii)
The approach is efficient only if the cipherkey changes infrequently, because every change forces all roundkeys to be recalculated.
In this paper, we propose a new 32-bit pipelined on-the-fly keyschedule that operates entirely in the composite field GF((2^{4})^{2}) and supports 128-, 192- and 256-bit keys. Each 128-bit roundkey is generated over four clock cycles, 32 bits per cycle. The 32-bit roundkeys produced at each clock cycle are listed below, where KA(i) and KB(i) denote the 32-bit roundkey words for block A and block B, respectively, and 0 ≤ i ≤ 4Nr + 3.
The roundkeys for block A:
roundkey[0]={KA(0), KA(1), KA(2), KA(3)}
roundkey[1]={KA(4), KA(5), KA(6), KA(7)}
……
roundkey[Nr]={KA(4Nr), KA(4Nr+1), KA(4Nr+2), KA(4Nr+3)}
The roundkeys for block B:
roundkey[0]={KB(0), KB(1), KB(2), KB(3)}
roundkey[1]={KB(4), KB(5), KB(6), KB(7)}
……
roundkey[Nr]={KB(4Nr), KB(4Nr+1), KB(4Nr+2), KB(4Nr+3)}
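The grouping above can be sketched in a few lines of Python. The sketch assumes that the 32-bit words arrive one per clock cycle in index order; the names `NR_BY_KEY_BITS` and `group_roundkeys` are hypothetical, and integer placeholders stand in for real roundkey words.

```python
# Number of AES rounds Nr for each standard key size (FIPS-197).
NR_BY_KEY_BITS = {128: 10, 192: 12, 256: 14}


def group_roundkeys(words):
    """Group a stream of 32-bit words K(0), K(1), ... into 128-bit roundkeys.

    roundkey[r] = (K(4r), K(4r+1), K(4r+2), K(4r+3)), one word per clock cycle.
    """
    if len(words) % 4 != 0:
        raise ValueError("the word stream must contain whole 128-bit roundkeys")
    return [tuple(words[i:i + 4]) for i in range(0, len(words), 4)]


for key_bits, nr in NR_BY_KEY_BITS.items():
    # Placeholder words: the index i stands in for K(i), 0 <= i <= 4*Nr + 3.
    stream = list(range(4 * nr + 4))
    roundkeys = group_roundkeys(stream)
    # Each block needs Nr + 1 roundkeys, delivered over 4 * (Nr + 1) cycles.
    print(key_bits, len(stream), len(roundkeys), roundkeys[-1])
```

For a 128-bit key this yields 44 words grouped into 11 roundkeys per block, the last one holding the placeholders for K(40) to K(43).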
Because the keyschedule is computed on the fly, it shares a clock with the encryptor/decryptor, and the achievable clock frequency is set by the longest stage delay in either module. Efficient pipelining therefore requires the keyschedule to be partitioned into stages as carefully as the encryptor/decryptor. Subword is the most costly component of the keyschedule, so to balance the delay between the two modules we implement subword in the same way as Subbytes in the encryptor/decryptor.
All mathematical operations in the keyschedule are transformed into the field GF((2^{4})^{2}). Subword has the same structure as Subbytes. Xorrcon is a simple XOR with a round constant, which is initially {01} and is multiplied by {02} at each keyschedule round. A keyschedule round begins when the clock counter equals 0 and lasts four clock cycles for a 128-bit key, six for a 192-bit key and eight for a 256-bit key. In GF((2^{4})^{2}), {01} is still {01} and {02} is mapped to {26}, so the round constant for each keyschedule round can be generated with Eq. 28.
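As a worked illustration of the round-constant update, the sketch below starts from {01} and applies the Eq. 28 multiplication by {26} once per keyschedule round. It is an illustrative Python model rather than the RC hardware block; `mul26` and `composite_rcon` are hypothetical names, and bit i of each integer is assumed to hold x_i.

```python
def mul26(v):
    """Multiply by {26} in GF((2^4)^2) using the shared terms of Eq. 28."""
    x = [(v >> i) & 1 for i in range(8)]
    a = x[2] ^ x[4]
    b = x[3] ^ x[6] ^ x[7]
    c = x[1] ^ x[5]
    y = [a ^ b ^ x[5], a ^ x[0], c ^ x[0] ^ x[3] ^ x[4], c ^ a ^ x[6],
         x[3] ^ x[6], b ^ x[0], x[1] ^ x[4] ^ x[7], x[2] ^ x[5]]
    return sum(bit << i for i, bit in enumerate(y))


def composite_rcon(rounds):
    """Return the first `rounds` round constants in GF((2^4)^2)."""
    rc, out = 0x01, []
    for _ in range(rounds):
        out.append(rc)
        rc = mul26(rc)  # next constant = previous constant times {26}
    return out


constants = composite_rcon(10)
print([f"{c:02x}" for c in constants[:3]])  # ['01', '26', '4a']
```

One constant is consumed per keyschedule round, so the update runs once every four, six or eight clock cycles depending on the key size.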
The proposed keyschedule has three key-size options: Key128, Key192 and Key256. The notation roundkey32 denotes the 32-bit roundkey word produced in each clock cycle, and roundkey denotes the 128-bit roundkey for one AES round.
For decryption, the roundkey32 words must be generated in reverse order. The last Nk roundkey32 words from encryption are stored in a 256-bit register and serve as the initial decipherkey for decryption. For a given cipherkey, at least one encryption must therefore be performed before decryption so that the final Nk roundkey32 words are available. Multiplexers select the cipherkey in encryption mode and the decipherkey in decryption mode. Because the decipherkey roundkey32 words are already in GF((2^{4})^{2}), they do not pass through the MAP operation.
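Why the last Nk words suffice can be seen from the key-expansion recurrence w[i] = w[i − Nk] ⊕ f(w[i − 1]), which can be solved for w[i − Nk]. The Python sketch below demonstrates this with the standard FIPS-197 key expansion in GF(2^{8}). It does not model the composite-field datapath or the register and multiplexer structure of the proposed design, and all function names are hypothetical.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def rotl8(v, n):
    """Rotate an 8-bit value left by n bits."""
    return ((v << n) | (v >> (8 - n))) & 0xFF


def sbox(x):
    """AES S-box: multiplicative inverse followed by the affine transform."""
    inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
    return inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63


SBOX = [sbox(x) for x in range(256)]
RCON = [0x00, 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36]


def sub_word(w):
    """Apply the S-box to each byte of a 32-bit word."""
    return int.from_bytes(bytes(SBOX[b] for b in w.to_bytes(4, "big")), "big")


def rot_word(w):
    """Rotate a 32-bit word left by one byte."""
    return ((w << 8) | (w >> 24)) & 0xFFFFFFFF


def f(prev, i, nk):
    """Term XORed into w[i - nk] to obtain w[i] (FIPS-197 key expansion)."""
    if i % nk == 0:
        return sub_word(rot_word(prev)) ^ (RCON[i // nk] << 24)
    if nk > 6 and i % nk == 4:
        return sub_word(prev)
    return prev


def expand(key_words):
    """Forward expansion: cipherkey words -> all 4 * (Nr + 1) words."""
    nk = len(key_words)
    w = list(key_words)
    for i in range(nk, 4 * (nk + 7)):  # Nr = Nk + 6
        w.append(w[i - nk] ^ f(w[i - 1], i, nk))
    return w


def expand_reverse(last_words):
    """Reverse expansion: the last Nk words -> all words, generated backwards."""
    nk = len(last_words)
    total = 4 * (nk + 7)
    w = [None] * (total - nk) + list(last_words)
    for i in range(total - 1, nk - 1, -1):
        w[i - nk] = w[i] ^ f(w[i - 1], i, nk)
    return w


key = [0x2B7E1516, 0x28AED2A6, 0xABF71588, 0x09CF4F3C]  # FIPS-197 Appendix A.1
forward = expand(key)
backward = expand_reverse(forward[-len(key):])
print(backward == forward)  # True
```

Because each step of the reverse loop only needs w[i] and w[i − 1], a window of Nk words is enough to regenerate the whole schedule backwards, which is consistent with storing only the last Nk roundkey32 words after an encryption.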
Figure 11 illustrates the keyschedule architecture. The multiplexers mul1 and mul2 are used to reconfigure the pipeline for each of the three key sizes.
SA, SB, SC and SD are the four sections of the subword operation, separated by registers. RW is the outcome of rotword. RC generates the round constant for xorrcon in GF((2^{4})^{2}). Multiplexer mul3 selects the correct previous roundkey32 as input to the subword operation. Multiplexer mul4 selects the appropriate calculated result to serve as the next roundkey32.
Table 2 summarizes how the multiplexers are controlled to support the three key sizes (• indicates that the multiplexer is enabled for the corresponding key size, and the numbers give the multiplexer input selected in the corresponding clock cycles).
When the key size is 128 bits, the encryptor performs ten rounds (Nr = 10), so each block needs Nr + 1 = 11 roundkeys and the two blocks A and B together need 22. In our design, the first step is to map (MAP) the cipherkey from GF(2^{8}) to GF((2^{4})^{2}). All subsequent keyschedule operations are performed by their isomorphic counterparts in GF((2^{4})^{2}). The keyschedule therefore outputs roundkey32 words in GF((2^{4})^{2}), which is exactly the format required by the encryptor, where the message blocks are also represented in GF((2^{4})^{2}); no inverse MAP is required in the keyschedule. Three registers are placed between the four subword sections SA, SB, SC and SD.
4 Implementation performance and comparison
Many studies of hardware AES implementations have been published. Table 3 summarizes the functions provided by different FPGA implementations.
We do not use BRAM in our design, in order to make the architecture suitable for wireless and embedded devices. The proposed architecture has been simulated and synthesized with Xilinx Synthesis Technology (XST) in ISE 10 and implemented on a Xilinx Virtex-4 device. Guided by the synthesis results, we also balanced the delays of the different pipeline stages to improve performance. Table 4 lists the synthesis results on the Virtex-4 XC4VSX25 together with a performance comparison.
Compared with previous architectures, our design targets low-cost, non-BRAM implementations. Pramstaller et al. [5] proposed a compact design that costs 1125 slices and achieves throughputs of 215 Mbps for 128-bit keys, 180 Mbps for 192-bit keys and 156 Mbps for 256-bit keys at a maximum frequency of 161 MHz. However, its round keys are precalculated by the key generator, and RAM is required to store them. We generate the round keys on the fly, which is efficient when the key changes (AES is a symmetric-key cipher, and the session key usually changes frequently). Our throughput is also higher for each of the three key sizes: 375 Mbps, 318 Mbps and 275 Mbps for 128-, 192- and 256-bit keys, respectively, using 1766 slices. Furthermore, we propose a new subpipelined keyschedule that supports all three key sizes (128, 192 and 256 bits), and the stage delays in encryption/decryption and the keyschedule have been balanced. We also present a new 32-bit fully composite field approach in which GF((2^{4})^{2}) arithmetic is used in all transformations of the encryptor/decryptor and keyschedule, which greatly reduces the cost of mapping between GF(2^{8}) and GF((2^{4})^{2}). Finally, the 32-bit data path reduces the hardware cost, so the design can be applied efficiently in resource-restricted environments such as wireless and embedded devices.
5 Conclusion
AES is an important and widely used cryptographic algorithm for securing information and data transmission. In this paper, we propose a compact reconfigurable FPGA architecture for AES. The 32-bit single-round unit design results in low area cost, which makes it suitable for low-end devices. The combinational-logic implementation of the S-Box eliminates the need for BRAMs.
In our architecture, GF((2^{4})^{2}) composite field arithmetic is applied in all transformations of encryption/decryption and the keyschedule, which greatly reduces the cost of mapping: only one MAP and one MAP^{−1} are needed in encryption/decryption, and one MAP in the keyschedule. The fully composite field-based design also decreases the hardware complexity of the arithmetic operations in AES. In addition, we apply subpipelining in both the encryptor/decryptor and keyschedule modules to optimize the speed/area ratio. Support for all three key sizes makes our design an efficient reconfigurable AES architecture. The performance comparison indicates that the proposed AES architecture achieves better performance than previous work.
In conclusion, the proposed compact and reconfigurable AES architecture combines high throughput with low area cost, which makes it well suited to computing-restricted environments such as wireless devices.
6 Future work
In the future, we will synthesize our FPGA prototype, optimize the design and implement it in VLSI. We believe that current VLSI design tools and technology can further increase the performance of the proposed architecture, leading to a new reconfigurable and efficient AES encryption/decryption chip that can be easily embedded into wireless and computing-restricted devices to provide security services.
References
W. Stallings, Cryptography and Network Security: Principles and Practices, 4th edn. (Pearson Prentice Hall, 2006)
NIST. Announcing the advanced encryption standard (AES). Available at https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.197.pdf, 2001.
Daemen J, Rijmen V. AES proposal: Rijndael. Technical report, National Institute of Standards and Technology (NIST). Available at http://www.nic.funet.fi/pub/crypt/cryptography/symmetric/aes/nist/Rijndael.pdf, 2000.
P. Chodowiec, K. Gaj, Very compact FPGA implementation of the AES algorithm. Cryptogr Hardw Embed Syst CHES 2003, 319–333 (2003)
N. Pramstaller, J. Wolkerstorfer, A universal and efficient AES coprocessor for field programmable logic arrays, in Field programmable logic and application. ed. by J. Becker, M. Platzner, S. Vernalde (Springer Berlin Heidelberg, Berlin, Heidelberg, 2004), pp.565–574. https://doi.org/10.1007/9783540301172_58
T. Good, M. Benaissa, AES on FPGA from the fastest to the smallest, in Cryptographic hardware and embedded systems – CHES 2005. ed. by J.R. Rao, B. Sunar (Springer Berlin Heidelberg, Berlin, Heidelberg, 2005), pp.427–440. https://doi.org/10.1007/11545262_31
N. Pramstaller, S. Mangard, S. Dominikus, J. Wolkerstorfer, Efficient AES implementations on ASICs and FPGAs, in Advanced encryption standard – AES. ed. by H. Dobbertin, V. Rijmen, A. Sowa (Springer Berlin Heidelberg, Berlin, Heidelberg, 2005), pp.98–112. https://doi.org/10.1007/11506447_9
X. Zhang, K.K. Parhi, High-speed VLSI architectures for the AES algorithm. IEEE Trans VLSI Syst 12(9), 957–967 (2004)
Gaj K, Chodowiec P. Comparison of the hardware performance of the AES candidates using reconfigurable hardware. In: AES candidate conference, pp. 40–54, 2000.
Liberatori M, Otero F, Bonadero JC, Castineira J. AES-128 cipher. High speed, low cost FPGA implementation. In: 2007 3rd southern conference on programmable logic, pp. 195–198, 2007.
A. Rudra, P.K. Dubey, C.S. Jutla, V. Kumar, J.R. Rao, P. Rohatgi, Efficient rijndael encryption implementation with composite field arithmetic, in Cryptographic hardware and embedded systems — CHES 2001. ed. by Ç.K. Koç, D. Naccache, C. Paar (Springer Berlin Heidelberg, Berlin, Heidelberg, 2001), pp.171–184. https://doi.org/10.1007/3540447091_16
A. Satoh, S. Morioka, K. Takano, S. Munetoh, A compact Rijndael hardware architecture with S-Box optimization, in Advances in Cryptology. ed. by C. Boyd (Springer Berlin Heidelberg, Berlin, Heidelberg, 2001), pp.239–254. https://doi.org/10.1007/3540456821_15
W.K. Lee, H.J. Seo, S.C. Seo, S.O. Hwang, Efficient implementation of AES-CTR and AES-ECB on GPUs with applications for high-speed FrodoKEM and exhaustive key search. IEEE Trans Circuits Syst II Express Briefs 69(6), 2962–2966 (2022)
A.A. Pammu, W.G. Ho, N.K.Z. Lwin, K.S. Chong, B.H. Gwee, A high throughput and secure authentication-encryption AES-CCM algorithm on asynchronous multicore processor. IEEE Trans Inf Forensics Secur 14(4), 1023–1036 (2019)
Y. Zhou, G.M. Tang, J.H. Yang, P.S. Yu, C. Peng, Logic design and simulation of a 128-b AES encryption accelerator based on rapid single-flux-quantum circuits. IEEE Trans Appl Supercond 31(6), 1–11 (2021)
Hodjat A, Verbauwhede I. A 21.54 Gbits/s fully pipelined AES processor on FPGA. In: 12th annual IEEE symposium on field-programmable custom computing machines, pp. 308–309, 2004.
McLoone M, McCanny JV. High performance single-chip FPGA Rijndael algorithm implementations. In: Cryptographic hardware and embedded systems – CHES 2001, pp. 65–76, 2001.
N. Yu, H.M. Heys, Investigation of compact hardware implementation of the advanced encryption standard. Can Conf Electr Comput Eng 2005, 1069–1072 (2005)
Järvinen K, Tommiska M, Skyttä J. A fully pipelined memoryless 17.8 Gbps AES-128 encryptor. In: ACM/SIGDA eleventh international symposium on field programmable gate arrays, pp. 207–215, 2003.
Chang CJ, Huang CW, Tai HY, Lin MY. 8-bit AES implementation in FPGA by multiplexing 32-bit AES operation. In: The first international symposium on data, privacy, and E-commerce (ISDPE 2007), pp. 505–507, 2007.
J. Wolkerstorfer, E. Oswald, M. Lamberger, An ASIC implementation of the AES SBoxes, in Topics in cryptology – CT-RSA 2002. ed. by B. Preneel (Springer Berlin Heidelberg, Berlin, Heidelberg, 2002), pp.67–78. https://doi.org/10.1007/3540457607_6
H. Li, J. Li, A new compact architecture for AES with optimized ShiftRows operation. In: Proceedings of 2007 IEEE international symposium on circuits and systems, pp. 1851–1854, New Orleans, USA, May 27–30, 2007.
K. Li, H. Li, An efficient and compact subpipelined sbox architecture for AES. In: Proceedings of the ISCA 2nd international conference on advanced computing and communications, pp 45–49, Los Angeles, USA, 2012.
V. Fischer, M. Drutarovsky, P. Chodowiec, F. Gramain, InvMixColumn decomposition and multilevel resource sharing in AES implementations. IEEE Trans VLSI Syst 13(8), 989–992 (2005). https://doi.org/10.1109/TVLSI.2005.853606
P. Bulens, F.X. Standaert, J.J. Quisquater, P. Pellegrin, G. Rouvroy, Implementation of the AES-128 on Virtex-5 FPGAs, in Progress in cryptology – AFRICACRYPT 2008. ed. by S. Vaudenay (Springer Berlin Heidelberg, Berlin, Heidelberg, 2008), pp.16–26. https://doi.org/10.1007/9783540681649_2
Author information
Contributions
KL and HL proposed the reconfigurable and compact AES architecture for encryption and decryption. GM implemented the design with Xilinx FPGA. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
There are no conflict/competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Li, K., Li, H. & Mund, G. A reconfigurable and compact subpipelined architecture for AES encryption and decryption. EURASIP J. Adv. Signal Process. 2023, 5 (2023). https://doi.org/10.1186/s13634022009633
DOI: https://doi.org/10.1186/s13634022009633
Keywords
 AES
 FPGA
 Pipelining
 Fully composite field
 Reconfigurable architecture
 Computingrestricted environments