EURASIP Journal on Applied Signal Processing 2002:9, 954–960
© 2002 Hindawi Publishing Corporation

Low-Complexity Versatile Finite Field Multiplier

A low-complexity VLSI array for a versatile normal basis multiplier over $GF(2^n)$ is presented. The finite field parameters can be changed according to the user's requirements, which makes the same multiplier reusable across different applications and reduces the user's cost. The proposed multiplier has a regular structure and is well suited to high-speed VLSI implementation. In addition, the pipelined versatile multiplier can be modified into a low-cost architecture that is feasible in embedded systems and restricted computing environments.


INTRODUCTION
The finite fields $GF(2^n)$ of characteristic 2 are of great interest for cryptosystems and digital signal processing. Addition in $GF(2^n)$ is fast and inexpensive, as it can be realized with n bitwise XOR operations. Multiplication is costly in terms of gate count and time delay. There are three main kinds of basis representation for the field elements of $GF(2^n)$: standard (canonical, polynomial) basis, dual basis, and normal basis. Multipliers for the different basis representations have their own benefits and tradeoffs. The dual basis multiplier [1] needs the fewest gates, which leads to the smallest area for VLSI implementation [2]. The normal basis multiplier, for example, the Massey-Omura multiplier [3], is very effective for squaring, exponentiation, and inversion. The standard basis multiplier [4,5,6,7] is easier to extend to high-order finite fields than the dual or normal basis multipliers.
Most of the proposed finite field multipliers operate over a fixed field. In other words, a new multiplier is needed whenever a field parameter changes, such as the irreducible polynomial defining the representation of the field elements. Such a multiplier is not reusable. Only a few versatile multipliers [4,6,8,9] have been reported, and all of them are based on the canonical basis. In this paper, we present a new VLSI array for a versatile pipelined multiplier based on the normal basis representation. In a normal basis, squaring is a cost-free cyclic shift, and inversion (the most complicated of the important finite field arithmetic operations) can be computed effectively by Fermat's theorem, which requires repeated squaring and multiplication [10,11]. Three main advantages accrue from the proposed pipelined versatile multiplier. First, the finite field parameters can be changed according to the application environment, so the same multiplier can serve different applications. Secondly, the structure of the multiplier can be easily extended to higher-order finite fields. Thirdly, the basic architecture can be modified into a low-cost multiplier that is well suited to embedded systems and wireless devices with restricted hardware resources. Moreover, the structure of the multiplier is modular and simple, has regular interconnections, and is easy to implement in VLSI. The proposed versatile multiplier can be used efficiently in public-key cryptosystems, such as elliptic curve cryptography, and in digital signal processing, for example, in Reed-Solomon encoders and decoders.
The outline of the remainder of the paper is as follows. In Section 2, we briefly review the normal basis representation and the Massey-Omura multiplier. Section 3 contains the derivation of the pipelined versatile normal basis multiplier in $GF(2^n)$ and a comparison with previous works. Section 4 concludes with the improved result and a description of areas of application.

MULTIPLICATION ON $GF(2^n)$
It has been proved [12] that a normal basis always exists for a given finite field $GF(2^n)$. It has the form

$$N = \{\beta, \beta^{2}, \beta^{2^2}, \ldots, \beta^{2^{n-1}}\},$$

where $\beta$ is a root of the irreducible polynomial $P(x)$ of degree n over $GF(2)$ and the n elements of the set are linearly independent.
We say that $\beta$ generates the normal basis N, or that $\beta$ is a normal element of $GF(2^n)$. Every element $a \in GF(2^n)$ can be represented as $a = \sum_{i=0}^{n-1} a_i \beta^{2^i}$, where $a_i \in \{0, 1\}$. The following properties [10] of a finite field $GF(2^n)$ are useful in applications.
Squaring an element a in the normal basis representation is a cyclic shift operation, that is,

$$a^2 = \sum_{i=0}^{n-1} a_{i-1}\,\beta^{2^i} = (a_{n-1}, a_0, \ldots, a_{n-2}) \tag{5}$$

with indices reduced modulo n.
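The cyclic-shift property in (5) can be checked numerically. The following is a minimal sketch, assuming the field $GF(2^5)$ and the polynomial $P_1(x) = x^5 + x^4 + x^2 + x + 1$ that the paper uses later in Example 1; the helper names (`gf_mul`, `normal_basis`, `from_normal`, `square_normal`) and the bit-mask encoding of field elements are illustrative choices, not part of the paper.

```python
# Sketch: squaring in a normal basis is a cyclic shift (illustrative, GF(2^5)).
N = 5
POLY = 0b110111  # x^5 + x^4 + x^2 + x + 1 (assumed; taken from Example 1)
BETA = 0b00010   # the root beta, stored as the polynomial "x"


def gf_mul(x, y):
    """Multiply two field elements held as polynomial-basis bit masks."""
    r = 0
    while y:
        if y & 1:
            r ^= x
        y >>= 1
        x <<= 1
        if (x >> N) & 1:  # reduce modulo POLY as soon as degree N appears
            x ^= POLY
    return r


def normal_basis():
    """Return [beta, beta^2, beta^4, ...] as polynomial-basis bit masks."""
    basis, e = [], BETA
    for _ in range(N):
        basis.append(e)
        e = gf_mul(e, e)
    return basis


def from_normal(coeffs):
    """Map normal-basis coordinates (a_0..a_{n-1}) to a polynomial-basis mask."""
    x = 0
    for bit, e in zip(coeffs, normal_basis()):
        if bit:
            x ^= e
    return x


def square_normal(coeffs):
    """Squaring as in (5): (a_0,...,a_{n-1}) -> (a_{n-1}, a_0, ..., a_{n-2})."""
    return coeffs[-1:] + coeffs[:-1]
```

Comparing `from_normal(square_normal(a))` with `gf_mul(from_normal(a), from_normal(a))` for every coordinate vector `a` confirms that the shift and the field squaring agree in this example field.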
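Let a and b be two arbitrary elements of $GF(2^n)$ in a normal basis representation and let $c = a \cdot b$ be their product. We write $a = \sum_{i=0}^{n-1} a_i \beta^{2^i}$ as the vector $a = (a_0, a_1, \ldots, a_{n-1})$, and similarly for b and c. The last component $c_{n-1}$ of c is then a logic function f of the components of a and b,

$$c_{n-1} = f(a_0, a_1, \ldots, a_{n-1};\; b_0, b_1, \ldots, b_{n-1}). \tag{6}$$

Since squaring in the normal basis representation is a cyclic shift of the coordinates, we have $c^2 = a^2 \cdot b^2$, or equivalently

$$(c_{n-1}, c_0, c_1, \ldots, c_{n-2}) = (a_{n-1}, a_0, a_1, \ldots, a_{n-2}) \cdot (b_{n-1}, b_0, b_1, \ldots, b_{n-2}). \tag{7}$$

Hence, the last component $c_{n-2}$ of $c^2$ can be obtained by the same function f operating on the components of $a^2$ and $b^2$. That is,

$$c_{n-2} = f(a_{n-1}, a_0, a_1, \ldots, a_{n-2};\; b_{n-1}, b_0, b_1, \ldots, b_{n-2}). \tag{8}$$

By squaring c repeatedly, we get

$$\begin{aligned}
c_{n-1} &= f(a_0, a_1, \ldots, a_{n-1};\; b_0, b_1, \ldots, b_{n-1}),\\
c_{n-2} &= f(a_{n-1}, a_0, \ldots, a_{n-2};\; b_{n-1}, b_0, \ldots, b_{n-2}),\\
&\;\;\vdots\\
c_0 &= f(a_1, a_2, \ldots, a_{n-1}, a_0;\; b_1, b_2, \ldots, b_{n-1}, b_0).
\end{aligned} \tag{9}$$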
Equations (9) define the Massey-Omura multiplier in the normal basis representation [10]. In the Massey-Omura multiplier, the same logic function f that computes the last component $c_{n-1}$ of the product c is used to obtain the remaining components $c_{n-2}, c_{n-3}, \ldots, c_0$ sequentially. In a parallel architecture, n identical copies of the logic function f compute all components of the product simultaneously.
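The reuse of a single logic function f with cyclically shifted inputs can be sketched in a few lines. The example field below is an assumption made for illustration and does not appear in the paper: $GF(2^2)$ with $\beta^2 + \beta + 1 = 0$ and normal basis $\{\beta, \beta^2\}$, for which the last product bit is $c_1 = a_0 b_0 + a_0 b_1 + a_1 b_0$. The names `f_gf4`, `rotate` and `massey_omura` are hypothetical.

```python
# Sketch of the Massey-Omura principle: one function f, shifted inputs.

def f_gf4(a, b):
    """Last product bit c_1 in GF(2^2), normal basis {beta, beta^2} (assumed)."""
    return (a[0] & b[0]) ^ (a[0] & b[1]) ^ (a[1] & b[0])


def rotate(v):
    """Cyclic shift that realizes squaring: (v_0,...,v_{n-1}) -> (v_{n-1}, v_0, ...)."""
    return v[-1:] + v[:-1]


def massey_omura(a, b, f):
    """Return c = a*b in normal-basis coordinates, reusing f for every bit."""
    n = len(a)
    c = [0] * n
    for k in range(n - 1, -1, -1):
        c[k] = f(a, b)               # c_k from the current operand rotation
        a, b = rotate(a), rotate(b)  # next line of the equation system
    return c
```

With coordinates ordered as $(a_0, a_1)$, `massey_omura([1, 0], [0, 1], f_gf4)` multiplies $\beta$ by $\beta^2$ and yields the coordinates of $\beta + \beta^2 = 1$. A parallel design would instantiate f once per output bit instead of looping.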

A PIPELINE ARCHITECTURE FOR THE SERIAL VERSATILE NORMAL BASIS MULTIPLIER
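In this section, we derive a pipeline architecture to implement the versatile normal basis multiplier. Let c be the product of a and b,

$$c = a \cdot b = \sum_{i=0}^{n-1}\sum_{j=0}^{n-1} a_i b_j\, \beta^{2^i}\beta^{2^j}. \tag{10}$$

In the normal basis, we have

$$\beta^{2^i}\beta^{2^j} = \sum_{k=0}^{n-1} \lambda_{ij}^{(k)}\, \beta^{2^k}, \qquad \lambda_{ij}^{(k)} \in GF(2). \tag{11}$$

Thus, we can get

$$c_k = \sum_{i=0}^{n-1}\sum_{j=0}^{n-1} a_i b_j\, \lambda_{ij}^{(k)}, \qquad 0 \le k \le n-1. \tag{12}$$

From the above analysis, we see that the important issue for building a versatile normal basis multiplier is to get the values of $\lambda_{ij}^{(k)}$. These values can be obtained if we know the transformation between the elements of the canonical basis and the elements of the normal basis, that is, the normal basis representation of the elements of the canonical basis.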
In the following, we define the multiplication table of the normal basis and use the basis element transformation formula to get the values of the multiplication table, and then obtain the $n \times n$ matrices $\lambda^{(k)}$. Finally, we illustrate the approach to build the versatile pipelined normal basis multiplier.

Definition 1. Let $N = \{\beta, \beta^2, \ldots, \beta^{2^{n-1}}\}$ be a normal basis in $GF(2^n)$. Then for any i, j ($0 \le i, j \le n-1$), $\beta^{2^i}\beta^{2^j}$ is a linear combination of $\beta, \beta^2, \ldots, \beta^{2^{n-1}}$ with coefficients in $GF(2)$. In particular,

$$\beta \cdot \beta^{2^i} = \sum_{j=0}^{n-1} T_{ij}\, \beta^{2^j}, \qquad T_{ij} \in GF(2), \tag{13}$$

where $T = (T_{ij})$ is an $n \times n$ matrix over $GF(2)$. We call T the multiplication table of the normal basis N. The number of nonzero entries in T is called the complexity of the normal basis N, denoted by $C_N$.
A multiplication table T and the matrices $\lambda^{(k)}$ always exist for a given irreducible polynomial that defines a normal basis in $GF(2^n)$ [12]. After the multiplication table T is obtained, the matrix $\lambda^{(k)}$ can be calculated according to (12). An example is shown below.
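Example 1. Let the irreducible polynomial be $P_1(x) = x^5 + x^4 + x^2 + x + 1$ and let $\beta$ be a root of the polynomial. Then the canonical basis is $\{1, \beta, \beta^2, \beta^3, \beta^4\}$ and the normal basis is $\{\beta, \beta^2, \beta^4, \beta^8, \beta^{16}\}$. We can get the following normal basis representation for the elements of the canonical basis:

$$\begin{aligned}
1 &= \beta + \beta^2 + \beta^4 + \beta^8 + \beta^{16},\\
\beta &= \beta,\\
\beta^2 &= \beta^2,\\
\beta^3 &= \beta + \beta^8,\\
\beta^4 &= \beta^4.
\end{aligned} \tag{14}$$

The appendix illustrates how to obtain the normal basis representation of $\beta^3$.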
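Thus any element $\beta^i$ ($i \ge 5$) can be reduced to its canonical basis representation and converted to the corresponding normal basis representation by the basis element transformation formula (14). For instance,

$$\beta^5 = \beta^4 + \beta^2 + \beta + 1 = \beta^8 + \beta^{16}.$$

Then we can get the multiplication table T for the given $P_1(x)$, where row i holds the normal basis coordinates of $\beta \cdot \beta^{2^i}$:

$$T = \begin{pmatrix}
0 & 1 & 0 & 0 & 0\\
1 & 0 & 0 & 1 & 0\\
0 & 0 & 0 & 1 & 1\\
0 & 1 & 1 & 0 & 0\\
0 & 0 & 1 & 0 & 1
\end{pmatrix}.$$

The product of a and b is

$$c = a \cdot b = \sum_{i=0}^{4}\sum_{j=0}^{4} a_i b_j\, \beta^{2^i + 2^j}.$$

As $\beta^6 = (\beta^3)^2$, $\beta^{10} = (\beta^5)^2$, $\beta^{18} = (\beta^9)^2$, $\beta^{12} = (\beta^6)^2$, $\beta^{20} = (\beta^5)^4$, $\beta^{24} = (\beta^3)^8$, and $\beta^{32} = \beta$, we can easily obtain the normal basis representation of these elements by cost-free cyclic shifts of the rows of the multiplication table T and get the matrix

$$\lambda^{(4)} = \begin{pmatrix}
0 & 0 & 1 & 0 & 1\\
0 & 0 & 1 & 1 & 0\\
1 & 1 & 0 & 0 & 0\\
0 & 1 & 0 & 1 & 0\\
1 & 0 & 0 & 0 & 0
\end{pmatrix}.$$

It can be readily seen that the matrices $\lambda^{(k)}$ ($0 \le k \le n-1$) are symmetric.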
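The construction of T and $\lambda^{(4)}$ for $P_1(x)$ can be reproduced by brute force. The following is a minimal sketch rather than the authors' procedure: field elements are stored as polynomial-basis bit masks, the coordinates are found by exhaustive search (practical only for small n), and the row convention for T (row i holds the coordinates of $\beta \cdot \beta^{2^i}$) is an assumption of this sketch. All function names are hypothetical.

```python
# Sketch: multiplication table T and matrix lambda^(k) for P_1(x), by brute force.
N = 5
POLY = 0b110111  # P_1(x) = x^5 + x^4 + x^2 + x + 1


def gf_mul(x, y):
    """Multiply two field elements held as polynomial-basis bit masks."""
    r = 0
    while y:
        if y & 1:
            r ^= x
        y >>= 1
        x <<= 1
        if (x >> N) & 1:
            x ^= POLY
    return r


def normal_basis():
    """Return [beta, beta^2, beta^4, beta^8, beta^16] as bit masks."""
    basis, e = [], 0b10  # beta is the polynomial "x"
    for _ in range(N):
        basis.append(e)
        e = gf_mul(e, e)
    return basis


def to_normal(x, basis):
    """Normal-basis coordinates of x, found by trying all 2^n combinations."""
    for m in range(1 << N):
        acc = 0
        for i in range(N):
            if (m >> i) & 1:
                acc ^= basis[i]
        if acc == x:
            return [(m >> i) & 1 for i in range(N)]
    raise ValueError("the conjugates of beta do not span the field")


def multiplication_table(basis):
    """Row i: coordinates of beta * beta^(2^i) (row convention assumed here)."""
    return [to_normal(gf_mul(basis[0], basis[i]), basis) for i in range(N)]


def lambda_matrix(basis, k):
    """Entry (i, j): coefficient of beta^(2^k) in beta^(2^i) * beta^(2^j)."""
    return [[to_normal(gf_mul(basis[i], basis[j]), basis)[k] for j in range(N)]
            for i in range(N)]


B = normal_basis()
T = multiplication_table(B)
LAM4 = lambda_matrix(B, 4)
C_N = sum(sum(row) for row in T)  # complexity of the normal basis
```

Under these assumptions the sketch gives $C_N = 9 = 2n - 1$ for this basis, and each entry of `LAM4` equals `T[(i - j) % 5][(4 - j) % 5]`, which is the cyclic-shift relation between the rows of T and $\lambda^{(4)}$ described in the example.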
From the matrix $\lambda^{(4)}$, we can get the following logic function to compute the most significant bit of the product ab in $GF(2^5)$ defined by the irreducible polynomial $P_1(x)$:

$$c_4 = a_0 b_2 + a_2 b_0 + a_0 b_4 + a_4 b_0 + a_1 b_2 + a_2 b_1 + a_1 b_3 + a_3 b_1 + a_3 b_3.$$

In the normal basis representation, the logic function $f(a_0, a_1, \ldots, a_{n-1};\; b_0, b_1, \ldots, b_{n-1})$ that is used to get the most significant bit ($c_{n-1}$) of the product can also be used to get the remaining bits ($c_{n-2}, c_{n-3}, \ldots, c_0$) of the product, provided that the inputs of the function are cyclically shifted [10]. Thus, we may choose one matrix from the matrices $\lambda^{(k)}$ ($0 \le k \le n-1$) and input the values of the upper triangle of the symmetric matrix to perform the multiplication.
A VLSI array architecture to implement the versatile $GF(2^n)$ normal basis multiplier is proposed and illustrated in Figures 1 and 2. It is built from 3-input AND gates and 2-input XOR gates. We use the 3-input AND gates to compute $a_i b_j \lambda_{ij}^{(n-1)}$ in the X-Y dimension, and compute the sum of the terms $a_i b_j \lambda_{ij}^{(n-1)}$ by a binary tree of 2-input XOR gates in the Z dimension. The architecture requires $n^2$ 3-input AND gates and $n^2 - 1$ 2-input XOR gates. The time delay for generating one bit of the product is $T_{AND3} + 2\lceil \log_2 n \rceil T_{XOR}$, where $T_{AND3}$ is the time delay of a 3-input AND gate and $T_{XOR}$ is the time delay of a 2-input XOR gate. We can get all bits of the product by cyclically shifting the input coefficients of a and b. As the irreducible polynomial changes far less frequently than the multiplicands, we can store the elements of the matrix $\lambda^{(n-1)}$ in registers once the irreducible polynomial has been decided.
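The gate counts and the depth of the XOR tree can be illustrated with a small software model of one output bit. This is a sketch under stated assumptions, not a description of the circuit in Figures 1 and 2: the matrix `LAM4` is $\lambda^{(4)}$ recomputed for $P_1(x)$ of Example 1, and the functions `product_bit` and `xor_depth_bound` are hypothetical names.

```python
# Sketch: one product bit as n^2 three-input ANDs feeding a binary XOR tree.
from math import ceil, log2

# lambda^(4) for P_1(x) = x^5 + x^4 + x^2 + x + 1 (recomputed; an assumption here)
LAM4 = [
    [0, 0, 1, 0, 1],
    [0, 0, 1, 1, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 0, 1, 0],
    [1, 0, 0, 0, 0],
]


def product_bit(a, b, lam):
    """Return (bit, and_gates, xor_gates, xor_levels) for one output bit."""
    n = len(a)
    # X-Y plane: every cell ANDs a_i, b_j and the stored lambda entry.
    level = [a[i] & b[j] & lam[i][j] for i in range(n) for j in range(n)]
    and_gates, xor_gates, depth = len(level), 0, 0
    # Z dimension: pairwise XOR reduction until a single bit remains.
    while len(level) > 1:
        nxt = [level[i] ^ level[i + 1] for i in range(0, len(level) - 1, 2)]
        xor_gates += len(nxt)
        if len(level) % 2:
            nxt.append(level[-1])  # an odd element passes through unchanged
        level = nxt
        depth += 1
    return level[0], and_gates, xor_gates, depth


def xor_depth_bound(n):
    """Number of XOR levels assumed in the delay expression: 2 * ceil(log2 n)."""
    return 2 * ceil(log2(n))
```

For n = 5 the model uses 25 AND gates and 24 XOR gates, and the tree has 5 levels, which is within the bound $2\lceil \log_2 5 \rceil = 6$ used in the delay expression.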
The multiplication proceeds as follows: for $k = n-1, n-2, \ldots, 0$, the array outputs $c_k = \sum_i \sum_j a_i b_j \lambda_{ij}^{(n-1)}$ computed on the current register contents, and the registers holding a and b are then cyclically shifted by one position as in (9). The proposed architecture can be implemented as a pipeline. In the first n clock cycles, the coefficients of a and b are fed sequentially into the buffers. In the following n clock cycles, we get the bits of the product by cyclically shifting the registers that store the original coefficients of a and b. In the meantime, the next pair of multiplicands can be fed into the buffers, so that the second product can be computed immediately after the first one is finished.
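The shift-and-evaluate procedure can be sketched in software and checked against ordinary polynomial-basis multiplication. The sketch below assumes the field of Example 1; `LAM4` is $\lambda^{(4)}$ recomputed for $P_1(x)$, the pipelined loading of the buffers is not modelled, and all function names are hypothetical.

```python
# Sketch: bit-serial versatile multiplication with one stored lambda matrix.
N = 5
POLY = 0b110111  # P_1(x) = x^5 + x^4 + x^2 + x + 1 (Example 1)
LAM4 = [  # lambda^(4), recomputed for P_1(x); an assumption of this sketch
    [0, 0, 1, 0, 1],
    [0, 0, 1, 1, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 0, 1, 0],
    [1, 0, 0, 0, 0],
]


def gf_mul(x, y):
    """Reference multiplier on polynomial-basis bit masks."""
    r = 0
    while y:
        if y & 1:
            r ^= x
        y >>= 1
        x <<= 1
        if (x >> N) & 1:
            x ^= POLY
    return r


def from_normal(coeffs):
    """Convert normal-basis coordinates to a polynomial-basis bit mask."""
    x, e = 0, 0b10  # e runs through beta, beta^2, beta^4, ...
    for bit in coeffs:
        if bit:
            x ^= e
        e = gf_mul(e, e)
    return x


def multiply(a, b, lam):
    """Product in normal-basis coordinates; lam is lambda^(n-1)."""
    n = len(a)
    ra, rb = list(a), list(b)  # shift registers holding the operands
    c = [0] * n
    for t in range(n):         # one product bit per clock cycle
        bit = 0
        for i in range(n):
            for j in range(n):
                bit ^= ra[i] & rb[j] & lam[i][j]
        c[n - 1 - t] = bit     # bits appear in the order c_{n-1}, ..., c_0
        ra = ra[-1:] + ra[:-1]  # cyclic shift of both registers
        rb = rb[-1:] + rb[:-1]
    return c
```

Changing the field only requires loading a different matrix into `lam`, which mirrors the role of the stored $\lambda^{(n-1)}$ registers. Agreement of `from_normal(multiply(a, b, LAM4))` with `gf_mul(from_normal(a), from_normal(b))` over all 1024 operand pairs is a convenient correctness check for this example.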
In a restricted computing environment, we can iterate over a single level of components of the proposed multiplier (Figure 2) to obtain the low-cost serial architecture illustrated in Figure 3, which implements the same computation. In each clock cycle, one level of the array, that is, n of the $n^2$ terms $a_i b_j \lambda_{ij}^{(n-1)}$, is evaluated and added to an accumulator, so that one bit of the product is available after n cycles. The low-cost versatile normal basis multiplier in $GF(2^n)$ requires n 3-input AND gates and n 2-input XOR gates. The time delay for generating one bit of the product is $n(T_{AND3} + (\lceil \log_2 n \rceil + 1)T_{XOR})$.
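A software model of the low-cost variant is sketched below. It assumes that each clock cycle processes one row of the matrix (a fixed i and all j), which is one plausible reading of "one level" and is not confirmed by the text; `LAM4` is $\lambda^{(4)}$ recomputed for $P_1(x)$ of Example 1, and `multiply_low_cost` is a hypothetical name.

```python
# Sketch: low-cost serial multiplier, one row of AND gates reused every cycle.
LAM4 = [  # lambda^(4) for P_1(x) = x^5 + x^4 + x^2 + x + 1 (recomputed; assumed)
    [0, 0, 1, 0, 1],
    [0, 0, 1, 1, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 0, 1, 0],
    [1, 0, 0, 0, 0],
]


def multiply_low_cost(a, b, lam):
    """Return (c, clock_cycles) with c in normal-basis coordinates."""
    n = len(a)
    ra, rb = list(a), list(b)
    c = [0] * n
    cycles = 0
    for t in range(n):              # one product bit per outer iteration
        acc = 0                     # accumulator register, cleared per bit
        for i in range(n):          # one row of the array per clock cycle
            row = 0
            for j in range(n):      # n three-input ANDs, XOR-reduced
                row ^= ra[i] & rb[j] & lam[i][j]
            acc ^= row              # the extra XOR that feeds the accumulator
            cycles += 1
        c[n - 1 - t] = acc
        ra = ra[-1:] + ra[:-1]      # cyclic shift before the next bit
        rb = rb[-1:] + rb[:-1]
    return c, cycles
```

The model needs $n^2$ clock cycles for a full product (25 for n = 5), in exchange for reusing a single row of n AND gates. This matches the trade-off between area and delay described above.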
The proposed versatile normal basis multipliers have modular structures and regular interconnections, which makes them suitable for both high-speed and area-restricted VLSI implementations. Table 1 compares the space and time complexity of our new multipliers with previous works. The input ports of the proposed versatile multiplier are almost the same as those of a nonversatile multiplier, since the finite field parameters can be loaded into the multiplier through the input ports of the multiplicands (a and b), selected by a one-bit control signal at configuration time. The finite field parameters do not need to be reconfigured while the multiplier is running, until the application environment changes. Thus the hardware cost is greatly reduced compared with a nonversatile multiplier, which has to be redesigned and reimplemented whenever the finite field parameters change.

Table 1: Comparison of space and time complexity.

| Multiplier | Type | # XOR gates | # AND gates | Time delay |
|---|---|---|---|---|
| Wang-MOM [10] | Nonversatile | $2n - 2$ | $2n - 1$ | $n(T_{AND} + (\lceil \log_2 n \rceil + 1)T_{XOR})$ |
| Li-CVM [9] (canonical basis) | Versatile | $2n^2$ | $2n^2$ | $n(T_{AND} + 2T_{XOR})$ |
| Prop. multiplier (Figure 2) | Versatile | $n^2 - 1$ | $n^2$ (3-input) | $n(T_{AND3} + 2\lceil \log_2 n \rceil T_{XOR})$ |
| Prop. low-cost multiplier (Figure 3) | Versatile | $n$ | $n$ (3-input) | $n^2(T_{AND3} + (\lceil \log_2 n \rceil + 1)T_{XOR})$ |

Moreover, the proposed architecture in $GF(2^n)$ can be easily extended to the finite field $GF(2^{2n})$. One solution is to use two basic $GF(2^n)$ architectures to implement the multiplication in $GF(2^{2n})$; an alternative is to perform the $GF(2^{2n})$ multiplication serially using only one basic $GF(2^n)$ architecture.

CONCLUSION
In this paper, architectures for finite field multiplication based on the normal basis have been proposed. The architectures require simple control signals and have regular local interconnections. As a consequence, they are well suited to VLSI implementation. The versatility of this VLSI array multiplier widens its range of application: the same multiplier can be used in different application environments, such as elliptic curve cryptosystems and Reed-Solomon encoders and decoders. The proposed multiplier can be easily extended to larger n for more security. Moreover, the structures can be modified to perform fast exponentiation and inversion. Finally, a low-cost and space-efficient serial multiplier can be derived, which is feasible in restricted computing environments and embedded systems.

APPENDIX
Let the irreducible polynomial be $P_1(x) = x^5 + x^4 + x^2 + x + 1$ and let $\beta$ be a root of the polynomial. We show the procedures for computing the multiplication table T and the matrix $\lambda^{(4)}$.
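One way to obtain the normal basis representation of a canonical basis element such as $\beta^3$ is to solve a linear system over $GF(2)$ whose columns are the normal basis elements written in the canonical basis. The following is a minimal sketch of that idea using Gauss-Jordan elimination; it is offered as an illustration and may differ from the procedure the authors use. Elements are stored as bit masks (bit r is the coefficient of $\beta^r$), and the function names are hypothetical.

```python
# Sketch: normal-basis coordinates of a canonical element via GF(2) elimination.
N = 5
POLY = 0b110111  # P_1(x) = x^5 + x^4 + x^2 + x + 1


def gf_mul(x, y):
    """Multiply two field elements held as polynomial-basis bit masks."""
    r = 0
    while y:
        if y & 1:
            r ^= x
        y >>= 1
        x <<= 1
        if (x >> N) & 1:
            x ^= POLY
    return r


def normal_basis():
    """Return [beta, beta^2, beta^4, beta^8, beta^16] as bit masks."""
    basis, e = [], 0b10
    for _ in range(N):
        basis.append(e)
        e = gf_mul(e, e)
    return basis


def solve_gf2(columns, target):
    """Solve sum_i x_i * columns[i] = target over GF(2); return [x_0, ..., x_{n-1}]."""
    n = len(columns)
    rows = []
    for r in range(n):
        # Row r: bit r of every column, with the target bit stored at position n.
        row = 0
        for i, col in enumerate(columns):
            row |= ((col >> r) & 1) << i
        row |= ((target >> r) & 1) << n
        rows.append(row)
    for col in range(n):
        # Find a pivot row for this unknown and clear the column elsewhere.
        pivot = next(r for r in range(col, n) if (rows[r] >> col) & 1)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(n):
            if r != col and (rows[r] >> col) & 1:
                rows[r] ^= rows[col]
    return [(rows[i] >> n) & 1 for i in range(n)]


BETA_CUBED = 0b01000
COORDS = solve_gf2(normal_basis(), BETA_CUBED)
```

In this sketch `COORDS` holds the coefficients of $\beta, \beta^2, \beta^4, \beta^8, \beta^{16}$ in that order, so the result `[1, 0, 0, 1, 0]` corresponds to $\beta^3 = \beta + \beta^8$. The same call with other targets yields the remaining rows needed for the multiplication table.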