Using the complement of the cosine to compute trigonometric functions

Guerrero Martos, David; Millán Calderón, Alejandro; Juan Chico, Jorge; Viejo Cortés, Julian; Bellido Díaz, Manuel J.; Ruiz-de-Clavijo Vazquez, Paulino; Ostúa Arangüena, Enrique

doi:10.1186/s13634-020-00692-5

Research
Open access
Published: 22 July 2020

Using the complement of the cosine to compute trigonometric functions

David Guerrero Martos ORCID: orcid.org/0000-0002-1207-0335¹,
Alejandro Millán Calderón¹,
Jorge Juan Chico¹,
Julian Viejo Cortés¹,
Manuel J. Bellido Díaz¹,
Paulino Ruiz-de-Clavijo Vazquez¹ &
…
Enrique Ostúa Arangüena¹

EURASIP Journal on Advances in Signal Processing volume 2020, Article number: 35 (2020) Cite this article

2439 Accesses
Metrics details

Abstract

The computation of the sine and cosine functions is required in devices ranging from application-specific signal processors to general purpose floating-point units. Even in the latter case, the required functionality can be reduced to computing the sine and/or cosine of multiples of a constant angle. The latency of a sine/cosine generator can be reduced by using look-up tables. However, a direct implementation with look-up tables may be unfeasible if the input space is huge. In such a case, look-up tables with a number of entries lower than the size of the input space can be used indirectly. In previously published methods, the reduction in the number of table entries is obtained at the expense of increasing the table width and the computational cost. This paper introduces an alternative technique that makes it possible to reduce the size of the look-up tables as well as the required multiplications. The proposed technique can be used to implement sine/cosine generators of huge input space. It has been used to implement several twiddle factor generators in reconfigurable hardware and has enabled the number of look-up tables to be reduced by between 6 and 26% with respect to previous table-based techniques. Also, these implementations are about 50% faster than those based on Volder’s algorithm.

1 Introduction

The computation of sine and cosine functions is fundamental in a wide range of applications, including that of signal processing [1, 2]. Obvious examples are the computation of discrete cosine transform (DCT), discrete sine transform (DST), and their inverses (IDCT and IDST) [3]. A fused sine-and-cosine implementation is of major interest because various methods compute both and numerous applications require both [1]. In this paper, the focus is on the implementation of functional units that provide the sine and cosine of multiples of a constant angle ϕ, that is, sin(nϕ) and cos(nϕ), where n is an integer given as an input. Applications of such functional units include the following:

Implementing the sine and/or cosine functions in arithmetic units. For example, suppose an arithmetic unit must compute the sine and/or cosine of a number x using the IEEE 754-1985 double-precision format [4]. x is coded in a 64-bit word with 3 fields called sign (1 bit), exponent (11 bits), and significand (52 bits). The significand is a number in the range [1,2−2⁻⁵²] coded in fixed-point, while the exponent is an integer laying in the range [−1022,1023] coded in excess 1023. If the exponent is lower than −27, then the unit can simply return 1 as the cosine and x as the sine, assuming rounding to the nearest representable value [5]. Otherwise, it can return the sine and/or cosine of nϕ, where ϕ is the constant 2⁻²⁷⁻⁵²=2⁻⁷⁹ and n is the integer x2⁷⁹.
Generating the set of coefficients, called twiddle factors, of a discrete Fourier transform (DFT) [6]. The DFT of a complex sequence x of length L is another complex sequence X of the same length defined by $X(k)=\sum ^{L-1}_{t=0} x(t)W_{L}^{tk}$, where W_L=e^ϕi and ϕ=−2π/L. The twiddle factors are the integer powers of W_L, and there are L different twiddle factors. Thus, the twiddle factor of index n is $W_{L}^{n}=(e^{\phi i})^{n}=e^{n \phi i}=\sin (n \phi)i+\cos (n \phi)$.

Since the sine and cosine functions are computationally expensive, in applications where a low latency is required, the generator is implemented using look-up tables (LUT). This implementation approach is problematic if the input space is large. For example, consider the arithmetic unit previously mentioned: as stated, exponents lower than −27 can be dismissed. However, even if the input angle is restricted to [0,2π), it is still necessary to consider 30 different exponent values and 2⁵² different significand values. A direct implementation would therefore require an LUT of 30∗2⁵² entries. Another example is given by the DFT engines required in applications such as required in Power Line Communications (PLC) [7], Digital Video Broadcasting—Terrestrial 2 (DVB-T2) [8], photon counting [9], and radio astronomy [10]. In those applications, the sequence can be as long as 2¹³, 2¹⁵, 2²⁷, and 2³⁰, respectively, and hence the coefficient tables are large in comparison with other elements of the DFT engine [11]. In this paper, we propose an innovative technique to reduce the resources required to implement a sine/cosine generator in application-specific integrated circuit (ASIC) and configurable logic. We have implemented an open-source tool to automate the design of twiddle factor generators of arbitrary size and precision using the proposed technique.

The rest of the paper is organized as follows. In the next section, the notation used is introduced and optimization techniques are presented to reduce the number of entries of the required LUT to a number proportional to the input space. In Section 3, optimization techniques are given that enable LUTs to be employed with a total number of entries that grows sublinearly with the input space. The new proposed technique is introduced in Section 4. The experiments are described in Section 5. The corresponding performance results are shown and discussed in Section 6. The last section provides a summary of the conclusions.

2 Argument reduction

As mentioned in the introduction, our objective is to efficiently implement a functional unit that provides sin(nϕ) and cos(nϕ), where n is an integer provided as an input to the unit, and ϕ is a constant angle that depends on the application. Hereinafter, the input of the functional unit will be denoted as I and the number of bits of I will be denoted as w. Furthermore, the following definitions will be used:

Definition 1

A real number ϕ is trigonometric rational if and only if $\frac {\phi }{\pi }$ is rational.

For example, the angle ϕ=−2π/L, used in the definition of the twiddle factors in Section 1, is trigonometric rational, while the angle ϕ=2⁻⁷⁹ mentioned in the arithmetic unit example of Section 1 is not.

Definition 2

The trigonometric Carmichael function of a trigonometric rational number ϕ is the minimum natural number λ_t(ϕ) such that $\frac {\lambda _{t}(\phi)\phi }{2\pi }$ is an integer.

This definition is useful in the calculation of the size of the output space of the generator. This size is the minimum of λ_t(ϕ) and the size of the input space. Note that λ_t(0)=1. In the DFT example, λ_t(ϕ) is equal to the length of the transform. The function λ_t can also be employed to make the following simplification. Suppose that ϕ has been defined as a trigonometric rational number whose absolute value is very large: |ϕ|≫2π. In this case, an angle α with |α|<2π can be found such that the functionality of the generator, that is, computing sin(nϕ) and cos(nϕ), is equivalent to computing sin(nα) and cos(nα). In order to obtain such α:

1.
Take the integer $k=\frac {\lambda _{t}(\phi)\phi }{2\pi }$
2.
Take the remainder r of the division $\frac {k}{\lambda _{t}(\phi)}$. Note that k and r have the same sign and |r|<λ_t(ϕ).
3.
$\alpha =\frac {r 2\pi }{\lambda _{t}(\phi)}$.

Definition 3

The trigonometric Shannon entropy of a trigonometric rational number ϕ is H_t(ϕ)=log₂(λ_t(ϕ)).

If ϕ is trigonometric rational, in order to maximize the size of the output space of the generator, that is, get an output space of size λ_t(ϕ), the minimum number of bits required to code the input is ⌈H_t(ϕ)⌉.

Definition 4

ϕ is trigonometric binary if and only if it is trigonometric rational and H_t(ϕ) is an integer.

The latter definition is relevant since, in many applications, the constant angle ϕ is trigonometric binary. For example, consider the algorithms designed to compute efficiently the DFT called fast Fourier transform (FFT) [12] algorithms: many of these algorithms require the length of the transform L to be a power of 2 [13], that is, log₂(L) must be an integer, and hence ϕ must be trigonometric binary since H_t(ϕ)=log₂(λ_t(−2π/L))=log₂(L). Moreover, in applications where ϕ is trigonometric binary, it is irrelevant whether the representation of n is either unsigned or two’s complement as long as w≥H_t(ϕ). As an example, consider the functional unit specified in [1]. This unit computes the sine and cosine of πx where x is a number in the interval [−1,1) coded in fixed-point two’s complement. Let S be the value represented by the input I in integer two’s complement, x=S/2^w−1, and hence πx=Sπ/2^w−1. Thus, the functional unit computes the sine and cosine of Sϕ, where ϕ=π/2^w−1. Let n be the value represented by I in unsigned integer representation. It is easy to prove that sin(Sϕ)= sin(nϕ) and cos(Sϕ)= cos(nϕ). Therefore, the functionality of the unit is equivalent to the computation of the sine and cosine of nϕ. This is exemplified in Table 1 for w=3.

Table 1 Values returned by the functional unit specified in [1] when the input I has exactly three bits (ϕ=π/2³⁻¹=π/4)

Full size table

Hereinafter, an unsigned notation for n is assumed. In the following subsections, we will see optimization techniques that require a trigonometric rational value of ϕ. In these subsections, the trigonometric Carmichael function of ϕ, λ_t(ϕ), is abbreviated to L.

2.1 Periodicity

If the size of the input space of the functional unit is greater than L, then the periodicity of the sine and cosine can be used to compute sin(nϕ) and cos(nϕ) in the following way:

1.
Compute n mod L. As noted by [1], if ϕ is trigonometric binary, then this computation has no cost since the result is simply the H_t(ϕ) least significant bits of the input I.
2.
Use a subgenerator to compute the sine and cosine of (n mod L)ϕ. The input space of the subgenerator is $\mathbb {Z}_{L}$, smaller than the original input space, and hence, it can be implemented using a smaller LUT.

In the optimization shown in the following subsections, it is assumed that the input space of the generator is $\mathbb {Z}_{L}$.

2.2 Sign reduction

If the input space of the functional unit is $\mathbb {Z}_{L}$, then it is possible to implement it with another subgenerator with the same value of ϕ but whose input space is $\mathbb {Z}_{\lfloor L/2 \rfloor +1}$, that is, its size is roughly half of the size of the original input space. This optimization takes into account the following trigonometric identities:

$$ \begin{aligned} \sin(\alpha)=-\sin(2\pi-\alpha)\\ \cos(\alpha)=\cos(2\pi-\alpha) \end{aligned} $$

(1)

If n≤L/2, then the input of the subgenerator is n and its output is the output of the functional unit. Otherwise, the input of the subgenerator is L−n, the cosine output of the unit is the cosine output of the subgenerator, and the sine output of the unit is the opposite of the sine output of the subgenerator. Note that if ϕ is trigonometric binary, then L−n can be obtained by simply taking the two’s complement. In the next subsection, another optimization is presented for the implementation of the subgenerator when L is even.

2.3 Quadrant reduction

If L is even and the size of the input space of the functional unit is ⌊L/2⌋+1=L/2+1, then it can be implemented using a subgenerator with the same value of ϕ but whose input space is reduced to $\mathbb {Z}_{\lfloor L/4 \rfloor +1}$. This optimization uses the following trigonometric identities:

$$ \begin{aligned} \sin(\alpha)=\sin(\pi-\alpha)\\ \cos(\alpha)=-\cos(\pi-\alpha) \end{aligned} $$

(2)

If n≤L/4, then the input of the subgenerator is n and its output is the output of the functional unit. Otherwise, the input of the subgenerator is L/2−n, the sine output of the unit is the sine output of the subgenerator, and the cosine output of the unit is the opposite of the cosine output of the subgenerator. Again, the computation of L/2−n is simply a two’s complement if ϕ is trigonometric binary. In turn, the optimization described in the next subsection can be used to implement the subgenerator if L is multiple of 4.

2.4 Octant reduction

Finally, if L is a multiple of 4 and the input space of the functional unit is $\mathbb {Z}_{\lfloor L/4 \rfloor +1}=\mathbb {Z}_{L/4+1}$, then it can be implemented with a subgenerator with the same value of ϕ but whose input space is reduced to $\mathbb {Z}_{\lfloor L/8 \rfloor +1}$ by applying the following trigonometric identities:

$$ \begin{aligned} \sin(\alpha)=\cos(\pi/2-\alpha)\\ \cos(\alpha)=\sin(\pi/2-\alpha) \end{aligned} $$

(3)

If n≤L/8, then the input of the subgenerator is n and its output is the output of the functional unit. Otherwise, the input of the subgenerator is L/4−n, the sine output of the unit is the cosine output of the subgenerator and the sine output of the unit is the cosine output of the subgenerator. Once more, a simple two’s complement provides L/4−n if ϕ is trigonometric binary.

With the previous optimizations, a sine/cosine generator can be implemented with an LUT of ⌊L/8⌋+1 entries. For example, the functional unit described in [1] previously mentioned can be implemented as shown in Fig. 1. As previously discussed, the functionality of that unit is equivalent to computing the sine and cosine of nϕ, where n is the number provided by input I in unsigned integer notation, the angle ϕ is 2π/2^w, and w is the number of bits of I. In this case, ϕ is a trigonometric binary. In this implementation, the output is provided in some type of sign-magnitude notation such as one of the IEEE 754-1985 floating-point formats, and w>3 so λ_t(ϕ) is multiple of 4 and an LUT of only 2^w−3+1 entries is required. The LUT of the figure should return the sine and cosine of angles in the range [0,π/4] and, since they are all positive, there is no need to store the sign bits. Instead, they are computed using a simple XOR gate (4c). The LUT has been implemented using a direct access memory. In order to prevent the problem of dealing with a direct access memory with a number of positions that is not a power of two, the access of the entry of the LUT corresponding to n=2^w−3 is detected by a simple logic gate (4a) and is treated separately. In that case, the LUT returns the sine and cosine of π/4, that is, $1/\sqrt []{2}$, using a pair of multiplexers (2b). The address lines of the memory are fed with the w−3 least significant bits of I or with its two’s complement depending on I_w−3, using the adder (3) and the multiplexer (2a). The gate (4b) is employed to ascertain whether the magnitude of the sine and the cosine should be interchanged with the multiplexers (2c).

3 Sublinear optimizations

The optimizations described in the previous sections have the following drawbacks:

They require a subgenerator with an input space greater than λ_t(ϕ)/8, that is, its input space grows linearly with λ_t(ϕ). In many applications, the subgenerator cannot be directly implemented using an LUT with a number of entries proportional to the input space since it is excessively large. For example, even if the octant optimization could be directly applied to the arithmetic unit mentioned in Section 1, it would only reduce the size of the input space of the required subgenerator to roughly 30∗2⁴⁹.
They can only be directly applied if ϕ is trigonometric rational. Furthermore, the quadrant and octant optimizations require λ_t(ϕ) to be even and a multiple of 4, respectively. Hence, in order to apply them in the arithmetic unit of the example of Section 1, a workaround similar to that shown in [1, 14] is necessary. For example, assuming the angle is positive, the arithmetic unit could execute the following steps to compute the sine and cosine:
1. 1.
  Divide the angle by 2π. This can be implemented efficiently by multiplying the angle by the precomputed value of the reciprocal of 2π.
2. 2.
  Take the fixed-point representation I of the fractional part of the previous division.
3. 3.
  Return the sine and cosine of nϕ, where n is the number represented by I in unsigned integer notation, $\phi =\frac {2\pi }{2^{w}}$, and w is the width of I.
This last step can be carried out by a generator that can be implemented using the optimizations described in the previous section. However, this approach has its own drawbacks: first, the cost of a multiplication is introduced; second, the reciprocal of 2π is not rational, and hence, its exact representation cannot be stored. Moreover, even if we could use an exact representation of the reciprocal of 2π, the exact product of the angle by the reciprocal is not rational unless the angle is zero, that is, the exact result of the division of the angle by 2π is not rational. Hence, its representation cannot be exact and an error is introduced [14].

In the following subsections, optimizations without these drawbacks are described.

3.1 Branching

This optimization, used in [1], accepts an arbitrary value of ϕ, although it was originally employed for a trigonometric rational value. When branching is applied, the generator is implemented using two subgenerators, M₁ and M₀, which we call branches. The inputs of the branches are denoted by A(1) and A(0), while the widths of these inputs are denoted by L(1) and L(0), respectively. These are chosen so that the width of the input of the generator, I, is w=L(1)+L(0). M₁ provides the sine and cosine of integer multiples of ϕ₁, that is, the sine and cosine of n₁ϕ₁, where n₁ is the value represented by A(1) and ϕ₁=2^L(0)ϕ. On the other hand, M₀ provides the sine and cosine of n₀ϕ₀, where n₀ is the value represented by A(0) and ϕ₀=ϕ. The least significant bits of I are connected to A(0), while the rest are connected to A(1). Since I is the concatenation of A(1) and A(0), the value represented by I is

$$ n=n_{1}2^{L(0)}+n_{0} $$

(4)

and hence,

$$ n\phi=n_{1} 2^{L(0)}\phi+n_{0}\phi=n_{1}\phi_{1}+n_{0}\phi_{0} $$

(5)

Since the sines and cosines of n₁ϕ₁ and n₀ϕ₀ are provided by M₁ and M₀, the sine and cosine of their sum can be computed by applying the following trigonometric identities:

$$ \begin{aligned} \sin(A+B)=\sin(A)\cos(B)+\cos(A)\sin(B)\\ \cos(A+B)=\cos(A)\cos(B)-\sin(A)\sin(B) \end{aligned} $$

(6)

Alternatively, we can say that each subgenerator M_k provides the complex $\sin (n_{k}\phi _{k})i+\cos (n_{k}\phi _{k})=e^{n_{k}\phi _{k} i}\phantom {\dot {i}\!}$, and the generator can provide the value e^nϕi by computing the complex product $\phantom {\dot {i}\!}e^{n_{1}\phi _{1} i}e^{n_{0}\phi _{0} i}$. Indeed, computing the product of two complexes, each of a unitary module, is equivalent to computing the sine and cosine of the sum of two angles from the sine and cosine of those angles and implies four real products, a real sum, and a real subtraction. A generalization of this branching technique was proposed in [15] to compute twiddle factors. Note that the sum of the sizes of the input spaces of M₁ and M₀ is minimum when L(1) and L(0) differ by no more than 1. In this case, such a sum grows with the square root of the size of the original input space, that is, sublinearly [15]. Accordingly to [1], floating-point sine/cosine applications will benefit from a fixed-point conversion of the datapath around these functions.

3.2 Tree generator

The implementation of the branches was not detailed in the previous subsection. In the generator described in [1], the branch M₀ computes its output using the Taylor series, while M₁ is implemented with an LUT of affordable size. Further optimization could be achieved if one or both branches were, in turn, implemented with sub-branches. This recursive application of the branch optimization is used by the tree generator described in [16]. In general, the tree generator requires a set of subgenerators that we will call leaves, complex multipliers, and, if the implementation is sequential or pipelined, registers. The following notation is employed for its description:

w: width of the input of the tree generator
I=I_w−1I_w−2…I₁I₀: input of the tree generator
$n=\sum _{t=0}^{w-1}{I_{t}2^{t}}$: number represented by the input of the tree generator
m: number of leaf subgenerators employed
M₀,M₁,…, M_m−1: the m leaves
L(k): width of the input of the leaf M_k
A(k)=A(k)_L(k)−1…A(k)₀: input of the leaf M_k
$n_{k}=\sum _{t=0}^{L(k)-1}{A(k)_{t}2^{t}}$: number represented by the input A(k)
$SL(k)=\sum _{t=0}^{k-1}{L(t)}=$

$\left \{\begin {array}{l} 0 \textrm { if} \ k=0 \\ L(k-1)+SL(k-1) \textrm { if}\ k>0 \end {array}\right.$: total number of input lines of the leaves with an index lower than k
ϕ_k: angle defined by ϕ_k=(2^SL(k))ϕ

Each leaf subgenerator M_k provides the sine and cosine of n_kϕ_k. The leaves are chosen such that the sum of the widths of their inputs is equal to the width of the input of the tree generator:

$$ w=SL(m)=\sum_{k=0}^{m-1}{L(k)} $$

(7)

The input lines of each leaf M_k are connected to the input lines of the tree generator from I_SL(k) to I_SL(k+1)−1, that is, each input line A(k)_t is connected to I_t+SL(k):

$$\begin{array}{*{20}l} A(0)=I_{L(0)-1}\ldots I_{1}I_{0} \end{array} $$

$$\begin{array}{*{20}l} A(1)=I_{L(0)+L(1)-1}\ldots I_{L(0)+1}I_{L(0)} \end{array} $$

$$\begin{array}{*{20}l} \vdots \end{array} $$

$$\begin{array}{*{20}l} A(m-1)=I_{w-1} \ldots I_{SL(m-1)+1}I_{SL(m-1)} \end{array} $$

Hence, the input value n represented by I becomes:

$$ \begin{aligned} n=\sum_{t=0}^{w-1}{I_{t}2^{t}}=\sum_{k=0}^{m-1}{\sum_{t=SL(k)}^{SL(k)+L(k)-1}{I_{t}2^{t}}}=\\ \sum_{k=0}^{m-1}{\sum_{t=0}^{L(k)-1}{I_{t+SL(k)}2^{t+SL(k)}}}=\\ \sum_{k=0}^{m-1}{\sum_{t=0}^{L(k)-1}{A(k)_{t}2^{t+SL(k)}}}=\\ \sum_{k=0}^{m-1}{(\sum_{t=0}^{L(k)-1}{A(k)_{t}2^{t})2^{SL(k)}}}=\sum_{k=0}^{m-1}{n_{k}2^{SL(k)}} \end{aligned} $$

(8)

and therefore the angle whose sine and cosine must be computed by the tree generator can be written as:

$$ n\phi=\sum_{k=0}^{m-1}{n_{k}2^{SL(k)}\phi}=\sum_{k=0}^{m-1}{n_{k}\phi_{k}} $$

(9)

Hence, the angle nϕ is the sum of the subangles n_kϕ_k, or, alternatively, the complex e^nϕi is the product $e^{n_{0}\phi _{0} i}e^{n_{1}\phi _{1} i} \ldots e^{n_{m-1}\phi _{m-1} i}\phantom {\dot {i}\!}$. Again, since the sine and cosine of the subangles are provided by the leaves, the sine and cosine of nϕ can be computed with complex multiplications. Taking this into account, the structure of the generator described in [16] becomes a directed rooted binary tree with m leaves. Each vertex corresponds to a component whose output is a complex of unitary module. Each internal vertex has exactly two children and corresponds to a complex multiplier that computes the product of the outputs of the components associated to these children. The components corresponding to the m leaves are the m subgenerators and provide the complex values $e^{n_{k}\phi _{k} i}\phantom {\dot {i}\!}$. The output of the tree generator is the output of the component corresponding to the root vertex. Hereinafter, the height of the tree will be denoted as h. The following recommendations may improve the efficiency of the design:

It is desirable to minimize the height of the tree h in order to reduce latency and rounding errors. This is achieved if the structure of the generator is a complete binary tree.
If each leaf is implemented with an LUT, the total number of entries is minimum when the width of the inputs of those LUTs differ by no more than 1. To this end, let q be the quotient obtained by dividing w by m, and let r be the remainder. A total of r LUTs must have inputs of width q+1. The other LUTs must have inputs of width q.
If the above recommendation is followed, then the total number of entries decreases when m increases. For a fixed height h, the maximum possible value of m is 2^h, and therefore, the total number of entries can be minimized by using 2^h leaves.

In order to ascertain the power of this approach, suppose we use a complete binary tree with height h=⌊log₂(w)⌋. In this case, the number of subgenerators m would be no greater than w, and each subgenerator would have no more than 2 input lines. If each subgenerator is implemented with an LUT, then an upper bound on the total number of entries is 4w, that is, the total number of entries grows logarithmically with the size of the input space of the tree generator. Hence, the implementation of a sine/cosine generator with an input space as large as that required by the arithmetic unit mentioned in Section 1 is feasible with a tree generator. Note that a tree generator can be combined with the argument reduction mentioned in Section 2. For example, in [1], argument reduction is first applied, and hence, only a subgenerator with an input space of roughly 1/8 of the original size is required. That subgenerator is then implemented with a tree generator of height h=1.

In the following sub-subsections, we will see several optimizations that can be applied to the tree generator. In the rest of the paper, ϕ>0 is assumed for the sake of simplicity, although in practice this is not a restriction since cos(nϕ)= cos(n|ϕ|) and sin(nϕ)=sgn(ϕ)∗ sin(n|ϕ|).

3.2.1 Quadrant restriction

This optimization can be applied to quadrant restricted sine/cosine generators, which are defined as follows:

Definition 5

Given a functional unit with an integer input n≥0 that computes one or more trigonometric functions of nϕ, where ϕ>0 is a constant, the unit is quadrant restricted if and only if $\frac {\pi }{2\phi }$ is an upper bound on its input space.

If a sine/cosine generator is quadrant restricted, then it must compute the sine and cosine of an angle nϕ in the interval [0,π/2]. Since both functions are positive in that interval, the following optimizations are possible:

As in the example of Section 2, if the generator is implemented with an LUT, there is no need to store the sign bits.
If it is implemented with a tree generator, no signed adders, subtracters, nor multipliers are required.

For example, the tree generator used in [1] is quadrant restricted, and hence, the complex multiplier requires no signed arithmetic components and the LUT employed does not need to store the sign bits. Note that even if a tree generator is not quadrant restricted, it may contain quadrant restricted branches that can benefit from these optimizations.

3.2.2 Leading zeros of the sine

This optimization is useful when the sine values of a quadrant-restricted generator are coded in fixed-point. In this case, an upper bound on the sine output is sin(n_maxϕ), where n_max is the maximum of the input space. Consequently, if the k most significant bits of the fixed-point representation of sin(n_maxϕ) are 0, those bits of the sine output of the generator are always 0, and the following optimizations are possible:

If the generator is implemented with an LUT, then there is no need to store the k most significant bits of the sine.
If the generator feeds a complex multiplier of a tree generator, the size of its real multipliers can be reduced.

3.2.3 Leading ones of the cosine

This optimization is useful when the cosine values of a quadrant-restricted generator are coded in fixed-point using all the bits for the fractional part. In this case, the representable value nearest to cos(0)=1 corresponds to the word with all the bits equal to 1. A lower bound on the cosine output is cos(n_maxϕ), where n_max is the maximum of the input space, and therefore, if the k most significant bits of the fixed-point representation of cos(n_maxϕ) are 1, those bits of the cosine output of the generator are always 1. Hence, if the generator is implemented with an LUT, there is no need to store the k most significant bits of the cosine.

To exemplify these optimizations, suppose a generator must provide the sine and cosine of nϕ in fixed-point notation with 8 fractional bits and no integer bits rounding to the nearest representable value. In this example, ϕ=2π/2¹¹=π/2¹⁰, that is, it is trigonometric binary and λ_t(ϕ)=2¹¹ is a multiple of 4, and hence, we first apply argument reduction and treat the case n=2⁸ separately as in the example of Section 2. A subgenerator still has to be subsequently implemented with an input space of size 2⁸ (w=H_t(ϕ)−3=8). We implement this subgenerator with a tree generator of height h=1, and therefore, there are 2 leaves (m=2^h=2) as shown in Fig. 2. Since this tree generator is quadrant restricted, it does not require signed arithmetic components. Each leaf is implemented with direct access memory (5) of depth 2⁴. Note that the sine/cosine values must be stored with a precision of 17 bits to compensate for rounding errors. The leaf M₀ provides the sines and cosines of the multiples of ϕ₀=2⁰ϕ=π/2¹⁰, while the leaf M₁ provides the sines and cosines of the multiples of ϕ₁=2⁴ϕ=π/2⁶. The greatest angle whose sine and cosine is stored in M₀ is 15π/2¹⁰. The 4 most significant bits of the representation of the sine of this angle are 0 and, since the generator is quadrant restricted, the 4 most significant bits of the other representations of the sines stored in M₀ are also 0, and therefore, there is no need for them to be stored. On the other hand, the 9 most significant bits of the representation of the cosine of that angle are equal to 1, and therefore, there is no need for them to be stored either. Similar optimizations can be applied to M₁, but in this case, only one bit can be saved. Note that we only need two integer multipliers of size 13×17(6a) and two of size 17×17(6b) instead of four multipliers of size 17×17 thanks to the leading zeros of the sine. The leading ones of the cosine cannot be employed to reduce the size the arithmetic components in a similar way. In the last stage, an adder (3) provides the sine of the tree generator and a subtracter (7) provides the cosine.

4 Sine/complement generator

In the optimizations described in Section 3, the angle whose sine/cosine must be computed is decomposed into two subangles, A and B. Two subgenerators, called branches, are employed to compute the sine/cosine of A and B, and then the sine/cosine of A+B is computed by applying the identities 6. This method presents the following drawbacks:

If the maximum value of one of the angles A or B is small, then its sine is close to 0, while its cosine is close to 1. In this case, the product cos(A) cos(B) can be orders of magnitude greater than sin(A) sin(B), and hence, smearing may occur when computing cos(A+B)= cos(A) cos(B)− sin(A) sin(B).
Unlike the optimization described in Section 3.2.2, the one described in Section 3.2.3 fails to help in the reduction of the required arithmetic components.

These problems can be solved by using another type of generator that we call sine/complement generator. Such generator receives an integer n and computes the sine and the complement of the cosine of nϕ, that is defined as follows:

Definition 6

The complement of the cosine of x is com(x)=1− cos(x)

It is possible to compute the sine and the complement of the cosine of the sum of two angles, A and B, from the sines and complements of the cosines of those angles by using a functional unit called a trigonometric adder [17]. Similar to the complex multiplier, the trigonometric adder can be implemented with adders (3), subtracters (7), and multipliers (6) as depicted in Fig. 3. This trigonometric adder implementation uses the following trigonometric identities derived from those of 6:

$$ \begin{aligned} \sin(A+B)=\\ \sin(A)+\sin(B)-[\sin(A)\text{com}(B)+\text{com}(A)\sin(B)]\\ \text{com}(A+B)=\\ \text{com}(A)+\text{com}(B)+[\sin(A)\sin(B)-\text{com}(A)\text{com}(B)] \end{aligned} $$

(10)

Trigonometric adders enable the implementation of a sine/complement generator using a tree structure similar to that described in Section 3.2. Such an implementation, described in [18], requires a set of sine/complement subgenerators (the leaves of the tree) as well as trigonometric adders (the internal vertex). Furthermore, if the generator is quadrant restricted, then optimizations similar to those described in Section 3.2 can be applied:

Since the complement of the cosine is also positive in [0,π/2], the trigonometric adders can be implemented without signed arithmetic components.
If fixed-point representation is used, the leading zeros of the sines make it possible to reduce the size of the integer multipliers and that of the leaves.

In this case, the optimization described in Section 3.2.3 cannot be applied, but if fixed-point representation is used, we can use the following optimization that we call leading zeros of the complement: if a branch is quadrant restricted, an upper bound on its complement output is com(n_maxϕ), where n_max is the maximum of the input space of the branch. Therefore, if the k most significant bits of the fixed-point representation of com(n_maxϕ) are 0, those bits of the complement output of the branch are always 0. If the branch is implemented with an LUT, then there is no need to store those bits. Note that, unlike the optimization described in Section 3.2.3, this optimization enables a reduction of the size of the required multipliers.

A sine/cosine generator can be implemented with a sine/complement generator by simply adding a trivial arithmetic circuit to subtract the complement of the cosine from 1. In fact, if fixed-point notation is used, then this arithmetic circuit is not necessary since the trigonometric adder corresponding to the root can be easily modified to provide cos(nϕ) instead of com(nϕ) at no additional cost. To this end, instead of com(nϕ)=com(A+B), the root vertex computes its opposite −com(nϕ)=−com(A+B) by using the following equation:

$$ \begin{aligned} -\text{com}(A+B)=\\ [\text{com}(A)\text{com}(B)-\sin(A)\sin(B)]-[\text{com}(A)+\text{com}(B)] \end{aligned} $$

(11)

subsequently 1 can be added to −com(nϕ) in order to obtain cos(nϕ). Note that this last operation is merely toggling the integer bit of the representation of −com(nϕ). If a sine/cosine generator or a branch of it is quadrant restricted, then it should be implemented employing a complement generator due to the following reasons:

The smearing problems are lessened by using the formulae 10.
In contrast to the leading ones of the cosine optimization, the leading zeros of the complement optimization make it possible to reduce the size of the multipliers.

As an example, Fig. 4 shows how to implement a sine/cosine generator with an input I of width w=11 using a sine/complement generator whose topology is a tree of height h=2. The sine/complement generator uses m=2^h=4 subgenerators, M₀, M₁, M₂, and M₃, which have been implemented with direct access memories (8). Following the recommendations of Section 3.2 to minimize the total number of memory locations, M₃ has 2 address lines (q=⌊w/m⌋=2) and each of the other 3 remaining memories (r=w−mq=3) has an additional address line, and therefore, L(3)=2 and L(2)=L(1)=L(0)=3. Hence, SL(0)=0, ϕ₀=2⁰ϕ, SL(1)=3, ϕ₁=2³ϕ, SL(2)=6, ϕ₂=2⁶ϕ, SL(3)=9, and ϕ₃=2⁹ϕ. Each memory M_k contains the sines and the complement of the cosines of the multiples of ϕ_k=(2^SL(k))ϕ, and therefore, its output provides the sine and the complement of the cosine of n_kϕ_k, where n_k is the value of its address lines. Each address line t of each memory M_k is connected to I_t+SL(k), that is, the inputs of M₀, M₁, M₂, and M₃ are connected to I₂I₁I₀, I₅I₄I₃, I₈I₇I₆, and I₁₀I₉, respectively. Hence, n=n₀2^SL(0)+n₁2^SL(1)+n₂2^SL(2)+n₃2^SL(3) ⇒ nϕ=n₀2^SL(0)ϕ+n₁2^SL(1)ϕ+n₂2^SL(2)ϕ+n₃2^SL(3)ϕ=n₀ϕ₀+n₁ϕ₁+n₂ϕ₂+n₃ϕ₃. Three trigonometric adders are used (9). Those connected directly to the memories are employed to compute the sine and the complement of the cosine of the angles n₀ϕ₀+n₁ϕ₁ and n₂ϕ₂+n₃ϕ₃. The other computes the sine and the complement of the cosine of nϕ=n₀ϕ₀+n₁ϕ₁+n₂ϕ₂+n₃ϕ₃. A trivial arithmetic circuit (10) subtracts the complement of the cosine of nϕ from 1 to obtain the cosine of nϕ.

5 Experiments

In order to measure the possible enhancements that the proposed approach may provide, we have written an open-source tool, called twiddle.py, to automate the design of twiddle factor generators that use the proposed optimization. The tool admits arbitrary output precision and tree height. We selected the same size for the input, the sine output, and the cosine output. In the current version, the sequence length must be a powers of 2 so it is possible to apply argument reduction and only quadrant-restricted generators are needed. The sequence lengths in our experiments ranged from 2¹⁶ to 2²³. The twiddle.py tool follows the recommendations of Section 3.2 to minimize the size of the memories. More information about the tool can be found in the section “Availability of data and materials.” The values computed by the generators are faithfully rounded.

6 Results and discussion

6.1 Multipliers

As described in Section 3.2.3, the leading ones of the cosine optimization make it possible to reduce the size of the memories, but not the size of the required arithmetic components. We claimed that the leading zeros of the complement optimization, described in Section 4, were more convenient because they make it possible to reduce not only the size of the memories, but also the size of the multipliers. To ascertain such claim, we used the twiddle.py tool to implement generators of height 1 with input size ranging from 16 to 23 bits. The comparison of the size of the complement multiplicands and the corresponding cosine multiplicands is shown in Table 2. The width in bits of the cosine and the complement of the cosine is reported in the subcolumns cosine and compl respectively. The relative reduction is reported in the subcolumns red.

Table 2 Multiplicand width comparison of generators with a tree structure of height 1

Full size table

The first fractional bit of the cosine multiplicand is always zero. However, the precision of the multiplicands must be two bits higher than the output precision to compensate for rounding errors. For this reason, the size of the cosine multiplicands of both branches must be a bit greater than the output data width. The size of the corresponding complement multiplicand of the left branch is just one bit lower so the relative saving decreases with the data width. On the other hand, the size of the complement multiplicand of the right branch is almost constant and is never greater than 4 so the relative saving increases with the data width and ranges from 83 to 90%.

As noted by [1], an obvious way to further reduce the size of the memories is to replace them by sub-branches with memories of smaller depth at the cost of more multiplications. Of course, if the implementation is fully pipelined or combinational, the additional multiplications should be carried out by new multipliers. If such replacement is carried out, the angles corresponding to most of these new memories are remarkably smaller. For this reason, the leading zeros of the complement optimization should further reduce the size of the additional multipliers. To ascertain this, we used the twiddle.py to carry out such replacements by increasing the tree height one stage. The saving obtained for each branch is reported in the next sub-subsections.

6.1.1 Left branch

The size of the operands of the left branch is shown in Table 3. As in the tree of height 1, the saving obtained by using the proposed implementation in its left sub-branch is negligible, but the saving in its right sub-branch is remarkable, ranging from 41 to 53%.

Table 3 Multiplicand width comparison of the left branch of generators with a tree structure of height 2

Full size table

6.1.2 Right branch

As shown in Table 4, the saving obtained by using the proposed implementation in the right branch is astonishing. To begin, the saving corresponding to its left sub-branch is very high, ranging from 77 to 88%. Moreover, the saving corresponding to its right sub-branch is always 100%. This means that the complement multiplicand of the right sub-branch is always zero and the corresponding multipliers can be removed.

Table 4 Multiplicand width comparison of the right branch of generators with a tree structure of height 2

Full size table

6.2 Implementation

The hardware description of the twiddle factor generators with a sine/complement tree of height 1 were implemented in a Xilinx (Virtex) 7 XC7VX485T-2FFG1761 field-programmable gate array (FPGA) chip. We also implemented the equivalent sine/cosine tree generators described in [1] and cordinate rotation digital computer (CORDIC) generators to measure the relative enhancements and penalties. Following the notation used in [1], the reference sine/cosine tree implementations will be called SinAndCos. The proposed implementations will be called SinAndCom. The SinAndCos and CORDIC implementations are generated by the flopoco open-source tool version 4.1.2 available at http://flopoco.gforge.inria.fr. The tested implementations are combinational, but are embedded in a dummy sequential module in order to obtain delay estimations from the synthesis tool. Synthesis was carried out with the (Vivado) Design Suite tool of Xilinx version 2017.2.1 using the default options. The only exception is that the use of the Digital Signal Processing (DSP) blocks was disabled in order to extrapolate the results to other reconfigurable devices as well as to ASIC. The obtained results are reported in the next sub-subsections.

6.2.1 Delay

The delay of the slowest path of the generators is shown in Tables 5, 6, and 7. The SinAndCom approach turned out to be about 50% faster than CORDIC but was slower than SinAndCom in most cases. For the lower data widths, the penalty was as high as 26%. One of the reasons is that, in order to apply Eq. 10, it is necessary to execute more explicit additions and subtractions than in Eq. 6. The penalty is not that severe for higher widths, and there are some little speed-ups ranging from 5 to 7%.

Table 5 Delay of the sine/complement generators

Full size table

Table 6 Delay of the CORDIC generators

Full size table

Table 7 Delay of the sine/cosine generators

Full size table

6.2.2 Resources

The resource utilization of the generators are shown in Tables 8, 9, and 10. No DSP units were used. As mentioned in Section 3.2, in order to save resources, the flopoco tool does not implement the right branch of the SinAndCom generators with a memory. Instead, the corresponding values are evaluated by using the Taylor series. The same optimization can be applied to the SinAndCom approach, but the twiddle.py does not implement it yet. For this reason, this comparison is not fair to the SinAndCom approach but, even without the Taylor series optimization, SinAndCom used less LUTs in every case. The best saving was as high as 26%. The implementation of the Taylor optimization should provide higher relative savings. Also, as mentioned in Section 6.1, the relative saving should be remarkable if higher tree structures are used. Unfortunately, unlike twiddle.py, the current version of flopoco only supports tree structures of height 1 so we were unable to ascertain this. Regarding CORDIC, for low data widths, the SinAndCom has enabled the number of look-up tables to be reduced by between 20 and 27%, while for higher data widths, it introduces a penalty as high as 15% in the resource utilization.

Table 8 Resource utilization of the sine/complement generators

Full size table

Table 9 Resource utilization of the CORDIC generators

Full size table

Table 10 Resource utilization of the sine/cosine generators

Full size table

7 Conclusions

In this paper, we propose a table based-sine/cosine computation technique. In the proposed technique, the complement of the cosine is computed before the cosine itself in order to reduce the size of the required multiplications. We have released an open-source tool to automate the design of twiddle factor generators of arbitrary size and precision using the proposed technique. Several twiddle factor generators have been implemented in a Xilinx (Virtex) 7 XC7VX485T-2FFG1761 FPGA chip using the proposed technique and other techniques described in [1]. The proposed technique is remarkably faster than CORDIC. Also, when compared with previous table-based implementations with a tree structure of height 1, the proposed technique enabled a remarkable saving in the hardware resources at the expense of delay. To increase the relative saving, some of the memories could be replaced by circuits that evaluate the corresponding values. Further research is required to measure the benefits that such optimization could provide. Also, it would be interesting to make new comparisons if a version of flopoco supporting higher tree structures is released.

Abbreviations

ASIC:: Application-specific integrated circuit
CORDIC:: Coordinate rotation digital computer
DCT:: Discrete cosine transform
DSP:: Digital signal processing
DFT:: Discrete fourier transform
DST:: Discrete sine transform
DVB-T2:: Digital video broadcasting—terrestrial 2
FFT:: Fast fourier transform
FPGA:: Field-programmable gate array
IDCT:: Inverse discrete cosine transform
IDST:: Inverse discrete sine transform
PLC:: Power line communications
LUT:: Look-up table

References

F. de Dinechin, M. Istoan, G. Sergent, Fixed-point trigonometric functions on FPGAS. SIGARCH Comput. Archit. News. 41(5), 83–88 (2014). https://doi.org/10.1145/2641361.2641375.
Article Google Scholar
K. J. Lin, C. C. Hou, in Proceedings of the IEEE 2nd Global Conference on Consumer Electronics (GCCE 2013). Implementation of trigonometric custom functions hardware on embedded processor (Tokyo, 2013), pp. 155–157. https://doi.org/10.1109/GCCE.2013.6664782.
H. Huang, L. Xiao, J. Liu, Cordic-based unified architectures for computation of DCT/IDCT/DST/IDST. Circ. Syst. Signal Proc.33(3), 799–714 (2014). https://doi.org/10.1007/s00034-013-9661-9.
Article Google Scholar
D. Goldberg, What every computer scientist should know about floating-point arithmetic. ACM Comput. Surv.23(1), 5–48 (1991). https://doi.org/10.1145/103162.103163.
Article MathSciNet Google Scholar
V. Lefevre, J. M. Muller, in Proceedings of the 15th IEEE Symposium on Computer Arithmetic. ARITH-15 2001. Worst cases for correct rounding of the elementary functions in double precision (Vail, 2001), pp. 111–118. https://doi.org/10.1109/ARITH.2001.930110.
T. Kulshreshtha, A. S. Dhar, Cordic-based high throughput sliding DFT architecture with reduced error-accumulation. Circ. Syst. Signal Proc.37(11), 5101–5126 (2018). https://doi.org/10.1007/s00034-018-0810-z.
Article Google Scholar
IEEE Standard for broadband over power line networks: medium access control and physical layer specifications. IEEE Std 1901-2010, 1–1586 (2010). https://doi.org/10.1109/IEEESTD.2010.5678772.
S. -Y. Lin, C. -L. Wey, M. -D. Shieh, Low-cost FFT processor for DVB-T2 applications. IEEE Trans. Consum. Electron.56(4), 2072–2079 (2010). https://doi.org/10.1109/TCE.2010.5681074.
Article Google Scholar
R. H. Stanton, in Proceedings of the 31st Annual SAS Symposium on Telescope Science. Photon counting - one more time (Big Bear Lake, 2012), pp. 177–184. http://adsabs.harvard.edu/abs/2012SASS...31..177S.
H. Nakahara, H. Nakanishi, T. Sasao, On a wideband fast Fourier transform for a radio telescope. SIGARCH Comput. Archit. News. 40(5), 46–51 (2012). https://doi.org/10.1145/2460216.2460225.
Article Google Scholar
F. Qureshi, O. Gustafsson, in Proceedings of the 2009 Conference Record of the Forty-Third Asilomar Conference on Signals, Systems and Computers. Analysis of twiddle factor memory complexity of radix-2i pipelined FFTs (Pacific Grove, 2009), pp. 217–220. https://doi.org/10.1109/ACSSC.2009.5470121.
J. G. Nash, Distributed-memory-based FFT architecture and FPGA implementations. Electronics. 7(7) (2018). https://doi.org/10.3390/electronics7070116.
J. W. Cooley, P. A. W. Lewis, P. D. Welch, Historical notes on the fast Fourier transform. Proc. IEEE. 55(10), 1675–1677 (1967). https://doi.org/10.1109/PROC.1967.5959.
Article Google Scholar
R. A. Smith, A continued-fraction analysis of trigonometric argument reduction. IEEE Trans. Bus. Econ.44(11), 1348–1351 (1995). https://doi.org/10.1109/12.475133.
MATH Google Scholar
H. Kang, B. Yang, J. Lee, Low complexity twiddle factor multiplication with ROM partitioning in FFT processor. Electron. Lett.49(9), 589–591 (2013). https://doi.org/10.1049/el.2013.0689.
Article Google Scholar
D. Guerrero, J. Viejo, P. Ruiz-de-Clavijo, J. Juan, M. J. Bellido, A. Millan, E. Ostua, J. I. Villar, J. Quiros, A. Muñoz, Digital Electronic circuit for calculating sines and cosines of multiples of an angle. WO2018104566A1: (2018).
D. Guerrero, A. Millan, J. Juan, J. Viejo, M. J. Bellido, P. Ruiz-de-Clavijo, E. Ostua, Dispositivo Electrónico Calculador de Funciones Trigonométricas. P201831134: (2019).
D. Guerrero, A. Millan, J. Juan, J. Viejo, M. J. Bellido, P. Ruiz-de-Clavijo, E. Ostua, Dispositivo Electrónico Calculador de Funciones Trigonométricas Y Usos Del mismo. P201831133: (2019).

Download references

Acknowledgements

Not applicable.

Funding

This work has been partially supported by the Ministerio de Economía, Industria y Competitividad of Spain under project TIN2017-89951-P (BootTimeIoT) and by the European Regional Development Fund (ERDF).

Author information

Authors and Affiliations

Departamento de Tecnología Electrónica, Universidad de Sevilla, Avenida de Reina Mercedes S.N., Sevilla, 41012, Spain
David Guerrero Martos, Alejandro Millán Calderón, Jorge Juan Chico, Julian Viejo Cortés, Manuel J. Bellido Díaz, Paulino Ruiz-de-Clavijo Vazquez & Enrique Ostúa Arangüena

Authors

David Guerrero Martos
View author publications
You can also search for this author in PubMed Google Scholar
Alejandro Millán Calderón
View author publications
You can also search for this author in PubMed Google Scholar
Jorge Juan Chico
View author publications
You can also search for this author in PubMed Google Scholar
Julian Viejo Cortés
View author publications
You can also search for this author in PubMed Google Scholar
Manuel J. Bellido Díaz
View author publications
You can also search for this author in PubMed Google Scholar
Paulino Ruiz-de-Clavijo Vazquez
View author publications
You can also search for this author in PubMed Google Scholar
Enrique Ostúa Arangüena
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

Conceptualization, D.G.; data curation, D.G. and J.J; formal analysis, D.G.; funding acquisition, J.J. and P.R.; investigation, D.G.; methodology, D.G.; project administration, J.J.; resources, A.M., M.B. and E.O.; software, D.G. and A.M.; supervision, M.B.; validation, D.G.; visualization, P.R.; writing of the original draft and preparation, D.G. and J.J.; writing, review, and editing, A.M., J.V, and M.B. The authors read and approved the final manuscript.

Corresponding author

Correspondence to David Guerrero Martos.

Ethics declarations

Competing interests

The authors developed the inventions covered by patents WO2018104566A1, P201831134, and P201831133.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Guerrero Martos, D., Millán Calderón, A., Juan Chico, J. et al. Using the complement of the cosine to compute trigonometric functions. EURASIP J. Adv. Signal Process. 2020, 35 (2020). https://doi.org/10.1186/s13634-020-00692-5

Download citation

Received: 10 September 2019
Accepted: 06 July 2020
Published: 22 July 2020
DOI: https://doi.org/10.1186/s13634-020-00692-5

Using the complement of the cosine to compute trigonometric functions

Abstract

1 Introduction

2 Argument reduction

Definition 1

Definition 2

Definition 3

Definition 4

2.1 Periodicity

2.2 Sign reduction

2.3 Quadrant reduction

2.4 Octant reduction

3 Sublinear optimizations

3.1 Branching

3.2 Tree generator

3.2.1 Quadrant restriction

Definition 5

3.2.2 Leading zeros of the sine

3.2.3 Leading ones of the cosine

4 Sine/complement generator

Definition 6

5 Experiments

6 Results and discussion

6.1 Multipliers

6.1.1 Left branch

6.1.2 Right branch

6.2 Implementation

6.2.1 Delay

6.2.2 Resources

7 Conclusions

Abbreviations

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interests

Additional information

Publisher’s Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords