Theoretical lower bounds for parallel pipelined shift-and-add constant multiplications with n-input arithmetic operators
 Miriam Guadalupe Cruz Jiménez^{1},
 Uwe Meyer-Baese^{2} and
 Gordana Jovanovic Dolecek^{1}
https://doi.org/10.1186/s13634-017-0466-z
© The Author(s). 2017
Received: 26 January 2017
Accepted: 19 April 2017
Published: 3 May 2017
Abstract
New theoretical lower bounds for the number of operators needed in fixed-point constant multiplication blocks are presented. The multipliers are constructed with the shift-and-add approach, where every arithmetic operation is pipelined, and with the generalization that n-input pipelined additions/subtractions are allowed, along with pure pipelining registers. These lower bounds, tighter than the state-of-the-art theoretical limits, are particularly useful in early design stages for a quick assessment of the hardware utilization of low-cost constant multiplication blocks implemented in the newest families of field-programmable gate array (FPGA) integrated circuits.
Keywords
SCM, MCM, FPGA, Multiplication, Lower bound
1 Introduction
Particularly in the last two decades, many efficient high-level synthesis algorithms have been introduced for the multiplierless design of constant multiplication blocks. The common cost function minimized by these algorithms is the number of arithmetic operations (additions and subtractions) needed to implement the multiplications. Nevertheless, the critical path has the main negative impact on speed and power consumption [13–18]. Therefore, substantial research activity currently targets both application-specific integrated circuits (ASICs) [19–21] and FPGAs [5–10, 22–25], where the ultimate goal is to minimize the number of arithmetic operations subject to a minimum number of depth levels.
On the other hand, even though ASICs still provide higher performance and lower power consumption, the increased development time and manufacturing cost that come with smaller CMOS transistor technologies have opened a large market for FPGAs. FPGA technology gives signal processing engineers the ability to construct a custom data path tailored to the application at hand [26, 27]. FPGAs offer the flexibility of instruction set digital signal processors while providing the processing power and flexibility of an ASIC, and they enable significant design cycle compression and time-to-market advantages, an important consideration in an economic climate with ever-decreasing market windows and short product life cycles [28, 29].
The novelty of this paper is the introduction of theoretical lower bounds for the number of operations necessary to implement pipelined single constant multiplication (PSCM) and pipelined multiple constant multiplication (PMCM) blocks constructed with the shift-and-add scheme. For the derivation of these bounds, we consider that an n-input (where n is an integer) pipelined addition/subtraction and a single pipeline register have the same cost. As mentioned earlier, this assumption fits particularly well when n is set equal to 3 and the target platforms for implementation are the newest FPGAs from the two dominant manufacturers, Xilinx and Altera. However, it is worth highlighting that n = 2 is still in common use in many applications. This contribution is important because the optimality of different algorithms that reduce the number of operations in PSCM and PMCM blocks can be tested using appropriate theoretical lower bounds. Additionally, these bounds can be useful for developing new algorithms.
The paper is organized as follows. In the next section, definitions and methods needed to address the proposal are given. Section 3 presents the new theoretical lower bounds along with theorems and proofs to support the derivation of these bounds. Comparisons with previous theoretical lower bounds from [3] and [4] are provided in Section 4. Finally, conclusions are given in Section 5.
2 Definitions of terms
where l_{1} ≥ 0, …, l_{n} ≥ 0 are left shifts, r ≥ 0 is a right shift, s_{2}, …, s_{n} are binary values, q = {l_{1}, …, l_{n}, s_{2}, …, s_{n}, r} is the configuration of the A-operation, and u_{1}, …, u_{n} are odd integers.
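The expression that these symbols belong to is not shown in the text above. As a hedged reading only, assuming the two-input A-operation of [30] carries over to n inputs in the obvious way (the sign of the first operand fixed, each s_{j} selecting addition or subtraction), the n-input A-operation would take a form such as:

```latex
A_q(u_1, \ldots, u_n) \;=\; \left|\, 2^{l_1} u_1 \;+\; \sum_{j=2}^{n} (-1)^{s_j}\, 2^{l_j}\, u_j \,\right| \, 2^{-r}
```

Under this assumed form, n = 2 reduces to the familiar shift-and-add/subtract of two odd fundamentals, and the right shift r restores an odd result.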

The output of each A-operation is called a fundamental.

For a graph with m A-operations, there are m + 1 vertices and m fundamentals.

Each vertex has an indegree n, except for the input vertex, which has indegree zero.

A vertex with indegree n corresponds to an n-input A-operation.

Each vertex has outdegree larger than or equal to one, except for the output vertex, which has outdegree zero.

The constant resulting from the last A-operation is the output fundamental (OF). The constants resulting from previous A-operations are non-output fundamentals (NOFs).
In the MCM case, there are several OFs.
Synthesis results of pipelined and non-pipelined implementations of a 45X multiplier in the Altera Cyclone IV EP4CE115F29C7 FPGA
| Pipelined | Total logic elements (LE) | Maximum frequency of operation (MHz) | Area × Time cost metric (LE/MHz) |
|---|---|---|---|
| No | 31 | 285.47 | 0.1086 |
| Yes | 34 | 376.08 | 0.0904 |
1) Its MNSD, denoted by S. We will also refer to this number more informally as “the number of nonzero digits”.
2) Its number of prime factors, counted with multiplicity (a repeated prime factor is counted each time it occurs). This number is denoted by Ω.
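The two quantities above can be computed directly for a given odd constant. The following is a minimal sketch, not the authors' code: it assumes S is obtained as the number of nonzero digits of the non-adjacent form (which is a minimal signed-digit representation), and the function names `mnsd` and `big_omega` are hypothetical.

```python
def mnsd(c: int) -> int:
    """Number of nonzero digits in the non-adjacent form (NAF) of c > 0."""
    if c <= 0:
        raise ValueError("c must be a positive integer")
    count = 0
    while c:
        if c & 1:
            # NAF digit is +1 when c = 1 (mod 4) and -1 when c = 3 (mod 4)
            digit = 2 - (c & 3)
            c -= digit
            count += 1
        c >>= 1
    return count


def big_omega(c: int) -> int:
    """Number of prime factors of c, counted with multiplicity (trial division)."""
    count, d = 0, 2
    while d * d <= c:
        while c % d == 0:
            c //= d
            count += 1
        d += 1
    if c > 1:
        count += 1  # the remaining cofactor is itself prime
    return count


# 45 = 64 - 16 - 4 + 1 and 45 = 3 * 3 * 5
print(mnsd(45), big_omega(45))  # prints: 4 3
```

Trial division is adequate here because the constants of interest have short word lengths.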
3 Proposed lower bounds
In the following, we state Theorems 1 to 8 in Subsection 3.1 to derive the lower bounds of R-operations in PSCM, and Theorems 9 and 10 in Subsection 3.2 for PMCM, along with their corresponding proofs. The pipelining operation, which was not addressed in the previous works [3] and [4], is explicitly included in the proposed lower bounds through the R-operations.
3.1 PSCM case
Whenever a constant c is mentioned in the theorems of this subsection (Theorems 1 to 8), we consider that the MNSD of that constant is S and its number of prime factors is Ω.
Theorem 1 provides the upper limit of nonzero digits that can be generated by any graph with a given number of depth levels, regardless of its number of R-operations. From this, we obtain the minimum number of depth levels that a graph must have to implement a constant with a given S.
Theorems 2 and 3 prove the properties of the completely multiplicative graphs, namely, that they generate the upper limit of nonzero digits mentioned in Theorem 1 with the minimum possible number of R-operations. It follows that the completely multiplicative graph is a solution that attains the lower bound for the number of R-operations. However, as is known, this graph has articulation points, and every articulation point represents the union between two cascaded subgraphs, i.e., the product of two smaller constants. Therefore, Theorem 4 uses Ω to identify which constants can be implemented with the completely multiplicative graph (for example, prime constants cannot be factorized into smaller constants; thus, they cannot be implemented by a completely multiplicative graph).
Theorem 5 identifies the minimum number of R-operations needed in any non-multiplicative graph with a given number of depth levels, and Theorem 6 proves that non-multiplicative graphs can generate the upper limit of nonzero digits mentioned in Theorem 1 with their minimum number of R-operations. Then, Theorem 7 establishes the lower bound for the number of R-operations needed to implement a prime constant (Ω = 1).
Finally, Theorem 8 completes the information of Theorems 4 and 7, namely, it gives the lower bound of R-operations needed to implement non-prime constants that have fewer prime factors than the number of subgraphs used in a completely multiplicative graph.
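The case analysis outlined in this overview can be sketched in a few lines, using the bounds as they are stated in Theorems 4, 7, and 8 of this subsection. This is an illustrative sketch only: the function names are hypothetical, and p is taken as the smallest integer with n^p ≥ S, which is how the range (n^{p − 1} + 1) ≤ S ≤ n^{p} is read here.

```python
def min_depth(S: int, n: int) -> int:
    """Smallest p with n**p >= S, computed with exact integer arithmetic."""
    p, reach = 0, 1
    while reach < S:
        reach *= n
        p += 1
    return p


def pscm_lower_bound(S: int, omega: int, n: int) -> int:
    """Lower bound on R-operations for a constant with MNSD S and omega prime factors."""
    p = min_depth(S, n)
    if omega >= p:
        return p              # Theorem 4: a completely multiplicative graph is admissible
    if omega == 1:
        return 2 * p - 1      # Theorem 7: prime constant
    return 2 * p - omega      # Theorem 8: 1 < omega < p


print(pscm_lower_bound(S=5, omega=1, n=2))  # p = 3, prints 5
print(pscm_lower_bound(S=5, omega=1, n=3))  # p = 2, prints 3
```

The depth is found by repeated multiplication rather than a floating-point logarithm, so values of S that are exact powers of n are not misrounded.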
Theorem 1
A graph with p depth levels can provide at most n^{p} nonzero digits for a constant.
Proof
1) The base case corresponds to the first depth level, where an n-input A-operation can form a constant with at most n nonzero digits. This is true since the input of any graph has one nonzero digit [3, 4, 39].
2) As the inductive step, we assume that, in the pth level, there are at most n^{p} nonzero digits. In the (p + 1)th level, an A-operation can form a constant whose number of nonzero digits is at most the sum of the numbers of nonzero digits at every input of that A-operation. This is at most n times the maximum number of nonzero digits available in the previous level, i.e., n × n^{p} = n^{p + 1} nonzero digits.
Since assuming that the theorem is true for p implies that the theorem is also true for p + 1, and since the base case is also true, the proof is complete. An adder, regardless of its number of inputs, cannot generate more nonzero digits than the sum of the numbers of nonzero digits at its inputs. Thus, the MNSD can be multiplied by at most n if the inputs of the n-input adder placed in any depth level come from the immediately previous depth level. ■
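Read in the other direction, Theorem 1 gives the minimum number of depth levels for a constant with a given S. A short worked instance, with S = 5 chosen purely for illustration:

```latex
n^{p} \ge S \;\Longrightarrow\; p \ge \left\lceil \log_{n} S \right\rceil ,
\qquad
S = 5:\quad
\left\lceil \log_{2} 5 \right\rceil = 3 \;\; (2^{2} = 4 < 5 \le 8 = 2^{3}),
\quad
\left\lceil \log_{3} 5 \right\rceil = 2 \;\; (3^{1} = 3 < 5 \le 9 = 3^{2}).
```

So a constant with five nonzero digits needs at least three depth levels with 2-input A-operations, but only two with 3-input A-operations.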
Theorem 2
A completely multiplicative graph with p A-operations can generate n^{p} nonzero digits.
Proof
This proof is a straightforward extension of the proof of Theorem 6.8 in [39], which corresponds to completely multiplicative graphs with 2-input A-operations. As stated earlier, the input of a graph has one nonzero digit. In the completely multiplicative graph, there are at most n nonzero digits after the A-operation placed at the 1st depth level. Cascading an A-operation to that output yields at most n × n nonzero digits, and so on. The number of nonzero digits at depth level p is at most n times the number of nonzero digits of a fundamental at the (p − 1)th depth level. Consequently, the maximum number of nonzero digits at the pth depth level is n^{p}. ■
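The growth described in this proof can be checked numerically. The sketch below is an illustration under stated assumptions, not a construction taken from the paper: each cascaded A-operation adds n shifted copies of the previous fundamental, with shifts chosen so that no digits collide, and the nonzero-digit count is taken from the non-adjacent form. The names `naf_weight` and `cascade` are hypothetical.

```python
def naf_weight(c: int) -> int:
    """Nonzero digits in the non-adjacent form of c > 0 (a minimal signed-digit form)."""
    count = 0
    while c:
        if c & 1:
            c -= 2 - (c & 3)  # subtract the NAF digit: +1 or -1
            count += 1
        c >>= 1
    return count


def cascade(n: int, p: int) -> int:
    """Constant produced by p cascaded n-input additions of shifted copies."""
    c, span = 1, 2            # span: bit spacing between the shifted copies
    for _ in range(p):
        factor = sum(1 << (span * k) for k in range(n))
        c *= factor           # one A-operation fed only by the previous fundamental
        span *= n
    return c


for n, p in [(2, 1), (2, 2), (2, 3), (3, 2)]:
    c = cascade(n, p)
    print(n, p, c, naf_weight(c))
```

For n = 2 this yields 5, 85, and 21845 with 2, 4, and 8 nonzero digits, and for n = 3 and p = 2 it yields 87381 with 9 nonzero digits, matching n^{p} in each case.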
Theorem 3
A completely multiplicative graph with p depth levels needs only p R-operations.
Proof
Theorem 4
A constant with (n^{p − 1} + 1) ≤ S ≤ n^{p} and Ω ≥ p needs at least p R-operations.
Proof
From Theorem 2, we have that a constant with (n^{p − 1} + 1) ≤ S ≤ n^{p} nonzero digits can be implemented with at least p depth levels, which implies at least p A-operations. From Theorem 3, we have that a completely multiplicative graph can generate those values for S with only p R-operations. The completely multiplicative graph with p R-operations consists of p cascaded subgraphs; thus, a constant implemented with that graph must have at least p prime factors. Since Ω ≥ p holds, the completely multiplicative graph can be employed to implement that constant using p R-operations. ■
Theorem 5
A non-multiplicative graph with p depth levels needs at least (2p − 1) R-operations.
Proof
Theorem 6
A non-multiplicative graph with p depth levels and (2p − 1) R-operations can generate n^{p} nonzero digits.
Proof
Theorem 7
A constant with (n^{p − 1} + 1) ≤ S ≤ n^{p} and Ω = 1 needs at least 2p − 1 R-operations.
Proof
Since Ω = 1 holds, the non-multiplicative graph must be employed to implement that constant. From Theorem 6, we have that a constant with (n^{p − 1} + 1) ≤ S ≤ n^{p} nonzero digits can be implemented with at least p depth levels and at least 2p − 1 R-operations. This is a lower bound for the number of R-operations, since from Theorem 5, we have that a non-multiplicative graph with p levels needs at least 2p − 1 R-operations. ■
Theorem 8
A constant with (n^{p − 1} + 1) ≤ S ≤ n^{p} and 1 < Ω < p needs at least (2p − Ω) R-operations.
Proof
3.2 PMCM case
The theorems in this section are stated for N constants c_{1}, c_{2}, …, c_{N}, whose respective MNSDs are S_{1}, S_{2}, …, S_{N}, and whose respective numbers of prime factors are Ω_{1}, Ω_{2}, …, Ω_{N}, such that S_{1} ≤ S_{2} ≤ … ≤ S_{N}.
Theorem 9 indicates the lower bound for the number of n-input A-operations needed to form an MCM block. If pipelining is added, more R-operations than the aforementioned lower bound may be needed because the constants with fewer prime factors may use non-multiplicative graphs, which require extra R-operations (see Theorems 5 to 8). Besides, all the outputs of the PMCM block must have an equal number of depth levels to balance the input–output delay, which may also require extra R-operations. Based on these observations, Theorem 10 extends the lower bound provided in Theorem 9 by identifying at least how many extra R-operations would be needed. From these theorems, we obtain the lower bound for the number of R-operations needed to form a PMCM block.
Theorem 9
with \( E(S_i, S_{i+1}) = \begin{cases} 1, & S_i = S_{i+1}, \\ \left\lceil \log_n \dfrac{S_{i+1}}{S_i} \right\rceil, & S_i < S_{i+1}. \end{cases} \)
Proof
Recall that every A-operation has only one possible configuration and therefore can generate only one fundamental. Only shifted (i.e., scaled by a power of two) versions of that fundamental can be obtained from that A-operation. Since the target constants are odd integers by definition, it is not possible to obtain two target constants from the same A-operation. Therefore, there must be at least N n-input A-operations for the N constants. Note that, since the terms S_{i} are sorted in ascending order, S_{1} corresponds to the simplest constant, i.e., the one with the smallest number of nonzero digits. From Theorem 1, we have that with p depth levels we can obtain at most n^{p} nonzero digits. By using the relation n^{p} ≥ S_{1}, we have that the minimum number of levels necessary to generate S_{1} nonzero digits is ⌈log_{n}(S_{1})⌉, which implies the existence of at least ⌈log_{n}(S_{1})⌉ A-operations for that constant. Finally, if S_{i+1} > n × S_{i} holds, a single A-operation is not able to generate the constant c_{i+1} when only coefficients with at most S_{i} nonzero digits are available, because the number of nonzero digits at the output of an A-operation is at most the sum of the numbers of nonzero digits at its inputs. Therefore, at least ⌈log_{n}(S_{i+1}/S_{i})⌉ A-operations will be required. This proof is a straightforward extension of the proof given in [3] for the lower bound of 2-input A-operations that form an MCM block. ■
Theorem 10
and K given in (7).
Proof
Consider that there is a constant c_{m} that satisfies Ω_{m} < ⌈log_{n}(S_{m})⌉ and, if more constants satisfy this condition, c_{m} has the greatest difference [⌈log_{n}(S_{m})⌉ − Ω_{m}]. From Theorem 8, we have that the constant can be formed by cascading a non-multiplicative graph with a completely multiplicative graph, where the non-multiplicative graph needs 2[⌈log_{n}(S_{m})⌉ − (Ω_{m} − 1)] − 1 R-operations. Since Theorem 9 has not taken into consideration the number of prime factors, only [⌈log_{n}(S_{m})⌉ − (Ω_{m} − 1)] A-operations have been accounted for in that theorem, under the assumption that the constant c_{m} can be constructed with the optimal completely multiplicative graph. Therefore, at least [⌈log_{n}(S_{m})⌉ − (Ω_{m} − 1)] − 1 extra R-operations must be included when pipelining is applied, which explains the term F. The term G is explained by the fact that extra R-operations may be needed to achieve the same number of pipelined stages from input to output in every constant. Since the minimum depth level of a constant is given by ⌈log_{n}(S)⌉, the differences between the minimum depth level of the constant c_{N} (which has the greatest depth level among the constants) and the minimum depth levels of the other constants are accumulated in the term G. ■
with \( E(S_i, S_{i+1}) = \begin{cases} 1, & S_i = S_{i+1}, \\ \left\lceil \log_n \dfrac{S_{i+1}}{S_i} \right\rceil, & S_i < S_{i+1}, \end{cases} \)
and \( F = \begin{cases} \max\limits_{i} \left\{ \left\lceil \log_n (S_i) \right\rceil - \varOmega_i \right\}, & \forall\ i \ \text{such that}\ \varOmega_i < \left\lceil \log_n (S_i) \right\rceil, \\ 0, & \text{otherwise}. \end{cases} \)
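The terms E and F can be evaluated mechanically from the lists of S_{i} and Ω_{i}. The sketch below rests on an assumption that should be read carefully: the statement of the A-operation bound is not reproduced in the text above, so the code assumes it has the form ⌈log_{n}(S_{1})⌉ + Σ E(S_{i}, S_{i+1}), which is what the counting argument in the proof of Theorem 9 suggests. The delay-balancing term G is left out because its expression is not shown. The function names are hypothetical.

```python
def ceil_log_ratio(num: int, den: int, n: int) -> int:
    """Smallest k >= 0 with den * n**k >= num, i.e. ceil(log_n(num / den)) for num >= den."""
    k, scaled = 0, den
    while scaled < num:
        scaled *= n
        k += 1
    return k


def pmcm_terms(S, omega, n):
    """Return (K, F): the assumed A-operation bound and the extra term F."""
    pairs = sorted(zip(S, omega))          # sort constants by their MNSD
    s_sorted = [s for s, _ in pairs]
    # Assumed form of the Theorem 9 bound: depth of the simplest constant
    # plus one E term per consecutive pair.
    K = ceil_log_ratio(s_sorted[0], 1, n)
    for a, b in zip(s_sorted, s_sorted[1:]):
        K += 1 if a == b else ceil_log_ratio(b, a, n)
    # F: largest shortfall of prime factors relative to the minimum depth.
    gaps = [ceil_log_ratio(s, 1, n) - w for s, w in pairs]
    F = max((g for g in gaps if g > 0), default=0)
    return K, F


print(pmcm_terms(S=[5, 5], omega=[3, 3], n=3))  # prints (3, 0)
```

With two constants of S = 5 and three prime factors each, and n = 3, the assumed bound gives 2 + 1 = 3 A-operations and no extra R-operations from F.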
4 Results and comparisons
In this section, the proposed lower bounds are compared with the lower bounds currently available in the literature, detailing the PSCM and PMCM cases in Subsections 4.1 and 4.2, respectively. In all cases, two-input and three-input additions were considered.
First, the PSCM case is addressed for n = 2 (i.e., 2-input additions) with an illustration of the lower bounds averaged over all the constants with a word length of B bits, where B goes from 1 to 14. This illustration compares the proposed lower bound with the existing lower bounds from [3] and [4], showing that the proposed lower bound is tighter. An example is also included, where the pipelined shift-and-add multipliers for some constants are constructed with 2-input and 3-input additions.
The effectiveness of the PMCM lower bound is demonstrated by examples, where pipelined shift-and-add multiple constant multiplication blocks are constructed using the algorithms from [7, 8, 22, 30] and [36] for the case of 2-input additions and the algorithm from [10] for the case of 3-input additions. The proposed lower bound is compared with the lower bound from [3] in the case of 2-input additions, and in most of the cases, it provides a better estimation of the number of required R-operations. For n = 3 (i.e., 3-input additions), there are no theoretical lower bounds currently available in the literature. Thus, the proposed lower bound is only compared with the solution from [10]. In that case, the proposed lower bound falls short by only one R-operation.
4.1 PSCM case
Example 1 The constants 11467, 11093, and 13003 have similar graphs and the same lower bounds, as shown in Table 3. The corresponding graphs are presented in Fig. 12.
4.2 PMCM case
Resulting R-operations for Example 4 using n = 2 input adders
Resulting R-operations for Example 4 using n = 3 input adders
| Algorithm | R-operations |
|---|---|
| PAG for 3-input adders (method [10]) | 4 |
| L_{PMCM} | 3 |
Resulting R-operations for Example 5 using n = 2 input adders
Resulting R-operations for Example 5 using n = 3 input adders
| Algorithm | R-operations |
|---|---|
| PAG for 3-input adders (method [10]) | 3 |
| L_{PMCM} | 3 |
Example 2 (example given in [8]) A multiplier block with the constants from the set {44; 130; 172} has the estimated number of R-operations shown in Table 4 (the resulting graphs are shown in Fig. 1 of [8]).
Example 3 (example given in [7]) A multiplier block with the constants from the set {3; 13; 21; 37} has the estimated number of R-operations shown in Table 5 (the resulting graphs can be seen in Fig. 4 of [7]).
Example 4 (example given in [10]) A multiplier block with the constants from the set {7,567; 20,406} has the estimated number of R-operations shown in Table 6 for two-input adders and in Table 7 for three-input adders (Fig. 3 of [10] shows the corresponding graphs).
5 Conclusions
New theoretical lower bounds for the number of R-operations in the fully pipelined SCM and the fully pipelined MCM cases with n-input additions/subtractions have been presented. The proposed lower bounds are tighter because pipelining registers were explicitly considered. In addition, it was observed that the use of articulation points allows a rapid increase in the number of nonzero digits from one depth level to the next. The new theoretical lower bounds achieve a better estimation of the number of operations needed to implement a single multiplier or a multiplier block. The tightening of the new lower bounds was illustrated with examples in the comparisons section.
Declarations
Acknowledgements
This paper has been supported by CONACYT scholarship no. 224191. The authors are grateful to D. E. T. Romero for his helpful comments during the development of this proposal.
Funding
This work is a result of a doctoral thesis developed at the Institute INAOE; the thesis has been supported by a CONACYT grant.
Authors’ contributions
MGCJ contributed to the main development of the theorems and examples in this proposal. UMB is the advisor in the development of low-complexity FPGA-based arithmetic blocks and contributed to the review of theorems and examples. GJD, as thesis advisor, directed all the work, and the paper was written under her supervision. All authors read and approved the final manuscript.
Authors’ information
Miriam Guadalupe Cruz Jimenez received the BS degree from the Minatitlan Institute of Technology and the MS degree from the National Institute for Astrophysics, Optics and Electronics (INAOE), Mexico. She received the best paper award at the conference CIIECC 2013. Currently, she is a PhD student in the Institute INAOE. She is a reviewer for the journals IEEE Transactions on Circuits and Systems I and Circuits, Systems & Signal Processing.
Dr. Uwe H. Meyer-Baese (IEEE, S'91–M'93) was born in Kassel, Germany, on July 10, 1964. He received his BSEE, MSEE, and Ph.D. “Summa cum Laude” from the Darmstadt University of Technology in 1987, 1989, and 1995, respectively. In 1994 and 1995, he held a Postdoctoral Position at the “Institute of Brain Research,” Magdeburg, Germany. In 1996 and 1997, he was a visiting professor at the University of Florida, Gainesville. From 1998 to 2000, he worked as a Research Scientist in the ASIC industry. He joined the Electrical and Computer Engineering Department at the FAMU-FSU College of Engineering in 2001 and is now an Associate Professor. He holds 3 patents, has published over 100 journal and conference papers and 5 books, and has supervised more than 60 master thesis projects in the real-time DSP/FPGA area. He is the author of the bestselling Springer textbook on DSP with FPGAs. He was a recipient of the Max-Kade Award in Neuroengineering in 1997, the ECE Department Research Award in 2005, Who’s Who in Science membership in 2005, the SPIE Best Presentation Award in 2006, the FAMU-FSU College of Engineering Teaching Award in 2007, and the Humboldt Fellowship in 2009. He has served as Faculty Senator of the FSU senate since Spring 2011. He has been an elected member of the editorial board for the journal Signal, Image and Video Processing for 2011–2015 and has been elected as a board member as well as an associate editor for the EURASIP Journal on Advances in Signal Processing for 2011–2013.
Gordana Jovanovic Dolecek received a BS degree from the Faculty of Electrical Engineering, University of Sarajevo, an MS degree from the University of Belgrade, and a Ph.D. degree from the Faculty of Electrical Engineering, University of Sarajevo. She was with the Faculty of Electrical Engineering, University of Sarajevo until 1993, as a research assistant, assistant professor, associate professor, and full professor. From 1986 to 1991, she was chairman of the Department of Telecommunication. During 1993–1995, she was with the Institute Mihailo Pupin, Belgrade. In 1995, she joined Institute INAOE, Department for Electronics, Puebla, Mexico, where she works as a professor and researcher. She is the author of three books and more than 100 papers. She is also the author of four lectures for TechOnLine University. Her research interests include digital signal processing and digital communications. She is a member of IEEE and The National Researcher System (SNI) of Mexico.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
1. R Guo, LS DeBrunner, K Johansson, Truncated MCM Using Pattern Modification for FIR Filter Implementation. Paper presented at the Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS), Paris, France, p. 3881–3884, May 30–Jun 2, 2010.
2. L Aksoy, EO Günes, P Flores, Search algorithms for the multiple constant multiplication problem: exact and approximate. Microprocess. Microsyst. 34(5), 151–162 (2010). doi:https://doi.org/10.1016/j.micpro.2009.10.001.
3. O Gustafsson, Lower bounds for constant multiplication problems. IEEE Trans. Circuits and Syst. II: Express Briefs 54(11), 974–978 (2007). doi:https://doi.org/10.1109/TCSII.2007.903212.
4. DET Romero, U Meyer-Baese, GJ Dolecek, On the inclusion of prime factors to calculate the theoretical lower bounds in multiplierless single constant multiplications. EURASIP Journal on Advances in Signal Processing 122, 1–9 (2014). doi:https://doi.org/10.1186/1687-6180-2014-122.
5. S Mirzaei, R Kastner, A Hosangadi, Layout aware optimization of high speed fixed coefficient FIR filters for FPGAs. Int. Journal of Reconfigurable Computing (2010). doi:https://doi.org/10.1155/2010/697625.
6. M Kumm, P Zipf, High speed low complexity FPGA-based FIR filters using pipelined adder graphs. Paper presented at the Int. Conference on Field Programmable Technology (FPT), Indian Institute of Technology Delhi, New Delhi, India, p. 1–4, 12–14 December 2011.
7. U Meyer-Baese, G Botella, DET Romero, M Kumm, Optimization of high speed pipelining in FPGA-based FIR filter design using genetic algorithm. Proc. SPIE 8401, Independent Component Analyses, Compressive Sampling, Wavelets, Neural Net, Biosystems, and Nanoengineering X, 84010R112 (2012). doi:https://doi.org/10.1117/12.918934.
8. M Kumm, P Zipf, M Faust, CH Chang, Pipelined adder graph optimization for high speed multiple constant multiplication. Paper presented at the Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS), COEX, Seoul, Korea, p. 49–52, 20–23 May 2012.
9. M Kumm, D Fanghanel, K Moller, P Zipf, U Meyer-Baese, FIR filter optimization for video processing on FPGAs. EURASIP J Adv Sig Process 111, 1–18 (2013). doi:https://doi.org/10.1186/1687-6180-2013-111.
10. M Kumm, M Hardieck, J Willkomm, P Zipf, U Meyer-Baese, Multiple constant multiplications with ternary adders. Paper presented at the International Conference on Field Programmable Logic and Applications (FPL), Porto, Portugal, p. 1–8, 2–4 Sept. 2013.
11. M Kumm, P Zipf, Pipelined compressor tree optimization using integer linear programming. Paper presented at the 24th International Conference on Field Programmable Logic and Applications (FPL), Munich, Germany, p. 1–8, 2–4 Sept. 2014.
12. M Kumm, P Zipf, Efficient high speed compression trees on Xilinx FPGAs. Paper presented at the MBMV, IBM Germany Research and Development, Böblingen, Germany, p. 171–182, 10–12 March 2014.
13. L Aksoy, E Costa, P Flores, J Monteiro, Exact and approximate algorithms for the optimization of area and delay in multiple constant multiplications. IEEE Trans. Comput.-Aided Des. Integr. Circuits 27(6), 1013–1026 (2008). doi:https://doi.org/10.1109/TCAD.2008.923242.
14. L Aksoy, E Costa, P Flores, J Monteiro, Finding the optimal tradeoff between area and delay in multiple constant multiplications. Elsevier Journal Microprocessors and Microsystems 35(8), 729–741 (2011). doi:https://doi.org/10.1016/j.micpro.2011.08.009.
15. AG Dempster, SS Demirsoy, I Kale, Designing multiplier blocks with low logic depth. Paper presented at the Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS), Scottsdale, Arizona, p. 773–776, 26–29 May 2002.
16. M Faust, CH Chang, Minimal logic depth adder tree optimization for multiple constant multiplication. Paper presented at the Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS), Paris, France, p. 457–460, May 30–Jun 2, 2010.
17. K Johansson, O Gustafsson, LS DeBrunner, L Wanhammar, Minimum adder depth multiple constant multiplication algorithm for low power FIR filters. Paper presented at the Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS), Rio de Janeiro, Brazil, p. 1439–1442, 15–18 May 2011.
18. AG Dempster, MD Macleod, Using all signed-digit representations to design single integer multipliers using subexpression elimination. Paper presented at the Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS), Vancouver, British Columbia, p. 165–168, 23–26 May 2004.
19. L Aksoy, E Costa, P Flores, J Monteiro, Multiplierless design of linear DSP transforms. VLSI-SoC: Advanced Research for Systems on Chip, ed. by S Mir, CY Tsui, R Reis, O Choy (Springer, 2011), p. 73–93.
20. YH Ho, CU Lei, HK Kwan, N Wong, Global optimization of common subexpressions for multiplierless synthesis of multiple constant multiplications. Paper presented at the Proceedings of Asia and South Pacific Design Automation Conference, Seoul, South Korea, p. 119–124, 21–24 January 2008.
21. A Hosangadi, F Fallah, R Kastner, Simultaneous optimization of delay and number of operations in multiplierless implementation of linear systems. Paper presented at the Proceedings of International Workshop on Logic Synthesis, Lake Arrowhead, California, p. 1–8, 8–10 June 2005.
22. KN Macpherson, RW Stewart, Rapid prototyping - Area efficient FIR filters for high speed FPGA implementation. IEE Proceedings - Vision, Image Signal Processing 156, 711–720 (2006). doi:https://doi.org/10.1049/ip-vis:20045133.
23. U Meyer-Baese, J Chen, CH Chang, AG Dempster, A comparison of pipelined RAG-n and DA FPGA-based multiplierless filters. Paper presented at the IEEE Asia Pacific Conference on Circuits and Systems, Singapore, p. 1555–1558, 4–7 December 2006.
24. L Aksoy, E Costa, P Flores, J Monteiro, Design of low-complexity digital finite impulse response filters on FPGAs. Paper presented at the Proceedings of Design, Automation and Test in Europe Conference, Dresden, Germany, p. 1197–1202, 12–16 March 2012.
25. M Faust, CH Chang, Bit-parallel Multiple Constant Multiplication using Look-Up Tables on FPGA. Paper presented at the Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS), Rio de Janeiro, Brazil, p. 657–660, 15–18 May 2011.
26. G Botella, A Garcia, M Rodriguez-Alvarez, E Ros, U Meyer-Baese, MC Molina, Robust bioinspired architecture for optical-flow computation. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 18(4), 616–629 (2010). doi:https://doi.org/10.1109/TVLSI.2009.2013957.
27. G Botella, U Meyer-Baese, A Garcia, M Rodriguez, Quantization analysis and enhancement of a VLSI gradient-based motion estimation architecture. Digital Signal Processing 22(6), 1174–1187 (2012). doi:https://doi.org/10.1016/j.dsp.2012.05.013.
28. G Botella, U Meyer-Baese, A Garcia, Bioinspired robust optical flow processor system for VLSI implementation. Electron Lett 45(25), 1304–1305 (2009). doi:https://doi.org/10.1049/el.2009.1718.
29. E Castillo, A Lloris, DP Morales, L Parrilla, A Garcia, G Botella, A new area-efficient BCD-digit multiplier. Digital Signal Processing 62, 1–10 (2017). doi:https://doi.org/10.1016/j.dsp.2016.10.011.
30. Y Voronenko, M Püschel, Multiplierless multiple constant multiplication. ACM Transactions on Algorithms 3(2), 11 (2007). doi:https://doi.org/10.1145/1240233.1240234.
31. I Koren, Computer Arithmetic Algorithms (Prentice Hall, 1993).
32. U Meyer-Baese, Digital Signal Processing with Field Programmable Gate Arrays, 4th edn. (Springer, 2014).
33. DR Bull, DH Horrocks, Primitive operator digital filters. IEE Proceedings G - Circuits, Devices and Systems 138(3), 401–412 (1991). doi:https://doi.org/10.1049/ip-g-2.1991.0066.
34. K Johansson, O Gustafsson, L Wanhammar, Switching activity estimation for shift-and-add based constant multipliers. Paper presented at the Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS), Seattle, Washington, p. 676–679, 18–21 May 2008.
35. J Chen, CH Chang, High-level synthesis algorithm for the design of reconfigurable constant multiplier. IEEE Trans Computer-Aided Des Integr Circ Syst 28(12), 1844–1856 (2009). doi:https://doi.org/10.1109/TCAD.2009.2030446.
36. AG Dempster, MD Macleod, Use of minimum-adder multiplier blocks in FIR digital filters. IEEE Trans. Circ Syst II – Analog Digit Sig Process 42(9), 569–577 (1995). doi:https://doi.org/10.1109/82.466647.
37. O Gustafsson, AG Dempster, K Johansson, MD Macleod, L Wanhammar, Simplified design of constant coefficient multipliers. Circuits Syst. Signal Process 25(2), 225–251 (2006). doi:https://doi.org/10.1007/s00034-005-2505-5.
38. KK Parhi, VLSI digital signal processing systems: design and implementation (John Wiley & Sons, 2007).
39. O Gustafsson, Contributions to Low-Complexity Digital Filters, 837 (Linköping Studies and technology dissertations, 2003).
40. R Kastner, A Hosangadi, F Fallah, Arithmetic Optimization Techniques for Hardware and Software Design (Cambridge University Press, 2010).