Rapid VLIW Processor Customization for Signal Processing Applications Using Combinational Hardware Functions
EURASIP Journal on Advances in Signal Processing volume 2006, Article number: 046472 (2006)
This paper presents an architecture that combines VLIW (very long instruction word) processing with application-specific customized instructions and highly parallel combinational hardware functions to accelerate signal processing applications. To support this architecture, a compilation and design automation flow is described for algorithms written in C. The key contributions of this paper are as follows: (1) a 4-way VLIW processor implemented in an FPGA, (2) large speedups through hardware functions, (3) a hardware/software interface with zero overhead, (4) a design methodology for implementing signal processing applications on this architecture, and (5) tractable design automation techniques for extracting and synthesizing hardware functions. Several design tradeoffs for the architecture were examined, including the number of VLIW functional units and the register file size. The architecture was implemented on an Altera Stratix II FPGA, selected because it offers a large number of high-speed DSP (digital signal processing) blocks that execute multiply-accumulate operations. We tested the methodology and architecture by accelerating software from the MediaBench benchmark suite. The combined VLIW processor with hardware functions was compared to software executing on a RISC processor, specifically the soft-core embedded NIOS II processor. For software kernels converted into hardware functions, we show a hardware performance multiplier of up to times that of software with an average times faster. For the entire application, in which only a portion of the software is converted to hardware, the performance improvement is as much as 30X over the nonaccelerated application, with a 12X improvement on average.
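The gap between kernel-level and whole-application speedup reported above follows the pattern described by Amdahl's law: the code that stays in software bounds the overall gain. The C sketch below is illustrative only and is not taken from the paper. The function name `overall_speedup` and its parameters are hypothetical, and it assumes the accelerated kernels and the remaining software run sequentially with no interface overhead.

```c
#include <assert.h>

/*
 * Amdahl's law sketch (hypothetical helper, not from the paper).
 *
 * accelerated_fraction: share of the original software run time spent in
 *                       kernels that are moved to hardware, in [0, 1].
 * kernel_speedup:       speedup of those kernels once in hardware, > 0.
 *
 * Returns the speedup of the whole application, assuming the accelerated
 * and non-accelerated parts execute one after the other.
 */
double overall_speedup(double accelerated_fraction, double kernel_speedup)
{
    assert(accelerated_fraction >= 0.0 && accelerated_fraction <= 1.0);
    assert(kernel_speedup > 0.0);

    /* Time left in software plus the shortened hardware time,
       both normalized to an original run time of 1. */
    double remaining_time = (1.0 - accelerated_fraction)
                          + accelerated_fraction / kernel_speedup;

    return 1.0 / remaining_time;
}
```

With made-up inputs, `overall_speedup(0.5, 2.0)` evaluates to roughly 1.33: doubling the speed of half the run time yields far less than a 2X gain for the whole application. These inputs are chosen for illustration and are not measurements from the article.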
Hoare, R.R., Jones, A.K., Kusic, D. et al. Rapid VLIW Processor Customization for Signal Processing Applications Using Combinational Hardware Functions. EURASIP J. Adv. Signal Process. 2006, 046472 (2006). https://doi.org/10.1155/ASP/2006/46472
- Soft Core
- Benchmark Suite
- Signal Processing Application
- Performance Multiplier
- RISC Processor