Cost-effective multi-standard video transform core using time-sharing architecture
EURASIP Journal on Advances in Signal Processing volume 2019, Article number: 49 (2019)
Abstract
This paper presents a cost-effective two-dimensional (2D) inverse discrete cosine transform (IDCT) that supports multiple standards: MPEG 1/2/4, H.264, VC-1, and HEVC. The proposed approach employs a time allocation scheme that processes the first and second dimensions simultaneously, which enhances data throughput and achieves 100% hardware utilization. The proposed one-dimensional (1D) IDCT uses distributed arithmetic (DA) in conjunction with factor sharing (FS) within a hardware-sharing architecture. Four parallel computation paths raise the throughput rate to four times the operating frequency. The efficacy of this approach was verified by fabricating a test chip in the Taiwan Semiconductor Manufacturing Company Limited (TSMC) 90 nm complementary metal-oxide-semiconductor (CMOS) process. The inverse transform core operates at 200 MHz and achieves a throughput of 800 Mpixels/s with a gate count of 27.2 K.
1 Introduction
Video compression reduces redundancy in video data in order to increase effective storage capacity and transmission rates. The inverse discrete cosine transform (IDCT) is used in video decompression to reconstruct the original picture. A variety of transform sizes and coefficients have been defined by the International Organization for Standardization [1], the International Telecommunication Union Telecommunication Standardization Sector [2, 3], and Microsoft Corporation [4].
Numerous transform architectures supporting a single standard have been developed to reduce hardware costs [5–7]. The architecture for the Video Codec 1 (VC-1) standard in [5] uses matrix decomposition, additions, and row/column permutations to reduce hardware costs. The two-dimensional (2D) forward and inverse discrete cosine transform (DCT and IDCT) design in [6] uses a single 1D transform core and a transpose memory with time division to enhance area efficiency and achieve high throughput for the MPEG standard. For High Efficiency Video Coding (HEVC), the most recent video coding standard, the architecture in [7] uses an efficient constant matrix multiplication scheme to implement the DCT. As outlined in [5–7], these architectures achieve high area efficiency and high throughput for a single standard; however, because each is designed for one standard, they cannot accommodate the different transforms required by other standards.
The parallel structure illustrated in Fig. 1a supports multiple standards using the hardware-sharing schemes outlined in [8–11]. These include multiple one-dimensional (1D) fast inverse transform algorithms based on sparse matrix decomposition and shift-and-add computation [8, 9], and the common sharing distributed arithmetic (CSDA) algorithm presented in [10], which combines factor sharing (FS) and distributed arithmetic (DA) sharing techniques within the DCT. The parallel structure in Fig. 1a combines two similar 1D transform cores with a single parallel-input parallel-output transpose memory to achieve high throughput. Unfortunately, a parallel structure imposes high hardware costs. This has led to the development of multiplex structures, such as that presented in Fig. 1b, comprising a single 1D transform core, a transpose memory, and a multiplexer [12–14]. The 2D IDCT in [12] comprises a single 1D IDCT core, a multiplexer that selects between the first-dimensional (1stD) and second-dimensional (2ndD) transforms, and a transpose memory. The architecture in [13] uses a delta coefficient matrix with resource sharing to perform a variety of DCT-based transforms for multiple standards. The technique in [14] integrates four 4×4 and two 8×8 transforms using an extended transform and block multiplication. Multiplex structures reduce hardware cost; however, they can compromise overall throughput because the 1stD and 2ndD operations are performed separately.
With the aim of enhancing throughput and achieving 100% hardware utilization, we developed a 2D IDCT comprising a single 1D IDCT with a transpose memory (TMEM) circuit, as shown in Fig. 1c. The design provides four computation paths in conjunction with a time allocation scheme that enables the simultaneous processing of the 1stD and 2ndD, which yields high throughput at low cost. The proposed 1D IDCT core employs a CSDA algorithm to reduce the hardware cost of supporting multiple standards (MPEG 1/2/4, H.264, and VC-1, with optional HEVC support). The 2D IDCT core presented in this study includes the following features:
The remainder of this paper is organized as follows. In Section 2, we present the mathematical derivation of the 2D IDCT and the proposed architecture using the time-allocation scheme. Section 3 compares synthesis results and describes the very-large-scale integration (VLSI) implementation, and conclusions are drawn in Section 4.
2 Method
2.1 Mathematical derivation of 2D IDCT transform
The 2D IDCT can be expressed in matrix notation as two 1D IDCTs. For the 1D IDCT, the inner product for general matrix multiplication is written as
2.1.1 8-point IDCT transform algorithm
In this subsection, we introduce the proposed 8-point inverse transform, which is defined as follows:
where
The MPEG, H.264, VC-1, and HEVC standards use the same 8-point coefficient structure; therefore, the 8-point transform for all of these standards can be obtained via the same mathematical derivation. The symmetry of the IDCT coefficients allows the 1D 8-point IDCT in (1) to be expressed as (4) and divided into an even part z_{e} and an odd part z_{o}, as listed in (5) and (6), respectively:
where z_{e} and z_{o} can be expressed as
Even part z_{e} can be further decomposed into even part z_{ee} and odd part z_{eo}:
where z_{ee} and z_{eo} are expressed as
z_{ee} and z_{eo} in (8) and (9) can be further expressed as follows:
Odd part z_{o} in (6) can also be expressed as follows:
Using (6) and (8)–(9), the even and odd portions of the output transform can be calculated separately. While the even input data of the 1stD IDCT are processed by the even circuit, the odd input data of the 2ndD IDCT can be processed simultaneously by the odd circuit; hardware resources can therefore be shared simply by reordering the inputs to the architecture.
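To make the even/odd split concrete, the sketch below evaluates an 8-point integer inverse transform two ways: directly from the transposed matrix, and via the even part (z_ee from [c4 c4], z_eo from [c2 c6], then the even butterfly), the odd part, and the output butterfly y[n] = z_e[n] + z_o[n], y[7-n] = z_e[n] - z_o[n]. The HEVC 8-point coefficients (c1 = 89, c2 = 83, c3 = 75, c4 = 64, c5 = 50, c6 = 36, c7 = 18) are assumed for illustration, since the per-standard coefficient definitions are not reproduced here, and the standards' rounding and shift stages are omitted. The function names are hypothetical.

```python
# HEVC 8-point integer coefficients, assumed for illustration (rounding/shift omitted)
C1, C2, C3, C4, C5, C6, C7 = 89, 83, 75, 64, 50, 36, 18

# HEVC 8-point forward matrix; the inverse transform uses its transpose
T8 = [
    [64, 64, 64, 64, 64, 64, 64, 64],
    [89, 75, 50, 18, -18, -50, -75, -89],
    [83, 36, -36, -83, -83, -36, 36, 83],
    [75, -18, -89, -50, 50, 89, 18, -75],
    [64, -64, -64, 64, 64, -64, -64, 64],
    [50, -89, 18, 75, -75, -18, 89, -50],
    [36, -83, 83, -36, -36, 83, -83, 36],
    [18, -50, 75, -89, 89, -75, 50, -18],
]

def idct8_direct(X):
    """Reference inverse transform: 64 multiplications, y[n] = sum_k T8[k][n] * X[k]."""
    return [sum(T8[k][n] * X[k] for k in range(8)) for n in range(8)]

def idct8_even_odd(X):
    """Same result using the even/odd decomposition (22 multiplications)."""
    x0, x1, x2, x3, x4, x5, x6, x7 = X
    # Even part: z_ee uses [c4 c4], z_eo uses [c2 c6] (6 multiplications)
    z_ee1, z_ee2 = C4 * (x0 + x4), C4 * (x0 - x4)
    z_eo1, z_eo2 = C2 * x2 + C6 * x6, C6 * x2 - C2 * x6
    z_e = [z_ee1 + z_eo1, z_ee2 + z_eo2, z_ee2 - z_eo2, z_ee1 - z_eo1]
    # Odd part: 4x4 matrix of c1, c3, c5, c7 (16 multiplications)
    z_o = [
        C1 * x1 + C3 * x3 + C5 * x5 + C7 * x7,
        C3 * x1 - C7 * x3 - C1 * x5 - C5 * x7,
        C5 * x1 - C1 * x3 + C7 * x5 + C3 * x7,
        C7 * x1 - C5 * x3 + C3 * x5 - C1 * x7,
    ]
    # Output butterfly: even rows are symmetric and odd rows antisymmetric
    y = [0] * 8
    for n in range(4):
        y[n] = z_e[n] + z_o[n]
        y[7 - n] = z_e[n] - z_o[n]
    return y

print(idct8_even_odd([1, 0, 0, 0, 0, 0, 0, 0]))  # [64, 64, 64, 64, 64, 64, 64, 64]
```

The decomposition cuts the multiplications from 64 to 22, and because the even and odd parts are independent, they can be assigned to separate circuits, which is what makes the input reordering described above possible.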
2.1.2 4-point IDCT transform algorithm
The 1D 4-point integer inverse transform of H.264, VC-1, and HEVC is defined as follows:
where
Similar to the even part of the 8-point transform in (5), the 1D 4-point transforms of these standards can be derived in the format presented in (8) and (9). This enables hardware sharing between the H.264/VC-1/HEVC 8-point and 4-point integer transforms. According to (13), the throughput of a 4-point transform is half that of an 8-point transform. To maintain the same throughput for 4-point and 8-point transforms, we enabled the simultaneous processing of two 4×4 matrices for the H.264/VC-1 4×4 and VC-1 4×8/8×4 transforms. Thus, the input data X_{4} in (13) include data from two 4×4 matrices, which requires symmetrical circuits for the even and odd parts so that the two input 4×4 matrices can be computed simultaneously. Under these conditions, the 4-point and 8-point transforms achieve the same throughput.
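The reuse of the even-part structure for 4-point transforms can be sketched as follows. Assuming the HEVC coefficients c4 = 64, c2 = 83, c6 = 36 (with rounding and shift stages omitted), the even-part datapath computes exactly the HEVC 4-point inverse transform when fed (X0, X1, X2, X3) instead of (X0, X2, X4, X6). The hypothetical `dual_inverse4` then models one 8-input pass carrying two 4-point coefficient vectors, so that eight samples are produced per pass, as for the 8-point transform; in hardware, the two halves would run on the two symmetric datapaths.

```python
# HEVC integer coefficients, assumed for illustration (rounding/shift stages omitted)
C4, C2, C6 = 64, 83, 36

# HEVC 4-point forward matrix; the inverse transform uses its transpose
T4 = [
    [C4, C4, C4, C4],
    [C2, C6, -C6, -C2],
    [C4, -C4, -C4, C4],
    [C6, -C2, C2, -C6],
]

def even_structure(a, b, c, d):
    """Even-part datapath: inputs (X0, X2, X4, X6) of an 8-point block,
    or (X0, X1, X2, X3) of a 4-point block."""
    z_ee1, z_ee2 = C4 * (a + c), C4 * (a - c)          # coefficient vector [c4 c4]
    z_eo1, z_eo2 = C2 * b + C6 * d, C6 * b - C2 * d    # coefficient vector [c2 c6]
    # Even butterfly: two additions and two subtractions
    return [z_ee1 + z_eo1, z_ee2 + z_eo2, z_ee2 - z_eo2, z_ee1 - z_eo1]

def inverse4_direct(X):
    """Reference 4-point inverse: y[n] = sum_k T4[k][n] * X[k] (16 multiplications)."""
    return [sum(T4[k][n] * X[k] for k in range(4)) for n in range(4)]

def dual_inverse4(X8):
    """Hypothetical 8-input pass: X8[:4] is one 4-point coefficient vector, X8[4:] another."""
    return even_structure(*X8[:4]) + even_structure(*X8[4:])

print(even_structure(1, 2, 3, 4))  # [566, -388, 132, -54]
print(dual_inverse4([1, 2, 3, 4, 0, 5, -2, 1]))
```

Because the 4-point path needs only the even-part structure, packing two blocks into one pass is what keeps the 4-point and 8-point throughputs equal.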
2.2 Proposed architecture using time allocation scheme
As shown in Fig. 2, the proposed 2D IDCT core was implemented using a single 1D IDCT core with one transpose circuit. The 1D IDCT core includes three reorder circuits, each consisting of two 8-input 4-output multiplexers that separate the even and odd part transforms. Even common sharing distributed arithmetic (ECSA) and odd common sharing distributed arithmetic (OCSA) modules compute the even and odd part transforms of the IDCT using CSDA-based computation. In the even part circuit, ECSA computes the additions and subtractions associated with the even data, as in (8) and (9), and four 2-input multiplexers select either the 8-point or the 4-point transform. The multi-dual mode computation (MDMC) circuit comprises four 2-input modified butterfly circuits that perform the additions and subtractions on the even and odd part data, as in (4). The even and odd part circuits each include three pipeline stages to enable high-speed computation. The architecture of the 2D IDCT core and the time allocation scheme are detailed below.
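The bit-level idea behind combining factor sharing with distributed arithmetic can be illustrated with a simplified sketch; this is our own illustration, not the CSDA datapath of [10]. The hypothetical function `shared_shift_add` evaluates ca·xa + cb·xb with shifts and additions only: at each bit weight 2^k it selects 0, xa, xb, or the shared factor xa + xb, which is computed once, and an adder tree accumulates the shifted terms. The HEVC values c2 = 83 and c6 = 36 are assumed as the coefficient pair of the z_eo computation.

```python
def shared_shift_add(xa, xb, ca, cb, width=8):
    """Multiplierless ca*xa + cb*xb for constant nonnegative ca, cb < 2**width.

    At each weight 2**k the partial term is 0, xa, xb, or the shared factor xa + xb;
    the shifted terms are then summed, as an adder tree would do in hardware.
    """
    shared = xa + xb              # factor shared by every bit where both coefficients are 1
    acc = 0
    for k in range(width):
        bit_a = (ca >> k) & 1
        bit_b = (cb >> k) & 1
        if bit_a and bit_b:
            term = shared
        elif bit_a:
            term = xa
        elif bit_b:
            term = xb
        else:
            continue              # no contribution at this weight
        acc += term << k          # weight 2**k applied with a shift
    return acc

C2, C6 = 83, 36   # assumed HEVC coefficients

# z_eo1 = c2*X2 + c6*X6; z_eo2 = c6*X2 - c2*X6 is obtained by negating X6
x2, x6 = 13, -7
z_eo1 = shared_shift_add(x2, x6, C2, C6)
z_eo2 = shared_shift_add(x2, -x6, C6, C2)
print(z_eo1, z_eo2)  # 827 1049
```

Because the coefficients are constants, the choice of term at each weight is fixed at design time, so in hardware the loop collapses into wiring plus an adder tree.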
2.2.1 Architecture of the 2D IDCT core

1. Reorder: We implemented a reorder circuit comprising two 8-input 4-output multiplexers that separate the data of the two transform dimensions so that they can be computed simultaneously. For example, in Reorder 1, when the select signal is zero (S_{1}=0), the even part input data X of the 1stD transform are fed into ECSA and the odd part data of the 2ndD transform are fed into OCSA. Similarly, when the select signal is one (S_{1}=1), the odd part input data X of the 1stD transform are fed into OCSA and the even part of the feedback data z_{2D} of the 2ndD transform is fed into ECSA. Because the two dimensions of the transform are computed simultaneously, the hardware resources are shared via time allocation, which reduces area overhead and achieves 100% hardware utilization.

2. ECSA and OCSA: In the 1D transform, the even and odd parts are computed by ECSA and OCSA, respectively, both of which are based on the CSDA architecture. The mode-select signals Mode_{e} in the ECSA module and Mode_{o} in the OCSA module select the standard according to Table 1. As shown in Fig. 2, G_{ee0} and G_{ee1} are the results of data distribution in the ECSA module. As in (10) and (11), the ECSA outputs z_{ee} and z_{eo} use the coefficient vectors \(\left[\begin{array}{cc} c_{4} & c_{4} \end{array}\right]\) and \(\left[\begin{array}{cc} c_{2} & c_{6} \end{array}\right]\), respectively. Computing z_{ee} and z_{eo} requires only two adders, two subtractors, a data distribution module, and an even adder tree (EAT). The data distribution module assigns the nonzero values to each weight, and the EAT sums them to complete the transform \(\left[\begin{array}{cccc} z_{ee1} & z_{ee2} & z_{eo1} & z_{eo2} \end{array}\right]\). The result is then fed into the even butterfly module to calculate the even result z_{e}, as in (7). The even butterfly module consists of two adders, two subtractors, and four multiplexers, which select either the 8-point or the 4-point transform. The odd part transform in (12) is implemented in the OCSA module with CSDA-based computation using two adders, four subtractors, and an odd adder tree (OAT). To maintain the same throughput in the 4-point and 8-point transforms, ECSA and OCSA are designed as symmetrical circuits, since the input matrix comprises data from two input matrices in the VC-1/H.264 4×4 and VC-1 4×8/8×4 transforms. Moreover, the ECSA and OCSA modules both include pipeline registers before their outputs to enable high-speed computation. The symmetrical architectures of the ECSA and OCSA modules are presented in Fig. 2.

3. Multi-Dual Mode Computation (MDMC): In general, a two-input butterfly module built from adders and subtractors achieves only 50% hardware utilization. To overcome this shortcoming and increase data rates, this study proposes the MDMC architecture, which comprises four 2-input dual mode computation units (DMC2s), as shown in Fig. 3. Equation (4) is realized using the MDMC. Each DMC2 contains one-word registers whose control signals determine whether data are loaded, output, or held. The MDMC operates as follows. First, the multiplexer select signals are 0, and the first even data of the 1stD transform (from inputs M_{0},M_{2},M_{4},M_{6}) are stored in the one-word registers (MR1_{0},MR1_{1},MR1_{2},MR1_{3}). Second, the select signals are 1, and the first odd data of the 1stD transform (from MDMC inputs M_{0},M_{2},M_{4},M_{6}) are added to the even data previously stored in the registers; the sums are fed to the outputs (y_{0},y_{2},y_{4},y_{6}), and the subtraction results are stored in registers (MR1_{0},MR1_{1},MR1_{2},MR1_{3}). Third, the second even data of the 1stD (from inputs M_{0},M_{2},M_{4},M_{6}) are stored in registers (MR1_{0},MR1_{1},MR1_{2},MR1_{3}), while the subtraction results previously stored in the registers are fed to outputs (y_{1},y_{3},y_{5},y_{7}). Meanwhile, the data of the 2ndD transform are fed into the MDMC through inputs M_{1},M_{3},M_{5},M_{7}, so the 1stD and 2ndD transforms run simultaneously. Thus, the adders and subtractors in the MDMC achieve 100% hardware utilization.
2.2.2 Time-allocation scheme
Figure 4 presents a flow chart of the timing of the proposed time allocation scheme, which enables the simultaneous computation of the two dimensions of the transform after the 22nd clock cycle, thereby increasing hardware utilization to 100%. In the proposed scheme, the MUX select signals (S1, S2, and S3) control the switching of the transform between dimensions. The ECSA and OCSA timing indicates the clock cycles in which the even and odd part data of the 1stD (or 2ndD) transform are computed in the ECSA and OCSA modules. MR1_{n} and MR2_{n} (n=0…3) indicate the clock cycles in which data must be stored in the corresponding registers of the MDMC. Figure 4 also presents the timing allocation of outputs z_{2D} and y. The allocation of data computation is described in the following.

1. 1D data computation: During the first cycle, the first 1D even data are fed into the pipeline, with no input data from the 2D transform. During the 1st–2nd cycle, the first 1D odd data are fed into the pipeline, and the first 1D even data stored in the pipeline in the previous cycle are computed in the ECSA module, with the results stored in registers (MR1_{0},MR1_{1},MR1_{2},MR1_{3}) of the MDMC module. During the 2nd–3rd cycle, the first 1D odd data are fed into the OCSA module, and the results are fed into the MDMC to be added to the even data previously stored in the registers. The sums from the MDMC module are sent to the transpose circuit, and the subtraction results are stored in registers (MR1_{0},MR1_{1},MR1_{2},MR1_{3}) of the MDMC module. During the 3rd–4th cycle, the ECSA results for the second even data are stored in registers (MR1_{0},MR1_{1},MR1_{2},MR1_{3}), while the subtraction results previously stored in the registers are fed into the transpose circuit. Thus, the latency of the 1D transform computation is 4 clock cycles.

2. 2D data computation: Starting in the 3rd cycle, data from the first dimension of the transform are sent to the transpose circuit. Because of the TMEM in the transpose circuit, the transposed data for the 2D computation, z_{2D}, are sent to Reorder 1 during the 22nd–23rd cycle, while the first 2D even data and the 11th 1D odd data are simultaneously sent to the pipeline. During the 23rd–24th cycle, the first 2D odd data shift into the pipeline. The first 2D even data are sent to ECSA, and the results are stored in registers (MR2_{0},MR2_{1},MR2_{2},MR2_{3}) of the MDMC module. At the same time, the 11th 1D odd data are fed into the OCSA module, and the results are fed to the MDMC module. During the 24th–25th cycle, the 2D odd data are fed into the OCSA module and added to the even data previously stored in registers (MR2_{0},MR2_{1},MR2_{2},MR2_{3}). The sums are fed to output ports y, and the subtraction results are simultaneously stored in registers (MR2_{0},MR2_{1},MR2_{2},MR2_{3}) of the MDMC module. During the 25th–26th cycle, the outputs y of the 2D IDCT are the subtraction results stored in the MDMC registers.

3. Hardware utilization: As shown in Fig. 4, between the 1st and 22nd cycles, the hardware utilization of the proposed 2D IDCT core is 50%. After the 22nd cycle, hardware utilization increases to 100%, with a total latency of 26 clock cycles.
In summary, the proposed 1D IDCT is based on the CSDA algorithm. The proposed hardware-sharing method computes a 1D IDCT in 4 clock cycles and significantly reduces hardware resource requirements. The proposed time allocation scheme enables the simultaneous calculation of the first and second dimensions of the 2D IDCT after the 22nd cycle, thereby achieving 100% hardware utilization. The result is a high-performance inverse transform engine with high accuracy, low cost, and high throughput.
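Functionally, the core computes the 2D IDCT by row-column decomposition: a 1D IDCT along one dimension, a transpose through the TMEM, and a second 1D IDCT. As a minimal floating-point sketch of that dataflow (not the paper's fixed-point datapath or its time-shared scheduling), the code below assumes an orthonormal 8-point DCT matrix C and computes x = C^T X C; all helper names are hypothetical.

```python
import math

N = 8

def dct_matrix(n=N):
    """Orthonormal DCT-II matrix; row k holds the k-th cosine basis vector (assumed scaling)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)])
    return rows

def transpose(m):
    return [list(r) for r in zip(*m)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def idct_1d(c, coeffs):
    """1D IDCT as inner products: y[i] = sum_k c[k][i] * coeffs[k]."""
    n = len(coeffs)
    return [sum(c[k][i] * coeffs[k] for k in range(n)) for i in range(n)]

def idct_2d(c, X):
    # First dimension: 1D IDCT of every column of X, stored column-wise (Z^T with Z = C^T X)
    z_t = [idct_1d(c, col) for col in transpose(X)]
    # Transpose memory: read the intermediate block back in row order (Z)
    z = transpose(z_t)
    # Second dimension: 1D IDCT of every row of Z gives x = Z C = C^T X C
    return [idct_1d(c, row) for row in z]

if __name__ == "__main__":
    C = dct_matrix()
    x = [[(3 * i + 5 * j) % 17 for j in range(N)] for i in range(N)]
    X = matmul(matmul(C, x), transpose(C))  # forward 2D DCT: X = C x C^T
    y = idct_2d(C, X)
    # Maximum reconstruction error: tiny, limited only by floating-point rounding
    print(max(abs(y[i][j] - x[i][j]) for i in range(N) for j in range(N)))
```

In the proposed core, the two calls to `idct_1d` in `idct_2d` are not separate hardware; the time allocation scheme interleaves them on the same 1D core.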
3 Results and discussion
In this section, we outline the methods used to evaluate the accuracy of the proposed 2D IDCT core. Then, we compare the proposed 2D IDCT architecture with existing methods and describe the characteristics of an actual implementation of the proposed chip.
3.1 Accuracy testing of proposed 2D IDCT core
MATLAB was used to compute the 2D DCT of the original test images, and the resulting coefficients were used as input data for the proposed 2D IDCT, from whose output the peak signal-to-noise ratio (PSNR) was calculated. For accuracy testing of the proposed 2D IDCT core, we used 8-bit images of 512 × 512 pixels; the average PSNR is 43.04 dB. PSNR is a widely used indicator of reconstructed image quality, and this value indicates high-quality reconstruction. Figure 5 presents a flowchart of the methods used to verify the accuracy of the proposed 2D IDCT.
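For reference, the PSNR for 8-bit data is 10·log10(255² / MSE), where MSE is the mean squared error between the original and reconstructed pixels. The sketch below is a plain implementation of that standard definition; the function name is ours, and the toy images are not the test images used above.

```python
import math

def psnr(reference, reconstructed, peak=255):
    """PSNR in dB between two equally sized images given as lists of pixel rows."""
    squared_error = 0
    count = 0
    for ref_row, rec_row in zip(reference, reconstructed):
        for r, s in zip(ref_row, rec_row):
            squared_error += (r - s) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)

img = [[10, 20], [30, 40]]
off_by_one = [[11, 21], [31, 41]]
print(psnr(img, off_by_one))  # MSE = 1, about 48.13 dB
```

Inverting the formula, an average PSNR of 43.04 dB corresponds to an MSE of roughly 3.2 on the 0 to 255 scale.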
3.2 Characteristics of chip implementation
The proposed 2D IDCT core was implemented in RTL and synthesized for the TSMC 90 nm standard CMOS process using Synopsys Design Compiler; Cadence Encounter Digital Implementation (EDI) was then used for placement and routing (P&R). The proposed 2D IDCT core has a latency of 26 clock cycles, operates at 200 MHz, and occupies a core area of 703×702 μm^{2}. The characteristics and the core layout of the test chip with the proposed 2D IDCT architecture are presented in Fig. 6.
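The relation between the 200 MHz clock and the 800 Mpixels/s throughput quoted for this core is simple arithmetic: the four computation paths deliver four pixels per clock cycle. The sketch below reproduces it, converts the core area to mm², and computes a throughput-per-gate ratio from the 27.2 K gate count; that ratio is an assumed figure of merit for illustration, not necessarily the hardware-efficiency metric used in Table 2.

```python
def throughput_mpixels(freq_mhz, pixels_per_cycle=4):
    """Throughput in Mpixels/s for a core emitting `pixels_per_cycle` results per clock."""
    return freq_mhz * pixels_per_cycle

def throughput_per_kgate(throughput_mpix, gate_count_k):
    """Assumed figure of merit: Mpixels/s per thousand gates."""
    return throughput_mpix / gate_count_k

tp = throughput_mpixels(200)              # 800 Mpixels/s
eff = throughput_per_kgate(tp, 27.2)      # about 29.4 Mpixels/s per K gates
core_area_mm2 = 703 * 702 / 1e6           # 703 um x 702 um, about 0.49 mm^2
print(tp, round(eff, 2), round(core_area_mm2, 3))
```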
3.3 Comparison to existing works
Table 2 presents a comparison of the proposed and previous methods. In [7], Meher et al. proposed power-efficient structures for folded and full-parallel implementations of the 2D DCT, as well as an efficient constant matrix multiplication scheme to derive parallel architectures for 1D integer DCTs of various lengths used in the HEVC standard. However, this parallel architecture, with two 1D cores and a TMEM, incurs a large area overhead. In [8], Fan et al. combined sparse matrix decomposition with shift-and-add computation in a new multiple 1D fast IDCT in conjunction with hardware sharing. Compared to discrete 1D hardware, that method reduces hardware costs by 45%. However, two 1D cores with a TMEM consume a large circuit area when applied to the 2D transform. In [9], Fan et al. adopted sparse matrix decomposition for multiple 1D and 2D fast forward/inverse transform algorithms with hardware sharing, which reduced the gate count by 53.4%. In [13], Lee et al. proposed the concept of a delta coefficient matrix in which resources, such as adders and shifters, are combined. In [15], Part et al. proposed a flexible transform architecture whose memory control scheme can store data for multiple standards. Unlike the works in [7–9, 13], this flexible transform core uses only one 1D IDCT core with a TMEM to save area.
The synthesis results in Table 2 show that the maximum frequency of the proposed architecture reaches 200 MHz, resulting in a maximum data throughput of 800 Mpixels/s in TSMC 90 nm CMOS standard cell technology, with a gate count of 27.2 K. The proposed 2D IDCT thus provides high hardware efficiency, as outlined in the following:
In summary, the proposed 2D IDCT architecture provides higher hardware efficiency than existing designs. For a fair comparison, the operating frequency is normalized according to the CMOS technology. The inverter delays of the 180 nm, 130 nm, and 90 nm technologies, which are 77.2 ps, 34.7 ps, and 26.5 ps, respectively [16], are used to normalize the frequency. Thus, the normalized frequency is defined as follows:
\(\mathrm{Freq}_{n} = \mathrm{Freq}_{w} \times \frac{\mathrm{Delay}_{w}}{\mathrm{Delay}_{90\,\mathrm{nm}}}\)
where Freq_{n} is the operating frequency normalized to 90 nm technology, Freq_{w} is the operating frequency to be normalized, and Delay_{90 nm} and Delay_{w} are the inverter delays of the 90 nm technology and of the technology being normalized, respectively. The proposed circuit also achieves good hardware efficiency while supporting multiple transform standards.
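Assuming the operating frequency scales inversely with inverter delay, normalizing to 90 nm amounts to multiplying a reported frequency by Delay_w / Delay_90nm. The sketch below applies that scaling with the inverter delays quoted from [16]; the 100 MHz input is an arbitrary example, not a value taken from Table 2.

```python
# Inverter delays (ps) quoted in the text from [16]
INVERTER_DELAY_PS = {180: 77.2, 130: 34.7, 90: 26.5}

def normalize_to_90nm(freq_mhz, node_nm):
    """Scale an operating frequency reported at `node_nm` to the 90 nm node."""
    return freq_mhz * INVERTER_DELAY_PS[node_nm] / INVERTER_DELAY_PS[90]

for node in (180, 130, 90):
    print(node, "nm:", round(normalize_to_90nm(100.0, node), 1), "MHz at 90 nm")
```

A design reported at an older node is credited with a higher equivalent frequency, so it is not penalized merely for its slower technology.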
3.4 Proposed architecture applied to HEVC and future video coding standards
High Efficiency Video Coding (HEVC) is the most recent video coding standard, developed by the ITU-T Video Coding Experts Group and the Moving Picture Experts Group [17]. The objective of this standard is improved compression performance. The proposed hardware-sharing architecture supports the 8-point transform of this standard. With HEVC 8-point support included, the synthesized architecture has a gate count of 33.8 K and a throughput of 800 Mpixels/s. The even/odd structure of the CSDA algorithm allows the HEVC transform coefficients to be supported through hardware sharing. The proposed method can also be applied to larger transform sizes using row-column decomposition. Owing to the symmetry property, an N-point DCT/IDCT can be decomposed into even and odd N/2-point DCT/IDCTs, and the even N/2-point DCT/IDCT can be further decomposed into even and odd N/4-point DCT/IDCTs. Thus, an N-point transform can be decomposed into odd parts of N/2, N/4, …, 8 points and an even 8-point DCT/IDCT. The odd part transforms can be implemented using the same method as the proposed OCSA, and the even part can be implemented using the same method as the proposed core. This enables the hardware to support a wider range of standards, making it more competitive. Transforms in video coding generally use symmetric matrices to map images and video into the spectral domain, where compression is achieved. Thus, the proposed method can be applied to existing video coding standards, such as MPEG 1/2/4, H.264, VC-1, and HEVC, and potentially to future video coding (H.266 [18]).
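The recursive even/odd decomposition can be checked numerically. The sketch below uses the unnormalized floating-point IDCT y_n = Σ_k X_k cos(π(2n+1)k/(2N)) rather than any standard's integer matrix, which is our simplifying assumption, and lets a direct 8-point evaluation stand in for the proposed core. Even-indexed coefficients form an N/2-point IDCT (recursed until 8 points), odd-indexed coefficients form the odd part, and outputs are combined as y[n] = E[n] + O[n], y[N-1-n] = E[n] - O[n]. The function names are hypothetical.

```python
import math

def idct_direct(X):
    """Unnormalized N-point IDCT: y[n] = sum_k X[k] * cos(pi*(2n+1)*k / (2N))."""
    N = len(X)
    return [sum(X[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for k in range(N))
            for n in range(N)]

def idct_even_odd(X, base=8):
    """Recursive even/odd evaluation; the `base`-point case stands in for the 8-point core."""
    N = len(X)
    if N <= base:
        return idct_direct(X)
    half = N // 2
    even = idct_even_odd(X[0::2], base)  # even-indexed coefficients: an N/2-point IDCT
    odd = [sum(X[2 * m + 1] * math.cos(math.pi * (2 * n + 1) * (2 * m + 1) / (2 * N))
               for m in range(half))
           for n in range(half)]          # odd part (an OCSA-like block)
    y = [0.0] * N
    for n in range(half):
        y[n] = even[n] + odd[n]           # symmetric contribution
        y[N - 1 - n] = even[n] - odd[n]   # odd-index cosines flip sign in the mirrored output
    return y

def decomposition(N, base=8):
    """List the sub-transforms an N-point IDCT is split into."""
    parts = []
    while N > base:
        parts.append(("odd", N // 2))
        N //= 2
    parts.append(("even", N))
    return parts

print(decomposition(32))  # [('odd', 16), ('odd', 8), ('even', 8)]
```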
4 Conclusions
This paper presents a cost-effective 2D IDCT for decoding MPEG 1/2/4, H.264, and VC-1. The time allocation scheme enables the 1D inverse transform core to calculate the first and second dimensions simultaneously using four computation paths, thereby achieving 100% hardware utilization. Consequently, the proposed scheme outperforms nearly all previous designs in hardware efficiency while supporting multiple standards.
Availability of data and materials
Not applicable.
Abbreviations
1D: One-dimensional
2D: Two-dimensional
1stD: First-dimensional
2ndD: Second-dimensional
CMOS: Complementary metal-oxide-semiconductor
DA: Distributed arithmetic
DCT: Discrete cosine transform
DMC2s: Dual mode computations
ECSA: Even common sharing distributed arithmetic
EDI: Encounter digital implementation
FS: Factor sharing
HEVC: High Efficiency Video Coding
IDCT: Inverse discrete cosine transform
MDMC: Multi-dual mode computation
OCSA: Odd common sharing distributed arithmetic
P&R: Placement and routing
PSNR: Peak signal-to-noise ratio
TMEM: Transpose memory
TSMC: Taiwan Semiconductor Manufacturing Company Limited
VLSI: Very-large-scale integration
References
P. Noll, MPEG digital audio coding. IEEE Sig. Process. Mag. 14, 59–81 (1997).
A. Luthra, G. J. Sullivan, T. Wiegand, Introduction to the special issue on the H.264/AVC video coding standard. IEEE Trans. Circ. Syst. Video Technol. 13, 557–559 (2003).
T. Wiegand, G. J. Sullivan, G. Bjøntegaard, A. Luthra, Overview of the H.264/AVC video coding standard. IEEE Trans. Circ. Syst. Video Technol. 13, 560–576 (2003).
H. Kalva, J. B. Lee, The VC-1 video coding standard. IEEE Multimed. 14, 88–91 (2007).
C. P. Fan, G. A. Su, Fast algorithm and low-cost hardware-sharing design of multiple integer transforms for VC-1. IEEE Trans. Circ. Syst. II Express Briefs 56(10), 788–792 (2009). https://doi.org/10.1109/tcsii.2009.2030366.
Y. H. Chen, R. Y. Jou, C. W. Lui, A high-throughput and area-efficient video transform core with a time division strategy. IEEE Trans. VLSI Syst. 22, 2268–2277 (2014).
P. K. Meher, S. Y. Park, B. K. Mohanty, K. S. Lim, C. Yeo, Efficient integer DCT architectures for HEVC. IEEE Trans. Circ. Syst. Video Technol. 24, 168–178 (2014).
C. P. Fan, C. H. Fang, C. W. Chang, S. J. Hsu, Fast multiple inverse transforms with low-cost hardware-sharing design for multistandard video decoding. IEEE Trans. Circ. Syst. II 58, 517–521 (2011).
C. P. Fan, C. W. Chang, S. J. Hsu, Cost-effective hardware-sharing design of fast algorithm based multiple forward and inverse transforms for H.264/AVC, MPEG-1/2/4, AVS, and VC-1 video encoding and decoding applications. IEEE Trans. Circ. Syst. Video Technol. 24(4), 714–720 (2014). https://doi.org/10.1109/tcsvt.2013.2277580.
Y. H. Chen, J. N. Chen, T. Y. Chang, C. W. Lu, High-throughput multistandard transform core supporting MPEG/H.264/VC-1 using common sharing distributed arithmetic. IEEE Trans. VLSI Syst. 22, 463–474 (2014).
S. Shen, W. Shen, Y. Fan, X. Zeng, in 2012 IEEE International Conference on Multimedia and Expo. A unified 4/8/16/32-point integer IDCT architecture for multiple video coding standards (IEEE, 2012). https://doi.org/10.1109/icme.2012.7.
Y. C. Chao, H. H. Tsai, Y. H. Lin, J. F. Yang, B. D. Liu, in 2007 IEEE International Conference on Multimedia and Expo. A novel design for computation of all transforms in H.264/AVC decoder (IEEE, 2007). https://doi.org/10.1109/icme.2007.4285050.
S. Lee, K. Cho, Architecture of transform circuit for video decoder supporting multiple standards. Electron. Lett. 44, 274–275 (2008).
W. Hwangbo, C. M. Kyung, A multitransform architecture for H.264/AVC high-profile coders. IEEE Trans. Multimedia 12, 157–167 (2010).
J. H. Part, S. H. Lee, K. S. Lim, J. H. Kim, S. Kim, in 2006 IEEE International Symposium on Circuits and Systems. A flexible transform processor architecture for multi-CODECs (JPEG, MPEG-2, 4 and H.264) (IEEE, 2006). https://doi.org/10.1109/iscas.2006.1693841.
A. Stillmaker, B. Baas, Scaling equations for the accurate prediction of CMOS device performance from 180 nm to 7 nm. Integration 58, 74–81 (2017).
G. J. Sullivan, J. R. Ohm, W. J. Han, T. Wiegand, Overview of the High Efficiency Video Coding (HEVC) standard. IEEE Trans. Circ. Syst. Video Technol. 22, 189 (2012).
A. C. Mert, E. Kalali, I. Hamzaoglu, in 2017 IEEE 7th International Conference on Consumer Electronics - Berlin (ICCE-Berlin). An FPGA implementation of future video coding 2D transform (IEEE, 2017). https://doi.org/10.1109/icceberlin.2017.8210582.
Acknowledgements
The authors would like to thank the National Chip Implementation Center (CIC), Taiwan, for providing the electronic design automation tools.
Funding
This work was supported in part by the Ministry of Science and Technology of Taiwan under project 107-2221-E-182-066 and by Chang Gung Memorial Hospital, Linkou, under projects CMRPD2H0051, CMRPD2G0312, CMRPD2H0301, and CIRPD2F0013.
Author information
Contributions
Both authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Tseng, YH., Chen, YH. Cost-effective multi-standard video transform core using time-sharing architecture. EURASIP J. Adv. Signal Process. 2019, 49 (2019). https://doi.org/10.1186/s13634-019-0645-1
DOI: https://doi.org/10.1186/s13634-019-0645-1