# A hardware-oriented concurrent TZ search algorithm for High-Efficiency Video Coding

- Nghia Doan^{1}
- Tae Sung Kim^{1}
- Chae Eun Rhee^{2}
- Hyuk-Jae Lee^{1}

**2017**:78

https://doi.org/10.1186/s13634-017-0513-9

© The Author(s). 2017

**Received:** 29 June 2016

**Accepted:** 6 November 2017

**Published:** 22 November 2017

## Abstract

High-Efficiency Video Coding (HEVC) is the latest video coding standard; it doubles the compression performance of its predecessor, the H.264/AVC standard, at the same video quality. In HEVC, the test zone (TZ) search algorithm is widely used for integer motion estimation because it finds good-quality motion vectors with a relatively small amount of computation. However, the complex computation structure of the TZ search algorithm makes it difficult to implement in hardware. This paper proposes a new integer motion estimation algorithm designed for hardware execution, which modifies the conventional TZ search to allow parallel motion estimation of all prediction unit (PU) partitions. The algorithm consists of three phases: zonal, raster, and refinement searches. At the beginning of each phase, the algorithm obtains the search points required by the original TZ search for all PU partitions in a coding unit (CU). Then, all redundant search points are removed prior to the estimation of the motion costs, and the best search points are selected for all PUs. Experimental results show that, compared to the conventional TZ search algorithm, the proposed algorithm decreases the Bjøntegaard Delta bitrate (BD-BR) by 0.84% and reduces the computational complexity by 54.54%.


## 1 Introduction

The High-Efficiency Video Coding (HEVC) [1–5] standard, the latest video coding standard, is designed to replace the previous H.264/AVC standard owing to the fact that HEVC not only preserves the video compression quality of H.264/AVC but also reduces the bitrate by as much as 50%. However, this achievement results in a substantial increase of the encoding complexity or the encoding time. In HEVC, the most complicated block is the motion estimation (ME), accounting for more than 50% of the encoding complexity. Therefore, any complexity reduction in the ME can make a significant impact on the complexity of the entire HEVC standard.

In the integer ME (IME) part of all video encoders, a full search algorithm usually guarantees the best motion vectors (MVs) but significantly increases the encoding complexity. Hence, fast IME algorithms have been developed with the aim of greatly decreasing the required computation while preserving the video quality. Numerous techniques [6–8], such as the cross search algorithm (CSA), the three-step search (3SS), and the four-step search (4SS), have been introduced for block-based IME and applied in video coding standards ranging from MPEG-1/H.261 to MPEG-4 Part 10 (H.264/AVC). In HEVC, the test zone (TZ) search algorithm is adopted in the HEVC Test Model (HM) reference software. The TZ search algorithm obtains accurate MVs efficiently through its adaptive use of diamond and raster search patterns. Like many other search algorithms, it performs ME for prediction units (PUs) sequentially, with each PU tracking its own search points. This design assumes sequential execution in software and is therefore poorly suited to hardware implementation, because parallelism is not explicitly exploited.

The proposed hardware-oriented concurrent TZ search extends the set of search positions available to each PU partition in a coding unit (CU) while keeping the total set of search positions tested in that CU the same. In the proposed algorithm, the search paths of the independent PU partitions are summed. To achieve this, the original TZ search is modified so that it can be applied in parallel to all PU partitions, an approach that differs completely from the original TZ search and from all other fast TZ search-based IME algorithms. With the conventional TZ search algorithm, temporal redundancy of the search positions arises between PUs because searching is performed for each PU sequentially. In contrast, because all PUs are examined at the same time in the proposed algorithm, the temporal redundancy is converted to spatial redundancy, which is easily removed by simple comparisons. To the best of our knowledge, this paper is the first to address this problem. In addition, a search position reduction scheme is introduced to further reduce the complexity of the sum of absolute differences (SAD) cost calculation. When all of the proposed schemes are applied, the complexity is reduced by as much as 54.54%, along with a 0.84% improvement in the compression efficiency (i.e., bitrate reduction).
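The conversion of temporal redundancy into spatial redundancy can be illustrated with a minimal Python sketch. The simplified four-point diamond, the PU labels, the search centers, and the helper names `diamond_points` and `merge_search_points` are assumptions made for illustration and are not taken from the HM software. Each PU lists the positions it would visit on its own, and the CU then evaluates every distinct position only once.

```python
from typing import Dict, List, Set, Tuple

Point = Tuple[int, int]  # (x, y) integer offset inside the search area


def diamond_points(center: Point, distance: int) -> List[Point]:
    """Return a simplified 4-point diamond around `center` (illustrative only)."""
    cx, cy = center
    return [(cx, cy - distance), (cx - distance, cy),
            (cx + distance, cy), (cx, cy + distance)]


def merge_search_points(per_pu: Dict[str, List[Point]]) -> Dict[Point, Set[str]]:
    """Map each distinct search position to the set of PUs that requested it."""
    merged: Dict[Point, Set[str]] = {}
    for pu, points in per_pu.items():
        for pt in points:
            merged.setdefault(pt, set()).add(pu)
    return merged


# Hypothetical search centers for three PU partitions of one CU.
per_pu = {
    "2Nx2N": diamond_points((0, 0), 1) + diamond_points((0, 0), 2),
    "Nx2N_0": diamond_points((1, 0), 1) + diamond_points((1, 0), 2),
    "Nx2N_1": diamond_points((0, 0), 1),
}

# Sequential search: every PU evaluates all of its own points.
sequential_evals = sum(len(points) for points in per_pu.values())

# Concurrent search: shared positions are evaluated once for all PUs.
merged = merge_search_points(per_pu)
concurrent_evals = len(merged)

print(sequential_evals, concurrent_evals)  # prints "20 14"
```

In this toy setting, six of the twenty requested evaluations are redundant and disappear through a simple position comparison, while each PU can still read the cost of every position it asked for.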

## 2 The conventional TZ search algorithm

Although the conventional TZ search algorithm can result in an improvement of more than 60% of the encoding time versus the full search algorithm, many optimization issues still exist. Attempting to solve these problems can decrease the encoding time significantly. Another problem associated with the traditional TZ search algorithm is that all search patterns used in the IME are fixed and limited in terms of the search range; therefore, the best match position can be trapped into a local minimum, which downgrades the video compression efficiency. A great amount of effort has been made to improve the original TZ search algorithm; the previous works mainly use two approaches. First, numerous modified search patterns are applied, such as pentagonal and hexagonal search patterns, as introduced in several earlier works [9–11]. The second approach determines the early termination conditions to reduce the computing time [9–15]. In one of these studies [9], due to the smaller number of search points, a hexagonal search pattern is used instead of the basic diamond pattern. Purnachand et al. noted that there is no need to continue to extend the search pattern in the first zonal search when the best distance is greater than the predetermined threshold iRaster. In a continuation of their work [10], rotating hexagonal patterns are shown to increase the peak signal-to-noise ratio (PSNR) slightly. In addition, another skipping method is presented based on the average motion cost among all previously examined search positions. Also motivated by the aforementioned study [10], in another work [12], the hexagonal and conventional diamond patterns are adaptively switched based on the MV differences (MVD) in the predicted neighboring blocks. Furthermore, a group of three search patterns [13] is selectively used for different directional movements, such as the horizontal or vertical directions, depending on the position of the best match point. 
Slightly different approaches have also been presented [14, 15]. In [14], a learning algorithm determines the search range of the current PU, and a different search pattern is applied for each established search range. In a related study [15], instead of finding the best match by considering all search positions within a search range, the best match point is found by solving a predictive model built from five fixed positions in the current search area. This prediction model is formulated from a statistical analysis of the MV cost distribution. It should also be noted that all previous works retain the sequential prediction order of the PU partitions in a CU. The experimental results of all related TZ search algorithms show a trade-off between complexity and compression quality: none of them reduces the computation time or complexity while also significantly decreasing the bitrate.

## 3 The proposed hardware-oriented concurrent TZ search algorithm

### 3.1 The proposed algorithm

The MV cost function ($J_{MV}$) is employed in HEVC to find the IMV yielding the smallest MV cost for each PU. This cost function is restated here as shown in [10]:

$$J_{MV} = \mathrm{SAD} + \lambda_{M} \cdot R(\mathrm{IMV} - \mathrm{PMV})$$

where SAD is the distortion term, $\lambda_{M}$ is the Lagrangian multiplier, and IMV and PMV are the current integer motion vector and the predicted integer motion vector of the current PU, respectively. The parameter $R(\mathrm{IMV} - \mathrm{PMV})$ represents the rate needed to encode the motion vector difference between IMV and PMV.
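As a rough illustration of the Lagrangian MV cost described in this subsection, the Python sketch below adds a SAD distortion to a rate term weighted by the Lagrangian multiplier. Using SAD as the distortion and a signed Exp-Golomb code length as the rate estimate are assumptions made here; the HM software uses its own rate approximation, and the function names `sad`, `mvd_bits`, and `mv_cost` are hypothetical.

```python
from typing import Sequence, Tuple

MV = Tuple[int, int]  # (horizontal, vertical) integer motion vector


def sad(cur: Sequence[Sequence[int]], ref: Sequence[Sequence[int]]) -> int:
    """Sum of absolute differences between two equally sized sample blocks."""
    return sum(abs(c - r)
               for cur_row, ref_row in zip(cur, ref)
               for c, r in zip(cur_row, ref_row))


def mvd_bits(v: int) -> int:
    """Bit length of a signed Exp-Golomb code for `v` (assumed rate model)."""
    code_num = 2 * v - 1 if v > 0 else -2 * v
    return 2 * (code_num + 1).bit_length() - 1


def mv_cost(distortion: int, imv: MV, pmv: MV, lambda_m: float) -> float:
    """Distortion plus lambda-weighted rate of the MV difference IMV - PMV."""
    rate = mvd_bits(imv[0] - pmv[0]) + mvd_bits(imv[1] - pmv[1])
    return distortion + lambda_m * rate


# Toy 2 x 2 blocks; real PUs are far larger.
cur_block = [[10, 12], [14, 16]]
ref_block = [[11, 12], [13, 18]]

distortion = sad(cur_block, ref_block)                               # 1 + 0 + 1 + 2 = 4
cost = mv_cost(distortion, imv=(3, -1), pmv=(1, -1), lambda_m=4.0)   # 4 + 4.0 * (5 + 1)
print(distortion, cost)  # prints "4 28.0"
```

The example shows why a candidate far from the predicted vector must offer a clearly lower SAD to win: its rate term grows with the magnitude of the motion vector difference.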

### 3.2 An example of the proposed algorithm

The distance between a search point P and its search center C is assumed to be $D(P, C) = \max\{D_{hor\_dir}(P, C), D_{ver\_dir}(P, C)\}$, where $D_{hor\_dir}(P, C)$ is the distance from P to C in the horizontal direction and $D_{ver\_dir}(P, C)$ is the distance in the vertical direction. According to this assumption, the distance between the 2N × 2N PU search center and its best match P1 is max {6, 2} = 6. As the distance of a search point always refers to its search center, the shorter form D(Pi) with *i* = {1, 2, 3} is used to represent the distance from the best matches to the search centers of the 2N × 2N, N × 2N part-0, and N × 2N part-1 PUs, respectively. Following this assumption, D(P1) = 6, D(P2) = max {1, 1} = 1, and D(P3) = max {4, 0} = 4. Because the distances between the current search center and the best match of the 2N × 2N and N × 2N part-1 PUs are 6 and 4, both greater than iRaster = 3, these two PUs need to enter the raster search phase. The distance of the N × 2N part-0 PU is 1, so a two-point search is applied to this PU prior to the raster search phase.
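The distance rule and the per-PU decision in this example can be sketched in a few lines of Python. The coordinates below are hypothetical positions chosen only to reproduce the horizontal and vertical offsets quoted in the example, and the function `next_step` is a simplified stand-in for the decision logic rather than the HM implementation.

```python
from typing import Dict, Tuple

Point = Tuple[int, int]


def distance(p: Point, c: Point) -> int:
    """Larger of the horizontal and vertical distances from P to its center C."""
    return max(abs(p[0] - c[0]), abs(p[1] - c[1]))


def next_step(best_distance: int, i_raster: int) -> str:
    """Simplified decision taken after the first zonal search."""
    if best_distance > i_raster:
        return "raster"
    if best_distance == 1:
        return "two_point"
    return "refinement"


I_RASTER = 3  # value used in the example of this subsection

# (best match, search center) per PU; offsets are (6, 2), (1, 1) and (4, 0).
best_matches: Dict[str, Tuple[Point, Point]] = {
    "2Nx2N": ((6, 2), (0, 0)),
    "Nx2N_part0": ((1, 1), (0, 0)),
    "Nx2N_part1": ((4, 0), (0, 0)),
}

decisions = {pu: next_step(distance(p, c), I_RASTER)
             for pu, (p, c) in best_matches.items()}
print(decisions)
```

With iRaster = 3, the sketch sends the 2N × 2N and N × 2N part-1 PUs to the raster search and applies the two-point search to the N × 2N part-0 PU, matching the example.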

## 4 Complexity reduction schemes for the proposed algorithm

### 4.1 Search point reduction scheme for the diamond search

The diamond search is used in the first search and in every loop iteration of the refinement search. When several PU partitions generate their search points according to the diamond search pattern, not only can the redundant search points be removed, but additional points that are assumed to be non-critical can also be eliminated. In general, search positions close to their search centers are more likely to be the best match than distant positions. The method presented in this subsection assumes that all search positions whose distances to their search centers are shorter than iRaster are critical points. Therefore, these points are removed only if they are redundant. On the other hand, for all search positions whose distances to their search centers are greater than or equal to iRaster, a simple search position reduction scheme is applied with the expectation that the complexity of the proposed algorithm can be decreased without a dramatic drop in compression quality.

The basic idea of this scheme is to define a merge window around each non-critical position and to check whether other search points lie inside that window. If at least one search point belonging to another PU lies inside the window, the current non-critical point is removed, because its MV cost is expected to be close to that of the search points found in its window. The window size can be fixed for all non-critical positions or can be adapted to the distance between the point and its search center. The window size has two important properties. First, to preserve the compression quality, the window of the current point cannot cover other search positions generated in the same PU partition. Second, more distant search positions can have larger windows, because distant points are less likely to become the best match than search points located near the search centers. Given this assumption, the reduction scheme for search positions in the diamond search uses an adaptive window size, with the window size configured to different values, as shown in Table 2 in Sect. 5.
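The merge-window idea of this subsection can be sketched as follows. This is a minimal Python illustration under stated assumptions: the window is a square of half-size D divided by a configurable divisor, critical points are retained first, and a non-critical point is compared only against points that have already been retained. The processing order and the function name `reduce_diamond_points` are choices made for this sketch and are not specified in the text.

```python
from typing import Dict, List, Set, Tuple

Point = Tuple[int, int]


def chebyshev(p: Point, q: Point) -> int:
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


def reduce_diamond_points(candidates: List[Tuple[str, Point]],
                          centers: Dict[str, Point],
                          i_raster: int,
                          divisor: int) -> Dict[Point, Set[str]]:
    """Return the retained positions mapped to the PUs that share them."""
    kept: Dict[Point, Set[str]] = {}
    non_critical: List[Tuple[str, Point]] = []

    # Pass 1: critical points (distance < iRaster) are dropped only if redundant.
    for pu, pt in candidates:
        if pt in kept:
            kept[pt].add(pu)            # redundant position, shared between PUs
        elif chebyshev(pt, centers[pu]) < i_raster:
            kept[pt] = {pu}
        else:
            non_critical.append((pu, pt))

    # Pass 2: a non-critical point is removed if a retained point of another PU
    # lies inside its window of half-size W = D // divisor.
    for pu, pt in non_critical:
        if pt in kept:
            kept[pt].add(pu)
            continue
        window = chebyshev(pt, centers[pu]) // divisor
        covered = any(chebyshev(pt, q) <= window and owners != {pu}
                      for q, owners in kept.items())
        if not covered:
            kept[pt] = {pu}
    return kept


centers = {"A": (0, 0), "B": (1, 0)}           # hypothetical search centers
candidates = [("A", (1, 0)), ("A", (8, 0)),
              ("B", (2, 0)), ("B", (9, 0)), ("B", (1, 0))]
kept = reduce_diamond_points(candidates, centers, i_raster=3, divisor=2)
print(sorted(kept))  # prints "[(1, 0), (2, 0), (8, 0)]"
```

In this example, the point (9, 0) of PU B is at distance 8 from its center, so its window half-size is 4; the retained point (8, 0) of PU A falls inside that window, and (9, 0) is dropped. A larger divisor (W = D/4, D/8, D/16) shrinks the window and removes fewer points.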

### 4.2 Search point reduction scheme for the raster search

When multiple PU partitions perform a raster search, their raster search regions often overlap because the AMVPs of those PU partitions have fairly similar values. Because the distance between two adjacent positions of the same PU in the raster search pattern is equal to iRaster, the distance between two adjacent search points that belong to different PUs is no greater than iRaster. Furthermore, iRaster is set to 5 by default in the HM software, meaning that in the overlapping areas, the distance between the raster search points of different PU partitions is quite small. Therefore, these positions can be merged to decrease the MV cost calculation complexity.

_{n} is examined. In addition, although the scheme introduced in this subsection requires sequential computations for each raster region, this process can be pipelined with the MV cost calculation. Thus, it does not increase the computation time of the proposed concurrent TZ search algorithm. When this scheme is applied, the coding efficiency decreases negligibly, whereas the complexity drops by 32.62%, as shown in the experimental results.
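One plausible way to merge overlapping raster regions is sketched below in Python. It is an assumption-laden illustration, not the authors' exact procedure: regions are processed one after another, and a raster point is kept only if no already retained point lies closer than iRaster. Because points of the same region are spaced exactly iRaster apart, this rule never removes a point because of its own region. The search range, the centers, and the function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def chebyshev(p: Point, q: Point) -> int:
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


def raster_points(center: Point, search_range: int, i_raster: int) -> List[Point]:
    """Raster grid with stride iRaster inside a square search range."""
    cx, cy = center
    offsets = range(-search_range, search_range + 1, i_raster)
    return [(cx + dx, cy + dy) for dy in offsets for dx in offsets]


def merge_raster_regions(regions: List[List[Point]], i_raster: int) -> List[Point]:
    """Process regions sequentially, dropping points close to a retained one."""
    kept: List[Point] = []
    for region in regions:
        for pt in region:
            if all(chebyshev(pt, q) >= i_raster for q in kept):
                kept.append(pt)
    return kept


I_RASTER = 5  # default iRaster in the HM software, as stated in this subsection
region_a = raster_points((0, 0), search_range=10, i_raster=I_RASTER)
region_b = raster_points((2, 1), search_range=10, i_raster=I_RASTER)   # similar AMVP
region_c = raster_points((40, 40), search_range=10, i_raster=I_RASTER)  # far away

overlapping = merge_raster_regions([region_a, region_b], I_RASTER)
disjoint = merge_raster_regions([region_a, region_c], I_RASTER)
print(len(region_a) + len(region_b), len(overlapping), len(disjoint))  # prints "50 25 50"
```

When the two centers are close, every point of the second region has a neighbor of the first region within a distance of 2, so the second region adds no new SAD evaluations; regions that do not overlap are left untouched.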

## 5 Experimental results

### 5.1 Complexity measurement

*N*.

In the example above, on average, the complexity of an 8 × 8 CU needed for the SAD calculation in the case of the concurrent TZ Search accounts for roughly 84% of the complexity when the same size CU undergoes fast integer motion estimation by the original TZ search algorithm.

### 5.2 Evaluation

The proposed concurrent TZ search algorithm is implemented in the HEVC reference software HM version 13.0, and the experimental results are compared with the original encoder in terms of the Bjøntegaard-Delta bitrate (BD-BR) and complexity, as noted in Sect. 5.1. Video sequences from classes A to E are examined with the low delay P configuration over 100 frames, with quantization parameters (QPs) of 22, 27, 32, and 37.

The weighted BD-BR (wBDBR) in Table 1 is computed as $wBDBR = (6 \times BDBR_{Y} + BDBR_{U} + BDBR_{V}) / 8$, where $BDBR_{Y}$, $BDBR_{U}$, and $BDBR_{V}$ are the Bjøntegaard-Delta bitrate values for the three video signal components Y, U, and V, respectively. As the Y component is the most important among the three signals, it has the largest weight, 6, while the other components have a weight of 1. These weights (6:1:1) for the Y, U, and V signals are adopted from [5], as used during the HEVC standardization process. The last column gives the reduction in SAD calculation complexity, denoted by $SAD_{cal}$, compared to the original algorithm. The bottom row of Table 1 shows the average BD-BR and complexity reduction over all test video sequences. As shown in Table 1, the proposed algorithm remarkably improves the compression efficiency compared to the original TZ search algorithm, with a result 1.05% better in terms of BD-BR. Furthermore, in the proposed algorithm, the motion cost is estimated at all search points for all PUs, and many of these search points are shared by different PUs. Consequently, because all redundant search positions are removed, the SAD calculation complexity of the proposed algorithm is also reduced by 19.90% compared to the original case.
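The 6:1:1 weighting of the Y, U, and V components can be checked against the reported numbers with a short Python sketch. Normalizing by the sum of the weights is an assumption made here; it reproduces the weighted values listed for PeopleOnStreet, Traffic, and Keiba in Table 1 to two decimal places. The function name `weighted_bdbr` is hypothetical.

```python
from typing import Tuple


def weighted_bdbr(bdbr_y: float, bdbr_u: float, bdbr_v: float,
                  weights: Tuple[int, int, int] = (6, 1, 1)) -> float:
    """Weighted average of the per-component BD-BR values (in percent)."""
    w_y, w_u, w_v = weights
    return (w_y * bdbr_y + w_u * bdbr_u + w_v * bdbr_v) / (w_y + w_u + w_v)


# (Y, U, V) BD-BR values taken from Table 1.
people_on_street = weighted_bdbr(-0.77, -2.22, -2.02)   # about -1.11
traffic = weighted_bdbr(-1.20, -1.26, -0.80)            # about -1.16
keiba = weighted_bdbr(-1.82, -2.26, -2.02)              # about -1.90

print(f"{people_on_street:.4f} {traffic:.4f} {keiba:.4f}")
```

Because the luma component carries three quarters of the total weight, a sequence such as BQSquare can show a bitrate increase in both chroma components and still obtain a negative weighted BD-BR.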

**Table 1** The BD-BR values and complexity reduction results of the concurrent TZ Search

| Class of test sequences | BD-BR Y (%) | BD-BR U (%) | BD-BR V (%) | Weighted BD-BR (wBDBR) (%) | Complexity reduction ($SAD_{cal}$) (%) |
|---|---|---|---|---|---|
| **Class A (2560 × 1600)** | | | | | |
| PeopleOnStreet | −0.77 | −2.22 | −2.02 | −1.11 | 13.36 |
| Traffic | −1.20 | −1.26 | −0.80 | −1.16 | 13.21 |
| **Class B (1920 × 1080)** | | | | | |
| BasketballDrive | −0.87 | −0.61 | −0.96 | −0.85 | 43.41 |
| BQTerrace | −1.49 | −1.20 | −0.78 | −1.36 | 26.86 |
| Cactus | −0.68 | −0.78 | −0.38 | −0.66 | 17.61 |
| Kimono | −0.63 | −0.16 | −0.55 | −0.56 | 25.15 |
| ParkScene | −0.89 | −0.91 | −0.89 | −0.89 | 12.26 |
| **Class C (832 × 480)** | | | | | |
| Keiba | −1.82 | −2.26 | −2.02 | −1.90 | 14.07 |
| BQMall | −0.99 | −1.30 | −1.14 | −1.05 | 13.48 |
| BasketballDrill | −0.62 | −0.25 | −1.46 | −0.68 | 19.35 |
| Flowervase | −1.64 | −1.64 | −1.64 | −1.64 | 60.81 |
| PartyScene | −0.57 | −0.58 | −0.68 | −0.58 | 11.72 |
| RaceHorses | −2.80 | −1.53 | −1.56 | −2.48 | 18.49 |
| **Class D (416 × 240)** | | | | | |
| BasketballPass | −0.45 | −1.96 | −0.27 | −0.62 | 7.32 |
| BlowingBubbles | −0.68 | 0.09 | −0.05 | −0.50 | 9.96 |
| BQSquare | −0.94 | 0.28 | 1.18 | −0.52 | 14.24 |
| RaceHorses | −1.02 | −1.89 | −0.84 | −1.11 | 7.43 |
| **Class E (1280 × 720)** | | | | | |
| FourPeople | −0.77 | −0.50 | −0.99 | −0.76 | 25.20 |
| Johnny | −1.87 | −0.36 | −0.62 | −1.53 | 22.61 |
| KristenAndSara | −1.06 | −1.11 | −1.08 | −1.07 | 21.53 |
| **Average** | | | | −1.05 | 19.90 |

**Table 2** Various configurations of the proposed algorithm. The four W columns give the setting for reduced search points in the diamond search (W: window size, D: best distance)

| Mode | AMVP replacement | Reduced search points in raster search | W = D/2 | W = D/4 | W = D/8 | W = D/16 | Weighted BD-BR (wBDBR) (%) | Complexity reduction ($SAD_{cal}$) (%) |
|---|---|---|---|---|---|---|---|---|
| 1 | Yes | No | No | No | No | No | −1.05 | 19.90 |
| 2 | No | No | No | No | No | No | −0.83 | 22.58 |
| 3 | Yes | Yes | No | No | No | No | −0.92 | 32.62 |
| 4 | Yes | No | Yes | – | – | – | −0.82 | 40.32 |
| 5 | Yes | No | – | Yes | – | – | −0.90 | 31.67 |
| 6 | Yes | No | – | – | Yes | – | −0.90 | 40.71 |
| 7 | Yes | No | – | – | – | Yes | −0.88 | 24.39 |
| 8 | Yes | Yes | Yes | – | – | – | −0.84 | 54.54 |
| 9 | Yes | Yes | – | Yes | – | – | −0.91 | 41.02 |
| 10 | Yes | Yes | – | – | Yes | – | −0.91 | 40.17 |
| 11 | Yes | Yes | – | – | – | Yes | −0.88 | 36.20 |

Table 3 compares the proposed algorithm with previous works in terms of the weighted BD-BR and the reduction in SAD calculation complexity ($SAD_{cal}$). For fair comparison, the previous works are implemented and tested using the same parameter configurations, video test sequences, and HM version as the proposed work. As shown in Table 3, none of the previous works achieves both compression efficiency and complexity reduction at the same time. Those approaches merely reduce the number of search positions by using simple search patterns and/or early termination schemes, which directly hurts the compression efficiency. By contrast, the proposed algorithm does not experience that trade-off between compression efficiency and complexity, because all PUs are searched at the same time along their summed paths. This enlarges the search point set of each individual PU while keeping the search point set of the whole CU the same, and thus gives each PU more chances to find an accurate IMV. In this concurrent search process, only redundant points and non-critical points are removed, enabling a great reduction in SAD calculation complexity without degrading the compression quality.

**Table 3** Performance evaluation of the proposed algorithm and previous works

| Class of test sequences | Proposed wBD-BR (%) | Proposed $SAD_{cal}$ (%) | [9] wBD-BR (%) | [9] $SAD_{cal}$ (%) | [10] wBD-BR (%) | [10] $SAD_{cal}$ (%) | [11] wBD-BR (%) | [11] $SAD_{cal}$ (%) | [22] wBD-BR (%) | [22] $SAD_{cal}$ (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| **Class A (2560 × 1600)** | | | | | | | | | | |
| PeopleOnStreet | −0.83 | 36.4 | 0.49 | 13.04 | 0.56 | 21.28 | 0.68 | 23.2 | 1.15 | 73.17 |
| Traffic | −0.99 | 22.96 | 0.05 | 3.55 | 0.09 | 7.31 | 0.17 | 22.08 | 0.3 | 32.57 |
| **Class B (1920 × 1080)** | | | | | | | | | | |
| BasketballDrive | −0.69 | 43.61 | 0.05 | 12.01 | 0.24 | 18.32 | 0.27 | 18.82 | 0.37 | 71.41 |
| BQTerrace | −0.95 | 61.12 | 0.26 | 5.55 | 0.32 | 11.33 | 0.25 | 22.04 | 0.46 | 33.49 |
| Cactus | −0.6 | 56.17 | 0.01 | 6.38 | 0.05 | 13.94 | 0.09 | 18.58 | 0.13 | 63.85 |
| Kimono | −0.52 | 60.16 | 0 | 6.83 | −0.06 | 15.8 | 0.03 | 17.32 | 0.09 | 70.01 |
| ParkScene | −0.81 | 55.61 | 0.02 | −1.36 | −0.02 | 4.08 | 0.01 | 15.42 | 0.17 | 36.48 |
| **Class C (832 × 480)** | | | | | | | | | | |
| Keiba | −1.29 | 49.86 | 0.18 | 11.29 | 0.09 | 18.8 | 0.14 | 20.8 | 1.21 | 68.13 |
| BQMall | −0.78 | 52.32 | −0.05 | 15.64 | −0.16 | 21.46 | −0.02 | 25.05 | 0.15 | 65.57 |
| BasketballDrill | −0.66 | 56.64 | 0.14 | 12.24 | 0.3 | 19.34 | 0.55 | 22.92 | 0.61 | 73.32 |
| Flowervase | −1.31 | 78.84 | 0.19 | 4.52 | 0.07 | 8.62 | 0 | 23.07 | 0.75 | 42.62 |
| PartyScene | −0.36 | 53.04 | 0.16 | 6.1 | 0.07 | 15.3 | 0.24 | 20.7 | 0.25 | 57.65 |
| RaceHorses | −1.01 | 59.52 | 0.14 | 11.92 | 0.12 | 19.78 | 0.38 | 21.22 | 0.72 | 75.28 |
| **Class D (416 × 240)** | | | | | | | | | | |
| BasketballPass | −0.67 | 53.01 | 0.12 | 11.03 | 0.11 | 17 | 0.25 | 23.85 | 0.32 | 57.9 |
| BlowingBubbles | −0.86 | 55.74 | 0.25 | 2.71 | 0.31 | 9.01 | 0.12 | 20.88 | 0.22 | 42.08 |
| BQSquare | −0.66 | 61.75 | 0.23 | −12.63 | 0.11 | −10.99 | 0.3 | 15.51 | 0.11 | −0.79 |
| RaceHorses | −0.91 | 57.23 | 0.22 | 11.18 | 0.17 | 20.1 | 0.53 | 22.58 | 0.54 | 71.17 |
| **Class E (1280 × 720)** | | | | | | | | | | |
| FourPeople | −0.6 | 59.32 | 0.17 | 13.32 | 0.08 | 16.32 | 0.05 | 28.39 | 0.21 | 50.61 |
| Johnny | −1.24 | 58.15 | 0.21 | 9.67 | 0.21 | 12.59 | 0.52 | 27.47 | 0.67 | 36.99 |
| KristenAndSara | −1.04 | 59.44 | 0.11 | 9.3 | −0.07 | 13.57 | −0.09 | 25.75 | 0.12 | 45.33 |
| **Average** | −0.84 | 54.54 | 0.15 | 7.61 | 0.13 | 13.65 | 0.22 | 21.78 | 0.43 | 53.34 |

The proposed algorithm and [22] achieve comparable complexity reduction: the average $SAD_{cal}$ values for those works are 54.54 and 53.34%, respectively. It is worth noting that the $SAD_{cal}$ values of the proposed scheme are mostly concentrated around the average value of 54%, whereas the results from [22] are widely spread. In terms of compression efficiency, only the proposed algorithm shows a significant decrease in BD-BR, at an average of −0.84%. For further comparison, the RD curves of [11, 22], the original TZ Search (OriginalTZS), and the proposed algorithm in mode 8 (ConTZS_M8) are examined in Fig. 13. In each chart, the horizontal axis shows the bitrate and the vertical axis shows the PSNR. The small boxed chart for each test sequence enlarges the part of the corresponding chart around the RD value at QP = 27. As clearly depicted in the enlarged charts, the proposed algorithm always achieves better compression quality (lower bitrates and higher PSNR) than the other works.

## 6 Conclusion

In this paper, a fast IME algorithm is designed for hardware implementation, enabling parallel processing in the IME phase of HEVC. For all configurations of the proposed method, the compression quality is better than that of the original TZ Search for two reasons. First, the number of search positions available to every PU partition increases because the search areas and search paths of all PUs are shared. Second, the strict termination condition of the refinement search requires all PUs to reach best match positions identical to their most recent search centers. Because the proposed algorithm converts the redundancy of the search positions from temporal to spatial redundancy, all redundant search positions among the PUs are easily removed. Therefore, the complexity of the proposed algorithm is remarkably decreased. For future research, the proposed algorithm should be implemented in hardware, and its efficiency should be compared with those of previous hardware implementations for IME.

## Declarations

### Acknowledgements

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT and Future Planning (NRF-2015R1C1A1A02037625) and by the Korea Institute for Advancement of Technology (KIAT) grant funded by the Korean government (Motie: Ministry of Trade, Industry and Energy, HRD Program for Software-SoC convergence) (No. N0001883).

### Authors’ contributions

All authors contributed equally. All authors read and approved the final manuscript.

### Competing interests

The authors declare that they have no competing interests.

### Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Open Access**This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


## References

1. GJ Sullivan, J-R Ohm, W-J Han, T Wiegand, Overview of the High Efficiency Video Coding (HEVC) standard. IEEE Trans. Circuits Syst. Video Technol. 22(12), 1649–1668 (2012)
2. P Helle et al., Block merging for quadtree-based partitioning in HEVC. IEEE Trans. Circuits Syst. Video Technol. 22(12), 1720–1731 (2012)
3. L Shen et al., An effective CU size decision method for HEVC encoders. IEEE Trans. Multimedia 15(2), 465–470 (2013)
4. F Bossen et al., HEVC complexity and implementation analysis. IEEE Trans. Circuits Syst. Video Technol. 22(12), 1685–1696 (2012)
5. J-R Ohm et al., Comparison of the coding efficiency of video coding standards—including High Efficiency Video Coding (HEVC). IEEE Trans. Circuits Syst. Video Technol. 22(12), 1669–1684 (2012)
6. LC Manikandan, RK Selvakumar, A new survey on block matching algorithms in video coding. Int. J. Eng. Res. 3(2), 121–125 (2014)
7. Y-W Huang et al., Survey on block matching motion estimation algorithms and architectures with new results. J. VLSI Signal Process. Syst. 42(3), 297–320 (2006)
8. CE Rhee et al., A survey of fast mode decision algorithms for inter-prediction and their applications to High Efficiency Video Coding. IEEE Trans. Consum. Electron. 58(4), 1375–1383 (2012)
9. N Purnachand, LN Alves, A Navarro, in Proceedings of IEEE International Conference on Consumer Electronics-Berlin (ICCE-Berlin). Fast motion estimation algorithm for HEVC (Berlin, 2012), pp. 34–37
10. N Purnachand, LN Alves, A Navarro, in Proceedings of IEEE International Conference on Systems, Signals and Image Processing (IWSSIP). Improvements to TZ search motion estimation algorithm for multiview video coding (Austria, 2012), pp. 388–391
11. N Parmar, MH Sunwoo, in Proceedings of IEEE International SoC Design Conference (ISOCC). Enhanced test zone search motion estimation algorithm for HEVC (South Korea, 2014), pp. 260–261
12. X Li et al., in Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS). Context-adaptive fast motion estimation of HEVC (Portugal, 2015), pp. 2784–2787
13. H Kibeya et al., in Proceedings of IEEE International Conference on Advanced Technologies for Signal and Image Processing (ATSIP). TZ Search pattern search improvement for HEVC motion estimation modules (Tunisia, 2014), pp. 95–99
14. LP Van et al., in Proceedings of IEEE International Conference on Image Processing (ICIP). Fast motion estimation for closed-loop HEVC transrating (France, 2014), pp. 2492–2496
15. L Gao et al., in Proceedings of IEEE International Conference on Image Processing (ICIP). A novel integer-pixel motion estimation algorithm based on quadratic prediction (Canada, 2015), pp. 2810–2814
16. G-L Li, C-C Wang, K-H Chiang, in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). An efficient motion vector prediction method for avoiding AMVP data dependency for HEVC (Italy, 2014), pp. 7363–7366
17. Q Yu, L Zhao, S Ma, in Proceedings of IEEE Visual Communications and Image Processing (VCIP). Parallel AMVP candidate list construction for HEVC (USA, 2012), pp. 1–6
18. X Jiang et al., in Proceedings of IEEE Asia Pacific Conference on Circuits and Systems (APCCAS). AMVP prediction algorithm for adaptive parallel improvement of HEVC (Japan, 2014), pp. 511–514
19. S Rehman, R Young, C Chatwin, P Birch, An FPGA based generic framework for high speed sum of absolute difference implementation. Eur. J. Sci. Res. 33(1), 6–29 (2009)
20. P Nalluri, LN Alves, A Navarro, in Proceedings of IEEE International Conference on Image Processing (ICIP). High speed SAD architectures for variable block size motion estimation in HEVC video coding (France, 2014), pp. 1233–1237
21. Y Fan, L Huang, B Hao, X Zeng, A hardware-oriented IME algorithm for HEVC and its hardware implementation. IEEE Transactions on Circuits and Systems for Video Technology (2017). doi:10.1109/TCSVT.2017.2702194
22. S-H Yang, J-Z Jiang, H-J Yang, Fast motion estimation for HEVC with directional search. Electron. Lett. 50(9), 673–675 (2014)