Hardware implementation of machine vision systems: image and video processing

This contribution summarizes the topics covered by the special issue titled ‘Hardware Implementation of Machine Vision Systems’. These include FPGA, GPU, embedded-system, and multicore implementations for image analysis tasks such as edge detection, segmentation, pattern recognition, and object recognition/interpretation, as well as image enhancement/restoration, image/video compression, image similarity and retrieval, satellite image processing, medical image processing, motion estimation, neuromorphic and bioinspired vision systems, video processing, image formation and physics-based vision, 3D processing/coding, scene understanding, and multimedia.


Introduction
Machine vision systems represent an extensive area of digital signal processing, with vast applications in industrial environments, robotics, vehicular/avionics/space technology, multimedia, entertainment, security, medicine, science, and so on. Nevertheless, most of these techniques cannot be run "off-line" due to the physical constraints of the complex environment and the context in which they are applied. Additionally, as these systems become more complex, they demand huge computational resources, especially when hard real-time constraints are required.
The hardware implementation and acceleration of these systems are gaining importance as a way to meet system requirements, which usually call for a careful trade-off between accuracy, efficiency, and power consumption. Using GPGPUs, Cell processors, FPGAs, and DSPs as accelerators can yield speedups of orders of magnitude over optimized CPU implementations. The appropriate implementation is chosen depending on many final demands, such as power consumption, performance, accuracy, reliability, rapid prototyping, final cost, capability, and reconfigurability, among others [1][2][3][4][5][6].
The objective of this special issue has been to bring together high-quality state-of-the-art research contributions as well as review articles that address recent developments in the hardware implementation of machine vision systems.

Specific contributions
In this special issue, we detail fundamental aspects of the hardware acceleration of image and video processing on the heterogeneous systems mentioned above. The applications described in this issue range from multimedia coding to many machine vision topics.
Regarding this last aspect [7,8], low-level vision obtains useful measurements such as color, spatial frequency, binocular disparity, motion, etc., from several channels. Some of these channels or spatio-temporal filters can be identified with receptive fields that deliver information to the retina. Others, such as binocular disparity or motion processing, are combinations of the former. Mid-level vision integrates the primitives processed at the previous level. The information delivered at this stage corresponds to real-world inferences such as egomotion and independent moving objects (IMOs). These are called causal actions or object candidates in connection with any multimodal characterization. Examples are the combination of luminance measurements to infer lightness, shape from shading, perceptual grouping, figure organization, etc. Finally, high-level vision interprets the scene through specific tasks such as relational reasoning, knowledge building, object recognition, etc.
As summarized in the initial classification, two principal topics are covered in 12 top-quality research articles:
1. New advances in hardware acceleration techniques for multimedia content representation/coding and transmission
2. Hardware acceleration techniques for machine vision systems belonging to low-, mid-, and high-level vision (according to the aforementioned taxonomy)
The first topic is covered by the following five contributions.
In the article entitled 'MPCM: a hardware coder for super slow motion video sequences' [9] by Estefania Alcocer et al., the authors develop a fast FPGA implementation of a simple codec called modulo-pulse code modulation (MPCM), which is able to reduce the bandwidth requirements by a factor of up to 1.7 at the same image quality when compared with PCM coding. This allows current high-speed cameras to capture continuously through a 40-Gbit Ethernet point-to-point access.
In the article entitled 'Enhancing LTW image encoder with perceptual coding and GPU-optimized 2D-DWT transform' [10] by Miguel Martínez-Rach et al., the authors propose an optimization of a wavelet image coder (specifically, the E_LTW encoder) that aims to increase its rate/distortion (R/D) performance through perceptual encoding techniques and to reduce the encoding time by means of a version of the two-dimensional discrete wavelet transform optimized for graphics processing units. The results show that, in both performance dimensions, the enhanced encoder achieves good results compared with Kakadu (an optimized JPEG 2000 implementation) and SPIHT, with a sixfold speedup.
In the article entitled 'GPU-based 3D lower tree wavelet video encoder' [11] by Vicente Galiano et al., the authors introduce a fast GPU-based encoder that uses the 3D discrete wavelet transform and lower trees, and they also present an exhaustive analysis of the use of GPU memory. This approach demonstrates a good trade-off between R/D, coding delay (as fast as MPEG-2 for high definition), and memory requirements (up to six times less memory than the x264 encoder).
In the article entitled 'Adapting hierarchical bidirectional inter prediction on a GPU-based platform for 2D and 3D H.264 video coding' [12] by Rafael Rodríguez-Sánchez et al., the authors present an algorithm that concurrently performs the inter prediction carried out over P- and B-frames. The approach implements the hierarchical B-frame prediction of the H.264/AVC JM 17.2 reference software encoder, and it is tested using the main and the stereo high profiles. The result is a GPU-based implementation of an H.264/AVC and H.264/MVC inter prediction algorithm. The results show a negligible rate-distortion drop with a time reduction of up to 98% for the complete H.264/AVC encoder.
The last article on this first topic is entitled 'Modeling of a method of parallel hierarchical transformation for fast recognition of dynamic images' [13] by Leonid Timchenko et al. The authors present the principles necessary to develop a method and computational facilities for the parallel hierarchical transformation based on high-performance GPUs. Mathematical models of the parallel hierarchical (PH) network training for the transformation and a PH network training method for the recognition of dynamic images are developed.
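As a rough illustration of the kind of transform that the wavelet-based encoders in [10,11] accelerate on the GPU, the following minimal sketch computes one decomposition level of a 2D discrete wavelet transform. It is not the authors' implementation: the Haar filter is swapped in here for simplicity (the filters actually used in those articles are not stated above), and the function names haar_1d and haar_2d are hypothetical.

```python
def haar_1d(signal):
    """One level of an averaging Haar transform on an even-length sequence.

    Assumption: Haar is used only because it is the simplest wavelet;
    it stands in for whatever filter a real encoder would use.
    Returns the approximation half followed by the detail half.
    """
    half = len(signal) // 2
    approx = [(signal[2 * i] + signal[2 * i + 1]) / 2 for i in range(half)]
    detail = [(signal[2 * i] - signal[2 * i + 1]) / 2 for i in range(half)]
    return approx + detail


def haar_2d(image):
    """One 2D decomposition level: transform every row, then every column.

    `image` is a list of equal-length rows with even width and height.
    The low-frequency (LL) sub-band ends up in the top-left quadrant.
    """
    rows = [haar_1d(row) for row in image]
    # zip(*rows) iterates over columns; each column is transformed on its own.
    cols = [haar_1d(list(col)) for col in zip(*rows)]
    # Transpose back so the result is indexed as result[y][x] again.
    return [list(row) for row in zip(*cols)]


if __name__ == "__main__":
    for row in haar_2d([[1, 3], [5, 7]]):
        print(row)
```

Every row transform, and then every column transform, is independent of the others, which is the property that makes this step a natural candidate for data-parallel hardware.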
The second topic is discussed in the following seven contributions.
Filter design optimization is a low-level vision task covered in the paper entitled 'FIR Filter optimization for video processing on FPGAs' [14] by Michael Kumm et al. The authors propose two optimization techniques for high-speed implementations of the multiplications required in FIR filters with the least possible number of FPGA components. Both methods use integer linear programming formulations that can be optimally solved by standard solvers. In the first method, a formulation for the pipelined multiple constant multiplication problem is presented. In the second method, multiplication structures based on look-up tables are also taken into account. Due to the low coefficient word size of video processing filters, typically 8 to 12 bits, an optimal solution is found for most of the filters in the benchmark used. A complexity reduction of 8.5% is achieved compared to the state-of-the-art heuristics.
Super-resolution is also a low-level vision task, detailed in the paper entitled 'Exploring super-resolution implementations across multiple platforms' [15] by Bryan Leung et al. The authors implement and analyze a super-resolution algorithm across multiple platforms, ranging from purely hardware to purely software and including a mix of both. More specifically, they examine the performance of an FPGA implementation on two different FPGAs, a software/hardware solution on an FPGA with a soft-core processor (embedded system), a GPGPU implementation, and a MATLAB implementation. Overall, the results are of interest in terms of architecture and performance across different platforms.
Also within low-level vision, the optimization of hyperspectral imaging is a currently relevant topic in remote sensing, addressed in the paper entitled 'Performance versus energy consumption of hyperspectral unmixing algorithms on multi-core platforms' [16] by Alfredo Remón et al. The authors provide a detailed assessment of the performance versus energy consumption of different hardware architectures, an assessment that had not yet been conducted in the field of hyperspectral imaging and that is particularly relevant to achieving processing results in real time. They offer a thoughtful perspective on this issue and analyze the performance versus energy consumption ratio of different processing chains for spectral unmixing when implemented on multi-core platforms.
At this point, it is worth paying attention to an important aspect of low-level vision: motion estimation. It has many important applications, and hardware acceleration of these algorithms is crucial. Two papers representing different families of methods, the "gradient family" and the "matching family", are presented in this special issue.
A gradient family-based method is presented in the contribution 'Robust motion estimation on a low-power multi-core DSP' [17] authored by Francisco D. Igual et al., which addresses the efficient implementation of a robust gradient-based optical flow model on a low-power platform based on a multi-core digital signal processor (the C6678 DSP from Texas Instruments, Dallas, TX, USA). The aim of this work was to carry out a feasibility study on the use of these devices in autonomous systems, such as robot navigation, biomedical assistance, or tracking, that have not only power restrictions but also real-time requirements. This research is particularly relevant to the optical flow field because the system can be considered an alternative solution for mid-range video resolutions when in-processor parallelism is combined with optimizations such as efficient memory-hierarchy exploitation and multiprocessor parallelization.
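To make the "gradient family" concrete, the sketch below estimates a single flow vector for a small patch by least squares on the brightness-constancy constraint Ix·u + Iy·v + It = 0. This is the basic Lucas-Kanade formulation, used here as an assumed stand-in; it is not the robust model implemented in [17], and the function name lucas_kanade_window is hypothetical.

```python
def lucas_kanade_window(prev, curr):
    """Estimate one (u, v) displacement for a small grayscale patch.

    `prev` and `curr` are equal-sized 2D lists (indexed [y][x]) holding the
    same patch in two consecutive frames. Returns None when the normal
    equations are singular (flat or one-directional texture).
    Assumption: plain least squares, with no robust weighting or pyramid.
    """
    h, w = len(prev), len(prev[0])
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix = (prev[y][x + 1] - prev[y][x - 1]) / 2.0  # spatial gradient in x
            iy = (prev[y + 1][x] - prev[y - 1][x]) / 2.0  # spatial gradient in y
            it = curr[y][x] - prev[y][x]                  # temporal gradient
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-9:
        return None
    # Solve [[sxx, sxy], [sxy, syy]] @ [u, v] = -[sxt, syt] in closed form.
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v


if __name__ == "__main__":
    prev = [[x * y for x in range(5)] for y in range(5)]
    curr = [[(x - 1) * y for x in range(5)] for y in range(5)]  # shifted right by 1
    print(lucas_kanade_window(prev, curr))
```

The per-pixel gradient products are independent and the sums are simple reductions, which suggests why such methods map well onto parallel processors.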
A matching family-based method is presented in the contribution 'Acceleration of block-matching algorithms using a custom instruction-based paradigm on a Nios II microprocessor' [18] authored by Diego González et al. This paper focuses on the optimization of matching algorithms widely used in video coding standards, using an Altera custom instruction-based paradigm and a combination of synchronous dynamic random access memory (SDRAM) with on-chip memory in Nios II processors. A complete profile of the algorithms is obtained before the optimization; it locates the code bottlenecks, and a custom instruction set is then created and added to the specific design, enhancing the original system. In addition, every possible memory combination between on-chip memory and SDRAM has been tested to achieve the best performance. The final throughput of the complete designs is shown. The manuscript outlines a low-cost system, mapped using very large scale integration technology, that accelerates software algorithms by converting them into custom hardware logic blocks and shows the best combination of on-chip memory and SDRAM for the Nios II processor.
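For comparison with the gradient family, a minimal sketch of a "matching family" method is an exhaustive (full-search) block matcher using the sum of absolute differences (SAD). It is given only as an assumed example of this class of algorithm; which matching algorithms and which operations [18] maps to custom instructions are not specified above, and the names sad and full_search are hypothetical.

```python
def sad(ref, cur, rx, ry, cx, cy, n):
    """Sum of absolute differences between two n x n blocks.

    The block in `cur` starts at (cx, cy); the candidate in `ref` at (rx, ry).
    Frames are 2D lists indexed [y][x].
    """
    return sum(
        abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
        for j in range(n)
        for i in range(n)
    )


def full_search(ref, cur, cx, cy, n, search):
    """Exhaustive block matching within +/- `search` pixels.

    Returns (dx, dy, cost) for the lowest-SAD candidate in `ref`, or None
    if no candidate lies fully inside the reference frame.
    """
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + n > w or ry + n > h:
                continue  # candidate block falls outside the reference frame
            cost = sad(ref, cur, rx, ry, cx, cy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


if __name__ == "__main__":
    ref = [[x + 10 * y for x in range(8)] for y in range(8)]
    # Current frame: every pixel equals the reference pixel at offset (+2, +1).
    cur = [[(x + 2) + 10 * (y + 1) for x in range(8)] for y in range(8)]
    print(full_search(ref, cur, cx=2, cy=2, n=2, search=2))
```

The SAD kernel is a small, regular inner loop executed once per candidate position, which is the sort of hot spot that a profile would be expected to expose and that custom hardware logic could plausibly replace.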
Staying at the low-level vision layer, the next contribution focuses on augmented vision for visually impaired people. This paper is entitled 'A reconfigurable real-time morphological system for augmented vision' [19] and is authored by Ryan B. Gibson et al. Augmented visual aid devices require highly user-customizable algorithm designs for subjective configuration per task, whereas current digital image processing visual aids offer very few user-configurable options. The high spatial frequencies of an image can be extracted by edge detection techniques and overlaid on top of the original image to improve visual perception among the visually impaired. This paper presents a highly user-reconfigurable morphological edge enhancement system on an FPGA, in which the morphological, internal, and external edge gradients can be selected from the presented architecture with a specified edge thickness and magnitude. In addition, the morphology architecture supports reconfigurable shape structuring elements and configurable morphological operations. The proposed morphology-based visual enhancement system introduces a high degree of user flexibility while meeting real-time constraints, achieving 93 fps at high-definition image resolution.
Finally, the last paper contributed to this special issue is entitled 'Parallel embedded processor architecture for FPGA based image processing using parallel software skeletons' [20] authored by Hanen Chenini et al. This application belongs to the mid-