Multi-GPU based on multicriteria optimization for motion estimation system
 Carlos Garcia^{1},
 Guillermo Botella^{1},
 Fermin Ayuso^{1},
 Manuel Prieto^{1} and
 Francisco Tirado^{1}
https://doi.org/10.1186/1687-6180-2013-23
© Garcia et al.; licensee Springer. 2013
Received: 31 October 2012
Accepted: 14 December 2012
Published: 19 February 2013
Abstract
Graphics processor units (GPUs) offer high performance and power efficiency for a large number of data-parallel applications. Previous research has shown that a GPU-based version of a neuromorphic motion estimation algorithm can achieve a ×32 speedup using these devices. However, memory consumption creates a bottleneck due to the expansive tree of signal processing operations performed. In the present contribution, the memory usage, which limited the viability of the accelerator, was reduced. An evolutionary algorithm was used to find the best configuration, which represents a trade-off between resource consumption, parallel efficiency, and accuracy. A multilevel parallel scheme was exploited: a coarse-grain level by means of multi-GPU systems, and a finer level through data parallelism. In order to achieve a more relevant analysis, several optical flow benchmarks were used to validate this study. The satisfactory results open the possibility of building an intelligent motion estimation system that self-adapts according to real-time, resource consumption, and accuracy requirements.
1 Introduction
Motion estimation and compensation are crucial for multimedia coding, characterized by high memory requirements and computational complexity. When considering MPEG processing, motion estimation is acknowledged as the most time-consuming part [1], accounting for up to 90% of the total execution time [2, 3]. Additionally, motion estimation has several applications in the multimedia scope, such as segmentation, extraction of 3D structure, pattern tracking, filtering, compression, and deblurring. The motion estimation models and algorithms developed can be classified into three main categories: matching domain approximations [4], energy models [5], and gradient models [6].
Block matching algorithms have the advantages of robustness, low-cost VLSI implementation (because of their regular parallel procedure), and low overhead (since they contain one vector per block). Nevertheless, there are many disadvantages: a block may contain several moving objects, and the approach fails under zoom, rotational motion, local deformation, and blocking artifacts. In addition, these algorithms usually estimate motion by minimizing an error metric, which does not necessarily reveal the true movement. Energy models are probabilistic, delivering a population of solutions that do not indicate motion itself, and they are not usually used for multimedia purposes.
The gradient-based family can estimate the motion vector of every single pixel, giving a dense representation of the processed frame. There are several examples of video compression using gradient-based algorithms [7]. Recursive algorithms belonging to this family do not have to transmit motion information. Nevertheless, this algorithm family handles large motion vectors (severe motion), noisy images, and changes in illumination poorly. The present approach is based on a Multichannel Gradient Model (McGM) [8–10], a neuromorphic algorithm suited to the construction of viable, highly robust front-end processors for image recognition systems [11].
The increased computing capabilities of graphics processing units (GPUs) in recent years have increased their use as accelerators in many areas, such as scientific simulation, computer vision, bioinformatics, cryptography, and finance, among others. This increase is largely due to impressive performance rates. For example, one of the latest GPUs from Nvidia, the GTX 680, achieves around three teraflops in single precision with 1536 cores clocked at 1006 MHz, and also incorporates the newer Kepler architecture. Current trends indicate that this capacity will grow even more with the incorporation of 22 and 28 nm technologies. Recently, for example, AMD announced its Radeon 8000 Series, codenamed Sea Islands, and Intel is manufacturing Knights Corner products. However, key points that dramatically affect performance rates include the efficient use of the memory hierarchy and the exploitation of parallelism capabilities.
The increased demand for information to be processed also plays a role, because the use of these devices as accelerators is limited by DDR memory restrictions. To solve this problem, research [12] has often proposed data reuse alternatives with the aim of minimizing the memory traffic between GPU and CPU. Another approach, in the field of rendering meshes, can be found in [13]: a solution that uses more memory-efficient algorithms alongside other techniques based on simplification or information compression. The GPU memory reduction proposed here is addressed in a motion estimation scenario, for which, to the best of our knowledge, no solution exists in the current literature.
In a previous study [14], we developed a GPU-based McGM implementation. In the present article, we address an efficient solution for dense and robust per-pixel motion estimation with respect to GPU memory consumption, which limits GPU viability.
This article is organized as follows: Section 2 reviews the specific neuromorphic model; Section 3 presents the motivation of this study, where multiobjective optimization is used; and Section 4 shows performance and visual results. Finally, Section 5 summarizes the main contributions of this study.
2 Multichannel gradient model (McGM)
2.1 Stage I. temporal filtering
2.2 Stage II. spatial filtering
According to the space domain, the shape of the receptive fields of the primary visual cortex can be modeled either by using Gabor functions, whose impulse responses are harmonic functions multiplied by a Gaussian, or by using a set of Gaussian derivatives [16]. The Gaussian is a unique function in many ways and is of particular importance in biology.
The n-th Gaussian derivative can be expressed as a Hermite polynomial multiplied by the original Gaussian:

G_n(x) = (-1/(σ√2))^n H_n(x/(σ√2)) G_0(x), with G_0(x) = (1/(σ√(2π))) exp(-x^2/(2σ^2))

where σ is the standard deviation of the Gaussian, H_n is the n-th (physicists') Hermite polynomial, and the scale factor of G_0 ensures the function integrates to unity.
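The Hermite-polynomial form of the Gaussian derivatives can be checked numerically. The sketch below is a minimal illustration in plain NumPy (the helper names are ours, not from the original study), comparing the analytic first derivative against a finite-difference approximation:

```python
import numpy as np

def gaussian(x, sigma):
    """Normalized Gaussian G0; the scale factor makes it integrate to unity."""
    return np.exp(-x**2 / (2.0 * sigma**2)) / (sigma * np.sqrt(2.0 * np.pi))

def gaussian_derivative(x, sigma, n):
    """n-th derivative of G0, written as a (physicists') Hermite polynomial
    H_n evaluated at x/(sigma*sqrt(2)), times the original Gaussian."""
    u = x / (sigma * np.sqrt(2.0))
    h_n = np.polynomial.hermite.hermval(u, [0.0] * n + [1.0])  # H_n(u)
    return (-1.0 / (sigma * np.sqrt(2.0)))**n * h_n * gaussian(x, sigma)
```

For n = 1 this reduces to the familiar -x/σ² G_0(x), and the n = 0 case returns G_0 itself.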
2.3 Stage III. steering filtering
2.4 Stage IV. Taylor truncation
The three Taylor expansion derivatives are constructed in one large image using the complete set of basis filter responses. According to the original model [9], the expansions are truncated after the third order in the primary direction and the second order in the orthogonal and temporal directions.
2.5 Stage V. quotients
2.6 Stage VI. velocity primitives
3 Multicriteria motivation for tuning the McGM
The potential benefits of GPUs in the McGM context have been explored in the literature [14], where the authors studied the viability of these novel devices. Throughput results with respect to a single CPU were satisfactory in terms of performance, achieving ×32 speedups for 256^{2}-resolution movies.
We would like to emphasize that this particular GPU-based motion estimation scheme is an alternative worth considering in terms of Mpixel/s compared to other special-purpose systems used for such motion-estimation algorithms. However, the algorithm's features create a bottleneck: memory requirements increase in each stage, with an upward trend. This disadvantage limits GPU viability. Considering the largest memory-usage configuration in [14], 3.5 GB of global memory was used, which was close to the capacity limit of a single GPU. Although GPU memory capacity is greater nowadays, this problem is still present with larger input resolutions.
Table 1 Performance of the GPU versus CPU

              Init. GPU  Temp. Fil.  Spatial F.  Steering  Taylor   Velocity  Total   Total
              (s/pix)    (Mpps)      (Mpps)      (Mpps)    (Mpps)   (Mpps)    (Mpps)  (fps)
CPU (32^2)       -          10.30      21.63       90.99   217.87    247.24    6.14   6327
GPU (32^2)    3.28E-5      124.69       0.62        4.66     8.04     50.40    0.50    375.7
CPU (64^2)       -          12.85       1.44        2.09     3.78     20.14    0.64    195.2
GPU (64^2)    9.08E-6      495.98       2.55       15.77    25.54    169.08    2       296.4
CPU (128^2)      -          13.97       0.92        1.20     2.06     11.53    0.39     30.72
GPU (128^2)   7.21E-6     1166.12       8.79       36.17    51.49    240.65    6.03    210.8
CPU (256^2)      -          21.50       1.05        1.30     1.70     12.98    0.41      8.631
GPU (256^2)   2.25E-6     1724.63      27.56       47.62    64.20    289.21   13        99.64
One particular way to reduce the algorithm's memory consumption is not to store some of the temporary data computations, recalculating them when necessary at the expense of reduced throughput under real-time conditions. The most memory-demanding stages of the McGM algorithm are the Spatial Filtering and Steering stages. On the one hand, an efficient way to reduce memory needs is to perform the Steering stage with fewer θ angles at the expense of some accuracy degradation. On the other hand, it is possible to use a numerical derivative [17] instead of its Gaussian counterpart in the Spatial Filtering stage, allowing fast derivative recalculation. This alternative scheme avoids storing intermediate data computations, saving a huge amount of memory, and recalculates results whenever they are needed. A simple numerical differentiating filter was used, based on the commutative and associative properties of convolution: I⊗G_x = I⊗(G_0⊗D_x) = (I⊗G_0)⊗D_x. The number of operations performed in (I⊗G_0)⊗D_x is smaller than in the Gaussian derivative filtering, making the convolution process faster.
Table 2 Filter accuracy degradation using a numerical derivative instead of the Gaussian counterpart, for the first to fifth derivative orders

                     x'        x''       x'''      x^IV      x^V
Filter degradation   0.003825  0.009415  0.018444  0.033701  0.060134
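The memory-saving identity I⊗G_x = (I⊗G_0)⊗D_x relies only on the associativity and commutativity of convolution. A minimal 1-D NumPy illustration (the signal and kernels here are our own stand-ins; a hypothetical O(h²) central-difference kernel plays the role of D_x):

```python
import numpy as np

# Convolve once with the smoothing Gaussian G0, then re-derive on demand
# with a cheap difference kernel Dx, instead of storing a full bank of
# Gaussian-derivative responses.
rng = np.random.default_rng(0)
signal = rng.random(64)                      # stand-in for one image row
x = np.arange(-6, 7)
g0 = np.exp(-x**2 / 8.0)
g0 /= g0.sum()                               # smoothing Gaussian, sigma = 2
dx = np.array([0.5, 0.0, -0.5])              # central-difference kernel, O(h^2)

a = np.convolve(np.convolve(signal, g0), dx)   # (I ⊗ G0) ⊗ Dx
b = np.convolve(signal, np.convolve(g0, dx))   # I ⊗ (G0 ⊗ Dx)
assert np.allclose(a, b)                       # associativity holds exactly
```

Only I⊗G0 needs to be stored; any derivative order can then be regenerated by a short convolution with the corresponding difference kernel.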
Despite the small degradation in filter accuracy, an experiment comparing motion estimation degradation was carried out to evaluate the loss of accuracy in the overall algorithm. As benchmarks, we used a couple of synthetic sequences widely accepted in this context: the 'diverging tree' and the 'translating tree', both created by David Fleet at the University of Toronto [18]. The 'diverging tree' shows the expansive motion of a tree (camera zoom) with an asymmetric velocity range depending on the pixel position (null at the central focus, and 1.4 and 2 pixels/frame at the left and right boundaries, respectively). The 'translating tree' shows the translational motion of a tree with an asymmetric velocity range depending on the pixel position (zero to 1.73 pixels/frame and zero to 2.3 pixels/frame at the left and right borders, respectively). As an error metric, we used Barron's angular error [19], considered one of the most accepted metrics in the specialized literature.
Table 3 Overall degradation measured as mean absolute error of Barron's angle

                   O(h)    O(h^2)  O(h^3)  O(h^4)  #θ/2    #θ/4
Diverging tree     0.9297  0.4982  0.4432  0.4020  0.0008  0.0122
Translating tree   1.5185  0.7965  0.7762  0.6903  0.0015  0.0296
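Barron's angular error [19], used for the degradation figures in Table 3, lifts each 2-D flow vector (u, v) to a 3-D unit vector (u, v, 1)/√(u²+v²+1) and measures the angle between the estimated and ground-truth vectors. A minimal sketch of the metric (our own helper, not code from the original study; reported here in degrees, the common convention):

```python
import numpy as np

def barron_angular_error(u_est, v_est, u_true, v_true):
    """Angle between the 3-D unit extensions of the estimated and
    ground-truth flows, per Barron et al. [19], in degrees."""
    num = u_est * u_true + v_est * v_true + 1.0
    den = np.sqrt((u_est**2 + v_est**2 + 1.0) *
                  (u_true**2 + v_true**2 + 1.0))
    return np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0)))
```

The extra third component penalizes both direction and magnitude errors and keeps the metric finite where the true flow is zero, which is precisely the property that causes the bias toward large flow vectors mentioned later.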
As observed, the 'diverging tree' experiment behaves reasonably well with numerical derivatives, their impact shrinking with a higher order of accuracy. Nevertheless, in the 'translating tree' experiment, the algorithm is more vulnerable to numerical derivatives than to variation in the number of angles. Given this disparity observed in Table 3, it is advisable to explore the space of feasible solutions for any set of input data. Given the large number of parameters to configure, some relative to the McGM algorithm itself and others depending on the available resources, genetic algorithms (GAs) can be useful to reduce this time-consuming exploration.
3.1 Multicriteria optimization description
The use of GAs arises from the non-viability of exhaustively exploring such a large solution space. In our context, the target is to find a compromise that reduces the GPU's memory usage with negligible accuracy degradation, allowing the motion estimation system to self-adapt to appreciable environmental conditions and changes in a reasonable time.
The problem can be formulated as the multiobjective minimization

min z = F(x) = (f_1(x), f_2(x), f_3(x)), subject to x ∈ X

where z is the objective vector with three objectives to be minimized: execution time f_1, memory usage f_2, and loss of accuracy f_3; x is the decision vector; and X is the feasible region in the decision space, which corresponds to all possible McGM configurations with respect to the derivative decision and the number of angles involved. In GA terminology, x corresponds to a chromosome. In our context:

D_x corresponds to the derivative to be computed in spatial filtering. This information is stored in a two-dimensional array whose values determine how each derivative is computed: by Gaussian filtering or by numerical differentiation of a given order. The array position corresponds to the derivative order.

The number of θ angles to be used in the steering stage, which is encoded as an integer.
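The chromosome described above (a derivative-computation mode per order, plus the steering angle count) can be sketched as a small data structure. The names, mode labels, and value ranges below are hypothetical illustrations, not the authors' actual encoding:

```python
from dataclasses import dataclass
import random

# Hypothetical mode labels: Gaussian-derivative filtering, or numerical
# differentiation of increasing order of accuracy (cf. Table 3).
DERIV_MODES = ["gaussian", "O(h)", "O(h^2)", "O(h^3)", "O(h^4)"]

@dataclass
class McGMChromosome:
    deriv_modes: list   # one mode per spatial-derivative order (1st..5th)
    n_angles: int       # number of theta angles in the steering stage

def random_chromosome(max_angles=24):
    """Random individual for the initial GA population (ranges assumed)."""
    return McGMChromosome(
        deriv_modes=[random.choice(DERIV_MODES) for _ in range(5)],
        n_angles=random.randint(2, max_angles))
```

Each gene independently trades memory against accuracy: numerical modes avoid storing filter responses, and fewer angles shrink the steering stage.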
3.2 Our multi-GPU implementation
Over the last few years, a great number of multiobjective evolutionary algorithms have been developed [21–23]. A review of multiobjective GAs can be found in the tutorial [24], where the authors cover their most relevant features.
For this study, we have chosen NSGA-II [25] for the following advantages:

- Weights are not required, so it is not necessary to study the impact of each objective f_i(x) and assign them.
- Its computational complexity is O(MN^2) (for M objectives and population size N), lower than that of earlier non-dominated sorting approaches.
- Its 'good' behavior and ability to find a set of solutions near the true Pareto-optimal front in few iterations.
- It is widely used and amply tested.
1. Initially, a random population pop is created.
2. The population is sorted based on the non-domination scheme.
3. A fitness is assigned: every individual of the population is ranked into levels, the first level (Pareto front) being the most preferable.
4. A binary tournament selection and recombination is carried out.
5. A mutation phase is applied.
6. A combined population R is formed as the union of the old pop with the new new_pop; R has size 2*pop_size.
7. R is ranked by means of the McGM algorithm and sorted according to the non-domination scheme.
8. A new population pop of size pop_size is selected.
The fast non-dominated sorting is the most computationally expensive part of the GA, because it involves ranking every individual of the population. This task is performed entirely on multiple GPUs, since it is much more efficient there than on a CPU. Most GA operators are executed on the CPU due to their low computational demand.
To rank an individual of the population means to run the McGM algorithm with the chromosome's configuration: the derivatives in the Spatial Filtering stage and the number of angles in the Steering stage are computed accordingly. Several levels of parallelism are exploited: a coarser level, where the non-dominated sorting is evaluated in parallel on several GPUs, and a finer level, through the data parallelism available in each stage of the McGM algorithm. Algorithm 1 summarizes our parallel implementation, where pop_size, ngens, and %mutation are GA input parameters corresponding to the population size, number of generations, and mutation probability, respectively. The OpenMP paradigm is used to distribute the non-dominated sort across multiple devices by means of #pragma omp parallel for directives. Our implementation generates Pareto-optimal solutions as a set of (motion estimation execution time, pixel accuracy error, GPU memory usage) points. This feature allows one of the best solutions to be chosen according to the available computational resources, favoring dynamic tuning under current conditions.
Algorithm 1: pareto_front = multiGPU_NSGAII(pop_size, ngens, %mutation)
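As a rough illustration of the generational loop and the coarse-grain ranking (not the authors' implementation), the sketch below uses a thread pool to stand in for the OpenMP threads that each drive one GPU, and a toy two-objective evaluate() in place of a full McGM run; crowding-distance tie-breaking is omitted for brevity:

```python
import random
from concurrent.futures import ThreadPoolExecutor

N_DEVICES = 2  # stand-in for the number of GPUs

def evaluate(x):
    # Hypothetical objectives f1, f2 to minimize (a full McGM run in the paper).
    return (x ** 2, (x - 2) ** 2)

def dominates(a, b):
    return all(ai <= bi for ai, bi in zip(a, b)) and a != b

def pareto_front(scored):
    """First non-domination level of a list of (x, objectives) pairs."""
    return [p for p in scored
            if not any(dominates(q[1], p[1]) for q in scored)]

def nsga2_sketch(pop_size=20, ngens=10, mutation=0.1, seed=1):
    rng = random.Random(seed)
    pop = [rng.uniform(-4.0, 6.0) for _ in range(pop_size)]
    scored = []
    for _ in range(ngens):
        # Steps 4-5: selection, recombination, and mutation.
        children = []
        for _ in range(pop_size):
            a, b = rng.sample(pop, 2)
            child = 0.5 * (a + b)
            if rng.random() < mutation:
                child += rng.gauss(0.0, 0.5)
            children.append(child)
        combined = pop + children                  # step 6: R = pop U new_pop
        # Step 7: rank R in parallel, one worker per "GPU".
        with ThreadPoolExecutor(max_workers=N_DEVICES) as ex:
            scored = list(zip(combined, ex.map(evaluate, combined)))
        # Step 8: elitist survival, peeling off non-domination levels.
        survivors, remaining = [], scored
        while len(survivors) < pop_size and remaining:
            front = pareto_front(remaining)
            survivors.extend(front)
            remaining = [p for p in remaining if p not in front]
        pop = [x for x, _ in survivors[:pop_size]]
    return pareto_front(scored)
```

The key structural point mirrored here is that only the ranking (the expensive, embarrassingly parallel step) is farmed out to the devices, while the lightweight GA operators stay on the host.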
4 Results
4.1 Work environment
The systems used are based on Tesla technology. The first consists of 2 Intel Xeon E5645 processors with six cores each (2.40 GHz, 12 MB cache memory, and Hyper-Threading technology) and 2 Tesla M2070 GPUs. The second is equipped with 2 Quad Intel Xeon E5530 processors (2.40 GHz, 8 MB cache memory, and Hyper-Threading technology) connected to 4 Tesla C1060 GPUs. In both cases, the operating system is Debian with a 2.6.38 Linux kernel; the compiler used is GNU g++ v4.4 with the -O3 -m64 -fopenmp compilation flags, and CUDA C/C++ SDK v4.2 with the -O3 -fopenmp -arch sm_20/sm_13 flags enabled.
The system based on the Tesla M2070 incorporates Fermi technology but, due to the scarce number of devices available, a scalability study has been completed on the system based on 4 Tesla C1060 GPUs, which allows projections of parallel efficiency rates to be made for more modern systems.
4.2 Multicriteria results
Multiobjective GAs are used to look for optimal solutions in a huge search space. In our context, they are employed to achieve a set of optimal solutions that reduce the GPU’s memory usage in the McGM algorithm without losing significant accuracy in the motion estimation scenario. As previously mentioned, the tests were performed using the ‘diverging tree’ and the ‘translating tree’ benchmarks, which are widely accepted in this area.
The first experiment evaluated both the convergence of the GA and the set of optimal solutions reached. For this purpose, a Euclidean distance metric between consecutive solutions was employed, as described in [25]. The GA implemented incorporates a stop condition based on this Euclidean metric remaining invariant for a certain number of iterations, ensuring that the non-dominated solutions have converged to the optimal Pareto front.
In this experiment, the population size was fixed at 500 with a 1% mutation probability. The results obtained indicated that after a certain number of generations, the GA barely improved the non-dominated solutions, although it still reported new points.
Population size only affects the final execution time; optimal solutions of similar quality are achieved regardless. Empirically, a 1% mutation rate gave the best GA performance. Higher mutation rates produce significant variations between consecutive generations, which means more generations are necessary to reach the convergence criterion; in particular, greater mutation rates increase the number of iterations required by between 15 and 320%.
As shown in Figure 2, optimal solutions are generated with a significant reduction in memory requirements, achieving solutions even more accurate than the original McGM algorithm for the 'translating tree' benchmark.
4.3 Multi-GPU results
Table 4 Multi-GPU execution times for the Tesla M2070-based system

Tesla M2070   t_CPU (s)   t_GPU (s)   t_Comm (s)
1 GPU            1.24      22495.6      869.5
2 GPUs         124.2       12464.9      447.4
Table 5 Multi-GPU execution times for the Tesla C1060-based system

Tesla C1060   t_CPU (s)   t_GPU (s)   t_Comm (s)
1 GPU            1.18      23748.0     2025.8
2 GPUs         278.52      12613.1     1022.5
4 GPUs         153.28       6284.4      513.5
Moreover, the use of multiple levels of parallelism yields multiplicative accelerations: first, the speedup achieved in the multi-GPU system, up to ×3.71 with 4 GPUs enabled; and second, the acceleration of up to ×32 that can be achieved by exploiting data parallelism on each GPU. On one hand, the combination of both accelerations reduces the exploration time needed to reach an optimal solution by 99.2% compared with a general-purpose processor. On the other hand, the use of a multi-GPU system not only delivers higher FLOPS rates than a CPU, but is also beneficial in terms of power consumption (MFLOPS/watt).
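As a quick sanity check (plain arithmetic, using only the two quoted speedups), the multiplicative effect and the 99.2% figure are consistent:

```python
# The two levels of parallelism multiply: ~3.71x from four GPUs times
# ~32x from data parallelism on each GPU.
multi_gpu_speedup = 3.71
per_gpu_speedup = 32.0
combined = multi_gpu_speedup * per_gpu_speedup      # ~118.7x vs. a CPU
time_saved = 100.0 * (1.0 - 1.0 / combined)         # % of CPU time saved
print(f"{combined:.1f}x overall, {time_saved:.1f}% reduction")
```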
Moreover, although GA search times are significant, their use makes it possible to obtain suboptimal solutions that meet the response-time or resource-consumption requirements, and these are gradually refined as the GA evolves. This feature, coupled with the possibility of reducing the population size, implies an impressive decrease in simulation times, which opens the possibility of building an intelligent system that self-corrects/adapts depending on specific requirements or substantial environmental changes.
4.4 Visual result
For the 'diverging tree', a reduction of 75% in memory usage returns the same precision under the Barron metric in 50% of the McGM execution time (ME_time) of the original algorithm. However, the configuration that reduces memory usage by 50% degrades the accuracy by 22%, with a speedup of ×3.3.
For the 'translating tree' benchmark, a solution with half the memory requirements is more accurate (Barron's error is 0.13 radians less than the original McGM) and ×3.5 faster.
Despite the Barron metric's popularity in the scientific community in the context of motion estimation, some authors [26, 27] point out specific shortcomings due to its asymmetry and its bias toward large flow vectors.
4.5 Other error metrics
Although Barron's metric [19] is probably the most used in the motion estimation scope, there are other metrics used by the machine vision community that must be taken into account in order to enhance the visibility and generality of the results obtained.
Table 6 Best configuration achieved for reductions of 75 and 50% in memory requirements, using the McCane and Otte&Nagel metrics

Error metric   Benchmark            Memory   ME_time   Accuracy (Δψ)
Barron         'Translating tree'   50%      28.6%     -0.13
               'Diverging tree'     75%      50.0%      0.00
                                    50%      33.3%      0.05
McCaneA        'Translating tree'   50%      26.9%     -0.12
               'Diverging tree'     75%      49.1%      0.00
                                    50%      28.9%      0.05
McCaneB        'Translating tree'   50%      29.6%     -0.11
               'Diverging tree'     75%      49.2%      0.00
                                    50%      34.5%      0.09
Otte&Nagel     'Translating tree'   50%      25.3%     -0.21
               'Diverging tree'     75%      49.1%      0.00
                                    50%      35.7%      0.12
5 Conclusions
A new, highly parallel approach has been presented to overcome the GPU memory usage problems that arose in our previous implementation of a well-known neuromorphic motion estimation algorithm. This context provides the main motivation for using evolutionary algorithms to solve multicriteria optimization problems. The use of GAs on a multi-GPU scheme allowed quick exploration of the feasible solutions for any set of input data. The choice of NSGA-II is motivated by the good results observed in few iterations and a near-optimal Pareto front.
From the viewpoint of reaching a solution that meets the requirements of memory consumption, we observed:

For the 'diverging tree', a reduction of 75% in memory usage returns the same precision for all the metrics considered in 50% of the McGM execution time of the original algorithm. A configuration that reduces memory usage by 50% degrades the accuracy by 15 to 25%, with speedups ranging from ×2.8 to ×3.5.

For the 'translating tree', a configuration with half the memory requirements is more accurate in terms of error and is between ×3.3 and ×4 faster.
From the point of view of multi-GPU efficiency, we observed:

Speedups of ×3.71 are achieved when four GPUs are enabled.

Our implementation is a scalable approach due to both a well-balanced workload and low-impact communication between host and device.

A multiplicative effect was found: the ×3.71 speedup of the multi-GPU system combines with the ×32 acceleration obtained by exploiting data parallelism on each GPU, reducing the GA time needed to reach an optimal solution by an impressive 99.2% compared with a CPU.

An alternative to be considered in terms of power consumption (MFLOPS/watt).
Because of these encouraging results, the possibility exists of building an intelligent system that self-corrects/adapts, as the GA evolves, depending on specific requirements or environmental condition variations.
Future work will reuse this system with an environment predictor, allowing real-time execution and self-reconfiguration depending on the external constraints and the resources available on the platform. This system is expected to contribute to the new machine vision trends, useful for many real-world applications.
Declarations
Acknowledgements
The present study has been supported by the Spanish projects CICYT-TIN 2008/508, CICYT-TIN 2012-32180, and Ingenio Consolider ESP00C0720811.
Authors’ Affiliations
References
 [1] Shaaban M, Goel S, Bayoumi M: Motion estimation algorithm for real-time systems. IEEE Workshop on Signal Processing Systems 2004, 257-262.
 [2] Kang JY, Gupta S, Shah S, Gaudiot JL: An efficient PIM (processor-in-memory) architecture for motion estimation. IEEE International Conference on Application-Specific Systems, Architectures, and Processors 2003, 282-292.
 [3] Kang JY, Gupta S, Gaudiot JL: An efficient data-distribution mechanism in a processor-in-memory (PIM) architecture applied to motion estimation. IEEE Trans. Comput. 2008, 57(3):375-388.
 [4] Oh HS, Lee HK: Block-matching algorithm based on an adaptive reduction of the search area for motion estimation. Real-Time Imag. 2000, 6(5):407-414. doi:10.1006/rtim.1999.0184
 [5] Huang CL, Chen YT: Motion estimation method using a 3D steerable filter. Image Vis. Comput. 1995, 13(1):21-32. doi:10.1016/0262-8856(95)91465-P
 [6] Baker S, Gross R, Matthews I: Lucas-Kanade 20 years on: a unifying framework: Part 3. Int. J. Comput. Vis. 2002, 56:221-255.
 [7] Chi YM, Tran TD, Etienne-Cummings R: Optical flow approximation of sub-pixel accurate block matching for video coding. IEEE ICASSP 2003, 1:1017-1020.
 [8] Johnston A, McOwan PW, Benton CP: A unified account of three apparent motion illusions. Vis. Res. 1995, 35(8):1109-1123. doi:10.1016/0042-6989(94)00175-L
 [9] McOwan PW, Johnston A, Benton CP: Robust velocity computation from a biologically motivated model of motion perception. Proc. Royal Soc. B 1999, 266:509-518. doi:10.1098/rspb.1999.0666
[10] Liang X, McOwan PW, Johnston A: Biologically inspired framework for spatial and spectral velocity estimations. J. Opt. Soc. Am. A 2011, 28(4):713-723. doi:10.1364/JOSAA.28.000713
[11] Botella G, García A, Rodriguez-Alvarez M, Ros E, Meyer-Bâse U, Molina MC: Robust bioinspired architecture for optical-flow computation. IEEE Trans. VLSI Syst. 2010, 18(4):616-629.
[12] Mattes L, Kofuji S: Overcoming the GPU memory limitation on FDTD through the use of overlapping subgrids. Int. Conference on Microwave and Millimeter Wave Technology 2010, 1536-1539.
[13] Zhou Y, Garland M: Interactive point-based rendering of higher-order tetrahedral data. IEEE Transactions on Visualization and Computer Graphics 2006, 12(5):1229-1236.
[14] Ayuso F, Botella G, Garcia C, Prieto M, Tirado F: GPU-based acceleration of bio-inspired motion estimation model. Concurrency and Computation: Practice and Experience 2012 (in press). http://dx.doi.org/10.1002/cpe.2946
[15] Snowden RJ, Hess RF: Temporal frequency filters in the human peripheral visual field. Vis. Res. 1992, 32(1):61-72. doi:10.1016/0042-6989(92)90113-W
[16] Koenderink JJ: Optic flow. Vision Research 1986, 26:161-180.
[17] Fornberg B: Generation of finite difference formulas on arbitrarily spaced grids. Math. Comput. 1988, 51(184):699-706. doi:10.1090/S0025-5718-1988-0935077-0
[18] Fleet DJ: Measurement of Image Velocity. Norwell, MA, USA: Kluwer Academic Publishers; 1992.
[19] Barron JL, Fleet DJ, Beauchemin SS: Performance of optical flow techniques. Int. J. Comput. Vis. 1994, 12:43-77. doi:10.1007/BF01420984
[20] Konak A, Coit D, Smith AE: Multiobjective optimization using genetic algorithms: a tutorial. Reliab. Eng. Syst. Safety 2006, 91(9):992-1007. doi:10.1016/j.ress.2005.11.018
[21] Fonseca C, Fleming P: Genetic algorithms for multiobjective optimization: formulation, discussion and generalization. Int. Conference on Genetic Algorithms 1993, 416-423.
[22] Srinivas N, Deb K: Multiobjective optimization using nondominated sorting in genetic algorithms. Evol. Comput. 1994, 2(3):221-248. doi:10.1162/evco.1994.2.3.221
[23] Coello Coello CA: 20 years of evolutionary multiobjective optimization: what has been done and what remains to be done. In Computational Intelligence: Principles and Practice, chap. 4. Edited by Yen GY, Fogel DB. Vancouver, Canada: IEEE Computational Intelligence Society; 2006:73-88. ISBN 0978713508.
[24] Zitzler E, Laumanns M, Bleuler S: A tutorial on evolutionary multiobjective optimization. In Metaheuristics for Multiobjective Optimisation. Springer-Verlag 2003, 535:3-38.
[25] Deb K, Pratap A, Agarwal S, Meyarivan T: A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 2002, 6(2):182-197.
[26] Otte M, Nagel HH: Estimation of optical flow based on higher-order spatiotemporal derivatives in interlaced and non-interlaced image sequences. Artif. Intell. 1995, 78(1):5-43.
[27] McCane B, Novins K, Crannitch D, Galvin B: On benchmarking optical flow. Comput. Vis. Image Underst. 2001, 84(1):126-143. doi:10.1006/cviu.2001.0930
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.