Open Access

A novel fractal monocular and stereo video codec with object-based functionality

EURASIP Journal on Advances in Signal Processing 2012, 2012:227

https://doi.org/10.1186/1687-6180-2012-227

Received: 3 April 2012

Accepted: 1 October 2012

Published: 25 October 2012

Abstract

Based on the classical fractal video compression method, an improved monocular fractal compression method is proposed, which uses a more effective macroblock partition scheme instead of the classical quadtree partition scheme, an improved fast motion estimation to increase calculation speed, and a homo-I-frame similar to the I-frame in H.264. The monocular codec uses the motion compensated prediction (MCP) structure. A stereo fractal video coding method is also proposed, which matches each macroblock against two reference frames in the left and right views, increasing the compression ratio and reducing the bit rate/bandwidth needed to transmit the compressed video data. The stereo codec combines MCP with disparity compensated prediction. Furthermore, a new object-based fractal video coding method is proposed in which each object can be encoded and decoded independently, with higher compression ratio and speed and greatly reduced bit rate/bandwidth for transmitting compressed stereo video data. Experimental results indicate that, in comparison with circular prediction mapping and non-contractive interframe mapping, the proposed monocular method raises the compression ratio by 3.6 to 7.5 times, speeds up compression by 5.3 to 22.3 times, and improves image quality by 3.81 to 9.24 dB. The PSNR of the proposed stereo video coding is about 0.17 dB higher than that of the proposed monocular video coding, and 0.69 dB higher than that of JMVC 4.0 on average. Compared with the bit rates of the proposed monocular video coding and JMVC 4.0, the proposed stereo video coding achieves, on average, 2.53 and 21.14 kbps of bit rate saving, respectively. The proposed object-based fractal monocular and stereo video coding methods are simple and effective, and make the applications of fractal monocular and stereo video coding more flexible and practicable.

Keywords

Fractal; Monocular; Stereo; Video coding; Object-based

Introduction

There are several image/video compression methods, for example, JPEG, MPEG, and H.26X, which are all based on motion estimation/compensation (ME/MC). Fractal compression, which is based on the iterated function system (IFS) proposed by Mandelbrot [1], is a relatively new approach to image coding. It reduces the redundancy of images by exploiting their self-similarity, which yields a high compression ratio and a simple decompression process. Thus, to store a picture, we can store only the numbers that define the contraction maps [2] and simply regenerate the picture whenever we want to see it; the image can also be decoded at any scale. Hence, fractal compression is particularly suitable for encode-once, decode-many situations. Some images, however, do not contain this type of self-similarity, such as an image of a face. Fortunately, the human eye is insensitive to a wide variety of information loss, so we can tolerate some error in representing the image of a face as a set of self-transformations. A remaining drawback is that encoding usually takes a long time. To speed up the fractal encoder, Lin and Ming-Sheng [3] proposed an edge property-based neighborhood region search method, but at the cost of degraded image quality. In this article, we propose a novel macroblock partition scheme combined with a block searching method in fractal coding to obtain optimal performance.

With regard to video compression, two basic fractal compression methods are used most frequently: cube-based compression [4–6] and frame-based compression [7, 8]. In cube-based compression, a sequence of images is divided into groups of frames, each of which is in turn partitioned into non-overlapping cubes. It can obtain high-quality decompressed images but has high computational complexity and a low compression ratio. In frame-based compression, the domain blocks from the previous frame are used to compute the approximating transformations for the range blocks of the current frame. Although it obtains a high compression ratio, the current frame depends on the previous frame, which introduces and propagates errors between frames. In this article, the two methods are combined, as also investigated in [9–11], in order to improve the results rather than choosing only one of them. In addition, ME is one of the most time-consuming parts of video coding, so it is important to develop fast and effective ME algorithms. A novel fast ME method is proposed which performs a "rough" search before a "precise" search for the best partition in fractal coding. By reducing the search load for non-best partitions, the computational complexity of the search can be greatly decreased.

In general, stereo video sequences are composed of left and right images acquired from two slightly different viewpoints, making them similar and highly redundant. Fractal compression is an effective method for removing this redundancy, but traditional (2D) fractal coding degrades the depth perception. In this article, we propose a fractal stereo coding scheme that combines disparity compensated prediction (DCP) and motion compensated prediction (MCP) to overcome these problems.

In this article, object-based (OB) coding, a notion first introduced by the MPEG-4 standard [12], is investigated for fractal video coding. We develop a novel OB video coding algorithm with important advantages: it allows manipulation of image objects without complete decoding of the stream, improves coding quality, and reduces the bit rate. Compared with a block-based approach at low bit rates, it alleviates annoying coding artifacts, such as blocking and mosquito effects, especially when blocks straddle the boundaries of different objects. The object-based approach can also provide a more natural representation of the scene and has the additional potential benefit of acquiring the depth information of semantically meaningful objects. In such a scheme, a prior segmentation map (alpha plane) of the image, which segments the image into objects, is assumed to be known in advance [13, 14].

The rest of the article is organized as follows. The theory of fractal coding is summarized in Section 2. The proposed improvements for monocular fractal video sequence coding are presented in Section 3. The method of stereo fractal video compression and decompression is proposed in Section 4. A detailed design of a new object-based fractal compression of monocular video sequences is presented in Section 5. The experimental results are presented in Section 6. Finally, the conclusions are outlined in Section 7.

The fractal compression mathematical theory

Let I(X) be the image intensity of a pixel at position X = (x, y), and let {R₁, …, R_N} be the set of N non-overlapping range blocks (i.e., collections of pixel coordinates) partitioning the image. Similarly, let {D₁, …, D_M} be the set of M, possibly overlapping, domain blocks covering the image. Finally, let $I_{R_i} = \{I(X) : X \in R_i\}$ and $I_{D_j} = \{I(X) : X \in D_j\}$.

In general, the size of a range block, denoted n × m, could have n and m chosen as 16, 8, or 4. For each range block R_i (i = 1 … N), the goal is to find a domain block D_j (j = 1 … M) and a contractive mapping w_i that jointly minimize a dissimilarity (distortion) criterion ε. The contractive affine mapping w_i consists of three submappings.

(1) Contraction σ(I, X): The dimension of R_i is m × n, which differs from the dimension 2m × 2n of D_j, so the two cannot be compared directly. The function σ(I, X) shrinks the domain block D_j by averaging the intensities of each disjoint group of four neighboring pixels, yielding a block of the same dimension m × n, denoted $D_j^r$, which is also known as the codebook block. If the intensity of D_j is expressed as the submatrix $I_{D_j}(x_1, y_1)$, $1 \le x_1 \le 2n$, $1 \le y_1 \le 2m$, and that of $D_j^r$ as the submatrix $I_{D_j^r}(x_2, y_2)$, $1 \le x_2 \le n$, $1 \le y_2 \le m$, then
$$\sigma(I, X) : I_{D_j^r}(x_2, y_2) \leftarrow \sigma(I, X) \circ I_{D_j}(x_1, y_1)$$
That is,
$$I_{D_j^r}(x_2, y_2) = \frac{I_{D_j}(2x_2, 2y_2) + I_{D_j}(2x_2+1, 2y_2) + I_{D_j}(2x_2, 2y_2+1) + I_{D_j}(2x_2+1, 2y_2+1)}{4} \quad (1)$$
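Equation (1) amounts to averaging each disjoint 2 × 2 group of domain-block pixels. A minimal NumPy sketch (the function name is illustrative, not from the paper):

```python
import numpy as np

def contract(domain: np.ndarray) -> np.ndarray:
    """Shrink a 2n x 2m domain block to n x m by averaging each
    disjoint 2x2 group of pixels, as in Equation (1)."""
    return (domain[0::2, 0::2] + domain[1::2, 0::2]
            + domain[0::2, 1::2] + domain[1::2, 1::2]) / 4.0
```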
(2) Geometric transformation:
$$\xi(X) \triangleq AX + b \quad (2)$$

where X = (x, y), A is a 2 × 2 matrix, and b is a translation vector (this mapping must be one-to-one between pixels of the range and codebook blocks). The above general expression can be simplified by constraining the transformation A to eight cases: four rotations (0°, 90°, −90°, 180°) and four mirror reflections (mid-horizontal, mid-vertical, first diagonal, and second diagonal) [15, 16]. $\{\zeta^P\}_{P=1}^{8}$ denotes the set of possible transformations A.
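The eight admissible transformations can be enumerated directly with array operations. A sketch, assuming square blocks (names are illustrative):

```python
import numpy as np

def isometries(block: np.ndarray):
    """Enumerate the eight isometries zeta^1..zeta^8 of a square block:
    four rotations and four mirror reflections."""
    yield block                   # rotation 0
    yield np.rot90(block, 1)      # rotation 90
    yield np.rot90(block, 2)      # rotation 180
    yield np.rot90(block, 3)      # rotation -90
    yield np.flipud(block)        # mid-horizontal reflection
    yield np.fliplr(block)        # mid-vertical reflection
    yield block.T                 # first-diagonal reflection
    yield np.rot90(block, 2).T    # second-diagonal reflection
```

During encoding, each candidate codebook block is run through all eight isometries before the photometric fit is computed.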

(3) Photometric transformation: We define γ ∘ I(X) as follows to adjust the grey level:
$$\gamma \circ I(X) \triangleq s \cdot I(X) + o \quad (3)$$

where ∘ is the composition operator, s is a scaling factor which controls the contrast, and o is an offset which controls the brightness of the transformation. This general expression accounts for the different dynamic ranges of pixels in the range and domain blocks.

The overall transformation w_i that maps a domain-block pixel into the range-block pixel at X is
$$w_i(I_{D_j}, X) \triangleq \gamma \circ \sigma(I_{D_j}, \zeta_i^P(X)), \quad X \in D_j, \; P \in \{1, \ldots, 8\} \quad (4)$$
We can also write w_i as
$$w_i(I_{D_j}, X) = s_i \cdot \sigma(I_{D_j}, \zeta_i^P(X)) + o_i, \quad X \in D_j, \; P \in \{1, \ldots, 8\} \quad (5)$$
To encode range block R_i, a search for the index j (domain block D_j) and for an isometry $\zeta_i^P$ must be executed, jointly with the computation of the photometric parameters s_i and o_i. This can be performed by minimizing the following mean-squared error:
$$\varepsilon(I_{R_i}, I_{D_j}, w_i) = \frac{1}{|R_i|} \sum_{X \in R_i} \left[ I_{R_i}(X) - w_i(I_{D_j}, X) \right]^2 \quad (6)$$
where |R_i| = Card(R_i) is the number of pixels that R_i contains. While the isometry $\zeta_i^P$ and the index j (equivalent to the translation b) are usually found by exhaustive search, the scaling s_i and offset o_i are computed as follows:
$$s_i = \frac{\sum_{X \in R_i} \left[ \sigma(I_{D_j}, \zeta_i^P(X)) - m_{D_j} \right] \left[ I_{R_i}(X) - m_{R_i} \right]}{\sum_{X \in R_i} \left[ \sigma(I_{D_j}, \zeta_i^P(X)) - m_{D_j} \right]^2} \quad (7)$$
$$o_i = m_{R_i} - s_i \cdot m_{D_j} \quad (8)$$
where $m_{R_i}$ and $m_{D_j}$ are the mean intensity values in the range and domain blocks, respectively. Equations (7) and (8) give contrast and brightness settings that make the affinely transformed values $w_i(I_{D_j}, X)$ have the least squared distance from the range values $I_{R_i}(X)$. This permits a precise representation of the mean local intensity, but assuring convergence at the decoder requires a modification of the photometric transformation, without a constraint on the intensity scaling coefficients [17]. This can be considered an orthogonalization with respect to the constant blocks and has been treated in detail in [18]. So, the matching rule in fractal image coding is the RMS error:
$$RMS = \frac{1}{N} \left[ \sum_{i=1}^{N} r_i^2 + s \left( s \sum_{i=1}^{N} d_i^2 - 2 \sum_{i=1}^{N} r_i d_i + 2o \sum_{i=1}^{N} d_i \right) + o \left( N \cdot o - 2 \sum_{i=1}^{N} r_i \right) \right] \quad (9)$$
$$s = \frac{N \sum_{i=1}^{N} r_i d_i - \sum_{i=1}^{N} r_i \sum_{i=1}^{N} d_i}{N \sum_{i=1}^{N} d_i^2 - \left( \sum_{i=1}^{N} d_i \right)^2} \quad (10)$$
$$o = \frac{1}{N} \left( \sum_{i=1}^{N} r_i - s \sum_{i=1}^{N} d_i \right) \quad (11)$$

where r_i and d_i (i = 1 … N) are the pixel values of the range block (R) and the domain block (D), respectively, and N is the number of pixels in the block.
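Equations (9)–(11) reduce to a handful of sums over the two blocks, so s, o, and the RMS error can be computed in one pass. A sketch, assuming equal-sized blocks (names are illustrative):

```python
import numpy as np

def match_params(r: np.ndarray, d: np.ndarray):
    """Least-squares scaling s (Eq. 10) and offset o (Eq. 11), and the
    resulting RMS error (Eq. 9), between a range block r and a
    contracted, isometry-transformed domain block d of the same shape."""
    r, d = r.ravel().astype(float), d.ravel().astype(float)
    n = r.size
    sd, sr = d.sum(), r.sum()
    sdd, srd, srr = (d * d).sum(), (r * d).sum(), (r * r).sum()
    denom = n * sdd - sd * sd
    s = (n * srd - sr * sd) / denom if denom != 0 else 0.0
    o = (sr - s * sd) / n
    rms = (srr + s * (s * sdd - 2 * srd + 2 * o * sd)
           + o * (n * o - 2 * sr)) / n
    return s, o, rms
```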

A new fractal monocular video coding method

Macroblock partition

Macroblock partition has a large impact on the calculation speed and complexity of a video compression algorithm. In circular prediction mapping and non-contractive interframe mapping (CPM/NCIM) [19], a frame is partitioned by quadtree partition and iteration is used in the matching process, resulting in high computational complexity. In this article, an H.264-like macroblock partition scheme is used, which reduces the number of blocks compared with the quadtree partition. A frame is partitioned into many fixed-size (generally 16 × 16 pixel) macroblocks, and then each macroblock may be partitioned and motion compensated in one of four ways: as one 16 × 16 macroblock partition, two 16 × 8 partitions, two 8 × 16 partitions, or four 8 × 8 partitions, as shown in Figure 1.
Figure 1

Macroblock partition modes.

Before the block matching process, the RMS of the whole macroblock is calculated in mode 1, and γ is defined as a threshold. Encodings made with a lower γ will have better fidelity but longer encoding time, and those with a higher γ will have worse fidelity but shorter time. In general, we let
$$\gamma = t \times t \times n_o$$

where t depends on the size of the range block. Extensive experiments suggest the following settings: when the range block is 16 × 16, t is 10.0; when it is 8 × 8, t is 8.0; and when it is 4 × 4, t is 6.0, which give good performance. $n_o$ is the number of pixels in the range block. The steps of macroblock partition are as follows.

First, the RMS calculated in mode 1 is compared with γ. If the RMS is less than γ, the current IFS is saved and the algorithm proceeds to the next block matching. Otherwise, the RMS is recalculated after the whole macroblock is partitioned in mode 2. If that RMS is still greater than γ, mode 3 is used; and if the RMS in mode 3 also exceeds γ, mode 4 is used automatically. If the RMS of any of the four 8 × 8 blocks in mode 4 is still greater than γ, that block can be further partitioned in one of four ways: as one 8 × 8 sub-macroblock partition, two 4 × 8 partitions, two 8 × 4 partitions, or four 4 × 4 partitions, with matching blocks found in the same way. The result of macroblock partition is shown in Figure 2. In areas where there is little change between the frames (small RMS), a 16 × 16 partition is chosen; in areas of detailed motion, smaller partitions are more efficient.
Figure 2

Result of macroblock partition.
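The mode decision above can be sketched as a cascade of RMS-versus-γ tests. The following is an illustrative outline only; `rms_of` stands in for the paper's block matcher and is assumed to return the worst RMS over all sub-partitions of the given size:

```python
def threshold(width: int, height: int) -> float:
    """gamma = t * t * n_o, with t chosen by block size as in the paper:
    t = 10.0 for 16x16, 8.0 for 8x8, 6.0 for 4x4 range blocks."""
    t = {16: 10.0, 8: 8.0}.get(max(width, height), 6.0)
    return t * t * width * height

def choose_mode(rms_of, width=16, height=16):
    """Try modes 1-4 in order and return the first mode whose worst
    sub-block RMS falls below gamma, falling through to mode 4."""
    modes = [(width, height),            # mode 1: one 16x16
             (width, height // 2),       # mode 2: two 16x8
             (width // 2, height),       # mode 3: two 8x16
             (width // 2, height // 2)]  # mode 4: four 8x8
    for i, (w, h) in enumerate(modes, start=1):
        if rms_of(w, h) < threshold(w, h):
            return i
    return 4
```

In the paper's scheme, an 8 × 8 block that still fails in mode 4 is recursed on with the same cascade at sub-macroblock sizes.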

A fast ME method

The most important factors affecting fractal compression ratio and speed are the number of domain blocks that need to be searched and the ME method. The first is discussed in Section 3.1. ME is time-consuming in video coding, so we propose a fast ME method which reduces the block searching strategy and range to greatly increase the calculation speed. Since there are temporal and spatial correlations between two frames, the mapping block (domain block) is often near the co-located position of the range block in the reference frame. The searching range is therefore limited to 7 to 15 pixels around the co-located position, which decreases the computational complexity.
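The restricted search can be sketched as enumerating candidate domain-block positions inside a small window around the co-located position, clipped to the frame boundaries (a hypothetical helper; the radius of 7 pixels matches the lower end of the range above):

```python
def search_window(cx: int, cy: int, radius: int,
                  frame_w: int, frame_h: int, block: int):
    """Yield candidate top-left domain-block positions within `radius`
    pixels of the co-located position (cx, cy), clipped so that the
    block stays inside the frame."""
    for y in range(max(0, cy - radius),
                   min(frame_h - block, cy + radius) + 1):
        for x in range(max(0, cx - radius),
                       min(frame_w - block, cx + radius) + 1):
            yield (x, y)
```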

Using a homo-I-frame as in H.264

The original reference frame (the homo-I-frame, analogous to the I-frame in H.264) has a great impact on the compression ratio and decoded image quality. In CPM/NCIM, the original reference frames are coded using CPM, and there may be several of them.

But in CPM, the coding process involves complex block classifying, block overturning, and iteration in order to make the decoded frames converge to the original frames, so the compression performance falls short of the requirements. Therefore, the discrete cosine transform (DCT), expressed in Equation (12), which has worked effectively in the JPEG image compression standard, is used to code the original reference frame [20].

Let $X_{i,j}$ denote a pixel, i = 0 … N − 1, j = 0 … N − 1, within an N × N block, and let $Y_{x,y}$ be the N × N matrix that stores the DCT coefficients. Using the DCT as in JPEG,
$$Y_{x,y} = \frac{1}{4} C_x C_y \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} X_{i,j} \cos\frac{(2j+1)y\pi}{2N} \cos\frac{(2i+1)x\pi}{2N} \quad (12)$$

with $C_x = \begin{cases} \frac{1}{\sqrt{2}}, & x = 0 \\ 1, & x > 0 \end{cases}$ and $C_y = \begin{cases} \frac{1}{\sqrt{2}}, & y = 0 \\ 1, & y > 0 \end{cases}$.

The inverse DCT (IDCT), used in the decoder, is
$$X_{i,j} = \frac{1}{4} \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} C_x C_y Y_{x,y} \cos\frac{(2i+1)x\pi}{2N} \cos\frac{(2j+1)y\pi}{2N} \quad (13)$$
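Equations (12) and (13) can be implemented directly for the JPEG case N = 8. A naive, unoptimized sketch:

```python
import numpy as np

N = 8                              # JPEG block size
C = np.ones(N)
C[0] = 1.0 / np.sqrt(2.0)          # C_0 = 1/sqrt(2), C_x = 1 otherwise

def dct2(X: np.ndarray) -> np.ndarray:
    """Forward 2D DCT of an 8x8 block, Eq. (12)."""
    Y = np.zeros((N, N))
    for x in range(N):
        for y in range(N):
            acc = sum(X[i, j]
                      * np.cos((2 * j + 1) * y * np.pi / (2 * N))
                      * np.cos((2 * i + 1) * x * np.pi / (2 * N))
                      for i in range(N) for j in range(N))
            Y[x, y] = 0.25 * C[x] * C[y] * acc
    return Y

def idct2(Y: np.ndarray) -> np.ndarray:
    """Inverse 2D DCT of an 8x8 coefficient block, Eq. (13)."""
    X = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            acc = sum(C[x] * C[y] * Y[x, y]
                      * np.cos((2 * i + 1) * x * np.pi / (2 * N))
                      * np.cos((2 * j + 1) * y * np.pi / (2 * N))
                      for x in range(N) for y in range(N))
            X[i, j] = 0.25 * acc
    return X
```

With this normalization the pair is an exact inverse, so the reference frame is recovered up to quantization error only.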

Stereo fractal video coding and decoding

Stereo fractal video coding

The most mature technique for multi-view video sequence compression is the method defined in the MPEG-4 multi-view profile [21]. With this approach, for example, the coder first compresses the right view with a monoscopic video sequence coding algorithm. To code the left view, each macroblock is predicted both from the right view using DCP and from the previous frame of the left view using MCP, as shown in Figure 3. The prediction residual is then coded using whichever prediction gives the smaller residual.
Figure 3

MCP and DCP structure.

In this article, we apply this structure to the fractal video compression algorithm presented in Section 3. For right-view frames, the coder searches for the D block in the previous right-view frame, using the domain block searching strategy and range of Section 3.2. For left-view frames, the coder searches for the D block both in the right-view frame (DCP) and in the previous left-view frame (MCP); of the two candidate D blocks, the one with the smaller RMS is taken as the best match.
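The reference selection for a left-view block can be sketched as follows; `search` stands in for the fractal block matcher of Section 3, and its `(rms, params)` return signature is an assumption of this sketch:

```python
def best_match(range_block, search, right_frame, prev_left_frame):
    """Search both references for a left-view range block and keep
    whichever yields the smaller RMS: the decoded right-view frame
    (DCP) or the previous left-view frame (MCP)."""
    rms_dcp, p_dcp = search(range_block, right_frame)
    rms_mcp, p_mcp = search(range_block, prev_left_frame)
    if rms_dcp <= rms_mcp:
        return ('DCP', p_dcp)
    return ('MCP', p_mcp)
```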

Stereo fractal video decoding

The right- and left-view videos cannot be decoded independently of each other, since the left-view video, which uses DCP in encoding, cannot be decoded on its own; for example, the third frame of the left-view video is decoded only after its reference frame in the right-view video has been decoded. The difference between fractal decompression with CPM/NCIM and with the homo-I-frame is that, in the latter, the reference frame encoded with the DCT is decoded with the IDCT. Fractal decompression is an iterative process which uses the following equation:
$$R_i \leftarrow s_i \cdot D_{a(i)} + o_i - s_i \cdot d \cdot C \quad (14)$$

where d and C are the DC component of the domain block $D_{a(i)}$ and the coefficients of the DCT transformation, respectively, R_i denotes the range block, and s_i and o_i are the scaling and offset factors.
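The iterative decoding loop can be sketched as repeatedly applying the stored affine maps from an arbitrary starting image. For simplicity, the sketch below omits the DC/DCT correction term of Equation (14); the block geometry and the transform encoding are illustrative, not the paper's bitstream format:

```python
import numpy as np

def contract(domain: np.ndarray) -> np.ndarray:
    """2x2 block-averaging contraction, as in Eq. (1)."""
    return (domain[0::2, 0::2] + domain[1::2, 0::2]
            + domain[0::2, 1::2] + domain[1::2, 1::2]) / 4.0

def decode(transforms, shape, block=4, iters=12):
    """Iterative fractal decoding. `transforms` maps each range-block
    origin (ry, rx) to its stored (dy, dx, s, o): domain origin,
    scaling, and offset. Contractivity (|s| < 1) drives convergence."""
    img = np.zeros(shape)
    for _ in range(iters):
        out = np.empty_like(img)
        for (ry, rx), (dy, dx, s, o) in transforms.items():
            dom = img[dy:dy + 2 * block, dx:dx + 2 * block]
            out[ry:ry + block, rx:rx + block] = s * contract(dom) + o
        img = out
    return img
```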

Object-based fractal video coding

Object-based fractal video compression is proposed in this article. The objects are defined by a prior segmentation map, called the alpha plane, and are encoded independently of each other.

The details of the proposed method are as follows. The R and D blocks remain rectangular. If all pixels of an R block are inside the currently coded object, the R block is called an interior block; if some pixels of an R block are inside the currently coded object and some are outside, it is called a boundary block. The types of the R and D blocks must match each other during searching and matching: interior blocks are matched with interior blocks, and boundary blocks with boundary blocks. Coding an interior block is the same as in non-object-based (NOB) coding, so the key to OB coding is how to code the boundary blocks. Boundary blocks obviously contain pixels from two or more objects (e.g., the foreground and background). Therefore, in order not to mix pixels from different objects within one transformation, we associate with each pixel a label from the alpha plane; pixels with the same label belong to the same object, as shown in Figure 4.
Figure 4

Same label pixels in boundary block correspond to one object.

Suppose the currently coded object is object 1, denoted S1. The method for coding a boundary block is then as follows: in the computations involving the R block, only pixels inside S1 are used and pixels inside S2 are ignored; for the computations involving the D block, as shown in Equation (15),
$$I'(d_i) = \begin{cases} I(d_i), & \text{if } d_i \in S_1 \text{ and the corresponding position } r_i \in S_1 \\ \bar{I}, & \text{otherwise} \end{cases} \quad (15)$$
That is, if the pixel d_i within the D block and the corresponding position r_i of the R block both belong to S1, its original value I(d_i) is used; otherwise, the average value $\bar{I}$ of the D-block pixels belonging to S1 is assigned to d_i, yielding the new D-block intensity I′(d_i). An illustration of the object-based video frame mapping for interior and boundary blocks is shown in Figure 5.
Figure 5

Illustration of the object-based video frames mapping for the interior and boundary blocks.
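Equation (15) can be sketched as a masking step over the domain block. The boolean object masks are illustrative stand-ins for the alpha-plane labels, and the boundary block is assumed to contain at least one S1 pixel:

```python
import numpy as np

def mask_domain(d: np.ndarray, d_in_s1: np.ndarray,
                r_in_s1: np.ndarray) -> np.ndarray:
    """Apply Eq. (15): keep a domain pixel only where both it and the
    co-located range pixel lie in S1; replace every other pixel with
    the mean of the domain pixels inside S1."""
    usable = d_in_s1 & r_in_s1        # both positions belong to S1
    mean_s1 = d[d_in_s1].mean()       # average intensity of S1 pixels in D
    return np.where(usable, d, mean_s1)
```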

Experimental results

Monocular video coding

To evaluate the performance of the proposed monocular codec, we use four video sequences: "hall.cif" (352 × 288 pixels, 15 frames), "highway.cif" (352 × 288 pixels, 15 frames), "race.yuv" (640 × 480 pixels, 15 frames), and "bridge-close.cif" (352 × 288 pixels, 15 frames). The maximum and minimum partition block sizes are 16 × 16 pixels and 4 × 4 pixels, respectively. To compare performance with other methods, H.264 (main profile, JM 15.1; search range: 7 pixels; block matching algorithm: UMHexagon Search; QP: 28; fractional-pixel accuracy: 1/4 pixel; entropy coding method: CAVLC) and CPM/NCIM are used. The experiments are performed on a PC (OS: Microsoft Windows XP Professional; CPU: Intel® Pentium® D, 3.20 GHz; RAM: 2048 MB).

The comparison of the average coding results of the four video sequences is shown in Table 1. The results indicate that the proposed method raises the compression ratio by 3.6 to 7.5 times, speeds up compression by 5.3 to 22.3 times, and improves image quality by 3.81 to 9.24 dB in comparison with CPM/NCIM. Although the PSNR values are lower than those of H.264, they are all above 32 dB, and the human eye is insensitive to the differences. The compression ratios of the proposed method are close to those of H.264, and some are higher, while the compression speed is much better: the proposed method speeds up compression by 1.93 times on average. The proposed method is therefore better suited to real-time applications.
Table 1

Comparison of coding results of four video sequences

Sequences      | PSNR (dB)                  | Compression ratio           | Compression time (s)
               | CPM/NCIM  Proposed  H.264  | CPM/NCIM  Proposed  H.264   | CPM/NCIM  Proposed  H.264
Hall           | 26.25     35.49     37.98  | 13.22     98.35     113.30  | 15.36     0.88      1.91
Highway        | 31.72     35.53     38.89  | 22.60     81.18     105.84  | 9.80      0.94      1.89
Race           | 25.07     32.60     38.69  | 10.82     38.53     36.22   | 20.93     3.95      6.11
Bridge-close   | 26.06     32.40     35.75  | 9.30      63.97     55.62   | 22.52     1.01      2.01

The comparison for 15 frames of "bridge-close.yuv" is shown in Figure 6. The proposed method clearly outperforms CPM/NCIM, and also has advantages over H.264. Moreover, the PSNR, i.e., the quality of the decoded image, in the proposed method could be further improved by inserting I-frames.
Figure 6

The experimental comparison of “bridge-close.yuv”. (a) Comparison of PSNR. (b) Comparison of compression ratio. (c) Comparison of compression time.

Table 2 shows a comparison of the performance of the proposed method with the state-of-the-art fractal compression methods in [22, 23]. The sequence "videoconference" (256 × 256 pixels, 15 frames) is used; the compression ratio is 6.13 times higher than that of [22] and 4.24 times higher than that of [23]. The proposed method achieves a 98.8% saving in computational time, with a PSNR 0.9 dB higher, compared with the results of [22]. Although the PSNR of our scheme is 0.6 dB lower than that of [23], the coding time is reduced by 98.7%.
Table 2

Experimental results comparison between the proposed method and [22, 23]

Sequence (256 × 256 pixels) | Performance               | Proposed | Reference [22] | Reference [23]
Videoconference             | Compression ratio         | 83.91    | 11.76          | 16.02
                            | PSNR (dB)                 | 34.67    | 33.79          | 35.40
                            | Total encoding time (min) | 0.23     | 20.00          | 18.05

Stereo video coding

To evaluate the performance of the proposed stereo codec, we use "flamenco_r.yuv" and "flamenco_l.yuv" (640 × 480 pixels, 15 frames), where "flamenco_r.yuv" is the right view and "flamenco_l.yuv" is the left view. First, "flamenco_l.yuv" is compressed by the monocular codec; second, it is compressed by the stereo codec. As shown in Figure 7, the PSNR and compression ratio of the stereo codec are better than those of the monocular codec, although its compression time is longer because of the increased amount of calculation. For example, the proposed stereo method raises the compression ratio by 1.7 to 3.7 and improves image quality by 0.13 dB on average compared with the proposed monocular method.
Figure 7

The comparison of proposed monocular codec and proposed stereo codec of “flamenco_l.yuv”. (a) Comparison of PSNR. (b) Comparison of compression ratio. (c) Comparison of compression time.

The decoded images of the 11th frames of "flamenco_r.yuv" and "flamenco_l.yuv" are shown in Figure 8. Figure 8a is the original image of "flamenco_r.yuv"; Figure 8b is the decoded image of "flamenco_r.yuv" (compression ratio: 58.40, PSNR: 34.26 dB); Figure 8c is the original image of "flamenco_l.yuv"; Figure 8d is the decoded image of "flamenco_l.yuv" (compression ratio: 54.51, PSNR: 34.20 dB).
Figure 8

The decoded results of 11th frame of “flamenco_r.yuv” and “flamenco_l.yuv”. (a) Original image of 11th frame of “flamenco_r.yuv”. (b) Decoded image of 11th frame of “flamenco_r.yuv”. (c) Original image of 11th frame of “flamenco_l.yuv”. (d) Decoded image of 11th frame of “flamenco_l.yuv”.

Table 3 shows the experimental results for a set of stereo video sequences ("ballroom", 640 × 480 pixels; "exit", 640 × 480 pixels; "vassar", 640 × 480 pixels), comparing the proposed stereo video coding with the proposed monocular video coding and the JMVC full search (PelBlockSearch, PBS) [24]. The experiments are carried out with JMVC 4.0 (QP = 32) [25]. Two hundred and forty-eight frames of each sequence are tested and the average values are listed in Table 3. As shown in Table 3, the proposed stereo codec, compared with the proposed monocular codec and JMVC 4.0, achieves a certain enhancement in PSNR and a reduction in bit rate. For example, the PSNR of the proposed stereo video coding is about 0.17 dB higher than that of the proposed monocular video coding, and 0.69 dB higher than that of JMVC 4.0 on average. Compared with the bit rates of the proposed monocular video coding and JMVC 4.0, the proposed stereo video coding achieves, on average, 2.53 and 21.14 kbps of bit rate saving, respectively. The compression time of the stereo video coding is about 0.51 s per frame less than that of JMVC 4.0 on average but, because of the increased amount of calculation, longer than that of the monocular codec.
Table 3

Performance comparison between the proposed monocular video coding, the proposed stereo video coding, and JMVC (4.0)

Sequences (640 × 480 pixels) | Method                          | PSNR (dB) (average) | Bit rate (kbps) | Compression time (s/frame) (average)
Ballroom                     | Proposed monocular video coding | 33.93               | 172.80          | 1.79
                             | Proposed stereo video coding    | 34.25               | 168.38          | 9.49
                             | JMVC (4.0)                      | 33.45               | 187.76          | 12.07
Exit                         | Proposed monocular video coding | 36.54               | 57.57           | 5.30
                             | Proposed stereo video coding    | 36.67               | 55.85           | 11.71
                             | JMVC (4.0)                      | 36.03               | 88.10           | 10.36
Vassar                       | Proposed monocular video coding | 35.09               | 52.37           | 3.53
                             | Proposed stereo video coding    | 35.12               | 50.93           | 8.49
                             | JMVC (4.0)                      | 34.49               | 62.73           | 8.50

Object-based video coding

To evaluate the performance of the proposed OB codec, we use "foreman.cif" and its alpha plane. The study in [26] indicates that the encoding cost of the alpha plane, about 0.021 bits per pixel, is very low. For the alpha plane of the sequence "foreman.cif", the encoding cost is about 0.26 kb per image; compared with the 148.5 kb of each original image, these additional bits can be ignored when computing the compression ratio. As shown in Table 4 and Figure 9, the average performance of OB coding is better than that of NOB coding and H.264.
Table 4

The performance comparison of NOB, OB, and H.264 for "foreman.cif"

                     | NOB   | Object 1 | Object 2 | H.264
PSNR (dB)            | 34.56 | 35.23    | 36.65    | 35.96
Compression ratio    | 55.83 | 118.81   | 92.90    | 113.75
Compression time (s) | 0.87  | 0.66     | 0.75     | 0.96

Figure 9

The comparison of the results of NOB and OB by the proposed method and by H.264 of “foreman.cif”. (a) Comparison of PSNR. (b) Comparison of compression ratio. (c) Comparison of compression time.

The decoded images of the 9th frame of "foreman.cif" are shown in Figure 10. Figure 10a is the original image; Figure 10b is the decoded image by NOB (compression ratio: 65.77); Figure 10c is the decoded image of object 1 by OB (compression ratio: 115.66); Figure 10d is the decoded image of object 2 by OB (compression ratio: 87.13).
Figure 10

The decoded results of 9th frame of “foreman.cif”. (a) Original 9th frame of foreman.cif. (b) Decoded NOB 9th frame. (c) Decoded OB 9th frame foreground (Object 1). (d) Decoded OB 9th frame background (Object 2).

Conclusion

Based on the classical fractal video compression method, monocular and stereo fractal video compression methods are proposed in this article. Experimental results indicate that the proposed monocular fractal video compression method raises the compression ratio by 3.6 to 7.5 times, speeds up compression by 5.3 to 22.3 times, and improves image quality by 3.81 to 9.24 dB in comparison with CPM/NCIM. The PSNR of the proposed stereo video coding is about 0.17 dB higher than that of the proposed monocular video coding, and 0.69 dB higher than that of JMVC 4.0 on average. Compared with the bit rates of the proposed monocular video coding and JMVC 4.0, the proposed stereo video coding achieves, on average, 2.53 and 21.14 kbps of bit rate saving, respectively. The new object-based method clearly improves the performance of the fractal video coding algorithm. The proposed object-based fractal video coding method, which increases the compression ratio, decoded image quality, and speed, is simple and effective, and adds flexibility and practicability to the applications of fractal video coding.

Declarations

Acknowledgments

The project was sponsored by the National Natural Science Foundation of China (NSFC) under Grant nos. 61075011 and 60675018, and by the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry of China. The authors are grateful for this financial support, and would like to thank the editors and the reviewers for their hard work and insightful suggestions, which helped improve this article.

Authors’ Affiliations

(1)
Department of Measurement Control and Information Technology, School of Instrumentation Science and Optoelectronics Engineering, Beihang University

References

  1. Mandelbrot BB: The Fractal Geometry of Nature. W H Freeman and Company, New York; 1982.
  2. Demers M, Kunze H: Fractal-based methods in imaging. Nonlinear Anal.: Theory Methods Appl. 2009, 71(12):1413-1419. doi:10.1016/j.na.2009.01.179
  3. Lin Y-L, Ming-Sheng W: An edge property-based neighborhood region search strategy for fractal image compression. Comput. Math. Appl. 2011, 62(1):310-318. doi:10.1016/j.camwa.2011.05.011
  4. Lazar MS, Bruton LT: Fractal block coding of digital video. IEEE Trans. Circuits Syst. Video Technol. 1994, 4(3):297-308. doi:10.1109/76.305874
  5. Wang M: Cuboid Method of Fractal Video Compression. Macquarie Scientific Publishing, Sydney; 2004.
  6. Barthel KU, Voye T: Three-dimensional fractal video coding. In IEEE International Conference on Image Processing, vol. III. Washington, DC; 1995:260-263.
  7. Fisher Y, Shen TP, Rogovin D: Fractal (self-VQ) encoding of video sequences. In Proceedings of the SPIE Visual Communications and Image Processing, vol. 2308. Chicago, IL; 1994:1359-1370.
  8. Kim CS, Lee SU: Fractal coding of video sequence by circular prediction mapping. In NATO ASI Conference on Fractal Image Encoding and Analysis, vol. 5. London; 1997:1-15.
  9. Wang M, Lai C-H: Grey video compression methods using fractals. Int. J. Comput. Math. 2007, 84(11):1567-1590. doi:10.1080/00207160601178299
  10. Gharavi-Alkhansari M, Huang TS: Fractal video coding by matching pursuit. In Proceedings of the IEEE International Conference on Image Processing, vol. 1. Lausanne, Switzerland; 1996:157-160.
  11. Wang Y, Ostermann J, Zhang YQ: Video Processing and Communication. Prentice-Hall; 2002.
  12. Rob K: Overview of the MPEG-4 Standard. ISO/IEC JTC1/SC29/WG11 N4668; 2002.
  13. Belloulata K, Konrad J: Fractal image compression with region-based functionality. IEEE Trans. Image Process. 2002, 11(4):351-362. doi:10.1109/TIP.2002.999669
  14. Peng B, Zhang L, Zhang D: Automatic image segmentation by dynamic region merging. IEEE Trans. Image Process. 2011, 20(12):3592-3605.
  15. Jacquin AE: Image coding based on a fractal theory of iterated contractive image transformations. IEEE Trans. Image Process. 1992, 1(1):18-30. doi:10.1109/83.128028
  16. Jacquin A: Fractal image coding: a review. Proc. IEEE 1993, 10:1451-1465.
  17. Fisher Y: Fractal encoding with quadtrees. In Fractal Image Compression: Theory and Applications to Digital Images. Edited by: Fisher Y. Springer-Verlag, Berlin; 1995:55-77.
  18. Oien G, Lepsoy S: Fractal-based coding with fast decoder convergence. Signal Process. 1994, 40(1):105-117. doi:10.1016/0165-1684(94)90024-8
  19. Kim CS, Lee SU: Fractal coding of video sequence using circular prediction mapping and noncontractive interframe mapping. IEEE Trans. Image Process. 1998, 7(4):601-605. doi:10.1109/83.663508
  20. Zhou Y-M, Zhang C, Zhang Z-K: An efficient fractal image coding algorithm using unified feature and DCT. Chaos Solitons Fract. 2009, 39(4):1823-1830. doi:10.1016/j.chaos.2007.06.089
  21. Strintzis MG, Malassiotis S: Object-based coding of stereoscopic and 3D image sequences. IEEE Signal Process. Mag. 1999, 16(3):14-28. doi:10.1109/79.768570
  22. Wang MQ, Lai C-H: A hybrid fractal video compression method. Comput. Math. Appl. 2005, 50(3-4):611-621.
  23. Wang MQ, Liu R, Lai C-H: Adaptive partition and hybrid method in fractal video compression. Comput. Math. Appl. 2006, 51(11):1715-1726. doi:10.1016/j.camwa.2006.05.009
  24. X-l T, S-k D, C-h C: An analysis of TZSearch algorithm in JMVC. In 2010 International Conference on Green Circuits and Systems (ICGCS). Shanghai, China; 2010.
  25. Heiko S, Tobias H, Karsten S: JMVC software model Version 4.0. 2009.
  26. Labelle L, Lauzon D, Konrad J, Dubois E: Arithmetic coding of a lossless contour-based representation of label images. In IEEE International Conference on Image Processing, vol. 1. Chicago, IL, USA; 1998:261-265.

Copyright

© Zhu et al.; licensee Springer. 2012

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.