Multiframe adaptive Wiener filter super-resolution with JPEG2000-compressed images
© Narayanan et al.; licensee Springer. 2014
Received: 21 October 2013
Accepted: 4 April 2014
Published: 25 April 2014
Historically, Joint Photographic Experts Group 2000 (JPEG2000) image compression and multiframe super-resolution (SR) image processing techniques have evolved separately. In this paper, we propose and compare novel processing architectures for applying multiframe SR with JPEG2000 compression. We propose a modified adaptive Wiener filter (AWF) SR method and study its performance as JPEG2000 is incorporated in different ways. In particular, we perform compression prior to SR and compare this to compression after SR. We also compare both independent-frame compression and difference-frame compression approaches. We find that some of the SR artifacts that result from compression can be reduced by decreasing the assumed global signal-to-noise ratio (SNR) for the AWF SR method. We also propose a novel spatially adaptive SNR estimate for the AWF designed to compensate for the spatially varying compression artifacts in the input frames. The experimental results include the use of simulated imagery for quantitative analysis. We also include real-video results for subjective analysis.
1 Introduction
Multiframe super-resolution (SR) is a post-processing technique designed to reduce aliasing and enhance resolution for detector-limited imaging systems. As described in [2, 3], SR methods generally fuse a set of low-resolution (LR) frames with a common field of view to form a high-resolution (HR) image with reduced aliasing. SR methods assume the presence of motion between the frames that is either known or can be estimated with sub-pixel accuracy. The motion allows each frame to capture certain unique samples of the scene, effectively increasing the sampling frequency of the imaging sensor. SR can be applied to produce a single output frame or to produce video output by employing a moving temporal window of frames. SR techniques have proven to be highly successful in providing meaningful resolution enhancement for images and videos under appropriate conditions.
SR research has grown significantly in recent years. However, the majority of SR research focuses on the use of raw uncompressed image data obtained directly from an imaging sensor. In many practical imaging applications, the acquired video frames must be stored using limited file size or compressed in order to be transmitted through a band-limited channel. One powerful compression method is Joint Photographic Experts Group 2000 (JPEG2000) compression. Studies have shown that JPEG2000 is a good choice for high-quality and high-resolution videos. In 2004, the motion picture industry, specifically Digital Cinema Initiatives, announced JPEG2000 as the standard for digital delivery of all motion pictures. In light of this, as well as the emergence of some important new classes of SR algorithms, important questions are raised regarding how best to combine the benefits of SR and JPEG2000 image compression. For example, how does compression before SR compare to compression applied after SR? How does SR performance degrade with compression ratio using JPEG2000? What can be done to improve the performance of SR methods with JPEG2000? We shall attempt to address these and other questions here.
In previous work, some SR techniques have been applied to compressed imagery and video signals. For video signals, most of the work focuses on processing LR imagery that has been compressed using Moving Picture Experts Group (MPEG) and H.26x methods. In [8, 9], video is sub-sampled prior to MPEG4 compression and SR is applied after decompression; this shows a signal-to-noise ratio (SNR) improvement over simply MPEG4 compressing the original full-resolution signal. In another approach, the P- and B-frames in an H.264 compression technique are sub-sampled prior to compression. At the decompression end, SR is used to resolve the P- and B-frames using the I-frames as the training samples. In addition to compressed video, SR techniques have been applied to independently compressed images. In one study, SR is applied to imagery after JPEG compression, and the SR performance degrades when compression is high. In [12, 13], SR techniques designed to be robust to JPEG compression artifacts are described. A downsampling-based video coding method has also been proposed, in which SR is used to restore the downsampled frames to their original resolutions. An SR algorithm specifically for video with atmospheric turbulence and MPEG-4 compression has also been described. In [16, 17], SR techniques are applied in the transform domain as part of compression/decompression. Finally, one prior work describes an SR technique designed to restore details in imagery that have been degraded due to JPEG2000 compression. With the exception of that work, the combination of SR and JPEG2000 has not been widely studied. We believe this is an important area to investigate.
In this paper, we provide a novel study of the performance of SR with JPEG2000 compression. We employ a relatively new SR technique based on the adaptive Wiener filter (AWF). The AWF is a computationally efficient SR method, suitable for real-time implementation, with generally good performance. This method has been selected because of its computational simplicity and best-in-class performance, as demonstrated in earlier work. Here, we investigate several architectures for combining AWF SR with JPEG2000 compression. These include systems that apply compression prior to SR and systems that apply compression after SR processing. We also investigate the use of individual-frame compression, as well as motion-compensated difference-frame compression. We study how SR performance is impacted by a wide range of compression ratios (CRs). Based on our findings, we make some practical recommendations and observations regarding the joint use of JPEG2000 and SR. We also show that by modifying the SNR present in the correlation model used by the AWF SR method, the compression artifacts can be better tolerated. Furthermore, a novel spatially varying SNR model is proposed and demonstrated to specifically target the adverse effects of spatially varying compression artifacts.
Applications that we believe are well suited to the joint use of SR and compression include airborne and satellite imaging [19–22]. In these applications, the dominant inter-frame motion is the result of camera platform motion. Thus, the motion can be well modeled with a global motion model. This allows for accurate sub-pixel motion estimation for super-resolution. There is also a strong need for compression in these applications, in order to store and transmit the acquired data through band-limited channels. We have observed that a video that is well suited to multiframe SR is also likely a good candidate for difference-frame compression. In this case, the correlation between registered frames is exploited for compression, and the high-frequency differences are exploited for SR.
The remainder of this paper is organized as follows. Section 2 presents the basic AWF SR algorithm along with the proposed spatially varying SNR estimation method. Several architectures for combining AWF SR and JPEG2000 compression are presented in Section 3. The experimental results are provided in Section 4. Finally, conclusions are offered in Section 5.
2 AWF SR algorithm
The AWF SR method was introduced in earlier work. We provide some of the key algorithm details here for the reader's convenience. We begin with the observation model and then describe the AWF SR algorithm.
2.1 Observation model
To model the PSF, we follow an established approach that models diffraction and detector integration. For diffraction-limited optics, the spatial cut-off frequency is ωc = 1/(λ × f-number). Here, λ is the wavelength of light, and the f-number is the ratio of the focal length to the effective diameter of the optics. Another critical parameter for an imaging system is the detector pitch, p, of the focal plane array. The pitch is the spacing between the detector elements, and the reciprocal of the pitch is the sampling frequency. The Nyquist criterion dictates that to avoid aliasing, we must have 1/p > 2ωc. However, due to a complex trade space in imaging system design, many imaging systems do not meet the Nyquist criterion. This issue is well addressed in the literature. Resolution in such undersampled imaging systems may be thought of as limited by the detector array, rather than by the optics. These systems may benefit from multiframe SR processing, such as the AWF SR method.
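As a worked illustration of the sampling argument above, the short sketch below evaluates the cut-off frequency, the sampling frequency, and the ratio between the Nyquist rate and the actual sampling rate. The numerical values (a 4 µm wavelength, f/3 optics, and a 20 µm pitch) are hypothetical and are not taken from the system used in this paper; the function names are likewise illustrative.

```python
def cutoff_frequency(wavelength_mm, f_number):
    """Optical cut-off frequency (cycles/mm) for diffraction-limited optics."""
    return 1.0 / (wavelength_mm * f_number)


def sampling_frequency(pitch_mm):
    """Sampling frequency (cycles/mm) of a detector array with the given pitch."""
    return 1.0 / pitch_mm


def undersampling_factor(wavelength_mm, f_number, pitch_mm):
    """Nyquist rate (2 * cut-off) divided by the actual sampling frequency.

    A value above 1 means the Nyquist criterion 1/p > 2*omega_c is violated,
    so the system is detector limited and aliasing is possible.
    """
    nyquist_rate = 2.0 * cutoff_frequency(wavelength_mm, f_number)
    return nyquist_rate / sampling_frequency(pitch_mm)


# Hypothetical system parameters (assumptions for illustration only).
wavelength_mm = 0.004   # 4 micrometers
f_number = 3.0
pitch_mm = 0.020        # 20 micrometers

omega_c = cutoff_frequency(wavelength_mm, f_number)
f_s = sampling_frequency(pitch_mm)
factor = undersampling_factor(wavelength_mm, f_number, pitch_mm)
print(f"cut-off: {omega_c:.2f} cyc/mm, sampling: {f_s:.2f} cyc/mm, "
      f"undersampling factor: {factor:.2f}")
```

With these assumed values the factor is well above 1, which is the regime in which multiframe SR is expected to help.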
2.2 AWF SR processing
Based on the distances between the observed samples in g_i as they appear on the HR grid, we evaluate Equation 6 to populate R_i. Similarly, given the distances between the desired sample position and the observed samples, we evaluate Equation 5 and populate the vector p_i. This allows us to compute w_i using Equation 3. Note that by estimating the HR image in this fashion, the AWF is simultaneously performing nonuniform interpolation, deconvolution, and noise reduction, all with a single weighted sum operation. This sets it apart from other fast SR methods that perform nonuniform interpolation and restoration as independent processing steps. The combined approach of the AWF has computational and performance robustness advantages.
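The weight computation described above can be sketched as follows. This is a simplified illustration, not the exact model of Equations 3, 5, and 6: it assumes the usual Wiener form w = R^{-1} p, an isotropic correlation that decays as ρ raised to the distance, unit signal variance, a noise variance of 1/SNR on the diagonal of R, and it ignores the PSF. The function name `awf_weights` and the sample positions are hypothetical.

```python
import numpy as np


def awf_weights(sample_xy, target_xy, rho=0.7, snr=100.0):
    """Wiener weights w = R^{-1} p for one observation window.

    sample_xy : (N, 2) positions of the observed LR samples on the HR grid.
    target_xy : (2,) position of the HR sample to estimate.
    Assumptions (not from the paper): correlation rho**distance,
    unit signal variance, noise variance 1/snr, PSF ignored.
    """
    sample_xy = np.asarray(sample_xy, dtype=float)
    target_xy = np.asarray(target_xy, dtype=float)

    # Autocorrelation matrix R from sample-to-sample distances.
    diff = sample_xy[:, None, :] - sample_xy[None, :, :]
    dist_ss = np.sqrt((diff ** 2).sum(axis=-1))
    R = rho ** dist_ss + np.eye(len(sample_xy)) / snr

    # Cross-correlation vector p from sample-to-target distances.
    dist_st = np.sqrt(((sample_xy - target_xy) ** 2).sum(axis=-1))
    p = rho ** dist_st

    return np.linalg.solve(R, p)


# Five nonuniformly placed samples and one desired HR position.
samples = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0), (1.3, 2.1)]
values = np.array([10.0, 12.0, 11.0, 13.0, 11.5])
w = awf_weights(samples, target_xy=(1.0, 2.0))
hr_estimate = w @ values   # single weighted sum per HR pixel
print(np.round(w, 3), round(float(hr_estimate), 3))
```

Because the weights depend only on the relative sample geometry and the model parameters, they can be reused wherever the same geometry recurs. Lowering `snr` inflates the diagonal of R and shrinks the weights, which is the mechanism used later to tolerate compression artifacts.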
2.3 Spatially varying SNR model
The tuning parameters of the AWF SR correlation model that impact performance are ρ and the SNR. We will show that by decreasing the assumed global SNR, we are able to make the AWF SR more robust to the effects of compression. However, the compression artifacts tend to be spatially varying. For example, ringing artifacts are often produced in flat areas adjacent to strong edges. To better mitigate the impact of this kind of spatially varying artifact on AWF SR, we propose employing a correlation model with a spatially varying SNR.
Here, we estimate this local SNR in a novel manner. We first align the LR observed frames based on the SR registration and then average the frames. We then estimate the local variance using a Gaussian weighting function and treat this as the signal variance. Note that the frame averaging tends to reduce compression artifacts and noise, preventing those factors from falsely increasing the local signal estimate. We assume the noise variance is a global constant that is used as a tuning parameter. The ratio of estimated local signal variance to noise variance allows us to form a preliminary local SNR estimate. The final step is to filter this SNR array with a 5 × 5 2D minimum filter. This minimum filter step tends to lower the SNR estimate in flat areas near edges that are most vulnerable to ringing artifacts from compression. Areas of dense texture and detail tend to maintain the high SNR.
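The estimation steps above can be sketched as follows. The sketch assumes the frames have already been registered and resampled onto a common grid, and the Gaussian width, window radius, and noise variance are placeholder values rather than the settings used in our experiments. The helper names are hypothetical.

```python
import numpy as np


def gaussian_kernel(sigma=1.5, radius=4):
    """Normalized 1D Gaussian weights (sigma and radius are assumed values)."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


def smooth(img, kernel):
    """Separable Gaussian-weighted average with edge replication."""
    r = len(kernel) // 2
    h, w = img.shape
    padded = np.pad(img, r, mode="edge")
    rows = sum(kernel[i] * padded[:, i:i + w] for i in range(len(kernel)))
    return sum(kernel[i] * rows[i:i + h, :] for i in range(len(kernel)))


def min_filter(img, size=5):
    """2D minimum filter over a size x size neighborhood."""
    r = size // 2
    h, w = img.shape
    padded = np.pad(img, r, mode="edge")
    shifted = [padded[dy:dy + h, dx:dx + w]
               for dy in range(size) for dx in range(size)]
    return np.min(shifted, axis=0)


def local_snr(aligned_frames, noise_var, sigma=1.5, radius=4):
    """Local SNR map from already-aligned frames of shape (P, H, W)."""
    mean_frame = np.mean(aligned_frames, axis=0)       # frame averaging
    k = gaussian_kernel(sigma, radius)
    local_mean = smooth(mean_frame, k)
    local_power = smooth(mean_frame ** 2, k)
    signal_var = np.maximum(local_power - local_mean ** 2, 0.0)
    return min_filter(signal_var / noise_var, size=5)  # 5 x 5 minimum filter


# Synthetic test scene: flat on the left, textured on the right.
rng = np.random.default_rng(0)
scene = np.zeros((32, 32))
scene[:, 16:] = rng.uniform(0, 100, size=(32, 16))
frames = np.stack([scene + rng.normal(0, 1, scene.shape) for _ in range(4)])
snr_map = local_snr(frames, noise_var=1.0)
print(snr_map[16, 2], snr_map[16, 28])
```

On this synthetic scene the flat half receives a low SNR estimate and the textured half a high one, matching the intended behavior.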
The estimated local SNR is then used with Equations 3 and 4. To keep the computational complexity low for the spatially adaptive AWF SR, we quantize the local SNR value estimated to K = 20 levels. The K distinct correlation models give rise to K sets of filter weights. These weights may be pre-computed prior to processing video frames. At each spatial location, the local SNR is computed and the appropriate filter weight vector is applied.
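A minimal sketch of the quantize-and-look-up step is given below. The logarithmic spacing of the K = 20 levels, the SNR range, and the toy weight bank are assumptions made for illustration; in practice each row of the bank would hold the AWF weights computed for the corresponding SNR level.

```python
import numpy as np


def quantize_snr(snr_map, k_levels=20, snr_min=1.0, snr_max=1000.0):
    """Index of the nearest of K log-spaced SNR levels at every pixel.

    Log spacing and the [snr_min, snr_max] range are assumptions.
    """
    levels = np.logspace(np.log10(snr_min), np.log10(snr_max), k_levels)
    clipped = np.clip(snr_map, snr_min, snr_max)
    gaps = np.abs(np.log10(clipped)[..., None] - np.log10(levels))
    return gaps.argmin(axis=-1), levels


def apply_weight_bank(windows, level_index, weight_bank):
    """Weighted sum per pixel using the pre-computed weights for its level.

    windows     : (H, W, N) observation vector at every output pixel.
    level_index : (H, W) quantized SNR level per pixel.
    weight_bank : (K, N) one pre-computed weight vector per level.
    """
    return np.einsum("hwn,hwn->hw", windows, weight_bank[level_index])


# Toy example: a 2 x 2 SNR map and a hypothetical bank of averaging
# weights scaled by the Wiener gain snr / (snr + 1).
snr_map = np.array([[0.5, 1.0], [30.0, 5000.0]])
level_index, levels = quantize_snr(snr_map)
gain = levels / (levels + 1.0)
weight_bank = np.repeat(gain[:, None] / 4.0, 4, axis=1)   # shape (20, 4)
windows = np.ones((2, 2, 4))
hr_estimate = apply_weight_bank(windows, level_index, weight_bank)
print(level_index)
print(np.round(hr_estimate, 3))
```

The bank is built once before processing, so the per-pixel cost is one table look-up and one weighted sum.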
3 Architectures for SR with JPEG2000 compression
There are a number of ways to combine an SR algorithm with JPEG2000 compression. One way is to apply AWF SR after performing compression on the LR input frames. We also consider performing AWF SR first and then compressing the resulting SR images. When using compression on multiple input frames, we consider both individual- and difference-frame methods as described below.
3.1 SR after JPEG2000 compression
3.1.1 Individual-frame method
Perhaps the most straightforward method for treating the multiframe input with compression is to compress each frame individually and independently. This allows each frame to be decompressed independently, providing an advantage over MPEG-X (1, 2, and 4), for example. We shall refer to this as the individual-frame method. All processing here is done using MATLAB (The MathWorks, Inc., Natick, MA, USA), and JPEG2000 is achieved by using the 9/7 transform, no quantization, and optimal truncation for rate control.
It should be noted that SR is most beneficial for significantly undersampled imaging systems where aliasing is present. For such an imaging system, the individual LR observed frames may not compress well because of high spatial frequency content. However, since a set of frames suitable for SR must overlap in the field of view, these frames are also likely to exhibit inter-frame correlation. Thus, we also consider compression of registered difference frames, as described in the following sub-section.
3.1.2 Difference-frame method
Next, the decompressed reference frame is shifted to match each of the remaining P − 1 LR frames, and P − 1 difference frames are computed and compressed. The shifts are estimated from the LR frames in the same way as they are for SR. The difference frames are denoted as e(k) for k = 1, 2, …, P − 1. The output of this compression stage, to be stored and/or transmitted, consists of the compressed reference frame along with the compressed difference frames and the shifts s(k), for k = 1, 2, …, P − 1.
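The structure of the difference-frame method can be sketched as below. Two simplifications are assumed and should not be read as the method used in our experiments: the JPEG2000 encode/decode pair is replaced by a uniform quantizer (`lossy_codec`), and the global motion is restricted to integer circular shifts, whereas the actual shifts are sub-pixel. All function names are hypothetical.

```python
import numpy as np


def lossy_codec(image, step):
    """Stand-in for a JPEG2000 encode/decode round trip (assumption):
    uniform quantization with the given step size."""
    return np.round(image / step) * step


def shift_frame(frame, shift):
    """Integer circular shift (dy, dx); a simplified global motion model."""
    return np.roll(frame, shift, axis=(0, 1))


def encode_group(frames, shifts, ref_step=1.0, diff_step=4.0):
    """Compress the reference, then the P - 1 motion-compensated differences."""
    ref_hat = lossy_codec(frames[0], ref_step)
    diffs_hat = []
    for frame, s in zip(frames[1:], shifts):
        # Predict from the DEcompressed reference so that the decoder,
        # which only has ref_hat, forms exactly the same prediction.
        predicted = shift_frame(ref_hat, s)
        diffs_hat.append(lossy_codec(frame - predicted, diff_step))
    return ref_hat, diffs_hat


def decode_group(ref_hat, diffs_hat, shifts):
    """Rebuild all P frames from the reference, differences, and shifts."""
    frames = [ref_hat]
    for e_hat, s in zip(diffs_hat, shifts):
        frames.append(shift_frame(ref_hat, s) + e_hat)
    return frames


rng = np.random.default_rng(1)
reference = rng.integers(0, 256, size=(16, 16)).astype(float)
shifts = [(0, 1), (1, 0), (1, 1)]
frames = [reference] + [
    shift_frame(reference, s) + rng.normal(0, 2, reference.shape)
    for s in shifts
]
ref_hat, diffs_hat = encode_group(frames, shifts)
decoded = decode_group(ref_hat, diffs_hat, shifts)
max_error = max(np.abs(f - d).max() for f, d in zip(frames, decoded))
print(max_error)
```

Because the prediction is formed from the decompressed reference, the reconstruction error of each frame is bounded by the error of its own difference frame and does not accumulate the reference error.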
3.2 JPEG2000 compression after the application of SR
While the pixel dimensions of the SR image are increased, the aliasing is reduced, making the SR image generally easier to compress. Furthermore, the SR process gets the benefit of working on data with no compression artifacts. In our experiments, we combine 16 LR frames to produce a single SR image with an upsampling factor of L = 4 in each dimension. In a video-to-video application, the SR video frame rate may be the same as the LR video rate. In this case, the SR video has L² times as many pixels as the input video (16 times for L = 4). However, difference-frame compression can also be applied to the SR video for enhanced compression.
4 Experimental results
In this section, we present the experimental results that include simulated LR data and a real-video sequence. The simulated data allow for quantitative analysis, while the real data allow for a subjective comparison in a real application. In addition to comparing the architectures described in Section 3, we also examine the use of the spatially varying SNR model, as described in Section 2.3, to treat the compression artifacts more robustly.
4.1 Simulated video data
In this section, we begin by presenting and describing the subjective image results and then we present the quantitative results.
4.1.1 Image results
Table 1 PSNR obtained using various SR/compression methods (columns: SR after compression; SR before compression)
Table 2 MAE obtained using various AWF SR/compression methods (rows: bicubic after compression (individual frame); SR after individual-frame compression; SR after difference-frame compression; SR before compression)
Table 3 PSNR obtained using various AWF SR/compression methods (rows: bicubic after compression (individual frame); SR after individual-frame compression; SR after difference-frame compression; SR before compression)
We report compression level here using the CR value, which is defined as the ratio of the input image file size to the output compressed file size. A higher value implies a smaller file size and reduced image quality. Figure 9 shows the LR reference frame compressed with CR = 8 and then upsampled by L = 4 using bicubic interpolation. This is one of the P = 16 LR frames generated using the individual-frame compression method described in Section 3.1.1. The result of applying AWF SR with L = 4 to the individually compressed LR frames is shown in Figure 10. Here, ρ = 0.7 and the SNR used is 67.40, which maximizes the peak signal-to-noise ratio (PSNR). The SR image looks notably sharper than the image in Figure 9. However, some artifacts are still present in the image. We have observed that the individual LR images are rather hard to compress due to aliasing. As a result of the shifts between LR frames, the compression artifacts tend to vary somewhat from frame to frame. This can lead to an SR image with artifacts, as seen in Figure 10. Of course, one way to reduce these artifacts is to use a lower CR. Another approach is to use the spatially varying SNR model described in Section 2.3. Figure 11 shows the AWF SR output image obtained using this spatially varying model. This result has lower error and, perhaps more importantly, tends to have visually reduced compression-related artifacts.
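For reference, the quantities used in this section can be computed as in the sketch below. The CR follows the definition given above. The PSNR uses the common definition with a peak value of 255 for 8-bit imagery, which is an assumption here, and the file sizes in the example are hypothetical.

```python
import numpy as np


def compression_ratio(original_bytes, compressed_bytes):
    """CR: input file size divided by compressed file size."""
    return original_bytes / compressed_bytes


def mae(reference, estimate):
    """Mean absolute error between two images."""
    ref = np.asarray(reference, dtype=float)
    est = np.asarray(estimate, dtype=float)
    return float(np.mean(np.abs(ref - est)))


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB (peak = 255 assumes 8-bit data)."""
    ref = np.asarray(reference, dtype=float)
    est = np.asarray(estimate, dtype=float)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / mse))


# Hypothetical 64 x 64 8-bit frame (4096 bytes) stored in 512 bytes.
cr = compression_ratio(64 * 64, 512)

# Toy images: a constant error of 5 gray levels everywhere.
truth = np.zeros((4, 4))
estimate = np.full((4, 4), 5.0)
print(cr, mae(truth, estimate), round(psnr(truth, estimate), 2))
```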
The next result, shown in Figure 12, is for AWF SR applied to LR frames compressed with the difference-frame compression described in Section 3.1.2. Note that in the difference-frame compression method, the group of P = 16 LR frames is set to have an overall CR of 8, to match the individual-frame method. The optimum SNR of 315.67 is used along with ρ = 0.7. This result appears far superior to that obtained using individual-frame compression (for the same overall CR). Here, the redundancy between LR frames is exploited by the difference-frame compression. At the same time, the differences among the LR frames are exploited by SR. We believe this is perhaps the best way to combine compression and SR. Image data suitable for multiframe SR is probably well suited for difference-frame compression, due to overlapping field of view. Figure 13 shows the result obtained using the spatially adaptive correlation model AWF SR applied after difference-frame compression. As is the case for the individual-frame compression, this result has lower error and visually reduced compression artifacts.
4.1.2 Quantitative results
Let us now turn our attention to the quantitative results for the simulated data. The first quantitative experiment examines the impact of compression on registration, since sub-pixel registration is a key element of SR. Registering previously compressed images may be necessary when compression is done prior to SR. Figure 15 shows registration error versus CR for both individual- and difference-frame compression using the parrot image. The registration error is reported as mean absolute error (MAE) in units of pixels. These results show that the registration error is small, even at large compression ratios. We attribute this to the fact that registration is able to exploit knowledge of the global motion model, and the estimation process is highly overdetermined with global motion. Registration based on difference-frame-compressed images appears almost unaffected by compression. With individual-frame compression, the registration error does go up, but it is still relatively small, even at a CR of 35.
In the next quantitative experiment, we compare the AWF SR method to two other benchmark methods for the different architectures proposed in Section 3. The results are shown in Table 1. The benchmark methods are Delaunay SR and weighted nearest neighbor (WNN) SR. These methods are nonuniform interpolation-based SR methods with a computational complexity similar to that of AWF SR. We use the Kodak parrot image with CR = 8, and the optimum global SNR is employed for each method. The PSNR results in Table 1 show that AWF SR provided the highest PSNR results for these data. The remainder of the quantitative results focuses on the AWF SR method. However, we have observed the same basic trends using the benchmark methods.
Figure 16 shows PSNR as a function of CR for AWF SR with the different architectures described in Section 3 using the Kodak parrot image. The AWF SR method here uses a global SNR model. The plots clearly show that SR is still beneficial with compression, even out to high CRs. Individual-frame compression prior to SR is very good at low CRs but degrades quickly. The difference-frame compression performance is considerably better than that of individual-frame compression at higher CRs. SR before compression provides the highest PSNR here. But again, for video-to-video applications, the issue of data throughput must be considered in this approach.
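To illustrate why global-motion registration is highly overdetermined, the sketch below estimates a single global translation with a one-step gradient-based least-squares fit: every pixel contributes one equation, but there are only two unknowns. This is a generic estimator chosen for illustration and is not necessarily the registration algorithm used in our experiments; the synthetic scene and the function names are hypothetical.

```python
import numpy as np


def estimate_global_shift(reference, frame):
    """Least-squares estimate of (dy, dx) with
    frame(y, x) ~ reference(y - dy, x - dx), valid for small shifts."""
    reference = np.asarray(reference, dtype=float)
    frame = np.asarray(frame, dtype=float)
    gy, gx = np.gradient(reference)
    inner = (slice(1, -1), slice(1, -1))   # skip one-sided border gradients
    # First-order Taylor model: frame - reference = -dy * gy - dx * gx.
    A = np.stack([-gy[inner].ravel(), -gx[inner].ravel()], axis=1)
    b = (frame - reference)[inner].ravel()
    solution = np.linalg.lstsq(A, b, rcond=None)[0]
    return float(solution[0]), float(solution[1])


def scene(y, x):
    """Smooth synthetic scene that can be sampled at sub-pixel positions."""
    return np.sin(0.2 * x) + np.cos(0.15 * y) + 0.5 * np.sin(0.1 * (x + y))


y, x = np.mgrid[0:64, 0:64].astype(float)
true_shift = (0.3, -0.2)
reference = scene(y, x)
frame = scene(y - true_shift[0], x - true_shift[1])

dy_hat, dx_hat = estimate_global_shift(reference, frame)
registration_mae = float(np.mean(np.abs(np.array([dy_hat, dx_hat])
                                        - np.array(true_shift))))
print(dy_hat, dx_hat, registration_mae)
```

With thousands of equations and two unknowns, moderate pixel-level perturbations such as compression artifacts tend to average out in the fit, which is consistent with the small registration errors reported above.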
Figure 17 shows how adjusting the SNR in the AWF SR correlation model can improve results as CRs increase. The curves in Figure 17 show PSNR performance as a function of CR for the individual LR frame compression before AWF SR using three different SNR methods. The bottom curve is using a fixed SNR value of 249.74 for all CRs. While this is the optimum SNR for CR = 4, it does not produce the best results at higher CRs. The middle PSNR curve in Figure 17 uses the optimum SNR for each CR and provides significantly better results. Further improvement is obtained using the spatially varying SNR method, described in Section 2.3.
In a final simulated data experiment, we apply AWF SR after MPEG compression for the Kodak parrot image sequence with a CR of 8. The resulting MAE value is 3.53, and the PSNR is 32.26. Comparison of these values to the corresponding values in Tables 2 and 3 shows that the JPEG2000 difference-frame compression provides better results here than MPEG. We attribute this in large part to the ability of the global difference-frame method to better exploit the global motion than the block-based motion estimation of MPEG.
4.2 Real-video data
We see in Figures 21 and 22 that the results with the real-video data follow the same pattern seen with the simulated data. In particular, the AWF SR before compression is the best method among the architectures tested. This is followed closely in performance by AWF SR after difference-frame compression and then AWF SR after individual-frame compression using the spatially varying SNR. AWF SR after individual-frame compression using the optimum global SNR, while perhaps inferior to the other methods, still outperforms bicubic interpolation by a significant margin both subjectively and quantitatively.
5 Conclusions
The results obtained suggest that SR prior to compression provides the best results when compared to other architectures. However, this may require real-time SR processing at the sensor or storage of full resolution video for later processing. This may not always be feasible. Furthermore, in video-to-video SR applications with SR prior to compression, the data throughput is significantly increased if the frame rate is the same for the LR and SR frames (since the SR frames are upsampled by L). Often, a more practical scenario is to apply SR after JPEG2000 compression of the LR frames. With this approach, we have demonstrated that SR processing still provides improvement, even for relatively high CRs, provided that the SNR in the correlation model is adjusted to account for compression artifacts. We have proposed a novel approach for estimating and using a spatially varying SNR with the AWF SR method to help mitigate spatially varying compression artifacts.
When compression is done prior to SR, we have shown that difference-frame compression is superior to individual-frame compression. Note that a set of frames suitable for multiframe SR must overlap in field of view so that accurate registration is possible. With such overlapping frames, there will tend to be significant inter-frame correlation. With difference-frame compression, this redundancy between LR frames is exploited. On the other hand, it is the differences among the LR frames that are exploited by SR to reduce aliasing, which improves the performance of the system. This provides a potentially practical and efficient approach to combining SR and JPEG2000 compression.
In summary, AWF SR processing can be effectively combined with JPEG2000 compression. Even at relatively high CRs using simple individual-frame compression prior to SR, we see improvement over bicubic interpolation. Difference-frame compression prior to SR provides improved performance without increasing the video throughput. Also, applying the spatially varying SNR proposed here can improve the performance of the AWF SR algorithm with JPEG2000 compression. One potentially important application for this kind of SR with JPEG2000 compression is in the field of airborne imaging, where multiframe SR has been shown to be highly applicable [19–22]. At the same time, the need for compression is high due to the large amounts of data typically acquired with such systems.
References
1. Milanfar P (Ed): Super-Resolution Imaging. Boca Raton: CRC; 2011.
2. Hardie RC, Schultz RR, Barner KE: Super-resolution enhancement of digital video. EURASIP J. Adv. Signal Process. 2007, 2007(20984):1-3. doi:10.1155/2007/20984
3. Kang MG, Chaudhuri S: Super-resolution image reconstruction. IEEE Signal Process. Mag. 2003, 20(3):19-20.
4. Hardie RC: A fast image super-resolution algorithm using an adaptive Wiener filter. IEEE Trans. Image Process. 2007, 16(12):2953-2964.
5. Marcellin MW: JPEG2000 Image Compression Fundamentals Standards and Practice. New York: Springer; 2002.
6. Smith M, Villasenor J: Intra-frame JPEG-2000 vs. inter-frame compression comparison: the benefits and trade-offs for very high quality, high resolution sequences. Paper presented at the SMPTE technical conference and exhibition. Pasadena, CA, USA; 2004:20-23.
7. Monaco K: Analog devices video technology enables delivery of digital cinema to big screen. Analog Devices, Inc; 2006. http://www.analog.com/en/press-release/jun_26_2006_adi_video_tech_enables/press.html. Accessed 27 Feb 2014
8. Molina R, Katsaggelos AK, Alvarez LD, Mateos J: Towards a new video compression scheme using super-resolution. Paper presented at the electronic imaging, International Society of Optics and Photonics. San Jose, CA, USA; 2006:607706.1-607706.13.
9. Barreto D, Alvarez LD, Molina R, Katsaggelos AK, Callico GM: Region-based super-resolution for compression. Multidimensional Syst. Signal Process. 2007, 18(2–3):59-81.
10. Brandi F, Queiroz RD, Mukherjee D: Super-resolution of video using key frames. Paper presented at the IEEE international symposium on circuits and systems. Seattle, WA, USA; 2008:1608-1611.
11. Freeman WT, Jones TR, Pasztor EC: Example-based super-resolution. IEEE Comput. Graph. Appl. 2002, 22(2):56-65. doi:10.1109/38.988747
12. Xiong Z, Sun X, Wu F: Super-resolution for low quality thumbnail images. Paper presented at the IEEE international conference on multimedia and expo. Hannover, Germany; 2008:181-184.
13. Xiong Z, Sun X, Wu F: Robust web image/video super-resolution. IEEE Trans. Image Process. 2010, 19(8):2017-2028.
14. Shen M, Xue P, Wang C: Down-sampling based video coding using super-resolution technique. IEEE Trans. Circ. Syst. Video Tech. 2011, 21(6):755-765.
15. Fishbain B, Yaroslavsky LP, Ideses IA: Real-time turbulent video super-resolution using MPEG-4. Paper presented at the electronic imaging, International Society for Optics and Photonics. San Jose, CA, USA; 2008:681106.
16. Gunturk BK, Altunbasak Y, Mersereau RM: Super-resolution reconstruction of compressed video using transform-domain statistics. IEEE Trans. Image Process. 2004, 13(1):33-43. doi:10.1109/TIP.2003.819221
17. Pickering M, Ye GF, Arnold J: A transform-domain approach to super-resolution mosaicking of compressed images. J. Phys. Conf. Ser. 2008, 124(1):012039.
18. Xiang Z, Jie Y, Du SD: Super-resolution reconstruction of image sequences compressed with DWT-based techniques. Paper presented at the IEEE international conference on wavelet analysis and pattern recognition. Beijing, China; 2007:555-560.
19. Hardie RC, Barnard KJ: Fast super-resolution using an adaptive Wiener filter with robustness to local motion. Optics Express 2012, 20(19):21053-21073. doi:10.1364/OE.20.021053
20. Hardie RC, Barnard KJ, Ordonez R: Fast super-resolution with affine motion using an adaptive Wiener filter and its application to airborne imaging. Optics Express 2011, 19(27):26208-26231. doi:10.1364/OE.19.026208
21. He Q, Schultz R: Super-resolution reconstruction by image fusion and application to surveillance videos captured by small unmanned aircraft systems. In Sensor Fusion and Its Applications. Edited by: Thomas C. Rijeka: InTech; 2010:475-486.
22. Camargo A, He Q, Palaniappan K, Jara F: Super-resolution mosaics from airborne video using robust gradient regularization. Paper presented in SPIE defense, security and sensing. Burlingame, CA, USA: International Society for Optics and Photonics; 2013:875209-1–875209-7.
23. Hardie RC, Barnard KJ, Bognar JG, Armstrong EE, Watson EA: High-resolution image reconstruction from a sequence of rotated and translated frames and its application to an infrared imaging system. Opt. Eng. 1998, 37(1):247-260. doi:10.1117/1.601623
24. Fiete RD: Image quality and λFN/p for remote sensing systems. Opt. Eng. 1999, 38(7):1229-1240. doi:10.1117/1.602169
25. The Kodak Image Database. http://r0k.us/graphics/kodak. Accessed 27 Feb 2014
26. Lertrattanapanich S, Bose NK: High resolution image formation from low resolution frames using Delaunay triangulation. IEEE Trans. Image Process. 2002, 11(12):1427-1441. doi:10.1109/TIP.2002.806234
27. Alam MS, Bognar JG, Hardie RC, Yasuda BJ: Infrared image registration and high resolution reconstruction using multiple translationally shifted aliased video frames. IEEE Trans. Instrum. Meas. 2000, 49(5):915-923. doi:10.1109/19.872908
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.