A no-reference metric for demosaicing artifacts that fits psycho-visual experiments
© Gasparini et al.; licensee Springer. 2012
- Received: 28 October 2011
- Accepted: 30 May 2012
- Published: 21 June 2012
The present work analyzes how demosaicing artifacts affect image quality and proposes a novel no-reference metric for their quantification. The metric, which fits psycho-visual data obtained experimentally, quantifies the perceived distortions produced by demosaicing algorithms. The demosaicing operation consists of a combination of color interpolation (CI) and anti-aliasing (AA) algorithms and converts a raw image acquired with a single sensor array, overlaid with a color filter array, into a full-color image. The most prominent artifact generated by demosaicing algorithms is called zipper. The zipper artifact is characterized by segments (zips) with an On–Off pattern. We perform psycho-visual experiments on a dataset of images that covers nine different degrees of distortion, obtained using three CI algorithms combined with two AA algorithms. We then propose our no-reference metric, based on measures of blurriness and of chromatic and achromatic distortions, to fit the psycho-visual data. With this metric, demosaicing algorithms can be evaluated and compared.
- Reference Image
- Distorted Image
- Just Noticeable Difference
- Color Filter Array
- Error Inversion
Identification of perceptual dimensions (attributes) of quality.
Determination of relationships between attribute scale values and objective, image based measures.
Combination of attribute scale values to predict overall image quality.
Defining a no-reference image quality metric therefore requires a well-designed psycho-visual experiment. Ideally, we should be able to generate a dataset of distorted images in which the distortion is controlled by a proper defect-generating process. In this way the collected data can be easily related to the considered distortion. In particular, we would like the perceived quality to behave monotonically as the distortion increases.
physical defects, such as out of focus, motion blur, noise, etc.
digital defects introduced by the processing pipeline, such as demosaicing, compression, etc.
For physical defects the procedure adopted to generate the distorted images used within the experiments could be a simulation of the physical process, while in the case of digital defects the procedure should apply the corresponding algorithm(s) within the pipeline. Note that each of these distortion processes can vary with respect to one or more parameters.
Within this context, in this paper we address the problem of how demosaicing artifacts affect image quality. The demosaicing operation converts a raw image acquired with a single sensor array, overlaid with a color filter array, into a full-color image. The most prominent artifact generated by demosaicing algorithms is called zipper. The zipper artifact is characterized by segments (zips) with an On–Off pattern.
The quality of rendered images depends on the perception of the zipper artifact that can also affect the sharpness. The perception of this artifact also depends on image content.
We here propose a no-reference metric to assess image quality in the presence of demosaicing artifacts. It combines measures of blurriness (intended as lack of sharpness) and of chromatic and achromatic distortions, and fits the psycho-visual data. Several full-reference metrics exist for this kind of artifact, while the literature offers few no-reference ones. Some no-reference sharpness metrics [7, 8] could be adopted, but they cannot take into account the typical chromatic and achromatic zipper effects. Liu et al. have recently presented a no-reference method for CFA demosaicing based on double interpolation and have evaluated several demosaicing algorithms. However, this metric has not been correlated with psycho-visual experiments.
In this work we have generated a dataset with different degrees of zipper artifacts by applying a combination of three different CI algorithms with two AA algorithms. These algorithms have been applied to a set of reference images having different visual contents. More demosaicing and/or anti-aliasing algorithms could have been used. However, lengthy psycho-visual tests are not reliable, and we have preferred not to reduce the number of test images.
This paper is organized as follows. In Demosaicing section we briefly describe the demosaicing process, while in Psycho-visual setup section we describe how we have generated the dataset utilized during our tests and the psycho-visual experiments that we have conducted to rank the chosen algorithms. From the analysis of the experimental data (detailed in Data analysis section), we propose our novel no-reference metric, described in No-reference metric for Demosaicing section, based on measures of blurriness, chromatic and achromatic distortions. In Metric parameter estimation section we report details of the regression we have proposed to fit the subjective data and we compare our metric with a reference one . All the psycho-visual data presented and the corresponding distorted images are available at http://www.ivl.disco.unimib.it/. Finally, in Section Methods we report details on the testing methodology adopted here.
Several algorithms for demosaicing have been developed in the literature [12–17], and some of them are proprietary. A survey of these methods was presented by Li et al. Several methods deal with content-adaptive demosaicing, based on an edge detection mechanism [19–21]. Recently Rehman and Shao have presented a demosaicing method using optimised filters, based on a training process and well-defined content classification.
We have here considered nine different demosaicing algorithms obtained by combining three CI algorithms with two anti-aliasing (AA) algorithms.
Bilinear interpolation: the simplest demosaicing algorithm, which acts as a benchmark; the missing values on the three channels are computed independently by linear interpolation.
ST1: proposed by Smith, it performs an isotropic interpolation that includes a non-linear step that minimizes the energy of aliasing artifacts.
ST2: proposed by Guarnera et al., it uses an elliptic-shaped Gaussian kernel to interpolate data according to the gradient information, to better exploit spatial correlation. The authors also included an enhancement step to restore the lost high frequencies.
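As an illustration of the benchmark CI method only, the following is a minimal sketch of bilinear interpolation on a mosaiced image. It assumes an RGGB Bayer layout (the layout is not stated in the text) and uses normalized 3×3 filtering so that image borders are handled without special cases; the function names are ours. ST1 and ST2 are not sketched here.

```python
import numpy as np

def bayer_masks(h, w):
    """Binary sampling masks for R, G, B under an assumed RGGB Bayer layout."""
    masks = np.zeros((h, w, 3))
    masks[0::2, 0::2, 0] = 1  # red samples
    masks[0::2, 1::2, 1] = 1  # green samples on red rows
    masks[1::2, 0::2, 1] = 1  # green samples on blue rows
    masks[1::2, 1::2, 2] = 1  # blue samples
    return masks

def filter3x3(img, kernel):
    """Apply a 3x3 kernel with zero padding (the kernels used here are symmetric)."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="constant")
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def bilinear_demosaic(mosaic):
    """Interpolate each channel independently from its own CFA samples."""
    h, w = mosaic.shape
    masks = bayer_masks(h, w)
    k_rb = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], dtype=float)
    k_g = np.array([[0, 1, 0], [1, 4, 1], [0, 1, 0]], dtype=float)
    out = np.empty((h, w, 3))
    for c, kernel in enumerate((k_rb, k_g, k_rb)):
        weighted_sum = filter3x3(mosaic * masks[..., c], kernel)
        weight_total = filter3x3(masks[..., c], kernel)
        out[..., c] = weighted_sum / weight_total  # average of available samples
    return out

# A flat-colored image must be reconstructed exactly by any averaging scheme.
flat = np.empty((6, 6, 3))
flat[...] = (100.0, 150.0, 200.0)
mosaic = (flat * bayer_masks(6, 6)).sum(axis=2)
restored = bilinear_demosaic(mosaic)

# Measured CFA samples are left untouched by the interpolation.
rng = np.random.default_rng(0)
random_mosaic = rng.uniform(0.0, 255.0, size=(6, 6))
random_out = bilinear_demosaic(random_mosaic)
```

With these kernels a measured sample keeps its value, a missing green value is the mean of its four green neighbors, and a missing red or blue value is the mean of its two or four nearest samples of that color.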
an algorithm authored by Freeman that suppresses demosaicing artifacts by applying a median filter to the chrominance channels (R-G) and (B-G) to support the reconstruction of the R and B channels. The red and blue values estimated from the median-filtered chrominance are used only at pixels where no R or B sensor value is directly available.
an algorithm authored by Lu and Tan that proposes an AA step to extend Freeman’s median filtering method by lifting the constraint of keeping the original CFA-sampled values intact.
The nine combinations of these algorithms (summarized in Table 1) produce different levels of the typical demosaicing distortions. The choice of these algorithms does not affect the effectiveness of the proposed methodology.
Table 1 Demosaicing algorithms considered
Color interpolation (CI)
To perform the subjective data analysis described in this paper we have generated a data set of distorted images (which we have called Zipper database) starting from the 24 images of the Kodak photoCD pcd0992 database available at http://r0k.us/graphics/kodak/. We have created the mosaiced images by deleting two of the three RGB values at each pixel of the full-color images, and then we have demosaiced them with the nine algorithms of Table 1. The database is therefore formed by a total of (24 images × 9 demosaicing methods =) 216 images. The image testing database has been created to satisfy a good compromise between the number of distortions and the number of different visual contents, keeping in mind that psycho-visual sessions should be limited in time to be reliable. In our work we evaluate the visual impact of the artifacts generated by demosaicing methods, and do not perform a quality evaluation of the algorithms themselves.
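The mosaicing step described above can be sketched as follows. This is a minimal illustration that assumes an RGGB Bayer layout, which the text does not specify; the function name and the tiny synthetic input are ours.

```python
import numpy as np

def bayer_mosaic(rgb):
    """Keep one color sample per pixel, following an assumed RGGB Bayer layout."""
    h, w, _ = rgb.shape
    mosaic = np.zeros((h, w), dtype=rgb.dtype)
    mosaic[0::2, 0::2] = rgb[0::2, 0::2, 0]  # red at even rows, even columns
    mosaic[0::2, 1::2] = rgb[0::2, 1::2, 1]  # green at even rows, odd columns
    mosaic[1::2, 0::2] = rgb[1::2, 0::2, 1]  # green at odd rows, even columns
    mosaic[1::2, 1::2] = rgb[1::2, 1::2, 2]  # blue at odd rows, odd columns
    return mosaic

# Tiny synthetic "full-color image" standing in for a Kodak reference image.
rgb = np.arange(4 * 4 * 3, dtype=np.uint8).reshape(4, 4, 3)
mosaic = bayer_mosaic(rgb)

# Size of the Zipper database: reference images x demosaicing methods.
n_database_images = 24 * 9
```

Each of the nine demosaicing methods is then applied to the same single-channel mosaic, so differences between the outputs come from the reconstruction alone.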
For the quality analysis of the images we adopted two different test methods: single stimulus method (1S), and double stimulus method (2S) .
Our goal was to evaluate the perceived quality of the rendered images; for this reason we chose a single-stimulus test as our primary source of psycho-visual data. Since we were also interested in gathering as much data as possible from the viewers, we also conducted a double-stimulus test. We followed Sheikh et al. in setting up our tests by including the original images in both tests and calculating the Difference Score (DS) as the difference between the scores of the original and the distorted image. In this way we obtained data from different setups with the same unit of measure. In the 1S method, all the images (the rendered images and the original one) are shown individually, while in the 2S method the reference image (original image) is shown together with each of its rendered versions. The 1S method can thus be considered an approximation of the 2S one, as the original image is evaluated only once. The fundamental difference between the two methods is that the 2S one uses an explicit reference, while the 1S one does not.
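To make the Difference Score concrete, the sketch below computes DS for a small set of made-up ratings. The 0 to 100 scale, the array names, and all numbers are hypothetical and serve only to show the arithmetic.

```python
import numpy as np

# Hypothetical raw scores: one row per subject, one column per distorted version.
scores_distorted = np.array([[60.0, 75.0, 85.0],
                             [50.0, 65.0, 70.0]])
# Hypothetical score given by each subject to the original image.
scores_original = np.array([80.0, 70.0])

# DS = score of the original minus score of the distorted image, per subject.
difference_scores = scores_original[:, None] - scores_distorted

# Average over subjects to obtain one value per distorted version.
mean_difference_scores = difference_scores.mean(axis=0)
```

A negative DS, as in the last column for the first subject, means that the distorted image was rated higher than the original.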
All the monitors were calibrated with a colorimeter (D65, gamma 2.2).
Their resolution was 1600×1200 pixels, which corresponds to 110 dpi (using 18 in. as the physical diagonal of the screen, as indicated by the manufacturer of the monitors).
The ambient light levels (a typical office illumination) were maintained constant between the different sessions. There were no reflections on the screens.
The distance between the observer and the monitors was about 60 cm (corresponding to about 46 pixels per degree of visual angle).
The refresh rate of the monitors was 75 Hz.
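The resolution and viewing-distance figures in the list above can be cross-checked with a short calculation. The sketch assumes square pixels and a flat screen viewed head-on; the function names are ours.

```python
import math

def pixels_per_inch(width_px, height_px, diagonal_in):
    """Pixel density from the pixel dimensions and the physical diagonal."""
    return math.hypot(width_px, height_px) / diagonal_in

def pixels_per_degree(ppi, distance_cm):
    """Number of pixels spanned by one degree of visual angle at the screen center."""
    distance_in = distance_cm / 2.54
    one_degree_in = 2.0 * distance_in * math.tan(math.radians(0.5))
    return ppi * one_degree_in

ppi = pixels_per_inch(1600, 1200, 18.0)
ppd = pixels_per_degree(ppi, 60.0)
```

Under these assumptions the density comes out slightly above 110 dpi and the angular resolution close to 46 pixels per degree, in line with the rounded values reported above.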
In all our experiments distorted images are shown in random order, different for each subject. In the case of the 2S method the relative position of the original with respect to its distorted version is random in the pair shown.
The panel of subjects involved in this study was recruited from the Psychology Department. The subject pool consisted of students inexperienced with image quality assessment and image impairments. The total number of subjects involved in our experiments is 39, divided into three groups as follows: 9 subjects involved in tuning experiments, and 30 subjects involved in 1S and 2S experiments, 15 for each test group.
As the different algorithms considered produce different levels of the typical demosaicing defects (chromatic and achromatic zipper, blur) we analyzed the subjective evaluation of these defects through the subjective rank of the algorithms.
As a preliminary step, we have grouped the 9 demosaicing methods into triplets, with respect to the CI algorithm applied.
We have analyzed the experimental data to investigate the cross-talk between the zipper artifacts introduced by the CI process and the image content. In Figure 4 the 10 images used in our tests are listed from 1 to 10 in order of increasing visual complexity, as obtained by applying the complexity index described in .
Each subplot reports the experimental Score_i corresponding to the nine distortions applied to each image. These scores are grouped into triplets with respect to the CI method (bilinear + three AA, ST1 + three AA, and ST2 + three AA).
We can notice that images with a comparable level of detail share common patterns in their scores. In particular, when the achromatic zipper (mainly produced by the Freeman AA algorithm) is combined with middle-high frequency content (roughly the second and third columns of Figure 4), the contrast of the zipper highlights the edges while the middle-high frequency content masks the On–Off pattern. This combined effect results in a sharper appearance of the image; this is more evident when the images are directly compared with the reference, as in the 2S test. This behavior is related to the texture masking effect of the human visual system . From the point of view of the algorithm ranks (Figure 8), these considerations are confirmed by the good performance obtained on these images using the Freeman AA within each triplet of CI algorithms (algorithms number 2, 5, 8). On the other hand, when the algorithms that produce this achromatic distortion are applied to images with low frequency content (first column), the high contrast of the zipper pattern and its On–Off structure remain visible. In fact, the evaluation of the CI algorithms coupled with the Freeman AA is worse for low frequency content than for higher frequency content, especially in the 1S experiment, where sharpness is less perceived.
As for the chromatic zipper, the behavior is simpler. This artifact becomes more visible as the number of edge pixels in the image increases, and it seems to be immune to masking effects. For this reason we chose to discriminate between chromatic and achromatic distortion.
The data analysis confirms that the perceptual quality of demosaiced images depends on sharpness, and on chromatic and achromatic zipper. For this reason we have decided to define our no-reference metric considering the following three aspects separately:
Blur as index of lack of sharpness (measure indicated as B in what follows).
Chromatic zipper distortion (measure indicated as CD).
Achromatic zipper distortion (measure indicated as AcD).
We chose a sum expression because when one of these terms is significantly high, the others are less significant. This consideration arises from the experimental evidence of the behavior of different demosaicing algorithms. A strong low pass filtering adopted to reduce the zips produces a blurred image, and thus in this case the blur measure B is dominant with respect to the others. In case of more conservative filtering, the image sharpness is preserved, but the zips still remain as a defect. Different CI algorithms produce zips with different levels of saturation, ranging from achromatic to highly saturated zips.
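To illustrate what fitting an additive combination of B, CD and AcD to subjective scores can look like, the sketch below uses ordinary least squares on synthetic data. It is not the regression used in this work: the plain weighted-sum form with an offset, the weights, and every number are assumptions made only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical measures for nine demosaicing methods (columns: B, CD, AcD).
measures = rng.uniform(0.0, 1.0, size=(9, 3))

# Hypothetical weights and offset, used only to synthesize "subjective" scores.
true_weights = np.array([30.0, 20.0, 10.0])
true_offset = 5.0
scores = measures @ true_weights + true_offset

# Least-squares fit of: score ~ w_B * B + w_CD * CD + w_AcD * AcD + offset.
design = np.column_stack([measures, np.ones(len(measures))])
solution, *_ = np.linalg.lstsq(design, scores, rcond=None)
fitted_weights, fitted_offset = solution[:3], solution[3]
```

Because the synthetic scores are noise-free, the fit recovers the weights used to generate them; with real subjective data the residuals would indicate how well an additive form describes the ratings.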
Chromatic and achromatic zipper
We calculated the median of DL(x,y) over the whole set of zipper segments in each of the two directions and averaged the two values. We performed the same calculations for DC(x,y), obtaining two indicators labeled as DL and DC in what follows. These two indicators, together with the average edge spread (Es) and the percentage of zipper pixels in the image (ZpA), were used to calculate the overall metric.
Metric parameter estimation
The Pearson and Spearman coefficients above 0.98 indicate that our metric is highly accurate and monotonic. The corresponding coefficients of the DIPSNR metric are lower, as PSNR techniques are numerical measures that usually do not correlate well with perceived distortions .
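For reference, the two coefficients measure different properties: Pearson captures linear agreement (accuracy), while Spearman is the Pearson coefficient of the ranks and captures monotonicity. The sketch below computes both with NumPy only; the function names and the toy data are ours and are unrelated to the experimental scores.

```python
import numpy as np

def pearson(x, y):
    """Pearson linear correlation coefficient."""
    return float(np.corrcoef(x, y)[0, 1])

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values))
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy data: a monotonic but non-linear relation between two score sets.
subjective = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
predicted = subjective ** 2

r_pearson = pearson(subjective, predicted)
r_spearman = spearman(subjective, predicted)
```

In this toy case the Spearman coefficient equals 1 because the relation is perfectly monotonic, while the Pearson coefficient is slightly lower because the relation is not linear.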
In this work we have set up psycho-visual experiments to analyze the subjective evaluation of the artifacts introduced by the demosaicing process. To this end we have generated a dataset of distorted images, applying three CI algorithms combined with two AA algorithms for a total of nine different methods. The data analysis shows that the perceptual quality of demosaiced images mainly depends on perceived sharpness, and on chromatic and achromatic zipper. The defects are more evident when the rendered images are compared with the reference one (the 2S experiment), while they may go unnoticed when images are evaluated alone (the 1S experiment). We have thus defined a no-reference metric for demosaicing artifacts, based on measures of blurriness and of chromatic and achromatic distortions, that is able to fit these experimental data for both the 1S and 2S experiments. Our metric can be applied to evaluate other demosaicing methods. As future work we plan to perform further test sessions to acquire more data and better analyze the cross-talk between distortion perception and image frequency content.
Details of experimental sessions
The total number of subjects involved in our experiments is 39, divided into three groups: (i) 9 subjects involved in tuning experiments, (ii) 15 subjects involved in the 1S experiments (both preliminary and test sessions), (iii) 15 subjects involved in the 2S experiments (both preliminary and test sessions).
Note that each subject only belongs to one group. Each subject has been individually briefed about the modality of the experiment in which he has been involved.
All the images utilized for the psycho-visual tests were cropped to fit the dimension of the screen. In particular, to avoid the undersampling of the images used in the 2S tests, we have cropped all the images to fit a 600×600 box, producing respectively images of 600×512 or 512×600. The remaining part of the box has the same color of the background (Figure 3). Each image has been cropped manually to keep the relevant part of the scene centered, to avoid interferences in the user’s judgment, due to a non significant cropping.
The subjects assume and maintain the correct position and distance from the monitor for the duration of the experiment.
30 min is the maximum duration of the test for each subject. For longer periods attention decreases and subjects tend to get tired.
In the case of the 2S test, where the two images are compared, the sliders and the quality scales must appear simultaneously on the screen.
Based on the comments and considerations of the subjects involved in this tuning session, we have determined the minimum image visualization time that permits an appropriate quality evaluation.
During a preliminary test, each subject was implicitly trained about the nature of the distortion he was going to evaluate. In particular, he was trained about the range of the distortion intensity. These preliminary sessions were necessary to keep this training phase out of the actual test, where it would have conditioned the experimental results. We had preliminary sessions for all the subjects involved (except for the 9 subjects involved in the tuning phase) and for each of the experiments (1S and 2S). Four images were chosen from the entire database. The demosaicing algorithms applied to these images were the Bilinear and the ST proprietary. We decided to apply these two algorithms because they were expected to be the worst and the best ones. In this way the subjects experience the entire distortion range before starting the actual test.
JND inversions: The subject is not able to distinguish between the original and the distorted image. The inversion is unintentional.
Preferential inversions: The subject prefers the elaborated image.
Error inversions: The subject does not properly use the interface and, in particular, assigns a wrong value in the quality scale.
The JND threshold is estimated with a Pairwise Comparison (PC) test ;
Inversions that produce values under the JND threshold (JND inversions) are taken into account in the final analysis;
Inversions that produce values over the JND threshold are considered as error inversions. Their absolute values are taken into account in the final analysis.
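The rule in the list above can be sketched as a small function. The sketch assumes that an inversion shows up as a negative Difference Score, that JND inversions keep their signed value, and that a value exactly at the threshold counts as a JND inversion; the threshold value and the names are hypothetical. Note that the threshold alone does not separate preferential inversions from error inversions.

```python
def classify_difference_score(ds, jnd_threshold):
    """Return (label, value used in the analysis) for one Difference Score."""
    if ds >= 0:
        return "no_inversion", ds
    if abs(ds) <= jnd_threshold:
        # Small negative value: original and distorted image are not distinguished.
        return "jnd_inversion", ds
    # Large negative value: treated as an interface error, absolute value is used.
    return "error_inversion", abs(ds)

JND_THRESHOLD = 5.0  # hypothetical value, in score units
example_scores = [12.0, -3.0, -20.0]
results = [classify_difference_score(ds, JND_THRESHOLD) for ds in example_scores]
```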
How to treat the error inversions?
The error inversions cannot be common to different subjects. They are anomalous values with respect to the score distribution of each algorithm. We are not interested in finding the error inversions; we only want to verify that they do not alter the data analysis. To this end we have validated the final rank of the algorithms (R_t in Equation 3, which is a mean measure) also with the analysis of the median of the Difference Score, which is more robust to noise.
How to treat the preferential inversions?
Maintaining the preferential inversions, the DS measure cannot be further considered as a distance between the reference image and the distorted one with respect to the analyzed artifact (zipper artifact), as we have previously discussed. The influence of these inversions appears to be different in the case of 1S and 2S tests. In fact, the effect of the introduced sharpness is lower in the case of the 1S test because there is not a simultaneous comparison with the original image. Thus, the analysis of the 1S test results with respect to the 2S ones can be useful for evaluating this phenomenon.
1 The equations are reported only for the horizontal case. In the calculation of the differences we excluded the non-zipper pixels. ΔE76 is the standard Euclidean distance between the L*a*b* coordinates of the adjacent pixels.
Our investigation was performed as part of an ST Microelectronics research contract. The authors thank ST Microelectronics for the permission to present this paper.
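As a minimal sketch of the color difference mentioned in the note above, the following computes ΔE76 between horizontally adjacent pixels of an image already expressed in L*a*b* coordinates. The function names are ours, and the sketch does not include the restriction to zipper pixels.

```python
import numpy as np

def delta_e76(lab1, lab2):
    """CIE 1976 color difference: Euclidean distance in L*a*b* space."""
    diff = np.asarray(lab1, dtype=float) - np.asarray(lab2, dtype=float)
    return np.sqrt((diff ** 2).sum(axis=-1))

def horizontal_delta_e(lab_image):
    """Delta E76 between each pixel and its left neighbor (horizontal case)."""
    return delta_e76(lab_image[:, 1:], lab_image[:, :-1])

single = delta_e76([50.0, 0.0, 0.0], [53.0, 4.0, 0.0])

# Tiny L*a*b* image: 2 rows, 3 columns, alternating lightness along each row.
lab_image = np.zeros((2, 3, 3))
lab_image[:, 1, 0] = 10.0
differences = horizontal_delta_e(lab_image)
```

The vertical case follows by differencing along the first axis instead of the second.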
- Engeldrum PG: Psychometric scaling: avoiding the pitfalls and hazards. IS&T’s 2001 PICS Conference Proceedings (Montreal Quebec Canada, vol. 4, 101–107, 2001)
- Final Report from the Video Quality Experts Group on the Validation of Objective Models of Video Quality ITU-T Study Group 9 Contribution 80 (2000)
- Recommendation 500-11: Methodology for the Subjective Assessment of the Quality for Television Pictures. ITU-R Rec. BT, 500 (2002)
- Bartelson J: The combined influence of sharpness and graininess on the quality of colour prints. J. Photogr. Sci 1982, 30: 33-38.
- Lukac R: Single-Sensor Imaging: Methods and Applications for Digital Cameras. (Boca Raton: CRC Press, 2008)
- Lu W, Tan Y: Color filter array demosaicing: new method and performance measures. Image Process. IEEE 12: 1194-1210. (2003)
- Mariziliano P, Dufaux F, Winkler S, Ebrahimi T: Perceptual blur and ringing metrics: application to jpeg2000. Signal Process. Image Commun 19: 163-172. (2004)
- Ferzli R, Karam L: A no-reference objective image sharpness metric based on the notion of just noticeable blur (JNB). IEEE Trans. Image Process 18(4):717-728. (2009)
- Liu Y, Lin Y, Chien S: A no-reference quality evaluation method for CFA Demosaicking. Digest of Technical Papers International Conference on Consumer Electronics (ICCE) 1: 365-3668. (2010)
- Gunturk B, Glotzbach J, Altunbasak Y, Schafer R, Mersereau R: Demosaicking: color filter array interpolation in single chip digital cameras. IEEE Signal Process. Mag 22: 44-54. (2005)
- Bayer B: Color imaging array. 1976.
- Cok DR: Signal processing method and apparatus for producing interpolated chrominance values in a sampled color image signal. U.S. patent 4 642 678, 1986
- Kimmel R: Demosaicing: image reconstruction from color CCD samples. IEEE Trans. Image Process 8: 548-258. (1999)
- Freeman TW: Median filter for reconstructing missing color samples. U.S. Patent 4724395, 1998
- Brainard DH, Sherman D: Reconstructing image from trichromatic samples: from basic research to practical applications. IS&T/SID Color Imaging Conference (Scottsdale, AZ, 4–10 1995)
- Leung B, Jeon G, Dubois E: Least-squares luma-chroma demultiplexing algorithm for bayer demosaicking. IEEE Trans. Image Process 20(7):1885-1894. (2010)
- Pei S, Tam I: Effective color interpolation in ccd color filter arrays using signal correlation. IEEE Trans. Circuits Syst. Video Technol 13(6):503-513. (2003)
- Lia X, Gunturk B, Zhang L: Image demosaicing: a systematic survey. Proc. SPIE 6822: 68221J-68221J15. (2003)
- Chung K, Yang W, Yan W, Wang C: Demosaicing of color filter array captured images using gradient edge detection masks and adaptive heterogeneity-projection. IEEE Trans. Image Process 17(12):2356-2367. (2008)
- Zhang L, Wu X: Color demosaicking via directional linear minimum mean square-error estimation. 14 12: 2167-2178. (2005)
- Chung K, Chan Y: A low complexity color demosaicing algorithm based on integrated gradient. J. Electron. Imaging 19(2):0211041-02110415. (2010)
- Rehman A, Shao L: Classification-based de-mosaicing for digital cameras. Neurocomputing 83: 222-228. (2012)
- Smith SG: Color image restoration with anti-alias. 2005.
- Guarnera M, Messina G, Tommaselli V, Bruna A: Directionally filter based demosaicing with integrated antialiasing. International Conference on Consumer Electronics, ICCE 2008. Digest of Technical Papers 2008, 1-2.
- Sheikh H, Sabir M, Bovik A: A statistical evaluation of recent full reference image quality assessment algorithms. Image Process. IEEE 15: 3440-3451. (2006)
- Allen E, Triantaphillidou S, Jacobson R: Image quality comparison between JPEG and JPEG2000. I. Psychophysical investigation. J. Imaging Sci. Technol 51: 548-258. (2007)
- Nyman G, Häkkinen J, Koivisto EM, Leisti T, Lindroos P, Orenius O, Virtanen T, Vuori T: Evaluation of the visual performance of image processing pipes: information value of subjective image attributes. Proceedings of SPIE-IS&T Electronic Imaging (San Jose, California, vol. 7529, 752905-1–752905-10 2010)
- Leisti T, Radun J, Virtanen T, Halonen R, Nyman G: Subjective experience of image quality: attributes, definitions and decision making of subjective image quality. Proceedings of SPIE-IS&T Electronic Imaging (San Jose, California, vol. 7242, 72420D-1–72420D-9, 2009)
- Radun J, Leisti T, Häkkinen J, Ojanen H, Olives J, Vuori T, Nyman G: Content and quality: interpretation-based estimation of image quality. ACM Trans. Appl. Percept 2008, 4.
- Kayargadde V, Martens J: Perceptual characterization of images degraded by blur and noise: model. J. Opt. Soc. Am. A 13: 1178-1188. (1996)
- Johnson GM, Fairchild MD: Sharpness rules. Proceedings of IS&T/SID 8th Color Imaging Conference (Scottsdale, 24–30, 2000)
- Cardaci M, Di Gesù V, Petrou M, Tabacchi ME: A fuzzy approach to the evaluation of image complexity. Fuzzy Sets Syst 160(10):1474-1484. (2009)
- Pappas TN, Safranek RJ, Chen J: Handbook of Image and Video Processing, Perceptual Criteria for Image Quality Evaluation. (San Diego: Academic Press, 939–959, 2005)
- Sharma G: Digital Color Imaging Handbook. (CRC Press 2002)
- Thurstone L: A law of comparative judgement. Psychol. Rev 34: 273-286. (1927)
- Longere P, Xuemei Z, Delahunt P, Brainard D: Perceptual assessment of demosaicing algorithm performance. Proceedings of the IEEE 90(1):123-132. (2002)
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.