Analysis and processing of pixel binning for color image sensor

Abstract

Pixel binning refers to the concept of combining the electrical charges of neighboring pixels together to form a superpixel. The main benefit of this technique is that the combined charges overcome the read noise, at the sacrifice of spatial resolution. Binning in color image sensors results in superpixel Bayer pattern data, and subsequent demosaicking yields the final, lower resolution, less noisy image. It is common knowledge among practitioners and camera manufacturers, however, that binning introduces severe artifacts. The in-depth analysis in this article proves that these artifacts are far worse than the ones stemming from loss of resolution or demosaicking, and therefore cannot be eliminated simply by increasing the sensor resolution. By accurately characterizing the binned sensor data, we propose a post-capture binning data processing solution that succeeds in suppressing noise and preserving image details. We verify experimentally that the proposed method outperforms the existing alternatives by a substantial margin.

1 Introduction

Recent progress in digital camera technology has had an extraordinary impact on numerous electronics industries, including mobile phones, security, automotive, bioengineering, and computer vision systems. In many applications, sensor resolution has exceeded the optical resolution, meaning that the additional hardware complexity needed to increase pixel density would not necessarily result in large image quality gains. The significant improvement in sensor sensitivity has allowed cameras to operate in lighting conditions that were unthinkable with film cameras.

Despite increased sensitivity, however, noise remains a serious problem in modern image sensors. Available technologies for reducing noise in hardware include backside illuminated architecture [1, 2], color filters with higher transmittance [3, 4], and pixel binning [5–7]. Processing techniques at our disposal include image denoising [8–10], joint denoising and demosaicking [11–14], image deblurring [15, 16] (long shutter to compensate for light), and single-shot high dynamic range imaging [17].

The goal of this article is to provide a comprehensive characterization of pixel binning for color image sensors, and to propose post-capture signal processing steps aimed at eliminating the binning artifacts. Binning refers to the concept of combining the electrical charges of neighboring pixels together to form a superpixel. The combined signal is then amplified by a source follower and converted into digital values by an analog-to-digital converter. The main benefit of this technique is that the combined charges overcome the read noise, even if the individual pixel values are small. The improved noise performance comes at the price of spatial resolution loss, however. Binning in color image sensors is complicated by the presence of the color filter array (CFA). Data are typically obtained via a single CCD or CMOS sensor with a CFA spatial subsampling procedure, a physical construction whereby each pixel location measures only a single color. Figures 1a,b show the most well known CFA scheme, called the Bayer pattern, which involves red, green, and blue filters. To maintain the fidelity of color, binning in color image sensors is performed by combining neighboring pixels with the same color filter. As evidenced by the two well known binning configurations shown in Figures 1a,b, the resultant superpixels form a Bayer pattern, as shown in Figure 1c. The subsequent demosaicking algorithm (the process of interpolating to recover the full RGB representation of the image from the CFA subsampled sensor data) yields the final, lower resolution, less noisy image.

Figure 1

Commonly used binning schemes. Binning refers to the concept of combining the electrical charges of neighboring pixels together to form a superpixel. (a–b) The numbers over the high resolution Bayer pattern indicate which pixels are combined together. (c) The resultant superpixel Bayer pattern, where the numbers indicate the relative locations of the combined pixels (for [7] and [5]).

However, it is common knowledge among practitioners and camera manufacturers that binning introduces pixelization artifacts. An example is shown in Figure 2. As will be made clear in the sequel, these artifacts differ from the ones stemming from loss of resolution, and therefore cannot be eliminated simply by increasing the sensor resolution. An in-depth analysis of the sampling scheme implied by binning proves that a gross mismatch between binning and demosaicking is at fault for the severe pixelization. Hence the right way to correct this problem is to design a binning-aware demosaicking algorithm. The proposed method still draws from established demosaicking principles, but with profound differences in the way spatially high-frequency components are handled. To the best of the authors' knowledge, this is the first major article to examine the pixel binning problem in color image sensors from a signal processing perspective, and to provide a post-capture processing solution that corrects for the pixelization artifacts.

Figure 2

Binning vs. no binning. Compared to no binning, binning succeeds in reducing noise. However, the pixelization and zippering artifacts deteriorate the image quality. (a) Reconstruction from full resolution CFA; (b) reconstruction from Kodak PIXELUX scheme of Figure 1a; (c) reconstruction from PhaseOne scheme of Figure 1b.

The remainder of this article is organized as follows. We begin by briefly reviewing CFA sampling and demosaicking in Section 2. Section 3 provides a rigorous analysis of binning. A novel binning-aware demosaicking technique is developed in Section 4. We experimentally verify its effectiveness in Section 5 before making concluding remarks in Section 6.

2 Background

2.1 CFA sampling

Thanks to the seminal work of [18] and further investigations by [19–21], CFA sampling is well characterized and understood. The key insight is the two-dimensional Fourier analysis of CFA-sampled sensor data, which reveals that the signal is preserved by an efficient space-color representation. Specifically, let $x:\mathbb{Z}^2 \to \mathbb{R}^3$, where $x(n) = [x_r(n), x_g(n), x_b(n)]^T$ corresponds to the RGB tri-stimulus value at location $n \in \mathbb{Z}^2$. Then the CFA-subsampled data has the following form:

$$
y(n) = c(n)^T x(n)
     = c(n)^T
       \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 1 \end{bmatrix}
       \begin{bmatrix} 0 & 1 & 0 \\ 1 & -1 & 0 \\ 0 & -1 & 1 \end{bmatrix}
       x(n)
     = \begin{bmatrix} 1 & c_\alpha(n) & c_\beta(n) \end{bmatrix}
       \begin{bmatrix} x_g(n) \\ x_\alpha(n) \\ x_\beta(n) \end{bmatrix},
\tag{1}
$$

where $c:\mathbb{Z}^2 \to [0,1]^3$ denotes the translucency of the CFA at location $n$. The advantage of this representation is that the difference images $x_\alpha = x_r - x_g$ and $x_\beta = x_b - x_g$ enjoy rapid spectral decay and can serve as a proxy for chrominance. On the other hand, the “baseband” green image $x_g$ can be taken to approximate luminance. As our eventual image recovery task will be to approximate the true color image triple $x(n)$ from the acquired sensor data $y(n)$, note that recovering either representation ($\{x_r, x_g, x_b\}$ or $\{x_g, x_\alpha, x_\beta\}$) is equivalent. Moreover, the representation of (1) allows us to re-cast the pure-color sampling structure in terms of sampling structures $c_\alpha$ and $c_\beta$ associated with the difference channels $x_\alpha$ and $x_\beta$. For a more extensive investigation of the bandlimitedness assumptions on $\{x_g, x_\alpha, x_\beta\}$, see [18–20].
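The equivalence of the two representations can be sanity-checked numerically. The following is a minimal numpy sketch of ours (not from the article): for each Bayer translucency $c(n)$, a pure red, green, or blue filter, the sensor value $c(n)^T x(n)$ equals $[1, c_\alpha(n), c_\beta(n)]$ applied to $[x_g, x_\alpha, x_\beta]^T$.

```python
import numpy as np

# Arbitrary RGB tri-stimulus value at one pixel location.
x_r, x_g, x_b = 0.7, 0.5, 0.2
x = np.array([x_r, x_g, x_b])

# Difference-channel ("space-color") representation of the same pixel.
x_alpha, x_beta = x_r - x_g, x_b - x_g
v = np.array([x_g, x_alpha, x_beta])

# Bayer translucencies: a pure red, green, or blue filter at each site,
# paired with the corresponding (c_alpha, c_beta) sampling values.
cases = {
    "red":   (np.array([1.0, 0.0, 0.0]), 1.0, 0.0),
    "green": (np.array([0.0, 1.0, 0.0]), 0.0, 0.0),
    "blue":  (np.array([0.0, 0.0, 1.0]), 0.0, 1.0),
}

for name, (c, c_alpha, c_beta) in cases.items():
    y_direct = c @ x                               # c(n)^T x(n)
    y_repr = np.array([1.0, c_alpha, c_beta]) @ v  # space-color form of (1)
    assert np.isclose(y_direct, y_repr), name
```

The check confirms that no information is lost by working with $\{x_g, x_\alpha, x_\beta\}$ instead of $\{x_r, x_g, x_b\}$.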

Denote by uppercase letters the discrete-space Fourier transforms, and let $\omega = (\omega_1, \omega_2)^T \in \{\mathbb{R}/(2\pi)\}^2$ (where $\mathbb{R}/(2\pi)$ denotes the quotient group of $\mathbb{R}$ by the subgroup $2\pi\mathbb{Z}$) be the two-dimensional Fourier index. Then the Fourier analysis of the CFA is:

$$
C_\alpha(\omega) = \sum_{\lambda \in \pi\mathbb{Z}_2^2} \frac{\delta(\omega - \lambda)}{4}, \qquad
C_\beta(\omega) = \sum_{\lambda \in \pi\mathbb{Z}_2^2} e^{-j\lambda^T \left[\begin{smallmatrix}1\\1\end{smallmatrix}\right]}\, \frac{\delta(\omega - \lambda)}{4},
$$

where $\delta(\cdot)$ is the Dirac delta function, and $\mathbb{Z}_2$ denotes the cyclic group of order 2. Note that the phase-shift term in $C_\beta$ arises from the position of the blue pixels relative to the red (the origin is assumed to be on a red pixel). The corresponding Fourier analysis of the sensor data $y$ takes the following form:

$$
Y(\omega) = X_g(\omega) + C_\alpha(\omega) \ast X_\alpha(\omega) + C_\beta(\omega) \ast X_\beta(\omega)
          = X_g(\omega) + \sum_{\lambda \in \pi\mathbb{Z}_2^2} \frac{X_\alpha(\omega - \lambda) + e^{-j\lambda^T \left[\begin{smallmatrix}1\\1\end{smallmatrix}\right]}\, X_\beta(\omega - \lambda)}{4},
\tag{2}
$$

where $\ast$ denotes convolution. The Fourier support of the resultant sensor signal is shown in Figure 3.

Figure 3

Idealized spectral support of a color image acquired under the Bayer pattern. In each figure, the horizontal and vertical axes span $[-\pi,\pi)^2$ of the Fourier index, and DC is located at the center of the figure. Solid lines indicate the baseband signals, while the replicated spectra drawn with dashed lines arise as a result of CFA sampling. Black and red lines correspond to the support of the luminance and chrominance images, respectively. Alias-inducing chrominance replications are shown for (a) radially symmetric luminance, (b) vertical-feature luminance, (c) horizontal-feature luminance.

2.2 Demosaicking

Most demosaicking algorithms described in the literature make use (either implicitly or explicitly) of correlation structure in the spatial frequency domain, often in the form of local sparsity or directional filtering [14, 19, 21–23]. As noted in our earlier discussion, the set of carrier frequencies induced by $c_\alpha$ and $c_\beta$ includes $[\pi,0]^T$ and $[0,\pi]^T$, locations that are particularly susceptible to aliasing by horizontal and vertical edges. Figures 3b,c indicate these scenarios, respectively; it may be seen that in contrast to the radially symmetric baseband spectrum of Figure 3a, chrominance–luminance aliasing occurs along one of either the horizontal or vertical axes. However, successful reconstruction can still occur if a noncorrupted copy of this chrominance information is recovered, thereby explaining the popularity of (nonlinear) directional filtering steps [19, 21–23]. We can, therefore, view the CFA design problem as one of spatial-frequency multiplexing, and the CFA demosaicking problem as one of demultiplexing to recover subcarriers, with spectral aliasing given the interpretation of “cross talk” [19].

In order to carry out this demultiplexing, signal-adaptive demosaicking methods take the scenarios of Figure 3a–c into account. Typically, this is carried out by first filtering in both horizontal and vertical directions to yield reconstructions $\hat{x}_h$ and $\hat{x}_v$, respectively, and then taking their convex combination to yield the final result:

$$
\hat{x}_\tau(n) = \tau(n)\, \hat{x}_h(n) + (1 - \tau(n))\, \hat{x}_v(n),
\tag{3}
$$

where $\tau \in [0,1]$ is a set of weights. Based on models of “natural image” behavior, various policies for determining the appropriate weights have been developed [14, 19, 21–23]. For example, the weight combination should maximize the homogeneity $u_{\hat{x}}(n)$, defined as the fraction of pixels in the neighborhood of $n$ (denoted $\eta(n)$) that are similar to $\hat{x}(n)$ [22]:

$$
u_{\hat{x}}(n) = \frac{\#\{ m \in \eta(n) : d(\hat{x}(n), \hat{x}(m)) < \varepsilon \}}{\#\{\eta(n)\}},
\tag{4}
$$

where $d(\cdot,\cdot)$ is some distance metric and $\varepsilon$ is a tolerance parameter.
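The homogeneity measure of (4) can be sketched in a few lines. The snippet below is our illustrative implementation (not the article's code), assuming a square neighborhood of the given radius and a Euclidean RGB distance, which is one of several metrics $d(\cdot,\cdot)$ one might plug in.

```python
import numpy as np

def homogeneity(x_hat, n, radius=1, eps=0.05):
    """Fraction of pixels in the neighborhood eta(n) whose values lie
    within eps of x_hat[n], as in Equation (4). Sketch only: square
    neighborhood, Euclidean RGB distance."""
    i, j = n
    h, w = x_hat.shape[:2]
    count, total = 0, 0
    for di in range(-radius, radius + 1):
        for dj in range(-radius, radius + 1):
            mi, mj = i + di, j + dj
            if 0 <= mi < h and 0 <= mj < w:   # clip eta(n) at the border
                total += 1
                if np.linalg.norm(x_hat[mi, mj] - x_hat[i, j]) < eps:
                    count += 1
    return count / total

# A perfectly flat patch is maximally homogeneous.
flat = np.full((5, 5, 3), 0.5)
assert homogeneity(flat, (2, 2)) == 1.0
```

In a flat region every neighbor passes the tolerance test and the homogeneity is 1; near an edge, pixels on the far side of the edge fail the test and the measure drops.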

3 Analysis of binning

Let us rigorously analyze the effects that binning has on the acquired sensor data. We begin in Section 3.1 with a brief review of the signal-to-noise ratio (SNR) gains that binning is expected to provide [24], the main motivation behind binning. An in-depth analysis in Section 3.2 will prove that the combination of binning and demosaicking results in a loss of resolution that is far worse than commonly believed. Section 3.3 offers an alternative perspective that paves a path towards recovering artifact-free images.

3.1 Signal measurement uncertainty

There are at least three types of noise that contribute to the overall error. “Shot noise” is due to the stochasticity of the photon arrival process, and it is well modeled by a Poisson distribution. The dark current stemming from in-circuit electron excitation results in “thermal noise,” whose power is proportional to the exposure time. Finally, the source follower and analog-to-digital converter introduce homoscedastic noise known as the “read noise.” The overall SNR of the captured image is well modeled by:

$$
\mathrm{SNR}_{\mathrm{pix}} := 20\log_{10}(Q \cdot t \cdot y) - 10\log_{10}(Q \cdot t \cdot y + D \cdot t + N),
\tag{5}
$$

where t is the exposure time, Q is the quantum efficiency constant, D is the dark current constant, and N is the read noise power.

Owing to the fact that the image sensor resolution exceeds the optical resolution in many applications, binning is an attractive way to trade off the excess spatial resolution for gains in SNR. It is instructive first to consider summing $M$ pixel values digitally, post-acquisition. The signal $y$ is boosted $M$-fold while the noise power increases $M$ times, resulting in an overall $10\log_{10}(M)$ dB gain:

$$
\mathrm{SNR}_{\mathrm{sum}} := 20\log_{10}(M \cdot Q \cdot t \cdot y) - 10\log_{10}(M \cdot Q \cdot t \cdot y + M \cdot D \cdot t + M \cdot N)
= \mathrm{SNR}_{\mathrm{pix}} + 10\log_{10}(M) \geq \mathrm{SNR}_{\mathrm{pix}}.
\tag{6}
$$

Combining electrical charges of neighboring pixels to form a superpixel in hardware offers advantages over simply summing pixels digitally. The main difference is that when the electrical charges are combined before source follower and analog-to-digital converter, the uncertainty due to read noise remains constant. The corresponding SNR is:

$$
\mathrm{SNR}_{\mathrm{bin}} = 20\log_{10}(M \cdot Q \cdot t \cdot y) - 10\log_{10}(M \cdot Q \cdot t \cdot y + M \cdot D \cdot t + N) \geq \mathrm{SNR}_{\mathrm{sum}}.
\tag{7}
$$

As illustrated by the example in Figure 4, the difference between $\mathrm{SNR}_{\mathrm{bin}}$ and $\mathrm{SNR}_{\mathrm{sum}}$ is more noticeable when the signal intensity $y$ becomes small and the read noise $N$ becomes dominant, meaning that binning is most effective in low-light ranges.
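The three SNR models (5)–(7) are easy to compare numerically. The sketch below uses the constants listed in Figure 4 and checks the ordering $\mathrm{SNR}_{\mathrm{bin}} \geq \mathrm{SNR}_{\mathrm{sum}} \geq \mathrm{SNR}_{\mathrm{pix}}$; the signal values $y$ are arbitrary choices of ours for illustration.

```python
import numpy as np

# Constants from Figure 4: M pixels binned, quantum efficiency Q,
# exposure t (s), dark current D, read noise power N.
M, Q, t, D, N = 4, 0.70, 1 / 100, 0.1, 10.0

def snr_pix(y):   # Equation (5): a single pixel
    return 20 * np.log10(Q * t * y) - 10 * np.log10(Q * t * y + D * t + N)

def snr_sum(y):   # Equation (6): digital post-acquisition sum of M pixels
    return 20 * np.log10(M * Q * t * y) - 10 * np.log10(M * (Q * t * y + D * t + N))

def snr_bin(y):   # Equation (7): charge-domain binning; read noise counted once
    return 20 * np.log10(M * Q * t * y) - 10 * np.log10(M * Q * t * y + M * D * t + N)

for y in [1e2, 1e3, 1e4]:   # illustrative signal intensities
    assert snr_bin(y) >= snr_sum(y) >= snr_pix(y)
```

Evaluating the gap `snr_bin(y) - snr_sum(y)` at decreasing `y` reproduces the trend stated above: the advantage of charge-domain binning grows as the read noise term dominates.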

Figure 4

SNR as a function of signal intensity. Here, $M = 4$, $Q = 0.70$, $t = 1/100$ s, $D = 0.1$ electrons/pixel/second, and $N = 10$ electrons rms/pixel [24]. See (5)–(7).

3.2 Binning “sampling”

Because binning combines the electrical charges of $M$ neighboring pixels, each pixel cannot be shared by more than one superpixel. Moreover, the charges can be combined by summation only (i.e., no fractional combinations). As such, the options for binning schemes are fairly limited. Furthermore, the superpixels produced by pixel binning in color image sensors form a Bayer pattern that requires the additional step of demosaicking to recover the full-color, low-resolution image. We will show that the superpixel Bayer pattern suffers from many problems that the pixel-level Bayer pattern does not, leading to the conclusion that combining pixel binning and demosaicking is the wrong approach.

Consider Kodak PIXELUX, the most widely used binning scheme, illustrated in Figures 1a,c [7]. It combines four neighboring pixel values to form one superpixel. This process of combining neighboring pixels to form a single superpixel is equivalent to applying a convolution operator followed by downsampling:

  • Filtering: let $h_{\mathrm{bin}}$ denote the filter coefficients

    $$
    h_{\mathrm{bin}}(n) = \Delta\!\left(n - \left[\begin{smallmatrix}1\\1\end{smallmatrix}\right]\right) + \Delta\!\left(n - \left[\begin{smallmatrix}1\\-1\end{smallmatrix}\right]\right) + \Delta\!\left(n - \left[\begin{smallmatrix}-1\\1\end{smallmatrix}\right]\right) + \Delta\!\left(n - \left[\begin{smallmatrix}-1\\-1\end{smallmatrix}\right]\right),
    \tag{8}
    $$

    where $\Delta(\cdot)$ denotes the Kronecker delta function. Then the charge summation in PIXELUX is

    $$
    y_{\mathrm{bin}}(n) = y(n) \ast h_{\mathrm{bin}}(n).
    $$

  • Downsampling: to yield the superpixel Bayer pattern data $s$, do

    $$
    s(2n) = y_{\mathrm{bin}}(4n), \qquad
    s\!\left(2n + \left[\begin{smallmatrix}0\\1\end{smallmatrix}\right]\right) = y_{\mathrm{bin}}\!\left(4n + \left[\begin{smallmatrix}0\\1\end{smallmatrix}\right]\right), \qquad
    s\!\left(2n + \left[\begin{smallmatrix}1\\0\end{smallmatrix}\right]\right) = y_{\mathrm{bin}}\!\left(4n + \left[\begin{smallmatrix}1\\0\end{smallmatrix}\right]\right), \qquad
    s\!\left(2n + \left[\begin{smallmatrix}1\\1\end{smallmatrix}\right]\right) = y_{\mathrm{bin}}\!\left(4n + \left[\begin{smallmatrix}1\\1\end{smallmatrix}\right]\right).
    \tag{9}
    $$
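Charge binning happens in hardware, but it can be emulated digitally for experimentation. The following numpy sketch is our own toy emulation, assuming the four combined same-color pixels of each superpixel sit two apart on the Bayer grid and that the superpixel colors keep their Bayer arrangement; it is not the authors' implementation.

```python
import numpy as np

def pixelux_bin(y):
    """Digitally emulate PIXELUX-style binning on full-resolution Bayer
    data y: each superpixel sums four same-color pixels spaced two apart,
    and the result is a half-resolution superpixel Bayer pattern."""
    h, w = y.shape
    s = np.zeros((h // 2, w // 2))
    for di in (0, 1):          # Bayer phase (row offset)
        for dj in (0, 1):      # Bayer phase (column offset)
            block = y[di::2, dj::2]        # all pixels of one CFA color
            # Sum 2x2 groups of same-color pixels into one superpixel.
            s[di::2, dj::2] = (block[0::2, 0::2] + block[0::2, 1::2] +
                               block[1::2, 0::2] + block[1::2, 1::2])
    return s

rng = np.random.default_rng(0)
y = rng.random((16, 16))       # toy full-resolution Bayer CFA data
s = pixelux_bin(y)
assert s.shape == (8, 8)
assert np.isclose(s.sum(), y.sum())   # every charge is used exactly once
```

The charge-conservation check mirrors the constraint stated above: each pixel contributes to exactly one superpixel, with no fractional combinations.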

Note that the downsampling implied by (9) is non-uniform: the spatial relationships between samples are changed by the different relative shifts applied to each superpixel (contrast this with (11) below). The Fourier transform of $s$ is (derived in Appendix 2):

$$
S(\omega) \approx \sum_{\lambda \in \pi\mathbb{Z}_2^2} \frac{X_\alpha\!\left(\tfrac{\omega-\lambda}{2}\right) + \overbrace{e^{-j\frac{\omega}{2}^T\left[\begin{smallmatrix}1\\1\end{smallmatrix}\right]}}^{\text{unwanted filter}}\, e^{-j\frac{\lambda}{2}^T\left[\begin{smallmatrix}1\\1\end{smallmatrix}\right]}\, X_\beta\!\left(\tfrac{\omega-\lambda}{2}\right)}{4}
+ \underbrace{\sum_{\theta \in \mathbb{Z}_2^2} \frac{e^{-j\frac{\omega}{2}^T\theta}\, H_{\mathrm{bin}}\!\left(\tfrac{\omega}{2}\right)}{16}}_{\text{unwanted filter}}\, X_g\!\left(\tfrac{\omega}{2}\right)
+ \underbrace{\sum_{\lambda \in \frac{\pi}{2}\mathbb{Z}_4^2 \setminus \left\{\left[\begin{smallmatrix}0\\0\end{smallmatrix}\right]\right\}} \underbrace{\sum_{\theta \in \mathbb{Z}_2^2} \frac{e^{-j\left(\frac{\omega}{2}+\lambda\right)^T\theta}\, H_{\mathrm{bin}}\!\left(\tfrac{\omega}{2}-\lambda\right)}{16}}_{\text{antialias filter}}\, X_g\!\left(\tfrac{\omega}{2}-\lambda\right)}_{\text{aliasing}}.
\tag{10}
$$

The corresponding Fourier support of $S(\omega)$ is shown in Figure 5. Note that the unwanted filter boosts $X_g$ to 16 at DC. The approximate relation above is admitted by the bandlimitedness assumptions on $X_\alpha$ and $X_\beta$:

$$
\frac{H_{\mathrm{bin}}(\omega)\, X_\alpha(\omega - \lambda)}{4} \approx X_\alpha(\omega - \lambda), \qquad
\frac{H_{\mathrm{bin}}(\omega)\, X_\beta(\omega - \lambda)}{4} \approx X_\beta(\omega - \lambda).
$$
Figure 5

Idealized spectral support of the binning-sampled data $S(\omega)$ in (10), corresponding to Figure 1. As before, solid lines indicate the baseband signals, while the spectra drawn with dashed lines arise as a result of CFA sampling. Black and red lines correspond to the support of the luminance and chrominance images, respectively. The blue box represents the original sampling rate.

The main advantage of binning in (9) over (2) is that the signal strengths of the baseband $X_g$ and the chrominance components $X_\alpha$ and $X_\beta$ are boosted fourfold, consistent with the SNR analysis of the previous section. As evidenced by Figure 5a, the Fourier support of (9) closely resembles that of the Bayer pattern in Figure 3a. The superpixel Bayer pattern data in (10) is far from an ideal Bayer pattern representation of the true image $x(n)$ we hope to recover from $s(n)$, however. One distortion we see is the unwanted filtering term $\sum_{\theta \in \mathbb{Z}_2^2} e^{-j\omega^T\theta/2}$ that degrades the baseband luminance/green signal $X_g(\omega)$. Another complication is that the antialiasing is only partially effective, allowing aliasing to corrupt the baseband $X_g(\omega)$ near $\omega = \pm[0, \pi/4]^T$, $\pm[\pi/4, 0]^T$, $\pm[\pi/4, \pi/4]^T$, $\pm[\pi/4, -\pi/4]^T$.

Contrary to the popular belief that Kodak PIXELUX binning results in a 2×2 reduction in resolution, the main conclusion we draw from (10) is that the “Nyquist rate” of this binning scheme is $\pi/4$ due to the high risk of aliasing, implying that the actual resolution loss is 4×4, far worse than the presumed 2×2. Even if this Nyquist rate did not cause problems (e.g., by increasing the sensor resolution), $s$ does not escape the unwanted filtering term in (10); this cannot be eliminated simply by increasing sensor resolution. Hence, when a demosaicking algorithm is applied to the superpixel Bayer pattern data $s$, what is expected is a filtered and aliased image, as we have already seen in Figure 2.

3.3 Binning “subsampling”

Below, we offer an alternative perspective on the analysis of Section 3.2. The analytical results contained herein will provide the basis for the proposed binning-aware demosaicking algorithm. Continuing with the analysis of PIXELUX, consider Figure 6a, which displays data equivalent to the superpixels of Figure 1c. The superpixels are placed at the centers of the four averaged pixels, denoting the implied superpixel positions; all other locations are given the value 0. This data can be represented by applying a convolution operator followed by subsampling, as follows:

Figure 6

Binning subsampling is an alternative interpretation of the binning sampling of Figure 1. (a) Subsampled data $t(n)$ in (11), equivalent to the superpixel Bayer pattern of Figure 1c. (b) Idealized spectral support of the binning-subsampled data $T(\omega)$ in (12). The baseband signal $X_g$ is free of aliasing in the shaded region. As before, solid lines indicate the baseband signals, while the spectra drawn with dashed lines arise as a result of CFA sampling. Black and red lines correspond to the support of the luminance and chrominance images, respectively.

  • Filtering: the charge summation in PIXELUX is

    $$
    y_{\mathrm{bin}}(n) = y(n) \ast h_{\mathrm{bin}}(n).
    $$

  • Subsampling: to yield the binning subsampled data $t$, do

    $$
    t(4n) = y_{\mathrm{bin}}(4n), \qquad
    t\!\left(4n + \left[\begin{smallmatrix}0\\1\end{smallmatrix}\right]\right) = y_{\mathrm{bin}}\!\left(4n + \left[\begin{smallmatrix}0\\1\end{smallmatrix}\right]\right), \qquad
    t\!\left(4n + \left[\begin{smallmatrix}1\\0\end{smallmatrix}\right]\right) = y_{\mathrm{bin}}\!\left(4n + \left[\begin{smallmatrix}1\\0\end{smallmatrix}\right]\right), \qquad
    t\!\left(4n + \left[\begin{smallmatrix}1\\1\end{smallmatrix}\right]\right) = y_{\mathrm{bin}}\!\left(4n + \left[\begin{smallmatrix}1\\1\end{smallmatrix}\right]\right), \qquad
    t(n) = 0 \;\text{otherwise.}
    \tag{11}
    $$
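The subsampling step of (11) keeps the binned values on the full-resolution grid and zeros everything else. A minimal numpy sketch of ours (assuming the four retained Bayer offsets per 4×4 block, as in (11)):

```python
import numpy as np

def binning_subsample(y_bin):
    """Sketch of Equation (11): retain the binned values at the four
    Bayer offsets of every 4x4 block, set all other locations to zero.
    Unlike (9), t(n) lives on the original full-resolution grid."""
    t = np.zeros_like(y_bin)
    for off in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        t[off[0]::4, off[1]::4] = y_bin[off[0]::4, off[1]::4]
    return t

y_bin = np.arange(1, 65, dtype=float).reshape(8, 8)  # toy binned data
t = binning_subsample(y_bin)
assert np.count_nonzero(t) == 16   # exactly 4 of every 16 samples survive
```

Because $t(n)$ preserves the spatial positions of the superpixels, its Fourier analysis avoids the non-uniform shifts of (9), which is what makes (12) tractable.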

With some arithmetic, the Fourier transform of $t$ is deduced to be (see Appendix 1):

$$
T(\omega) \approx \sum_{\lambda \in \frac{\pi}{2}\mathbb{Z}_4^2} \frac{X_\alpha(\omega - \lambda) + e^{-j\lambda^T\left[\begin{smallmatrix}1\\1\end{smallmatrix}\right]}\, X_\beta(\omega - \lambda)}{4}
+ \sum_{\lambda \in \frac{\pi}{2}\mathbb{Z}_4^2} \sum_{\theta \in \mathbb{Z}_2^2} \frac{e^{-j\lambda^T\theta}\, H_{\mathrm{bin}}(\omega - \lambda)\, X_g(\omega - \lambda)}{16}.
\tag{12}
$$

Note that the summation over $\lambda$ suggests 16 modulations. However, for all $\lambda$ except $\lambda \in \frac{\pi}{2}\mathbb{Z}_3^2$ (i.e., whenever either component of $\lambda$ equals $\pi$), the factor $\sum_{\theta \in \mathbb{Z}_2^2} e^{-j\lambda^T\theta}$ is 0, as shown in Figure 7, so only 9 of the 16 modulations survive. The support of this transform is illustrated in Figure 6b. As evidenced by this figure, the modulated baseband signal components $X_g(\omega - \lambda)$ overlap each other almost entirely, that is, they are aliased. However, the shaded regions of Figure 6b are still free of aliasing. Indeed, this uncorrupted portion of the Fourier support is the key to the post-binning processing that is the subject of the next section.
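The claim that only 9 of the 16 modulations survive can be checked directly. The following quick numerical verification is ours, independent of the article's figures:

```python
import numpy as np

# lambda ranges over (pi/2) * {0,1,2,3}^2: the 16 candidate modulation
# frequencies appearing in Equation (12).
lams = [(np.pi / 2 * a, np.pi / 2 * b) for a in range(4) for b in range(4)]
thetas = [(t1, t2) for t1 in (0, 1) for t2 in (0, 1)]

def factor(lam):
    """The scalar factor sum_theta exp(-j lambda^T theta) from (12)."""
    return sum(np.exp(-1j * (lam[0] * t1 + lam[1] * t2)) for t1, t2 in thetas)

surviving = [lam for lam in lams if abs(factor(lam)) > 1e-12]
assert len(surviving) == 9   # the factor vanishes whenever a component is pi
```

The factor separates as $(1 + e^{-j\lambda_1})(1 + e^{-j\lambda_2})$, which is zero exactly when either component equals $\pi$, leaving the $3 \times 3 = 9$ surviving modulations.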

Figure 7

Fourier transform of $\sum_{\theta \in \mathbb{Z}_2^2} e^{-j\lambda^T\theta}\, M_{\mathrm{bin}}(\omega)$.

4 Binning-aware demosaicking

Motivated by the analysis of pixel binning subsampling in (12), we now present a novel binning-aware demosaicking method aimed at recovering the full-color image $x$ without introducing binning artifacts. We accomplish this in three stages.

Step 1: Chrominance estimation

Drawing parallels to [19], we assume that local image features are (approximately) either vertically or horizontally oriented. If this assumption holds, certain subsets of the modulated chrominances in (12) are alias-free conditioned on the vertically or horizontally oriented image features; this is illustrated in Figure 8a. For example, assuming a horizontal feature, an amplitude demodulation recovers the images $x_\alpha$ and $x_\beta$:

$$
\begin{bmatrix} \hat{x}_{\alpha,h} \\ \hat{x}_{\beta,h} \end{bmatrix}
= \underbrace{\begin{bmatrix} 1 & 1 \\ 1 & -1 \\ 1 & j \\ 1 & -j \end{bmatrix}^{\dagger}}_{\text{from (12)}}
\underbrace{\begin{bmatrix}
h_0(n) \ast \left\{ t(n) \cdot e^{-j n^T \left[\begin{smallmatrix}\pi\\ \pi\end{smallmatrix}\right]} \right\} \\
h_0(n) \ast \left\{ t(n) \cdot e^{-j n^T \left[\begin{smallmatrix}0\\ \pi\end{smallmatrix}\right]} \right\} \\
h_0(n) \ast \left\{ t(n) \cdot e^{-j n^T \left[\begin{smallmatrix}\pi/2\\ \pi\end{smallmatrix}\right]} \right\} \\
h_0(n) \ast \left\{ t(n) \cdot e^{-j n^T \left[\begin{smallmatrix}-\pi/2\\ \pi\end{smallmatrix}\right]} \right\}
\end{bmatrix}}_{\text{demodulation}},
\tag{13}
$$
Figure 8

Idealized spectral support of binning subsampled data, at various stages of binning-aware demosaicking. Shaded regions denote filter support. See text.

where $(\cdot)^\dagger$ denotes the matrix pseudo-inverse and $h_0$ is a lowpass filter whose passband matches the support of $X_\alpha$ and $X_\beta$. The reconstruction for vertically oriented image features (denoted $\hat{x}_{\alpha,v}$, $\hat{x}_{\beta,v}$) is the same as (13), but with a 90° rotation.
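The pseudo-inverse step of (13) is a small overdetermined linear solve per pixel. The sketch below (ours) verifies that the 4×2 mixing matrix, with signs as we reconstruct them from the carrier phases $e^{-j\lambda^T[1\;1]^T}$ in (12), is inverted exactly by `numpy.linalg.pinv` when the demodulator outputs are alias-free.

```python
import numpy as np

# Mixing matrix relating (x_alpha, x_beta) to the four demodulated
# carrier outputs at lambda = [pi,pi], [0,pi], [pi/2,pi], [-pi/2,pi];
# the row signs (1, -1, j, -j) follow the phases e^{-j lambda^T [1 1]^T}.
A = np.array([[1,  1],
              [1, -1],
              [1, 1j],
              [1, -1j]])

x_true = np.array([0.3, -0.1])     # (x_alpha, x_beta) at one pixel
measurements = A @ x_true          # ideal alias-free demodulator outputs

# Least-squares recovery via the Moore-Penrose pseudo-inverse.
x_hat = np.linalg.pinv(A) @ measurements
assert np.allclose(x_hat.real, x_true)
```

Because `A` has full column rank, the pseudo-inverse recovers the chrominance pair exactly; with noisy or partially aliased measurements it returns the least-squares fit instead, which is the practical appeal of the formulation.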

Step 2: Luminance filtering

Once $\hat{x}_{\alpha,h}$ and $\hat{x}_{\beta,h}$ are recovered, we compute the green image $\hat{x}_{g,h}$. Subtracting

$$
\sum_{m \in 4\mathbb{Z}^2} \hat{x}_{\alpha,h}(m)\, \Delta(m - n) + \hat{x}_{\beta,h}\!\left(m + \left[\begin{smallmatrix}1\\1\end{smallmatrix}\right]\right) \Delta\!\left(m + \left[\begin{smallmatrix}1\\1\end{smallmatrix}\right] - n\right)
$$

from the subsampled binning data $t(n)$ results in a signal $t_g$ whose Fourier transform is comprised only of $x_g$ (from (12)):

$$
T_g(\omega) = \sum_{\lambda \in \frac{\pi}{2}\mathbb{Z}_4^2} \sum_{\theta \in \mathbb{Z}_2^2} \frac{e^{-j\lambda^T\theta}\, H_{\mathrm{bin}}(\omega - \lambda)\, X_g(\omega - \lambda)}{16}.
\tag{14}
$$

This is illustrated in Figure 8b. To reconstruct the green image $\hat{x}_{g,h}$ from the unaliased (shaded in Figure 8b) portions of $t_g$, we carry out a standard demodulation, as follows:

$$
\hat{x}_{g,h} = \underbrace{h_2(n) \ast \Big\{ \underbrace{f(n) \cdot \underbrace{\{ h_1(n) \ast t_g(n) \}}_{\text{isolate unaliased}}}_{\text{modulation}} \Big\}}_{\text{isolate signal}},
$$

where $h_1$ and $h_2$ are highpass and lowpass filters, respectively, and $f$ is a sum of sinusoids intended for modulation, as follows:

$$
H_1(\omega) = \begin{cases} 1 & \text{if } |\omega_1| > \pi/2 \text{ and } |\omega_2| > \pi/2 \\ 0 & \text{else,} \end{cases} \qquad
H_2(\omega) = \begin{cases} 1 & \text{if } |\omega_1| < \pi/2 \text{ and } |\omega_2| < \pi/2 \\ 0 & \text{else,} \end{cases}
$$
$$
F(\omega) = \sum_{\lambda \in \{\pm\pi\} \times \{\pm\pi\}} \frac{\delta(\omega + \lambda)}{4} \sum_{\theta \in \mathbb{Z}_2^2} e^{-j\lambda^T\theta}.
\tag{15}
$$

As illustrated in Figures 8c,d, the modulation by $f(n)$ not only shifts the spectra, but also creates additional aliased copies. Hence, the filter $h_2$ is needed to attenuate them. The same procedure can be used to find the green image $\hat{x}_{g,v}$ based on $\hat{x}_{\alpha,v}$ and $\hat{x}_{\beta,v}$.

Step 3: Directional selection

Once $\hat{x}_h = \{\hat{x}_{g,h}, \hat{x}_{\alpha,h}, \hat{x}_{\beta,h}\}$ and $\hat{x}_v = \{\hat{x}_{g,v}, \hat{x}_{\alpha,v}, \hat{x}_{\beta,v}\}$ are found, they must be combined to yield the final estimate $\hat{x}_t = \{\hat{x}_g, \hat{x}_\alpha, \hat{x}_\beta\}$ via the convex combination (3). As already mentioned, the directional selection variable $\tau$ has received considerable attention in the literature, and many techniques are available. However, these studies often lack analysis under noise; although binning reduces noise considerably, most directional selection variables are nevertheless sensitive to random perturbations.

To address the problem of directional selection under noise, we modify the $\tau$ criterion used in the popular adaptive homogeneity-directed (AHD) demosaicking method as follows:

$$
\hat{\tau}(n) = \arg\max_{\tau \in [0,1]} u_{\hat{x}_\tau}(n),
\tag{16}
$$
$$
\hat{x}_t(n) = \hat{x}_{\hat{\tau}(n)}(n),
\tag{17}
$$

where $\hat{x}_\tau$ and $u_{\hat{x}}$ are as defined in (3) and (4), respectively. Contrast this with the original AHD formulation, which selected either $\hat{x}_h$ or $\hat{x}_v$ (i.e., $\tau \in \{0,1\}$ instead of $\tau \in [0,1]$) as the final output $\hat{x}_t$. The modified strategy of (16) behaves similarly to the original AHD near the edges of an image, but encourages averaging in flat regions. It was found empirically to be far more robust for directional selection under noise.
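The relaxed selection of (16)–(17) can be sketched with a simple grid search over $\tau$. The snippet below is our illustrative version (not the authors' code), assuming grayscale directional estimates, a 3×3 neighborhood at an interior pixel, and a coarse $\tau$ grid; any finer search or color distance could be substituted.

```python
import numpy as np

def select_tau(x_h, x_v, n, taus=np.linspace(0, 1, 11), eps=0.05):
    """Sketch of (16)-(17): search a grid of convex weights tau and keep
    the one whose blend tau*x_h + (1-tau)*x_v is most homogeneous around
    interior pixel n, where homogeneity is the fraction of the 3x3
    neighborhood within eps of the center (Equation (4))."""
    i, j = n
    best_tau, best_u = 0.0, -1.0
    for tau in taus:
        x_t = tau * x_h + (1 - tau) * x_v
        patch = x_t[i - 1:i + 2, j - 1:j + 2]
        u = np.mean(np.abs(patch - x_t[i, j]) < eps)
        if u > best_u:
            best_tau, best_u = tau, u
    return best_tau

# In a flat noisy region the two directional estimates disagree randomly;
# intermediate tau (averaging) then tends to beat the endpoints {0, 1}.
rng = np.random.default_rng(1)
x_h = 0.5 + 0.02 * rng.standard_normal((5, 5))
x_v = 0.5 + 0.02 * rng.standard_normal((5, 5))
tau = select_tau(x_h, x_v, (2, 2))
assert 0.0 <= tau <= 1.0
```

The design point mirrors the text: near an edge one of the endpoints maximizes homogeneity (recovering original AHD behavior), while in flat regions the search is free to pick an averaging weight that suppresses noise.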

5 Experimental validation

5.1 Setup

The proposed binning-aware demosaicking $\hat{x}_t(n)$ in (16) is compared to four available alternatives ($\hat{x}_s(n)$, $\hat{x}_p(n)$, $\hat{x}_y(n)$, and $\hat{x}_{y_\downarrow}(n)$). The first is a state-of-the-art demosaicking method [19] applied to the superpixels $s(n)$ (i.e., the output of PIXELUX binning):

$$
\hat{x}_s(n) = \mathrm{demosaicking}(s(n)).
$$

The second is the same demosaicking method [19] applied to the PhaseOne binning superpixels $p(n)$:

$$
\hat{x}_p(n) = \mathrm{demosaicking}(p(n)).
$$

The third is the application of the same demosaicking method [19] to the full-resolution CFA $y(n)$ (i.e., without binning):

$$
\hat{x}_y(n) = \mathrm{demosaicking}(y(n)).
$$

The fourth is a simulation of a lower-resolution sensor. Let $x_\downarrow(n)$ denote the downsampled version of the ideal lowpassed (antialiased) image:

$$
x_\downarrow(n) = \{ h_2 \ast x \}(2n).
\tag{18}
$$

The CFA-subsampled data captured by this lower-resolution sensor is then

$$
y_\downarrow(n) = c(n)^T x_\downarrow(n),
$$

where $c:\mathbb{Z}^2 \to [0,1]^3$ is the same CFA translucency used in (1). The application of the same demosaicking method [19] to the lower-resolution CFA $y_\downarrow(n)$ is:

$$
\hat{x}_{y_\downarrow}(n) = \mathrm{demosaicking}(y_\downarrow(n)).
$$

The output images from the proposed method ($\hat{x}_t$) and the full-resolution demosaicking ($\hat{x}_y$) have the same size as the original image $x$. On the other hand, the conventional binning processing is based on superpixel sampling, so the pixel density of $\hat{x}_s$ and $\hat{x}_p$ is just a quarter of that of the original image (the same is true of $\hat{x}_{y_\downarrow}$). Hence, when we compare all results (Figures 9, 10, 11, 12, 13; Table 1), we downsample $\hat{x}_t$ and $\hat{x}_y$ by 2×2 (in the same manner as (18)) so that all results have the same pixel density as the lower-resolution image $x_\downarrow$.

Figure 9

Examples of images used in the experiment (zoomed).

Figure 10

Reconstructed images with various noise levels. The demosaicking method used for comparison is that of [19]. Here, LR = low resolution, DS = downsample, PO = PhaseOne [5], K = Kodak PIXELUX [7].

Figure 11

Reconstructed images with various noise levels. The demosaicking method used for comparison is that of [19]. Here, LR = low resolution, DS = downsample, PO = PhaseOne [5], K = Kodak PIXELUX [7].

Figure 12

Reconstructed images with various noise levels. The demosaicking method used for comparison is that of [19]. Here, LR = low resolution, DS = downsample, PO = PhaseOne [5], K = Kodak PIXELUX [7].

Figure 13

PSNR (red/green/blue pixels combined) of each method. Kodak and PhaseOne refer to the binning methods of [7] and [5], respectively. The demosaicking method used for comparison is that of [19].

Table 1 Reconstruction performance in PSNR with various noise levels

The linear images used in this simulation study are a part of the collection of [25, 26], examples of which are shown in Figure 9. The numerical scores in Table 1 and Figure 13 were obtained by averaging performance over 84 images. Noise is simulated by adding pseudorandom white Gaussian noise to the CFA data $y(n)$, the superpixel CFA data $s(n)$ and $p(n)$, and the lower-resolution CFA data $y_\downarrow(n)$. In the experiments, the 12-bit image data in [25, 26] were renormalized to the range 0–1, meaning that a noise standard deviation of $\sigma_n = 0.01$ corresponds to a standard deviation of 40.96 in a 12-bit camera processing pipeline, etc. Considering the noise models in (5)–(7), one may ask whether such a simplified noise model is appropriate. As evidenced by the analysis in (7), however, the difference between $\mathrm{SNR}_{\mathrm{pix}}$ and $\mathrm{SNR}_{\mathrm{sum}}$ is $M$ (the number of pixels combined together), and the difference between $\mathrm{SNR}_{\mathrm{sum}}$ and $\mathrm{SNR}_{\mathrm{bin}}$ is the read noise power $N$. Hence the SNR gains of binning are attributed only to the signal-independent portion of the noise, and not to the signal-dependent portion. Furthermore, the read noise dominates in the low-light regime. Hence simulated additive white Gaussian noise suffices for experimental verification. The binning subsampled signal $t(n)$ represents the same data as $s(n)$ and is computed by upsampling $s(n)$ (inserting zeros where necessary).

5.2 Results

Example outputs from the five methods ($\hat{x}_y$, $\hat{x}_{y_\downarrow}$, $\hat{x}_s$, $\hat{x}_p$, $\hat{x}_t$) are shown in Figures 10, 11 and 12. As expected, demosaicking applied to a full-resolution CFA ($\hat{x}_y$) has a noisy appearance due to the low SNR of individual pixels. However, edges and image features are clearly defined even after downsampling, thanks to the full-resolution description. Demosaicking applied to superpixel CFAs ($\hat{x}_s$, $\hat{x}_p$), on the other hand, yields the opposite qualities: the noise is significantly reduced owing to the high SNR of binning, but the image suffers from severe artifacts stemming from the aliasing in (10). More specifically, the aliasing in Kodak PIXELUX binning manifests itself as a pixelization artifact, while PhaseOne binning results in zippering artifacts. However, one may argue that the aliasing artifacts in $\hat{x}_p$ become less bothersome at the highest level of noise, because the zippering and the noise become less distinguishable. By contrast, the proposed binning-aware demosaicking method ($\hat{x}_t$) succeeds in suppressing noise while preserving the image features. Of particular interest is the comparison between $\hat{x}_s$ and $\hat{x}_t$: both use Kodak PIXELUX binning, but the proposed method yields drastically improved outcomes. Overall, the proposed method has better visual quality than $\hat{x}_s$ and $\hat{x}_p$ for $\sigma_n < 0.03$, but a slightly noisier appearance at the highest noise level ($\sigma_n = 0.03$). Finally, the output from the low-resolution camera $\hat{x}_{y_\downarrow}$ is robust to both noise and aliasing. This is expected, as the lower-resolution CFA data $y_\downarrow(n)$ does not share the problems that the superpixel CFAs $p(n)$, $s(n)$, $t(n)$ have. However, $\hat{x}_y$ has superior reconstruction over $\hat{x}_{y_\downarrow}$ in the absence of noise ($\sigma_n = 0$). Figure 12 shows an example where none of the reconstruction methods produced a satisfactory output (except for $\hat{x}_y$ under no noise).

The performance is also evaluated in terms of peak SNR, using the downsampled version of the ideal lowpassed (antialiased) image $x_\downarrow$ in (18) as the reference. The results are summarized in Table 1. When there is no noise ($\sigma_n = 0$), the ordinary demosaicking reconstruction $\hat{x}_y$ and the lower-resolution sensor $\hat{x}_{y_\downarrow}$ yield the best results, as expected. However, the proposed $\hat{x}_t$ is a very close third, yielding comparably satisfactory results. The binning result $\hat{x}_s$ is by far the worst due to binning artifacts.

When noise is taken into consideration, the quality of $\hat{x}_y$ suffers greatly, as expected. Even with a noise standard deviation as small as $\sigma_n = 0.005$, the performance of $\hat{x}_y$ deteriorates significantly, while the PSNR performance of $\hat{x}_s$, $\hat{x}_p$, $\hat{x}_t$, and $\hat{x}_{y_\downarrow}$ is far less sensitive to noise. At moderate noise levels ($\sigma_n < 0.03$), the proposed binning-aware demosaicking clearly outperforms the artifact-plagued demosaicking of superpixels. At the largest noise level considered ($\sigma_n = 0.03$), the PSNR performances of $\hat{x}_s$, $\hat{x}_p$, and $\hat{x}_t$ are closer to each other, because the deterioration in the output images is dominated by noise (rather than by artifacts).

The analysis in Figures 10, 11, 12, 13 and Table 1 sheds light on the decades-old debate about resolution versus noise. On one hand, the lower-resolution sensor delivers consistent performance under noise ($\hat{x}_{y_\downarrow}$). On the other hand, Figure 11 shows that under no noise, extra sensor resolution is still desirable. Consider Figure 13. The comparison between the green (low resolution) and red (high resolution) curves is consistent with the image quality of Figures 10 and 11. With the availability of pixel binning, we would instead compare the green curve with the “max function” over the red and blue (binning) curves in Figure 13. Hence one can think of binning as a way to narrow the gap between the red and green curves under noise, without sacrificing the advantages of higher spatial resolution.

6 Conclusion

In this article, we proved via a rigorous analysis of binning sampling that the Kodak PIXELUX binning scheme results in a 4×4 reduction in image resolution, contrary to the popular belief that binning four pixels should result in a 2×2 reduction in resolution. We proposed a binning-aware demosaicking algorithm, based on the Fourier analysis of binning subsampling, that combines the unaliased copies of the Fourier spectra via demodulation. The resulting method succeeds in reconstructing the color image with only a 2×2 resolution loss, that is, it increases the resolution by 2×2 over the traditional approach of applying demosaicking to superpixels. Binning-aware demosaicking also succeeds in suppressing noise and preserving image details. We verified experimentally that it outperforms the alternatives.

Appendix 1: Proof of Fourier representation of binning subsampling

We provide the proof of Equation (12). Let $H_{\mathrm{bin}}$ be the Fourier transform of (8). Then the combination of charges can be represented as:

$$X_r^{\mathrm{bin}}(\omega) = X_r(\omega)\,H_{\mathrm{bin}}(\omega), \qquad X_g^{\mathrm{bin}}(\omega) = X_g(\omega)\,H_{\mathrm{bin}}(\omega), \qquad X_b^{\mathrm{bin}}(\omega) = X_b(\omega)\,H_{\mathrm{bin}}(\omega).$$

Owing to the band-limitedness of $X_\alpha$ and $X_\beta$, the following approximations hold:

$$\begin{aligned}
X_\alpha^{\mathrm{bin}}(\omega) &:= X_r^{\mathrm{bin}}(\omega) - X_g^{\mathrm{bin}}(\omega) = H_{\mathrm{bin}}(\omega)\,X_\alpha(\omega) \approx 4\,X_\alpha(\omega) \\
X_\beta^{\mathrm{bin}}(\omega) &:= X_b^{\mathrm{bin}}(\omega) - X_g^{\mathrm{bin}}(\omega) = H_{\mathrm{bin}}(\omega)\,X_\beta(\omega) \approx 4\,X_\beta(\omega),
\end{aligned} \tag{19}$$

where we used the fact that $H_{\mathrm{bin}}(0) = 4$.
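The DC gain $H_{\mathrm{bin}}(0)=4$ is easy to confirm numerically. The sketch below assumes $h_{\mathrm{bin}}$ has unit taps at the four same-color Bayer offsets $\{0,2\}\times\{0,2\}$ (our reading of the charge-combining filter, not a definition quoted from the article); note that the response also nulls at $\omega_1 = \pi/2$, consistent with the mixed highpass/lowpass character noted in the endnote:

```python
import numpy as np

# Assumed charge-combining filter: four same-color Bayer pixels,
# spaced 2 apart, are summed, so h_bin has unit taps at {0,2} x {0,2}.
h_bin = np.zeros((3, 3))
h_bin[0, 0] = h_bin[0, 2] = h_bin[2, 0] = h_bin[2, 2] = 1.0

# Sample the frequency response H_bin(omega) on a 256 x 256 grid.
H = np.fft.fft2(h_bin, s=(256, 256))
print(abs(H[0, 0]))   # DC gain H_bin(0) = 4, as used in (19)
print(abs(H[64, 0]))  # null at omega_1 = pi/2
```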

Define $m_{\mathrm{bin}}(n) = \sum_{\theta \in 4\mathbb{Z}^2} \delta(n - \theta)$, as illustrated in Figure 14. The binning subsampling data t(n) refers to the concept of combining the electrical charges of four neighboring pixels together to form a superpixel. The process is illustrated in Figure 6a. Mathematically, t(n) can be written as:

$$\begin{aligned}
t(n) &= m_{\mathrm{bin}}(n)\,x_r^{\mathrm{bin}}(n) + m_{\mathrm{bin}}\!\left(n + \begin{bmatrix}1\\0\end{bmatrix}\right) x_g^{\mathrm{bin}}(n) + m_{\mathrm{bin}}\!\left(n + \begin{bmatrix}0\\1\end{bmatrix}\right) x_g^{\mathrm{bin}}(n) + m_{\mathrm{bin}}\!\left(n + \begin{bmatrix}1\\1\end{bmatrix}\right) x_b^{\mathrm{bin}}(n) \\
&= m_{\mathrm{bin}}(n)\,x_\alpha^{\mathrm{bin}}(n) + m_{\mathrm{bin}}\!\left(n + \begin{bmatrix}1\\1\end{bmatrix}\right) x_\beta^{\mathrm{bin}}(n) + \sum_{\theta \in \mathbb{Z}_2^2} m_{\mathrm{bin}}(n + \theta)\,x_g^{\mathrm{bin}}(n).
\end{aligned}$$
Figure 14. Binning sampling filter $m_{\mathrm{bin}}(n)$.
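The spatial construction of t(n) above can be sketched numerically. In this toy example the arrays `xr_bin`, `xg_bin`, and `xb_bin` are random stand-ins for the binned color planes (our own names, not code from the article); the masks realize $m_{\mathrm{bin}}(n)$ and its three shifts:

```python
import numpy as np

H, W = 8, 8
rng = np.random.default_rng(1)
# Stand-ins for the binned full-color planes x_r^bin, x_g^bin, x_b^bin.
xr_bin, xg_bin, xb_bin = (rng.random((H, W)) for _ in range(3))

i, j = np.mgrid[0:H, 0:W]

def m_bin(di, dj):
    """Shifted sampling mask m_bin(n + (di, dj)): impulses on (4Z)^2."""
    return (((i + di) % 4 == 0) & ((j + dj) % 4 == 0)).astype(float)

# t(n) = m(n) x_r + m(n+[1,0]) x_g + m(n+[0,1]) x_g + m(n+[1,1]) x_b
t = (m_bin(0, 0) * xr_bin + m_bin(1, 0) * xg_bin
     + m_bin(0, 1) * xg_bin + m_bin(1, 1) * xb_bin)

# Each 4x4 block carries one R, two G, and one B superpixel value,
# so an 8x8 grid holds 16 nonzero sites.
print(int((t != 0).sum()))
```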

In the Fourier domain, t(n) can be expressed as

$$T(\omega) = \big(M_{\mathrm{bin}} \star X_\alpha^{\mathrm{bin}}\big)(\omega) + \Big(\big(e^{\,j\omega^T[1\;1]^T} M_{\mathrm{bin}}\big) \star X_\beta^{\mathrm{bin}}\Big)(\omega) + \sum_{\theta \in \mathbb{Z}_2^2} \Big(\big(e^{\,j\omega^T\theta} M_{\mathrm{bin}}\big) \star X_g^{\mathrm{bin}}\Big)(\omega),$$

where

$$M_{\mathrm{bin}}(\omega) = \frac{1}{16} \sum_{\lambda \in \frac{\pi}{2}\mathbb{Z}_4^2} \delta(\omega - \lambda).$$

With some arithmetic and the approximation (19), the Fourier transform of t(n) simplifies to:

$$T(\omega) \approx \sum_{\lambda \in \frac{\pi}{2}\mathbb{Z}_4^2} \frac{X_\alpha(\omega - \lambda) + e^{\,j\lambda^T[1\;1]^T} X_\beta(\omega - \lambda)}{4} + \sum_{\lambda \in \frac{\pi}{2}\mathbb{Z}_4^2} \sum_{\theta \in \mathbb{Z}_2^2} \frac{e^{\,j\lambda^T\theta}\, H_{\mathrm{bin}}(\omega - \lambda)\, X_g(\omega - \lambda)}{16}.$$

The Fourier support of T(ω) is illustrated in Figure 6b. Note that the summation over λ suggests that binning subsampling results in 16 modulations. However, $\sum_{\theta \in \mathbb{Z}_2^2} e^{\,j\lambda^T\theta}$ is 0 for many values of λ, as shown in Figure 7. As a result, there are only nine actual modulations.
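The count of nine surviving modulations follows from the factorization $\sum_{\theta \in \mathbb{Z}_2^2} e^{\,j\lambda^T\theta} = (1 + e^{\,j\lambda_1})(1 + e^{\,j\lambda_2})$, which vanishes whenever $\lambda_1 = \pi$ or $\lambda_2 = \pi$. A quick numerical check (our own sketch, not code from the article):

```python
import numpy as np

# lambda ranges over the 16 points of (pi/2) * Z_4^2.
lams = [(np.pi / 2 * a, np.pi / 2 * b) for a in range(4) for b in range(4)]

def mod_weight(lam):
    """Sum over theta in {0,1}^2 of exp(j lambda^T theta)."""
    return sum(np.exp(1j * (lam[0] * t0 + lam[1] * t1))
               for t0 in (0, 1) for t1 in (0, 1))

# The weight is (1 + e^{j lam_1})(1 + e^{j lam_2}): zero iff a coordinate is pi.
nonzero = [lam for lam in lams if abs(mod_weight(lam)) > 1e-12]
print(len(nonzero))  # 9 of the 16 modulations survive
```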

Appendix 2: Proof of Fourier representation of binning sampling

We provide the proof of Equation (10). The binning sampling data s(n) refers to the concept of combining the electrical charges of four neighboring pixels together to form a superpixel Bayer pattern. The process is illustrated in Figures 1a,c. Similar to binning subsampling (see Appendix 1), the binning sampling s(n) has the following representation (it is mathematically convenient to consider s(n/2) for n even, rather than s(n) directly):

$$\begin{aligned}
s\!\left(\tfrac{n}{2}\right) &= m_{\mathrm{bin}}(n)\,x_r^{\mathrm{bin}}(n) + m_{\mathrm{bin}}\!\left(n+\begin{bmatrix}2\\0\end{bmatrix}\right) x_g^{\mathrm{bin}}\!\left(n+\begin{bmatrix}1\\0\end{bmatrix}\right) + m_{\mathrm{bin}}\!\left(n+\begin{bmatrix}0\\2\end{bmatrix}\right) x_g^{\mathrm{bin}}\!\left(n+\begin{bmatrix}0\\1\end{bmatrix}\right) + m_{\mathrm{bin}}\!\left(n+\begin{bmatrix}2\\2\end{bmatrix}\right) x_b^{\mathrm{bin}}\!\left(n+\begin{bmatrix}1\\1\end{bmatrix}\right) \\
&= m_{\mathrm{bin}}(n)\,x_\alpha^{\mathrm{bin}}(n) + m_{\mathrm{bin}}\!\left(n+\begin{bmatrix}2\\2\end{bmatrix}\right) x_\beta^{\mathrm{bin}}\!\left(n+\begin{bmatrix}1\\1\end{bmatrix}\right) + m_{\mathrm{bin}}(n)\,x_g^{\mathrm{bin}}(n) \\
&\quad + m_{\mathrm{bin}}\!\left(n+\begin{bmatrix}2\\0\end{bmatrix}\right) x_g^{\mathrm{bin}}\!\left(n+\begin{bmatrix}1\\0\end{bmatrix}\right) + m_{\mathrm{bin}}\!\left(n+\begin{bmatrix}0\\2\end{bmatrix}\right) x_g^{\mathrm{bin}}\!\left(n+\begin{bmatrix}0\\1\end{bmatrix}\right) + m_{\mathrm{bin}}\!\left(n+\begin{bmatrix}2\\2\end{bmatrix}\right) x_g^{\mathrm{bin}}\!\left(n+\begin{bmatrix}1\\1\end{bmatrix}\right).
\end{aligned}$$

In Fourier domain,

$$\begin{aligned}
S(2\omega) &= \big(M_{\mathrm{bin}} \star X_\alpha^{\mathrm{bin}}\big)(\omega) + \Big(\big(e^{\,j\omega^T[2\;2]^T} M_{\mathrm{bin}}\big) \star \big(e^{\,j\omega^T[1\;1]^T} X_\beta^{\mathrm{bin}}\big)\Big)(\omega) + \big(M_{\mathrm{bin}} \star X_g^{\mathrm{bin}}\big)(\omega) \\
&\quad + \Big(\big(e^{\,j\omega^T[2\;0]^T} M_{\mathrm{bin}}\big) \star \big(e^{\,j\omega^T[1\;0]^T} X_g^{\mathrm{bin}}\big)\Big)(\omega) + \Big(\big(e^{\,j\omega^T[0\;2]^T} M_{\mathrm{bin}}\big) \star \big(e^{\,j\omega^T[0\;1]^T} X_g^{\mathrm{bin}}\big)\Big)(\omega) \\
&\quad + \Big(\big(e^{\,j\omega^T[2\;2]^T} M_{\mathrm{bin}}\big) \star \big(e^{\,j\omega^T[1\;1]^T} X_g^{\mathrm{bin}}\big)\Big)(\omega) \\
&= \sum_{\lambda \in \frac{\pi}{2}\mathbb{Z}_4^2} \frac{X_\alpha^{\mathrm{bin}}(\omega-\lambda) + e^{\,j(\omega+\lambda)^T[1\;1]^T} X_\beta^{\mathrm{bin}}(\omega-\lambda)}{16} + \sum_{\lambda \in \frac{\pi}{2}\mathbb{Z}_4^2} \sum_{\theta \in \mathbb{Z}_2^2} \frac{e^{\,j(\omega+\lambda)^T\theta}\, X_g^{\mathrm{bin}}(\omega-\lambda)}{16}.
\end{aligned}$$

Separating $X_g^{\mathrm{bin}}(\omega-\lambda)$ into the two parts $\lambda = [0\;0]^T$ and $\lambda \neq [0\;0]^T$, and downsampling ($2\omega \to \omega$), we have

$$S(\omega) = \sum_{\lambda \in \pi\mathbb{Z}_2^2} \frac{X_\alpha^{\mathrm{bin}}\!\left(\frac{\omega-\lambda}{2}\right) + e^{\,j\left(\frac{\omega+\lambda}{2}\right)^T[1\;1]^T} X_\beta^{\mathrm{bin}}\!\left(\frac{\omega-\lambda}{2}\right)}{4} + \sum_{\theta \in \mathbb{Z}_2^2} \frac{e^{\,j\left(\frac{\omega}{2}\right)^T\theta}\, X_g^{\mathrm{bin}}\!\left(\frac{\omega}{2}\right)}{16} + \sum_{\lambda \in \frac{\pi}{2}\mathbb{Z}_4^2 \setminus \left\{[0\;0]^T\right\}} \sum_{\theta \in \mathbb{Z}_2^2} \frac{e^{\,j\left(\frac{\omega}{2}+\lambda\right)^T\theta}\, X_g^{\mathrm{bin}}\!\left(\frac{\omega}{2}-\lambda\right)}{16},$$

where the 1/4 factor on $X_\alpha^{\mathrm{bin}}$ and $X_\beta^{\mathrm{bin}}$ comes from exchanging $\mathbb{Z}_4^2$ with $\mathbb{Z}_2^2$. With some arithmetic and the approximation (19), the Fourier transform of s(n) simplifies to:

$$S(\omega) \approx \sum_{\lambda \in \pi\mathbb{Z}_2^2} \left[ X_\alpha\!\left(\tfrac{\omega-\lambda}{2}\right) + \underbrace{e^{\,j\left(\frac{\omega}{2}\right)^T[1\;1]^T}}_{\text{unwanted filter}} e^{\,j\left(\frac{\lambda}{2}\right)^T[1\;1]^T} X_\beta\!\left(\tfrac{\omega-\lambda}{2}\right) \right] + \underbrace{\sum_{\theta \in \mathbb{Z}_2^2} \frac{e^{\,j\left(\frac{\omega}{2}\right)^T\theta}\, H_{\mathrm{bin}}\!\left(\frac{\omega}{2}\right)}{16}}_{\text{unwanted filter}} X_g\!\left(\tfrac{\omega}{2}\right) + \sum_{\lambda \in \frac{\pi}{2}\mathbb{Z}_4^2 \setminus \left\{[0\;0]^T\right\}} \underbrace{\sum_{\theta \in \mathbb{Z}_2^2} \frac{e^{\,j\left(\frac{\omega}{2}+\lambda\right)^T\theta}\, H_{\mathrm{bin}}\!\left(\frac{\omega}{2}-\lambda\right)}{16}}_{\text{antialias filter}} \underbrace{X_g\!\left(\tfrac{\omega}{2}-\lambda\right)}_{\text{aliasing}}.$$
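In the pixel domain, binning sampling simply sums the four same-color charges within each 4×4 Bayer block into one superpixel of a half-resolution Bayer mosaic. The sketch below assumes an RGGB layout; the function name `bin_bayer` and the loop-based layout are our own illustration, not the article's implementation:

```python
import numpy as np

def bin_bayer(y):
    """Binning sampling: each superpixel of the half-resolution Bayer
    mosaic s sums the four same-color pixels (spaced 2 apart) of the
    corresponding 4x4 block of the full-resolution CFA y."""
    Hf, Wf = y.shape
    s = np.zeros((Hf // 2, Wf // 2))
    for i in range(Hf // 2):
        for j in range(Wf // 2):
            r0 = 4 * (i // 2) + (i % 2)  # same-color sites 2 apart
            c0 = 4 * (j // 2) + (j % 2)
            s[i, j] = y[r0:r0 + 3:2, c0:c0 + 3:2].sum()
    return s

y = np.ones((8, 8))  # flat CFA: every superpixel collects 4 unit charges
s = bin_bayer(y)
print(s.shape, float(s[0, 0]))  # (4, 4) 4.0
```

The half-resolution mosaic s then carries the Bayer pattern at superpixel scale, which is why (10) exhibits the aliased green modulations analyzed above.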

7 Endnote

a. Filter $h_{\mathrm{bin}}$ is a combination of highpass and lowpass. However, binning takes advantage of the fact that the sensor resolution exceeds the optical resolution, meaning $h_{\mathrm{bin}}$ can be taken to be a lowpass/antialiasing filter on $x_g$.

References

1. Yamanakam H: Method and apparatus for producing ultra-thin semiconductor chip and method and apparatus for producing ultra-thin back illuminated solid-state image pickup device. US Patent 7,521,335, 2006.
2. Edwards T, Pennypacker R: Manufacture of thinned substrate imagers. US Patent 4,226,334, 1981.
3. Compton J, Hamilton J: Image sensor with improved light sensitivity. US Patent 2007/0024931, 2007.
4. Barnhofer U, DiCarlo J, Olding B, Wandell B: Color estimation error trade-offs. Proceedings of the SPIE, 2003.
5. Borchenko W: Phase One Patent Pending Sensor+ Explained. [http://www.phaseone.com/Digital-Backs/P65//media/Phase%20One/Reviews/Review%20pdfs/Backs/Phase-One-Sensorplus.ashx]
6. Zhou Z, Pain B, Fossum E: Frame-transfer CMOS active pixel sensor with pixel binning. IEEE Trans. Electron Devices 1997, 44(10):1764-1768.
7. Chu F: Improving CMOS image sensor performance with combined pixels, 2005. [http://www.eetimes.com/design/embedded/4013011/Improving-CMOS-image-sensor-performance-with-combined-pixels]
8. Dabov K, Foi A, Katkovnik V, Egiazarian K: Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Trans. Image Process 2007, 16(8):2080-2095.
9. Portilla J, Strela V, Wainwright M, Simoncelli E: Image denoising using scale mixtures of Gaussians in the wavelet domain. IEEE Trans. Image Process 2003, 12(11):1338-1351.
10. Hirakawa K, Baqai F, Wolfe P: Wavelet-based Poisson rate estimation using the Skellam distribution. Proc. SPIE, Electronic Imaging 2009.
11. Zhang L, Lukac R, Wu X, Zhang D: PCA-based spatially adaptive denoising of CFA images for single-sensor digital cameras. IEEE Trans. Image Process 2009, 18(4):797-812.
12. Hirakawa K, Parks T: Joint demosaicing and denoising. IEEE Trans. Image Process 2006, 15(8):2146-2157.
13. Zhang L, Wu X, Zhang D: Color reproduction from noisy CFA data of single sensor digital cameras. IEEE Trans. Image Process 2007, 16(9):2184-2197.
14. Hirakawa K, Meng X, Wolfe P: A framework for wavelet-based analysis and processing of color filter array images with applications to denoising and demosaicing. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2007.
15. Fergus R, Singh B, Hertzmann A, Roweis S, Freeman W: Removing camera shake from a single photograph. ACM Trans. Graph. 2006, 25(3):787-794.
16. Levin A, Sand P, Cho T, Durand F, Freeman W: Motion-invariant photography. ACM SIGGRAPH 2008.
17. Hirakawa K, Simon P: Single-shot high dynamic range imaging with conventional camera hardware. IEEE International Conference on Computer Vision 2011.
18. Alleysson D, Süsstrunk S, Hérault J: Linear demosaicing inspired by the human visual system. IEEE Trans. Image Process 2005, 14(4):439-449.
19. Dubois E: Frequency-domain methods for demosaicking of Bayer-sampled color images. IEEE Signal Process. Lett 2005, 12(12):847-850.
20. Hirakawa K, Wolfe P: Spatio-spectral color filter array design for optimal image recovery. IEEE Trans. Image Process 2008, 17(10):1876-1890.
21. Gu J, Wolfe P, Hirakawa K: Filterbank-based universal demosaicking. 17th IEEE International Conference on Image Processing (ICIP) 2010.
22. Hirakawa K, Parks T: Adaptive homogeneity-directed demosaicing algorithm. IEEE Trans. Image Process 2005, 14(3):360-369.
23. Zhang L, Wu X: Color demosaicking via directional linear minimum mean square-error estimation. IEEE Trans. Image Process 2005, 14(12):2167-2178.
24. Fellers T, Vogt K, Davidson M: CCD signal-to-noise ratio. [http://www.microscopyu.com/tutorials/java/digitalimaging/signaltonoise/]
25. Gehler P, Rother C, Blake A, Minka T, Sharp T: Bayesian color constancy revisited. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2008.
26. Shi L, Funt B: Re-processed version of the Gehler color constancy dataset of 568 images. [http://www.cs.sfu.ca/colour/data/]


Acknowledgement

This work was funded in part by Texas Instruments.

Author information

Correspondence to Keigo Hirakawa.

Additional information

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Keywords

  • Aliasing
  • Neighboring Pixel
  • High Dynamic Range Imaging
  • Sensor Resolution
  • Color Filter Array