Analysis and processing of pixel binning for color image sensor
EURASIP Journal on Advances in Signal Processing volume 2012, Article number: 125 (2012)
Abstract
Pixel binning refers to the concept of combining the electrical charges of neighboring pixels together to form a superpixel. The main benefit of this technique is that the combined charges overcome the read noise at the sacrifice of spatial resolution. Binning in color image sensors results in superpixel Bayer pattern data, and subsequent demosaicking yields the final, lower resolution, less noisy image. It is common knowledge among practitioners and camera manufacturers, however, that binning introduces severe artifacts. The in-depth analysis in this article proves that these artifacts are far worse than the ones stemming from loss of resolution or demosaicking, and therefore they cannot be eliminated simply by increasing the sensor resolution. By accurately characterizing the sensor data that has been binned, we propose a post-capture binning data processing solution that succeeds in suppressing noise and preserving image details. We verify experimentally that the proposed method outperforms the existing alternatives by a substantial margin.
1 Introduction
Recent progress in digital camera technology has had an extraordinary impact on numerous electronic industries, including mobile phones, security, vehicle, bioengineering, and computer vision systems. In many applications, sensor resolution has exceeded the optical resolution, meaning that the additional hardware complexity to increase pixel density would not necessarily result in large image quality gains. The significant improvement in sensor sensitivity has allowed cameras to operate in lighting conditions that were unthinkable with film cameras.
Despite increased sensitivity, however, noise remains a serious problem in modern image sensors. Available technologies for reducing noise in hardware include backside illuminated architecture [1, 2], color filters with higher transmittance [3, 4], and pixel binning [5–7]. Processing techniques at our disposal include image denoising [8–10], joint denoising and demosaicking [11–14], image deblurring [15, 16] (long shutter to compensate for light), and single-shot high dynamic range imaging [17].
The goal of this article is to provide a comprehensive characterization of pixel binning for color image sensors, and to propose post-capture signal processing steps aimed at eliminating the binning artifacts. Binning refers to the concept of combining the electrical charges of neighboring pixels together to form a superpixel. The combined signal is then amplified by a source follower and converted into digital values by an analog-to-digital converter. The main benefit of this technique is that the combined charges overcome the read noise, even if the individual pixel values are small. The improved noise performance comes at the price of spatial resolution loss, however. Binning in color image sensors is complicated by the presence of a color filter array (CFA). Data are typically obtained via a single CCD or CMOS sensor with a CFA spatial subsampling procedure, a physical construction whereby each pixel location measures only a single color. Figures 1a,b show the most well-known CFA scheme, called the Bayer pattern, which involves red, green, and blue filters. To maintain the fidelity of color, binning in color image sensors is performed by combining neighboring pixels with the same color filter. As evidenced by the two well-known binning configurations shown in Figures 1a,b, the resultant superpixels form a Bayer pattern, as shown in Figure 1c. The subsequent demosaicking algorithm (the process of interpolating to recover the full RGB representation of the image from the CFA subsampled sensor data) yields the final, lower resolution, less noisy image.
However, it is common knowledge among practitioners and camera manufacturers that binning introduces pixelization artifacts. An example is shown in Figure 2. As will be made clear in the sequel, these artifacts differ from the ones stemming from loss of resolution, and therefore they cannot be eliminated simply by increasing the sensor resolution. In-depth analysis of the sampling scheme implied by binning proves that a gross mismatch between binning and demosaicking is at fault for the severe pixelization. Hence the right way to correct this problem is to design a binning-aware demosaicking algorithm. The proposed method still draws from the established demosaicking principles, but with profound differences in the way spatially high frequency components are handled. To the best of the authors' knowledge, this is the first major article to examine the pixel binning problem in color image sensors from the signal processing perspective, and to provide a post-capture processing solution to correct for the pixelization artifacts.
The remainder of this article is organized as follows. We begin by briefly reviewing CFA sampling and demosaicking in Section 2. Section 3 provides a rigorous analysis of binning. A novel binning-aware demosaicking technique is developed in Section 4. We experimentally verify its effectiveness in Section 5 before making concluding remarks in Section 6.
2 Background
2.1 CFA sampling
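Thanks to the seminal work of [18] and further investigations by [19–21], CFA sampling is well characterized and understood. The key insight is the two-dimensional Fourier analysis of CFA sampled sensor data, which reveals that the signal is preserved by an efficient space-color representation. Specifically, let $\mathit{x}:{\mathbb{Z}}^{2}\to {\mathbb{R}}^{3}$, where x(n)=[x_{ r }(n),x_{ g }(n),x_{ b }(n)]^{T} corresponds to the RGB tristimulus value at location $\mathit{n}\in {\mathbb{Z}}^{2}$. Then the CFA subsampled data has the following form:
$$y\left(\mathit{n}\right)=\mathit{c}{\left(\mathit{n}\right)}^{T}\mathit{x}\left(\mathit{n}\right)={x}_{g}\left(\mathit{n}\right)+{c}_{\alpha}\left(\mathit{n}\right){x}_{\alpha}\left(\mathit{n}\right)+{c}_{\beta}\left(\mathit{n}\right){x}_{\beta}\left(\mathit{n}\right),\tag{1}$$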
where $\mathit{c}:{\mathbb{Z}}^{2}\to {[0,1]}^{3}$ denotes the translucency of the CFA at location n. The advantage of this representation is that the difference images x_{ α }=x_{ r }−x_{ g } and x_{ β }=x_{ b }−x_{ g } enjoy rapid spectral decay and can serve as a proxy for chrominance. On the other hand, the “baseband” green image x_{ g } can be taken to approximate luminance. As our eventual image recovery task will be to approximate the true color image triple x(n) from acquired sensor data y(n), note that recovering either representation ({x_{ r }, x_{ g }, x_{ b }} or {x_{ g }, x_{ α }, x_{ β }}) is equivalent. Moreover, the representation of (1) allows us to recast the pure-color sampling structure in terms of sampling structures c_{ α } and c_{ β } associated with the difference channels x_{ α } and x_{ β }. For a more extensive investigation of the bandlimitedness assumptions on {x_{ g }, x_{ α }, x_{ β }}, see [18–20].
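The equivalence of the two representations can be checked numerically. The sketch below is illustrative only: it assumes a pure-color Bayer CFA with the origin on a red pixel, and the helper name `bayer_masks` is hypothetical rather than taken from the article. It samples a random RGB image and compares direct CFA sampling with the luminance/chrominance form, taking c_α and c_β to be the red and blue indicator masks.

```python
import numpy as np

def bayer_masks(height, width):
    """Indicator masks of a pure-color Bayer CFA (assumed layout: red at the origin)."""
    rows, cols = np.indices((height, width))
    c_r = ((rows % 2 == 0) & (cols % 2 == 0)).astype(float)
    c_b = ((rows % 2 == 1) & (cols % 2 == 1)).astype(float)
    c_g = 1.0 - c_r - c_b  # every remaining site is green
    return c_r, c_g, c_b

rng = np.random.default_rng(0)
x = rng.random((8, 8, 3))  # stand-in RGB image, channels ordered (r, g, b)
c_r, c_g, c_b = bayer_masks(8, 8)

# Direct CFA sampling: each pixel measures exactly one color.
y_direct = c_r * x[..., 0] + c_g * x[..., 1] + c_b * x[..., 2]

# Same data written as green baseband plus modulated difference images.
x_alpha = x[..., 0] - x[..., 1]
x_beta = x[..., 2] - x[..., 1]
y_decomposed = x[..., 1] + c_r * x_alpha + c_b * x_beta
```

The two arrays agree because the three masks sum to one at every pixel, which is what lets the green channel be pulled out as an unmodulated baseband term.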
Denote by uppercase letters the discrete-space Fourier transforms, and by $\mathit{\omega}={({\omega}_{1},{\omega}_{2})}^{T}\in {\{\mathbb{R}/(2\pi)\}}^{2}$ the two-dimensional Fourier index, where $\mathbb{R}/(2\pi)$ denotes the quotient group of $\mathbb{R}$ by the subgroup $2\pi\mathbb{Z}$. Then the Fourier analysis of the CFA is:
where δ(·) is the Dirac delta function, and ${\mathbb{Z}}_{2}$ denotes the cyclic group of order 2. Note that the phase shift term in C_{ β } arises from the position of the blue pixels relative to the red (the origin is assumed to be on a red pixel). The corresponding Fourier analysis of the sensor data y takes the following form:
where ⋆ denotes convolution. The Fourier support of the resultant sensor signal is shown in Figure 3.
2.2 Demosaicking
Most demosaicking algorithms described in the literature make use (either implicitly or explicitly) of correlation structure in the spatial frequency domain, often in the form of local sparsity or directional filtering [14, 19, 21–23]. As noted in our earlier discussion, the set of carrier frequencies induced by c_{ α } and c_{ β } includes [π,0]^{T} and [0,π]^{T}, locations that are particularly susceptible to aliasing by horizontal and vertical edges. Figures 3b,c indicate these scenarios, respectively; it may be seen that in contrast to the radially symmetric baseband spectrum of Figure 3a, chrominance–luminance aliasing occurs along one of either the horizontal or vertical axes. However, successful reconstruction can still occur if a noncorrupted copy of this chrominance information is recovered, thereby explaining the popularity of (nonlinear) directional filtering steps [19, 21–23]. We can, therefore, view the CFA design problem as one of spatial-frequency multiplexing, and the CFA demosaicking problem as one of demultiplexing to recover subcarriers, with spectral aliasing given the interpretation of “cross talk” [19].
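In order to carry out this demultiplexing, signal-adaptive demosaicking methods take the scenarios of Figure 3a–c into account. Typically, this is carried out by first filtering in both horizontal and vertical directions to yield reconstructions ${\widehat{\mathit{x}}}_{h}$ and ${\widehat{\mathit{x}}}_{v}$, respectively. Their convex combination then yields the final result:
$${\widehat{\mathit{x}}}_{\tau}\left(\mathit{n}\right)=\tau \left(\mathit{n}\right){\widehat{\mathit{x}}}_{h}\left(\mathit{n}\right)+\left(1-\tau \left(\mathit{n}\right)\right){\widehat{\mathit{x}}}_{v}\left(\mathit{n}\right),\tag{3}$$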
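where τ∈[0,1] is a set of weights. Based on models of “natural image” behavior, various policies for determining the appropriate weights have been developed [14, 19, 21–23]. For example, the weight combination should maximize the homogeneity ${u}_{\widehat{x}}\left(\mathit{n}\right)$, defined as the percentage of pixels in the neighborhood of n (denoted η(n)) that are similar to x(n) [22]:
$${u}_{\widehat{x}}\left(\mathit{n}\right)=\frac{\left|\left\{\mathit{m}\in \eta \left(\mathit{n}\right):d\left(\widehat{\mathit{x}}\left(\mathit{n}\right),\widehat{\mathit{x}}\left(\mathit{m}\right)\right)\le \epsilon \right\}\right|}{\left|\eta \left(\mathit{n}\right)\right|},\tag{4}$$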
where d(·,·) is some distance metric and ε is a tolerance parameter.
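The homogeneity measure and the directional combination can be sketched as follows. This is a simplified illustration under stated assumptions, not the method of [22]: it operates on a single channel, uses the absolute difference as the distance d, takes η(n) to be the 8 nearest neighbors, and replicates edge pixels at the image border. The function names `homogeneity` and `select_direction` are hypothetical.

```python
import numpy as np

def homogeneity(image, eps, radius=1):
    """Fraction of neighbors whose absolute difference from the center is at most eps."""
    height, width = image.shape
    padded = np.pad(image, radius, mode="edge")  # assumed border handling
    count = np.zeros_like(image, dtype=float)
    total = 0
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            if dr == 0 and dc == 0:
                continue  # the center pixel is not its own neighbor
            shifted = padded[radius + dr:radius + dr + height,
                             radius + dc:radius + dc + width]
            count += np.abs(shifted - image) <= eps
            total += 1
    return count / total

def select_direction(x_h, x_v, eps=0.05):
    """Hard selection (tau in {0, 1}): keep the candidate with higher homogeneity."""
    tau = (homogeneity(x_h, eps) >= homogeneity(x_v, eps)).astype(float)
    return tau * x_h + (1.0 - tau) * x_v, tau

# A flat candidate is perfectly homogeneous; a checkerboard candidate is not.
flat = np.full((5, 5), 0.3)
checker = (np.indices((5, 5)).sum(axis=0) % 2).astype(float)
combined, tau = select_direction(flat, checker)
```

With these inputs the flat candidate is selected everywhere. Allowing τ to take intermediate values in [0,1] instead of the hard threshold gives a soft blend of the two reconstructions.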
3 Analysis of binning
Let us rigorously analyze the effects that binning has on the acquired sensor data. We begin in Section 3.1 with a brief review of the signal-to-noise ratio (SNR) gains that binning is expected to deliver [24], the main motivation behind binning. An in-depth analysis in Section 3.2 will prove that a combination of binning and demosaicking results in a loss of resolution that is far worse than commonly believed. Section 3.3 offers an alternative perspective that paves a path towards recovering artifact-free images.
3.1 Signal measurement uncertainty
There are at least three types of noise that contribute to the overall error. “Shot noise” is due to the stochasticity of the photon arrival process, and it is well modeled by a Poisson distribution. The dark current stemming from in-circuit electron excitation results in “thermal noise,” whose power is proportional to the exposure time. Finally, the source follower and analog-to-digital converter introduce homoscedastic noise known as the “read noise.” The overall SNR of the captured image is well modeled by:
where t is the exposure time, Q is the quantum efficiency constant, D is the dark current constant, and N is the read noise power.
Owing to the fact that the image sensor resolution exceeds the optical resolution in many applications, binning is an attractive way to trade off the excess spatial resolution for gains in SNR. It is instructive first to consider summing M pixel values digitally, post-acquisition. The signal y is boosted M-fold while the noise power increases M times, resulting in an overall 10 log_{10}(M) dB gain:
Combining electrical charges of neighboring pixels to form a superpixel in hardware offers advantages over simply summing pixels digitally. The main difference is that when the electrical charges are combined before the source follower and analog-to-digital converter, the uncertainty due to read noise remains constant. The corresponding SNR is:
As illustrated by the example in Figure 4, the differences between SNR_{bin} and SNR_{sum} are more noticeable when the signal intensity y becomes small and the read noise N becomes dominant, meaning that binning is most effective in the low light ranges.
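The three cases can be compared numerically. The sketch below assumes one common form of the sensor model, in which the signal power is (Qty)^2 and the noise power is Qty + Dt + N; this form, the function names, and the parameter values are illustrative assumptions and may differ in detail from the expressions used in the article.

```python
import numpy as np

def snr_db(signal, noise_power):
    """SNR in decibels for a given signal amplitude and noise power."""
    return 10.0 * np.log10(signal ** 2 / noise_power)

def snr_single(y, t=1.0, Q=1.0, D=0.0, N=1.0):
    """Single pixel: shot noise Q*t*y, thermal noise D*t, read noise N."""
    return snr_db(Q * t * y, Q * t * y + D * t + N)

def snr_sum(y, M, t=1.0, Q=1.0, D=0.0, N=1.0):
    """Digital summation: all noise terms, read noise included, add M times."""
    return snr_db(M * Q * t * y, M * (Q * t * y + D * t + N))

def snr_bin(y, M, t=1.0, Q=1.0, D=0.0, N=1.0):
    """Charge binning: shot and thermal noise add M times, read noise is incurred once."""
    return snr_db(M * Q * t * y, M * (Q * t * y + D * t) + N)

# Gap between binning and digital summation at low and high signal levels.
gap_low_light = snr_bin(1.0, 4, N=10.0) - snr_sum(1.0, 4, N=10.0)
gap_bright = snr_bin(1000.0, 4, N=10.0) - snr_sum(1000.0, 4, N=10.0)
```

Under this model, summation always gains exactly 10 log_{10}(M) dB over a single pixel, while the additional advantage of binning is large in low light and nearly vanishes once shot noise dominates.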
3.2 Binning “sampling”
Because binning combines the electric charges of M neighboring pixels, each pixel cannot be shared by more than one superpixel. Moreover, the charges can be combined by summation only (i.e., no fractional combinations). As such, the options for binning schemes are fairly limited. Furthermore, the superpixels produced by pixel binning in color image sensors form a Bayer pattern that requires the additional step of demosaicking to recover the full color low resolution image. We will show that the superpixel Bayer pattern suffers from many problems that the pixel-level Bayer pattern does not, leading to the conclusion that combining pixel binning and demosaicking is the wrong approach.
Consider Kodak PIXELUX, the most widely used binning scheme illustrated in Figure 1a,c [7]. It combines four neighboring pixel values together to form one superpixel. This process of combining neighboring pixels to form a single superpixel is equivalent to applying a convolution operator followed by downsampling:

filtering: let h_{bin} denote the filter coefficients
$$\begin{array}{l}{h}_{\text{bin}}\left(\mathit{n}\right)=\Delta \left(\mathit{n}-\left(\genfrac{}{}{0ex}{}{1}{1}\right)\right)+\Delta \left(\mathit{n}-\left(\genfrac{}{}{0ex}{}{1}{-1}\right)\right)\\ \phantom{\rule{5.5em}{0ex}}+\Delta \left(\mathit{n}-\left(\genfrac{}{}{0ex}{}{-1}{1}\right)\right)+\Delta \left(\mathit{n}-\left(\genfrac{}{}{0ex}{}{-1}{-1}\right)\right),\end{array}\tag{8}$$
where Δ(·) denotes the Kronecker delta function. Then the charge summation in PIXELUX is
$$\begin{array}{l}{y}_{\text{bin}}\left(\mathit{n}\right)=y\left(\mathit{n}\right)\star {h}_{\text{bin}}\left(\mathit{n}\right).\end{array}$$
downsampling: to yield the superpixel Bayer pattern data s, do
$$\begin{array}{c}s\left(2\mathit{n}\right)={y}_{\text{bin}}\left(4\mathit{n}\right)\\ s\left(2\mathit{n}+\left(\genfrac{}{}{0ex}{}{0}{1}\right)\right)={y}_{\text{bin}}\left(4\mathit{n}+\left(\genfrac{}{}{0ex}{}{0}{1}\right)\right)\\ s\left(2\mathit{n}+\left(\genfrac{}{}{0ex}{}{1}{0}\right)\right)={y}_{\text{bin}}\left(4\mathit{n}+\left(\genfrac{}{}{0ex}{}{1}{0}\right)\right)\\ s\left(2\mathit{n}+\left(\genfrac{}{}{0ex}{}{1}{1}\right)\right)={y}_{\text{bin}}\left(4\mathit{n}+\left(\genfrac{}{}{0ex}{}{1}{1}\right)\right).\end{array}\tag{9}$$
Note that the downsampling implied by (9) is nonuniform: the spatial relationships between samples are changed by the different relative shifts applied to each superpixel (contrast this to (11) below). The Fourier transform of s is (derived in Appendix 2):
The corresponding Fourier support of S(ω) is shown in Figure 5. Note that the unwanted filter will boost X_{ g } to 16 at DC. The approximate relation above is admitted by the bandlimitedness assumptions of X_{ α } and X_{ β }:
The main advantage of binning in (9) over (2) is that the signal strength of the baseband X_{ g } and the chrominance components X_{ α } and X_{ β } is boosted by four times, consistent with the SNR analysis in the previous section. As evidenced by Figure 5a, the Fourier support of (9) closely resembles the Bayer pattern of Figure 3a. The superpixel Bayer pattern data in (10) is far from an ideal Bayer pattern representation of the true image x(n) we hope to recover from s(n), however. One distortion we see is the unwanted filtering term $\sum _{\mathit{\theta}\in {\mathbb{Z}}_{2}^{2}}{e}^{j{\mathit{\omega}}^{T}\mathit{\theta}/2}$ that degrades the baseband luminance/green signal X_{ g }(ω). Another complication is that the antialiasing is only partially effective, allowing aliasing to corrupt the baseband X_{ g }(ω) near $\mathit{\omega}=\pm {[0,\frac{\pi}{4}]}^{T},\pm {[\frac{\pi}{4},0]}^{T},\pm {[\frac{\pi}{4},\frac{\pi}{4}]}^{T},\pm {[\frac{\pi}{4},-\frac{\pi}{4}]}^{T}$.
Contrary to the popular belief that Kodak PIXELUX binning results in a 2×2 reduction in resolution, the main conclusion we draw from (9) is that the “Nyquist rate” of this binning scheme is π/4 due to the high risk of aliasing, implying that the actual resolution loss is 4×4, far worse than the presumed 2×2. Even if this Nyquist rate did not cause problems (e.g., because the sensor resolution is increased), s does not escape the unwanted filtering term in (9); this cannot be eliminated simply by increasing sensor resolution. Hence when a demosaicking algorithm is applied to the superpixel Bayer pattern data s, what is expected is a filtered and aliased image of the kind we have already seen in Figure 2.
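The filtering and nonuniform downsampling steps of PIXELUX binning can be sketched in a few lines. This is an illustration under stated assumptions: the four filter taps are placed at the diagonal offsets (±1, ±1), the image dimensions are multiples of 4, and borders are handled circularly for brevity. The function name `pixelux_bin` is hypothetical.

```python
import numpy as np

def pixelux_bin(y):
    """Sum four same-color neighbors at offsets (+-1, +-1), then keep 4 of every 16 sums."""
    height, width = y.shape
    assert height % 4 == 0 and width % 4 == 0
    # Filtering step; circular boundary handling is an assumption made for brevity.
    y_bin = sum(np.roll(y, (dr, dc), axis=(0, 1)) for dr in (-1, 1) for dc in (-1, 1))
    # Downsampling step: s(2n + theta) = y_bin(4n + theta) for theta in {0, 1}^2.
    s = np.empty((height // 2, width // 2), dtype=y_bin.dtype)
    for a in (0, 1):
        for b in (0, 1):
            s[a::2, b::2] = y_bin[a::4, b::4]
    return y_bin, s

# Feed in a map of color labels to see which color each superpixel collects.
rows, cols = np.indices((8, 8))
color_index = 2 * (rows % 2) + (cols % 2)  # 0 = red, 1 and 2 = green, 3 = blue
_, s_labels = pixelux_bin(color_index)
```

Each entry of `s_labels` is four times a single color label, so every superpixel combines pixels of one color only, and the 2×2 repeating arrangement of the labels confirms that the superpixels again form a Bayer pattern. Every input pixel contributes to exactly one superpixel, which is why the total charge is preserved.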
3.3 Binning “subsampling”
Below, we offer an alternative perspective to the analysis of Section 3.2. The analytical results contained herein provide the basis for the proposed binning-aware demosaicking algorithm. Continuing with the analysis of PIXELUX, consider Figure 6a, which displays data equivalent to the superpixels of Figure 1c. The superpixels are placed at the center of the four combined pixels, denoting the implied superpixel positions. All other locations are set to 0. This data can be represented by applying a convolution operator followed by subsampling, as follows:

filtering: the charge summation in PIXELUX is
$$\begin{array}{l}{y}_{\text{bin}}\left(\mathit{n}\right)=y\left(\mathit{n}\right)\star {h}_{\text{bin}}\left(\mathit{n}\right).\end{array}$$
subsampling: to yield the binning subsampling data t, do
$$\begin{array}{c}t\left(4\mathit{n}\right)={y}_{\text{bin}}\left(4\mathit{n}\right)\\ t\left(4\mathit{n}+\left(\genfrac{}{}{0ex}{}{0}{1}\right)\right)={y}_{\text{bin}}\left(4\mathit{n}+\left(\genfrac{}{}{0ex}{}{0}{1}\right)\right)\\ t\left(4\mathit{n}+\left(\genfrac{}{}{0ex}{}{1}{0}\right)\right)={y}_{\text{bin}}\left(4\mathit{n}+\left(\genfrac{}{}{0ex}{}{1}{0}\right)\right)\\ t\left(4\mathit{n}+\left(\genfrac{}{}{0ex}{}{1}{1}\right)\right)={y}_{\text{bin}}\left(4\mathit{n}+\left(\genfrac{}{}{0ex}{}{1}{1}\right)\right)\\ t\left(\mathit{n}\right)=0\phantom{\rule{0.5em}{0ex}}\text{otherwise.}\end{array}\tag{11}$$
After some arithmetic, the Fourier transform of t reduces to:
Note that the summation over λ suggests 16 modulations. However, $\sum _{\mathit{\theta}\in {\mathbb{Z}}_{2}^{2}}{e}^{j\left\{{\mathit{\lambda}}^{T}\mathit{\theta}\right\}}$ is 0 for every λ outside $\frac{\pi}{2}{\mathbb{Z}}_{3}^{2}$, as shown in Figure 7. The support of this transform is illustrated in Figure 6b.^{a} As evidenced by this figure, the modulated baseband signal components X_{ g }(ω−λ) overlap each other almost entirely; that is, they are aliased. However, the shaded regions of Figure 6b are still free of aliasing. Indeed, this uncorrupted portion of the Fourier support is the key to the post-binning processing that is the subject of the next section.
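The count of surviving modulations can be checked with a small discrete Fourier transform. The sketch assumes that the modulation weights are the DFT coefficients of one 4×4 period of the subsampling lattice in (11), which keeps the samples at offsets {0,1}^2 and zeroes the rest; the variable names are illustrative.

```python
import numpy as np

# One 4x4 period of the subsampling lattice: samples kept at offsets {0, 1}^2.
mask = np.zeros((4, 4))
mask[:2, :2] = 1.0

# Entry (k1, k2) of the 4x4 DFT weights the modulation by lambda = (pi/2) * (k1, k2).
weights = np.fft.fft2(mask)
surviving = np.abs(weights) > 1e-9
num_modulations = int(surviving.sum())
```

Of the 16 candidate modulations, those with either frequency component equal to π (index 2) receive zero weight, leaving a 3×3 grid of nine nonzero terms.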
4 Binning-aware demosaicking
Motivated by the analysis of pixel binning subsampling in (12), we now present a novel binning-aware demosaicking method aimed at recovering the full-color image x without introducing binning artifacts. We accomplish this in three stages.
Step 1: Chrominance estimation
Drawing parallels to [19], we assume that local image features are (approximately) either vertically or horizontally oriented. If this assumption holds, certain subsets of the modulated chrominances in (11) are alias-free conditional on the vertical or horizontal orientation of the image features; this is illustrated in Figure 8a. For example, assuming a horizontal feature, amplitude demodulation yields the images x_{ α } and x_{ β }:
where (·)^{†} denotes the pseudoinverse of a matrix and h_{0} is a lowpass filter whose passband matches the support of X_{ α } and X_{ β }. The reconstruction of vertically oriented image features (denoted ${\widehat{x}}_{\alpha ,v},{\widehat{x}}_{\beta ,v}$) is the same as (13) but with a 90° rotation.
Step 2: Luminance filtering
Once ${\widehat{x}}_{\alpha ,h}$ and ${\widehat{x}}_{\beta ,h}$ are recovered, we compute the green image ${\widehat{x}}_{g,h}$. Subtracting
from the subsampled binning data t(n) results in a signal whose Fourier transform comprises only x_{ g } terms (from (12)):
This is illustrated in Figure 8b. To reconstruct the green image ${\widehat{x}}_{g,h}$ from the unaliased (shaded in Figure 8b) portions of t_{ g }, we carry out a standard demodulation, as follows:
where h_{1} and h_{2} are lowpass and highpass filters, respectively; and f is a sum of sinusoids intended for modulation, as follows:
As illustrated in Figure 8c,d, the modulation by f(n) not only shifts the spectra, but also creates additional aliasing copies. Hence, the filter h_{2} is needed to attenuate them. The same procedure can be used to find the green image ${\widehat{x}}_{g,v}$ based on ${\widehat{x}}_{\alpha ,v}$ and ${\widehat{x}}_{\beta ,v}$.
Step 3: Directional selection
Once ${\widehat{\mathit{x}}}_{h}=\{{\widehat{x}}_{g,h},{\widehat{x}}_{\alpha ,h},{\widehat{x}}_{\beta ,h}\}$ and ${\widehat{\mathit{x}}}_{v}=\{{\widehat{x}}_{g,v},{\widehat{x}}_{\alpha ,v},{\widehat{x}}_{\beta ,v}\}$ are found, they must be combined to yield the final estimate, ${\widehat{\mathit{x}}}_{t}=\{{\widehat{x}}_{g},{\widehat{x}}_{\alpha},{\widehat{x}}_{\beta}\}$ via the convex combination (3). As already mentioned, the directional selection variable τ has received considerable attention in research and many techniques are available. However, these studies often lack analysis under noise—although binning reduces noise considerably, most directional selection variables are nevertheless sensitive to random perturbations.
To address the problem of directional selection under noise, we modified the τ criterion used in the popular adaptive homogeneity-directed (AHD) demosaicking method as follows:
where ${\widehat{\mathit{x}}}_{\tau}$ and ${u}_{\widehat{x}}$ are as defined in (3) and (4), respectively. Contrast this to the original AHD formulation which selected either ${\widehat{\mathit{x}}}_{h}$ or ${\widehat{\mathit{x}}}_{v}$ (i.e. τ∈{0,1} instead of τ∈[0,1]) as the final output ${\widehat{\mathit{x}}}_{t}$. The modified strategy of (16) behaves similarly to the original AHD near the edges of an image, but encourages averaging in the flat regions of the image. It was found empirically to be far more robust to directional selection under noise.
5 Experimental validation
5.1 Setup
The proposed binning-aware demosaicking ${\widehat{\mathit{x}}}_{t}\left(\mathit{n}\right)$ in (16) is compared to four available alternatives (${\widehat{\mathit{x}}}_{s}\left(\mathit{n}\right)$, ${\widehat{\mathit{x}}}_{p}\left(\mathit{n}\right)$, ${\widehat{\mathit{x}}}_{y}\left(\mathit{n}\right)$, and ${\widehat{\mathit{x}}}_{{y}^{\prime}}\left(\mathit{n}\right)$). The first is a state-of-the-art demosaicking method [19] applied to superpixels s(n) (i.e., output from PIXELUX binning):
The second is the same demosaicking method [19] applied to PhaseOne binning superpixels p(n):
The third is the application of the same demosaicking method [19] to a full resolution CFA y(n) (i.e., without binning):
The fourth is a simulation of a lower resolution sensor. Let x^{′}(n) denote the downsampled version of the ideal lowpassed (antialiased) image:
The CFA subsampled data captured by this lower resolution sensor is then
where $\mathit{c}:{\mathbb{Z}}^{2}\to {[0,1]}^{3}$ is the same CFA translucency used in (1). The application of the same demosaicking method [19] to the lower resolution CFA y^{′}(n) is:
The output images from the proposed method (${\widehat{\mathit{x}}}_{t}$) and the full resolution demosaicking (${\widehat{\mathit{x}}}_{y}$) have the same size as the original image x. On the other hand, conventional binning processing is based on superpixel sampling, so the pixel density of ${\widehat{\mathit{x}}}_{s}$ and ${\widehat{\mathit{x}}}_{p}$ is just a quarter of that of the original image (the same is true for ${\widehat{\mathit{x}}}_{{y}^{\prime}}$). Hence when we compare all results (Figures 9, 10, 11, 12, 13; Table 1), we downsample ${\widehat{\mathit{x}}}_{t}$ and ${\widehat{\mathit{x}}}_{y}$ by 2×2 (in the same manner as (18)) such that all results have the same pixel density as the lower resolution image x^{′}.
The linear images used in this simulation study are part of the collection of [25, 26], examples of which are shown in Figure 9. Numerical scores in Table 1 and Figure 13 were obtained by averaging performance over 84 images. Noise is simulated by adding pseudorandom white Gaussian noise to the CFA data y(n), the superpixel CFA data s(n) and p(n), and the lower resolution CFA data y^{′}(n). In the experiments, the 12 bit image data in [25, 26] were renormalized to the range 0–1, meaning that a noise standard deviation of σ_{ n }=0.01 corresponds to a standard deviation of 40.96 in a 12 bit camera processing pipeline, etc. Considering the noise models in (5)–(7), one may ask if such a simplified noise model is appropriate. As evidenced by the analysis in (7), however, the difference between SNR and SNR_{sum} is M (the number of pixels combined together), and the difference between SNR_{sum} and SNR_{bin} is the read noise power N. Hence the SNR gains in binning are attributable only to the signal-independent portion of the noise, and not to the signal-dependent portion. Furthermore, the read noise dominates in the low light regime. Hence simulated additive white Gaussian noise suffices for experimental verification. The binning subsample signal t(n) represents the same data as s(n) and is computed by upsampling s(n) (inserting zeros where necessary).
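The noise simulation and the scale conversion quoted above can be sketched as follows. The function names and the flat test image are illustrative assumptions; the factor 4096 is the 12 bit scale implied by the figure of 40.96 for σ_{ n }=0.01.

```python
import numpy as np

def add_awgn(cfa, sigma_n, rng):
    """Add white Gaussian noise of standard deviation sigma_n to data scaled to [0, 1]."""
    return cfa + rng.normal(0.0, sigma_n, size=cfa.shape)

def sigma_in_12bit(sigma_n):
    """Equivalent standard deviation on the 12 bit scale (factor 4096)."""
    return sigma_n * 4096

rng = np.random.default_rng(0)
y = np.full((256, 256), 0.5)  # stand-in CFA data, already normalized to [0, 1]
y_noisy = add_awgn(y, 0.01, rng)
```

The same routine would be applied independently to y(n), s(n), p(n), and y^{′}(n), each with its own noise realization.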
5.2 Results
Example outputs from the five methods (${\widehat{\mathit{x}}}_{{y}^{\prime}},{\widehat{\mathit{x}}}_{y},{\widehat{\mathit{x}}}_{s},{\widehat{\mathit{x}}}_{p},{\widehat{\mathit{x}}}_{t}$) are shown in Figures 10, 11 and 12. As expected, demosaicking applied to a full resolution CFA (${\widehat{\mathit{x}}}_{y}$) has a noisy appearance due to the low SNR of individual pixels. However, edges and image features are clearly defined even after downsampling thanks to the full resolution description. Demosaicking applied to superpixel CFAs (${\widehat{\mathit{x}}}_{s},{\widehat{\mathit{x}}}_{p}$), on the other hand, yields the opposite qualities: the noise is significantly reduced owing to the high SNR of binning, but the image suffers from severe artifacts stemming from aliasing in (10). More specifically, the aliasing in Kodak PIXELUX binning manifests itself as a pixelization artifact, while PhaseOne binning results in zippering artifacts. However, one may argue that the aliasing artifacts in ${\widehat{\mathit{x}}}_{p}$ become less bothersome at the highest level of noise because the zippering and noise become less distinguishable. By contrast, the proposed binning-aware demosaicking method (${\widehat{\mathit{x}}}_{t}$) succeeds in suppressing noise while preserving the image features. Of particular interest is the comparison between ${\widehat{\mathit{x}}}_{s}$ and ${\widehat{\mathit{x}}}_{t}$, since they both use Kodak PIXELUX binning but the proposed method yields drastically improved outcomes. Overall, the proposed method has better visual quality than ${\widehat{\mathit{x}}}_{s}$ and ${\widehat{\mathit{x}}}_{p}$ for σ_{ n }<0.03, but it has a slightly noisier appearance at the highest level of noise (σ_{ n }=0.03). Finally, the output from the low resolution camera ${\widehat{\mathit{x}}}_{{y}^{\prime}}$ is robust to both noise and aliasing. This is expected, as the lower resolution CFA data y^{′}(n) does not share the problems that the superpixel CFAs p(n), s(n), t(n) have. However, ${\widehat{\mathit{x}}}_{y}$ has superior reconstruction over ${\widehat{\mathit{x}}}_{{y}^{\prime}}$ without noise (σ_{ n }=0). Figure 12 shows an example where none of the reconstruction methods produced a satisfactory output (except for ${\widehat{\mathit{x}}}_{y}$ under no noise).
The performance is also evaluated in terms of peak SNR (PSNR), using the downsampled version of the ideal lowpassed (antialiased) image x^{′} in (18) as the reference. The results are summarized in Table 1. When there is no noise (σ_{ n }=0), the ordinary demosaicking reconstruction ${\widehat{\mathit{x}}}_{y}$ and the lower resolution sensor ${\widehat{\mathit{x}}}_{{y}^{\prime}}$ yield the best results, as expected. However, the proposed ${\widehat{\mathit{x}}}_{t}$ is a very close third, yielding comparably satisfactory results. The binning result ${\widehat{\mathit{x}}}_{s}$ is worst by far due to binning artifacts.
When noise is taken into consideration, the quality of ${\widehat{\mathit{x}}}_{y}$ suffers greatly, as expected. Even with a noise standard deviation as small as σ_{ n }=0.005, the performance of ${\widehat{\mathit{x}}}_{y}$ deteriorates significantly, while the PSNR performance of ${\widehat{\mathit{x}}}_{s}$, ${\widehat{\mathit{x}}}_{p}$, ${\widehat{\mathit{x}}}_{t}$, and ${\widehat{\mathit{x}}}_{{y}^{\prime}}$ is far less sensitive to noise. With moderate noise levels (σ_{ n }<0.03) the proposed binning-aware demosaicking clearly outperforms the artifact-plagued demosaicking of superpixels. With the largest noise level considered (σ_{ n }=0.03), the PSNR performances of ${\widehat{\mathit{x}}}_{s}$, ${\widehat{\mathit{x}}}_{p}$, and ${\widehat{\mathit{x}}}_{t}$ are closer to each other because deteriorations in output images are dominated by noise (rather than by artifacts).
The analysis in Figures 10, 11, 12, 13 and Table 1 sheds light on the decades-old debate about resolution versus noise. On one hand, the lower resolution sensor delivers consistent performance under noise (${\widehat{\mathit{x}}}_{{y}^{\prime}}$). However, Figure 11 shows that under no noise, extra sensor resolution is still desirable. Consider Figure 13. The comparison between the green (low resolution) and red (high resolution) curves is consistent with the image quality of Figures 10 and 11. With the availability of pixel binning, we would compare the green curve with the “max function” over the red and blue (binning) curves in Figure 13. Hence one can think of binning as a way to narrow the gap between the red and green curves in noise, without making sacrifices to the advantages of higher spatial resolution.
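For reference, the peak SNR used in the comparisons above can be computed as in the following sketch. It assumes images normalized to the range 0–1, so the peak value is 1; the function name `psnr` and the toy inputs are illustrative.

```python
import numpy as np

def psnr(estimate, reference, peak=1.0):
    """Peak SNR in decibels between an estimate and its reference image."""
    mse = np.mean((estimate - reference) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

# Toy check: a uniform error of 0.1 gives a mean squared error of 0.01, i.e. 20 dB.
reference = np.zeros((4, 4))
estimate = np.full((4, 4), 0.1)
score = psnr(estimate, reference)
```

In the experiments each reconstruction would be scored against the lower resolution reference x^{′} after being brought to the same pixel density.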
6 Conclusion
In this article, we proved via a rigorous analysis of binning sampling that the Kodak PIXELUX binning scheme results in a 4×4 reduction in image resolution, contrary to the popular belief that binning of four pixels should result in a 2×2 reduction in resolution. We proposed a binning-aware demosaicking algorithm based on the Fourier analysis of binning subsampling to combine unaliased copies of the Fourier spectra together via demodulation. The resultant method succeeds in reconstructing the color image with only 2×2 resolution loss, that is, increasing the resolution by 2×2 over the traditional approach of applying demosaicking to superpixels. The binning-aware demosaicking also succeeds in suppressing noise and preserving image details. We verified experimentally that the binning-aware demosaicking outperforms the alternatives.
Appendix 1: Proof of Fourier representation of binning subsampling
We provide the proof for Equation (12). Let H_{bin} be the Fourier transform of (8). Then the combination of charges can be represented as:
Due to the bandlimitedness of X_{ α } and X_{ β }, the following approximation holds:
where we used the fact that H_{bin}(0)=4.
Define ${m}_{\text{bin}}\left(\mathit{n}\right)=\sum _{\mathit{\theta}\in {\mathbb{Z}}_{4}^{2}}\Delta (\mathit{n}-\mathit{\theta})$, as illustrated in Figure 14. The binning subsampling data t(n) refers to the concept of combining the electrical charges of four neighboring pixels together to form a superpixel. The process is illustrated in Figure 6a. Mathematically, t(n) can be written as:
In the Fourier domain, t(n) can be expressed as
where
With arithmetic and the approximation of (19), the Fourier transform of t(n) simplifies to:
The Fourier support of T(ω) is illustrated in Figure 6b. Note that the summation over λ suggests that binning subsampling will result in 16 modulations. However, $\sum _{\mathit{\theta}\in {\mathbb{Z}}_{2}^{2}}{e}^{j\left\{{\mathit{\lambda}}^{T}\mathit{\theta}\right\}}$ is 0 for many values of λ, as shown in Figure 7. As a result, there are only nine actual modulations.
Appendix 2: Proof of Fourier representation of binning sampling
We provide the proof for Equation (10). The binning sampling data s(n) are obtained by combining the electrical charges of four neighboring pixels to form a superpixel Bayer pattern. The process is illustrated in Figures 1a,c. Similar to binning subsampling (see Appendix 1), binning sampling s(n) has the following representation (it is mathematically convenient to consider $s\left(\frac{\mathit{n}}{2}\right)$ for n even, rather than s(n) directly):
In the Fourier domain,
Separating the ${X}_{{g}_{\text{bin}}}(\mathit{\omega}-\mathit{\lambda})$ terms into two parts, λ=$\left(\genfrac{}{}{0ex}{}{0}{0}\right)$ and λ≠$\left(\genfrac{}{}{0ex}{}{0}{0}\right)$, and downsampling (2ω↦ω), we have
where the 1/4 term on ${X}_{{\alpha}_{\text{bin}}}$ and ${X}_{{\beta}_{\text{bin}}}$ comes from exchanging ${\mathbb{Z}}_{4}^{2}$ with ${\mathbb{Z}}_{2}^{2}$. With arithmetic and the approximation of (19), the Fourier transform of s(n) simplifies to:
7 Endnote
^{a}Filter h_{bin} is a combination of highpass and lowpass. However, binning takes advantage of the fact that the sensor resolution exceeds optical resolution, meaning h_{bin} is taken to be a lowpass/antialiasing filter on x_{ g }.
References
1. Yamanaka H: Method and apparatus for producing ultrathin semiconductor chip and method and apparatus for producing ultrathin back-illuminated solid-state image pickup device. US Patent 7,521,335 2006.
2. Edwards T, Pennypacker R: Manufacture of Thinned Substrate Imagers. US Patent 4,226 1981, 334.
3. Compton J, Hamilton J: Image sensor with improved light sensitivity. US Patent 2007/0024931 2007.
4. Barnhofer U, DiCarlo J, Olding B, Wandell B: Color estimation error tradeoffs. Proceedings of the SPIE 2003.
5. Borchenko W: Phase One Patent Pending Sensor+ Explained. [http://www.phaseone.com/DigitalBacks/P65//media/Phase%20One/Reviews/Review%20pdfs/Backs/PhaseOneSensorplus.ashx]
6. Zhou Z, Pain B, Fossum E: Frame-transfer CMOS active pixel sensor with pixel binning. IEEE Trans. Electron. Dev 1997, 44(10):1764-1768. 10.1109/16.628834
7. Chu F: Improving CMOS image sensor performance with combined pixels. 2005. [http://www.eetimes.com/design/embedded/4013011/ImprovingCMOSimagesensorperformancewithcombinedpixels]
8. Dabov K, Foi A, Katkovnik V, Egiazarian K: Image denoising by sparse 3D transform-domain collaborative filtering. IEEE Trans. Image Process 2007, 16(8):2080-2095.
9. Portilla J, Strela V, Wainwright M, Simoncelli E: Image denoising using scale mixtures of Gaussians in the wavelet domain. IEEE Trans. Image Process 2003, 12(11):1338-1351. 10.1109/TIP.2003.818640
10. Hirakawa K, Baqai F, Wolfe P: Wavelet-based Poisson rate estimation using the Skellam distribution. Proc. SPIE, Electronic Imaging 2009.
11. Zhang L, Lukac R, Wu X, Zhang D: PCA-based spatially adaptive denoising of CFA images for single-sensor digital cameras. IEEE Trans. Image Process 2009, 18(4):797-812.
12. Hirakawa K, Parks T: Joint demosaicing and denoising. IEEE Trans. Image Process 2006, 15(8):2146-2157.
13. Zhang L, Wu X, Zhang D: Color reproduction from noisy CFA data of single sensor digital cameras. IEEE Trans. Image Process 2007, 16(9):2184-2197.
14. Hirakawa K, Meng X, Wolfe P: A framework for wavelet-based analysis and processing of color filter array images with applications to denoising and demosaicing. IEEE International Conference on Acoustics, Speech and Signal Processing 2007. ICASSP 2007 2007.
15. Fergus R, Singh B, Hertzmann A, Roweis S, Freeman W: Removing camera shake from a single photograph. ACM Trans. Graph. (TOG) 2006, 25(3):787-794. 10.1145/1141911.1141956
16. Levin A, Sand P, Cho T, Durand F, Freeman W: Motion-invariant photography. ACM SIGGRAPH 2008 papers, ACM 2008.
17. Hirakawa K, Simon P: Single-shot high dynamic range imaging with conventional camera hardware. IEEE International Conference on Computer Vision 2011.
18. Alleysson D, Susstrunk S, Hérault J: Linear demosaicing inspired by the human visual system. IEEE Trans. Image Process 2005, 14(4):439-449.
19. Dubois E: Frequency-domain methods for demosaicking of Bayer-sampled color images. IEEE Signal Process. Lett 2005, 12(12):847-850.
20. Hirakawa K, Wolfe P: Spatio-spectral color filter array design for optimal image recovery. IEEE Trans. Image Process 2008, 17(10):1876-1890.
21. Gu J, Wolfe P, Hirakawa K: Filterbank-based universal demosaicking. 2010 17th IEEE International Conference on Image Processing (ICIP) 2010.
22. Hirakawa K, Parks T: Adaptive homogeneity-directed demosaicing algorithm. IEEE Trans. Image Process 2005, 14(3):360-369.
23. Zhang L, Wu X: Color demosaicking via directional linear minimum mean square-error estimation. IEEE Trans. Image Process 2005, 14(12):2167-2178.
24. Fellers T, Vogt K, Davidson M: CCD signal-to-noise ratio. [http://www.microscopyu.com/tutorials/java/digitalimaging/signaltonoise/]
25. Gehler P, Rother C, Blake A, Minka T, Sharp T: Bayesian color constancy revisited. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition 2008.
26. Shi L, Funt B: Reprocessed Version of the Gehler Color Constancy Dataset of 568 Images. [http://www.cs.sfu.ca/colour/data/]
Acknowledgement
This work was funded in part by Texas Instruments.
Additional information
Competing interests
The authors declare that they have no competing interests.
Keywords
 Aliasing
 Neighboring Pixel
 High Dynamic Range Imaging
 Sensor Resolution
 Color Filter Array