Research Article Design of Large Field-of-View High-Resolution Miniaturized Imaging System

An optical system consisting of a lenslet array/photoreceptor array plexus arranged on curved surfaces is designed to achieve a large field of view (FOV), with each lenslet capturing a portion of the scene. An optimal sampling rate in the image plane, as determined by the pixel pitch, is found using an information-theoretic performance measure. Since this rate turns out to be sub-Nyquist, superresolution techniques can be applied to the multiple low-resolution (LR) images captured on the photoreceptor array to yield a single high-resolution (HR) image of an object of interest. The proposed computational imaging system is thus capable of realizing both the specified resolution and the specified FOV.


INTRODUCTION
Images captured by most modern image acquisition systems require further processing in order to be useful. The overall imaging system can therefore be considered as a combination of an optical subsystem, which includes the optical elements and the sensors, and a digital subsystem, which comprises the algorithms employed to perform the necessary signal processing.
Traditionally, the design of the optical subsystem has been separated from the design of the digital subsystem. In recent years, however, there has been a thrust towards an integrated approach to the design of the overall imaging system. Such an approach has been successfully applied to the design of high depth-of-field (DOF) systems. The approach suggested by Dowski and Cathey [1] involves the use of optical phase masks to convert spatially variant blur to spatially invariant blur. In another approach, suggested by Adelson and Wang [2] (and improved upon by Ng et al. [3]), a "plenoptic camera" (a "lightfield camera"), comprising a single large lens and an array of lenslets (small lenses)/photoreceptors placed at the focal plane of the large lens, is used to estimate the depth of the scene.
The integrated design of a large field-of-view (FOV) imaging system is still an open problem. One of the challenges in the design of a large FOV imaging system is that of maintaining the same image quality throughout the FOV. Fisheye lenses provide a very large FOV; however, the captured image suffers from severe distortion, which requires subsequent correction [4]. Moreover, the resolution of the captured image is not uniform throughout, owing to off-axis aberrations. Catadioptric omnidirectional cameras are capable of providing a full 360° field of view by using both lenses and mirrors [4]. This, however, results in a system that is bulky and costly. In this paper, therefore, a theoretical model for a miniaturized high-resolution, large FOV imaging system is presented and an approach to design such a system is proposed. The proposed system comprises an array of lenslets arranged on a curved surface, with each lenslet capturing an undersampled low-resolution (LR) image of a portion of the scene. The multiple LR images captured in this way are registered onto a common grid, and superresolution techniques are used to obtain a single high-resolution (HR) image. Since superresolution techniques have been well documented in the signal processing literature [5][6][7][8], this paper will focus on the design of the optical system. In Section 2, the factors influencing the design of miniaturized imaging systems are discussed. In Section 3, the specifications required for the design of the imaging system are stated and the steps involved in the design process are outlined. The rate at which the radiance field is sampled by the photoreceptor array is determined by the use of an information-theoretic performance criterion given in [9]. Conclusions and avenues for future research are presented in Section 4. It should be noted that manufacturability issues are not addressed in this paper. Such issues present challenging problems with the current state-of-the-art technology.
The design presented will, hopefully, motivate engineers in industry and government laboratories to address the manufacturability problems, especially because, to the best of our knowledge, alternative approaches to the simultaneous realization of superresolution and large FOV in computational imaging systems are nonexistent.

MINIATURIZED IMAGING SYSTEMS

Figure 1 shows three possible configurations for miniaturized imaging systems based on the compound eye of insects. The use of configuration I was reported by Kitamura et al. [10] and the use of configuration II was reported by Duparre et al. [11]. In configurations I and III, each lenslet is associated with multiple photoreceptors (pixels), while in configuration II, only one pixel is associated with each lenslet. Consequently, configurations I and III can employ superresolution techniques for resolution enhancement because of the scope for forming multiple shifted LR images of a fixed subregion in object space. This is not possible in configuration II, in which only a single image is formed. The FOV of the system in configuration I is the same as the FOV of each of the lenslets. The systems in configurations II and III, however, offer a FOV greater than that of the individual lenslets used in them. This is achieved by making the pixel pitch smaller than the pitch of the lenslets in configuration II, and by arranging the lenses and photoreceptors on suitable curved surfaces in configuration III. The proposed configuration III, therefore, offers the advantages of both large FOV and resolution enhancement.

Effect of scaling on lenslet parameters
The effect of scaling on various lenslet properties was documented by Lohmann [12] and is summarized for a circularly shaped lenslet in Table 1. Here D is the diameter of the lenslet, f is its focal length, and d is the pixel pitch in the image plane. NA is the numerical aperture, defined as D/(2f), and F is the f-number, defined as f/D. The properties considered here are the radius of the point spread function (PSF), the FOV, the sensitivity, the aberrations, and the resolvable angular separation of the lenslet. The resolvable angular separation is the minimum angular separation required between two point sources in object space in order for them to be resolved in the captured image. The expression for the resolvable angular separation is justified in Section 3.1. Definitions and detailed explanations of the other quantities can be found in any standard book on optics [13, 14]. The following two factors highlight the limitations of miniaturized imaging systems.
(1) The ability of the lenslet to resolve points in the object space decreases with decreasing D. This is because the resolvable angular separation, at a fixed wavelength, is proportional to 1/D. (2) The radius of the lenslet PSF roughly determines the number of resolvable spots that can be produced in the image plane. Decreasing D, while keeping F constant, reduces the image area, but not the size of the resolvable spots. As a result, the number of resolvable spots in the image decreases. To compensate for this, the radius of the PSF should be reduced. From Table 1, reducing the PSF radius entails the use of low f-number optics, which increases aberrations, as explained in Section 3.2.
These factors suggest that there is a practical limit to which the size of each lenslet should be reduced.

Design assumptions
For simplicity of presentation, the following simplifying but reasonable assumptions are made.
(1) For the sake of calculations, all the lenslets are assumed to be circularly shaped, biconvex (plano-convex can also be handled), symmetric, and identical in size and optical characteristics. (2) If a region in the object space is common to N noise-free, undersampled, and distinct LR frames, then the resolution of that region can be improved by a factor of N by digital superresolution, provided each LR frame is undersampled by a factor of N. In practice, however, the resolution enhancement obtainable will be limited by noise and will be less than N, depending on the quality (peak signal-to-noise ratio (SNR)) of the LR frames. (3) The same effective resolution should be obtained throughout the FOV. Effective resolution refers to the resolution obtainable after superresolution. This requires the density of captured LR points to be roughly the same throughout the FOV. Consequently, the amount of overlap between the LR images of neighboring lenslets should be the same throughout the FOV. A simple way to ensure that this condition is always met is to arrange the lenslets and the photoreceptors in a regular pattern on a spherical surface.

DESIGN OF THE IMAGING SYSTEM

Figure 2 shows the structure of the large FOV imaging system to be designed. The specifications are the following.
(1) Desired field of view, θ_FOV (half-angle). (2) Desired resolution Δz at distance z. Here Δz refers to the closest spacing of points that can be resolved by the system. Equivalently, the angular resolution, δθ = Δz/z, can be specified. (3) Mean radiance L_0 in the object plane, required to determine the average signal strength as well as to calculate the average noise power at the image sensor.
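Assumption (2) in Section 2.3 can be illustrated with a toy one-dimensional sketch: N noise-free LR frames, each undersampled by a factor of N and mutually shifted by one HR sample, tile the HR grid exactly, so interleaving them recovers the HR signal. The blur-free decimation model below is a deliberate idealization of that statement, not the full superresolution pipeline.

```python
import numpy as np

# Toy illustration of assumption (2): a 1D high-resolution (HR) signal is
# captured in N = 2 low-resolution (LR) frames, each undersampled by a
# factor of N and shifted by one HR sample relative to the other.
hr = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])  # HR samples
N = 2
lr_frames = [hr[shift::N] for shift in range(N)]  # N shifted LR frames

# Register the LR frames onto the common HR grid and interleave them.
recovered = np.empty_like(hr)
for shift, frame in enumerate(lr_frames):
    recovered[shift::N] = frame

assert np.array_equal(recovered, hr)  # exact recovery in the noise-free case
```

With noise and optical blur, exact interleaving is replaced by regularized reconstruction, and the achievable enhancement drops below N, as the assumption states.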
The object surface is assumed to be spherical, centered at O, and of radius R + z. This ensures that the distance of the object surface from any lenslet, along the axis of the lenslet, is always z. Further, the set of photoreceptors (not shown in Figure 2 to avoid clutter, but clearly indicated in Figure 1) for each lenslet is assumed to lie in a plane perpendicular to the axis of the lens and at a distance f from the optical center of the lens. Thus, the image surface (photoreceptors) for the entire system is also spherical and centered at O, but with a radius of R − f . With this arrangement, some of the light captured by a particular lenslet would be focused on the photoreceptors associated with an adjacent lenslet. In order to prevent such crosstalk, an opaque wall could be constructed between adjacent optical channels as has been done for the case of lenslets arranged on a planar surface [10].
To design the system, the following parameter values need to be determined.
(1) The diameter, D, and focal length, f, of each lenslet. (2) The pixel pitch, d. (3) The radius, R, of the surface on which the lenslets are to be arranged. (4) The number of lenslets, 2K + 1, assuming K lenses on either side of the axis of the system, required to achieve the specified FOV θ_FOV.
Since the lenslets are small in size, the angular separation ϕ between the axes of successive lenslets is given by

ϕ ≈ D/R. (1)

The total (half-angle) FOV θ_FOV is related to the (half-angle) FOV θ of each lens by

θ_FOV = Kϕ + θ. (2)

A systematic approach to arrive at appropriate values for the parameters above is outlined next.

Resolution and lenslet diameter
Resolution of an optical system refers to its ability to distinguish between two closely spaced point sources in object space. A real lens cannot distinguish between point sources placed arbitrarily close to each other in the object space. As the object points get closer, the contrast of their captured images keeps decreasing until the two point sources are captured as a single point in the image. The contrast of a signal refers to the amount the signal varies about its mean value divided by the mean value of the signal and is sometimes referred to as the modulation depth [14, page 545]. It is a measure of how discernible the fluctuations in the signal will be against the dc background. In order to resolve finely spaced features in the object space, the contrast in their captured image must be high. Any measure of resolution, therefore, must necessarily include contrast.
The resolution of a lenslet is typically characterized by its response to different spatial frequencies. The relevant analysis is presented next for the 1D case, but can be generalized to 2D to yield similar results.
The pupil function of a diffraction-limited lenslet (no optical aberrations) of diameter D is [15, page 102]

P(x) = 1 for |x| ≤ D/2, and 0 otherwise. (3)

The PSF of the lenslet for coherent light, denoted by b(x), and its Fourier transform (FT), B(f_x), are given by [15, page 130]

b(x) = ∫ P(ξ) exp(−j2πξx/(λz_i)) dξ, B(f_x) = P(λz_i f_x), (4)

where z_i is the image distance at the photoreceptor array from the corresponding lenslet. For incoherent light, the PSF is given by

|b(x)|². (5)

The optical transfer function (OTF) is the normalized Fourier transform (FT) of |b(x)|² and is given by [15, page 139]

OTF(f_x) = [∫ P(ξ + λz_i f_x/2) P(ξ − λz_i f_x/2) dξ] / [∫ |P(ξ)|² dξ]. (6)

The magnitude of the OTF, which is called the MTF, is the ratio of image contrast to object contrast as a function of spatial frequency, or equivalently, the ratio of image-to-object modulation depths. For a circularly shaped, diffraction-limited lenslet, the MTF is as shown in Figure 3 [15]. From the figure, it is observed that the MTF always reduces contrast. Also, the MTF is bandlimited to 1/λF. To characterize the resolution of the lenslet, we consider a periodic array of point sources of equal strength at a distance z from the lenslet. If the spacing between successive sources is Δz, then the fundamental frequency of the input signal is 1/Δz. The magnification M is given by M = z_i/z ≈ f/z. The fundamental frequency of the image of the point sources is, therefore,

f_res = 1/(MΔz) = z/(fΔz). (7)

By examining the Fourier series coefficients of the sources, it is easy to see that the contrast of the sources at the fundamental frequency is 100%. Therefore, the contrast in their images, at the fundamental frequency, is the MTF value at that frequency. The sources are considered to be resolved if this value is higher than some chosen value C (0 < C < 1). If the required contrast is 50% (corresponding to C = 1/2), then the range of frequencies for which the MTF exceeds C is (from Figure 3)

f_res < 0.4/(λF). (8)

Using (7) in (8) and simplifying gives

Δz > 2.5λz/D. (9)

Thus, the minimum resolvable angular separation in object space is

δθ = Δz/z = 2.5λ/D. (10)

The smallest value of D that meets the desired specifications is chosen.
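The numbers behind the 50%-contrast criterion can be checked numerically. For a circular aperture, the diffraction-limited MTF at normalized radial frequency u = f_x/(1/λF) is (2/π)(cos⁻¹u − u√(1 − u²)); the bisection search below is a sketch (not from the paper) that locates the frequency where the contrast falls to a chosen C:

```python
import math

def mtf_circular(u):
    """Diffraction-limited MTF of a circular aperture at normalized
    spatial frequency u = f_x / (1/(lambda*F)), 0 <= u <= 1."""
    return (2.0 / math.pi) * (math.acos(u) - u * math.sqrt(1.0 - u * u))

def freq_at_contrast(C, tol=1e-9):
    """Normalized frequency at which the MTF drops to contrast C,
    found by bisection (this MTF is monotonically decreasing)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mtf_circular(mid) > C:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

u50 = freq_at_contrast(0.5)    # ~0.404: origin of the 0.4/(lambda*F) bound
u09 = freq_at_contrast(0.09)   # ~1/1.22: the Rayleigh-criterion contrast
```

The 50% point lands at u ≈ 0.404, which is where the 0.4/λF bound comes from, while C = 0.09 gives u ≈ 1/1.22.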
Note that a choice of C = 0.09 (corresponding to a contrast of 9%), would yield Δz = 1.22λz/D, which corresponds to the resolution that would have been obtained by using the Rayleigh criterion [14, page 463].
As an example, suppose that the desired resolution is 5 cm at a distance of 50 m. Then, δθ = 1 mrad (milliradian) and, for λ = 0.5 μm, the corresponding value of D from (10) is 1.25 mm.
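This example can be verified directly. The wavelength λ = 0.5 μm is an assumption consistent with visible-light operation, since the 50%-contrast criterion δθ = 2.5λ/D requires a value for λ:

```python
import math

# Worked example from the text: desired resolution 5 cm at a distance of
# 50 m; lambda = 0.5 um is an assumed visible-light wavelength.
wavelength = 0.5e-6          # m (assumed)
delta_z, z = 0.05, 50.0      # resolvable spacing (m) and object distance (m)

dtheta = delta_z / z                 # resolvable angular separation (rad)
D = 2.5 * wavelength / dtheta        # lenslet diameter from the 50% criterion

assert abs(dtheta - 1e-3) < 1e-12    # 1 mrad
assert abs(D - 1.25e-3) < 1e-9       # 1.25 mm
```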

Optical aberrations and f-number
The analysis in the preceding section assumed that the optical system was diffraction limited and free from optical aberrations. In practice, lenses can suffer from a variety of optical aberrations. These depend on the diameter, D, of the lens, its f-number, F, and the shape of its surfaces. The value of D to be used for the lenslets is already fixed from the previous subsection. Also, as stated in the first assumption in Section 2.3, the lenses are assumed to be symmetric and biconvex with perfectly spherical surfaces (the procedure presented here can, however, be easily extended to lenses of different shapes). Consequently, it only remains to choose a suitable value of F that keeps degradation owing to aberrations negligible. For this, it is desired to investigate the effects of optical aberrations on the OTF of the lens. In the aberration-free case, the OTF for incoherent light is related to the pupil function, P(x), by (6). In the presence of aberrations, the pupil function is modified to be [15]

P'(x) = P(x) exp(jkΦ(x)), (11)

where k = 2π/λ and Φ(x) is known as the wave aberration function. P'(x) is referred to as the generalized pupil function [15]. A geometric optics-based explanation of the quantity Φ(x) is provided in [13] and is presented here briefly for clarity. Consider a rotationally symmetric optical system as shown in Figure 4. Let P_0 be an object point and P_1* its Gaussian image. D_0 is the distance of the object plane from the entrance pupil. P̄_1 and P_1 are the points at which a ray from P_0 intersects the plane of the exit pupil and the Gaussian image plane, respectively. Let W be the wavefront through the center O_1 of the exit pupil associated with the rays that reach the image space from P_0. In the absence of aberrations, W coincides with a spherical wavefront S which passes through O_1 and is centered on P_1*.
The wave aberration function, Φ, at P̄_1 is the optical path length (refractive index of the medium times the geometric length) between S and W along the ray P̄_1P_1. Let P_0 and P̄_1 in Figure 4 be represented in polar coordinates by (h cos β_0, h sin β_0) and (r cos β, r sin β), respectively. It is shown in [13] that Φ can be expanded as a polynomial containing terms involving only h², r², and hr cos(β − β_0), of even total order (order of h plus order of r) greater than 2 [13, Chapter 5]. The fourth-order terms constitute what are known as the primary aberrations; higher-order terms are usually ignored as they do not have a significant effect on the OTF. The five primary aberrations are spherical aberration, astigmatism, field curvature, distortion, and coma. Expressions for these terms have been derived in [13] for a general centered optical system. These expressions show that lowering the f-number of the lens increases the effects of the primary aberrations. Having determined Φ(x), P'(x) can be calculated from (11). The OTF, and hence the MTF, is then evaluated by replacing P(x) by P'(x) in (6). It can be shown that the presence of optical aberrations always lowers the MTF from its diffraction-limited value [15, Chapter 6]. Thus, F should be selected such that the MTF with aberrations is not significantly degraded compared to the diffraction-limited MTF. The smallest value of F that causes the MTF at f_res to drop by no more than some chosen value, e_1, is found. Figure 5 shows the MTF curves obtained for two different values of F. The choice of F = 8 is seen to result in an MTF plot which is closer to the diffraction-limited plot in Figure 3. This choice of F causes the MTF value at f_res = 0.4/λF to drop by only 2% from the diffraction-limited case (in Figure 3). Substituting D = 1.25 mm (from the previous subsection) and F = 8 in F = f/D gives f = 10 mm.
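The effect of an aberration on the MTF can be sketched numerically by autocorrelating the generalized pupil of (11). In the 1D sketch below, the spherical-aberration coefficient W040 is an arbitrary illustrative value, not a quantity from the design:

```python
import numpy as np

# 1D generalized-pupil MTF computation in the spirit of (6) and (11).
# W040 (peak spherical aberration, in waves) is an assumed value.
lam = 0.5e-6                  # wavelength (m)
D = 1.25e-3                   # lenslet diameter (m)
W040 = 0.25                   # peak spherical aberration in waves (assumed)

n = 4096
x = np.linspace(-D / 2, D / 2, n)
P = np.ones(n, dtype=complex)                    # clear pupil
Phi = W040 * lam * (2 * x / D) ** 4              # wave aberration (m)
Pg = P * np.exp(1j * 2 * np.pi / lam * Phi)      # generalized pupil

def mtf(pupil):
    """Magnitude of the normalized pupil autocorrelation (the MTF)."""
    ac = np.correlate(pupil, pupil, mode="full")  # conjugates 2nd argument
    return np.abs(ac) / np.abs(ac[n - 1])         # index n-1 is zero lag

mtf_dl = mtf(P)    # diffraction-limited
mtf_ab = mtf(Pg)   # with spherical aberration

# Aberrations can only lower the MTF below the diffraction limit.
assert np.all(mtf_ab <= mtf_dl + 1e-9)
```

The inequality in the final assertion follows from the triangle inequality applied to the pupil autocorrelation, which is exactly the statement used above to select F.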
Also, using F = 8 in θ = tan⁻¹(1/(2F)) (from Table 1) gives the FOV of each lenslet as θ ≈ 0.0625 rad = 3.58°.
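The two lenslet parameters fixed so far follow directly from D and F:

```python
import math

# Lenslet parameters fixed so far: D from the resolution requirement,
# F from the aberration analysis.
D, F = 1.25e-3, 8.0
f = F * D                                # focal length, from F = f/D
theta = math.atan(1.0 / (2.0 * F))       # half-angle FOV per lenslet (Table 1)

assert abs(f - 10e-3) < 1e-9             # f = 10 mm
assert abs(math.degrees(theta) - 3.58) < 0.01
```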

Pixel pitch
The image formed by a lens is sampled in the image plane by the pixels. The pixel pitch, d, determines the sampling rate in the image plane. Each pixel measures the average light flux incident over its area. This causes additional blurring over and above that caused by the PSF of the lens. However, not all of the pixel area is available for light gathering. The ratio of the active pixel area to the total pixel area is referred to as the fill factor γ, 0 < γ < 1 [16]. The larger the value of γ, the greater the blur caused by the pixel. For the following discussion, γ ≈ 1 is assumed.
Since the OTF of the lenslet is bandlimited to 1/λF, it is possible to avoid aliasing completely by choosing d < 0.5λF. This is the Nyquist sampling criterion. However, for a given fill factor, a smaller pixel pitch also implies that the area available to capture photons is smaller, and hence fewer photons per pixel are captured for the same irradiance. It is known that the number of photons collected by a pixel is a Poisson random variable having standard deviation equal to the square root of the mean number of photons captured per pixel [17, page 74]. Thus, for a photon-noise-limited imaging system, the SNR increases in proportion to the square root of the pixel area. The choice of d therefore involves a tradeoff between aliasing and SNR. In practice, the captured signal may be corrupted by additional sources of noise such as thermal reset noise, fixed pattern noise (FPN), and flicker noise [18, and references therein]. However, by the use of techniques such as correlated double sampling (CDS) [18], it is possible to significantly reduce or even eliminate these sources of noise. In the subsequent analysis, therefore, only shot noise (photon noise) will be considered.
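The square-root SNR scaling can be confirmed with a quick Monte Carlo sketch; the mean photon counts below are illustrative values, not design quantities:

```python
import numpy as np

# Shot-noise SNR versus pixel area: photon counts are Poisson, so the SNR
# (mean/std) grows as the square root of the mean count, and hence as the
# square root of the pixel area for a fixed irradiance.
rng = np.random.default_rng(0)

def empirical_snr(mean_photons, trials=200_000):
    counts = rng.poisson(mean_photons, size=trials)
    return counts.mean() / counts.std()

snr_small = empirical_snr(10_000)   # pixel of area A (illustrative mean)
snr_large = empirical_snr(40_000)   # pixel of area 4A, same irradiance

# Quadrupling the area should roughly double the SNR.
assert abs(snr_large / snr_small - 2.0) < 0.05
```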
It is desirable to choose a pixel size that will achieve the optimal tradeoff between the conflicting requirements of SNR and aliasing. The optimality criterion used here is based on an information-theoretic metric. The definition of the metric and an expression for it, given in [9], are presented next.
Consider a planar object placed at a large distance z_0 from a lenslet. Since z_0 is large, z_i ≈ f holds, where z_i is the image distance. For the purpose of calculation, it is reasonable to treat each point on the object plane as an independent Lambertian source. Under this assumption, the radiance, L_0(x_0, y_0), at a point P_0(x_0, y_0) in the object plane depends only on its coordinates and not on the direction from which the point is viewed. The radiance field, L(x, y), in the image plane is a spatially scaled version of the radiance field in the object plane and is given by

L(x, y) = L_0(x/M, y/M). (12)

Let the combined PSF of the lenslet and photodetector be denoted by h(x, y). The incident field, L(x, y), is blurred by h(x, y). This blurred field is then sampled at the pixel locations (kd, ld) and corrupted by the photodetector noise n[k, l] to give the captured signal s[k, l]. This process can be represented by

s[k, l] = Kg[k, l] + n[k, l], (13)

where g(x, y) = L(x, y) * h(x, y), g[k, l] = g(kd, ld), and K is the steady-state gain of the linear radiance-to-signal conversion. In this paper, both s[k, l] and n[k, l] will be measured in terms of the number of photoelectrons. The mutual information between the sampled signal, s[k, l], and the radiance field L(x, y) is defined as

I(s, L) = H(s) − H(s | L), (14)

where H(s) is the entropy of s[k, l] and H(s | L) is the entropy of s[k, l] given L(x, y). L(x, y) is modeled as a wide-sense stationary (WSS) stochastic process having power spectral density (PSD) S_L(Ω_1, Ω_2). Then, the PSD, S_g(Ω_1, Ω_2), of g(x, y) is given by

S_g(Ω_1, Ω_2) = |H(Ω_1, Ω_2)|² S_L(Ω_1, Ω_2), (15)

where H(Ω_1, Ω_2) is the Fourier transform (FT) of h(x, y).
Since g[k, l] is obtained by sampling g(x, y), the PSD, S̃_g(ω_1, ω_2), of g[k, l] is related to S_g(Ω_1, Ω_2) by

S̃_g(ω_1, ω_2) = (1/d²) Σ_k Σ_l S_g((ω_1 − 2πk)/d, (ω_2 − 2πl)/d). (16)

Define the unaliased component of the sampled PSD,

S_u(ω_1, ω_2) = (1/d²) S_g(ω_1/d, ω_2/d). (17)

Then, it is stated in [9] that I(s, L) in (14) is given by

I(s, L) = (1/(8π²)) ∫_{−π}^{π} ∫_{−π}^{π} log_2 [1 + K² S_u(ω_1, ω_2) / (K² S_a(ω_1, ω_2) + S_n(ω_1, ω_2))] dω_1 dω_2, (18)

where the aliased component

S_a(ω_1, ω_2) = S̃_g(ω_1, ω_2) − S_u(ω_1, ω_2) (19)

is treated as noise, and S_n(ω_1, ω_2) is the PSD of the discrete-domain noise n[k, l]. It remains to determine the expressions for the various quantities required in the calculation of I(s, L) in (18). These include the gain, K, the PSF, h(x, y), and the statistics of both the signal, L(x, y), and the noise n[k, l].
We start by assuming that L(x, y) has mean L_0 and covariance K_L(x, y) = σ_L² e^{−r/μ}, where r = √(x² + y²) and μ is the mean spatial detail of the radiance field [9]. μ can be taken to be (δθ)f, where δθ is the resolvable angular separation in (10) and f is the focal length of the lenslet. The PSD, S_L(Ω_1, Ω_2), is then given by

S_L(Ω_1, Ω_2) = 2πσ_L²μ² / (1 + μ²ρ²)^{3/2}, (20)

where ρ = √(Ω_1² + Ω_2²). The radiance of the source is converted to irradiance E(x, y) in the image plane, and the two quantities are related by [19]

E(x, y) = πL(x, y)/(4F²). (21)

The resulting irradiance is blurred by the PSF, b(x, y), of the lens for incoherent light and integrated over the area of a single pixel to give the total optical power, φ(x, y), incident at the pixel. Integration over the pixel area can be modeled as convolution with the function

a(x, y) = 1 for |x| ≤ d/2, |y| ≤ d/2, and 0 otherwise, (22)

so that h(x, y) = b(x, y) * a(x, y) is the combined PSF of the lenslet and the pixel. The spatial frequency response of the system is, therefore,

H(Ω_1, Ω_2) = B(Ω_1, Ω_2) A(Ω_1, Ω_2), (23)

where B(Ω_1, Ω_2) is the OTF of the lenslet as given by (6), and

A(Ω_1, Ω_2) = d² sinc(Ω_1 d/2) sinc(Ω_2 d/2), sinc(u) = sin(u)/u, (24)

is the FT of a(x, y).
Assuming that the light source is a monochromatic source of wavelength λ, the mean number of photons detected at the pixel location [k, l] per second is given by

n_ph[k, l] = Q(λ) φ(kd, ld) λ/(h_p c), (25)

where Q(λ) is the quantum efficiency of the pixel material [17], h_p is Planck's constant, and c is the speed of light. The gain K is, therefore, given by

K = πQ(λ)λ t_int / (4F² h_p c), (26)

where t_int is the integration time. Since the number of photons collected at the pixel location (kd, ld) is actually a discrete Poisson random variable, its mean and variance are equal [20, page 108] and determined by (25). The number of photoelectrons generated in response to this in a time interval t_int is, therefore, also a Poisson random variable [17], whose mean and variance are each equal to n_pe[k, l] = t_int n_ph[k, l]. The variance of this random variable constitutes the shot-noise power. Thus, strictly speaking, the shot noise in each pixel depends on the signal strength at that pixel. However, this dependence is complex. It is usually acceptable to consider the noise to be uncorrelated with the signal and the noise power in all pixels to be equal to n_pe^(0), the noise generated by the average value, L_0, of the illumination. Replacing L(x, y) by L_0 in (21) and following the same reasoning leads to n_pe^(0) = KL_0 H(0, 0). From (24), it is easy to see that

n_pe^(0) = KL_0 d². (27)

Since the noise is assumed to be white, we have

S_n(ω_1, ω_2) = n_pe^(0). (28)

This gives the expressions for all the quantities needed to calculate I(s, L) in (18). Typical values of some of the parameters involved in this calculation are given in Table 2. An approximate conversion from photometric to radiometric units suffices for our purpose. The choice of some of the parameter values presented above is justified next.
(1) The lighting condition of the input scene was assumed to be that present on an overcast day. Also, assuming that the scene has good contrast, it is reasonable to choose σ_L = 6 (approximately one-third of the mean value L_0). (2) A quantum efficiency value of 0.3 is typical for pixels sensing light in the visible range [16].
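The pixel-pitch sweep based on (18) can be sketched numerically as follows. Since Table 2 is not reproduced here, the radiometric and timing constants below (L0 = 18, t_int = 10 ms, the grid and replica counts, and the circular-aperture OTF model) are assumed values chosen to be plausible; the sketch illustrates the shape of the computation rather than reproducing Figure 6 exactly, and the location of the maximum depends on these assumptions:

```python
import numpy as np

# Numerical sketch of the mutual-information criterion for pixel pitch d.
# Constants marked "assumed" stand in for Table 2.
lam, F = 0.5e-6, 8.0                     # wavelength (m), f-number
L0, sig = 18.0, 6.0                      # mean radiance and std dev (assumed)
mu = 1e-5                                # mean spatial detail (delta_theta)*f (m)
Q, t_int = 0.3, 1e-2                     # quantum efficiency; t_int (assumed)
h_p, c = 6.626e-34, 3.0e8                # Planck's constant, speed of light
Kg = np.pi * Q * lam * t_int / (4 * F**2 * h_p * c)   # gain K

def S_L(w1, w2):
    """Radiance-field PSD (exponential covariance model)."""
    return 2 * np.pi * sig**2 * mu**2 / (1 + mu**2 * (w1**2 + w2**2)) ** 1.5

def otf(w1, w2):
    """Diffraction-limited circular-aperture OTF, radial cutoff 1/(lam*F)."""
    u = np.minimum(np.hypot(w1, w2) / (2 * np.pi / (lam * F)), 1.0)
    return (2 / np.pi) * (np.arccos(u) - u * np.sqrt(1 - u**2))

def pix(w1, w2, d):
    """FT of the square pixel aperture a(x, y)."""
    return d**2 * np.sinc(w1 * d / (2 * np.pi)) * np.sinc(w2 * d / (2 * np.pi))

def S_g(w1, w2, d):
    """PSD of the blurred field g(x, y)."""
    return (otf(w1, w2) * pix(w1, w2, d)) ** 2 * S_L(w1, w2)

def mutual_info(d, m=65, reps=4):
    """Grid approximation of the mutual-information integral (18)."""
    w = np.linspace(-np.pi, np.pi, m)
    W1, W2 = np.meshgrid(w, w)
    Su = S_g(W1 / d, W2 / d, d) / d**2                       # unaliased part
    St = sum(S_g((W1 - 2 * np.pi * k) / d, (W2 - 2 * np.pi * l) / d, d) / d**2
             for k in range(-reps, reps + 1) for l in range(-reps, reps + 1))
    Sa = St - Su                                             # aliased part
    Sn = Kg * L0 * d**2                                      # shot-noise PSD
    integrand = np.log2(1 + Kg**2 * Su / (Kg**2 * Sa + Sn))
    return integrand.sum() * (w[1] - w[0]) ** 2 / (8 * np.pi**2)

pitches = np.linspace(2e-6, 8e-6, 13)    # Nyquist pitch up to 4x undersampling
info = np.array([mutual_info(d) for d in pitches])
best = pitches[info.argmax()]            # pitch maximizing the criterion
```

Larger d increases the per-pixel signal relative to the shot noise while folding more aliased power into the band; the criterion trades these off, which is what produces an interior maximum in Figure 6.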
To determine the optimal value of d, I(s, L) is calculated for various values of d and the resulting curve is plotted in Figure 6. A value of F = 8, as determined in the previous section, is used to obtain this plot. For this value of F, the optical bandwidth is 1/λF = 2.5 × 10⁵ cycles/m. The Nyquist sampling interval is therefore 0.5λF = 2 μm. In the plot shown in Figure 6, d is varied from 2 μm (Nyquist sampling) to 8 μm (undersampling by a factor of 4). The curve shows a maximum at d = 3.6 μm, reflecting the tradeoff between aliasing and SNR. Note that choosing d = 3.6 μm implies undersampling by a factor of 3.6/2 = 1.8, leaving scope for enhancement of resolution by digital superresolution. This resolution enhancement is achieved by the recovery of frequency components lost due to aliasing. The value of the fill factor γ (0 < γ < 1) determines the blur/SNR tradeoff. Specifically, a large γ gives better SNR at the expense of increased blur, while a small γ gives poorer SNR but less blur. To counter the degradations caused by the pixel fill factor and the lens PSF, additional filtering operations could be performed. From the second assumption in Section 2.3, we conclude that each point in the object space should be captured in 1.8 LR frames in order to attain resolution up to the diffraction limit. Hence, N = 2 is chosen, since the number of images must be an integer.

Figure 7 shows a single lenslet placed on a spherical surface of radius R centered at O. The object surface is also spherical and centered at O, but has a radius of R + z, where z is the object distance. A particular lenslet captures the image of a limited region in the object space, the extent of the region being determined by its FOV, θ. Suppose that this region subtends an angle 2α at O. The number of LR images in which a point in the object space is captured depends on both ϕ (defined in (1)) and α. For each point to be captured N times, it is required that

2α ≥ Nϕ. (29)

Resolution enhancement factor and radius of curved surface
From (1) and (29), we get

α = ND/(2R). (30)

Also, from the geometry of Figure 7, it can be shown, after some calculations, that

sin α = tan θ (R sec²θ − R + √(R² + z² + 2zR sec²θ)) / ((R + z) sec²θ). (31)

Equations (30) and (31) can be solved simultaneously for both α and R using either numerical or graphical techniques. Once R is known, ϕ can be determined from (1). Use of this in (2) allows one to determine K and hence the total number of lenslets required to achieve the specified field of view. Substituting D = 1.25 mm (from Section 3.1), θ = 3.58° (from Section 3.2), and N = 2 in (30) and (31) and solving gives R = 2 cm. Hence ϕ ≈ D/R = 0.0625 rad = 3.58°. Using this and θ_FOV = 90° (for a total FOV of 180°) in (2) gives K = 25. This completes the design of the large FOV optical system.
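Because z ≫ R in this design (50 m versus a few centimeters), the angle α subtended at O by one lenslet's footprint is very nearly the lenslet half-angle FOV θ, and the simultaneous solution of (30) and (31) collapses to a closed form. The sketch below uses that approximation to check the quoted numbers:

```python
import math

# Back-of-the-envelope check of the curved-surface geometry, using the
# z >> R approximation alpha ~ theta in place of the full equation (31).
N = 2                                  # required number of overlapping LR views
D = 1.25e-3                            # lenslet diameter (m)
theta = 0.0625                         # lenslet half-angle FOV (rad)
theta_fov = math.pi / 2                # specified half-angle FOV (90 degrees)

R = N * D / (2 * theta)                # from (30) with alpha ~ theta
phi = D / R                            # lenslet angular pitch, eq. (1)
K = math.ceil((theta_fov - theta) / phi)   # smallest K satisfying (2)
num_lenslets = 2 * K + 1

assert abs(R - 0.02) < 1e-6            # R = 2 cm
assert K == 25                         # 51 lenslets across the meridian
```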

CONCLUSIONS AND FUTURE WORK
A systematic procedure for the design of a miniaturized imaging system with a specified field of view and a specified resolution has been presented here. A large FOV is obtained by arranging lenslets on a curved surface. An optimal value of the pixel pitch in the image plane is determined by considering the mutual information between the incident radiance field and the image captured by each lenslet. This value turns out to be larger than that required for Nyquist sampling; consequently, superresolution techniques [5][6][7][8] can be used to compensate for the resulting loss of resolution due to aliasing and to obtain resolution up to the diffraction limit of the optics.
The design procedure presented here seeks to maximize the mutual information, I(L, X_i), between the radiance field L and each of the captured LR frames X_i, i = 1, ..., n, independently of the subsequent processing performed on the LR frames. However, to obtain a truly end-to-end optimized imaging system, the mutual information between the radiance field L and the HR image, Y, formed after superresolution should be considered. The distinction between this and the approach presented in this paper is shown in Figure 8. Such an analysis is considerably more complicated and is being explored as part of future work.
Finally, a number of generalizations can be made to the design approach suggested here. These include (i) hexagonal arrangement of lenslets on the curved surface and of pixels in the image plane to achieve greater packing density; (ii) carrying out the SNR and aberration analysis for polychromatic light instead of monochromatic light; (iii) exploring the utility of the system to realize superresolution in 3D imaging.
Although the above generalizations will complicate the calculations involved in the design, it is expected that the same design principles and steps can be used.