SCIENCE CHINA Information Sciences
. RESEARCH PAPER .
June 2012 Vol. 55 No. 6: 1429–1435 doi: 10.1007/s11432-011-4495-1
CMOS image sensor with optimal video sampling scheme

GUAN Ning*, ZHANG Xu, LIU Bo, DONG Zan, HUANG BeiJu, GUI Yun, HAN JianQiang, WANG Yuan, ZHANG ZanYun & CHEN HongDa

State Key Laboratory on Integrated Optoelectronics, Institute of Semiconductors, Chinese Academy of Sciences, Beijing 100083, China

Received April 8, 2011; accepted May 13, 2011; published online April 12, 2012
Abstract  Time-varying illumination on the focal plane is a three-dimensional signal. Multidimensional sampling theory proves that the temporal resolution can be optimally improved by a factor of √2 while the spatial resolution is preserved by changing the sampling scheme. Based on this theory, a prototype multi-field CMOS image sensor (CIS) is designed in a 0.35-μm 2P4M CMOS process. Corresponding pixels in 4×4-pixel clusters are assembled into 16 fields over the whole array. The control pins (resets and shutters) of the pixels are separated, which makes it possible to sample the illumination with the optimal sampling scheme.

Keywords  CMOS image sensors, sampling methods, deinterlacing, high speed photography, selective readout
Citation Guan N, Zhang X, Liu B, et al. CMOS image sensor with optimal video sampling scheme. Sci China Inf Sci, 2012, 55: 1429–1435, doi: 10.1007/s11432-011-4495-1
1  Introduction
The time-varying image on the focal plane is a three-dimensional signal. A video is formed when it is sampled periodically by the pixel array. The sampling of a multidimensional signal differs from that of a one-dimensional signal. Image sensors use hexagonal pixels instead of orthogonal pixels because, for the same two-dimensional signal (a stationary image), the former offer higher sampling efficiency, so fewer pixels are required on the focal plane to achieve the same resolution. For a three-dimensional signal, the sampling efficiency can also be increased by changing the sampling scheme [1], which means a lower pixel rate is needed to achieve a given spatiotemporal resolution. The pixel rate is defined as the number of pixel values that can be extracted from the pixel array during a given period.

Increasing the pixel rate increases the speed of a CMOS image sensor. There are currently two kinds of methods to do so: in situ storage [2,3] and parallel analog-to-digital conversion [4–7]. But the maximum achievable pixel rate is limited by many physical factors, including chip area, power consumption and image quality. If we want to further increase the speed, the sampling scheme should be changed. Moreover, we can also achieve the same spatiotemporal resolution with less resource consumption.

*Corresponding author (email: [email protected])
c Science China Press and Springer-Verlag Berlin Heidelberg 2012
In the TV realm, interlacing is a common technique that splits a frame into two fields comprising the odd and even lines respectively. Originally, this method was used to reduce the bandwidth of the video signal. When the era of progressively scanned displays came, many deinterlacing algorithms [8] were developed to show interlaced video on progressively scanned devices. They calculate the non-existing lines in a field and generate a full frame. The deinterlacing algorithms are evaluated by comparing the original frame with the generated frame using objective criteria such as mean square error (MSE) and peak signal-to-noise ratio (PSNR). Although subjective tests are still in use, researchers now have methods to predict the effect of the interlacing-deinterlacing process. Since the frame rate of a high-speed CIS is mainly limited by its pixel rate, interlacing could be used to reduce the readout time, and deinterlacing to reconstruct the full-resolution frames.

In this paper, a new kind of active pixel array is described. The sensor uses four-transistor one-capacitor (4T1C) pixels with shutters. Using all four layers of metal, we can separate the control pins (resets and shutters) of neighboring 4×4 pixels. This configuration allows us to sample the frame in different schemes, including the optimal one with the highest sampling efficiency, which is impossible for other video-acquiring devices. Pixels still share output column buses, so they are digitized with a column-level analog-to-digital converter.

The rest of the paper is organized as follows. Section 2 discusses properties of the signal on the focal plane and reviews the sampling theory for 3-D signals. Section 3 develops a preliminary reconstructing algorithm for one of the sampling schemes. Section 4 presents details of the multi-field imager and some experimental results.
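As a concrete illustration of this evaluation procedure, the sketch below scores a naive line-doubling deinterlacer against a ground-truth frame with MSE and PSNR. The tiny 4×2 frames are made up for the example; they are not data from the paper.

```python
import math

def mse(frame_a, frame_b):
    """Mean squared error between two equally sized grayscale frames."""
    h, w = len(frame_a), len(frame_a[0])
    total = sum((frame_a[i][j] - frame_b[i][j]) ** 2
                for i in range(h) for j in range(w))
    return total / (h * w)

def psnr(frame_a, frame_b, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical frames."""
    e = mse(frame_a, frame_b)
    return float("inf") if e == 0 else 10.0 * math.log10(peak ** 2 / e)

# Toy example: "line doubling" deinterlace of the even field,
# compared against the original full frame.
original = [[10, 10], [20, 20], [30, 30], [40, 40]]
deinterlaced = [[10, 10], [10, 10], [30, 30], [30, 30]]  # even lines repeated
print(mse(original, deinterlaced))            # 50.0
print(round(psnr(original, deinterlaced), 2))  # 31.14
```

Lower MSE (equivalently, higher PSNR) means the interlacing-deinterlacing round trip preserved the frame better.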
2  Signal on the focal plane and sampling schemes
The original signal on the focal plane is a time-varying two-dimensional illumination, ψ(x, y, t). Spatially, ψ(x, y, t) is filtered by the optical system, divided into an array of pixels and averaged over each pixel opening. Temporally, electrons generated by the incident light are collected during an integration process and then sampled as a voltage signal at the end of the integration. These mechanisms make the signal band-limited both spatially and temporally. The spectrum of ψ(x, y, t) is an ellipsoid whose radii are fx max, fy max and ft max. The multi-dimensional Nyquist condition is not simply the superposition of multiple 1-D criteria [1], so we can change the sampling scheme while still fulfilling the Nyquist condition.

As shown in Figure 1, the sampling schemes of ψ(x, y, t) can be represented graphically in a projecting way, in which pixels sampled at different times are all projected onto the x-y plane. Each circle represents a pixel and the number inside indicates the order of sampling. All circles carrying the same number are sampled simultaneously and together form a field. The total sampling period is always Δt, so the interval between consecutive fields is Δt/n, in which n is the number of fields in Δt.

Since the sampling schemes in Figure 1 are three-dimensional, they are analyzed with lattice theory [1, 9], which was originally borrowed from solid-state physics. The results are listed in Table 1. The other three sampling schemes are more efficient than the normal orthorhombic sampling (Figure 1(a)). By changing the sampling scheme, we can increase the temporal resolution without increasing the pixel rate. For instance, if we change the sampling scheme from orthorhombic (Figure 1(a)) to face-centered orthorhombic (Figure 1(c)), the maximum identifiable spatial frequency components fx max and fy max are still Δx⁻¹/2 and Δy⁻¹/2, while the temporal component ft max is improved by a factor of √2 (from Δt⁻¹/2 to Δt⁻¹/√2).
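The sampling efficiencies in Table 1 follow from the ratio between the volume of the band-limited spectrum (a unit sphere in normalized frequency coordinates, volume 4π/3) and the volume of each lattice's Brillouin zone. A short sketch reproducing the table's values:

```python
import math

# Sampling efficiency = (volume of the band-limited spectrum, a unit sphere
# in normalized frequency units, 4*pi/3) / (volume of the Brillouin zone).
SPHERE = 4.0 * math.pi / 3.0

brillouin_volume = {          # normalized Brillouin-zone volumes from Table 1
    "ORT":  8.0,              # plain orthorhombic sampling
    "ALI":  4.0 * math.sqrt(3.0),
    "FCO":  4.0 * math.sqrt(2.0),
    "HEX4": 6.0,
}

for scheme, vol in brillouin_volume.items():
    print(f"{scheme}: {SPHERE / vol:.3f}")
# ORT: 0.524, ALI: 0.605, FCO: 0.740, HEX4: 0.698
```

FCO has the smallest Brillouin zone for the same spectrum, hence the highest efficiency, which is why Section 3 focuses on it.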
3  Reconstructing algorithms
In this section, only reconstructing algorithms for FCO are discussed. As shown in Table 1, the spatial resolution of FCO is not degraded, so there is no need to apply an extra low-pass spatial filter. Moreover, FCO sampling has the highest sampling efficiency among all sampling schemes [9], which means it offers
Figure 1  Projecting representations of sampling schemes. (a) Orthorhombic (ORT); (b) aligned line-interlaced (ALI); (c) face-centered orthorhombic (FCO); (d) hexagonal with four-field periodicity (HEX4).
Table 1  Maximum sampling efficiency of different sampling schemes

Sampling scheme   Volume of Brillouin zone   Sampling efficiency   Maximum identifiable fx*   fy*    ft*
ORT               8                          π/6 ≈ 0.524           1/2                        1/2    1/2
ALI               4√3                        π/(3√3) ≈ 0.605       1/2                        1/2    √3/3
FCO               4√2                        π/(3√2) ≈ 0.740       1/2                        1/2    √2/2
HEX4              6                          2π/9 ≈ 0.698          1/2                        √3/6   2√3/3

*: They are scaled by Δx⁻¹, Δy⁻¹ and Δt⁻¹, respectively.
best overall spatiotemporal resolution. Once a video is sampled with FCO, it contains 2N fields. Each field comprises P × Q pixels, half of which are valid; the other half are not sampled and are denoted by zeros:

\psi_s(n_x, n_y, n_t) =
\begin{cases}
\psi(n_x\Delta x,\ n_y\Delta y,\ n_t\Delta t/2), & n_x + n_y + n_t \equiv 0 \pmod 2,\\
0, & \text{otherwise}.
\end{cases}
\qquad (1)

Applying a reconstructing algorithm to the video sequence, we fill these zeros with interpolated values and obtain 2N frames of P × Q pixels. Precisely, only √2·N frames are necessary, but this is only a one-dimensional sampling-rate conversion problem for each pixel: we can find a rational factor that approximates √2, then interpolate and decimate the frames.

Researchers have developed many deinterlacing algorithms for conventional interlaced video. Although we cannot use these algorithms directly for sampling schemes other than ALI, we can still design similar algorithms involving motion-adaptive [10], edge-dependent [11] and motion-compensated [12] methods. All these methods are designed under certain assumptions about the image on the focal plane, and they can recover video of full spatiotemporal resolution only if those assumptions still hold for the fast-varying image. So we derive the canonical weighting function for face-centered orthorhombic (FCO) sampling as in [9]:

h_{\mathrm{FCO}}(\tilde x, \tilde y, \tilde t) = \frac{\Delta x^{-1}\Delta y^{-1}\Delta t^{-1}}{4\sqrt{2}\,\pi^3(\tilde t^2 - 2\tilde x^2)(\tilde t^2 - 2\tilde y^2)}\big[-2\tilde t\sin(2\sqrt{2}\pi\tilde t)
+ (\sqrt{2}\tilde x + \tilde t)\sin(2\pi\tilde x + \sqrt{2}\pi\tilde t) + (\sqrt{2}\tilde x - \tilde t)\sin(2\pi\tilde x - \sqrt{2}\pi\tilde t)
+ (\sqrt{2}\tilde y + \tilde t)\sin(2\pi\tilde y + \sqrt{2}\pi\tilde t) + (\sqrt{2}\tilde y - \tilde t)\sin(2\pi\tilde y - \sqrt{2}\pi\tilde t)
- 2(\tilde x + \tilde y)\sin(2\pi\tilde x + 2\pi\tilde y) - 2(\tilde x - \tilde y)\sin(2\pi\tilde x - 2\pi\tilde y)\big]
\qquad (2)

in which

\tilde x = x f_{x\max} = \frac{1}{2}\frac{x}{\Delta x},\qquad
\tilde y = y f_{y\max} = \frac{1}{2}\frac{y}{\Delta y},\qquad
\tilde t = t f_{t\max} = \frac{\sqrt{2}}{2}\frac{t}{\Delta t}.
\qquad (3)
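Eq. (1) defines which lattice sites carry samples. A minimal sketch of that sampling pattern (the array size P = Q = 8 is chosen arbitrarily for illustration) confirms that each FCO field covers exactly half the pixels and that consecutive fields are complementary checkerboards:

```python
# Sketch of the FCO sampling pattern of eq. (1): a pixel (nx, ny) is sampled
# at field nt exactly when nx + ny + nt is even, so each field holds half
# of the P x Q pixels and consecutive fields are complementary.
P = Q = 8

def sampled(nx, ny, nt):
    return (nx + ny + nt) % 2 == 0

field0 = [(x, y) for x in range(P) for y in range(Q) if sampled(x, y, 0)]
field1 = [(x, y) for x in range(P) for y in range(Q) if sampled(x, y, 1)]

assert len(field0) == len(field1) == P * Q // 2
assert set(field0).isdisjoint(field1)   # complementary checkerboards
assert set(field0) | set(field1) == {(x, y) for x in range(P) for y in range(Q)}
```

Two consecutive fields together cover every pixel once, which is why a pair of fields can be interleaved into one full frame before interpolation.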
And the whole sequence is recovered with a 3-D convolution:

\psi(x, y, t) = \psi_s(x, y, t) * h_{\mathrm{FCO}}(x, y, t).
\qquad (4)
Practically, we cannot apply (4) over the whole video sequence, so a 3-D mask is used to implement the reconstructing algorithm. For directly sampled pixels, we use the sampled values. For the other half, we multiply the neighboring sampled pixels by predetermined coefficients and sum them to obtain interpolated values. For instance, if ψ(nx, ny, nt) is not a sampled pixel, the coefficients of its six nearest neighbors are calculated with eq. (2) and normalized to one:

\psi(n_x, n_y, n_t) = \frac{\pi^3}{8\pi + 16}\Big[\frac{2}{\pi^2}\psi_s(n_x - 1, n_y, n_t) + \frac{2}{\pi^2}\psi_s(n_x + 1, n_y, n_t)
+ \frac{2}{\pi^2}\psi_s(n_x, n_y - 1, n_t) + \frac{2}{\pi^2}\psi_s(n_x, n_y + 1, n_t)
+ \frac{16}{\pi^3}\psi_s(n_x, n_y, n_t - 1) + \frac{16}{\pi^3}\psi_s(n_x, n_y, n_t + 1)\Big].
\qquad (5)

Commonly used video test sequences [13] are used to evaluate the reconstructing method, and the mean squared error (MSE), defined as

\mathrm{MSE} = \frac{1}{P\times Q}\sum_{i=0}^{P-1}\sum_{j=0}^{Q-1}\big(\psi(i, j) - \psi_o(i, j)\big)^2,
\qquad (6)

is used to assess it. By the definition of MSE, the error introduced by interpolation is just another kind of noise. Eq. (5) offers quite promising results (MSE < 4) for quasi-stationary videos like "akiyo" and "grandma" but poor results (MSE > 100) for some videos with motion. On the other hand, if we need "smart" features like tracking [14], motion detection [15] or collision detection [16], image sensors and algorithms could also be designed based on the raw FCO-sampled sequence, where interpolation is unnecessary.
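The interpolation of eq. (5) can be sketched as a six-nearest-neighbor weighted average. The snippet below uses the unnormalized weights 2/π² (four spatial neighbors) and 16/π³ (two temporal neighbors) and normalizes them to unit sum, as the text describes; the 3×3×3 uniform test sequence is made up for the example, and for a flat scene the interpolated value must equal the neighbors:

```python
import math

# Six-nearest-neighbor interpolation for a missing FCO pixel, in the spirit
# of eq. (5): four spatial neighbors weighted 2/pi^2, two temporal neighbors
# weighted 16/pi^3, with the six weights normalized to sum to one.
W_S = 2.0 / math.pi ** 2        # spatial neighbor weight (unnormalized)
W_T = 16.0 / math.pi ** 3       # temporal neighbor weight (unnormalized)
NORM = 4.0 * W_S + 2.0 * W_T    # normalizer so the six weights sum to 1

def interpolate(seq, nx, ny, nt):
    """seq[nt][ny][nx] holds sampled values (zeros at unsampled sites)."""
    s = (W_S * (seq[nt][ny][nx - 1] + seq[nt][ny][nx + 1]
                + seq[nt][ny - 1][nx] + seq[nt][ny + 1][nx])
         + W_T * (seq[nt - 1][ny][nx] + seq[nt + 1][ny][nx]))
    return s / NORM

# Uniform illumination: all six neighbors equal 100, so the interpolated
# value must be 100 as well, since the weights sum to one.
seq = [[[100.0] * 3 for _ in range(3)] for _ in range(3)]
print(round(interpolate(seq, 1, 1, 1), 6))   # 100.0
```

Because the weights are normalized, the interpolator preserves flat regions exactly; errors appear only where the scene varies faster than the kernel's assumptions allow.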
4  Circuit implementation and test results
A prototype chip (Figure 2) of 128×128 pixels is designed in a 0.35-μm 2P4M CMOS technology. Basic information about the chip is listed in Table 2. In the prototype chip, 4T1C pixels share column-wise output buses (OUT0–3) and row-wise selectors (SEL0–3), but the resets (RST0–15) and shutters (SHT0–15) of adjacent 4×4 pixels are separated, as shown in Figure 3. These separated resets and shutters are shared by 4×4-pixel groups over the whole array, so there are 16 fields in all whose exposures can be controlled separately. Using all four layers of metal, we can squeeze the components and interconnects into 12.5 μm × 12.5 μm and obtain a fill factor of 42%. Thirty-two column buffers, each shared by 4 columns, output the exposure signal in analog form. So there are 9 address lines in all, 7 of which are row-wise (ROW6–0) and 2 column-wise (COL1–0). By assigning different values to ROW1–0 and COL1–0, we can select different fields for readout (Figure 4).

During the test, we create repeatable moving scenes with a stepper-motor-driven linear stage and a poker card (the two of spades) on it. The background consists of black and white fringes 1 cm wide. The pixel rate is set relatively low (250×10³ pixels/s) because the stage moves slowly, so it takes 65.536 ms to sample the whole array. Both the reset (RSTs=1 and SHTs=1 in Figure 5(a)) and exposure (RSTs=0 and SHTs=0) values must be sampled for digital correlated double sampling (CDS). So we set the exposure time to 2×65.536 ms for ORT sampling, as shown in Figure 5(a), and we obtain a sequence of pictures (Figure 6(a)–(d)) whose average pixel rate is 125×10³ pixels/s. At the same average pixel rate, we use the timing shown in Figure 5(b) to sample the same moving scene with FCO sampling and reconstruct it with (5) (Figure 6(e)–(h)). Though the image is blurred, it is possible to identify the position of the spade.
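A sketch of the field-selection addressing just described: each pixel belongs to one of 16 fields determined by its row and column modulo 4, selected through ROW1–0 and COL1–0. The linear field index 4·(row mod 4) + (col mod 4) is an illustrative assumption, not a mapping taken from the chip:

```python
# Sketch of multi-field readout addressing for a 128x128 array with
# 4x4-pixel clusters: a pixel at (row, col) belongs to the field indexed
# here (by assumption) as 4*(row % 4) + (col % 4). Selecting a field means
# fixing ROW1-0 = row % 4 and COL1-0 = col % 4 and sweeping the rest.
ARRAY = 128

def field_of(row, col):
    return 4 * (row % 4) + (col % 4)

def pixels_of_field(field):
    r, c = divmod(field, 4)     # ROW1-0 and COL1-0 values for this field
    return [(row, col)
            for row in range(r, ARRAY, 4)
            for col in range(c, ARRAY, 4)]

# Every field covers exactly 1/16 of the array, and the 16 fields tile it.
sizes = [len(pixels_of_field(f)) for f in range(16)]
print(sizes[0], sum(sizes))   # 1024 16384
```

Reading out a single field thus touches 32×32 = 1024 pixels, one sixteenth of the array, which is what makes per-field exposure control and selective readout possible.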
Figure 2  Micrograph of the whole chip.
Figure 3  Schematic of 4×4 pixels.

Table 2  Chip summary

Pixel size      12.5 μm × 12.5 μm
Fill factor     41.9%
Array size      128×128
Dynamic range   65.1 dB
Saturation      1.1 V
Technology      0.35 μm 2P4M CMOS
Array size      1.6 mm × 1.6 mm
Pixel rate      2×10⁶ pixels/s

Figure 4  Schematic of the whole chip.
Actually, setting the pixel rate to 125×10³ pixels/s is not fair to FCO sampling, because during the exposure time (RSTs=0 and SHTs=1) of ORT sampling the ADCs do nothing. If we use FCO sampling at 250×10³ pixels/s, it takes 32.768 ms to read out half of the array and the total exposure for one field will be 65.536 ms, as shown in Figure 7. The same exposure time is used for ORT sampling so that we
Figure 5  Timing for RSTs and SHTs to realize ORT (a) and FCO (b) sampling with the same average pixel rate (125×10³ pixels/s) and same exposure time (131.072 ms). In (a), 'RSTs' and 'SHTs' indicate all RSTx and SHTx terminals. In (b), 'RST-1s' and 'SHT-1s' include RST/SHT 0, 2, 5, 7, 8, 10, 13 and 15; 'RST-2s' and 'SHT-2s' include RST/SHT 1, 3, 4, 6, 9, 11, 12 and 14.
Figure 6  ORT and FCO sampled pictures in the same period with the same average pixel rate (125×10³ pixels/s) and same exposure time (131.072 ms). (a)–(d) ORT sampled pictures; (e)–(h) FCO sampled pictures.

Figure 7  Timing for RSTs and SHTs to realize ORT (a) and FCO (b) sampling with fixed pixel rate (250×10³ pixels/s) and same exposure time (65.536 ms). The meanings of the legends at the left are the same as in Figure 5.

Figure 8  ORT and FCO sampled pictures in the same period with fixed pixel rate (250×10³ pixels/s) and same exposure time (65.536 ms). (a)–(d) ORT sampled pictures; (e)–(j) FCO sampled pictures.
Figure 9  The effect of the reconstructing algorithm. (a) interlaced frame of two consecutive fields; (b) and (c) reconstructed frames.
could assume the signals on the focal plane are the same. Under this condition, the frame rate of FCO sampling is 1.5 times that of ORT sampling, so FCO generates 6 frames (Figure 8(e)–(j)) instead of the 4 generated by ORT (Figure 8(a)–(d)) in the same period.

The effect of the reconstructing algorithm is shown in Figure 9. Figure 9(a) is an interlaced frame of two consecutive fields. Figure 9(b) and (c) are amplified details of reconstructed frames. The reconstructing algorithm has some effect but still needs improvement.
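Incidentally, the two control groups listed in the Figure 5 caption are consistent with a checkerboard split of the 4×4 field grid, assuming (as in the addressing sketch above this is only an assumption, not stated by the paper) that field f sits at row f//4, column f%4 of a cluster:

```python
# The two FCO control groups from the Figure 5 caption ('RST/SHT-1s' vs
# 'RST/SHT-2s') match a checkerboard parity split of the 4x4 field grid,
# under the assumed placement: field f at row f // 4, column f % 4.
group1 = {0, 2, 5, 7, 8, 10, 13, 15}   # 'RST-1s'/'SHT-1s' in Figure 5
group2 = {1, 3, 4, 6, 9, 11, 12, 14}   # 'RST-2s'/'SHT-2s' in Figure 5

even_parity = {f for f in range(16) if (f // 4 + f % 4) % 2 == 0}
print(even_parity == group1)                       # True
print(set(range(16)) - even_parity == group2)      # True
```

This matches the FCO condition of eq. (1): the two groups expose the two complementary spatial checkerboards half a period apart.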
5  Conclusion
The time-varying illumination on the focal plane is a 3-D signal, and the sampling efficiency can be increased by changing the sampling scheme. In order to implement sampling schemes other than full-frame sampling (ORT), we designed a multi-field CMOS image sensor whose control pins are separated and grouped into different fields. By doing this, we reach the highest sampling efficiency for 3-D signals. Since the speed of a CIS is mainly limited by its pixel rate, this method can be used to increase the frame rate. But the video sequences sampled by FCO and other sampling schemes are different from what people can comprehend directly, so reconstructing algorithms are necessary to make them identifiable. Many algorithms have been developed to display interlaced video on progressively scanned devices. We could transplant their principles and design reconstructing algorithms for FCO-sampled sequences, which is the next step of our research.
Acknowledgements  This work was supported by the National Natural Science Foundation of China (Grant Nos. 60536030, 61036002, 60776024, 60877035, 61076023, 90820002), and the National High Technology Research and Development Program of China (Grant Nos. 2007AA04Z254, 2007AA04Z329, 2011CB933203, 2011CB933102).
References

1 Dubois E. The sampling and reconstruction of time-varying imagery with application in video systems. Proc IEEE, 1985, 73: 502–522
2 Kleinfelder S, Chen Y, Kwiatkowski K, et al. High-speed CMOS image sensor circuits with in situ frame storage. IEEE Trans Nucl Sci, 2004, 51: 1648–1656
3 Le C V, Etoh T G, Nguyen H D, et al. A backside-illuminated image sensor with 200000 pixels operating at 250000 frames per second. IEEE Trans Electron Dev, 2009, 56: 2556–2562
4 Krymski A, Van Blerkom D, Andersson A, et al. A high speed, 500 frames/s 1024×1024 CMOS active pixel sensor. In: Symposium on VLSI Circuits Digest of Technical Papers. Kyoto: IEEE Publishing, 1999. 137–138
5 Krymski A I, Tu N. A 9-V/lux-s 5000-frames/s 512×512 CMOS sensor. IEEE J Solid-St Circ, 2003, 38: 136–143
6 Furuta M, Nishikawa Y, Inoue T, et al. A high-speed, high-sensitivity digital CMOS image sensor with a global shutter and 12-bit column-parallel cyclic A/D converters. IEEE J Solid-St Circ, 2007, 42: 766–774
7 Kleinfelder S, Lim S, Liu X, et al. A 10000 frames/s CMOS digital pixel sensor. IEEE J Solid-St Circ, 2001, 36: 2049–2058
8 De Haan G, Bellers E B. Deinterlacing—an overview. Proc IEEE, 1998, 86: 1839–1857
9 Petersen D P, Middleton D. Sampling and reconstruction of wave-number-limited functions in n-dimensional Euclidean spaces. Inf Control, 1962, 5: 279–323
10 Lin S F, Chang Y L, Chen L G. Motion adaptive interpolation with horizontal motion detection for deinterlacing. IEEE Trans Consum Electr, 2003, 49: 1256–1265
11 Park M K, Kang M G, Nam K, et al. New edge dependent deinterlacing algorithm based on horizontal edge pattern. IEEE Trans Consum Electr, 2003, 49: 1508–1512
12 Fan Y C, Lin H S, Chiang A, et al. Motion compensated deinterlacing with efficient artifact detection for digital television displays. J Disp Technol, 2008, 4: 218–228
13 YUV video sequences. Available at http://trace.eas.asu.edu/yuv/
14 Higgins C M, Pant V. A biomimetic VLSI sensor for visual tracking of small moving targets. IEEE Trans Circuits Syst I, 2004, 51: 2384–2394
15 Etienne-Cummings R, Van der Spiegel J, Mueller P. A focal plane visual motion measurement sensor. IEEE Trans Circuits Syst I, 1997, 44: 55–66
16 Harrison R R. A biologically inspired analog IC for visual collision detection. IEEE Trans Circuits Syst I, 2005, 52: 2308–2318