International Journal of Computer Vision 58(3), 173–207, 2004
© 2004 Kluwer Academic Publishers. Manufactured in The Netherlands.
Real-Time Omnidirectional Image Sensors

YASUSHI YAGI
The Institute of Scientific and Industrial Research, Osaka University, Japan
[email protected]
MASAHIKO YACHIDA
Graduate School of Engineering Science, Osaka University, Japan

Received January 7, 2002; Revised June 12, 2003; Accepted June 17, 2003
Abstract. Conventional T.V. cameras are limited in their field of view. We present real-time omnidirectional cameras which can acquire an omnidirectional (360 degree) field of view at video rate and which can be applied in a variety of fields, such as autonomous navigation, telepresence, virtual reality and remote monitoring. We have developed three different types of omnidirectional image sensors, and two different types of multiple-image sensing systems which consist of an omnidirectional image sensor and binocular vision. In this paper, we describe the outlines and fundamental optics of our developed sensors and show examples of applications for robot navigation.

Keywords: omnidirectional image, multiple image sensing system, sensor design, optics, mobile robot navigation
1. Introduction
Over the past 15 years, researchers in computer vision, applied optics and robotics have presented a number of papers related to omnidirectional cameras and their applications (Yagi, 1999). There have been several attempts to acquire omnidirectional images using a rotating camera, a fish-eye lens, a conical mirror or a spherical mirror. The various approaches taken for obtaining omnidirectional images can be classified into three types: the use of multiple images, the use of special lenses, and the use of convex mirrors. A straightforward approach for obtaining an omnidirectional image from multiple images is to rotate the camera around its vertical axis (Saraclik, 1989; Zheng and Tsuji, 1990a, b; Ishiguro et al., 1992; Barth and Barrows, 1996). The camera rotates with constant angular velocity; vertical scan lines are taken from different images and stitched together to form an omnidirectional image. The horizontal resolution of the obtained omnidirectional image depends not on
the camera resolution, but on the angular resolution of the rotation. Therefore, an advantage of this particular method is that we can acquire a high-resolution omnidirectional image if the camera is precisely controlled. However, the method has the disadvantage that it takes rather a long time to obtain an omnidirectional image, which restricts its use to static scenes rather than dynamic or real-time applications. A fish-eye camera, which employs a fish-eye lens instead of a conventional camera lens, can acquire a wide-angle view, as much as a hemispherical view, in real time. Morita proposed a motion stereo method for measuring the three-dimensional locations of lines by mapping an input image onto spherical coordinates (Morita et al., 1989). Researchers at the University of Cincinnati applied a fish-eye camera to mobile robot position control using given targets (Cao et al., 1986; Roning et al., 1987; Cao and Hall, 1990) and to following specific lines (Matthews et al., 1995). Such a camera, however, obtains good resolution only for the view above it, not below or out to the sides. This means
that the image obtained through the fish-eye lens has rather good resolution in the center region but poor resolution in the peripheral region. Image analysis of the ground (floor) and the objects on it is difficult because they appear along the circular boundary where image resolution is poor. Furthermore, it is difficult to generate a complete perspective image, because fish-eye lenses do not have the single center-of-projection property. Details of the single center of projection will be described later. The basic idea of using a convex mirror is that a camera is pointed vertically toward the mirror with the optical axis of the camera lens aligned with the axis of the mirror (Ayres, 1942). By setting the axis of the camera vertically, we can acquire a 2π view around the camera in real time. Since the middle of the 20th century, convex mirrors have been used for insolation measurements and the photography of mountain views. The Globoscope produced an orthogonal projection using a parabolic mirror and a 35 mm camera, and the OT-scope combined a spherical mirror and an aspherical mirror to observe a hemispherical view. However, these sensors were not applied in the fields of computer vision or robotics. Since 1988, we have developed several new real-time omnidirectional image sensors, each consisting of a CCD camera and a convex mirror placed in front of it, which can observe panoramic images in real time. These are used for robot navigation and human interaction problems. In this paper, we present two different types of our omnidirectional image sensors, named COPIS and HyperOmni Vision, and two different types of our multiple-image sensing systems which consist of an omnidirectional image sensor and binocular vision (Yagi and Kawato, 1990; Yamazawa et al., 1993; Yagi et al., 1994a; Konparu et al., 1997; Yachida, 1998). Furthermore, we propose methods for map generation, localization and route navigation. Such real-time omnidirectional cameras can be applied in a variety of fields, such as autonomous navigation, telepresence, virtual reality and remote monitoring, but the resolution of omnidirectional cameras is not sufficient for the detailed analysis of an interesting object. The anisotropic property of the convex mirror, known as spherical aberration, astigmatism and coma (Rossi, 1962), results in much blurring in the input image of the TV camera if the optical system is not well designed. For instance, the curvature of the conical mirror across a horizontal section is the same
as that of a spherical mirror. If the camera lens has a large relative aperture, the rays coming through the horizontal section from an object point do not focus at a common point, an effect known as spherical aberration. The curvature of the conical mirror on a vertical section is the same as that of a planar mirror, while that on a horizontal section is the same as that of a spherical mirror; hence, rays reflected in the vertical and horizontal planes focus at different points. Thus, to reduce blurring and to obtain a clear picture, one needs an optical system that can cover the focal points of both optics. In particular, a careful design is needed when using a single convex mirror, and miniaturizing the omnidirectional camera usually makes it harder to reduce aberration. In order to overcome some of these disadvantages, we propose a super-resolution technique for an omnidirectional image sensor (Nagahara et al., 2000, 2001), and a new omnidirectional image sensor, named TOM (Yagi and Yachida, 1998), which can acquire an omnidirectional view around the view center in real time. The optics of TOM are similar to those of a reflecting telescope and consist of two paraboloidal mirrors or two hyperboloidal mirrors, to minimize blurring. The optics satisfy the important constraint of a single center of projection.

2. Omnidirectional Image Sensors

2.1. COPIS and Conic Projection
The omnidirectional image sensor COPIS (Conic Projection Image Sensor) (Yagi and Kawato, 1990; Yagi et al., 1994b), as mounted on a robot, consists of a conical mirror and a TV camera, with the optical axis of the camera aligned with the cone's axis, in a cylindrical glass tube, as shown in Fig. 1(a). The key feature of COPIS is the passive sensing of the omnidirectional environment in real time (at the frame rate of a TV camera), using a conical mirror (see Fig. 1(b)). COPIS maps the scene onto the image plane through the conical mirror and a lens; we call this mapping "conic projection". Let us use the three-dimensional coordinate system O−XYZ, aligned with the image coordinate system o−xy and with the Z-axis pointed toward the cone's vertex (see Fig. 2). We fix the origin O at the camera's center; thus the image plane is at the level of f, where f is the focal length of the camera. A conical mirror yields the image of a point in space on the vertical plane through the point and the mirror axis.
Figure 1. COPIS.
Thus, the point P at (X, Y, Z) is projected onto the image point p at (x, y) such that

$$\tan\theta = \frac{Y}{X} = \frac{y}{x} \tag{1}$$

In other words, all points with the same azimuth in space appear on a radial line through the image center, and its direction θ indicates the azimuth. This means that a vertical line in the environment appears radially in the image plane. Vertical lines provide a useful cue in man-made environments such as rooms, which tend to contain many objects with vertical edges, for example, doors, desks and shelves. Many researchers use this characteristic for robot navigation; details of our robot navigation methods are described later. The field viewed by COPIS is limited by the vertex angle of the conical mirror and the visual angle of the camera lens. As shown in Fig. 2(b), a vertical section of COPIS acts as a flat mirror, and the ray from point P changes its direction accordingly. This means that the vertical angular resolution of COPIS is linear in the input image. We represent the image and the space by two polar coordinate systems, (r, θ) and (R, θ, Z), respectively. As shown in Fig. 2, simple geometrical analysis of the vertical section through P and the Z-axis yields the following equations of the conic projection:

$$r = f\tan\beta, \qquad \beta = \delta - \tan^{-1}\frac{R + H\sin\delta}{Z - H(1-\cos\delta)}, \qquad 0 \le \beta \le \frac{\pi}{2},\quad \beta < \delta \tag{2}$$

where H is the distance between the lens center and the vertex of the cone. If the distance H and the focal length f are given, the object height Z can be calculated by substituting the observed vertex angle β into (2). In other words, the virtual lens center O′ corresponding to the lens center O is defined by

$$\begin{pmatrix} R \\ Z \end{pmatrix} = \begin{pmatrix} H\sin 2\delta \\ H(1+\cos 2\delta) \end{pmatrix} \tag{3}$$

Figure 2. Conic projection.
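A minimal numerical sketch of this mapping — our illustration under the reconstructed form of Eqs. (1) and (2) above, not the authors' code — is:

```python
import numpy as np

def conic_project(X, Y, Z, H, delta, f):
    """Project a scene point P = (X, Y, Z) onto the image plane through a
    conical mirror: vertex at height H above the lens center, mirror angle
    parameter delta, camera focal length f (Eqs. (1)-(2) as given above)."""
    theta = np.arctan2(Y, X)                      # azimuth, Eq. (1)
    R = np.hypot(X, Y)                            # radial distance from the axis
    beta = delta - np.arctan((R + H * np.sin(delta)) /
                             (Z - H * (1.0 - np.cos(delta))))  # Eq. (2)
    r = f * np.tan(beta)                          # radial image coordinate
    return r * np.cos(theta), r * np.sin(theta)   # image point p = (x, y)
```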
The virtual lens center does not cross at a single point but appears on a circle defined by (3). A serious drawback of COPIS is therefore that it does not have a focal point (a single center of projection); it is impossible to generate a distortion-free perspective image from an input omnidirectional image. The advantages of COPIS are that the vertical angular resolution is the same as that of the input camera system, because a vertical section of the conical mirror acts as a flat mirror, and that the main field is a side view. COPIS is sufficient for detecting objects around a mobile robot, and suitable for a rough understanding of the environment. However, the distortion and the resolution of COPIS are not sufficient for the detailed analysis of an interesting object. To overcome these disadvantages, we developed a new multiple-image sensing system, MISS, on a single camera with two types of imaging: omnidirectional imaging, which obtains a 360 degree field of view with limited resolution, and binocular vision, which obtains a high-resolution image with a limited field of view. Furthermore, we developed a new omnidirectional image sensor, named HyperOmni Vision, which consists of a CCD camera and a hyperboloidal mirror placed in front of the camera. An important feature of using the hyperboloidal mirror is that it has a focal point (a single center of projection).

2.2. MISS
The tasks of an autonomous mobile robot can be classified into two categories: navigation to a goal, and the manipulation of objects at the goal. For navigation, robot vision must generate a spatial map of its environment for path planning, collision avoidance with (possibly moving) obstacles, and finding candidate objects of interest. For this purpose, a detailed analysis is not necessary, but a high-speed, rough understanding of the environment in the vicinity of the robot is required. Omnidirectional sensing is suitable for this purpose as it can view 360 degrees in real time. On the other hand, for object manipulation or detailed analysis of an interesting object, a limited field of view is sufficient but high resolution is required (namely, a local view). The resolution of the omnidirectional camera is not sufficient for this purpose, since an omnidirectional view is projected onto an image with limited resolution, say 512 × 480. One approach is to use a high-resolution image sensor and to generate a local perspective view by transformation. However, even the resolution of
current high-definition T.V. is not sufficient for the detailed analysis of an interesting object. Therefore, we use a conventional T.V. camera with a limited field of view for the local view. It is desirable to have these two cameras on an autonomous mobile robot: an omnidirectional camera which obtains a 360 degree field of view with limited resolution, and a local-view camera which obtains a high-resolution image with a limited field of view. For the local view, we employ binocular vision to obtain stereo information. Instead of putting both of these cameras on a mobile robot, we propose a new multiple-image sensing system (MISS) on a single camera with two types of imaging, omnidirectional imaging and binocular vision, since compactness and light weight are important design features of mobile robots (Yagi et al., 1994a). Figure 3 shows a prototype of MISS built in our laboratory. It has three components: the optics of an omnidirectional imaging system, a binocular vision adapter, and a tilting optical unit for changing the viewing direction of the binocular vision. We used a conical mirror for the omnidirectional imaging, as it was at hand when we built MISS and was sufficient for the purpose of multiple sensing. The omnidirectional image obtained by a conical mirror has better azimuth-angle resolution in the peripheral area than in the central region. For example, the resolution of the azimuth angle was approximately 0.25 degrees in the peripheral area of the image, whereas it was approximately 1 degree at the circumference of a 60-pixel radius from the image center.
Figure 3. MISS.
Figure 4. Optical configuration of MISS.
In particular, the central region of the omnidirectional image has poor azimuth-angle resolution, so the peripheral region of the image plane is used for omnidirectional sensing. Accordingly, as shown in Fig. 4(a), we changed the shape of the conical mirror from a circular cone to a frustum of a cone, and bored a hole in the central region of the frustum for use in binocular vision imaging, as shown in Fig. 4. MISS has a fast tilting unit for viewing an interesting object, but no panning unit, since panning can be done by rotating the robot itself. The two images taken by the binocular vision are projected onto the central region of the image plane through the hole by using reflecting mirrors. Four mirrors are used, as shown in Fig. 4(b): two inside mirrors M_RI and M_LI are set perpendicular to each other and almost parallel to the outside mirrors (M_RO, M_LO). The reflected light from the environment first goes to the upper mirror of the tilting unit (c) and is reflected downward to the lower mirror of the tilting unit. It is then reflected leftward to the outer mirrors of the mirror complex, which directs it downward to focus on the central region of the image plane. The right half of the central region is used for the right view and the
other for the left. The upper mirror of the tilting unit can be rotated by a motor; thus, the viewing direction of the binocular vision can be controlled by changing the tilt angle of the upper mirror. An image taken by MISS is digitized to 512 × 480 pixels (each pixel having 256 levels). Figure 5(b) shows an example of an input image taken by MISS in the environment shown in Fig. 5(a).

2.3. HyperOmni Vision
We proposed a new omnidirectional image sensor named HyperOmni Vision, which consists of a CCD camera and a hyperboloidal mirror placed in front of the camera (see Fig. 6) (Yamazawa et al., 1993, 1995). The hyperboloidal mirror has a focal point, which makes possible the easy generation of any desired image projected on any designated image plane, such as a perspective image or a panoramic image, from an omnidirectional input image deformed by the hyperboloidal mirror (see Fig. 7(a)). This easy generation of perspective or panoramic images allows robot vision to use existing image processing methods for autonomous navigation and manipulation.
Figure 5. An example of MISS input image: (a) scene and (b) input image.
Figure 6. HyperOmni Vision.
It also allows a human user to see familiar perspective or panoramic images, instead of an unfamiliar omnidirectional input image deformed by the hyperboloidal mirror. The hyperboloidal surfaces are obtained by revolving a hyperbola around the Z-axis; they have two focal points, at (0, 0, +c) and (0, 0, −c), as shown in Fig. 7(b). Using the world coordinate system (X, Y, Z), a hyperboloidal surface can be represented as:

$$\frac{X^2 + Y^2}{a^2} - \frac{Z^2}{b^2} = -1, \qquad c = \sqrt{a^2 + b^2} \tag{4}$$
where a and b define the shape of the hyperboloidal surface. We use the hyperboloidal surface at Z > 0 as the mirror. Figure 7(b) shows the geometry of HyperOmni Vision, which consists of a CCD camera and a hyperboloidal mirror. It should be noted that the focal point of the hyperboloidal mirror O_M and the lens center of the camera O_C are fixed at the focal points of the hyperboloidal surfaces, (0, 0, c) and (0, 0, −c), respectively. The axis of the camera is aligned with that of the hyperboloidal mirror. The image plane is placed at a distance f (the focal length of the camera) from the lens center O_C, parallel to the XY plane. A hyperboloidal mirror yields an omnidirectional image of the scene around its axis, which we observe using a TV camera whose optical axis is aligned with that of the mirror. HyperOmni Vision maps the scene onto the image plane through the hyperboloidal mirror and a lens; we call this mapping "hyperboloidal projection." We will briefly describe how a point P in space is reflected by the hyperboloidal mirror and projected onto the image plane. A ray going from point P(X, Y, Z) in space toward the focal point of the mirror O_M is reflected by the mirror so that it passes through the other focal point O_C and intersects the image plane at point p(x, y). Any point in space in the field of view (360 degrees around the Z-axis) of the hyperboloidal projection satisfies this relation. Therefore, we can obtain an omnidirectional image of the scene on the image plane with a single center of projection O_M. The angle in the image, easily calculated from y/x, gives the azimuth angle θ of point P in space, and, as in the conic projection of (1), all points with the same azimuth in space appear on a radial line through the image center. Therefore, with a hyperboloidal projection, the vertical edges in the environment appear radially in the image.
Figure 7. Hyperboloidal projection.
By simple geometrical analysis, the equations relating a point in space P(X, Y, Z) and its image point on the image plane p(x, y) can be derived as follows:

$$Z = \sqrt{X^2 + Y^2}\,\tan\alpha + c$$
$$\alpha = \tan^{-1}\frac{(b^2 + c^2)\sin\gamma - 2bc}{(b^2 - c^2)\cos\gamma}$$
$$\gamma = \tan^{-1}\frac{f}{\sqrt{x^2 + y^2}} \tag{5}$$

where α denotes the tilt angle of point P from the horizontal plane, f is the focal length of the camera lens, and a, b and c are the parameters defining the shape of the hyperboloidal mirror. From Eqs. (1), (4) and (5), the azimuth angle θ and the tilt angle α of the line connecting the focal point of the mirror O_M and the point in space P can be obtained from the position of the image point p(x, y). This means that the equation of the line connecting O_M and P can be determined uniquely from the coordinates of the image point p(x, y), regardless of the location of the point P in space.
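As an illustration of how Eq. (5) is used, the sketch below back-projects an image point into the azimuth and tilt of its viewing ray through O_M. This is our sketch, not the authors' implementation; the function name and the use of arctan2 are our choices.

```python
import numpy as np

def backproject_pixel(x, y, b, c, f):
    """Back-project image point p = (x, y) to the direction of its viewing
    ray through the mirror focal point O_M, following Eq. (5).
    b, c : mirror shape parameters (c = sqrt(a**2 + b**2))
    f    : focal length of the camera, in the same units as x and y."""
    theta = np.arctan2(y, x)                  # azimuth of the ray, cf. Eq. (1)
    gamma = np.arctan2(f, np.hypot(x, y))     # ray angle relative to the image plane
    alpha = np.arctan2((b**2 + c**2) * np.sin(gamma) - 2.0 * b * c,
                       (b**2 - c**2) * np.cos(gamma))  # tilt from the horizontal
    return theta, alpha

# Because O_M is a single center of projection, (theta, alpha) fully determine
# the viewing ray; a distortion-free perspective or panoramic image can then be
# generated by resampling input pixels along the desired rays.
```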
HyperOmni Vision thus has a very interesting characteristic in terms of its single center of projection. Generally, however, the anisotropic property of a convex mirror results in much blurring in the input image of the TV camera if the optical system is not well designed. Figure 8 shows the image resolution and point spread function (PSF) of HyperOmni Vision; in this case, the input image was focused at a tilt angle of 0. Thus, to reduce blurring and to obtain a clear picture, one needs an optical system that can cover the focal points of both optics. One idea is to use an optical system with a small lens aperture, a short focal length, and a small mirror curvature. However, a small lens aperture makes the image dark, and a small curvature decreases the vertical field of view. Furthermore, as shown in Fig. 9, the size of the spot diagram (blur size) increases with the miniaturization of the mirror. This means that it is usually hard to reduce aberration in an omnidirectional image sensor with a single reflective mirror. Rees developed a panoramic viewing device for use in vehicles utilizing a hyperboloidal mirror (Rees and Mich, 1970). That sensor was used only for monitoring by an operator; our HyperOmni Vision was developed independently. One of the important advantages of HyperOmni Vision is the easy generation of perspective and panoramic images, image processing which the earlier work did not consider. We are the first group to show the merit of an omnidirectional image sensor with a single center of projection.
Figure 8. Image resolution and point spread function of HyperOmni Vision.
Figure 9. Spot diagram of HyperOmni Vision.

2.4. Tiny Omnidirectional Image Sensor TOM with Dual Reflection Optics
One promising idea for miniaturizing and focusing is to use multiple mirrors (Rees and Mich, 1970; Greguss, 1985; Chahl and Srinivasan, 1997; Takeya et al., 1998; Powell, 1995; Rosendahi and Dykes, 1983; Davis et al., 1997). A multiple-mirror system usually yields clearer images than an optical system with a single reflective mirror. Greguss proposed an optical lens system that can acquire a distortion-free panoramic image (Greguss, 1985). The optics of the proposed
system, named the Panoramic Annular Lens (PAL), consists of a single glass block with two reflective and two refractive planes. The PAL optics does not have to be aligned, and can easily be miniaturized; in contrast, other omnidirectional imaging optics require several optical elements to be aligned. One disadvantage of the PAL optics is that it is difficult to increase the field of view in depression. Takeya et al. designed optics similar to a reflecting telescope, consisting of two convex mirrors, to minimize the influence of blurring (Takeya et al., 1998). The curvatures of both mirrors are optimized to minimize the blur on the image plane. The fundamental configuration of the optics is similar to PAL, but the optics do not satisfy the important requirement of a single center of projection. We propose a tiny omnidirectional image sensor, named TOM (Twin reflective Omnidirectional image sensor), which can acquire an omnidirectional view around the view center in real time (Yagi and Yachida, 1998, 1999, 2000). Our optics, which consist of two paraboloidal mirrors or two hyperboloidal mirrors, not only satisfy the important optical characteristic of a single center of projection and minimize blurring, but can also easily obtain a wide, side-view-centered panoramic image. At almost the same time, Nayar investigated the same optics with two paraboloidal mirrors and developed a prototype omnidirectional camera (Nayar and Peri, 1999). However, the developed camera did not use two paraboloidal mirrors but rather paraboloidal and spherical mirrors, because a spherical mirror is easier to manufacture than a paraboloidal one. Bruckstein and Richardson also described the idea of two paraboloidal mirrors, although a real sensor was not reported (Bruckstein and Richardson, 2000). As shown in Fig. 10, the sensor consists of a primary paraboloidal mirror and a secondary paraboloidal mirror; the shapes of the primary and secondary mirrors are convex and concave, respectively. In the case of a convex paraboloidal mirror, all rays from the environment directed toward its focal point are reflected at the mirror so that they run parallel to the rotational axis of the paraboloid: a straight line extending the ray from an object point in the environment passes through the focal point, regardless of the location of the object point. This means that the important single-center-of-projection property is satisfied. If we set the secondary concave mirror along the rotational axis of the primary paraboloidal mirror, the parallel rays reflected from the primary convex paraboloidal mirror converge to the focal point of the secondary paraboloidal mirror.
Figure 10. Optics of TOM.
Therefore, by setting the lens system at the focal point of the secondary paraboloidal mirror, an omnidirectional input image which satisfies the single center of projection can be obtained. The distance between the two paraboloidal mirrors is decided by minimizing the blurring. We design the optics of TOM by the following steps. (1) The shape of a paraboloidal mirror is defined by two paraboloidal parameters; therefore, given the radius of the primary paraboloidal mirror and the field of view (actually, the maximum angle of declination), the parameters of the primary paraboloidal mirror are determined. (2) Secondly, to set the lens system inside the primary mirror, a paraboloidal cylinder is chosen as the shape for the primary mirror. The principal point of the lens system is set at the focal point of the secondary mirror; therefore, the focal point of the secondary mirror is farther away than the vertical position of the primary mirror. (3) Maintaining this relationship between the two paraboloidal mirrors, the best focusing position is found by changing the distance between the two mirrors. In practice, TOM was designed using the commercial optical design program ZEMAX. Figure 11 shows a prototype of TOM and an example of the input image. The semi-diameters of the primary and secondary mirrors are 9.5 mm and 9.6 mm, respectively, and the distance between the mirrors is 17 mm. We use a 270,000-pixel, 1/3-inch monocular CCD device for the current system.
Figure 11. Tiny omnidirectional image sensor TOM.
The F-number and the focal length of the camera lens are 4 and 6 mm, respectively. The maximum angles of declination and elevation are both approximately 25 degrees. The size of TOM is approximately half that of HyperOmni Vision, and wide, side-view-centered omnidirectional images can be acquired with it.

2.5. High-Resolution Omnidirectional Imaging and Super-Resolution Modeling
Catadioptric omnidirectional image sensors using convex mirrors can capture omnidirectional information simultaneously and can continuously observe objects while the camera moves around an environment. These advantages are suitable for remote monitoring. However, the resolution of the transformed image is lower than that of a conventional camera image, because a 360 degree field of view is captured by only a single charge-coupled device. In order to solve this problem, we propose two types of super-resolution methods using consecutive omnidirectional images shifted by rotating HyperOmni Vision around the vertical axis (Nagahara et al., 2000, 2001). Our super-resolution approaches obtain more samples of the scene from a sequence of displaced images. HyperOmni Vision has an axis-symmetric structure around the vertical axis (the optical axis of the lens and a single center of projection). This means that it is easy to align consecutive images, even if the camera is rotating around the vertical axis; therefore, both methods choose rotation about the vertical axis as the camera motion for making a sequence of displaced images. As mentioned before, the anisotropic property of the convex mirror results in much blurring in the input image of the TV camera. The image resolution depends on the physical characteristics of the sensor, for instance, the optics, the density and spatial response of the CCD elements, and the image resolution of an output image such as a panoramic image.
Figure 12. Spatial frequency against depression angle.
Figure 12 shows the cut-off frequency of the CCD with sub-pixel displacement and the MTF (Modulation Transfer Function) response against depression angle when the focal point of the lens is set at depression angles of −10, 10 and 40 degrees. Here, f−10 means that the focus is set at the point where the depression angle is −10 degrees. The MTF frequency response changes with the focus point, and satisfies the CCD frequency requirement near the focal point. This result shows that the input images are blurred, and that it is difficult to obtain high-resolution images simply by selecting the nearest pixel data from the sequence of displaced images. For the first approach, we model the optical relation of HyperOmni Vision and reduce the image blur by a common back-projection method (Peleg et al., 1987; Irani and Peleg, 1991). Super-resolution by back projection is obtained by applying (6) and (7) iteratively:

$$g_k^n(P_i) = \sum_{P_t} f^n(P_t)\, h^{PSF}(P_t) \tag{6}$$

$$f^{n+1}(P_t) = f^n(P_t) + \sum_{P_i \in \cup_k P_i^{k,P_t}} \bigl(g_k(P_i) - g_k^n(P_i)\bigr)\, \frac{\bigl(h^{BP}_{P_t P_i}\bigr)^2}{c \sum_{P_i} h^{BP}_{P_t P_i}} \tag{7}$$
where P_t and P_i are pixels in the texture and input image coordinates, respectively. f^n(P_t) is the estimated texture after the n-th iteration on the texture coordinates, and g_k^n(P_i) is the predicted input image simulated from f^n(P_t), as shown in Eq. (6). Here, h^{PSF} is the optical relation of HyperOmni Vision defined by the PSF, and g_k(P_i) is a real input image. As shown in Eq. (7), the iterative estimation proceeds until the difference between the predicted input image and the real input image is minimized. h^{BP}_{P_t P_i} is a back-projection kernel that represents the
contribution of P_t to P_i, and c is a constant normalizing factor. The kernel is calculated from Eq. (6) by a voting algorithm, and P_i^{k,P_t} is the set of P_t that has an effect on P_i. This super-resolution method performs not only super-resolution but also blur restoration simultaneously. The blur of the input image depends on the depression angle. In the case of f−10, the CCD response is higher than the MTF frequency only at depression angles between −18 and 3 degrees; the optical property does not satisfy the CCD frequency requirement when the depression angle is higher than 3 degrees. This blur restricts the resolution improvement. Of course, the back-projection method is able to include the blur restoration process, but it is not effective when the blur is very large, as with HyperOmni Vision, in spite of its high computational cost. However, if the focal point is shifted to 10 degrees, the range that satisfies the CCD frequency requirement is also shifted, as shown in Fig. 12. This means that the MTF responses would satisfy the CCD frequency requirement over the whole view field if we could select adequate focal points. The second proposed method improves the transformed image resolution by combining sub-pixel-displaced and multi-focused images. We know that a clear image can be obtained in a certain interval of depression angle around a focal point. Therefore, we divide the input image into a few areas by depression angle; each area selects the best focal point, and its image resolution is improved by selecting the nearest pixel data from the displaced image sequence. Finally, a high-resolution image is made by stitching the divided regions together. Here, the super-resolution image is constructed by directly selecting the nearest-neighbor pixel from the displaced consecutive omnidirectional images. This method is suitable for real-time processing because of its low computational cost. Figure 13 shows experimental results of the super-resolution of the panoramic image. The resolution of simple bilinear interpolation is too poor (PSNR 21.2 dB), and characters in the image were illegible. In the case of nearest-neighbor interpolation from single-focused, sub-pixel-displaced images, the PSNR increased by approximately 4 dB. Furthermore, the PSNR increases to approximately 26 dB with the proposed methods.
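A compact sketch of the back-projection iteration of Eqs. (6) and (7) is given below. It is our illustration under simplifying assumptions — in particular, the back-projection kernel is taken equal to the PSF, whereas the paper computes it from Eq. (6) by a voting algorithm — and not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import convolve, zoom

def back_projection_sr(inputs, psf, scale, n_iter=20):
    """Super-resolution by iterative back projection (cf. Eqs. (6)-(7)).
    inputs : list of registered low-resolution images g_k
    psf    : blur kernel modeling the sensor optics (h_PSF)
    scale  : integer resolution-improvement factor"""
    f = zoom(np.mean(inputs, axis=0), scale)             # initial texture f^0
    for _ in range(n_iter):
        correction = np.zeros_like(f)
        for g in inputs:
            g_sim = zoom(convolve(f, psf), 1.0 / scale)  # predicted input g_k^n, Eq. (6)
            err = zoom(g - g_sim, scale)                 # residual on the texture grid
            correction += convolve(err, psf)             # back-project (h_BP ~ h_PSF here)
        f += correction / len(inputs)                    # update f^{n+1}, Eq. (7)
    return f
```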
Figure 13. Super-resolution of the omnidirectional image: (a) bilinear, (b) nearest neighbor, (c) multi-focus and (d) back projection.
It is worth noting that many researchers have investigated 3D modeling methods with omnidirectional image sensors (Yagi et al., 1995, 2000; Delahoche et al., 1998; Etoh et al., 1999; Simamura et al., 2000; Kawasaki et al., 2000a, b; Hicks et al., 2000; Brassart et al., 2000; Svodoba et al., 1998; Chang and Hebert, 2000). In practice, however, the environmental model generated cannot be applied to the applications mentioned above, because of the low angular resolution of the sensors and because the image resolution of the observed surface texture is too low for monitoring details. In essence, catadioptric omnidirectional image sensors have the advantages of simultaneous omnidirectional sensing and easy handling, but the disadvantage of low angular resolution. We have proposed the high-resolution omnidirectional imaging system mentioned above (Nagahara et al., 2000, 2001a). Nayar has also proposed a method that rotates the omnidirectional image sensor around an axis perpendicular to the lens
axis (Nayar and Karmarkar, 2000). However, with these methods the camera must remain motionless while recording the images; hence, they cannot be applied to an omnidirectional image sequence taken while the sensor moves along a path. We propose a resolution-improvement method for a 3D model generated from an omnidirectional image sequence (Nagahara et al., 2001b). The proposed method only requires an omnidirectional video stream with smooth sensor motion. It improves the resolution of the textures mapped onto the geometrical surface model generated from the image sequence, using image mosaicing and super-resolution techniques. Our modeling method can be applied not only to an omnidirectional image sensor but also to a standard camera; we call this concept super-resolution modeling. Many super-resolution methods combining multiple low-resolution images have been proposed for improving resolution (Tsai and Hang, 1984; Peleg et al., 1987; Irani and Peleg, 1991). The previous super-resolution techniques, which we call super-resolution imaging (as opposed to super-resolution modeling), usually generate a high-resolution image on the projective image plane at a viewpoint where low-resolution input images have been captured. In contrast, our proposed method performs super-resolution on the 3D real object surface: it directly generates a high-resolution texture that is mapped on the model. The mosaicing process is coupled with the super-resolution process. We define a weighting table to estimate the resolution difference depending on the position, height and posture of the object plane relative to the sensor. Super-resolution is done using this weighting table, and only high-resolution regions in the image sequence are stitched. Figure 23 shows the resolution-improved image and zoomed-up images before and after super-resolution. This result shows the effect of super-resolution in improving the resolution by fusing the information from multiple sub-pixel-displaced images.

3. Omnidirectional Sensing for Autonomous Robot

3.1. Map Generation and Localization
3.1.1. Egomotion and Posture Estimation. A general technique for estimating egomotion is presented by Nelson and Aloimonous (1988). To estimate egomotion with 6 DOF, this technique requires optical flows on three orthogonal great circles in spherical coordinates; as the egomotion is determined by iterative calculation, the computational cost is high. Gluckman and Nayar estimated the 6 DOF egomotion of the camera using their omnidirectional camera, but the estimation error is not so small (Gluckman and Nayar, 1998). In the case of mobile robot navigation, the egomotion of the robot is usually caused by jolting due to unevenness of the ground plane. Under the assumption of ground-plane motion of the robot, we have proposed a method for robustly estimating the rolling and swaying motions of the robot from omnidirectional optical flows, using the special characteristics of an omnidirectional camera with a single center of projection (Yagi et al., 1996a, b). The optical flow of a projected point on the image plane can be decomposed into radial and circumferential components with respect to the center of projection; the circumferential component is perpendicular to the radial one. As shown in Figs. 2 and 7, the radial and circumferential components can be defined by ∂β(t)/∂t and ∂θ(t)/∂t, respectively. From (5), the input image taken by HyperOmni Vision can easily be transformed to coordinates whose origin is fixed at the focal point O_M; therefore, it is equivalent to evaluate ∂α(t)/∂t in place of the radial component of the optical flow ∂β(t)/∂t. Now, we examine how the two flow components, ∂α(t)/∂t and ∂θ(t)/∂t, arise. Suppose the robot moves horizontally in a static environment. From (1) and (5), ∂α(t)/∂t and ∂θ(t)/∂t are generally caused by translational motions of the robot. However, for a point on the horizontal plane which passes through the focal point O_M, from (5), the tilt angle remains constantly zero because Z is equal to c. This means that the radial component of the optical flow at any point on this horizontal plane is independent of the translational motion of the robot; it is affected by the rolling motion, as shown in Fig. 14. Here, the illustrations use the cylindrical coordinate system; the rotation is referred to as the "rolling motion" and the translation as the "vertical motion". We call the radial component of the optical flow at a point on the horizontal plane in the cylindrical coordinates the "radial flow". On the other hand, the circumferential component of the optical flow ∂θ(t)/∂t at a point on the horizontal plane
(called the circumferential flow) is independent of the rolling motion; it is caused by the swaying motion and the translational motion of the robot, as shown in Fig. 15. As a result, the radial and circumferential flows in the cylindrical coordinates can be represented by a general sine function. As shown in the next equation, in the case of the radial flow, the amplitude, phase shift and offset of the sine function correspond to the roll angle δ_t, the direction γ_t of the rolling axis, and the deviation o_t of the vertical position, respectively:

$$y_{radial}(\theta_{it}) = R_{cyl}\tan\delta_t\,\sin(\theta_{it} + \gamma_t) + \frac{o_t\, f}{R_{it}} \tag{8}$$

The circumferential flow can also be represented by the following function:

$$y_{circum}(\theta_{it}) = \frac{M_{Ht}\, f}{R_{it}}\,\sin(\theta_{it} + \zeta_t) + R_{cyl}\tan\psi_t \tag{9}$$

Figure 14. Radial flow in HyperOmni Vision.
Figure 15. Circumferential flow in HyperOmni Vision.
Here, the robot motion parameters are shown in Fig. 16. R_cyl is the radius of the projected cylindrical image plane and f is the focal length of the camera. y_radial(θ_it) and y_circum(θ_it) are the magnitudes of the radial flow and the circumferential flow at azimuth angle θ_it in cylindrical coordinates, respectively. R_it and θ_it are the distance and azimuth angle of feature point i at time t, M_Ht is the horizontal motion, ζ_t is the direction of horizontal translation, and ψ_t is the rotation about the Z-axis. As shown in (8), the offset o_t f/R_it actually depends on the distance from the robot to the observed feature point; therefore, in the case of a large difference in distance among the observed features, it is difficult to estimate the precise rolling motion of the robot. In general, however, the deviation o_t of the vertical motion is sufficiently small relative to the distance R_it, and most of the observed features are at a similar distance from the robot; therefore, one can consider that f/R_it is approximately constant. Under a swaying motion, the magnitude of the circumferential flows is constant, regardless of the observed azimuths (see Fig. 15(a)). Under translational motion, the magnitude of the obtained flow depends on M_Ht/R_it; thus, it is difficult to fit the observed data with the equation, because M_Ht is not sufficiently smaller than R_it. However, the sign of this term corresponds with that of its numerator, because the denominator is positive, and the circumferential flows have opposite signs with respect to the direction of translational motion of the robot. Therefore, the swaying motion is estimated by evaluating the sign of the circumferential flow. Figure 17 shows an example of the estimated result of the radial flow model.
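Since Eqs. (8) and (9) are general sine functions of the azimuth, the fit itself reduces to linear least squares. The following is a minimal sketch of this step, assuming flow magnitudes sampled at known azimuths; it is our illustration, not the authors' code.

```python
import numpy as np

def fit_flow_sine(thetas, flows):
    """Fit y(theta) = a*sin(theta) + b*cos(theta) + d, i.e. an
    amplitude/phase/offset sine model as in Eq. (8), by least squares."""
    M = np.column_stack([np.sin(thetas), np.cos(thetas), np.ones_like(thetas)])
    (a, b, d), *_ = np.linalg.lstsq(M, flows, rcond=None)
    amplitude = np.hypot(a, b)    # = R_cyl * tan(delta_t) for the radial flow
    phase = np.arctan2(b, a)      # = gamma_t, direction of the rolling axis
    return amplitude, phase, d    # offset d ~ o_t * f / R_it
```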
Figure 16. Definition of robot egomotion parameters.
Figure 17. An estimated result of the radial flow model.

3.1.2. Geometrical Map Generation. The generation of a stationary environmental map is one of the important tasks for vision-based robot navigation. For this purpose, a detailed analysis is not necessary, but high speed and a rough understanding of the environment around the robot are required. Considered from the viewpoint of machine perception, autonomous navigation needs a field of view as wide as possible. Thus, a real-time omnidirectional camera, which can acquire
an omnidirectional (360 degree) field of view at video rate, is suitable for autonomous navigation. There has been much work carried out on mobile robots with vision systems which navigate in both unknown and known environments. We also propose methods for guiding the navigation of a mobile robot and generating an environmental map by monitoring the azimuth changes of the vertical edges in the image while the robot is moving. Let us denote the robot location and orientation at time t by (⁰X_t, ⁰Y_t) and φ(t), respectively. As shown in Fig. 18, defining the position of object i at time t = 0 by (⁰X_i, ⁰Y_i), the relationship between the observed azimuth angle θ_i(t) of object i at time t and the object location relative to the robot is obtained as follows:

$$\tan\bigl(\theta_i(t) - \phi(t)\bigr) = \frac{{}^{0}Y_i - {}^{0}Y_t}{{}^{0}X_i - {}^{0}X_t} \tag{10}$$

Figure 18. Coordinate system.
If the robot location and orientation at time t are given by the robot's internal encoder, the only unknown parameters in (10) are the position (⁰X_i, ⁰Y_i) of object i at time t = 0. Under the assumption of known motion of the robot, we can estimate the locations of objects around the robot by triangulation (Yagi et al., 1995), and environmental maps of the real scene are successfully generated by monitoring azimuth changes in the image.
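Under known robot motion, the triangulation implied by Eq. (10) can be written as a linear least-squares intersection of bearing rays. The sketch below is our illustration, not the authors' code; it assumes encoder poses (X_t, Y_t, phi_t) and the observed azimuths theta_i(t) of a single object.

```python
import numpy as np

def triangulate_object(robot_poses, azimuths):
    """Least-squares intersection of bearing rays based on Eq. (10):
    tan(theta_i(t) - phi(t)) = (Yi - Yt) / (Xi - Xt).
    robot_poses : list of (Xt, Yt, phi_t) from the internal encoder
    azimuths    : observed azimuth angles theta_i(t) of one object"""
    A, b = [], []
    for (xt, yt, phi), theta in zip(robot_poses, azimuths):
        beta = theta - phi                       # absolute bearing of the ray
        A.append([np.sin(beta), -np.cos(beta)])  # sin(b)*Xi - cos(b)*Yi = const
        b.append(np.sin(beta) * xt - np.cos(beta) * yt)
    (xi, yi), *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return xi, yi                                # estimated object position
```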
Several researchers have used this property for robot navigation (Pegard and Mouaddib, 1996). Delahoche et al. have proposed an incremental map-building method based on the exploitation of the azimuth data given by omnidirectional vision and by an odometer (Delahoche et al., 1998); the robot position estimation and map updating are based on the use of an Extended Kalman Filter. However, observational errors in the generated environmental map accumulate during long movements of the robot, because the encoder usually includes a measurement error. To generate a large environmental map, it is desirable not to assume a known robot motion. If the robot orientation φ(t) is given, the unknown parameters are the robot locations (⁰X_t, ⁰Y_t) and the object locations (⁰X_i, ⁰Y_i). The total number of unknown parameters and the total number of observational equations are (2i + 2(t − 1) − 1) and i × t, respectively. We estimate the locations of objects and of the robot by observing three object points from three different robot positions (Yagi et al., 2000); fast and robust estimation can be done because Eq. (10) becomes linear. Under the assumption of completely unknown motion of the robot, the unknown parameters are the robot locations (⁰U_t, ⁰V_t), the orientations φ(t) and the object locations (⁰X_i, ⁰Y_i). The azimuth angle θ_i(t) of object i at time t can be obtained by the omnidirectional image sensor. Therefore, the total number of unknown parameters and the total number of observational equations are (2i + 3(t − 1) − 1) and i × t, respectively. If the following relation is satisfied, the robot egomotion and the object locations can be estimated at the same time:

$$(i - 3)(t - 1) \ge 2 \tag{11}$$

Equivalently, location estimation can be done by observing three object points from five different robot positions, or four object points from four different robot positions (Yagi et al., 2000). Map generation and robot egomotion estimation can be done by solving the aforementioned nonlinear Eq. (10). In general, a nonlinear observational equation can be solved by an iterative nonlinear estimation method such as Levenberg-Marquardt (Etoh et al., 1999); however, such iterative methods take a long computational time to converge. Therefore, we solve the observational equations in real time by redefining them as a combination of two linear observational equations (Yagi et al., 2000). In practice, the orientation and location parameters are estimated alternately while the robot is moving. As mentioned before, the magnitude of the obtained azimuth changes depends on D_i(t). If the distance D_i(t) is known, the unknown parameters are M_Ht, ζ_t and ψ_t, and we can fit the sine function of Eq. (9). Therefore, we first assume that the object distances are sufficiently far from, or roughly constant relative to, the robot; under this assumption, the robot orientation is estimated by fitting a sine function with (9). If the robot orientation is given, Eq. (10) becomes a linear function, and the location estimation can be done by observing three object points from three different robot positions. Once the location estimation is done, the estimated location data are used for the azimuth-change sine fitting in the next frame: we normalize the obtained azimuth changes by the distance D_i(t), and then estimate the robot orientation from all of the obtained azimuth-change data whose object distances have already been estimated in the prior frame. The computational cost of each process is very low, and all of the estimation can be done in real time while the robot is moving. Figure 19 shows the experimental result of the map generation. Figure 20 shows another experiment, where the robot snaked through the computer room; Figure 21 shows the locus map of the azimuth angles of the vertical edges. Figure 22(a) shows the generated environmental map and the estimated trajectory of the robot movement, and Fig. 22(c) shows our previous result, in which the robot orientation was given by the internal sensor. In the case of the proposed method, the average error of the location measurement of the robot was approximately 10 cm.
Figure 19. Generated 3D model (corridor).
Figure 22(b) shows the result when we skipped the normalization by the object distance: in this case, a large error occurred. This means that the normalization is effective for map generation by the proposed method. The method mentioned above is useful for environments where the floor is almost flat and horizontal, and where the walls and static objects, such as desks or shelves, have vertical planes. Sometimes, however, there are slanted edges in a general indoor environment. We propose an omnidirectional Hough transformation method for reconstructing 3D line segments (Yamazawa et al., 2000). The difference from a common Hough transformation is the shape of the Hough space: our Hough space is cubical, which is suitable for mobile robot applications because of the low computational cost of voting edges and the high accuracy of 3D line reconstruction.

Figure 20. Experimental scene.
Figure 21. Locus map of vertical edges.
3.1.3. Outdoor Map Generation. In general, real outdoor environments, such as cities and university campuses, are complex and large. We propose a method for building a large-scale map based on route scenes, assuming that the topological relations of the routes at intersections are known (Li et al., 2000). The environment is defined by a network structure: nodes and arcs of the network correspond to intersections and routes, respectively. Each route is represented by two-sided route panoramic images, which are obtained by stitching together each bilateral input image while the camera moves along a smooth path with its optical axis orthogonal to the path. It is difficult to make a map in terms of intersections, since there may be many similar intersections (T-junctions and crossroads) and routes in an outdoor environment. However, there are rarely similar closed loops, i.e., ordered sequences of routes and intersections; therefore, a large network is generated by connecting closed loops. The robot follows the left side of the road boundary and detects a closed loop when it is back at the starting position. The starting position is detected by matching route scenes with DP matching. Since the robot moves along a closed loop, the route scenes appear in a cycle. First, we match a panoramic view along the first path with the view along the second one by dynamic programming; if the matching score is low, we rematch it with the next path until we find the view along the same path. When paths correspond continuously and the sequence of corresponding paths appears in a cycle, we consider that the closed loop has been found, as shown in Fig. 26, where the round and cross signs indicate matched and unmatched views, respectively. Figure 27(b) shows the generated environmental map of the scene in Fig. 27(a).
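The closed-loop detection relies on a standard dynamic-programming alignment of route-scene sequences. The following sketch shows this generic DP matching step; it is our illustration, and the view-distance measure `dist` is an assumed placeholder since the paper's actual matching cost is not specified here.

```python
import numpy as np

def dp_match_score(seq_a, seq_b, dist):
    """Align two route-scene sequences by dynamic programming and return the
    normalized alignment cost; dist(a, b) compares two views (e.g. a
    panoramic image distance). A low cost suggests the same route."""
    n, m = len(seq_a), len(seq_b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)
```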
Figure 22. Experimental results of map generation and location estimation of the robot.
Figure 23. An example of high-resolution modeling.
3.1.4. Topological Map Representation. We propose an abstract topological map generation method from omnidirectional image sequences (Hiura et al., 1995). A topological map representation is suitable for large-scale space modeling because numerical data, such as precise positions, are not included in the map. Our route map is represented by a sequence of abstracted scene descriptions such as "straight road", "junction" and "crossroad". We consider the matching process between candidate scene descriptions and omnidirectional sensory information to be a sort of optimization problem.
Thus, we propose an adaptive scene description selection method based on a genetic algorithm (GA). First, we extract a sequence of azimuth angles of vertical edges from the omnidirectional images and generate a local geometrical map. The generated local geometrical map is compared with each candidate route model; the candidate scene models are coded as the individuals of the GA, and the best-matched individual is selected as the scene model representing the current environment. The GA then selects the individuals that fit the environment and generates descendants from them, and the individual with the best evaluation is assigned as the representative individual for the current environment. When the class that the representative individual belongs to differs from the previous one, the system adds it to the end of the global topological map. Figure 28 shows the behavior of the system, and Fig. 29 shows the topological map generated after the robot movement.
Figure 24. Block diagram of the floor map building process.
Figure 25. The floor map after inverse perspective transformation and the final estimated result of free space.
The representative local models along the route are illustrated by the maps from A to G. A broad line indicates the existence of a wall, and the triangles show the movable directions of the robot; dark triangles show the actual direction of movement of the robot. For example, in the local map F, the robot can go straight and can also turn left; in this case, the robot actually moves straight. As shown in map C, a triangle together with a wall indicates that the robot can move in that direction, because a narrow space, like an open door, exists or because the wall is far enough from the robot position.
3.2. Route Navigation
3.2.1. Reactive Navigation. Low-level action commands, such as following a road and collision avoidance, need fast reaction capabilities. An artificial potential field is a common approach to realizing reactive collision-avoidance behavior. We describe a new method for reactive visual navigation based on omnidirectional sensing (Yagi et al., 1999).
Figure 26. Detecting cycles while a robot moves along a closed loop.
The robot is projected at the center of the input image by the omnidirectional image sensor HyperOmni Vision; therefore, a rough free space around the robot can be extracted with an active contour model (see Fig. 30). The method produces low-level commands that keep the robot in the middle of the free space and avoid collisions by balancing the shape of the extracted active contours. The basic idea of following a path and avoiding collisions is similar to a combination of Santos-Victor's (1995) method and Holenstein's (1991) method, respectively. Santos-Victor's method is based on optical flows from two cameras directed opposite to each other and only follows the corridor; it may collide with objects moving from the side or behind because of its purely lateral observations. For collision avoidance, Holenstein measured omnidirectional range data by setting 24 ultrasonic sensors around the robot. We can generate both behaviors at the same time by making use of the omnidirectional image sensor. The robot can then avoid obstacles and move along the corridor by tracking the close-looped curve with an active contour model. Furthermore, the method can directly represent the spatial relationships between the environment and the robot in image coordinates; thus, the method can control the robot without geometrical 3D reconstruction. A boundary of free space around the robot can be represented by a close-looped line in the input image, because the robot is projected to the center of the input image by HyperOmni Vision. First, a rough free-space area around the robot is detected using an active contour model. Then, by evaluating three features calculated from the shape of the extracted free-space area (active contours), the method produces low-level motor commands that avoid obstacles and keep the robot in the middle of the free space. To be more specific, the robot action (moving direction) is determined by combining three features: the principal axis of inertia, the center of gravity of the extracted free-space region, and the repulsive forces from the environment (obstacles and walls).

Figure 27. Map representation of environments using route-based network.
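The first two steering features can be computed directly from image moments of the extracted free-space region. A minimal sketch (ours, not the authors' implementation) follows; the repulsive-force term is omitted.

```python
import numpy as np

def steering_from_contour(mask):
    """From a binary free-space mask (robot at the image center), compute the
    center of gravity and the principal axis of inertia of the free-space
    region; steering toward the center of gravity along the principal axis
    keeps the robot in the middle of the free space."""
    ys, xs = np.nonzero(mask)
    cx, cy = xs.mean(), ys.mean()                    # center of gravity
    mu20 = ((xs - cx) ** 2).mean()
    mu02 = ((ys - cy) ** 2).mean()
    mu11 = ((xs - cx) * (ys - cy)).mean()
    axis = 0.5 * np.arctan2(2 * mu11, mu20 - mu02)   # principal axis of inertia
    return (cx, cy), axis
```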
As shown in Fig. 31(a), since the possibility of collision with the unknown object became evident, the robot changed its direction and moved along an arc toward the left side. Next, as shown in Fig. 31(b), the robot changed its direction and moved along an arc toward the right side, and finally came back to the middle of the corridor.

Figure 28. Map generation by genetic algorithm.
Figure 29. Topological map representation.
3.2.2. Memory-Based Navigation. Memory-based navigation is a common approach to visual navigation. The basic operation is a comparison between present sensory inputs and previously memorized patterns, which makes it easy to relate the robot action and the sensory data without a geometrical model. Zheng's robot memorized the side scenes of a route as a panoramic view while it moved along the route (Zheng and Tsuji, 1992). Matsumoto et al.'s robot memorized the whole front-view image at reference points along the route for visual navigation (Matsumoto et al., 1997). The correspondence between the present input images and the previously memorized images was established using a DP matching method and a correlation method, respectively. However, these methods need a large amount of memory for memorizing the route. Therefore, to reduce the memorized data, Ishiguro proposed a compact representation that expands each image into a Fourier series (Ishiguro and Tsuji, 1996); each input image is memorized by the coefficients of its low-frequency components. This approach is simple and useful for navigation in a real environment. Aihara et al. compressed the memorized data by KL transformation (Aihara et al., 1998). Most of these previous memory-based approaches memorize the apparent features (images) at reference points. The apparent features are useful for finding the correspondence between the current position and the memorized ones, and for estimating the orientation of the robot.
Figure 30. Result of extracted free space and principal axis of inertia around the center of gravity.
the environment and the robot. An important point for memory-based navigation is how to represent the relationship between the environment and the robot behavior. We represent this relationship by performing 2-D Fourier transformations on an omnidirectional route panorama, which is acquired by collecting, frame by frame, the points on the horizontal plane passing through the virtual center of the lens while the robot moves along the route, as shown in Fig. 32 (Yagi et al., 1998). The omnidirectional route panorama over a certain number of past frames, which is the standard unit of this spatio-temporal representation, is transformed into a 2D Fourier power spectrum (2DFPS). The route is memorized as a series of 2DFPSs. While the robot is navigating towards the goal point, it is controlled by comparing the pattern of the memorized Fourier power spectrum and its principal axis of inertia with the current ones. The method can directly represent the temporal and spatial relationship between the environment and the robot.
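As an illustration of this representation, the sketch below turns one spatio-temporal window of the route panorama (frames by azimuth bins, e.g. the 32-frame window used in the experiments reported below) into a 2DFPS with a standard FFT. The mean removal and the function name are our own choices, not taken from the paper.

```python
import numpy as np

def route_panorama_2dfps(window):
    """2-D Fourier power spectrum (2DFPS) of one spatio-temporal window
    of the omnidirectional route panorama.

    window : 2-D array of shape (frames, azimuth_bins), e.g. the last
             32 frames of the 360-degree panorama (grey levels).
    """
    # Remove the mean so the DC term does not dominate the spectrum.
    w = window - window.mean()
    spectrum = np.fft.fftshift(np.fft.fft2(w))
    return np.abs(spectrum) ** 2
```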
Figure 31. A result of collision avoidance against an unknown moving obstacle.
Actually, as shown in Fig. 34(a), when the robot moves along a simple environment, the patterns of the omnidirectional route panorama are distributed in the lower frequency components of the 2DFPS. On the other hand, if the robot moves in an environment of the same size, the higher frequency components increase in the areas where features are dense, as shown in Fig. 34(b). The frequency components of the 2DFPSs can thus represent the complexity of the environment relative to the robot movement and the size of the environment. The direction of the distribution in the 2DFPSs includes 3D information because it is perpendicular to the slope of the tangent of the patterns in the omnidirectional route panorama; the pattern indicates the azimuth change of the locus of a feature while the robot is moving. As shown in Fig. 34(c), when the distance between the robot and a wall is large, the patterns on the wall move slowly; the inclination of the distribution in the 2DFPS increases and the distribution shifts toward the longitudinal axis. On the other hand, as shown in Fig. 34(c), when the robot moves near a wall, the inclination decreases and the distribution shifts toward the transverse axis. Therefore, the inclination of the distribution in the 2DFPSs can represent the environment relative to the robot location and the robot movement without geometrical reconstruction.
(a) Memory Organization. A route toward the goal position is divided into straight paths, and subgoals are set at the joints. The rotational angle of the robot is memorized at each subgoal. To reduce the total amount of iconic memory, each path is represented by a series of representative 2DFPSs which are automatically selected and memorized. The first 2DFPS is set as the representative 2DFPS. The cross-correlation between the polar transformed representative 2DFPS and the current polar transformed 2DFPS is calculated while the robot is moving. The value of the cross-correlation decreases with distance in common environments. If the value of the cross-correlation is lower than a certain threshold, we set a flag for a temporal reference point. In the same manner, we extract the next reference point by calculating the cross-correlation between the temporal polar transformed 2DFPS and the current polar transformed 2DFPS. The robot memorizes the changes of two cross-correlations while it moves: the first is the cross-correlation between the current and the previous polar transformed 2DFPSs, and the other is between the current and the next polar transformed 2DFPSs. Furthermore, the locus of the angle of the principal axis of inertia is memorized. The cross-correlation and the angle of the principal axis of inertia are used for robot control.
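The selection of representative spectra can be sketched as follows. For brevity the polar transform of the 2DFPS is omitted and the spectra are compared directly with a zero-mean normalized correlation, so this is only an approximation of the procedure described above; the 0.8 threshold follows the experiments reported later.

```python
import numpy as np

def normalized_correlation(a, b):
    """Zero-mean normalized cross-correlation between two spectra."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def select_reference_points(spectra, threshold=0.8):
    """Pick representative 2DFPSs along one path: a new reference is set
    whenever the correlation with the current reference drops below the
    threshold (0.8 in the experiments)."""
    reference_ids = [0]
    for i, s in enumerate(spectra[1:], start=1):
        if normalized_correlation(spectra[reference_ids[-1]], s) < threshold:
            reference_ids.append(i)
    return reference_ids
```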
Figure 32. Omnidirectional route panorama.
Figure 34. Relation between environment and 2DFPS.
Figure 33. Space representation by 2D Fourier power spectrum.
(b) Autonomous Navigation. After organizing the iconic memory, the robot can be navigated along the memorized route. The robot is initially parked at a standard position and is driven around a room and a corridor of the building via a given route. The rough location of the robot is identified by comparing the present polar transformed 2DFPS with the polar transformed representative 2DFPSs on both sides of the present path. With the robot moving at a constant velocity, the direction of the robot motion (the steering angle) is controlled by a simple proportional controller defined by the angular difference between the present principal axis of inertia (PAI) and the memorized PAI.
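The steering rule amounts to a proportional controller on the wrapped angular error; a minimal sketch follows, with an illustrative gain value that is not from the paper.

```python
import math

def steering_command(pai_now, pai_memorized, gain=0.5):
    """Proportional steering from the angular difference between the
    present and the memorized principal axis of inertia (radians).
    The gain value is illustrative only."""
    error = pai_now - pai_memorized
    # Wrap the error to (-pi, pi] so the robot turns the short way round.
    error = math.atan2(math.sin(error), math.cos(error))
    return gain * error  # steering angle command
```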
Figure 35. Experimental environment.
(c) Results of Autonomous Navigation. Experiments were carried out in our university building. Figure 35 shows the layout of the experimental environment. The robot was initially parked in our computer room. To organize the iconic memory, the operator controlled the mobile robot and moved it toward the door. After passing through the door, the operator turned the robot left in the middle of the corridor. The robot was then navigated by the operator to the goal position; in total, it moved approximately 10 meters. We performed the same experiment twice. The second run was performed on a different day; the lighting conditions were slightly different and a few people were walking in the corridor. The second navigated route was similar but not identical to the first one because the robot was controlled by a human operator. Figure 36(a) and (b) show the omnidirectional route panorama and the iconic memorized data, including the 2DFPSs and cross-correlations as well as the angle of the principal axis of inertia. The route was divided into fifteen regions, as shown in Fig. 36. The threshold for the cross-correlation was 0.8, and the window size of a reference route panorama was 360 degrees by 32 frames. The upward and downward arrows in Fig. 36(a) (dotted lines) are the respective reference points. We consider that both patterns in Fig. 36 are almost the same; therefore, the proposed method generates reproducible and stable representations. Figure 37 shows the result of autonomous route navigation. The upward arrows are reference points on the memorized route data, and the downward arrows are the corresponding frames in the omnidirectional route panorama observed during autonomous navigation. In this experiment, the robot successfully navigated to the goal position.
3.2.3. Geometrical Map-Based Navigation. Since the environmental map is given, the location and the motion of the robot can be estimated by detecting the azimuth of each object in the omnidirectional image (Yagi et al., 1995). We assume that the robot is initially parked at a standard position and driven around a room and a corridor of the building via a given route, so that the rough movement of the robot can be measured using an internal sensor. There are, however, measurement errors caused by the swaying motion of the robot; that is, unobservable errors occur in the orientation because of tire slipping on slightly rough ground. A rough location of the robot can be calculated from the starting position and the movement from there. Thus, the azimuth angles of the vertical edges are estimated using the given map and this rough location. Using the azimuth information from both the input image and the environmental map, one can estimate the location and motion of the robot even if not all of the edges are extracted correctly from the omnidirectional image, because the omnidirectional image sensor observes a 360 degree view around the robot (a numerical sketch of this azimuth-based estimation is given below). After matching the observed edges with the environmental map, the undetectable vertical edges are recognized as unknown obstacles and their locations are estimated. When there is an obstacle along a path toward a given goal position, the robot has to change its path. By evaluating the distance between the robot and the goal position, the robot plans a new minimum-length path that avoids the obstacle and moves toward the goal position.
Nowadays, robot applications are being extended to service tasks, and there is a need for the safe coexistence of service robots and human beings. Much work has been carried out on the development of human-friendly robot systems. We propose a mobile robot guide system for following and guiding an observer to a desired position in an environment such as a museum or exhibition (Yagi et al., 1998).
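The azimuth-based estimation referred to above can be posed, for example, as a small nonlinear least-squares problem over the robot pose. The sketch below assumes that the matching between map edges and observed edges has already been established and ignores outliers; the function names and the use of scipy are our own choices, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import least_squares

def estimate_pose(map_edges, azimuths, pose0):
    """Estimate the robot pose (x, y, heading) from azimuths of known
    vertical edges, starting from the rough odometry pose.

    map_edges : (N, 2) array of edge positions (X, Y) from the map.
    azimuths  : (N,) array of azimuth angles observed in the image [rad].
    pose0     : initial guess (x, y, heading) from the internal sensor.
    """
    def residuals(pose):
        x, y, heading = pose
        predicted = np.arctan2(map_edges[:, 1] - y, map_edges[:, 0] - x) - heading
        err = predicted - azimuths
        return np.arctan2(np.sin(err), np.cos(err))  # wrap to (-pi, pi]

    return least_squares(residuals, pose0).x
```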
Figure 36. Example of memorized route data.
Figure 37. A result of autonomous route navigation.
The system acquires omnidirectional information around the robot in real time with multiple sensors: the omnidirectional image sensor HyperOmni Vision and omnidirectional ultrasonic sensors. Since HyperOmni Vision observes a 360 degree view around the robot, it can observe global and precise azimuth information of features. However, it is difficult to estimate the precise location of a moving object (the observer) from azimuth information alone. On the other hand, active sensors such as omnidirectional ultrasonic sensors can obtain range data around the robot, but as the azimuth resolution of the ultrasonic sensor is poor, it is not suitable for object recognition. Therefore, we integrate the merits of both sensors and construct two capabilities for a mobile robot guide system: a localization function and an observer-following function which maintains a constant interval between the observer and the robot. First, HyperOmni Vision identifies and tracks the observer using azimuth and color information. The acoustic sensor can easily acquire depth information around the robot. From both sensory data, the robot can estimate the location of the observer relative
to the robot. Furthermore, the location of the robot itself is estimated by our map-based navigation method mentioned before. Finally, the robot estimates the relative locations of the robot, the observer and the stationary environment, and then follows the observer while maintaining a constant distance between them. Because of this constant interval, the robot can easily estimate the walking velocity of the observer from its own velocity. For instance, the robot assumes that the observer is interested in something at the current position if the robot (and hence the observer) stops for a while; the robot then starts voice guidance about the pre-registered object close to the current position.
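The following behavior can be sketched as a simple velocity controller driven by the fused azimuth and range of the observer; the gains and the 1.5 m interval below are illustrative assumptions, not values from the paper.

```python
import math

def follow_command(observer_azimuth, observer_range, desired_range=1.5,
                   k_v=0.8, k_w=1.2):
    """Velocity commands that keep a constant interval to the observer.

    observer_azimuth : angle of the observer relative to the robot heading [rad]
    observer_range   : distance to the observer from the fused sensor data [m]
    """
    v = k_v * (observer_range - desired_range)   # forward speed
    w = k_w * observer_azimuth                    # turn toward the observer
    return v, w
```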
4. Cooperative Observation by Multiple Sensors
In this section, we propose three different types of cooperative observation. The first is cooperative observation among multiple mobile robots equipped with the omnidirectional image sensor COPIS; each robot shares observed azimuth information and builds a large environmental map. The second is a combination of omnidirectional image sensor data and ultrasonic sensor data; although each sensor has advantages and disadvantages, the proposed robot system can estimate the free space by combining the two sets of sensory data. The third uses a multiple-image sensing system that consists of an omnidirectional image sensor and binocular vision. We apply our two types of multiple-image sensing systems to the problem of the focus of attention.
4.1. Cooperation Among Multiple Robots with Omnidirectional Image Sensor
Researchers have investigated multi-agent and decentralized autonomous robot systems because such systems can greatly expand sensing and task abilities. From the point of view of observation, if multiple robots can communicate and share their sensory information, a robot can recognize parts of the environment without visiting them first. Therefore, sensing and sensory information processing are the most important issues for a multi-agent system. Most multi-agent systems assume that every robot has common world coordinates and that the initial location of each robot is known. In general, however, these assumptions decrease the flexibility of the system.
We propose a method for detecting and estimating the locations of moving objects by cooperative observation of azimuth changes among multiple mobile robots with the omnidirectional image sensor COPIS (Yagi et al., 1994). Each robot can observe azimuth changes in consecutive images while it is moving in the environment. The locations and motions of moving objects are estimated by matching azimuth information among multiple robots. Our method needs neither common world coordinates nor information on the initial location of each robot; it uses only information on the azimuth changes of the vertical edges. All robots observe azimuth changes in consecutive images while they are moving in the environment, and they can exchange this azimuth information with each other. Although the robots do not know their relative location and orientation, each robot can get observed data from another robot, and this observed data is used for the identification of the cooperating robot.

Consider first that both robots are parallel to each other. The observed azimuth angle relations between robot $i$ and robot $j$ are represented by the following equations. The azimuth angle of robot $j$ observed by robot $i$ is given by

$$\tan \theta_{ji}(t) = \frac{(V_j - V_i)\,t + Y_{ji}(0)}{(U_j - U_i)\,t + X_{ji}(0)} \qquad (12)$$

and the azimuth angle of robot $i$ observed by robot $j$ is given by

$$\tan \theta_{ij}(t) = \frac{(V_i - V_j)\,t + Y_{ij}(0)}{(U_i - U_j)\,t + X_{ij}(0)} \qquad (13)$$

Here, the relative locations of the two robots satisfy

$$X_{ij}(0) + X_{ji}(0) = 0, \qquad Y_{ij}(0) + Y_{ji}(0) = 0 \qquad (14)$$

From the above equations, the difference between the azimuths observed by the two robots is equal to 180 degrees:

$$|\theta_{ij}(t) - \theta_{ji}(t)| = \pi \qquad (15)$$

In general, however, the two robots are not parallel to one another. As shown in Fig. 38, there is a difference in orientation ($\phi(t)$, the pan angle) between the two robots. Taking $\phi(t)$ into consideration, the equation is rewritten as

$$|\theta_{ij}(t) - \theta_{ji}(t)| = \pi - \phi(t) \qquad (16)$$
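A small numerical check of relations (12)-(15) for two robots moving with constant velocities might look as follows; all positions, velocities and the coordinate convention are illustrative assumptions.

```python
import math

def azimuth(x, y):
    """Azimuth of one robot seen from the other (tan = y / x, as in (12))."""
    return math.atan2(y, x)

# Robot i at the origin, robot j offset by (X_ji(0), Y_ji(0)) = (4, 3);
# both move with constant velocities (all values are illustrative).
Xji0, Yji0 = 4.0, 3.0
Ui, Vi, Uj, Vj = 0.2, 0.0, -0.1, 0.3

for t in (0.0, 1.0, 2.0):
    theta_ji = azimuth((Uj - Ui) * t + Xji0, (Vj - Vi) * t + Yji0)   # eq. (12)
    theta_ij = azimuth((Ui - Uj) * t - Xji0, (Vi - Vj) * t - Yji0)   # eq. (13), using (14)
    diff = abs(theta_ij - theta_ji) % (2 * math.pi)
    print(t, math.degrees(diff))   # stays at 180 degrees, as in eq. (15)
```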
Figure 38. Observed azimuths & direction of relative velocity between two cooperative robots.
From (16), the following relation holds between the communicating robots regardless of time:

$$\theta_{ij}(t) - \theta_{ij}(t + \Delta t) = \theta_{ji}(t) - \theta_{ji}(t + \Delta t) \qquad (17)$$

By evaluating the azimuth changes in the image sequence, the projected azimuth of the partner can be detected. Furthermore, in the same manner, the difference between the directions of the relative velocities observed by the two communicating robots is equal to $(180 - \phi(t))$ degrees. Thus, confirmation between the two robots can be performed by checking the above relationships.

Furthermore, we propose a method for generating an environmental map by cooperative observation among multiple mobile robots with the omnidirectional image sensor COPIS (Yagi et al., 1996). Each robot with COPIS can observe an omnidirectional view around itself in real time using a conical mirror. Under the assumption of known robot motion, an environmental map of the scene is generated by monitoring the azimuth changes in the image. However, the obtained environmental maps cannot be integrated at this stage because the robots do not know their relative locations. Therefore, each robot communicates and exchanges its sensory data. Then, static objects (the environment), unknown moving objects and the cooperating robot are discriminated by evaluating the relative directions of motion and estimating the locations of the vertical edges. However, even if the relative location and orientation of both robots are es-
timated precisely, the locations of the corresponding vertical edges in the two maps do not always coincide, because the estimated relative location and orientation of each robot contain observational errors. Therefore, the two maps are matched by assuming an error ellipse around the estimated location of each vertical edge. Figure 40 shows the experimental results of map integration between the two robots with COPIS (called robot 1 and robot 2). Both robots moved in an indoor environment. The speeds of robot 1 and robot 2 were 6 cm/frame and 5 cm/frame, respectively, the images were taken at 2 frames/sec, and 58 images were taken in total. Figure 39 shows the temporal changes in the azimuth angles of the vertical edges in the environment. Each line represents the correspondence between adjacent frames. The black, light gray and middle gray lines represent the vertical edges on the static objects, the moving object and UNFIXED objects, respectively. After the 30th frame, the static objects and the communicating robot can be discriminated exactly. Figure 40(a) and (b) show the individual environmental maps generated by robot 1 and robot 2, respectively. In front of each robot, a large error appeared in both maps; these large errors are thought to be the result of insufficient changes in the azimuth angle. However, the error decreased by integrating both maps, as shown in Fig. 40(c).

4.1.1. Robust Map Generation by Combining an Ultrasonic Sensor with an Omnidirectional Image Sensor. Due to the cost and limitations of various sensors and the stability of sensing, a single sensor is generally not sufficient to provide a satisfying result. Thus, sensor fusion or integration has been widely used to enhance the precision of maps. Among the possible combinations, an ultrasonic sensor is often used with a vision sensor: the ultrasonic sensor can directly measure the depth of the environment, though it has poor angular resolution and is error-prone because of specular reflection, whereas the vision sensor has good resolution, though image processing is slow and suffers under varying lighting conditions. In map building, combining these two sensors allows the exploitation of the vision sensor's area and edge information together with the readily available ultrasonic range information. To build the floor map, we adopt the cooperation of ultrasonic sensors with an omnidirectional vision sensor (Wei et al., 1998). The omnidirectional vision sensor we use can get a less distorted and faster scan
Figure 39. Locus map of observed azimuth changes of two cooperative robots.
of the surrounding floor than traditional vision sensors. Also, the range filtering method we use on the ultrasonic range data provides a more reliable initial estimation of the free space. To facilitate the fusion of the heterogeneous sensor inputs at all times, we use a grid-based representation as the fusion basis. A safety index defined on the grid representation is first derived from the filtered range data, and then revised according to the color and edge information extracted from the inverse perspective transformed floor image. The free space finally obtained in the floor map can then be used for either obstacle avoidance or robot relocation by the methods mentioned above. As related work, Bang et al. (1995) also integrate the sensor input from a conic omnidirectional vision sensor and an ultrasonic sensor to build a local map; in their system, the ultrasonic sensor gives the range and the vision sensor gives the azimuth of the edge features.
However, Bang assumed that the map is given and did not deal with map building, nor with the ultrasonic specular reflection problem. A block diagram of our system is shown in Fig. 24. There are two major modules. The image module takes an omnidirectional image as input and corrects its distortion by inverse perspective transformation. The distortion-corrected image is then fed into the edge extraction and color clustering modules to obtain the edge map and the color map, respectively; these two maps reflect the connectivity information of the robot's surrounding environment. The sonar module takes the 16 sonar ring readings as input and produces a sonar map, which reflects the range information of the robot's environment. From the sonar map, an initial area of the surrounding free space is also estimated and used by the color clustering module to give an initial estimation of the surrounding floor color.
The local map module performs the main fusion of the three separate maps obtained above; the fusion result represents the environment model of the robot. When the robot moves, the maps are shifted according to the encoder information so that they remain valid. Other modules, such as the plan primitive and command modules, can be added for robot control. In terms of fusion, there are three levels of sensor fusion involved in these modules: the sonar module fuses the 16 ultrasonic sensors to produce the echo map; the color clustering module fuses the initial free space information from the echo map to produce a uniform color map; and finally the local map module fuses the echo, color and edge maps from the different sensors to produce the final free space around the robot. Figure 25(a) and (b) show the input floor map after inverse perspective transformation and the final estimated free space, respectively.
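A minimal sketch of the grid-based fusion just described is given below, with a hypothetical weighted safety index. The actual definition of the safety index and of the thresholds in Wei et al. (1998) is not reproduced here; the weights and threshold below are purely illustrative.

```python
import numpy as np

def fuse_local_map(sonar_free, floor_color_ok, edge_map,
                   w_sonar=0.5, w_color=0.3, w_edge=0.2):
    """Fuse sonar, color and edge evidence into one safety index per grid cell.

    sonar_free     : (H, W) array in [0, 1], initial free-space estimate
                     from the filtered sonar ranges.
    floor_color_ok : (H, W) array in [0, 1], how well each cell matches the
                     estimated floor color after inverse perspective mapping.
    edge_map       : (H, W) boolean array, True where a floor edge was found.
    """
    safety = w_sonar * sonar_free + w_color * floor_color_ok
    safety += w_edge * (~edge_map).astype(float)   # edges lower the safety
    free_space = safety > 0.5                       # threshold is illustrative
    return safety, free_space
```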
Figure 40. Results of map integration between two cooperative robots with COPIS.
4.2. Mobile Robot Navigation and Focus of Attention by Multiple Image Sensing System
The idea of using MISS for robot navigation is to focus attention within complex scenes. While navigating towards a goal, the omnidirectional views are used to generate and update a spatial map of the environment for path planning, collision avoidance and finding candidate objects of interest, such as landmarks, objects to be manipulated or inspected, and unknown objects. When an interesting object is found and the resolution of the omnidirectional image is not sufficient, binocular vision is used to focus attention on the object and analyze it in more detail. When an unknown object, for example, is found during navigation, the binocular vision focuses its attention on the neighborhood of the unknown object. Since vertical lines can be robustly detected in the omnidirectional image and their 3-D locations can be obtained by motion stereo of the omnidirectional image sequence (Yagi et al., 1995), the 3-D locations of the vertical edges of the unknown object are used for the attention control of the binocular vision. As the robot moves, it obtains a sequence of stereo images of the unknown object. How to establish correspondence between the images is an important problem, not only between the binocular images (spatial) but also between images taken at different locations (temporal). We use different strategies for the vertical lines and the other lines. Since the 3-D locations of the vertical lines are known from the omnidirectional image sequence, they are projected onto both of the binocular images. By searching for a vertical line
in the neighborhood of the projected one in both binocular images, the correspondence of the vertical edges can easily be established. For the other edges, we extend the principle of trinocular vision, which utilizes the geometrical constraints arising from three cameras. Trinocular stereo has been found to be a reliable method for establishing correspondence between images; we apply the trinocular stereo method to the binocular motion images to establish the correspondence of the non-vertical edges. Each time the robot with binocular vision moves, it obtains four images. The segment-based trinocular method is then applied to two sets of three images, and the 3-D coordinates of a segment are calculated using the better of the two sets, that is, the set in which the projected segment matches the edge segment obtained in the third image better. Segment-based trinocular vision usually finds a unique candidate in the third image because the trinocular stereo imposes a geometrical constraint, the edge segment has richer local features than an edge point, and the number of edge segments in the image is much smaller than the number of edge points. However, multiple candidates sometimes remain in the third image. This is a problem common to multiple-vision systems that use the epipolar constraint. In this case, we use COPIS information to find the true one among the candidates. COPIS can observe the precise location of the vertical edges of the unknown object. Thus, we calculate the x coordinate in both binocular images by projecting the location data of the vertical edge observed by COPIS. Since the calculated x coordinate has observational error, we set a band region (x − δx < x < x + δx) of a certain width and verify whether the corresponding vertical edges in both binocular images exist in this band region. After matching the vertical edges, we find the edges connected with a vertical one. When both sides of an edge are connected with vertical edges, the correspondence can be detected. If only one side of an edge is connected with a vertical edge, we assume that the other side connects with the nearest consecutive vertical edge and verify the correspondence accordingly. Furthermore, even if neither side of an edge connects with any vertical edge, we can restrict the region in which the edge segment can exist, because it is situated between two consecutive vertical edges. The conditions mentioned above are strong geometrical constraints; however, multiple candidates sometimes remain. In this case, we use the uniqueness of correspondence to find the
true one among the candidates. Although the binocular motion vision system assumes that the robot has position sensors and knows its movements, there are measurement errors caused by the swaying motion of the robot. On the other hand, COPIS can observe the precise location and rotation of the robot. Therefore, the location data from COPIS are used as the camera parameters of the binocular vision, and we obtain the 3-D coordinates of each line segment of any direction and build a 3-D model of the environment. We then recognize surfaces from the 3-D wire-frame-like model of the environment and reconstruct the spatial configuration of interesting objects; the details are left to other literature (Utsumi et al., 1992). Using MISS, we performed several experiments in a man-made environment. Images are digitized to 512 × 480 pixels (each pixel: 256 levels). Figure 5 is an example of an input image in the environment shown in Fig. 3. The objects in the image are large house-like boxes. Each image was taken after the robot had moved 2.5 cm, and the robot moved straight ahead by about 70 cm in total. By observing the loci of the azimuth angles of the vertical edges, COPIS can estimate its own location and motion; furthermore, the locations of the unknown vertical edges are estimated by triangulation. The average error of the location measurement of the robot was under approximately 10 mm, and the average location error of the unknown vertical edges was approximately 15 mm. Figure 41(a) and (b) are the edge pictures of the binocular vision images at the 7th and 8th frames. As shown in Fig. 42, after setting the search regions in the binocular images, we detect the correspondence of the vertical edges, as shown in Fig. 43; the thick black lines are the corresponding vertical edges. The correspondence is checked recursively. For instance, the edges whose both vertices are connected to corresponding candidates are verified, as shown in Fig. 44; the white circles are the corresponding vertices. Figure 45 shows the results of the case where one side of
Figure 41. Extracted edge segments.
Figure 42. Setting search regions.
Figure 43. Correspondence of vertical edge.
Figure 44. Connectivity of vertices—In the case that both vertices are connected.
Figure 45. Connectivity of vertices—In the case that one side of the vertex on an edge only is connected.
an edge is connected with a vertical edge; a dark circle shows an unconnected vertex. Figure 46 shows the results after the final verification: although the candidate vertices were not connected with the vertical edges, we could still find the true candidates. The measurement error of the estimated locations of these edges was approximately 13 mm. We think that the precision of the obtained locations of the robot and the unknown obstacle is sufficient for robot navigation and object recognition.
Figure 46. Results of final verification.
Thus, these results suggest that the system is a useful sensor for robot navigation and object recognition. Next, we describe the map generation method using the multiple-image sensing system MISS under unknown robot egomotion. When the robot moves within the environment, a large number of omnidirectional images and pairs of stereo images taken at different viewpoints can be observed by MISS. If the robot motion is known, the environmental map can easily be generated by triangulation from the loci of the azimuths of the objects (Yagi et al., 1995). However, it is difficult to observe the exact motion parameters of the robot because of measurement errors in the robot's encoders; therefore, observational errors in the generated environmental map accumulate after long movements of the robot. To generate a large environmental map, it is desirable not to assume known robot motion. Under unknown robot egomotion, one of the important problems in reconstructing a 3-D model of the environment is how to integrate observational information from different viewpoints. Our method integrates the local range information observed by binocular vision at different viewpoints by using the loci of azimuths observed by the omnidirectional vision while the robot moves, and then estimates the robot egomotion to generate the whole environmental map (Yagi et al., 1997; Tsuji et al., 1997). Defining the robot position $j$ relative to the robot position $i$ by the translational component $({}^{i}U_{j}, {}^{i}V_{j})$ and the rotational component ${}^{i}\theta_{j}$ of the robot movement, the locations ${}^{i}P_{j}({}^{i}X_{j}, {}^{i}Y_{j})$ of the object points relative to the robot position $i$ can be represented by

$$\begin{pmatrix} {}^{i}X_{j} \\ {}^{i}Y_{j} \end{pmatrix} = \begin{pmatrix} \cos {}^{i}\theta_{j} & -\sin {}^{i}\theta_{j} \\ \sin {}^{i}\theta_{j} & \cos {}^{i}\theta_{j} \end{pmatrix} \begin{pmatrix} {}^{j}X_{j} \\ {}^{j}Y_{j} \end{pmatrix} + \begin{pmatrix} {}^{i}U_{j} \\ {}^{i}V_{j} \end{pmatrix} \qquad (18)$$

Here, consider that the robot moves from position $i$ to position $i + 2$, as shown in Fig. 47. Omnidirectional
Figure 47. Map generation by MISS.
vision can observe the object azimuths continuously. The observed azimuths satisfy the following relationships:

$$\tan {}^{m}\alpha_{k} = \frac{{}^{m}Y_{k}}{{}^{m}X_{k}}, \qquad m, k = i, i+1, i+2 \qquad (19)$$

Although the view field of the binocular vision is restricted by the visual angle of the lens, it can acquire the distance of objects within that field. Consider the range $L_{k}$ of the object point $k$ estimated at robot position $k$ ($k = i, i+1, i+2$). The relationship between the range $L_{k}$ and the observed azimuth ${}^{k}\alpha_{k}$ is represented by

$$\begin{pmatrix} {}^{k}X_{k} \\ {}^{k}Y_{k} \end{pmatrix} = \begin{pmatrix} L_{k}\cos {}^{k}\alpha_{k} \\ L_{k}\sin {}^{k}\alpha_{k} \end{pmatrix}, \qquad (k = i, i+1, i+2) \qquad (20)$$

From (18), (19) and (20), the total number of observational equations is six. On the other hand, the unknown parameters are the parameters of the robot location ${}^{i}P_{j}({}^{i}U_{j}, {}^{i}V_{j})$ and rotation ${}^{i}\theta_{j}$ relative to the initial robot position $i$; the total number of unknown parameters is also six. Thus, the following equation must be satisfied for the estimation of the unknown parameters.
Actually, as each observed datum has measurement errors, one can estimate a more precise location using the least squares method. This method can estimate the robot egomotion and generate the environmental map for the object points (vertical edges) whose range data can be observed by the binocular vision. For the object points whose range data are not acquired by the binocular vision, one can estimate the locations of the robot and the object points by using our previous map-based navigation and map generation methods (Yagi et al., 1995). Figure 48(a) and (b) show the results at the 30th frame and the 82nd frame. The estimated locations of the vertical edges observed using both global azimuth and local range information are plotted as black square points, while the white square points show the estimated locations of the vertical edges observed only through global azimuth information. The estimated loci of the robot are drawn as a thin black line.

The ability to detect and track a human face is very useful for human-machine interaction. We track persons around the robot by combining omnidirectional vision and binocular vision (Fig. 49) (Konparu et al., 1997, 1998). Since the omnidirectional vision robot can observe a 360 degree view around itself in real time, it can observe the global azimuth information of persons. By using this global azimuth information from the omnidirectional vision robot, the binocular vision robot can fix its attention on, and observe the face of, the person in whom it is interested.
(a) Human Detection and Tracking by Omnidirectional Vision. We assume that the robot starts from rest. Before the robot starts moving, it takes a stationary background color image and produces a background histogram. The regions where optical flows occur are registered as human candidates, and the color histogram of each human candidate region is calculated. As shown in Fig. 50, the characteristic human features, which are good at discerning a person from the background, are made by subtracting the normalized background histogram from the normalized color histogram of each human candidate region. After human detection, each person can be tracked by integrating the optical flow regions and estimating each human feature.
(b) Human Face Detection and Tracking by Binocular Vision. Next, the binocular vision fixes its attention on the face region of an interesting person. Our face
Figure 48. Results of map generation by MISS.
Figure 49. Multiple sensing system by cooperating omnidirectional vision and binocular active vision.
region detection is based on a skin color model and a simple shape model. As the azimuth angle of each person can be estimated from the omnidirectional vision data, the region in which skin color is searched can be limited by using the estimated azimuth angle of the person of interest. Each skin region is matched with a standard face shape model defined by the area and aspect ratio of the skin region (Fig. 51).
(c) Cooperative Tracking. The binocular vision sometimes loses track of the face of a person because of its limited field of view. In this case, the main robot rotates its body and changes the viewing direction of the binocular vision toward the direction of the person, because the omnidirectional vision continuously tracks each person.
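The face-region search in (b) can be sketched as a skin-color test restricted to the image columns selected from the person's azimuth, followed by the area and aspect-ratio check. The normalized-rg thresholds and the limits below are illustrative assumptions standing in for the skin color and shape models described above.

```python
import numpy as np
from scipy import ndimage

def find_face_region(image_rgb, col_range, min_area=200,
                     aspect_range=(0.6, 1.6)):
    """Search for a skin-colored, roughly face-shaped blob inside the
    image columns selected from the person's azimuth.

    image_rgb : (H, W, 3) uint8 image from the binocular camera.
    col_range : (c0, c1) columns to search, derived from the azimuth.
    """
    c0, c1 = col_range
    sub = image_rgb[:, c0:c1].astype(float)
    s = sub.sum(axis=2) + 1e-6
    r, g = sub[..., 0] / s, sub[..., 1] / s
    skin = (r > 0.36) & (r < 0.47) & (g > 0.28) & (g < 0.37)  # illustrative

    labels, n_regions = ndimage.label(skin)
    for idx, region in enumerate(ndimage.find_objects(labels), start=1):
        h = region[0].stop - region[0].start
        w = region[1].stop - region[1].start
        area = int(np.count_nonzero(labels[region] == idx))
        if area >= min_area and aspect_range[0] <= w / h <= aspect_range[1]:
            return region  # first blob passing the area / aspect test
    return None
```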
5. Conclusions
In this paper, we have described three different types of omnidirectional image sensors, named COPIS, HyperOmni Vision and TOM, and two different types of multiple-image sensing systems. COPIS, which uses a conical mirror, is suitable for mobile robot navigation because its main field of view is a side view. The multiple-image sensing system, named MISS, can obtain an omnidirectional image and binocular vision images on a single camera since compactness and light weight structures are important factors for a mobile robot. HyperOmni Vision has an important feature in that it has a focal point (a single center of projection) and an input image can easily be transformed to any desired image projected on any designated image plane, such as a pure perspective image or a panoramic image. These sensors can be applied to a variety of fields such as autonomous navigation, telepresence, virtual reality and remote monitoring. However, the resolution of the omnidirectional camera is not sufficient for detailed analysis of an interesting object. Also, the anisotropic property of the convex mirror results in much blurring in the input image and it is difficult to miniaturize these sensors. To overcome these disadvantages, we have proposed new optics for omnidirectional sensing and a new method for high-resolution imaging. TOM, which
Figure 50. Human detection and tracking by omnidirectional vision.
Figure 51. Human face detection and tracking by binocular vision.
consists of two paraboloidal mirrors, has good optics for making a small omnidirectional image sensor. Its optics satisfy the important relationship of a single center of projection, and wide side-view omnidirectional images could be acquired by TOM. We have also proposed high-resolution omnidirectional imaging and super-resolution modeling methods, and we think that these methods expand the application field of omnidirectional image sensors. In addition, we have proposed several methods that apply an omnidirectional image sensor and multiple-image sensing systems to robot navigation and human-robot interaction. The advantage of omnidirectional sensing is not only its wide view angle but also its special properties, such as the invariability of azimuths, circumferential continuity, periodicity, symmetry and rotational invariance. Our proposed methods use these characteristics of omnidirectional image sensors for image processing. Further research on omnidirectional image processing and on sensor designs that use the inherent optical characteristics of the omnidirectional image sensor will be necessary.

Acknowledgment
This work was supported by a Grant-in-Aid for Scientific Research from the Ministry of Education, Science and Culture, Japanese Government, and the Japan Society for the Promotion of Science under grant JSPS-RFTF 99P01404.

References

Aihara, N., Iwasa, H., Yokoya, N., and Takemura, H. 1998. Memory-based self-localization using omnidirectional images.
In Proc. of Int. Conf. Pattern Recognition, pp. 1799– 1803. Ayres, W.A. 1942. Projecting device. US Patent 2304434. Bang, S.W., Yu, W., and Chung, M.J. 1995. Sensor-based local homing using omnidirectional range and intensity sensing system for indoor mobile robot navigation. In Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pp. 542–548. Barth, M. and Barrows, C. 1996. A fast panoramic imaging system and intelligent imaging technique for mobile robots. In Prof. of IEEE/RSJ Int. Conf. Intelligent. Robots and Systems, no. 2, pp. 626–633. Brassart, E., Delahoche, L., Cauchois, C., Drocourt, C., Pegard, C., and Mouaddib, M. 2000. Experimental results got with the omnidirectional vision sensor SYCLOP. In Proc. IEEE Workshop on Omnidirectional Vision, pp. 145–160. Bruckstein, A. and Richardson, T. 2000. Omniview cameras with curved surface mirrors. In Proc. IEEE Workshop on Omnidirectional Vision, pp. 79–84. Cao, Z. and Hall, E.L. 1990. Beacon recognition in omni-vision guidance. In Proc. of Int. Conf. Optoelectronic Science and Engineering, vol. 1230, pp. 788–790. Cao, Z.L., Oh, S.J., and Hall, E.L. 1986. Dynamic omnidirectional vision for mobile robots. J. Robotic Systems, 3(1):5–17. Chahl, J.S. and Srinivasan, M.V. 1997. Reflective surfaces for panoramic imaging. Applied Optics, 36(31):8275– 8285. Chang, P. and Hebert, M. 2000. Omni-directional structure from motion. In Proc. IEEE Workshop on Omnidirectional Vision, pp. 127– 133. Davis, J.E., Todd, M.N., Ruda, M., Stuhlinger, T.W., and Castle, K.R. 1997. Optics assembly for observing a panoramic scene. US Patent 5627675. Delahoche, L., Pegard, C., Mouaddib, E.M., and Vasseur, P. 1998. Incremental map building for mobile robot navigation in an indoor environment. In Proc. of IEEE Int. Conf. Robotics and Automation, pp. 2560–2565. Etoh, M., Aoki, T., and Hata, K. 1999. Estimation of structure and motion parameters for a roaming robot that scans the space. In Proc. IEEE Int. Conf. on Computer Vision, vol. 1, pp. 579– 584. Glucjman, J. and Nayer, S.K. 1998. Ego-motion and omnidirectional cameras. In Proc. IEEE Int. Conf. on Computer Vision, pp. 999– 1005. Greguss, P. 1985. The tube peeper: A new concept in endoscopy. Optics and Laser Technology, 41–45. Hicks, R.A., Pettey, D., Daniilidis, K., and Bajcsy, R. 2000. Closed form solutions for reconstruction via complex analysis. Journal of Mathematical Imaging and Vision, 13(1):57–70. Hiura, R., Yagi, Y., and Yachida, M. 1995. Topological mapping using a combination of edge and region information. In Proc. RSJ Robot Symposium, pp. 151–156. Holenstein, A. and Badreddin, E. 1991. Collision avoidance in a behavior-based mobile robot design. In Proc. IEEE Int. Conf. on Robotics and Automation, vol. 1, pp. 898–903. Irani, M. and Peleg, S. 1991. Improving resolution by image registration. Computer Vision, Graphics, and Image Processing, 53(3):231–239. Ishiguro, H. and Tsuji, S. 1996. Image-based memory of environment. In Proc. of Int. Conf. Intelligent Robots and Systems, pp. 634–639.
Ishiguro, H., Yamamoto, M., and Tsuji, S. 1992. Omni-directional stereo. IEEE Trans. Pattern Analysis and Machine Intelligence, 14(2):257–262. Kawasaki, H., Yatabe, T., Ikeuchi, K., and Sakauchi, M. 2000a. Construction of 3D city map using EPI analysis and DP matching. In Proc. Asian Conf. Computer Vision. Kawasaki, H., Ikeuchi, K., and Sakauchi, M. 2000b. EPI analysis of omni-camera image. In Proc. IAPR Int. Conf. of Pattern Recognition, vol. I, pp. 379–383. Konparu, T., Yagi, Y., and Yachida, M. 1997. Finding and tracking a person by cooperation between an omnidirectional sensor robot and a binocular vision robot. In Proc. 15th Annual Conf. on RSJ, vol. 3, pp. 957–958. Konparu, T., Yagi, Y., and Yachida, M. 1998. Finding and tracking a person by cooperation between an omnidirectional sensor robot and a binocular vision robot. In Proc. Meeting on Image Recognition and Understanding MIRU98, vol. II, pp. 7–12. Li, S., Ochi, A., Yagi, Y., and Yachida, M. 2000. Making 2D map of environments based upon routes scenes. Journal of Autonomous Robots, 8(2):117–128. Matthews, B.O., Perdue, D., and Hall, E.L. 1995. Omnidirectional vision applications for line following. In Proc. of SPIE Intelligent Robots and Computer Vision XIV: Algorithms, Techniques, Active Vision, and Materials Handling, vol. 2588, pp. 438–449. Matsumoto, Y., Inaba, M., and Inoue, H. 1997. Memory-based navigation using omni-view sequence. In Proc. of Int. Conf. Field and Service Robotics, pp. 184–191. Morita, T., Yasukawa, Y., Inamoto, Y., Uchiyama T., and Kawakami S. 1989. Measurement in three dimensions by motion stereo and spherical mapping. In Proc. of IEEE Computer Vision and Pattern Recognition, pp. 422–434. Nagahara, H., Yagi, Y., and Yachida, M. 2000. Super-resolution from an omnidirectional image sequence. In Proc. IEEE Int. Conf. on Industrial Electronics, Control and Instrumentation, pp. 2559– 2564. Nagahara, H., Yagi, Y., and Yachida, M. 2001. Resolution imrpoving method from multi-focal omnidirectional images. IEEE International Conference on Image Processing, vol. 1, pp. 654–657. Nagahara, H., Yagi, Y., and Yachida, M. 2001b. Hi-resolution modeling of 3D environment using omnidirectional image sensor. Technical Report of IEICE, PRMU-2000-152, pp. 39–46. Nayar, S.K. and Peri, V. 1999. Folded catadioptric cameras. In Proc. IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, Vol. II, pp. 217–223. Nayar, S.K. and Karmarkar, A. 2000. 360 × 360 mosaics. In Proc. IEEE Computer Vision and Pattern Recognition, vol. II, pp. 388– 395. Nelson, R.C. and Aloinomous, J. 1988. Finding motion parameters from spherical motion fields. Biological Cybernetics, 58:261– 273. Pegard, C. and Mouaddib, E.M. 1996. Mobile robot using a panoramic view. In Proc. of IEEE Int. Conf. Robotics and Automation, vol. 1, pp. 89–94. Peleg, S., Keren, D., and Schweitzer, L. 1987. Improving image resolution using subpixel motion. Pattern Recognition Letters, 5(3):223–226. Powell, I. 1995. Panoramic lens. US Patent 5473474. Rees, W.D. and Mich, W. 1970. Panoramic television viewing system. US Patent 3505465.
Roning, J.J., Cao, Z.L., and Hall, E.L. 1987. Color target recognition using omnidirectional vision. In Proc. of SPIE Optics, Illumination, and Image Sensing for Machine Vision, vol. 728, pp. 57–63. Rosendahi, G.R. and Dykes, W.V. 1983. Lens system for panoramic imagery. US Patent 4395093. Rossi, B. 1962. Optics, Addison-Wesley Publishing Co. Inc. Santos-Victor, J., Sandini, G., Curotto, F., and Garibaldi, S. 1995. Divergent stereo in autonomous navigation: From Bees to Robots. Int. J. of Computer Vision, 14:159–177. Saraclik, K.B. 1989. Characterizing an indoor environment with a mobile robot and uncalibrated stereo. In Proc. of IEEE Int. Conf. Robotics and Automation, pp. 984–989. Simamura, J., Yokoya, N., Takemura, H., and Yamazawa, K. 2000. Construction of an immersive mixed environment using an omnidirectional image sensor. In Proc. IEEE Workshop on Omnidirectional Vision, pp. 62–69. Svodoba, T., Pajdla, T., and Hlavac, V. 1998. Epipolar geometry for panoramic cameras. Europian Conf. Computer Vision, 218– 232. Takeya, A., Kuroda, T., Nishiguchi, K., and Ichikawa, A. 1998. Omnidirectional vision system using two mirrors. In Proc. SPIE, vol. 3430, pp. 50–60. Tsai, R.Y. and Hang, T.S. 1984. Multiframe image resolution and registration. Advanced in Computer Vision and Image Processing, 1:317–339. Tsuji, Y., Yagi, Y., and Yachida, M. 1997. The observation planing for generating an environment map. In Proc. 15th Annual Conf. RSJ, vol. 3, pp. 961–962. Utsumi, A., Yagi, Y., and Yachida, M. 1992. Estimating surface and spatial structure from wire-frame model using geometrical and heuristical relation. In Proc. IAPR Machine Vision and Appl., pp. 25–28. Wei, S., Yagi, Y., and Yachida, M. 1998. Building a local floor map by use of ultrasonic and omnidirectional vision sensors. Advanced Robotics, 12(4):433–453. Yachida, M. 1998. Omnidirectional sensing and combined multiple sensing. In Proc. Workshop on Computer Vision for Virtual Reality Based Human Communications, pp. 20–27. Yagi, Y. 1999. Omnidirectional sensing and its applications, IEICE Trans. Information and Systems, E82-D(3):568–579. Yagi, Y., Egami, K., and Yachida, M. 1997. Map generation for multiple image sensing sensor MISS under unknown robot egomotion. In Proc. Int. Conf. Intelligent Robots and Systems, vol. 2, pp. 1024–1029. Yagi, Y., Fujimura, S., and Yachida, M. 1998. Route representation for mobile robot navigation by omnidirectional route panorama fourier transformation. In Proc. IEEE Int. Conf. Robotics and Automation, pp. 1250–1255. Yagi, Y., Hamada, H., Benson, N., and Yachida, M. 2000. Generation of stationary environmental map under unknown robot motion. In Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pp. 1487–1492. Yagi, Y., Izuhara, S., and Yachida, M. 1996. The integration of an environmental map observed by multiple mobile robots with omnidirectional image sensor COPIS. In Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, vol. 2, pp. 640– 647. Yagi, Y. and Kawato, S. 1990. Panorama scene analysis with conic projection. In Proc. IEEE/RSJ Int. Workshop on Intelligent Robots and Systems, pp. 181–187.
Yagi, Y., Kawato, S., and Tsuji, S. 1994. Real-time omnidirectional image sensor (COPIS) for vision-guided navigation. IEEE Trans. Robotics and Automation, 10(1):11–22. Yagi, Y., Lin, Y., and Yachida, M. 1994. Detection of unknown moving objects by reciprocation of observed information between mobile robot. In Proc. IEEE/RSJ/GI Int. Conf. on Intelligent Robots and Systems, vol. 2, pp. 996–1003. Yagi, Y., Nagai, H., Yamazawa, K., and Yachida, M. 1999. Reactive visual navigation based on omnidirectional sensing—Path following and collision avoidance. In Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, vol. 1, pp. 58–63. Yagi, Y., Nishii, W., Yamazawa, K., and Yachida, M. 1996a. Rolling motion estimation for mobile robot by using omnidirectional image sensor hyperomnivision. In Proc. IAPR Int. Conf. on Pattern Recognition, vol. 1, pp. 946–950. Yagi, Y., Nishii, W., Yamazawa, K., and Yachida, M. 1996b. Stabilization for mobile robot by using omnidirectional optical flow. In Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pp. 618–625. Yagi, Y., Nishizawa, Y., and Yachida, M. 1995. Map-based navigation for a mobile robot with omnidirectional image sensor COPIS. IEEE Trans. Robotics and Automation, 11(5):634–648. Yagi, Y., Okumura H., and Yachida, M. 1994. Multiple visual sensing system for mobile robot. In Proc. IEEE Int. Conf. on Robotics and Automation, vol. 2, pp. 1679–1684. Yagi, Y., Shouya, K., and Yachida, M. 2000. Environmental map generation and egomotion estimation in a dynamic environment for an omnidirectional image sensor. In Proc. IEEE Int. Conf. on Robotics and Automation, pp. 3493–3498. Yagi, Y., Sato, K., Yamazawa, K., and Yachida, M. 1998. Autonomous guidance robot system with omnidirectional image sensor. In Proc. Int. Conf. on Quality Control by Artificial Vision, pp. 385–390. Yagi, Y. and Yachida, M. 1999. Development of a tiny omnidirectional image sensor. 1999 JSME Conf. on Robotics and Mechatronics, No. 99-9,2A1-66-060, pp. 1–2. Yagi, Y. and Yachida, M. 2000. Development of a tiny omnidirectional image sensor. In Proc. Asian Conf. on Computer Vision, pp. 23–28. Yagi, Y. and Yachida, M. 1998. Omnidirectional visual sensor. Japanese Patent Application, 10-17251. Yamazawa, K., Yagi, Y., and Yachida, M. 1993. Omnidirectional imaging with hyperboloidal projection. In Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, no. 2, pp. 1029–1034. Yamazawa, K., Yagi, Y., and Yachida, M. 1995. Obstacle detection with omnidirectional image sensor hyperomni vision. IEEE the International Conference on Robotics and Automation, pp. 1062– 1067. Yamazawa, K., Yagi, Y., and Yachida, M. 2000. 3D line segment reconstruction by using HyperOmni vision and omnidirectional hough transforming. In Proc. IAPR Int. Conf. on Pattern Recognition, vol. 3, pp. 487–490. Zheng, J.Y. and Tsuji, S. 1990. Panoramic representation of scenes for route understanding. In Proc. of Int. Conf. Pattern Recognition, pp. 161–167. Zheng, J.Y. and Tsuji, S. 1990. From anorthoscope perception to dynamic vision. In Proc. of IEEE Int. Conf. Robotics and Automation, pp. 1154–1160. Zheng, J.Y. and Tsuji, S. 1992. Panoramic representation for route recognition by a mobile robot. Int. J. Computer Vision, 9(1):55–76.