Journal of Mathematical Imaging and Vision 8, 255–269 (1998). © 1998 Kluwer Academic Publishers. Manufactured in The Netherlands.
Determination of Aircraft Orientation for a Vision-Based System Using Artificial Neural Networks

SANJEEV AGARWAL, Intelligent Systems Center, University of Missouri, Rolla, MO 65401

SUBHASIS CHAUDHURI, Department of Electrical Engineering, Indian Institute of Technology, Powai, Bombay 400 076
Abstract. An algorithm for real-time estimation of the 3-D orientation of an aircraft, given its monocular, binary image from an arbitrary viewing direction, is presented. This being an inverse problem, we attempt to provide an approximate but fast solution using the artificial neural network technique. A set of spatial moments (scale, translation, and planar rotation invariant) is used as features to characterize different views of the aircraft; these features constitute the feature space representation of the aircraft. A new neural network topology is suggested in order to solve the resulting functional approximation problem for the input (feature vector)-output (viewing direction) relationship. The feature space is partitioned into a number of subsets using a Kohonen clustering algorithm so as to decompose the complex relationship into a number of simpler ones. Separate multi-layer perceptrons (MLP) are then trained to capture the functional relations that exist between each class of feature vectors and the corresponding target orientation. This approach is shown to give better results when compared to those obtained with a single MLP trained for the entire feature space.

Keywords: 3-D orientation estimation, pose estimation, moment invariants, principal axis moments, Kohonen clustering, multi-layer perceptron
1. Introduction
Estimation of the orientation of an object from its image at an arbitrary viewing direction is an important problem in the image processing literature. The orientation of a 3-D object often needs to be calculated for object identification purposes [11, 43]. Many applications in automation, including robotics, demand the estimation of 3-D orientation; a pick-and-place robot is one such application. It is also quite useful in many video tracking systems [1, 32]. We shall be primarily concerned with the application to a tracking problem where a maneuvering aircraft is to be tracked. This has many important applications, including the tracking of commercial aircraft for air traffic control and collision avoidance [3, 28], and also military target tracking [3, 22, 24].
Traditionally, only positional data (range and bearing) and occasionally rates (Doppler) from radar sensors have been used for the estimation of the highly uncertain and dynamic acceleration process in a typical target-tracking problem [9]. These sensors, even though quite effective against non-maneuvering or slowly maneuvering targets, fail to achieve a reasonably good tracking performance against highly maneuverable targets such as aircraft. The measurement of the orientation of a target is of particular interest because a significant correlation exists between the aircraft orientation and its acceleration [24]. Thus, while tracking algorithms based only on positional data assume point-mass motion for the target, the availability of the space orientation information allows us to consider a more elaborate rigid body motion model, which gives better tracking performance [2, 37].
Kendrick et al. [24] have proposed a multisensor data fusion system, where the radar provides the range, the range rate, and the bearing information, while the imaging sensor is employed to obtain the target orientation. The instantaneous orientation of the target in 3-D space has been used to estimate the most likely direction and magnitude of the maneuvering acceleration. This additional information about target maneuvers, when augmented with a conventional extended Kalman filter, results in a marked improvement in the estimation of motion parameters for the target. Hutchins and Sworder, in a series of papers [22, 36], have employed a similar idea for tracking a land vehicle. Lefas [28] has similarly used the heading angle along with the radar data in an aircraft tracking system for air traffic control purposes. However, the aircraft is assumed to provide the additional measurement of the heading angle through an air-ground data link. The applicability of target orientation information to an automatic and autonomous target-tracking system is severely limited because of the lack of a suitable method for estimating the 3-D orientation of the target. The purpose of the present paper is to fill this gap.

Given the 3-D structure of an aircraft, it is straightforward to obtain its image in any viewing direction by simply defining a projective transformation. However, the inverse problem of finding the viewing direction given its image is quite challenging, because the image of a complex 3-D object such as an aircraft changes in a highly nonlinear manner with the viewing direction.

A significant amount of work has been reported in the computer vision literature dealing with 3-D object recognition and pose estimation based on 2-D image data. A survey of some of these efforts can be found in [8]. In the constrained environment of the tracking problem, because of the poor quality of the images, only the silhouette of the object may be available for pose estimation. Thus, the local feature based object recognition/pose estimation methods [14, 15, 23, 26, 30, 35, 40] are not very useful. Moreover, these algorithms are computationally intensive since they require sophisticated feature detection routines. Global features such as moment invariants and Fourier descriptors have been used extensively [6, 7, 11, 43] in the literature. Most of these methods have been developed for identification of the target; orientation is obtained only as a byproduct of these algorithms. The classification methods are based on minimum distance and k-nearest neighbor classification over the library of views, which are very slow
and thus may not be suitable for real-time implementation. Moreover, the accuracy of the estimation is limited by the number of library views stored for the object. There have also been attempts to match a model to a given observation [27, 34, 41]. Advantages and limitations of these methods have been discussed in [18, 31]. Wallace and Mitchell [42] have proposed an algorithm for the estimation of the orientation of a 3-D object for a target-tracking problem, based on the linearity property of normalized Fourier descriptors. Even though better estimates of orientation can be obtained with this approach, the algorithm is computationally intensive.

Recently, considerable research has been devoted to the aspect graph representation of 3-D objects [12, 16] (see also [13]). An aspect graph seeks to provide a view-centered representation of the object. Each node of the graph represents the characteristic view of the object for a connected set of viewpoints, from which the object appears qualitatively similar. Both global and local features can be used to represent different views of the object. This approach is especially suited to an object recognition problem, and by itself could provide only a crude approximation of the pose. Better estimation of pose can be obtained at the cost of added complexity and an increased size of the graph, which slows down the system.

In this paper, we take a neural network approach to solve the problem of matching the 2-D image information to the 3-D object representation. The noniterative and feed-forward nature of the neural network makes this algorithm amenable to real-time implementation. Moreover, due to the interpolation capabilities of the network, it gives reasonably accurate results even for orientations not trained explicitly. The developed algorithm will, in the future, be integrated into a recursive target-tracking system for improved accuracy.

The problem is formally defined in the next section. It is reduced to an equivalent problem of estimating the view of the aircraft and the rotation about the optical axis, which can be solved separately. The generation of the feature vector to characterize different views of an aircraft is discussed in Section 3. In Section 4, two different neural network approaches are presented for the estimation of the viewing direction. The first method takes advantage of the ability of a simple multi-layer perceptron (MLP) to learn the functional relationship between the input and the output patterns. Due to the highly nonlinear and complex nature of the relationship, a simple multi-layer perceptron was not entirely
suitable, especially since the number of training sets was limited. Thus, in the second approach, a Kohonen self-organizing network is used for clustering the input space into subsets (classes). This classification results in a simpler relationship between the input class and the corresponding output class, which can be learned more effectively by MLPs. The Kohonen layer in the second approach can also be thought of as partitioning the view space into abstract aspects, thus automatically providing an aspect graph representation of the object. Given the viewing direction, the problem of estimating the angle of rotation about the optical axis is discussed in Section 5. The simulation results are discussed in Section 6. The paper ends with conclusions and a discussion of the scope for future work.

2. Problem Formulation
In the rest of the paper we shall assume that the object under consideration is an aircraft, so the terms object and aircraft are used interchangeably. In order to continue with our discussion we first define three coordinate systems (see Fig. 1). Let (X, Y, Z) be the right-handed global coordinate system (GCS) attached to the earth with the positive Z-axis pointing upward. The aircraft coordinate system (ACS), denoted by (x_a, y_a, z_a), has its origin at the center of mass of the aircraft, with the positive x_a-axis pointing towards the nose of the aircraft and the positive z_a-axis pointing upward, perpendicular to the wing plane. Also, the camera coordinate system (CCS), denoted by (x_c, y_c, z_c), is chosen such that the x_c–z_c plane is parallel to the image plane and the y_c-axis is along the optical axis of the camera. For convenience we shall assume that the camera does not rotate about its optical axis, so the z_c-axis always lies in the Z–Y plane. With this assumption, the camera coordinate system can be defined with respect to the global coordinates by two angles (θ_gc, φ_gc) which represent the direction angles of the optical axis with respect to the GCS.

Figure 1. 3-D object orientation defined with respect to (a) global coordinate system and (b) camera coordinate system.

In order to solve the problem for the target-tracking application, we make certain assumptions:

• The 3-D model of the target is assumed to be known a priori, i.e., target identification has been performed.
• The target is far enough away, compared to the focal length of the camera, that we may consider a scaled orthographic projection model, i.e., x = αX and z = αZ in the image plane, where α is the magnification [19].
• The image (monocular) is assumed to have been binarized for the analysis.
• The target is assumed to undergo a rigid motion (i.e., there is no deformation).
• The silhouette of the object is assumed to be unambiguous (despite possible bilateral symmetry of the object). Such an ambiguity can, however, be resolved by following the evolution of the pose during the course of maneuvering of the object.

The 3-D orientation of the target is defined in terms of three Euler angles, namely azimuth, pitch and roll (θ_a, φ_a, ψ_a), that the aircraft coordinate system makes with respect to the global coordinates. Since the camera orientation (and hence the CCS) can be obtained from the gyros attached to the camera subsystem, a suitable transformation from camera coordinates to global coordinates can be defined. Thus, given the target orientation with respect to the camera coordinate system, its 3-D space orientation in inertial coordinates can easily be obtained. In the following discussion we shall refer
to the problem of estimating the target orientation with respect to the CCS as pose estimation. The pose of the target can be defined in terms of three angles:

1. The view of the target (θ, φ): the angles that the camera axes make with the aircraft coordinate system. The view (0, 0) corresponds to the front view of the aircraft. Here θ ∈ (−π, π], with the right hemisphere taking positive values, while φ ∈ [−π/2, π/2], with the top hemisphere taking positive values.
2. The rotation about the camera axis (ψ), such that a rotation of the image by ψ brings the given image into coincidence with the standard view at that viewing direction. The standard image for any view (θ, φ) is the one obtained with the ACS coincident with the GCS and the CCS pointing in the (θ, φ) direction.

The two bearing angles, namely the elevation (θ_t) and the azimuth (φ_t), can be obtained from the direction cosines of the camera optical axis (θ_gc, φ_gc), as obtained from the inertial navigation system, and the location of the centroid (x̄, ȳ) of the silhouette of the object. Since the 3-D space orientation with respect to the GCS can easily be obtained given the pose angles (θ, φ, ψ) and the camera coordinate angles (θ_gc, φ_gc), in the rest of the paper we shall concentrate on the pose estimation problem, i.e., estimating (θ, φ, ψ) from a monocular, binary image of the target.

3. Feature Extraction
In order to obtain the pose of the aircraft from a given image, the image should be compared with stored images of different targets at various orientations. Storing the whole image of each view of the object and subsequently comparing it with the given image is not practical due to memory and time limitations. Thus, some simple representative features should be obtained in order to characterize the images of the target at different orientations. As we have seen, the 'pose' of an object is defined by (θ, φ, ψ), where (θ, φ) represents the viewing direction of the object, and ψ represents the rotation about the optical axis. It may be noted that rotation about the optical axis produces merely a rotation in the image plane; there is no change in the shape and size of the image. Thus, given an image representation which is independent of the rotation in the image plane, the search can be restricted to a two-dimensional space for
the two angles representing the direction of view of the object. The rotation about the optical axis can subsequently be found [11]. Generally it is advantageous to have an image representation which is also invariant to the distance of the object from the camera and to its position in the field of view. Thus, we seek a set of features which is invariant with respect to scaling, translation and rotation in the image plane. Such a set of features is referred to as the feature vector corresponding to a particular view of the object.

Many different feature vectors based on Fourier descriptors [43] and moment invariants [5, 11, 20, 33, 39] have the desirable properties mentioned above. Fourier descriptors are functions of the boundary of the object and are consequently very sensitive to the inaccuracies accrued during the segmentation and edge detection process. Spatial moments, in contrast, are properties of the mass distribution in the image. Features defined in terms of them are thus more robust against sensor noise, especially for the thresholded images considered here [33]. Through experimental results on handwritten numerals and aircraft pictures, Belkasim et al. [5] have shown that normalized Zernike moment invariants work considerably better than Hu moments [20] in terms of their discrimination power. They are also shown to be marginally better than the principal axis moments in that respect. However, normalized Zernike moments are computationally more intensive. In this paper we have employed principal axis moment invariant based features to characterize different views of the object. Other invariants may also be used without affecting the solution modality.

The two-dimensional spatial moment of order (p + q) of an image function f(x, y) is defined as

$$M_{pq} = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} x^p y^q f(x, y)\, dx\, dy, \qquad p, q = 0, 1, 2, \ldots \tag{1}$$
For a binary image, f(x, y) takes the value 1 if (x, y) ∈ object and 0 otherwise.
Invariance to translation can be achieved by calculating moments about the center of mass of the object. Scale invariance can be obtained by normalizing the moments such that the 0th order moment is unity. The normalized central moments are defined as

$$\mu'_{pq} = \lambda^{2+p+q} \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} (x - \bar{x})^p (y - \bar{y})^q f(x, y)\, dx\, dy, \tag{2}$$

where $\bar{x} = M_{10}/M_{00}$, $\bar{y} = M_{01}/M_{00}$, and $\lambda = 1/\sqrt{M_{00}}$.
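For a discrete binary silhouette, the integrals in Eqs. (1) and (2) reduce to sums over the object pixels. The following is a minimal NumPy sketch of this computation; the function name and the array convention (1 on object pixels, 0 elsewhere) are our own choices.

```python
import numpy as np

def normalized_central_moment(img, p, q):
    """Normalized central moment mu'_pq of a binary image (Eq. (2))."""
    ys, xs = np.nonzero(img)             # coordinates of the object pixels
    m00 = float(xs.size)                 # M00: area of the silhouette
    xbar, ybar = xs.mean(), ys.mean()    # centroid (M10/M00, M01/M00)
    lam = 1.0 / np.sqrt(m00)             # scale normalizer lambda = 1/sqrt(M00)
    return lam ** (2 + p + q) * np.sum((xs - xbar) ** p * (ys - ybar) ** q)
```

By construction the 0th order moment normalizes to unity, so the resulting features are unchanged when the silhouette is scaled or translated.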
Rotational invariance can be obtained by using a suitable combination of the central moments such that they contain all the information of the original moment set except the angle of rotation. Hu [20] has defined one such set of seven algebraic moment invariants of order 3. The rotational invariants can also be obtained by rotating all normalized central moments, as defined in Eq. (2), by an angle ψ_m such that the resulting moment µ_11 = 0. Here ψ_m represents the angle that the original image axes make with the principal axes of the best-fit ellipse and is defined as (see [20])

$$\psi_m = \frac{1}{2} \tan^{-1}\!\left( \frac{2\mu'_{11}}{\mu'_{20} - \mu'_{02}} \right). \tag{3}$$

We may note that ψ_m, as obtained above, may be with respect to either the major or the minor principal axis. However, if we ensure that the resultant principal axis moment µ_20 > µ_02, then ψ_m represents the angle that the major principal axis of the best-fit ellipse makes with the image plane x-axis. We should also ensure µ_30 ≥ 0 so that the positive major axis points in the same direction in both the standard view and the given image. The set of spatial moments calculated about the major principal axis forms a set of scale, translation and rotation invariant features. With simple algebraic manipulations it follows that the normalized principal axis moments can be defined in terms of the central moments as [33]

$$\mu_{pq} = \sum_{r=0}^{p} \sum_{s=0}^{q} (-1)^{q-s} \binom{p}{r} \binom{q}{s} (\cos \psi_m)^{p-r+s} (\sin \psi_m)^{q+r-s}\, \mu'_{p+q-r-s,\, r+s}. \tag{4}$$
The invariant moments can thus be obtained from these central moments with very little computational effort.
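Continuing the sketch above, Eqs. (3) and (4) can be implemented directly. Here `mu` is assumed to be a dictionary of the normalized central moments µ′_pq computed with the previous function; the sign checks on µ_20 and µ_30 described above are omitted for brevity.

```python
from math import atan2, comb, cos, sin

def principal_axis_moment(mu, p, q):
    """Moment mu_pq about the major principal axis (Eqs. (3)-(4)).

    mu must contain all mu'_jk with j + k == p + q, plus the entries
    (1, 1), (2, 0) and (0, 2) needed for Eq. (3).
    """
    # Eq. (3): inclination of the best-fit ellipse's principal axis.
    psi_m = 0.5 * atan2(2.0 * mu[(1, 1)], mu[(2, 0)] - mu[(0, 2)])
    # Eq. (4): rotate the normalized central moments by psi_m.
    total = 0.0
    for r in range(p + 1):
        for s in range(q + 1):
            total += ((-1) ** (q - s) * comb(p, r) * comb(q, s)
                      * cos(psi_m) ** (p - r + s)
                      * sin(psi_m) ** (q + r - s)
                      * mu[(p + q - r - s, r + s)])
    return total
```

A quick sanity check is that the rotated µ_11 evaluates to zero, as required by the choice of ψ_m.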
4. Estimation of the View of the Aircraft
With a change in viewing direction, the shape of the object on the image plane changes. The feature vector is thus dependent on the viewing direction of the object. In this section we discuss the estimation of the viewing direction given the rotation, translation and scale invariant feature vector of the image. Various algorithms have been proposed in the literature to estimate the viewing direction and to recognize a 3-D object. They include normalized quadtree representation [7] and syntactic pattern recognition [44].
The most popular among them is probably the library view method [11, 42, 43]. In this approach, a set of feature vectors of dimension n is stored in a library of views, where each vector is representative of a particular view of the 3-D object. When an image is obtained, its corresponding feature vector is calculated. A suitable viewing angle is assigned to the image by a search algorithm based on a minimum distance or a k-nearest neighbor classifier. The search can be limited to a subset of the library views if a limit on the maximum possible change from the previously calculated orientation can be defined a priori [42]. The algorithm is easily extended to more than one object by storing the views of each object and making the comparison over the resultant (n + 1)-dimensional space [11].

The above method works reasonably well for the target identification problem. However, for the following reasons it is not suitable for accurately determining the target orientation in real-time applications:

• Since the distance of the image feature vector to every library view (belonging to a chosen subset) has to be calculated, the computation time can be undesirably large.
• Due to the lack of interpolation between the library views, for a reasonably accurate estimation of orientation one has to store the library views at small intervals. This implies a prohibitively large database requirement and an increased computation time.

In the library view method, the estimation of the viewing direction is treated as a classification problem, where each view of the object forms a distinct class. Thus, there is a minimum error depending upon the number of classes considered over the range of viewing directions. Wallace and Mitchell [42] have employed the linearity property of normalized Fourier descriptors to interpolate between the library views and thus define a continuum of library projections. This yields better estimates of the orientation. However, since the interpolation has to be performed for each view, the algorithm is quite slow.

In the remainder of this section, we take a different approach to this problem, wherein the direction of view of the aircraft is functionally related to the feature vector of the object image. Thus, the view of the aircraft can be calculated directly from this relationship once the feature vector for the image has been extracted. Two neural network topologies (NNT) are discussed to solve the resulting functional approximation problem relating the viewing direction to the feature vector of the image.
4.1. Viewing Direction as a Function of Feature Vector
Since each element of the feature vector represents a physical quantity (feature) of the target shape for any specific view of the aircraft, we can expect these features to vary smoothly with variation in the viewing angle of the target. Thus, each element of the feature vector $\vec{x} = (x_1, x_2, \ldots, x_n)$ can be defined as a piecewise smooth function of the two orientation angles (θ, φ):

$$x_i = g_i(\theta, \phi), \qquad i = 1, 2, \ldots, n. \tag{5}$$

The set of functions {g_i(θ, φ), i = 1, 2, . . . , n} constitutes an approximate feature space representation of the 3-D object. We prefer calling this representation approximate since the set of functions may not be complete with respect to specifying all attributes of the 3-D shape of the object. The estimation of orientation can now be defined as an inverse problem: given the object representation g_i(θ, φ) for all i, find the orientation angles (θ_0, φ_0) for a given feature vector $\vec{x}$ such that $x_i = g_i(\theta, \phi)|_{(\theta_0, \phi_0)}$. Note that g_i(θ, φ) may not be monotonic over the domain (θ ∈ [θ_1, θ_2] and φ ∈ [φ_1, φ_2]), and thus the inverse relationship may be multivalued. However, for sufficiently many features constituting the feature vector, we can expect a one-to-one relationship to exist between the feature vector and the orientation: while one feature may not be monotonic over some domain, another one could very well be, resolving the nonuniqueness of the estimate. The problem of estimating the viewing direction is thus reduced to finding a functional relationship F:

$$\vec{y} = F(\vec{x}), \qquad \vec{x} \in \Omega \subset \mathbb{R}^n, \ \vec{y} \in \Phi \subset \mathbb{R}^2. \tag{6}$$
Rather than explicitly calculating this functional relationship (which can be a daunting task), we attempt to train a suitable neural network to learn the relationship, taking advantage of the functional approximation capability of neural networks. In the rest of this section, two different neural network topologies (NNT) are presented in order to obtain this relationship.

4.2. NNT-I: Multi-layer Perceptron

A multi-layer perceptron (MLP) is a feed-forward net with one or more hidden layers of nodes (neurons, or single perceptrons) between the input and the output layers. Each node in any layer is fully connected to each node in the layer below it. A sigmoidal nonlinearity given by $f_s(u) = (1 + e^{-\beta u})^{-1}$ is most popularly used to introduce the nonlinearity at the output of each node. Here β is a scale factor that defines the steepness of the transition region of the nonlinearity. The MLP has the desired capability not only to implement, but also to learn, nonlinear transformations for functional approximation problems [21]. The network stores an approximate input-output relationship in terms of its weights. During training, the weights are adjusted such that the cumulative error in the output over the training set, defined as

$$J_p = \sum_{i=1}^{p} \left| \vec{y}_{d_i} - \vec{y}_{a_i} \right|^2, \tag{7}$$

is minimized. Here, p is the size of the training set and the subscripts d and a stand for the 'desired' and the 'actual' network outputs, respectively. For our orientation estimation problem, the n-dimensional feature vector constitutes the input pattern to the network. The output of the network gives an estimate of the viewing angle (θ, φ) of the target. The MLP solves the functional approximation problem by combining simple functional units (rounded step functions, for the sigmoidal nonlinearity) formed by the hidden layer nodes [21]. It has been shown that a two-layer MLP is capable of forming an arbitrarily close approximation to any continuous nonlinear mapping [10]; however, there is no objective manner in which to choose the number of nodes in each layer. For some complex mappings, the network size (number of nodes and weights) can be arbitrarily large. Moreover, since any error in the output affects all the network weights, a large error for some training pattern undermines the accuracy achieved for the rest of the training set.
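The following is a minimal NumPy sketch of such an MLP trained by backpropagation on the squared error of Eq. (7). The class layout and weight initialization are our own choices, and the momentum term and learning-rate schedule used in the experiments of Section 6 are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(u, beta=1.0):
    """Sigmoidal node nonlinearity f_s(u) = 1 / (1 + exp(-beta * u))."""
    return 1.0 / (1.0 + np.exp(-beta * u))

class MLP:
    """Fully connected feed-forward net with sigmoidal nodes."""

    def __init__(self, sizes):                        # e.g. MLP([9, 8, 5, 2])
        self.W = [rng.normal(0.0, 0.5, (m, n + 1))    # last column is the bias
                  for n, m in zip(sizes[:-1], sizes[1:])]

    def forward(self, x):
        """Return the activations of every layer, input included."""
        acts = [np.asarray(x, dtype=float)]
        for W in self.W:
            acts.append(sigmoid(W @ np.append(acts[-1], 1.0)))
        return acts

    def train_step(self, x, y_d, eta=0.1):
        """One backpropagation update reducing |y_d - y_a|^2 (Eq. (7))."""
        acts = self.forward(x)
        delta = (acts[-1] - y_d) * acts[-1] * (1.0 - acts[-1])
        for l in range(len(self.W) - 1, -1, -1):
            grad = np.outer(delta, np.append(acts[l], 1.0))
            if l > 0:                                 # error for the layer below
                delta = (self.W[l][:, :-1].T @ delta) * acts[l] * (1.0 - acts[l])
            self.W[l] -= eta * grad
```

For NNT-I, `MLP([9, 8, 5, 2])` reproduces the 9-input network with 8- and 5-node hidden layers and 2 outputs reported in Section 6.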
4.3. NNT-II: MLP with Kohonen Clustering
In light of the above observations, we can expect better results if the complete mapping from the input set (feature vectors) to the output set (viewing directions) is divided into mappings from suitable subsets of the inputs to the corresponding output subsets. For properly chosen subsets, we may expect the mapping between them to be simpler than the one that exists between the complete set of inputs and outputs. It should then be easier to train an MLP network on the resulting functional approximation problem for these simpler input-output relationships. Also, the error in the estimate for any
training pattern affects only the patterns belonging to its class. Thus, if $\Omega \subset \mathbb{R}^n$ is the complete input set, it is partitioned into subsets $\Omega_i$ such that

$$\Omega = \bigcup_{i=1}^{m} \Omega_i \qquad \text{and} \qquad \Omega_i \cap \Omega_j = \emptyset \quad \text{for } i \neq j, \tag{8}$$

where m is the number of classes formed. The subsets need to be disjoint, without which a particular input may fire up more than one MLP, introducing an ambiguity in the output. Moreover, each Ω_i should be connected. This partition of the view space can be viewed as obtaining an aspect graph of the object, such that each subset Ω_i represents a different general view of the object. As a consequence of the above partition of the feature space, the nonlinear transformation $F(\vec{x})$ gets simplified into m functionals

$$\vec{y} = F_i(\vec{x}), \qquad \vec{x} \in \Omega_i, \ \vec{y} \in \Phi_i, \qquad i = 1, \ldots, m, \tag{9}$$

where Φ_i ⊂ Φ is the output subset corresponding to Ω_i.

Having constructed the framework, we need a mechanism to define these clusters in a consistent manner. The Kohonen self-organizing network [25] provides one such simple method of clustering. The number of output nodes is kept equal to the number of required clusters. The output nodes are connected among themselves with lateral inhibition. The weights from the inputs to a given cluster adjust themselves such that all feature vectors with properties similar to the vector stored in these weights are assigned to the same class. The properties of the feature vector to be matched are the magnitude and direction of the vector in n-dimensional space. Given the number of clusters to be formed and an initial guess for the weight vectors, the learning algorithm for the Kohonen network automatically adjusts the elements of the weight vectors in such a manner as to form m connected, nonoverlapping clusters. The steps involved in training the Kohonen network may be found in [4]. The choice of the number of clusters is made by trial and error. The number of clusters should be quite small, failing which the computational overhead increases in the second stage (we need to train as many MLPs). The weight vectors for the network are initialized such that they are oriented randomly in the Euclidean space around the center. This ensures that each node has an equal likelihood of being chosen as the best match in the early stages of training, thus allowing all the output nodes to participate in the classification process. A neighborhood (N_e) and a learning rate (η) need to be chosen such that a stable classification is obtained. Typically, η is progressively reduced to close to zero from an initial level of 0.2 or so. Also, N_e is initialized to a large number (less than the number of clusters) and is gradually reduced to zero.
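A competitive-learning sketch of this clustering step follows, consistent with the schedule just described (η decaying from 0.2, and the degenerate neighborhood N_e = 0 that Section 6 uses with only four clusters). The function names and the spread of the random initialization are our own choices.

```python
import numpy as np

def kohonen_cluster(X, m, epochs=1500, eta0=0.2, seed=0):
    """Cluster the rows of X (N x n feature vectors) into m classes."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    # random weight vectors around the unit vector with equal direction cosines
    W = 1.0 / np.sqrt(n) + 0.05 * rng.standard_normal((m, n))
    for epoch in range(epochs):
        eta = eta0 * (1.0 - epoch / epochs)          # learning rate decays to 0
        for x in rng.permutation(X):                 # one feed through the data
            j = np.argmin(np.linalg.norm(W - x, axis=1))   # best-matching node
            W[j] += eta * (x - W[j])                       # move winner toward x
    return W

def assign(W, x):
    """Class of a feature vector: index of the nearest weight vector."""
    return int(np.argmin(np.linalg.norm(W - x, axis=1)))
```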
4.4. Architecture and Training for NNT-II
Once the classes have been defined by the Kohonen layer, the functional relationship from the jth input subset (Ω_j) to the corresponding output subset (Φ_j) is trained on a separate multi-layer perceptron for each class, with no interconnection across classes. A schematic diagram of the proposed network architecture is shown in Fig. 2. Since all the MLPs are mutually non-interacting, the error in one region affects only the weights of the class it belongs to, and hence does not undermine the results in any other class. Moreover, the training of the MLP corresponding to each class can be accomplished with fewer nodes because of the simpler relationship that exists between the respective input-output sets. Owing to the smaller number of nodes in the hidden layer, this topology is expected to provide better generalization for untrained data. The major steps of the algorithm to train the network for a given target type are summarized below (a code sketch follows the list):

(Step 0) Get the training data sets: The training data consist of the feature vectors at different viewing directions (θ, φ), sampled at a fairly regular interval over the range of views for the given target.
Figure 2. Schematic diagram of the proposed neural network topology.
(Step 1) Kohonen clustering: Divide the input space into the required number of subsets. The number of classes should be large enough that the complex functional relationship is effectively simplified; at the same time, each cluster should be of a reasonable size so that the MLPs can be properly trained.

(Step 2) Define the network architecture: Assign a suitable number of hidden layers and nodes to the multi-layer perceptron corresponding to each Kohonen cluster. Each MLP network is fully connected among its nodes, with no interconnections between nodes belonging to different clusters.

(Step 3) Train the network: The input feature vector is presented to the Kohonen layer. The Kohonen network output assigns this input to one of the classes, say j*. The input-output set is then fed only to the multi-layer perceptron corresponding to class j*. The weights for this network are updated in exactly the same way as for a simple multi-layer perceptron [29]. The weights of no other cluster are updated.

(Step 4) Check: After enough training passes through the complete training set, a test for learning and generalization is made. If found unsatisfactory, the procedure can be repeated from Step 2. If the results are still not satisfactory, the Kohonen clustering may have to be repeated with an increased number of output classes.

The neural network, after proper training, defines the transformation from an image feature vector to the viewing direction of the 3-D object. When the image at any arbitrary viewing direction is obtained, the feature vector is calculated and fed to the feed-forward neural network trained as above. The output of the net directly gives an approximate viewing direction.
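A hypothetical glue routine for Steps 1-3, reusing the `MLP` and `kohonen_cluster`/`assign` sketches from earlier; the uniform hidden-layer sizes are a placeholder, since Section 6 tunes a different architecture for each class.

```python
import numpy as np

def train_nnt2(X, Y, m=4, hidden=(8, 5), sweeps=9000, eta=0.1, seed=0):
    """Step 1: cluster the inputs; Steps 2-3: train one MLP per cluster.

    X: (N, n) normalized feature vectors; Y: (N, 2) normalized views.
    """
    rng = np.random.default_rng(seed)
    W_k = kohonen_cluster(X, m)                        # Step 1
    labels = np.array([assign(W_k, x) for x in X])
    nets = [MLP([X.shape[1], *hidden, Y.shape[1]])     # Step 2
            for _ in range(m)]
    for _ in range(sweeps):                            # Step 3: only the winning
        i = rng.integers(len(X))                       # cluster's MLP is updated
        nets[labels[i]].train_step(X[i], Y[i], eta)
    return W_k, nets

def estimate_view(W_k, nets, x):
    """Route a feature vector to the MLP of its Kohonen class."""
    return nets[assign(W_k, x)].forward(x)[-1]         # estimated (theta, phi)
```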
5. Estimation of Rotation about Optical Axis
Once the viewing direction has been calculated, the rotation in the image plane can be found as the angle that the major principal axis of the image makes with that of the standard view in that viewing direction. Thus,

$$\psi = \psi_m|_{\text{image}} - \psi_m|_{\text{standard}}, \tag{10}$$

where ψ_m is the inclination of the major axis of the best-fit ellipse (principal axis) with respect to the z_c-axis of the CCS.
Figure 3. Schematic diagram for the orientation estimation scheme.
In the library view method, the angle ψ_m|standard is stored along with the feature vector for each library view. Since ψ_m|image can be calculated for the given image, the angle ψ can be determined directly from Eq. (10). In our case, however, the viewing direction is continuous valued. Thus, for every viewing direction (θ, φ) there is a unique ψ_m|standard which must be stored. If we were to store this angle in a lookup table, the retrieval of the data could be done very quickly; however, the amount of data to be stored may be quite large, depending on the required accuracy. In order to save memory, we define the standard orientation ψ_m|standard as a function of the viewing direction. A two-input (θ, φ), one-output (ψ_m|standard) multi-layer perceptron is trained to store this relationship for each type of target. For every estimate of the viewing direction (θ, φ), this MLP network gives the corresponding estimate of ψ_m|standard. Figure 3 shows the schematic diagram of the complete system for the estimation of the orientation of the target in 3-D space; a sketch of the corresponding inference flow is given below.
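Putting the pieces together, the complete inference path of Fig. 3 might look as follows. This is only a sketch: `feature_vector` and `principal_axis_angle` stand for hypothetical wrappers around the moment computations of Section 3, `psi_net` for the two-input MLP just described, and the normalization and un-normalization of the network inputs and outputs are elided.

```python
import numpy as np

def estimate_pose(img, W_k, view_nets, psi_net):
    """Silhouette -> pose angles (theta, phi, psi), following Fig. 3."""
    x = feature_vector(img)                  # nine principal axis moment
                                             # invariants, normalized (Section 6)
    theta, phi = estimate_view(W_k, view_nets, x)              # viewing direction
    psi_std = psi_net.forward(np.array([theta, phi]))[-1][0]   # psi_m|standard
    psi_img = principal_axis_angle(img)      # Eq. (3) applied to the image
    return theta, phi, psi_img - psi_std     # Eq. (10)
```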
6. Simulation Experiments

6.1. Results
The training and test images used in the following experiments have been generated from a 3-D geometric model of the X-29 demonstration aircraft. We now list the design steps involved in the experiment.

• Generating Training and Test Data: For a binary image as considered here, the views at (θ, φ) and (θ − π, −φ) differ from each other merely by a reflection. Moreover, since the aircraft is symmetric about its x_a–z_a plane, the views at (θ, φ) and (−θ, φ) are also the same. Thus, the range of significant views is limited to the quarter sphere 0 ≤ θ ≤ π, 0 ≤ φ ≤ π/2. However, for the sake of convenience in the simulation experiments, we limit the range of views to
Figure 4. Some of the typical X-29 aircraft views.
the quarter hemisphere given by 0 ≤ θ ≤ π/2, 0 ≤ φ ≤ π/2 in this study. Figure 4 shows some typical views of the aircraft. The complete training set consists of 236 aircraft views, sampled uniformly at different viewing directions covering the quarter hemisphere defined above. Binary images are generated on a 640 × 480 grid for each of these views. Similarly, a set of 40 test images at arbitrary angles over the quarter hemisphere is obtained to verify the generalization achieved by the neural networks.

• Calculate Moment Invariants: From each image, a total of eleven principal axis moment invariants (up to fourth order moments) are calculated. Also, the values of the angle ψ_m|standard for these images are stored. Higher order moments were not used because they were found to be very sensitive to noise and changes in scale. To validate our assumption of smooth variation of the features with the change in aircraft views, and to get a visual feel for the correlation that exists between various components of the feature vector, each element of the feature vector is interpolated over a grid of viewing angles, given its values over the training set. The gradient projection interpolation algorithm [17] is used for this purpose. Figure 5 shows the plots obtained after interpolation for some of these principal axis moments. As we can see, the features vary smoothly with the viewing direction as expected, notwithstanding the fact that the interpolation scheme yields a C² function.

• Normalize the Training Patterns: In order to keep the size of the network small, a set of the nine most significant principal axis moments is experimentally chosen as the feature vector. The moments µ_40 and µ_03 are dropped since the plots of these features show that they do not contribute much additional information that is not already included in the other nine features. Experimentation with and without µ_40 and µ_03 yielded almost indistinguishable results. All the training patterns (236 input-output sets) are now normalized with suitable mapping functions such that the input features take values from 0 to 1, while the output is normalized between 0.3 and 0.7. The normalization of the input in this manner ensures that all the features are weighted equally. If one has any a priori knowledge about the relative merits of individual components of the feature vector in evaluating the viewing direction, the weights on the inputs could be changed suitably. The normalization of the output vector is needed to keep the network from going into saturation (the sigmoidal nonlinearity at the output nodes saturates around '0' and '1'). A sketch of this normalization follows.
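The affine min-max mapping over the training set used below is our assumption; the paper specifies only the target ranges.

```python
import numpy as np

def normalize_patterns(X, Y):
    """Map input features to [0, 1] and output angles to [0.3, 0.7]."""
    Xn = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
    # keep targets away from the flat (saturated) tails of the sigmoid
    Yn = 0.3 + 0.4 * (Y - Y.min(axis=0)) / (Y.max(axis=0) - Y.min(axis=0))
    return Xn, Yn
```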
• Estimation of Viewing Direction Using a Single MLP: An MLP with two hidden layers, having 8 nodes in the first hidden layer and 5 in the second, is trained using the complete set of 236 feature vectors. The backpropagation training algorithm as outlined in [29], along with a momentum term, has been followed for the training. Training is done for 9000 feeds through the feature set in random order, with the learning rate and the momentum term progressively being reduced. Figure 6 shows the error map over the training set at the end of the training. The error in each viewing direction (θ, φ) is represented by an error vector, where '•' corresponds to the true viewing direction. The magnitude of the error is thus represented by the length of the error vector, while its θ or φ component can be obtained from the direction of the vector. The errors are relatively high and lie mostly in the range of 2°–10°.
Figure 5. Mesh plots of the features over the range of viewing directions.
Figure 6. Errors in estimation for the training pattern using MLP. Here ‘•’ corresponds to the true value of viewing direction while the error in estimation is given by the length of the error vector drawn from •.
• Estimation of Viewing Direction Using MLPs with Kohonen Clustering: Instead of training a single MLP to estimate the view of the aircraft, we employ Kohonen clustering to first divide the input-output mapping into many simpler mappings. The set of normalized feature vectors is clustered into four classes following the steps outlined in Section 4.3. The weight vectors are initialized so that they represent random vectors in Euclidean space around a vector of unit magnitude with all direction cosines equal to 1/√n, where n = 9. Moreover, since there are only four clusters to be made, the neighborhood N_e is taken as 0. The learning rate is progressively reduced from 0.2 to 0. Figure 7 shows the classes as clustered after 1500 feeds through the data. The MLP architecture for each of the above four classes is decided after experimentation with different nodal arrangements. The training of the networks is accomplished following the training algorithm outlined in Section 4.4. Figure 8 shows the error map over the training set with the MLP architectures as indicated in the figure. An MLP architecture with n inputs and m outputs is represented as n-p-q-···-m, where p, q, . . . are the numbers of nodes in the hidden layers. The mode of the distribution of the errors was found to be about 2°.

Figure 7. Classification of the training patterns achieved using Kohonen clustering. Here the classes 1 to 4 are given by '◦', '∗', '+' and '×', respectively.

Figure 8. Error map for the training patterns using MLPs with a Kohonen layer. MLP architectures used are: class1 → 9-7-2; class2 → 9-5-2; class3 → 9-8-2; class4 → 9-8-5-2.

To study the generalization and interpolation capability of the networks, orientation estimates for the test images are obtained with both neural network topologies. Figure 9 shows the corresponding error maps: •—× shows the error in estimation with NNT-I, while •— shows the corresponding error with NNT-II. The errors in the first case were found to lie scattered in the range 1°–10°, whereas they were mostly below 4° in the second case.

Figure 9. Errors in the estimate for the test data (previously unseen) using both neural network topologies. Here •—× corresponds to the error in the single MLP scheme, and •— corresponds to the same for the MLPs with Kohonen clustering.

The performance of the neural network architecture NNT-II was also tested with noisy input images and with different scalings. Two different sequences of target motion were simulated; Figure 10 shows the corresponding trajectories as projected on the φ–θ plane. The corresponding 3-D poses for the entire motion sequence were recovered for the object from its silhouettes.
Figure 10. Aircraft trajectory of two simulated paths.
Table 1. Estimation error under noise and scale changes.

                Trajectory 1              Trajectory 2
                Error θ°     Error φ°     Error θ°     Error φ°
Original        3.3927       2.2856       2.0398       1.5859
Noise (5%)      3.5895       2.8635       3.5833       2.1420
Noise (10%)     4.3770       3.1906       3.6534       2.3342
Noise (20%)     4.9940       3.7749       3.3182       3.4542
Noise (30%)     5.3565       4.6607       4.7859       4.0804
Scale (0.5)     2.9818       2.7202       3.4615       2.6260
Scale (0.33)    4.6822       4.2728       2.5910       3.9753
Scale (0.25)    5.1877       4.2322       4.1201       5.2775
In order to simulate noise in the images, the object pixels (assumed dark) lying on the boundary of the silhouette are switched to background intensity with a specified probability. The experiment was performed with varying amounts of noise perturbation, namely 5, 10, 20 and 30 percent. Similarly, the experiments were also repeated for both motion data sequences with varying scale factors. The scale is changed by subsampling the image by different factors, such as 2, 3 and 4. The error variances in θ and φ for all these cases are shown in Table 1.

• Estimation of Rotation About the Optical Axis: It was experimentally found that the variation in ψ_m|standard with the viewing direction is fairly smooth (except in a small region around θ = 0°, φ = 30°). Thus, a two-input, one-output multi-layer perceptron can be trained to capture this relation. However, the results with a single MLP were not satisfactory [2]. Four separate MLPs were therefore trained, one for each of the classes obtained earlier with Kohonen clustering. The error in the estimation of ψ_m|standard obtained with this approach is shown in Fig. 11. In the figure, the error in the estimation of ψ_m|standard is given by the length of the segment. A line segment pointing at 45° indicates a positive error, while one at −135° implies a negative estimation error. Also, segments ending with an arrow sign indicate an error above 10°.
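Returning to the noise experiments above, the following is a sketch of the silhouette perturbation; the 4-neighbour boundary test and the 0/1 integer image convention are our assumptions, since the paper does not spell out how boundary pixels are identified.

```python
import numpy as np

def perturb_silhouette(img, prob, seed=0):
    """Flip object boundary pixels to background with probability `prob`.

    img: 2-D integer array, 1 on the object, 0 on the background.
    """
    rng = np.random.default_rng(seed)
    pad = np.pad(img, 1)
    # an object pixel is on the boundary if any 4-neighbour is background
    interior = pad[:-2, 1:-1] & pad[2:, 1:-1] & pad[1:-1, :-2] & pad[1:-1, 2:]
    boundary = (img == 1) & (interior == 0)
    flip = boundary & (rng.random(img.shape) < prob)
    return np.where(flip, 0, img)      # switched to background intensity
```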
6.2. Observations and Discussions
Our observations about the performance of the proposed scheme to estimate the 3-D orientation of a target can be summarized as follows:
Figure 11. Errors in the estimation of ψ_m|standard over the range of viewing directions. MLP architectures used in this study are: class1 → 2-5-2-1; class2 → 2-4-2-1; class3 → 2-3-1; class4 → 2-3-1.
1. The error in the estimation of the viewing angle is very small in most regions in the case of network topology-II (i.e., with Kohonen clustering). Also, the errors with this topology are much smaller compared to those obtained with a single network. This clearly shows the advantage of using Kohonen clustering to divide the functional approximation problem into many simpler ones.

2. The error in the estimates of θ when φ is close to 90° is found to be quite large. But this error is expected since, in this region, the variation in the feature vector with θ is very small (see Fig. 5). The reason for this small variation in the feature vector over this region is easily seen: the longitude lines (θ = constant) near the pole (φ = 90°) are very close together, so a change in θ does not result in an appreciable change in the projection of the aircraft. Hence the feature vectors are quite similar. However, the error in estimating θ is tolerable in this region, since it is offset by the resulting estimate of ψ. For example, at φ = 90° (top view of the aircraft) the feature vectors for all θ are the same. Thus the aircraft pose given by (θ, φ) = (0°, 90°) and ψ = 0° is the same as that corresponding to (θ, φ) = (45°, 90°) and ψ = −45°. Similarly, a relatively large error occurs in the estimation of the viewing angle in the region given by 75° ≤ θ ≤ 90°,
0° ≤ φ ≤ 15°. This error is due to the fact that the feature vector is not discriminatory enough with respect to changes in viewing angle over this domain (see Fig. 5). This may be attributed to the geometry of the aircraft in this region (side view). Inclusion of higher order moments may not alleviate this problem, as they are more sensitive to noise.

3. From Fig. 9, we can see that the neural network topology with Kohonen clustering yields good estimates of the viewing direction even for the untrained views. This shows that the network has indeed learnt the input-output functional relationship rather than just memorizing the training patterns. We suggest the use of this architecture for the estimation of the view of the aircraft.

4. The error in the estimation of ψ_m|standard is reasonably small in most cases. However, due to a sudden change in ψ_m|standard around θ = 0° and φ = 30°, the error in the vicinity of this region is undesirably large (as much as 25°). The reason for this sudden change is that the best-fit ellipse in this region is almost circular, so the principal axes no longer bear any meaning. At a particular view of the aircraft (this will differ for different target types), the major principal axis changes from being perpendicular to the fuselage to being nearly aligned with the fuselage axis. When this transition takes place it becomes difficult to train the network.

5. Experiments were also performed on noisy silhouettes of the object, and the results were found to be of good accuracy. The increase in the error variance of the pose angles is quite marginal with increasing perturbation of the silhouette. Similarly, the effect of scaling (i.e., reduction in pixel resolution) was also found to be quite gradual during the sensitivity analysis of the proposed technique. Hence, the method can give quite accurate results even under varied imaging conditions.
7. Conclusions
A neural network approach to solving the problem of estimating the 3-D orientation of an aircraft, given its monocular, scaled orthographic, binary image from any arbitrary viewing angle, has been presented in this paper. The estimation of the orientation is modeled as a functional approximation problem. Two different neural network topologies to capture this complex nonlinear relationship have been discussed. The comparison of the results obtained with these neural
network topologies suggests that a substantial improvement in the functional approximation can be achieved by clustering the input space into a suitable number of subsets and then training separate multi-layer perceptrons for each of these clusters.

It may be interesting to compare the proposed technique with the library view method with interpolation capabilities. The Kohonen clustering process of splitting the input space is equivalent to determining the corresponding closest library view, and the MLP output corresponds to the interpolation scheme. However, the number of clusters being much smaller than the number of library views, the proposed technique requires less storage but places more emphasis on the accuracy of the interpolation performed by the MLP unit.

The proposed orientation estimation system can easily be implemented for real-time applications in target tracking. Once the network is trained, it takes only a fraction of a second on a workstation (we used an alpha400 machine) to obtain the pose information, given the silhouette of the aircraft. We plan to use the pose information thus obtained in conjunction with the range and bearing information to develop a target-tracking system of improved accuracy. It may also constitute a building block in an object-identification algorithm. Moreover, since this technique does not need any time-consuming edge-detection routines, it can be employed in various industrial automation and robotic applications where the orientation estimation and/or identification of an isolated object needs to be done, such as in a vision-guided pick-and-place robot.

A closer look at the error map obtained with NNT-II reveals that a relatively large estimation error is obtained at the boundaries of the clusters. This suggests a possible use of overlapping clusters at the classification level. Also, we may note that the clustering of the input feature space results in a simpler input-output relationship only if each input class and its corresponding output class form connected sets. For the Kohonen clustering algorithm, the input clusters are always connected; however, the same cannot be said of the resultant clustering in the output space. For the present experiments, the output clusters are indeed connected (Fig. 7); however, more work needs to be done in this regard.

In the present paper we have considered only static views of the target for the estimation of the orientation. The dynamics of the maneuvering target have not been considered in this study. We are currently looking into ways of incorporating the evolution of the pose angles with time while estimating the orientation, for improved performance.
Acknowledgments

The authors wish to gratefully acknowledge the suggestions from the referees that have greatly improved the presentation of this paper.

References

1. M.A. Abidi and R.C. Gonzalez, "The use of multisensor data for robotic applications," IEEE Trans. on Robotics and Automation, Vol. 6, pp. 159–177, 1990.
2. S. Agarwal, "Imaging sensor based target tracking, guidance and control," M.Tech. Dissertation, Department of Electrical Engineering, IIT Bombay, 1993.
3. D. Andrisani, F.P. Kuhl, and D. Gleason, "A nonlinear tracker using attitude measurements," IEEE Trans. Aero. Electronic Systems, Vol. AES-22, pp. 533–538, 1986.
4. R. Beale and T. Jackson, Neural Computing: An Introduction, Adam Hilger: Bristol, 1990.
5. S.O. Belkasim, M. Shridhar, and M. Ahmadi, "Pattern recognition with moment invariants: A comparative study and new results," Pattern Recognition, Vol. 24, pp. 1117–1138, 1991.
6. Z. Chan and S.Y. Ho, "Computer vision for robust 3-D aircraft recognition with fast library search," Pattern Recognition, Vol. 24, pp. 375–390, 1991.
7. C.H. Chien and J.K. Aggarwal, "A normalized quadtree representation," Comput. Vision, Graphics and Image Process., Vol. 26, pp. 331–346, 1984.
8. R.T. Chin and C.R. Dyer, "Model based recognition in robot vision," ACM Computing Surveys, Vol. 18, No. 1, pp. 68–108, 1986.
9. J.R. Cloutier, J.H. Evers, and J.J. Feeley, "Assessment of air-to-air missile guidance and control technology," IEEE Control Systems Magazine, Vol. 9, pp. 27–34, 1989.
10. G. Cybenko, "Approximation by superposition of a sigmoidal function," Mathematics of Control, Signals, and Systems, Vol. 2, pp. 303–314, 1989.
11. S.A. Dudani, K.J. Breeding, and R.B. McGhee, "Aircraft identification by moment invariants," IEEE Trans. on Computers, Vol. C-26, pp. 39–45, 1977.
12. D.W. Eggert, K.W. Bowyer, C.R. Dyer, H.I. Christensen, and D.B. Goldgof, "The scale space aspect graph," IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 15, No. 11, pp. 1114–1130, 1993.
13. O. Faugeras, A. Pentland, J.L. Mundy, R. Jain, N. Ahuja, C. Dyer, K. Ikeuchi, and K. Bowyer, "Why aspect graphs are not (yet) practical for computer vision," CVGIP: Image Understanding, Vol. 55, No. 2, pp. 212–218, 1992.
14. D. Forsyth, J.L. Mundy, A. Zisserman, C. Coelho, A. Heller, and C. Rothwell, "Invariant descriptors for 3-D object recognition and pose," IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 13, No. 10, pp. 971–991, 1991.
15. D. Forsyth, J.L. Mundy, A. Zisserman, and C. Rothwell, "Recognizing rotationally symmetric surfaces from their outlines," in Proc. European Conf. Computer Vision, Santa Margherita Ligure, Italy, 1992, pp. 639–647.
16. Z. Gigus, J. Canny, and R. Seidel, "Efficiently computing and representing aspect graphs of polyhedral objects," IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 13, No. 6, pp. 542–551, 1991.
17. W.E.L. Grimson, From Images to Surfaces: A Computational Study of the Human Early Visual System, MIT Press: Cambridge, 1981.
18. W.E.L. Grimson, D.P. Huttenlocher, and T.D. Alter, "Recognizing 3D objects from 2D images: An error analysis," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Champaign, IL, 1992, pp. 316–321.
19. B.K.P. Horn, Robot Vision, MIT Press: Cambridge, 1986.
20. M.K. Hu, "Visual pattern recognition by moment invariants," IRE Trans. Info. Theory, Vol. IT-8, pp. 179–187, 1962.
21. D.R. Hush and B.G. Horne, "Progress in supervised neural networks, what's new since Lippmann?," IEEE Signal Processing Magazine, Vol. 10, pp. 8–37, 1993.
22. H.G. Hutchins and D.D. Sworder, "Image fusion algorithms for tracking maneuvering targets," AIAA Journal of Guid. and Control, Vol. 15, pp. 175–184, 1992.
23. D. Huttenlocher and S. Ullman, "Recognizing solid objects by alignment with an image," Int. J. Comp. Vis., Vol. 5, No. 2, pp. 195–212, 1990.
24. J.D. Kendrick, P.S. Maybeck, and J.G. Reid, "Estimation of aircraft target motion using orientation measurements," IEEE Trans. Aero. Electro. Systems, Vol. AES-17, pp. 254–259, 1981.
25. T. Kohonen, Self-Organization and Associative Memory, Springer-Verlag: Berlin, 1984.
26. R. Krishnan, H.J. Sommer III, and P.D. Spidaliere, "Monocular pose of a rigid body using point landmarks," CVGIP: Image Understanding, Vol. 55, No. 3, pp. 307–316, 1992.
27. Y. Lamdan, J.T. Schwartz, and H.J. Wolfson, "Affine invariant model-based object recognition," IEEE Trans. Robotics and Automation, Vol. 6, No. 5, pp. 578–589, 1990.
28. C.C. Lefas, "Algorithm for improved, heading assisted, maneuver tracking," IEEE Trans. Aero. and Electronic Systems, Vol. AES-21, pp. 351–359, 1985.
29. R.P. Lippmann, "An introduction to computing with neural nets," IEEE Acoustics, Speech and Signal Process. Magazine, Vol. 4, pp. 4–21, 1987.
30. D.G. Lowe, Perceptual Organization and Visual Recognition, Kluwer Academic Publishers: Hingham, MA, 1985.
31. Y. Moses and S. Ullman, "Limitations of nonmodel-based recognition schemes," in Proc. European Conf. Computer Vision, Santa Margherita Ligure, 1992, pp. 820–828.
32. N.P. Papanikolopoulos, P.K. Khosla, and T. Kanade, "Visual tracking of a moving target by a camera mounted on a robot: A combination of control and vision," IEEE Trans. Robotics and Automation, Vol. 9, pp. 14–36, 1993.
33. A.P. Reeves, R.J. Prokop, S.E. Andrews, and F.P. Kuhl, "Three-dimensional shape analysis using moments and Fourier descriptors," IEEE Trans. on Pattern Anal. Mach. Intell., Vol. PAMI-10, pp. 937–943, 1988.
34. C. Rothwell, A. Zisserman, J. Mundy, and D.A. Forsyth, "Efficient model library access by projectively invariant indexing functions," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Champaign, IL, 1992, pp. 109–114.
35. W.B. Seales and C.R. Dyer, "Viewpoint from occluding contour," CVGIP: Image Understanding, Vol. 55, No. 2, pp. 198–211, 1992.
36. D.D. Sworder and R.G. Hutchins, "Maneuver estimation using measurements of orientation," IEEE Trans. on Aero. and Electronic Systems, Vol. AES-26, pp. 625–638, 1990.
37. D.D. Sworder, R.G. Hutchins, and M. Kent, "Utility of imaging sensors in tracking systems," Automatica, Vol. 29, pp. 445–450, 1993.
38. C.H. Teh and R.T. Chin, "On image analysis by the method of moments," IEEE Trans. Pattern Anal. Mach. Intell., Vol. PAMI-10, pp. 496–513, 1988.
39. M. Teague, "Image analysis via the general theory of moments," J. Optical Society of America, Vol. 70, pp. 920–930, 1980.
40. D.W. Thompson and J.L. Mundy, "Three-dimensional model matching from an unconstrained viewpoint," in Proc. IEEE Int. Conf. on Robotics and Automation, Raleigh, NC, 1987, pp. 208–220.
41. S. Ullman and R. Basri, "Recognition by linear combination of models," IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 13, No. 10, pp. 992–1006, 1991.
42. T.P. Wallace and O.R. Mitchell, "Analysis of three dimensional movement using Fourier descriptors," IEEE Trans. Pattern Anal. Mach. Intell., Vol. PAMI-2, pp. 583–588, 1980.
43. T.P. Wallace and P.A. Wintz, "An efficient three-dimensional aircraft recognition algorithm using normalized Fourier descriptors," Comput. Graphics and Image Process., Vol. 13, pp. 99–126, 1980.
44. K.C. You and K.S. Fu, "Distorted shape recognition using attributed grammars and error correcting technique," Comput. Graphics and Image Process., Vol. 13, pp. 1–16, 1980.
Sanjeev Agarwal received his Integrated M.Tech. degree in electrical engineering from the Indian Institute of Technology, Bombay in 1993. He is currently pursuing his Ph.D. degree at the University of Missouri, Rolla. His research interests include computer vision, invariant theory, neural networks, artificial intelligence and control theory.
Subhasis Chaudhuri was born in Bahutali, India. He received his B.Tech. degree in electronics and electrical communication engineering from the Indian Institute of Technology, Kharagpur in 1985. He received the M.S. and the Ph.D. degrees, both in electrical engineering, from the University of Calgary, Canada and the University of California, San Diego, respectively. He joined IIT, Bombay in 1990 and is currently serving as an associate professor. He has also served as a visiting professor at the University of Erlangen-Nuremberg, Germany during the summer of 1996. He is a fellow of the Alexander von Humboldt Foundation. His research interests include image processing and computer vision, pattern recognition and biomedical signal processing.