International Journal of Computer Vision 30(3), 191–218 (1998)
© 1998 Kluwer Academic Publishers. Manufactured in The Netherlands.
Three-Dimensional Human Body Model Acquisition from Multiple Views

IOANNIS A. KAKADIARIS∗ AND DIMITRI METAXAS
Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104-6389
[email protected] [email protected]
Received June 20, 1995; Revised June 4, 1998; Accepted June 30, 1998

Abstract. We present a novel approach to three-dimensional human body model acquisition from three mutually orthogonal views. Our technique is based on the spatiotemporal analysis of the deforming apparent contour of a human moving according to a protocol of movements. For generality and robustness, our technique does not use a prior model of the human body, and a prior body part segmentation is not assumed. Therefore, our technique applies to humans of any anthropometric dimension. To parameterize and segment over time a deforming apparent contour, we introduce a new shape representation technique based on primitive composition. The composed deformable model allows us to represent large local deformations and their evolution in a compact and intuitive way. In addition, this representation allows us to hypothesize an underlying part structure and test this hypothesis against the relative motion (due to forces exerted from the image data) of the defining primitives of the composed model. Furthermore, we develop a Human Body Part Decomposition Algorithm (HBPDA) that recovers all the body parts of a subject by monitoring the changes over time to the shape of the deforming silhouette. In addition, we modularize the process of simultaneous two-dimensional part determination and shape estimation by employing the Supervisory Control Theory of Discrete Event Systems. Finally, we present a novel algorithm which selectively integrates the (segmented by the HBPDA) apparent contours from three mutually orthogonal viewpoints to obtain a three-dimensional model of the subject's body parts. The effectiveness of the approach is demonstrated through a series of experiments where a subject performs a set of movements according to a protocol that reveals the structure of the human body.

Keywords: human body model, three-dimensional model acquisition, motion-based part segmentation, integration of multiple views, articulated objects, physics-based modeling, deformable models

1. Introduction
Computer vision has begun to play an increasingly important role in applications such as virtual reality, teleconferencing, anthropometry, human factors design, ergonomics, and performance measurement of both athletes as well as patients with psycho-motor disabilities. All of these areas require knowledge of the parts of a subject's body, their joints and their relative motion.

∗ Present address: Department of Computer Science, University of Houston, MS CSC 3475, Houston, TX 77204-3475; Email: [email protected]; URL: http://www.cs.uh.edu/~ioannisk.
In some tracking applications, such as determining if a person is moving towards or away from the camera, information about the movement of the centroid of the subject’s silhouette is adequate. However, in applications whose goals are the understanding of observations from interacting objects and reasoning about scene dynamics (Mann et al., 1996), information about the movement of the parts of the human body is essential. However, at present, no technique exists that segments the apparent contour (in a monocular image sequence) of a moving human and estimates the instant rotation center of the body parts.1 In addition, although laser-based methods have been developed to
acquire a three-dimensional model of the shape of a subject’s body, no technique exists that recovers a model in which the articulations are explicit. In particular, to capture the shape and texture of a subject’s body, Cyberware® has presented the Cyberware WB4 Whole Body Scanner. This scanner produces a high resolution mesh model of a human’s body, and has a very high acquisition speed. However, the model delivered is not articulated (therefore, it cannot be used directly for tracking of the subject’s body parts in 3D), and the cost of the equipment is very high ($410,000). To overcome this problem, most researchers have either used a standardized human model (Azuola et al., 1994), based on a given statistical population (e.g., based on the Army General Forces (NASA, 1978)), or they have manually measured the shape parameters of the body parts that are to be tracked (Goncalves et al., 1995; Rehg and Kanade, 1995). In addition, most of the existing approaches that employ a human body model (Akita, 1984; Hogg, 1983; Leung and Yang, 1987a, 1987b, 1995; O’Rourke and Badler, 1980; Rohr, 1994; Rehg and Kanade, 1994) use nondeformable models that can only approximate the human body and cannot adapt to different body sizes. Since tracking is sensitive to the shape parameters used, considerable amount of time is spent to tune these parameters (Goncalves et al., 1995). Thus, since there is no such thing as an “average human” (arithmetic mean) it is necessary to develop techniques for the automatic acquisition and segmentation of human body parts based on computer vision techniques. In this paper, we present a methodology for the acquisition of the three-dimensional shape and of the articulations of a human subject. In particular, the technique allows us: • to detect the body parts of a moving human from a monocular image sequence, to determine the twodimensional shape of a subject’s body parts and to estimate the projection of their instant rotation center. • to obtain a three-dimensional shape model of a subject’s body parts using information from multiple monocular image sequences. These threedimensional shape models of the body parts can be used in several applications including tracking human motion in 3D (Kakadiaris and Metaxas, 1996). Our approach has the following advantages: (1) allows the reliable shape estimation of the body parts, (2) detects their multiple joints, (3) estimates the projection of the instant rotation center of the joints be-
tween the parts, (5) integrates the processes of part segmentation and fitting, (6) applies to humans of any anthropometric dimension by making no assumptions of a prior model or part segmentation. This paper makes two contributions. The first contribution is the development of a novel Human Body Part Decomposition Algorithm (HBPDA) that automatically segments the apparent body contour of a moving human into its constituent parts. Initially, a single deformable model is used to fit the image data. As the human moves, the deformable model of her apparent contour deforms to fit the changing over time (due to the motion of the parts) image contours. Then, four mutually exclusive criteria are evaluated to test for the appearance of new parts. These criteria relate to whether significant protrusions are observed within the model of the apparent contour. Depending on which of these criteria is satisfied, we apply the corresponding algorithm that replaces a deformable model with a composed model or two new models. By applying the HBPDA iteratively over the subsequent frames, all the moving body parts are identified. Thus, we formally automate the task of deciding how many deformable models should be used to fit the apparent contours of a human whose body parts are in motion. In order to recover all the major parts of the human body and due to the absence of a prior model, the person under observation2 is requested to perform a set of movements according to a protocol3 that incrementally reveals the structure of the human body. In addition, we present a new approach to modeling and controlling the processes of the HBPDA. While multiple deformable models are employed to fit the data from a moving human, the physics-based fitting process for each deformable model is expressed as a finite state machine (FSM). The states of the FSM correspond to the differential equations used for model fitting and the transitions between states are caused by events representing quantitative changes in the evolution of the model parameters. The number of the models that are fitted to the data at each frame of the image sequence is changing over time and therefore there is a need for controlling the interaction between segmentation and fitting. Using the Supervisory Control Theory of Discrete Event Systems (Ramadge and Wonham, 1989) allows us to: (1) encapsulate both the continuous aspects of fitting and the discrete aspects of part segmentation, (2) decompose the overall design into components modeled by the simplest devices (FSMs), and (3) investigate the behavior of the participating processes which operate concurrently.
The second contribution of this paper is the estimation of the three-dimensional shape of a subject's body parts. To accomplish this task, information from images taken from three cameras placed orthogonally is integrated (Fig. 11). In addition, the subject is requested to perform a set of movements according to a generic, subject invariant protocol that allows the integration of information from multiple views. First, the image data from each active view are fitted using two-dimensional deformable models. During the movement new parts are detected based on the HBPDA. Depending on whether the new part is a previously unseen body part or a subpart of a part whose three-dimensional model has already been recovered, two different algorithms are employed to estimate the three-dimensional shape of the new part. At the end of all the movements, the three-dimensional shape of each of the observable parts of the human body is available. The rest of the paper is organized as follows:4 In Section 2 the theoretical framework for the analysis of the model's deformations is formulated. In Section 3 the HBPDA is presented, while Section 4 describes the modeling of the processes of the HBPDA using the Supervisory Control Theory of Discrete Event Systems. The integration of the segmented two-dimensional apparent contours to obtain the three-dimensional shape of a body part is presented in Section 5. Finally, the effectiveness of the approach is demonstrated through a series of experiments in Section 6.

2. Theoretical Framework
In this section, the theoretical framework that will allow the analysis of a model's deformations is presented. We begin by reviewing the notation for deformable models and then we formulate the theory of parametric composition of geometric primitives.

2.1. Deformable Models: Geometry
The models used in this work are two-dimensional contour and three-dimensional surface shape models. The material coordinates u (u = (v) and u = (u, v) for the two-dimensional and three-dimensional case, respectively) of a point on these models are specified over a domain Ω. The position of a point on the model relative to an inertial frame of reference Φ in space is given by a vector-valued, time-varying function. In particular, the three-dimensional position of a point w.r.t. a
world coordinate system is the result of the translation and rotation of its position with respect to a noninertial, model-centered coordinate frame φ (Fig. 1).

Figure 1. Coordinate systems for deformable models.

Therefore, the position of a point (with material coordinate v) on a deformable model i at time t with respect to an inertial frame of reference Φ is given by the formula:

^Φ x_i(v, t) = ^Φ t_i(t) + ^Φ_{φ_i} R_i(t) ^{φ_i} p(v, t),    (1)

where ^Φ t_i is the position of the origin O_i of the model frame φ_i with respect to the frame Φ (the model's translation), and ^Φ_{φ_i} R_i is the matrix that encapsulates the orientation of φ_i with respect to Φ. ^{φ_i} p(v, t) is the position of a model point with material coordinate v w.r.t. the model frame φ_i and can be expressed as the sum of a reference shape ^{φ_i} s(v, t) and a local displacement ^{φ_i} d(v, t), as given by the formula:

^{φ_i} p(v, t) = ^{φ_i} s(v, t) + ^{φ_i} d(v, t).    (2)

The reference shape captures the salient shape features of the model and it is the result of applying global deformations T (such as tapering and bending) to a geometric primitive e = (e_x, e_y, e_z)^T. In particular,

^{φ_i} s(v, t) = (s_x, s_y, s_z)^T = T(e; q_T),    (3)

where the global deformations T depend on the parameters q_T. For example, the Constant Curvature Bending global deformation that we employ in this paper depends on the parameters q_T = (b_0, b_1, b_2)^T, where b_0 denotes the radius of curvature and the parameters b_1 and b_2 denote the range of the bending zone. In particular, the bending deformation s = T_b(e; b_0, b_1, b_2)
along a centerline parallel to the y-axis of a primitive e = (e_x, e_y, e_z)^T is given by:

s_x = cos θ (e_x − 1/b_0) + 1/b_0 + sin θ (e_y − ê_y),
s_y = −sin θ (e_x − 1/b_0) + (b_1 + b_2)/2 + cos θ (e_y − ê_y),    (4)
s_z = e_z,

where the bending angle θ is constant at the extremities and changes linearly in the bending zone. Specifically:

θ = b_0 (ê_y − (b_1 + b_2)/2),    (5)

θ^b = b_0 (b_1 − b_2)/2,    (6)

ê_y = b_1, if e_y ≤ b_1;    ê_y = e_y, if b_1 < e_y < b_2;    ê_y = b_2, if e_y ≥ b_2,    (7)

where θ^b is a quantity that relates to the Part Decomposition Criterion C that is described in Section 3.5. The geometric primitive e is defined parametrically in u ∈ Ω and has global shape parameters q_e. For the purposes of this research, we employ a superellipsoid e(v) : [−π, π) → R² with global shape parameters q_e = (a_1, a_2, ε_1)^T as the two-dimensional shape primitive. A superellipsoid contour is defined by a two-dimensional vector sweeping a closed contour in a plane by varying the material coordinate v. The parametric equation of a superellipsoid is given by the formula (Barr, 1981, 1984):

e(v) = (a_1 C_v^{ε_1}, a_2 S_v^{ε_1})^T,    (8)

where −π ≤ v < π, a_1, a_2 ≥ 0 are the parameters that define the superellipsoid size in two orthogonal directions in a plane, ε_1 is the "squareness" parameter, S_v^ε = sgn(sin v)|sin v|^ε, and C_v^ε = sgn(cos v)|cos v|^ε. As a three-dimensional shape primitive, we employ a superquadric e(u, v) : [−π/2, π/2) × [−π, π) → R³ with global shape parameters q_e = (a_1, a_2, a_3, ε_1, ε_2)^T. A superquadric surface is defined by a vector sweeping a closed surface in space by varying the material coordinates u and v. The parametric equation of a superquadric is given by the formula (Barr, 1981, 1984):

e(u, v) = (a_1 C_u^{ε_1} C_v^{ε_2}, a_2 C_u^{ε_1} S_v^{ε_2}, a_3 S_u^{ε_1})^T,    (9)

where −π/2 ≤ u ≤ π/2, −π ≤ v < π, a_1, a_2, a_3 ≥ 0 are the parameters that define the superquadric size, and ε_1 and ε_2 are the "squareness" parameters in the latitude and longitude planes, respectively. For the purposes of this research, we have restricted the shape recovery procedure to fit models with 0 ≤ ε_1, ε_2 ≤ 1. Local displacements d are computed based on the use of triangular finite elements. Associated with every finite element node i is a nodal vector variable q_{d,i}. We collect all the nodal variables into a vector of local degrees of freedom q_d = (. . . , q_{d,i}^T, . . .)^T, and we compute the local displacement d based on finite element theory as d = S q_d, where S is the shape matrix whose entries are the finite element shape functions.5
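To make the primitive and bending formulas above concrete, the following sketch (written for this article, not taken from the paper) samples a two-dimensional superellipsoid contour per Eq. (8) and applies the constant-curvature bending of Eqs. (4)–(7) to it. The function names, the sampling density, and the example parameter values are illustrative assumptions only; the symbols a1, a2, eps1, b0, b1, b2 mirror the notation above.

```python
import numpy as np

def signed_pow(x, eps):
    # The S_v^eps / C_v^eps helper of Eq. (8): sgn(x) * |x|^eps
    return np.sign(x) * np.abs(x) ** eps

def superellipsoid_contour(a1, a2, eps1, n=200):
    # Eq. (8): e(v) = (a1 C_v^eps1, a2 S_v^eps1)^T for v in [-pi, pi)
    v = np.linspace(-np.pi, np.pi, n, endpoint=False)
    x = a1 * signed_pow(np.cos(v), eps1)
    y = a2 * signed_pow(np.sin(v), eps1)
    return np.stack([x, y], axis=1)            # shape (n, 2)

def bend(points, b0, b1, b2):
    # Constant Curvature Bending, Eqs. (4)-(7), applied to 2D points (e_x, e_y).
    ex, ey = points[:, 0], points[:, 1]
    ey_hat = np.clip(ey, b1, b2)               # Eq. (7)
    theta = b0 * (ey_hat - 0.5 * (b1 + b2))    # Eq. (5)
    sx = np.cos(theta) * (ex - 1.0 / b0) + 1.0 / b0 + np.sin(theta) * (ey - ey_hat)
    sy = -np.sin(theta) * (ex - 1.0 / b0) + 0.5 * (b1 + b2) + np.cos(theta) * (ey - ey_hat)
    return np.stack([sx, sy], axis=1)          # Eq. (4), with s_z dropped for the 2D case

# Example: a slightly "boxy" superellipsoid bent about its long axis.
contour = superellipsoid_contour(a1=1.0, a2=3.0, eps1=0.7)
bent = bend(contour, b0=0.3, b1=-2.0, b2=2.0)
```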
2.2. Parametric Composition of Geometric Primitives
Initially, we will use a single deformable model to fit the initial apparent contour of a human. As a subject moves and attains new postures, her apparent contour changes dynamically and large protrusions may emerge as the result of the motion of her parts (Fig. 13(c)). To represent large protrusions or concavities and their shape evolution in a compact and intuitive way, we introduce a new shape representation based on the parametric composition of geometric primitives. Intuitively, we will represent these protrusions with another geometric primitive. In particular, using the parametric composition, we can describe compactly the shape of the union (in the case of protrusions) or intersection (in the case of concavities) of the defining primitives. Figure 2 depicts an example of composition (union) of two superellipsoids. The composition is performed along v and the composition function was chosen to illustrate how different parts of the defining primitives are expressed at the composed deformable model. Referring to Fig. 2, the shape of the composed deformable model has the shape of the root primitive x0 for values of the material coordinate v that belong to the interval [−π, 0.65π ) ∪ (0.77π, π ). For values of v that belong to the interval [0.65π, 0.77π ], however, the composed
deformable model has the shape of the intersecting primitive.

Figure 2. An example of composition of two superellipsoids: (a) depicts x_1(v_1), which intersects x_0(v_0) at points A and B, (b) depicts their composition x(v), and (c) depicts the composition function δ(v; 0.65π, 0.77π, 10).

For simplicity, we formulate the theory of composition in 2D to represent the shape of the boundary of the union of primitives. Let x_0 and x_1 be two 2D parametric primitives (defined by the mappings C_0 : [v_{0b}, v_{0e}) → R² and C_1 : [v_{1b}, v_{1e}) → R², respectively, where the subscripts b and e denote the beginning and end of the domain) positioned in space so that x_1, the intersecting primitive, intersects x_0, the root primitive, at points A and B. The material coordinates of these two points can be expressed in terms of either the material coordinate v_0 of x_0 or the material coordinate v_1 of x_1. Let v_0^A and v_0^B be the values of v_0, and v_1^A and v_1^B be the values of v_1, at points A and B, respectively (Fig. A.1(a,d)). Without loss of generality, we can assume that we name the points of intersection A and B so that the relation v_0^A < v_0^B holds. Based on the above, the shape x of the composed primitive (C : [v_b, v_e) → R², Fig. 2(b)) can be defined in terms of the parameters of the defining primitives x_0 and x_1 (Fig. 2(a)) as follows:

x(v; v_0^A, v_0^B, c) = [1 − δ(v; h_0^{−1}(v_0^A), h_0^{−1}(v_0^B), c)] x_0(h_0(v)) + δ(v; h_0^{−1}(v_0^A), h_0^{−1}(v_0^B), c) x_1(h_1(h_0(v))),    (10)

where δ : [v_b, v_e) → [0, 1] is the composition function for the union of two primitives. Specifically,

δ(v; v_min, v_max, c) = ζ(c(v − v_min)) − ζ(c(v − v_max)),    ζ(x) = (1 + tanh(x))/2,

where the function ζ : R → [0, 1] approximates the step function and c is a constant that controls the shape of the function δ at the neighborhoods of v_min and v_max. The piecewise linear function h_0 : [v_b, v_e) → [v_{0b}, v_{0e}) maps the material coordinate v of the composed deformable model to the material coordinate v_0 of the root primitive. The piecewise linear function h_1 : [v_{0b}, v_{0e}) → [v_{1b}, v_{1e}) maps the material coordinate v_0 to the material coordinate v_1 of the intersecting primitive. The definition of the functions h_0 and h_1 is provided in Appendix A. The above equations generalize easily to the case of multiple primitives intersecting a single root primitive. If the material coordinates of points A_i and B_i, where a primitive i intersects the root primitive, are v_{A_i} and v_{B_i}, then we require that [v_{A_i}, v_{B_i}] ∩ [v_{A_j}, v_{B_j}] = ∅ for every i and j (Fig. 3).

Figure 3. Multiple primitives intersecting one root primitive.

Since each two-dimensional deformable model is discretized into one-dimensional finite elements (line segments), one can check line segment intersection using standard techniques from the area of Computational Geometry. We employ a technique similar to the one described in (Prasad, 1991). In the physics-based framework, the geometric degrees of freedom of a shape (translation, rotation, global and local parameters) form the generalized coordinates q_i of a model i,

q_i = (q_{t_i}^T, q_{θ_i}^T, q_{s_i}^T, q_{d_i}^T)^T,    (11)
where q_{t_i} = ^Φ t_i, q_{θ_i} is the quaternion that corresponds to ^Φ_{φ_i} R_i, q_{s_i} = (q_e^T, q_T^T)^T are the global parameters of the shape, and q_{d_i} represents the local deformations. If q_0 and q_1 are the generalized coordinates of the root and intersecting primitives, respectively, then the generalized coordinates of the composed model are given by the formula:

q = (q_0^T, q_1^T, q_comp^T)^T,    (12)

where q_comp = (v_min, v_max, c)^T are the generalized coordinates of the composition function. Using the notation of Metaxas and Terzopoulos (1993), the relation of the velocity of a point on the model to the generalized coordinates can be expressed using the following formula:

ẋ = L q̇,    (13)

where L is the Jacobian matrix. For the case of a composed deformable model, the Jacobian L_comp can be derived as follows:

x(v) = (1 − δ(v)) x_0(h_0(v)) + δ(v) x_1(h_1(h_0(v))) ⇒
ẋ(v) = (1 − δ(v)) ẋ_0(h_0(v)) + δ(v) ẋ_1(h_1(h_0(v))) + [x_1(h_1(h_0(v))) − x_0(h_0(v))] δ̇(v) ⇒
ẋ(v) = (1 − δ(v)) L_0 q̇_0 + δ(v) L_1 q̇_1 + [x_1(h_1(h_0(v))) − x_0(h_0(v))] (∂δ(v)/∂q) q̇ ⇒
ẋ(v) = L_comp q̇.

Since the function δ is dependent only on the values of the parameters q_comp, then

∂δ(v)/∂q = [0_{1×q_0}   0_{1×q_1}   ∂δ(v)/∂q_comp].    (14)

In summary,

L_comp = [(1 − δ) L_0   δ L_1   (x_1 − x_0) ∂δ/∂q_comp].

Therefore, the Jacobian L_comp(v) at a point of the composed model with material coordinate v depends on the Jacobians L_0 and L_1 of the defining models. The degree of dependence is regulated by the values of the composition function.

2.3. Dynamics of Fitting a Composed Deformable Model

When fitting a model to visual data, the goal is to recover the vector of generalized coordinates q. This is achieved based on forces that the data exert on the surface of the model (Metaxas and Terzopoulos, 1993). Based on Lagrangian dynamics, the simplified equations of motion take the general form of:

q̇ + Kq = f_q,    (15)

where K is the stiffness matrix. The generalized forces f_q are computed from the two-dimensional or three-dimensional forces that the data apply to the two-dimensional or three-dimensional deformable model, respectively. In particular, the generalized forces f_q are computed from (Metaxas, 1992):

f_q^T = ∫ f_applied^T L du,    (16)

where f_applied are the external two-dimensional or three-dimensional forces that are exerted on the model. For the case of a composed deformable model:

f_q^T = ∫ f_applied^T L_comp du.    (17)

In addition, these generalized forces can be broken down into the following components:

f_q = (f_{q_0}^T, f_{q_1}^T, f_{q_comp}^T)^T,    f_{q_i} = (f_{q_{t_i}}^T, f_{q_{θ_i}}^T, f_{q_{s_i}}^T, f_{q_{d_i}}^T)^T,    i = {0, 1}.    (18)
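As a concrete illustration of the composition function δ and of the blending in Eq. (10), the sketch below evaluates a composed contour from two primitives. It is a simplified example written for this article, not the authors' implementation: in particular, the reparameterization maps h_0 and h_1, which the paper defines in Appendix A, are replaced here by hypothetical identity placeholders, and two circles stand in for the superellipsoid primitives.

```python
import numpy as np

def zeta(x):
    # Smooth approximation of the step function: zeta(x) = (1 + tanh(x)) / 2
    return 0.5 * (1.0 + np.tanh(x))

def delta(v, v_min, v_max, c):
    # Composition function for the union of two primitives: close to 1 inside
    # [v_min, v_max], close to 0 outside; c controls the transition sharpness.
    return zeta(c * (v - v_min)) - zeta(c * (v - v_max))

def composed_contour(v, x0, x1, h0, h1, v0A, v0B, c=10.0):
    # Eq. (10): blend the root primitive x0 with the intersecting primitive x1.
    # x0, x1 map an array of material coordinates to an (n, 2) array of points;
    # h0, h1 stand in for the piecewise-linear reparameterizations of Appendix A.
    d = delta(v, v0A, v0B, c)[:, None]   # assumes h0 is monotone, so h0^{-1}(v0A) ~ v0A
    return (1.0 - d) * x0(h0(v)) + d * x1(h1(h0(v)))

# Example: two circles standing in for superellipsoid primitives.
x0 = lambda v: np.stack([np.cos(v), np.sin(v)], axis=1)                    # root primitive
x1 = lambda v: np.stack([0.3 * np.cos(v), 1.0 + 0.3 * np.sin(v)], axis=1)  # protrusion
identity = lambda v: v                                                     # placeholder for h0, h1
v = np.linspace(-np.pi, np.pi, 400, endpoint=False)
contour = composed_contour(v, x0, x1, identity, identity, 0.40 * np.pi, 0.60 * np.pi)
```

The same weights (1 − δ) and δ that blend the points also blend the Jacobians L_0 and L_1 in L_comp, which is why the composed model can be fitted with the standard physics-based machinery of Eqs. (15)–(18).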
3. Two-Dimensional Human Body Model Acquisition
Our goal is to automatically segment the apparent body contours of moving humans and to estimate the shape of their body parts without assuming a prior model of the human body or a body part segmentation. In this section, we present a technique for building a twodimensional body model of a subject who performs a set of controlled movements. First, we present a set of movements that when performed by a subject, reveal the structure of the human body. Then, we present the HBPDA which recovers all the moving parts of a subject’s body by exploiting the geometry and the deformation over time of the apparent contour of the subject. The technique is based on the observation that the geometry of the model of an apparent contour and its deformation under motion are related to the structure of the underlying articulated object such as the human body. We decided not to use a prior deformable model
because we are interested in accurate estimation of the shape of the human body parts. Our objective is to detect the head, torso, upper body extremities and lower body extremities of a subject. Since, the HBPDA does not employ any prior information about the structure of the human body, we request that the subject performs a set of movements according to a generic, subject invariant protocol that reveals the structure of the human body and ensures that all the major parts of the human body become visible. Assuming that we observe the motion of the subject using a camera whose image plane is parallel to the sagittal plane6 (Fig. 11), one possible set of movements that will lead to the estimation of the shape and determination of the connectivity of the subject’s body parts is described below (Fig. 4). The protocol of movements differs depending on the viewing position of the observer with respect to the subject. Its design is gov-
erned from the principles that the parts that we want to detect (e.g., head, torso, upper body extremities and lower body extremities) should become visible to the observer during the movement. Therefore, if the initial orientation of the subject with respect to the camera is different from the one described above, a different protocol is needed. All the movements start and end in the position where the body is erect and the arms are placed straight against the sides of the body facing medially (first reference position).

Protocol of Movements: MovA
1. Stand still: The subject is requested to stand still for a few moments to acquire her apparent contour.
2. Head movement: The subject tilts her head backwards as far as possible.
3. Left upper body extremities' movements: In the first phase of the movement, the subject lifts her left arm to the front until the arm reaches the horizontal position. Continuing from this position to the second phase, the subject rotates her hand so that the palm is facing upwards and she flexes the wrist. Then, the subject bends the elbow, bringing the forearm to the vertical position.
4. Right upper body extremities' movements: The subject lifts her left arm backwards to a comfortable position, in which the arm is not fully occluded by the torso. Then, the subject performs the left upper body extremities' movements using the right arm.
5. Lower body extremities' movements: This movement consists of two phases. In the first phase, the subject extends her left leg to the front. When the leg reaches the maximum comfortable position, the subject flexes her foot and then she bends her knee. In the second phase, from the position where the left leg is extended, the subject steps forward and raises her right leg. Again, when that leg reaches a comfortable position, the subject flexes her foot and then she bends her knee.
Figure 4. Protocol of movements: MovA.

3.1. Human Body Part Decomposition Algorithm
To accomplish the goal of segmenting the apparent contour of a subject and estimating the shape of a subject’s body parts, we use a sequence of images in which movement between the body parts occurs and we employ the HBPDA (Fig. 5). In the following, we describe the criteria associated with HBPDA and the corresponding algorithms.
Figure 5. Flow diagram of the Human Body Part Decomposition Algorithm.
The apparent contour or silhouette of a human body is the projection of the locus of points on a human body which separates the visible from the occluded surfaces of the body parts. We assume initially that the human body under observation consists of a single part. A single deformable model is fitted to time-varying image data using the physics-based framework (Metaxas, 1992). As the subject moves her body parts, the geometry of the apparent contour of the body dynamically changes. Since the human body is comprised from several parts which are nominally rigid and move relative to each other then, assuming that we observe only one human, the changes to the apparent contour can be attributed to the event that parts of the body which initially occlude each other become unoccluded as they move. Based on observing these events four mutually exclusive criteria have been developed that lead to the hypothesis about the existence of body parts. • The first criterion (Parametric Composition Invocation Criterion—Section 3.2) detects the existence of large deformations within a deformable model which can be compactly modeled as a composition of parametric primitives. An example where this criterion will apply is the case where the subject lifts her arm towards the horizontal position, and the apparent contour of the arm protrudes from the one of the torso. • The second criterion (Part Decomposition Criterion B—Section 3.4) refers to the case in which a contour is evolving in the interior of a deformable model with large local deformations. An example of the application of this criterion occurs during the observation of the movement of the legs.
• The third criterion (Part Decomposition Criterion C—Section 3.5) refers to the case of body parts which are initially aligned but then move relative to one another. An example where this criterion will apply is when the subject bends her elbow or flexes her wrist. • The last criterion (Part Decomposition Criterion D—Section 3.3) refers to the case where the defining primitives of a composed deformable model move relative to each other. An example where this criterion will apply is the case in which after a composed deformable model has been created to represent compactly large local deformations, movement between its defining primitives verifies the hypothesis of multiple body parts. If any of the above criteria is satisfied then the corresponding algorithm is employed to hypothesize multiple body parts. It should be noted that this algorithm applies to humans of any anthropometric dimension because instead of a deformable template, we concentrate our effort in specifying general criteria sufficient to analyze and to segment the model of the deforming apparent contours from moving humans. Below, the structure of the HBPDA is presented. Algorithm: Human Body Part Decomposition (HBPDA) • Step 1. Initially, assume that the subject’s body consists of a single part. Create a list of deformable models L (with one entry initially) that will be used to model the subject’s body parts. In addition, create a graph G with one node (Fig. 6). The nodes of the graph G denote the parts recovered by the algorithm. The edges of the graph denote which parts are connected by joints. • Step 2. If not all the frames of the motion sequence have been processed, fit the models of the list L to the image data using the physics-based shape and motion estimation framework (Metaxas, 1992) and execute Steps 3 and 4. Otherwise, output L and G. • Step 3. For each (noncomposed) model in L, determine: a: if the Parametric Composition Invocation Criterion is satisfied, b: else if the Part Decomposition Criterion B is satisfied, c: else if the Part Decomposition Criterion C is satisfied, For each composed model in L, determine:
d: if the Part Decomposition Criterion D is satisfied (e.g., this criterion is satisfied during the later stages of the movement of the arm with respect to the torso).
• Step 4. For each model in L, depending on which criterion (if any) is satisfied, invoke the corresponding algorithm:
For each (noncomposed) model in L:
• If 3a is satisfied, invoke the Parametric Composition algorithm.
• If 3b is satisfied, invoke the Part Decomposition B algorithm.
• If 3c is satisfied, invoke the Part Decomposition C algorithm.
For each composed model in L:
• If 3d is satisfied, invoke the Part Decomposition D algorithm.

Figure 6. (a) Initially the graph G of the parts consists of one node. The arrow denotes the node that has been refined in the next iteration. (b) At the completion of the head movement the graph consists of one node for the head and one for the rest of the body. (c) At the end of the movement of the left arm, the node for the rest of the body will be refined to consist of one node for the left hand, one for the left forearm, one for the left upper arm, and one for the rest of the body. (d) Similarly, at the end of the movement of the right hand, the graph G will contain nodes for the right upper body extremities also. (e) At the end of the movement of the left leg, the graph G will contain nodes for the left lower body extremities as well. (f) The nodes of the graph at the end of all prespecified movements. (g–h) The body parts that the nodes represent.

In the following, we present the details of the Parametric Composition Invocation Criterion and the Part Decomposition Criteria (and the related algorithms), through which one is able to fully segment the apparent contours of a subject's body.

3.2. HBPDA—Step 3a

As a subject moves and attains new postures, the apparent contour changes dynamically and large protrusions may emerge as the result of the motion of the limbs (Fig. 13(c)). If there is no hole present within the apparent contour and there is a significant deformation of the apparent contour (Fig. 7), we represent the protrusions compactly based on the composition of two primitives.
Parametric Composition Invocation Criterion. Signal the need for parametric composition of primitives if no hole is evolving within the apparent contour and the relation ‖p_i(v, t) − p_i(v, t_init)‖ > K_A holds, where K_A is an a priori defined constant, and p_i(v, t), p_i(v, t_init) represent the current and the initial shapes (w.r.t. the model-centered reference frame) of the model (of the apparent contour) m_i.

Algorithm: Parametric Composition
• Step 1. Determine the interval I_i of the material coordinate v of the model m_i in which maximum variation of the shape over time is detected.
• Step 2. Perform an eigenvector analysis on the data points that correspond to the interval I_i, to approximate the parameters of a new deformable model m_{i1} which can fit these data points.
• Step 3. Construct (as explained in Section 2.2) a composed parametric primitive n with m_i and m_{i1} as defining primitives.
• Step 4. Update the list L by replacing the model m_i with the composed model n.
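A minimal sketch of how the invocation criterion above could be evaluated in code. The function names, the sampled-point representation of the contour model, and the hole flag are hypothetical conveniences introduced for this illustration; only the threshold value K_A = 0.5 min(a_1, a_2) comes from the experimental settings reported in Section 6.

```python
import numpy as np

def composition_invocation_criterion(p_now, p_init, K_A, hole_present):
    """Parametric Composition Invocation Criterion (Section 3.2).

    p_now, p_init : (n, 2) arrays of model points at the same material coordinates,
                    expressed in the model-centered reference frame.
    K_A           : a priori threshold (the experiments use 0.5 * min(a1, a2)).
    hole_present  : True if a closed contour is currently evolving inside the silhouette.
    """
    if hole_present:
        return False                        # holes are handled by Criterion B instead
    deformation = np.linalg.norm(p_now - p_init, axis=1)
    return bool(np.max(deformation) > K_A)

def interval_of_max_variation(p_now, p_init, v):
    # Step 1 of the Parametric Composition algorithm: find the material-coordinate
    # interval where the shape deviates most from its initial configuration.
    # (Call this only after the criterion has fired, so the deformation is nonzero.)
    deformation = np.linalg.norm(p_now - p_init, axis=1)
    mask = deformation > 0.5 * np.max(deformation)   # hypothetical cutoff for the sketch
    return v[mask].min(), v[mask].max()
```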
Figure 7. As the protrusions to a shape are evolving over time (a), the deformations of the shape are monitored (shown superimposed in (b) and over time in (c)).

3.3. HBPDA—Step 3d

For Step 2 of the HBPDA, the models of the updated list L are continuously fitted to the time-varying image data, using the all neighbors force assignment algorithm (Kakadiaris et al., 1997). If the parameters of a composed deformable model indicate that its defining primitives are moving with respect to one another, then this signals the presence of two distinct parts. Therefore, it verifies the hypothesis that the deformation of the apparent contour is the result of two parts moving relative to each other.

Part Decomposition Criterion D. If the generalized coordinates of a composed model m_i (whose defining primitives are the models m_{i1} and m_{i2}) satisfy the relation ‖Δq_t(t) − Δq_t(t_init)‖ > K_{D1} ∨ ‖Δq_θ(t) − Δq_θ(t_init)‖ > K_{D2}, where K_{D1} and K_{D2} are two a priori defined constants, t_init is the time that the composed deformable model m_i was initialized, and Δq_t(t) = q_{t_{i2}}(t) − q_{t_{i1}}(t) and Δq_θ(t) = q_{θ_{i2}}(t) − q_{θ_{i1}}(t) represent the relative translation and orientation of the model-centered coordinate systems of m_{i1} and m_{i2}, then decompose m_i into its defining primitives.

Algorithm: Part Decomposition D
• Step 1. Construct two new deformable models n_{i1} and n_{i2} using the parameters of the defining models m_{i1} and m_{i2} of the composed model m_i.
• Step 2. Update G and L by replacing m_i with n_{i1} and n_{i2}.

3.4. HBPDA—Step 3b

The visual event of a hole evolving within the tracked apparent contour indicates that parts which were initially occluded are gradually becoming visible. During this process, though, the still occluded regions of the part that is becoming visible are not contiguous; hence the appearance of the hole (Fig. 8). Therefore, if there is a hole present within the apparent contour whose shape has changed considerably, the parametric composition algorithm is not invoked. Instead, the evolution of the hole is monitored and the Part Decomposition B algorithm is invoked only when the hole ceases to exist (provided that the apparent contour is still deformed w.r.t. the initial shape).

Figure 8. When the occluded leg gradually becomes visible, a new contour evolves within the apparent contour.
Part Decomposition Criterion B. If an evolving closed contour within the apparent contour ceases to exist and, for a model m_i, the relation ‖p_i(v, t) − p_i(v, t_init)‖ > K_A holds, where K_A is the same a priori defined constant as in the Parametric Composition Invocation Criterion, and p_i(v, t), p_i(v, t_init) represent the current and the initial shapes (w.r.t. the model-centered reference frame), then invoke the Part Decomposition B algorithm.

Algorithm: Part Decomposition B
• Step 1. Determine the interval I_{i0} (I_{i0} ⊂ Ω) of the material coordinate v of the model m_i in which maximum variation of the shape over time is detected.
• Step 2. Monitor the evolution of the shape of the deformable model m_i and cluster its finite elements based on their change of orientation. Determine the interval I_{i1}, I_{i1} ⊆ I_{i0}, for which consistent change of the orientation of the finite elements within the element domain I_{i0} of the deformable model is detected.
• Step 3. If n (n > 1) clusters are recovered, perform an eigenvector analysis on the data points that correspond (e.g., are applying forces) to these finite elements to approximate the parameters of the deformable models m_{ij}, (j = 1, . . . , n) which can fit these data points.
• Step 4. Update the list L by replacing the model m_i(t) with the models m_i(t_init) and m_{ij}, (j = 1, . . . , n).
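The temporal logic of Criterion B can be made explicit with a small per-frame monitor, sketched below under simplifying assumptions. The class name, the boolean hole flag, and the point-array representation are hypothetical stand-ins introduced for this illustration, not part of the paper's implementation.

```python
import numpy as np

class HoleMonitor:
    """Fires Part Decomposition B when a hole inside the silhouette disappears
    while the apparent contour is still deformed by more than K_A (Section 3.4)."""

    def __init__(self, K_A):
        self.K_A = K_A
        self.hole_was_present = False

    def update(self, hole_present, p_now, p_init):
        trigger = False
        if self.hole_was_present and not hole_present:
            deformation = np.linalg.norm(p_now - p_init, axis=1).max()
            trigger = deformation > self.K_A        # Criterion B satisfied
        self.hole_was_present = hole_present
        return trigger
```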
3.5. HBPDA—Step 3c
The time-varying image contour data from an elastic object which bends, or from the rotating parts of an articulated object (assuming that the parts are connected at their endpoints and do not fully occlude each other initially), can be fitted using a deformable model which undergoes a bending deformation. An example of such a movement is shown in Fig. 9.

Figure 9. Bending of the model of the apparent contour of a multipart object indicates the existence of multiple parts.

To determine the existence of multiple parts, the values of the model's bending parameters are monitored. The first part of the criterion7 ensures that the object under observation undergoes a bending deformation (undergoing a transition from an unbent to a bent state or vice versa). If the error of fit ψ(t) (computed as least-squares) does not change during the bending process, then we observe an elastic object that bends. However, if the error of fit does change, then this is an indication that we observe multiple objects rotating relative to each other or multiple parts of the same object rotating relative to each other. As stated previously, the list L contains the deformable models m_i, i = {1, . . . , n}, that model the objects and their parts in a scene. Let θ^b(t) be the bending parameter θ^b of a deformable model m_i at time t (Eq. (6)) and θ^b(t_init) be the bending parameter of the model fitted to the data at time t_init when the model was initialized. In addition, let ψ(t) and ψ(t_init) be the errors of fit of the model m_i fitted to the data at times t and t_init, respectively.

Part Decomposition Criterion C. If the bending parameters of a model m_i satisfy the relation (θ^b(t) − θ^b(t_init)) > K_{C1} and the error of fit of the model (using
only global deformations) to the data satisfies the relation (ψ(t) − ψ(t_init)) > K_{C2}, where K_{C1} and K_{C2} are a priori defined constants, then the underlying object is not an elastic object, but is composed of multiple parts. Therefore, invoke the Part Decomposition C algorithm to replace the model m_i with two new models m_{i1} and m_{i2}.

Algorithm: Part Decomposition C
• Step 1. Based on the bending parameters b_{1i} and b_{2i} (see Eq. (4)) of the deformable model m_i, identify the data points that correspond to the fixed and relocation zones of the bent model m_i and mark them as to be modeled by the two new models m_{i1} and m_{i2}, respectively. However, the data points that correspond to the bending zone of the model m_i are marked as orphan data points, since it is uncertain to which of the two new models they should be assigned. This is necessary since we do not know in advance the shape of the underlying parts.
• Step 2. Perform an eigenvector analysis on the data points that correspond to the fixed and relocation zones of the model m_i, to approximate the parameters of two deformable models m_{i1} and m_{i2} which can describe these data.
• Step 3. Update the list L by replacing the model m_i with the two new models m_{i1} and m_{i2}.

For Step 2 of the HBPDA, the models included in list L are fitted to the image data, using the all neighbors force assignment algorithm (Kakadiaris et al., 1994). In addition, the instant rotation center around which a model rotates (for the case of two-dimensional deformable models) can be determined using the method described in the following section.
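A sketch, under the same illustrative assumptions as the earlier snippets, of how Criterion C could be checked from the monitored bending parameter θ^b and the least-squares fit error ψ. The function is a hypothetical stand-in, not the authors' code; the thresholds follow the values reported in Section 6 (K_C1 = 15 deg, K_C2 = 0.2 ψ(t_init)), and the absolute value and degree conversion are choices made for this example.

```python
import numpy as np

def part_decomposition_criterion_C(theta_b_t, theta_b_init, psi_t, psi_init,
                                   K_C1_deg=15.0, K_C2_ratio=0.2):
    """Part Decomposition Criterion C (Section 3.5).

    theta_b_t, theta_b_init : bending parameter theta^b (Eq. (6)), in radians,
                              at time t and at initialization.
    psi_t, psi_init         : least-squares errors of fit using only global deformations.
    Returns True when the bent model should be split into two parts.
    """
    bending_occurred = np.degrees(abs(theta_b_t - theta_b_init)) > K_C1_deg
    fit_error_changed = (psi_t - psi_init) > K_C2_ratio * psi_init
    # An elastic object bends without the fit error changing; an articulated pair
    # of parts makes a single globally deformed model fit the data noticeably worse.
    return bending_occurred and fit_error_changed
```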
4. Implementation of the HBPDA
In this section, the modeling and the control of the processes involved in the system implementing the HBPDA, using the Discrete Event Systems theory is described. First, the Supervisory Control Theory of Discrete Event Systems is presented and then the models for the participating processes. Our design of the system implementing the HBPDA has been dictated in part by the following considerations: 1. In fitting a deformable model to time-varying image data, the global parameters of the deformable
model are fitted first, followed by the local parameters. In both cases the fitting is accomplished using the equation of motion (15), but the number of degrees of freedom is different for each case. Therefore, each deformable model is in a different state according to which parameters are being fitted. 2. During the continuous fitting of the deformable model to the image data, its parameters may attain values that trigger events signaling the need for parametric composition or part decomposition. These events are discrete. 3. During the evolution of the task of part segmentation, the number of deformable models that are used to describe the structure of the image data is changing over time. There is a need to handle the complex interactions between them according to their state. Therefore, our system is clearly a dynamic system. A dynamical system in which changes in the state occur at discrete instances of time is a discrete event system. The Supervisory Control Theory of Discrete Event Systems (DES) developed by Ramadge and Wonham (1989) allows us to encapsulate both the discrete and the continuous aspects of the fitting and segmentation processes. In addition, it allows us to formulate in a compact and elegant way the basic concepts and algorithms for our problem. Therefore, the addition of a new deformable model in the system can be done in a modular and hierarchical fashion. Using the DES framework allows us to decompose the overall design into components modeled by Finite State Machines (FSM) and provides efficient methods to investigate the behavior of the participating processes which operate concurrently. Let us first describe the components of the system that implements HBPDA in terms of the DES theory. There are three components to the system: the plant, the observer(s) and the supervisor (Fig. 10(a)). The set of all deformable models that are being fitted to the image data at each time instant constitutes the plant. The observers are processes that monitor the quantitative changes to the shape of the modeling primitives and send messages to the supervisor. The supervisor is the process that controls the behavior of the system, handles the complex interactions between the fitting processes and invokes the appropriate parametric composition or part decomposition algorithms depending on the input from the observers. Therefore, the observers provide feedback to the supervisor regarding the state of the plant. The three components of the system (Fig. 10(b)) are communicating
through a set of channels CC = {C_i, M_i, O_i}, i = {1, . . . , n}, where n is the number of deformable models used to describe the image data at each time instant, C_i is the set of channels through which the supervisor sends control commands to each fitting process (each of the deformable models), M_i is the set of channels through which each fitting process sends messages to the supervisor, and O_i is the set of channels through which the observers send messages to the supervisor. Each participating process of the system is modeled as a nondeterministic finite state machine. In the implementation of the HBPDA, for each of the deformable models used at every time instant to fit the image data, there is a corresponding fitting process whose design is shown in Fig. B.1. These processes, which constitute the plant, run in parallel and are synchronized by a global clock. For each fitting process there is an observer process (Fig. B.2) that monitors the parameters of the deformable model in order to infer the state of the deformable model. To drive each deformable model to the desirable state, a supervisor is designed. The function of the supervisor is to select the appropriate action depending on the state of the observer processes (see Fig. B.3) in order to ensure correct behavior of the plant. Therefore, the supervisor provides feedback control. In Appendix B, we examine in detail how each of the three components of the system is designed.

Figure 10. (a) The components of a dynamic system according to the Supervisory Control Theory of Discrete Event Dynamic Systems. (b) Communication channels between the plant, the observers and the supervisor.
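To make the plant/observer/supervisor decomposition concrete, here is a deliberately tiny finite-state sketch in code. The state names, event names, and class interfaces are hypothetical simplifications of the machines given in Appendix B (Figs. B.1–B.3), written only to illustrate the control structure; a real implementation would replace the stubbed transitions with the actual fitting and criterion checks.

```python
# Minimal illustration of the Supervisory Control structure: plant, observers, supervisor.
FITTING_STATES = ("FIT_GLOBAL", "FIT_LOCAL", "IDLE")

class FittingProcess:                        # one per deformable model: part of the "plant"
    def __init__(self, model_id):
        self.model_id = model_id
        self.state = "FIT_GLOBAL"

    def apply(self, command):
        if command in FITTING_STATES:
            self.state = command             # the supervisor drives the state changes

class Observer:                              # monitors quantitative changes in the parameters
    def __init__(self, criterion_checks):
        self.criterion_checks = criterion_checks   # e.g. {"3a": check_fn, "3b": check_fn, ...}

    def events(self, model):
        # Emit the names of the criteria that fire for this model at the current frame.
        return [name for name, check in self.criterion_checks.items() if check(model)]

class Supervisor:                            # feedback control over all fitting processes
    def step(self, processes, observers, models):
        for proc, obs, model in zip(processes, observers, models):
            for event in obs.events(model):
                # Here the corresponding Parametric Composition / Part Decomposition
                # algorithm would be invoked and the lists L and G updated.
                print(f"model {proc.model_id}: criterion {event} fired")
            # Stub transition: switch from global to local fitting once per step.
            proc.apply("FIT_LOCAL" if proc.state == "FIT_GLOBAL" else "FIT_GLOBAL")
```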
5. Three-Dimensional Human Body Model Acquisition

Figure 11. (a) The movements of the subject are observed by three cameras placed orthogonally to each other, (b) view from the side camera, (c) view from the front camera, (d) view from the top camera.

In this section, we present how to combine two-dimensional information from multiple views in order to estimate the three-dimensional shape of a subject's body parts (see Figs. 11 and 12). In particular, we present a new algorithm which selectively integrates the (segmented by the HBPDA) two-dimensional apparent contours from three mutually orthogonal views. The initial pose of the subject with respect to the cameras is such that the image plane of the first camera is parallel to the sagittal plane of the subject, the image plane of the second camera is parallel to her coronal plane, and the third camera overlooks the scene and is parallel to the transverse plane of the subject (Fig. 11). Due to the complexity of a human's body shape, and due to occlusion among the body parts, the subject is requested to perform a set of movements according to the protocol MovB that incrementally reveals the structure of the body and allows the integration of information from multiple views. The movements are performed in sequence. Each movement starts at the end of the previous one. At the end of each movement the subject remains still to signal the end of movement.

Figure 12. (a) Results of the Integration algorithm, and (b–e) results of the Intersection algorithm for a simple example. In particular, assuming that (b) is the initial three-dimensional shape of a part, (c) is the two-dimensional model of the apparent contour of one of its subparts, then (d,e) are the three-dimensional shapes of the part and the subpart, respectively.

Protocol of Movements: MovB
1. Stand still: The subject just stands still for a few moments, in order for the system to acquire the apparent contour.
2. Tilt head back: The subject tilts her head backward as far as possible and then she returns to the first reference position.
3. Lift arms: The subject lifts her arms to the side until the arms reach the horizontal position (that is the second reference position).
4. Flex wrists: The subject flexes her wrists downwards as far as possible and then she returns to the second reference position.
5. Bend elbows: The subject bends her elbows, bringing the forearms to the vertical position, and then she returns to the second reference position.
6. Extend legs side: The subject extends her left leg to the side and then she returns to the second reference position. Next, she extends her right leg to the side and then she returns to the second reference position.
7. Extend legs front: The subject extends her left leg to the front and then she returns to the second reference position. Next, she extends her right leg to the front and then she returns to the second reference position.
8. Flex left leg: The subject flexes her left foot, bends her left knee, and then she returns to the second reference position.
9. Flex right leg: The subject flexes her right foot, bends her right knee, and then she returns to the second reference position.
Table 1. Specification of the active views during each movement, of the method employed (IG stands for the Integration algorithm and IN for the Intersection algorithm) for the estimation of the three-dimensional shape of the parts, and of the parts whose three-dimensional shape is obtained.

Step  Movement            3D shape estimation method   3D parts acquired
1     Stand still         IG
2     Tilt head back      IN                           Head
3     Lift arms           IG                           Arms
4     Flex wrists         IN                           Hands
5     Bend elbows         IN                           Upper and lower arms
6     Extend legs side    IG                           Legs and torso
7     Extend legs front
8     Flex left leg       IN                           Left thigh, left lower leg and left foot
9     Flex right leg      IN                           Right thigh, right lower leg and right foot

Our approach is to first build a single three-dimensional model of the human standing, and then to incrementally refine and decompose it by extracting the three-dimensional models of the different parts as they become visible to the different views. At each stage of the algorithm, the image data from a specific movement are processed. The stages of the algorithm are summarized in Table 1.

5.1. Intersection Algorithm

For each movement, the apparent contours from each active view are fitted using the techniques described in the previous section. Therefore, for each active view there is a set of two-dimensional deformable models that fit the corresponding image data. Due to occlusion, the number of deformable models in each of the views may not be the same. Consequently, depending on the type of movement, a new part or parts may be detected in some of the two-dimensional apparent contours. The new part is either a previously unseen body part or a subpart of a part whose three-dimensional model has already been recovered. In the first case, the two-dimensional models of the corresponding ap-
parent contours of a part are integrated using the algorithm described in Section 5.2 (see also Fig. 12(a)). In the second case, the model of the apparent contour of the subpart is intersected with the model for the three-dimensional shape of the part to obtain two new three-dimensional shapes, the three-dimensional shape of the subpart and the three-dimensional shape of the rest of the part (Fig. 12(b–e)).

5.2. Integration Algorithm
In the following, the algorithm for the integration of the two-dimensional models of two apparent contours is described. The input to the algorithm is the twodimensional models of the apparent contours of the part as observed from two mutually orthogonal views, and the spatial relation between the views. A threedimensional deformable model is initialized and the nodes that lie on its meridians whose plane is parallel to the planes of the apparent contours are fitted to the nodes of the two-dimensional models. Local deformations are employed to capture the exact shape of the two-dimensional models. The rest of the threedimensional shape is interpolated. Due to the fact that the local deformations may be large in magnitude and to avoid shape discontinuities between the fitted nodes and the rest of the shape, a thin-plate deformation energy is imposed during the fitting process. Therefore, as shown in Fig. 12(a), the deformable model fits
accurately the two-dimensional models, while its shape in between the apparent contours (in the absence of any image data) varies smoothly. It should be noted that the three-dimensional shape of the new part is obtained at the end of the appropriate set of movements. This is a desirable solution because the shape fitting is done in 2D continuously and the three-dimensional shape estimation is done only once, making the approach computationally efficient.
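The following sketch illustrates the core geometric idea of the Integration algorithm for two mutually orthogonal views: cross-sections of the part are constrained by the widths of the two apparent contours, and the surface in between varies smoothly. The elliptical cross-section, the slicing scheme, and the omission of the thin-plate regularization and of the deformable-model fitting machinery are simplifications made for this illustration only.

```python
import numpy as np

def integrate_two_views(side_contour, front_contour, n_heights=40, n_around=32):
    """Build a rough 3D surface from two orthogonal apparent contours.

    side_contour, front_contour : (m, 2) point arrays (width axis, height axis) of the
        2D models of the same part as seen from the side and front cameras.
    Returns an array of shape (n_heights, n_around, 3) of surface points.
    """
    z_min = max(side_contour[:, 1].min(), front_contour[:, 1].min())
    z_max = min(side_contour[:, 1].max(), front_contour[:, 1].max())
    heights = np.linspace(z_min, z_max, n_heights)
    angles = np.linspace(0.0, 2.0 * np.pi, n_around, endpoint=False)

    def half_width(contour, z, band=None):
        # Half of the silhouette's horizontal extent near height z.
        band = band or 0.02 * (z_max - z_min)
        near = contour[np.abs(contour[:, 1] - z) < band]
        return 0.5 * (near[:, 0].max() - near[:, 0].min()) if len(near) else 0.0

    surface = np.zeros((n_heights, n_around, 3))
    for i, z in enumerate(heights):
        rx = half_width(side_contour, z)    # extent seen by the side camera
        ry = half_width(front_contour, z)   # extent seen by the front camera
        # Each slice is an ellipse that matches both silhouettes at this height,
        # so the interpolated surface agrees with the two contours and is smooth.
        surface[i, :, 0] = rx * np.cos(angles)
        surface[i, :, 1] = ry * np.sin(angles)
        surface[i, :, 2] = z
    return surface
```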
6. Experimental Results
This section presents selected experiments from the ones that were carried out on real image sequences in order to evaluate the effectiveness of the proposed approach in general and of the specific algorithms in particular. In all the experiments, the region of interest has been obtained by subtracting the current image from the known background, and the apparent contours have been obtained by applying a variation of the Canny edge detector to the input image sequence (Canny, 1986). Edge following is employed to detect the presence of a closed contour within the apparent contour, which we referred to as a hole within the apparent contour. The values of the a priori constants for the HBPDA are the following: K_A = 0.5 min(a_1, a_2) (where a_1 and a_2 are the parameters that define the superellipsoid size in two orthogonal directions in a plane—see Section 2.1), K_{C1} = 15 deg, K_{C2} = 0.2 ψ(t_init), K_{D1} = 10 pixels and K_{D2} = 15 deg. These values were experimentally determined and remained the same for all the experiments. The local deformation stiffness parameters of the initial model were set to w_0 = 0.1 and w_1 = 0.2, and the time step for the Euler method was 10^{-6} s.
6.1. Experiment I: Parametric Composition and Part Decomposition D
This experiment was designed to evaluate the performance of the Parametric Composition Invocation Criterion, which is described in Section 3.2, and the performance of the Part Decomposition D algorithm, which is described in Section 3.3. In this experiment, the subject, starting from the first reference position, lifted her left arm until the arm reached the horizontal position. Figures 13(a)–(e) show five frames from the image sequence. Figure 13(ra) shows the fitted model
to the first frame using global and local deformations. Figures 13(rb) and (rc-1) show the results of fitting the apparent contours in Figs. 13(b) and (c), respectively. In Fig. 13(rc-1), notice the deformation from the initial shape of the model, which is quantified in Fig. 13(rc-2). In this frame, the Parametric Composition Invocation Criterion was satisfied and therefore the Parametric Composition algorithm was invoked. The composed model, depicted in Fig. 13(rc-3), is fitted to the data in Fig. 13(c), and its defining primitives are shown in Fig. 13(rc-4). Figure 13(rd-1) shows the fitting result (composed model) for the data in Fig. 13(d). In this frame, the Part Decomposition Criterion D was satisfied and the corresponding algorithm was invoked in order to recover the underlying parts. Figure 13(rd-2) shows the recovered models for the parts, while Fig. 13(re) shows the fitting results of these recovered models to the data in Fig. 13(e). Notice that there are multiple superellipsoids (each with a different length) that can fit the data. Since we are observing the relative motion of connected parts, the algorithm automatically adjusts the length of the superellipsoids so that the estimated instant rotation center lies inside them. In our implementation, the length of the fitted superellipsoid is adjusted to be 5% longer than the maximum distance (measured along the longest axis of the superellipsoid) between the instant rotation center and the visible pole of the superellipsoid.
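The 5% length-adjustment rule quoted above can be written directly as a small helper; the argument names and the interpretation of the returned value as the superellipsoid length along its longest axis are assumptions made for the sketch.

```python
import numpy as np

def adjusted_superellipsoid_length(rotation_center, visible_pole,
                                   axis_direction, margin=0.05):
    """Length of the fitted superellipsoid along its longest axis:
    5% longer than the distance between the instant rotation center and
    the visible pole, measured along that axis (the rule quoted above)."""
    axis = np.asarray(axis_direction, dtype=float)
    axis = axis / np.linalg.norm(axis)
    d = abs(np.dot(np.asarray(visible_pole, dtype=float)
                   - np.asarray(rotation_center, dtype=float), axis))
    return (1.0 + margin) * d
```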
6.2. Experiment II: Part Decomposition B
The second experiment was designed to demonstrate the Part Decomposition Criterion B and the Part Decomposition B algorithm, which are described in Section 3.4. When parts of an articulated assembly are totally occluded at the initial position, a hole might evolve within their apparent contour as the parts move relative to each other and become gradually unoccluded. In this experiment, the subject, starting from the first reference position, extended her left leg to the front until she reached a comfortable position. Figures 14(a)–(f) show six frames from the image sequence. Figures 14(ra)–(rd) depict the models fitted to the data in Figs. 14(a)–(d). While fitting the data in Fig. 14(e), the Part Decomposition Criterion B was satisfied and therefore the corresponding part decomposition algorithm was invoked. The two models recovered are depicted in Fig. 14(re), while Fig. 14(rf) shows the results of fitting these models to the data in Fig. 14(f).
Figure 13. Two-dimensional part segmentation, shape and motion estimation of a human arm.

6.3. Experiment III: HBPDA
The purpose of this experiment was to demonstrate the results of the Human Body Part Decomposition Algorithm when applied to the image data from observing a subject moving according to the protocol MovA (Section 3). Figures 15(a) and (b) show two frames from the movement where the subject tilted her head back. Initially, when the subject was standing still at the first reference position, only one deformable model was used to fit the data (Fig. 15(c)). However, at the end of the head movement the models for the head and the rest of the body were recovered as depicted in
Figs. 15(d) and (e). As the subject lifted her left arm, the shape of the left upper body extremities was estimated (Fig. 15(f)). Since the arm is kept straight during this movement, the algorithm cannot identify the hand, the forearm and the upper arm as distinct body parts. But when the subject flexed her wrist and bent her elbow, the models for these parts were estimated, as shown in Figs. 15(g) and (h). If, instead of lifting her arm, the subject lifts her leg, then the model for the left lower body extremities is recovered, as shown in Fig. 15(m). When the subject flexed the foot (Figs. 15(i) and (j)) and bent the knee (Figs. 15(k) and (l)), this model was replaced by three models: one for the left foot, one for
the left lower leg and one for the left thigh, as shown in Figs. 15(n)–(q). Figure 15(r) depicts these models when the subject returned to the first reference position.

Figure 14. Two-dimensional part segmentation, shape and motion estimation of a human leg.
6.4. Experiment IV: Three-Dimensional Human Body Model Acquisition
The final experiment was designed to demonstrate the feasibility of estimating the three-dimensional shape of body parts by selectively integrating the two-dimensional apparent contours (segmented by the HBPDA) from multiple views. The subject performed the movements specified by protocol MovB. First, the two-dimensional models for the apparent contours at the side and front views were integrated to obtain a single three-dimensional model of the subject's body. The position of the origin of the three-dimensional deformable model is determined based on information provided by the calibration of the three cameras. Any errors in the calibration of the cameras will result in errors in the three-dimensional shape of the part. During the movement of the left arm, when the arm reached the horizontal position, the Integration Algorithm (Section 5.2) was invoked to estimate the shape
of the left upper body extremities. The inputs to this algorithm were the two-dimensional models of the segmented apparent contours from the side and top views. Figures 16(a)–(d) show several views of the recovered left arm of the subject, while Figs. 16(e)–(g) show the models for the hand, forearm and upper arm, respectively. Figure 16(h) shows the spatial layout of these parts. Figures 16(i)–(k) show three views of the recovered model for the torso. Finally, Figs. 16(l)–(q) show several views and the corresponding parts of the left leg.
6.5. Evaluation of the Algorithm
In the following, we evaluate the performance of our algorithm in estimating the shape of body parts. In this work, we assume that the subject wears tight clothes, that she moves against a stationary background, and that changes in the lighting do not affect the topology of the figure extracted from the background. Indeed, if the subject does not wear tight clothes, the results only approximate the true body dimensions. Concerning the figure-background segmentation, one alternative is to employ the background subtraction algorithm proposed by Russell et al. (1995).
Figure 15. Two-dimensional part segmentation, shape and motion estimation (subject Julie).
Figure 16. Three-dimensional human body shape estimation (subject Julie).
Table 2. Select information for some of the subjects.

Name      Sex   Age (years)   Height         Weight (lb)   Right upper arm        Right forearm          Right hand
                                                            Length/Width (in.)     Length/Width (in.)     Length/Width (in.)
Gaylord   M     34            5 ft 8 in.     150           11.25 / 10.50          13.00 / 11.50          6.50 / 3.25
Ioannis   M     30            5 ft 11 in.    175           10.50 / 10.50          11.00 / 11.50          6.50 / 3.50
Julie     F     28            5 ft 11 in.    130           10.50 / 10.00          12.50 / 11.00          7.00 / 3.00
Stasi     M     5             3 ft 6 in.     50            8.50 / 7.50            7.50 / 7.50            5.00 / 2.50
The system is initialized by acquiring 40 images of the static background scene, and the mean and variance of the red, green, blue and luminance values of each pixel location are computed. After initialization, each pixel can be classified as part of the background based on four conditions and two thresholds. As far as changes in the lighting are concerned, our assumption ensures that any changes in the topology of the observed object are due to the existence of multiple parts. The task for which the models are to be used (tracking in our case) imposes the criteria upon which to evaluate the performance of our algorithms.

Subject Invariant. We have performed several experiments with subjects of different age and sex. Table 2 presents select information for the subjects for whom we present the models of the shape of their body parts. In all cases we were able to recover a shape model of the requested body parts (Fig. 17).
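A minimal sketch of the per-pixel background model described above is given below; it keeps a mean and standard deviation per pixel and channel over the initialization frames, but replaces the four conditions and two thresholds of Russell et al. (1995), which are not reproduced here, with a single z-score test. The class name and threshold are assumptions.

```python
import numpy as np

class BackgroundModel:
    """Per-pixel mean/variance background model in the spirit of the
    initialization described above (e.g., 40 frames of the static scene).
    The classification rule (a per-channel z-test with one threshold) is a
    simplification of the four-condition test of Russell et al. (1995)."""

    def __init__(self, background_frames):
        stack = np.stack(background_frames).astype(np.float32)   # (N, H, W, C)
        self.mean = stack.mean(axis=0)
        self.std = stack.std(axis=0) + 1e-3                       # avoid divide-by-zero

    def foreground_mask(self, frame, z_thresh=3.0):
        z = np.abs(frame.astype(np.float32) - self.mean) / self.std
        return z.max(axis=-1) > z_thresh                          # True where pixel differs
```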
Figure 17. Three-dimensional models of the right upper arm and forearm: (a, b) subject Gaylord and (c, d) subject Ioannis.
Accuracy. In the following, we provide a statistical analysis of the errors in estimating the three-dimensional shape models of a subject's body parts. Our goal is to compare the mesh recovered by integrating the deformable models of the apparent contours from multiple views of a subject's arm with the mesh obtained through a range scanner. The experimental protocol is the following. First, we created a plaster cast of the right lower arm and hand of the subject Ioannis. Next, the plaster cast was scanned using the Cyberware® scanner and the resulting meshes were integrated using the methods presented in (Pito, 1996) to form a complete and accurate three-dimensional mesh (mesh A) of the arm (Fig. 19(a)). In addition, we obtained two views of the plaster cast, which was suspended by strings from the ceiling of our 3D studio (Kakadiaris and Metaxas, 1996), using the top and front cameras of the 3D studio (Figs. 18(a) and (b)). Using the techniques
Figure 18. Top (a) and front (b) view of a plaster model of the right arm of the subject Ioannis. Front (c) and back (d) view of the model obtained by scanning the plaster model using a Cyberware® scanner.
Figure 19. Mesh model of the plaster cast of Ioannis' arm (a) as acquired by the Cyberware scanner (mesh A), and (b) as reconstructed by integrating the apparent contours from two views (mesh B).

Figure 20. (a) Histogram of the distance error between meshes A and B, and (b) histogram of the discrepancy in their normals.
described in Section 5, we obtained a mesh model (mesh B) for the plaster cast (Fig. 19(b)). To compare the two meshes, we first aligned them using the ICP algorithm (Besl and McKay, 1992). Then, for each node in mesh A, we computed the nearest node in mesh B and reported the difference in the positions of the two nodes as the distance error of mesh B. The statistics of the distance error are the following: min error = 0.001 mm, max error = 6.736 mm, mean = 1.459 mm and std = 1.170 mm. Figure 20(a) depicts a histogram of the distance error between meshes A and B. In addition, for each node in mesh A we computed the angle of its normal with respect to the normal of the nearest node in mesh B. Figure 20(b) depicts a histogram of the discrepancy in the normals between the two meshes.
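The error measures of this paragraph can be reproduced with a few lines of code; the sketch below assumes the two meshes are given as node and normal arrays and are already aligned (e.g., by an ICP implementation), and uses a k-d tree for the nearest-node search. It is an illustration of the evaluation procedure, not the authors' code.

```python
import numpy as np
from scipy.spatial import cKDTree

def mesh_discrepancy(nodes_a, normals_a, nodes_b, normals_b):
    """For each node of mesh A, find the nearest node of mesh B (meshes are
    assumed to be already aligned) and report the distance error and the
    angle between the corresponding normals, as in Section 6.5."""
    tree = cKDTree(nodes_b)
    dist, idx = tree.query(nodes_a)                       # nearest-node distances
    na = normals_a / np.linalg.norm(normals_a, axis=1, keepdims=True)
    nb = normals_b[idx] / np.linalg.norm(normals_b[idx], axis=1, keepdims=True)
    cos = np.clip(np.sum(na * nb, axis=1), -1.0, 1.0)
    angles = np.degrees(np.arccos(cos))                   # normal discrepancy (deg)
    stats = dict(min=dist.min(), max=dist.max(), mean=dist.mean(), std=dist.std())
    return dist, angles, stats
```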
However, the accuracy should be considered with respect to the task at hand (tracking in our case). Indeed, using the models recovered with the techniques described in this paper, we have successfully tracked body parts moving in 3D (Kakadiaris and Metaxas, 1996). The experiments that were carried out have shown that it is possible to determine a subject's body parts from an image sequence of the subject performing a set of controlled movements. Moreover, it is possible to estimate their shape in 3D. The three-dimensional shape estimation is accurate up to the integration of information from the three views. Therefore, if one wishes to obtain accurate anthropometric data, the subject should wear tight clothes.
7. Conclusions
This paper presented a novel, integrated approach to the three-dimensional shape model acquisition of a subject’s body parts from multiple views. First, we have related the deformation of the silhouettes of a subject that performs movements in a plane
perpendicular to the viewing geometry, to the part structure of the human body. We have shown how a spatiotemporal parameterization of the deformations of the apparent contours leads to the hypothesis of multiple parts and how this hypothesis is verified. Second, we have developed a Human Body Part Decomposition Algorithm that recovers all the body parts of a subject by monitoring the deformations of the shape of the initial model of the apparent contour. Third, we have formalized the algorithm related to this approach using the Supervisory Control Theory of Discrete Event Dynamic Systems, which allowed for a modular and hierarchical design and provided the means for controlling the process of fitting and part determination. Finally, we have presented a simple, computationally efficient method for selectively integrating sets of two-dimensional deformable models recovered by segmenting two-dimensional apparent contours obtained during a controlled experiment. This has allowed us to obtain the three-dimensional shape of a subject's body parts. Such a model can be used in many applications, including model-based human tracking, which is becoming increasingly popular in vision (Gavrila and Davis, 1996; Kakadiaris and Metaxas, 1996).

Acknowledgments

We would like to thank Dr. Richard Pito for providing a scanned model of the arm. This work is primarily supported by NSF grant MIP94-20397. Additional support comes from: NSF grants DMI95-12402 and SBR89-20230, ARO grant DAAH04-96-1-0007, and DARPA grant N00014-92-J-1647.

Appendices

A. Parametric Composition of Geometric Primitives
As explained in Section 2.2, the shape $\mathbf{x}$ of the composed primitive ($C : [v_b, v_e) \rightarrow \mathbb{R}^2$) can be defined in terms of the parameters of the defining primitives $\mathbf{x}_0$ and $\mathbf{x}_1$ as follows:

$$
\mathbf{x}\left(v; v_0^A, v_0^B, c\right) = \left(1 - \delta\left(v; h_0^{-1}\left(v_0^A\right), h_0^{-1}\left(v_0^B\right), c\right)\right) \mathbf{x}_0(h_0(v)) + \delta\left(v; h_0^{-1}\left(v_0^A\right), h_0^{-1}\left(v_0^B\right), c\right) \mathbf{x}_1(h_1(h_0(v))).
$$

The piecewise linear function $h_0 : [v_b, v_e) \rightarrow [v_{0b}, v_{0e})$ maps the material coordinate $v$ of the composed deformable model to the material coordinate $v_0$ of the root primitive. The piecewise linear function $h_1(x) : [v_{0b}, v_{0e}) \rightarrow [v_{1b}, v_{1e})$ maps the material coordinate $v_0$ to the material coordinate $v_1$ of the intersecting primitive. For the definition of $h_1(x)$, four cases have to be distinguished. Let $I_0$ be the curve segment of $\mathbf{x}_0$ which lies in the interior of the union of $\mathbf{x}_0$ and $\mathbf{x}_1$ (Fig. A.1(b)), and $J_0$ be the curve segment of $\mathbf{x}_0$ which belongs to the boundary of their union (Fig. A.1(c)). We define $I_1$ and $J_1$ in a similar way (Fig. A.1(e,f)). Intuitively, in composing two primitives to represent the boundary of their union, we want to map $I_0$ to $J_1$ and $J_0$ to $I_1$. However, depending on the position of the point $C_0(v_{0b})$ (whether or not it belongs to $I_0$), the curve $I_0$ can be the map of either a single continuous interval or of a union of continuous intervals. For example, a superellipsoid $\mathbf{x}_0(v_0)$ is defined by the mapping $C_0 : [-\pi, \pi) \rightarrow \mathbb{R}^2$, as depicted in Fig. A.1(a). When a superellipsoid $\mathbf{x}_1$ intersects $\mathbf{x}_0$, the point $C_0(-\pi)$ either belongs to $I_0$ (Fig. A.1(i,k)) or not (Fig. A.1(h,j)). In the first case, $I_0$ is the map of the union of two continuous intervals, $I_0 = \{\mathbf{x}_0(v_0) : v_0 \in (v_0^B, \pi) \cup [-\pi, v_0^A]\}$. In the second case, $I_0$ is the map of a single interval, $I_0 = \{\mathbf{x}_0(v_0) : v_0 \in (v_0^A, v_0^B)\}$. This distinction arises from the fact that the interval $[-\pi, \pi)$ and the closed curve $\mathbf{x}_0$ are not homeomorphic.⁸ Therefore, depending on whether the curve segments $I_0$, $J_0$, $I_1$ and $J_1$ are maps of a single continuous interval or of a union of two continuous intervals, we can distinguish four cases for the definition of the function $h_1(x)$. This function maps $I_0$ to $J_1$ and $J_0$ to $I_1$. In particular, we can distinguish the following cases:

Case 1. For the first case (Fig. A.1(h)), the following relations hold:

$$
I_0 = \{C_0(v_0) : v_0 \in (v_0^A, v_0^B)\}, \quad
J_0 = \{C_0(v_0) : v_0 \in [v_0^B, \pi) \cup [-\pi, v_0^A]\}, \quad
I_1 = \{C_1(v_1) : v_1 \in [v_1^B, \pi) \cup [-\pi, v_1^A]\}, \quad \text{and} \quad
J_1 = \{C_1(v_1) : v_1 \in (v_1^A, v_1^B)\}.
$$

If $C_0(-\pi) \in J_0$ and $C_1(-\pi) \in I_1$, then

$$
h_1(x) = \begin{cases} f_1(x) & x \in (v_0^A, v_0^B) \\ f_4(x) & x \in [v_0^B, \pi) \cup [-\pi, v_0^A]. \end{cases}
$$

The function $f_1 : (v_0^A, v_0^B) \rightarrow (v_1^A, v_1^B)$ maps $I_0$ to $J_1$, and the function $f_4 : [v_0^B, \pi) \cup [-\pi, v_0^A] \rightarrow [v_1^B, \pi) \cup [-\pi, v_1^A]$ maps $J_0$ to $I_1$.
Fig. A.1. (a–c) Notation pertaining to $\mathbf{x}_0$, (d–f) notation pertaining to $\mathbf{x}_1$, (g) the interval $[-\pi, \pi)$ and a closed curve are not topologically equivalent, and (h–k) possible positions of the points $C_0(-\pi)$ and $C_1(-\pi)$.
Case 2. For the second case (Fig. A.1(i)), the following relations hold:

$$
I_0 = \{C_0(v_0) : v_0 \in (v_0^B, \pi) \cup [-\pi, v_0^A)\}, \quad
J_0 = \{C_0(v_0) : v_0 \in [v_0^A, v_0^B]\}, \quad
I_1 = \{C_1(v_1) : v_1 \in [v_1^A, \pi) \cup [-\pi, v_1^B]\}, \quad \text{and} \quad
J_1 = \{C_1(v_1) : v_1 \in (v_1^B, v_1^A)\}.
$$

If $C_0(-\pi) \in I_0$ and $C_1(-\pi) \in I_1$, then

$$
h_1(x) = \begin{cases} f_2(x) & x \in (v_0^B, \pi) \cup [-\pi, v_0^A) \\ f_3(x) & x \in [v_0^A, v_0^B]. \end{cases}
$$

The function $f_2 : (v_0^B, \pi) \cup [-\pi, v_0^A) \rightarrow (v_1^B, v_1^A)$ maps $I_0$ to $J_1$, and the function $f_3 : [v_0^A, v_0^B] \rightarrow [v_1^A, \pi) \cup [-\pi, v_1^B]$ maps $J_0$ to $I_1$.
Case 3. For the third case (Fig. A.1(j)), the following relations hold:

$$
I_0 = \{C_0(v_0) : v_0 \in (v_0^A, v_0^B)\}, \quad
J_0 = \{C_0(v_0) : v_0 \in [v_0^B, \pi) \cup [-\pi, v_0^A]\}, \quad
I_1 = \{C_1(v_1) : v_1 \in [v_1^B, v_1^A]\}, \quad \text{and} \quad
J_1 = \{C_1(v_1) : v_1 \in (v_1^A, \pi) \cup [-\pi, v_1^B)\}.
$$

If $C_0(-\pi) \in J_0$ and $C_1(-\pi) \in J_1$, then

$$
h_1(x) = \begin{cases} f_3(x) & x \in (v_0^A, v_0^B) \\ f_2(x) & x \in [v_0^B, \pi) \cup [-\pi, v_0^A]. \end{cases}
$$

The function $f_3 : (v_0^A, v_0^B) \rightarrow (v_1^A, \pi) \cup [-\pi, v_1^B)$ maps $I_0$ to $J_1$, and the function $f_2 : [v_0^B, \pi) \cup [-\pi, v_0^A] \rightarrow [v_1^B, v_1^A]$ maps $J_0$ to $I_1$.
Case 4. For the fourth case (Fig. A.1(k)), the following relations hold:

$$
I_0 = \{C_0(v_0) : v_0 \in (v_0^B, \pi) \cup [-\pi, v_0^A)\}, \quad
J_0 = \{C_0(v_0) : v_0 \in [v_0^A, v_0^B]\}, \quad
I_1 = \{C_1(v_1) : v_1 \in [v_1^A, v_1^B]\}, \quad \text{and} \quad
J_1 = \{C_1(v_1) : v_1 \in (v_1^B, \pi) \cup [-\pi, v_1^A)\}.
$$

If $C_0(-\pi) \in I_0$ and $C_1(-\pi) \in J_1$, then

$$
h_1(x) = \begin{cases} f_4(x) & x \in (v_0^B, \pi) \cup [-\pi, v_0^A) \\ f_1(x) & x \in [v_0^A, v_0^B]. \end{cases}
$$

The function $f_1 : [v_0^A, v_0^B] \rightarrow [v_1^A, v_1^B]$ maps $J_0$ to $I_1$, and the function $f_4 : (v_0^B, \pi) \cup [-\pi, v_0^A) \rightarrow (v_1^B, \pi) \cup [-\pi, v_1^A)$ maps $I_0$ to $J_1$.

The linear functions that allow mappings between intervals are defined below. Let $A = [a_b, a_e)$, $B = [b_b, b_e)$, $C = [c_b, c_e)$ and $D = [d_b, d_e)$ be four continuous intervals with corresponding lengths $l_A = (a_e - a_b)$, $l_B = (b_e - b_b)$, $l_C = (c_e - c_b)$ and $l_D = (d_e - d_b)$.

• To linearly map $A$ to $C$, we define the function $f_1 : A \rightarrow C$ such that:
$$
f_1(x) = \frac{l_C}{l_A}\, x + \frac{a_e c_b - a_b c_e}{l_A}.
$$

• To linearly map the union $A \cup B$ (assuming that $a_e \neq b_b$) to $C$, we define the function $f_2 : A \cup B \rightarrow C$ such that:
$$
f_2(x) = \begin{cases} c_b + \lambda (x - a_b) & x \in A \\ c_b + \lambda l_A + \lambda (x - b_b) & x \in B \end{cases}
$$
where $\lambda = \frac{l_C}{l_A + l_B}$.

• To linearly map $A$ to the union $B \cup C$ (where $b_e \neq c_b$), we define the function $f_3 : A \rightarrow B \cup C$ such that:
$$
f_3(x) = \begin{cases} b_b + p(x) & p(x) < l_B \\ c_b - l_B + p(x) & p(x) \geq l_B \end{cases}
$$
where $p(x) = \lambda (x - a_b)$ and $\lambda = \frac{l_B + l_C}{l_A}$.

• To linearly map the union $A \cup B$ to the union $C \cup D$, we define the function $f_4 : A \cup B \rightarrow C \cup D$ such that:
$$
f_4(x) = \begin{cases} c_b + p(x) & p(x) < l_C \\ d_b - l_C + p(x) & p(x) \geq l_C \end{cases}
$$
where the function $p(x)$ is defined as
$$
p(x) = \begin{cases} \lambda (x - a_b) & x \in A \\ \lambda (x + l_A - b_b) & x \in B \end{cases}
$$
and $\lambda = \frac{l_C + l_D}{l_A + l_B}$.
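The four interval mappings above translate directly into code. The sketch below is a plain transcription of $f_1$–$f_4$ (intervals passed as endpoint pairs), together with a hypothetical helper showing how $h_1$ of Case 1 is assembled from $f_1$ and $f_4$; the wrap-around of the angular intervals at $\pm\pi$ is handled naively, which is a simplification.

```python
import math

def f1(x, A, C):
    """Linearly map the interval A = [ab, ae) onto C = [cb, ce)."""
    (ab, ae), (cb, ce) = A, C
    lA, lC = ae - ab, ce - cb
    return cb + (lC / lA) * (x - ab)      # equals (lC/lA)*x + (ae*cb - ab*ce)/lA

def f2(x, A, B, C):
    """Linearly map the union A u B onto C (assuming ae != bb)."""
    (ab, ae), (bb, be), (cb, ce) = A, B, C
    lA, lB, lC = ae - ab, be - bb, ce - cb
    lam = lC / (lA + lB)
    return cb + lam * (x - ab) if ab <= x < ae else cb + lam * lA + lam * (x - bb)

def f3(x, A, B, C):
    """Linearly map A onto the union B u C (where be != cb)."""
    (ab, ae), (bb, be), (cb, ce) = A, B, C
    lA, lB, lC = ae - ab, be - bb, ce - cb
    p = ((lB + lC) / lA) * (x - ab)
    return bb + p if p < lB else cb - lB + p

def f4(x, A, B, C, D):
    """Linearly map the union A u B onto the union C u D."""
    (ab, ae), (bb, be), (cb, ce), (db, de) = A, B, C, D
    lA, lB, lC, lD = ae - ab, be - bb, ce - cb, de - db
    lam = (lC + lD) / (lA + lB)
    p = lam * (x - ab) if ab <= x < ae else lam * (x + lA - bb)
    return cb + p if p < lC else db - lC + p

def h1_case1(x, v0A, v0B, v1A, v1B):
    """Hypothetical assembly of h1 for Case 1: f1 on (v0A, v0B) and f4 on
    [v0B, pi) u [-pi, v0A]; the two halves of each wrapped interval are
    passed to f4 as plain sub-intervals."""
    if v0A < x < v0B:
        return f1(x, (v0A, v0B), (v1A, v1B))
    return f4(x, (v0B, math.pi), (-math.pi, v0A),
              (v1B, math.pi), (-math.pi, v1A))
```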
B. HBPDA: Modeling the Participating Processes

Each participating process of the system is modeled as a nondeterministic finite state machine. The states of this finite state machine correspond to a particular discretization of the evolution of the task over time. Transitions between these states are caused by discrete events. All events from observations, computations and communications are treated in a uniform way. The set of events is the union of (a) the uncontrollable events, $\Sigma_u$ (which can be observed but cannot be prevented from occurring, such as the arrival of new image data), (b) the controllable events, $\Sigma_c$ (which can be prevented from occurring, such as enabling local deformations, or which can be induced by the supervisor), and (c) the tick of a global clock event. Some of these events can be expressed in the form (guard → operation), where guard is a boolean-valued expression. If the value of the guard is true, the operation (e.g., send/receive message) is executed. The rest of the events are expressed in the form (guard), where the value of guard controls the transition between states. In this system, operations can have two forms depending on whether a message or command is being sent or received. $p_i ? q_i$ is a communication operation in which a process receives a message or command $q_i$ from channel $p_i$. $p_i\,!\,q_i$ is also a communication operation and denotes the sending of a message or command $q_i$ through channel $p_i$. While messages convey information related to the state of the system (e.g., values of variables), commands are executed when received. In the following, we examine in detail how each of the three components of the system is designed. Formally, all events which a component can produce or receive are modeled by the symbols of a finite alphabet $\Sigma$. The behavior of the component is characterized in terms of strings over this alphabet. Let $\Sigma^*$ represent all finite strings over $\Sigma$ and let the subset $L \subseteq \Sigma^*$ represent all event trajectories which are physically possible for this component. When the language $L$ is regular, there exists some finite automaton $G$ such that $L$ is generated/accepted by $G$. This automaton $G$ is a 5-tuple (Ramadge and Wonham, 1989):
$G = (S, \Sigma, \gamma, s_0, S_e)$, where $S$ is the set of states, $\Sigma$ is the set of events, $\gamma : \Sigma \times S \rightarrow S$ is the transition function, $s_0$ is the initial state, and $S_e \subset S$ is the set of states that denote the completion of the task (e.g., completion of fitting). In the implementation of HBPDA, for each of the deformable models used at every time instant to fit the image data, there is a corresponding fitting process whose design is shown in Fig. B.1. These processes, which constitute the plant, run in parallel and are synchronized by a global clock (i.e., all of them have to complete an iteration before each one of them proceeds to the next iteration). In addition, they receive commands from the supervisor and send messages back. The states of each fitting process correspond to a discretization of the fitting task based on which degrees of freedom of a model are used (e.g., global and/or local parameters). The transitions between the states are caused by events which represent quantitative changes to the deformable model's parameters.
In particular,⁹ $S = \{0, 1, 2, 3\}$, $\Sigma_c = \{\sigma_0, \sigma_1, \sigma_2, \sigma_3\}$, $\Sigma_u = \{\mathrm{dinit}\}$, $s_0 = \{0\}$, $S_e = \{0\}$. The fitting process in states 2 and 3 is governed by the Lagrange equations of motion (15). Although in both states the governing differential equation is the same, the generalized coordinates that are fitted in each case are different. In particular, in state 2 the generalized coordinates are the translation, rotation and global parameters of the deformable model, $\mathbf{q}_i = (\mathbf{q}_{t_i}^T, \mathbf{q}_{\theta_i}^T, \mathbf{q}_{s_i}^T)^T$, while in state 3 the generalized coordinates include in addition the local parameters ($\mathbf{q}_i = (\mathbf{q}_{t_i}^T, \mathbf{q}_{\theta_i}^T, \mathbf{q}_{s_i}^T, \mathbf{q}_{d_i}^T)^T$). For each fitting process there is an observer process (Fig. B.2) that monitors the parameters of the deformable model in order to infer the state of the deformable model.
Messages
st_j (j ∈ {1, 2, 3, 4}): Report the state identifier j to the supervisor.
nd: Deformable model i is to be fitted to the new image data.
init: Initialize a deformable model i.
dinit: Initialization of the deformable model has been completed.

Event Descriptions
σ0: Significant change (> $10^{-4}$) to the parameters of global deformations.
σ1: Non-significant change to the parameters of global deformations.
σ2: Significant change to the error of fit.
σ3: Non-significant change to the error of fit.

State Descriptions
0: Wait state.
1: Initialization of a deformable model i.
2: Fit the deformable model i to the image data using global deformations only.
3: Fit the deformable model i to the image data using global and local deformations.

Fig. B.1. FSM model of the process fitting a deformable model i to the image data.
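A small sketch of the automaton formalism is given below: a generic class for $G = (S, \Sigma, \gamma, s_0, S_e)$, instantiated with the state and event labels of the fitting process of Fig. B.1. Since the figure itself (and hence the actual transition arrows) is not reproduced in the text, the transition table in the example is only a plausible guess and is marked as such.

```python
class Automaton:
    """Finite automaton G = (S, Sigma, gamma, s0, Se) as in Appendix B."""

    def __init__(self, states, events, transitions, initial, marker_states):
        self.states = states                  # S
        self.events = events                  # Sigma
        self.transitions = transitions        # gamma: dict (state, event) -> state
        self.state = initial                  # s0
        self.marker_states = marker_states    # Se (task completed)

    def step(self, event):
        key = (self.state, event)
        if key not in self.transitions:
            raise ValueError(f"event {event!r} not enabled in state {self.state}")
        self.state = self.transitions[key]
        return self.state

    def task_completed(self):
        return self.state in self.marker_states

# Instantiation with the labels of the fitting process of Fig. B.1.
# CAUTION: the transition arrows below are a plausible guess; only S,
# Sigma_c, Sigma_u, s0 and Se are taken from Appendix B.
fitting_process = Automaton(
    states={0, 1, 2, 3},
    events={"init", "dinit", "sigma0", "sigma1", "sigma2", "sigma3", "nd"},
    transitions={
        (0, "init"): 1,    # start initializing deformable model i
        (1, "dinit"): 2,   # initialization done -> fit with global deformations
        (2, "sigma0"): 2,  # significant change of global parameters: keep iterating
        (2, "sigma1"): 3,  # global parameters settled -> add local deformations
        (3, "sigma2"): 3,  # error of fit still changing significantly
        (3, "sigma3"): 0,  # fit converged -> back to the wait state
        (0, "nd"): 2,      # new image data: resume fitting
    },
    initial=0,
    marker_states={0},
)
```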
Event Descriptions
σ0: No hole is evolving within the apparent contour.
σ1: A hole is evolving within the apparent contour.
σ2: The relation $\vec{p}_i(v, t) - \vec{p}_i(v, t_{init}) \leq K_A$ holds for the shape of the deformable model i.
σ3: The relation $\vec{p}_i(v, t) - \vec{p}_i(v, t_{init}) > K_A$ holds for the shape of the deformable model i.
σ4: The relation $(\theta_b(t) - \theta_b(t_{init})) > K_{C_1}$ holds for the bending parameters of the model.
σ5: The relation $(\theta_b(t) - \theta_b(t_{init})) \leq K_{C_1}$ holds for the bending parameters of the model.
σ6: The relation $(\psi(t) - \psi(t_{init})) > K_{C_2}$ holds for the deformable model i.
σ7: The relation $(\psi(t) - \psi(t_{init})) \leq K_{C_2}$ holds for the deformable model i.
σ8: The deformable model i is a composed deformable model.
σ9: The deformable model i is not a composed deformable model.

State Descriptions
0: In this state the following conditions are true: (a) there is no hole evolving within the apparent contour of the deformable model, and (b) the Parametric Composition Invocation Criterion and the Part Decomposition Criteria B, C and D are not satisfied.
1: The Parametric Composition Invocation Criterion is satisfied.
2: Part Decomposition Criterion B is satisfied.
3: Part Decomposition Criterion C is satisfied.
4: The deformable model is a composed deformable model and the Part Decomposition Criterion D is satisfied.

Fig. B.2. FSM model of the observer for the fitting process i.
Event Descriptions
σ0: $(O_0 ? st_1) \vee (O_1 ? st_1) \vee \cdots \vee (O_n ? st_1)$.
σ1: Parametric Composition algorithm has been completed.
σ2: $(O_0 ? st_2) \vee (O_1 ? st_2) \vee \cdots \vee (O_n ? st_2)$.
σ3: Part Decomposition B algorithm has been completed.
σ4: $(O_0 ? st_3) \vee (O_1 ? st_3) \vee \cdots \vee (O_n ? st_3)$.
σ5: Part Decomposition C algorithm has been completed.
σ6: $(O_0 ? st_4) \vee (O_1 ? st_4) \vee \cdots \vee (O_n ? st_4)$.
σ7: Part Decomposition D algorithm has been completed.
σ8: New image data have arrived.
σ9: Messages have been sent to all the participating fitting processes and bookkeeping has been completed.

State Descriptions
0: Wait state.
1: Execute the Parametric Composition algorithm with input the deformable model i for which the corresponding observer has sent the message st_1.
2: Execute the Part Decomposition B algorithm with input the deformable model i for which the corresponding observer has sent the message st_2.
3: Execute the Part Decomposition C algorithm with input the deformable model i for which the corresponding observer has sent the message st_3.
4: Execute the Part Decomposition D algorithm with input the deformable model i for which the corresponding observer has sent the message st_4.
5: Perform bookkeeping associated with the arrival of new image data. Send a command to all participating processes that new image data have arrived ($C_i ? nd$).

Fig. B.3. FSM model of the supervisor.
The quantitative information extracted from the time-varying parameters of the deformable model is quantized in terms of the events $\Sigma = \{\sigma_0, \sigma_1, \sigma_2, \sigma_3, \sigma_4, \sigma_5, \sigma_6, \sigma_7, \sigma_8, \sigma_9\}$, which are explained in Fig. B.2. Our goal is to be able to drive a deformable model from any state¹⁰ in $S = \{0, 1, 2, 3, 4, 5\}$ back to state 0. This is the desirable state because in that state there is no significant change from the initially fitted shape and there are no evolving holes, which for our problem would imply the existence of additional parts. If all the deformable models fitted to the time-varying data are in state 0, then the structure of the image data has been captured. To drive each deformable model to the desirable state 0, a supervisor is designed. The function of the supervisor is to select the appropriate action depending on the state of the observer processes (see Fig. B.3) and to enable, disable or force controllable events in order to ensure the correct behavior of the plant. Therefore, the supervisor provides feedback control.

Notes
1. Note that the path the joint center follows during a movement depends on the task and the anthropometric dimensions of the individual.
2. In the following, we will refer to the person under observation as the subject.
3. The protocol is the same across humans with different anthropometric dimensions.
4. Parts of this paper have appeared previously in (Kakadiaris and Metaxas, 1995).
5. Details about the type of finite elements and shape functions used are provided in (Kakadiaris, 1997).
6. Sagittal plane: Any plane that is parallel to the plane bisecting the body into a left and right half.
7. This criterion has been presented in (Kakadiaris et al., 1997) and it is included in this paper for reasons of completeness.
8. A set S is topologically equivalent or homeomorphic to a set T iff there is a 1-1 bi-continuous mapping f of S onto T.
9. The event global clock tick has been omitted from all the descriptions.
10. The error state and the events associated with the error state have been omitted for clarity.
References

Akita, K. 1984. Image sequence analysis of real world human motion. Pattern Recognition, 17:73–83.
Azuola, F., Badler, N.I., Ho, P., Kakadiaris, I.A., Metaxas, D., and Ting, B. 1994. Building anthropometry-based virtual human models. In Proceedings of the IMAGE VII Society Conference, Tucson, AZ.
Barr, A.H. 1981. Superquadrics and angle-preserving transformations. IEEE Computer Graphics and Applications, 1(1):11–23.
Barr, A.H. 1984. Global and local deformations of solid primitives. Computer Graphics, 18(3):21–30.
Besl, P. and McKay, N.D. 1992. A method for registration of 3-D shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(2).
Canny, J. 1986. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(6):679–698.
Gavrila, D.M. and Davis, L.S. 1996. 3-D model-based tracking of humans in action: A multi-view approach. In Proceedings of the 1996 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, pp. 73–80, IEEE Computer Society Press: New York, NY.
Goncalves, L., Di Bernardo, E., Ursella, E., and Perona, P. 1995. Monocular tracking of the human arm in 3D. In Proceedings of the Fifth International Conference on Computer Vision, Boston, MA, pp. 764–770.
Hogg, D. 1983. Model-based vision: A program to see a walking person. Image and Vision Computing, 1(1):5–20.
Kakadiaris, I.A. 1997. Motion-based part segmentation, shape and motion estimation of multi-part objects. Ph.D. Dissertation, Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA.
Kakadiaris, I.A. and Metaxas, D. 1995. 3D human body model acquisition from multiple views. In Proceedings of the Fifth International Conference on Computer Vision, Boston, MA, pp. 618–623.
Kakadiaris, I.A. and Metaxas, D. 1996. Model-based estimation of 3D human motion with occlusion based on active multi-viewpoint selection. In Proceedings of the 1996 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, pp. 81–87.
Kakadiaris, I.A., Metaxas, D., and Bajcsy, R. 1994. Active part-decomposition, shape and motion estimation of articulated objects: A physics-based approach. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Seattle, WA, pp. 980–984.
Kakadiaris, I.A., Metaxas, D., and Bajcsy, R. 1997. Inferring 2D object structure from the deformation of apparent contours. Computer Vision and Image Understanding, 65(2):129–147.
Leung, M.K. and Yang, Y.H. 1987a. Human body motion segmentation in a complex scene. Pattern Recognition, 20(1):55–64.
Leung, M.K. and Yang, Y.H. 1987b. A region based approach for human body motion analysis. Pattern Recognition, 20(3):321–339.
Leung, M.K. and Yang, Y.H. 1995. First Sight: A human body outline labeling system. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(4):369–377.
Mann, R., Jepson, A., and Siskind, J.M. 1996. Computational perception of scene dynamics. In Proceedings of the Fourth European Conference on Computer Vision, Cambridge, UK, Bernard Buxton and Roberto Cipolla (Eds.), Lecture Notes in Computer Science, Springer, pp. II:528–539.
Metaxas, D. 1992. Physics-based modeling of nonrigid objects for vision and graphics. Ph.D. Dissertation, Department of Computer Science, University of Toronto.
Metaxas, D. and Terzopoulos, D. 1993. Shape and nonrigid motion estimation through physics-based synthesis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(6):580–591.
NASA. 1978. Anthropometric source book. Volume II: A handbook of anthropometric data. Technical Report NASA Reference Publication 1024, NASA Scientific and Technical Information Office, Johnson Space Center, Houston, TX.
O'Rourke, J. and Badler, N.I. 1980. Model-based image analysis of human motion using constraint propagation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2(6):522–536.
Pito, R.A. 1996. Mesh integration based on co-measurement. In IEEE International Conference on Image Processing, Vienna, Austria, pp. II:397–400.
Prasad, M. 1991. Intersection of line segments. In Graphics Gems II, James Arvo (Ed.), Academic Press.
Press, W.H., Teukolsky, S.A., Vetterling, W.T., and Flannery, B.P. 1992. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press.
Ramadge, P.J. and Wonham, W.M. 1989. The control of discrete event systems. Proceedings of the IEEE, 77(1):81–97.
Rehg, J.M. and Kanade, T. 1994. Visual tracking of high DOF articulated structures: An application to human hand tracking. In Proceedings of the Third European Conference on Computer Vision, Jan-Olof Eklundh (Ed.), Stockholm, Sweden, pp. 35–46.
Rehg, J.M. and Kanade, T. 1995. Model-based tracking of self-occluding articulated objects. In Proceedings of the Fifth International Conference on Computer Vision, Boston, MA, pp. 612–617.
Rohr, K. 1994. Towards model-based recognition of human movements in image sequences. Computer Vision, Graphics, and Image Processing: Image Understanding, 59(1):94–115.
Russell, K., Starner, T., and Pentland, A. 1995. Unencumbered virtual environments. In IJCAI-95 Workshop on Entertainment and AI/Alife, IEEE Computer Society Press.