Soft Computing 4 (2000) 165-185 © Springer-Verlag 2000
A novel soft computing architecture for the control of autonomous walking robots
M. J. Randall, A. G. Pipe
Abstract An integration of concepts from neurobiology, applied psychology, insect physiology and behaviour-based robotics has led us to propose a novel generic systems architecture for the intelligent control of mobile robots and, in particular, autonomous walking machines. (We define what we mean by ``autonomy''.) The control architecture is hierarchical and will be described from a top-down perspective. Level one consists of interpreting a motivation and translating it into high-level commands. Once a high-level command is generated, a range of internal representations or ``cognitive maps'' may be employed at level two to help provide body-centred motion. At level three of the hierarchy, kinematic planning is performed. The fourth level, dynamic compensation, requires feedback from the actuators and compensates for errors in the target vectors provided by the kinematic level and caused by systematic dynamic uncertainties or environmental disturbances. This is implemented using adaptive neural controllers. The interfaces will be described and results from simulation and implementation of levels 2-4 on a hexapod robot will be presented. The hierarchy employs the following soft computing techniques: evolution strategies, cognitive maps, adaptive heuristic critics, temporal difference learning and adaptive neural control using linear-equivalent neural networks.

Key words Autonomous walking, Cognitive agents, Adaptive neural control

1
Introduction
This paper reviews and combines two strongly linked pieces of work conducted by the authors over a five-year period in the Intelligent Autonomous Systems (Engineering) Laboratory of the University of the West of England, Bristol, which have not been published together before. The resulting architecture described here is a cohesive, theoretically and experimentally validated structure that integrates the navigation and control aspects of the authors' research. Soft computing techniques have been employed throughout the architecture because, in simple terms, they are capable of tasks which cannot be achieved by conventional methods as efficiently, if at all. Here, the architecture is described primarily with application to walking machines, but with some results presented for a wheeled mobile robot.

There are numerous applications for climbing and walking robots. These CLAWAR machines are generally very complicated to design, manufacture and control. This extra effort is wasted unless a CLAWAR machine provides a significant advantage over other mobile robots within the target application; for instance, wheeled or tracked vehicles may damage the substrate or may not reach less accessible areas. The goal of this research is to produce a control structure for walking robots that can negotiate extremely rough terrain with sparse footholds. The robot itself has been inspired by the stick insect Carausius morosus, which uses accurate foot placement to move around its unstable environment where few footholds exist.

It was in trying to solve a very specific problem that this generic intelligent planning and control hierarchy was developed. The problem was how to achieve the accurate foot placement required for the kinds of rough terrain envisaged. A layered architecture emerged. The proposed solution relies on an integration of concepts from neurobiology, applied psychology, insect physiology and behaviour-based robotics. It has led to a novel generic systems architecture for the intelligent control of mobile robots and, in particular, autonomous walking machines. In this paper, the control architecture is hierarchical and will be described from a top-down perspective. An earlier description of the architecture was first published in [1]. Before delving into the fine structure of the hierarchy, an overview will be given in Sect. 2, followed by the rationale for choosing a hierarchical structure as opposed to a sliced, co-operative/competitive, or ``Brooksian'' structure (cf. [2]). The architecture has several similarities to other control architectures, particularly of walking machines, and these will also be reviewed briefly. Subsequent sections will describe each layer in detail.
M. J. Randall (corresponding author), A. G. Pipe
Intelligent Autonomous Systems Engineering Laboratory, Faculty of Engineering, University of the West of England, Bristol, Coldharbour Lane, Bristol BS16 1QY, UK
E-mail: [email protected]
Tel.: +44 (117) 344 2705, Fax: +44 (117) 344 2683
2
Architectural overview
The hierarchy was designed on the basis of several observations of movement generation. Two authors [3, 4] identified a high-level purpose, motivation or volition that makes subsequent movements goal-directed. Marr [5] identified three fundamental ways in which an information processing device needs to be understood, whether that device be the brain or an intelligent hexapod robot. They are: computational theory; representation and algorithm; and hardware implementation (cf. [6, 7]). According to Marr, the computational theory defines the goal of the computation, its appropriateness and the logic of the strategy by which it can be carried out. Here, this will be referred to as ``planning'', i.e. the conversion of a high-level goal into an appropriate body trajectory. This body-centred route is then transformed into the trajectories of the individual legs, in co-ordination with each other on the basis of a gait model. The second of Marr's levels, ``representation and algorithm'', forms the core of this paper, and the third level is implemented by the control of the hexapod, in particular the adaptive neural controllers. The control hierarchy is presented in Fig. 1.

2.1
The layers of the architecture
The highest level of the architecture is called ``the motivation level''. It is based on an analysis of the functionality of emotions in humans and other animals. The implementation of this level will provide the agent with a high degree of autonomy (see the section ``What is autonomy and motivation?''). The body motion level has been inspired by concepts of cognitive mapping in robotics and from neurobiology, in particular ``place cells'' in the hippocampal formation, ``head direction cells'' in the postsubiculum and parietal/retrosplenial association cortex, and path integrators postulated in the caudate nucleus [8]. The resulting body motion level may operate in two interacting modes. The non-procedural cognitive mapping mode may acquire, then make use of, a geocentrically oriented spatial map of its environment. The procedural mode may likewise acquire and then use a map, though of an egocentrically oriented topological nature, operating through a behaviour-based mechanism. These two operational modes each have their advantages and disadvantages for agent orienteering, which will be expanded upon in the section entitled ``Body route generation''.

The kinematic level involves two sub-levels comprising gait planning and leg trajectory generation. The gait controller is based on observations of walking in first instar and adult stick insects. Each leg has its own trajectory generator which determines the course of the tarsus (i.e. the foot). The leg trajectory generator is based on a model of cost-minimised motions of human arm trajectories in order to achieve accurate ``foot'' placement. Finally, at the level of each joint, adaptive neural controllers are used to compensate for errors in the planned leg trajectories. Also, by reducing torques in the dimension of motion as a consequence of hitting an obstacle, or by detecting a lower torque than expected as a consequence of a ``hole'', the adaptive neural controller deals with terrain roughness.

The interfaces between levels of the hierarchy are essentially provided by vectors, except in the case of the highest level, which could generate either a vector or a symbolic goal. A series of co-ordinate transformations are carried out by the hierarchy: from the task-based co-ordinate system to a cognitive map, from the cognitive map to an agent-centred kinematic system, and from the kinematic co-ordinates to the actuator dynamics. Such an architecture is generic in that it can be applied to almost any robotic system and even to the analysis of human goal-directed or extemporaneous movements.
Fig. 1. The generic control hierarchy. High-level sensory input, for example vision or ``hunger'', produces a motivation to move by providing a high-level command, e.g. ``run'', and/or a position vector x for the robot. This is converted into a velocity vector v and an angular velocity vector ω for the body-centred frame by level 2. Level 3 implements a kinematic plan that can produce target joint-space vectors θ for lower-level actuators (i.e. the hexapod leg joints), which are then turned into actuator signals that minimise error between target vectors and actual position by level 4. Low-level sensory information may be required to do this. A status flag at each level is returned to the higher level during each time step. The exact nature of the status flag and sensory inputs is left open to make the architecture as generic as possible.
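The vector interfaces of Fig. 1 can be sketched in code. The following is a minimal illustration only: the vector contents, status-flag semantics and level internals are deliberately left open by the architecture, so everything concrete here (dimensions, values, function bodies) is an invented placeholder.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class LevelOutput:
    vector: List[float]          # e.g. x, (v, omega), or theta targets
    status_ok: bool = True       # status flag returned to the level above

def motivation_level(goal: str) -> LevelOutput:
    """Level 1: translate a motivation into a target position vector x."""
    return LevelOutput(vector=[2.0, 1.0])          # desired body position x

def body_route_level(x: List[float]) -> LevelOutput:
    """Level 2: convert x into body-frame velocities (v, omega)."""
    return LevelOutput(vector=[0.1, 0.0, 0.05])    # v_x, v_y, omega

def kinematic_level(v_omega: List[float]) -> LevelOutput:
    """Level 3: produce target joint-space vectors theta for the legs."""
    return LevelOutput(vector=[0.3, -0.2, 0.1])    # one leg's joint targets

def dynamic_level(theta: List[float]) -> LevelOutput:
    """Level 4: drive actuators to minimise error against theta."""
    return LevelOutput(vector=[0.0] * len(theta))  # motor commands

def step(goal: str) -> bool:
    """One control time step: vectors flow down, status flags flow up."""
    out = motivation_level(goal)
    for level in (body_route_level, kinematic_level, dynamic_level):
        out = level(out.vector)
        if not out.status_ok:
            return False   # a lower level reports failure upward
    return True
```

The point of the sketch is the shape of the interfaces: each level consumes only the vector from the level above and returns a status flag, which is what makes the layers interchangeable.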
2.2
Why use soft computing techniques?
For the cognitive map building architecture it was important to address the following issues. A powerful search engine was required which could investigate an initially unknown, but complex, multi-dimensional continuous-valued solution space, making no a priori assumptions about the characteristics of the space to be searched. The evolution strategy, drawn from the family of Evolutionary Computational techniques, is just such a powerful tool. The acquired information about the characteristics of the discovered environment needed to be stored as a continuous function representing the ``potential field'' around the vicinity of attractive and repulsive objects. The universal continuous function approximation abilities of neural networks are now well established. In addition, the local activation characteristics of the radial basis function neural network made it the obvious choice. One of the
objects of the learning process was to acquire information about movement sequences that led to favourable goal states. Since this was a delayed-reward problem, the adaptive heuristic critic was chosen to perform this part of the task. For the fuzzy behavioural module system, the fuzzy logic approach was again a natural choice due to its ability to use a rule-based approach in continuous-valued input/output spaces, and its inherent capability to express a ``strength'' attached to the currently active rules. This could be used as a measure of ``belief'' in terms of the degree of matching between the input signals and one or more of the rules in a behavioural module. In the context of mobile robotics this, in turn, could be used as a ``place recognizer'' function. At the joint level, neural networks are ideal candidates for use in the control loop because of their ability to learn the systematic dynamic uncertainties of the system and their capability of fast adaptation to the presence of obstacles or to force contact constraints during walking.
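The rule-strength ``belief'' used as a place recognizer can be illustrated with a toy example. The membership functions, sensor ranges and place labels below are invented for illustration and are not the authors' rule base; the mechanism shown (AND as minimum of memberships, belief as the best-matching rule strength) is the standard fuzzy-logic one the text refers to.

```python
def tri(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership function on [a, c], peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Hypothetical rules over two ultrasonic range readings (metres): each
# rule pairs one membership function per input with a place label.
RULES = {
    "corridor": [lambda s: tri(s, 0.0, 0.3, 0.6),   # left wall near
                 lambda s: tri(s, 0.0, 0.3, 0.6)],  # right wall near
    "open_area": [lambda s: tri(s, 0.5, 1.5, 2.5),
                  lambda s: tri(s, 0.5, 1.5, 2.5)],
}

def rule_strengths(sensors):
    """Firing strength per rule: minimum of the input memberships (AND)."""
    return {name: min(m(s) for m, s in zip(mfs, sensors))
            for name, mfs in RULES.items()}

def recognise_place(sensors):
    """'Belief' in each place is its matching strength; return the best."""
    strengths = rule_strengths(sensors)
    return max(strengths, key=strengths.get), strengths
```

With both walls at 0.3 m the ``corridor'' rule fires at full strength, while the same readings leave ``open_area'' silent; the continuous strengths are what make this usable as a graded place recognizer rather than a hard classifier.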
2.3
Comparison with other systems architectures
There are essentially two different architecture paradigms for the design of autonomous robots. The hierarchical model, to which the approach in this paper is most closely related, is typified by such structures as the NASA/NBS Standard Reference Model for Telerobot Control System Architecture (NASREM) developed by Albus [9]. At the other end of the spectrum is the behavioural/reactive structure described by Brooks [2]. For a crude comparison, the hierarchical approach structures the problem horizontally, in that it plans an action and then converts the plan into a co-ordinated set of servo outputs. On the other hand, the reactive approach slices the problem vertically by consolidating competing/co-operating servo commands based on a set of priorities like ``feed'', ``wander'', ``turn'' etc., where each action has its own sensorimotor apparatus. Neves and Gray [10] show that a general navigation problem can be mapped to either of these approaches. They showed that hierarchical architectures, like NASREM, can accommodate reactive algorithm solutions well, and reactive architectures, like Brooks' subsumption architecture, can use a planning and model-based algorithm. Hence, the traditional boundary between the approaches is ``somewhat artificial'' [10]. The relative advantages and disadvantages follow. The subsumption architecture solution configuration was found to expand with the task, relying on previously successful configurations. This makes the system robust but lacking in scalability, because new levels are required to suppress obsolete functionality in previous levels, which is wasteful. On the other hand, the NASREM-type architectures require a structured approach to the problem from the beginning.
Neves and Gray [10] conclude about the hierarchical approach that `the design difficulty has an evolution which is opposite to that of the subsumption architecture: for simple problems it is necessary to put a number of unused structures in place, but when the complexity of the task increases, those structures already exist'.
The architecture developed in this paper can be implemented as a hybrid of the two approaches. The key levels for enabling this are the top two levels: ``motivation'' and ``body route trajectory''. So long as the required vectors are provided by the appropriate levels, the architecture is entirely general. Hence, the ``body route'' planner can have the effect of consolidating the internal/external motivations using a subsumption architecture, provided by a vector of goals at the interface of the ``motivation'' and ``body route'' levels. Once this consolidation has taken place, a suitable pair of velocity vectors can be provided to the ``kinematic'' level. The fact that this approach represents a set of co-ordinate transformations interfaced by a specified set of vectors makes it transparent to the algorithms employed to solve the co-ordinate transformations. For instance, at the body route level, these could be provided by any navigational algorithm, be it an implementation of the subsumption architecture or the method described later in this paper. A further advantage of this approach is that at any level of the architecture, all the levels above can be reduced simply to the vector input for the level being considered; e.g. the motivation and body route levels can be represented by a joystick command. This generality and flexibility make the approach described here extremely useful.

The planning and control have therefore been purposely separated. Much disagreement exists in the literature concerning the nature of the planning and execution of movements. Here, we have chosen kinematic planning (layer 3 of the architecture) and a dynamic compensation scheme to control the actuators (layer 4). The kinematic planning exists at two levels: gait control (which implements the navigation layer velocity vectors) and leg trajectory generation. Kinematic planning has been shown to exist in many different classes of human movement involving the whole arm, and even fingers.
It holds true in the case of reaching movements [11], point-to-point and continuous arm movements [12], and `tracking movements exhibit a considerable degree of commonality with learned and extemporaneous movements', this commonality being expressed by kinematic invariances [13]. A number of kinematic invariances have been identified (which will be further elaborated below) under varying load conditions or dynamic environments. The kinematic variables for the leg trajectory generator are in extrinsic (i.e. ``foot'') co-ordinates. This is because vision is not employed in placing the feet, where joint-space schemes may be more appropriate (e.g. [14, 15]). Morasso [16] demonstrates that executing similar movements at different speeds requires significantly different torque changes applied to the joints. The invariance of the trajectory convinced him that the planning was therefore achieved in extrinsic co-ordinates. Similar invariances do not exist in intrinsic frames (i.e. joint space) [17]. The demand trajectories for each foot are transformed into joint-space demand trajectories using inverse kinematics and supplied to the adaptive neural controllers. These controllers adjust the output of the motors in order to achieve actual motor trajectories as close to the demand
trajectories as desired (within a small, fixed error). The neural networks are trained on-line using the error between the desired and actual joint trajectories. This control scheme is asymptotically stable [18]. Where appropriate, results from implementing this architecture on two robots are presented in this paper; otherwise, simulation results are given. Layer 2 has been implemented on a small wheeled robot (Fig. 2a), and layers 3-4 of the architecture have been implemented on a walking machine (Fig. 2b). The walking machine currently does not have the sensory interface required to implement layer 2, but this will be developed in the near future. Further details of the experiments and results can be found in the references given in the text. An exhaustive presentation of the relevant experiments is not practical in this paper, whose purpose instead is to present the main features of the architecture.
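The level-3 to level-4 pipeline just described (a cost-minimised foot-space demand trajectory, transformed by inverse kinematics into joint-space demands, tracked by an on-line adapted controller) can be sketched as follows. The link lengths, gains and the minimum-jerk profile are assumptions for illustration: the paper's trajectory generator is modelled on cost-minimised human arm movements but its exact cost function is not restated here, and the toy adaptation law below learns only a single disturbance term, whereas the paper's controllers are full linear-equivalent neural networks with a stability proof [18].

```python
import math

L1, L2 = 0.10, 0.12   # assumed link lengths of a 2-link planar leg (m)

def min_jerk(x0: float, xf: float, tau: float) -> float:
    """Minimum-jerk interpolation (tau in [0, 1]), a standard
    cost-minimised model of point-to-point human movements."""
    s = 10 * tau**3 - 15 * tau**4 + 6 * tau**5
    return x0 + (xf - x0) * s

def ik(x: float, y: float):
    """Closed-form inverse kinematics for the leg (elbow-down branch)."""
    c2 = (x * x + y * y - L1 * L1 - L2 * L2) / (2 * L1 * L2)
    q2 = math.acos(max(-1.0, min(1.0, c2)))
    q1 = math.atan2(y, x) - math.atan2(L2 * math.sin(q2),
                                       L1 + L2 * math.cos(q2))
    return q1, q2

def fk(q1: float, q2: float):
    """Forward kinematics, used here only to check the IK solution."""
    return (L1 * math.cos(q1) + L2 * math.cos(q1 + q2),
            L1 * math.sin(q1) + L2 * math.sin(q1 + q2))

class OnlineJoint:
    """Toy on-line adaptation: a fixed feedback gain plus one
    disturbance weight updated from the joint tracking error."""
    def __init__(self, kp: float = 8.0, lr: float = 0.5):
        self.kp, self.lr, self.dist = kp, lr, 0.0

    def command(self, q: float, q_d: float) -> float:
        e = q_d - q
        u = self.kp * e + self.dist
        self.dist += self.lr * e    # weight update from tracking error
        return u
```

Sampling a swing path with `min_jerk`, converting each sample through `ik`, and feeding the results to an `OnlineJoint` per joint reproduces the data flow of the paragraph above in miniature; the adaptation term is what lets the controller cancel an unmodelled load the kinematic plan knows nothing about.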
3
Other walking robot control architectures
There are almost as many control architectures as there are walking robots. In this section, only a few of them will be compared with the work presented in this paper. LAURON [19, 20] is a walking robot developed by the Forschungszentrum Informatik in Karlsruhe. The robot is controlled using a hierarchical structure similar to the one described in this paper. Berns et al. describe the hierarchical scheme, which has a ``reactive element'' dealing with velocity and task commands. The next level down deals with leg co-ordination, and the lowest level plans and implements leg-level control through monitoring joint angles. The purpose of the ``reactive element'' is to evaluate high-level sensory data (provided by laser range-finders, inclinometers, and an internal state which determines if a demanded gait is possible) and to respond to a task provided by human operators. The leg co-ordination level determines the gaits of the robot: it implements the rules set forth by [21] using a neural network. At the leg level, leg trajectories were learned by providing angle data for 1000 different joint-angle measurements. The robot was able to negotiate a series of steps using a tripod gait. The above work was extended in [20], in which LAURON realises a ``self-organising'' representation of its environment through the integration of its sensor data. In order to interpret the data, a reinforcement learning algorithm is implemented by an adaptive heuristic critic architecture coded in a radial basis function neural network. New knowledge about the environment is added to this internal representation by exploration.
In reaction to particular sensor states, the correct action is determined in a manner similar to the generation of a firing signal by a neuron in an artificial neural network: a weighted sum of ``prototypical'' actions is taken and the chosen action is a normalisation with some bias determined from environmental effects. The resulting control architecture enabled the robot to walk in a tripod gait in a structured environment. This work is similar in intention and technical approach to our work. However, the generation of fuzzy behavioural modules and topological maps from LAURON's representation of the environment does not take place.

A hexapod developed at the Technical University of Munich employs a four-layered architecture, which can be directly mapped to the three lower layers of the architecture described in this paper. The highest level determines direction and velocity, the next level is a gait controller, the third level is a leg controller, and the fourth level controls joints using PID feedforward control [22]. The middle two levels correspond to the ``kinematic planning'' layer of Fig. 1. Other hierarchical control architectures for walking machines are described in [23-26].

Fig. 2. a The wheeled mobile robot used in the navigation experiments in a typical maze, with the ultrasonic distance sensors visible at the front. b The walking robot ``HUWE'', the Hexapod from the University of the West of England

All the above architectures differ from ours in three ways: there is no formal description of a layer to facilitate
``autonomy'' at the highest level, e.g. using a ``motivation'' module as here; the extent and flexibility of the navigation algorithms used; and the use of neural networks at the actuator level using on-line learning in a way which guarantees controller stability. These issues are expanded in the next section.
4
The application of the hierarchy to walking robots
Each of the levels of the hierarchy will now be described as it has been applied to the hexapod walking robot in our research. The highest level, motivation, is presented at a conceptual level of detail only, since we have not yet implemented this level on the robot.

4.1
What is autonomy and motivation?
The goal of autonomy is likely to be desirable in many mobile robotic applications, but it is particularly important for walking robots in very rough terrain with sparse footholds. This is because, at the lowest level, the robot should be able to negotiate the environment without requiring a human operator to identify footholds. At higher levels too, for instance in navigation, autonomy may be desirable. Within the field of CLAWAR machines, much work is required before a high degree of autonomy is achieved. It is interesting that NASA designed a wheeled vehicle to explore the Martian surface [27], the kind of target environment for which CLAWAR machines would be identified by some as being more suited. Below we define what we understand by ``autonomy''.

A measure of autonomy is the level of ``decision-making'' a robot or creature can achieve within the environment to which it is best suited. An animal, for instance, enjoys complete freedom to control its actions within the environment to which it has adapted. Most climbing and walking robots, on the other hand, are deterministically pre-programmed, have a limited powered life span and need a large amount of human control intervention to achieve the tasks for which they were designed. The goal of autonomy can be broken down into more achievable targets. Hence, we have defined three degrees of autonomy, each of which presents a series of problems with progressively higher-level requirements. The first (and lowest) level of autonomy is that of ``power'' autonomy, where the robot is a self-contained unit that does not require an umbilicus providing power.
Battery technology needs to improve significantly before the ideal of light-weight, long-lasting power cells is available at an affordable price. In our experience, it is fortunate if a mobile robot's battery lasts half an hour. For a small number of applications this may be acceptable, but in general it is not, so alternative sources of power like gas or petrol engines may be used instead. Within an industrial context, power autonomy may not be necessary at all, but for outdoor applications it would be preferable. In our laboratory, some of our colleagues are developing ``slug-bots'' which digest slugs as their power source, but such ``eating'' robots are more futuristic than realistic at present [28]. The second degree of autonomy might be described as ``intelligence'', in that the agent is able to adapt to a
potentially dynamic environment and achieve a goal with no intervention from human operators. Few authors (including ourselves) would feel comfortable claiming this has been completely achieved, and a ``man-in-the-loop'' seems inevitable for now. Again, in industrial applications there may be no need to have ``intelligent'' CLAWAR machines, where tele-operated robots are acceptable. However, for applications like humanitarian de-mining, the advantages of ``intelligence'' should be clear. Our control architecture has been designed to go a long way towards achieving this level of ``intelligent'' autonomy.

The third (and highest) degree of autonomy might be described as ``independence''. It has several consequences. Most significantly, it means the agent is able to choose amongst its motivations, e.g. ``do I continue feeding, or should I run away from this nasty furry thing?''. Less significantly, it implies that the agent is able to repair itself or recover from significant (though not fatal) damage and requires minimal or no maintenance from an external source. This third level of autonomy is achieved by animals, and in this sense walking creatures are the ``blueprints for autonomy'' in CLAWAR machines [29]. It is probable that artificial systems will not achieve this level of autonomy unless there is a fundamental shift in the design approach of engineers. Most artificial systems are still engineered with rigidity and accurate tolerances. This ``tradition'' may have emerged predominantly because such structures minimise ``hard-to-control'' characteristics given classical approaches, e.g. the non-linear dynamics introduced by passive compliance. Of course, natural systems cope with many non-linearities effortlessly, which implies that these classical approaches could be misguided for such applications. Biological systems, by contrast, are passively compliant (reducing ``wear and tear'' and the likelihood of excessive forces on joints in fault conditions, etc.).
Furthermore, animal locomotion control systems have built-in redundancy, so that, for instance, limb loss does not prevent the animal from being effective, even if repair of the limb is impossible. It is for these reasons, among others, that we have chosen to understand and, where possible, ``copy'' certain features of animal control strategies, an approach that has been called ``bio-robotics'' [30].

Once ``power autonomy'' and ``intelligence'' are achieved, several applications are opened up that seem elusive otherwise. In particular, remote inspection of utilities, in ducts or pipes etc., the clearance of landmines, or rescue operations in unstable or hostile conditions are all applications that have been proposed for CLAWAR machines. It seems unlikely that useful CLAWAR machines will be commercially viable unless they involve minimal operator intervention; hence this paper opens up the discussion of autonomy by introducing these definitions. In our research, we are not concerned with ``power autonomy'', settling for now on using an umbilicus. Our main interest is ``intelligence'', although in the future we intend to investigate ``independence''.

In order to be as autonomous as possible, in the above sense of ``intelligence'', an agent must be able to produce motivations of its own devising. Machine motivation can
be expressed as an interplay between machine emotions (e.g. ``fear'', ``curiosity'') and machine needs (e.g. low power, maintenance requirement) [31]. One structure for implementing this could be a high-level behaviour selection mechanism [32, 33] wherein a behavioural module represents an emotion or need. To use an anthropomorphism, one could view the processes of competition (and perhaps co-operation) between behavioural modules as analogous to the pre-conscious parallel processing underlying a ``stream of consciousness''. The sequential selection of competition winners thereby gives rise to ``conscious'' motivations. An architecture such as this shares some common ground with the model for human consciousness proposed by [34]. A motivation expresses the desire for attaining a current reward state. Since the context is walking machines, this might be one of a set of symbolic goals to be attained, such as ``locating a battery charging station'', ``detection of structural fault in bridge'', or perhaps ``detection of new mine''.

Gadanho and Hallam [35] have described an emotion-driven architecture which has some similarities with these ideas. It is based upon Damasio's somatic-marker hypothesis [36]. In their architecture, sensations (which can be linked with sensory transducer outputs) give rise to feelings, which in turn can excite emotions if they occur in appropriate combinations. However, feelings are mediated by an artificial hormone system which derives its outputs from feedback provided by the currently active emotional state. In this way sensations can be ``hidden'' from the emotion system, if appropriate, given a particular state of ``mind'' (e.g. ignoring tiredness when fleeing). Although this work is at an early stage, they use their model to mediate reinforcement learning sessions for a simulated mobile robot, and propose that it may be useful for avoiding the problems of using pure logical processes (e.g.
rules) to model human, and other biological, reasoning [37]. A structure similar to this might provide a suitable mechanism for linking sensory information with motivations relevant to a rough-terrain walking robot, such as that described here, via an artificial ``emotion'' system.
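The competition between behavioural modules described above can be caricatured as a winner-take-all selection. Every name, sensation and urgency formula below is invented for illustration; the paper's motivation level is conceptual and has not been implemented, so this is only one of many possible readings of the mechanism.

```python
def select_motivation(sensations):
    """Each behavioural module scores its urgency from the current
    sensations; the competition winner becomes the 'conscious'
    motivation for this time step (hypothetical modules and scores)."""
    modules = {
        "recharge": lambda s: 1.0 - s["battery"],   # a machine need
        "explore":  lambda s: 0.3 * s["novelty"],   # machine 'curiosity'
        "flee":     lambda s: s["threat"],          # machine 'fear'
    }
    scores = {name: f(sensations) for name, f in modules.items()}
    return max(scores, key=scores.get)
```

Sequential calls to a selector like this, as the sensations evolve, give the serial ``stream'' of motivations the text alludes to, even though the module scoring itself runs in parallel.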
4.2
Body route generation
Given a goal to be achieved, this second level in the hierarchy must establish an appropriate body-centred trajectory (route) through a partially or completely unknown environment. This navigation task has been extensively investigated in [38-41]. The main results, which use ``cognitive mapping'' approaches, will be reviewed below. A cognitive map is suitable for use in mobile robot navigation under the following circumstances: the agent needs to transcend basic stimulus-response activity, landmark recognition is limited to relatively crude local scene characterisation, and it is not possible for the agent to leave ``markers'' in its environment. These are likely to be pertinent circumstances for the applications in question here. The overall objective is to make use of the best features of two cognitive mapping forms, each having its strengths and weaknesses for different aspects of mobile robot navigation, and to combine them with the good reactive abilities which are achievable by the behaviour-based approach.

In previous works (e.g. [39, 40]), we presented an architecture which would enable a mobile robot with limited sensorimotor apparatus to build a non-procedural metric spatial map of its environment through interaction with it. The resulting cognitive map is a continuous representation of utility for any location in terms of the likelihood of reaching positive reinforcement (reward). It may be interpreted as an appetitive/aversive potential field map [42, 43] which could be used for off-line trajectory planning purposes. As a representational form, non-procedural metric spatial knowledge of an environment is clearly powerful. For example, given that routes A to C and B to C had been visited and built into such a map, an agent could hypothesize about unvisited routes between A and B using the global metric information contained in the map. Although reasoning about visited places is clearly possible using a topological representation, the scenario outlined above is not. For all the architectures presented below, a generalised mobile agent fitted with ultrasonic sensors is assumed. However, all the ideas are transferable to other sensorimotor interfaces.

The map-building architecture is based upon the adaptive heuristic critic (AHC) paradigm, which is made up of two principal components, the ``action policy'' and the ``critic''. In this architecture an evolution strategy (ES) is used directly as the main exploring part of the agent, i.e. it forms the ``action policy'' for each movement in the maze. A radial basis function (RBF) neural network is used to store the knowledge gained from exploration and therefore forms the main part of the ``critic'', updated using a temporal difference (TD) learning algorithm as the search progresses. Each ES member represents a potential move from the current position, expressed in egocentrically oriented polar co-ordinates. The RBF network performs a mapping between a given continuous-valued xy-coordinate and a measure of ``value'' or ``utility'' for that position. At each ``movement time step'' the ES is restarted with a fresh population and, with the entire maze at its disposal, searches for the best next movement: any straight-line traversal of the maze from the current position. Fitness values are derived using a three-step process illustrated in Fig. 3.

Fig. 3. Exploration in the AHC architecture, where θ is the change in head direction and d is the potential distance that can be travelled at angle θ
This process is repeated for each population member, for each generation of the ES, until a maximum number of generations has been reached. The highest rated population member is then used to make a ``non-return'' movement to the next position in the maze. After the ``non-return'' move is completed, a temporal difference (TD) learning algorithm is executed on the RBF network to change the V-function's shape. This is illustrated diagrammatically in Fig. 4. Changes are made in the regions of the maze surrounding the movements executed so far in the current trial by distributing a discounted reward or punishment back to them. The amount of this reward/punishment is simply derived from the value of the V-function at the new position. For example, if the V-function value at the new position is 255.5 and the value at the previous position was 180.5, then some proportion of the difference (75.0) is added to the previous position's value. According to a simple variant of the standard TD algorithm, this process is ``daisy-chained'' back through time until a ``horizon'' of backward time steps is reached. The process described above is then repeated from the new position in the maze until the goal position is reached. This completes one trial.

Many experiments have been performed using this architecture (e.g. [39, 40]); space only allows for a single example. The start position is at the top left of Fig. 5, which also shows a converged trajectory through the environment. Figure 6 shows the cognitive map which was formed. ``Bad'' places and obstacles appear as ridges, whilst ``good'' places appear as valleys. The main contributions made by this architecture are the ability to search a large action space efficiently using the ES, and to perform local generalisation over the effectively infinite state space using the RBF neural network. However, globally metric mapping methods face many practical problems. Two that stand out are (see [1] for discussion):

1. Cumulative errors in the current position estimate.
2. On-line computational requirements.

Perhaps the best role for a globally metric map-building architecture is in allowing an agent to acquire maps autonomously, of approximate accuracy, under static environmental conditions, which could then be used for off-line route planning. Our approach would indeed be quite suitable for such purposes. In attempting to unify map-building (and using) with good reactive abilities, we have chosen to pursue an approach involving a procedural topological map acting via inhibition of a lower-level reactive behaviour-based system. Although it uses a weaker form of map than that described above, it is likely to be computationally less intensive, would not suffer from cumulative path integration errors, and displays an elegant abstraction from the details of metric information at the higher map level of reasoning. The implication for this paper is that the non-procedural metric map has already been acquired by the agent under static environmental conditions, and has been used to derive a favourable route. We now wish the agent to express this route in a procedural topological form, for on-line use in the presence of local environmental disturbances. Below we briefly describe a scheme which extracts a procedural topological map from a non-procedural metric map (see Fig. 7). At the lower level a behavioural module typically contains a small group of competencies,
Step 1 – Each ES member is tested on the environment by making an ultrasonic distance measurement in the direction specified.
Step 2 – If a collision would result, the ES member is modified to the location of the collision and the RBF neural network is trained with a low utility at that position.
Step 3 – The fitness of an ES member is established by making a forward pass of the RBF network at the proposed new location.
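The three steps above can be sketched in code. The following is a minimal illustration rather than the authors' implementation: the RBF critic (its centre count, width and learning rule) and the sonar collision test are hypothetical stand-ins.

```python
import numpy as np

class RBFValueNet:
    """Toy RBF critic mapping an (x, y) position to a utility estimate.
    Centre count, width and learning rate are illustrative choices."""
    def __init__(self, n_centres=25, width=0.5, lr=0.2, seed=0):
        rng = np.random.default_rng(seed)
        self.centres = rng.uniform(0.0, 3.0, size=(n_centres, 2))
        self.width = width
        self.lr = lr
        self.weights = np.zeros(n_centres)

    def _phi(self, pos):
        d2 = np.sum((self.centres - np.asarray(pos)) ** 2, axis=1)
        return np.exp(-d2 / (2.0 * self.width ** 2))

    def value(self, pos):
        return float(self._phi(pos) @ self.weights)

    def train(self, pos, target):
        phi = self._phi(pos)
        self.weights += self.lr * (target - self.value(pos)) * phi

def evaluate_member(member, pos, net, sonar):
    """Steps 1-3 of the fitness evaluation for one ES member
    (theta, dist): a proposed straight-line move from `pos`."""
    theta, dist = member
    # Step 1: test the move with an ultrasonic distance measurement.
    hit, free_dist = sonar(pos, theta, dist)
    if hit:
        # Step 2: modify the member to the collision location and
        # train the critic with a low utility there.
        dist = free_dist
        member = (theta, dist)
    new_pos = (pos[0] + dist * np.cos(theta), pos[1] + dist * np.sin(theta))
    if hit:
        net.train(new_pos, target=-1.0)
    # Step 3: fitness is a forward pass of the RBF at the new location.
    return member, net.value(new_pos)
```

A `sonar` function here simply answers whether the straight-line move would collide and, if so, at what free distance; on the real agent this is the ultrasonic measurement itself.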
Fig. 4. Exploitation in the AHC architecture, where the bold arrow (→) is a polar to Cartesian transform
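The daisy-chained update can be illustrated as follows. This sketch distributes a discounted share of the new position's V-value back over a fixed horizon; the function name, learning rate and exact discounting scheme are our assumptions, not necessarily the authors' variant.

```python
def td_backup(values, trajectory, new_value, alpha=0.5, gamma=0.9, horizon=5):
    """Distribute discounted reward/punishment back along the current
    trial. `values` maps position -> V, `trajectory` lists the positions
    visited so far (oldest first), `new_value` is V at the new position."""
    # Daisy-chain backwards through time until the horizon is reached.
    for age, pos in enumerate(reversed(trajectory[-horizon:])):
        prev = values.get(pos, 0.0)
        # A proportion of (new V - previous V) is added; e.g. with
        # V_new = 255.5 and V_prev = 180.5 the difference is 75.0.
        values[pos] = prev + alpha * (gamma ** age) * (new_value - prev)
    return values
```

With alpha = 0.5, the most recent position in the text's example would move from 180.5 to 218.0, and earlier positions receive progressively smaller shares.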
Fig. 5. Maze solved using non-procedural metric map
Fig. 6. The cognitive map
Fig. 7. Procedure for building fuzzy-behavioural modules and topological maps from a cognitive map
such as ``corridor following'' or ``turn right at T-junction''. For topological map-building purposes each competency is expressed by an explicit symbolic component. A semi-ordered set of competencies may be built to form a procedural topological cognitive map. Each map represents a route through a particular maze to a goal from a given start point. From a behaviour-based robotics viewpoint it acts as a second-level behaviour selection mechanism which overrides a flat competitive system [41] when necessary. It inhibits behavioural modules as appropriate in order to resolve conflicts and achieve the desired behaviour. Since it is a topological representation, it is as insensitive to local disturbances as the behavioural modules that it controls will allow. There is a clear analogy here with the behavioural scripts identified in [44]. The overall structure of this layer in the hierarchy is also clearly similar to the work of others [e.g. 44-51]. However, here we are specifically interested in the process of extracting a procedural topological map from a previously acquired metric non-procedural map, a topic not directly addressed by any of the above. The competitive stimulus-response selection mechanism which mediates between groups of behavioural modules, and the process of extracting a module's fuzzy rules from a non-procedural metric map, are described more fully in [41]; only a brief overview is given below.
Each behavioural module contains one or more competencies (e.g. obstacle avoidance, turn left at T-junction), and is implemented using standard fuzzy logic. When active as the robot's controller, a fuzzy behavioural module simply provides an updated body-centred steering angle every 100 ms clock cycle. Each module uses the same five inputs, one from each of the fixed distance sensors (set at +90°, +45°, 0°, −45°, −90°). If there is a valid module output then the robot is assumed to travel constantly in a forward direction at a speed of 0.1 m/s. If there are no matching fuzzy rules then the motors are turned off. The ``winner takes all'' competitive bidding process between modules also runs at each 100 ms system clock cycle. This structure is based on a generalised competitive behaviour-based approach to robotics. Competition takes place on the basis of comparing the degree of matching between each fuzzy logic system and the current sensory input. Since such a measure is available as a natural by-product of fuzzy inference, i.e. fuzzy AND, this incurs little processing overhead. Even though a topological map can be completely dimensionless at a high level, it must use metric or other detailed environmental features at a lower level in order to perform place or landmark recognition. One could envisage the architecture for a place recognition system being based on a neuro-fuzzy information processing algorithm; indeed, there is already such a structure embedded in each of the fuzzy logic based behavioural modules. Let us consider a simple example of a maze containing two T-junctions. Let us assume that an agent based upon this technique possesses, as part of its portfolio of behavioural modules, one for turning left at a T-junction and another for turning right. Each of these modules will contain a number of fuzzy rules for the competency of negotiating the junction, either to the right or left.
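The bidding process can be sketched as follows. The triangular membership functions, the rule encoding and the module names are hypothetical, but the matching measure is the fuzzy AND (minimum membership) described above.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet at a and c, peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def rule_activation(rule, sensors):
    # Fuzzy AND: minimum membership across the five sonar inputs.
    return min(tri(s, *mf) for s, mf in zip(sensors, rule["antecedent"]))

def module_bid(module, sensors):
    """A module's bid is its best rule activation, a natural by-product
    of fuzzy inference, so the competition is cheap to compute."""
    return max(rule_activation(r, sensors) for r in module["rules"])

def winner_takes_all(modules, sensors):
    """Run once per 100 ms cycle: the best-matching module controls the
    robot; None means no rule matched and the motors are turned off."""
    bids = {name: module_bid(m, sensors) for name, m in modules.items()}
    best = max(bids, key=bids.get, default=None)
    return best if best is not None and bids[best] > 0.0 else None
```

Defuzzification of the winning module's steering output, and tie-breaking, are omitted for brevity.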
These rules will ``fire'' in a sequential fashion analogous to an S-RS environmentally linked behavioural chain. Regardless of their respective control outputs, a high activation level for one or more of these rules, evaluated as part of the normal fuzzy logic process, gives an indication of proximity to the junction. More specifically, provided that the sensor arrangements allow for it, there are likely to be rules which act as ``local predictors'' of an oncoming junction. It is tempting to expect that a high activation level for a whole behavioural module would be sufficient for this task. However, this is not the case here, since each behavioural module will inevitably contain more than one competency. This means that a high level of activation in a behavioural module does not uniquely specify proximity to a single junction type. If one is to make use of this feature as a place recognition system one must therefore operate at the level of individual rules within a behavioural module. A simple scheme to achieve this was devised, based upon the extent of input stimulus change across the sensor array between one 100 ms clock cycle and the next. If the averaged absolute change in stimulus input exceeds a preset level then this is taken to indicate the onset of a new region. The maximally active rule in the ten time steps before and after this time is taken to be the place recognizer. Agents may be physically different from each other. Even if they are physically identical they may possess
different mappings between behavioural modules and the competencies contained within them. A map is therefore specified at the level of competencies, and it is assumed that the agent is able to perform the translation between these and its behavioural modules. A ``minimum intervention'' approach is adopted in the map protocol, which leaves the basic processes of behavioural module competition undisturbed unless it is necessary to resolve a conflict. A map specifies the non-conflicting competencies required to complete a route through a maze, plus an ordered list of competence inhibitions. The term ``inhibition'' is used in the sense adopted by Brooks [32], i.e. the behavioural module to which the competency is mapped has its output vetoed for a predetermined time, thus leaving all other modules uninhibited. A map consists of three main parts specifying the competencies required to complete a route through a maze. (i) The first part specifies an unordered default list of competencies required to solve unforeseen problems, such as small moving objects in the environment which were not present during training for a specific route. (ii) The second part is another unordered list of further non-conflicting competencies, such as following a winding corridor, required to complete a specific route through a maze. (iii) The third part specifies an ordered sequence of competency conflict resolutions. It is composed of a set of clauses, each of which specifies a list of competencies requiring inhibition so as to allow an appropriate one to bid for control of the agent through the normal processes of behavioural module competition.
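One possible encoding of such a map, populated with the competencies of the multiple T-junction example of Table 1, is sketched here; the data structure and method names are illustrative assumptions, not the authors' protocol.

```python
from dataclasses import dataclass

@dataclass
class TopologicalMap:
    """Three-part route map: (i) default competencies, (ii) further
    non-conflict competencies, (iii) an ordered list of inhibition
    clauses, each naming the competencies to veto at the next place."""
    default: frozenset
    non_conflict: frozenset
    conflict_clauses: list
    _next: int = 0

    def active_competencies(self):
        # Minimum intervention: normal competition is left undisturbed.
        return self.default | self.non_conflict

    def on_place_recognised(self):
        """A place-recognizer rule has fired: return the competencies to
        inhibit (veto for a predetermined time) at this junction."""
        if self._next >= len(self.conflict_clauses):
            return frozenset()
        clause = self.conflict_clauses[self._next]
        self._next += 1
        return clause

# The Table 1 route (turn left, then right, then left) inhibits the
# opposing T-junction competency at each junction in turn.
route = TopologicalMap(
    default=frozenset({"Small-obstacle-avoidance"}),
    non_conflict=frozenset({"Winding-corridor"}),
    conflict_clauses=[frozenset({"Right-T"}),
                      frozenset({"Left-T"}),
                      frozenset({"Right-T"})])
```

Inhibiting ``Right-T'' at the first junction leaves ``Left-T'' free to win the normal bid, producing the left turn, and so on down the clause list.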
For the experimental work presented below, all behavioural modules, and therefore competencies, were acquired autonomously by the agent, using the procedure described in [41]. However, the mapping between competencies and behavioural modules is specified by the designer as part of a ``scaffolding'' or ``shaping'' process. Further, for item (i) above, the list of default competencies is also specified by the designer. The contents of item (ii) are ascertained by monitoring module activation levels as the route through a maze is executed using a trajectory derived from a non-procedural metric map. Any non-conflict module whose activation level exceeds a threshold during the route has its competencies added to the list. Item (iii) is composed simultaneously with item (ii) by the following procedure. Each conflict behavioural module will have an identified fuzzy rule which acts as the place recognizer and is monitored for activation above a threshold, indicating that a junction has been reached. For our non-procedural map-building architecture each action is composed of a stationary turn followed by a linear translation. The upcoming rotation specified by interpretation of the non-procedural map is inspected to ascertain the appropriate behavioural module to excite, and therefore the details of a suppression clause. This is not a precise inspection: the next rotation within a reasonable number of time steps (approximately a second in the experiment below) is simply evaluated against +/− thresholds to ascertain whether the action will be a turn to the left, right, or straight on.

Again, one selected result from more thorough testing is presented below. To evaluate the architecture proposed above, a 3 × 3 m multiple T-junction maze was devised. To proceed successfully from the start at the left of the middle corridor to the goal at the top right corner, the agent must turn left at the first junction, right at the next and left at the last junction. The complete maze was solved using the non-procedural map-builder. This was then used in the processes of extracting behavioural modules for turning right at a T-junction, turning left at a T-junction and following a winding corridor, establishing appropriate place recognizer rules, and building competency lists for the non-conflict and conflict categories (i.e. items (ii) and (iii) above) of the topological map shown in Table 1. The list of modules to be involved was determined, by the method described above, using a module activation threshold of 0.1. Note that the conflict category is an ordered list consisting of clauses separated by ``;'' which list competencies to be inhibited. For T-junctions there will only be one competency to suppress, but more generally there could be more than one. The maze and the body trajectory generated by the agent (left to right) are shown in Fig. 8. So far this research has been confined to 2D worlds and is currently being expanded to the 3D environment of a typical walking robot.

Table 1. Topological map for multiple T-junction maze

Map category | Competency list
Default | Small-obstacle-avoidance
Non-conflict | Winding-corridor
Conflict | Right-T; Left-T; Right-T

Fig. 8. Maze solved using procedural topological approach

4.3 Kinematic planning: gait generation
Wilson [52] presents his observations of insect gaits (cf. Fig. 9). He hypothesised that each side of an insect produced a metachronal wave starting with the hind leg protraction and followed by the middle and front legs protracting in turn. The slowest gait he observed involved an alternation between a right-sided metachronal wave and a left-sided metachronal wave. As the insect moved faster, the left- and right-sided metachronal waves over-
Fig. 9a-e. Gait patterns generated from the Graham model (left: in simulation; right: from the real robot). a A wave gait with no coupling (i.e. legs step in a front-middle-hind pattern, and this pattern alternates between each side of the body). As the coupling is increased these left and right ``metachronal'' waves overlap more, with a continuum of patterns ranging through b to e. c and d are tetrapod gaits and e is the typical tripod gait. Black lines represent legs in swing phase (which last 100 time steps in the simulation and on average 1061 ms on the robot). L1 signifies the left front leg and R3 indicates the right hind leg
lapped more and more until the typical tripod gait was observed (Fig. 9e). Further increases in speed led to further overlapping, where not only do the left and right side waves overlap but one wave overlaps with its subsequent wave on one side. The main mechanism by which such co-ordination is achieved is through a swinging leg inhibiting adjacent legs, thus reducing the chance of producing an unstable leg configuration causing the animal to fall over. Three versions of this basic principle will be discussed. The first involves mutual inhibition of all nearest neighbours. The second has an anteriorly directed inhibition of nearest neighbours with a delay element built in, and the final model involves up to six different co-ordination mechanisms operating between adjacent legs. The concept of mutual inhibition was proposed by [53], in which both intrasegmental (between the left and right sides of the animal) and intersegmental (between adjacent segments) inhibitory connections were identified in the cockroach. Pearson calls the oscillatory elements ``rhythm generators'', which form part of the ``flexor-burst'' model he proposed for controlling individual legs. Pearson's later suggestion [54] is that modification of the walking co-ordination may occur through load sensors in the leg's chordotonal organ and position information from the campaniform sensillae. This model formed the basis of Beer's simulation of cockroach behaviours [55], where the effect of load and position sensors was simulated by forward and backward angle sensor ``neurons'', as well as ground contact and stance and swing ``neurons'', within a distributed neural network control architecture. This basic model was then implemented on a walking robot with two degrees of freedom per leg [56].
A second, and the most complex, interleg co-ordination model to be described here was proposed by [21] and [57]. Together they identify at least six mechanisms that work between legs in a stick insect, and show how some of the mechanisms have analogues in the crayfish. According to [21], three different mechanisms influence the posterior extreme position (PEP), that is, the transition from power stroke to return stroke. Two of these influences act in an anterior direction. The first inhibits the start of the protraction in the anterior leg while the inhibiting leg is still in swing phase. The second of the two influences is an excitatory connection that can bring about a protraction in the anterior leg when the influencing leg starts its own power stroke. The third influence is directed rearward and excites the posterior leg to perform a protraction before the influencing leg begins its own protraction. The main purpose of the latter two of these mechanisms is to establish co-ordination after a disturbance. There are two further mechanisms at work. The first is the targetting behaviour in the stick insect tarsus (or foot). A posterior leg positions its tarsus close to, and slightly behind, the tarsus of the leg in front. This is to ensure that known footholds are used effectively when traversing rough terrain. Information is provided to the front legs from the antennae. The final coupling mechanism causes an increase in the motor outputs of neighbouring legs as a result of an increased motor output of the influencing leg during the power stroke. This ``co-activating influence is a mutual influence between all immediately adjacent legs with the exception of the two hind legs'' [21]. Some of the above mechanisms also work intrasegmentally. In addition to the motor excitation shown above,
the two excitatory influences on the PEP work in both contralateral directions. A sixth mechanism is described by [57] in addition to the five described by [21]. This mechanism is called ``treading on tarsus'': when the tarsus of a posterior leg comes to stand on top of the leg in front of it, the posterior leg lifts up and places itself near the leg in front of it. [57-59] simulate most of these six mechanisms, some of which have also been implemented on a robot with two degrees of freedom per leg [60] and two robots with three d-o-f per leg (with legs similar to Fig. 11, later) [61, 62]. The model used in this research is the simplest of the three: it uses rostrally-directed inhibition with a hierarchy of natural frequencies in the leg oscillators. Graham [63] proposed a model for walking in first instar and adult insects, where the inhibitions of adjacent legs occur in an anterior direction only, and where the front legs have no intrasegmental inhibitions. This would accord more with Wilson's proposition [52] that inhibitions are from a posterior leg. Graham's motivation was to explain some gait data that was quite different from the six gaits described by Wilson. However, as can be seen from Fig. 9, five of the six gaits described by Wilson can be generated from this model. The sixth, where subsequent metachronal waves overlap temporally, cannot be achieved by any of the models described above. Graham's model uses coupled oscillators to generate the stepping pattern and, unlike the above two models, there is no central signal that is received by all the legs. The oscillators comprise an input signal, which determines the frequency of oscillation, and a threshold value which can be varied. The oscillators are coupled to other oscillators through a delay module. This delay inhibits the adjacent legs by raising their threshold level for the time that the delay oscillator is firing, determined by its own independent threshold and frequency.
The delay oscillator does not start to fire until it has been triggered by the start of a new oscillation of the leg with which it is associated, that is, when the leg begins a swing phase. Since the legs are only coupled from the posterior to the anterior, there is no feedback circuit from the front legs. This means that the threshold level of the hind leg oscillators has to be higher than those of the middle or front legs. Thus, for each side of the body, a metachronal wave is completed before the next one can begin. The intrasegmental connections (the ``contralateral inhibitions'') were considered to be weaker than the intersegmental connections (the ``ipsilateral inhibitions''). Graham [64] found that there were clear contralateral asymmetries in free-walking insects. This is modelled by using different excitation levels on the left and right sides of the body. (Note that the two other models have a central excitation level.) The result is that the slower side controls the faster side. Experimental observations [65] support the Graham model. The results in Fig. 9 are from both simulation (left) and implementation on a hexapod robot (right) using a Siemens C164CI processor board. Black bars represent the time when a leg is in swing phase (and for the robot, they last for an average of 1061 ms). The implementation of the model was slightly different than for the simulation in
three ways. First, the processor real-time clock was used as the timer for the oscillators, instead of using simulated time with a loop counter. Second, the delay oscillators were not implemented: instead, the same effect was ``hard-coded'' using a set of rules, such that the inhibited oscillator thresholds were raised for the duration of a swing, rather than for the duration of the delay oscillator cycle. However, the same delay oscillator thresholds were used as for the simulation. Third, it was found that the Wilson gaits were easier to implement with the addition of an extra parameter, called the ``frequency scale factor'', which was used to determine the step period and to scale the duration of retraction. The values of these scale factors were: gaits a and b, 30000; gait c, 10000; gait d, 6000; gait e, 2500. Otherwise, the same parameters (thresholds, etc.) were used as for the simulation (taken from the original paper [63]). The transfer from simulation to the real world was, as expected, not as smooth a transition as desired. In both simulation and implementation the most difficult gait to establish was gait b, because the oscillator interactions are not in a stable cycle and drift can be observed in the results. Gait c is more like the simulated gait d. Gait e is unmistakable and very stable. The motivation for using Graham's model (summarised in Fig. 10) is its computational simplicity. For a robot which would normally walk not on flat surfaces but on rough terrain with highly unstable footholds, the computational expenditure needs to be at the lowest level of leg and joint control, and so a simple gait model is better. One of the most important features of such a robot is its capability of accurate foot placement, and this is facilitated by the trajectory planner and the same targetting behaviour described by [21] and implemented in neural networks after the method of [66] – the second of the two stages of kinematic planning in level 3 of the hierarchy.
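The rule-based robot variant described above can be sketched as an integrate-and-fire simulation. The leg labels follow Fig. 9; the thresholds, swing duration and excitation values here are illustrative, not the parameters used on the robot.

```python
LEGS = ["L1", "L2", "L3", "R1", "R2", "R3"]        # 1 = front, 3 = hind
# Anteriorly directed ipsilateral inhibition only: 3 -> 2 -> 1 on each side.
INHIBITS = {"L3": ["L2"], "L2": ["L1"], "R3": ["R2"], "R2": ["R1"]}
BASE_THRESHOLD = {"L1": 10, "L2": 10, "L3": 12,    # hind-leg oscillators
                  "R1": 10, "R2": 10, "R3": 12}    # need a higher threshold
SWING_STEPS = 4
RAISE = 100                                        # effectively a veto

def simulate(steps, excitation):
    """excitation maps side ('L'/'R') to the oscillator input; different
    left/right values model Graham's contralateral asymmetry."""
    charge = {leg: 0.0 for leg in LEGS}
    swinging = {leg: 0 for leg in LEGS}            # remaining swing steps
    onsets = []                                    # (time, leg) swing onsets
    for t in range(steps):
        for leg in LEGS:
            if swinging[leg] > 0:
                swinging[leg] -= 1
                continue
            # Rule-based replacement for the delay oscillator: the
            # threshold stays raised while a posterior neighbour swings.
            thr = BASE_THRESHOLD[leg] + sum(
                RAISE for post, ant in INHIBITS.items()
                if leg in ant and swinging[post] > 0)
            charge[leg] += excitation[leg[0]]
            if charge[leg] >= thr:                 # oscillator fires: swing
                charge[leg] = 0.0
                swinging[leg] = SWING_STEPS
                onsets.append((t, leg))
    return onsets
```

With equal left and right excitation this produces symmetric stepping; lowering one side's excitation slows that side, and, as in the model, the slower side then paces the faster one through the raised thresholds.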
What the gait generator provides is a set of timing signals
Fig. 10. a Graham's model showing anteriorly directed inhibition with a hierarchy of natural frequencies. The delay elements associated with each leg are shown attached to it with a bold line, and prefixed with a ``D''. All inhibitions go through the delay element. See text for further explanation. b The oscillators on the left and right hand sides of the body receive different inputs, and the delay oscillators also receive a ``gait control'' input
for the onset of a swing/stance cycle for individual legs, and for very rough terrain the default gait will be gait a in Fig. 9.

4.4 Kinematic planning: leg trajectory generation
For an environment with sparse footholds, accurate foot trajectories are required, as inspired by the stick insect Carausius morosus. The leg design is thus intended to be as articulate as possible and has been based on Carausius' geometry, as shown in Fig. 11. Much of the material for this section comes from the results of applied psychology studies of human arm trajectories. There are two reasons for choosing such a starting point. First, it is intended that the artificial hexapod reduce ``wear and tear'' on its mechanical components by employing some appropriate cost-minimising function observed in biological systems. Second, it is intended to generate trajectories which appear ``natural'', based on a principled approach which can incorporate stick insect data to derive the trajectory features. In considering the protraction of a leg, an optimisation principle is required which generates a unique trajectory based upon some objective criterion. Several examples of these objective functions exist. In principle, this criterion could be any measure of physical cost, for example minimising the energy expended, the force, movement time, impulse or effort [67]. Attention will be focused on two competing theories, namely the minimum torque-change model [68] and the minimum jerk model [17]. Jerk is defined as the time derivative of acceleration. Strictly speaking, the model of Flash and Hogan [17] minimises the mean-squared jerk over the total duration of a movement. The movements modelled by Flash and Hogan were uniplanar, large-amplitude, multi-joint elbow movements at intermediate speed. The predicted behaviour depends critically on the choice of co-ordinates [17, 6]. The minimum jerk objective function for an arm trajectory is in Cartesian space for motion in a plane. The equation is given as [17]:
$$C_J = \frac{1}{2}\int_{0}^{t_f}\left\{\left(\frac{d^3x}{dt^3}\right)^{2} + \left(\frac{d^3y}{dt^3}\right)^{2}\right\}\,dt \qquad (1)$$
where x(t), y(t) is the position of the end-point on the plane, and tf is the duration of the motion. The subscript J stands for jerk. Using dynamic optimisation theory, the predicted trajectory is shown to be a fifth-order polynomial in time for both dimensions, with chosen boundary conditions that fit the observed movement. The minimum jerk model was extended to include curved trajectories which were constrained to pass through a specified via point at an unspecified time [17]. The time at which the point is passed is a prediction of the model. In effect, the motion is segmented into two parts, each of which has a minimum jerk profile (i.e. a fifth-order polynomial) but with the constraint that there is continuity of velocity and acceleration at the intermediate time, since a discontinuity in either of these quantities would require an infinite jerk. Several predictions come out of the model. The first was an observation similar to the isochrony principle [69]. That is, if the first segment duration was t1 then the ratio t1/tf was roughly close to 0.5 for all trajectories except those where the via point was very close to either of the end positions. In a previous paper [1] we defined the ``strong isochrony principle'', which has the implication that t1/tf = 0.5. Two other testable predictions are mentioned but are not relevant here. Flash and Hogan [17] are clear that the minimum jerk movement is independent of the neuromuscular dynamics only if the demands of the movement lie within the capabilities of the neuromuscular system. If one (or more) of the neuromuscular performance limits are reached, they impose a constraint on the achievable movements, and the planning and execution processes cannot be clearly separated. This observation led [68] to define the minimum torque-change model, which accounts for such situations and does not separate the planning and execution [7]. Uno et al.
[68] propose that the optimised quantity in human arm trajectories is represented by a minimisation of changes in joint torques. Their belief is that planning is intrinsically related to dynamics, and the cost function is represented as follows:

$$C_T = \frac{1}{2}\int_{0}^{t_f}\sum_{i=1}^{m}\left(\frac{d\tau_i}{dt}\right)^{2}\,dt \qquad (2)$$

Fig. 11. The structure of a leg in the stick insect Carausius morosus. The orientation of the leg relative to the longitudinal axis of the body and the vertical can be described by two angles λ and μ. Each of the main joints of the leg exists in a common plane called the ``leg plane''. The angle around which the coxa moves the leg forwards and backwards is here denoted as θ, the coxa-trochanteris joint is φ and the femorotibial joint is ψ. Note the multi-jointed tarsus in the insect, which is not modelled in the hexapod
where $\tau_i$ is the torque fed to the $i$th of $m$ actuators and the subscript T refers to torque. The two criteria are related because torque change is locally proportional to jerk, but the latter depends critically on the dynamics. It has been shown that this criterion is better than the minimum jerk approach in four different experimental
conditions. In one case, for trajectories at the side of the body where movement is at the edge of the workspace, the neuromuscular system acts at the limit of its capabilities and the movements are not expected to comply with a minimum jerk model [17]. The experiments were also artificially constrained to a plane [68]. Trajectories are not implemented at the edge of the workspace under most conditions. Instead, the body employs extra degrees of freedom to avoid the extra effort involved, for instance, twisting or stretching of the trunk. This eliminates all such conditions except where there are no more degrees of freedom. Employing extra degrees of freedom involves planning at a kinematic level. Similar objections can be made to the other cases described by [68]. For all of the reasons mentioned above, the choice of organising principle used in this research is that of the minimum jerk model, which allows a separation of the planning and execution levels, and an expression of the planned trajectory in the kinematic variables of the end-point (or foot). Furthermore, even if the scheme did not evolve to reduce ``wear and tear'' in the musculoskeletal system, it has this useful effect in the hexapod. There is no a priori reason why the minimum jerk criterion cannot be extended to the motion of a leg in three dimensions:
$$C_F = \frac{1}{2}\int_{t_0}^{t_f}\left\{\left(\frac{d^3p_x}{dt^3}\right)^{2} + \left(\frac{d^3p_y}{dt^3}\right)^{2} + \left(\frac{d^3p_z}{dt^3}\right)^{2}\right\}\,dt \qquad (3)$$
where px(t), py(t) and pz(t) define the foot trajectory in three-dimensional Cartesian co-ordinates relative to the coxa-thoracic joint as in Fig. 11, and the subscript F stands for foot. At the unspecified time t1, the foot passes through the point x(t1) = x1, y(t1) = y1 and z(t1) = z1. (This is necessary to avoid trajectories that may scrape across the substrate.) In terms of dynamic optimisation theory, such a problem is termed `optimal control with interior point equality constraints on the state variables' [17]. The method separates the trajectory into two parts, the segment up to the via-point and the segment from the via-point to the end of the trajectory. Each segment is a fifth-order polynomial. The same steps are applied as in Flash and Hogan's paper [17], but with the more complicated boundary conditions (where the vectors v0 and vf are related to the velocity of the hexapod body):
x(0) = x0, ẋ(0) = v0x, ẍ(0) = 0, x(tf) = xf, ẋ(tf) = vfx, ẍ(tf) = 0
y(0) = y0, ẏ(0) = v0y, ÿ(0) = 0, y(tf) = yf, ẏ(tf) = vfy, ÿ(tf) = 0
z(0) = z0, ż(0) = v0z, z̈(0) = 0, z(tf) = zf, ż(tf) = vfz, z̈(tf) = 0

Solving the equations, defining the necessary boundary conditions for a minimum and applying the boundary conditions at t = 0, t = tf and the constraint equations (due to the dynamic optimisation method), leads to twelve equations in eleven unknowns. The following expressions are obtained for x(t) for t ≤ t1 and t ≥ t1 respectively, showing the effect of the constraint p1 – a consequence of dynamic optimisation:

x(1)(t) = x0 + v0x t + (…)(10s^3 − 15s^4 + 6s^5) + … + (tf^5/120) p1 {s1^4(…) + …}    (4)

x(2)(t) = a0(2) + v0x t + (…)(10s^3 − 15s^4 + 6s^5) + … + (tf^5/120) p1 {…}    (5)

where s1 = (t1/tf) and s = (t/tf), x0 = x(0) and xf = x(tf). a0(2) and p1 are constants determined by the via-point constraint x(1)(t1) = x(2)(t1) = x1, and by solving the above equations as simultaneous equations. Two similar equations exist in y(t) and z(t), where the constants a0(2) and p1 are replaced by b0(2) and p2 for y(t) and by c0(2) and p3 for z(t) respectively, with appropriate boundary conditions y0, y1, yf and z0, z1, zf. Note that the form of equations (4) and (5) differs from the trajectory derived by [17] with one via point, but it can be shown that the two are equivalent. From Eqs. (4)–(5), a fourteenth-order polynomial in the time t1 needs to be solved. Uno et al. suggest that it is ``impossible'' for the human planning system to obtain a minimum jerk path using such an analytical approach. However, as was our initial contention, it is precisely at the level of how the trajectories are represented within the artificial and biological systems that the approach proposed here differs from that in real cognitive planning, and so an analytical approach is appropriate in an artificial system. With this in mind, it seems appropriate, at least for the artificial system, to impose the strong isochrony
principle on the above equations for the minimum jerk (4)–(5) with one via point. In effect, this removes the need to solve the fourteenth-order polynomial in s1 = (t1/tf) by setting its value to 0.5. A consequence of this is simplified (and computationally much more efficient) expressions for the trajectories in each of the dimensions
with only known constants of the motion and the independent variable t. We have termed this method ``isochronous minimised jerk''. The support phase does not require a via-point and, using the method of [70], the trajectory is uniquely determined by the following equations for legs on the ground, where the vectors are explained by reference to Fig. 12:
dq(j)/dt = −(ω × q(j) + V),   j ∈ {L1, …, R3 : j supporting legs}    (6)
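Equation (6) constrains each supporting foot to remain fixed in world co-ordinates, so in body co-ordinates the endpoint must sweep at −(ω × q(j) + V). A minimal numpy sketch of this stance-phase boundary condition (the function name and the example numbers are ours, not the paper's):

```python
import numpy as np

def stance_foot_velocity(omega, V, q):
    """Body-frame endpoint velocity of a supporting leg (Eq. 6):
    the foot stays fixed in world co-ordinates while the body, with
    angular velocity omega and centre velocity V, moves over it."""
    return -(np.cross(omega, q) + V)

# Straight-line walking at 200 mm/s along the body x-axis, no turning:
dq = stance_foot_velocity(omega=np.zeros(3),
                          V=np.array([200.0, 0.0, 0.0]),      # mm/s
                          q=np.array([-10.0, -15.0, -8.0]))   # mm
print(dq)  # the foot sweeps backwards at the body speed
```

A quick consistency check is that the world-frame foot velocity V + ω × q + dq/dt vanishes for any ω and V.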
Fig. 12. The hexapod and associated vectors: R is the position of the body centre in terms of the inertial frame of ``world co-ordinates'', (e1, e2, e3); r(j) is the position of the coxa of the jth leg, for example L1; and q(j) is the endpoint vector of leg j. Also shown are the angular velocity ω and the body centre velocity V. For right legs the vectors q(j) are identical to the end-point vector p for leg j in terms of the coxa-thoracic joint of leg j. For left legs, q(j) is a 180° reflection of the vector p for leg j.

For a stick insect, the stepping of legs is clearly in two phases, called swing and stance. The transition between the phases may be punctuated by a short pause at either of the extreme positions, the AEP or the PEP. This pause is determined by the gait generator. The way the problem has been set up here is such that the pause can be included or excluded depending on what is happening at the higher level of the hierarchy. Once the leg is in stance phase, the boundary conditions are determined by (6), which requires the body-centred velocity and angular velocity as supplied by the body trajectory generator at a level above the gait generator. Once the leg is in swing phase, it needs no intervention from other levels of the hierarchy. Several assumptions have been made in simulating the trajectory generator. The first is that the biological data presented by [71, 72] for stick insects are suitable for the hexapod. This is a valid assumption since the robot design is based on the stick insect with similar relative dimensions (albeit on a different scale). For simple comparison with the biological data of the above papers, the units are maintained as mm in the graphs (Fig. 13), but of course these values can be considered as relative units of length. The second assumption (as far as the simulations reported here are concerned) relates to the nature of the gait. The velocity of the robot is assumed to be 200 mm/s, as is observable for stick insects walking at their fastest speeds on a horizontal surface [71]; compatible with this is the further assumption that there is no delay between the swing and stance phases, and that they each last for 100 ms. The results (Table 2) show that the isochronous minimised jerk is on average 6% over-simplified, yielding on average 3% more jerky trajectories (but as little as 0.4% more jerky). The graphs shown in Fig. 13 present only the trajectories generated for the front legs. Nothing profitable is demonstrated by including our results for middle and hind leg configurations. Because the leg trajectory generator is part of the kinematic plan, simulation results are appropriate to demonstrate its usefulness. The resulting trajectory serves as the ``demand'' trajectory which is passed on to the fourth level of the hierarchy, and the results of this scheme are presented in the next section. The trajectory planning level also implements the observed ``elevator reflex'' seen in some insects [73], which is triggered when an obstacle is detected. Also implemented in this level is the ``searching reflex'', which is triggered when no substrate is found at the end of a return stroke, to search for a stable foothold [73, 74]. Both these reflexes are initiated as a result of an appropriate status flag value from the lowest level, dynamic compensation, discussed in the next section. Finally, the via-points can be made adaptive. There are two cases in which it is sensible to adapt the via points: first, when a leg hits an obstacle in the previous step, in order to ensure clearance in subsequent steps; second, when no obstacle was detected in a previous step, by decreasing a relatively high via point to reduce energy consumption. Thus, via points are adapted to the roughness of the terrain. It has been observed in stick insect walking that the front legs have a higher via point clearance than the middle and hind legs [62].
In adapting via points, the via point of the leg in front (or any information given by the antennae in the case of the front legs) and the previous via point of the leg engaging in swing phase are required. Results from adaptive via point simulations have been reported in [1].
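For a single Cartesian axis, the isochronous minimised jerk construction described above reduces to a small linear system: two fifth-order segments joined at t1 = tf/2 (s1 = 0.5), with the position/velocity boundary conditions and zero boundary accelerations, both segments passing through the via point, and continuity of the first four derivatives there (the standard piecewise-quintic form of the minimum-jerk via-point solution). A sketch with our own function names, using the hind-leg x-axis data of Table 2 and assuming a 100 ms swing with velocities expressed in mm/s:

```python
import math
import numpy as np

def imj_axis(x0, v0, xf, vf, x1, tf):
    """Isochronous minimised jerk for one axis: two quintics meeting at
    t1 = tf/2. Returns the two coefficient vectors (ascending powers of t)."""
    t1 = 0.5 * tf

    def row(t, k):
        # k-th derivative of [1, t, t**2, ..., t**5] evaluated at time t
        r = np.zeros(6)
        for n in range(k, 6):
            r[n] = math.factorial(n) // math.factorial(n - k) * t ** (n - k)
        return r

    A, b = np.zeros((12, 12)), np.zeros(12)
    # Boundary conditions (position, velocity, zero acceleration) and the
    # requirement that both segments hit the via point x1 at t1.
    for i, (cols, t, k, rhs) in enumerate([
            (slice(0, 6), 0.0, 0, x0), (slice(0, 6), 0.0, 1, v0),
            (slice(0, 6), 0.0, 2, 0.0), (slice(6, 12), tf, 0, xf),
            (slice(6, 12), tf, 1, vf), (slice(6, 12), tf, 2, 0.0),
            (slice(0, 6), t1, 0, x1), (slice(6, 12), t1, 0, x1)]):
        A[i, cols], b[i] = row(t, k), rhs
    # Continuity of derivatives 1..4 at the via point.
    for k in range(1, 5):
        A[7 + k, 0:6], A[7 + k, 6:12] = row(t1, k), -row(t1, k)
    c = np.linalg.solve(A, b)
    return c[:6], c[6:]

# Hind-leg x-data from Table 2 (mm), 100 ms swing (tf = 0.1 s, v in mm/s).
c1, c2 = imj_axis(x0=-2.0, v0=-200.0, xf=-26.0, vf=-200.0, x1=-18.5, tf=0.1)
print(np.polyval(c1[::-1], 0.05))  # first segment passes through the via point
```

Because s1 is fixed at 0.5, no fourteenth-order polynomial has to be solved; the whole trajectory is one 12 × 12 linear solve per axis.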
4.5
Dynamic compensation: adaptive neural control of joint trajectories
Space does not allow for an exhaustive report of the work we have done at this level of the control architecture. A summary will be presented of the work reported in detail in [75–77, 18], together with some more recent, previously unpublished results. Figure 14 shows the neural control structure which implements the control of each joint in the hexapod's front left leg. A radial basis function (RBF) neural network is used as the controller, strapped around a PD controller. PD controllers are essentially kinematic controllers. The neural network controls the dynamics of the system and compensates for any trajectory tracking errors that result from the implementation of the demand trajectory by the actuators. A sliding controller is added to ensure neural network robustness to large errors. The neural network is presented with the desired positions and velocities, is trained by the error in those values, and modifies the PD output to reduce those errors.

The robot itself consists of 18 model aircraft servo motors, which have been modified such that the control circuitry has been removed and replaced with a shaft encoder from a PS/2 mouse. The neural controllers directly control the motor currents in order to achieve the desired trajectory. Each leg is controlled by a Siemens C164CI processor board (hence there are seven processor boards, including one for the gait controller as mentioned above), and each motor has a custom-built driver circuit based on the National Semiconductor LMD 18200 motor driver chip. The six leg processor boards also implement the isochronous minimised jerk trajectories described above (the so-called ``demand'' trajectories).

Fig. 13. Trajectories generated using the isochronous minimised jerk and the minimum jerk method using the biological data specified in Table 2. The top pair of graphs shows the top and side profiles of the trajectories produced using isochronous minimised jerk, for comparison with the minimum jerk trajectories in the middle pair. The bottom pair of graphs shows the endpoint trajectories as functions of time for each of the dimensions using minimum jerk, and the corresponding joint space profiles, generated using inverse kinematics (Randall et al. 1996)
Table 2. Biologically-derived data used to generate the trajectories for the walking robot

Datum               Hind leg               Middle leg             Front leg
l1                  1.39 mm                1.57 mm                1.61 mm
l2                  14.82 mm               12.47 mm               16.85 mm
l3                  15.51 mm               12.2 mm                16.1 mm
l                   -128°                  -92°                   -84°
k                   -32°                   -37°                   -34°
AEP (xf)            (-26, -11, -7)T        (11, -15.5, -8)T       (-25, -14, -10)T
PEP (x0)            (-2, -15.5, -7)T       (-17, -15.5, -8)T      (0, -19, -6)T
Via point (x1)      (-18.5, -17.5, 0.5)T   (-4.5, -20, -5)T       (12, -25.5, 4)T
Velocity (v0 = vf)  (-0.2, 0, 0)T          (-0.2, 0, 0)T          (-0.2, 0, 0)T
IMJ value           1.6726 × 10^-4         1.9101 × 10^-4         3.0503 × 10^-4
MJ value            1.5635 × 10^-4         1.9017 × 10^-4         3.0 × 10^-4
Stance jerk         1.3050 × 10^-6         2.5290 × 10^-6         2.3760 × 10^-6
s1 (MJ only)        0.4563                 0.4892                 0.4762

IMJ: Isochronous Minimised Jerk; MJ: Minimum Jerk. The IMJ and MJ jerk values are for the swing phase.
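The percentages quoted in the text can be checked directly from the IMJ and MJ swing-phase jerk values in Table 2:

```python
# IMJ and MJ swing-phase jerk values from Table 2 (hind, middle, front legs).
imj = [1.6726e-4, 1.9101e-4, 3.0503e-4]
mj = [1.5635e-4, 1.9017e-4, 3.0e-4]

excess = [(a - b) / b * 100 for a, b in zip(imj, mj)]
print([round(e, 2) for e in excess])        # [6.98, 0.44, 1.68] % extra jerk per leg
print(round(sum(excess) / len(excess), 2))  # 3.03: on average ~3% more jerky
```

The middle leg gives the ``as little as 0.4%'' figure in the text.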
A series of experiments was carried out on one front leg of a hexapod robot (reported in [75–77]). Those experiments were part of an initial investigation into the design of the robot structure and have now been superseded. Furthermore, they refer to an ``air trajectory'': both the planned ``swing'' and ``stance'' cycles of a single leg trajectory were traced in the air, so the stance trajectory was not made in contact with the ground. This implies that the complete cycle is an open-chain trajectory, and we have demonstrated in our laboratory that adaptive neural control for such a system is Lyapunov stable [78–80]. The key results from those experiments are shown in Fig. 15. What is particularly interesting about them is that the neural controllers were shown to be effective in adapting to an externally applied force disturbance. In those experiments, the neural networks were trained off-line initially and then operated with an on-line learning algorithm to reduce tracking errors.
Fig. 14. The leg joint neural control scheme. The demand trajectory (qd, q̇d, q̈d) is passed to the neural network and a PD controller. The error is generated by taking the difference between the actual and demand trajectories (denoted q̃ and q̃̇ for position and velocity respectively). These errors are used to train the neural network. In order to ensure that the neural network output is robust, a sliding controller is placed in parallel with it (the box with em sgn(.), which ``sees'' the same input as the neural network weight update algorithm)
To generate a trajectory, desired positions were produced in sequence for each of the three servo-motors of a leg by the on-board computer. In these experiments the RBF network is placed between this computer output and the servo-motor input (i.e. the PD controller was part of the servo motor control circuitry). Gaussian basis functions were used for the neural network, evenly spaced at intervals of 10 units across an integer desired position (qd) input range of 0 to 1000. Each basis function had a variance of 169. The choice of ranges and values was determined by potentiometer readings. The RBF network was trained initially with an identity function so that, without on-line learning, it contributes no extra control effort. An error signal is derived from the position potentiometer already contained in the servo as part of the existing position control loop. This feedback signal is read into the on-board computer via an A/D converter and then used to derive a position error signal. This signal is used to train the neural network on-line between one control action and the next, thus generating a modified trajectory demand signal which reduces trajectory tracking errors. All results in Fig. 15 show position against time in the integer units of the trajectory-planning algorithm, and in time steps of approximately 20 ms. There are three plots on each graph. The ``bold white'' plot is the desired position for the coxa joint, the ``grey'' plot is the servo-motor position feedback signal, and finally the ``bold black'' plot is the RBF neural network output. This set of experiments consisted of allowing the neural network to learn the undisturbed desired trajectory over five successive cycles. On the sixth cycle, a spring with non-linear force-extension characteristics was applied to the middle of the tibia. This systematic disturbance is left on the leg for five more cycles, after which it is removed for the last five cycles.
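The on-line scheme just described can be sketched in a few lines. In this sketch the identity initialisation is modelled by adding the network output to the raw demand, with all weights starting at zero, so the untrained network contributes no extra control effort; the class name, method names and learning-rate value are our assumptions, not the paper's:

```python
import numpy as np

class RBFCompensator:
    """Gaussian RBF network over the 0-1000 demand range: centres every
    10 units, variance 169, as described in the text. The network output
    is added to the raw demand to form the modified trajectory demand."""

    def __init__(self, lo=0.0, hi=1000.0, spacing=10.0, var=169.0, lr=0.5):
        self.c = np.arange(lo, hi + spacing, spacing)  # 101 centres
        self.var, self.lr = var, lr
        self.w = np.zeros(self.c.size)                 # no effort before learning

    def _phi(self, qd):
        return np.exp(-(qd - self.c) ** 2 / (2.0 * self.var))

    def demand(self, qd):
        """Modified trajectory demand sent to the servo loop."""
        return qd + self._phi(qd) @ self.w

    def train(self, qd, pos_error):
        """One on-line step between control actions: a normalised gradient
        update that shifts the demand to cancel the measured error."""
        phi = self._phi(qd)
        self.w += self.lr * pos_error * phi / (phi @ phi)

net = RBFCompensator()
print(net.demand(500.0))         # 500.0: identity behaviour before learning
net.train(500.0, pos_error=25.0)  # leg undershot the target by 25 units
print(net.demand(500.0) > 500.0)  # True: demand raised to compensate
```

Repeating the train/demand cycle drives the residual error down over successive steps, which is the behaviour reported for the spring-disturbance experiments below.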
In summary, once the spring is added to the leg, it ``under swings''. The neural network learns to adapt to this disturbance within two cycles and the leg then meets the desired trajectory again. Once the disturbance is removed, an immediate ``over-swing'' is observed. Again, the neural network adapts the leg trajectory to meet the desired trajectory after a couple of cycles. The details of these results will be presented for each joint.
Fig. 16a–c. Graphs showing on-line learning during the first walking cycle. a The coxa joint, b the femur joint, c the tibia joint. Note that the axes scales are significantly different from those in Fig. 15. The neural network does not have a model of the contact constraint forces, but ``learns'' one from the feedback, hence the errors are initially large on the return stroke (time units 101–200) for this first step. Over successive cycles this error decreases (not shown)

Fig. 15a–d. Graphs showing the experimental results of on-line adaptation to non-linear applied force disturbances and gravity. a The coxa trajectory before learning, b how the neural network has adapted the trajectory in the presence of a non-linear variable force disturbance. c The femur joint before learning and d after learning (i.e. after adapting to the gravitational force of the tibia on the femur)
Figure 15 shows the key results reported in [77]. Fig. 15a shows the initial off-load case on the first run through for the coxa trajectory. It can be seen that all three signals are quite similar under these conditions, meaning that the servo-controller is performing quite well under these circumstances and the RBF network is unnecessary.
Fig. 17. The planning and control hierarchy applied to the control of a hexapod robot. Down the left hand side are the four layers of the hierarchy. The motivation level translates a goal into position information. The body route planning level employs a cognitive map of the environment which it uses to generate velocity and turning velocity signals for the gait generator. The gait generator generates a set of co-ordinated timing signals for
each leg. The single leg controller generates a set of joint angles and the front two legs also provide target positions for the leg trajectory generator of their immediate rear neighbours. The joint level is controlled by adaptive neural controllers for both joint angle and torque feedback. See text for details. Only the lower three levels have been implemented in this research
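The signal flow of Fig. 17 can be sketched as a set of typed interfaces between layers. This is purely illustrative: the class and field names are ours, and the tripod rule below is a placeholder, not the paper's biologically derived gait generator:

```python
from dataclasses import dataclass

@dataclass
class BodyCommand:       # body route planner -> gait generator
    velocity: float      # forward speed, mm/s
    turn_rate: float     # rad/s

@dataclass
class StepTiming:        # gait generator -> single leg controllers
    leg: str             # 'L1'..'R3'
    phase: str           # 'swing' or 'stance'

class GaitGenerator:
    """Turns a body-level command into co-ordinated per-leg timing signals."""
    LEGS = ('L1', 'L2', 'L3', 'R1', 'R2', 'R3')

    def step(self, cmd: BodyCommand, t: float) -> list:
        # Placeholder alternating-tripod rule: two groups of three legs.
        group_a = {'L1', 'R2', 'L3'}
        swing_a = int(t * 10) % 2 == 0
        return [StepTiming(leg, 'swing' if (leg in group_a) == swing_a
                           else 'stance') for leg in self.LEGS]

timings = GaitGenerator().step(BodyCommand(velocity=200.0, turn_rate=0.0), t=0.0)
print([(s.leg, s.phase) for s in timings])
```

Each StepTiming would then trigger the leg trajectory generator (swing: isochronous minimised jerk; stance: Eq. (6)), whose demand trajectory is tracked by the adaptive neural joint controllers.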
Figure 15b shows the learned neural network output for the coxa joint after a non-linear variable force disturbance was applied to the middle of the tibia, in the form of an elastic spring. It is clear that the feedback is close to the desired trajectory, whereas the actual servo voltages are represented by the neural network outputs. Once the disturbance is removed, the neural network output returns to its former values (similar to Fig. 15a) within two cycles. A different effect is seen in the femur trajectory. Unlike the coxa, the initial feedback ``misses'' the desired trajectory and so here the neural network improves the performance of the joint over successive cycles. The reason why the desired trajectory is not achieved is the gravitational force exerted on the end-point of the femur by the weight of the tibia. This is shown in Fig. 15c (initial cycle) and Fig. 15d (five cycles later). These initial experiments demonstrated that there were a number of problems to be overcome. First, the potentiometer feedback was too noisy. This is not clear from the graphs in Fig. 15 because the feedback was put through a low-pass filter before being ``seen'' by the neural network [77]. Secondly, and relatedly, the velocity feedback was not used to train the neural network, again because the position feedback (from which it would be derived) was too noisy. For these reasons it was deemed appropriate to replace the motor circuitry and employ an incremental shaft encoder which could give higher-resolution feedback. Despite the problems, the results from this first set of experiments demonstrate the usefulness of adaptive neural control in adapting to environmental force disturbances. Once these alterations were made to the robot, a second series of experiments was performed; these are presented in Fig. 16. They show a complete walking cycle for a robot leg, including contact with the ground during stance phase. The controller in Fig.
14 was used with the PD controller implemented in software. PD gains are the same for each experiment, and are set at 40. The RBF neural networks each consist of 200 neurons evenly spread across the full range of desired joint values (for all joints, i.e. 115–220, determined from shaft encoder feedback). The ``spread'' of the Gaussian functions was 1.3 and the learning rate was set at 0.03. The inputs to each neural network included desired joint position, joint position error, and direction of motion derived from the velocity vector. All experiments were conducted on the right middle leg. Improvements could be made in both sets of results. There are a number of classical control techniques that could be used to improve the PD controller significantly, e.g. model-based reference control, computed torque control, or standard adaptive control. However, many of these methods are computationally expensive, requiring an accurate model of the plant dynamics, and are thus difficult to implement in real-time. Note that the control loop sampling time is 10 ms. The neural controllers could also yield better results by optimising the neural architecture, input range and the spread functions at an individual neuron level. Results from a single-leg complete swing/stance cycle with ground contact are shown in Fig. 16. There are three plots, showing the effects in the coxa (a), the femur (b) and
the tibia (c) during a single walking cycle. The key is the same as for Fig. 15, with the addition of the motor current, shown as a ``fine black'' plot. The results are taken from the first cycle of learning, with no initial off-line training (all weights set to 0). They are interesting because although the tracking errors are large (as might be expected for the first cycle of learning), the neural controllers were able to make the machine walk, whereas the PD controllers were not (the weight of the robot was close to the maximum that the motors could handle). With PD control of the leg, the errors grew so large that the motors stalled as a result of a sudden increase in current. The current increase over the stance phase during neural control was more graceful, which is why it succeeded. The trajectory errors decrease over successive cycles. Since the focus of this paper is not the adaptive neural controllers themselves, but rather the control architecture, more detailed results will be presented in future publications. In [18], we demonstrated that the neural control scheme for the complex multi-body system with varying closed-chain kinematics (as in the hexapod) is asymptotically stable. Without such a theoretical guarantee of stability, it is unlikely that industrial applications will employ soft computing techniques for the control of walking robots.
5
Conclusions
We have presented a generic control hierarchy for the planning and control of motion. It can be implemented in a wide range of applications, including manipulators and mobile robots. In particular, we have described how we have implemented each layer of the hierarchy for a hexapod walking robot, as summarised in Fig. 17. We presented the highest level – ``motivation'' – at a conceptual level of detail only, although it is intended to provide the machine with ``intelligent'' autonomy, as defined in the paper. Layer two, the navigation layer, has been implemented in simulation and on a wheeled robot, and for the lower layers, kinematic planning and dynamic compensation, results were presented from both simulation and implementation on a real walking robot. Future work will concentrate on two areas. First, we wish to investigate the layer of ``motivation'' further, so that a generic industrial task interpreter can be included within the controller. Second, and in the longer term, we would like to define the conditions for guaranteeing stability of the whole hierarchy, that is, to attempt to establish the conditions that provide bounded vector sets at the interfaces of the hierarchy layers.

References
1. Randall MJ, Pipe AG (1997) An Intelligent Control Architecture and Its Application to Walking Robots, Proc. International Workshop on Advanced Robotics and Intelligent Machines, Salford, UK, ISSN 1363-2698
2. Brooks RA (1991) Intelligence without Representation, Artificial Intelligence, 47, pp. 139–159
3. Woodworth RS (1899) The Accuracy of Voluntary Movements, Psychol Rev Monogr, Suppl. 3
4. Ricouer P (1966) Freedom and Nature: the voluntary and involuntary. North Western University Press, Illinois
5. Marr D (1982) Vision, W. H. Freeman, San Francisco
6. Jordan MI (1990) Motor Learning and the Degrees of Freedom Problem, In: Jeannerod M (Ed.) Attention and Performance XIII, Chapter 29, Erlbaum
7. Kawato M (1992) Optimization and Learning in Neural Networks for Formation and Control of Coordinated Movement, In: Meyer DE, Kornblum S (Eds.) Attention and Performance XIV: Synergies in Experimental Psychology, AI and Cognitive Neuroscience, MIT Press, pp. 821–849
8. Touretzky DS, Wan HS, Redish AD (1994) Neural Representation of Space in Rats and Robots, Computational Intelligence: Imitating Life, IEEE Press, pp. 57–68
9. Albus JS (1987) NASA/NBS Standard Reference Model for Telerobot Control System Architecture (NASREM), NBS Technical Note No. 1235, July 1987
10. Neves CC, Gray JO (1997) Comparative Analysis of Three Architectures using a Generalised Framework, Proc. International Workshop on Advanced Robotics and Intelligent Machines, Salford, UK, ISSN 1363-2698
11. Shadmehr R, Mussa-Ivaldi FA (1994) Adaptive Representation of Dynamics during Learning of a Motor Task, J Neurosci, 14(5), pp. 3208–3224
12. Soechting JF, Terzuolo CA (1990) Sensorimotor Transformations and the Kinematics of Arm Movements in Three-dimensional Space, In: Jeannerod M (Ed.) Attention and Performance XIII, Chapter 16, Erlbaum
13. Viviani P (1990) Common Factors in the Control of Free and Constrained Movements, In: Jeannerod M (Ed.) Attention and Performance XIII, Chapter 11, Erlbaum
14. Hollerbach JM (1990) Planning of Arm Movements, In: Osherson DM, Kosslyn SM, Hollerbach JM (Eds.) An Invitation to Cognitive Science, Vol. 2: Visual Cognition and Action, MIT Press, pp. 183–211
15. Soechting JF, Flanders M (1991) Deducing Central Algorithms of Arm Movement Control from Kinematics, In: Humphrey DR, Freund HJ (Eds.) Motor Control: Concepts and Issues, John Wiley and Sons Ltd., pp. 293–306
16. Morasso P (1981) Spatial control of arm movements. Exp Brain Res, 42, pp. 223–227
17.
Flash T, Hogan N (1985) The Co-ordination of Arm Movements: An Experimentally Confirmed Mathematical Model, J Neurosci, 5(7), pp. 1688–1703
18. Randall MJ, Pipe AG, Winfield AFT, Jin Y (1999) Adaptive Neural Control of Walking Robots With Guaranteed Stability, In: Virk GS, Randall MJ, Howard D (Eds.) Climbing and Walking Robots (Proc. CLAWAR '99), Professional Engineering Publishing, Bury St. Edmunds, pp. 111–121
19. Berns K, Dillman R, Piekenbrock S (1995) Neural Networks for the Control of a Six-legged Walking Machine, Robotics and Autonomous Systems, 14, pp. 233–244
20. Ilg W, Berns K (1995) A Learning Architecture Based on Reinforcement Learning for Adaptive Control of the Walking Machine LAURON, Robotics and Autonomous Systems, 15, pp. 321–334
21. Cruse H (1990) What Mechanisms Coordinate Leg Movement in Walking Arthropods?, Trends in Neurosci, 13, pp. 15–21
22. Steuer J, Pfeiffer F (1998) Autonomous Control of a Six-Legged Walking Machine, Proceedings Euromech 375: Biology and Technology of Walking, Munich, pp. 141–148
23. Song SM, Waldron KJ (1987) An Analytical Approach for Gait Study and Its Applications on Wave Gaits, Int J Robotics Res, 6(2), pp. 60–71
24. Halme A, Salmi S, Leppaenen I (1997) Control and Stabilisation of the Semi-Dynamical Motion of a Heavy Six-Legged Walking Machine, New Approaches on Dynamic Walking and Climbing Vehicles, Workshop II: 8th Int Conf on Advanced Robotics (ICAR'97), Monterey, CA, July 7–9, pp. 44–47
25. Luk BL, Galt S, Hewer ND, Cooke DS (1998) Advances in Legged Locomotion, Proceedings of International Conference
of Climbing and Walking Robots 1998, Brussels, November 26–28, pp. 155–160
26. Frik M, Guddat M, Karatas M, Losch DC (1999) A Novel Approach to Autonomous Control of Walking Machines, In: Virk GS, Randall MJ, Howard D (Eds.) Climbing and Walking Robots (Proc. CLAWAR '99), Professional Engineering Publishing, Bury St. Edmunds, pp. 333–342
27. Gat E, Desai R, Ivlev R, Loch J, Miller DP (1994) Behavior Control for Robotic Exploration of Planetary Surfaces, IEEE Trans Rob Automat, 10(4), pp. 490–503
28. Kelly I, Holland O, Scull M, McFarland D (1999) Artificial Autonomy in the Natural World: Building a Robot Predator, Proc. European Conference on Artificial Life, September 1999
29. Randall MJ, Pipe AG, Winfield AFT (1998) Blueprint for Autonomy in CLAWAR Machines, Proc. International Conference on Climbing and Walking Robots (CLAWAR) '98, Brussels, November 26–28, pp. 131–136
30. Randall MJ, Pipe AG, Winfield AFT, Muttalib A Md Gh (1998) Intelligent Hexapod Bio-Robotics, Proceedings Euromech 375: Biology and Technology of Walking, Munich, 1998, pp. 124–132
31. Aube M, Sentini A (1996) What Are Emotions For?, From Animals to Animats 4, MIT Press, pp. 264–271
32. Brooks RA (1986) A Robust Layered Control System for a Mobile Robot, IEEE J Rob and Automation, RA-2(1)
33. Spier E, McFarland D (1996) A Finer Grained Motivational Model of Behaviour Sequencing, From Animals to Animats 4, MIT Press, pp. 255–263
34. Dennett DC (1993) Consciousness Explained, Penguin
35. Gadanho SC, Hallam J (1998) Emotion-driven Learning for Animat Control, From Animals to Animats 5, MIT Press, pp. 354–359
36. Damásio AR (1994) Descartes' Error: Emotion, Reason and the Human Brain, Picador
37. Dreyfus HL (1992) What Computers Still Can't Do: A Critique of Artificial Reason, MIT Press
38. Pipe AG (1997) Reinforcement Learning and Knowledge Transformation in Mobile Robotics, PhD thesis, University of the West of England, UK
39. Pipe AG, Carse B, Fogarty T, Winfield A (1995) Learning Subjective ``Fuzzy Cognitive Maps'' in the Presence of Sensory-Motor Errors, Proc. 3rd European Conference on Artificial Life, Springer LNAI 929, pp. 463–476
40. Pipe AG, Fogarty TC, Winfield A (1994) A Hybrid Architecture for Learning Continuous Environmental Models in Maze Problems, From Animals to Animats 3, MIT Press, pp. 198–205
41. Pipe AG, Winfield A (1996) An Autonomous System for Extracting Fuzzy Behavioral Rules in Mobile Robotics, From Animals to Animats 4, MIT Press, pp. 233–242
42. Arkin RC (1989) Motor Schema-Based Mobile Robot Navigation, Int J Robotics Res, 8(4), pp. 92–112
43. Schmajuk NA, Blair HT (1993) Place Learning and the Dynamics of Spatial Navigation, Adaptive Behavior, 1(3), pp. 353–385
44. Aylett R, Coddington A, Barnes D, Ghanea-Hercock R (1995) A Hybrid Multi-agent System for Complex Task Achievement by Multiple Co-operating Robots, Proc. Multi-Agent Workshop
45. Kuipers B, Byun Y (1991) A Robot Exploration and Mapping Strategy based on a Semantic Hierarchy of Spatial Representations, Robotics and Autonomous Systems, 8
46. Mataric MJ (1992) Integration of Representation into Goal-Driven Behavior-Based Robots, IEEE Trans Robotics and Automation, 8(3), pp. 304–312
47. Nehmzow U, Smithers T (1991) Using Motor Actions for Location Recognition, Proc. 1st European Conference on Artificial Life, MIT Press, pp. 96–104
48. Smart WD, Hallam JCT (1994) Location Recognition in Rats and Robots, From Animals to Animats 3, MIT Press, pp. 174–178
49. Colombetti M, Dorigo M (1994) Training Agents to Perform Sequential Behavior, Adaptive Behavior, 2(3), pp. 247–275
50. Grefenstette JJ, Ramsey C, Schultz AC (1990) Learning sequential decision rules using simulation models and competition, Mach Learning, 5(4), pp. 355–381
51. Michaud F, Lachiver G, Dinh CTL (1996) A New Control Architecture Combining Reactivity, Planning, Deliberation and Motivation for Situated Autonomous Agent, From Animals to Animats 4, MIT Press, pp. 245–254
52. Wilson DM (1966) Insect Walking, Annual Review of Entomology, 11, pp. 103–122
53. Pearson KG, Iles JF (1973) Nervous Mechanisms Underlying Intersegmental Co-ordination of Leg Movements During Walking in the Cockroach, J Exp Biol, 58, pp. 725–744
54. Pearson KG (1976) The Control of Walking, Scientific American, 235(6), pp. 72–86
55. Beer RD (1990) Intelligence as Adaptive Behavior, Academic Press
56. Beer RD, Chiel HJ, Quinn RD, Espenschied KS, Larsson P (1992) A Distributed Neural Network Architecture for Hexapod Robot Locomotion, Neural Computation, 4, pp. 356–365
57. Dean J (1991) A Model of Leg Co-ordination in the Stick Insect, Carausius morosus: II. Description of the kinematic model and simulation of normal step patterns, Biol Cybern, 64, pp. 403–411
58. Dean J (1992) A Model of Leg Co-ordination in the Stick Insect, Carausius morosus: III. Responses to perturbations of normal co-ordination, Biol Cybern, 66, pp. 335–343
59. Dean J (1992) A Model of Leg Co-ordination in the Stick Insect, Carausius morosus: IV. Comparison of different forms of coordinating mechanisms, Biol Cybern, 66, pp. 345–355
60. Espenschied KS, Chiel HJ, Quinn RD, Beer RD (1992) Leg Coordination Mechanisms in the Stick Insect Applied to Hexapod Robot Locomotion, Adaptive Behaviour, 1(4), pp. 455–468
61.
Espenschied KS, Quinn RD, Chiel HJ, Beer RD (1994) Biologically-Inspired Hexapod Robot Control, Proc 5th Int Symp on Robotics and Manufacturing (ISRAM), 5, pp. 89±94 62. Cruse H, Brunn DE, Bartling Ch, Dean J, Dreifert M, Kindermann T, Schmitz J (1995) Walking: A Complex Behavior Controlled by Simple Networks, Adaptive Behavior, 3(4), pp. 385±418 63. Graham D (1977) Simulation of a Model for the Co-ordination of Leg Movement in Free Walking Insects, Biol Cybern, 26, pp. 187±198 64. Graham D (1972) An analysis of walking movements in the ®rst instar and adult stick insect, J Comp Physiol, 81, pp. 23±52
65. Cruse H, Epstein S (1982) Peripheral Influences on the Movements of the Legs in a Walking Insect, Carausius morosus, J Exp Biol, 101, pp. 161–170
66. Dean J (1990) Coding Proprioceptive Information to Control Movement to a Target: Simulation with a Simple Neural Network, Biol Cybern, 63, pp. 115–120
67. Flash T, Hogan N (1995) Optimization Principles in Motor Control, In: Arbib MA (Ed) Handbook of Brain Theory and Neural Networks, MIT Press, pp. 682–685
68. Uno Y, Kawato M, Suzuki R (1989) Formation and Control of Optimal Trajectory in Human Multijoint Arm Movement: Minimum Torque-Change Model, Biol Cybern, 61, pp. 89–101
69. Viviani P, Schneider R (1991) A Developmental Study of the Relationship Between Geometry and Kinematics in Drawing Movements, J Exp Psychol, 17(1), pp. 198–218
70. Devjanin EA, Gurfinkel VS, Gurfinkel EV, Kartashev VA, Lensky AV, Shneider AYu, Shtilman LG (1983) The Six-Legged Walking Robot Capable of Terrain Adaptation, Mechanism and Machine Theory, 18(4), pp. 257–260
71. Cruse H (1976) The Control of Body Position in the Stick Insect (Carausius morosus) when Walking Over Uneven Surfaces, Biol Cybern, 24, pp. 25–33
72. Cruse H, Bartling Ch (1995) Movement of Joint Angles in the Legs of a Walking Insect, Carausius morosus, J Insect Physiol, 41(9), pp. 761–771
73. Pearson KG, Franklin R (1984) Characteristics of Leg Movements and Patterns of Co-ordination in Locusts Walking on Rough Terrain, Int J Rob Res, 3, pp. 101–112
74. Espenschied KS, Quinn RD, Beer RD, Chiel HJ (1996) Biologically Based Distributed Control and Local Reflexes Improve Rough Terrain Locomotion in a Hexapod Robot, Robotics and Autonomous Systems, 18, pp. 59–64
75. Randall MJ, Pipe AG, Winfield AFT (1998) Stable On-line Neural Control of Hexapod Joint Trajectories, IEEE Control Applications, Trieste
76. Randall MJ, Pipe AG, Winfield AFT (1998) Adaptive Neural Control of Hexapod Leg Trajectories, In: Adolfsson J, Karlsén J (Eds) Mechatronics'98, Elsevier Science Ltd
77. Randall MJ, Pipe AG, Winfield AFT (1998) Blueprint for Autonomy in CLAWAR Machines, Proc Int Conf on Climbing and Walking Robots 1998, Brussels, November 26–28, pp. 131–136
78. Jin Y (1994) Intelligent Control and Its Applications in Robotics, PhD Thesis, University of the West of England, UK
79. Jin Y, Pipe AG, Winfield A (1993) Stable Neural Network Control for Manipulators, IEE J Intelligent Systems Engineering, 2(4), pp. 213–222
80. Jin Y, Pipe AG, Winfield A (1997) Stable Manipulator Trajectory Control using Neural Networks: Methodology, Stability Analysis and PUMA Simulations, In: Omidvar O, van der Smagt P (Eds) Progress in Neural Networks: Neural Systems for Robotics, Academic Publishing Corp