Neural Comput & Applic (1998)7:115-120 9 1998 Springer-Verlag London Limited
Neural Computing & Applications
Neural Robot Path Planning: The Maze Problem D.C. Dracopoulos Department of Information Systems and Computing, Brunel University, Uxbridge, UK
The importance of path planning is very significant in the field of robotics. This paper presents the application of multilayer perceptrons to the robot path planning problem, and in particular to the task of maze navigation. Previous published results implied that the training of feedforward muItilayered networks failed, because of the non-smoothness of data. Here the path planning problem is reconsidered, and it is shown that multilayer perceptrons are able to learn the task successfully.
Keywords: Autonomous robots; Maze navigation; Multilayer perceptrons; Path planning
1. Introduction A major component for the creation of autonomous robots is the ability of a robot to 'plan its paths', and in general, the ability to 'plan its motion'. In a limited or carefully engineered environment, it is possible to program the robot for all possible combinations of motions in order to accomplish specific tasks [1]. But even when this is possible, one would like to tell the robot 'what' to do, rather than 'how' to do it, making its operation much easier [1]. In general, however, 'pre-programming' a robot for all possible cases or conditions it will meet is impossible, due to the fact that the number of motion combinations can be large or infinite (as is Correspondence and offprint requests to: D.C. Dracopoulos, Department of Information Systems and Computing, Brunel University, Uxbridge, Middlesex, UB8 3PH, UK. Email:
[email protected]
the case in non-tracked systems). In addition, it is highly desirable to have robots which are able to adapt and operate in unknown or changing environments. Adaptation, robustness and operation in a wide range of environments provides robots with a higher degree of 'intelligence'. Such an intelligence can be achieved only through learning. This paper considers the application of artificial neural networks (and more specifically, multilayer perceptrons) in robot path planning. The problem which is addressed here is that of maze navigation. Apart from the obvious industrial applications, the solution of such a problem is inspired from daily life. Humans seem to be able to find their optimal path through rooms they have not visited before, without bumping or colliding with the various obstacles that lie in the room. Somehow, they are able to 'see' the obstacles and make an appropriate optimum path planning so as to avoid them (PJ Werbos, 1996, personal communication). In contrast with this, many path planning techniques (including machine learning algorithms) seem to lead the robot through a room and attempt to 'force' the robot to learn its future path, after a number of collisions with obstacles in the environment. Such an approach, besides being an unnatural method (when compared with humans), is infeasible for most real industrial applications, as the result of a robot colliding with its environment can be catastrophic for the robot itself, and damaging for the environment. The next section provides some background in path planning and relevant techniques. Section 3 describes the simple maze problem considered here, and Section 4 proposes an artificial neural network architecture for the solution of the maze problem along with the results obtained from this approach. Finally, Section 5 summarises what was achieved, and gives directions for future work.
D.C. Dracopoulos
116
2. Path Planning The general problem of path planning for autonomous robots is defined as the search for a path which a robot (with specified geometry) has to follow in a described environment, in order to reach a particular position and orientation B, given an initial position and orientation A (Fig. 1). The path is subject to certain constraints which serve to avoid obstacle collision and optimise the performance of the robot (e.g. find the path with the minimum distance, or find the path which minimises the energy spent by the robot). Sometimes, the motion of a robot is restricted to particular paths or roadways, the railways tracks [2]. In this case, the problem of path planning is equivalent to that of graph search, and any of the graph search algorithms can be utilised (e.g. A* or Dijkstra's algorithm). In general, however, systems are non-tracked, and the number of routes a robot can follow is infinite. Real robot path planning becomes even more complicated due to the fact that the shape of the robot has to be taken into account. A tool which is commonly used to face this extra complication is the configuration space. The robot's configuration space % represents the robot as a point and maps the obstacles in this space in an appropriate way. This mapping transforms the problem of planning the motion of a dimensioned object into the problem of planning the motion of a point [1]. The full mathematical analysis of a path planning problem using the configuration space is too complex as the number of dimensions increases, therefore other
techniques have to be used in combination with the configuration space. Potential fields is one of the most popular methods [1]. Following this approach, the robot resembles a particle charged positively which is moving under the influence of a potential field created by the target configuration (position and orientation of the robot) and the obstacles in the
3. The Maze Problem Fig. 1. Path planning: the robot would like to move from the position and orientation A to the position and orientation B avoiding the shaded obstacles.
It is well known that maze navigation is an important task in robotics. The maze problem in the
117
Neural Robot Path Planning
form considered in this paper was proposed by Werbos [3]. This problem was used as a testbed for the capability of different artificial neural networks to approximate non-smooth functions. As pointed out by Werbos (PJ Werbos, 1996, personal communication), the testbed was established due to the fact that Houillon and Caron [4] were unable to succeed in the training of a multilayer perceptron to solve the problem. The task is defined as follows: Given a maze of the type shown in Fig. 2 (which may vary in size, number of dimensions or the configuration of the obstacles), find an appropriate path which moves a robot from an initial position I to a target position G, while minimising the distance which the robot has to travel. A specialised neural network architecture, based on simultaneous recurrent networks, was designed [3,5] to approximate the dynamic programming J function of the maze navigation problem. Such an approximation is crucial for the adaptive critics neurocontrol methods. These methods are Approximate Dynamic Programming (ADP) techniques: given a utility function U which has to be maximised over all future time (which could be an infinite or a finite time horizon), find an approximation of the strategic utility function J, for which maximisation in the short term will maximise U in the long term [5]. Exact dynamic programming is too 'expensive' computationally for complex dynamic systems or large problems, therefore ADP methods are used. A neural network (called the critic network) learns to output (predict) an approximation J* of the function J (Fig. 3). Thus, the Bellman equation in dynamic programming [6] J(R(t))=max [U(R(t), u(t))+
]
(1)
u(0
yg Yi
x.a
xg
Fig. 2. The maze problem: find the full optimum path which will lead the robot from the position (xl, yi) to the position (xg, y~).
U(t)
Environment
J*(t) Fig. 3. The Approximate Dynamic Programming (ADP) operation.
is not solved exactly, but it is replaced by a neural network able to approximate J(R) by J*(R) [7]. However, the emphasis [3,5] was given in the capability of the simultaneous recurrent network architecture to approximate the J function, rather than solving the path planning problem for the 5 by 5 maze of Fig. 2. In addition, the results presented the approximation of the J function for a particular target goal, although recent experiments with the proposed SRN architecture (PJ Werbos, 1996, personal communication), seemed to give promising results for generalisation in different mazes. Generalisation in cases which are 'unseen' by the planner (either for different initial and final conditions or for different mazes) have a great importance for real world path planning applications. Letting a robot move around a fixed environment, colliding with objects and becoming familiar with it, in order to perform exactly the same navigation task again, is an approach of limited usefulness. Besides the cost involved in such techniques (due to the damage occurred by the collisions), the necessity for intelligent behaviour defined by properties like adaptivity and robustness makes their application inappropriate for many cases. After all, humans are able to 'see' their near optimum path when they arrive in a room for the first time. Intelligent behaviour of a robot can be achieved only if the robot is capable of learning. Artificial neural networks have learning properties which make them ideal candidates for robot motion planning. However, a neural network application was unsuccessful for a problem of this type [4]. The next section proposes a neural architecture for the solution of the maze problem described here.
118
D. C Dracopoulos
4. Neural Path Planning The maze problem illustrated in Fig. 2 has an extra complication which is not apparent when one considers the problem for the first time. For each of the cells and a specific goal position there is more than one equally good direction [5]. Some of these have up to four equally good directions, as there are four different paths which are optimum for the target configuration. This can be very confusing for any path planner, including humans. In particular, such a case may cause many difficulties for the training of articial neural networks. Since the mapping between current state and next action/current state is one-to-many, multilayer perceptrons will learn an incorrect model. This is true because standard supervised learning algorithms average over multiple targets, assuming a squared error criterion function [7]. In addition, such data can be nonsmooth, and training based on such data can be very difficult or impossible for most networks (this was the motivation for the work on specialised more powerful SRN networks in Pang and Werbos [5].) The approach which is tested here is based on feedforward backpropagation-type neural networks. Although such networks suffer from the problem of local minima, in practice it is found that many local minima give good results. In addition, a global minimum based on the total error of the training data does not guarantee or imply that the generalisation of the network will be better. However, a straightforward test can help in deciding the point at which the training of the network should be stopped. One does not use all the available problem data as the training data. Instead, the available data are split into three sets: the training set, the validation set and the test set. The training set is used for training data, and while training in discrete time periods, the validation set is used to test the current state of the network (the validation set is distinct from the training set). The training error is decreasing continuously, but the error on the validation set will start increasing at some point (Fig. 4). This is the point at which the training procedure should stop, since this indicates the best generalisation. Now the test set (distinct from both the training and the validation sets) can be used, to test the capabilities of the network. The network used here is trained to predict the direction in which the robot should move at the next time step. Hence, one output node is used. The inputs to the network are the current position of the robot (x, y) and its target position (Xg, yg). A network architecture with two hidden layers and a total size of 4 - 1 0 - 1 0 - 1 was used (Fig. 5), after having done
Training error
Calidation error
/
t optimum
t r a i n i n g time
Fig. 4. An example of when training should be stopped, for good generalisation capability of a network. )r West
x
y
Xg
yg
Fig. 5. The architecture of the network used. The inputs to the network are the current robot position (x, y) and its goal (Xg,yg). Its output determines whether the robot will move North, South, East or West at the next time step. a number of experiments in order to determine it. However, no real effort in optimising the network architecture was attempted. The training set consisted of 174 distinct data items, mapping the current position and target pos-
119
Neural Robot Path Planning
Table 1. How well the network learned its training data and how well it was able to generalise. Its generalisation capability is of particular interest for real applications, and it can be easily seen how good it is, in this aspect.
Training Generalisation
Total
Correct (with hypothesis of uniqueness)
Really correct
174 288
174 203
174 231
yg
optimum
Yi
actual
x.
1
x
g
Fig. 6. An example of the optimum path versus the path produced by the neural planner.
ition to the optimum action. These training data were generated from the optimum full paths of 50 trajectories. The trajectories were chosen from 50 different initial-target conditions. The generation of training data taken out of optimum full trajectories makes the training of the network an easier task, as the data produced in this way are 'smoother'. The target values in the output node were scaled in the range [-0.9, 0.9] so as to avoid the 'saturated areas' of the sigmoidal function used. Standard multilayer perceptron training can be significantly improved in terms of speed convergence if an adaptive learning rate is used [8]. In all the experiments described here, a different rule for adapting the learning rate ~ was used as follows:
l a =
0.7a,
newerror if > error
1.04
(2)
1.05o~, otherwise
Using this learning rate update after each iteration of the training samples not only speeds up the process of learning, but can also help to avoid many local minima. The initial value which was used here for the learning rate was 0.02. In addition, the learning rule used to update the weights utilised the standard momentum term with a coefficient of 0.7. The possible number of combinations for inputs and outputs to the network is 462 (if one ignores
the trivial case where the initial and final conditions are the same, so the robot does not have to move). The feedforward network described learned perfectly the 174 training samples, and its training was stopped after 50,000 iterations. The remaining 288 samples were used to test the generalisation capability of the network. Out of the 288 test data items, the network predicted correctly the next move of the robot in 203 cases. This is based in the assumption that only one move out of the four is correct (hypothesisof uniqueness)and priority of correctness is given according to the order: North, South, East, West. That is, if more than one move leads the robot to an optimum path, we count as correct only the move which comes first in the described order. However, if one accepts as a correct prediction any of the moves which lead the robot to the optimum path (something which is much more realistic and the true case), then the generalisation capability of the network exceeds 80%. These results are summarised in Table 1. Figure 6 shows the optimum path versus the neural network planner path for an initial position and target position that was not included in the training set. It can be seen that the two paths are very similar, and that the performance of the neural path planner in this case is optimum. However, as is obvious from Table 1, the path produced by the neural path planner is not always optimum, but quite close to it.
5. Conclusions and Future Work The importance of autonomous robots is very significant for complex industrial tasks. A robot cannot be considered as 'autonomous' if it is not able to plan its own motion routes. This work has demonstrated how multilayer perceptrons can be applied to the robot path planning problem. For this purpose, the task of maze navigation was considered. Previous published work indicated that the problem could not be solved using standard feedforward backpropagation type networks [4] and (PJ Werbos, 1996, personal communication). It is unknown why Houillon and Caron [4] failed to solve the similar problem addressed. The results shown here suggest
120 that such an approach is not only feasible for path planning problems, but also that the accuracy which can be achieved is quite high. Future work has to demonstrate how well the neural architecture will scale when applied to mazes of a different size, or to mazes where the robot has more degrees of freedom. Although initially it can be thought that more degrees of freedom make the problem more difficult, such a case will generate much smoother data, something which usually makes the training of multilayer perceptrons an easier task.
References 1. Latombe J-C. Robot Motion Planning. Kluwer Academic, 1991
D.C. Dracopoulos
2. Cameron S. Obstacle avoidance and path planning. Industrial Robot 1994; 21 (5) 3. Werbos PJ, Pang X. Generalized maze navigation: SRN critics solve what feedforward or hebbian nets cannot. World Congress on Neural Networks, San Diego, CA, 1996; 88-93. Lawrence Erlbaum INNS Press 4. Houillon P, Caron A. Planar robot control in cluttered space by artificial neural network. J Math Modeling and Computing 1993; 498-502 5. Pang X, Werbos PJ. Neural network design for J function approximation in dynamic programming. Neural Networks (to appear) 6. Bertsekas DP, Tsitsiklis JN. Neuro-Dynamic Programming. Athena Scientific, 1996 7. Dracopoulos DC. Evolutionary Learning Algorithms for Neural Adaptive Control. Springer-Verlag, 1997 8. White DA, Sofge DA (eds). Handbook of Intelligent Control. Van Nostrand Reinhold, 1992