Vol. 3 No. 4 Dec. 1999
JOURNAL OF SHANGHAI UNIVERSITY
Reinforcement-Based Fuzzy Neural Network Control with Automatic Rule Generation

WU Geng-feng  DONG Jian-quan  CHEN Yi-min  CAO Min  ZHANG Yue
(School of Computer Engineering and Science, Shanghai University)
FU Zhong-qian (University of Science and Technology of China)

Abstract  A reinforcement-based fuzzy neural network control with automatic rule generation (RBFNNC) is proposed. A set of optimized fuzzy control rules can be generated automatically through reinforcement learning based on the state variables of the object system. RBFNNC was applied to a cart-pole balancing system, and the simulation results show significant improvements in rule generation.
Key words  reinforcement learning, fuzzy neural network, rule generation
1  Introduction

A fuzzy neural network controller based on the combination of a fuzzy logic system and a neural network has unique characteristics. It not only has the capability of learning and associative memory, but also clearly represents a kind of structured knowledge and performs fuzzy inference [1,2].

The most important task in designing a fuzzy controller is to obtain a set of optimized rules. There are many ways to obtain fuzzy rules if one has enough precise input data and output samples. For instance, a product-space clustering method based on adaptive vector quantization (AVQ) with competition [2] can generate a set of fuzzy rules corresponding to a certain amount of input-output samples. However, for some real-world applications, precise data for training or learning are difficult and expensive to obtain. Therefore there has been a growing interest in reinforcement learning algorithms for neural networks [3]. In this paper, we propose a reinforcement-based fuzzy neural network control (RBFNNC) with automatic rule generation. The only training data for reinforcement learning are the system's state variables. RBFNNC first evaluates a system error and a reinforcement factor based on the system state variables, and then abstracts a set of optimized rules by competition. RBFNNC is successfully applied to a cart-pole balancing system, and the computer simulation shows significant improvements in the rule generation and the performance of the controller. The system structure and the reinforcement algorithm are described in this paper, and some results of computer simulation are presented.

2  Architecture of RBFNNC

Fig. 1 shows the basic configuration of RBFNNC, composed of five modules. Among them, the fuzzy neural network controller (FNNC) module, the signal stochastic processing (SSP) module and the cart-pole balancing system (CPBS) module form a basic fuzzy controller; the reinforcement factor evaluation (RFE) module and the rule generation (RG) module perform the functions of adaptive learning and rule generation. Below we briefly introduce the function of each module.
Received Jan. 13, 1999
Project supported by the Science Foundation of Shanghai Municipal Commission of Education (980034)
WU Geng-feng, Prof., School of Computer Engineering and Science, Shanghai University, 149 Yanchang Road, Shanghai 200072, CHINA
[Figure placeholder: rule generation, fuzzy neural network controller, signal stochastic processing and cart-pole balancing system modules, connected by the system state (θ, θ̇)]
Fig. 1  Architecture of RBFNNC
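The data flow among the modules in Fig. 1 can be sketched as below. This is a minimal illustration, not the paper's software: the class interfaces, the stub controller policy and the error normalization are assumptions made here for the sketch; only the reinforcement v = 1 − 2|E| (Eq. (4)) and the Gaussian exploration with standard deviation exp(−v) follow the text.

```python
import math
import random

# Minimal sketch of the RBFNNC loop in Fig. 1. Class interfaces and the
# stub controller policy are illustrative assumptions; only v = 1 - 2|E|
# (Eq. (4)) and Gaussian exploration with std exp(-v) follow the paper.

class FNNC:
    """Stub fuzzy neural network controller (stands in for fuzzy inference)."""
    def output(self, theta, theta_dot):
        return -(10.0 * theta + 2.0 * theta_dot)   # placeholder linear policy

class RFE:
    """Reinforcement factor evaluation, Eq. (4): v = 1 - 2|E|."""
    def reinforcement(self, error):
        return 1.0 - 2.0 * abs(error)

class SSP:
    """Signal stochastic processing: Gaussian action, mean F, std exp(-v)."""
    def explore(self, F, v, rng):
        return rng.gauss(F, math.exp(-v))

rng = random.Random(0)
fnnc, rfe, ssp = FNNC(), RFE(), SSP()

theta, theta_dot = 0.05, -0.1        # current system state from the CPBS
F = fnnc.output(theta, theta_dot)    # action recommended by the FNNC
E = abs(theta) / 0.2                 # assumed normalized system error
v = rfe.reinforcement(E)             # reinforcement factor fed to RG and SSP
F_out = ssp.explore(F, v, rng)       # final stochastic action to the CPBS
```

A higher v (state close to upright) shrinks the exploration noise exp(−v), so the applied action F_out stays close to the recommendation F.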
(1) Fuzzy neural network controller (FNNC) module
This module consists of a fuzzy neural network with five layers (shown in Fig. 2) acting as a basic fuzzy controller. Rules generated by the RG module (through learning and training) are stored in the FNNC, which performs the fuzzification, inference and defuzzification functions. The input to the FNNC is the precise system state values; finally, the FNNC produces a precise output to the SSP module for further processing.

As shown in Fig. 2, the first layer contains one node for each input variable X_i. These nodes can be considered as the linguistic variables; they only pass the crisp input values on to the nodes of the second layer. The second layer contains one node for each possible value of each linguistic variable in layer 1. For example, if "small" is one of the values that X_i can take, a node computing the membership function μ_small(x) belongs to layer 2. Layer 2 thus fuzzifies the input values for the nodes of the third layer. A node in layer 3 corresponds to a rule R_r (1 ≤ r ≤ R). Its inputs come from all nodes in layer 2 that participate in the "if" part of that rule (the antecedent labels). The node computes the continuous, differentiable "softmin" [3] operation of its inputs to get the firing strength ω_r (in other words, the degree of applicability of rule r):

    ω_r = Σ_i μ_i exp(−k μ_i) / Σ_i exp(−k μ_i),    (1)

where μ_i are the antecedent membership values of rule r, and the parameter k controls the hardness of the "softmin" operation: when k → ∞ we recover the usual "min" operator, and when k is finite we get a differentiable function of the inputs.

The fourth layer contains one node for each linguistic value of the output variable (the consequent labels); its inputs (ω_r) come from all rules that use this particular consequent. The fifth layer computes the crisp control value, the network output F, as a weighted sum of its inputs, where the weights are the firing strengths. The inputs of layer 5 come from the recommendations of all the fuzzy control rules in the rule base:

    F = Σ_{r=1}^{R} ω_r Z_r / Σ_{r=1}^{R} ω_r.    (2)

In order to know the contribution Z_r of each rule to the general output, an inverse calculation is taken as a defuzzification procedure applicable to an individual rule. For instance, the output of a rule r whose consequent is the fuzzy set "small" is computed as

    Z_r = μ_small⁻¹(ω_r).    (3)

[Figure placeholder]
Fig. 2  Fuzzy neural network

(2) Reinforcement factor evaluation (RFE) module
This module is the basis of the proposed reinforcement algorithm. It produces a measure of the reinforcement factor v associated with the current state of the object system. The factor v is fed to the RG module for generating rules automatically, and to the SSP module for computing the final output F_out. The system error E is obtained by an inference process based on a set of fuzzy rules [4,5]. The reinforcement v is then calculated as

    v = 1 − 2|E|.    (4)

(3) Signal stochastic processing (SSP) module
This module stochastically generates the action F_out applied to the system, based on the action F recommended by the FNNC and the value of v from the RFE. F_out is a Gaussian random variable with mean F and standard deviation exp(−v) [3]. This leads to a better exploration of the state space and a better generalization ability.

(4) Rule generation (RG) module
This module performs the proposed reinforcement algorithm (see Section 3) to generate a set of optimized rules without requiring any input-output samples; in other words, the function developed in the RG module allows it to learn the fuzzy rules from scratch. The only inputs to this module are the state variables of the system and the reinforcement factor v.

(5) Cart-pole balancing system (CPBS) module
A typical non-linear dynamic system, a cart-pole balancing system [5], is used in RBFNNC to test the effectiveness of the fuzzy rules generated by the proposed
reinforcement algorithm. As shown in Fig. 3, we select the angle θ and the angular velocity θ̇ as the inputs of RBFNNC. When the pole is on the right side of the vertical line, θ is defined as positive; when the pole falls toward the right side of the vertical line, θ̇ is defined as positive. The output of the SSP module is F_out, whose positive direction is to the right.

[Figure placeholder]
Fig. 3  Cart-pole balancing system
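A numerical sketch of the FNNC forward pass of module (1) (Eqs. (1)-(3)): crisp inputs are fuzzified by membership functions, each rule fires through softmin, each fired consequent is inverted at its firing level, and the crisp force is the firing-strength-weighted sum. The triangular fuzzy sets, the toy rule base and k = 10 are illustrative assumptions (the paper's sets appear in Figs. 5-7); μ⁻¹ is taken here as the midpoint of the ω-level cut.

```python
import math

# Sketch of the FNNC forward pass (Eqs. (1)-(3)). The triangular
# membership functions, the toy rule base and k are assumptions.

def tri(x, a, c, b):
    """Triangular membership function with feet a, b and peak c."""
    if x <= a or x >= b:
        return 0.0
    return (x - a) / (c - a) if x <= c else (b - x) / (b - c)

def tri_inv(omega, a, c, b):
    """Rule-level defuzzification: midpoint of the omega-level cut (Eq. (3))."""
    left = c - (1.0 - omega) * (c - a)
    right = c + (1.0 - omega) * (b - c)
    return 0.5 * (left + right)

def softmin(mus, k=10.0):
    """Differentiable softmin of antecedent memberships (Eq. (1))."""
    num = sum(m * math.exp(-k * m) for m in mus)
    den = sum(math.exp(-k * m) for m in mus)
    return num / den

def fnnc_output(theta, theta_dot, rules):
    """Firing-strength-weighted network output F (Eq. (2))."""
    num = den = 0.0
    for antecedents, consequent in rules:
        mus = [tri(x, *mf) for x, mf in ((theta, antecedents[0]),
                                         (theta_dot, antecedents[1]))]
        omega = softmin(mus)                  # firing strength of the rule
        if omega > 0.0:
            num += omega * tri_inv(omega, *consequent)
            den += omega
    return num / den if den else 0.0

# Toy rule base: (theta set, theta_dot set) -> force set, triangles (a, c, b)
NE, ZE, PO = (-2.0, -1.0, 0.0), (-1.0, 0.0, 1.0), (0.0, 1.0, 2.0)
NL, ZO, PL = (-30.0, -20.0, -10.0), (-10.0, 0.0, 10.0), (10.0, 20.0, 30.0)
rules = [((NE, NE), NL), ((ZE, ZE), ZO), ((PO, PO), PL),
         ((NE, ZE), NL), ((PO, ZE), PL)]

F = fnnc_output(0.5, 0.2, rules)   # crisp control force for this state
```

For a symmetric rule base and an upright, motionless state, the left- and right-pushing recommendations cancel and the output force is zero, as expected of a balanced controller.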
3  Reinforcement Algorithm in RBFNNC

We take θ and θ̇ as the state variables of the system and the CPBS as the object system of the controller. The reinforcement algorithm is realized in two phases.

Phase 1: Deleting all the rules that have produced a counterproductive result.
(1) Build a full network as shown in Fig. 2, consisting of all possible rules, as a basic rule base. In order to show the whole process of rule generation, we omit the input and output layers of Fig. 2 and keep the antecedent-label layer, the rule layer and the consequent-label layer, as shown in Fig. 8(a).
(2) Obtain the state variables of the object system (θ and θ̇). The FNNC module computes the contribution of each rule R_r to the output F as Z_r = μ⁻¹(ω_r), and the network output F.
(3) The RFE module computes the system error E by inference, based on a set of fuzzy rules stored previously, and then obtains v from E.
(4) Determine the sign of the optimal output value F_opt under the present state of the system. Although the exact output value is not known, its sign can be determined.
(5) Delete each rule R_r with sgn(Z_r) ≠ sgn(F_opt); in other words, delete all rules that produce a result opposite to the expected one. Also delete each rule R_r whose firing strength ω_r is smaller than a threshold determined by the real situation.
(6) The SSP module stochastically computes the final output F_out based on F and v, then forwards it to the CPBS for control, to get new system state variables θ and θ̇.
(7) Go to (2) until a fixed number of iterations is completed.

After the first phase, though the correct output value is not known, its sign can be determined; this produces a set of rules that yield the correct sign (see Fig. 8(b)). The set of rules is still inconsistent, because rules with the same antecedent but different consequents remain.

Phase 2: Optimizing each group of rules with the same antecedent.
Suppose there are R remaining rules after the first phase. We divide them into m groups R¹, R², ..., Rᵐ, where each group Rⁱ (1 ≤ i ≤ m) has an identical antecedent. Rⁱ_j denotes one of the rules in group Rⁱ (1 ≤ j ≤ l_i), where l_i is the total number of rules in the group and may differ from group to group.
(1) Obtain the state variables of the system (θ and θ̇).
(2) For each group Rⁱ, randomly select one rule node Rⁱ_j; thus m rules are selected from the m groups.
(3) The FNNC module computes the output Zⁱ_j = μ⁻¹(ωⁱ_j) for each selected rule Rⁱ_j, and F for the m selected rules.
(4) The RFE module computes E and v.
(5) Compute the rule error ERR(Rⁱ_j) for each Rⁱ_j:

    ERR(Rⁱ_j) = [ωⁱ_j / Σ_{k=1}^{m} ωᵏ] · (1 − v) · |Zⁱ_j − F| / (F_max − F_min),    (5)

where ωᵏ (k = 1, ..., m) is the firing strength of the rule selected in group Rᵏ, Zⁱ_j is the crisp output of the rule Rⁱ_j, and F_max and F_min are the maximum and minimum possible values of the output F respectively. ERR(Rⁱ_j) thus depends on the normalized firing strength, the error of the whole system, and a factor corresponding to the normalized difference between the rule output and the output of the whole network.
(6) Accumulate ERR(Rⁱ_j) for each Rⁱ_j.
(7) The SSP module computes the final output F_out based on F and v and forwards it to the CPBS for control, to get new system state variables θ and θ̇.
(8) Go to (1) and start again until a fixed number of iterations is performed.
(9) For each group Rⁱ (1 ≤ i ≤ m), only the node with the least mean error value remains; all other rule nodes
are deleted. At the end of the second phase, only one rule remains from each group of rules with identical antecedents; thus, a consistent network is produced (see Fig. 8(c)).

4  Computer Simulation of RBFNNC

The RBFNNC function is implemented in computer simulation software. Fig. 4 shows one of the user interfaces of the software, which can perform rule generation. All the parameters, including the initial angle, the masses of the cart and the pole, the length of the pole, the simulation time and the sample time, etc., can easily be adjusted according to the requirements. The membership functions designed for the state variables θ and θ̇, the output F and the error E are shown in Figs. 5, 6 and 7 respectively.
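The two-phase rule-generation procedure of Section 3 can be sketched as follows. The rule encoding and helper names are assumptions made for illustration, and the rule-error form follows Eq. (5) as reconstructed: normalized firing strength × a system-error factor (1 − v) × the normalized difference between the rule output and the network output.

```python
import math
import random
from collections import defaultdict

# Sketch of the two-phase rule generation of Section 3. The rule
# representation and helper names are illustrative assumptions.

F_MAX, F_MIN = 20.0, -20.0   # assumed output range for normalization

def phase1_prune(rules, z, f_opt_sign, omega, threshold=1e-3):
    """Phase 1, step (5): delete rules whose output sign opposes
    sgn(F_opt), or whose firing strength is below a threshold."""
    return [r for r in rules
            if math.copysign(1.0, z[r]) == f_opt_sign
            and omega[r] >= threshold]

def rule_error(omega_j, omega_sum, v, z_j, f):
    """Eq. (5): normalized firing strength x system-error factor x
    normalized rule/network output difference."""
    return (omega_j / omega_sum) * (1.0 - v) * abs(z_j - f) / (F_MAX - F_MIN)

def phase2_select(groups, trials, rng):
    """Phase 2: accumulate ERR for one randomly selected rule per group
    and keep the rule with the least mean error in each group.
    Each trial supplies (omega, z, F, v) for one control iteration."""
    err_sum = defaultdict(float)
    err_cnt = defaultdict(int)
    for omega, z, f, v in trials:
        chosen = [rng.choice(group) for group in groups]
        omega_sum = sum(omega[r] for r in chosen)
        for r in chosen:
            err_sum[r] += rule_error(omega[r], omega_sum, v, z[r], f)
            err_cnt[r] += 1
    return [min(group, key=lambda r: err_sum[r] / err_cnt[r]
                if err_cnt[r] else float("inf"))
            for group in groups]

# Example: prune a three-rule base when the optimal force is positive.
kept = phase1_prune(["a", "b", "c"],
                    z={"a": 5.0, "b": -3.0, "c": 2.0},
                    f_opt_sign=1.0,
                    omega={"a": 0.5, "b": 0.4, "c": 1e-6})

# Example: over repeated trials, the rule whose output tracks the network
# output accumulates less error and survives its group.
trials = [({"r1": 0.5, "r2": 0.5}, {"r1": 1.0, "r2": 9.0}, 1.0, 0.0)] * 40
best = phase2_select([["r1", "r2"]], trials, random.Random(0))
```

Rules never selected in phase 2 get an infinite mean error here, so only rules actually evaluated can survive; this mirrors step (9), which keeps the node with the least mean accumulated error per group.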
[Figure placeholder: simulation user interface showing a temporary rule base over angle (NE, ZE, PO) and velocity, rule-base selection and display, work mode, and simulation parameters (initial angle, simulation time, goal angle, tuning timer)]
Fig. 4  One of the user interfaces for computer simulation of RBFNNC

[Figure placeholder]
Fig. 5  Fuzzy subset definitions for θ and θ̇

[Figure placeholder]
Fig. 6  Fuzzy subset definitions for force F

[Figure placeholder]
Fig. 7  Fuzzy subset definitions for error E
[Figure placeholder: network topologies over the angle and velocity labels]
Fig. 8  Topology of the fuzzy neural network during rule generation: (a) full fuzzy neural network, (b) rule base after first-phase learning, (c) rule base after second-phase learning
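The CPBS plant on which the generated rule bases are tested can be simulated with the standard cart-pole pole-balancing dynamics. The physical parameters and the Euler integration step below are illustrative assumptions, since the paper exposes them only through the simulation software's settings (Fig. 4).

```python
import math

# Standard pole-balancing dynamics for the CPBS module; the physical
# parameters and the Euler step are illustrative assumptions.

G = 9.8          # gravity (m/s^2)
M_CART = 1.0     # mass of the cart (kg)
M_POLE = 0.1     # mass of the pole (kg)
L = 0.5          # half-length of the pole (m)
DT = 0.02        # integration time step (s)

def cpbs_step(theta, theta_dot, x, x_dot, force):
    """One Euler step of the cart-pole system driven by force F_out.
    theta > 0 means the pole leans to the right (Fig. 3 convention)."""
    total = M_CART + M_POLE
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    temp = (force + M_POLE * L * theta_dot ** 2 * sin_t) / total
    theta_acc = (G * sin_t - cos_t * temp) / (
        L * (4.0 / 3.0 - M_POLE * cos_t ** 2 / total))
    x_acc = temp - M_POLE * L * theta_acc * cos_t / total
    return (theta + DT * theta_dot, theta_dot + DT * theta_acc,
            x + DT * x_dot, x_dot + DT * x_acc)

# With no force an upright pole is in equilibrium; a tilted pole falls.
state = (0.05, 0.0, 0.0, 0.0)
for _ in range(10):
    state = cpbs_step(*state, force=0.0)
```

The controller's job in the simulation is exactly to supply a force at each step that keeps θ near zero and, secondarily, the cart position x near the center.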
A set of nine rules, shown in Fig. 9, is used in the RFE module to predict the error E; the reinforcement v can then be determined from E according to Equation (4). The process of rule generation is clearly observed through the change of the topology of the fuzzy neural network, as shown in Fig. 8. Fig. 8(a) shows the full network consisting of all possible rules before learning. Fig. 8(b) shows the topology of the network after the first phase of learning: all the rules that have produced a counterproductive result have been deleted. In this example, RBFNNC is executed for about 200 iterations in the first phase. Fig. 8(c) shows the final topology of the network after the second phase of learning; a set of optimized rules is therefore easy to obtain from the network, and is then fed to the CPBS for control simulation. In the second phase RBFNNC is run for about 500 iterations. At the end of this learning process, the control system is able to balance the pole in most situations. The rule base is not always the same; it depends on the parameters of the learning process.

            θ̇ = NE   θ̇ = ZE   θ̇ = PO
    θ = NE    NM        NS        ZE
    θ = ZE    NS        ZE        PS
    θ = PO    ZE        PS        PM

Fig. 9  A set of fuzzy rules for inferring to obtain E

Fig. 10 shows a group of different sets of fuzzy rules generated by RBFNNC that can successfully control a cart-pole balancing system. The experiment was repeated for different initial conditions of θ. In a successful trial, the controller learned to maintain the pole along the vertical axis, and the position of the cart was also maintained close to the center. The set of rules in Fig. 10(a) corresponds exactly to the topology of the network in Fig. 8(c).

[Figure placeholder: four rule tables (a)-(d) over θ and θ̇]
Fig. 10  A group of different sets of fuzzy rules generated by RBFNNC

References
1  Patrikar A., Provence J., Control of dynamic systems using fuzzy logic and neural networks, International Journal of Intelligent Systems, 1993, 8: 727-748
2  Kosko B., Neural Networks and Fuzzy Systems, Prentice-Hall, Englewood Cliffs, NJ, 1991: 225-228, 304-306, 327-335
3  Berenji H. R., Khedkar P., Learning and tuning fuzzy logic controllers through reinforcements, IEEE Transactions on Neural Networks, 1992, 3(5): 724-740
4  Nauck D., Kruse R., NEFCON-I: An X-window based simulator for neural fuzzy controllers, Proc. IEEE Int. Conf. Neural Networks, IEEE World Congress on Computational Intelligence (WCCI'94), Orlando, Jun. 26-Jul. 2, 1994
5  Nauck D., Kruse R., A fuzzy neural network learning fuzzy control rules and membership functions by fuzzy error backpropagation, Proc. IEEE Int. Conf. Neural Networks (ICNN'93), San Francisco, Mar. 28-Apr. 1, 1993: 1022-1027