ISSN 0005-1179, Automation and Remote Control, 2018, Vol. 79, No. 2, pp. 327–336. © Pleiades Publishing, Ltd., 2018.
Original Russian Text © M.Yu. Ryabchikov, E.S. Ryabchikova, 2018, published in Avtomatika i Telemekhanika, 2018, No. 2, pp. 154–166.
INTELLECTUAL CONTROL SYSTEMS, DATA ANALYSIS
Self-Tuning of a Neural Network Controller with an Integral Estimate of Contradictions between the Commands of the Learning Algorithm and Memory

M. Yu. Ryabchikov*,a and E. S. Ryabchikova*,b

*Nosov Magnitogorsk State Technical University, Magnitogorsk, Russia
e-mail: a mr [email protected], b [email protected]

Received August 26, 2016
Abstract—We propose an approach to organizing self-tuning for a controller based on an artificial neural network that uses information on the contradictions arising, in the creation of the control signal value, between the accumulated memory of the neural network and the learning algorithm based on backpropagation. The activity of the neural network memory is estimated as its reaction to a change in the state of the control system. Self-tuning is done by controlling the learning rate coefficient with an integral controller in order to stabilize the integral criterion that estimates the contradictions. Based on modeling results, we show a conceptual possibility for the operation of the self-tuning system with constant tuning parameters in a wide range of changes of the control object's dynamical properties.

Keywords: artificial neural network, control, backpropagation, self-tuning, control system memory

DOI: 10.1134/S000511791802011X
1. INTRODUCTION

Over the last decades, control technologies with elements of artificial intelligence, in particular technologies based on artificial neural networks (ANN), have become increasingly popular. Problems of applying ANNs to control have been considered in many works of both Russian and foreign authors. A rather comprehensive survey of the history of this research direction in the works of Russian authors is given in [1], which contains a retrospective analysis of the research done at the St. Petersburg State Electrotechnical University. All proposed approaches can be subdivided into two classes by the presence or absence of information regarding the properties of the model (object, system as a whole, reference model of desired properties, and so on). One of the most well-developed approaches, in the case when a mathematical description of nonlinear dynamical systems is available, is tuning neural network controllers based on the method of analytic construction of aggregated nonlinear controllers (ACAC) [2]. Synthesis of ANN learning algorithms includes the choice of the same macrovariables for computing generalized learning errors as in the ACAC approach [1, 2]. The main paradigm of this research direction presumes that learning processes in a multilayer neural network, the solution of stable extremals equations, and the control of a multidimensional multiconnected object form a unified dynamical process.
A drawback of this approach is the impossibility of designing the two main components of the intelligent control system, namely the neural network memory and the algorithm for its further tuning, separately. The neural network memory may preliminarily contain dependencies, obtained from a model, of both the dynamic and static properties of the system. A number of approaches are known for integrating models into the control circuit in order to solve various applied problems; see [3–6] and other sources. However, it is unclear whether it is possible to create control systems based on integrating pretrained ANNs with algorithms that continue learning. Such systems may be implemented both in the form of augmenting the ANN memory with new data and in the form of updating it with, e.g., the classical learning algorithm based on backpropagation. Obviously, in the first case one needs either dynamic updates of the training dataset with sampling algorithms [7] or the use of special fine-tuning algorithms [8, 9]. In the absence of information on the properties of the object there is no way to choose the macrovariables in advance in order to compute learning errors, and one has to analyze the system's response to constructed controlling influences. The work [10] surveys the reasons for the need for such influences from the point of view of practical aspects of the development and application of controlling microprocessor equipment. Methods of influencing the system can be classified by frequency range. For instance, Siemens S7-300/400/1200/1500 controllers, under the assumption that the object's dynamical properties correspond to the temperature type, implement a self-tuning algorithm based on obtaining a part of the transient response curve [3].
Remicont controllers implement a method [11, 12] that allows the system to have frequency characteristics with one resonance peak and is designed to find the complex frequency characteristic of the object in the neighborhood of the resonance frequency with an iterative procedure. The work [13] proposes to feed the object's input with two sample harmonics in the region of comparatively high frequencies in order to estimate model parameters that are further used to predict frequency properties of the object while tracking its drifting parameters. The work [14] considers a neural network control system for multiconnected dynamical objects based on accounting for control errors during learning. The main computational relations in the learning of output layer neurons have the form

    w^l_ij(t) = w^l_ij(t − δ) − γ (∂E/∂w^l_ij) δ,    (1)

    ∂E/∂w^(2)_ij = Σ_{k=1}^{n} α_k e_k (∂y_k/∂u_i) f′(z^(2)_i) o^(1)_j,    i = 1, n,    j = 0, N1,

where e_k and α_k are the control error and the weight coefficient of the kth output variable; ∂y_k/∂u_i is the derivative of the kth output variable with respect to the ith input influence; z^(2)_i is the superposition of input signals of the ith output layer neuron; f′ is the derivative of the activation function; o^(1)_j is the output value of the jth first layer neuron; γ is the learning rate coefficient; w^l_ij(t) is the weight coefficient of the connection between the jth neuron of layer (l − 1) and the ith neuron of layer l; δ is the time discretization step.

Formula (1) shows that the value of the correction to the weight coefficients, which influences the form of the transition process, depends on the learning rate coefficient γ for a given discretization step δ. However, the work [14] did not propose an approach for choosing γ. In [2], to solve the neural network control problem the authors proposed a fast backpropagation algorithm and a learning algorithm with error prediction. The authors admit the great importance of both γ and the initial values of the weight coefficients in the ANN, but their choice remains arbitrary.
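As an illustration, the online update (1) can be sketched in NumPy. The function name, the tanh activation, and the array shapes below are our assumptions for the sketch, not taken from [14]:

```python
import numpy as np

def update_output_weights(w, o_prev, z, e, alpha, dy_du, gamma, delta):
    """One online correction of output-layer weights, following Eq. (1).

    w      -- (n, N1 + 1) weight matrix of the output layer
    o_prev -- (N1 + 1,)   first-layer outputs o_j (including a bias term)
    z      -- (n,)        input superpositions z_i of the output neurons
    e      -- (n,)        control errors e_k
    alpha  -- (n,)        weights alpha_k of the output variables
    dy_du  -- (n, n)      object sensitivities dy_k / du_i
    """
    dphi = 1.0 - np.tanh(z) ** 2                  # f'(z) for a tanh activation (assumed)
    s = dy_du.T @ (alpha * e)                     # s_i = sum_k alpha_k e_k dy_k/du_i
    grad = (s * dphi)[:, None] * o_prev[None, :]  # dE/dw_ij
    return w - gamma * grad * delta               # w(t) = w(t - delta) - gamma dE/dw delta
```

Note that the correction applied per step scales with the product γδ, which is why the text below treats δγ as the effective tuning quantity.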
2. INFLUENCE OF THE LEARNING RATE COEFFICIENT AND INITIAL VALUES OF THE WEIGHT COEFFICIENTS IN A NEURAL NETWORK ON MEMORY FORMATION

For a generalized estimate of the current memory state we need to introduce corresponding criteria. The form of such criteria can be defined by the accepted equivalent scheme of the ANN learning process. The work [2] proposed such schemes based on nonlinear feedbacks, which followed from the assumption that there exists a known reference control. However, when the reference is not available and the memory is considered as a separate structure designed to solve control problems, this approach is hard to apply. At the same time, the capabilities of modern microprocessors let us quickly perform various operations with a neural network's memory, including copying, saving, and loading. Under these conditions, the control increment of a neural network controller, U(t) − U(t − δ), can be represented as a sum of two separate values (Fig. 1). The first value, ΔU1, is the control constructed by the neural network's memory as a response to the change of the system state estimates X over the period δ, with no regard for the correction of the weight coefficients at the current time step. The second value, ΔU2, is the control caused by the correction of the weight coefficients. For the scheme in Fig. 1 we can write

    U(t) − U(t − δ) = ΔU1(t) + ΔU2(t) = (Y1 − Y2) + (Y3 − Y1) = Y3 − Y2,    (2)
where the ANN outputs are: U(t) = Y3, after weight correction on the current time step for the current X; U(t − δ) = Y2, before weight correction on the current time step for X from the previous time step; Y1, before weight correction on the current time step for the current X.

The value ΔU1 lets us evaluate the state of ANN memory. As the estimate we can use the variance D(ΔU1) over a given number of the latest time steps starting from the current time moment. Figure 2 shows sample changes in D(ΔU1) (Fig. 2a) and D(ΔU2) (Fig. 2b) for an ANN that controls an object with dynamics (1/(20s + 1))(1/(2s + 1)) for δ = 0.01 s and δγ = 1.5 × 10^−4. Initial values of the weight coefficients were chosen uniformly at random from the interval [−a; a], where a = 0.1. The disturbance is an automatic step-like change in the reference signal with period Ts = 400 s. This kind of disturbance was chosen in order to more conveniently compare quality estimates for transition processes. The ANN inputs include the task, the object output, and its increment over period δ.

The system's operation time can be provisionally divided into two periods. During the first period ΔT1, for small values D(ΔU1) < 0.1 D(ΔU2), the ANN's memory influences how the transition process goes only implicitly, via its interaction with the learning algorithm. During the second period, ΔU1 and ΔU2 jointly influence the control results, which in the considered example is accompanied by the growth of oscillations in the transition processes (Figs. 2c and 2d) with subsequent loss of stability (the ANN's output becomes saturated).

Fig. 1. Decomposition scheme for the control signal of a neural network controller.

Fig. 2. Sample changes in (a) D(ΔU1) and (b) D(ΔU2), and also (c) and (d) quality of transition processes when the control system is operating: 1—controlled parameter; 2—controlling signal; 3—task.

Fig. 3. Dependence of the probability of initial stability and average duration of ΔT1 on the learning rate for different values of a: 1—0.2; 2—0.4; 3—0.8; 4—1.2; 5—1.

Reducing the learning rate γ and the initial level of the coefficients, defined by the limits a of their generation, prolongs the period ΔT1 (Fig. 3a). Figure 3b shows how the probability of initial stability (over the period from the start to 2Ts) depends on γ and a. It implies that it is possible to change γ in a wider range at the initial tuning stage by reducing the initial level of the weight coefficients while preserving initial stability. A change in the learning rate γ influences both the quality of transition processes and the ANN memory contents, and we have to choose a criterion to evaluate them.
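Since the ANN memory can be copied and re-evaluated cheaply, the decomposition (2) can be computed directly at each step. The sketch below assumes a generic `net(x, w)` evaluator and is our illustration of the scheme, not the authors' code:

```python
import numpy as np
from collections import deque

def decompose_control(net, w_prev, w_new, x_prev, x_curr):
    """Split U(t) - U(t - delta) into dU1 (memory reaction to the state
    change) and dU2 (effect of the current weight correction), per Eq. (2)."""
    y1 = net(x_curr, w_prev)  # before weight correction, current X
    y2 = net(x_prev, w_prev)  # before weight correction, previous X = U(t - delta)
    y3 = net(x_curr, w_new)   # after weight correction,  current X  = U(t)
    return y1 - y2, y3 - y1   # dU1, dU2; their sum equals y3 - y2

class SlidingVariance:
    """D(dU1) over a given number of the latest time steps."""
    def __init__(self, n_steps):
        self._buf = deque(maxlen=n_steps)

    def update(self, value):
        self._buf.append(value)
        return float(np.var(self._buf))
```

Tracking `SlidingVariance` for both ΔU1 and ΔU2 reproduces the comparison of D(ΔU1) and D(ΔU2) used above to delimit the period ΔT1.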
3. EVALUATION CRITERIA FOR MEMORY CONTENTS

A complex estimate of the memory contents is a typical statistical modeling problem. The model's adequacy can be evaluated by comparing learning and generalization errors [15, 16]. Due to the decomposition scheme shown in Fig. 1, an estimate of the memory contents can be based on comparing ΔU1 and ΔU2. This correspondence can take into account both the absolute deviation of the values and the similarity between their changes over time. With regard to the problem of constructing a self-tuning system, controlling the learning process in order to minimize the absolute deviation between ΔU1 and ΔU2 in our opinion has no practical relevance, since it simply leads to memorizing a certain preconstructed reference control. Therefore, as the evaluation criterion for the memory contents we have taken a coefficient that reflects the similarity between them:

    R(t) = Σ_{n=0}^{N} (ΔU1(t − nδ) ΔU2(t − nδ) > 0),    (3)

where t is the current time and N is the number of averaging cycles, chosen from the condition Nδ ≫ Ts; the parenthesized inequality counts as 1 when it holds and 0 otherwise. Figure 4 shows sample changes over time of the coefficient R(t) (Fig. 4a) and overregulation (Fig. 4b) for constant γ. We see a high degree of correspondence between the changes of R(t) and overregulation while the memory state changes.
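Treating the parenthesized condition in (3) as an indicator, the criterion can be sketched as follows; the normalization by window length is our choice for the sketch, so that the value lies in [0, 1]:

```python
def sign_agreement(du1_window, du2_window):
    """Fraction of recent steps on which dU1 and dU2 act in the same
    direction -- an indicator-sum version of criterion (3)."""
    pairs = list(zip(du1_window, du2_window))
    agree = sum(1 for a, b in pairs if a * b > 0)  # (dU1 * dU2 > 0) counts as 1
    return agree / len(pairs)
```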
Fig. 4. Changes in R(t) and overregulation in ANN tuning.
It would be interesting to study the possibility of controlling the quality of transition processes based on stabilizing the criterion R(t), including during the tuning period ΔT1.

4. CONTROL OVER THE QUALITY OF TRANSITION PROCESSES

Control over the value of R(t) can be done both by influencing γ while minimizing the deviation and based on the correction of the learning rate γu with respect to the deviation ΔU2 − ΔU1. In numerical experiments, we combined these two approaches on every time step. The changes in γ and γu were done with integral controllers Ki/s with values of Ki equal to 10^−7 and 2 × 10^−4, respectively. Initial values of δγ and δγu were taken to be 5 × 10^−5 and 0.15. The given value of R(t) for the integral controllers was taken to be equal to 0.3. We performed computational experiments for three versions of the dynamical properties of the control object:

    W1 = 1/((4s + 1)(2s + 1)),    W2 = 1/((20s + 1)(2s + 1)),    W3 = 1/((100s + 1)(2s + 1)).
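The learning-rate adaptation can be written as a plain discrete integral controller Ki/s driven by the deviation of R(t) from its setpoint 0.3. The loop below is a schematic reconstruction under our assumptions (in particular, the sign of the correction depends on the loop and is part of the sketch), not the authors' code:

```python
def integral_controller(ki, setpoint, initial):
    """Discrete integral controller Ki/s: the integrator state itself is
    the tuned quantity (here, the learning rate)."""
    state = {"u": initial}

    def step(measured, delta):
        state["u"] += ki * (setpoint - measured) * delta
        return state["u"]

    return step

# Two such controllers were combined in the experiments: one driving the
# learning rate (Ki = 1e-7) and one driving gamma_u (Ki = 2e-4), both with
# the R(t) setpoint 0.3 and the initial values quoted in the text.
adapt_gamma = integral_controller(ki=1e-7, setpoint=0.3, initial=5e-5)
```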
Fig. 5. Distributions of overregulation values after stabilization of R(t) for the following objects: 1—W1 ; 2—W2 ; 3—W3 .
In computational experiments with W1, W2, and W3 we used the same set of fifty variants of the initial values of the weight coefficients of an ANN with one hidden layer of four neurons. For the versions of the system with W1 and W2 we took the disturbance parameter Ts = 400 s; for W3, Ts = 1400 s. According to these results, sixteen of the initial ANNs ensure stable transition processes with respect to R(t) for all three versions of the dynamical properties of the control object. Figure 5 shows the distributions of overregulation values after the stabilization of R(t) for the said ANNs. Comparing ANN coefficients in the groups with stable and unstable processes has shown no connection with the average value or the variance of the weight coefficients. To compare the dependencies implemented by the group of ANNs with stable processes, we enumerated possible combinations of the ANN input signals and computed the correlation matrix between the resulting controlling signals (Table 1). These results indicate that we can distinguish three groups of ANNs with different types of initial dependencies.

Table 1. Correlation matrix between the resulting controlling signals
(Group 1: ANNs 32, 11, 1, 23, 25, 34, 40, 14, 45; Group 2: ANNs 12, 13, 17, 29; Group 3: ANNs 9, 48, 38)

ANN      32     11      1     23     25     34     40     14     45     12     13     17     29      9     48     38
nos.
 32    1.00   0.85   0.74   0.83   0.97   0.90   0.67   0.43   0.27  –0.21  –0.23  –0.43  –0.47  –0.69  –0.87  –0.95
 11    0.85   1.00   0.98   0.96   0.94   0.90   0.95   0.84   0.65   0.32   0.29   0.11   0.02  –0.30  –0.50  –0.65
  1    0.74   0.98   1.00   0.95   0.87   0.86   0.99   0.92   0.77   0.49   0.46   0.28   0.21  –0.18  –0.33  –0.50
 23    0.83   0.96   0.95   1.00   0.95   0.97   0.95   0.81   0.75   0.34   0.33   0.06   0.10  –0.45  –0.46  –0.62
 25    0.97   0.94   0.87   0.95   1.00   0.97   0.83   0.63   0.50   0.04   0.02  –0.22  –0.23  –0.61  –0.72  –0.84
 34    0.90   0.90   0.86   0.97   0.97   1.00   0.86   0.66   0.63   0.14   0.14  –0.16  –0.07  –0.65  –0.61  –0.74
 40    0.67   0.95   0.99   0.95   0.83   0.86   1.00   0.95   0.86   0.58   0.56   0.35   0.32  –0.16  –0.23  –0.41
 14    0.43   0.84   0.92   0.81   0.63   0.66   0.95   1.00   0.89   0.78   0.76   0.62   0.55   0.15   0.06  –0.13
 45    0.27   0.65   0.77   0.75   0.50   0.63   0.86   0.89   1.00   0.84   0.84   0.61   0.72   0.01   0.23   0.04
 12   –0.21   0.32   0.49   0.34   0.04   0.14   0.58   0.78   0.84   1.00   1.00   0.94   0.94   0.54   0.66   0.50
 13   –0.23   0.29   0.46   0.33   0.02   0.14   0.56   0.76   0.84   1.00   1.00   0.93   0.96   0.52   0.68   0.53
 17   –0.43   0.11   0.28   0.06  –0.22  –0.16   0.35   0.62   0.61   0.94   0.93   1.00   0.91   0.80   0.79   0.68
 29   –0.47   0.02   0.21   0.10  –0.23  –0.07   0.32   0.55   0.72   0.94   0.96   0.91   1.00   0.56   0.84   0.72
  9   –0.69  –0.30  –0.18  –0.45  –0.61  –0.65  –0.16   0.15   0.01   0.54   0.52   0.80   0.56   1.00   0.77   0.76
 48   –0.87  –0.50  –0.33  –0.46  –0.72  –0.61  –0.23   0.06   0.23   0.66   0.68   0.79   0.84   0.77   1.00   0.98
 38   –0.95  –0.65  –0.50  –0.62  –0.84  –0.74  –0.41  –0.13   0.04   0.50   0.53   0.68   0.72   0.76   0.98   1.00

Remark: high correlations (higher than 0.5) are shown in dark.
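The grouping behind Table 1 can be reproduced by feeding every network the same enumerated combinations of input signals and correlating the resulting control signals; a minimal sketch (the stand-in nets here are our illustration):

```python
import numpy as np

def controller_correlations(nets, input_grid):
    """Correlation matrix between the controlling signals of several ANNs
    evaluated on the same enumerated combinations of input signals."""
    outputs = np.array([[net(x) for x in input_grid] for net in nets])
    return np.corrcoef(outputs)  # (len(nets), len(nets)) symmetric matrix
```

Clustering the rows of this matrix (e.g., by thresholding correlations above 0.5, as in the remark to Table 1) yields the three groups of initial dependencies.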
Fig. 6. (a) Transition processes with respect to R(t) and (b) the corresponding change in the overregulation and (c) and (d) learning rates for the following objects: 1—W1; 2—W2; 3—W3.
Figure 6 shows transition processes with respect to R(t) when tuning an ANN from the first group, and also the corresponding changes in overregulation and learning rates. We see that, for various properties of the object, in all cases the control system after stabilization of R(t) ensures overregulation of 16–18%. Table 2 shows the limits of change of the object's transfer coefficient Kou for which stable transition processes with respect to R(t) are preserved for the considered group of ANNs under different object dynamics.

Table 2. Admissible limits of change of Kou

Kou         W1     W2     W3
Maximum     20     15     1.5
Minimum*    0.5    0.05   0.025

* To obtain the minimum we reduced the amplitude of disturbances while keeping the limits of the ANN control signal constant.
Fig. 7. (a) Influence of Ts on the established learning rate δγ (1—W1; 2—W2; 3—W3); (b) influence of the number of averaging cycles N in (3) on the process with respect to δγ (1—N = 10^2; 2—N = 2500; 3—N = 10^4) for W2 and Ts = 50 s.
It is interesting to study the influence of the frequency characteristics of the disturbances on the supported values of the learning rate δγ. Figure 7a shows the influence of Ts on the established learning rate δγ. We see that as the disturbances move to a high-frequency range compared to the assumed object properties, the established value of δγ quickly drops. This situation is characteristic for all ANNs except one that belongs to the third group; for this ANN we observe a short period ΔT1 and intensive growth of D(ΔU1) with subsequent loss of stability. Figure 7b shows an example of how the number of averaging cycles N in (3) influences the process with respect to δγ. Changing Nδ in a wide range preserves the stability of the transition process with respect to R(t).

Our results on the admissible range of changes in the dynamic and static properties of the control object, and on the stability of tuning results in a sufficiently wide range of frequencies of acting disturbances, indicate that the proposed approach to constructing self-tuning control systems, based on comparing the influences of the learning algorithm and the memory constructed during the generalization process, is feasible.

5. COMPARING CONTROL RESULTS FOR THE TUNED NEURAL NETWORK CONTROLLER WITH A PID CONTROLLER

Figure 8a shows sample transition processes in the control system with a neural network controller and object W2 for Kou = 0.05 and Kou = 15 after stabilization of δγ, and also in the control system with the same object and an integral controller. We see that the processes have similar quality parameters. When we use a PID controller for control object W2, the speed of transition processes is limited only by the accepted limit values of the control signal [−100; 100], which are also used for the neural network controller. The quickest process without overregulation is shown in Fig. 8b.
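For reference, the PID comparison assumes a standard discrete PID with the same output saturation [−100; 100] as the neural network controller. The gains below are placeholders, since the paper does not report its PID tuning values:

```python
def make_pid(kp, ki, kd, delta, u_min=-100.0, u_max=100.0):
    """Discrete PID controller with output clipped to [u_min, u_max]."""
    state = {"i": 0.0, "e_prev": 0.0}

    def step(error):
        state["i"] += error * delta                 # integral term
        d = (error - state["e_prev"]) / delta       # derivative term
        state["e_prev"] = error
        u = kp * error + ki * state["i"] + kd * d
        return min(u_max, max(u_min, u))            # same limits as the ANN controller
    return step
```

With a large kd tuned for one value of the object's transfer coefficient, re-running the loop after the coefficient changes reproduces the excessively differential behavior discussed below.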
Obviously, the PID controller lets one significantly increase the speed of transition processes compared to the neural network and integral controllers under the same constraints on control signal levels. Figure 8b also shows transition processes in a system with a PID controller as Kou reduces from 15 to 0.05 with preliminary tuning for Kou = 15, and also as Kou increases from 0.05 to 15 with
Fig. 8. Transition processes in the system for object W2 with: 1—neural network controller for Kou = 0.05; 2—neural network controller for Kou = 15; 3—integral controller; 4—PID controller with tuning for the quickest process without overregulation; 5—PID controller as Kou reduces from 15 to 0.05 with tuning for Kou = 15; 6—PID controller as Kou increases from 0.05 to 15 with tuning for Kou = 0.05.
preliminary tuning for Kou = 0.05. We see that a change in Kou negatively influences the transition processes, causing long, delayed processes or an excessively differential control action that leads to high-frequency oscillations of the controlled parameter.

The relatively low performance of the system with a neural network controller can be related to the learning algorithm that we used for the neural network, and also to its structure. Therefore, an important topic for further work is to study the possible results of using, in the proposed self-tuning approach, more advanced algorithms for neural network training such as, for instance, the Levenberg–Marquardt algorithm, which significantly outperforms backpropagation in terms of learning speed.

6. CONCLUSION

To use the proposed approach in practice to create a self-tuning control system, one has to optimize the control system for R(t) in order to ensure the maximal possible tuning performance. We have to note that our results regarding successful tuning were obtained on the period ΔT1, where the memory contents influence the transition process implicitly, via an interaction with the learning algorithm. Due to this feature, it makes sense to reduce the initial level of the weight coefficients to ensure initial stability and increase the duration of ΔT1, which can, however, negatively influence the duration of tuning (the transition process with respect to R(t)).

An important feature of the proposed approach is the possibility of temporarily stopping the process of memory updates while continuing the control process. To do that, we need to integrate the values of ΔU2 while loading, at the next step of ANN operation, the weights from the previous time step without correcting them (keeping the memory unchanged).

REFERENCES

1. Upravlenie i informatsionnye tekhnologii. Nauka i obrazovanie (Control and Information Technologies. Research and Education), Shestopalov, M.Yu., Ed., St. Petersburg: "LETI," 2015.
2.
Terekhov, V.A., Efimov, D.V., and Tyukin, I.Yu., Neirosetevye sistemy upravleniya (Control Systems with Neural Networks), Moscow: Vysshaya Shkola, 2002.
3. Ryabchikov, M.Yu., Andreev, S.M., and Ryabchikova, E.S., Algoritmy i sposoby samonastroiki sredstv regulirovaniya v sovremennykh mikroprotsessornykh kontrollerakh (Algorithms and Methods for Self-Tuning of Control Methods in Modern Microprocessor Controllers), Magnitogorsk: Magnitogor. Gos. Tekhn. Univ., 2011.
4. Beinarovich, V.A., Self-Tuning Systems with a Reference Model, Tekhn. Sist. Dokl. TUSUR, no. 1 (21), part 1, 2010, pp. 67–69.
5. Denisenko, V.V., PID Controllers: Design and Modification Principles, STA, 2006, no. 4, pp. 66–74.
6. Chernodub, A.N. and Dzyuba, D.A., A Survey of Neural Control Methods, Probl. Programmir., 2011, no. 2, pp. 79–94.
7. Ryabchikov, M.Yu., Ryabchikova, E.S., and Sunargulova, A.I., Sampling Technological Information in the Control of Metallurgic Processes, Avtomatiz. Tekhnol. Proizvod., 2016, no. 2 (12), pp. 34–40.
8. Dmitrienko, V.D. and Zakovorotnyi, A.Yu., Solving the Fine-Tuning Problem for Classical Neural Networks, Avtomatiz. Tekhnol. Proizvod., 2015, no. 4 (10), pp. 32–40.
9. Dmitrienko, V.D., Zakovorotnyi, A.Yu., and Brechko, V.A., A Three-Layer Perceptron Able to Fine-Tune, Avtomatiz. Tekhnol. Proizvod., 2014, no. 6, pp. 12–21.
10. Adaptivnye reguliruyushchie kontrollery (Adaptive Controllers). http://st07.ru/eldgt/asu/3/s/index39.html (Accessed July 26, 2016.)
11. Rotach, V.Ya., Teoriya avtomaticheskogo upravleniya (Automated Control Theory), Moscow: Izdatel'skii Dom "MEI," 2008.
12. Rotach, V.Ya., Avtomatizatsiya nastroiki sistem upravleniya (Automating the Tuning of Control Systems), Moscow: Energoatomizdat, 1984.
13. Mazurov, V.M. and Kondrat'ev, V.V., An Adaptive PID Controller with Frequency Separation of Control and Self-Tuning Channels, Prib. Sist. Upravlen., 1995, no. 1, pp. 33–35.
14. Elizarov, I.A. and Soludanov, M.N., A Self-Tuning Neural Network Control System for Multiconnected Dynamical Objects, Inform. Protsessy Upravlen., 2006, no. 1, pp. 30–44.
15. Cortes, C., Jackel, L.D., Solla, S.A., Vapnik, V., and Denker, J.S., Learning Curves: Asymptotic Values and Rate of Convergence, Adv. Neural Inform. Proc. Syst., 1994, no. 6, pp. 327–334.
16. Cortes, C., Jackel, L.D., and Chiang, W.-P., Limits on Learning Machine Accuracy Imposed by Data Quality, Adv. Neural Inform. Proc. Syst., 1995, no. 7, pp. 239–246.
This paper was recommended for publication by V.I. Vasil'ev, a member of the Editorial Board.