Bioprocess Engineering 7 (1991) 77-82
© Springer-Verlag 1991
Bioprocess model building using artificial neural networks

C. Di Massimo, M. J. Willis, G. A. Montague, M. T. Tham and A. J. Morris, Newcastle-upon-Tyne
Abstract. Artificial neural networks are made up of highly interconnected layers of simple 'neuron-like' nodes. The neurons act as non-linear processing elements within the network. An attractive property of artificial neural networks is that, given the appropriate network topology, they are capable of learning and characterising non-linear functional relationships. Furthermore, the structure of the resulting neural network based process model may be considered generic, in the sense that little prior process knowledge is required in its determination. The methodology therefore provides a cost-efficient and reliable process modelling technique. One area where such a technique could be useful is in biotechnological systems. Here, for example, the use of a process model within an estimation scheme has long been considered an effective means of overcoming inherent on-line measurement problems. However, the development of an accurate process model is extremely time-consuming and often results in a model of limited applicability. Artificial neural networks could therefore prove to be a useful model building tool when striving to improve bioprocess operability. Two large-scale industrial fermentation systems have been considered as test cases: a fed-batch penicillin fermentation and a continuous mycelial fermentation. Both systems serve to demonstrate the utility, flexibility and potential of the artificial neural network approach to process modelling.
1 Introduction

The idea of using artificial neural networks (ANNs) as a potential solution strategy for problems which require complex data analysis is not new. Over the last 40 to 50 years, scientists have been attempting to emulate the 'real' neural structure of the brain and to develop an algorithmic equivalent of the learning process. The principal motivation behind this research is the desire to achieve the sophisticated level of information processing of which the brain is capable. The structure of the human brain, however, is extremely complex. Indeed, whilst the function of single neurons is relatively well understood, their collective role within the conglomeration of cerebral elements is less clear and a subject of much speculation. Consequently, the architecture of an ANN is based upon a primitive understanding of the functions of the biological neural system. Even if neuro-physiology could untangle the complexities of the brain, due to the
limitations of current hardware technology it would be extremely difficult, if not impossible, to emulate its immensely distributed structure exactly. Thus, rather than attempting to model the intricacies of human cerebral functions accurately, ANNs attempt to capture and utilise the connectionist philosophy on a more modest and manageable scale. Within process engineering, the areas of design and simulation; supervision, control and estimation; and fault detection and diagnosis all rely upon the effective processing of unpredictable and imprecise information. Bioprocess systems, in particular, place extremely challenging demands on these areas. To tackle such complex data processing tasks, current approaches tend to be based upon some 'model' of the process in question. The model can be qualitative knowledge derived from experience, a quantitative analytical (usually linear) process model, or a loosely integrated combination of both. Although the resultant procedures can provide acceptable solutions, there are many situations in which they are prone to failure because of the uncertainties and non-linearities intrinsic to bioprocess systems. These are, however, exactly the problems that a well-trained human decision process excels in solving. Thus, if ANNs fulfil their projected promise, they may form the basis of improved alternatives to current engineering practice. Indeed, applications of ANNs to process engineering problems have already been reported. Traditionally, one of the major obstacles to the widespread use of advanced modelling and control techniques has been the cost of model development and validation. The utility of neural networks in providing viable process models on a more cost-effective basis was demonstrated in [1], where the technique was used successfully to characterise two non-linear chemical systems as well as to interpret biosensor data.
In a related area, the use of neural network based models (NNM) for the on-line estimation of process variables was considered in [2]. The relative merits of process estimation using an adaptive linear estimator and an estimator based on an NNM, with application to a fermentation process, were further discussed in [3]. In addition, the applicability of neural networks for improving process operability was investigated in [4].

In some situations, techniques based on the use of an NNM may offer significant advantages over conventional model based techniques. For instance, if the NNM is sufficiently accurate, it could in principle be used in place of an on-line analyser. This is of particular importance in bioprocess applications, where on-line analysers may not be available to perform the required task. Such an approach could also provide more frequent measurements than can be achieved by hardware instrumentation. This is advantageous from the control viewpoint, as the feedback signals will not be subject to measurement delays; consequently, significant improvements in control performance can be expected. If an accurate NNM is available, it could also be used directly within a model based control strategy, where its particularly attractive feature is the potential to handle non-linear systems. Psaltis [5] investigated the use of a multilayered neural network processor for plant control, whilst Willis [6] proposed the use of NNMs in the synthesis of cost-function based controllers. Here, an on-line optimisation algorithm is used to determine the future inputs that will minimise the deviations between setpoints and the predicted outputs obtained from an NNM. Another promising area for ANNs is fault diagnosis and the development of intelligent control systems [7], where a robust control system must be designed to accommodate highly complex, unquantifiable data. Hoskins and Himmelblau [8] described the desirable characteristics of neural networks for knowledge representation in chemical engineering processes, and illustrated how an ANN could be used to learn and discriminate successfully amongst faults. Birky and McAvoy [9] presented an application of a neural network to learn the design of control configurations for distillation columns.
They were able to demonstrate that their approach was an effective and efficient means of extracting process knowledge.

2 Bioprocess modelling via the use of artificial neural networks

As will have been noted, in almost all of the work cited above an NNM is used in place of conventional models. The accuracy of the NNM may be influenced by altering the topology (structure) of the network. It is the topology of the network, together with the neuron processing function, that gives an ANN its powerful signal processing capabilities. Although a number of ANN architectures have been proposed [10], the 'feedforward' ANN (FANN) is by far the most widely applied. Indeed, Cybenko [11] has recently claimed that any continuous function can be approximated arbitrarily well on a compact set by a FANN comprising two hidden layers and a fixed continuous non-linearity. This result implies that a FANN can be used with confidence to model a wide range of non-linear relationships. The implications of this statement are considerable, and subsequent discussions will therefore be restricted to FANNs.
Fig. 1. Schematic of feedforward neural network architecture (input layer, hidden layer and output layer producing output y)
2.1 Feedforward artificial neural networks

The architecture of a typical FANN is shown in Fig. 1. The nodes in the different layers of the network represent 'neuron-like' processing elements. There is always an input and an output layer, and the number of neurons in each of these layers depends on the number of inputs and outputs being considered. In contrast, the number of hidden layers may vary from one to any finite number, as specified by the user, and the number of neurons in each hidden layer is also a user specification. It is the hidden layer structure that essentially defines the topology of a FANN. The neurons in the input layer do not perform data processing functions; they merely provide a means by which scaled data is introduced into the network. These signals are then 'fed forward' through the network via the connections, through the hidden layers, and eventually to the output layer. Each interconnection has associated with it a weight which modifies the strength of the signal flowing along that path. Thus, with the exception of the neurons in the input layer, the input to each neuron is a weighted sum of the outputs from the neurons in the previous layer. For example, if the information from the $i$th neuron in the $(j-1)$th layer to the $k$th neuron in the $j$th layer is $I_{j-1,i}$, then the total input to the $k$th neuron in the $j$th layer is given by:

$$\Phi_{j,k} = d_{j,k} + \sum_{i=1}^{N_{j-1}} w_{j-1,i,k}\, I_{j-1,i} \qquad (1)$$

where $d_{j,k}$ is a bias term, $w_{j-1,i,k}$ is the weight associated with each interconnection and $N_{j-1}$ is the number of neurons in the $(j-1)$th layer. The output of each node is obtained by passing the weighted sum, $\Phi_{j,k}$, through a non-linear operator. This is typically a sigmoidal function, the simplest of which has the mathematical description

$$I_{j,k} = \frac{1}{1 + \exp(-\Phi_{j,k})} \qquad (2)$$
and the response characteristics shown in Fig. 2. Although the function given by Eq. (2) has been widely adopted, in principle any function with a bounded derivative could be employed [12]. It is, however, interesting to note that a sigmoidal non-linearity has also been observed in the behaviour of human neurons [13]. Within an ANN, this function gives the network the ability to represent non-linear relationships. Note also that the magnitude of the bias term in Eq. (1) effectively determines the co-ordinate space of the non-linearity. This implies that the network is also capable of characterising the structure of the non-linearities: a highly desirable feature.

To develop a process model using the neural network approach, the topology of the network must first be declared. The convention used in referring to a network with a specific topology follows that adopted by Bremermann and Anderson [14]. For example, a FANN with 3 input neurons, 2 hidden layers with 5 and 9 neurons respectively, and 2 neurons in the output layer will be referred to as a (3-5-9-2) network.

Fig. 2. Sigmoidal function (neuron processing, 1/(1 + exp(-z)): neuron output from 0 to 1 against the sum of neuron inputs, z, from -10 to 10)

2.2 Algorithms for network training (weight selection)

Once the network topology has been specified, a set of input-output data is used to 'train' the network, i.e. to determine appropriate values for the weights (including the bias terms) associated with each interconnection. The data is propagated forward through the network to produce an output, which is compared with the corresponding output in the data set, hence generating an error. This error is minimised by making changes to the weights, which may involve many passes through the training data set. When no further decrease in error is possible, the network is said to have 'converged', and the last set of weights is retained as the parameters of the NNM. Process modelling using ANNs is therefore very similar to identifying the coefficients of a parametric model of specified order. Loosely speaking, specifying the topology of an ANN is similar to specifying the 'order' of the process model. For a given topology, the magnitudes of the weights define the characteristics of the network. However, unlike conventional parametric model forms, which have an a priori assigned structure, the weights of an ANN also define the structural properties of the model. Thus, an ANN has the capability to represent complex systems whose structural properties are unknown. Because determining the weights is usually not amenable to analytical solution, a numerical search technique is normally applied; training can therefore be regarded as a non-linear optimisation problem. The objective function for the optimisation is written as:

$$V(\Theta) = \frac{1}{2}\sum_{t} E(\Theta, t)^2 \qquad (3)$$

where $\Theta$ is a vector of network weights, $E$ is the output prediction error and $t$ is time. In this contribution the objective function is minimised using the chemotaxis algorithm [14]. This algorithm adjusts the weights by adding Gaussian distributed random values to the old weights. The new weights are accepted if the resulting prediction error is smaller than that recorded using the previous set of weights. This procedure is repeated until the reduction in error is negligible. The algorithm is summarised below:

Step I Initialise the weights with small random values.
Step II Present the inputs, and propagate the data forward to obtain the predicted outputs.
Step III Determine the cost of the objective function, $E_1$, over the whole data set.
Step IV Generate a Gaussian distributed random vector.
Step V Increment the weights with the random vector.
Step VI Calculate the objective function, $E_2$, based on the new weights.
Step VII If $E_2$ is smaller than $E_1$, then retain the modified weights, set $E_1$ equal to $E_2$, and go to Step V. If $E_2$ is larger than $E_1$, then discard the modified weights and go to Step IV.

Note that during the minimisation, the allowable variance of the increments may be adjusted to assist network convergence.

2.3 Dynamic neural networks

The FANNs discussed above merely perform a non-linear mapping between inputs and outputs. Dynamics are not inherently included within their structures, whereas in many practical situations dynamic relationships exist between inputs and outputs. In such cases, a FANN will fail to capture the essential characteristics of the system. Although dynamics can be introduced in a rather inelegant manner by making use of time histories of the data, a more attractive approach is inspired by analogies with biological systems. Studies by Holden [13] suggest that dynamic behaviour is an essential element of the neural processing function. It has also been suggested that a first-order low-pass filter may provide an appropriate representation of these dynamic characteristics [15]. The introduction of such filters is relatively straightforward, with the output of the neuron being transformed in the following manner:

$$y^{f}(t) = \Omega\, y^{f}(t-1) + (1-\Omega)\, y(t), \qquad 0 < \Omega < 1 \qquad (4)$$
where $y(t)$ is the neuron output before filtering and $y^{f}(t)$ is the filtered output. Suitable values of the filter time constants cannot be specified a priori, and thus the problem becomes one of determining $\Omega$ in conjunction with the network weights. An appealing feature of the chemotaxis approach is that the algorithm does not require modification to incorporate filter dynamics; the filter time constants are determined in the same manner as the network weights. A time history would still be appropriate when there is uncertainty over system time delays. The use of input data over the range of
expected dead-times could serve to compensate for this uncertainty. In this situation, a 'limited' time history together with neuron dynamics may be appropriate.
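To make Eqs. (1)-(4) and Steps I-VII concrete, the following is a minimal pure-Python sketch of a dynamic FANN trained by chemotaxis, with the filter constants $\Omega$ searched alongside the weights and biases. It is an illustrative reconstruction under stated assumptions, not the authors' implementation: the function names (`n_params`, `unpack`, `simulate`, `cost`, `chemotaxis`) are hypothetical; every processing neuron, including the output neuron, is assumed to be sigmoidal; each $\Omega$ is assumed to be kept inside (0, 1) by passing an unconstrained parameter through the sigmoid; each filter is assumed to start from the first unfiltered output; a fixed iteration budget and a fixed increment variance stand in for the 'negligible reduction' test and the variance adjustment described above; and the demonstration data are synthetic.

```python
import math
import random


def sigmoid(z):
    # Eq. (2); also reused (by assumption) to map raw parameters to 0 < Omega < 1
    return 1.0 / (1.0 + math.exp(-z))


def n_params(topology):
    # per processing neuron: one weight per input, one bias d, one filter parameter
    return sum(n_in * n_out + 2 * n_out for n_in, n_out in zip(topology, topology[1:]))


def unpack(theta, topology):
    # split the flat parameter vector Theta into (weights, biases, Omegas) per layer
    layers, pos = [], 0
    for n_in, n_out in zip(topology, topology[1:]):
        w = [theta[pos + k * n_in: pos + (k + 1) * n_in] for k in range(n_out)]
        pos += n_in * n_out
        d = theta[pos: pos + n_out]
        pos += n_out
        omega = [sigmoid(r) for r in theta[pos: pos + n_out]]
        pos += n_out
        layers.append((w, d, omega))
    return layers


def simulate(theta, topology, inputs):
    # run the dynamic network over a sequence of scaled input vectors
    layers = unpack(theta, topology)
    state = [None] * len(layers)  # filtered outputs y^f(t-1), one list per layer
    predictions = []
    for u in inputs:
        signal = list(u)  # input neurons only pass the scaled data on
        for j, (w, d, omega) in enumerate(layers):
            # Eq. (1): bias plus weighted sum of the previous layer's outputs
            phi = [d[k] + sum(wi * s for wi, s in zip(w[k], signal)) for k in range(len(d))]
            y = [sigmoid(p) for p in phi]  # Eq. (2)
            prev = y if state[j] is None else state[j]  # assumed filter start-up
            # Eq. (4): first-order low-pass filter on each neuron output
            state[j] = [om * yp + (1.0 - om) * yk for om, yp, yk in zip(omega, prev, y)]
            signal = state[j]
        predictions.append(signal)
    return predictions


def cost(theta, topology, inputs, targets):
    # Eq. (3): half the sum of squared prediction errors over the whole data set
    preds = simulate(theta, topology, inputs)
    return 0.5 * sum((p - t) ** 2 for pv, tv in zip(preds, targets) for p, t in zip(pv, tv))


def chemotaxis(topology, inputs, targets, n_iter=1500, sigma=0.05, seed=0):
    rng = random.Random(seed)
    theta = [rng.gauss(0.0, 0.1) for _ in range(n_params(topology))]  # Step I
    e1 = cost(theta, topology, inputs, targets)  # Steps II-III
    step = None
    for _ in range(n_iter):  # fixed budget instead of a convergence test
        if step is None:
            step = [rng.gauss(0.0, sigma) for _ in theta]  # Step IV
        trial = [p + s for p, s in zip(theta, step)]  # Step V
        e2 = cost(trial, topology, inputs, targets)  # Step VI
        if e2 < e1:
            theta, e1 = trial, e2  # Step VII: keep, and reuse the same increment
        else:
            step = None  # Step VII: discard, and draw a new increment
    return theta, e1


# Synthetic demo data (hypothetical): a step in one scaled input, lagged response
inputs = [[0.2]] * 15 + [[0.8]] * 25
targets, level = [], 0.3
for (u,) in inputs:
    level = 0.8 * level + 0.2 * (0.2 + 0.6 * u)
    targets.append([level])
theta, v = chemotaxis((1, 3, 1), inputs, targets)
```

Because each trial perturbs the whole parameter vector, the filter parameters need no special treatment, in line with the point made above about the chemotaxis algorithm; a gradient-based method such as back-propagation [12] would instead need derivatives propagated through the filter recursion of Eq. (4).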
3 Bioprocess application studies

Whilst ANNs have wide applicability, this contribution concentrates upon their use in model building for on-line estimation (a wider range of applications is presented by Willis [6]). Over the last decade biosensor technology has evolved rapidly; however, the benefits of its application have yet to be realised on an industrial scale. Lack of biosensor reliability and, more importantly, the financial consequences of sensor failure in its widest sense have served to maintain the prevalence of off-line sample analysis for bioprocess monitoring and supervision. However, the frequency of off-line analysis of the 'controlled' process output is often insufficient to maintain 'tight' process regulation. A potential solution to this problem is to use on-line estimators to provide 'fast' inferences of variables that are 'difficult to measure' (a number of references to these techniques can be found in [16]). Although results from industrial evaluations have been promising, it is suggested that, owing to their ability to capture non-linear characteristics, NNMs may provide improved estimation performance. The following sections present the results of some evaluation studies.
Fig. 3. Biomass estimation - mycelial fermentation (actual biomass and estimate against time t, h)
Fig. 4. Biomass estimation - mycelial fermentation (actual biomass and estimate against time t, h)
3.1 Biomass estimation in continuous mycelial fermentation

Due to industrial confidentiality, a thorough description of the fermentation process is not permissible. Nevertheless, note that biomass concentration within the fermenter is the primary control variable; however, with existing sensor technology, on-line measurements are not possible. In this example, biomass concentrations are determined by laboratory analysis and are only made available to the process operators at four-hourly intervals, a frequency that is inadequate for effective control. The control problem would be alleviated if the process operators were provided with more frequent information. Fortunately, a number of other (secondary) process measurements, such as dilution rate (the manipulated input), carbon dioxide evolution rate (CER), oxygen uptake rate (OUR) and alkali addition rate, provide useful information pertaining to the biomass concentration in the fermenter. If the complex relationships between these variables and biomass concentration can be established via an ANN, then the resulting NNM could be used to infer biomass concentrations at a frequency suitable for control purposes. Although other variables, e.g. pH and temperature, affect the non-linear relationship, tight environmental regulation maintains these at an approximate steady state. Firstly, consider a neural network which relies solely upon a time history of inputs in order to incorporate dynamics. To model the process, a (6-4-4-1) network was specified,
and 'trained' on appropriate process data. Although a number of input variables could be used to train the ANN, any duplication of information among the network inputs could be detrimental to the quality of the estimates. This problem is analogous to multicollinearity in multivariate regression, so the inputs to the network should be chosen with care. Two variables, CER and fermenter dilution rate, have already been identified as 'critical' in previous estimation work [2]. Thus, at each pass through the training data, the inputs to the network were the current values of CER and dilution rate in the database, together with the two previous pairs of these values. The target for comparison with the network output, for weight adjustment purposes, was the biomass concentration corresponding to the first input data pair. Therefore, 6 neurons were specified in the input layer, and 1 in the output layer. Figure 3 demonstrates the ability of the neural network to fit the noisy biomass data set, even across a major change in operating conditions and the resulting change in process gain. Here, a step change in the fermenter dilution rate was applied at approximately 200 h. In this and subsequent figures, scales have been removed for reasons of industrial confidentiality. Figure 4 shows the effect of introducing dynamics into the network. The resulting 20% improvement in fit (in terms of Integral Square Error) is associated with two additional performance features. Firstly, a complex dynamic relationship exists between inputs and outputs, and the inclusion of dynamics as an integral part of the network enables a more comprehensive dynamic process description to be built. Secondly, in Fig. 3 'raw' process data is fed to the network. Although some filtering is incorporated within the data-logging system, the level of process noise is still reflected in noisy estimates of biomass. Whilst increased data filtering prior to network analysis is a possibility, the inherent filtering of the dynamic network, with its 'automatic' choice of filter constants, is a major asset.
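The input arrangement described above for the (6-4-4-1) network, and the Integral Square Error used to compare the static and dynamic fits, can be sketched as follows. This is a hedged illustration, not the authors' code: the function names are hypothetical, the three series are assumed to be sampled on a common time grid (in practice the biomass assays arrive only every four hours), the 'first input data pair' is taken to be the current (CER, dilution rate) pair, and the ISE is approximated by a rectangular sum with an assumed sample interval `dt`.

```python
def lagged_inputs(cer, dilution, biomass, n_lags=2):
    """Return (inputs, target) pairs for a (6-4-4-1) network.

    Inputs at sample t are the current (CER, dilution rate) pair followed by
    the n_lags previous pairs; the target is the biomass value at sample t.
    """
    rows = []
    for t in range(n_lags, len(cer)):
        x = []
        for lag in range(n_lags + 1):
            x.extend([cer[t - lag], dilution[t - lag]])
        rows.append((x, biomass[t]))
    return rows


def ise(estimates, actual, dt=1.0):
    # rectangular approximation of the Integral Square Error
    return dt * sum((e - a) ** 2 for e, a in zip(estimates, actual))


def relative_improvement(ise_static, ise_dynamic):
    # fractional reduction in ISE, e.g. 0.2 for a 20% improvement in fit
    return (ise_static - ise_dynamic) / ise_static


rows = lagged_inputs([1, 2, 3], [10, 20, 30], [0.1, 0.2, 0.3])
print(rows)  # [([3, 30, 2, 20, 1, 10], 0.3)]
```

With two lags, the first two samples of each record cannot form a complete input vector and are dropped, which is one practical cost of the time-history approach compared with the neuron filters of Section 2.3.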
Fig. 5. Biomass estimation - penicillin process (data set 1) (actual biomass and estimate, kg/m3, against time t, h)
3.2 Biomass estimation in fed-batch penicillin fermentation

The measurement difficulties encountered in the control of penicillin fermentation are similar to those experienced in the continuous fermentation system described above. The growth rate of the biomass has to be controlled at a low level in order to optimise penicillin production. Whilst the growth rate can be influenced by manipulating the rate of substrate (feed) addition, it is not possible to measure the rate of growth on-line. The aim is therefore to develop a model which relates an on-line measurement to the rate of biomass growth. In this case, OUR rather than CER was used as the on-line measured variable: although both variables contained the necessary information, measurements of OUR were more readily available on the plant. The continuous fermentation process considered previously normally operates around a steady state, so a linear estimator may prove sufficient in the vicinity of normal process operating conditions. The fed-batch penicillin fermentation, on the other hand, presents a more difficult modelling problem, since the system passes through a wide spectrum of dynamic characteristics and never achieves a steady state. The results of applying an NNM to predict fermentation behaviour should therefore provide an indication of the viability of the technique for modelling complex processes. Data from two fermentation batches were used to train the neural network. The current on-line measurement of OUR was one of the inputs to the neural network. Unlike the previous case, fermenter feed rate was not used as an input variable, since at any given point in the fermentation it is essentially the same from batch to batch. Thus, the information it contains would not contribute towards the prediction of variations in biomass levels between fermentations. However, since the characteristics of the fermentation are also a function of time, the batch time was considered a pertinent input.
The specified network therefore had a (2-3-1) topology. Figures 5 and 6 show the performance of the neural estimator when applied to the two training data sets. As expected, good estimates of biomass were achieved. OUR data from another fermentation batch were then presented to the NNM, resulting in the estimates shown in Fig. 7. The estimates produced by the NNM for this batch, which was not used in training, are very acceptable, and almost as good as those in Figs. 5 and 6.
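A sketch of how the (2-3-1) training and test sets might be assembled, with batch time as the second input and the third batch held out, is given below. The paper states only that scaled data enter the network; the min-max scaling to [0, 1], fitted on the two training batches alone and then reused unchanged for the unseen batch, is an assumption, as are the function names and the toy numbers. With a sigmoidal output neuron, the biomass targets would also need scaling into (0, 1) in a similar way.

```python
def batch_rows(our, batch_time, biomass):
    # one (inputs, target) pair per sample: inputs are OUR and batch time
    return [([o, t], b) for o, t, b in zip(our, batch_time, biomass)]


def fit_minmax(rows):
    # per-input (min, max) taken from the training rows only
    columns = zip(*(x for x, _ in rows))
    return [(min(c), max(c)) for c in columns]


def scale(rows, ranges):
    # map each input to [0, 1] using the training ranges; unseen batches may fall outside
    return [
        ([(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(x, ranges)], y)
        for x, y in rows
    ]


# Toy values (hypothetical), two samples per batch
batch1 = batch_rows([1.0, 3.0], [0.0, 10.0], [0.5, 1.0])
batch2 = batch_rows([2.0, 5.0], [0.0, 10.0], [0.6, 1.2])
batch3 = batch_rows([1.5, 6.0], [0.0, 10.0], [0.55, 1.3])  # held out for testing

ranges = fit_minmax(batch1 + batch2)
train = scale(batch1 + batch2, ranges)
test = scale(batch3, ranges)
```

Fitting the scaling on the training batches only keeps the held-out batch genuinely unseen; as the toy values show, an unseen batch can then produce scaled inputs slightly outside [0, 1].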
Fig. 6. Biomass estimation - penicillin process (data set 2) (actual biomass and estimate, kg/m3, against time t, h)
Fig. 7. Biomass estimation - penicillin process (data set 3) (actual biomass and estimate, kg/m3, against time t, h)
These results are very encouraging, since work on the development of a non-linear observer to estimate the biomass concentration of the penicillin fermentation has taken many 'man months' (Di Massimo [17]). In contrast, a neural network model can be established in a relatively short time. Additionally, compared with the mechanistic model based observer, relatively good estimates have been achieved without the need for corrective action from off-line biomass assays. Nevertheless, the possibility of introducing off-line biomass data to improve the performance of the neural estimator is presently under investigation.
4 Concluding remarks

The use of an artificial neural network methodology for bioprocess model development, for use within an estimation scheme, has been considered. Of particular importance here is the ability of the neural model based estimators to provide 'fast' inferences of important, but 'difficult to measure', process outputs from other easily measured variables. Applications to data obtained from industrial processes reveal that, given an appropriate topology, the network could be trained to characterise the behaviour of the systems considered. Future work will concentrate on closing the control loop, with the estimates produced by the neural estimators used as feedback signals for a conventional controller. This structure offers the potential for significant improvements in process regulation, since secondary measurements enable the detection of load disturbances in a feedforward sense. The results presented provide evidence that ANNs can be a valuable tool for alleviating many current bioprocess engineering problems. However, the field is still very much in its infancy, and many questions have yet to be answered. Determining the 'optimum' network topology is one example; currently, ad hoc procedures are used. This arbitrary facet of an otherwise promising philosophy is a potential area of active research, and a formalised technique for choosing the appropriate network topology is desirable. There also appears to be no established methodology for determining the stability of ANNs. This is perhaps the most important issue that has to be addressed before the full potential of ANN methodologies can be realised on-line. Nevertheless, given the resources and effort currently being invested in both academic and commercial research in this area, it is predicted that within the decade neural networks will have established themselves as a valuable modelling tool.
Acknowledgements

The support of the Dept. of Chemical and Process Engineering, University of Newcastle; Smith Kline Beecham; and Marlow Foods is gratefully acknowledged.
References

1. Bhat, N.; Minderman, P.; McAvoy, T. J.: Use of neural nets for modelling of chemical process systems. Preprints IFAC Symp. Dycord+ 89, Maastricht, The Netherlands, Aug. 21-23 (1989) 147-153
2. Montague, G. A.; Hofland, A. G.; Lant, P. A.; Di Massimo, C.; Saunders, A.; Tham, M. T.; Morris, A. J.: Model based estimation and control: Adaptive filtering, nonlinear observers and neural networks. Proc. 3rd Int. Symp. Control for Profit (1989) Newcastle-upon-Tyne
3. Lant, P. A.; Willis, M. J.; Montague, G. A.; Tham, M. T.; Morris, A. J.: A comparison of adaptive estimation with neural based techniques for bioprocess application. Preprints ACC, San Diego (1990) 2173-2178
4. Di Massimo, C.; Willis, M. J.; Montague, G. A.; Kambhampati, C.; Hofland, A. G.; Tham, M. T.; Morris, A. J.: On the applicability of neural networks in chemical process control. To be presented at AIChE Annual Meeting (1990) Chicago
5. Psaltis, D.; Sideris, A.; Yamamura, A. A.: A multilayered neural network controller. IEEE Control Systems Magazine, April 1988, pp. 17-21
6. Willis, M. J.; Di Massimo, C.; Montague, G. A.; Tham, M. T.; Morris, A. J.: On artificial neural networks in process engineering. Submitted to Proc. IEE (1990) Pt. D
7. Bavarian, B.: Introduction to neural networks for intelligent control. IEEE Control Systems Magazine, April 1988, pp. 3-7
8. Hoskins, J. C.; Himmelblau, D.: Artificial neural network models of knowledge representation in chemical engineering. Comput. Chem. Engng. 12, 9/10 (1988) 881-890
9. Birky, G. J.; McAvoy, T. J.: A neural net to learn the design of distillation controls. Preprints IFAC Symp. Dycord+ 89, Maastricht, The Netherlands, Aug. 21-23 (1989) 205-213
10. Lippmann, R. P.: An introduction to computing with neural nets. IEEE ASSP Magazine, April 1987
11. Cybenko, G.: Continuous value neural networks with two hidden layers are sufficient. Internal report, Dept. of Comp. Sci., Tufts Univ., Medford (1989)
12. Rumelhart, D. E.; Hinton, G. E.; Williams, R. J.: Learning representations by back-propagating errors. Nature 323 (1986) 533-536
13. Holden, A. V.: Models of the stochastic activity of neurones. Springer-Verlag (1976)
14. Bremermann, H. J.; Anderson, R. W.: An alternative to backpropagation: a simple rule for synaptic modification for neural net training and memory. Internal report, Dept. of Mathematics, Univ. of California, Berkeley (1989)
15. Terzuolo, C. A.; McKeen, T. A.; Poppele, R. E.; Rosenthal, N. P.: Impulse trains, coding and decoding. In: Terzuolo, C. A. (ed.): Systems analysis to neurophysiological problems. University of Minnesota, Minneapolis (1969) 86-91
16. Montague, G. A.; Morris, A. J.; Ward, A. C.: Fermentation monitoring and control: A perspective. Biotechnology and Genetic Engineering Reviews, Vol. 7 (1989) 147-188
17. Di Massimo, C.; Saunders, A. C. G.; Morris, A. J.; Montague, G. A.: Non-linear estimation and control of mycelial fermentations. ACC, Pittsburgh, USA (1989) 1994-1999

Received June 26, 1990
G. A. Montague (corresponding author), C. Di Massimo, M. J. Willis, M. T. Tham, A. J. Morris
Department of Chemical and Process Engineering, University of Newcastle-upon-Tyne, Newcastle, NE1 7RU, England