Biological Cybernetics
Biol. Cybern. 58, 361-372 (1988)
© Springer-Verlag 1988
Recognition of General Patterns Using Neural Networks

A. Jun-Wei Wong*

Oak Ridge High School, Oak Ridge, TN 37830, USA

* Present address: Henry DeWolf Smyth Scholar, Princeton University, Princeton, NJ 08544, USA
Abstract. The Hopfield model of neural network
stores memory in its symmetric synaptic connections and can only learn to recognize sets of nearly "orthogonal" patterns. A new algorithm is put forth to permit the recognition of general ("non-orthogonal") patterns. The algorithm specifies the construction of the new network's memory matrix T_ij, which is, in general, asymmetrical and contains the Hopfield neural network (Hopfield 1982) as a special case. We find further that in addition to this new algorithm for general pattern recognition, there exists in fact a large class of T_ij memory matrices which permit the recognition of non-orthogonal patterns. The general form of this class of T_ij memory matrix is presented, and the projection matrix neural network (Personnaz et al. 1985) is found as a special case of this general form. This general form of memory matrix extends the library of memory matrices which allow a neural network to recognize non-orthogonal patterns. A neural network which followed this general form of memory matrix was modeled on a computer and successfully recognized a set of non-orthogonal patterns. The new network also showed a tolerance for altered and incomplete data. Through this new method, general patterns may be taught to the neural network.
1 Introduction
Classical pattern recognition programs using template matching, extensive specific descriptions of the patterns, and/or complex knowledge systems often require millions of steps to reach recognition (Winston 1984). These techniques do not seem to model accurately the process of human thought, as complex animal recognition often occurs in less than a hundred
neural firing times (Ballard 1986; Brown 1984). The classical methods of pattern recognition also often have difficulty in recognizing patterns that are slightly altered from the original model patterns, and must find complicated algorithms to deal with incomplete and "fuzzy" data. Often, these methods will not function if a single programming component is ruined. Recent advances (Hopfield 1982, 1984; Hopfield et al. 1983; Hopfield and Tank 1985; Tank and Hopfield 1986; Anderson 1983) concerning the pattern recognition capabilities of neural networks show that neural networks working in parallel are an effective alternative to the classical methods of pattern recognition.

The Hopfield neural network (Hopfield 1982) is a special form of Von Neumann's cellular automata, mathematical models of a cell network in which space and time are discrete (Von Neumann 1966). This neural network consists of interconnecting neurons. The memory of this neural network is stored within the synaptic connection strengths between the neurons. It eliminates some of the drawbacks involved in classical techniques and has the advantages of speed, fault tolerance, and biological similarities. Besides providing new insight into the function of the brain, it leads to feasible technological applications. A neural network is already under design and production as an integrated circuit at AT&T Bell Labs, in development stages at Texas Instruments and IBM, and in a prototype in an optical information processing system (Psaltis and Farhat 1986). It holds promise of many applications, such as the optimization of routes in the traveling salesman problem (Hopfield and Tank 1985), high speed sort/search routines, unmanned space probes (due to its "robustness", i.e. its ability to function when individual processors fail), recognition of aircraft, and converting speech into text (Sejnowski et al. as reported by Amado 1987). Its connections to dreams and REM sleep (Hopfield et al. 1983; Crick and Mitchison 1983) as well as to psychological conditioning
(Gelperin et al. 1985; Tesauro 1986) have recently been proposed.

As of now, success has been attained in pattern recognition with the Hopfield neural network, but the domain of the patterns that may be taught to this network seems limited by an "orthogonality" or "pseudo-orthogonality" constraint. The vast majority of general pattern combinations do not follow this constraint. Several methods have previously been proposed in order to allow the recognition of non-orthogonal patterns, such as killing frustrated bonds (Kinzel 1985) and implementing a quantized Hadamard transform (Shiozaki 1980). The construction of a memory matrix derived from vectors which fulfill an orthonormality condition has been proposed for the case where neuron states are represented by {-1, +1} (Grant and Sage 1986). The additional basis vectors employed in this technique are represented implicitly. Of special note is general pattern recognition through a synaptic weight projection matrix (Personnaz et al. 1985). This projection matrix method works particularly well when the states of the neurons are represented by {-1, +1}. However, when neuron states are represented by {0, 1}, this method does not contain the Hopfield model as a special case. This paper analyzes whether one can develop a new generalized neural network synaptic weight matrix which is explicitly defined, which is able to recognize non-orthogonal patterns where neuron states may be represented by either {0, 1} or {-1, +1}, and which contains as special cases both the Hopfield model and the projection matrix model of neural network.

2 Description of a Neural Network

A neural network is a mathematical model of neurons j whose individual state at any given time t is described by V_j(t), the voltage in the neuron, which has the value 0 or 1 corresponding to the neuron's firing status. (The case when the state of neuron j is described by μ_j(t) = ±1 is discussed in Sect. 6.) A network consisting of N neurons has an instantaneous state V(t) that can be imagined as an N-dimensional vector having components V_j(t). These neurons are connected to one another with synaptic strengths T_ij from neuron j to neuron i, forming the memory matrix {T_ij} of the network. The instantaneous input current I_i(t) into neuron i is determined by
I_i(t) = \sum_{j=1}^{N} T_{ij} V_j(t) .    (1)
The network evolves under the following algorithm. Each neuron i interrogates itself randomly and asynchronously in time and readjusts its state V_i according to the instantaneous input current I_i(t) and its threshold value θ_i:

V_i(t + Δt) = 1, if I_i(t) > θ_i,
V_i(t + Δt) = 0, if I_i(t) < θ_i,    (2a)
V_i(t + Δt) = V_i(t), if I_i(t) = θ_i.

Since only neural networks where θ_i = 0 for i = 1,...,N are examined in this paper, (2a) may be written as

V_i(t + Δt) = ½{1 + sgn(I_i(t))},    (2b)
V_i(t + Δt) = V_i(t), if I_i(t) = 0.

Depending upon the connection strengths T_ij of the network, the evolution of V(0) may reach a stable state V(t), after which successive evolutions of the network's state are always identical, i.e. V(t) = V(t + Δt). This state, lim_{t→∞} V(t), is the "recognized" class of pattern to which V(0) belongs.

3 The Hopfield Model Neural Network

The Hopfield model's memory matrix {T_ij} for a set of patterns {V^s}, s = 1,...,n, is

T_{ij} = \sum_{s=1}^{n} \varepsilon_s \mu_i^s \mu_j^s ,    (3)

where μ_i^s = (2V_i^s - 1), ε_s is greater than zero and is the relative teaching strength of pattern s, and T_ii is set to 0. One notes the symmetry of the connection matrix, T_ij = T_ji.

The teaching strength ε_s is the relative weighting of the pattern s with respect to the other patterns. The traditional Hopfield model sets ε_s = 1 for s = 1,...,n. However, this is not a necessary condition. By incorporating ε_s teaching strengths into the T_ij memory matrix equation, the basins of attraction of a pattern state may be modified. It was observed empirically that the larger the relative magnitude of ε_s with respect to the other teaching strengths ε_σ, σ = 1,...,n, σ ≠ s, the greater the size of the basin of attraction for prototype pattern s. Although the ε_s term in (3) is not necessary for a properly functioning neural network, it is useful in the manipulation of attraction basins.

For the Hopfield model to function properly, it is necessary that the patterns to be incorporated into the memory matrix are nearly "orthogonal" to one another, i.e. all the patterns {V^s} must comply (or nearly comply) with the following orthogonality condition:
(\mu^s, V^\sigma) = \sum_{j=1}^{N} \mu_j^s V_j^\sigma = C_s \delta_{s\sigma} ,    (4)

where C_s = \sum_{j=1}^{N} \mu_j^s V_j^s = (\mu^s, V^s) > 0 and is the number of 1's in V^s, and δ is the Kronecker delta.
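To make the update rule and the orthogonality condition concrete, the following is a minimal computational sketch of the Hopfield construction (3) and the asynchronous dynamics (1)-(2), written in Python with NumPy. The function names, the random-number seed, and the choice of 0/1 arrays for pattern states are illustrative assumptions, not part of the original formulation.

```python
# Minimal sketch (not from the original paper): the Hopfield matrix of (3),
# the asynchronous update of (1)-(2) with theta_i = 0, and the overlaps of (4).
import numpy as np

def hopfield_matrix(patterns, eps=None):
    """T_ij = sum_s eps_s mu_i^s mu_j^s with T_ii = 0, where mu^s = 2 V^s - 1."""
    n, N = patterns.shape
    eps = np.ones(n) if eps is None else np.asarray(eps, dtype=float)
    mu = 2.0 * patterns - 1.0
    T = (eps[:, None] * mu).T @ mu     # sum over the stored patterns s
    np.fill_diagonal(T, 0.0)           # T_ii = 0
    return T

def evolve(T, V0, steps=200, rng=None):
    """Asynchronous evolution: a randomly chosen neuron applies rule (2) at each step."""
    rng = np.random.default_rng(0) if rng is None else rng
    V = V0.astype(float).copy()
    for _ in range(steps):
        i = rng.integers(len(V))
        I_i = T[i] @ V                 # input current of (1)
        if I_i > 0:
            V[i] = 1.0
        elif I_i < 0:
            V[i] = 0.0                 # I_i == 0 leaves V_i unchanged, as in (2b)
    return V

def overlaps(patterns):
    """Matrix of (mu^s, V^sigma); condition (4) asks the off-diagonal entries to be ~0."""
    mu = 2.0 * patterns - 1.0
    return mu @ patterns.T
```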
[Figure 1 layout omitted: the prototype patterns (initial states, t = 0) and the resulting network states after 200 time steps for Cases 1-4; see the caption below.]
Fig. 1. Hopfield model learning non-orthogonal patterns. This figure displays four different attempts to construct a Hopfield memory matrix which incorporates the alphabetical letters A, B, C, D, and E as stable states of the neural network. These prototype patterns {V^s}, s = A, B, C, D, E are seen in the left-most column of the figure. The difference in the four attempts arises from the different teaching strengths {ε_s}, s = A,...,E employed in the calculation of the Hopfield memory matrix (3). By modifying the {ε_s}, s = A,...,E values, the attraction basins can also be altered. Each network, found from (3) using the {V^s}, s = A,...,E values and the given {ε_s} values, was assigned one of the prototype patterns V^s as an initial state V(0), and then allowed to evolve according to rules (1) and (2). The states of the network after 200 time steps (or 5 mean neural updates) are shown for each case of {ε_s} values. In Case 1, the teaching strengths are all set equal, ε_A = ε_B = ε_C = ε_D = ε_E = 1. In Case 2, ε_A = 1, ε_B = 2, ε_C = 3, ε_D = 4, and ε_E = 5. In Case 3, ε_A = 2, ε_B = 1, ε_C = 3, ε_D = 2, and ε_E = 1. In Case 4, ε_A = 1, ε_B = 1, ε_C = 1, ε_D = 3, and ε_E = 1. None of these attempts at incorporating the prototype patterns as stable states in the Hopfield T_ij memory matrix were successful.

Thus, although this model can easily recognize "orthogonal" patterns, it cannot recognize general ("non-orthogonal") patterns. In Fig. 1, we show an attempt at using the Hopfield memory algorithm (3) to recognize the alphabetical letters A, B, C, D, and E, which belong to a set of non-orthogonal patterns, i.e. patterns which do not satisfy the orthogonality condition (4). In the computer simulation, the patterns were represented by the states {V^s}, s = A, B, C, D, E in a network of dimension N = 40. In Fig. 1 the patterns are displayed with the "+" symbol for V_j = 1 and the "-" symbol for V_j = 0, and time is measured in steps of updating one neuron in the network, which is 1/N of the mean update period for an individual neuron. After 5 mean neural update periods, the resultant patterns did not coincide with the initial set to be recognized, even when the teaching strengths {ε_s} of the different patterns were varied. Individual patterns could be forced to become stable by drastically increasing their teaching strength ε_s, but despite any manipulation of the ε_s values, the five patterns could never be taught simultaneously.

The difficulty of the Hopfield model in recognizing non-orthogonal patterns places a severe limitation on its versatility because for a network of N neurons there are 2^N general patterns, but a selected pattern can have only N - 1 orthogonal patterns. In passing, note that if V is a stable state, then the conjugate state V' obtained from V by interchanging 0 ↔ 1 is also a stable state when there are nearly equal numbers of 1's and 0's in each pattern of the set {V^s}.

4 New Neural Network for General Patterns
To recognize a set of general patterns {V^s}, s = 1,...,n, we propose a new model with a memory matrix {T_ij} given by

T_{ij} = \sum_{s=1}^{n} \varepsilon_s \mu_i^s \xi_j^s ,    (3')

where ε_s > 0 is, again, the relative teaching strength of pattern s, and ξ^s = D_s M^s / K. Here D_s is an arbitrary constant greater than 0 which may optionally serve as a normalization factor, and we choose to define

M^s = \begin{vmatrix} (\mu^1, V^1) & (\mu^2, V^1) & \cdots & (\mu^n, V^1) \\ \vdots & \vdots & & \vdots \\ (\mu^1, V^{s-1}) & (\mu^2, V^{s-1}) & \cdots & (\mu^n, V^{s-1}) \\ \mu^1 & \mu^2 & \cdots & \mu^n \\ (\mu^1, V^{s+1}) & (\mu^2, V^{s+1}) & \cdots & (\mu^n, V^{s+1}) \\ \vdots & \vdots & & \vdots \\ (\mu^1, V^n) & (\mu^2, V^n) & \cdots & (\mu^n, V^n) \end{vmatrix}    (3a')

and

K = \begin{vmatrix} (\mu^1, V^1) & (\mu^2, V^1) & \cdots & (\mu^n, V^1) \\ \vdots & \vdots & & \vdots \\ (\mu^1, V^{s-1}) & (\mu^2, V^{s-1}) & \cdots & (\mu^n, V^{s-1}) \\ (\mu^1, V^s) & (\mu^2, V^s) & \cdots & (\mu^n, V^s) \\ (\mu^1, V^{s+1}) & (\mu^2, V^{s+1}) & \cdots & (\mu^n, V^{s+1}) \\ \vdots & \vdots & & \vdots \\ (\mu^1, V^n) & (\mu^2, V^n) & \cdots & (\mu^n, V^n) \end{vmatrix} .    (3b')

The M^s term from (3a') is an N-dimensional vector represented in determinant form: row s consists of the vectors μ^1,...,μ^n, and expanding along that row yields a linear combination of those vectors. To explain this notation a simple example is provided. In the two-pattern case,

M^1 = \begin{vmatrix} \mu^1 & \mu^2 \\ (\mu^1, V^2) & (\mu^2, V^2) \end{vmatrix} = \mu^1 (\mu^2, V^2) - \mu^2 (\mu^1, V^2)

and

M^2 = \begin{vmatrix} (\mu^1, V^1) & (\mu^2, V^1) \\ \mu^1 & \mu^2 \end{vmatrix} = -\mu^1 (\mu^2, V^1) + \mu^2 (\mu^1, V^1) .

Note that these vectors are both linear combinations of μ^1 and μ^2. Since D_s and K are both constants, each ξ^s vector is a linear combination of vectors from the set {μ^σ}, σ = 1,...,n. The set of vectors {ξ^s} has been introduced to satisfy a modified orthogonality condition:

(\xi^s, V^\sigma) = D_s \delta_{s\sigma} ,    (4')

where D_s is the D_s found in (3') and is greater than zero. The validity of the modified orthogonality condition (4') may be demonstrated as follows. The definition of ξ^s yields

(\xi^s, V^\sigma) = \frac{D_s}{K} (M^s, V^\sigma) .    (4a')

The dot product (M^s, V^σ) may be obtained by replacing in row s of M^s [from (3a')] the vectors μ^1, μ^2,...,μ^n by the values (μ^1, V^σ), (μ^2, V^σ),...,(μ^n, V^σ), yielding

(M^s, V^\sigma) = \begin{vmatrix} (\mu^1, V^1) & (\mu^2, V^1) & \cdots & (\mu^n, V^1) \\ \vdots & \vdots & & \vdots \\ (\mu^1, V^{s-1}) & (\mu^2, V^{s-1}) & \cdots & (\mu^n, V^{s-1}) \\ (\mu^1, V^\sigma) & (\mu^2, V^\sigma) & \cdots & (\mu^n, V^\sigma) \\ (\mu^1, V^{s+1}) & (\mu^2, V^{s+1}) & \cdots & (\mu^n, V^{s+1}) \\ \vdots & \vdots & & \vdots \\ (\mu^1, V^n) & (\mu^2, V^n) & \cdots & (\mu^n, V^n) \end{vmatrix} .    (4b')

If σ = s, i.e. V^σ = V^s, then (M^s, V^s) = K. Hence

(\xi^s, V^s) = \frac{D_s}{K} (M^s, V^s) = D_s \quad \text{for} \quad \sigma = s .    (4c')

But if σ ≠ s, i.e. V^σ ≠ V^s, then row s in the determinant (M^s, V^σ) will be identical to row σ, causing (M^s, V^σ) = 0. Hence

(\xi^s, V^\sigma) = \frac{D_s}{K} (M^s, V^\sigma) = 0 \quad \text{for} \quad \sigma \neq s .    (4d')

Statements (4c') and (4d') combined establish the modified orthogonality condition (4').

The new model T_ij memory matrix from (3') is in general not symmetrical. It is constructed to permit non-orthogonal pattern recognition. If all the patterns {V^s} are orthogonal to each other [in the sense of (4)] and we solve for ξ^s in (3'), we obtain ξ^s = (D_s / C_s) μ^s, or T_ij = \sum_{s=1}^{n} (D_s / C_s) \varepsilon_s \mu_i^s \mu_j^s, where D_s is the normalization factor in (4') and C_s is the analogous factor in (4). Note that if the D_s normalization factor is chosen to correspond to the value of C_s in (4) as D_s = C_s = (μ^s, V^s), i.e. the number of 1's in V^s, then the neural network has been normalized so that it contains the exact Hopfield model as a special case of this new model, with Hebb's rule for a synaptic weight matrix when the prototype vectors are orthogonal. In this paper, D_s will be chosen as such unless otherwise specified. However, D_s need not necessarily be so chosen. It is clear from the structure of the determinant K that if {V^1,...,V^n} are not linearly independent, then K = 0 and {ξ^s} becomes meaningless. Consequently, using the memory matrix in the form of (3'), a neural network of dimension N can recognize at most N patterns.
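As a concrete illustration of the construction in (3')-(4'), the sketch below builds the ξ^s vectors and the memory matrix numerically. Instead of expanding the determinant forms (3a') and (3b') by cofactors, it solves the equivalent linear system implied by the modified orthogonality condition (4') (the two routes agree by Cramer's rule when K ≠ 0); the function names and the Hopfield-normalizing choice D_s = (μ^s, V^s) are assumptions made for the example.

```python
# Sketch (assumed implementation, not the paper's code) of the new memory matrix (3').
import numpy as np

def general_memory_matrix(patterns, eps=None):
    """Return (T, Xi): T_ij = sum_s eps_s mu_i^s xi_j^s, where the rows of Xi are the xi^s."""
    V = np.asarray(patterns, dtype=float)        # rows: prototype patterns V^s in {0, 1}
    n, N = V.shape
    eps = np.ones(n) if eps is None else np.asarray(eps, dtype=float)
    mu = 2.0 * V - 1.0                           # mu^s = 2 V^s - 1
    G = mu @ V.T                                 # G[k, sigma] = (mu^k, V^sigma)
    D = np.diag(G).copy()                        # D_s = (mu^s, V^s), the number of 1's in V^s
    # xi^s = sum_k C[k, s] mu^k with (xi^s, V^sigma) = D_s delta_{s,sigma}:
    # solve G^T C = diag(D); G is nonsingular exactly when K != 0.
    C = np.linalg.solve(G.T, np.diag(D))
    Xi = C.T @ mu
    T = mu.T @ (eps[:, None] * Xi)               # T_ij = sum_s eps_s mu_i^s xi_j^s
    return T, Xi

# Check of the modified orthogonality condition (4'):
# Xi @ patterns.T should be (numerically) the diagonal matrix diag(D_s).
```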
5 The Dynamics of the New Model

We have chosen to construct the memory matrix in (3') so that the taught patterns {V^s}, s = 1,...,n, are stable states (Sect. 2). The reason for this stability is demonstrated with the following argument. If a pattern V(0) evolves into an instantaneous state V(t), and V(t) corresponds to one of the pattern states V^σ that has been incorporated into memory in (3'), then from (1), (3'), and (4')

I_i(t) = \sum_{j=1}^{N} T_{ij} V_j^\sigma = \sum_{s=1}^{n} \varepsilon_s \mu_i^s (\xi^s, V^\sigma) = \varepsilon_\sigma D_\sigma \mu_i^\sigma .    (5)

That is, I_i(t) has the same sign as μ_i^σ. Interpreting (5) and using (2b), it is seen that:

If V_i(t) = 0 (or μ_i^σ = -1), then I_i(t) < 0, so V_i(t + Δt) = 0.
If V_i(t) = 1 (or μ_i^σ = +1), then I_i(t) > 0, so V_i(t + Δt) = 1.    (6)

That is, V(t) = V(t + Δt) = lim_{t→∞} V(t). Thus, V(t) will not change with time, hence there is local stability, and the pattern V(0) is recognized as the taught pattern V^σ.

The case when V(t) is not exactly the prototype pattern V^σ but is "close" to V^σ is next examined. An instantaneous pattern V(t) is defined to be "close" to prototype pattern V^σ if that pattern V(t), when represented as a linear combination of the taught patterns V^i, i = 1,...,n,

V(t) = \sum_{i=1}^{n} p_i V^i ,    (7)

satisfies

|\varepsilon_\sigma D_\sigma p_\sigma| > \sum_{i=1, i \neq \sigma}^{n} |\varepsilon_i D_i p_i| .

This condition shall be referred to as the closeness condition. The evolution of a vector V(t) satisfying this closeness condition is determined by first calculating I_i(t), which is given from (1) and (3') as

I_i(t) = \sum_{j=1}^{N} T_{ij} V_j(t) = \sum_{s=1}^{n} \varepsilon_s \mu_i^s (\xi^s, V(t)) .    (8)

Substituting V(t) in (8) with the linear combination of {V^j} as found in (7) yields

I_i(t) = \sum_{s=1}^{n} \varepsilon_s \mu_i^s \Big( \xi^s, \sum_{j=1}^{n} p_j V^j \Big) = \sum_{s=1}^{n} \sum_{j=1}^{n} \varepsilon_s p_j \mu_i^s (\xi^s, V^j) .    (9)

Since the modified orthogonality condition (4') causes the bracketed term on the right to become D_s δ_{sj}, the current is given by

I_i(t) = \sum_{s=1}^{n} \varepsilon_s p_s \mu_i^s D_s .    (10)

The state of neuron i at time t + 1 is then, from (2b) and (10),

V_i(t+1) = \tfrac{1}{2} \{ 1 + \mathrm{sgn}(I_i(t)) \} = \tfrac{1}{2} \Big\{ 1 + \mathrm{sgn}(\varepsilon_\sigma p_\sigma \mu_i^\sigma D_\sigma) \, \mathrm{sgn}\Big( 1 + \frac{\sum_{s=1, s \neq \sigma}^{n} \varepsilon_s p_s \mu_i^s D_s}{\varepsilon_\sigma p_\sigma \mu_i^\sigma D_\sigma} \Big) \Big\} .    (11)

Since μ_i^s = ±1,

\Big| \sum_{s=1, s \neq \sigma}^{n} \varepsilon_s p_s \mu_i^s D_s \Big| \leq \sum_{s=1, s \neq \sigma}^{n} |\varepsilon_s p_s D_s| .    (12)

The closeness condition (7) and (12) cause

\Big| \sum_{s=1, s \neq \sigma}^{n} \varepsilon_s p_s \mu_i^s D_s \Big| < |\varepsilon_\sigma p_\sigma D_\sigma| .    (13)

But since |ε_σ p_σ D_σ| = |ε_σ p_σ μ_i^σ D_σ|, the value of the quotient term on the right in (11) will depend only upon the sign of p_σ, and for p_σ > 0,

V_i(t+1) = \tfrac{1}{2} ( 1 + \mu_i^\sigma ) = V_i^\sigma .    (14)

Hence, if the closeness condition (7) is fulfilled between an instantaneous network state V(t) and a taught
pattern state V^σ, then the state vector V(t+1) will update one of its neurons i so that V_i(t+1) = V_i^σ. Thus the Hamming distance (the number of positions in which two binary vectors differ) between the instantaneous state V(t) and V^σ will decrease when V(t) → V(t+τ), where τ is the lowest value for which V(t+τ) ≠ V(t). If each successive instantaneous pattern state V(t+kτ), where τ is defined as above and k = 1,...,n, continues to satisfy the closeness condition (7) with respect to the prototype pattern state V^σ, then the instantaneous state V converges to V^σ as t → ∞. This analysis of the evolution of instantaneous patterns which are "close" to prototype patterns yields conditional basins of attraction, i.e. an instantaneous pattern will be drawn towards a prototype σ as long as this "closeness" condition continues to be fulfilled. Since this closeness condition (7) contains the values ε_s and D_s, which are both freely chosen arbitrary constants greater than 0, these conditional basins of attraction may be manipulated at will by assigning different values for these constants.

The question of global stability, i.e. the evolution of the neural network into a stable state for any input pattern, is next examined. If a new neural network's T_ij memory matrix constructed using (3') happens to fall into a special case where the T_ij values are symmetric, T_ij = T_ji, e.g. the construction of the Hopfield special case from this new neural network, it has been shown (Hopfield 1982) that the system will be globally stable, i.e. always evolve into stable states, due to a monotonically decreasing energy function. This Liapunov energy function of a state V is defined as

E(V) = -\tfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} T_{ij} V_i V_j ,    (15)

so ΔE from V → V' due to ΔV_i is given by

\Delta E(V \to V') = -\Delta V_i(V \to V') \sum_{j=1}^{N} T_{ij} V_j ,    (16)

where ΔE(V → V') = E(V') - E(V) and ΔV_i(V → V') = V_i' - V_i. Hence, ΔE due to a change ΔV_i will always be negative. So under the asynchronous updating algorithm, the energy function E will monotonically decrease until a least local E value is found, which corresponds to a stable state. Therefore, all such symmetric T_ij memory matrices obey global stability characteristics.

However, in most cases the T_ij memory matrix from (3') does not fall into this symmetric category. The general asymmetry of the new T_ij memory matrix raises questions about the dynamics of the new model. The new proposed T_ij memory matrix empirically demonstrated global stability for all input patterns tested. When the Liapunov function values (15) were calculated for the instantaneous states of an evolving asymmetric neural network derived from (3'), these energy values were almost always monotonically decreasing with respect to the progression of time. The Liapunov model of energy function seemed nearly to describe the new model. There were only a very few cases where the transition from one state to another exhibited an increase in the energy function. Further, these few transitions in which the Liapunov energy function increased were all large Hamming distances away from any of the stored prototype pattern states. One would guess, then, that the energy landscape for the new neural network may be quite similar to that of the Hopfield model. As the new neural network has the Hopfield model as a special case, this would not be a surprising result. It has previously been suggested that randomly determined asymmetric T_ij memory matrices yield energy functions which change similarly to the Liapunov symmetric energy function, but with the addition of an algorithm corresponding to a finite temperature (Hopfield 1982). The new asymmetric T_ij memory matrix may have an energy function which behaves in such a manner. However, such an energy function justifying the empirically observed global stability of the new neural network has not yet been found. It has been empirically shown (Hopfield 1982) that systems with asymmetric T_ij memory matrices evolve and eventually either 1) find stable states, 2) cycle among states, or 3) exhibit chaotic wandering in a small Hamming distance area.

The implementation of a time-dependent general energy function is proposed in the hope of aiding a future understanding of why this asymmetric neural network derived from (3') exhibits global stability characteristics, whereas other asymmetric neural networks cycle or behave chaotically. A generalized energy function Ω is introduced for any T_ij memory matrix and defined as

\Omega(0) = f(T_{ij}, V(0)) ,    (17a)

where f(T_{ij}, V(0)) is any function which may depend upon the values of T_ij and the values of V(0), and

\Omega(t) = \Omega(t-1) - \sum_{i=1}^{N} \Big[ \sum_{j=1}^{N} T_{ij} V_j(t) \Big] \Delta V_i(t-1 \to t) ,    (17b)

where ΔV_i(t-1 → t) = V_i(t) - V_i(t-1). Note that rather than being a function of the neural network's state V, as is the Liapunov energy function (15), this Ω function is a function of time t. Additionally, Ω(t) is a monotonically decreasing function, since the change ΔΩ from t-1 to t is

\Delta\Omega(t-1 \to t) = -\sum_{i=1}^{N} \Big[ \sum_{j=1}^{N} T_{ij} V_j(t) \Big] \Delta V_i(t-1 \to t) \leq 0 .    (18)
That is, ΔΩ is always negative or 0. Note that (18) implies, using (1) and (2b), that if for any t, t = 1,...,∞, there exists a τ, 1 ≤ τ < ∞, such that

\Delta V_i(t \to t+\tau) \neq 0 , \quad \text{i.e.} \quad V(t) \neq V(t+\tau) , \quad \text{then} \quad \Delta\Omega(t \to t+\tau) < 0 .    (19)

Note also that if f(T_ij, V(0)) is chosen as

f(T_{ij}, V(0)) = -\tfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} T_{ij} V_i(0) V_j(0) = E(V(0))    (20)

in order to equalize the initial values of the Liapunov energy function and the Ω energy function, and if in addition the system is symmetric, then Ω(t) = E(V(t)) for all t = 0,...,∞. That is, the Hopfield model Liapunov energy function coincides with a special case of the Ω energy function when the neural network in question is symmetric and Ω(0) = E(V(0)).

The behavior of the Ω function when there exists global stability is next examined. In this case, after a certain time t, all input patterns V(0) will evolve into some stable state V(t) = V^σ with finite energy Ω(t) = γ, -∞ < γ < +∞. Since V(t) = V^σ is a stable state, then for all τ > t, ΔV_i(τ → τ+1) = 0. Hence, from (18), ΔΩ after this time t will always be 0, and all successive values of Ω(τ), τ = t,...,∞, will be equivalent to Ω(t) = γ. Thus, a condition of global stability is that for all initial states V(0),

\lim_{t \to \infty} \Omega(t) = \gamma ,    (21a)

where γ may take on any finite value, -∞ < γ < +∞.

Next the behavior of the Ω energy function is analyzed for the cases in which the system cycles or displays chaotic behavior in a small Hamming distance area. For the following discussion, τ is defined as the minimum value for which V(t) ≠ V(t+τ). In the case of neural network systems which cycle or exhibit chaotic behavior, there must exist an infinite number of changes in the system's state, i.e. there exists an infinite number of values of t and τ for which V(t) ≠ V(t+τ). Using this fact and (19), it is evident that there are also an infinite number of ΔΩ's such that ΔΩ(t → t+τ) < 0. Further, from (19), lim_{t→∞} ΔΩ(t → t+τ) < 0. Hence, there is no minimum value for the Ω energy function. Thus, the condition for the existence of cycles and chaotic behavior in a neural network is that

\lim_{t \to \infty} \Omega(t) = -\infty    (21b)

for some initial states V(0).

If such an energy function were sketched upon the pattern state landscape, patterns to be found as stable states of the neural network would be, as expected, positioned as local minima of various energy troughs of the Ω function. An interesting result is that the patterns involved in cycles and patterns involved in chaotic wandering in a small Hamming distance area would be found as pattern states on the sides of infinite energy sinks. Thus, in a three-dimensional envisionment of the Ω function on a pattern state landscape, for every set of patterns where a cycle existed, a vertical energy cylinder would be seen extending downward to -∞, with the patterns involved in the cycle or chaotic wandering positioned as vertical lines on the sides of the cylinder. In a cycle, the evolution from one state to another would be represented by the evolution from one point on the side of the cylinder to another point on the side of the cylinder further below, and the entire evolution of the cycle could be seen holistically as a helix spiralling downward toward infinitely small energies. Chaotic behavior is envisioned similarly, except that instead of exhibiting an orderly spiral downward with its decreasing energy, the system would evolve with a "random walk" down the sides of the energy cylinder. With this knowledge about the difference between neural networks which exhibit cycles and chaotic behavior and those which are globally stable [most notable in comparing conditions (21a) and (21b)], a new framework with which asymmetric neural network stability questions may be attacked is established. Although the new neural network from (3') has not been shown to be globally stable by mathematical deduction, these new tools may aid in a proof. The question of global stability is a topic worthy of further research.
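The following sketch tracks the Ω function of (17a)-(18) alongside an asynchronous evolution, with Ω(0) initialized to the Liapunov value E(V(0)) as in (20). It is an illustrative assumption, not part of the original text, that the memory matrix T and the initial state are supplied as NumPy arrays and that one neuron is interrogated per time step.

```python
# Sketch (assumed implementation) of the time-dependent energy Omega of (17a)-(18).
import numpy as np

def evolve_with_omega(T, V0, steps=200, rng=None):
    """Asynchronous evolution that records Omega(t); the returned history is non-increasing."""
    rng = np.random.default_rng(0) if rng is None else rng
    V = V0.astype(float).copy()
    omega = -0.5 * V @ T @ V              # Omega(0) = E(V(0)), the choice of (20)
    history = [omega]
    for _ in range(steps):
        i = rng.integers(len(V))
        I_i = T[i] @ V                    # input current of (1)
        V_new = 1.0 if I_i > 0 else (0.0 if I_i < 0 else V[i])
        omega -= I_i * (V_new - V[i])     # Delta Omega of (18), <= 0 under rule (2b)
        V[i] = V_new
        history.append(omega)
    return V, history
```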
6 Other Memory Matrices for General Pattern Recognition
There exist many different solutions for attaining local stability in the above neural network. We can see from the dynamics (Sect. 5) that the determining factor in the local stability of taught patterns is the new orthogonality condition (4'). As long as the constructed vectors ξ^s satisfy (4'), then M^s and K, presently defined in (3a') and (3b'), may be redefined or generalized, and still the resultant memory matrix will give the set of patterns {V^s} as stable states.
Thus, if new values for M^s and K are generalized by replacing μ^k in the old definitions of M^s and K from (3a') and (3b') with the rule

\mu^k \to a^k , \quad k = 1,...,n ,

where {a^1,...,a^n} is any set of linearly independent vectors of dimension N for which K ≠ 0, yielding

M^s = \begin{vmatrix} (a^1, V^1) & (a^2, V^1) & \cdots & (a^n, V^1) \\ \vdots & \vdots & & \vdots \\ (a^1, V^{s-1}) & (a^2, V^{s-1}) & \cdots & (a^n, V^{s-1}) \\ a^1 & a^2 & \cdots & a^n \\ (a^1, V^{s+1}) & (a^2, V^{s+1}) & \cdots & (a^n, V^{s+1}) \\ \vdots & \vdots & & \vdots \\ (a^1, V^n) & (a^2, V^n) & \cdots & (a^n, V^n) \end{vmatrix}    (3a'*)

and

K = \begin{vmatrix} (a^1, V^1) & (a^2, V^1) & \cdots & (a^n, V^1) \\ \vdots & \vdots & & \vdots \\ (a^1, V^{s-1}) & (a^2, V^{s-1}) & \cdots & (a^n, V^{s-1}) \\ (a^1, V^s) & (a^2, V^s) & \cdots & (a^n, V^s) \\ (a^1, V^{s+1}) & (a^2, V^{s+1}) & \cdots & (a^n, V^{s+1}) \\ \vdots & \vdots & & \vdots \\ (a^1, V^n) & (a^2, V^n) & \cdots & (a^n, V^n) \end{vmatrix} ,    (3b'*)

then the vectors {ξ^s} calculated from the new M^s and K from (3a'*) and (3b'*) will still satisfy (4'). The dynamics (Sect. 5) remain the same, and any memory matrix so constructed will allow general patterns to be taught to the new neural network. Hence, the T_ij memory matrix from (3') where {ξ^s} has values M^s and K found from (3a'*) and (3b'*) is the general form of memory matrix which will allow non-orthogonal pattern recognition when the neuron states are represented by {0, 1}. The freedom with which the {a^s} vectors in (3a'*) and (3b'*) may be chosen permits great variability in the construction of such a T_ij memory matrix. This extends significantly the library of memory matrices which allow non-orthogonal pattern recognition where individual neuron states are represented by {0, 1}. The T_ij memory matrix in this form is in general not symmetrical.

Another neural network which is capable of learning general patterns may be found if we alter the rule concerning the current I_i(t) of (1) into

I_i(t) = \sum_{j=1}^{N} T_{ij} \mu_j(t) .    (1')

This is the spin-glass formulation of the neural network, where the state of a neuron i is represented by μ_i ∈ {-1, +1} and a pattern state is represented by μ_i(t) = 2V_i(t) - 1, with the evolution rule

\mu_i(t + \Delta t) = \mathrm{sgn}[I_i(t)] , \qquad \mu_i(t + \Delta t) = \mu_i(t) \ \text{if} \ I_i(t) = 0 .    (2b')

Under the new rule of determining the current we also need to introduce a new spin-glass orthogonality condition,

(\xi^s, \mu^\sigma) = D_s \delta_{s\sigma} ,    (4'')

where D_s > 0. Note that if D_s = 1, then (4'') is identical to the orthogonality relationship of Grant and Sage (1986). In this spin-glass formulation of the neural network, the T_ij memory matrix takes the same form as in (3') except that the quantities M^s and K are redefined and generalized: the spin-glass M^s and K quantities are obtained by replacing the μ^k and V^k values found in the old definitions of M^s and K from (3a') and (3b') with the rules

\mu^k \to a^k \quad \text{and} \quad V^k \to \mu^k ,

where k = 1,...,n and {a^k} is any set of linearly independent vectors of dimension N for which K ≠ 0. This substitution yields

M^s = \begin{vmatrix} (a^1, \mu^1) & (a^2, \mu^1) & \cdots & (a^n, \mu^1) \\ \vdots & \vdots & & \vdots \\ (a^1, \mu^{s-1}) & (a^2, \mu^{s-1}) & \cdots & (a^n, \mu^{s-1}) \\ a^1 & a^2 & \cdots & a^n \\ (a^1, \mu^{s+1}) & (a^2, \mu^{s+1}) & \cdots & (a^n, \mu^{s+1}) \\ \vdots & \vdots & & \vdots \\ (a^1, \mu^n) & (a^2, \mu^n) & \cdots & (a^n, \mu^n) \end{vmatrix}    (3a'**)

and

K = \begin{vmatrix} (a^1, \mu^1) & (a^2, \mu^1) & \cdots & (a^n, \mu^1) \\ \vdots & \vdots & & \vdots \\ (a^1, \mu^{s-1}) & (a^2, \mu^{s-1}) & \cdots & (a^n, \mu^{s-1}) \\ (a^1, \mu^s) & (a^2, \mu^s) & \cdots & (a^n, \mu^s) \\ (a^1, \mu^{s+1}) & (a^2, \mu^{s+1}) & \cdots & (a^n, \mu^{s+1}) \\ \vdots & \vdots & & \vdots \\ (a^1, \mu^n) & (a^2, \mu^n) & \cdots & (a^n, \mu^n) \end{vmatrix} .    (3b'**)

The spin-glass M^s and K values constructed in this manner yield {ξ^s} vectors which satisfy the new spin-glass orthogonality condition (4''). It is obvious that, following an argument nearly identical to that shown in Sect. 4, local stability for taught general patterns may similarly be proven. Hence, the general form of the T_ij memory matrix which can recognize non-orthogonal patterns when the neuron states are represented by {-1, +1} is the T_ij memory matrix representation from equation (3') where the {ξ^s} vectors in (3') are defined with M^s and K as found in (3a'**) and (3b'**).
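A numerical sketch of the generalized construction of this section is given below. It obtains ξ^s from the span of an arbitrary linearly independent set {a^k} by solving the defining orthogonality condition directly, (4') for the {0, 1} representation or (4'') for the spin-glass representation, rather than expanding (3a'*)/(3b'*) or (3a'**)/(3b'**) by cofactors; the default choices a^k = μ^k and D_s = N are assumptions made for the example.

```python
# Sketch (assumed implementation) of the generalised construction of this section.
import numpy as np

def generalized_memory_matrix(patterns, A=None, spin_glass=False, eps=None, D=None):
    V = np.asarray(patterns, dtype=float)            # rows: prototype patterns V^s in {0, 1}
    n, N = V.shape
    mu = 2.0 * V - 1.0                               # spin representation mu^s
    A = mu.copy() if A is None else np.asarray(A, dtype=float)   # rows: the chosen a^k
    eps = np.ones(n) if eps is None else np.asarray(eps, dtype=float)
    D = np.full(n, float(N)) if D is None else np.asarray(D, dtype=float)
    target = mu if spin_glass else V                 # (xi^s, mu^sigma) vs. (xi^s, V^sigma)
    G = A @ target.T                                 # G[k, sigma] = (a^k, target^sigma)
    C = np.linalg.solve(G.T, np.diag(D))             # column s holds the coefficients of xi^s
    Xi = C.T @ A                                     # xi^s lies in span{a^1, ..., a^n}
    return mu.T @ (eps[:, None] * Xi)                # T_ij = sum_s eps_s mu_i^s xi_j^s

# With A = mu (a^k = mu^k), spin_glass=True and D_s = N, this recovers the projection
# matrix of Personnaz et al. (1985) up to the positive factor D_s, which leaves the
# sign-based dynamics unchanged.
```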
[Figure 2 layout omitted: the resulting network states after 200 time steps for teaching strengths ε_s = 0, 0.25, 0.5, 1, and 2; see the caption below.]
Fig. 2. New model learning non-orthogonal patterns. This figure displays the incorporation of a set of non-orthogonal patterns as stable states of the neural network using the new memory matrix of Sect. 4. A neural network whose T_ij memory matrix contains random elements between -1 and +1 is first introduced. Beginning with this pre-memory neural network, we attempt to incorporate the alphabetical letters A, B, C, D, and E as stable states of the neural network by adding the new T_ij memory matrix from (3') to this random T_ij pre-memory matrix, yielding a summed T_ij memory matrix. Each row in the figure represents a different summed T_ij memory matrix used for the evolution of the neural network. The pre-memory part of the summed T_ij matrix remains constant in all rows. The new T_ij matrix varies with each row, as the {ε_s}, s = A,...,E teaching strengths used to calculate this new T_ij memory matrix part from (3') are all set equal to the ε_s value listed in the left-most column of the row. The prototype patterns {V^s}, s = A,...,E used to calculate the new T_ij memory matrix part from (3') are identical to those prototype patterns in Fig. 1. With each successive row in the figure, i.e. for each successive summed T_ij memory matrix used for the evolution of the neural network, these {ε_s} teaching strengths used to calculate the new T_ij memory matrix part of the summed T_ij matrix are increased. After each neural network's summed T_ij matrix is calculated, the neural networks are set with initial states corresponding to one of the prototype patterns V^s, s = A,...,E. Each column in the figure represents a different input state pattern V^s, s = A,...,E with which each neural network is initially set. The initial states for each column are assigned progressing alphabetically from left to right, so that the first column has input pattern V^A, the second column has input pattern V^B, and so on. This figure shows the state of each neural network after these input patterns evolved into a stable state (after 200 time steps or 5 mean neural updates). Note that the greater the {ε_s} teaching strengths are, the more dominant are the stable states of the new T_ij memory matrix part of the summed T_ij matrix. When ε_s = 0 and ε_s = 0.25, the pre-memory matrix stable states are more dominant. When ε_s = 1 and ε_s = 2, the new T_ij matrix stable states are more dominant. After sufficiently large teaching strengths, ε_s = 2, s = A,...,E, the new neural network was able to recognize the alphabetical letters exactly.
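A sketch of the Fig. 2 experiment follows: a fixed random pre-memory matrix is summed with the new memory matrix at increasing, equal teaching strengths, and each prototype is then used as an initial state. It reuses the evolve() and general_memory_matrix() sketches given earlier; since the letter bitmaps themselves are not reproduced in the text, random prototype patterns are used as stand-ins, and scaling the new matrix by ε is equivalent to setting all ε_s equal in (3').

```python
# Sketch of the Fig. 2 experiment (assumed reconstruction): random pre-memory matrix
# plus the new memory matrix of (3') at increasing, equal teaching strengths.
import numpy as np

rng = np.random.default_rng(1)
N = 40
prototypes = rng.integers(0, 2, size=(5, N)).astype(float)    # stand-ins for the letters A-E

T_pre = rng.uniform(-1.0, 1.0, size=(N, N))                   # random pre-memory matrix
T_new, _ = general_memory_matrix(prototypes)
for eps in (0.0, 0.25, 0.5, 1.0, 2.0):
    T_sum = T_pre + eps * T_new                                # summed memory matrix
    stable = sum(np.array_equal(evolve(T_sum, p, steps=200), p) for p in prototypes)
    print(f"eps = {eps:4.2f}: {stable}/5 prototypes recovered after 200 steps")
```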
It will be shown that this general form of T_ij memory matrix contains the T_ij projection memory matrix model (Personnaz et al. 1985) as a special case. The T_ij projection matrix has been defined identically as in (3'), with ε_s = 1 for s = 1,...,n, and with ξ given by the equation

\Xi = S^{I} ,    (3P')

where Ξ is the matrix whose rows are {ξ^1,...,ξ^n}, i.e. Ξ_{ij} = ξ_j^i, S is the matrix whose columns are {μ^1,...,μ^n}, i.e. S_{ij} = μ_i^j, S^T is the transpose of S, and S^I = (S^T S)^{-1} S^T is the pseudoinverse of S. The solution of Ξ in this equation (3P') can be found by the method of matrix inversion and shown to be the special case of the solution of ξ given by (3a'**), (3b'**), and (3') when the linearly independent set of N-dimensional vectors {a^k}, k = 1,...,n used to construct the spin-glass M^s and K values in (3a'**) and (3b'**) is exactly the same as the set of prototype vector states {μ^σ}, σ = 1,...,n (that is, a^k = μ^k for k = 1,...,n), and D_σ from (3') is chosen such that D_σ = (μ^σ, μ^σ) = N for σ = 1,...,n. With these values so chosen, the general form of the new T_ij memory matrix reduces to the synaptic projection matrix (Personnaz et al. 1985). So the projection matrix model is a special case of this neural network model.

The existence of different neural networks for recognizing general patterns shows the large degree of freedom in our choice of methods to recognize patterns. The merits and/or disadvantages of the various methods are worthy of further exploration.

7 Computer Experiments

We test the new neural network described in Sect. 4 on a computer. As an example, a neural network with an initially random T_ij synaptic memory matrix successfully incorporated the alphabetical patterns A, B, C, D, and E as stable states of the neural network by adding increments of the new T_ij memory matrix from (3') to the random T_ij memory matrix (Fig. 2). Here all the ε_s, s = A,...,E teaching strengths used to calculate the T_ij memory matrix from (3') are set equal and represented by the ε_s value in the left-most column of the figure.
[Figure 3 layout omitted: evolution of the distorted initial states at t = 0, 50, 100, and 150 for distortion levels from 5% to 35%; see the caption below.]

Fig. 3. Typical evolution of partially erroneous data. The neural network's ability to recognize patterns which have been distorted from the original is tested. First, the neural network incorporates the patterns V^s, s = A, B, C, D, E (identical to the corresponding prototype states in Figs. 1 and 2) into memory as stable states using the T_ij memory matrix (3'). For convenience, the {ε_s}, s = A,...,E teaching strengths used in calculating (3') are all set equal to 1. Then, the individual neuron states of the pattern "A", i.e. V^A, were altered randomly in successive increments. The resulting patterns were then input to the neural network as initial states. Each row of the figure corresponds to a different initial state from which the neural network began. The network evolves in time from left to right, with the network state displayed at 50 time step (1.25 mean neural update) intervals. In all the cases with up to a 30% distortion, the altered patterns evolved back into the correct stable pattern of the letter "A". The neural network was tolerant to small distortions of the original patterns.
For small values of ε_s, the stable states of the initial random T_ij memory matrix dominated, but for larger ε_s values, the stable states of the new T_ij memory matrix became more and more dominant. The neural network seemed to incorporate as stable states first the more individually distinct letters, i.e. A, C, and D, and then the more similar letters, i.e. B and E. The results of this computer experiment indicate that the new neural network model was successful in learning general patterns.

Once the new neural network model had learned the alphabetical letters A, B, C, D, and E using (3'), it was able to identify the letters correctly even when they had been distorted from their original shape (Fig. 3). In general, the more distorted the letter was, the greater the time required to recognize the pattern and the greater the chance of incorrectly identifying the pattern. In the case of the letter A represented by bits in 40 neurons, the pattern was recognized when up to 30% of its bits were altered. This ability to recognize distorted patterns seems similar to human pattern recognition.

Similar results were achieved with incomplete letters. The neural network was able to recognize a letter correctly even when the letter was incomplete. When part of the letter was covered and then shown to the network, the network was able to yield the correct or nearly correct letter pattern from the incomplete data (Fig. 4). The network showed the ability to retrieve memory from partial data.
[Figure 4 layout omitted: evolution of the incomplete initial states at t = 0, 50, 100, 150, and 200; see the caption below.]

Fig. 4. Typical evolution of incomplete data. The neural network's ability to recall a complete pattern from incomplete data is tested. The neural network first incorporates the patterns V^s, s = A,...,E into the T_ij memory matrix with ε_s = 1 for s = A,...,E. This is the identical T_ij memory matrix as the one employed in Fig. 3. The initial state of each row in the figure is assigned as one of the prototype patterns V^s with the first two columns of each pattern nullified. After approximately 5 mean neural update periods (200 time steps), the network was able to find the exactly correct pattern from the incomplete data for V^s, s = A, C, D and a nearly correct pattern for V^s, s = B, E. It was found in a subsequent experiment that if the {ε_s} teaching strengths used in calculating the T_ij memory matrix were assigned as ε_A = 1, ε_B = 2, ε_C = 1.25, ε_D = 1.5, and ε_E = 2, then the neural network would find the exactly correct stable pattern state from the incomplete data for all V^s, s = A,...,E. Hence, the network has the ability to retrieve memory from incomplete data.
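The robustness tests of Figs. 3 and 4 can be sketched in the same style: a stored prototype is either randomly flipped in a fraction of its bits or has part of its area blanked before being handed to the network as an initial state. The sketch below reuses the prototypes, general_memory_matrix(), and evolve() of the earlier examples; the blanked block of 16 neurons is only a stand-in for the "first two columns" of the actual letter grids.

```python
# Sketch of the Figs. 3 and 4 tests (assumed reconstruction), reusing the earlier sketches.
import numpy as np

rng = np.random.default_rng(2)
T, _ = general_memory_matrix(prototypes)                      # prototypes as in the Fig. 2 sketch

def distort(pattern, fraction, rng):
    """Flip a randomly chosen fraction of the bits of a 0/1 pattern."""
    flipped = pattern.copy()
    idx = rng.choice(len(pattern), size=int(fraction * len(pattern)), replace=False)
    flipped[idx] = 1.0 - flipped[idx]
    return flipped

for frac in (0.05, 0.10, 0.20, 0.30):                         # distortion levels as in Fig. 3
    start = distort(prototypes[0], frac, rng)
    ok = np.array_equal(evolve(T, start, steps=200), prototypes[0])
    print(f"{int(frac * 100):2d}% distortion -> recovered: {ok}")

incomplete = prototypes[0].copy()
incomplete[:16] = 0.0                                         # 16 blanked neurons stand in for two columns
print("incomplete data -> recovered:",
      np.array_equal(evolve(T, incomplete, steps=200), prototypes[0]))
```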
8 Discussion and Conclusions

We have described and tested a new neural network model which allows general patterns to be taught to a neural network. The success of the test experiments and the mathematical support given suggest that this method will be successful for general sets of patterns. This neural network has a high fault tolerance to altered and incomplete data. Its reactions to altered data seem analogous to human reactions to distorted patterns. The neural connections in this neural network are in general asymmetric, unlike those of the symmetric Hopfield neural network. The effectiveness of such a neural network model in serving as a pattern recognition system may provide some insight into the asymmetry/symmetry of neural connections in the brain.

Conditional attraction basins, i.e. prototype states that attract an instantaneous state V(t) as long as V(t) fulfills a "closeness" condition, were shown to exist and found to be mathematically manipulable by modifying constants in the closeness condition. The model empirically demonstrated global stability, but no mathematically deductive justification for this characteristic has yet been given. The question of global stability is a topic worthy of further research.
A time-dependent energy function Ω was introduced to facilitate analysis of the dynamics of general asymmetric neural networks. This time-dependent energy function Ω coincides with the Hopfield model Liapunov energy function when the neural network in question is symmetric and the initial values of both energy functions are equalized. In the Ω energy representation, it was found that global stability occurs when, for all input patterns, the limiting value of the Ω energy function is a finite number; cycling and chaotic behavior were shown to occur when there exist some input patterns for which the limiting value of the Ω energy function is -∞. Cycles and chaotic behavior may then be represented as infinite sinks in the system's state-energy landscape.

It was realized that both the Hopfield and projection matrix models of T_ij memory matrix construction are special cases of a more general pattern recognition method. Further, there exist many memory matrices which permit the recognition of general patterns, in addition to the projection matrix model. In fact, any linearly independent set of vectors {a^s} of dimension N will allow the construction of a memory matrix which permits the recognition of general patterns for both the neuron state representation of {0, 1} and that of {-1, +1} (Sect. 6). The large degree of freedom in the construction of the memory matrices is an interesting and unexpected result. It may provide new insights into the function of the brain. It may also lead to feasible technical applications in the future.
Acknowledgements. The author would like to thank Dr. Al Geist and Dr. Renchuang Wang for helpful discussions. He would like to thank Dr. Alan Solomon and Mrs. Benita Albert for their persistent encouragement. Part of this research was conducted at the Oak Ridge National Laboratory. He would also like to thank the members of the Mathematical Sciences Section of the Engineering Physics Division for their kind hospitality. A special thanks is due to the author's parents for their always very kind words of praise, their sensitivity, and their continually constructive encouragement, which have made this paper a memorable pleasure to create.
References

Amado I (1987) Growing "brains" in a computer. Sci News 131:60-61
Anderson JA (1983) Cognitive and psychological computation with neuronal models. IEEE Trans SMC-13:799-815
Ballard DH (1986) Cortical connections and parallel processing: structure and function. Behav Brain Sci 9:67-120
Brown CM (1984) Computer vision and natural constraints. Science 224:1299-1305
Crick F, Mitchison G (1983) The function of dream sleep. Nature 304:111-114
Gelperin A, Hopfield JJ, Tank DW (1985) The logic of Limax learning. In: Selverston A (ed) Model neural networks and behavior. Plenum Press, New York
Grant PM, Sage JP (1986) A comparison of neural network and matched filter processing for detecting lines in images. In: 1986 Neural Networks for Computing Conf, Snowbird, UT, USA. AIP Conference Proc 151
Hopfield JJ (1982) Neural networks and physical systems with emergent collective computational abilities. Proc Natl Acad Sci USA 79:2554-2558
Hopfield JJ (1984) Neurons with graded response that have collective computational properties like those of two-state neurons. Proc Natl Acad Sci USA 81:3088-3092
Hopfield JJ, Tank DW (1985) "Neural" computation of decisions in optimization problems. Biol Cybern 52:141-152
Hopfield JJ, Feinstein DI, Palmer RG (1983) "Unlearning" has a stabilizing effect in collective memories. Nature 304:158-159
Kinzel W (1985) Learning and pattern recognition in spin glass models. Z Phys B - Condensed Matter 60:205-213
Personnaz L, Guyon I, Dreyfus G (1985) Information storage and retrieval in spin-glass like neural networks. J Phys (Paris) Lett 46:L359-L365
Psaltis D, Farhat N (1986) Optical information processing based on an associative-memory model of neural nets with thresholding and feedback. Opt Lett 10:98-100
Shiozaki A (1980) A model of distributed type associative memory with quantized Hadamard transform. Biol Cybern 38:19-22
Tank DW, Hopfield JJ (1986) Simple "neural" optimization networks: an A/D converter, signal decision circuit, and a linear programming circuit. IEEE Trans CAS-33:533-541
Tesauro G (1986) Simple neural models of classical conditioning. Biol Cybern 55:187-200
Von Neumann J (1966) Theory of self-reproducing automata. University of Illinois Press, Urbana
Winston PH (1984) Artificial intelligence. Addison-Wesley, Reading, Mass

Received: March 21, 1987
Accepted in revised form: September 21, 1987

A. Jun-Wei Wong
Henry DeWolf Smyth Scholar
Princeton University
Princeton, NJ 08544
USA