Biological Cybernetics
Biol. Cybern. 65, 351-355 (1991)
© Springer-Verlag 1991
Cross-talk theory of memory capacity in neural networks

R. J. MacGregor¹ and G. L. Gerstein²

¹ Aerospace Engineering Sciences, University of Colorado, Boulder, CO 80309-0429, USA
² Biophysics and Physiology, University of Pennsylvania, Philadelphia, PA 19104, USA

Received September 20, 1990 / Accepted in revised form May 8, 1991
Abstract. The present paper presents a theory for the mechanics of cross-talk among constituent neurons in networks in which multiple memory traces have been embedded, and develops criteria for memory capacity based on the disruptive influences of this cross-talk. The theory is based on interconnection patterns defined by the sequential configuration model of dynamic firing patterns. The theory accurately predicts the memory capacities observed in computer-simulated nets, and predicts that cortical-like modules should be able to store up to about 300-900 selectively retrievable memory traces before disruption by cross-talk is likely. It also predicts that the cortex may have designed itself for modules of 30,000 neurons at least in part to optimize memory capacity.

Introduction

Recently a number of theoretical studies have addressed the question of memory capacity in neural nets using various assumptions regarding the character of memory traces and the underlying nets (Braitenberg 1977; Kohonen 1972; Palm 1980, 1981a, b, 1982, 1987a, b; Hopfield 1982; Aertsen and Palm 1986; Sejnowski 1986; McEliece 1987; Amari 1989; MacGregor 1987). These studies have indicated that more specific definitions of hypothesized cell assemblies should increase our ability to predict storage capacities and their dependence on underlying parameters. For example, none of these studies recognizes temporal relations among trace neurons. A companion paper introduces a sequential configuration model for coordinated dynamic firing patterns in local neural networks, and shows by computer simulation that this model produces retrievable memory traces with firing patterns compatible with multi-microelectrode experimentation (MacGregor 1991a). The present paper presents a theory for the mechanics of cross-talk among constituent neurons in networks in which multiple memory traces have been embedded, and develops criteria for memory capacity based on the disruptive influences of this cross-talk. The theory is based on interconnection patterns defined by the sequential configuration model of dynamic firing patterns. The theory accurately predicts the memory capacities observed in computer-simulated nets, and predicts that cortical-like modules should be able to store up to about 300-900 selectively retrievable memory traces before disruption by cross-talk is likely. It also predicts that the cortex may have designed itself for modules of 30,000 neurons at least in part to optimize memory capacity. Additional simulation results and further elaborations of the theoretical approach followed here are given in MacGregor (1991b).

Theory and methods

When multiple traces are embedded in one system, cells which are participating in a realization of one of the traces may also project extraneous activation ("cross-talk") to other cells with which they may be coupled in others of the traces. This cross-talk is fundamentally disruptive. It contributes to the intrinsic stochastic nature of realizations of traces; it governs the ability of realizations to maintain themselves in the presence of noise and various types of stimulation; and it limits the number of distinct traces which can be simultaneously embedded in a network.
Stochastic theory of cross-talk

The analysis of cross-talk developed here is based on the arrangements of synaptic interconnections defined by the sequential configuration model for memory traces developed in MacGregor (1991a). Note first that if a sender link active at any given time shares x cells with any other link, S*, embedded in the net, then cross-talk consisting of x psps of amplitude d_l (see MacGregor 1991a) will be projected to all cells in all links to which link S* projects. The mathematical description of the cross-talk follows from this observation.

The probability that a given sender link shares x cells with any other independently chosen single link is given by the hypergeometric distribution, p(x), with mean n*(nf/N), and also known standard deviation. Here n is the number of cells in a link of the bed of the trace, nf is the number of trace cells that fired in the active link, and N is the number of cells in the net. Now, if a net contains y independently chosen links, the number of these links, Z_x, which share x cells with a given sender link can be obtained from a Bernoulli trial situation with probability of success on one trial equal to p(x). Thus Z_x is a binomially distributed random variable with mean, O_x, equal to p(x)*y, and also known standard deviation. Next, note that each of these sets of links which share x cells with the given sender will project x psps to the downstream links to which they connect. Thus, they determine (nf + 1)*nL sets of receiver links, S_xl, each of whose members receives cross-talk of x psps of strength d_l when the given initial sender is activated. Here nL is the number of links projected to by the sequential configuration model, and d_l is the synaptic strength of the lth-order synaptic connection (MacGregor 1991a). The number of links in each S_xl set is also given by Z_x, and has the same mean and standard deviation. The amount of cross-talk any given network neuron experiences can now be obtained by determining how many of the Z_x S_xl receiver links it is in. If, for example, all the links are initially chosen independently, with all cells equally likely to participate in any one link, then the number of S_xl receiver links it is in, z_xl, is given by a Bernoulli trial situation with probability of success on one trial equal to the probability that any given cell is in any given link, n/N.
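The link-overlap distribution just described can be sketched numerically. The following is a minimal illustration, assuming hypothetical small-net parameters (n = nf = 10, N = 100); it verifies that the hypergeometric mean equals n*nf/N, as stated above.

```python
from math import comb

def p_overlap(x, n, nf, N):
    """Hypergeometric probability that an active link of nf firing cells
    shares exactly x cells with an independently chosen bed link of n
    cells, both drawn from a net of N cells."""
    return comb(nf, x) * comb(N - nf, n - x) / comb(N, n)

# Hypothetical small-net parameters, for illustration only.
n, nf, N = 10, 10, 100
pmf = [p_overlap(x, n, nf, N) for x in range(min(n, nf) + 1)]
mean_overlap = sum(x * p for x, p in enumerate(pmf))
print(mean_overlap, n * nf / N)  # the mean overlap equals n*nf/N
```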
Thus, z_xl is a binomially distributed random variable, Z_xl, with mean O_x*(n/N), or p(x)*y*(n/N), and known standard deviation. Finally, the total cross-talk on a given network cell is a random variable, ξ, whose distribution is given by the sum of x psps of strength d_l for all of the S_xl receiver links it is in; that is, ξ is given by (1):

ξ({z_xl}) = Σ_{x=0..nf} Σ_{l=1..nL} z_xl*x*d_l .   (1)

The mean and standard deviation of the cross-talk variable, ξ, defined in (1) are given in (2):

(a) μ_ξ = Σ_{x=0..nf} Σ_{l=1..nL} x*d_l*μ_{z_xl} = Σ_l d_l*r*y*Σ_x x*p(x) = d*r²*nf*y

(b) σ_ξ² = Σ_{x=0..nf} Σ_{l=1..nL} x²*d_l²*r*(1 − r)*p(x)*y = D*r²*(1 − r)*nf*[(1 − r)*(1 − nf/N)/(1 − 1/N) + r*nf]*y .   (2)

Here, d is the sum of the individual synaptic strength values, Σ d_l; D is the sum of the squares of these values, Σ d_l²; and r is a shorthand for the ratio of link cells to network cells, n/N. Note that the values of μ_ξ and σ_ξ are given as functions of the number, y, of links stored in the net, and therefore show how cross-talk depends on the number of traces embedded. Equations (2) allow us to predict the distribution of cross-talk activation over network cells for any given number of embedded links, y, as a function of network parameters.

Disruption criterion: general

The number of cells which will have ξ greater than any particular value can be determined from (3), which has the property that A*N cells will be expected to have a ξ such that ξ > ξ_crit on any given trial:

μ_ξ + (k(A) + m)*σ_ξ = ξ_crit .   (3)

If, for example, A = nf/N and ξ_crit = threshold, then the number of cells that would be expected to fire at a single trial due to cross-talk alone would be nf. Here, k(A) and m are convenient measures of the number of standard deviations between ξ_crit and μ_ξ. If one uses the expressions for μ_ξ and σ_ξ from (2) in the criterion of (3), and solves the resulting quadratic equation, the expression given in (4) is obtained for the allowable number of links, y_c, that may be embedded in the network before disruption by cross-talk should be expected:

(a) y_c = B/2 − sqrt((B/2)² − (ξ_crit/(d*r²*nf))²)

(b) B = (k + m)²*D*(1 − r)*[(1 − r)*(1 − nf/N)/(1 − 1/N) + r*nf]/(d²*r²*nf) + 2*ξ_crit/(d*r²*nf) .   (4)

Disruption criterion: specific

Now suppose that disruption will occur when the magnitude of the cross-talk at each time level approaches some given fraction, s, of the signal input at each time. This criterion is stated in (5):

ξ_crit = s*nf .   (5)

The appendix estimates s in terms of physiological parameters.

Results
This theory can be used to predict levels of cross-talk activation in single neurons, the storage capacities of model and real local circuits, and the dependence of these on neural parameters.
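As a concrete sketch of such a prediction, the criterion of (3), with ξ_crit = s*nf from (5), can be solved numerically for the allowable number of links; this is equivalent to the quadratic solution of (4). The parameter values below are hypothetical, chosen only to illustrate the computation.

```python
from math import sqrt

def capacity(n, nf, N, d, D, s, k_plus_m):
    """Allowable number of embedded links y at which mean cross-talk plus
    a (k+m)-standard-deviation margin reaches xi_crit = s*nf.
    mu_xi = a*y and var_xi = b*y, following eq. (2)."""
    r = n / N
    a = d * r**2 * nf
    b = D * r**2 * (1 - r) * nf * ((1 - r) * (1 - nf / N) / (1 - 1 / N) + r * nf)
    xi_crit = s * nf
    # a*y + k_plus_m*sqrt(b*y) = xi_crit is quadratic in u = sqrt(y).
    u = (-k_plus_m * sqrt(b) + sqrt(k_plus_m**2 * b + 4 * a * xi_crit)) / (2 * a)
    return u**2

# Hypothetical parameters (n = nf = 10, N = 100), for illustration only.
y_c = capacity(n=10, nf=10, N=100, d=4.0, D=4.0, s=0.5, k_plus_m=2.0)
y_sigma0 = 0.5 / (4.0 * (10 / 100)**2)  # sigma = 0 limit: y = s/(d*r**2)
print(y_c, y_sigma0)
```

The stochastic margin always yields a capacity below the sigma-zero approximation; the two converge as the variance becomes negligible in large nets.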
Storage capacities

It is convenient to illustrate the main large-scale properties of the theory by plotting predictions for memory capacity, y and NT_max, as a function of network size, N, where the ratio r = nf/N is held constant and the threshold, φ, is taken as proportional to net size. These results are shown in Fig. 1. Figure 1 shows that the allowable capacity for any relative threshold value levels off at higher values of N. Physically this corresponds to the fact that the standard deviation of ξ becomes progressively less significant as N gets large. This means that one can develop approximate equations by setting σ_ξ = 0 for application to larger nets. The curve corresponding to r = 0.00187, which corresponds to our best estimate for the cortical module, φ = 240, levels off at a net size of just about the observed module size, N = 30,000. The trace capacity goes down by a factor of about 16 for every drop of a factor 4 in the ratio, r. Using the assumption that σ_ξ = 0 for larger nets, we can write the simplified approximate equations shown in (6):

(a) μ_ξ = ξ_crit
(b) d*r²*nf*y = s*nf
(c) y = s/(d*r²) = s*N²/(d*nf²)
(d) NT_max = y/L .   (6)

[Fig. 1. Memory capacities for large nets - basic trends. Curves for m = 0, 1, 2 at r = 0.00012 (φ₃₀ = 15), r = 0.00047 (φ₃₀ = 60), r = 0.00186 (φ₃₀ = 240), and r = 0.00746 (φ₃₀ = 960), plotted against net size, 1-300 × 10³.]

Here, L is the length of the sequential configuration. We might estimate it as 30 for cortical modules. These equations show that the memory capacity for large nets increases as the square of net size, N, and inversely as the square of the number of cells per bed link, n, with a proportionality factor of s/d. With n = nf = 56 and N = 30,000, we find y = 26,880 and NT_max = 896. The more exact solutions (using the standard deviations) show further that for m = 1, NT_max = 814, and for m = 2, NT_max = 740.

We have determined approximate storage capacities of small nets (N = 100) for the following three cases: (nf = 4, L = 12), (nf = 7, L = 7), (nf = 10, L = 10). All three cases used dc = 2 and nL = 4. With trace signal levels moderately above threshold (b about 1.1), the 100-cell nets allowed successful retrieval of up to eight 4-12 traces, five 7-7 traces, and only one 10-10 trace. The theory for this case predicts: for the nf = 4, L = 12 traces, y = 94.8 and NT = 7.9 (compared with eight observed); for the nf = 7, L = 7 traces, y = 31.0 and NT = 4.4 (compared with five observed); for the nf = 10, L = 10 traces, y = 15.2 and NT = 1.5 (compared with one observed). If one uses the criterion that disruption will occur when nf cells are expected to fire (i.e., A = nf/N: k(0.04) = 1.75, k(0.07) = 1.475, k(0.10) = 1.28), the predictions are higher: y = 136 and NT = 11.3 for nf = 4; y = 55 and NT = 7.9 for nf = 7; and y = 29 and NT = 2.9 for nf = 10. Both approaches show the right trends; the first criterion matches the numbers more precisely.

Trace capacities in larger nets with random background activity

If network cells are bombarded by random input, R, as well as cross-talk, the total extraneous activation, ξ, must be incremented by an amount R. If this input is fed to each cell in densely connected nets through N synapses, each of which has probability, q, of being active at any given time unit, then R is a binomially distributed variable with mean q*N*s# and standard deviation sqrt(q*(1 − q)*N)*s#, where s# is the average synaptic strength. Thus the mean and standard deviation of ξ are modified from (2) for this case as shown in (7):

(a) μ_ξ = d*r²*nf*y + q*N*s#
(b) σ_ξ² = σ_CT² + q*(1 − q)*N*(s#)² .   (7)

This predicts that maximum storage capacity occurs at a prescribed value of net size, and that the capacity there is reduced to 1/3 of the cross-talk-only value. The relevant simplified approximate equations are shown in (8):

(a) μ_ξ = ξ_crit
(b) d*r²*nf*y + q*N*s# = s*nf
(c) y = (s*nf*N² − q*s#*N³)/(d*nf³)
(d) dy/dN = (2*s*nf*N − 3*q*s#*N²)/(d*nf³) = 0
(e) Np = 2*s*nf/(3*q*s#)
(f) yp = s/(3*d*rp²) = y_CT/3
(g) NT_max = yp/L .   (8)
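A quick numerical sketch of (8): with random background input, y(N) from (8c) is a cubic in N whose maximum falls at Np = 2*s*nf/(3*q*s#) from (8e), where the capacity equals one third of the cross-talk-only value, per (8f). The parameter values below are hypothetical, for illustration only.

```python
def y_of_N(N, s, nf, q, s_hash, d):
    """Capacity y as a function of net size N, eq. (8c), with r = nf/N."""
    return (s * nf * N**2 - q * s_hash * N**3) / (d * nf**3)

# Hypothetical illustrative parameters.
s, nf, q, s_hash, d = 0.1, 56.0, 1e-6, 0.5, 30.0
N_p = 2 * s * nf / (3 * q * s_hash)  # eq. (8e): net size of peak capacity
y_p = y_of_N(N_p, s, nf, q, s_hash, d)
r_p = nf / N_p
print(N_p, y_p, s / (3 * d * r_p**2))  # last two agree, checking eq. (8f)
```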
[Fig. 2. Memory capacity with random background firings: exact solutions for NT_max (m = 0, 1, 2) plotted against net size, up to 50 × 10³.]
If nf = 56 and wR = 4/sec, then Np = 31,779 and NT_max = 336. Figure 2 shows exact solutions for NT_max for this case. For m = 1, Np = 31,250 and NT_max = 268; for m = 2, Np = 30,500 and NT_max = 244.

Discussion
The main contribution of this paper is to establish a theoretical mechanics for predicting the distribution of cross-talk activity levels in multiply-embedded networks. The theory allows the prediction and interpretation of a number of significant features of neural network design. Two broad results of the theory are significant. The first is that cortical-like modules should store about 300 memory traces. This seems like a reasonable number in view of the current interpretation of modules in visual, auditory, and somatosensory cortex as feature extractors. One might define 300 different values of direction of a line, for example. A significant question is whether the modular organization is characteristic of posterior association cortex and frontal cortex as well as of the posterior sensory areas (Braitenberg 1978; Eccles 1981; Palm 1981; Abeles 1982; Szentagothai 1983; Gerstein et al. 1989; Vaadia et al. 1989). A further question is then whether it is fruitful to view an entire region, say the right or left posterior association areas, or the primary auditory area, as consisting of large numbers of independent or quasi-dependent modules, each of which is capable of resonating in any of about 300 modes. What would be the capacities and fundamental characteristics of a neural system designed in such a way? If we estimate that one posterior association area, for example, contained about 1 billion cells, it could contain about 30,000 non-overlapping modules, each of which, in turn, contained about 30,000 cells. Is this rough correspondence a coincidence, or does it speak to some principle of hierarchical design? One can arrive at this suggestion of a symmetrical hierarchical organization by asking how one might minimize the connections while maintaining the connectivity (Braitenberg V., personal communication). It is interesting that two such completely different approaches lead to the same large-scale organizational scheme. How many distinguishable patterns might such a system allow if each module, in fact, allows 300 traces? The answers to such a question depend not on combinatorics but on the physiological mechanics of interaction, as we have seen on a smaller scale for modules in this paper. Perhaps at present we can begin to speculate about the interactions of modules and the properties of systems composed of multiple modules. The second significant broad result is that the present theory predicts that two interesting things happen as module size approaches the value of 30,000 cells. First, the variation in cross-talk activation level becomes insignificant in comparison with its mean value. This in itself could contribute to stability of memory recall, rendering very unlikely the occurrences of potentially disrupting but unusual combinations. Second, the spurious activation of cells by the random firing of other network cells, when combined with cross-talk activity, tends to overwhelm signal levels for cortical-like modules. Both these factors contribute to the occurrence of a peak value in memory capacity at the observed size of cortical modules. Is it possible that the brain has designed itself at this level to optimize memory capacity? At first glance 300 memory traces seems small compared to, say, 30,000 neurons. But a system which distributes memories over many cells has compensatory advantages, such as resilience in the face of cell loss. This would seem a reasonable conjecture at this point in our understanding. The proper context for this question is the study of the capacity of the higher-level system that uses large numbers of such modules as components. The theory as it stands allows the prediction of trends and reasonable estimates of design parameters.
Observed memory capacities in 100-cell nets are matched remarkably well by the theory. Using reasonable estimates of observed cortical parameters the theory predicts that cortical modules should use about 56 cells per link, should exhibit memory capacities of about 300-900 traces, and might have been designed at 30,000 cells to optimize memory capacity and stability. That traces should fire about 56 neurons per msec in order to safely sustain themselves would correspond to what has been called "sparse coding" in the previous literature (Palm 1987a; Amari 1989). Our theoretical prediction for memory capacity (in the absence of external noise) reduces from the general form given in (4) to the approximate relation given in (9) (cf. 6 and A-4).
NT_max = s*N²/(d*nf²*L) = Ssig*N²/(b*Smx*s#*d*nf²*L) .   (9)
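As a rough sanity check, (9) can be evaluated directly. The values of s and d below are hypothetical placeholders (an appendix-style estimate of s and an assumed summed synaptic strength d, neither taken from the paper's exact fits); the remaining values follow the module estimates used in the text (N = 30,000, nf = 56, L = 30).

```python
# Hypothetical parameter values, for illustration only.
s, d = 0.56, 18.0            # assumed s (appendix-style) and summed strength d
N, nf, L = 30_000, 56, 30    # module estimates used in the text
NT_max = s * N**2 / (d * nf**2 * L)   # eq. (9), noise-free form
print(round(NT_max))  # lands in the few-hundred-trace range discussed above
```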
This form is essentially equivalent to that derived by previous investigators. Our contribution is to show the significance of stochastic variability (in terms of the probabilistic derivation and corresponding standard deviation); to link the parameters more explicitly to physiologically measurable parameters (in terms of the sequential configuration model and the expression for s); and to show the influence of extrinsically supplied random bombardment. A desirable refinement of the theory is the inclusion of a more fine-grained representation of the physiological mechanisms of competition, selection, and cross-talk among traces involving interactions of inhibition and excitation. If excitation of excitatory cells is preeminently important in cross-talk disruption, is it that extra excitation of trace cells disrupts their timing? Or is it that the cross-talk excites too many non-trace cells to fire? Or is it otherwise: that inhibitory signals may be the more critical? If so, is it that activation of non-trace inhibitory cells is the key ingredient (these in turn tending to inhibit trace cells)? Our computer simulations of traces in smaller nets use the interactions of excitatory and inhibitory sequences in each memory trace; one way of seeing this is that each trace has a cadre of inhibitory cells which tends to shut down competing traces, but directs markedly diminished inhibition to trace neurons. Equations (6) and (8) are probably a reasonable approximation for any of the several physiological possibilities, when properly interpreted. The predicted values match the simulations for small nets very well. Nonetheless, one does want eventually to become clear as to precisely what physiological determinants are most significant.
Appendix: physiologically-based estimate of s

The parameter s of (5) can be related to physiological parameters by relating total signal level and cross-talk level to cell thresholds. First, note that for sequential configurations to be self-sustaining, their signal levels must exceed cell thresholds by some amount. We can state this as in (A-1):

nf*Ssig = b*φ .   (A-1)

Here, φ is cell threshold; b is a security factor, say about 1.1; and nf*Ssig is the total signal level in trace cells at the time of firing. Ssig represents the temporal summation of trace psps. One may show, by using the geometric series for exponentially decaying psps, that Ssig may be approximated by (A-2):

(a) Ssig = (1 − exp(−w*nL))/(1 − exp(−w))
(b) w = dc/nL + Δ/T .   (A-2)

Here nL is the number of links synaptically connected, dc is the psp link decay in the sequential configuration model (MacGregor 1991a), Δ is the interlink time interval, and T is the membrane time constant. Secondly, one could say that disruption could occur when temporally summed cross-talk in itself approached threshold level, as given by (A-3):

(a) ξ_crit*Smx*s# = φ
(b) Smx = 1/(1 − exp(−Δ/T))
(c) s# = (Σ_{l=1..nL} exp(−dc*(l − 1)/nL))/nL .   (A-3)

Here, Smx represents the temporal summation of ξ_crit, s# represents the average unitary psp strength, and φ again represents cell threshold. Combining (A-1) and (A-3) gives (A-4):

s = Ssig/(Smx*b*s#) .   (A-4)
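The appendix chain (A-2) through (A-4) can be sketched numerically. The parameter values below are hypothetical: dc = 2 and nL = 4 as in the small-net simulations, with an assumed interlink interval of 1 ms and membrane time constant of 5 ms.

```python
from math import exp

def s_parameter(dc, nL, delta, T, b):
    """Estimate of s via eqs. (A-2)-(A-4)."""
    w = dc / nL + delta / T                         # (A-2b)
    S_sig = (1 - exp(-w * nL)) / (1 - exp(-w))      # (A-2a)
    S_mx = 1 / (1 - exp(-delta / T))                # (A-3b)
    s_hash = sum(exp(-dc * (l - 1) / nL) for l in range(1, nL + 1)) / nL  # (A-3c)
    return S_sig / (S_mx * b * s_hash)              # (A-4)

# Hypothetical illustrative values (times in ms).
s = s_parameter(dc=2.0, nL=4, delta=1.0, T=5.0, b=1.1)
print(round(s, 3))
```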
References

Abeles M (1982) Local cortical circuits. Springer, Berlin Heidelberg New York
Aertsen A, Palm G (eds) (1986) Brain theory. Springer, New York Berlin Heidelberg
Amari S-I (1989) Characteristics of sparsely encoded associative memory. Neural Networks 2:451-457
Braitenberg V (1977) On the texture of brains. Springer, New York Berlin Heidelberg
Braitenberg V (1978) Cell assemblies in the cerebral cortex. In: Heim R, Palm G (eds) Theoretical approaches to complex systems. Springer, New York Berlin Heidelberg, p 171
Eccles JC (1981) The modular operation of the cerebral neocortex considered as the material basis of mental events. Neuroscience 6:1839-1856
Gerstein GL, Bedenbaugh P, Aertsen A (1989) Neuronal assemblies. IEEE Trans BME 36:4-14
Hopfield JJ (1982) Neural networks and physical systems with emergent collective computational abilities. Proc Natl Acad Sci 79:2554-2558
Kohonen T (1972) Correlation matrix memories. IEEE Trans Comput C-21:353-359
MacGregor RJ (1987) Neural and brain modeling. Academic Press, New York
MacGregor RJ (1991a) The sequential configuration model for firing patterns in local neural networks. Biol Cybern 65:339-349
MacGregor RJ (1991b) Theoretical mechanics of neural networks. Academic Press, New York (in preparation)
McEliece RJ (1987) The capacity of the Hopfield associative memory. IEEE Trans Inf Theory IT-33:461-482
Palm G (1980) On associative memory. Biol Cybern 36:19-31
Palm G (1981a) On the storage capacity of an associative memory with randomly distributed storage elements. Biol Cybern 39:125-127
Palm G (1981b) On the storage capacity of an associative memory with randomly distributed storage elements. Biol Cybern 39:125-127
Palm G (1981c) Towards a theory of cell assemblies. Biol Cybern 39:125-127
Palm G (1982) Rules for synaptic changes and their relevance for the storage of information in the brain. In: Trappl R (ed) Cybernetics and systems research. Elsevier, Amsterdam
Palm G (1987a) On the asymptotic information storage capacity of neural networks. M.P.I. for Biol Cybern
Palm G (1987b) Local rules for synaptic modification in neural networks. M.P.I. for Biol Cybern
Sejnowski TJ (1986) Open questions about computation in cerebral cortex. In: McClelland JL, Rumelhart DE (eds) Parallel distributed processing: explorations in the microstructure of cognition, vol 2: Applications. MIT Press, Cambridge
Szentagothai J (1983) The modular architectonic principle of neural centers. Rev Physiol Biochem Pharmacol 98:11-61
Vaadia E, Bergmann H, Abeles M (1989) Neuronal activities related to higher brain functions - theoretical and experimental implications. IEEE Trans BME 36:25-35

Prof. Ronald J. MacGregor
Aerospace Engineering Sciences
Campus Box 429
University of Colorado
Boulder, CO 80309-0429, USA