Biol. Cybernetics 21, 23-28 (1976) © by Springer-Verlag 1976
Signal Transformation and Pattern Recognition in Visual Pathways
Francesco Andrietti
Istituto di Fisiologia Generale dell'Università di Milano, Milano, Italy
Received: March 23, 1975
Abstract
A model of the visual pathways has been developed, consisting of two neural systems working together: a feed-forward mechanism given by convolutions through different layers, and a feed-back mechanism consisting of the comparison of the results of the first one. The first mechanism would be predominant at lower centers, and it would be able, for example, to explain the increase in contrast sensitivity at different stations of the visual pathways. The second mechanism seems better suited to explain the behaviour of complex and hypercomplex cells, whose activity is to a great extent independent of the position of the stimulus in the receptive field. The ideas developed in this paper could have some implications for building a machine able to recognize patterns independently of their position.
1. Introduction
We may speculate about two possible kinds of organization of visual pathways in the brain:

a) a digital one. This means that, besides some neurons involved solely in the transmission of signals through the different parts of the c.n.s., others are exclusively concerned with their elaboration. One must suppose that the latter have an a priori knowledge of the constants of the c.n.s., such as the dimensions of receptive fields, the decay times, etc. In other words, one must presuppose the existence of an external world whose features have to be detected, an internal structure with given mechanical characteristics, and a program that, taking into account the constraints of the internal structure, is able to analyse the signal (see, for example, Leibovic, 1966, 1969). While this approach enables us to quantify the information content of neural structures, it is defective for three main reasons. First, the brain does not necessarily execute the same operations that we would perform in its place, as has already been pointed out by Pitts and McCulloch (1947). Second, one looks only at the final stage, i.e. at the cerebral structure already defined, and lacks any indication about the way it is realized. Finally, the search for the meaningful features of the stimulus, for example lengths, areas, etc., is based on our cultural and logical experience; and even if one can say that logic too may ultimately be explained in terms of neural organization, this does not imply that it can be derived from the elementary circuitries with which we are presently concerned.

b) an analogical one. In this case one does not make any distinction between channels carrying and channels elaborating the information; instead each one is involved in both processes. One does not claim that the brain has any a priori knowledge of its internal constraints, but only that some particular neural circuitries have been selected, among all possible ones, which are able to detect contextual features relevant to animal survival.

The two schemes are not, however, mutually exclusive; in fact, as will be shown later, starting from an organization of kind b) one is led to a need for computation as in a). The difference between the schemes lies rather in the fact that the first one is concerned with definite structures with definite performances, while the second seems better suited to explain the behaviour and the evolution of a system whose structures and performances are dynamically defined in their reciprocal interaction. These considerations, perhaps rather obscure for now, will be taken up again in the Discussion.

In the following we will look preferably at data coming from physiology rather than from psychophysics, because we need data from identifiable anatomical structures in order to make a suitable analysis of some neural organizations.
2. Signal Transformation
It is well known that there is an at least approximate representation of the retina in the visual cortex (Minkowski, 1913). One assumes moreover the existence of two kinds of organization throughout the visual pathway: a vertical one, possibly related to the different layers in the same center or to the different centers of the same pathway, and a horizontal one. The first organization would serve to
increase the specificity of the response to a given stimulus, while the second one would be able to distinguish among different classes of stimuli. These kinds of organization were well established by Hubel and Wiesel (1962) in the visual cortex. But such representations are generally established by looking for the maximal evoked activity in the cortex during electrical stimulation of the retina, or by using a spot of light. In this way one misses the peripheral activity and does not take into account the possible interferences among different parts of a complex stimulus. That this last effect really exists is a matter of physiological and psychophysical experience; it will be enough to recall, as an example, the case of the so-called optical-geometrical illusions.

For these reasons it is better to represent the signal transformations by a functional (instead of a function) dependent on the activity of the whole retinal field. In this way the classical representations of retinal projections in the striate cortex become the points where, for a given stimulus pattern (spot of light), the value of the functional is maximal. A general scheme of convergence-divergence suggests representing such a functional as the convolution of the stimulus with another function (weighting function: w.f.) that depends on the neural level one is considering (Ratliff, 1965; Harth and Pertile, 1972; Blakemore and others, 1970). One assumes complete homogeneity among connections and the absence of reciprocal interactions among neurons of the same level.
3. Unbounded Layers
Let us characterize the response $f_k(x)$ $(k = 1, 2, \ldots, n)$ of n successive layers of neurons to a one-dimensional spatial stimulation $f_0(x)$ as given by the following integrals, when they exist¹ (Fig. 1):

$$
\begin{aligned}
f_1(x) &= \int_{-\infty}^{+\infty} f_0(p + x)\, w_1(p)\, dp = \int_{-\infty}^{+\infty} f_0(p)\, w_1(p - x)\, dp \\
f_2(x) &= \int_{-\infty}^{+\infty} f_1(p + x)\, w_2(p)\, dp = \int_{-\infty}^{+\infty} f_1(p)\, w_2(p - x)\, dp \\
&\;\;\vdots \\
f_n(x) &= \int_{-\infty}^{+\infty} f_{n-1}(p + x)\, w_n(p)\, dp = \int_{-\infty}^{+\infty} f_{n-1}(p)\, w_n(p - x)\, dp ,
\end{aligned}
\tag{1}
$$

where $w_k(x)$ is the w.f. between layers $k - 1$ and $k$. All functions $f_k$ and $w_k$ are referred to a Cartesian axis with the same origin.

¹ We assume here that both $f_k(x)$ and $w_k(x)$ belong to $L^1$ and $L^2$ (although weaker conditions could be used), in order to make use of the Fourier transforms without any problem.
Fig. 1. Unbounded layers of neurons (schematic: the stimulus $f_0$ is mapped through the w.f.'s $w_1, \ldots, w_n$ onto the layers $f_1, \ldots, f_n$)
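The two forms of the layer response can be compared numerically. The following is a minimal sketch, not part of the original model: the sampling grid, the bar-shaped stimulus and the parameters of both weighting functions are illustrative assumptions. It discretises the cross-correlation form of Eq. (1) and the corresponding convolution form, and measures how far apart they are for an even and for a non-even w.f.

```python
import numpy as np

# Illustrative sampling grid (an assumption): an odd number of points centred
# on 0, so that array index and spatial offset line up in 'same' mode.
dx = 0.05
x = np.arange(-200, 201) * dx

def layer_correlation(f_prev, w):
    """Eq. (1): f_k(x) = integral of f_{k-1}(p + x) * w_k(p) dp."""
    return np.correlate(f_prev, w, mode="same") * dx

def layer_convolution(f_prev, w):
    """Convolution form: f_k(x) = integral of f_{k-1}(p) * w_k(x - p) dp."""
    return np.convolve(f_prev, w, mode="same") * dx

# Hypothetical stimulus: an off-centre bar of light.
f0 = np.where(np.abs(x - 1.0) < 0.5, 1.0, 0.0)

# Even w.f.: a double gaussian (centre minus surround), arbitrary parameters.
w_even = np.exp(-x**2 / 0.5) - 0.4 * np.exp(-x**2 / 2.0)
# A deliberately non-even w.f., for contrast.
w_skew = np.exp(-(x - 0.7) ** 2 / 0.5)

# Largest pointwise difference between the two forms.
gap_even = np.max(np.abs(layer_correlation(f0, w_even) - layer_convolution(f0, w_even)))
gap_skew = np.max(np.abs(layer_correlation(f0, w_skew) - layer_convolution(f0, w_skew)))
```

Under these assumptions `gap_even` is at rounding-error level while `gap_skew` is not, which is why the rewriting as a convolution requires even w.f.'s.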
When the w.f.'s are even functions the above integrals may be rewritten as convolutions:
$$
\begin{aligned}
f_1(x) &= \int_{-\infty}^{+\infty} f_0(p)\, w_{(1)}(x - p)\, dp \\
f_2(x) &= \int_{-\infty}^{+\infty} f_1(p)\, w_2(x - p)\, dp = \int_{-\infty}^{+\infty} f_0(p)\, w_{(2)}(x - p)\, dp \\
&\;\;\vdots \\
f_n(x) &= \int_{-\infty}^{+\infty} f_{n-1}(p)\, w_n(x - p)\, dp = \int_{-\infty}^{+\infty} f_0(p)\, w_{(n)}(x - p)\, dp ,
\end{aligned}
\tag{2}
$$

where

$$
\begin{aligned}
w_{(1)}(x - p) &= w_1(x - p) \\
w_{(2)}(x - p) &= \int_{-\infty}^{+\infty} w_{(1)}(q - p)\, w_2(x - q)\, dq \\
&\;\;\vdots \\
w_{(n)}(x - p) &= \int_{-\infty}^{+\infty} w_{(n-1)}(q - p)\, w_n(x - q)\, dq
\end{aligned}
\tag{3}
$$

are obtained from the repeated application of the convolutions (2), taking into account the fact that the order of integration can be inverted, and by the use of the convolution theorem for Fourier transforms. So we see that the successive convolutions through n different layers are equivalent to a single convolution whose compounded w.f. $w_{(n)}$ is given by (3).

Harth and Pertile (1972) computed $w_{(n)}$ for different values of n when all the w.f.'s are equal and are given by a double gaussian, i.e. a function expressed by

$$
A e^{-\alpha_1 (x - p)^2} - B e^{-\alpha_2 (x - p)^2},
$$

where $A$, $B$, $\alpha_1$ and $\alpha_2$ are parameters. They found that the repeated mapping gave rise to a compounded w.f. with new maxima and minima in addition to those of the original one. On the other hand, when one uses a simple gaussian instead of a double one, i.e. one does not take into account the lateral effects, the compounded w.f. remains a simple gaussian whose variance increases with n. These facts suggest that the mapping will improve the resolution under repeated applications, provided that certain conditions are fulfilled, such as the presence of a lateral effect of opposite sign; this is the case, for example, when there is a center-surround
organization of the receptive field. In this connection it would be of interest to investigate experimentally whether an increase in background luminance, whose effect at the ganglion cell level consists solely of an increase in the surround mechanism (Maffei and others, 1970), may also increase the ability to discriminate stimuli. It is interesting to observe that, on the grounds of purely qualitative speculations, Barlow and Levick (1965) reached similar conclusions about the importance of lateral inhibition and repeated mapping for the selectivity of the response. The idea of repeated mapping has moreover some genetic implications: it suggests that the repetition of a simple w.f. such as a double gaussian, which in turn could be related to the anatomy of the dendritic field of cells (Leibovic, 1972) and is easily specified by the genetic code, is able to increase the ability to discriminate signals.

Another case in which the use of convolutions may be of help concerns the contrast sensitivity to a periodic stimulus. Enroth-Cugell and Robson (1966) showed that the contrast sensitivity of cat ganglion cells is proportional to the Fourier transform of their w.f., when this is an even function: they found a theoretical curve that fitted the experimental one well. Maffei and Fiorentini (1973) measured the contrast sensitivity at different levels of the cat visual system, namely in the retina, the lateral geniculate nucleus and the striate cortex. Starting from Fig. 1 of their work and assuming that the w.f.'s are the same at every level², by a simple use of the convolution theorem for Fourier transforms (Appendix) we obtained the curves of Fig. 2 of our paper, which are simply the curve corresponding to n = 1 raised to the powers 2 and 3. In experimental situations one must take into account the presence of different ganglion cells (X or Y), whose receptive fields also have different dimensions.
But when one looks at the same type of ganglion cell at different stations³, one finds that the two main results shown in Fig. 2, i.e. the fact that the curves become narrower while maintaining the same point of maximum, correspond to the experimental data of Maffei and Fiorentini for the LGN and the striate cortex (simple cells).

² This is, of course, a very strong assumption, but it is not necessary. The only thing we need in order to draw narrower curves for the LGN and striate cortex levels (see below) is that the Fourier transforms of the different w.f.'s have a common point of maximum (Appendix). This condition is fulfilled if the w.f.'s are expressed by double gaussians for many different parameters.

³ It has been shown (Cleland and others, 1971) that the pathways carrying sustained and transient information from the retina remain essentially separate through the LGN to the visual cortex.
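The compounding of weighting functions in Eq. (3) can be explored with a short numerical sketch. It is not a reproduction of the computation by Harth and Pertile: the grid, the unit-variance gaussian and the double-gaussian parameters (A = 1, α1 = 1/2, B = 1/2, α2 = 1/8) are assumptions chosen for illustration, and the helper names are hypothetical.

```python
import numpy as np

dx = 0.05
x = np.arange(-600, 601) * dx          # grid on [-30, 30], wide enough for the tails

def compound_wf(w, n):
    """Return [w_(1), ..., w_(n)] of Eq. (3) for n identical layers with w.f. w."""
    stages = [w]
    for _ in range(n - 1):
        stages.append(np.convolve(stages[-1], w, mode="same") * dx)
    return stages

def variance(w):
    """Second moment about the origin of a non-negative, even w.f."""
    return np.sum(x**2 * w) / np.sum(w)

def count_extrema(w, rel_tol=1e-6):
    """Count maxima and minima, ignoring tails below rel_tol of the peak magnitude."""
    kept = w[np.abs(w) > rel_tol * np.max(np.abs(w))]
    slope = np.sign(np.diff(kept))
    slope = slope[slope != 0]
    return int(np.sum(slope[1:] != slope[:-1]))

# Simple gaussian with unit variance (no lateral effect).
gauss = np.exp(-x**2 / 2.0)
gauss_variances = [variance(w) for w in compound_wf(gauss, 3)]

# Double gaussian with the hypothetical parameters given in the lead-in.
dog = np.exp(-x**2 / 2.0) - 0.5 * np.exp(-x**2 / 8.0)
dog_extrema = [count_extrema(w) for w in compound_wf(dog, 3)]
```

With these choices the entries of `gauss_variances` grow in proportion to n, while the number of extrema in `dog_extrema` increases from one layer to the next, in line with the two behaviours described in the text.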
Fig. 2. n = 1: response of retinal ganglion cells to a sinusoidal grating, as a function of spatial frequency (from Maffei and Fiorentini, 1973); n = 2 and n = 3: computed responses of LGN cells and simple cortical cells, obtained from the curve corresponding to n = 1 raised to the powers 2 and 3. The responses are expressed as a percentage of their maximal value; the spatial frequency (horizontal axis, 0.1 to 5) is expressed in cycles/degree
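The construction of the n = 2 and n = 3 curves can be sketched as follows. The measured n = 1 curve is not available here, so the sketch substitutes, as an explicit assumption, the analytic Fourier transform of a double gaussian with arbitrary parameters; only the operation of raising the normalized curve to the n-th power follows the text.

```python
import numpy as np

def dog_transform(nu, A, alpha1, B, alpha2):
    """Cosine transform of A*exp(-alpha1*x^2) - B*exp(-alpha2*x^2) at frequency nu."""
    return (A * np.sqrt(np.pi / alpha1) * np.exp(-(np.pi * nu) ** 2 / alpha1)
            - B * np.sqrt(np.pi / alpha2) * np.exp(-(np.pi * nu) ** 2 / alpha2))

nu = np.linspace(0.01, 5.0, 2000)       # spatial frequency in cycles/degree
# Hypothetical parameters, chosen so that the transform is positive on the grid.
W = dog_transform(nu, A=1.0, alpha1=8.0, B=0.2, alpha2=1.0)

def normalized_response(W, n):
    """Response after n identical layers, as a fraction of its maximum."""
    return (W / W.max()) ** n

curves = {n: normalized_response(W, n) for n in (1, 2, 3)}
peak_freq = {n: nu[np.argmax(c)] for n, c in curves.items()}

# Width of each curve at half of its maximum, in cycles/degree.
step = nu[1] - nu[0]
half_width = {n: np.count_nonzero(c >= 0.5) * step for n, c in curves.items()}
```

In this sketch `peak_freq` is the same for every n while `half_width` shrinks as n grows, the two qualitative features attributed to Fig. 2.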
4. The Problem of Signal Detection
Looking at the different stations of the visual pathway we always find, besides less specific neurons, others showing a particular selectivity for a given class of stimuli. These neurons are often called signal detectors. There are many different classifications of such cells in the physiological literature, which strongly depend on the kinds of animals observed and on the experimental conditions. But there is general agreement that, going from more peripheral to more central structures, there is a progressive increase in the selectivity of responses and, at the same time, a decreasing dependence on the actual position of the stimulus in the receptive field. So let us consider two limiting classes of signal detectors, both showing a certain specificity for a given stimulus or a given class of stimuli, but differing in the fact that the activity of the detectors of the first class also depends on the position of the signal in the receptive field. Some ganglion cells, for example (Maturana and others, 1960), may be considered typical elements of the first class, while complex and hypercomplex cells in the striate cortex (Hubel and Wiesel, 1962, 1968) would belong to the second one. We now ask whether the concept of weighting functions, as explained before, could satisfy the requirements of these two classes of detectors. The answer is affirmative for the first case, as we have seen
above, while in order to explain the behaviour of the second class of detectors we have to consider a different model.

5. A Model of a Signal Detector
Let us assume we have two layers of neurons separated by a given w.f. $w$. The stimulus $f_0$ vanishes⁴ outside a certain interval $(-\lambda, +\lambda)$, where $\lambda$ is a positive constant. It will be:
$$
f_1(x) = \int_{-\lambda}^{+\lambda} f_0(p)\, w(p - x)\, dp .
$$

As $w(p)$ is an even function, one can immediately see that $w(p - x)$ is symmetric with respect to both variables p and x, i.e. $w(p - x) = w(x - p)$. This fact enables us to make use of the theory of symmetric kernels as it has been developed to solve linear integral equations. If one takes a w.f. $w(p - x)$ such that one of its eigenconstants (the values $\rho$ for which $f(x) = \rho \int_{-\lambda}^{+\lambda} f(p)\, w(p - x)\, dp$ has a non-trivial solution), which are all real (Tricomi, 1957), is equal to 1, it becomes possible to build a detector for the eigenfunctions belonging to 1. In fact, let us assume we have a neuron able to compare the stimulus $f_0$ with the output $f_1$ for $|x| \le \lambda$, and let us assume furthermore that this neuron fires at its maximum rate when and only when both functions are equal; the neuron will behave as a complex or a hypercomplex cell, i.e. it will be selective for a given class of stimuli (eigenfunctions belonging to 1), and at the same time the rate of firing will not change with the position of the stimulus, because when this is shifted from position 1 to position 2 (Fig. 3) one may repeat the same reasoning as above, but now with respect to the reference O', the center of the interval of definition of the translated function.
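The comparison mechanism described above can be illustrated with a discretised sketch. Everything specific in it is an assumption made for illustration: the gaussian w.f., the grid, the rescaling of the kernel so that its largest eigenvalue on the window equals 1, and the particular response function `1 / (1 + mismatch)`, which is only one possible way of expressing "maximal firing when f0 equals f1".

```python
import numpy as np

dx = 0.01
lam = 0.5                                  # half-width of the stimulus support (hypothetical)
x = np.arange(-400, 401) * dx              # one-dimensional "retina" on [-4, 4]

def window(centre):
    """Indices of the comparison interval |x - centre| <= lam."""
    return np.flatnonzero(np.abs(x - centre) <= lam + dx / 2)

def kernel_matrix(w, xs):
    """K[i, j] = w(p_j - x_i) * dx, the discretised symmetric kernel."""
    return w(xs[None, :] - xs[:, None]) * dx

def gaussian_wf(u):
    """Hypothetical even w.f.; any symmetric kernel could be used instead."""
    return np.exp(-u**2 / (2 * 0.3**2))

# Rescale the w.f. so that its largest eigenvalue on the window is 1; the
# matching eigenvector is the pattern the detector is tuned to.
vals, vecs = np.linalg.eigh(kernel_matrix(gaussian_wf, x[window(0.0)]))
preferred, other = vecs[:, -1], vecs[:, -2]

def tuned_wf(u):
    return gaussian_wf(u) / vals[-1]

def place(pattern, centre):
    """Stimulus f0 equal to `pattern` on the window around `centre`, zero elsewhere."""
    f0 = np.zeros_like(x)
    f0[window(centre)] = pattern
    return f0

def detector_response(f0, centre):
    """Equals 1 when f1 matches f0 on the window, and is smaller otherwise."""
    idx = window(centre)
    f1 = kernel_matrix(tuned_wf, x[idx]) @ f0[idx]
    mismatch = np.linalg.norm(f0[idx] - f1) / np.linalg.norm(f0[idx])
    return 1.0 / (1.0 + mismatch)

r_centre = detector_response(place(preferred, 0.0), 0.0)
r_shifted = detector_response(place(preferred, 1.5), 1.5)
r_other = detector_response(place(other, 0.0), 0.0)
r_added = detector_response(place(preferred + other, 0.0), 0.0)
```

In this sketch the response to the preferred pattern is the same at both positions, because the comparison interval moves with the stimulus, whereas a different pattern, or the preferred pattern with a second one superimposed, gives a lower response.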
One may also suggest a way to test the model: by bringing a new stimulus into the receptive field of a unit already responding maximally, one would expect a lowering of its activity.

Let us end with an example. If we take a w.f. $w(p - x)$ given by

$$
e^{x - p} + e^{p - x} = 2\cosh(p - x),
\tag{4}
$$

we see that this particular kernel can also be written as

$$
X_1(x)\, Y_1(p) + X_2(x)\, Y_2(p),
$$

where $X_1(x) = e^{x}$, $Y_1(p) = e^{-p}$, $X_2(x) = e^{-x}$, $Y_2(p) = e^{p}$. In this case (Tricomi, 1957) the eigenconstants $\rho$ of the kernel are the solutions of the equation

$$
\begin{vmatrix}
1 - \rho a_{1,1} & -\rho a_{1,2} \\
-\rho a_{2,1} & 1 - \rho a_{2,2}
\end{vmatrix} = 0,
\tag{5}
$$

where

$$
a_{h,k} = \int_{-\lambda}^{+\lambda} X_k(x)\, Y_h(x)\, dx \qquad (h, k = 1, 2).
\tag{6}
$$

From (5) and (6) we obtain

$$
\rho^2 \left(4\lambda^2 - \sinh^2 2\lambda\right) - 4\lambda\rho + 1 = 0.
\tag{7}
$$

Let us call $\rho_1$ and $\rho_2$ the (real) roots of (7); we find

$$
\rho_1 = \frac{1}{2\lambda - \sinh 2\lambda}, \qquad \rho_2 = \frac{1}{2\lambda + \sinh 2\lambda}.
$$

It is easy to see that $\rho_1$ is always negative; instead we can find a positive value of $\lambda$, $\lambda_0$, such that $\sinh 2\lambda_0 = 1 - 2\lambda_0$, so that $\rho_2$ will be equal to 1. The eigenfunctions corresponding to $\rho = 1$ will be proportional to $X_1(x) + X_2(x)$, as it is not difficult to prove using the theory of integral equations. So we find that all functions given by $K(e^{x} + e^{-x})$, where K is any real constant, are eigenfunctions of (4) belonging to 1 in the interval $(-\lambda_0, +\lambda_0)$, as may be easily verified.
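The verification mentioned at the end of the example can be carried out numerically. The sketch below is an illustration under stated choices: bisection is used to locate the half-width at which the second eigenconstant equals 1, and a 20-point Gauss-Legendre rule is used for the integral; neither method is taken from the paper, and the function names are hypothetical.

```python
import math
import numpy as np

def root_condition(lam):
    """Zero when sinh(2*lam) = 1 - 2*lam, i.e. when the second eigenconstant is 1."""
    return math.sinh(2 * lam) + 2 * lam - 1

def bisect(func, lo, hi, iterations=100):
    """Plain bisection; assumes func(lo) < 0 < func(hi)."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if func(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

lam0 = bisect(root_condition, 0.0, 1.0)
rho1 = 1.0 / (2 * lam0 - math.sinh(2 * lam0))
rho2 = 1.0 / (2 * lam0 + math.sinh(2 * lam0))

# Gauss-Legendre nodes and weights mapped from [-1, 1] to [-lam0, lam0].
nodes, weights = np.polynomial.legendre.leggauss(20)
p, dp = lam0 * nodes, lam0 * weights

def candidate(t):
    """Candidate eigenfunction e^t + e^(-t), i.e. K = 1."""
    return np.exp(t) + np.exp(-t)

def transform(f0, xs):
    """f1(x) = integral over (-lam0, lam0) of f0(p) * 2*cosh(p - x) dp."""
    return np.array([np.sum(f0(p) * 2 * np.cosh(p - xi) * dp) for xi in xs])

xs = np.linspace(-lam0, lam0, 11)
max_gap = np.max(np.abs(transform(candidate, xs) - candidate(xs)))
```

The half-width found this way is close to 0.245, `rho1` is negative, `rho2` equals 1 to rounding error, and `max_gap` shows that the transformed signal coincides with the stimulus on the interval.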
Fig. 3. Position (1): $f_0$ vanishes outside an interval of width $2\lambda$ and center O; $f_1$ is the transformed signal; position (2): $f_0$ has been translated in the direction of the arrow; the reference O has been shifted to O'. The areas marked with //// and \\\\ indicate the two intervals where one must compare $f_0$ and $f_1$, in cases (1) and (2) respectively

⁴ If $f_0$ is the stimulus pattern, it will be assumed to vanish outside a certain interval; if it is already the result of the convolutions given by (2), it may be assumed arbitrarily small outside a sufficiently wide interval.

6. Discussion
The signal transformation in the visual pathway must deal with the problem of invariants, or universals, to use the expression of Pitts and McCulloch (1947). In other words, one must face the fact that the perception of a given pattern is to a great extent independent of the physical characteristics of the stimulus, such as the light distribution or the geometrical position with respect to a given reference in the retina. This fact, well known from psychophysics, has been shown to hold also in physiology; as we have seen above, the response of complex and hypercomplex cells is to a great extent independent of the actual position of the stimulus in the receptive field.

In order to explain some invariances in brain pattern recognition, Pitts and McCulloch (1947) proposed a model in which they assumed, as a working hypothesis, that the brain was able to average over a group of transformations of the actual stimulation; furthermore, they indicated some anatomical structures as the possible centers where these operations were performed. Their approach is faulty not only because of the lack of any experimental evidence for the postulated mechanism and the enormous number of neurons required to process the signal, but also because it is still a computational scheme of type a) seen above, i.e. it represents the way we would solve the problem on the grounds of our geometrical and mathematical knowledge.

The ideas we have tried to develop in this paper are highly speculative, but their main implication could be the suggestion that the brain's way of working may be based upon the interplay of two different mechanisms: a feed-forward system, represented by the repeated mapping of the w.f.'s, and a feed-back one, consisting of the comparison among the activities of different layers. Using the terminology adopted in the Introduction, we will call the first way of acting "analogical" and the second "digital". Both mechanisms work together at every level, but the first one is predominant at lower stations, the second at higher levels. In these last stations the computational effort of the feed-back mechanisms may be sustained by some of the numerous neural loops that have been demonstrated in the striate cortex (Szentágothai, 1972). The importance of a feed-back mechanism for explaining the performances of complex and hypercomplex cells has already been stressed by Müller and Taylor (1973) on the grounds of the results of a computer simulation.
The correlation and the final adjustment between the two systems will be related to the Darwinian fitness of a certain solution, together with the constraints inherent in the neural structures. In this context we see that the perceptual world, with its presupposed features of invariance and necessity, will be the result of an operation actively performed by the organism rather than an a priori datum. As has been shown by psychologists (it will be enough to recall Piaget's works), even such an apparently elementary notion as length becomes consolidated only some years after birth. Every neural scheme must take this fact into account, but for the moment there are no clear descriptions of such changes in neural structures⁵. For these reasons we think that the elementary geometrical features really correspond to a much more sophisticated and deeper level of elaboration than can be justified on the grounds of our present knowledge of neural machinery. This is a good reason, we believe, to avoid relating data coming from psychophysics to those coming from neurophysiology. On the other hand, Maturana and others (1960) explained the activity of ganglion cells in terms of the "stimulus contextual meaning" rather than as an on-off response to a spot of light. So we may believe that, at least to a certain extent, models based on our present knowledge of neurophysiology could be of help in a theory of pattern recognition. In particular, the model we have developed may give some suggestions about a way to build an analogical machine able to recognize patterns independently of their position. To be useful the theory should be extended to the two-dimensional case, but this probably does not constitute a great difficulty. We remark here that the width of the classes into which signals are classified depends on the accuracy of the comparison between input and output.
⁵ Descriptions of changes in the connectivities among neurons during the process of maturation are beginning to appear in the literature (see, for example, Hubel and Wiesel, 1970; Scheibel and Scheibel, 1973), but they are not yet suitable for building any particular model.

Appendix
As has been shown by Enroth-Cugell and Robson (1966), if we consider a sinusoidal grating pattern moving in a direction perpendicular to the grating bars, the contrast sensitivity S of ganglion cells (i.e. the maximal amplitude in the modulation of their response) to the stimulus will be proportional to the Fourier transform of their w.f., when this is an even function. So we can write:

$$
S \propto \int_{-\infty}^{+\infty} w(q) \cos(2\pi\nu q)\, dq = W(2\pi\nu),
$$

where $\nu$ is the spatial frequency of the stimulus and $W$ is the Fourier transform of $w$. In the case of n repeated applications of the same w.f. $w$, it will be:

$$
S \propto \int_{-\infty}^{+\infty} w_{(n)}(q) \cos(2\pi\nu q)\, dq,
$$

and taking into account (3) and the convolution theorem for Fourier transforms we obtain:

$$
S \propto W^{n}(2\pi\nu).
$$

Fig. 2 has been obtained from this last relation for n = 2, 3. In the case where the w.f.'s are not all equal, we will have:

$$
S \propto W_1(2\pi\nu)\, W_2(2\pi\nu) \cdots W_n(2\pi\nu),
$$

where $W_1, W_2, \ldots, W_n$ are the Fourier transforms of $w_1, w_2, \ldots, w_n$. As we can see, the only thing we need in order to draw curves that become narrower (around the point of maximum) with the increase in n is that the Fourier transforms have a common point of maximum (see footnote 2).
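The narrowing argument can be spelled out in one step. The only assumption added here, which is not stated explicitly in the text, is that each transform $W_k$ is non-negative over the frequencies considered and attains its maximum at the common frequency $\nu_0$:

```latex
s_n(\nu) \;=\; \frac{S_n(\nu)}{S_n(\nu_0)}
        \;=\; \prod_{k=1}^{n} \frac{W_k(2\pi\nu)}{W_k(2\pi\nu_0)},
\qquad
0 \le \frac{W_k(2\pi\nu)}{W_k(2\pi\nu_0)} \le 1
\;\;\Longrightarrow\;\;
s_n(\nu) \le s_{n-1}(\nu), \quad s_n(\nu_0) = 1 .
```

Each additional layer multiplies the normalized curve by a factor that is at most 1 and equals 1 at $\nu_0$, so the curve can only shrink away from the maximum while the maximum itself stays where it is.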
Acknowledgements. I want to thank Professor K. N. Leibovic for his continuous encouragement, his most helpful suggestions and his comments on the manuscript. Thanks also to my friend Silvano Colombano for reading the manuscript. Most of the work was carried out during a stay at the Center for Theoretical Biology at Buffalo under a NATO Postdoctoral fellowship. The use of the services of the Center was sponsored by N.A.S.A. Grant NCR 33-015-002.
References
Barlow, H. B., Levick, W. R.: The mechanism of directionally selective units in rabbit's retina. J. Physiol. (Lond.) 178, 477-504 (1965)
Blakemore, C., Carpenter, R. H. S., Georgeson, M. A.: Lateral inhibition between orientation detectors in the human visual system. Nature (Lond.) 228, 37-39 (1970)
Cleland, B. G., Dubin, M. W., Levick, W. R.: Sustained and transient neurons in the cat's retina and lateral geniculate nucleus. J. Physiol. (Lond.) 217, 473-496 (1971)
Enroth-Cugell, Chr., Robson, J. G.: The contrast sensitivity of retinal ganglion cells of the cat. J. Physiol. (Lond.) 187, 517 (1966)
Harth, E., Pertile, G.: The role of inhibition and adaptation in sensory information processing. Kybernetik 10, 32-37 (1972)
Hubel, D. H., Wiesel, T. N.: Receptive fields, binocular interaction, and functional architecture in the cat's visual cortex. J. Physiol. (Lond.) 160, 106-154 (1962)
Hubel, D. H., Wiesel, T. N.: Receptive fields and functional architecture of monkey striate cortex. J. Physiol. (Lond.) 195, 215-243 (1968)
Hubel, D. H., Wiesel, T. N.: The period of susceptibility to the physiological effects of unilateral eye closure in kittens. J. Physiol. (Lond.) 206, 419-436 (1970)
Leibovic, K. N.: A model for information processing with reference to vision. J. theor. Biol. 11, 112-130 (1966)
Leibovic, K. N.: Some problems of information processing and models of the visual pathway. J. theor. Biol. 22, 62-79 (1969)
Leibovic, K. N.: Nervous system theory. New York-London: Academic Press 1972
Maffei, L., Fiorentini, A., Cervetto, L.: Homeostasis in retinal receptive fields. J. Neurophysiol. 34, 579-587 (1970)
Maffei, L., Fiorentini, A.: The visual cortex as a spatial frequency analyser. Vision Res. 13, 1255-1267 (1973)
Maturana, H. R., Lettvin, J. Y., McCulloch, W. S., Pitts, W. H.: Anatomy and physiology of vision in the frog. J. gen. Physiol. 43, Suppl. 2, 129-175 (1960)
Minkowski: Experimentelle Untersuchungen über die Beziehungen der Großhirnrinde und der Netzhaut zu den primären optischen Zentren besonders zum Corpus geniculatum externum. Arb. hirnanat. Inst. Zürich 7, 259-362 (1913)
Müller, F. J., Taylor, W. K.: A comparative study of electronic and neural networks involved in pattern recognition. J. theor. Biol. 41, 97-118 (1973)
Pitts, W., McCulloch, W. S.: How we know universals. The perception of auditory and visual forms. Bull. math. Biophys. 9, 127-147 (1947)
Ratliff, F.: Mach Bands: Quantitative studies on Neural Networks in the Retina. San Francisco: Holden-Day 1965
Scheibel, M. E., Scheibel, A. B.: Dendrite bundles as sites for central programs: an hypothesis. Int. J. Neurosci. 6, 195-202 (1973)
Szentágothai, J.: Handbook of Sensory Physiology, Vol. VII/3. Berlin-Heidelberg-New York: Springer 1972
Tricomi, F. G.: Integral Equations. New York: Interscience Publ. 1957

Dr. Francesco Andrietti
Istituto di Fisiologia Generale
Via Mangiagalli, 32
I-20133 Milano, Italy