Cogn Comput (2009) 1:4–16 DOI 10.1007/s12559-008-9001-8
Cognitive Computation

J. G. Taylor
Published online: 23 January 2009
© Springer Science+Business Media, LLC 2009
Abstract We present a proposal as to how to create a Cognitive Machine. We start by raising a set of basic questions relevant to the creation of such a machine. These include the nature of human cognition and how it might be modelled, whether consciousness is crucial for cognition, and how an autonomous cognitive agent might impute the internal mental state of another such agent. In the following sections we propose a set of possible answers to these questions. The paper finishes with conclusions as to the most viable and interesting directions to be pursued to create cognitive machines.

Keywords Autonomous agents · Attention · Emotion · Consciousness · Mental simulation · Theory of mind
Introduction

We are living in ever more stimulating times. The processes of science are leading to an increasing understanding of ourselves and the world around us. At the same time such understanding opens us up to ever more questions, especially as to how we, as humans, have such amazing mental powers as to have progressed this far. A crucial question is how our cognitive skills form part of our computational repertoire: how does cognition work? An answer to this question is basic to understanding ourselves, but it is also highly relevant to creating more powerful machines to lessen our load.
J. G. Taylor (✉)
Department of Mathematics, King's College London, Strand, London WC2R 2LS, UK
e-mail: [email protected]
Industry, commerce, robotics and many other areas are increasingly calling for the creation of cognitive machines. These are machines which will have 'cognitive' powers similar to our own. They will be able to 'think for themselves', reaching decisions on actions in a variety of ways, some similar to those we use. They should be flexible and adaptive, able to learn from their past and that of others around them. They may even be close to ourselves in some (although not necessarily all) ways.

It is such machines we want to create, for a variety of reasons. Some of us wish to develop our understanding of our own cognitive powers—to find out how they are created and fostered, and how they can go wrong through brain malfunction. Modelling the cognitive brain is an important step in developing such understanding. Others wish to provide humanity with robots able to 'think' cognitively so they can support us in our daily lives. Still others see the creation of a cognitive machine as an engineering challenge of the highest order. Some of us work on cognitive machines for all three reasons.

To achieve these aims, most of us believe that some level of guidance from our understanding of human cognitive powers will be an important component in helping us construct such machines. Other routes must be tried as well (machine learning, fuzzy logic, evolutionary algorithms, etc.) and these can also contribute strongly. However, we have to accept that as we develop ever more powerful and autonomous machines, the human guidance, especially as to how we create decent, non-threatening human beings by education in the family and at school, must be ever more strongly appealed to and used—implying the need for some emotional abilities to be included in such advanced machines. One feature in particular that must be developed is the ability in the machine to discern and empathise with
the mental state of others with which it is in interaction, both machines and humans.

At the same time, whilst the cognitive powers of humans are the most developed in the animal kingdom, it is also valuable to consider, in order to gain a further handle on human cognition, how animals can also possess cognitive powers, although at a lower level than humans. The powers possessed, for example, by the famous 'Betty the New Caledonian Crow' are worthy of careful research, since they provide a new window on reasoning and the neural mind.

We therefore see that the topic of cognitive machines is a very broad one, covering as it does animal intelligence, human intelligence and machine intelligence. These disciplines are to be used as guidance to create a machine that can think, reason, set up goals and work out how to attain them, be aware of its surroundings and what it is doing, and even be aware in the sense of being conscious, both of itself and of other conscious beings. This leads to a range of problems that should be addressed as part of the program of creating cognitive machines, including the questions:

1. What is human cognition in general, and how can it be modelled?
2. What are the powers of animal cognition as presently understood, and how can they be modelled?
3. How important is language in achieving a cognitive machine, and how might it be developed in such a machine?
4. What are the benchmark problems that should be able to be solved by a cognitive machine so as to allow it to be described as 'cognitive'?
5. Does a cognitive machine have to be built in hardware, or can it work solely in software?
6. How can hybridisation (in terms of fusing computational neuroscience and machine intelligence methods) help in developing truly cognitive machines?
7. Is consciousness crucial for cognition?
8. How are the internal mental states of others to be discerned by the cognitive machine?
In this article we develop brief answers to these questions, together with references to fuller solutions developed so far. In the next section, we consider the nature of human cognition and its modelling, and follow that in the section 'Animal cognition (question 2)' by a similar consideration of animal cognition. How language might be intrinsically involved in cognitive machines is then discussed in the following section. In the section 'Benchmarking problems (question 4)', we briefly consider the question of suitable benchmarks, and in the following section debate the issue of hardware versus software implementation. In the section 'Possible hybridisation (question 6)', we consider the possible gain arising from hybridising ideas from machine intelligence and computational neuroscience. We then, in
the section ‘The need for consciousness (question 7)’, discuss how consciousness might be involved in the overall cognitive computational system, as occurs in humans. As part of this we describe an approach to consciousness (through attention) which allows us to provide a functional analysis of consciousness and hence its place in the gallery of cognitive components. How the mental states of others (both human and autonomous machine) can be discerned is considered in the section ‘Discerning the mental states of others (question 8)’. In the final section, we describe some future avenues worth pursuing.
What is Human Cognition? (Question 1)

Cognition means many things to many people, but here it is taken necessarily to include the higher-level information processing stages that we know are carried out by the human brain: thinking, reasoning and eventually consciousness. Each of these can occur without necessarily any language, but they each require processing of a different order from that involved in perceiving external stimuli with suitable sensors (cameras, microphones or whatever) or performing motor actions with suitable effectors (wheels or grippers). There are various component processes needing to be included in the overall control system of the cognitive machine. Thus, in order to lift the neural processing up to the high level expected to be involved in cognition, it is necessary to filter out distracters (especially in a complex environment), using attention. The complexity need not only occur outside the machine or agent: as the size of the agent software increases to take account of a complex external environment, so the problem of distracters will increase accordingly.

Attention is a brain processing mechanism which is now well studied, both at the behavioural level, with such studies going back to the time of Aristotle, and at the brain level, with both single cell and brain imaging approaches having been used. Without attention much is missed by a cognitive system, at all levels of the animal kingdom. Indeed, to some, attention is the superior control system of the brain, and hence of the cognitive agent.

Cognition also needs memory powers, of both short-term and long-term form, again well studied in the brain. Since memory is based on an adaptive process in the connections between the nerve cells of the brain, the structural changes involved in long-term memory storage can be complex. There are also different forms of long-term memory, such as the division of long-term memory into an episodic form (involving the presence of the agent as a purposive component in past episodes) and a semantic form (the agent's presence is not
part of the memory). There is also procedural memory, in which motor skills are learnt by gradual repetition, to be compared with the one-off nature of episodic memory, for example.

Besides attention and the various forms of memory, internal motor control systems are needed for imagining the effects of motor actions (so leading to reasoning and more general thinking). Also, some form of emotional coding is essential to provide empathy to the machine, so that it is not a dangerous psychopath when let free in society (there are many cases studied by psychologists, psychiatrists and the legal profession in which psychopaths—with no understanding of the emotions of others—have committed most dreadful crimes [1]). These components, attention, memory, motor control systems and emotions, are basic to the construction of an autonomous agent; there are many others, but we will only take the present list for more detailed consideration here.

Attention is now understood as involving biasing of the position of the focus of attention, either top-down from some endogenous goal state set up in prefrontal cortex or bottom-up by a competitive 'breakthrough' of salient information, changing the attention focus to the most salient input stimulus. Such modification of the focus of attention can be encapsulated in an engineering control framework [2], bringing in various functional modules: in engineering control terms, the plant being controlled (posterior cortex), an inverse model controller generating a signal to move the focus of attention (in the parietal lobes), a goal site (in prefrontal cortex, for both exogenous and endogenous attention movement), an error monitor (for rapid error correction, most likely in cingulate cortex, as shown by measurements on error-related negativity), an attention copy signal or corollary discharge (expected in the parietal lobe), and so on. The difference between engineering control and attention control is that in the former an estimate of the state of the total plant is made to speed up and improve the accuracy of the control; in the latter it is only the attended state of the world that is of relevance (and only that is to be used in any predictive model to speed up the movement of the focus of attention).

The simplest neural architecture for attention control is shown in Fig. 1.

[Fig. 1 Ballistic attention control system: a goal module biases an attention signal creator, which acts on an input module]

The goal module sends a bias or guidance signal to the module functioning as an attention signal creator (or
inverse model controller). This latter produces a signal to change the focus of attention on the input module to the desired position. The input on the extreme right may be in any sensory modality in which attention control occurs. The feedback signal onto the input module can function as either a modulating (multiplicative) signal or an additive signal (or a combination of both) [3]. A similar combination of multiplicative and additive feedback could exist for the feedback guidance signal from the goal module.

The attention control model of Fig. 1 is termed ballistic since the target is set up at the beginning of the control process, and once the attention feedback signal has been generated there is no change in it; attention guidance continues until the focus of attention has been changed. This is independent of any errors that may have occurred in the creation of the attention movement signal or in its guidance, since there is no feedback during the movement of the attention focus. There is a direct analogy in this model to the aiming and firing of a gun: the bullet goes to wherever it was directed initially, with no compensation for possible error or change of target in the meantime.

A more sophisticated model of attention control is presented in Fig. 2. Two further modules have been added to the ballistic control model of Fig. 1: a buffer (working memory) module, acting as the attention-controlled access module for input stimulus representations, and an attention copy module. The former acts as a working memory for the attended stimulus representation, allowing such representations to be reported to the rest of the brain (so attaining awareness, regarded as the ability to be reported anywhere). The latter carries a copy of the attention movement signal, so as to help speed up access of the attended stimulus representation to the buffer memory module as well as to inhibit distracter representations in the input module from gaining such access.
[Fig. 2 Attention copy model of attention control: the ballistic system of Fig. 1 augmented with an attention copy module and a buffer memory module]
The copy module can also be used to generate an error signal to modify the attention movement signal if it is incorrect in any way. The bias from the goal module (dropped in Fig. 2 as compared to Fig. 1, although it should still be present) enters the attention movement signal generator module as a guidance signal on the left of that module. The control architecture of Fig. 2 is more sophisticated than the ballistic architecture of Fig. 1 in that access of the attended stimulus representation to the buffer working memory site can be speeded up, and any errors in the predicted effects of the attention movement signal, as compared to the goal signal, can be corrected. Such further development has been shown experimentally to occur in motor control in the brain [4]. Here, this more advanced form of control has been suggested as existing in attention [1, 2, 5].

It is possible to develop models of various cognitive processes in terms of this control model of attention. The special additional sites acting as buffer (short-term) working memory sites, holding for a few seconds the neural activity amplified by attention, have already been included in the architecture of Fig. 2. The buffered activity is expected to stand out from the surrounding distracters. In these terms, one of the fundamental processes of cognition—that of rehearsing attended neural activity on the relevant buffer—can be attained by setting up as a goal the condition that the buffer activity be preserved above a certain threshold level; if it drops below that level, then attention will be redeployed to the stimulus on the buffer (or at a lower level). This was achieved in [6] by use of the monitor (mentioned above as arising as part of the engineering control approach to attention); the decaying stimulus is then refreshed by refocusing attention onto it. A further level of cognition is that of manipulating an 'image' on a buffer site so that it becomes another desired image, such as being turned upside down or fused with another image. Such transformations can be achieved by setting up the top-down template so as to achieve the final stimulus configuration. Attention will then be directed to alter the input stimulus representation to the final desired goal stimulus representation on the buffer (and its related lower cortices). Such manipulation allows comparisons to be made between images which, for example, may be totally different figures or may only be rotated versions of each other.

Reasoning can be seen to require additions to the sensory attention system considered above. Besides sensory attention, there is also a parallel system of motor attention [7], based in the left hemisphere, in comparison to the sensory attention system in the right hemisphere. These two systems are apparently fused in sets of pairs of internal control models associated with motor control: an inverse model (IMC: generating an action to attain a desired goal
state from a present state) and a forward model (FM: predicting the new state caused by a given action on a present state); the states here are all taken as sensory. The reasoning process can then use these sets of FM/IMC pairs to determine which virtual sequences of actions would attain a given goal. This is a planning problem, taking place in the space of concepts. In addition, a long-term memory system is crucial to enable internal simulations of various possible actions to take place (on stimulus representations drawn from those held in long-term memory), corresponding to mental simulation in its general form [8].

We thus see that the major components of advanced cognition—thinking, looking ahead and planning—can all be accomplished, at least in principle, by means of the attention-based architecture of [2, 6, 7]. There is the need to develop such an attention control system by training the attention feedback control system (the inverse model controller or IMC for attention, generating a signal to change the focus of attention) as part of learning the representations of objects and space (in the visual case) and similar representations for attention control in other sensory modalities.

Finally, we note that a neural architecture for emotions, considered, for example, as arising by appraisal from an expected value system, can be developed by suitable thresholds applied to various levels of expected and actual reward [9]. This or similar approaches are strongly needed to prevent the production of psychopathic autonomous cognitive agents, which could be dangerous to those around them (just as poor upbringing needs to be avoided to prevent the similar production of human psychopaths).

From an evolutionary point of view, the perception-action cycle is to be regarded as the basic process for which brains allow increasing precision of motor response and capacity of stimulus representation. But brains have a much more crucial role in increasing the look-back and look-forward horizons, by addition of the various memory components and the attention and motor control internal models. These memory and control systems are to be combined with the emotion bias system to include the expected rewards of various stimuli in a range of contexts, so further biasing decisions. The ability to add these various components leads ultimately to our own cognitive brains. But many of the detailed computational mechanisms are still to be teased out. Animal intelligence may help us achieve that.
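To make the attention control framework above concrete, the following minimal sketch simulates a single step of the Fig. 1/Fig. 2 loop. It is an illustrative toy, not the published model: the module names are those of the figures, but the gain values, threshold and vector reduction of the 'modules' are assumptions.

```python
# Minimal sketch of the attention control loop of Figs. 1 and 2.
# All parameters (gains, amplitudes) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

n_sites = 8                               # input module: competing stimulus sites
salience = rng.uniform(0.2, 0.6, n_sites) # bottom-up salience of each site
goal_site = 5                             # goal module: endogenous target

def attention_signal(goal: int, n: int) -> np.ndarray:
    """Attention signal creator (inverse model controller): a top-down
    gain field peaked at the goal site."""
    gains = np.ones(n)
    gains[goal] = 3.0                     # amplify the attended site
    return gains

# Multiplicative (modulatory) feedback onto the input module [3].
attended = salience * attention_signal(goal_site, n_sites)

# Attention copy (Fig. 2): pre-excite the buffer entry for the expected
# winner and inhibit distracters, speeding access to the buffer.
is_goal = np.arange(n_sites) == goal_site
buffer_trace = attended + 0.5 * is_goal - 0.2 * ~is_goal

winner = int(np.argmax(buffer_trace))     # site gaining buffer access

# Monitor: compare the achieved focus with the goal, for error correction.
print(f"attended site={winner}, goal={goal_site}, error={winner != goal_site}")
```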
Animal Cognition (Question 2)

There are numerous observations of animals using reasoning to solve tasks. A well-known example is that of
Betty the Crow [10], who worked out how to use a bent piece of wire to extract a small basket containing food from the bottom of a transparent tube. Betty was even able to make a suitably bent wire from a straight one in several cases when no bent wire was present. Chimpanzees, for their part, are able to solve paradigms such as the '2 sticks' paradigm. In this, a chimpanzee is put in a situation of having a small stick within reaching distance which, however, is too short to retrieve a reward such as a grape outside its cage. There is also a longer stick further away outside the cage, which can only be reached by use of the shorter stick. The chimp is able to work out, apparently by reasoning (not trial and error learning), that it has first to use the shorter stick to draw the longer stick to it, and then use the longer one to obtain the food reward.

In addition to the apparatus of internal models already mentioned (including the functional models of attention described above, as well as relevant coupled IMC/FM pairs), there is a need for manipulation of the rewards of the various stimuli in order to determine useful sub-goals. We have suggested elsewhere [11] that altering the rewards expected from stimuli allows an animal to arrange the external stimuli in the order in which they are to be attained, as a set of sub-goals, so that the modified rewards carry a memory of the reverse order in which the sub-goals must sequentially be obtained. In this way, autonomous flexible planning occurs in the animal: the rewards act as a set of drives causing responses that attain the sequence of goals determined by the order of goal values. A minimal sketch of this reward-manipulation scheme is given at the end of this section.

There are further features of cognition displayed in infants [12], who are observed to detect novelty in various stimulus motions, as shown by longer looking times at a novel stimulus display than at one already observed. It is possible to include this level of cognition in a neural model by the introduction of a novelty detector, which directs attention to the novel stimulus until it becomes familiar [13].

The set of modules I have so far introduced (coupled IMC/FM pairs and reward lists that can be manipulated, plus a novelty detector system) needs further modules to enable effective planning. In particular, both motor and sensory attention are needed in order to keep down errors and reduce the computational complexity of the overall reasoning process. Together with the buffer site, the overall system allows efficient and flexible reasoning to be attained.
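The following sketch illustrates the reward-manipulation idea of [11] on the '2 sticks' paradigm. The task graph, the distractor option and the discount factor are illustrative assumptions; the point is only that back-propagated reward values let a greedy, drive-based agent execute the sub-goals in the required order.

```python
# Sub-goal ordering via reward manipulation (cf. [11]) on the '2 sticks'
# task; the graph, distractor and discount value are assumptions.
precedes = {                          # sub-goal -> the sub-goal it enables
    "get short stick": "get long stick",
    "get long stick": "take grape",
}
value = {"take grape": 1.0,           # primary reward
         "play with pebble": 0.05}    # low-value distractor activity

# Propagate reward backwards: each sub-goal inherits a discounted copy of
# the value of the goal it enables, so the modified values carry a memory
# of the (reverse) order in which sub-goals must be attained.
discount = 0.8
for sub in ["get long stick", "get short stick"]:
    value[sub] = discount * value[precedes[sub]]

# The drives: always pursue the highest-valued currently attainable goal.
attainable, plan = {"get short stick", "play with pebble"}, []
while attainable:
    best = max(attainable, key=value.get)
    plan.append(best)
    attainable.discard(best)
    if best in precedes:              # attaining a sub-goal unlocks the next
        attainable.add(precedes[best])

print(plan)  # ['get short stick', 'get long stick', 'take grape', 'play with pebble']
```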
Inclusion of Language Powers (Question 3)

It is a truism that language grants amazing powers to human beings. Without it, and the cumulative written records encoding the accumulated wisdom of cultures, it is
expected that far fewer civilising and technological advances would have occurred. The task of introducing language in a machine system is not insuperable, provided we accept that the system is able to learn continuously in the manner we ourselves do in growing from the first few months into adulthood. If such learning is allowed, then an initial vocabulary of say 500 words (the size of a basic vocabulary for some parts of the world, and more generally of infants at age 2 or so) can soon expand to tens of thousands of words. The nature of syntax, in terms of phrase structure analysis, can be seen as part of learning various time sequences of concepts. This can be achieved by means of recurrent networks (of which there are plenty in the prefrontal cortex and sub-cortical sites, especially the basal ganglia); these can be modelled at a range of levels of fidelity to the circuits in the prefrontal lobes.

We take the position that language can be learnt by attaching sensory feature codes to the associated words that the system hears, so giving proper grounding in the world (represented by sensory codes in the brain). This grounding can be extended to grounding action words (verbs) in the action codes in the brain. Such a possibility requires considerable work to achieve in software, although the principles would appear accessible, and the computational complexity is currently approaching the possible, using grid or Beowulf computing.

Given a linguistic system like that above, we can then ask how it might be used to achieve reasoning powers. Chunks of syllogistic reasoning processes would be expected to be learnt more by rote initially, so that they could then be used recurrently to allow for more powerful syllogistic reasoning. Mathematical reasoning would also depend on the building of suitable simple rules, in terms of the axioms of the mathematical system being used (such as the Peano postulates). It would then ultimately be possible to develop mathematical arguments of ever greater sophistication on the basis of these rules and postulates (as in the case of the solution of Fermat's last theorem).

One of the features learnt from studies of single cell activity in monkeys is that sequences of up to three actions can be learnt [14]. Thus monkeys were trained, on a cue, to make a particular sequence of actions, such as PUSH, PULL, TURN, using a moveable handle which always returned to its original position after each action. Various sequences of these actions were learnt by the monkeys (after several months of continued learning). It was possible to recognise that several types of neurons were involved in the replaying of these sequences by the trained monkeys: transition nodes that allowed one action to be changed into another, initiator nodes that were specific to a particular sequence, and of course dedicated nodes for the various components of a sequence.
Thus the transition node TRANS(PUSH, PULL) led to the transfer from a push action to one of pulling; the initiator node IN(PUSH, PULL, TURN) was active when the sequence PUSH → PULL → TURN was about to be repeated. These nodes were observed in a trained recurrent net (with an architecture similar to that of the basal ganglia) [15]. A sketch of such sequence replay in terms of initiator and transition nodes is given at the end of this section.

A basis for language learning (the LAD program, started in 1999 at KCL) was developed from these results, using the notion that the grounding of language is based on models of the external world in the brain, as pre-linguistic object and action maps [16]; others have more recently suggested that such grounding may be used to learn language in robots [17]. The ability to extend to longer sequences of words was shown to be possible in LAD, as occurs in infant language development beyond the two- or three-word stage. More recent work has shown how various more complex components of language, such as movements of words carrying their previous meaning to new positions, are possible in the LAD program. It thus appears that the LAD approach is well related to known linguistic features of word order, as well as to more general aspects of linguistic development.
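Here is a minimal sketch of sequence replay using initiator and transition nodes, abstracted from the monkey data of [14] and the recurrent-net model of [15]. The dictionary-based 'nodes' are an illustrative assumption standing in for trained recurrent units.

```python
# Toy replay of stored action sequences via initiator (IN) and
# transition (TRANS) nodes; the stored sequences are assumptions.
initiators = {
    "IN(PUSH,PULL,TURN)": ("PUSH", "PULL", "TURN"),
    "IN(TURN,PUSH,PULL)": ("TURN", "PUSH", "PULL"),
}

def transition_nodes(sequence):
    """Derive the TRANS(a, b) nodes chaining each action to its successor."""
    return {f"TRANS({a},{b})": (a, b) for a, b in zip(sequence, sequence[1:])}

def replay(initiator_name):
    """Replay a stored sequence: the initiator node fires the first action,
    then each transition node maps the current action to the next one."""
    sequence = initiators[initiator_name]
    trans = transition_nodes(sequence)
    current, produced = sequence[0], [sequence[0]]
    while True:
        nxt = next((b for a, b in trans.values() if a == current), None)
        if nxt is None:
            return produced
        produced.append(nxt)
        current = nxt

print(replay("IN(PUSH,PULL,TURN)"))   # ['PUSH', 'PULL', 'TURN']
print(replay("IN(TURN,PUSH,PULL)"))   # ['TURN', 'PUSH', 'PULL']
```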
Benchmarking Problems (Question 4)

It is necessary to have some problems that can act as a standard and thereby allow testing of the abilities of various cognitive systems. In the case of computational neuroscience, these problems will be those able to be carried out by animals and children of various ages, so that the level of progress of a cognitive machine can be tested against its competitors. As examples, it is appropriate to consider paradigms which test the various components of a cognitive system, such as the perception-action loop, the attention control system, the short- and long-term memory systems and the emotion bias system, as suggested so far. Each of these components has its own testing paradigms, as well as those testing combinations of two or more components at once.

Thus, in perception-action, it is natural to start with situations in which a cognitive machine has to move to, touch or pick up a given stimulus in its environment. Such a process will have required the machine to learn representations of the various stimuli in its environment, as well as the actions that can be taken on them. This can itself take a considerable amount of time, depending on the ability to learn concepts of stimuli, to develop sensitivity to their affordances, and to determine the digits most appropriate for interacting with them, as in grasping.
For the faculty of attention, there are several well-explored paradigms, such as the Posner benefit paradigm (determining the reaction-time benefit gained when a stimulus whose presence has to be signalled is attended, as compared to attention being directed elsewhere than the target stimulus) and the attentional blink (where a rapid serial visual presentation stream is used, and a deficit is found if there is a time delay of about 270 ms between different stimuli). There is also a whole range of target search tests. Furthermore, a mixture of attention and short-term memory can be tested by analysis of list learning in the presence of distracters. A sketch of a Posner-style benchmark harness is given at the end of this section.

For non-linguistic reasoning, there are numerous benchmarks of animal powers in [18], to which we refer the reader. The development of benchmark problems for linguistically trained autonomous machines can be considered in terms of spatial reasoning tasks or those of mathematical reasoning. The former can be seen as part of the 'models of the mind' approach to reasoning, whilst the latter involve more sophisticated conceptual spaces (also involving some form of mental model, although in more general concept spaces rather than two- or three-dimensional space).

It is interesting to consider test paradigms for emotion. There are few computational neuroscience models of the emotions per se on the market, especially ones that try to bridge the gap between the neural brain activations associated with a particular emotion and the experience elicited by that activity. A recent model has been developed in which the appraisal theory of emotional experience from psychology is proposed as arising from a threshold process involving expected reward values at certain times as compared with normal or actual reward values [9]. Such an approach allows the model's inclusion in a cognitive agent, biasing the activity of the agent towards avoidance of penalty and search for reward, but with inclusion of the nuances of context so as to allow various different emotions to arise in the agent according to the parameters involved. Thus emotional paradigms are now being brought into the set of testing paradigms for cognitive agents.
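The following sketch shows how such a Posner-benefit benchmark might be scored against an agent. The agent interface (attend/detect returning a latency in milliseconds) and all numbers are illustrative assumptions; a real benchmark would call into the agent's own sensors and effectors.

```python
# Toy Posner cueing benchmark: positive benefit means attending the cued
# location speeds detection. The agent here is a hypothetical stand-in.
import random

random.seed(1)

class ToyAgent:
    """Stand-in agent: detection is faster at the currently attended site."""
    def __init__(self):
        self.focus = None
    def attend(self, location):
        self.focus = location
    def detect(self, target_location):
        base = random.gauss(350.0, 15.0)            # unattended RT, ms (assumed)
        benefit = 60.0 if self.focus == target_location else 0.0
        return base - benefit

def posner_benefit(agent, trials=200, validity=0.8):
    """Mean RT(invalid cue) - mean RT(valid cue)."""
    valid_rts, invalid_rts = [], []
    for _ in range(trials):
        cue = random.choice(["left", "right"])
        agent.attend(cue)                           # top-down attention to cue
        target = cue if random.random() < validity else (
            "right" if cue == "left" else "left")
        rt = agent.detect(target)
        (valid_rts if target == cue else invalid_rts).append(rt)
    return sum(invalid_rts) / len(invalid_rts) - sum(valid_rts) / len(valid_rts)

print(f"Posner benefit: {posner_benefit(ToyAgent()):.1f} ms")
```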
Hardware Versus Software (Question 5)

There has always been debate between these two modes of implementation. Software is usually easier to implement quickly, although it is more difficult to incorporate embodiment into a software environment without careful interfacing. As an example of the embodied approach, see the results of the EC GNOSYS program, in which a cognitive robot was designed and created (http://www.cs.forth.gr/gnosys) [19]. The need for embodiment to be at the basis of any cognitive system has been argued strongly for
some time [20]. However, there are examples of people who have lost all peripheral feedback (through a viral infection) yet continue with unabated cognitive powers; such people need to attend closely even to walking around, otherwise they will fall down. Thus embodiment may not play a truly fundamental role, but it clearly plays an important role in survival and response development.

At the same time there is the question as to whether or not a machine with some level of consciousness could ever exist only in a software state. The analogy of a model of the weather is relevant here. A software model of weather patterns cannot be wet or dry itself, nor hot or cold. All that the model can do is make predictions about the numbers (wind speeds, rainfall levels, etc.) associated with these features in a particular region. But it cannot be raining in the model, nor can any of the other modes of action of the modelled weather act like those modes in real life. For consciousness it would seem that the same situation occurs: the cognitive machine would need to be implemented in hardware in order for the 'conscious experience' ever to arise in real time in the machine. This would somehow encapsulate what happens in the real world, where consciousness is based on activity levels of nerve cells. Clearly much more has to be developed on this: the importance of consciousness for cognition (to be discussed shortly), the details of hardware implementations of components (neurons, synapses, neuromodulators, etc.), overall processing architectures, and so on.
Possible Hybridisation (Question 6)

It is natural at this point to ask if we can gain by putting together the strengths of machine intelligence and computational neuroscience. At present levels of implementation, such hybridisation can clearly help if the desire is to proceed to higher level processes (thinking, consciousness, etc.) on the basis of the presently most powerful lower level representations of the surrounding environment. Thus object representations can be powerfully constructed by present methods of machine vision, although guidance from the hierarchy of the modules V1, V2, etc. in visual cortex, as well as their more detailed architecture, has proved of value in helping create adaptive temporal lobe representations of objects. In the GNOSYS project, we fused this hierarchical brain-guided approach (at the small scale) with a more global, coarse-scale approach using machine vision techniques [19]. The large-scale vision system was used to give rough co-ordinates of objects, and their character, directing the attention of the smaller scale hierarchical system to give a more precise identification. Such fusion can no doubt be applied to motor responses and other aspects of the information processing to be carried out by a
putative cognitive machine, for example involving other modalities, such as audition and touch.
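As a schematic of this coarse-to-fine fusion, the sketch below has a cheap large-scale detector propose rough object locations, with a finer 'hierarchical' classifier then attended to each proposal. Both components are stubs with assumed names and data; in GNOSYS the fine pass was a trained, brain-guided hierarchy rather than the lookup used here.

```python
# Toy coarse-to-fine vision fusion: coarse proposals direct the attention
# of a finer classifier. Scene data and function names are assumptions.
from typing import List, Tuple

Scene = List[Tuple[str, Tuple[int, int, int, int]]]   # (true label, bbox)

def coarse_detector(scene: Scene):
    """Large-scale pass: cheap proposals with rough coordinates and a
    provisional character for each object (stubbed as the true label)."""
    return [{"bbox": bbox, "rough_guess": label} for label, bbox in scene]

def fine_classifier(scene: Scene, bbox):
    """Small-scale hierarchical pass (V1 -> V2 -> ... analogue): attends
    only inside the proposed region and returns a precise identity."""
    for label, b in scene:
        if b == bbox:
            return label
    return "unknown"

def recognise(scene: Scene):
    results = []
    for proposal in coarse_detector(scene):   # attention directed per region
        identity = fine_classifier(scene, proposal["bbox"])
        results.append((proposal["bbox"], identity))
    return results

scene: Scene = [("cup", (10, 20, 40, 60)), ("ball", (70, 15, 95, 40))]
print(recognise(scene))
```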
The Need for Consciousness (Question 7)

The nature of consciousness is still very controversial, although it has now become a legitimate subject of scientific study. Various models have been suggested [5, 20–23, amongst many others], although none has yet gained universal acceptance. Any model worth its salt should be able to give a sense of 'inner self', as well as provide that self with 'immunity to error through misidentification of the first person pronoun' [1, 24]. Such a model can be constructed using attention as the gateway to consciousness; in particular, it relates the inner self or 'owner' of the content of consciousness to the signal arising as a copy of the attention movement signal. It is this attention copy model (more technically termed the CODAM model, from COrollary Discharge of Attention Movement) which can be related to various paradigms sensitive to loss of awareness, such as the attentional blink [25], but at the same time can lead to there being no misidentification of the inner self as belonging to someone else. Such immunity to error arises because the attention copy signal is used to speed up access to awareness (on a buffer memory site) as well as to inhibit possible distracters. Thus the ownership signal is also a guarantee that 'what you are about to receive (into consciousness) is exactly what you wanted to receive' (including awareness of highly salient sudden stimuli, which are also set up in the frontal lobes as a goal and so are processed in a similar manner to top-down attended stimuli).

A possible architecture for CODAM has been presented in [1, 2 and references therein; see also 5], and is shown in Fig. 3.

[Fig. 3 The corollary discharge of attention model (CODAM) for consciousness]

The model consists of the modules already mentioned in association with Fig. 2, but now made more explicit. The IMC (the attention controller) in Fig. 3 is the generator of the signal for the movement of the focus of attention, termed the attention signal creator in Figs. 1 and 2. The monitor assesses the error in the expected attention modification, as compared to the desired goal, and sends a modifying signal
to the IMC accordingly. The object map contains object representations learnt from the environment. The corollary discharge module was denoted the attention copy module in Fig. 2.

The CODAM model achieves the creation of consciousness in two stages. In the first stage, the corollary discharge signal is employed to speed up access to the buffer memory module for report of the attended stimulus. This is done both by amplification of the attended stimulus representation and by inhibition of distracter representations. In the case of object representations, it is expected that the attention feedback to these various representations must be learnt alongside the representations themselves; such a process occurred in the GNOSYS object representation system [19]. However, this access is relatively slow, and so the attention copy signal is used to help speed it up, by amplifying the target stimulus representation on the buffer site as well as inhibiting distracters. At the same time, this copy signal is used as a predictor, to check whether the expected result of the attention movement signal will attain the goal; this uses an error assessor (working from the difference between the goal and the predicted goal signals). This error signal is used to modify the attention signal itself, if needed.

All of these processes use the attention copy signal in an essential manner. It is that signal which has been suggested as carrying the signature of ownership of the about-to-happen attainment of its buffer by the attended stimulus representation [1, 2, 5]. The ownership signal is thus proposed as that of the inner or pre-reflective self, the 'I' which owns the experience of content. The second stage of the creation of consciousness is then the attainment of the attended stimulus representation onto its appropriate buffer, thereby becoming available for report to other such buffers and thus playing the role of the instigator of the content of consciousness. The details of this consciousness can be unpacked from the lower level activity associated with the various features of the attended stimulus at differing scales. These two stages of the creation of consciousness in the brain are thus proposed to be:

1. The creation of the activity of the inner self by the arrival of the attention copy signal. This signal acts to ensure speed-up of access of content to consciousness, as well as the correctness of that content. The inner self is thus a 'sentry at the gate' of consciousness, granting a subject certainty that it is they themselves who are having the relevant conscious experience and not someone else.
2. The secondary activation of the attended stimulus representation on the relevant sensory buffer. The relevant content is detailed by associated lower level posterior cortical activity for the feature components of the attended stimulus at different scales.
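A minimal temporal sketch of these two stages is given below: the corollary discharge (ownership) signal arrives on its own buffer before the attended content reaches the sensory buffer, and its pre-amplification speeds the content's access. All time constants, gains and thresholds are illustrative assumptions, not fitted CODAM parameters.

```python
# Toy two-stage CODAM trace: ownership signal precedes content access.
import numpy as np

T = 30                                    # time steps (arbitrary units)
attn_move = np.zeros(T)
attn_move[5:15] = 1.0                     # sustained attention movement signal

corollary = np.zeros(T)                   # corollary discharge buffer (stage 1)
content = np.zeros(T)                     # sensory buffer for content (stage 2)

for t in range(1, T):
    # Stage 1: the copy of the attention movement signal excites the
    # corollary discharge buffer immediately; it is content free.
    corollary[t] = 0.8 * corollary[t - 1] + attn_move[t - 1]
    # Stage 2: content arrives at the sensory buffer after a conduction
    # delay; the corollary discharge pre-amplifies it, speeding access.
    gate = 1.0 + 0.5 * corollary[t]
    content[t] = 0.9 * content[t - 1] + 0.1 * gate * attn_move[max(t - 4, 0)]

threshold = 0.2                           # report threshold on each buffer
t_ownership = int(np.argmax(corollary > threshold))
t_content = int(np.argmax(content > threshold))
print(f"ownership at t={t_ownership}, content access at t={t_content}")
assert t_ownership < t_content            # stage 1 precedes stage 2
```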
This sequential process fits well with the results of phenomenological analysis by Husserl and many others in the tradition of Western phenomenology [26, 27], with the sequence Pretention → Primal moment → Protention. Such a tripartite sequential process can be seen to arise from the two-stage dynamical scheme envisaged above by the identifications:

Pretention = activation of the attention copy module and its usages;
Primal moment = access of the attended stimulus to its relevant sensory buffer;
Protention = continued (but decaying) activity on the sensory buffer, possibly able to be rehearsed so as to be continued as long as desired [6].

In this way we see that there is further support, from a branch of the philosophy of mind, for an approach such as that of CODAM.

The 'I''s Eye View of Consciousness

It is important to consider possible aspects of the consciousness of the inner or pre-reflective self, as represented by the activation of the corollary discharge module of Fig. 3. It has been proposed [1, 2, 5] that this site is the ultimate home of the pre-reflective self, and as such the heart of consciousness. Since the early activity in the corollary discharge buffer of Fig. 3 is unconnected with lower level posterior stimulus activity, it is to be expected to be content free. That, we propose, is the main source of the content-free nature of inner self-consciousness; it is similarly the 'nothingness' of Jean-Paul Sartre's powerful book Being and Nothingness [28]. The corollary discharge buffer is closely connected, according to the architecture of Fig. 3, with the error module and the target buffer module; there may even be strong interaction with the IMC for attention movement. Thus it is possible that other colorations of the inner self can arise besides that purely of ownership, associated with manipulation of the activity on the buffer working memory site to obtain the content of consciousness. Thus some knowledge will occur in the corollary discharge module of the error expected in reaching the desired goal, of some aspects of the attended stimulus (from the coding on the buffer memory site) and of the attention movement signal itself.

We note that the corollary discharge will be held in a veritable network as the seat of the inner self. This is
because there are various buffer working memory sites (for spatial aspects of visual stimuli, for object representations, for phonological coding of language, and numerous more such short-term memory stores). Thus we expect the cortical spread of the set of corollary discharge modules to be quite extensive. Moreover, the set of these modules is expected to form a well-connected (possibly inhibitory) network, in order that there is only one sense of self. The corollary discharge network will thus spread across various areas of cortex, and so is expected to have contacts across a considerable range of cortex. It is this feature, along with its expected central position in the brain, that will lead to the corollary discharge network being sensitive to a number of features of brain information processing at the highest level. The information accessible to the corollary discharge module is still only expected to be content free, so Sartre's description of it as 'Nothingness' [28] will still be apposite.

The corollary discharge network is thus to be regarded, from its receipt of such high-level brain information, as at the centre of the attention control system. It is listening in to error correction as well as to content access for report, and may know about attention control boosting (through the error correction signal) as well as possible switches of attention to other modalities (through distracters). Thus the inner self is expected to be at the centre of control of everything of importance ongoing in the whole brain. Such a role is consistent with the hierarchical view that sensory attention is at the top of the control systems in the brain (above motor attention, although considerably fused with it). Moreover, the corollary discharge system (the corollary discharge buffer, the error correction process, the feedback to the stimulus buffer and that to the attention signal generator) is to be regarded as the top of the sensory attention hierarchy, with the module for the inner self (the corollary discharge buffer) at the centre of such information processing, with a complex range of pre-reflective awareness of the various components of the message passing involved. But first among equals would be the knowledge of ownership of the consciousness of the content of the attended stimulus about to occur, with the ancillary activities being developed using this ownership signal.

Thus the 'I''s eye view is that of the ongoing processing of the whole brain. It is the kingpin of the brain. Such an all-embracing view would seem to be contrary to the 'nothingness' of Sartre [28] or of the majority of researchers in Western phenomenology [26]. However, we are able to go beyond the results of the latter since we have a specific model, in CODAM, of how consciousness could be created [1, 2, 5]. The exploration of the connectivity of the corollary discharge network allows us to extract what high-level information is available to this net, and leads to the cited result. Thus the nothingness of the
inner self is imbued with all that is ongoing of importance in the brain. It is content free, since it is coded at a high level, but provides the 'I' as a concerned watcher of the ongoing brain processes. The 'I' has no will of its own, but is concerned with any error correction of great import, as well as being cognisant of the changing contents of consciousness as they occur, and over which the 'I' stands guard. Undoubtedly such a complex system will be needed in the advanced cognitive machines of the future.
Discerning the Mental States of Others (Question 8)

There has been considerable controversy over how our ability to understand the mental states of others is achieved, as well as how this is lost (partially or completely) in children with autism. In this section, we have to extend the question just raised to autonomous robots, when our quest takes the form: What software architecture can be included in an autonomous robot so that it can deduce the internal mental state of another autonomous agent? Such deduction may require a learning stage, since it is unlikely that a hard-wired system can be constructed that can detect, solely from facial patterns (in the case of interaction with a human) or from patterns of body movements (in the case of another autonomous machine or a human), what such complex patterns mean as to the present sensations (if any) and future intentions (again if any) of the autonomous agent being viewed.

The first question to consider in this problem, before all else, is what level of autonomy is possessed by the autonomous agents being interacted with. Let us suppose the agents being investigated possess a set of basic drives (including curiosity), a generous (although not infinite) supply of energy, and the cognitive components of attention, emotion, long-term memory and decision making. They can temporarily store goals, with their associated reward values, and plan so as to attain the most-valued current goal, as well as remember the method used to attain the goal. This and other cognitive features (including linguistic powers) are assumed to have been included in the software architecture of these autonomous agents. This is clearly beyond what is presently available, although it will not be so in due course.

Having given the agents both autonomy and a comparatively high level of cognitive powers, the next question we must consider is how an autonomous agent can differentiate between other autonomous agents and purely non-autonomous ones, such as a ball rolling across the field of view. There are a variety of mechanisms that can be expected to be employed in the building of autonomous
agent detectors as part of our own autonomous agent. Such a detector would classify as autonomous, among other things:

1. Those agents with a similar shape to the carers who had initially trained it, assuming the observing agent had been taught by carers who could inculcate empathy and a panoply of emotions as part of its developing architecture, as would occur for an infant; this affective upbringing approach should thereby avoid the production of psychopathic agents;
2. Those agents with unexpected and unpredicted movements and responses to other agents, be they autonomous or not;
3. Those agents with unexpected or unpredictable speech patterns.
This is but a small set of possible triggers to alert an agent that an autonomous agent is in its vicinity. We assume that one or other (or all) of these mechanisms is possessed by our autonomous agent. As such it can therefore differentiate other agents into those which are autonomous and those which are not. The latter may well be analysed in living agents by genetically programmed circuits; this reduces the computing load to developing suitable detectors and internal models associated only with autonomous agents newly met in the environment (although these mechanisms may also have a genetic basis).

It is now necessary to consider the sorts of inner mental states that are assumed to be possessed by the autonomous agent for it to be able to discover the mental states of others. An approach to this is to start with the possession of mirror neurons [29] by the agent. Mirror neurons have been observed in monkeys and humans when they are observing the actions of others, such as when the hand of the experimenter picks up a raisin in view of the monkey. The mirror neurons need to be expanded to the mirror neuron system [30], in which numerous brain areas active when the monkey is observing others overlap well with the set of areas active when the monkey makes the corresponding movement. Such an extension is also needed for human mirror neurons. This neuronal system has been recognised as very likely involving the activation of internal motor models, including both a forward or predictor model of the next state of the observed system, and an inverse model, which generates the action needed to have changed the state from a previous one to the new one being observed. Such internal models have been embedded in a larger architecture so as to enable mental simulation to be performed [8].
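A minimal sketch of such a coupled forward/inverse model pair acting as a mirror system is given below: observing another agent's state change, the inverse model recovers the action, and the forward model can then simulate the observed agent onwards. The one-dimensional grid world and the action set are illustrative assumptions.

```python
# Toy mirror system from a coupled forward/inverse model pair.
ACTIONS = {"left": -1, "stay": 0, "right": +1}

def forward_model(state: int, action: str) -> int:
    """FM: (x, u) -> x' — predict the next state from state and action."""
    return state + ACTIONS[action]

def inverse_model(state: int, next_state: int) -> str:
    """IMC: recover the action that maps state to next_state."""
    for action, delta in ACTIONS.items():
        if state + delta == next_state:
            return action
    raise ValueError("no known action explains the observed transition")

# Observation of another agent: it moved from cell 3 to cell 4.
observed_before, observed_after = 3, 4
mirrored_action = inverse_model(observed_before, observed_after)

# Mental simulation: replay the inferred action forward a few steps.
state, trajectory = observed_after, [observed_after]
for _ in range(3):
    state = forward_model(state, mirrored_action)
    trajectory.append(state)

print(mirrored_action, trajectory)   # right [4, 5, 6, 7]
```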
internal mental state of the other agent? Such an internal state would consist of long-term memories (LTM) of propositional form (such as 'a box with a picture of sweets on its outside contains sweets, which will be found in the box if it is opened'). Such propositional content of LTM will be activated when context is present, such as the presence of such a box, and will be available for report on the global workspace of working memories in parietal and prefrontal cortices. Such propositions, in linguistic form or not, will be termed, as usual, beliefs. It is these about which much controversy has occurred in analysing the understanding of the beliefs of others, in terms of the presence of some sort of 'theory of mind' in the mind of the autonomous agent. Here we will assume only that beliefs can exist in the form defined (as coded LTM neural representations), and that they can be expressed in the internal models by suitable forward models. Thus the sweet box belief can be represented by the forward model (sweet box appearance, action of opening the box) → opened box of sweets (where the forward model notation is (x, u) → x′ and corresponds to the action u on the state x leading to the state x′).

The interesting question, which has been experimentally probed in children of a variety of ages and also in autistic children, is: if a child is shown, for example, that the sweet box actually contains pencils, then when another child or adult comes into the room, what will the newcomer think is in the box: pencils or sweets? Older children, with a good LTM and associated FM, will use their LTM to predict that the newcomer will say 'sweets'. Younger children (say 3 years old) will predict that the newcomer will say 'pencils'. This changeover is explained most simply by the maturing of the children, with 3 year olds only able to process the previous experience (being shown the sweet box as containing pencils) through short-term memory (STM). Their earlier experience with the sweet box (that it actually contains sweets) has not been registered in their memory (having decayed away in their STM and not been encoded sufficiently in their LTM). For older children, the LTM of the sweet box containing sweets can be excited by the appearance of the current box. Moreover, the executive control of prefrontal cortex will have increased in the older children, so that the LTM activity takes precedence over the STM activity previously used by the younger children (a similar explanation can be applied to autistic children). A minimal sketch of this account is given at the end of this section.

One feature of this explanation requires further discussion, especially in the light of the so-called 'theory of mind' approach to the children's mental powers. How do the children impute to the newcomer a mental state at all, and on what grounds? Do they really have a theory of mind? More specifically, on what grounds do the infants infer that the newcomer has a mind at all, with associated
mental states like their own? Is this achieved solely by observational learning, or do they possess some form of genetically pre-wired recognition system enabling them to reach such a sophisticated conclusion?

It is to be expected that higher order brain processes involved in such mirroring or observational learning are mainly concerned with understanding and copying the key indicators of external autonomy in an observed agent. Non-autonomous movements of objects in the environment will be expected to be dealt with at a much lower brain level, as already noted, such as in V5/MT or even lower, in the superior colliculus. Thus the activations of the higher level mirror neuron system, for a given class of autonomous agents for which internal models have been created, will be the possible internal 'states of mind' of the set of associated autonomous agents, activated in the observing agent. It would thus seem that all that is needed for a mental state assumption about other such agents by a given agent is already achieved by the mirroring process—the construction of appropriate sets of internal forward-inverse motor control model pairs and their resulting activation by observation of the autonomous movement of another agent.

However, is this construction able to provide an experience of the 'mind' of the other agent? The difficulty here is that there is apparently no internal architecture delineated to provide any experience of its own mind in the given autonomous agent being tested, let alone to impute a mind to another such agent. But in terms of the present architecture it is unclear that a 'mind' is needed in the autonomous agent, nor that one need be posited as present in other autonomous agents. Provided the autonomous agent can function through its mirror neuron system in the way suggested, suitably extended to the mental simulation loop with suitable working memory, executive control system and long- and short-term memory components [8], then it would be able to perform like the older children on the sweet box task, and with loss of executive control it would function like the 3 year olds. It thus appears that the 'theory of mind' explanation is not needed, and from this viewpoint is a misnomer. Whatever experience the autonomous agent is gaining from the activations of its internal model system/mental simulation loop when it observes another agent applies both to itself and to other such similar autonomous agents.

All of the agents being considered here are really to be regarded as zombies, with mind and consciousness not being appropriate concepts to discuss in their case. Thus the question of 'mental states' is reduced to suitably defined states of neural activity allowing for effective responses and goal attainment. The autonomous agents presently under consideration would have no 'mind' in the proper meaning of the term. To construct agents with experiencing or conscious minds would need extension of the agents to
include an advanced attention control system, say along the lines of that considered in the section 'The need for consciousness (question 7)', associated with an attention copy signal. We have thus replaced the 'theory of mind' approach to observational learning by a 'theory of internal replication' for the class of autonomous agents without consciousness but with the ability of observational learning.
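The sketch below encodes the sweet-box account above in a few lines: an LTM belief held as a forward model predicts the newcomer's answer when executive control is available, while without it the observer's own recent STM trace is wrongly imputed to the newcomer. The data structures and the executive-control flag are illustrative assumptions standing in for prefrontal maturation.

```python
# Toy false-belief (sweet box) account via LTM forward model vs STM trace.

# LTM belief as a forward model: (appearance, action) -> expected state.
ltm_forward = {("sweet box", "open"): "box containing sweets"}

def predict_newcomer(stm_contents: str, executive_control: bool) -> str:
    """Predict what a newcomer will say is in the box."""
    if executive_control:
        # Older child: LTM-driven prediction takes precedence; the
        # newcomer is simulated as having seen only the box's appearance.
        return ltm_forward[("sweet box", "open")]
    # Younger (or autistic) child: the recent STM trace dominates, and
    # the observer's own knowledge is wrongly imputed to the newcomer.
    return stm_contents

stm = "box containing pencils"        # the child was just shown pencils
print("older child predicts:", predict_newcomer(stm, executive_control=True))
print("younger child predicts:", predict_newcomer(stm, executive_control=False))
```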
Future Avenues

In all then, possible neural implementations of architectures to answer the various questions 1–8 above have been suggested in this article. These have mainly been guided by the known (or conjectured) architectures of the human brain, this being the best example we presently have of a cognitive machine. However, there are certainly many improvements and alternative approaches to these architectures which space has precluded including, as well as further advances to be made over the next few decades.

One important feature is the involvement of consciousness and its models in such progress. Most present research on creating cognitive machines tends to leave aside such esoteric problems as consciousness. It does appear to be an unnecessary component for embodied agents already beset with many lower level problems. However, from the 'I''s eye view described in the section 'The need for consciousness (question 7)', all of these approaches will be limited in their processing powers. Evolution has led us to a more complex solution, but one seeming to need attention and consciousness to survive a range of complex environments (both external and internal to the brain itself), as well as to develop societies able to support their members in a most effective manner. Thus I can only conclude that in the long term attention, and at an even higher level consciousness, must be included in the architecture of an autonomous cognitive agent. Short-term solutions of autonomous agency will not need it. But let me repeat: in the long term they will.

It can even be asked if there is some faculty which lies beyond consciousness and which could be implemented in a cognitive agent so that it is even more cognitive than with consciousness. There may be such a faculty, but to try to continue even further upward in the brain's processing hierarchy appears difficult. One could consider a hierarchy of attention levels, in which each level involves objects of a certain level of complexity, or a certain number of them grouped together. But what is to be gained? Attention as understood today achieves a sequential treatment of a complex environment, filtering out all but one stimulus (or a small group of stimuli) at a time. That stimulus is chosen because of its value at the time (in the given context and
given the present goals of the agent, etc.). Why choose a number of stimuli to be filtered at once, or choose them in order of their complexity, or by any other criterion? There may be ways of taking object representations suitably expanded across the centre-line of vision, for example, but this does not change the basic principle of filtering as simply as possible. So it is unclear that there are any better principles than the ones adumbrated in, for example, the CODAM model (a toy sketch of such value-based sequential filtering is given after the list below). As such, then, we as humans would seem to have come to the end of the road of evolution, at least of our consciousness.

Given such a situation, what are to be seen as the important avenues for future development, both from a theoretical or architectural point of view and from an applied position? There are several:

1. Extend (and if necessary modify) the CODAM model and other neural models of consciousness, so as to test them against all available psychological and related brain imaging data. In the process, further tests of the models by such approaches will become clear, refining the testing even further. Hopefully the main framework of a suitable model will become clear through such a down-to-earth scientific approach.
2. Create software able to function as the higher regions of the brain of a human, including if necessary various sub-cortical sites suggested as also important for helping create consciousness [22, 23].
3. Develop hardware platforms for providing embodiment for the software brain.
4. Develop hardware chips of semi-realistic neurons able to provide a physical realization of the neural processing in the brain (so as to go beyond pure software simulation to hardware emulation).
5. Allow the embodied/cognitive chips system to train on suitably simple (but increasingly complex) environments, so as to build autonomous internal representations of stimuli through vision (and other possible modalities), as well as to develop internal models of actions and of the affordances associated with stimuli whose representations at different feature levels are being learnt.
6. Expose the embodied/hardware brain to develop suitable internal models to solve simple reasoning tasks (the benchmark animal-level ones).
7. Develop a language understanding component, such as along the LAD lines or others, which must be included in order to allow the overall system to begin to attain adult human cognitive capabilities.
8. Create a neural architecture able to use the language system being developed as part of the stimulus-response system, so as to apply it to mathematical and logical reasoning. This is becoming close to the abilities of a digital computer, and possible HCI interactions should be considered as part of this avenue of R&D.
9. In relation to the previous avenue, and on the basis of results coming from this and earlier avenues of R&D, begin to fuse with the structures of the semantic web, especially giving a grounding in the world for the semantic web.
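As the toy sketch promised above, the following code illustrates the sequential, value-based filtering principle: one stimulus is attended at a time, chosen by a value score mixing bottom-up salience with top-down goal relevance, and then suppressed. The scoring function, its weighting and the data layout are assumptions made purely for illustration; this is not the CODAM model itself, only the filtering principle it embodies.

```python
from dataclasses import dataclass

@dataclass
class Stimulus:
    name: str
    salience: float        # bottom-up signal strength (assumed given)
    goal_relevance: float  # top-down relevance to current goals (assumed given)

def value(stim: Stimulus, goal_bias: float = 0.7) -> float:
    # Hypothetical value score mixing top-down and bottom-up terms.
    return goal_bias * stim.goal_relevance + (1 - goal_bias) * stim.salience

def sequential_attention(stimuli: list[Stimulus]) -> list[str]:
    """Filter a scene one stimulus at a time, highest value first.

    This mirrors the claim in the text that attention serializes a
    complex environment rather than processing many items at once.
    """
    remaining = list(stimuli)
    order = []
    while remaining:
        focus = max(remaining, key=value)  # winner-take-all selection
        order.append(focus.name)           # the single attended stimulus
        remaining.remove(focus)            # inhibition of return
    return order

scene = [Stimulus("red mug", salience=0.9, goal_relevance=0.2),
         Stimulus("door handle", salience=0.3, goal_relevance=0.8),
         Stimulus("window", salience=0.5, goal_relevance=0.1)]
print(sequential_attention(scene))  # ['door handle', 'red mug', 'window']
```

Whatever the scoring details, the structure stays the same: a winner-take-all choice followed by inhibition of return, serializing the scene exactly as argued above.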
All of these avenues of research will lead to corresponding developments penetrating the industrial and commercial domains, such as:

1. cognitive robots acting more autonomously in homes, especially to help the aged;
2. robotic controllers in factories (beyond those presently in place), taking on roles of control and decision making, especially in hostile environments;
3. robotic controllers in cars (as is already beginning) so as to be safe drivers;
4. robotic teachers/hospital carers/doctors;
5. robotic service aids (in supermarkets, call centres, etc.);
6. robotic translation agents (in call centres, travel shops, etc.);
7. reasoning systems of a variety of sorts;
8. PC- or mobile phone-based HCI systems allowing for direct communication between electronic device and human.

At this point the ethical problems and dangers well known to be associated with robots becoming ubiquitous, in particular causing great problems in the job markets across the world, should be raised. Such a problem will undoubtedly have to be faced in due course, although the speed of research in this area could well indicate that at least several decades will pass before many of the above proposed developments are achieved. But the problem will ultimately have to be faced, as it will grow in magnitude continuously over those decades.

Acknowledgements The author would like to thank the Cognitive Systems Unit of the EU for financial support through the GNOSYS project to create a cognitive robot (2004-7) and the MATHESIS project on Observational Learning (2006-9), as well as the EPSRC of the UK for support for developing a control model of attention (2003-6). He would also like to thank his numerous young colleagues involved in those projects for stimulating discussions.
References

1. Taylor JG. The mind: a user's manual. Chichester: Wiley; 2006.
2. Taylor JG. Paying attention to consciousness. Prog Neurobiol. 2003;71:305–35.
3. Taylor NR, Hartley MR, Taylor JG. The micro-structure of attention. Neural Netw. 2006;19(9):1347–70.
4. Desmurget M, Grafton S. Forward modelling allows feedback control for fast reaching movements. Trends Cogn Sci. 2000;4(11):423–31.
5. Taylor JG. CODAM: a model of attention leading to the creation of consciousness. Scholarpedia. 2007;2(11):1598.
6. Korsten N, Fragopanagos N, Hartley M, Taylor N, Taylor JG. Attention as a controller. Neural Netw. 2006;19:1408–21.
7. Schluter N, Krams M, Rushworth MFS, Passingham RE. Cerebral dominance for action in the human brain: the selection of actions. Neuropsychologia. 2001;39(2):105–13.
8. Hartley M, Taylor JG. Towards a neural model of mental simulation. In: Kůrková V, Neruda R, Koutník J, editors. Artificial neural networks – ICANN 2008, Proceedings. Lecture notes in computer science, vol. 5163. Berlin: Springer; 2008. p. 969–80. ISBN 978-3-540-87535-2.
9. Korsten N, Fragopanagos N, Taylor JG. Neural substructures for appraisal in emotion: self-esteem and depression. In: Marques de Sá J, Alexandre LA, Duch W, Mandic D, editors. Artificial neural networks – ICANN 2007, Part II. Berlin: Springer; 2007. p. 850–8.
10. Weir AAS, Chappell J, Kacelnik A. Shaping of tools in New Caledonian crows. Science. 2002;297:981–3.
11. Taylor JG, Kasderidis S, Trahanias P, Hartley M. A basis for cognitive machines. In: Kollias S, Stafylopatis A, Duch W, Oja E, editors. Artificial neural networks – ICANN 2006, Part I, Proceedings. Lecture notes in computer science, vol. 4131. Berlin: Springer; 2006. p. 573–82. ISBN 978-3-540-38625-4.
12. Gergely G, Csibra G. Teleological reasoning in infancy: the naive theory of rational action. Trends Cogn Sci. 2003;7:287–92.
13. Taylor NR, Taylor JG. A novel novelty detector. In: Marques de Sá J, Alexandre LA, Duch W, Mandic D, editors. Artificial neural networks – ICANN 2007, Part II. Berlin: Springer; 2007. p. 973–83.
14. Tanji J, Shima K, Mushiake H. Multiple cortical motor areas and temporal sequencing of movements. Brain Res Cogn Brain Res. 1996;5(1–2):117–22.
15. Taylor NR, Taylor JG. The neural networks for language in the brain: creating LAD. In: Hecht-Nielsen R, McKenna T, editors. Computational models for neuroscience. London: Springer; 2003. ch. 9, p. 245–66.
16. Taylor JG, Taylor NR, Apolloni B, Orovas C. Constructing symbols as manipulable structures by recurrent networks. In: Proceedings of ICANN 2002.
17. The iTALK project: http://www.italkproject.org/.
18. Hurley S, Nudds M, editors. Rational animals? Oxford: Oxford University Press; 2006.
19. For results of the GNOSYS program see: http://www.cs.forth.gr/gnosys. Accessed 01 Jan 2009.
20. Damasio A. Descartes' error. New York: Picador Press; 2000.
21. Taylor JG. The race for consciousness. Cambridge: MIT Press; 1999.
22. Crick F, Koch C. What is the function of the claustrum? Philos Trans R Soc B. 2005;360:1271–9.
23. LaBerge D. Defining awareness by the triangular circuit of attention. Psyche. 1998;4(7). http://psyche.cs.monash.edu.au/v4/psyche-4-07-laberge.html.
24. Shoemaker S. Self-reference and self-awareness. J Philos. 1968;65:555–67.
25. Fragopanagos N, Kockelkoren S, Taylor JG. A neurodynamic model of the attentional blink. Brain Res Cogn Brain Res. 2005;24:568–86.
26. Zahavi D. Subjectivity and selfhood. Cambridge: MIT Press; 2005.
27. Sokolowski R. Introduction to phenomenology. Cambridge: Cambridge University Press; 2000.
28. Sartre J-P. Being and nothingness. London: Routledge; 1943.
29. Rizzolatti G, Fadiga L, Gallese V, Fogassi L. Premotor cortex and the recognition of motor actions. Brain Res Cogn Brain Res. 1996;3:131–42.
30. Raos V, Evangeliou MN, Savaki HE. Observation of action: grasping and the mind's hand. Neuroimage. 2004;23:193–204.