Comput Math Organ Theory (2010) 16: 271–299 DOI 10.1007/s10588-010-9065-3
The synthetic teammate project

Jerry Ball · Christopher Myers · Andrea Heiberg · Nancy J. Cooke · Michael Matessa · Mary Freiman · Stuart Rodgers
Published online: 1 August 2010 © US Government 2010
Abstract The main objective of the Synthetic Teammate project is to develop language and task enabled synthetic agents capable of being integrated into team training simulations. To achieve this goal, the agents must be able to closely match human behavior. The initial application for the synthetic teammate research is the creation of an agent able to perform the functions of a pilot for an Unmanned Aerial Vehicle (UAV) simulation as part of a three-person team. The agent, or synthetic teammate, is being developed in the ACT-R cognitive architecture. The major components include: language comprehension and generation, dialog management, agent-environment interaction, and situation assessment. Initial empirical results suggest that the agent-environment interaction is a good approximation to human behavior in the UAV environment, and we are planning further empirical tests of the synthetic teammate operating with human teammates. This paper covers the project's modeling approach, challenges faced, progress made toward an integrated synthetic teammate, and lessons learned during development.

Keywords Synthetic teammate · Language comprehension/generation · Dialog management · Situation model · Agent-environment interaction

J. Ball, Air Force Research Laboratory, Mesa, AZ, USA. e-mail: [email protected]
C. Myers, Air Force Research Laboratory, Wright-Patterson Air Force Base, OH, USA. e-mail: [email protected]
A. Heiberg, General Motors, Mesa, AZ, USA. e-mail: [email protected]
N.J. Cooke, Cognitive Engineering Research Institute, Mesa, AZ, USA. e-mail: [email protected]
M. Matessa, Alion, Boulder, CO, USA. e-mail: [email protected]
M. Freiman, L3 Communications, Mesa, AZ, USA. e-mail: [email protected]
S. Rodgers, AGS TechNet, Dayton, OH, USA. e-mail: [email protected]

1 Project overview

Previous research has shown the benefits of developing synthetic agents to support team training (Jones et al. 1999; Tambe et al. 1995; Zachary et al. 2001) and for evaluation of computer interfaces (Byrne et al. 1994; Ritter et al. 2002). The main objective of the Synthetic Teammate project is to develop synthetic agents capable of being integrated into team training simulations. To achieve this goal while maintaining training efficacy, the synthetic agents must be capable of closely matching human behavior across several capacities, including situation assessment, task behavior, language comprehension and generation, and dialog management. Matching human behavior is a goal of computational cognitive architectures, which replicate human perceptual, cognitive, and motor abilities. Models developed within a cognitive architecture are empirically validated against human data. However, most models developed within cognitive architectures model laboratory tasks which isolate specific cognitive and perceptual phenomena. These models are typically small in scale and do not generalize to other tasks.
Although this project takes advantage of previous work done with the ACT-R cognitive architecture (Anderson 2007), it broadens that work considerably, integrating multiple components into a large-scale model of a complex task. In developing a large-scale model of multiple cognitive capacities, this research aligns with research aimed at the development of Artificial General Intelligence. However, most AI research adopts a black-box approach that focuses on modeling input-output behavior and makes little or no commitment to the cognitive plausibility of internal mechanisms, relying instead on high-powered algorithmic mechanisms that are typically not cognitively plausible. By contrast, our research is focused on glass-box modeling of the internal mechanisms in a cognitively plausible manner in support of developing a functional system (Ball 2006, 2008). In attempting to build a large-scale model of a complex task, we are pushing the field of computational cognitive modeling outside of its comfort zone. Complex integration issues which do not arise in small-scale models become major challenges. Determining the appropriate level of cognitive fidelity for the different components of the system hinges on the availability of time and resources, and interacts with the overall goal of building an end-to-end system. Empirical validation becomes a real challenge. The complexity of the model constrains the use of standard empirical validation methodologies (Cassimatis et al. 2008). It is unclear to what extent empirical
validation can proceed before a functional model is available. What are the implications for empirical validation of having components with varying levels of cognitive fidelity? These and other challenges must be overcome if we are to build models of complex tasks that can leverage research in cognitive architectures and cognitive modeling to advance progress in the development of synthetic teammates with high cognitive fidelity. This paper summarizes our current progress, highlights the challenges we have faced, and describes the choices we have made to overcome them. The initial application for the synthetic teammate research is the creation of an agent capable of functioning as the pilot of an Unmanned Aerial Vehicle (UAV) within a synthetic task environment (STE). We currently have a prototype synthetic teammate capable of processing a limited range of text inputs, responding appropriately and performing the piloting task. As the capabilities of the situation model and language comprehension components expand, the synthetic teammate will be able to handle a broader range of text inputs. Our ultimate goal is to empirically evaluate (and hopefully validate) the behavior of the synthetic teammate in an experiment involving two human teammates. To achieve this goal, the synthetic teammate must be capable of responding appropriately to close to the full range of text inputs that a human pilot must process. In addition, the synthetic teammate must match human behavior in the piloting task, and must be capable of doing all this in near real-time on available hardware.
2 Synthetic task environment

The Synthetic Task Environment (STE) used for developing the synthetic teammate is the Cognitive Engineering Research on Team Tasks (CERTT) UAV-STE (Cooke and Shope 2005). The CERTT UAV-STE simulates teamwork aspects of UAV operations rather than equipment aspects (e.g., buttons and dials). The UAV-STE involves three heterogeneous and interdependent team members, each with a different role. The team members are the Data Exploitation Mission Planning and Communications operator (DEMPC, the navigator), who is responsible for creating a dynamic flight plan, including speed and altitude restrictions; an Air Vehicle Operator (AVO, the pilot), who controls flight settings and systems; and a Payload Operator (PLO, the photographer), who monitors sensor equipment and takes photographs. The team members' common goal is to photograph ground targets during reconnaissance missions, and this requires interaction between all team members. Interaction occurs either through a text-chat-based communications system or via spoken language using headsets and microphones. A single UAV-STE mission consists of 9–12 targets and lasts a maximum of 40 minutes. However, a mission can end once the team photographs all possible targets. The CERTT UAV-STE provides an ideal environment for developing a synthetic teammate for three important reasons. First, there has been over a decade of team research using the CERTT UAV-STE that can be leveraged to guide synthetic teammate development and validation. An understanding of team and teammate behavior within the task is well documented and has already proven useful in the development of the synthetic teammate. Second, the task requires a high degree of
coordination due to time pressures and mutual constraints among the team member roles. To perform well within the UAV-STE, team members must understand their own tasks and, more importantly, coordinate with each other to complete their common goal. Third, the CERTT UAV-STE is a software simulation, which helps to reduce the complexity of integrating a software implementation of a synthetic teammate. Although there remain hardware components that the synthetic teammate must use (i.e., mouse and keyboard), numerous approaches to developing software-based cognitive models have overcome this challenge (Byrne 2001; Kieras and Meyer 1997).
3 ACT-R cognitive architecture

A cognitive architecture is a theory of human cognition. According to Anderson (2007, p. 7), "A cognitive architecture is a specification of the structure of the brain at a level of abstraction that explains how it achieves the function of the mind". A cognitive architecture may or may not have a computational implementation. If it has a computational implementation, then models can be developed in the cognitive architecture which add the task-specific knowledge and mechanisms needed to perform a specific cognitive task. If the cognitive architecture has (simulated) perceptual/motor capabilities, models developed in the cognitive architecture can interact with (simulated) task environments. Cognitive architectures may also support measurement of the time it takes to complete a cognitive, perceptual or motor process. ACT-R is a cognitive architecture based on 30+ years of psychological research (Anderson 2007), with a computational implementation and simulated perceptual/motor capabilities, that provides a modeling environment for the development of computational cognitive models. ACT-R provides support for measuring processing time and has been used extensively in the modeling of higher-level cognitive processes. ACT-R includes symbolic production and declarative memory systems integrated with subsymbolic production selection and declarative memory spreading activation and decay mechanisms. Production selection involves the parallel matching of the left-hand side of all productions against a collection of buffers (goal buffer, retrieval buffer, visual buffer, manual buffer) which contain the active contents of memory and perception. The production with the highest utility that matches the buffer contents is selected for execution. Production execution is a serial process—only one production is executed at a time.
The parallel spreading activation and decay mechanism determines which declarative memory chunk is put into the retrieval buffer for comparison against productions. With its symbolic and subsymbolic processing mechanisms, ACT-R is a hybrid system of cognition. The subsymbolic processing mechanisms are modulated by a noise parameter which adds stochasticity to the architecture. ACT-R supports single inheritance of declarative memory chunks; limited, variable-based pattern matching against declarative memory (including a partial-matching capability); and forward chaining of productions. ACT-R incorporates learning mechanisms for learning both declarative and procedural knowledge. The major components of ACT-R are shown in Fig. 1 along with the mapping to the corresponding brain regions.

Fig. 1 ACT-R cognitive architecture with mapping to brain regions (from Anderson et al. 2004)

The intentional module (brain location not identified) interfaces to the production system via the goal buffer which stores the current goal (located in the dorsal-lateral pre-frontal cortex). The declarative module (temporal region and hippocampus) interfaces to the production system via the retrieval buffer which stores the retrieved declarative memory chunk (ventro-lateral pre-frontal cortex). The visual module (occipital region) interfaces to the production system via the visual buffer which stores the currently attended visual object (parietal region). The visual module and the auditory module (not shown) combine to provide the perceptual capabilities of ACT-R. The manual or motor module (motor region/cerebellum) interfaces to the production system via the manual buffer which stores the current motor action (motor region). The production system (basal ganglia) includes mechanisms for matching productions (striatum), selecting productions (pallidum) and executing productions (thalamus). Production execution takes 50 ms. Additional time is required to perform memory retrievals, shift attention to visual objects and perform motor actions.
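The production-selection cycle described above can be sketched in a few lines of Python. This is an illustrative reconstruction only, not ACT-R's actual implementation: the production names, buffer contents, and utility values are invented for the example, and the Gaussian noise term stands in for ACT-R's utility noise parameter.

```python
import random

def select_production(productions, buffers, noise_sd=0.5):
    """Pick the matching production with the highest noisy utility.

    `productions` is a list of (name, condition, utility) tuples, where
    `condition` is a predicate over the buffer contents. All names and
    values here are invented for illustration.
    """
    matching = [(name, utility) for name, condition, utility in productions
                if condition(buffers)]
    if not matching:
        return None  # no production matches; the cycle idles
    # Utility is perturbed by Gaussian noise, so selection is stochastic
    # but biased toward productions with a history of success.
    return max(matching, key=lambda p: p[1] + random.gauss(0, noise_sd))[0]

# Hypothetical buffer contents and productions for reading a word.
buffers = {"goal": "read-word", "visual": "word:the"}
productions = [
    ("encode-word",
     lambda b: b["goal"] == "read-word" and b["visual"].startswith("word:"), 2.0),
    ("shift-attention", lambda b: b["goal"] == "read-word", 1.0),
]
chosen = select_production(productions, buffers)
```

With the noise parameter set to zero, selection becomes deterministic, always firing the highest-utility matching production; nonzero noise yields the stochasticity described above.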
4 Synthetic teammate overview

The Synthetic Teammate project is intended to lead to the development of a cognitively plausible, yet functional synthetic teammate that operates as the pilot in the UAV-STE. The synthetic teammate is implemented within ACT-R, reflecting the focus on
cognitive plausibility. As argued in Ball (2006), for inherently human behaviors like language comprehension and generation, the use of a cognitive architecture to guide and constrain the implementation of a system may actually facilitate, rather than hinder, development. The constraints imposed by the cognitive architecture push system development in cognitively plausible directions which are assumed more likely to lead to human-like behavior than purely algorithmic solutions which ignore such constraints. Although purely algorithmic solutions may provide short-term gains, they often lead to long-term difficulties, as in a parser which processes the linguistic input from right to left—taking advantage of the punctuation at the end of a sentence—but cannot be integrated with a speech recognition system or process language incrementally in real-time. You don't know what you're giving up when you adopt cognitively implausible techniques. Where possible, we have accepted short-term costs in pursuit of longer-term benefits. For example, we chose to develop the word recognition subcomponent in ACT-R rather than using available off-the-shelf tokenizing and part-of-speech tagging tools. The off-the-shelf tools operate in stages which are not cognitively plausible. Our decision resulted in a short-term cost, but has allowed us to integrate the word recognition subcomponent with higher-level language processing. This integration has facilitated the recognition of words and multi-word expressions in contextually dependent and cognitively plausible ways that are not available to the staged tokenizing and part-of-speech tagging approach. The major components of the synthetic teammate vary in their level of cognitive fidelity—short-term gains are sometimes necessary to achieve the goal of building an end-to-end system within resource constraints. The language generation component uses fixed templates to generate output text messages.
Although fixed templates are not cognitively plausible, the language generation component still does a good job of modeling the output messaging behavior of the human pilot who provided the empirical data. Given more time and resources, we would modify this component to improve its cognitive fidelity. However, we do not currently have the resources to absorb the short-term degradation in the capabilities of this component that would be entailed. The major linguistic components of the system include text-chat-based language comprehension and generation components, which are under the control of a dialog manager (see Fig. 2). The linguistic subsystem interacts with a situation model component that contains propositional representations of the current state of affairs encoded from reading text chat messages and interacting with the task environment. The situation model component is intended to be a computational implementation of the notion of a situation model as originally put forward in van Dijk and Kintsch (1983), with extensions summarized in Zwaan and Radvansky (1998). The agent-environment interaction component implements the observable behavior of the system, controlling shifts of attention in the visual system and motor actions needed to perform the pilot's tasks. Input to the system is mediated by ACT-R's visual module, and motor actions are mediated by ACT-R's motor module. The visual and motor modules provide an interface between ACT-R and the environment. Each of the system components makes use of ACT-R's intentional module, declarative memory and production system.
Fig. 2 Synthetic teammate overview
The components of the synthetic teammate were initially developed in isolation from each other. The components have now been integrated into a single system, sharing information through the situation model component. Although all of the synthetic teammate's components have been integrated, each of the components is continuously being improved. At the time of publication, across all five components, the synthetic teammate represents a very large ACT-R model, with 881 productions in its procedural memory and 6940 chunks in its declarative memory at model startup. When the model is run, new chunks are created which gradually increase the size of the declarative memory. The following sections provide further details for each of the components.
5 Language comprehension component

The language comprehension component has been under development in ACT-R since 2002, and extends the research of Ball (1991), which included a Prolog based implementation of language processing. The first ACT-R implementation was in ACT-R version 5 (Ball 2004a). The current version runs in ACT-R 6 (Ball et al. 2007). The language comprehension component is intended to be a domain general system capable of handling a broad range of English constructions. Additions to the model to handle the text-chat-specific corpus are being made in the context of a regression testing capability to ensure that the broad coverage of the component is maintained.
The language comprehension component is a construction-driven processing system (Ball 2007a) based on a linguistic theory of the grammatical encoding of referential and relational meaning (Ball 2007b) which is aligned with basic principles of Cognitive and Construction Grammar (cf. Langacker 1987, 1991). Lexical items in the linguistic input activate constructions which drive processing. For example, the verb "get" activates a transitive verb construction which may be selected for processing. This construction, if selected, sets up an expectation for an object to occur. The transitive verb construction also projects a clausal construction (if one hasn't already been projected by a preceding auxiliary verb). The clausal construction sets up an expectation for a subject. The subject of the clausal construction is typically available in the current context and, if available, is integrated into the clausal construction. The absence of a subject can trigger projection of an imperative clause construction if the verb is in the base form as in "get the photo"; otherwise, a declarative clause construction is projected even if the subject is missing (e.g., "got the photo"). The occurrence of an auxiliary verb preceding the subject can trigger projection of a yes-no question construction as in "did you get the photo". If a wh-word precedes the auxiliary verb, a wh-question construction is projected as in "who is getting the photo" or "when are you getting the photo". The language comprehension component adheres to two well-established cognitive constraints on language processing—incremental and interactive processing (cf. Gibson and Pearlmutter 1998; Altmann and Steedman 1988; Tanenhaus et al. 1995). The component processes the input incrementally (one word at a time), constructing a linguistic representation of the input based on the current word, constructions activated by the word, and the prior context.
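The incremental, word-by-word construction process, together with the accommodation mechanism discussed next, can be illustrated with a small sketch. This is a hypothetical simplification, not the model's actual productions: a determiner projects a nominal frame, the next content word is provisionally taken as head, and a later noun demotes the current head to a modifier, with no backtracking and no lookahead.

```python
def parse_nominal(words):
    """Build a nominal incrementally, accommodating each new word.

    Hypothetical simplification of the model's behavior: the determiner
    projects the nominal frame, the first content word is assumed to be
    the head, and any later noun demotes the current head to a modifier.
    """
    nominal = {"spec": None, "mods": [], "head": None}
    for word in words:
        if word in ("the", "a", "an"):
            nominal["spec"] = word                   # projects the frame
        elif nominal["head"] is None:
            nominal["head"] = word                   # provisional head
        else:
            nominal["mods"].append(nominal["head"])  # demote old head
            nominal["head"] = word                   # new head
    return nominal

parse_nominal(["the", "airspeed", "restriction"])
# -> {'spec': 'the', 'mods': ['airspeed'], 'head': 'restriction'}
```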
If necessary, the current input is accommodated by adjusting the current representation or coercing the current input into that representation, without backtracking or lookahead. The mechanism of context accommodation is part and parcel of the basic left-to-right, incremental processing mechanism which is implemented in ACT-R's serial production system. For example, in the processing of "the airspeed restriction", when "airspeed" is processed it is integrated as the head of the nominal construction projected by "the". However, when the word "restriction" is processed, the nominal construction is adjusted so that "airspeed" functions as a modifier, with "restriction" functioning as the head. Context accommodation often avoids the need to carry forward multiple representations in parallel, and yet the model still arrives at an appropriate representation at the end of processing. The language processor is highly context sensitive and makes use of all available information—lexical, syntactic, semantic and pragmatic—in interactively deciding how to process a given input at each choice point. There is no autonomous syntactic component or syntactic processor, although grammatical information is very important for determining meaning. Contextual information is probabilistically summed via ACT-R's parallel spreading activation mechanism to return the best alternative from declarative memory given the current input and context. The selected alternative is assumed to be correct, and the processor proceeds deterministically and serially forward. The context-sensitive, probabilistic, parallel spreading activation mechanism, combined with the mechanism of context accommodation, makes possible a deterministic, serial language processing system which builds a single representation. However, overall the
Table 1 Example text-chat communications between teammates in the CERTT UAV-STE

  Message sender   Message
  Photographer     Got photo lets go
  Navigator        Radius = 5, Speed = 300-400, Alt. = 3000-5000
  Photographer     Can you go faster yet or is it still 200
  Navigator        H-Area = Speed = 50-20
  Pilot            We're inside the parameters will continue cruising at 377 - alt. 2645
system is pseudo-deterministic in that the parallel integration of information at each choice point and the context accommodation mechanism, which is non-monotonic, are not characteristic of deterministic processing. Recent modifications to the language comprehension component have focused on the processing of long-distance dependencies—demonstrating that the system is capable of handling theoretically important constructions such as "he_i promised me t_i to go" ("promise" is a subject control verb) and "he persuaded me_i t_i to go" ("persuade" is an object control verb). In the first example, the subscript i indicates that "he_i", the subject of "promise", corresponds to the implied subject "t_i" of "to go"—i.e., "t_i" occurs in subject position before "to go". The referent of "he_i" is the person going. In the second example, "me_i", the object of "persuade", corresponds to the implied subject "t_i" of "to go". The referent of "me_i" is the person going. "t_i" stands for a trace of the implied subject. Modeling long-distance dependencies and identifying implicit subjects is a well-known challenge for the development of functional language comprehension systems. The language comprehension component is also being extended to handle the text chat corpus that was collected in an experiment involving human subjects and the CERTT UAV-STE. The text chat corpus is full of interesting variability and irregularity in the form of the linguistic input (e.g., typos, spelling variants, morphological variants, abbreviations, acronyms, concatenations, new coinages; see Table 1). In order to handle this variability, lower-level processes of word recognition have been added to the language comprehension component. The spreading activation mechanism of ACT-R allows the model to retrieve words from the mental lexicon that are not an exact match to the input. Letters and trigrams in the input spread activation to the words containing those letters and trigrams in the mental lexicon.
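The letter- and trigram-based activation scheme can be sketched as follows. This is an illustrative reconstruction, not the model's actual encoding: the scoring rule and weights are invented, and real spreading activation in ACT-R also reflects prior activation levels and context.

```python
def trigrams(word):
    """Letter triples of a word, padded so edge letters form triples."""
    padded = f"_{word}_"
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

def recognize(input_word, lexicon):
    """Retrieve the lexical entry most activated by the input.

    Invented scoring rule for illustration: letters and trigrams shared
    with the input spread activation to each candidate word, so a typo
    like "phto" can still retrieve its intended word.
    """
    def activation(entry):
        letter_overlap = len(set(input_word) & set(entry))
        trigram_overlap = len(trigrams(input_word) & trigrams(entry))
        return letter_overlap + 2 * trigram_overlap  # weights are arbitrary
    return max(lexicon, key=activation)

lexicon = ["photo", "photograph", "waypoint", "altitude"]
recognize("phto", lexicon)  # the typo still retrieves "photo"
```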
These processes and encodings are based on the Interactive Activation model of word recognition (McClelland and Rumelhart 1981), with the addition of trigrams based on the “letter triples” as later described by Seidenberg and McClelland (1989). Though inspired by the findings of word recognition studies, this subcomponent does not just model an isolated word recognition task. It is embedded in the language comprehension component as a whole; therefore, the effects of context and previous activation levels must be taken into consideration when encoding each individual word (Freiman and Ball 2008). In addition to adhering to cognitive constraints on the incremental and interactive processing of the linguistic input, we have recently focused on improving the model’s reading rate to bring it into closer alignment with adult human reading rates
(Freiman and Ball 2010). To achieve this, we modified ACT-R to incorporate a more cognitively plausible perceptual span. Previously, a word like "ACT-R" required three separate attention fixations—one each for "ACT", "-" and "R"—which was both slow and cognitively implausible. Now "ACT-R" is recognized in a single fixation. The improved perceptual span also supports the recognition of multi-word expressions (e.g. "get out"). In addition, we modified the language comprehension model to reduce the amount of structure building to the minimum needed given the linguistic input. Previously, the model built structure that was only needed to handle more complex inputs. For simpler inputs, this additional structure was not necessary. For example, given the simple input "the airplane", there is no need to build structure to support integration of a modifier like "on the runway", which occurs in the more complex input "the airplane on the runway". The model now builds the minimal structure needed for the simpler input, but is also capable of efficiently building more complex structures when needed. With these improvements, the model is currently capable of reading 143 words per minute, compared to a range of 200–300 words per minute for adult humans. Although we have not yet achieved adult human reading rates, we have identified additional improvements which should allow us to do so. Basic claims of cognitive plausibility are primarily founded on adherence to well-established cognitive constraints, and are less reliant on matching specific human data sets. ACT-R is a major contributor to this adherence. To see this, we compare a few mechanisms of ACT-R to Prolog, a logic programming language originally designed to support natural language processing applications (cf. Colmerauer and Roussel 1996). Productions in ACT-R are quite similar to the logic rules supported in Prolog.
If the left-hand side of a Prolog logic rule matches the current context, Prolog tries to “prove” the right-hand side of the rule. This is equivalent to the matching of an ACT-R production against the buffers which represent the current context, and executing the right-hand side of the production. However, in selecting a logic rule, Prolog uses a powerful unification mechanism which is capable of matching arbitrarily complex, deeply nested structures together. ACT-R has a much less powerful pattern matching capability limited to matching structures at a single unnested level. By comparison, although Vosse and Kempen (2000) use unification in their cognitively motivated syntax processing model, they limit it to non-recursive unification (i.e. unification on a single level). In ACT-R, the selection of a production is determined by its utility relative to other productions which match the current context. The utility mechanism supports the selection of productions that are likely to lead to successful behavior given the current context. The result is a processing mechanism which presents the appearance of deterministic processing—the model always selects the best production given the context (subject to noise). Prolog provides no utility mechanism for deterministically selecting likely rules—all rules which match the current context are equally eligible for selection. Instead, Prolog provides a non-deterministic, algorithmic backtracking mechanism, and relies on detecting failure to drive rule selection and execution. There is little psychological evidence that humans use anything like non-deterministic, algorithmic backtracking in their cognitive processing. Even when humans do occasionally backtrack during language processing, that backtracking is not algorithmic in always backing up to the most recent choice point (cf. Grodner et al. 2003), and the context
that was created prior to backtracking is not retracted (cf. Christianson et al. 2001). Further, there is extensive psychological evidence that proceduralized skills are directionally specific (Anderson et al. 1997). Humans have great difficulty in performing procedural skills backwards. In line with this evidence, ACT-R only supports forward chaining of productions and does not provide any backtracking or back-chaining capability. ACT-R also provides a spreading activation mechanism which supports the retrieval of declarative memory chunks which are not an exact match to the current input. There is extensive evidence that humans are good at recognizing the similarity between perceptual inputs (including memories of perceptual inputs) which are not an exact match (cf. McClelland and Rumelhart 1981). We refer to this capability as providing soft constraints on memory retrieval. Prolog provides no such partial matching capability, requiring exact, hard-constraint matches in its unification process. These are just some of the ways in which the use of ACT-R constrains the language comprehension model to be more cognitively plausible. Of course, it is still possible to build a cognitively implausible model within ACT-R, and fitting the model to human data provides an additional mechanism for validating cognitive models developed in ACT-R. Even a model developed in ACT-R that fits a given data set may be cognitively implausible. One criticism of cognitive modeling is that modelers can manipulate architectural parameters to fit a model to almost any isolated data set. A model that fits a broader range of human data has fewer degrees of freedom in terms of its parameter settings, and this reduces the risk of over-fitting to a particular data set. A model capable of functioning as a pilot in a complex task environment would have very few degrees of freedom for adjusting parameters.
In integrating our separately developed components, one of the challenges we faced was reconciling the parameter settings across the individual components.
6 Language generation & dialog management components

The language generation and dialog management components were developed to capture the dynamic nature of human language production, following earlier approaches involving dynamic dialog constraints (Ericsson 2004), accommodation (Matessa 2000), and adaptive content selection (Walker et al. 2004). The focus of the language generation component is on selecting from a set of possible utterances, akin to overgeneration-and-ranking approaches (Varges 2006). The focus of the dialog management component is the management of communication obligations (Traum and Allen 1994), with messages abstracted as dialog acts (Core and Allen 1997). Although the development of the language generation and dialog management components is informed by research in natural language generation and dialog modeling, our decision to implement the capability within ACT-R precluded use of an existing system. Early on we did consider use of the Mumble system (McDonald 1999), but decided to go with our own ACT-R implementation. Even though the language generation component is developed in ACT-R, we relaxed the cognitive constraints, adopting a template-based approach, in order to build an end-to-end system given available resources. With more time and resources, we will be able to improve the cognitive plausibility of these components. Had we chosen an existing system, it would be much more difficult to improve its cognitive plausibility.
The language generation component uses Optimality Theory (OT; Prince and Smolensky 1993/2004) to select an optimal utterance, given a set of utterances and their dialog constraints. In OT, constraints are simple, violable, conflicting, and motivated by cross-linguistic evidence. Constraints are arranged in a strict dominance hierarchy; the optimal utterance is the one that least violates the hierarchy. The language generation component expresses constraint ranking through ACT-R declarative memory activation: the most important constraint is most highly activated. Activation spreads from dialog constraints to utterances to determine which utterance is retrieved from memory; the most important constraint has the greatest effect on the retrieval. Under standard OT, the permutations of a hierarchy of universal constraints predict a factorial typology of language systems (Prince and Smolensky 1993/2004). For example, the constraint hierarchy A ≫ B may describe the phonological system of one language, while B ≫ A may describe another. In contrast, for human language production, there is variation in the utterances of a single speaker in a single language. The variation in human language production may be similarly captured through the permutations of an OT constraint hierarchy. The notion of stochastic OT (Boersma and Hayes 2001) helps account for the variation. Under stochastic OT, constraints are ranked on a continuous scale, with overlapping ranking distributions. Factors from the situation model component dynamically affect the constraint ranking, providing principled variation in utterances over time. The dialog management component models the push and pull of information to and from the synthetic teammate. Messages are abstracted as speech acts based on an annotation scheme developed by Core and Allen (1997). For example, the message “Can I proceed to the next waypoint?” is classified as a check question.
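The stochastic-OT selection described above can be sketched as follows; the constraint names, ranking values, noise magnitude, and candidate utterances are invented for illustration:

```python
import random

# Sketch of stochastic-OT utterance selection (Boersma and Hayes 2001):
# each constraint has a ranking value on a continuous scale; at evaluation
# time Gaussian noise is added, constraints are sorted by the noisy values,
# and the candidate that best satisfies the resulting hierarchy wins.

RANKING = {"BE-BRIEF": 100.0, "BE-EXPLICIT": 98.0}  # overlapping distributions

# violations[utterance][constraint] -> number of violations (invented)
CANDIDATES = {
    "proceed to F":                  {"BE-BRIEF": 0, "BE-EXPLICIT": 1},
    "you may proceed to waypoint F": {"BE-BRIEF": 1, "BE-EXPLICIT": 0},
}

def select(rng):
    """Pick the optimal candidate under a noisy constraint hierarchy."""
    noisy = {c: r + rng.gauss(0, 2.0) for c, r in RANKING.items()}
    hierarchy = sorted(noisy, key=noisy.get, reverse=True)
    # Lexicographic comparison of violation vectors, highest-ranked first.
    return min(CANDIDATES,
               key=lambda u: tuple(CANDIDATES[u][c] for c in hierarchy))

rng = random.Random(0)
picks = [select(rng) for _ in range(1000)]
# Because the ranking distributions overlap, both utterance forms occur,
# with the briefer form more frequent (BE-BRIEF usually outranks BE-EXPLICIT).
```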
By abstracting messages to speech acts such as questions, statements, and agreements, general rules can be used to respond to the intention of the message sender. This reduces the need for numerous specific dialog rules that would check the specific words that are used. In normal conversation, there is a sense of obligation to follow rules of communication. For example, if a question is asked, that question should be answered, or at least addressed somehow. Some of these obligation rules were made explicit by Traum and Allen (1994) and were found to be useful in other dialog management systems (e.g., Traum et al. 2003; Matessa 2000). As Traum and Allen note, these rules do not represent the full complexity of a deontic logic of behavior obligation (e.g., Yang and Bello 2005), but they allow integration with a more robust deontic reasoning system. Obligation rules facilitate effective discourse by setting up expectations for future messages (for example, a questioner can expect that a response such as “Yes” will occur and will refer back to the original question). Messages with little local context information can then be understood because of the context from previous expectations. In the current dialog management component, obligations are stored as declarative chunks in a specialized module to represent the obligations of others and self. The module is partially motivated by neurological studies that find evidence for processing the beliefs and intentions of others in the paracingulate cortex (Gallagher and Frith 2003). Dialog management productions create, release, and use obligations to fill in context. In addition to the obligation module, the dialog management component also uses a temporal module extension to ACT-R to avoid repeatedly asking for the same information.
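A minimal sketch of obligation-driven dialog management in the spirit of Traum and Allen (1994); the act labels and the single "address" obligation type are simplifications invented for illustration:

```python
# Incoming dialog acts create obligations; a response both discharges an
# obligation and is interpreted against it, which is how a bare "Yes" can
# be understood as referring back to the original question.

obligations = []  # pending obligations of self

def receive(act, content):
    """Update the obligation store for an incoming message."""
    if act in ("question", "request"):
        obligations.append(("address", content))

def respond():
    """Discharge the oldest pending obligation with a stub answer."""
    if not obligations:
        return None
    _, content = obligations.pop(0)
    return ("answer", content)   # the answer refers back to the question

receive("question", "can I proceed to the next waypoint?")
reply = respond()
# reply carries the stored question as its context, so a terse follow-up
# message can be interpreted even with little local content.
```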
Fig. 3 The left box is an example of the interface used to enter the airspeed, altitude, and course. The right box is used for setting waypoints
7 Agent-environment interaction component

The agent-environment interaction component was developed to fly the UAV from waypoint to waypoint in a cognitively plausible manner. Flying to waypoints involves interacting with the UAV-STE to queue the correct waypoint and enter the correct course. The pilot must also set the UAV airspeed and altitude within restrictions provided by the photographer (PLO) and navigator (DEMPC). The component directly interacts with the UAV-STE using the same devices as humans (via the underlying operating system commands), using the mouse pointer to interact with the UAV flight controls in a point-and-click fashion and the keyboard to send and receive messages to and from teammates. The following sections cover a task analysis of the pilot and brief descriptions of modeled strategies used by the synthetic teammate.

7.1 A task analysis of the CERTT UAV-STE air vehicle operator

To fly the UAV, the pilot must complete six goals: 1) set the airspeed, 2) set the altitude, 3) set the course, 4) set the waypoint, 5) send text chat messages, and 6) receive text chat messages. Of these six, the solutions for the first four are covered in this section. The CERTT UAV-STE was designed to simulate a team task; consequently, the user interface was not designed as a high-fidelity representation of any existing UAV system in use by the military. To maneuver the UAV from one location to another, the pilot uses a point-and-click interface to enter settings (see Fig. 3). To set the waypoint, the pilot toggles through a list of 109 alphabetically organized waypoints by pressing the setting adjustment buttons. Each time an adjustment button is pressed, a new waypoint value is queued (e.g., BEP in Fig. 3). The waypoint list operates as a continuous loop, so that A comes after Z. When the pilot has queued the next waypoint to visit, she presses the “New TO” button, which would result in changing “H-AREA” to “BEP” in Fig. 3. There are three flight parameters (altitude, course, and airspeed).
Each flight parameter has a separate user interface, though they are nearly identical. To set the flight parameters, the pilot adjusts the setting value by using the small (+ | −) and large (++ | −−) setting adjustment buttons. These buttons have different increments and decrements depending on the parameter (see Table 2). Similar to the waypoint list, course values are a continuous loop, returning to 1° after 359°. The airspeed and altitude values do not loop, but begin at 0 and end at 999 or 9999, respectively. When the desired setting value is reached, the pilot presses the “Enter” button to complete the setting goal.
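The adjustment-button semantics just described (the Table 2 increments, the looping course and waypoint scales, and the clamped airspeed and altitude ranges) can be summarized in a small sketch; the function names are illustrative:

```python
# Sketch of the setting-adjustment semantics described above. Course values
# wrap around (1..359), while airspeed (0..999) and altitude (0..9999) are
# clamped at their limits; increments follow Table 2. The waypoint list is a
# continuous alphabetical loop, so A follows Z.

INCREMENTS = {  # parameter -> (large, small)
    "airspeed": (20, 2),
    "altitude": (1000, 100),
    "course": (10, 1),
}
LIMITS = {"airspeed": (0, 999), "altitude": (0, 9999), "course": (1, 359)}

def press(parameter, value, button):
    """Apply one adjustment-button press to the current setting value."""
    large, small = INCREMENTS[parameter]
    step = {"++": large, "+": small, "-": -small, "--": -large}[button]
    lo, hi = LIMITS[parameter]
    if parameter == "course":                  # continuous loop: 1 follows 359
        span = hi - lo + 1
        return (value - lo + step) % span + lo
    return max(lo, min(hi, value + step))      # clamped at the limits

def next_waypoint(waypoints, current, direction=1):
    """Step through the alphabetical waypoint list as a continuous loop."""
    i = waypoints.index(current)
    return waypoints[(i + direction) % len(waypoints)]
```

For example, pressing the large course button at 355° wraps past 359° to 6°, while a large airspeed press at 990 clamps at 999.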
Table 2 Setting adjustment buttons for each task setting goal. The waypoint buttons either increment (+) to the next, or decrement (−) to the previous, waypoint in an alphabetical list. The other button increments increase or decrease setting values accordingly

Task goals   Large (++ | −−)   Small (+ | −)
Airspeed     20 | −20          2 | −2
Altitude     1000 | −1000      100 | −100
Course       10 | −10          1 | −1
Waypoint     Not applicable    1 | −1
A hierarchical task analysis using NGOMSL notation (Kieras 1988) revealed a consistent three-step subgoal structure across each of the four goals, composed of 1) obtaining the desired setting value, 2) comparing the desired setting value against the current value, and 3) changing the current setting value to the desired value. The following methods (m) and selection rules (sr) are identical across each of the four goals:

Obtain subgoal
sr: Either retrieve the desired information from memory, or request the information from a teammate.

Compare subgoal
1. m Visually encode one of the adjustment buttons
2. m Move mouse to, and click on, button. [system-event] := setting value appears
3. m Visually encode setting value
4. sr IF button adjustment values are unknown, THEN retrieve them from memory
5. sr Given the current setting value, desired setting value, and adjustment button values, select adjustment button

Change subgoal
1. sr IF mouse is at the selected adjustment button, THEN goto m 4; ELSE continue
2. m Visually encode button
3. m Move mouse to button
4. m Click mouse. [system-event] := setting value changes
5. sr IF not attending to setting value, THEN visually encode setting value; ELSE continue
6. sr IF the current setting equals the desired setting, THEN visually encode “Enter”/“New TO” button and goto change subgoal m 7; ELSE IF large adjustment clicked, THEN goto compare subgoal sr 5; ELSE goto change subgoal m 4
7. m Click mouse and return with goal accomplished. [system-event] := setting value disappears

Although setting flight parameters and waypoints follow the same subgoal structure, methods for completing steps in the subgoal methods presented above diverge. The divergence results from five differences between setting a waypoint and setting
the flight parameters. First, the adjustment buttons for setting the flight parameters have small and large adjustments, whereas there are only small adjustments for setting the waypoint (see Fig. 3 and Table 2). Second, there is an “Enter” button for setting the flight parameters and a “New TO” button for setting the waypoint. Although these buttons have different names, their functions are identical. Third, the values of flight parameters are integers, whereas waypoints are strings of numbers and letters (e.g., WP8, BEP). The fourth difference is the addition of the queued value for setting the waypoint, and the fifth and final difference is the spatial arrangements of the user interfaces. These differences specifically affect methods for completing sr 5 of the compare subgoal. Strategies for setting the flight parameters and waypoints are explained in the remainder of this section.

7.2 Strategies for setting waypoints & flight parameters & maximizing rule reuse

When the synthetic teammate sets one of the flight parameters, it first obtains the desired setting value, and compares the desired setting to the current setting, following the task analysis presented above. When a flight parameter needs to be changed, the synthetic teammate uses an “undershoot strategy” similar to one described by Lovett (1998). To execute the undershoot strategy, the synthetic teammate selects a large or small adjustment button based on the difference between the desired and current setting values and the values of the large and small adjustment buttons for the particular setting. Until the difference between the current and desired values is less than the size of the large increment adjustment button for the flight parameter being set, the synthetic teammate selects and uses the large increment button.
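The button-selection rule of the undershoot strategy (large steps while at least one large increment away, small steps thereafter, as the surrounding text describes) can be sketched as follows; the function name and the "Enter" return value are illustrative:

```python
# Sketch of the undershoot strategy for selecting an adjustment button:
# use the large button while the remaining difference is at least one large
# increment, then finish the approach with small steps, so the target value
# is never overshot. Increments follow Table 2.

INCREMENTS = {"airspeed": (20, 2), "altitude": (1000, 100), "course": (10, 1)}

def select_button(parameter, current, desired):
    """Return the adjustment button to press next ('++', '--', '+' or '-')."""
    large, small = INCREMENTS[parameter]
    diff = desired - current
    if diff == 0:
        return "Enter"                  # desired value reached
    use_large = abs(diff) >= large      # undershoot: large steps cannot overshoot
    symbol = "+" if diff > 0 else "-"
    return symbol * 2 if use_large else symbol
```

For example, going from an airspeed of 200 to 254 yields large presses (200, 220, 240) and then small presses until 254 is reached.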
Once the difference between the current and desired values is less than the size of the large increment adjustment button, the synthetic teammate selects and uses the small increment button until the desired value is reached. The agent-environment interaction component was not given knowledge that course values looped back to 1° after passing 359°. The waypoint adjustment button selection strategy is based on a model of letter retrieval and comparison hypothesized by Klahr et al. (1983). The English alphabet was divided into six individual ACT-R chunks that contained letters (i.e., alpha-chunks), instantiating Klahr et al.’s alphabet nodes. Alpha-chunks were stored as items in the synthetic teammate’s declarative memory. In addition to letters, the chunk’s name and the name of the subsequent alpha-chunk were also stored in alpha-chunks. Different from Klahr et al., slots for the name of the alpha-chunk that comes prior to a given alpha-chunk and the absolute position of the alpha-chunk in the alphabet were added to each alpha-chunk. These additions were strings and thus had no effect on memory retrieval in ACT-R, and were used for comparing positional relationships between alpha-chunks. A two-step process was developed to select the appropriate setting adjustment button (i.e., sr 5 of the compare subgoal), and changing waypoint BEO to BEP will be used as an example. The process begins by comparing the first letter of each waypoint name (e.g., B in BEO and BEP). If they are equal, subsequent letters are compared until two are different (e.g., O and P). At this point, the second step begins. The second step involves retrieving alpha-chunks for each of the different letters for comparison
(in our example, letters O and P) from declarative memory, and then comparing information from the chunks. When retrieving alpha-chunks, activation is spread from letters in the goal buffer. Thus, alpha-chunks are retrieved independently, without the need to serially traverse the alpha-chunks/nodes until the desired alpha-chunk is reached. This parallel retrieval of alpha-chunks differs from the Klahr et al. (1983) model, which serially traverses alpha-chunks/nodes. The structure of the goal hierarchy gleaned from the task analysis suggests that there should be a high proportion of shared ACT-R production rules to set flight parameters and waypoints. The similarity in procedures for setting the flight parameters is high, and the only difference is the setting adjustment increments retrieved from declarative memory. Hence, each flight parameter (i.e., airspeed, altitude, and course) shares 100% of its production rules with each of the other flight parameters. However, production sharing between setting flight parameters and setting waypoints is not nearly as high, with a minimum of 32% and a maximum of 44.5%. The maximum sharing value comes from the model not having to serially search through an alpha-chunk, and the minimum value comes from the model having to exhaustively search through the largest alpha-chunk.

7.3 Lessons learned from agent-environment interaction component development

A key lesson learned is the importance of conducting a detailed task analysis as the initial step in the model development process. The task analysis provided insight into processes that must be carried out by the model, such as integer and letter comparisons and the degree of overlap among the processes. Furthermore, the task analysis helped to identify the type of information processed (e.g., integers and letters), and how the processes had to differ as a function of the different types of information.
Another lesson learned in the development of the agent-environment interaction component is that, rather than reinventing the wheel, it is advantageous to search for previously published models that perform processes integral to the model being developed. For example, rather than making assumptions about how to represent the human alphabet within the model’s declarative memory system, we looked to the literature for guidance and found the Klahr et al. (1983) model of alphabet retrieval. Incorporating their model demonstrates the generality of the Klahr et al. model and simultaneously reduces the number of assumptions within the agent-environment interaction component.

8 Situation model component

The concept of a situation model originates in the research of van Dijk and Kintsch (1983) and corresponds to a mental representation of the propositional content of a text—including the addition of propositions corresponding to inferences that are derived from the text and encoded from the environment. The term situation model implies that this propositional representation is a model of the situation described in the text. For example, given the text “he put the book on the table”, a propositional representation like

PUT(JOHN, ON(BOOK, TABLE))
(where “he” is resolved to refer to John and uppercase words correspond to concepts) might be generated. Note that this representation contains the inference that the book is on the table. To date, the mapping from a linguistic text to a propositional representation of the corresponding situation has not been fully automated in the computational research of Kintsch (1998). Later psychological research on situation models has established that the mental representation of situations corresponding to texts contains spatial-imaginal and temporal information, as well as propositional information (cf. Zwaan and Radvansky 1998). However, we are not aware of any computational accounts of how spatial-imaginal information is represented in a situation model. In our system, the situation model component represents the current situation as informed by the linguistic input, the task environment, the discourse context, and salient world knowledge. The situation model component is also the interface between the language comprehension and language generation components and the agent-environment interaction component. The situation model constitutes a key element of meaning representation of the system, although the linguistic representations that are mapped into the situation model also represent important aspects of meaning. The situation model component is responsible for grounding the meaning of referring expressions in the linguistic input into the objects and situations which are represented in the situation model. The design of the situation model component relies on a dual strategy for constructing and refining the situation model. This dual strategy draws information primarily from the language comprehension component and from the task environment.
As the language comprehension component identifies either an object referring expression (i.e., nominal) or a situation referring expression (i.e., clause), the situation model creates or updates a corresponding situation element (an object, event, action, relation, or a subtype of one of these). The correspondence between the expression identified by language comprehension and the situation element depends on the elements of the referring expression and the state of the situation model. As the task environment is updated (as the model runs), context from the task state and the situation model results in production firings to reflect the model’s intentions, awareness, and recognition of needed actions. The initial implementation of these mechanisms was highly engineered, and one focus of the research is to determine the regularities and patterns within the correspondence mappings in order to define domain-general mechanisms.

8.1 Propositional content

In terms of representing propositional content, we adhere to the principle that the propositional (or logical) notation should be as close to English as possible (Hobbs 1985; Ball 2010). In this regard, the predicates used in the propositional representations are referred to as “word-concepts”. That is, they are concepts that correspond to English words. The primary distinction between a word and a word-concept is not based on the idea that concepts are non-linguistic or pre-linguistic, but that words are organized into an ontology which reflects their grammatical function, whereas word-concepts are organized into an ontology which reflects their semantic content.
In this regard, we are considering the use of WordNet synonym sets (cf. Miller 1995) as the source of word-concepts. For example, the word “kick” is grammatically categorized as a transitive verb, whereas the word-concept “kick-1-cncp” is semantically categorized as a contact verb and “kick-2-cncp” is categorized as a motion verb in WordNet—two common verb senses of “kick”. The word “kick” participates in linguistic processing and the generation of linguistic representations, whereas the word-concepts “kick-1-cncp” and “kick-2-cncp” participate in situation model processing and in the generation of situation model representations. In the simplest case, there is a direct mapping from word to word-concept, and the generation of a situation model representation from a linguistic representation is facilitated. However, besides often having multiple senses that need to be disambiguated to do the mapping, it may be that words map into word-concepts based on a synonym of a word, rather than the word itself. For example, the word “radius” as used in “the effective radius is 5 miles”—which indicates the region around a waypoint at which a picture may be taken—may map into a “region-cncp” which could be used as the word-concept label for the WordNet synonym set for this sense of “radius”. Besides specifying the nature of word-concepts corresponding to predicates, we need to specify how these predicates are integrated together into complex representations, and, ultimately, how these representations are mapped into the representational formalism of ACT-R, which is frame-based—i.e., DM chunks are named and typed sequences of slot-value pairs organized into a single inheritance hierarchy. We are borrowing ideas from Hobbs (1985, 2003) and Discourse Representation Theory (Kamp and Reyle 1993) in the design and implementation of our propositional system of representation.
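The intended word-to-word-concept mapping can be illustrated with a sketch; the miniature sense inventory below stands in for WordNet synonym sets, and while the concept labels follow the naming used above, the context-based disambiguation cue is an assumption:

```python
# Toy word-to-word-concept mapping. A word may have several senses
# (word-concepts), and a word may map to a concept via a synonym set, as
# with "radius" -> "region-cncp". The semantic-category cue used here to
# disambiguate is invented for illustration.

WORD_CONCEPTS = {
    "kick": [("kick-1-cncp", "contact"), ("kick-2-cncp", "motion")],
    "radius": [("region-cncp", "region")],   # mapped via a synonym set
}

def map_to_concept(word, context_category):
    """Pick the word-concept whose semantic category matches the context."""
    senses = WORD_CONCEPTS.get(word, [])
    for concept, category in senses:
        if category == context_category:
            return concept
    return senses[0][0] if senses else None  # fall back to the first sense
```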
Many of the domain-specific concepts have now been identified via analysis of the text chat corpus and task domain.

8.2 Spatial & imaginal content

In this task, it is the responsibility of the navigator to decide on the order of waypoint destinations and therefore to have a spatial representation of where waypoints are on a map. Even though the pilot “flies” the UAV, this is done through an autopilot which takes waypoint identifiers as input. No map is available to the pilot, and so the model does not need to represent spatial locations of waypoints. To represent spatial aspects of the pilot interface, such as the order of planned waypoints, we plan to use a spatial module developed for use with ACT-R and described in Douglass (2007). This module is designed to support the mental representation of objects and spatial relations between objects in a graphical display. An obvious use of this module is for representing the graphical objects in the three monitors that constitute the graphical user interface (GUI) of the pilot. Another possible use is to represent the sequence of waypoints that must be visited during a reconnaissance mission. Evidence that humans reason using imaginal information is abundant (cf. Kosslyn 2006). However, we are not aware of any computational implementations of an imaginal reasoning capability. Although imaginal representations are of interest, development of such a capability is currently outside the scope of the project.
8.3 Discourse content

A representation of the discourse participants (e.g., photographer, navigator, Intel Officer [played by the experimenter], pilot) is crucial to development of a functional synthetic teammate, as is a capability to determine the discourse acts that are inferable from the linguistic inputs. For example, when the photographer sends the message “I need to be above 3000” to the pilot, the pilot must infer that this is a request to increase the altitude of the UAV to be above 3000 feet above sea level, despite the fact that the linguistic input is a declarative statement which is ostensibly about the photographer, not the UAV, and there is no mention of what “3000” quantifies. As the discourse advances across missions, human teammates adapt to each other’s communications, standardizing forms and providing less and less explicit content in the messages. An adaptive capability to adopt standard forms and to infer implicit information from the evolving discourse context is needed (Matessa 2000). That adaptive capability will hinge on the information available in the situation model. We would also like the synthetic teammate to be capable of reasoning about the mental state of the other team members, but this is currently outside the scope of our development efforts.
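A minimal, hypothetical sketch of this kind of inference follows; the pattern, the sender-based rule, and the act tuple are all invented for illustration and are not the project's actual rules:

```python
import re

# Inferring an implicit request from a declarative teammate message, as in
# "I need to be above 3000". The assumed unit (feet above sea level) comes
# from the task context, not from the message itself.

def infer_dialog_act(sender, message):
    m = re.search(r"\bneed to be (above|below) (\d+)\b", message)
    if m and sender == "photographer":
        # A statement about the photographer's needs is, in context, a
        # request to change the UAV's altitude.
        return ("request", "set-altitude", m.group(1), int(m.group(2)))
    if message.rstrip().endswith("?"):
        return ("question", message, None, None)
    return ("statement", message, None, None)

act = infer_dialog_act("photographer", "I need to be above 3000")
```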
9 Component integration

A major milestone in the Synthetic Teammate project was achieved when all the major components were integrated into an end-to-end system. This integration was based on the processing of the text chat communication required to pass through an initial entry waypoint and proceed to the first target waypoint. Several development tasks were accomplished to achieve this milestone: 1) an initial implementation of the situation model component capable of representing the referents of the incoming text chat communication was developed; 2) the language comprehension component was extended to map linguistic representations into the situation model; 3) the agent-environment interaction component was modified to access information in the situation model; 4) relevant productions in the language generation component were modified to interact with the situation model; and 5) a lightweight agent implementation of the navigator was developed using the Rule & Automata Modeling Language (RaAML) (Douglass 2010) and interfaced to the synthetic teammate. The synthetic teammate is not yet capable of interacting with the UAV-STE and human teammates. To support development and component integration, some capability to generate inputs to the synthetic teammate from the navigator and photographer was needed. Initially these inputs were embedded within the synthetic teammate itself. To improve the synthetic teammate, we removed the embedded inputs and implemented lightweight agent versions of the navigator and photographer. These lightweight agents run in separate process threads, send text chat to the synthetic teammate, and respond to text chat outputs from the synthetic teammate. Although the navigator and photographer agents are not cognitive models on the scale or fidelity of the synthetic teammate, they perform a very useful development function in mimicking the input/output behavior of the human navigator and photographer.
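The role of such a lightweight agent can be sketched with a minimal threaded stub; the queue-based chat interface and message content are assumptions for illustration (the project's navigator agent is implemented in RaAML, not Python):

```python
import queue
import threading

# A lightweight teammate agent of the kind described above: it runs in its
# own thread, pushes scripted text chat into the synthetic teammate's inbox,
# and then waits to consume any replies until told to stop.

def lightweight_agent(name, script, inbox, outbox):
    for line in script:
        inbox.put((name, line))          # send text chat to the teammate
    while True:
        reply = outbox.get()
        if reply is None:                # sentinel: shut down
            break

inbox, outbox = queue.Queue(), queue.Queue()
script = ["proceed to waypoint F", "effective radius is 5 miles"]
agent = threading.Thread(target=lightweight_agent,
                         args=("navigator", script, inbox, outbox))
agent.start()
messages = [inbox.get(timeout=1) for _ in script]   # teammate reads its inbox
outbox.put(None)                                    # tell the agent to stop
agent.join()
```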
The synthetic teammate now receives linguistic inputs specifying the entry waypoint and first target waypoint from the navigator and photographer agents, processes the inputs and generates linguistic representations, maps the linguistic representations into the situation model, sets the current and next waypoint based on the situation model, and sets the flight parameters based on the task environment. We are now working on the capability of the synthetic teammate to process a broader range of linguistic inputs, map the resulting linguistic representations into the situation model and respond appropriately. Achieving this will require scaling up the synthetic teammate beyond its already large size and capability.
10 Scaling up the cognitive architecture

ACT-R was designed to support the development of small-scale cognitive models of specific laboratory phenomena. Since the advent of the first computational version of ACT-R, hundreds of small-scale models have been developed. The Synthetic Teammate project is one of a few attempts to develop a larger-scale model (or system of models) in ACT-R. This development is pushing the architecture in directions for which it was not originally designed. For example, the parallel spreading activation mechanism of the ACT-R architecture is computationally explosive on serial hardware. To support the computation of the activation of DM chunks corresponding to thousands of lexical items, we have integrated the PostgreSQL relational database with ACT-R (Douglass et al. 2009). The relational database allows us to externalize ACT-R’s DM and provides highly efficient database retrieval mechanisms that are allowing us to expand the model’s mental lexicon to a reasonable size. Further, the integration of a relational database allows us to maintain declarative knowledge acquired over many model runs—a capability that was previously unavailable in ACT-R. The current language comprehension component contains approximately 7000 words in its mental lexicon. For this project, we expect to need 10,000–15,000 words in the mental lexicon. We have developed an approach to map the entries from WordNet and other lexical resources into lexical chunks of the form needed by the language comprehension model. The mapping of nouns, adjectives, and adverbs was straightforward and has been substantially automated, but the mapping of verbs with their varying argument structures is more complex. Although the mapping of verbs to lexical chunks has been determined in principle, there remains some work to fully handle the sub-typing of verbs.
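The database-backed declarative memory can be illustrated with a toy analogue; the project integrated PostgreSQL with ACT-R (Douglass et al. 2009), whereas this sketch uses Python's built-in sqlite3, and the schema and data are invented:

```python
import sqlite3

# Externalizing declarative memory into a relational database: retrieval
# becomes an indexed query ordered by activation, and chunks can persist
# across model runs because the database outlives the model process (use a
# file path instead of ":memory:" for persistence).

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lexicon (word TEXT, pos TEXT, activation REAL)")
conn.execute("CREATE INDEX idx_word ON lexicon (word)")
conn.executemany("INSERT INTO lexicon VALUES (?, ?, ?)", [
    ("kick", "verb", 0.7),
    ("kick", "noun", 0.3),
    ("radius", "noun", 0.5),
])

def retrieve(word):
    """Retrieve the most active lexical chunk for a word, if any."""
    return conn.execute(
        "SELECT word, pos FROM lexicon WHERE word = ? "
        "ORDER BY activation DESC LIMIT 1", (word,)).fetchone()
```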
Currently the model has some capability for word sense disambiguation (WSD), but the addition of a full-size mental lexicon will stress this capability beyond its limits. We are evaluating the use of Latent Semantic Analysis (cf. Landauer and Dumais 1997) to provide additional WSD capability. In addition, it is not enough to just have a large lexicon. The model must be capable of taking appropriate action given the linguistic input, and this requires a deeper level of understanding than is typical of most wide coverage, but superficial, computational linguistic systems.
11 Empirical validation

An important goal of the project is to develop a synthetic teammate that is at once functional and cognitively plausible. In a system as complex as the synthetic teammate, empirical validation is a significant challenge. It is not practical to individually validate all the behaviors of the system. Instead, a few key behaviors have already been tested, and a few more will be selected for scrutiny and validated against empirical data. At the highest level, we will determine whether or not teams with a synthetic pilot show evidence for the basic learning effect characteristic of all human teams in the UAV-STE. Specifically, research in the UAV-STE context with three human team members has indicated that teams demonstrate skill acquisition at the team level that is reflected in a team learning curve (Cooke et al. 2001). There is also evidence for decay of team skill and for team-level expertise (Cooke et al. 2007). Because we have robust findings on patterns of human team coordination (i.e., the timing associated with providing and requesting information among team members), we also plan to compare the coordination of the human + synthetic teammate teams to that of the all-human teams. It should be noted that this empirical validation will occur within the context of a functioning synthetic teammate, an atypical empirical approach which will lend credibility to the model in the sense that the model must do much more than just show evidence for aligning with a specific data set—the model must also function as a teammate with all the constraints on model behavior which that entails. Aspects of the agent-environment interaction component already show promise of validity. Human data were collected to determine if the agent-environment interaction component of the synthetic teammate behaves in a similar manner to humans within the CERTT UAV-STE.
The model and human participants were instructed to set the waypoint, altitude, airspeed, and course settings in the CERTT UAV-STE twenty times each (Myers 2009). Human and model data were compared across three dependent variables: the number of actions to complete a setting, the time to complete a setting, and the time between mouse-clicks when entering a setting. Time between mouse-clicks was selected to demonstrate that the model was similar to humans at recognizing information and acting on it (the perception-action cycle); the time to complete a setting was selected to demonstrate that the model was taking a similar amount of time to humans in completing goals; the number of actions to complete a setting was selected to ensure that the model was using a similar strategy to humans in selecting task adjustment buttons to complete a setting goal. Across the three dependent variables, there was a high correlation (r² = 0.98) and a low root mean-squared deviation (RMSD = 1.2; see Fig. 4). The results indicate that the model takes approximately the same amount of time and mouse-clicks to complete setting goals, and mimics the perception-action cycle in human participants. These results demonstrate that the model is a good approximation to human behavior at the goal level of analysis as well as in the perceive-act cycle. Consequently, there is mounting evidence that the agent-environment interaction component of the synthetic teammate accurately mimics human behavior; however, further tests must be conducted before claiming complete validity.
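For reference, the reported fit statistics can be computed from paired human and model observations as follows; the data points below are invented (the study's actual values are r² = 0.98 and RMSD = 1.2):

```python
import math

# r^2 measures how well model values track human values (squared Pearson
# correlation); RMSD measures the average magnitude of the deviation.

def r_squared(human, model):
    n = len(human)
    mh, mm = sum(human) / n, sum(model) / n
    cov = sum((h - mh) * (m - mm) for h, m in zip(human, model))
    vh = sum((h - mh) ** 2 for h in human)
    vm = sum((m - mm) ** 2 for m in model)
    return cov * cov / (vh * vm)

def rmsd(human, model):
    return math.sqrt(sum((h - m) ** 2 for h, m in zip(human, model))
                     / len(human))

# Invented example: time to complete a setting (seconds), human vs. model.
human = [12.0, 8.5, 30.0, 4.0]
model = [11.0, 9.0, 29.0, 5.0]
fit = (r_squared(human, model), rmsd(human, model))
```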
Fig. 4 Human and model data from three dependent variables. Results indicate a valid agent-environment interaction component
Not all components of the synthetic teammate are equally cognitively plausible. In the interest of building an end-to-end system, cognitive constraints on the development of the language generation and dialog management components have been relaxed. An important issue is how to validate the overall model given that the components vary in their level of cognitive fidelity. This is currently an open research question.
12 Discussion & conclusions

The Synthetic Teammate project is a challenging undertaking reminiscent of earlier research in AI and cognitive science that focused on solving AI-hard problems using cognitively motivated computational techniques (cf. Lebiere and Wray 2006). An initial, prototype end-to-end system has been developed. The process of developing and integrating each of the synthetic teammate's components has taught us lessons not only about developing the components individually, but also about integrating them into a system. In the rest of this section, we discuss lessons learned, comparisons to other approaches, and potential challenges associated with the synthetic teammate's teamwork capabilities.

12.1 Lessons learned

Developing cognitively plausible computational models at the size and level of complexity of the synthetic teammate precludes a single scientist from building the model alone; expertise across many domains is required. We approached this problem by first identifying the required domains and dividing them among the research team members for individual development. The development of the individual components took advantage of previously published models: for example, alphabet retrieval in the agent-environment interaction component, trigram activation in the language comprehension component, and speech acts and obligations in the dialog management component. This divide-and-conquer strategy supported individual development, but closer interaction among component developers was needed to integrate the components. That integration resulted in a "baton-passing" system, a shallow integration of the components in which each component operates serially with respect to the others; that is, components do not operate in parallel relative to one another.
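The baton-passing scheme can be caricatured as a serial scheduler in which one component at a time runs its current goal to completion before control is handed off. The component names and behaviors below are hypothetical simplifications for illustration, not the actual ACT-R implementation:

```python
# Hedged sketch of "baton-passing" integration: components run strictly
# one at a time, never interleaved. Names and behaviors are illustrative.
from collections import deque

def env_interaction(state):
    """Complete a waypoint setting, then free resources."""
    state["waypoint_set"] = True
    return "set waypoint"

def language_generation(state):
    """Compose a status message only after receiving the baton."""
    if state.get("waypoint_set"):
        return "msg: UAV headed to next target"
    return "msg: (nothing to report)"

def run_baton_passing(components, state):
    """Serial scheduler: exactly one component holds the baton at a time."""
    log = []
    queue = deque(components)
    while queue:
        component = queue.popleft()   # pass the baton
        log.append(component(state))  # run to completion; no interleaving
    return log

trace = run_baton_passing([env_interaction, language_generation], {})
```

The limitation the text describes falls out of this structure: message preparation cannot begin until the waypoint-setting component has fully relinquished control.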
For example, it is safe to assume that while performing the actions of setting the next waypoint to fly toward, a human pilot could "cognitively ready" a message to teammates that the UAV is now headed toward the next reconnaissance target, suggesting that environment interaction can be parallelized with language generation. However, the agent-environment interaction and language generation components do not operate in such a parallel manner. Rather, the agent-environment interaction component completes the next waypoint setting and, once complete, "frees resources" so that one of the other components can take control of the system and complete a goal, such as sending a message to teammates. Although the baton-passing system has worked to integrate
the components, we would prefer greater parallelization between them. We have already achieved this in the case of the situation model component, which provides the interface between, and is more deeply integrated with, the other model components. The situation model component operates in an interleaved fashion with the language comprehension and agent-environment interaction components. Due principally to the interrelatedness of the elements of the situations encountered in the task, the situation model component needed to interact more continuously with the other components. This continuous interaction enables the situation model to more appropriately situate the concepts, objects, events, actions, and relations of the situations relative to each other and to incrementally build and refine mental representations of those situations. The baton-passing system had been helpful in earlier model development for limiting model complexity and undesired model interactions, but it hampered the needed fine-grained refinement of situation representations.

12.2 Comparison to other approaches

The term synthetic teammate is borrowed from ongoing research at Chi Systems (cf. Scolaro and Santarelli 2002). In a panel session at the Behavior Representation in Modeling and Simulation (BRIMS) conference in 2004, several different approaches to the development of synthetic agents with natural language capabilities were presented (Ball 2004b). The Synthetic Teammate project aligns with this research. However, unlike many other systems, the Synthetic Teammate project is based on text chat rather than spoken input. The challenges of processing spoken language limit the capabilities of spoken language systems (Stokes 2001); such systems typically assume a restricted vocabulary and limited forms of input in order to cope with this challenge.
We decided to use text chat to sidestep these limitations; coincidentally, this mode of communication is increasingly used in military settings. A similar approach has been adopted in the Situation Understanding BOT through Language and Environment (SUBTLE) project (Marcus et al. 2008). However, the SUBTLE project faces the additional challenge of situating the synthetic teammate on a robot platform that must act in the real world.

A defining feature of this research is the focus on cognitive plausibility, often at a fine-grained level of cognitive fidelity uncharacteristic of most research on synthetic agents. For example, although the TacAir-Soar project (Jones et al. 1999) took advantage of the ability of the Soar cognitive architecture to produce timing of tactical decision-making that generally matched human timing, fine-grained detail such as low-level eye and finger movements was abstracted away. This sped development but at times decreased model fidelity. For example, output from the radar was abstracted to provide the status (friend or foe) of an aircraft, and subject matter experts noticed that the model would react to an enemy plane too quickly (Laird and Jones 1998). Another project that does contain fine-grained fidelity is the taxi error model of Byrne and Kirlik (2005). Similarities between that project and the Synthetic Teammate project include both being models of pilots with detailed perceptual and motor fidelity; differences include the focus on error in the taxi model versus the focus on communication in the Synthetic Teammate.
12.3 Potential challenges to teamwork

There are potential challenges associated with the synthetic teammate's teamwork capabilities. Within the context of the UAV-STE, task success depends on the timeliness of requesting necessary information from teammates (i.e., pulling) as well as the timeliness of providing necessary information to teammates (i.e., pushing). The pulling and pushing of requisite information requires understanding what each teammate must do to facilitate team task success. This in turn requires good situation awareness (SA) not only of where the team is with respect to task success, but also of where individuals are with respect to their individual tasks. For example, continually requesting information from the photographer while she is attempting a photograph would interfere with team success; knowing this fact, as well as understanding when the photographer will be attempting a picture, helps preclude teammates from interrupting the photographer. Hence, our synthetic teammate must have a good representation of SA that includes not only the task but other teammates as well.

Our approach to modeling SA focuses on integrating and appropriately structuring information from multiple sources, and follows Endsley's SA framework (Endsley 1995). Perceptual information is acquired and integrated with information derived from incoming messages from teammates. This information is reasoned over to make decisions about what to do next and where to get the requisite information for accomplishing the goal. Further, the acquired and derived information can also be used to predict future states, and this is where our current instantiation falls short: the model does not yet hypothesize, or abductively reason, about possible states of the UAV or statuses of the teammates and, as a result, cannot direct its behavior based on hypothesized states.
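Endsley's three SA levels, as applied in this task, can be sketched as a pipeline in which perception feeds comprehension while projection remains unimplemented, mirroring the gap just described. All names, data fields, and thresholds below are hypothetical illustrations, not the model's actual code:

```python
# Hedged sketch of Endsley's SA levels as discussed above.
# Field names and the fuel threshold are hypothetical.

def perceive(raw_cues):
    """Level 1: acquire perceptual and message-derived information."""
    return {cue["name"]: cue["value"] for cue in raw_cues}

def comprehend(percepts):
    """Level 2: integrate percepts into a decision about what to do next."""
    if percepts.get("fuel") is not None and percepts["fuel"] < 20:
        return "request fuel status confirmation"
    return "proceed to next waypoint"

def project(situation):
    """Level 3: predict future states; the step the current model lacks."""
    raise NotImplementedError("projection of UAV/teammate states")

cues = [{"name": "fuel", "value": 15}, {"name": "altitude", "value": 3000}]
decision = comprehend(perceive(cues))
```

The deliberately stubbed-out projection step corresponds to the missing abductive reasoning about future UAV and teammate states.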
A consequence of this inability is that our synthetic teammate will very likely be perceived as overly persistent, or bossy, and this may interfere with important team tasks, directly leading to a measurable reduction in team performance.

Establishing good SA of the task and teammates is a necessary prerequisite for a synthetic teammate to function as a good team player. However, there are also other teamwork knowledge, skills, and abilities, sometimes very subtle, that human teammates bring to a team task or develop in its context. For instance, human teammates know the nature of being busy, being interrupted, and working simultaneously toward individual and team goals. As they interact, they develop knowledge of the task, knowledge of team roles, and knowledge of individual strengths and weaknesses. Patterns of interaction develop as signatures of individual teams (Cooke et al. 2008). In addition, team situation awareness is more than good SA on the part of each team member: team members must assess the situation and take action in a coordinated manner (Gorman et al. 2006). It is precisely these subtle teamwork skills that we expect will be the most challenging for our synthetic teammate and that may hamper it from becoming a good team player.

12.4 Conclusions

The main objective of the Synthetic Teammate project is to develop a synthetic agent capable of being integrated into a UAV team task for training purposes. To
achieve this goal while maintaining training efficacy, the synthetic teammate is being developed to closely match human behavior across the cognitive capacities of situation assessment, agent-environment interaction, language comprehension and generation, and dialog management. An initial, prototype system has been developed and will be subjected to iterative refinement until a version capable of functioning as a teammate in the UAV-STE simulation is available. Once reasonable functionality is achieved, a validation experiment will be conducted in which the synthetic teammate interacts with human teammates, and the performance of this hybrid team will be compared against all-human teams at both the individual and team levels. Although the synthetic teammate is not complete, the project is in a fairly advanced stage of development, continued funding has recently been awarded, and we are nearing the point at which the synthetic teammate will be incorporated into human teams, so stay tuned.

Acknowledgements This research is funded in part by applied research funds from the Warfighter Readiness Research Division, Human Effectiveness Directorate, 711th Human Performance Wing, Air Force Research Laboratory, and by basic research funds from the Air Force Office of Scientific Research and the Office of Naval Research. The Synthetic Teammate project is a sizable effort involving collaboration between the Air Force Research Laboratory and the Cognitive Engineering Research Institute. Project team members not listed as authors on this paper, but who have contributed to the project in important ways, include Dee Andrews, Scott Douglass, Jasmine Duran, Kevin Gluck, Jack Harris, Mike Krusmark, Don Lyon, Harry Pedersen, Eric Robinson, Steven Shope, Ronnie Silber and Amanda Taylor.
References

Altmann G, Steedman M (1988) Interaction with context during human sentence processing. Cognition 30:191–238
Anderson JR (2007) How can the human mind occur in the physical universe? Oxford University Press, New York
Anderson JR, Fincham JM, Douglass S (1997) The role of examples and rules in the acquisition of a cognitive skill. J Exp Psychol Learn Mem Cogn 23:932–945
Anderson JR, Bothell D, Byrne M, Douglass S, Lebiere C, Qin Y (2004) An integrated theory of the mind. Psychol Rev 111(4):1036–1060
Ball J (1991) PM, propositional model: a computational psycholinguistic model of language comprehension based on a relational analysis of written English. UMI Dissertation Information Service, Ann Arbor
Ball J (2004a) A cognitively plausible model of language comprehension. In: Proceedings of the 13th conference on behavior representation in modeling and simulation, pp 305–316
Ball J (2004b) Software agents with natural language capabilities: where are we? In: Symposium conducted at the 13th conference on behavior representation in modeling and simulation, Arlington, VA
Ball J (2006) Can NLP systems be a cognitive black box? (Is cognitive science relevant to AI problems?) In: AAAI spring symposium: between a rock and a hard place, cognitive science principles meet AI hard problems (Technical Report SS-06-02). AAAI Press, Menlo Park
Ball J (2007a) Construction-driven language processing. In: Vosniadou S, Kayser D, Protopapas A (eds) Proceedings of the 2nd European cognitive science conference. LEA, New York, pp 722–727
Ball J (2007b) A bi-polar theory of nominal and clause structure and function. Ann Rev Cogn Linguist 5(1):27–54
Ball J (2008) A naturalistic, functional approach to modeling language comprehension. In: AAAI fall symposium: naturally inspired artificial intelligence (Technical Report FS-08-06). AAAI Press, Menlo Park
Ball J (2010) Simplifying the mapping from referring expression to referent in a conceptual semantics of reference. In: Proceedings of the 32nd annual meeting of the cognitive science society
Ball J, Heiberg A, Silber R (2007) Toward a large-scale model of language comprehension in ACT-R 6. In: Lewis R, Polk T, Laird J (eds) Proceedings of the 8th international conference on cognitive modeling. Psychology Press, New York, pp 173–179
Boersma P, Hayes B (2001) Empirical tests of the gradual learning algorithm. Linguist Inq 32:45–86
Byrne MD (2001) ACT-R/PM and menu selection: applying a cognitive architecture to HCI. Int J Hum-Comput Stud 55(1):41–84
Byrne MD, Kirlik A (2005) Using computational cognitive modeling to diagnose possible sources of aviation error. Int J Aviat Psychol 15(2):135–155
Byrne MD, Wood SD, Sukaviriya P, Foley JD, Kieras DE (1994) Automating interface evaluation. In: Proceedings of the SIGCHI conference on human factors in computing systems: celebrating interdependence. ACM, New York, pp 232–237
Cassimatis N, Bello P, Langley P (2008) Ability, breadth, and parsimony in computational models of higher-order cognition. Cogn Sci 32:1304–1322
Christianson K, Hollingworth A, Halliwell J, Ferreira F (2001) Thematic roles assigned along the garden path linger. Cogn Psychol 42:368–407
Colmerauer A, Roussel P (1996) The birth of Prolog. In: Bergin T, Gibson R (eds) History of programming languages II. ACM Press/Addison-Wesley, New York, pp 331–367
Cooke N, Shope S (2005) Synthetic task environments for teams: CERTT's UAV-STE. In: Handbook of human factors and ergonomics methods, chap 46. CRC Press, Boca Raton
Cooke NJ, Kiekel PA, Helm EE (2001) Measuring team knowledge during skill acquisition of a complex task. Int J Cogn Ergon 5(3):297–315
Cooke NJ, Gorman JC, Duran JL, Taylor AR (2007) Team cognition in experienced command-and-control teams. J Exp Psychol Appl 13(3):146–157
Cooke NJ, Gorman JC, Kiekel PA (2008) Communication as team-level cognitive processing. In: Letsky M, Warner N, Fiore S, Smith CAP (eds) Macrocognition in teams: theories and methodologies. Ashgate, Hants, pp 51–64
Core MG, Allen JF (1997) Coding dialogs with the DAMSL annotation scheme. In: AAAI fall symposium on communicative action in humans and machines, Cambridge, MA
Douglass SA (2007) A computational model of situated action. Doctoral dissertation, Carnegie Mellon University, Pittsburgh
Douglass SA (2010) Rule & Automata Modeling Language (RaAML) (in preparation)
Douglass S, Ball J, Rodgers S (2009) Large declarative memories in ACT-R. In: Proceedings of the 9th international conference on cognitive modeling, Manchester, UK
Endsley MR (1995) Toward a theory of situation awareness in dynamic systems. Hum Factors 37(1):32–64
Ericsson S (2004) Dynamic optimisation of information enrichment in dialogue. In: Proceedings of the 8th international workshop on formal semantics and pragmatics of dialogue. Catalog, Barcelona
Freiman M, Ball J (2008) Computational cognitive modeling of reading comprehension at the word level. In: Proceedings of the 38th western conference on linguistics. University of California, Davis, pp 34–45
Freiman M, Ball J (2010) Improving the reading rate of Double-R Language. In: Proceedings of the 10th international conference on cognitive modeling (to appear)
Gallagher HL, Frith CD (2003) Functional imaging of 'theory of mind'. Trends Cogn Sci 7(2):77–83
Gibson E, Pearlmutter NJ (1998) Constraints on sentence comprehension. Trends Cogn Sci 2(7):262–268
Gorman JC, Cooke NJ, Winner JL (2006) Measuring team situation awareness in decentralized command and control systems. Ergonomics 49:1312–1325
Grodner D, Gibson E, Argaman V, Babyonyshev M (2003) Against repair-based reanalysis in sentence comprehension. J Psycholinguist Res 32(2):141–166
Hobbs JR (1985) Ontological promiscuity. In: Proceedings of the 23rd annual meeting of the association for computational linguistics, Chicago, IL, pp 61–69
Hobbs JR (2003) Discourse and inference. Retrieved from http://www.isi.edu/~hobbs/disinf-tc.html
Jones RM, Laird JE, Nielsen PE, Coulter KJ, Kenny P, Koss FV (1999) Automated intelligent pilots for combat flight simulation. AI Mag 20(1):27–41
Kamp H, Reyle U (1993) From discourse to logic: introduction to model-theoretic semantics of natural language, formal logic and discourse representation theory. Studies in linguistics and philosophy. Kluwer Academic, Dordrecht
Kieras DE (1988) Towards a practical GOMS model methodology for user interface design. In: Helander M (ed) The handbook of human-computer interaction. North-Holland, Amsterdam, pp 135–158
Kieras D, Meyer DE (1997) An overview of the EPIC architecture for cognition and performance with application to human-computer interaction. Hum-Comput Interact 12(4):391–438
Kintsch W (1998) Comprehension: a paradigm for cognition. Cambridge University Press, New York
Klahr D, Chase WG, Lovelace EA (1983) Structure and process in alphabetic retrieval. J Exp Psychol Learn Mem Cogn 9(3):462–477
Kosslyn S (2006) The case for mental imagery. Oxford University Press, New York
Laird JE, Jones RM (1998) Building advanced autonomous AI systems for large scale real time simulations. In: Proceedings of the 1998 computer game developers' conference. Freeman, Long Beach, pp 365–378
Landauer T, Dumais S (1997) A solution to Plato's problem: the latent semantic analysis theory of the acquisition, induction, and representation of knowledge. Psychol Rev 104(2):211–240
Langacker RW (1987) Foundations of cognitive grammar, vol I: theoretical prerequisites. Stanford University Press, Stanford
Langacker RW (1991) Foundations of cognitive grammar, vol II: descriptive application. Stanford University Press, Stanford
Lebiere C, Wray R (eds) (2006) Between a rock and a hard place: cognitive science principles meet AI-hard problems. Papers from the AAAI spring symposium. AAAI Press, Menlo Park
Lovett MC (1998) Choice. In: Anderson JR, Lebiere C (eds) The atomic components of thought. Erlbaum, Mahwah, pp 255–296
Marcus M, Badler N, Joshi A, Pappas G, Pereira F, Romero M, McCallum A, Potts C, Yanco H (2008) SUBTLE (Situation Understanding Bot through Language and Environment) project program review. University of Massachusetts, Amherst
Matessa M (2000) Simulating adaptive communication. Doctoral dissertation, Carnegie Mellon University, Pittsburgh
McClelland JL, Rumelhart DE (1981) An interactive activation model of context effects in letter perception. Part I. An account of basic findings. Psychol Rev 88(5):375–407
McDonald D (1999) A rational reconstruction of Genaro. In: Proceedings of the RAGS workshop, Edinburgh
Miller G (1995) WordNet: a lexical database for English. Commun ACM 38(11):39–41
Myers CW (2009) An account of model inspiration, integration, & sub-task validation. In: Proceedings of the 9th international conference on cognitive modeling, Manchester, UK
Prince A, Smolensky P (1993/2004) Optimality theory: constraint interaction in generative grammar. Wiley-Blackwell, New York
Ritter FE, Van Rooy D, St Amant R (2002) A user modelling design tool based on a cognitive architecture for comparing interfaces. In: Computer-aided design of user interfaces III: proceedings of the 4th international conference on computer-aided design of user interfaces CADUI'2002, Valenciennes, France. Kluwer Academic, Dordrecht, pp 111–118
Scolaro J, Santarelli T (2002) Cognitive modeling teamwork, taskwork, and instructional behavior in synthetic teammates. In: Proceedings of the 11th conference on computer generated forces and behavioral representation. Institute for Simulation and Training, Orlando
Seidenberg MS, McClelland JL (1989) A distributed, developmental model of word recognition and naming. Psychol Rev 96(4):523–568
Stokes J (2001) Speech interaction and human behavior representations (HBRs). In: Proceedings of the 10th conference on computer generated forces and behavioral representation. SISO, Norfolk, pp 467–476
Tambe M, Johnson WL, Jones RM, Koss F, Laird JE, Rosenbloom PS, Schwamb K (1995) Intelligent agents for interactive simulation environments. AI Mag 16(1):15–40
Tanenhaus MK, Spivey-Knowlton MJ, Eberhard KM, Sedivy JC (1995) Integration of visual and linguistic information in spoken language comprehension. Science 268(5217):1632–1634
Traum DR, Allen JF (1994) Discourse obligations in dialogue processing. In: Proceedings of the 32nd annual meeting of the association for computational linguistics. Association for Computational Linguistics, Las Cruces
Traum DR, Rickel J, Gratch J, Marsella S (2003) Negotiation over tasks in hybrid human-agent teams for simulation-based training. In: Proceedings of the second international joint conference on autonomous agents and multiagent systems. ACM, New York, pp 441–448
van Dijk T, Kintsch W (1983) Strategies of discourse comprehension. Academic Press, New York
Varges S (2006) Overgeneration and ranking for spoken dialogue systems. In: Proceedings of the 4th international natural language generation conference, Sydney, Australia. Association for Computational Linguistics, pp 20–22
Vosse T, Kempen G (2000) Syntactic structure assembly in human parsing: a computational model based on competitive inhibition and a lexicalist grammar. Cognition 75:105–143
Walker MA, Whittaker SJ, Stent A, Maloor P, Moore J, Johnston M, Vasireddy G (2004) Generation and evaluation of user tailored responses in multimodal dialogue. Cogn Sci 28(5):811–840
Yang Y, Bello P (2005) Some empirical results concerning deontic reasoning: models, schema, or both? In: Proceedings of the 27th annual meeting of the cognitive science society. Erlbaum, Mahwah, pp 2393–2398
Zachary W, Santarelli T, Lyons D, Bergondy M, Johnston J (2001) Using a community of intelligent synthetic entities to support operational team training. In: Proceedings of the tenth conference on computer generated forces and behavioral representation. Institute for Simulation and Training, Orlando, pp 215–224
Zwaan R, Radvansky G (1998) Situation models in language comprehension and memory. Psychol Bull 123(2):162–185
Jerry Ball is a senior research psychologist in the Human Effectiveness Directorate, 711th Human Performance Wing, Air Force Research Laboratory. He has a Masters Degree in Computer Science from the University of Florida and a PhD in Cognitive Psychology from New Mexico State University.

Christopher Myers is a research psychologist in the Cognitive Models and Agents Branch, Human Effectiveness Directorate, 711th Human Performance Wing, Air Force Research Laboratory. He has a PhD in Cognitive Science from Rensselaer Polytechnic Institute.

Andrea Heiberg was previously a research scientist at L3 Communications and is currently working for General Motors. She has a PhD in Linguistics from the University of Arizona.

Nancy J. Cooke is the lead scientist at CERI and a professor at Arizona State University – Polytechnic. She is editor of the Human Factors journal and has a PhD in Psychology from New Mexico State University.

Michael Matessa is a research scientist at Alion. He has a PhD in Psychology from Carnegie Mellon University.

Mary Freiman is a research scientist at L3 Communications. She has a Masters Degree in Human Language Technology from the University of Arizona.

Stuart Rodgers is a retired Air Force colonel and co-owner and lead scientist of AGS TechNet.