Psychological Research (2001) 65: 279–288
© Springer-Verlag 2001
ORIGINAL ARTICLE
Till Pfeiffer · Tanja Czech
Working memory limitation as a source of confusion in the abstract THOG task
Received: 20 March 2000 / Accepted: 22 January 2001
Abstract

Limitations of working memory are proposed as a major determinant of problem difficulty in the THOG task. This task is a logical reasoning task which uses an exclusive disjunction and requires hypothetico-deductive reasoning. Four experiments with students of mathematics or psychology were used to test the hypotheses that, first, guiding participants' attention facilitates the task and, second, the use of paper and pencil as an external problem representation relieves working memory load. Focusing participants' attention upon a critical aspect of the task did not improve solution rates. Students of mathematics were better than students of psychology, but only if they were allowed to use paper and pencil or to work on the task repeatedly. These results partially support the working memory hypothesis. They point toward the importance of training and practice in relatively simple meta-cognitive skills in logical reasoning.
Introduction

Suppose someone hears the threat "Feed me artichokes or I'll walk out on you!" (Wall & Schwartz, 1991, p. 27). If the addressee of this message decides not to feed the speaker with artichokes, one may safely assume that he or she understood the message perfectly well, including the logic of the connective or. Or is simple, although the connective has two interpretations. Under the inclusive interpretation "A or B" is true if A or B or both are true ("Do you prefer shrimps or prawns?" "Oh, I like both!").
T. Pfeiffer (✉)
Department of Psychology, University of Regensburg, 93040 Regensburg, Germany
e-mail: till.pfei[email protected]

T. Czech
Institute of Business Education and Economics Teaching, Dresden University of Technology, Dresden, Germany
Under the exclusive interpretation "A or B" is true if either A is true or B is true, but not both ("Is it a boy or a girl?"). Nevertheless, Wason (1977) used the exclusive disjunction to devise a reasoning task which is rather hard to solve. The so-called THOG task is usually solved by only 30% of adult participants. In the classic variant of the task (Wason & Brooks, 1979) participants are shown a black diamond, a white diamond, a black circle, and a white circle. The experimenter announces that he has chosen one of the colors (black or white) and one of the forms (diamond or circle), but participants are not told which ones. Participants are informed about an arbitrary classification rule which states that any figure which has the form or the color, but not both, which the experimenter has chosen, is a THOG. One of these figures, say the black diamond, is made known to the participants as a THOG. Then participants are asked to classify the remaining three objects as (a) definitely a THOG, (b) definitely not a THOG, or (c) indeterminable.

This task requires hypothetico-deductive reasoning. Participants have to build the two possible combinations of color and form which the experimenter might have chosen. Because the black diamond is a THOG it must have either the form or the color, but not both, in common with the experimenter's choice. It can be deduced that the experimenter might have chosen either white and diamond or black and circle. In both cases the black diamond (the designated THOG) has one and only one of these properties and therefore obeys the disjunctive rule. This line of reasoning has to be applied to each of the remaining three figures for both hypothetical feature combinations. This is known as combinatorial analysis. The correct solution is that the white diamond and the black circle are definitely not THOGs, because for each possible combination one figure has both properties and the other has none.
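The combinatorial analysis described above can be written out as a small enumeration. The following is only an illustrative sketch: the function names `is_thog` and `classify` and the tuple representation of figures are assumptions made for this example, not part of the original task materials.

```python
from itertools import product

COLORS = ("black", "white")
SHAPES = ("diamond", "circle")


def is_thog(figure, chosen):
    """Exclusive-or rule: the figure shares the chosen color or the chosen shape, but not both."""
    (color, shape), (chosen_color, chosen_shape) = figure, chosen
    return (color == chosen_color) != (shape == chosen_shape)


def classify(designated_thog):
    """Classify every other figure, given one figure that is known to be a THOG."""
    figures = list(product(COLORS, SHAPES))
    # Step 1: keep only the hypotheses under which the designated figure is a THOG.
    hypotheses = [h for h in figures if is_thog(designated_thog, h)]
    # Step 2: evaluate each remaining figure under every surviving hypothesis.
    verdicts = {}
    for figure in figures:
        if figure == designated_thog:
            continue
        outcomes = {is_thog(figure, h) for h in hypotheses}
        if outcomes == {True}:
            verdicts[figure] = "THOG"
        elif outcomes == {False}:
            verdicts[figure] = "not THOG"
        else:
            verdicts[figure] = "indeterminable"
    return hypotheses, verdicts


hypotheses, verdicts = classify(("black", "diamond"))
print(hypotheses)
for figure, verdict in verdicts.items():
    print(figure, "->", verdict)
```

With the black diamond as the designated THOG, only two hypotheses survive the first step, and every remaining figure receives the same verdict under both of them, so no figure ends up indeterminable.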
The white circle is the only other THOG, because this figure has, like the black diamond, one and only one of the properties in common with each of the two hypothetical combinations.

Like its prominent relative, Wason's selection task (Wason, 1966), the THOG problem is puzzling because
the majority of intelligent adults fail to solve this task. This incorporates the claim that the same individuals are regularly able to accomplish much more complex tasks (e.g., academic achievement, social interaction, real-world problem solving), which has been termed the "rationality paradox" by Evans and Over (1996). Furthermore, in the case of the THOG task, it has been shown that participants are able to perform each of the subtasks of the problem quite well. People do not fail to construct the two possible combinations chosen by the experimenter (Wason & Brooks, 1979), nor do they systematically misunderstand the logic of the rule (Ziegler & Schober, 1995). The abstract material alone is also not the decisive factor of task difficulty because there are thematic versions with comparably low solution rates (Smyth & Clark, 1986) as well as abstract versions with very good solution rates (Girotto & Legrenzi, 1993). Overviews have been given by Evans, Newstead, and Byrne (1993) and Newstead, Girotto, and Legrenzi (1995).

Girotto and Legrenzi (1989) proposed that participants confuse the data level, the properties of the designated THOG, with the hypothesis level, the properties the experimenter might have chosen. This approach was called confusion theory by Newstead and Griggs (1992). For instance, if one believes that the experimenter has chosen black and diamond, then the "correct" solution is to say that the white diamond and the black circle both definitely are THOGs and the white circle definitely is not a THOG. This error pattern has been termed "intuitive error" type A by Wason and Brooks (1979). Therefore, confusion theory claims that participants do reason logically but under false assumptions.¹ Variants of the THOG task which avoid confusion are called separation problems (Newstead & Griggs, 1992).
These problems highlight the separation of the properties of the designated THOG and the hypothetical features by describing an extra card with the features (Girotto & Legrenzi, 1989; Newstead & Griggs, 1992), by describing a person who writes down the features (Needham & Amado, 1995), or by providing different labels for the THOG and the figure bearing the hypothetical features (Girotto & Legrenzi, 1993).

The first goal of this research was to explore another simple possible mechanism to obtain facilitation by separation. If the description of an extra card or a person writing down the hypothetical features suffices to ensure separation of data and hypothesis level, then the experimenter should also be able to fulfill this function. This hypothesis was explored by having the experimenter actively and demonstratively write down two features on a card (of course, without the participant being able to read the words).

The second goal of this research related to an acknowledged theoretical weakness of confusion theory. Newstead et al. (1995) pointed out that confusion theory does not explain why the confusion arises in the first place. Newstead et al. proposed that the combination of the necessary processing steps (building the hypothesis, making the combinatorial analysis, and evaluating the results against the hypothesis) overloads working memory capacity. Similarly, Griggs and Newstead (1982) proposed that their variant of the THOG task, the DRUG problem, which has very high solution rates, induces an internal problem representation which could be visualized externally as two symmetric, binary trees. Girotto and Legrenzi (1989), however, objected that this problem representation enables facilitation because it implies a simple elimination strategy, which renders the results of Griggs and Newstead not comparable to standard THOG problems.

Another explanation for the facilitation by separation problems hypothesizes that successful problem solvers use a self-generated external representation in solving the THOG problem, that is, use paper and pencil. This hypothesis is motivated by the observation that in many studies of separation problems participants were allowed to use paper and pencil to make notes (e.g., Girotto & Legrenzi, 1989; Needham & Amado, 1995); in none of these studies, however, is the number of participants reported who actually used paper and pencil. The basic rationale in this research is to test various measures which should help to overcome the capacity limitations of working memory. The following section describes the four experiments which addressed the two research questions (demonstrative writing and the working memory hypothesis).

¹ For this reason the label intuitive error might seem slightly misleading from the perspective of confusion theory.
Overview of the experiments

All four experiments reported in this research used the standard abstract THOG task. Exp. 1 tested the facilitatory effects of demonstrative writing, the use of paper and pencil, and the question for the hypothetical features. Because the participants of Exp. 1 (students of psychology) often refused to use paper and pencil, Exp. 2 explored demonstrative writing with a sample of students of mathematics and physics, who were expected to use paper and pencil much more often. Exp. 3 compared samples of students of psychology with students of mathematics and physics, without paper and pencil, and once more tested the effect of demonstrative writing. Exp. 4 was a replication of Exp. 3, but without demonstrative writing and with two isomorphic THOG tasks for each participant.

For this and the following experiments we adopted the analysis scheme of O'Brien, Noveck, Davidson, Fisch, Lea, and Freitag (1990). Within groups, the frequency of occurrence of the empirically observed answer patterns among the 27 possible patterns (THOG, not THOG, or indeterminable for each of three designs) was tested against chance level using α = 0.01. Between groups, the frequencies of correct versus incorrect answers were compared with χ² tests, using α = 0.05. Tables show the frequencies of the correct answer and a breakdown of the incorrect answer patterns into the categories near insight answers, both intuitive errors, and all other errors. In near insight answers, the only other THOG, the white circle, is classified correctly, but one or both of the other designs are classified as indeterminable. Therefore, in the within-groups assessment of the frequency of occurrence for this answer pattern the chance level is 3/27 as compared to 1/27 for the other answer patterns. In the intuitive error type B, the only other THOG is classified as definitely not a THOG, while the remaining two designs are classified as indeterminable.
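The chance levels named above follow from enumerating the 27 answer patterns. The sketch below does this and adds a within-group test against chance. The article does not name the test it used, so the one-sided binomial test here is an assumption, and `category` and `upper_tail` are hypothetical helper names. The example count of 9 near insight answers among 40 participants is the sum of the two 'paper and pencil available' demonstrative writing conditions in Table 1 of Exp. 1.

```python
from collections import Counter
from itertools import product
from math import comb

OPTIONS = ("THOG", "not THOG", "indeterminable")


def category(pattern):
    """Map one answer pattern (white diamond, black circle, white circle) to a table category."""
    white_diamond, black_circle, white_circle = pattern
    if pattern == ("not THOG", "not THOG", "THOG"):
        return "correct"
    # White circle correct, the other two not called THOG, at least one indeterminable.
    if white_circle == "THOG" and "THOG" not in (white_diamond, black_circle):
        return "near insight"
    if pattern == ("THOG", "THOG", "not THOG"):
        return "intuitive A"
    if pattern == ("indeterminable", "indeterminable", "not THOG"):
        return "intuitive B"
    return "other"


patterns = list(product(OPTIONS, repeat=3))
counts = Counter(category(p) for p in patterns)
chance = {name: k / len(patterns) for name, k in counts.items()}


def upper_tail(k, n, p):
    """One-sided binomial probability P(X >= k) for n participants at chance level p (assumed test)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# 9 near insight answers among 40 participants, chance level 3/27.
p_near_insight = upper_tail(9, 40, chance["near insight"])
print(counts)
print(round(p_near_insight, 3))
```

Under this assumption the tail probability for the example comes out close to the P = 0.029 reported for near insight answers in Exp. 1, which is above the α = 0.01 criterion.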
Experiment 1

The main purpose of Exp. 1 was to test the effectiveness of demonstrative writing and to observe the frequency of the use of paper and pencil. In addition, we tested the effectiveness of another possible facilitation in combination with demonstrative writing. Girotto and Legrenzi (1989) and Newstead and Griggs (1992) have shown that the solution rates of thematic THOG problems are better if participants are asked to construct the possible combinations of features which the experimenter could have written down. Solution rates of the abstract version, however, were not affected by this manipulation. We therefore tested whether the combination of two manipulations, demonstrative writing and the question for the hypothetical combinations, could improve solution rates. Thus, initially there were three conditions, a control condition without demonstrative writing and the question for the hypothetical combinations and two conditions with demonstrative writing, one with and one without the question. In all three conditions paper and pencil were available for the participants. After about one third of the data had been collected it became clear that the vast majority of participants did not use paper and pencil spontaneously, thus threatening the rationale of the experiment. Therefore, we decided to include two more conditions with demonstrative writing which were intended to provide further encouragement to use paper and pencil and are described in detail in the Materials and procedure section below.

Method

Participants

One hundred students of psychology at the University of Regensburg participated for course credit. None of the participants knew the task or had received formal training in logic. Participants were tested in single sessions to ensure (a) administration of demonstrative writing at the right time and (b) participants' attention toward the manipulation.
Materials

The text of the task was written on five cards:

Card 1: Presented the four designs (black diamond, white diamond, black circle, white circle).
Card 2: Showed the sentence "I, the experimenter, write down one of the colors (black or white) and one of the forms (circle or diamond)."
Card 2 (control): This was the same as card 2 except that "write down" was replaced by "remembered". This card was used instead of card 2 in the control condition without demonstrative writing.
Card 3: Showed the rule defining THOGness. The "or" in the rule was disambiguated ("... has the form or the color but not both").
Card 4: Identified the black diamond as THOG.
Card 5: Asked participants to imagine which features the experimenter could have chosen. This card was used only in those conditions which included the question for the hypothetical combinations.
Procedure

In the written instructions participants were warned that the problem might be more difficult than it seems to be. Then the cards were given one after the other to the participants, always in the same order. Participants were allowed to read each card as long as they wished. Cards which a participant had read were placed on the table in front of the participant and could be inspected at any time. Then the three designs, which had to be classified, were presented on single cards, given one at a time to the participant in random order. In all conditions paper and pencil were available for participants.

In all conditions except the control condition, the experimenter demonstratively wrote down two words during the presentation of the second card on a card of his own which was then placed face down on the table. The experimenter took care that participants noticed the act of writing without being able to see the words. In the condition with 'demonstrative writing, no question, paper and pencil encouraged' the experimenter verbally encouraged participants to use paper and pencil to make notes. In the condition with 'demonstrative writing, no question, paper and pencil enforced' participants were instructed to write down anything they thought to be important after each card had been handed out and read. Table 1 summarizes the five conditions.
Results

Table 1 shows the distribution of answer patterns in the five conditions of Exp. 1. For the statistical analyses between groups only the proportions of correct solutions were compared. There was neither a difference between the condition 'demonstrative writing, question, paper and pencil available' and the condition 'demonstrative writing, no question, paper and pencil available', nor between the condition 'demonstrative writing, no question, paper and pencil encouraged' and the condition 'demonstrative writing, no question, paper and pencil enforced', both χ² (1, N = 40) = 0.0. For the following analyses, the conditions 'demonstrative writing, question, paper and pencil available' and 'demonstrative writing, no question, paper and pencil available' were collapsed, as were the conditions 'demonstrative writing, no question, paper and pencil encouraged' and 'demonstrative writing, no question, paper and pencil enforced'. The combined group 'demonstrative writing, with and without question, paper and pencil available' did not differ from the control group, χ² (1, N = 60) = 1.49, P = 0.22. Encouraged and enforced use of paper and pencil in combination with demonstrative writing also failed to produce facilitation, χ² (1, N = 60) = 2.50, P = 0.11.

Table 1 Responses of students of psychology to the five conditions of the THOG task in Exp. 1 (P&P = paper and pencil)

Answer pattern   Control          Demonstrative    Demonstrative    Demonstrative    Demonstrative
                                  writing          writing          writing          writing
                 Question         Question         No question      No question      No question
                 P&P available    P&P available    P&P available    P&P encouraged   P&P enforced
                 n = 20           n = 20           n = 20           n = 20           n = 20
Correct          2                5                6                7                6
Near insight     2                5                4                1                1
Intuitive A      10               4                3                4                3
Intuitive B      5                1                3                2                3
Other            1                5                4                6                7

Within the control group, the frequency of intuitive error type A and intuitive error type B was above chance level. In the combined conditions without encouraged/enforced use of paper and pencil the frequency of the correct answer and intuitive error type A was above chance level. In this combined group, there was also a non-significant tendency towards an increase of the answer pattern near insight (P = 0.029). In the combined group with encouraged/enforced use of paper and pencil the correct answer and intuitive error type A were also observed more frequently than was to be expected by chance; in this group there was no tendency towards an increase of near insight answers.

The preliminary observation that participants rarely used paper and pencil spontaneously was confirmed. In the three conditions which did not encourage or enforce the use of paper and pencil 20 of 60 participants (33.3%) used paper and pencil; among these participants 6 (30%) solved the task. Among the remaining 40 participants who did not use paper and pencil 7 (17.5%) solved the task. In the combined groups with encouraged or enforced use of paper and pencil 33 of 40 participants (82.5%) used paper and pencil; among these participants 11 (33.3%) solved the task. Among the remaining 7 participants who did not use paper and pencil 2 (28.6%) solved the task.

In the two conditions with the question for the hypothetical features which the experimenter might have written down 10 of 14 participants (71%) who actually used paper and pencil produced both possible pairs of features. Among these, 40% produced the correct solution. In the three conditions with paper and pencil available those participants who actually used paper and pencil mostly made short notes using verbal labels (e.g., "THOG: black diamond") or pictorial symbols or a mixture of both. In the two conditions with encouraged or enforced use of paper and pencil the majority of those participants who actually used paper and pencil produced lengthier verbal statements (e.g., reproduction of the rule), often mixed with pictorial symbols.
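The between-group statistics reported in these Results can be recomputed from the counts of correct answers in Table 1. The sketch below is an illustration under one assumption: the article does not say whether a continuity correction was applied, but the reported values are consistent with a Yates-corrected χ² for a 2 × 2 table. The helper name `chi2_yates` is hypothetical.

```python
from math import erfc, sqrt


def chi2_yates(correct_a, n_a, correct_b, n_b):
    """Chi-square (df = 1) with Yates continuity correction for correct vs incorrect in two groups."""
    a, b = correct_a, n_a - correct_a
    c, d = correct_b, n_b - correct_b
    n = n_a + n_b
    # Shrink |ad - bc| by n/2, but never below zero (assumed correction).
    shrunk = max(abs(a * d - b * c) - n / 2, 0.0)
    chi2 = n * shrunk**2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = erfc(sqrt(chi2 / 2))  # upper-tail probability of chi-square with 1 df
    return chi2, p


# Correct answers taken from Table 1 of Exp. 1.
question_vs_no_question = chi2_yates(5, 20, 6, 20)
available_vs_control = chi2_yates(5 + 6, 40, 2, 20)
pushed_vs_control = chi2_yates(7 + 6, 40, 2, 20)

for label, (chi2, p) in [
    ("question vs no question", question_vs_no_question),
    ("P&P available vs control", available_vs_control),
    ("P&P encouraged/enforced vs control", pushed_vs_control),
]:
    print(f"{label}: chi2 = {chi2:.2f}, P = {p:.2f}")
```

The P value uses the fact that a χ² variable with one degree of freedom is a squared standard normal variable, so no statistics library is needed.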
Discussion

Neither of the two major hypotheses was confirmed. Demonstrative writing failed to produce facilitation apart from a non-significant tendency to increase the rate of near insight answer patterns in the conditions with and without the question. Our manipulations which were intended to relieve the load upon working memory also had no effect. Because demonstrative writing had no effect, the question for the hypothetical features did not affect solution rates. Therefore, this is a replication of the findings of Girotto and Legrenzi (1989) and Newstead and Griggs (1992) that abstract versions without separation do not profit from the question.

Encouraging and enforcing use of paper and pencil was effective insofar as the majority of participants used paper and pencil in these two conditions. The manipulation was not effective, however, in producing better solution rates. The marginal increase of near insight solutions, which might be due to demonstrative writing, was not observed in the conditions with encouraged or enforced use of paper and pencil. These observations suggest that the use of external memory aids is a technique which has to be learned and trained to be effective. We suggest that the use of paper and pencil in problem solving is a meta-cognitive skill. If this meta-cognitive skill is not readily available, its use in compliance with experimental instructions might well turn out to impose additional cognitive strain. If this were the case, the minimal facilitation obtained by demonstrative writing would have been obscured by the additional task of using external memory aids.

To overcome this possible influence it is necessary to employ a sample of participants who are well trained in the use of paper and pencil in problem solving. We decided therefore to recruit students who had experienced university mathematics education for several semesters. University education in mathematics entails weekly homework which is almost entirely in the form of proofs of theorems. Besides the fact that the solutions to these tasks have to be brought in written form to the classes accompanying the lectures, it is almost impossible to solve these tasks without paper and pencil. We assumed therefore (a) that participants from this population use paper and pencil spontaneously, and (b) that its use is highly trained and does not impose an additional cognitive load.
Experiment 2

Students with mathematical background and training might benefit not only from superior meta-cognitive skills, but also from better availability of problem solving strategies with logical reasoning tasks or better logical reasoning competence per se. The available empirical evidence, however, is mixed. Unselected student samples understand the logic of exclusive disjunction (Ziegler & Schober, 1995). For another notorious reasoning task, the Wason selection task, Cheng, Holyoak, Nisbett, and Oliver (1986) showed that university education in formal logic only marginally improved solution rates. On the other hand, Jackson and Griggs (1988) found that mathematicians performed better in the abstract selection task than computer scientists, electrical engineers and social scientists. Newstead, Girotto, and Legrenzi (1995), who cite another unpublished study which found superior performance of science students compared with arts students, discuss two more indirect sources of advantage by formal mathematics education: acquaintance with (a) hypothesis testing and (b) combinatorial analysis. It seems plausible that these problem solving strategies often require the use of paper and pencil. Therefore, in Exp. 2 students of mathematics, who were allowed to use paper and pencil, tried to solve the THOG task.

Method

Participants

Sixty students who had received at least two semesters of formal university education in mathematics participated in the experiment. Three students were engineering students from other universities, the rest were students of mathematics or physics at the University of Regensburg. One student was in his second semester, all others were in the fourth or a higher semester.
Table 2 Responses of students of mathematics to the three conditions of the THOG task in Exp. 2

Answer pattern   With demonstrative      Without demonstrative   Without demonstrative
                 writing                 writing                 writing
                 ("... write down ...")  ("... wrote down ...")  ("... remembered ...")
                 n = 20                  n = 20                  n = 20
Correct          14                      10                      12
Near insight     0                       0                       0
Intuitive A      4                       3                       2
Intuitive B      0                       5                       2
Other            2                       2                       4

Materials and procedure

The procedure of delivering the cards with the problem text was identical to Exp. 1. Originally there were two conditions, one with and one without demonstrative writing. The second card which described the choice of features was formulated in present tense in the condition with demonstrative writing ("I, the experimenter, write down ...") and in past tense in the condition without demonstrative writing ("... wrote down ..."). Two participants in the condition without demonstrative writing commented during debriefing that they had expected to see a piece of paper at that point. We decided to check whether this might have influenced solution rates and established a third condition which used a different version of the second card. This version used the same wording as in Exp. 1, namely that the experimenter had remembered two features.

Results

For the statistical analyses between groups only the proportions of correct solutions were compared. There was no difference in the solution rates between the two conditions without demonstrative writing. The solution rate in the condition with card 2 saying "... wrote down ..." was 50% correct, the solution rate in the condition with card 2 saying "... remembered ..." was 60% correct, χ² (1, N = 40) = 0.10, P = 0.75 (Table 2). For further analyses these conditions were collapsed. Although the solution rate was highest in the condition with demonstrative writing (70% correct), there was no difference between this condition and the combined conditions without demonstrative writing, χ² (1, N = 60) = 0.70, P = 0.40. Once again, demonstrative writing failed to produce facilitation. In the group with demonstrative writing, the occurrence of correct answers and of intuitive errors type A was above chance. In the combined groups without demonstrative writing, correct answers and intuitive errors type B were observed more frequently than was to be expected by chance.

As expected, the majority of participants spontaneously used paper and pencil, with an overall rate of 68.3% ranging from 55% (condition without demonstrative writing, 'remembered') to 80% (condition with demonstrative writing). Of those participants who used paper and pencil, 70.7% produced the correct solution. Of those participants who used paper and pencil, 15% used mathematical symbols, the rest used text, tables or drawings. Among those participants who did not use paper and pencil, 47.4% produced the correct solution. In a post hoc comparison the difference in solution rates between participants with and without the use of paper and pencil was not significant, χ² (1, N = 60) = 2.13, P = 0.14.
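The post hoc comparison can be traced by hand. The worked equation below rests on two assumptions that are not stated in the article: the cell counts are back-calculated from the reported percentages (41 of 60 participants used paper and pencil, of whom 29 were correct; 19 did not, of whom 9 were correct), and the χ² statistic carries a Yates continuity correction. Here a and b are the correct and incorrect paper-and-pencil users, and c and d the correct and incorrect non-users.

```latex
\chi^2_{\text{Yates}}
  = \frac{N\,\bigl(|ad - bc| - N/2\bigr)^2}{(a+b)(c+d)(a+c)(b+d)}
  = \frac{60\,\bigl(|29 \cdot 10 - 12 \cdot 9| - 30\bigr)^2}{41 \cdot 19 \cdot 38 \cdot 22}
  = \frac{60 \cdot 152^2}{651244}
  \approx 2.13
```

Under these assumptions the result agrees with the reported value of 2.13.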
Experiment 3 Method Participants
Discussion The majority of students of mathematics or physics used paper and pencil spontaneously. For these participants, the use of an external representation should not pose an additional cognitive load, but should be a well-trained skill. Under these circumstances an eect of demonstrative writing should not be obscured. Nevertheless, demonstrative writing had no eect. The overall solution rate of 63.3% was quite high as compared with the solution rates of about 30% obtained with the standard THOG administered to unselected student samples. The solution rate obtained here, however, does not seem to be high enough to suggest that an eect of demonstrative writing could have been obscured by a ceiling eect. A minority of participants did not use paper and pencil. Fortunately, this allowed a post hoc test of the conjecture that solution rates should be higher for participants who actually and voluntarily use paper and pencil. Unfortunately, the dierence in solution rates was in the expected direction but not signi®cant. This suggested the testing of another conjecture of the working memory hypothesis. In Exp. 3 we compared students of mathematics or physics with students of psychology, both without the use of paper and pencil. If students studying mathematics or physics fare better in the THOG task primarily because a majority would use paper and pencil to reduce working memory load, and thus avoid confusion, we expect no dierence in solution rates between the two groups. If, on the other hand, the advantage of students of mathematics or physics is based upon better acquaintance with hypothesis testing and combinatorial analysis as suggested by Newstead, Girotto, and Legrenzi (1995), this group should still produce higher solution rates than students of psycholTable 3 Responses of students of mathematics and students of psychology to the two conditions of the THOG task in Exp. 3
ogy. Exp. 3 also served as a last try to test demonstrative writing as a means to achieve facilitation.
Group
Forty students of mathematics or physics were invited to participate in the experiment. These participants participated voluntarily and received no payment. All participants from this sample had received at least four semesters of formal maths training. Eighty psychology majors were used as participants. None of the psychology students had received training in formal logic. Materials and procedure The same materials and procedures as in the previous experiments were used. In the conditions without demonstrative writing, the second card stated that the experimenter had remembered two features. In both conditions no paper and pencil was available for use.
Results As before, between groups only the proportions of correct solutions were compared. Demonstrative writing once more had no eect, v2 (1, N 120) 0.14, P 0.71, Table 3). There was no dierence between psychology students and students of mathematics or physics [v2 (1, N 120) 1.58, P 0.21]. Without paper and pencil, psychology students were slightly better than maths students (43.8% correct vs 30% correct). In the sample of maths students, the rate of near insight answers was marginally signi®cant (P 0.011), in the sample of psychology students, the rate of near insight answers was not above chance level (P 0.17). Discussion Without paper and pencil, the solution rate of a sample of students with a broad background in mathematics is
Condition With demonstrative writing
Without demonstrative writing
Maths students
n = 20 Correct Near insight Intuitive A Intuitive B Other
6 6 5 1 2
n = 20 Correct Near insight Intuitive A Intuitive B Other
6 4 3 2 5
Psychology students
n = 40 Correct Near insight Intuitive A Intuitive B Other
16 8 6 9 1
n = 40 Correct Near insight Intuitive A Intuitive B Other
19 4 6 7 4
285 Table 4 Eect size (w), power, and number of participants (n) in the statistical analyses of an eect of demonstrative writing in Exps. 1±3. Type I error and degrees of freedom were a = 0.05 and df = 1, respectively, for all analyses (DW demonstrative writing, P&P paper and pencil) Comparison
w
Power
n
Exp. 1, Control vs DW, Question, P&P available Exp. 1, Control vs DW, Question, P&P enforced Exp. 2, with DW vs without DW Exp. 3, with DW vs without DW
0.16
0.23
60
0.20
0.36
60
0.11 0.03
0.13 0.06
60 120
not higher than the solution rate of a sample of psychology students. This result suggests that the superior performance of the former was indeed due to these participants' superior skill in one particular meta-cognitive strategy. In this case it was the simple strategy of using paper and pencil to note the hypothetical features and the interim results.

After the third unsuccessful attempt it seems safe to conclude that demonstrative writing has no facilitating effect. Apart from a marginal tendency to increase the rate of near-insight answer patterns (Exps. 1 and 3), there was no hint of a facilitation of reasoning. Because this amounts to accepting the null hypothesis, we examined the power of the statistical analyses of the effect of demonstrative writing in Exps. 1–3.² The analyses were computed with GPOWER (Erdfelder, Faul, & Buchner, 1996). The results are summarized in Table 4, which also provides w as an index of effect size (see Cohen, 1988). Power was very low in all analyses. An inspection of Table 4 suggests that this is due to the small effect sizes. For comparison, we calculated the effect sizes of demonstrations of facilitation by separation reported in four studies in the literature. We analyzed Girotto and Legrenzi (1989, homogeneous pub problem, Exp. 3, vs standard THOG, Exp. 2), Newstead and Griggs (1992, homogeneous pub problem vs standard THOG, Exp. 1), Girotto and Legrenzi (1993, SARS-THOG vs standard THOG, Exp. 1), and Needham and Amado (1995, Executioner THOG, complete version vs standard THOG, Exp. 1; Pythagoras THOG vs standard THOG, Exp. 1). Effect sizes (w) were 0.54, 0.45, 0.40, 0.42, and 0.41, respectively. Taking the lowest of these values as an estimate (w = 0.40), we would have obtained power values from 0.87 to 0.99 with our sample sizes. This analysis suggests that the low power in our experiments indeed resulted from the small effect sizes for demonstrative writing compared to other facilitatory effects. Therefore, although power was very low, we conclude that demonstrative writing is not an appropriate manipulation to obtain facilitation. A discussion of the implications of this finding is postponed to the general discussion.
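The relation between effect size, sample size, and power used in this argument can be illustrated with a short calculation. The sketch below is not the GPOWER procedure itself; it assumes a χ² test with one degree of freedom and a significance level of 0.05 (neither is stated explicitly in the text), in which case the noncentral χ² distribution with noncentrality N·w² reduces to a shifted standard normal. The function name `chi2_power_df1` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def chi2_power_df1(w, n, alpha=0.05):
    """Approximate power of a chi-square test with df = 1.

    Assumptions (not taken from the article): df = 1 and alpha = 0.05.
    For df = 1 the test statistic is the square of a normal variable
    with mean sqrt(n) * w, so power is a sum of two normal tail areas.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)   # critical value on the z scale
    shift = w * sqrt(n)                 # square root of the noncentrality N * w**2
    upper_tail = 1 - z.cdf(z_crit - shift)
    lower_tail = z.cdf(-z_crit - shift)
    return upper_tail + lower_tail


# Effect size w = 0.40 (lowest value from the literature) with the
# sample sizes of 60 and 120 listed in Table 4.
for n in (60, 120):
    print(n, round(chi2_power_df1(0.40, n), 2))
```

Under these assumptions the calculation yields about 0.87 for N = 60 and about 0.99 for N = 120, in line with the range given above; with w = 0.16 and N = 60 it yields roughly 0.24, close to the tabled value of 0.23.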
² We thank Peter Frensch for this suggestion.
Experiment 4

Exp. 4 was intended to test the robustness of the finding of Exp. 3 that maths students without paper and pencil were no better than psychology students. Demonstrative writing was no longer investigated in Exp. 4.

Method

Unlike the previous experiments, Exp. 4 was conducted in groups, not in single sessions. Data collection took place in two regular classes of psychology and mathematics. The lecturers had informed their students beforehand about the data collection. Participation was voluntary. Participants received neither payment nor course credit.

Participants

In Exp. 4, 44 students of mathematics or physics participated; 2 of these participants were in the second semester, all others in the fourth or higher. There were 52 students of psychology, most of them in the second semester; 6 of these participants had to be excluded: 4 participants knew the task and 2 participants had received erroneous material. This left 46 participants in the sample of psychology students. None of these participants had received training in formal logic.

Materials and procedure

Each participant tried to solve the THOG task twice. Participants received a booklet with two different versions of the THOG task on two pages. One version of the THOG task used circles and diamonds and the colors red and blue; the other version used triangles and stars and the colors green and yellow. The geometrical figures were shown in random order at the top of the pages; the leftmost figure was the THOG. The second THOG was never in the same position among the remaining three figures in the two versions. At the bottom of the page the three figures were shown once more, each with the three answer categories beside it. For each figure, participants had to tick their choice. Half of the participants received the version with circles and diamonds first, the other half the other version first. The experimenter ensured that participants who sat side by side received booklets with a different order of the two versions. Participants were told that their neighbor had a different version. As in the previous experiments, participants were warned that the task is difficult. Participants were instructed not to take notes. Seven minutes were allowed for each task. At the end of each task, participants were asked to mark their ultimate choice with an arrow, in case they had changed their initial solution.
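For reference, the combinatorial analysis that each version of the task demands can be sketched in a few lines. The sketch follows the rule as stated in the Introduction (a figure is a THOG if it has the chosen color or the chosen form, but not both) and uses the red/blue, circle/diamond version; the function names and the choice of the red diamond as designated THOG are illustrative assumptions, not part of the materials.

```python
from itertools import product

COLORS = ("red", "blue")
SHAPES = ("circle", "diamond")
ANSWERS = ("definitely a THOG", "definitely not a THOG", "indeterminable")


def is_thog(figure, chosen):
    """Exclusive disjunction: exactly one of color and form matches the hidden choice."""
    (color, shape), (chosen_color, chosen_shape) = figure, chosen
    return (color == chosen_color) != (shape == chosen_shape)


def classify(known_thog):
    """Return the consistent hypotheses and a verdict for each remaining figure."""
    figures = list(product(COLORS, SHAPES))
    # Hypothesis level: hidden (color, form) pairs under which the designated figure is a THOG.
    hypotheses = [h for h in figures if is_thog(known_thog, h)]
    verdicts = {}
    for figure in figures:
        if figure == known_thog:
            continue
        # Data level: test the figure against every remaining hypothesis.
        outcomes = {is_thog(figure, h) for h in hypotheses}
        if outcomes == {True}:
            verdicts[figure] = ANSWERS[0]
        elif outcomes == {False}:
            verdicts[figure] = ANSWERS[1]
        else:
            verdicts[figure] = ANSWERS[2]
    return hypotheses, verdicts


# Three figures with three answer categories each give 3**3 possible answer patterns.
N_PATTERNS = len(list(product(ANSWERS, repeat=3)))

hypotheses, verdicts = classify(("red", "diamond"))
for figure, verdict in verdicts.items():
    print(figure, "->", verdict)
```

With the red diamond as designated THOG, the two hypotheses are (red, circle) and (blue, diamond); the blue circle comes out as definitely a THOG and the other two figures as definitely not THOGs. The same enumeration shows that there are 27 possible answer patterns.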
Results

An inspection of the answer sheets revealed that in both groups and in both tasks sizable minorities had produced a first solution, which they changed within the allotted 7 min. This was indicated by arrows marking the ultimate solution, as described above. Two independent judges inspected all answer sheets. Their task was to find and classify preliminary solution attempts. A solution attempt was ambiguous if a participant had marked more than one category before indicating the ultimate solution with an arrow. The judges agreed perfectly. In the sample of maths students, 16 of 44 participants produced a first solution in the first task, 7 participants in the second task. In the sample of psychology students, 14 of 46 participants produced a first solution in the first task, 10 participants in the second task. Among the answer sheets of maths students, there were 4 ambiguous preliminary solutions (3 in the first task, 1 in the second task). Among the answer sheets of psychology students, there were 5 ambiguous preliminary solutions (4 in the first task, 1 in the second task). For 3 of the 4 ambiguous preliminary solutions of maths students, the following ultimate solution was correct. In the sample of psychology students, this was the case for only 1 of the 5 ambiguous preliminary solutions. To obtain a conservative test of the finding of Exp. 3 that maths students were no better than psychology students without the use of paper and pencil, ambiguous solutions were equated with the ultimate solutions. The results are summarized in Table 5.

Table 5 Responses of students of mathematics and students of psychology in the two THOG tasks used in Exp. 4. The columns "With preliminary solution" include solutions of participants (1. attempt) who later changed their initial solution within a task (2. attempt)

| Task | Group | Answer pattern | With preliminary solution, 1. attempt | With preliminary solution, 2. attempt | Without preliminary solution |
| --- | --- | --- | --- | --- | --- |
| First THOG task | Maths students, n = 44 | Correct | 2 | 10 | 13 |
| | | Near insight | 6 | 3 | 3 |
| | | Intuitive A | 1 | 0 | 1 |
| | | Intuitive B | 4 | 0 | 4 |
| | | Other | 3 | 3 | 7 |
| | | Σ | 16 | 16 | 28 |
| First THOG task | Psychology students, n = 46 | Correct | 0 | 1 | 9 |
| | | Near insight | 4 | 5 | 1 |
| | | Intuitive A | 1 | 0 | 5 |
| | | Intuitive B | 1 | 3 | 9 |
| | | Other | 8 | 5 | 8 |
| | | Σ | 14 | 14 | 32 |
| Second THOG task | Maths students, n = 44 | Correct | 1 | 5 | 27 |
| | | Near insight | 4 | 0 | 4 |
| | | Intuitive A | 0 | 0 | 1 |
| | | Intuitive B | 2 | 0 | 1 |
| | | Other | 0 | 2 | 4 |
| | | Σ | 7 | 7 | 37 |
| Second THOG task | Psychology students, n = 46 | Correct | 1 | 4 | 13 |
| | | Near insight | 1 | 1 | 3 |
| | | Intuitive A | 0 | 1 | 4 |
| | | Intuitive B | 2 | 1 | 9 |
| | | Other | 6 | 3 | 7 |
| | | Σ | 10 | 10 | 36 |

For the following analyses only the proportions of correct solutions were compared. In the first task there was no significant difference in solution rates between those students of mathematics and psychology who did not change their solution on the answer sheets [χ²(1, N = 60) = 1.44, P = 0.23]. Among those students who did change an initial solution, there was no difference between maths and psychology students for the first attempt (P = 0.28, Fisher's exact test, one-tailed³), but maths students were significantly better than psychology students in the second attempt (P < 0.01, Fisher's exact test, one-tailed).

³ In case of small cell frequencies, Fisher's exact test was used rather than a χ² test.
In the second task, however, among those participants who did not change an initial solution, maths students were much better than psychology students [χ²(1, N = 73) = 8.58, P < 0.01]. Among those students who changed a solution, there was neither a difference between maths students and psychology students in the first solution attempt (P = 0.85, Fisher's exact test, one-tailed), nor in the second solution attempt (P = 0.18, Fisher's exact test, one-tailed).
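As a reader's check, the tests on correct versus incorrect solutions can be recomputed from the cell counts in Table 5. The sketch below is not the authors' analysis script. It assumes that the χ² values were computed with a continuity (Yates) correction, which is an inference from the fact that the corrected statistic reproduces the reported values, and it assumes that the one-tailed Fisher test for the first task sums the upper tail (maths students at least as successful as observed). All function names are hypothetical.

```python
from math import comb, erfc, sqrt


def chi2_yates(a, b, c, d):
    """Continuity-corrected chi-square for the 2x2 table [[a, b], [c, d]].

    The continuity correction is an assumption; the article does not state it.
    """
    n = a + b + c + d
    numerator = n * max(abs(a * d - b * c) - n / 2, 0.0) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def chi2_p_df1(x):
    """Upper-tail probability of a chi-square value with 1 degree of freedom."""
    return erfc(sqrt(x / 2))


def fisher_upper(a, b, c, d):
    """One-tailed Fisher P: a or more successes in row 1, with all margins fixed."""
    n, row1, col1 = a + b + c + d, a + b, a + c
    tail = sum(
        comb(row1, k) * comb(n - row1, col1 - k)
        for k in range(a, min(row1, col1) + 1)
    )
    return tail / comb(n, col1)


# Rows: maths, psychology. Columns: correct, not correct (counts from Table 5).
first_task_unchanged = chi2_yates(13, 15, 9, 23)    # 28 maths, 32 psychology students
second_task_unchanged = chi2_yates(27, 10, 13, 23)  # 37 maths, 36 psychology students
first_task_attempt1 = fisher_upper(2, 14, 0, 14)    # changers, 1. attempt
first_task_attempt2 = fisher_upper(10, 6, 1, 13)    # changers, 2. attempt

print(round(first_task_unchanged, 2), round(chi2_p_df1(first_task_unchanged), 2))
print(round(second_task_unchanged, 2), round(chi2_p_df1(second_task_unchanged), 4))
print(round(first_task_attempt1, 2), round(first_task_attempt2, 4))
```

Under these assumptions the statistics come out at about 1.44 and 8.58, and the two Fisher probabilities for the first task at about 0.28 and 0.002, matching the values reported above. The Fisher values for the second task are not recomputed here, because the text does not specify which tail was summed.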
Discussion

In this experiment, participants solved the THOG task twice. Within each task, some participants changed their initial solution. If only the first solution in the first task is considered, whether or not it was subsequently changed, then 15 of 44 maths students (34.1%) solved the THOG problem at the first attempt. This is close to the solution rate of the sample of maths students in Exp. 3 (30% correct). Among the psychology students in Exp. 4, 19.6% solved the THOG problem at the first attempt. As in Exp. 3, where participants solved the task only once, there was no significant difference between maths students and psychology students. This is an important finding because it confirms the conclusion drawn from Exp. 3, that it is not better acquaintance with hypothesis testing and combinatorial analysis which gives maths students an advantage in a deductive reasoning task like the THOG task.

Both groups improved markedly over the course of Exp. 4. In both groups the number of participants who ultimately solved the task correctly was about twice the number of successful participants in the very first solution attempt. It is important to note that participants received no feedback during the experiment. Any motivation to change a solution must have been self-generated. In Exps. 1–3 it was common that participants who had been informed that their solution was wrong spontaneously started to produce another, often correct solution. In Exp. 4, however, any change in solution cannot be traced to feedback by the experimenter. A participant might change a solution for several reasons: (a) as the result of another random choice, (b) as the result of an unsuccessful reproduction of the previous solution, or (c) because the participant deemed the previous solution wrong and undertook a new solution attempt. The first possibility can be ruled out because answer patterns were not evenly distributed among the 27 possible ones. The second possibility can be ruled out because then correct solutions should also have been forgotten and erroneously reproduced. This was never the case. Therefore, it is likely that the majority of participants who changed a previous solution tried to correct an error. Here, students of mathematics were more successful than students of psychology. Altogether, 20 (45.5%) of the maths students and 23 (50%) of the psychology students changed a solution at least once (these numbers include not only all instances of participants who changed a solution within a task, but also all instances where participants changed a solution from the first to the second task). Of the 20 changing maths students, 13 finally arrived at the correct solution, but only 8 of the 23 changing psychology students did so. This suggests that students of psychology and mathematics have an equal chance of noticing an error in logical reasoning but that students of mathematics have a higher chance of correcting the error. Finally, it should be noted that in previous experiments improvement of solution rates in repeated administration of the abstract THOG task has not been observed (Smyth & Clark, 1986, Exp. 1; Wason & Brooks, 1979, Exp. 1; Ziegler & Schober, 1995).
General discussion

The general discussion focuses on four themes in order: (1) the role of the question for the hypothetical combinations, (2) demonstrative writing as a means to promote separation, (3) the working memory hypothesis, and (4) formal training and meta-cognitive skill in logical reasoning.

It has been known for some time that the question for the hypothetical combinations has no effect upon the solution rates of the abstract version (Girotto & Legrenzi, 1993; Newstead & Griggs, 1992; Wason & Brooks, 1979). Needham and Amado (1995) proposed that only full narrative versions produce facilitation. According to the authors, full narrative versions trigger mechanisms of text understanding, which in turn produce the separation of conceptually different levels. Describing, in a text, a person who writes down two properties is one way to focus a participant's attention upon the separation of data level and hypothesis level. Our intention was to test whether the attention focusing by demonstrative writing could be transferred from the domain of text understanding to live performance in an experimental setting. Because demonstrative writing failed to produce facilitation, our research on the role of the question merely replicated the known fact about its irrelevance for abstract versions.

Why did the experimenter's demonstrative writing down of the (hidden) properties have no facilitatory effect? In the cover story of the Pub problem (Girotto & Legrenzi, 1989) and in the narrative versions of Needham and Amado (1995) a card was described upon which the two features were noted. In the conditions with demonstrative writing in Exps. 1, 2, and 3, we used a real card upon which the experimenter noted two words under the eyes of the participants. We hypothesized that, according to confusion theory, this manipulation should establish a separation problem with the classic abstract THOG without thematic content, material or narrative embedding. This hypothesis turned out to be false. Nevertheless, we do not wish to imply that this result should be regarded as a falsification of confusion theory. Because there are a number of thematic separation problems which produce facilitation, demonstrative writing obviously is no proper means to achieve separation and thus facilitation. There are a number of possible explanations for this result. At first sight, it seems strange that a narrative version, which merely describes an individual who writes down the properties upon a card, works better than a real person who does the same. Can a story be bigger than life? A story is certainly denser than life.
A reader can expect that a story contains only information that the author regarded as important for his message. This might be especially true for a participant reading instructions in a psychological experiment. Watching the experimenter, on the other hand, is not an obvious strategic option in most experiments. Note, however, that demonstrative writing also had no effect upon students of mathematics or physics, who certainly lack the sophistication which psychology undergraduates develop as participants of psychological experiments. We conclude therefore that demonstratively writing down properties upon a card is simply not a sufficient manipulation to capture a participant's attention.

The working memory hypothesis was partially supported. Our initial goal was to demonstrate better performance on the THOG task by reducing working memory load. We were not able to demonstrate improvement of solution rates by encouraging or forcing participants to use paper and pencil (Exp. 1). We were also not able to demonstrate higher solution rates in a sample of participants who were expected to use paper and pencil spontaneously and more often, contingent upon the actual use of paper and pencil (Exp. 2). We were able, however, to demonstrate that solution rates in samples of students of mathematics or physics are not better than solution rates in samples of students of psychology, if participants are not allowed to use paper and pencil (Exps. 3 and 4) and only if the very first solution attempts of participants are considered (Exp. 4, see below). This implies, we believe, that a more direct demonstration of the usefulness of a reduction of working memory load by the use of paper and pencil poses three challenges for future research: (1) to establish the voluntary use of paper and pencil in a training, (2) to demonstrate that the use of paper and pencil transfers from the training to the THOG task, and (3) to obtain higher solution rates in the training group as compared to an untrained control group.

The majority of maths students who used paper and pencil just used the medium to note interim results. Only a few used specialized mathematical symbols. Declarative mathematical knowledge appears to be neither necessary nor sufficient for successful solution of the THOG task. Otherwise this task would be virtually impossible for participants without mathematical training. Procedural mathematical knowledge can be more or less domain specific. Knowledge about specific techniques in the construction of proofs, e.g., reductio ad absurdum, or knowledge about techniques in transformation of formulas, e.g., telescoping a series, certainly is domain specific. Techniques like visualization, going to the extremes, using external information representation, or checking preliminary solutions, however, are not domain specific (Levine, 1988). These techniques can be used in solving mathematical problems as well as in everyday problem solving or logical reasoning. Our evidence suggests that it is the use of these techniques which students of mathematics could transfer from their training background to the logical reasoning task.
Only the improvement of maths students across repeated solution attempts in Exp. 4 can be seen as the influence of domain-specific skills. Students of psychology and students of mathematics were equally likely to change a previous solution attempt, but only students of mathematics were more likely to produce a correct solution. In conclusion, we suggest that it is worthwhile focusing upon a very basic aspect of most of these techniques: that they require the use of paper and pencil.

Acknowledgements The authors would like to thank two anonymous reviewers for helpful comments and Peter French for editorial assistance.
References

Cheng, P. W., Holyoak, K. J., Nisbett, R. E., & Oliver, L. M. (1986). Pragmatic versus syntactic approaches to training deductive reasoning. Cognitive Psychology, 18, 293–328.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Erlbaum.
Erdfelder, E., Faul, D., & Buchner, A. (1996). GPOWER: a general power analysis program. Behavior Research Methods, Instruments, and Computers, 28, 1–11.
Evans, J. St. B. T., Newstead, S. E., & Byrne, R. M. J. (1993). Human reasoning. Hove, UK: Erlbaum.
Evans, J. St. B. T., & Over, D. E. (1996). Rationality and reasoning. Hove, UK: Psychology Press.
Girotto, V., & Legrenzi, P. (1989). Mental representation and hypothetico-deductive reasoning: the case of the THOG problem. Psychological Research, 51, 129–135.
Girotto, V., & Legrenzi, P. (1993). Naming the parents of the THOG: Mental representation and reasoning. The Quarterly Journal of Experimental Psychology, 46A, 701–713.
Griggs, R. A., & Newstead, S. E. (1982). The role of problem structure in a deductive reasoning task. Journal of Experimental Psychology: Learning, Memory, and Cognition, 8, 297–307.
Jackson, S. L., & Griggs, R. A. (1988). Education and the selection task. Bulletin of the Psychonomic Society, 26, 327–330.
Levine, M. (1988). Effective problem solving. Englewood Cliffs, NJ: Prentice Hall.
Needham, W. P., & Amado, C. A. (1995). Facilitation and transfer with narrative thematic versions of the THOG task. Psychological Research, 58, 67–73.
Newstead, S. E., Girotto, V., & Legrenzi, P. (1995). The THOG problem and its implications for human reasoning. In S. E. Newstead, & J. St. B. T. Evans (Eds.), Perspectives on thinking and reasoning (pp. 261–285). Hove, UK: Erlbaum.
Newstead, S. E., & Griggs, R. A. (1992). Thinking about THOG: sources of error in a deductive reasoning problem. Psychological Research, 54, 299–305.
O'Brien, D. P., Noveck, I. A., Davidson, G. M., Fisch, S. M., Lea, R. B., & Freitag, J. (1990). Sources of difficulty in deductive reasoning: the THOG task. The Quarterly Journal of Experimental Psychology, 42A, 329–351.
Smyth, M. M., & Clark, S. E. (1986). My half-sister is a THOG: Strategic processes in a reasoning task. British Journal of Psychology, 77, 275–287.
Wall, L., & Schwartz, R. L. (1991). Programming perl. Sebastopol, CA: O'Reilly & Associates.
Wason, P. C. (1966). Reasoning. In B. M. Foss (Ed.), New horizons in psychology (Vol. 1, pp. 135–151). Harmondsworth: Penguin.
Wason, P. C. (1977). Self-contradictions. In P. N. Johnson-Laird, & P. C. Wason (Eds.), Thinking: Readings in cognitive science (pp. 114–128). Cambridge, UK: Cambridge University Press.
Wason, P. C., & Brooks, P. G. (1979). THOG: the anatomy of a problem. Psychological Research, 41, 79–90.
Ziegler, A., & Schober, B. (1995). Smedslunds Zirkel oder die Repräsentations-Interferenz-Dichotomie beim logischen Schlußfolgern [Logical thinking: Smedlunds circle or the dichotomy of problem-presentation and inferences]. Psychologische Beiträge, 37, 181–198.