Law Hum Behav DOI 10.1007/s10979-011-9282-4
ORIGINAL ARTICLE
If Anything Else Comes to Mind… Better Keep It to Yourself? Delayed Recall is Discrediting—Unjustifiably Aileen Oeberst
American Psychology-Law Society/Division 41 of the American Psychological Association 2011
Abstract

Inconsistencies in eyewitness accounts are perceived as indicative of inaccuracy and reduce the witnesses’ credibility. Reminiscence, the delayed recall of previously not recalled information, is generally interpreted as a type of inconsistency. Even though it does not necessarily involve the falsity of the statements, reminiscence presents a counterintuitive instance with mostly unknown reliability. Two studies empirically assessed the accuracy of reminiscent items after retention intervals of up to 1 week and contrasted them with people’s beliefs regarding their accuracy. In line with an implicit assumption of memory fading with the passage of time, delayed recall of previously unmentioned details was judged to be unreliable. In contrast, actual accuracy of reminiscent details was consistently high and even comparable to immediate recollections. Although participants generally underestimated accuracy, the underestimation was most pronounced in the case of reminiscence. The findings are discussed within the context of contemporary legal practice, such as jury instructions.

Keywords: Eyewitness memory · Reminiscence · Implicit theories · Credibility · Judgment
A. Oeberst, Faculty of Human Sciences, Institute of Psychology, University of Osnabrück, Seminarstr. 20, 49074 Osnabrück, Germany

Jury members are usually free to decide which (part of a) testimony to believe and how to weigh the evidence (e.g., Ninth Circuit Model Criminal Jury Instructions, 2010). Sometimes they are even explicitly asked to rely on their common sense (e.g., Florida Supreme Court Standard Jury Instructions 3d, 2009). On the other hand, some jury
instructions also recommend what factors to consider when evaluating eyewitness testimony. Specifically, jurors are often explicitly requested to consider the consistency of witnesses’ statements made on various occasions (e.g., Florida Supreme Court Standard Jury Instructions 3d, 2009; New York Criminal Jury Instructions 2d, 2007). Moreover, inconsistent accounts may prompt the lawyer to interrogate the witness more closely on that particular issue during the trial. In fact, eliciting and interrogating inconsistencies is a courtroom convention typically encouraged by trial practice manuals (e.g., Bailey & Rothblatt, 1985). This practice is an important part of trial preparation (Prager, Moran & Sanchez, 1996) as well as a potent strategy for discrediting vulnerable eyewitnesses (Ellison, 2001). The implicit assumption underlying this practice is that inconsistencies are indicative of inaccuracy. Several studies have demonstrated that lawyers, judges, police officers, and potential jurors frequently hold this assumption (Fisher & Cutler, 1995; Potter & Brewer, 1999; Uviller, 1993). The empirical evidence on this issue is by no means straightforward or persuasive, however. Correlations between consistency and accuracy are moderate at most (e.g., Fisher & Cutler, 1995) and nonexistent in some cases (e.g., Smeets, Candel, & Merckelbach, 2004), leading a number of researchers to question the assumption that inconsistent statements are associated with inaccuracy (e.g., Berman & Cutler, 1996; Brewer, Potter, Fisher, Bond, & Luszcz, 1999; Fisher & Cutler, 1995). As Fisher, Brewer, & Mitchell (2009) concisely point out, consistency and accuracy do not rely on identical mental processes. Thus, their correspondence should be far from perfect.

One major difficulty regarding legal practice arises from the undifferentiated use of the concept of inconsistency.
Jury instructions usually refer to “inconsistency” in a rather general way, which includes several different types of inconsistencies (Sixth Circuit Criminal Pattern Jury Instructions, No. 107, 2005). Some research studies also conflate these subtypes (e.g., Berman, Narby, & Cutler, 1995; Leippe & Romanczyk, 1989; van Giezen, Arensman, Spinhoven, & Wolters, 2005). Recently, however, an increasing number of authors have addressed the distinction between various kinds of inconsistencies (Berman & Cutler, 1996; Brock, Fisher, & Cutler, 1999; Gilbert & Fisher, 2006). This is crucial to account for the many fundamental differences that occur both logically and empirically. For example, while explicit contradictions necessarily imply that one statement is incorrect (see footnote 1), the mere presence versus absence of a detail does not. Hence, details recalled at t1 only (forgetting) or at t2 only (reminiscence) do not need to be incorrect. This logical distinction makes a difference in perception as well. Participants judged reminiscent items to be less indicative of inaccuracy than contradictions (Brewer et al., 1999; Potter & Brewer, 1999). Nevertheless, Berman & Cutler (1996) reported that novel recollections were still less convincing than consistent statements. Fluctuations in item accessibility (i.e., reminiscence and forgetting), however, are pervasively common (Tulving & Pearlstone, 1966; see also Buschke, 1974; Tulving, 1967) and therefore are not necessarily clear indications of inaccuracy. While there appears to be some consensus on this point with respect to forgetting, it remains largely unknown in the case of reminiscence.
Reminiscence

Intuitively, one might assume that the first recollection is the most exhaustive one, followed by forgetting, which leads to a continuous drop in performance with increasing retention intervals (Ballard, 1913; Ebbinghaus, 1885; Gilbert & Fisher, 2006). Since Ballard (1913), however, extensive research has demonstrated that the recollection of novel details, i.e., reminiscence, also occurs and may exceed forgetting even weeks and months after encoding took place (Bluck, Levine & Laulhere, 1999; Dunning & Stern, 1992; for a review see Erdelyi, 2010). Nonetheless, the accuracy of information cannot be assessed based solely on the observation that delayed retrieval of correct details reliably occurs. There are two notable studies that addressed this issue (see footnote 2). Brock et al. (1999) reported a 66% rate of accuracy for items that were recalled only at the second interview (2 weeks later). Moreover, reminiscent items were significantly less accurate than consistent items (81%). After a shorter retention interval (48 h), Gilbert and Fisher (2006) found that 87% of novel recollections at t2 were correct. Although the evidence is still scarce and requires further investigation, it points to a potential discrepancy. Following the prevailing view of memory decreasing with the passage of time, Fisher et al. (2009) identified the naïve expectation of reminiscence occurring infrequently and being rather inaccurate. In their concise review of the research in cognitive psychology, however, they clearly identify reminiscence as a common experience, which is not indicative of error at all. Nevertheless, statements are “weighed against expectations” (Leippe & Romanczyk, 1989, p. 128; see also Steward et al., 1996). So far, no study has directly contrasted these implicit assumptions with empirical evidence. Therefore, this paper seeks to contribute to this undertaking by assessing expectations about the accuracy of reminiscent items and by comparing these with empirically observed data.
Study 1

In one part of this study, participants encoded and repeatedly recalled a list of pictures (recall group). A sample of law students was then asked to estimate the accuracy of novel recollections and of memory reports in general made by the participants (estimation group). I hypothesized that the decline in expected memory performance would be sharper than the decline in actual memory performance (H1). Additionally, I expected that the accuracy of novel items would be significantly underestimated (H2).

Method

Participants. Seventy-six German university students participated in the study. Thirty-nine law students were in the estimation group. The recall group consisted of 37 undergraduates from psychology and cognitive sciences. Mean age in the total sample was 24.5, and the two groups did not differ from one another, t < 1. As the effort varied greatly between the groups (see below), so did the compensation. All law students took part in a drawing for cinema gift coupons. Students in the observation group (i.e., the recall group) received course credit.
Footnote 1: Note that in the case of a continuous variable (e.g., height), a lenient criterion of accuracy (e.g., ±5%) may allow for two inconsistent recollections that are both correct (Ron Fisher, personal communication, October 31, 2010). However, as there was no such instance in my data, this possibility is left out for the sake of simplicity.
Footnote 2: The paper focuses on research that was conducted under controlled conditions and with adult samples, since age of participants was found to make a difference with respect to consistency and accuracy of reminiscent items (Peterson, Moores, & White, 2001).
Materials. In the memory task, stimuli consisted of 24 monochromatic line drawings of common objects (Snodgrass & Vanderwart, 1980). Familiarity and complexity of the images were controlled for. Design and Procedure. The study is composed of two factors. First, the condition of estimation versus observation varied between subjects in a quasi-experimental design. Second, the time interval varied within subjects. Recollections and estimations were collected at four time points— immediately, 5 min, 20 min and 1 week after stimulus presentation took place. In the observation group, the experiment consisted of two sessions, 1 week apart. In the first session participants learned that they would see a list of pictures they would have to memorize and recall. In order to minimize rehearsal, the number of recall tests was not announced and the alleged purpose of the second session was to assess retest reliability of some questionnaires. Immediately after stimulus presentation participants completed the first recall test (3 min). Subsequently they engaged in an unrelated filler task (2 min) before taking the second test. After working on another unrelated filler task (12 min) the third recall test followed. Finally, participants filled out some unrelated questionnaires. They returned to the lab 1 week later to recall their memories for the last time. They were then debriefed and dismissed.
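The timing of the first session can be checked with simple arithmetic. The following is a minimal sketch, assuming that the second and third recall tests lasted 3 min each (the text states this duration only for the first test); the names `SESSION_1` and `onsets` are hypothetical and chosen for illustration only.

```python
from itertools import accumulate

# (label, duration in minutes) for session 1, in order of occurrence.
# Only the duration of recall test 1 is stated in the text; the 3-minute
# length of tests 2 and 3 is an assumption made for this illustration.
SESSION_1 = [
    ("recall test 1", 3),
    ("filler task 1", 2),
    ("recall test 2", 3),   # assumed duration
    ("filler task 2", 12),
    ("recall test 3", 3),   # assumed duration
]


def onsets(schedule):
    """Return {label: minutes elapsed since the end of stimulus presentation}."""
    durations = [duration for _, duration in schedule]
    starts = [0] + list(accumulate(durations))[:-1]
    return {label: start for (label, _), start in zip(schedule, starts)}


test_onsets = {
    label: start
    for label, start in onsets(SESSION_1).items()
    if label.startswith("recall")
}
print(test_onsets)
```

Under this assumption the three tests begin 0, 5, and 20 min after stimulus presentation, which agrees with the measurement points named above.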
Participants in the estimation group received a small booklet, which informed them that the study would assess their personal evaluation of human memory performance. Then they answered three questions. The first question targeted expectations about the average recall performance of people repeatedly asked to retrieve encoded information after several minutes, hours, and up to 1 week. Participants were given the choice of either describing their answer in own words or selecting one of fourteen graphs that best fitted their expectations (see Fig. 1). Instructions emphasized that it was not about the precise values of memory performance, but rather the schematic pattern of the course over the passage of time. The next question described a hypothetical experiment that resembled the procedure carried out in the observation group. Participants were asked to indicate their expectations regarding the amount of details recalled over time. In order to facilitate the comparisons, the average memory performance for the immediate recall test was preset at 60% of the pictures. This rate resembled the actual mean immediate recall performance that had been observed in a pilot study. Thus, they estimated the mean recall performance for the second, third and fourth test, respectively. The third question asked participants to estimate the accuracy of items that were recalled for the first time only in the second, third or fourth test, respectively. Responses were given in percentages. To ensure a correct understanding of this measure a legend
clarified the parameters: given 100 hypothetical novel recollections, 0% meant that none were correct, while 100% indicated that all of them were accurate, with all other possible values in between. Finally, they indicated whether they had heard of the phenomenon of reminiscence before, which none of them had.

Fig. 1 Rough drafts of the schematic course of memory performance over increasing retention intervals (minutes to 1 week), which participants had to choose from or describe in their own words

Results

Analyses of the Hypotheses. A comparison of the observed memory performance in the immediate test (M = 55.2, SD = 3.71, CI95 = 50.1, 60.3) with the preset value in the estimation group (60%) yielded a marginal difference, t(36) = 1.89, p = .07. Note, however, that this difference ran in the opposite direction to the divergence that developed over the following recall tests. Testing hypothesis 1, a 2 (estimation vs. performance) × 3 (recall test) repeated measures analysis of variance revealed a significant main effect for test, F(2,142) = 145.63, p < .001, η² = .51, a significant main effect for group, F(1,71) = 25.96, p < .001, η² = .27, and, more importantly, a significant interaction, F(2,142) = 71.17, p < .001, η² = .25. Contrast analyses yielded significant interactions both for the interval from the first to the second measure, F(1,71) = 55.08, p < .001, and for the interval from the second to the third measure, F(1,71) = 79.88, p < .01. Simple t tests indicated the predicted pattern (see Fig. 2). While there were no significant differences between the estimated (M = 54.7, SD = 7.4, CI95 = 52.3, 57.1) and observed (M = 54.9, SD = 15.9, CI95 = 49.8, 60.2) memory performance with respect to the second test, t(50) = .1, estimations and observations diverged in the
next two measurements: Twenty minutes after the end of the stimulus presentation, law students expected people to recall 41.9% (SD = 12.5, CI95 = 37.9, 45.9) of the stimuli, which represents a significant drop from the expectations regarding the second test, t(38) = 7.79, p < .01, d = 1.24. Yet, the average of actually recalled pictures at the same time was significantly higher (M = 57.4, SD = 17.32, CI95 = 51.72, 63.1), t(65) = 4.46, p < .01, d = 1.07. Note that there was a slight but significant increase in the actual memory performance from the second to the third test, t(36) = 2.3, p < .05, d = .19. One week later, estimated performance significantly decreased (41.9% → 19.8%, SD = 12.2, CI95 = 15.9, 23.7), t(38) = 12.35, p < .01, d = 1.79, and so did obtained memory performance (57.4% → 50.1%, SD = 19.8, CI95 = 43.3, 56.9), t(33) = 5.35, p < .01, d = .43. Estimations and observations differed again, t(54) = 7.74, p < .01, d = 1.84. Taken together, the results confirm the first hypothesis. The course of memory performance over time was significantly underestimated.

Fig. 2 Mean estimated and empirically observed rates of correct recall of 24 pictures for the repeated recall tests. Error bars reflect the 95% confidence interval

The second hypothesis proposed a significant underestimation of the true accuracy of reminiscent details. As such, accuracy of items not previously recalled was the dependent variable. For each test there was a substantial group of participants who recollected novel details (t2: 67.6%, t3: 62.2%, t4: 37.8%). Since the majority of participants did not provide new items on every test, however, comparisons between the estimated and the observed values were analyzed for every recall separately. Given the lack of normal distribution (Kolmogorov–Smirnov Z > 2.0, p < .001), Mann–Whitney U tests were conducted. All analyses yielded significant differences between estimations and observations.
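To make the rank-based comparison concrete, the following minimal sketch computes a Mann–Whitney U statistic and its z value. It is an illustration under stated assumptions, not the analysis run for this study: the function names `midranks` and `mann_whitney` are hypothetical, the data are invented, and the z value uses the plain normal approximation without a correction for ties.

```python
import math


def midranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney(x, y):
    """Return (U for sample x, z) via the normal approximation, no tie correction."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:n1])
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u_x, (u_x - mean_u) / sd_u


# Hypothetical accuracy percentages, NOT data from the study.
estimated = [70, 75, 60, 80, 65]
observed = [100, 100, 90, 100, 85]
u, z = mann_whitney(estimated, observed)
print(u, round(z, 2))
```

In this invented example every estimate lies below every observed value, so U for the estimates is 0, the most extreme value possible for these sample sizes.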
Figure 3 shows that novel recollections were highly accurate (t2: M = 96.2, SD = 19.6, CI95 = 88.6, 103.8; t3: M = 90.3, SD = 24.5, CI95 = 80.3, 100.3; t4: M = 91.11, SD = 26.6, CI95 = 78.2, 104.1) with no significant changes over retention intervals (Z < 1, p > .5). In contrast, there was a significant decrease in estimated accuracy of reminiscent details, both from the second to the third test (t2: M = 76.1, SD = 22.7, CI95 = 68.9, 83.3; t3: M = 71.2, SD = 21.2, CI95 = 64.4, 77.0, Z = 2.37, p < .05) and from the third to the fourth test (t4: M = 37.5, SD = 19.0, CI95 = 31.5, 43.5, Z = 5.23, p < .01). More importantly, law students underestimated the true accuracy of reminiscent details in every single test (2nd test: Z = 6.03, p < .001; 3rd test: Z = 4.43, p < .001; 4th test: Z = 4.88, p < .001). The results, therefore, fully confirm the second hypothesis.

Additional Analyses. Comparisons between the observed accuracy of reminiscent items and the observed general accuracy of the first recall test (M = 98.9, SD = 3.5, CI95 = 97.7, 100.1) revealed marginal but no
significant differences for any of the three recalls (t2: Z = 0, p = 1; t3: Z = 1.82, p = .07; t4: Z = 1.34, p = .18). Observed accuracy of reminiscent details was not associated with the amount of information recalled, p > .3. Also, accuracy of reminiscent details was independent of the development of overall recall performance over tests, p > .2. In contrast, estimated overall recall performance and accuracy of reminiscent items were correlated. Higher estimated accuracy of reminiscent details was associated with a higher estimated overall recall performance (t3: r = .31, p = .05; t4: r = .40, p = .01). Finally, law students’ beliefs regarding the very general development of memory performance over time revealed that 69% of the participants expected a monotonic decline in performance (varying only with respect to the underlying function, e.g., linear, hyperbola). One student chose the constant level function while another went for the monotonic increase. All other participants selected graphs that consisted of alternating phases of increases and decreases.

Fig. 3 Mean estimated and empirically observed rates of items not previously recalled being correct for the various tests. Error bars reflect the 95% confidence interval

Discussion

The goal of the study was to contrast laypeople’s implicit assumptions about memory performance and about the accuracy of delayed novel recollections with empirical evidence. The findings show that law students indeed hold a theory of memory fading with increasing retention intervals (see also Ballard, 1913; Gilbert & Fisher, 2006). Actual memory performance, however, differed from these expectations in two ways. First, recall performance did not decline monotonically. Rather, there was a small, yet
significant, increase (i.e., hypermnesia) within the first three tests. Second, the actual amount recalled was significantly higher than had been estimated for retention intervals of 20 min to 1 week after stimulus presentation. That is, law students not only expected forgetting over time, they expected substantially more of it than was observed in actual recollections. This finding provides empirical support for the existence of the naïve assumption about memory outlined by Fisher et al. (2009). In correspondence with the idea of a declining memory performance, law students expressed significant doubts about the credibility of delayed novel recollections, and these doubts significantly increased with the passage of time. Again, however, there was a huge discrepancy compared to the empirical data. Novel recollections were highly accurate (see also Gilbert & Fisher, 2006) and even comparable to immediate recollections. Hence, the findings suggest that the critical view of reminiscence exhibited by law students is unjustified.

There are some limitations, however. First, one may question the ecological validity of the findings for various reasons. One concerns the stimuli used. Because line drawings lack the associative structure of information and the gist of an episode, the findings are difficult to generalize to events, particularly because event characteristics are relevant to this topic in two ways. On the one hand, they increase the likelihood of false memories (Brainerd & Reyna, 2002; see also Kleider, Pezdek, Goldinger, & Kirk, 2008). On the other hand, meaningfulness was found to promote novel recall (Erdelyi, Buschke, & Finkelstein, 1977; Erdelyi & Stein, 1981) and has even been proposed to constitute one of two necessary preconditions of increases in recall performance (Kazén & Solís-Macías, 1999). Beyond retention, one might argue that the stimuli reduce the extent to which estimations may be generalized.
After all, students are less likely to have previous experience relating to the recall of line drawings. Another critical aspect regards intentional learning. Participants in this study knew about a memory test in advance, while learning in real life situations is incidental. In the same vein, the rather high number of successive recall tests is unlikely to occur in real life. Moreover, it may have fostered memory performance (Roediger III & Karpicke, 2006) and it might have raised suspicions about another upcoming test in the second session, which could have improved retention as well (Szpunar, McDermott, & Roediger, 2007). A second limitation of the study concerns the application of a quasi-experimental design. Law students had been recruited for the estimation group because they represent future workers in the legal system. Although positions in court are not assigned randomly, differences in the sample or the compensation may have contributed to the discrepancies between estimations and observations.
Last but not least one may question whether the underestimation of accuracy is unique for reminiscent items. Although repeatedly recalling an item as well as forgetting is in line with an implicit theory of declining memory performance, item accuracy could be generally underestimated. All of these aspects were addressed in Study 2.
Study 2

The basic approach was identical to Study 1, with the following exceptions: First, the stimuli consisted of a film clip instead of line drawings. Second, data were collected at only two time points to minimize the possibility of test expectation. Third, encoding was incidental. Fourth, an additional estimation group was added to the design, in which participants generated their evaluations at the very same time as the other participants reported their recollections (online estimation group). Fifth, participants in the estimation conditions experienced the to-be-remembered materials in the same way as participants in the recall condition. Sixth, participants estimated the accuracy of reminiscent items as well as the accuracy of consistent and forgotten items. In line with the results of Study 1, I expected that the observed accuracy of reminiscent items would be significantly underestimated (H1). Given the lack of previous research, no specific hypothesis was proposed with regard to the different estimation conditions. Hypotheses 2 and 3 stated that the estimated accuracy of reminiscent details would be significantly lower than the estimated accuracy of consistent (H2) and forgotten items (H3). Compared to the estimation conditions, a significantly lower discrepancy was expected to occur between the accuracy of reminiscent and consistent items in the recall condition (H4). This hypothesis was derived from the findings of Study 1 and extended to forgotten items, expecting a lower discrepancy between the accuracy of reminiscent items and forgotten items in the recall condition compared to the estimation conditions (H5).

Method

Participants. Fifty-nine undergraduate German university students participated in this study in exchange for course credit. Mean age was 23.2. Twenty participated in the external estimation group, while the online estimation group and recall group consisted of 19 and 20 students, respectively.

Materials. The stimulus material consisted of a film clip (1:45 min; see footnote 3) depicting an apparently drunk young woman stumbling in the street. A man approaches her but she walks away. Yelling and not caring about the traffic, she walks into some passing cars, forcing them to stop. Finally the man runs to her and tries to force her off the street. A second man tries to help the first keep hold of the woman, who is fighting and shouting. Afterwards a police car arrives at the scene. Two officers get out and approach the woman and the two men.

Footnote 3: The materials and measures are available from the author.
Design and Procedure. The study is composed of three factors. Group membership varied between subjects with random assignment regarding the observation and the online estimation group. The external estimation group consisted of an extra sample of students. Participants in this group were asked immediately to estimate memory performance for the immediate session as well as the 1-week interval, in order to prevent suspicions about a second measurement. The second independent variable refers to the time interval for which memory performance and accuracy rate for novel, forgotten and consistent items were estimated and measured, respectively. This factor varied within subjects with two measurement points given—10 minutes and 1 week after stimulus presentation. Third, estimated and observed accuracy was collected for three different types of details (reminiscent, forgotten, consistent; within subjects). The study took place in the context of a course. None of the participants knew about the nature of their task before viewing the film. After an unrelated filler task (10 min), participants in the recall condition were asked to report their recollections ‘‘as concretely and accurately’’ as possible. In the online estimation condition, participants learned that their peers had to recall what they remember and that their task was to estimate their peers’ memory performance. They were asked to judge the mean overall accuracy of the memory reports (in percent). This was the question of interest for the first session; all others were filler questions. Finally, all participants generated a code to match their data between both sessions. Participants took as much time as they needed; when everyone had finished, the experimenter informed them that the study aimed at contrasting estimations and observations without mentioning the second session, however. They only learned about it at the beginning of the course 1 week later. 
They received sheets that matched their previous assignment to the conditions and in the recollection group instructions were identical to session 1. Participants in the online estimation condition first indicated again the mean overall accuracy of their peer’s memory reports. Then they estimated the average accuracy of the following items: (a) details, that had been recalled consistently in both the memory reports; (b) details, that had been recalled in session 1 only, but not in session 2 (forgetting), and (c) details, that had been
recalled in session 2 only, but not in session 1 (reminiscence). Similar to Study 1, responses were given in percentages with the identical additional clarification of what these percentages meant. Finally, all participants generated the code again. For participants in the external estimation condition, there was only one session. They also viewed the same film clip and were subsequently asked to estimate mean overall accuracy of the memory reports of the participants in the recollection condition at t1 and t2. Next, they estimated the accuracy for each of the three item types (consistent, forgotten, reminiscent).

Results

Due to the fact that some students were absent in the second session, all analyses of variables concerning both sessions are based on a sample size of 16 and 13 participants in the recall and online estimation condition, respectively.

Data Analysis. Recollections of all participants were split into single information units, scored for their accuracy and classified based on item type (consistent, reminiscent, forgotten). Irrelevant changes that did not affect the content of the information (e.g., synonyms) were treated as consistent information. Explicit changes (i.e., contradictions) were treated as two different units of information, such that the first resembled a forgotten item and the second fell into the category of a reminiscent item. This approach was chosen to provide the most conservative test of item accuracies. Given that the accuracy of contradictions cannot exceed .50, the possible range of accuracy differs a priori between item types. While this corresponds to the logical distinction, it may inflate the meaning of superior accuracy of reminiscent or forgotten items when compared to contradictions. Moreover, this procedure allows for more precise measurement of accuracy as it enables a differentiated recording for each point in time. Finally, contradictions accounted for less than 1% of all items (i.e., three details in absolute numbers). Thus, this procedure also increased the reliability of the results (see also Fisher et al., 2009). Unverifiable statements, such as suspicions about the actors’ intentions, were not included.
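The classification rule described above can be sketched in a few lines. This is a minimal illustration, not the scoring procedure actually used: the functions `classify` and `accuracy` are hypothetical, and the two example reports are invented (their scoring is assumed for the example).

```python
def classify(t1, t2):
    """Sort scored details from two recall sessions into item types.

    t1, t2: dicts mapping a detail (str) to True if it was scored correct.
    A contradiction appears as two different details, so its first version
    lands in 'forgotten' and its second version in 'reminiscent'.
    """
    return {
        "consistent": {d: t2[d] for d in t2 if d in t1},
        "forgotten": {d: t1[d] for d in t1 if d not in t2},
        "reminiscent": {d: t2[d] for d in t2 if d not in t1},
    }


def accuracy(items):
    """Percentage of correct details, or None if the category is empty."""
    if not items:
        return None
    return 100 * sum(items.values()) / len(items)


# Hypothetical scored reports (illustration only, not the study's data).
session_1 = {
    "woman stumbles in the street": True,
    "man approaches the woman": True,
    "cars are forced to stop": True,
    "three officers arrive": False,      # later contradicted
}
session_2 = {
    "woman stumbles in the street": True,
    "man approaches the woman": True,
    "second man helps hold the woman": True,
    "two officers arrive": True,         # contradicts the session 1 version
}

types = classify(session_1, session_2)
rates = {name: accuracy(items) for name, items in types.items()}
print(rates)
```

Treating the contradiction as one forgotten and one reminiscent unit means that each version is scored at the point in time at which it was reported, which is the property the procedure above relies on.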
97.0; Mt2 = 92.4, SD = 9.8, CI95 = 88.1, 97.3), t(15) = 1.62; p = .13. On average, 18.4 (SD = 4.4, CI95 = 16.2, 20.6) details were recalled consistently at both dates, whereas 6.2 items (SD = 3.6, CI95 = 4.4, 8.0) were recalled at t1 only (forgotten items) and 4.1 (CI95 = 2.7, 5.5) were reported at t2 only (reminiscent items). Every single participant recalled at least one reminiscent detail (range: 1–11). Testing the Hypotheses. All analyses were run twice— with the data of both estimation groups aggregated as well as with all groups separately. Since most analyses rendered identical results, the reports focus on the observation– estimation difference. Thus, for the sake of simplicity, the findings are based on the collapsed data of both estimation groups as far as they did not yield inter-group differences, which are reported separately. There was one outlier in the recall group (Z [ 2), who recalled substantially more (11) but less accurate (27.3%) reminiscent details. To provide the most conservative test of the hypotheses it was included in the analyses. When significance was dependent upon the inclusion/exclusion of this case, the results of both analyses are reported. Estimations regarding the accuracy of novel recollections was significantly lower (Mcollapsed = 19.0, SD = 15.9, CI95 = 13.4, 24.6; see Table 1 for the respective means) than empirically observed accuracy (M = 83.9, SD = 25.1, CI95 = 71.3, 96.5), t(46) = 10.95; p \ .001; d = 3.1. Thus, hypothesis 1 could be confirmed. Accuracy of reminiscent items was again substantially underestimated. Hypothesis 2 proposed that estimated accuracy of reminiscent details is lower than the estimated accuracy of consistent items. To simultaneously check for differences between both estimation groups, a repeated measures analysis of variance was conducted with item type (reminiscent vs. consistent) as within subjects factor and group (online estimation group vs. 
external estimation group) as between subjects factor. There was a significant main effect of item type, F(1,30) = 83.73; p \ .001, g2 = .74, indicating that participants judged the accuracy of novel recollections (Mcollapsed = 19.0; SD = 15.9, CI95 = 13.4, 24.6) to be significantly lower than the accuracy of Table 1 Estimated and observed accuracy for reminiscent details M
Descriptive Analyses of the Memory Reports. Participants on average recalled 24.6 units of information (SD = 5.8, CI95 = 21.9, 27.3) in the first test and 22.5 (SD = 4.3, CI95 = 20.4, 24.6) at t2, which reflects a significant drop, t(15) = 2.94; p = .01; d = .40. Mean accuracy of both memory reports in contrast, did not significantly differ, (Mt1 = 94.4, SD = 6.0, CI95 = 91.8,
                         M      SD     95% CI lower bound   95% CI upper bound
Recall                   83.9   25.1   71.3                 96.5
Online estimation        25.7   14.5   17.3                 34.1
External estimation      14.9   15.9    7.9                 21.9
Collapsed estimations    19.0   15.9   13.4                 24.6
Table 2 Estimated and observed accuracy for consistent details
                         M      SD     95% CI lower bound   95% CI upper bound
Recall                   92.6    4.3   90.5                 94.7
Online estimation        65.4   22.3   52.5                 78.3
External estimation      52.8   25.9   41.2                 64.4
Collapsed estimations    57.5   25.1   48.7                 66.3

Table 3 Estimated and observed accuracy for forgotten details
                         M      SD     95% CI lower bound   95% CI upper bound
Recall                   90.6   16.6   82.3                 98.9
Online estimation        60.0   17.1   50.2                 69.8
External estimation      70.8   22.0   60.8                 80.8
Collapsed estimations    66.7   20.7   59.3                 74.1
consistent items (Mcollapsed = 57.5, SD = 25.1, CI95 = 48.7, 66.3; see Table 2 for the respective means). Neither the main effect of group nor the interaction reached significance, p > .1. Thus, the results confirm hypothesis 2. Reminiscent details were expected to be less accurate than information consistently recalled at both dates.
Analogously, similar effects were proposed when comparing reminiscent items with forgotten items (H3). A 2 (reminiscent vs. forgotten details) × 2 (online estimation group vs. external estimation group) repeated measures analysis of variance yielded a significant main effect of item type, F(1,30) = 141.09; p < .001; η² = .79. Consistent with the hypothesis, participants expected the accuracy of novel recollections (Mcollapsed = 19.0, SD = 15.9, CI95 = 13.4, 24.6) to be significantly lower than the accuracy of forgotten items (Mcollapsed = 66.7, SD = 20.7, CI95 = 59.3, 74.1; see Table 3 for the respective means). Moreover, there was a significant interaction of group × item type, F(1,30) = 8.00; p < .01; η² = .04. The discrepancy was more pronounced in the external estimation group. Taken together, the findings support hypothesis 3. Reminiscent details were perceived as less credible than forgotten items.
The fourth and fifth hypotheses proposed that observed accuracies of the various item types would differ significantly less than estimated accuracies, if at all. Two separate repeated measures analyses of variance were conducted to test each hypothesis. Item type (reminiscent vs. consistent and reminiscent vs. forgotten, respectively) was entered as within subjects factor, group (estimation vs. observation) as between subjects factor. When comparing reminiscent and consistent items (H4), both the main effect of item type, F(1,46) = 46.25; p < .001; η² = .42, and the main effect of group, F(1,46) = 97.49; p < .001; η² = .68, reached significance.
More importantly, however, the predicted interaction was significant, F(1,46) = 18.46; p < .001; η² = .17. Simple comparisons revealed that estimated accuracy of consistent items (Mcollapsed = 57.5, SD = 25.1, CI95 = 48.7, 66.3) exceeded estimated accuracy of reminiscent items (Mcollapsed = 19.0; SD = 15.9, CI95 = 13.4, 24.6), t(31) = 9.54; p < .001; d = 1.84, while the empirically observed accuracy did not significantly differ between
reminiscent (M = 83.9, SD = 25.1, CI95 = 71.3, 96.5) and consistent items (M = 92.6, SD = 4.28, CI95 = 90.5, 94.7), t(15) = 1.56; p = .14. It is noteworthy that the observation–estimation discrepancy was obtained for both item types. Participants not only underestimated accuracy of reminiscent details (see H1), but also underestimated accuracy of consistent items, t(46) = 7.69; p < .001; d = 1.95. As indicated by the significant interaction, however, the difference was more pronounced for reminiscent than for consistent details. Taken together, the findings conform to H4: participants viewed reminiscent items as less credible than consistent items, yet observed accuracy did not significantly differ.
In the same analysis for reminiscent and forgotten items (H5) the predicted significant interaction also appeared, F(1,46) = 37.80; p < .001; η² = .25. Again, simple comparisons revealed that estimated accuracy of forgotten items (Mcollapsed = 66.7, SD = 20.7, CI95 = 59.3, 74.1) was significantly higher than estimated accuracy of reminiscent items (Mcollapsed = 19.0, SD = 15.9, CI95 = 13.4, 24.6), t(31) = 11.74; p < .001, d = 2.59, while observed accuracy did not significantly differ between reminiscent (M = 83.9, SD = 25.1, CI95 = 71.3, 96.5) and forgotten items (M = 90.6, SD = 16.6, CI95 = 82.3, 98.9), t(15) = 1.40; p = .18. Again, the observation–estimation discrepancy could also be demonstrated for forgotten items, t(46) = 4.31; p < .001, d = 1.27. This difference (23.8), however, was less pronounced than for reminiscent details (64.9). In summary, the results confirm H5: participants perceived reminiscent items as less accurate than forgotten items, though they actually were not.
Additional Analyses. The deviations of both estimated and observed accuracy of all item types from the 50% level were analyzed, because 50% reflects the maximum rate of accuracy for contradictions.
One-sample t tests yielded significant results for each item type in the observation condition. More precisely, accuracy of consistent, t(15) = 39.75; p < .001, reminiscent, t(15) = 5.39; p < .001, as well as forgotten items, t(15) = 9.78; p < .001, was significantly higher than 50%. Regarding the estimated rates of accuracy, the analyses revealed significant deviations for
reminiscent, t(31) = 11.07; p < .001, as well as forgotten items, t(31) = 4.58; p < .001. However, while evaluations of reminiscent items fell below the criterion, estimated accuracy of forgotten items significantly exceeded the 50% value. Estimated accuracy of consistently recalled items was only marginally above the threshold, t(31) = 1.69; p = .1.
It has been reported above that average accuracy of both memory tests did not significantly differ. Participants’ estimations, however, did show a difference. Although estimated initial accuracy (t1; M = 66.8, SD = 10.8, CI95 = 61.8, 71.8) was already significantly lower than actually observed accuracy (M = 94.4, SD = 5.8, CI95 = 91.8, 97.0), t(36) = 9.8; p < .001, d = 3.18, estimations further decreased significantly with respect to t2 reports (M = 43.3, SD = 14.4, CI95 = 35.0, 51.6), t(11) = 7.82; p < .001, d = 1.77. In a repeated measures analysis of variance the interaction proved to be significant, F(1,26) = 57.56, p < .001; η² = .36.
Last but not least, the number of inconsistencies was evaluated to determine whether they predicted overall accuracy or accuracy of consistent items. In separate linear regression analyses, either overall accuracy at t2 or accuracy of consistently recalled items was regressed on t1 accuracy, amount of details reported at t1 and t2 and percentage of consistent, reminiscent, and forgotten items recalled (method: stepwise). The inclusion/exclusion of the outlier varied the results of overall accuracy. When including the outlier, both accuracy at t1, β = .76; t = 8.5; p < .001, and the percentage of reminiscent details, β = -.29; t = 3.31; p < .01, significantly predicted overall accuracy, F(2,13) = 87.79; p < .001; R²corr = .92. When excluded, however, t1 accuracy was the only significant predictor, β = .91; t = 7.94; p < .001; F(1,13) = 63.11; p < .001; R²corr = .82, while percentage of reminiscent details recalled was not, β = -.16; t = 1.46; p = .17.
The negative relationship was mainly driven by one outlier. In contrast, when regressing the accuracy of consistent items, the percentage of reminiscent details recalled was not predictive, regardless of whether the outlier was included, β = -.01; t = .03; p > .9, or excluded, β = -.07; t = .3; p > .7. Finally, the proportion of inconsistent statements (reminiscent and forgotten details) was also not predictive, β = -.20; t = .86; p > .4.
Discussion
The findings of the second study replicate and extend the results from the first. Applying a more realistic setting improved the design validity, and reminiscence occurred frequently. In fact, there was not a single person who did not exhibit it. More importantly, the delayed recall of novel items was both reliably observed and, again, proved to be quite reliable. The present rates of accuracy for all types of
items fit well with the results reported by Gilbert & Fisher (2006). However, they found reminiscent details to be significantly less accurate than forgotten details, which were significantly less accurate than consistent items. In contrast, this study revealed no significant differences among the various item types. Given the descriptive differences and the small sample size, however, further investigations are necessary for valid conclusions. The tremendous underestimation of accuracy made by the observing participants is most crucial for the present discussion. Actual accuracy was more than four times higher than expected. These results did not vary, regardless of whether participants experienced the retention interval themselves or conducted their estimations from a rather abstract external perspective. This is important for two reasons. First, it excludes differences in sample or compensation as a causal factor for the obtained underestimation (Study 1). Second, evaluating evidence in court is definitely different from perceiving the event in question. On the other hand, some jurors might have experienced comparable situations, which they may draw on when evaluating eyewitness accounts. In general, however, tremendous underestimation is likely to take place. Consistent with an implicit assumption of memories fading with the passage of time, expectations deviated most sharply from empirical facts in the case of reminiscence. Given its frequent occurrence, however, this is particularly important when weighing eyewitness accounts. Finally, inclusion of the outlier in regression analysis determined whether the proportion of reminiscent statements predicted accuracy of the whole account at t2. This finding suggests that cases involving an inordinate number of novel recollections may follow a different pattern. In such instances, regarding reminiscence as discrediting might reflect reality more closely.
Without further investigations that address this issue, however, this remains speculative. Nevertheless, accuracy of consistently recalled details was independent of proportion of reminiscent details. Whether or not percentage of reminiscent items may contribute to overall accuracy, there is no reason at all to discredit the entire testimony of a witness who provided novel details at a later date in time.
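The size of the observation–estimation discrepancy can be rechecked from the summary statistics alone. The following is a minimal sketch, not the analysis script used in the study: the group sizes (16 recall participants, 32 estimators) are an assumption inferred from the reported degrees of freedom, and the effect size formula (mean difference divided by the root mean of the two variances) is likewise an assumption, chosen because it lands close to the reported d values. The function and variable names are hypothetical.

```python
import math

# (M, SD) in percent accuracy, copied from the text and Tables 1-3.
# ASSUMPTION: group sizes are inferred from the degrees of freedom,
# t(15) -> 16 recall participants, t(31) -> 32 estimators.
N_RECALL, N_ESTIMATE = 16, 32

OBSERVED = {"reminiscent": (83.9, 25.1), "consistent": (92.6, 4.28), "forgotten": (90.6, 16.6)}
ESTIMATED = {"reminiscent": (19.0, 15.9), "consistent": (57.5, 25.1), "forgotten": (66.7, 20.7)}


def cohens_d(m1, sd1, m2, sd2):
    """Mean difference standardized by the root mean of the two variances (assumed formula)."""
    return (m1 - m2) / math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)


def one_sample_t(mean, sd, n, reference=50.0):
    """t statistic for the deviation of a sample mean from a fixed reference value."""
    return (mean - reference) / (sd / math.sqrt(n))


for item in OBSERVED:
    (m_obs, sd_obs), (m_est, sd_est) = OBSERVED[item], ESTIMATED[item]
    gap = m_obs - m_est                                   # underestimation in percentage points
    d = cohens_d(m_obs, sd_obs, m_est, sd_est)            # observation vs. estimation
    t_obs = one_sample_t(m_obs, sd_obs, N_RECALL)         # observed accuracy vs. 50%
    t_est = one_sample_t(m_est, sd_est, N_ESTIMATE)       # estimated accuracy vs. 50%
    print(f"{item:12s} gap={gap:5.1f}  d={d:5.2f}  t_obs={t_obs:6.2f}  t_est={t_est:6.2f}")
```

Under these assumptions the recomputed values fall close to those reported above; small deviations are to be expected because the inputs are rounded summary statistics. A negative t_est, as for reminiscent items, indicates an estimate below the 50% criterion.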
General Discussion
Ideally, decision-making in legal contexts should be evidence based as well as objective. Focusing on a specific aspect, namely reminiscence, the reported findings question whether current practice lives up to this claim. Although references in jury instructions and trial practice manuals imply that inconsistencies in eyewitness accounts are indicative of inaccuracy (see also Fisher & Cutler,
1995; Potter & Brewer, 1999), these beliefs are not empirically supported. By replicating and extending previous research, this paper elaborates on two main arguments. First, it criticizes the undifferentiated use of the term inconsistency. While doubts in the credibility of contradictory statements may be warranted, this lack of discrimination is the principal fault. Treating contradictions, forgetting and reminiscence as phenomena of the same kind is neither appropriate nor justifiable (see also Berman & Cutler, 1996; Brock et al., 1999; Fisher et al., 2009). Second, it suggests that the references in the legal system are based on naïve theories about human memory functioning, and demonstrates that they are wrong in the case of reminiscence. Because they contradict the idea of memory fading with the passage of time (Fisher et al., 2009; Gilbert & Fisher, 2006), delayed novel recollections seem unreliable, yet they are not. In the present studies reminiscence proved to be highly accurate but was judged to be discrediting, more so than any other type of detail. This is even more important when considering the frequent occurrence of reminiscence (see also Fisher et al., 2009). As such, advising jurors to consider (general) consistency neither adds to the instruction to rely on common sense nor falls in line with the noble pursuit of truth. Considering the fact that these investigations are the first to document the large discrepancies between estimated and observed accuracy of reminiscent details, replications are desirable.
Moreover, there are many open issues that deserve further investigation, such as extended retention intervals, age and status of the “eyewitness” as well as potential explanations for the delayed recall given by the witness (e.g., Jones, Williams, & Brewer, 2008), not to mention the factors that are known to foster incorrect memory reports such as leading questions (e.g., Garven, Wood, Malpass, & Shaw, 1998; Loftus, 1975) and the presentation of misleading details (Loftus, Miller, & Burns, 1978; Roediger III, Meade, & Bergman, 2001). Nevertheless, these findings point to an important issue. Decision-making in legal contexts results in momentous consequences for the involved persons’ lives and should therefore be based on evidence rather than misconceptions.
Acknowledgments This research was partially supported by Grant Number GRK 772 by the German Research Foundation. I am grateful to Ronald Fisher, Susanne Haberstroh, Ulla Martens and two anonymous reviewers for their valuable comments on an earlier version of this paper.
References
Bailey, F. L., & Rothblatt, H. B. (1985). Successful techniques for criminal trials. Rochester, NY: Lawyers Co-operative.
Ballard, P. B. (1913). Obliviscence and reminiscence. British Journal of Psychology Monograph Supplements, 1, 1–82.
Berman, G. L., & Cutler, B. L. (1996). Effects of inconsistencies in eyewitness testimony on mock-juror decision making. Journal of Applied Psychology, 81, 170–177. doi:10.1037/0021-9010.81.2.170.
Berman, G. L., Narby, D. J., & Cutler, B. L. (1995). Effects of inconsistent eyewitness statements on mock-juror’s evaluations of the eyewitness, perceptions of defendant culpability and verdicts. Law and Human Behavior, 19, 79–88. doi:10.1007/BF01499074.
Bluck, S., Levine, L. J., & Laulhere, T. M. (1999). Autobiographical remembering and hypermnesia: A comparison of older and younger adults. Psychology and Aging, 14, 671–682. doi:10.1037/0882-7974.14.4.671.
Brainerd, C. J., & Reyna, V. F. (2002). Fuzzy-Trace Theory and False Memory. Current Directions in Psychological Science, 5, 164–169. doi:10.1111/1467-8721.00192.
Brewer, N., Potter, R., Fisher, R. P., Bond, N., & Luszcz, M. A. (1999). Beliefs and data on the relationship between consistency and accuracy of eyewitness testimony. Applied Cognitive Psychology, 13, 297–313. doi:10.1002/(SICI)1099-0720(199908)13:4<297::AID-ACP578>3.0.CO;2-S.
Brock, P., Fisher, R. P., & Cutler, B. L. (1999). Examining the cognitive interview in a double-test paradigm. Psychology, Crime & Law, 5, 29–45. doi:10.1080/10683169908414992.
Buschke, H. (1974). Spontaneous remembering after recall failure. Science, 184, 579–581. doi:10.1126/science.184.4136.579.
Dunning, D., & Stern, L. B. (1992). Examining the generality of eyewitness hypermnesia: A close look at time delay and question type. Applied Cognitive Psychology, 6, 643–657. doi:10.1002/acp.2350060707.
Ebbinghaus, H. (1885). Über das Gedächtnis. Untersuchungen zur experimentellen Psychologie [Memory: A contribution to experimental psychology]. Leipzig: Duncker & Humblot.
Ellison, L. (2001). The mosaic art? Cross-examination and the vulnerable witness. Legal Studies, 21, 353–375. doi:10.1111/j.1748-121X.2001.tb00172.x.
Erdelyi, M. H. (2010). The ups and downs of memory. American Psychologist, 65, 623–633. doi:10.1037/a0020440.
Erdelyi, M. H., Buschke, H., & Finkelstein, S. (1977). Hypermnesia for Socratic stimuli: The growth of recall for an internally generated memory list abstracted from a series of riddles. Memory and Cognition, 5, 283–286. doi:10.3758/BF03197571.
Erdelyi, M. H., & Stein, J. B. (1981). Recognition hypermnesia: The growth of recognition memory (d′) over time with repeated testing. Cognition, 9, 23–33. doi:10.1016/0010-0277(81)90012-3.
Fisher, R. P., Brewer, N., & Mitchell, G. (2009). The relation between consistency and accuracy of eyewitness testimony: Legal versus cognitive explanations. In T. Williamson, R. Bull, & T. Valentine (Eds.), Handbook of psychology of investigative interviewing: Current developments and future directions (pp. 121–136). Chichester, UK: John Wiley.
Fisher, R. P., & Cutler, B. L. (1995). The relation between consistency and accuracy of eyewitness testimony. In G. Davies, S. Lloyd-Bostock, M. McMurran, & C. Wilson (Eds.), Psychology, law, and criminal justice: International developments in research and practice (pp. 21–28). Oxford: De Gruyter.
Florida Supreme Court Standard Jury Instructions 3d (2009). Retrieved February 1, 2011, from www.floridasupremecourt.org/.
Garven, S., Wood, J. M., Malpass, R. S., & Shaw, J. S., III. (1998). More than suggestion: The effect of interviewing techniques from the McMartin preschool case. Journal of Applied Psychology, 83, 347–359. doi:10.1037/0021-9010.83.3.347.
Gilbert, J. A. E., & Fisher, R. P. (2006). The effects of varied retrieval cues on reminiscence in eyewitness memory. Applied Cognitive Psychology, 20, 723–739. doi:10.1002/acp.1232.
Jones, E. E., Williams, K. D., & Brewer, N. (2008). “I had a confidence epiphany!”: Obstacles to combating post-identification confidence inflation. Law and Human Behavior, 32, 164–176. doi:10.1007/s10979-007-9101-0.
Kazén, M., & Solís-Macías, V. M. (1999). Recognition hypermnesia with repeated trials: Evidence for the alternative retrieval pathways hypothesis. British Journal of Psychology, 90, 405–424.
Kleider, H. M., Pezdek, K., Goldinger, S. D., & Kirk, A. (2008). Schema-driven source misattribution errors: Remembering the expected from a witnessed event. Applied Cognitive Psychology, 22, 1–20. doi:10.1002/acp.1361.
Leippe, M. R., & Romanczyk, A. (1989). Reactions to child (versus adult) eyewitnesses: The influence of juror’s preconceptions and witness behavior. Law and Human Behavior, 13, 103–132. doi:10.1007/BF01055919.
Loftus, E. F. (1975). Leading questions and the eyewitness report. Cognitive Psychology, 7, 560–572. doi:10.1016/0010-0285(75)90023-7.
Loftus, E. F., Miller, D. G., & Burns, H. J. (1978). Semantic integration of verbal information into a visual memory. Journal of Experimental Psychology: Human Learning and Memory, 4, 19–31. doi:10.1037/0278-7393.4.1.19.
New York Criminal Jury Instructions 2d. (2007). Retrieved from www.nycourts.gov/cji/ on April 12, 2011.
Ninth Circuit Model Criminal Jury Instructions. (2010). Retrieved February 1, 2011, from www.ce9.uscourts.gov/.
Peterson, C., Moores, L., & White, G. (2001). Recounting the same events again and again: Children’s consistency across multiple interviews. Applied Cognitive Psychology, 15, 353–371. doi:10.1002/acp.708.
Potter, R., & Brewer, N. (1999). Perceptions of witness behaviour-accuracy relationships held by police, lawyers and mock-jurors. Psychiatry, Psychology and Law, 6, 97–103. doi:10.1080/13218719909524952.
Prager, I. R., Moran, G., & Sanchez, J. (1996). Job analysis of felony assistant public defenders: The most important tasks and most useful knowledge, skills, and abilities. Psychology, Crime & Law, 3, 37–49. doi:10.1080/10683169608409793.
Roediger, H. L., III, & Karpicke, J. D. (2006). The power of testing memory: Basic research and implications for educational practice. Perspectives on Psychological Science, 1, 181–210. doi:10.1111/j.1745-6916.2006.00012.x.
Roediger, H. L., III, Meade, M. L., & Bergman, E. T. (2001). Social contagion of memory. Psychonomic Bulletin & Review, 8, 365–371. doi:10.3758/BF03196174.
Sixth Circuit Criminal Pattern Jury Instructions. (2005). Retrieved from www.ca6.uscourts.gov/internet/crim_jury_insts.htm/ on April 12, 2011.
Smeets, T., Candel, I., & Merckelbach, H. (2004). Accuracy, completeness, and consistency of emotional memories. American Journal of Psychology, 117, 595–609.
Snodgrass, J. G., & Vanderwart, M. (1980). A standardized set of 260 pictures: Norms for name agreement, image agreement, familiarity, and visual complexity. Journal of Experimental Psychology: Human Learning & Memory, 6, 174–215. doi:10.1037/0278-7393.6.2.174.
Steward, M. S., Steward, D. S., Farquhar, L., Myers, J. E. B., Reinhart, M., Welker, J., … Ornstein, P. A. (1996). Interviewing young children about body touch and handling. Monographs of the Society for Research in Child Development, 61, 1–232. doi:10.1111/j.1540-5834.1996.tb00554.x.
Szpunar, K. K., McDermott, K. B., & Roediger, H. L. (2007). Expectation of final cumulative test enhances long-term retention. Memory & Cognition, 35, 1007–1013. doi:10.3758/BF03193473.
Tulving, E. (1967). The effects of presentation and recall of material in free-recall learning. Journal of Verbal Learning & Verbal Behavior, 6, 175–184. doi:10.1016/S0022-5371(67)80092-6.
Tulving, E., & Pearlstone, Z. (1966). Availability versus accessibility of information in memory for words. Journal of Verbal Learning & Verbal Behavior, 5, 381–391. doi:10.1016/S0022-5371(66)80048-8.
Uviller, H. R. (1993). Credence, character, and the rules of evidence: Seeing through the liar’s tale. Duke Law Journal, 42, 776–832.
van Giezen, A. E., Arensman, E., Spinhoven, P., & Wolters, G. (2005). Consistency of memory for emotionally arousing events: A review of prospective and experimental studies. Clinical Psychology Review, 25, 935–953. doi:10.1016/j.cpr.2005.04.011.