Mem Cogn (2015) 43:85–98 DOI 10.3758/s13421-014-0453-7
Metamemory monitoring and control following retrieval practice for text Jeri L. Little & Mark A. McDaniel
Published online: 19 August 2014 © Psychonomic Society, Inc. 2014
Abstract Test-taking is assumed to help learners diagnose what they do and do not know, and by so doing improve the effectiveness of their subsequent study. Previous work has examined metamemory monitoring (e.g., predictions of future performance) and control (e.g., restudy decisions) following testing or retrieval practice with relatively simple materials (e.g., word pairs). There is reason to believe, however, that such monitoring and control decisions might be more difficult with text materials, even after retrieval practice, owing perhaps to difficulty in accurately assessing one’s performance on the retrieval-practice test. In two experiments, participants read texts about world regions, then engaged in retrieval practice or rereading of the information in those texts, made estimates about future performance, and then received an opportunity to restudy the texts before taking a final recall test, with self-paced restudy enabling an examination of control processes. Memory predictions were more accurate in the retrieval-practice than in the rereading condition, and learners in both conditions allocated restudy time on the basis of their predictions. Additionally, restudy provided a greater benefit following retrieval practice than following rereading. The present study has implications for how students can use retrieval practice with text to foster subsequent learning.
J. L. Little · M. A. McDaniel Department of Psychology, Washington University, St. Louis, MO, USA
M. A. McDaniel Center for Integrative Research in Cognition, Learning, and Education, Washington University, St. Louis, MO, USA
J. L. Little (*) Department of Psychology, Hillsdale College, 33 E. College Street, Hillsdale, MI 49242, USA
Keywords Testing effects · Restudy · Metacognition · Metamemory · Monitoring
Taking a test is beneficial for learning, presumably for a variety of reasons. The notion that the processes involved in retrieving information improve its later recall (i.e., the testing effect) has received considerable empirical attention (see Roediger & Karpicke, 2006a, for a review). Several indirect benefits are also commonly assumed: Test-taking helps learners to diagnose what they do and do not know, and this in turn improves the effectiveness of subsequent study. These potential benefits of testing or retrieval practice, however, have received relatively little empirical attention, especially as they pertain to metamemory for text passages. In the present research, we investigated the metamemory insights that retrieval practice for text passages may foster, how such practice influences future study, and the extent to which it improves the effectiveness of future study.
Metamemory indices In theory, in order to make good study decisions, it is necessary for learners to (a) have an accurate representation of what they do and do not know and (b) use that knowledge appropriately to inform subsequent study, known within the metacognitive/metamemory literature as monitoring and control, respectively (e.g., Nelson & Narens, 1994). Monitoring is commonly measured in two ways: absolute and relative accuracy. Absolute accuracy measures the degree to which learners can judge how much they learned. Relative accuracy measures the degree to which learners can judge which information was better or less well learned. Although both measures can inform metacognitive control (e.g., restudy decisions), it is sensible that relative accuracy would better
inform how learners should allocate future study time among various to-be-learned information, whereas absolute accuracy would better inform the total amount of time learners should allocate to future study (see, e.g., Schwartz & Efklides, 2012). Metacognitive monitoring Learners often fail to make accurate judgments about study strategy effectiveness. For example, they often rely on ease of processing (e.g., feelings of fluency) and attribute higher ratings of learning to information that is more easily processed, even though such easily processed information is often not better learned (e.g., Dunlosky & Matvey, 2001; Rhodes & Castel, 2008). Conditions that necessitate deeper processing, however, tend to improve judgment accuracy (e.g., deWinstanley & Bjork, 2004; Thiede, Anderson, & Therriault, 2003), with some work specifying that metacognitive accuracy depends upon whether attention is directed toward the processing of information that will be relevant to a later test (Thomas & McDaniel, 2007). These findings have implications for differences in metacognitive monitoring following attempted retrieval versus rereading. Whereas trying to retrieve information involves elaborative processing (McDaniel & Masson, 1985) that should afford accurate monitoring, especially for a later recall test, rereading is less likely to do so, in part, because rereading might induce a sense of fluency (e.g., perceptual; Kolers, 1976) that impairs one’s ability to monitor accurately. Metamemory monitoring during or after retrieval practice or test taking has been studied with word-pair materials (e.g., Dunlosky & Nelson, 1992; Karpicke, 2009; King, Zechmeister, & Shaughnessy, 1980; Koriat & Bjork, 2006; Kornell & Rhodes, 2013; Kornell & Son, 2009; Lovelace, 1984; Soderstrom & Bjork, 2014; Tullis, Finley, & Benjamin, 2013). 
Retrieval practice or test opportunities often lead to more accurate judgments of learning than do restudy opportunities, in part, because testing helps learners to diagnose their learning (e.g., King et al., 1980). Additionally, learners often overweigh the benefit of restudy. Specifically, they often predict that they will remember information better if it was restudied than if it was tested (e.g., Kornell & Son, 2009), even though performance often shows the opposite pattern. It is commonsensical that testing should give learners an idea of what they know in educational contexts as well, and in fact, students report using testing for this purpose, and to a greater extent than for the direct benefits that testing provides (Hartwig & Dunlosky, 2012; Kornell & Bjork, 2007). There is reason to suspect, however, that learners would have difficulty evaluating their memory for information in text as compared to word pairs. In particular, we suggest that learners may have difficulty assessing the accuracy of their retrieval-practice performance. This suggestion follows from work by Dunlosky, Hartwig, Rawson, and Lipko (2011), who had learners study terms and their definitions, take a test, and then
grade their tests against a key. They found that even with a key to grade their responses, learners overestimated the accuracy of their responses. One difficulty in learners’ self-assessments was that definitions included several idea units, and learners often gave themselves full credit for a response when critical idea units were missing. The implication here is that if learners cannot accurately grade their responses when a key is provided, their likelihood of accurately judging retrieval-practice performance without a key, let alone without their retrieval responses, is even more unlikely (see Dunlosky, Rawson, & Middleton, 2005). We suggest, however, that even if learners greatly overestimate their overall memory for text, their absolute accuracy should still be better than that of learners who reread, and retrieval practice may still provide learners with a higher degree of relative accuracy than would rereading, as they may have a good idea of the sections for which they recalled the most information and the sections for which they recalled the least. The above expectation remains uninformed, however, as no studies have directly investigated accuracy of metamemory judgments as a consequence of retrieval practice versus rereading with text materials. Most related is a study in which Roediger and Karpicke (2006b) had participants repeatedly recall or reread a short passage and then rate their likelihood of future recall. They found that participants gave higher ratings in the repeated reading condition than in the repeated recall condition (suggesting that learners underestimate the benefit of testing relative to rereading overall); this study, however, did not address the extent to which learners make accurate predictions about what they know, either in terms of absolute or relative accuracy. 
Although little research has investigated metamemory with text, much research has investigated metacomprehension with text (i.e., predictions of future ability to answer inference questions; e.g., Dunlosky et al., 2005; Rawson, Dunlosky, & Thiede, 2000). Yet, none has done so in a manner that directly contrasts retrieval practice with rereading. Most related, Thiede and Anderson (2003; see also Anderson & Thiede, 2008) examined the effect of summarizing on metacomprehension, demonstrating that summary generation—which shares some commonalities with retrieval practice—can improve metacomprehension accuracy as compared to a no-reread control; a rereading control was not used. Rereading is an important comparison condition for both practical and theoretical reasons, however. In fact, although students report using testing for diagnosing learning, Kornell and Bjork (2007) found that 76 % of students reported using rereading—either of whole chapters or highlighted portions— for study (see also Hartwig & Dunlosky, 2012, for similar results). Furthermore, rereading has been shown to improve metacomprehension (Rawson et al., 2000). Theoretically, rereading improves metacomprehension because it leads learners to allocate relatively more resources to the situation
model (e.g., connection between ideas contained in the text and/or ideas in the text and prior knowledge; Kintsch, 1994) than to the textbase (e.g., details, meaning of sentences) during a second reading than a first (in an undergraduate sample; Stine-Morrow, Gagne, Morrow, & DeWall, 2004; though see Callender & McDaniel, 2009, who suggest that rereading may not improve situation model processing; and see Thiede & Anderson, 2003, for a similar idea with regard to effects of summarizing). Thus, in generalizing these findings to the questions we raise in the present work, the possibility emerges that retrieval practice may not be more beneficial than rereading in terms of metamemory accuracy. To recapitulate, there is a paucity of evidence regarding the extent to which retrieval practice improves learners’ metamemory monitoring accuracy for subsequent recall of text relative to students’ standard strategy of rereading. On the basis of the aforementioned studies utilizing relatively simple materials, one possibility is that retrieval practice will improve monitoring accuracy as compared to rereading, due to increased processing and learners’ ability to use their retrieval-practice experience to estimate learning. On this possibility, performance predictions could well be inflated in the retrieval-practice condition (Dunlosky et al., 2011), but they would still be sensitive to relative performance. The other possibility, based on extrapolation from the results of metacomprehension studies, is that retrieval practice may not produce better monitoring accuracy than rereading because both types of activities may increase situation-model processing. Metacognitive control The second question of interest in the present work was how retrieval practice, as compared to rereading, would influence restudy allocation and the benefit of restudy. 
When students study, they often use judgments or assumptions about what they do or do not know to decide what to restudy (e.g., Kornell & Metcalfe, 2006; Nelson, Dunlosky, Graf, & Narens, 1994). Existing work has focused largely on what people choose to study (Metcalfe & Kornell, 2005; but see Soderstrom & Bjork, 2014); little work has investigated how much time people spend restudying information, particularly text (but see Mazzoni & Cornoldi, 1993, who used transitive sentences). An examination of this latter situation is important because, as previously stated, students report rereading whole chapters. With respect to metacognitive control, our interest in the present study was how participants would allocate time to restudying sentences in text passages as a function of their metamemory judgments and their performance (i.e., during retrieval practice) and how such restudy would influence final-test performance.
Restudy allocation Much research has examined restudy decisions following metamemory judgments. Importantly, views vary regarding how learners allocate future study (see, e.g., Metcalfe & Kornell, 2005). Two prominent views are that (a) learners will allocate attention toward information given the lowest judgments of learning (JOLs) or that they cannot recall (i.e., discrepancy reduction; e.g., Dunlosky & Hertzog, 1998; Dunlosky & Thiede, 1998; Thiede et al., 2003), or (b) they will use a combination of JOLs and likelihood of future learning, allocating the most time to the easiest unknown information or information given moderate JOL ratings (i.e., region of proximal learning; Metcalfe & Kornell, 2005), perhaps not even allocating any study time to the most difficult items (e.g., Son & Metcalfe, 2000; Thiede & Dunlosky, 1999; but see also Ariel, Dunlosky, & Bailey, 2009, for agenda-based control in which participants allocate study time on the basis of specific goals). For the purposes of the present article, we will blend these two views into the informed-decision hypothesis, a broader view of Nelson and Leonesio’s (1988) monitoring-affects-control hypothesis, which here encompasses JOLs and retrieval-practice performance as predictors of study time allocation. This hypothesis predicts that the correlation between JOLs and restudy time and the correlation between retrieval-practice performance and restudy time in the retrieval-practice condition will both be negative. The notion that restudy time will be allocated on the basis of prior retrieval experience gains some support from previous research (Koriat & Bjork, 2006; Mazzoni & Cornoldi, 1993; Soderstrom & Bjork, 2014). In Mazzoni and Cornoldi’s Experiment 1, participants examined 40 sentences (e.g., “The bride is drinking coffee”) and made JOLs, tried to recall the sentences in a first free-recall test, restudied the sentences at their own pace, made a second set of JOLs, and finally took a free-recall test. 
Restudy time was predicted by test performance such that unrecalled sentences were studied longer than recalled ones. In their Experiment 4 (but not Exp. 1), restudy time was predicted by JOLs. If the informed decision hypothesis has merit for coherent text, then in the present experiments, both JOLs and retrieval-practice performance would be associated with restudy time decisions. It is possible that the just-mentioned patterns might not emerge, however. Because our sentences were contained within a coherent text (unlike Mazzoni & Cornoldi, 1993), it may be less likely that participants would realize, upon reading a sentence, that their memory for that sentence was low and allocate time accordingly, especially if they had an overall sense of fluency as a consequence of rereading the familiar sentences within context of other familiar information (Jacoby & Whitehouse, 1989). Related to this idea is the notion that participants can develop an “illusion of knowing.” Glenberg, Wilkinson, and Epstein (1982), for example, showed that
participants fail to find contradictions in text even when they rate their comprehension as high. A hypothesis based on fluency or an “illusion of knowing” would predict that neither JOLs nor retrieval-practice performance would be related to restudy time. If we were to find that metacognitive control is consistent with the informed-decision hypothesis, however, a critical question that follows would be whether restudy policies differ following retrieval practice versus rereading. Our hypothesis was that because participants would make JOLs for small sections of text in both retrieval-practice and rereading conditions, to the extent that they use JOLs to inform restudy, they would use those judgments to inform restudy similarly in both conditions. Metcalfe and Finn (2008) showed that study choices were directly related to JOLs, even when those JOLs were not representative of actual learning (see also Koriat & Bjork, 2006; Rhodes & Castel, 2008). Benefits of restudy Finally, we were interested in how restudy would affect performance in the retrieval-practice condition versus in the rereading condition. Our prediction (examined in Exp. 2) was that restudy would be more beneficial following retrieval practice than following rereading. Specifically, although we expected the restudy condition to show higher performance than a no-restudy condition in both retrieval-practice and reread conditions, we expected the difference to be greater in the retrieval-practice condition (e.g., following from the bifurcation model; Kornell, Bjork, & Garcia, 2011). Such a result would be consistent with recent work demonstrating test-potentiated learning (Arnold & McDermott, 2013, with free recall of related words) but has not, to our knowledge, been demonstrated with retrieval practice (i.e., free recall) of text materials.
The present experiments In the present experiments, we examined how retrieval practice influences metamemory monitoring and control relative to rereading text. In two experiments, participants read about the geography, climate, and people of six regions of the world. They then either reread the texts or engaged in retrieval practice (i.e., free-recall test), estimated the number of facts they would recall 48 h later for each of the 18 sections using cue-only prompts (e.g., Norway: Climate), and restudied the information (for every region, studied sentence by sentence, in Exp. 1; or pertaining to only half of the regions, studied section by section in Exp. 2). After a 48-h delay, they took a final free-recall test. Experiment 1 provided the clearest examination of restudy allocation on the basis of retrieval-practice performance. Experiment 2 provided the clearest assessment of metamemory monitoring and the benefit of restudy.
Experiment 1 Method Participants A total of 81 participants (42 in the retrieval-practice condition, 39 in the reread condition) from the Washington University in St. Louis community participated individually for course credit or payment ($20 for two 1-h sessions). Materials We used six passages about regions of the world (Norway, Australia, Africa, Canada, Greenland, and Siberia), with each passage containing one section about each of three topics: geography, climate, and people (18 sections, average length = 134 words, SD = 19.8). Each section contained eight facts. Sections had an average coherence rating of .32 (SD = .12, range = .16–.65; latent semantic analysis: Landauer & Dumais, 1997; see also Landauer, Foltz, & Laham, 1998) and an average readability rating of 45.3 (SD = 10.55, range = 30.8–64.7; Flesch–Kincaid index: Flesch, 1948). An example of the three sections pertaining to one region (i.e., Norway) is presented in the Appendix. To foreshadow, readability ratings, coherence ratings, and number of words in each section did not predict average retrieval-practice performance, JOL ratings, or final-test performance for those sections, with one exception: the correlation between LSA scores and retrieval-practice performance was significant, r(18) = –.54, p = .02 (less coherent sections were associated with better retrieval-practice performance).
Procedure Participants had 12 min to read the passages, with each passage presented on a computer screen for 2 min (passages were always presented in the same order). After participants read all of the information, they were randomly assigned to review the information either with a free-recall test (i.e., retrieval-practice condition) or re-presentation (i.e., reread condition). Across conditions, half of the participants were randomly assigned to review information by region (i.e., in the same order as initial study); the remainder reviewed information by topic (e.g., geography). When reviewing by region, participants had 3 min to review each passage (i.e., retrieve or reread, depending on review condition), for a total of 18 min. Specifically, in the
retrieval-practice condition, participants were given six sheets of paper with the region designated on the top of each sheet and the topics (i.e., Geography, Climate, People) listed below and asked to write down as many facts as possible within the time limit. They were told that they should not worry about writing complete sentences. The names of the studied regions were presented on the computer screen for 3 min each, and a chime alerted participants when each 3-min interval elapsed. In the reread condition, participants were given six sheets of paper with the region designated on the top of each sheet and with the passage below and were provided with the same timing mechanism as were the participants in the retrieval-practice condition. When reviewing information by topic, participants had 6 min to review information pertaining to each of the three topics (Geography, Climate, People), for a total of 18 min. Specifically, in the retrieval-practice condition, participants were given three sheets of paper with the topic (e.g., Geography) designated on the sheet and the regions listed below and were asked to write down as many facts as possible within the time limit, and were also told that they should not worry about writing complete sentences. In the reread condition, participants were given three two-sheet packets, with each sheet designating the topic and with the sections of the passages pertaining to that topic below. The names of the topics were presented on the computer screen for 6 min each, and a chime alerted participants when each 6-min interval had elapsed. After the review activity, all participants were told that each topic section of each region had eight facts, and they were asked to estimate the number of facts that they would be able to recall for each of these sections when they returned in two days. 
Participants were given the name of the region and the topic (e.g., Norway: Geography) for each section, and these cues were presented in the same order as during the review phase. Next, all participants were told that they would have one more chance to restudy the information; sentences would be presented one at a time on the computer screen, and they could spend as much or as little time as they wanted on each sentence, moving to the next sentence by pressing Enter. Sentences were presented in the same order as in the initial reading, and the name of the specific section (e.g., Norway: Geography) preceded the eight sentences for each section. The amount of time that participants spent restudying each sentence was recorded, with a maximum of 30 s per sentence. The final free-recall test took place 48 h after the first session. Participants were provided with six sheets of paper, with a region designated on each sheet. They were given unlimited time to write down as many facts as they could remember in any order, but were told that they were required to try for at least 20 min.
Results Scoring On both the retrieval practice and the final test, each of the 144 facts received a score of 1, .5, or 0. Scores of .5 were awarded to responses that were correct but vague or correct excepting an omission of an important detail (e.g., stating that a large sandstone rock exists in the middle of Australia without providing the name of the formation, Uluru). In cases in which a “fact” contained multiple parts, and the recall of one part would diminish the likelihood of recalling the other part (due, for example, to redundancy in information), 1 point was awarded for a correct response (e.g., recalling that Africa has over a billion people or 12 % of the Earth’s population would each earn 1 point). No points were awarded for responses that clearly contained commission errors. The scoring guide for the Norway passage is presented as Appendix Table 1. The first author scored the data for 75 % of the participants, and a research assistant scored the data for 50 % of the participants, with the data for 25 % of the participants being scored by both graders. Interrater reliability for the scoring of the overlapping retrieval-practice and final-test assessments was substantial, κ = .738, p < .001. The research assistant’s scoring was used for the twice-scored data, so that each scorer would provide half of the data in the analyses. We found no significant effects (either main effects or interactions) of the order in which information was reviewed during rereading or retrieval practice (i.e., by region or by topic) on judgments of learning, rereading time, or performance, and thus, this variable will not be discussed further. Recall performance Participants recalled 29.0 (SD = 11.8) facts during retrieval practice. On the final test, participants in the retrieval-practice condition recalled more facts (M = 33.7, SE = 2.2) than did participants in the reread condition (M = 25.1, SE = 2.0), t(79) = 2.89, p < .01, d = 0.65. 
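As an illustration of the interrater reliability statistic mentioned in the scoring description, the following is a minimal sketch of an unweighted Cohen's kappa for two graders who assign each fact a score of 1, .5, or 0. The article reports only the value of κ, so the unweighted form is an assumption, and the function name and the example scores are hypothetical, not data from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters' categorical scores.

    Assumption: the three score values (1, .5, 0) are treated as
    unordered categories; the article does not state which variant
    of kappa was computed.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty score lists")
    n = len(rater_a)
    # Proportion of facts on which the two graders gave the same score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each grader's marginal frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both graders used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical scores for eight facts (illustration only).
grader_1 = [1, 1, 0.5, 0, 0, 1, 0.5, 0]
grader_2 = [1, 0.5, 0.5, 0, 0, 1, 0, 0]
print(round(cohens_kappa(grader_1, grader_2), 3))
```

Because the scores are ordered, a weighted kappa would also be a defensible choice; the sketch uses the simpler unweighted form.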
Metacognition Judgments of learning Participants in the retrieval-practice condition provided lower estimates of future recall (M = 49.6, SE = 2.9) than did participants in the reread condition (M = 79.7, SE = 3.3), t(79) = 6.91, p < .001, d = 2.7. Restudy time Total restudy time did not differ between the retrieval-practice condition (M = 12.9 min, SE = 0.8) and the reread condition (M = 12.5 min, SE = 0.9), t(78) = 0.39, p = .70.
Because sentence and section lengths varied, we divided the restudy time for each sentence or section by the number of words in that sentence or section, respectively, depending on the level of analysis. Subsequent analyses used restudy times in terms of milliseconds per word (ms/word; Maki & Serra, 1992).
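The length normalization described above can be sketched as follows. This is a minimal illustration that assumes words are counted by splitting on whitespace (the article does not say how words were counted); the function name, the sentence, and the timing are hypothetical.

```python
def ms_per_word(restudy_ms, text):
    """Return restudy time in milliseconds per word for one sentence or section.

    Assumption: a "word" is any whitespace-delimited token.
    """
    words = text.split()
    if not words:
        raise ValueError("text contains no words")
    return restudy_ms / len(words)


# Hypothetical sentence and restudy time (not taken from the study materials).
sentence = "Norway has a long and rugged coastline"
print(ms_per_word(2450, sentence))  # 2450 ms over 7 words -> 350.0
```

The same function applies at the section level by passing the summed restudy time and the full section text.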
JOLs and restudy time All of the following correlations are expressed in terms of gamma correlations (Goodman & Kruskal, 1954), due to the expected differences in performance and JOL ratings between the retrieval-practice and reread groups (see Nelson, 1984), and they were calculated on a participant-by-participant basis and then averaged across participants. First, we analyzed the correlations between restudy times and JOLs. Overall, the correlations (M = –.10) were significantly lower than zero, t(79) = 3.74, p < .001. The correlation in the retrieval-practice condition (M = –.13, SE = .04), however, was not different from that in the reread condition (M = –.07, SE = .04), t(77) = 1.07, p = .29, suggesting that although participants spent more time restudying information given lower JOLs, participants in the two conditions did not differ in the extents to which they relied upon JOLs to guide that restudy, as determined by our correlational measure. Importantly, however, gamma correlations could mask a nonlinear relationship between JOLs and restudy time, a relationship that could be consistent with a study policy based on something other than strict discrepancy reduction (e.g., region of proximal learning; Metcalfe & Kornell, 2005). Figure 1 plots the restudy times in the retrieval-practice and reread conditions for each section on the basis of the JOLs made for those sections, and as is indicated there, learners allocated the most time toward sections that were given moderately low (but not the lowest) JOLs.
Retrieval-practice performance and restudy time The correlation between section-by-section retrieval-practice performance and restudy time was –.06 (SE = 0.03), which was reliably different from zero, t(41) = 2.25, p = .03, indicating that participants spent more time studying sections that were associated with lower initial recall. This association was marginally weaker, however, than the correlation between JOLs and restudy time reported above, t(42) = 1.77, p = .09. More informative for the purposes of examining retrieval-practice performance on restudy time, we also calculated the average restudy times (ms/word in each sentence) for each fact as a consequence of whether it earned 0, .5, or 1 point during retrieval practice, and we averaged those restudy times for each participant. Participants spent averages of 330 (SE = 21), 352 (SE = 27), and 372 ms/word (SE = 23) for facts that received full credit, half credit, and zero credit, respectively, on the retrieval-practice test, demonstrating a clear linear trend, F(1, 41) = 7.83, ηp2 = .16, MSE = 4,655, p < .01. Planned pairwise comparisons suggest that facts that received full credit received less subsequent study time than did facts given zero credit, t(42) = 2.87, p < .01.
JOLs and final-test performance Pertaining to absolute accuracy, participants in the retrieval-practice condition made section-by-section predictions that more closely resembled their actual performance (difference of 0.9 facts per section, SE = .19) than did participants in the reread condition (difference of 3.0 facts per section, SE = .19), t(79) = 7.95, p < .001, d = 1.77, with both groups overestimating their future recall. Pertaining to the relative accuracy of their judgments, correlations between section-by-section JOLs and section-by-section final-test performance were higher in the retrieval-practice condition (M = .45, SE = .03) than in the reread condition (M = .23, SE = .04), t(78) = 4.27, p < .001, d = 1.44. The interpretation of these analyses could be challenged because learners made JOLs before the final restudy opportunity. We consider this issue in the next section and provide a more direct assessment of monitoring in Experiment 2.
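The two monitoring measures used in these analyses can be illustrated with a short sketch: a Goodman–Kruskal gamma computed separately for each participant and then averaged (relative accuracy), and the mean difference between predicted and recalled facts per section (absolute accuracy). The function names and example values are hypothetical, and skipping participants whose gamma is undefined is an assumption; the article does not describe how such cases were handled.

```python
from itertools import combinations


def goodman_kruskal_gamma(x, y):
    """Gamma = (C - D) / (C + D) over all pairs of sections.

    C counts concordant pairs and D discordant pairs; pairs tied on
    either variable are ignored. Returns None if every pair is tied.
    """
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    if concordant + discordant == 0:
        return None
    return (concordant - discordant) / (concordant + discordant)


def mean_gamma(participants):
    """Average per-participant gammas (assumption: undefined gammas are skipped)."""
    gammas = [goodman_kruskal_gamma(x, y) for x, y in participants]
    defined = [g for g in gammas if g is not None]
    return sum(defined) / len(defined)


def mean_overestimate(jols, recalled):
    """Mean of (predicted - recalled) facts per section; positive = overestimation."""
    return sum(j - r for j, r in zip(jols, recalled)) / len(jols)


# Hypothetical participant: JOLs (0-8) and final-test scores for four sections.
jols = [6, 4, 7, 3]
recalled = [4, 3, 5, 3.5]
print(goodman_kruskal_gamma(jols, recalled))  # 5 concordant, 1 discordant pair
print(mean_overestimate(jols, recalled))      # 1.125 facts per section
```

The same gamma function applies to the JOL and restudy-time analysis by passing restudy times (ms/word) in place of final-test scores.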
Fig. 1 Average restudy time (in milliseconds/word) for each section as a function of the judgment-of-learning (JOL) rating (i.e., the number of items that participants thought they would be able to recall, on a scale of 0–8) provided for that section in the retrieval-practice and reread conditions in Experiment 1. JOL values with fewer than 12 observations were omitted. Error bars represent ±1 SE.
Discussion In Experiment 1, restudy was differentially allocated on the basis of both JOLs and retrieval-practice performance,
supporting the informed-decision hypothesis. Specifically, JOLs predicted restudy times for participants in both the retrieval-practice and reread conditions, and retrieval-practice performance predicted restudy times in the retrieval-practice condition. The observed association between JOLs and restudy time extends a similar pattern with isolated sentences (Mazzoni & Cornoldi, 1993, Exp. 4). The predictability of restudy time as a function of retrieval-practice performance is consistent with that of Soderstrom and Bjork (2014), who used paired-associate materials. Despite the large differences in JOL magnitudes and accuracy between conditions, learners in the retrieval-practice and rereading conditions did not differ in the amounts of time that they spent restudying or in the extents to which their JOLs predicted restudy time allocation. That their total study times did not differ was especially surprising, given the large difference in JOLs (Schwartz & Efklides, 2012). Pertaining to monitoring, we found some preliminary evidence for the notion that testing improves metamemory accuracy, in terms of both absolute and relative accuracy. Even so, note that learners’ expected performance was about twice as high as their retrieval-practice performance in the retrieval-practice condition, suggesting that learners had inflated estimates of what they recalled, consistent with work pertaining to learners’ inability to accurately grade their definition responses (Dunlosky et al., 2011). More ambiguous is how to interpret the analysis pertaining to relative accuracy because learners made JOLs before a final restudy opportunity, which may have dampened the relationship between JOLs and test performance (monitoring-neutralization hypothesis; Nelson & Leonesio, 1988). 
Although the correlations here may not be entirely accurate measures of relative metamemory accuracy because restudy followed the JOLs, there is reason to believe that the large difference between the correlations may still be meaningful. Some work has suggested that the benefit of restudy would be greater following a test than following rereading (e.g., bifurcation model; Kornell et al., 2011). If this is the case, restudy might be more likely to reduce the predictive power of JOLs in the retrieval-practice condition than in the reread condition. Such a finding should only occur to the extent that learners allocate study toward information given low JOLs and effectively learn that information. Learners, however, may also learn information better (with restudy) when they know the information moderately well (and thus give it high JOLs). Furthermore, participants may allocate attention to unlearned information that they will not then learn, the so-called labor-in-vain effect (Nelson & Leonesio, 1988).
Due to the latter possibilities, we would expect little to no reduction in JOL predictability on final-test performance as a consequence of restudy, a prediction that we tested in Experiment 2.
Experiment 2
The goal of Experiment 2 was to extend the major findings of Experiment 1, as well as to test more directly our assumption that retrieval practice fosters better metamemory monitoring than does rereading. For this reason, we had participants make JOLs for all of the sections, but they only restudied information for half of the regions. This procedure also enabled us to test the hypothesis that restudy is more effective following retrieval practice than following rereading. Specifically, we predicted that the difference in final-test performance between the restudy condition and the no-restudy condition would be greater following retrieval practice than following rereading.
To additionally extend and clarify the Experiment 1 findings, we modified the materials and procedure in several other ways. First, a potential concern with our analysis of restudy times in Experiment 1 is that we transformed restudy times by dividing by the number of words, but number of words does not alone influence reading time. Thus, in Experiment 2 we adopted a different approach: Participants studied the passages for the first time at their own pace (section by section), so that we could compare their restudy times (also section by section) to their initial study times. Specifically, we conducted regression analyses for each participant, with the residuals serving as a measure of their restudy time (see, e.g., Ferreira & Clifton, 1986, Exp. 3). Second, we provided participants with more time during the review phase (24 min, as compared to 18 min in Exp. 1), and participants were allowed to recall or reread the information in any order that they chose (rather than in the two orders that we utilized in Exp. 1).
Rereading or recalling in an unconstrained manner should provide a more conservative test of the benefit of retrieval practice because, in the case of Experiment 2, participants would be able to move between regions during rereading, which should provide them with a chance to compare and contrast the material, a process that might foster elaborative encoding. Finally, participants typed their responses in both conditions. The increase in time, as well as having participants type their responses, was intended to maximize participants’ ability to recall everything that they remembered (nevertheless, performance remained about the same as in Exp. 1).
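The residual-based restudy measure described above can be illustrated with a short sketch. It assumes an ordinary least-squares fit for each participant and residuals scaled by the residual standard deviation; the exact standardization used in the experiment is not specified here, and the function name `restudy_residuals` and the example times are hypothetical.

```python
import numpy as np

def restudy_residuals(initial_s, restudy_s):
    """Regress one participant's restudy times on initial study times.

    Returns standardized residuals: positive values mark sections that were
    restudied longer than the participant's initial reading time predicts.
    Scaling by the residual SD (n - 2 degrees of freedom) is an assumption.
    """
    x = np.asarray(initial_s, dtype=float)
    y = np.asarray(restudy_s, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)      # least-squares line
    resid = y - (slope * x + intercept)         # deviation from prediction
    sd = resid.std(ddof=2)                      # two fitted parameters
    if np.isclose(sd, 0.0):                     # perfectly predictable restudy
        return np.zeros_like(resid)
    return resid / sd

# Hypothetical times (in seconds) for nine sections from one participant.
initial = [30, 42, 55, 38, 60, 47, 51, 35, 44]
restudy = [20, 26, 43, 24, 35, 29, 30, 22, 27]
print(np.round(restudy_residuals(initial, restudy), 2))
```

Because the fit is done separately for each participant, the residuals index which sections a given learner lingered on relative to his or her own reading pace, rather than relative to section length.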
Method
Participants A total of 80 participants (40 in the retrieval-practice condition, 40 in the reread condition) from the Washington University in St. Louis community participated individually for course credit or payment ($20 for two 1-h sessions).
Materials The materials from Experiment 1 were modified to control better for coherence and readability. Specifically, for each section, LSA ratings ranged from .25 to .35, Flesch–Kincaid readability ratings ranged from 45 to 65, and the numbers of words ranged from 93 to 122. Readability ratings, coherence ratings, and numbers of words in each section did not predict average retrieval-practice performance, JOL ratings, restudy time (residuals), or final-test performance for those sections.
Procedure Participants read the passages, self-paced, one section (e.g., Norway: Geography) at a time, with a maximum of 60 s per section, and reading times were recorded. After participants had read all of the passages once, they were randomly assigned either to a retrieval-practice condition (i.e., free-recall test) or to a reread condition. Participants in the retrieval-practice condition were provided with an electronic document labeled with each region and the three topics for each region (e.g., Geography) on the computer screen, and they were told to type as much information as they could remember but that they need not write in complete sentences. They were given 24 min to do this task, and they were allowed to recall information in any order. Participants in the reread condition were given a six-page packet with the information for each region printed on a single page, and they were told that they would have 24 min to restudy the information in any order. After the review activity, all participants made JOLs in the same manner that was described in Experiment 1.
Next, all participants were told that they would have one last chance to study half of the information; sections (e.g., Norway: Geography) would be presented one at a time on the computer screen, and they could spend as much or as little time as they wanted on each section, moving from one section to the next by pressing Enter. Sections were presented in the same order as in the initial reading (either the first or last three passages, counterbalanced across participants), with a maximum of 75 s per section. Restudy time was recorded. The final free-recall test took place 48 h after the first session. Participants were provided with an electronic document with the name of each region on the computer screen.
They were given unlimited time to type as many facts as they could remember in any order, but they were required to try for at least 20 min.
Results
Scoring The scoring scheme was the same as in Experiment 1. The data for all of the participants were scored by the first author only.
Initial study time Initial study time did not differ between the participants in the retrieval-practice condition (M = 12.0 min, SE = 0.6) and those in the reread condition (M = 12.2 min, SE = 0.7), t(78) = 0.17, p = .87.
Recall performance Participants recalled 31.1 (SD = 12.1) facts during retrieval practice. On the final test, participants in the retrieval-practice condition recalled more facts (M = 30.1, SE = 1.9) than did participants in the reread condition (M = 22.5, SE = 2.2), t(78) = 2.58, p = .01, d = 0.58. We found a reliable interaction, however, between review condition (retrieval practice vs. reread) and whether participants restudied the sections, F(1, 78) = 14.04, MSE = 0.10, ηp² = .15, p < .001. Specifically, although performance was better in the restudy than the no-restudy condition, the difference between no restudy (M = 12.1, SE = 0.9) and restudy (M = 18.0, SE = 1.1) was greater in the retrieval-practice condition, t(39) = 8.95, p < .001, d = 1.54, than in the reread condition (M = 10.0, SE = 1.2 vs. M = 12.6, SE = 1.1), t(39) = 4.35, p < .001, d = 0.69, suggesting that restudy was more effective following retrieval practice than following rereading.
Metacognition
Judgments of learning Participants in the retrieval-practice condition provided lower estimates of future recall (M = 45.3, SE = 2.6) than did participants in the reread condition (M = 69.1, SE = 3.6), t(72) = 5.33, p < .001, d = 1.2.
Restudy time Total restudy time (for half of the regions) did not differ between the retrieval-practice condition (M = 6.6 min, SE = 0.4) and the reread condition (M = 5.8 min, SE = 0.4), t(77) = 1.41, p = .16.
In the present experiment, we conducted a regression analysis for each participant based on the predictability of restudy time on the basis of initial study time, with the standardized residuals (i.e., deviations from initial study time) serving as
the dependent measure of interest (see, e.g., Ferreira & Clifton, 1986, Exp. 3). Twelve participants read the paragraphs for the full time allotted during initial study, restudy, or both, and thus were excluded from analyses pertaining to restudy times.
JOLs and restudy time First, we analyzed correlations between restudy-time residuals and JOLs. Overall, the average correlation between restudy-time residuals and JOLs (M = –.12) was significantly lower than zero, t(66) = 2.72, p < .01. The correlation in the retrieval-practice condition (M = –.12, SE = .06), however, was not different from that in the reread condition (M = –.12, SE = .07), t(64) = 0.02, p = .99, consistent with the results in Experiment 1. Figure 2 shows restudy-time residuals as a function of JOLs.
Retrieval-practice performance and restudy time Section-by-section retrieval-practice performance, like JOLs, was negatively related to restudy-time residuals (M = –.07, SE = .06). Although the correlation was numerically similar to that shown in Experiment 1, it was not different from zero in the present experiment, t(33) = 1.10, p = .28.
JOLs and final-test performance Pertaining to absolute accuracy, participants in the retrieval-practice condition made section-by-section predictions that more closely resembled their actual performance (restudy, difference of 0.6 facts per section, SE = 0.16; no restudy, M = 1.3, SE = 0.17) than did participants in the reread condition (restudy, difference of 2.6 facts per section, SE = 0.23; no restudy, M = 3.1, SE = 0.23), F(1, 78) = 49.60, MSE = 2.96, ηp² = .39, p < .001. Pertaining to the relative accuracy of their judgments, correlations between section-by-section JOLs and section-by-section final-test performance were significantly higher in the retrieval-practice condition (restudy, M = .45, SE = .05; no restudy, M = .47, SE = .05) than in the reread condition (restudy, M = .20, SE = .07; no restudy, M = .23, SE = .05), F(1, 74) = 14.84, MSE = 0.16, ηp² = .17, p < .001.
Neither the presence of restudy nor the interaction between review condition and restudy was reliable, Fs < 1.
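Relative accuracy of this kind is commonly indexed with the Goodman–Kruskal gamma correlation (Goodman & Kruskal, 1954), computed within each participant across sections. The sketch below shows one way to compute it from a participant's section-level JOLs and final-test scores. The function name and the example values are hypothetical, and returning `None` when every pair is tied is a design choice of this sketch, not a detail taken from the experiments.

```python
from itertools import combinations

def goodman_kruskal_gamma(jols, recalled):
    """Gamma = (C - D) / (C + D) over all pairs of sections.

    A pair is concordant (C) when the section with the higher JOL also has
    the higher recall score, discordant (D) when the order is reversed.
    Pairs tied on either variable are ignored.
    """
    concordant = discordant = 0
    for (j1, r1), (j2, r2) in combinations(zip(jols, recalled), 2):
        product = (j1 - j2) * (r1 - r2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    if concordant + discordant == 0:
        return None  # undefined: every pair is tied on at least one variable
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical JOLs (0-8) and facts recalled (0-8) for nine sections.
jols = [6, 4, 7, 2, 5, 3, 6, 4, 5]
recalled = [5, 3, 5, 1, 2, 4, 6, 2, 3]
print(goodman_kruskal_gamma(jols, recalled))
```

Averaging these per-participant values within each condition yields group means of the kind reported above.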
Discussion
Although the procedure changed in several ways between Experiments 1 and 2 (i.e., initial study and restudy occurred section by section; initial study was self-paced; participants could retrieve/reread information in any order, and extended time was provided to do so; and only half of the regions were restudied), we generally obtained the same pattern of results. Specifically, retrieval practice was better for metamemory accuracy (and recall performance) than was rereading, and JOLs informed restudy time consistent with the informed-decision hypothesis.
The results of Experiment 2 more clearly demonstrate that retrieval practice improves relative metamemory monitoring accuracy as compared to rereading. Interestingly, however, relative metamemory accuracy (in terms of the gamma correlations between JOLs and final-test performance) was not reliably lower as a consequence of restudy, as compared to no restudy, in either the retrieval-practice or the reread group. This finding suggests that restudying after one makes JOLs does not reduce the predictive power of those JOLs, a finding that was anticipated in the Discussion of Experiment 1 and that we will return to in the General Discussion.
In Experiment 2, we also demonstrated that restudy was more effective following retrieval practice than following rereading. Specifically, the difference between the no-restudy condition and the restudy condition was greater following retrieval practice than following rereading. At least two possibilities could explain this pattern. One is that more accurate metamemory produced more accurate modulation of restudy times across learned and unlearned material. Another possibility is that trying to retrieve information itself produced better subsequent restudy of that information, that is, test-potentiated learning (Arnold & McDermott, 2013; Kornell, Hays, & Bjork, 2009; see Kornell et al., 2011, for discussion of the bifurcation model). These possibilities are not mutually exclusive, and future research will be warranted to better understand why retrieval practice enhances restudy in this context.
Fig. 2 Average restudy-time residuals for each section as a function of the judgment-of-learning (JOL) rating (i.e., the number of items that participants thought they would be able to recall, on a scale of 0–8) provided for that section in the retrieval-practice and reread conditions in Experiment 2. JOL values with fewer than 12 observations were omitted. Error bars represent ±1 SE.
General discussion
Testing is presumed to improve learning in educational contexts, in part because it gives the learner metacognitive
insights that guide future study (Hartwig & Dunlosky, 2012; Kornell & Bjork, 2007; Roediger, Putnam, & Smith, 2011). With memory for text materials, these theoretical assumptions have remained largely untested; the present experiments, however, provide the needed support for these assumptions.
Metamemory monitoring
As compared to participants in the retrieval-practice condition, participants in the reread condition provided higher JOLs but had lower final-test performance (even with no differences in restudy time), a finding that is consistent with the previous literature (Roediger & Karpicke, 2006b). A novel contribution of the present study is that retrieval practice improved the accuracy, particularly the relative accuracy, of metamemory monitoring as compared to rereading, both when retrieval practice or rereading occurred in a rigid section-by-section review (Exp. 1) and when it occurred in a more open-ended review (Exp. 2). The finding of improved resolution between JOLs and final-test performance in the retrieval-practice condition, relative to the reread condition, was most clear in Experiment 2, in which participants made JOLs for passages that they were not able to restudy.
Participants in the retrieval-practice condition did not, however, have completely accurate metamemory: Their predictions for future recall greatly exceeded their performance during retrieval practice, suggesting that they grossly overestimated what they had been able to recall. We anticipated these inflated predictions on the basis of work showing that learners cannot accurately assess their answers, even when provided with a key (Dunlosky et al., 2011), and this finding demonstrates the importance of investigating the influence of retrieval practice on metamemory using text materials.
With word pairs, it is much easier for learners to assess whether their responses are correct, although it should be noted that even with simpler materials, JOLs that depend heavily on retrieval processes (e.g., delayed JOLs) are not immune to overconfidence (Finn, 2008). We had participants make memory predictions for sections of text. Although the JOL cues (e.g., Norway: Geography) resembled cue-only JOLs (e.g., VIKING–?, as have been used in many delayed-JOL studies) in the sense that the target (text) was not provided and we had participants make judgments at a delay after encoding, our findings do not align with the typical delayed-JOL finding; that is, retrieval-practice JOLs and reread JOLs were not similarly accurate. Nelson and Dunlosky (1991; see also Dunlosky & Nelson, 1992) argued that delayed cue-only JOLs can lead to more accurate monitoring, because learners spontaneously retrieve information from long-term memory in order to make an informed JOL (i.e., the metamemory hypothesis). Another possibility is that the successful covert retrievals (which were assigned high JOLs) receive an additional opportunity for spaced practice,
whereas the unsuccessful retrievals (which were assigned low JOLs) do not (e.g., the memory hypothesis: Kimball & Metcalfe, 2003; Spellman & Bjork, 1992; but see Rhodes & Tauber, 2011, for a review of delayed JOLs on metacognitive accuracy). Critically, however, both the metamemory and memory accounts assume that learners covertly retrieve information while making JOLs, and that this covert retrieval is necessary for increased accuracy. Given our procedure, we think it unlikely that learners covertly recalled all of the information that they needed to make accurate JOLs because each JOL corresponded to eight facts. To understand better why JOLs are more accurate in our retrieval-practice condition than in our reread condition in the present experiments, one should examine how JOLs made to cue-plus-text (e.g., analogous to cue–target JOLs) would differ in retrieval-practice versus reread conditions or differ from the cue-only JOLs in the present study.
Metamemory versus metacomprehension
Research exploring the benefit of retrieval practice on metamemory has largely relied on paired associates (e.g., Dunlosky & Nelson, 1992; Karpicke, 2009; King et al., 1980; Kornell & Rhodes, 2013; Kornell & Son, 2009; Lovelace, 1984; Tullis et al., 2013). In contrast, in the present experiments, participants were presented with a large amount of confusable information pertaining to different regions of the world in the form of text, and our goal was to assess the extent to which retrieval practice would improve their memory and metamemory for details pertaining to the different regions. Nevertheless, one possibility was that retrieval practice would still foster better metamemory than would rereading, in line with prior work that has examined metamemory for paired associates. We considered an alternative possibility.
Appealing to the metacomprehension literature, we developed the idea that rereading may be similar to retrieval practice in terms of overall metacognition, reasoning that these two types of studying practices may, in principle, foster similar processing. This notion stems from findings suggesting that both rereading (Stine-Morrow et al., 2004) and generative activities (e.g., keyword generation, Thiede et al., 2003; Thiede, Redford, Wiley, & Griffin, 2012; self-explanation, Griffin, Wiley, & Thiede, 2008; concept-map generation, Thiede, Griffin, Wiley, & Anderson, 2010; and summarization, Anderson & Thiede, 2008; Thiede & Anderson, 2003) lead learners to devote relatively more resources to processing of the situation model than to processing of the text base. We did not, however, find equivalent metamemory following retrieval practice and rereading. A way to reconcile our results with those from the metacomprehension literature would be to consider the relationship between the processing that our encoding
activities afforded and the processing required by our metacognitive measures and recall tests (e.g., Thomas & McDaniel, 2007). First, although retrieval practice and summarization are both generative activities, these two activities may foster different types of processing, at least in some circumstances. Because retrieval practice in the present work induced learners to rely upon the retrieval of facts owing, in part, to the nature of the materials, it may have fostered textbase processing rather than situation-model processing. In contrast, generative activities postulated to increase reliance on situation-model processing (i.e., concept-map generation, self-explanation, and summarization) induce the generation of relationships between information that foster increased comprehension, but not necessarily increased ability to recall details. In fact, Thiede and Anderson (2003; see also Anderson & Thiede, 2008) observed that although delayed summarization led to improved metacomprehension, immediate summarization did not. They pointed to participants’ generation of details during immediate recall, and gist during delayed recall, as the reason for the difference, with only gist recall supporting metacomprehension. Second, although metacomprehension and performance on an inference-based test may be supported by increased situation-model processing, metamemory and a final test where performance depends on the recall of details, which we utilized here, may not. Instead, metamemory judgments and a recall-based test may be supported by increased attention to textbase processing. The fact that rereading did not support metamemory or recall to the extent that retrieval practice did is consistent with the idea that rereading may not lead to an increase in textbase processing that fosters memory for details, at least not substantially (Callender & McDaniel, 2009).
Metacognitive control
Metcalfe and Finn (2008) suggested that people use JOLs to inform what they choose to restudy, even when such JOLs do not reflect learning, and we found evidence consistent with this notion. Across the two studies, we observed that restudy time was predicted by JOLs (M = –.11, SE = .02), t(145) = 4.41, p < .001, in a manner consistent with the informed-decision hypothesis, but that JOLs were not relied upon more in the retrieval-practice condition (M = –.13, SE = .04) than in the reread condition (M = –.09, SE = .03), t(143) = 0.64, p = .53, even though JOLs were much more accurate in the retrieval-practice than in the reread condition. We posited, on the basis of these data as well as of the work by Metcalfe and Finn, that participants use their JOLs to inform future study, regardless of how accurate those JOLs are, and that they
do so in a manner that allocates more time toward information that is given low ratings. The correlation between JOLs and restudy time was smaller than one might expect on the basis of a strict discrepancy-reduction framework, and we suggest two reasons for why this might have been the case and for why we adopted a more generalized informed-decision framework. First, JOLs were made on a section-by-section basis, but learners may have made more fine-grained restudy decisions on the basis of fact knowledge. In fact, in Experiment 1, we analyzed time per fact and found that learners spent significantly more time on facts that they did not recall than on facts that they did recall; thus, it is possible that JOLs for each fact would be much more highly predictive of restudy time, and examining this issue might be worthwhile for future research. Second, the small correlation may also be the result of learners using a strategy that was not aimed strictly at discrepancy reduction. The pattern of restudy times as a consequence of JOL ratings in Fig. 1, and to a lesser extent in Fig. 2, suggests that learners did not allocate the most time to the sections that were given the lowest JOLs, but instead allocated the most time to sections given low-to-moderate JOLs. Learners may not have found it worthwhile to dedicate time to information in sections that they found particularly difficult, a pattern suggestive of a region-of-proximal-learning framework.
The use of restudy time, rather than restudy decisions, has been uncommon in much of the previous literature on metamemory control. For practical and theoretical reasons, however, we argue that restudy time is an important measure to consider. Restudy time of sections of text, or even of sentences, may offer an opportunity to examine cognitive control in a more subtle manner than do decisions on what to restudy.
For example, although students in educational contexts may make restudy decisions that involve the restudy of only some information, they often report rereading full chapters or sections of chapters (Kornell & Bjork, 2007). In this context, it is useful to understand the extent to which they modulate their study time on the basis of metacognitive insight or previous experience (e.g., practice test performance). Accordingly, future work might fruitfully explore restudy time as a measure of metacognitive control (see also Soderstrom & Bjork, 2014). But was restudy time allocated more effectively in the retrieval-practice condition than in the reread condition? It is clear from Experiment 2 that restudy was more beneficial for later performance in the retrieval-practice condition than in the reread condition, suggesting that participants used the restudy session more effectively in the former than in the latter condition. A remaining question is whether such benefits are the consequence of better-informed JOLs or of other processes afforded by retrieval practice. If, during restudy, participants only learned information to which they had given low JOLs,
one would expect the correlations between JOLs and final-test performance to be lower in the restudy condition than in the no-restudy condition. The fact that correlations were not reliably different in the two conditions, coupled with the finding that the restudy condition performed better on the final recall than did the no-restudy condition, suggests that even though participants spent more time during restudy on information to which they gave low JOLs, they did not necessarily learn disproportionately more of that information than information to which they gave high JOLs. In addition to the possibility that restudy was modulated by JOLs, which were more accurate in the retrieval-practice than in the reread condition, it is possible that restudy was improved following retrieval
practice without the effect being caused directly by the accuracy of the JOLs. In sum, students report rereading—either full texts or highlighted portions—as a favored study strategy, although they also report using retrieval-based study techniques (e.g., flashcards, practice tests) for monitoring purposes. We found that for text materials, retrieval practice improved learners’ ability to monitor their learning as compared to rereading, and restudy was more effective following such practice.
Appendix
Table 1 Scoring guide for Norway passage
Section  Fact  Answer
Geo      1     Scandinavian Peninsula (western portion)
Geo      2     larger than Italy/Great Britain
Geo      3     mountainous/high terrain (uninhabitable)
Geo      4     long/rugged coastlines
Geo      5     fjord = narrow inlet with steep sides (glacial activity)
Geo      6     Longest fjord = Sognefjorden/127 miles
Geo      7     50,000/thousands of islands
Geo      8     Oslo = capital/southeast of country (Oslofjorden)
Clim     1     variation in topography/variation in latitude (13 deg) = varied climate
Clim     2     summers mild for latitude/warming Gulf Stream
Clim     3     average high in capital = 70
Clim     4     warmest temp recorded = 96/Nesbyen
Clim     5     coldest temp = –61 (north)
Clim     6     rainfall heavy in west/88 in. average
Clim     7     little snow on coast/more inland
Clim     8     midnight sun May to late July/arctic circle
Ppl      1     Pop = 5 million
Ppl      2     second least dense country in Europe
Ppl      3     80 % live in urban areas
Ppl      4     predominantly Germanic
Ppl      5     Sami came 10,000 years ago from Asia
Ppl      6     Sami are indigenous/live in far north/central
Ppl      7     increasing #s of immigrants, foreign workers, asylum-seekers: name 2
Ppl      8     immigrants from Pakistan, Somalia, Iraq: name 2
To earn a full point for each fact, participants needed to recall the information presented in the table above, with the following exceptions. A slash mark indicates that participants would receive a full point for recalling one piece of information. Recall of information contained within parentheses was not necessary to earn a full point, but recalling that information would earn half a point. For the 7th and 8th facts in the “People” section, two of the three listed groups would need to be recalled for a full point; the recall of one listed group would earn half a point. Geo, Geography; Clim, Climate; Ppl, People
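The scoring rules in the table note can be expressed as a small sketch. It assumes that a human scorer has already judged which pieces of each fact appear in a participant's response, so the code only applies the point rules. It also assumes that recalling only the parenthetical information earns half a point, which is one reading of the note. The `Fact` class, the `score_fact` function, and the piece labels are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fact:
    alternatives: tuple   # pieces separated by a slash in the scoring guide
    optional: tuple = ()  # pieces shown in parentheses
    need: int = 1         # pieces required for a full point ("name 2" facts use 2)

def score_fact(fact, recalled):
    """Return 1.0, 0.5, or 0.0 for one fact, given the set of recalled pieces."""
    hits = sum(1 for piece in fact.alternatives if piece in recalled)
    if hits >= fact.need:
        return 1.0        # enough of the listed pieces were recalled
    if fact.need > 1 and hits >= 1:
        return 0.5        # "name 2" fact with only one listed group recalled
    if any(piece in recalled for piece in fact.optional):
        return 0.5        # only the parenthetical information was recalled
    return 0.0

# Three entries modeled on the Norway guide (labels are illustrative).
fjord = Fact(alternatives=("Sognefjorden", "127 miles"))
terrain = Fact(alternatives=("mountainous", "high terrain"),
               optional=("uninhabitable",))
origins = Fact(alternatives=("Pakistan", "Somalia", "Iraq"), need=2)

section_score = (score_fact(fjord, {"127 miles"})
                 + score_fact(terrain, {"uninhabitable"})
                 + score_fact(origins, {"Iraq"}))
print(section_score)
```

Summing `score_fact` over the eight facts in a section gives a section score on the same 0–8 scale as the JOL ratings.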
Author note J.L.L. is now at the Department of Psychology, Hillsdale College. We thank Gabrielle Dinkin for her help with data collection and coding of recall responses in Experiment 1. The research reported here was supported by a Collaborative Activity Award from the James S. McDonnell Foundation’s 21st Century Science Initiative in Bridging Brain, Mind and Behavior.
Below is the passage about Norway used in Experiment 1. Table 1 shows the scoring guide for the Norway passage.
Norway
Geography
Norway comprises the western portion of the Scandinavian Peninsula. Norway is larger than either Italy or Great Britain. Much of the country is dominated by mountainous or high terrain, making much of the land uninhabitable. The country has one of the longest and most rugged coastlines in the world. The rugged coastline is marked by fjords, which are long, narrow inlets with steep sides or cliffs, created in a valley carved out by glacial activity. The longest fjord in Norway is the Sognefjorden (127 miles). Additionally, there are some 50,000 islands off of the coastline. The capital, Oslo, is located in the southeast of the country, at the end of the Oslofjorden.
Climate
Norway’s climate shows great variation, due in part to the 13-degree span in latitude and the rugged topology. Summers are remarkably mild for the latitude; Norway’s (reasonably) temperate climate is the result of the warming Gulf Stream. The average high temperature in the capital during summer is 70 degrees Fahrenheit. The warmest temperature ever recorded in Norway was 96 degrees Fahrenheit in Nesbyen (in the south). The winters, however, can be very cold: The coldest temperature recorded was –61 degrees in the north. Rainfall is very heavy in the west with an average of about 88 inches yearly. Although it doesn’t snow much along the western coast, the inland areas, even just a short distance away, receive a lot of snow. Interestingly, from late May to late July, the sun never completely descends beneath the horizon in areas in Norway that lie north of the Arctic Circle, a phenomenon known as “midnight sun.”
People
Norway has a population of 5 million people, which is very small for its size. It is the second least densely populated country in Europe. Eighty percent of Norwegians live in urban areas. Ethnically, Norwegians are predominantly Germanic. There are also Sami communities; the Sami came to the area more than 10,000 years ago from central Asia. The Sami are considered an indigenous people and traditionally live in the central and northern parts of Norway. In recent years, Norway has become home to increasing numbers of immigrants, foreign workers, and asylum-seekers from various parts of the world. Immigrants from outside of Europe are primarily Pakistani, Somali, and Iraqi.
References
Anderson, M. C. M., & Thiede, K. W. (2008). Why do delayed summaries improve metacomprehension accuracy? Acta Psychologica, 128, 110–118. doi:10.1016/j.actpsy.2007.10.006
Ariel, R., Dunlosky, J., & Bailey, H. (2009). Agenda-based regulation of study-time allocation: When agendas override item-based monitoring. Journal of Experimental Psychology: General, 138, 432–447. doi:10.1037/a0015928
Arnold, K. M., & McDermott, K. B. (2013). Test-potentiated learning: Distinguishing between direct and indirect effects of tests. Journal of Experimental Psychology: Learning, Memory, and Cognition, 39, 940–945. doi:10.1037/a0029199
Callender, A. A., & McDaniel, M. A. (2009). The limited benefits of rereading educational texts. Contemporary Educational Psychology, 34, 30–41.
deWinstanley, P. A., & Bjork, E. L. (2004). Processing strategies and the generation effect: Implications for making a better reader. Memory & Cognition, 32, 945–955. doi:10.3758/BF03196872
Dunlosky, J., Hartwig, M. K., Rawson, K. A., & Lipko, A. R. (2011). Improving college students’ evaluation of text learning using idea-unit standards. Quarterly Journal of Experimental Psychology, 64, 467–484.
Dunlosky, J., & Hertzog, C. (1998). Training programs to improve learning in later adulthood: Helping older adults educate themselves. In D. J. Hacker, J. Dunlosky, & A. C. Graesser (Eds.), Metacognition in educational theory and practice (pp. 249–275). Mahwah: Erlbaum.
Dunlosky, J., & Matvey, G. (2001). Empirical analysis of the intrinsic–extrinsic distinction of judgments of learning (JOLs): Effects of relatedness and serial position on JOLs. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27, 1180–1191. doi:10.1037/0278-7393.27.6.1180
Dunlosky, J., & Nelson, T. O. (1992). Importance of the kind of cue for judgments of learning (JOL) and the delayed-JOL effect. Memory & Cognition, 20, 373–380.
Dunlosky, J., Rawson, K. A., & Middleton, E. L. (2005). What constrains the accuracy of metacomprehension judgments? Testing the transfer-appropriate-monitoring and accessibility hypotheses. Journal of Memory and Language, 52, 551–565.
Dunlosky, J., & Thiede, K. W. (1998). What makes people study more? An evaluation of factors that affect self-paced study. Acta Psychologica, 98, 37–56.
Ferreira, F., & Clifton, C. (1986). The independence of syntactic processing. Journal of Memory and Language, 25, 348–368.
Finn, B. (2008). Framing effects on metacognitive monitoring and control. Memory & Cognition, 36, 813–821. doi:10.3758/MC.36.4.813
Flesch, R. (1948). A new readability yardstick. Journal of Applied Psychology, 32, 221–223.
Glenberg, A. M., Wilkinson, A. C., & Epstein, W. (1982). The illusion of knowing: Failure in the self-assessment of comprehension. Memory & Cognition, 10, 597–602.
Goodman, L. A., & Kruskal, W. H. (1954). Measures of association for cross classifications. Journal of the American Statistical Association, 49, 732–764.
Griffin, T. D., Wiley, J., & Thiede, K. W. (2008). Individual differences, rereading, and self-explanation: Concurrent processing and cue validity as constraints on metacomprehension accuracy. Memory & Cognition, 36, 93–103. doi:10.3758/MC.36.1.93
Hartwig, M. K., & Dunlosky, J. (2012). Study strategies of college students: Are self-testing and scheduling related to achievement? Psychonomic Bulletin & Review, 19, 126–134.
Jacoby, L. L., & Whitehouse, K. (1989). An illusion of memory: False recognition influenced by unconscious perception. Journal of Experimental Psychology: General, 118, 126–135. doi:10.1037/0096-3445.118.2.126
Karpicke, J. D. (2009). Metacognitive control and strategy selection: Deciding to practice retrieval during learning. Journal of Experimental Psychology: General, 138, 469–486. doi:10.1037/a0017341
Kimball, D. R., & Metcalfe, J. (2003). Delaying judgments of learning affects memory, not metamemory. Memory & Cognition, 31, 918–929.
King, J. F., Zechmeister, E. B., & Shaughnessy, J. J. (1980). Judgments of knowing: The influence of retrieval-practice. American Journal of Psychology, 93, 329–343.
Kintsch, W. (1994). Text comprehension, memory, and learning. American Psychologist, 49, 294–303.
Kolers, P. A. (1976). Reading a year later. Journal of Experimental Psychology: Human Learning and Memory, 2, 554–565. doi:10.1037/0278-7393.2.5.554
Koriat, A., & Bjork, R. A. (2006). Illusions of competence during study can be remedied by manipulations that enhance learners’ sensitivity to retrieval conditions at test. Memory & Cognition, 34, 959–972. doi:10.3758/BF03193244
Kornell, N., & Bjork, R. A. (2007). The promise and perils of self-regulated study. Psychonomic Bulletin & Review, 14, 219–224. doi:10.3758/BF03194055
Kornell, N., Bjork, R. A., & Garcia, M. A. (2011). Why tests appear to prevent forgetting: A distribution-based bifurcation model. Journal of Memory and Language, 65, 85–97.
Kornell, N., Hays, M., & Bjork, R. A. (2009). Unsuccessful retrieval attempts enhance subsequent learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 35, 989–998. doi:10.1037/a0015729
Kornell, N., & Metcalfe, J. (2006). Study efficacy and the region of proximal learning framework. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32, 609–622. doi:10.1037/0278-7393.32.3.609
Kornell, N., & Rhodes, M. G. (2013). Feedback reduces the metacognitive benefit of tests. Journal of Experimental Psychology: Applied, 19, 1–13.
98 Kornell, N., & Son, L. K. (2009). Learners’ choices and beliefs about selftesting. Memory, 17, 493–501. Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato’s problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104, 211–240. doi:10.1037/0033-295X.104.2.211 Landauer, T. K., Foltz, P. W., & Laham, D. (1998). An introduction to semantic analysis. Discourse Processes, 25, 259–284. doi:10.1080/ 01638539809545028 Lovelace, E. A. (1984). Metamemory: Monitoring future recallability during study. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10, 756–766. Maki, R. H., & Serra, M. (1992). The basis of test predictions for text material. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 116–126. doi:10.1037/0278-7393.18.1.116 Mazzoni, G., & Cornoldi, C. (1993). Strategies in study time allocation: Why is study time sometimes not effective? Journal of Experimental Psychology: General, 122, 47–60. McDaniel, M. A., & Masson, M. E. (1985). Altering memory representations through retrieval. Journal of Experimental Psychology: Learning, Memory, and Cognition, 11, 371–385. doi:10.1037/ 0278-7393.11.2.371 Metcalfe, J., & Finn, B. (2008). Evidence that judgments of learning are causally related to study choice. Psychonomic Bulletin & Review, 15, 174–179. doi:10.3758/PBR.15.1.174 Metcalfe, J., & Kornell, N. (2005). A region of proximal learning model of study time allocation. Journal of Memory and Language, 52, 463–477. Nelson, T. O. (1984). A comparison of current measures of accuracy of feeling-of-knowing predictions. Psychological Bulletin, 95, 109– 133. doi:10.1037/0033-2909.95.1.109 Nelson, T. O., & Dunlosky, J. (1991). When people’s judgments of learning (JOLs) are extremely accurate at predicting subsequent recall: The “delayed-JOL effect”. Psychological Science, 2, 267– 270. Nelson, T. O., Dunlosky, J., Graf, A., & Nairnes, L. (1994). 
Utilization of metacognitive judgments in the allocation of study during multitrial learning. Psychological Science, 5, 207–213. Nelson, T. O., & Leonesio, R. J. (1988). Allocation of self-paced study time and the “labor-in-vain effect”. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 676–686. doi: 10.1037/0278-7393.14.4.676 Nelson, T. O., & Narens, L. (1994). Why investigate metacognition? In J. Metcalfe & A. P. Shimamura (Eds.), Metacognition: Knowing about knowing (pp. 1–25). Cambridge, MA: MIT Press. Rawson, K. A., Dunlosky, J., & Thiede, K. W. (2000). The rereading effect: Metacomprehension accuracy improves across reading trials. Memory & Cognition, 28, 1004–1010. Rhodes, M. G., & Castel, A. D. (2008). Memory predictions are influences by perceptual information: Evidence for metacognitive illusions. Journal of Experimental Psychology: General, 137, 615–625. doi:10.1037/a0013684 Rhodes, M. G., & Tauber, S. K. (2011). The influence of delaying judgments of learning on metacognitive accuracy: A meta-analytic
Mem Cogn (2015) 43:85–98 review. Psychological Bulletin, 137, 131–148. doi:10.1037/ a0021705 Roediger, H. L., III, & Karpicke, J. D. (2006a). The power of testing memory: Basic research and implications for educational practice. Perspectives on Psychological Science, 1, 181–210. doi:10.1111/j. 1745-6916.2006.00012.x Roediger, H. L., III, & Karpicke, J. D. (2006b). Test-enhanced learning: Taking memory tests improves long-term retention. Psychological Science, 17, 249–255. doi:10.1111/j.1467-9280.2006.01693.x Roediger, H. L., III, Putnam, A. L., & Smith, M. A. (2011). Ten benefits of testing and their applications to educational practice. In J. P. Mestre & B. H. Ross (Eds.), Psychology of learning and motivation: Cognition in education (Vol. 55, pp. 1–36). Oxford, UK: Elsevier. Schwartz, B. L., & Efklides, A. (2012). Metamemory and memory efficiency: Implications for student learning. Journal of Applied Research in Memory and Cognition, 1, 145–151. Soderstrom, N. C., & Bjork, R. A. (2014). Testing facilitates the regulation of subsequent study time. Journal of Memory and Language, 73, 99–115. Son, L. K., & Metcalfe, J. (2000). Metacognitive and control strategies in study-time allocation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26, 204–221. doi:10.1037/ 0278-7393.26.1.204 Spellman, B. A., & Bjork, R. A. (1992). When predictions create reality: Judgments of learning may alter what they are intended to assess. Psychological Science, 3, 315–316. Stine-Morrow, E. A. L., Gagne, D. D., Morrow, D. G., & DeWall, B. H. (2004). Age differences in rereading. Memory & Cognition, 32, 696–710. doi:10.3758/BF03195860 Thiede, K. W., & Anderson, M. C. M. (2003). Summarizing can improve metacomprehension accuracy. Contemporary Educational Psychology, 28, 129–160. Thiede, K. W., Anderson, M. C. M., & Therriault, D. (2003). Accuracy of metacognitive monitoring affects learning of texts. Journal of Educational Psychology, 95, 66–73. Thiede, K. 
W., & Dunlosky, J. (1999). Toward a general model of selfregulated study: An analysis of selection for items for study and selfpaced study time. Journal of Experimental Psychology: Learning, Memory, and Cognition, 25, 1024–1037. doi:10.1037/0278-7393. 25.4.1024 Thiede, K. W., Griffin, T., Wiley, J., & Anderson, M. C. M. (2010). Poor metacomprehension accuracy as a result of inappropriate cue use. Discourse Processes, 47, 331–362. Thiede, K. W., Redford, J. S., Wiley, J., & Griffin, T. D. (2012). Elementary school experience with comprehension testing may influence metacomprehension accuracy among seventh and eighth graders. Journal of Educational Psychology, 104, 554–564. Thomas, A. K., & McDaniel, M. A. (2007). Metacomprehension for educationally relevant materials: Dramatic effects of encodingretrieval interactions. Psychonomic Bulletin & Review, 14, 212–218. Tullis, J. G., Finley, J. R., & Benjamin, A. S. (2013). Metacognition of the testing effect: Guiding learners to predict the benefits of retrieval. Memory & Cognition, 41, 429–442. doi:10.3758/s13421-012-0274-5