6- and 8-year-olds’ performance evaluations: Do they differ between self and unknown others?

Metacognition Learning DOI 10.1007/s11409-017-9170-5

Nesrin Destan 1 & Manuela A. Spiess 1 & Anique de Bruin 2 & Mariëtte van Loon 1,2 & Claudia M. Roebers 1

Received: 4 May 2016 / Accepted: 11 April 2017
© Springer Science+Business Media New York 2017

Abstract The current study investigated kindergarteners’ and second graders’ ability to monitor and evaluate their own and a virtual peer’s performance in a paired-associate learning task. Participants provided confidence judgments (CJs) for their own responses and performance-based judgments (judgments provided after receiving feedback on their performance) for both their own and a virtual peer’s responses. For the performance-based judgments, children were confronted with their own or the peer’s answer as well as the correct answer. Additionally, participants were asked to credit their own and the peer’s correct and incorrect answers while facing feedback. Results indicate an age-related progression in metacognitive monitoring skills, with second graders differentiating more strongly in their confidence judgments between correct and incorrect responses compared to kindergarteners. Regarding performance-based judgments, children of both age groups provided higher judgments for correctly compared to incorrectly recognized items as well as for their own responses in comparison to the responses of the unknown child. Similarly, when crediting, participants of both age groups gave more credits for correct recognition than for incorrect recognition and for their own responses than for the peer’s responses. The significant interaction between age group and recognition accuracy for the crediting shows that second graders gave more credits for correctly recognized items, while kindergarteners gave more credits for incorrect answers than the older children, primarily for their own incorrect answers. In conclusion, the study provides new insights into 6- and 8-year-olds’ evaluations of their own and an unknown child’s performance in a paired-associate learning task by showing that children of both age groups generally judged and credited responses in their own favor. These results add to our understanding of biases in children’s performance evaluations, including metacognitive judgments and judgments provided after receiving feedback.

* Nesrin Destan

1 Department of Developmental Psychology and Center for Cognition, Learning & Memory, University of Bern, Hochschulzentrum vonRoll, Fabrikstrasse 8, 3012 Bern, Switzerland

2 School of Health Professions Education (SHE), Faculty of Health, Medicine and Life Sciences, Maastricht University, P. O. Box 616, 6200 MD Maastricht, The Netherlands

Keywords Metacognition · Monitoring · Confidence judgments · Crediting · Self versus other performance · Development

In everyday life, children face numerous situations requiring them to judge their performance or to evaluate their abilities accurately. For example, while cycling to school, they need to continuously monitor and adapt their speed to arrive on time and without causing an accident. Or, when preparing for a test, they must self-assess their progress to decide whether to stop or continue their learning activities. These are examples of metacognitive monitoring, the ability to monitor and evaluate one’s own ongoing information processing or learning activities (Dunlosky and Metcalfe 2009; Flavell 1979). Distinct from, yet closely related to, monitoring is metacognitive control, the ability to plan and regulate ongoing cognitive processes. Knowing when one has learned material sufficiently to take a recall test, or knowing which items will need additional study time, are examples of control processes. Together, monitoring and control represent core components of procedural metacognition (Brown 1978; Nelson and Narens 1990; Schneider and Lockl 2008). Yet the metacognitive skills of children, and even of adults, are often insufficient. It has been hypothesized that one reason for children’s metacognitive difficulties may lie in their still slightly immature ability to introspect. If immature introspection is a reason for children’s inaccurate metacognition, one question arising is whether these difficulties persist when children are asked to evaluate another child’s performance. One hypothesis is that children first learn to evaluate another child’s performance accurately and are then able to use this experience to introspect on their own performance more objectively, in the sense of a developmental progression from external to internal regulation (e.g., Vygotsky 1978). Therefore, our study contrasts kindergarteners’ and second graders’ ability to monitor and evaluate their own and an unknown peer’s performance to gain insights into children’s development of self-evaluations.

The importance of accurately monitoring performance

Accurately monitoring one’s own performance is directly linked to first-order task performance (e.g., memory performance; Schneider and Lockl 2002). Thus, more accurate metacognition is associated with superior cognition, while inadequate monitoring can lead to undesired, detrimental, or even dangerous outcomes (e.g., risk taking in traffic; Plumert and Schwebel 1997). Furthermore, poor monitoring impedes future learning or academic performance (Dunlosky and Rawson 2012; Roebers et al. 2014; van Loon et al. 2013). Given that even (young) adults often have marked difficulties judging their own performance (Dunning 2005; Dunlosky and Rawson 2012; Hacker et al. 2000), it is not surprising that children undergo fundamental developmental progression in these abilities (e.g., Schneider 2014). That is, young children’s monitoring and self-assessment skills improve throughout the elementary school years (e.g., Krebs and Roebers 2012; see Roebers 2014, for a review; Tsalas et al. 2015). Although a wealth of research has investigated children’s developing ability to assess and evaluate performance, less is known about factors potentially influencing these skills, for example wishful thinking (e.g., Schneider 1998) or effort attributions (Wellman 1985). If children’s judgments are affected by wishful thinking or effort attribution when assessing or evaluating their own performance, a follow-up question is whether these factors also play a role when assessing the performance of another child. One possibility is that children might differentiate in their performance evaluations between their own and the other child by giving more credit to their own compared to the other child’s performance. Considering

these open questions, pursuing an experimental approach, this study aimed to provide further answers about factors influencing the accuracy of children’s monitoring skills, not only when judging and crediting their own performance in a paired-associate learning task, but also when judging and crediting the performance of an unknown peer.

Developing monitoring skills

As indices of monitoring accuracy, we assessed confidence judgments (CJs) without feedback and performance-based judgments with feedback (i.e., judgments provided after being shown which items were correctly and incorrectly recognized). CJs reflect the degree of certainty in (one’s own) memory performance after retrieval and are gathered by asking the child to indicate how sure she or he is that a previously given answer was correct. Studies on children’s confidence ratings in the context of memory tasks have shown that young children tend to be overoptimistic regarding the accuracy of their own answers (Lipko et al. 2009; Lipko et al. 2012). At the same time, an increasing number of investigations shows that even 3-year-olds can differentiate in their monitoring between correct and incorrect responses, giving higher CJs to correct responses than to incorrect ones (e.g., Lyons and Ghetti 2011, 2013). This differentiation in CJs between correct and incorrect answers seems to improve with age (Lyons and Ghetti 2011) and has consistently been found to be due to a decreasing confidence in incorrect answers (e.g., Howie and Roebers 2007; Roebers 2002). For example, in the study by Roebers (2002), 8- and 10-year-olds, just like adults, provided higher confidence ratings for correct answers than for incorrect answers, suggesting that metacognitive differentiation is well at work in this age range. At the same time, however, the 8- and 10-year-olds provided significantly higher confidence judgments for incorrect answers in comparison to adults, resulting in less fine-tuned and less accurate metacognitive differentiation. Thus, elementary school children seem to have more difficulties in judging their uncertainty than their certainty, being overconfident in their incorrect answers.
It has therefore been argued that monitoring uncertainty (i.e., how unsure one is about the answer) is more demanding than reporting how sure one is that the given answer is correct, and might therefore develop later in childhood (e.g., Roebers 2002; Schneider and Laurion 1993). The handling of uncertainty likely improves with the experience gained throughout the school years. Young children might not be skilled enough to consider performance because they lack experience with facing errors and receiving feedback about objective task accuracy. Although only a few studies have examined children’s evaluations of performance after receiving feedback, young children often demonstrate difficulties integrating feedback into their self-assessments, remaining overconfident even after several trials (e.g., Lipko et al. 2012). There seems to be an increase in the amount of objective feedback children receive during the elementary school years (Stipek and Tannatt 1984). While preschoolers receive mainly social as well as effort-related reinforcement, one expects a change in the way children process and evaluate feedback due to an increase in performance-related feedback once they are in school, where they receive formal instructions and grades. Thus, children might integrate such (positive and negative) feedback more systematically with growing age and experience.

Which factors influence children’s monitoring accuracy?

One factor that has been shown to influence young children’s monitoring accuracy is wishful thinking, describing the difficulties in discriminating between one’s own wishes and expectations when making performance predictions or evaluations (Piaget 1930; Schneider 1998; Stipek 1984). In this line of thought, children base their predictions on “what they desire to achieve” rather than what can be expected realistically from their actual task performance. Consequently, future performance expectations are based on how good children would like their performance to be, often resulting in overoptimistic self-assessments (Bernard et al. 2016; Stipek 1984; Schneider 1998). Schneider (1998) tested the wishful thinking hypothesis by investigating 4- and 6-year-olds’ performance predictions in a motor task and in a memory task. In both tasks, children monitored their performance accurately, but demonstrated difficulties in differentiating between their wishes and expectations. Interestingly, 4- and 6-year-olds provided more accurate predictions for the peer’s performance than for their own, but only in the motor task. Following up on the wishful thinking hypothesis, Visé and Schneider (2000) also included third graders and additionally investigated the effort attribution hypothesis (Wellman 1985), postulating that young children might overestimate their performance because they expect their effort to be particularly influential, regardless of their actual score. By considering effort as a primary cause of outcome, children might overlook other influential factors such as task difficulty, task familiarity, or general intellectual ability. As in Schneider’s (1998) study, the 4- and 6-year-olds in Visé and Schneider’s (2000) investigation showed difficulties distinguishing wish from expectation. This was also true for the participating third graders. Regarding the effort attribution hypothesis, only the two younger age groups overestimated the influence of effort on their memory performance, while the third graders seemed to integrate additional factors into their self-assessments (e.g., number of items to be learned). Overall, the understanding of effort and ability develops dramatically in the early elementary school years (Nicholls 1978).
At the same time, self-efficacy, an individual’s belief about their own capabilities for reaching a goal, might mediate the relation between effort and ability in children as well as adults (e.g., Bandura 1977; Schunk 1983). If individuals believe that they have the ability to reach a predefined goal, their motivation increases, and they work harder, persist longer in the task, and thus are likely to exert more effort than when self-efficacy beliefs are low (Bandura 1977).

Monitoring one’s own versus a peer’s performance

Not only children, but human individuals in general, tend to perceive themselves as superior to others, which has been called the “Above-Average-Effect” (e.g., Dunning 2005). This seems to be due to individuals’ general tendency to be egocentric, focusing mainly on their own skills without considering the skills of their peers (i.e., the comparison group). In domains in which people have good skills, they will easily see themselves as above average in comparison to their peers. In more demanding and challenging domains, however, focusing solely on one’s own skills will result in a “below-average-effect”. In other words, the difficulty of the domain to be assessed has been shown to influence an individual’s ability judgments (Kruger 1999). For young children, like the participants in our study, considering the peer’s level of ability when evaluating their own (and the unknown child’s) ability might be even more challenging than for adults, as their judgments and credits are likely to be influenced by a strong wish to perform well and, at the same time, outperform others. Since young children have been shown to assume that trying hard (i.e., effort) leads to good performance (see effort attribution hypothesis; Wellman 1985), they might give higher judgments and reward themselves with more credits than older children. This is because they are convinced that they have given their best and are driven by the wish to be “best”, while they cannot know for sure how much effort the other (personally unknown) child invested. In the case of the other child, consequently, their judgment regarding invested effort would rather be based on a “guess”

on how hard the unknown peer tried, possibly also influenced by a more or less conscious wish to outperform the other child. As this is just one possible interpretation, it should obviously be treated with caution. Research focusing on children’s assessment of their own performance versus that of a peer is rare. In the study by Schneider (1998), 4- to 6-year-olds did not seem to have the same difficulties (i.e., overly optimistic judgments) when monitoring the performance of another child compared to their own. Stipek and Hoffman (1980) were among the first to investigate the ability to differentiate between self and other regarding metacognitive monitoring. Their 3- to 8-year-olds provided performance judgments (predictions) and allocated credits on a motor task (pulling a string on which a ball was balanced in a cart), both for their own and for another individual’s performance. In the self-condition, children played the game themselves. After playing, children provided three different types of performance predictions: they chose one explanation for their performance out of six options (e.g., luck, ability, effort), predicted their performance on the next trial, and assigned stars to their performance. In the other-condition, participants were shown a picture of a sex-matched child who performed at the same level and were then asked to provide the same three types of performance predictions. Findings indicated that monitoring the other was easier than monitoring one’s own performance, with the 3- to 8-year-olds being particularly overoptimistic when predicting their own performance, compared to predicting the peer’s performance. Even the 3- and 4-year-olds managed to use another child’s feedback to form adequate expectations for the other child’s future task performance. Thus, a child who previously failed in the task was expected to perform more poorly in an upcoming trial.
However, negative feedback on their own performance was not integrated into children’s subsequent performance predictions. In another study (Stipek and Tannatt 1984), 4- to 8-year-olds were asked to judge their own and their classmates’ abilities (e.g., Who is the best/worst thinker in class? Or, Who is the smartest in class?). For this, children were interviewed individually and shown pictures of all classmates to make their judgments. Smartness ratings for the children themselves and their peers were provided using a chart with columns of 1 to 5 stars, 1 symbolizing “not smart” and 5 symbolizing “very smart”. After providing the ratings (i.e., “how many stars should you get?”), children were asked for an explanation (i.e., “why?”). Overall, even elementary school children showed difficulties with discriminating effort from actual ability, following their assumed “rule” that “trying hard leads to positive outcome”. Many of the preschool-aged children used “smartness” and “likeable” interchangeably when explaining their judgments (for example, why a certain child is the worst thinker in class). Thus, a smart classmate was defined as “someone I like a lot” and “who is my friend”. While children on average rated themselves as smarter than their classmates in all grades, smartness self-ratings declined with age, but ratings for the peers remained at about the same level in all age groups. When assessing their own and another child’s abilities, older children more often considered factors such as task difficulty and social comparisons than younger participants. Generally, results support the idea of a less biased and more mature assessment of another person’s ability compared to self-evaluations, with self and other judgments converging with growing age. Okita (2014) hypothesized that children learn to self-monitor by monitoring other people (e.g., a computer character), or by observing other people during self-monitoring.
She postulated that with practice and observation of others, external monitoring (of another individual or computer character), which is supposed to be easier, is turned inward and thus applied to the “self”. In line with her theoretical assumption, 9- to 11-year-olds were better at monitoring the other (computer character) in mathematical tasks, such that it was easier for participants to identify mistakes made by the computer character than their own mistakes. Additionally,

practicing with a computer character assisted participants in learning to correct their own mistakes and improved their performance in mathematics. Similarly, Paulus et al. (2014) examined how accurate participants of different ages (primary school, adolescents, adults) are in judging another person’s learning and memory processes during a paired-associate learning task. Children aged 6 to 7 and 8 to 10 years demonstrated difficulties with evaluating an observed adult’s performance (on video). Furthermore, having versus not having performed the task themselves influenced participants’ evaluations of the other individual, with first-hand self-experience leading to more adequate monitoring judgments. Consequently, one’s own metacognitive knowledge and experience were helpful in judging the other’s performance more accurately. According to the authors of this study, children’s understanding of others’ learning and the accurate evaluation of their abilities is important for successfully coordinating self-regulated collaborative learning at school. Overall, results indicate that children seem to be more accurate when judging another individual’s performance, and that overoptimistic self-evaluations decline with age. In the case of evaluating another individual, Heckhausen (1984) argued that when asking children to assess another child’s performance, they cannot know about the effort the other had invested and thus need to rely on what can be observed (for example, whether they got an answer wrong or right). Nevertheless, so far no study has directly compared children’s evaluations of their own and a peer’s performance with multiple measures (judgments and credits) in a paired-associate learning task. First, it is of interest to see whether children differentiate between their own and the personally unknown child’s performance when evaluating, for example, by giving more credits for their own incorrect answers, and whether there are age-related differences in this respect.
Second, by providing feedback on the correctness of the child’s and the peer’s performance, it is interesting to see if this feedback influences children’s evaluation in the first place and if it influences the evaluation of the self differently from that of the unknown peer. By including the evaluation of another, same-aged and same-gender child, we hope to gain insights into children’s abilities to assess and evaluate another child and how these abilities might change across development. Finally, children’s ability to consider another person’s knowledge and skills is beneficial for successful joint learning at school (e.g., Sebanz et al. 2006) and is therefore worth investigating. Considering that children may be able to judge a peer’s performance more accurately than their own because their self-assessments might be influenced by the two factors mentioned above (wishful thinking and effort attribution), we tested children’s ability to monitor and evaluate their own and a virtual peer’s performance. This might help to explore whether children differentiate in their monitoring between self and unknown others and bring about new insights into developing monitoring skills. One way to quantify effort attribution is to assess the amount of reward (for example, in the form of credits) children give themselves and others for their performance. Knowing when and to what extent children credit their own and a peer’s performance, and whether they differentiate in their crediting between self and other and between correct and incorrect recognition, helps to explain what might influence young children’s monitoring skills. In this sense, the additional evaluation of another child’s performance in the current study serves as the experimental vehicle to improve our understanding of children’s self-evaluations.

The present study We investigated kindergarteners’ and second graders’ ability to monitor their own and an unknown virtual peer’s performance in a paired-associate learning task. To additionally

quantify the degree of effort attribution, we also examined how children credit their own and the peer’s performance. Based on the existing literature on the development of metacognitive monitoring skills (e.g., Lyons and Ghetti 2011, 2013; see Roebers 2014; Schneider 2014, for a review), the following hypotheses were tested:

1. Confidence judgments (CJs) without feedback for the self: Firstly, we expected that all children would differentiate in their confidence judgments (CJs) between their own correct and incorrect responses, giving higher CJs to correct than to incorrect answers, without receiving feedback on their performance. Secondly, we expected that second graders would show a more pronounced metacognitive differentiation between correct and incorrect answers compared to kindergarteners.

2. Performance-based judgments (with feedback) for the self and other: Even in the face of objective information about the accuracy of a response, the younger age group (kindergarteners) was expected to provide higher performance-based judgments for their own responses than for the responses of the virtual peer due to a strong wish to perform well (and outperform others), being influenced by a wishful thinking bias (Schneider 1998; Stipek 1984). In this sense, younger children were assumed to be prone to an “unrealistic optimism” regarding their own performance, basing their judgments on what they wish to achieve as well as their wish to perform extraordinarily, rather than on the evidence of objective feedback. In the case of the performance-based judgments for the peer, in contrast, kindergarteners were expected to provide lower, less “inflated” judgments, especially for the peer’s incorrect answers, resulting in more adequate and realistic evaluations (e.g., Schneider 1998).

3. Credits for the self and the other: For the self-condition, we hypothesized that both age groups (kindergarteners and second graders) would differentiate in their credits between correct and incorrect answers, giving more credits for their correct than for their incorrect answers. Moreover, we expected the older children to differentiate more strongly in their credits, mirroring the typical age-related progression in metacognitive monitoring skills (e.g., Roebers 2014). Regarding the peer’s performance, we assumed that children of both age groups would differentiate in the peer’s credits between correct and incorrect answers. Older children were expected to differentiate more strongly than younger children, which would also be in line with developmental changes in monitoring skills reported in previous studies (see Roebers 2014, for a review). Finally, in line with Stipek and Tannatt’s (1984) results, we predicted that participants would give more credits for their own answers compared to those of the personally unknown child, regardless of correctness and age.

Method

Participants

The final sample consisted of 101 children from two age groups: 48 kindergarteners (26 girls; mean age = 72.1 months, SD = 4.8 months) and 53 second graders (26 girls; mean age = 96.8 months, SD = 4.2 months). The initial pool included 108 children; seven were excluded due to missing data, difficulties comprehending the task instructions, or self-determined termination.

Participants were recruited and tested in the German-speaking part of Switzerland. All children were fluent in German. The ability to read Asian ideograms served as an exclusion criterion to control for prior knowledge. Trained experimenters tested participants individually in a quiet room of the children’s kindergarten or school during two visits (Sessions A and B), each lasting 20–30 min and generally one week apart. Parents gave written consent, and children provided verbal assent. Participation was rewarded with a small gift.

Materials and procedure

Stimuli. Japanese characters called Kanji served as stimuli in a paired-associate learning task. These characters have been successfully used in previous studies with similar age groups (Destan et al. 2014; Destan and Roebers 2015; Roderer and Roebers 2010). The learning task included two sessions (A and B). Children learned ten (kindergarteners) or 12 (second graders) Kanji per session, for a total of 20 and 24 unique Kanji, respectively. Stimuli appeared in random order within the sessions.

Procedure. Firstly, the task quantifies young children’s ability to monitor their own performance and to credit their own performance (self-condition). Secondly, it assesses children’s ability to judge an unknown virtual peer’s monitoring skills and their ability to credit the performance of this peer (other-condition). Both sessions contained two conditions (self-condition; other-condition): During Session A (self-other), all children completed first the self-condition, then the other-condition. During Session B (other-self), all children completed the other-condition first, then the self-condition. The two conditions were equivalent in both sessions, but appeared in reversed order. The order of Sessions A and B was not additionally randomized. The task appeared on a touch screen tablet (Acer Iconia W700, 11.6″) running the E-Prime software. Children were familiarized with the tablet’s touch function prior to testing. The paired-associate learning task included the phases Learning I and II, recognition, provision of confidence judgments before feedback and of performance-based judgments after feedback using a thermometer scale, and the provision of credits using a dice scale (also facing feedback). Phases are described below; depictions of the scales are presented in Fig. 1a and b.

Session A: Self-condition

Task introduction. Participants were taught that Japanese words are represented by symbols (Kanji) instead of words. Children practiced the procedure with three different Kanji as familiarization with the materials and task phases.

Learning phases I and II. The test started with two consecutive fixed-length study phases during which children had to learn ten (kindergarteners) or 12 (second graders) different Kanji and their meaning. During learning phase I, each Kanji appeared together with a pictorial image representing the meaning for 3 s. After learning phase I, the Kanji and its meaning were shown a second time (II), but now for only 2 s.

Recognition. After the two learning phases, children took a recognition test. Participants encountered all Kanji they had learned within one session. Kanji were presented next to four different pictures, one of which represented the correct meaning for the Kanji. Children were asked to choose the picture they believed to represent the correct meaning. We used the total

Fig. 1 Procedure of the self-condition (a) and the other-condition (b) of the paired-associate learning task including the 7-point thermometer scale and the 7-point dice scale

number of correctly recognized Kanji across both sessions (20 or 24, in percentages) for the analyses reported below.
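
The recognition score described above can be illustrated with a minimal sketch. The function name `recognition_percent` and its parameters are hypothetical and chosen only for illustration; the study does not report any scoring code. Only the item counts (10 Kanji per session for kindergarteners, 12 for second graders, two sessions) are taken from the text.

```python
def recognition_percent(correct_a: int, correct_b: int, items_per_session: int) -> float:
    """Percentage of correctly recognized Kanji across Sessions A and B.

    Hypothetical helper: items_per_session is 10 for kindergarteners
    and 12 for second graders, giving totals of 20 and 24 items.
    """
    for n in (correct_a, correct_b):
        if not 0 <= n <= items_per_session:
            raise ValueError("correct count must lie between 0 and items_per_session")
    total_items = 2 * items_per_session
    return 100.0 * (correct_a + correct_b) / total_items


# Made-up example values, not data from the study.
kindergartener_score = recognition_percent(6, 7, items_per_session=10)
second_grader_score = recognition_percent(9, 9, items_per_session=12)
print(kindergartener_score, second_grader_score)
```

Expressing the score as a percentage makes the two age groups comparable even though they learned different numbers of items.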

Thermometer scale. Confidence judgments (CJs) and performance-based judgments were provided using a 7-point thermometer scale (see Fig. 1a and b), ranging from a dark blue button located on the left (symbolizing the “very unsure” pole) to a dark red button located on the right (symbolizing the “very sure” pole). There were two bluish and two reddish buttons, and one mixed-color button in between. The use of the scale was introduced using the “cold/warm hide and seek game” that is well known among Swiss children. The rationale behind this game is as follows: While seeking a hidden object, children are told “cold” as they move away from the object and “warm” as they get closer to the object. Consequently, the further away from the object, the less certain one is about the correct location of the object, and the closer to the object, the more certain one is. In analogy to a conventional thermometer, blue represented cold/uncertainty, and red represented warm/certainty. Accordingly, children were to press the reddish buttons to indicate certainty, and the bluish buttons to indicate uncertainty. Thereby, the darker the color (or the closer to the two poles), the higher the degree of un/certainty. The experimenter explicitly illustrated the analogy and checked the participant’s understanding of the scale. This was done using an example for the two poles and the middle button by asking three questions, including an easy one (i.e., “What’s the color of grass?”), a difficult one (i.e., “In numbers, how much hair do I have on my head?”), and one of medium difficulty (i.e., “How old am I?”). After responding, children had to indicate how sure they were that they had responded correctly by pressing the corresponding button. All children learned the use of the scale on the tablet with ease.

Metacognitive confidence judgments (CJs) After the recognition test, children provided CJs for the whole set of Kanji (delayed CJs) using the thermometer scale. One Kanji and four alternative pictures appeared at a time, with a black frame surrounding the previously chosen picture. No feedback on performance accuracy was available to the child while providing the confidence judgments. For every Kanji, participants were asked to indicate how sure they were that the framed response was correct by pressing one of the seven buttons on the thermometer scale. CJs were scored from 0 (very unsure) to 6 (very sure), and mean CJs for correct and incorrect responses served as dependent measures.

Performance-based judgments Next, children were asked to provide another judgment, but this time while facing feedback (i.e., seeing whether their answer from the recognition test was correct or incorrect). For these performance-based judgments, participants saw their response for every Kanji surrounded by a black frame and, in addition, the correct response surrounded by a green frame (see Fig. 1a and b). In case of a correct response, these two frames overlapped. Participants provided a judgment on the same thermometer scale used for the confidence judgments, indicating how sure they were now, when facing the correct answer, that their response was correct. Like the CJs, performance-based judgments were scored from 0 (very unsure) to 6 (very sure). As dependent measures, we used the mean performance-based judgments for correctly and incorrectly recognized Kanji.

Self-crediting Finally, children had to give credits using a 7-point dice-scale (see Fig. 1a and b) while facing feedback on their performance accuracy. A 7-point dice-scale was chosen because dice are well known among children from various board games and the concept of points on a die can easily be adapted to the idea of providing credits for answers in the current task. To give credits for their answers, children saw each of the previously learned Kanji and four alternative pictures in randomized order. Again, a black frame surrounded the picture the child had previously indicated as the corresponding meaning of the Kanji. A green frame surrounded the correct meaning (again, in case of correct recognition, these two frames overlapped). Then, participants had to act as a teacher, giving themselves points for each response by using the dice-scale that ranged from 0 to 6 points. By using a 7-point dice scale, the credits could be compared to the CJs and performance-based judgments provided on the 7-point thermometer scale. As dependent measures, we used the mean credits for correctly and incorrectly recognized Kanji across self-conditions A and B.
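The dependent measures described above (mean judgments or credits for correctly versus incorrectly recognized Kanji) can be sketched for a single child as follows. This is a minimal illustration, assuming item-level data are stored as a list of 0–6 scores and a parallel list of accuracy flags; the function name `mean_by_accuracy` and the example values are hypothetical and do not come from the study's materials.

```python
from statistics import mean


def mean_by_accuracy(scores, correct):
    """Return (mean score for correct items, mean score for incorrect items).

    scores  -- 0-6 ratings (thermometer judgments or dice credits), one per item
    correct -- parallel list of booleans, True if the item was recognized correctly
    A mean is None when a child has no items of that kind.
    """
    if len(scores) != len(correct):
        raise ValueError("scores and correct must have the same length")
    if any(not 0 <= s <= 6 for s in scores):
        raise ValueError("scores must lie on the 0-6 scale")
    hits = [s for s, c in zip(scores, correct) if c]
    misses = [s for s, c in zip(scores, correct) if not c]
    return (mean(hits) if hits else None, mean(misses) if misses else None)


# Hypothetical child with six items (illustrative values only).
scores = [6, 5, 3, 6, 2, 4]
correct = [True, True, False, True, False, False]
m_correct, m_incorrect = mean_by_accuracy(scores, correct)
```

The same function would apply unchanged to CJs, performance-based judgments, and credits, since all three were scored from 0 to 6.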

Session A: Other-condition Task introduction and performance-based judgments After a short break, children completed the other-condition, in which they provided performance-based judgments for a virtual peer and gave credits for this peer’s performance using the two scales (thermometer; dice) introduced in the self-sessions. First, a photograph of a boy/girl working on the same type of paired-associate learning task appeared on the tablet, matched to the participant’s sex and age. The experimenter explained that the peer, who was named either Ida (girl) or Franz (boy), was a student at another kindergarten/school who had learned the meaning of another set of Kanji and had taken a test, just like the participant did. After a short reintroduction of the thermometer, a Kanji with four alternative pictures appeared. Responses of the peer were also indicated by green and black frames. Children were then asked to report where the peer should have pressed on the thermometer scale. Half of the peer’s answers were depicted as correct (overlapping frames). Recognition accuracy of the virtual peer was set to 50% in both age groups to obtain a balanced data set. Mean performance-based judgments for correctly and incorrectly recognized Kanji were used as dependent measures.

Other-crediting Children were then asked to be a teacher, giving credits for each of the peer’s responses using the 7-point dice-scale. Again, half of the peer’s answers appeared as correct (overlapping frames). Importantly, a Kanji appearing as incorrectly recognized by the peer in the phase of providing performance-based judgments also appeared as incorrectly recognized in the crediting phase. Children repeated this same multi-phase procedure approximately one week afterwards, but now everyone started with the other-condition, followed by the self-condition (Session B).
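The two constraints on the virtual peer's answers described above (exactly half shown as correct, and the same items shown as incorrect in both the judgment and the crediting phase) can be illustrated with a short sketch. The source does not describe how the task was programmed, so the function `make_peer_key`, the fixed seed, and the item count of 20 are assumptions made purely for illustration.

```python
import random


def make_peer_key(n_items, seed=0):
    """Return a list of booleans with exactly half True.

    True means the peer's answer for that item is displayed as correct
    (overlapping frames). The seed is a hypothetical choice that keeps
    the key reproducible.
    """
    if n_items % 2 != 0:
        raise ValueError("n_items must be even to show exactly half as correct")
    key = [True] * (n_items // 2) + [False] * (n_items // 2)
    random.Random(seed).shuffle(key)
    return key


# Hypothetical item count; the peer's set size is an assumption here.
peer_key = make_peer_key(20)

# Reusing one key guarantees that an item shown as incorrect in the
# judgment phase is also shown as incorrect in the crediting phase.
judgment_phase = list(peer_key)
crediting_phase = list(peer_key)
```

Generating the key once and reusing it is one simple way to keep the peer's accuracy at 50% while holding item-level feedback constant across phases.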

Results

Preliminary results A preliminary analysis of variance (ANOVA) revealed no significant effect of gender on the dependent measures examined (i.e., confidence judgments, performance-based judgments and credits in the self-condition and other-condition), Fs ≤ 3.04; gender was therefore not considered further in the analyses reported below. Similarly, the paired-samples t-test serving to rule out systematic differences in recognition accuracy between the two sessions (Session A and B) was significant neither for the kindergarteners, t(47) = −1.41, p > .05, r = .21, nor for the second graders, t(52) = −1.80, p > .05, r = .26. Thus, the following analyses were all conducted across both sessions. Table 1 provides the descriptive statistics of the dependent measures included in the analyses.

Table 1 Descriptives of the Dependent Measures for Age Group, Recognition Accuracy and Condition

                                Correct recognition               Incorrect recognition
                                n    M     SD    Min   Max        n    M     SD    Min   Max
Confidence judgments
  Kindergarten                  48   4.26  1.37  0.67  6.00       48   3.70  1.39  0.00  6.00
  2nd grade                     52   4.87  0.84  2.22  6.00       52   3.65  1.36  0.00  6.00
Performance-based judgments
  Kindergarten, Self            48   5.47  0.85  2.00  6.00       48   2.28  1.80  0.00  6.00
  Kindergarten, Other           48   5.08  1.24  1.73  6.00       48   2.31  1.61  0.00  6.00
  2nd grade, Self               53   5.84  0.53  2.44  6.00       52   2.77  1.70  0.00  5.95
  2nd grade, Other              53   5.62  0.61  3.62  6.00       52   2.61  1.50  0.00  4.91
Credits
  Kindergarten, Self            48   4.60  1.39  1.40  6.00       48   3.14  1.32  0.13  6.00
  Kindergarten, Other           48   4.27  1.27  1.73  6.00       48   2.51  1.15  0.00  4.67
  2nd grade, Self               53   5.62  0.67  3.89  6.00       52   2.73  1.31  0.00  5.95
  2nd grade, Other              53   5.36  0.80  3.31  6.00       52   2.63  1.15  0.00  4.55

n = sample size, M = mean, SD = standard deviation; Min = minimum and Max = maximum represent the range within the sample

Recognition accuracy We first examined potential age differences in the percentage of correctly recognized Kanji by running a one-way ANOVA with age group (kindergarten versus second grade) as the factor, and overall percentage of correct recognition across both sessions as the dependent measure. A significant age group effect resulted, F(1, 99) = 33.57, p < .001, η2 = 0.25, with second graders (M = 63.99, SD = 15.76) showing a significantly higher percentage of correctly recognized Kanji overall, compared to the kindergarteners (M = 44.69, SD = 17.73). Regardless of age, the recognition rate provides sufficient data for both correct and incorrect performance when addressing confidence and crediting as a function of accuracy.

Metacognitive confidence judgments in the self-condition In a next step, we examined kindergarteners’ and second graders’ confidence judgments (0–6 on the thermometer scale) for correctly and incorrectly recognized Kanji. We therefore conducted a 2 (Age Group: kindergarten versus second grade) × 2 (Recognition Accuracy: correct versus incorrect) mixed ANOVA on children’s confidence judgments (Fig. 2). Results revealed a significant main effect of recognition accuracy, F(1, 98) = 51.51, p < .001, η2 = 0.33, with children of both age groups giving significantly higher CJs to correctly recognized Kanji (M = 4.58, SD = 1.16) than to incorrectly recognized Kanji (M = 3.67, SD = 1.37). Even though the main effect of age group did not reach significance, F(1, 98) = 1.63, p = .21, η2 = .02, a significant interaction between recognition accuracy and age group resulted, F(1, 98) = 7.21, p < .01, η2 = .05. This interaction shows that second graders differentiated more strongly in their confidence judgments between correctly and incorrectly recognized Kanji (correct: M = 4.87, SD = 0.84; incorrect: M = 3.65, SD = 1.36), compared to kindergarteners (correct: M = 4.26, SD = 1.37; incorrect: M = 3.70, SD = 1.39). In summary, children from both age groups could monitor their level of confidence in relation to performance accuracy, giving higher confidence judgments to correctly recognized than to incorrectly recognized Kanji. Additionally, older children differentiated more strongly in their confidence ratings between correct and incorrect recognition than the kindergarteners.
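The interaction reported above can be made concrete by computing, for each age group, the difference between mean CJs for correct and incorrect responses. The sketch below uses only the group means reported in the text; the name `discrimination` is a label introduced here for illustration and is not a measure defined by the authors.

```python
# Mean confidence judgments (0-6) as reported for the self-condition.
cj_means = {
    "Kindergarten": {"correct": 4.26, "incorrect": 3.70},
    "2nd grade": {"correct": 4.87, "incorrect": 3.65},
}


def discrimination(cell):
    """Difference between mean CJs for correct and incorrect responses."""
    return round(cell["correct"] - cell["incorrect"], 2)


# Larger values mean a stronger differentiation between correct and
# incorrect responses in the confidence judgments.
diffs = {group: discrimination(cell) for group, cell in cj_means.items()}
for group, diff in diffs.items():
    print(f"{group}: {diff:.2f}")
```

On these means the difference for second graders is roughly twice that for kindergarteners, which is the pattern the interaction describes.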

Fig. 2 Mean confidence judgments (0–6) for correct and incorrect recognition for both age groups (kindergarten and 2nd grade) in the self-condition. Standard errors of the mean are represented in the figure by the error bars attached to each column. [Bar chart: y-axis, Confidence Judgments (mean), 0–6; x-axis, Age Group (Kindergarten, 2nd Grade); bars, Correct and Incorrect]

Performance-based judgments in the self- and other-condition To examine whether children differentiate in their judgments between their own performance and the performance of an unknown other child after receiving feedback on recognition accuracy, performance-based judgments for correct and incorrect answers were compared across the two age groups (kindergarten versus second grade) and the two conditions (self versus other). We therefore conducted a 2 (Age Group: kindergarten versus second grade) × 2 (Recognition Accuracy: correct versus incorrect) × 2 (Condition: self versus other) mixed ANOVA on performance-based judgments (Fig. 3a and b). The analysis revealed a significant main effect of recognition accuracy, F(1, 98) = 248.03, p < .001, η2 = .72, such that children provided higher judgments after feedback for correct answers (M = 5.51, SD = .86) compared to incorrect answers (M = 2.50, SD = 1.65). Furthermore, there was a significant main effect of age group, F(1, 98) = 7.83, p < .01, η2 = .07, showing that second graders gave higher performance-based judgments (M = 4.21, SD = 1.09) compared to kindergarteners (M = 3.79, SD = 1.37). The main effect of condition (self versus other) also reached significance, F(1, 98) = 6.93, p < .05, η2 = .07, such that children of both age groups provided significantly higher performance-based judgments for their own answers (M = 4.10, SD = 1.24) than for the answers of the virtual peer (M = 3.91, SD = 1.27). None of the interactions reached significance, Fs ≤ .02.
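As a plausibility check on the three main effects reported for the performance-based judgments, the effect sizes can be recomputed from the F values and degrees of freedom. The sketch assumes that the reported η2 is partial eta squared, which the text does not state; the same formula does not reproduce every η2 in the article, so the match here should be read as suggestive only.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F value, reported eta squared) for the three main effects, all with df = (1, 98).
reported = {
    "recognition accuracy": (248.03, 0.72),
    "age group": (7.83, 0.07),
    "condition": (6.93, 0.07),
}

# Recompute each effect size and round to two decimals, as in the text.
recomputed = {
    name: round(partial_eta_squared(f_value, 1, 98), 2)
    for name, (f_value, _) in reported.items()
}

for name, (_, eta_reported) in reported.items():
    print(f"{name}: reported {eta_reported:.2f}, recomputed {recomputed[name]:.2f}")
```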

Crediting of answers in the self- and other-condition To analyze how children credit their own and a virtual peer’s performance after receiving feedback on recognition accuracy, we conducted a 2 (Age Group: kindergarten versus second grade) × 2 (Recognition Accuracy: correct versus incorrect) × 2 (Condition: self versus other) mixed ANOVA on children’s credits for correctly and incorrectly recognized Kanji (see Fig. 4). The analysis revealed a significant main effect of recognition accuracy, F(1, 98) = 221.24, p < .001, η2 = .66, with participants giving more credits for correct (M = 4.98, SD = 1.18) than for incorrect answers (M = 2.75, SD = 1.24). The main effect of age group also reached significance, F(1, 98) = 10.77, p < .01, η2 = .10, showing that the second graders gave more credits overall (M = 4.08, SD = .98) than the kindergarteners (M = 3.63, SD = 1.28). Furthermore, there was a significant main effect of condition, F(1, 98) = 14.21, p < .001, η2 = .12, such that children provided more credits for their own answers (M = 4.03, SD = 1.25) than for the answers of the peer (M = 3.73, SD = 1.16) after receiving feedback. Moderating the main effects, there was a significant interaction between recognition accuracy and age group, F(1, 98) = 15.92, p < .001, η2 = .05. This indicates that credits for correct and incorrect answers differed between kindergarteners and second graders. Specifically, second graders gave more credits for correct answers (M = 5.48, SD = .73) and fewer credits for incorrect answers (M = 2.68, SD = 1.23) than kindergarteners (correct: M = 4.44, SD = 1.33; incorrect: M = 2.82, SD = 1.24). None of the other interactions reached significance, Fs ≤ 3.0. Taken together, kindergarteners and second graders gave more credits for their own responses than for the responses of the virtual peer. The significant interaction between recognition accuracy and age group shows that the second graders credited correct responses more than the kindergarteners, while the younger children tended to give more credits for incorrect responses, but only when their own incorrect answers were considered.
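The age-by-accuracy pattern for credits can be approximated from the cell means in Table 1 by averaging the self and other cells within each age group. This is a rough sketch: a simple unweighted average of rounded cell means is an assumption made here, so the results agree with the marginal means reported in the text only to within about 0.01.

```python
# Mean credits (self, other) per age group and accuracy, taken from Table 1.
credit_cells = {
    ("Kindergarten", "correct"): (4.60, 4.27),
    ("Kindergarten", "incorrect"): (3.14, 2.51),
    ("2nd grade", "correct"): (5.62, 5.36),
    ("2nd grade", "incorrect"): (2.73, 2.63),
}

# Collapse over condition with an unweighted average of the two cells.
marginal = {key: sum(cells) / 2 for key, cells in credit_cells.items()}

# Gap between credits for correct and incorrect answers per age group.
gap = {
    group: round(marginal[(group, "correct")] - marginal[(group, "incorrect")], 2)
    for group in ("Kindergarten", "2nd grade")
}

for group, value in gap.items():
    print(f"{group}: correct minus incorrect = {value:.2f}")
```

The larger gap for second graders reflects both their higher credits for correct answers and their slightly lower credits for incorrect ones.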

Fig. 3 Mean performance-based judgments for correctly recognized Kanji in the self and other condition in kindergarteners and 2nd graders (a). Mean performance-based judgments for incorrectly recognized Kanji in the self and other condition in kindergarteners and 2nd graders (b). Standard errors of the mean are represented in the figure by the error bars attached to each column. [Bar charts, panels a and b: y-axis, Performance-based Judgments (mean), 0–6; x-axis, Age Group (Kindergarten, 2nd Grade); bars, Self and Other]

Discussion

This study examined kindergarteners’ and second graders’ ability to monitor their own and an unknown other child’s performance in a paired-associate learning task. Both age groups studied the meanings of 20 (kindergarteners) or 24 (second graders) Japanese characters (Kanji) in two fixed study phases. They provided metacognitive confidence judgments for their own performance and performance-based judgments (after receiving feedback on recognition accuracy) for themselves and for the peer on a 7-point thermometer scale, ranging from very unsure (0) to very sure (6). Additionally, participants credited their own and the other child’s performance, again while facing feedback, giving points on a 7-point dice scale, ranging from 0 points to 6 points.

Fig. 4 Mean credits for correctly recognized Kanji in the self and other condition in kindergarteners and 2nd graders (a). Mean credits for incorrectly recognized Kanji in the self and other condition in kindergarteners and 2nd graders (b). Standard errors of the mean are represented in the figure by the error bars attached to each column. [Bar charts, panels a and b: y-axis, Credits (mean), 0–6; x-axis, Age Group (Kindergarten, 2nd Grade); bars, Self and Other]

Metacognitive monitoring of one’s own performance: Confidence judgments Children’s postdictions (confidence judgments) for correct and incorrect answers were examined. The analysis revealed a significant age-related progression such that second graders differentiated more strongly in their confidence judgments between correct and incorrect answers, compared to kindergarteners. This finding is in line with studies on metacognitive development showing steady improvements in children’s monitoring skills (e.g., in the ability to discriminate between correct and incorrect answers in their metacognitive judgments) across the elementary school years (e.g., Roderer and Roebers 2010; Schneider and Lockl 2008; Schneider et al. 2000). Interestingly, this age effect seemed to be based mainly on the difference in CJs for correct responses, with older children giving higher mean CJs than younger children. However, no such substantial difference was observed in CJs for incorrect responses. Based on the above-mentioned studies on metacognitive development, one would expect a decrease in children’s (over)confidence with growing age. In the current study, older children’s higher CJs for correctly recognized items are possibly linked to their higher recognition accuracy compared to the younger age group. Task difficulty, complexity, and familiarity have been shown to influence children’s monitoring judgments (e.g., Schneider 1998). Another possible interpretation could be the so-called “hard-easy effect”. The hard-easy effect refers to an individual’s tendency to be overconfident about the correctness of one’s own answers to difficult items (i.e., Kanji) and underconfident about one’s answers to easy items (e.g., Juslin et al. 2000; Lichtenstein and Fischhoff 1977). In this sense, the older children in our study not only gave more correct responses but were possibly also more strongly affected by the hard-easy effect. Using familiar and less demanding materials, such as pictures of well-known objects from children’s daily life, might also facilitate a more (over)optimistic view of one’s own abilities. In our study, the stimuli were novel and the task procedure itself was demanding. This novelty and complexity of evaluating the learning of Kanji might have influenced children’s (particularly kindergarteners’) evaluation of their performance, leading to lower CJs for correct answers. On the other hand, the still relatively high CJs for incorrect responses in both age groups (second graders: 3.65 and kindergarteners: 3.70) might reflect children’s difficulty in monitoring uncertainty. Thus, it might generally be more difficult to state how unsure one is about a given answer than to state how sure one is. This conforms to the finding reported by Roebers and Spiess (2016), showing marked improvements in second graders’ CJs for incorrect answers (i.e., a downward adjustment of CJs for incorrect answers) across a delay of only eight months.

In the current study, children encountered the task of learning Japanese symbols for the first time. Thus, their previous experience with this type of task was rather limited, especially in the case of the kindergarteners, who are not yet confronted with such learning tasks in their daily life as much as school-aged children. Due to the lack of (task) experience, it may have been particularly difficult for these young children to estimate the difficulty of the task and use it as a cue for their confidence judgments, performance-based judgments and credits.
Previous research on metacognitive monitoring has shown that individuals use various cues such as how easily material is learned, how familiar they are with the material or how much knowledge they have about a certain task as a basis for their judgments (e.g., Koriat 1997; Koriat et al. 2008). Although there is a general agreement among researchers that individuals use various cues when predicting and evaluating their performance, it does not necessarily mean that these cues are always valid to predict memory performance accurately (e.g., Koriat 1997). For example, children often demonstrate difficulties considering task difficulty as a cue for their self-evaluations prior to about grade 2 or 3 (Nicholls 1978). With growing age, children gain more experience in evaluating their skills, they acquire more knowledge concerning their skills and they are likely to increase their knowledge on valid cues to be used as a basis for metacognitive judgments. In this sense, children might become more sophisticated in the use of cues when evaluating performance, resulting in less overestimation as well as a stronger differentiation between correct and incorrect performance.

Self versus other: Performance-based judgments Regarding judgments provided after receiving feedback on their performance, children of both age groups differentiated meaningfully in their judgments between correct and incorrect answers. Kindergarteners and second graders considered and incorporated the feedback (i.e., which Kanji they recognized correctly and incorrectly) in a constructive way by providing higher ratings for correct answers and lower ratings for incorrect answers. The ability to differentiate in their ratings between correctly and incorrectly recognized items (i.e., discrimination accuracy) seems to improve through feedback, as can be seen when the mean performance-based judgments (correct answers: M = 5.5; incorrect answers: M = 2.5; difference = 3.0) are compared to the confidence judgments provided before receiving feedback (correct answers: M = 4.6; incorrect answers: M = 3.7; difference = 0.9). Thus, the participating children used feedback information to adjust and optimize their learning or, in this specific case, to increase discrimination accuracy. This speaks for a positive effect of feedback in children as young as kindergarteners. Even though the potential influence of feedback was not the focus of the current investigation, future studies on children’s performance evaluations should consider the role of feedback.

Interestingly, participants from both age groups differentiated in their performance-based judgments between self and other. Thus, regardless of feedback and accuracy, children provided higher judgments for their own answers compared to the virtual peer’s answers. Why do children provide higher ratings for their own answers in the face of feedback? One possible explanation is that children of both age groups based their judgments on rather simplistic or invalid cues such as how much effort they invested in solving the task or on the desire to perform extraordinarily well, cues shown to be commonly used by young children (e.g., Nicholls 1978). In this case, participants would judge their own effort in the Kanji task to be very high, which would justify “higher” judgments, while in the case of the peer they cannot know about the effort invested. At the same time, the wish to outperform others might play a role too, for example an individual’s need for self-enhancement to protect their self-esteem or the tendency to see oneself preferentially in a positive, self-serving manner in comparison to others (Leary 2007). In our study, children clearly differentiated between the evaluation of their own and another, personally unknown child’s performance, regardless of age and accuracy. On the one hand, this result suggests that kindergarteners but also second graders might still be guided by their “unrealistic optimism”, wishing to perform extraordinarily and possibly also to outperform others, mirrored in the difference in judgments between self and unknown other.
On the other hand, the result speaks for a so-called attributional egotism, describing the tendency to see oneself in the best possible light, a phenomenon frequently described in individualistic countries (e.g., Snyder et al. 1978). When assessing their own performance, children in the current study were possibly influenced by such immature and self-serving biases, while for the unknown other child’s performance, more “objective” reasoning occurred. In the case of one’s self and self-esteem, evaluation objectivity could therefore be less relevant. In contrast, when assessing the other child’s performance, less emotional, more rational and “objective” cues such as item difficulty or performance accuracy (i.e., feedback) might form the basis for the evaluations. As argued by Stipek and Tannatt (1984), it may be more difficult for young children to engage in self-reflection than to analyze another child’s performance accurately, perhaps because of the self-relevance in the former. Future studies should consider cross-cultural comparisons to gain a deeper understanding of the influence of attributional egotism on children’s performance assessments and evaluations. Although children generally lowered their judgments for incorrect responses when facing feedback, they still did not lower their ratings to 0, neither for the self nor for the unknown other. This is particularly interesting when considering that children received salient and obvious feedback on their own and the virtual peer’s performance. Again, we can only speculate what might have driven children to provide a judgment of approximately 2.5 out of 6 on the thermometer scale when seeing that the answer was incorrect. Effort could be one possible explanation for these results. Unfortunately, we did not measure invested effort directly, for example by asking children to rate their effort on a similar Likert scale.
If “invested effort” plays a role when evaluating performance in the face of feedback, a judgment of 0 after negative feedback may seem inadequate to a child who is convinced of having worked very hard, or that the peer must surely have tried hard. Similarly, the “wish” to perform well might be particularly dominant, making it impossible for young children to “accept” the fact of being incorrect. This could lead to conclusions or justifications such as “this just cannot be true” or “it may not be, what should not be”. The possibility that participants did not understand the instructions also cannot be ruled out completely.

Self versus other: Crediting correct and incorrect answers Regarding the credits for one’s own versus the virtual peer’s answers, children of both age groups gave more credits for their own correct and incorrect answers, just as was the case for the performance-based judgments. As discussed previously, the desire to perform well or so-called “wishful thinking” processes (Schneider 1998; Stipek 1984) might influence children’s evaluations of their own and a peer’s performance. In our study, this could explain the overly optimistic crediting of correct and especially of incorrect responses. Children might be driven by the desire to outperform their peers and thus give themselves more credits, regardless of whether the peer performed at an equal level. An earlier study with second graders showed that children tended to exhibit a self-serving bias only if they had to assess a non-friend’s performance, but not when assessing a friend’s performance (Posey and Smith 2003). Thus, motivational influences might play a role, too, depending on whether the individual feels in competition (i.e., with an unknown virtual peer) or in cooperation (i.e., with a close friend). This might explain why children in the current study differentiated in their credits between self and unknown other. Likewise, young children’s difficulties in differentiating between ability and effort perhaps influenced the distribution of credits. “The more effort I invested, the more credits I should get”: this effort heuristic has played a role in previous studies with young children (e.g., Schneider 1998; Stipek and Tannatt 1984; Wellman 1985). Accordingly, it might be easier for children to assess and evaluate (or in our case credit) their own effort invested in the task than an unknown child’s effort.
Interestingly, as opposed to the performance-based judgments, there was an age-related difference in the crediting of incorrect and correct responses between kindergarteners and second graders. Based on this interaction effect, it can be concluded that older children gave more credits for correct answers and fewer credits for incorrect answers than kindergarteners. Again, this result speaks for an age-related increase in the differentiation between correct and incorrect answers, as was found for metacognitive judgments. Learning Kanji was a novel and challenging task, with second graders outperforming the younger age group. Compared to the kindergarteners, second graders might have shown an “experience-related” advantage, as they had already completed more than a year of schooling, with occasions for receiving “objective” feedback and grades. Also, social reinforcement, which is frequently provided during the preschool years, plays a less central role in elementary school (Stipek and Tannatt 1984). Any additional year of academic education allows children to become better at interpreting feedback from their teachers. At the same time, they learn that teachers credit answers based on correctness rather than on the effort invested by the student. For the second graders, the experience of dealing with feedback and the instruction to act as a teacher who must give credits for responses might have helped them to focus on more objective or “valid” cues, such as item difficulty, as a basis for their crediting, for the self as well as for the unknown other. Previous research on metacognitive monitoring has shown that individuals tend to use various cues for their metacognitive judgments and that these cues can differ in their validity or in their relation to task performance (e.g., Hertzog et al. 2013; Koriat 1997).
Adults are not the only ones who rely on cues when making metacognitive judgments; children do so as well, and the use of (valid) cues is expected to contribute to the increase in monitoring accuracy with growing age (e.g., Koriat et al. 2009). Furthermore, social comparisons are likely to become more influential as children get older, and self-ratings of abilities and smartness have been shown to decrease during the elementary school years (e.g., Stipek and Tannatt 1984).

Conclusions, implications and limitations Taken together, our study provides insights into 6- and 8-year-old children’s ability to metacognitively monitor and evaluate their own and an unknown virtual peer’s performance, pursuing two questions: Firstly, do children of different ages assess and evaluate their own performance differently from that of a virtual peer? Secondly, which factors might influence their performance assessments for self versus unknown others? When asked to provide confidence judgments for their own recognition performance, second graders outperformed kindergarteners by differentiating more strongly in their confidence ratings between correct and incorrect recognition. These results replicated previous findings on age-related progression in the ability to monitor one’s own performance, strengthening and extending the existing evidence. In the case of performance-based judgments, both age groups rated their own as well as the virtual peer’s performance in the face of objective feedback. Results showed that 6- and 8-year-olds made use of the feedback by providing higher ratings for their own and the peer’s correct answers than for incorrect answers. Interestingly, regardless of age and recognition accuracy, participants gave higher judgments to their own answers. Similarly, when asked to credit their own and the peer’s responses, more credits were given for their own responses. So far there has been little research reported on children’s abilities to assess and evaluate their own and another child’s performance. Understanding how children evaluate their own performance compared to how they evaluate a peer’s performance (e.g., by providing judgments and crediting responses) could be of importance in academic contexts. Group work and collaborative and co-operative learning are methods observed in many classrooms around the world.
Collaborative learning requires children to listen to others, consider their opinions and perspectives and integrate them into their own perception and understanding. Hence, to work with classmates on a project in an efficient and successful manner, it is worth being able to regulate learning, to assess and evaluate performance not only for the self but also for the other group members (e.g., Paulus et al. 2014). From a more theoretical standpoint it is important to investigate children’s developing monitoring skills from various perspectives: It could be that young children benefit from the experience of evaluating a peer’s performance (other-monitoring) for the monitoring of their own performance (self-monitoring). This is in line with Vygotsky’s (1978) idea, such that children learn to regulate themselves by first practicing it in interactions with other individuals. Certainly, there are limitations to the current study. Due to our study design, it was not possible to collect confidence judgments before receiving feedback for the virtual peer’s answers and to compare these to children’s confidence judgments for their own answers. Therefore, it would be particularly interesting to elaborate on this approach of “self versus other” by asking children from different age groups to evaluate their own and a “real-life” peer’s performance, for example from their kindergarten or school class. Participants could be asked to provide all types of judgments for themselves and the other individual and data of a child’s reallife peer would be available instead of a virtual peer’s. Additionally, there is the question of cultural influences on the differentiation between self and other. Our sample consisted of children growing up in an individualistic (self-oriented) culture. Possibly, individuals from collectivistic (other-oriented) cultures would show a different pattern of judging and crediting their own performance versus that of a peer. 
Analyzing cultural differences was beyond the scope of our approach. Nevertheless, we think that future studies should consider an individual’s culture as a potential influencing factor when investigating performance evaluations for the self in comparison to (unknown) others. It could also be of interest to investigate the potential influence of effort in more detail. We admit that effort was not directly measured in the current investigation, and thus we can only speculate about how influential it is as a cue for monitoring processes. Future studies could elaborate on the influence of effort by asking children directly to rate their own effort invested in a task (e.g., by using scales similar to those used for the monitoring judgments).

Even though no final conclusions can be drawn, the current study provides a useful starting point to further explore children’s assessment and evaluation of their own and another individual’s performance as well as potential underlying factors. It is certainly of great importance to further explore the various cues or heuristics children of different ages use to predict and postdict their performance in cognitive, social, emotional, or physical tasks. It would be interesting to include various types of (metacognitive) judgments and compare these for the self and the other in different age groups. As shown in the present study, factors influencing children’s assessments and evaluations depend on age and on whether the child assesses his or her own performance or the performance of another person. Especially when children are asked to assess a peer’s performance, assessment accuracy is very likely influenced by various factors such as the ability to mentalize about another individual’s desires, beliefs and knowledge (i.e., theory of mind; Sodian 1990), the relationship to the other (i.e., virtual peer/unknown other versus friend) or other characteristics of the counterpart (i.e., child versus adult, human versus computer character/robot).
Consequently, the question arises of how these new insights could be used to improve children’s ability to evaluate their own as well as another individual’s performance more accurately, a question that we hope will be pursued in future studies.

Acknowledgments We would like to thank the participating children and their families, the children’s teachers and the student research assistants who helped with data collection. Many thanks also to Jakob Raible for his valuable help in task design and programming.

Compliance with ethical standards All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Conflict of interest This project was partially financed by a grant of the Swiss National Science Foundation (SNSF Grant No. 100014_126559) provided to the author Claudia M. Roebers. The authors declare that they have no conflict of interest.

References

Bandura, A. (1977). Toward a unifying theory of behavioral change. Psychological Review, 84, 191–215. Retrieved from https://www.uky.edu/~eushe2/Bandura/Bandura1977PR.pdf.
Bernard, S., Clément, F., & Mercier, H. (2016). Wishful thinking in preschoolers. Journal of Experimental Child Psychology, 141, 267–274. doi:10.1016/j.jecp.2015.07.018.
Brown, A. L. (1978). Knowing when, where, and how to remember: A problem of metacognition. In R. Glaser (Ed.), Advances in instructional psychology (pp. 77–165). Hillsdale, NJ: Lawrence Erlbaum Associates.
Destan, N., & Roebers, C. M. (2015). What are the metacognitive costs of young children’s overconfidence? Metacognition and Learning, 10, 347–374. doi:10.1007/s11409-014-9133-z.
Destan, N., Hembacher, E., Ghetti, S., & Roebers, C. M. (2014). Early metacognitive abilities: The interplay of monitoring and control processes in 5- to 7-year-old children. Journal of Experimental Child Psychology, 126, 213–228. doi:10.1016/j.jecp.2014.04.001.
Dunlosky, J., & Metcalfe, J. (2009). Metacognition. Thousand Oaks, CA: SAGE.
Dunlosky, J., & Rawson, K. A. (2012). Overconfidence produces underachievement: Inaccurate self evaluations undermine students' learning and retention. Learning and Instruction, 22, 271–280. doi:10.1016/j.learninstruc.2011.08.003.
Dunning, D. (2005). Self-insight: Roadblocks and detours on the path to knowing thyself. New York, NY: Psychology Press.
Flavell, J. H. (1979). Meta-cognition and cognitive monitoring: A new area of cognitive-developmental inquiry. American Psychologist, 34, 906–911. doi:10.1037/0003-066x.34.10.906.
Hacker, D. J., Bol, L., Horgan, D. D., & Rakow, E. A. (2000). Test prediction and performance in a classroom context. Journal of Educational Psychology, 92, 160–170. doi:10.1037//0022-0663.92.1.160.
Heckhausen, H. (1984). Emergent achievement behavior: Some early developments. In J. Nicholls (Ed.), Advances in achievement motivation (pp. 1–32). Greenwich, CT: JAI.
Hertzog, C., Hines, J. C., & Touron, D. R. (2013). Judgments of learning are influenced by multiple cues in addition to memory for past test accuracy. Archives of Scientific Psychology, 1, 23–32. doi:10.1037/arc0000003.
Howie, P., & Roebers, C. M. (2007). Developmental progression in the confidence-accuracy relationship in event recall: Insights provided by a calibration perspective. Applied Cognitive Psychology, 21, 871–893. doi:10.1002/acp.1302.
Juslin, P., Winman, A., & Olsson, H. (2000). Naive empiricism and dogmatism in confidence research: A critical examination of the hard-easy effect. Psychological Review, 107, 384–396. doi:10.1037//0033-295X.107.2.384.
Koriat, A. (1997). Monitoring one's own knowledge during study: A cue-utilization approach to judgments of learning. Journal of Experimental Psychology: General, 126, 349–370. doi:10.1037//0096-3445.126.4.349.
Koriat, A., Nussinson, R., Bless, H., & Shaked, N. (2008). Information-based and experience-based metacognitive judgments: Evidence from subjective confidence. In J. Dunlosky & R. A. Bjork (Eds.), Handbook of metamemory and memory (pp. 117–136). New York, NY: Psychology Press.
Koriat, A., Ackerman, R., Lockl, K., & Schneider, W. (2009). The memorizing effort heuristic in judgments of learning: A developmental perspective. Journal of Experimental Child Psychology, 102, 265–279. doi:10.1016/j.jecp.2008.10.005.
Krebs, S. S., & Roebers, C. M. (2012). The impact of retrieval processes, age, general achievement level, and test scoring scheme for children's metacognitive monitoring and controlling. Metacognition and Learning, 7, 75–90. doi:10.1007/s11409-011-9079-3.
Kruger, J. (1999). Lake Wobegon be gone! The "below-average effect" and the egocentric nature of comparative ability judgments. Journal of Personality and Social Psychology, 77(2), 221–232.
Leary, M. R. (2007). Motivational and emotional aspects of the self. Annual Review of Psychology, 58, 317–344. doi:10.1146/annurev.psych.58.110405.085658.
Lichtenstein, S., & Fischhoff, B. (1977). Do those who know more also know more about how much they know? Organizational Behavior and Human Performance, 20, 159–183. doi:10.1016/0030-5073(77)90001-0.
Lipko, A. R., Dunlosky, J., Hartwig, M. K., Rawson, K. A., Swan, K., & Cook, D. (2009). Using standards to improve middle school students' accuracy at evaluating the quality of their recall. Journal of Experimental Psychology: Applied, 15, 307–318. doi:10.1037/a0017599.
Lipko, A. R., Dunlosky, J., Lipowski, S. L., & Merriman, W. E. (2012). Young children are not underconfident with practice: The benefit of ignoring a fallible memory heuristic. Journal of Cognition and Development, 13, 174–188. doi:10.1080/15248372.2011.577760.
Lyons, K. E., & Ghetti, S. (2011). The development of uncertainty monitoring in early childhood. Child Development, 82, 1778–1787. doi:10.1111/j.1467-8624.2011.01649.x.
Lyons, K. E., & Ghetti, S. (2013). I don't want to pick! Introspection on uncertainty supports early strategic behavior. Child Development, 84, 726–736. doi:10.1111/cdev.12004.
Nelson, T. O., & Narens, L. (1990). Metamemory: A theoretical framework and new findings. In G. H. Bower (Ed.), The psychology of learning and motivation (pp. 125–173). New York, NY: Academic Press.
Nicholls, J. G. (1978). Development of concepts of effort and ability, perception of academic attainment, and understanding that difficult tasks require more ability. Child Development, 49, 800–814. doi:10.1111/j.1467-8624.1978.tb02383.x.
Okita, S. Y. (2014). Learning from the folly of others: Learning to self-correct by monitoring the reasoning of virtual characters in a computer-supported mathematics learning environment. Computers & Education, 71, 257–278. doi:10.1016/j.compedu.2013.09.018.
Paulus, M., Tsalas, N., Proust, J., & Sodian, B. (2014). Metacognitive monitoring of oneself and others: Developmental changes during childhood and adolescence. Journal of Experimental Child Psychology, 122, 153–165. doi:10.1016/j.jecp.2013.12.011.
Piaget, J. (1930). The child's conception of physical causality. London, UK: Routledge & Kegan Paul.
Plumert, J. M., & Schwebel, D. C. (1997). Social and temperamental influences on children's overestimation of their physical abilities: Links to accidental injuries. Journal of Experimental Child Psychology, 67, 317–337. doi:10.1006/jecp.1997.2411.
Posey, E., & Smith, R. A. (2003). The self-serving bias in children. Psi Chi Journal of Undergraduate Research, 8, 153–156. Retrieved from http://homepage.psy.utexas.edu/homepage/class/Psy359H/Echols/Psi%20Chi%20Articles/2003%20Vol%208/2003%20Vol%208%20%234/Posey.pdf.
Roderer, T., & Roebers, C. M. (2010). Explicit and implicit confidence judgments and developmental differences in metamemory: An eye-tracking approach. Metacognition and Learning, 5, 229–250. doi:10.1007/s11409-010-9059-z.
Roebers, C. M. (2002). Confidence judgments in children's and adults' event recall and suggestibility. Developmental Psychology, 38, 1052–1067. doi:10.1037//0012-1649.38.6.1052.
Roebers, C. M. (2014). Children's deliberate memory development: The contribution of strategies and metacognitive processes. In P. J. Bauer & R. Fivush (Eds.), The Wiley handbook on the development of children's memory (pp. 865–894). Oxford, UK: Blackwell Wiley.
Roebers, C. M., & Spiess, M. A. (2016). The development of metacognitive monitoring and control in second graders: A short-term longitudinal study. Journal of Cognition and Development. doi:10.1080/15248372.2016.1157079.
Roebers, C. M., Krebs, S. S., & Roderer, T. (2014). Metacognitive monitoring and control in elementary school children: Their interrelations and their role for test performance. Learning and Individual Differences, 29, 141–149. doi:10.1016/j.lindif.2012.12.003.
Schneider, W. (1998). Performance prediction in young children: Effects of skill, metacognition and wishful thinking. Developmental Science, 1, 291–297. doi:10.1111/1467-7687.00044.
Schneider, W. (2014). Memory development from early childhood through emerging adulthood. Heidelberg, DE: Springer.
Schneider, S. L., & Laurion, S. K. (1993). Do we know what we've learned from listening to the news? Memory & Cognition, 21(2), 198–209.
Schneider, W., & Lockl, K. (2002). The development of metacognitive knowledge in children and adolescents. In T. J. Perfect & B. L. Schwartz (Eds.), Applied metacognition (pp. 224–257). Cambridge, UK: Cambridge University Press.
Schneider, W., & Lockl, K. (2008). Procedural metacognition in children: Evidence for developmental trends. In J. Dunlosky & R. A. Bjork (Eds.), Handbook of metamemory and memory (pp. 391–409). New York, NY: Psychology Press.
Schneider, W., Visé, M., Lockl, K., & Nelson, T. O. (2000). Developmental trends in children's memory monitoring - evidence from a judgment-of-learning task. Cognitive Development, 15, 115–134. doi:10.1016/S0885-2014(00)00024-1.
Schunk, D. H. (1983). Ability versus effort attributional feedback: Differential effects on self-efficacy and achievement. Journal of Educational Psychology, 75(6), 848–856. Retrieved from https://libres.uncg.edu/ir/uncg/f/D_Schunk_Ability_1983.pdf.
Sebanz, N., Bekkering, H., & Knoblich, G. (2006). Joint actions: Bodies and minds moving together. Trends in Cognitive Sciences, 10, 70–76. doi:10.1016/j.tics.2005.12.009.
Snyder, M. L., Stephan, W. G., & Rosenfield, D. (1978). Attributional egotism. In J. H. Harvey, W. J. Ickes, & R. F. Kidd (Eds.), New directions in attribution research (Vol. 2, pp. 91–120). Hillsdale, NJ: Erlbaum.
Sodian, B. (1990). Understanding sources of information - implications for early strategy use. Interactions among Aptitudes, Strategies, and Knowledge in Cognitive Performance, 12–21. Retrieved from http://link.springer.com/chapter/10.1007/978-1-4612-3268-1_2.
Stipek, D. (1984). Young children's performance expectations: Logical analysis or wishful thinking. In J. Nicholls (Ed.), Advances in motivation and achievement (Vol. 3, pp. 33–56). Greenwich, CT: JAI Press.
Stipek, D. J., & Hoffman, J. M. (1980). Development of children’s performance-related judgments. Child Development, 51, 912–914. doi:10.2307/1129485.
Stipek, D. J., & Tannatt, L. M. (1984). Children's judgments of their own and their peers' academic competence. Journal of Educational Psychology, 76, 75–84. doi:10.1037//0022-0663.76.1.75.
Tsalas, N., Paulus, M., & Sodian, B. (2015). Developmental changes and the effect of self-generated feedback in metacognitive controlled spacing strategies in 7-year-olds, 10-year-olds, and adults. Journal of Experimental Child Psychology, 132, 140–154. doi:10.1016/j.jecp.2015.01.008.
van Loon, M. H., de Bruin, A. B. H., van Gog, T., & van Merriënboer, J. J. G. (2013). The effect of delayed-JOLs and sentence generation on children's monitoring accuracy and regulation of idiom study. Metacognition and Learning, 8, 173–191. doi:10.1007/s11409-013-9100-0.
Visé, M., & Schneider, W. (2000). Determinanten der Leistungsvorhersage bei Kindergarten- und Grundschulkindern: Zur Bedeutung metakognitiver und motivationaler Einflußfaktoren. Zeitschrift fuer Entwicklungspsychologie und Paedagogische Psychologie, 32, 51–58. doi:10.1026//0049-8637.32.2.51.
Vygotsky, L. S. (1978). Mind in society. Cambridge, MA: Harvard University Press.
Wellman, H. M. (1985). The origins of metacognition. In D. L. Forrest-Pressley, G. E. MacKinnon, & T. G. Waller (Eds.), Metacognition, cognition, and human performance (Vol. 1, pp. 1–31). Orlando, FL: Academic Press.
