Reading and Writing, DOI 10.1007/s11145-017-9731-7
Passage independence within standardized reading comprehension tests

Annie Roy-Charland, Gabrielle Colangelo, Victoria Foglia, Leïla Reguigui

© Springer Science+Business Media Dordrecht 2017
Abstract  In tests used to measure reading comprehension, validity is essential for obtaining accurate results. Unfortunately, studies have shown that people can correctly answer some questions on these tests without reading the related passage. These findings raise the question of whether this phenomenon is restricted to multiple-choice tests or also occurs in tests that employ open-ended questions. Three common standardized reading comprehension tests were examined: the WIAT-III, the CAAT, and the Nelson–Denny. The WIAT-III is composed of open-ended questions, while the other two tests use multiple-choice questions. All participants were instructed to answer the questions to the best of their ability, without access to the related passage. The results revealed that participants correctly answered the multiple-choice questions at a rate significantly higher than chance, consistent with the passage independence problem. For the open-ended questions, participants still answered with 18% accuracy without the passages.

Keywords  Reading comprehension · Validity · Passage independence · Standardized tests

Correspondence: Annie Roy-Charland, [email protected]
Department of Psychology, Laurentian University, 935 Ramsay Lake Road, Sudbury, ON P3E 2C6, Canada

Psychological tests are the most commonly used tools to study human behaviour (Aiken & Groth-Marnat, 2006; Heilbrun, 1997; Keenan & Betjemann, 2006; Larcker & Lessig, 1980; McClelland, 1977; Viswesvaran & Ones, 1995). They are developed to better understand the individual differences that exist amongst humans, as well as the changes that occur in an individual across situations and environments. These tests can help us understand latent constructs that cannot be accessed by observation alone (Anastasi & Urbina, 1997; Aiken & Groth-Marnat,
2006; Lim, Pangam, Periyasami, & Aneja, 2006; Neukrug & Fawcett, 2006). Research in this field mainly focuses on developing more precise tools with better psychometric properties (Anastasi & Urbina, 1997; Neukrug & Fawcett, 2010). This study contributes to this field by examining the quality of tests used to measure reading comprehension.

Psychological tests are characterized by several key psychometric properties. For instance, a test must be reliable and valid in order to be effective (Aiken & Groth-Marnat, 2006; Neukrug & Fawcett, 2006). Reliability refers to the consistency of a test: it must generate, for the same person, comparable scores from one day to another, in different situations, or on an equivalent set of items. A reliable tool allows us to identify whether the dispersion in scores is a result of individual differences or of random error (Anastasi & Urbina, 1997; Neukrug & Fawcett, 2010). Validity refers to a test's ability to measure the construct that it is designed to measure: a valid tool is composed of items that accurately capture what the researcher is studying (Anastasi & Urbina, 1997; Neukrug & Fawcett, 2010). The two properties are related but not interchangeable; a test cannot be valid without being reliable. For measures of reading comprehension, validity is a particular concern (Messick, 1996; Lim et al., 2006; Keenan & Betjemann, 2006). It is therefore important to verify that the questions, and the scores derived from them, truly measure reading comprehension as these tests promise.

Research on the validity of reading comprehension tests started in the 1930s, when it was found that people could correctly answer multiple-choice questions about a specific passage without having read said passage.
It was suggested that these questions were independent of their related passage and thus could not accurately measure reading comprehension (Eurich, 1931; Rothney & Bear, 1938). This discovery received little attention until 1964, when Preston revived the idea with the first systematic experiment. Preston (1964) found that university students were able to obtain higher scores than chance level without having access to the related passage, confirming that passage independence was still problematic, and suggested that reading comprehension tests should undergo more review during development. Following this discovery, popular psychological tests were also investigated for this issue. Lifson, Scruggs and Bennion (1984) showed that, once again, university students were able to correctly answer questions relating to a specific passage, without access to that passage, at a rate higher than chance alone. Furthermore, multiple other studies have unveiled passage-independent items on standardized reading comprehension measures such as the Minnesota Scholastic Aptitude Test (Fowler & Kroll, 1978), the Stanford Achievement Test (Lifson, Scruggs, & Bennion, 1984), the Scholastic Achievement Test (SAT; e.g., Katz, Lautenschlager, Blackburn, & Harris, 1990), the Test of English as a Foreign Language (Tian, 2006), the Gray Oral Reading Test (Keenan & Betjemann, 2006), and the Nelson–Denny (Coleman, Lindstrom, Nelson, Lindstrom, & Gregg, 2010). Evidence of passage-independent items on these tests is concerning given their popularity as measures of reading comprehension within the field of learning disabilities. These tests are often used to identify reading
difficulties in children, and also in the development of intervention programs for these children. The Gray Oral Reading Test (GORT-3) is one of the tests used to measure how easily and efficiently an individual can read. The GORT-3 is composed of multiple-choice questions that relate to various passages and are used to assess how effectively each passage has been understood. The passage independence of the GORT-3 was previously examined by Keenan and Betjemann (2006), who administered the questions of the test to be answered without the passage. The results from their undergraduate sample revealed that 86% of the GORT-3 questions were answered correctly at above chance level (25%) without reading the passage. They then followed the same procedure with a group of younger children, who obtained a mean accuracy of 47%. Furthermore, Keenan and Betjemann (2006) examined a separate sample of children with and without reading disabilities to determine whether the passage-independent items on the test were sensitive to reading disabilities. Results indicated that they were not. This confirmed that only the most passage-dependent items of the GORT-3 are useful in detecting differences in comprehension abilities as a function of reading disability. It should be noted that these results for the GORT-3 would also apply to the GORT-4, since questions and passages remained the same between these two versions. Following Keenan and Betjemann's (2006) results on the validity and passage independence of the GORT-3, Coleman et al. (2010) examined the validity and passage independence of the Nelson–Denny.
Similarly to the GORT-3, the Nelson–Denny is designed to measure reading comprehension in individuals (Creaser, Jacobs, Zaccaria, & Carsello, 1970; Cummins & Porter, 1981; Daneman & Merikle, 1996; Lewandowski, Codding, Kleinmann, & Tucker, 2003) and is the most commonly used of these tests. Coleman et al. (2010) administered Form G and Form H of the Nelson–Denny to participants in a passageless format. The study included both participants who were at risk for reading disabilities and those who were not. The Nelson–Denny questions have 5 answer alternatives, which sets the chance level for answering correctly at 20%. The results of Coleman et al. (2010) revealed that the Nelson–Denny has a validity issue similar to that of the GORT-3. Participants without any risk of reading disabilities answered nearly half of the questions correctly, performing at 46.6% accuracy on Form H and 43.8% accuracy on Form G without having read the related passages. Furthermore, even those who were at risk for reading difficulties performed above chance, at about 40.6% accuracy. Thus, much like the GORT-3 in Keenan and Betjemann's (2006) study, the Nelson–Denny also suffers from passage independence: all of the participants were able to answer questions correctly well above chance level without the passages. These discoveries about the validity of the GORT-3 and the Nelson–Denny put into question similar tests designed to measure reading comprehension. For example, the Canadian Adult Achievement Test measures reading abilities in adults and adolescents (Canadian Test Center, 1992). This test includes sub-tests measuring vocabulary, spelling, number operations, problem solving, mechanical reasoning, language, science and, of interest to the current
study, reading comprehension. This measure can be used to predict academic success and diagnose learning disabilities. Because the CAAT and the Nelson–Denny both employ multiple-choice questions in their reading comprehension components, they are easy to administer, fill out, and score (Keenan & Betjemann, 2006). It is important for these tests to measure reading comprehension accurately, because inaccurate results could negatively affect an individual's academic progress. They can lead to the missed detection of a learning disability, especially if the individual does well on the test, meaning that the individual may not receive the help they need to succeed. However, the CAAT had not been tested for passage independence prior to this study.

Passage independence could be a problem found only in tests that use multiple-choice questions. Consequently, the use of open-ended questions, such as those used in the Wechsler Individual Achievement Test—Third Edition (WIAT-III), might be a possible solution to passage independence issues (Flanagan & Harrison, 2012; Tamm et al., 2014). The WIAT-III is administered to children and adults. It is designed to measure academic strengths and weaknesses, to diagnose learning disabilities, and to help with education eligibility or program placement (Wechsler, 2009). The reading comprehension subtest of the WIAT-III uses open-ended questions. Unlike with multiple-choice questions, individuals answering open-ended questions cannot use the process of elimination. Nevertheless, it remains possible that participants can answer questions using prior knowledge, which would produce passage independence issues similar to those found in other tests. Passage independence had not previously been examined within this standardized test.

The current study examined passage independence in the CAAT, the Nelson–Denny and the WIAT-III.
The Nelson–Denny was used as a comparison, in the hope of replicating the results of Coleman et al. (2010). If the CAAT suffers the same validity issues as the GORT-3 and the Nelson–Denny, individuals should be able to correctly answer its questions, without access to the related passage, more often than by chance alone. The Nelson–Denny contains five answer alternatives per question, so the predicted rate of correct answers obtained by chance is 20%; the CAAT contains four answer alternatives, so the predicted chance rate is 25%. As for the WIAT-III, which uses open-ended questions, a precise hypothesis cannot be set because there is no clear way to define "chance level" for open-ended questions. Because of this inability to set an appropriate baseline, we present and discuss descriptive statistics for the WIAT-III.
Methods

Participants

Thirty-nine undergraduate students (29 women and 10 men; mean age: 20.5 years) from [name deleted to maintain the integrity of the review process] participated in this study. Participants had, on average, three years of post-secondary
education. Thirty-two were monolingual English speakers and eight were bilingual, with English as their primary language of daily use.
Materials

Three standardized tests of reading comprehension were used in this study: the WIAT-III (Flanagan & Harrison, 2012; Tamm et al., 2014), the CAAT (Cummins & Swain, 2014; Thorndike, Hagen, & Sattler, 1986; Willms, 1992) and the Nelson–Denny (Creaser et al., 1970; Cummins & Porter, 1981; Daneman & Merikle, 1996; Lewandowski et al., 2003).

The WIAT-III is used to measure an individual's cognitive abilities and is intended for individuals between the ages of 4 years 0 months and 50 years 11 months. It is composed of 16 sub-tests that measure the following abilities: listening comprehension, early reading skills, mathematics problem solving, alphabet writing fluency, sentence composition, word reading, essay composition, pseudoword decoding, numerical operations, oral expression, oral reading fluency, spelling and grammar, mathematics fluency (addition, subtraction, multiplication) and, lastly, reading comprehension. The reading comprehension sub-test is composed of 83 short-answer questions relating to 13 different passages.

The CAAT measures the level of cognitive ability in adults (19 years of age and over). It focuses on 4 categories: vocabulary, reading comprehension, mathematical operations and mathematical problem solving. The reading comprehension task is composed of 50 multiple-choice questions (four options per question) relating to 9 different passages.

The Nelson–Denny measures reading competence in individuals between the ages of 14 and 65. This test measures 2 main skills: vocabulary and reading comprehension. The reading comprehension portion is composed of 38 multiple-choice questions (5 options per question) relating to 7 different passages.

Both a demographic and a bilingualism questionnaire were administered to participants. The demographic questionnaire asked participants to indicate their gender, age, university year and the day they completed the study.
The bilingualism questionnaire was used to identify the participants' primary and secondary language, and their competence in each.

Procedure

Each session lasted approximately 90 min. Participants were required to answer the questions of the reading comprehension portions of the WIAT-III, the CAAT and the Nelson–Denny, without access to the related passages. Questions were presented in paper format, in the same order as in the standardized administration. Participants were told that they would be answering questions from 3 different tests, but were not told the specific names of these tests. They were informed that one of the tests was composed of 38 questions relating to 7 different passages, a second test was composed of 83 questions relating to 13 different passages, and a third test was composed of 50 questions relating to 9 different passages. In order to provide some reassurance to participants, they were advised
that it is not common to correctly answer questions about passages that they have never read, but that they should answer the questions to the best of their abilities. Participants were informed that they were not allowed to go back to previous questions once they had moved on to the next question. Within each test, question order was kept as per standard administration, while the order in which the tests were administered was counterbalanced between participants. Once the participants finished the task, they were asked to complete both the demographic and bilingualism questionnaires.
Results

Percentages of correct answers were obtained for each test by summing correct responses, dividing by the number of questions, and multiplying by 100.

Comparison with chance level for the Nelson–Denny and CAAT

Nelson–Denny

A one-sample t test comparing the percentage of correct answers obtained in this study to that expected by chance was computed for the Nelson–Denny. Participants were able to correctly answer questions (M = 38.19%, SD = 9.33) significantly better than by chance (20%), t(38) = 12.18, p < .001. A second series of analyses compared the percentage of correct answers to chance level for each of the seven passages. As can be seen in Table 1, participants were able to answer questions at above chance level for all of the passages. Finally, a third series of analyses was computed for each question against the 20% chance level. Results are presented in Table 2. Twenty-one of the 38 questions in the test were answered correctly above chance level.

CAAT

A one-sample t test comparing correct answers obtained in this study to those expected by chance was computed for the CAAT. Similar results were observed, with participants able to correctly answer questions (M = 43.54%, SD = 10.02) significantly better than by chance (25%), t(38) = 11.56, p < .001. A second series of analyses compared the percentage of correct answers to chance level for each of the nine passages. As can be seen in Table 1, participants answered questions above chance level for all of the passages except one. Finally, a third series of analyses was computed for each question against a 25% chance level. Results are presented in Table 3. As can be observed in the table, twenty-six of the 50 questions in the test were answered correctly above chance level.
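The scoring rule and the one-sample t test against chance described above can be sketched as follows. This is a minimal illustration only; the accuracy values below are invented for the example (they are not data from the study), and it uses the sample standard deviation, as is standard for a one-sample t test.

```python
from math import sqrt
from statistics import mean, stdev

def percent_correct(responses):
    """Score one participant: correct responses (1) over number of questions, times 100."""
    return 100 * sum(responses) / len(responses)

def one_sample_t(scores, chance):
    """One-sample t statistic comparing mean accuracy (%) to a fixed chance level (%)."""
    n = len(scores)
    # t = (M - chance) / (SD / sqrt(n)), with the sample SD (n - 1 denominator)
    return (mean(scores) - chance) / (stdev(scores) / sqrt(n))

# Hypothetical data: item-level responses for one participant (1 = correct),
# and accuracy scores (%) for five participants on a 5-alternative test,
# where chance = 100 / 5 = 20%.
responses = [1, 0, 1, 1, 0]
scores = [38.0, 42.0, 31.0, 45.0, 35.0]

print(percent_correct(responses))            # 60.0
print(round(one_sample_t(scores, 20.0), 2))
```

With df = n - 1, the resulting t value would then be compared to a critical value (or converted to a p value), as in the t(38) tests reported for the 39 participants.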
Table 1 Percentage of correct responses, standard deviations and t tests as a function of passage for the Nelson–Denny and the CAAT

Test          Passage    M      SD      t
Nelson–Denny  1          26.28  13.99    2.81*
              2          33.80  16.00    5.41*
              3          44.10  28.35    5.31*
              4          45.60  21.50    7.45*
              5          33.80  20.08    4.31*
              6          60.50  21.76   11.63*
              7          32.80  20.25    3.95*
CAAT          1          59.87  17.32   12.57*
              2          54.50  26.82    6.88*
              3          28.85  34.19    0.70
              4          31.15  16.27    2.36*
              5          43.60  25.08    4.63*
              6          39.00  20.49    4.29*
              7          37.18  22.12    3.44*
              8          33.92  17.93    3.11*
              9          59.31  18.55   11.55*

Asterisk indicates passages that are answered significantly better than chance in the Nelson–Denny and CAAT
Descriptive statistics for the WIAT-III

While multiple-choice questions have a statistical chance level, open-ended questions do not. Comparisons were therefore more challenging, and we opted to discuss only descriptive statistics. Accuracy for the WIAT-III without the passages was 17.64% (SD = 5.66). Descriptive data are presented for each passage in Table 4 and for individual questions in Table 5. As can be seen in Table 4, four of the passages have accuracy below 10%, while one of the passages (passage #2) has an accuracy rate of 64.10%. As for individual questions (Table 5), 17 of the questions were not answered accurately by any of the participants (0%). However, six questions were answered accurately over 60% of the time.
Discussion

The goal of this study was to examine passage independence within three standardized reading comprehension tests: the Nelson–Denny, the CAAT, and the WIAT-III. This was done by giving participants the questions from each test without their related passages. These tests are among the most popular tools used to assess reading comprehension. They require their users to read a passage and then answer questions relating to that passage. In order for these tests to be valid, it should be impossible, or at least difficult, to answer these questions correctly
Table 2 Percentage of correct responses, standard deviations and t tests as a function of questions for the Nelson–Denny

Passage  Question  M      SD      t
1        1         10.00  30.70   -1.98
1        2          8.00  27.00   -2.85*
1        3         15.00  36.60   -0.79
1        4         10.00  30.70   -1.98
1        5         31.00  46.80    1.44
1        6         51.00  50.60    3.86*
1        7         36.00  48.60    2.04*
1        8         54.00  50.50    4.18*
2        1         26.00  44.20    0.80
2        2         15.00  36.60   -0.79
2        3         74.00  44.20    7.67*
2        4         31.00  46.80    1.44
2        5         23.00  42.70    0.45
3        1         49.00  50.60    3.54*
3        2         38.00  49.30    2.34*
3        3         38.00  49.30    2.34*
3        4         44.00  50.20    2.93*
3        5         51.00  50.60    3.86*
4        1          8.00  27.00   -2.85*
4        2         18.00  38.90   -0.33
4        3         54.00  50.50    4.86*
4        4         67.00  47.80    6.10*
4        5         69.00  46.80    6.58*
5        1         18.00  38.90   -0.33
5        2         36.00  48.60    2.04*
5        3         36.00  48.60    2.04*
5        4         26.00  44.20    0.80
5        5         54.00  50.50    4.19*
6        1         77.00  42.70    8.33*
6        2         77.00  42.70    8.33*
6        3         61.00  49.50    5.04*
6        4         10.00  30.70   -1.98
6        5         79.00  40.90    9.08*
7        1         51.00  50.60    3.86*
7        2         23.00  42.70    0.45
7        3         36.00  48.60    2.04*
7        4         31.00  46.80    1.44
7        5         23.00  42.70    0.45

Asterisk indicates questions that are answered significantly better than chance for the Nelson–Denny
without the help of the related passage. However, passage independence has been a recurring problem since its discovery (Coleman et al., 2010; Daneman & Merikle, 1996; Eurich, 1931; Katz et al., 1990; Keenan & Betjemann, 2006; Lifson et al.,
Table 3 Percentage of correct responses, standard deviations and t tests as a function of questions for the CAAT

Passage  Question  M      SD      t
1        1         31.00  46.80    0.77
1        2         77.00  42.70    7.60*
1        3         41.00  49.80    2.01
1        4         28.00  45.60    0.44
1        5         90.00  30.70   13.16*
1        6         67.00  47.80    5.45*
1        7         92.00  27.00   15.56*
1        8         51.00  50.60    3.24*
2        1         30.80  46.76    0.77
2        2         84.60  36.55   10.19*
2        3         56.40  50.24    3.91*
2        4         46.20  50.50    2.62*
3        1         10.30  30.74   -3.00*
3        2          5.10  22.35   -5.55*
3        3         20.50  40.91   -0.69
3        4         59.00  49.83    4.26*
4        1         30.80  46.76    0.77
4        2         46.20  50.50    2.62*
4        3         15.40  36.55   -1.64
4        4          0      0       –
4        5         56.40  50.24    3.91*
4        6         33.30  47.76    1.09
5        1         35.90  48.60    1.40
5        2         41.00  49.83    2.01
5        3         35.90  48.60    1.40
5        4         87.20  33.87   11.47*
5        5         17.90  38.88   -1.13
6        1         12.80  33.87   -2.25*
6        2         56.40  50.24    3.91*
6        3          2.60  16.01   -8.75*
6        4         61.50  49.29    4.63*
6        5         61.50  49.29    4.63*
7        1         43.60  50.24    2.31*
7        2          7.70  27.00   -4.00*
7        3         38.50  49.29    1.71
7        4         59.00  49.83    4.26*
8        1         59.00  49.83    4.26*
8        2         53.80  50.50    3.57*
8        3         23.10  42.68   -0.28
8        4         43.60  50.24    2.31*
8        5         23.10  42.68   -0.28
8        6          7.70  27.00   -4.00*
8        7         17.90  38.88   -1.13
9        1         10.30  30.74   -3.00*
9        2         79.50  40.91    8.32*
9        3         61.50  49.29    4.63*
9        4         48.70  50.64    2.93*
9        5         74.40  44.24    7.99*
9        6         56.40  50.24    3.91*
9        7         84.60  36.55   10.19*

Asterisk indicates questions that are answered significantly better than chance for the CAAT

Table 4 Percentage of correct responses and standard deviations as a function of passage for the WIAT-III

Passage  M      SD
1         8.33  11.23
2        64.10  20.35
3        11.22  11.12
4         5.56   9.62
5         6.59   9.33
6        16.45   9.84
7        17.79  12.46
8        25.32  12.82
9         3.63   5.34
10        7.37   9.82
11       28.53  15.95
12       24.10  14.82
13       16.35  18.17
1984; Preston, 1964). If it is possible to answer the questions correctly without the use of the related passage, then the test does not accurately measure reading comprehension, which causes problems especially in diagnosing learning disabilities.

The results of this study suggest the presence of passage independence in all three tests. In fact, participants successfully answered questions without access to the related passage more often than by chance for the Nelson–Denny and the CAAT, which use multiple-choice questions, indicating that the related passages are not always necessary. Furthermore, the current study also examined the passage independence of an open-ended reading comprehension test, the WIAT-III.

Although the Nelson–Denny is a widely used reading comprehension test in North America, results of the current study replicate those of Coleman et al. (2010), indicating that many of the test questions can be answered successfully without reading their related passage. Without the passage, participants correctly answered, on average, 38.19% of the questions, when chance level was 20%. These results are similar to the 43.8–46.6% correct responses without the passage that Coleman et al. (2010) obtained. When examining the results on
Table 5 Percentage of correct responses and standard deviations as a function of questions for the WIAT-III

Passage  Question  M      SD
1        1          0      0
1        2          7.69  22.70
1        3          1.28   8.01
1        4         24.36  27.80
2        1         44.87  48.39
2        2         69.23  46.76
2        3         32.05  45.14
2        4         84.62  36.55
2        5         89.74  30.74
3        1         17.95  29.22
3        2          2.56  11.17
3        3         19.23  27.18
3        4          2.56  16.01
3        5          7.69  27.00
3        6          6.41  23.45
3        7         15.38  23.38
3        8         17.95  38.88
4        1          7.69  27.00
4        2          2.56  16.01
4        3          0      0
4        4         12.82  24.92
4        5          1.28   8.01
4        6          8.97  27.80
5        1          0      0
5        2          0      0
5        3          0      0
5        4          2.56  11.17
5        5          0      0
5        6         10.26  28.52
5        7         33.33  41.89
6        1          5.13  22.35
6        2          0      0
6        3          7.69  27.00
6        4          3.85  17.71
6        5          2.56  16.01
6        6         79.49  40.91
7        1          7.69  27.00
7        2         37.18  42.49
7        3          8.97  19.44
7        4         51.28  50.64
7        5          6.41  16.93
7        6          5.13  22.35
7        7         25.64  44.25
7        8          0      0
8        1         20.51  27.43
8        2          0      0
8        3         47.44  37.96
8        4         61.54  49.29
8        5          5.13  15.37
8        6          0      0
8        7         43.59  50.24
8        8         24.36  27.80
9        1         14.10  25.52
9        2          3.85  13.50
9        3          0      0
9        4          0      0
9        5          0      0
9        6          3.85  13.50
10       1         11.54  21.34
10       2          7.69  24.44
10       3         12.82  31.87
10       4          8.97  19.44
10       5          2.56  16.01
10       6          0      0
10       7         15.38  36.55
10       8          0      0
11       1         56.41  50.24
11       2         33.33  26.49
11       3         21.79  34.02
11       4         42.31  33.52
11       5         12.82  22.12
11       6          5.13  19.18
11       7         38.46  49.29
11       8         17.95  38.88
12       1          0      0
12       2         21.79  27.61
12       3         19.23  29.50
12       4         12.82  27.43
12       5         66.67  47.76
13       1         17.95  35.33
13       2          0      0
13       3         29.49  44.01
13       4         17.95  29.22
each of the seven passages, participants were able to correctly answer questions from all of the passages significantly better than chance. As for individual questions, 55% could be answered successfully at above chance level. Thus, although the Nelson–Denny is widely used to assess reading comprehension abilities, more than half of the test can be answered correctly without reading the passages, which puts its validity into question.

Since the Nelson–Denny contained many questions that were not passage dependent, the current study also examined passage independence in the CAAT. Without the related passages, participants answered correctly on average 43.54% of the time, when chance level was 25%. In addition, when comparing individual passages and question scores, the CAAT remained comparable. In fact, of the nine passages in the CAAT, only one was not answered significantly better than by chance. As for individual questions, 52% were answered correctly at above chance level, slightly lower than the Nelson–Denny's 55%. This means that the CAAT suffers from validity issues similar to those of the Nelson–Denny.

A further goal of the current study was to examine whether tests that use open-ended questions could be a better alternative, because these questions remove process of elimination as a strategy and potentially make guessing more difficult. We tested the WIAT-III, which uses open-ended questions rather than the typical multiple-choice format. The complexity of this strategy is that results obtained without the passages cannot be compared to a statistical chance level; as such, we have presented descriptive statistics and discuss them here. When examining the total score for the WIAT-III, participants were able to answer 18% of the questions without the related passages. As for individual passages, some were more difficult to answer without the related text than others.
In effect, four passages were at levels below 10%. However, some passages were particularly problematic. For instance, the second passage, which is about "pet day", was answered accurately 64% of the time without access to the passage. In effect, when examining the potential answers, participants with some knowledge about what animals can be pets were able to answer some of the questions accurately. Indeed, two of the questions from this passage were answered accurately over 80% of the time. As an example, one of the questions states: "Martin brought his cat in a cage. Jeff carried his cat in his arms. Which person did not follow the rules?". The answer was Jeff, who carried his pet in his arms; it could seem logical that this was the right answer based solely on prior knowledge. It should, however, be noted that 17 questions were at 0%, which suggests that they are passage dependent. The fifth passage is promising, since four of its questions were at 0% and the remaining questions were at 3, 10 and 33% accuracy. Thus, this test is not perfect, but it has potential for improvement.

These three tests are all used to evaluate reading comprehension in children and/or adults, and the scores obtained can have serious repercussions on academic progression. If a child obtains a higher score than they deserve, it could bring about frustration and lack of stimulation, and negatively impact their self-esteem (Keenan & Betjemann, 2006; Sparfeld, Kimmel, Löwenkamp, Steingräber, & Rost, 2012). They could also feel inferior because the level of difficulty surpasses their own
abilities. As for adults who take these tests as part of psychological evaluations, a similar issue arises: receiving inaccurate scores could have severe consequences.

Being able to answer reading comprehension questions without access to the related passages could be due to prior knowledge (Katz et al., 1990; Keenan & Betjemann, 2006; Lifson et al., 1984; Sparfeld et al., 2012). However, it is also possible that this phenomenon is tied to the question-and-answer format these tests use, such as multiple-choice or open-ended. In the multiple-choice format, certain individuals may be better at identifying the answer that makes the most sense by eliminating less likely alternatives. While this could be a strategy that individuals use, the results of this study suggest that passage independence in reading comprehension is not simply due to a problem in question formatting. In fact, the WIAT-III uses open-ended questions, and participants were still able to answer the questions for some passages at very high levels of accuracy. Nevertheless, the current study cannot determine whether the issue is due to prior knowledge, question format, a combination of these elements or other cognitive components.

About half of the questions were answered correctly above chance level for the multiple-choice tests, and accuracy was 18% for the open-ended questions (Canadian Test Center, 1992; Cummins & Porter, 1981; Wechsler, 2009). This observation suggests that not all of the questions are independent of their related passages, meaning these tests retain some validity. Although passage independence exists within each of these tests, the results of this study suggest that the issue is not sufficient to justify a total elimination of these questions.
Since the majority of the questions that make up the Nelson–Denny, the CAAT, and the WIAT-III are dependent upon their related passages, it would be more advantageous to modify the problematic questions than to eliminate them. The average scores for the WIAT-III (open-ended questions) were the lowest of the three tests, which gives a good indication of where to start with the modifications. Using open-ended questions, as well as creating fictitious passages that do not rely on prior knowledge, could be likely solutions to the issue of passage independence.

In conclusion, the Nelson–Denny, the CAAT, and the WIAT-III all contain questions that are independent of their related passages. However, they also contain a fair number of questions that are dependent upon their passages. That said, we cannot ignore or underestimate the importance of passage independence within standardized tests, especially when the results of these tests can seriously affect an individual's future. We recommend modifying the problematic questions and thoroughly reviewing all standardized tests to ensure their validity.

Acknowledgements This research was supported by a Canada Foundation for Innovation infrastructure grant and an NSERC Discovery grant to Annie Roy-Charland. We thank Caroline Comeau for her assistance in running participants and data coding.
References

Aiken, L. R., & Groth-Marnat, G. (2006). Psychological testing and assessment (12th ed.). New York: Pearson.
Anastasi, A., & Urbina, S. (1997). Psychological testing (7th ed.). New York: Pearson.
Canadian Test Center. (1992). Canadian adult achievement test, technical bulletin. Markham: Canadian Test Center Inc.
Coleman, C., Lindstrom, J., Nelson, J., Lindstrom, W., & Gregg, K. N. (2010). Passageless comprehension on the Nelson–Denny Reading Test: Well above chance for university students. Journal of Learning Disabilities, 43(3), 244–249.
Creaser, J., Jacobs, M., Zaccaria, L., & Carsello, C. (1970). Effects of shortened time limits on the Nelson–Denny Reading Test. Journal of Reading, 14(3), 167–170.
Cummins, R., & Porter (1981). Test review: The Nelson–Denny Reading Test (forms E and F). Journal of Reading, 25(1), 54–59.
Cummins, J., & Swain, M. (2014). Bilingualism in education: Aspects of theory, research and practice. Routledge.
Daneman, M., & Merikle, P. M. (1996). Working memory and language comprehension: A meta-analysis. Psychonomic Bulletin & Review, 3(4), 422–433. doi:10.3758/BF03214546.
Eurich, A. C. (1931). Four types of examinations compared and evaluated. Journal of Educational Psychology, 22(4), 268–278. doi:10.1037/h0075460.
Flanagan, D. P., & Harrison, P. L. (Eds.). (2012). Contemporary intellectual assessment: Theories, tests, and issues. New York: Guilford Press.
Fowler, B., & Kroll, B. M. (1978). Verbal skills as factors in the passageless validation of reading comprehension tests. Perceptual and Motor Skills, 47, 335–338.
Heilbrun, K. (1997). Prediction versus management models relevant to risk assessment: The importance of legal decision-making context. Law and Human Behavior, 21(4), 347–359. doi:10.1023/A:1024851017947.
Katz, S., Lautenschlager, G. J., Blackburn, A. B., & Harris, F. H. (1990). Answering reading comprehension items without passages on the SAT. Psychological Science, 1(2), 122–127. doi:10.1111/j.1467-9280.1990.tb00080.x.
Keenan, J. M., & Betjemann, R. S. (2006). Comprehending the Gray Oral Reading Test without reading it: Why comprehension tests should not include passage-independent items. Scientific Studies of Reading, 10(4), 363–380. doi:10.1207/s1532799xssr1004_2.
Larcker, D. F., & Lessig, V. P. (1980). Perceived usefulness of information: A psychometric examination. Decision Sciences, 11(1), 121–134. doi:10.1111/j.1540-5915.1980.tb01130.x.
Lewandowski, L. J., Codding, R. S., Kleinmann, A. E., & Tucker, K. L. (2003). Assessment of reading rate in postsecondary students. Journal of Psychoeducational Assessment, 21(2), 134–144. doi:10.1177/073428290302100202.
Lifson, S., Scruggs, T. E., & Bennion, K. (1984). Passage independence in reading achievement tests: A follow-up. Perceptual and Motor Skills, 58(3), 945–946. doi:10.2466/pms.1984.58.3.945.
Lim, Y. K., Pangam, A., Periyasami, S., & Aneja, S. (2006, October). Comparative analysis of high- and low-fidelity prototypes for more valid usability evaluations of mobile devices. In Proceedings of the 4th Nordic conference on human-computer interaction: Changing roles (pp. 291–300). ACM. doi:10.1145/1182475.1182506.
McClelland, D. C. (1977). Testing for competence rather than intelligence. American Psychologist, 28(1), 1–14. doi:10.1037/h0034092.
Messick, S. (1996). Validity and washback in language testing. Language Testing, 13(3), 241–256. doi:10.1177/026553229601300302.
Neukrug, E. S., & Fawcett, R. C. (2006). Essentials of testing and assessment: A practical guide for counselors, social workers, therapists, and others. Belmont: Brooks/Cole.
Neukrug, E. S., & Fawcett, R. C. (2010). Essentials of testing and assessment: A practical guide for counselors, social workers, and psychologists (2nd ed.). Belmont, CA: Brooks/Cole.
Preston, R. C. (1964). Ability of students to identify correct responses before reading. The Journal of Educational Research, 58(4), 181–183. doi:10.1080/00220671.1964.10883203.
Rothney, J. W., & Bear, R. M. (1938). An evaluation of visual factors in reading. Hanover: Dartmouth College.
Sparfeld, J. R., Kimmel, R., Löwenkamp, L., Steingräber, A., & Rost, D. H. (2012). Not read, but nevertheless solved? Three experiments on PIRLS multiple choice reading comprehension test items. Educational Assessment, 17(4), 214–232. doi:10.1080/10627197.2012.735921.
Tamm, L., Epstein, J. N., Denton, C. A., Vaughn, A. J., Peugh, J., & Willcutt, E. G. (2014). Reaction time variability associated with reading skills in poor readers with ADHD. Journal of the International Neuropsychological Society, 20(3), 292–301. doi:10.1017/S1355617713001495.
Tian, S. (2006). Passage dependency of reading comprehension items in the GEPT and the TOEFL. The Reading Matrix, 6, 66–84.
Thorndike, R. L., Hagen, E. P., & Sattler, J. M. (1986). The Stanford-Binet intelligence scale: Guide for administering and scoring. Riverside Publishing Co.
Viswesvaran, C., & Ones, D. S. (1995). Theory testing: Combining psychometric meta-analysis and structural equations modeling. Personnel Psychology, 48(4), 865–885. doi:10.1111/j.1744-6570.1995.tb01784.x.
Wechsler, D. (2009). Wechsler individual achievement test (3rd ed.): Manual. San Antonio: Psychological Corporation.
Willms, J. D. (1992). Monitoring school performance: A guide for educators. Bristol, PA: Taylor and Francis Inc.