Original article

Reliability of DMSA for the diagnosis of renal parenchymal abnormality in children

Jonathan C. Craig1,2,6, Les Irwig1,2, Melissa Ford3, Narelle S. Willis1, Robert B. Howman-Giles3,6, Roger F. Uren3, Monica A. Rossleigh4, Simon Grunewald5

1 Centre for Kidney Research, The New Children’s Hospital, Sydney, Australia
2 Department of Public Health and Community Medicine, University of Sydney, Australia
3 Department of Nuclear Medicine, The New Children’s Hospital, Sydney, Australia
4 Department of Nuclear Medicine, Sydney Children’s Hospital, Sydney, Australia
5 Department of Nuclear Medicine and Diagnostic Ultrasound, Westmead Hospital, Sydney, Australia
6 Department of Paediatrics and Child Health, University of Sydney, Sydney, Australia
Received 28 April and in revised form 9 July 2000 / Published online: 22 September 2000 © Springer-Verlag 2000
Abstract. The objective of this study was to evaluate the variability of technetium-99m dimercaptosuccinic acid (DMSA) scintigraphy interpretation by four nuclear medicine physicians for the diagnosis of renal parenchymal abnormality in children, and to compare variability among three different DMSA methods in clinical use: planar alone, single-photon emission tomography (SPET) alone, and planar with SPET. One hundred consecutive DMSA studies were independently interpreted three times, in random order, by four participating nuclear medicine specialists from different departments. All scans were classified by the presence or absence of renal parenchymal abnormality using the modified four-level grading system of Goldraich. Indices of agreement were the percentage of agreement and the kappa statistic. Disagreement was analysed at the level of children, kidneys and kidney zones (three zones per kidney). Using patients as the unit of analysis, agreement for planar and planar with SPET methods was 87%–88% (kappa 0.74) for the normal-abnormal scan classification. The corresponding agreement value for the SPET alone method was 78% (kappa 0.56). Similarly, substantial disagreement (disagreement ≥2 categories) occurred in 2.5% and 1.3% of comparisons between observers for planar alone and planar with SPET, respectively, but in 5.2% of comparisons for SPET alone. These results did not vary appreciably whether interpretation of patients, kidneys or kidney zones was compared. It is concluded that the four experienced nuclear medicine physicians showed substantial agreement in the interpretation of planar alone and planar with SPET DMSA scintigraphic images. Interpretation of SPET DMSA images, without planar images, was significantly more variable than interpretation using the two other methods, disagreement occurring in more than 20% of comparisons. SPET DMSA scintigraphy, when used without planar images, does not provide a firm basis for clinical decision making in the care of children who may have renal damage. There is no apparent benefit of reduced variability from the extra provision of SPET data to nuclear medicine physicians who already have planar images.

Keywords: DMSA – Kidney – Children – Interobserver variability

Eur J Nucl Med (2000) 27:1610–1616
DOI 10.1007/s002590000349

Correspondence to: J.C. Craig, Centre for Kidney Research, The New Children’s Hospital, PO Box 3515, Parramatta, NSW 2124, Australia
Introduction

Technetium-99m dimercaptosuccinic acid (DMSA) scintigraphy has become the reference standard for assessing renal cortical damage in children and adults and is now a widely used diagnostic test [1, 2]. Two groups of comparative studies have established DMSA scintigraphy as the reference standard for renal cortical damage, particularly post-infectious renal damage: studies comparing the test performance of DMSA with the gold standard, histopathology [3, 4], and studies comparing the results of DMSA and other imaging modalities such as renal sonography and intravenous pyelography [5, 6].

There remain a number of important unanswered questions concerning the applicability of DMSA scintigraphy to patient care. First, the clinical significance of DMSA-detected cortical defects is not known, because the link between imaging abnormalities and patient-centred outcomes like hypertension and renal insufficiency
has yet to be established. Second, the most accurate modality of DMSA cortical scintigraphy is unclear. DMSA scintigraphy is not a homogeneous technique, but encompasses planar high-resolution collimated DMSA scintigraphy, pinhole collimated DMSA scintigraphy, and single-photon emission tomography (SPET), each used alone or in combination. Third, the extent of physician variability in the interpretation of DMSA images has not been well established. This study was designed to answer this last question.

Differences in the interpretation of qualitative or semi-quantitative diagnostic tests such as medical imaging, clinical examination and histopathology by trained personnel are widely assumed to be uncommon, trivial and clinically unimportant. Yet many diagnostic tests have either not been formally evaluated in this manner or have been shown to be very observer-dependent, with a surprisingly high frequency of disagreement in interpretation by different experts. For example, in a recent study of the variability in ten radiologists’ interpretations of 150 mammograms for the diagnosis of breast cancer, the median agreement was 78% [7]. That is, in 22% of mammograms the trained radiologists disagreed, with one interpreting the mammogram as normal and another suggesting further follow-up. The implications of these variations in diagnostic test interpretation for clinical decision making and patient care are clear.

We have previously shown that two nuclear medicine physicians working within the same department disagreed in their interpretation of DMSA images as normal or abnormal 14% of the time [8]. Agreement beyond that expected by chance alone, or kappa, was 0.69, which is classified as substantial agreement [9]. Our previous study did not compare interpretation by nuclear medicine physicians from different departments and did not compare interpretation of non-planar DMSA, nor could we find these comparisons in the literature.
Our current study was designed to determine the frequency and degree of variability in interpretation of DMSA among four nuclear medicine physicians from different departments using the different DMSA modalities in clinical use: planar, SPET, and planar with SPET.
Materials and methods

Case selection. One hundred DMSA studies were obtained from August 1996 to March 1997 from consecutive children referred to a single paediatric nuclear medicine department for cortical scintigraphy. Four hard-copy sets of scans were made from the digitalised images to replicate different clinical practices: planar alone, SPET alone and planar with SPET. All three groups of scans were sent to all four nuclear medicine physicians for reporting, separated in time by no less than 28 days, to ensure that each group was interpreted blinded to the results of the other two groups of DMSA scans. Within each group of 100, the order of scans presented to each physician was randomly assigned. Patient details were removed from all scans, and no clinical details were provided.
Participating nuclear medicine physicians. Four physicians were invited to participate: two from the same department, the third from an adult and paediatric nuclear medicine department and the fourth from the adjacent adult general hospital. All are specialist nuclear medicine physicians with 10–20 years of clinical experience reporting DMSA scintigraphy. In 1998, three of the four physicians had interpreted 285–330 DMSA studies, equivalent to 10%–18% of their routine clinical practice. The fourth physician had interpreted about 150 DMSA studies per year from 1981 until 1995, and ten scans subsequently, because of the transfer of paediatric nuclear medicine services to a new children’s hospital.

Data acquisition. All four physicians independently interpreted all DMSA studies between November 1997 and April 1998. It was not feasible to blind the nuclear medicine physicians to the objectives and design of the study. The technical aspects of DMSA scintigraphy did not change during the period. 99mTc-DMSA (40–120 MBq, adjusted for body weight) was injected intravenously. Planar anterior, posterior, and right and left posterior oblique images of both kidneys were obtained for 6 min each (approximately 200 kcounts), 3 h post injection, on a Siemens Multispect (MS3) triple-head gamma camera using low-energy high-resolution collimators. A SPET study was then performed, with acquisition parameters depending on the size of the child. If the infant was less than 6 months of age or less than 10 kg, a circular-orbit SPET was acquired using a paediatric palette attached to the normal scanning bed. The palette allows the camera heads to be contracted to their innermost position, thereby ensuring maximum resolution. SPET was acquired for 25 s per view over 120° with a zoom of 2. For all other children an elliptical-orbit SPET was performed on the normal scanning bed for 20 s per view over 120° with a zoom of 1.23.
All SPET data were reconstructed using a Butterworth filter with a cut-off of 0.3 and a power factor of 5. Reconstruction also corrected for the natural lie of the kidneys by reconstructing the image in oblique re-orientation in both sagittal and coronal planes.

A DMSA study was defined as abnormal if there was decreased or absent uptake of tracer in the renal cortex causing distortion or indentation of the normal renal outline. The defects were localised to three regions of the kidney – mid-zone, upper pole or lower pole – or were reported as diffuse if generalised photon deficiency was evident throughout the kidney. The studies were also graded according to the severity of the abnormality using a modified system of Goldraich [10]: grade I, no more than two cortical defects; grade II, more than two cortical defects but remnant areas of normal renal parenchyma; grade III, diffuse reduction in uptake of DMSA throughout the whole kidney with or without focal defects; grade IV, shrunken kidney contributing <10% of overall renal functional mass. All studies were displayed on radiographic viewing boxes. All physicians recorded their observations using an identical checklist, reviewed with each physician before the study commenced. They were asked to interpret the images as they would in their usual clinical practice, and no external time constraints were imposed.

Data analysis. Inter-observer variability in the interpretations of the 100 DMSA studies was assessed from the readings of all participants. Six comparisons were possible for each set of DMSA studies: observers 1 and 2, 1 and 3, 1 and 4, 2 and 3, 2 and 4, and 3 and 4. The measures of variability used were the percentage of agreement [11] (the frequency of concordant classifications divided by the total number of subjects) and the kappa statistic [12]. The kappa statistic is the preferred measure of agreement because it measures agreement beyond that expected by chance alone. When
kappa equals 1, agreement is perfect; when it equals 0, agreement is no better than would be expected by chance. Kappa values greater than 0.81 represent almost perfect agreement; those between 0.61 and 0.80, substantial agreement; those between 0.41 and 0.60, moderate agreement; and those less than 0.41, fair agreement.

The agreement for each DMSA method (planar, SPET, planar with SPET) was summarised using the mean weighted percentage of agreement of the six possible comparisons (observers 1 and 2, 1 and 3, 1 and 4, 2 and 3, 2 and 4, 3 and 4) and an overall kappa for each method, calculated using the method of Landis and Koch [13] and STATA software. Percentage of agreement was weighted for the number of integers removed from perfect agreement in an r × c table format. With this method, smaller differences between observers (e.g. between grades 0 and I) have less influence on the computed kappa than do larger differences (e.g. between grades 0 and III) [12]. As an additional measure of substantial disagreement, the mean frequency of disagreement by ≥2 categories was calculated for each DMSA group.

Variability was assessed at three levels: patient, kidney and kidney zone (upper pole, mid zone, lower pole and diffuse). Of the three options, disagreement in the interpretation of patient studies was regarded as the most clinically important. That is, if a difference in interpretation of the side or location of a cortical defect were to occur, the management implications would be less than if a DMSA study were to be reported as normal by one physician and abnormal by another. Patients were assigned to the group corresponding to the maximal grade of abnormality for either kidney, and were also classified by the sum of the individual kidney grades for both kidneys. Because other clinical decisions are based upon the DMSA result of individual kidneys, variability of interpretation for individual kidneys was also measured. This would be relevant if, for example, unilateral ureteric re-implantation were to be considered because of the development of a new DMSA scan defect. Disagreement in kidney zone interpretation, corresponding to individual lesions, was also analysed.

Potential sources of variability. On completion of the study, all four participants met to discuss cases where DMSA scan interpretation differed and to identify explanations for variations in reporting practices.
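As an illustration of the agreement indices described in the data analysis, the following sketch computes the percentage of agreement and an unweighted Cohen's kappa for one observer pair. It is a simplified, hypothetical example (the study itself used a weighted percentage of agreement and an overall kappa across all six observer pairs, after Landis and Koch); the gradings shown are invented, not study data.

```python
from collections import Counter

def agreement_and_kappa(ratings_a, ratings_b, categories):
    """Percentage of agreement and unweighted Cohen's kappa
    for two observers grading the same studies."""
    assert len(ratings_a) == len(ratings_b)
    n = len(ratings_a)
    # Observed agreement: fraction of studies given the same grade.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the observers' marginal frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)
    kappa = (p_o - p_e) / (1 - p_e)
    return p_o, kappa

# Hypothetical gradings of 10 studies on the 0-IV Goldraich scale:
obs1 = [0, 0, 1, 2, 0, 3, 0, 1, 0, 4]
obs2 = [0, 1, 1, 2, 0, 3, 0, 0, 0, 4]
p_o, kappa = agreement_and_kappa(obs1, obs2, categories=[0, 1, 2, 3, 4])
# p_o = 0.8 (80% agreement); kappa ≈ 0.71 (substantial agreement)
```

A weighted kappa, as used in the study, would additionally down-weight one-grade discrepancies relative to larger ones; the unweighted form above treats all disagreements equally.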
Results

Frequency of DMSA abnormality

The mean proportion of the 100 patient and 200 kidney studies reported as abnormal was approximately 40% and 25%, respectively (Tables 1, 2).

Observer variability

Overall, variability in DMSA interpretation by the four observers was significantly different among the three DMSA groups: there was a 10% lower rate of agreement and a 0.1–0.2 lower kappa for SPET DMSA than for planar and planar with SPET (Tables 3, 4). On average, 12%–13% of planar and planar with SPET images were interpreted as normal by one physician and
Table 1. Distribution of DMSA interpretation in 100 patients, classified by observer and method, with the maximum grade of abnormality in either kidney defining the overall patient grade

Observer        Planar                    SPET                      Planar + SPET
            0    I   II  III  IV      0    I   II  III  IV      0    I   II  III  IV
1          62   26    5    6   1     49   27   15    8   1     52   31    6   10   1
2          65   22    2   10   1     63   29    4    3   1     60   26    3   10   1
3          58   26    7    8   1     65   21    6    6   1     62   21    7    9   1
4          60   21    6   12   1     53   31    5   10   1     54   29    4   12   1
Mean       61   23    5    9   1     58   27    7    7   1     57   27    5   10   1
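The patient-level grades in Table 1 follow the rule stated in the caption and in the Methods: the overall patient grade is the maximum grade of either kidney, with the sum of the two kidney grades used as an alternative classification. A minimal sketch with invented inputs (not study data):

```python
def patient_grade_max(left, right):
    """Overall patient grade: maximum Goldraich grade (0-4) of either kidney."""
    return max(left, right)

def patient_grade_sum(left, right):
    """Alternative patient classification: sum of the two kidney grades
    (the study tabulated sums of 0-6)."""
    return left + right

# Hypothetical patient: grade I left kidney, grade III right kidney.
max_grade = patient_grade_max(1, 3)   # 3
sum_grade = patient_grade_sum(1, 3)   # 4
```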
Table 2. Distribution of DMSA scintigraphy in 200 kidneys by observer and method

Observer         Planar                      SPET                       Planar + SPET
             0    I   II  III  IV       0    I   II  III  IV       0    I   II  III  IV
1          153   32    8    6   1     129   41   21    8   1     135   44    9    9   1
2          158   26    9    6   1     150   40    6    3   1     149   32    5   11   1
3          146   36    9    8   1     153   31    7    7   2     148   32   10    9   1
4          150   28    9   12   1     141   41    8   10   1     142   38    7   12   1

Mean (%)   152 (76)  30 (15)  9 (4.5)  8 (4)  1 (0.5)   143 (71.5)  38 (19)  11 (5.5)  7 (3.5)  1 (0.5)   144 (72)  37 (18.5)  8 (4)  10 (5)  1 (0.5)
Table 3. Agreement (%) for different DMSA methods for each grading system reported, for patients, kidneys and kidney zones (mean values of six comparisons)

Category                                  No.   Planar   SPET   Planar + SPET
Patients
  Normal/abnormal (0, 1)                  100     88       78        87
  Maximum grade (0, 1, 2, 3, 4)           100     95       95        95
  Sum of grades (0, 1, 2, 3, 4, 5, 6)     100     95       91        94
Kidneys
  Normal/abnormal (0, 1)                  200     91       85        89
  Grade of abnormality (0, 1, 2, 3, 4)    200     96       93        96
Kidney zones
  Upper                                   200     90       86        88
  Mid                                     200     95       92        94
  Lower                                   200     92       92        94
  Diffuse                                 200     97       98        98

Table 4. Overall kappa for different DMSA methods using different methods for grading renal parenchymal abnormality

Category                                  No.   Planar   SPET   Planar + SPET
Patients
  Normal/abnormal (0, 1)                  100    0.74     0.56       0.74
  Maximum grade (0, 1, 2, 3, 4)           100    0.66     0.45       0.67
  Sum of grades (0, 1, 2, 3, 4, 5, 6)     100    0.61     0.39       0.59
Kidneys
  Normal/abnormal (0, 1)                  200    0.74     0.62       0.71
  Grade of abnormality (0, 1, 2, 3, 4)    200    0.65     0.50       0.66
Kidney zones
  Upper                                   200    0.63     0.57       0.64
  Mid                                     200    0.60     0.47       0.57
  Lower                                   200    0.64     0.67       0.71
  Diffuse                                 200    0.69     0.51       0.78
Table 5. Frequency of substantial disagreement (≥2 categories) for each DMSA group

                 Patients (600 comparisons)     Kidneys (1200 comparisons)
                 Frequency   Percent*           Frequency   Percent*
Planar               15      2.5 (1.4–4.1)          21      1.8 (1.1–2.7)
SPET                 31      5.2 (3.5–7.3)          33      2.8 (1.9–3.8)
Planar + SPET         8      1.3 (0.6–2.6)           8      0.7 (0.3–1.3)

* 95% confidence limits in brackets
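The bracketed limits in Table 5 are 95% confidence intervals on simple binomial proportions (e.g. 15/600 = 2.5% for planar, patients). The paper does not state which interval method was used; as a sketch, a Wilson score interval reproduces the published planar figure closely:

```python
import math

def wilson_interval(successes, n, z=1.96):
    """95% Wilson score interval for a binomial proportion."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

# Planar, patients: 15 substantial disagreements in 600 comparisons.
lo, hi = wilson_interval(15, 600)
# lo, hi ≈ 0.015, 0.041 -- close to the published 1.4%-4.1%
```

An exact (Clopper-Pearson) interval would give slightly wider limits; either is defensible for counts of this size.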
abnormal by a second, or vice versa. For SPET alone this occurred in 22% of cases. Corresponding kappa values, or agreement beyond that expected by chance alone, were 0.74 and 0.56. Agreement was consistently higher for planar, and planar with SPET, than for SPET alone, irrespective of whether the images were classified by patient, kidney or kidney zone (Tables 3, 4). Comparing interpretation of patients between observers, substantial disagreement (≥2 categories of disagreement) occurred in 2.5% of cases for planar DMSA scintigraphy, 5.2% of cases for SPET alone and 1.3% of cases for planar with SPET. Similar substantial disagreement was observed when interpretation of kidneys was compared (Table 5).

Possible sources of variability

After completion of all reporting, all four nuclear medicine physicians met to review specific studies chosen as examples of disagreement and to comment on possible reasons for variability in reporting. Four sources of variability were identified. First, in some cases the study was
Fig. 1. A Example of disagreement over planar images. The left kidney shows focal areas of renal cortical thinning at the upper and lower poles (grade I changes), and the right kidney is normal (consensus interpretation). Some observers had initially reported that there was no normal renal parenchyma between the focal areas of abnormality (grade III changes). B Example of disagreement over SPET images. After review, all four observers agreed that the left kidney was diffusely abnormal with <10% differential function (grade IV) and that the right kidney had bipolar focal defects with normal renal parenchyma between the focal defects (grade I). Initially some observers had reported multiple focal defects (grade II)
technically suboptimal and the images were blurred. This occurred in children who moved while the image was being obtained. The allowance made for this poor image quality in the interpretation of perceived DMSA abnormality varied with the observer; for example, if there was blurring of the image due to patient motion, one observer remarked that he was less likely to interpret an area of relative photon deficiency as a cortical defect. Second, interpretation of study appearances differed when the perceived abnormality may have been due to a normal or exaggerated anatomical structure, such as the pelvicalyceal system. Third, in some cases the same abnormality was perceived differently. For example, one observer regarded renal parenchyma between focal defects as normal and another as abnormal, but both commented that they “saw what the other was seeing”. Fourth, one nuclear medicine physician saw a cortical defect which was missed by another, who then noticed the defect when the scan was re-shown, and vice versa. (See Figs. 1 and 2 for examples of agreement and disagreement in image interpretation.) All nuclear medicine physicians reported increased confidence in DMSA interpretation when provided with both planar and SPET images.
Fig. 2. A Example of perfect agreement in planar image interpretation. All observers reported the left kidney as normal and the right kidney as grade III abnormality (focal defects and diffuse parenchymal abnormality). B Example of perfect agreement in SPET
image interpretation. All observers reported the left kidney as normal and the right kidney as having focal defects involving the upper and lower poles with normal renal parenchyma elsewhere (grade I)
Discussion

This study has demonstrated a high level of inter-observer agreement in the diagnosis of renal parenchymal abnormality in children using planar DMSA and planar with SPET. The kappa value for the normal/abnormal dichotomy was 0.74, which represents substantial agreement. However, in about 12% of comparisons there were clinically important differences in interpretation, with one physician reporting a DMSA study as normal and another reporting the same study as abnormal. The majority of these cases of disagreement were within one category of abnormality, and in only 1%–2% of comparisons was there disagreement across two or more categories of abnormality. These conclusions are robust across the different methods by which the DMSA study results were classified: the sum of the abnormality for both kidneys, the maximum grade of abnormality for either kidney, individual kidneys and individual kidney zones.

There are important clinical and research implications of these findings. First, the management of children at risk of kidney damage is not changed appreciably by which nuclear medicine physician interprets the DMSA study. Clinicians, however, also need to be aware that, as with any other qualitative or semi-quantitative diagnostic test, there is some degree of variability in reporting practices and that misclassification of children will inevitably occur. Paediatric clinicians can be reassured that disagreement occurs infrequently with these two methods of DMSA scintigraphy and that substantial disagreement is a rare event. Second, planar alone and planar with SPET DMSA scintigraphy are valid and reproducible methods by which renal parenchymal abnormality can be diagnosed, and this is essential for clinical research. Urinary tract infection is a common and important health problem.
The validity of prospective cohort studies designed to observe the outcome of children with urinary tract infection, and of randomised controlled trials to assess the efficacy of therapeutic interventions, requires reproducible diagnosis of renal parenchymal abnormality, which is an essential baseline and outcome variable. Planar alone and planar with SPET DMSA scintigraphy provide the diagnostic tools for these studies.

This level of agreement compares favourably with other clinical, radiological and biochemical diagnostic tests. A survey of publications for 1997–1998, in which inter-rater variability of a number of diagnostic tests in children was analysed using kappa (search strategy available upon request), showed a wide variety of values, generally below that of DMSA scintigraphy: clinical signs of pneumonia, 0.3–0.7 [14]; clinical signs of adenoidal obstruction, 0.84–0.91 [15]; 24-h intra-oesophageal pH monitoring for gastro-oesophageal reflux, 0.32 [16]; and physical findings in sexual abuse, 0.15–0.39 [17].

Inter-rater agreement was appreciably lower for SPET alone compared with either planar alone or planar with
SPET DMSA. This conclusion was consistent across the different units used to classify renal parenchymal abnormality (patient, kidney and kidney zone) and for substantial disagreement. Typical kappa values for SPET DMSA scintigraphy were 0.55–0.60, which is classified as moderate agreement, with a percentage of agreement of about 80%. That is, in about 20% of cases one observer interpreted the scan as abnormal and another as normal. Substantial disagreement occurred in 5% of cases. This is a clear disadvantage for SPET alone relative to the other forms of DMSA scintigraphy available. Paediatric clinicians and nuclear medicine physicians need to balance this important problem of SPET DMSA scintigraphy against other factors like cost, sensitivity and specificity, time for imaging, and provision of useful information of prognostic and therapeutic value. On reproducibility criteria, SPET should only be performed together with planar DMSA scintigraphy.

Some reasons were identified why there was more disagreement with SPET-alone interpretation than with planar alone and planar with SPET. Because SPET provides many more images than planar imaging, there are many more possibilities for an observer to interpret an area of kidney as abnormal. Areas are also magnified compared with the overall kidney size. For every kidney, SPET provides more detail and more areas of possible abnormality. If uncertain, some observers were more likely to interpret the SPET image as normal and others as abnormal. When also provided with planar images, observers generally only reported a possible abnormality as abnormal if both the SPET and planar images were abnormal, thereby reducing inter-observer variability.

Does SPET have incremental value over planar DMSA images alone? These data do not suggest that the extra provision of SPET DMSA images to nuclear medicine physicians who already have planar images results in improved agreement.
One exception may be a trend towards a reduction in the frequency of substantial disagreement when SPET is added to planar images. Otherwise, the extra costs in time and technical equipment do not result in less variability in interpretation, even though more kidney detail and information are provided. If SPET DMSA is of extra value, justification for its continued use must be found elsewhere.

The inter-rater agreement of planar DMSA scintigraphy in this study was almost identical to that which we have reported previously. In our earlier study we analysed agreement only between two observers working within the same department, and we were cautious in generalising the observed high level of agreement of DMSA interpretation from within to across nuclear medicine departments. In this study, the two original nuclear medicine physicians again participated, but we also included two nuclear medicine physicians from different departments, including a predominantly adult nuclear medicine physician. Because there was no apparent “departmental” effect, we could be reasonably confident in
generalising these results to other between-observer and across-department comparisons. Whether there is the same level of agreement between nuclear medicine physicians from different countries and with less DMSA scintigraphy experience remains unclear.

In conclusion, four experienced nuclear medicine physicians showed excellent agreement in the interpretation of planar alone and planar with SPET DMSA scintigraphic images. These methods provide a firm basis for clinical decision making and clinical research directed at the care of children at risk of renal damage. Interpretation of SPET images, without planar images, was significantly more variable than interpretation with the two other methods used, with disagreement occurring in more than 20% of comparisons. SPET DMSA scintigraphy, when used without planar images, does not provide a firm basis for clinical decision making. There is no apparent benefit of reduced variability from the extra provision of SPET data to nuclear medicine physicians who already have planar images.

Acknowledgements. Thanks are expressed to Petra Macaskill for providing assistance with calculating overall kappa values.
References

1. Craig JC, Irwig LM, Knight JF, Roy LP. Trends in the health burden due to urine infection in Australian children. J Paediatr Child Health 1997; 33: 434–438.
2. Rushton HG, Majd M. Dimercaptosuccinic acid renal scintigraphy for the evaluation of pyelonephritis and scarring: a review of experimental and clinical studies. J Urol 1992; 148: 1726–1732.
3. Rushton HG, Majd M, Chandra R, Yim D. Evaluation of 99mtechnetium-dimercaptosuccinic acid renal scans in experimental acute pyelonephritis in piglets. J Urol 1988; 140: 1169–1174.
4. Rossleigh MA, Farnsworth RH, Leighton DM, Yong JLC, Rose M, Christian CL. Technetium-99m dimercaptosuccinic acid scintigraphy studies of renal cortical scarring and renal length. J Nucl Med 1998; 39: 1280–1285.
5. Goldraich NP, Goldraich IH. Update on dimercaptosuccinic acid renal scanning in children with urinary tract infection. Pediatr Nephrol 1995; 9: 221–226.
6. Mackenzie JR. A review of renal scarring in children. Nucl Med Commun 1996; 17: 176–190.
7. Elmore JG, Wells CK, Lee CH, Howard DH, Feinstein AR. Variability in radiologists’ interpretations of mammograms. N Engl J Med 1994; 331: 1493–1499.
8. Craig JC, Irwig LM, Howman-Giles R, Uren R, Bernard E, Knight JF, Sureshkumar P, Roy LP. Variability in the interpretation of DMSA scintigraphy after urine infection. J Nucl Med 1998; 39: 1428–1432.
9. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977; 33: 159–174.
10. Goldraich NP, Goldraich IH, Anselmi OE, Ramos OL. Reflux nephropathy: the clinical picture in South Brazilian children. Contrib Nephrol 1984; 39: 52–67.
11. Fleiss JL. Statistical methods for rates and proportions, 2nd edn. New York: Wiley, 1981.
12. Kramer MS, Feinstein AR. Clinical biostatistics. LIV. The biostatistics of concordance. Clin Pharmacol Ther 1981; 29: 111–123.
13. Landis JR, Koch GG. An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers. Biometrics 1977; 33: 363–374.
14. Margolis P, Gadomski A. Does this infant have pneumonia? JAMA 1998; 279: 308–313.
15. Paradise JL, Bernard BS, Colborn DK, Janosky JE. Assessment of adenoidal obstruction in children: clinical signs versus roentgenographic findings. Pediatrics 1998; 101: 979–986.
16. Mahajan L, Wyllie R, Oliva L, Balsells F, Steffen R, Kay M. Reproducibility of 24-h intraesophageal pH monitoring in pediatric patients. Pediatrics 1998; 101: 260–263.
17. Sinal SH, Lawless MR, Rainey DY, Everett VD, Runyan DK, Frothingham T, Herman-Giddens M, St Clair K. Clinician agreement on physical findings in child sexual abuse cases. Arch Pediatr Adolesc Med 1997; 151: 497–501.