Eur. Radiol. (2001) 11: 2454-2459 DOI 10.1007/s003300101079
Ansgar Malich · Christiane Marx · Mirjam Facius · Thomas Boehm · Marlies Fleck · Werner A. Kaiser
Received: 29 September 2000 / Revised: 6 March 2001 / Accepted: 9 July 2001 / Published online: 5 September 2001
© Springer-Verlag 2001
A. Malich (✉) · C. Marx · M. Facius · M. Fleck · W. A. Kaiser
Institute of Diagnostic and Interventional Radiology, Friedrich-Schiller-University Jena, Bachstrasse 18, 07740 Jena, Germany
Phone: +49-0 36 41-93 33 84, Fax: +49-0 36 41-93 40 66
T. Boehm
Institute of Diagnostic Radiology, Universitätsspital Zürich, Rämistrasse 100, 8032 Zürich, Switzerland
BREAST
Tumour detection rate of a new commercially available computer-aided detection system
Abstract The aim of this study was to determine the tumour detection rate and false-positive rate of a new mammographic computer-aided detection (CAD) system in order to assess its clinical usefulness. The craniocaudal and oblique images of 150 suspicious mammograms from 150 patients, all with histologically proven malignancy, were analysed using the Second Look CAD system (CADx Medical Systems, Quebec, Canada). Cases were selected randomly using the clinic's internal tumour case sampler. Correct marking of the malignant lesion in at least one view was scored as a true positive; marks not at the location of the malignant lesion were scored as false positives. In addition, mammograms with histologically proven benign masses (n = 50) and microcalcifications (n = 50), as well as 100 non-suspicious mammograms, were scanned to determine the number of false-positive marks per image. Of the 150 malignant lesions, 94 were suspicious due to masses, 26 due to microcalcifications, and 30 showed both signs of malignancy. The overall sensitivity was 90.0 % (135 of 150). Sensitivity was 88.7 % (110 of 124) for suspicious masses (MA) and 98.2 % (55 of 56) for microcalcifications (MC). Eight of the 14 false-negative MA were large lesions. The overall false-positive rates were 0.28 MC marks and 0.97 MA marks per image. The lowest false-positive rates for both MC and MA were observed in the cancer subgroup, whereas the highest were found in the benign but mammographically suspicious subgroups (benign MC for MC marks and benign MA for MA marks). The new CAD system shows a high tumour detection rate with approximately 1.3 false-positive marks per image. These results suggest that the system might be clinically useful as a second reader of mammograms. The system performed particularly well in detecting microcalcifications.
Keywords Computer-aided detection · Breast · Neoplasms · Mammography

Introduction
Breast carcinoma is one of the most common malignant tumours. In 1996, approximately 180,000 new cases of breast cancer were diagnosed and approximately 44,000 patients died from breast cancer in the United States [1]. The lifetime incidence is approximately 8 % for women who live to 70 years of age [2]. Early detection is of great importance for improved prognosis and therapy. Mammography is a common, well-established method for the early detection of breast cancer; however, interpretation is often difficult and depends on the expertise and experience of the radiologist. Several studies have shown that breast cancer detection rates can be improved by up to 15 % by using a second reader [3, 4, 5, 6, 7]. Because routine second reading by a radiologist may not be feasible owing to financial, technical and logistical constraints, efforts to develop computer-aided detection (CAD) systems began as early as 1967 [8]. These systems are designed to help radiologists detect suspicious masses (MA) and microcalcifications (MC) earlier and more accurately in mammography screening. Funovics et al. showed that the sensitivity of breast cancer detection increases significantly when a radiologist uses a CAD system [9]. Other studies have demonstrated that CAD systems can detect approximately one-half of "missed" breast cancers [10, 11]. Currently, only two CAD systems are commercially available. The quality of such systems depends critically on the tumour detection rate and the number of false-positive (FP) marks per image. Both sensitivity and FP marks per image have already been well investigated for the "ImageChecker" system (R2, Los Altos, Calif.) [12, 13, 14]. Reported sensitivities vary between 59 and 87 %, depending on the case selection criteria and the definition of a true positive (TP) [12, 13, 15, 16, 17]; however, to date no studies have assessed the sensitivity and FP marks per image of the newly available "Second Look" CAD system (CADx Medical Systems, Quebec, Canada). A high FP rate (i.e. low specificity) is known to be a potential problem in the clinical use of CAD [9, 12, 13, 18]. In addition, studies have shown that the tumour detection rate may be limited by lesion size [12, 19].
The aim of the present study was to evaluate the tumour detection rate of the Second Look CAD system by analysing biopsy-proven cases from randomly selected mammograms, including all cancer types typically seen in a large mammography department, in order to simulate a clinical setting. The number of FP marks per image was also determined.
Patients and methods
A retrospective analysis of the mammograms of 150 patients with biopsy-proven cancer lesions was performed. Cases from June 1993 to March 2000 were selected from the institutional tumour case sampler by choosing every fifth biopsy-proven cancer case. All mammograms showing tumour-induced changes that were histologically proven in the department and led to a diagnosis of breast cancer were stored separately in this "internal tumour case sampler". The majority of these cases were screen-detected. Cases first detected in external mammographic centres were excluded from the tumour sampler. All cases with more than one mammographically visible suspicious lesion were excluded from the study. No other pre-selection was performed; all tumour sizes and lesion types were included. All 150 suspicious foci were detected and diagnosed mammographically and treated surgically. Histopathology revealed 150 malignant lesions, as shown in Table 1. In 94 cases, the radiologist reviewing the mammograms considered the case suspicious due to MA; in 30 cases, both MA and MC were found and described as suspicious; and in 26 cases, a malignant tumour was diagnosed due to the presence of suspicious MC.
In addition, 100 contralateral mammograms (200 images) were scanned as controls. The inclusion criteria were the absence of any suspicious mammographic lesion and a follow-up of at least 2 years confirming that these mammograms were normal. Because of these criteria, and because some patients had undergone contralateral mastectomy, only 100 contralateral mammograms could be included. We also scanned 50 mammograms (100 images) from 47 patients at our institution with a mammographically suspicious MC that was histologically proven to be benign. These cases were chosen consecutively from March 1999 onwards (the date on which the Second Look CAD system was installed in the department), and all underwent vacuum-assisted minimally invasive breast biopsy with removal of the MC. Finally, 50 mammograms (100 images) from 50 patients with a histologically proven benign MA were selected consecutively from March 1999 onwards; these lesions underwent either ultrasound-guided or surgical biopsy. Each of the 350 mammograms was read and interpreted independently by two experienced radiologists. Between June 1993 and March 2000, either the Mammodiagnost UC (Philips, Eindhoven, The Netherlands) or the Senographe DMR (GE Medical Systems, Milwaukee, Wis.) was used for all mammographic examinations.
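As a bookkeeping aid, the following minimal Python sketch derives the denominators used later from the case composition stated above. It assumes, consistently with the 124 MA and 56 MC lesions reported in the Results, that a lesion showing both signs counts once as an MA lesion and once as an MC lesion, and that every mammogram contributes two images. The dictionary and function names are illustrative only; the counts are those given in the text.

```python
# Minimal sketch: derive lesion and image denominators from the case mix
# stated in Patients and methods. Names are illustrative; counts are from the text.

# Cancer cases grouped by the sign(s) that made the single lesion suspicious.
CANCER_CASES = {"MA only": 94, "MA and MC": 30, "MC only": 26}

# Control and benign groups, counted in mammograms.
OTHER_MAMMOGRAMS = {"normal contralateral": 100, "benign MC": 50, "benign MA": 50}

IMAGES_PER_MAMMOGRAM = 2  # craniocaudal + oblique view


def lesion_denominators(cases):
    """Return (MA lesions, MC lesions); a lesion with both signs counts for both."""
    ma = cases["MA only"] + cases["MA and MC"]
    mc = cases["MC only"] + cases["MA and MC"]
    return ma, mc


def mammogram_and_image_totals(cancer_cases, other_mammograms):
    """Return (total mammograms, total images) across all study groups."""
    mammograms = sum(cancer_cases.values()) + sum(other_mammograms.values())
    return mammograms, mammograms * IMAGES_PER_MAMMOGRAM


if __name__ == "__main__":
    print(lesion_denominators(CANCER_CASES))  # (124, 56)
    print(mammogram_and_image_totals(CANCER_CASES, OTHER_MAMMOGRAMS))  # (350, 700)
```

Counting the 30 dual-sign lesions in both denominators is why the MA and MC lesion counts (124 + 56 = 180) exceed the 150 cancers.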
Each mammographic examination consisted of two images, the craniocaudal view and the mediolateral oblique (oblique) view of the right or left breast containing the suspicious lesion. Both views were processed by the Second Look CAD system. The system first digitizes the mammograms; the digitized images are then analysed using image processing, proprietary feature selection and mathematical computations, and potential areas of concern are highlighted on a Mammagraph (a laser-printed output). According to the manufacturer's specifications, the system provides two types of marks: an ellipse for MA and a rectangle for MC. There is no size limitation for MA or MC marking, and MC and MA marks are determined independently of each other.
As indicated above, the main goal of this study was to determine the lesion-related tumour detection rate of the Second Look CAD system. To this end, the location on the Mammagraph corresponding to the mammographically detected and histopathologically confirmed cancer lesion was examined to determine whether the CAD system had marked the lesion correctly. Since CAD systems are designed to be used by the radiologist as an additional detection aid, a lesion was scored as TP if the CAD system marked it with the correct lesion type (MA or MC) in at least one of the two views. Because every case included had only one mammographically visible suspicious lesion, all marks on the Mammagraph not located on that lesion were scored as FPs when determining the number of FP marks per image. As noted above, in 30 cases the cancer-induced lesion showed both mammographic signs, suspicious MA and MC, at the same location. In line with the definition of a TP claimed for the CAD system, i.e. correct marking of a tumour-induced lesion in at least one of the two views, the tumour detection rate is defined as the rate of correct marking of a suspicious lesion in at least one of the images.

Table 1 Histopathology of all cases, mean size and standard deviation of suspicious masses (MA). DCIS ductal carcinoma in situ; LCIS lobular carcinoma in situ
Histopathological findings | N | Mean largest diameter on X-ray (mm) | Standard deviation of MA size (mm) | Largest diameter of smallest lesion (mm) | Largest diameter of largest lesion (mm)
Invasive ductal carcinoma | 92 | 21.8 | 11.9 | 7 | 63
Invasive lobular carcinoma | 22 | 23.2 | 13.5 | 8 | 64
Invasive tubular carcinoma | 10 | 15.1 | 5.5 | 10 | 22
Mucinoid adenocarcinoma | 3 | 28 | 8.7 | 24 | 35
DCIS+LCIS | 9 | 18 | 10.9 | 7 | 38
Intraductal non-invasive cancer | 7 | 19.2 | 10.9 | 11 | 35
Other malignancies | 7 | 27.0 | 15.3 | 11 | 46
All | 150 | 21.6 | 11.5 | - | -
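The lesion-level scoring rule described in Patients and methods can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' procedure: the `Mark` record and `score_case` function are hypothetical (the system prints its marks on paper, so in practice this information was read off the Mammagraph). A sign counts as detected if a mark of the matching type lies on the lesion in at least one view, and every mark away from the lesion counts as an FP. The sketch further assumes that a mark on the lesion with the wrong type is counted neither as TP nor as FP, which is one reading of the text.

```python
# Minimal sketch of the lesion-level scoring rule described in the methods.
# The Mark representation is hypothetical; marks were actually read off the
# printed Mammagraph.
from dataclasses import dataclass


@dataclass(frozen=True)
class Mark:
    view: str        # "CC" or "OBL"; kept for bookkeeping, the rule ignores it
    kind: str        # "MA" (ellipse) or "MC" (rectangle)
    on_lesion: bool  # True if the mark lies on the biopsy-proven lesion


def score_case(signs, marks):
    """Score one cancer case with a single suspicious lesion.

    signs: set of suspicious signs of the lesion, e.g. {"MA"} or {"MA", "MC"}.
    Returns (detected, fp): detected maps each sign to True if a mark of the
    same kind lies on the lesion in at least one view; fp counts all marks
    placed away from the lesion.
    """
    marks = list(marks)
    detected = {s: any(m.on_lesion and m.kind == s for m in marks) for s in signs}
    fp = sum(1 for m in marks if not m.on_lesion)
    return detected, fp


if __name__ == "__main__":
    # Hypothetical case with both signs: only the MC is marked on the lesion.
    example = [
        Mark("CC", "MC", on_lesion=True),
        Mark("OBL", "MA", on_lesion=False),
        Mark("CC", "MA", on_lesion=False),
    ]
    print(score_case({"MA", "MC"}, example))
```

In the example, the MA sign is scored as missed and the MC sign as detected, with two FP marks. This corresponds to the "overlapping" situation in which a tumour is highlighted only through its microcalcifications.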
Results
Tumour detection rate
Fifty-five of 56 suspicious MC lesions were marked correctly in at least one of the two views by the new CAD system, a tumour detection rate of 98.2 % for MC. One hundred ten of 124 suspicious MA lesions were marked correctly, a tumour detection rate of 88.7 % for MA. The marked MA had a mean radiologically visible size of 17.2 mm (standard deviation 12.7 mm). In contrast, the MA that were not marked were larger, with a mean size of 28.4 mm (standard deviation 14.5 mm, n = 14). In 8 of the 14 non-marked MA, the largest diameter was > 3.2 cm, whereas in the remaining 6 cases it was < 2.0 cm. The smallest detected MA measured 4 × 4 × 3 mm. In total, 135 of 150 histologically proven cancers were correctly highlighted by the CAD system in at least one of the two views, an overall tumour detection rate of 90.0 %. In 5 of the 14 non-marked MA, the lesion was nevertheless highlighted because the CAD system correctly marked the MC of the same lesion. Consequently, only 10 of 150 cancer-induced lesions (9 MA and 1 MC) were missed, giving a corrected tumour detection rate of 93.3 %. Of these 9 MA, 7 had a largest diameter > 3.2 cm.
False-positive marks per image
On the 300 images of the 150 biopsy-proven cancer cases, 395 MA marks (1.32 per image) and 158 MC marks (0.53 per image) were placed. As a result, the radiologist would need to look at, on average, 1.85 marks per image. In total, 69 MC marks were placed at a location other than the cancer lesion, resulting in 0.23 FP MC marks per image. Of the MA marks, 221 were not placed at the location of the biopsy-proven cancer lesion, giving 0.74 FP MA marks per image. Consequently, the radiologist would have to review less than 1 FP mark per image (0.23 + 0.74 = 0.97).
On the 100 images with a suspicious MC, all of which had a benign histopathological finding, the system placed 44 MC marks and 123 MA marks, yielding 0.43 FP MC marks and 1.21 FP MA marks per image. On the 100 images from 50 patients with a suspicious but histologically benign MA, the new CAD system placed 145 MA marks and 26 MC marks; in this subgroup, the FP rates for MC and MA were 0.26 and 1.45 per image, respectively. On the 200 non-suspicious images from 100 patients, a total of 187 MA marks and 56 MC marks were placed, producing 0.28 FP MC marks and 0.94 FP MA marks per image.
Taking into account all of the scanned images [300 images with cancer-induced mammographically visible lesions, 100 images with histologically proven benign MC, 100 images with histologically proven benign MA (both mammographically suspicious) and 200 mammographically non-suspicious images confirmed by a 2-year follow-up], a total of 195 FP MC marks and 676 FP MA marks were placed, giving an FP rate of 0.28 per image for MC and 0.97 per image for MA. Table 2 compares the FP rates of the four subgroups and of all images scanned.

Table 2 Comparison of false-positive (FP) rates of the different subgroups. MC microcalcifications; MA suspicious mass
Subgroup | No. of images | FP rate MC per image | FP rate MA per image
Cancers | 300 | 0.23 | 0.74
Benign microcalcifications | 100 | 0.43 | 1.21
Benign MA | 100 | 0.26 | 1.45
Screening cases (normals) | 200 | 0.28 | 0.94
Total | 700 | 0.28 | 0.97
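The per-image FP rates in the Results follow from dividing mark counts by image counts. The minimal Python sketch below recomputes them from the counts stated in the text. `FP_COUNTS` and `fp_per_image` are illustrative names, not part of any CAD software. Note that the pooled total divides all FP marks by all images. This is an image-weighted mean rather than the plain average of the four subgroup rates, which for MC would give about 0.30 instead of 0.28.

```python
# Minimal sketch: recompute FP marks per image from the counts stated in the Results.
# subgroup -> (images, FP MC marks, FP MA marks)
FP_COUNTS = {
    "cancers": (300, 69, 221),
    "benign MC": (100, 44, 123),
    "benign MA": (100, 26, 145),
    "normals": (200, 56, 187),
}


def fp_per_image(counts):
    """Return ({group: (MC FP/image, MA FP/image)}, (images, MC FP, MA FP) totals).

    The pooled 'total' row divides all FP marks by all images, i.e. an
    image-weighted mean, not the plain average of the subgroup rates.
    """
    rates = {g: (mc / n, ma / n) for g, (n, mc, ma) in counts.items()}
    n_tot = sum(n for n, _, _ in counts.values())
    mc_tot = sum(mc for _, mc, _ in counts.values())
    ma_tot = sum(ma for _, _, ma in counts.values())
    rates["total"] = (mc_tot / n_tot, ma_tot / n_tot)
    return rates, (n_tot, mc_tot, ma_tot)


if __name__ == "__main__":
    rates, totals = fp_per_image(FP_COUNTS)
    print(totals)  # (700, 195, 676)
    for group, (mc, ma) in rates.items():
        print(f"{group:<10} MC {mc:.2f}  MA {ma:.2f}")
```

The pooled row reproduces the Total line of Table 2 (0.28 and 0.97). With the counts as stated, the benign-MC row evaluates to 0.44 and 1.23, whereas 0.43 and 1.21 are printed.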
Discussion
Tumour detection rate
A CAD system assists the radiologist in the early detection of breast cancer by highlighting suspicious areas. A TP on a CAD system is defined as the correct identification of the cancer in at least one of the two views; the important question is thus whether the suspicious areas are highlighted at all, not whether they are highlighted once or twice. For this reason, and as in most publications, the broader and clinically more relevant definition of tumour detection rate (correct marking in at least one of the two images) is used here. The sensitivity of the Second Look CAD system was 98.2 % for MC and 88.7 % for MA, with an overall sensitivity of 90.0 %. Several studies of the tumour detection rate and sensitivity of various CAD systems have been conducted [9, 16, 17, 19, 20, 21, 22, 23, 24, 25], but their case selection protocols differ considerably. Because case selection has been shown to substantially affect the evaluation of CAD system performance, including sensitivity [26], the results of these studies cannot be compared directly [23, 24]. In this study, the CAD system demonstrated a sensitivity at least similar to that of previously evaluated CAD systems. In the studies with protocols comparable to the current one [16, 19, 25], sensitivity ranged from 65 to 87 % for MA and from 70 to 99 % for MC. The values obtained here for the Second Look CAD system (88.7 % for MA and 98.2 % for MC) lie above (MA) or close to (MC) the upper end of these ranges. It is worth noting that the study by Burhenne et al. [25] had a larger sample size than the current study and, in contrast to the present one, examined CAD performance on mammograms obtained prior to detection of the cancer. The MC detection rate in the current study was comparable to that reported by Burhenne et al. [25].
Although the number of cancer-induced MC in the current study is low, the CAD system detected microcalcifications with high sensitivity, which suggests that this new CAD system is promising. Nevertheless, it was not the aim of this study to compare the two commercially available CAD systems. Moreover, the study was not designed to allow such a comparison, which would require the same mammograms to be scanned in the same manner by both systems. In the present study, all lesion sizes were analysed in order to simulate the clinical setting of a large mammography department. The current study therefore differs from most other studies, which used different protocols [9, 14, 20, 21, 24, 27, 28], examining only lesions of limited size or structure (owing to limitations of the CAD systems used, e.g. [9]) or using pre-selected or pre-scanned cases [27]. For example, Sittek et al. used 1110 mammograms, of which only 39 were histologically proven to be malignant [24]. Other studies used receiver operating characteristic (ROC) analysis or considered only specific regions of interest [14, 28]. The tumour detection rate and sensitivity obtained with this new CAD system are promising and appear sufficiently high to reduce the false-negative rate of mammography [8, 14].
Tumour size and overlapping marks
Given the size limitations (6-32 mm) in the detectability of MA by other CAD systems, it is worth noting that 7 of the 14 non-marked MA were larger than the theoretically detectable range of other CAD systems. This should be taken into consideration when comparing different CAD systems. Interestingly, 5 of the 14 non-detected MA were nevertheless highlighted by the Second Look CAD system, because the tumour also produced suspicious MC that were correctly marked. In these cases, the tumour-induced lesion was therefore highlighted by the computer for the radiologist. If detectability is defined as any marking of the tumour, these "overlapping" cases can be scored as marked, yielding a lower false-negative rate. Of the remaining 9 non-marked MA, 7 were large lesions. Consequently, only 3 of 150 cancers (two MA and one MC) were neither very large nor marked at least once by the CAD system.
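The chain from the per-sign detection rates to the corrected rate and to the three small missed cancers can be checked with a short Python sketch. It uses only counts reported in the Results. It rests on one assumption that the reported figures imply but the text does not state: before the overlap correction, each unmarked sign (14 MA + 1 MC) counts as one missed cancer. The constant names are illustrative.

```python
# Minimal sketch reproducing the detection-rate arithmetic from the Results.
# Assumption (implied by the reported figures): before the overlap correction,
# each unmarked sign (14 MA + 1 MC) is counted as one missed cancer.
MC_DETECTED, MC_TOTAL = 55, 56
MA_DETECTED, MA_TOTAL = 110, 124
CASES = 150
RESCUED_BY_MC = 5     # unmarked MA whose lesion was highlighted by a correct MC mark
LARGE_MISSED_MA = 7   # remaining missed MA with largest diameter > 3.2 cm


def pct(num, den):
    """Percentage rounded to one decimal, as reported in the paper."""
    return round(100 * num / den, 1)


unmarked_signs = (MA_TOTAL - MA_DETECTED) + (MC_TOTAL - MC_DETECTED)  # 15
overall_detected = CASES - unmarked_signs                             # 135
corrected_detected = overall_detected + RESCUED_BY_MC                 # 140
missed = CASES - corrected_detected                                   # 10 (9 MA + 1 MC)
missed_not_large = missed - LARGE_MISSED_MA                           # 3

if __name__ == "__main__":
    print(pct(MC_DETECTED, MC_TOTAL), pct(MA_DETECTED, MA_TOTAL))  # 98.2 88.7
    print(pct(overall_detected, CASES), pct(corrected_detected, CASES))  # 90.0 93.3
    print(missed, missed_not_large)  # 10 3
```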
False-positive marks per image
Depending on the case selection criteria and the definition of an FP mark, the reported numbers of FP marks per image vary considerably between CAD systems [26]. This study included 150 highly suspicious cases with biopsy-proven cancer, 100 screening cases, and 50 cases each with MA and MC that were mammographically suspicious but biopsy-proven to be benign. Not surprisingly, the four groups yielded different numbers of FP marks per image. The lowest number of FP marks per image was obtained in the cancer group, where the number of FP MC marks per image was comparable to values reported for other systems. This may be because the system concentrates its marks on the cancer lesions. It should also be noted that the CAD system used in this study limits the number of marks to 9 per case (4 images). The number of FP MA marks per image is high in all subgroups as well as overall. As expected, it is highest in the subgroup with mammographically suspicious (histologically benign) MA, possibly because of dense, mastopathic tissue in these breasts. Although the number of FP MA marks per image appears high, it is important to point out that the system marks not only spiculated MA but also non-spiculated masses, asymmetric densities and architectural distortions. For this reason, the FP rate of this system is not directly comparable to that of CAD systems or software versions that mark only spiculated MA. In addition, the contrast between MC and the surrounding tissue is much higher than that between homogeneous MA and the surrounding breast tissue, which might explain the differences in both tumour detection rates and FP rates between MC and MA. The rate of FP MC marks per image observed in all four subgroups is similar to or lower than that reported in some other studies [5, 12, 19, 20, 29]. Thurfjell et al. observed low FP rates for MC (0.08) and MA (0.46) [17]. Although the MC results (detection rate as well as FP rate) are promising, it must be noted that the number of FP MA marks per image is higher in all subgroups than reported in other studies (e.g. [25]). Reducing this value in future software releases appears necessary and would increase the clinical value of the current CAD system.
Conclusion
A mammogram, while not preventing or curing the disease, is the single most effective method of early detection, since it can identify cancer several years before physical symptoms develop [30]. The CAD system showed a promisingly high overall tumour detection rate. The number of FP marks per image is currently promising only for MC. Improving the algorithms to reduce the high MA FP rate seems necessary and would probably increase the clinical value of this technology. The results obtained for the detection of cancer-induced lesions in this study are promising and suggest that CAD systems are useful as a diagnostic aid to the radiologist.
References
1. American Cancer Society (1996) Breast cancer facts and figures
2. Robra PB, Swart E, Dierks ML (1994) Mammographische Früherkennung - internationaler Standard. In: Frischbier HJ, Hoeffken W, Robra PB (eds) Mammographie in der Krebsfrüherkennung, 1st edn. Enke, Stuttgart, pp 19-29
3. Bird RE (1990) Professional quality assurance for mammography screening programs. Radiology 177: 8-10
4. Chan HP, Doi K, Vyborny CJ et al. (1990) Improvement in radiologists' detection of clustered microcalcifications on mammograms: the potential of computer-aided diagnosis. Invest Radiol 25: 1102-1110
5. Zheng B, Chang YH, Staiger M, Good W, Gur D (1995) Computer-aided detection of clustered microcalcifications in digitized mammograms. Acad Radiol 2: 655-662
6. Chan HP, Sahiner B, Helvie MA et al. (1999) Improvement of radiologists' characterization of mammographic masses by using computer-aided diagnosis: an ROC study. Radiology 212: 817-827
7. Thurfjell EL, Lernevall KA, Taube AAS (1994) Benefit of independent double reading in a population-based mammography screening program. Radiology 191: 241-244
8. Winsberg F, Elkin M, Macy J, Bordaz V, Weymouth W (1967) Detection of radiographic abnormalities in mammograms by means of optical scanning and computer analysis. Radiology 89: 211-215
9. Funovics M, Schamp S, Lackner B, Wunderbaldinger P, Lechner G, Wolf G (1998) Computerassistierte Diagnose in der Mammographie: Das R2 ImageChecker-System in der Detektion spikulierter Läsionen. Wien Med Wschr 148: 321-324
10. Doi K, MacMahon H, Katsuragawa S, Nishikawa RM, Jiang Y (1999) Computer-aided diagnosis in radiology: potential and pitfalls. Eur J Radiol 31: 97-109
11. Marx C, Schütze B, Fleck M, O'Shaughnessy K, Kaiser WA (1998) Computer aided diagnosis: help to detect breast cancer earlier? A retrospective analysis of 58 cases with previous mammograms. Breast 4 (Suppl 1): 39
12. Ehrenstein T, Kenzel PP, Hadijuana J et al. (2000) Computer assisted diagnosis in mammography: evaluation of an expert system. Eur Radiol 117: S1-S10
13. Marx C, Schütze B, Fleck M, O'Shaughnessy K, Kaiser WA (1997) Computer aided diagnosis in mammography. Eur Radiol 7: S82
14. Leichter I, Fields S, Nirel R et al. (2000) Improved mammographic interpretation of masses using computer-aided diagnosis. Eur Radiol 10: 377-383
15. Sittek H, Herrmann K, Perlet C, Künzer I, Kessler M, Reiser M (1997) Computerunterstützte Auswertung von Mammographien - Erste klinische Erfahrungen. Radiologe 37: 610-616
16. Jiang Y, Nishikawa RM, Schmidt RA, Metz CE, Doi K (1999) Comparison of independent double reading and computer-aided diagnosis (CAD) for the diagnosis of breast lesions. Radiology 213 (P): S323
17. Thurfjell E, Thurfjell MG, Egge E, Bjurstam N (1998) Sensitivity and specificity of computer-assisted breast cancer detection in mammography screening. Acta Radiol 39: 384-388
18. Image Checker M 1000 System, Draft Document, 12 November 1997, pp 1-35
19. Chang YH, Zheng B, Gur D (1997) Computer-aided detection of clustered microcalcifications on digitized mammograms: a robustness experiment. Acad Radiol 4: 415-418
20. Ibrahim N, Fujita H, Hara T, Endo T (1997) Automated detection of clustered microcalcifications on mammograms: CAD system application to MIAS database. Phys Med Biol 42: 2577-2589
21. Chang YH, Zheng B, Gur D (1996) Robustness of computerized identification of masses in digitized mammograms: a preliminary assessment. Invest Radiol 31: 563-568
22. Nishikawa RM, Giger ML, Doi K et al. (1994) Effect of case selection on the performance of computer-aided detection schemes. Med Phys 21: 265-269
23. Veldkamp WJ, Karssemeijer N (1998) Accurate segmentation and contrast measurement of microcalcifications in mammograms: a phantom study. Med Phys 25: 1102-1110
24. Sittek H, Perlet C, Helmberger R, Linsmeier E, Kessler M, Reiser M (1998) Computer assisted analysis of mammograms in routine clinical diagnosis. Radiologe 38: 848-852 (in German)
25. Burhenne LJ, Wood SA, D'Orsi CJ et al. (2000) Potential contribution of computer-aided detection to the sensitivity of screening mammography. Radiology 215: 554-562
26. Kopans DB (1998) Breast imaging, 2nd edn. Lippincott-Raven, Philadelphia, p 214
27. Yu S, Guan L (2000) A CAD system for the automatic detection of clustered microcalcifications in digitized mammogram films. IEEE Trans Med Imaging 19: 115-126
28. Jiang Y, Nishikawa RM, Schmidt RA, Metz CE, Giger ML, Doi K (1999) Improving breast cancer diagnosis with computer-aided diagnosis. Acad Radiol 6: 22-33
29. Gavrielides MA, Lo JY, Vargas-Voracek R, Floyd CE Jr (2000) Segmentation of suspicious clustered microcalcifications in mammograms. Med Phys 27: 13-22
30. American Cancer Society (2000) Breast cancer facts and figures 1999-2000 (http://www.cancer.org/statistics/99bcff/early.html)