J Cancer Res Clin Oncol (1981) 101: 331-337
Journal of Cancer Research and Clinical Oncology
© Springer-Verlag 1981
Sensitivity Determination of Cancer Screening Programmes with the Aid of "Interval Cases"

F.W. Schwartz

Zentralinstitut für die kassenärztliche Versorgung in der Bundesrepublik Deutschland, Haedenkampstr. 5, D-5000 Köln 41, Federal Republic of Germany
Summary. The determination of "interval cases" bears great significance in the estimation of the diagnostic sensitivity of a cancer screening programme and the setting of an appropriate periodicity of the screening terms. The concept can only be applied purposefully with regard to tumor kinetics and its relationships to the probability of tumor detection through the examination methods applied. The methodical considerations presented also result in a criticism of the "length-biased sampling" which has constantly been pointed out in the screening theory in recent years.

Key words: Cancer screening - Sensitivity - Early detection - Interval cases
The diagnostic effectiveness of screening programmes is generally described in terms of sensitivity, specificity and predictive values. Familiarity with these basic epidemiological terms is assumed in the following paper. Predictive values, which are also used in many clinical studies, are inappropriate for objective determinations of effectiveness over a long period of time or for comparisons between different populations, because they depend on the prevalence of the target disease in the respective periods of time or populations. Sensitivity and specificity are measurable variables which are independent of the prevalence of the disease; but their determination normally presupposes previous and follow-up examination of the screened population by means of a better test than the screening test (reference test; criterion validity). Such a procedure cannot be utilized with a mass screening programme but is restricted to small-scale studies. However, the results of such studies cannot automatically be transferred to the conditions which prevail in nation-wide mass screening.

For this, the concept of "interval cases" can be applied, as various authors (e.g., Kirch and Klein 1978) have done for breast cancer screening. It can be used in those cases where, in one specific region, cancer incidence and, at the same time, participants and screening results are recorded very reliably. The basic idea behind this method is, so to speak, to take "the time" as a reference test: the cancer cases symptomatically developing within a certain interval
after a screening test has been carried out are evaluated as "false negatives". This raises the question whether a subsequent case of disease was actually present at the time of screening (and whether it would have been detectable by means of the best reference method). Obviously, the answer depends on the interval chosen and the tumor growth as well as on the relationship between tumor kinetics and detectability. The relationships become clear in Fig. 1.

[Figure 1 graphic not reproduced. Vertical axis: tumor size. Legend: So = non-detectable preclinical phase; Sp = detectable preclinical phase; Sc = clinical phase. Screening dates are marked C1, C2 and C3.]

Fig. 1. Schematic relationship between tumor kinetics and detectability by screening

In the model introduced in Fig. 1 we assume that, with increasing growth, each tumor passes through three ideal-typical phases: (1) a "non-detectable preclinical phase" (So), (2) a "detectable preclinical phase" (Sp) and (3) a "clinical phase" (Sc). The transition from Sp to Sc is understood such that, at this point in time, Sp is regularly ended by the occurrence of clinically manifest symptoms. To simplify matters, we further presume that this is always the case if the tumor is of the same size. Corresponding considerations are assumed for the transition from So to Sp. On the one hand, this point in time is defined by the size of the tumor and, on the other hand, by the ability of the applied detection method to recognize the tumor with the size it has at this point in time. Accordingly, the detectability of all tumors lies between the beginning of Sp and the beginning of Sc, in other words during Sp.

If we consider a long interexamination interval, i.e. a screening which is carried out at C1 and then at C2, it becomes obvious that it is mainly the slowly developing tumors (a', a''; b'') which are detected at these two points in time, whereas the interval cases, viz. those which were overlooked at the first examination although the tumors were in phase Sp, consist of the rapid developments (b, b'). Of course, it must be stated here that, for the time being, the detection methods are assumed to exhibit a sensitivity of one in phase Sp.

If I reduce the interexamination interval, I can expect to have no interval cases at a certain position (C3). However, if after a first screening a tumor which was overlooked initially and which is now clinically manifest should occur within this short observation period (C1 to C3), this would most probably indicate inadequate sensitivity of my first examination at date C1. Accordingly, this would be a means of estimating the sensitivity of my examination method.

The obvious difficulty with this method is the choice of the correct interval, for which the mean sojourn time in Sp is decisive. This time is not a fixed, uniform chronological quantity, but depends on the tumor growth and the type of detection. In the case of a short tumor doubling time (with constant detection modalities), it is shorter (line A C) than in the case of slower developments (line A'' C''). With increasing sojourn time, it becomes more probable that the tumor will be detected by a given screening method.
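The reasoning above can be made concrete with a small simulation. The following sketch is not part of the original model: it assumes, purely for illustration, two tumor types with fixed sojourn times in Sp of 6 and 36 months, entries into Sp spread uniformly around a single screening date C1 = 0, and a sensitivity that does not depend on tumor size. The function name `simulate_single_screen` and all numerical values are hypothetical.

```python
import random

FAST, SLOW = 6.0, 36.0   # assumed sojourn times in Sp (months), illustrative only


def simulate_single_screen(n_tumors, interval, sensitivity, seed=0):
    """Simulate one screening at C1 = 0 followed by an observation interval.

    Returns the sojourn times of the screen-detected tumors and of the
    interval cases (tumors that reach Sc within `interval` months after C1
    without having been detected at C1).
    """
    rng = random.Random(seed)
    screen_detected, interval_cases = [], []
    for _ in range(n_tumors):
        sojourn = rng.choice([FAST, SLOW])
        onset = rng.uniform(-48.0, 48.0)     # entry into Sp, relative to C1
        clinical = onset + sojourn           # entry into Sc
        in_sp_at_c1 = onset <= 0.0 < clinical
        if in_sp_at_c1 and rng.random() < sensitivity:
            screen_detected.append(sojourn)
        elif 0.0 < clinical <= interval:
            interval_cases.append(sojourn)
    return screen_detected, interval_cases


def mean(values):
    return sum(values) / len(values)


# Long interval (comparable to C1 to C2), sensitivity of one in Sp
detected, cases = simulate_single_screen(20000, interval=24.0, sensitivity=1.0)
print("mean sojourn time, screen-detected:", round(mean(detected), 1))
print("sojourn times among interval cases:", sorted(set(cases)))

# Short interval (comparable to C1 to C3)
_, cases_perfect = simulate_single_screen(20000, interval=3.0, sensitivity=1.0)
_, cases_imperfect = simulate_single_screen(20000, interval=3.0, sensitivity=0.8)
print("interval cases, sensitivity 1.0:", len(cases_perfect))
print("interval cases, sensitivity 0.8:", len(cases_imperfect))
```

Under these assumptions the screen-detected tumors have a mean sojourn time well above the population mean of 21 months, the interval cases of the long interval consist only of fast-growing tumors, and the short interval produces interval cases only when the sensitivity is below one.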
Consequently, screening does not detect carriers of a disease at random, but preferentially those with an extended preclinical sojourn time (length-biased sampling) (Zelen and Feinleib 1969).

By means of Fig. 1 a further term can be discussed. The extent by which the point in time of detection of a tumor can be brought forward by screening before phase Sc is reached is called "lead time". In Fig. 1, for the tumor development a'', it is represented by the line B'' C''. Thus, it is dependent on the screening date during the sojourn time in Sp. If the screening dates in a population are chosen randomly, only the duration of the sojourn time is decisive.

Two important conclusions can be drawn from these considerations.
1. A screening predominantly detects tumor forms which have a slow development (long sojourn time).
2. The slower the development (increasing sojourn time), the greater the possible advance of the date of diagnosis (increasing lead time).

If, among other things, we want to use these considerations to estimate the sensitivity of our examination method, it follows that
3. the shorter the sojourn time, the shorter my observation interval must be.

For complete assessment, however, it is necessary to consider another factor. The sensitivity of the currently applied, morphologically oriented observation methods generally improves as the tumor increases in size, i.e. the closer the tumor comes to phase Sc, the more probable its detection by a given screening. By analogy with the introduced term "length-biased sampling", this could be called "longitudinal detection bias". From these considerations, it follows that
Table 1 a, b.

a Extent of the carcinoma in the testing material

                                     Cases       %
I    Less than 10%                      42    29.8
II   More than 10%                      68    48.2
III  In the total material              22    15.6
     Not sufficiently assessable         9     6.4
     Total                             141   100.0

b Histological classification of the prostate carcinomas of Groups I and III

                                         Group I   Group III
Highly differentiated                         29           0
Slightly differentiated                        2           2
Cribriform                                     1           2
Solid and anaplastic                           0           1
Highly and slightly differentiated             5           5
Cribriform, and solid and anaplastic           0           0
Cribriform in others                           4           7
Other combinations                             1           5
Total                                         42          22
(Dohm et al. 1979)

4. tumors which grow rapidly and have a short sojourn time are, during screening, detectable with a higher sensitivity than those with slower growth (if the interexamination intervals are shorter than the average sojourn time, or if we consider only one screening date).

This fact is clearly illustrated by developments b and a'', respectively, in Fig. 1. At the screening date C3, b is more easily detectable than a''. Consequently, the length-biased sampling, which is often emphasized, is counteracted by this longitudinal detection bias effect. This effect will be the more pronounced, the more the sensitivity of the detection method depends on tumor growth.

In practice, this becomes particularly noticeable with prostate carcinomas. Whereas the prostate carcinomas recorded in Saarland, which were detected chiefly by palpation, show a proportion of only 11.8% of differentiated, slowly developing carcinomas, the carcinomas detected by accident (incidental carcinomas) show a proportion of 41.8% (Dohm and Hautumm 1979). A further breakdown of the latter material clarifies the relationship: within the group of those carcinomas which constitute less than 10% of the testing material, the proportion of the highly differentiated forms was 69%. In the group of those histological samples which were completely overrun by the tumor, this proportion was 0% (Tables 1 a, b). These findings can, of course, be interpreted only with great caution since they do not originate from a clearly defined population. The term "incidental carcinoma" presupposes that, in the cases presented, indicative palpation results were not or have not yet been obtained (see footnote 1). This example is mentioned simply to show that (assuming this is not a highly selected population of cases) the slightly differentiated tumor developments, i.e. the more rapid ones, because of their expansion which is greater on the average, would have stood a better chance than the others of being detected by the screening method of palpation.

If we return to the beginning of our considerations, viz. the influence of the interexamination intervals, the following can be stated: when we use the method of estimating the sensitivity of a screening program by interval cases (I), I is not just a function of the sensitivity of the given screening method, but also of the average sojourn time of the observed tumors (t_sp) and of the observation interval chosen (t_i), i.e. I(t_i, t_sp). If t_i approaches t_sp, the proportion of more slowly developing tumors among I will be high, and that of rapidly developing tumors will be small. This trend is reversed if t_i tends to zero, or if t_i is far larger than t_sp.

Before discussing some practical applications of these considerations, it should be pointed out that the sojourn time in the given definition can be inappropriate for various diseases or cancer forms. In the case of breast cancer screening, for instance, it is obviously much more important to take as the upper limit not Sc, i.e. the entry into the clinically manifest stage, but the point in time of metastatic lymph node involvement. However, this time is then no longer correlated with a uniform tumor size, as assumed in Fig. 1: with some forms of development, lymph node involvement can be present with as little as a few millimetres of tumor diameter, whereas with other forms large tumors measuring several centimetres do not exhibit any lymph node involvement (Heuser et al. 1979; Duncan and Kerr 1976).
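The dependence of I on t_i and t_sp can be illustrated with a deliberately simple steady-state calculation. This is a sketch under assumptions that are not in the original: all tumors share one fixed sojourn time, tumors enter Sp at a constant rate, the sensitivity does not depend on tumor size, and I is expressed as a fraction of all cases that would become clinical during the interval. The function name `interval_case_fraction` is hypothetical.

```python
def interval_case_fraction(sensitivity, sojourn, interval):
    """Fraction of the cancers surfacing within `interval` months after a
    single screening that appear as interval cases (illustrative model).

    Cases surfacing within the first `sojourn` months were already in Sp at
    the screening date, so they are interval cases only if they were missed.
    Cases surfacing later entered Sp after the screening and could not have
    been detected by it.
    """
    missed = (1.0 - sensitivity) * min(interval, sojourn)
    arose_later = max(0.0, interval - sojourn)
    return (missed + arose_later) / interval


true_sensitivity = 0.8
for sojourn, interval in [(24.0, 12.0), (24.0, 24.0), (6.0, 12.0), (6.0, 24.0)]:
    fraction = interval_case_fraction(true_sensitivity, sojourn, interval)
    naive = 1.0 - fraction   # sensitivity estimate that ignores the sojourn time
    print(f"t_sp={sojourn:4.0f}  t_i={interval:4.0f}  "
          f"I={fraction:.2f}  naive sensitivity={naive:.2f}")
```

In this model the naive estimate 1 - I recovers the assumed sensitivity only as long as t_i does not exceed t_sp. For longer intervals, tumors that entered Sp after the screening are counted together with the overlooked ones, and the sensitivity is underestimated.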
This consideration alone shows that setting the screening interval at 1 year, as is the case under the statutory programme for the early detection of cancer in the Federal Republic of Germany, is largely an arbitrary measure. Therefore, in the case of breast cancer, Heuser et al. (1979) advocate individual setting of the intervals according to the patients' risk factors and earlier suspect findings, although, and this must be emphasized, a clear concept for identifying women at risk of rapidly growing tumors would first have to be developed (see footnote 2).

All these considerations show how difficult realistic estimation of sensitivity is when applying the method of interval cases to cancer screening. Nevertheless, this method is popular because an efficient definitive diagnostic reference test, which could be applied to a screening population, is not available here (Chamberlain et al. 1979). Whereas a series of studies on breast cancer pursues this approach in a relatively simple way (e.g. Fox et al. 1978; Heuser et al. 1979; Chamberlain et al. 1979), Kirch and Klein (1978) tried to examine the relationships between interval cases and varying periodic examinations in a quantitative model. Using data from extensive American studies, they arrived at the result that the proportion of interval cases rises almost linearly with the increase of the false negatives (insensitivity) and almost linearly with the increase of the interexamination intervals (Table 2). The example applies to breast cancer screening.

Table 2. Expected proportion of interval cases in periodic programs

Proportion of        Interexamination interval (months)
false negatives        3      6     12     24

A. Physical examination programs
  0                   01     07     29     57
 10                   02     11     34     61
 20                   03     16     40     66
 30                   04     20     46     70
 40                   06     24     52     74

B. Joint physical-mammographic examinations a
  0                    -     04     18     42
 10                    -     07     23     47
 20                    -     10     27     52
 30                    -     12     32     57
 40                    -     15     37     62

a 50% of cancers detected by each modality (cf. Kirch and Klein 1978)

This table makes it possible to estimate the sensitivity (1 - α) at known values for interval cases and given screening periodicity. If the sensitivity is known, this table can also be used to determine the proportion of interval cases which entered the detectable phase only after the previous screening date. For a given interval, I calculate the difference between the values of my known sensitivity and those of a screening with the hypothetical sensitivity of 1 (α = 0).

Assuming that, at the time of detection or of first treatment, a linear relationship can be seen between the proportion of patients with axillary lymph node involvement and the number of tumor doublings from a given basic tumor size, Kirch and Klein (1978) have composed a corresponding table for the expected proportion of positive lymph node cases (Table 3). However, Duncan and Kerr (1976) have proved that this no longer applies to tumors above 6 cm diameter. Table 2 can be used to estimate the sensitivity, whereas Table 3 not only fulfills the same purpose using a different observation method, but also provides a medically relevant criterion of decision for the desired sensitivity and periodicity of a screening. Thus, e.g., it becomes obvious that even under different sensitivity assumptions a physical examination at 3-monthly intervals seems to have the same usefulness (see footnote 3) as a screening combined with mammography, which is carried out every 12 months.

1 For 95 of the 141 cases, the study presents data on rectal palpation findings at the most recent routine examination. Seventy-four (77%) of these were inconspicuous. Obviously, the remaining palpation findings were not interpreted as indicating the possibility of carcinomas; otherwise, the term incidental carcinoma could not be applied meaningfully.
2 Gautherie and Gros (1980) proposed the use of thermography to distinguish risk groups for rapid neoplasms.
3 The authors use the proportion of "lymphatic node negative" patients to define the "primary benefits" of a screening program, which is contrasted with different cost assumptions. It is presumed that this measure is accepted to be a prognostically sufficient parameter.
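The two uses of Table 2 described in the text can be sketched in code. The snippet below hard-codes part A of Table 2 (physical examination programs) and reads the printed entries as percentages, which is an assumption. The linear interpolation between table rows is a convenience chosen here, not a method taken from Kirch and Klein (1978), and the function names are hypothetical.

```python
# Part A of Table 2: expected interval cases (assumed to be in %) by
# interexamination interval (months) and proportion of false negatives (%).
FALSE_NEGATIVES = [0, 10, 20, 30, 40]
TABLE_2A = {
    3:  [1, 2, 3, 4, 6],
    6:  [7, 11, 16, 20, 24],
    12: [29, 34, 40, 46, 52],
    24: [57, 61, 66, 70, 74],
}


def estimate_false_negatives(interval_months, observed_interval_cases):
    """Estimate the proportion of false negatives (%) by linear interpolation."""
    column = TABLE_2A[interval_months]
    if not column[0] <= observed_interval_cases <= column[-1]:
        raise ValueError("observed value lies outside the tabulated range")
    for i in range(len(column) - 1):
        low, high = column[i], column[i + 1]
        if low <= observed_interval_cases <= high:
            share = (observed_interval_cases - low) / (high - low)
            step = FALSE_NEGATIVES[i + 1] - FALSE_NEGATIVES[i]
            return FALSE_NEGATIVES[i] + share * step


def split_interval_cases(interval_months, false_negatives):
    """Return the tabulated value at perfect sensitivity (alpha = 0) and the
    excess of the value at the given false-negative level over it."""
    column = TABLE_2A[interval_months]
    total = column[FALSE_NEGATIVES.index(false_negatives)]
    at_perfect_sensitivity = column[0]
    return at_perfect_sensitivity, total - at_perfect_sensitivity


alpha = estimate_false_negatives(12, 37)
print("estimated false negatives (%):", alpha)
print("estimated sensitivity (%):", 100 - alpha)
print("12 months, 20% false negatives:", split_interval_cases(12, 20))
```

The first function reads the table "backwards", from an observed proportion of interval cases to a sensitivity estimate. The second takes the difference between the value at a known sensitivity and the value at the hypothetical sensitivity of 1, as described in the text.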
Table 3. Expected proportion of positive node cases in periodic programs

Proportion of        Interexamination interval (months)
false negatives        3      6     12     24

A. Physical examinations
  0                   33     35     37     39
 10                   34     36     38     40
 20                   34     36     38     40
 30                   34     37     39     40
 40                   35     37     39     40

B. Joint physical-mammographic examinations a
  0                    -     29     32     36
 10                    -     30     33     37
 20                    -     31     34     37
 30                    -     32     35     38
 40                    -     32     35     38

a 50% of cancers detected by each modality (cf. Kirch and Klein 1978)
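The comparison drawn in the text between physical examination every 3 months and joint examination every 12 months can be checked directly against Table 3. The snippet below is only a sketch: it copies the two relevant columns of Table 3 as printed and treats the entries as percentages, which is an assumption. The variable names are hypothetical.

```python
FALSE_NEGATIVES = [0, 10, 20, 30, 40]       # proportion of false negatives
PHYSICAL_3_MONTHS = [33, 34, 34, 34, 35]    # Table 3, part A, 3-month column
JOINT_12_MONTHS = [32, 33, 34, 35, 35]      # Table 3, part B, 12-month column

# Difference in expected node-positive cases at each false-negative level
differences = [p - j for p, j in zip(PHYSICAL_3_MONTHS, JOINT_12_MONTHS)]

for alpha, p, j, d in zip(FALSE_NEGATIVES, PHYSICAL_3_MONTHS,
                          JOINT_12_MONTHS, differences):
    print(f"false negatives {alpha:2d}: physical/3 months {p}, "
          f"joint/12 months {j}, difference {d:+d}")

largest_gap = max(abs(d) for d in differences)
print("largest absolute difference:", largest_gap)
```

At every tabulated false-negative level the two programs differ by at most one unit in the expected proportion of node-positive cases, which is the sense in which the text speaks of the same usefulness.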
These considerations show that the concept of sensitivity estimation by means of interval cases is productive if applied with appropriate care. However, it can only be applied successfully in cases where sufficient empirical data are available on tumor kinetics and sojourn time in a detectable preclinical phase and, furthermore, on the relationships between tumor size and detectability by the examination method applied. In this connection, "length-biased sampling" has to be regarded more critically than it used to be because, with morphologically oriented methods of detection, it ignores the increasing detectability of tumors in the course of time. A rational screening strategy today demands detailed knowledge of experimental, clinical and epidemiological data.

References

Chamberlain J, Clifford RR, Nathan BE et al. (1979) Error rates in screening for breast cancer by clinical examination and mammography. Clin Oncol
Dohm G, Hautumm B (1979) Die Morphologie des klinischen Stadiums 0 des Prostatacarcinoms (incidental carcinoma). Urologe A 14:105-111
Duncan W, Kerr GR (1976) The curability of breast cancer. Br Med J 2:781-783
Fox SH, Moskowitz M, Saenger EL (1978) Benefit/risk analysis of aggressive mammographic screening. Radiology 128:359-365
Gautherie M, Gros CM (1980) Breast thermography and cancer risk prediction. Cancer 45:51-56
Heuser L, Spratt JS, Hisam C et al. (1979) Relation between mammary cancer growth kinetics and the intervals between screenings. Cancer 43:857-862
Kirch RLA, Klein M (1978) Prospective evaluation of periodic breast examination programs. Cancer 41:728-736
Zelen M, Feinleib M (1969) On the theory of screening for chronic diseases. Biometrika 56:601-614
Received July 8, 1980/Accepted April 25, 1981