Model Estimates of Potential Benefits and Harms
Date: November 2009
By Jeanne Mandelblatt, MD, MPH; Kathleen Cronin, PhD; Stephanie Bailey, PhD; Donald Berry, PhD; Harry de Koning, MD, PhD; Gerrit Draisma, PhD; Hui Huang, MS; Sandra Lee, DSc; Mark Munsell, MS; Sylvia Plevritis, PhD; Peter Ravdin, MD, PhD; Clyde Schechter, MD, MA; Bronislava Sigal, PhD; Michael Stoto, PhD; Natasha Stout, PhD; Nicolien van Ravesteyn, MSc; John Venier, MS; Marvin Zelen, PhD; and Eric Feuer, PhD; for the Breast Cancer Working Group of the Cancer Intervention and Surveillance Modeling Network (CISNET)*
Address correspondence to: Jeanne S. Mandelblatt, MD, MPH, Lombardi Comprehensive Cancer Center, 3300 Whitehaven Street, Northwest, Suite 4100, Washington, DC 20007; E-mail, firstname.lastname@example.org.
The information in this report is intended to help clinicians, employers, policymakers, and others make informed decisions about the provision of health care services. This report is intended as a reference and not as a substitute for clinical judgment.
This report may be used, in whole or in part, as the basis for the development of clinical practice guidelines and other quality enhancement tools, or as a basis for reimbursement and coverage policies. AHRQ or U.S. Department of Health and Human Services endorsement of such derivative products may not be stated or implied.
This article was first published in Annals of Internal Medicine in November 2009 (Ann Intern Med 2009;151:738-747. http://www.annals.org).
Background: Despite trials of mammography and widespread use, optimal screening policy is controversial.
Objective: To evaluate U.S. breast cancer screening strategies.
Design: 6 models using common data elements.
Data Sources: National data on age-specific incidence, competing mortality, mammography characteristics, and treatment effects.
Target Population: A contemporary population cohort.
Time Horizon: Lifetime.
Interventions: 20 screening strategies with varying initiation and cessation ages applied annually or biennially.
Outcome Measures: Number of mammograms, reduction in deaths from breast cancer or life-years gained (vs. no screening), false-positive results, unnecessary biopsies, and overdiagnosis.
Results of Base-Case Analysis: The 6 models produced consistent rankings of screening strategies. Screening biennially maintained an average of 81% (range across strategies and models, 67% to 99%) of the benefit of annual screening with almost half the number of false-positive results. Screening biennially from ages 50 to 69 years achieved a median 16.5% (range, 15% to 23%) reduction in breast cancer deaths versus no screening. Initiating biennial screening at age 40 years (vs. 50 years) reduced mortality by an additional 3% (range, 1% to 6%), consumed more resources, and yielded more false-positive results. Biennial screening after age 69 years yielded some additional mortality reduction in all models, but overdiagnosis increased most substantially at older ages.
Results of Sensitivity Analysis: Varying test sensitivity or treatment patterns did not change conclusions.
Limitation: Results do not include morbidity from false-positive results, patient knowledge of earlier diagnosis, or unnecessary treatment.
Conclusion: Biennial screening achieves most of the benefit of annual screening with less harm. Decisions about the best strategy depend on program and individual objectives and the weight placed on benefits, harms, and resource considerations.
Primary Funding Source: National Cancer Institute.
In 2009, an estimated 193,370 women in the United States will develop invasive breast cancer, and about 40,170 of them will die of this disease (1). Randomized trials of mammography (2-4) have demonstrated reductions in breast cancer mortality associated with screening from ages 50 to 74 years. Trial results for women aged 40 to 49 years and women aged 74 years or older were not conclusive, and the trials (4,5) had some problems with design, conduct, and interpretation. However, it is not feasible to conduct additional trials to get more precise estimates of the mortality benefits from extending screening to women younger than 50 years or older than 74 years or to test different screening schedules.
We developed models of breast cancer incidence and mortality in the United States. These models are ideally suited for estimating the effect of screening under a variety of policies (6,7). Modeling has the advantage of being able to hold selected conditions (for example, screening intervals or test sensitivity) constant, which facilitates comparison of strategies. Because all models make assumptions about unobservable events, use of several models provides a range of plausible effects and can illustrate the effects of differences in model assumptions (7).
We used 6 established models to estimate the outcomes across 20 mammography screening strategies that vary by age of initiation and cessation and by screening interval among a cohort of U.S. women. The results are intended to contribute to practice and guideline policy debates.
The 6 models were developed independently within the Cancer Intervention and Surveillance Modeling Network (CISNET) of the National Cancer Institute (NCI) (7,8) and were exempt from institutional review board approval. The models have been described elsewhere (7,9-15). Briefly, they share common features and inputs but differ in some ways (Appendix Table 1). Model E (Erasmus Medical Center, Rotterdam, the Netherlands), model G (Georgetown University Medical Center, Washington, DC, and Albert Einstein College of Medicine, Bronx, New York), model M (M.D. Anderson Cancer Center, Houston, Texas), and model W (University of Wisconsin, Madison, Wisconsin, and Harvard Medical School, Boston, Massachusetts) include ductal carcinoma in situ (DCIS). Models E and W specifically assume that some portions of DCIS are nonprogressive and do not result in death. Model W also assumes that some cases of small invasive cancer are nonprogressive. Model S (Stanford University, Palo Alto, California) and model D (Dana-Farber Cancer Institute, Boston, Massachusetts) include only invasive cancer. Some groups model breast cancer in stages, but 3 (models E, S, and W) use tumor size and tumor growth. The models also differ by whether treatment affects the hazard for death from breast cancer (models G, S, and D), results in a cure for some fraction of cases (models E and W), or both (model M). Despite these differences, in previous collaborations (7) all the models came to similar qualitative estimates of the relative contributions of screening and treatment to observed decreases in deaths from breast cancer.
We used the 6 models to estimate the benefits, resource use (as measured by number of mammograms), and harms of 20 alternative screening strategies varying by starting and stopping age and by interval (annual and biennial) (Table 1). The models begin with estimates of breast cancer incidence and mortality trends without screening and treatment and then overlay screening use and improvements in survival associated with treatment (7). We use a cohort of women born in 1960 and follow them beginning at age 25 years for their entire lives. Breast cancer is generally depicted as having a preclinical, screening-detectable period (sojourn time) and a clinical detection point. On the basis of mammography sensitivity (or thresholds of detection), screening identifies disease in the preclinical screening-detection period and results in the identification of earlier-stage or smaller tumors than might be identified by clinical detection, resulting in reduction in breast cancer mortality. Age, estrogen receptor status, and tumor size- or stage-specific treatment have independent effects on mortality. Women can die of breast cancer or of other causes.
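To make the preclinical (sojourn) period concrete, the following minimal Python sketch lists which scheduled screens fall inside one woman's screen-detectable window. It is a deliberately simplified illustration, not any of the 6 CISNET models: the function names, the onset age, and the sojourn time are hypothetical, and mammography sensitivity is ignored, so every screen inside the window is treated as a detection opportunity.

```python
def screening_ages(start_age, stop_age, interval_years):
    """Ages at which a woman is screened, inclusive of the start and stop ages."""
    return list(range(start_age, stop_age + 1, interval_years))


def screens_in_preclinical_window(onset_age, sojourn_years, ages):
    """Screens that fall between preclinical onset and clinical detection."""
    clinical_detection_age = onset_age + sojourn_years
    return [age for age in ages if onset_age <= age < clinical_detection_age]


# Hypothetical tumor (assumed values, not model inputs): it becomes
# screen-detectable at age 57.5 and would surface clinically 3 years later.
ONSET_AGE = 57.5
SOJOURN_YEARS = 3.0

annual = screens_in_preclinical_window(
    ONSET_AGE, SOJOURN_YEARS, screening_ages(50, 69, 1)
)
biennial = screens_in_preclinical_window(
    ONSET_AGE, SOJOURN_YEARS, screening_ages(50, 69, 2)
)
print("annual screens in window:", annual)
print("biennial screens in window:", biennial)
```

With a sojourn time longer than the screening interval, both schedules place at least one screen inside the window; the annual schedule simply offers more opportunities.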
Model Data Variables
All 6 modeling groups use a common set of age-specific variables for breast cancer incidence, mammography test characteristics, treatment algorithms and effects, and nonbreast cancer competing causes of death (Appendix Table 2). In addition to these common variables, each model includes model-specific inputs (or intermediate outputs) to represent preclinical detectable times, lead time, dwell time within stages of disease, and stage distribution in unscreened versus screened women on the basis of their specific model structure (7,9-15).
We use an age-period-cohort model to estimate what breast cancer incidence rates would have been without screening (16). This approach considers the effect of age, temporal trends in risk by cohort, and time period. Because we do not have data on future incidence of breast cancer, we extrapolate forward assuming that future age-specific incidence increases as women age, as observed in 2000. To isolate the effect of technical effectiveness of screening and to assess the effect of screening on mortality while holding treatment constant, models assume 100% adherence to screening and indicated treatment.
Three groups use the age-specific mammography sensitivity (and specificity) values observed in the Breast Cancer Surveillance Consortium (BCSC) program for detection of all cases of breast cancer (invasive and in situ). Separate values are used for initial and subsequent mammography performed at either annual or biennial intervals (17). Two of the models (D and G) use these data directly as input variables (10,14), and 1 model (S) uses the data to calibrate the model (13). The other 3 models (E, M, and W) use the BCSC data as a guide and to fit sensitivity estimates from this and other sources (9,11,15).
All women who have estrogen receptor-positive invasive tumors receive hormonal treatment (tamoxifen if aged <50 years at diagnosis and anastrozole if aged ≥50 years) and nonhormonal treatment with an anthracycline-based regimen. Women with estrogen receptor-negative invasive tumors receive nonhormonal therapy only. Women with DCIS who have estrogen receptor-positive tumors receive hormonal therapy only (18). Treatment effectiveness is based on a synthesis of recent clinical trials and is modeled as a proportionate reduction in mortality risk or the proportion cured (19,20).
We estimated the cumulative probability of unscreened women dying of breast cancer from age 40 years to death. Screening benefit is then calculated as the percentage of reduction in breast cancer mortality (vs. no screening). We also examined life-years gained because of averted or delayed breast cancer death. Benefits are cumulated over the lifetime of the cohort to capture reductions in breast cancer mortality (or life-years gained) occurring years after the start of screening, after considering nonbreast cancer mortality (21,22).
As measures of the burden that a regular screening program imposes on a population, 3 different potential screening harms were examined: false-positive mammograms, unnecessary biopsies, and overdiagnosis. We define the rate of false-positive mammograms as the number of mammograms read as abnormal or needing further follow-up in women without cancer divided by the total number of positive screening mammograms based on the specificity reported in the BCSC (17). We define unnecessary biopsies post hoc as the proportion of women with false-positive screening results who receive a biopsy (23). We define overdiagnosis as the proportion of cases in each strategy that would not have clinically surfaced in a woman's lifetime (because of lack of progressive potential or death from another cause) among all cases arising from age 40 years onward.
We compared model results for the 20 strategies to select the most efficient approach. In a decision analysis, a new intervention is considered more efficient than a comparison intervention if it results in gains in health outcomes, such as life-years gained or deaths averted, while consuming fewer resources (or costs). If the new intervention results in worse outcomes and requires a greater investment, it is inefficient and would not be considered for further use. In economic analysis, such inefficient strategies are said to be "dominated." To rank the screening strategies, we first looked at the results of each model independently. For a particular model, a strategy that requires more mammograms (our measure of resource use) but has a lower relative percentage of mortality reduction (or life-years gained) is considered inefficient, or dominated by other strategies. To evaluate strategies on the basis of results from all 6 models together, we classified them as follows: If a strategy is dominated in all 6 models or in 5 of the 6, we considered it dominated overall. If a strategy is not dominated in any of the models, we classified it as efficient. For a strategy with mixed results across the models, we classified it as borderline.
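The cross-model classification rule can be written as a short function. This is a sketch of the rule as stated in the text; the function name and the boolean-list input format are assumptions made for illustration.

```python
N_MODELS = 6  # the analysis combines results from 6 models


def classify_strategy(dominated_by_model):
    """Classify one strategy from per-model dominance flags.

    dominated_by_model: list of 6 booleans, True where that model found
    the strategy dominated (hypothetical input format).
    """
    if len(dominated_by_model) != N_MODELS:
        raise ValueError("expected one dominance flag per model")
    n_dominated = sum(bool(flag) for flag in dominated_by_model)
    if n_dominated >= N_MODELS - 1:  # dominated in all 6 or in 5 of 6
        return "dominated"
    if n_dominated == 0:  # not dominated in any model
        return "efficient"
    return "borderline"  # mixed results across models


print(classify_strategy([True, True, True, True, True, False]))
print(classify_strategy([False] * 6))
print(classify_strategy([True, False, False, True, False, False]))
```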
After all dominated strategies were eliminated, the remaining strategies were represented as points on a graph plotting the average number of mammograms versus the percentage of mortality reduction (or life-years gained) for each model. We obtained the efficiency frontier for each graph by identifying the sequence of points that represent the largest incremental gain in percentage of mortality reduction (or life-years gained) per additional screening mammography. Screening strategies that fall on this frontier are the most efficient (that is, no alternative exists that provides more benefit for fewer mammographies performed).
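A minimal sketch of how dominated strategies can be removed and an efficiency frontier traced for a single model follows. The strategy labels and numbers are invented for illustration and are not results from any of the 6 models. The dominance test is slightly generalized (no more mammograms and no less benefit, with at least one strict inequality), and the frontier is assumed to start from no screening at the origin.

```python
def non_dominated(strategies):
    """Drop strategies for which some other strategy uses no more mammograms
    and yields no less benefit, with at least one strict inequality."""
    kept = {}
    for name, (mammograms, benefit) in strategies.items():
        dominated = any(
            m2 <= mammograms and b2 >= benefit and (m2 < mammograms or b2 > benefit)
            for other, (m2, b2) in strategies.items()
            if other != name
        )
        if not dominated:
            kept[name] = (mammograms, benefit)
    return kept


def efficiency_frontier(strategies):
    """From no screening (0, 0), repeatedly move to the strategy with the
    largest incremental benefit per additional mammogram."""
    remaining = non_dominated(strategies)
    frontier = []
    current_m, current_b = 0.0, 0.0
    while True:
        candidates = {
            name: (m, b)
            for name, (m, b) in remaining.items()
            if m > current_m and b > current_b
        }
        if not candidates:
            return frontier
        best = max(
            candidates,
            key=lambda n: (candidates[n][1] - current_b) / (candidates[n][0] - current_m),
        )
        frontier.append(best)
        current_m, current_b = candidates[best]


# Hypothetical data: name -> (average mammograms per woman, % mortality reduction)
example = {
    "S1": (10, 15.0),
    "S2": (15, 20.0),
    "S3": (20, 18.0),  # more mammograms and less benefit than S2
    "S4": (30, 26.0),
    "S5": (25, 21.0),  # not dominated, but lies below the frontier
}
print(sorted(non_dominated(example)))
print(efficiency_frontier(example))
```

In this made-up example S3 is dominated outright, while S5 survives the dominance check but is skipped by the frontier because moving from S2 directly to S4 gains more benefit per additional mammogram.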
We conducted a sensitivity analysis to see whether our conclusions about the ranking of strategies change when we vary input variables. First, we investigate the effect of assuming that mammography sensitivity for a given age, screening round, and screening interval is 10 percentage points less than that observed. Second, we examine whether ranking of strategies varies if treatment includes newer hormonal and nonhormonal adjuvant regimens (for example, taxanes). Third, because adjuvant therapy is unlikely to reach 100% of women as modeled in our base-case analysis, we reassess the ranking of strategies if we assume that actual observed current treatment patterns apply to the cohort (24).
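The first sensitivity scenario amounts to shifting every sensitivity input down by 10 percentage points. A small sketch under stated assumptions: the dictionary keys and the base-case values below are hypothetical placeholders, not BCSC estimates, and the floor at zero is an added safeguard.

```python
def lower_sensitivity(sensitivity_by_group, drop=0.10):
    """Return a copy with each sensitivity lowered by `drop` (absolute),
    never going below zero."""
    return {
        group: max(0.0, round(value - drop, 10))
        for group, value in sensitivity_by_group.items()
    }


# Hypothetical base-case values keyed by (age group, screening round, interval)
base_case = {
    ("40-49", "initial", "annual"): 0.70,
    ("50-59", "subsequent", "biennial"): 0.80,
}
scenario = lower_sensitivity(base_case)
for group, value in scenario.items():
    print(group, value)
```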
Model Validation and Uncertainty
Each model has a different structure and assumptions and some varying input variables, so no single method can be used to validate results against an external gold standard. For instance, because some models used results from screening trials (or SEER [Surveillance, Epidemiology and End Results] data) for calibration or as input variables, we cannot use comparisons of projected mortality reductions to trial results to validate all of the models. In addition, we cannot directly compare the results of this analysis, which uses 100% actual screening for all women at specified intervals, with screening trial results in which invitation to screening and participation varied. In our previous work (7,9-11,13-15), results of each model accurately projected independently estimated trends in the absence of intervention and closely approximated modern stage distributions and observed mortality trends. Overall, using 6 models to project a range of plausible screening outcomes provides implicit cross-validation, with the range of results from the models as a measure of uncertainty.
Role of the Funding Source
This work was done under contracts from the Agency for Healthcare Research and Quality (AHRQ) and NCI and grants from the NCI. Staff from the NCI provided some data and technical assistance, and AHRQ staff reviewed the manuscript. Model results are the sole responsibility of the investigators.
In an unscreened population, the models predict a cumulative probability of breast cancer developing over a woman's lifetime starting at age 40 years ranging from 12% to 15%. Without screening, the median probability of dying of breast cancer after age 40 years is 3.0% across the 6 models. Thus, if a particular screening strategy leads to a 10% reduction in breast cancer mortality, then the probability of breast cancer mortality would be reduced from 3.0% to 2.7%, or 3 deaths averted per 1000 women screened.
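The conversion from a relative mortality reduction to deaths averted per 1000 women can be checked with a few lines of Python. The inputs are the figures quoted in the paragraph above; the function names are illustrative.

```python
def screened_mortality(baseline_mortality, relative_reduction):
    """Probability of breast cancer death after applying a relative reduction."""
    return baseline_mortality * (1 - relative_reduction)


def deaths_averted_per_1000(baseline_mortality, relative_reduction):
    """Absolute reduction in deaths, expressed per 1000 women screened."""
    return baseline_mortality * relative_reduction * 1000


BASELINE = 0.030   # median probability of breast cancer death without screening
REDUCTION = 0.10   # illustrative 10% relative reduction from the text

print(round(screened_mortality(BASELINE, REDUCTION), 4))
print(round(deaths_averted_per_1000(BASELINE, REDUCTION), 2))
```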
The 6 models produce consistent results on the ranking of the strategies (Appendix Table 3). Eight approaches are "efficient" in all models (that is, not dominated, because they provide additional mortality reductions for added use of mammography); 7 of these have a biennial interval, and all but 2 start at age 50 years. The Figure shows these results and confirms that most strategies on the efficiency frontier have a biennial interval. Screening every other year from ages 50 to 69 years is an efficient strategy for reducing breast cancer mortality in all models. In all models, biennial screening strategies that start at age 50 years and continue through age 74, 79, or 84 years are of fairly similar efficiency.
In examining benefits in terms of life-years gained (Appendix Table 4), 6 of the 8 consistently nondominated strategies have a biennial interval. In contrast to results for mortality reduction, half of the nondominated strategies include screening initiation at age 40 years. Annual screening strategies that include screening until age 79 or 84 years are on the efficiency frontier (Appendix Figure), but are less resource-efficient than biennial approaches for increasing life-years gained.
As another way to examine the effect of screening interval, we calculated for each screening strategy and model the proportion of the annual benefit (in terms of mortality reduction) that could be achieved by biennial screening (Table 2). Biennial screening maintains an average of 81% (range across strategies and models, 67% to 99%) of the benefits achieved by annual screening.
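The quantity described here is a simple ratio of two mortality reductions for the same starting and stopping ages. A sketch with made-up inputs (the 25% and 20% figures are hypothetical and are not taken from Table 2):

```python
def proportion_of_annual_benefit(biennial_reduction, annual_reduction):
    """Share of the annual-screening mortality reduction that biennial
    screening retains, for the same starting and stopping ages."""
    if annual_reduction <= 0:
        raise ValueError("annual reduction must be positive")
    return biennial_reduction / annual_reduction


# Hypothetical mortality reductions versus no screening
ANNUAL = 0.25
BIENNIAL = 0.20
print(f"{proportion_of_annual_benefit(BIENNIAL, ANNUAL):.0%}")
```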
We also examined the incremental benefits gained by extending screening from ages 50 to 69 years to either earlier or later ages of initiation and cessation (Table 3). Continuing screening to age 79 years (vs. 69 years) results in a median increase in percentage of mortality reduction of 8% (range, 7% to 11%) and 7% (range, 6% to 10%) under annual and biennial intervals, respectively. If screening begins at age 40 years (vs. 50 years) and continues to age 69 years, all models project additional, albeit small, reductions in breast cancer mortality (3% median reduction with either annual or biennial intervals) (Table 3). This translates into a median of 1 additional breast cancer death averted (range, 1 to 2 deaths) per 1000 women screened under a strategy of annual screening from age 40 to 69 years (vs. 50 to 69 years). Thus, greater mortality reductions could be achieved by stopping screening at an older age than by initiating screening at an earlier age.
However, when life-years gained is the outcome measure, 3 of the models conclude that benefits are greater from extending screening to the younger rather than the older age group (Table 3). For instance, starting annual screening at age 40 years (vs. 50 years) and continuing annually to age 69 years yields a median of 33 (range, 11 to 58) life-years gained per 1000 women screened, whereas extending annual screening to age 79 years (vs. 69 years) yields a median of only 24 (range, 18 to 38) life-years gained per 1000 women screened.
All the models project similar rates of false-positive mammograms over the lifetime of screened women across the screening strategies; Table 4 summarizes results for an exemplar model. More false-positive results occur in strategies that include screening from ages 40 to 49 years than in those that initiate screening at age 50 years or later and those that include annual screening rather than biennial screening. For instance, annual screening from ages 40 to 69 years yields 2250 false-positive results for every 1000 women screened over this period, almost twice as many as that of biennial screening in this age group. The proportion of biopsies that occur because of these false-positive results that are retrospectively deemed unnecessary (that is, the woman did not have cancer) is about 7%; therefore, many more women will undergo unnecessary biopsies under annual screening than biennial screening.
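Combining the two figures quoted above gives a rough count of unnecessary biopsies. This sketch assumes the approximately 7% fraction applies uniformly to all false-positive results in the strategy, which is a simplification; the function names are illustrative.

```python
def unnecessary_biopsies_per_1000(false_positives_per_1000, biopsy_fraction):
    """False-positive results that lead to a biopsy, per 1000 women screened."""
    return false_positives_per_1000 * biopsy_fraction


def false_positives_per_woman(false_positives_per_1000):
    """Average number of false-positive results per woman."""
    return false_positives_per_1000 / 1000


FALSE_POSITIVES_ANNUAL_40_69 = 2250  # per 1000 women, from the text
BIOPSY_FRACTION = 0.07               # "about 7%", from the text

print(round(unnecessary_biopsies_per_1000(FALSE_POSITIVES_ANNUAL_40_69, BIOPSY_FRACTION), 1))
print(false_positives_per_woman(FALSE_POSITIVES_ANNUAL_40_69))
```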
Of the 6 models, 5 estimated rates of overdiagnosis. They showed an increase in the risk for overdiagnosis as age increases (data not shown). Although the increase with age occurs over the entire age range considered in the different screening strategies, the rate of increase accelerates in the older age groups, mostly because of increasing rates of competing causes of mortality. Rates of overdiagnosis were higher for DCIS than for invasive disease, proportionately affecting younger women more because more cases of DCIS are diagnosed at younger ages. However, overall, initiating screening at age 40 years (vs. 50 years) had a smaller effect on overdiagnosis than did extending screening beyond age 69 years. Biennial strategies decrease the rate of overdiagnosis, but by much less than one half. The absolute estimate of overdiagnosis varied between models depending on whether DCIS was or was not included and on the assumptions related to progression of DCIS and invasive disease, reflecting the uncertainty in the current knowledge base.
The overall conclusions are robust across the 6 models under different assumptions about mammography sensitivity, treatment patterns, and treatment effectiveness (data not shown).
This study uses 6 established models that use common inputs but different approaches and assumptions to extend previous randomized mammography screening trial results to the U.S. population and to age groups in whom trial results are less conclusive. All 6 modeling groups concluded that the most efficient screening strategies are those that include a biennial screening interval. Conclusions about the optimal starting ages for screening depend more on the measure chosen for evaluating outcomes. If the goal of a national screening program is to reduce mortality in the most efficient manner, then programs that screen biennially from age 50 years to age 69, 74, or 79 years are among the most efficient on the basis of the ratio of benefits to the number of screening examinations. If the goal of a screening program is to efficiently maximize the number of life-years gained, then the preferred strategy would be to screen biennially starting at age 40 years. Decisions about the best starting and stopping ages also depend on tolerance for false-positive results and rates of overdiagnosis.
The conclusion of this modeling analysis, that biennial intervals are more efficient and provide a better balance of benefits and harms than annual intervals, is contrary to some current practices in the United States (25-27). However, our result that biennial screening is more efficient than annual screening is consistent with previous modeling research (28-32) and screening trials, most of which used 2-year intervals (2-5). The model results also agree with reports showing similar intermediate cancer outcomes (for example, stage distribution) between programs using annual and biennial screening, especially among women aged 50 years or older (33-37). In addition, we demonstrated substantial increases in false-positive results and unnecessary biopsies associated with annual intervals, and these harms are reduced by almost 50% with biennial intervals. Our results are also consistent with current knowledge of disease biology. Slow-growing tumors are much more common than fast-growing tumors, and the ratio of slow- to fast-growing tumors increases with age (38), so that little survival benefit is lost by screening every other year rather than every year. For the small subset of women with aggressive, fast-growing tumors, even annual screening is not likely to confer a survival advantage. Guidelines in other countries (4) include biennial screening. However, whether it will be practical or acceptable to change the existing U.S. practice of annual screening cannot be addressed by our models.
In all models, some reductions in breast cancer mortality, albeit small, were seen with strategies that started screening at age 40 years versus 50 years. Because models can represent millions of observations, they are well-suited to detect small differences in a group over time that might not be seen in even the largest clinical trial with a 10- to 15-year follow-up (4,39-42). If program benefits are measured in life-years, the measure most commonly used in cost-effectiveness analysis, then our results suggest that initiating screening at age 40 years saves more life-years than extending screening past age 69 years (albeit at the cost of increasing the number of false-positive mammograms).
Previous recommendations on breast cancer screening have suggested an upper age limit for screening cessation because of decreasing program efficiency due to competing mortality (26,43). Our result that screening strategies that include an upper age limit beyond age 69 years remain on the efficiency frontier (albeit with low incremental gains over strategies that stop screening at earlier ages and with greater harms) is consistent with previously reported results of screening benefit from observational and modeled data (31,32,44-47). However, the observational data reports may have been confounded by the inability to capture lead time and length biases (48-50). Any benefits of screening older women must be balanced against possible harms. For instance, the probability of overdiagnosis increases with age and increases more dramatically for the oldest age groups. Model estimates for the oldest age groups also have more uncertainty compared with estimates for ages 50 to 74 years because of the lack of primary data on natural history of breast cancer and the absence of screening trial data after age 74 years. With the demographic pressure of an aging society, more research will be needed to fully understand the natural history of this disease and the balance of risks and benefits of screening and treatment in the older age groups (38,50).
Our results also highlight the need for better primary data on the natural history of DCIS and small invasive cancer to draw reliable conclusions on the absolute magnitude of overdiagnosis associated with different screening schedules (37,51). Clinical investigation (52), follow-up in screening trials (53), epidemiologic trends in incidence (54), and previous modeling efforts (9,55) all indicated that some DCIS cases will not progress (56,57), but how many is not known.
The collaboration of 6 groups with different modeling philosophies and approaches to estimate the same end points by using a common set of data provides an excellent opportunity to cross-replicate data generated from modeling, represent uncertainty related to modeling assumptions and structure, and give insight into which results are consistent across modeling approaches and which are dependent on model assumptions. The resulting conclusions about the ranking of screening strategies were very robust and should provide greater credibility than inferences based on 1 model alone.
Despite our consistent results, our study had some limitations (58). First, our models provide estimates of the average benefits and harms expected across a cohort of women and do not reflect personal data for individual women. Also, although our models project mortality reductions similar to those observed in clinical trials, the range of results includes higher mortality reductions than those achieved in the trials because we model lifetime screening and assume adherence to all screening and treatment. The trials followed women for limited numbers of years and had some nonadherence. The models also do not capture differences in outcomes among certain risk subgroups, such as women with BRCA1 or BRCA2 genetic susceptibility mutations, women who are healthier or sicker than average, or black women, who seem to have more disease at younger ages than white women (59).
Second, the outcomes considered do not capture morbidity associated with surgery for screening-detected disease (60) or decrements in quality of life associated with false-positive results, living with earlier knowledge of a cancer diagnosis, or overdiagnosis (61).
Third, in estimating lifetime results, we projected breast cancer trends from background incidence rates of a 1960 birth cohort extrapolated forward in time. However, future background incidence (and mortality) may change as the result of several different forces, such as changes in patterns of reproduction; less use of hormone replacement therapy after 2002 or prescription of tamoxifen or other agents for primary disease prevention; increasing rates of obesity; and further advances in treatment (for example, trastuzumab) (62). Although most models portray known differences in biology by age (for example, distribution of estrogen receptor-positive tumors, sensitivity of screening, and length of the preclinical sojourn times), some aspects of the natural history of disease are not known or cannot be fully captured.
We assumed 100% adherence to screening and treatment to evaluate program efficacy. Benefits will always fall short of the projected results because adherence is not perfect. If actual adherence varies systematically by age or other factors, the ranking of strategies could change. In addition, we did not consider "mixed" strategies (for example, screening annually from age 40 to 49 years and then biennially from age 50 to 79 years) as was done in some trials (5) and other analyses (36,63). We found that the benefits of screening from ages 40 to 49 years were small. Benefits in this age group were also associated with harms in terms of false-positive results and unnecessary biopsies. Thus, although strategies that include annual screening from ages 40 to 49 years might be efficient, this would be largely driven by the more favorable balance of benefits and harms after age 50 years. In addition, we judged that mixed strategies are very difficult to communicate to consumers and implement in public health practice.
Finally, we did not discount benefits or include costs in our analysis, although the average number of mammograms per woman (and false-positive results) provides some proxy of resource consumption. Even with these acknowledged limitations, the models demonstrate meaningful, qualitatively similar outcomes despite variations in structure and assumptions.
Overall, the evaluation of screening strategies by the 6 models suggests that optimal program design is based on biennial intervals. Choices about optimal ages of initiation and cessation will ultimately depend on program goals, resources, weight attached to the presence of trial data, the balance of harms and benefits, and considerations of efficiency and equity.