Procedure Manual Appendix XII. Summary of Evidence Table for Evidence Reviews

The approach to the summary of evidence for the USPSTF should transparently represent the body of evidence at the key question level and support the application of the USPSTF's six critical appraisal questions to determine the adequacy of the evidence (convincing, adequate, or inadequate).

Summary of evidence tables created by different EPC teams for the USPSTF should be consistent in the methodological assessment of the body of evidence and the definitions of the information displayed; however, the format of the content may vary by the first, second, and subsequent stratification approaches required for a specific body of evidence (Appendix Table 2).
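
One way for EPC teams to keep their tables consistent is to share a machine-readable row schema. Such a schema would fix the column set of Appendix Table 2 and the rating vocabularies defined in its notes. The Python sketch below is a hypothetical illustration under that assumption, not a USPSTF or EPC Program format. The class and field names (`SummaryRow`, `DesignStratum`, and the enum names) are invented for the example. The allowed rating values are taken from the notes to Appendix Table 2.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


# Hypothetical enums; values are the rating terms listed in the table notes.
class Consistency(Enum):
    REASONABLY_CONSISTENT = "reasonably consistent"
    INCONSISTENT = "inconsistent"
    NA = "N/A"


class Precision(Enum):
    REASONABLY_PRECISE = "reasonably precise"
    IMPRECISE = "imprecise"
    NA = "N/A"


class ReportingBias(Enum):
    SUSPECTED = "suspected"
    UNDETECTED = "undetected"
    NA = "N/A"


class Quality(Enum):
    GOOD = "good"
    FAIR_TO_GOOD = "fair-to-good"
    FAIR = "fair"
    FAIR_TO_POOR = "fair-to-poor"
    NA = "N/A"  # no evidence


class StrengthOfEvidence(Enum):
    HIGH = "high"
    MODERATE = "moderate"
    LOW = "low"
    INSUFFICIENT = "insufficient"


@dataclass
class DesignStratum:
    """Second-order stratum, e.g., all RCTs within one population."""
    design: str  # e.g., "RCT" or "observational"
    k: int       # number of studies
    n: int       # number of participants
    # Optional third-order detail: outcome name -> summary of findings.
    outcomes: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.k < 0 or self.n < 0:
            raise ValueError("k and n must be non-negative")


@dataclass
class SummaryRow:
    """One first-order stratum (population or intervention) of a key question."""
    key_question: str
    stratum: str
    designs: List[DesignStratum]
    consistency: Consistency
    precision: Precision
    reporting_bias: ReportingBias
    quality: Quality
    limitations: str
    strength_of_evidence: StrengthOfEvidence
    applicability: str

    def total_k(self) -> int:
        # Studies summed across all second-order strata in this row.
        return sum(d.k for d in self.designs)

    def total_n(self) -> int:
        # Participants summed across all second-order strata in this row.
        return sum(d.n for d in self.designs)
```

The vocabularies are enums, so a free-text value such as `Quality("fair-to-good")` is accepted while a misspelled rating raises `ValueError`. This is one possible way to enforce shared definitions across teams.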

Appendix Table 2. Summary of Evidence Table

Column headings (each numbered note below describes the corresponding column):
(1) Key Question
(2) Separate Populations or Interventions (First-Order Stratification)
(3) No. of Studies (k), No. of Participants (n), Study Design (Second-Order Stratification)
(4) Summary of Findings by Outcome (Third-Order Stratification, if Needed)
(5) Consistency/Precision
(6) Reporting Bias
(7) Overall Risk for Bias/Quality
(8) Body of Evidence Limitations
(9) EPC Assessment of Strength of Evidence for Key Question
(10) Applicability
  1. Summary of evidence tables are organized by key question to reflect the linkages in the analytic framework.
  2. Within each key question, it can be most informative to stratify the body of evidence by subpopulation (e.g., by age or clinical group, such as pregnant women) or by type of intervention (e.g., psychotherapy, specific medications), depending on the topic. This choice should not be rote; it should reflect how the USPSTF has conceptualized the topic and key questions. The EPC should also consider which approach summarizes the available evidence most informatively, given consistency and applicability issues within the body of evidence. Each first-order stratum generally gets its own row covering its entire subset of the evidence for that key question, particularly when the stratified data may be the basis for considering a subpopulation-specific recommendation or clinical considerations.
  3. Within the first-order stratification, it may be necessary to organize the body of evidence by a second-order variable, such as type of intervention or study design (e.g., RCT vs. observational study). The number of studies (k) and number of participants (n) for each study design should be described within this level of stratification.
  4. A third-order stratification may be required, most likely for large bodies of evidence with pooled data available for different types of outcomes. To the extent the body of evidence allows, this summary should display the quantitative findings (pooled point estimates with 95% confidence intervals, heterogeneity measures, and prediction intervals, if warranted) or the qualitative findings for each important outcome, with some indication of their variability. For qualitative or quantitative summaries at the outcome level, the number of contributing studies, the number of events, and the combined sample size should also be reported. When multiple critical outcomes are measured and they differ in any of the domains that follow (consistency/precision, reporting bias, or overall risk for bias/quality), the strength of evidence for the key question should be assessed separately for each outcome.
  5. Consistency is the degree to which contributing studies estimate the same direction of effect (i.e., consistently suggest benefit or harm); when there is consistency, confidence intervals overlap and statistical tests suggest low heterogeneity. Consistency can be rated as reasonably consistent, inconsistent, or N/A. Inconsistent results may indicate subgroup effects. Precision is the degree to which contributing studies estimate the same magnitude of effect (i.e., precisely suggest the magnitude of benefit or harm); when there is precision, point estimates are close and confidence intervals are narrow, without concerns about insufficient sample size, low event rates, or estimates that could suggest different clinical actions would be appropriate at the upper and lower bounds of the confidence interval. Precision can be rated as reasonably precise, imprecise, or N/A. Imprecise results may suggest the need for further research.
  6. Reporting bias is the degree to which contributing studies may be affected by publication bias, selective outcome reporting bias, or selective analysis reporting bias. Because reporting bias can be difficult to document, it is rated as suspected, undetected, or N/A.
  7. Within the appropriate level of stratification, a combined summary of the individual study (or outcome-specific) quality assessments (or risk for bias) should be presented as good, fair-to-good, fair, fair-to-poor, or N/A (for no evidence). Although the USPSTF assigns overall quality ratings at the individual study level, EPC teams recognize that threats to validity may apply differently to benefits and harms in the same study. Outcome-specific threats to validity may be reported when there are sufficient data and the outcomes are of critical importance.
  8. Important gaps between the available body of evidence and the evidence needed to answer the overall key question are described qualitatively so that the USPSTF can keep them in mind. These limitations may reflect issues that led to low study quality at the individual or outcome level, such as concerns about the populations selected and whether they adequately represent racial/ethnic or other vulnerable subpopulations, lack of replication of interventions, or nonreporting of patient-important outcomes.
  9. Using definitions from the EPC Program, the EPC provides a tentative strength of evidence assessment for each stratum for internal use by the USPSTF in its independent process of assessing the evidence. Each assessment is labeled with the assessed grade (high, moderate, low, or insufficient), followed by language from that grade's definition (Appendix Table 3) describing the critical appraisal issues that led to the grade. For example, a "moderate" strength of evidence assessment may state: "We are moderately confident that the estimate of effect lies close to the true effect; however, the body of evidence is still fairly small and not broadly representative of primary care settings, so some doubt remains."
  10. Applicability is a descriptive assessment of how well the overall body of evidence would apply to the U.S. population based on settings; populations; and intervention characteristics, including accessibility, training, or quality assurance requirements.

Appendix Table 3. EPC Strength of Evidence Grades and Definitions

High: We are very confident that the estimate of effect lies close to the true effect for this outcome. The body of evidence has few or no deficiencies. We believe that the findings are stable (i.e., another study would not change the conclusions).
Moderate: We are moderately confident that the estimate of effect lies close to the true effect for this outcome. The body of evidence has some deficiencies. We believe that the findings are likely to be stable, but some doubt remains.
Low: We have limited confidence that the estimate of effect lies close to the true effect for this outcome. The body of evidence has major or numerous deficiencies (or both). We believe that additional evidence is needed before concluding either that the findings are stable or that the estimate of effect is close to the true effect.
Insufficient: We have no evidence, we are unable to estimate an effect, or we have no confidence in the estimate of effect for this outcome. No evidence is available or the body of evidence has unacceptable deficiencies, precluding us from reaching a conclusion.
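
Note 9 pairs each grade with language from its definition. A small helper can assemble such labels and flag a label whose stated grade does not match the confidence language it uses. The sketch below is a hypothetical illustration: `Grade`, `soe_label`, `grade_implied_by`, and `check_label` are invented names. The quoted sentences are the opening sentences of the definitions in Appendix Table 3. Matching on a statement's opening phrase is a simplifying assumption.

```python
from enum import Enum
from typing import Optional


class Grade(Enum):
    HIGH = "High"
    MODERATE = "Moderate"
    LOW = "Low"
    INSUFFICIENT = "Insufficient"


# Opening sentence of each definition, quoted from Appendix Table 3.
DEFINITION_OPENING = {
    Grade.HIGH: "We are very confident that the estimate of effect lies close "
                "to the true effect for this outcome.",
    Grade.MODERATE: "We are moderately confident that the estimate of effect lies "
                    "close to the true effect for this outcome.",
    Grade.LOW: "We have limited confidence that the estimate of effect lies close "
               "to the true effect for this outcome.",
    Grade.INSUFFICIENT: "We have no evidence, we are unable to estimate an effect, "
                        "or we have no confidence in the estimate of effect for this outcome.",
}

# Leading phrases (lowercase) that signal which grade a statement is written for.
_CUES = (
    ("we are very confident", Grade.HIGH),
    ("we are moderately confident", Grade.MODERATE),
    ("we have limited confidence", Grade.LOW),
    ("we have no evidence", Grade.INSUFFICIENT),
    ("we have no confidence", Grade.INSUFFICIENT),
)


def soe_label(grade: Grade, rationale: str = "") -> str:
    """Build '<Grade>: <definition language> <EPC rationale>' for one stratum."""
    label = f"{grade.value}: {DEFINITION_OPENING[grade]}"
    extra = rationale.strip()
    return f"{label} {extra}" if extra else label


def grade_implied_by(statement: str) -> Optional[Grade]:
    """Return the grade whose definition language the statement opens with, if any."""
    text = statement.strip().lower()
    for cue, grade in _CUES:
        if text.startswith(cue):
            return grade
    return None


def check_label(grade: Grade, statement: str) -> bool:
    """True when the stated grade matches the statement's confidence language,
    or when no grade-specific language is detected."""
    implied = grade_implied_by(statement)
    return implied is None or implied is grade
```

For instance, `check_label(Grade.HIGH, "We are moderately confident ...")` returns `False`, because "moderately confident" is the language of the moderate grade.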

Current as of: July 2017
Internet Citation: Appendix XII. Summary of Evidence Table for Evidence Reviews. U.S. Preventive Services Task Force. July 2017.
