Recommendations from European Breast Guidelines

Should double reading (with consensus or arbitration) vs. single reading be used to screen mammograms for early detection of breast cancer in mammography screening programmes?

Recommendation

The recommendation tailored for patient/individual is currently under development. Please find below the recommendation already issued for healthcare professionals.

The ECIBC's Guidelines Development Group suggests using double reading (with consensus or arbitration) over single reading to screen mammograms for early detection of breast cancer in mammography screening programmes (conditional recommendation, moderate certainty).

Recommendation strength

  •   Strong recommendation against the intervention
  •   Conditional recommendation against the intervention
  •   Conditional recommendation for either the intervention or the comparison
  • Conditional recommendation for the intervention
  •   Strong recommendation for the intervention

A recommendation can be strong or conditional.

When a recommendation is strong, most women will want to follow it. When a recommendation is conditional, the majority of women want to follow it but may need more discussion with their healthcare professional first.

Justification

The justification for the recommendation tailored for patient/individual is currently under development. Please find below the justification for the recommendation already issued for healthcare professionals.
 

Overall justification

The GDG suggests by consensus that double reading (with consensus or arbitration) over single reading be used to diagnose breast cancer in mammography screening.

Only one study with digital mammography was included in the evidence, which prevented the GDG from issuing a strong recommendation.

Detailed justification

Test accuracy:
The GDG notes that most studies reviewed did not use digital mammography. The GDG notes that breast cancer detection has a higher sensitivity with double reading with consensus or arbitration in mammography screening.

Desirable Effects:
The GDG judged that the desirable anticipated effects, including additional breast cancers detected, were small.

Undesirable Effects:
The GDG judged that the undesirable anticipated effects, including additional false positive screening results identified, were small.

Certainty of the evidence of test accuracy:
The GDG notes that only one study of digital mammography was identified. The other identified studies were based on screen-film mammography, since double reading with consensus or arbitration has been standard practice in many settings for a number of years. The GDG notes that indirectness of the evidence was a concern, as the studies included did not use digital mammography, which is used in current practice.

Resources required:
The GDG judged that the resources required will vary, but noted that they will always be greater with double reading with consensus or arbitration than single reading. The GDG noted that the proportionate cost increase will vary and it may be negligible, moderate or large depending on the setting. The GDG notes that increased costs observed may be due to both additional costs of reading and for additional assessments required as a result of increased detection or false positives.

 

Considerations

Subgroup

1. The GDG notes that in the context of double reading with consensus or arbitration, no differences were observed in accuracy when arbitration or consensus or both were used to reconcile differences in interpretation between mammography readers.


Implementation

1. In settings with many low-volume mammography readers, the balance of benefits and harms may be even more favourable. The GDG refers readers to the PICO Question 7: ‘What is the optimal annual interpretive volume for radiologists reading screening mammograms?’ in the CCIB report, addressed by the QASDG regarding the experience level of mammography readers.
2. In some settings, capacity (human resources of mammography readers) should be scaled up to implement double readings. In settings where double readings are already in practice, the GDG suggests continued use of double readings with consensus or arbitration.
3. The GDG notes that settings with many low-volume mammography readers may favour double reading with consensus or arbitration; in these settings, the desirable effects of double reading with consensus or arbitration were found to be greater, with fewer undesirable effects, than in settings with high-volume mammography readers.

Monitoring and Evaluation

1. The GDG suggests reporting the proportion of double reading with consensus or arbitration of mammograms that occur in practice. The GDG refers this suggestion to the QASDG for consideration.

Research Priorities

1. The GDG suggests further research examining the cost-effectiveness of double vs single reading of digital mammography in different settings. Cost-effectiveness data were only identified for Spain.
2. The GDG suggests new research using observational studies comparing double reading with consensus or arbitration with single reading in the context of digital mammography. Additional research could also assess accuracy within double reading by comparing a single reader with the addition of a second reader, as is done in practice.
3. The GDG suggests the use of formal radiologist blinding in research to improve the quality of evidence on double vs single readings.
4. The GDG notes that newer screening strategies such as digital breast tomosynthesis (DBT) or automatic computer assisted detection (CAD) were excluded from the analysis of this question on double vs single reading of mammograms. Future research could assess the impact of double reading using CAD and/or DBT systems.

Evidence


Assessment

Background

Mammography screening is strongly recommended for women aged 50 to 69 because the benefits outweigh the harms. Many countries have organised programmes according to the Council of the EU recommendation 2003. Nevertheless, mammography sensitivity can be low, thus limiting the efficacy of screening. On the other hand, some of the undesirable effects of screening are due to the low specificity (false positive screening exams and invasive assessment) of the test. Practice varies with respect to image reading and diagnostic protocols. Optimising mammography sensitivity and specificity is therefore important to optimise the benefit-harm balance of screening.

One of the methods that has been adopted to improve sensitivity of mammography screening is double reading, whereby the mammograms are read, generally independently, by two trained readers. If every mammogram that is read as positive by one or both readers is recalled for assessment, this method necessarily has a detrimental impact on specificity. To mitigate or avoid this problem, mammograms with discordant readings can be reviewed by a third reader (arbitration) or can be discussed by the two readers to reach consensus. Another possible scenario for consensus is when the two readers agree on a positive result. Published articles addressing this topic, however, are missing.
The main objective of this question is to determine whether a strategy of reading mammograms by double reading (independent or dependent, blinded or not) with a consensus conference (1st intervention) or double reading with arbitration (2nd intervention) is superior to single reading (comparison) with regard to the outcomes of breast cancer mortality, stage of breast cancer detected, interval cancer rate, advanced cancers in subsequent rounds, false positive and false negative results of screening mammograms, recall rates and breast cancer detection rate.

Is the problem a priority?
Yes *
* Possible answers: ( No , Probably no , Probably yes , Yes , Varies , Don't know )
Research Evidence
Breast cancer is the second most common cancer in the world and, by far, the most frequent cancer among women with an estimated 1.67 million new cancer cases diagnosed in 2012—accounting for 25% of all cancers [GLOBOCAN 2012]. Breast cancer ranks as the fifth leading cause of cancer death worldwide and the second leading cause of cancer-related death in developed regions [GLOBOCAN 2012]. In the European Union, 367 090 women were diagnosed with breast cancer and 92 000 women died from the disease in 2012 [Ferlay 2013]. Breast cancer ranks fourth among the top five cancers with the highest disease burden [Tsilidis 2016]. Annual incidence of breast cancer in the EU among women aged 50 to 69 is 2.7 per 1 000 and mortality is 0.5 per 1 000 [GLOBOCAN 2012].
Additional Considerations

This question was prioritised by the GDG

How accurate is the test?
Accurate *
* Possible answers: ( Very inaccurate , Inaccurate , Accurate , Very accurate , Varies , Don't know )
Research Evidence

Test accuracy

Double reading (with consensus or arbitration)
Sensitivity: 0.83 (95% CI: 0.67 to 0.94) Specificity: 0.96 (95% CI: 0.86 to 1.00)

Single reading
Sensitivity: 0.75 (95% CI: 0.63 to 0.86) Specificity: 0.95 (95% CI: 0.86 to 1.00)


Number of results per 1000 patients tested (95% CI). Prevalence 0%

| Test result | Double reading (with consensus or arbitration) | Single reading | № of participants (studies) | Certainty of the evidence (GRADE) | Difference |
| True positives (patients with breast cancer) | 6 (5 to 7) | 5 (4 to 6) | 252240 (3) [a] | MODERATE [b,c,d] | 1 more TP in double reading (with consensus or arbitration) |
| False negatives (patients incorrectly classified as not having breast cancer) | 1 (0 to 2) | 2 (1 to 3) | 252240 (3) [a] | MODERATE [b,c,d] | 1 fewer FN in double reading (with consensus or arbitration) |
| True negatives (patients without breast cancer) | 953 (854 to 993) | 943 (854 to 993) | 252240 (3) [e] | LOW [b,d,f] | 10 more TN in double reading (with consensus or arbitration) |
| False positives (patients incorrectly classified as having breast cancer) | 40 (0 to 139) | 50 (0 to 139) | 252240 (3) [e] | LOW [b,d,f] | 10 fewer FP in double reading (with consensus or arbitration) |

  a. Pooled detection rate ‰ (overall): Double reading with consensus or arbitration: 4.7‰ (95% CI 3.4‰ to 6.1‰). Single reading: 4.2‰ (95% CI 3.0‰ to 5.5‰) (Duijm 2009, Gromet 2008, Warren 1995).
  b. The quality of the evidence was downgraded due to indirectness. First, the follow-up for interval cancers differed between studies, which affects the estimated sensitivity. Warren 1995 assessed data from the first screening round (3-year follow-up for interval cancers). Gromet 2008 included one-year interval cancers. Duijm 2009 included two-year interval cancers. Second, all studies were based on data from screen-film mammography, an older technique that has been replaced by digital mammography in most European programmes.
  c. Duijm (2009) showed a lower sensitivity compared to the other studies (2-year follow-up for interval cancers). Warren (1995) showed the highest sensitivity (included only the first screening round). These results are compatible with data from breast cancer screening programmes.
  d. Unclear information about the reference standard used. Likely to be consistent with population screening programmes.
  e. Pooled false positive rate ‰ (overall): Double reading with consensus or arbitration: 46.1‰ (95% CI 28.6‰ to 67.4‰). Single reading: 47.0‰ (95% CI 29.4‰ to 68.6‰) (Duijm 2009, Gromet 2008, Warren 1995).
  f. Wide confidence intervals for false positive results might imply different consequences and decisions for stakeholders.
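
The per-1000 counts in the table above can be reproduced from the pooled sensitivity and specificity. The following is a minimal Python sketch, not the GDG's own calculation. It assumes 7 women with breast cancer per 1000 screened, a figure inferred here from the sum of true positives and false negatives in the table rather than stated in the source, and the function name natural_frequencies is hypothetical.

```python
def natural_frequencies(sensitivity, specificity, diseased_per_1000=7):
    """Convert sensitivity and specificity into rounded counts per 1000 screened.

    diseased_per_1000 is an assumption inferred from TP + FN in the table.
    """
    healthy = 1000 - diseased_per_1000
    tp = sensitivity * diseased_per_1000   # cancers correctly flagged
    tn = specificity * healthy             # healthy women correctly cleared
    return {
        "TP": round(tp),
        "FN": round(diseased_per_1000 - tp),
        "TN": round(tn),
        "FP": round(healthy - tn),
    }


# Pooled estimates reported under "Test accuracy"
double = natural_frequencies(0.83, 0.96)
single = natural_frequencies(0.75, 0.95)

print("double reading:", double)
print("single reading:", single)
print("difference (double - single):", {k: double[k] - single[k] for k in double})
```

Under that assumed prevalence, the sketch gives the same rounded counts as the table. The wide confidence intervals reported there are not reproduced by this calculation.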





Other outcomes


References
1) Duijm LEM, et al (2009) Inter-observer variability in mammography screening and effect of type and number of readers on screening outcome. Br J Cancer 100:901–907.
2) Gromet M (2008) Comparison of computer-aided detection to double reading of screening mammograms: review of 231,221 mammograms. AJR Am J Roentgenol 190:854–859.
3) Leivo T, et al (1999) Incremental cost-effectiveness of double-reading mammograms. Breast Cancer Res Treat 54:261–267.
4) Liston JC, et al (2003) Can the NHS Breast Screening Programme afford not to double read screening mammograms? Clin Radiol 58:474–477.
5) Pauli R, et al (1996) Comparison of radiographer/radiologist double film reading with single reading in breast cancer screening. J Med Screen 3:18–22.
6) Posso M, et al (2016) Double versus single reading of mammograms in a breast cancer screening programme: a cost-consequence analysis. Eur Radiol 26:3262–3271.
7) Tonita JM, et al (1999) Medical radiologic technologist review: effects on a population-based breast cancer screening program. Radiology 211:529–533.
8) Warren RM, et al (1995) Comparison of single reading with double reading of mammograms, and change in effectiveness with experience. Br J Radiol 68:958–962.
Additional Considerations

Breast cancer detection:
-Blanch et al. (2013) reported a higher cancer detection rate with double reading (with or without consensus or arbitration) of digital mammography than with single reading of film mammography (3.88 per 1000 vs 2.63 per 1000).
False positive screening result:
-Roman et al. (2012) reported that double reading of mammograms (with or without consensus or arbitration) conferred a higher risk of false positive results (OR = 2.06; 95% CI 2.00–2.13) than single reading.
Breast cancer invasiveness:
-Double reading with consensus detected 17.0% of DCIS and single reading detected 16.3% (Duijm 2009).
-Double reading with arbitration detected 30.3% of DCIS and single reading detected 29.5% (Gromet 2008).
-Double reading with consensus or arbitration detected 32% more small (<15 mm) invasive cancers than single reading, in prevalent screening. This detection increased to 73% more cancers in incident screening. (Blanks 1998).
The GDG notes that two additional subgroup analyses of the systematic review data were conducted. As there were only 3 studies (Duijm 2009, Gromet 2008, Warren 1995), a univariate random effects logistic regression model instead of a bivariate model was fitted. The model assumes a binomial distribution of the data and uses the maximum likelihood estimation, which is an alternative method to estimate between-study variance in situations with few studies or sparse data. (Nyaga 2014, Takwoingi 2017).

Excluding Gromet (2008) (due to arbitration not being performed in the same way in all disagreements):
Double reading with consensus or arbitration
Sensitivity: 0.76 (95% CI: 0.73 to 0.78) Specificity: 0.99 (95% CI: 0.99 to 0.99)
Single reading
Sensitivity: 0.69 (95% CI: 0.66 to 0.72) Specificity: 0.98 (95% CI: 0.98 to 0.98).

Excluding Warren (1995) (because this study included prevalent screening only):
Double reading with consensus or arbitration
Sensitivity: 0.77 (95% CI: 0.75 to 0.79) Specificity: 0.95 (95% CI: 0.95 to 0.95)
Single reading
Sensitivity: 0.72 (95% CI: 0.69 to 0.74) Specificity: 0.96 (95% CI: 0.96 to 0.96).

The GDG interpreted that there is higher sensitivity with double reading with consensus or arbitration compared to single reading in mammography screening.

The GDG notes that the three studies reviewed did not use digital mammography (Duijm 2009, Gromet 2008, Warren 1995).

The GDG notes that, for other outcomes, double vs single reading showed the following results per 100,000 screening mammograms: 37 more breast cancers detected; 48 fewer interval breast cancers; 482 more recalls; 443 more false positive results; 77 fewer true positives per 100,000 recalls; 4 more breast cancers in situ per 100,000 screening mammograms; 5 more stage I breast cancers; 2 more stage II breast cancers; 1 more stage III breast cancer; and no difference in stage IV breast cancers. No estimates were identified for breast cancer mortality.
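
To relate these per-100,000 figures to the per-1,000 scale used in the test accuracy table, the arithmetic can be sketched as below. This is an illustrative Python sketch only. The ratios of additional recalls and false positives per additional cancer detected are derived here for orientation and are not reported by the GDG, and the variable names are hypothetical.

```python
# Differences for double vs single reading per 100,000 screening mammograms,
# as quoted in the paragraph above (negative means fewer with double reading).
DIFF_PER_100K = {
    "breast cancers detected": 37,
    "interval breast cancers": -48,
    "recalls": 482,
    "false positive results": 443,
}

# Rescale to the per-1,000 basis used elsewhere in this assessment.
diff_per_1000 = {name: value / 100 for name, value in DIFF_PER_100K.items()}

# Derived trade-off ratios (an assumption of this sketch, not a GDG outcome).
extra_cancers = DIFF_PER_100K["breast cancers detected"]
recalls_per_extra_cancer = DIFF_PER_100K["recalls"] / extra_cancers
fp_per_extra_cancer = DIFF_PER_100K["false positive results"] / extra_cancers

for name, value in diff_per_1000.items():
    print(f"{name}: {value:+.2f} per 1,000 screening mammograms")
print(f"additional recalls per additional cancer detected: {recalls_per_extra_cancer:.1f}")
print(f"additional false positives per additional cancer detected: {fp_per_extra_cancer:.1f}")
```

The ratios are point estimates taken from separately reported figures, so they carry no confidence interval and should be read only as a rough guide.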

The GDG discussed the increased rate of false positives and noted that the likelihood of recall and biopsy will vary based on the setting. There is possibly a higher rate of biopsies with double reading with consensus or arbitration in certain settings. In other settings, the GDG notes that many false positives will receive additional imaging, as the next step after double reading with consensus or arbitration and they will not immediately have a biopsy or will have no biopsy at all, since additional imaging was sufficient.

The GDG notes that double reading with consensus or arbitration has more benefits when readers are not highly experienced (Tonita 1999), whereas when both readers read 5000 or more mammograms per year, double reading with consensus or arbitration increases recalls and false positive results (Duijm 2009, Gromet 2008, Posso 2016).

References
1) Blanch J, et al (2013) Cumulative risk of cancer detection in breast cancer screening by protocol strategy. Breast Cancer Res Treat 138:869–877.
2) Blanks RG, et al (1998) A comparison of cancer detection rates achieved by breast cancer screening programmes by number of readers, for one and two view mammography: results from the UK National Health Service breast screening programme. J Med Screen 5:195–201.
3) Duijm LEM, et al (2009) Inter-observer variability in mammography screening and effect of type and number of readers on screening outcome. Br J Cancer 100:901–907.
4) Gromet M (2008) Comparison of computer-aided detection to double reading of screening mammograms: review of 231,221 mammograms. AJR Am J Roentgenol 190:854–859.
5) Nyaga VN, et al (2014) Metaprop: a Stata command to perform meta-analysis of binomial data. Arch Public Health 72:39.
6) Roman R, et al (2012) Effect of protocol-related variables and women's characteristics on the cumulative false-positive risk in breast cancer screening. Ann Oncol 23:104–111.
7) Tonita JM, et al (1999) Medical radiologic technologist review: effects on a population-based breast cancer screening program. Radiology 211:529–533.
8) Takwoingi Y, et al (2017) Performance of methods for meta-analysis of diagnostic test accuracy with few studies or sparse data. Stat Methods Med Res 26:1896–1911.

How substantial are the desirable anticipated effects?
Small *
* Possible answers: ( Trivial , Small , Moderate , Large , Varies , Don't know )
Additional Considerations

The GDG notes that the test accuracy shows 1 more true positive detected with double reading with consensus or arbitration compared to single reading per 1,000 women screened.
The GDG notes that other outcomes demonstrate an increase in breast cancer detection and decreased interval cancers detected on follow-up with double readings.
The GDG notes that the effects identified are presented per mammography exam; because women have multiple mammograms over time, the impact of the increased detection and decreased interval cancers is greater per woman.
The GDG did not reach consensus and therefore voting was conducted. Among 20 GDG members eligible to vote, results were: 10 members voted ‘small’; 9 members voted ‘moderate’; 1 member abstained.

How substantial are the undesirable anticipated effects?
Small *
* Possible answers: ( Large , Moderate , Small , Trivial , Varies , Don't know )
Additional Considerations

The GDG notes that the rate of additional false positives identified in double reading with consensus or arbitration compared to single reading was 443 per 100,000 screening mammograms.
Nonetheless, the GDG felt that this number constitutes a small undesirable effect.
The GDG agreed by consensus that the undesirable anticipated effects were small.

What is the overall certainty of the evidence of test accuracy?
Moderate *
* Possible answers: ( Very low , Low , Moderate , High , No included studies )
Additional Considerations

The GDG notes that the certainty of the evidence for true positives with breast cancer and false negatives was downgraded due to indirectness. This was due to the fact that screen film mammography was used rather than digital mammography in the studies assessed.
The GDG also notes that downgrading was considered for risk of bias, as readings in one study were performed independently rather than with true blinding. The GDG notes that the direction of bias would likely be towards the null hypothesis.
The GDG also notes that the definition of interval cancers differed between the studies, as one study only looked at one screening round while the others looked at first and subsequent rounds of screening.
The GDG agreed by consensus that the overall certainty of the evidence of test accuracy was moderate.

What is the overall certainty of the evidence for any critical or important direct benefits, adverse effects or burden of the test?
No included studies *
* Possible answers: ( Very low , Low , Moderate , High , No included studies )
Additional Considerations

The GDG does not expect any difference in the direct benefits or harms from the test to women, whether it is double or single reading.
The GDG judged that women may be reassured if they are aware that double reading on the mammogram will be performed.
The GDG agreed that false positives were not considered a direct effect of the test.
Qualitative Evidence
Kalecinski et al. (2015) reported the results of qualitative interviews with 48 women from a randomly selected sample of women who were invited to attend organised breast cancer screening in 13 French departments between 2010 and 2011. Twenty-seven women chose the organised screening programme, which they considered to be trustworthy, as negative mammograms are double checked by a second radiologist. Twenty-one women preferred individual screening, which they considered to be more reliable, less anonymous and providing them with more liberty to take control of their own health.
The GDG also noted that the only harm may be a time delay in women obtaining their results with double reading with consensus or arbitration compared to single reading.
The GDG agreed by consensus there were no included studies.

What is the overall certainty of the evidence of effects of the management that is guided by the test results?
Moderate *
* Possible answers: ( Very low , Low , Moderate , High , No included studies )
Additional Considerations

The GDG agreed by consensus that the overall certainty of the effects of management was moderate.

How certain is the link between test results and management decisions?
High *
* Possible answers: ( Very low , Low , Moderate , High , No included studies )
Additional Considerations

The GDG agreed by consensus that the overall certainty of the link between test results and management was high.

What is the overall certainty of the evidence of effects of the test?
Moderate *
* Possible answers: ( Very low , Low , Moderate , High , No included studies )
Additional Considerations

The GDG agreed by consensus that the overall certainty of effects was moderate, as the certainty of test accuracy results was moderate, there was moderate certainty in the treatment effects and high certainty with regard to the link between the test results and the management decisions. The direct consequences of the test were not considered by the GDG to be a decisive element here.

Is there important uncertainty about or variability in how much people value the main outcomes?
Probably no important uncertainty or variability *
* Possible answers: ( Important uncertainty or variability , Possibly important uncertainty or variability , Probably no important uncertainty or variability , No important uncertainty or variability , No known undesirable outcomes )
Additional Considerations

The GDG agreed by consensus that there was probably no important uncertainty or variability in values by women.

Does the balance between desirable and undesirable effects favor the intervention or the comparison?
Probably favors the intervention *
* Possible answers: ( Favors the comparison , Probably favors the comparison , Does not favor either the intervention or the comparison , Probably favors the intervention , Favors the intervention , Varies , Don't know )
Additional Considerations

The GDG did not reach consensus and therefore voting was conducted. Among 20 GDG members eligible to vote, results were: 15 members voted ‘probably favours intervention’; 2 members voted ‘does not favour either the intervention or the comparison’; 2 members voted ‘favours the intervention’ and 1 member abstained.

How large are the resource requirements (costs)?
Varies *
* Possible answers: ( Large costs , Moderate costs , Negligible costs and savings , Moderate savings , Large savings , Varies , Don't know )
Research Evidence
Double reading vs. single reading (costs and resources used from the societal perspective).

References
1) Brown J, et al. Mammography screening: an incremental cost effectiveness analysis of double versus single reading of mammograms. BMJ. 1996 Mar 30; 312(7034):809–12.
2) Leivo T, et al. Incremental cost-effectiveness of double-reading mammograms. Breast Cancer Res Treat. 1999; 54(3): 261–7.

Double reading vs. single reading (costs and resources used from the Health System perspective).


References
1) Brown J, et al. Mammography screening: an incremental cost effectiveness analysis of double versus single reading of mammograms. BMJ. 1996 Mar 30; 312(7034):809–12.
2) Leivo T, et al. Incremental cost-effectiveness of double-reading mammograms. Breast Cancer Res Treat. 1999; 54(3): 261–7.
3) Posso M, et al. Cost-Effectiveness of Double Reading versus Single Reading of Mammograms in a Breast Cancer Screening Programme. PLoS ONE. Public Library of Science; 2016; 11(7):e0159806.
Additional Considerations

The GDG noted that, because two of the studies predate digital mammography (they are twenty years old), the resources evidence is very indirect. Therefore, the GDG did not consider the evidence from Brown (1996) and Leivo (1999).
The GDG considered only Posso (2016) for resource requirement evidence.
The GDG notes that increased costs observed may be due to both additional costs of reading and for additional assessments required as a result of increased detection of false positives.
The GDG also discussed that the Posso (2016) study includes early recall as a cost, which impacts the total costs of double reading.
Cost of consensus and arbitration (radiologists' time and administrative costs) is included in double reading which is why the costs of double reading are higher than single reading. In Posso (2016) study, approximately 6% of examinations went to consensus or arbitration.
The GDG discussed that these figures are different, and slightly lower, in other European countries.
The GDG notes that the reading and reporting time for one digital mammography in a screen-reading setting is on average 33 to 48 seconds [Bernardi 2012, Gilbert 2015, Skaane 2013], which is lower than the time needed for consensus, arbitration or recall assessments.
The GDG agreed by consensus that the costs will vary according to the setting, but noted that they will always be greater with double reading than single reading. The GDG agreed that the actual cost of the reading of mammograms (whether it is double or single) is probably small with regards to the total cost of the screening programme. The GDG noted that the proportionate cost increase will vary and it may be negligible, moderate or large depending on the setting.
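
The reading times and the consensus or arbitration share quoted above give a rough sense of the additional reader workload. The following is a minimal Python sketch under stated assumptions: the batch of 1,000 examinations is hypothetical, the function name reading_hours is hypothetical, and the time spent on consensus, arbitration or recall assessment is left out because the source does not quantify it.

```python
def reading_hours(n_exams, seconds_per_read, reads_per_exam):
    """Total reader time in hours for primary reads only."""
    return n_exams * seconds_per_read * reads_per_exam / 3600


N_EXAMS = 1000                 # hypothetical batch size
READ_SECONDS = (33, 48)        # average range quoted by the GDG
CONSENSUS_SHARE_PERCENT = 6    # approximate share reported in Posso (2016)

for label, reads in (("single", 1), ("double", 2)):
    low, high = (reading_hours(N_EXAMS, s, reads) for s in READ_SECONDS)
    print(f"{label} reading: {low:.1f} to {high:.1f} reader-hours per {N_EXAMS} exams")

# Examinations that would additionally go to consensus or arbitration.
consensus_exams = N_EXAMS * CONSENSUS_SHARE_PERCENT // 100
print(f"exams sent to consensus or arbitration: about {consensus_exams}")
```

Primary reading time simply doubles with a second reader. The sketch does not estimate total cost, which the GDG notes also depends on consensus, arbitration and further assessments.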

References
1) Bernardi D, et al. Application of breast tomosynthesis in screening: incremental effect on mammography acquisition and reading time. Br J Radiol. 2012; 85 (1020): e1174-8.
2) Gilbert F, et al. The TOMMY trial: a comparison of TOMosynthesis with digital MammographY in the UK NHS Breast Screening Programme--a multicentre retrospective reading study comparing the diagnostic performance of digital breast tomosynthesis and digital mammography with digital mammography alone. Health Technol Assess. 2015; 19(4): i-xxv, 1-136.
3) Posso M, et al. Cost-Effectiveness of Double Reading versus Single Reading of Mammograms in a Breast Cancer Screening Programme. PLoS ONE. Public Library of Science; 2016; 11(7):e0159806.
4) Skaane P, et al. Prospective trial comparing full-field digital mammography (FFDM) versus combined FFDM and tomosynthesis in a population-based screening programme using independent double reading with arbitration. Eur Radiol. 2013; 23 (8): 2061-2071.

What is the certainty of the evidence of resource requirements (costs)?
Low *
* Possible answers: ( Very low , Low , Moderate , High , No included studies )
Research Evidence
The quality is probably low due to indirectness and imprecision. Two studies were conducted 20 years ago and one of them shows contradictory results. Only one study was performed based on digital mammography screening and it was conducted in Spain.
Additional Considerations

The GDG notes that this relates to the Posso (2016) study in Spain only.


Does the cost-effectiveness of the intervention favor the intervention or the comparison?
Varies *
* Possible answers: ( Favors the comparison , Probably favors the comparison , Does not favor either the intervention or the comparison , Probably favors the intervention , Favors the intervention , Varies , No included studies )
Research Evidence
Cost-effectiveness per detected cancer (double vs. single reading)


References
1) Brown J, et al. Mammography screening: an incremental cost effectiveness analysis of double versus single reading of mammograms. BMJ. 1996 Mar 30; 312(7034):809–12.
2) Leivo T, et al. Incremental cost-effectiveness of double-reading mammograms. Breast Cancer Res Treat. 1999; 54(3): 261–7.
3) Posso M, et al. Cost-Effectiveness of Double Reading versus Single Reading of Mammograms in a Breast Cancer Screening Programme. PLoS ONE. Public Library of Science; 2016; 11(7):e0159806.
Additional Considerations

The GDG suggests consideration of local cost effectiveness data for application to different settings. Research evidence was only identified for Spain (Posso 2016).
The GDG noted that in Europe a common fixed threshold for cost-effectiveness is not used.

The GDG therefore agreed that the cost-effectiveness varies. In some settings it may not be cost-effective.

What would be the impact on health equity?
Probably no impact *
* Possible answers: ( Reduced , Probably reduced , Probably no impact , Probably increased , Increased , Varies , Don't know )
Additional Considerations

The GDG agreed by consensus that there would probably be no impact on health equity.

Is the intervention acceptable to key stakeholders?
Yes *
* Possible answers: ( No , Probably no , Probably yes , Yes , Varies , Don't know )
Additional Considerations

The GDG judged that patients would likely find this intervention acceptable. The GDG judged that certain radiologists and clinicians may not find double reading with consensus or arbitration acceptable. The GDG judged that policy-makers would likely find this acceptable as evidenced by its widespread use in current practice.

Is the intervention feasible to implement?
Yes *
* Possible answers: ( No , Probably no , Probably yes , Yes , Varies , Don't know )
Additional Considerations

The GDG judged by consensus that it would likely be feasible to implement.
The GDG notes that in some settings, capacity (human resources of mammography readers) may make the feasibility of performing double reading with consensus or arbitration more challenging.

Bibliography

Evidence of effects

Research evidence
  • Duijm LEM, et al. Inter-observer variability in mammography screening and effect of type and number of readers on screening outcome. Br J Cancer. 2009; 100(6): 901–7.
  • Gromet M. Comparison of computer-aided detection to double reading of screening mammograms: review of 231,221 mammograms. AJR Am J Roentgenol. 2008; 190(4): 854–9.
  • Leivo T, et al. Incremental cost-effectiveness of double-reading mammograms. Breast Cancer Res Treat. 1999; 54(3): 261–7.
  • Liston JC, Dall BJG. Can the NHS Breast Screening Programme afford not to double read screening mammograms? Clin Radiol. 2003; 58(6): 474–7.
  • Pauli R, et al. Comparison of radiographer/radiologist double film reading with single reading in breast cancer screening. J Med Screen. 1996; 3(1): 18–22.
  • Posso M, et al. Double versus single reading of mammograms in a breast cancer screening programme: a cost-consequence analysis. Eur Radiol. 2016; 26(9): 3262–71.
  • Tonita JM, et al. Medical radiologic technologist review: effects on a population-based breast cancer screening program. Radiology. 1999; 211(2): 529–33.
Additional considerations
  • Blanch J, et al (2013) Cumulative risk of cancer detection in breast cancer screening by protocol strategy. Breast Cancer Res Treat 138:869–877.
  • Blanks RG, et al (1998) A comparison of cancer detection rates achieved by breast cancer screening programmes by number of readers, for one and two view mammography: results from the UK National Health Service breast screening programme. J Med Screen 5:195–201.
  • Duijm LEM, et al (2009) Inter-observer variability in mammography screening and effect of type and number of readers on screening outcome. Br J Cancer 100:901–907.
  • Gromet M (2008) Comparison of computer-aided detection to double reading of screening mammograms: review of 231,221 mammograms. AJR Am J Roentgenol 190:854–859.
  • Nyaga VN, et al (2014) Metaprop: a Stata command to perform meta-analysis of binomial data. Arch Public Health 72:39.
  • Roman R, et al (2012) Effect of protocol-related variables and women's characteristics on the cumulative false-positive risk in breast cancer screening. Ann Oncol 23:104–111.
  • Tonita JM, et al (1999) Medical radiologic technologist review: effects on a population-based breast cancer screening program. Radiology 211:529–533.
  • Takwoingi Y, et al (2017) Performance of methods for meta-analysis of diagnostic test accuracy with few studies or sparse data. Stat Meth Med Res 26(4) 1896–1911.

Economic evidence
  •  Brown J, et al. Mammography screening: an incremental cost effectiveness analysis of double versus single reading of mammograms. BMJ. 1996 Mar 30; 312(7034):809–12.
  • Leivo T, et al. Incremental cost-effectiveness of double-reading mammograms. Breast Cancer Res Treat. 1999; 54(3): 261–7.