Identifying Minimally Acceptable Interpretive Performance Criteria for Screening Mammography

Patricia A. Carney; Edward A. Sickles; Barbara S. Monsees; Lawrence W. Bassett; R. James Brenner; Stephen A. Feig; Robert A. Smith; Robert D. Rosenberg; T. Andrew Bogart; Sally Browning; Jane W. Barry; Mary M. Kelly; Khai A. Tran; Diana L. Miglioretti

doi:10.1148/radiol.10091636


Family Medicine

Research output: Contribution to journal › Article › peer-review

93 Scopus citations

Abstract

Purpose: To develop criteria to identify thresholds for minimally acceptable physician performance in interpreting screening mammography studies and to profile the impact that implementing these criteria may have on the practice of radiology in the United States.

Materials and Methods: In an institutional review board-approved, HIPAA-compliant study, an Angoff approach was used in two phases to set criteria for identifying minimally acceptable interpretive performance at screening mammography as measured by sensitivity, specificity, recall rate, positive predictive value (PPV) of recall (PPV1) and of biopsy recommendation (PPV2), and cancer detection rate. Performance measures were considered separately. In phase I, a group of 10 expert radiologists considered a hypothetical pool of 100 interpreting physicians and conveyed their cut points of minimally acceptable performance. The experts were informed that a physician's performance falling outside the cut points would result in a recommendation to consider additional training. During each round of scoring, all expert radiologists' cut points were summarized into a mean, median, mode, and range; these were presented back to the group. In phase II, normative data on performance were shown to illustrate the potential impact cut points would have on radiology practice. Rescoring was done until consensus among experts was achieved. Simulation methods were used to estimate the potential impact of performance that improved to acceptable levels if effective additional training was provided.

Results: Final cut points to identify low performance were as follows: sensitivity less than 75%, specificity less than 88% or greater than 95%, recall rate less than 5% or greater than 12%, PPV1 less than 3% or greater than 8%, PPV2 less than 20% or greater than 40%, and cancer detection rate less than 2.5 per 1000 interpretations. The selected cut points for performance measures would likely result in 18%-28% of interpreting physicians being considered for additional training on the basis of sensitivity and cancer detection rate, while the cut points for specificity, recall, and PPV1 and PPV2 would likely affect 34%-49% of practicing interpreters. If underperforming physicians moved into the acceptable range, detection of an additional 14 cancers per 100 000 women screened and a reduction in the number of false-positive examinations by 880 per 100 000 women screened would be expected.

Conclusion: This study identified minimally acceptable performance levels for interpreters of screening mammography studies. Interpreting physicians whose performance falls outside the identified cut points should be reviewed in the context of their specific practice settings and be considered for additional training.
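The abstract describes feeding each round of expert cut points back to the panel as a mean, median, mode, and range. A minimal Python sketch of that summary step follows, assuming each expert submits one numeric cut point per measure per round. The function name `summarize_round` and the sample values are hypothetical, not study data, and the sketch covers only the summary, not how the panel judged consensus.

```python
from __future__ import annotations

from statistics import mean, median, multimode


def summarize_round(cut_points: list[float]) -> dict:
    """Summarize one round of expert cut points for feedback to the panel.

    Illustrative sketch only: assumes one numeric cut point per expert
    (e.g., a sensitivity threshold in percent) for a single measure.
    """
    return {
        "mean": mean(cut_points),
        "median": median(cut_points),
        # multimode returns every most-common value, so ties are not hidden
        "mode": multimode(cut_points),
        "range": (min(cut_points), max(cut_points)),
    }


# Hypothetical cut points from a 10-member panel for one measure and round
round_1 = [75, 80, 75, 70, 75, 80, 85, 75, 70, 80]
print(summarize_round(round_1))
# {'mean': 76.5, 'median': 75.0, 'mode': [75], 'range': (70, 85)}
```

Using `multimode` rather than `mode` is a design choice for feedback: when two values tie, the panel sees both instead of an arbitrary one.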

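Each final cut point is a bound on a single audit measure, so checking a physician against them is simple arithmetic. The Python sketch below computes the six measures from per-physician audit counts and lists the measures that fall outside the published cut points. The count fields and their definitions (a positive screen is a recall; cancer status comes from follow-up after the examination) are assumed standard screening-audit definitions, and the names `AuditCounts`, `measures`, and `flag_outside_cut_points` are hypothetical; the paper's full text gives the exact definitions used. Values exactly at a cut point are treated as acceptable, because the abstract phrases the thresholds as "less than" and "greater than".

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AuditCounts:
    """Hypothetical per-physician screening audit counts (assumed definitions)."""
    true_pos: int            # positive screen (recall), cancer diagnosed in follow-up
    false_pos: int           # positive screen, no cancer in follow-up
    true_neg: int            # negative screen, no cancer in follow-up
    false_neg: int           # negative screen, cancer diagnosed in follow-up
    biopsy_recommended: int  # exams with a biopsy recommendation
    biopsy_rec_cancers: int  # of those, exams followed by a cancer diagnosis


def measures(c: AuditCounts) -> dict[str, float]:
    """Compute the six measures named in the abstract.

    No guard for zero denominators; a real audit tool would need one.
    """
    total = c.true_pos + c.false_pos + c.true_neg + c.false_neg
    positives = c.true_pos + c.false_pos
    return {
        "sensitivity": c.true_pos / (c.true_pos + c.false_neg),
        "specificity": c.true_neg / (c.true_neg + c.false_pos),
        "recall_rate": positives / total,
        "ppv1": c.true_pos / positives,
        "ppv2": c.biopsy_rec_cancers / c.biopsy_recommended,
        "cdr_per_1000": 1000 * c.true_pos / total,
    }


# Acceptable ranges implied by the final cut points (bounds inclusive);
# None means the abstract gives no cut point on that side.
ACCEPTABLE = {
    "sensitivity": (0.75, None),
    "specificity": (0.88, 0.95),
    "recall_rate": (0.05, 0.12),
    "ppv1": (0.03, 0.08),
    "ppv2": (0.20, 0.40),
    "cdr_per_1000": (2.5, None),
}


def flag_outside_cut_points(values: dict[str, float]) -> list[str]:
    """Return the measures whose value lies outside its acceptable range."""
    flagged = []
    for name, (low, high) in ACCEPTABLE.items():
        value = values[name]
        if (low is not None and value < low) or (high is not None and value > high):
            flagged.append(name)
    return flagged


# Hypothetical physicians with 10 000 screening exams each (not study data)
typical = AuditCounts(true_pos=40, false_pos=1000, true_neg=8950,
                      false_neg=10, biopsy_recommended=120, biopsy_rec_cancers=30)
outlier = AuditCounts(true_pos=20, false_pos=1500, true_neg=8460,
                      false_neg=20, biopsy_recommended=100, biopsy_rec_cancers=25)
print(flag_outside_cut_points(measures(typical)))  # []
print(flag_outside_cut_points(measures(outlier)))
# ['sensitivity', 'specificity', 'recall_rate', 'ppv1', 'cdr_per_1000']
```

The function reports each measure independently, mirroring the abstract's statement that performance measures were considered separately. Following the Conclusion, a flag would be a prompt for review in the context of the physician's practice setting rather than a verdict on its own.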
Original language	English (US)
Pages (from-to)	354-361
Number of pages	8
Journal	RADIOLOGY
Volume	255
Issue number	2
DOIs	https://doi.org/10.1148/radiol.10091636
State	Published - May 2010

ASJC Scopus subject areas

Radiology, Nuclear Medicine and Imaging

Access to Document

10.1148/radiol.10091636

Cite this

Carney, P. A., Sickles, E. A., Monsees, B. S., Bassett, L. W., Brenner, R. J., Feig, S. A., Smith, R. A., Rosenberg, R. D., Bogart, T. A., Browning, S., Barry, J. W., Kelly, M. M., Tran, K. A., & Miglioretti, D. L. (2010). Identifying minimally acceptable interpretive performance criteria for screening mammography. Radiology, 255(2), 354-361. https://doi.org/10.1148/radiol.10091636

Carney, PA, Sickles, EA, Monsees, BS, Bassett, LW, Brenner, RJ, Feig, SA, Smith, RA, Rosenberg, RD, Bogart, TA, Browning, S, Barry, JW, Kelly, MM, Tran, KA & Miglioretti, DL 2010, 'Identifying minimally acceptable interpretive performance criteria for screening mammography', Radiology, vol. 255, no. 2, pp. 354-361. https://doi.org/10.1148/radiol.10091636

@article{1c940fc0e55045d5a9e7d685ec9bc156,

title = "Identifying Minimally Acceptable Interpretive Performance Criteria for Screening Mammography",

abstract = "Purpose: To develop criteria to identify thresholds for minimally acceptable physician performance in interpreting screening mammography studies and to profile the impact that implementing these criteria may have on the practice of radiology in the United States. Materials and Methods: In an institutional review board-approved, HIPAA-compliant study, an Angoff approach was used in two phases to set criteria for identifying minimally acceptable interpretive performance at screening mammography as measured by sensitivity, specificity, recall rate, positive predictive value (PPV) of recall (PPV1) and of biopsy recommendation (PPV2), and cancer detection rate. Performance measures were considered separately. In phase I, a group of 10 expert radiologists considered a hypothetical pool of 100 interpreting physicians and conveyed their cut points of minimally acceptable performance. The experts were informed that a physician's performance falling outside the cut points would result in a recommendation to consider additional training. During each round of scoring, all expert radiologists' cut points were summarized into a mean, median, mode, and range; these were presented back to the group. In phase II, normative data on performance were shown to illustrate the potential impact cut points would have on radiology practice. Rescoring was done until consensus among experts was achieved. Simulation methods were used to estimate the potential impact of performance that improved to acceptable levels if effective additional training was provided. Results: Final cut points to identify low performance were as follows: sensitivity less than 75%, specificity less than 88% or greater than 95%, recall rate less than 5% or greater than 12%, PPV1 less than 3% or greater than 8%, PPV2 less than 20% or greater than 40%, and cancer detection rate less than 2.5 per 1000 interpretations. 
The selected cut points for performance measures would likely result in 18%-28% of interpreting physicians being considered for additional training on the basis of sensitivity and cancer detection rate, while the cut points for specificity, recall, and PPV1 and PPV2 would likely affect 34%-49% of practicing interpreters. If underperforming physicians moved into the acceptable range, detection of an additional 14 cancers per 100 000 women screened and a reduction in the number of false-positive examinations by 880 per 100 000 women screened would be expected. Conclusion: This study identified minimally acceptable performance levels for interpreters of screening mammography studies. Interpreting physicians whose performance falls outside the identified cut points should be reviewed in the context of their specific practice settings and be considered for additional training.",

author = "Carney, {Patricia A.} and Sickles, {Edward A.} and Monsees, {Barbara S.} and Bassett, {Lawrence W.} and Brenner, {R. James} and Feig, {Stephen A.} and Smith, {Robert A.} and Rosenberg, {Robert D.} and Bogart, {T. Andrew} and Browning, Sally and Barry, {Jane W.} and Kelly, {Mary M.} and Tran, {Khai A.} and Miglioretti, {Diana L.}",

year = "2010",

month = may,

doi = "10.1148/radiol.10091636",

language = "English (US)",

volume = "255",

pages = "354--361",

journal = "RADIOLOGY",

issn = "0033-8419",

publisher = "Radiological Society of North America Inc.",

number = "2",

}

TY - JOUR

T1 - Identifying Minimally Acceptable Interpretive Performance Criteria for Screening Mammography

AU - Carney, Patricia A.

AU - Sickles, Edward A.

AU - Monsees, Barbara S.

AU - Bassett, Lawrence W.

AU - Brenner, R. James

AU - Feig, Stephen A.

AU - Smith, Robert A.

AU - Rosenberg, Robert D.

AU - Bogart, T. Andrew

AU - Browning, Sally

AU - Barry, Jane W.

AU - Kelly, Mary M.

AU - Tran, Khai A.

AU - Miglioretti, Diana L.

PY - 2010/5

Y1 - 2010/5

AB - Purpose: To develop criteria to identify thresholds for minimally acceptable physician performance in interpreting screening mammography studies and to profile the impact that implementing these criteria may have on the practice of radiology in the United States. Materials and Methods: In an institutional review board-approved, HIPAA-compliant study, an Angoff approach was used in two phases to set criteria for identifying minimally acceptable interpretive performance at screening mammography as measured by sensitivity, specificity, recall rate, positive predictive value (PPV) of recall (PPV1) and of biopsy recommendation (PPV2), and cancer detection rate. Performance measures were considered separately. In phase I, a group of 10 expert radiologists considered a hypothetical pool of 100 interpreting physicians and conveyed their cut points of minimally acceptable performance. The experts were informed that a physician's performance falling outside the cut points would result in a recommendation to consider additional training. During each round of scoring, all expert radiologists' cut points were summarized into a mean, median, mode, and range; these were presented back to the group. In phase II, normative data on performance were shown to illustrate the potential impact cut points would have on radiology practice. Rescoring was done until consensus among experts was achieved. Simulation methods were used to estimate the potential impact of performance that improved to acceptable levels if effective additional training was provided. Results: Final cut points to identify low performance were as follows: sensitivity less than 75%, specificity less than 88% or greater than 95%, recall rate less than 5% or greater than 12%, PPV1 less than 3% or greater than 8%, PPV2 less than 20% or greater than 40%, and cancer detection rate less than 2.5 per 1000 interpretations. The selected cut points for performance measures would likely result in 18%-28% of interpreting physicians being considered for additional training on the basis of sensitivity and cancer detection rate, while the cut points for specificity, recall, and PPV1 and PPV2 would likely affect 34%-49% of practicing interpreters. If underperforming physicians moved into the acceptable range, detection of an additional 14 cancers per 100 000 women screened and a reduction in the number of false-positive examinations by 880 per 100 000 women screened would be expected. Conclusion: This study identified minimally acceptable performance levels for interpreters of screening mammography studies. Interpreting physicians whose performance falls outside the identified cut points should be reviewed in the context of their specific practice settings and be considered for additional training.

UR - http://www.scopus.com/inward/record.url?scp=77951428628&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77951428628&partnerID=8YFLogxK

U2 - 10.1148/radiol.10091636

DO - 10.1148/radiol.10091636

M3 - Article

C2 - 20413750

AN - SCOPUS:77951428628

SN - 0033-8419

VL - 255

SP - 354

EP - 361

JO - RADIOLOGY

JF - RADIOLOGY

IS - 2

ER -
