Correlation Between Screening Mammography Interpretive Performance on a Test Set and Performance in Clinical Practice

Diana L. Miglioretti; Laura Ichikawa; Robert A. Smith; Diana S.M. Buist; Patricia A. Carney; Berta Geller; Barbara Monsees; Tracy Onega; Robert Rosenberg; Edward A. Sickles; Bonnie C. Yankaskas; Karla Kerlikowske

doi:10.1016/j.acra.2017.03.016

Family Medicine

Research output: Contribution to journal › Article › peer-review

7 Scopus citations

Abstract

Rationale and Objectives: Evidence is inconsistent about whether radiologists' interpretive performance on a screening mammography test set reflects their performance in clinical practice. This study aimed to estimate the correlation between test set and clinical performance and determine if the correlation is influenced by cancer prevalence or lesion difficulty in the test set.

Materials and Methods: This institutional review board-approved study randomized 83 radiologists from six Breast Cancer Surveillance Consortium registries to assess one of four test sets of 109 screening mammograms each; 48 radiologists completed a fifth test set of 110 mammograms 2 years later. Test sets differed in number of cancer cases and difficulty of lesion detection. Test set sensitivity and specificity were estimated using woman-level and breast-level recall with cancer status and expert opinion as gold standards. Clinical performance was estimated using woman-level recall with cancer status as the gold standard. Spearman rank correlations between test set and clinical performance with 95% confidence intervals (CI) were estimated.

Results: For test sets with fewer cancers (N = 15) that were more difficult to detect, correlations were weak to moderate for sensitivity (woman level = 0.46, 95% CI = 0.16, 0.69; breast level = 0.35, 95% CI = 0.03, 0.61) and weak for specificity (0.24, 95% CI = 0.01, 0.45) relative to expert recall. Correlations for test sets with more cancers (N = 30) were close to 0 and not statistically significant.

Conclusions: Correlations between screening performance on a test set and performance in clinical practice are not strong. Test set performance more accurately reflects performance in clinical practice if cancer prevalence is low and lesions are challenging to detect.
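
The quantities named in the abstract (sensitivity, specificity, and a Spearman rank correlation with a 95% CI) can be illustrated with a minimal Python sketch. It is not the authors' code. The abstract does not say how the confidence intervals were computed, so the Fisher z-transformation with standard error 1/sqrt(n - 3) used here is an assumption, and all function names and example values are hypothetical.

```python
import math


def sensitivity_specificity(recalled, truth):
    """Return (sensitivity, specificity) from parallel lists of booleans.

    recalled[i] is True if the reader recalled case i;
    truth[i] is True if case i is positive under the chosen gold standard.
    """
    pairs = list(zip(recalled, truth))
    tp = sum(1 for r, t in pairs if r and t)
    fn = sum(1 for r, t in pairs if not r and t)
    tn = sum(1 for r, t in pairs if not r and not t)
    fp = sum(1 for r, t in pairs if r and not t)
    return tp / (tp + fn), tn / (tn + fp)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman_with_ci(x, y, z_crit=1.959964):
    """Spearman rho with an approximate 95% CI (needs more than 3 pairs).

    ASSUMPTION: the interval uses the Fisher z-transformation with
    SE = 1 / sqrt(n - 3); the source does not state its CI method.
    """
    rho = pearson(average_ranks(x), average_ranks(y))
    rho = max(-1.0, min(1.0, rho))  # guard against floating-point overshoot
    if abs(rho) == 1.0:
        return rho, rho, rho  # atanh is undefined at +/-1
    z = math.atanh(rho)
    half_width = z_crit / math.sqrt(len(x) - 3)
    return rho, math.tanh(z - half_width), math.tanh(z + half_width)


if __name__ == "__main__":
    # Hypothetical per-radiologist sensitivities, not data from the study.
    test_set = [0.60, 0.70, 0.80, 0.90, 0.50]
    clinical = [0.65, 0.60, 0.90, 0.85, 0.70]
    print(spearman_with_ci(test_set, clinical))
```

Each radiologist contributes one pair of values (test set performance, clinical performance), so the interval width depends on the number of radiologists rather than the number of mammograms read.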

Original language	English (US)
Pages (from-to)	1256-1264
Number of pages	9
Journal	Academic radiology
Volume	24
Issue number	10
DOIs	https://doi.org/10.1016/j.acra.2017.03.016
State	Published - Oct 2017

Keywords

Screening mammography
interpretive performance
test sets

ASJC Scopus subject areas

Radiology, Nuclear Medicine and Imaging

Access to Document

10.1016/j.acra.2017.03.016

Cite this

Miglioretti, D. L., Ichikawa, L., Smith, R. A., Buist, D. S. M., Carney, P. A., Geller, B., Monsees, B., Onega, T., Rosenberg, R., Sickles, E. A., Yankaskas, B. C., & Kerlikowske, K. (2017). Correlation Between Screening Mammography Interpretive Performance on a Test Set and Performance in Clinical Practice. Academic radiology, 24(10), 1256-1264. https://doi.org/10.1016/j.acra.2017.03.016

Miglioretti, DL, Ichikawa, L, Smith, RA, Buist, DSM, Carney, PA, Geller, B, Monsees, B, Onega, T, Rosenberg, R, Sickles, EA, Yankaskas, BC & Kerlikowske, K 2017, 'Correlation Between Screening Mammography Interpretive Performance on a Test Set and Performance in Clinical Practice', Academic radiology, vol. 24, no. 10, pp. 1256-1264. https://doi.org/10.1016/j.acra.2017.03.016

@article{5ce8173d262b49299004ac37c77106cc,

title = "Correlation Between Screening Mammography Interpretive Performance on a Test Set and Performance in Clinical Practice",

abstract = "Rationale and Objectives Evidence is inconsistent about whether radiologists' interpretive performance on a screening mammography test set reflects their performance in clinical practice. This study aimed to estimate the correlation between test set and clinical performance and determine if the correlation is influenced by cancer prevalence or lesion difficulty in the test set. Materials and Methods This institutional review board-approved study randomized 83 radiologists from six Breast Cancer Surveillance Consortium registries to assess one of four test sets of 109 screening mammograms each; 48 radiologists completed a fifth test set of 110 mammograms 2 years later. Test sets differed in number of cancer cases and difficulty of lesion detection. Test set sensitivity and specificity were estimated using woman-level and breast-level recall with cancer status and expert opinion as gold standards. Clinical performance was estimated using women-level recall with cancer status as the gold standard. Spearman rank correlations between test set and clinical performance with 95% confidence intervals (CI) were estimated. Results For test sets with fewer cancers (N = 15) that were more difficult to detect, correlations were weak to moderate for sensitivity (woman level = 0.46, 95% CI = 0.16, 0.69; breast level = 0.35, 95% CI = 0.03, 0.61) and weak for specificity (0.24, 95% CI = 0.01, 0.45) relative to expert recall. Correlations for test sets with more cancers (N = 30) were close to 0 and not statistically significant. Conclusions Correlations between screening performance on a test set and performance in clinical practice are not strong. Test set performance more accurately reflects performance in clinical practice if cancer prevalence is low and lesions are challenging to detect.",

keywords = "Screening mammography, interpretive performance, test sets",

author = "Miglioretti, {Diana L.} and Laura Ichikawa and Smith, {Robert A.} and Buist, {Diana S.M.} and Carney, {Patricia A.} and Berta Geller and Barbara Monsees and Tracy Onega and Robert Rosenberg and Sickles, {Edward A.} and Yankaskas, {Bonnie C.} and Karla Kerlikowske",

note = "Publisher Copyright: {\textcopyright} 2017 The Association of University Radiologists",

year = "2017",

month = oct,

doi = "10.1016/j.acra.2017.03.016",

language = "English (US)",

volume = "24",

pages = "1256--1264",

journal = "Academic radiology",

issn = "1076-6332",

publisher = "Elsevier USA",

number = "10",

}

TY - JOUR

T1 - Correlation Between Screening Mammography Interpretive Performance on a Test Set and Performance in Clinical Practice

AU - Miglioretti, Diana L.

AU - Ichikawa, Laura

AU - Smith, Robert A.

AU - Buist, Diana S.M.

AU - Carney, Patricia A.

AU - Geller, Berta

AU - Monsees, Barbara

AU - Onega, Tracy

AU - Rosenberg, Robert

AU - Sickles, Edward A.

AU - Yankaskas, Bonnie C.

AU - Kerlikowske, Karla

PY - 2017/10

Y1 - 2017/10

AB - Rationale and Objectives Evidence is inconsistent about whether radiologists' interpretive performance on a screening mammography test set reflects their performance in clinical practice. This study aimed to estimate the correlation between test set and clinical performance and determine if the correlation is influenced by cancer prevalence or lesion difficulty in the test set. Materials and Methods This institutional review board-approved study randomized 83 radiologists from six Breast Cancer Surveillance Consortium registries to assess one of four test sets of 109 screening mammograms each; 48 radiologists completed a fifth test set of 110 mammograms 2 years later. Test sets differed in number of cancer cases and difficulty of lesion detection. Test set sensitivity and specificity were estimated using woman-level and breast-level recall with cancer status and expert opinion as gold standards. Clinical performance was estimated using women-level recall with cancer status as the gold standard. Spearman rank correlations between test set and clinical performance with 95% confidence intervals (CI) were estimated. Results For test sets with fewer cancers (N = 15) that were more difficult to detect, correlations were weak to moderate for sensitivity (woman level = 0.46, 95% CI = 0.16, 0.69; breast level = 0.35, 95% CI = 0.03, 0.61) and weak for specificity (0.24, 95% CI = 0.01, 0.45) relative to expert recall. Correlations for test sets with more cancers (N = 30) were close to 0 and not statistically significant. Conclusions Correlations between screening performance on a test set and performance in clinical practice are not strong. Test set performance more accurately reflects performance in clinical practice if cancer prevalence is low and lesions are challenging to detect.

KW - Screening mammography

KW - interpretive performance

KW - test sets

UR - http://www.scopus.com/inward/record.url?scp=85019586734&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85019586734&partnerID=8YFLogxK

U2 - 10.1016/j.acra.2017.03.016

DO - 10.1016/j.acra.2017.03.016

M3 - Article

C2 - 28551400

AN - SCOPUS:85019586734

SN - 1076-6332

VL - 24

SP - 1256

EP - 1264

JO - Academic radiology

JF - Academic radiology

IS - 10

ER -
