Federated Learning for Multicenter Collaboration in Ophthalmology: Improving Classification Performance in Retinopathy of Prematurity

Imaging and Informatics in Retinopathy of Prematurity Consortium

doi:10.1016/j.oret.2022.02.015

Pediatrics

Research output: Contribution to journal › Article › peer-review

19 Scopus citations

Abstract

Objective: To compare the performance of deep learning classifiers for the diagnosis of plus disease in retinopathy of prematurity (ROP) trained using 2 methods for developing models on multi-institutional data sets: centralizing data versus federated learning (FL) in which no data leave each institution.

Design: Evaluation of a diagnostic test or technology.

Subjects: Deep learning models were trained, validated, and tested on 5255 wide-angle retinal images in the neonatal intensive care units of 7 institutions as part of the Imaging and Informatics in ROP study. All images were labeled for the presence of plus, preplus, or no plus disease with a clinical label and a reference standard diagnosis (RSD) determined by 3 image-based ROP graders and the clinical diagnosis.

Methods: We compared the area under the receiver operating characteristic curve (AUROC) for models developed on multi-institutional data, using a central approach initially, followed by FL, and compared locally trained models with both approaches. We compared the model performance (κ) with the label agreement (between clinical and RSD), data set size, and number of plus disease cases in each training cohort using the Spearman correlation coefficient (CC).

Main Outcome Measures: Model performance using AUROC and linearly weighted κ.

Results: Four settings of experiment were used: FL trained on RSD against central trained on RSD, FL trained on clinical labels against central trained on clinical labels, FL trained on RSD against central trained on clinical labels, and FL trained on clinical labels against central trained on RSD (P = 0.046, P = 0.126, P = 0.224, and P = 0.0173, respectively). Four of the 7 (57%) models trained on local institutional data performed inferiorly to the FL models. The model performance for local models was positively correlated with the label agreement (between clinical and RSD labels, CC = 0.389, P = 0.387), total number of plus cases (CC = 0.759, P = 0.047), and overall training set size (CC = 0.924, P = 0.002).

Conclusions: We found that a trained FL model performs comparably to a centralized model, confirming that FL may provide an effective, more feasible solution for interinstitutional learning. Smaller institutions benefit more from collaboration than larger institutions, showing the potential of FL for addressing disparities in resource access.
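
The linearly weighted κ named as an outcome measure can be illustrated with a short sketch. This is not the study's code: it assumes the three ordinal grades are coded 0 (no plus), 1 (preplus), and 2 (plus), and the function name `linear_weighted_kappa` and the sample labels are hypothetical.

```python
from collections import Counter


def linear_weighted_kappa(rater_a, rater_b, n_classes=3):
    """Cohen's kappa with linear disagreement weights for ordinal labels.

    Labels are assumed to be integers in range(n_classes); this coding is
    an illustration, not something specified by the abstract.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(rater_a)
    joint = Counter(zip(rater_a, rater_b))  # observed pair counts
    marg_a = Counter(rater_a)               # marginal counts, first rater
    marg_b = Counter(rater_b)               # marginal counts, second rater
    observed = 0.0
    expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            # Linear weight: 0 on the diagonal, 1 for the most distant grades.
            w = abs(i - j) / (n_classes - 1)
            observed += w * joint[(i, j)] / n
            expected += w * (marg_a[i] / n) * (marg_b[j] / n)
    if expected == 0:
        # Both raters used one identical class; kappa is undefined here and
        # returning 1.0 is a convention chosen for this sketch.
        return 1.0
    return 1.0 - observed / expected


# Hypothetical labels: model predictions versus reference labels.
model = [0, 1, 2, 0]
reference = [0, 1, 2, 0]
print(linear_weighted_kappa(model, reference))
```

Under linear weights, a preplus/plus confusion is penalized half as much as a no plus/plus confusion, which is why this statistic suits an ordinal three-level grade.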

Original language	English (US)
Pages (from-to)	657-663
Number of pages	7
Journal	Ophthalmology Retina
Volume	6
Issue number	8
DOIs	https://doi.org/10.1016/j.oret.2022.02.015
State	Published - Aug 2022

Keywords

Deep learning
Epidemiology
Federated learning
Retinopathy of prematurity

ASJC Scopus subject areas

Ophthalmology

Cite this

Imaging and Informatics in Retinopathy of Prematurity Consortium (2022). Federated Learning for Multicenter Collaboration in Ophthalmology: Improving Classification Performance in Retinopathy of Prematurity. Ophthalmology Retina, 6(8), 657-663. https://doi.org/10.1016/j.oret.2022.02.015

Federated Learning for Multicenter Collaboration in Ophthalmology: Improving Classification Performance in Retinopathy of Prematurity. / Imaging and Informatics in Retinopathy of Prematurity Consortium.
In: Ophthalmology Retina, Vol. 6, No. 8, 08.2022, p. 657-663.

Imaging and Informatics in Retinopathy of Prematurity Consortium 2022, 'Federated Learning for Multicenter Collaboration in Ophthalmology: Improving Classification Performance in Retinopathy of Prematurity', Ophthalmology Retina, vol. 6, no. 8, pp. 657-663. https://doi.org/10.1016/j.oret.2022.02.015

Imaging and Informatics in Retinopathy of Prematurity Consortium. Federated Learning for Multicenter Collaboration in Ophthalmology: Improving Classification Performance in Retinopathy of Prematurity. Ophthalmology Retina. 2022 Aug;6(8):657-663. doi: 10.1016/j.oret.2022.02.015

@article{7b89564406304ff6be0e80c064e2baf2,

title = "Federated Learning for Multicenter Collaboration in Ophthalmology: Improving Classification Performance in Retinopathy of Prematurity",

abstract = "Objective: To compare the performance of deep learning classifiers for the diagnosis of plus disease in retinopathy of prematurity (ROP) trained using 2 methods for developing models on multi-institutional data sets: centralizing data versus federated learning (FL) in which no data leave each institution. Design: Evaluation of a diagnostic test or technology. Subjects: Deep learning models were trained, validated, and tested on 5255 wide-angle retinal images in the neonatal intensive care units of 7 institutions as part of the Imaging and Informatics in ROP study. All images were labeled for the presence of plus, preplus, or no plus disease with a clinical label and a reference standard diagnosis (RSD) determined by 3 image-based ROP graders and the clinical diagnosis. Methods: We compared the area under the receiver operating characteristic curve (AUROC) for models developed on multi-institutional data, using a central approach initially, followed by FL, and compared locally trained models with both approaches. We compared the model performance (κ) with the label agreement (between clinical and RSD), data set size, and number of plus disease cases in each training cohort using the Spearman correlation coefficient (CC). Main Outcome Measures: Model performance using AUROC and linearly weighted κ. Results: Four settings of experiment were used: FL trained on RSD against central trained on RSD, FL trained on clinical labels against central trained on clinical labels, FL trained on RSD against central trained on clinical labels, and FL trained on clinical labels against central trained on RSD (P = 0.046, P = 0.126, P = 0.224, and P = 0.0173, respectively). Four of the 7 (57%) models trained on local institutional data performed inferiorly to the FL models. 
The model performance for local models was positively correlated with the label agreement (between clinical and RSD labels, CC = 0.389, P = 0.387), total number of plus cases (CC = 0.759, P = 0.047), and overall training set size (CC = 0.924, P = 0.002). Conclusions: We found that a trained FL model performs comparably to a centralized model, confirming that FL may provide an effective, more feasible solution for interinstitutional learning. Smaller institutions benefit more from collaboration than larger institutions, showing the potential of FL for addressing disparities in resource access.",

keywords = "Deep learning, Epidemiology, Federated learning, Retinopathy of prematurity",

author = "{Imaging and Informatics in Retinopathy of Prematurity Consortium} and Charles Lu and Adam Hanif and Praveer Singh and Ken Chang and Coyner, {Aaron S.} and Brown, {James M.} and Susan Ostmo and Chan, {Robison V. Paul} and Daniel Rubin and Chiang, {Michael F.} and Campbell, {John Peter} and Jayashree Kalpathy-Cramer and Kim, {Sang Jin} and Kemal Sonmez and Robert Schelonka and Aaron Coyner and Chan, {R. V. Paul} and Karyn Jonas and Bhavana Kolli and Jason Horowitz and Osode Coki and Eccles, {Cheryl Ann} and Leora Sarna and Anton Orlin and Audina Berrocal and Catherin Negron and Kimberly Denser and Kristi Cumming and Tammy Osentoski and Tammy Check and Mary Zajechowski and Thomas Lee and Aaron Nagiel and Evan Kruger and Kathryn McGovern and Dilshad Contractor and Margaret Havunjian and Charles Simmons and Raghu Murthy and Sharon Galvis and Jerome Rotter and Ida Chen and Xiaohui Li and Kent Taylor and Kaye Roll and Hartnett, {Mary Elizabeth} and Leah Owen and Darius Moshfeghi and Mariana Nunez and Zac Wennber-Smith",

note = "Publisher Copyright: {\textcopyright} 2022 American Academy of Ophthalmology",

year = "2022",

month = aug,

doi = "10.1016/j.oret.2022.02.015",

language = "English (US)",

volume = "6",

pages = "657--663",

journal = "Ophthalmology Retina",

issn = "2468-7219",

publisher = "Elsevier Inc.",

number = "8",

}

TY - JOUR

T1 - Federated Learning for Multicenter Collaboration in Ophthalmology

T2 - Improving Classification Performance in Retinopathy of Prematurity

AU - Imaging and Informatics in Retinopathy of Prematurity Consortium

AU - Lu, Charles

AU - Hanif, Adam

AU - Singh, Praveer

AU - Chang, Ken

AU - Coyner, Aaron S.

AU - Brown, James M.

AU - Ostmo, Susan

AU - Chan, Robison V. Paul

AU - Rubin, Daniel

AU - Chiang, Michael F.

AU - Campbell, John Peter

AU - Kalpathy-Cramer, Jayashree

AU - Kim, Sang Jin

AU - Sonmez, Kemal

AU - Schelonka, Robert

AU - Coyner, Aaron

AU - Chan, R. V. Paul

AU - Jonas, Karyn

AU - Kolli, Bhavana

AU - Horowitz, Jason

AU - Coki, Osode

AU - Eccles, Cheryl Ann

AU - Sarna, Leora

AU - Orlin, Anton

AU - Berrocal, Audina

AU - Negron, Catherin

AU - Denser, Kimberly

AU - Cumming, Kristi

AU - Osentoski, Tammy

AU - Check, Tammy

AU - Zajechowski, Mary

AU - Lee, Thomas

AU - Nagiel, Aaron

AU - Kruger, Evan

AU - McGovern, Kathryn

AU - Contractor, Dilshad

AU - Havunjian, Margaret

AU - Simmons, Charles

AU - Murthy, Raghu

AU - Galvis, Sharon

AU - Rotter, Jerome

AU - Chen, Ida

AU - Li, Xiaohui

AU - Taylor, Kent

AU - Roll, Kaye

AU - Hartnett, Mary Elizabeth

AU - Owen, Leah

AU - Moshfeghi, Darius

AU - Nunez, Mariana

AU - Wennber-Smith, Zac

PY - 2022/8

Y1 - 2022/8

AB - Objective: To compare the performance of deep learning classifiers for the diagnosis of plus disease in retinopathy of prematurity (ROP) trained using 2 methods for developing models on multi-institutional data sets: centralizing data versus federated learning (FL) in which no data leave each institution. Design: Evaluation of a diagnostic test or technology. Subjects: Deep learning models were trained, validated, and tested on 5255 wide-angle retinal images in the neonatal intensive care units of 7 institutions as part of the Imaging and Informatics in ROP study. All images were labeled for the presence of plus, preplus, or no plus disease with a clinical label and a reference standard diagnosis (RSD) determined by 3 image-based ROP graders and the clinical diagnosis. Methods: We compared the area under the receiver operating characteristic curve (AUROC) for models developed on multi-institutional data, using a central approach initially, followed by FL, and compared locally trained models with both approaches. We compared the model performance (κ) with the label agreement (between clinical and RSD), data set size, and number of plus disease cases in each training cohort using the Spearman correlation coefficient (CC). Main Outcome Measures: Model performance using AUROC and linearly weighted κ. Results: Four settings of experiment were used: FL trained on RSD against central trained on RSD, FL trained on clinical labels against central trained on clinical labels, FL trained on RSD against central trained on clinical labels, and FL trained on clinical labels against central trained on RSD (P = 0.046, P = 0.126, P = 0.224, and P = 0.0173, respectively). Four of the 7 (57%) models trained on local institutional data performed inferiorly to the FL models. 
The model performance for local models was positively correlated with the label agreement (between clinical and RSD labels, CC = 0.389, P = 0.387), total number of plus cases (CC = 0.759, P = 0.047), and overall training set size (CC = 0.924, P = 0.002). Conclusions: We found that a trained FL model performs comparably to a centralized model, confirming that FL may provide an effective, more feasible solution for interinstitutional learning. Smaller institutions benefit more from collaboration than larger institutions, showing the potential of FL for addressing disparities in resource access.

KW - Deep learning

KW - Epidemiology

KW - Federated learning

KW - Retinopathy of prematurity

UR - http://www.scopus.com/inward/record.url?scp=85128201264&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85128201264&partnerID=8YFLogxK

U2 - 10.1016/j.oret.2022.02.015

DO - 10.1016/j.oret.2022.02.015

M3 - Article

C2 - 35296449

AN - SCOPUS:85128201264

SN - 2468-7219

VL - 6

SP - 657

EP - 663

JO - Ophthalmology Retina

JF - Ophthalmology Retina

IS - 8

ER -
