TY - JOUR
T1 - Federated Learning for Multicenter Collaboration in Ophthalmology
T2 - Improving Classification Performance in Retinopathy of Prematurity
AU - Imaging and Informatics in Retinopathy of Prematurity Consortium
AU - Lu, Charles
AU - Hanif, Adam
AU - Singh, Praveer
AU - Chang, Ken
AU - Coyner, Aaron S.
AU - Brown, James M.
AU - Ostmo, Susan
AU - Chan, Robison V. Paul
AU - Rubin, Daniel
AU - Chiang, Michael F.
AU - Campbell, John Peter
AU - Kalpathy-Cramer, Jayashree
AU - Kim, Sang Jin
AU - Sonmez, Kemal
AU - Schelonka, Robert
AU - Coyner, Aaron
AU - Chan, R. V. Paul
AU - Jonas, Karyn
AU - Kolli, Bhavana
AU - Horowitz, Jason
AU - Coki, Osode
AU - Eccles, Cheryl Ann
AU - Sarna, Leora
AU - Orlin, Anton
AU - Berrocal, Audina
AU - Negron, Catherin
AU - Denser, Kimberly
AU - Cumming, Kristi
AU - Osentoski, Tammy
AU - Check, Tammy
AU - Zajechowski, Mary
AU - Lee, Thomas
AU - Nagiel, Aaron
AU - Kruger, Evan
AU - McGovern, Kathryn
AU - Contractor, Dilshad
AU - Havunjian, Margaret
AU - Simmons, Charles
AU - Murthy, Raghu
AU - Galvis, Sharon
AU - Rotter, Jerome
AU - Chen, Ida
AU - Li, Xiaohui
AU - Taylor, Kent
AU - Roll, Kaye
AU - Hartnett, Mary Elizabeth
AU - Owen, Leah
AU - Moshfeghi, Darius
AU - Nunez, Mariana
AU - Wennber-Smith, Zac
N1 - Publisher Copyright: © 2022 American Academy of Ophthalmology
PY - 2022/8
Y1 - 2022/8
N2 - Objective: To compare the performance of deep learning classifiers for the diagnosis of plus disease in retinopathy of prematurity (ROP) trained using 2 methods for developing models on multi-institutional data sets: centralizing data versus federated learning (FL) in which no data leave each institution. Design: Evaluation of a diagnostic test or technology. Subjects: Deep learning models were trained, validated, and tested on 5255 wide-angle retinal images in the neonatal intensive care units of 7 institutions as part of the Imaging and Informatics in ROP study. All images were labeled for the presence of plus, preplus, or no plus disease with a clinical label and a reference standard diagnosis (RSD) determined by 3 image-based ROP graders and the clinical diagnosis. Methods: We compared the area under the receiver operating characteristic curve (AUROC) for models developed on multi-institutional data, using a central approach initially, followed by FL, and compared locally trained models with both approaches. We compared the model performance (κ) with the label agreement (between clinical and RSD), data set size, and number of plus disease cases in each training cohort using the Spearman correlation coefficient (CC). Main Outcome Measures: Model performance using AUROC and linearly weighted κ. Results: Four settings of experiment were used: FL trained on RSD against central trained on RSD, FL trained on clinical labels against central trained on clinical labels, FL trained on RSD against central trained on clinical labels, and FL trained on clinical labels against central trained on RSD (P = 0.046, P = 0.126, P = 0.224, and P = 0.0173, respectively). Four of the 7 (57%) models trained on local institutional data performed inferiorly to the FL models. The model performance for local models was positively correlated with the label agreement (between clinical and RSD labels, CC = 0.389, P = 0.387), total number of plus cases (CC = 0.759, P = 0.047), and overall training set size (CC = 0.924, P = 0.002). Conclusions: We found that a trained FL model performs comparably to a centralized model, confirming that FL may provide an effective, more feasible solution for interinstitutional learning. Smaller institutions benefit more from collaboration than larger institutions, showing the potential of FL for addressing disparities in resource access.
KW - Deep learning
KW - Epidemiology
KW - Federated learning
KW - Retinopathy of prematurity
UR - http://www.scopus.com/inward/record.url?scp=85128201264&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85128201264&partnerID=8YFLogxK
U2 - 10.1016/j.oret.2022.02.015
DO - 10.1016/j.oret.2022.02.015
M3 - Article
C2 - 35296449
AN - SCOPUS:85128201264
SN - 2468-7219
VL - 6
SP - 657
EP - 663
JO - Ophthalmology Retina
JF - Ophthalmology Retina
IS - 8
ER -