TY - JOUR
T1 - Federated Learning for Multicenter Collaboration in Ophthalmology
T2 - Implications for Clinical Diagnosis and Disease Epidemiology
AU - Imaging and Informatics in Retinopathy of Prematurity Consortium
AU - Hanif, Adam
AU - Lu, Charles
AU - Chang, Ken
AU - Singh, Praveer
AU - Coyner, Aaron S.
AU - Brown, James M.
AU - Ostmo, Susan
AU - Chan, Robison V. Paul
AU - Rubin, Daniel
AU - Chiang, Michael F.
AU - Kalpathy-Cramer, Jayashree
AU - Campbell, John Peter
AU - Kim, Sang Jin
AU - Sonmez, Kemal
AU - Schelonka, Robert
AU - Coyner, Aaron
AU - Chan, R. V. Paul
AU - Jonas, Karyn
AU - Kolli, Bhavana
AU - Horowitz, Jason
AU - Coki, Osode
AU - Eccles, Cheryl Ann
AU - Sarna, Leora
AU - Orlin, Anton
AU - Berrocal, Audina
AU - Negron, Catherin
AU - Denser, Kimberly
AU - Cumming, Kristi
AU - Osentoski, Tammy
AU - Check, Tammy
AU - Zajechowski, Mary
AU - Lee, Thomas
AU - Nagiel, Aaron
AU - Kruger, Evan
AU - McGovern, Kathryn
AU - Contractor, Dilshad
AU - Havunjian, Margaret
AU - Simmons, Charles
AU - Murthy, Raghu
AU - Galvis, Sharon
AU - Rotter, Jerome
AU - Chen, Ida
AU - Li, Xiaohui
AU - Taylor, Kent
AU - Roll, Kaye
AU - Hartnett, Mary Elizabeth
AU - Owen, Leah
AU - Moshfeghi, Darius
AU - Nunez, Mariana
AU - Wennber-Smith, Zac
N1 - Funding Information:
This work was supported by grants R01 EY19474, R01 EY031331, R21 EY031883, and P30 EY10572 from the National Institutes of Health (Bethesda, Maryland), and by unrestricted departmental funding and a Career Development Award (J.P.C.) from Research to Prevent Blindness. The sponsors or funding organizations had no role in the design or conduct of this research.
Publisher Copyright:
© 2022 American Academy of Ophthalmology
PY - 2022/8
Y1 - 2022/8
N2 - Objective: To utilize a deep learning (DL) model trained via federated learning (FL), a method of collaborative training without sharing patient data, to delineate institutional differences in clinician diagnostic paradigms and disease epidemiology in retinopathy of prematurity (ROP). Design: Evaluation of a diagnostic test or technology. Subjects and Controls: We included 5245 patients with wide-angle retinal imaging from the neonatal intensive care units of 7 institutions as part of the Imaging and Informatics in ROP study. Images were labeled with the clinical diagnoses of plus disease (plus, preplus, no plus), which were documented in the chart, and a reference standard diagnosis was determined by 3 image-based ROP graders and the clinical diagnosis. Methods: Demographics (birth weight, gestational age) and clinical diagnoses for all eye examinations were recorded from each institution. Using an FL approach, a DL model for plus disease classification was trained using only the clinical labels. The 3 class probabilities were then converted into a vascular severity score (VSS) for each eye examination, as well as an “institutional VSS,” in which the average of the VSS values assigned to patients’ higher severity (“worse”) eyes at each examination was calculated for each institution. Main Outcome Measures: We compared demographics, clinical diagnoses of plus disease, and institutional VSSs between institutions using the McNemar–Bowker test, 2-proportion Z test, and 1-way analysis of variance with post hoc analysis by the Tukey–Kramer test. Single regression analysis was performed to explore the relationship between demographics and VSSs. Results: We found that the proportion of patients diagnosed with preplus disease varied significantly between institutions (P < 0.001). Using the DL-derived VSS trained on the data from all institutions using FL, we observed differences in the institutional VSS and the level of vascular severity diagnosed as no plus (P < 0.001) across institutions. A significant, inverse relationship between the institutional VSS and mean gestational age was found (P = 0.049, adjusted R² = 0.49). Conclusions: A DL-derived ROP VSS developed without sharing data between institutions using FL identified differences in the clinical diagnoses of plus disease and overall levels of ROP severity between institutions. Federated learning may represent a method to standardize clinical diagnoses and provide objective measurements of disease for image-based diseases.
KW - Deep learning
KW - Epidemiology
KW - Federated learning
KW - Retinopathy of prematurity
UR - http://www.scopus.com/inward/record.url?scp=85128236250&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85128236250&partnerID=8YFLogxK
U2 - 10.1016/j.oret.2022.03.005
DO - 10.1016/j.oret.2022.03.005
M3 - Article
C2 - 35304305
AN - SCOPUS:85128236250
SN - 2468-7219
VL - 6
SP - 650
EP - 656
JO - Ophthalmology Retina
JF - Ophthalmology Retina
IS - 8
ER -