TY - JOUR
T1 - Dealing with inter-expert variability in retinopathy of prematurity
T2 - A machine learning approach
AU - Bolón-Canedo, V.
AU - Ataer-Cansizoglu, E.
AU - Erdogmus, D.
AU - Kalpathy-Cramer, J.
AU - Fontenla-Romero, O.
AU - Alonso-Betanzos, A.
AU - Chiang, M. F.
N1 - Funding Information:
This research was supported in part by the Secretaría de Estado de Investigación of the Spanish Government through research project TIN 2012-37954 and by the Xunta de Galicia through research project GRC 2014/035, both partially funded by FEDER funds of the European Union. It was also supported by grants IIS-1118061, IIS-1149570, SMA-0835976, and CNS-1136027 from the NSF; by grant R00LM009889 from the NLM/NIH; by grants EY19474 and 1R21EY022387-01A1 from the NIH; and by unrestricted departmental funding from Research to Prevent Blindness. V. Bolón-Canedo acknowledges the support of the Xunta de Galicia under postdoctoral grant code POS-A/2014/164. M.F. Chiang is an unpaid member of the Scientific Advisory Board for Clarity Medical Systems (Pleasanton, CA).
Publisher Copyright:
© 2015 Elsevier Ireland Ltd.
PY - 2015/10/1
Y1 - 2015/10/1
AB - Background and objective: Understanding the causes of disagreement among experts in clinical decision making has been a challenge for decades. In particular, there is high variability in the diagnosis of retinopathy of prematurity (ROP), a disease that affects low-birth-weight infants and is a major cause of childhood blindness. A possible cause of this variability, mostly neglected in the literature, is that different experts consider different sets of features important. In this paper we propose a methodology that uses machine learning techniques to understand the underlying causes of inter-expert variability. Methods: The experiments are carried out on a dataset of 34 retinal images, each with diagnoses provided by 22 independent experts. Feature selection techniques are applied to discover the most important features considered by a given expert. The features selected for each expert are then compared with those selected for the other experts by applying similarity measures. Finally, an automated diagnosis system is built to check whether this approach helps in understanding high inter-rater variability. Results: The experimental results reveal that some features are selected by the feature selection methods most of the time, regardless of the expert considered. Moreover, for pairs of experts with high percentage agreement, the feature selection algorithms also select similar features. Using the relevant selected features improved or maintained the classification performance of the automatic system. Conclusions: The proposed methodology provides a handy framework to identify the features that are important to experts and to check whether the selected features reflect the pairwise agreements and disagreements. These findings may lead to improved diagnostic accuracy and standardization among clinicians, and pave the way for applying this methodology to other problems that present inter-expert variability.
KW - Classification
KW - Clinical decision-making
KW - Feature selection
KW - Inter-expert variability
KW - Machine learning
KW - Retinopathy of prematurity
UR - http://www.scopus.com/inward/record.url?scp=84939574276&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84939574276&partnerID=8YFLogxK
U2 - 10.1016/j.cmpb.2015.06.004
DO - 10.1016/j.cmpb.2015.06.004
M3 - Article
C2 - 26120072
AN - SCOPUS:84939574276
SN - 0169-2607
VL - 122
SP - 1
EP - 15
JO - Computer Methods and Programs in Biomedicine
JF - Computer Methods and Programs in Biomedicine
IS - 1
ER -