TY - GEN
T1 - Structural visual guidance attention networks in retinopathy of prematurity
AU - Yildiz, V.
AU - Ioannidis, S.
AU - Yildiz, I.
AU - Tian, P.
AU - Campbell, J. P.
AU - Ostmo, S.
AU - Kalpathy-Cramer, J.
AU - Chiang, M. F.
AU - Erdogmus, D.
AU - Dy, J.
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021/4/13
Y1 - 2021/4/13
N2 - Convolutional neural networks (CNNs) have shown great performance in medical diagnostic applications. However, because of their black-box nature, clinicians are reluctant to trust CNN diagnostic outcomes. Incorporating visual attention capabilities in CNNs enhances interpretability by highlighting the regions in the images that CNNs utilize for prediction. Clinicians can often provide domain knowledge on relevant features: e.g., in diagnosing retinopathy of prematurity (ROP), structural information such as the tortuosity of vessels aids clinicians. We propose Structural Visual Guidance Attention Networks (SVGA-Net), a method that leverages structural domain knowledge to guide visual attention in CNNs. Experiments on a dataset of 5512 posterior retinal images, taken with a wide-angle fundus camera, show that SVGA-Net achieves 0.987 and 0.979 AUC in predicting the plus and normal categories, respectively. SVGA-Net consistently achieves higher AUC than visual attention CNNs without guidance, baseline CNNs, and CNNs with structured masks.
AB - Convolutional neural networks (CNNs) have shown great performance in medical diagnostic applications. However, because of their black-box nature, clinicians are reluctant to trust CNN diagnostic outcomes. Incorporating visual attention capabilities in CNNs enhances interpretability by highlighting the regions in the images that CNNs utilize for prediction. Clinicians can often provide domain knowledge on relevant features: e.g., in diagnosing retinopathy of prematurity (ROP), structural information such as the tortuosity of vessels aids clinicians. We propose Structural Visual Guidance Attention Networks (SVGA-Net), a method that leverages structural domain knowledge to guide visual attention in CNNs. Experiments on a dataset of 5512 posterior retinal images, taken with a wide-angle fundus camera, show that SVGA-Net achieves 0.987 and 0.979 AUC in predicting the plus and normal categories, respectively. SVGA-Net consistently achieves higher AUC than visual attention CNNs without guidance, baseline CNNs, and CNNs with structured masks.
KW - Attention
KW - CNN
KW - Interpretability
KW - ROP
UR - http://www.scopus.com/inward/record.url?scp=85107218507&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85107218507&partnerID=8YFLogxK
U2 - 10.1109/ISBI48211.2021.9433881
DO - 10.1109/ISBI48211.2021.9433881
M3 - Conference contribution
AN - SCOPUS:85107218507
T3 - Proceedings - International Symposium on Biomedical Imaging
SP - 353
EP - 357
BT - 2021 IEEE 18th International Symposium on Biomedical Imaging, ISBI 2021
PB - IEEE Computer Society
T2 - 18th IEEE International Symposium on Biomedical Imaging, ISBI 2021
Y2 - 13 April 2021 through 16 April 2021
ER -