Convolutional neural networks (CNNs) have shown strong performance in medical diagnostic applications. However, because of their black-box nature, clinicians are reluctant to trust CNN diagnostic outcomes. Incorporating visual attention into CNNs enhances interpretability by highlighting the image regions the network uses for prediction. Clinicians can often provide domain knowledge about relevant features: for example, structural information such as vessel tortuosity aids clinicians in diagnosing retinopathy of prematurity (ROP). We propose Structural Visual Guidance Attention Networks (SVGA-Net), a method that leverages structural domain knowledge to guide visual attention in CNNs. Experiments on a dataset of 5,512 posterior retinal images, captured with a wide-angle fundus camera, show that SVGA-Net achieves AUCs of 0.987 and 0.979 for predicting the plus and normal categories, respectively. SVGA-Net consistently yields higher AUC than visual attention CNNs without guidance, baseline CNNs, and CNNs with structured masks.
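The core idea of structurally guided attention can be illustrated with a minimal NumPy sketch. This is an assumption-laden illustration, not the authors' SVGA-Net implementation: it shows one plausible way to compute a spatial attention map from CNN feature activations and to penalize its deviation from a structural mask (e.g., a vessel map). The function name `guided_attention`, the softmax pooling, and the MSE-style guidance loss are all illustrative choices.

```python
import numpy as np

def softmax2d(x):
    # Numerically stable softmax over all spatial positions.
    e = np.exp(x - x.max())
    return e / e.sum()

def guided_attention(features, struct_mask):
    """Hypothetical sketch of structure-guided spatial attention.

    features:    array of shape (C, H, W) -- CNN feature activations
    struct_mask: array of shape (H, W)    -- e.g., a vessel-structure map
    Returns the attended features and a guidance loss term that a
    training loop could add to the classification loss.
    """
    # Spatial attention map: softmax over channel-summed activations.
    att = softmax2d(features.sum(axis=0))               # shape (H, W)
    # Normalize the structural mask into a spatial distribution.
    mask = struct_mask / (struct_mask.sum() + 1e-8)
    # Guidance loss pulls the attention map toward the structural mask.
    guidance_loss = np.square(att - mask).mean()
    # Reweight every feature channel by the attention map.
    attended = features * att[None, :, :]
    return attended, guidance_loss

# Toy example: 8 channels over a 4x4 spatial grid, mask on the center.
feats = np.random.rand(8, 4, 4)
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0
attended, loss = guided_attention(feats, mask)
```

In a full training setup, `guidance_loss` would be weighted and added to the prediction loss, so that gradients steer the attention map toward clinically relevant structures rather than arbitrary image regions.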