In this paper, we focus on improving retrieval performance, especially early precision, in the task of solving medical multimodal queries. The queries we deal with consist of a visual component, given as a set of image-examples, and textual annotation, provided as a set of words. The queries' semantic content can be classified along three domain dimensions: anatomy, pathology, and modality. To solve these queries, we interpret their semantic content using both textual and visual data. Medical images often are accompanied by textual annotations, which in turn typically include explicit mention of their image's anatomy or pathology. Annotations rarely include explicit mention of image modality, however. To address this, we use an image's visual features to identify its modality. Our system thereby performs image retrieval by combining purely visual information about an image with information derived from its textual annotations. In order to experimentally evaluate our approach, we performed a set of experiments using the 2009 ImageCLEFmed collection using our integrated system as well as a purely textual retrieval system. Our integrated approach consistently outperformed our text-only system by 43% in MAP and by 71% in precision within the top 5 retrieved documents. We conclude that this improved performance is due to our method of combining visual and textual features.