Diagnosing myelodysplastic syndromes (MDS) depends on a high clinical index of suspicion to prompt bone marrow studies and on subjective assessment of dysplastic morphology. We sought to determine whether data collected by automated hematology analyzers during complete blood count (CBC) analysis could help identify MDS in a routine clinical setting. We collected CBC parameters (including research-use-only parameters and cell population data) and demographic information from a large (>5,000 patients), unselected, sequential cohort of outpatients. The cohort was divided into independent training and test sets to develop and validate a random forest classifier for identifying MDS. The classifier identified MDS effectively, with an area under the receiver operating characteristic curve (AUC) of 0.942. Platelet distribution width and the standard deviation of the red blood cell distribution width were the most discriminating variables in the classifier. A similar classifier was also validated on an independent set of >200 patients from a second institution, with an AUC of 0.93. This retrospective study demonstrates the feasibility of identifying MDS in an unselected outpatient population, using data routinely collected during CBC analysis, with a classifier validated on two independent data sets from different institutions.
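The AUC values reported above summarize how well a classifier's scores rank MDS cases above non-MDS cases. As a minimal sketch of that metric (not the authors' code), the AUC can be computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half. The function name `roc_auc` and the example labels and scores below are hypothetical and are not data from the study.

```python
from itertools import product


def roc_auc(labels, scores):
    """Return the ROC AUC for binary labels (1 = positive, 0 = negative).

    Computed as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; tied pairs count as 0.5. This equals
    the Mann-Whitney U statistic divided by n_pos * n_neg.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier outputs (1 = MDS, 0 = not MDS); illustrative only.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1]
print(roc_auc(labels, scores))
```

Here 11 of the 12 positive-negative pairs are ranked correctly, so the AUC is 11/12. The pairwise loop is quadratic in cohort size; for cohorts of thousands of patients, a rank-based implementation such as scikit-learn's `roc_auc_score` is the more practical choice.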