Motivation: High-throughput reverse-phase protein array (RPPA) technology allows the parallel measurement of protein expression levels in approximately 1000 samples. However, the many steps in the complex protocol (sample lysate preparation, slide printing, hybridization, washing and amplified detection) can introduce substantial variability in data quality. We are not aware of any existing quality control algorithm tuned to the special characteristics of RPPAs.

Results: We have developed a novel classifier for quality control of RPPA experiments using a generalized linear model and logistic function. The output of the classifier, ranging from 0 to 1, is defined as the probability that a slide is of good quality. After training, we tested the classifier on two independent validation datasets. We conclude that the classifier distinguishes RPPA slides of good quality from those of poor quality well enough that normalization schemes, protein expression patterns and advanced biological analyses will not be drastically affected by erroneous measurements or systematic variations.

Availability and implementation: The classifier, implemented in the "SuperCurve" R package, can be freely downloaded at http://bioinformatics.mdanderson.org/main/OOMPA:Overview or http://r-forge.r-project.org/projects/supercurve/. The data used to develop and validate the classifier are available at http://bioinformatics.mdanderson.org/MOAR.
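The classifier is described as a generalized linear model whose linear predictor is passed through the logistic function, so its score can be read as P(good slide) = 1 / (1 + exp(-(b0 + b1*x1 + ... + bp*xp))), which always lies between 0 and 1. The following is a minimal Python sketch of that idea under stated assumptions. The names `quality_probability` and `fit_logistic`, the two slide-quality features and the toy data are all hypothetical. The fit uses plain gradient ascent on the Bernoulli log-likelihood rather than the iteratively reweighted least squares usually used for GLMs. This is not the SuperCurve implementation, and the abstract does not say which slide features that package actually uses.

```python
import math
from typing import Sequence


def logistic(eta: float) -> float:
    """Numerically stable logistic function mapping any real number into [0, 1]."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def quality_probability(features: Sequence[float],
                        coefs: Sequence[float],
                        intercept: float) -> float:
    """Hypothetical scorer: P(good) = logistic(intercept + sum_j coef_j * feature_j)."""
    if len(features) != len(coefs):
        raise ValueError("features and coefs must have the same length")
    eta = intercept + sum(b * x for b, x in zip(coefs, features))
    return logistic(eta)


def fit_logistic(X: list[list[float]], y: list[int],
                 lr: float = 0.1, n_iter: int = 2000) -> tuple[list[float], float]:
    """Fit intercept and coefficients by batch gradient ascent on the log-likelihood.

    y[i] is 1 for a slide labelled good and 0 for one labelled poor.
    """
    n, p = len(X), len(X[0])
    coefs, intercept = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_0 = 0.0
        grad_b = [0.0] * p
        for xi, yi in zip(X, y):
            # Residual between the label and the current predicted probability.
            resid = yi - quality_probability(xi, coefs, intercept)
            grad_0 += resid
            for j in range(p):
                grad_b[j] += resid * xi[j]
        intercept += lr * grad_0 / n
        coefs = [b + lr * g / n for b, g in zip(coefs, grad_b)]
    return coefs, intercept


if __name__ == "__main__":
    # Synthetic toy data (NOT from the paper): two hypothetical per-slide
    # features where larger values indicate worse quality, e.g. a spatial-bias
    # score and a curve-fit residual score.
    X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3],
         [0.9, 0.8], [0.8, 1.0], [1.0, 0.9]]
    y = [1, 1, 1, 0, 0, 0]
    coefs, b0 = fit_logistic(X, y)
    for slide in ([0.15, 0.15], [0.95, 0.95]):
        print(slide, round(quality_probability(slide, coefs, b0), 3))
```

Gradient ascent only keeps the sketch dependency-free. In practice, a standard GLM fit such as R's `glm(..., family = binomial)` is the usual way to estimate the coefficients, and a cutoff on the resulting probability (for example 0.5, again an assumption) turns the score into a good/poor call.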