Global rank-invariant set normalization (GRSN) to reduce systematic distortions in microarray data

Carl R. Pelz; Molly Kulesz-Martin; Grover Bagby; Rosalie C. Sears

doi:10.1186/1471-2105-9-520

Global rank-invariant set normalization (GRSN) to reduce systematic distortions in microarray data

Carl R. Pelz, Molly Kulesz-Martin, Grover Bagby, Rosalie C. Sears

Molecular and Medical Genetics

Research output: Contribution to journal › Article › peer-review

41 Scopus citations

Abstract

Background: Microarray technology has become very popular for globally evaluating gene expression in biological samples. However, non-linear variation associated with the technology can make data interpretation unreliable. Therefore, methods to correct this kind of technical variation are critical. Here we consider a method to reduce this type of variation applied after three common procedures for processing microarray data: MAS 5.0, RMA, and dChip®. Results: We commonly observe intensity-dependent technical variation between samples in a single microarray experiment. This is most common when MAS 5.0 is used to process probe level data, but we also see this type of technical variation with RMA and dChip® processed data. Datasets with unbalanced numbers of up and down regulated genes seem to be particularly susceptible to this type of intensity-dependent technical variation. Unbalanced gene regulation is common when studying cancer samples or genetically manipulated animal models and preservation of this biologically relevant information, while removing technical variation has not been well addressed in the literature. We propose a method based on using rank-invariant, endogenous transcripts as reference points for normalization (GRSN). While the use of rank-invariant transcripts has been described previously, we have added to this concept by the creation of a global rank-invariant set of transcripts used to generate a robust average reference that is used to normalize all samples within a dataset. The global rank-invariant set is selected in an iterative manner so as to preserve unbalanced gene expression. Moreover, our method works well as an overlay that can be applied to data already processed with other probe set summary methods. We demonstrate that this additional normalization step at the "probe set level" effectively corrects a specific type of technical variation that often distorts samples in datasets. Conclusion: We have developed a simple post-processing tool to help detect and correct non-linear technical variation in microarray data and demonstrate how it can reduce technical variation and improve the results of downstream statistical gene selection and pathway identification methods.

Original language	English (US)
Article number	520
Journal	BMC bioinformatics
Volume	9
DOIs	https://doi.org/10.1186/1471-2105-9-520
State	Published - Dec 4 2008

ASJC Scopus subject areas

Structural Biology
Biochemistry
Molecular Biology
Computer Science Applications
Applied Mathematics

Access to Document

10.1186/1471-2105-9-520

Cite this

@article{7455fe806f7248f1aef68ad766df809f,

title = "Global rank-invariant set normalization (GRSN) to reduce systematic distortions in microarray data",

abstract = "Background: Microarray technology has become very popular for globally evaluating gene expression in biological samples. However, non-linear variation associated with the technology can make data interpretation unreliable. Therefore, methods to correct this kind of technical variation are critical. Here we consider a method to reduce this type of variation applied after three common procedures for processing microarray data: MAS 5.0, RMA, and dChip{\textregistered}. Results: We commonly observe intensity-dependent technical variation between samples in a single microarray experiment. This is most common when MAS 5.0 is used to process probe level data, but we also see this type of technical variation with RMA and dChip{\textregistered} processed data. Datasets with unbalanced numbers of up and down regulated genes seem to be particularly susceptible to this type of intensity-dependent technical variation. Unbalanced gene regulation is common when studying cancer samples or genetically manipulated animal models and preservation of this biologically relevant information, while removing technical variation has not been well addressed in the literature. We propose a method based on using rank-invariant, endogenous transcripts as reference points for normalization (GRSN). While the use of rank-invariant transcripts has been described previously, we have added to this concept by the creation of a global rank-invariant set of transcripts used to generate a robust average reference that is used to normalize all samples within a dataset. The global rank-invariant set is selected in an iterative manner so as to preserve unbalanced gene expression. Moreover, our method works well as an overlay that can be applied to data already processed with other probe set summary methods. We demonstrate that this additional normalization step at the {"}probe set level{"} effectively corrects a specific type of technical variation that often distorts samples in datasets. Conclusion: We have developed a simple post-processing tool to help detect and correct non-linear technical variation in microarray data and demonstrate how it can reduce technical variation and improve the results of downstream statistical gene selection and pathway identification methods.",

author = "Pelz, {Carl R.} and Molly Kulesz-Martin and Grover Bagby and Sears, {Rosalie C.}",

note = "Funding Information: OHSU Affymetrix{\textregistered} Shared Resource processed the RNA samples from the MKM, GB, SS and RS datasets. This work was supported by NIH/NCI Grants K01 CA086957 and R01 CA100855 to R. C. S.; NIH PHS Grants CA98893 and CA106195 to M. K. .M.; NIH Grants P01 HL48546, R01 HL72321, and P30 CA69533 to G. B., the Department of Veterans Affairs Merit Review Grant (G. B.), and the FA Transcriptome Consortium Grant from the Fanconi Anemia Research Fund (G. B.). We are grateful to the patients and families who agreed to provide bone marrow samples for the FA study and to the other physician scientist members of the consortium who obtained and isolated RNA from bone marrow samples derived from children and adults with Fanconi Anemia (Ricardo Pasquini, John Wagner, and Richard Harris). R. C. S. received additional support for this work from the Oregon Clinical and Translational Research Institute (OCTRI), grant number UL1 RR024140 from the National Center for Research Resources (NCRR), a component of the National Institutes of Health (NIH), and NIH Roadmap for Medical Research.",

year = "2008",

month = dec,

day = "4",

doi = "10.1186/1471-2105-9-520",

language = "English (US)",

volume = "9",

journal = "BMC bioinformatics",

issn = "1471-2105",

publisher = "BioMed Central",

}

TY - JOUR

T1 - Global rank-invariant set normalization (GRSN) to reduce systematic distortions in microarray data

AU - Pelz, Carl R.

AU - Kulesz-Martin, Molly

AU - Bagby, Grover

AU - Sears, Rosalie C.

N1 - Funding Information: OHSU Affymetrix® Shared Resource processed the RNA samples from the MKM, GB, SS and RS datasets. This work was supported by NIH/NCI Grants K01 CA086957 and R01 CA100855 to R. C. S.; NIH PHS Grants CA98893 and CA106195 to M. K. .M.; NIH Grants P01 HL48546, R01 HL72321, and P30 CA69533 to G. B., the Department of Veterans Affairs Merit Review Grant (G. B.), and the FA Transcriptome Consortium Grant from the Fanconi Anemia Research Fund (G. B.). We are grateful to the patients and families who agreed to provide bone marrow samples for the FA study and to the other physician scientist members of the consortium who obtained and isolated RNA from bone marrow samples derived from children and adults with Fanconi Anemia (Ricardo Pasquini, John Wagner, and Richard Harris). R. C. S. received additional support for this work from the Oregon Clinical and Translational Research Institute (OCTRI), grant number UL1 RR024140 from the National Center for Research Resources (NCRR), a component of the National Institutes of Health (NIH), and NIH Roadmap for Medical Research.

PY - 2008/12/4

Y1 - 2008/12/4

N2 - Background: Microarray technology has become very popular for globally evaluating gene expression in biological samples. However, non-linear variation associated with the technology can make data interpretation unreliable. Therefore, methods to correct this kind of technical variation are critical. Here we consider a method to reduce this type of variation applied after three common procedures for processing microarray data: MAS 5.0, RMA, and dChip®. Results: We commonly observe intensity-dependent technical variation between samples in a single microarray experiment. This is most common when MAS 5.0 is used to process probe level data, but we also see this type of technical variation with RMA and dChip® processed data. Datasets with unbalanced numbers of up and down regulated genes seem to be particularly susceptible to this type of intensity-dependent technical variation. Unbalanced gene regulation is common when studying cancer samples or genetically manipulated animal models and preservation of this biologically relevant information, while removing technical variation has not been well addressed in the literature. We propose a method based on using rank-invariant, endogenous transcripts as reference points for normalization (GRSN). While the use of rank-invariant transcripts has been described previously, we have added to this concept by the creation of a global rank-invariant set of transcripts used to generate a robust average reference that is used to normalize all samples within a dataset. The global rank-invariant set is selected in an iterative manner so as to preserve unbalanced gene expression. Moreover, our method works well as an overlay that can be applied to data already processed with other probe set summary methods. We demonstrate that this additional normalization step at the "probe set level" effectively corrects a specific type of technical variation that often distorts samples in datasets. Conclusion: We have developed a simple post-processing tool to help detect and correct non-linear technical variation in microarray data and demonstrate how it can reduce technical variation and improve the results of downstream statistical gene selection and pathway identification methods.

AB - Background: Microarray technology has become very popular for globally evaluating gene expression in biological samples. However, non-linear variation associated with the technology can make data interpretation unreliable. Therefore, methods to correct this kind of technical variation are critical. Here we consider a method to reduce this type of variation applied after three common procedures for processing microarray data: MAS 5.0, RMA, and dChip®. Results: We commonly observe intensity-dependent technical variation between samples in a single microarray experiment. This is most common when MAS 5.0 is used to process probe level data, but we also see this type of technical variation with RMA and dChip® processed data. Datasets with unbalanced numbers of up and down regulated genes seem to be particularly susceptible to this type of intensity-dependent technical variation. Unbalanced gene regulation is common when studying cancer samples or genetically manipulated animal models and preservation of this biologically relevant information, while removing technical variation has not been well addressed in the literature. We propose a method based on using rank-invariant, endogenous transcripts as reference points for normalization (GRSN). While the use of rank-invariant transcripts has been described previously, we have added to this concept by the creation of a global rank-invariant set of transcripts used to generate a robust average reference that is used to normalize all samples within a dataset. The global rank-invariant set is selected in an iterative manner so as to preserve unbalanced gene expression. Moreover, our method works well as an overlay that can be applied to data already processed with other probe set summary methods. We demonstrate that this additional normalization step at the "probe set level" effectively corrects a specific type of technical variation that often distorts samples in datasets. Conclusion: We have developed a simple post-processing tool to help detect and correct non-linear technical variation in microarray data and demonstrate how it can reduce technical variation and improve the results of downstream statistical gene selection and pathway identification methods.

UR - http://www.scopus.com/inward/record.url?scp=60849121405&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=60849121405&partnerID=8YFLogxK

U2 - 10.1186/1471-2105-9-520

DO - 10.1186/1471-2105-9-520

M3 - Article

C2 - 19055840

AN - SCOPUS:60849121405

SN - 1471-2105

VL - 9

JO - BMC bioinformatics

JF - BMC bioinformatics

M1 - 520

ER -

Global rank-invariant set normalization (GRSN) to reduce systematic distortions in microarray data

Abstract

ASJC Scopus subject areas

Access to Document

Other files and links

Fingerprint

Cite this