Protein database and quantitative analysis considerations when integrating genetics and proteomics to compare mouse strains

Research output: Contribution to journal (Article)

16 Citations (Scopus)

Abstract

Decades of genetics research comparing mouse strains has identified many regions of the genome associated with quantitative traits. Microarrays have been used to identify which genes in those regions are differentially expressed and are therefore potentially causal; however, genetic variants that affect probe hybridization lead to many false conclusions. Here we used spectral counting to compare brain striata between two mouse strains. Using strain-specific protein databases, we concluded that proteomics was more robust to sequence differences than microarrays; however, some proteins were still significantly affected. To generate strain-specific databases, we used a complete database that contained all of the putative genetic isoforms for each protein. While the increased proteome coverage in the databases led to a 6.8% gain in peptide assignments compared to a nonredundant database, it also necessitated the development of a strategy for grouping similar proteins due to a large number of shared peptides. Of the 4563 identified proteins (2.1% FDR), there were 1807 quantifiable proteins/groups that exceeded minimum count cutoffs. With four pooled biological replicates per strain, we used quantile normalization, ComBat (a package that adjusts for batch effects), and edgeR (a package for differential expression analysis of count data) to identify 101 differentially expressed proteins/groups, 84 of which had a coding region within one of the genomic regions of interest identified by the Portland Alcohol Research Center.
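
The quantile normalization step named in the abstract can be illustrated with a small sketch. This is not the authors' pipeline: the function name `quantile_normalize`, the toy spectral counts, and the choice to break ties by position (rather than averaging tied ranks) are all assumptions made for illustration only.

```python
from statistics import mean


def quantile_normalize(columns):
    """Quantile-normalize samples so they share one count distribution.

    `columns` is a list of equal-length lists, one list per sample
    (hypothetical layout; each position is one protein/group).
    """
    n = len(columns[0])
    # Sort each sample independently, then average across samples at each
    # rank to build the shared reference distribution.
    sorted_cols = [sorted(col) for col in columns]
    reference = [mean(col[i] for col in sorted_cols) for i in range(n)]

    normalized = []
    for col in columns:
        # Indices of this sample's values from smallest to largest.
        # sorted() is stable, so ties keep their original order.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            # Replace each value with the reference value at its rank.
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


# Toy spectral counts (invented): three samples, four proteins each.
toy_samples = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 4.0, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
normalized_samples = quantile_normalize(toy_samples)
```

After this step every sample contains the same set of values, so remaining differences between samples reflect rank order rather than overall count depth. Batch adjustment and differential expression testing (ComBat and edgeR in the abstract) are separate steps and are not sketched here.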

Original language: English (US)
Pages (from-to): 2905-2912
Number of pages: 8
Journal: Journal of Proteome Research
Volume: 10
Issue number: 7
DOIs: 10.1021/pr200133p
State: Published - Jul 1 2011

Fingerprint

Protein Databases
Proteomics
Databases
Chemical analysis
Proteins
Microarrays
Genes
Genetic Research
Peptides
Proteome
Protein Isoforms
Alcohols
Genetics
Genome
Brain
Research

ASJC Scopus subject areas

  • Biochemistry
  • Chemistry (all)

Cite this

@article{999cca6ea6ba482e96108b03cde41fba,
title = "Protein database and quantitative analysis considerations when integrating genetics and proteomics to compare mouse strains",
abstract = "Decades of genetics research comparing mouse strains has identified many regions of the genome associated with quantitative traits. Microarrays have been used to identify which genes in those regions are differentially expressed and are therefore potentially causal; however, genetic variants that affect probe hybridization lead to many false conclusions. Here we used spectral counting to compare brain striata between two mouse strains. Using strain-specific protein databases, we concluded that proteomics was more robust to sequence differences than microarrays; however, some proteins were still significantly affected. To generate strain-specific databases, we used a complete database that contained all of the putative genetic isoforms for each protein. While the increased proteome coverage in the databases led to a 6.8{\%} gain in peptide assignments compared to a nonredundant database, it also necessitated the development of a strategy for grouping similar proteins due to a large number of shared peptides. Of the 4563 identified proteins (2.1{\%} FDR), there were 1807 quantifiable proteins/groups that exceeded minimum count cutoffs. With four pooled biological replicates per strain, we used quantile normalization, ComBat (a package that adjusts for batch effects), and edgeR (a package for differential expression analysis of count data) to identify 101 differentially expressed proteins/groups, 84 of which had a coding region within one of the genomic regions of interest identified by the Portland Alcohol Research Center.",
author = "Fei, {Suzanne S.} and Phillip Wilmarth and Robert Hitzemann and Shannon McWeeney and John Belknap and Larry David",
year = "2011",
month = "7",
day = "1",
doi = "10.1021/pr200133p",
language = "English (US)",
volume = "10",
pages = "2905--2912",
journal = "Journal of Proteome Research",
issn = "1535-3893",
publisher = "American Chemical Society",
number = "7",

}

TY - JOUR

T1 - Protein database and quantitative analysis considerations when integrating genetics and proteomics to compare mouse strains

AU - Fei, Suzanne S.

AU - Wilmarth, Phillip

AU - Hitzemann, Robert

AU - McWeeney, Shannon

AU - Belknap, John

AU - David, Larry

PY - 2011/7/1

Y1 - 2011/7/1

N2 - Decades of genetics research comparing mouse strains has identified many regions of the genome associated with quantitative traits. Microarrays have been used to identify which genes in those regions are differentially expressed and are therefore potentially causal; however, genetic variants that affect probe hybridization lead to many false conclusions. Here we used spectral counting to compare brain striata between two mouse strains. Using strain-specific protein databases, we concluded that proteomics was more robust to sequence differences than microarrays; however, some proteins were still significantly affected. To generate strain-specific databases, we used a complete database that contained all of the putative genetic isoforms for each protein. While the increased proteome coverage in the databases led to a 6.8% gain in peptide assignments compared to a nonredundant database, it also necessitated the development of a strategy for grouping similar proteins due to a large number of shared peptides. Of the 4563 identified proteins (2.1% FDR), there were 1807 quantifiable proteins/groups that exceeded minimum count cutoffs. With four pooled biological replicates per strain, we used quantile normalization, ComBat (a package that adjusts for batch effects), and edgeR (a package for differential expression analysis of count data) to identify 101 differentially expressed proteins/groups, 84 of which had a coding region within one of the genomic regions of interest identified by the Portland Alcohol Research Center.

UR - http://www.scopus.com/inward/record.url?scp=79959988142&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79959988142&partnerID=8YFLogxK

U2 - 10.1021/pr200133p

DO - 10.1021/pr200133p

M3 - Article

C2 - 21553863

AN - SCOPUS:79959988142

VL - 10

SP - 2905

EP - 2912

JO - Journal of Proteome Research

JF - Journal of Proteome Research

SN - 1535-3893

IS - 7

ER -