High-Throughput Identification of Proteins and Unanticipated Sequence Modifications Using a Mass-Based Alignment Algorithm for MS/MS de Novo Sequencing Results

Brian O. Searle; Surendra Dasari; Mark Turner; Ashok P. Reddy; Dongseok Choi; Phillip A. Wilmarth; Ashley L. McCormack; Larry L. David; Srinivasa B. Nagalla

doi:10.1021/ac035258x

Research output: Contribution to journal › Article › peer-review

130 Scopus citations

Abstract

With the increasing availability of de novo sequencing algorithms for interpreting high-mass accuracy tandem mass spectrometry (MS/MS) data, there is a growing need for programs that accurately identify proteins from de novo sequencing results. De novo sequences derived from tandem mass spectra of peptides often contain ambiguous regions where the exact amino acid order cannot be determined. One problem this poses for sequence alignment algorithms is the difficulty in distinguishing discrepancies due to de novo sequencing errors from actual genomic sequence variation and posttranslational modifications. We present a novel, mass-based approach to sequence alignment, implemented as a program called OpenSea, to resolve these problems. In this approach, de novo and database sequences are interpreted as masses of residues, and the masses, rather than the amino acid codes, are compared. To provide further flexibility, the masses can be aligned in groups, which can resolve many de novo sequencing errors. The performance of OpenSea was tested with three types of data: a mixture of known proteins, a mixture of unknown proteins that commonly contain sequence variations, and a mixture of posttranslationally modified known proteins. In all three cases, we demonstrate that OpenSea can identify more peptides and proteins than commonly used database-searching programs (SEQUEST and ProteinLynx) while accurately locating sequence variation sites and unanticipated posttranslational modifications in a high-throughput environment.
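The mass-based comparison described in the abstract can be illustrated with a small sketch. This is not the OpenSea implementation: the function names, the 0.01 Da tolerance, and the maximum group size of 3 are assumptions chosen for illustration, and the sketch only tests whether two short sequences are mass-equivalent rather than scoring alignments or locating modifications. The residue masses are standard monoisotopic values.

```python
from functools import lru_cache

# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def masses(sequence):
    """Interpret an amino acid sequence as a list of residue masses."""
    return [RESIDUE_MASS[aa] for aa in sequence]


def mass_equivalent(de_novo, database, tol=0.01, max_group=3):
    """Return True if the two sequences can be split into consecutive
    groups (1..max_group residues each) whose summed masses agree
    pairwise within `tol` daltons.

    `tol` and `max_group` are illustrative assumptions, not values
    taken from the paper.
    """
    a, b = masses(de_novo), masses(database)

    @lru_cache(maxsize=None)
    def aligned(i, j):
        # Both sequences fully consumed: every group matched.
        if i == len(a) and j == len(b):
            return True
        for gi in range(1, max_group + 1):
            if i + gi > len(a):
                break
            for gj in range(1, max_group + 1):
                if j + gj > len(b):
                    break
                diff = abs(sum(a[i:i + gi]) - sum(b[j:j + gj]))
                if diff <= tol and aligned(i + gi, j + gj):
                    return True
        return False

    return aligned(0, 0)


print(mass_equivalent("PEPTLDE", "PEPTIDE"))  # L and I share a mass -> True
print(mass_equivalent("AGGK", "ANK"))         # GG and N share a mass -> True
print(mass_equivalent("PETPIDE", "PEPTIDE"))  # swapped order within a group -> True
print(mass_equivalent("PEPTIDE", "PEPTIDK"))  # E and K differ by ~0.95 Da -> False
```

Comparing grouped masses rather than letters is what lets an ambiguous de novo call such as GG match a database N, or a locally reordered pair match its correct order, while a genuine mass difference still fails to align.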

Original language	English (US)
Pages (from-to)	2220-2230
Number of pages	11
Journal	Analytical Chemistry
Volume	76
Issue number	8
DOIs	https://doi.org/10.1021/ac035258x
State	Published - Apr 15 2004

ASJC Scopus subject areas

Analytical Chemistry

Cite this

Searle, B. O., Dasari, S., Turner, M., Reddy, A. P., Choi, D., Wilmarth, P. A., McCormack, A. L., David, L. L., & Nagalla, S. B. (2004). High-Throughput Identification of Proteins and Unanticipated Sequence Modifications Using a Mass-Based Alignment Algorithm for MS/MS de Novo Sequencing Results. Analytical Chemistry, 76(8), 2220-2230. https://doi.org/10.1021/ac035258x

@article{cd89fdc24fd34c999acb950c1ed68809,
  title = "High-Throughput Identification of Proteins and Unanticipated Sequence Modifications Using a Mass-Based Alignment Algorithm for MS/MS de Novo Sequencing Results",
  abstract = "With the increasing availability of de novo sequencing algorithms for interpreting high-mass accuracy tandem mass spectrometry (MS/MS) data, there is a growing need for programs that accurately identify proteins from de novo sequencing results. De novo sequences derived from tandem mass spectra of peptides often contain ambiguous regions where the exact amino acid order cannot be determined. One problem this poses for sequence alignment algorithms is the difficulty in distinguishing discrepancies due to de novo sequencing errors from actual genomic sequence variation and posttranslational modifications. We present a novel, mass-based approach to sequence alignment, implemented as a program called OpenSea, to resolve these problems. In this approach, de novo and database sequences are interpreted as masses of residues, and the masses, rather than the amino acid codes, are compared. To provide further flexibility, the masses can be aligned in groups, which can resolve many de novo sequencing errors. The performance of OpenSea was tested with three types of data: a mixture of known proteins, a mixture of unknown proteins that commonly contain sequence variations, and a mixture of posttranslationally modified known proteins. In all three cases, we demonstrate that OpenSea can identify more peptides and proteins than commonly used database-searching programs (SEQUEST and ProteinLynx) while accurately locating sequence variation sites and unanticipated posttranslational modifications in a high-throughput environment.",
  author = "Searle, {Brian O.} and Surendra Dasari and Mark Turner and Reddy, {Ashok P.} and Dongseok Choi and Wilmarth, {Phillip A.} and McCormack, {Ashley L.} and David, {Larry L.} and Nagalla, {Srinivasa B.}",
  year = "2004",
  month = apr,
  day = "15",
  doi = "10.1021/ac035258x",
  language = "English (US)",
  volume = "76",
  pages = "2220--2230",
  journal = "Analytical Chemistry",
  issn = "0003-2700",
  publisher = "American Chemical Society",
  number = "8",
}

TY - JOUR
T1 - High-Throughput Identification of Proteins and Unanticipated Sequence Modifications Using a Mass-Based Alignment Algorithm for MS/MS de Novo Sequencing Results
AU - Searle, Brian O.
AU - Dasari, Surendra
AU - Turner, Mark
AU - Reddy, Ashok P.
AU - Choi, Dongseok
AU - Wilmarth, Phillip A.
AU - McCormack, Ashley L.
AU - David, Larry L.
AU - Nagalla, Srinivasa B.
PY - 2004/4/15
Y1 - 2004/4/15

N2 - With the increasing availability of de novo sequencing algorithms for interpreting high-mass accuracy tandem mass spectrometry (MS/MS) data, there is a growing need for programs that accurately identify proteins from de novo sequencing results. De novo sequences derived from tandem mass spectra of peptides often contain ambiguous regions where the exact amino acid order cannot be determined. One problem this poses for sequence alignment algorithms is the difficulty in distinguishing discrepancies due to de novo sequencing errors from actual genomic sequence variation and posttranslational modifications. We present a novel, mass-based approach to sequence alignment, implemented as a program called OpenSea, to resolve these problems. In this approach, de novo and database sequences are interpreted as masses of residues, and the masses, rather than the amino acid codes, are compared. To provide further flexibility, the masses can be aligned in groups, which can resolve many de novo sequencing errors. The performance of OpenSea was tested with three types of data: a mixture of known proteins, a mixture of unknown proteins that commonly contain sequence variations, and a mixture of posttranslationally modified known proteins. In all three cases, we demonstrate that OpenSea can identify more peptides and proteins than commonly used database-searching programs (SEQUEST and ProteinLynx) while accurately locating sequence variation sites and unanticipated posttranslational modifications in a high-throughput environment.

AB - With the increasing availability of de novo sequencing algorithms for interpreting high-mass accuracy tandem mass spectrometry (MS/MS) data, there is a growing need for programs that accurately identify proteins from de novo sequencing results. De novo sequences derived from tandem mass spectra of peptides often contain ambiguous regions where the exact amino acid order cannot be determined. One problem this poses for sequence alignment algorithms is the difficulty in distinguishing discrepancies due to de novo sequencing errors from actual genomic sequence variation and posttranslational modifications. We present a novel, mass-based approach to sequence alignment, implemented as a program called OpenSea, to resolve these problems. In this approach, de novo and database sequences are interpreted as masses of residues, and the masses, rather than the amino acid codes, are compared. To provide further flexibility, the masses can be aligned in groups, which can resolve many de novo sequencing errors. The performance of OpenSea was tested with three types of data: a mixture of known proteins, a mixture of unknown proteins that commonly contain sequence variations, and a mixture of posttranslationally modified known proteins. In all three cases, we demonstrate that OpenSea can identify more peptides and proteins than commonly used database-searching programs (SEQUEST and ProteinLynx) while accurately locating sequence variation sites and unanticipated posttranslational modifications in a high-throughput environment.

UR - http://www.scopus.com/inward/record.url?scp=1942423061&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=1942423061&partnerID=8YFLogxK
U2 - 10.1021/ac035258x
DO - 10.1021/ac035258x
M3 - Article
C2 - 15080731
AN - SCOPUS:1942423061
SN - 0003-2700
VL - 76
SP - 2220
EP - 2230
JO - Analytical Chemistry
JF - Analytical Chemistry
IS - 8
ER -
