An analysis of the Candida albicans genome database for soluble secreted proteins using computer-based prediction algorithms

Samuel A. Lee, Steven Wormsley, Sophien Kamoun, Austin F S Lee, Keith Joiner, Brian Wong

Research output: Contribution to journal (Article)

59 Citations (Scopus)

Abstract

We sought to identify all genes in the Candida albicans genome database whose deduced proteins would likely be soluble secreted proteins (the secretome). While certain C. albicans secretory proteins have been studied in detail, more data on the entire secretome is needed. One approach to rapidly predict the functions of an entire proteome is to utilize genomic database information and prediction algorithms. Thus, we used a set of prediction algorithms to computationally define a potential C. albicans secretome. We first assembled a validation set of 47 C. albicans proteins that are known to be secreted and 47 that are known not to be secreted. The presence or absence of an N-terminal signal peptide was correctly predicted by SignalP version 2.0 in 47 of 47 known secreted proteins and in 47 of 47 known non-secreted proteins. When all 6165 C. albicans ORFs from CandidaDB were analysed with SignalP, 495 ORFs were predicted to encode proteins with N-terminal signal peptides. In the set of 495 deduced proteins with N-terminal signal peptides, 350 were predicted to have no transmembrane domains (or a single transmembrane domain at the extreme N-terminus) and 300 of these were predicted not to be GPI-anchored. TargetP was used to eliminate proteins with mitochondrial targeting signals, and the final computationally-predicted C. albicans secretome was estimated to consist of up to 283 ORFs. The C. albicans secretome database is available at http://info.med.yale.edu/intmed/infdis/candida/
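The successive filters described in the abstract can be sketched as a small pipeline. This is only an illustration, not the authors' code: the `OrfPrediction` record, its field names, and the toy ORF identifiers are hypothetical, and the per-ORF flags stand in for predictor outputs (SignalP, a transmembrane predictor, a GPI-anchor predictor, TargetP) that would have to be parsed separately. Only the numbers in `REPORTED_COUNTS` are taken from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrfPrediction:
    """Hypothetical per-ORF summary of predictor outputs (illustrative fields)."""
    orf_id: str
    has_signal_peptide: bool     # N-terminal signal peptide predicted
    tm_domains: int              # number of predicted transmembrane domains
    tm_only_at_n_terminus: bool  # the single TM domain sits at the extreme N-terminus
    gpi_anchored: bool           # predicted GPI anchor
    mitochondrial: bool          # predicted mitochondrial targeting signal


def passes_tm_filter(p: OrfPrediction) -> bool:
    """Keep ORFs with no TM domain, or a single TM domain at the extreme N-terminus."""
    return p.tm_domains == 0 or (p.tm_domains == 1 and p.tm_only_at_n_terminus)


def is_predicted_secreted(p: OrfPrediction) -> bool:
    """Apply all four criteria from the abstract to one ORF."""
    return (
        p.has_signal_peptide
        and passes_tm_filter(p)
        and not p.gpi_anchored
        and not p.mitochondrial
    )


def funnel(preds: list[OrfPrediction]) -> dict[str, int]:
    """Count how many ORFs remain after each successive filter."""
    with_sp = [p for p in preds if p.has_signal_peptide]
    no_tm = [p for p in with_sp if passes_tm_filter(p)]
    not_gpi = [p for p in no_tm if not p.gpi_anchored]
    final = [p for p in not_gpi if not p.mitochondrial]
    return {
        "all_orfs": len(preds),
        "signal_peptide": len(with_sp),
        "no_internal_tm": len(no_tm),
        "not_gpi_anchored": len(not_gpi),
        "final_secretome": len(final),
    }


# Stage counts as reported in the abstract (the final figure is an upper bound).
REPORTED_COUNTS = {
    "all_orfs": 6165,
    "signal_peptide": 495,
    "no_internal_tm": 350,
    "not_gpi_anchored": 300,
    "final_secretome": 283,
}


def fraction_of_genome(counts: dict[str, int]) -> dict[str, float]:
    """Express each stage as a fraction of the starting ORF set."""
    total = counts["all_orfs"]
    return {stage: n / total for stage, n in counts.items()}


# Toy records (made up) that exercise each filter once.
toy = [
    OrfPrediction("orfA", True, 0, False, False, False),   # passes every filter
    OrfPrediction("orfB", True, 1, True, False, False),    # N-terminal TM only: kept
    OrfPrediction("orfC", True, 2, False, False, False),   # multiple TM domains: dropped
    OrfPrediction("orfD", True, 0, False, True, False),    # GPI-anchored: dropped
    OrfPrediction("orfE", True, 0, False, False, True),    # mitochondrial: dropped
    OrfPrediction("orfF", False, 0, False, False, False),  # no signal peptide: dropped
]

if __name__ == "__main__":
    print(funnel(toy))
    for stage, frac in fraction_of_genome(REPORTED_COUNTS).items():
        print(f"{stage}: {frac:.1%}")
```

Applied to the reported counts, `fraction_of_genome` shows that the predicted secretome is roughly 4.6% of the 6165 ORFs analysed.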

Original language: English (US)
Pages (from-to): 595-610
Number of pages: 16
Journal: Yeast
Volume: 20
Issue number: 7
DOI: 10.1002/yea.988
State: Published - May 2003
Externally published: Yes

Fingerprint

Candida
Candida albicans
Genes
Genome
Databases
Proteins
prediction
Protein Sorting Signals
Open Reading Frames
signal peptide
Mitochondrial Proteins
Proteome
Set theory

Keywords

  • Fungi
  • Genomics
  • Secreted proteins
  • Yeast

ASJC Scopus subject areas

  • Agricultural and Biological Sciences (miscellaneous)
  • Biochemistry, Genetics and Molecular Biology(all)
  • Biochemistry
  • Biotechnology
  • Bioengineering
  • Microbiology

Cite this

An analysis of the Candida albicans genome database for soluble secreted proteins using computer-based prediction algorithms. / Lee, Samuel A.; Wormsley, Steven; Kamoun, Sophien; Lee, Austin F S; Joiner, Keith; Wong, Brian.

In: Yeast, Vol. 20, No. 7, 05.2003, p. 595-610.

@article{64fc62143d4b4d6f9a64b5ac935beab7,
title = "An analysis of the Candida albicans genome database for soluble secreted proteins using computer-based prediction algorithms",
keywords = "Fungi, Genomics, Secreted proteins, Yeast",
author = "Lee, {Samuel A.} and Steven Wormsley and Sophien Kamoun and Lee, {Austin F S} and Keith Joiner and Brian Wong",
year = "2003",
month = "5",
doi = "10.1002/yea.988",
language = "English (US)",
volume = "20",
pages = "595--610",
journal = "Yeast",
issn = "0749-503X",
publisher = "John Wiley and Sons Ltd",
number = "7",

}

TY - JOUR
T1 - An analysis of the Candida albicans genome database for soluble secreted proteins using computer-based prediction algorithms
AU - Lee, Samuel A.
AU - Wormsley, Steven
AU - Kamoun, Sophien
AU - Lee, Austin F S
AU - Joiner, Keith
AU - Wong, Brian
PY - 2003/5
Y1 - 2003/5
KW - Fungi
KW - Genomics
KW - Secreted proteins
KW - Yeast
UR - http://www.scopus.com/inward/record.url?scp=0038290322&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0038290322&partnerID=8YFLogxK
U2 - 10.1002/yea.988
DO - 10.1002/yea.988
M3 - Article
VL - 20
SP - 595
EP - 610
JO - Yeast
JF - Yeast
SN - 0749-503X
IS - 7
ER -