MGAP: The macaque genotype and phenotype resource, a framework for accessing and interpreting macaque variant data, and identifying new models of human disease

Benjamin N. Bimber, Melissa Y. Yan, Samuel M. Peterson, Betsy Ferguson

    Research output: Contribution to journal › Article

    Abstract

    Background: Non-human primates (NHPs), particularly macaques, serve as critical and highly relevant pre-clinical models of human disease. The similarity in human and macaque natural disease susceptibility, along with parallel genetic risk alleles, underscores the value of macaques in the development of effective treatment strategies. Nonetheless, there are limited genomic resources available to support the exploration and discovery of macaque models of inherited disease. Notably, there are few public databases tailored to searching NHP sequence variants, and no other database makes use of centralized variant calling or provides genotype-level data and predicted pathogenic effects for each variant.

    Results: The macaque Genotype And Phenotype (mGAP) resource is the first public website providing searchable, annotated macaque variant data. The mGAP resource includes a catalog of high-confidence variants derived from whole genome sequence (WGS) data. The current mGAP release at the time of publication (1.7) contains 17,087,212 variants based on the sequence analysis of 293 rhesus macaques. A custom pipeline was developed to enable annotation of the macaque variants, leveraging human data sources that include regulatory elements (ENCODE, RegulomeDB), known disease- or phenotype-associated variants (GRASP), predicted impact (SIFT, PolyPhen2), and sequence conservation (PhyloP, PhastCons). Currently, mGAP includes 2767 variants that are identical to alleles listed in the human ClinVar database, of which 276 variants, spanning 258 genes, are identified as pathogenic. An additional 12,472 variants are predicted as high impact (SnpEff) and 13,129 are predicted as damaging (PolyPhen2). In total, these variants are predicted to be associated with more than 2000 human disease or phenotype entries reported in OMIM (Online Mendelian Inheritance in Man). Importantly, mGAP also provides genotype-level data for all subjects, allowing identification of specific individuals harboring alleles of interest.

    Conclusions: The mGAP resource provides variant and genotype data from hundreds of rhesus macaques, processed in a consistent manner across all subjects (https://mgap.ohsu.edu). Together with the extensive variant annotations, mGAP presents an unprecedented opportunity to investigate potential genetic associations with currently characterized disease models, and to uncover new macaque models based on parallels with human risk alleles.

    Original language: English (US)
    Article number: 176
    Journal: BMC Genomics
    Volume: 20
    Issue number: 1
    DOI: 10.1186/s12864-019-5559-7
    State: Published - Mar 6 2019

    Fingerprint

    Macaca
    Genotype
    Phenotype
    Alleles
    Databases
    Macaca mulatta
    Primates
    Genetic Databases
    Information Storage and Retrieval
    Disease Susceptibility
    Sequence Analysis
    Publications

    Keywords

    • Animal model
    • Database
    • Genome
    • Indian-origin
    • Macaca mulatta
    • Nonhuman primate
    • Rhesus
    • SNP

    ASJC Scopus subject areas

    • Biotechnology
    • Genetics

    Cite this

    MGAP: The macaque genotype and phenotype resource, a framework for accessing and interpreting macaque variant data, and identifying new models of human disease. / Bimber, Benjamin N.; Yan, Melissa Y.; Peterson, Samuel M.; Ferguson, Betsy.

    In: BMC Genomics, Vol. 20, No. 1, 176, 06.03.2019.

    @article{c4d5fa75c4114d55a929d5a982d05788,
    title = "MGAP: The macaque genotype and phenotype resource, a framework for accessing and interpreting macaque variant data, and identifying new models of human disease",
    keywords = "Animal model, Database, Genome, Indian-origin, Macaca mulatta, Nonhuman primate, Rhesus, SNP",
    author = "Bimber, {Benjamin N.} and Yan, {Melissa Y.} and Peterson, {Samuel M.} and Betsy Ferguson",
    year = "2019",
    month = "3",
    day = "6",
    doi = "10.1186/s12864-019-5559-7",
    language = "English (US)",
    volume = "20",
    journal = "BMC Genomics",
    issn = "1471-2164",
    publisher = "BioMed Central",
    number = "1",

    }
