Risk Classification With an Adaptive Naive Bayes Kernel Machine Model

Jessica Minnier; Ming Yuan; Jun S. Liu; Tianxi Cai

doi:10.1080/01621459.2014.908778

Knight Cancer Institute

Research output: Contribution to journal › Article › peer-review

15 Scopus citations

Abstract

Genetic studies of complex traits have uncovered only a small number of risk markers explaining a small fraction of heritability and adding little improvement to disease risk prediction. Standard single marker methods may lack power in selecting informative markers or estimating effects. Most existing methods also typically do not account for nonlinearity. Identifying markers with weak signals and estimating their joint effects among many noninformative markers remains challenging. One potential approach is to group markers based on biological knowledge such as gene structure. If markers in a group tend to have similar effects, proper usage of the group structure could improve power and efficiency in estimation. We propose a two-stage method relating markers to disease risk by taking advantage of known gene-set structures. Imposing a naive Bayes kernel machine (KM) model, we estimate gene-set specific risk models that relate each gene-set to the outcome in stage I. The KM framework efficiently models potentially nonlinear effects of predictors without requiring explicit specification of functional forms. In stage II, we aggregate information across gene-sets via a regularization procedure. Estimation and computational efficiency are further improved with kernel principal component analysis. Asymptotic results for model estimation and gene-set selection are derived, and numerical studies suggest that the proposed procedure could outperform existing procedures for constructing genetic risk models.
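
The two-stage idea summarized in the abstract can be illustrated with a small NumPy sketch. This is not the authors' estimator. It makes several assumptions that do not come from the paper: a Gaussian (RBF) kernel, a ridge fit on the leading kernel principal components as the stage I gene-set model, and a plain lasso solved by coordinate descent as the stage II regularization. All function names and tuning values (`gamma`, `ridge`, `n_components`, `lam`) are hypothetical.

```python
import numpy as np

def rbf_kernel(Z, gamma):
    # Gaussian kernel matrix K[i, j] = exp(-gamma * ||z_i - z_j||^2) (assumed kernel choice)
    sq = np.sum(Z ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def kernel_pcs(K, n_components):
    # Kernel PCA: double-center K, keep leading eigenvectors scaled by sqrt(eigenvalue)
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    vals, vecs = np.linalg.eigh(H @ K @ H)
    idx = np.argsort(vals)[::-1][:n_components]
    vals = np.maximum(vals[idx], 0.0)
    return vecs[:, idx] * np.sqrt(vals)

def stage_one_scores(gene_sets, y, n_components=5, ridge=1.0, gamma=0.1):
    # Stage I (sketch): one risk score per gene set, from a ridge fit of the
    # centered outcome on that gene set's kernel principal components
    yc = y - y.mean()
    scores = []
    for Z in gene_sets:
        P = kernel_pcs(rbf_kernel(Z, gamma), n_components)
        beta = np.linalg.solve(P.T @ P + ridge * np.eye(P.shape[1]), P.T @ yc)
        scores.append(P @ beta)
    return np.column_stack(scores)

def soft_threshold(a, t):
    # Soft-thresholding operator used by the lasso coordinate update
    return np.sign(a) * max(abs(a) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    # Stage II (sketch): minimize (1/2n)||y - Xw||^2 + lam * ||w||_1
    # by cyclic coordinate descent; a zero weight drops that gene set
    n, p = X.shape
    w = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0) / n
    r = np.array(y, dtype=float)  # residual for w = 0
    for _ in range(n_iter):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue
            r += X[:, j] * w[j]          # remove coordinate j from the fit
            rho = X[:, j] @ r / n
            w[j] = soft_threshold(rho, lam) / col_ss[j]
            r -= X[:, j] * w[j]          # add the updated coordinate back
    return w

# Simulated example: three gene sets, only the first is related to the outcome
rng = np.random.default_rng(0)
n = 60
gene_sets = [rng.normal(size=(n, 4)) for _ in range(3)]
y = (np.sin(gene_sets[0][:, 0]) + 0.1 * rng.normal(size=n) > 0).astype(float)

S = stage_one_scores(gene_sets, y)
weights = lasso_cd(S, y - y.mean(), lam=0.01)
print(weights)
```

Each column of `S` plays the role of a gene-set specific risk score, and the lasso weights decide which gene sets enter the combined score. In practice the kernel, the number of components, and the penalty would be chosen by the procedures described in the article rather than fixed as above.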

Original language	English (US)
Pages (from-to)	393-404
Number of pages	12
Journal	Journal of the American Statistical Association
Volume	110
Issue number	509
DOIs	https://doi.org/10.1080/01621459.2014.908778
State	Published - Jan 2 2015

Keywords

Gene-set analysis
Genetic association
Genetic pathways
Kernel PCA
Kernel machine regression
Principal component analysis
Risk prediction

ASJC Scopus subject areas

Statistics and Probability
Statistics, Probability and Uncertainty

Cite this

@article{edcd293fd58e42ffbc68540c57039ccd,

title = "Risk Classification With an Adaptive Naive Bayes Kernel Machine Model",

abstract = "Genetic studies of complex traits have uncovered only a small number of risk markers explaining a small fraction of heritability and adding little improvement to disease risk prediction. Standard single marker methods may lack power in selecting informative markers or estimating effects. Most existing methods also typically do not account for nonlinearity. Identifying markers with weak signals and estimating their joint effects among many noninformative markers remains challenging. One potential approach is to group markers based on biological knowledge such as gene structure. If markers in a group tend to have similar effects, proper usage of the group structure could improve power and efficiency in estimation. We propose a two-stage method relating markers to disease risk by taking advantage of known gene-set structures. Imposing a naive Bayes kernel machine (KM) model, we estimate gene-set specific risk models that relate each gene-set to the outcome in stage I. The KM framework efficiently models potentially nonlinear effects of predictors without requiring explicit specification of functional forms. In stage II, we aggregate information across gene-sets via a regularization procedure. Estimation and computational efficiency is further improved with kernel principal component analysis. Asymptotic results for model estimation and gene-set selection are derived and numerical studies suggest that the proposed procedure could outperform existing procedures for constructing genetic risk models.",

keywords = "Gene-set analysis, Genetic association, Genetic pathways, Kernel PCA, Kernel machine regression, Principal component analysis, Risk prediction",

author = "Minnier, Jessica and Yuan, Ming and Liu, {Jun S.} and Cai, Tianxi",

note = "Publisher Copyright: {\textcopyright} 2015, American Statistical Association.",

year = "2015",

month = jan,

day = "2",

doi = "10.1080/01621459.2014.908778",

language = "English (US)",

volume = "110",

pages = "393--404",

journal = "Journal of the American Statistical Association",

issn = "0162-1459",

publisher = "Taylor and Francis Ltd.",

number = "509",

}

TY - JOUR

T1 - Risk Classification With an Adaptive Naive Bayes Kernel Machine Model

AU - Minnier, Jessica

AU - Yuan, Ming

AU - Liu, Jun S.

AU - Cai, Tianxi

PY - 2015/1/2

Y1 - 2015/1/2

AB - Genetic studies of complex traits have uncovered only a small number of risk markers explaining a small fraction of heritability and adding little improvement to disease risk prediction. Standard single marker methods may lack power in selecting informative markers or estimating effects. Most existing methods also typically do not account for nonlinearity. Identifying markers with weak signals and estimating their joint effects among many noninformative markers remains challenging. One potential approach is to group markers based on biological knowledge such as gene structure. If markers in a group tend to have similar effects, proper usage of the group structure could improve power and efficiency in estimation. We propose a two-stage method relating markers to disease risk by taking advantage of known gene-set structures. Imposing a naive Bayes kernel machine (KM) model, we estimate gene-set specific risk models that relate each gene-set to the outcome in stage I. The KM framework efficiently models potentially nonlinear effects of predictors without requiring explicit specification of functional forms. In stage II, we aggregate information across gene-sets via a regularization procedure. Estimation and computational efficiency is further improved with kernel principal component analysis. Asymptotic results for model estimation and gene-set selection are derived and numerical studies suggest that the proposed procedure could outperform existing procedures for constructing genetic risk models.

KW - Gene-set analysis

KW - Genetic association

KW - Genetic pathways

KW - Kernel PCA

KW - Kernel machine regression

KW - Principal component analysis

KW - Risk prediction

UR - http://www.scopus.com/inward/record.url?scp=84928259367&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84928259367&partnerID=8YFLogxK

U2 - 10.1080/01621459.2014.908778

DO - 10.1080/01621459.2014.908778

M3 - Article

AN - SCOPUS:84928259367

SN - 0162-1459

VL - 110

SP - 393

EP - 404

JO - Journal of the American Statistical Association

JF - Journal of the American Statistical Association

IS - 509

ER -
