A Method to Exploit the Structure of Genetic Ancestry Space to Enhance Case-Control Studies

The International IBD Genetics Consortium

Research output: Contribution to journal (Article)

4 Citations (Scopus)

Abstract

One goal of human genetics is to understand the genetic basis of disease, a challenge for diseases of complex inheritance because risk alleles are few relative to the vast set of benign variants. Risk variants are often sought by association studies in which allele frequencies in case subjects are contrasted with those from population-based samples used as control subjects. In an ideal world we would know population-level allele frequencies, releasing researchers to focus on case subjects. We argue this ideal is possible, at least theoretically, and we outline a path to achieving it in reality. If such a resource were to exist, it would yield ample savings and would facilitate the effective use of data repositories by removing administrative and technical barriers. We call this concept the Universal Control Repository Network (UNICORN), a means to perform association analyses without necessitating direct access to individual-level control data. Our approach to UNICORN uses existing genetic resources and various statistical tools to analyze these data, including hierarchical clustering with spectral analysis of ancestry; and empirical Bayesian analysis along with Gaussian spatial processes to estimate ancestry-specific allele frequencies. We demonstrate our approach using tens of thousands of control subjects from studies of Crohn disease, showing how it controls false positives, provides power similar to that achieved when all control data are directly accessible, and enhances power when control data are limiting or even imperfectly matched ancestrally. These results highlight how UNICORN can enable reliable, powerful, and convenient genetic association analyses without access to the individual-level data.
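The abstract describes the approach only at a high level. The Python sketch below is a rough, non-authoritative illustration of its central idea: test case allele counts against an estimated, ancestry-matched control frequency instead of individual-level control genotypes. It shrinks per-cluster control allele frequencies toward a common mean with a beta-binomial empirical-Bayes prior. It then applies a two-sided z-test whose variance accounts for uncertainty in the control estimate. This is a simplified stand-in chosen for illustration. It does not implement the paper's hierarchical spectral clustering or its Gaussian spatial process model. The function names (`fit_beta_prior`, `posterior_frequency`, `case_vs_reference_test`) and the example counts are hypothetical.

```python
import math
from statistics import mean, pvariance


def fit_beta_prior(alt_counts, allele_totals):
    """Fit a Beta(a, b) prior to per-cluster control allele frequencies.

    Method of moments on the observed cluster frequencies. Simplification
    (assumption): all spread is treated as between-cluster variation, so
    binomial sampling noise within clusters is ignored.
    """
    freqs = [x / n for x, n in zip(alt_counts, allele_totals)]
    m = min(max(mean(freqs), 1e-6), 1 - 1e-6)  # keep a and b strictly positive
    v = pvariance(freqs)
    if 0 < v < m * (1 - m):
        s = m * (1 - m) / v - 1  # prior "sample size" a + b
    else:
        s = 2.0  # degenerate spread: fall back to a weak prior on the mean
    return m * s, (1 - m) * s


def posterior_frequency(x, n, a, b):
    """Posterior mean and variance of one cluster's allele frequency.

    With x alternate alleles out of n and a Beta(a, b) prior, the posterior
    is Beta(x + a, n - x + b).
    """
    alpha, beta = x + a, n - x + b
    total = alpha + beta
    post_mean = alpha / total
    post_var = alpha * beta / (total ** 2 * (total + 1))
    return post_mean, post_var


def case_vs_reference_test(case_alt, n_cases, p_hat, p_var):
    """Two-sided z-test of a case allele count against an estimated frequency.

    Cases contribute M = 2 * n_cases alleles (diploid). By the law of total
    variance, if X | p ~ Binomial(M, p) and p has mean p_hat and variance
    p_var, then Var(X) = M * p_hat * (1 - p_hat) + M * (M - 1) * p_var.
    """
    m_alleles = 2 * n_cases
    expected = m_alleles * p_hat
    var = m_alleles * p_hat * (1 - p_hat) + m_alleles * (m_alleles - 1) * p_var
    z = (case_alt - expected) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


if __name__ == "__main__":
    # Made-up alternate-allele counts and allele totals for four control clusters.
    alt = [120, 150, 90, 200]
    tot = [1000, 1000, 800, 1500]
    a, b = fit_beta_prior(alt, tot)
    # Suppose the cases are ancestrally matched to cluster index 2.
    p_hat, p_var = posterior_frequency(alt[2], tot[2], a, b)
    z, p = case_vs_reference_test(case_alt=160, n_cases=500, p_hat=p_hat, p_var=p_var)
    print(f"p_hat={p_hat:.4f} z={z:.2f} p={p:.3g}")
```

Shrinkage matters most for sparsely sampled clusters. A cluster with few control alleles borrows strength from the others, loosely in the spirit of the abstract's point that the approach helps when control data are limited.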

Original language: English (US)
Pages (from-to): 857-868
Number of pages: 12
Journal: American Journal of Human Genetics
Volume: 98
Issue number: 5
DOI: 10.1016/j.ajhg.2016.02.025
State: Published, May 5 2016

ASJC Scopus subject areas

  • Genetics
  • Genetics (clinical)

Cite this

A Method to Exploit the Structure of Genetic Ancestry Space to Enhance Case-Control Studies. / The International IBD Genetics Consortium.

In: American Journal of Human Genetics, Vol. 98, No. 5, 05.05.2016, p. 857-868.

The International IBD Genetics Consortium. / A Method to Exploit the Structure of Genetic Ancestry Space to Enhance Case-Control Studies. In: American Journal of Human Genetics. 2016; Vol. 98, No. 5. pp. 857-868.

@article{ab283cda84a24b04af3478baeb5b783b,
title = "A Method to Exploit the Structure of Genetic Ancestry Space to Enhance Case-Control Studies",
author = "{The International IBD Genetics Consortium} and Bodea, {Corneliu A.} and Neale, {Benjamin M.} and Stephan Ripke and Murray Barclay and Laurent Peyrin-Biroulet and Mathias Chamaillard and Colombel, {Jean Frederick} and Mario Cottone and Anthony Croft and Renata D'Inc{\`a} and Jonas Halfvarson and Katherine Hanigan and Paul Henderson and Hugot, {Jean Pierre} and Amir Karban and Kennedy, {Nicholas A.} and Khan, {Mohammed Azam} and Marc L{\'e}mann and Arie Levine and Dunecan Massey and Monica Milla and Montgomery, {Grant W.} and Ng, {Sok Meng Evelyn} and Ioannis Oikonomou and Harald Peeters and Proctor, {Deborah D.} and Rahier, {Jean Francois} and Rebecca Roberts and Paul Rutgeerts and Frank Seibold and Laura Stronati and Taylor, {Kirstin M.} and Leif T{\"o}rkvist and Kullak Ublick and {Van Limbergen}, Johan and {Van Gossum}, Andre and Vatn, {Morten H.} and Hu Zhang and Wei Zhang and Andrews, {Jane M.} and Bampton, {Peter A.} and Florin, {Timothy H.} and Richard Gearry and Krupa Krishnaprasad and Lawrance, {Ian C.} and Gillian Mahy and Graham Radford-Smith and Roberts, {Rebecca L.} and Simms, {Lisa A.} and Don Conrad",
year = "2016",
month = "5",
day = "5",
doi = "10.1016/j.ajhg.2016.02.025",
language = "English (US)",
volume = "98",
pages = "857--868",
journal = "American Journal of Human Genetics",
issn = "0002-9297",
publisher = "Cell Press",
number = "5",

}

TY  - JOUR
T1  - A Method to Exploit the Structure of Genetic Ancestry Space to Enhance Case-Control Studies
AU  - The International IBD Genetics Consortium
AU  - Bodea, Corneliu A.
AU  - Neale, Benjamin M.
AU  - Ripke, Stephan
AU  - Barclay, Murray
AU  - Peyrin-Biroulet, Laurent
AU  - Chamaillard, Mathias
AU  - Colombel, Jean Frederick
AU  - Cottone, Mario
AU  - Croft, Anthony
AU  - D'Incà, Renata
AU  - Halfvarson, Jonas
AU  - Hanigan, Katherine
AU  - Henderson, Paul
AU  - Hugot, Jean Pierre
AU  - Karban, Amir
AU  - Kennedy, Nicholas A.
AU  - Khan, Mohammed Azam
AU  - Lémann, Marc
AU  - Levine, Arie
AU  - Massey, Dunecan
AU  - Milla, Monica
AU  - Montgomery, Grant W.
AU  - Ng, Sok Meng Evelyn
AU  - Oikonomou, Ioannis
AU  - Peeters, Harald
AU  - Proctor, Deborah D.
AU  - Rahier, Jean Francois
AU  - Roberts, Rebecca
AU  - Rutgeerts, Paul
AU  - Seibold, Frank
AU  - Stronati, Laura
AU  - Taylor, Kirstin M.
AU  - Törkvist, Leif
AU  - Ublick, Kullak
AU  - Van Limbergen, Johan
AU  - Van Gossum, Andre
AU  - Vatn, Morten H.
AU  - Zhang, Hu
AU  - Zhang, Wei
AU  - Andrews, Jane M.
AU  - Bampton, Peter A.
AU  - Florin, Timothy H.
AU  - Gearry, Richard
AU  - Krishnaprasad, Krupa
AU  - Lawrance, Ian C.
AU  - Mahy, Gillian
AU  - Radford-Smith, Graham
AU  - Roberts, Rebecca L.
AU  - Simms, Lisa A.
AU  - Conrad, Don
PY  - 2016/5/5
Y1  - 2016/5/5

UR  - http://www.scopus.com/inward/record.url?scp=84963574963&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84963574963&partnerID=8YFLogxK
U2  - 10.1016/j.ajhg.2016.02.025
DO  - 10.1016/j.ajhg.2016.02.025
M3  - Article
C2  - 27087321
AN  - SCOPUS:84963574963
VL  - 98
SP  - 857
EP  - 868
JO  - American Journal of Human Genetics
JF  - American Journal of Human Genetics
SN  - 0002-9297
IS  - 5
ER  - 