The NIH BD2K center for big data in translational genomics

Benedict Paten; Mark Diekhans; Brian J. Druker; Stephen Friend; Justin Guinney; Nadine Gassner; Mitchell Guttman; W. James Kent; Patrick Mantey; Adam A. Margolin; Matt Massie; Adam M. Novak; Frank Nothaft; Lior Pachter; David Patterson; Maciej Smuga-Otto; Joshua M. Stuart; Laura Van't Veer; Barbara Wold; David Haussler

doi:10.1093/jamia/ocv047

Research output: Contribution to journal › Article › peer-review

33 Scopus citations

Abstract

The world's genomics data will never be stored in a single repository; rather, it will be distributed among many sites in many countries. No one site will have enough data to explain genotype-to-phenotype relationships in rare diseases; therefore, sites must share data. To accomplish this, the genetics community must forge common standards and protocols to make sharing and computing data among many sites a seamless activity. Through the Global Alliance for Genomics and Health, we are pioneering the development of shared application programming interfaces (APIs) to connect the world's genome repositories. In parallel, we are developing an open-source software stack (ADAM) that uses these APIs. This combination will create a cohesive genome informatics ecosystem. Using containers, we are facilitating the deployment of this software in a diverse array of environments. Through benchmarking efforts and big data driver projects, we are ensuring ADAM's performance and utility.

Original language	English (US)
Pages (from-to)	1143-1147
Number of pages	5
Journal	Journal of the American Medical Informatics Association
Volume	22
Issue number	6
DOIs	https://doi.org/10.1093/jamia/ocv047
State	Published - 2015

Keywords

APIs
Big data
Computational genomics
Genome informatics
Genomics

ASJC Scopus subject areas

Health Informatics

Cite this

Paten, B., Diekhans, M., Druker, B. J., Friend, S., Guinney, J., Gassner, N., Guttman, M., Kent, W. J., Mantey, P., Margolin, A. A., Massie, M., Novak, A. M., Nothaft, F., Pachter, L., Patterson, D., Smuga-Otto, M., Stuart, J. M., Van't Veer, L., Wold, B., & Haussler, D. (2015). The NIH BD2K center for big data in translational genomics. Journal of the American Medical Informatics Association, 22(6), 1143-1147. https://doi.org/10.1093/jamia/ocv047

Paten, B, Diekhans, M, Druker, BJ, Friend, S, Guinney, J, Gassner, N, Guttman, M, Kent, WJ, Mantey, P, Margolin, AA, Massie, M, Novak, AM, Nothaft, F, Pachter, L, Patterson, D, Smuga-Otto, M, Stuart, JM, Van't Veer, L, Wold, B & Haussler, D 2015, 'The NIH BD2K center for big data in translational genomics', Journal of the American Medical Informatics Association, vol. 22, no. 6, pp. 1143-1147. https://doi.org/10.1093/jamia/ocv047

@article{79e074e3d2ed40e2a44cbd384b7ba6d3,
  title = "The NIH BD2K center for big data in translational genomics",
  abstract = "The world's genomics data will never be stored in a single repository - rather, it will be distributed among many sites in many countries. No one site will have enough data to explain genotype to phenotype relationships in rare diseases; therefore, sites must share data. To accomplish this, the genetics community must forge common standards and protocols to make sharing and computing data among many sites a seamless activity. Through the Global Alliance for Genomics and Health, we are pioneering the development of shared application programming interfaces (APIs) to connect the world's genome repositories. In parallel, we are developing an open source software stack (ADAM) that uses these APIs. This combination will create a cohesive genome informatics ecosystem. Using containers, we are facilitating the deployment of this software in a diverse array of environments. Through benchmarking efforts and big data driver projects, we are ensuring ADAM's performance and utility.",
  keywords = "APIs, Big data, Computational genomics, Genome informatics, Genomics",
  author = "Benedict Paten and Mark Diekhans and Druker, {Brian J.} and Stephen Friend and Justin Guinney and Nadine Gassner and Mitchell Guttman and Kent, {W. James} and Patrick Mantey and Margolin, {Adam A.} and Matt Massie and Novak, {Adam M.} and Frank Nothaft and Lior Pachter and David Patterson and Maciej Smuga-Otto and Stuart, {Joshua M.} and {Van't Veer}, Laura and Barbara Wold and David Haussler",
  note = "Publisher Copyright: {\textcopyright} The Author 2015.",
  year = "2015",
  doi = "10.1093/jamia/ocv047",
  language = "English (US)",
  volume = "22",
  pages = "1143--1147",
  journal = "Journal of the American Medical Informatics Association",
  issn = "1067-5027",
  publisher = "Oxford University Press",
  number = "6",
}

TY - JOUR
T1 - The NIH BD2K center for big data in translational genomics
AU - Paten, Benedict
AU - Diekhans, Mark
AU - Druker, Brian J.
AU - Friend, Stephen
AU - Guinney, Justin
AU - Gassner, Nadine
AU - Guttman, Mitchell
AU - Kent, W. James
AU - Mantey, Patrick
AU - Margolin, Adam A.
AU - Massie, Matt
AU - Novak, Adam M.
AU - Nothaft, Frank
AU - Pachter, Lior
AU - Patterson, David
AU - Smuga-Otto, Maciej
AU - Stuart, Joshua M.
AU - Van't Veer, Laura
AU - Wold, Barbara
AU - Haussler, David
PY - 2015
Y1 - 2015
N2 - The world's genomics data will never be stored in a single repository - rather, it will be distributed among many sites in many countries. No one site will have enough data to explain genotype to phenotype relationships in rare diseases; therefore, sites must share data. To accomplish this, the genetics community must forge common standards and protocols to make sharing and computing data among many sites a seamless activity. Through the Global Alliance for Genomics and Health, we are pioneering the development of shared application programming interfaces (APIs) to connect the world's genome repositories. In parallel, we are developing an open source software stack (ADAM) that uses these APIs. This combination will create a cohesive genome informatics ecosystem. Using containers, we are facilitating the deployment of this software in a diverse array of environments. Through benchmarking efforts and big data driver projects, we are ensuring ADAM's performance and utility.
KW - APIs
KW - Big data
KW - Computational genomics
KW - Genome informatics
KW - Genomics
UR - http://www.scopus.com/inward/record.url?scp=84949803160&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84949803160&partnerID=8YFLogxK
U2 - 10.1093/jamia/ocv047
DO - 10.1093/jamia/ocv047
M3 - Article
C2 - 26174866
AN - SCOPUS:84949803160
SN - 1067-5027
VL - 22
SP - 1143
EP - 1147
JO - Journal of the American Medical Informatics Association
JF - Journal of the American Medical Informatics Association
IS - 6
ER -
