The health care and life sciences community profile for dataset descriptions

Michel Dumontier, Alasdair J G Gray, M. Scott Marshall, Vladimir Alexiev, Peter Ansell, Gary Bader, Joachim Baran, Jerven T. Bolleman, Alison Callahan, José Cruz-Toledo, Pascale Gaudet, Erich A. Gombocz, Alejandra N. Gonzalez-Beltran, Paul Groth, Melissa Haendel, Maori Ito, Simon Jupp, Nick Juty, Toshiaki Katayama, Norio Kobayashi, Kalpana Krishnaswami, Camille Laibe, Nicolas Le Novère, Simon Lin, James Malone, Michael Miller, Christopher J. Mungall, Laurens Rietveld, Sarala M. Wimalaratne, Atsuko Yamaguchi

Research output: Contribution to journal, article

8 Citations (Scopus)

Abstract

Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine readable descriptions of versioned datasets.
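As a rough illustration of the six element groups named in the abstract, the sketch below serializes one dataset description as Turtle using only the Python standard library. The abstract does not say which vocabulary terms the guideline prescribes, so the terms chosen here (Dublin Core Terms, PAV, PROV-O, VoID) are assumptions made for illustration, and the function name `describe_dataset` and all `example.org` URIs are hypothetical.

```python
# Illustrative sketch only: the vocabulary terms are assumptions,
# not a statement of what the HCLS guideline requires.
PREFIXES = {
    "dct": "http://purl.org/dc/terms/",
    "dctypes": "http://purl.org/dc/dcmitype/",
    "pav": "http://purl.org/pav/",
    "prov": "http://www.w3.org/ns/prov#",
    "void": "http://rdfs.org/ns/void#",
}


def literal(text):
    """Quote a plain string as a Turtle string literal."""
    escaped = (
        text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )
    return f'"{escaped}"'


def iri(value):
    """Wrap an absolute IRI in angle brackets."""
    return f"<{value}>"


def describe_dataset(subject, title, description, identifier,
                     publisher, version, source, triple_count):
    """Return a Turtle document with one statement per element group."""
    statements = [
        ("a", "dctypes:Dataset"),
        ("dct:title", literal(title)),               # description
        ("dct:description", literal(description)),   # description
        ("dct:identifier", literal(identifier)),     # identification
        ("dct:publisher", iri(publisher)),           # attribution
        ("pav:version", literal(version)),           # versioning
        ("prov:wasDerivedFrom", iri(source)),        # provenance
        ("void:triples", str(int(triple_count))),    # content summarization
    ]
    header = "\n".join(f"@prefix {p}: <{u}> ." for p, u in PREFIXES.items())
    body = " ;\n".join(f"    {pred} {obj}" for pred, obj in statements)
    return f"{header}\n\n{iri(subject)}\n{body} .\n"


# Hypothetical dataset; every value below is made up for the example.
example = describe_dataset(
    subject="http://example.org/dataset/demo/1.0",
    title="Demo dataset",
    description="A made-up dataset used to show the output shape.",
    identifier="demo:1.0",
    publisher="http://example.org/org/demo-lab",
    version="1.0",
    source="http://example.org/dataset/demo/0.9",
    triple_count=1000,
)
print(example)
```

Building the document from a list of predicate and object pairs keeps each element group on its own line, which makes it easy to swap in whichever terms the guideline actually specifies.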

Original language: English (US)
Article number: e2331
Journal: PeerJ
Volume: 2016
Issue number: 8
DOI: https://doi.org/10.7717/peerj.2331
State: Published - 2016

Fingerprint

Biological Science Disciplines
Metadata
Health care
health services
Vocabulary
Delivery of Health Care
provenance
Guidelines
Semantic Web
Public Opinion
Semantics
Publications
Datasets

Keywords

  • Data profiling
  • Dataset descriptions
  • FAIR data
  • Metadata
  • Provenance

ASJC Scopus subject areas

  • Neuroscience(all)
  • Medicine(all)
  • Biochemistry, Genetics and Molecular Biology(all)
  • Agricultural and Biological Sciences(all)

Cite this

Dumontier, M., Gray, A. J. G., Marshall, M. S., Alexiev, V., Ansell, P., Bader, G., ... Yamaguchi, A. (2016). The health care and life sciences community profile for dataset descriptions. PeerJ, 2016(8), [e2331]. https://doi.org/10.7717/peerj.2331

The health care and life sciences community profile for dataset descriptions. / Dumontier, Michel; Gray, Alasdair J G; Marshall, M. Scott; Alexiev, Vladimir; Ansell, Peter; Bader, Gary; Baran, Joachim; Bolleman, Jerven T.; Callahan, Alison; Cruz-Toledo, José; Gaudet, Pascale; Gombocz, Erich A.; Gonzalez-Beltran, Alejandra N.; Groth, Paul; Haendel, Melissa; Ito, Maori; Jupp, Simon; Juty, Nick; Katayama, Toshiaki; Kobayashi, Norio; Krishnaswami, Kalpana; Laibe, Camille; Le Novère, Nicolas; Lin, Simon; Malone, James; Miller, Michael; Mungall, Christopher J.; Rietveld, Laurens; Wimalaratne, Sarala M.; Yamaguchi, Atsuko.

In: PeerJ, Vol. 2016, No. 8, e2331, 2016.

Dumontier, M, Gray, AJG, Marshall, MS, Alexiev, V, Ansell, P, Bader, G, Baran, J, Bolleman, JT, Callahan, A, Cruz-Toledo, J, Gaudet, P, Gombocz, EA, Gonzalez-Beltran, AN, Groth, P, Haendel, M, Ito, M, Jupp, S, Juty, N, Katayama, T, Kobayashi, N, Krishnaswami, K, Laibe, C, Le Novère, N, Lin, S, Malone, J, Miller, M, Mungall, CJ, Rietveld, L, Wimalaratne, SM & Yamaguchi, A 2016, 'The health care and life sciences community profile for dataset descriptions', PeerJ, vol. 2016, no. 8, e2331. https://doi.org/10.7717/peerj.2331
Dumontier M, Gray AJG, Marshall MS, Alexiev V, Ansell P, Bader G et al. The health care and life sciences community profile for dataset descriptions. PeerJ. 2016;2016(8). e2331. https://doi.org/10.7717/peerj.2331
Dumontier, Michel ; Gray, Alasdair J G ; Marshall, M. Scott ; Alexiev, Vladimir ; Ansell, Peter ; Bader, Gary ; Baran, Joachim ; Bolleman, Jerven T. ; Callahan, Alison ; Cruz-Toledo, José ; Gaudet, Pascale ; Gombocz, Erich A. ; Gonzalez-Beltran, Alejandra N. ; Groth, Paul ; Haendel, Melissa ; Ito, Maori ; Jupp, Simon ; Juty, Nick ; Katayama, Toshiaki ; Kobayashi, Norio ; Krishnaswami, Kalpana ; Laibe, Camille ; Le Novère, Nicolas ; Lin, Simon ; Malone, James ; Miller, Michael ; Mungall, Christopher J. ; Rietveld, Laurens ; Wimalaratne, Sarala M. ; Yamaguchi, Atsuko. / The health care and life sciences community profile for dataset descriptions. In: PeerJ. 2016 ; Vol. 2016, No. 8.
@article{2ed26c772b3d440c81c8133bf705ec76,
title = "The health care and life sciences community profile for dataset descriptions",
abstract = "Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine readable descriptions of versioned datasets.",
keywords = "Data profiling, Dataset descriptions, FAIR data, Metadata, Provenance",
author = "Michel Dumontier and Gray, {Alasdair J G} and Marshall, {M. Scott} and Vladimir Alexiev and Peter Ansell and Gary Bader and Joachim Baran and Bolleman, {Jerven T.} and Alison Callahan and Jos{\'e} Cruz-Toledo and Pascale Gaudet and Gombocz, {Erich A.} and Gonzalez-Beltran, {Alejandra N.} and Paul Groth and Melissa Haendel and Maori Ito and Simon Jupp and Nick Juty and Toshiaki Katayama and Norio Kobayashi and Kalpana Krishnaswami and Camille Laibe and {Le Nov{\`e}re}, Nicolas and Simon Lin and James Malone and Michael Miller and Mungall, {Christopher J.} and Laurens Rietveld and Wimalaratne, {Sarala M.} and Atsuko Yamaguchi",
year = "2016",
doi = "10.7717/peerj.2331",
language = "English (US)",
volume = "2016",
journal = "PeerJ",
issn = "2167-8359",
publisher = "PeerJ",
number = "8",
}

TY - JOUR
T1 - The health care and life sciences community profile for dataset descriptions
AU - Dumontier, Michel
AU - Gray, Alasdair J G
AU - Marshall, M. Scott
AU - Alexiev, Vladimir
AU - Ansell, Peter
AU - Bader, Gary
AU - Baran, Joachim
AU - Bolleman, Jerven T.
AU - Callahan, Alison
AU - Cruz-Toledo, José
AU - Gaudet, Pascale
AU - Gombocz, Erich A.
AU - Gonzalez-Beltran, Alejandra N.
AU - Groth, Paul
AU - Haendel, Melissa
AU - Ito, Maori
AU - Jupp, Simon
AU - Juty, Nick
AU - Katayama, Toshiaki
AU - Kobayashi, Norio
AU - Krishnaswami, Kalpana
AU - Laibe, Camille
AU - Le Novère, Nicolas
AU - Lin, Simon
AU - Malone, James
AU - Miller, Michael
AU - Mungall, Christopher J.
AU - Rietveld, Laurens
AU - Wimalaratne, Sarala M.
AU - Yamaguchi, Atsuko
PY - 2016
Y1 - 2016
AB - Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine readable descriptions of versioned datasets.
KW - Data profiling
KW - Dataset descriptions
KW - FAIR data
KW - Metadata
KW - Provenance
UR - http://www.scopus.com/inward/record.url?scp=84992130197&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84992130197&partnerID=8YFLogxK
U2 - 10.7717/peerj.2331
DO - 10.7717/peerj.2331
M3 - Article
VL - 2016
JO - PeerJ
JF - PeerJ
SN - 2167-8359
IS - 8
M1 - e2331
ER -