Identifiers for the 21st century

How to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data

Julie A. McMurry, Nick Juty, Niklas Blomberg, Tony Burdett, Tom Conlin, Nathalie Conte, Mélanie Courtot, John Deck, Michel Dumontier, Donal K. Fellows, Alejandra Gonzalez-Beltran, Philipp Gormanns, Jeffrey Grethe, Janna Hastings, Jean Karim Hériché, Henning Hermjakob, Jon C. Ison, Rafael C. Jimenez, Simon Jupp, John Kunze, Camille Laibe, Nicolas Le Novère, James Malone, Maria Jesus Martin, Johanna R. McEntyre, Chris Morris, Juha Muilu, Wolfgang Müller, Philippe Rocca-Serra, Susanna Assunta Sansone, Murat Sariyar, Jacky L. Snoep, Stian Soiland-Reyes, Natalie J. Stanford, Neil Swainston, Nicole Washington, Alan R. Williams, Sarala M. Wimalaratne, Lilly M. Winfree, Katherine Wolstencroft, Carole Goble, Christopher J. Mungall, Melissa Haendel, Helen Parkinson

Research output: Contribution to journal › Article

25 Citations (Scopus)

Abstract

In many disciplines, data are highly decentralized across thousands of online databases (repositories, registries, and knowledgebases). Wringing value from such databases depends on the discipline of data science and on the humble bricks and mortar that make integration possible; identifiers are a core component of this integration infrastructure. Drawing on our experience and on work by other groups, we outline 10 lessons we have learned about the identifier qualities and best practices that facilitate large-scale data integration. Specifically, we propose actions that identifier practitioners (database providers) should take in the design, provision and reuse of identifiers. We also outline the important considerations for those referencing identifiers in various circumstances, including by authors and data generators. While the importance and relevance of each lesson will vary by context, there is a need for increased awareness about how to avoid and manage common identifier problems, especially those related to persistence and web-accessibility/resolvability. We focus strongly on web-based identifiers in the life sciences; however, the principles are broadly relevant to other disciplines.

Original language: English (US)
Article number: e2001414
Journal: PLoS Biology
Volume: 15
Issue number: 6
DOI: https://doi.org/10.1371/journal.pbio.2001414
State: Published - Jun 29 2017

ASJC Scopus subject areas

  • Neuroscience (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Immunology and Microbiology (all)
  • Agricultural and Biological Sciences (all)

Cite this

McMurry, JA, Juty, N, Blomberg, N, Burdett, T, Conlin, T, Conte, N, Courtot, M, Deck, J, Dumontier, M, Fellows, DK, Gonzalez-Beltran, A, Gormanns, P, Grethe, J, Hastings, J, Hériché, JK, Hermjakob, H, Ison, JC, Jimenez, RC, Jupp, S, Kunze, J, Laibe, C, Le Novère, N, Malone, J, Martin, MJ, McEntyre, JR, Morris, C, Muilu, J, Müller, W, Rocca-Serra, P, Sansone, SA, Sariyar, M, Snoep, JL, Soiland-Reyes, S, Stanford, NJ, Swainston, N, Washington, N, Williams, AR, Wimalaratne, SM, Winfree, LM, Wolstencroft, K, Goble, C, Mungall, CJ, Haendel, M & Parkinson, H 2017, 'Identifiers for the 21st century: How to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data', PLoS Biology, vol. 15, no. 6, e2001414. https://doi.org/10.1371/journal.pbio.2001414
@article{4aacf050f384485394e56e8bd8ae91d2,
title = "Identifiers for the 21st century: How to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data",
author = "McMurry, {Julie A.} and Nick Juty and Niklas Blomberg and Tony Burdett and Tom Conlin and Nathalie Conte and M{\'e}lanie Courtot and John Deck and Michel Dumontier and Fellows, {Donal K.} and Alejandra Gonzalez-Beltran and Philipp Gormanns and Jeffrey Grethe and Janna Hastings and H{\'e}rich{\'e}, {Jean Karim} and Henning Hermjakob and Ison, {Jon C.} and Jimenez, {Rafael C.} and Simon Jupp and John Kunze and Camille Laibe and {Le Nov{\`e}re}, Nicolas and James Malone and Martin, {Maria Jesus} and McEntyre, {Johanna R.} and Chris Morris and Juha Muilu and Wolfgang M{\"u}ller and Philippe Rocca-Serra and Sansone, {Susanna Assunta} and Murat Sariyar and Snoep, {Jacky L.} and Stian Soiland-Reyes and Stanford, {Natalie J.} and Neil Swainston and Nicole Washington and Williams, {Alan R.} and Wimalaratne, {Sarala M.} and Winfree, {Lilly M.} and Katherine Wolstencroft and Carole Goble and Mungall, {Christopher J.} and Melissa Haendel and Helen Parkinson",
year = "2017",
month = "6",
day = "29",
doi = "10.1371/journal.pbio.2001414",
language = "English (US)",
volume = "15",
journal = "PLoS Biology",
issn = "1544-9173",
publisher = "Public Library of Science",
number = "6",
}

TY - JOUR
T1 - Identifiers for the 21st century
T2 - How to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data
AU - McMurry, Julie A.
AU - Juty, Nick
AU - Blomberg, Niklas
AU - Burdett, Tony
AU - Conlin, Tom
AU - Conte, Nathalie
AU - Courtot, Mélanie
AU - Deck, John
AU - Dumontier, Michel
AU - Fellows, Donal K.
AU - Gonzalez-Beltran, Alejandra
AU - Gormanns, Philipp
AU - Grethe, Jeffrey
AU - Hastings, Janna
AU - Hériché, Jean Karim
AU - Hermjakob, Henning
AU - Ison, Jon C.
AU - Jimenez, Rafael C.
AU - Jupp, Simon
AU - Kunze, John
AU - Laibe, Camille
AU - Le Novère, Nicolas
AU - Malone, James
AU - Martin, Maria Jesus
AU - McEntyre, Johanna R.
AU - Morris, Chris
AU - Muilu, Juha
AU - Müller, Wolfgang
AU - Rocca-Serra, Philippe
AU - Sansone, Susanna Assunta
AU - Sariyar, Murat
AU - Snoep, Jacky L.
AU - Soiland-Reyes, Stian
AU - Stanford, Natalie J.
AU - Swainston, Neil
AU - Washington, Nicole
AU - Williams, Alan R.
AU - Wimalaratne, Sarala M.
AU - Winfree, Lilly M.
AU - Wolstencroft, Katherine
AU - Goble, Carole
AU - Mungall, Christopher J.
AU - Haendel, Melissa
AU - Parkinson, Helen
PY - 2017/6/29
Y1 - 2017/6/29
UR - http://www.scopus.com/inward/record.url?scp=85021689952&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85021689952&partnerID=8YFLogxK
U2 - 10.1371/journal.pbio.2001414
DO - 10.1371/journal.pbio.2001414
M3 - Article
VL - 15
JO - PLoS Biology
JF - PLoS Biology
SN - 1544-9173
IS - 6
M1 - e2001414
ER -