Large-scale automated machine reading discovers new cancer-driving mechanisms

Marco A. Valenzuela-Escárcega, Ozgun Babur, Gus Hahn-Powell, Dane Bell, Thomas Hicks, Enrique Noriega-Atala, Xia Wang, Mihai Surdeanu, Emek Demir, Clayton T. Morrison

Research output: Contribution to journal › Article

Abstract

PubMed, a repository and search engine for biomedical literature, now indexes >1 million articles each year. This exceeds the processing capacity of human domain experts, limiting our ability to truly understand many diseases. We present Reach, a system for automated, large-scale machine reading of biomedical papers that can extract mechanistic descriptions of biological processes with relatively high precision at high throughput. We demonstrate that combining the extracted pathway fragments with existing biological data analysis algorithms that rely on curated models helps identify and explain a large number of previously unidentified mutually exclusive altered signaling pathways in seven different cancer types. This work shows that combining human-curated 'big mechanisms' with extracted 'big data' can lead to a causal, predictive understanding of cellular processes and unlock important downstream applications.
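A small example makes the abstract's idea of "mutually exclusive altered" genes more concrete. The sketch below is a generic illustration under stated assumptions. It is not the analysis algorithm used with Reach in the paper. It assumes each gene's alterations come as a binary profile over the same tumor samples. It scores a gene pair with a one-sided Fisher's exact (hypergeometric) test for fewer co-altered samples than chance would predict, and it reports how many samples a gene set covers. The function names and the GENE_A/GENE_B/GENE_C profiles are hypothetical toy data.

```python
from math import comb


def mutex_pvalue(a, b):
    """One-sided Fisher's exact test for mutual exclusivity of two genes.

    `a` and `b` are binary alteration profiles (1 = altered) over the same
    samples. Returns the probability of seeing this many or fewer co-altered
    samples by chance, given how often each gene is altered.
    """
    if len(a) != len(b):
        raise ValueError("profiles must cover the same samples")
    n, ka, kb = len(a), sum(a), sum(b)
    both = sum(1 for x, y in zip(a, b) if x and y)
    # Hypergeometric lower tail: P(overlap = k) = C(ka, k) * C(n - ka, kb - k) / C(n, kb).
    # math.comb returns 0 when the lower index exceeds the upper, so impossible k add nothing.
    tail = sum(comb(ka, k) * comb(n - ka, kb - k) for k in range(both + 1))
    return tail / comb(n, kb)


def coverage_and_overlap(profiles):
    """Fraction of samples altered in at least one gene of the set, and the
    number of samples altered in two or more genes of the set."""
    n = len(profiles[0])
    hits = [sum(p[i] for p in profiles) for i in range(n)]
    coverage = sum(1 for h in hits if h >= 1) / n
    overlap = sum(1 for h in hits if h >= 2)
    return coverage, overlap


# Hypothetical toy data: three genes over eight samples, never co-altered.
profiles = {
    "GENE_A": [1, 1, 1, 0, 0, 0, 0, 0],
    "GENE_B": [0, 0, 0, 1, 1, 0, 0, 0],
    "GENE_C": [0, 0, 0, 0, 0, 1, 1, 0],
}

if __name__ == "__main__":
    print(coverage_and_overlap(list(profiles.values())))  # (0.875, 0)
    print(round(mutex_pvalue(profiles["GENE_A"], profiles["GENE_B"]), 3))  # 0.357
```

The toy genes never co-occur. Even so, with only eight samples the pairwise p-value (about 0.357) is far from significant. In practice, a test like this needs many more samples and a correction for multiple comparisons.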

Original language: English (US)
Journal: Database: The Journal of Biological Databases and Curation
Volume: 2018
DOIs: https://doi.org/10.1093/database/bay098
State: Published, Jan 1 2018

ASJC Scopus subject areas

  • Information Systems
  • Biochemistry, Genetics and Molecular Biology (all)
  • Agricultural and Biological Sciences (all)

Cite this

Valenzuela-Escárcega, M. A., Babur, O., Hahn-Powell, G., Bell, D., Hicks, T., Noriega-Atala, E., ... Morrison, C. T. (2018). Large-scale automated machine reading discovers new cancer-driving mechanisms. Database: The Journal of Biological Databases and Curation, 2018. https://doi.org/10.1093/database/bay098

@article{1511fbb1861341fd8d081a614797c012,
title = "Large-scale automated machine reading discovers new cancer-driving mechanisms",
author = "Valenzuela-Esc{\'a}rcega, {Marco A.} and Ozgun Babur and Gus Hahn-Powell and Dane Bell and Thomas Hicks and Enrique Noriega-Atala and Xia Wang and Mihai Surdeanu and Emek Demir and Morrison, {Clayton T.}",
year = "2018",
month = "1",
day = "1",
doi = "10.1093/database/bay098",
language = "English (US)",
volume = "2018",
journal = "Database : the journal of biological databases and curation",
issn = "1758-0463",
publisher = "Oxford University Press",

}

TY - JOUR

T1 - Large-scale automated machine reading discovers new cancer-driving mechanisms

AU - Valenzuela-Escárcega, Marco A.

AU - Babur, Ozgun

AU - Hahn-Powell, Gus

AU - Bell, Dane

AU - Hicks, Thomas

AU - Noriega-Atala, Enrique

AU - Wang, Xia

AU - Surdeanu, Mihai

AU - Demir, Emek

AU - Morrison, Clayton T.

PY - 2018/1/1

Y1 - 2018/1/1

UR - http://www.scopus.com/inward/record.url?scp=85054367285&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85054367285&partnerID=8YFLogxK

U2 - 10.1093/database/bay098

DO - 10.1093/database/bay098

M3 - Article

VL - 2018

JO - Database : the journal of biological databases and curation

JF - Database : the journal of biological databases and curation

SN - 1758-0463

ER -