Molecular heterogeneity at the network level: High-dimensional testing, clustering and a TCGA case study

Nicolas Städler, Frank Dondelinger, Steven M. Hill, Rehan Akbani, Yiling Lu, Gordon Mills, Sach Mukherjee

Research output: Contribution to journal (Article)

4 Citations (Scopus)

Abstract

Motivation: Molecular pathways and networks play a key role in basic and disease biology. An emerging notion is that networks encoding patterns of molecular interplay may themselves differ between contexts, such as cell type, tissue or disease (sub)type. However, while statistical testing of differences in mean expression levels has been extensively studied, testing of network differences remains challenging. Furthermore, since network differences could provide important and biologically interpretable information to identify molecular subgroups, there is a need to consider the unsupervised task of learning subgroups and the networks that define them. This is a nontrivial clustering problem, with neither subgroups nor subgroup-specific networks known at the outset.

Results: We leverage recent ideas from high-dimensional statistics for testing and clustering in the network biology setting. The methods we describe can be applied directly to most continuous molecular measurements, and networks do not need to be specified beforehand. We illustrate the ideas and methods in a case study using protein data from The Cancer Genome Atlas (TCGA). This provides evidence that patterns of interplay between signalling proteins differ significantly between cancer types. Furthermore, we show how the proposed approaches can be used to learn subtypes and the molecular networks that define them.

Availability and implementation: The methods are available as the Bioconductor package nethet.

Original language: English (US)
Pages (from-to): 2890-2896
Number of pages: 7
Journal: Bioinformatics
Volume: 33
Issue number: 18
DOI: 10.1093/bioinformatics/btx322
State: Published - Jan 1 2017
Externally published: Yes

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

Molecular heterogeneity at the network level: High-dimensional testing, clustering and a TCGA case study. / Städler, Nicolas; Dondelinger, Frank; Hill, Steven M.; Akbani, Rehan; Lu, Yiling; Mills, Gordon; Mukherjee, Sach.

In: Bioinformatics, Vol. 33, No. 18, 01.01.2017, p. 2890-2896.

@article{d57092cc9db44c50821679aeea7aae1f,
title = "Molecular heterogeneity at the network level: High-dimensional testing, clustering and a TCGA case study",
author = "Nicolas St{\"a}dler and Frank Dondelinger and Hill, {Steven M.} and Rehan Akbani and Yiling Lu and Gordon Mills and Sach Mukherjee",
year = "2017",
month = "1",
day = "1",
doi = "10.1093/bioinformatics/btx322",
language = "English (US)",
volume = "33",
pages = "2890--2896",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "18",

}

TY - JOUR
T1 - Molecular heterogeneity at the network level
T2 - High-dimensional testing, clustering and a TCGA case study
AU - Städler, Nicolas
AU - Dondelinger, Frank
AU - Hill, Steven M.
AU - Akbani, Rehan
AU - Lu, Yiling
AU - Mills, Gordon
AU - Mukherjee, Sach
PY - 2017/1/1
Y1 - 2017/1/1
UR - http://www.scopus.com/inward/record.url?scp=85029785637&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85029785637&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btx322
DO - 10.1093/bioinformatics/btx322
M3 - Article
C2 - 28535188
AN - SCOPUS:85029785637
VL - 33
SP - 2890
EP - 2896
JO - Bioinformatics
JF - Bioinformatics
SN - 1367-4803
IS - 18
ER -