Disentangling multidimensional spatio-temporal data into their common and aberrant responses

Young Hwan Chang, James Korkola, Dhara N. Amin, Mark M. Moasser, Jose M. Carmena, Joe Gray, Claire J. Tomlin

Research output: Contribution to journal (Article)

1 Citation (Scopus)

Abstract

With the advent of high-throughput measurement techniques, scientists and engineers are starting to grapple with massive data sets and are encountering challenges in organizing and processing them and in extracting meaningful structure from them. Multidimensional spatiotemporal biological data sets, such as time-series gene expression under various perturbations across different cell lines, or neural spike trains across many experimental trials, have the potential to yield insight into the dynamic behavior of the system. For this potential to be realized, we need a suitable representation of the data. A general question is how to organize the observed data into meaningful structures and how to find an appropriate similarity measure. A natural way of viewing these complex high-dimensional data sets is to examine and analyze the large-scale features first and then to focus on the interesting details. Because the wide range of experiments and the unknown complexity of the underlying system contribute to the heterogeneity of biological data, we propose an extension of Robust Principal Component Analysis (RPCA) that models common variations across multiple experiments as the low-rank component and anomalies across these experiments as the sparse component. We show that, by separating these common and abnormal responses, the proposed method is able to find distinct subtypes and classify data sets robustly without any prior knowledge. The proposed method thus provides a new representation of these data sets that has the potential to help users acquire new insight from data.
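
As a rough illustration of the low-rank plus sparse split that the abstract describes, the following is a minimal sketch of standard RPCA (principal component pursuit solved by a basic augmented Lagrangian iteration) in NumPy. It is not the authors' extension, whose details are not given in this record. The function names, the default parameter choices, and the assumed layout (one experiment per column of the input matrix) are illustrative assumptions only.

```python
import numpy as np


def soft_threshold(x, tau):
    """Elementwise shrinkage: proximal operator of tau * L1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def svd_threshold(x, tau):
    """Singular value shrinkage: proximal operator of tau * nuclear norm."""
    u, sing, vt = np.linalg.svd(x, full_matrices=False)
    return (u * soft_threshold(sing, tau)) @ vt


def rpca(m, lam=None, mu=None, tol=1e-7, max_iter=500):
    """Split m into (low_rank, sparse) with m ~= low_rank + sparse.

    Assumption (not from the paper): each column of m holds one experiment,
    so shared structure across experiments shows up as low rank and
    experiment-specific anomalies show up as sparse entries.
    """
    m = np.asarray(m, dtype=float)
    n1, n2 = m.shape
    if lam is None:
        # Common default weight on the sparse term in the RPCA literature.
        lam = 1.0 / np.sqrt(max(n1, n2))
    if mu is None:
        # Common heuristic penalty parameter; guarded against an all-zero input.
        mu = n1 * n2 / (4.0 * max(np.abs(m).sum(), np.finfo(float).eps))
    sparse = np.zeros_like(m)
    dual = np.zeros_like(m)
    norm_m = np.linalg.norm(m)
    for _ in range(max_iter):
        # Update the low-rank part, then the sparse part, then the multiplier.
        low_rank = svd_threshold(m - sparse + dual / mu, 1.0 / mu)
        sparse = soft_threshold(m - low_rank + dual / mu, lam / mu)
        residual = m - low_rank - sparse
        dual = dual + mu * residual
        if np.linalg.norm(residual) <= tol * norm_m:
            break
    return low_rank, sparse
```

Calling `low_rank, sparse = rpca(data)` on a matrix of stacked experiments would return the two components; under the assumed layout, columns with many nonzero entries in `sparse` are candidates for aberrant responses, while `low_rank` captures the response shared across experiments.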

Original language: English (US)
Article number: e0121607
Journal: PLoS One
Volume: 10
Issue number: 4
DOI: 10.1371/journal.pone.0121607
State: Published - Apr 22 2015

Fingerprint

Experiments
Gene expression
Principal component analysis
Time series
Methodology
Cells
Throughput
Engineers
Time series analysis
Cell lines
Datasets

ASJC Scopus subject areas

  • Agricultural and Biological Sciences (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Medicine (all)

Cite this

Disentangling multidimensional spatio-temporal data into their common and aberrant responses. / Chang, Young Hwan; Korkola, James; Amin, Dhara N.; Moasser, Mark M.; Carmena, Jose M.; Gray, Joe; Tomlin, Claire J.

In: PLoS One, Vol. 10, No. 4, e0121607, 22.04.2015.

Chang, Young Hwan ; Korkola, James ; Amin, Dhara N. ; Moasser, Mark M. ; Carmena, Jose M. ; Gray, Joe ; Tomlin, Claire J. / Disentangling multidimensional spatio-temporal data into their common and aberrant responses. In: PLoS One. 2015 ; Vol. 10, No. 4.
@article{5eee5e2cf9ae4680b848edf5326b57f6,
title = "Disentangling multidimensional spatio-temporal data into their common and aberrant responses",
abstract = "With the advent of high-throughput measurement techniques, scientists and engineers are starting to grapple with massive data sets and are encountering challenges in organizing and processing them and in extracting meaningful structure from them. Multidimensional spatiotemporal biological data sets, such as time-series gene expression under various perturbations across different cell lines, or neural spike trains across many experimental trials, have the potential to yield insight into the dynamic behavior of the system. For this potential to be realized, we need a suitable representation of the data. A general question is how to organize the observed data into meaningful structures and how to find an appropriate similarity measure. A natural way of viewing these complex high-dimensional data sets is to examine and analyze the large-scale features first and then to focus on the interesting details. Because the wide range of experiments and the unknown complexity of the underlying system contribute to the heterogeneity of biological data, we propose an extension of Robust Principal Component Analysis (RPCA) that models common variations across multiple experiments as the low-rank component and anomalies across these experiments as the sparse component. We show that, by separating these common and abnormal responses, the proposed method is able to find distinct subtypes and classify data sets robustly without any prior knowledge. The proposed method thus provides a new representation of these data sets that has the potential to help users acquire new insight from data.",
author = "Chang, {Young Hwan} and James Korkola and Amin, {Dhara N.} and Moasser, {Mark M.} and Carmena, {Jose M.} and Joe Gray and Tomlin, {Claire J.}",
year = "2015",
month = "4",
day = "22",
doi = "10.1371/journal.pone.0121607",
language = "English (US)",
volume = "10",
journal = "PLoS One",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "4",

}

TY - JOUR

T1 - Disentangling multidimensional spatio-temporal data into their common and aberrant responses

AU - Chang, Young Hwan

AU - Korkola, James

AU - Amin, Dhara N.

AU - Moasser, Mark M.

AU - Carmena, Jose M.

AU - Gray, Joe

AU - Tomlin, Claire J.

PY - 2015/4/22

Y1 - 2015/4/22

AB - With the advent of high-throughput measurement techniques, scientists and engineers are starting to grapple with massive data sets and are encountering challenges in organizing and processing them and in extracting meaningful structure from them. Multidimensional spatiotemporal biological data sets, such as time-series gene expression under various perturbations across different cell lines, or neural spike trains across many experimental trials, have the potential to yield insight into the dynamic behavior of the system. For this potential to be realized, we need a suitable representation of the data. A general question is how to organize the observed data into meaningful structures and how to find an appropriate similarity measure. A natural way of viewing these complex high-dimensional data sets is to examine and analyze the large-scale features first and then to focus on the interesting details. Because the wide range of experiments and the unknown complexity of the underlying system contribute to the heterogeneity of biological data, we propose an extension of Robust Principal Component Analysis (RPCA) that models common variations across multiple experiments as the low-rank component and anomalies across these experiments as the sparse component. We show that, by separating these common and abnormal responses, the proposed method is able to find distinct subtypes and classify data sets robustly without any prior knowledge. The proposed method thus provides a new representation of these data sets that has the potential to help users acquire new insight from data.

UR - http://www.scopus.com/inward/record.url?scp=84930636297&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84930636297&partnerID=8YFLogxK

U2 - 10.1371/journal.pone.0121607

DO - 10.1371/journal.pone.0121607

M3 - Article

C2 - 25901353

AN - SCOPUS:84930636297

VL - 10

JO - PLoS One

JF - PLoS One

SN - 1932-6203

IS - 4

M1 - e0121607

ER -