A multitask multiple kernel learning formulation for discriminating early- and late-stage cancers

Arezou Rahimi; Mehmet Gonen

doi:10.1093/bioinformatics/btaa168

Biomedical Engineering

Research output: Contribution to journal › Article › peer-review

10 Scopus citations

Abstract

Motivation: Genomic information is increasingly being used in diagnosis, prognosis and treatment of cancer. The severity of the disease is usually measured by the tumor stage. Therefore, identifying pathways playing an important role in progression of the disease stage is of great interest. Given that there are similarities in the underlying mechanisms of different cancers, in addition to the considerable correlation in the genomic data, there is a need for machine learning methods that can take these aspects of genomic data into account. Furthermore, using machine learning for studying multiple cancer cohorts together with a collection of molecular pathways creates an opportunity for knowledge extraction.

Results: We studied the problem of discriminating early- and late-stage tumors of several cancers using genomic information while enforcing interpretability on the solutions. To this end, we developed a multitask multiple kernel learning (MTMKL) method with a co-clustering step based on a cutting-plane algorithm to identify the relationships between the input tasks and kernels. We tested our algorithm on 15 cancer cohorts and observed that, in most cases, MTMKL outperforms other algorithms (including random forests, support vector machine and single-task multiple kernel learning) in terms of predictive power. Using the aggregate results from multiple replications, we also derived similarity matrices between cancer cohorts, which are, in many cases, in agreement with available relationships reported in the relevant literature.

Availability and implementation: Our implementations of support vector machine and multiple kernel learning algorithms in R are available at https://github.com/arezourahimi/mtgsbc together with the scripts that replicate the reported experiments.

Original language	English (US)
Pages (from-to)	3766-3772
Number of pages	7
Journal	Bioinformatics
Volume	36
Issue number	12
DOIs	https://doi.org/10.1093/bioinformatics/btaa168
State	Published - Mar 31 2020

ASJC Scopus subject areas

Statistics and Probability
Biochemistry
Molecular Biology
Computer Science Applications
Computational Theory and Mathematics
Computational Mathematics

Cite this

@article{0f8210ad4a9840aea36f31e6c3133d6f,

title = "A multitask multiple kernel learning formulation for discriminating early- and late-stage cancers",

abstract = "Motivation: Genomic information is increasingly being used in diagnosis, prognosis and treatment of cancer. The severity of the disease is usually measured by the tumor stage. Therefore, identifying pathways playing an important role in progression of the disease stage is of great interest. Given that there are similarities in the underlying mechanisms of different cancers, in addition to the considerable correlation in the genomic data, there is a need for machine learning methods that can take these aspects of genomic data into account. Furthermore, using machine learning for studying multiple cancer cohorts together with a collection of molecular pathways creates an opportunity for knowledge extraction. Results: We studied the problem of discriminating early- and late-stage tumors of several cancers using genomic information while enforcing interpretability on the solutions. To this end, we developed a multitask multiple kernel learning (MTMKL) method with a co-clustering step based on a cutting-plane algorithm to identify the relationships between the input tasks and kernels. We tested our algorithm on 15 cancer cohorts and observed that, in most cases, MTMKL outperforms other algorithms (including random forests, support vector machine and single-task multiple kernel learning) in terms of predictive power. Using the aggregate results from multiple replications, we also derived similarity matrices between cancer cohorts, which are, in many cases, in agreement with available relationships reported in the relevant literature. Availability and implementation: Our implementations of support vector machine and multiple kernel learning algorithms in R are available at https://github.com/arezourahimi/mtgsbc together with the scripts that replicate the reported experiments.",

author = "Arezou Rahimi and Mehmet Gonen",

year = "2020",

month = mar,

day = "31",

doi = "10.1093/bioinformatics/btaa168",

language = "English (US)",

volume = "36",

pages = "3766--3772",

journal = "Bioinformatics",

issn = "1367-4803",

publisher = "Oxford University Press",

number = "12",

}

TY - JOUR

T1 - A multitask multiple kernel learning formulation for discriminating early- and late-stage cancers

AU - Rahimi, Arezou

AU - Gonen, Mehmet

PY - 2020/3/31

Y1 - 2020/3/31

AB - Motivation: Genomic information is increasingly being used in diagnosis, prognosis and treatment of cancer. The severity of the disease is usually measured by the tumor stage. Therefore, identifying pathways playing an important role in progression of the disease stage is of great interest. Given that there are similarities in the underlying mechanisms of different cancers, in addition to the considerable correlation in the genomic data, there is a need for machine learning methods that can take these aspects of genomic data into account. Furthermore, using machine learning for studying multiple cancer cohorts together with a collection of molecular pathways creates an opportunity for knowledge extraction. Results: We studied the problem of discriminating early- and late-stage tumors of several cancers using genomic information while enforcing interpretability on the solutions. To this end, we developed a multitask multiple kernel learning (MTMKL) method with a co-clustering step based on a cutting-plane algorithm to identify the relationships between the input tasks and kernels. We tested our algorithm on 15 cancer cohorts and observed that, in most cases, MTMKL outperforms other algorithms (including random forests, support vector machine and single-task multiple kernel learning) in terms of predictive power. Using the aggregate results from multiple replications, we also derived similarity matrices between cancer cohorts, which are, in many cases, in agreement with available relationships reported in the relevant literature. Availability and implementation: Our implementations of support vector machine and multiple kernel learning algorithms in R are available at https://github.com/arezourahimi/mtgsbc together with the scripts that replicate the reported experiments.

UR - http://www.scopus.com/inward/record.url?scp=85083521479&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85083521479&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btaa168

DO - 10.1093/bioinformatics/btaa168

M3 - Article

C2 - 32163111

AN - SCOPUS:85083521479

SN - 1367-4803

VL - 36

SP - 3766

EP - 3772

JO - Bioinformatics

JF - Bioinformatics

IS - 12

ER -
