On selecting features from splice junctions: an analysis using information theoretic and machine learning approaches.

Christina L. Zheng; Virginia R. de Sa; Michael Gribskov; T. Murlidharan Nair

Research output: Contribution to journal › Article › peer-review

Abstract

The computational recognition of precise splice junctions is a challenge in the analysis of newly sequenced genomes, because the distribution of sequence patterns in these regions is not always distinct. Our objective is to understand the sequence signatures at splice junctions, not simply to build an artificial recognition system. To this end, we combine a neural-network-based calliper randomization approach with an information-theoretic feature selection approach, with the aim of identifying regions that harbor information content and extracting features relevant to the prediction of splice junctions. The calliper randomization analysis revealed regions that are important in the internal representation of the network model, and it captured both correlated and independently important features. The feature selection approach captures features that are independently informative. Because the two methods capture features with different properties, a comparative analysis of their results helps to infer the kind of information present in a region.
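The abstract does not give implementation details, so the following is only an illustrative sketch under our own assumptions, not the authors' method. It assumes that "calliper randomization" means replacing a contiguous window of input positions with random bases and measuring the resulting drop in a trained model's accuracy. It also uses per-position mutual information as a stand-in for the information-theoretic feature selection. The names `mutual_information`, `calliper_importance`, `xor_rule` and `make_toy_data` are hypothetical, and the XOR toy rule stands in for a trained neural network. The sketch shows why the two approaches can disagree: two positions that are informative only jointly get near-zero mutual information individually, yet randomizing a window that covers them still hurts the model.

```python
import math
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

ALPHABET = "ACGT"


def mutual_information(seqs: Sequence[str], labels: Sequence[int], pos: int) -> float:
    """Empirical mutual information I(X_pos; Y) in bits between the base at `pos` and the label."""
    n = len(seqs)
    joint = Counter((s[pos], y) for s, y in zip(seqs, labels))
    p_x = Counter(s[pos] for s in seqs)
    p_y = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((p_x[x] / n) * (p_y[y] / n)))
    return mi


def calliper_importance(
    model: Callable[[str], int],
    seqs: Sequence[str],
    labels: Sequence[int],
    start: int,
    width: int,
    rng: random.Random,
) -> float:
    """Accuracy drop when positions [start, start + width) are replaced by random bases."""

    def accuracy(batch: Sequence[str]) -> float:
        return sum(model(s) == y for s, y in zip(batch, labels)) / len(labels)

    randomized: List[str] = []
    for s in seqs:
        chars = list(s)
        for i in range(start, min(start + width, len(s))):
            chars[i] = rng.choice(ALPHABET)  # scramble only the window (the "calliper")
        randomized.append("".join(chars))
    return accuracy(seqs) - accuracy(randomized)


def xor_rule(s: str) -> int:
    """Hypothetical stand-in for a trained network: the label depends jointly on positions 1 and 2."""
    return int((s[1] in "AG") != (s[2] in "AG"))


def make_toy_data(n: int, length: int, rng: random.Random) -> Tuple[List[str], List[int]]:
    """Uniformly random sequences labelled by xor_rule (toy data, not real splice sites)."""
    seqs = ["".join(rng.choice(ALPHABET) for _ in range(length)) for _ in range(n)]
    return seqs, [xor_rule(s) for s in seqs]


if __name__ == "__main__":
    seqs, labels = make_toy_data(2000, 6, random.Random(0))
    for pos in range(6):
        print(f"MI at position {pos}: {mutual_information(seqs, labels, pos):.4f} bits")
    for start in (0, 1, 2, 3, 4):
        drop = calliper_importance(xor_rule, seqs, labels, start, 2, random.Random(1))
        print(f"calliper window [{start}, {start + 2}): accuracy drop {drop:.3f}")
```

With this toy rule, positions 1 and 2 each carry near-zero mutual information with the label. Even so, any window that covers either of them drops accuracy to about chance, while windows over positions 3 to 5 leave accuracy unchanged. A real analysis would instead use a trained model and held-out sequences.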

Original language	English (US)
Pages (from-to)	73-83
Number of pages	11
Journal	Genome informatics. International Conference on Genome Informatics
Volume	14
State	Published - 2003
Externally published	Yes

ASJC Scopus subject areas

General Medicine

Cite this

@article{ec4185054cfd41f1bd6a54a9135f7238,
title = "On selecting features from splice junctions: an analysis using information theoretic and machine learning approaches.",
abstract = "The computational recognition of precise splice junctions is a challenge in the analysis of newly sequenced genomes, because the distribution of sequence patterns in these regions is not always distinct. Our objective is to understand the sequence signatures at splice junctions, not simply to build an artificial recognition system. To this end, we combine a neural-network-based calliper randomization approach with an information-theoretic feature selection approach, with the aim of identifying regions that harbor information content and extracting features relevant to the prediction of splice junctions. The calliper randomization analysis revealed regions that are important in the internal representation of the network model, and it captured both correlated and independently important features. The feature selection approach captures features that are independently informative. Because the two methods capture features with different properties, a comparative analysis of their results helps to infer the kind of information present in a region.",
author = "Zheng, {Christina L.} and {de Sa}, {Virginia R.} and Gribskov, Michael and Nair, {T. Murlidharan}",
year = "2003",
language = "English (US)",
volume = "14",
pages = "73--83",
journal = "Genome informatics. International Conference on Genome Informatics",
issn = "0919-9454",
publisher = "Universal Academy Press",
}

TY  - JOUR
T1  - On selecting features from splice junctions: an analysis using information theoretic and machine learning approaches
AU  - Zheng, Christina L.
AU  - de Sa, Virginia R.
AU  - Gribskov, Michael
AU  - Nair, T. Murlidharan
PY  - 2003
Y1  - 2003
N2  - The computational recognition of precise splice junctions is a challenge in the analysis of newly sequenced genomes, because the distribution of sequence patterns in these regions is not always distinct. Our objective is to understand the sequence signatures at splice junctions, not simply to build an artificial recognition system. To this end, we combine a neural-network-based calliper randomization approach with an information-theoretic feature selection approach, with the aim of identifying regions that harbor information content and extracting features relevant to the prediction of splice junctions. The calliper randomization analysis revealed regions that are important in the internal representation of the network model, and it captured both correlated and independently important features. The feature selection approach captures features that are independently informative. Because the two methods capture features with different properties, a comparative analysis of their results helps to infer the kind of information present in a region.
AB  - The computational recognition of precise splice junctions is a challenge in the analysis of newly sequenced genomes, because the distribution of sequence patterns in these regions is not always distinct. Our objective is to understand the sequence signatures at splice junctions, not simply to build an artificial recognition system. To this end, we combine a neural-network-based calliper randomization approach with an information-theoretic feature selection approach, with the aim of identifying regions that harbor information content and extracting features relevant to the prediction of splice junctions. The calliper randomization analysis revealed regions that are important in the internal representation of the network model, and it captured both correlated and independently important features. The feature selection approach captures features that are independently informative. Because the two methods capture features with different properties, a comparative analysis of their results helps to infer the kind of information present in a region.
UR  - http://www.scopus.com/inward/record.url?scp=14944355870&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=14944355870&partnerID=8YFLogxK
M3  - Article
C2  - 15706522
AN  - SCOPUS:14944355870
SN  - 0919-9454
VL  - 14
SP  - 73
EP  - 83
JO  - Genome informatics. International Conference on Genome Informatics
JF  - Genome informatics. International Conference on Genome Informatics
ER  - 
