On selecting features from splice junctions: an analysis using information theoretic and machine learning approaches.

Christina L. Zheng, Virginia R. de Sa, Michael Gribskov, T. Murlidharan Nair

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

The computational recognition of precise splice junctions is a challenge in the analysis of newly sequenced genomes, because the distribution of sequence patterns in these regions is not always distinct. Our objective is to understand the sequence signatures at the splice junctions, not simply to create an artificial recognition system. For this purpose we combine a neural network based calliper randomization approach with an information theoretic feature selection approach, aiming to identify regions that harbor information content and to extract features relevant for the prediction of splice junctions. The analysis using the neural network based calliper randomization approach revealed regions important in the internal representation of the network model. The calliper approach captured both correlated and independently important features, whereas the feature selection approach captures features that are independently informative. The two methods can therefore capture features with different properties. Comparative analysis of the results from both methods helps to infer the kind of information present in a region.
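As an illustration of what "independently informative" can mean for a sequence position, the sketch below scores each position of aligned sequences by its mutual information with the class label. This is only a minimal example under stated assumptions: the abstract does not specify which information theoretic measure the authors used, so mutual information is a stand-in, and the function names and the toy sequences are hypothetical.

```python
import math
from collections import Counter


def mutual_information(column, labels):
    """Mutual information (in bits) between one aligned position and the class label."""
    n = len(labels)
    joint_counts = Counter(zip(column, labels))
    base_counts = Counter(column)
    label_counts = Counter(labels)
    mi = 0.0
    for (base, label), count in joint_counts.items():
        p_joint = count / n
        p_base = base_counts[base] / n
        p_label = label_counts[label] / n
        mi += p_joint * math.log2(p_joint / (p_base * p_label))
    return mi


def rank_positions(sequences, labels):
    """Score every position independently and return (ranking, scores)."""
    length = len(sequences[0])
    scores = [
        mutual_information([seq[i] for seq in sequences], labels)
        for i in range(length)
    ]
    ranking = sorted(range(length), key=lambda i: scores[i], reverse=True)
    return ranking, scores


# Hypothetical toy data: 1 = junction window, 0 = non-junction window.
sequences = ["AGGT", "CGGT", "AGCA", "CGCA"]
labels = [1, 1, 0, 0]

ranking, scores = rank_positions(sequences, labels)
for position in ranking:
    print(f"position {position}: {scores[position]:.3f} bits")
```

Because each position is scored on its own, a measure of this kind cannot detect positions that are informative only in combination with others, which is consistent with the abstract's remark that the feature selection approach captures independently informative features.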

Original language: English (US)
Pages (from-to): 73-83
Number of pages: 11
Journal: Genome informatics. International Conference on Genome Informatics
Volume: 14
State: Published - 2003
Externally published: Yes

ASJC Scopus subject areas

  • General Medicine
