Abstract
One-dimensional intervals incremental inverted index (Di4) is a multi-resolution, single-dimension indexing framework for efficient, scalable, and extensible computation of genomic interval expressions. The framework has a tri-layer architecture: the semantic layer provides orthogonal and generic means (including support for user-defined functions) of sense-making and higher-level reasoning from region-based datasets; the logical layer provides building blocks for region calculus and topological relations between intervals; the physical layer abstracts from persistence technology and makes the model adaptable to a variety of persistence technologies, spanning from small-scale (e.g., B+tree) to large-scale (e.g., LevelDB). The extensibility of Di4 to application scenarios is shown with an example of comparative evaluation of ChIP-seq and DNase-Seq replicates. Performance of Di4 is benchmarked at small and large scale under common bioinformatics application scenarios. Di4 is freely available from https://genometric.github.io/Di4.
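To make the idea of an inverted index over one-dimensional intervals more concrete, the following is a minimal sketch and not Di4's actual data structure or API. It assumes half-open `(start, stop)` intervals, keys the index by endpoint coordinate, and answers one region-calculus style query: which regions are covered by at least a given number of intervals. The names `build_index` and `cover` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_index(intervals):
    """Map each endpoint coordinate to the ids of intervals that start or stop there.

    Assumption for this sketch: intervals are half-open (start, stop) pairs,
    and an interval's id is its position in the input list.
    """
    index = defaultdict(lambda: {"start": [], "stop": []})
    for interval_id, (start, stop) in enumerate(intervals):
        if start >= stop:
            raise ValueError(f"empty or inverted interval: {(start, stop)}")
        index[start]["start"].append(interval_id)
        index[stop]["stop"].append(interval_id)
    # Keep coordinates in ascending order so queries can sweep left to right.
    return dict(sorted(index.items(), key=lambda item: item[0]))


def cover(index, min_depth):
    """Return maximal regions covered by at least min_depth intervals."""
    regions = []
    depth = 0            # number of intervals open at the current coordinate
    region_start = None  # start of the region currently being reported, if any
    for coordinate, events in index.items():
        depth += len(events["start"]) - len(events["stop"])
        if depth >= min_depth and region_start is None:
            region_start = coordinate
        elif depth < min_depth and region_start is not None:
            regions.append((region_start, coordinate))
            region_start = None
    return regions


if __name__ == "__main__":
    peaks = [(10, 50), (30, 70), (40, 60), (90, 100)]
    idx = build_index(peaks)
    print(cover(idx, 2))
```

Because the sketch only needs keys iterated in sorted order, the in-memory dictionary could in principle be swapped for any ordered key-value store, which is the kind of separation the physical layer described above is meant to provide.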
Original language | English (US) |
---|---|
Article number | 8468044 |
Pages (from-to) | 2008-2021 |
Number of pages | 14 |
Journal | IEEE Transactions on Knowledge and Data Engineering |
Volume | 31 |
Issue number | 10 |
State | Published - Oct 1 2019 |
Keywords
- Index structures
- Efficient query processing
- Genomic data management
ASJC Scopus subject areas
- Information Systems
- Computer Science Applications
- Computational Theory and Mathematics