Next Generation Indexing for Genomic Intervals

Vahid Jalili, Matteo Matteucci, Jeremy Goecks, Yashar Deldjoo, Stefano Ceri

Research output: Contribution to journal › Article

1 Scopus citation


Di4 (1D intervals incremental inverted index) is a multi-resolution, single-dimension indexing framework for efficient, scalable, and extensible computation of genomic interval expressions. The framework has a tri-layer architecture: the semantic layer provides orthogonal and generic means (including support for user-defined functions) for sense-making and higher-level reasoning over region-based datasets; the logical layer provides building blocks for region calculus and for topological relations between intervals; and the physical layer abstracts away the persistence technology, making the model adaptable to a variety of back ends, from small-scale (e.g., B+tree) to large-scale (e.g., LevelDB). The extensibility of Di4 to new application scenarios is demonstrated with a comparative evaluation of ChIP-seq and DNase-seq replicates. The performance of Di4 is benchmarked at both small and large scale under common bioinformatics workloads. Di4 is freely available from
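The abstract does not describe Di4's data layout, so the following is only a minimal Python sketch, under our own assumptions, of the general idea behind an incremental inverted index over 1D intervals. Keys are the coordinates where intervals begin or end; each key stores which intervals start or end there and how many intervals cover the segment up to the next key. A `cover` query then returns regions supported by at least `min_acc` intervals, for example peaks shared by several replicates. The names `Snapshot`, `IncrementalIntervalIndex`, `add`, and `cover` are hypothetical and are not Di4's API, and a sorted Python list stands in for a B+tree or LevelDB back end.

```python
from bisect import bisect_left, insort
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    """Value stored under one coordinate key (hypothetical layout, not Di4's)."""
    starts: list = field(default_factory=list)  # ids of intervals starting at this key
    ends: list = field(default_factory=list)    # ids of intervals ending at this key
    accumulation: int = 0  # number of intervals covering [this key, next key)


class IncrementalIntervalIndex:
    """Toy in-memory index; a sorted list plus a dict stand in for an ordered key-value store."""

    def __init__(self):
        self._keys = []       # sorted coordinates (the ordered key space)
        self._snapshots = {}  # coordinate -> Snapshot

    def _get_or_split(self, pos):
        # A new key splits an existing segment, so it inherits the
        # accumulation of its predecessor (0 if it has none).
        if pos not in self._snapshots:
            i = bisect_left(self._keys, pos)
            inherited = self._snapshots[self._keys[i - 1]].accumulation if i > 0 else 0
            insort(self._keys, pos)
            self._snapshots[pos] = Snapshot(accumulation=inherited)
        return self._snapshots[pos]

    def add(self, interval_id, start, stop):
        """Index the half-open interval [start, stop) incrementally."""
        if start >= stop:
            raise ValueError("start must be smaller than stop")
        self._get_or_split(start).starts.append(interval_id)
        self._get_or_split(stop).ends.append(interval_id)
        # Only the keys inside [start, stop) gain one covering interval.
        lo = bisect_left(self._keys, start)
        hi = bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            self._snapshots[key].accumulation += 1

    def cover(self, min_acc):
        """Return maximal half-open regions covered by at least `min_acc` intervals."""
        if min_acc < 1:
            raise ValueError("min_acc must be at least 1")
        regions, region_start = [], None
        for key in self._keys:  # one ordered scan over the key space
            acc = self._snapshots[key].accumulation
            if acc >= min_acc and region_start is None:
                region_start = key
            elif acc < min_acc and region_start is not None:
                regions.append((region_start, key))
                region_start = None
        return regions


if __name__ == "__main__":
    # Hypothetical peaks from three replicates of the same experiment.
    idx = IncrementalIntervalIndex()
    idx.add("rep1_peak", 100, 200)
    idx.add("rep2_peak", 150, 250)
    idx.add("rep3_peak", 180, 300)
    print(idx.cover(2))  # -> [(150, 250)]
    print(idx.cover(3))  # -> [(180, 200)]
```

Keeping a running accumulation per key lets a coverage query be answered with a single ordered scan, at the cost of updating every key inside an interval on insertion; whether Di4 makes the same trade-off is not stated in the abstract.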

Original language: English (US)
Journal: IEEE Transactions on Knowledge and Data Engineering
State: Accepted/In press - Sep 18 2018


  • Bioinformatics
  • Calculus
  • DNA
  • efficient query processing
  • genomic data management
  • Genomics
  • Index structures
  • Indexing
  • Tools

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Computational Theory and Mathematics
