Segmentation of DNA into Coding and Noncoding Regions Based on Recursive Entropic Segmentation and Stop-Codon Statistics

Abstract

Heterogeneous DNA sequences can be partitioned into homogeneous domains that are composed of the four nucleotides A, C, G, and T and the stop-codons. We recursively apply a new entropic segmentation method to DNA sequences, using Jensen-Shannon and Jensen-Rényi divergences to find the borders between coding and noncoding DNA regions. We have chosen 12- and 18-symbol alphabets that capture (i) the differential nucleotide composition in codons, and (ii) the differential stop-codon composition along all three phases in both strands of the DNA. The new segmentation method is based on the Jensen-Rényi divergence measure, nucleotide statistics, and stop-codon statistics in both DNA strands. The recursive segmentation process requires no prior training on known datasets. For three entire bacterial genomes, we find that the use of nucleotide composition, stop-codon composition, and the Jensen-Rényi divergence improves the accuracy of finding the borders between coding and noncoding regions in DNA sequences.
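To make the idea of recursive entropic segmentation concrete, the following is a minimal sketch and not the authors' implementation. It assumes the plain four-letter nucleotide alphabet rather than the 12- and 18-symbol alphabets, uses only the Jensen-Shannon divergence, and stops on a fixed threshold instead of a statistical criterion. The names `shannon_entropy`, `js_divergence_at`, `best_cut`, `segment`, `threshold`, and `min_len` are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def shannon_entropy(counts, total):
    """Shannon entropy in bits of a table of symbol counts."""
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)


def js_divergence_at(seq, i):
    """Length-weighted Jensen-Shannon divergence between seq[:i] and seq[i:]."""
    n = len(seq)
    left, right = Counter(seq[:i]), Counter(seq[i:])
    whole = left + right
    return (shannon_entropy(whole, n)
            - (i / n) * shannon_entropy(left, i)
            - ((n - i) / n) * shannon_entropy(right, n - i))


def best_cut(seq):
    """Return (position, divergence) of the split with maximal divergence.

    Recounting symbols at every position costs O(n^2); acceptable for a sketch.
    """
    return max(((i, js_divergence_at(seq, i)) for i in range(1, len(seq))),
               key=lambda t: t[1])


def segment(seq, threshold, offset=0, min_len=2):
    """Recursively split seq and return the sorted cut positions.

    Assumption: recursion stops when the best divergence falls below a fixed
    threshold or when the segment is shorter than 2 * min_len.
    """
    if len(seq) < 2 * min_len:
        return []
    pos, divergence = best_cut(seq)
    if divergence < threshold:
        return []
    return (segment(seq[:pos], threshold, offset, min_len)
            + [offset + pos]
            + segment(seq[pos:], threshold, offset + pos, min_len))
```

For a toy sequence of 20 A's followed by 20 C's, `segment("A" * 20 + "C" * 20, threshold=0.5)` places a single cut at position 20, while a homogeneous sequence yields no cuts. The threshold value here is arbitrary and would need to be replaced by a principled stopping rule on real data.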

Author information

Corresponding author

Correspondence to Daniel Nicorici.

About this article

Cite this article

Nicorici, D., Astola, J. Segmentation of DNA into Coding and Noncoding Regions Based on Recursive Entropic Segmentation and Stop-Codon Statistics. EURASIP J. Adv. Signal Process. 2004, 832471 (2004). https://doi.org/10.1155/S1110865704309212

Keywords

  • recursive segmentation
  • DNA sequence
  • information divergence measures
  • statistics of stop-codons
  • Bayesian information criterion