Open Access

Segmentation of DNA into Coding and Noncoding Regions Based on Recursive Entropic Segmentation and Stop-Codon Statistics

EURASIP Journal on Advances in Signal Processing 2004, 2004:832471

DOI: 10.1155/S1110865704309212

Received: 28 February 2003

Published: 21 January 2004

Abstract

Heterogeneous DNA sequences can be partitioned into homogeneous domains that are composed of the four nucleotides A, C, G, and T and the stop-codons. We recursively apply a new entropic segmentation method to DNA sequences, using the Jensen-Shannon and Jensen-Rényi divergences to find the borders between coding and noncoding DNA regions. We have chosen 12- and 18-symbol alphabets that capture (i) the differential nucleotide composition in codons and (ii) the differential stop-codon composition across all three phases in both strands of the DNA. The new segmentation method is based on the Jensen-Rényi divergence measure, nucleotide statistics, and stop-codon statistics in both DNA strands. The recursive segmentation process requires no prior training on known datasets. For three entire bacterial genomes, we find that the use of nucleotide composition, stop-codon composition, and the Jensen-Rényi divergence improves the accuracy of finding the borders between coding and noncoding regions in DNA sequences.
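The recursive entropic segmentation idea summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it uses only the plain four-letter nucleotide alphabet rather than the 12- and 18-symbol alphabets, and it stops recursing on a fixed divergence threshold, which is an assumption made here for brevity. The names `entropy`, `best_cut`, `segment`, `threshold`, and `min_len` are hypothetical. With `alpha=1.0` the divergence is Jensen-Shannon; any other positive `alpha` gives the Jensen-Rényi form.

```python
import math
from collections import Counter


def entropy(counts, total, alpha=1.0):
    """Entropy in bits of a count table: Shannon if alpha == 1, else Renyi of order alpha."""
    if total == 0:
        return 0.0
    probs = [c / total for c in counts.values() if c > 0]
    if alpha == 1.0:
        return -sum(p * math.log2(p) for p in probs)
    return math.log2(sum(p ** alpha for p in probs)) / (1.0 - alpha)


def best_cut(seq, alpha=1.0):
    """Return (position, divergence) of the cut that maximizes the
    length-weighted Jensen-Shannon (alpha == 1) or Jensen-Renyi divergence
    between the left and right parts. Position is None if no cut scores > 0."""
    n = len(seq)
    total = Counter(seq)
    h_all = entropy(total, n, alpha)
    left = Counter()
    best_pos, best_div = None, 0.0
    for i in range(1, n):
        left[seq[i - 1]] += 1
        right = total - left  # Counter subtraction keeps only positive counts
        div = (h_all
               - (i / n) * entropy(left, i, alpha)
               - ((n - i) / n) * entropy(right, n - i, alpha))
        if div > best_div:
            best_pos, best_div = i, div
    return best_pos, best_div


def segment(seq, offset=0, alpha=1.0, threshold=0.1, min_len=10):
    """Recursively split seq and return the sorted list of border positions.
    The fixed threshold is an illustrative stopping rule (an assumption)."""
    if len(seq) < 2 * min_len:
        return []
    pos, div = best_cut(seq, alpha)
    if pos is None or div < threshold:
        return []
    if pos < min_len or len(seq) - pos < min_len:
        return []
    return (segment(seq[:pos], offset, alpha, threshold, min_len)
            + [offset + pos]
            + segment(seq[pos:], offset + pos, alpha, threshold, min_len))
```

For example, `segment("A" * 50 + "C" * 50)` finds a single border at position 50, where the two homogeneous halves meet. The keywords mention the Bayesian information criterion, so a model-selection criterion of that kind could take the place of the fixed threshold used here.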

Keywords

recursive segmentation; DNA sequence; information divergence measures; statistics of stop-codons; Bayesian information criterion

Authors’ Affiliations

(1) Tampere International Center for Signal Processing, Tampere University of Technology

Copyright

© Nicorici and Astola 2004