DNA Microarray Data Analysis: A Novel Biclustering Algorithm Approach
EURASIP Journal on Advances in Signal Processing volume 2006, Article number: 059809 (2006)
Abstract
Biclustering algorithms form a distinct class of clustering algorithms that cluster the rows and columns of a data matrix simultaneously. Biclustering problems arise in DNA microarray data analysis, collaborative filtering, market research, information retrieval, text mining, electoral trends, exchange analysis, and so forth. In DNA microarray experimental data, for example, the goal of a biclustering algorithm is to find submatrices, that is, subgroups of genes and subgroups of conditions, in which the genes exhibit highly correlated activities for every condition. In this study, we develop novel biclustering algorithms using basic linear algebra and arithmetic tools. The proposed algorithms can search a data set for all biclusters with constant values, biclusters with constant values on rows, biclusters with constant values on columns, and biclusters with coherent values, in a timely manner and without solving any optimization problem. We also show how one of the proposed algorithms can be adapted to identify biclusters with coherent evolution. The algorithms developed in this study discover all valid biclusters of each type, whereas almost all previous biclustering approaches miss some.
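To make the four value-based bicluster types named in the abstract concrete, the following minimal Python sketch classifies a given submatrix by the pattern it satisfies. It is not the algorithm proposed in the article, which is not reproduced here; it only checks commonly used definitions, and it assumes that "coherent values" means the additive model a_ij = mu + alpha_i + beta_j. The function name `bicluster_type` and the tolerance `tol` are hypothetical choices made for this illustration.

```python
import numpy as np


def bicluster_type(sub, tol=1e-9):
    """Return the most specific bicluster pattern that a submatrix satisfies.

    Illustrative only: this checks pattern definitions on one submatrix;
    it does not search a data matrix for biclusters.
    """
    sub = np.asarray(sub, dtype=float)

    # Constant values: every entry is the same number.
    if sub.max() - sub.min() <= tol:
        return "constant"

    # Constant values on rows: no variation within any single row.
    if np.all(sub.max(axis=1) - sub.min(axis=1) <= tol):
        return "constant rows"

    # Constant values on columns: no variation within any single column.
    if np.all(sub.max(axis=0) - sub.min(axis=0) <= tol):
        return "constant columns"

    # Coherent values (ASSUMED additive model a_ij = mu + alpha_i + beta_j):
    # removing row means and column means, then adding back the overall
    # mean, leaves a zero residual exactly when the model holds.
    residual = (
        sub
        - sub.mean(axis=1, keepdims=True)
        - sub.mean(axis=0, keepdims=True)
        + sub.mean()
    )
    if np.max(np.abs(residual)) <= tol:
        return "coherent (additive)"

    return "none"
```

For example, with rows as genes and columns as conditions, `bicluster_type([[1, 2, 4], [3, 4, 6]])` reports an additive coherent pattern because the second row equals the first row shifted by 2. Real expression data are noisy, so a practical check would use a much looser tolerance than the default chosen here.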
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Tchagang, A.B., Tewfik, A.H. DNA Microarray Data Analysis: A Novel Biclustering Algorithm Approach. EURASIP J. Adv. Signal Process. 2006, 059809 (2006). https://doi.org/10.1155/ASP/2006/59809
DOI: https://doi.org/10.1155/ASP/2006/59809
Keywords
- Experimental Data
- Information Technology
- Cluster Algorithm
- Information Retrieval
- Quantum Information