DNA Microarray Data Analysis: A Novel Biclustering Algorithm Approach

Tchagang, Alain B.; Tewfik, Ahmed H.

doi:10.1155/ASP/2006/59809

Research Article
Open access
Published: 01 December 2006

EURASIP Journal on Advances in Signal Processing, volume 2006, Article number: 059809 (2006)

Abstract

Biclustering algorithms are a distinct class of clustering algorithms that cluster rows and columns simultaneously. Biclustering problems arise in DNA microarray data analysis, collaborative filtering, market research, information retrieval, text mining, electoral trends, exchange analysis, and other fields. For DNA microarray experimental data, for example, the goal of a biclustering algorithm is to find submatrices, that is, subgroups of genes and subgroups of conditions, in which the genes exhibit highly correlated activities across every condition in the subgroup. In this study, we develop novel biclustering algorithms using basic linear algebra and arithmetic tools. The proposed algorithms can search a data set for all biclusters with constant values, biclusters with constant values on rows, biclusters with constant values on columns, and biclusters with coherent values, in a timely manner and without solving any optimization problem. We also show how one of the proposed algorithms can be adapted to identify biclusters with coherent evolution. The algorithms developed in this study discover all valid biclusters of each type, whereas almost all previous biclustering approaches miss some.
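
The bicluster types named in the abstract can be made concrete with a small sketch. The Python below is not the authors' algorithm: it does not search for biclusters, it only checks whether a given submatrix satisfies each definition as it is commonly stated in the biclustering literature. The function names, the additive model used for coherent values, the sign-of-change test used for coherent evolution, and the toy matrix are all assumptions made for illustration.

```python
def submatrix(data, rows, cols):
    """Select the given rows (genes) and columns (conditions) of `data`."""
    return [[data[r][c] for c in cols] for r in rows]


def is_constant(block):
    """Constant values: every entry equals the same number."""
    first = block[0][0]
    return all(v == first for row in block for v in row)


def is_constant_rows(block):
    """Constant values on rows: each row repeats a single value."""
    return all(v == row[0] for row in block for v in row)


def is_constant_columns(block):
    """Constant values on columns: each column repeats a single value."""
    return all(row[j] == block[0][j] for row in block for j in range(len(row)))


def is_coherent_additive(block, tol=1e-9):
    """Coherent values under an assumed additive model a_ij = mu + alpha_i + beta_j.

    Equivalently, every row equals the first row plus a row-specific offset.
    """
    base = block[0]
    for row in block:
        shift = row[0] - base[0]
        if any(abs((v - b) - shift) > tol for v, b in zip(row, base)):
            return False
    return True


def _sign(x):
    return (x > 0) - (x < 0)


def is_coherent_evolution(block):
    """Coherent evolution, taken here (an assumption) to mean that all rows
    rise, fall, or stay flat together between consecutive columns,
    regardless of magnitude."""
    pattern = [_sign(b - a) for a, b in zip(block[0], block[0][1:])]
    return all(
        [_sign(b - a) for a, b in zip(row, row[1:])] == pattern for row in block
    )


# Hypothetical toy expression matrix: rows are genes, columns are conditions.
data = [
    [1, 2, 4, 9],
    [3, 4, 6, 0],
    [6, 7, 9, 5],
    [2, 2, 2, 7],
]

# Genes 0-2 under conditions 0-2 differ only by a per-gene offset.
block = submatrix(data, rows=[0, 1, 2], cols=[0, 1, 2])
print(is_coherent_additive(block), is_constant(block))
```

In this toy matrix the selected block is coherent under the additive model but not constant, which shows why the types have to be tested separately: a constant bicluster is a special case of every other type, while the converse does not hold.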

Author information

Authors and Affiliations

Department of Biomedical Engineering, Institute of Technology, University of Minnesota, 312 Church Street SE, Minneapolis, MN, 55455, USA
Alain B. Tchagang
Department of Electrical and Computer Engineering, Institute of Technology, University of Minnesota, 200 Union Street SE, Minneapolis, MN, 55455, USA
Ahmed H. Tewfik

Corresponding author

Correspondence to Alain B. Tchagang.

Rights and permissions

Open Access. This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Tchagang, A.B., Tewfik, A.H. DNA Microarray Data Analysis: A Novel Biclustering Algorithm Approach. EURASIP J. Adv. Signal Process. 2006, 059809 (2006). https://doi.org/10.1155/ASP/2006/59809

Received: 15 May 2005
Revised: 05 October 2005
Accepted: 01 December 2005
Published: 01 December 2006
DOI: https://doi.org/10.1155/ASP/2006/59809
