Open Access

DNA Microarray Data Analysis: A Novel Biclustering Algorithm Approach

EURASIP Journal on Advances in Signal Processing 2006, 2006:059809

https://doi.org/10.1155/ASP/2006/59809

Received: 15 May 2005

Accepted: 1 December 2005

Published: 6 June 2006

Abstract

Biclustering algorithms are a distinct class of clustering algorithms that cluster the rows and columns of a data matrix simultaneously. Biclustering problems arise in DNA microarray data analysis, collaborative filtering, market research, information retrieval, text mining, electoral trends, exchange analysis, and so forth. When dealing with DNA microarray experimental data, for example, the goal of biclustering algorithms is to find submatrices, that is, subgroups of genes and subgroups of conditions, where the genes exhibit highly correlated activities for every condition. In this study, we develop novel biclustering algorithms using basic linear algebra and arithmetic tools. The proposed biclustering algorithms can be used to search a data set for all biclusters with constant values, biclusters with constant values on rows, biclusters with constant values on columns, and biclusters with coherent values, in a timely manner and without solving any optimization problem. We also show how one of the proposed biclustering algorithms can be adapted to identify biclusters with coherent evolution. The algorithms developed in this study discover all valid biclusters of each type, whereas almost all previous biclustering approaches miss some.

[1-21]
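The four bicluster types named in the abstract can be illustrated with a short check on a candidate submatrix. The following is a minimal sketch, not the algorithm developed in this study: it assumes the additive model commonly used for coherent values (each entry equals a background value plus a row offset plus a column offset), and the function name `classify_bicluster`, the tolerance `tol`, and the toy matrix are hypothetical choices made for illustration only.

```python
import numpy as np


def classify_bicluster(sub, tol=1e-9):
    """Return the most specific bicluster type a submatrix satisfies.

    Illustrative only: this tests a given submatrix; it does not search
    a data matrix for biclusters.
    """
    sub = np.asarray(sub, dtype=float)
    if sub.ndim != 2 or sub.size == 0:
        raise ValueError("expected a non-empty 2-D submatrix")

    # Constant values: every entry is the same.
    if sub.max() - sub.min() <= tol:
        return "constant values"

    # Constant values on rows: each row is constant across its columns.
    if np.all(sub.max(axis=1) - sub.min(axis=1) <= tol):
        return "constant rows"

    # Constant values on columns: each column is constant across its rows.
    if np.all(sub.max(axis=0) - sub.min(axis=0) <= tol):
        return "constant columns"

    # Coherent values (assumed additive model a_ij = mu + alpha_i + beta_j):
    # the residue a_ij - rowmean_i - colmean_j + mean vanishes everywhere.
    residue = (
        sub
        - sub.mean(axis=1, keepdims=True)
        - sub.mean(axis=0, keepdims=True)
        + sub.mean()
    )
    if np.max(np.abs(residue)) <= tol:
        return "coherent values (additive)"

    return "none"


# Toy expression matrix: rows are genes, columns are conditions.
data = np.array([
    [1.0, 5.0, 0.0, 7.0],
    [9.0, 3.0, 8.0, 2.0],
    [4.0, 8.0, 1.0, 10.0],
])

# Select genes 0 and 2 under conditions 1 and 3.
candidate = data[np.ix_([0, 2], [1, 3])]
label = classify_bicluster(candidate)  # "coherent values (additive)"
```

In the toy matrix, genes 0 and 2 differ by a constant offset of 3 under conditions 1 and 3, so the selected submatrix fits the additive coherent model even though no row or column of it is constant.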

Authors’ Affiliations

(1) Department of Biomedical Engineering, Institute of Technology, University of Minnesota
(2) Department of Electrical and Computer Engineering, Institute of Technology, University of Minnesota

References

  1. Hartigan JA: Direct clustering of a data matrix. Journal of the American Statistical Association 1972, 67(337):123-129. 10.2307/2284710
  2. Cheng Y, Church GM: Biclustering of expression data. Proceedings of the 8th International Conference on Intelligent Systems for Molecular Biology (ISMB '00), August 2000, La Jolla, Calif, USA 93-103.
  3. Tanay A, Sharan R, Shamir R: Discovering statistically significant biclusters in gene expression data. Bioinformatics 2002, 18(supplement 1):S136-S144.
  4. Getz G, Levine E, Domany E: Coupled two-way clustering analysis of gene microarray data. Proceedings of the National Academy of Sciences of the United States of America 2000, 97(22):12079-12084. 10.1073/pnas.210134797
  5. Lazzeroni L, Owen A: Plaid models for gene expression data. Statistica Sinica 2002, 12(1):61-86.
  6. Ben-Dor A, Chor B, Karp R, Yakhini Z: Discovering local structure in gene expression data: the order-preserving submatrix problem. Proceedings of the 6th Annual International Conference on Computational Biology (RECOMB '02), April 2002, Washington, DC, USA 49-57.
  7. Sharan R, Maron-Katz A, Shamir R: CLICK and EXPANDER: a system for clustering and visualizing gene expression data. Bioinformatics 2003, 19(14):1787-1799. 10.1093/bioinformatics/btg232
  8. Kluger Y, Basri R, Chang JT, Gerstein M: Spectral biclustering of microarray data: coclustering genes and conditions. Genome Research 2003, 13(4):703-716. 10.1101/gr.648603
  9. Yang J, Wang H, Wang W, Yu PS: Enhanced biclustering on expression data. Proceedings of 3rd IEEE Symposium on Bioinformatics and Bioengineering (BIBE '03), March 2003, Bethesda, Md, USA 321-327.
  10. Madeira SC, Oliveira AL: Biclustering algorithms for biological data analysis: a survey. IEEE Transactions on Computational Biology and Bioinformatics 2004, 1(1):24-45. 10.1109/TCBB.2004.2
  11. Alter O, Brown PO, Botstein D: Processing and modeling genome-wide expression data using singular value decomposition. Microarrays: Optical Technologies and Informatics, January 2001, San Jose, Calif, USA, Proceedings of SPIE 4266:171-186.
  12. Troyanskaya O, Cantor M, Sherlock G, et al.: Missing value estimation methods for DNA microarrays. Bioinformatics 2001, 17(6):520-525. 10.1093/bioinformatics/17.6.520
  13. Tewfik AH, Tchagang AB: Biclustering of DNA microarray data with early pruning. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), March 2005, Philadelphia, Pa, USA.
  14. Tchagang AB, Tewfik AH: Robust biclustering algorithm: ROBA. 2005.
  15. Tavazoie S, Hughes J, Campbell M, Cho R, Church G: Yeast micro data set. http://arep.med.harvard.edu/biclustering
  16. Wang H, Wang W, Yang J, Yu PS: Clustering by pattern similarity in large data sets. Proceedings of the International Conference on Management of Data (ACM SIGMOD '02), June 2002, Madison, Wis, USA 394-405.
  17. Güldener U, Münsterkötter M, Kastenmüller G, et al.: CYGD: the comprehensive yeast genome database. Nucleic Acids Research 2005, 33(Database issue):D364-D368.
  18. Munich Information Center for Protein Sequences (MIPS) and GSF-National Research Center for Environment and Health: Comprehensive Yeast Genome Database. 2002 (visited July 21, 2005). http://mips.gsf.de/genre/proj/yeast/
  19. Ruepp A, Zollner A, Maier D, et al.: The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes. Nucleic Acids Research 2004, 32(18):5539-5545. 10.1093/nar/gkh894
  20. Balakrishnan R, Christie KR, Costanzo MC, et al.: Saccharomyces Genome Database. http://www.yeastgenome.org
  21. Tewfik A, Tchagang AB, Vertatschitsch L: Parallel identification of gene biclusters with coherent evolution. IEEE Transactions on Signal Processing 2006, 54(6):2408-2417. Special issue on Genomics Signal Processing.

Copyright

© Tchagang and Tewfik 2006