

DNA Microarray Data Analysis: A Novel Biclustering Algorithm Approach



Biclustering algorithms are a distinct class of clustering algorithms that cluster the rows and columns of a data matrix simultaneously. Biclustering problems arise in DNA microarray data analysis, collaborative filtering, market research, information retrieval, text mining, electoral trends, exchange analysis, and so forth. When dealing with DNA microarray experimental data, for example, the goal of biclustering algorithms is to find submatrices, that is, subgroups of genes and subgroups of conditions, where the genes exhibit highly correlated activities for every condition. In this study, we develop novel biclustering algorithms using basic linear algebra and arithmetic tools. The proposed biclustering algorithms can be used to search for all biclusters with constant values, biclusters with constant values on rows, biclusters with constant values on columns, and biclusters with coherent values from a set of data in a timely manner and without solving any optimization problem. We also show how one of the proposed biclustering algorithms can be adapted to identify biclusters with coherent evolution. The algorithms developed in this study discover all valid biclusters of each type, whereas almost all previous biclustering approaches miss some.
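To make the four bicluster types named above concrete, the following minimal sketch checks which type a given submatrix satisfies. It is not the algorithm proposed in the article, which searches for biclusters; it only tests a candidate submatrix. It assumes the additive model for coherent values (each entry equals a background value plus a row offset plus a column offset), which is one common formalization. The function names, the tolerance `tol`, and the toy matrix are hypothetical choices made for illustration.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def submatrix(data: Matrix, rows: Sequence[int], cols: Sequence[int]) -> List[List[float]]:
    """Extract the block of `data` given by row (gene) and column (condition) indices."""
    return [[data[i][j] for j in cols] for i in rows]


def is_constant(block: Matrix, tol: float = 1e-9) -> bool:
    """Every entry equals the top-left entry."""
    first = block[0][0]
    return all(abs(v - first) <= tol for row in block for v in row)


def has_constant_rows(block: Matrix, tol: float = 1e-9) -> bool:
    """Within each row, every entry equals the first entry of that row."""
    return all(abs(v - row[0]) <= tol for row in block for v in row)


def has_constant_columns(block: Matrix, tol: float = 1e-9) -> bool:
    """Within each column, every entry equals the entry in the first row."""
    top = block[0]
    return all(abs(row[j] - top[j]) <= tol for row in block for j in range(len(top)))


def is_additive_coherent(block: Matrix, tol: float = 1e-9) -> bool:
    """Assumed additive model: a[i][j] = mu + alpha[i] + beta[j].

    This holds exactly when a[i][j] - a[i][0] - a[0][j] + a[0][0] == 0 for all i, j.
    """
    top, base = block[0], block[0][0]
    return all(
        abs(row[j] - row[0] - top[j] + base) <= tol
        for row in block
        for j in range(len(top))
    )


def classify(block: Matrix, tol: float = 1e-9) -> str:
    """Return the most specific bicluster type that the block satisfies."""
    if is_constant(block, tol):
        return "constant values"
    if has_constant_rows(block, tol):
        return "constant rows"
    if has_constant_columns(block, tol):
        return "constant columns"
    if is_additive_coherent(block, tol):
        return "coherent values (additive)"
    return "none"


# Toy expression matrix: rows are genes, columns are conditions (made-up numbers).
data = [
    [1.0, 2.0, 5.0, 0.0],
    [2.0, 3.0, 6.0, 9.0],
    [4.0, 5.0, 8.0, 7.0],
    [3.0, 3.0, 3.0, 1.0],
]

print(classify(submatrix(data, [0, 1, 2], [0, 1, 2])))
print(classify(submatrix(data, [3], [0, 1, 2])))
```

The checks are ordered from most to least specific because a constant block also has constant rows and constant columns, and all three are special cases of the additive model. Real expression data are noisy, so a practical test would use a larger tolerance or a residue score rather than near-exact equality.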



Author information

Correspondence to Alain B. Tchagang.


About this article

Cite this article

Tchagang, A.B., Tewfik, A.H. DNA Microarray Data Analysis: A Novel Biclustering Algorithm Approach. EURASIP J. Adv. Signal Process. 2006, 059809 (2006) doi:10.1155/ASP/2006/59809



  • Experimental Data
  • Information Technology
  • Cluster Algorithm
  • Information Retrieval
  • Quantum Information