  • Research Article
  • Open access

DNA Microarray Data Analysis: A Novel Biclustering Algorithm Approach

Abstract

Biclustering algorithms are a distinct class of clustering algorithms that cluster the rows and columns of a data matrix simultaneously. Biclustering problems arise in DNA microarray data analysis, collaborative filtering, market research, information retrieval, text mining, electoral trends, exchange analysis, and other fields. For DNA microarray experimental data, for example, the goal of a biclustering algorithm is to find submatrices, that is, subgroups of genes and subgroups of conditions, in which the genes exhibit highly correlated activities across every condition in the subgroup. In this study, we develop novel biclustering algorithms using basic linear algebra and arithmetic tools. The proposed algorithms can search a data set for all biclusters with constant values, biclusters with constant values on rows, biclusters with constant values on columns, and biclusters with coherent values, in a timely manner and without solving any optimization problem. We also show how one of the proposed algorithms can be adapted to identify biclusters with coherent evolution. The algorithms developed in this study discover all valid biclusters of each type, whereas almost all previous biclustering approaches miss some.
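To make the bicluster types named in the abstract concrete, the following minimal Python sketch tests whether a given submatrix (a chosen set of rows and columns) fits each type. It is not the algorithms proposed in the article, which search for such submatrices; it only illustrates the definitions. The function names, the tolerance parameter `tol`, and the toy matrix are hypothetical, and "coherent values" is assumed here to mean the additive model common in the biclustering literature (each entry equals a row offset plus a column offset).

```python
from typing import List, Sequence

Matrix = List[List[float]]


def submatrix(data: Matrix, rows: Sequence[int], cols: Sequence[int]) -> Matrix:
    """Extract the block defined by the selected row and column indices."""
    return [[data[i][j] for j in cols] for i in rows]


def is_constant(block: Matrix, tol: float = 1e-9) -> bool:
    """Every entry equals the same value."""
    first = block[0][0]
    return all(abs(v - first) <= tol for row in block for v in row)


def has_constant_rows(block: Matrix, tol: float = 1e-9) -> bool:
    """Within each row, all entries are equal (rows may differ from each other)."""
    return all(abs(v - row[0]) <= tol for row in block for v in row)


def has_constant_columns(block: Matrix, tol: float = 1e-9) -> bool:
    """Within each column, all entries are equal; checked on the transpose."""
    transposed = [list(col) for col in zip(*block)]
    return has_constant_rows(transposed, tol)


def is_additive_coherent(block: Matrix, tol: float = 1e-9) -> bool:
    """Assumed additive model: every row shows the same column-to-column
    differences as the first row."""
    base = block[0]
    return all(
        abs((row[j] - row[0]) - (base[j] - base[0])) <= tol
        for row in block
        for j in range(len(row))
    )


# Toy expression matrix: rows are genes, columns are conditions (made-up values).
data: Matrix = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 0.0],
    [5.0, 6.0, 7.0, 1.0],
    [4.0, 4.0, 4.0, 8.0],
    [6.0, 6.0, 6.0, 2.0],
]

coherent_block = submatrix(data, [0, 1, 2], [0, 1, 2])
row_block = submatrix(data, [3, 4], [0, 1, 2])

print(is_additive_coherent(coherent_block))  # True
print(is_constant(coherent_block))           # False
print(has_constant_rows(row_block))          # True
print(has_constant_columns(row_block))       # False
```

Under this additive assumption, constant, constant-row, and constant-column biclusters are all special cases of coherent-value biclusters, which is why `row_block` also passes `is_additive_coherent`.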


Author information

Corresponding author

Correspondence to Alain B. Tchagang.

Rights and permissions

Open Access. This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Tchagang, A.B., Tewfik, A.H. DNA Microarray Data Analysis: A Novel Biclustering Algorithm Approach. EURASIP J. Adv. Signal Process. 2006, 059809 (2006). https://doi.org/10.1155/ASP/2006/59809

  • DOI: https://doi.org/10.1155/ASP/2006/59809
