
Cluster Analysis of Gene Expression Data

Journal of Statistical Physics

Abstract

The expression levels of many thousands of genes can be measured simultaneously by DNA microarrays (chips). This novel experimental tool has revolutionized research in molecular biology and generated considerable excitement. A typical experiment uses a few tens of such chips, each dedicated to a single sample, such as tissue extracted from a particular tumor. The results of such an experiment comprise several hundred thousand numbers, arranged in a table of several thousand rows (one for each gene) and 50–100 columns (one for each sample). We developed a clustering methodology to mine such data. In this review I provide a very basic introduction to the subject, aimed at a physics audience with no prior knowledge of either gene expression or clustering methods. I explain what genes are, what gene expression is, and how it is measured by DNA chips. Next I explain what is meant by “clustering” and how we analyze the massive amounts of data from such experiments, and I present results obtained from the analysis of data from colon cancer, brain tumors, and breast cancer.



Domany, E. Cluster Analysis of Gene Expression Data. Journal of Statistical Physics 110, 1117–1139 (2003). https://doi.org/10.1023/A:1022148927580


  • DOI: https://doi.org/10.1023/A:1022148927580
