Abstract
The expression levels of many thousands of genes can be measured simultaneously by DNA microarrays (chips). This novel experimental tool has revolutionized research in molecular biology and generated considerable excitement. A typical experiment uses a few tens of such chips, each dedicated to a single sample, such as tissue extracted from a particular tumor. The results of such an experiment contain several hundred thousand numbers, which come in the form of a table with several thousand rows (one for each gene) and 50–100 columns (one for each sample). We developed a clustering methodology to mine such data. In this review I provide a very basic introduction to the subject, aimed at a physics audience with no prior knowledge of either gene expression or clustering methods. I explain what genes are, what gene expression is, and how it is measured by DNA chips. Next I explain what is meant by “clustering” and how we analyze the massive amounts of data from such experiments, and I present results obtained from the analysis of data on colon cancer, brain tumors, and breast cancer.
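The gene-by-sample table and the idea of clustering genes with similar expression profiles can be made concrete with a small sketch. Several parts of it are illustrative assumptions and do not come from the article. The gene names and values are invented toy data. Similarity is measured with Pearson correlation. Genes are grouped by threshold-based single-linkage clustering with a hypothetical correlation threshold of 0.9. This is a generic textbook scheme chosen for illustration. It is not the clustering methodology developed by the author.

```python
from math import sqrt

# Toy expression table: one row per gene, one column per sample.
# Gene names and values are hypothetical, for illustration only.
expression = {
    "geneA": [2.0, 4.0, 6.0, 8.0],
    "geneB": [1.0, 2.1, 2.9, 4.2],
    "geneC": [9.0, 7.0, 5.0, 3.0],
    "geneD": [4.5, 3.4, 2.6, 1.5],
    "geneE": [5.0, 1.0, 5.0, 1.0],
}

def pearson(x, y):
    """Pearson correlation between two non-constant profiles of equal length."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def cluster_genes(table, threshold=0.9):
    """Single-linkage clustering: genes joined by a chain of pairwise
    correlations >= threshold end up in the same cluster (union-find)."""
    names = list(table)
    parent = {g: g for g in names}

    def find(g):
        # Walk to the root, halving the path as we go.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(table[a], table[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for g in names:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())

print(cluster_genes(expression))
# → [['geneA', 'geneB'], ['geneC', 'geneD'], ['geneE']]
```

Genes that rise together across samples (geneA, geneB) form one cluster. Genes that fall together (geneC, geneD) form another. The oscillating geneE stays alone. Clustering the samples (columns) instead would use the same idea on the transposed table.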
Domany, E. Cluster Analysis of Gene Expression Data. Journal of Statistical Physics 110, 1117–1139 (2003). https://doi.org/10.1023/A:1022148927580