Data Mining for Bioinformatics

Liew, A. W. -C.; Yan, Hong; Yang, Mengsu

doi:10.1007/3-540-26888-X_4

Author information

Authors and Affiliations

Department of Computer Engineering and Information Technology, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong
A. W. -C. Liew & Hong Yan
School of Electrical and Information Engineering, University of Sydney, NSW, 2006, Australia
Hong Yan
Department of Biology and Chemistry, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong
Mengsu Yang

Editor information

Editors and Affiliations

School of Information Technology, Faculty of Science and Technology, Deakin University, Australia
Yi-Ping Phoebe Chen

About this chapter

Cite this chapter

Liew, A.W.C., Yan, H., Yang, M. (2005). Data Mining for Bioinformatics. In: Chen, YP.P. (eds) Bioinformatics Technologies. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-26888-X_4

DOI: https://doi.org/10.1007/3-540-26888-X_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20873-0
Online ISBN: 978-3-540-26888-8
eBook Packages: Computer Science, Computer Science (R0)
