Biological applications of multi-relational data mining

Published: 01 July 2003

Abstract

Biological databases contain a wide variety of data types, often with rich relational structure. Consequently, multi-relational data mining techniques are frequently applied to biological data. This paper presents several such applications, taking care to cover a broad range of multi-relational data mining techniques.



Published in ACM SIGKDD Explorations Newsletter, Volume 5, Issue 1, July 2003, 101 pages
ISSN: 1931-0145
EISSN: 1931-0153
DOI: 10.1145/959242

Copyright © 2003 Authors

Publisher: Association for Computing Machinery, New York, NY, United States
