Abstract
Biological databases contain a wide variety of data types, often with rich relational structure. Consequently, multi-relational data mining techniques are frequently applied to biological data. This paper presents several such applications, chosen to cover a broad range of multi-relational data mining techniques.