Ontology of core data mining entities

Authors: Panče Panov, Larisa Soldatova, Sašo Džeroski

Published in: Data Mining and Knowledge Discovery, Issue 5-6/2014, 01-09-2014


Abstract

In this article, we present OntoDM-core, an ontology of core data mining entities. OntoDM-core defines the most essential data mining entities in a three-layered ontological structure comprising a specification, an implementation and an application layer. It provides a representational framework for describing the mining of structured data. In addition, it provides taxonomies of datasets, data mining tasks, generalizations, data mining algorithms and constraints, based on the type of data. OntoDM-core is designed to support a wide range of applications and use cases. Examples include the semantic annotation of data mining algorithms, datasets and results; the annotation of QSAR studies in the context of drug discovery investigations; and the disambiguation of terms in text mining. The ontology has been thoroughly assessed following established ontology engineering practices, is fully interoperable with many domain resources and is easy to extend. OntoDM-core is available at http://www.ontodm.com.
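The three-layered structure can be illustrated with a minimal Python sketch. It is an illustration only, not the OntoDM-core OWL encoding. Under the assumptions below, an execution (application layer) points to the implementation it runs, which in turn points to the algorithm it implements (specification layer). The sketch also shows the idea of deriving a predictive task from the type of the target data. The class `Entity`, the relation names `implements` and `executes`, the example labels and the datatype-to-task mapping are all hypothetical and are not terms taken from OntoDM-core. The ontology's actual classes and relations are defined in the files available at the URL above.

```python
from dataclasses import dataclass, field
from enum import Enum


class Layer(Enum):
    """The three layers named in the abstract."""
    SPECIFICATION = "specification"
    IMPLEMENTATION = "implementation"
    APPLICATION = "application"


@dataclass(frozen=True)
class Entity:
    """A labelled entity assigned to one layer (hashable, so it can sit in triples)."""
    label: str
    layer: Layer


@dataclass
class MiniOntology:
    """Toy triple store; relation names used with it are hypothetical."""
    triples: set = field(default_factory=set)

    def add(self, subj: Entity, rel: str, obj: Entity) -> None:
        self.triples.add((subj, rel, obj))

    def objects(self, subj: Entity, rel: str) -> list:
        # Sort by label so results are deterministic
        return sorted((o for s, r, o in self.triples if s == subj and r == rel),
                      key=lambda e: e.label)


def trace_to_specification(onto: MiniOntology, entity: Entity) -> list:
    """Follow hypothetical links: application -> implementation -> specification."""
    chain = [entity]
    for rel in ("executes", "implements"):
        nxt = onto.objects(chain[-1], rel)
        if not nxt:
            break
        chain.append(nxt[0])
    return chain


def predictive_task(target_type: str) -> str:
    """Hypothetical mapping from the datatype of a dataset's target part to a task."""
    mapping = {
        "discrete": "classification",
        "real": "regression",
        "tuple of discrete": "multi-target classification",
        "tuple of real": "multi-target regression",
        "set of discrete": "multi-label classification",
    }
    return mapping.get(target_type, "unknown task")


# Illustrative instances (all labels are made up for this sketch)
algorithm = Entity("decision tree learning algorithm", Layer.SPECIFICATION)
implementation = Entity("hypothetical-tree-learner 1.0", Layer.IMPLEMENTATION)
run = Entity("run of hypothetical-tree-learner on dataset D", Layer.APPLICATION)

onto = MiniOntology()
onto.add(implementation, "implements", algorithm)
onto.add(run, "executes", implementation)

print([e.layer.value for e in trace_to_specification(onto, run)])
print(predictive_task("tuple of real"))
```

Keeping the layers as separate entities, rather than folding them into a single "algorithm" record, mirrors the distinction the abstract draws. One specification can have several implementations, and each implementation can be executed many times on different datasets.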


Footnotes
[1] OBO: http://www.obofoundry.org (accessed 1 June 2014).
[2] GO: http://www.geneontology.org (accessed 1 June 2014).
[3] NCI Thesaurus: http://ncit.nci.nih.gov (accessed 1 June 2014).
[5] SNOMED-CT: http://www.ihtsdo.org/snomed-ct (accessed 1 June 2014).
[6] OBI: http://www.obi-ontology.org (accessed 1 June 2014).
[7] BioPortal: http://bioportal.bioontology.org (accessed 1 June 2014).
[8] BFO: http://www.ifomis.org/bfo (accessed 1 June 2014).
[9] RO: http://obofoundry.org/ro (accessed 1 June 2014).
[10] PMML: http://www.dmg.org/ (accessed 1 June 2014).
[13] OWL-DL: http://www.w3.org/TR/owl-guide (accessed 1 June 2014).
[14] Protégé: http://protege.stanford.edu (accessed 1 June 2014).
[15] OBO Foundry principles: http://obofoundry.org/crit.shtml (accessed 1 June 2014).
[16] Table 7 in the Appendix lists all relations used in OntoDM-core.
[17] OntoFox: http://ontofox.hegroup.org (accessed 1 June 2014).
[18] In the remainder of the article, italic formatting denotes an ontology class.
[19] Table 8 in the Appendix lists the typical competency questions OntoDM-core is designed to answer.
[20] Oxford dictionary: http://oxforddictionaries.com/definition/scenario (accessed 1 June 2014).
[23] HermiT reasoner: http://www.hermit-reasoner.com/ (accessed 1 June 2014).
[25] BioPortal SPARQL endpoint: http://sparql.bioontology.org (accessed 1 June 2014).
[27] SPARQLer: http://www.sparql.org/sparql.html (accessed 1 June 2014).
[28] Robot Scientist Project: http://goo.gl/6wazqw and http://goo.gl/Iq6WGS (accessed 1 June 2014).
[29] BAO: http://bioassayontology.org (accessed 1 June 2014).
[30] QSAR Chemoinformatics repository: http://cheminformatics.org/datasets/#qsar (accessed 1 June 2014).
[31] Example MDL Molfile: http://mychem.sourceforge.net/doc/apes06.html (accessed 1 June 2014).
[33] A pair of molecules consisting of one chiral molecule and its mirror image. The molecules making up an enantiomeric pair rotate the plane of polarized light in equal but opposite directions.
[35] NaCTeM centre: http://www.nactem.ac.uk/cheta (accessed 1 June 2014).
[36] ChEBI: http://www.ebi.ac.uk/chebi (accessed 1 June 2014).
[40] CEO: http://goo.gl/AUktCK (accessed 1 June 2014).
[41] WSMO: http://www.wsmo.org (accessed 1 June 2014).

Metadata
Title: Ontology of core data mining entities
Authors: Panče Panov, Larisa Soldatova, Sašo Džeroski
Publication date: 01-09-2014
Publisher: Springer US
Published in: Data Mining and Knowledge Discovery, Issue 5-6/2014
Print ISSN: 1384-5810
Electronic ISSN: 1573-756X
DOI: https://doi.org/10.1007/s10618-014-0363-0