
2015 | Original Paper | Book Chapter

Enabling Non-expert Users to Apply Data Mining for Bridging the Big Data Divide

Written by: Roberto Espinosa, Diego García-Saiz, Marta Zorrilla, Jose Jacobo Zubcoff, Jose-Norberto Mazón

Published in: Data-Driven Process Discovery and Analysis

Publisher: Springer Berlin Heidelberg


Abstract

Non-expert users find it difficult to gain richer insights from the increasing amount of available heterogeneous data, the so-called big data. Advanced data analysis techniques, such as data mining, are difficult to apply because (i) a great number of data mining algorithms can be applied to solve the same problem, and (ii) correctly applying data mining techniques always requires dealing with the inherent features of the data source. Therefore, we are witnessing a novel scenario in which non-experts are unable to take advantage of big data, while data mining experts can: the big data divide. To bridge this gap, we propose an approach that offers non-expert miners a tool which, just by uploading their data sets, returns the most accurate mining pattern without requiring them to deal with algorithms or settings, thanks to the use of a data mining algorithm recommender. We also incorporate a preceding task that helps non-expert users specify data mining requirements and a subsequent task in which users are guided in interpreting data mining results. Furthermore, we experimentally test the feasibility of our approach, in particular the method for building recommenders, in an educational context, where instructors of e-learning courses are non-expert data miners who need to discover how their courses are used in order to make informed decisions to improve them.
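To make the idea of an algorithm recommender more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a recommender can be approximated by describing an uploaded data set with a few meta-features, finding the most similar data set in a knowledge base of past experiments, and returning the algorithm that was most accurate there. The function names, the chosen meta-features, the knowledge base entries, and the accuracy figures are all hypothetical; the classifier names are used only as labels.

```python
import math
from collections import Counter


def meta_features(rows, labels):
    """Describe a data set by three simple characteristics (an illustrative choice)."""
    n = len(rows)
    counts = Counter(labels)
    # Class entropy in bits: 0 for a single class, log2(k) for k balanced classes.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return (math.log10(n), float(len(rows[0])), entropy)


def recommend(knowledge_base, features):
    """Return the most accurate algorithm recorded for the most similar stored data set."""
    nearest = min(
        knowledge_base,
        key=lambda entry: math.dist(entry["features"], features),
    )
    return max(nearest["accuracy"], key=nearest["accuracy"].get)


# Hypothetical knowledge base: meta-features of previously mined data sets and
# the accuracy each algorithm reached on them (made-up numbers).
KNOWLEDGE_BASE = [
    {"features": (2.0, 4.0, 1.0), "accuracy": {"J48": 0.81, "NaiveBayes": 0.77}},
    {"features": (4.0, 30.0, 0.3), "accuracy": {"J48": 0.70, "NaiveBayes": 0.88}},
]

# A toy uploaded data set: 100 instances, 4 attributes, two balanced classes.
rows = [[1, 0, 3, 2]] * 50 + [[0, 1, 1, 5]] * 50
labels = ["pass"] * 50 + ["fail"] * 50

features = meta_features(rows, labels)
print(recommend(KNOWLEDGE_BASE, features))
```

In this sketch the user only supplies the data set; the choice of algorithm is looked up from past experiments, which is the kind of separation between user and algorithm settings that the abstract describes.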


Footnotes
1
For us, a “non-expert user” is one who has basic knowledge of statistics but does not know how to apply data mining algorithms satisfactorily.
 
3
Attribute-Relation File Format (ARFF), a file format used by the data mining tool Weka [6] to store data.
 
4
Our Taverna workflow was designed to be useful for any mining technique, but in this work we only consider classification techniques.
 
References
1.
Abadi, D., Agrawal, R., Ailamaki, A., Balazinska, M., Bernstein, P.A., Carey, M.J., Chaudhuri, S., Dean, J., Doan, A., Franklin, M.J., Gehrke, J., Haas, L.M., Halevy, A.Y., Hellerstein, J.M., Ioannidis, Y.E., Jagadish, H., Kossmann, D., Madden, S., Mehrotra, S., Milo, T., Naughton, J.F., Ramakrishnan, R., Markl, V., Olston, C., Ooi, B.C., Christopher, R., Suciu, D., Stonebraker, M., Walter, T., Widom, J.: The Beckman report on database research (2013). http://beckman.cs.wisc.edu/beckman-report2013.pdf
2.
Blockeel, H., Vanschoren, J.: Experiment databases: towards an improved experimental methodology in machine learning. In: Kok, J.N., Koronacki, J., Lopez de Mantaras, R., Matwin, S., Mladenič, D., Skowron, A. (eds.) PKDD 2007. LNCS (LNAI), vol. 4702, pp. 6–17. Springer, Heidelberg (2007). http://dx.doi.org/10.1007/978-3-540-74976-9_5
3.
Diamantini, C., Potena, D., Storti, E.: Ontology-driven KDD process composition. In: Adams, N.M., Robardet, C., Siebes, A., Boulicaut, J.-F. (eds.) IDA 2009. LNCS, vol. 5772, pp. 285–296. Springer, Heidelberg (2009)
4.
Espinosa, R., García-Saiz, D., Zorrilla, M.E., Zubcoff, J.J., Mazón, J.N.: Development of a knowledge base for enabling non-expert users to apply data mining algorithms. In: Accorsi, R., Ceravolo, P., Cudré-Mauroux, P. (eds.) SIMPDA, CEUR Workshop Proceedings, vol. 1027, pp. 46–61. CEUR-WS.org (2013)
5.
Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P.: The KDD process for extracting useful knowledge from volumes of data. Commun. ACM 39(11), 27–34 (1996)
6.
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. SIGKDD Explorations 11(1), 10–18 (2009)
7.
Hämäläinen, W., Vinni, M.: Comparison of machine learning methods for intelligent tutoring systems. In: Ikeda, M., Ashley, K.D., Chan, T.-W. (eds.) ITS 2006. LNCS, vol. 4053, pp. 525–534. Springer, Heidelberg (2006). doi:10.1007/11774303_52
8.
Hilario, M.: e-lico annual report 2010. Université de Geneve, Technical report (2010)
9.
Hilario, M., Kalousis, A., Nguyen, P., Woznica, A.: A data mining ontology for algorithm selection and meta-mining. In: ECML/PKDD09 Workshop on Third Generation Data Mining: Towards Service-Oriented Knowledge Discovery, SoKD 2009, pp. 76–87 (2009)
10.
Hilario, M., Nguyen, P., Do, H., Woznica, A., Kalousis, A.: Ontology-based meta-mining of knowledge discovery workflows. In: Jankowski, N., Duch, W., Grąbczewski, K. (eds.) Meta-Learning in Computational Intelligence. SCI, vol. 358, pp. 273–315. Springer, Heidelberg (2011)
11.
Kalousis, A., Hilario, M.: Model selection via meta-learning: a comparative study. In: 12th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2000, Proceedings, pp. 406–413 (2000)
12.
Kietz, J.U., Serban, F., Bernstein, A., Fischer, S.: Designing KDD-workflows via HTN-planning. In: Raedt, L.D., Bessière, C., Dubois, D., Doherty, P., Frasconi, P., Heintz, F., Lucas, P.J.F. (eds.) ECAI: Frontiers in Artificial Intelligence and Applications, vol. 242, pp. 1011–1012. IOS Press (2012)
13.
Kriegel, H.P., Borgwardt, K.M., Kröger, P., Pryakhin, A., Schubert, M., Zimek, A.: Future trends in data mining. Data Min. Knowl. Discov. 15(1), 87–97 (2007)
14.
Nisbet, R., Elder, J., Miner, G.: Handbook of Statistical Analysis and Data Mining Applications. Academic Press, Boston (2009)
15.
Panov, P., Soldatova, L.N., Džeroski, S.: Towards an ontology of data mining investigations. In: Gama, J., Costa, V.S., Jorge, A.M., Brazdil, P.B. (eds.) DS 2009. LNCS, vol. 5808, pp. 257–271. Springer, Heidelberg (2009)
16.
Parreiras, F.S., Staab, S., Winter, A.: On marrying ontological and metamodeling technical spaces. In: Proceedings of the 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on The Foundations of Software Engineering, ESEC-FSE 2007, pp. 439–448. ACM, New York (2007). http://doi.acm.org/10.1145/1287624.1287687
17.
Romero, C., Ventura, S.: Educational data mining: a review of the state-of-the-art. IEEE Trans. Syst. Man and Cybern. Part C Appl. Rev. 40(6), 601–618 (2010)
19.
Soldatova, L., King, R.: An ontology of scientific experiments. J. R. Soc. Interface 3(11), 795–803 (2006)
20.
Vanschoren, J., Blockeel, H.: Stand on the shoulders of giants: towards a portal for collaborative experimentation in data mining. In: International Workshop on Third Generation Data Mining at ECML PKDD, 1, 88–89, September 2009
21.
Vanschoren, J., Blockeel, H., Pfahringer, B., Holmes, G.: Experiment databases - a new way to share, organize and learn from experiments. Mach. Learn. 87(2), 127–158 (2012)
22.
Vanschoren, J., Soldatova, L.: Exposé: an ontology for data mining experiments. In: International Workshop on Third Generation Data Mining: Towards Service-oriented Knowledge Discovery (SoKD-2010), pp. 31–46, September 2010
23.
Vilalta, R., Giraud-Carrier, C.G., Brazdil, P., Soares, C.: Using meta-learning to support data mining. IJCSA 1(1), 31–45 (2004)
25.
Záková, M., Kremen, P., Zelezný, F., Lavrac, N.: Automating knowledge discovery workflow composition through ontology-based planning. IEEE Trans. Autom. Sci. Eng. 8(2), 253–264 (2011)
26.
Zorrilla, M.E., García-Saiz, D.: Mining Service to Assist Instructors involved in Virtual Education. Business Intelligence Applications and the Web: Models, Systems and Technologies. Information Science Reference (IGI Global Publishers), September 2011
Metadata
Title
Enabling Non-expert Users to Apply Data Mining for Bridging the Big Data Divide
Written by
Roberto Espinosa
Diego García-Saiz
Marta Zorrilla
Jose Jacobo Zubcoff
Jose-Norberto Mazón
Copyright year
2015
Publisher
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/978-3-662-46436-6_4