Published in: International Journal of Data Science and Analytics 3/2018

15.05.2018 | Regular Paper

Data science at SoBigData: the European research infrastructure for social mining and big data analytics

Authors: Valerio Grossi, Beatrice Rapisarda, Fosca Giannotti, Dino Pedreschi

Abstract

Most people have become “big data” producers in their daily lives. Our desires, opinions, sentiments, and social links, as well as our mobile phone calls and GPS tracks, leave traces of our behaviours. Transforming these data into knowledge and value is a complex task of data science. This paper shows how the SoBigData Research Infrastructure supports data science towards the new frontiers of big data exploitation. Our research infrastructure serves a large community of social sensing and social mining researchers, and it reduces the gap between existing research centres at the European level. SoBigData integrates resources and creates an infrastructure where data and methods can actually be shared among text miners, visual analytics researchers, socio-economic scientists, network scientists, political scientists, and humanities researchers. The paper presents the main concepts of the SoBigData Research Infrastructure, which support virtual and transnational (on-site) access to its resources. Creating and supporting research communities is considered of vital importance for the success of our research infrastructure, as is contributing to the training of a new generation of data scientists. Furthermore, this paper introduces the concept of exploratories and shows their role in promoting the use of our research infrastructure. The exploratories presented in this paper also represent a set of real applications in the context of social mining. Finally, special attention is given to legal and ethical aspects: everything in SoBigData is supervised by an ethical and legal framework.

Footnotes
1
The IEEE Glossary defines interoperability as “the ability of two or more systems or components to exchange information and to use the information that has been exchanged” [1].
 
2
Virtual and Transnational access (and their key performance indicators) are formally defined in the European Community Infraia-1-2014-2015 call (https://goo.gl/E6Cyze).
 
3
The description of the project consortium is available at the following link: http://project.sobigdata.eu/consortium.
 
4
Access for users not working in an EU or associated country is limited to 20% of the total amount of units of access provided under the grant.
 
5
The list of international experts on the SoBigData advisory board is available at the following link: http://project.sobigdata.eu/management-bodies/project-advisory-board.
 
6
The VREs [5] are web-based, community-oriented, comprehensive, flexible, and secure working environments. They are conceived and tailored to satisfy the needs of a designated community. Generally, they offer: (i) a rich array of services for data discovery and access, (ii) a data analytics platform, (iii) collaboration-oriented facilities enabling scientists.
 
7
The SoBigData e-Infra is powered by D4Science [6].
 
8
The Master in Big Data of the University of Pisa is an annual course that trains data scientists (http://masterbigdata.it/en).
 
9
The Ph.D. in Data Science aims to educate a new generation of researchers who combine their disciplinary competences with those of a data scientist (http://phd.sns.it/it/data-science/).
 
10
An updated list and description of all SoBigData dissemination and training events is available at the following link: http://www.sobigdata.eu/events/.
 
11
The description of these initiatives (called Tuscan Big Data Challenge) is available at the following link: http://www.sobigdata.eu/blog/tuscan-big-data-challenge-20172018.
 
12
The number of available resources is growing thanks to the collaboration between the original partners, new organizations and users.
 
13
From May 2018, the new General Data Protection Regulation (GDPR) applies, replacing the Data Protection Directive (DPD) and its national implementations.
 
14
The list of exploratories available in SoBigData is at the following link: http://www.sobigdata.eu/exploratories/.
 
15
Espresso is an Italian newspaper published by Gruppo Editoriale l’Espresso.
 
16
An important Spanish newspaper published by PRISA.
 
17
The walk-throughs are currently freely available and published at http://sobigdata.ee.
 
18
The SoBigData project is developing a language and an execution platform for representing scientific processes in highly heterogeneous e-Infrastructures in terms of so-called hybrid workflows. Currently, SoBigData workflows can express sequences of manually executable actions, which offer a formal and high-level description of a line of reasoning, protocol, or procedure, and machine-executable actions, which enable the fully automated execution of one (or more) web services [20].
 
19
The GATE platform provides end-to-end text processing solutions. The latest version of the GATE platform is available at cloud.gate.ac.uk.
 
20
Mímir is a DBMS used by GATE Infrastructure for collecting documents with information stored as annotations.
 
21
Through the SoBigData Gateway, users can access all the information required to register and upload a dataset or a method (https://sobigdata.d4science.org/group/sobigdata-gateway).
 
References
1.
Geraci, A.: IEEE Standard Computer Dictionary: Compilation of IEEE Standard Computer Glossaries. IEEE Press, Piscataway (1991)
2.
Greenwood, R., Landier, A., Thesmar, D.: Vulnerable banks. J. Financ. Econ. 115(3), 471–485 (2015)
3.
Maynard, D., Greenwood, M., Roberts, I., Windsor, G., Bontcheva, K.: Real-time social media analytics through semantic annotation and linked open data. In: Proc. of 2015 ACM Web Science, Oxford, United Kingdom (Jul. 2015)
4.
Maynard, D., Bontcheva, K.: Understanding climate change tweets: an open-source toolkit for social media analysis. In: Proc. of EnviroInfo 2015, Copenhagen (Sep. 2015)
6.
Candela, L., Castelli, D., Manzi, A., Pagano, P.: Realising virtual research environments by hybrid data infrastructures: the D4Science experience. In: International Symposium on Grids and Clouds (ISGC), Proceedings of Science PoS(ISGC2014) (2014)
11.
Garimella, K., De Francisci Morales, G., Gionis, A., Mathioudakis, M.: Reducing controversy by connecting opposing views. In: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, WSDM ’17 (2017). https://doi.org/10.1145/3018661.3018703
12.
Grossi, V., Romei, R., Ruggieri, S.: A case study in sequential pattern mining for IT-operational risk. In: Machine Learning and Knowledge Discovery in Databases, European Conference (ECML/PKDD), pp. 424–439 (2008)
13.
Coletto, M., Esuli, A., Lucchese, C., Muntean, C., Nardini, F.M., Perego, R., Renso, C.: Sentiment-enhanced multidimensional analysis of online social networks: perception of the mediterranean refugees crisis. In: 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pp. 1270–1277 (2016)
14.
Bontcheva, K., Rout, D.P.: Making sense of social media streams through semantics: a survey. Seman. Web 5(5), 373–403 (2014)
15.
Cresci, S., Di Pietro, R., Petrocchi, M., Spognardi, A., Tesconi, M.: The paradigm-shift of social spambots: evidence, theories, and tools for the arms race. CoRR abs/1701.03017 (2017)
16.
Cresci, S., Tesconi, M., Cimino, A., Dell’Orletta, F.: A linguistically-driven approach to cross-event damage assessment of natural disasters from social media messages. WWW (Companion Volume), 1195–1200 (2015)
17.
Trasarti, R., Guidotti, R., Monreale, A., Giannotti, F.: MyWay: location prediction via mobility profiling. Inf. Syst. 64, 350–367 (2017)
18.
Nanni, M., Trasarti, R., Monreale, A., Grossi, V., Pedreschi, D.: Driving profiles computation and monitoring for car insurance CRM. ACM TIST 8(1), 14:1–14:26 (2016)
19.
Guidotti, R., Trasarti, R., Nanni, M., Giannotti, F.: Towards user-centric data management: individual mobility analytics for collective services, pp. 80–83. MobiGIS (2015)
20.
Candela, L., Manghi, P., Giannotti, F., Grossi, V., Trasarti, R.: HyWare: a HYbrid Workflow lAnguage for Research E-infrastructures. D-Lib Magazine 23(1/2) (2017)
22.
Hänold, S., Forgó, N., van den Hoven, J., Mahieu, R., van Putten, D.: Legal and ethical framework for SoBigData 1, SoBigData project deliverable. https://goo.gl/NUiWhR (2016)
23.
Hänold, S., Forgó, N., van den Hoven, J., Mahieu, R., van Putten, D., Lishchuck, I.: Legal and ethical framework for SoBigData 2, SoBigData project deliverable. https://goo.gl/5MLkzN (2017)
Metadata
Title
Data science at SoBigData: the European research infrastructure for social mining and big data analytics
Authors
Valerio Grossi
Beatrice Rapisarda
Fosca Giannotti
Dino Pedreschi
Publication date
15.05.2018
Publisher
Springer International Publishing
Published in
International Journal of Data Science and Analytics / Issue 3/2018
Print ISSN: 2364-415X
Electronic ISSN: 2364-4168
DOI
https://doi.org/10.1007/s41060-018-0126-x