Published in: International Journal of Data Science and Analytics 3/2018

15-05-2018 | Regular Paper

Data science at SoBigData: the European research infrastructure for social mining and big data analytics

Authors: Valerio Grossi, Beatrice Rapisarda, Fosca Giannotti, Dino Pedreschi



Abstract

Most people have become “big data” producers in their daily lives. Our desires, opinions, sentiments, and social links, as well as our mobile phone calls and GPS tracks, leave traces of our behaviours. Transforming these data into knowledge and value is a complex data science task. This paper shows how the SoBigData Research Infrastructure supports data science at the new frontiers of big data exploitation. Our research infrastructure serves a large community of social sensing and social mining researchers, and it reduces the gap between existing research centres at the European level. SoBigData integrates resources and creates an infrastructure in which text miners, visual analytics researchers, socio-economic scientists, network scientists, political scientists, and humanities researchers can genuinely share data and methods. The paper presents the main concepts of the SoBigData Research Infrastructure, which support both virtual and transnational (on-site) access to its resources. Creating and supporting research communities is considered vital to the success of our research infrastructure, as is contributing to the training of the new generation of data scientists. Furthermore, this paper introduces the concept of exploratories and shows their role in promoting the use of our research infrastructure. The exploratories presented in this paper also represent a set of real applications in the context of social mining. Finally, special attention is given to legal and ethical aspects: everything in SoBigData is supervised by an ethical and legal framework.


Footnotes
1
The IEEE Glossary defines interoperability as “the ability of two or more systems or components to exchange information and to use the information that has been exchanged” [1].
 
2
The formal definitions of Virtual and Transnational Access (and their key performance indicators) are given in the INFRAIA-1-2014-2015 call of the European Community (https://goo.gl/E6Cyze).
 
3
The description of the project consortium is available at the following link: http://project.sobigdata.eu/consortium.
 
4
Access by users not working in an EU or associated country is limited to 20% of the total number of units of access provided under the grant.
 
5
The list of international experts on the SoBigData advisory board is available at the following link: http://project.sobigdata.eu/management-bodies/project-advisory-board.
 
6
The VREs [5] are web-based, community-oriented, comprehensive, flexible, and secure working environments, conceived and tailored to satisfy the needs of a designated community. They generally offer (i) a rich array of services for data discovery and access, (ii) a data analytics platform, and (iii) collaboration-oriented facilities enabling scientists to collaborate.
 
7
The SoBigData e-Infra is powered by D4Science [6].
 
8
The Master in Big Data of the University of Pisa is a one-year course for training data scientists (http://masterbigdata.it/en).
 
9
The Ph.D. in Data Science aims to educate the new generation of researchers who combine their disciplinary competences with those of a data scientist (http://phd.sns.it/it/data-science/).
 
10
The updated list and description of all dissemination and training events within SoBigData are available at the following link: http://www.sobigdata.eu/events/.
 
11
The description of these initiatives (called the Tuscan Big Data Challenge) is available at the following link: http://www.sobigdata.eu/blog/tuscan-big-data-challenge-20172018.
 
12
The number of available resources is growing thanks to the collaboration between the original partners, new organizations, and users.
 
13
From May 2018, the new General Data Protection Regulation (GDPR) shall apply, replacing the Data Protection Directive (DPD) and its national implementations.
 
14
The following link lists the exploratories available in SoBigData: http://www.sobigdata.eu/exploratories/.
 
15
Espresso is an Italian newspaper published by Gruppo Editoriale l’Espresso.
 
16
A major Spanish newspaper published by PRISA.
 
17
The walk-throughs are currently freely available and published at http://sobigdata.ee.
 
18
The SoBigData project is developing a language and an execution platform for representing scientific processes in highly heterogeneous e-Infrastructures as so-called hybrid workflows. Currently, SoBigData workflows can express sequences of manually executable actions, which offer a formal, high-level description of a reasoning process, protocol, or procedure, and machine-executable actions, which enable the fully automated execution of one or more web services [20].
 
19
The GATE platform provides end-to-end text processing solutions. The latest version of the GATE platform is available at cloud.gate.ac.uk.
 
20
Mímir is a DBMS used by the GATE infrastructure to collect documents whose information is stored as annotations.
 
21
Through the SoBigData Gateway, users can access all the information required to register and upload a dataset or a method (https://sobigdata.d4science.org/group/sobigdata-gateway).
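Footnote 18 distinguishes two kinds of steps in a hybrid workflow: manually executable actions and machine-executable actions. The Python sketch below models only that distinction, as an illustration; it is not the HyWare language [20] or the SoBigData execution platform, and every name in it (`ManualAction`, `MachineAction`, `HybridWorkflow`, `run`, `confirm`, `count_tweets`) is hypothetical. Machine-executable steps are plain Python callables standing in for web-service calls, and manual steps are passed to a caller-supplied `confirm` callback that stands in for a researcher confirming the step was done.

```python
# Hypothetical model for illustration only; not the HyWare syntax or the
# SoBigData platform API.
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Union


@dataclass
class ManualAction:
    """A step a researcher performs by hand, described in prose."""
    name: str
    instructions: str


@dataclass
class MachineAction:
    """A fully automated step; `service` stands in for a web-service call."""
    name: str
    service: Callable[[Dict[str, Any]], Dict[str, Any]]


Step = Union[ManualAction, MachineAction]


@dataclass
class HybridWorkflow:
    name: str
    steps: List[Step] = field(default_factory=list)

    def run(self, context: Dict[str, Any],
            confirm: Callable[[ManualAction], bool]) -> List[str]:
        """Execute steps in order, updating `context` in place; return a log."""
        log: List[str] = []
        for step in self.steps:
            if isinstance(step, MachineAction):
                # Machine-executable: call the simulated service, merge its output.
                context.update(step.service(context))
                log.append(f"auto:{step.name}")
            else:
                # Manually executable: a human must confirm before continuing.
                if not confirm(step):
                    log.append(f"halted:{step.name}")
                    break
                log.append(f"manual:{step.name}")
        return log


def count_tweets(ctx: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical 'service' that counts the tweets in the context."""
    return {"n_tweets": len(ctx["tweets"])}


if __name__ == "__main__":
    wf = HybridWorkflow("sentiment-study", [
        ManualAction("check-clearance", "Verify the dataset's legal/ethical clearance."),
        MachineAction("count", count_tweets),
    ])
    ctx = {"tweets": ["a", "b", "c"]}
    print(wf.run(ctx, confirm=lambda step: True))  # ['manual:check-clearance', 'auto:count']
    print(ctx["n_tweets"])  # 3
```

Halting on an unconfirmed manual step, rather than skipping it, is one plausible design choice here: a manual step such as a legal or ethical check typically gates every automated step that follows it.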
 
Literature
1.
Geraci, A.: IEEE Standard Computer Dictionary: Compilation of IEEE Standard Computer Glossaries. IEEE Press, Piscataway (1991)
2.
Greenwood, R., Landier, A., Thesmar, D.: Vulnerable banks. J. Financ. Econ. 115(3), 471–485 (2015)
3.
Maynard, D., Greenwood, M., Roberts, I., Windsor, G., Bontcheva, K.: Real-time social media analytics through semantic annotation and linked open data. In: Proc. of 2015 ACM Web Science, Oxford, United Kingdom (Jul 2015)
4.
Maynard, D., Bontcheva, K.: Understanding climate change tweets: an open-source toolkit for social media analysis. In: Proc. of EnviroInfo 2015, Copenhagen (Sep 2015)
6.
Candela, L., Castelli, D., Manzi, A., Pagano, P.: Realising virtual research environments by hybrid data infrastructures: the D4Science experience. In: International Symposium on Grids and Clouds (ISGC), Proceedings of Science PoS(ISGC2014) (2014)
12.
Grossi, V., Romei, R., Ruggieri, S.: A case study in sequential pattern mining for IT-operational risk. In: Machine Learning and Knowledge Discovery in Databases, European Conference (ECML/PKDD), pp. 424–439 (2008)
13.
Coletto, M., Esuli, A., Lucchese, C., Muntean, C., Nardini, F.M., Perego, R., Renso, C.: Sentiment-enhanced multidimensional analysis of online social networks: perception of the Mediterranean refugees crisis. In: 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pp. 1270–1277 (2016)
14.
Bontcheva, K., Rout, D.P.: Making sense of social media streams through semantics: a survey. Seman. Web 5(5), 373–403 (2014)
15.
Cresci, S., Di Pietro, R., Petrocchi, M., Spognardi, A., Tesconi, M.: The paradigm-shift of social spambots: evidence, theories, and tools for the arms race. CoRR abs/1701.03017 (2017)
16.
Cresci, S., Tesconi, M., Cimino, A., Dell’Orletta, F.: A linguistically-driven approach to cross-event damage assessment of natural disasters from social media messages. In: WWW (Companion Volume), pp. 1195–1200 (2015)
17.
Trasarti, R., Guidotti, R., Monreale, A., Giannotti, F.: MyWay: location prediction via mobility profiling. Inf. Syst. 64, 350–367 (2017)
18.
Nanni, M., Trasarti, R., Monreale, A., Grossi, V., Pedreschi, D.: Driving profiles computation and monitoring for car insurance CRM. ACM TIST 8(1), 14:1–14:26 (2016)
19.
Guidotti, R., Trasarti, R., Nanni, M., Giannotti, F.: Towards user-centric data management: individual mobility analytics for collective services. In: MobiGIS, pp. 80–83 (2015)
20.
Candela, L., Manghi, P., Giannotti, F., Grossi, V., Trasarti, R.: HyWare: a HYbrid Workflow lAnguage for Research E-infrastructures. D-Lib Magazine 23(1/2) (2017)
22.
Hänold, S., Forgó, N., van den Hoven, J., Mahieu, R., van Putten, D.: Legal and ethical framework for SoBigData 1, SoBigData project deliverable. https://goo.gl/NUiWhR (2016)
23.
Hänold, S., Forgó, N., van den Hoven, J., Mahieu, R., van Putten, D., Lishchuck, I.: Legal and ethical framework for SoBigData 2, SoBigData project deliverable. https://goo.gl/5MLkzN (2017)
Metadata
Title
Data science at SoBigData: the European research infrastructure for social mining and big data analytics
Authors
Valerio Grossi
Beatrice Rapisarda
Fosca Giannotti
Dino Pedreschi
Publication date
15-05-2018
Publisher
Springer International Publishing
Published in
International Journal of Data Science and Analytics / Issue 3/2018
Print ISSN: 2364-415X
Electronic ISSN: 2364-4168
DOI
https://doi.org/10.1007/s41060-018-0126-x
