Published in: Journal of Intelligent Information Systems 2/2018

02-05-2018

Unified domain-specific language for collecting and processing data of social media

Authors: Nikolay Butakov, Maxim Petrov, Ksenia Mukhina, Denis Nasonov, Sergey Kovalchuk

Abstract

Data from social media is becoming increasingly important analysis material for social scientists, market analysts, and other stakeholders. This diversity of interests has produced a variety of crawling techniques and software solutions. Nevertheless, these solutions lack the flexibility to satisfy the requirements of different users and individual crawling scenarios, which can range from a simple query to a complex multi-step workflow that collects data from several networks. To address this problem, our paper proposes an approach based on a domain-specific language (DSL) and an architecture for a distributed crawling system. The DSL is declarative: the user describes the data needed, and the language is based on an ontological model of social networks and the essential crawling techniques. As a result, the crawling system can collect data from different online social networks within complex workflows, using various crawling methods implemented in a distributed computing environment.

Metadata
Title
Unified domain-specific language for collecting and processing data of social media
Authors
Nikolay Butakov
Maxim Petrov
Ksenia Mukhina
Denis Nasonov
Sergey Kovalchuk
Publication date
02-05-2018
Publisher
Springer US
Published in
Journal of Intelligent Information Systems / Issue 2/2018
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675
DOI
https://doi.org/10.1007/s10844-018-0508-5