Published in: Journal of Intelligent Information Systems 2/2018

2 May 2018

Unified domain-specific language for collecting and processing data of social media

Authors: Nikolay Butakov, Maxim Petrov, Ksenia Mukhina, Denis Nasonov, Sergey Kovalchuk


Abstract

Data from social media is becoming increasingly important analysis material for social scientists, market analysts, and other stakeholders. This diversity of interests has led to a variety of crawling techniques and software solutions. Nevertheless, these solutions lack the flexibility to satisfy the requirements of different users and individual crawling scenarios, which can range from a simple query to a complex multi-step workflow that collects data from several networks. To address this problem, this paper proposes an approach based on a purpose-built domain-specific language (DSL) and an architecture for a distributed crawling system. The DSL is declarative: the user describes the data needed, and the language is based on an ontological model of social networks and on the essential crawling techniques. As a result, the crawling system can collect data from different online social networks within complex workflows, using various crawling methods implemented in a distributed computing environment.


Metadata
Title
Unified domain-specific language for collecting and processing data of social media
Authors
Nikolay Butakov
Maxim Petrov
Ksenia Mukhina
Denis Nasonov
Sergey Kovalchuk
Publication date
2 May 2018
Publisher
Springer US
Published in
Journal of Intelligent Information Systems, Issue 2/2018
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675
DOI
https://doi.org/10.1007/s10844-018-0508-5
