
2021 | Book

Big Data and Social Media Analytics

Trending Applications


About this book

This edited book presents techniques that address various aspects of big data collection and analysis from social media platforms and beyond. It covers efficient compression of large networks, link prediction in hashtag graphs, visual exploration of social media data, identifying motifs in multivariate data, social media surveillance to enhance search and rescue missions, recommenders for collaborative filtering and safe travel plans to high-risk destinations, analysis of cyber influence campaigns on YouTube, impact of location on business rating, bibliographical and co-authorship network analysis, and blog data analytics. All these trending topics form a major part of the state of the art in social media and big data analytics. Thus, this edited book may be considered a valuable source for readers interested in grasping some of the most recent advancements in this highly trending domain.

Table of Contents

Frontmatter
Twenty Years of Network Science: A Bibliographic and Co-authorship Network Analysis
Abstract
Two decades ago three pioneering papers turned the attention to complex networks and initiated a new era of research, establishing an interdisciplinary field called network science. Namely, these highly-cited seminal papers were written by Watts and Strogatz, Barabási and Albert, and Girvan and Newman on small-world networks, on scale-free networks and on the community structure of complex networks, respectively. In the past 20 years – due to the multidisciplinary nature of the field – a diverse but not divided network science community has emerged. In this chapter, we investigate how this community has evolved over time with respect to speed, diversity and interdisciplinary nature as seen through the growing co-authorship network of network scientists (here the notion refers to a scholar with at least one paper citing at least one of the three aforementioned milestone papers). After providing a bibliographic analysis of 31,763 network science papers, we construct the co-authorship network of 56,646 network scientists and we analyze its topology and dynamics. We shed light on the collaboration patterns of the last 20 years of network science by investigating numerous structural properties of the co-authorship network and by using enhanced data visualization techniques. We also identify the most central authors, the largest communities, investigate the spatiotemporal changes, and compare the properties of the network to scientometric indicators.
Roland Molontay, Marcell Nagy
Impact of Locational Factors on Business Ratings/Reviews: A Yelp and TripAdvisor Study
Abstract
The proliferation and success of crowdsourced business reviews have become a major indicator of the success or failure of establishments. In particular, Yelp, TripAdvisor, and Zomato operate social media platforms where users can share feedback, scores, and photos, and make reservations. Business owners keep a close eye on these crowdsourced indicators in order to maintain or improve their ratings. While improvements can be related to many factors such as services, food, cleanliness, etc., in this chapter, we focus on the impact of the location of the business on its success based on these crowdsourced ratings. We perform an empirical study to quantify the impact of location characteristic indicators (parameters), such as cost of living, housing affordability, or tourism, on the success of restaurants as a business example using two 2019 datasets: Yelp and TripAdvisor. We first performed preliminary state-wise experiments to verify the correlation between location parameters and business success. We have verified that some location parameters alone, such as education index, can determine the success of a business with a 0.72 correlation ratio. Next, we propose a clustering method that groups similar zip code locations to better estimate the influence of the said location parameters on the success scores.
Abu Saleh Md Tayeen, Abderrahmen Mtibaa, Satyajayant Misra, Milan Biswal
Identifying Reliable Recommenders in Users’ Collaborating Filtering and Social Neighbourhoods
Abstract
Recommender systems increasingly use information sourced from social networks to improve the quality of their recommendations. However, both recommender systems and social networks exhibit phenomena under which information for certain users or items is limited, such as the cold start and the grey sheep phenomena in collaborative filtering systems and the isolated users in social networks. In the context of social network-aware collaborative filtering, where the collaborative filtering-based and social network-based neighbourhoods are of varying density and utility for recommendation formulation, the ability to identify the most reliable recommenders from each neighbourhood for each user and appropriately combine the information associated with them in the recommendation computation process can significantly improve the quality and accuracy of the recommendations offered. In this chapter, we report on our extensions of earlier works in this area, which comprise (1) the development of an algorithm for discovering the most reliable recommenders of a social network recommender system and (2) the development and evaluation of a new collaborative filtering algorithm that synthesizes the opinions of a user’s identified recommenders to generate successful recommendations for the particular user. The proposed algorithm introduces significant gains in rating prediction accuracy (4.9% on average in terms of prediction MAE reduction and 4.2% on average in terms of prediction RMSE reduction) and outperforms other algorithms. The proposed algorithm, by design, utilizes only basic information from the collaborative filtering domain (user–item ratings) and the social network domain (user relationships); therefore, it can be easily applied to any social network recommender system dataset.
Dionisis Margaris, Dimitris Spiliotopoulos, Costas Vassilakis
Safe Travelling Period Recommendation to High Attack Risk European Destinations Based on Past Attack Information
Abstract
Terrorism is a significant deterrent for tourism. It affects visitors as well as the local citizens and personnel of a country or area. On the one hand, a potential visitor will probably avoid travelling to a high attack risk country for safety reasons and hence will miss the opportunity to visit it; on the other hand, the country’s tourism will decline. This work addresses the aforementioned problem by (1) showing that relatively safe visiting periods for high attack risk European countries can be predicted with high accuracy using limited information, comprising attack and fatality data from past years, which are widely available, and (2) developing an algorithm that recommends relatively safe periods to potential travellers.
The results of this work will be useful for tourists, visitors, businesses and operators, as well as relevant stakeholders and actors.
Dimitris Spiliotopoulos, Dionisis Margaris, Costas Vassilakis
Analyzing Cyber Influence Campaigns on YouTube Using YouTubeTracker
Abstract
YouTube is the second most popular website in the world. Over 500 hours of videos are uploaded every minute and 5 billion videos are watched every day, almost one video per person worldwide. Because videos can deliver a complex message in a way that captures the audience’s attention more effectively than text-based platforms, YouTube has become one of the most relevant platforms in the age of digital mass communication. This makes the analysis of YouTube content and user behavior invaluable not only to information scientists but also to communication researchers, journalists, sociologists, and many more. A number of YouTube analysis tools exist, but none of them provides in-depth qualitative and quantitative insights into user behavior or networks. To that end, we introduce YouTubeTracker, a tool designed to gather YouTube data and gain insights on content and users. This tool can help identify leading actors, networks and spheres of influence, emerging popular trends, as well as user opinion. This analysis can also be used to understand user engagement and social networks. This can help reveal suspicious and inorganic behaviors (e.g., trolling, botting, commenter mobs) that may cause algorithmic manipulations. The utility of the YouTubeTracker application is demonstrated via case studies on NATO’s 2018 Trident Juncture Exercise and the 2019 Canadian elections.
Thomas Marcoux, Nitin Agarwal, Recep Erol, Adewale Obadimu, Muhammad Nihal Hussain
Blog Data Analytics Using Blogtrackers
Abstract
The use of social media has increased precipitously over the past few years. Despite the emergence of social networking services like Twitter and Facebook, blogging has, although slowly, continued to rise and provides an effective medium for spreading hoaxes and radicalizing content. Individuals also use blogs as a platform to mobilize, coordinate, and conduct cyber campaigns ranging from awareness for diseases or disorders to deviant acts threatening democratic principles and institutions. Today, blogs are increasingly being used to convey mis/disinformation and political propaganda. With no restriction on the number of characters, many use blogs to set narratives and then use other social media channels like Twitter and Facebook to disseminate those narratives and steer the audience to their blogs. Hence, blog monitoring and analysis is of great value for public affairs, strategic communications, journalists, and political and social scientists to examine various cyber campaigns. There are a few challenges in blog data analysis. For instance, since blogs are not uniformly structured, the absence of a universal Application Programming Interface (API) makes it difficult to collect blog data for analysis. Similarly, identifying relevant blogs for analysis is also challenging due to the lack of efficient blog search engines. To facilitate research in this direction, in this chapter, we present the Blogtrackers tool, which is designed to explore the blogosphere and gain insights on various events. Blogtrackers can help in analyzing the networks of blogs and bloggers. This tool can also be used to identify influential bloggers, analyze emerging trends, assess tones, and extract key entities in blogs.
Adewale Obadimu, Muhammad Nihal Hussain, Nitin Agarwal
Using Social Media Surveillance in Order to Enhance the Effectiveness of Crew Members in Search and Rescue Missions
Abstract
Social media nowadays are linked with almost every aspect of our lives. They can be and have been used to explain social relations, human behaviors, political affections, and product preferences, to mention just a few applications of social network analysis. Moreover, there are cases in which social media surveillance can prove valuable for saving human lives, such as the case studied in this book chapter. More specifically, we attempt to test how information collected from social media can improve the ability of a Search and Rescue (SAR) crew to detect people in need using visual search. For the purposes of the study, we simulated a SAR mission in a 3D virtual environment and asked the study participants to locate refugees needing assistance in different areas of an island. The initial information provided to the volunteers was differentiated, and the experiments showed that volunteers who were searching for clues based on input from local social media posts were able to track all the people in need; social media surveillance thus proved to be very promising when applied in such cases.
Dimitrios Lappas, Panagiotis Karampelas, Georgios Fessakis
Visual Exploration and Debugging of Machine Learning Classification over Social Media Data
Abstract
Humanitarian and geopolitical crises (such as COVID-19) are frequently extra-national in scope. Technology, including applications of natural language processing and machine learning, can play a vital role in mitigating this burden, especially with availability and real-time analyses of social media. One such application is situation labeling, intuitively defined as the semi-automatic assignment of one or more actionable labels (such as food, medicine or water) from a controlled vocabulary to tweets or documents that become available in the aftermath of a crisis, such as an earthquake. Despite multiple advances, users of current situation labeling systems are often unwilling to trust these (and other machine learning) outputs without some provenance and visualization of results. This article describes an interactive visualization approach called SAVIZ that allows non-technical users to intuitively and interactively explore outputs of situation labeling systems. We illustrate the potential of SAVIZ with two real-world crisis datasets from Twitter. Our platform is completely built using open-source tools, can be rendered on a web browser and is backward-compatible with several pre-existing crisis intelligence platforms.
Mayank Kejriwal, Peilin Zhou
Efficient and Flexible Compression of Very Sparse Networks of Big Data
Abstract
In the current era of big data, huge amounts of valuable data and information have been generated and collected at a very rapid rate from a wide variety of rich data sources. Social networks are examples of these rich data sources. Embedded in these big data is implicit, previously unknown and useful knowledge that can be mined and discovered by data science techniques such as data mining and social network analysis. Hence, these techniques have drawn the attention of researchers. In general, a social network consists of many users (or social entities), who are often connected by “following” relationships. Finding those famous users who are frequently followed by a large number of common followers can be useful. These frequently followed groups of famous users can be of interest to many researchers (or businesses) due to their influential roles in the social networks. However, it can be challenging to find these frequently followed groups because most users are likely to follow only a small number of famous users. In this chapter, we present an efficient and flexible compression model for supporting the analysis and mining of very sparse networks of big data, from which the frequently followed groups of users can be discovered.
Carson K. Leung, Fan Jiang, Yibin Zhang
Weather Big Data Analytics: Seeking Motifs in Multivariate Weather Data
Abstract
For the past few years, climate change and frequent disasters attributed to extreme weather phenomena have received considerable attention. Technical advancements both in hardware, such as sensors, satellites, cluster computing, etc., and in analytical tools, such as machine learning, deep learning, network analysis, etc., have allowed the collection and analysis of a large volume of complex weather-related data. In this chapter, we study the temperatures of European capitals by implementing the novel “General Purpose Sequence Clustering” (GPSC) methodology, which allows numerous long time series to be analyzed and clustered using low-cost, widely available commercial hardware. Using this methodology, we have managed to cluster two-year temperature time series of 38 European capitals. The clustering is not based merely on typical seasonality but works at a more in-depth level using complex patterns. The results showed the efficiency and effectiveness of the methodology by identifying several clusters showing similarities that could help weather specialists in discovering more advanced weather prediction models.
Konstantinos F. Xylogiannopoulos, Panagiotis Karampelas, Reda Alhajj
Analysis of Link Prediction Algorithms in Hashtag Graphs
Abstract
Twitter is a prominent multilingual social networking site where users can post messages known as “tweets”. Twitter, like other social networking sites such as Facebook, allows users to categorize tweets by the use of “hashtags”. Communication on Twitter can be mapped in terms of hashtag graphs, where vertices correspond to hashtags, and edges correspond to co-occurrences of hashtags within the same distinct tweet. Furthermore, a vertex in hashtag graphs can be weighted with the number of tweets a hashtag has occurred in, and edges can be weighted with the number of tweets both hashtags have co-occurred in, creating a “weighted hashtag graph”. In this chapter, we describe additions to some well-known link prediction methods that allow the weights of both vertices and edges in a weighted hashtag graph to be taken into account. We base our novel predictive additions on the assumption that more popular hashtags have a higher probability to appear with other hashtags in the future. We then apply these improved methods to three sets of Twitter data with the intent of predicting hashtag co-occurrences in the future. In addition to these methods, we investigate the performance of a new, graph neural network-based framework, SEAL, which has been shown in past trials to perform better than heuristic-based approaches such as the Katz index, SimRank and rooted PageRank. Experiments were conducted on real-life data sets consisting of over 3,000,000 combined unique tweets and over 250,000 combined unique hashtags. Results from the experiments show that simpler heuristic-based scoring methods have marginal performance that decreases with the addition of more data over time. On the other hand, SEAL is shown to have superior performance in hashtag graph link prediction over the approaches it has been previously compared against in other domains. 
The AUC score of 0.959 obtained in our experiments by using SEAL significantly exceeds those of our benchmark approaches for link prediction, which include the Katz index, SimRank, and rooted PageRank.
Logan Praznik, Mohiuddin Md Abdul Qudar, Chetan Mendhe, Gautam Srivastava, Vijay Mago
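The weighted hashtag graph defined in the abstract above can be illustrated with a minimal sketch. This is not the chapter authors' implementation; the function name `build_weighted_hashtag_graph`, the sample tweets, and the choice to lowercase hashtags are all assumptions made for illustration. Each tweet is represented only by its list of hashtags.

```python
from collections import Counter
from itertools import combinations


def build_weighted_hashtag_graph(tweets):
    """Build vertex and edge weights from an iterable of hashtag lists.

    Hypothetical helper for illustration only.
    Vertex weight: number of tweets in which a hashtag occurs.
    Edge weight: number of tweets in which both hashtags co-occur.
    """
    vertex_weights = Counter()
    edge_weights = Counter()
    for hashtags in tweets:
        # Deduplicate within a tweet so each tweet counts once per hashtag.
        # Lowercasing is an assumption, not something the abstract specifies.
        unique = sorted({tag.lower() for tag in hashtags})
        vertex_weights.update(unique)
        # Sorted input gives each undirected edge one canonical (a, b) key.
        edge_weights.update(combinations(unique, 2))
    return vertex_weights, edge_weights


# Made-up sample data: three tweets, each reduced to its hashtags.
tweets = [
    ["#bigdata", "#ml"],
    ["#bigdata", "#ml", "#graphs"],
    ["#graphs"],
]
vertex_weights, edge_weights = build_weighted_hashtag_graph(tweets)
print(vertex_weights["#bigdata"])          # tweets containing #bigdata
print(edge_weights[("#bigdata", "#ml")])   # tweets containing both hashtags
```

Link prediction methods such as those named in the abstract would then score pairs of hashtags that do not yet share an edge; that scoring step is not shown here.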
Metadata
Title
Big Data and Social Media Analytics
Editors
Dr. Mehmet Çakırtaş
Dr. Mehmet Kemal Ozdemir
Copyright Year
2021
Electronic ISBN
978-3-030-67044-3
Print ISBN
978-3-030-67043-6
DOI
https://doi.org/10.1007/978-3-030-67044-3
