Prediction and Inference from Social Networks and Social Media

Edited by: Jalal Kawash, Nitin Agarwal, Tansel Özyer

Publisher: Springer International Publishing

Book series: Lecture Notes in Social Networks

Included in: Springer Professional "Wirtschaft+Technik", Springer Professional "Technik", Springer Professional "Wirtschaft"

About this book

This book addresses the challenges of social network and social media analysis in terms of prediction and inference. The chapters collected here tackle these issues by proposing new analysis methods and by examining mining methods for the vast amount of social content produced. Social Networks (SNs) have become an integral part of our lives; they are used for leisure, business, government, medical, and educational purposes and have attracted billions of users. The challenges that stem from this wide adoption of SNs are vast. These include generating realistic social network topologies, awareness of user activities, topic and trend generation, estimation of user attributes from their social content, and behavior detection. This text has applications to widely used platforms such as Twitter and Facebook and appeals to students, researchers, and professionals in the field.

Table of contents

Frontmatter

Chapter 1. Having Fun?: Personalized Activity-Based Mood Prediction in Social Media

Abstract

People engage in various activities and hobbies as part of their work as well as for entertainment. The positivity or negativity of a person's mood and emotions is affected by the activity they are engaged in. In addition, time is a fundamental contextual trigger for emotions, as activities have been found to occur at particular times. An interesting question is whether we can design accurate personalized classifiers that predict a person's mood or emotions based on these features, extracted from their posts on social media. Such a classifier would enable caretakers and health personnel to monitor people going through conditions such as depression, and to identify in a timely manner people who may be prone to such conditions. This paper explores the design, implementation, and evaluation of such a classifier based on data collected from Twitter. To do so, crowdworkers were first recruited through Amazon's Mechanical Turk to label the dataset. A number of potential features are then explored to build a general classifier that automatically predicts the positivity or negativity of users' tweets. These features include social engagement, gender, language and linguistic styles, and various psychological features. In addition to these features, LIWC is then used to extract the daily activities of users. Observations show to what extent activities and the temporal nature of posting can serve as behavioral cues for developing a personalized classifier that more accurately classifies the tweets of individual users as positive, negative, or neutral.

Mahnaz Roshanaei, Richard Han, Shivakant Mishra

Chapter 2. Automatic Medical Image Multilingual Indexation Through a Medical Social Network

Abstract

Medical social networking sites have enabled multimedia content sharing in large volumes by allowing physicians and patients to upload their medical images. These images are diagnosed and commented on instantly, in different languages, by several specialists. New techniques are therefore needed to automatically extract information and analyze knowledge from the huge number of comments expressing specialists' analyses and recommendations. For this reason, we propose a term-based method to extract the relevant terms and words that describe a medical image. The significant extracted terms and keywords are later used to index medical images, in order to facilitate their search through the social network site. Our work must also take into account that existing comments are expressed in different languages. It is therefore essential to implement a multilingual indexation method to eliminate the ambiguity that would otherwise reduce the effectiveness of the search function. To remedy this situation, we propose a multilingual mixed approach that combines algorithms based on statistical methods with external multilingual semantic resources, in order to cover different languages. The use of external resources, such as a multilingual semantic thesaurus, can improve the efficiency of the indexing process. Our proposed method can be applied in different languages. It is also essential to implement auto-correction of medical terms using a medical dictionary. Correcting terms helps eliminate the ambiguity that would otherwise reduce the frequency with which such terms appear, and the correction takes into consideration that terms are presented in different languages. Our study is validated by a set of experiments and a comparison with some existing approaches in the literature. Experimental results indicate that the proposed system performs better than the other systems, which is a satisfactory outcome.

Mouhamed Gaith Ayadi, Riadh Bouslimi, Jalel Akaichi, Hana Hedhli

Chapter 3. The Significant Effect of Overlapping Community Structures in Signed Social Networks

Abstract

Social networks are an inseparable part of modern life. It is improbable that someone is not familiar with Facebook or Twitter. Nowadays people join these platforms and communicate with other members. In social networks, some people are more similar to each other and form densely connected components called communities. Detection of these tightly connected components has received much attention recently. These components are overlapping in nature; in other words, a member can belong to more than one community. Not only overlapping community structure but also temporality is an intrinsic property of online social networks: over time, people initiate new connections with each other. When people's communications are mapped to signed connections, we obtain a network with both positive and negative links. To reliably predict how these interactions evolve, we consider the sign prediction problem. Sign prediction and overlapping community detection have been explored to a certain extent; however, the answers to some research questions are still unknown. For instance, how significant are overlapping members in signed networks? To answer this question, we need a fast and precise overlapping community detection (OCD) algorithm based on simple network dynamics such as disassortative degree mixing and information diffusion. In this paper, we propose a two-phase approach to discover overlapping communities in signed networks. In the first phase, the algorithm identifies the most influential nodes (leaders) in the network. In the second phase, we identify the membership of each node to the leaders using a network coordination game. We apply several features to investigate the significance of overlapping members. These feature types are extra, overlapping, and intra. To compute them, we not only apply simple degree ranking but also extend the original HITS and PageRank ranking algorithms. We apply the features to the sign prediction problem. Results indicate that overlapping nodes predict signs competitively in comparison to intra and extra nodes.

Mohsen Shahriari, Ying Li, Ralf Klamma

Chapter 4. Extracting Relations Between Symptoms by Age-Frame Based Link Prediction

Abstract

The saying “treat the disease, not the symptoms” is widespread, a cliché for eliminating or repairing the root of a problem rather than mitigating its negative effects. However, the effort to prevent the negative effects that may be the cause of a disease is the best course of action. Predicting symptoms from a patient's past medical history has proven effective in foreseeing the symptoms (abnormal parameters) that are likely to affect a patient in the future. In this paper, we predict the onset of future symptoms on the basis of the current health status of patients. For this purpose, we first construct a weighted symptom network considering the relations between abnormal parameters. Then, we propose an unsupervised link prediction method to identify the connections between parameters, building the evolving structure of the symptom network with respect to patients’ ages. Experiments on a real network demonstrate that the proposed approach can reveal new abnormal parameter correlations accurately and performs well at capturing future disease risks.

Buket Kaya, Mustafa Poyraz

Chapter 5. Link Prediction by Network Analysis

Abstract

Link prediction refers to the process of mining and determining whether a link between two nodes in a given network may emerge in the future or is already present but hidden in the network. Link prediction may be categorized under the class of recommendation systems, e.g., finding or predicting a link/recommendation between users and items. Thus, efficient link prediction in social networks is the focus of the study described in this paper. Finding hidden links and extracting missing information in a network will aid in identifying a set of new interactions. We developed a technique for link prediction that draws on the benefits of social network analysis tools and algorithms. We used popular network models commonly used by the research community to test our algorithm's accuracy against well-known algorithms based on similarity measures. We also decided to use a graph database to model the network, which provides better scalability and efficiency than storing graph information in a relational database. The experimental results reported in this paper demonstrate how the proposed algorithm outperforms traditional link prediction algorithms described in the literature.

Salim Afra, Alper Aksaç, Tansel Özyer, Reda Alhajj
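
The abstract above compares its method against well-known similarity-based link predictors without naming them. As an illustration only, the following minimal Python sketch shows three classic neighborhood similarity measures (common neighbors, Jaccard, Adamic-Adar) that are often used as such baselines. Whether the chapter uses these particular measures is an assumption; the function names and the toy graph are hypothetical and are not taken from the chapter.

```python
import math
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, u, v):
    """Number of neighbors shared by u and v."""
    return len(adj[u] & adj[v])


def jaccard(adj, u, v):
    """Shared neighbors divided by the size of the combined neighborhood."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def adamic_adar(adj, u, v):
    """Shared neighbors weighted by 1 / log(degree); assumes u != v,
    so every shared neighbor has degree >= 2 and the log is positive."""
    return sum(1.0 / math.log(len(adj[w])) for w in adj[u] & adj[v])


def rank_candidates(adj, score):
    """Rank all currently unlinked node pairs by a similarity score, best first."""
    pairs = [(u, v) for u, v in combinations(sorted(adj), 2) if v not in adj[u]]
    return sorted(pairs, key=lambda pair: score(adj, *pair), reverse=True)


# Hypothetical toy network, for illustration only.
toy_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
toy_adj = build_adjacency(toy_edges)
top_pair = rank_candidates(toy_adj, common_neighbors)[0]
```

In this toy network the pair with the most shared neighbors is ranked first as the most likely missing link. A real evaluation would hide a subset of known edges and measure how many of them the ranking recovers.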

Chapter 6. Structure-Based Features for Predicting the Quality of Articles in Wikipedia

Abstract

The success of Wikipedia is decidedly due to the free availability of high-quality articles across many different areas of expertise. Although most of these sustained collaborations between authoritative users may constitute citable sources, Wikipedia is not immune to well-identified problems regarding article quality, e.g., the reputability of third-party sources and vandalism. Because of the huge number of articles and the intensive edit rate, it is not reasonable to even consider manually evaluating the content quality of each article. In this paper, we tackle the problem of modeling and predicting the quality of articles in collaborative platforms. We propose a quality model integrating both temporal and structural features captured from the implicit peer review process enabled by Wikipedia. A generic HITS-like framework is developed that captures both the quality of the content and the authority of the associated authors. Notably, a mutual reinforcement principle between article quality and author authority is exploited in order to take advantage of the collaborative graph generated by the users. Experiments conducted on a set of representative data from Wikipedia show the effectiveness of the computed indicators in both unsupervised and supervised scenarios.

Baptiste de La Robertie, Yoann Pitarch, Olivier Teste
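
To make the mutual reinforcement principle mentioned in the abstract more concrete, here is a minimal Python sketch of a plain HITS-style iteration on a bipartite author-article graph: an article's quality score is the sum of the authority of its authors, and an author's authority is the sum of the quality of the articles they edited. This is a simplified illustration under that assumption, not the chapter's model, which also integrates temporal and structural features. The function names and the sample edit list are hypothetical.

```python
import math


def _normalize(scores):
    """Scale scores in place so their Euclidean (L2) norm is 1."""
    norm = math.sqrt(sum(value * value for value in scores.values()))
    if norm > 0:
        for key in scores:
            scores[key] /= norm


def mutual_reinforcement(edits, iterations=50):
    """Iterate HITS-style updates over (author, article) pairs.

    Returns two dicts: article quality scores and author authority scores.
    """
    edits = list(edits)
    authors = sorted({author for author, _ in edits})
    articles = sorted({article for _, article in edits})
    authority = {author: 1.0 for author in authors}
    quality = {article: 1.0 for article in articles}
    for _ in range(iterations):
        # An article is good if authoritative authors contributed to it.
        quality = {article: 0.0 for article in articles}
        for author, article in edits:
            quality[article] += authority[author]
        _normalize(quality)
        # An author is authoritative if they contributed to good articles.
        authority = {author: 0.0 for author in authors}
        for author, article in edits:
            authority[author] += quality[article]
        _normalize(authority)
    return quality, authority


# Hypothetical edit history, for illustration only.
sample_edits = [("ann", "A"), ("ann", "B"), ("bob", "A"), ("cat", "C")]
quality, authority = mutual_reinforcement(sample_edits)
```

One property of this plain iteration is that scores concentrate on the largest well-connected group of authors and articles, so isolated pairs drift toward zero. A practical model would need to account for that, for example through the additional features the abstract describes.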

Chapter 7. Predicting Collective Action from Micro-Blog Data

Abstract

Global and national events in recent years have shown that social media, and particularly micro-blogging services such as Twitter, can be a force for good (e.g., Arab Spring) and harm (e.g., London riots). In both of these examples, social media played a key role in group formation and organisation, and in the coordination of the group’s subsequent collective actions (i.e., the move from rhetoric to action). Surprisingly, despite its clear importance, little is understood about the factors that lead to this kind of group development and the transition to collective action. This paper focuses on an approach to the analysis of data from social media to detect weak signals, i.e., indicators that initially appear at the fringes, but are, in fact, early indicators of such large-scale real-world phenomena. Our approach is in contrast to existing research which focuses on analysing major themes, i.e., the strong signals, prevalent in a social network at a particular point in time. Analysis of weak signals can provide interesting possibilities for forecasting, with online user-generated content being used to identify and anticipate possible offline future events. We demonstrate our approach through analysis of tweets collected during the London riots in 2011 and use of our weak signals to predict tipping points in that context.

Christos Charitonidis, Awais Rashid, Paul J. Taylor

Chapter 8. Discovery of Structural and Temporal Patterns in MOOC Discussion Forums

Abstract

This work explores methods for investigating the structure of knowledge exchange in discussion forums in massive open online courses (MOOCs), explicitly taking into account changing patterns over time. The paper covers three different aspects of forum analysis, combining different methods. First, an approach for extracting dynamic communication networks from forum data is presented; it is based on the classification of forum posts and takes into account the information exchange relations between forum users. Second, measures that characterise users according to information-seeking and information-giving behaviour are introduced, and the development of individual actors is analysed. Third, blockmodelling and tensor decomposition approaches are evaluated for reducing a dynamic network to an interpretable macro-structure that reflects knowledge exchange between clusters of actors over time. This allows different aspects of the communication structure related to information exchange between participants of large-scale online courses to be analysed. The utility of the analytics framework is demonstrated through two case studies on forum discussions in two MOOCs offered on the Coursera platform.

Tobias Hecking, Andreas Harrer, H. Ulrich Hoppe

Chapter 9. Diffusion Process in a Multi-Dimension Networks: Generating, Modelling, and Simulation

Abstract

Social network simulation implies two preconditions: (1) determining the behavior of a population and (2) simulating the information diffusion within it. A population is defined as a group of interconnected individuals possessing individual and structural behaviors with regard to information sharing. In this paper, the generated population is defined by socio-cultural features, specifically the way that people tend to link together. To this end, the definition of a single social network is too restrictive: realistically, people are not interlinked by only one relationship. To overcome this limitation, multidimensional social networks (MSN) have been proposed to model social interactions, where each dimension represents a category of relationship. The MSN architecture not only better represents the diversity of human relations but also allows distinctive rules to be defined for the simulation of message diffusion. We study a model of information spreading on multiplex networks, in which agents interact through multiple interaction channels or with different relationships (layers). The underlying idea is that information disseminates differently according to the category of links through which it propagates. This paper therefore presents the modelling of an MSN based on social science and a simulation using propagation rules for each dimension.

Youssef Bouanan, Mathilde Forestier, Judicael Ribault, Gregory Zacharewicz, Bruno Vallespir
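
The idea that information spreads differently depending on the category of link can be illustrated with a small Python sketch: an independent-cascade style simulation on a multiplex network where each layer has its own transmission probability. This is only one possible reading of "propagation rules for each dimension" and is not the chapter's simulation model. The layer names, probabilities, and function name are hypothetical.

```python
import random


def simulate_diffusion(layers, transmit_prob, seeds, rng=None):
    """Spread a message from seed nodes over a multiplex network.

    layers: {layer_name: {node: [neighbors reachable through that layer]}}
    transmit_prob: {layer_name: probability that one contact passes the message on}
    Each newly informed node gets a single chance per outgoing link to inform
    a neighbor, with a probability that depends on the link's layer.
    """
    rng = rng or random.Random()
    informed = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for node in frontier:
            for layer, edges in layers.items():
                for neighbor in edges.get(node, ()):
                    if neighbor in informed:
                        continue
                    if rng.random() < transmit_prob[layer]:
                        informed.add(neighbor)
                        next_frontier.append(neighbor)
        frontier = next_frontier
    return informed


# Hypothetical two-layer network, for illustration only.
sample_layers = {
    "family": {"a": ["b"], "b": ["c"]},
    "work": {"a": ["d"]},
}
family_only = simulate_diffusion(sample_layers, {"family": 1.0, "work": 0.0}, ["a"])
everyone = simulate_diffusion(sample_layers, {"family": 1.0, "work": 1.0}, ["a"])
```

With probabilities strictly between 0 and 1 the outcome is random, so a study would typically pass a seeded `random.Random` instance and average the reach over many runs.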

Electronic ISBN: 978-3-319-51049-1
Print ISBN: 978-3-319-51048-4
DOI: https://doi.org/10.1007/978-3-319-51049-1