
2017 | Book

Social Informatics

9th International Conference, SocInfo 2017, Oxford, UK, September 13-15, 2017, Proceedings, Part I


About this book

The two-volume set LNCS 10539 and 10540 constitutes the proceedings of the 9th International Conference on Social Informatics, SocInfo 2017, held in Oxford, UK, in September 2017. The 37 full papers and 43 poster papers presented in this volume were carefully reviewed and selected from 142 submissions. The papers are organized in topical sections named: economics, science of success, and education; network science; news, misinformation, and collective sensemaking; opinions, behavior, and social media mining; proximity, location, mobility, and urban analytics; security, privacy, and trust; tools and methods; and health and behaviour.

Table of Contents

Frontmatter

Economics, Science of Success, and Education

Frontmatter
Simple Acoustic-Prosodic Models of Confidence and Likability are Associated with Long-Term Funding Outcomes for Entrepreneurs

Entrepreneurship pitches are an increasingly common way for startup founders to attract the attention of potential investors, who may be swayed by style as well as content. This study examines whether vocal features can capture some of the perceived traits of entrepreneurs, and whether those perceptions are associated with long-term funding outcomes for the firm. Using 122 pitches from the TechCrunch Disrupt Startup Battlefield competition, I find that eventual funding amounts are significantly greater for those entrepreneurs who are perceived as more confident and less likable, and that these traits can be well modeled by features associated with the intensity (loudness) of their speech patterns.

Natalie A. Carlson
ABCE: A Python Library for Economic Agent-Based Modeling

The rise of computational power makes agent-based modelling a viable option for models capturing the complex nature of an economy. However, the coding implementation can be tedious. Because of this, we introduce ABCE, the Agent-Based Computational Economics library. ABCE is an agent-based modeling library for Python that is specifically tailored for economic phenomena. With ABCE the modeler specifies the decision logic of the agents, the order of actions, the goods and their physical transformation (the production and the consumption functions). Then, ABCE automatically handles the actions, such as production and consumption, trade and agent interaction. The result is a program where the source code consists of only economically meaningful commands (e.g. decisions, buy, sell, produce, consume, contract, etc.). ABCE scales on multi-core computers, without the intervention of the modeler. The model can be packaged into a nice web application or run in a Jupyter notebook.

Davoud Taghawi-Nejad, Rudy H. Tanin, R. Maria Del Rio Chanona, Adrián Carro, J. Doyne Farmer, Torsten Heinrich, Juan Sabuco, Mika J. Straka
The Dynamics of Professional Prestige in Fashion Industries of Europe and the US: Network Approach

Career trajectories of fashion models have different outcomes and depend on every project (photoshoot, catwalk, etc.) in which they participate. In this field, models commonly choose between salary and symbolic capital, that is, the recognition and new connections in the world of fashion and art that they can acquire by collaborating with brands or journals. It follows that present affiliation influences their future career path, so they exchange prestige among themselves. In this paper we use longitudinal data on cover photoshoots in fashion and lifestyle magazines from 1975 to 2016 to see how journals and fashion models occupy positions in this field and how their prestige transforms over different time periods according to cultural and economic mechanisms.

Margarita Kuleva, Daria Maglevanaya
Matching Graduate Applicants with Faculty Members

Every year, millions of students apply to universities for admission to graduate programs (Master’s and Ph.D.). The applications are individually evaluated and forwarded to appropriate faculty members. Considering human subjectivity and processing latency, this is a highly tedious and time-consuming job that has to be performed every year. In this paper, we propose several information retrieval models aimed at partially or fully automating the task. Applicants are represented by their statements of purpose (SOP), and faculty members are represented by the papers they authored. We extract keywords from papers and SOPs using a state-of-the-art keyword extractor. A detailed exploratory analysis of keywords yields several insights into the contents of SOPs and papers. We report results on several information retrieval models employing keywords and bag-of-words content modeling, with the former offering significantly better results. While we are able to correctly retrieve research areas for a given statement of purpose (F-score of 57.7% at rank 2 and 61.8% at rank 3), the task of matching applicants and faculty members is more difficult, and we are able to achieve an F-measure of 21% at rank 2 and 24% at rank 3, when making a selection among 73 faculty members.

Shibamouli Lahiri, Carmen Banea, Rada Mihalcea
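
The F-score-at-rank figures quoted in the abstract above can be made concrete with a small sketch. The abstract does not spell out the exact evaluation protocol, so the definition below (precision and recall computed over the top-k retrieved items, then combined by their harmonic mean) is an assumption, and the function name `f_score_at_k` is hypothetical.

```python
def f_score_at_k(ranked, relevant, k):
    """F-score over the top-k entries of a ranked list.

    ranked   -- retrieved items, best match first (e.g. faculty members)
    relevant -- set of items that are correct for this query
    k        -- rank cutoff
    """
    top_k = ranked[:k]
    hits = sum(1 for item in top_k if item in relevant)
    if hits == 0:
        # No correct item retrieved (or nothing to retrieve): score is zero.
        return 0.0
    precision = hits / len(top_k)
    recall = hits / len(relevant)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy query: one correct faculty member, found at rank 2.
score = f_score_at_k(["A", "B", "C"], {"B"}, k=2)
```

In this toy case precision is 1/2 and recall is 1, so the score is 2/3. Averaging such per-query scores over all applicants would give a single figure at each rank cutoff.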

Network Science

Frontmatter
Why Groups Show Different Fairness Norms? The Interaction Topology Might Explain

Computational models of prosocial norms are becoming important from the perspective of the theoretical social sciences as well as the engineering of autonomous systems, which also need to show prosocial behavior in their social interactions. Fairness, as one of the strongest prosocial norms, has long been argued to govern human behavior in a wide range of social, economic, and organizational activities. The sense of fairness, although universal, varies across different societies. In this study, using a computational model based on evolutionary games on graphs, we demonstrate the emergence of fair behavior in structured interactions of rational agents and test the hypothesis that the network structure of social interaction can causally explain some of the cross-societal variations in fairness norms, as previously reported by empirical studies. We show that two network parameters, community structure, as measured by the modularity index, and network hubiness, represented by the skewness of the degree distribution, have the most significant impact on the emergence of fairness norms. These two parameters can explain much of the variation in fairness norms across societies and can also be linked to hypotheses suggested by earlier empirical work. We devised a multi-layered model that combines local agent interactions with social learning, thus enabling both strategic behavior and the diffusion of successful strategies. We also discuss some generalizable methods for selecting the network structures and convergence criteria used in the simulations for this work. By applying multivariate statistics to the results, we obtain the relation between network structural features and collective fair behavior.

Mohsen Mosleh, Babak Heydari
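
The two network parameters named in the abstract above, the modularity index and the skewness of the degree distribution, can be illustrated with a short sketch. It assumes an undirected, unweighted graph, the standard Newman modularity, and the population (moment-based) skewness; the abstract does not state which estimators the authors used, and the helper names are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q = sum_c [ e_c / m - (d_c / (2m))^2 ].

    edges     -- list of undirected edges (u, v), no duplicates
    community -- dict mapping each node to its community label
    """
    m = len(edges)
    internal = defaultdict(int)    # e_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # d_c: total degree of nodes in community c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


def degree_skewness(edges):
    """Population skewness m3 / m2^1.5 of the node degree distribution."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    values = list(degree.values())
    mean = sum(values) / len(values)
    m2 = sum((d - mean) ** 2 for d in values) / len(values)
    m3 = sum((d - mean) ** 3 for d in values) / len(values)
    # A regular graph has zero variance; report zero skewness in that case.
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


# Two triangles joined by a single bridge edge, split into two communities.
two_triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
labels = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
q = modularity(two_triangles, labels)

# A star graph: one hub with three leaves gives a right-skewed degree distribution.
star = [(0, 1), (0, 2), (0, 3)]
skew = degree_skewness(star)
```

For the two-triangle graph the modularity works out to 5/14, and the star graph has positive degree skewness, which is the sense in which skewness captures "hubiness".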
Constrained Community Detection in Multiplex Networks

Constrained community detection is a kind of community detection that takes given constraints into account to improve accuracy. Optimizing a constrained Hamiltonian is one of the methods for constrained community detection. The constrained Hamiltonian consists of a Hamiltonian, which is a generalized modularity, and a constraint term, which takes the given constraints into account. Nakata proposed a method for constrained community detection in monoplex networks based on optimizing the constrained Hamiltonian with an extended Louvain method. In this paper, we propose a new method for constrained community detection in multiplex networks. Multiplex networks are combinations of multiple individual networks. They can represent temporal networks or networks with several types of edges. While optimizing the modularity proposed by Mucha et al. is popular for community detection in multiplex networks, our method optimizes the constrained Hamiltonian, which we extend to multiplex networks. Using our proposed method, we successfully detect communities while taking constraints into account. We also successfully improve the accuracy of community detection by using our method iteratively. Our method enables us to carry out constrained community detection interactively in multiplex networks.

Koji Eguchi, Tsuyoshi Murata

News, Misinformation, and Collective Sensemaking

Frontmatter
Seminar Users in the Arabic Twitter Sphere

We introduce the notion of “seminar users”, who are social media users engaged in propaganda in support of a political entity. We develop a framework that can identify such users with 84.4% precision and 76.1% recall. While our dataset is from the Arab region, omitting language-specific features has only a minor impact on classification performance, and thus our approach could work for detecting seminar users in other parts of the world and in other languages. We further explored a controversial political topic to observe the prevalence and potential potency of such users. In our case study, we found that 25% of the users engaged in the topic are in fact seminar users and their tweets make up nearly a third of the on-topic tweets. Moreover, they are often successful in affecting mainstream discourse with coordinated hashtag campaigns.

Kareem Darwish, Dimitar Alexandrov, Preslav Nakov, Yelena Mejova
Exploiting Context for Rumour Detection in Social Media

Tools that are able to detect unverified information posted on social media during a news event can help to avoid the spread of rumours that turn out to be false. In this paper we compare a novel approach using Conditional Random Fields that learns from the sequential dynamics of social media posts with the current state-of-the-art rumour detection system, as well as other baselines. In contrast to existing work, our classifier does not need to observe tweets querying the stance of a post to deem it a rumour but, instead, exploits context learned during the event. Our classifier has improved precision and recall over the state-of-the-art classifier that relies on querying tweets, as well as outperforming our best baseline. Moreover, the results provide evidence for the generalisability of our classifier.

Arkaitz Zubiaga, Maria Liakata, Rob Procter
Multidimensional Analysis of the News Consumption of Different Demographic Groups on a Nationwide Scale

Examining 103,133 news articles that were the most popular among different demographic groups on Daum News (the second most popular news portal in South Korea) during the whole of 2015, we provide multi-level analyses of gender and age differences in news consumption. We measured such differences at four different levels: (1) by actual news items, (2) by section, (3) by topic, and (4) by subtopic. We characterized the news items at the four levels using computational techniques, namely topic modeling and vector representations of words and news items. We found that differences in news reading behavior across demographic groups are most noticeable at the subtopic level, but not at the section or topic levels.

Jisun An, Haewoon Kwak
Trump vs. Hillary: What Went Viral During the 2016 US Presidential Election

In this paper, we present a quantitative and qualitative analysis of the top retweeted tweets (viral tweets) pertaining to the US presidential elections from September 1, 2016 to Election Day on November 8, 2016. For every day, we tagged the top 50 most retweeted tweets as supporting or attacking either candidate or as neutral/irrelevant. Then we analyzed the tweets in each class for: general trends and statistics; the most frequently used hashtags, terms, and locations; the most retweeted accounts and tweets; and the most shared news and links. In all, we analyzed the 3,450 most viral tweets that grabbed the most attention during the US election and were retweeted 26.3 million times in total, accounting for over 40% of the total tweet volume pertaining to the US election in the aforementioned period. Our analysis of the tweets highlights some of the differences between the social media strategies of both candidates, the penetration of their messages, and the potential effect of attacks on both.

Kareem Darwish, Walid Magdy, Tahar Zanouda
Understanding Online Political Networks: The Case of the Far-Right and Far-Left in Greece

This paper examines the connectivity among political networks on Twitter. We explore dynamics inside and between the far right and the far left, as well as the relation between the structure of the network and sentiment. The 2015 Greek political context offers a unique opportunity to investigate political communication in times of political intensity and crisis. We explore interactions inside and between political networks on Twitter in the run-up to three different ballots: the parliamentary election of 25 January, the bailout referendum of 5 July, and the snap election of 20 September. We then compare political action during campaigns with that during routinized politics.

Pantelis Agathangelou, Ioannis Katakis, Lamprini Rori, Dimitrios Gunopulos, Barry Richards
Polarization in Blogging About the Paris Meeting on Climate Change

To what extent was the blogging about the recent Paris meeting on climate change polarized? This paper addresses this question by way of a series of analyses of a comprehensive corpus of English language blog posts about the negotiations to reach an agreement to mitigate climate change. We identify two groups of bloggers, the engaged bloggers and the contrarian bloggers and use the contents of their blog posts and the patterns in their linking to sources to characterize and compare the two groups. The paper combines computational methods and manual analyses and uses co-citation networks in an innovative way to characterize and compare the contexts of the linking in the two groups. We address challenges that using computational methods to study polarization in blogs raises. We argue that the ideological profiles of the sources the blogs link to are clear signals of polarization.

Dag Elgesem
Event Analysis on the 2016 U.S. Presidential Election Using Social Media

It is not surprising that social media have played an important role in shaping the political debate during the 2016 presidential election. The dynamics of social media provide a unique opportunity to detect and interpret the pivotal events and scandals of the candidates quantitatively. This paper examines several text-based analyses to determine which topics have a lasting impact on the election for the two main candidates, Clinton and Trump. About 135.5 million tweets are collected over the six weeks prior to the election. From these tweets, topic clustering, keyword extraction, and tweeter analysis are performed to better understand the impact of the events that occurred during this period. Our analysis builds upon a social science foundation to provide another avenue for scholars to use in discerning how events detected from social media show the impacts of campaigns as well as campaign the election.

Tarrek A. Shaban, Lindsay Hexter, Jinho D. Choi
Badly Evolved? Exploring Long-Surviving Suspicious Users on Twitter

We study the behavior of long-lived, eventually suspended accounts in social media through a comprehensive investigation of Arabic Twitter. With a threefold study of (i) the content these accounts post; (ii) the evolution of their linguistic patterns; and (iii) their activity evolution, we compare long-lived users with short-lived, legitimate, and pro-ISIS users. We find that these long-lived accounts, though trying to appear normal, do exhibit significantly different behaviors from both normal and other suspended users. We additionally identify temporal changes and assess their value in supporting the discovery of these accounts, and find that most accounts have actually been “hiding in plain sight” and are detectable early in their lifetime. Finally, we successfully apply our findings to address a series of classification tasks, most notably to determine whether a given account is a long-surviving account.

Majid Alfifi, James Caverlee

Opinions, Behavior, and Social Media Mining

Frontmatter
Like at First Sight: Understanding User Engagement with the World of Microvideos

Several content-driven platforms have adopted the ‘micro video’ format, a new form of short video that is constrained in duration, typically at most 5–10 s long. Micro videos are typically viewed through mobile apps, and are presented to viewers as a long list of videos that can be scrolled through. How should micro video creators capture viewers’ attention in the short attention span? Does quality of content matter? Or do social effects predominate, giving content from users with large numbers of followers a greater chance of becoming popular? To the extent that quality matters, what aspect of the video (aesthetics or affect) is critical to ensuring user engagement? We examine these questions using a snapshot of nearly all (>120,000) videos uploaded to globally accessible channels on the micro video platform Vine over an 8-week period. We find that although social factors do affect engagement, content quality becomes equally important at the top end of the engagement scale. Furthermore, using the temporal aspects of video, we verify that decisions are made quickly, and that first impressions matter more, with the first seconds of the video typically being of higher quality and having a large effect on overall user engagement. We verify these data-driven insights with a user study of 115 respondents, confirming that users tend to engage with micro videos based on “first sight”, and that users see this format as a more immediate and less professional medium than traditional user-generated video (e.g., YouTube) or user-generated images (e.g., Flickr).

Sagar Joglekar, Nishanth Sastry, Miriam Redi
Compression-Based Algorithms for Deception Detection

In this work we extend compression-based algorithms for deception detection in text. In contrast to approaches that rely on theories for deception to identify feature sets, compression automatically identifies the most significant features. We consider two datasets that allow us to explore deception in opinion (content) and deception in identity (stylometry). Our first approach is to use unsupervised clustering based on a normalized compression distance (NCD) between documents. Our second approach is to use Prediction by Partial Matching (PPM) to train a classifier with conditional probabilities from labeled documents, followed by arithmetic coding (AC) to classify an unknown document based on which label gives the best compression. We find a significant dependence of the classifier on the relative volume of training data used to build the conditional probability distributions of the different labels. Methods are demonstrated to overcome the data size-dependence when analytics, not information transfer, is the goal. Our results indicate that deceptive text contains structure statistically distinct from truthful text, and that this structure can be automatically detected using compression-based algorithms.

Christina L. Ting, Andrew N. Fisher, Travis L. Bauer
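
The normalized compression distance (NCD) mentioned in the abstract above can be sketched in a few lines. This is a minimal illustration using the standard NCD formula with zlib as the compressor; the abstract does not say which compressor the authors used for clustering, so that choice and the helper names are assumptions.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (maximum compression)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).

    C(.) is the compressed size. Values near 0 suggest the two texts share
    most of their structure; values near 1 suggest they share little.
    """
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Toy documents (made up for illustration, not from the paper's datasets).
review = "the hotel room was clean and the staff were friendly " * 20
report = "quarterly revenue figures exceeded analyst forecasts by a wide margin " * 20

same = ncd(review, review)       # a document compared with itself
different = ncd(review, report)  # two unrelated documents
```

A pairwise NCD matrix built this way could then be passed to any distance-based clustering routine, which is the unsupervised setting the abstract describes.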
‘Dark Germany’: Hidden Patterns of Participation in Online Far-Right Protests Against Refugee Housing

The political discourse in Western European countries such as Germany has recently seen a resurgence of the topic of refugees, fueled by an influx of refugees from various Middle Eastern and African countries. Even though the topic of refugees evidently plays a large role in the online and offline politics of the affected countries, the fact that protests against refugees stem from the right-wing political spectrum has led to corresponding media being shared in a decentralized fashion, making an analysis of the underlying social and mediatic networks difficult. In order to contribute to the analysis of these processes, we present a quantitative study of the social media activities of a contemporary nationwide protest movement against local refugee housing in Germany, which organizes itself via dedicated Facebook pages per city. We analyse data from 136 such protest pages in 2015, containing more than 46,000 posts and more than one million interactions by more than 200,000 users. In order to learn about the patterns of communication and interaction among users of far-right social media sites and pages, we investigate the temporal characteristics of the social media activities of this protest movement, as well as the connectedness of the interactions of its participants. We find several activity metrics, such as the number of posts issued, discussion volume about crime and housing costs, negative polarity in comments, and user engagement, to peak in late 2015, coinciding with chancellor Angela Merkel’s much criticized decision of September 2015 to temporarily allow the entry of Syrian refugees to Germany. Furthermore, our evidence suggests a low degree of direct connectedness of participants in this movement (indicated, among other things, by a lack of geographical collaboration patterns), yet we encounter a strong affiliation of the pages’ user base with far-right political parties.

Sebastian Schelter, Jérôme Kunegis
An Analysis of UK Policing Engagement via Social Media

Police forces in the UK make use of social media to communicate and engage with the public. However, while guidance reports claim that social media can enhance the accessibility of policing organisations, research studies have shown that exchanges between citizens and the police tend to be infrequent. Social media usually act as an extra channel for delivering messages, but not as a means of enabling deeper engagement with the public. This has led to a phenomenon where police officers and staff have started to use social media in a personal capacity with the aim of getting closer to the public. In this paper, we aim to understand what attracts citizens to engage with social media policing content, from corporate as well as from non-corporate accounts. Our approach combines learnings from existing theories and studies on user engagement with the analysis of 1.5 million posts from 48 corporate and 2,450 non-corporate Twitter police accounts. Our results provide police-specific guidelines on how to improve communication to increase public engagement and participation.

Miriam Fernandez, Tom Dickinson, Harith Alani
What Makes a Good Collaborative Knowledge Graph: Group Composition and Quality in Wikidata

Wikidata is a community-driven knowledge graph which has drawn much attention from researchers and practitioners since its inception in 2012. The large user pool behind this project has been able to produce information spanning several domains, which is openly released and can be reused to feed any information-based application. Collaborative production processes in Wikidata have not yet been explored. Understanding them is key to preventing potentially harmful community dynamics and ensuring the sustainability of the project in the long run. We performed a regression analysis to investigate how the contribution of different types of users, i.e. bots and human editors, registered or anonymous, influences outcome quality in Wikidata. Moreover, we looked at the effects of tenure and interest diversity among registered users. Our findings show that a balanced contribution of bots and human editors positively influences outcome quality, whereas higher numbers of anonymous edits may hinder performance. Tenure and interest diversity within groups also lead to higher quality. These results may be helpful to identify and address groups that are likely to underperform in Wikidata. Further work should analyse in detail the respective contributions of bots and registered users.

Alessandro Piscopo, Chris Phethean, Elena Simperl
Multimodal Analysis and Prediction of Latent User Dimensions

Humans upload over 1.8 billion digital images to the internet each day, yet the relationship between the images that a person shares with others and his/her psychological characteristics remains poorly understood. In the current research, we analyze the relationship between images, captions, and the latent demographic/psychological dimensions of personality and gender. We consider a wide range of automatically extracted visual and textual features of images/captions that are shared by a large sample of individuals (N ≈ 1,350). Using correlational methods, we identify several visual and textual properties that show strong relationships with individual differences between participants. Additionally, we explore the task of predicting user attributes using a multimodal approach that simultaneously leverages images and their captions. Results from these experiments suggest that images alone have significant predictive power and, additionally, multimodal methods outperform both visual features and textual features in isolation when attempting to predict individual differences.

Laura Wendlandt, Rada Mihalcea, Ryan L. Boyd, James W. Pennebaker
Characterizing Videos, Audience and Advertising in Youtube Channels for Kids

Online video services, messaging systems, games and social media services are tremendously popular among young people and children in many countries. Most of the digital services offered on the internet are advertising funded, which makes advertising ubiquitous in children’s everyday life. To understand the impact of advertising-based digital services on children, we study the collective behavior of users of YouTube channels for kids and present the demographics of a large number of users. We collected data from 12,848 videos from 17 channels in the US and UK and 24 channels in Brazil. The channels in English have been viewed more than 37 billion times. We also collected more than 14 million comments made by users. Based on a combination of text-analysis and face recognition tools, we show the presence of racial and gender biases in our large sample of users. We also identify children actively using YouTube, although the minimum age for using the service is 13 years in most countries. We provide comparisons of user behavior among the three countries, which represent large user populations in the global North and the global South.

Camila Souza Araújo, Gabriel Magno, Wagner Meira Jr., Virgilio Almeida, Pedro Hartung, Danilo Doneda
Comparing Influencers: Activity vs. Connectivity Measures in Defining Key Actors in Twitter Ad Hoc Discussions on Migrants in Germany and Russia

Today, a range of research approaches is used to define the so-called influencers in discussions in social media, and one can trace both conceptual and methodological differences in how influencers are defined and tracked. We distinguish between ‘marketing’ and ‘deliberative’ conceptualizations of influencers and between metrics based on absolute figures and those from social network analytics; combining them leads to a better understanding of user activity and connectivity measures in defining influential users. We add to the existing research by asking whether user activity necessarily leads to better connectivity, and by what metrics, in online ad hoc discussions, and try to compare the structure of influencers. To do this, we use comparable outbursts of discussions on inter-ethnic conflicts related to immigration. We collect Twitter data on violent conflicts between host and re-settled groups in Russia and Germany and look at top-20 user lists by eight parameters of activity and connectivity to assess the structure of influencers in terms of pro/contra-migrant cleavages and institutional belonging. Our results show that, in both discussions, the number of users involved matters most for becoming an influencer by betweenness and PageRank centralities. Also, contrary to expectations, Russian top users are, in general, more neutral, while Germans are more divided, but in both countries pro-migrant media oppose anti-migrant informal leaders.

Svetlana S. Bodrunova, Anna A. Litvinenko, Ivan S. Blekanov
The President on Twitter: A Characterization Study of @realDonaldTrump

US President Donald Trump is perhaps the most powerful man on Twitter in terms of both his office and his ability to impact world events through his tweets. The way he uses the platform is unusual for someone in his position and is divisive among US citizens. Some tweets are posted by staff while others are posted by Trump himself, and in the time period of our dataset, the platform used to post distinguishes the author. We use this data to study the behavioral characteristics of the tweet sources and the public reaction to this content. Trump's tweets tended to be more focused on himself and other people, rather than the audience, and are more negative, angry, and anxious than staffers’ tweets. Liberals and conservatives alike found some of the tweets inappropriate for someone in Trump’s position to be posting, and the majority of inappropriate tweets came from Trump himself. The language characteristics are so distinctive that they may be used in a predictive model to correctly classify a tweet’s author with 87% accuracy. Our predictive model will allow for authorship determination, even when platform information is not informative, and our analysis suggests directions for future research on the rise of populist candidates and how they communicate on social media.

Brooke Auxier, Jennifer Golbeck
Social Networking Sites Withdrawal

Users are vital to the survival of a social networking site. For this reason, most of the research on this topic focuses on how to get users to participate in the network. However, little research has addressed the reasons why a user would decide to close their account and leave the network for good. This research aims to study this phenomenon based on Social Identity Theory, specifically the concept of disidentification. The research implemented the means-end chain methodology using data collected from in-depth interviews with 26 adults who have closed an SNS account. This data was analyzed through content analysis, using Social Network Analysis as an alternative way to map the chains suggested by the means-end chain methodology and to provide more information based on centrality measures. The findings suggest that impression management, friendship, time management and emotional stability play a central role in the withdrawal decision.

Carlos Osorio, Rob Wilson, Savvas Papagiannidis
When Follow is Just One Click Away: Understanding Twitter Follow Behavior in the 2016 U.S. Presidential Election

Motivated by the two paradoxical facts that the marginal cost of following one extra candidate is close to zero and that the majority of Twitter users choose to follow only one or two candidates, we study the Twitter follow behaviors observed in the 2016 U.S. presidential election. Specifically, we complete the following tasks: (1) analyze Twitter follow patterns of the presidential election on Twitter, (2) use negative binomial regression to study the effects of gender and occupation on the number of candidates that one follows, and (3) use multinomial logistic regression to investigate the effects of gender, occupation and celebrities on the choice of candidates to follow.

Yu Wang, Jiebo Luo, Xiyang Zhang
Inferring Spread of Readers’ Emotion Affected by Online News

Depending on the reader, a news article may be viewed from many different perspectives, thus triggering different (and possibly contradicting) emotions. In this paper, we formulate the problem of predicting the distribution of readers’ emotions triggered by a news article. Our approach analyzes affective annotations provided by readers of news articles taken from a non-English online news site. We create a new corpus from the annotated articles, and build a domain-specific emotion lexicon and word embedding features. We finally construct a multi-target regression model from a set of features extracted from online news articles. Our experiments show that by combining lexicon and word embedding features, our regression model is able to predict the emotion distribution with RMSE scores between 0.067 and 0.232 for each emotion category.

Agus Sulistya, Ferdian Thung, David Lo
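
The per-category RMSE reported in the abstract above can be illustrated with a minimal sketch. The category names and the two example rows are hypothetical and are not taken from the paper's corpus; the sketch only shows how a root-mean-square error is computed separately for each emotion category of a predicted distribution.

```python
import math


def rmse_per_category(true_rows, predicted_rows, categories):
    """Return the root-mean-square error for each emotion category.

    Each row is one article's emotion distribution, ordered like `categories`.
    """
    result = {}
    for index, category in enumerate(categories):
        squared_errors = [
            (true_row[index] - predicted_row[index]) ** 2
            for true_row, predicted_row in zip(true_rows, predicted_rows)
        ]
        result[category] = math.sqrt(sum(squared_errors) / len(squared_errors))
    return result


# Hypothetical categories and distributions, for illustration only.
categories = ["happy", "sad", "angry"]
true_rows = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
predicted_rows = [[0.5, 0.3, 0.2], [0.2, 0.4, 0.4]]

scores = rmse_per_category(true_rows, predicted_rows, categories)
```

A lower score for a category means the predicted share of readers reporting that emotion is closer to the annotated share.
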
How Polarized Have We Become? A Multimodal Classification of Trump Followers and Clinton Followers

Polarization in American politics has been extensively documented and analyzed for decades, and the phenomenon became all the more apparent during the 2016 presidential election, where Trump and Clinton depicted two radically different pictures of America. Inspired by this gaping polarization and the extensive use of Twitter during the 2016 presidential campaign, in this paper we take the first step in measuring polarization in social media and we attempt to predict individuals’ Twitter following behavior by analyzing their everyday tweets, profile images and posted pictures. As such, we treat polarization as a classification problem and study to what extent Trump followers and Clinton followers on Twitter can be distinguished, which in turn serves as a metric of polarization in general. We apply an LSTM to process tweet features and we extract visual features using the VGG neural network. Integrating these two sets of features boosts the overall performance. We are able to achieve an accuracy of 69%, suggesting that the high degree of polarization recorded in the literature has started to manifest itself in social media as well.

Yu Wang, Yang Feng, Zhe Hong, Ryan Berger, Jiebo Luo
Can Cross-Lingual Information Cascades Be Predicted on Twitter?

Social network services (SNSs) have provided many opportunities for sharing information and knowledge in various languages due to their international popularity. Understanding the information flow between different countries and languages on SNSs can not only provide better insights into global connectivity and sociolinguistics, but is also beneficial for practical applications such as globally-influential event detection and global marketing. In this study, we characterized and attempted to detect influential cross-lingual information cascades on Twitter. With a large-scale Twitter dataset, we conducted statistical analysis of the growth and language distribution of information cascades. Based on this analysis, we propose a feature-based model to detect influential cross-lingual information cascades and show its effectiveness in predicting the growth and language distribution of cascades in the early stage.

Hongshan Jin, Masashi Toyoda, Naoki Yoshinaga
An Analysis of Individuals’ Behavior Change in Online Groups

Many online platforms support social functions that enable their members to communicate, befriend, and join groups with one another. These social engagements are known to shape individuals’ future behavior. However, most work has focused solely on how peers influence behavior, and little is known about what additional role online groups play in changing behavior. We investigate the capacity for group membership to lead users to change their behavior in three settings: (1) selecting physical activities, (2) responding to help requests, and (3) remaining active on the platform. To do this, we analyze nearly half a million users over five years from a popular fitness-focused social media platform whose unique affordances allow us to precisely control for the effects of social ties, user demographics, and communication. We find that after joining a group, users readily adopt the exercising behavior seen in the group, regardless of whether the group was exercise-themed or non-exercise-themed, and this change is not explained by the influence of pre-existing social ties. Further, we find that the group setting equalizes the social status of individuals such that lower-status users still receive responses to requests. Finally, we find, surprisingly, that the number of groups one joins is negatively associated with user retention when controlling for other behavioral and social factors.

David Jurgens, James McCorriston, Derek Ruths
The Message or the Messenger? Inferring Virality and Diffusion Structure from Online Petition Signature Data

Goel et al. [14] examined diffusion data from Twitter to conclude that online petitions are shared more virally than other types of content. Their definition of structural virality, which measures the extent to which diffusion follows a broadcast model or is spread person to person (virally), depends on knowing the topology of the diffusion cascade. But often the diffusion structure cannot be observed directly. We examined time-stamped signature data from the Obama White House’s We the People petition platform. We developed measures based on temporal dynamics that, we argue, can be used to infer diffusion structure as well as the more intrinsic notion of virality sometimes known as infectiousness. These measures indicate that successful petitions are likely to be higher in both intrinsic and structural virality than unsuccessful petitions are. We also investigate threshold effects on petition signing that challenge simple contagion models, and report simulations for a theoretical model that are consistent with our data.

Chi Ling Chan, Justin Lai, Bryan Hooi, Todd Davies
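
The structural virality measure of Goel et al. referred to in the abstract above is commonly stated as the average shortest-path distance between all pairs of nodes in a diffusion tree. The following is a minimal sketch under that definition, not the temporal measures developed by the authors. It assumes the cascade is a connected tree given as (parent, child) edges; the two example cascades are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adjacency, source):
    """Return hop distances from `source` to every reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


def structural_virality(edges):
    """Average pairwise distance in a cascade tree (assumed connected)."""
    adjacency = {}
    for parent, child in edges:
        # Distances are measured on the undirected tree.
        adjacency.setdefault(parent, set()).add(child)
        adjacency.setdefault(child, set()).add(parent)
    nodes = sorted(adjacency)
    if len(nodes) < 2:
        raise ValueError("a cascade needs at least two nodes")
    all_distances = {node: bfs_distances(adjacency, node) for node in nodes}
    pair_distances = [all_distances[a][b] for a, b in combinations(nodes, 2)]
    return sum(pair_distances) / len(pair_distances)


# Hypothetical five-node cascades: one broadcast (star), one person-to-person (chain).
broadcast = [(0, 1), (0, 2), (0, 3), (0, 4)]
chain = [(0, 1), (1, 2), (2, 3), (3, 4)]

broadcast_score = structural_virality(broadcast)
chain_score = structural_virality(chain)
```

With the same number of nodes, the chain scores higher than the star, which matches the intuition that person-to-person spread is more "viral" than a single broadcast.
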

Proximity, Location, Mobility, and Urban Analytics

Frontmatter
Measuring Ambient Population from Location-Based Social Networks to Describe Urban Crime

Recently, a lot of attention has been given to crime prediction, both by the general public and by the research community. Most of the latest work has concentrated on showing the potential of novel data sources like social media, mobile phone data, points of interest, or transportation data for the crime prediction task, and researchers have focused mostly on techniques from supervised machine learning to show their predictive potential. Yet, the question remains whether this data can indeed be used to better describe urban crime. In this paper, we investigate the potential of data harvested from location-based social networks (specifically Foursquare) to describe urban crime. Towards this end, we apply techniques from spatial econometrics. We show that this data, seen as a measurement of the ambient population of a neighborhood, is able to further describe crime levels in comparison to models built solely on census data, seen as a measurement of the resident population of a neighborhood. In an analysis of crime at the census tract level in New York City, the total number of incidents can be described by our models with up to $$R^2 = 56\%$$, while the best model for the different crime subtypes is achieved for larcenies with roughly $$67\%$$ of the variance explained.

Cristina Kadar, Raquel Rosés Brüngger, Irena Pletikosa
Robust Modeling of Human Contact Networks Across Different Scales and Proximity-Sensing Techniques

The problem of mapping human close-range proximity networks has been tackled using a variety of technical approaches. Wearable electronic devices, in particular, have proven especially successful in a variety of settings relevant for research in social science, complex networks and infectious disease dynamics. Each device and technology used for proximity sensing (e.g., RFIDs, Bluetooth, low-power radio or infrared communication) comes with specific biases on the close-range relations it records. Hence it is important to assess which statistical features of the empirical proximity networks are robust across different measurement techniques, and which modeling frameworks generalize well across empirical data. Here we compare time-resolved proximity networks recorded in different experimental settings and show that some important statistical features are robust across all settings considered. The observed universality calls for a simplified modeling approach. We show that one such simple model is indeed able to reproduce the main statistical distributions characterizing the empirical temporal networks.

Michele Starnini, Bruno Lepri, Andrea Baronchelli, Alain Barrat, Ciro Cattuto, Romualdo Pastor-Satorras
Personalized Recommendation of Points-of-Interest Based on Multilayer Local Community Detection

When visiting a touristic venue, building personalized itineraries is often non-trivial, mainly because of the variety of types of points-of-interest (PoIs) that might be considered by an individual. Several online platforms exist to support the tourists by providing them with detailed PoI-related information in a certain area, such as routes, distances, reviews, and ratings. However, integrating all these aspects can be tricky, and finding a reasonable trade-off between spatial/temporal proximity, amount and serendipity of the PoIs to visit can be challenging even for expert tourists. In this work, we propose a novel approach to the recommendation of a set of PoIs for a geographic area set around a given seed PoI, by leveraging a multilayer local community detection framework. The seed-centric communities are discovered in a complex network system, whose nodes correspond to PoIs and relations in the different layers correspond to services provided by different online platforms, i.e., Google Maps, Foursquare and Wikipedia. Experimental evaluation on renowned Italian touristic venues unveiled interesting findings on the significance of the proposed approach.

Roberto Interdonato, Andrea Tagarelli
Designing for Digital Inclusion: A Post-Hoc Evaluation of a Civic Technology

Digital inequalities are a major obstacle to diversifying public discourse on the Internet. To explore the potential of system design to help bridge digital inequalities across gender and race, we conducted a post-hoc evaluation of design decisions within a civic technology that were specifically intended to increase the participation of women and people of color. While many aspects of digital inequality remain unresolved, our results provide evidence in support of such dedicated design decisions. Our work also makes a methodological contribution by providing an approach that uses external public data sets to supplement user demographic data, without which studies of digital inclusion could only rely on self-reported, potentially biased data. We discuss the empirical and ethical implications of our research approach and results.

Claudia López, Rosta Farzan

Security, Privacy, and Trust

Frontmatter
The Cognitive Heuristics Behind Disclosure Decisions

Despite regulatory efforts to protect personal data online, users knowingly consent to disclose more personal data than they intend, and they are also prone to disclose more than they know. We consider that a reliance on cognitive heuristics is key to explaining these aspects of users’ disclosure decisions, and that the cues underpinning these heuristics can be exploited by organisations seeking to extract more data than is required. Through the lens of an existing credibility heuristic framework, we qualitatively analyse 23 one-to-one, semi-structured interviews. We identify six super-ordinate classes of heuristics that users rely upon during disclosures: PROMINENCE, NETWORK, RELIABILITY, ACCORDANCE, NARRATIVE, MODALITY, and a seventh, non-heuristic TRADE class. Our results suggest that regulatory efforts seeking to increase the autonomy of the informed user are inapt. Instead, the key to supporting users during disclosure decisions could be to positively nudge users through the cues underpinning these simple heuristics.

Vincent Marmion, Felicity Bishop, David E. Millard, Sarah V. Stevenage

Tools and Methods

Frontmatter
A Propagation-Based Method of Estimating Students’ Concept Understanding

In this paper, we introduce a method to estimate the degree of students’ understanding of concepts and relationships while they learn from digital text materials online. To achieve our goal, we first define a semantic network that represents the knowledge in a material. Second, we define students’ behavior as the sequence of relationships they read in the material, and we create a probabilistic model for relationship understanding. We also create inference rules to include new relationships in the network. Third, we simulate the propagation of the new concept understanding through the network by using a method based on Biased PageRank, extending it with a method to represent prior knowledge and weighting the contribution of every concept according to the uniqueness of its relationships. Finally, we describe an experiment to compare our method against a method without propagation and a method in which propagation is inversely proportional to the distance between concepts. Our method shows significant improvement compared to the others, providing evidence that propagation of concept understanding through the entire network exists.

Rafael López-García, Makoto P. Kato, Katsumi Tanaka
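
The propagation step described in the abstract above builds on Biased PageRank. The following is a minimal sketch of a generic PageRank with a non-uniform teleport vector, which is what is commonly meant by biased (or personalized) PageRank. It does not include the authors' extensions for prior knowledge or relationship uniqueness, and the concept graph, the `bias` weights and the parameter values are hypothetical.

```python
def biased_pagerank(out_links, bias, damping=0.85, iterations=100):
    """Power iteration for PageRank with a biased teleport distribution.

    `out_links` maps every node to the list of nodes it points to.
    `bias` maps nodes to non-negative teleport weights (at least one positive).
    """
    nodes = sorted(out_links)
    total_bias = sum(bias.get(node, 0.0) for node in nodes)
    teleport = {node: bias.get(node, 0.0) / total_bias for node in nodes}
    scores = dict(teleport)
    for _ in range(iterations):
        # Teleport step: restart mass goes only to the biased nodes.
        new_scores = {node: (1.0 - damping) * teleport[node] for node in nodes}
        for node in nodes:
            targets = out_links[node]
            if targets:
                share = damping * scores[node] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # Dangling node: hand its mass back to the teleport distribution.
                for other in nodes:
                    new_scores[other] += damping * scores[node] * teleport[other]
        scores = new_scores
    return scores


# Hypothetical concept network; the student is assumed to have just read "graph".
out_links = {
    "graph": ["node", "edge"],
    "node": ["edge"],
    "edge": ["node"],
    "tree": ["graph"],
}
scores = biased_pagerank(out_links, bias={"graph": 1.0})
```

In this toy network the mass teleported to `graph` flows onward to `node` and `edge`, while `tree`, which receives neither links nor teleport mass, stays at zero.
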
Seeds Buffering for Information Spreading Processes

Seeding strategies for influence maximization in social networks have been studied for more than a decade. They have mainly relied on the activation of all resources (seeds) simultaneously at the beginning; yet, it has been shown that sequential seeding strategies are commonly better. This research focuses on sequential seeding with buffering, which is an extension of the basic sequential seeding concept. The proposed method avoids choosing nodes that will be activated through the natural diffusion process, which leads to better use of the budget for activating seed nodes in the social influence process. This approach was compared with sequential seeding without buffering and single-stage seeding. The results on both real and artificial social networks confirm that buffer-based consecutive seeding is a good trade-off between the final coverage and the time to reach it. It performs significantly better than its rivals for a fixed budget. The gain is obtained through dynamic rankings and the ability to detect network areas with nodes that are not yet activated and have a high potential for activating their neighbours.

Jarosław Jankowski, Piotr Bródka, Radosław Michalski, Przemysław Kazienko
Backmatter
Metadata
Title
Social Informatics
Edited by
Giovanni Luca Ciampaglia
Afra Mashhadi
Dr. Taha Yasseri
Copyright year
2017
Electronic ISBN
978-3-319-67217-5
Print ISBN
978-3-319-67216-8
DOI
https://doi.org/10.1007/978-3-319-67217-5