
## About this book

This book constitutes the proceedings of the workshops held at the International Conference on Social Informatics, SocInfo 2014, which took place in Barcelona, Spain, in November 2014. SocInfo 2014 included nine satellite workshops: the City Labs Workshop; the Workshop on Criminal Network Analysis and Mining (CRIMENET); the Workshop on Interaction and Exchange in Social Media (DYAD); the Workshop on Exploration of Games and Gamers (EGG); the Workshop on HistoInformatics; the Workshop on Socio-Economic Dynamics, Networks and Agent-based Models (SEDNAM); the Workshop on Social Influence (SI); the Workshop on Social Scientists Working with Start-Ups; and the Workshop on Social Media in Crowdsourcing and Human Computation (SoHuman).

## Table of Contents

### City Labs - Introduction

Social media and digital traces from sensors such as smart cards and mobile phones have played a key role in providing insights into people’s activities, opinions and day-to-day lives. These detailed user-generated information streams offer a unique opportunity for cities to understand and engage their citizens. The research domain of smarter cities aims to monitor disruptive events (e.g., emergencies, Olympics), analyze social behaviour, identify citizens’ sentiment and understand their interactions with services. In turn, cities can use their understanding of citizens to foster stronger relationships with the diverse communities in their constituencies. This understanding could be applied to mobilize people on important issues such as education, health care, political engagement and community awareness. In addition, new digital fabrication tools have recently been used to generate adaptable, dynamic and interactive architecture able to evolve together with urban dwellers, and it has been shown that new Internet-of-Things devices can effectively capture physical observations to understand how cities and urban centers work. As a result, cities now provide a living lab where applied research can be carried out to understand citizens and services with a focus on collaborative, user-centred design and co-creation.

Elizabeth M. Daly, Areti Markopoulou, Daniele Quercia

### FlowSampler: Visual Analysis of Urban Flows in Geolocated Social Media Data

Analysis of flows such as human movement can help spatial planners better understand territorial patterns in urban environments. In this paper, we describe FlowSampler, an interactive visual interface designed for spatial planners to gather, extract and analyse human flows in geolocated social media data. Our system adopts a graph-based approach to infer movement pathways from spatial point data and expresses the resulting information through multiple linked visualisations to support data exploration. We describe two use cases to demonstrate the functionality of our system and characterise how spatial planners use it to address analytical tasks.

Alvin Chua, Ernesto Marcheggiani, Loris Servillo, Andrew Vande Moere

### Policing Engagement via Social Media

Social media is commonly used by policing organisations to spread the word on crime, weather, missing persons, etc. In this work we aim to understand what attracts citizens to engage with social media policing content. To study these engagement dynamics we propose a combination of machine learning and semantic analysis techniques. Our initial research, performed over 3,200 posts from the @dorsetpolice Twitter account, shows that writing longer posts with positive sentiment and sending them out before 4pm increases the probability of attracting attention. Additionally, posts about weather, roads and infrastructure, mentioning places, are also more likely to attract attention.

Miriam Fernandez, A. Elizabeth Cano, Harith Alani

### Digital Social Media to Enhance the Public Realm in Historic Cities

The research aims to explore a methodology for the use of digital social media (DSM) to study and influence people’s behaviors within the public realm of a historic city center. The potential and limitations of using digital social media for urban analysis and planning are identified with regard to the specific conditions of a historic city center, with the goal of creating a more livable public realm. The research aims to draft a general methodology both for data mining and for the promotion and enhancement of public places through information and digital connection, with reference to the specific case of historic city centers. The research considers the case study of Urbino, in the Marche region (Italy), and carries out different analyses and proposals for intervention based on digital social media.

Morandi Corinna, Palmieri Riccardo, Tomarchio Ludovica

### Privacy Preserving Energy Management

The improvement of energy efficiency is an important target on all levels of society. It is best achieved on the basis of locally and temporally fine-grained measurement data for identifying unnecessary use of energy. However, at the same time such fine-grained measurements allow deriving information about the persons using the energy. In this paper we describe our work towards a privacy preserving system for energy management. Our solution follows the privacy by design paradigm and uses attribute-based cryptography and virtualization to increase security.

Holger Kinkelin, Marcel von Maltitz, Benedikt Peter, Cornelia Kappler, Heiko Niedermayer, Georg Carle

### DaTactic, Data with Tactics: Description and Evaluation of a New Format of Online Campaigning for NGOs

Social media has emerged as a powerful communication channel to promote actions and raise social awareness. Initiatives through social media are being driven by NGOs to increase the scope and effectiveness of their campaigns. In this paper, we describe the #DaTactic2 campaign, an initiative that is both offline and online, supported by Oxfam Intermón and devised to gather activists and NGO practitioners and create awareness of the importance of the 2014 European Parliament election. We provide details regarding the background of the campaign, as well as the objectives, the strategies that have been implemented and an empirical evaluation of its performance through an analysis of the impact on Twitter. Our findings show the effectiveness of bringing together relevant actors in an offline event and the high value of creating multimedia content in order to increase the scope and virality of the campaign.

Pablo Aragón, Saya Sauliere, Rebeca Díez Escudero, Alberto Abellán

### Online Communication in Apartment Buildings

In this paper we explore the main patterns of communication and cooperation in online groups created by residents of apartment buildings in St. Petersburg on the VK social networking site (SNS). Using word-frequency analysis and Latent Dirichlet Allocation (LDA), we discovered the main discussion topics in online groups. We have also found that communication between neighbors in these groups is predominantly connected with material needs and directed at solving common problems, e.g. related to building improvement, houseowner associations (HOA) and in-fill construction near their house. Based on online observations of city activists, we suggest that the dynamic nature of SNS creates online communities that are initially dedicated to resolving particular problems; however, the connections established between users during this process prevent such groups from falling apart even after the original issues are resolved.

Vadim Voskresenskiy, Kirill Sukharev, Ilya Musabirov, Daniel Alexandrov

### Experiments for a Real Time Crowdsourced Urban Design

We present a case study of an interactive urban design workshop held at Nebrija Architecture University in Madrid, Spain, in March 2013. In this workshop, an urban survey was conducted and an urban intervention proposal was developed in a participatory way for an empty plot in a nearby neighborhood. Different online collaborative design tools and data mining were used and monitored over the span of a year, and the results were analyzed in March 2014. The findings show that collaborative tools help distribute work and gather knowledge from different sources, but the span and intensity of these work stages are seldom taken into consideration. The timeline and completion of the agenda were key elements during the workshop, determining the success or failure of many of the tools used depending on the time dimension. This temporal dimension still feeds back into the work process, as some of those tools have become obsolete or redundant in a matter of a few months. The lessons learned will lead to future studies on this subject.

Gonzalo Reyero Aldama, Federico Cabitza

### How Can City Labs Enhance the Citizens’ Motivation in Different Types of Innovation Activities?

There is a wide diversity of city labs for collaborative innovation. However, in all cases their success depends on motivating citizens to participate in their activities. This article builds on the literature on innovation dynamics in Living Labs to link them with other kinds of City Labs. The contribution of this article consists in building on the types of innovation mechanisms in Living Lab networks (Leminen, Westerlund, & Nyström, 2012; Leminen, 2013) by relating each type to a different theoretical innovation logic (methods for creativity; social innovation; open innovation; user innovation). Each logic is related to a different type of localized space of collective innovation (Fab Labs, co-creation spaces, Living Labs, coworking spaces and hackerspaces) and to participants’ motivation to collaborate. The literature review on the main characteristics of each logic provides some guidelines for City Labs practitioners about how to motivate citizens.

Ignasi Capdevila

### Criminal Network Analysis and Mining (CRIMENET 2014) - Introduction

Mobile phone networks, social network platforms, social media and over-IP messaging systems are typical examples of the multitude of communication media broadly adopted in today’s society. One aspect that has vast societal impact is the abuse of such platforms: the possibility that criminals can exploit these communication channels to organize and coordinate their illicit activities has proved real. Criminal Networks (CNs) differ from well-studied Social Networks in a number of ways, including their size (the number of members is usually low), the lack of knowledge of their structure and organization (information about members and their relations is incomplete) and the different types of interaction dynamics (digital communications, economic transactions, face-to-face interactions, etc.). Therefore, in recent years (roughly since September 11, 2001) Criminal Network Analysis has grown into an outstanding, almost independent research area. The ability to detect criminal behavior across different interaction media is of paramount importance to avoid abuse and fight crime. For this reason, computational tools and models have recently been proposed to study criminal behavior in online platforms and mobile phone networks.

Emilio Ferrara, Salvatore Catanese, Giacomo Fiumara

### Understanding Crime Networks: Actors and Links

In order to understand crime networks, criminological and practical knowledge should be merged. Criminals are similar, criminals are different. Crime networks can be categorized but still the links, actors, and characteristics are different. This paper gives a literature review of crime networks from criminological as well as network analysis views.

Fatih Ozgul, Zeki Erdem

### The (not so) Critical Nodes of Criminal Networks

One of the most basic questions in the analysis of social networks is to find nodes that are of particular relevance in the network. The answer that emerged in the recent literature is that the importance, or centrality, of a node $x$ is proportional to the number of nodes that get disconnected from the network when node $x$ is removed. We show that while in social networks such important nodes lie in their cores (i.e., maximal subgraphs in which all nodes have degree higher than a certain value), this is not necessarily the case in criminal networks. This shows that nodes whose removal affects large portions of the criminal network prefer to operate from network peripheries, thus confirming the intuition of Baker and Faulkner [4]. Our results also highlight structural differences between criminal networks and other social networks, suggesting that classical definitions of importance (or centrality) in a network fail to capture the concept of key players in criminal networks.

Donatella Firmani, Giuseppe F. Italiano, Luigi Laura
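
The two notions contrasted in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: "disconnected" is taken to mean "left outside the largest remaining component", the input graph is assumed to be connected, the core is computed with the usual k-core peeling (minimum degree at least `k`), and the function names and the toy graph are hypothetical.

```python
from collections import deque

def components(adj, removed=frozenset()):
    """Return the connected components (as sets), ignoring removed nodes."""
    seen, comps = set(), []
    for start in adj:
        if start in seen or start in removed:
            continue
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen and v not in removed:
                    seen.add(v)
                    comp.add(v)
                    queue.append(v)
        comps.append(comp)
    return comps

def disconnected_by_removal(adj, x):
    """Count nodes left outside the largest component once x is removed."""
    comps = components(adj, removed={x})
    if not comps:
        return 0
    remaining = sum(len(c) for c in comps)
    return remaining - max(len(c) for c in comps)

def k_core(adj, k):
    """Return the k-core: repeatedly drop nodes whose degree falls below k."""
    alive = set(adj)
    changed = True
    while changed:
        changed = False
        for u in list(alive):
            if sum(1 for v in adj[u] if v in alive) < k:
                alive.discard(u)
                changed = True
    return alive

# Hypothetical toy graph: a triangle (a, b, c) with a chain c - d - e attached.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

impact = {x: disconnected_by_removal(adj, x) for x in adj}
core = k_core(adj, 2)
```

Comparing `impact` with membership in `core` is the kind of check the abstract describes; the toy graph is far too small to say anything about real criminal networks.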

### A Literature-Based Approach to a Narco-Network

In this work we used a combined text-mining and manual-curation approach to mine the Spanish-language book “Los Señores del Narco” and identify the narco social network. In our method, nodes are book characters and links are created when the closeness between those characters along the text is under a certain threshold value. Results show the network of the principal drug dealers of México as well as some politicians and members of the national police department. A community analysis shows some separate groups corresponding to well-known drug cartels. The analysis presented here highlights the importance of text-mining tools for understanding relationships among individuals, especially the qualitative character of the interactions, which could be difficult to capture using other approaches.

Jesús Espinal-Enríquez, J. Mario Siqueiros-García, Rodrigo García-Herrera, Sergio Antonio Alcalá-Corona
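
The link-creation rule can be illustrated with a minimal sketch. It assumes, beyond what the abstract states, that "closeness" is measured as the number of tokens between two mentions, that each character is identified by a single-token name, and that the manual curation step is omitted. The function name, the `max_gap` parameter and the sample sentence with its names are invented for illustration.

```python
import re
from collections import Counter
from itertools import combinations

def cooccurrence_network(text, characters, max_gap):
    """Link two characters whenever their mentions lie at most max_gap tokens apart."""
    tokens = re.findall(r"\w+", text.lower())
    names = {c.lower() for c in characters}
    # Positions of every mention of a known character name.
    mentions = [(i, tok) for i, tok in enumerate(tokens) if tok in names]
    edges = Counter()
    for (i, a), (j, b) in combinations(mentions, 2):
        if a != b and j - i <= max_gap:
            edges[tuple(sorted((a, b)))] += 1  # edge weight = number of close co-mentions
    return edges

# Invented sample text; the names do not refer to anyone in the book.
sample = ("Ana met Luis near the port. Days later, far from the coast "
          "and after many delays, Luis phoned Eva.")
network = cooccurrence_network(sample, ["Ana", "Luis", "Eva"], max_gap=3)
```

Here `network` links Ana with Luis and Luis with Eva, but not Ana with Eva, whose mentions are too far apart. The resulting weighted edge list could then be passed to any community detection routine.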

### The Spatial Structure of Crime in Urban Environments

It is undoubtedly cliché to say that we are in the Age of Big Data Analytics or Data Science; every computing and IT publication you find talks about Big Data, and companies are no longer interested in software engineers and analysts but are instead looking for Data Scientists! In spite of the excessive use of the term, the truth of the matter is that data has never been more available, and the increase in computational power allows for more sophisticated tools to identify patterns in the data and in the networks that govern these systems (complex networks). Crime is no different: the open data phenomenon has spread to thousands of cities in the world, which are making data about crime activity available for any citizen to look at. Furthermore, new criminology studies argue that criminals typically commit crimes in areas with which they are familiar, usually close to home. Using this information we propose a new model based on networks to build links between crimes in close physical proximity. We show that the structure of the criminal activity can be partially represented by this spatial network of sites. In this paper we describe this process and the analysis of the networks we have constructed to find patterns in the underlying structure of criminal activity.

Sarah White, Tobin Yehle, Hugo Serrano, Marcos Oliveira, Ronaldo Menezes
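
A minimal sketch of a proximity-based network of crime sites follows. It assumes planar coordinates, Euclidean distance and a single fixed `radius`; the abstract does not say how the authors define "close physical proximity", so these choices, the function name and the sample coordinates are all hypothetical.

```python
import math
from itertools import combinations

def proximity_network(sites, radius):
    """Link every pair of sites whose Euclidean distance is at most radius."""
    edges = set()
    for (i, p), (j, q) in combinations(enumerate(sites), 2):
        if math.dist(p, q) <= radius:
            edges.add((i, j))
    return edges

# Hypothetical crime locations in arbitrary planar units.
sites = [(0.0, 0.0), (0.3, 0.4), (5.0, 5.0), (5.0, 5.5)]
edges = proximity_network(sites, radius=1.0)

# Degree of each site in the resulting spatial network.
degree = {i: sum(i in e for e in edges) for i in range(len(sites))}
```

The pairwise loop is quadratic in the number of sites; for city-scale data a spatial index would be the natural replacement, but it is left out to keep the sketch short.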

### Emergence of Extreme Opinions in Social Networks

The emergence and spreading of “extreme opinions” are studied in networks with agents sharing mild opinions. The shift toward extreme opinions is driven by social group meetings. The extremization process is understood in terms of the social psychology phenomenon of group polarization and illustrated in the case of terrorism. In particular, the focus is on the dynamics of the emergence of “passive supporters” from which terrorists can then be recruited. Since becoming a passive supporter is considered as adopting an extreme opinion, group polarization is shown to play an important role in increasing the transition probabilities from a mild opinion (e.g., anti-western feeling) to its extreme form (e.g., passive supporter or terrorist). Accordingly, a simple agent-based model is defined to implement interactions among agents on networks. Three opinions are considered: pro-western, anti-western and extreme anti-western. The latter may lead people to become passive supporters and, potentially, terrorists. Results of simulations show that a substantial fraction of anti-western agents adopt the extreme opinion, exhibiting an emergent phenomenon that may shed new light on real social phenomena of political violence.

Marco Alberto Javarone, Serge Galam
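
To make the idea of a three-opinion agent-based model concrete, here is a deliberately simplified sketch. The update rule is an assumption made for illustration, not the rule used by the authors: groups are drawn at random from the whole population rather than from a network, and in a meeting where anti-western and extreme members form a strict majority, each mild anti-western member turns extreme with probability `p_shift`. All names and parameter values are hypothetical.

```python
import random

PRO, ANTI, EXTREME = "pro", "anti", "extreme"

def meeting(opinions, group, p_shift, rng):
    """One group meeting: if anti/extreme members hold a strict majority,
    each mild anti member turns extreme with probability p_shift."""
    hostile = sum(opinions[a] in (ANTI, EXTREME) for a in group)
    if 2 * hostile > len(group):
        for a in group:
            if opinions[a] == ANTI and rng.random() < p_shift:
                opinions[a] = EXTREME

def simulate(n_agents=200, anti_share=0.4, group_size=5,
             p_shift=0.2, steps=500, seed=1):
    """Run repeated random group meetings and return the final opinions."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    n_anti = int(n_agents * anti_share)
    opinions = [ANTI] * n_anti + [PRO] * (n_agents - n_anti)
    for _ in range(steps):
        group = rng.sample(range(n_agents), group_size)
        meeting(opinions, group, p_shift, rng)
    return opinions

final = simulate()
counts = {o: final.count(o) for o in (PRO, ANTI, EXTREME)}
```

In this toy rule pro-western agents never change, so `counts[PRO]` stays at its initial value and only the split between mild and extreme anti-western agents evolves.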

### Using Societal Impact Assessment (SIA) to Improve Technological Development in the Field of Crime Prevention

Geographical information systems (GIS), intelligence-led policing, and automation of border controls are approaches to crime prevention heavily reliant on technology as a fix for faster data gathering and processing. This paper proposes a four-part societal impact assessment (SIA) methodology as a means of accounting for the impacts of crime prevention technologies from the standpoints of desirability, acceptability, ethics, and data management. The paper provides empirical material in two short cases on crime-mapping and automated border control.

Gemma Galdon Clavell, Philippe M. Frowd

### What’s in a Dyad? Interaction and Exchange in Social Media - Introduction

Scientists are now on the cusp of gaining a computational understanding of social interaction by means of online conversational data in the form of blog posts, email exchanges, comment threads, or interest-based discussions. Online interactions can be conceptualized as a social exchange, and also as a process from which meaning emerges through dialogue between the two partners. This workshop creates an interdisciplinary venue for plentiful dialogue and exchange that aims to shed light on social structure through a computational focus on the mechanics of the dyad.

Rossano Schifanella, Bogdan State, Yelena Mejova

### Triad-Based Role Discovery for Large Social Systems

The social role of a participant in a social system conceptualizes the circumstances under which she chooses to interact with others, making the discovery and analysis of such roles important for theoretical and practical purposes. In this paper, we propose a methodology to detect such roles by utilizing the conditional triad censuses of ego-networks. These censuses are a promising tool for social role extraction because they capture the degree to which basic social forces push upon a user to interact with others in a system. Clusters of triad censuses, inferred from network samples that preserve local structural properties, define the social roles. The approach is demonstrated on two large online interaction networks.

Derek Doran

### A Tool-Based Methodology to Analyze Social Network Interactions in Cultural Fields: The Use Case “MuseumWeek”

The goal of this paper is to present a tool-based methodology which has been developed to analyze messages sent on the Twitter social network. This methodology implements quantitative and qualitative analyses, which were benchmarked with the “MuseumWeek” event.

Antoine Courtin, Brigitte Juanals, Jean-Luc Minel, Mathide de Saint Léger

### Detecting Presence of Personal Events in Twitter Streams

Social media has become a prime place where many users announce their personal events, such as getting married, graduating, or having a baby, to name a few. It is common for users to post about such events and receive attention from their friends. Such events are often sought after by social platforms to enrich users’ timelines, to create life-log videos, to personalize ads, etc. One important step towards accurately identifying an event is learning the signals that indicate the presence of such events. In this paper we generate an event/non-event classification model using a mixture of content and interaction features. We experiment with two categories of interaction features, activity and attention, and reach a precision of 56 % and 83 % respectively, demonstrating the higher importance of attention features in personal event detection.

Smitashree Choudhury, Harith Alani

### Digital Addiction Ontology for Social Networking Systems

Digital Addiction (hereafter referred to as DA) is an emerging, perhaps controversial, issue that is expected to profoundly impact modern societies. Other sources of addiction, such as drugs, gambling and alcohol, are subject to clear standards, regulations and policies on how to manufacture, market and sell them. In stark contrast, DA has received little recognition or guidance in the Human-Computer Interaction (HCI) and social media research communities. These communities are needed to support the software industry in developing products that are more aware of DA. This research focuses on conceptualising DA to advance the understanding of how the design of social networking systems might influence human behaviour in a way that facilitates addiction. This paper presents an initial ontology and logical models for DA, and discusses potential HCI-related implications.

Amen Alrobai, Huseyin Dogan

### EGG 2014: Exploration on Games and Gamers - Introduction

With the remarkable advances from isolated console games to massively multi-player online role-playing games, the online gaming world has become an invaluable asset for research on social dynamics [7]. Online game players interact with each other in various ways, as they do in the real world. More importantly, interactions can be easily quantified and logged in detail. The huge volume of behavioral data collected from online games helps researchers study human nature at an unprecedented scale. For instance, Szell et al. observe six different types of in-game interaction (e.g., friendship, communication, trade, enmity, aggression, and punishment) and analyze the inter-dependence of social networks based on each type [12]. Their rich modeling of human society demonstrates the competitive advantage of user behavior data collected in online games.

Haewoon Kwak, Jeremy Blackburn, Huy Kang Kim

### Initial Exploration of the Use of Specific Tangible Widgets for Tablet Games

In this paper we investigate the use of tangible widgets vs. finger touch for tablet games, which to our knowledge has not been researched so far. A user test was conducted in which participants reported which of the two interaction methods they preferred for playing two tablet games: a fast-paced and a slow-paced game. We conclude that some of the participants found tangible widgets to be an interesting and potentially entertaining interaction method, even though our implementation had technical shortcomings compared to finger touch. Further study is needed to investigate how to fix these shortcomings, and how to improve the game experience of tablet games using tangible widgets.

Mads Bock, Martin Fisker, Kasper Fischer Topp, Martin Kraus

### Generosity as Social Contagion in Virtual Community

Online social network platforms have become a good arena for observing generous human behavior. Among the various online social platforms, online games mimic the real world closely and embed various social interactions between players. In online games, players show generosity to each other, even to strangers, by donating their cyber assets. In this research, we focus on random acts of kindness that resemble generous behavior in the real world, especially donation: giving items or money to lower-level strangers. Using large-scale real data from a major online game company, we find that benefiting from a generous act increases a player’s own generosity. On the other hand, we also find that social influence does not work effectively when these generous acts are not recognizable or visible to a player’s friends.

Jiyoung Woo, Byung Il Kwak, Jiyoun Lim, Huy Kang Kim

### Developing Game-Structure Sensitive Matchmaking System for Massive-Multiplayer Online Games

Providing a fair matchmaking system is essential when developing any online video game. In this article, we show that the existing matchmaking system in League of Legends, one of the most popular online video games, rests on assumptions that do not hold in the presence of empirical data. This decreases the effectiveness of the ranking system and negatively affects user experience. Therefore, we propose a new ranking system that genuinely answers the needs arising from League of Legends gameplay. As the League of Legends gameplay model is now highly popular among online video games, the proposed system can be easily generalized and adopted by other online video games that are currently popular among gamers.

Mateusz Myślak, Dominik Deja

### Linguistic Analysis of Toxic Behavior in an Online Video Game

In this paper we explore the linguistic components of toxic behavior by using crowdsourced data from over 590 thousand cases of accused toxic players in a popular match-based competition game, League of Legends. We perform a series of linguistic analyses to gain a deeper understanding of the role communication plays in the expression of toxic behavior. We characterize linguistic behavior of toxic players and compare it with that of typical players in an online competition game. We also find empirical support describing how a player transitions from typical to toxic behavior. Our findings can be helpful to automatically detect and warn players who may become toxic and thus insulate potential victims from toxic playing in advance.

Haewoon Kwak, Jeremy Blackburn

### Informal In-Game Help Practices in Massive Multiplayer Online Games

In this paper we explore the helping behavior of support agents and regular players in the browser-based MMORTS/RPG Castlot. Using chat logs from 12 servers, we analyzed differences between support agents and regular players. We have found that most in-game verbal help is provided by players and not by support agents. We have also found that support agents’ helping behavior drops dramatically as a server ages, while regular players preserve the practice of helping, which is mostly transferred from public to guild chat channels.

Paul Okopny, Ilya Musabirov, Daniel Alexandrov

### Social Network Analysis of High-Level Players in Multiplayer Online Battle Arena Game

Recently, multiplayer online battle arena (MOBA) games have become one of the most popular video game genres. They are also known as Defense of the Ancients (DotA)-like games. Because MOBA games are online match-based games, it is interesting to analyze their players’ social structure. In League of Legends (LOL), the most popular MOBA game, players form a team and fight against an enemy together. In the game, they build communities as in conventional social network services (SNSs). In this paper, we analyze the social network of LOL, constructed from team/player data extracted with an official application programming interface (API). In particular, the ranks of players are considered in the analysis. The experimental results show important features of the social structure of LOL that would be useful for applications in player modeling and matchmaking.

Hyunsoo Park, Kyung-Joong Kim

### The 2nd HistoInformatics Workshop - Introduction

The 2nd HistoInformatics Workshop (http://www.dl.kuis.kyoto-u.ac.jp/histoinformatics2014/), the 2nd International Workshop on Computational History, was held in conjunction with the 6th International Conference on Social Informatics (SocInfo 2014) in Barcelona, Spain, on 10th November 2014. The objective of the workshop is to provide for two research communities, Computer Science and History Sciences, a place to meet and exchange ideas and to facilitate discussion and collaboration. This report briefly summarizes the workshop.

Adam Jatowt, Gaël Dias, Marten Düring, Antal van den Bosch

### Learning to Identify Historical Figures for Timeline Creation from Wikipedia Articles

This paper addresses a central sub-task of timeline creation from historical Wikipedia articles: learning from text which of the person names in a textual article should appear in a timeline on the same topic. We first process hundreds of timelines written by human experts and related Wikipedia articles to construct a corpus that can be used to evaluate systems that create history timelines from text documents. We then use a set of features to train a classifier that predicts the most important person names, resulting in a clear improvement over a competitive baseline.

Sandro Bauer, Stephen Clark, Thore Graepel

### Mapping the Early Modern News Flow: An Enquiry by Robust Text Reuse Detection

Early modern printed gazettes relied on a system of news exchange and text reuse largely based on handwritten sources. The reconstruction of this information exchange system is possible by detecting reused texts. We present a method to identify text borrowings within noisy OCRed texts from printed gazettes based on string kernels and local text alignment. We apply our methods to a corpus of Italian gazettes for the year 1648. Besides unveiling substantial overlaps in news sources, we are able to assess the editorial policy of different gazettes and account for a multi-faceted system of text reuse.

Giovanni Colavizza, Mario Infelise, Frédéric Kaplan
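
The local alignment half of such a pipeline can be sketched with the classic Smith-Waterman algorithm on characters. This is a standard technique substituted here for illustration; the abstract does not say which alignment variant or scoring scheme the authors use, and the string kernel step is not shown. The scores (`match`, `mismatch`, `gap`) and the sample strings are hypothetical.

```python
def local_alignment(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best score, aligned span of a, aligned span of b) via Smith-Waterman."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the best cell until the score reaches zero.
    i, j = best_pos
    end_a, end_b = i, j
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, a[i:end_a], b[j:end_b]

# Two hypothetical snippets sharing the passage "pace" inside unrelated filler.
result = local_alignment("QQQpaceWWW", "KKKpaceJJJ")
```

Because mismatches and gaps only lower the score instead of breaking the alignment, the same routine tolerates a limited amount of OCR noise inside a reused passage, which is what makes local alignment attractive for this kind of material.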

### Linking Historical Ship Records to a Newspaper Archive

Linking historical datasets and making them available on the Web has increasingly become a subject of research in the field of digital humanities. In this paper, we focus on discovering links between ships from a dataset of Dutch maritime events and a historical archive of newspaper articles. We apply a heuristic-based method for finding and filtering links between ship instances; subsequently, we use machine learning for article classification to be used for enhanced filtering in combination with domain features. We evaluate the resulting links, using manually annotated samples as gold standard. The resulting links are made available as Linked Open Data, thus enriching the original data.

Andrea Bravo Balado, Victor de Boer, Guus Schreiber

### Digital Chronofiles of Life Experience

Technology has brought us to the point where we are able to digitally sample life experience in rich multimedia detail, often referred to as lifelogging. In this paper we explore the potential of lifelogging for the digitisation and archiving of life experience into a longitudinal media archive for an individual. We motivate the historical archive potential for rich digital memories, enabling individuals’ digital footprints to contribute to societal memories, and propose a data framework to gather and organise the lifetime of the subject.

Cathal Gurrin, Håvard Johansen, Thomas Sødring, Dag Johansen

### Mapping Memory Landscapes in nodegoat

nodegoat (http://nodegoat.net/) is a web-based data management, analysis and visualisation environment. nodegoat allows scholars to build datasets based on their own data model and offers relational modes of analysis with spatial and diachronic contextualisations. By combining these elements within one environment, scholars are able to instantly process, analyse and visualise complex datasets relationally, diachronically and spatially; trailblazing. nodegoat follows an object-oriented approach throughout its core functionalities. Borrowing from actor-network theory this means that people, events, artefacts, and sources are treated as equal: objects, and hierarchy depends solely on the composition of the network: relations. This object-oriented approach advocates the self-identification of individual objects and maps the correlation of objects within the collective.

Pim van Bree, Geert Kessels

### Mining Ministers (1572–1815). Using Semi-structured Data for Historical Research

There is a long tradition of categorizing and storing historical data in databases. However, these databases cannot always be used readily for computational approaches. In this paper, we use a twentieth century dataset on Dutch ministers (1572–1815) for modern quantitative analyses. We describe our methodology, provide results on the mobility of ministers and make further suggestions for the questions that can be answered now that could not before.

Serge ter Braake, Antske Fokkens, Fred van Lieburg

### Laboratories of Community: How Digital Humanities Can Further New European Integration History

It has been said that media is an important but mostly overlooked player in European integration history. Now, the mass digitisation of newspapers and the introduction of new digital techniques promise great potential to remedy this inattention. With the conjecture that people are drivers and carriers of change, we propose a people-centric approach to mine news articles in a way that can be most useful to further historical research. In this paper, we describe a methodology for building social networks from unstructured news stories, with the European integration scenario serving as a case study.

Mariona Coll Ardanuy, Maarten van den Bos, Caroline Sporleder

### The EHRI Project - Virtual Collections Revisited

This paper introduces details of EHRI’s approach to user-centric data integration across heterogeneous archival institutions using virtual collections. Virtual collections provide the means to re-unite archival material that has, through complex historical circumstances, been deposited in many physical locations. They also allow the creation of subject-specific groupings of material more closely comparable to archival research guides, and provide users with the ability to organise their own research in personalised ways.

Mike Bryant, Linda Reijnhoudt, Reto Speck, Thibault Clerice, Tobias Blanke

### Developing Onomastic Gazetteers and Prosopographies for the Ancient World Through Named Entity Recognition and Graph Visualization: Some Examples from Trismegistos People

Developing prosopographies or onomastic lists in a non-digital environment used to be a painstaking and time-consuming exercise, involving manual labour by teams of researchers, often taking decades. For some scholarly disciplines from the ancient world this is still true, especially those studying non-alphabetical writing systems that lack a uniform transcription system, e.g. Demotic. But for many others, such as Greek and Latin, digital full text corpora in Unicode are now available, often even freely accessible. In this paper we illustrate, on the basis of Trismegistos, how data collection through Named Entity Recognition and visualization through Social Network Analysis have huge potential to speed up the creation of onomastic lists and the development of prosopographies.

Yanne Broux, Mark Depauw

### Can Network Analysis Reveal Importance? Degree Centrality and Leaders in the EU Integration Process

This paper describes ongoing work on the potential of simple centrality algorithms for the robust and low-cost exploration of non-curated text corpora. More specifically, this paper studies (1) a network of historical personalities created from co-occurrences in historical photographs and (2) a network created from co-occurrences of names in Wikipedia pages with the goal to accurately identify outstanding personalities in the history of European integration even within flawed datasets. In both cases Degree centrality emerges as a viable method to detect leading personalities.

Marten Düring
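As a rough illustration of the degree-centrality idea summarised in the abstract above, the sketch below builds a co-occurrence network from groups of names and ranks the names by normalised degree. The function name, the input format and the placeholder data are assumptions made for this example; this is not the author's pipeline or dataset.

```python
from collections import defaultdict
from itertools import combinations


def degree_centrality(groups):
    """Return normalised degree centrality for a co-occurrence network.

    Each element of `groups` is an iterable of names that appear together
    (for example in one photograph or on one page). Two names are linked
    if they co-occur at least once.
    """
    neighbours = defaultdict(set)
    for group in groups:
        members = set(group)
        for name in members:
            neighbours[name]  # register the name even if it has no links
        for a, b in combinations(sorted(members), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    n = len(neighbours)
    if n <= 1:
        return {name: 0.0 for name in neighbours}
    # Degree divided by the maximum possible degree, n - 1.
    return {name: len(adj) / (n - 1) for name, adj in neighbours.items()}


# Hypothetical co-occurrence data; the names are placeholders.
photos = [
    ["A", "B", "C"],
    ["A", "D"],
    ["A", "B"],
    ["E"],
]
scores = degree_centrality(photos)
# Highest centrality first; ties are broken alphabetically.
ranking = sorted(scores, key=lambda name: (-scores[name], name))
```

Because a link is recorded once per pair regardless of how often the pair co-occurs, this version measures how many distinct people someone appears with; a weighted variant would be a different design choice.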

### SEDNAM - Socio-Economic Dynamics: Networks and Agent-Based Models - Introduction

Recent years have witnessed increasing interest among physicists, mathematicians and computer scientists in socio-economic systems. In our view, the many reasons behind this can be summarized by observing that traditional approaches to disciplines such as sociology and economics have dramatically shown their limitations.

Serge Galam, Marco Alberto Javarone, Tiziano Squartini

### Reconstructing Topological Properties of Complex Networks Using the Fitness Model

A major problem in the study of complex socioeconomic systems is posed by privacy issues, which can severely limit the amount of accessible information and force researchers to build models on the basis of incomplete knowledge. In this paper we investigate a novel method to reconstruct global topological properties of a complex network starting from limited information. This method uses the knowledge of an intrinsic property of the nodes (indicated as *fitness*), and the number of connections of only a limited subset of nodes, in order to generate an ensemble of *exponential random graphs* that are representative of the real system and that can be used to estimate its topological properties. Here we focus in particular on reconstructing the most basic properties that are commonly used to describe a network: density of links, assortativity, clustering. We test the method on both benchmark synthetic networks and real economic and financial systems, finding a remarkable robustness with respect to the number of nodes used for calibration. The method thus represents a valuable tool for gaining insights on privacy-protected systems.

Giulio Cimini, Tiziano Squartini, Nicolò Musmeci, Michelangelo Puliga, Andrea Gabrielli, Diego Garlaschelli, Stefano Battiston, Guido Caldarelli
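The calibration step described in the abstract above can be sketched as follows. The abstract does not state the link-probability formula, so the form `z * x_i * x_j / (1 + z * x_i * x_j)` used here is an assumption of this example, as are all function names and the toy data. The single parameter `z` is fitted by bisection so that the expected degrees of the observed subset sum to their known degrees, and the fitted model then gives an estimate of the link density.

```python
def link_probability(z, x_i, x_j):
    """Assumed connection probability for nodes with fitness x_i and x_j."""
    return z * x_i * x_j / (1.0 + z * x_i * x_j)


def expected_degree_sum(z, fitness, subset):
    """Sum of the expected degrees of the nodes whose indices are in `subset`."""
    total = 0.0
    for i in subset:
        for j, x_j in enumerate(fitness):
            if j != i:
                total += link_probability(z, fitness[i], x_j)
    return total


def calibrate_z(fitness, known_degrees, tol=1e-12):
    """Fit z by bisection from the degrees of a subset of nodes.

    `fitness` is a list of positive values, one per node; `known_degrees`
    maps node index to observed degree for the nodes whose degree is known.
    """
    subset = list(known_degrees)
    target = float(sum(known_degrees.values()))
    upper_bound = len(subset) * (len(fitness) - 1)
    if not 0.0 < target < upper_bound:
        raise ValueError("observed degrees are outside the attainable range")
    lo, hi = 0.0, 1.0
    # The expected degree sum grows with z, so first bracket the root.
    while expected_degree_sum(hi, fitness, subset) < target:
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if expected_degree_sum(mid, fitness, subset) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def expected_density(z, fitness):
    """Expected fraction of node pairs that are linked under the fitted model."""
    n = len(fitness)
    total = sum(
        link_probability(z, fitness[i], fitness[j])
        for i in range(n)
        for j in range(i + 1, n)
    )
    return total / (n * (n - 1) / 2)


# Hypothetical toy input: four nodes, degrees known for nodes 0 and 3 only.
toy_fitness = [1.0, 2.0, 3.0, 4.0]
toy_known = {0: 1, 3: 3}
z_hat = calibrate_z(toy_fitness, toy_known)
density = expected_density(z_hat, toy_fitness)
```

Sampling each pair independently with these probabilities would generate the ensemble of graphs from which other properties, such as assortativity or clustering, could be estimated.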

### The Structure of Global Inter-firm Networks

We investigate the structure of global inter-firm relationships using a unique dataset containing information on customers, suppliers, licensors, licensees and strategic alliances for each of 412,814 major incorporated non-financial firms in the world. We focus on three different networks: the customer-supplier network, the licensee-licensor network, and the strategic alliance network. The in- and out-degree distributions of these networks follow a Pareto distribution with an exponent of 1.5. The shortest path length on the networks for any pair of firms is around six links. The networks have a scale-free property. We also find that stock price returns tend to be more highly correlated the closer two listed firms are to each other in the networks. This suggests that a non-negligible portion of price fluctuations stems from the propagation of a particular firm’s shocks through inter-firm relationships.

Takayuki Mizuno, Takaaki Ohnishi, Tsutomu Watanabe

### Generalized Friendship Paradox: An Analytical Approach

The friendship paradox refers to the sociological observation that, while people’s assessment of their own popularity is typically self-aggrandizing, in reality they are less popular than their friends. The generalized friendship paradox is the average alter superiority observed empirically in social settings, scientific collaboration networks, as well as online social media. We posit a quality-based network growth model in which the chance for a node to receive new links depends both on its degree and a quality parameter. Nodes are assigned qualities when they first join the network, and these do not change over time. We analyse the model theoretically, finding expressions for the joint degree-quality distribution and nearest-neighbor distribution. We then demonstrate that this model exhibits both the friendship paradox and the generalized friendship paradox at the network level, regardless of the distribution of qualities. We also show that, in the proposed model, the degree and quality of each node are positively correlated regardless of how node qualities are distributed.

Babak Fotouhi, Naghmeh Momeni, Michael G. Rabbat
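The network-level friendship paradox mentioned in the abstract above can be checked numerically on any edge list. The sketch below compares the mean degree with the mean degree of a node reached by following a random edge end. It illustrates the classic paradox only, not the authors' quality-based growth model, and the function name and toy network are assumptions of this example.

```python
def friendship_paradox(edges):
    """Compare the mean degree with the mean degree of a random friend.

    `edges` is a list of undirected (u, v) pairs without duplicates.
    Returns (mean_degree, mean_friend_degree). The second value averages
    the degree of the node found at a random edge end, which equals the
    mean squared degree divided by the mean degree.
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    total_degree = sum(degree.values())
    mean_degree = total_degree / len(degree)
    # A node of degree k is somebody's friend k times, so it is weighted by k.
    mean_friend_degree = sum(k * k for k in degree.values()) / total_degree
    return mean_degree, mean_friend_degree


# Hypothetical toy network: a star with one extra edge between two leaves.
toy_edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d"), ("a", "b")]
mean_k, mean_friend_k = friendship_paradox(toy_edges)
```

The two values coincide only when every node has the same degree; any variation in degree makes the friend average strictly larger.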

### Collective Intelligence-Based Sequential Pattern Mining Approach for Marketing Data

In marketing, it is important to understand consumer needs correctly and to clarify the target of goods and services. In recent years, as information processing technology has developed, video image analysis has also become an important tool for customer behavior analysis. Video data may make it possible to discover the patterns by which consumers choose the goods they purchase. Video is sequential temporal data, so time-series data mining techniques are necessary. In addition, consumer behavior is generally ambiguous. To address these issues, we are developing a collective intelligence-based sequential pattern mining approach with high robustness and adaptability, and we have succeeded in visualizing the relations among goods that are touched in succession by consumers.

Kazuaki Tsuboi, Kosuke Shinoda, Hirohiko Suwa, Satoshi Kurihara

### Workshop on Social Influence – SI 2014 - Introduction

The enormous popularity of the Internet and the evolution of social media create new areas for observing and modelling processes related to social sciences, such as social influence or diffusion of innovations. It is now possible to evaluate different strategies for targeting people and to observe the outcome of the process, since social graphs and social activity logs are much easier to obtain than they were two decades ago. In this area several interesting topics can be distinguished, such as modelling the spread of influence, implementing and evaluating epidemiological models, tracking the dynamics of diffusion processes or designing new algorithms for predicting or optimising these processes. This workshop aims to connect research related to both social and technical systems, and one of its key topics is social influence in socio-technical systems.

Radosław Michalski, Paulo Shakarian, Ingo Scholtes, Jarosław Jankowski

### Naming Game Dynamics on Pairs of Connected Networks with Competing Opinions

We study the Naming Game (NG) dynamics when two disjoint networks with nodes in consensus on competing opinions are connected with new links. We consider two sets of networks: one contains several networks with real-life communities, the other contains networks generated with the Watts-Strogatz and Barabási-Albert models. For each set, we run NG on all the possible pairs of networks and observe whether a consensus is reached, in order to determine network features that correlate highly with such an outcome. The main conclusion is that the quality of a network’s community structure informs its ability to resist influence from, or exert influence on, other networks. Moreover, the outcomes depend on whether Speaker-First or Listener-First NG is run and on whether a speaker or listener is biased towards high or low degree nodes. The results reveal strategies that may be used to enable and accelerate convergence to consensus in social networks.

Albert Trias Mansilla, Mingming Chen, Boleslaw K. Szymanski, Josep Lluís de la Rosa Esteva

### Threshold of Herd Effect for Online Events in China

The herd effect, one of the ways in which people are influenced by others, is common on the Internet. Prior empirical work has shown that once online purchases of a product pass a particular threshold, people are more likely to be influenced and the product may become popular. Does the same phenomenon exist for online events? Based on data collected from SinaWeibo, the largest microblog in China, we use the fluctuation scaling method to analyze the online influence process. We find that a particular threshold also exists for online events: once the number of follow-ups of an event surpasses a particular threshold of popularity, collective behavior is easily observed. We classify all events into three types: political events, social events and non-public events. The threshold varies across these types. The lowest threshold, found for social events, can also be explained by some offline surveys.

Tieying Liu, Kai Chen, Yang Zhong

### Identifying Bridges for Information Spread Control in Social Networks

In this paper, a scalable method for cluster analysis based on random walks is presented. The main aim of the algorithm is to detect dense subgraphs. The method has an additional feature: it identifies groups of vertices that are responsible for spreading information among the clusters found. The algorithm is sensitive to uncertainty in vertex assignment and distinguishes groups of nodes that form sparse clusters. These groups are mostly located in places crucial for information spreading, so the algorithm can be used to control signal propagation between separated dense subgraphs.

Michał Wojtasiewicz, Krzysztof Ciesielski

### Think Before RT: An Experimental Study of Abusing Twitter Trends

Twitter is one of the most influential Online Social Networks (OSNs), adopted not only by hundreds of millions of users but also by public figures, organizations, news media, and official authorities. One of the factors contributing to this success is the inherent property of the platform for spreading news – encapsulated in short messages that are tweeted from one user to another – across the globe. Today, it is sufficient to just inspect the trending topics in Twitter to figure out what is happening around the world. Unfortunately, the capabilities of the platform can also be abused and exploited for distributing illicit content or boosting false information, and the consequences of such actions can be *really* severe: one false tweet was enough to make the stock market crash for a short period of time in 2013.

In this paper, we analyze a large collection of tweets and explore the dynamics of popular trends and other Twitter features with regard to deliberate misuse. We identify a specific class of trend-exploiting campaigns that exhibits stealthy behavior and hides spam URLs within Google search-result links. We build a spam classifier for both users and tweets, and demonstrate its simplicity and efficiency. Finally, we visualize these spam campaigns and reveal their inner structure.

Despoina Antonakaki, Iasonas Polakis, Elias Athanasopoulos, Sotiris Ioannidis, Paraskevi Fragopoulou

### SoHuman 2014 – 3rd International Workshop on Social Media in Crowdsourcing and Human Computation - Introduction

Theme: Socially-Aware Crowdsourcing – The Value of the Human Touch

This workshop aims at bringing together researchers and practitioners from different disciplines to explore the challenges and opportunities of novel approaches to collective intelligence, crowdsourcing and human computation that address social aspects as a core element of their design principles, implementations or scientific investigation.

Jasminko Novak, Alessandro Bozzon, Piero Fraternali, Petros Daras, Otto Chrons, Bonnie Nardi, Alejandro Jaimes

### CrowdMonitor: Monitoring Physical and Digital Activities of Citizens During Emergencies

In recent times, emergencies such as the 2013 flood in Central Europe have clearly shown that, besides the professional emergency services and authorities, citizens are taking an increasingly active role in crisis response work. They organize themselves and coordinate private relief activities. Those activities can be found in (physical) groups of affected local citizens, but also within (digital) social media groups. To enable professional emergency services to detect and use this civil potential, approaches are needed that support instructing citizens and coordinating their actions to avoid needless duplication or conflicts. In this paper we present a concept, based on a mobile crowd sensing approach, which was designed and implemented as the system prototype CrowdMonitor and facilitates the monitoring of citizens’ physical and digital activities and the assignment of specific tasks to them.

Thomas Ludwig, Tim Siebigteroth, Volkmar Pipek

### Crowd Work CV: Recognition for Micro Work

With an increasing micro-labor supply and a larger available workforce, new microtask platforms have emerged providing an extensive list of marketplaces where microtasks are offered by requesters and completed by crowd workers. The current microtask crowdsourcing infrastructure does not offer the possibility to be recognised for already accomplished and offered work in different microtask platforms. This lack of information leads to uninformed decisions in selection processes, which have been acknowledged as a promising way to improve the quality of crowd work. To overcome this limitation, we propose Crowd Work CV, an RDF-based data model that, similarly to traditional Curriculum Vitae, captures crowd workers’ interests, qualifications and work history, as well as requesters’ information. Crowd Work CV enables the representation of crowdsourcing agents’ identities and promotes their work experience across the different microtask marketplaces.

Cristina Sarasua, Matthias Thimm

### Means and Roles of Crowdsourcing Vis-À-Vis CrowdFunding for the Creation of Stakeholders Collective Benefits

This work aims at assessing characteristics and roles of Crowdsourced activities vis-à-vis online CrowdFunding platforms, assessing potential collective benefits for stakeholders that arise from individual social media activities and investment decisions of users-investors. CrowdFunding platforms in fact leverage crowds and undefined pools of potential investors to screen, select and spread each CrowdFunding initiative in a detailed and thorough way, hence allowing users to perform several tasks that are traditionally carried out through IT models and static criteria.

We identify 5 key roles played by Crowdsourcing Systems (CS) and we develop a potential model aimed at screening positive outcomes that benefit the collectivity (stakeholders). The model evaluates Crowdsourced activities as indicators for the creation of sustainable value for the enterprise and therefore for the collectivity of stakeholders. In order to test the model, we are currently deploying an Equity CrowdFunding platform embedding strong Crowdsourced tasks.

In conclusion, we classify opportunities, limits and potential for a successful deployment of Crowdsourced tasks in CrowdFunding.

Angelo Miglietta, Emanuele Parisi

### On Utilizing Player Models to Predict Behavior in Crowdsourcing Tasks

Player Modeling is a research field that studies player characteristics by analyzing in-game behavior. We aim to develop independent models, which are transferable and useful beyond a game’s context. We shall demonstrate the feasibility of this approach by applying player models to crowdsourcing to predict workers’ task completion effectiveness. Specifically, we model a user’s Need for Cognition based on in-game behavior, and based on that try to assign appropriate tasks to workers.

Carlos Pereira Santos, Vassilis-Javed Khan, Panos Markopoulos

### Comparing Human and Algorithm Performance on Estimating Word-Based Semantic Similarity

Understanding natural language is an inherently complex task for computer algorithms. Crowdsourcing natural language tasks such as semantic similarity is therefore a promising approach. In this paper, we investigate the performance of crowdworkers and compare it to that of offline contributors as well as state-of-the-art algorithms. We illustrate that algorithms do outperform single human contributors but still cannot compete with results gathered from groups of contributors. Furthermore, we demonstrate that this effect is persistent across different contributor populations. Finally, we give guidelines for easing the challenge of collecting word-based semantic similarity data from human contributors.

Nils Batram, Markus Krause, Paul-Olivier Dehaye

### Mobile Picture Guess: A Crowdsourced Serious Game for Simulating Human Perception

In this paper we present a novel idea that combines a mobile game with a Crowdsourcing campaign. The game is designed for studies into the visual saliency of image segments, where the game objective is for players to guess what is depicted in an image that is gradually uncovered. Game scores depend on the number of correct answers and the speed at which these are provided. With these game mechanics, we can determine the image segments that are most essential to players when asked to guess the image content, thereby assessing the most salient image regions. Through the combination of this game scenario and a Crowdsourcing campaign, we also present a way to tackle the rising demands for higher salaries in this line of work. By providing workers with an entertaining task, we aim to increase player motivation and hopefully make them want to play longer than required. In this paper we also present a sample study that evaluates the visual saliency of 200 animal images from Flickr. We conclude with preliminary results from the study and with our insights on how this approach can be applied to improve the current understanding of human visual perception.

Michael Riegler, Ragnhild Eg, Mathias Lux, Markus Schicho

### histoGraph as a Demonstrator for Domain Specific Challenges to Crowd-Sourcing

histoGraph provides an integrated pipeline for the extraction of co-occurrence information in historical photos to build an explorable social graph of relationships that can lead to new insights for historical research. The application leverages the CUbRIK platform for human/machine computation and applies a hybrid approach to face detection and recognition that combines the strengths of algorithmic analysis with expert and generic crowd sourcing. Following a general overview of our approach, we explore the surplus value of human touch for the identification of identities in historical image collections through a uniform crowd-sourcing approach. We find that only a combination of generic and expert crowds yields promising results. Even though the application was designed and developed for a specific target audience, we aim not only to demonstrate the current functionality but also to identify and discuss several core principles that can be transferred to other domains.

Lars Wieneke, Marten Düring, Vincenzo Croce, Jasminko Novak

### Backmatter
