Elsevier

Information Sciences

Volume 319, 20 October 2015, Pages 18-37
Information Sciences

Discovering missing me edges across social networks

https://doi.org/10.1016/j.ins.2015.05.014Get rights and content

Abstract

Distinct social networks are interconnected via membership overlap, which plays a key role when crossing information is investigated in the context of multiple-social-network analysis. Unfortunately, users do not always make their membership to two distinct social networks explicit, by specifying the so-called me edge (practically, corresponding to a link between the two accounts), thus missing a potentially very useful information. As a consequence, discovering missing me edges is an important problem to address in this context with potential powerful applications. In this paper, we propose a common-neighbor approach to detecting missing me edges, which returns good results in real-life settings. Indeed, an experimental campaign shows both that the state-of-the-art common-neighbor approaches cannot be effectively applied to our problem and, conversely, that our approach returns precise and complete results.

Introduction

In recent years, (on-line) social networks have been showing a rapid development growth becoming probably the main actor of the Web 2.0. The rapid and revolutionary diffusion of social networks among all segments of the population has attracted the interest of researchers from several fields of computer science, such as digital forensics [40], [24], user behavior [6], trust and reputation [26], steganography [25], [28], also for the applications that the analysis of involved data can enable [61], [22], [13], [38], [14], [12]. In this landscape, Social Network Analysis and Social Network Mining have assumed an important role because both the large volume of data and their graph-based organization have enforced the development of specific models and methods allowing the study of social-network data to discover knowledge from them. Clearly, the graph-based data schema gives a great information power to links among data, because it allows people profiles, resources, activities, and so on, to be directly (and indirectly) related. The crucial role of relationships in the expression of an individual’s personality and social identity, traditionally recognized by social sciences, is even strengthened in the field of virtual societies, in which relationship links are the main form of expression of participation of individuals to the community. To make more challenging the analysis of this reality, consider that the reference scenario does not consist of a single, isolated, independent social network, but is a constellation of social networks, each forming a community with specific connotations, but strongly interconnected with each other. It is a matter of fact that, despite the inherent underlying heterogeneity, the interaction among distinct social networks is the basis of a new emergent internetworking scenario enabling a lot of strategic applications, whose main strength will be just the integration of possibly different communities yet preserving their diversity and autonomy. Clearly, social mining and analysis approaches may strongly rely on this huge multi-network source of information, which reflects multiple aspects of people personal life, thus enabling a lot of powerful discovering activities.

From this perspective, links among different social networks assume a fundamental role. They connect the same user on two different social networks who thus assumes the role of passing point of information from one social network the other. For this reason, we call this user i-bridge.1 The link derives from the explicit user’s declaration (sometimes supported and encouraged by specific tools) consisting in the insertion of me edges [1]. Unfortunately, for disparate reasons, users do not always make their membership to two distinct social networks explicit, by specifying the so-called me edge (practically corresponding to a link between the two accounts), thus missing a potentially very useful information. As a consequence, in the overall underlying (social internetworking) graph a big number of missed me edges exists, whose discovery represents a very important issue. In other words, an interesting problem of missing link detection arises, which partially overlaps with a link prediction issue, because we may expect that a portion of missing me edges will be inserted in a next stage in the graph.

In this paper, we deal with the above problem by proposing an effective solution experimentally tested in a real-life Social Internetworking Scenario (SIS, for short) [15]. To the best of our knowledge, the problem of detecting me edges has not been investigated in the literature, but the approach we adopt in this work, which exploits a recursive notion of common-neighbor similarity, suggested us to prior verify whether common-neighbor approaches for link prediction [47] can be directly applied to our problem. The answer to this question was definitely negative, as intuitively explained in Section 2 and experimentally confirmed in Section 6, thus motivating our work. Our solution is based on a notion of node similarity, whose usage allows us to detect whether a suitable threshold is exceeded and then a missing me edge between two nodes is detected. The similarity between two nodes is obtained by combining two contributions: a string similarity between the associated usernames, and a contribution based on a suitable recursive notion of common-neighbor similarity. The neighborhood similarity allows these errors to be detected and avoided. As a consequence, it is important to clarify that the problem we are addressing does not deal with the case in which a user with membership overlap between two social networks chooses the corresponding account names very different from each other. It is worth noting that, under this case, often falls the situation in which a user voluntarily keeps the two accounts separated in their respective social networks and thus avoid also to have common friends. Therefore, also neighborhood similarity fails.

The plan of this paper is as follows: in the next section, we examine related literature. In Section 3, we present our recursive notion of similarity. On the basis of this notion, we design the method we use to detect missing me edges. This is described in Section 4. In Section 5, we determine the computational complexity of our approach. In Section 6, we illustrate the experiments we have carried out to verify the performances of our technique. Finally, in Section 7, we draw our conclusions and sketch possible future evolutions of our research.

Section snippets

Related work

In this section, we survey the literature related to the topics addressed in this paper. It is discussed subdivided into three categories, one for each subsection.

The notion of similarity

Our approach operates in a Social Internetworking System (SIS) resulting from the interconnection of a number of distinct social networks. We start with the basic notion of underlying graph, which is the layer we deal with.

Definition 3.1

A t-Social-Internetworking Graph is a directed graph G=N,E, where N is the set of nodes, E is the set of edges (i.e., ordered pairs of nodes) and N is partitioned into t subsets S1,,St. The graphs corresponding to S1,,St are called social graphs. Given a node aN we

Discovering links among social networks

In this section we focus on the main goal of this paper, that is the detection of me edges in a SIS. Consider given a Social Internetworking System (SIS) composed of t social networks, whose underlying Social Internetworking Graph is G=N,EfEm. Due to the hugeness of the search space and the high sparsity of me-edges [11], we risk to meet prohibitive obstacles of infeasibility if we do not pose the problem in a suitable way. Accordingly, we consider the following two problems:

  • P1

    (node-seed

Complexity issues

In this section, we analyze the computational complexity of our approach. We distinguish the operations that involve network accesses from those executed in memory, remarking that they differ in time of (at least) nine orders of magnitude. We denote by O(f(n)) I/Os, for a given function f(n), the asymptotical evaluation of the number of operations involving network accesses vs. the input variable n.

The first result concerns the computational complexity of the operator TQh(a,b,k0). Intuitively,

Experiments

In this section, we present our experimental campaign aimed at determining the performances of our approach. Because it operates on a SIS, we had to extract not only the connections among the accounts of different users in the same social network but also the connections among the accounts of the same user in different social networks. To handle these connections, two standards encoding human relationships are generally exploited. The former is XFN (XHTML Friends Network) [27]. It simply uses

Conclusion and future work

In this paper, we studied the problem of discovering missing me edges in a Social Internetworking Scenario. The most evident information we can use to detect missing me edges, besides usernames, concerns neighbors. By means of experimental campaign, we showed that state-of-the-art common-neighbor techniques cannot be applied to solve this problem. Thus, the need of studying the problem as new and finding a specific solution arises. We defined a suitable notion of “inter-social-network”

Acknowledgments

This work has been partially supported by the TENACE PRIN Project (n. 20103P34XC) funded by the Italian Ministry of Education, University and Research, by the Program “Programma Operativo Nazionale Ricerca e Competitività” 2007–2013, project BA2Kno (Business Analytics to Know) PON03PE_00001_1, in “Laboratorio in Rete di Service Innovation”, and by the Program “Programma Operativo Nazionale Ricerca e Competitività” 2007–2013, Distretto Tecnologico CyberSecurity funded by the Italian Ministry of

References (65)

  • L. Aiello et al.

    Link creation and information spreading over social and communication ties in an interest-based online social network

    EPJ Data Sci.

    (2012)
  • M. Al Hasan et al.

    A survey of link prediction in social networks

  • S. Bartunov et al.

    Joint link-attribute user identity resolution in online social networks

  • M. Berlingerio et al.

    Foundations of multidimensional network analysis

  • M. Berlingerio et al.

    Multidimensional networks: foundations of structural analysis

    World Wide Web

    (2013)
  • D. Brickley, L. Miller, The Friend of a Friend (FOAF) project, 2013....
  • M. Bröcheler et al.

    Probabilistic subgraph matching on huge social networks

  • F. Buccafurri et al.

    A model to support multi-social-network applications

  • F. Buccafurri et al.

    Measuring betweennes centrality in social internetworking scenarios

  • F. Buccafurri et al.

    Driving global team formation in social networks to obtain diversity

  • F. Buccafurri, G. Lax, A. Nocera, D. Ursino, Supporting Information spread in a social internetworking scenario, in:...
  • F. Buccafurri et al.

    Crawling social internetworking systems

  • F. Buccafurri et al.

    Discovering links among social networks

  • F. Buccafurri et al.

    Internetworking assortativity in Facebook

  • F. Buccafurri et al.

    A system for extracting structural information from social network accounts

    Softw.: Pract. Exper.

    (2014)
  • S.V. Buldyrev et al.

    Catastrophic cascade of failures in interdependent networks

    Nature

    (2010)
  • X. Cai et al.

    Reciprocal and heterogeneous link prediction in social networks

  • A. Castiglione et al.

    A forensic analysis of images on online social networks

  • A. Castiglione et al.

    Steganography and secure communication on online social networks and online photo sharing

  • T. Celik, E. Meyer, M. Mullenweg, XHTML friends network, in: Proc. ACM Hypertext, vol. 4,...
  • M. Conti et al.

    Virtual private social networks

  • I. Dagan, F. Pereira, L. Lee, Similarity-based estimation of word cooccurrence probabilities, in: Proc. of the Annual...
  • Cited by (52)

    • An approach to detect backbones of information diffusers among different communities of a social platform

      2022, Data and Knowledge Engineering
      Citation Excerpt :

      In fact, she allows the interaction between users of different social networks who could not communicate otherwise [44]. The user’s join to multiple social networks can be explicitly stated by her through the so-called “me edges”, or it can be inferred by applying approaches that detect two accounts on different social networks belonging to the same user [45,46]. Our approach can easily be extended to such a scenario.

    • Machine learning algorithms for social media analysis: A survey

      2021, Computer Science Review
      Citation Excerpt :

      One solution to this is, identifying the users who are maintaining multiple accounts to publish the same information on SM. In [161], authors developed a solution to the real-time scenario of having multiple accounts in different social media, and ‘me’ edges is a technique used to define a solution to avoid misuse of data with standard account credentials. It is challenging to maintain sensitive information not to reveal to all social media account.

    • A privacy-preserving approach to prevent feature disclosure in an IoT scenario

      2020, Future Generation Computer Systems
      Citation Excerpt :

      Indeed, (i) each identified group corresponds to a network of the system; (ii) each object can be modeled by means of a node; (iii) relationships between objects of the same group can be represented by means of arcs inside the corresponding networks (they are called inner arcs); (iv) relationships between objects of different groups are modeled as arcs linking nodes of different networks (they are called cross-arcs). The possibility to have a direct, natural and immediate multi-network representation of our scenario allows for an easy mapping with properties, operations and concepts of multi-network contexts [19–21]. Thanks to this, in our analyses, we can benefit from the wide variety of results found for multi-network systems in the past literature.

    • Detect potential relations by link prediction in multi-relational social networks

      2018, Decision Support Systems
      Citation Excerpt :

      Link prediction is an effective approach to detect such potential relations between the individuals in trade and social networks. Using link prediction, we can discover potential interpersonal contacts [19,20] by analyzing the customers' shopping behavior and social relationships. Link prediction can reveal the potential friendship of users in social networks [21,22] and can recommend the possible merchandise or services to the users [2,23].

    View all citing articles on Scopus

    A shorter abridged version of this paper appears in “Discovering Links among Social Networks”, by F. Buccafurri, G. Lax, A. Nocera, and D. Ursino, in the Proc. of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2012), Bristol, United Kingdom, 2012. Springer [17].

    View full text