Discovering missing me edges across social networks☆
Introduction
In recent years, (on-line) social networks have been showing a rapid development growth becoming probably the main actor of the Web 2.0. The rapid and revolutionary diffusion of social networks among all segments of the population has attracted the interest of researchers from several fields of computer science, such as digital forensics [40], [24], user behavior [6], trust and reputation [26], steganography [25], [28], also for the applications that the analysis of involved data can enable [61], [22], [13], [38], [14], [12]. In this landscape, Social Network Analysis and Social Network Mining have assumed an important role because both the large volume of data and their graph-based organization have enforced the development of specific models and methods allowing the study of social-network data to discover knowledge from them. Clearly, the graph-based data schema gives a great information power to links among data, because it allows people profiles, resources, activities, and so on, to be directly (and indirectly) related. The crucial role of relationships in the expression of an individual’s personality and social identity, traditionally recognized by social sciences, is even strengthened in the field of virtual societies, in which relationship links are the main form of expression of participation of individuals to the community. To make more challenging the analysis of this reality, consider that the reference scenario does not consist of a single, isolated, independent social network, but is a constellation of social networks, each forming a community with specific connotations, but strongly interconnected with each other. It is a matter of fact that, despite the inherent underlying heterogeneity, the interaction among distinct social networks is the basis of a new emergent internetworking scenario enabling a lot of strategic applications, whose main strength will be just the integration of possibly different communities yet preserving their diversity and autonomy. Clearly, social mining and analysis approaches may strongly rely on this huge multi-network source of information, which reflects multiple aspects of people personal life, thus enabling a lot of powerful discovering activities.
From this perspective, links among different social networks assume a fundamental role. They connect the same user on two different social networks who thus assumes the role of passing point of information from one social network the other. For this reason, we call this user i-bridge.1 The link derives from the explicit user’s declaration (sometimes supported and encouraged by specific tools) consisting in the insertion of me edges [1]. Unfortunately, for disparate reasons, users do not always make their membership to two distinct social networks explicit, by specifying the so-called me edge (practically corresponding to a link between the two accounts), thus missing a potentially very useful information. As a consequence, in the overall underlying (social internetworking) graph a big number of missed me edges exists, whose discovery represents a very important issue. In other words, an interesting problem of missing link detection arises, which partially overlaps with a link prediction issue, because we may expect that a portion of missing me edges will be inserted in a next stage in the graph.
In this paper, we deal with the above problem by proposing an effective solution experimentally tested in a real-life Social Internetworking Scenario (SIS, for short) [15]. To the best of our knowledge, the problem of detecting me edges has not been investigated in the literature, but the approach we adopt in this work, which exploits a recursive notion of common-neighbor similarity, suggested us to prior verify whether common-neighbor approaches for link prediction [47] can be directly applied to our problem. The answer to this question was definitely negative, as intuitively explained in Section 2 and experimentally confirmed in Section 6, thus motivating our work. Our solution is based on a notion of node similarity, whose usage allows us to detect whether a suitable threshold is exceeded and then a missing me edge between two nodes is detected. The similarity between two nodes is obtained by combining two contributions: a string similarity between the associated usernames, and a contribution based on a suitable recursive notion of common-neighbor similarity. The neighborhood similarity allows these errors to be detected and avoided. As a consequence, it is important to clarify that the problem we are addressing does not deal with the case in which a user with membership overlap between two social networks chooses the corresponding account names very different from each other. It is worth noting that, under this case, often falls the situation in which a user voluntarily keeps the two accounts separated in their respective social networks and thus avoid also to have common friends. Therefore, also neighborhood similarity fails.
The plan of this paper is as follows: in the next section, we examine related literature. In Section 3, we present our recursive notion of similarity. On the basis of this notion, we design the method we use to detect missing me edges. This is described in Section 4. In Section 5, we determine the computational complexity of our approach. In Section 6, we illustrate the experiments we have carried out to verify the performances of our technique. Finally, in Section 7, we draw our conclusions and sketch possible future evolutions of our research.
Section snippets
Related work
In this section, we survey the literature related to the topics addressed in this paper. It is discussed subdivided into three categories, one for each subsection.
The notion of similarity
Our approach operates in a Social Internetworking System (SIS) resulting from the interconnection of a number of distinct social networks. We start with the basic notion of underlying graph, which is the layer we deal with. Definition 3.1 A t-Social-Internetworking Graph is a directed graph , where N is the set of nodes, E is the set of edges (i.e., ordered pairs of nodes) and N is partitioned into t subsets . The graphs corresponding to are called social graphs. Given a node we
Discovering links among social networks
In this section we focus on the main goal of this paper, that is the detection of me edges in a SIS. Consider given a Social Internetworking System (SIS) composed of t social networks, whose underlying Social Internetworking Graph is . Due to the hugeness of the search space and the high sparsity of me-edges [11], we risk to meet prohibitive obstacles of infeasibility if we do not pose the problem in a suitable way. Accordingly, we consider the following two problems:
(node-seed
Complexity issues
In this section, we analyze the computational complexity of our approach. We distinguish the operations that involve network accesses from those executed in memory, remarking that they differ in time of (at least) nine orders of magnitude. We denote by I/Os, for a given function , the asymptotical evaluation of the number of operations involving network accesses vs. the input variable n.
The first result concerns the computational complexity of the operator . Intuitively,
Experiments
In this section, we present our experimental campaign aimed at determining the performances of our approach. Because it operates on a SIS, we had to extract not only the connections among the accounts of different users in the same social network but also the connections among the accounts of the same user in different social networks. To handle these connections, two standards encoding human relationships are generally exploited. The former is XFN (XHTML Friends Network) [27]. It simply uses
Conclusion and future work
In this paper, we studied the problem of discovering missing me edges in a Social Internetworking Scenario. The most evident information we can use to detect missing me edges, besides usernames, concerns neighbors. By means of experimental campaign, we showed that state-of-the-art common-neighbor techniques cannot be applied to solve this problem. Thus, the need of studying the problem as new and finding a specific solution arises. We defined a suitable notion of “inter-social-network”
Acknowledgments
This work has been partially supported by the TENACE PRIN Project (n. 20103P34XC) funded by the Italian Ministry of Education, University and Research, by the Program “Programma Operativo Nazionale Ricerca e Competitività” 2007–2013, project BA2Kno (Business Analytics to Know) PON03PE_00001_1, in “Laboratorio in Rete di Service Innovation”, and by the Program “Programma Operativo Nazionale Ricerca e Competitività” 2007–2013, Distretto Tecnologico CyberSecurity funded by the Italian Ministry of
References (65)
- et al.
Evolution of the social network of scientific collaborations
Phys. A: Stat. Mech. Appl.
(2002) - et al.
Characterizing user navigation and interactions in online social networks
Inf. Sci.
(2012) - et al.
Bridge analysis in a social internetworking scenario
Inf. Sci.
(2013) - et al.
Moving from social networks to social internetworking scenarios: the crawling perspective
Inf. Sci.
(2014) - et al.
User identification for cross-system personalisation
Inf. Sci.
(2009) - et al.
The socialTrust framework for trusted social information management: architecture and algorithms
Inf. Sci.
(2010) - et al.
OWA operator based link prediction ensemble for social network
Expert Syst. Appl.
(2015) - et al.
Link prediction in complex networks: a survey
Phys. A: Stat. Mech. Appl.
(2011) - et al.
Fast and accurate link prediction in social networking systems
J. Syst. Softw.
(2012) - Google Social Graph, 2012....
Link creation and information spreading over social and communication ties in an interest-based online social network
EPJ Data Sci.
A survey of link prediction in social networks
Joint link-attribute user identity resolution in online social networks
Foundations of multidimensional network analysis
Multidimensional networks: foundations of structural analysis
World Wide Web
Probabilistic subgraph matching on huge social networks
A model to support multi-social-network applications
Measuring betweennes centrality in social internetworking scenarios
Driving global team formation in social networks to obtain diversity
Crawling social internetworking systems
Discovering links among social networks
Internetworking assortativity in Facebook
A system for extracting structural information from social network accounts
Softw.: Pract. Exper.
Catastrophic cascade of failures in interdependent networks
Nature
Reciprocal and heterogeneous link prediction in social networks
A forensic analysis of images on online social networks
Steganography and secure communication on online social networks and online photo sharing
Virtual private social networks
Cited by (52)
Unveiling Qzone: A measurement study of a large-scale online social network
2023, Information SciencesAn approach to detect backbones of information diffusers among different communities of a social platform
2022, Data and Knowledge EngineeringCitation Excerpt :In fact, she allows the interaction between users of different social networks who could not communicate otherwise [44]. The user’s join to multiple social networks can be explicitly stated by her through the so-called “me edges”, or it can be inferred by applying approaches that detect two accounts on different social networks belonging to the same user [45,46]. Our approach can easily be extended to such a scenario.
Machine learning algorithms for social media analysis: A survey
2021, Computer Science ReviewCitation Excerpt :One solution to this is, identifying the users who are maintaining multiple accounts to publish the same information on SM. In [161], authors developed a solution to the real-time scenario of having multiple accounts in different social media, and ‘me’ edges is a technique used to define a solution to avoid misuse of data with standard account credentials. It is challenging to maintain sensitive information not to reveal to all social media account.
A privacy-preserving approach to prevent feature disclosure in an IoT scenario
2020, Future Generation Computer SystemsCitation Excerpt :Indeed, (i) each identified group corresponds to a network of the system; (ii) each object can be modeled by means of a node; (iii) relationships between objects of the same group can be represented by means of arcs inside the corresponding networks (they are called inner arcs); (iv) relationships between objects of different groups are modeled as arcs linking nodes of different networks (they are called cross-arcs). The possibility to have a direct, natural and immediate multi-network representation of our scenario allows for an easy mapping with properties, operations and concepts of multi-network contexts [19–21]. Thanks to this, in our analyses, we can benefit from the wide variety of results found for multi-network systems in the past literature.
Community detection using hierarchical clustering based on edge-weighted similarity in cloud environment
2019, Information Processing and ManagementDetect potential relations by link prediction in multi-relational social networks
2018, Decision Support SystemsCitation Excerpt :Link prediction is an effective approach to detect such potential relations between the individuals in trade and social networks. Using link prediction, we can discover potential interpersonal contacts [19,20] by analyzing the customers' shopping behavior and social relationships. Link prediction can reveal the potential friendship of users in social networks [21,22] and can recommend the possible merchandise or services to the users [2,23].
- ☆
A shorter abridged version of this paper appears in “Discovering Links among Social Networks”, by F. Buccafurri, G. Lax, A. Nocera, and D. Ursino, in the Proc. of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2012), Bristol, United Kingdom, 2012. Springer [17].