Discovering missing me edges across social networks

doi:10.1016/j.ins.2015.05.014

Information Sciences

Volume 319, 20 October 2015, Pages 18-37

https://doi.org/10.1016/j.ins.2015.05.014 Get rights and content

Abstract

Distinct social networks are interconnected via membership overlap, which plays a key role when crossing information is investigated in the context of multiple-social-network analysis. Unfortunately, users do not always make their membership to two distinct social networks explicit, by specifying the so-called me edge (practically, corresponding to a link between the two accounts), thus missing a potentially very useful information. As a consequence, discovering missing me edges is an important problem to address in this context with potential powerful applications. In this paper, we propose a common-neighbor approach to detecting missing me edges, which returns good results in real-life settings. Indeed, an experimental campaign shows both that the state-of-the-art common-neighbor approaches cannot be effectively applied to our problem and, conversely, that our approach returns precise and complete results.

Introduction

In recent years, (on-line) social networks have been showing a rapid development growth becoming probably the main actor of the Web 2.0. The rapid and revolutionary diffusion of social networks among all segments of the population has attracted the interest of researchers from several fields of computer science, such as digital forensics [40], [24], user behavior [6], trust and reputation [26], steganography [25], [28], also for the applications that the analysis of involved data can enable [61], [22], [13], [38], [14], [12]. In this landscape, Social Network Analysis and Social Network Mining have assumed an important role because both the large volume of data and their graph-based organization have enforced the development of specific models and methods allowing the study of social-network data to discover knowledge from them. Clearly, the graph-based data schema gives a great information power to links among data, because it allows people profiles, resources, activities, and so on, to be directly (and indirectly) related. The crucial role of relationships in the expression of an individual’s personality and social identity, traditionally recognized by social sciences, is even strengthened in the field of virtual societies, in which relationship links are the main form of expression of participation of individuals to the community. To make more challenging the analysis of this reality, consider that the reference scenario does not consist of a single, isolated, independent social network, but is a constellation of social networks, each forming a community with specific connotations, but strongly interconnected with each other. It is a matter of fact that, despite the inherent underlying heterogeneity, the interaction among distinct social networks is the basis of a new emergent internetworking scenario enabling a lot of strategic applications, whose main strength will be just the integration of possibly different communities yet preserving their diversity and autonomy. Clearly, social mining and analysis approaches may strongly rely on this huge multi-network source of information, which reflects multiple aspects of people personal life, thus enabling a lot of powerful discovering activities.

From this perspective, links among different social networks assume a fundamental role. They connect the same user on two different social networks who thus assumes the role of passing point of information from one social network the other. For this reason, we call this user i-bridge.¹ The link derives from the explicit user’s declaration (sometimes supported and encouraged by specific tools) consisting in the insertion of me edges [1]. Unfortunately, for disparate reasons, users do not always make their membership to two distinct social networks explicit, by specifying the so-called me edge (practically corresponding to a link between the two accounts), thus missing a potentially very useful information. As a consequence, in the overall underlying (social internetworking) graph a big number of missed me edges exists, whose discovery represents a very important issue. In other words, an interesting problem of missing link detection arises, which partially overlaps with a link prediction issue, because we may expect that a portion of missing me edges will be inserted in a next stage in the graph.

In this paper, we deal with the above problem by proposing an effective solution experimentally tested in a real-life Social Internetworking Scenario (SIS, for short) [15]. To the best of our knowledge, the problem of detecting me edges has not been investigated in the literature, but the approach we adopt in this work, which exploits a recursive notion of common-neighbor similarity, suggested us to prior verify whether common-neighbor approaches for link prediction [47] can be directly applied to our problem. The answer to this question was definitely negative, as intuitively explained in Section 2 and experimentally confirmed in Section 6, thus motivating our work. Our solution is based on a notion of node similarity, whose usage allows us to detect whether a suitable threshold is exceeded and then a missing me edge between two nodes is detected. The similarity between two nodes is obtained by combining two contributions: a string similarity between the associated usernames, and a contribution based on a suitable recursive notion of common-neighbor similarity. The neighborhood similarity allows these errors to be detected and avoided. As a consequence, it is important to clarify that the problem we are addressing does not deal with the case in which a user with membership overlap between two social networks chooses the corresponding account names very different from each other. It is worth noting that, under this case, often falls the situation in which a user voluntarily keeps the two accounts separated in their respective social networks and thus avoid also to have common friends. Therefore, also neighborhood similarity fails.

The plan of this paper is as follows: in the next section, we examine related literature. In Section 3, we present our recursive notion of similarity. On the basis of this notion, we design the method we use to detect missing me edges. This is described in Section 4. In Section 5, we determine the computational complexity of our approach. In Section 6, we illustrate the experiments we have carried out to verify the performances of our technique. Finally, in Section 7, we draw our conclusions and sketch possible future evolutions of our research.

Section snippets

Related work

In this section, we survey the literature related to the topics addressed in this paper. It is discussed subdivided into three categories, one for each subsection.

The notion of similarity

Our approach operates in a Social Internetworking System (SIS) resulting from the interconnection of a number of distinct social networks. We start with the basic notion of underlying graph, which is the layer we deal with.

Definition 3.1

A t-Social-Internetworking Graph is a directed graph $G = 〈 N, E 〉$ , where N is the set of nodes, E is the set of edges (i.e., ordered pairs of nodes) and N is partitioned into t subsets $S_{1}, \dots, S_{t}$ . The graphs corresponding to $S_{1}, \dots, S_{t}$ are called social graphs. Given a node $a \in N$ we

Discovering links among social networks

In this section we focus on the main goal of this paper, that is the detection of me edges in a SIS. Consider given a Social Internetworking System (SIS) composed of t social networks, whose underlying Social Internetworking Graph is $G = 〈 N, E_{f} \cup E_{m} 〉$ . Due to the hugeness of the search space and the high sparsity of me-edges [11], we risk to meet prohibitive obstacles of infeasibility if we do not pose the problem in a suitable way. Accordingly, we consider the following two problems:

$P_{1}$
(node-seed

Complexity issues

In this section, we analyze the computational complexity of our approach. We distinguish the operations that involve network accesses from those executed in memory, remarking that they differ in time of (at least) nine orders of magnitude. We denote by $O (f (n))$ I/Os, for a given function $f (n)$ , the asymptotical evaluation of the number of operations involving network accesses vs. the input variable n.

The first result concerns the computational complexity of the operator $T_{Q}^{h} (a, b, k_{0})$ . Intuitively,

Experiments

In this section, we present our experimental campaign aimed at determining the performances of our approach. Because it operates on a SIS, we had to extract not only the connections among the accounts of different users in the same social network but also the connections among the accounts of the same user in different social networks. To handle these connections, two standards encoding human relationships are generally exploited. The former is XFN (XHTML Friends Network) [27]. It simply uses

Conclusion and future work

In this paper, we studied the problem of discovering missing me edges in a Social Internetworking Scenario. The most evident information we can use to detect missing me edges, besides usernames, concerns neighbors. By means of experimental campaign, we showed that state-of-the-art common-neighbor techniques cannot be applied to solve this problem. Thus, the need of studying the problem as new and finding a specific solution arises. We defined a suitable notion of “inter-social-network”

Acknowledgments

This work has been partially supported by the TENACE PRIN Project (n. 20103P34XC) funded by the Italian Ministry of Education, University and Research, by the Program “Programma Operativo Nazionale Ricerca e Competitività” 2007–2013, project BA2Kno (Business Analytics to Know) PON03PE_00001_1, in “Laboratorio in Rete di Service Innovation”, and by the Program “Programma Operativo Nazionale Ricerca e Competitività” 2007–2013, Distretto Tecnologico CyberSecurity funded by the Italian Ministry of

References (65)

A. Barabási et al.
Evolution of the social network of scientific collaborations
Phys. A: Stat. Mech. Appl.
(2002)
F. Benevenuto et al.
Characterizing user navigation and interactions in online social networks
Inf. Sci.
(2012)
F. Buccafurri et al.
Bridge analysis in a social internetworking scenario
Inf. Sci.
(2013)
F. Buccafurri et al.
Moving from social networks to social internetworking scenarios: the crawling perspective
Inf. Sci.
(2014)
F. Carmagnola et al.
User identification for cross-system personalisation
Inf. Sci.
(2009)
J. Caverlee et al.
The socialTrust framework for trusted social information management: architecture and algorithms
Inf. Sci.
(2010)
Y. He et al.
OWA operator based link prediction ensemble for social network
Expert Syst. Appl.
(2015)
L. Lü et al.
Link prediction in complex networks: a survey
Phys. A: Stat. Mech. Appl.
(2011)
A. Papadimitriou et al.
Fast and accurate link prediction in social networking systems
J. Syst. Softw.
(2012)
Google Social Graph, 2012....

L. Aiello et al.

Link creation and information spreading over social and communication ties in an interest-based online social network

EPJ Data Sci.

(2012)

M. Al Hasan et al.

A survey of link prediction in social networks

S. Bartunov et al.

Joint link-attribute user identity resolution in online social networks

M. Berlingerio et al.

Foundations of multidimensional network analysis

M. Berlingerio et al.

Multidimensional networks: foundations of structural analysis

World Wide Web

(2013)

D. Brickley, L. Miller, The Friend of a Friend (FOAF) project, 2013....

M. Bröcheler et al.

Probabilistic subgraph matching on huge social networks

F. Buccafurri et al.

A model to support multi-social-network applications

F. Buccafurri et al.

Measuring betweennes centrality in social internetworking scenarios

F. Buccafurri et al.

Driving global team formation in social networks to obtain diversity

F. Buccafurri, G. Lax, A. Nocera, D. Ursino, Supporting Information spread in a social internetworking scenario, in:...

F. Buccafurri et al.

Crawling social internetworking systems

F. Buccafurri et al.

Discovering links among social networks

F. Buccafurri et al.

Internetworking assortativity in Facebook

F. Buccafurri et al.

A system for extracting structural information from social network accounts

Softw.: Pract. Exper.

(2014)

S.V. Buldyrev et al.

Catastrophic cascade of failures in interdependent networks

Nature

(2010)

X. Cai et al.

Reciprocal and heterogeneous link prediction in social networks

A. Castiglione et al.

A forensic analysis of images on online social networks

A. Castiglione et al.

Steganography and secure communication on online social networks and online photo sharing

T. Celik, E. Meyer, M. Mullenweg, XHTML friends network, in: Proc. ACM Hypertext, vol. 4,...

M. Conti et al.

Virtual private social networks

I. Dagan, F. Pereira, L. Lee, Similarity-based estimation of word cooccurrence probabilities, in: Proc. of the Annual...

Cited by (52)

Unveiling Qzone: A measurement study of a large-scale online social network
2023, Information Sciences
Nowadays, Online Social Network (OSN) is very important in maintaining relationships and disseminating information. Qzone, as one of the most popular OSNs in China, lacks detailed quantitative measurement and analysis. To the best of our knowledge, we are the first to conduct a large-scale measurement on Qzone. Firstly, we develop a dedicated Qzone Crawler System (Qzone-CS) and collect over a billion user and topology data. Subsequently, we measure user behavior and network topology characteristics by analyzing their posting and interactive behaviors. The results show that: 1) the temporal distribution of user behavior follows the periodic distribution, while user posting behavior follows the segmented power-law distribution. Among all the user behaviors on Qzone, the probability of posting Shuoshuo is the highest. Meanwhile, users prefer to comment on others’ blogs than to forward them. Unlike other OSNs, Qzone is a relatively closed system, which is not conducive to the dissemination of information. 2) Qzone is a small-world and scale-free community; the friendship network has higher clustering characteristic compared with the interactive network; out-degree in the interactive network has an obvious correlation with in-degree; and unlike other OSNs, Qzone is a heterogeneous network. Our results can help researchers better understand the user behavior and network topology characteristics of Qzone.
An approach to detect backbones of information diffusers among different communities of a social platform
2022, Data and Knowledge Engineering
Citation Excerpt :
In fact, she allows the interaction between users of different social networks who could not communicate otherwise [44]. The user’s join to multiple social networks can be explicitly stated by her through the so-called “me edges”, or it can be inferred by applying approaches that detect two accounts on different social networks belonging to the same user [45,46]. Our approach can easily be extended to such a scenario.
Information diffusion in social networks is a classic and, at the same time, very current problem. In fact, information diffusers are always looking for new techniques to disseminate information of their interest by creating backbones among them. In this paper, we focus on a specific, but very current and relevant, scenario regarding this way of proceeding. In fact, we propose an approach for the detection of possible backbones of information diffusers among different communities of a social network. Our approach is based on a new centrality measure that we call disseminator centrality. It is specifically designed to detect the so-called disseminator bridges, i.e., users belonging to multiple communities of a single social network, who want to disseminate information of their interest from one community to another by supporting each other. This paper describes the proposed approach, presents the disseminator centrality, illustrates the differences with respect to the related literature and presents the results of the experiments carried out to evaluate its performance.
Machine learning algorithms for social media analysis: A survey
2021, Computer Science Review
Citation Excerpt :
One solution to this is, identifying the users who are maintaining multiple accounts to publish the same information on SM. In [161], authors developed a solution to the real-time scenario of having multiple accounts in different social media, and ‘me’ edges is a technique used to define a solution to avoid misuse of data with standard account credentials. It is challenging to maintain sensitive information not to reveal to all social media account.
Social Media (SM) are the most widespread and rapid data generation applications on the Internet increase the study of these data. However, the efficient processing of such massive data is challenging, so we require a system that learns from these data, like machine learning. Machine learning methods make the systems to learn itself. Many papers are published on SM using machine learning approaches over the past few decades. In this paper, we provide a comprehensive survey of multiple applications of SM analysis using robust machine learning algorithms. Initially, we discuss a summary of machine learning algorithms, which are used in SM analysis. After that, we provide a detailed survey of machine learning approaches to SM analysis. Furthermore, we summarize the challenges and benefits of Machine Learning usages in SM analysis. Finally, we presented open issues and consequences in SM analysis for further research.
A privacy-preserving approach to prevent feature disclosure in an IoT scenario
2020, Future Generation Computer Systems
Citation Excerpt :
Indeed, (i) each identified group corresponds to a network of the system; (ii) each object can be modeled by means of a node; (iii) relationships between objects of the same group can be represented by means of arcs inside the corresponding networks (they are called inner arcs); (iv) relationships between objects of different groups are modeled as arcs linking nodes of different networks (they are called cross-arcs). The possibility to have a direct, natural and immediate multi-network representation of our scenario allows for an easy mapping with properties, operations and concepts of multi-network contexts [19–21]. Thanks to this, in our analyses, we can benefit from the wide variety of results found for multi-network systems in the past literature.
In this paper, we propose a privacy-preserving approach to prevent feature disclosure in a multiple IoT scenario, i.e., a scenario where objects can be organized in (partially overlapped) networks interacting with each other. Our approach is based on two notions derived from database theory, namely k-anonymity and t-closeness. They are applied to cluster the involved objects in order to provide a unitary view of them and of their features. Indeed, the use of k-anonymity and t-closeness makes derived groups robust from a privacy perspective. In this way, not only information disclosure, but also feature disclosure, is prevented. This is an important strength of our approach because the malicious analysis of objects’ features can have disruptive effects on the privacy (and, ultimately, on the life) of people.
Community detection using hierarchical clustering based on edge-weighted similarity in cloud environment
2019, Information Processing and Management
Recently, social network has been paid more and more attention by people. Inaccurate community detection in social network can provide better product designs, accurate information recommendation and public services. Thus, the community detection (CD) algorithm based on network topology and user interests is proposed in this paper. This paper mainly includes two parts. In first part, the focused crawler algorithm is used to acquire the personal tags from the tags posted by other users. Then, the tags are selected from the tag set based on the TFIDF weighting scheme, the semantic extension of tags and the user semantic model. In addition, the tag vector of user interests is derived with the respective tag weight calculated by the improved PageRank algorithm. In second part, for detecting communities, an initial social network, which consists of the direct and unweighted edges and the vertexes with interest vectors, is constructed by considering the following/follower relationship. Furthermore, initial social network is converted into a new social network including the undirected and weighted edges. Then, the weights are calculated by the direction and the interest vectors in the initial social network and the similarity between edges is calculated by the edge weights. The communities are detected by the hierarchical clustering algorithm based on the edge-weighted similarity. Finally, the number of detected communities is detected by the partition density. Also, the extensively experimental study shows that the performance of the proposed user interest detection (PUID) algorithm is better than that of CF algorithm and TFIDF algorithm with respect to F-measure, Precision and Recall. Moreover, Precision of the proposed community detection (PCD) algorithm is improved, on average, up to 8.21% comparing with that of Newman algorithm and up to 41.17% comparing with that of CPM algorithm.
Detect potential relations by link prediction in multi-relational social networks
2018, Decision Support Systems
Citation Excerpt :
Link prediction is an effective approach to detect such potential relations between the individuals in trade and social networks. Using link prediction, we can discover potential interpersonal contacts [19,20] by analyzing the customers' shopping behavior and social relationships. Link prediction can reveal the potential friendship of users in social networks [21,22] and can recommend the possible merchandise or services to the users [2,23].
Potential relation detecting on social network has become more important for decision making in many business disciplines, such as marketing, business strategy, human resources development, finance planning, business transformation, insurance policy design, and tourism management. People are used to seeking useful information from the relationships among social members to support their decisions on investment, partner seeking and marketing. Corporations are seeking opportunities to leverage them for “word of mouth” advertising based on the relations between the customers. When we collect and observe relationships between people, missing or redundant relations unavoidably occur since the time and cost restrictions in market or social investigation prevent us to discover all the relations. Moreover, since the social relations are changing constantly, current social relations may disappear, and new relations will be established. Many trade and social networks consist of multiple types of relations between the individuals. This paper presents an efficient method to detect the potential and future social relations between individuals in multi-relational social networks using link prediction. First, we calculate the belief of each individual by belief propagation on each type of relations. Based on the belief vectors, the similarities between various types of relations are computed to measure their mutual influence. Based on the similarities between various types of relations, we model link prediction as the problem of matrix completion by optimizing its max-norm constrained formulation. We propose a projected gradient descent optimization algorithm which is scalable to large size networks. Empirical results on real multi-relational social networks demonstrate that the predicting results of our algorithm have higher quality compared with other similar algorithms.

View all citing articles on Scopus

^☆: A shorter abridged version of this paper appears in “Discovering Links among Social Networks”, by F. Buccafurri, G. Lax, A. Nocera, and D. Ursino, in the Proc. of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2012), Bristol, United Kingdom, 2012. Springer [17].

View full text

Discovering missing me edges across social networks☆

Abstract

Introduction

Section snippets

Related work

The notion of similarity

Discovering links among social networks

Complexity issues

Experiments

Conclusion and future work

Acknowledgments

Phys. A: Stat. Mech. Appl.

Inf. Sci.

Inf. Sci.

Inf. Sci.

Inf. Sci.

Inf. Sci.

Expert Syst. Appl.

Phys. A: Stat. Mech. Appl.

J. Syst. Softw.

Link creation and information spreading over social and communication ties in an interest-based online social network

EPJ Data Sci.

A survey of link prediction in social networks

Joint link-attribute user identity resolution in online social networks

Foundations of multidimensional network analysis

Multidimensional networks: foundations of structural analysis

World Wide Web

Probabilistic subgraph matching on huge social networks

A model to support multi-social-network applications

Measuring betweennes centrality in social internetworking scenarios

Driving global team formation in social networks to obtain diversity

Crawling social internetworking systems

Discovering links among social networks

Internetworking assortativity in Facebook

A system for extracting structural information from social network accounts

Softw.: Pract. Exper.

Catastrophic cascade of failures in interdependent networks

Nature

Reciprocal and heterogeneous link prediction in social networks

A forensic analysis of images on online social networks

Steganography and secure communication on online social networks and online photo sharing

Virtual private social networks