Nepotistic relationships in Twitter and their impact on rank prestige algorithms
Introduction
Twitter is a service which allows users to publish short text messages (tweets) which are shown to other users following the author of the message. In case the author is not protecting his tweets, they appear in the so-called public timeline and they are served as search results in response to user submitted queries. Thus, Twitter can be a source of valuable real-time information and, in fact, several major search engines were including tweets as search results at the moment of this writing.
Given that tweets are published by individual users, ranking them to find the most relevant information is a crucial matter. Indeed, at the moment of this writing, Google seemed to be applying the PageRank method to rank Twitter users to that end (Talbot, 2010). Nevertheless, the behavior of different graph centrality methods and their vulnerabilities when confronted with the Twitter user graph, in general, and Twitter spammers in particular, are still little-known.
Thus, this paper aims to shed some light on this particular issue besides providing some recommendations for future research in the area. As it will be later discussed, user ranking in social networks cannot be an end in itself, but a tool to be used for other tasks. Hence, this author is not considering any a priori “good” ranking and, instead, he suggests measuring the performance of the different methods on the basis of two desirable features: on one hand presumed relevant users should rank atop – although the actual ordering among them is irrelevant; and, on the other hand, spammers should achieve lower rankings.
The paper is organized as follows. First of all, a comprehensive literature review is provided. It deals with several rank prestige algorithms (some well-known and others lesser-known) which are applicable to social networks; their known vulnerabilities; and some partially related work and proprietary tools outside the scope of this study. In addition to that, Twitter spam is discussed with a focus on link spam (known as follow spam in Twitter). Then, the different strategies to fight spam in social websites are overviewed. Finally, the research questions are stated and the feasibility of “desensitizing” prestige ranking algorithms against follow spam is analyzed. After that, the experimental framework in which this study was conducted is described: the dataset crawled from Twitter; the elaboration of the subset of relevant and abusive users; and the straightforward nature of the evaluation. Afterwards, results obtained with each of the different ranking methods are discussed along with the implications of the study. Finally, an in-depth analysis of the collected dataset is provided in an appendix: it provides details on the nature of the social network, in addition to some demographical analysis.
Section snippets
Literature review
A social network, despite the current association with online services, is any interconnected system whose connections are a product of social relations or interactions among persons or groups. That way, families, companies, groups of friends, or scientific production are social networks.
Social networks can be mathematically modeled as graphs and, thus, graph theory has become inextricably related to social network analysis with a long history of research. Think, for instance, of bibliometric
Spam countermeasures
From the literature review regarding Twitter spam it can be concluded that false negatives (uncatched spammers) and false positives (legitimate users labeled as spammers) are unavoidable and, even worse, as it was stated in Benevenuto et al., 2010, Chowdhury, 2010, Yardi et al., 2010 the “war on spam” is a never ending fight because every deployed anti-spam measure will be eventually overcome by spammers.
However, all of those works approach Twitter spam as a detection task while there are other
Research design
The main goal of this study was to compare the performance of different rank prestige algorithms when applied to social networks. To attain that, a relative large dataset was needed, in addition to an objective criterion against which to compare the performance of the different methods. The dataset, described in the following subsection, was crawled by the author from Twitter. Within that dataset two different subsets were prepared: one of presumed relevant users and another of abusive users.
Prestige of abusive users and verified accounts when applying PageRank
As it has been aforementioned, 4884 verified accounts appear in the collected dataset. They amount for a mere 0.3% of the users in the graph but, still, they accumulate 12.7% of the total PageRank in the graph. Regarding their positioning in the global ranking (see Table 5, and Fig. 3, Fig. 4), 85% of the verified accounts appear among the top 10% of users.
With regards to the subset of abusive users, about 50% of the spammers detected in the collection of tweets did not appear in the graph.
Discussion of results
As we have explained before, our approach to evaluate the available ranking algorithms in the context of social networks was not based on an a priori “good” ranking but, instead, their capacity of ranking relevant users (i.e. verified accounts) atop as a whole group while, at the same time, being robust to “gaming” by abusive users, i.e. their “ability” to penalize spammers and aggressive marketers who try to reach better positions by exchanging links instead of providing better content.
Because
Implications, conclusions, and future work
This study makes four main contributions. First, when applying graph prestige algorithms to social networks, ranking is not only a measure of a user’s value but also the result of “gaming” the algorithm by means of relationship links. The fact that spammers – who contribute no valuable content – are consistently better positioned than marketers – who contribute somewhat valuable information – no matter the method employed supports this assert.
Second, evaluating ranking should not be a point in
Acknowledgements
The author would like to thank David J. Brenes, Miguel Fernández, Fernando Zapico, and Diego Guerra for their help during the Twitter dataset collection. He is also in debt with Miguel Fernández for comments on an early draft of this paper. This work was partially financed by grant UNOV-09-RENOV-MB-2 from the University of Oviedo. Finally, the author would like to thank the anonymous reviewers for their constructive suggestions to improve this paper, especially regarding the use of ground truth
References (52)
- et al.
The anatomy of a large-scale hypertextual web search engine
Computer Networks and ISDN Systems
(1998) - Bakshy, E., et al. (2011). Everyone’s an influencer: quantifying influence on Twitter. In WSDM ‘11 Proceedings of the...
- Benevenuto, F., et al. (2010). Detecting spammers on Twitter. In CEAS 2010, Proceedings of the seventh annual...
- Bharat, K., & Henzinger, M. (1998). Improved algorithms for topic distillation in hyper-linked environments. In...
Choosing physiology journals
Bulletin of the Medical Library Association
(1944)- Cha, M., et al. (2010). Measuring user influence in Twitter: The million follower fallacy. In ICWSM’10: Proceedings of...
Mining the web – Discovering knowledge from hypertext data
(2002)- Choudhury, M. D., Sundaram, H., John, A., Seligmann, D. D., & Kelliher, A. (2010). Birds of a feather: Does attribute...
- Chowdhury, A. (2010). State of Twitter spam....
- Chu, Z., et al. (2010). Who is tweeting on Twitter: Human, bot, or cyborg? In Proceedings of annual computer security...
Characteristics of the research literature used by chemists and physicists in the United States 1949
Citation analysis as a tool in journal evaluation
Science
De retibus socialibus et legibus momenti
Europhysics Letters
College libraries and chemical education, 1927
Fighting spam on social web sites: A survey of approaches and future challenges
IEEE Internet Computing
Social networks that matter: Twitter under the microscope
First Monday
Cited by (44)
Capricious opinions: A study of polarization of social media groups
2022, Government Information QuarterlyIdentification of influential users on Twitter: A novel weighted correlated influence measure for Covid-19
2020, Chaos, Solitons and FractalsCitation Excerpt :Follower-Rank is another such influence metric proposed by [8], which is expressed as the number of followers of an individual and Follower-Followee ratio [2] in the Twitter network. Another measure based upon followers and followee ratio is a paradoxical discounted measure [15]. To diminish the effect of trade-offs between several followers and followee of the user, a new metric namely, popularity [1] was developed.
Robustness and stability of enterprise intranet social networks: The impact of moderators
2017, Information Processing and ManagementA Centrality for Social Media Users Focusing on Information-Gathering Ability
2023, HT 2023 - The 34th ACM Conference on Hypertext and Social MediaWho leads the conversation? Influential Twitter users during a niche sporting event
2022, Revista Mediterranea de Comunicacion