The authors declare that they have no competing interests.
The content extraction tool was programmed by C. A. Piña-García. All authors helped to write the literature review and to collect data. C. A. Piña-García wrote the majority of the paper with assistance from Carlos Gershenson and Siqueiros-García. All authors read and approved the final manuscript.
One of the most significant current challenges in large-scale online social networks is to establish a concise and coherent method for collecting and summarizing data. Sampling the content of an Online Social Network (OSN) plays an important role as a knowledge discovery tool.
Current sampling methods must cope with the lack of a full sampling frame, i.e., access to the data is limited. Another key aspect is the huge amount of data generated by users of social networking services such as Twitter, perhaps the most influential microblogging service, which produces approximately 500 million tweets per day. Because the size of Twitter is itself difficult to measure, analyzing the entire network is infeasible and sampling is unavoidable.
In addition, we believe that there is a clear need for a new methodology to collect information on social networks (social mining). In this regard, this paper introduces a set of random strategies that could be considered a reliable alternative for gathering global trends on Twitter. It is important to note that this research aims to present some initial ideas on how convenient random walks are for extracting information or global trends.
The main purpose of this study is to propose a suitable methodology for carrying out an efficient collection process via three random strategies: Brownian, Illusion and Reservoir. These random strategies are applied through a Metropolis-Hastings Random Walk (MHRW). We show that interesting insights can be obtained by sampling emerging global trends on Twitter. The study also provides descriptive statistics and graphical descriptions from the preliminary experiments.
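To make the MHRW mentioned above concrete, the following is a minimal sketch of the standard Metropolis-Hastings random walk on an undirected graph, which targets a uniform distribution over nodes. It is not the content extraction tool used in this study: the toy graph, the function name `mhrw`, and its parameters are assumptions chosen for illustration, and the Brownian, Illusion and Reservoir strategies are not modelled here.

```python
import random
from collections import Counter


def mhrw(graph, start, steps, rng):
    """Metropolis-Hastings random walk over an undirected graph.

    graph: dict mapping each node to a list of its neighbours (hypothetical toy input).
    Returns the list of visited nodes, including the start node.
    """
    current = start
    visited = [current]
    for _ in range(steps):
        # Propose a neighbour uniformly at random.
        candidate = rng.choice(graph[current])
        # Accept with probability min(1, deg(current) / deg(candidate));
        # otherwise stay at the current node. This corrects the bias of a
        # simple random walk towards high-degree nodes.
        if rng.random() < len(graph[current]) / len(graph[candidate]):
            current = candidate
        visited.append(current)
    return visited


# Hypothetical four-node graph with unequal degrees.
toy_graph = {
    "a": ["b", "c", "d"],
    "b": ["a"],
    "c": ["a", "d"],
    "d": ["a", "c"],
}

walk = mhrw(toy_graph, "a", 20000, random.Random(0))
counts = Counter(walk)
frequencies = {node: counts[node] / len(walk) for node in toy_graph}
```

On a connected graph, the visit frequencies in `frequencies` should approach 1/4 for every node as the walk grows longer, whereas a simple random walk would visit node "a" more often because of its higher degree.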