
Predicting Fluctuations in Cryptocurrency Transactions Based on User Comments and Replies

  • Young Bin Kim,

    Affiliation Interdisciplinary Program in Visual Information Processing, Korea University, Seoul, Korea

  • Jun Gi Kim,

    Affiliation School of Games, Hongik University, Seoul, Korea

  • Wook Kim,

    Affiliation Department of Computer and Radio Communications Engineering, Korea University, Seoul, Korea

  • Jae Ho Im,

    Affiliation Department of Computer and Radio Communications Engineering, Korea University, Seoul, Korea

  • Tae Hyeong Kim,

    Affiliation Interdisciplinary Program in Visual Information Processing, Korea University, Seoul, Korea

  • Shin Jin Kang,

    Affiliation School of Games, Hongik University, Seoul, Korea

  • Chang Hun Kim

    chkim@korea.ac.kr

    Affiliation Department of Computer and Radio Communications Engineering, Korea University, Seoul, Korea

Abstract

This paper proposes a method to predict fluctuations in the prices of cryptocurrencies, which are increasingly used for online transactions worldwide. Little research has been conducted on predicting fluctuations in the price and number of transactions of a variety of cryptocurrencies. Moreover, the few methods proposed to predict fluctuation in currency prices are inefficient because they fail to take into account the differences in attributes between real currencies and cryptocurrencies. This paper analyzes user comments in online cryptocurrency communities to predict fluctuations in the prices of cryptocurrencies and the number of transactions. By focusing on three cryptocurrencies, each with a large market size and user base, this paper attempts to predict such fluctuations by using a simple and efficient method.

Introduction

The ubiquity of Internet access has triggered the emergence of currencies distinct from those used in the prevalent monetary system. The advent of cryptocurrencies based on a unique method called “mining” has brought about significant changes in the online economic activities of users. Various cryptocurrencies have emerged since 2008, when Bitcoin was first introduced [1, 2]. Nowadays, cryptocurrencies are often used in online transactions, and their usage has increased every year since their introduction [3, 4].

Cryptocurrencies are primarily characterized by fluctuations in their price and number of transactions [2, 3]. For instance, Bitcoin, the most famous cryptocurrency, showed no significant fluctuation in its price and number of transactions until the end of 2013 [3], when it began to garner worldwide attention and its price and number of transactions rose and fluctuated significantly. Other cryptocurrencies—Ripple and Litecoin, for instance—have shown significantly unstable fluctuations since the end of December 2013 [5]. Such unstable fluctuations have served as an opportunity for speculation for some users while hindering most others from using cryptocurrencies [2, 6, 7].

Research on the attributes of cryptocurrencies has made steady progress but has a long way to go. Most researchers analyze user sentiment related to cryptocurrencies on social media, e.g., Twitter, or quantified Web search queries on search engines such as Google, together with fluctuations in price and trade volume, to determine any relation [8–12]. Past studies have been limited to Bitcoin, which provides a large amount of data; a model to predict fluctuations in the price and number of transactions of diverse cryptocurrencies has yet to be built.

Therefore, this paper proposes a method to predict fluctuations in the price and number of transactions of cryptocurrencies. The proposed method analyzes user comments on online cryptocurrency communities, and conducts an association analysis between these comments and fluctuations in the price and number of transactions of cryptocurrencies to extract significant factors and formulate a prediction model. The method is intended to predict fluctuations in cryptocurrencies based on the attributes of online communities.

Online communities serve as forums where people share opinions regarding topics of common interest [13–17]. Therefore, such communities mirror the responses of many users to certain cryptocurrencies on a daily basis. Cryptocurrencies are largely traded online, where many users rely on information on the Web to make decisions about selling or buying them [4, 18]. In this paper, daily topics and relevant comments/replies in cryptocurrency communities are analyzed to determine how the opinions of community users are associated with fluctuations in the price and number of transactions of cryptocurrencies on a daily basis.

The proposed method is applicable to a range of cryptocurrencies, and can predict fluctuations in the prices of such cryptocurrencies as Bitcoin, Ripple, and Ethereum to a certain extent (approximately 74% weighted average precision). Moreover, the rise and fall in the number of transactions of Bitcoin and Ethereum can be predicted to some extent.

Methods

System Overview

For the proposed system, we crawled all comments and replies posted in online communities relevant to cryptocurrencies [19–21]. We then analyzed the data (comments and replies) and tagged the extent of positivity or negativity of each topic as well as that of each comment and reply. Following this, we tested the relation between the price and number of transactions of cryptocurrencies based on user comments and replies to select data (comments and replies) that showed significant relation. Finally, we created a prediction model via machine learning based on the selected data to predict fluctuations (Fig 1).

Crawling user comment data

We crawled data needed to create the prediction model. Once the environment for cryptocurrency trading among users is established, transactions between users lead to fluctuations in price [4]. We hypothesized that user comments in certain online cryptocurrency communities may affect fluctuations in their price and trading volume. Thus, we crawled the relevant data. Approximately 670 types of cryptocurrencies existed as of February 2016 [22]. Of the available ones, we crawled online communities for the top three in terms of market cap, i.e., Bitcoin, Ethereum, and Ripple. We did not include Litecoin in this study because its online communities seemed not to be sufficiently active to be considered in this experiment, despite its large market cap and broad user base.

Since Bitcoin was the first cryptocurrency, it has a large user community. In the Bitcoin community [19], data items were collected starting from December 2013, when the cryptocurrency became widely available. In the Ethereum community [20], data were collected from August 7, 2015, from which point the community had stabilized to the extent that at least one topic was posted every day and transaction data were available. From the Ripple community [21], all data since the creation of the community were gathered. In all communities of interest, we collected data in a legitimate manner, in compliance with their terms and conditions. Moreover, the collected data did not involve any personally identifiable information.

The cryptocurrencies of interest in this paper had online communities where users shared opinions on the relevant topics. The Bitcoin community [19] is divided into four sections: a “Bitcoin” section on Bitcoin-related topics, an “Economy” section on transactions, an “Alternate cryptocurrencies” section concerning other cryptocurrencies, and an “Other” section for other topics. Each section has three to five subsections; the “Bitcoin” section consists of “Bitcoin Discussion,” “Development & Technical Discussion,” “Mining,” “Technical Support,” and “Project Development,” and the “Alternate cryptocurrencies” section has a similar structure. For this paper, we crawled the discussion subsections for topics related to each of the cryptocurrencies.

Comments and relevant replies posted by users on bulletin boards in each community were crawled. Furthermore, the time when each comment and replies to it were posted, the number of replies to each comment, and the number of views were crawled as well. Replies quoting previous comments and replies were crawled excluding overlapping sentences. Each community’s HTML page was crawled using Python [23]. Using Python’s regex, we parsed the tags on HTML pages to extract the number of topics, the number of replies, the dates on which the topics and replies were posted, and the URL of each topic from the bulletin boards. Based on the URLs of extracted topics, their contents and replies to them were extracted. The extracted topics, the dates on which they were posted, topic contents, reply contents, and reply dates were saved in .json format, which was in turn converted into other formats (e.g., CSV) appropriate for different purposes. The .json files of the communities crawled can be viewed in the supporting information. One researcher executed the crawling on a single PC for 48–72 hours, where the time varied with the size of the community. The Bitcoin and Ethereum forums were crawled on February 1 and 8, 2016, respectively, whereas the Ripple forum was crawled on January 21, 2016. Table 1 outlines the arrangement of the opinion data that were gathered.
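As an illustration of the parsing step, the sketch below extracts topic URLs, titles, and dates from a forum index page with Python’s `re` module and dumps them as JSON. The HTML structure shown is hypothetical; the actual markup of each forum differs, and the production crawler also handled pagination, reply contents, and view counts.

```python
import json
import re

# Hypothetical forum markup: each topic row links to the topic page and
# carries its posting date. Real forum HTML differs per community.
SAMPLE_HTML = """
<tr><td><a href="/index.php?topic=101">Price discussion</a></td>
    <td class="date">2016-01-21</td></tr>
<tr><td><a href="/index.php?topic=102">Mining question</a></td>
    <td class="date">2016-01-22</td></tr>
"""

TOPIC_RE = re.compile(
    r'<a href="(?P<url>/index\.php\?topic=\d+)">(?P<title>[^<]+)</a>.*?'
    r'<td class="date">(?P<date>\d{4}-\d{2}-\d{2})</td>',
    re.S,
)

def parse_topics(html):
    """Return one dict per topic row, ready to be saved in .json format."""
    return [m.groupdict() for m in TOPIC_RE.finditer(html)]

topics = parse_topics(SAMPLE_HTML)
print(json.dumps(topics, indent=2))
```

Topic URLs recovered this way would then drive a second pass that fetches each topic page and extracts its replies.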

The crawled data included garbage, e.g., ads and meaninglessly repetitive postings or replies. Quite a few spam filtering techniques were investigated to remove such garbage data [15, 24–29]. Any posting of more than two sentences found more than five times a day was considered spam and treated as such.
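The de-duplication rule above can be sketched as follows. The thresholds (more than two sentences, more than five occurrences per day) come from the rule stated here; the naive sentence splitting and the `date`/`text` record layout are assumptions.

```python
import re
from collections import Counter

def sentence_count(text):
    # Naive split on terminal punctuation (an assumption; the original
    # preprocessing is not specified in more detail).
    return len([s for s in re.split(r"[.!?]+", text) if s.strip()])

def filter_spam(posts, max_repeats=5, min_sentences=3):
    """Drop any posting of more than two sentences that appears
    more than five times within the same day."""
    counts = Counter((p["date"], p["text"]) for p in posts)
    return [
        p for p in posts
        if not (sentence_count(p["text"]) >= min_sentences
                and counts[(p["date"], p["text"])] > max_repeats)
    ]
```

For example, a three-sentence advertisement repeated six times in one day would be removed, while a unique one-sentence comment would be kept.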

Tagging user comments data

In this step, the crawled user comments and replies were tagged as positive or negative. Many past studies have dealt with classifying user sentiment or comment data [15, 30–35]. In this vein, user reviews have been used to create classifiers based on machine learning [36–40], and user comments on the Web have been statistically analyzed for sentiment tagging [41–43].

Past research has mostly focused on classifying user comments in particular fields. Comments on online communities involve considerable use of neologisms, slang, and emoticons that transcend grammatical usage. C.J. Hutto and Eric Gilbert introduced an algorithm called VADER [44] to parse such expressions, and proposed a method to analyze social media texts by drawing on a rule-based model. Online communities of interest in this paper paralleled social media texts. Thus, user comment data were tagged based on this algorithm.

VADER normalizes positive and negative sentiments to a score between −1 and 1. Based on the normalized score x, the ranges −1 ≤ x < −0.6, −0.6 ≤ x < −0.2, 0.2 ≤ x < 0.6, and 0.6 ≤ x ≤ 1.0 were tagged as very negative, negative, positive, and very positive, respectively. In this paper, each of the comments and replies was tagged (see the opinion analysis example in Table 2).
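Mapping the normalized VADER score to the four tags used here can be sketched as below; scores in the open interval (−0.2, 0.2) fall outside the stated ranges, and treating them as neutral (i.e., untagged) is an assumption.

```python
def tag_sentiment(x):
    """Map a VADER-normalized score x in [-1, 1] to the tags used in this paper."""
    if -1.0 <= x < -0.6:
        return "very negative"
    if -0.6 <= x < -0.2:
        return "negative"
    if 0.2 <= x < 0.6:
        return "positive"
    if 0.6 <= x <= 1.0:
        return "very positive"
    return "neutral"  # assumption: scores in (-0.2, 0.2) carry no tag

print(tag_sentiment(0.65))   # very positive
print(tag_sentiment(-0.35))  # negative
```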

Prediction modeling

The tagged user comment data were then used to create a prediction model. Before modeling, data selection was performed: although all opinions, from very negative to very positive comments and replies, could have been used, we sought to improve the qualitative results and minimize operation cost. For data selection, we performed an association analysis between the results of the opinion analysis and fluctuations in cryptocurrency prices. We adopted the Granger causality test, which is widely used in research on the value of shares and currencies [45].

As shown in Eq 1, the results of the opinion analysis of topics and replies (VADER-based tagged values), the number of topics posted, the number of replies posted, and the total number of views of the topics posted on a given day were transformed into z-scores standardized against the previous 10 days. Likewise, the fluctuations in the price and number of transactions of cryptocurrencies were transformed into z-scores standardized against the previous 10 days. On a given date t, the z-score of an item i, denoted by zi,t, was defined as:

zi,t = (xi,t − μi,t) / σi,t (1)

where μi,t and σi,t respectively represent the mean and standard deviation of item i over the previous 10 days. Fig 2 shows an example of test results comparing the fluctuations in cryptocurrency prices and the z-scores of the opinion analysis results.
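A rolling z-score over the previous 10 days, as in Eq 1, can be sketched with the standard library; the sample series of daily view counts is hypothetical.

```python
import statistics

def rolling_zscore(series, window=10):
    """Standardize each value against the mean/stdev of the previous `window` values (Eq 1)."""
    scores = []
    for t in range(window, len(series)):
        prev = series[t - window:t]
        mu = statistics.mean(prev)
        sigma = statistics.pstdev(prev)
        scores.append((series[t] - mu) / sigma if sigma else 0.0)
    return scores

# Hypothetical daily view counts: a sudden spike on the last day.
daily_views = [120, 130, 110, 140, 125, 135, 150, 145, 160, 155, 400]
print(rolling_zscore(daily_views))  # the spike gets a large z-score
```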

thumbnail
Fig 2. Z-scores of fluctuations in cryptocurrency prices overlapping with results of opinion analysis.

Some opinions show a trend similar to that of fluctuations in cryptocurrency prices.

https://doi.org/10.1371/journal.pone.0161197.g002

The standardized z-scores underwent the Granger causality test to determine the significance of association. The Granger causality test relies on the assumption that if a variable X causes Y, then changes in X will systematically occur before changes in Y [46]. As demonstrated in previous studies, lagged values of X then exhibit a statistically significant correlation with Y [15, 46]. Correlation does not prove causation, however; we do not test actual causation, but only whether the community-opinion time series contains predictive information about fluctuations in cryptocurrency prices.

Our time series for the prices and number of transactions of cryptocurrencies, denoted by St, reflected their daily changes. To test whether the community-opinion time series can predict fluctuations in cryptocurrency prices, we compared the variance explained by two linear models, as shown in Eqs 2 and 3. The first model uses only n lagged values of St (i.e., St−1, ⋯, St−n) for prediction, whereas the second model uses the n lagged values of both St and the community-opinion time series, denoted by Xt−1, ⋯, Xt−n. We performed the Granger causality test according to the models in Eqs 2 and 3.

St = α + γ1St−1 + ⋯ + γnSt−n + εt (2)

St = α + γ1St−1 + ⋯ + γnSt−n + β1Xt−1 + ⋯ + βnXt−n + εt (3)

Based on the results of the Granger causality test, we can reject the null hypothesis that the community-opinion time series does not predict fluctuations in cryptocurrency prices—i.e., that β1 = β2 = ⋯ = βn = 0—with a high level of confidence. The community opinions with the strongest Granger-causal relation (p-value < 0.05) were extracted.
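Comparing the two models in Eqs 2 and 3 amounts to an F-test between the restricted regression (own lags only) and the unrestricted one (own lags plus opinion lags). A minimal sketch with NumPy least squares follows; the original analysis may well have used a dedicated statistics package, and the data below are synthetic, constructed so that s follows x with a one-day lag.

```python
import numpy as np

def granger_f(s, x, n):
    """F-statistic for H0: lagged x adds no predictive power for s (Eqs 2-3)."""
    T = len(s)
    y = s[n:]
    s_lags = np.column_stack([s[n - i:T - i] for i in range(1, n + 1)])
    x_lags = np.column_stack([x[n - i:T - i] for i in range(1, n + 1)])
    ones = np.ones((T - n, 1))

    def rss(design):
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        return float(resid @ resid)

    rss_r = rss(np.hstack([ones, s_lags]))           # Eq 2: own lags only
    rss_u = rss(np.hstack([ones, s_lags, x_lags]))   # Eq 3: own lags + opinion lags
    df = (T - n) - (2 * n + 1)                       # residual degrees of freedom
    return ((rss_r - rss_u) / n) / (rss_u / df)

# Synthetic example: s_t = x_{t-1} + small noise, so x should strongly
# "Granger-cause" s and the F-statistic should be very large.
rng = np.random.default_rng(0)
x = rng.standard_normal(80)
s = np.empty(80)
s[0] = 0.0
s[1:] = x[:-1] + 0.05 * rng.standard_normal(79)
print(granger_f(s, x, n=2))  # large F-statistic
```

The p-value then comes from the F distribution with (n, df) degrees of freedom; elements with p < 0.05, as stated above, were kept.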

The Granger causality test was performed on each currency for time lags of 1 to 13 days; experimentally, time lags of 14 days and longer proved insignificant. For each time lag, the elements showing significant associations were identified. For the prediction, the fluctuations in cryptocurrency prices were treated as binary (rise or fall). We generated and validated the prediction model based on averaged one-dependence estimators (AODE) [47]. Based on AODE, we estimated the probability of a binary class y given an item-related set of features x1, ⋯, xn, denoted P(y|x1, ⋯, xn), as:

P̂(y|x1, ⋯, xn) ∝ Σi: F(xi) ≥ m P̂(y, xi) Πj=1..n P̂(xj|y, xi) (4)

where P̂(⋅) denotes an estimate of P(⋅), F(⋅) is the frequency, and m is the frequency limit, set at 1 in this paper. In the next section, we discuss the results of the applied system.
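A compact sketch of the AODE classifier of Eq 4 over discrete features is given below. The Laplace smoothing scheme is an assumption (the paper does not specify one), and the toy feature values are hypothetical.

```python
from collections import defaultdict

class AODE:
    """Averaged one-dependence estimators over discrete features (Eq 4)."""

    def __init__(self, m=1):
        self.m = m                      # frequency limit for parent features
        self.n_samples = 0
        self.classes = set()
        self.freq = defaultdict(int)    # F(x_i): count of value x_i at position i
        self.pair = defaultdict(int)    # count of (y, x_i)
        self.joint = defaultdict(int)   # count of (y, x_i, x_j)
        self.values = defaultdict(set)  # distinct values per feature position

    def fit(self, X, Y):
        for x, y in zip(X, Y):
            self.n_samples += 1
            self.classes.add(y)
            for i, xi in enumerate(x):
                self.freq[(i, xi)] += 1
                self.values[i].add(xi)
                self.pair[(y, i, xi)] += 1
                for j, xj in enumerate(x):
                    self.joint[(y, i, xi, j, xj)] += 1

    def predict(self, x):
        scores = {}
        for y in self.classes:
            total = 0.0
            for i, xi in enumerate(x):
                if self.freq[(i, xi)] < self.m:
                    continue  # skip infrequent parent attributes, per Eq 4
                # Laplace-smoothed P(y, x_i), assuming a binary class
                p = (self.pair[(y, i, xi)] + 1.0) / (
                    self.n_samples + 2 * len(self.values[i]))
                # Laplace-smoothed P(x_j | y, x_i)
                for j, xj in enumerate(x):
                    p *= (self.joint[(y, i, xi, j, xj)] + 1.0) / (
                        self.pair[(y, i, xi)] + len(self.values[j]))
                total += p
            scores[y] = total
        return max(scores, key=scores.get)

# Toy example with two discretized opinion features.
clf = AODE(m=1)
clf.fit([("pos", "pos"), ("pos", "neg"), ("neg", "neg"),
         ("neg", "neg"), ("pos", "pos")],
        ["rise", "rise", "fall", "fall", "rise"])
print(clf.predict(("neg", "neg")))  # fall
```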

Experimental Results

Using our model, we made predictions regarding three cryptocurrencies (Bitcoin, Ethereum, and Ripple). In consonance with the days for which data were collected from these communities, each cryptocurrency’s daily price and number of transactions were crawled. Information concerning the price and number of transactions of Bitcoin was crawled via Coindesk [19], whereas price information for Ethereum was crawled via CoinMarketCap [22] and its transaction information was crawled via Etherscan [48]. Information regarding price for Ripple was crawled via rippleCharts [49], whereas its transaction information was not crawled. All data collected were in the public domain and excluded personal information. Table 3 outlines the arrangement of the market data that were gathered.

The elements that exhibited significant associations in the prediction modeling were used for learning (Tables 4–8). P-values in the tables are shown only for elements with values of 0.05 or less.

thumbnail
Table 4. Statistical significance (p-values) of bivariate Granger causality correlation for Bitcoin price and community opinion.

https://doi.org/10.1371/journal.pone.0161197.t004

thumbnail
Table 5. Statistical significance (p-values) of bivariate Granger causality correlation for the number of transactions and community opinion for Bitcoin.

https://doi.org/10.1371/journal.pone.0161197.t005

thumbnail
Table 6. Statistical significance (p-values) of bivariate Granger causality correlation for Ethereum’s price and community opinion.

https://doi.org/10.1371/journal.pone.0161197.t006

thumbnail
Table 7. Statistical significance (p-values) of bivariate Granger causality correlation for the number of transactions and community opinion for Ethereum.

https://doi.org/10.1371/journal.pone.0161197.t007

thumbnail
Table 8. Statistical significance (p-values) of bivariate Granger causality correlation for Ripple’s price and community opinion.

https://doi.org/10.1371/journal.pone.0161197.t008

An example of applicable input data is shown in Table 9. The results of the predicted fluctuations in the price and number of transactions of each cryptocurrency are discussed below.

thumbnail
Table 9. Example of a machine learning dataset.

The z-scores of the data for the previous 10 days were used as the values A–J, which indicate the sum of the opinions of each community on the given date. Here, X–Z indicate the topic data values (number of topics, sum of replies, sum of views) on the given date.

https://doi.org/10.1371/journal.pone.0161197.t009

The accuracy rate, the F-measure, and the Matthews correlation coefficient (MCC) were used to evaluate the performance of the proposed models. Computing these measures requires estimating precision and recall, which are evaluated from the counts of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). Precision and recall for the rise and fall classes are defined in Eqs 5, 6, 7, and 8:

Precisionrise = TP / (TP + FP) (5)

Recallrise = TP / (TP + FN) (6)

Precisionfall = TN / (TN + FN) (7)

Recallfall = TN / (TN + FP) (8)

The accuracy rate, the weighted averages of precision, recall, and F-measure (FMeasurew), and the MCC are defined in Eqs 9, 10, 11, 12, and 13:

Accuracy = (TP + TN) / (TP + FP + TN + FN) (9)

Precisionw = wrise × Precisionrise + wfall × Precisionfall (10)

Recallw = wrise × Recallrise + wfall × Recallfall (11)

FMeasurew = wrise × FMeasurerise + wfall × FMeasurefall, where FMeasure = 2 × Precision × Recall / (Precision + Recall) (12)

MCC = (TP × TN − FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN)) (13)

where wrise and wfall denote the proportions of rise and fall instances, respectively.
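These measures can be computed directly from the four confusion-matrix counts; a sketch follows (the function and variable names are ours, and the example counts are hypothetical).

```python
import math

def evaluate(tp, fp, tn, fn):
    """Accuracy, support-weighted F-measure, and MCC from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total

    def f_measure(p, r):
        return 2 * p * r / (p + r) if p + r else 0.0

    # Per-class F-measure: "rise" is the positive class, "fall" the negative.
    f_rise = f_measure(tp / (tp + fp), tp / (tp + fn))
    f_fall = f_measure(tn / (tn + fn), tn / (tn + fp))
    # Weight each class's F-measure by its share of instances.
    f_weighted = ((tp + fn) * f_rise + (tn + fp) * f_fall) / total

    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return accuracy, f_weighted, mcc

print(evaluate(tp=50, fp=10, tn=30, fn=10))  # ≈ (0.8, 0.8, 0.583)
```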

Of the Bitcoin-related data for 793 days, the first 88% (697 days) were used for learning and the remaining 12% (94 days) for verification. Fluctuations in the price of Bitcoin proved to be significantly associated with the number of topics, positive/very positive comments, and positive replies. The price prediction was best at a time lag of six days, with an accuracy of 79.57% (Table 10). Moreover, fluctuations in the number of transactions proved to be significantly associated with the number of daily topics, very positive comments, and very positive replies. The prediction of fluctuations in the number of transactions was best at a time lag of three days, with an accuracy of 77.895% (Table 10).

thumbnail
Table 10. Experimental result of predicted Bitcoin fluctuation.

https://doi.org/10.1371/journal.pone.0161197.t010

A 10-fold cross-validation was performed on the entire Ethereum dataset (187 days). Unlike Bitcoin, Ethereum showed a significant association in the Granger causality test with the number of negative/very negative comments; a significant association with the number of positive user replies was also found. The price prediction was best at a time lag of six days, with an accuracy of 71.823% (Table 11). The fluctuation in the number of transactions showed insignificant associations with most elements but was significantly associated with very negative replies at time lags of 11–13 days. The prediction of fluctuations in the number of transactions at a time lag of one day yielded an accuracy of 66.129% (Table 11).

thumbnail
Table 11. Experimental result of predicted Ethereum fluctuation.

https://doi.org/10.1371/journal.pone.0161197.t011

Finally, a 10-fold cross-validation was performed on the entire Ripple dataset (137 days). The prediction of fluctuations in the price of Ripple was best at a time lag of seven days, with an accuracy of 71.756% (Table 12).

thumbnail
Table 12. Experimental result of predicted Ripple price fluctuation.

https://doi.org/10.1371/journal.pone.0161197.t012

Like Ethereum, Ripple proved to be significantly associated with very negative comments, and with negative replies at time lags of seven days and longer. Fluctuations in the number of transactions of Ripple could not be predicted owing to difficulties in acquiring the relevant data.

To determine the effectiveness of the proposed prediction model, we performed a simulated investment in Bitcoin, using the simulated investment technique generally used in past studies on stock price prediction [50]. We invested in Bitcoin when the model predicted that the price would rise the following day, and did not invest when the model predicted that it would drop. The simulated investment followed the rule whereby the invested amount m gains or loses by the rate r of the increment or decrement in the Bitcoin price (m = m + m × r or m = m − m × r, respectively). The six-day time lag, which yielded the best result in this study, was used in the prediction model. The prediction model was created from data for the period from December 1, 2013 to November 10, 2015, and the 84-day (12-week) data for the period from November 11, 2015 to February 2, 2016 were used in the experiment.
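The trading rule can be sketched as follows; `predictions` and `daily_returns` are hypothetical inputs standing in for the model output and the observed daily price changes.

```python
def simulate_investment(m, predictions, daily_returns):
    """Compound m by each day's signed return r whenever the model predicted a rise."""
    for predicted_rise, r in zip(predictions, daily_returns):
        if predicted_rise:
            m = m + m * r  # gains if r > 0, loses if r < 0; otherwise sit out
    return m

# Hypothetical example: invest on days 1 and 3, sit out day 2.
final = simulate_investment(100.0, [True, False, True], [0.10, -0.05, 0.20])
print(round(final, 2))  # 132.0
```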

Fig 3 shows the results of the simulated investment program based on the above conditions. The random investment average refers to the mean of 10 simulated investments based on the random Bitcoin price prediction. Over 12 weeks, the Bitcoin price increased by 19.29% while the amount of investment grew by 35.09%. In random investment, the amount of investment increased by approximately 10.72%, which was lower than the increment in Bitcoin price.

thumbnail
Fig 3. Increment/decrement in the amount of simulated investment in Bitcoin.

https://doi.org/10.1371/journal.pone.0161197.g003

Discussion and Conclusion

This paper analyzed user comments in online communities to predict the price and the number of transactions of cryptocurrencies. The proposed method predicted fluctuations in the price of cryptocurrencies at low cost. In terms of the prediction rates for Bitcoin and other cryptocurrencies based on the limited resources in online communities, the proposed method paralleled previous studies designed for similar purposes [15, 51]. Moreover, user comments and replies in online communities proved to affect the number of transactions among users. The proposed method proved applicable to buying and selling cryptocurrencies, and shed light on aspects influencing user opinions. Furthermore, the simulated investment demonstrated that the proposed method is applicable to cryptocurrency trading.

Based on the learning data at the time of the highest prediction rates, we identified the types of comments that most significantly influenced fluctuations in the price and number of transactions of each cryptocurrency. Opinions affecting price fluctuations varied across cryptocurrencies: positive user comments significantly affected the price fluctuations of Bitcoin, whereas the other two currencies were significantly influenced by negative user comments and replies. Moreover, the association with the number of topics posted daily indicated that variation in community activity could influence fluctuations in price. Further, unlike the price of cryptocurrencies, the number of transactions proved to be significantly associated with user replies rather than with posted comments. Based on the prediction results, user opinions proved useful for predicting fluctuations 6–7 days ahead (Table 10).

The predicted fluctuations in the price of each cryptocurrency showed accuracy gaps of approximately 8%. The prediction was most precise for Bitcoin, which seems attributable to the amount of accumulated data and its active community (16.91 comments, 473.81 user replies, and 27,443.18 views per day on average), which exerted a direct effect on fluctuations in the price of the cryptocurrency. The prediction was least precise for Ripple, which had the smallest community despite its market size (3.41 comments, 29.14 user replies, and 1,661.99 views per day on average). Ripple’s online community started in September 2015, with little accumulated data and little user activity. These findings suggest that differences in community size may have direct effects on how well fluctuations in the price of cryptocurrencies can be predicted.

Improving the precision of the predictions requires a few refinements. Despite the association analysis used to filter user comments and replies, more qualitative selection criteria are needed to build a prediction model. This paper focused on online communities to determine associations and predict fluctuations. Yet, as in past studies, using data on the Web [52, 53], analyzing social network data [46], and referring to search volumes on Google [10, 12] would be conducive to more precise results. Moreover, partly adopting the stock market prediction techniques used in previous studies [54] might help increase precision.

In this paper, we acquired information from users in online communities as a viable source for research on cryptocurrencies. In the same vein, the sentiments expressed by user comments and replies in online communities seem applicable to further analysis and understanding of cryptocurrencies. Moreover, the propensities of online community users may help understand the attributes of the relevant cryptocurrency. In addition, the rich information in online communities can contribute to understanding cryptocurrencies from different perspectives.

Cryptocurrencies are increasingly being used, and their usability has drawn attention from different perspectives [25]. Research on cryptocurrencies is insufficient, in that hardly any currency other than Bitcoin has been investigated. The proposed method of predicting fluctuations in the price and trading volume of cryptocurrencies based on user comments and replies in online communities is likely to increase the understanding and availability of cryptocurrencies if a range of improvements and applications are implemented. Furthermore, different approaches to user comments and replies in online communities are expected to bring more significant results in diverse fields.

Supporting Information

S1 File. Results of crawling Bitcoin forum, Ethereum forum, and Ripple forum (in .json format).

https://doi.org/10.1371/journal.pone.0161197.s001

(ZIP)

S2 File. Python-based crawler source code for community data collection.

https://doi.org/10.1371/journal.pone.0161197.s002

(ZIP)

S1 Table. The result of implementing opinion analysis from user opinion data (topic) on the Bitcoin forum (https://bitcointalk.org).

https://doi.org/10.1371/journal.pone.0161197.s003

(CSV)

S2 Table. The result of implementing opinion analysis from user opinion data (topic) on the Ethereum forum (https://forum.ethereum.org/).

https://doi.org/10.1371/journal.pone.0161197.s004

(CSV)

S3 Table. The result of implementing opinion analysis from user opinion data (topic) on the Ripple forum (http://www.xrpchat.com/).

https://doi.org/10.1371/journal.pone.0161197.s005

(CSV)

S4 Table. The result of implementing opinion analysis from user opinion data (reply) on the Bitcoin forum (https://bitcointalk.org).

https://doi.org/10.1371/journal.pone.0161197.s006

(ZIP)

S5 Table. The result of implementing opinion analysis from user opinion data (reply) on the Ethereum forum (https://forum.ethereum.org/).

https://doi.org/10.1371/journal.pone.0161197.s007

(CSV)

S6 Table. The result of implementing opinion analysis from user opinion data (reply) on the Ripple forum (http://www.xrpchat.com/).

https://doi.org/10.1371/journal.pone.0161197.s008

(CSV)

Author Contributions

  1. Conceived and designed the experiments: YBK SJK CHK JHI THK.
  2. Performed the experiments: YBK JGK.
  3. Analyzed the data: YBK.
  4. Contributed reagents/materials/analysis tools: JGK.
  5. Wrote the paper: YBK WK CHK.

References

  1. 1. Nakamoto S. Bitcoin: A peer-to-peer electronic cash system. 2008.
  2. 2. Reid F, Harrigan M. An analysis of anonymity in the bitcoin system: Springer; 2013.
  3. 3. Böhme R, Christin N, Edelman B, Moore T. Bitcoin: Economics, technology, and governance. The Journal of Economic Perspectives. 2015;29(2):213–38.
  4. 4. Grinberg R. Bitcoin: an innovative alternative digital currency. Hastings Sci & Tech LJ. 2012;4:159.
  5. 5. Ahamad S, Nair M, Varghese B, editors. A survey on crypto currencies. 4th International Conference on Advances in Computer Science, AETACS; 2013: Citeseer.
  6. 6. Kondor D, Pósfai M, Csabai I, Vattay G. Do the rich get richer? An empirical analysis of the Bitcoin transaction network. PloS one. 2014;9(2):e86197. pmid:24505257
  7. 7. Ron D, Shamir A. Quantitative analysis of the full bitcoin transaction graph. Financial Cryptography and Data Security: Springer; 2013. p. 6–24.
  8. 8. Garcia D, Tessone CJ, Mavrodiev P, Perony N. The digital traces of bubbles: feedback cycles between socio-economic signals in the Bitcoin economy. Journal of the Royal Society Interface. 2014;11(99):20140623.
  9. 9. Kondor D, Csabai I, Szüle J, Pósfai M, Vattay G. Inferring the interplay between network structure and market effects in Bitcoin. New Journal of Physics. 2014;16(12):125003.
  10. 10. Kristoufek L. BitCoin meets Google Trends and Wikipedia: Quantifying the relationship between phenomena of the Internet era. Scientific reports. 2013;3.
  11. 11. Kristoufek L. What are the main drivers of the Bitcoin price? Evidence from wavelet coherence analysis. PloS one. 2015;10(4):e0123923. pmid:25874694
  12. 12. Yelowitz A, Wilson M. Characteristics of Bitcoin users: an analysis of Google search data. Applied Economics Letters. 2015;22(13):1030–6.
  13. 13. Bernstein MS, Monroy-Hernández A, Harry D, André P, Panovich K, Vargas GG, editors. 4chan and/b: An Analysis of Anonymity and Ephemerality in a Large Online Community. ICWSM; 2011.
  14. 14. Hau YS, Kim Y-G. Why would online gamers share their innovation-conducive knowledge in the online game user community? Integrating individual motivations and social capital perspectives. Computers in Human Behavior. 2011;27(2):956–70.
  15. 15. Kim YB, Lee SH, Kang SJ, Choi MJ, Lee J, Kim CH. Virtual world currency value fluctuation prediction system based on user sentiment analysis. PloS one. 2015;10(8):e0132944. pmid:26241496
  16. 16. Panzarasa P, Opsahl T, Carley KM. Patterns and dynamics of users' behavior and interaction: Network analysis of an online community. Journal of the American Society for Information Science and Technology. 2009;60(5):911–32.
  17. 17. Sing CC, Khine MS. An analysis of interaction and participation patterns in online community. JOURNAL OF EDUCATIONAL TECHNOLOGYAND SOCIETY. 2006;9(1):250.
  18. 18. Maurer B, Nelms TC, Swartz L. “When perhaps the real problem is money itself!”: the practical materiality of Bitcoin. Social Semiotics. 2013;23(2):261–77.
  19. 19. Bitcoin Forum[Internet]: Simple Machines; [updated 2016 Mar 30; cited 2016 Mar 30]. Available: https://bitcointalk.org/.
  20. 20. Ethereum[Internet]: forum.ethereum.org; [updated 2016 Mar 30; cited 2016 Mar 30]. Available: https://forum.ethereum.org.
  21. 21. XRP CHAT[Internet]: xrpchat.com; [updated 2016 Mar 30; cited 2016 Mar 30]. Available: http://www.xrpchat.com/.
  22. 22. Crypto-Currency Market Capitalizations[Internet]: CoinMarketCap; [updated 2016 Mar 30; cited 2016 Mar 30]. Available: http://coinmarketcap.com/.
  23. Van Rossum G, Drake FL. The Python Language Reference. Amsterdam, Netherlands: Python Software Foundation; 2010.
  24. McCord M, Chuah M. Spam detection on Twitter using traditional classifiers. Autonomic and Trusted Computing: Springer; 2011. p. 175–86.
  25. Prakash VV. Method and apparatus to block spam based on spam reports from a community of users. Google Patents; 2008.
  26. Song J, Lee S, Kim J, editors. Spam filtering in Twitter using sender-receiver relationship. Recent Advances in Intrusion Detection; 2011: Springer.
  27. Thomas K, Grier C, Song D, Paxson V, editors. Suspended accounts in retrospect: an analysis of Twitter spam. Proceedings of the 2011 ACM SIGCOMM Conference on Internet Measurement; 2011: ACM.
  28. Wang AH, editor. Don't follow me: spam detection in Twitter. Security and Cryptography (SECRYPT), Proceedings of the 2010 International Conference on; 2010: IEEE.
  29. Yardi S, Romero D, Schoenebeck G. Detecting spam in a Twitter network. First Monday. 2009;15(1).
  30. Agarwal A, Xie B, Vovsha I, Rambow O, Passonneau R, editors. Sentiment analysis of Twitter data. Proceedings of the Workshop on Languages in Social Media; 2011: Association for Computational Linguistics.
  31. Baccianella S, Esuli A, Sebastiani F, editors. SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining. LREC; 2010.
  32. Bifet A, Frank E, editors. Sentiment knowledge discovery in Twitter streaming data. Discovery Science; 2010: Springer.
  33. Kouloumpis E, Wilson T, Moore JD. Twitter sentiment analysis: the good the bad and the OMG! ICWSM. 2011;11:538–41.
  34. Thelwall M, Buckley K, Paltoglou G. Sentiment strength detection for the social web. Journal of the American Society for Information Science and Technology. 2012;63(1):163–73.
  35. Pak A, Paroubek P, editors. Twitter as a Corpus for Sentiment Analysis and Opinion Mining. LREC; 2010.
  36. Glorot X, Bordes A, Bengio Y, editors. Domain adaptation for large-scale sentiment classification: a deep learning approach. Proceedings of the 28th International Conference on Machine Learning (ICML-11); 2011.
  37. Manning CD, Surdeanu M, Bauer J, Finkel JR, Bethard S, McClosky D, editors. The Stanford CoreNLP Natural Language Processing Toolkit. ACL (System Demonstrations); 2014.
  38. Pang B, Lee L, Vaithyanathan S, editors. Thumbs up?: sentiment classification using machine learning techniques. Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, Volume 10; 2002: Association for Computational Linguistics.
  39. Read J, editor. Using emoticons to reduce dependency in machine learning techniques for sentiment classification. Proceedings of the ACL Student Research Workshop; 2005: Association for Computational Linguistics.
  40. Ye Q, Zhang Z, Law R. Sentiment classification of online reviews to travel destinations by supervised machine learning approaches. Expert Systems with Applications. 2009;36(3):6527–35.
  41. Abbasi A, Chen H, Salem A. Sentiment analysis in multiple languages: feature selection for opinion classification in Web forums. ACM Transactions on Information Systems (TOIS). 2008;26(3):12.
  42. Pang B, Lee L. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval. 2008;2(1–2):1–135.
  43. Tang H, Tan S, Cheng X. A survey on sentiment detection of reviews. Expert Systems with Applications. 2009;36(7):10760–73.
  44. Hutto CJ, Gilbert E, editors. VADER: a parsimonious rule-based model for sentiment analysis of social media text. Eighth International AAAI Conference on Weblogs and Social Media; 2014.
  45. Granger CW, Huang B-N, Yang C-W. A bivariate causality between stock prices and exchange rates: evidence from recent Asian flu. The Quarterly Review of Economics and Finance. 2000;40(3):337–54.
  46. Bollen J, Mao H, Zeng X. Twitter mood predicts the stock market. Journal of Computational Science. 2011;2(1):1–8.
  47. Webb GI, Boughton JR, Wang Z. Not so naive Bayes: aggregating one-dependence estimators. Machine Learning. 2005;58(1):5–24.
  48. Etherscan [Internet]: Etherscan.io; [updated 2016 Mar 30; cited 2016 Mar 30]. Available: https://etherscan.io/.
  49. RippleCharts [Internet]: Ripple Network; [updated 2016 Mar 30; cited 2016 Mar 30]. Available: https://www.ripplecharts.com/.
  50. Kim S-H, Kim D-H, Han C-H, Kim W-I. Stock forecasting using stock index relation and genetic algorithm. Journal of Korean Institute of Intelligent Systems. 2008;18(6):781–6.
  51. Shah D, Zhang K, editors. Bayesian regression and Bitcoin. Communication, Control, and Computing (Allerton), 2014 52nd Annual Allerton Conference on; 2014: IEEE.
  52. Cohen-Charash Y, Scherbaum CA, Kammeyer-Mueller JD, Staw BM. Mood and the market: can press reports of investors' mood predict stock prices? PloS one. 2013;8(8):e72031. pmid:24015202
  53. Matta M, Lunesu I, Marchesi M. Bitcoin spread prediction using social and web search media. Proceedings of DeCAT. 2015.
  54. Lahmiri S. A comparison of PNN and SVM for stock market trend prediction using economic and technical information. International Journal of Computer Applications. 2011;29(3):24–30.