Assessing reliability of social media data: lessons from mining TripAdvisor hotel reviews

  • Original Research
Information Technology & Tourism

Abstract

As an emerging research paradigm, big data analytics has been gaining currency in various fields. However, the existing hospitality and tourism literature rarely discusses data quality, which may affect the validity and generalizability of research findings. This study examines the reliability of online hotel reviews on TripAdvisor by developing a text classifier that predicts travel purpose (i.e., business vs. leisure) from the textual content of reviews. The classifier is tested across a range of cities and data sizes to examine its sensitivity to the data sample. The findings show that, while the classifier’s performance is consistent across cities, it varies with data size and sampling method. More importantly, a considerable amount of noise is found in the data, which leads to misclassification. A novel approach is therefore developed to address the misclassification that results from data noise. This study reveals important data quality issues and contributes to the theoretical development of social media analytics in hospitality and tourism.
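
As a rough illustration of the kind of classifier the abstract describes, the following Python sketch trains a multinomial naive Bayes model with add-one smoothing on a few labeled reviews and predicts business vs. leisure for new text. The abstract does not name the algorithm used in the study, so naive Bayes is an assumption made here for illustration only; the toy reviews and the function names `tokenize`, `train`, and `predict` are likewise hypothetical and not taken from the study.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data; real TripAdvisor reviews carry a self-reported trip type.
TOY_REVIEWS = [
    ("great conference rooms and fast wifi for meetings", "business"),
    ("close to the office and quick checkout for work", "business"),
    ("kids loved the pool and the beach was beautiful", "leisure"),
    ("relaxing spa and lovely family vacation by the beach", "leisure"),
]


def tokenize(text):
    """Lowercase, split on whitespace, keep purely alphabetic tokens."""
    return [w for w in text.lower().split() if w.isalpha()]


def train(labeled_reviews):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_reviews:
        class_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log posterior under naive Bayes."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothing avoids zero probabilities.
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TOY_REVIEWS)
print(predict(model, "fast wifi and meetings near the office"))
print(predict(model, "family vacation at the beach"))
```

With only four training reviews the predictions mean little; the sketch is meant to show why noisy labels matter, since every mislabeled review shifts the per-class word counts the model relies on.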

Acknowledgements

This study was sponsored by the National Natural Science Foundation of China (71373023) and Beijing Municipal Commission of Education (SM201611417001).

Author information

Corresponding author

Correspondence to Zheng Xiang.

About this article

Cite this article

Xiang, Z., Du, Q., Ma, Y. et al. Assessing reliability of social media data: lessons from mining TripAdvisor hotel reviews. Inf Technol Tourism 18, 43–59 (2018). https://doi.org/10.1007/s40558-017-0098-z

  • DOI: https://doi.org/10.1007/s40558-017-0098-z
