Article

The predictive power of online chatter

Published: 21 August 2005

ABSTRACT

An increasing fraction of the global discourse is migrating online in the form of blogs, bulletin boards, web pages, wikis, editorials, and a dizzying array of new collaborative technologies. The migration has now proceeded to the point that topics reflecting certain individual products are sufficiently popular to allow targeted online tracking of the ebb and flow of chatter around these topics. Based on an analysis of around half a million sales rank values for 2,340 books over a period of four months, and correlating postings in blogs, media, and web pages, we are able to draw several interesting conclusions. First, carefully hand-crafted queries produce matching postings whose volume predicts sales ranks. Second, these queries can be automatically generated in many cases. And third, even though sales rank motion might be difficult to predict in general, algorithmic predictors can use online postings to successfully predict spikes in sales rank.
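As a rough illustration of what "volume predicts sales ranks" could mean operationally, the sketch below correlates a daily mention-count series with a sales series at several time offsets and reports the offset with the strongest correlation. It is not the method of the paper, which the abstract does not specify. The function names, the toy data, and the `sales_signal` series (assumed to be oriented so that higher means better, for example a negated sales rank) are all hypothetical.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def lagged_correlation(mentions, sales_signal, max_lag):
    """Correlate mentions[t] with sales_signal[t + lag] for each lag in -max_lag..max_lag.

    A positive lag means that mentions lead the sales signal by that many days.
    """
    n = len(mentions)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = mentions[:n - lag], sales_signal[lag:]
        else:
            xs, ys = mentions[-lag:], sales_signal[:n + lag]
        result[lag] = pearson(xs, ys)
    return result


def best_lag(mentions, sales_signal, max_lag):
    """Return the lag with the highest correlation."""
    corr = lagged_correlation(mentions, sales_signal, max_lag)
    return max(corr, key=corr.get)


# Toy data (hypothetical): the sales signal repeats the mention pattern two days later.
mentions = [0, 0, 5, 0, 0, 0, 3, 0, 0, 0]
sales_signal = [0, 0, 0, 0, 5, 0, 0, 0, 3, 0]
lead = best_lag(mentions, sales_signal, max_lag=3)
```

On this toy data the strongest correlation falls at a positive lag, which is the pattern one would look for if chatter leads sales. On real data the series are short and noisy, so a single peak in the lag profile should be read with caution.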


    • Published in

      KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
      August 2005
      844 pages
      ISBN:159593135X
      DOI:10.1145/1081870

      Copyright © 2005 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Qualifiers

      • Article

      Acceptance Rates

      Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
