ABSTRACT
The aggregation and comparison of behavioral patterns on the WWW represent a tremendous opportunity for understanding past behaviors and predicting future ones. In this paper, we take a first step toward this goal. We present a large-scale study correlating the behaviors of Internet users on multiple systems, with datasets ranging in size from 27 million queries to 14 million blog posts to 20,000 news articles. We formalize a model for events in these time-varying datasets and study their correlation. We have created an interface for analyzing the datasets, which includes a novel visual artifact, the DTWRadar, for summarizing differences between time series. Using our tool, we identify a number of behavioral properties that allow us to understand the predictive power of patterns of use.
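The name DTWRadar suggests dynamic time warping (DTW), a standard technique for aligning two time series that have a similar shape but are shifted or stretched in time. The abstract does not describe the authors' implementation, so the following is only a minimal sketch of textbook DTW under assumed choices: absolute difference as the local cost, no warping-window constraint, and illustrative names (`dtw`, `mean_offset`) that do not come from the paper.

```python
from typing import List, Tuple


def dtw(a: List[float], b: List[float]) -> Tuple[float, List[Tuple[int, int]]]:
    """Return the DTW distance and one optimal warping path between a and b.

    Assumption: local cost is |a[i] - b[j]|; steps are match, insert, delete.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])

    # Backtrack from the end to recover one optimal alignment.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


def mean_offset(path: List[Tuple[int, int]]) -> float:
    """Average of (i - j) along the path: a crude summary of which series lags."""
    return sum(i - j for i, j in path) / len(path)


if __name__ == "__main__":
    # Hypothetical daily counts: the second series peaks one step earlier.
    queries = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
    blog_posts = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0]
    distance, path = dtw(queries, blog_posts)
    print(distance, mean_offset(path))
```

A summary such as `mean_offset` reduces the whole warping path to a single number, which is one simple way to express how far one series leads or lags another; it is offered here only as an illustration and not as the DTWRadar itself.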
Index Terms
- Why we search: visualizing and predicting user behavior