ABSTRACT
Mining user access patterns from a continuous stream of Web clicks presents new challenges compared with traditional Web usage mining over a large static Web-click database. Modeling user access patterns as maximal forward references, we present StreamPath, a single-pass algorithm for the online discovery of frequent path traversal patterns. StreamPath works from an extended prefix-tree-based data structure that stores compressed, essential information about users' movement histories in the stream. Theoretical analysis and performance evaluation show that the space requirement of StreamPath has a logarithmic bound, and that it executes faster than previous multiple-pass algorithms [2].
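To make the notion of a maximal forward reference concrete, the following is a minimal sketch, assuming the usual definition from the path traversal literature [2]: a forward path is cut and reported whenever the user moves back to a page already on the current path. It is not the StreamPath algorithm itself, and the function name `maximal_forward_references` and the sample click sequence are illustrative assumptions.

```python
from typing import Hashable, Iterable, List, Tuple


def maximal_forward_references(clicks: Iterable[Hashable]) -> List[Tuple[Hashable, ...]]:
    """Split one user's click sequence into maximal forward references.

    Illustrative sketch only (hypothetical helper, not taken from the paper):
    a revisit of a page already on the current path counts as a backward move.
    """
    path: List[Hashable] = []      # current forward path from the start page
    moving_forward = False         # True if the last click extended the path
    results: List[Tuple[Hashable, ...]] = []

    for page in clicks:
        if page in path:
            # Backward move: report the path once, at the first step back.
            if moving_forward:
                results.append(tuple(path))
            # Truncate the path so that it ends at the revisited page.
            del path[path.index(page) + 1:]
            moving_forward = False
        else:
            # Forward move: extend the current path.
            path.append(page)
            moving_forward = True

    # The path still open at the end of the sequence is also maximal.
    if moving_forward and path:
        results.append(tuple(path))
    return results


if __name__ == "__main__":
    session = "ABCDCBEGHGWAOUOV"   # one page per character, made-up data
    for ref in maximal_forward_references(session):
        print("".join(ref))
```

Each reported reference could then be inserted into a prefix tree with per-node counts, so that references sharing a common prefix share storage; how StreamPath extends and prunes such a tree is not specified in this abstract.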
- Babcock, B., Babu, S., Datar, M., Motwani, R., and Widom, J. Models and Issues in Data Stream Systems. In Proc. of the 2002 ACM Symposium on Principles of Database Systems, 2002.
- Chen, M.-S., Park, J.-S., and Yu, P. S. Efficient Data Mining for Path Traversal Patterns. IEEE Transactions on Knowledge and Data Engineering (TKDE), 10(2):209--221, 1998.
- Han, J., Pei, J., Yin, Y., and Mao, R. Mining Frequent Patterns without Candidate Generation: A Frequent-pattern Tree Approach. Data Mining and Knowledge Discovery: An International Journal, 8(1):53--87, 2004.
- Karp, R., Shenker, S., and Papadimitriou, C. A Simple Algorithm for Finding Frequent Elements in Streams and Bags. ACM Transactions on Database Systems (TODS), 28(1):51--55, 2003.
Index Terms
- On mining webclick streams for path traversal patterns