Sequential pattern mining algorithms using a vertical representation are the most efficient for mining sequential patterns in dense or long sequences, and have excellent overall performance. The vertical representation allows generating patterns and calculating their supports without performing costly database scans. However, a crucial performance bottleneck of vertical algorithms is that they use a generate-candidate-and-test approach that can generate a large number of infrequent candidates. To address this issue, we propose pruning candidates based on the study of item co-occurrences. We present a new structure named CMAP (Co-occurrence MAP) for storing co-occurrence information. We explain how CMAP can be used to prune candidates in three state-of-the-art vertical algorithms, namely SPADE, SPAM and ClaSP. An extensive experimental study with six real-life datasets shows that (1) co-occurrence-based pruning is effective, (2) CMAP is very compact, and (3) the resulting algorithms outperform state-of-the-art algorithms for mining sequential patterns (GSP, PrefixSpan, SPADE and SPAM) and closed sequential patterns (ClaSP and CloSpan).
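
To illustrate the idea of co-occurrence-based pruning, the following is a minimal sketch and not the implementation evaluated in the paper. It assumes a simplified setting in which every sequence is a plain list of single items, so only "item b occurs somewhere after item a" is tracked. The function names `build_cmap` and `can_prune`, and the toy database, are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def build_cmap(sequences, minsup):
    """Map each item a to the set of items b such that b occurs after a
    in at least `minsup` sequences (simplified co-occurrence map)."""
    pair_support = defaultdict(int)
    for seq in sequences:
        # Collect each ordered pair at most once per sequence,
        # because support counts sequences, not occurrences.
        pairs_in_seq = set()
        for i, a in enumerate(seq):
            for b in seq[i + 1:]:
                pairs_in_seq.add((a, b))
        for pair in pairs_in_seq:
            pair_support[pair] += 1

    cmap = defaultdict(set)
    for (a, b), support in pair_support.items():
        if support >= minsup:
            cmap[a].add(b)
    return dict(cmap)


def can_prune(pattern, item, cmap):
    """Return True if appending `item` to `pattern` cannot yield a frequent
    pattern: some item of the pattern is not followed by `item` often enough."""
    return any(item not in cmap.get(a, set()) for a in pattern)


# Toy database (assumed data, for illustration only).
database = [
    ["a", "b", "c"],
    ["a", "c"],
    ["b", "a", "c"],
    ["a", "b"],
]
cmap = build_cmap(database, minsup=2)

# The pair (b, a) occurs in only one sequence, so the candidate <b, a>
# can be discarded without computing its support.
print(can_prune(["b"], "a", cmap))
# Every item of <a, b> is followed by c in at least two sequences,
# so the candidate <a, b, c> must still be tested.
print(can_prune(["a", "b"], "c", cmap))
```

The check is a necessary condition only: a candidate that survives it may still turn out to be infrequent, so its support has to be computed as usual. The benefit is that a candidate rejected by the map never needs the costlier support computation on the vertical structures.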
- Fast Vertical Mining of Sequential Patterns Using Co-occurrence Information
- Springer International Publishing