skip to main content
research-article

Beyond itemsets: mining frequent featuresets over structured items

Published:01 November 2014Publication History
Skip Abstract Section

Abstract

We assume a dataset of transactions generated by a set of users over structured items where each item could be described through a set of features. In this paper, we are interested in identifying the frequent featuresets (set of features) by mining item transactions. For example, in a news website, items correspond to news articles, the features are the named-entities/topics in the articles and an item transaction would be the set of news articles read by a user within the same session. We show that mining frequent featuresets over structured item transactions is a novel problem and show that straightforward extensions of existing frequent itemset mining techniques provide unsatisfactory results. This is due to the fact that while users are drawn to each item in the transaction due to a subset of its features, the transaction by itself does not provide any information about such underlying preferred features of users. In order to overcome this hurdle, we propose a featureset uncertainty model where each item transaction could have been generated by various featuresets with different probabilities. We describe a novel approach to transform item transactions into uncertain transaction over featuresets and estimate their probabilities using constrained least squares based approach. We propose diverse algorithms to mine frequent featuresets. Our experimental evaluation provides a comparative analysis of the different approaches proposed.

References

  1. C. C. Aggarwal, Y. Li, J. Wang, and J. Wang. Frequent pattern mining with uncertain data. In SIGKDD, pages 29--38. ACM, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. C. C. Aggarwal and P. S. Yu. A survey of uncertain data algorithms and applications. IEEE TKDE, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. R. Agrawal, T. Imieliński, and A. Swami. Mining association rules between sets of items in large databases. In SIGMOD, pages 207--216, New York, NY, USA, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In VLDB, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. L. Antova, T. Jansen, C. Koch, and D. Olteanu. Fast and simple relational processing of uncertain data. In ICDE, pages 983--992. IEEE, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. O. Benjelloun, A. D. Sarma, A. Halevy, and J. Widom. Uldbs: Databases with uncertainty and lineage. In VLDB, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. T. Bernecker, H.-P. Kriegel, M. Renz, F. Verhein, and A. Zuefle. Probabilistic frequent itemset mining in uncertain databases. In SIGKDD, pages 119--128. ACM, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Å. Björck. Numerical Methods for Least Squares Problems. SIAM, Philadelphia, 1996.Google ScholarGoogle Scholar
  9. C. K. Chui, B. Kao, and E. Hung. Mining frequent itemsets from uncertain data. In PAKDD, pages 47--58, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. N. Dalvi and D. Suciu. Efficient query evaluation on probabilistic databases. VLDBJ, 16(4): 523--544, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. P. Damaschke. The union of minimal hitting sets: parameterized combinatorial bounds and counting. In STACS, pages 332--343, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., 1979. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. R. Gupta, G. Fang, B. Field, M. Steinbach, and V. Kumar. Quantitative evaluation of approximate frequent pattern mining algorithms. In KDD, pages 301--309, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. T. Hofmann. Learning what people (don't) want. In EMCL, EMCL '01, pages 214--225, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. C. K.-S. Leung, M. A. F. Mateo, and D. A. Brajczuk. A tree-based approach for frequent pattern mining from uncertain data. In AKDDM. Springer, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. R. Reiter. A theory of diagnosis from first principles. Artif. Intell., 32(1): 57--95, Apr. 1987. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. P. Sen, A. Deshpande, and L. Getoor. Representing tuple and attribute uncertainty in probabilistic databases. In Data Mining Workshops, ICDM, pages 507--512. IEEE, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. L. Sun, R. Cheng, D. W. Cheung, and J. Cheng. Mining uncertain data with probabilistic guarantees. In SIGKDD, pages 273--282. ACM, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. H. Toivonen. Sampling large databases for association rules. In VLDB, pages 134--145, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Y. Tong, L. Chen, Y. Cheng, and P. S. Yu. Mining frequent itemsets over uncertain databases. VLDB, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. P. Venetis, G. Koutrika, and H. Garcia-Molina. On the selection of tags for tag clouds. In WSDM, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Q. Zhang, F. Li, and K. Yi. Finding frequent items in probabilistic data. SIGMOD '08, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in

Full Access

  • Published in

    cover image Proceedings of the VLDB Endowment
    Proceedings of the VLDB Endowment  Volume 8, Issue 3
    November 2014
    144 pages

    Publisher

    VLDB Endowment

    Publication History

    • Published: 1 November 2014
    Published in pvldb Volume 8, Issue 3

    Qualifiers

    • research-article

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader