research-article

Beyond itemsets: mining frequent featuresets over structured items

Authors:
Saravanan Thirumuruganathan

University of Texas, Arlington

University of Texas, Arlington
View Profile

,
Habibur Rahman

University of Texas, Arlington

University of Texas, Arlington
View Profile

,
Sofiane Abbar

Qatar Computing Research Institute

Qatar Computing Research Institute
View Profile

,
Gautam Das

University of Texas, Arlington

University of Texas, Arlington
View Profile

Proceedings of the VLDB Endowment Volume 8 Issue 3pp 257–268https://doi.org/10.14778/2735508.2735515

Published:01 November 2014Publication History

Proceedings of the VLDB Endowment

Abstract

We assume a dataset of transactions generated by a set of users over structured items where each item could be described through a set of features. In this paper, we are interested in identifying the frequent featuresets (set of features) by mining item transactions. For example, in a news website, items correspond to news articles, the features are the named-entities/topics in the articles and an item transaction would be the set of news articles read by a user within the same session. We show that mining frequent featuresets over structured item transactions is a novel problem and show that straightforward extensions of existing frequent itemset mining techniques provide unsatisfactory results. This is due to the fact that while users are drawn to each item in the transaction due to a subset of its features, the transaction by itself does not provide any information about such underlying preferred features of users. In order to overcome this hurdle, we propose a featureset uncertainty model where each item transaction could have been generated by various featuresets with different probabilities. We describe a novel approach to transform item transactions into uncertain transaction over featuresets and estimate their probabilities using constrained least squares based approach. We propose diverse algorithms to mine frequent featuresets. Our experimental evaluation provides a comparative analysis of the different approaches proposed.

References

C. C. Aggarwal, Y. Li, J. Wang, and J. Wang. Frequent pattern mining with uncertain data. In SIGKDD, pages 29--38. ACM, 2009. Google ScholarDigital Library
C. C. Aggarwal and P. S. Yu. A survey of uncertain data algorithms and applications. IEEE TKDE, 2009. Google ScholarDigital Library
R. Agrawal, T. Imieliński, and A. Swami. Mining association rules between sets of items in large databases. In SIGMOD, pages 207--216, New York, NY, USA, 1993. Google ScholarDigital Library
R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In VLDB, 1994. Google ScholarDigital Library
L. Antova, T. Jansen, C. Koch, and D. Olteanu. Fast and simple relational processing of uncertain data. In ICDE, pages 983--992. IEEE, 2008. Google ScholarDigital Library
O. Benjelloun, A. D. Sarma, A. Halevy, and J. Widom. Uldbs: Databases with uncertainty and lineage. In VLDB, 2006. Google ScholarDigital Library
T. Bernecker, H.-P. Kriegel, M. Renz, F. Verhein, and A. Zuefle. Probabilistic frequent itemset mining in uncertain databases. In SIGKDD, pages 119--128. ACM, 2009. Google ScholarDigital Library
Å. Björck. Numerical Methods for Least Squares Problems. SIAM, Philadelphia, 1996.Google Scholar
C. K. Chui, B. Kao, and E. Hung. Mining frequent itemsets from uncertain data. In PAKDD, pages 47--58, 2007. Google ScholarDigital Library
N. Dalvi and D. Suciu. Efficient query evaluation on probabilistic databases. VLDBJ, 16(4): 523--544, 2007. Google ScholarDigital Library
P. Damaschke. The union of minimal hitting sets: parameterized combinatorial bounds and counting. In STACS, pages 332--343, 2007. Google ScholarDigital Library
M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., 1979. Google ScholarDigital Library
R. Gupta, G. Fang, B. Field, M. Steinbach, and V. Kumar. Quantitative evaluation of approximate frequent pattern mining algorithms. In KDD, pages 301--309, 2008. Google ScholarDigital Library
T. Hofmann. Learning what people (don't) want. In EMCL, EMCL '01, pages 214--225, 2001. Google ScholarDigital Library
C. K.-S. Leung, M. A. F. Mateo, and D. A. Brajczuk. A tree-based approach for frequent pattern mining from uncertain data. In AKDDM. Springer, 2008. Google ScholarDigital Library
R. Reiter. A theory of diagnosis from first principles. Artif. Intell., 32(1): 57--95, Apr. 1987. Google ScholarDigital Library
P. Sen, A. Deshpande, and L. Getoor. Representing tuple and attribute uncertainty in probabilistic databases. In Data Mining Workshops, ICDM, pages 507--512. IEEE, 2007. Google ScholarDigital Library
L. Sun, R. Cheng, D. W. Cheung, and J. Cheng. Mining uncertain data with probabilistic guarantees. In SIGKDD, pages 273--282. ACM, 2010. Google ScholarDigital Library
H. Toivonen. Sampling large databases for association rules. In VLDB, pages 134--145, 1996. Google ScholarDigital Library
Y. Tong, L. Chen, Y. Cheng, and P. S. Yu. Mining frequent itemsets over uncertain databases. VLDB, 2012. Google ScholarDigital Library
P. Venetis, G. Koutrika, and H. Garcia-Molina. On the selection of tags for tag clouds. In WSDM, 2011. Google ScholarDigital Library
Q. Zhang, F. Li, and K. Yi. Finding frequent items in probabilistic data. SIGMOD '08, 2008. Google ScholarDigital Library

Recommendations

Reference itemsets: useful itemsets to approximate the representation of frequent itemsets

Deriving frequent itemsets from databases is an important research issue in data mining. The number of frequent itemsets may be unusually large when a low minimum support threshold is given. As such, the design of a compact representation to compress ...
Read More
Efficient algorithms for mining high-utility itemsets in uncertain databases

High-utility itemset mining (HUIM) is a useful set of techniques for discovering patterns in transaction databases, which considers both quantity and profit of items. However, most algorithms for mining high-utility itemsets (HUIs) assume that the ...
Read More
Hiding co-occurring frequent itemsets
EDBT/ICDT '09: Proceedings of the 2009 EDBT/ICDT Workshops

Knowledge hiding, hiding rules/patterns that are inferable from published data and attributed sensitive, is extensively studied in the literature in the context of frequent itemsets and association rules mining from transactional data. The research in ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in

Proceedings of the VLDB Endowment Volume 8, Issue 3
November 2014
144 pages
ISSN:2150-8097
Issue’s Table of Contents
Sponsors
In-Cooperation
Publisher
VLDB Endowment
Publication History
- Published: 1 November 2014
Published in pvldb Volume 8, Issue 3
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 1
  Total Citations
  View Citations
- 60
  Total Downloads
- Downloads (Last 12 months)1
- Downloads (Last 6 weeks)0
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Beyond itemsets: mining frequent featuresets over structured items

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Recommendations

Reference itemsets: useful itemsets to approximate the representation of frequent itemsets

Efficient algorithms for mining high-utility itemsets in uncertain databases

Hiding co-occurring frequent itemsets

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Beyond itemsets: mining frequent featuresets over structured items

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Recommendations

Reference itemsets: useful itemsets to approximate the representation of frequent itemsets

Efficient algorithms for mining high-utility itemsets in uncertain databases

Hiding co-occurring frequent itemsets

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media