research-article

Open Access

Sequences of Sets

Authors:
Austin R. Benson

Cornell University, Ithaca, NY, USA

Cornell University, Ithaca, NY, USA
View Profile

,
Ravi Kumar

Google, Mountain View, CA, USA

Google, Mountain View, CA, USA
View Profile

,
Andrew Tomkins

Google, Mountain View, CA, USA

Google, Mountain View, CA, USA
View Profile

KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data MiningJuly 2018Pages 1148–1157https://doi.org/10.1145/3219819.3220100

Published:19 July 2018Publication History

KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

Pages 1148–1157

ABSTRACT

Sequential behavior such as sending emails, gathering in groups, tagging posts, or authoring academic papers may be characterized by a set of recipients, attendees, tags, or coauthors respectively. Such "sequences of sets" show complex repetition behavior, sometimes repeating prior sets wholesale, and sometimes creating new sets from partial copies or partial merges of earlier sets.

In this paper, we provide a stochastic model to capture these patterns. The model has two classes of parameters. First, a correlation parameter determines how much of an earlier set will contribute to a future set. Second, a vector of recency parameters captures the fact that a set in a sequence is more similar to recent sets than more distant ones. Comparing against a strong baseline, we find that modeling both correlation and recency structures are required for high accuracy. We also find that both parameter classes vary widely across domains, so must be optimized on a per-dataset basis. We present the model in detail, provide a theoretical examination of its asymptotic behavior, and perform a set of detailed experiments on its predictive performance.

Supplemental Material

benson_sequences_of_sets.mp4

mp4

421.6 MB

Download

References

Eytan Adar, Jaime Teevan, and Susan T. Dumais . 2008. Large scale analysis of Web revisitation patterns Proceeding of the twenty-sixth Annual CHI Conference on Human Factors in Computing Systems. ACM Press. Google ScholarDigital Library
Eytan Adar, Jaime Teevan, and Susan T. Dumais . 2009. Resonance on the Web: Web dynamics and revisitation patterns Proceedings of the 27th international conference on Human factors in computing systems. ACM Press. Google ScholarDigital Library
Gediminas Adomavicius and Alexander Tuzhilin . 2005. Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering Vol. 17, 6 (2005), 734--749. Google ScholarDigital Library
Ashton Anderson, Ravi Kumar, Andrew Tomkins, and Sergei Vassilvitskii . 2014. The dynamics of repeat consumption. In Proceedings of the 23rd International Conference on World Wide Web. ACM Press. Google ScholarDigital Library
Austin R Benson, Rediet Abebe, Michael T Schaub, Ali Jadbabaie, and Jon Kleinberg . 2018 a. Simplicial closure and higher-order link prediction. arXiv:1802.06916 (2018). deftempurl%https://arxiv.org/abs/1802.06916 tempurlGoogle Scholar
Austin R. Benson, Ravi Kumar, and Andrew Tomkins . 2016. Modeling User Consumption Sequences. In Proceedings of the 25th International Conference on World Wide Web. ACM Press. Google ScholarDigital Library
Austin R. Benson, Ravi Kumar, and Andrew Tomkins . 2018 b. A Discrete Choice Model for Subset Selection. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. ACM Press. Google ScholarDigital Library
Jun Chen, Chaokun Wang, and Jianmin Wang . 2015. Will You “Reconsume" the Near Past? Fast Prediction on Short-Term Reconsumption Behaviors. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence. deftempurl%https://www.aaai.org/ocs/index.php/AAAI/AAAI15/paper/view/9491 tempurl Google ScholarDigital Library
Chao-Min Chiu, Meng-Hsiang Hsu, Hsiangchu Lai, and Chun-Ming Chang . 2012. Re-examining the influence of trust on online repeat purchase intention: The moderating role of habit and its antecedents. Decision Support Systems Vol. 53, 4 (2012), 835--845. Google ScholarDigital Library
Alan Collins, Chris Hand, and Maggie Linnell . 2008. Analyzing repeat consumption of identical cultural goods: some exploratory evidence from moviegoing. Journal of Cultural Economics Vol. 32, 3 (2008), 187--199.Google ScholarCross Ref
John Duchi, Shai Shalev-Shwartz, Yoram Singer, and Tushar Chandra . 2008. Efficient projections onto the $ell_1$-ball for learning in high dimensions Proceedings of the 25th International Conference on Machine Learning. ACM Press. Google ScholarDigital Library
Marion M. Hetherington, Ali Bell, and Barbara J. Rolls . 2000. Effects of repeat consumption on pleasantness, preference and intake. British Food Journal Vol. 102, 7 (2000), 507--521.Google ScholarCross Ref
Pamela J Hinds, Kathleen M Carley, David Krackhardt, and Doug Wholey . 2000. Choosing Work Group Members: Balancing Similarity, Competence, and Familiarity. Organizational Behavior and Human Decision Processes Vol. 81, 2 (2000), 226--251.Google ScholarCross Ref
Jacob Jacoby and David B. Kyner . 1973. Brand Loyalty vs. Repeat Purchasing Behavior. Journal of Marketing Research Vol. 10, 1 (1973).Google ScholarCross Ref
Barbara E. Kahn, Manohar U. Kalwani, and Donald G. Morrison . 1986. Measuring Variety-Seeking and Reinforcement Behaviors Using Panel Data. Journal of Marketing Research Vol. 23, 2 (1986), 89.Google ScholarCross Ref
Komal Kapoor, Mingxuan Sun, Jaideep Srivastava, and Tao Ye . 2014. A hazard based approach to user return time prediction Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press. Google ScholarDigital Library
Bryan Klimt and Yiming Yang . 2004. The Enron Corpus: A New Dataset for Email Classification Research Machine Learning: ECML 2004. Springer Berlin Heidelberg, 217--226. Google ScholarDigital Library
Jure Leskovec, Jon Kleinberg, and Christos Faloutsos . 2007. Graph evolution: Densification and shrinking diameters. ACM Transactions on Knowledge Discovery from Data Vol. 1, 1 (2007). Google ScholarDigital Library
Rossana Mastrandrea, Julie Fournet, and Alain Barrat . 2015. Contact Patterns in a High School: A Comparison between Data Collected Using Wearable Sensors, Contact Diaries and Friendship Surveys. PLOS ONE Vol. 10, 9 (2015), e0136497.Google ScholarCross Ref
Leigh McAlister . 1982. A Dynamic Attribute Satiation Model of Variety-Seeking Behavior. Journal of Consumer Research Vol. 9, 2 (1982), 141.Google ScholarCross Ref
Ben Morris and Yuval Peres . 2003. Evolving sets and mixing. In Proceedings of the thirty-fifth ACM Symposium on Theory of Computing. ACM Press. Google ScholarDigital Library
Ben Morris and Yuval Peres . 2005. Evolving sets, mixing and heat kernel bounds. Probability Theory and Related Fields Vol. 133, 2 (2005), 245--266.Google ScholarCross Ref
Rebecca K. Ratner, Barbara E. Kahn, and Daniel Kahneman . 1999. Choosing Less-Preferred Experiences For the Sake of Variety. Journal of Consumer Research Vol. 26, 1 (1999), 1--15.Google ScholarCross Ref
Maja Rudolph, Francisco Ruiz, Stephan Mandt, and David Blei . 2016. Exponential family embeddings. In Advances in Neural Information Processing Systems. 478--486. deftempurl%https://papers.nips.cc/paper/6571-exponential-family-embeddings tempurl Google ScholarDigital Library
Laurent Saloff-Coste . 2004. Random Walks on Finite Groups. In Probability on Discrete Structures. Springer Berlin Heidelberg, 263--346.Google Scholar
Arnab Sinha, Zhihong Shen, Yang Song, Hao Ma, Darrin Eide, Bo-June (Paul) Hsu, and Kuansan Wang . 2015. An Overview of Microsoft Academic Service (MAS) and Applications Proceedings of the 24th International Conference on World Wide Web. ACM Press. Google ScholarDigital Library
Juliette Stehlé, Nicolas Voirin, Alain Barrat, Ciro Cattuto, Lorenzo Isella, Jean-Franccois Pinton, Marco Quaggiotto, Wouter Van den Broeck, Corinne Régis, Bruno Lina, and Philippe Vanhems . 2011. High-Resolution Measurements of Face-to-Face Contact Patterns in a Primary School. PLOS ONE Vol. 6, 8 (2011), e23176.Google ScholarCross Ref
Jaime Teevan, Eytan Adar, Rosie Jones, and Michael Potts . 2006. History repeats itself: Repeat queries in Yahoo's logs Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press. Google ScholarDigital Library
Jaime Teevan, Eytan Adar, Rosie Jones, and Michael A. S. Potts . 2007. Information re-retrieval: Repeat queries in Yahoo's logs Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press. Google ScholarDigital Library
Hao Yin, Austin R. Benson, Jure Leskovec, and David F. Gleich . 2017. Local Higher-Order Graph Clustering. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press. Google ScholarDigital Library
Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabas Poczos, Ruslan R Salakhutdinov, and Alexander J Smola . 2017. Deep sets Advances in Neural Information Processing Systems. 3394--3404. deftempurl%https://papers.nips.cc/paper/6931-deep-sets tempurlGoogle Scholar

Index Terms

Sequences of Sets
1. Applied computing
  1. Operations research
    1. Decision analysis
2. Information systems
  1. Information systems applications
    1. Data mining
  2. World Wide Web
    1. Web applications
      1. Internet communications tools
        Email
      2. Social networks

Recommendations

New Sets of Optimal p-ary Low-Correlation Zone Sequences

In this correspondence, three methods of constructing low-correlation zone (LCZ) sequences are proposed. In the first method, we constructed binary LCZ sequence sets of period 2ⁿ-1 using the Legendre sequences of period 2^m-1 as a column sequence when m|...
Read More
Unique Binary-Search-Tree Representations and Equality Testing of Sets and Sequences

This paper studies the problem of representing sets over an ordered universe by unique binary search trees, so that dictionary operations can be performed efficiently on any set. Although efficient randomized solutions to the problem are known, its ...
Read More
Polyphase zero correlation zone sequences from generalised bent functions
Abstract
Sequence families with zero correlation zone (ZCZ) have been extensively studied in recent years due to their important applications in quasi-synchronous code-division multiple-access (QS-CDMA) systems. To accommodate multiuser environments, ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
July 2018
2925 pages
ISBN:9781450355520
DOI:10.1145/3219819
General Chairs:
Yike Guo
Imperial College London
,
Faisal Farooq
IBM
Copyright © 2018 Owner/Author
This work is licensed under a Creative Commons Attribution International 4.0 License.
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 19 July 2018
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
repeat consumption
sequences
sets
Qualifiers
- research-article
Conference

Acceptance Rates
KDD '18 Paper Acceptance Rate107of983submissions,11%Overall Acceptance Rate1,133of8,635submissions,13%
More
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 27
  Total Citations
  View Citations
- 1,696
  Total Downloads
- Downloads (Last 12 months)194
- Downloads (Last 6 weeks)8
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Sequences of Sets

KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

ABSTRACT

Supplemental Material

References

Cited By

Index Terms

Recommendations

New Sets of Optimal p-ary Low-Correlation Zone Sequences

Unique Binary-Search-Tree Representations and Equality Testing of Sets and Sequences

Polyphase zero correlation zone sequences from generalised bent functions

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Sequences of Sets

KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

ABSTRACT

Supplemental Material

References

Cited By

Index Terms

Recommendations

New Sets of Optimal p-ary Low-Correlation Zone Sequences

Unique Binary-Search-Tree Representations and Equality Testing of Sets and Sequences

Polyphase zero correlation zone sequences from generalised bent functions

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media