research-article

Effective temporal dependence discovery in time series data

Published: 01 April 2018

Abstract

To analyze user behavior over time, it is useful to group users into cohorts, giving rise to cohort analysis. We identify several crucial limitations of current cohort analysis, motivated by the unmet need for temporal dependence discovery. To address these limitations, we propose a generalization that we call recurrent cohort analysis. We introduce a set of operators for recurrent cohort analysis and design access methods specific to these operators in both single-node and distributed environments. Through extensive experiments, we show that recurrent cohort analysis, when implemented using the proposed access methods, is up to six orders of magnitude faster than an implementation layered on top of a database in a single-node setting, and two orders of magnitude faster than an implementation using Spark SQL in a distributed setting.
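
For readers unfamiliar with the baseline the abstract starts from, the following is a minimal sketch of classic cohort analysis, not of the recurrent cohort operators the paper proposes. It assumes activity records are (user, date) pairs, that a user's cohort is the calendar week of their first activity, and that age is measured in whole weeks since that first activity. The function name `cohort_retention` and the sample data are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def cohort_retention(events):
    """Count distinct active users per (cohort week, age in weeks).

    events: iterable of (user_id, activity_date) pairs (assumed schema).
    Returns {cohort_week_start: {age_in_weeks: distinct_user_count}}.
    """
    events = list(events)

    # A user's "birth" is the date of their earliest recorded activity.
    birth = {}
    for user, day in events:
        if user not in birth or day < birth[user]:
            birth[user] = day

    def week_start(d):
        # Monday of the calendar week containing d.
        return d - timedelta(days=d.weekday())

    # Collect the distinct users seen at each age within each cohort.
    active = defaultdict(lambda: defaultdict(set))
    for user, day in events:
        cohort = week_start(birth[user])
        age = (day - birth[user]).days // 7
        active[cohort][age].add(user)

    return {
        cohort: {age: len(users) for age, users in ages.items()}
        for cohort, ages in active.items()
    }


# Hypothetical sample data: three users, two birth weeks.
sample_events = [
    ("u1", date(2018, 4, 2)),
    ("u1", date(2018, 4, 10)),
    ("u2", date(2018, 4, 3)),
    ("u2", date(2018, 4, 4)),
    ("u3", date(2018, 4, 9)),
    ("u3", date(2018, 4, 23)),
]
report = cohort_retention(sample_events)
```

Each row of `report` is one cohort and each column an age, which is the usual shape of a retention table. Here every user is assigned to exactly one cohort by a single birth event; the abstract presents this kind of analysis as the starting point that recurrent cohort analysis generalizes.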

Published in

Proceedings of the VLDB Endowment, Volume 11, Issue 8
April 2018
94 pages
ISSN: 2150-8097

Publisher

VLDB Endowment
