DOI: 10.1145/3377813.3381369 (ICSE conference proceedings)
research-article

Debugging crashes using continuous contrast set mining

Published: 18 September 2020

ABSTRACT

Facebook operates a family of services used by over two billion people daily on a huge variety of mobile devices. Many devices are configured to upload a crash report whenever the app crashes. Engineers monitor and triage the millions of crash reports logged each day to check for bugs, regressions, and other quality problems. Debugging groups of crashes is a labor-intensive process that requires deep domain expertise and close inspection of traces and code, often under time constraints.

We use contrast set mining, a form of discriminative pattern mining, to learn what distinguishes one group of crashes from another. Prior work applies contrast mining to continuous data by first discretizing it. We propose the first direct application of contrast learning to continuous data, without the need for discretization. We also define a weighted anomaly score that unifies continuous and categorical contrast sets while mitigating bias, as well as uncertainty measures that communicate confidence to developers. We demonstrate the value of our novel statistical improvements by applying them to a challenging dataset from Facebook production logs, where we achieve a 40x speedup over baseline approaches that use discretization.
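As a rough illustration of what contrasting two groups of crashes can look like, the sketch below scores one categorical and one continuous feature. It is not the paper's algorithm or its weighted anomaly score, which the abstract does not specify. It assumes a support difference for the categorical case and a standardized mean difference (Cohen's d with pooled variance) as a stand-in for a discretization-free continuous contrast. The function names, the field names `os` and `free_mem_mb`, and all data values are hypothetical.

```python
from collections import Counter
from math import sqrt
from statistics import mean, variance


def categorical_contrast(group_a, group_b, feature):
    """Return the feature value whose relative frequency (support)
    differs most between the two groups, with that signed difference."""
    count_a = Counter(record[feature] for record in group_a)
    count_b = Counter(record[feature] for record in group_b)
    diffs = {
        value: count_a[value] / len(group_a) - count_b[value] / len(group_b)
        for value in set(count_a) | set(count_b)
    }
    best = max(diffs, key=lambda value: abs(diffs[value]))
    return best, diffs[best]


def continuous_contrast(group_a, group_b, feature):
    """Return Cohen's d for a continuous feature, computed on the raw
    values with no binning. Assumes at least two records per group."""
    xs = [record[feature] for record in group_a]
    ys = [record[feature] for record in group_b]
    pooled_var = (
        (len(xs) - 1) * variance(xs) + (len(ys) - 1) * variance(ys)
    ) / (len(xs) + len(ys) - 2)
    if pooled_var == 0:
        return 0.0  # no spread in either group, so no usable scale
    return (mean(xs) - mean(ys)) / sqrt(pooled_var)


# Made-up records: a crash group of interest versus all other crashes.
crashing = [
    {"os": "9", "free_mem_mb": 110.0},
    {"os": "9", "free_mem_mb": 90.0},
    {"os": "9", "free_mem_mb": 100.0},
    {"os": "10", "free_mem_mb": 100.0},
]
baseline = [
    {"os": "10", "free_mem_mb": 500.0},
    {"os": "10", "free_mem_mb": 520.0},
    {"os": "11", "free_mem_mb": 480.0},
    {"os": "9", "free_mem_mb": 500.0},
]

os_value, support_diff = categorical_contrast(crashing, baseline, "os")
memory_effect = continuous_contrast(crashing, baseline, "free_mem_mb")
print(os_value, support_diff, round(memory_effect, 2))
```

In this toy data the crash group is over-represented on one OS version and has far less free memory than the baseline, which is the kind of distinguishing pattern contrast set mining is meant to surface. A production system would also need significance testing and a way to rank categorical and continuous contrasts on a common scale.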
