Debugging crashes using continuous contrast set mining

Authors:
Rebecca Qian (Facebook, Inc.)
Yang Yu (Purdue University)
Wonhee Park (Facebook, Inc.)
Vijayaraghavan Murali (Facebook, Inc.)
Stephen Fink (Facebook, Inc.)
Satish Chandra (Facebook, Inc.)

ICSE-SEIP '20: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering: Software Engineering in Practice, June 2020, Pages 61–70. https://doi.org/10.1145/3377813.3381369

Published: 18 September 2020

ABSTRACT

Facebook operates a family of services used by over two billion people daily on a huge variety of mobile devices. Many devices are configured to upload a crash report whenever the app crashes. Engineers monitor and triage millions of crash reports logged each day to check for bugs, regressions, and other quality problems. Debugging groups of crashes is a labor-intensive process that requires deep domain expertise and close inspection of traces and code, often under time constraints.

We use contrast set mining, a form of discriminative pattern mining, to learn what distinguishes one group of crashes from another. Prior work relies on discretization to apply contrast mining to continuous data. We propose the first direct application of contrast learning to continuous data, without the need for discretization. We also define a weighted anomaly score that unifies continuous and categorical contrast sets while mitigating bias, as well as uncertainty measures that communicate confidence to developers. We demonstrate the value of our novel statistical improvements by applying them to a challenging dataset from Facebook production logs, where we achieve a 40x speedup over baseline approaches that use discretization.
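
To make the idea of contrasting two crash groups concrete, the following is a minimal sketch and not the method from the paper. The abstract does not specify the statistics used, so the sketch assumes a standardized mean difference (Cohen's d) for a continuous feature, compared directly without binning, and a difference in support for a categorical feature. All function names, feature names, and data values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference between two samples, using the pooled
    sample variance. Assumed here as one possible continuous contrast
    measure; the values are used as-is, with no discretization."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    if pooled_var == 0:
        raise ValueError("both groups have zero variance")
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def support_difference(group_a, group_b, value):
    """Difference in the fraction of rows equal to `value` between two
    groups, a simple categorical contrast measure."""
    support_a = sum(1 for x in group_a if x == value) / len(group_a)
    support_b = sum(1 for x in group_b if x == value) / len(group_b)
    return support_a - support_b


# Hypothetical data: free memory (MB) and OS version for a target crash
# group versus all other crashes.
target_free_mem = [110.0, 95.0, 120.0, 105.0]
rest_free_mem = [510.0, 480.0, 530.0, 495.0]
target_os = ["12.1", "12.1", "12.1", "11.4"]
rest_os = ["12.1", "11.4", "11.4", "11.4"]

d = cohens_d(target_free_mem, rest_free_mem)
diff = support_difference(target_os, rest_os, "12.1")
print(f"free memory effect size: {d:.2f}")
print(f"OS 12.1 support difference: {diff:.2f}")
```

In this toy data the target group has much lower free memory and a higher share of one OS version, so both measures flag a contrast. A real system would also need significance testing and correction for multiple hypotheses, which this sketch omits.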

Index Terms

Software and its engineering → Software organization and properties → Extra-functional properties → Software reliability

Published in
ICSE-SEIP '20: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering: Software Engineering in Practice
June 2020
258 pages
ISBN:9781450371230
DOI:10.1145/3377813
General Chairs: Gregg Rothermel (North Carolina State University), Doo-Hwan Bae (KAIST, South Korea)
Copyright © 2020 Owner/Author
This work is licensed under a Creative Commons Attribution International 4.0 License.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
contrast set mining
crash analysis
descriptive rules
emerging patterns
multiple hypothesis testing
rule learning
subgroup discovery