ABSTRACT
Technology-assisted review (TAR) refers to human-in-the-loop active learning workflows for finding relevant documents in large collections. These workflows often must meet a target for the proportion of relevant documents found (i.e., recall) while also holding down review costs. A variety of heuristic stopping rules have been proposed for striking this tradeoff in particular settings, but none have been tested against a range of recall targets and tasks. We propose two new heuristic stopping rules, Quant and QuantCI, based on model-based estimation techniques from survey research. We compare them against a range of previously proposed heuristics and find that they accurately hit a range of recall targets while substantially reducing review costs.
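The abstract does not specify the Quant rule's exact form, but the general idea of a quantification-based stopping check can be sketched as follows. This is a hypothetical illustration, not the paper's method: it estimates the total number of relevant documents by summing calibrated relevance probabilities over the unreviewed pool, and stops once the implied recall meets the target.

```python
# Hypothetical quantification-based stopping check (illustrative only,
# not the paper's exact Quant rule). Assumes the classifier's scores
# on unreviewed documents are calibrated probabilities of relevance.

def estimated_recall(found_relevant, unreviewed_probs):
    """Estimate recall as found / (found + expected remaining relevant)."""
    expected_remaining = sum(unreviewed_probs)  # expected relevant docs not yet reviewed
    estimated_total = found_relevant + expected_remaining
    return found_relevant / estimated_total if estimated_total > 0 else 1.0

def should_stop(found_relevant, unreviewed_probs, target=0.8):
    """Stop reviewing once estimated recall reaches the target."""
    return estimated_recall(found_relevant, unreviewed_probs) >= target

# Example: 40 relevant documents found so far; the model assigns low
# probabilities to the 30 unreviewed documents (expected ~3.5 remaining).
probs = [0.2, 0.1, 0.05] * 10
print(should_stop(40, probs, target=0.9))  # estimated recall ~0.92
```

The QuantCI variant described in the abstract presumably adds a confidence interval around such an estimate before stopping; that refinement is omitted here.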
Index Terms
- Heuristic stopping rules for technology-assisted review