ABSTRACT
Technology-assisted review (TAR) refers to human-in-the-loop active learning workflows for finding relevant documents in large collections. These workflows often must meet a target for the proportion of relevant documents found (i.e., recall) while also holding down review costs. A variety of heuristic stopping rules have been proposed for striking this tradeoff in particular settings, but none have been tested against a range of recall targets and tasks. We propose two new heuristic stopping rules, Quant and QuantCI, based on model-based estimation techniques from survey research. We compare them against a range of previously proposed heuristics and find that they accurately hit a range of recall targets while substantially reducing review costs.
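The abstract does not specify the Quant rule's exact form, but the general idea of a quantification-based stopping check can be sketched as follows. This is a hypothetical illustration, not the paper's method: it estimates the total number of relevant documents by summing calibrated relevance probabilities over the unreviewed pool, and stops once the implied recall meets the target.

```python
# Hypothetical quantification-based stopping check (illustrative only,
# not the paper's exact Quant rule). Assumes the classifier's scores
# on unreviewed documents are calibrated probabilities of relevance.

def estimated_recall(found_relevant, unreviewed_probs):
    """Estimate recall as found / (found + expected remaining relevant)."""
    expected_remaining = sum(unreviewed_probs)  # expected relevant docs not yet reviewed
    estimated_total = found_relevant + expected_remaining
    return found_relevant / estimated_total if estimated_total > 0 else 1.0

def should_stop(found_relevant, unreviewed_probs, target=0.8):
    """Stop reviewing once estimated recall reaches the target."""
    return estimated_recall(found_relevant, unreviewed_probs) >= target

# Example: 40 relevant documents found so far; the model assigns low
# probabilities to the 30 unreviewed documents (expected ~3.5 remaining).
probs = [0.2, 0.1, 0.05] * 10
print(should_stop(40, probs, target=0.9))  # estimated recall ~0.92
```

The QuantCI variant described in the abstract presumably adds a confidence interval around such an estimate before stopping; that refinement is omitted here.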
Index Terms
- Heuristic stopping rules for technology-assisted review