
30-07-2021 | Original Research

A sequence labeling model for catchphrase identification from legal case documents

Authors: Arpan Mandal, Kripabandhu Ghosh, Saptarshi Ghosh, Sekhar Mandal

Published in: Artificial Intelligence and Law | Issue 3/2022

Abstract

In a Common Law system, legal practitioners need frequent access to prior case documents that discuss relevant legal issues. Case documents are generally very lengthy, containing complex sentence structures, and reading them fully is a strenuous task even for legal practitioners. Having a concise overview of these documents can relieve legal practitioners from the task of reading the complete case statements. Legal catchphrases are (multi-word) phrases that provide a concise overview of the contents of a case document, and automated generation of catchphrases is a challenging problem in legal analytics. In this paper, we propose a novel supervised neural sequence tagging model for the extraction of catchphrases from legal case documents. Specifically, we show that incorporating document-specific information along with a sequence tagging model can enhance the performance of catchphrase extraction. We perform experiments over a set of Indian Supreme Court case documents, for which the gold-standard catchphrases (annotated by legal practitioners) are obtained from a popular legal information system. The performance of our proposed method is compared with that of several existing supervised and unsupervised methods, and our proposed method is empirically shown to be superior to all baselines.
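As a rough illustration of what a sequence tagging formulation of catchphrase extraction involves, the sketch below decodes per-token labels into phrases. It assumes a BIO-style scheme (B = beginning of a catchphrase, I = inside, O = outside), which the abstract does not specify; the function name `decode_bio`, the example sentence, and its tags are hypothetical and are not taken from the paper.

```python
from typing import List


def decode_bio(tokens: List[str], tags: List[str]) -> List[str]:
    """Collect token spans labeled B/I into phrases (assumed BIO scheme)."""
    phrases: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            # A new phrase starts; close any phrase that is still open.
            if current:
                phrases.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            # Continue the phrase that is currently open.
            current.append(token)
        else:
            # "O" (or an "I" with no open phrase) closes any open phrase.
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


# Made-up sentence and tags, for illustration only.
tokens = "the court examined the writ petition under article 226".split()
tags = ["O", "O", "O", "O", "B", "I", "O", "B", "I"]
print(decode_bio(tokens, tags))  # ['writ petition', 'article 226']
```

In a trained tagger the `tags` list would come from the model's predictions for each token; here it is written by hand.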

Footnotes
3
Accuracy is a well-known set-based evaluation metric for classification algorithms: it measures the fraction of instances that a model classifies correctly. In the present context, accuracy measures the fraction of catchphrases that a method identifies correctly.
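The definition above can be sketched in a few lines. This is a minimal illustration of classification accuracy, assuming each candidate phrase carries a binary label (1 = catchphrase, 0 = not a catchphrase); the function name and the label lists are hypothetical, not data from the paper.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of instances whose predicted label equals the gold label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up labels: 3 of the 5 candidates are classified correctly.
gold = [1, 0, 1, 1, 0]
pred = [1, 0, 0, 1, 1]
print(accuracy(gold, pred))  # 0.6
```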
 
7
The GitHub URL of our noun phrase extractor is https://github.com/amarnamarpan/NNP-extractor.
 
12
To get Viterbi accuracy scores in pyCRFsuite, one can use the '-i' option while tagging.
 
14
Available online at https://keras.io/.
 
17
To compute the ROUGE recall score, we use the implementation found at https://pypi.org/project/rouge-score/.
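For readers unfamiliar with the metric, the sketch below shows the idea behind a unigram (ROUGE-1) recall score: the share of reference unigrams that also appear in the candidate, with counts clipped. It is a simplified stand-in written for illustration, not the code of the rouge-score package, which applies its own tokenization and optional stemming, so its scores may differ. The function name and the two example strings are hypothetical.

```python
from collections import Counter


def rouge1_recall(reference: str, candidate: str) -> float:
    """Clipped unigram overlap divided by the number of reference unigrams."""
    # Simplifying assumption: lowercase and split on whitespace.
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    total = sum(ref_counts.values())
    if total == 0:
        return 0.0
    # Each reference unigram is matched at most as often as it occurs
    # in the candidate.
    overlap = sum(min(n, cand_counts[tok]) for tok, n in ref_counts.items())
    return overlap / total


# Made-up strings: 3 of the 5 reference unigrams occur in the candidate.
reference = "writ petition under article 226"
candidate = "writ petition filed under the constitution"
print(rouge1_recall(reference, candidate))  # 0.6
```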
 
Metadata
Title
A sequence labeling model for catchphrase identification from legal case documents
Authors
Arpan Mandal
Kripabandhu Ghosh
Saptarshi Ghosh
Sekhar Mandal
Publication date
30-07-2021
Publisher
Springer Netherlands
Published in
Artificial Intelligence and Law / Issue 3/2022
Print ISSN: 0924-8463
Electronic ISSN: 1572-8382
DOI
https://doi.org/10.1007/s10506-021-09296-2
