
2024 | Original Paper | Book Chapter

A BERT-Based Model for Legal Document Proofreading

Authors: Jinlong Liu, Xudong Luo

Published in: Intelligent Information Processing XII

Publisher: Springer Nature Switzerland


Abstract

Legal documents require high precision and accuracy in language use, leaving no room for grammatical or spelling errors. To address this issue, this paper proposes a novel application of the BERT pre-trained language model to legal document proofreading. The BERT-based model is trained to detect and correct grammatical and spelling errors in legal texts. On a dataset of annotated legal documents, we experimentally show that our BERT-based model significantly outperforms state-of-the-art proofreading models in precision, recall, and F1 score, demonstrating its potential as a valuable tool in legal document preparation and revision. The application of such advanced deep learning techniques could revolutionise legal document proofreading, enhancing both accuracy and efficiency.
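The chapter text is not reproduced here, so as a rough illustration of the general idea the abstract describes (a BERT model proposing in-context corrections for flagged errors), the following minimal sketch uses the Hugging Face transformers library. The generic bert-base-uncased checkpoint, the suggest_corrections helper, and the example sentence are hypothetical stand-ins, not the authors' model, training setup, or data.

```python
# Minimal illustrative sketch (not the authors' implementation): use BERT's
# masked-language-model head to propose replacements for a token that some
# detector has flagged as suspicious. Model name, helper name, and example
# sentence are assumptions for demonstration only.
import torch
from transformers import BertForMaskedLM, BertTokenizer

MODEL_NAME = "bert-base-uncased"  # a legal-domain BERT checkpoint could be substituted
tokenizer = BertTokenizer.from_pretrained(MODEL_NAME)
model = BertForMaskedLM.from_pretrained(MODEL_NAME)
model.eval()

def suggest_corrections(sentence: str, suspect_word: str, top_k: int = 5) -> list[str]:
    """Mask the first occurrence of `suspect_word` and return BERT's top-k candidates."""
    masked = sentence.replace(suspect_word, tokenizer.mask_token, 1)
    inputs = tokenizer(masked, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: [1, seq_len, vocab_size]
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    top_ids = logits[0, mask_pos, :].softmax(dim=-1).topk(top_k, dim=-1).indices[0]
    return tokenizer.convert_ids_to_tokens(top_ids.tolist())

# Example: a spelling error common in contracts ("loses" instead of "losses").
print(suggest_corrections(
    "The party shall indemnify the other party for any loses arising hereunder.",
    "loses",
))
```

In a full proofreading pipeline of the kind the abstract outlines, such candidate generation would be paired with an error-detection component and evaluated against annotated corrections using precision, recall, and F1 score.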


Metadata
Title
A BERT-Based Model for Legal Document Proofreading
Authors
Jinlong Liu
Xudong Luo
Copyright Year
2024
DOI
https://doi.org/10.1007/978-3-031-57808-3_14