2017 | OriginalPaper | Book chapter

Automatically Classify Chinese Judgment Documents Utilizing Machine Learning Algorithms

Authors: Miaomiao Lei, Jidong Ge, Zhongjin Li, Chuanyi Li, Yemao Zhou, Xiaoyu Zhou, Bin Luo

Published in: Database Systems for Advanced Applications

Publisher: Springer International Publishing

Abstract

In law, a judgment is a decision by a court that resolves a controversy and determines the rights and liabilities of the parties in a legal action or proceeding. The China Judgments Online system was officially launched in 2013 for record keeping and notification; to date, it holds over 23 million electronic judgment documents. This large body of judgment documents bears witness to improvements in judicial justice and openness. Document categorization is becoming increasingly important for indexing judgments and for further analysis. However, manual categorization is practically impossible given the volume of documents and their rapid growth. In this paper, we propose an approach that automatically classifies Chinese judgment documents using machine learning algorithms, including Naive Bayes (NB), Decision Tree (DT), Random Forest (RF) and Support Vector Machine (SVM). After word segmentation, each judgment document is represented in a vector space model (VSM) using TF-IDF weights. To improve performance, we construct a set of judicial stop words. In addition, because TF-IDF produces high-dimensional feature vectors, which lead to extremely high time complexity, we apply three dimensionality reduction methods. Extensive experiments on 6,735 judgment documents demonstrate the effectiveness and high classification performance of the proposed method.
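The pipeline described in the abstract (word segmentation, judicial stop-word removal, TF-IDF weighting into a vector space model, then a classifier) can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the authors' implementation: documents are assumed to be already word-segmented with tokens separated by spaces, the stop-word list and the toy training examples are hypothetical, the smoothed IDF formula is one common variant, and only Naive Bayes, one of the four algorithms named above, is shown. The dimensionality reduction step is omitted because the abstract does not name the three methods used.

```python
import math
from collections import Counter

# Hypothetical judicial stop words (illustrative only; the paper's list is not reproduced here):
# 本院 "this court", 认为 "holds", 依照 "in accordance with", 规定 "provision", 的 (particle).
JUDICIAL_STOP_WORDS = {"本院", "认为", "依照", "规定", "的"}


def tokenize(segmented_doc: str) -> list[str]:
    """Split an already word-segmented document (space-separated tokens) and drop stop words."""
    return [tok for tok in segmented_doc.split() if tok not in JUDICIAL_STOP_WORDS]


def fit_idf(corpus_tokens: list[list[str]]) -> dict[str, float]:
    """Smoothed inverse document frequency: idf(t) = ln((1 + N) / (1 + df(t))) + 1."""
    n_docs = len(corpus_tokens)
    doc_freq = Counter()
    for tokens in corpus_tokens:
        doc_freq.update(set(tokens))  # count each term at most once per document
    return {term: math.log((1 + n_docs) / (1 + df)) + 1 for term, df in doc_freq.items()}


def tfidf(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Sparse VSM vector: length-normalized term frequency times IDF; unseen terms are ignored."""
    counts = Counter(tok for tok in tokens if tok in idf)
    length = sum(counts.values()) or 1
    return {term: (c / length) * idf[term] for term, c in counts.items()}


class MultinomialNB:
    """Multinomial Naive Bayes over TF-IDF weights with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, vectors: list[dict[str, float]], labels: list[str]) -> "MultinomialNB":
        self.vocab = {term for vec in vectors for term in vec}
        self.classes = sorted(set(labels))
        class_counts = Counter(labels)
        self.log_prior = {c: math.log(class_counts[c] / len(labels)) for c in self.classes}
        # Sum the TF-IDF weight of every term within each class.
        class_weights = {c: Counter() for c in self.classes}
        for vec, label in zip(vectors, labels):
            class_weights[label].update(vec)
        vocab_size = len(self.vocab)
        self.log_likelihood = {}
        for c in self.classes:
            denom = sum(class_weights[c].values()) + self.alpha * vocab_size
            self.log_likelihood[c] = {
                term: math.log((class_weights[c][term] + self.alpha) / denom)
                for term in self.vocab
            }
        return self

    def predict(self, vec: dict[str, float]) -> str:
        """Return the class maximizing log P(c) + sum_t w_t * log P(t | c)."""
        scores = {
            c: self.log_prior[c]
            + sum(w * self.log_likelihood[c][t] for t, w in vec.items() if t in self.vocab)
            for c in self.classes
        }
        return max(scores, key=scores.get)


# Toy, hand-made examples with placeholder labels (not from the paper's dataset).
TRAIN = [
    ("本院 认为 被告人 盗窃 财物 判处 有期徒刑", "criminal"),
    ("被告人 故意 伤害 判处 拘役", "criminal"),
    ("原告 借款 合同 返还 本金", "civil"),
    ("本院 认为 原告 租赁 合同 支付 租金", "civil"),
]

train_tokens = [tokenize(doc) for doc, _ in TRAIN]
idf = fit_idf(train_tokens)
model = MultinomialNB().fit([tfidf(t, idf) for t in train_tokens], [label for _, label in TRAIN])

if __name__ == "__main__":
    print(model.predict(tfidf(tokenize("被告人 盗窃 手机 判处 有期徒刑"), idf)))  # prints: criminal
```

The labels `criminal` and `civil` are placeholders, since the abstract does not list the paper's categories. For real experiments, scikit-learn's `TfidfVectorizer` and `MultinomialNB` (and `DecisionTreeClassifier`, `RandomForestClassifier` and `LinearSVC` for the other three algorithms) provide tested equivalents of these hand-written pieces.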

Metadata
Title
Automatically Classify Chinese Judgment Documents Utilizing Machine Learning Algorithms
Authors
Miaomiao Lei
Jidong Ge
Zhongjin Li
Chuanyi Li
Yemao Zhou
Xiaoyu Zhou
Bin Luo
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-55705-2_1