2020 | OriginalPaper | Chapter

15. Applying Machine Learning to Gain Insights from Stacks of Documents

Abstract

"Document understanding" is the deep comprehension of a text. At its core, it is about converting unstructured data into information and, for companies, equally about complying with governance and compliance policies. In most cases a collection of different methods is used, including document classification and entity extraction. Many approaches rely on rule-based systems or on statistical methods.
Using machine learning (ML) to open up unstructured documents at scale creates new ways to, among other things, reveal relationships between documents. ML enables predictions for document classification as well as the extraction of knowledge from text passages, graphics, or fields beyond simple pattern recognition. ML also makes semantic search across documents possible and lays the foundation for advanced analyses such as anomaly detection.
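
To make the idea of statistical document classification concrete, here is a minimal sketch of a multinomial Naive Bayes classifier in plain Python. It is not the chapter's method. The function names (tokenize, train, classify), the toy training texts, and the two labels "invoice" and "contract" are assumptions chosen purely for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Simplistic whitespace tokenizer; real systems would normalize further.
    return text.lower().split()


def train(labeled_docs):
    """Count words per label. labeled_docs is a list of (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> word frequencies
    label_counts = Counter()            # label -> number of documents
    vocab = set()
    for text, label in labeled_docs:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def classify(model, text):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Log prior: share of training documents carrying this label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen during training
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy corpus, invented for this sketch.
TRAINING = [
    ("invoice total amount due payment", "invoice"),
    ("payment due invoice number amount", "invoice"),
    ("contract party agrees terms termination", "contract"),
    ("terms of the contract between party and party", "contract"),
]

model = train(TRAINING)
print(classify(model, "amount due on this invoice"))  # prints: invoice
```

A rule-based system would instead hard-code patterns such as "contains the word invoice"; the statistical variant learns word weights from labeled examples, which is the contrast the abstract draws.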

Footnotes
1
Short name for a training dataset provided by the Message Understanding Conferences.
 
3
In many cases, the respective system transcribes audio and video data automatically.
 
Metadata
Title
Die Anwendung von Machine Learning zur Gewinnung von Erkenntnissen aus Dokumentenstapeln
Author
Stefan Ebener
Copyright Year
2020
DOI
https://doi.org/10.1007/978-3-658-29550-9_15