Published in: Journal of Intelligent Information Systems 2/2016

1 April 2016

SAUText - a system for analysis of unstructured textual data

Authors: Grzegorz Protaziuk, Jacek Lewandowski, Robert Bembenik

Abstract

Semantic lexical resources such as ontologies are becoming increasingly important in many systems, in particular those that provide access to unstructured textual data. Typically, such resources are built from existing repositories and by analyzing available texts. In practice, however, building new resources of this type or enriching existing ones cannot be accomplished without an appropriate tool. This paper presents SAUText, a new system that provides infrastructure for research involving semantic resources and the analysis of unstructured textual data. The system uses a dedicated repository for storing various kinds of text data and exploits parallelization to speed up the analysis. As an example of a knowledge discovery method available in the system, a new approach to synonym discovery is introduced.

Metadata
Title
SAUText - a system for analysis of unstructured textual data
Authors
Grzegorz Protaziuk
Jacek Lewandowski
Robert Bembenik
Publication date
1 April 2016
Publisher
Springer US
Published in
Journal of Intelligent Information Systems / Issue 2/2016
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675
DOI
https://doi.org/10.1007/s10844-015-0384-1