SAUText - a system for analysis of unstructured textual data

Authors: Grzegorz Protaziuk, Jacek Lewandowski, Robert Bembenik

Published in: Journal of Intelligent Information Systems | Issue 2/2016

Published: 1 April 2016

Abstract

Semantic lexical resources such as ontologies are becoming increasingly important in many systems, particularly those that provide access to unstructured textual data. Such resources are typically built from existing repositories and by analyzing available texts. In practice, however, building new resources of this type, or enriching existing ones, cannot be accomplished without an appropriate tool. This paper presents SAUText, a new system that provides the infrastructure for research involving semantic resources and the analysis of unstructured textual data. The system uses a dedicated repository for storing various kinds of text data and exploits parallelization to speed up the analysis. As an example of a knowledge discovery method available in the system, a new approach to synonym discovery is introduced.

Title: SAUText - a system for analysis of unstructured textual data
Authors: Grzegorz Protaziuk, Jacek Lewandowski, Robert Bembenik
Publication date: 1 April 2016
Publisher: Springer US
Published in: Journal of Intelligent Information Systems / Issue 2/2016
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675
DOI: https://doi.org/10.1007/s10844-015-0384-1
