2016 | Original Paper | Book Chapter

Enhancing Concept Extraction from Polish Texts with Rule Management

Abstract

This paper presents a system for extracting concepts from unstructured Polish texts. Concepts are understood here as n-grams whose words satisfy specific grammatical constraints. Concepts are detected and transformed to their normalized form by rules defined in a language that combines elements of colored and fuzzy Petri nets. We apply a user-friendly method for specifying samples of transformation patterns, which are then compiled into rules. To improve accuracy and performance, we recently introduced rule management mechanisms based on two relations between rules: partial refinement and covering. The implemented methods include filtering with metarules and removal of redundant rules (i.e., those covered by other rules). We report the results of experiments aimed at extracting specific concepts (actions) using a ruleset refactored with the developed rule management techniques.
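The removal of redundant rules mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not define the rule language or the covering relation formally, so everything below is an assumption made for illustration: a rule is modeled as one set of required grammatical tags per n-gram position, and a rule is taken to cover another when it has the same length and demands no more tags at any position. The names `Rule`, `covers`, and `remove_redundant` are hypothetical and do not come from the paper.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical rule model (an assumption, not the paper's rule language):
# one set of required grammatical tags per position of the n-gram.
# A word matches a position if it carries all tags required there.
Rule = Tuple[FrozenSet[str], ...]


def covers(general: Rule, specific: Rule) -> bool:
    """Return True if every n-gram matched by `specific` is matched by `general`.

    Under the assumed model this holds when both rules have the same length
    and `general` requires a subset of the tags of `specific` at each position.
    """
    return len(general) == len(specific) and all(
        g <= s for g, s in zip(general, specific)
    )


def remove_redundant(rules: List[Rule]) -> List[Rule]:
    """Drop exact duplicates and rules covered by another, different rule."""
    unique = list(dict.fromkeys(rules))  # keeps first occurrence, preserves order
    return [
        rule
        for rule in unique
        if not any(other != rule and covers(other, rule) for other in unique)
    ]


# Illustrative tags only; they are not taken from the paper's ruleset.
rules: List[Rule] = [
    (frozenset({"adj"}), frozenset({"subst"})),
    (frozenset({"adj", "nom"}), frozenset({"subst", "nom"})),
    (frozenset({"subst"}), frozenset({"subst", "gen"})),
]
kept = remove_redundant(rules)
```

In this sketch the second rule is dropped because the first one matches every n-gram it matches, while the third rule survives because no other rule covers it. The actual system may define covering differently, for example by taking fuzzy weights or transformation outputs into account.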

Metadata
Title
Enhancing Concept Extraction from Polish Texts with Rule Management
Author
Piotr Szwed
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-34099-9_27