Computer Science and Information Systems 2012 Volume 9, Issue 1, Pages: 23-47
https://doi.org/10.2298/CSIS100901038S
Representation of texts in structured form
Stanojević Mladen (Institute Mihajlo Pupin, Belgrade)
Vraneš Sanja (Institute Mihajlo Pupin, Belgrade)
Although existing knowledge representation techniques, ranging from
relational databases to the most recent Semantic Web languages, are
successfully applied in numerous practical applications, they are still
unable to represent the information contained in text documents and web pages
in a structured form suitable for productive text processing. Text files can
represent text documents with no loss of information; however, this
information is represented in an unstructured form. Various knowledge
formalisms used in different phases of Natural Language Understanding, such
as lexical, syntactic, semantic, pragmatic and discourse analysis, are still
unable to represent texts in structured form with no loss of information. In
this paper, we define the crucial requirements for structured text
representation and then give a brief introduction to a representation
technique that fulfills all these requirements, including the basic data
types and learning techniques used to create, maintain and interpret the
resulting representation formalism.
Keywords: structured representation, learning, text processing, natural language understanding, regular languages