2004 | OriginalPaper | Book chapter
Automatic utterance boundaries recognition in large Polish text corpora
Authors: Michał Rudolf, Marek Świdziński
Published in: Intelligent Information Processing and Web Mining
Publisher: Springer Berlin Heidelberg
Included in: Professional Book Archive
The paper reports on the first step of automatic Polish text analysis, an analysis that aims to assign a structure to the units of the input text. Our aim is to present an effective method for segmenting text corpora into utterances, the highest-level syntactic units. We have implemented the method, and the results look promising. In the experiments we used fragments of the 60-million-word corpus of the PWN Publishing House, which is available in digital form.
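To illustrate what utterance boundary recognition has to cope with, here is a minimal sketch of a naive punctuation-based baseline. It is not the authors' method, which the abstract does not describe. It assumes that a boundary is terminal punctuation followed by whitespace and a capital letter, and that a period after a known abbreviation is not a boundary. The abbreviation set `ABBREVIATIONS` and the function name `split_utterances` are hypothetical and chosen for illustration only; a real list for the PWN corpus would be far larger.

```python
import re

# Hypothetical, deliberately incomplete set of Polish abbreviations written with
# a trailing period ("np." = "for example", "prof." = "professor", ...).
# Note: "itd." and "itp." often DO end a sentence; treating them as
# non-boundaries is a simplifying assumption of this sketch.
ABBREVIATIONS = {"np", "itd", "itp", "tzn", "tj", "prof", "ok", "ul", "godz"}

# Candidate boundary: terminal punctuation, whitespace, then (looking ahead)
# an optional opening quote/bracket/dash and an uppercase letter, including
# the Polish diacritic capitals.
CANDIDATE = re.compile(r'([.!?…]+)(\s+)(?=["„«(\-–]?[A-ZĄĆĘŁŃÓŚŹŻ])')


def split_utterances(text: str) -> list[str]:
    """Split text into utterance candidates using punctuation heuristics only."""
    utterances = []
    start = 0
    for match in CANDIDATE.finditer(text):
        # Word immediately before the punctuation mark, if any.
        word = re.search(r"(\w+)$", text[start:match.start(1)])
        last_word = word.group(1).lower() if word else ""
        # A single period after a known abbreviation is not treated as a boundary.
        if match.group(1) == "." and last_word in ABBREVIATIONS:
            continue
        utterances.append(text[start:match.end(1)].strip())
        start = match.end()  # next utterance starts after the whitespace
    tail = text[start:].strip()
    if tail:
        utterances.append(tail)
    return utterances


if __name__ == "__main__":
    sample = "Przyszedł prof. Nowak. Był np. Kowalski i inni. Czy to prawda? Tak!"
    for utterance in split_utterances(sample):
        print(utterance)
```

On the sample, the periods after "prof" and "np" are skipped, giving four utterances. A baseline like this fails on cases such as ordinal numbers written with a period ("5. maja") or abbreviations that genuinely close a sentence, which is why segmentation into syntactic utterances calls for more than surface punctuation rules.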