
2004 | OriginalPaper | Book chapter

Automatic utterance boundaries recognition in large Polish text corpora

Authors: Michał Rudolf, Marek Świdziński

Published in: Intelligent Information Processing and Web Mining

Publisher: Springer Berlin Heidelberg


This paper reports on the first step in the automatic analysis of Polish text, an analysis whose aim is to assign a structure to the input text units. We present an effective method for segmenting text corpora into utterances, which are the highest-level syntactic units. The method has been implemented, and the results look promising. The experiments used fragments of the 60-million-word corpus of the PWN Publishing House, which is available in digital form.
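The abstract does not describe how the authors detect utterance boundaries. To make the task itself concrete, here is a minimal, generic rule-based sketch. It is not the authors' method. It assumes that a token ending in terminal punctuation (`.`, `!`, `?`, `…`) closes an utterance when the next token starts with an uppercase letter or an opening quote or dash, unless the token is a known abbreviation. The function name `split_utterances` and the abbreviation list are hypothetical illustrations, not taken from the paper.

```python
# Hypothetical abbreviation list for illustration only (not from the paper).
# A period after one of these tokens is assumed NOT to end an utterance.
ABBREVIATIONS = {"np.", "tzn.", "m.in.", "prof.", "ul.", "godz."}

TERMINAL = ".!?…"          # punctuation that may close an utterance
OPENERS = "„\"–-"          # characters that may open the next utterance
CLOSERS = "\"”»)"          # closing quotes/brackets stripped before the check


def split_utterances(text: str) -> list[str]:
    """Split text into utterances with a simple boundary heuristic.

    Whitespace is normalised: tokens are rejoined with single spaces.
    """
    tokens = text.split()
    utterances: list[str] = []
    current: list[str] = []
    for i, tok in enumerate(tokens):
        current.append(tok)
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        # Known abbreviations never close an utterance in this sketch.
        if tok.lower() in ABBREVIATIONS:
            continue
        core = tok.rstrip(CLOSERS)
        ends_with_terminal = bool(core) and core[-1] in TERMINAL
        # Boundary if text ends here or the next token looks like a new utterance.
        next_opens = nxt is None or nxt[0].isupper() or nxt[0] in OPENERS
        if ends_with_terminal and next_opens:
            utterances.append(" ".join(current))
            current = []
    if current:  # trailing material without terminal punctuation
        utterances.append(" ".join(current))
    return utterances


if __name__ == "__main__":
    sample = "Prof. Nowak przyszedł. Czy to prawda? Tak, np. w Warszawie. Koniec"
    for u in split_utterances(sample):
        print(u)
```

On the sample above, the sketch yields `Prof. Nowak przyszedł.`, `Czy to prawda?`, `Tak, np. w Warszawie.` and `Koniec`. A fixed abbreviation list is the weak point of this approach. Forms such as "itd." frequently do end a sentence, and that ambiguity is one reason large corpora call for more careful methods than this heuristic.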

Metadata
Title
Automatic utterance boundaries recognition in large Polish text corpora
Authors
Michał Rudolf
Marek Świdziński
Copyright year
2004
Publisher
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/978-3-540-39985-8_26
