Understanding what makes a written text sound as though it was written by its author has been an unsolved problem for hundreds of years. The attributes of authorship are often lumped together in attempts to identify an unknown author. Investigating a single attribute in isolation, by eliminating the effect of all the others, has received little attention. One debated attribute is the size of the text segments into which authors group words. Texts consist of such segments, namely sentences, which vary in length, and the distribution of these lengths is assumed to be characteristic of the author. By comparing the statistics of paired text samples, we can show that differences in these statistics do in fact indicate differences in the authorship of the texts. However, certain choices of metrics and units easily lead to random and meaningless results.
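The abstract does not specify the chapter's metrics or units. The Python sketch below is purely illustrative of what comparing sentence-length distributions between two samples could look like. It rests on three assumptions:

- Sentences are split naively at `.`, `!` or `?` followed by whitespace.
- Length is measured either in words or in characters.
- The two distributions are compared with the two-sample Kolmogorov-Smirnov statistic, which is a swapped-in choice and not necessarily the authors' metric.

The function names `sentence_lengths` and `ks_statistic` are hypothetical.

```python
import re
from bisect import bisect_right


def sentence_lengths(text, unit="words"):
    """Return the length of each sentence in `text` (illustrative sketch).

    Assumption: sentences end at '.', '!' or '?' followed by whitespace;
    a real tokenizer would also handle abbreviations, quotes, etc.
    `unit` selects whether length is counted in words or characters.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if unit == "words":
        return [len(s.split()) for s in sentences]
    if unit == "chars":
        return [len(s) for s in sentences]
    raise ValueError(f"unknown unit: {unit!r}")


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of samples `a` and `b`."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(a), sorted(b)
    # The maximum gap between two step functions occurs at an observed value.
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )


# Usage with two toy samples (hypothetical data, not from the chapter).
sample_a = "I came. I saw. I conquered."
sample_b = "This sentence is rather long by comparison. Short one."
print(sentence_lengths(sample_a))  # [2, 2, 2]
print(sentence_lengths(sample_b))  # [7, 2]
print(ks_statistic(sentence_lengths(sample_a), sentence_lengths(sample_b)))  # 0.5
```

Switching `unit` to `"chars"` changes the distributions being compared. This is one simple way in which the choice of unit can change the result of such a comparison.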
- On sentence length distribution as an authorship attribute
- Springer Berlin Heidelberg