
2004 | Original Paper | Book Chapter

Semantic Sequence Kin: A Method of Document Copy Detection

Written by: Jun-Peng Bao, Jun-Yi Shen, Xiao-Dong Liu, Hai-Yan Liu, Xiao-Di Zhang

Published in: Advances in Knowledge Discovery and Data Mining

Publisher: Springer Berlin Heidelberg


String matching and the global word frequency model are two basic models of Document Copy Detection, although both are unsatisfactory in some respects. The String Kernel (SK) and the Word Sequence Kernel (WSK) can map string pairs directly into a new feature space in which the data is linearly separable. This idea inspired the Semantic Sequence Kin (SSK), which we apply to document copy detection. SK and WSK take into account only the gap between the first and the last word/term, which makes them poorly suited to plagiarism detection. SSK considers the position of each common word, so it can detect plagiarism at a fine granularity. SSK is based on semantic density, which is in effect local word frequency information. We believe these measures greatly diminish the noise introduced by rewording. We test SSK on a small corpus containing several common copy types. The results show that SSK is excellent at detecting non-reworded plagiarism and remains valid even when documents are reworded to some extent.
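The two ingredients named in the abstract, local word frequency and the positions of common words, can be illustrated with a small Python sketch. This is not the SSK formulation from the chapter, which the abstract does not spell out. The function names `local_frequency` and `common_word_positions`, the `radius` parameter, and the symmetric token window are all assumptions made purely for illustration.

```python
from collections import defaultdict


def local_frequency(tokens, radius=2):
    """For each position i, count how often tokens[i] occurs within
    `radius` tokens on either side of i (the word itself included).

    Assumption: a fixed symmetric window stands in for "local word
    frequency"; the chapter may define semantic density differently.
    """
    counts = []
    for i, word in enumerate(tokens):
        lo = max(0, i - radius)
        hi = min(len(tokens), i + radius + 1)
        counts.append(tokens[lo:hi].count(word))
    return counts


def common_word_positions(doc_a, doc_b):
    """Map each word that occurs in both token lists to the list of
    its positions in doc_a and the list of its positions in doc_b."""
    pos_a = defaultdict(list)
    pos_b = defaultdict(list)
    for i, word in enumerate(doc_a):
        pos_a[word].append(i)
    for i, word in enumerate(doc_b):
        pos_b[word].append(i)
    return {w: (pos_a[w], pos_b[w]) for w in pos_a if w in pos_b}
```

A comparison that keeps every position of every shared word, rather than only the span between the first and last match, has the raw material needed to compare documents at a finer granularity. How SSK actually combines these quantities into a similarity score is left to the chapter itself.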

Metadata
Title
Semantic Sequence Kin: A Method of Document Copy Detection
Written by
Jun-Peng Bao
Jun-Yi Shen
Xiao-Dong Liu
Hai-Yan Liu
Xiao-Di Zhang
Copyright Year
2004
Publisher
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/978-3-540-24775-3_63