
2018 | Original Paper | Book Chapter

Recursive Style Breach Detection with Multifaceted Ensemble Learning

Authors: Daniel Kopev, Dimitrina Zlatkova, Kristiyan Mitov, Atanas Atanasov, Momchil Hardalov, Ivan Koychev, Preslav Nakov

Published in: Artificial Intelligence: Methodology, Systems, and Applications

Publisher: Springer International Publishing


Abstract

We present a supervised approach to style change detection, which aims to predict whether the style changes within a given text document and to find the exact positions where such changes occur. In particular, we combine a TF.IDF representation of the document with features engineered specifically for the task, and we make predictions with an ensemble of diverse classifiers including SVM, Random Forest, AdaBoost, MLP, and LightGBM. Whenever the model detects a style change, we apply it recursively to locate the specific positions of the change. Our approach powered the winning system for the PAN@CLEF 2018 task on Style Change Detection.
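The recursive localisation step can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the document is pre-split into units such as paragraphs; the trained ensemble is abstracted as a hypothetical binary detector `has_change(units)`; the halving strategy, with an explicit check of the two units straddling each split point, is our own choice; and `majority_vote` is a hypothetical stand-in for the ensemble's combination rule, which the abstract does not specify. The toy `oracle` detector, which treats a change of author tag as a change of style, replaces the real TF.IDF-plus-features models purely for demonstration.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: a detector takes a list of text units (e.g. paragraphs)
# and returns True if it believes they are not all written in one style.
Detector = Callable[[Sequence[str]], bool]


def majority_vote(detectors: Sequence[Detector]) -> Detector:
    """Combine binary detectors by simple majority (an illustrative assumption)."""
    def vote(units: Sequence[str]) -> bool:
        yes = sum(1 for detect in detectors if detect(units))
        return 2 * yes > len(detectors)
    return vote


def find_change_positions(units: Sequence[str], has_change: Detector) -> List[int]:
    """Return every index i such that a style change falls between units[i-1] and units[i]."""
    def search(lo: int, hi: int) -> List[int]:
        # A segment of fewer than two units cannot contain a change; a segment
        # the detector considers uniform is pruned without further calls.
        if hi - lo < 2 or not has_change(units[lo:hi]):
            return []
        mid = (lo + hi) // 2
        left = search(lo, mid)
        right = search(mid, hi)
        # A change exactly at the split point is invisible to both halves,
        # so check the two units that straddle it.
        boundary = [mid] if has_change(units[mid - 1:mid + 1]) else []
        return left + boundary + right  # already in ascending order

    return search(0, len(units))


# Toy stand-in for the trained ensemble: each unit carries an author tag,
# and a segment "changes style" if it mixes more than one tag.
def oracle(units: Sequence[str]) -> bool:
    return len({u.split(":", 1)[0] for u in units}) > 1


paras = ["A: p1", "A: p2", "B: p3", "B: p4", "B: p5", "A: p6"]
print(find_change_positions(paras, oracle))  # [2, 5]
```

With a perfect detector this search recovers every change position, and any segment reported as stylistically uniform costs only one detector call. With a real, imperfect ensemble, a false negative on a large segment hides all changes inside it, which is one reason the quality of the underlying binary classifier matters for localisation.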




Print ISBN: 978-3-319-99343-0

Electronic ISBN: 978-3-319-99344-7

Copyright year: 2018
DOI: https://doi.org/10.1007/978-3-319-99344-7_12
