Skip to main content

2018 | OriginalPaper | Buchkapitel

Recursive Style Breach Detection with Multifaceted Ensemble Learning

verfasst von : Daniel Kopev, Dimitrina Zlatkova, Kristiyan Mitov, Atanas Atanasov, Momchil Hardalov, Ivan Koychev, Preslav Nakov

Erschienen in: Artificial Intelligence: Methodology, Systems, and Applications

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

We present a supervised approach for style change detection, which aims at predicting whether there are changes in the style in a given text document, as well as at finding the exact positions where such changes occur. In particular, we combine a TF.IDF representation of the document with features specifically engineered for the task, and we make predictions via an ensemble of diverse classifiers including SVM, Random Forest, AdaBoost, MLP, and LightGBM. Whenever the model detects that style change is present, we apply it recursively, looking to find the specific positions of the change. Our approach powered the winning system for the PAN@CLEF 2018 task on Style Change Detection.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Damerau, F.J.: A technique for computer detection and correction of spelling errors. Commun. ACM 7(3), 171–176 (1964)CrossRef Damerau, F.J.: A technique for computer detection and correction of spelling errors. Commun. ACM 7(3), 171–176 (1964)CrossRef
3.
Zurück zum Zitat Hagen, M., Potthast, M., Stein, B.: Overview of the author obfuscation task at PAN 2017: safety evaluation revisited. In: Working Notes Papers of the CLEF 2017 Evaluation Labs, CLEF 2017, vol. 1866 (2017) Hagen, M., Potthast, M., Stein, B.: Overview of the author obfuscation task at PAN 2017: safety evaluation revisited. In: Working Notes Papers of the CLEF 2017 Evaluation Labs, CLEF 2017, vol. 1866 (2017)
5.
Zurück zum Zitat Karaś, D., Śpiewak, M., Sobecki, P.: OPI-JSA at CLEF 2017: author clustering and style breach detection-notebook for PAN at CLEF 2017. In: CLEF 2017 Evaluation Labs and Workshop - Working Notes Papers, CLEF 2017, Dublin, Ireland (2017) Karaś, D., Śpiewak, M., Sobecki, P.: OPI-JSA at CLEF 2017: author clustering and style breach detection-notebook for PAN at CLEF 2017. In: CLEF 2017 Evaluation Labs and Workshop - Working Notes Papers, CLEF 2017, Dublin, Ireland (2017)
6.
Zurück zum Zitat Ke, G., et al: LightGBM: a highly efficient gradient boosting decision tree. In: Proceedings of the 30th Annual Conference on Neural Information Processing Systems, NIPS 2017, Long Beach, California, pp. 3146–3154 (2017) Ke, G., et al: LightGBM: a highly efficient gradient boosting decision tree. In: Proceedings of the 30th Annual Conference on Neural Information Processing Systems, NIPS 2017, Long Beach, California, pp. 3146–3154 (2017)
7.
Zurück zum Zitat Kestemont, M., et al.: Overview of the author identification task at PAN-2018: cross-domain authorship attribution and style change detection. In: Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum, CLEF 2018, Avignon, France (2018) Kestemont, M., et al.: Overview of the author identification task at PAN-2018: cross-domain authorship attribution and style change detection. In: Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum, CLEF 2018, Avignon, France (2018)
8.
Zurück zum Zitat Khan, J.: Style breach detection: an unsupervised detection model–notebook for PAN at CLEF 2017. In: CLEF 2017 Evaluation Labs and Workshop - Working Notes Papers, CLEF 2017, Dublin, Ireland (2017) Khan, J.: Style breach detection: an unsupervised detection model–notebook for PAN at CLEF 2017. In: CLEF 2017 Evaluation Labs and Workshop - Working Notes Papers, CLEF 2017, Dublin, Ireland (2017)
9.
Zurück zum Zitat Kuznetsov, M., Motrenko, A., Kuznetsova, R., Strijov, V.: Methods for intrinsic plagiarism detection and author diarization–notebook for PAN at CLEF 2016. In: CLEF 2016 Evaluation Labs and Workshop - Working Notes Papers, CLEF 2016, Évora, Portugal (2016) Kuznetsov, M., Motrenko, A., Kuznetsova, R., Strijov, V.: Methods for intrinsic plagiarism detection and author diarization–notebook for PAN at CLEF 2016. In: CLEF 2016 Evaluation Labs and Workshop - Working Notes Papers, CLEF 2016, Évora, Portugal (2016)
10.
Zurück zum Zitat Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions and reversals. Sov. Phys. Dokl. 10, 707 (1966)MathSciNet Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions and reversals. Sov. Phys. Dokl. 10, 707 (1966)MathSciNet
11.
Zurück zum Zitat Loper, E., Bird, S.: NLTK: the natural language toolkit. In: Proceedings of the ACL Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics, ETMTNLP 2002, Philadelphia, Pennsylvania, pp. 63–70 (2002) Loper, E., Bird, S.: NLTK: the natural language toolkit. In: Proceedings of the ACL Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics, ETMTNLP 2002, Philadelphia, Pennsylvania, pp. 63–70 (2002)
12.
Zurück zum Zitat Mihaylova, T., Karadjov, G., Kiprov, Y., Georgiev, G., Koychev, I., Nakov, P.: SU@PAN’2016: author obfuscation. In: Working Notes of CLEF 2016 - Conference and Labs of the Evaluation forum, CLEF 2016, Évora, Portugal, pp. 956–969 (2016) Mihaylova, T., Karadjov, G., Kiprov, Y., Georgiev, G., Koychev, I., Nakov, P.: SU@PAN’2016: author obfuscation. In: Working Notes of CLEF 2016 - Conference and Labs of the Evaluation forum, CLEF 2016, Évora, Portugal, pp. 956–969 (2016)
13.
Zurück zum Zitat Pervaz, I., Ameer, I., Sittar, A., Nawab, R.: Identification of author personality traits using stylistic features–notebook for PAN at CLEF 2015. In: CLEF 2015 Evaluation Labs and Workshop - Working Notes Papers, CLEF 2015, Toulouse, France (2015) Pervaz, I., Ameer, I., Sittar, A., Nawab, R.: Identification of author personality traits using stylistic features–notebook for PAN at CLEF 2015. In: CLEF 2015 Evaluation Labs and Workshop - Working Notes Papers, CLEF 2015, Toulouse, France (2015)
14.
Zurück zum Zitat Pevzner, L., Hearst, M.A.: A critique and improvement of an evaluation metric for text segmentation. Comput. Linguist. 28(1), 19–36 (2002)CrossRef Pevzner, L., Hearst, M.A.: A critique and improvement of an evaluation metric for text segmentation. Comput. Linguist. 28(1), 19–36 (2002)CrossRef
15.
Zurück zum Zitat Potthast, M., Hagen, M., Stein, B.: Author obfuscation: attacking the state of the art in authorship verification. In: Working Notes Papers of the CLEF 2016 Evaluation Labs, CLEF 2016, Évora, Portugal (2016) Potthast, M., Hagen, M., Stein, B.: Author obfuscation: attacking the state of the art in authorship verification. In: Working Notes Papers of the CLEF 2016 Evaluation Labs, CLEF 2016, Évora, Portugal (2016)
16.
Zurück zum Zitat Safin, K., Kuznetsova, R.: Style breach detection with neural sentence embeddings–notebook for PAN at CLEF 2017. In: CLEF 2017 Evaluation Labs and Workshop - Working Notes Papers, CLEF 2017, Dublin, Ireland (2017) Safin, K., Kuznetsova, R.: Style breach detection with neural sentence embeddings–notebook for PAN at CLEF 2017. In: CLEF 2017 Evaluation Labs and Workshop - Working Notes Papers, CLEF 2017, Dublin, Ireland (2017)
17.
Zurück zum Zitat Scaiano, M., Inkpen, D.: Getting more from segmentation evaluation. In: Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2012, Montreal, Canada, pp. 362–366 (2012) Scaiano, M., Inkpen, D.: Getting more from segmentation evaluation. In: Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2012, Montreal, Canada, pp. 362–366 (2012)
18.
Zurück zum Zitat Sittar, A., Iqbal, H., Nawab, R.: Author diarization using cluster-distance approach-notebook for PAN at CLEF 2016. In: CLEF 2016 Evaluation Labs and Workshop - Working Notes Papers, CLEF 2016, Évora, Portugal (2016) Sittar, A., Iqbal, H., Nawab, R.: Author diarization using cluster-distance approach-notebook for PAN at CLEF 2016. In: CLEF 2016 Evaluation Labs and Workshop - Working Notes Papers, CLEF 2016, Évora, Portugal (2016)
19.
Zurück zum Zitat Tschuggnall, M., et al.: Overview of the author identification task at PAN-2017: style breach detection and author clustering. In: Working Notes Papers of the CLEF 2017 Evaluation Labs, CLEF 2017, Dublin, Ireland (2017) Tschuggnall, M., et al.: Overview of the author identification task at PAN-2017: style breach detection and author clustering. In: Working Notes Papers of the CLEF 2017 Evaluation Labs, CLEF 2017, Dublin, Ireland (2017)
20.
Zurück zum Zitat Zlatkova, D., et al.: An ensemble-rich multi-aspect approach towards robust style change detection: notebook for PAN at CLEF 2018. In: Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum, CLEF 2018, Avignon, France (2018) Zlatkova, D., et al.: An ensemble-rich multi-aspect approach towards robust style change detection: notebook for PAN at CLEF 2018. In: Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum, CLEF 2018, Avignon, France (2018)
Metadaten
Titel
Recursive Style Breach Detection with Multifaceted Ensemble Learning
verfasst von
Daniel Kopev
Dimitrina Zlatkova
Kristiyan Mitov
Atanas Atanasov
Momchil Hardalov
Ivan Koychev
Preslav Nakov
Copyright-Jahr
2018
DOI
https://doi.org/10.1007/978-3-319-99344-7_12