Skip to main content
Top

2017 | OriginalPaper | Chapter

Wikipedia Vandal Early Detection: From User Behavior to User Embedding

Authors : Shuhan Yuan, Panpan Zheng, Xintao Wu, Yang Xiang

Published in: Machine Learning and Knowledge Discovery in Databases

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Wikipedia is the largest online encyclopedia that allows anyone to edit articles. In this paper, we propose the use of deep learning to detect vandals based on their edit history. In particular, we develop a multi-source long-short term memory network (M-LSTM) to model user behaviors by using a variety of user edit aspects as inputs, including the history of edit reversion information, edit page titles and categories. With M-LSTM, we can encode each user into a low dimensional real vector, called user embedding. Meanwhile, as a sequential model, M-LSTM updates the user embedding each time after the user commits a new edit. Thus, we can predict whether a user is benign or vandal dynamically based on the up-to-date user embedding. Furthermore, those user embeddings are crucial to discover collaborative vandals. Code and data related to this chapter are available at: https://​bitbucket.​org/​bookcold/​vandal_​detection.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Adler, B.T., de Alfaro, L.: A content-driven reputation system for the wikipedia. In: WWW, pp. 261–270 (2007) Adler, B.T., de Alfaro, L.: A content-driven reputation system for the wikipedia. In: WWW, pp. 261–270 (2007)
2.
go back to reference Adler, B.T., de Alfaro, L., Mola-Velasco, S.M., Rosso, P., West, A.G.: Wikipedia vandalism detection: combining natural language, metadata, and reputation features. In: CICLing, pp. 277–288 (2011) Adler, B.T., de Alfaro, L., Mola-Velasco, S.M., Rosso, P., West, A.G.: Wikipedia vandalism detection: combining natural language, metadata, and reputation features. In: CICLing, pp. 277–288 (2011)
4.
go back to reference Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. In: ICLR (2015) Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. In: ICLR (2015)
5.
go back to reference Bengio, Y., Courville, A., Vincent, P.: Representation learning: a review and new perspectives. TPAMI 35(8), 1798–1828 (2013)CrossRef Bengio, Y., Courville, A., Vincent, P.: Representation learning: a review and new perspectives. TPAMI 35(8), 1798–1828 (2013)CrossRef
6.
go back to reference Chang, S., Han, W., Tang, J., Qi, G.J., Aggarwal, C.C., Huang, T.S.: Heterogeneous network embedding via deep architectures. In: KDD (2015) Chang, S., Han, W., Tang, J., Qi, G.J., Aggarwal, C.C., Huang, T.S.: Heterogeneous network embedding via deep architectures. In: KDD (2015)
7.
go back to reference Ester, M., Kriegel, H.P., Sander, J., Xu, X., et al.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: KDD, pp. 226–231 (1996) Ester, M., Kriegel, H.P., Sander, J., Xu, X., et al.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: KDD, pp. 226–231 (1996)
8.
go back to reference Fei, G., Mukherjee, A., Liu, B., Hsu, M., Castellanos, M., Ghosh, R.: Exploiting burstiness in reviews for review spammer detection. In: ICWSM, pp. 175–184 (2013) Fei, G., Mukherjee, A., Liu, B., Hsu, M., Castellanos, M., Ghosh, R.: Exploiting burstiness in reviews for review spammer detection. In: ICWSM, pp. 175–184 (2013)
9.
go back to reference Graves, A., Mohamed, A.R., Hinton, G.: Speech recognition with deep recurrent neural networks. In: ICASSP, pp. 6645–6649 (2013) Graves, A., Mohamed, A.R., Hinton, G.: Speech recognition with deep recurrent neural networks. In: ICASSP, pp. 6645–6649 (2013)
10.
go back to reference Grover, A., Leskovec, J.: node2vec: scalable feature learning for networks. In: KDD (2016) Grover, A., Leskovec, J.: node2vec: scalable feature learning for networks. In: KDD (2016)
11.
go back to reference He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
12.
go back to reference Heindorf, S., Potthast, M., Stein, B., Engels, G.: Vandalism detection in wikidata. In: CIKM, pp. 327–336 (2016) Heindorf, S., Potthast, M., Stein, B., Engels, G.: Vandalism detection in wikidata. In: CIKM, pp. 327–336 (2016)
13.
go back to reference Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)CrossRef Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)CrossRef
14.
go back to reference Javanmardi, S., McDonald, D.W., Lopes, C.V.: Vandalism detection in wikipedia: a high-performing, feature-rich model and its reduction through lasso. In: WikiSym, pp. 82–90 (2011) Javanmardi, S., McDonald, D.W., Lopes, C.V.: Vandalism detection in wikipedia: a high-performing, feature-rich model and its reduction through lasso. In: WikiSym, pp. 82–90 (2011)
15.
go back to reference Kumar, S., Spezzano, F., Subrahmanian, V.: Vews: a wikipedia vandal early warning system. In: KDD, pp. 607–616 (2015) Kumar, S., Spezzano, F., Subrahmanian, V.: Vews: a wikipedia vandal early warning system. In: KDD, pp. 607–616 (2015)
16.
go back to reference Li, J., Zhu, J., Zhang, B.: Discriminative deep random walk for network classification. In: ACL (2016) Li, J., Zhu, J., Zhang, B.: Discriminative deep random walk for network classification. In: ACL (2016)
17.
go back to reference Lim, E.P., Nguyen, V.A., Jindal, N., Liu, B., Lauw, H.W.: Detecting product review spammers using rating behaviors. In: CIKM, pp. 939–948 (2010) Lim, E.P., Nguyen, V.A., Jindal, N., Liu, B., Lauw, H.W.: Detecting product review spammers using rating behaviors. In: CIKM, pp. 939–948 (2010)
18.
go back to reference Maaten, L.V.D., Hinton, G.: Visualizing data using t-SNE. JMLR 9, 2579–2605 (2008)MATH Maaten, L.V.D., Hinton, G.: Visualizing data using t-SNE. JMLR 9, 2579–2605 (2008)MATH
19.
go back to reference McKeown, K., Wang, W.: Got you!: automatic vandalism detection in wikipedia with web-based shallow syntactic-semantic modeling. In: COLING, pp. 1146–1154 (2010) McKeown, K., Wang, W.: Got you!: automatic vandalism detection in wikipedia with web-based shallow syntactic-semantic modeling. In: COLING, pp. 1146–1154 (2010)
20.
go back to reference Mikolov, T., Corrado, G., Chen, K., Dean, J.: Efficient estimation of word representations in vector space. In: ICLR (2013) Mikolov, T., Corrado, G., Chen, K., Dean, J.: Efficient estimation of word representations in vector space. In: ICLR (2013)
21.
go back to reference Mola-Velasco, S.M.: Wikipedia vandalism detection through machine learning: feature review and new proposals. arXiv:1210.5560 [cs] (2012) Mola-Velasco, S.M.: Wikipedia vandalism detection through machine learning: feature review and new proposals. arXiv:​1210.​5560 [cs] (2012)
22.
go back to reference Mukherjee, A., Venkataraman, V., Liu, B., Glance, N.S.: What yelp fake review filter might be doing? In: ICWSM, pp. 409–418 (2013) Mukherjee, A., Venkataraman, V., Liu, B., Glance, N.S.: What yelp fake review filter might be doing? In: ICWSM, pp. 409–418 (2013)
23.
go back to reference Ntoulas, A., Najork, M., Manasse, M., Fetterly, D.: Detecting spam web pages through content analysis. In: WWW, pp. 83–92 (2006) Ntoulas, A., Najork, M., Manasse, M., Fetterly, D.: Detecting spam web pages through content analysis. In: WWW, pp. 83–92 (2006)
24.
go back to reference Papadimitriou, P., Dasdan, A., Garcia-Molina, H.: Web graph similarity for anomaly detection. J. Internet Serv. Appl. 1(1), 19–30 (2010)CrossRef Papadimitriou, P., Dasdan, A., Garcia-Molina, H.: Web graph similarity for anomaly detection. J. Internet Serv. Appl. 1(1), 19–30 (2010)CrossRef
25.
go back to reference Pennington, J., Socher, R., Manning, C.D.: Glove: global vectors for word representation. In: EMNLP, pp. 1532–1543 (2014) Pennington, J., Socher, R., Manning, C.D.: Glove: global vectors for word representation. In: EMNLP, pp. 1532–1543 (2014)
26.
go back to reference Perozzi, B., Al-Rfou, R., Skiena, S.: Deepwalk: online learning of social representations. In: KDD, pp. 701–710 (2014) Perozzi, B., Al-Rfou, R., Skiena, S.: Deepwalk: online learning of social representations. In: KDD, pp. 701–710 (2014)
27.
go back to reference Spirin, N., Han, J.: Survey on web spam detection: principles and algorithms. SIGKDD Explor. Newsl. 13(2), 50–64 (2011)CrossRef Spirin, N., Han, J.: Survey on web spam detection: principles and algorithms. SIGKDD Explor. Newsl. 13(2), 50–64 (2011)CrossRef
28.
go back to reference Sun, J., Qu, H., Chakrabarti, D., Faloutsos, C.: Neighborhood formation and anomaly detection in bipartite graphs. In: ICDM, pp. 1–8 (2005) Sun, J., Qu, H., Chakrabarti, D., Faloutsos, C.: Neighborhood formation and anomaly detection in bipartite graphs. In: ICDM, pp. 1–8 (2005)
29.
go back to reference Tang, J., Qu, M., Mei, Q.: PTE: Predictive text embedding through large-scale heterogeneous text networks. In: KDD (2015) Tang, J., Qu, M., Mei, Q.: PTE: Predictive text embedding through large-scale heterogeneous text networks. In: KDD (2015)
30.
go back to reference Tang, J., Qu, M., Wang, M., Zhang, M., Yan, J., Mei, Q.: Line: large-scale information network embedding. In: WWW, pp. 1067–1077 (2015) Tang, J., Qu, M., Wang, M., Zhang, M., Yan, J., Mei, Q.: Line: large-scale information network embedding. In: WWW, pp. 1067–1077 (2015)
31.
go back to reference West, A.G., Kannan, S., Lee, I.: Detecting wikipedia vandalism via spatio-temporal analysis of revision metadata? In: EUROSEC, pp. 22–28 (2010) West, A.G., Kannan, S., Lee, I.: Detecting wikipedia vandalism via spatio-temporal analysis of revision metadata? In: EUROSEC, pp. 22–28 (2010)
32.
go back to reference Xie, S., Wang, G., Lin, S., Yu, P.S.: Review spam detection via temporal pattern discovery. In: KDD, pp. 823–831 (2012) Xie, S., Wang, G., Lin, S., Yu, P.S.: Review spam detection via temporal pattern discovery. In: KDD, pp. 823–831 (2012)
33.
go back to reference Yang, Z., Yang, D., Dyer, C., He, X., Smola, A., Hovy, E.: Hierarchical attention networks for document classification. In: NAACL, pp. 1480–1489 (2016) Yang, Z., Yang, D., Dyer, C., He, X., Smola, A., Hovy, E.: Hierarchical attention networks for document classification. In: NAACL, pp. 1480–1489 (2016)
34.
go back to reference Ying, X., Wu, X., Barbará, D.: Spectrum based fraud detection in social networks. In: Proceedings of the 27th International Conference on Data Engineering, Hannover, Germany, pp. 912–923. ICDE 2011, 11–16 April 2011 (2011) Ying, X., Wu, X., Barbará, D.: Spectrum based fraud detection in social networks. In: Proceedings of the 27th International Conference on Data Engineering, Hannover, Germany, pp. 912–923. ICDE 2011, 11–16 April 2011 (2011)
35.
go back to reference Yuan, S., Wu, X., Li, J., Lu, A.: Spectrum-based deep neural networks for fraud detection. CoRR abs/1706.00891 (2017) Yuan, S., Wu, X., Li, J., Lu, A.: Spectrum-based deep neural networks for fraud detection. CoRR abs/1706.00891 (2017)
Metadata
Title
Wikipedia Vandal Early Detection: From User Behavior to User Embedding
Authors
Shuhan Yuan
Panpan Zheng
Xintao Wu
Yang Xiang
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-71249-9_50

Premium Partner