
2017 | OriginalPaper | Book chapter

Protection Against Information in eSociety: Using Data Mining Methods to Counteract Unwanted and Malicious Data

Authors: Igor Kotenko, Igor Saenko, Andrey Chechulin

Published in: Digital Transformation and Global Society

Publisher: Springer International Publishing


Abstract

Despite the benefits of the Internet and social networks within the concept of eSociety, the huge data collections that Internet users can view and analyze may contain unwanted or malicious information. The paper considers the problem of protecting users of the “electronic society” infrastructure against such information. It discusses the nature of the problem and possible approaches to its solution. To solve the problem, it proposes a modular approach to constructing automated systems of protection against information, based on Data Mining methods. We consider the implementation of a system of protection against unwanted and harmful content, based on a classifier with a three-level hierarchical architecture. Its experimental evaluation, which confirmed the high efficiency of the system for most of the analyzed categories of web sites, is also discussed.
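The abstract does not say what the three levels of the classifier are, so the following Python sketch is only one possible reading, not the authors' implementation. It assumes that level 1 scores each feature source separately, level 2 combines those scores per category, and level 3 makes the final decision. The keyword lists, the two feature sources (`text`, `url`), the weights, the threshold, and all function names are hypothetical.

```python
import re

# Hypothetical keyword lists; a real system would learn these from labeled pages.
KEYWORDS = {
    "gambling": {"casino", "poker", "bet"},
    "news": {"report", "election", "weather"},
}


def tokenize(text):
    """Split a string into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def base_score(tokens, category):
    """Fraction of tokens that appear in the category's keyword list."""
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in KEYWORDS[category])
    return hits / len(tokens)


def level1(page):
    """Level 1 (assumed): one base score per category and feature source."""
    sources = {"text": tokenize(page["text"]), "url": tokenize(page["url"])}
    return {
        cat: {src: base_score(toks, cat) for src, toks in sources.items()}
        for cat in KEYWORDS
    }


def level2(scores, weights=None):
    """Level 2 (assumed): weighted sum of the base scores of each category."""
    weights = weights or {"text": 0.5, "url": 0.5}
    return {
        cat: sum(weights[src] * value for src, value in per_source.items())
        for cat, per_source in scores.items()
    }


def level3(combined, threshold=0.1):
    """Level 3 (assumed): best category, or 'unknown' below the threshold."""
    best = max(combined, key=combined.get)
    return best if combined[best] >= threshold else "unknown"


def classify(page):
    """Run a page (dict with 'url' and 'text') through all three levels."""
    return level3(level2(level1(page)))


page = {
    "url": "http://example.com/casino/poker",
    "text": "Play poker and bet at our casino tonight",
}
print(classify(page))
```

Keeping the levels as separate functions mirrors the modular approach the abstract mentions: a feature source or a combination rule can be swapped without touching the other levels.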


Metadata
Title
Protection Against Information in eSociety: Using Data Mining Methods to Counteract Unwanted and Malicious Data
Authors
Igor Kotenko
Igor Saenko
Andrey Chechulin
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-69784-0_15