
2018 | Original Paper | Book Chapter

Detecting and Characterizing Web Bot Traffic in a Large E-commerce Marketplace

Written by: Haitao Xu, Zhao Li, Chen Chu, Yuanmi Chen, Yifan Yang, Haifeng Lu, Haining Wang, Angelos Stavrou

Published in: Computer Security

Publisher: Springer International Publishing


Abstract

A certain amount of web traffic on the Internet is attributed to web bots. Web bot traffic has raised serious concerns among website operators because it usually consumes considerable resources at web servers, resulting in high workloads and longer response times, while not bringing in any profit. Even worse, the content of the crawled pages might later be used for other fraudulent activities. Thus, it is important to detect web bot traffic and characterize it. In this paper, we first propose an efficient approach to detect web bot traffic in a large e-commerce marketplace and then perform an in-depth analysis of the characteristics of web bot traffic. Specifically, our proposed bot detection approach consists of the following modules: (1) an Expectation Maximization (EM)-based feature selection method to extract the most distinguishable features, (2) a gradient-based decision tree to calculate the likelihood of being a bot IP, and (3) a threshold estimation mechanism aiming to recover a reasonable amount of non-bot traffic flow. The detection approach has been applied to the Taobao/Tmall platforms, and its detection capability has been demonstrated by identifying a considerable amount of web bot traffic. Based on data samples of traffic originating from web bots and normal users, we conduct a comparative analysis to uncover the behavioral patterns of web bots that differ from those of normal users. The analysis results reveal their differences in terms of active time, search queries, item and store preferences, and many other aspects. These findings provide new insights for public websites to further improve web bot traffic detection and protect valuable web content.
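The abstract does not spell out how the threshold in module (3) is estimated. The following is a minimal sketch of one possible reading, not the authors' method: given a bot likelihood and a page-view count per IP, it picks the most aggressive cutoff that still leaves a chosen share of traffic classified as non-bot. The function name `estimate_threshold`, the input layout, and the parameter `target_non_bot_share` are all hypothetical.

```python
import math
from typing import List, Tuple


def estimate_threshold(
    scored_ips: List[Tuple[float, int]],
    target_non_bot_share: float,
) -> float:
    """Pick a bot-likelihood cutoff (hypothetical helper, not from the paper).

    scored_ips: (bot_likelihood, page_views) per IP. IPs whose likelihood
    is >= the returned cutoff are treated as bots.
    Returns the smallest cutoff for which the traffic of the remaining
    (non-bot) IPs is at least target_non_bot_share of all traffic, or
    math.inf if only flagging nothing satisfies the target.
    """
    total_views = sum(views for _, views in scored_ips)
    if total_views <= 0:
        raise ValueError("total page views must be positive")

    ordered = sorted(scored_ips)  # ascending by bot likelihood
    kept_views = 0  # traffic of IPs scoring strictly below the candidate
    i = 0
    while i < len(ordered):
        candidate = ordered[i][0]
        if kept_views / total_views >= target_non_bot_share:
            return candidate
        # Move every IP with this exact score to the non-bot side
        # before trying the next, higher candidate cutoff.
        while i < len(ordered) and ordered[i][0] == candidate:
            kept_views += ordered[i][1]
            i += 1
    return math.inf


# Made-up illustrative data: (bot likelihood, page views) per IP.
sample = [(0.05, 500), (0.10, 300), (0.60, 100), (0.95, 100)]
cutoff = estimate_threshold(sample, target_non_bot_share=0.75)
```

With the made-up sample, a cutoff of 0.60 flags the two highest-scoring IPs and keeps 800 of 1000 page views as non-bot traffic. In practice the likelihoods would come from the trained classifier of module (2), and the target share would have to be derived from an estimate of legitimate traffic volume.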


Footnotes
1
Click-through rate is calculated as the total number of clicks on the product detail webpage divided by the number of impressions of the product information in the Taobao/Tmall search engine return results.
 
2
Both the y-axis and x-axis (denoting the time in 2015) values are deliberately hidden for confidentiality reasons.
 
3
When you search for a keyword on an e-commerce website, the resulting page is the so-called search result page. A search result page view is a page view of search results, for example when searching for a keyword or going to the next page of the search results.
 
4
We use the two terms “wireless” and “mobile” interchangeably.
 
5
Note that not all not-logged-on visitors were deemed bots by Alibaba IT teams. In addition, a user could be logged on to both PC and wireless devices.
 
6
Note that the favorite here refers to the favorite feature provided by the e-commerce sites, rather than the bookmark features of modern web browsers.
 
7
In essence, a direct click on an item is the same as a direct access to the item’s detail page via its URL.
 
8
The number was calculated with the dataset and cannot be inferred from Fig. 14.
 
9
An item is deemed visited if its detail page is viewed or retrieved.
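The click-through rate defined in footnote 1 can be written as a one-line calculation. The sketch below only restates that definition; the function and parameter names are hypothetical, and the numbers are made up for illustration.

```python
def click_through_rate(detail_page_clicks: int, search_impressions: int) -> float:
    """Clicks on the product detail page divided by the number of
    impressions of the product in the search results (footnote 1).

    Names are illustrative and do not come from the paper.
    """
    if search_impressions <= 0:
        raise ValueError("search_impressions must be positive")
    return detail_page_clicks / search_impressions


# Made-up example: 250 detail-page clicks out of 1000 search impressions.
example_ctr = click_through_rate(250, 1000)
```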
 
Metadata
Title
Detecting and Characterizing Web Bot Traffic in a Large E-commerce Marketplace
Written by
Haitao Xu
Zhao Li
Chen Chu
Yuanmi Chen
Yifan Yang
Haifeng Lu
Haining Wang
Angelos Stavrou
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-98989-1_8