Published in: Annals of Data Science 2/2019

6 June 2018

Mining and Classifying Images from an Advertisement Image Remover

Author: Graeme O’Meara

Abstract

AdEater is an early browsing assistant that automatically removes advertisement images from web pages. It generates rules from training data and applies them while the user browses. Advertisement images are replaced by transparent images displaying the word “ad”; when an image is misclassified, a non-advertisement image is replaced in the same way. This paper critically examines the dataset derived from a trial of AdEater and attempts to build a robust image classifier. We apply data mining techniques to uncover associations between features of advertisements and non-advertisements, and we predict whether an image is an advertisement using three classification methods. Using k-fold cross-validation to train and test the model, we achieve a classification accuracy of 96.5%.

Footnotes
2
Fiol-Roig et al. [2].
 
3
See Electronic Supplementary Material. Full code available upon request.
 
4
Agrawal et al. [8].
 
5
Absent features are indicated by the value 0. We recoded these as 1 and recoded present features (originally 1) as 0 in order to examine rules between absent image features.
 
6
For brevity, lists of more than 20 rules are not reported. These rules are available upon request.
 
7
\( \frac{\mathrm{Lift}(A \to B) - L}{U - L} \).
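As an illustration of this standardisation, the following is a minimal Python sketch, not the paper's code. The excerpt does not define L and U, so the sketch assumes the simplest choice: the Fréchet bounds on the lift implied by the item supports alone. The bounds used in the paper may be tighter, for example if they also account for support and confidence thresholds. The function names and the toy transactions are hypothetical.

```python
from typing import FrozenSet, List, Tuple

Transaction = FrozenSet[str]


def support(transactions: List[Transaction], items: FrozenSet[str]) -> float:
    """Fraction of transactions that contain every item in `items`."""
    return sum(1 for t in transactions if items <= t) / len(transactions)


def lift(transactions: List[Transaction], a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Lift(A -> B) = P(A and B) / (P(A) * P(B))."""
    return support(transactions, a | b) / (
        support(transactions, a) * support(transactions, b)
    )


def frechet_lift_bounds(p_a: float, p_b: float) -> Tuple[float, float]:
    """Assumed bounds: max(0, P(A)+P(B)-1) <= P(A and B) <= min(P(A), P(B))."""
    lower = max(0.0, p_a + p_b - 1.0) / (p_a * p_b)
    upper = 1.0 / max(p_a, p_b)
    return lower, upper


def standardised_lift(lift_value: float, lower: float, upper: float) -> float:
    """(Lift - L) / (U - L), which lies in [0, 1] when L <= Lift <= U."""
    if upper == lower:
        raise ValueError("bounds coincide; standardised lift is undefined")
    return (lift_value - lower) / (upper - lower)


# Hypothetical toy data: each transaction is the set of features present in one image.
transactions = [
    frozenset(t) for t in (("f1", "f2"), ("f1", "f2"), ("f1",), ("f2",), ("f3",))
]
a, b = frozenset({"f1"}), frozenset({"f2"})
raw = lift(transactions, a, b)
low, high = frechet_lift_bounds(support(transactions, a), support(transactions, b))
print(raw, low, high, standardised_lift(raw, low, high))
```

Passing L and U in as arguments keeps the standardisation step independent of how the bounds are derived, so a different pair of bounds can be substituted without changing the rest of the sketch.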
 
9
Witten and Frank [10].
 
11
We applied the “binary” method in computing the distance in R.
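For readers unfamiliar with this option, the “binary” distance in R's `dist` function treats each vector as a set of bits and returns the proportion of positions in which exactly one vector is on, among the positions in which at least one is on (a Jaccard-type distance). A minimal Python sketch of that definition follows. It is an illustration rather than the paper's code, and returning 0 for two all-zero vectors is a convention assumed here.

```python
from typing import Sequence


def binary_distance(x: Sequence[int], y: Sequence[int]) -> float:
    """Share of positions where exactly one vector is on, among positions where at least one is on."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    at_least_one_on = sum(1 for a, b in zip(x, y) if a or b)
    exactly_one_on = sum(1 for a, b in zip(x, y) if bool(a) != bool(b))
    if at_least_one_on == 0:
        # Assumed convention for this sketch: two all-zero vectors are identical.
        return 0.0
    return exactly_one_on / at_least_one_on


# Hypothetical feature vectors for two images (1 = feature present).
image_1 = [1, 1, 0, 0]
image_2 = [1, 0, 1, 0]
print(binary_distance(image_1, image_2))  # 2 mismatches out of 3 positions that are on
```

Positions where both vectors are 0 are ignored, which suits sparse binary features where joint absence carries little information.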
 
12
Note that the data excludes the classifier variable and relates only to the 1554 image features.
 
13
Objective function: \( \sum_{i=1}^{n} \sum_{j=1}^{K} z_{ij}\, d(x_i, \mu_j) \).
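A short Python sketch may help in reading this objective. It assumes that z_ij is 1 when observation i is assigned to cluster j and 0 otherwise, and that μ_j is the representative (centre or medoid) of cluster j; the excerpt does not say which distance d or which clustering algorithm is meant, so d is passed in as a parameter and squared Euclidean distance is used only as an example. All names and data are hypothetical.

```python
from typing import Callable, List, Sequence

Point = Sequence[float]


def clustering_objective(
    points: List[Point],
    centres: List[Point],
    z: List[List[int]],
    dist: Callable[[Point, Point], float],
) -> float:
    """Sum over i and j of z[i][j] * dist(points[i], centres[j])."""
    return sum(
        z[i][j] * dist(points[i], centres[j])
        for i in range(len(points))
        for j in range(len(centres))
    )


def squared_euclidean(x: Point, y: Point) -> float:
    """Example distance only; any dissimilarity d could be substituted."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


# Hypothetical toy data: four points, two clusters.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centres = [(0.0, 1.0), (10.0, 1.0)]
z = [[1, 0], [1, 0], [0, 1], [0, 1]]  # one-hot assignment of each point to a cluster
print(clustering_objective(points, centres, z, squared_euclidean))
```

Because each row of z contains a single 1, the double sum reduces to the total distance from each observation to its own cluster representative.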
 
14
See Rand [11]. The Rand Index lies between 0 and 1:
\( \frac{\binom{n}{2} + 2\sum_{i=1}^{c_1}\sum_{j=1}^{c_2}\binom{n_{ij}}{2} - \left[\sum_{i=1}^{c_1}\binom{n_{i\cdot}}{2} + \sum_{j=1}^{c_2}\binom{n_{\cdot j}}{2}\right]}{\binom{n}{2}} \), where \( n_{ij} \) is the number of observations placed in cluster \( i \) of the first partition and cluster \( j \) of the second, and \( n_{i\cdot} \) and \( n_{\cdot j} \) are the corresponding row and column totals.
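The Rand Index can be computed directly from the cross-tabulation of two partitions. The Python sketch below is an illustration, not the paper's code: it evaluates the contingency-table expression and checks it against the equivalent pair-counting definition, namely the share of observation pairs on which the two partitions agree. The label vectors are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb
from typing import Hashable, Sequence


def rand_index(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Rand Index from the contingency table of two partitions (needs n >= 2)."""
    n = len(labels_a)
    total_pairs = comb(n, 2)
    cells = sum(comb(v, 2) for v in Counter(zip(labels_a, labels_b)).values())
    rows = sum(comb(v, 2) for v in Counter(labels_a).values())
    cols = sum(comb(v, 2) for v in Counter(labels_b).values())
    return (total_pairs + 2 * cells - (rows + cols)) / total_pairs


def rand_index_by_pairs(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Same quantity by brute force: pairs grouped together in both or apart in both."""
    pairs = list(combinations(zip(labels_a, labels_b), 2))
    agreements = sum(1 for (a1, b1), (a2, b2) in pairs if (a1 == a2) == (b1 == b2))
    return agreements / len(pairs)


# Hypothetical cluster labels for six observations under two partitions.
partition_1 = [0, 0, 1, 1, 2, 2]
partition_2 = [0, 0, 1, 2, 2, 2]
print(rand_index(partition_1, partition_2))
print(rand_index_by_pairs(partition_1, partition_2))
```

The contingency-table form needs only the cell, row and column counts, whereas the brute-force check is quadratic in the number of observations.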
 
15
Hubert and Arabie [12] developed an adjusted Rand Index.
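The adjusted index corrects the Rand Index for agreement expected by chance, so that it is 1 for identical partitions and has expected value 0 for random ones. A minimal Python sketch of the Hubert and Arabie form follows, as an illustration rather than the paper's code. Returning 1.0 in the degenerate case where the denominator is zero is a convention assumed here, and the label vectors are hypothetical.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """(index - expected index) / (maximum index - expected index)."""
    n = len(labels_a)
    cells = sum(comb(v, 2) for v in Counter(zip(labels_a, labels_b)).values())
    rows = sum(comb(v, 2) for v in Counter(labels_a).values())
    cols = sum(comb(v, 2) for v in Counter(labels_b).values())
    expected = rows * cols / comb(n, 2)
    maximum = (rows + cols) / 2
    if maximum == expected:
        # Assumed convention: degenerate case, e.g. both partitions have one cluster.
        return 1.0
    return (cells - expected) / (maximum - expected)


# Hypothetical cluster labels for six observations under two partitions.
partition_1 = [0, 0, 1, 1, 2, 2]
partition_2 = [0, 0, 1, 2, 2, 2]
print(adjusted_rand_index(partition_1, partition_2))
print(adjusted_rand_index(partition_1, partition_1))
```

Unlike the unadjusted index, this value can be negative when the two partitions agree less often than chance would suggest.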
 
16
Other methods, including bagging and random forests, were also implemented. Because of their run times, their results could not be included in the final report.
 
17
We selected 10 folds for computing and run-time reasons. This means that each fold has about 328 observations.
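The fold sizes can be checked with a short Python sketch of a k-fold split, given as an illustration rather than the paper's code. The total of 3279 observations is an assumption: it is the size of the publicly available Internet Advertisements dataset and is consistent with about 328 observations per fold, but the figure is not stated in this excerpt. The function name and the seed are hypothetical.

```python
import random
from typing import List


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle the indices 0..n-1 and deal them into k folds of near-equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


n_observations = 3279  # assumed dataset size, not given in this excerpt
folds = k_fold_indices(n_observations, k=10)
print(sorted(len(fold) for fold in folds))

# Each fold serves once as the test set; the other nine form the training set.
for test_id, test_idx in enumerate(folds):
    train_idx = [i for f, fold in enumerate(folds) if f != test_id for i in fold]
    assert len(train_idx) + len(test_idx) == n_observations
    # A classifier would be fitted on train_idx and scored on test_idx here.
```

Under this assumption, nine folds hold 328 observations and one holds 327; a cross-validated accuracy would then be the average of the ten per-fold accuracies.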
 
References
1.
Kushmerick N (1999) Learning to remove internet advertisements. In: Agents’99, proceedings of the third annual conference on autonomous agents, pp 175–181
5.
Cohen S, Ruppin E, Dror G (2005) Feature selection based on the Shapley value. In: Proceedings of the 19th international joint conference on artificial intelligence, pp 665–670
8.
9.
McNicholas PD, Murphy TB, O’Regan M (2008) Standardising the lift of an association rule. Comput Stat Data Anal 52(10):4712–4721
10.
Witten IH, Frank E (2000) Data mining: practical machine learning tools and techniques with Java implementations. Academic Press, New York
11.
Rand WM (1971) Objective criteria for the evaluation of clustering methods. J Am Stat Assoc 66(336):846–850
12.
Hubert L, Arabie P (1985) Comparing partitions. J Classif 2(1):193–218
Metadata
Title
Mining and Classifying Images from an Advertisement Image Remover
Author
Graeme O’Meara
Publication date
6 June 2018
Publisher
Springer Berlin Heidelberg
Published in
Annals of Data Science, Issue 2/2019
Print ISSN: 2198-5804
Electronic ISSN: 2198-5812
DOI
https://doi.org/10.1007/s40745-018-0164-1
