Published in: Soft Computing 5/2017

4 September 2015 | Methodologies and Application

Searching the Web for illegal content: the anatomy of a semantic search engine

Authors: Luigi Laura, Gianluigi Me


Abstract

In this paper, we describe the challenges in building a semantic search engine designed to help law enforcement agencies fight online drug marketplaces, where new psychoactive substances are sold. The search engine was developed under the Semantic Illegal Content Hunter (SICH) Project, with the financial support of the European Commission's Prevention of and Fight Against Crime Programme (ISEC 2012). The specific objective of the SICH Project is to develop new strategic tools and assessment techniques, based on semantic analysis of texts, to support the dynamic mapping and automatic identification of illegal content on the Net. A Web search engine can be roughly divided into three main components: (a) the crawler, which collects the Web pages to be indexed; (b) the indexer, which parses and stores the collected data; and (c) the query processor, which interacts with the user by parsing a query and returning the relevant documents. In this paper, we detail each of these components of the SICH search engine, highlighting the differences from a traditional Web search engine.
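The three-component split named in the abstract can be illustrated with a minimal sketch. It is not the SICH implementation: the in-memory `PAGES` dictionary, the `.example` hosts, and the function names `crawl`, `build_index`, and `search` are all hypothetical, and the matching is plain keyword lookup rather than the semantic analysis the paper describes.

```python
import re
from collections import defaultdict, deque

# Hypothetical in-memory "Web": URL -> (page text, outgoing links).
# A real crawler would fetch pages over the network instead.
PAGES = {
    "a.example": ("buy new herbal incense online", ["b.example", "c.example"]),
    "b.example": ("herbal tea recipes", ["a.example"]),
    "c.example": ("new incense blends shipped worldwide", []),
}


def crawl(seed, pages):
    """(a) Crawler: breadth-first traversal collecting reachable pages."""
    seen, queue, collected = {seed}, deque([seed]), {}
    while queue:
        url = queue.popleft()
        text, links = pages[url]
        collected[url] = text
        for link in links:
            if link in pages and link not in seen:
                seen.add(link)
                queue.append(link)
    return collected


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(collected):
    """(b) Indexer: parse each page into an inverted index, term -> URLs."""
    index = defaultdict(set)
    for url, text in collected.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def search(index, query):
    """(c) Query processor: return the URLs that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    matches = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(matches)


index = build_index(crawl("a.example", PAGES))
print(search(index, "new incense"))  # ['a.example', 'c.example']
```

In this toy version the query is a conjunction of keywords; a semantic engine would replace `tokenize` and the set intersection with richer text analysis and ranking.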


Metadata
Title
Searching the Web for illegal content: the anatomy of a semantic search engine
Authors
Luigi Laura
Gianluigi Me
Publication date
4 September 2015
Publisher
Springer Berlin Heidelberg
Published in
Soft Computing / Issue 5/2017
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-015-1857-4
