
2020 | OriginalPaper | Book chapter

Semantic Keywords Clustering to Optimize Text Ads Campaigns

Written by: Pietro Fodra, Emmanuel Pasquet, Bruno Goutorbe, Guillaume Mohr, Matthieu Cornec

Published in: Nonparametric Statistics

Publisher: Springer International Publishing


Abstract

In this paper, we describe how to use well-known machine learning tools to group textual queries with similar semantic meaning. Such a clustering can improve the performance of bidding algorithms for online advertising by pooling the signal gathered by text ads displayed on the result pages of search queries that share a similar meaning. Indeed, search engines organize auctions in which participants bid on the search terms for which they wish to display an ad. Generalist e-commerce companies such as Cdiscount bid simultaneously on millions of terms that reflect the diversity of their product catalog, according to the expected profits associated with the ads. Methods for estimating these expected returns suffer from data sparsity, since most keywords have little or no historical signal. Grouping keywords and exploiting information on the most frequent ones (short tail) to infer information on the less frequent ones (long tail) makes it possible to anticipate user behavior from semantics and to improve the bidding strategy. The plan is the following: pre-process the keywords by stemming, choose an e-commerce training corpus for the Word2Vec model, train it, and embed the keywords into a Euclidean space where we can cluster them with a K-means algorithm. We validate our approach on a sub-sample of the keywords for which a non-semantic distance is available. Finally, all the keywords in the same cluster share the same bid, which is computed by aggregating the cluster's historical signal.
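
The pipeline outlined in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation, and several parts are assumptions made purely for illustration: the hand-written `WORD_VECTORS` table stands in for a trained Word2Vec model, `stem` is a crude plural stripper rather than a real stemmer, a keyword is embedded as the mean of its word vectors, K-means is initialized with the first k points, and the shared cluster signal is taken to be a pooled conversion rate computed from hypothetical (clicks, conversions) counts.

```python
import math
from collections import defaultdict

# Toy 2-D word vectors standing in for a trained Word2Vec model (made-up values).
WORD_VECTORS = {
    "phone": (0.9, 0.1), "smartphone": (1.0, 0.0), "case": (0.8, 0.3),
    "sofa": (0.1, 0.9), "couch": (0.0, 1.0), "leather": (0.2, 0.8),
}

# Hypothetical historical signal per keyword: (clicks, conversions).
STATS = {
    "smartphone case": (1000, 30), "leather sofas": (800, 8), "phones": (5, 0),
    "couch": (0, 0), "smartphones": (200, 7), "leather couch": (2, 1),
}
KEYWORDS = list(STATS)


def stem(word):
    """Crude plural stripping; a placeholder for a real stemmer."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def embed(keyword):
    """Assumption: a keyword vector is the mean of its known word vectors."""
    vecs = [WORD_VECTORS[w] for w in map(stem, keyword.lower().split())
            if w in WORD_VECTORS]
    if not vecs:
        return None  # no word of the keyword is in the vocabulary
    return tuple(sum(c) / len(vecs) for c in zip(*vecs))


def kmeans(points, k, n_iter=20):
    """Lloyd's algorithm, using the first k points as initial centroids."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid in Euclidean distance.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def pooled_conversion_rate(keywords, labels, stats):
    """Give every keyword the conversion rate of its whole cluster."""
    clicks, conversions = defaultdict(int), defaultdict(int)
    for kw, lab in zip(keywords, labels):
        c, v = stats[kw]
        clicks[lab] += c
        conversions[lab] += v
    return {kw: (conversions[lab] / clicks[lab] if clicks[lab] else 0.0)
            for kw, lab in zip(keywords, labels)}


points = [embed(kw) for kw in KEYWORDS]
labels = kmeans(points, k=2)
rates = pooled_conversion_rate(KEYWORDS, labels, STATS)

for kw in KEYWORDS:
    print(f"{kw!r}: cluster {labels[KEYWORDS.index(kw)]}, pooled rate {rates[kw]:.4f}")
```

In this toy data, a long-tail keyword such as "couch", which has no clicks of its own, inherits the pooled rate of the cluster it shares with "leather sofas". This is the sense in which short-tail signal informs long-tail keywords; how the pooled signal is turned into an actual bid is not specified in the abstract and is left out here.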


Metadata
Title
Semantic Keywords Clustering to Optimize Text Ads Campaigns
Authors
Pietro Fodra
Emmanuel Pasquet
Bruno Goutorbe
Guillaume Mohr
Matthieu Cornec
Copyright year
2020
DOI
https://doi.org/10.1007/978-3-030-57306-5_19