2015 | OriginalPaper | Book chapter

Exploiting Microdata Annotations to Consistently Categorize Product Offers at Web Scale

Written by: Robert Meusel, Anna Primpeli, Christian Meilicke, Heiko Paulheim, Christian Bizer

Published in: E-Commerce and Web Technologies

Publisher: Springer International Publishing

Abstract

Semantically annotated data, using markup languages like RDFa and Microdata, has become increasingly available on the Web, especially in the area of e-commerce. As a result, a large number of structured product descriptions are freely available and can be used for various applications, such as product search or recommendation. However, little effort has been made to analyze the categories of the available product descriptions. Although some products have an explicit category assigned, the categorization schemes vary widely, because the products originate from thousands of different sites. This heterogeneity makes supervised methods, which most previous works have proposed, hard to apply. Therefore, in this paper, we explain how distantly supervised approaches can be used to exploit the heterogeneous category information in order to map the products to a set of target categories from an existing product catalogue. Our results show that, even though this task is far from trivial, we can reach almost 56% accuracy for classifying products into 37 categories.

Footnotes
3
Similar to our previous work [9], we analyze the data based on pay-level domains (PLDs) that embed certain vocabularies, classes, and properties.
 
11
Because only one example exists for each class, k needs to be set to 1; otherwise the method would also consider examples other than the nearest one, which by design belong to another class. This setup is equivalent to Nearest Centroid Classification, where each feature vector of Cat corresponds to one centroid.
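
The setup in this footnote can be illustrated with a minimal sketch. It assumes a bag-of-words representation and cosine similarity, neither of which is specified in this excerpt; the category names, the vocabulary, and the helper names (`bag_of_words`, `cosine`, `classify_1nn`) are hypothetical. Returning `None` when no category shares a term with the product is also an assumption made for the sketch.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Turn text into a term-frequency vector (lowercased whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_1nn(product_text, category_vectors):
    """k-NN with k = 1 over one vector per category.

    Each category contributes exactly one feature vector, so the nearest
    neighbour is the nearest centroid. Returns None if no category has a
    similarity above zero (an assumption of this sketch).
    """
    product = bag_of_words(product_text)
    best_cat, best_sim = None, 0.0
    for cat, vec in category_vectors.items():
        sim = cosine(product, vec)
        if sim > best_sim:
            best_cat, best_sim = cat, sim
    return best_cat


# Hypothetical target categories, each described by a single feature vector.
CATEGORY_VECTORS = {
    "Electronics": bag_of_words("electronics phone camera laptop tv"),
    "Clothing": bag_of_words("clothing shirt jeans dress shoes"),
    "Toys": bag_of_words("toys games puzzle doll"),
}

if __name__ == "__main__":
    print(classify_1nn("Slim fit jeans blue", CATEGORY_VECTORS))
```

With k greater than 1, the second-nearest vector would necessarily come from a different category, which is why the footnote fixes k at 1.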
 
12
As stated before, such instances are counted as false negatives within the evaluation.
 
13
We thank Stefano Faralli for his valuable feedback and recommendations.
 
16
We also applied up-sampling of under-represented classes in the dataset, but the results did not improve.
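
For readers unfamiliar with the technique, up-sampling can be sketched as random oversampling with replacement. This excerpt does not state which up-sampling procedure was used, so the function below (`upsample`, a hypothetical name) and the toy data are assumptions for illustration only.

```python
import random
from collections import Counter, defaultdict


def upsample(examples, seed=0):
    """Randomly duplicate examples of smaller classes.

    `examples` is a list of (features, label) pairs. Every class is padded
    with randomly re-drawn copies of its own examples until it is as large
    as the largest class. Original examples are always kept.
    """
    if not examples:
        return []
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for features, label in examples:
        by_label[label].append((features, label))
    target = max(len(items) for items in by_label.values())
    balanced = []
    for items in by_label.values():
        balanced.extend(items)
        # Draw the missing examples with replacement from the same class.
        balanced.extend(rng.choices(items, k=target - len(items)))
    return balanced


if __name__ == "__main__":
    # Toy data: three examples of class "A", one of class "B".
    data = [("a1", "A"), ("a2", "A"), ("a3", "A"), ("b1", "B")]
    print(Counter(label for _, label in upsample(data)))
```

Oversampling only changes the class distribution seen during training; it adds no new information, which is consistent with the footnote's observation that results did not improve.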
 
References
1.
Bizer, C., Eckert, K., Meusel, R., Mühleisen, H., Schuhmacher, M., Völker, J.: Deployment of RDFa, microdata, and microformats on the web – a quantitative analysis. In: Alani, H., et al. (eds.) ISWC 2013, Part II. LNCS, vol. 8219, pp. 17–32. Springer, Heidelberg (2013)
2.
Domingos, P., Lowd, D.: Markov logic: An interface layer for artificial intelligence. Synth. Lect. Artif. Intell. Mach. Learn. 3(1), 1–155 (2009)
3.
Eberius, J., Thiele, M., Braunschweig, K., Lehner, W.: Top-k entity augmentation using consistent set covering. In: SSDBM 2015 (2015)
5.
Kolb, P.: Disco: A multilingual database of distributionally similar words. In: Proceedings of KONVENS (2008)
6.
Lehmberg, O., Ritze, D., Ristoski, P., Meusel, R., Paulheim, H., Bizer, C.: Mannheim Search Join Engine. Web Semantics: Science, Services and Agents on the World Wide Web (2015)
7.
Meusel, R., Bizer, C., Paulheim, H.: A web-scale study of the adoption and evolution of the schema.org vocabulary over time. In: Proceedings WIMS 2015, pp. 15:1–15:11. ACM, New York, NY, USA (2015)
8.
Meusel, R., Paulheim, H.: Heuristics for fixing errors in deployed schema.org microdata. In: Extended Semantic Web Conference (2015)
9.
Meusel, R., Petrovski, P., Bizer, C.: The webdatacommons microdata, RDFa and microformat dataset series. In: Mika, P., et al. (eds.) ISWC 2014, Part I. LNCS, vol. 8796, pp. 277–292. Springer, Heidelberg (2014)
11.
Mika, P., Potter, T.: Metadata statistics for a large web corpus. In: LDOW 2012, CEUR Workshop Proceedings, vol. 937. CEUR-ws.org (2012)
12.
Nguyen, H., Fuxman, A., Paparizos, S., Freire, J., Agrawal, R.: Synthesizing products for online catalogs. Proc. VLDB Endow. 4(7), 409–418 (2011)
13.
Noessner, J., Niepert, M., Stuckenschmidt, H.: Rockit: Exploiting parallelism and symmetry for MAP inference in statistical relational models. In: Proceedings of the AAAI 2013 (2013)
14.
Patel-Schneider, P.F.: Analyzing schema.org. In: Mika, P., et al. (eds.) ISWC 2014, Part I. LNCS, vol. 8796, pp. 261–276. Springer, Heidelberg (2014)
15.
Petrovski, P., Bryl, V., Bizer, C.: Integrating product data from websites offering microdata markup. In: DEOS 2014 (2014)
16.
Qiu, D., Barbosa, L., Dong, X.L., Shen, Y., Srivastava, D.: Dexter: Large-scale discovery and extraction of product specifications on the web. Proc. VLDB Endow. 8(13), 2194–2205 (2015)
17.
Ritze, D., Lehmberg, O., Bizer, C.: Matching html tables to dbpedia. In: Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics, p. 10. ACM (2015)
Metadata
Title
Exploiting Microdata Annotations to Consistently Categorize Product Offers at Web Scale
Written by
Robert Meusel
Anna Primpeli
Christian Meilicke
Heiko Paulheim
Christian Bizer
Copyright year
2015
DOI
https://doi.org/10.1007/978-3-319-27729-5_7