
2015 | OriginalPaper | Chapter

Exploiting Microdata Annotations to Consistently Categorize Product Offers at Web Scale

Authors: Robert Meusel, Anna Primpeli, Christian Meilicke, Heiko Paulheim, Christian Bizer

Published in: E-Commerce and Web Technologies

Publisher: Springer International Publishing


Abstract

Semantically annotated data, using markup languages like RDFa and Microdata, has become increasingly available on the Web, especially in the area of e-commerce. Thus, a large number of structured product descriptions are freely available and can be used for various applications, such as product search or recommendation. However, little effort has been made to analyze the categories of the available product descriptions. Although some products have an explicit category assigned, the categorization schemes vary a lot, as the products originate from thousands of different sites. This heterogeneity makes supervised methods, which most previous works have proposed, hard to apply. Therefore, in this paper, we explain how distantly supervised approaches can be used to exploit the heterogeneous category information in order to map the products to a set of target categories from an existing product catalogue. Our results show that, even though this task is far from trivial, we can reach almost \(56\,\%\) accuracy for classifying products into 37 categories.


Footnotes
3
Similar to our previous work [9], we analyze the data based on pay-level domains (PLDs) that embed certain vocabularies, classes and properties.
 
11
Because only one example exists for each class, k needs to be set to 1; otherwise the method would also consider examples other than the nearest, which by design belong to other classes. This setup is equivalent to Nearest Centroid Classification, where each feature vector of Cat is one centroid.
 
12
As stated before, such instances are counted as false negatives within the evaluation.
 
13
We thank Stefano Faralli for his valuable feedback and recommendations.
 
16
We also applied up-sampling of under-represented classes in the dataset, but the results did not improve.
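 
The equivalence described in footnote 11 can be illustrated with a minimal sketch. It assumes that each target category is represented by exactly one bag-of-words feature vector and that similarity is measured by cosine; the category names, the example texts, and the helper names (vectorize, cosine, classify) are hypothetical and not taken from the paper. With a single vector per class, 1-nearest-neighbour search over these vectors is the same computation as nearest centroid classification.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term-frequency vector (lowercased, whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(product_text, category_vectors):
    """Return the category whose single vector is most similar (k = 1).

    Each category contributes exactly one vector, so the nearest
    neighbour is also the nearest centroid. Returning None when no
    category shares a term is a choice of this sketch.
    """
    product = vectorize(product_text)
    best_name, best_sim = None, 0.0
    for name, vector in category_vectors.items():
        sim = cosine(product, vector)
        if sim > best_sim:
            best_name, best_sim = name, sim
    return best_name


# Hypothetical target categories, one feature vector each.
CATEGORIES = {
    "Cameras": vectorize("camera lens digital photo zoom"),
    "Footwear": vectorize("shoes boots sneakers leather sole"),
}
```

Calling classify("Digital camera with 10x zoom lens", CATEGORIES) compares the product against each category vector once and picks the closest one. Setting k above 1 would force the vote to include vectors of other categories, which is the problem the footnote points out.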
 
Literature
1.
Bizer, C., Eckert, K., Meusel, R., Mühleisen, H., Schuhmacher, M., Völker, J.: Deployment of RDFa, microdata, and microformats on the web – a quantitative analysis. In: Alani, H., et al. (eds.) ISWC 2013, Part II. LNCS, vol. 8219, pp. 17–32. Springer, Heidelberg (2013)
2.
Domingos, P., Lowd, D.: Markov logic: An interface layer for artificial intelligence. Synth. Lect. Artif. Intell. Mach. Learn. 3(1), 1–155 (2009)
3.
Eberius, J., Thiele, M., Braunschweig, K., Lehner, W.: Top-k entity augmentation using consistent set covering. In: SSDBM 2015 (2015)
5.
Kolb, P.: Disco: A multilingual database of distributionally similar words. In: Proceedings of KONVENS (2008)
6.
Lehmberg, O., Ritze, D., Ristoski, P., Meusel, R., Paulheim, H., Bizer, C.: Mannheim Search Join Engine. Web Semantics: Science, Services and Agents on the World Wide Web (2015)
7.
Meusel, R., Bizer, C., Paulheim, H.: A web-scale study of the adoption and evolution of the schema.org vocabulary over time. In: Proceedings WIMS 2015, pp. 15:1–15:11. ACM, New York, NY, USA (2015)
8.
Meusel, R., Paulheim, H.: Heuristics for fixing errors in deployed schema.org microdata. In: Extended Semantic Web Conference (2015)
9.
Meusel, R., Petrovski, P., Bizer, C.: The WebDataCommons microdata, RDFa and microformat dataset series. In: Mika, P., et al. (eds.) ISWC 2014, Part I. LNCS, vol. 8796, pp. 277–292. Springer, Heidelberg (2014)
11.
Mika, P., Potter, T.: Metadata statistics for a large web corpus. In: LDOW 2012, CEUR Workshop Proceedings, vol. 937. CEUR-ws.org (2012)
12.
Nguyen, H., Fuxman, A., Paparizos, S., Freire, J., Agrawal, R.: Synthesizing products for online catalogs. Proc. VLDB Endow. 4(7), 409–418 (2011)
13.
Noessner, J., Niepert, M., Stuckenschmidt, H.: Rockit: Exploiting parallelism and symmetry for MAP inference in statistical relational models. In: Proceedings of the AAAI 2013 (2013)
14.
Patel-Schneider, P.F.: Analyzing schema.org. In: Mika, P., et al. (eds.) ISWC 2014, Part I. LNCS, vol. 8796, pp. 261–276. Springer, Heidelberg (2014)
15.
Petrovski, P., Bryl, V., Bizer, C.: Integrating product data from websites offering microdata markup. In: DEOS 2014 (2014)
16.
Qiu, D., Barbosa, L., Dong, X.L., Shen, Y., Srivastava, D.: Dexter: Large-scale discovery and extraction of product specifications on the web. Proc. VLDB Endow. 8(13), 2194–2205 (2015)
17.
Ritze, D., Lehmberg, O., Bizer, C.: Matching HTML tables to DBpedia. In: Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics, p. 10. ACM (2015)
Metadata
Copyright Year
2015
DOI
https://doi.org/10.1007/978-3-319-27729-5_7
