2016 | OriginalPaper | Book chapter

Extracting Product Offers from e-Shop Websites

Authors: Andrea Horch, Holger Kett, Anette Weisbecker

Published in: Web Information Systems and Technologies

Publisher: Springer International Publishing

Abstract

Online retailers and e-shoppers alike are interested in gathering product records from the Web in order to compare products and prices. Consumers compare offers to find the best price for a specific product or to identify alternatives to it, whereas online retailers need to compare their offers with those of their competitors in order to remain competitive. Because the number and variety of product offers on the Web is huge, the product data needs to be collected through an automated approach. The contribution of this paper is a novel approach for automatically identifying and extracting product records from arbitrary e-shop websites. The approach extends an existing technique called Tag Path Clustering, which clusters similar HTML tag paths. The clustering mechanism is combined with a novel filtering mechanism for identifying the product records to be extracted from the websites.
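
The general idea of grouping HTML elements by their tag path can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the clustering or filtering details, so this example groups only identical tag paths and uses a simple repetition threshold as a stand-in filter. The names `TagPathCollector`, `repeated_tag_paths`, and `min_count` are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser

# HTML elements that never have a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TagPathCollector(HTMLParser):
    """Counts how often each root-to-element tag path occurs in a page."""

    def __init__(self):
        super().__init__()
        self.stack = []                # currently open tags, outermost first
        self.paths = defaultdict(int)  # tag path -> number of occurrences

    def handle_starttag(self, tag, attrs):
        path = "/".join(self.stack + [tag])
        self.paths[path] += 1
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass


def repeated_tag_paths(html, min_count=3):
    """Return tag paths that occur at least min_count times.

    Assumption: a path that repeats often is a candidate for a record
    region. This threshold is a placeholder, not the paper's filter.
    """
    collector = TagPathCollector()
    collector.feed(html)
    collector.close()
    return {p: n for p, n in collector.paths.items() if n >= min_count}
```

For a listing page whose products are rendered as repeated `<li>` items, the paths ending in `li` and its children would be returned as candidates, while one-off elements such as a footer paragraph would be dropped. A real extractor would additionally need a similarity measure between non-identical paths and a content-based filter to separate product records from other repeated structures such as navigation menus.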

Metadata
Title
Extracting Product Offers from e-Shop Websites
Authors
Andrea Horch
Holger Kett
Anette Weisbecker
Copyright year
2016
Publisher
Springer International Publishing
DOI
https://doi.org/10.1007/978-3-319-30996-5_12
