Published in: World Wide Web 6/2016

1 November 2016

Following the dynamic block on the Web

Authors: Sha Hu, Ji-Rong Wen, Zhicheng Dou, Shuo Shang

Abstract

Dynamic web pages change rapidly, and users increasingly need instant updates for dynamic blocks on the Web. In this paper, we address the problem of automatically following dynamic blocks in web pages. Given a user-specified block on a web page, we continuously track the content of the block and report updates in real time. Such a service lets users track, for example, the top ten breaking news stories on CNN, iPhone prices on Amazon, or NBA game scores. We study 3,346 human-labeled blocks from 1,127 pages and analyze the effectiveness of four types of patterns for tracking content blocks: visual area, DOM tree path, inner content, and close context. Because web pages change frequently, we find that the initial patterns generated on the original page can become invalid over time, causing extraction of the correct blocks to fail. Based on these observations, we combine different patterns to improve the accuracy and stability of block extraction. Moreover, we propose an adaptive model that adapts each pattern individually and adjusts the pattern weights for an improved combination. The experimental results show that the proposed models outperform existing approaches, with the adaptive model performing best.
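
The abstract does not state the scoring function or the weight-update rule, so the following is only an illustrative sketch of what a weighted combination of the four pattern types with adjustable weights might look like. Every name here (`BlockTracker`, `learning_rate`, the per-pattern scores in [0, 1], and the update rule itself) is a hypothetical assumption and not the authors' model.

```python
from dataclasses import dataclass, field

# The four pattern types named in the abstract.
PATTERNS = ("visual_area", "dom_path", "inner_content", "close_context")


@dataclass
class BlockTracker:
    """Hypothetical weighted combination of per-pattern match scores."""

    # Assumption: start with equal weights for all four patterns.
    weights: dict = field(default_factory=lambda: {p: 0.25 for p in PATTERNS})
    # Assumption: a simple blending factor for the weight update.
    learning_rate: float = 0.5

    def score(self, pattern_scores: dict) -> float:
        """Weighted sum of the per-pattern scores of one candidate block."""
        return sum(self.weights[p] * pattern_scores.get(p, 0.0) for p in PATTERNS)

    def pick(self, candidates: dict) -> str:
        """Return the id of the candidate block with the highest combined score."""
        return max(candidates, key=lambda block_id: self.score(candidates[block_id]))

    def adapt(self, chosen_scores: dict) -> None:
        """Shift weight toward patterns that agreed with the chosen block."""
        lr = self.learning_rate
        for p in PATTERNS:
            agreement = chosen_scores.get(p, 0.0)
            self.weights[p] = (1 - lr) * self.weights[p] + lr * agreement
        total = sum(self.weights.values())
        if total > 0:
            # Renormalize so the weights sum to 1.
            self.weights = {p: w / total for p, w in self.weights.items()}
        else:
            # Degenerate case: fall back to equal weights.
            self.weights = {p: 1 / len(PATTERNS) for p in PATTERNS}


# Made-up scores for two candidate blocks on a revisited page.
candidates = {
    "a": {"visual_area": 1.0, "dom_path": 0.0, "inner_content": 0.2, "close_context": 0.1},
    "b": {"visual_area": 0.0, "dom_path": 1.0, "inner_content": 0.9, "close_context": 0.8},
}
tracker = BlockTracker()
best = tracker.pick(candidates)
tracker.adapt(candidates[best])
```

In this toy run the visual-area pattern disagrees with the other three, so its weight shrinks after the update while the remaining weights grow, which is the general behavior the abstract attributes to adjusting pattern weights over time.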

Metadata
Title
Following the dynamic block on the Web
Authors
Sha Hu
Ji-Rong Wen
Zhicheng Dou
Shuo Shang
Publication date
1 November 2016
Publisher
Springer US
Published in
World Wide Web, Issue 6/2016
Print ISSN: 1386-145X
Electronic ISSN: 1573-1413
DOI
https://doi.org/10.1007/s11280-015-0374-9
