Skip to main content
Erschienen in: The Journal of Supercomputing 3/2020

28.03.2018

GUIDE: an interactive and incremental approach for crawling Web applications

verfasst von: Chien-Hung Liu, Woei-Kae Chen, Chi-Chia Sun

Erschienen in: The Journal of Supercomputing | Ausgabe 3/2020

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

The Internet, having a sea of Web applications, is one of the largest data stores for big data analysis. To explore and retrieve the states (pages) from Web applications, Web crawlers have been extensively used. Most crawlers allow the users to define a few crawling directives so as to increase the coverage of states that the crawler can explore. A directive can, for example, assign an input value to a specified input field so that the application is instructed to perform a specific action and visit some special states. Note that, a crawler is supposedly capable of exploring an unknown application. But, given an unknown application, how could the user possibly prepare the required directives in advance? This paper proposes an interactive crawling approach and a crawler called GUIDE to overcome this issue. Instead of passively receiving directives from the user, GUIDE actively asks the user for directives when Web pages containing input fields are found. In addition, GUIDE offers a hierarchical directive structure, allowing the user to define multiple values for the same input field. A case study with three Web applications indicated that (1) interactive directives were very useful for increasing the code coverage of the application being explored—up to 10.3–50.5% of code coverage improvement can be achieved, and (2) using GUIDE is more efficient than using a traditional crawler—given the same amount of time, up to 11% of code coverage improvement can be achieved.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Literatur
2.
Zurück zum Zitat Brin S, Page L (1998) The anatomy of a large-scale hypertexual web search engine. Comput Netw ISDN Syst 30(1–7):107–117CrossRef Brin S, Page L (1998) The anatomy of a large-scale hypertexual web search engine. Comput Netw ISDN Syst 30(1–7):107–117CrossRef
3.
Zurück zum Zitat Burner M (1997) Crawling towards eternity: building an archive of the world wide web. Web Tech. Mag. 2(5):37–40 Burner M (1997) Crawling towards eternity: building an archive of the world wide web. Web Tech. Mag. 2(5):37–40
4.
Zurück zum Zitat Ferrucci F, Sarro F, Ronca D, Abrahao S (2011) A crawljax based approach to exploit traditional accessibility evaluation tools for AJAX applications. In: Information Technology and Innovation Trends in Organizations. Springer, pp 255–262 Ferrucci F, Sarro F, Ronca D, Abrahao S (2011) A crawljax based approach to exploit traditional accessibility evaluation tools for AJAX applications. In: Information Technology and Innovation Trends in Organizations. Springer, pp 255–262
5.
Zurück zum Zitat Muñoz FR, Cortes IIS, Villalba LJG (2017) Enlargement of vulnerable web applications for testing. J Supercomput Muñoz FR, Cortes IIS, Villalba LJG (2017) Enlargement of vulnerable web applications for testing. J Supercomput
6.
Zurück zum Zitat Park JH, Sung Y, Sharma PK, Jeong Y-S, Yi G (2017) Novel assessment method for accessing private data in social network security services. J Supercomput 73(7):3307–3325CrossRef Park JH, Sung Y, Sharma PK, Jeong Y-S, Yi G (2017) Novel assessment method for accessing private data in social network security services. J Supercomput 73(7):3307–3325CrossRef
7.
Zurück zum Zitat Groeneveld F, Mesbah A, van Deursen A (2010) Automatic invariant detection in dynamic web applications. Technical Report Series TUD-SERG-2010-037 Groeneveld F, Mesbah A, van Deursen A (2010) Automatic invariant detection in dynamic web applications. Technical Report Series TUD-SERG-2010-037
8.
Zurück zum Zitat Mesbah A, Prasad MR (2011) Automated cross-browser compatibility testing. In: Proceedings of the 33rd International Conference on Software Engineering. ACM, pp 561–570 Mesbah A, Prasad MR (2011) Automated cross-browser compatibility testing. In: Proceedings of the 33rd International Conference on Software Engineering. ACM, pp 561–570
9.
Zurück zum Zitat Mirshokraie S, Mesbah A (2012) JSART: Javascript assertion-based regression testing. In: Web Engineering. pp 238–252 Mirshokraie S, Mesbah A (2012) JSART: Javascript assertion-based regression testing. In: Web Engineering. pp 238–252
10.
Zurück zum Zitat Tanida H, Prasad MR, Rajan SP, Fujita M (2011) Automated system testing of dynamic web applications. In: ICSOFT (Selected Papers). Springer, pp 181–196 Tanida H, Prasad MR, Rajan SP, Fujita M (2011) Automated system testing of dynamic web applications. In: ICSOFT (Selected Papers). Springer, pp 181–196
11.
Zurück zum Zitat Mesbah A, van Deursen A, Lenselink S (2012) Crawling ajax-based web applications through dynamic analysis of user interface state changes. ACM Trans Web (TWEB) 6(1):3 Mesbah A, van Deursen A, Lenselink S (2012) Crawling ajax-based web applications through dynamic analysis of user interface state changes. ACM Trans Web (TWEB) 6(1):3
12.
Zurück zum Zitat Silva CE, Campos JC (2013) Combining static and dynamic analysis for the reverse engineering of web applications. In: Proceedings of the 5th ACM SIGCHI Symposium on Engineering Interactive Computing Systems. ACM, pp 107–112 Silva CE, Campos JC (2013) Combining static and dynamic analysis for the reverse engineering of web applications. In: Proceedings of the 5th ACM SIGCHI Symposium on Engineering Interactive Computing Systems. ACM, pp 107–112
13.
Zurück zum Zitat Olston C, Najork M (2010) Web crawling. Found. Trends Inf. Retr. 4(3):175–246CrossRef Olston C, Najork M (2010) Web crawling. Found. Trends Inf. Retr. 4(3):175–246CrossRef
14.
Zurück zum Zitat Choudhary S, Dincturk ME, Mirtaheri SM, Moosavi A, von Bochmann G, Jourdan G-V, Onut IV (2012) Crawling rich internet applications: the state of the art. In: Proceedings of the 2012 Conference of the Center for Advanced Studies on Collaborative Research, CASCON ’12, IBM Corp, Riverton, pp 146–160 Choudhary S, Dincturk ME, Mirtaheri SM, Moosavi A, von Bochmann G, Jourdan G-V, Onut IV (2012) Crawling rich internet applications: the state of the art. In: Proceedings of the 2012 Conference of the Center for Advanced Studies on Collaborative Research, CASCON ’12, IBM Corp, Riverton, pp 146–160
15.
Zurück zum Zitat Mirtaheri SM, Dinçtürk ME, Hooshmand S, Bochmann GV, Jourdan G-V, Onut IV (2013) A brief history of web crawlers. In: Proceedings of the 2013 Conference of the Center for Advanced Studies on Collaborative Research, CASCON ’13. IBM Corp, Riverton, pp 40–54 Mirtaheri SM, Dinçtürk ME, Hooshmand S, Bochmann GV, Jourdan G-V, Onut IV (2013) A brief history of web crawlers. In: Proceedings of the 2013 Conference of the Center for Advanced Studies on Collaborative Research, CASCON ’13. IBM Corp, Riverton, pp 40–54
16.
Zurück zum Zitat van Deursen A, Mesbah A, Nederlof A (2015) Crawl-based analysis of web applications. Sci. Comput. Program. 97(P1):173–180CrossRef van Deursen A, Mesbah A, Nederlof A (2015) Crawl-based analysis of web applications. Sci. Comput. Program. 97(P1):173–180CrossRef
17.
Zurück zum Zitat Fard AM, Mesbah A (2013) Feedback-directed exploration of web applications to derive test models. In: 2013 IEEE 24th International Symposium on Software Reliability Engineering (ISSRE). pp 278–287 Fard AM, Mesbah A (2013) Feedback-directed exploration of web applications to derive test models. In: 2013 IEEE 24th International Symposium on Software Reliability Engineering (ISSRE). pp 278–287
18.
Zurück zum Zitat Dincturk ME, Choudhary S, von Bochmann G, Jourdan G-V, Onut IV (2012) A statistical approach for efficient crawling of rich internet applications. In: Proceedings of the 12th International Conference on Web Engineering, ICWE’12. Springer, Berlin, pp 362–369 Dincturk ME, Choudhary S, von Bochmann G, Jourdan G-V, Onut IV (2012) A statistical approach for efficient crawling of rich internet applications. In: Proceedings of the 12th International Conference on Web Engineering, ICWE’12. Springer, Berlin, pp 362–369
19.
Zurück zum Zitat Choudhary S, Dincturk ME, Mirtaheri SM, Jourdan G-V, Bochmann GV, Onut IV (2013) Building rich internet applications models: example of a better strategy. In: Proceedings of the 13th International Conference on Web Engineering, ICWE’13. Springer, Berlin, pp 291–305 Choudhary S, Dincturk ME, Mirtaheri SM, Jourdan G-V, Bochmann GV, Onut IV (2013) Building rich internet applications models: example of a better strategy. In: Proceedings of the 13th International Conference on Web Engineering, ICWE’13. Springer, Berlin, pp 291–305
20.
Zurück zum Zitat Dincturk ME, Jourdan G-V, Bochmann GV, Onut IV (2014) A model-based approach for crawling rich internet applications. ACM Trans. Web 8(3):19:1–19:39CrossRef Dincturk ME, Jourdan G-V, Bochmann GV, Onut IV (2014) A model-based approach for crawling rich internet applications. ACM Trans. Web 8(3):19:1–19:39CrossRef
21.
Zurück zum Zitat Moosavi A, Hooshmand S, Baghbanzadeh S, Jourdan G-V, Bochmann GV, Onut IV (2014) Indexing rich internet applications using components-based crawling. Springer International Publishing, Cham, pp 200–217 Moosavi A, Hooshmand S, Baghbanzadeh S, Jourdan G-V, Bochmann GV, Onut IV (2014) Indexing rich internet applications using components-based crawling. Springer International Publishing, Cham, pp 200–217
22.
Zurück zum Zitat Artzi S, Dolby J, Jensen SH, Møller A, Tip F (2011) A framework for automated testing of javascript web applications. In: Proceedings of the 33rd International Conference on Software Engineering, ICSE ’11. ACM, New York, pp 571–580 Artzi S, Dolby J, Jensen SH, Møller A, Tip F (2011) A framework for automated testing of javascript web applications. In: Proceedings of the 33rd International Conference on Software Engineering, ICSE ’11. ACM, New York, pp 571–580
23.
Zurück zum Zitat Pellegrino G, Tschürtz C, Bodden E, Rossow C (2015) jÄk: using dynamic analysis to crawl and test modern web applications. Springer International Publishing, Cham, pp 295–316 Pellegrino G, Tschürtz C, Bodden E, Rossow C (2015) jÄk: using dynamic analysis to crawl and test modern web applications. Springer International Publishing, Cham, pp 295–316
24.
Zurück zum Zitat Chen W-K, Liu C-H, Chen K-MA (2017) Web crawler supporting interactive and incremental user directives. In: Proceedings of the 6th International Conference on Frontier Computing Theory, Technologies, and Applications. pp 105–114 Chen W-K, Liu C-H, Chen K-MA (2017) Web crawler supporting interactive and incremental user directives. In: Proceedings of the 6th International Conference on Frontier Computing Theory, Technologies, and Applications. pp 105–114
Metadaten
Titel
GUIDE: an interactive and incremental approach for crawling Web applications
verfasst von
Chien-Hung Liu
Woei-Kae Chen
Chi-Chia Sun
Publikationsdatum
28.03.2018
Verlag
Springer US
Erschienen in
The Journal of Supercomputing / Ausgabe 3/2020
Print ISSN: 0920-8542
Elektronische ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-018-2335-4

Weitere Artikel der Ausgabe 3/2020

The Journal of Supercomputing 3/2020 Zur Ausgabe

Premium Partner