research-article

A Model-Based Approach for Crawling Rich Internet Applications

Published: 08 July 2014

Abstract

New Web technologies, like AJAX, result in more responsive and interactive Web applications, sometimes called Rich Internet Applications (RIAs). Crawling techniques developed for traditional Web applications are not sufficient for crawling RIAs. The inability to crawl RIAs is a problem that must be addressed, at the very least to make RIAs searchable and testable. We present a new methodology, called “model-based crawling”, that can be used as a basis to design efficient crawling strategies for RIAs. We illustrate model-based crawling with a sample strategy, called the “hypercube strategy”. The performance of our model-based crawling strategies is compared against that of existing standard crawling strategies, including breadth-first, depth-first, and a greedy strategy. Experimental results show that our model-based crawling approach is significantly more efficient than these standard strategies.
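The abstract lists a greedy strategy among the baselines. The Python sketch below shows one common formulation of such a greedy crawler: execute an unexplored event at the current state if there is one, otherwise travel along already-discovered transitions to the nearest state that still has an unexplored event, and reset to the initial state when no such state is reachable. It runs against a simulated application whose behavior follows a hypercube model, in which each state is the set of events executed so far. The simulation, the function names `hypercube_model` and `greedy_crawl`, and the cost accounting (event executions and resets counted separately) are illustrative assumptions, not the paper's implementation of either strategy.

```python
from collections import deque
from itertools import combinations


def hypercube_model(n):
    """Simulated application (assumption): a state is the frozenset of events
    executed so far, and executing event e from state s leads to s | {e}."""
    graph = {}
    for k in range(n + 1):
        for combo in combinations(range(n), k):
            s = frozenset(combo)
            graph[s] = {e: s | {e} for e in range(n) if e not in s}
    return graph


def greedy_crawl(graph, initial):
    """Greedy crawling sketch: returns (event_executions, resets, transitions_explored)."""
    discovered = {initial}
    executed = {s: set() for s in graph}  # events already executed from each state
    executions = resets = 0
    current = initial

    def pending(s):
        # A discovered state that still has at least one unexecuted event.
        return s in discovered and len(executed[s]) < len(graph[s])

    def nearest_pending(start):
        # Breadth-first search over transitions discovered so far only.
        queue, seen = deque([(start, [])]), {start}
        while queue:
            s, path = queue.popleft()
            if pending(s):
                return path
            for e in sorted(executed[s]):
                t = graph[s][e]
                if t not in seen:
                    seen.add(t)
                    queue.append((t, path + [e]))
        return None

    while any(pending(s) for s in discovered):
        if pending(current):
            # Execute the lowest-numbered unexplored event at the current state.
            e = min(set(graph[current]) - executed[current])
            executed[current].add(e)
            current = graph[current][e]
            discovered.add(current)
            executions += 1
            continue
        path = nearest_pending(current)
        if path is None:
            # No known path to a state with unexplored events: reset.
            current = initial
            resets += 1
            continue
        for e in path:  # replay known transitions to reach the pending state
            current = graph[current][e]
            executions += 1

    explored = sum(len(evts) for evts in executed.values())
    return executions, resets, explored


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, greedy_crawl(hypercube_model(n), frozenset()))
```

Because every transition in this simulated model adds an event, the crawler can never walk back down the hypercube and must reset whenever no state reachable through known transitions still has unexplored events.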



          • Published in

            ACM Transactions on the Web, Volume 8, Issue 3
            June 2014
            256 pages
            ISSN:1559-1131
            EISSN:1559-114X
            DOI:10.1145/2639948

            Copyright © 2014 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 8 July 2014
            • Accepted: 1 December 2013
            • Revised: 1 August 2013
            • Received: 1 May 2012


            Qualifiers

            • research-article
            • Research
            • Refereed
