Abstract
New Web technologies, such as AJAX, make Web applications more responsive and interactive; such applications are sometimes called Rich Internet Applications (RIAs). Crawling techniques developed for traditional Web applications are not sufficient for crawling RIAs. This inability to crawl RIAs must be addressed, at the very least to make RIAs searchable and testable. We present a new methodology, called “model-based crawling”, that can serve as a basis for designing efficient crawling strategies for RIAs. We illustrate model-based crawling with a sample strategy, called the “hypercube strategy”. We compare the performance of our model-based crawling strategies against existing standard crawling strategies, including breadth-first, depth-first, and a greedy strategy. Experimental results show that our model-based crawling approach is significantly more efficient than these standard strategies.
Index Terms
- A Model-Based Approach for Crawling Rich Internet Applications