
2017 | OriginalPaper | Chapter

Fuzzy Based Efficient Mechanism for URL Assignment in Dynamic Web Crawler

Authors: Raghav Sharma, Rajesh Bhatia, Sahil Garg, Gagangeet Singh Aujla, Ravinder Singh Mann

Published in: Advanced Informatics for Computing Research

Publisher: Springer Singapore


Abstract

The World Wide Web (WWW) is a huge collection of unorganized documents. Web crawlers are often used to build a database from this unorganized network. A crawler that interacts with millions of web pages must be efficient for the search engine built on it to be powerful, and this requirement necessitates the parallelization of web crawlers. In this work, a fuzzy-based technique for uniform resource locator (URL) assignment in a dynamic web crawler is proposed that utilizes the task-splitting property of the processor. To optimize crawler performance, the proposed scheme addresses two aspects: (i) creating a crawling framework with load balancing among parallel crawlers, and (ii) speeding up the crawling process by using parallel crawlers with efficient network access. Several experiments are conducted to monitor the performance of the proposed scheme, and the results demonstrate its effectiveness.
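The abstract does not give the fuzzy rules or membership functions used in the chapter, so the following is only an illustrative sketch of how fuzzy, load-aware URL assignment among parallel crawlers might look. Everything in it is an assumption made for illustration: the `Crawler` class, the `low` membership function, the single rule "IF load is low AND latency is low THEN suitability is high", and the capacity and latency figures. It should not be read as the authors' actual scheme.

```python
from dataclasses import dataclass, field
from typing import List


def low(x: float) -> float:
    """Membership of x in the fuzzy set 'low', for x normalized to [0, 1]."""
    return max(0.0, min(1.0, 1.0 - x))


@dataclass
class Crawler:
    name: str
    capacity: int       # hypothetical maximum queue length
    latency_ms: float   # hypothetical measured network latency
    queue: List[str] = field(default_factory=list)

    def suitability(self, max_latency_ms: float) -> float:
        """Evaluate the assumed rule with min() as the fuzzy AND."""
        load = len(self.queue) / self.capacity
        latency = self.latency_ms / max_latency_ms
        return min(low(load), low(latency))


def assign(url: str, crawlers: List[Crawler],
           max_latency_ms: float = 1000.0) -> Crawler:
    """Append the URL to the queue of the most suitable crawler."""
    best = max(crawlers, key=lambda c: c.suitability(max_latency_ms))
    best.queue.append(url)
    return best


crawlers = [
    Crawler("c1", capacity=4, latency_ms=100.0),
    Crawler("c2", capacity=4, latency_ms=400.0),
]
for i in range(6):
    assign(f"http://example.com/page{i}", crawlers)

for c in crawlers:
    print(c.name, len(c.queue))
```

In this toy run the faster crawler receives the first URLs, but as its queue fills its suitability drops and later URLs go to the slower one, which is the load-balancing behaviour the abstract describes in general terms. A fuller treatment would typically use several fuzzy sets per input and a defuzzification step.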


Metadata
Title: Fuzzy Based Efficient Mechanism for URL Assignment in Dynamic Web Crawler
Authors: Raghav Sharma, Rajesh Bhatia, Sahil Garg, Gagangeet Singh Aujla, Ravinder Singh Mann
Copyright Year: 2017
Publisher: Springer Singapore
DOI: https://doi.org/10.1007/978-981-10-5780-9_1
