Published in: Wireless Personal Communications 2/2021

03-02-2021

Firefly Optimization Algorithm Based Web Scraping for Web Citation Extraction

Authors: E. Suganya, S. Vijayarani

Abstract

Web citation analysis is primarily used to examine the impact of an author, an article or a publication by counting the number of times it has been cited by other authors. A significant goal of web citation analysis is to help researchers find related papers for further analysis. Authors must cite the references of their papers, which makes it possible to recognize the links to previous relevant research. Citations provide valuable information that helps researchers learn about current trends and future developments and obtain new ideas in their respective fields. Citation information is stored in web citation databases such as Google Scholar, Web of Science and Scopus. Extracting user-required information from a web citation database is a very complex task. Most open source tools are available online, but a manual process is needed to select the user-required information from the web page. For instance, if the user needs the author name and publisher from the web citation database, they are required to choose the exact information tags manually in existing tools, which is time-consuming. To overcome this difficulty, we propose an algorithm, Firefly Optimization Algorithm based Web Scraping (FOAWS), for web content extraction from web citation databases. The primary purpose of this research is to extract citation information published by an author, including journal name, publisher, year and citations, using web citation analysis. The user's query input is a keyword, for example big data or artificial intelligence. Web crawling and web scraping techniques are applied to collect web citation information from multiple web pages based on the user query, and Particle Swarm Optimization and Hidden Markov Model are applied to find the best solution among all the feasible solutions. Experiments show that the proposed FOAWS algorithm outperforms the other two algorithms.
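
The abstract does not describe how FOAWS works internally, so the following is only a minimal sketch of the generic firefly metaheuristic that the method's name refers to, not the authors' algorithm. It assumes a toy objective (`sphere`) in place of any scraping-related fitness function; the function name `firefly_minimize`, all parameter names and defaults, and the step-size decay factor are illustrative assumptions.

```python
import math
import random


def firefly_minimize(objective, dim, bounds, n_fireflies=15, n_iters=100,
                     alpha=0.2, beta0=1.0, gamma=1.0, seed=0):
    """Generic firefly search that minimizes `objective` inside a box.

    Names and default values are illustrative assumptions, not taken
    from the paper.
    """
    rng = random.Random(seed)
    lo, hi = bounds

    # Random initial positions; a lower cost means a brighter firefly.
    swarm = [[rng.uniform(lo, hi) for _ in range(dim)]
             for _ in range(n_fireflies)]
    cost = [objective(x) for x in swarm]

    # Track the best position seen so far, so the result never gets worse.
    best_idx = min(range(n_fireflies), key=cost.__getitem__)
    best_x, best_cost = list(swarm[best_idx]), cost[best_idx]

    for _ in range(n_iters):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # j is brighter, so i moves toward j
                    r2 = sum((a - b) ** 2
                             for a, b in zip(swarm[i], swarm[j]))
                    # Attractiveness decays with squared distance.
                    beta = beta0 * math.exp(-gamma * r2)
                    # Attraction step plus a small random walk, clamped
                    # to the search bounds.
                    swarm[i] = [
                        min(hi, max(lo, a + beta * (b - a)
                                    + alpha * (rng.random() - 0.5)))
                        for a, b in zip(swarm[i], swarm[j])
                    ]
                    cost[i] = objective(swarm[i])
                    if cost[i] < best_cost:
                        best_x, best_cost = list(swarm[i]), cost[i]
        alpha *= 0.97  # assumed schedule: shrink the random step over time
    return best_x, best_cost


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best_x, best_cost = firefly_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
print(best_x, best_cost)
```

In a scraping setting, the objective would presumably score how well a candidate selection of page elements matches the requested citation fields, but that mapping is not given in the abstract and is left open here.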

Metadata
Title: Firefly Optimization Algorithm Based Web Scraping for Web Citation Extraction
Authors: E. Suganya, S. Vijayarani
Publication date: 03-02-2021
Publisher: Springer US
Published in: Wireless Personal Communications, Issue 2/2021
Print ISSN: 0929-6212
Electronic ISSN: 1572-834X
DOI: https://doi.org/10.1007/s11277-021-08093-z